Redefining Selection of Features and Classification Algorithms for Room Occupancy Detection

Nor Samsiah Sani, Illa Iza Suhana Shamsuddin, Shahnurbanon Sahran, Abdul Hadi Abd Rahman, Ereena Nadjimin Muzaffar
Center For Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
How to cite (IJASEIT):
Sani, Nor Samsiah, et al. “Redefining Selection of Features and Classification Algorithms for Room Occupancy Detection”. International Journal on Advanced Science, Engineering and Information Technology, vol. 8, no. 4-2, Oct. 2018, pp. 1486-93, doi:10.18517/ijaseit.8.4-2.6826.
The exponential growth of today's technologies has produced high-throughput data of ever-increasing dimensionality and sample size. Managing such data efficiently and effectively is therefore increasingly challenging, and machine learning techniques have been developed to discover knowledge and recognize patterns in them. This paper presents machine learning tools for preprocessing tasks and a comparative study of different classification techniques, applied in an experimental setup using a dataset archived from the UCI Machine Learning Repository. The objective of this paper is to analyse the impact of refined feature selection on different classification algorithms in order to improve the accuracy of room occupancy prediction. Subsets of the original features constructed by filter (information gain) and wrapper techniques are compared in terms of the classification performance achieved with selected machine learning algorithms. Three feature selection algorithms are tested: Information Gain Attribute Evaluation (IGAE), Correlation Attribute Evaluation (CAE) and Wrapper Subset Evaluation (WSE). Following the refined feature selection stage, three machine learning algorithms are compared: the Multi-Layer Perceptron (MLP), Logistic Model Trees (LMT) and Instance Based k (IBk). Based on the feature analysis, WSE was found to be the most effective at identifying relevant features. The application of feature selection is intended to yield higher classification accuracy. The experimental results also demonstrate the effectiveness of IBk, which achieved the highest room occupancy prediction performance among the classifiers compared.
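The evaluators and classifiers named in the abstract correspond to standard Weka components, so the pipeline can be illustrated with the Weka Java API. The sketch below is not the authors' exact code: it assumes Weka 3.8, a hypothetical local ARFF export of the UCI occupancy dataset named occupancy.arff, and an illustrative choice of how many attributes to keep. It ranks features by information gain, reduces the data to the selected attributes, and compares IBk, MLP and LMT with 10-fold cross-validation.

```java
// Minimal sketch (not the authors' exact experiment): information-gain feature
// ranking followed by 10-fold cross-validation of IBk, MLP and LMT.
// Assumes the Weka 3.8 Java API and a local "occupancy.arff" copy of the
// UCI occupancy detection dataset (hypothetical filename).
import java.util.Random;
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.classifiers.lazy.IBk;
import weka.classifiers.trees.LMT;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class OccupancyExperiment {
    public static void main(String[] args) throws Exception {
        // Load the dataset; the last attribute is the occupancy class label.
        Instances data = DataSource.read("occupancy.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Filter-style feature selection: rank attributes by information gain
        // and keep the top-ranked ones (the count of 4 is an illustrative choice).
        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new InfoGainAttributeEval());
        Ranker ranker = new Ranker();
        ranker.setNumToSelect(4);
        selector.setSearch(ranker);
        selector.SelectAttributes(data);
        Instances reduced = selector.reduceDimensionality(data);

        // Compare the three classifiers from the study with 10-fold cross-validation.
        Classifier[] models = { new IBk(1), new MultilayerPerceptron(), new LMT() };
        for (Classifier model : models) {
            Evaluation eval = new Evaluation(reduced);
            eval.crossValidateModel(model, reduced, 10, new Random(1));
            System.out.printf("%s: %.2f%% correctly classified%n",
                    model.getClass().getSimpleName(), eval.pctCorrect());
        }
    }
}
```

A wrapper-based run such as WSE would replace InfoGainAttributeEval and Ranker with WrapperSubsetEval (wrapping the target classifier) and a search method such as BestFirst; CAE corresponds to CorrelationAttributeEval used with the same Ranker search.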

