Volume 12 Number 5
October 2015
Article Contents
Danasingh Asir Antony Gnana Singh, Subramanian Appavu Alias Balamurugan and Epiphany Jebamalar Leavline. An Unsupervised Feature Selection Algorithm with Feature Ranking for Maximizing Performance of the Classifiers. International Journal of Automation and Computing, vol. 12, no. 5, pp. 511-517, 2015. doi: 10.1007/s11633-014-0859-5

An Unsupervised Feature Selection Algorithm with Feature Ranking for Maximizing Performance of the Classifiers

  • Received: 2013-11-04
  • Abstract: Prediction plays a vital role in decision making. Correct prediction leads to right decisions, saving lives, energy, effort, money and time, and preventing physical and material losses; it is practiced in all fields, including medicine, finance, environmental studies, engineering and emerging technologies. Prediction is carried out by a model called a classifier. The predictive accuracy of a classifier depends strongly on the dataset used to train it: irrelevant and redundant features in the training dataset reduce its accuracy, so they must be removed through a process known as feature selection. This paper proposes a feature selection algorithm, namely unsupervised learning with ranking based feature selection (FSULR). It removes redundant features by clustering and eliminates irrelevant features by statistical measures, so as to select the most significant features from the training dataset. The performance of the proposed algorithm is compared with seven other feature selection algorithms using the well-known classifiers naive Bayes (NB), instance based (IB1) and tree based J48. Experimental results show that the proposed algorithm yields better prediction accuracy for these classifiers.
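The paper itself details FSULR; purely as an illustration of the two-stage idea the abstract describes (clustering away redundant features, then keeping statistically relevant representatives), the following sketch uses assumed stand-ins for the authors' measures: absolute Pearson correlation for the redundancy clustering and feature variance for the relevance ranking. It is a minimal sketch, not the published algorithm.

```python
import numpy as np

def select_features(X, threshold=0.8):
    """Two-stage unsupervised feature selection sketch:
    1) greedily cluster mutually correlated (redundant) features,
    2) keep one representative per cluster, chosen by a simple
       statistical relevance score (variance, as an assumption)."""
    n_features = X.shape[1]
    # Absolute Pearson correlation between feature columns
    corr = np.abs(np.corrcoef(X, rowvar=False))
    # Greedy clustering: unassigned features whose correlation with
    # the current seed exceeds the threshold join the seed's cluster.
    clusters, assigned = [], set()
    for j in range(n_features):
        if j in assigned:
            continue
        members = [k for k in range(n_features)
                   if k not in assigned and corr[j, k] >= threshold]
        assigned.update(members)
        clusters.append(members)
    # Rank: keep the highest-variance member of each cluster
    variances = X.var(axis=0)
    return sorted(int(max(c, key=lambda k: variances[k])) for c in clusters)
```

For example, given a dataset whose second column is an exact multiple of the first, the two collinear columns fall into one cluster and only one of them survives, alongside any uncorrelated column.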


