Volume 16 Number 6
December 2019
Article Contents
Jiao Yin, Jinli Cao, Siuly Siuly and Hua Wang. An Integrated MCI Detection Framework Based on Spectral-temporal Analysis. International Journal of Automation and Computing, vol. 16, no. 6, pp. 786-799, 2019. doi: 10.1007/s11633-019-1197-4

An Integrated MCI Detection Framework Based on Spectral-temporal Analysis

Author Biography:
  • Jiao Yin received the B. Sc. degree in automation from Northeastern University, China in 2010, and the M. Sc. degree in vehicle operation engineering from Sun Yat-sen University, China in 2013. She is a Ph. D. candidate in computer science and computer engineering at the Department of Computer Science and Information Technology, La Trobe University, Australia. Her research interests include machine learning algorithms, data-based smart health monitoring and disease detection. E-mail: j.yin@latrobe.edu.au (Corresponding author) ORCID iD: 0000-0002-0269-2624

    Jinli Cao received the Ph. D. degree from the University of Southern Queensland, Australia. She is a senior lecturer at the Department of Computer Science and Information Technology, La Trobe University, Australia. She has published over 100 research papers in international journals such as IEEE Transactions on Distributed and Parallel Processing, IEEE Transactions on Knowledge and Data Engineering and Journal of Future Generation Computer Systems, and in top conferences such as the International World Wide Web Conference (WWW), the International Conference on Web Information Systems Engineering (WISE), the International Conference on Advanced Information Systems Engineering (CAiSE) and the International Conference on Database Systems for Advanced Applications (DASFAA). Nine Ph. D. students have graduated under her supervision, and she is currently supervising two more. Her research interests include the evolving fields of computer science, data engineering and information systems, such as data quality, big data analytics, cloud computing, recommendation systems, query mining, data security, privacy protection, reliable queries in uncertain databases, top-k query ranking and decision support systems. E-mail: j.cao@latrobe.edu.au

    Siuly Siuly received the Ph. D. degree in biomedical engineering from the University of Southern Queensland, Australia in 2012. Currently, she is a research fellow with the Centre for Applied Informatics, College of Engineering and Science, Victoria University, Australia. She has developed several novel methods in her research areas and has been contributing to them since beginning her Ph. D. in July 2008. Most recently, she authored a book, EEG Signal Analysis and Classification: Techniques and Applications, published by Springer in December 2016. Currently, she serves as an associate editor of IEEE Transactions on Neural Systems and Rehabilitation Engineering and as the managing editor of Health Information Science and Systems. Her research interests include biomedical signal processing, analysis and classification, detection and prediction of neurological abnormality from brain signal data (e.g., EEG data), brain-computer interface, machine learning, pattern recognition, artificial intelligence and medical data mining. E-mail: Siuly.Siuly@vu.edu.au

    Hua Wang received the Ph. D. degree from the University of Southern Queensland, Australia. He is now a full-time professor at Victoria University, Australia, and was a professor at the University of Southern Queensland before joining Victoria University. He has more than ten years of teaching and working experience in applied informatics at both enterprise and university levels, with expertise in electronic commerce, business process modelling and enterprise architecture. As a chief investigator, he has been awarded three Australian Research Council (ARC) Discovery grants since 2006 and has published 200 peer-reviewed scholarly papers. Six Ph. D. students have graduated under his principal supervision. His research interests include data security, data mining, access control, privacy and web services, as well as their applications in the fields of e-health and e-environment. E-mail: Hua.Wang@vu.edu.au

  • Received: 2019-04-17
  • Accepted: 2019-07-24
  • Published Online: 2019-10-08
  • Aiming to differentiate between mild cognitive impairment (MCI) patients and elderly control subjects, this study proposes an integrated framework based on spectral-temporal analysis for the automatic analysis of resting-state electroencephalogram (EEG) recordings. The framework first eliminates noise by employing stationary wavelet transformation (SWT). Then, a set of features is extracted through spectral-temporal analysis. Next, a new wrapper algorithm, named the three-dimensional (3-D) evaluation algorithm, is proposed to derive an optimal feature subset. Finally, the support vector machine (SVM) algorithm is adopted to identify MCI patients on the optimal feature subset. Decision tree and K-nearest neighbors (KNN) algorithms are also used to test the effectiveness of the selected feature subset. Twenty-two subjects were involved in the experiments: eleven MCI patients and eleven elderly control subjects. Extensive experiments show that our method classifies MCI patients and elderly control subjects automatically and effectively, with an accuracy of 96.94% achieved by the SVM classifier. Decision tree and KNN algorithms also achieved superior results on the optimal feature subset extracted by the proposed framework. This study is conducive to timely diagnosis and intervention for MCI patients, and therefore to delaying cognitive decline and dementia onset.
  • [1] N. Houmani, F. Vialatte, E. Gallego-Jutglà, G. Dreyfus, V. H. Nguyen-Michel, J. Mariani, K. Kinugawa. Diagnosis of Alzheimer′s disease with electroencephalography in a differential framework. PLoS One, vol. 13, no. 3, Article number e0193607, 2018.
    [2] M. Kashefpoor, H. Rabbani, M. Barekatain. Automatic diagnosis of mild cognitive impairment using electroencephalogram spectral features. Journal of Medical Signals and Sensors, vol. 6, no. 1, pp. 25-32, 2016.
    [3] S. Khatun, B. I. Morshed, G. M. Bidelman. Single channel EEG time-frequency features to detect mild cognitive impairment. In Proceedings of IEEE International Symposium on Medical Measurements and Applications, IEEE, Rochester, USA, pp. 437–442, 2017. DOI: 10.1109/MeMeA.2017.7985916.
    [4] V. C. Bibina, U. Chakraborty, R. M. Lourde, A. Kumar. Time-frequency methods for diagnosing Alzheimer′s disease using EEG: A technical review. In Proceedings of the 6th International Conference on Bioinformatics and Biomedical Science, ACM, Singapore, pp. 49–54, 2017. DOI: 10.1145/3121138.3121183.
    [5] N. K. Al-Qazzaz, S. H. B. Ali, S. A. Ahmad, K. Chellappan, M. S. Islam, J. Escudero. Role of EEG as biomarker in the early detection and classification of dementia. The Scientific World Journal, vol. 2014, Article number 906038, 2014.
    [6] Z. J. Yao, J. Bi, Y. X. Chen. Applying deep learning to individual and community health monitoring data: A survey. International Journal of Automation and Computing, vol. 15, no. 6, pp. 643-655, 2018. DOI: 10.1007/s11633-018-1136-9.
    [7] F. Vecchio, C. Babiloni, R. Lizio, F. D. V. Fallani, K. Blinowska, G. Verrienti, G. Frisoni, P. M. Rossini. Resting state cortical EEG rhythms in Alzheimer′s disease: Toward EEG markers for clinical applications: A review. Supplements to Clinical Neurophysiology, vol. 62, pp. 223-236, 2013. DOI: 10.1016/B978-0-7020-5307-8.00015-6.
    [8] V. Bajaj, S. Taran, A. Sengur. Emotion classification using flexible analytic wavelet transform for electroencephalogram signals. Health Information Science and Systems, vol. 6, no. 1, Article number 12, 2018.
    [9] S. Supriya, S. Siuly, H. Wang, Y. C. Zhang. An efficient framework for the analysis of big brain signals data. In Proceedings of the Australasian Database Conference, Springer, Gold Coast, Australia, pp. 199–207, 2018. DOI: 10.1007/978-3-319-92013-9_16.
    [10] S. Supriya, S. Siuly, H. Wang, Y. C. Zhang. EEG sleep stages analysis and classification based on weighed complex network features. IEEE Transactions on Emerging Topics in Computational Intelligence, published online. DOI: 10.1109/TETCI.2018.2876529.
    [11] P. A. M. Kanda, E. F. Oliveira, F. J. Fraga. EEG epochs with less alpha rhythm improve discrimination of mild Alzheimer′s. Computer Methods and Programs in Biomedicine, vol. 138, pp. 13-22, 2017. DOI: 10.1016/j.cmpb.2016.09.023.
    [12] H. Garn, C. Coronel, M. Waser, G. Caravias, G. Ransmayr. Differential diagnosis between patients with probable Alzheimer′s disease, Parkinson′s disease dementia, or dementia with Lewy bodies and frontotemporal dementia, behavioral variant, using quantitative electroencephalographic features. Journal of Neural Transmission, vol. 124, no. 5, pp. 569-581, 2017. DOI: 10.1007/s00702-017-1699-6.
    [13] S. Siuly, E. Kabir, H. Wang, Y. C. Zhang. Exploring sampling in the detection of multicategory EEG signals. Computational and Mathematical Methods in Medicine, vol. 2015, Article number 576437, 2015.
    [14] M. Buscema, E. Grossi, M. Capriotti, C. Babiloni, P. Rossini. The I.F.A.S.T. model allows the prediction of conversion to Alzheimer disease in patients with mild cognitive impairment with high degree of accuracy. Current Alzheimer Research, vol. 7, no. 2, pp. 173-187, 2010. DOI: 10.2174/156720510790691137.
    [15] E. Barzegaran, B. van Damme, R. Meuli, M. G. Knyazeva. Perception-related EEG is more sensitive to Alzheimer′s disease effects than resting EEG. Neurobiology of Aging, vol. 43, pp. 129-139, 2016. DOI: 10.1016/j.neurobiolaging.2016.03.032.
    [16] P. Ghorbanian, D. M. Devilbiss, A. Verma, A. Bernstein, T. Hess, A. J. Simon, H. Ashrafiuon. Identification of resting and active state EEG features of Alzheimer′s disease using discrete wavelet transform. Annals of Biomedical Engineering, vol. 41, no. 6, pp. 1243-1257, 2013. DOI: 10.1007/s10439-013-0795-5.
    [17] S. S. Poil, W. De Haan, W. M. van der Flier, H. D. Mansvelder, P. Scheltens, K. Linkenkaer-Hansen. Integrative EEG biomarkers predict progression to Alzheimer′s disease at the MCI stage. Frontiers in Aging Neuroscience, vol. 5, Article number 58, 2013.
    [18] F. Liu, X. S. Zhou, J. L. Cao, Z. Wang, H. Wang, Y. C. Zhang. Arrhythmias classification by integrating stacked bidirectional LSTM and two-dimensional CNN. In Proceedings of the 23rd Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, Macau, China, pp. 136–149, 2019. DOI: 10.1007/978-3-030-16145-3_11.
    [19] L. R. Trambaiolli, N. Spolaôr, A. C. Lorena, R. Anghinah, J. R. Sato. Feature selection before EEG classification supports the diagnosis of Alzheimer′s disease. Clinical Neurophysiology, vol. 128, no. 10, pp. 2058-2067, 2017. DOI: 10.1016/j.clinph.2017.06.251.
    [20] R. F. Wang, J. Wang, S. N. Li, H. T. Yu, B. Deng, X. L. Wei. Multiple feature extraction and classification of electroencephalograph signal for Alzheimers′ with spectrum and bispectrum. Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 25, no. 1, Article number 013110, 2015.
    [21] A. I. Triggiani, V. Bevilacqua, A. Brunetti, R. Lizio, G. Tattoli, F. Cassano, A. Soricelli, R. Ferri, F. Nobili, L. Gesualdo, M. R. Barulli, R. Tortelli, V. Cardinali, A. Giannini, P. Spagnolo, S. Armenise, F. Stocchi, G. Buenza, G. Scianatico, G. Logroscino, G. Lacidogna, F. Orzi, C. Buttinelli, F. Giubilei, C. Del Percio, G. B. Frisoni, C. Babiloni. Classification of healthy subjects and Alzheimer′s disease patients with dementia from cortical sources of resting state EEG rhythms: A study using artificial neural networks. Frontiers in Neuroscience, vol. 10, Article number 604, 2017.
    [22] F. Bertè, G. Lamponi, R. S. Calabrò, P. Bramanti. Elman neural network for the early identification of cognitive impairment in Alzheimer′s disease. Functional Neurology, vol. 29, no. 1, pp. 57-65, 2014.
    [23] S. Afrakhteh, M. R. Mosavi, M. Khishe, A. Ayatollahi. Accurate classification of EEG signals using neural networks trained by hybrid population-physic-based algorithm. International Journal of Automation and Computing, published online. DOI: 10.1007/s11633-018-1158-3.
    [24] H. Aghajani, E. Zahedi, M. Jalili, A. Keikhosravi, B. V. Vahdat. Diagnosis of early Alzheimer′s disease based on EEG source localization and a standardized realistic head model. IEEE Journal of Biomedical and Health Informatics, vol. 17, no. 6, pp. 1039-1045, 2013. DOI: 10.1109/JBHI.2013.2253326.
    [25] I. Güler, E. D. Übeyli. Adaptive neuro-fuzzy inference system for classification of EEG signals using wavelet coefficients. Journal of Neuroscience Methods, vol. 148, no. 2, pp. 113-121, 2005. DOI: 10.1016/j.jneumeth.2005.04.013.
    [26] EEG Signals from Normal and MCI (Mild Cognitive Impairment) Cases, [Online], Available: http://www.biosigdata.com/?download=eeg-signals-from-normal-and-mci-cases, September 15, 2018.
    [27] J. Vigil, L. Tataryn. Neurotherapies and Alzheimer′s: A protocol-oriented review. NeuroRegulation, vol. 4, no. 2, pp. 79-94, 2017. DOI: 10.15540/nr.4.2.79.
    [28] B. T. Zhang, X. P. Wang, Y. Shen, T. Lei. Dual-modal physiological feature fusion-based sleep recognition using CFS and RF algorithm. International Journal of Automation and Computing, vol. 16, no. 3, pp. 286-296, 2019. DOI: 10.1007/s11633-019-1171-1.
    [29] S. M. Hosni, M. E. Gadallah, S. F. Bahgat, M. S. AbdelWahab. Classification of EEG signals using different feature extraction techniques for mental-task BCI. In Proceedings of International Conference on Computer Engineering & Systems, IEEE, Cairo, Egypt, pp. 220–226, 2007. DOI: 10.1109/ICCES.2007.4447052.
    [30] F. Liu, X. S. Zhou, Z. Wang, J. L. Cao, H. Wang, Y. C. Zhang. Unobtrusive mattress-based identification of hypertension by integrating classification and association rule mining. Sensors, vol. 19, no. 7, Article number 1489, 2019.
    [31] D. Pandey, X. X. Yin, H. Wang, Y. C. Zhang. Accurate vessel segmentation using maximum entropy incorporating line detection and phase-preserving denoising. Computer Vision and Image Understanding, vol. 155, pp. 162-172, 2017. DOI: 10.1016/j.cviu.2016.12.005.
    [32] N. K. Al-Qazzaz, S. H. B. M. Ali, S. A. Ahmad, M. S. Islam, J. Escudero. Selection of mother wavelet functions for multi-channel EEG signal analysis during a working memory task. Sensors, vol. 15, no. 11, pp. 29015-29035, 2015. DOI: 10.3390/s151129015.
    [33] S. Siuly, V. Bajaj, A. Sengur, Y. C. Zhang. An advanced analysis system for identifying alcoholic brain state through EEG signals. International Journal of Automation and Computing, published online. DOI: 10.1007/s11633-019-1178-7.
    [34] C. Lehmann, T. Koenig, V. Jelic, L. Prichep, R. E. John, L. O. Wahlund, Y. Dodge, T. Dierks. Application and comparison of classification algorithms for recognition of Alzheimer′s disease in electrical brain activity (EEG). Journal of Neuroscience Methods, vol. 161, no. 2, pp. 342-350, 2007. DOI: 10.1016/j.jneumeth.2006.10.023.
    [35] J. C. McBride, X. P. Zhao, N. B. Munro, C. D. Smith, G. A. Jicha, L. Hively, L. S. Broster, F. A. Schmitt, R. J. Kryscio, Y. Jiang. Spectral and complexity analysis of scalp EEG characteristics for mild cognitive impairment and early Alzheimer′s disease. Computer Methods and Programs in Biomedicine, vol. 114, no. 2, pp. 153-163, 2014. DOI: 10.1016/j.cmpb.2014.01.019.
    [36] P. M. Rossini, M. Buscema, M. Capriotti, E. Grossi, G. Rodriguez, C. Del Percio, C. Babiloni. Is it possible to automatically distinguish resting EEG data of normal elderly vs. mild cognitive impairment subjects with high degree of accuracy? Clinical Neurophysiology, vol. 119, no. 7, pp. 1534-1545, 2008. DOI: 10.1016/j.clinph.2008.03.026.
    [37] G. Fiscon, E. Weitschek, A. Cialini, G. Felici, P. Bertolazzi, S. De Salvo, A. Bramanti, P. Bramanti, M. C. De Cola. Combining EEG signal processing with supervised methods for Alzheimer′s patients classification. BMC Medical Informatics and Decision Making, vol. 18, Article number 35, 2018.

Figures (10)  / Tables (5)


    • Background. Alzheimer′s disease (AD) is the most common form of neurodegenerative dementia, accounting for up to 75% of all dementia cases[1]. Despite its prevalence, no cure for AD exists thus far. Worse still, the diagnosis of Alzheimer′s disease is often missed or delayed in clinical practice. Early detection of dementia would provide opportunities for early intervention and symptomatic treatment. Recent studies have demonstrated that AD has a pre-symptomatic phase that can last for years, known as mild cognitive impairment (MCI)[2-5]. Detecting MCI is therefore essential for potential patients. However, the symptoms of MCI are easily dismissed as normal consequences of ageing, which makes the medical diagnosis of MCI difficult. The objective of this study is to identify MCI patients and elderly control subjects automatically and efficiently using resting-state electroencephalography (EEG) signals.

      Resting-state EEG signals. EEG-based methods have emerged as non-invasive techniques for the detection of MCI. Via multiple electrodes placed on different areas of the scalp, the electrical activities of the brain are recorded as EEG signals, which take the form of time series of voltage fluctuations[6]. Based on the recording conditions, EEG signals can be divided into two groups: event-related potentials (ERPs) and resting-state EEG recordings. The former are recorded in relation to the occurrence of specific events, while the latter are spontaneous EEG signals recorded without any kind of stimulus. Resting-state EEG recordings are quick and easy to carry out in a clinical environment. Furthermore, the procedure is more comfortable and less stressful for patients, especially for elderly individuals[7]. In this study, resting-state EEG signals are adopted.

      Literature review. The EEG signals of each subject contain dozens of channels, and each channel consists of a huge number of data points[8]. Traditional specialist-led approaches struggle to reach a correct decision efficiently[9, 10]. Therefore, automatic detection methods based on machine learning algorithms are attracting more and more attention. A typical MCI detection method consists of four steps, namely data pre-processing, feature extraction, feature selection and classification. The most widely used EEG pre-processing methods include visual inspection, resampling, re-referencing, filtering, smoothing, channel selection and data segmentation. Depending on the purpose and the data acquisition conditions, some of these techniques can be selected to refine the EEG signals. Usually, band-stop filters are a good choice for removing power grid interference (50 Hz or 60 Hz, depending on the region), while band-pass filters can be used to retain only EEG-related spectral components[11, 12]. Feature extraction is generally performed after data pre-processing. There are many widely used techniques to extract features, such as statistical indices[13], spectral analysis[14, 15] and spectral-temporal analysis[16-18]. The third step, feature selection based on relevance and redundancy analysis, is optional, depending on the total number of epochs and features[19, 20]. In the final step, a classifier is trained and evaluated based on machine learning algorithms to differentiate between MCI patients and elderly control subjects. The most commonly used machine learning algorithms in MCI detection include artificial neural networks (ANNs)[21-23], k-nearest neighbour[2], decision trees[16], support vector machine (SVM)[24] and neuro-fuzzy inference systems[2, 25]. Although there is a large body of research on MCI detection, its performance is still not satisfactory. Furthermore, to the best of our knowledge, there are no standard procedures commonly accepted in the area as yet. Most of the existing methods are still in the exploratory stage, so for specific applications and specific data, experienced data processing scientists and engineers need to investigate further to achieve desirable performance.
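      As a concrete illustration of the band-stop/band-pass pre-processing mentioned above, the sketch below notches out mains interference and keeps the 0.5 Hz–32 Hz range with standard SciPy filters. This is a generic example of that pre-processing style, not the SWT-based pipeline proposed in this paper; the function name and parameter values are illustrative.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

FS = 256  # sampling rate in Hz (matches the dataset used in this study)

def preprocess(eeg, fs=FS, band=(0.5, 32.0), mains=50.0):
    """Notch out power grid interference, then keep only EEG-related
    spectral components with a zero-phase band-pass filter."""
    b_n, a_n = iirnotch(w0=mains, Q=30.0, fs=fs)         # band-stop at 50 Hz
    x = filtfilt(b_n, a_n, eeg)
    b_b, a_b = butter(4, band, btype="bandpass", fs=fs)  # keep 0.5-32 Hz
    return filtfilt(b_b, a_b, x)

# One channel of synthetic "EEG": 10 Hz rhythm + mains noise + baseline drift.
t = np.arange(0, 4, 1 / FS)
target = np.sin(2 * np.pi * 10 * t)   # alpha-band component to recover
noisy = target + 0.5 * np.sin(2 * np.pi * 50 * t) + 2 * t
clean = preprocess(noisy)
```

      Using `filtfilt` (forward-backward filtering) keeps the filter zero-phase, which matters when temporal features are extracted later.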

      Proposed method. In this study, we propose an integrated spectral-temporal analysis based framework for MCI detection using resting-state EEG signals, aiming to improve detection accuracy. Compared to existing algorithms, our method has several noteworthy aspects:

      1) Removing noise based on the spectral characteristics of the raw EEG signals. Guided by domain knowledge, we eliminate baseline drift and other low-frequency noise by removing the 0–0.5 Hz components of the EEG signals, and eliminate high-frequency noise, including grid interference, by removing the 32 Hz–128 Hz components.

      2) Establishing a three-dimensional discrete feature space based on stationary wavelet transform and descriptive statistical analysis. Stationary wavelet transformation (SWT) decomposes EEG signals into coefficients in the frequency domain, and descriptive statistical analysis extracts spectral-temporal features from those coefficients.

      3) Proposing a new wrapper algorithm, named the three-dimensional (3-D) evaluation algorithm, to select an optimal feature subset instead of generating new features from existing ones in the feature selection step. It performs both individual and incremental evaluation along each dimension of the three-dimensional feature space.
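      As a rough sketch of the wrapper idea behind such an evaluation (our reading, not the authors' exact procedure), the values along one dimension — say, channels — can first be scored individually with cross-validated accuracy and then added greedily while accuracy keeps improving. The helper name `evaluate_dimension` and the toy feature grouping are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def evaluate_dimension(X, y, groups, clf=None, cv=5):
    """Wrapper-style selection along one dimension of the feature space
    (e.g. channels). `groups` maps each dimension value to the column
    indices of its features. Values are scored individually first, then
    added greedily in ranked order while CV accuracy keeps improving."""
    clf = clf if clf is not None else SVC(kernel="rbf", gamma="scale")
    # Individual evaluation: CV accuracy of each value on its own.
    scores = {v: cross_val_score(clf, X[:, idx], y, cv=cv).mean()
              for v, idx in groups.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Incremental evaluation: grow the subset while accuracy improves.
    best, cols, selected = 0.0, [], []
    for v in ranked:
        trial = cols + list(groups[v])
        acc = cross_val_score(clf, X[:, trial], y, cv=cv).mean()
        if acc > best:
            best, cols = acc, trial
            selected.append(v)
    return selected, best

# Toy usage: only group "a" carries the class signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = (X[:, 0] > 0).astype(int)
groups = {"a": [0, 1], "b": [2, 3], "c": [4, 5]}
selected, best = evaluate_dimension(X, y, groups)
```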

    • The EEG dataset is an open source dataset, which was collected from subjects who had been admitted to cardiac catheterization units of Sina and Nour Hospitals, Isfahan, Iran[26]. The data collection was ethically approved by the deputy of research and technology, Isfahan University of Medical Sciences, Isfahan, Iran[2]. It is a collection of resting-state scalp EEG signals from 27 subjects (16 cognitively healthy subjects and 11 with an MCI) aged from 60 to 77 with elementary or higher education and a history of coronary angiography over the past year. To avoid generating an imbalanced dataset in this study, we picked 11 MCI and 11 cognitively healthy subjects to form a balanced dataset. Subjects with a history of substance misuse, major psychiatric disorders, serious medical disease, head trauma, and dementia were excluded.

      All EEG signals were recorded in the morning for over 30 minutes while the subjects were resting comfortably in a quiet room with their eyes closed but without being drowsy during the procedure. EEG activities were recorded continuously through 19 electrodes positioned on the scalp according to the International 10–20 System, using a 32-channel digital EEG device (Galileo NT, EBneuro, Italy) with 256 Hz sampling rate[2]. The collected EEG signals consist of 19 channels, namely, $ F_{p1} $, $ F_{p2} $, $ F_7 $, $ F_3 $, $ F_z $, $ F_4 $, $ F_8 $, $ T_3 $, $ C_3 $, $ C_z $, $ C_4 $, $ T_4 $, $ T_5 $, $ P_3 $, $ P_z $, $ P_4 $, $ T_6 $, $ O_1 $, $ O_2 $.

      In light of Petersen′s criteria, all subjects underwent a neuropsychiatric interview to diagnose MCI. A mini-mental state examination (MMSE) was utilized to validate the MCI diagnosis, where scores from 21 to 26 indicated MCI and scores above 26 indicated a cognitively healthy subject. The neuropsychiatry unit cognitive assessment tool (NUCOG) was also used to confirm the diagnosis of MCI[2].
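      For illustration, the MMSE cut-offs translate into a trivial labelling rule. The function name and the handling of scores below 21 are our assumptions: the text does not state a rule for such scores, but subjects with dementia were excluded from the dataset.

```python
def mmse_label(score: int) -> str:
    """Map an MMSE score to a diagnostic label using the cut-offs
    reported for this dataset: 21-26 indicates MCI, above 26 indicates
    a cognitively healthy (control) subject."""
    if score > 26:
        return "control"
    if score >= 21:
        return "MCI"
    # Scores below 21 suggest dementia; such subjects were excluded
    # from the dataset (our reading, not an explicit rule in the text).
    return "excluded"
```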

    • The objective of this work is to distinguish MCI subjects from elderly control subjects using resting-state EEG signals. As shown in Fig. 1, the proposed framework consists of four steps. The raw EEG signals are cleaned using SWT-based methods in Step 1. A hybridized method is proposed to extract spectral-temporal features based on stationary wavelet decomposition and descriptive statistical analysis in Step 2. Next, an optimal feature subset is selected through the proposed 3-D evaluation algorithm in Step 3. Finally, an SVM model is chosen as the classifier in Step 4. The subsequent parts of this section describe the implementation of each step in detail.

      Figure 1.  Work flow of the proposed framework

    • EEG signals contain various kinds of noise, such as baseline drift and power line interference. These artefacts mix with the EEG recordings and have different time-frequency properties. This study employs the wavelet transform to provide information on both the time domain and the frequency domain, which makes it possible to preserve the characteristics of EEG signals while minimizing noise.

      Previous studies have shown that the most important frequency bands of EEG signals lie between 0.5 Hz and 32 Hz[27]. Therefore, we decompose the raw EEG signals into coefficients with different frequency ranges using SWT at an appropriate decomposition level. Then, the high-frequency ( > 32 Hz) coefficients and the low-frequency ( < 0.5 Hz) coefficients are removed as noise[28]. Finally, the cleaned coefficients are reconstructed into time series as the denoised EEG signals via the inverse stationary wavelet transformation (ISWT).
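      Assuming the 256 Hz sampling rate of the dataset described above, this step can be sketched with PyWavelets: an 8-level SWT places the 0–0.5 Hz band in the coarsest approximation and the 32–128 Hz content in the two finest detail bands, which are zeroed before ISWT reconstruction. The mother wavelet `db4` and the helper name are our assumptions; the paper does not fix them here.

```python
import numpy as np
import pywt  # PyWavelets (third-party), assumed available

FS = 256     # sampling rate in Hz
LEVEL = 8    # 256 / 2**8 = 0.5 Hz: level-8 approximation holds 0-0.5 Hz

def swt_denoise(x, wavelet="db4", level=LEVEL):
    """Zero the 0-0.5 Hz approximation (baseline drift) and the two
    finest detail bands (32-64 Hz and 64-128 Hz, including mains
    interference), then reconstruct the 0.5-32 Hz signal via ISWT."""
    # swt returns [(cA8, cD8), ..., (cA2, cD2), (cA1, cD1)]
    coeffs = pywt.swt(x, wavelet, level=level)
    cleaned = []
    for i, (cA, cD) in enumerate(coeffs):
        if i == 0:              # coarsest pair: drop cA (0-0.5 Hz)
            cA = np.zeros_like(cA)
        if i >= level - 2:      # levels 1 and 2: drop cD (32-128 Hz)
            cD = np.zeros_like(cD)
        cleaned.append((cA, cD))
    return pywt.iswt(cleaned, wavelet)

# 4 s of synthetic "EEG": 10 Hz rhythm + slow drift + 50 Hz mains noise.
# (Length 1024 is a multiple of 2**LEVEL, as SWT requires.)
t = np.arange(1024) / FS
x = (np.sin(2 * np.pi * 10 * t)
     + 1.5 * np.sin(2 * np.pi * 0.25 * t)
     + 0.3 * np.sin(2 * np.pi * 50 * t))
y = swt_denoise(x)
```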

    • The goal of feature extraction is to obtain features from denoised EEG signals. First of all, in order to form a dataset with a large population, all channels of the denoised EEG signals are divided into small segments synchronously.

      Then, each channel in each segment is decomposed using 1-D SWT decomposition into four coefficients corresponding to four frequency bands: $ f_1 $ (0.5 Hz–4 Hz), $ f_2 $ (4 Hz–8 Hz), $ f_3 $ (8 Hz–16 Hz) and $ f_4 $ (16 Hz–32 Hz). Those coefficients contain information in both time domain and frequency domain, and thus are suitable to obtain spectral-temporal features.

      After that, descriptive statistical analysis is used to extract features from the coefficients decomposed in the previous step. Nine widely used descriptive statistical features, namely, median (med), standard deviation (std), mean (me), mode (mo), interquartile range (iqr), skewness (ske), kurtosis (kur), first quartile ($ Q_1 $) and third quartile ($ Q_3 $), are extracted from each coefficient. To reduce the impact of individual outliers, the maximum and minimum were not adopted in this work.
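To make the nine features concrete, the following sketch computes them for one coefficient series using Python's standard library. The paper uses Matlab built-ins, so the exact estimators here (population standard deviation, the default quartile method, moment-based skewness and kurtosis) are assumptions for illustration only.

```python
import statistics as st

def descriptive_features(x):
    """Nine descriptive statistical features of one coefficient series:
    med, std, me, mo, iqr, ske, kur, Q1, Q3 (estimator choices assumed)."""
    n = len(x)
    me = st.mean(x)
    sd = st.pstdev(x)                       # population standard deviation
    q1, med, q3 = st.quantiles(x, n=4)      # quartiles (default 'exclusive' method)
    m3 = sum((v - me) ** 3 for v in x) / n  # third central moment
    m4 = sum((v - me) ** 4 for v in x) / n  # fourth central moment
    return {"med": med, "std": sd, "me": me, "mo": st.mode(x),
            "iqr": q3 - q1, "ske": m3 / sd ** 3, "kur": m4 / sd ** 4,
            "Q1": q1, "Q3": q3}

feats = descriptive_features([1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 6.0])
```

In the actual pipeline, this function would be applied to each of the four decomposed coefficients of each channel in each segment.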

      In this way, a discrete feature space with three dimensions, namely, channel ($ Ch $), frequency band ($ FB $) and descriptive statistical feature ($ DSF $), is formed, as shown in Fig. 2.

      Figure 2.  Three-dimensional discrete feature space

      The value ranges of the three dimensions are shown in the following lists:

      $ \begin{split} Ch = & \; \{F_{p1}, F_{p2}, F_7, F_3, F_z, F_4, F_8, T_3, C_3,\\ & \;\; C_z, C_4, T_4, T_5, P_3, P_z, P_4, T_6, O_1, O_2 \}\quad\quad\quad\end{split} $

      (1)

      $ FB = \{f_1, f_2, f_3 ,f_4\} \quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\;$

      (2)

      $ DSF = \{med,std,me,mo,iqr,ske,kur,Q_1,Q_3\}. $

      (3)

      The discrete feature space is denoted as $ {F} $ $ \in{\bf{R}}^{U \times V \times W} $, where $ U $ denotes the number of channels, $ V $ the number of frequency bands and $ W $ the number of descriptive statistical features, and then

      $ n = U \times V \times W $

      (4)

      denotes the total number of features extracted from each segment. As shown in Fig. 2, $ {F(u, \, v, \, w)} $ ($ u\in [1, \, U] $, $ v $ $ \in [1, \, V] $, $ w\in [1,W] $) is a point, i.e., a specific feature, in the discrete feature space F, representing the $ w $-th statistical feature extracted from the $ v $-th frequency band of the $ u $-th channel. Fig. 2 also shows that the feature subsets denoted as $ {F(u, :, :)} $, $ {F(:, v, :)} $ and $ {F(:, :, w)} $ are planes in the discrete feature space, where the colon ":" stands for all elements in the corresponding dimension.
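The 3-D indexing can be illustrated with a short sketch. The channel-major flattening order below is an assumption consistent with the layout of (5), not something the paper specifies explicitly.

```python
# Dimensions of the discrete feature space F: channels x bands x statistics.
U, V, W = 19, 4, 9
n = U * V * W  # total number of features per segment, Eq. (4)

def flat_index(u, v, w):
    """0-based position of F(u, v, w) (1-based indices) under the
    assumed channel-major ordering of Eq. (5)."""
    return (u - 1) * V * W + (v - 1) * W + (w - 1)

plane_size = V * W  # a slice F(u, :, :) fixes one channel and spans V*W features
```

With these dimensions, each plane $ {F(u, :, :)} $ holds 36 features, and the full space holds 684.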

      After extracting features from all segments, a dataset $ {[X|y]} $ is generated. The input matrix $ {X} $ $ \in{\bf{R}}^{m \times U \times V \times W } $ and the output vector $ {y} $ $ \in{\bf{R}}^{m} $, where $ m $ denotes the total number of segments. The $ i $-th ($ i = 1,\cdots,m $) sample in this dataset is $ [{X(i, :, :, :)}| y_i ]$, where $ y_i \in \{0,1\}$; "1" denotes an MCI sample and "0" an elderly control sample. $ {X(i, :, :, :)} $ denotes all features extracted from the $ i $-th sample (segment). So,

      $\begin{split} X(i, \, :, \, :, \, :) =\, & [X(i,1,1,1), X(i,1,1,2),\cdots,\\ & X(i,u,v,w),\cdots, X(i,U,V,W)]. \end{split} $

      (5)
    • Too many features might lead to bias and over-fitting in MCI classification; intensive computation and time overheads are other possible problems. Moreover, some of the extracted features might be correlated and therefore provide no new information, and thus need to be removed. Unlike algorithms such as principal component analysis (PCA) and linear discriminant analysis (LDA), which derive new features from existing ones, we propose a wrapper method, named the 3-D evaluation algorithm, to choose an optimal feature subset from the existing feature space F so as to maintain the interpretability of the features.

      The basic idea of a wrapper algorithm is that the classifier is treated as a black box and its performance is used to select the optimal feature subset. Based on the 3-D discrete feature space $ {F} $ established in Section 2.2.3, the proposed 3-D evaluation algorithm evaluates the elements of the three dimensions specified in (1)–(3) individually and incrementally.

      Algorithm 1. Individual and incremental evaluation on channel dimension

      1) Input: $ {X} \in{\bf{R}}^{m \times U \times V \times W} $, $ {y} $, $ {F} \in{\bf{R}}^{U \times V \times W} $, $ m $, $ U $, $ V $, $ W $

      2) Output: OptFeaSub, MaxAcc, MaxSens, MaxSpec

      /* initialization */

      3) $ OptFeaSub $ = $ \emptyset $, $ MaxAcc $ = 0, $ MaxSens $ = 0, $ MaxSpec $ = 0;

      4) for each $ u \in [1,U] $ do

      /* individual channel assessment */

      5) $ {X_{eva}} $=$ {X[:, \, u, \, :, \, :]} $, $ {y_{eva}} $=$ {y} $;

      6) $ {X_{eva}} $= reshape($ {X_{eva}} $, $ [m,1 \times V \times W]) $;

      7) Randomly separate dataset [$ {X_{eva}} $ $ | $ $ {y_{eva}}] $ into three parts: $[ {X_{train}} $ $ | $ $ {y_{train}}] $, [$ {X_{val}} $ $ | $ $ {y_{val}} $], [$ {X_{test}} $ $ | $ $ {y_{test}} $] in a ratio of 0.6 : 0.2 : 0.2;

      8) Train an SVM classifier using [$ {X_{train}} $ $ | $ $ {y_{train}} $];

      9) Evaluate the trained SVM classifier using [$ {X_{val}} $ $ | $ $ {y_{val}} $], and calculate $ Acc, Sens, Spec $;

      10) if $ Acc>MaxAcc $ then

      11) $ OptFeaSub $ = $ {F(u, \, :, \, :)} $;

      12) MaxAcc = Acc, MaxSens = Sens, MaxSpec = Spec;

      13) end if

      /* Incremental channel assessment */

      14) $ {X_{eva}} $=$ {X[:,1:u, \, :, \, :]} $, $ {y_{eva}} $ =$ {y} $;

      15) $ {X_{eva}} $= reshape($ {X_{eva}} $, $ [m,u \times V \times W]) $;

      16) Randomly separate dataset [$ {X_{eva}} $ $ | $ $ {y_{eva}} $] into three parts: [$ {X_{train}} $ $ | $ $ {y_{train}} $], [$ {X_{val}} $ $ | $ $ {y_{val}} $], [$ {X_{test}} $ $ | $ $ {y_{test}} $] in a ratio of 0.6 : 0.2 : 0.2;

      17) Train an SVM classifier using [$ {X_{train}} $ $ | $ $ {y_{train}} $];

      18) Evaluate the trained SVM classifier using [$ {X_{val}} $ $ | $ $ {y_{val}} $], and calculate $ Acc, Sens, Spec $;

      19) if $ Acc>MaxAcc $ then

      20) $ OptFeaSub $=$ {F(1:u,:,:)} $;

      21) MaxAcc = Acc, MaxSens = Sens, MaxSpec = Spec;

      22) end if

      23) end for

      24) return OptFeaSub, MaxAcc, MaxSens, MaxSpec.

      More specifically, the pseudocode of the individual and incremental assessment on the channel dimension is given in Algorithm 1. The inputs are the dataset $ {[X|y]} $ formed after feature extraction, the feature space F and the scalars $ m $, $ U $, $ V $, $ W $. After initialization, we conduct individual channel assessment by evaluating the performance of the SVM classifier on the feature subset $ {F(u, \, :, \, :)} $ ($ u\in[1,U] $) (Steps 5–9), then update $ OptFeaSub $, $ MaxAcc $, $ MaxSens $, $ MaxSpec $ (Steps 10–13), where $ OptFeaSub $ denotes the selected optimal feature subset, and $ MaxAcc $, $ MaxSens $, $ MaxSpec $ denote the corresponding accuracy, sensitivity and specificity achieved on this optimal feature subset. The definitions of accuracy, sensitivity and specificity are given in Section 2.3. Similarly, we then conduct incremental channel assessment on the feature subset $ {F(1:u, \, :, \, :)} $, where $ {1:u} $ means all elements from the first to the $ u $-th element on the channel dimension. As $ u $ changes from 1 to $ U $, the number of features in the subset $ {F(1:u, \, :, \, :)} $ increases. Finally, $ OptFeaSub $, $ MaxAcc $, $ MaxSens $ and $ MaxSpec $ are returned as outputs.
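The channel-dimension loop of Algorithm 1 can be sketched as follows. The `evaluate` callable stands in for the train/validate SVM steps (Steps 5–9 and 14–18) and the toy scorer below is purely illustrative, not the paper's classifier.

```python
def select_channels(U, evaluate):
    """Individual-plus-incremental channel evaluation (Algorithm 1 sketch).
    `evaluate(channels)` is a black box returning validation accuracy."""
    opt_sub, max_acc = None, 0.0
    for u in range(1, U + 1):
        # individual channel assessment: features of channel u only
        acc = evaluate({u})
        if acc > max_acc:
            opt_sub, max_acc = {u}, acc
        # incremental channel assessment: channels 1..u together
        chans = set(range(1, u + 1))
        acc = evaluate(chans)
        if acc > max_acc:
            opt_sub, max_acc = chans, acc
    return opt_sub, max_acc

# Toy black box whose accuracy grows with the number of channels used,
# mimicking the trend reported for incremental evaluation in Table 2.
best, acc = select_channels(19, lambda chans: 0.6 + 0.018 * len(chans))
```

The frequency-band and statistical-feature loops of Algorithm 2 follow the same pattern over the other two dimensions.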

      The pseudocode of the whole 3-D evaluation algorithm is given in Algorithm 2. After the evaluation on the channel dimension, we conduct evaluation on the frequency band ($ FB $) and descriptive statistical feature ($ DSF $) dimensions in turn, as described in Steps 10–23 of Algorithm 2. Individual assessment on the feature subsets $ {F(:, \, v, \, :)} $ ($ v\in[1, \, V] $) and $ {F(:, \, :, \, w)} $ ($ w\in[1, \, W] $), as well as incremental assessment on the feature subsets $ {F(:, \, 1 \, : \, v, \, :)} $ and $ {F(:, \, :, \, 1 \, : \, w)} $, is carried out.

      It should be noted that, when evaluating the latter two dimensions, the initial values of $ OptFeaSub $, $ MaxAcc $, $ MaxSens $, and $ MaxSpec $ should be the returned values from the previous dimension evaluation, instead of $ \emptyset $ or 0, as described in Steps 10 and 17 of Algorithm 2.

      Algorithm 2. 3-D evaluation algorithm

      1) Input: $ {X}\in{\bf{R}}^{m \times U \times V \times W} $, $ {y} $, $ {F}\in{\bf{R}}^{U \times V \times W} $, $ m $, $ U $, $ V $, $ W $

      2) Output: $ OptFeaSub $, $ MaxAcc $, $ MaxSens $, $ MaxSpec $

      /* initialization */

      3) $ OptFeaSub $ =$ \emptyset $, $ MaxAcc $=0, $ MaxSens $=0, $ MaxSpec $=0, $ seed $=1;

      /* feature subset selection on channel dimension */

      4) for each $ u \in [1,U] $ do

      5) Individual channel assessment on feature subset $ {F(u, \, :, \, :)} $;

      6) Seek out $ OptFeaSub $, $ MaxAcc $, $ MaxSens $, $ MaxSpec $ as described in Steps 10-13 of Algorithm 1;

      7) Incremental channel assessment on feature subset $ {F(1:u, \, :, \, :)} $;

      8) Seek out $ OptFeaSub $, $ MaxAcc $, $ MaxSens $, $ MaxSpec $ as described in Steps 19–22 of Algorithm 1;

      9) end for

      /* feature subset selection on FB dimension */

      10) Initiate $ OptFeaSub $, $ MaxAcc $, $ MaxSens $, $ MaxSpec $ as the result of Step 8;

      11) for each $ v \in [1,V] $ do

      12) Individual FB assessment on feature subset $ {F(:,v,:)} $;

      13) Seek out $ OptFeaSub $, $ MaxAcc $, $ MaxSens $, $ MaxSpec $ similarly as Step 6;

      14) Incremental FB assessment on feature subset $ {F(:,1:v, \, :)} $;

      15) Seek out $ OptFeaSub $, $ MaxAcc $, $ MaxSens $, $ MaxSpec $ similarly as Step 8;

      16) end for

      /* feature subset selection on DSF dimension */

      17) Initiate $ OptFeaSub $, $ MaxAcc $, $ MaxSens $, $ MaxSpec $ as the result of Step 15;

      18) for each $ w \in [1,W] $ do

      19) Individual DSF assessment on feature subset $ {F(:, \, :, \, w)} $;

      20) Seek out $ OptFeaSub $, $ MaxAcc $, $ MaxSens $, $ MaxSpec $ similarly as Step 6;

      21) Incremental DSF assessment on feature subset $ {F(:, \, :, \, 1:w)} $;

      22) Seek out $ OptFeaSub $, $ MaxAcc $, $ MaxSens $, $ MaxSpec $ similarly as Step 8;

      23) end for

      24) return $ OptFeaSub $, $ MaxAcc $, $ MaxSens $, $ MaxSpec $.

    • SVM has been widely used in pattern recognition and regression due to its computational efficiency and good generalization performance[29, 30]. The core of the SVM algorithm for binary classification is mapping the input data into a linearly separable space using a kernel function, while a minimization algorithm minimizes the objective function and, at the same time, maximizes the margin between the two classes. SVM is stable and effective on small- and medium-scale datasets, because only the support vectors are used to construct the separating hyperplane. Considering the scale of our dataset, SVM is chosen as the classification algorithm.

      In order to further demonstrate the effectiveness of the selected optimal feature subset, two well-known machine learning classifiers (decision tree and KNN) are also adopted to verify the classification performance of the proposed framework.

    • To evaluate the proposed algorithm and compare it with other state-of-the-art algorithms, three widely used metrics in this domain based on the confusion matrix are adopted. A confusion matrix $ {C} $ is a square matrix whose size $ k $ is equal to the total number of classes to be classified. The element $ C(i,j) $ is the count of samples known to be in class $ i $ (true condition) and predicted to be in class $ j $ (predicted condition), where $ i =1, 2,\cdots $, $ k $ and $ j = 1, 2, \cdots $, $ k $.

      The confusion matrix for binary classification is shown in Table 1. The element $ C_{11} $ is also known as the $ true $ $ negatives $ (TN), the count of observations predicted to be negative that are also negative in the true condition. Similarly, we obtain the meanings of false positives (FP), false negatives (FN) and true positives (TP). Obviously,

      Total population Predicted negatives Predicted positives
      Actual negatives TN FP
      Actual positives FN TP

      Table 1.  Confusion matrix for binary classification

      $ Total\ Population = TN+FP+FN+TP. $

      (6)

      The three metrics used to evaluate the performance of a classifier are defined in (7)–(9), respectively[31]. Sensitivity ($ Sens $) measures the capacity to correctly identify true positives, specificity ($ Spec $) reflects the capacity to correctly identify true negatives, and accuracy ($ Acc $) is the proportion of correctly classified instances.

      $ Sens = \frac{TP}{TP+FN}\times 100{\text{%}}\quad\quad\quad\quad\quad$

      (7)

      $ Spec = \frac{TN}{TN+FP}\times 100{\text{%}}\quad\quad\quad\quad\quad$

      (8)

      $ Acc = \frac{TP+TN}{TP+FN+TN+FP}\times 100{\text{%}}.$

      (9)

      Since our dataset is balanced (the numbers of positive and negative samples are equal), accuracy equals the average of sensitivity and specificity, so we only take accuracy into account when comparing the performance of two algorithms. Sensitivity and specificity are used as a reference to determine whether an algorithm is biased towards a single class.
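The metrics in (7)–(9) can be sketched directly; the confusion-matrix counts below are made up for demonstration and are not the paper's results.

```python
def metrics(tn, fp, fn, tp):
    """Sensitivity, specificity and accuracy (in %) from Eqs. (7)-(9)."""
    sens = tp / (tp + fn) * 100
    spec = tn / (tn + fp) * 100
    acc = (tp + tn) / (tp + fn + tn + fp) * 100
    return sens, spec, acc

# Illustrative counts only. With balanced classes (50 negatives, 50 positives)
# accuracy equals the average of sensitivity and specificity.
sens, spec, acc = metrics(tn=45, fp=5, fn=3, tp=47)
```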

    • All experiments are implemented in the Matlab 2018a environment. Some built-in functions are called from the "Wavelet Toolbox" and the "Statistics and Machine Learning Toolbox". The following parts of this section describe the parameter settings and the experimental results.

    • SWT decomposition and reconstruction are implemented by calling the built-in functions "swt" and "iswt" from the "Wavelet Toolbox" of Matlab. The mother wavelet used in this work is $ sym9 $, chosen from the $ Symlets $ family, because $ sym9 $ is reported to be suitable for denoising, decomposition, reconstruction and sub-band feature extraction[32]. Since the sampling rate of the collected EEG signals is 256 Hz, the decomposition level, following the Nyquist criterion, is set to 8 to obtain coefficients with the appropriate frequency bands. Fig. 3 demonstrates the level-8 1-D SWT decomposition process. SWT applies low-pass and high-pass filters to the input signal and produces two time series at each level, namely, the approximation coefficient $ A_i $ and the detail coefficient $ D_i $ at level $ i $. Both coefficients have the same length as the input signal being decomposed. In Fig. 3, $ SR $ denotes the sampling rate of the EEG signals; Hi_D and Lo_D denote the high-pass and low-pass decomposition filters, respectively. Levels 6 and 7 are omitted.

      Figure 3.  A level-8 SWT decomposition diagram of EEG signal
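The frequency range covered by each coefficient follows from the dyadic filter bank: detail $ D_i $ spans $ (SR/2^{i+1}, \, SR/2^i) $ and the final approximation $ A_L $ spans $ (0, \, SR/2^{L+1}) $. A small sketch for $ SR $ = 256 Hz and level 8:

```python
SR = 256  # sampling rate of the EEG recordings (Hz)

def detail_band(i, sr=SR):
    """Frequency range (Hz) covered by detail coefficient D_i
    in a dyadic wavelet filter bank."""
    return sr / 2 ** (i + 1), sr / 2 ** i

# Level-8 decomposition: D1..D8 plus the final approximation A8.
bands = {f"D{i}": detail_band(i) for i in range(1, 9)}
bands["A8"] = (0.0, SR / 2 ** 9)
```

This reproduces the bands used later in the denoising stage: $ A_8 $ (0–0.5 Hz), $ D_2 $ (32 Hz–64 Hz) and $ D_1 $ (64 Hz–128 Hz).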

    • As described in Section 2.2.2, in the denoising stage, after SWT decomposition, we keep the components within 0.5 Hz–32 Hz and remove the components at other frequencies.

      First, we call the "swt" function to decompose each channel of the raw EEG signals to obtain coefficients: $ A_8 $ (0–0.5 Hz), $ D_2 $ (32 Hz–64 Hz) and $ D_1 $ (64 Hz–128 Hz). Then, we reconstruct them to get the time series signals which are the noise signals to be removed, by calling the "iswt" function in Matlab. Finally, the reconstructed noise is subtracted from the original EEG signals to get the denoised EEG signals.

      Fig. 4 shows the 19-channel EEG signals of an MCI subject in the time domain: Fig. 4 (a) shows the raw EEG signals and Fig. 4 (b) the denoised EEG signals.

      Figure 4.  The 19-channel EEG signals of an MCI subject in time domain (Color versions of the figures in this paper are available online)

      Fig. 5 illustrates the denoising process in the frequency domain. The signals in Fig. 5 are obtained from the time series via the Fourier transform; the different coloured lines represent different channels. Fig. 5 (a) shows the raw EEG signals in the frequency domain. The frequency components from 0 to 0.5 Hz are extraordinarily large, which is chiefly caused by baseline drift, and there is a peak at 50 Hz, mainly caused by power line interference. Fig. 5 (b) shows the low-frequency noise to be removed in 0–0.5 Hz; the partially enlarged detail shows that components below 0.5 Hz will be eliminated. Fig. 5 (c) presents the high-frequency noise to be removed in 32 Hz–128 Hz, including the obvious 50 Hz power line interference. Since there is no ideal bandpass filter, some components below 32 Hz are removed too; however, their amplitudes are too small to affect the results, as shown in the partially enlarged detail of Fig. 5 (c). Fig. 5 (d) shows the denoised signals after removing the signals in Figs. 5 (b) and 5 (c) from Fig. 5 (a). As a schematic diagram, Fig. 5 is plotted from a 4-second segment of an MCI subject.

      Figure 5.  Frequency domain EEG signals with 19 channels of an MCI subject

    • As mentioned in Section 2.2.3, we perform segmentation before feature extraction. The length of the sliding window and the overlap ratio between two neighbouring segments are the two factors to be considered in the segmentation process. Considering the basic properties of our dataset (30 minutes′ duration, 22 subjects and a fairly large number of features to be extracted), we tried several window lengths, i.e., 0.5 s, 1 s, 2 s and 4 s; the window length is finally fixed at 2 s based on MCI detection performance. The overlap ratio is set to 0% to avoid biased results; in this case, the stride of the sliding window equals the window length. So, the total number of segments (samples) in our experiments is

      $ m = \left( \left\lfloor \frac{ signal\ length-window\ length}{stride}\right\rfloor+1\right)\times n_s $

      (10)

      where signal length is the total duration of a signal in seconds, which equals 1 800 s in our experiments; window length is the duration of a segment, which equals 2 s; stride = 2 s is the interval between the starting points of two consecutive segments; and $ n_s $ = 22 denotes the number of subjects. According to (10), m = 19 800.
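The segment count from (10) can be checked directly with the values above:

```python
import math

# Segment count from Eq. (10) with the values used in this work:
# 1800 s recordings, 2 s non-overlapping windows (stride = window length),
# and 22 subjects.
signal_length, window_length, stride, n_s = 1800, 2, 2, 22
segments_per_subject = math.floor((signal_length - window_length) / stride) + 1
m = segments_per_subject * n_s
```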

      For each segment, every channel specified in (1) is decomposed into four coefficients: $ A_5 $ (0.5 Hz–4 Hz), $ D_5 $ (4 Hz–8 Hz), $ D_4 $ (8 Hz–16 Hz) and $ D_3 $ (16 Hz–32 Hz) via 1-D SWT. The mother wavelet is still $ sym9 $, and 5 is the appropriate decomposition level to achieve the desired frequency resolution, as shown in Fig. 3.

      Fig. 6 demonstrates the spectral-temporal characteristics of Channel $ O_1 $, which is picked from a segment of an MCI subject after denoising. Fig. 6 (a) presents 5 signals in time domain, namely, the denoised channel $ O_1 $ and its 4 coefficients of $ D_3 $, $ D_4 $, $ D_5 $, $ A_5 $. Fig. 6 (b) shows those 5 signals in frequency domain.

      Figure 6.  Channel $ O_1 $ and its four coefficients

      As shown in Fig. 6, the decomposed coefficients reflect the spectral-temporal characteristics of the EEG signals. We employ descriptive statistical analysis to extract features from those coefficients. Specifically, the 9 descriptive statistical features listed in (3) are extracted from each coefficient by calling the corresponding built-in functions in Matlab. Therefore, the total number of features extracted from each segment is $ n $ = 19 $ \times $ 4 $ \times $ 9 = 684 according to (4), where 19 is the number of channels, 4 the number of frequency bands and 9 the number of descriptive statistical features.

    • The proposed 3-D evaluation feature selection method is a wrapper model: an SVM classifier is used as a black box and its performance is used to select the optimal feature subset. We implement the SVM classifier using the built-in functions of the "Statistics and Machine Learning Toolbox" in Matlab. In particular, we create a "ClassificationSVM" object as the binary classifier, train it with the "fitcsvm" function, then use the "predict" function to make predictions with the trained SVM classifier. To simplify the computation, "KernelFunction" is set to "polynomial" with "PolynomialOrder" 2, "KernelScale" is set to "auto", "BoxConstraint" to 2, and "Standardize" to true. The other parameters keep their default values. The SVM models share the same settings in Step 3 (3-D evaluation) and Step 4 (classification) of the proposed framework, as shown in Fig. 1.

      As described in Algorithms 1 and 2, we evaluate the feature subsets on the channel dimension first. Fig. 7 (a) and the left part of Table 2 show the results of individual assessment; the solid lines in Fig. 7 give the performance on the training set while the dashed lines give that on the validation set. As emphasized in bold in Table 2, after individual evaluation, $ OptFeaSub $ = $ {F(13, \, :, \, :)} $ and $ MaxAcc $ = 75.28% on the corresponding validation set. Obviously, features extracted from any single channel cannot achieve good MCI detection performance. When evaluating incrementally on the channel dimension, as marked in bold in the right part of Table 2, we obtain the best performance on $ {F(1:19, \, :, \, :)} $, i.e., $ OptFeaSub $ = $ {F(1:19, \, :, \, :)} $ and $ MaxAcc $ = 94.74%. Since there are 19 channels in total, the evaluation on the channel dimension indicates that the best performance is achieved on the whole feature set.

      Figure 7.  Evaluation results on channel dimension

      Individual evaluation Incremental evaluation
      Feature subset Accuracy Sensitivity Specificity Feature subset Accuracy Sensitivity Specificity
      ${{F(1, \, :, \, :)}}$ 69.44% 66.14% 72.65% ${{F(1, \, :, \, :)}}$ 69.44% 66.14% 72.65%
      ${{F(2, \, :, \, :)}}$ 68.10% 73.06% 63.28% ${{F(1:2, \, :, \, :)}}$ 76.24% 78.91% 73.64%
      ${{F(3, \, :, \, :)}}$ 63.52% 66.75% 60.39% ${{F(1:3, \, :, \, :)}}$ 80.13% 80.91% 79.37%
      ${{F(4, \, :, \, :)}}$ 64.86% 63.62% 66.07% ${{F(1:4, \, :, \, :)}}$ 83.37% 84.61% 82.16%
      ${{F(5, \, :, \, :)}}$ 66.48% 58.95% 73.79% ${{F(1:5, \, :, \, :)}}$ 85.06% 86.40% 83.76%
      ${{F(6, \, :, \, :)}}$ 66.96% 65.67% 68.21% ${{F(1:6, \, :, \, :)}}$ 85.97% 86.92% 85.05%
      ${{F(7, \, :, \, :)}}$ 67.42% 73.27% 61.73% ${{F(1:7, \, :, \, :)}}$ 88.52% 89.99% 87.10%
      ${{F(8, \, :, \, :)}}$ 68.20% 64.19% 72.10% ${{F(1:8, \, :, \, :)}}$ 90.34% 91.59% 89.14%
      ${{F(9, \, :, \, :)}}$ 67.52% 64.70% 70.25% ${{F(1:9, \, :, \, :)}}$ 90.34% 91.59% 89.14%
      ${{F(10, \, :, \, :)}}$ 68.63% 67.16% 70.05% ${{F(1:10, \, :, \, :)}}$ 90.60% 91.94% 89.29%
      ${{F(11, \, :, \, :)}}$ 68.63% 67.98% 69.26% ${{F(1:11, \, :, \, :)}}$ 90.65% 92.30% 89.04%
      ${{F(12, \, :, \, :)}}$ 73.66% 73.32% 73.99% ${{F(1:12, \, :, \, :)}}$ 91.96% 93.28% 90.68%
      ${{{{F}}(13, \, :, \, :)}}$ 75.28% 74.81% 75.73% ${{F(1:13, \, :, \, :)}}$ 93.91% 94.61% 93.22%
      ${{F(14, \, :, \, :)}}$ 71.54% 69.98% 73.04% ${{F(1:14, \, :, \, :)}}$ 93.58% 94.61% 92.58%
      ${{F(15, \, :, \, :)}}$ 71.71% 75.63% 67.91% ${{F(1:15, \, :, \, :)}}$ 93.98% 94.92% 93.07%
      ${{F(16, \, :, \, :)}}$ 69.89% 73.22% 66.67% ${{F(1:16, \, :, \, :)}}$ 94.19% 94.97% 93.42%
      ${{F(17, \, :, \, :)}}$ 74.60% 75.89% 73.34% ${{F(1:17, \, :, \, :)}}$ 94.14% 95.02% 93.27%
      ${{F(18, \, :, \, :)}}$ 71.99% 75.68% 68.41% ${{F(1:18, \, :, \, :)}}$ 94.08% 95.18% 93.02%
      ${{F(19, \, :, \, :)}}$ 72.65% 80.71% 64.82% ${{{{F}}(1:19, \, :, \, :)}}$ 94.74% 95.90% 93.62%

      Table 2.  Numerical results of feature subset evaluation on channel dimension (performance on validation set)

      The evaluation results on the frequency band dimension are shown in Fig. 8 and Table 3. The best performances of the individual and incremental assessments are emphasized in bold, and neither exceeds $ MaxAcc $ = 94.74%. Following Algorithm 2, therefore, $ OptFeaSub $ = $ {F(1:19, \, :, \, :)} $ and $ MaxAcc $ = 94.74% still hold.

      Figure 8.  Evaluation results on frequency band dimension

      Individual evaluation Incremental evaluation
      Feature subset Accuracy Sensitivity Specificity Feature subset Accuracy Sensitivity Specificity
      ${{F(:, \, 1, \, :)}}$ 75.53% 80.66% 70.55% ${{F(:, \, 1, \, :)}}$ 75.53% 80.66% 70.55%
      ${{F(:, \, 2, \, :)}}$ 79.90% 84.09% 75.83% ${{F(:, \, 1:2, \, :)}}$ 82.86% 85.79% 80.02%
      ${{F(:, \, 3, \, :)}}$ 83.37% 86.51% 80.32% ${{F(:, \, 1:3, \, :)}}$ 89.13% 91.64% 86.70%
      ${{{{F}}(:, \, 4, \, :)}}$ 88.62% 89.12% 88.14% ${{{{F}}(:, \, 1:4, \, :)}}$ 94.62% 95.84% 93.42%

      Table 3.  Numerical results of feature subset evaluation on frequency band dimension (performance on validation set)

      The evaluation results on the descriptive statistical feature dimension are shown in Fig. 9 and Table 4. The best performance of individual evaluation, marked in bold, is the best so far, giving $ OptFeaSub $ = $ {F(:, \, :, \, 5)} $ and $ MaxAcc $ = 97.45%. The best performance of incremental evaluation, also marked in bold, is discarded because it is below 97.45%.

      Figure 9.  Evaluation results on the statistical feature dimension

      Individual evaluation Incremental evaluation
      Feature subset Accuracy Sensitivity Specificity Feature subset Accuracy Sensitivity Specificity
      ${{F(:, \, :, \, 1)}}$ 68.30% 76.40% 60.44% ${{F(:, \, :, \, 1)}}$ 68.30% 76.40% 60.44%
      ${{F(:, \, :, \, 2)}}$ 96.94% 97.43% 96.46% ${{F(:, \, :, \, 1:2)}}$ 89.79% 92.00% 87.64%
      ${{F(:, \, :, \, 3)}}$ 64.61% 77.37% 52.22% ${{F(:, \, :, \, 1:3)}}$ 88.47% 90.30% 86.70%
      ${{F(:, \, :, \, 4)}}$ 91.94% 93.89% 90.03% ${{F(:, \, :, \, 1:4)}}$ 90.34% 92.25% 88.49%
      ${{{{F}}(:, \, :, \, {{5}})}}$ 97.45% 97.74% 97.16% ${{F(:, \, :, \, 1:5)}}$ 94.41% 95.43% 93.42%
      ${{F(:, \, :, \, 6)}}$ 66.56% 70.81% 62.43% ${{F(:, \, :, \, 1:6)}}$ 93.20% 94.56% 91.88%
      ${{{{F}}(:, \, :, \, {{7}})}}$ 64.11% 68.55% 59.79% ${{F(:, \, :, \, 1:7)}}$ 92.75% 94.30% 91.23%
      ${{F(:, \, :, \, 8)}}$ 97.14% 97.38% 96.91% ${{F(:, \, :, \, 1:8)}}$ 93.63% 94.97% 92.33%
      ${{F(:, \, :, \, 9)}}$ 96.79% 97.38% 96.21% ${{{{F}}(:, \, :, \, 1:9)}}$ 94.69% 95.84% 93.57%

      Table 4.  Numerical results of feature subset evaluation on descriptive statistical feature dimension

      In conclusion, the final optimal feature subset is $ OptFeaSub $ = $ {F(:, \, :, \, 5)} $, i.e., the 5th descriptive statistical feature, the $ interquartile $ $ range $, extracted from all frequency bands listed in (2) and all channels listed in (1). The total number of features in the selected optimal feature subset is

      $ n_{opt} = U\times V \times 1 = 19\times 4 \times 1 = 76. $

      (11)
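Under the channel-major flattening assumed to match (5), the positions of the optimal subset $ {F(:, \, :, \, 5)} $ within the 684-feature vector can be enumerated directly; this sketch only illustrates the count of 19 × 4 = 76 features, consistent with the 11.11% suppression ratio below.

```python
# Indices (0-based) of F(:, :, 5) -- all channels, all bands, iqr only --
# within a flat 684-feature vector laid out channel-major as in Eq. (5).
U, V, W, w_opt = 19, 4, 9, 5   # w_opt: 1-based position of iqr in Eq. (3)
opt_idx = [(u * V + v) * W + (w_opt - 1) for u in range(U) for v in range(V)]
n_opt = len(opt_idx)           # = U * V = 76 features
```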

      Considering the total number of features $ n $ = 684 calculated by (4), the suppression ratio of the proposed 3-D evaluation feature selection algorithm is

      $ \begin{split} {\rm{Suppression\ ratio}} = & \; \dfrac{n_{opt}}{n} \times 100 \% = \\ & \; \dfrac{76}{684} \times 100 \% = 11.11 \%. \end{split}$

      (12)
    • Finally, we test the performance of the proposed framework on the test set and compare it with other MCI detection algorithms based on EEG signals.

      As described in Section 3.3, $ {F(:, \, :, \, 5)} $ is the final optimal feature subset. So, at this stage, we only pick the data corresponding to the feature subset $ {F(:, \, :, \, 5)} $ to test the classification performance. To obtain unbiased results, we test on the test set [$ {X_{test}} | {y_{test}} $], which the classifier has not seen during training.

      Besides SVM, we also implement two well-known algorithms[33] (decision tree and KNN) to verify the effectiveness of the selected feature subset in the classification stage. All three classifiers are implemented with the built-in functions of the "Statistics and Machine Learning Toolbox" in Matlab. The SVM classifier keeps the settings described in Section 3.3. We use the built-in function $ fitcknn $ to optimize the hyperparameters of the KNN classifier automatically; as a result, the optimized $ NumNeighbors $ is 5 and the $ Distance $ is $ seuclidean $, with other parameters at their defaults. Similarly, we use $ fitctree $ to implement and optimize the decision tree classifier, and the parameter $ MinLeafSize $ is set to 9.

      The classification results are shown in the last three rows of Table 5. With the right features $ OptFeaSub $ selected by the proposed framework, all three classifiers achieve superior performance compared with other similar works[2, 34–37] reported recently, especially the work in [2], which uses the same dataset as ours. Among the three classifiers, the SVM classifier has a narrow lead.

      Algorithms Accuracy Sensitivity Specificity Classification
      NF-KNN[2] 88.89% 100% 83.33% MCI versus healthy control
      Lehmann et al.[34] 88.5% 89% 88% MCI versus healthy control
      McBride et al.[35] 92.59% 100% 84.61% MCI versus healthy control
      Rossini et al.[36] 93.46% 95.87% 91.06% MCI versus healthy control
      Wavelet+SVM[37] 91.7% 91.7% 91.7% MCI versus healthy control
      OptFeaSub + SVM 96.94% 96.89% 96.99% MCI versus healthy control
      Decision Tree 95.47% 95.38% 95.55% MCI versus healthy control
      KNN 96.89% 97.25% 96.54% MCI versus healthy control

      Table 5.  Performance comparison

    • The spectral-temporal characteristics of each channel are reflected on the four decomposed coefficients, i.e., $ D_3 $, $ D_4 $, $ D_5 $, $ A_5 $. The present study extracts descriptive statistical features from those coefficients, so the extracted features contain information in both time domain and frequency domain.

      As shown in the last rows of the right parts of Tables 2–4, $ {F(1:19,:,:)} $ = $ {F(:,1:4,:)} $ = $ {F(:,:,1:9)} $ denotes the whole feature space F. The accuracy on the whole feature space F lies in the range [94.62%, 94.74%], which outperforms the algorithms listed in Table 5. Thus, the proposed spectral-temporal feature extraction strategy is quite effective.

      There is no large deviation between the sensitivity and specificity of the proposed framework, which indicates that the strategy of choosing a balanced dataset is effective in reducing the inconsistency caused by the data structure. Moreover, the accuracy achieved on the whole feature set F varies slightly, ranging from 94.62% to 94.74%, as shown in Tables 2–4. The reason is that, when evaluating on different dimensions, the order in which features are added incrementally to the subset differs, so the initialization state of the SVM classifier differs. However, this slight difference does not affect the consistency of the results.

    • Ranking all channels in descending order of accuracy based on the individual evaluation results on the channel dimension gives: $ T_5 $, $ T_6 $, $ T_4 $, $ O_2 $, $ O_1 $, $ P_z $, $ P_3 $, $ P_4 $, $ F_{p1} $, $ C_z $, $ C_4 $, $ T_3 $, $ F_{p2} $, $ C_3 $, $ F_8 $, $ F_4 $, $ F_z $, $ F_3 $, $ F_7 $. Considering the locations of the 19 channels on the scalp, as shown in Fig. 10, we find that the temporal and occipital areas are more effective for MCI detection than the frontal and central areas. Similarly, the frequency bands in descending order are $ D_3 $, $ D_4 $, $ D_5 $, $ A_5 $, which means that coefficients of higher frequency bands are more effective than lower ones within 0.5 Hz–32 Hz for MCI detection. The descending order on the descriptive statistical feature dimension is $ iqr $, $ Q_1 $, $ std $, $ Q_3 $, $ mo $, $ med $, $ ske $, $ me $ and $ kur $. Fig. 9 (a) and the left part of Table 4 further demonstrate that $ Q_1 $, $ std $ and $ Q_3 $ are also quite effective in differentiating MCI subjects from those who are cognitively healthy.

      Figure 10.  International 10–20 system of electrode placement

      The results of the individual evaluation on the channel and frequency band dimensions show that a single channel or a single frequency band can hardly achieve good performance in MCI detection. As for the descriptive statistical feature dimension, the optimal feature subset is $ F(:, \, :, \, 5) $, which contains only one descriptive statistical feature, i.e., $ iqr $, but includes all channels and frequency bands. So, we can draw the conclusion that every channel and frequency band carries unique information for MCI detection, whereas the statistical features are highly redundant and $ iqr $ is an appropriate choice of descriptive statistical feature for this problem.
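In NumPy terms, selecting the optimal subset $ F(:, \, :, \, 5) $ amounts to keeping a single slice of the feature tensor: all 19 channels and 4 bands, but only the fifth statistic ($ iqr $). The sketch below (toy values, 0-based indexing) shows that this retains 76 of the 684 features, i.e., 11.11%.

```python
import numpy as np

F = np.zeros((19, 4, 9))  # channels x bands x statistics (toy values)

# F(:,:,5) in the paper's 1-based notation keeps only the 5th statistic (iqr)
# across all 19 channels and 4 frequency bands -> index 4 in 0-based NumPy.
subset = F[:, :, 4]
print(subset.size, F.size)                   # 76 of 684 features
print(round(100 * subset.size / F.size, 2))  # 11.11 percent retained
```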

      The presented 3-D evaluation algorithm is efficient and effective for feature selection, because while retaining only 11.11% of the full feature space, the selected feature subset $ F(:, \, :, \, 5) $ achieves the best performance, which is even better than that on the full feature space. Conversely, with the same number of inappropriate features, e.g., the feature subset $ F(:, \, :, \, 7) $, the accuracy is only 64.11%, as shown in Table 4 marked in bold.
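The statistic-dimension pass of such a 3-D evaluation can be sketched as scoring each single-statistic slice separately, as below. This is a minimal sketch on synthetic data: the nearest-centroid scorer is a stand-in for the SVM classifier used in the paper, and all variable names and the subject counts are assumptions for illustration.

```python
import numpy as np

def subset_accuracy(X_train, y_train, X_test, y_test):
    """Stand-in scorer (nearest class centroid); the paper uses an SVM."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = np.linalg.norm(X_test - c0, axis=1)
    d1 = np.linalg.norm(X_test - c1, axis=1)
    return np.mean((d1 < d0).astype(int) == y_test)

def evaluate_statistic_dimension(F_train, y_train, F_test, y_test):
    """Score each single-statistic slice F(:,:,k) of the feature tensor."""
    accs = []
    for k in range(F_train.shape[3]):
        # Flatten channels x bands of statistic k into one feature vector.
        Xtr = F_train[:, :, :, k].reshape(len(y_train), -1)
        Xte = F_test[:, :, :, k].reshape(len(y_test), -1)
        accs.append(subset_accuracy(Xtr, y_train, Xte, y_test))
    return np.array(accs)

# Synthetic data: subjects x 19 channels x 4 bands x 9 statistics.
rng = np.random.default_rng(1)
y_tr = np.repeat([0, 1], 20)
y_te = np.repeat([0, 1], 10)
F_tr = rng.normal(size=(40, 19, 4, 9)) + y_tr[:, None, None, None] * 0.5
F_te = rng.normal(size=(20, 19, 4, 9)) + y_te[:, None, None, None] * 0.5
print(evaluate_statistic_dimension(F_tr, y_tr, F_te, y_te).shape)  # (9,)
```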

    • Despite the aforementioned advantages, this work suffers from the problem of a limited dataset. The total number of subjects involved in the experiments is 22, which means the trained model can hardly be applied in non-patient-specific scenarios. In future work, we plan to collect more data and explore automatic feature extraction methods based on deep learning algorithms. Multi-class classification between MCI patients, healthy controls and AD patients will also be investigated.

    • A systematic framework is proposed to distinguish MCI patients from elderly control subjects using resting-state EEG signals. The proposed scheme can efficiently eliminate baseline drift and power line interference from the raw EEG signals. It also takes advantage of information from both the time domain and the frequency domain, and a set of highly representative spectral-temporal features is extracted. Moreover, an effective feature subset is selected through the proposed 3-D evaluation algorithm. Extensive experiments were conducted on clinical data. The results show that, compared with other similar works, our method achieves better performance.

    • The first author was supported by a La Trobe University Postgraduate Research Scholarship and a La Trobe University Full Fee Research Scholarship; she was also supported by the Science Foundation of Chongqing University of Arts and Sciences, Chongqing, China (No. Z2016RJ15).
