On the Feature Selection of Microarray Data for Cancer Detection based on Random Forest Classifier

Tita Nurul Nuklianggraita
Adiwijaya Adiwijaya
Annisa Aditsania

Abstract

Cancer is a disease that can affect any organ of the human body. According to the World Health Organization (WHO) fact sheet for 2018, cancer deaths have reached 9.6 million. One known way to detect cancer is the microarray technique, but microarray data are high-dimensional because the number of features is very large compared to the number of samples. Dimensionality reduction is therefore needed to achieve optimum accuracy. In this paper, we compare Minimum Redundancy Maximum Relevance (MRMR) and Least Absolute Shrinkage and Selection Operator (LASSO) as methods for reducing the dimension of microarray data, and we compare the resulting classification (cancer detection) performance using a Random Forest (RF) classifier. Based on the simulation, we conclude that LASSO is better than MRMR: it yields evaluation scores of 100% for lung and ovarian cancer, 92% for colon cancer, 93% for prostate tumor, and 83% for the central nervous system.
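
To make the MRMR idea concrete, the following is a minimal sketch of greedy minimum-redundancy maximum-relevance selection, not the implementation used in the paper. It assumes the expression values have already been discretized (for example into low/high bins) and uses the difference form of the criterion, where each candidate is scored by its mutual information with the class label minus its mean mutual information with the features already chosen. The function names, gene names, and toy data are hypothetical and serve only as illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def mrmr_select(features, labels, k):
    """Greedy MRMR, difference form: relevance minus mean redundancy.

    features: dict mapping a feature name to its list of discretized values
    labels:   list of class labels, one per sample
    k:        number of features to keep
    """
    relevance = {name: mutual_information(vals, labels)
                 for name, vals in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in selected) / len(selected)
            return relevance[name] - redundancy
        # Sorting the candidates makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 8 samples, 3 binarized "genes" (assumption, not from the paper).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "gene_a":      [0, 0, 0, 1, 1, 1, 1, 1],  # strongly related to the label
    "gene_a_copy": [0, 0, 0, 1, 1, 1, 1, 1],  # exact duplicate of gene_a
    "gene_b":      [0, 1, 0, 0, 1, 1, 0, 1],  # weaker, but not redundant with gene_a
}

print(mrmr_select(features, labels, 2))  # prints ['gene_a', 'gene_b']
```

In this toy case the duplicate gene is skipped even though it is as relevant as the first pick, because its redundancy with the already selected feature outweighs its relevance. The selected columns would then be passed on to a classifier such as Random Forest.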

Article Details

How to Cite
[1]
T. Nuklianggraita, A. Adiwijaya, and A. Aditsania, “On the Feature Selection of Microarray Data for Cancer Detection based on Random Forest Classifier”, INFOTEL, vol. 12, no. 3, pp. 89-96, Aug. 2020.
Section
Informatics
