On the Feature Selection of Microarray Data for Cancer Detection based on Random Forest Classifier
Abstract
Cancer is a disease that can affect any organ of the human body. According to the World Health Organization (WHO) fact sheet of 2018, cancer deaths reached 9.6 million. One known way to detect cancer is the microarray technique, but microarray data are high-dimensional because the number of features (genes) is far larger than the number of samples. Dimension reduction is therefore needed to obtain optimal accuracy. In this paper, we compare Minimum Redundancy Maximum Relevance (MRMR) and the Least Absolute Shrinkage and Selection Operator (LASSO) for reducing the dimension of microarray data, and we compare the resulting classification (cancer detection) performance using a Random Forest (RF) classifier. Based on the simulation results, we conclude that LASSO outperforms MRMR: it achieves evaluation scores of 100% for lung and ovarian cancer, 92% for colon cancer, 93% for prostate tumor, and 83% for central nervous system tumors.
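The abstract does not describe how either selector ranks genes, so the following is a minimal, self-contained sketch of the two techniques being compared, not the authors' implementation. Assumptions not taken from the paper: MRMR uses the difference form (relevance minus mean redundancy) on equal-width binned expression values, and LASSO is the squared-error LASSO fitted by cyclic coordinate descent on standardized features, with the class label coded as 0/1 and treated as a regression target. The function names `discretize`, `mutual_info`, `mrmr`, `soft_threshold` and `lasso_select`, the bin count, the penalty `lam` and the toy data are all hypothetical.

```python
import math
from collections import Counter


def discretize(column, bins=3):
    """Equal-width binning (assumption) so that mutual information can be counted."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / bins or 1.0  # constant column: every value lands in bin 0
    return [min(int((v - lo) / width), bins - 1) for v in column]


def mutual_info(a, b):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (pa[u] * pb[v]))
               for (u, v), c in pab.items())


def mrmr(X, y, k, bins=3):
    """Greedy MRMR, difference form: relevance minus mean redundancy."""
    cols = [discretize([row[j] for row in X], bins) for j in range(len(X[0]))]
    relevance = [mutual_info(col, y) for col in cols]
    # start with the single most relevant gene
    selected = [max(range(len(cols)), key=relevance.__getitem__)]
    while len(selected) < k:
        def score(j):
            redundancy = sum(mutual_info(cols[j], cols[s]) for s in selected)
            return relevance[j] - redundancy / len(selected)
        rest = [j for j in range(len(cols)) if j not in selected]
        selected.append(max(rest, key=score))
    return selected


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def lasso_select(X, y, lam, sweeps=200):
    """Indices of genes whose LASSO weight is non-zero (cyclic coordinate descent)."""
    n, p = len(X), len(X[0])
    Z = []  # standardized columns, so one lam penalizes every gene on the same scale
    for j in range(p):
        col = [row[j] for row in X]
        mu = sum(col) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n) or 1.0
        Z.append([(v - mu) / sd for v in col])
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # centred labels, so no intercept is needed
    w = [0.0] * p
    for _ in range(sweeps):
        for j, zj in enumerate(Z):
            norm = sum(v * v for v in zj) / n
            if norm == 0.0:  # constant gene: its column is all zeros, weight stays 0
                continue
            # correlation of gene j with the residual that excludes gene j itself
            rho = sum(zj[i] * (resid[i] + zj[i] * w[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / norm
            resid = [resid[i] - zj[i] * (new - w[j]) for i in range(n)]
            w[j] = new
    return [j for j in range(p) if abs(w[j]) > 1e-8]


# Toy data: 8 samples (rows) x 3 genes (columns); labels 0 = normal, 1 = cancer.
# Gene 1 duplicates gene 0; gene 2 is weaker but carries different information.
X = [[0, 0, 0], [0, 0, 0], [0, 0, 1], [1, 1, 0],
     [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(mrmr(X, y, k=2))               # -> [0, 2]
print(lasso_select(X, y, lam=0.05))  # -> [0, 2]
```

On the toy data, MRMR keeps gene 0 and skips its duplicate (gene 1) in favor of the less relevant but non-redundant gene 2; LASSO keeps the same two genes, because once gene 0 has a non-zero weight its identical copy adds nothing to the fit. In the setting described in the abstract, the returned column indices would then restrict the microarray matrix before training the Random Forest classifier (for example scikit-learn's `RandomForestClassifier`), which is left out here to keep the sketch dependency-free.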
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.