Classification Based on Configuration Objects by Using Procrustes Analysis
Abstract
Classification is a data mining task that predicts the group to which an object belongs. The prediction can be carried out using similarity measures, classification trees, or regression. Procrustes analysis, on the other hand, is a technique for matching two configurations that has previously been applied to outlier detection. This suggests that Procrustes can address misclassification when misclassified objects are treated as outliers. This paper therefore proposes the Procrustes classification algorithm (PrCA) and the Procrustes nearest-neighbor classification algorithm (PNNCA). Their results were compared with those of classical classification algorithms, namely k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), AdaBoost (AB), Random Forest (RF), Logistic Regression (LR), and Ridge Regression (RR), on the iris, cancer, liver, seeds, and wine datasets. The minimum and maximum accuracies obtained by PrCA were 0.610 and 0.925, while those of PNNCA were 0.610 and 0.963. PrCA generally outperformed k-NN, SVM, and AB, while PNNCA generally outperformed k-NN, SVM, AB, and RF. Based on these results, PrCA and PNNCA deserve consideration as a new approach to classification.
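The core building block described above, matching two configurations and measuring their residual distance, can be sketched as follows. This is a minimal illustrative implementation of the standard orthogonal Procrustes distance using NumPy, not the authors' PrCA/PNNCA code; the function name `procrustes_distance` is hypothetical.

```python
import numpy as np

def procrustes_distance(X, Y):
    """Orthogonal Procrustes distance between two n-by-p configurations.

    Centers both configurations, finds the rotation R minimizing
    ||X - Y R||_F via the SVD of Y^T X, and returns the residual
    Frobenius norm after the optimal alignment.
    """
    # Translate both configurations so their centroids sit at the origin
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # The optimal rotation comes from the SVD of the cross-product matrix:
    # if Yc^T Xc = U S V^T, then R = U V^T maximizes trace(R^T Yc^T Xc)
    U, _, Vt = np.linalg.svd(Yc.T @ Xc)
    R = U @ Vt
    # Residual misfit after rotating Yc onto Xc
    return np.linalg.norm(Xc - Yc @ R)
```

In a classifier along the lines sketched in the abstract, a test object's configuration would be compared against each class configuration with this distance, and the object assigned to the class with the smallest residual.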
Article Details
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.