Summarization and Support Vector Machine for Document Classification
Abstract
Classification is the process of grouping objects that share the same characteristics or features into classes. Documents can be classified automatically using the word features that appear in the training documents. As the number of documents grows, so does the number of words that appear as features. Summarization was therefore chosen to reduce the number of words used in classification. Classification itself uses a multiclass Support Vector Machine (SVM), chosen for its good reputation in classification tasks. This study tests the use of summaries as a feature selection step in document classification. Summarization uses 50% compression. The results show that summarization does not affect the accuracy of document classification with SVM. However, summarization does improve the accuracy of the Simple Logistic Classifier (SLC). The comparison of classification methods shows that Naïve Bayes Multinomial (NBM) achieves better accuracy than SVM.
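As a rough illustration of how summarization can act as feature selection, the sketch below keeps half of a document's sentences and compares the resulting vocabulary with the full one. It is not the summarizer used in the study: the term-frequency sentence scoring, the naive sentence splitter, the reading of "50% compression" as "keep half of the sentences", and all function names (`split_sentences`, `tokenize`, `summarize`, `vocabulary`) are assumptions made for this example.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphabetic tokens; no stemming or stopword removal."""
    return re.findall(r"[a-z]+", sentence.lower())


def summarize(text, compression=0.5):
    """Keep the highest-scoring sentences, returned in their original order.

    `compression` is treated here as the fraction of sentences to keep
    (an assumption; at 0.5 keeping and dropping half coincide).
    Each sentence is scored by the mean document-level frequency of its tokens.
    """
    sentences = split_sentences(text)
    if not sentences:
        return []
    tf = Counter(tok for s in sentences for tok in tokenize(s))

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(tf[t] for t in tokens) / len(tokens) if tokens else 0.0

    keep = max(1, math.ceil(len(sentences) * compression))
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)[:keep]
    return [sentences[i] for i in sorted(ranked)]


def vocabulary(sentences):
    """Set of distinct tokens, i.e. the candidate word features."""
    return {tok for s in sentences for tok in tokenize(s)}


# Hypothetical toy document, invented for this example.
doc = ("Support vector machines classify documents. "
       "Documents contain many words. "
       "The weather was pleasant. "
       "Summaries keep fewer words than documents.")

summary = summarize(doc, compression=0.5)
full_vocab = vocabulary(split_sentences(doc))
reduced_vocab = vocabulary(summary)
print(len(full_vocab), len(reduced_vocab))
```

The reduced vocabulary is what a classifier such as a multiclass SVM would then be trained on. Whether that reduction helps or hurts accuracy is an empirical question, which is what the study examines.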
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.