Summarization and Support Vector Machine in Document Classification

Nelly Indriani Widiastuti
Ednawati Rainarli
Kania Evita Dewi

Abstract

Classification is the process of grouping objects that share the same characteristics or features into several classes. Documents can be classified automatically by using the words that occur in the training documents as features. A large number of documents, and long documents, increase the number of words that appear as features. Summarization was therefore chosen to reduce the number of words used in the classification process. Classification was performed with a multiclass Support Vector Machine (SVM), chosen because of its good reputation in classification tasks. This study tests the use of summaries as feature selection in document classification, with summarization at 50% compression. The results show that summarization does not affect the accuracy of document classification with SVM. However, summarization does improve the accuracy of the Simple Logistic Classifier (SLC). The comparison of classification methods also shows that Multinomial Naïve Bayes (NBM) achieves better accuracy than SVM.
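The core idea, summarizing each training document before collecting the word features, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not say how sentences are scored, so the sketch assumes a simple frequency-based sentence extraction (a sentence scores the mean frequency, within its own document, of its words), and all function names (`split_sentences`, `tokenize`, `summarize`, `vocabulary`) are hypothetical. The 50% compression from the abstract is modeled as keeping half of the sentences, rounded up.

```python
import math
import re
from collections import Counter

# Hypothetical helpers for illustration; the paper's implementation is not shown.
# Assumption: sentences end with '.', '!' or '?' followed by whitespace.


def split_sentences(text):
    """Split a document into sentences after terminal punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(text):
    """Return lowercase tokens made of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def summarize(text, compression=0.5):
    """Extractive summary that keeps ceil(compression * n) of the n sentences.

    Assumption: a sentence's score is the mean frequency, within the same
    document, of its words; the scoring used by the authors may differ.
    """
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    keep = max(1, math.ceil(len(sentences) * compression))
    # Highest score first; ties go to the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    # Restore the original order so the summary reads naturally.
    chosen = sorted(ranked[:keep])
    return " ".join(sentences[i] for i in chosen)


def vocabulary(docs):
    """Distinct words across all documents: the candidate word features."""
    return {w for d in docs for w in tokenize(d)}


# Toy training documents (invented for illustration).
docs = [
    "The team won the match. The coach praised the team. "
    "Fans celebrated in the rain. Tickets sold out early.",
    "The market fell today. Investors sold shares in the market. "
    "Analysts expect a rebound. Oil prices rose slightly.",
]

full_vocab = vocabulary(docs)
summary_vocab = vocabulary(summarize(d) for d in docs)
print(len(full_vocab), len(summary_vocab))  # prints: 27 13
```

On this toy input the feature vocabulary shrinks from 27 to 13 distinct words. Only the reduced vocabulary would then be passed to the classifier (a multiclass SVM in this study, with SLC and NBM as comparisons); the classifier itself is omitted from the sketch.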

How to Cite
N. Widiastuti, E. Rainarli, and K. Dewi, “Peringkasan dan Support Vector Machine pada Klasifikasi Dokumen”, INFOTEL, vol. 9, no. 4, pp. 416-421, Nov. 2017.