Data preprocessing approach for machine learning-based sentiment classification

Sunneng Sandino Berutu; Haeni Budiati; Jatmika Jatmika; Fornieli Gulo

doi:10.20895/infotel.v15i4.1030

Published Nov 13, 2023

DOI https://doi.org/10.20895/infotel.v15i4.1030

Sunneng Sandino Berutu

Immanuel Christian University Yogyakarta, Indonesia

Haeni Budiati

Immanuel Christian University Yogyakarta, Indonesia

Jatmika Jatmika

Immanuel Christian University Yogyakarta, Indonesia

Fornieli Gulo

Immanuel Christian University Yogyakarta, Indonesia

Abstract

Public sentiment toward a particular issue, product, activity, or organization can be measured and monitored with applications based on artificial intelligence, using comments circulating on social media as data. However, social media comments follow no standardized writing rules, so non-standard words often appear in them. Non-standard words affect the classification of sentiment into positive, negative, and neutral categories. This study therefore proposes a data preprocessing approach that incorporates the Rabin-Karp algorithm to correct non-standard words. The research consists of several stages: data crawling, data preprocessing, feature extraction, model development (based on the Naïve Bayes (NB), Support Vector Machine (SVM), and Decision Tree (DT) methods), and analysis of the results. The experimental results show that the proposed approach influences the composition of the sentiment categories. In model testing, all models obtain their highest precision in the Positive category, with a value of 1. All models obtain their highest recall in the Neutral category, almost reaching 1, and their highest f1-score in the Neutral category as well, with an average value of 0.95. Overall, the performance analysis shows that the NB- and SVM-based models perform better than the DT-based model.
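
As a rough illustration of how Rabin-Karp matching could be used to correct non-standard words during preprocessing, the sketch below searches a comment for each entry of a small lookup table with a rolling hash and substitutes the standard form. It is not the authors' implementation: the function names `rabin_karp_find` and `normalize`, the hash parameters, the whole-word rule, and the example dictionary are all assumptions made for this example.

```python
def rabin_karp_find(text, pattern, base=256, mod=1_000_000_007):
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the leading character in a window
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    hits = []
    for i in range(n - m + 1):
        # Compare characters only when the hashes agree, to rule out collisions.
        if p_hash == t_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i < n - m:
            # Roll the hash: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return hits


def normalize(comment, slang_map):
    """Replace whole-word occurrences of each non-standard term (assumed rule)."""
    text = comment.lower()
    for slang, standard in slang_map.items():
        out, last = [], 0
        for start in rabin_karp_find(text, slang):
            end = start + len(slang)
            left_ok = start == 0 or not text[start - 1].isalnum()
            right_ok = end == len(text) or not text[end].isalnum()
            if start >= last and left_ok and right_ok:
                out.append(text[last:start])
                out.append(standard)
                last = end
        out.append(text[last:])
        text = "".join(out)
    return text


# Hypothetical lookup table; a real one would hold the study's own word list.
SLANG = {"thx": "thanks", "gr8": "great"}
print(normalize("Thx, the service is gr8", SLANG))
```

The whole-word check keeps a short term from being rewritten inside a longer word, and the character comparison after each hash match guards against hash collisions.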

How to Cite

[1]

S. Berutu, H. Budiati, J. Jatmika, and F. Gulo, “Data preprocessing approach for machine learning-based sentiment classification”, INFOTEL, vol. 15, no. 4, pp. 317-325, Nov. 2023.

Issue

Vol 15 No 4 (2023): November 2023

Section

Informatics

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
