Data preprocessing approach for machine learning-based sentiment classification


Sunneng Sandino Berutu
Haeni Budiati
Jatmika Jatmika
Fornieli Gulo

Abstract

Public sentiment toward a particular issue, product, activity, or organization can be measured and monitored with an application based on artificial intelligence, using comments circulating on social media as the data source. However, social media comments follow no standardized writing conventions, so they often contain non-standard words, and these words affect whether a comment is classified as positive, negative, or neutral. This study therefore proposes a data preprocessing approach that inserts the Rabin-Karp algorithm to correct non-standard words. The research consists of several stages: data crawling, data preprocessing, feature extraction, model development (based on the Naïve Bayes (NB), Support Vector Machine (SVM), and Decision Tree (DT) methods), and analysis of the results. The experimental results showed that the proposed approach changes the composition of the resulting sentiment categories. Model testing showed that all models obtain their highest precision in the Positive category, with a value of 1. In the Neutral category, all models obtain their highest recall, almost reaching 1, and their highest f1-score, with an average of 0.95. Overall, the performance analysis showed that the NB- and SVM-based models perform better than the DT-based model.
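The abstract does not state exactly how the Rabin-Karp algorithm is inserted into preprocessing. The following is a minimal sketch, assuming that non-standard words are listed in a slang-to-standard dictionary and that Rabin-Karp (rolling-hash string search) locates each non-standard word in a comment before it is replaced with its standard form. The dictionary `NORMALIZATION_DICT`, its Indonesian entries, and the function names are hypothetical illustrations, not taken from the paper.

```python
# Hypothetical slang-to-standard dictionary (illustrative entries, not from the paper).
NORMALIZATION_DICT = {
    "gk": "tidak",
    "yg": "yang",
    "bgt": "banget",
    "tp": "tapi",
}

BASE = 256           # multiplier for the polynomial rolling hash
MOD = 1_000_000_007  # large prime modulus that keeps hash values bounded


def rabin_karp_search(text: str, pattern: str) -> list[int]:
    """Return every start index where `pattern` occurs in `text` (Rabin-Karp)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(BASE, m - 1, MOD)  # weight of the leftmost character in a window
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * BASE + ord(pattern[i])) % MOD
        t_hash = (t_hash * BASE + ord(text[i])) % MOD
    matches = []
    for i in range(n - m + 1):
        # Compare characters only when hashes agree, which rules out hash collisions.
        if p_hash == t_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:
            # Roll the window one step: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * BASE + ord(text[i + m])) % MOD
    return matches


def is_whole_word(text: str, start: int, end: int) -> bool:
    """True if text[start:end] is not embedded inside a longer alphanumeric word."""
    before_ok = start == 0 or not text[start - 1].isalnum()
    after_ok = end == len(text) or not text[end].isalnum()
    return before_ok and after_ok


def normalize(text: str, dictionary: dict[str, str] = NORMALIZATION_DICT) -> str:
    """Lowercase a comment and replace whole-word non-standard terms."""
    text = text.lower()
    for slang, standard in dictionary.items():
        hits = [i for i in rabin_karp_search(text, slang)
                if is_whole_word(text, i, i + len(slang))]
        # Replace right to left so that earlier match indices stay valid.
        for i in reversed(hits):
            text = text[:i] + standard + text[i + len(slang):]
    return text


if __name__ == "__main__":
    print(normalize("Produknya bagus bgt, tp pengirimannya gk cepat"))
```

Running the script prints `produknya bagus banget, tapi pengirimannya tidak cepat`. The whole-word check keeps a key such as `tp` from being replaced inside a longer word like `outputnya`. This sketch does not reproduce the paper's dictionary or any similarity-based matching the authors may apply to misspellings that are absent from the dictionary.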


Article Details

How to Cite
S. Berutu, H. Budiati, J. Jatmika, and F. Gulo, “Data preprocessing approach for machine learning-based sentiment classification”, INFOTEL, vol. 15, no. 4, pp. 317-325, Nov. 2023.
Section: Informatics