Data preprocessing approach for machine learning-based sentiment classification
Abstract
Public sentiment regarding a particular issue, product, activity, or organization can be measured and monitored with an application based on artificial intelligence, using comments circulating on social media as data. However, the rules for writing comments on social media have yet to be standardized, so non-standard words often appear in these comments. Non-standard words affect the classification of sentiment into positive, negative, and neutral categories. Therefore, this study proposes a data preprocessing approach that incorporates the Rabin-Karp algorithm to correct non-standard words. The research consists of several stages: data crawling, data preprocessing, feature extraction, model development (based on the Naïve Bayes (NB), Support Vector Machine (SVM), and Decision Tree (DT) methods), and analysis of the results. The experimental results show that the proposed approach influences the composition of the sentiment categories. In model testing, all models obtain their highest precision in the Positive category, with a value of 1. All models obtain their highest recall in the Neutral category, almost reaching 1, and their highest f1-score in the Neutral category as well, with an average value of 0.95. Overall, the performance analysis shows that the NB- and SVM-based models perform better than the DT-based model.
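To illustrate the kind of preprocessing step the abstract describes, the following is a minimal sketch of Rabin-Karp matching used to replace non-standard words in a comment. It is not the authors' implementation: the function names `rabin_karp_find` and `normalize`, the hash parameters, the word-boundary rule, and the small English lexicon are all assumptions made for illustration, since the abstract does not specify how the algorithm is integrated or which word list is used.

```python
def rabin_karp_find(text, pattern, base=256, mod=1_000_000_007):
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the leading character in a window
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    hits = []
    for i in range(n - m + 1):
        # Compare the actual strings only when the hashes agree,
        # which rules out false positives from hash collisions.
        if p_hash == t_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i < n - m:
            # Roll the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return hits


def normalize(comment, lexicon):
    """Replace whole-word non-standard forms using a lookup table.

    `lexicon` maps a non-standard word to its standard form
    (a hypothetical table, not the one used in the study).
    """
    text = comment.lower()
    for slang, standard in lexicon.items():
        pieces, cursor = [], 0
        for start in rabin_karp_find(text, slang):
            end = start + len(slang)
            left_ok = start == 0 or not text[start - 1].isalnum()
            right_ok = end == len(text) or not text[end].isalnum()
            # Accept only non-overlapping matches at word boundaries.
            if start >= cursor and left_ok and right_ok:
                pieces.append(text[cursor:start])
                pieces.append(standard)
                cursor = end
        pieces.append(text[cursor:])
        text = "".join(pieces)
    return text


# Hypothetical lexicon and comment, for illustration only.
lexicon = {"gr8": "great", "u": "you", "thx": "thanks"}
print(normalize("Thx, u r gr8 but ugly", lexicon))
# prints: thanks, you r great but ugly
```

The word-boundary check keeps short entries such as "u" from being replaced inside longer words such as "but" or "ugly". In a pipeline like the one outlined above, a step of this kind would presumably run before feature extraction, so that the classifiers see standardized tokens.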
Article Details
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.