Comparison of Scenario Pre-processing Performance on Support Vector Machine and Naïve Bayes Algorithms for Sentiment Analysis

doi:10.25139/inform.v8i1.5667

Authors

DOI:

https://doi.org/10.25139/inform.v8i1.5667

Keywords:

Sentiment Analysis, Preprocessing, Support Vector Machines, Naïve Bayes

Abstract

Television shows are usually assessed through ratings, but public opinion is also needed to complete the assessment, and sentiment analysis can provide it. Pre-processing is an essential step in sentiment analysis because public comments still contain many informally or incorrectly written words. This study compares the performance of different pre-processing scenarios to determine which performs best with Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers for sentiment analysis of the television show X Factor Indonesia. The research stages are literature study, problem analysis, design, data collection, pre-processing under two scenarios, term weighting with TF-IDF, classification using SVM and NB, and evaluation of accuracy using a confusion matrix. The findings show that optimal performance is achieved with a complete pre-processing scenario consisting of case folding, emoji removal, cleansing, repeated-character removal, word normalization, negation handling, stopword removal, stemming, and tokenization, which reaches an accuracy of 79.44% with the SVM algorithm. This research shows that SVM with complete pre-processing performs best in terms of accuracy, precision, recall, and F1-score.
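As a rough illustration of the complete pre-processing scenario described in the abstract, the Python sketch below applies the listed steps in order to a single comment. The slang dictionary, negation words, stopword list, emoji ranges, and suffix-stripping stemmer are hypothetical toy stand-ins, not the resources used in the study; a real Indonesian pipeline would need a full normalization dictionary and a dedicated stemmer. `BASIC_SCENARIO` is likewise a hypothetical reduced scenario for comparison, since the steps of the second scenario are not stated on this page. Because tokenization is listed last, the word-level steps split on whitespace internally and `tokenize` produces the final token list.

```python
import re

# Hypothetical toy resources for illustration only; the study's actual
# normalization dictionary, stopword list, and stemmer are not given here.
SLANG = {"gk": "tidak", "ga": "tidak", "bgt": "banget", "yg": "yang", "tp": "tapi"}
NEGATIONS = {"tidak", "bukan", "jangan", "belum"}
STOPWORDS = {"yang", "di", "dan", "ini", "itu", "banget"}
# Approximate emoji/symbol ranges (an assumption, not an exhaustive list).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def case_fold(text):
    return text.lower()


def remove_emoji(text):
    return EMOJI_RE.sub(" ", text)


def cleanse(text):
    # Drop URLs, @mentions and #hashtags, then anything that is not a letter.
    text = re.sub(r"https?://\S+|@\w+|#\w+", " ", text)
    return re.sub(r"[^a-z\s]", " ", text)


def remove_repeated_chars(text):
    # Collapse runs of 3+ identical characters: "kerennn" -> "keren".
    return re.sub(r"(.)\1{2,}", r"\1", text)


def normalize(text):
    # Map informal spellings to standard words via the toy slang dictionary.
    return " ".join(SLANG.get(w, w) for w in text.split())


def handle_negation(text):
    # Attach a negation word to the word after it: "tidak adil" -> "tidak_adil".
    words, out, i = text.split(), [], 0
    while i < len(words):
        if words[i] in NEGATIONS and i + 1 < len(words):
            out.append(words[i] + "_" + words[i + 1])
            i += 2
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)


def remove_stopwords(text):
    return " ".join(w for w in text.split() if w not in STOPWORDS)


def stem(text):
    # Toy suffix stripper standing in for a real Indonesian stemmer.
    def strip(word):
        for suffix in ("nya", "lah", "kah"):
            if word.endswith(suffix) and len(word) > len(suffix) + 3:
                return word[: -len(suffix)]
        return word

    return " ".join(strip(w) for w in text.split())


def tokenize(text):
    return text.split()


FULL_SCENARIO = [case_fold, remove_emoji, cleanse, remove_repeated_chars,
                 normalize, handle_negation, remove_stopwords, stem]
# Hypothetical reduced scenario, used only to show how scenarios can differ.
BASIC_SCENARIO = [case_fold, cleanse]


def preprocess(text, steps=FULL_SCENARIO):
    # Apply each string-to-string step in order, then tokenize.
    for step in steps:
        text = step(text)
    return tokenize(text)


if __name__ == "__main__":
    comment = "Penampilannya kerennn bgt 😍 tp jurinya gk adil #XFactorIndonesia"
    print(preprocess(comment))
    print(preprocess(comment, BASIC_SCENARIO))
```

With the toy resources above, the full scenario turns the sample comment into `['penampilan', 'keren', 'tapi', 'juri', 'tidak_adil']`, while the reduced scenario leaves slang, elongated words, and suffixes untouched. Representing a scenario as a list of step functions makes it easy to compare scenarios by swapping lists.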

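The TF-IDF weighting and confusion-matrix evaluation stages mentioned in the abstract can be sketched in the same way. This sketch assumes a raw-count term frequency multiplied by IDF = ln(N / df), which is only one of several TF-IDF variants (the exact formula used in the study is not stated here), and a binary positive/negative labelling; with more classes, precision and recall would be computed per class and averaged. The SVM and NB training step itself is omitted; libraries such as scikit-learn provide `TfidfVectorizer`, `LinearSVC`, and `MultinomialNB` for that purpose, though which tools the study used is not stated.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Weight each term as raw term frequency times ln(N / df).

    docs: list of token lists (e.g. the output of pre-processing).
    Returns one {term: weight} dict per document. Terms that occur
    in every document get weight 0 under this variant.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n_docs / df[term])
         for term, count in Counter(doc).items()}
        for doc in docs
    ]


def confusion_matrix(y_true, y_pred, positive="positive"):
    """Return (TP, FP, FN, TN), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def scores(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    docs = [["keren", "juri"], ["keren", "bagus"], ["tidak_adil", "juri"]]
    print(tf_idf(docs))
    y_true = ["positive", "positive", "negative", "negative", "positive"]
    y_pred = ["positive", "negative", "negative", "positive", "positive"]
    print(scores(*confusion_matrix(y_true, y_pred)))
```

For the made-up labels in the example, the counts are TP = 2, FP = 1, FN = 1, TN = 1, so accuracy is 3/5 = 0.6 and precision, recall, and F1 are each 2/3.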