Comparison of Scenario Pre-processing Performance on Support Vector Machine and Naïve Bayes Algorithms for Sentiment Analysis

  • Nabila Valinka Pusean, Informatics Department, Universitas Pembangunan Nasional “Veteran” Yogyakarta, Yogyakarta
  • Novrido Charibaldi, Informatics Department, Universitas Pembangunan Nasional “Veteran” Yogyakarta, Yogyakarta
  • Budi Santosa, Informatics Department, Universitas Pembangunan Nasional “Veteran” Yogyakarta, Yogyakarta
Keywords: Sentiment Analysis, Preprocessing, Support Vector Machines, Naïve Bayes

Abstract

Television shows are assessed by ratings, but a complete assessment also requires public opinion, which can be captured through sentiment analysis. Pre-processing is an essential step in sentiment analysis because public opinion text often contains informal or non-standard writing. This study compares two pre-processing scenarios to find which gives the best performance for Support Vector Machine (SVM) and Naïve Bayes (NB) in sentiment analysis of the television show X Factor Indonesia. The research stages are literature study, problem analysis, design, data collection, pre-processing with two scenarios, term weighting with TF-IDF, classification with SVM and NB, and evaluation of accuracy from the confusion matrix. The results show that the best performance is achieved with the complete pre-processing scenario, which consists of case folding, emoji removal, cleansing, repeated-character removal, word normalization, negation handling, stopword removal, stemming, and tokenization. This scenario reaches an accuracy of 79.44% with the SVM algorithm. SVM with complete pre-processing performs best in terms of accuracy, precision, recall, and F1-score.
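
To make the complete pre-processing scenario more concrete, the following is a minimal Python sketch of such a pipeline, not the authors' implementation. The abstract names the steps but does not define them, so each step's behavior here is an assumption. The dictionaries NORMALIZATION, STOPWORDS, and NEGATIONS are hypothetical toy examples, tokenization is done earlier than in the listed order for convenience, and stemming is left as a pluggable function that defaults to doing nothing.

```python
import re

# Hypothetical toy resources for illustration only; the study's actual
# slang dictionary, stopword list, and negation words are not given here.
NORMALIZATION = {"gk": "tidak", "bgt": "banget", "bgs": "bagus"}
STOPWORDS = {"yang", "di", "dan", "ini", "itu"}
NEGATIONS = {"tidak", "bukan", "jangan"}

# Characters in some common emoji and symbol code-point ranges (not exhaustive).
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def preprocess(text, stem=lambda token: token):
    """Return a list of cleaned tokens for one opinion text.

    `stem` is a placeholder for an Indonesian stemmer (assumption);
    by default tokens are returned unstemmed.
    """
    text = text.lower()                                # case folding
    text = EMOJI.sub(" ", text)                        # emoji removal
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text)  # cleansing: URLs, mentions, hashtags
    text = re.sub(r"[^a-z\s]", " ", text)              # cleansing: digits, punctuation
    text = re.sub(r"(.)\1{2,}", r"\1", text)           # collapse runs of 3+ identical characters
    tokens = text.split()                              # tokenization on whitespace
    tokens = [NORMALIZATION.get(t, t) for t in tokens] # word normalization

    # Negation handling: attach a negation word to the token that follows it.
    merged = []
    i = 0
    while i < len(tokens):
        if tokens[i] in NEGATIONS and i + 1 < len(tokens):
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1

    kept = [t for t in merged if t not in STOPWORDS]   # stopword removal
    return [stem(t) for t in kept]                     # stemming (identity by default)


sample = "Penampilan yang BAGUUUS bgt \U0001F60D http://t.co/x @xfactor gk jelek!!!"
print(preprocess(sample))
```

The resulting token lists would then be weighted with TF-IDF and passed to the SVM and NB classifiers. Merging a negation word with the following token is only one possible way to handle negation, chosen here because it keeps the negated term distinct in the TF-IDF vocabulary.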

 

Published
2023-01-28
How to Cite
Pusean, N. V., Charibaldi, N., & Santosa, B. (2023). Comparison of Scenario Pre-processing Performance on Support Vector Machine and Naïve Bayes Algorithms for Sentiment Analysis. Inform : Jurnal Ilmiah Bidang Teknologi Informasi Dan Komunikasi, 8(1), 57-63. https://doi.org/10.25139/inform.v8i1.5667
Section
Articles