Comparison of the Effect of Word Normalization on Naïve Bayes Classifier and K-Nearest Neighbor Methods for Sentiment Analysis

  • Novrido Charibaldi, Informatics Department, Universitas Pembangunan Nasional Veteran Yogyakarta
  • Atania Harfiani, Informatics Department, Universitas Pembangunan Nasional Veteran Yogyakarta
  • Oliver Samuel Simanjuntak, Informatics Department, Universitas Pembangunan Nasional Veteran Yogyakarta
Keywords: Sentiment Analysis, Word Normalization, Naïve Bayes Classifier, K-Nearest Neighbor, BPJS Kesehatan


The pre-processing stage of sentiment analysis involves several essential steps, one of which is word normalization: converting non-standard words into standard words. However, some sentiment analysis studies skip word normalization, which can affect accuracy. This study compares the effect of word normalization on the Naïve Bayes Classifier and K-Nearest Neighbor methods for sentiment analysis of public opinion on the Social Security Administrator for Health (BPJS Kesehatan). The steps are data collection, labeling, pre-processing under two scenarios (with and without word normalization), term weighting with TF-IDF, classification with the Naïve Bayes Classifier and K-Nearest Neighbor, and evaluation using a confusion matrix. The highest accuracy, 87.14%, was obtained by the Naïve Bayes Classifier in the first scenario, which applies word normalization at the pre-processing stage. These results show that the Naïve Bayes Classifier with word normalization produces better accuracy, precision, recall, and F1-score.
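The pipeline described in the abstract can be sketched in plain Python using only the standard library. This is an illustrative sketch under stated assumptions, not the authors' implementation. The slang dictionary `NORMALIZATION_DICT`, the toy tweets in `TRAIN`/`TEST`, the smoothed IDF formula, multinomial Naïve Bayes over TF-IDF weights, cosine-similarity K-Nearest Neighbor with `k=3`, and all function names are hypothetical choices. Other common pre-processing steps such as stopword removal and stemming are omitted. The `normalize` flag switches between the two scenarios.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical slang dictionary (illustrative; not the study's actual lexicon).
NORMALIZATION_DICT = {"gak": "tidak", "ga": "tidak", "tdk": "tidak",
                      "yg": "yang", "bgt": "sangat", "ribet": "rumit"}

def preprocess(text, normalize=True):
    """Case folding, tokenizing and, optionally, word normalization."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if normalize:  # scenario 1: map non-standard words to standard ones
        tokens = [NORMALIZATION_DICT.get(tok, tok) for tok in tokens]
    return tokens

def fit_idf(docs):
    """Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1."""
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + len(docs)) / (1 + d)) + 1 for t, d in df.items()}

def tfidf(doc, idf):
    """Raw term frequency times IDF; terms unseen in training are dropped."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}

def train_nb(vectors, labels, alpha=1.0):
    """Multinomial Naive Bayes over TF-IDF weights with Laplace smoothing."""
    vocab = {t for v in vectors for t in v}
    weight = defaultdict(Counter)
    for v, y in zip(vectors, labels):
        weight[y].update(v)  # sum the TF-IDF weight of each term per class
    model = {}
    for y, n_y in Counter(labels).items():
        total = sum(weight[y].values()) + alpha * len(vocab)
        cond = {t: math.log((weight[y][t] + alpha) / total) for t in vocab}
        model[y] = (math.log(n_y / len(labels)), cond)
    return model

def predict_nb(model, vec):
    """Pick the class with the highest log prior + weighted log likelihood."""
    def score(y):
        log_prior, cond = model[y]
        return log_prior + sum(w * cond[t] for t, w in vec.items() if t in cond)
    return max(model, key=score)

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict_knn(train_vecs, train_labels, vec, k=3):
    """Majority vote among the k training documents most similar to vec."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(vec, pair[0]), reverse=True)
    return Counter(y for _, y in ranked[:k]).most_common(1)[0][0]

def evaluate(y_true, y_pred, positive="positive"):
    """Accuracy, precision, recall and F1 from a binary confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

def run_scenario(train_texts, train_labels, test_texts, normalize, k=3):
    """Pre-process, weight with TF-IDF, then classify with NB and KNN."""
    train_docs = [preprocess(t, normalize) for t in train_texts]
    idf = fit_idf(train_docs)  # fitted on training documents only
    train_vecs = [tfidf(d, idf) for d in train_docs]
    test_vecs = [tfidf(preprocess(t, normalize), idf) for t in test_texts]
    model = train_nb(train_vecs, train_labels)
    nb_pred = [predict_nb(model, v) for v in test_vecs]
    knn_pred = [predict_knn(train_vecs, train_labels, v, k) for v in test_vecs]
    return nb_pred, knn_pred

# Hypothetical toy tweets for illustration only (not the study's dataset).
TRAIN = [("pelayanan bpjs bagus dan cepat", "positive"),
         ("antrian lama bgt, ribet", "negative"),
         ("pelayanan yg ramah", "positive"),
         ("gak puas, pelayanan lambat", "negative")]
TEST = [("pelayanan cepat dan ramah", "positive"),
        ("ribet bgt gak jelas", "negative")]

if __name__ == "__main__":
    texts, labels = zip(*TRAIN)
    test_texts, test_labels = zip(*TEST)
    for normalize in (True, False):  # scenario 1 (normalized) vs. scenario 2
        nb_pred, knn_pred = run_scenario(texts, labels, test_texts, normalize)
        print(normalize, evaluate(test_labels, nb_pred), evaluate(test_labels, knn_pred))
```

In each scenario, the IDF table is fitted only on the training tweets. The two runs therefore differ solely in whether `preprocess` replaces non-standard tokens. The metrics printed for four toy tweets say nothing about the 87.14% accuracy reported above, which comes from the study's own BPJS Kesehatan data.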


How to Cite
Charibaldi, N., Harfiani, A., & Samuel Simanjuntak, O. (2023). Comparison of the Effect of Word Normalization on Naïve Bayes Classifier and K-Nearest Neighbor Methods for Sentiment Analysis. Inform: Jurnal Ilmiah Bidang Teknologi Informasi dan Komunikasi, 9(1), 25–31.