Topic Modeling of the 2024 Election Using the BERTopic Method on Detik.com News Articles


Abstract
Indonesia's 2024 simultaneous general elections are dominated by the participation of young people, especially Generation Z and Millennials, who seek political information primarily through the Internet. This highlights the crucial role of digital media in shaping public opinion. Detik.com actively reports on the 2024 elections, as evidenced by its dedicated election subchannel. However, the subchannel does not categorize articles by topic, which makes it difficult for readers to find in-depth information, and tracking and analyzing the large volume of news articles published daily is a significant challenge. This study applies topic modeling, specifically the BERTopic method, to analyze topics related to the 2024 elections in Detik.com news articles. The dataset was scraped from the Detik.com election subchannel and covers September 1, 2023, to February 14, 2024, totaling 15,019 articles. Text preprocessing consists of case folding, cleaning, tokenizing, and stopword removal. The BERTopic pipeline uses the sentence-transformers model "distiluse-base-multilingual-cased-v1" for embeddings, UMAP for dimensionality reduction, K-Means for clustering with k = 5 selected by the Elbow method, CountVectorizer as the tokenizer, and c-TF-IDF as the weighting scheme. The Silhouette Score of 0.566 and the Silhouette plot indicate that K-Means with k = 5 produces good clustering with clear inter-cluster distances. Among the other evaluations, the SSE of 70223.257 gives an overview of the cluster distribution; the Davies-Bouldin Index of 0.758 indicates relatively good separation between clusters and good closeness within clusters; the Calinski-Harabasz Index of 20083.489 indicates compact, well-separated clusters; and the Dunn Index of 0.003 points to outliers that cause some clusters to overlap and lack clear separation.
Taken together, the evaluation results indicate that K-Means with k = 5 yields good clustering. The model achieves an average topic coherence of 0.0902 and produces five main topics in the 2024 election news on Detik.com: topic 0, presidential and vice presidential candidates (5,215 articles), represented by the words 'ganjar', 'prabowo', 'anies' and 'imin'; topic 1, the general election and related surveys (3,191 articles), represented by the words '2024', 'pemilu', 'pilpres' and 'suara'; topic 2, news about President Joko Widodo (2,604 articles); topic 3, news about the presidential and vice presidential debates (2,046 articles), represented by the words 'presiden', 'jokowi', 'demokrat' and 'politik'; and topic 4, news about Gibran Rakabuming Raka and related issues (1,963 articles), represented by the words 'raka', 'rakabuming', 'nomor' and 'urut'. With these results, readers can gain insight into the most discussed issues and the attention given to key figures in the 2024 election news on the Detik.com news portal.
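The preprocessing steps and the c-TF-IDF weighting named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the stopword list, the toy sentences, the cluster labels, and the helper names (`preprocess`, `c_tf_idf`, `top_terms`) are all hypothetical, and the weight is assumed to follow the form described for BERTopic, tf(t, c) * log(1 + A / f(t)), where A is the average number of words per cluster and f(t) is the frequency of term t across all clusters.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, tiny Indonesian stopword list used only for illustration.
STOPWORDS = {"dan", "di", "yang", "untuk", "dari", "ke", "pada", "dengan"}


def preprocess(text):
    """Apply case folding, cleaning, tokenizing, and stopword removal."""
    text = text.lower()                        # case folding
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # cleaning: keep letters and digits
    tokens = text.split()                      # tokenizing on whitespace
    return [t for t in tokens if t not in STOPWORDS]


def c_tf_idf(docs, labels):
    """Return {cluster: {term: weight}} with weight = tf * log(1 + A / f_t)."""
    per_cluster = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        # All documents of a cluster are treated as one long document.
        per_cluster[label].update(preprocess(doc))

    total = Counter()                          # f_t: term frequency over all clusters
    for counts in per_cluster.values():
        total.update(counts)
    avg_words = sum(total.values()) / len(per_cluster)  # A: average words per cluster

    weights = {}
    for label, counts in per_cluster.items():
        n_words = sum(counts.values())
        weights[label] = {
            term: (count / n_words) * math.log(1 + avg_words / total[term])
            for term, count in counts.items()
        }
    return weights


def top_terms(weights, k=2):
    """Pick the k highest-weighted terms per cluster (ties broken alphabetically)."""
    return {
        label: [t for t, _ in sorted(w.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for label, w in weights.items()
    }


# Toy documents and cluster labels, invented for this example.
docs = [
    "Prabowo dan Ganjar bertemu di Jakarta",
    "Ganjar dan Prabowo debat",
    "Survei pemilu 2024 dirilis",
    "Hasil survei pemilu untuk pilpres",
]
labels = [0, 0, 1, 1]

result = top_terms(c_tf_idf(docs, labels))
print(result)
```

In the study itself, the cluster labels would come from K-Means applied to UMAP-reduced sentence embeddings; here they are supplied by hand so the weighting step can be read in isolation.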
Copyright (c) 2024 Dini Aryani, Ivana Lucia Kharisma, Alun Sujjada, Kamdan Kamdan

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with Inform: Jurnal Ilmiah Bidang Teknologi Informasi dan Komunikasi agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License (CC BY-SA 4.0) that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.