Analytical Comparison of Lung Cancer Classification Using K-Nearest Neighbor and Naïve Bayes Algorithms


Abstract
Lung cancer is a leading cause of death worldwide, accounting for 25% of all cancer-related deaths in 2021. It is difficult to detect in time: about a quarter of diagnosed cases show no early symptoms, and, unlike some other cancers, it cannot be seen with the naked eye. When symptoms do appear, they are often mistaken for those of other conditions such as bronchitis, asthma, or a persistent cough. Early diagnosis is pivotal for effective treatment and higher survival rates. This study therefore investigates lung cancer prediction using data mining, the process of searching large data repositories for patterns and trends that yield useful insights. Classification, which distinguishes objects by their characteristic features, is a fundamental data mining task. The study compares the K-Nearest Neighbor (KNN) and Naïve Bayes Classifier (NBC) algorithms on a dataset of 1,000 instances and 24 attributes to determine which is better suited to lung cancer classification. KNN achieves an accuracy of 98.34%, compared with 89.37% for NBC. The study concludes that KNN outperforms Naïve Bayes in lung cancer classification. These findings may improve early lung cancer detection through better classification methods and could thereby help save lives.
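The kind of comparison described in the abstract can be sketched in a few lines of Python. The sketch below is illustrative only and is not the authors' implementation: the data are synthetic (1,000 rows and 24 numeric features to mirror the stated dataset size), and the three class labels, k = 5, Euclidean distance, the Gaussian form of Naïve Bayes, and the 80/20 hold-out split are all assumptions that the abstract does not specify. Its accuracies say nothing about the results reported above.

```python
import math
import random
from collections import Counter, defaultdict


def make_synthetic(n=1000, n_features=24, seed=0):
    """Synthetic stand-in data; NOT the dataset used in the study."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randrange(3)  # three hypothetical classes (assumption)
        # Each class is centred at a different value on every feature.
        X.append([rng.gauss(2.0 * label, 1.0) for _ in range(n_features)])
        y.append(label)
    return X, y


def knn_predict(X_train, y_train, x, k=5):
    """Majority vote among the k training rows closest to x (Euclidean)."""
    nearest = sorted(zip(X_train, y_train),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def fit_gaussian_nb(X_train, y_train):
    """Estimate a log prior plus per-feature mean and variance per class."""
    by_class = defaultdict(list)
    for row, label in zip(X_train, y_train):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(rows) for col in columns]
        # Small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (math.log(len(rows) / len(X_train)), means, variances)
    return model


def nb_predict(model, x):
    """Pick the class with the highest log posterior under independence."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))
    return max(model, key=log_posterior)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


X, y = make_synthetic()
split = int(0.8 * len(X))  # assumed 80/20 hold-out split
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

knn_acc = accuracy(y_test, [knn_predict(X_train, y_train, x) for x in X_test])
nb_model = fit_gaussian_nb(X_train, y_train)
nb_acc = accuracy(y_test, [nb_predict(nb_model, x) for x in X_test])

print(f"KNN accuracy:         {knn_acc:.4f}")
print(f"Naive Bayes accuracy: {nb_acc:.4f}")
```

On real data, the choice of k, the distance metric, feature scaling, and the evaluation protocol would all affect how the two algorithms compare, so any such sketch would need to be adapted to the actual dataset and preprocessing.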
Copyright (c) 2024 Yunisa Darmayanti, Fitri Marisa, Aviv Yuniar Rahman

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with Inform: Jurnal Ilmiah Bidang Teknologi Informasi dan Komunikasi agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License (CC BY-SA 4.0) that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.