K-Nearest Neighbor Method for Early Detection of Diabetes Patients Based on Symptoms and Clinical Data

  • Nindynar Rikatsih Informatics Department, Institut Teknologi, Sains, dan Kesehatan RS.DR. Soepraoen Kesdam V/BRW, Malang
  • Mochammad Anshori Informatics Department, Institut Teknologi, Sains, dan Kesehatan RS.DR. Soepraoen Kesdam V/BRW, Malang
  • Risqy Siwi Pradini Informatics Department, Institut Teknologi, Sains, dan Kesehatan RS.DR. Soepraoen Kesdam V/BRW, Malang
  • Faurika Faurika Informatics Department, Institut Teknologi, Sains, dan Kesehatan RS.DR. Soepraoen Kesdam V/BRW, Malang
Keywords: Classification Techniques, Early Detection of Diabetes, K-Nearest Neighbor Method, Clinical Data, Symptoms Data

Abstract

Diabetes is a chronic disease that often goes undetected and develops quickly. It can trigger other chronic conditions such as kidney failure and heart disease. Early detection is necessary so that patients can begin treatment before the disease becomes more severe. Various health examinations can detect diabetes, but they require action by a medical expert and cannot be carried out by just anyone. In addition, examination costs are often unaffordable. This research applies a data mining method, k-Nearest Neighbor (KNN), to the early detection of diabetes based on disease symptoms and patient clinical data. KNN classifies each patient's symptoms and clinical data into one of two classes, diabetes or non-diabetes, by calculating the Euclidean distance between test data and training data. The results show that lower k-values yield higher accuracy. However, accuracy at low k-values alone is insufficient to draw conclusions about the performance of KNN for early diabetes detection. High accuracy at low k-values may indicate overfitting, meaning the model does not generalize well: with a low k-value, the model sees patterns from only one or a few neighbors, so the overall pattern of the data is not captured. Conversely, a k-value that is too high risks underfitting, where the model is too general to be reliable. To address these issues, this research used k-fold cross-validation, which helps guard against overfitting in the constructed KNN model, with k-fold=10 and k-fold=20. The analysis examined the accuracy obtained for each k-value and each k-fold setting. The higher the k-fold value, the higher the accuracy KNN produced; in contrast, the higher the k-value in KNN, the lower the accuracy. The KNN method applied in this research achieved an accuracy of 98.2692%, with higher precision than recall. These findings suggest that KNN can be an effective and efficient tool for early diabetes detection.
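
The procedure described in the abstract (KNN with Euclidean distance, evaluated by k-fold cross-validation and summarized with accuracy, precision, and recall) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the dataset, features, or preprocessing, so the two-feature synthetic data and the helper names (euclidean, knn_predict, cross_validate, make_toy_data) are assumptions made only for this example.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training rows closest to `query`."""
    ranked = sorted(zip(train_X, train_y), key=lambda row: euclidean(row[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def cross_validate(X, y, k, n_folds, seed=0):
    """Per-fold accuracy, plus precision and recall for class 1 pooled over folds.

    Assumes len(X) >= n_folds so that no fold is empty.
    """
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    folds = [order[i::n_folds] for i in range(n_folds)]
    accuracies = []
    tp = fp = fn = 0
    for held_out in folds:
        held = set(held_out)
        train_X = [X[i] for i in order if i not in held]
        train_y = [y[i] for i in order if i not in held]
        correct = 0
        for i in held_out:
            pred = knn_predict(train_X, train_y, X[i], k)
            correct += int(pred == y[i])
            tp += int(pred == 1 and y[i] == 1)
            fp += int(pred == 1 and y[i] == 0)
            fn += int(pred == 0 and y[i] == 1)
        accuracies.append(correct / len(held_out))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracies, precision, recall


def make_toy_data(n_per_class=20, seed=42):
    """Synthetic stand-in data (an assumption): 0 = non-diabetes, 1 = diabetes."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, 0.0), (1, 5.0)):
        for _ in range(n_per_class):
            X.append([center + rng.uniform(-1, 1), center + rng.uniform(-1, 1)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    # Compare several k-values under 10 and 20 folds, as in the abstract.
    for n_folds in (10, 20):
        for k in (1, 3, 5):
            accs, precision, recall = cross_validate(X, y, k, n_folds)
            mean_acc = sum(accs) / len(accs)
            print(f"folds={n_folds} k={k} acc={mean_acc:.3f} "
                  f"precision={precision:.3f} recall={recall:.3f}")
```

On real patient data the features would normally be scaled before computing distances, since Euclidean distance is sensitive to differing units; the abstract does not say whether the study did this.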

Published
2024-08-07
How to Cite
Rikatsih, N., Anshori, M., Siwi Pradini, R., & Faurika, F. (2024). K-Nearest Neighbor Method for Early Detection of Diabetes Patients Based on Symptoms and Clinical Data. Inform : Jurnal Ilmiah Bidang Teknologi Informasi Dan Komunikasi, 9(2), 187-193. https://doi.org/10.25139/inform.v9i2.8582
Section
Articles