Improving Diabetes Prediction Performance Using Random Forest through a Combination of Threshold Tuning and Class Weight Balancing
Keywords:
Diabetes Prediction, Feature Importance, Machine Learning, Random Forest, Threshold Tuning
Abstract
Diabetes is a chronic metabolic disease characterized by elevated blood glucose levels and may lead to serious complications if not detected early. This study aims to develop a diabetes prediction model using the Random Forest algorithm on a diabetes prediction dataset consisting of 100,000 records, which became 96,146 records after duplicate removal. The research stages included data cleaning, class distribution analysis, preprocessing using StandardScaler for numerical features and OneHotEncoder for categorical features, Random Forest model training with balanced class weighting, and comprehensive performance evaluation. Model evaluation was conducted using accuracy, precision, recall, F1-score, ROC-AUC, confusion matrix, ROC curve, precision-recall curve, calibration curve, learning curve, PCA visualization, and feature importance analysis. Experimental results showed that the Random Forest model achieved an accuracy of 96.91%, precision of 94.15%, recall of 69.28%, F1-score of 79.82%, and ROC-AUC of 96.39% at the default threshold of 0.5. Threshold tuning indicated that the optimal threshold was 0.75, improving the F1-score to 80.80% and accuracy to 97.16%. Feature importance analysis revealed that HbA1c_level, blood_glucose_level, age, and BMI were the most influential factors in diabetes prediction. The findings indicate that Random Forest combined with threshold optimization provides high predictive performance and has strong potential as a machine learning-based approach for early diabetes detection.
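The threshold tuning step described in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: the labels and probabilities below are made-up toy values, and the helper names (`f1_from_precision_recall`, `f1_at_threshold`, `tune_threshold`) are hypothetical. In practice the probabilities would presumably come from the trained Random Forest. The sketch also checks that the reported F1-score at the default threshold is consistent with the reported precision and recall.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_at_threshold(y_true, y_prob, threshold):
    """F1-score when a sample is labeled positive if its probability >= threshold."""
    tp = fp = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return f1_from_precision_recall(precision, recall)


def tune_threshold(y_true, y_prob, candidates):
    """Return (threshold, f1) with the highest F1; ties keep the lowest threshold."""
    best_threshold, best_f1 = None, -1.0
    for threshold in candidates:
        score = f1_at_threshold(y_true, y_prob, threshold)
        if score > best_f1:
            best_threshold, best_f1 = threshold, score
    return best_threshold, best_f1


# Consistency check on the abstract: precision 94.15% and recall 69.28%
# should give an F1-score of about 79.82%.
reported_f1 = f1_from_precision_recall(0.9415, 0.6928)

# Toy data (an assumption for illustration only, not taken from the study).
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.30, 0.40, 0.58, 0.60, 0.55, 0.80, 0.90]

# Candidate thresholds 0.05, 0.10, ..., 0.95.
candidates = [round(0.05 * k, 2) for k in range(1, 20)]
best_threshold, best_f1 = tune_threshold(y_true, y_prob, candidates)

print(f"F1 implied by reported precision and recall: {reported_f1:.4f}")
print(f"Best threshold on toy data: {best_threshold:.2f} (F1 = {best_f1:.2f})")
```

On imbalanced data, raising the threshold trades recall for precision, so the search should be run on a validation split rather than the test set to avoid an optimistic estimate.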





