Cervical Cancer Risk Prediction with Machine Learning: Analysis of Cervical Cancer Risk Classification Dataset

Authors

  • Yoga Paripurna, Universitas Negeri Semarang
  • Irwan Budiono, Universitas Negeri Semarang

DOI:

https://doi.org/10.32583/pskm.v15i4.3798

Keywords:

cervical cancer, early detection, machine learning, decision tree, SVM, risk prediction, screening, Schiller test

Abstract

Cervical cancer remains one of the leading causes of cancer death among women, especially in developing countries. Early detection through screening is essential to reduce morbidity and mortality, but the main challenge is identifying high-risk individuals efficiently. This study builds a machine learning model that classifies cervical cancer biopsy results from available risk factors. The public "Cervical Cancer Risk Classification" dataset includes demographic data, sexual behavior, contraceptive use, and medical test results. Three machine learning algorithms were applied: Logistic Regression, Decision Tree, and Support Vector Machine (SVM). Models were evaluated with accuracy, precision, recall, F1 score, and the Matthews Correlation Coefficient (MCC). The Decision Tree model performed best, with an F1 score of 0.956 and an MCC of 0.639. Significant contributing risk factors were age, age at first sexual intercourse, Schiller test results, cytology, and number of pregnancies. Machine learning has considerable potential to improve the effectiveness of cervical cancer screening. Data balancing techniques and ensemble methods are recommended to increase accuracy in detecting positive cases.
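The evaluation metrics named in the abstract can all be derived from a binary confusion matrix. The sketch below is a minimal illustration, not the authors' code: the function name `binary_metrics` and the counts passed to it are hypothetical and are not taken from the study. It shows how MCC, which uses all four cells of the matrix, can sit well below accuracy when positive cases are rare.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, F1, and MCC from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard each ratio against a zero denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical, imbalanced example (NOT data from the study):
# 1000 cases, of which 60 are truly positive.
example = binary_metrics(tp=30, fp=10, fn=30, tn=930)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy is 0.96 even though half of the positive cases are missed, which is why metrics such as recall and MCC are reported alongside accuracy for screening data.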

Published

2025-06-01

How to Cite

Paripurna, Y., & Budiono, I. (2025). Cervical Cancer Risk Prediction with Machine Learning: Analysis of Cervical Cancer Risk Classification Dataset. Jurnal Ilmiah Permas: Jurnal Ilmiah STIKES Kendal, 15(4), 799–804. https://doi.org/10.32583/pskm.v15i4.3798
