Cervical Cancer Risk Prediction with Machine Learning: Analysis of Cervical Cancer Risk Classification Dataset
DOI:
https://doi.org/10.32583/pskm.v15i4.3798
Keywords:
cervical cancer, early detection, machine learning, decision tree, SVM, risk prediction, screening, Schiller test
Abstract
Cervical cancer remains one of the leading causes of cancer death among women, especially in developing countries. Early detection through screening is essential to reduce morbidity and mortality, but the main challenge is identifying high-risk individuals efficiently. Objectives: This study aims to build a machine learning model that classifies cervical cancer biopsy results based on available risk factors. The public "Cervical Cancer Risk Classification" dataset includes demographic data, sexual behavior, contraceptive use, and medical test results. Three machine learning algorithms were applied: Logistic Regression, Decision Tree, and Support Vector Machine (SVM). The models were evaluated using accuracy, precision, recall, F1 score, and the Matthews Correlation Coefficient (MCC). The Decision Tree model performed best, with an F1 score of 0.956 and an MCC of 0.639. The main contributing risk factors were age, age at first sexual intercourse, Schiller test result, cytology, and number of pregnancies. These findings indicate that machine learning has strong potential to improve the effectiveness of cervical cancer screening. Data balancing techniques and ensemble methods are recommended to improve the detection of positive cases.
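The abstract reports both an F1 score and an MCC. The sketch below is illustrative only, not the study's code: it computes the five listed metrics from a binary confusion matrix, taking a positive biopsy result as the positive class. The helper names `safe_div` and `binary_metrics` and the confusion-matrix counts are hypothetical. Because the abstract does not state how F1 was averaged, the support-weighted F1 shown here is an assumption. On imbalanced data, a support-weighted F1 can stay high while the positive-class F1 and the MCC are much lower, which is one reason to report MCC alongside F1.

```python
import math


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero (hypothetical helper)."""
    return num / den if den else 0.0


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute evaluation metrics from a 2x2 confusion matrix.

    Assumption: the positive class is a positive biopsy result.
    """
    total = tp + fp + fn + tn
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    # Positive-class F1: equivalent to the harmonic mean of precision and recall.
    f1_pos = safe_div(2 * tp, 2 * tp + fp + fn)
    # Negative-class F1: same formula with the roles of the classes swapped.
    f1_neg = safe_div(2 * tn, 2 * tn + fn + fp)
    # Support-weighted F1 (assumed averaging scheme): each class's F1 is
    # weighted by its number of true instances, so the majority class dominates.
    f1_weighted = safe_div((tp + fn) * f1_pos + (tn + fp) * f1_neg, total)
    # Matthews Correlation Coefficient uses all four cells of the matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, denom)
    return {
        "accuracy": safe_div(tp + tn, total),
        "precision": precision,
        "recall": recall,
        "f1_positive": f1_pos,
        "f1_weighted": f1_weighted,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts for an imbalanced test set (10 positives, 162 negatives);
    # these are NOT the study's results.
    for name, value in binary_metrics(tp=6, fp=3, fn=4, tn=159).items():
        print(f"{name:12s} {value:.3f}")
```

With these hypothetical counts, the positive-class F1 is about 0.63 and the MCC about 0.61, while the support-weighted F1 is about 0.96, because the 162 negative cases dominate the weighting.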
Copyright (c) 2025 Jurnal Ilmiah Permas: Jurnal Ilmiah STIKES Kendal

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.