ChatGPT 4.0 and ChatGPT 3.5 in Answering Medical Questions
DOI: https://doi.org/10.32583/keperawatan.v17i2.3011
Keywords: ChatGPT, artificial intelligence, medical questions
Abstract
Artificial intelligence (AI) is a field of computer science devoted to solving cognitive problems commonly associated with human intelligence. Chat Generative Pre-Trained Transformer (ChatGPT) is a natural language processing model developed by OpenAI. ChatGPT has the potential to help answer medical questions with a high degree of accuracy. This study aimed to compare the accuracy of ChatGPT-4 and ChatGPT-3.5 in answering medical questions. The study was designed as a meta-analysis and systematic review following the PRISMA flow diagram. Primary studies were retrieved from several indexing databases, namely PubMed, Google Scholar, and BASE, using the search strings "Chat GPT AND Medical Question", "Chat GPT AND Accuracy", or "Chat GPT AND Medical Question AND Accuracy". The inclusion criteria were published articles with a cross-sectional study design, from January 2023 to August 2024. Statistical analysis was performed in the meta-analysis program RevMan 5.4.1 using fixed-effect and random-effects approaches, with results presented as funnel plots and forest plots. The results show that ChatGPT-4 answers medical questions more accurately: its odds of a correct answer were 3.07 times higher than those of ChatGPT-3.5 (OR: 3.07; 95% CI: 2.20-4.30; p<0.0001), a statistically significant difference. The forest plot also shows high heterogeneity of effect estimates across studies (I² = 81%), and the funnel plot indicates publication bias that tends to overestimate the true effect. Overall, the meta-analysis of 9 studies shows that ChatGPT-4 is more accurate than ChatGPT-3.5 in answering medical questions.
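To make the pooled statistics above concrete, the following is a minimal sketch of how an inverse-variance (fixed-effect) pooled odds ratio, its 95% CI, and the I² heterogeneity statistic are computed; RevMan 5.4.1 performs an equivalent computation internally. The 2x2 counts below are hypothetical placeholders for illustration only, not the data of the nine included studies.

# Minimal sketch of inverse-variance meta-analysis of odds ratios,
# approximating the fixed-effect computation done by RevMan 5.4.1.
# The 2x2 counts are HYPOTHETICAL illustrative values, not the data
# of the nine studies included in this meta-analysis.
import math

# (correct_gpt4, wrong_gpt4, correct_gpt35, wrong_gpt35) per study
studies = [
    (80, 20, 60, 40),   # hypothetical study 1
    (45, 15, 30, 30),   # hypothetical study 2
    (70, 30, 50, 50),   # hypothetical study 3
]

log_ors, weights = [], []
for a, b, c, d in studies:
    log_or = math.log((a * d) / (b * c))   # log odds ratio per study
    var = 1 / a + 1 / b + 1 / c + 1 / d    # Woolf variance of the log OR
    log_ors.append(log_or)
    weights.append(1 / var)                # inverse-variance weight

# Fixed-effect pooled estimate and 95% confidence interval
pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
se = math.sqrt(1 / sum(weights))
ci_low, ci_high = pooled - 1.96 * se, pooled + 1.96 * se

# Cochran's Q and the I² heterogeneity statistic
q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
df = len(studies) - 1
i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

print(f"Pooled OR: {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(ci_low):.2f}-{math.exp(ci_high):.2f}), "
      f"I^2 = {i_squared:.0f}%")

An I² above roughly 75% (as with the 81% reported here) is conventionally read as high heterogeneity, which is why the study reports both fixed-effect and random-effects estimates.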