ChatGPT in Surgical Practice
The introduction of artificial intelligence (AI) is transforming many fields, and surgery and clinical practice are among its most exciting areas of application. Machine learning (ML), a subset of AI, has already been used in clinical practice to predict patient mortality, disease progression, and surgical outcomes. ML and AI also have significant potential to augment established clinical tools such as electronic health records and imaging modalities like magnetic resonance imaging (MRI) and computed tomography (CT).1 Another notable development in AI is the emergence of realistic natural language models, marked by the release of ChatGPT, an AI-powered chatbot developed by OpenAI, in late 2022. Through advanced natural language processing (NLP) techniques, ChatGPT can interpret and respond to human language in a conversational, approachable manner. Applying ChatGPT in surgical practice has the potential to enhance decision making, improve physician-patient communication, and streamline surgical workflows.2 However, because it can produce inaccurate output, its applications must be closely controlled and monitored, especially in medicine.
ChatGPT is a useful tool for drafting research manuscripts, summarizing scientific papers, and correcting errors in written communication. It has also been shown to help generate statistical code for dataset analysis, among other tasks.3 One study assessed how well the GPT-3.5 and GPT-4 models behind ChatGPT understand complex surgical information by measuring their accuracy on the Korean general surgery board examination, the final assessment before a surgical resident in Korea officially completes training and can move on to fully independent practice. While the earlier model achieved an overall accuracy of 46.8%, GPT-4 improved substantially, reaching 76.4%.4 More broadly, AI may be able to supplement learning for surgical residents: systems that recognize intraoperative steps and track a trainee's progress could provide real-time, personalized feedback to optimize learning, especially during virtual simulation training.5
By analyzing images such as X-rays or CT scans, advanced computing tools may be able to generate three-dimensional models of the surgical area. Such models could be highly useful in preoperative planning, and during surgery, similar tools could segment and label anatomical structures to assist the surgeon. GPT-4 can also accept image inputs, which expands ChatGPT's potential applications, although it must compete with dedicated computer vision and modeling approaches. In plastic surgery, ChatGPT has been reported to generate operation notes much faster than human surgeons, with similar levels of satisfaction from physicians and patients.6
Despite ChatGPT's promising applications and its potential to ease physician workload and stress, AI still faces many limitations in surgical settings. ChatGPT is prone to producing plausible-sounding but factually incorrect output, often referred to as “artificial hallucinations,” a well-documented example of which is the fabrication of references in medical writing.7 Such misleading output can have severe consequences when it directly affects patient health. In an observational study of 30 ChatGPT-generated medical papers containing 115 references, 55 of the references were entirely fabricated, and another 53 were authentic but inaccurate or irrelevant.8 Another significant concern is how sensitive patient data entered into ChatGPT is stored and used. Any data leakage, however minor, can have severe ethical consequences and create ambiguities with potential legal ramifications.
ChatGPT is an exciting prospect that offers innovative ways to enhance surgical research, education, and clinical practice. However, because of its many limitations, the current consensus is that ChatGPT should not be used without human oversight. Current evidence suggests that its surgical expertise remains inferior to that of experienced surgeons and trained residents. Users should understand that the model functions best as a supportive tool, used under proper guidance and alongside human expertise.
References
1. Elfanagely, Omar, et al. “Machine Learning and Surgical Outcomes Prediction: A Systematic Review.” Journal of Surgical Research, vol. 264, Aug. 2021, pp. 346–61. https://doi.org/10.1016/j.jss.2021.02.045
2. Bektaş, Mustafa, et al. “ChatGPT in Surgery: A Revolutionary Innovation?” Surgery Today, vol. 54, no. 8, Aug. 2024, pp. 964–71. https://doi.org/10.1007/s00595-024-02800-6
3. Sallam, Malik. “ChatGPT Utility in Healthcare Education, Research, and Practice: Systematic Review on the Promising Perspectives and Valid Concerns.” Healthcare, vol. 11, no. 6, Jan. 2023, p. 887. https://doi.org/10.3390/healthcare11060887
4. Oh, Namkee, et al. “ChatGPT Goes to the Operating Room: Evaluating GPT-4 Performance and Its Potential in Surgical Education and Training in the Era of Large Language Models.” Annals of Surgical Treatment and Research, vol. 104, no. 5, Apr. 2023, pp. 269–73. https://doi.org/10.4174/astr.2023.104.5.269
5. Park, Jay J., et al. “The Role of Artificial Intelligence in Surgical Simulation.” Frontiers in Medical Technology, vol. 4, Dec. 2022. https://doi.org/10.3389/fmedt.2022.1076755
6. Abdelhady, Ahmad M., and Christopher R. Davis. “Plastic Surgery and Artificial Intelligence: How ChatGPT Improved Operation Note Accuracy, Time, and Education.” Mayo Clinic Proceedings: Digital Health, vol. 1, no. 3, Sept. 2023, pp. 299–308. https://doi.org/10.1016/j.mcpdig.2023.06.002
7. Alkaissi, Hussam, and Samy I. McFarlane. “Artificial Hallucinations in ChatGPT: Implications in Scientific Writing.” Cureus, vol. 15, no. 2, Feb. 2023, p. e35179. https://doi.org/10.7759/cureus.35179
8. Bhattacharyya, Mehul, et al. “High Rates of Fabricated and Inaccurate References in ChatGPT-Generated Medical Content.” Cureus, vol. 15, no. 5, May 2023, p. e39238. https://doi.org/10.7759/cureus.39238