Introduction
Among lower extremity tendon ruptures, Achilles tendon (AT) ruptures are the most prevalent, particularly affecting individuals engaged in recreational sports, sporadic physical activity, or competitive athletics. Numerous factors, including excessive tensile load on the tendon, conditions compromising tendon integrity, and trauma to the area, contribute to AT ruptures (Shamrock, Dreyer, and Varacallo 2023). The incidence has surged in recent years, with reports indicating 40 cases per 100,000 people annually (Meulenkamp et al. 2018).
Given the increasing incidence of this condition, patients may have numerous clinical questions regarding AT rupture. Access to online resources offers patients an opportunity to educate themselves about their condition. The increasing use of artificial intelligence (AI) in healthcare provides an additional learning platform. ChatGPT (OpenAI) has increasingly been utilized to answer patient inquiries about medical diseases and related concerns (Liu, Wang, and Liu 2023).
Patients with an AT rupture can leverage this platform, although the accuracy and reliability of ChatGPT related to patient inquiries have rarely been studied. While navigating medical literature may be challenging for patients, ChatGPT is a valuable tool that can enhance patients’ medical knowledge, if proven accurate. This study aims to evaluate the effectiveness of ChatGPT in addressing patient questions related to AT rupture. Increasing evidence of AI accuracy leads us to believe that ChatGPT can serve as a valuable resource for patients with AT rupture, offering insights into common questions regarding diagnosis, treatment options, and recovery.
Methods
Patient questions regarding AT ruptures and recovery were identified by reviewing the “Frequently Asked Questions” sections of several reliable healthcare institution websites. The healthcare institution pages were all publicly available webpages that were not affiliated with the authors’ institutions. Ten of the most common and clinically relevant questions were selected. These questions were then entered into the free online ChatGPT 3.5 artificial intelligence chatbot (https://chat.openai.com/chat) on December 21, 2023. No repeated attempts were made, and no follow-up questions were asked. Furthermore, the ChatGPT account used was new and had not been used to input questions previously. Each response was recorded and critically evaluated by the senior authors for accuracy and reliability using the DISCERN Instrument (Table 1) and the JAMA Benchmark criteria. The readability of the responses was determined using the Flesch-Kincaid Grade Level (Table 2). All ChatGPT responses are included in the Appendix (Gilmore et al. 2024; Eng et al. 2024).
This study was Institutional Review Board (IRB) exempt.
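For reference, the Flesch-Kincaid Grade Level is computed as 0.39 x (words per sentence) + 11.8 x (syllables per word) - 15.59. The Python sketch below illustrates the calculation. It is not the tool used in this study: the function names and the syllable counter are assumptions made here for illustration, and the rough vowel-group heuristic may give results that differ from those of dedicated readability software.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables with a vowel-group heuristic (an assumption, not a dictionary lookup)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing "e" as silent (e.g. "rupture"), except in "-le" endings (e.g. "ankle").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Return the Flesch-Kincaid Grade Level of a plain-text passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    sample = "The Achilles tendon connects the calf muscles to the heel bone."
    print(round(flesch_kincaid_grade(sample), 1))
```

Longer sentences and words with more syllables both raise the grade level, which is why responses dense with clinical terminology tend to score at a college reading level.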
Results
Of the ten responses, the average DISCERN score was 39 (range: 26-46), indicating overall poor responses. One response was rated as “Very Poor,” four were “Poor,” and five were “Fair.” The JAMA Benchmark Criteria score was zero for all responses. Lastly, the average Flesch-Kincaid Grade Level was 13.4 (range: 11.1-15.0), which corresponds to a college graduate reading level (Figure 1). Cohen’s kappa for inter-rater agreement between the two authors was 0.872, signifying near-perfect agreement.
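Cohen’s kappa measures inter-rater agreement after correcting for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the proportion expected by chance. The sketch below shows the unweighted form in Python. The rater data are hypothetical and are not the authors’ ratings, and the study does not state whether kappa was computed on rating categories or raw scores, or whether a weighted variant was used.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical category ratings for ten responses (illustration only).
    rater_1 = ["Poor", "Poor", "Fair", "Fair", "Very Poor",
               "Fair", "Poor", "Fair", "Fair", "Poor"]
    rater_2 = ["Poor", "Poor", "Fair", "Fair", "Very Poor",
               "Fair", "Fair", "Fair", "Fair", "Poor"]
    print(round(cohens_kappa(rater_1, rater_2), 3))
```

Under the commonly used Landis and Koch interpretation, values from 0.81 to 1.00 are described as almost perfect agreement, which is consistent with the interpretation of 0.872 given above.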
Question 1: What is an Achilles tendon rupture and how did it happen?
DISCERN Score: 37 (Poor)
JAMA Benchmark Criteria: 0
Flesch-Kincaid Grade Level: 13.6 (College Graduate)
Analysis: The chatbot correctly described the anatomic location of the AT as well as its action. Furthermore, the chatbot not only defined what a rupture is but also illustrated the most common method through which an AT can rupture. The answer was clearly intended for a general audience, and a balance between simplicity and complexity was achieved.
However, the response to how AT ruptures occur could benefit from further elaboration. While the provided information on the primary cause of AT rupture is correct, there was no mention of other causes of tendon rupture, such as direct trauma or chronic weakening due to various forms of tendinopathies (Shamrock, Dreyer, and Varacallo 2023).
Furthermore, the risk factors for AT rupture outlined by ChatGPT align with existing literature (Carmont 2018). The inclusion of typical symptoms of AT rupture further adds to the informative value of the response.
Question 2: How is an Achilles tendon rupture diagnosed?
DISCERN Score: 36 (Poor)
JAMA Benchmark Criteria: 0
Flesch-Kincaid Grade Level: 13.0 (College Graduate)
Analysis: The chatbot correctly explained that diagnosis of an AT rupture is reached through a combination of patient history and physical examination (Shamrock, Dreyer, and Varacallo 2023). Furthermore, the chatbot correctly described the process of obtaining a history and performing a physical examination of the foot. It specifically mentioned signs of a rupture, such as an indentation above the heel or swelling of the area (Shamrock, Dreyer, and Varacallo 2023). It also mentioned the Thompson test, which has become one of the most effective techniques to diagnose AT ruptures (Boyd et al. 2015).
When diagnosing an AT rupture, imaging studies are rarely necessary. A 2012 study found that magnetic resonance imaging (MRI) is unnecessary for diagnosing an AT rupture (Garras et al. 2012). The chatbot correctly mentioned that imaging studies are often not necessary for diagnosis (Boyd et al. 2015; Garras et al. 2012). Moreover, it mentioned the two main imaging modalities that can be used to visualize AT ruptures: X-ray and MRI (Shamrock, Dreyer, and Varacallo 2023; Garras et al. 2012). Overall, this response provided accurate information at a level that is accessible to the average person.
Question 3: How are Achilles tendon ruptures treated?
DISCERN Score: 40 (Poor)
JAMA Benchmark Criteria: 0
Flesch-Kincaid Grade Level: 13.4 (College Graduate)
Analysis: Treatment options are characterized as surgical or non-surgical. The non-surgical treatment options were correctly described by ChatGPT with the inclusion of immobilization and early weight bearing (Shamrock, Dreyer, and Varacallo 2023). Additionally, ChatGPT accurately described the open and percutaneous surgical interventions. However, it failed to include the mini-open repair approach. This technique involves a smaller incision (approximately 3 cm) and the use of interlocking stitches. It is a less invasive alternative compared to open repair, which involves a single larger incision, and a slightly more invasive option compared to percutaneous surgery, which involves numerous smaller incisions (Shamrock, Dreyer, and Varacallo 2023).
Furthermore, ChatGPT accurately conveyed crucial information for surgical and non-surgical rehabilitation management, which includes rehabilitation exercises for recovering strength and restoring the AT to its pre-injured state (Shamrock, Dreyer, and Varacallo 2023).
Question 4: What is the recovery process like for an Achilles tendon repair?
DISCERN Score: 41 (Fair)
JAMA Benchmark Criteria: 0
Flesch-Kincaid Grade Level: 14.8 (College Graduate)
Analysis: The response highlights the need for immobilization post-treatment, whether surgical or non-surgical (Shamrock, Dreyer, and Varacallo 2023). ChatGPT accurately discusses the early, mid, and late recovery processes, emphasizing the variability depending on the chosen treatment option for the AT rupture. However, it does not mention the use of hydrotherapy during the early and mid-recovery phases (Glazebrook and Rubinger 2019). Importance is also placed on the return to sports, acknowledging that continuation of exercises and normal activity may span several months (Shamrock, Dreyer, and Varacallo 2023). Although ChatGPT discusses the complications of re-rupture and stiffness during the healing process, other complications that were not mentioned may arise as well. Patients can develop elongation of the AT, resulting in a functional deficit (Shamrock, Dreyer, and Varacallo 2023). Furthermore, there is a risk of deep vein thrombosis or pulmonary embolism due to prolonged immobilization (Glazebrook and Rubinger 2019).
Question 5: Is operative treatment or non-operative treatment better for an Achilles tendon rupture?
DISCERN Score: 46 (Fair)
JAMA Benchmark Criteria: 0
Flesch-Kincaid Grade Level: 14.8 (College Graduate)
Analysis: Regarding non-operative treatment, ChatGPT accurately highlights the lower rates of complications and wound-related issues (Ochen et al. 2019). Additionally, it correctly notes the prolonged immobilization of these patients (Holm, Kjaer, and Eliasson 2015). Re-rupture rates are generally higher with non-operative treatment; however, this varies based on the use of early and late weight-bearing and accelerated functional rehabilitation (Shamrock, Dreyer, and Varacallo 2023). Furthermore, ChatGPT failed to address the cost of non-operative and operative treatments. Patients not undergoing surgery incur no surgical, anesthesia, or hospital admission fees, making this treatment option more affordable (Shamrock, Dreyer, and Varacallo 2023).
Regarding operative treatment, ChatGPT accurately mentions that surgical intervention allows for an earlier start of rehabilitation (Shamrock, Dreyer, and Varacallo 2023). It also includes the notion that patients undergoing surgery may have a faster return to sport and normal activity. However, one meta-analysis reported no significant difference in the return to work between patients treated non-operatively and operatively (Ochen et al. 2019). Other studies found that non-operative patients returned to work later (Shamrock, Dreyer, and Varacallo 2023); however, no significant difference in sport return time was found (Zhang et al. 2015). The discrepancies in the literature suggest that additional factors may contribute to the resumption of normal activity, sports, and work. Finally, ChatGPT raises concern about potential hardware irritation.
ChatGPT accurately describes the factors influencing the decision between operative and non-operative treatment, underscoring the potential benefits of operative management for younger patients and those demanding greater functionality (Yang et al. 2018). It also emphasizes the importance of good health and consideration of co-morbidities prior to surgical intervention (Shamrock, Dreyer, and Varacallo 2023). Outcomes not mentioned by ChatGPT that may differ between treatment options include range of motion, strength, functional outcome, and calf circumference (Soroceanu et al. 2012).
Question 6: Do you treat an Achilles tendon partial tear differently from a complete tear?
DISCERN Score: 43 (Fair)
JAMA Benchmark Criteria: 0
Flesch-Kincaid Grade Level: 15.0 (College Graduate)
Analysis: The chatbot response is well-organized and understandable but omits several considerations on complete versus partial AT tears and makes no mention of the lack of consensus on treatment modalities or the reliability of different diagnostic techniques (Park et al. 2020; Thevendran et al. 2013; Chiodo et al. 2010; Maffulli, Dymond, and Regine 1990). The response fails to include the varied severity of partial tears based on factors such as cross-sectional area, location, and loss of function. Furthermore, it omitted surgical repair of partial tears, which remains a viable option for select populations (Gatz, Spang, and Alfredson 2020). The response does well in emphasizing the importance of physical therapy for both surgical and non-surgical treatment, but makes no mention of the recent literature demonstrating that accelerated functional rehabilitation is superior to long-term cast immobilization and comparable to surgical intervention (Soroceanu et al. 2012). The chatbot accurately concludes that decision-making regarding appropriate treatment for each patient is multifactorial and ultimately requires prompt evaluation and development of a treatment plan for an AT injury that fits both the provider and patient preferences. For example, it states that surgical repair is recommended for younger, more active populations.
Question 7: Is platelet rich plasma (PRP) useful for an Achilles tendon rupture?
DISCERN Score: 41 (Fair)
JAMA Benchmark Criteria: 0
Flesch-Kincaid Grade Level: 14.6 (College Graduate)
Analysis: The chatbot accurately describes what PRP therapy is composed of, but it did not mention how PRP formulations can differ from one another (Kaushik and Kumaran 2020). It correctly explains that there is limited evidence, with varied methodologies and results, on PRP’s role as adjunct therapy (Shamrock, Dreyer, and Varacallo 2023; Indino, D’Ambrosi, and Usuelli 2019). It also includes a warning against monotherapy. Furthermore, the response was transparent about ChatGPT’s limited access to the latest research publications, which warrants further discussion with a specialist. The lack of access to the most recent literature in a rapidly evolving field demonstrates a definitive shortcoming of utilizing ChatGPT as a sole resource for information (Huang et al. 2023; Bagheri et al. 2023; Pretorius et al. 2023).
Question 8: What are common complications for surgical Achilles tendon repair?
DISCERN Score: 39 (Poor)
JAMA Benchmark Criteria: 0
Flesch-Kincaid Grade Level: 12.6 (College Graduate)
Analysis: The chatbot successfully identified common complications for surgical AT repair (Bronheim et al. 2023). However, few details were provided on incidence rates or risk factors contributing to surgical complications, which are important considerations when deciding whether to undergo surgery (Lantto et al. 2016; Möller et al. 2001). No differentiation was made between the type of surgery and the propensity for a given complication, such as percutaneous AT repair carrying a higher risk of sural nerve injury (Carmont et al. 2011). No mention was made of the current state of the literature on re-rupture rates for operative versus non-operative treatment (Myhrvold et al. 2022). Additionally, it failed to list other key aspects such as the risk of AT shortening, patient-reported outcome measures, duration of immobility, and return to activity time (Mansfield et al. 2022). While ChatGPT answers this question directly, its inability to contextualize a patient’s question into pertinent clinical considerations remains a limitation of the AI. The chatbot appropriately directs the patient to discuss further with their healthcare provider and adhere to perioperative guidelines.
Question 9: How do I prevent Achilles tendon ruptures?
DISCERN Score: 26 (Very Poor)
JAMA Benchmark Criteria: 0
Flesch-Kincaid Grade Level: 11.2 (College Graduate)
Analysis: ChatGPT correctly recognized common AT rupture prevention techniques including supportive footwear, strengthening exercises, gradual training allowing for recovery, and adequate warming and stretching of the muscles prior to activity (Shamrock, Dreyer, and Varacallo 2023; Rowson, McNally, and Duma 2010). It emphasized the importance of variety for strengthening exercises to support the tendon and surrounding muscles through cross-training and eccentric and concentric movements. However, new data has shown that long-term and low-intensity eccentric training may be of higher efficacy for AT rupture prevention (Yu et al. 2022; Ahn et al. 2023). ChatGPT also correctly identified the importance of a healthy body weight and gradual training for AT rupture prevention (Ahn et al. 2023). However, it did not mention avoidance of sports or activities involving main injury patterns like stepping backwards, landing, sprinting or agility exercises with eccentric loading of the calf (Hess 2010; Hoenig et al. 2023). It also failed to mention that the primary goal of these techniques is to prevent degenerative changes in the tendon that can cause rupture (Hess 2010).
Question 10: What happens during surgery for Achilles tendon ruptures?
DISCERN Score: 41 (Fair)
JAMA Benchmark Criteria: 0
Flesch-Kincaid Grade Level: 11.1 (College Graduate)
Analysis: ChatGPT correctly described the anesthesia options and patient positioning during surgery. While the response accurately describes the incision site possibilities and mentions both open and percutaneous surgical approaches, it does not include the mini-open approach as previously discussed in the analysis of question three (Shamrock, Dreyer, and Varacallo 2023). ChatGPT describes some reinforcement options in the response, but it does not mention the use of autologous or donor grafts (Bai et al. 2019). However, it does emphasize that there are a variety of effective procedural methods for AT rupture repair that a surgeon will choose from based on the injury pattern, patient type, and surgeon’s experience (Lin, Duan, and Yang 2019). The response also effectively conveys the wound closure and necessity of immobilization after surgery with subsequent gradual transition to normal movement.
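The per-question scores reported above can be tallied to check the summary figures given at the start of the Results. The Python sketch below does this arithmetic; the values are copied from the ten question entries, while the variable and function names are illustrative assumptions.

```python
from collections import Counter
from statistics import mean

# Per-question values copied from the Results:
# question number -> (DISCERN score, DISCERN rating, Flesch-Kincaid grade level)
RESULTS = {
    1: (37, "Poor", 13.6),
    2: (36, "Poor", 13.0),
    3: (40, "Poor", 13.4),
    4: (41, "Fair", 14.8),
    5: (46, "Fair", 14.8),
    6: (43, "Fair", 15.0),
    7: (41, "Fair", 14.6),
    8: (39, "Poor", 12.6),
    9: (26, "Very Poor", 11.2),
    10: (41, "Fair", 11.1),
}


def summarize(results):
    """Compute the mean, range, and rating counts for the per-question scores."""
    discern = [row[0] for row in results.values()]
    ratings = Counter(row[1] for row in results.values())
    grades = [row[2] for row in results.values()]
    return {
        "discern_mean": mean(discern),
        "discern_range": (min(discern), max(discern)),
        "rating_counts": dict(ratings),
        "fk_mean": round(mean(grades), 1),
        "fk_range": (min(grades), max(grades)),
    }


if __name__ == "__main__":
    for key, value in summarize(RESULTS).items():
        print(f"{key}: {value}")
```

The tallies agree with the reported summary: a mean DISCERN score of 39 (range 26-46), one “Very Poor,” four “Poor,” and five “Fair” ratings, and a mean Flesch-Kincaid Grade Level of 13.4 (range 11.1-15.0).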
Discussion
Artificial Intelligence has emerged as a tool in various fields, including healthcare. As people continue to use ChatGPT for educational purposes, it is crucial to determine the accuracy of the information it provides, especially on medical topics (Gravel, D’Amours-Gravel, and Osmanlliu 2023). In this study, we aimed to evaluate the effectiveness of ChatGPT in addressing common questions related to AT ruptures and assess its potential as a supplementary educational resource for patients.
The results of our study indicate that ChatGPT performed poorly in responding to questions about AT ruptures. While the chatbot was able to provide basic information, the majority of responses required clarification to better adhere to the current literature. Furthermore, ChatGPT did not provide reliable citations, which led to low DISCERN and JAMA Benchmark Criteria scores. The lack of sources limits patients’ ability to fact-check information. Moreover, it is well known that ChatGPT currently lacks the capacity to critically evaluate and incorporate the online sources it utilizes. The chatbot oftentimes cites fictitious articles, a phenomenon known as “hallucination” (Brega et al. 2015). These findings suggest that ChatGPT is best used as a supplemental source for patient education, in conjunction with consultation from a healthcare specialist.
Lastly, the average Flesch-Kincaid Grade Level was 13.4, which is a college graduate reading level. Since the average American citizen reads at an 8th grade level, the National Institutes of Health (NIH) recommends that patient education material not surpass a 6th grade reading level (Cotugna, Vickery, and Carpenter-Haefele 2005; Brega et al. 2015). The advanced verbiage used in the responses greatly limits the utility and inclusivity of ChatGPT in patient interactions. Future iterations of ChatGPT could focus on providing responses written at a lower reading level to optimize patient education on this platform.
Limitations
In this study, only ten questions were analyzed, which decreases the generalizability of our results due to the possibility of selection bias. Furthermore, the ten questions analyzed likely do not adequately represent the full spectrum of questions patients may ask on the topic. Additionally, the questions selected were straightforward, whereas patients may have more complex questions about their medical condition, which further limits the generalizability of this study. The responses generated by the chatbot also vary with system updates, question phrasing, and previous usage. Furthermore, the answers generated by ChatGPT are limited to the user input. Therefore, if a patient were to ask a general question, the chatbot response may lack information tailored to the patient’s demographics. In comparison, a healthcare provider can provide information that is patient-specific. Lastly, the response analysis was subjective. However, the authors attempted to decrease bias by having two senior authors separately score the responses utilizing standard, validated scoring systems.