Introduction
ChatGPT (OpenAI) is a popular artificial intelligence (AI)-driven chatbot that generates conversational-style text responses to written user prompts (“ChatGPT,” n.d.). Its responses are fluent and straightforward, making it a convenient resource for patients seeking medical information. However, as the ChatGPT disclaimer notes, AI-generated output may contain errors and is not guaranteed to be accurate or complete. Given the rapid growth and accessibility of ChatGPT, it is likely to join other online resources that patients consult with questions about orthopedic injuries and procedures (Van Riel et al. 2017; Cocco et al. 2018). Additional data are therefore needed to assess the accuracy of ChatGPT responses. This study was designed to investigate whether ChatGPT could appropriately answer frequently asked patient questions related to distal biceps rupture and repair.
Methods
One author gathered ten of the most common and clinically relevant questions related to distal biceps rupture and repair from the ‘Frequently Asked Questions’ sections of ten well-known health care institution websites. The curated list of questions was entered into the free online interface of the ChatGPT 3.5 AI chatbot (https://chat.openai.com/chat) on January 18, 2024. No follow-up prompts were used, and no question was entered more than once. Two senior authors recorded the chatbot’s responses and analyzed them for accuracy, completeness, and readability using the DISCERN score (Table 1), the Journal of the American Medical Association (JAMA) Benchmark criteria, and the Flesch-Kincaid Grade Level (Table 2) (Flesch 1948; Kincaid et al. 1975). The DISCERN score classifications are listed in Table 3 (Tahir et al. 2020). Inter-rater agreement was assessed with Cohen’s kappa coefficient, which was 0.823, indicating near-perfect agreement.
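Cohen’s kappa summarizes how often two raters agree beyond what chance alone would produce. Because the rating-level data are not reported, the sketch below only illustrates how the coefficient is computed for two raters who each assign one categorical label per item; the function name and the toy ratings are hypothetical and are not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch treats it as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy example (not study data): the raters agree on 3 of 4 items.
print(cohens_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"]))  # → 0.5
```

Here the raw agreement is 75%, but half of that would be expected by chance from the raters’ label frequencies, so kappa is 0.5; a kappa of 0.823 therefore reflects agreement well beyond chance.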
This study did not require institutional review board review.
Source of Funding
Results
The responses generated by ChatGPT to the ten selected frequently asked questions regarding distal biceps injury and repair are analyzed below for accuracy. The average DISCERN score was 36.5 (range: 28-44). The average Flesch-Kincaid Grade Level was 14.9 (range: 11.87-17.44), corresponding to a college reading level. The JAMA Benchmark criteria score was zero for all responses because of the lack of cited sources (Table 4).
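The summary statistics above can be checked against the per-question values reported under Questions 1-10 below. The following sketch (variable names are ours) takes the arithmetic mean and range of those reported values and tallies the DISCERN category labels exactly as reported; it does not re-derive the labels, which depend on the cutoffs in Table 3.

```python
from collections import Counter
from statistics import mean

# Per-question values as reported under Questions 1-10.
discern = [35, 28, 34, 35, 33, 44, 35, 42, 39, 40]
fkgl = [14.26, 11.87, 14.51, 13.74, 15.97, 15.67, 15.16, 16.16, 17.44, 14.70]
labels = ["Poor", "Very Poor", "Poor", "Poor", "Poor",
          "Fair", "Poor", "Fair", "Poor", "Poor"]


def summarize(values):
    """Return (mean, minimum, maximum) of a list of scores."""
    return mean(values), min(values), max(values)


discern_mean, discern_min, discern_max = summarize(discern)
fkgl_mean, fkgl_min, fkgl_max = summarize(fkgl)
counts = Counter(labels)

print(f"DISCERN mean {discern_mean:.1f} (range {discern_min}-{discern_max})")
# → DISCERN mean 36.5 (range 28-44)
print(f"FKGL mean {fkgl_mean:.1f} (range {fkgl_min}-{fkgl_max})")
# → FKGL mean 14.9 (range 11.87-17.44)
for category in ("Very Poor", "Poor", "Fair"):
    print(category, counts[category])  # → Very Poor 1, Poor 7, Fair 2
```

The recomputed values match the reported averages and ranges, as well as the one very poor, seven poor, and two fair ratings summarized in the Discussion.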
Question 1: How does a distal biceps rupture occur?
DISCERN Score: 35 (Poor)
Flesch-Kincaid Grade Level: 14.26
Analysis: ChatGPT provides a satisfactory synopsis of how a distal biceps rupture may occur, with a highly simplified explanation of the anatomy, symptoms, and mechanism of injury. The chatbot correctly identifies that heavy lifting with the palm facing down (pronation) or forceful extension of the elbow from a flexed, contracted position can place significant strain on the distal biceps (Catonné et al. 1995). However, it does not recognize that distal biceps rupture more commonly occurs under excessive eccentric force with the forearm supinated (Lappen et al. 2022). The ChatGPT response correctly identifies and explains several risk factors, such as age, smoking, and steroid use. However, it omits specific demographic details associated with these risk factors: patients between 30 and 50 years of age have the highest rupture rate, with a peak at age 35, and tobacco smokers are 7.5 times more likely than non-smokers to rupture their biceps. The answer also omits other important risk factors, including high body mass index, biceps muscle overuse, generalized inflammation with elevated C-reactive protein (CRP), and fluoroquinolone use (Gwark and Park 2019; Hsu et al. 2023). In addition, it does not note the higher incidence of distal biceps rupture in males, who account for 96% of all distal ruptures.
Question 2: How will I know if I ruptured my distal biceps tendon?
DISCERN Score: 28 (Very Poor)
Flesch-Kincaid Grade Level: 11.87
Analysis: ChatGPT provides an extensive list of signs and symptoms of distal biceps tears. The sign of sudden, sharp pain is correct; however, ChatGPT fails to mention that in complete distal biceps tendon tears the pain becomes an ache in the weeks after the injury and may eventually resolve (Geaney and Mazzocca 2010). Without this clarification, affected patients may believe that the injury has healed without treatment, which could result in further harm. The symptoms of swelling, bruising, and weakness mentioned by the chatbot are all correct and sufficiently explained (Hsu et al. 2023). However, the response incorrectly places the bulge just above the elbow (Popeye’s sign), which is traditionally associated with proximal biceps tendon rupture. A distal biceps tendon tear classically causes the muscle to retract superiorly into the proximal upper arm (reverse Popeye’s sign). ChatGPT also fails to mention that a bulge may not be visible in obese individuals or in those whose lacertus fibrosus (bicipital aponeurosis) remains intact (Hsu et al. 2023; Vishwanathan and Soni 2021). These omissions can lead patients to misdiagnose themselves and potentially delay or forgo treatment.
Question 3: Are there risks involved with having surgery to repair my distal biceps rupture?
DISCERN Score: 34 (Poor)
Flesch-Kincaid Grade Level: 14.51
Analysis: Although ChatGPT’s descriptions of possible complications are correct, the response does not compare the risks of the two approaches to distal biceps tendon repair, single-incision and double-incision. The single-incision approach has a complication rate of 28.5%, compared with 20.9% for the double-incision approach (Amin et al. 2016). Each approach also carries different complications. The single-incision approach has a higher rate of re-rupture and nerve palsy, both classified as major complications (Amin et al. 2016; Amarasooriya et al. 2020). The double-incision approach has a higher rate of heterotopic ossification, a relatively minor complication that rarely affects postoperative range of motion or pain (Amin et al. 2016). By omitting this comparison, the response leaves out information that is central to treatment-planning discussions between patients and their physicians, and patients may form a false understanding of their postoperative outcomes and limitations.
Question 4: What is done during surgery to repair my distal biceps rupture?
DISCERN Score: 35 (Poor)
Flesch-Kincaid Grade Level: 13.74
Analysis: The chatbot gives a basic overview of distal biceps tendon repair at a level largely appropriate for non-medical readers. Although the chatbot correctly notes that tendon reattachment techniques vary, it does not mention that the number and orientation of incisions also vary with the surgical approach. This omission can give patients false expectations about postoperative scarring and an incorrect understanding of the surgery itself. While the single-incision approach described by the chatbot is increasingly popular, many orthopedic surgeons still use the double-incision approach as their primary technique or for specific patient populations (Hogea et al. 2023; Johnson et al. 2008). The single-incision approach uses an incision in the antecubital region, as described by the chatbot. The double-incision approach adds a second, dorsolateral forearm incision to the antecubital incision; a muscle-splitting technique is used, after which a bone tunnel is created for tendon reattachment (Grewal et al. 2012; Mercurio et al. 2022). Tendon reattachment may be performed using suture anchors, bone tunnels, cortical buttons, or interference screws. Regardless of the number of incisions or the reattachment method, the radial (bicipital) tuberosity serves as the landmark for reattachment.
Question 5: How long is the recovery after a distal biceps repair?
DISCERN Score: 33 (Poor)
Flesch-Kincaid Grade Level: 15.97
Analysis: ChatGPT provided a nuanced answer regarding the recovery timeline after distal biceps repair. The chatbot correctly stated that recovery varies with the surgical technique and with factors unique to each patient. Multiple surgical techniques are used to repair distal biceps tendon ruptures, including single-incision techniques and the two-incision Boyd-Anderson approach (Ryhänen et al. 2006), and differences between these techniques may lead to variations in the recovery timeline (Hogea et al. 2023). Regardless of technique, however, the rate of re-rupture after distal biceps tendon repair is low and is mostly associated with noncompliance with postoperative therapy (Grewal et al. 2012).
The response accurately describes the progression of physical therapy from passive motion to active motion to resisted exercises. Logan et al. describe a physical therapy plan similar to the one presented by ChatGPT (Logan et al. 2019). The plan limits active elbow flexion and extension for six weeks, allowing only gravity-assisted flexion and extension starting at two weeks postoperatively (Logan et al. 2019). Patients then progress to isometric triceps exercises at six weeks and isotonic triceps exercises at eight weeks (Logan et al. 2019). Isometric biceps exercises are initiated at twelve weeks, and isotonic biceps exercises begin at sixteen weeks (Logan et al. 2019). The next stage allows biceps strengthening and return to sport to be considered (Logan et al. 2019). While the plan provided by ChatGPT is less detailed, it gives an accurate timeline of the rehabilitation process. One factor the response omits is the surgeon’s individual preference regarding when patients may perform certain activities. Some surgeons prefer to keep patients in a cast for a set period after surgery, while others want their patients moving within a couple of weeks. For example, Miyamoto et al. immobilize patients for up to six weeks postoperatively, while Assiotis et al. allow gentle mobilization immediately after the procedure (Miyamoto, Elser, and Millett 2010; Assiotis et al. 2022). The general consensus for length of immobilization appears to be about two weeks (Blackmore, Jander, and Culp 2006; Hogea et al. 2023). Long-term prognosis also varies by author. Blackmore et al. give a timeline similar to ChatGPT’s, stating that full recovery can be expected by five months (Blackmore, Jander, and Culp 2006), whereas Smith and Amirfeyz expect patients to return to full activity by three months (Smith and Amirfeyz 2016).
Additionally, ChatGPT is correct in stating that recovery timelines may vary between patients as adherence to prescribed postoperative interventions impacts patient recovery (Blackmore, Jander, and Culp 2006).
Question 6: Will my distal biceps repair give my elbow full function once recovered?
DISCERN Score: 44 (Fair)
Flesch-Kincaid Grade Level: 15.67
Analysis: ChatGPT provides an accurate response, stating that most patients regain near-normal or completely normal function postoperatively, a claim supported by numerous studies in the current literature. Butler et al. showed that 90% of patients achieved range of motion comparable to that of their contralateral elbow postoperatively (Butler et al. 2023). Similarly, Huynh et al. found no significant differences in flexion or supination range of motion between arms (Huynh et al. 2019). Barret et al. found no difference in strength between the operated and non-operated arms after two years of follow-up and reported that most patients returned to work by two months postoperatively (Barret et al. 2019). Grewal et al. demonstrated that both single- and double-incision biceps tendon repairs restore range of motion and strength (Grewal et al. 2012).
The ChatGPT response also included potential complications of distal biceps repair which are consistent with published literature. Amarasooriya et al. found the most common complication of distal biceps tendon repair to be transient sensory nerve injury, followed by tendon rupture and postoperative infection (Amarasooriya et al. 2020).
Question 7: How is a distal biceps rupture diagnosed?
DISCERN Score: 35 (Poor)
Flesch-Kincaid Grade Level: 15.16
Analysis: ChatGPT provided a comprehensive overview of how a provider would diagnose a distal biceps rupture. The chatbot focused on three aspects of the patient encounter: medical history, physical exam, and imaging studies. Regarding medical history, the chatbot correctly stated that a provider would consider the onset of symptoms and the activity at the time of injury. One element missing from the history is a sudden ‘popping’ sensation in the injured upper extremity, which most patients with distal biceps rupture experience (Kelly et al. 2015; Luokkala et al. 2022). The chatbot also correctly stated that a provider would ask about relevant risk factors that may predispose a patient to distal biceps rupture, but it did not specify that participation in contact sports, anabolic steroid use, pushing or lifting heavy objects, tobacco smoking, and elevated body mass index are all risk factors for rupture (Kelly et al. 2015b).
In describing the physical exam, the chatbot did not mention motor deficits. Current literature indicates that patients with distal biceps rupture have more difficulty with supination-related activities than with flexion (Devereaux and ElMaraghy 2013; Metzman and Tivener 2015; Nesterenko et al. 2010). The ChatGPT response included the hook test for assessing distal biceps tendon integrity but omitted the Ruland biceps squeeze test (O’Driscoll, Goncalves, and Dietz 2007; Hsu et al. 2023). The hook test is performed by having the patient supinate the forearm while holding the elbow flexed at 90 degrees. If the examiner can hook the distal biceps tendon from the lateral side, the test is negative, indicating no distal biceps tendon injury (O’Driscoll, Goncalves, and Dietz 2007). The Ruland biceps squeeze test is performed by squeezing the biceps of the patient’s pronated, flexed arm; the test is positive for a distal biceps tear if reactive supination of the forearm does not occur.
The chatbot correctly identified that radiographs help rule out other potential causes of the patient’s symptoms, including fracture and arthritis, but did not mention their role in identifying concurrent pathology such as bony avulsion of the radial tuberosity, which is occasionally associated with distal biceps rupture (Sutton et al. 2010). In addition, the chatbot did not clarify that although ultrasound and MRI have similar sensitivity and specificity for partial ruptures, MRI remains the preferred imaging modality for complete tears and for partial tears that may present with a negative hook test and Ruland biceps squeeze test (Ruland, Dunbar, and Bowen 2005). Without this distinction, patients may question the need for MRI, which is more expensive than ultrasound (Lynch et al. 2019).
Question 8: What is the treatment for a distal biceps rupture?
DISCERN Score: 42 (Fair)
Flesch-Kincaid Grade Level: 16.16
Analysis: ChatGPT provided a well-phrased synopsis of the treatment of distal biceps rupture, beginning with the patient-specific factors that inform shared decision-making between nonoperative and operative treatment. For nonoperative treatment, the chatbot addressed the roles of activity modification, pain management, and physical therapy. It also clarified that nonoperative treatment is typically reserved for patients with less severe injuries or those who may not be candidates for surgery. However, the chatbot did not mention the expected motor outcomes in patients treated nonoperatively. Such prognostic information is important for patients weighing nonoperative versus operative treatment; without it, patients may choose a treatment option without fully understanding its long-term outcomes. Current literature estimates losses of 40% in forearm supination, 40% in elbow flexion, and 15% in grip strength with nonoperative treatment (Freeman et al. 2009; Cuzzolin et al. 2021). The chatbot does, however, note that patient-specific factors must be considered when deciding on treatment. In its discussion of operative treatment, the chatbot correctly addressed the timing of surgery, consistent with current literature, as well as postoperative rehabilitation, which is essential to restoring function. Regarding indications, the chatbot mentions surgical intervention only for complete ruptures. However, the literature also recommends surgery for patients with partial tears who have failed nonoperative treatment and strongly desire to restore full strength and function (Bauer, Wong, and Lazarus 2018).
Question 9: Is it better to have operative or non-operative treatment for a distal biceps rupture?
DISCERN Score: 39 (Poor)
Flesch-Kincaid Grade Level: 17.44
Analysis: The ChatGPT response correctly stated that surgical intervention is currently considered the mainstay of treatment for distal biceps tendon rupture in all patients except those with exceptionally low physical demands, such as the physiologically elderly, and those considered poor surgical candidates. In these higher-risk populations, nonoperative treatment avoids the risks of surgery while often providing an acceptable outcome (Freeman et al. 2009b; Berthold et al. 2021). The response correctly identified the benefits of surgical intervention, including restoration of full strength and reduced re-rupture rates, and discussed the importance of early surgical intervention for active populations. However, by devoting a significant portion of the response to nonoperative options, ChatGPT may mislead a patient seeking guidance in choosing a treatment approach. Conservative management of distal biceps tendon tears was common before the 1980s but has since fallen out of favor because it carries an increased risk of re-rupture and results in significant loss of supination, flexion, grip strength, and endurance (Baker and Bierwagen 1985; Morrey et al. 1985; Pearl, Bessos, and Wong 1998).
Question 10: How long do I have to wait to lift weights after a distal biceps repair?
DISCERN Score: 40 (Poor)
Flesch-Kincaid Grade Level: 14.70
Analysis: The ChatGPT response to this question is thorough, providing an estimated timeline for healing and rehabilitation after distal biceps repair that aligns with published recommendations. Gradual strengthening and aerobic conditioning are recommended to begin at six weeks and strength training at about three months, with a full return to heavy lifting allowed between three and six months postoperatively (Srinivasan, Pederson, and Morrey 2020; Wentzell 2018; Cheung, Lazarus, and Taranta 2005). ChatGPT also conveyed that recovery time varies from patient to patient, which is important for setting realistic expectations in patients reading these responses for personal education. The response further explains why a slow, progressive recovery is vital to reduce the risk of re-injury, delayed healing, and other complications. This explanation helps patients understand why they should take the necessary time for rehabilitation before returning to normal activities in order to avoid further harm.
Discussion
As ChatGPT continues to grow in popularity and evolve in its ability to produce accurate information, patients may turn to the chatbot to seek answers regarding medical conditions. It is therefore crucial that medical professionals be aware of the quality of the information their patients receive from ChatGPT. This study investigated whether ChatGPT could appropriately answer frequently asked patient questions related to distal biceps rupture and repair.
In this study, the average DISCERN score was 36.5 (range: 28-44), indicating overall poor responses: one response was rated very poor, seven poor, and two fair. Most responses required clarification to answer the question posed without misleading patients. In addition, the lack of cited sources resulted in a JAMA Benchmark criteria score of zero for all responses. Given the low reliability of the responses, patient education regarding distal biceps ruptures should come primarily from a board-certified upper extremity surgeon. ChatGPT may serve as a supplementary learning tool; however, patients should be cautious about relying on AI-generated responses for medical advice.
Several other studies investigating the use of ChatGPT in patient education have reported similar results. ChatGPT provided substandard responses on topics such as total hip arthroplasty, rotator cuff repair, foot and ankle surgery, and anterior cruciate ligament repair (Mika et al. 2023; Christy et al. 2023; Anastasio et al. 2023; Kaarre et al. 2023; Eng et al. 2024; Parker et al. 2025; Gilmore et al. 2024; Lack et al. 2024). These studies found that ChatGPT’s responses required substantial clarification because they contained inaccurate information. ChatGPT cannot critically evaluate the sources used to generate its responses (Ayers et al. 2023). Furthermore, it may cite irrelevant or nonexistent articles, a phenomenon termed “hallucination” (Brega et al. 2015). Other studies have found that ChatGPT would fail orthopedic surgery board examinations (Massey, Montgomery, and Zhang 2023). Providing false information to patients risks poor management of their conditions as well as incorrect prognostic expectations. Patients should therefore be wary of information provided by the chatbot, which lacks the quality and expertise of a consultation with a board-certified surgeon.
Lastly, the average Flesch-Kincaid Grade Level was 14.9 (range: 11.87-17.44), which corresponds to a college reading level. Because the average American reads at an 8th-grade level, the National Institutes of Health (NIH) recommends that patient education material not exceed a 6th-grade reading level (Cotugna, Vickery, and Carpenter-Haefele 2005; Weiss et al. 1994; Brega et al. 2015; Weiss and Coyne 1997). Other studies have similarly found that ChatGPT answers patient questions at a high reading grade level that may confuse some users, and that the grade level was higher still with ChatGPT 4 (Fahy et al. 2024; Eng et al. 2024). This finding underscores the need for patient education material that is accessible to all readers. Future iterations of ChatGPT should use simpler language and less complex sentence structures, and should explain medical topics more thoroughly, to improve readability.
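The Flesch-Kincaid Grade Level is computed from average sentence length and average syllables per word: 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59 (Kincaid et al. 1975). The study does not state which tool produced its scores, so the sketch below is only an illustration of the formula; the sentence splitter and the vowel-group syllable counter are simplifying heuristics of our own (hypothetical helpers), and their output will differ somewhat from published calculators. The two example sentences are invented to show how long, polysyllabic medical wording raises the grade level.

```python
import re


def count_syllables(word):
    """Heuristic syllable estimate: count runs of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent 'e' (as in "bone") as non-syllabic, except "-le".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fk_grade_from_counts(words, sentences, syllables):
    """Flesch-Kincaid Grade Level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def flesch_kincaid_grade(text):
    """Estimate the grade level of a text using the heuristics above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return fk_grade_from_counts(len(words), len(sentences), syllables)


plain = "The tendon tore. Rest your arm. See your doctor."
jargon = ("Operative fixation of the ruptured distal biceps tendon "
          "restores supination strength.")
print(round(flesch_kincaid_grade(plain), 1))   # → 0.0
print(round(flesch_kincaid_grade(jargon), 1))  # → 16.6
```

Both terms of the formula matter: the single 11-word sentence of technical vocabulary scores above a college level, while three short sentences of common words score near zero, which is why shorter sentences and plainer terms are the main levers for reaching a 6th-grade target.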
Limitations
This study focused on ten frequently asked questions about distal biceps rupture and repair. This small sample does not encompass all questions patients may ask on this topic, which limits the generalizability of the findings. Additionally, the chatbot’s responses vary with the wording of the prompt and with program updates, which can lead to inconsistencies in the information provided. Lastly, although the responses were analyzed using three separate validated instruments, applying them involves subjective judgment and could introduce bias. We attempted to limit bias by having two senior authors perform independent, blinded analyses. Future studies could investigate orthopedic conditions using updated ChatGPT models and other large language model platforms, both to further define AI’s role in patient education and to compare the utility of different AI software. Future studies could also explore ways to integrate AI with healthcare providers’ oversight to enhance patient education.
Conclusion
Given ChatGPT’s increasing popularity among patients seeking information, it is essential for patients and healthcare workers to understand the accuracy and reliability of its responses. Our findings suggest that ChatGPT may be used as a supplemental tool in patient education for distal biceps tears. The chatbot was able to answer questions commonly asked by patients about distal biceps rupture; however, its answers required clarification to avoid confusion. Furthermore, the high reading level may be a barrier to patient comprehension, and the lack of citations in all responses prevents patients from verifying the accuracy of the information generated. Still, AI may be a valuable supplemental resource for some patients, provided they keep in mind that every patient and situation is unique. Ultimately, it remains critical that patients seek out an appropriate medical provider for diagnosis, treatment, and consultation.