Introduction
One of the most common conditions managed by orthopedic surgeons is knee osteoarthritis (KOA). Osteoarthritis is estimated to affect about 240 million people worldwide (Katz, Arant, and Loeser 2021). Among adults in the United States aged 45 and older, at least 19% are affected by KOA (Wallace et al. 2017). Given the high prevalence of KOA, ensuring that patients receive proper education about their condition is crucial. Patients frequently use the internet as a resource for medical information: an estimated 65% of patients with orthopedic conditions who have internet access have used it to search for information related to their condition (Fraval et al. 2012).
The use of artificial intelligence (AI) in healthcare has grown rapidly in recent years. ChatGPT is an online AI tool that has accumulated over 1 billion visits since its inception (Herbold et al. 2023). Given the extensive use of online search engines for researching medical information and the recent popularity of ChatGPT, it is reasonable to forecast that ChatGPT will become a common tool for patient education. However, ChatGPT can at times provide answers that are plausible but incorrect (Gravel, D’Amours-Gravel, and Osmanlliu 2023). Previous studies investigating the use of ChatGPT to answer patient questions have yielded mixed results: some found its answers to be complete and accurate (Johnson et al. 2023), whereas others found its responses inaccurate and of low quality (Coskun et al. 2023; Draschl et al. 2023).
Given its prevalence, we would expect KOA to be an especially common topic researched by patients. However, it is not known how reliable the medical information provided by the chatbot on KOA is. The purpose of this study is to evaluate the effectiveness of ChatGPT in answering common patient questions about KOA.
Methods
The KOA ‘Frequently Asked Questions’ (FAQ) sections of ten healthcare institution web pages were reviewed to identify the ten questions deemed most common and clinically relevant (Appendix). The healthcare institution pages were all publicly available websites unaffiliated with the authors’ institutions. The questions gathered were then individually input into ChatGPT 3.5 (https://chat.openai.com/chat) (“ChatGPT,” n.d.), an AI chatbot with a free online interface, on December 21, 2023. No follow-up questions or prompts for further clarification were given. Furthermore, the ChatGPT account used was new and had no prior query history.
The responses were critically analyzed for accuracy and reliability by the two senior authors using the DISCERN instrument (Table 1) (Charnock et al. 1999). Each author scored the responses independently, without access to the other author’s scoring. The DISCERN scores were classified as Excellent (64–80), Good (52–63), Fair (41–51), Poor (30–40), or Very Poor (16–29) (Table 2) (Tahir et al. 2020). The Journal of the American Medical Association (JAMA) Benchmark criteria were used to determine whether the responses utilized evidence-based material. The JAMA Benchmark criteria evaluate healthcare websites using four key pillars: authorship, attribution, disclosure, and currency.
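As an illustration, the score-to-category banding used in this study can be expressed as a simple lookup. This is a hypothetical helper for illustration only; it is not part of the DISCERN instrument itself.

```python
def discern_category(total_score: int) -> str:
    """Map a total DISCERN score (range 16-80) to the band used in this study."""
    bands = [
        (64, 80, "Excellent"),
        (52, 63, "Good"),
        (41, 51, "Fair"),
        (30, 40, "Poor"),
        (16, 29, "Very Poor"),
    ]
    for low, high, label in bands:
        if low <= total_score <= high:
            return label
    raise ValueError("DISCERN totals range from 16 to 80")
```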
The readability of the responses was determined with the Flesch-Kincaid Grade Level test, which uses sentence length and word complexity to assign a school grade reading level to a text (Table 3) (Flesch 1948; Kincaid et al. 1975). If disagreements in grading occurred, consensus was reached through further discussion. Cohen’s kappa coefficient was used to assess interrater agreement.
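For readers unfamiliar with these two statistics, both calculations can be sketched in a few lines. The function names and the worked figures below are illustrative assumptions, not the study’s data.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def cohens_kappa(rater1: list, rater2: list) -> float:
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    labels = set(rater1) | set(rater2)
    chance = sum((rater1.count(l) / n) * (rater2.count(l) / n) for l in labels)
    return (observed - chance) / (1 - chance)
```

By convention (Landis and Koch), kappa values above roughly 0.8 are read as near-perfect agreement.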
Results
Across the ten questions, the average DISCERN score was 51. Three responses were considered good, six were fair, and one was poor. The JAMA Benchmark score was zero for all responses; however, this metric was greatly limited by the lack of citations in the responses. The average Flesch-Kincaid score was 29.33, indicating a college reading level. Cohen’s kappa was calculated to be 0.842, indicating near-perfect agreement.
Question 1: What causes osteoarthritis?
DISCERN Score: 36
JAMA Benchmark Criteria: 0
Flesch-Kincaid: 33.7
The ChatGPT response correctly identified many of the risk factors for KOA, including age, obesity, injury, and joint misalignment (Sharma 2021; Giorgino et al. 2023; Lespasio et al. 2017). It also accurately stated that the causes of KOA are multifactorial and likely vary between patients (Lespasio et al. 2017).
Question 2: Are there any treatments to repair knee cartilage damage due to osteoarthritis?
DISCERN Score: 55
JAMA Benchmark Criteria: 0
Flesch-Kincaid: 32.1
ChatGPT correctly recognized that no current treatment restores hyaline cartilage to the articular surfaces of the knee or provides long-term relief of KOA. Its statement that treatment for KOA is therefore focused on relieving symptoms and improving the functionality of the knee joint is correct.
However, the chatbot incorrectly stated that platelet-rich plasma (PRP) can be used to promote cartilage healing in addition to providing symptomatic relief. PRP contains numerous growth factors involved in cell proliferation, angiogenesis, cell migration, and the production of cellular matrix. However, current literature has found that PRP’s role is limited to inflammatory control and symptom improvement (Di et al. 2018; Meheux et al. 2016; Campbell et al. 2015); there has been no robust demonstration of cartilage improvement with the use of PRP (Fuggle et al. 2020).
Question 3: How is knee osteoarthritis treated?
DISCERN Score: 58
JAMA Benchmark Criteria: 0
Flesch-Kincaid: 25.0
The ChatGPT response regarding KOA treatment correctly states that the management of KOA requires a combined approach that may include lifestyle modifications and non-pharmacologic, pharmacologic, and surgical options (Zhang et al. 2008; Lim and Al-Dadah 2022; Bannuru et al. 2019). It mentioned many relevant conservative treatments, including weight loss, exercise, physical therapy, and assistive devices, as well as the important first-line medications and procedural options available (Lim and Al-Dadah 2022). Importantly, the response notes that some treatment adjuncts, like nutritional supplements, have debated effectiveness (Brophy and Fillingham 2022). Further, although the response does not offer specifics, it appropriately states that the combination of treatments chosen will depend on symptom severity and individual patient factors (Bannuru et al. 2019).
However, the response does require clarification. Although the use of canes, braces, and insoles is supported by older treatment guidelines, current literature suggests that their effectiveness in symptomatic relief, clinically relevant biomechanical alteration, and improved functionality in KOA is lacking; their recommended use is therefore controversial (Bannuru et al. 2019; Brophy and Fillingham 2022; Lim and Al-Dadah 2022; Duivenvoorden et al. 2015; Jindasakchai et al. 2023). The American Academy of Orthopaedic Surgeons (AAOS) and Osteoarthritis Research Society International (OARSI) guidelines specifically recommend against the use of lateral insoles and braces, respectively (Bannuru et al. 2019). Lastly, the AI response lists both NSAIDs and acetaminophen as appropriate first-line pharmacologic treatments. However, several updated guidelines do not recommend the use of acetaminophen alone because of its lack of efficacy over placebo (Bannuru et al. 2019; Lim and Al-Dadah 2022).
Question 4: Do I need to stop doing my normal activities because of knee osteoarthritis?
DISCERN Score: 50
JAMA Benchmark Criteria: 0
Flesch-Kincaid: 42.4
The ChatGPT response to this query offers an acceptable recommendation to remain active while incorporating appropriate adjustments to protect the joints when managing KOA (Kraus et al. 2019; Goh et al. 2019; Lim and Al-Dadah 2022; Brophy and Fillingham 2022; Bannuru et al. 2019; Masud et al. 2021). Exercise is listed as a first-line treatment by numerous guidelines, and many forms of exercise (strength, aerobic, land-based, aquatic, mind-body, balance training, supervised, and unsupervised) have been shown to improve pain, function, and quality of life in patients with KOA (Brophy and Fillingham 2022; Bannuru et al. 2019; Masud et al. 2021). There is no consensus on which particular exercise modality is superior. Therefore, ChatGPT’s response correctly explains that an appropriate exercise regimen is individualized and realistic, taking into account patient comorbidities, personal goals, and the unique benefits of each modality (Brophy and Fillingham 2022; Lim and Al-Dadah 2022; Goh et al. 2019).
The AI response also includes weight management as an important reason to stay active, given its ability to slow the progression of KOA (Lim and Al-Dadah 2022; Gersing et al. 2019). Furthermore, the response accurately recommends avoiding excessive high-impact activity, such as running, and exercise requiring prolonged kneeling or squatting (Griffin and Guilak 2005; Petrigna et al. 2022). When employing activity in the prevention and management of joint degeneration, there is an optimal loading window, in which the joint is neither insufficiently loaded nor overloaded, that facilitates cartilage health (Griffin and Guilak 2005; Petrigna et al. 2022).
Question 5: Why is losing weight recommended for knee osteoarthritis?
DISCERN Score: 55
JAMA Benchmark Criteria: 0
Flesch-Kincaid: 32.1
The ChatGPT response correctly identifies key reasons why weight loss is recommended for individuals with KOA, including reduced joint stress, pain relief, improved joint function, delayed disease progression, enhanced response to treatment, and reduced risk of complications (Messier et al. 2000, 2018; Miller et al. 2006). These reasons are supported by current medical literature, which acknowledges the benefits of weight loss in managing KOA, particularly in terms of pain reduction and improved function (Kulkarni et al. 2016).
However, the response could be improved with a clarification regarding the complexity and variability of outcomes related to weight loss in KOA (Laperche et al. 2022). While losing weight generally offers benefits, it is not guaranteed to lessen complications for all patients undergoing knee arthroplasty, particularly those with severe obesity and/or osteoarthritis; the impact of weight loss can vary with individual circumstances and disease severity. A more elaborate discussion acknowledging these complexities, including the limitations as well as the benefits of weight loss, would provide a more comprehensive understanding of its role in KOA management (Godziuk et al. 2021).
Question 6: Is it safe to exercise with knee osteoarthritis?
DISCERN Score: 51
JAMA Benchmark Criteria: 0
Flesch-Kincaid: 34.6
The ChatGPT response regarding the safety and benefits of exercise for individuals with KOA is largely accurate and aligns well with current medical understanding. ChatGPT also appropriately emphasized the importance of consulting a healthcare provider before starting an exercise regimen for KOA (Young, Pedersen, and Bricca 2023). Exercise is recommended for KOA patients, as it can improve joint function and reduce pain (Tanaka et al. 2013). The emphasis on low-impact activities, strength training, and the need for a personalized approach based on individual condition and tolerance levels is supported by current literature (Zeng et al. 2021).
However, the ChatGPT response could benefit from a slight clarification regarding the variability of exercise outcomes and the importance of tailored exercise programs (Tiffreau et al. 2007; Messier et al. 2021). For example, while exercise is generally effective in improving pain and strength in KOA patients, the evidence on its impact on function, performance, and quality of life is less clear. Additionally, while land-based and aquatic exercises are known to improve KOA, several specific exercise modalities, such as stretching, plyometric, and proprioception training, have shown ambiguous results regarding their effectiveness in reducing KOA symptoms (Raposo, Ramos, and Lúcia Cruz 2021; Bosomworth 2009).
Question 7: How do corticosteroid injections help with knee osteoarthritis?
DISCERN Score: 51
JAMA Benchmark Criteria: 0
Flesch-Kincaid: 6.8
The ChatGPT response appropriately identified corticosteroid injections as a temporary measure for providing patients with KOA relief from pain and inflammation (Arroll and Goodyear-Smith 2004). Many investigations have validated the utility of intra-articular injections in managing pain and inflammation in the short term, with appreciable improvements lasting from two to six weeks (Arroll and Goodyear-Smith 2004; Jüni et al. 2015; Najm et al. 2021). The chatbot also provided insight into the anti-inflammatory effect of corticosteroid injections and the proposed mechanism of pain relief (Najm et al. 2021; Martin and Browne 2019). The chatbot, however, provided a wider range of pain relief expectations, from several weeks to a few months. Although previous studies have demonstrated a role for corticosteroid injections in the long-term management of KOA, recent literature indicates that their use is more appropriate for short-term management (Arroll and Goodyear-Smith 2004; Najm et al. 2021; Raynauld et al. 2003; Felson 2023). Further, the chatbot appropriately highlights the role of injections in improving joint function, their potential side effects, and their capacity to minimize systemic side effects because of the localized nature of their administration (Donovan et al. 2022; Kompel et al. 2019). Moreover, the chatbot defines the role of injections as one component of a larger comprehensive treatment plan and responsibly recommends physician consultation (Hsu and Siwiec 2023; Arden et al. 2021; Block and Cherny 2021).
Question 8: When should I consider knee replacement surgery for knee osteoarthritis?
DISCERN Score: 47
JAMA Benchmark Criteria: 0
Flesch-Kincaid: 20.8
The ChatGPT response regarding when to seek surgical intervention correctly indicated that knee arthroplasty typically follows failure of conservative treatment (Martel-Pelletier et al. 2019; Aweid et al. 2018; Collins, Hart, and Mills 2019; Afzali et al. 2018). The chatbot further highlights indications for surgery, including pain, loss of function, deformity, cartilage loss, and joint damage (Aweid et al. 2018; Schmitt et al. 2017). Factors such as male sex, young age, African American race, diabetes, cementless fixation, preoperative limb malalignment, and coronal laxity have all been demonstrated to affect prognosis (Parvizi, Hanssen, and Spangehl 2004; Jasper et al. 2016; Julin et al. 2010; Ekhtiari et al. 2021). In this regard, the chatbot emphasizes the importance of physician consultation, touching on how variable risks and benefits should be considered on a patient-to-patient basis (Nielsen et al. 2017).
Question 9: What are the risks of knee replacement surgeries?
DISCERN Score: 47
JAMA Benchmark Criteria: 0
Flesch-Kincaid: 34.3
The ChatGPT response accurately explained postoperative complications, including infection, thrombosis, inflammation, and stiffness (Alshammari et al. 2023). Although the chatbot provided a correct explanation of the risks of knee arthroplasty and encouraged discussion with a healthcare provider, it failed to provide complication rates. Access to these data may be an important consideration when deciding among management options. For example, knee arthroplasties have reported postoperative infection and thrombotic event rates of 1% (Gajda et al. 2023) and 1.3% (Keller et al. 2020), respectively. Knowing the postoperative risks helps patients make more informed decisions and provides an opportunity for deeper patient-physician communication.
Question 10: How long will my prosthetic knee last?
DISCERN Score: 44
JAMA Benchmark Criteria: 0
Flesch-Kincaid: 31.5
The ChatGPT response regarding the longevity of a prosthetic knee correctly explained that total knee implant longevity is multifactorial and varies by patient (Cohen et al. 2023). Additionally, current literature supports the chatbot’s statement that implants usually last in the range of 15–20 years (Hussain et al. 2016; Dalury 2016; Cohen et al. 2020; Berger et al. 2001; Dixon et al. 2005). The chatbot also recommended that patients communicate with their orthopedic surgeon regarding the lifespan of their knee implant, which is critical in building rapport and achieving patient-centered care (O’Reilly et al. 2022).
Discussion
ChatGPT’s integration into healthcare prompts an interesting exploration of its applicability in patient education, particularly in the context of KOA. Our study primarily aimed to assess the effectiveness of ChatGPT in providing reliable information for common patient questions regarding KOA. Across the ten questions, the average DISCERN score was 51. Three responses were considered good, six were fair, and one was poor (Table 4). However, all responses lacked reliable citations. Furthermore, while ChatGPT may have potential in patient education, it should only be used as a supplementary source of information; adequate patient education should be performed by a board-certified orthopedic surgeon. This highlights the importance of a balanced approach in integrating AI into patient education.
Several studies incorporating ChatGPT in orthopedic contexts have drawn similar conclusions, indicating that ChatGPT offers satisfactory preliminary information but requires clarification for improved decision making (Draschl et al. 2023). These findings suggest a cautious stance toward relying exclusively on AI-generated responses for medical advice. This study found that all responses received a JAMA Benchmark score of zero due to the lack of citations in the chatbot responses. Similarly, Eng et al. found that ChatGPT’s responses to common questions on rotator cuff repairs lacked reliable references and all received JAMA Benchmark scores of zero (Eng et al. 2024). Furthermore, other studies have found that ChatGPT lacks the ability to determine the reliability of the sources it utilizes to generate responses (Ayers et al. 2023). It also has a tendency to cite irrelevant articles, a phenomenon termed “hallucination” (Brega et al. 2015). In contrast, other investigations have highlighted inconsistencies in ChatGPT’s accuracy for self-diagnosing orthopedic conditions, pointing to the variability and potential risks of depending solely on AI in medical contexts (Kuroiwa et al. 2023; Massey, Montgomery, and Zhang 2023).
Additionally, the average Flesch-Kincaid score was 29.33, indicating a college reading level. The average American reads at an 8th grade level, and many sources therefore recommend that patient education resources not surpass a 6th grade reading level (Cotugna, Vickery, and Carpenter-Haefele 2005; Weiss et al. 1994; Brega et al. 2015). Writing at this level allows medical information to be both accurate and broadly comprehensible. Given the chatbot’s advanced language, patients may find it challenging to understand and utilize the information it provides. Similarly, Fahy et al. found that ChatGPT responses to patient questions about anterior cruciate ligament injuries were at least nine reading grade levels above the recommended 6th grade level (Fahy et al. 2024). Interestingly, the responses were more complex with the more advanced 4.0 model (Fahy et al. 2024). However, this finding is not limited to ChatGPT: internet-based patient education resources in sports medicine and sports orthopedics are also written at an advanced reading grade level. Ó Doinn et al. found that these materials exceeded the recommended reading grade level by at least four grade levels (Ó Doinn et al. 2022). To improve the utility of patient education materials and future iterations of ChatGPT, responses should be written at a lower reading grade level to optimize inclusivity for the general public.
Our findings affirm the multifaceted role of AI in healthcare. For specific conditions, like KOA, ChatGPT may be an effective tool to relay reliable information to patients. However, the level of reliability and need for additional clarification can significantly vary depending on the medical condition and context. This inconsistency underscores the importance of contextual evaluation when utilizing AI tools in healthcare, suggesting that AI may best function only as an adjunct to professional medical advice and consultation.
Limitations
While our study offers important insights, it is not without limitations. The selection of only ten questions for analysis introduces the possibility of selection bias, as this small sample may not adequately represent the full spectrum of patient inquiries regarding KOA, potentially limiting the generalizability of our findings. Additionally, the questions were analyzed using scoring systems that involve subjective measurements; the authors attempted to reduce this bias by having two authors score the responses independently. Furthermore, the chatbot failed to include reliable citations in its responses, which greatly limited the authors’ ability to apply the JAMA Benchmark criteria; due to the lack of citations, all responses received a score of zero. Lastly, ChatGPT’s responses vary with question phrasing and program updates, which can lead to inconsistencies in the information produced. These limitations underscore the need for cautious interpretation of our results and highlight areas for improvement in future research on AI applications in patient education.
Conclusion
ChatGPT-3.5 has the potential to be an informative tool for patients with questions about knee osteoarthritis. It was able to provide fair responses; however, some inquiries required clarification, and all responses lacked reliable citations. Furthermore, the responses were written at a college reading level, which limits their utility. Therefore, proper patient education should be conducted by orthopedic surgeons. This study highlights the need for patient education resources that are accessible, accurate, and comprehensible.