Introduction
One of the most common conditions managed by orthopedic surgeons is knee osteoarthritis (KOA). Osteoarthritis is estimated to affect about 240 million people worldwide (Katz, Arant, and Loeser 2021). Among adults in the United States aged 45 and older, at least 19% are affected by KOA (Wallace et al. 2017). Given the high prevalence of KOA, ensuring that patients receive proper education about their condition is crucial. Patients frequently use the internet as a resource for medical information: an estimated 65% of patients with orthopedic conditions who have internet access have used it to search for information related to their condition (Fraval et al. 2012).
The use of artificial intelligence (AI) in healthcare has grown rapidly in recent years. ChatGPT is an online AI tool that has accumulated over 1 billion visits since its inception (Herbold et al. 2023). Given the extensive use of online search engines for researching medical information and the recent popularity of ChatGPT, it is reasonable to forecast that ChatGPT will become a common tool for patient education. However, ChatGPT can at times provide answers that are plausible but incorrect (Gravel, D’Amours-Gravel, and Osmanlliu 2023). Previous studies investigating the use of ChatGPT to answer patient questions have yielded mixed results: some found its answers to be complete and accurate (Johnson et al. 2023), whereas others found its responses inaccurate and of low quality (Coskun et al. 2023; Draschl et al. 2023).
Given its prevalence, we would expect KOA to be an especially common topic researched by patients. However, it is not known how reliable the medical information provided by the chatbot on KOA is. The purpose of this study is to evaluate the effectiveness of ChatGPT in answering common patient questions about KOA.
Methods
The KOA ‘Frequently Asked Questions’ (FAQ) sections of ten healthcare institution web pages were reviewed to identify the ten questions deemed most common and clinically relevant (Appendix). The healthcare institution pages were all publicly available websites unaffiliated with the authors’ institutions. The questions gathered were then individually input into ChatGPT 3.5 (https://chat.openai.com/chat) (“ChatGPT,” n.d.), an AI chatbot with a free online interface, on December 21, 2023. No follow-up questions or prompts for further clarification were given. Furthermore, the ChatGPT account used was new and had no prior query history.
The responses were critically analyzed for accuracy and reliability by the two senior authors using the DISCERN instrument (Table 1) (Charnock et al. 1999). Each author scored the responses independently, without access to the other author’s scoring. The DISCERN scores were classified as Excellent (64–80), Good (52–63), Fair (41–51), Poor (30–40), or Very Poor (16–29) (Table 2) (Tahir et al. 2020). The Journal of the American Medical Association (JAMA) Benchmark criteria were used to determine whether the responses utilized evidence-based material. The JAMA Benchmark criteria evaluate healthcare websites using four key pillars: authorship, attribution, disclosure, and currency.
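As an illustration, the score-to-category banding used in this study can be expressed as a simple lookup. This is a hypothetical helper for illustration only; it is not part of the DISCERN instrument itself.

```python
def discern_category(total_score: int) -> str:
    """Map a total DISCERN score (range 16-80) to the band used in this study."""
    bands = [
        (64, 80, "Excellent"),
        (52, 63, "Good"),
        (41, 51, "Fair"),
        (30, 40, "Poor"),
        (16, 29, "Very Poor"),
    ]
    for low, high, label in bands:
        if low <= total_score <= high:
            return label
    raise ValueError("DISCERN totals range from 16 to 80")
```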
The readability of the responses was determined with the Flesch-Kincaid Grade Level test, which uses sentence length and word complexity to assign a school grade reading level to a text (Table 3) (Flesch 1948; Kincaid et al. 1975). If disagreements in grading occurred, consensus was reached through further discussion. Cohen’s kappa coefficient was used to assess interrater agreement.
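For readers unfamiliar with these two statistics, both calculations can be sketched in a few lines. The function names and the worked figures below are illustrative assumptions, not the study’s data.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def cohens_kappa(rater1: list, rater2: list) -> float:
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    labels = set(rater1) | set(rater2)
    chance = sum((rater1.count(l) / n) * (rater2.count(l) / n) for l in labels)
    return (observed - chance) / (1 - chance)
```

By convention (Landis and Koch), kappa values above roughly 0.8 are read as near-perfect agreement.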
Results
Across the ten questions, the average DISCERN score was 51. Three responses were considered good, six were fair, and one was poor. The JAMA Benchmark score was zero for all responses; however, this metric was greatly limited by the lack of citations in the responses. The average Flesch-Kincaid score was 29.33, indicating a college reading level. Cohen’s kappa was calculated to be 0.842, indicating near-perfect agreement.
Question 1: What causes osteoarthritis?
DISCERN Score: 36
JAMA Benchmark Criteria: 0
Flesch-Kincaid: 33.7
The ChatGPT response correctly identified many of the risk factors for KOA, including age, obesity, injury, and joint misalignment (Sharma 2021; Giorgino et al. 2023; Lespasio et al. 2017). It also accurately stated that the causes of KOA are multifactorial and likely vary between patients (Lespasio et al. 2017).
Question 2: Are there any treatments to repair knee cartilage damage due to osteoarthritis?
DISCERN Score: 55
JAMA Benchmark Criteria: 0
Flesch-Kincaid: 32.1
ChatGPT correctly recognized that no current treatment restores hyaline cartilage to the articular surfaces of the knee or provides long-term relief of KOA. Its statement that treatment for KOA is therefore focused on relieving symptoms and improving the functionality of the knee joint is correct.
However, the chatbot incorrectly stated that platelet-rich plasma (PRP) can be used to promote cartilage healing in addition to providing symptomatic relief. PRP contains numerous growth factors involved in cell proliferation, angiogenesis, cell migration, and the production of cellular matrix. However, current literature has found that PRP’s role is limited to inflammatory control and symptom improvement (Di et al. 2018; Meheux et al. 2016; Campbell et al. 2015); there has been no robust demonstration of cartilage improvement with the use of PRP (Fuggle et al. 2020).
Question 3: How is knee osteoarthritis treated?
DISCERN Score: 58
JAMA Benchmark Criteria: 0
Flesch-Kincaid: 25.0
The ChatGPT response regarding KOA treatment correctly states that the management of KOA requires a combined approach that may include lifestyle modifications and non-pharmacologic, pharmacologic, and surgical options (Zhang et al. 2008; Lim and Al-Dadah 2022; Bannuru et al. 2019). It mentioned many relevant conservative treatments, including weight loss, exercise, physical therapy, and assistive devices, as well as the important first-line medications and procedural options available (Lim and Al-Dadah 2022). Importantly, the response notes that some treatment adjuncts, like nutritional supplements, have debated effectiveness (Brophy and Fillingham 2022). Further, although the response does not offer specifics, it appropriately states that the combination of treatments chosen will depend on symptom severity and individual patient factors (Bannuru et al. 2019).
However, the response does require clarification. Although the use of canes, braces, and insoles is supported by older treatment guidelines, current literature suggests that their effectiveness in symptomatic relief, clinically relevant biomechanical alteration, and improved functionality in KOA is lacking; their recommended use is therefore controversial (Bannuru et al. 2019; Brophy and Fillingham 2022; Lim and Al-Dadah 2022; Duivenvoorden et al. 2015; Jindasakchai et al. 2023). The American Academy of Orthopaedic Surgeons (AAOS) and Osteoarthritis Research Society International (OARSI) guidelines specifically recommend against the use of lateral insoles and braces, respectively (Bannuru et al. 2019). Lastly, the AI response lists both NSAIDs and acetaminophen as appropriate first-line pharmacologic treatments. However, several updated guidelines do not recommend the use of acetaminophen alone because of its lack of efficacy over placebo (Bannuru et al. 2019; Lim and Al-Dadah 2022).
Question 4: Do I need to stop doing my normal activities because of knee osteoarthritis?
DISCERN Score: 50
JAMA Benchmark Criteria: 0
Flesch-Kincaid: 42.4
The ChatGPT response to this query offers an acceptable recommendation to remain active while incorporating appropriate adjustments to protect the joints when managing KOA (Kraus et al. 2019; Goh et al. 2019; Lim and Al-Dadah 2022; Brophy and Fillingham 2022; Bannuru et al. 2019; Masud et al. 2021). Exercise is listed as a first-line treatment by numerous guidelines, and many forms of exercise (strength, aerobic, land-based, aquatic, mind-body, balance training, supervised, and unsupervised) have been shown to improve pain, function, and quality of life in patients with KOA (Brophy and Fillingham 2022; Bannuru et al. 2019; Masud et al. 2021). There is no consensus on which particular exercise modality is superior. Therefore, ChatGPT’s response correctly explains that an appropriate exercise regimen is individualized and realistic, taking into account patient comorbidities, personal goals, and the unique benefits of each modality (Brophy and Fillingham 2022; Lim and Al-Dadah 2022; Goh et al. 2019).
The AI response also includes weight management as an important reason to stay active, given its ability to slow the progression of KOA (Lim and Al-Dadah 2022; Gersing et al. 2019). Furthermore, the response accurately recommends avoiding excessive high-impact activity, such as running, and exercise requiring prolonged kneeling or squatting (Griffin and Guilak 2005; Petrigna et al. 2022). When employing activity in the prevention and management of joint degeneration, there is an optimal loading window, in which the joint is neither insufficiently loaded nor overloaded, that facilitates cartilage health (Griffin and Guilak 2005; Petrigna et al. 2022).
Question 5: Why is losing weight recommended for knee osteoarthritis?
DISCERN Score: 55
JAMA Benchmark Criteria: 0
Flesch-Kincaid: 32.1
The ChatGPT response correctly identifies key reasons why weight loss is recommended for individuals with KOA, including reduced joint stress, pain relief, improved joint function, delayed disease progression, enhanced response to treatment, and reduced risk of complications (Messier et al. 2000, 2018; Miller et al. 2006). These reasons are supported by current medical literature, which acknowledges the benefits of weight loss in managing KOA, particularly in terms of pain reduction and improved function (Kulkarni et al. 2016).
However, the response could be improved with a clarification regarding the complexity and variability of outcomes related to weight loss in KOA (Laperche et al. 2022). While losing weight generally offers benefits, it is not guaranteed to lessen complications for all patients undergoing knee arthroplasty, particularly those with severe obesity and/or osteoarthritis; the impact of weight loss can vary with individual circumstances and disease severity. A more elaborate discussion acknowledging these complexities, including the limitations as well as the benefits of weight loss, would provide a more comprehensive understanding of its role in KOA management (Godziuk et al. 2021).
Question 6: Is it safe to exercise with knee osteoarthritis?
DISCERN Score: 51
JAMA Benchmark Criteria: 0
Flesch-Kincaid: 34.6
The ChatGPT response regarding the safety and benefits of exercise for individuals with KOA is largely accurate and aligns well with current medical understanding. ChatGPT also appropriately emphasized the importance of consulting a healthcare provider before starting an exercise regimen for KOA (Young, Pedersen, and Bricca 2023). Exercise is recommended for KOA patients, as it can improve joint function and reduce pain (Tanaka et al. 2013). The emphasis on low-impact activities, strength training, and the need for a personalized approach based on individual condition and tolerance levels is supported by current literature (Zeng et al. 2021).
However, the ChatGPT response could benefit from a slight clarification regarding the variability of exercise outcomes and the importance of tailored exercise programs (Tiffreau et al. 2007; Messier et al. 2021). For example, while exercise is generally effective in improving pain and strength in KOA patients, the evidence on its impact on function, performance, and quality of life is less clear. Additionally, while land-based and aquatic exercises are known to improve KOA, several specific exercise modalities, such as stretching, plyometric, and proprioception training, have shown ambiguous results regarding their effectiveness in reducing KOA symptoms (Raposo, Ramos, and Lúcia Cruz 2021; Bosomworth 2009).
Question 7: How do corticosteroid injections help with knee osteoarthritis?
DISCERN Score: 51
JAMA Benchmark Criteria: 0
Flesch-Kincaid: 6.8
The ChatGPT response appropriately identified corticosteroid injections as a temporary measure for providing patients with KOA relief from pain and inflammation (Arroll and Goodyear-Smith 2004). Many investigations have validated the utility of intra-articular injections in managing pain and inflammation in the short term, with appreciable improvements lasting from two to six weeks (Arroll and Goodyear-Smith 2004; Jüni et al. 2015; Najm et al. 2021). The chatbot also provided insight into the anti-inflammatory effect of corticosteroid injections and the proposed mechanism of pain relief (Najm et al. 2021; Martin and Browne 2019). The chatbot, however, provided a wider range of pain relief expectations, from several weeks to a few months. Although previous studies have demonstrated a role for corticosteroid injections in the long-term management of KOA, recent literature indicates that their use is more appropriate for short-term management (Arroll and Goodyear-Smith 2004; Najm et al. 2021; Raynauld et al. 2003; Felson 2023). Further, the chatbot appropriately highlights the role of injections in improving joint function, their potential side effects, and their capacity to minimize systemic side effects because of the localized nature of their administration (Donovan et al. 2022; Kompel et al. 2019). Moreover, the chatbot defines the role of injections as one component of a larger comprehensive treatment plan and responsibly recommends physician consultation (Hsu and Siwiec 2023; Arden et al. 2021; Block and Cherny 2021).
Question 8: When should I consider knee replacement surgery for knee osteoarthritis?
DISCERN Score: 47
JAMA Benchmark Criteria: 0
Flesch-Kincaid: 20.8
The ChatGPT response regarding when to seek surgical intervention correctly indicated that knee arthroplasty typically follows failure of conservative treatment (Martel-Pelletier et al. 2019; Aweid et al. 2018; Collins, Hart, and Mills 2019; Afzali et al. 2018). The chatbot further highlights indications for surgery, including pain, loss of function, deformity, cartilage loss, and joint damage (Aweid et al. 2018; Schmitt et al. 2017). Factors such as male sex, young age, African American race, diabetes, cementless fixation, preoperative limb malalignment, and coronal laxity have all been demonstrated to affect prognosis (Parvizi, Hanssen, and Spangehl 2004; Jasper et al. 2016; Julin et al. 2010; Ekhtiari et al. 2021). In this regard, the chatbot emphasizes the importance of physician consultation, touching on how variable risks and benefits should be considered on a patient-to-patient basis (Nielsen et al. 2017).
Question 9: What are the risks of knee replacement surgeries?
DISCERN Score: 47
JAMA Benchmark Criteria: 0
Flesch-Kincaid: 34.3
The ChatGPT response accurately explained postoperative complications, including infection, thrombosis, inflammation, and stiffness (Alshammari et al. 2023). Although the chatbot provided a correct explanation of the risks of knee arthroplasty and encouraged discussion with a healthcare provider, it failed to provide complication rates. Access to these data may be an important consideration when deciding among management options. For example, knee arthroplasties have reported postoperative infection and thrombotic event rates of 1% (Gajda et al. 2023) and 1.3% (Keller et al. 2020), respectively. Knowing the postoperative risks helps patients make more informed decisions and provides an opportunity for deeper patient-physician communication.
Question 10: How long will my prosthetic knee last?
DISCERN Score: 44
JAMA Benchmark Criteria: 0
Flesch-Kincaid: 31.5
The ChatGPT response regarding the longevity of a prosthetic knee correctly explained that total knee implant longevity is multifactorial and varies by patient (Cohen et al. 2023). Additionally, current literature supports the chatbot’s statement that implants usually last in the range of 15–20 years (Hussain et al. 2016; Dalury 2016; Cohen et al. 2020; Berger et al. 2001; Dixon et al. 2005). The chatbot also recommended that patients communicate with their orthopedic surgeon regarding the lifespan of their knee implant, which is critical in building rapport and achieving patient-centered care (O’Reilly et al. 2022).
Discussion
ChatGPT’s integration into healthcare prompts an interesting exploration of its applicability in patient education, particularly in the context of KOA. Our study primarily aimed to assess the effectiveness of ChatGPT in providing reliable information for common patient questions regarding KOA. Across the ten questions, the average DISCERN score was 51. Three responses were considered good, six were fair, and one was poor (Table 4). However, all responses lacked reliable citations. Furthermore, while ChatGPT may have potential in patient education, it should only be used as a supplementary source of information; adequate patient education should be performed by a board-certified orthopedic surgeon. This highlights the importance of a balanced approach in integrating AI into patient education.
Several studies incorporating ChatGPT in orthopedic contexts have drawn similar conclusions, indicating that ChatGPT offers satisfactory preliminary information but requires clarification for improved decision making (Draschl et al. 2023). These findings suggest a cautious stance toward relying exclusively on AI-generated responses for medical advice. This study found that all responses received a JAMA Benchmark score of zero due to the lack of citations in the chatbot responses. Similarly, Eng et al. found that ChatGPT’s responses to common questions on rotator cuff repairs lacked reliable references and all received JAMA Benchmark scores of zero (Eng et al. 2024). Furthermore, other studies have found that ChatGPT lacks the ability to determine the reliability of the sources it utilizes to generate responses (Ayers et al. 2023). It also has a tendency to cite irrelevant articles, a phenomenon termed “hallucination” (Brega et al. 2015). In contrast, other investigations have highlighted inconsistencies in ChatGPT’s accuracy for self-diagnosing orthopedic conditions, pointing to the variability and potential risks of depending solely on AI in medical contexts (Kuroiwa et al. 2023; Massey, Montgomery, and Zhang 2023).
Additionally, the average Flesch-Kincaid score was 29.33, indicating a college reading level. The average American reads at an 8th grade level, and many sources therefore recommend that patient education resources not surpass a 6th grade reading level (Cotugna, Vickery, and Carpenter-Haefele 2005; Weiss et al. 1994; Brega et al. 2015). Writing at this level allows medical information to be both accurate and broadly comprehensible. Given the chatbot’s advanced language, patients may find it challenging to understand and utilize the information it provides. Similarly, Fahy et al. found that ChatGPT responses to patient questions about anterior cruciate ligament injuries were at least nine reading grade levels above the recommended 6th grade level (Fahy et al. 2024). Interestingly, the responses were more complex with the more advanced 4.0 model (Fahy et al. 2024). However, this finding is not limited to ChatGPT: internet-based patient education resources in sports medicine and sports orthopedics are also written at an advanced reading grade level. Ó Doinn et al. found that these materials exceeded the recommended reading grade level by at least four grade levels (Ó Doinn et al. 2022). To improve the utility of patient education materials and future iterations of ChatGPT, responses should be written at a lower reading grade level to optimize inclusivity for the general public.
Our findings affirm the multifaceted role of AI in healthcare. For specific conditions, like KOA, ChatGPT may be an effective tool to relay reliable information to patients. However, the level of reliability and need for additional clarification can significantly vary depending on the medical condition and context. This inconsistency underscores the importance of contextual evaluation when utilizing AI tools in healthcare, suggesting that AI may best function only as an adjunct to professional medical advice and consultation.
Limitations
While our study offers important insights, it is not without limitations. The selection of only ten questions for analysis introduces the possibility of selection bias, as this small sample may not adequately represent the full spectrum of patient inquiries regarding KOA, potentially limiting the generalizability of our findings. Additionally, the questions were analyzed using scoring systems that involve subjective measurements; the authors attempted to reduce this bias by having two authors score the responses independently. Furthermore, the chatbot failed to include reliable citations in its responses, which greatly limited the authors’ ability to apply the JAMA Benchmark criteria; due to the lack of citations, all responses received a score of zero. Lastly, ChatGPT’s responses vary with question phrasing and program updates, which can lead to inconsistencies in the information produced. These limitations underscore the need for cautious interpretation of our results and highlight areas for improvement in future research on AI applications in patient education.
Conclusion
ChatGPT-3.5 has the potential to be an informative tool for patients with questions about knee osteoarthritis. It was able to provide fair responses; however, some inquiries required clarification, and all responses lacked reliable citations. Furthermore, the responses were written at a college reading level, which limits their utility. Therefore, proper patient education should be conducted by orthopedic surgeons. This study highlights the need for patient education resources that are accessible, accurate, and comprehensible.