Introduction
Knee osteoarthritis (OA), or degenerative joint disease, is one of the most common reasons for presentation to orthopaedic and primary care offices (Weinstein et al. 2013; Turkiewicz et al. 2015; Van Manen, Nace, and Mont 2012). The prevalence of knee arthritis has grown considerably since the mid-twentieth century, affecting more than 50% of individuals over the age of 65 and approximately 80% of those over the age of 75 (Wallace et al. 2017; Arden and Nevitt 2006). Standing knee radiographs with multiple views are typically the first imaging study obtained to evaluate the presence and severity of OA (Boegård and Jonsson 1999; Duncan et al. 2015). At most institutions and radiology centers, a radiologist interprets the radiographs, and a written report is made available to the referring physician (primary care physician or orthopaedic surgeon). In some, but not all, cases the radiographs are reviewed by a radiologist specializing in musculoskeletal imaging. Additionally, radiographs are commonly repeated at the initial orthopaedic evaluation, even if ordered previously by a different physician (Yayac et al. 2021). Unfortunately, there is at present no gold standard radiographic scale for osteoarthritis, and this difficulty is compounded by the involvement of physicians across different specialties.
While there is no single gold standard classification method, multiple systems have been described for classifying radiographic knee osteoarthritis based on etiology, symptom duration and severity, and radiographic findings (Lespasio et al. 2017). The Kellgren-Lawrence (KL) system, described in 1957, is one of the most widely used systems for research purposes (Kellgren and Lawrence 1957). Despite its common use in research, the KL system is not widely used in clinical practice (Riddle, Jiranek, and Hull 2013). Its limitations include overemphasis of osteophytes and underemphasis of joint space narrowing, which has been shown to be a more reliable indicator of OA (Heng, Bin Abd Razak, and Mitra 2015; Kallman et al. 1989; Wright and The MARS Group 2014). In our experience, OA is commonly graded in clinical practice as “mild,” “moderate,” or “severe” disease without the use of any specific classification system. Thus, differences in each physician’s interpretation of the same image may lead to discrepancies in clinical documentation, patient management, and prior authorization.
The purpose of this study was to investigate agreement in the interpretation of knee radiographs between orthopaedic surgeons and radiologists using the simple, subjective terms commonly employed in practice. We examined agreement among orthopaedic surgeons specializing in arthroplasty, musculoskeletal radiologists, and general radiologists. Specifically, we investigated agreement in (1) the severity of OA and (2) the location of OA. We hypothesized that there would be moderate to strong agreement between physicians of the same specialty but lower agreement between those of different specialties.
Methods
Study Setting and Participants
One hundred five patients presenting to a single orthopaedic practice for unilateral knee pain were identified. Mean age was 62 ± 16 years, 65 patients (62%) were female, and mean body mass index (BMI) was 28 ± 7 kg/m². Standing anteroposterior (AP) and lateral radiographs were obtained for each patient. Patient history and demographic information were blinded before evaluation by each reviewer.
Six physicians independently reviewed the radiographs to characterize the severity and location of OA. Reviewers included two high-volume adult reconstruction orthopaedic surgeons, two fellowship-trained musculoskeletal (MSK) radiologists, and two general radiologists. For each set of radiographs, osteoarthritis was graded as “mild,” “moderate,” or “severe,” mirroring the language used in the providers’ clinical documentation. The location of degenerative changes was recorded as medial compartment, lateral compartment, patellofemoral (PF), or any combination thereof. Although blinded to the entirety of the patient’s clinical presentation, reviewers were also asked to indicate the perceived need for total knee arthroplasty (TKA) based solely on the severity noted on the knee radiographs.
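For illustration, these responses can be organized as one row per patient with one set of columns per reviewer. The sketch below is a hypothetical layout only; the object and column names are illustrative and are not taken from the study’s data:

```r
# Hypothetical layout only: one row per patient, one set of columns per reviewer.
# Severity is an ordered categorical grade; compartment involvement and
# perceived need for TKA are recorded as logical (yes/no) flags.
ratings <- data.frame(
  patient_id     = 1:105,
  severity_surg1 = factor(NA, levels = c("mild", "moderate", "severe")),
  medial_surg1   = NA,  # medial compartment OA noted
  lateral_surg1  = NA,  # lateral compartment OA noted
  pf_surg1       = NA,  # patellofemoral OA noted
  tka_surg1      = NA   # perceived need for TKA
  # ... analogous columns for the remaining five reviewers
)
```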
Statistical Analysis
Agreement was assessed using Fleiss’ kappa for categorical variables. Kappa values less than 0.3 were considered no true agreement, 0.3 to 0.5 weak agreement, 0.5 to 0.8 moderate agreement, and greater than 0.8 strong agreement. Moderate or strong agreement was considered reliable. We compared agreement among readers of the same specialty and between specialties using the following groups: (1) surgeons and all radiologists, (2) surgeons and MSK radiologists, (3) surgeons and general radiologists, and (4) MSK radiologists and general radiologists. All statistical analyses were performed in R (version 3.6.3; R Foundation for Statistical Computing, Vienna, Austria).
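For readers interested in reproducing this type of analysis, a minimal sketch of the Fleiss’ kappa calculation is shown below; it assumes a subjects-by-raters matrix of severity grades and uses the irr package, which is an illustrative choice rather than the study’s actual analysis code:

```r
# Minimal sketch: Fleiss' kappa for multi-rater categorical agreement.
# Assumes 'severity_ratings' is a 105 x 6 matrix (patients x reviewers)
# containing "mild" / "moderate" / "severe" grades.
library(irr)

set.seed(1)
severity_ratings <- matrix(
  sample(c("mild", "moderate", "severe"), 105 * 6, replace = TRUE),
  nrow = 105, ncol = 6,
  dimnames = list(NULL, c("surg1", "surg2", "msk1", "msk2", "gen1", "gen2"))
)

# Agreement across all six reviewers
kappam.fleiss(severity_ratings)

# Agreement within a subgroup, e.g., the two arthroplasty surgeons
kappam.fleiss(severity_ratings[, c("surg1", "surg2")])
```

The resulting kappa value is then interpreted against the thresholds above (less than 0.3 no true agreement, 0.3 to 0.5 weak, 0.5 to 0.8 moderate, greater than 0.8 strong).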
Results
Overall Agreement
When comparing reads across all reviewers, we found weak agreement in the assessment of severity and of PF OA (Table 1). There was no true agreement in the assessment of medial, lateral, or tricompartmental OA. No assessment reached moderate or strong agreement across the entire group of reviewers.
Agreement Within Orthopaedic Surgeons
Orthopaedic surgeons demonstrated weak agreement in assessment of severity and lateral compartment OA, but no true agreement for medial, PF, or tricompartmental OA (Table 2). They did, however, show moderate agreement (κ = 0.503) in their perceived need for TKA based on radiographic findings.
Agreement Among All Radiologists (MSK and General)
The radiologists combined as a single group showed moderate agreement in the assessment of PF OA, weak agreement in severity, and no true agreement for all other locations and for the recommendation of TKA (Table 3). MSK radiologists had weak agreement in the assessment of severity and PF OA and no true agreement for medial, lateral, or tricompartmental OA (Table 4). Similar to the orthopaedic surgeons, they demonstrated moderate agreement in the perceived need for TKA (κ = 0.568), the strongest agreement within any single specialty.
Agreement Across Specialties
There was weak agreement in the assessment of severity in all four comparison groups (Table 5), with the highest values between general and MSK radiologists (κ = 0.438) and between surgeons and MSK radiologists (κ = 0.438). Location of OA showed the lowest agreement of any between-group comparison, with κ values ranging from 0.126 to 0.183.
Discussion
The most important finding of this study was that the commonly used designations of mild, moderate, and severe arthritis reported for knee radiographs are neither consistent nor reproducible. We found generally weak agreement both among and between orthopaedic surgeons and radiologists in the interpretation of radiographic knee arthritis based on an assessment of the severity and location of disease. Only three comparisons reached moderate agreement: (1) assessment of PF arthritis by radiologists, (2) perceived need for TKA by orthopaedic surgeons, and (3) perceived need for TKA by MSK radiologists. The perceived need for TKA is merely a subjective judgment of whether the patient would be a candidate for TKA based solely on radiographic OA severity; it is not clinically actionable on its own, as that decision requires the patient’s full clinical picture and physical examination. No comparison resulted in strong agreement within or between specialties. These findings suggest that adoption of, and adherence to, a standard classification system that is both reliable and practical for clinical use is needed for consistent interpretation of radiographic knee arthritis.
The Kellgren and Lawrence grading system is widely used and was found to have the highest inter- and intra-observer correlation coefficients for the severity of knee arthritis (0.83 for both) compared with other joints (Kellgren and Lawrence 1957). Despite its popularity, the KL grading system is not without drawbacks. Wright et al. evaluated the interrater reliability of six classification systems (KL, International Knee Documentation Committee (IKDC), Fairbank, Brandt et al., Ahlback, and Jager-Wirth) for degenerative changes in patients undergoing revision anterior cruciate ligament reconstruction (Wright and The MARS Group 2014). For the KL classification, they reported intraclass correlation coefficients of 0.38 for AP radiographs and 0.54 for Rosenberg flexion radiographs (Wright and The MARS Group 2014). They found the IKDC classification, which is based on the degree of joint space narrowing, to have the best combination of interrater reliability and correlation with arthroscopic findings (Wright and The MARS Group 2014; Mehta et al. 2007). A study by Riddle et al. examined the interrater reliability of the KL system among arthroplasty surgeons, finding moderate to high agreement in knees indicated for TKA but lower agreement in contralateral knees (Riddle, Jiranek, and Hull 2013).
While different radiographic classification systems have demonstrated various advantages and disadvantages for research purposes, the lack of a gold standard system for grading radiographic arthritis leads to a variety of clinical approaches to interpreting knee radiographs. Although widely recognized and used in practice, the KL system has been shown to underpredict the degree of OA observed intraoperatively at the time of arthroplasty (Abdelaziz et al. 2019; Blackburn et al. 1994). Complicating matters further, even within the KL system, different versions of the same criteria have led to lower agreement between readers (Schiphof et al. 2011). While studies have investigated the reliability of these systems, little has been reported on how frequently they are used in the routine evaluation of radiographs. It is critical that reviewers, both within and between specialties, speak the same language when communicating and documenting in patient notes. Given that only variable agreement has been demonstrated in the literature using these well-established systems, our findings suggest even lower agreement when reviewers grade OA subjectively, as is commonly the case in practice.
From a patient’s perspective, inconsistent reporting and charting of radiographic findings can have real consequences. Now that patients can readily access their charts and read clinical documentation, these inconsistencies can be anxiety-provoking (Meyer et al. 2021). When one provider reports “mild” disease and another reports a different grade for the same knee, the diagnostic uncertainty can cause unnecessary stress and potential mistrust. Additionally, insurance companies often deny surgery based on the radiologist’s reading, for example when OA is documented as mild to moderate rather than severe. Therefore, communication with loosely defined terms such as “mild,” “moderate,” and “severe” can lead to prior authorization issues for the patient and payor.
Previous studies have investigated agreement between surgeons and radiologists in other orthopaedic subspecialties, with variable results. One study demonstrated higher agreement among radiologists than among surgeons when evaluating chondral knee lesions on magnetic resonance imaging (MRI) (Cavalli et al. 2011). Another showed that experienced surgeons were more accurate at identifying shoulder lesions on MRI when compared with intraoperative findings (van Grinsven et al. 2015). A study of the radiographic diagnosis of femoroacetabular impingement demonstrated higher interobserver reliability within the same specialty but poor agreement between radiologists and surgeons (Ayeni et al. 2014). Assessment of hip fracture healing, another area with poor agreement and no reliable standard, was shown to improve with a standardized union score (Chiavaras et al. 2013). These studies demonstrate inconsistent agreement across multiple orthopaedic subspecialties, which we found to be true for knee OA as well.
There are several limitations to note. The radiographs reviewed included AP and lateral views; the inclusion of Rosenberg flexion and posteroanterior (PA) views would have given the reviewers a more complete assessment, which could alter the agreement findings. A sunrise view would also have been helpful for assessing PF osteoarthritis. Additionally, to limit the workload on our reviewers, this study did not include a measure of intra-rater reliability. There is also potential for experience bias, in that the orthopaedic surgeons and MSK radiologists may be more accustomed to interpreting knee radiographs; nevertheless, all six reviewers were from a high-volume institution and had prior experience interpreting knee osteoarthritis. Another limitation is the selection of radiographs from a single orthopaedic practice in a metropolitan area. Radiographs were not reviewed before the study to assess quality of alignment; therefore, some inconsistencies may have existed between images. Additionally, the two orthopaedic surgeons were fellowship-trained, high-volume arthroplasty surgeons, so our findings may not be generalizable to other subspecialty or generalist orthopaedic surgeons. Finally, the instruction to grade osteoarthritis as mild, moderate, or severe was likely interpreted subjectively by each reviewer; however, this subjective interpretation contributed to our main finding of inconsistent agreement and mirrors ordinary practice.
Utilizing and strictly adhering to a standard grading system for evaluating radiographic knee arthritis remains a challenge, both within and between specialties. While the ultimate decision to undergo TKA must incorporate the patient’s history and physical examination, radiographs play an integral role in the evaluation and grading of arthritis, and without objectively stated findings the utility of such a report must be questioned. Radiographic findings also frequently factor into third-party payers’ determinations of surgical necessity, raising issues of prior authorization. Our study demonstrated that, even between physicians of the same specialty, there remains a high degree of inconsistency, and this inconsistency was even more pronounced when comparing across specialties. This may in turn create obstacles to obtaining third-party payor approval, leading to challenges in providing timely and appropriate care. Future research should seek to identify how often language such as “mild,” “moderate,” and “severe” is used to grade OA rather than standardized grading systems. Establishing and adhering to a reliable and efficient gold standard for clinical use is critical to improving decision making and communication among physicians, patients, and third-party payers.