Review Article
Vol. 2, Issue 2, 2021 • August 23, 2021 EDT

Fairness in AI: How Can We Avoid Bias and Disparities in Orthopedic Applications of Artificial Intelligence?

Karl Surmacz, PhD, Atul F Kamath, MD, Dave Van Andel, MS
Keywords: health equity, machine learning, artificial intelligence
CC BY-NC-ND 4.0 • https://doi.org/10.60118/001c.25901
J Orthopaedic Experience & Innovation
Surmacz, Karl, Atul F Kamath, and Dave Van Andel. 2021. “Fairness in AI: How Can We Avoid Bias and Disparities in Orthopedic Applications of Artificial Intelligence?” Journal of Orthopaedic Experience & Innovation 2 (2). https://doi.org/10.60118/001c.25901.

Abstract

Recent advances in artificial intelligence have the potential to transform the field of orthopedics. Alongside these opportunities, there are numerous challenges associated with applying AI to clinical decision-making, one such example being algorithmic fairness. In this article we introduce the concepts of bias and fairness in machine learning from an orthopedics perspective, covering definitions, examples, possible approaches, and implications for the community. We hope that by working to embed these concepts and associated best practices into health data-product development workflows, we can help to promote fair and effective use of these powerful tools for all patients.

Introduction

Recent advances in artificial intelligence (AI) technology have created huge opportunities to transform healthcare (Davenport and Kalakota 2019). Interest in machine learning (ML) approaches to AI, specifically around deep learning techniques (LeCun, Bengio, and Hinton 2015), has exploded in the last few years. Exploration in this domain has been powered by an exponential increase in the availability of data and the seemingly limitless access to cloud computing power.

AI has the potential to improve health outcomes for patients through enhancing clinical decision-making (Barbieri et al. 2016; Gulshan et al. 2016), smart tooling and planning for surgical procedures (Knoops et al. 2019), improving support for patients through illness and recovery (Smuck, Odonkor, Wilt, et al. 2021), and increasing efficiencies in care systems (Bush 2018; Karnuta et al. 2019, 2020), to name a few areas. Orthopedics as a clinical field is ripe for innovation in this space (Haeberle et al. 2019). MedTech companies in orthopedics are pioneering the use of technologies such as wearables and smart devices for patient monitoring and engagement (Zimmer Biomet, n.d.-a) and clinical decision support (Kumar et al. 2021), as well as robotics and surgical planning tools that can lead to more objective technical performance intraoperatively (Karuppiah and Sinha 2018). Underpinning these technologies is a rich data collection environment – a digital ecosystem – where quantitative learning and algorithmic development have an opportunity to thrive.

However, there are numerous challenges associated with applying AI to clinical decision-making. One such challenge concerns fairness: equitable decision-making that does not discriminate based on biases encoded in the datasets and methods used to develop these systems. There are already many examples of ML systems used in areas such as law enforcement (Angwin et al. 2016; Chouldechova 2017), education (Kizilcec and Lee, forthcoming) and employability (Raghavan et al. 2020) that have been shown to discriminate against under-represented groups. Designing fairness into AI-driven healthcare products is critical to help provide benefit to all people and avoid systematically disadvantaging minority populations - or worse, actively reinforcing that discrimination (O’Neil 2016).

The lack of transparency (or ‘black-box’ nature) of some AI models, which can bear little cognitive resemblance to the problem they are solving, can exacerbate these issues. It is often a non-trivial problem to understand the reasoning behind a decision or recommendation made by such a model. For example, in a scenario where a decision-support tool recommends that a patient’s total knee arthroplasty (TKA) surgery should be delayed, the patient and their clinical team would benefit from understanding the factors leading to such a decision. AI model interpretability and explainability are highly active topics of current research and development (Gilpin et al. 2018; “Explainable Machine Learning Challenge,” n.d.).

The field and industry are shifting in order to address these challenges:

  • Institutions are putting together guidelines (Awwad et al., n.d.) and toolkits (Bellamy et al., n.d.; Bird et al., n.d.) for promoting fairness and mitigating bias in ML applications

  • Companies are building out their data platforms and tooling to deliver data products in a principled way

  • The FDA is supporting the development of methods to build fairness into its AI-as-a-medical-device framework (Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) Action Plan, n.d.), the principles of which include enhancing trust in such systems through transparency, and supporting regulatory science research into methodologies for identifying and mitigating bias

In this editorial, we introduce the concepts of algorithmic fairness to the orthopedic community, to build awareness when developing and encountering AI-driven solutions. We will also discuss possible strategies to mitigate these issues and to minimize inequity in healthcare applications of AI.

Fair Machine Learning: An Introduction

To introduce the topic, let us consider an example. A large number of factors have been associated with patient outcomes after a joint replacement, from clinical parameters to treatment variables and patient characteristics. One such factor is socioeconomic status (SES), often characterized by composite deprivation measures such as the Index of Multiple Deprivation (IMD), for which there is mixed evidence of association with outcomes (Martsolf et al. 2016; DeKeyser et al. 2020). It is tempting to consider a measure of SES as an input to an AI-based decision tool to support an orthopedic pathway – we all have an intuitive sense of how SES and social factors may influence the outcome of joint replacement procedures. However, one would need to take care when doing so:

  • SES may be strongly associated with other factors, such as access to high-quality care, which may be the underlying explanation for patient outcome

  • Perceptions associated with characteristics such as SES can lead to bias in human decision-making

  • SES may covary with other evolving metrics, such as social determinants of health

The above could inadvertently lead to an AI model unfairly discriminating against patients from a low socioeconomic background – even if traditional model performance metrics remain high.

To illustrate further, we will consider 3 hypothetical examples of AI-driven products centered around the orthopedic episode-of-care:

  1. A software tool that highlights the Total Joint Replacement patients within a clinical team’s cohort most at risk of poor quality-of-life outcome, so that a clinician can help with their recovery

  2. A software-based risk assessment tool that recommends to a surgeon whether to delay or deny surgery, based on patient co-morbidities / characteristics and expected patient outcome

  3. A software-based optimization tool that assigns patients to surgical teams based on surgeon preference and expected outcomes

In use-case (1), it is possible that the tool would be more likely to highlight patients from low-SES backgrounds, resulting in greater support from the clinical team. This may yield insights that reduce disparate healthcare outcomes in populations with lower SES or in those impacted by racial biases. In use-case (2), however, without appropriate consideration during product validation, the output of such an AI product could lead to low-income patients disproportionately having their access to surgery delayed or denied.

The potential consequences in use-case (3) are even more detrimental. Suppose that a group of highly skilled orthopedic surgeons is reluctant to take on patients with low SES because of the risk of poor outcomes/complications or other issues such as lack of acceptable reimbursement. The tool in this case would pick up this bias and recommend pairing such patients with other surgical teams. Such logic would reduce low-SES patients’ access to skilled surgeons, and may even reinforce this undesirable selection of patients. Biases can also find their way into datasets in other ways, such as through selection criteria for treatment programs with limited resources, some of which may be based on the care team’s perceptions.

These issues are not specific to healthcare. Applications such as facial recognition (Buolamwini and Gebru 2018) and loan decisions (Rice and Swesnik, n.d.) have previously received scrutiny for algorithms that inadvertently discriminated based on race, age and gender. As we have shown, the level of risk can depend heavily on the use-case. Essentially, we must ask: is there a risk that a person in a vulnerable or minority group can be assigned a negative or detrimental outcome simply by virtue of being in that group? Fortunately, there is a growing body of research and tools to help mitigate and avoid these biases (Zemel et al. 2013), which we will touch on in the next section.
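
As a concrete illustration of that question, the minimal Python sketch below (with made-up column names and toy data, not from any of the products discussed here) compares the rate at which a hypothetical decision tool assigns the favorable outcome across two patient groups; a ratio well below one is a simple first flag that group membership may be driving outcomes.

```python
import pandas as pd

def selection_rates(df: pd.DataFrame, group_col: str, decision_col: str) -> pd.Series:
    """Share of patients in each group receiving the favorable decision."""
    return df.groupby(group_col)[decision_col].mean()

# Toy data: 1 = surgery recommended, 0 = delayed/denied (hypothetical columns).
decisions = pd.DataFrame({
    "ses_group": ["low", "low", "low", "low", "high", "high", "high", "high"],
    "recommended": [0, 1, 0, 0, 1, 1, 1, 0],
})

rates = selection_rates(decisions, "ses_group", "recommended")
disparate_impact = rates.min() / rates.max()  # ratios well below 1 flag a group-level disparity
print(rates)
print(f"Disparate impact ratio: {disparate_impact:.2f}")
```

This kind of check is deliberately crude; which groups, decisions and thresholds are appropriate depends entirely on the use-case, as discussed above.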

Approaches

There are numerous approaches for detecting and mitigating bias and unfairness in ML solutions. We shall briefly list a few examples here; for a more complete discussion of the technical methods, please refer to the survey by Caton and Haas (2020) and the references therein.

Social determinants of health (SDoH) are known to influence health equity. To avoid any unwanted influence of these parameters, one could consider them ‘protected’, and hence not usable as inputs to ML models. A similar approach is to only consider ‘modifiable risk factors,’ which a patient has some ability to influence or change. However, in our setting, SDoH could help to identify patients who need extra support and/or resources. Again, it depends on the use-case and the risk that this information does more harm than good.
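
As a rough sketch of what treating attributes as ‘protected’ can look like in practice (the column names below are hypothetical), the snippet drops protected/SDoH columns from a feature table and then screens the remaining numeric features for strong correlation with a protected attribute, since proxies can quietly reintroduce the information that was removed.

```python
import pandas as pd

# Hypothetical protected / SDoH columns; the real list is use-case specific.
PROTECTED = ["ses_index", "race", "insurance_type"]

def drop_protected(features: pd.DataFrame) -> pd.DataFrame:
    """Exclude protected attributes from the model's feature set."""
    return features.drop(columns=[c for c in PROTECTED if c in features.columns])

def proxy_check(features: pd.DataFrame, protected: pd.Series, threshold: float = 0.5) -> pd.Series:
    """Flag remaining numeric features that correlate strongly with a protected
    attribute: dropping a column does not remove information carried by its proxies."""
    corr = features.select_dtypes("number").corrwith(protected).abs()
    return corr[corr > threshold].sort_values(ascending=False)

# Usage sketch: 'cohort' would be a hypothetical patient feature table.
# model_features = drop_protected(cohort)
# print(proxy_check(model_features, cohort["ses_index"]))
```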

We can also validate prediction quality across different population sub-groups to minimize discrepancies between them. For example, a tool predicting risk of rehospitalization could achieve high accuracy across the whole population, yet perform significantly worse on a minority group. This sub-group validation could happen at the demographic, geographic and even provider level. Models trained predominantly on data from high-end orthopedic providers, where only patients with good access are able to receive joint replacements, may not translate well to geographies with significantly different demographics.
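
One way to operationalize this sub-group validation is with an open-source toolkit such as Fairlearn (Bird et al., n.d.), cited above. The sketch below, assuming Fairlearn’s MetricFrame API and using toy labels with a hypothetical "site" grouping, reports accuracy and recall overall, per sub-group, and as a worst-case gap between groups.

```python
from fairlearn.metrics import MetricFrame
from sklearn.metrics import accuracy_score, recall_score

# Toy labels: 1 = rehospitalized within 90 days. "site" stands in for any
# sub-group of interest (demographic, geographic, or provider level).
y_true = [1, 0, 0, 1, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1]
site   = ["A", "A", "A", "A", "B", "B", "B", "B"]

mf = MetricFrame(
    metrics={"accuracy": accuracy_score, "recall": recall_score},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=site,
)

print(mf.overall)        # aggregate performance across the whole cohort
print(mf.by_group)       # the same metrics broken out per sub-group
print(mf.difference())   # worst-case gap between sub-groups, per metric
```

What counts as an acceptable gap is a clinical and regulatory judgment, not a property of the code; the value of the tooling is simply that the gap is measured and reviewed rather than hidden inside an aggregate score.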

ML can also be used to reduce unexplained orthopedic disparities in care. Underserved populations often experience healthcare disparities even after accounting for the objective severity of osteoarthritis. A recent study (Pierson et al. 2021) trained an ML algorithm to estimate reported pain directly from knee radiographs, detecting image variations not captured by traditional severity scoring. This approach explained a significantly higher proportion of the pain disparity between sub-groups than radiologist grading of severity did. Much of the increased pain experienced by these underserved populations, previously attributed to external sociocultural causes, was more likely due to anatomical factors not currently captured in radiographic assessment. Findings such as these can lead to reduced disparity in treatment access for underserved populations, in this case by focusing more on surgical/anatomical intervention and less on treatments related to psychosocial support.

Population demographics and the measures associated with them may change over time. Such a shift could cause the performance of ML-driven products to decrease. Monitoring real-world datasets for drift and bias, creating robust solutions resistant to outliers, and applying a post-market surveillance approach can all help mitigate the impact of these changes.
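
A lightweight sketch of such drift monitoring is shown below: a two-sample Kolmogorov–Smirnov test (SciPy) comparing the distribution of a single feature between a development-time baseline and post-market data. The feature, threshold and simulated ages are purely illustrative assumptions; a production system would monitor many features and act on sustained, clinically reviewed signals rather than a single test.

```python
import numpy as np
from scipy.stats import ks_2samp

def feature_drift(baseline: np.ndarray, current: np.ndarray, alpha: float = 0.01) -> bool:
    """Two-sample Kolmogorov-Smirnov test: flags a feature whose real-world
    distribution has shifted away from the training-time baseline."""
    result = ks_2samp(baseline, current)
    return result.pvalue < alpha

# Simulated example: patient age at surgery drifting younger post-market.
rng = np.random.default_rng(0)
baseline_age = rng.normal(68, 8, size=5000)  # distribution seen during model development
current_age = rng.normal(63, 8, size=5000)   # distribution observed in the field
print(feature_drift(baseline_age, current_age))  # True -> trigger review / retraining
```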

Some issues cannot be managed by data scientists alone; they require broader societal and political prioritization of equitable healthcare. Changing business models around reimbursement to encourage fair and equitable treatment, in the face of the challenges of outcome disparities, could lead to more just access to care, valuing the dignity of every patient. Payers will have a responsibility to promote fair and effective use of these powerful tools, while providers of these products might contract to ensure appropriate use.

Outlook

The application of AI in healthcare and orthopedics is still in its early stages. As a community, we are well placed to learn from mistakes made in other industries. We must use recent advances in technology to build solid foundations for ethical AI that serve the wider population and build trust in institutions.

Awareness and mitigation of these issues can increase the likelihood that AI-based decision-support tools improve quality-of-care for underserved population sub-groups, turning potential risk into tangible benefit. Initiatives can be put in place to collect quality data, such as those from wearable technologies, from these populations. This has the added benefit of countering these communities’ historic lack of systemic access, as well as rebuilding trust in the institutions initiating such projects.

It may seem that adoption is already underway, and that any issues around bias and fairness cannot stop it. However, products perceived as unfair will sow mistrust among users, which could leave patients less comfortable sharing the data upon which these applications depend. Also, as we have seen recently in regions like Europe (“Proposal for a Regulation Laying down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act),” n.d.), top-down regulation can be brought in to protect individuals if standards are not met. It is better to shape best practice responsibly now than to have it imposed in the future.

Forward-thinking organizations working at the intersection of providers, payors and clinical consumers should consider the following:

  • For solution providers: how are you building out your technology stack (Zimmer Biomet, n.d.-b) to evaluate and mitigate bias and inequity in your solutions? What metrics are you using to quantify this? Whilst there is no ‘one-size-fits-all’ approach, numerous open-source tools are emerging to tackle the problem. Investment in this now will mitigate future technical and reputational debt.

  • How can the relationships that you build out between payors and providers encourage and incentivize this way of thinking? If you are adopting AI-driven products, ask about how they have been developed to promote equitable outcomes. Setting out solid foundations early on for use of AI within healthcare will save time and effort in the long-run.

  • Are the populations and cohorts used to validate your products representative enough? What initiatives can be put in place to access greater population diversity in your clinical studies?

  • How are you embedding fairness into your regulatory frameworks and processes? For regulatory organizations: how can you encourage organizations to do this in a way that does not carry a burden that might deter them?

  • Take a look at the teams that are designing, building and using AI products. How likely are they to understand and spot risk around lack of inclusivity? What can you do to bake these principles into your organizational culture (“J. Robert Gladden Orthopaedic Society,” n.d.)? Teams and companies that engage with the concepts of diversity within orthopedics (“Movement Is Life Caucus,” n.d.; CEO Action For Diversity & Inclusion, n.d.) are more likely to build products that reflect these values.

As AI technology sits poised to transform healthcare, it is critical that good ethical practice around algorithmic fairness becomes commonplace within healthcare data science teams, and that the industry shifts to meet this need. The orthopedic community’s continued attention to designing these principles into data- and AI-driven products from the outset will help ensure that all patients benefit from this transformation.

Submitted: June 14, 2021 EDT

Accepted: July 17, 2021 EDT

References

Angwin, Julia, Jeff Larson, Surya Mattu, and Lauren Kirchner. 2016. “Machine Bias.” ProPublica, May 23, 2016.
Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) Action Plan. n.d. US Food and Drug Administration. https://www.fda.gov/media/145022/download.
Awwad, Yazeed, Richard Fletcher, Daniel Frey, Amit Gandhi, Maryam Najafian, and Mike Teodorescu. n.d. “Exploring Fairness in Machine Learning for International Development.” https://d-lab.mit.edu/resources/publications/exploring-fairness-machine-learning-international-development.
Barbieri, Carlo, Manuel Molina, Pedro Ponce, Monika Tothova, Isabella Cattinelli, Jasmine Ion Titapiccolo, Flavio Mari, et al. 2016. “An International Observational Study Suggests That Artificial Intelligence for Clinical Decision Support Optimizes Anemia Management in Hemodialysis Patients.” Kidney International 90 (2): 422–29. https://doi.org/10.1016/j.kint.2016.03.036.
Bellamy, Rachel K.E., et al. n.d. “AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias.”
Bird, Sarah, et al. n.d. “Fairlearn: A Toolkit for Assessing and Improving Fairness in AI.” https://www.microsoft.com/en-us/research/publication/fairlearn-a-toolkit-for-assessing-and-improving-fairness-in-ai/.
Buolamwini, Joy, and Timnit Gebru. 2018. “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification.” In Proceedings of the 1st Conference on Fairness, Accountability and Transparency, PMLR 81:77–91.
Bush, J. 2018. “How AI Is Taking the Scut Work out of Health Care.” Harvard Business Review. https://hbr.org/2018/03/how-ai-is-taking-the-scut-work-out-of-health-care.
Caton, Simon, and Christian Haas. 2020. “Fairness in Machine Learning: A Survey.” https://arxiv.org/pdf/2010.04053.pdf.
CEO Action For Diversity & Inclusion. n.d. “CEO Pledge Commitments.” https://www.ceoaction.com/pledge/ceo-pledge/.
Chouldechova, Alexandra. 2017. “Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments.” Big Data 5 (2): 153–63.
Davenport, Thomas, and Ravi Kalakota. 2019. “The Potential for Artificial Intelligence in Healthcare.” Future Healthcare Journal 6 (2): 94–98. https://doi.org/10.7861/futurehosp.6-2-94.
DeKeyser, G.J., M.B. Anderson, H.D. Meeks, C.E. Pelt, C.L. Peters, and J.M. Gililland. 2020. “Socioeconomic Status May Not Be a Risk Factor for Periprosthetic Joint Infection.” The Journal of Arthroplasty 35 (7): 1900–1905.
“Explainable Machine Learning Challenge.” n.d. https://community.fico.com/s/explainable-machine-learning-challenge.
Gilpin, Leilani H., David Bau, Ben Z. Yuan, Ayesha Bajwa, Michael Specter, and Lalana Kagal. 2018. “Explaining Explanations: An Overview of Interpretability of Machine Learning.” In 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA). IEEE. https://doi.org/10.1109/dsaa.2018.00018.
Gulshan, Varun, Lily Peng, Marc Coram, Martin C. Stumpe, Derek Wu, Arunachalam Narayanaswamy, Subhashini Venugopalan, et al. 2016. “Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs.” JAMA 316 (22): 2402. https://doi.org/10.1001/jama.2016.17216.
Haeberle, Heather S., James M. Helm, Sergio M. Navarro, Jaret M. Karnuta, Jonathan L. Schaffer, John J. Callaghan, Michael A. Mont, Atul F. Kamath, Viktor E. Krebs, and Prem N. Ramkumar. 2019. “Artificial Intelligence and Machine Learning in Lower Extremity Arthroplasty: A Review.” The Journal of Arthroplasty 34 (10): 2201–3. https://doi.org/10.1016/j.arth.2019.05.055.
“J. Robert Gladden Orthopaedic Society.” n.d. https://www.gladdensociety.org/.
Karnuta, Jaret M., Joshua L. Golubovsky, Heather S. Haeberle, Prashant V. Rajan, Sergio M. Navarro, Atul F. Kamath, Jonathan L. Schaffer, Viktor E. Krebs, Dominic W. Pelle, and Prem N. Ramkumar. 2020. “Can a Machine Learning Model Accurately Predict Patient Resource Utilization Following Lumbar Spinal Fusion?” The Spine Journal 20 (3): 329–36. https://doi.org/10.1016/j.spinee.2019.10.007.
Karnuta, Jaret M., Sergio M. Navarro, Heather S. Haeberle, J. Matthew Helm, Atul F. Kamath, Jonathan L. Schaffer, Viktor E. Krebs, and Prem N. Ramkumar. 2019. “Predicting Inpatient Payments Prior to Lower Extremity Arthroplasty Using Deep Learning: Which Model Architecture Is Best?” The Journal of Arthroplasty 34 (10): 2235–2241.e1. https://doi.org/10.1016/j.arth.2019.05.048.
Karuppiah, Karthik, and Joydeep Sinha. 2018. “Robotics in Trauma and Orthopaedics.” The Annals of The Royal College of Surgeons of England 100 (Suppl 6): 8–15. https://doi.org/10.1308/rcsann.supp1.8.
Kizilcec, René F., and Hansol Lee. Forthcoming. “Algorithmic Fairness in Education.” In Ethics in Artificial Intelligence in Education, edited by W. Holmes and K. Porayska-Pomsta. Taylor & Francis.
Knoops, Paul G. M., Athanasios Papaioannou, Alessandro Borghi, Richard W. F. Breakey, Alexander T. Wilson, Owase Jeelani, Stefanos Zafeiriou, et al. 2019. “A Machine Learning Framework for Automated Diagnosis and Computer-Assisted Planning in Plastic and Reconstructive Surgery.” Scientific Reports 9 (1). https://doi.org/10.1038/s41598-019-49506-1.
Kumar, Vikas, Christopher Roche, Steven Overman, Ryan Simovitch, Pierre-Henri Flurin, Thomas Wright, Joseph Zuckerman, Howard Routman, and Ankur Teredesai. 2021. “Using Machine Learning to Predict Clinical Outcomes after Shoulder Arthroplasty with a Minimal Feature Set.” Journal of Shoulder and Elbow Surgery 30 (5): e225–36. https://doi.org/10.1016/j.jse.2020.07.042.
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. 2015. “Deep Learning.” Nature 521 (7553): 436–44. https://doi.org/10.1038/nature14539.
Martsolf, G.R., M.L. Barrett, A.J. Weiss, R. Kandrack, R. Washington, C.A. Steiner, A. Mehrotra, N.F. SooHoo, and R. Coffey. 2016. “Impact of Race/Ethnicity and Socioeconomic Status on Risk-Adjusted Hospital Readmission Rates Following Hip and Knee Arthroplasty.” The Journal of Bone and Joint Surgery, American Volume 98 (16): 1385–91.
“Movement Is Life Caucus.” n.d. https://www.movementislifecaucus.com/.
O’Neil, Cathy. 2016. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. New York, NY: Crown Publishing Group.
Pierson, Emma, David M. Cutler, Jure Leskovec, Sendhil Mullainathan, and Ziad Obermeyer. 2021. “An Algorithmic Approach to Reducing Unexplained Pain Disparities in Underserved Populations.” Nature Medicine 27 (1): 136–40. https://doi.org/10.1038/s41591-020-01192-7.
“Proposal for a Regulation Laying down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act).” n.d. https://digital-strategy.ec.europa.eu/en/library/proposal-regulation-laying-down-harmonised-rules-artificial-intelligence-artificial-intelligence.
Raghavan, Manish, Solon Barocas, Jon Kleinberg, and Karen Levy. 2020. “Mitigating Bias in Algorithmic Hiring: Evaluating Claims and Practices.” In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (FAT* ’20), 469–81. New York, NY: Association for Computing Machinery.
Rice, Lisa, and Deidre Swesnik. n.d. Discriminatory Effects of Credit Scoring on Communities of Color.
Smuck, M., C.A. Odonkor, J.K. Wilt, et al. 2021. “The Emerging Clinical Role of Wearables: Factors for Successful Implementation in Healthcare.” NPJ Digital Medicine 4: 45.
Zemel, Rich, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. 2013. “Learning Fair Representations.” In Proceedings of the 30th International Conference on Machine Learning, PMLR 28 (3): 325–33.
Zimmer Biomet. n.d.-a. “mymobility with Apple Watch.” https://www.zimmerbiomet.com/medical-professionals/zb-edge/mymobility.html.
———. n.d.-b. “ZBEdge.” https://www.zimmerbiomet.com/medical-professionals/zb-edge.html.
