Review Article
Vol. 2, Issue 2, 2021 • August 23, 2021 EDT

Fairness in AI: How Can We Avoid Bias and Disparities in Orthopedic Applications of Artificial Intelligence?

Karl Surmacz, PhD, Atul F Kamath, MD, Dave Van Andel, MS
Keywords: health equity, machine learning, artificial intelligence
CC BY-NC-ND 4.0 • https://doi.org/10.60118/001c.25901
J Orthopaedic Experience & Innovation
Surmacz, Karl, Atul F Kamath, and Dave Van Andel. 2021. “Fairness in AI: How Can We Avoid Bias and Disparities in Orthopedic Applications of Artificial Intelligence?” Journal of Orthopaedic Experience & Innovation 2 (2). https://doi.org/10.60118/001c.25901.

Abstract

Recent advances in artificial intelligence have the potential to transform the field of orthopedics. Alongside these opportunities, there are numerous challenges associated with applying AI to clinical decision-making, one such example being algorithmic fairness. In this article we introduce the concepts of bias and fairness in machine learning from an orthopedics perspective, covering definitions, examples, possible approaches, and implications for the community. We hope that by working to embed these concepts and associated best practices into health data-product development workflows, we can help to promote fair and effective use of these powerful tools for all patients.

Introduction

Recent advances in artificial intelligence (AI) technology have created huge opportunities to transform healthcare (Davenport and Kalakota 2019). Interest in machine learning (ML) approaches to AI, specifically around deep learning techniques (LeCun, Bengio, and Hinton 2015), has exploded in the last few years. Exploration in this domain has been powered by an exponential increase in the availability of data and the seemingly limitless access to cloud computing power.

AI has the potential to improve health outcomes for patients through enhancing clinical decision-making (Barbieri et al. 2016; Gulshan et al. 2016), smart tooling and planning for surgical procedures (Knoops et al. 2019), improving support for patients through illness and recovery (Smuck, Odonkor, Wilt, et al. 2021), and increasing efficiencies in care systems (Bush 2018; Karnuta et al. 2019, 2020), to name a few areas. Orthopedics as a clinical field is ripe for innovation in this space (Haeberle et al. 2019). MedTech companies in orthopedics are pioneering the use of technologies such as wearables and smart devices for patient monitoring and engagement (Zimmer Biomet, n.d.-a) and clinical decision support (Kumar et al. 2021), as well as robotics and surgical planning tools that can lead to more objective technical performance intraoperatively (Karuppiah and Sinha 2018). Underpinning these technologies is a rich data collection environment – a digital ecosystem – where quantitative learning and algorithmic development have an opportunity to thrive.

However, there are numerous challenges associated with applying AI to clinical decision-making. One such challenge concerns fairness: equitable decision-making that does not discriminate based on biases encoded in the datasets and methods used to develop these systems. There are already many examples of ML systems used in areas such as law enforcement (Angwin et al. 2016; Chouldechova 2017), education (Kizilcec and Lee, forthcoming) and employability (Raghavan et al. 2020) that have been shown to discriminate against under-represented groups. Designing fairness into AI-driven healthcare products is critical to help provide benefit to all people and avoid systematically disadvantaging minority populations - or worse, actively reinforcing that discrimination (O’Neil 2016).

The lack of transparency (or ‘black-box’ nature) of some AI models, which can bear little cognitive resemblance to the problem they are solving, can exacerbate these issues. It is often a non-trivial problem to understand the reasoning behind a decision or recommendation made by such a model. For example, in a scenario where a decision-support tool recommends that a patient’s total knee arthroplasty (TKA) surgery should be delayed, the patient and their clinical team would benefit from understanding the factors leading to such a decision. AI model interpretability and explainability are highly active topics of current research and development (Gilpin et al. 2018; “Explainable Machine Learning Challenge,” n.d.).

The field and industry are shifting in order to address these challenges:

  • Institutions are putting together guidelines (Awwad et al., n.d.) and toolkits (Bellamy et al., n.d.; Bird et al., n.d.) for promoting fairness and mitigating bias in ML applications

  • Companies are building out their data platforms and tooling to deliver data products in a principled way

  • The FDA is supporting the development of methods to build fairness into its AI-as-a-medical-device framework (Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) Action Plan, n.d.), the principles of which include enhancing trust in such systems through transparency, and supporting regulatory science research into methodologies for identifying and mitigating bias

In this editorial, we introduce the concepts of algorithmic fairness to the orthopedic community, to build awareness when developing and encountering AI-driven solutions. We will also discuss possible strategies to mitigate these issues and to minimize inequity in healthcare applications of AI.

Fair Machine Learning: An Introduction

To introduce the topic, let us consider an example. A large number of factors have been associated with patient outcomes after a joint replacement, from clinical parameters to treatment variables and patient characteristics. One such factor is socioeconomic status (SES), often characterized by composite deprivation measures such as the Index of Multiple Deprivation (IMD), for which there is mixed evidence of association with outcomes (Martsolf et al. 2016; DeKeyser et al. 2020). It is tempting to consider a measure of SES as an input to an AI-based decision tool to support an orthopedic pathway – we all have an intuitive sense of how SES and social factors may influence the outcome of joint replacement procedures. However, one would need to take care when doing so:

  • SES may be strongly associated with other factors, such as access to high-quality care, which may be the underlying explanation for patient outcome

  • Perceptions associated with characteristics such as SES can lead to bias in human decision-making

  • SES may covary with other evolving metrics, such as social determinants of health

The above could inadvertently lead to an AI model unfairly discriminating against patients from a low socioeconomic background – even if traditional model performance metrics remain high.

To illustrate further, we will consider 3 hypothetical examples of AI-driven products centered around the orthopedic episode-of-care:

  1. A software tool that highlights the Total Joint Replacement patients within a clinical team’s cohort most at risk of poor quality-of-life outcome, so that a clinician can help with their recovery

  2. A software-based risk assessment tool that recommends to a surgeon whether to delay or deny surgery, based on patient co-morbidities / characteristics and expected patient outcome

  3. A software-based optimization tool that assigns patients to surgical teams based on surgeon preference and expected outcomes

In use-case (1), it is possible that the tool would be more likely to highlight patients from low-SES backgrounds, resulting in greater support from the clinical team. This may yield insights that reduce disparate healthcare outcomes in populations with lower SES or in those impacted by racial biases. In use-case (2), however, without appropriate consideration during product validation, the output of such an AI product could lead to low-income patients disproportionately having their access to surgery delayed or denied.

The potential consequences in use-case (3) are even more detrimental. Suppose that a group of highly skilled orthopedic surgeons is reluctant to take on patients with low SES because of the risk of poor outcomes/complications or other issues such as lack of acceptable reimbursement. The tool in this case would pick up this bias and recommend pairing such patients with other surgical teams. Such logic would reduce low-SES patients’ access to skilled surgeons, and may even reinforce this undesirable selection of patients. Biases can also find their way into datasets in other ways, such as through selection criteria for treatment programs with limited resources, some of which may be based on the care team’s perceptions.

These issues are not specific to healthcare. Applications such as facial recognition (Buolamwini and Gebru 2018) and loan decisions (Rice and Swesnik, n.d.) have previously received scrutiny for algorithms that inadvertently discriminated based on race, age and gender. As we have shown, the level of risk can depend heavily on the use-case. Essentially, we must ask: is there a risk that a person in a vulnerable or minority group can be assigned a negative or detrimental outcome simply by virtue of being in that group? Fortunately, there is a growing body of research and tools to help mitigate and avoid these biases (Zemel et al. 2013), which we will touch on in the next section.
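
As a concrete illustration of that question, the minimal Python sketch below (with made-up column names and toy data, not from any of the products discussed here) compares the rate at which a hypothetical decision tool assigns the favorable outcome across two patient groups; a ratio well below one is a simple first flag that group membership may be driving outcomes.

```python
import pandas as pd

def selection_rates(df: pd.DataFrame, group_col: str, decision_col: str) -> pd.Series:
    """Share of patients in each group receiving the favorable decision."""
    return df.groupby(group_col)[decision_col].mean()

# Toy data: 1 = surgery recommended, 0 = delayed/denied (hypothetical columns).
decisions = pd.DataFrame({
    "ses_group": ["low", "low", "low", "low", "high", "high", "high", "high"],
    "recommended": [0, 1, 0, 0, 1, 1, 1, 0],
})

rates = selection_rates(decisions, "ses_group", "recommended")
disparate_impact = rates.min() / rates.max()  # ratios well below 1 flag a group-level disparity
print(rates)
print(f"Disparate impact ratio: {disparate_impact:.2f}")
```

This kind of check is deliberately crude; which groups, decisions and thresholds are appropriate depends entirely on the use-case, as discussed above.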

Approaches

There are numerous approaches for detecting and mitigating bias and unfairness in ML solutions. We shall briefly list a few examples here; for a more complete discussion of the technical methods, please refer to the survey by Caton and Haas (2020) and the references therein.

Social determinants of health (SDoH) are known to influence health equity. To avoid any unwanted influence of these parameters, one could consider them ‘protected’, and hence not usable as inputs to ML models. A similar approach is to only consider ‘modifiable risk factors,’ which a patient has some ability to influence or change. However, in our setting, SDoH could help to identify patients who need extra support and/or resources. Again, it depends on the use-case and the risk that this information does more harm than good.
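
As a rough sketch of what treating attributes as ‘protected’ can look like in practice (the column names below are hypothetical), the snippet drops protected/SDoH columns from a feature table and then screens the remaining numeric features for strong correlation with a protected attribute, since proxies can quietly reintroduce the information that was removed.

```python
import pandas as pd

# Hypothetical protected / SDoH columns; the real list is use-case specific.
PROTECTED = ["ses_index", "race", "insurance_type"]

def drop_protected(features: pd.DataFrame) -> pd.DataFrame:
    """Exclude protected attributes from the model's feature set."""
    return features.drop(columns=[c for c in PROTECTED if c in features.columns])

def proxy_check(features: pd.DataFrame, protected: pd.Series, threshold: float = 0.5) -> pd.Series:
    """Flag remaining numeric features that correlate strongly with a protected
    attribute: dropping a column does not remove information carried by its proxies."""
    corr = features.select_dtypes("number").corrwith(protected).abs()
    return corr[corr > threshold].sort_values(ascending=False)

# Usage sketch: 'cohort' would be a hypothetical patient feature table.
# model_features = drop_protected(cohort)
# print(proxy_check(model_features, cohort["ses_index"]))
```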

We can also validate prediction quality across different population sub-groups to minimize discrepancies between them. For example, a tool predicting risk of rehospitalization could achieve high accuracy across the whole population, yet perform significantly worse on a minority group. This sub-group validation could happen at the demographic, geographic and even provider level. Models trained predominantly on data from high-end orthopedic providers, where only patients with good access are able to receive joint replacements, may not translate well to geographies with significantly different demographics.
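
One way to operationalize this sub-group validation is with an open-source toolkit such as Fairlearn (Bird et al., n.d.), cited above. The sketch below, assuming Fairlearn’s MetricFrame API and using toy labels with a hypothetical "site" grouping, reports accuracy and recall overall, per sub-group, and as a worst-case gap between groups.

```python
from fairlearn.metrics import MetricFrame
from sklearn.metrics import accuracy_score, recall_score

# Toy labels: 1 = rehospitalized within 90 days. "site" stands in for any
# sub-group of interest (demographic, geographic, or provider level).
y_true = [1, 0, 0, 1, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1]
site   = ["A", "A", "A", "A", "B", "B", "B", "B"]

mf = MetricFrame(
    metrics={"accuracy": accuracy_score, "recall": recall_score},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=site,
)

print(mf.overall)        # aggregate performance across the whole cohort
print(mf.by_group)       # the same metrics broken out per sub-group
print(mf.difference())   # worst-case gap between sub-groups, per metric
```

What counts as an acceptable gap is a clinical and regulatory judgment, not a property of the code; the value of the tooling is simply that the gap is measured and reviewed rather than hidden inside an aggregate score.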

ML can also be used to reduce unexplained orthopedic disparities in care. Underserved populations often experience healthcare disparities even after accounting for the objective severity of osteoarthritis. A recent study (Pierson et al. 2021) trained an ML algorithm to estimate reported pain directly from knee radiographs, detecting image variations not captured by traditional severity scoring. This approach explained a significantly higher proportion of the pain disparity between sub-groups than radiologist grading of severity did. Much of the increased pain experienced by these underserved populations, previously attributed to external sociocultural causes, was more likely due to anatomical factors not currently captured in radiographic assessment. Findings such as these can lead to reduced disparity in treatment access for underserved populations, in this case by focusing more on surgical/anatomical intervention and less on treatments related to psychosocial support.

Population demographics and the measures associated with them may change over time. Such a shift could cause the performance of ML-driven products to decrease. Monitoring real-world datasets for drift and bias, creating robust solutions resistant to outliers, and applying a post-market surveillance approach can all help mitigate the impact of these changes.
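
A lightweight sketch of such drift monitoring is shown below: a two-sample Kolmogorov–Smirnov test (SciPy) comparing the distribution of a single feature between a development-time baseline and post-market data. The feature, threshold and simulated ages are purely illustrative assumptions; a production system would monitor many features and act on sustained, clinically reviewed signals rather than a single test.

```python
import numpy as np
from scipy.stats import ks_2samp

def feature_drift(baseline: np.ndarray, current: np.ndarray, alpha: float = 0.01) -> bool:
    """Two-sample Kolmogorov-Smirnov test: flags a feature whose real-world
    distribution has shifted away from the training-time baseline."""
    result = ks_2samp(baseline, current)
    return result.pvalue < alpha

# Simulated example: patient age at surgery drifting younger post-market.
rng = np.random.default_rng(0)
baseline_age = rng.normal(68, 8, size=5000)  # distribution seen during model development
current_age = rng.normal(63, 8, size=5000)   # distribution observed in the field
print(feature_drift(baseline_age, current_age))  # True -> trigger review / retraining
```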

Some issues cannot be managed by data scientists alone; they require broader societal and political prioritization of equitable healthcare. Changing business models around reimbursement to encourage fair and equitable treatment, in the face of the challenges of outcome disparities, could lead to more just access to care, valuing the dignity of every patient. Payers will have a responsibility to promote fair and effective use of these powerful tools, while providers of these products might contract to ensure appropriate use.

Outlook

The application of AI in healthcare and orthopedics is still in its early stages. As a community, we are well placed to learn from mistakes made in other industries. We must use recent advances in technology to build solid foundations for ethical AI that serve the wider population and build trust in institutions.

Awareness and mitigation of these issues can increase the likelihood that AI-based decision-support tools improve quality-of-care for underserved population sub-groups, turning potential risk into tangible benefit. Initiatives can be put in place to collect quality data, such as those from wearable technologies, from these populations. This has the added benefit of countering these communities’ historic lack of systemic access, as well as rebuilding trust in the institutions initiating such projects.

It may seem that adoption is already underway, and that any issues around bias and fairness cannot stop it. However, products perceived as unfair will sow mistrust among users, which could leave patients less comfortable sharing the data upon which these applications depend. Also, as we have seen recently in regions like Europe (“Proposal for a Regulation Laying down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act),” n.d.), top-down regulation can be brought in to protect individuals if standards are not met. It is better to shape best practice responsibly now than to have it imposed in the future.

Forward-thinking organizations working at the intersection of providers, payors and clinical consumers should consider the following:

  • For solution providers: how are you building out your technology stack (Zimmer Biomet, n.d.-b) to evaluate and mitigate bias and inequity in your solutions? What metrics are you using to quantify this? Whilst there is no ‘one-size-fits-all’ approach, numerous open-source tools are emerging to tackle the problem. Investment in this now will mitigate future technical and reputational debt.

  • How can the relationships that you build out between payors and providers encourage and incentivize this way of thinking? If you are adopting AI-driven products, ask about how they have been developed to promote equitable outcomes. Setting out solid foundations early on for use of AI within healthcare will save time and effort in the long-run.

  • Are the populations and cohorts used to validate your products representative enough? What initiatives can be put in place to access greater population diversity in your clinical studies?

  • How are you embedding fairness into your regulatory frameworks and processes? For regulatory organizations: how can you encourage organizations to do this in a way that does not carry a burden that might deter them?

  • Take a look at the teams that are designing, building and using AI products. How likely are they to understand and spot risk around lack of inclusivity? What can you do to bake these principles into your organizational culture (“J. Robert Gladden Orthopaedic Society,” n.d.)? Teams and companies that engage with the concepts of diversity within orthopedics (“Movement Is Life Caucus,” n.d.; CEO Action For Diversity & Inclusion, n.d.) are more likely to build products that reflect these values.

As AI technology sits poised to transform healthcare, it is critical that good ethical practice around algorithmic fairness becomes commonplace within healthcare data science teams, and that the industry shifts to meet this need. The orthopedic community’s continued attention to designing these principles into data- and AI-driven products from the outset will help ensure that all patients benefit from this transformation.

Submitted: June 14, 2021 EDT

Accepted: July 17, 2021 EDT

References

Angwin, Julia, Jeff Larson, Surya Mattu, and Lauren Kirchner. 2016. “Machine Bias.” ProPublica, May 23, 2016.
Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) Action Plan. n.d. US Food and Drug Administration. https://www.fda.gov/media/145022/download.
Awwad, Yazeed, Richard Fletcher, Daniel Frey, Amit Gandhi, Maryam Najafian, and Mike Teodorescu. n.d. “Exploring Fairness in Machine Learning for International Development.” https://d-lab.mit.edu/resources/publications/exploring-fairness-machine-learning-international-development.
Barbieri, Carlo, Manuel Molina, Pedro Ponce, Monika Tothova, Isabella Cattinelli, Jasmine Ion Titapiccolo, Flavio Mari, et al. 2016. “An International Observational Study Suggests That Artificial Intelligence for Clinical Decision Support Optimizes Anemia Management in Hemodialysis Patients.” Kidney International 90 (2): 422–29. https://doi.org/10.1016/j.kint.2016.03.036.
Bellamy, Rachel K.E., et al. n.d. “AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias.”
Bird, Sarah, et al. n.d. “Fairlearn: A Toolkit for Assessing and Improving Fairness in AI.” https://www.microsoft.com/en-us/research/publication/fairlearn-a-toolkit-for-assessing-and-improving-fairness-in-ai/.
Buolamwini, Joy, and Timnit Gebru. 2018. “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification.” In Proceedings of the 1st Conference on Fairness, Accountability and Transparency, PMLR 81:77–91.
Bush, J. 2018. “How AI Is Taking the Scut Work out of Health Care.” Harvard Business Review. https://hbr.org/2018/03/how-ai-is-taking-the-scut-work-out-of-health-care.
Caton, Simon, and Christian Haas. 2020. “Fairness in Machine Learning: A Survey.” https://arxiv.org/pdf/2010.04053.pdf.
CEO Action For Diversity & Inclusion. n.d. “CEO Pledge Commitments.” https://www.ceoaction.com/pledge/ceo-pledge/.
Chouldechova, Alexandra. 2017. “Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments.” Big Data 5 (2): 153–63.
Davenport, Thomas, and Ravi Kalakota. 2019. “The Potential for Artificial Intelligence in Healthcare.” Future Healthcare Journal 6 (2): 94–98. https://doi.org/10.7861/futurehosp.6-2-94.
DeKeyser, G.J., M.B. Anderson, H.D. Meeks, C.E. Pelt, C.L. Peters, and J.M. Gililland. 2020. “Socioeconomic Status May Not Be a Risk Factor for Periprosthetic Joint Infection.” The Journal of Arthroplasty 35 (7): 1900–1905.
“Explainable Machine Learning Challenge.” n.d. https://community.fico.com/s/explainable-machine-learning-challenge.
Gilpin, Leilani H., David Bau, Ben Z. Yuan, Ayesha Bajwa, Michael Specter, and Lalana Kagal. 2018. “Explaining Explanations: An Overview of Interpretability of Machine Learning.” In 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA). IEEE. https://doi.org/10.1109/dsaa.2018.00018.
Gulshan, Varun, Lily Peng, Marc Coram, Martin C. Stumpe, Derek Wu, Arunachalam Narayanaswamy, Subhashini Venugopalan, et al. 2016. “Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs.” JAMA 316 (22): 2402. https://doi.org/10.1001/jama.2016.17216.
Haeberle, Heather S., James M. Helm, Sergio M. Navarro, Jaret M. Karnuta, Jonathan L. Schaffer, John J. Callaghan, Michael A. Mont, Atul F. Kamath, Viktor E. Krebs, and Prem N. Ramkumar. 2019. “Artificial Intelligence and Machine Learning in Lower Extremity Arthroplasty: A Review.” The Journal of Arthroplasty 34 (10): 2201–3. https://doi.org/10.1016/j.arth.2019.05.055.
“J. Robert Gladden Orthopaedic Society.” n.d. https://www.gladdensociety.org/.
Karnuta, Jaret M., Joshua L. Golubovsky, Heather S. Haeberle, Prashant V. Rajan, Sergio M. Navarro, Atul F. Kamath, Jonathan L. Schaffer, Viktor E. Krebs, Dominic W. Pelle, and Prem N. Ramkumar. 2020. “Can a Machine Learning Model Accurately Predict Patient Resource Utilization Following Lumbar Spinal Fusion?” The Spine Journal 20 (3): 329–36. https://doi.org/10.1016/j.spinee.2019.10.007.
Karnuta, Jaret M., Sergio M. Navarro, Heather S. Haeberle, J. Matthew Helm, Atul F. Kamath, Jonathan L. Schaffer, Viktor E. Krebs, and Prem N. Ramkumar. 2019. “Predicting Inpatient Payments Prior to Lower Extremity Arthroplasty Using Deep Learning: Which Model Architecture Is Best?” The Journal of Arthroplasty 34 (10): 2235–2241.e1. https://doi.org/10.1016/j.arth.2019.05.048.
Karuppiah, Karthik, and Joydeep Sinha. 2018. “Robotics in Trauma and Orthopaedics.” The Annals of The Royal College of Surgeons of England 100 (Suppl 6): 8–15. https://doi.org/10.1308/rcsann.supp1.8.
Kizilcec, René F., and Hansol Lee. Forthcoming. “Algorithmic Fairness in Education.” In Ethics in Artificial Intelligence in Education, edited by W. Holmes and K. Porayska-Pomsta. Taylor & Francis.
Knoops, Paul G. M., Athanasios Papaioannou, Alessandro Borghi, Richard W. F. Breakey, Alexander T. Wilson, Owase Jeelani, Stefanos Zafeiriou, et al. 2019. “A Machine Learning Framework for Automated Diagnosis and Computer-Assisted Planning in Plastic and Reconstructive Surgery.” Scientific Reports 9 (1). https://doi.org/10.1038/s41598-019-49506-1.
Kumar, Vikas, Christopher Roche, Steven Overman, Ryan Simovitch, Pierre-Henri Flurin, Thomas Wright, Joseph Zuckerman, Howard Routman, and Ankur Teredesai. 2021. “Using Machine Learning to Predict Clinical Outcomes after Shoulder Arthroplasty with a Minimal Feature Set.” Journal of Shoulder and Elbow Surgery 30 (5): e225–36. https://doi.org/10.1016/j.jse.2020.07.042.
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. 2015. “Deep Learning.” Nature 521 (7553): 436–44. https://doi.org/10.1038/nature14539.
Martsolf, G.R., M.L. Barrett, A.J. Weiss, R. Kandrack, R. Washington, C.A. Steiner, A. Mehrotra, N.F. SooHoo, and R. Coffey. 2016. “Impact of Race/Ethnicity and Socioeconomic Status on Risk-Adjusted Hospital Readmission Rates Following Hip and Knee Arthroplasty.” The Journal of Bone and Joint Surgery, American Volume 98 (16): 1385–91.
“Movement Is Life Caucus.” n.d. https://www.movementislifecaucus.com/.
O’Neil, Cathy. 2016. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. New York, NY: Crown Publishing Group.
Pierson, Emma, David M. Cutler, Jure Leskovec, Sendhil Mullainathan, and Ziad Obermeyer. 2021. “An Algorithmic Approach to Reducing Unexplained Pain Disparities in Underserved Populations.” Nature Medicine 27 (1): 136–40. https://doi.org/10.1038/s41591-020-01192-7.
“Proposal for a Regulation Laying down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act).” n.d. https://digital-strategy.ec.europa.eu/en/library/proposal-regulation-laying-down-harmonised-rules-artificial-intelligence-artificial-intelligence.
Raghavan, Manish, Solon Barocas, Jon Kleinberg, and Karen Levy. 2020. “Mitigating Bias in Algorithmic Hiring: Evaluating Claims and Practices.” In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (FAT* ’20), 469–81. New York, NY: Association for Computing Machinery.
Rice, Lisa, and Deidre Swesnik. n.d. Discriminatory Effects of Credit Scoring on Communities of Color.
Smuck, M., C.A. Odonkor, J.K. Wilt, et al. 2021. “The Emerging Clinical Role of Wearables: Factors for Successful Implementation in Healthcare.” NPJ Digital Medicine 4: 45.
Zemel, Rich, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. 2013. “Learning Fair Representations.” In Proceedings of the 30th International Conference on Machine Learning, PMLR 28 (3): 325–33.
Zimmer Biomet. n.d.-a. “mymobility with Apple Watch.” https://www.zimmerbiomet.com/medical-professionals/zb-edge/mymobility.html.
———. n.d.-b. “ZBEdge.” https://www.zimmerbiomet.com/medical-professionals/zb-edge.html.
