Introduction
Lower extremity joint replacement (LEJR) is the second-largest inpatient expense to the U.S. Centers for Medicare & Medicaid Services (CMS), and the demand is growing (Springer and McInerney 2021; Shichman et al. 2023). In response, CMS tests alternative payment models (APMs) that move away from fee-for-service (FFS) and promote value-based care (VBC), and it has spent nearly two decades refining bundled payment programs (Carter Clement et al. 2017). Today, over half of arthroplasty surgeons participate in these models (Rana et al. 2022). Bundled payments hold healthcare teams accountable for cost, quality, and outcomes to encourage efficiency and financial savings. The Bundled Payments for Care Improvement Advanced Model (BPCI-A), launched in 2018, was a voluntary APM that built on lessons from the Comprehensive Care for Joint Replacement (CJR) and Bundled Payments for Care Improvement (BPCI) Models. Under BPCI-A, acute care hospitals (ACHs), physician group practices (PGPs), and conveners joined as participants (episode initiators) and assumed financial risk for the quality and cost of care for Medicare beneficiaries during a clinical episode. Conveners, who represented the majority of BPCI-A initiators, supported ACHs and PGPs at the front lines of healthcare system transformation by promoting BPCI-A participation, assuming part or all of the financial risk, and implementing cost-saving, evidence-based strategies and quality improvement initiatives, such as preoperative patient optimization programs, modified anesthesia protocols, accelerated physical therapy timelines, same-day discharge, and safe discharge home (Lewin Group Inc. 2024; Berlin et al. 2021; Featherall et al. 2018; Debbi et al. 2022).
Each inpatient episode, including Medicare Severity-Diagnosis Related Group (MS-DRG) 469 (major replacement or reattachment of lower extremity with major complications or comorbidities) and MS-DRG 470 (without major complications or comorbidities), was assigned a target price for the period between hospitalization and 90 days post-discharge. At the end of the performance period, CMS reconciled participant FFS expenditures against an adjusted target price for each clinical episode initiated. Participants who kept net costs below the cumulative target prices earned a quality-adjusted reconciliation payment from CMS, while those exceeding the summed targets incurred a financial penalty and submitted a repayment to CMS (Somers et al. 2022; BPCI Advanced Target Price Specifications Model Year 4 2020). Bundled payments offer opportunities for shared savings but also expose participants to financial risk, particularly from “bundle busters,” patients whose care costs exceed the target payment. Higher episode cost often stems from increased care needs (Plate et al. 2016; Kurtz et al. 2017). Accurate prediction of high-cost total hip arthroplasty (THA) episodes supports early patient optimization, targeted care pathway investments to better meet patient needs, and improvements to CMS’s target pricing methodology. A better understanding of risk stratification can also drive more appropriate use of inpatient versus outpatient status and more accurate reimbursement in FFS models.
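The netting described above can be illustrated with a minimal sketch. It assumes reconciliation is simply the summed target prices minus the summed FFS spending; the quality adjustment and the outlier protections CMS applies are deliberately omitted, and the function name and dollar amounts are hypothetical.

```python
def reconcile(episodes):
    """Net a participant's episodes against their target prices.

    episodes: list of (ffs_spending, target_price) tuples in dollars.
    Simplifying assumption: no quality adjustment, risk cap, or
    stop-loss/stop-gain limit is applied.
    """
    total_spending = sum(spending for spending, _ in episodes)
    total_target = sum(target for _, target in episodes)
    net = total_target - total_spending
    if net > 0:
        outcome = "reconciliation payment"
    elif net < 0:
        outcome = "repayment"
    else:
        outcome = "break-even"
    return net, outcome


# Hypothetical episodes: three come in $2,000 under target, one is $9,000 over.
routine = [(19000, 21000), (19000, 21000), (19000, 21000)]
with_buster = routine + [(30000, 21000)]

net_routine, outcome_routine = reconcile(routine)
net_buster, outcome_buster = reconcile(with_buster)
print(net_routine, outcome_routine)
print(net_buster, outcome_buster)
```

In this toy example, a single high-cost episode is enough to turn a positive reconciliation into a repayment, which is the sense in which one patient can "bust" the bundle.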
PGPs are at the front lines of THA delivery in the face of declining reimbursement and increasing patient comorbidities. PGPs were responsible for 73 percent of joint replacement episodes among BPCI-A participants in Model Years 1-2, outpacing hospitals (Crowley et al. 2025). Although participating PGPs achieved greater reductions in spending than PGP nonparticipants and ACHs, they still faced negative mean reconciliation payments (Crowley et al. 2025; Shashikumar et al. 2024). Understanding the PGP experience in BPCI-A has key implications for encouraging PGPs to own their bundles and take a leading role in shaping the healthcare landscape. The purpose of this study was to determine which preoperative patient factors or facility characteristics were associated with exceeding the target payment (‘bundle busters’) in Medicare patients who underwent a primary elective THA as part of a BPCI-A bundle initiated by PGPs partnered with a single convener. We then sought to develop a predictive model to anticipate whether a THA episode would exceed the target price.
Methods
Study Population
After receiving Institutional Review Board exemption for data analysis, a query of the enterprise data warehouse of a third-party VBC convener identified 4,178 consecutive patients who underwent elective inpatient THA through a PGP-partnership in BPCI-A bundles between January 2020 (Model Year 3) and December 2022 (Model Year 5). Of these, 120 (2.9%) cases were classified as MS-DRG 469. The total sample included 29 PGPs. For all but two clients, the convener assumed full downside risk, while PGPs retained a significant portion of shared savings. All PGPs were provided with advanced data analytics and care coordination support.
Patient Factors
As part of the patient case mix adjustment (PCMA) portion of target price calculation, CMS considered age, comorbidities (via CMS-Hierarchical Condition Categories, HCCs), comorbidity burden (total HCC count per beneficiary), and recent resource use in the 90-day period prior to the clinical episode (BPCI Advanced Target Price Specifications Model Year 4 2020; BPCI Advanced Target Price Specifications Model Years 1 and 2 2018; BPCI Advanced Model Overview Fact Sheet - Model Year 3 (MY3), n.d.; BPCI Advanced Target Price Specifications Model Year 5 2022). For the purposes of this study, we collected patient age, sex, long-term institutional or post-acute care (any home health or stay at a long-term care hospital, skilled nursing facility, or inpatient rehabilitation facility) in the prior 90 days, six co-occurring diagnoses, 78 HCC variables, and total HCC count. None of the patients in this study had the following HCCs: dialysis; cystic fibrosis; severe head injury; quadriplegia; coma, brain compression, anoxic damage; ALS and other motor neuron disease; traumatic amputations and complications; pressure ulcer of the skin with necrosis through to muscle, tendon, or bone; or severe skin burn or condition.
Facility Characteristics
A new element in BPCI-A target pricing methodology was the construction of ACH peer groups, created through regression models based on five factors: major teaching hospital status, urban/rural designation, safety net status, census region (Northeast, South, Midwest, and West), and bed size (small, medium, large, extra-large, missing) (Springer and McInerney 2021; BPCI Advanced Target Price Specifications Model Year 4 2020). We collected data on major teaching hospital status, urban/rural designation, safety net status, bed size, and census division (New England, Middle Atlantic, East North Central, West North Central, South Atlantic, East South Central, West South Central, Mountain, and Pacific), the units that make up the census regions (“Census Regions and Divisions of the United States,” n.d.). All THAs in our cohort were performed at urban, non-safety-net facilities.
Primary Outcome
In this study, ‘bundle busters’ were defined as patients whose 90-day episode costs exceeded the final target price set by CMS during reconciliation, as previously described in the literature (Wodowski et al. 2019). Episode costs are presented using claims data. During reconciliation, CMS calculated the final price by incorporating historic Medicare FFS expenditures during the baseline period (Standardized Baseline Spending), actual PCMA, persistent differences in patient-mix adjusted across peer groups (Peer Group Historical Adjustment Factor, PGHA), the difference between the projected (PGT) and realized (PGT Factor Adjustment) Peer Group Trend, and the CMS Discount. PGPs received target prices that were unique to their relative case mix and the hospital at which the procedure was performed (BPCI Advanced Target Price Specifications Model Year 4 2020; BPCI Advanced Target Price Specifications Model Year 5 2022).
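The outcome definition above reduces to a per-episode comparison of cost against the final target price. The sketch below is illustrative only: the field names and dollar amounts are hypothetical, and the final target price is taken as given rather than rebuilt from its CMS components.

```python
from statistics import median


def label_episodes(episodes):
    """Flag each episode as a bundle buster when cost exceeds its target.

    episodes: list of (episode_cost, final_target_price) tuples in dollars.
    margin is target minus cost, so a negative margin is a loss.
    """
    labelled = []
    for cost, target in episodes:
        labelled.append({
            "cost": cost,
            "target": target,
            "margin": target - cost,
            "bundle_buster": cost > target,
        })
    return labelled


# Hypothetical 90-day episode costs and final target prices.
episodes = [
    (18500.0, 21000.0),
    (26400.0, 21500.0),
    (20100.0, 20900.0),
    (35250.0, 22000.0),
]

labelled = label_episodes(episodes)
busters = [e for e in labelled if e["bundle_buster"]]
buster_share = len(busters) / len(labelled)
median_loss = median(-e["margin"] for e in busters)
print(buster_share, median_loss)
```

Because each episode is compared with its own target price, two episodes with the same cost can be labelled differently, which is why a single cost cutoff would not reproduce this outcome.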
Statistical Analyses
Summary statistics are presented as medians with interquartile ranges (IQRs) for continuous variables and counts with percentages for categorical variables. Multivariable logistic regression was used to investigate the association of patient factors and facility characteristics with exceeding the final target price (‘bundle buster’). Model results are presented as odds ratios (ORs) with 95% confidence intervals (CIs). Model performance was evaluated using the concordance statistic (C-statistic), where values range from 0.5 to 1 with higher values indicating better discrimination. Variables were ranked based on their relative contribution to the model, as assessed by the increase in the Akaike information criterion (AIC) upon removal of the variable from the full model. An AIC increase ≥ 2 was considered to indicate that the variable meaningfully improved model fit, with larger increases highlighting the importance of that variable in explaining the outcome. Data management and analysis were performed using R (Version 4.3.1; Vienna, Austria). All tests were two-sided with an alpha level of 0.05.
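The two model-evaluation quantities described above can be sketched as follows. This is not the study's R code: it is a minimal Python illustration with hypothetical inputs that assumes predicted probabilities and model log-likelihoods are already available from a fitted logistic regression. The C-statistic is computed as the share of (bundle buster, non-bundle buster) pairs in which the bundle buster has the higher predicted probability, with ties counted as one half; the AIC is 2k - 2 ln L for a model with k parameters and maximized likelihood L.

```python
def c_statistic(y_true, y_score):
    """Concordance statistic for a binary outcome (ties count one half)."""
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def aic_increase_on_removal(full, reduced):
    """AIC of the model without a variable minus AIC of the full model.

    full and reduced are (log_likelihood, n_params) tuples.
    """
    return aic(*reduced) - aic(*full)


# Hypothetical outcomes (1 = bundle buster) and predicted probabilities.
y = [0, 0, 1, 0, 1, 1]
p = [0.10, 0.30, 0.35, 0.40, 0.60, 0.80]
print(round(c_statistic(y, p), 3))  # 8 of 9 pairs are concordant -> 0.889

# Hypothetical fits: dropping one single-parameter variable lowers the
# log-likelihood from -2000.0 to -2004.0.
print(aic_increase_on_removal(full=(-2000.0, 12), reduced=(-2004.0, 11)))  # 6.0
```

A positive AIC increase means the full model is preferred even after paying the penalty for the extra parameter; ranking variables by this increase is what orders them by relative contribution.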
Results
Of the 4,178 THAs included for analysis, 901 (21.6%) were bundle busters. Bundle busters had a median loss of $4,863 (IQR $1,429 - $13,617) whereas the median gain of a non-bundle buster was $3,069 (IQR $417 - $4,051) (Figure 1). More bundle busters were coded as MS-DRG 469 (P < 0.001). Overall, the number of total cases declined from Model Year 3 to Model Year 5, and the proportion of bundle busters increased annually (P < 0.001) (Table 1).
Patient Factors
Bundle busters were older (P = 0.029) and utilized more post-acute care services (P < 0.001). Multiple co-occurring diagnoses and comorbidities were significantly different between bundle busters and non-bundle busters (Table 2).
Facility Characteristics
More bundle busters underwent surgery at a major teaching hospital (P = 0.021) and at a large or extra-large hospital (P = 0.004). Census division distribution also significantly differed (P < 0.001) (Table 3).
Multivariable Logistic Regression
In multivariable logistic regression, higher odds of busting the bundle were associated with episodes occurring in Model Year 4 (OR 1.32, P = 0.001) or 5 (OR 2.35, P < 0.001), at large or extra-large hospitals (OR 1.57, P = 0.014), and among patients with prior post-acute care use (OR 1.71, P = 0.004), higher total HCC count (OR 1.44, P = 0.034), or metastatic cancer and acute leukemia (OR 2.49, P = 0.029). Lower odds of busting the bundle were associated with MS-DRG 470 (OR 0.47, P < 0.001), coagulation defects and other specified hematological disorders (OR 0.49, P = 0.011), or myasthenia gravis, myoneural disorders, or inflammatory and toxic neuropathies (OR 0.19, P = 0.049). Compared to New England, THAs performed in the East North Central (OR 0.40, P = 0.002), South Atlantic (OR 0.39, P = 0.003), East South Central (OR 0.37, P < 0.001), West South Central (OR 0.27, P < 0.001), Mountain (OR 0.42, P = 0.001), and Pacific (OR 0.28, P < 0.001) divisions had lower odds of busting the bundle (Table 4).
The C-statistic for this model was 0.678. Total HCC (AIC 2.4) was the ninth most important variable influencing the likelihood of exceeding target price, following census division (AIC 38.4), model year (AIC 16.6), MS-DRG (AIC 10.8), prior post-acute care (AIC 6.1), coagulation defects (AIC 4.7), bed size (AIC 3.2), myasthenia gravis (AIC 3.1), and metastatic cancer (AIC 2.7) (Figure 2).
Discussion
In our cohort of 4,178 Medicare patients undergoing primary elective inpatient THA within BPCI-A bundles initiated by PGPs partnered with a single convener, we identified nine factors associated with the odds of becoming a “bundle buster,” but our model had only moderate discriminative power (C-statistic 0.678).
Success in a bundled payment model hinges on achieving clinically meaningful, patient-centered outcomes while keeping costs below the target. While orthopaedic surgeons aim for uncomplicated recoveries after elective THA, some patients inevitably require a higher level of care and greater resources (Bosco et al. 2014; Clair et al. 2016). CMS employed two strategies to limit the impact of outliers on reconciliation payments: a risk cap, in which spending was winsorized at the 1st and 99th percentiles, and stop-loss/gain limits, which capped the total reconciliation at 20% of the volume-weighted sum of the final net reconciliation amount (BPCI Advanced Model Year 5 Fact Sheet 2021). Nevertheless, participants remained financially vulnerable to “bundle busters” who drive net negative reconciliation payments (Springer and McInerney 2021; Krueger et al. 2021; Parikh et al. 2024).
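Winsorization, the mechanism behind the risk cap described above, replaces values beyond chosen percentiles with the percentile values themselves rather than discarding them. The sketch below is a generic illustration and not CMS's implementation: the percentile interpolation rule (Python's "inclusive" method), the pool of episodes over which percentiles are taken, and the spending figures are all assumptions.

```python
from statistics import quantiles


def winsorize(values, lower_pct=1, upper_pct=99):
    """Clamp values to the given lower and upper percentiles.

    quantiles(..., n=100) returns the 99 percentile cut points; the
    'inclusive' interpolation rule is an assumption made for this sketch.
    """
    cuts = quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]


# Hypothetical episode spending: 99 routine episodes between $20,100 and
# $29,900, plus one very cheap and one very expensive episode.
spending = [5000.0] + [20000.0 + 100.0 * k for k in range(1, 100)] + [150000.0]

capped = winsorize(spending)
print(min(capped), max(capped))  # the two extremes are pulled in to 20100.0 and 29900.0
```

The expensive episode is still counted, but at the 99th-percentile value; episodes that exceed their target price while staying below that cap are untouched, which is why the risk cap limits but does not remove exposure to bundle busters.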
Our study builds on prior research investigating patient factors associated with high-cost LEJR in bundled payment programs. Unlike Fillingham et al. (Fillingham et al. 2020), who established a cut-off ($20,084) to define high-cost Medicare THA cases, we recognize that the variability in CMS target prices complicates the ability to define a single cost threshold. Different statistical methods (such as the Anderson-Darling test, boxplots, z-scores, the 75th percentile, or a combination of the Shapiro-Wilk test, histogram evaluation, kurtosis test, and skewness evaluation) yield different outlier cutoffs. Episodes exceeding such an arbitrary threshold may not always result in financial loss, and when they do, the deficit can vary widely due to the complexity of the target pricing methodology. Relying on simplistic statistical approaches to define outliers within non-normally distributed cost data is inherently flawed and fails to accurately capture the true financial risk assumed by participants. We believe predictive models should use the same definition of bundle busters as employed in this study and by Wodowski et al. (Wodowski et al. 2019) and Ryan et al. (Ryan, Goltz, et al. 2019) to accurately reflect the burden of bundle busters. However, lacking actual target prices for 66% of cases, Ryan et al. (Ryan, Goltz, et al. 2019) extrapolated backward from 2016 targets. Additionally, both groups combined multiple LEJR procedures in their analyses (Wodowski et al. 2019; Ryan, Goltz, et al. 2019). A key strength of our study is the use of real final target price data with a specific focus on THAs in BPCI-A. Unlike the previous single-institution studies, our dataset comprises PGPs nationwide, enhancing the generalizability of our findings and enabling analysis of PGT factor components, among which census division and bed size proved to be significant influences on our model.
In our multivariable logistic regression, patients who utilized post-acute care services in the 90 days before THA had 71% greater odds of being classified as a “bundle buster” (P = 0.004). This may reflect a patient preference for discharge destination, highlighting the need for preoperative patient education on the merits of home discharge when medically appropriate (Fang et al. 2020). Alternatively, this utilization may be a signal of frailty (Schuijt et al. 2021; Flinn et al. 2025; Nidadavolu et al. 2020). Frailty is associated with increased health expenditures and a higher risk of adverse events following LEJR (Tram et al. 2022; Ron et al. 2025). However, arthroplasty plays a crucial role in improving mobility, independence, and overall frailty in this patient population (Ron et al. 2025; Kappenschneider et al. 2024). Given these risks and benefits, CMS should incorporate frailty-based risk adjustments in their payment calculations. Incorporating prior post-acute care service utilization into dashboards on the electronic medical record can help physicians screen for frail patients who would benefit from targeted prehabilitation and postoperative interventions including improved care coordination with case management, physical therapy, and occupational therapy to align resources with patient needs, provide home health, and schedule timely postoperative follow-up visits (Crowley et al. 2025; Ron et al. 2025; Kappenschneider et al. 2024; Pearl et al. 2023).
Consistent with prior studies linking higher comorbidity burden to higher care needs, our model demonstrated that each additional HCC increased the odds of being classified as a bundle buster by 44% (P = 0.034) (Elings et al. 2015; Pasqualini et al. 2025). Recognizing the impact of comorbidities, the HCC lookback period was extended from 90 days to 180 days in Model Year 5 to improve model fit (BPCI Advanced Model Year 5 Fact Sheet 2021). Notably, no modifiable risk factors were significant, likely reflecting the substantial time and effort modern arthroplasty surgeons dedicate to preoperative optimization (Wasterlain et al. 2019). The only significant HCCs were coagulation defects and other specified hematological disorders; myasthenia gravis, myoneural disorders, or inflammatory and toxic neuropathies; and metastatic cancer and acute leukemia. These conditions are associated with higher complication rates following arthroplasty (Challoumas et al. 2024; Sherman et al. 2021; Newman et al. 2018). Lower odds of exceeding target payments in patients with coagulation defects, neuromuscular disorders, and inflammatory neuropathies suggest an accurate accounting of risk.
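Because odds ratios act multiplicatively on odds rather than on probability, a per-HCC OR of 1.44 is easy to misread. The sketch below uses the reported OR of 1.44 and, purely as an illustrative baseline, the cohort-wide bundle-buster rate of 21.6%; applying that rate to an individual patient is an assumption made for the example, and the function names are hypothetical.

```python
def percent_change_in_odds(odds_ratio, units=1):
    """Percent change in odds for a given number of one-unit increases."""
    return (odds_ratio ** units - 1.0) * 100.0


def shifted_probability(baseline_prob, odds_ratio, units=1):
    """Probability after multiplying the baseline odds by odds_ratio ** units."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio ** units
    return new_odds / (1.0 + new_odds)


or_per_hcc = 1.44      # reported OR per additional HCC
baseline = 0.216       # illustrative baseline: overall bundle-buster rate

one_more = percent_change_in_odds(or_per_hcc)            # about 44% higher odds
two_more = percent_change_in_odds(or_per_hcc, units=2)   # about 107% higher odds
prob_one_more = shifted_probability(baseline, or_per_hcc)
print(round(one_more, 2), round(two_more, 2), round(prob_one_more, 3))
```

Under these assumptions, one additional HCC moves the probability of exceeding the target price from 21.6% to roughly 28%, a smaller relative change than the 44% increase in odds, and the effect compounds with each further HCC.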
In the first two years of BPCI-A, CMS paid $567 million in positive reconciliation and was on track for a $2 billion loss, prompting major target pricing adjustments (Daly 2020; Ryan et al. 2023). Model Year strongly predicted bundle busting, with odds increasing annually as CMS refined pricing to increase the accuracy of target prices and find savings (Lewin Group Inc. 2024). Beginning in Model Year 3, the baseline data used to assess historical performance shifted forward annually (BPCI Advanced Model Year 5 Fact Sheet 2021). Model Year 4 required participants to select service lines instead of unrelated bundles, introduced retrospective adjustments of peer group trends, and eliminated the PGP offset (Lewin Group Inc. 2024; BPCI Advanced Model Year 4 Fact Sheet, n.d.; Scheurer, n.d.). The removal of THA from the inpatient-only list in 2020 temporarily shifted healthier, less resource-intensive THA patients outside of bundled payments until Model Year 5, when outpatient hips were included (BPCI Advanced Model Year 5 Fact Sheet 2021; Parikh et al. 2024; Turcotte et al. 2020). This shift to ambulatory surgical centers likely explains the bed size effect in our model, as higher-risk patients were treated at larger hospitals with more resources (BPCI Advanced Model Year 4 Fact Sheet, n.d.).
This study has several limitations. Our dataset is limited to the PGPs working with a single convener, which took all downside risk in most of the agreements, such that PGPs received most of the savings. As downside risk increased over time, other conveners in BPCI-A exited the program. The experience of PGPs and the impact of conveners on BPCI-A remain underexplored (Berlin et al. 2021; Shashikumar et al. 2024). The present study fills a gap by being the first to provide insight into PGP-convener partnerships in the context of THA in BPCI-A. PGPs play an important role in caring for Medicare beneficiaries. Conveners drive savings, expand participation in voluntary CMS models to create a more nationally representative sample, and guide PGPs through the complexities of APMs (Berlin et al. 2021; Somers et al. 2022; Murphy et al. 2019; Klika et al. 2025). Administrative claims data are vulnerable to miscoding, undercoding, and billing errors, and they do not allow assessment of other preoperative factors known to affect cost, such as patient-reported outcome measures, socioeconomic data, or history of mental health or substance abuse (Ryan, Goltz, et al. 2019; Squitieri et al. 2017; Courtney et al. 2017; Grits et al. 2022). While a potential criticism of our study is the inclusion of patients with both MS-DRG 469 and 470 designations, we believe MS-DRG codes provide inadequate risk stratification (Ryan, Plate, et al. 2019). MS-DRG codes are assigned at discharge, and physicians with greater awareness of billing codes can document the extent of patient comorbidities to justify the higher-reimbursed MS-DRG 469 designation if they anticipate a costlier postoperative course. Nevertheless, 42% of MS-DRG 469 patients exceeded their target price compared to 21% of MS-DRG 470 patients, highlighting that adjustments made for the major comorbidities and complications designation do not consistently align with true episode cost. Importantly, patients classified under MS-DRG 470 may still have risk factors associated with increased episode costs (Wodowski et al. 2019; Fillingham et al. 2020; Ryan, Plate, et al. 2019).
Despite these limitations, our findings have important implications as orthopaedic surgeons become increasingly engaged in APMs and must better understand financial risk in an evolving reimbursement landscape. By analyzing claims data and the final target prices provided by CMS for 29 PGP clients of a single convener, we identified nine factors associated with a bundle buster in BPCI-A. Equipped with this knowledge, physicians can take part in shaping the healthcare system of tomorrow. The upcoming Transforming Episode Accountability Model (TEAM), a new five-year mandatory episode-based payment model, reflects key learnings from CJR and BPCI-A and will replace these models in January 2026. Currently, it excludes PGPs and conveners as episode initiators due to concerns about episode attribution, care coordination services, and patient volume. A prior study demonstrated that the greatest savings were achieved for beneficiaries treated by both participating physicians and participating hospitals, signifying the potential for success in gainsharing approaches (Crowley et al. 2025). CMS signaled potential pathways for PGP involvement, whether through arrangements with TEAM participants, future consideration for TEAM inclusion, or new PGP-specific models (Centers for Medicare & Medicaid Services 2024). Refined predictive models are needed to empower PGPs and conveners to strategically redesign care to ensure high-quality THA for all patients across different practice models. Similar investigations should be performed for other orthopaedic procedures as bundles expand to total shoulder arthroplasty and common spine procedures.


