Racial bias in healthcare AI applications


Addressing racial bias in algorithms has been a significant challenge for clinical applications of artificial intelligence. Researchers have demonstrated that algorithms can reinforce existing health inequities arising from structural racism in the United States (Bailey et al. 2017), from cancer-detection algorithms that are less effective for Black patients to cardiac risk scores that underestimate the amount of care Black patients need (Owens et al. 2020). Maternal health is no exception. For example, the Vaginal Birth after Cesarean (VBAC) algorithm, which was widely used until 2021, predicted lower rates of successful VBAC for pregnant people of color, contributing to higher c-section rates among women of color (Vyas et al. 2020). Racial bias in algorithms often results from the use of clinical data that encodes racial inequities in access to healthcare or experiences of racial discrimination.

A growing literature on fairness in machine learning demonstrates that bias can be addressed at multiple points during algorithm development. Teams composed of health disparities researchers, clinicians, health policy experts, and data scientists can provide multiple perspectives on potential biases and improve the fairness of the final algorithm (Rajkomar et al. 2018; Owens et al. 2020). During development, the underlying data source should adequately represent populations with significant disease burden and ideally capture complex intersectional groups that may be systematically underrepresented in healthcare solutions (Chen et al. 2021). When assessing algorithmic performance, researchers should evaluate the model separately for each racial and ethnic group; if the model significantly underperforms in some subgroups, data pre-processing techniques can be applied to make performance more equitable. Importantly, outcomes should also be formally monitored after deployment to ensure that the model remains equitable in practice (Rajkomar et al. 2018). Systematic evaluation of data labeling practices for potential bias is another key strategy to reduce algorithmic bias. Specifically, there is precedent in both academic research and industry applications showing that labels for machine learning can be systematically improved, especially as more representative data is collected over time (Obermeyer et al. 2019; Rajkomar et al. 2018).
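To make the subgroup evaluation step concrete, here is a minimal sketch in plain Python. It computes recall (sensitivity) separately for each group and reports the largest gap between groups. The function names, the group labels "A" and "B", and the toy data are illustrative assumptions, not taken from any cited study, and recall is only one of several metrics a team might compare.

```python
from collections import defaultdict


def recall_by_group(y_true, y_pred, groups):
    """Return {group: recall}, where recall = true positives / actual positives.

    Groups with no positive cases map to None, since recall is undefined there.
    """
    true_pos = defaultdict(int)
    actual_pos = defaultdict(int)
    seen = set()
    for truth, pred, group in zip(y_true, y_pred, groups):
        seen.add(group)
        if truth == 1:
            actual_pos[group] += 1
            if pred == 1:
                true_pos[group] += 1
    return {
        g: (true_pos[g] / actual_pos[g] if actual_pos[g] else None)
        for g in seen
    }


def largest_recall_gap(per_group):
    """Difference between the best and worst defined per-group recall."""
    defined = [r for r in per_group.values() if r is not None]
    return max(defined) - min(defined) if defined else 0.0


# Toy, made-up labels and predictions for two hypothetical groups.
y_true = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
groups = ["A"] * 6 + ["B"] * 6

per_group = recall_by_group(y_true, y_pred, groups)
gap = largest_recall_gap(per_group)
print(per_group, gap)
```

In this toy data the model catches 3 of 4 positive cases in group A but only 1 of 4 in group B, so the recall gap is 0.5. A gap of that size would be the signal to revisit the training data or labels before deployment.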

Developers of deep learning models must also consider whether to include race labels. Supporters of race-neutrality argue that embedding race in healthcare decisions can inappropriately reduce systemic racism to biological factors, propagate race-based medicine, and cause disparate resource allocation (Vyas et al. 2020). However, because complex models can infer protected attributes (including race) from other inputs, we know that no analysis is truly race-neutral (Owens et al. 2020). Completely removing race from our models blinds clinicians and researchers to the ways in which race structures our society and inappropriately presents racial disparities as immutable facts. Instead, researchers need to be continuously trained and educated to take a proactive approach to data collection, analysis, and prediction (Vyas et al. 2020).

By providing automated risk stratification, Delfina has the potential both to investigate the causes of pregnancy-related complications and to reduce their incidence. We recognize that Delfina’s software does not exist in a vacuum and will always be affected by the political, legal, and socioeconomic factors that have contributed to the maternal health crisis in the United States, a crisis that disproportionately impacts Black and Native women. Our team is committed to the iterative process of anti-racist AI development. Earlier this year, Delfina participated in the National Institute of Child Health and Human Development’s (NICHD) Decoding Maternal Morbidity Data Challenge using a random forest model trained to predict hypertensive disorders of pregnancy on a nationally representative dataset. To reduce bias in our model, we excluded race and ethnicity from the training inputs but tracked model performance by race and ethnicity. To improve the equitability of model performance across racial and ethnic groups, we selectively oversampled non-Hispanic Black patients in our training data to compensate for initially worse model performance in those patients. Our model ultimately predicted hypertensive disorders of pregnancy comparably for patients of all ethnic and racial backgrounds. We were grateful to be recognized by the NICHD for innovation and addressing racial disparities in healthcare.
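The oversampling idea described above can be sketched roughly as follows. This is not Delfina's actual pipeline: the post does not say which resampling technique was used, so random oversampling with replacement is assumed here as one common option. The field names (`age`, `systolic_bp`, `group`, `label`), the group tags, and the helper functions are all hypothetical. The group tag is kept alongside each row for resampling and later evaluation but is left out of the model inputs.

```python
import random


def oversample_group(rows, group_key, target_group, factor, seed=0):
    """Return rows plus extra target-group rows drawn with replacement.

    factor is the approximate ratio of target-group rows after vs. before
    resampling (factor=2.0 roughly doubles them). The input list is not
    modified, and rows from other groups are left unchanged.
    """
    if factor < 1.0:
        raise ValueError("factor must be >= 1.0")
    rng = random.Random(seed)
    target_rows = [r for r in rows if r[group_key] == target_group]
    n_extra = round(len(target_rows) * (factor - 1.0))
    extra = [rng.choice(target_rows) for _ in range(n_extra)] if target_rows else []
    resampled = rows + extra
    rng.shuffle(resampled)
    return resampled


def split_features(rows, group_key, label_key):
    """Separate model inputs, labels, and group tags; group is not a feature."""
    X = [
        [v for k, v in sorted(r.items()) if k not in (group_key, label_key)]
        for r in rows
    ]
    y = [r[label_key] for r in rows]
    g = [r[group_key] for r in rows]
    return X, y, g


# Toy, made-up training rows: 8 from hypothetical group "A", 2 from "B".
rows = [
    {"age": 25 + i, "systolic_bp": 110 + i, "group": "A", "label": i % 2}
    for i in range(8)
] + [
    {"age": 30 + i, "systolic_bp": 125 + i, "group": "B", "label": 1}
    for i in range(2)
]

resampled = oversample_group(rows, "group", "B", factor=3.0)
X, y, g = split_features(resampled, "group", "label")
print(len(rows), len(resampled), g.count("A"), g.count("B"))
```

A sketch like this should be applied to the training split only, so that evaluation data still reflects the original population. The resulting `X` and `y` could then be passed to any classifier, and the group tags in `g` used to compare performance across groups.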

As a technology company focused on the US maternal health crisis, we know that we must consciously and iteratively work to ensure our software works equitably for all patients. Our team has spent time clinically serving pregnant patients of diverse racial and ethnic backgrounds and studying the ways in which bias may be incorporated into both risk analysis and clinical decision-making. We welcome the opportunity to continue learning from and partnering with community organizations that share our mission to provide a safe, healthy, and supported journey for every pregnancy.

© 2022 Delfina
info@delfina.com · 2021 Fillmore St, Ste 37 · San Francisco, CA 94115