Cancer Data Science Pulse

How Can Big Data Help to Address Health Disparities?

Our country is in the midst of two crises. One is the threat of the COVID-19 pandemic. The other has more historic underpinnings but is now finally receiving greater recognition—the issue of racial inequality. 

Dr. Tony Kerlavage, director of NCI’s Center for Biomedical Informatics and Information Technology (CBIIT), sat down to discuss one key component of racial inequality, the issue of health disparities, as it relates to Big Data. He particularly draws on discussions from the National Academies of Sciences, Engineering, and Medicine workshop, Applying Big Data to Address the Social Determinants of Health in Oncology, in which he participated in October 2019. Future blog posts will look at these topics in greater depth, as CBIIT continues to bring to light issues in data bias, clinical trial recruitment, and the use of artificial intelligence to reduce disparities in research and to better understand the social determinants of health.

A lot has been written about the two crises that we’re experiencing today (COVID-19 and the continuing inequality in health care/mortality among minority and other vulnerable populations). Do you sense that change is now possible in the United States?

More than 50 years ago, Dr. Martin Luther King, Jr. said, “Of all the forms of inequality, injustice in health is the most shocking and the most inhuman because it often results in physical death.”1 While the focus of this recent workshop was on Social Determinants of Health (SDOH) in oncology, we’re also seeing this statement reflected daily in the numbers of lives lost to the novel coronavirus throughout communities of color. In fact, the Centers for Disease Control and Prevention estimates that African Americans are more than twice as likely to contract and die from COVID-19 compared with Whites.2  In the United States, in 2020, this is inexcusable.

We need to identify and strengthen the weaknesses in our health care system and our society that make poorer communities and minorities more vulnerable to public health crises. Research has an important role to play in helping to end these disparities and ensuring that all populations have the same access to services and care, not only for COVID, but for all diseases and disorders that threaten public health.

Will this change now? I think that’s truly up to all of us. As researchers, we need to do whatever we can to solve these institutional and societal problems. We can’t tackle everything, but we can certainly focus on understanding the issues, improving the research on SDOH, and improving access to quality care. The recent peaceful protests that have been going on around our country demonstrate that many Americans want a more equitable society. If we can bring the disparities in health care into the spotlight, we can take advantage of that groundswell of support to help raise the issue more broadly and effect real change.

You mention the “groundswell of support” that’s sweeping our country today. How can NCI use that to make changes in addressing public health issues like COVID and cancer?

Narrowing the gap between those who need treatment and those who actually receive treatment needs to begin early in the research process. We’re working across NCI to do better at recruiting minorities and underserved populations into clinical trials for cancer research. This will generate more accurate data for helping the full range of people with cancer.

We know that to engage people in clinical trials we need to understand their barriers to care, whether it’s mistrust, stigma, transportation, or technology (lack of internet access or cell phones). These barriers differ across regions and populations.

COVID introduces new concerns. Patients may be unable to travel to research hospitals, or they may fear catching the virus, making them reluctant to sign up for trials. Medical staff are overwhelmed with addressing the virus, and research labs are not running at full capacity. Our approach to clinical trials now and post-COVID will need to take these new barriers into account.

Representing our diverse U.S. population in research is key, but we also need better data.

How can we improve SDOH data?

We’re working now to standardize data so we can pull in findings from multiple sources—electronic health records, health insurance claims data, national registries, etc. By linking those data with demographic information, we can learn more about why some populations are at greater risk for disease. Such data will enable us to look at national trends all the way down to specific neighborhoods. Knowing why some populations are more vulnerable than others can help us better target interventions to reduce that risk.
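To make the idea of linking concrete, here is a minimal sketch in Python. It is not NCI's pipeline: the records are made up, and the field names (`patient_id`, `PatientID`, `zip`, `stage`) and helper functions are hypothetical. It shows only the general pattern of standardizing an identifier, joining clinical records to demographic records, and summarizing by area.

```python
from collections import defaultdict

# Made-up example records; every field name here is an illustrative assumption.
ehr_records = [
    {"patient_id": "p1", "diagnosis": "C50", "stage": "late"},
    {"patient_id": "p2", "diagnosis": "C50", "stage": "early"},
    {"patient_id": "p3", "diagnosis": "C18", "stage": "late"},
]
demographics = [
    {"PatientID": "P1", "zip": "00001"},
    {"PatientID": "P2", "zip": "00002"},
    {"PatientID": "P3", "zip": "00001"},
]


def standardize_id(raw_id):
    """Normalize an identifier so the two sources use the same format."""
    return raw_id.strip().lower()


def link_records(ehr, demo):
    """Join clinical records to demographic records on the standardized ID."""
    demo_by_id = {standardize_id(d["PatientID"]): d for d in demo}
    linked = []
    for rec in ehr:
        match = demo_by_id.get(standardize_id(rec["patient_id"]))
        if match is not None:  # keep only records found in both sources
            linked.append({**rec, "zip": match["zip"]})
    return linked


def late_stage_share_by_area(linked):
    """Compute the fraction of late-stage diagnoses within each area."""
    totals = defaultdict(int)
    late = defaultdict(int)
    for rec in linked:
        totals[rec["zip"]] += 1
        if rec["stage"] == "late":
            late[rec["zip"]] += 1
    return {area: late[area] / totals[area] for area in totals}


linked = link_records(ehr_records, demographics)
shares = late_stage_share_by_area(linked)
```

In practice the hard part is the standardization step: real sources rarely share a clean identifier, which is why common formats matter so much.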

Besides looking at risk, combining different data types lets us see why some people respond better to certain treatments than others. Being able to combine outcome data from clinical trials nationally and globally can generate insights into treating complex diseases such as cancer. Ultimately, the goal is to enable precision medicine, which allows us to treat every member of society equally and holistically based on genetics, clinical characteristics, environment, and need, even as it changes across a person’s lifespan.

That’s the vision, and we are making progress. A lot of these data exist currently but are housed in different repositories or are not easily comparable across data sets. This is why standardizing how data are collected and stored is so important. The tools used to analyze these data also need to be validated, as I pointed out at the workshop. Sharing statistical, artificial intelligence (AI), and machine learning models is critical for validation, but the willingness to share these is sometimes lacking. Rewarding researchers who share their data and tools might be one solution.

Lynne Penberthy in NCI’s Division of Cancer Control and Population Sciences (DCCPS) is leading an effort to link more data as part of the broader Surveillance, Epidemiology, and End Results (SEER) program. SEER currently provides information on cancer statistics from about 35% of the U.S. population. Lynne and her group are looking at new ways of linking data by automatically capturing information from oncology practice claims, incorporating data from large pharmacy chains such as CVS and Walgreens, and blending this with important genomic information on the tumors themselves—all of which will help us direct the most appropriate therapy.  

In partnership with the Department of Energy and the Coordinating Center for Clinical Trials at NCI, they also are working on ways to use SEER data to better match people to clinical trials so patients can receive therapy that is most likely to work for them. Their goal is to generate and share tools that will expand patients’ access across the full clinical trials network.

Ultimately, these new processes will improve our understanding of cancer care and outcomes for all patients.

Does new technology, such as AI, hold the key for addressing health disparities? 

The irony is that even data can carry bias. If the participants selected for a study don’t represent a diverse population, the findings may not apply to everyone. If an algorithm is too narrow or too broad, it can unintentionally lead to false conclusions.
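One simple way to spot this kind of selection bias is to compare the makeup of a study cohort with a reference population. The sketch below is illustrative only: the group labels, counts, reference shares, and the 5-percentage-point tolerance are all assumptions, not real statistics or an established threshold.

```python
def representation_gaps(cohort_counts, population_shares, tolerance=0.05):
    """Return groups whose share of the cohort falls short of their
    share of the reference population by more than `tolerance`."""
    total = sum(cohort_counts.values())
    if total == 0:
        raise ValueError("cohort is empty")
    gaps = {}
    for group, expected in population_shares.items():
        observed = cohort_counts.get(group, 0) / total
        shortfall = expected - observed
        if shortfall > tolerance:
            gaps[group] = round(shortfall, 3)
    return gaps


# Hypothetical cohort counts and reference shares, for illustration only.
cohort = {"group_a": 80, "group_b": 15, "group_c": 5}
reference = {"group_a": 0.60, "group_b": 0.25, "group_c": 0.15}

gaps = representation_gaps(cohort, reference)
```

Here `group_b` and `group_c` would be flagged as underrepresented. A check like this does not fix bias, but it makes the gap visible before a model is trained on the data.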

With technologies such as AI, it is vital to have a significant volume of data to accurately train the algorithms. This goes back to the need to collect as much data from as many sources as possible, in standardized formats, and then to link them across data types to form a more complete picture of a patient or group of patients.

Initiatives like SEER and the Cancer Research Data Commons (CRDC) are helping to break down data silos and harmonize large volumes of data. Through CRDC, we’ve also developed a cloud-based infrastructure so researchers can store, analyze, and share results more easily than before. CRDC offers more than 1,000 tools and workflows for scientists to use in exploring and analyzing data.

The more data that can be turned into information and the more tools available to analyze and connect those data, the better AI will work to discover new treatments and to understand the differences in how cancer affects the full range of populations in the United States and the world today.

Another way to address health disparities is by encouraging more diversity in the field. Within CBIIT and across NIH, are there changes we could make to increase diversity in research and in our workforce?

The NCI Center to Reduce Cancer Health Disparities (CRCHD) leads NCI’s efforts to increase the cancer research workforce’s diversity by training students and investigators from a wide variety of backgrounds. Through CRCHD's Continuing Umbrella of Research Experiences (CURE) program, we’ve seen steady growth in the number of applicants and awards representing different races, ethnicities, genders, and abilities.

As we discussed at the workshop, funding that trains researchers in both data science and SDOH is another solution. An initiative supported by a new offshoot of CURE, the Intramural Continuing Umbrella of Research Experiences (iCURE), is designed to attract students and scientists into the NCI Intramural Research Program. Data science and cancer health disparities are key focus areas within the iCURE program.

At CBIIT, Dr. Vivian Ota Wang has helped to lead the effort to reduce disparities, and she’s been tenacious in keeping this topic front and center. She not only works within CBIIT on issues related to social justice but collaborates across NCI. She’s currently participating in the NCI Workplace Civility Committee and the Trans-NCI Cancer Disparities Activities Committee.

Ultimately, genetic research shows that there is very little that separates us. As humans, our biology is, on average, nearly identical, and we share 99.9% of our DNA. If we’re so alike, why is it so important to have more diverse representation in research? 

Clearly, our genetics are not the only things that determine our health outcomes and treatment choices. For example, lower educational and income levels are associated with higher risks of certain cancers, particularly late-stage cancers.3 Race and ethnicity also contribute to risk. African American men and women show the highest death rates across all cancer sites compared with other racial groups.4

Data representing the full range of populations can alter these trajectories, helping to identify specific mutations or biomarkers within those populations that might influence diagnosis and treatment. For example, we know certain gene mutations (e.g., BRCA) associated with breast cancer run in certain families. We know that genetic or hormonal differences between men and women can impact chemotherapy. So it stands to reason that there may be differences within subpopulations that can help us better identify and treat cancer in different races and ethnicities.

Beyond CBIIT and data science, what do you think we can do as a society to truly make a difference? 

First, we need to create strong ties with communities to address the mistrust that exists around research. Large segments of our population avoid participating in clinical trials because they don’t trust the process. We need to elevate the issues underserved communities face and work within NCI to address those issues in any way we can. Without representation in research, whole subgroups of our population will be missed, and we’ll continue to see a gap in health care.  

We also need to ensure that quality care is accessible to all Americans regardless of their income, race, gender, ethnicity, or geographical location. We’ve made strides here, but more needs to be done.

Lastly, and perhaps most importantly, we need a workforce that represents the diverse people in our country. Members of that workforce need to be cognizant of how their own biases may be reflected, even unintentionally, in their work. Everyone who participates in research and clinical care has the potential to make a difference.

Additional Resource

National Academies of Sciences, Engineering, and Medicine 2020. Applying Big Data to Address the Social Determinants of Health in Oncology: Proceedings of a Workshop. Washington, DC: The National Academies Press.

