Cancer Data Science Pulse
NCI’s National Cancer Plan and Data’s Critical Role
On April 3, 2023, NCI’s vision for the future of cancer research gained sharper focus when NCI Director Dr. Monica Bertagnolli released a detailed National Cancer Plan. This Plan builds on work started under the Cancer Moonshot℠ effort but with greater emphasis on the most critical cancer research needs of today. The Plan offers eight strategic goals, set by scientists, administrators, and people on the front lines of cancer care, all with one ultimate goal in mind: to end cancer as we know it.
Many NCI staff had a hand in bringing this framework to fruition, including four key members of our CBIIT leadership team:
Dr. Tony Kerlavage, director of NCI’s CBIIT
Dr. Jill Barnholtz-Sloan, associate director of NCI’s Informatics and Data Science Program, and intramural senior investigator of NCI’s Division of Cancer Epidemiology and Genetics Trans-Divisional Research Program
Mr. Jeff Shilling, staff scientist and chief information officer/associate director for infrastructure and IT operations
Dr. Jaime Guidry Auvil, director of NCI’s Office of Data Sharing (ODS)
Our team offers their insight into the new Plan with particular emphasis on data science and related technologies—two fields that are poised to be the catalyst for future cancer research at NCI.
What National Cancer Plan goals and strategies are you most excited about?
Tony Kerlavage: The Plan establishes eight ambitious goals to prevent cancer, reduce mortality, and improve outcomes for people living with cancer. All of these goals have interdependencies, and data and information technology are indispensable to each of them. The goals establish a roadmap for continuing the Cancer Moonshot efforts; at the same time, they will drive all the work we do at NCI. I’ve encouraged every member of our staff to consider the entire list as they think about the impact of their efforts.
The foundation we laid with the Cancer Research Data Commons (CRDC) and the Precision Medicine Initiative®—as well as our semantic infrastructure, exascale computing efforts, cloud-first strategy, and more—has positioned us to achieve these goals. The Plan’s ultimate success will require input from the entire cancer community as well as the creation of a cancer data ecosystem. This calls for collaboration and data federation across the board, from basic scientists, oncologists, and those working in clinical trials (who contribute data relevant to treatment and outcomes), to people with cancer (who are actively engaged in research and who need to be kept informed of studies using their data).
Jeff Shilling: The Plan hinges on our ability to share data, including individual, clinical trial, and research data. At CBIIT, we’ve been involved in conceptualizing and building infrastructure that allows NCI to do just that. Cloud-based information technology gives us the ability to network across the world. The cloud allows us to share all types of data: genomic, proteomic, and clinical data, as well as images in multiple formats. This next era of cancer research depends on information technology, and we’re excited to be able to support the cancer research community as they work toward the goals outlined in the new Plan.
Jill Barnholtz-Sloan: All the goals in the Plan are critical for maximal impact on cancer. Still, I am most excited about Goal 7, “Maximize Data Utility.” One of the hallmark projects of our Informatics and Data Science Program is the CRDC. Like Jeff, I believe that cloud-based technology will be what gets us to the finish line in solving many of cancer’s most pressing research questions. The CRDC uses a flexible, cloud-based infrastructure to make cancer data accessible, usable, and interoperable. Data are everywhere. Making that information available and usable so that it benefits people with cancer is critical to cancer detection and diagnosis and to achieving the best possible outcomes.
Jaime Guidry Auvil: Like my colleagues, I’m excited about the critical role that data sharing will play in achieving each of the important goals in the Plan. The Office of Data Sharing (ODS) is actively working to develop and implement guidelines and processes to enable us to share data from every study across the entire NCI. These advances in data sharing will help foster innovation and improve outcomes throughout the broader cancer research and care community.
At NCI, we fund the collection of a wide variety of data types, and ODS is responsible for facilitating collaboration across many different scientific focus areas to share those data with the wider research community. We in ODS look forward to defining and developing new approaches for sharing high-value, sensitive data through our cross-cutting programs, such as the Childhood Cancer Data Initiative (CCDI).
Our approaches to collecting, analyzing, and sharing data are at the heart of nearly every goal outlined in the Plan. As we move forward, it’s vital that we continue to adapt our data management and sharing expectations (by implementing effective policy across all of NCI) to match those needs, particularly as they evolve over time.
A national “Cancer Research Data Ecosystem” is a key strategy in the Plan. What would you most like to see in this ecosystem? What infrastructure do you think is most important to advance cancer research?
Jill Barnholtz-Sloan: The ecosystem would allow NCI to bring together diverse types of data (e.g., clinical, electronic health record, genomic, proteomic, and imaging data), harmonize those data, and then make them available to every person interested in cancer research. We want the ecosystem to be user-friendly. It should allow researchers to easily submit their data. We could do this by automating the data collection and error-checking steps using novel approaches such as machine learning.
All submitted data would be standardized and harmonized per common definitions, leveraging the work of the NCI Semantics Infrastructure Team in NCI’s Informatics and Data Science Program. And lastly, the data in the ecosystem would be easily accessible for all in the cancer research community. This includes people living with cancer and their family members, clinicians, basic scientists, translational scientists, and more. All these elements coming together are imperative for maximal impact on cancer research.
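To make the idea of harmonizing submissions against common definitions concrete, here is a minimal sketch in Python. It is an illustration only, not NCI's or the CRDC's actual pipeline: the field names, permissible values, synonym table, and the `harmonize` function are all hypothetical, and it uses a plain lookup table instead of the machine learning approaches mentioned above.

```python
# Hypothetical data dictionary: field name -> permissible values.
# None means the field accepts free text. These are not real NCI definitions.
DATA_DICTIONARY = {
    "participant_id": None,
    "sex": {"female", "male", "unknown"},
    "primary_site": {"lung", "breast", "colon"},
}

# Hypothetical synonym table: submitted term -> agreed-upon dictionary term.
SYNONYMS = {
    "sex": {"f": "female", "m": "male"},
    "primary_site": {"pulmonary": "lung"},
}


def harmonize(record):
    """Return (harmonized_record, errors) for one submitted record."""
    harmonized, errors = {}, []
    for field, allowed in DATA_DICTIONARY.items():
        if field not in record:
            errors.append(f"missing field: {field}")
            continue
        value = str(record[field]).strip()
        if allowed is None:
            # Free-text field: keep the trimmed value as submitted.
            harmonized[field] = value
            continue
        # Normalize case, then map known synonyms to the common term.
        value = value.lower()
        value = SYNONYMS.get(field, {}).get(value, value)
        if value in allowed:
            harmonized[field] = value
        else:
            errors.append(f"{field}: unrecognized value {record[field]!r}")
    # Flag fields that the dictionary does not define.
    for field in record:
        if field not in DATA_DICTIONARY:
            errors.append(f"unknown field: {field}")
    return harmonized, errors


if __name__ == "__main__":
    submitted = {"participant_id": " P-001 ", "sex": "F", "primary_site": "Pulmonary"}
    print(harmonize(submitted))
```

In this sketch, a submission is either mapped onto the shared vocabulary or returned with a list of problems the submitter can fix, which is the kind of error-checking step that could be automated at submission time.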
Tony Kerlavage: The amount of cancer data is increasing at a rapid rate, and the opportunities to learn from those data are increasing as well. We need to evolve our processes and systems to be production-grade, so we’re better able to handle the flow of data.
Many of the steps that go into making data usable are labor intensive and, as Jill said, we need to take advantage of automation wherever we can. This will require innovative approaches and making use of the latest technologies from our industry partners. We will need to balance innovative research with operational efficiency, with a focus on creating sustainable solutions to data management, sharing, and analytics.
Jaime Guidry Auvil: In thinking of the ecosystem, it’s imperative that we don’t lose sight of the people who need to benefit from it. As Jill mentioned, we need to ensure that all levels of “users” have access to the data or summary information that best fits their needs. Right now, only about 10% of the research population that could use our data in some form (from raw files to aggregate summaries) are accessing that information.
It’s important to include tools and applications that allow citizen and basic scientists, as well as clinicians and families, to query and use data (in addition to bioinformaticians and data scientists). These resources are necessary for sustaining an ecosystem and for keeping it moving forward.
We also want to use the data we generate or collect—and make available through the ecosystem—to maximize our ability to learn from every person with cancer. For CCDI, we’ve been developing a Participant Index and a Clinical Data Commons to bring together data from across many duplicative projects. We already know a lot that we can apply to this new ecosystem.
Jeff Shilling: When building an IT infrastructure, you have to think 10 years ahead, especially where the internet is concerned. The internet we plan for today will not be the same in the future—technology is changing too quickly. We know the cloud gives us the greatest return on investment for providing cancer data research services.
Our goal is to give our research community a solid base upon which to build the data analysis tools they need. We want to provide a foundation for all types of automation and advances, such as artificial intelligence, machine learning, and health tracking devices. Our work will be to provide capabilities that act as a centralized hub for all those cancer research data needs.
Tony Kerlavage: That’s a great point. The auto industry production line offers a good model, especially when it comes to making data fully interoperable. Building a car depends on the assembly of unique parts, all designed and built by experts using certain specifications. For a car to be fully functional, all its subcomponents need to come together and all need to adhere to well-defined specifications. The same is true here. Researchers working in different domains, with different visions, and different ideas all can come together through a data ecosystem. But to be broadly usable, those data need to adhere to certain specifications, with agreed-upon dictionaries and well-defined metadata. It all starts with good data (i.e., data that’s FAIR—findable, accessible, interoperable, and reusable).
Ultimately, we want to build infrastructure, tools, models, and resources that will preserve and extend the value of NCI’s research investment. The Plan helps crystallize what we’re all working toward. It’s a great start, but it’s also a living document.
Over time, government agencies, nonprofits, academic institutions, the healthcare industry, the commercial sector, and individuals will respond to the Plan’s call to action. I believe our role is to provide a usable data-and-infrastructure framework and to be a catalyst for action. I’m very optimistic about the future of cancer research, and I know data will be an integral part of that success.