Cancer Data Science Pulse
Datum and Artificial Intelligence—A Perfect Match
For data science, joining robust data with computer-assisted artificial intelligence (AI) is a match seemingly made in heaven, giving us a way to efficiently sift through petabytes of data, glean the information that’s most relevant, and make decisions based on solid, scientific facts. This is an area especially relevant to precision medicine. Once perfected, the merging of data and AI could help take a lot of the guesswork out of cancer research, diagnosis, and treatment. In fact, this amazing union stands to revolutionize how we treat not only cancer, but also a host of other diseases and disorders.
Yet, this merging of data and AI wasn’t exactly love at first sight. As shown in this fictionalized blog, when our pal Datum (introduced in an earlier blog) first met AI, the two weren’t exactly attracted to one another. On the contrary, they didn’t want to work together at all! It’s better, though, if Datum and his AI friend “tell” their story in their own words, as detailed in the Q&A below.
CBIIT Interviewer: Thanks, Datum and AI, for taking the time to sit down with us today and tell us a bit about your unique relationship. We know how busy you are right now.
Datum: We’re happy to do this.
AI: Please, call me Aida.
CBIIT Interviewer: Right, Aida, as in Ada Lovelace, the first woman of data science?
Aida: Yes, Ada Lovelace. She firmly believed that machines were only as good as their programmers and could only do what they were told to do. Other pioneers in AI have a different perspective, believing AI can indeed perform actions that are beyond what humans have programmed. Machine learning is a rapidly evolving field. Only time will tell which perspective wins out. For now, I do as I’m told, and I’m proud to carry on Ada’s data science legacy.
CBIIT Interviewer: We want to tell our audience about how you two first became a couple. I hear it wasn’t always smooth sailing.
Datum: (laughing) Yes, you could say that. I don’t think Aida liked me at all when we first met. She wouldn’t even talk to me.
Aida: It wasn’t that I didn’t want to talk to you, I just couldn’t. I couldn’t make sense out of anything you were saying.
Datum: Ah, yes, I remember now. My data set wasn’t harmonized when we first met. Our data had a lot to say, and a lot of great information. You just couldn’t understand what we were trying to tell you.
Aida: (sighing) You and your friends nearly drove me crazy. You remember how many others were in that set? Not just you and your genomics partners but all those imaging data. Those folks were speaking a completely different language. There were so many artifacts. I didn’t know what to pay attention to.
Datum: Harmonization is really important, that’s for sure. Data are all so unique. Each set of my friends had to go through its own rigorous process so we could be fully integrated. Fortunately, NCI is working hard to make harmonizing data easier by developing detailed data and metadata standards that help researchers integrate and reuse data. My researchers used resources available from the cancer Data Standards Registry and Repository (caDSR) to match up our data set’s metadata. When researchers take the time to apply those standards to their data early in the process, it makes us that much more useful for future studies. Harmonization is key! When harmonization works, it truly is a symphony of information!
Aida: (laughing) I think I’ve heard that analogy before, perhaps in a video….
Datum: I love that movie! It stars NCI’s Cancer Research Data Commons (CRDC). It’s sort of the “mother ship” for me and all my data pals. I’m part of the Genomic Data Commons, but there are a bunch of other collections: proteomics, imaging, and canine, with clinical still to come. The CRDC also offers tools and other resources to help researchers manage and use data.
CBIIT Interviewer: Can you tell us more about that symphony? How does AI help?
Aida: Where do I begin? There are so many ways to use AI. Basically, with AI, scientists shift the chore of interpreting data and making decisions to a computer. An AI model takes in data, interprets those data, and renders a decision. The inner workings of an AI model can be very complex and are often called “black boxes” because not all operations are visible to the user, but instead are self-directed by the model itself. Model developers probe these black boxes by training and testing the models, that is, by changing inputs and evaluating the outcomes. This helps to validate the model. This is an area where we are still learning, especially in clinical medicine. Researchers need to find ways to make the black box more transparent before AI is fully deployed in clinical medicine.
Datum: (chuckling) Hold on, Aida. I think you might be getting a bit ahead of yourself. AI has the potential to be an awesome tool for clinicians, but I think we both know there’s still a lot of research that needs to be done before AI can be deployed in routine clinical practice. We’re doing some good work though. I don’t think it will be long before you and I can be used to identify people at risk for cancer or help to predict how cancer might spread within the body. We’re already helping scientists process huge amounts of data that can be used during their experiments.
Aida: (nodding) You’re right, of course. There is still a lot of work we need to do. We need to find a way to avoid data bias when we’re making our decisions. We need to do better at integrating data so we aren’t so easily distracted by artifacts or outliers. Still, it’s an exciting time for Datum and me. We have a chance to work on pioneering research that someday may revolutionize how medicine is practiced.
CBIIT Interviewer: Speaking of research, do you remember what your first project together was? How did it turn out?
Datum: Yes, NCI scientists used my data set, along with imaging data, to detect breast cancer. Our research was similar to another study, which showed how AI and data can be a helpful tool for radiologists to detect prostate cancer.
Aida: Yes, but we really aren’t limited to just identifying people at risk for cancer. With the right data, AI like me can be used to find better treatments, predict outcomes, and interpret the results of many types of data (e.g., genomics, proteomics, and metabolomics) blended together. As Datum noted, we’re already being used to assess radiology reports.
We’re also helping interpret pathology slides. A recent study shows how AI is allowing researchers to identify cancerous cells in a mouse model of lung disease. This automated system performed on par with expert humans. A tool like this could save significant time and resources for researchers looking to evaluate new medications for lung cancer, which is the leading cause of cancer-related deaths across the world.
AI can’t replace humans, but we can give expert insight to clinicians and researchers. Eventually, we can open up the world to expert cancer care, making it easier for patients and clinicians to get answers quickly, even in remote places where specialists aren’t readily available.
CBIIT Interviewer: When did you first know you two had something special?
Aida: For me, there was a lot of uncertainty at first. We had to refine my algorithm before we could have a meaningful relationship. The data scientists kept inputting data, our models kept adjusting, until finally everything was fully validated. Datum could “speak” to me. Our researchers were able to produce results!
Datum: Yeah, and with Aida processing our data set right there in the cloud, we were able to show meaningful results in a fraction of the time. Investigators didn’t need to download me, and the results were easy to share. That’s huge. That whole process of uploading, downloading, and transferring was painful. I definitely like staying in the cloud.
CBIIT Interviewer: Are you working on anything outside the clinical realm?
Datum: I don’t want to brag, but honestly, the work we’re doing is really what’s driving discovery in cancer research right now.
Aida: (smiling) What he means is that scientists are now able to make highly accurate predictions using analyses from very large data sets. And that’s transforming how decisions are made in research.
Here’s just one example. This study blended genomic and imaging data gleaned from melanoma biopsies in The Cancer Genome Atlas. The researchers used those already-harmonized data, which they accessed through CRDC’s Genomic Data Commons. The goal was to see how cancerous cells responded to medication, in this case an immune checkpoint blocker. This research led to a cost-effective way to study the link between genetics and immune response, right down to the cellular level. Using this approach, researchers can better understand which medications will work best for a certain genetic profile.
Datum: This sort of AI-data matching means scientists can avoid doing research on drug compounds that aren’t likely to work. That will save a ton of money. And it should speed the time it takes to develop drugs for treating cancer.
Aida: There also are other AI-data models helping in research. NCI and the Department of Energy are working together to develop models to predict the pathways and interactions that result in proteins that cause cancer (RAS oncoproteins). They’re blending powerful computers, data, and AI to simulate all the possible ways that these proteins might interact with each other, with cell membranes, and with other proteins. Knowing more about how proteins work can help researchers intervene to disrupt cancer-causing signals and hopefully identify key areas where we can target therapies to stop the growth of cancerous cells.
CBIIT Interviewer: What do you think the future holds for you two?
Aida: For me, I’m excited about the work we’re doing to translate data from health records, smart phones, wearable devices, body sensors, and doctors’ notes to help people live healthier and better lives. A lot of data are collected for each cancer patient throughout their diagnosis and care. Our goal is to use that information to improve diagnostics and treatments in the future.
Datum: I’m most interested in the research side of things. I want to help find medications to treat cancer. One of the things my researchers have been working on is finding new ways to identify people for clinical trials who are most likely to benefit from the medication being studied. Focusing on subgroups of patients who are likely to respond to the medication can really help speed medications through the pipeline. I’m also helping to identify subgroups of people who might have more side effects or who are at risk for a recurrence of disease.
CBIIT Interviewer: Will you continue to work together?
Datum: I think we’ll always continue to support each other as we help scientists find new ways to improve treatments for cancer.
Aida: Yes, in fact, NCI is working with other federal partners, like the U.S. Food and Drug Administration, to use AI to standardize how doctors use radiology images to detect and monitor cancer. I’m also really excited to work with data from The Cancer Imaging Archive, which hosts large, publicly available radiology and histopathology cancer imaging data sets. There’s a great video for researchers on how to use NCI’s Imaging Data Commons (IDC) to access imaging collections and other tools, including free credits for working in that amazing cloud-based infrastructure. The video features a case study where researchers used the IDC to confirm findings from an earlier investigation. It shows how AI can help identify imaging biomarkers so we can detect cancer earlier and customize treatment to fit the individual patient.
CBIIT Interviewer: The future looks very bright for you two.
Datum: Yes, but ultimately, it’s Aida who’s at the heart of all this. It’s her time to shine. AI is going to change things in a big way. As algorithms get more refined, she’s really going to take off. She’ll show the world just how much we can do with data—she’s the star here.
Aida: That’s a sweet thing to say, Datum, but we all know that data are really what’s at the heart of all these discoveries. It all starts with good data. With data, all these things are possible.
CBIIT would like to thank Drs. Baris Turkbey and Stephanie Harmon from NCI’s Center for Cancer Research (CCR) for their help in the preparation of this article. Both are key members of CCR’s Artificial Intelligence Resource.
Additional sources include:
Mehralivand S., Yang D., Harmon S.A., Xu D., Xu Z., Roth H., Masoudi S., Kesani D., Lay N., Merino M.J., Wood B.J., Pinto P.A., Choyke P.L., Turkbey B. Deep learning-based artificial intelligence for prostate cancer detection at biparametric MRI. Abdominal Radiology (NY) Jan 31; 2022. Epub ahead of print.
Arlova A., Jin C., Wong-Rolle A., Chen E.S., Lisle C., Brown G.T., Lay N., Choyke P.L., Turkbey B., Harmon S., Zhao C. Artificial Intelligence-based Tumor Segmentation in Mouse Models of Lung Adenocarcinoma. Journal of Pathology Informatics 13; 2022.
Wang K., Patkar S., Lee J.S., Gertz E.M., Robinson W., Schischlik F., Crawford D.R., Schaffer A.A., Ruppin E. Deconvolving clinically relevant cellular immune crosstalk from bulk gene expression using CODEFACS and LIRICS stratifies melanoma patients to anti-PD-1 therapy. Cancer Discovery Jan 4; 2022. Epub ahead of print.
Hosny A., Parmar C., Coroller T.P., Grossmann P., Zeleznik R., Kumar A., Bussink J., Gillies R.J., Mak R.H., Aerts H. Deep learning for lung cancer prognostication: A retrospective multi-cohort radiomics study. PLoS Medicine 15(11); 2018.