Cancer Data Science Pulse

The Good, the Bad, and the Unexplained: Five Tips for Evaluating an AI Product

Are you intrigued by the idea of using artificial intelligence (AI) in your research or clinical practice but unsure about where to start?

You’re not alone! In a recent survey, oncologists reported concerns about using AI that ranged from a lack of understanding to worries about ethics.

AI/ML holds a lot of promise, but it also has a lot of unknowns. We asked Dr. Baris Turkbey, senior clinician, and Dr. Stephanie A. Harmon, Stadtman Investigator, both in the Molecular Imaging Branch of NCI’s Center for Cancer Research, to help. Here are five tips they recommend keeping in mind when evaluating an AI product.

Tip #1: Assess the Data

An AI/ML model is only as good as the data upon which it’s built. In cancer, those data can be genomic, proteomic, clinical, imaging (radiology, pathology, endoscopy), demographic, or just about any aspect of patient care we can collect, measure, and replicate.

No matter the type, to be most useful, data need to be a true representation of the population you’ll be treating or studying—whether that population is a cell, tissue, organ, animal model, or human. Diversity is vital, both in the population and in the machines we use to collect those data.

You also need to be sure the data are plentiful. Models learn, infer patterns, and make predictions or recommendations based on data. The more extensive the data, the more likely the model will return accurate results and, most importantly, the more likely it is to maintain its high performance when applied to new data.

Take our prostate magnetic resonance imaging (MRI) and digital pathology AI models as examples. We typically start with a sample size of several hundred or even thousands of images, and, over time, we’ll validate the model using hundreds more. Commercially available models typically use thousands of images from multiple data sets.

When you’re assessing a model’s data, consider these three questions:

  1. What data sets did the developers use to build the model? As with a scientific study, results are generally more reliable when they’re based on more patients or study subjects.
  2. What reference data set did they use as the standard? For example, with genomic data, a model might use the reference genome GRCh38.p14, maintained by the National Center for Biotechnology Information. For imaging data, the model might adhere to a DICOM standard in radiology or meet similar scanning resolution requirements in pathology. All data should adhere to FAIR data standards (that is, be “findable, accessible, interoperable, and reusable”). Knowing the standard can help you assess the validity of the data.
  3. Do the data align with your intended patient population in terms of age, gender, sex, race, and ethnicity? (A minimal sketch of one way to spot-check this follows this list.) Keep in mind, too, that the data used to evaluate the model should be different from the data used to train it in the first place (what we often refer to as “unseen testing data”). That testing data should add diversity and ensure the model stands up to scrutiny, which helps ensure independent results.
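
For that alignment question, here’s a minimal sketch in Python of one way to spot-check whether a model’s training cohort resembles your own patient population. The column names, categories, and cohort sizes below are entirely hypothetical; in practice you’d use the developer’s published cohort summary and your own records.

```python
# Compare the demographic make-up of a model's training cohort with your own
# patient population (all values below are hypothetical, for illustration only).
import pandas as pd

# Hypothetical summary of the developer's training cohort
training_cohort = pd.DataFrame({
    "sex":  ["M"] * 70 + ["F"] * 30,
    "race": ["White"] * 60 + ["Black"] * 25 + ["Asian"] * 10 + ["Other"] * 5,
})

# Hypothetical summary of the patients you expect to treat or study
your_population = pd.DataFrame({
    "sex":  ["M"] * 55 + ["F"] * 45,
    "race": ["White"] * 45 + ["Black"] * 35 + ["Asian"] * 12 + ["Other"] * 8,
})

# For each attribute, compare the proportion of patients in each group
for column in ["sex", "race"]:
    comparison = pd.DataFrame({
        "training": training_cohort[column].value_counts(normalize=True),
        "yours": your_population[column].value_counts(normalize=True),
    }).fillna(0.0)
    comparison["gap"] = (comparison["training"] - comparison["yours"]).abs()
    print(f"\n{column}:\n{comparison.round(2)}")
```

Large gaps in any group are a signal to ask the developer how the model performs in that group specifically.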

Tip #2: Confirm the Reproducibility

Science hinges on reproducibility, and this includes reproducing the results from AI/ML. Unfortunately, developers, their team members, or the organization where they work may be reluctant to fully disclose the steps and code used in developing the model, especially when a commercially viable product is involved. But transparency is key for testing and, ultimately, for replicating an AI/ML model.

When you’re evaluating an AI/ML product, look at the extent of the testing.

  • Was the testing fully blinded (i.e., the investigators didn’t know whether they were using the model), and was the model tested in a clinical trial?
  • Were the results validated prospectively (i.e., did researchers use the model on newly collected, real-world data to confirm its results)?
  • Did the developers use test data that differed from the original training data?

What were the results—in terms of accuracy, performance, and costs? For example, we tested a prostate MRI model in 658 patients. We found our algorithm detected cases of treatable prostate cancer with 96% sensitivity (comparable to the 98% detection rate of radiologists). Our model for detecting cancer on histopathologic exams also was highly accurate, though it wasn’t as precise in segmenting the lesions for analysis. That brings up an important point: when looking at validation, we need to consider “failure analysis.” Knowing how validation falls short helps us further optimize our AI model.
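
To make a figure like “96% sensitivity” concrete, here’s a minimal sketch in Python of how sensitivity is computed from a set of predictions. The labels below are made up for illustration and are not the data from our study.

```python
# 1 = clinically significant cancer present, 0 = absent (illustrative labels only)
y_true = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]   # reference standard (e.g., biopsy result)
y_pred = [1, 1, 1, 0, 0, 1, 0, 1, 0, 1]   # the model's call for the same patients

true_positives  = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
false_negatives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
false_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

# Sensitivity: of the patients who truly have cancer, what share did the model catch?
sensitivity = true_positives / (true_positives + false_negatives)

# Positive predictive value: of the model's "positive" calls, what share were real?
ppv = true_positives / (true_positives + false_positives)

print(f"Sensitivity: {sensitivity:.0%}")
print(f"Positive predictive value: {ppv:.0%}")
```

Asking a developer for the full confusion matrix, not just one headline number, lets you compute the metrics that matter most for your own use case.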

Understand too that bias is real. AI has the potential to make health care more equitable, but developers need to be very intentional with these technologies to ensure that they work similarly across diverse populations. By checking how the model was validated, you can make certain the data reflect a broad range of people and circumstances and avoid biased results.

Tip #3: Trust in Transparency

Just as we need transparency to check for bias and to be able to replicate an AI model’s findings, we need the ability to see into the “black box” to assess how the model is making its “decisions.”

This type of “explainable” or interpretable AI gives us insight into the AI system’s internal machinery, enabling us to understand and verify how the model works. Of course, it can be hard to explain some AI/ML. But just because we don’t understand the inner workings of the black box doesn’t mean we should avoid using these models. On the contrary, we can leverage the power of AI/ML through careful validation and by fully understanding the features that we can interpret.
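
As one illustration, here’s a minimal sketch in Python of permutation importance, a common model-agnostic interpretability technique. It uses a synthetic data set and an off-the-shelf classifier purely for demonstration; it is not the method behind any particular product.

```python
# Permutation importance: shuffle each feature in turn and see how much the
# model's held-out accuracy drops. Features whose shuffling hurts the most are
# the ones the model leans on.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic tabular data standing in for clinical or imaging-derived features
X, y = make_classification(n_samples=500, n_features=6, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i, importance in enumerate(result.importances_mean):
    print(f"feature_{i}: {importance:.3f}")
```

Techniques like this won’t fully open the black box, but they offer a check on whether the model is relying on features that make clinical sense.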

At NCI, we make our code fully available with the hope that other scientists will use our model as a launch pad—both for improving today’s models and for creating future ones. With transparency and open science, we can propel AI/ML applications in the medical domain toward greater excellence in the future, especially in terms of reproducibility and reliability.

Models that perform consistently over time, in different circumstances and in varied populations, lead to trust. Similarly, it’s important to report common pitfalls and errors when you’re evaluating a new AI model for use. In fact, acknowledging these inaccuracies can be just as important as showing that the model remains accurate when applied to new populations.

Tip #4: Put It into Practice

First, decide what problem or need you want to address. Then, once you’ve selected a model to address that problem/need, decide how best to apply it to your practice or research workflow.

  • Will you use the model to help with routine processes? Perhaps there’s something that takes too much time in your work schedule, such as segmenting an organ or interpreting medical images.
  • Will you use the model to predict responses to treatment or recurrence of disease? Maybe you want to find a more precise way of narrowing down targets for drug development.

Today’s models can augment cancer research and care in many ways and at many stages—in predicting cancer, diagnosing disease, framing a treatment plan, and managing follow-up care.

If you lack data science expertise or access to a data scientist, there are vendors who can help you establish AI/ML tools in your practice. You’ll also need to determine where you (the human) will fit into the equation. For example, you may use AI to find a certain biomarker, such as cell clusters indicative of cancer. The next step would be for the human to interpret this result. Or, perhaps you’ll want to use AI/ML as a “second pair of eyes” to check the radiologist’s or pathologist’s initial report.

Whatever the model, you’ll want to test it within your workflow before you deploy it. Before testing, establish how you’ll measure the tool—in terms of accuracy, time, costs, etc. Then, you can run a pilot project to see how the tool measures up before ever putting it into practice.
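
Here’s a minimal sketch in Python of the kind of bookkeeping such a pilot might use, tracking agreement with the clinician’s read and time per case. The cases, findings, and timings are hypothetical placeholders.

```python
# Summarize a pilot: does the AI agree with the clinician, and how long does each take?
from statistics import mean

# Each record: (case_id, ai_finding, clinician_finding, ai_seconds, clinician_seconds)
pilot_cases = [
    ("case-001", "suspicious", "suspicious", 12, 540),
    ("case-002", "benign",     "benign",     10, 420),
    ("case-003", "suspicious", "benign",     15, 600),
    ("case-004", "benign",     "benign",      9, 380),
]

agreement = mean(ai == md for _, ai, md, _, _ in pilot_cases)
ai_time = mean(record[3] for record in pilot_cases)
md_time = mean(record[4] for record in pilot_cases)

print(f"Agreement with clinician: {agreement:.0%}")
print(f"Mean AI time per case: {ai_time:.0f} s")
print(f"Mean clinician time per case: {md_time:.0f} s")
```

Disagreements (like case-003 above) are exactly the cases to review in detail before deciding whether, and how, to deploy the tool.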

Tip #5: Promote Responsible Use

Last, but certainly not least, are issues related to ethics, regulatory concerns, and responsibility.

  • If you use AI for your practice, and the model returns an error, who bears the responsibility?
  • If you choose a model that leads to a bias against a subpopulation of patients in your care, who is at fault?

Your use of AI and your trust in the model’s decisions ultimately rest with you. The better you know the model, how it works, how it was tested, and the data behind it, the better equipped you’ll be to address any ethical or other issues that arise.

Incorporating AI into your research or medical practice takes commitment. Just as you commit to continuing education in other aspects of your work, you’ll want to stay up to date on the latest technological advancements. Knowing the strengths and limitations of AI will help you deliver optimal patient care, while minimizing potential errors or biases. Likewise, regular training and collaboration with AI developers can help you ensure that your AI applications remain ethical and effective. Ultimately, it’s up to you to take responsibility for interpreting AI’s recommendations and making informed decisions to ensure quality care for your patients.

Baris Turkbey, M.D.
Senior Clinician, Head of MRI Section, Head of Artificial Intelligence Resource, Molecular Imaging Branch, NCI Center for Cancer Research
Stephanie A. Harmon, Ph.D.
Stadtman Investigator, Molecular Imaging Branch, NCI Center for Cancer Research