Cancer Data Science Pulse
Different Perspectives Lead to Discovery of a Surprisingly Effective Algorithm
At the upcoming Data Science Seminar, you’ll be discussing an algorithm (KSTAR) to identify kinase activity using patient phosphoproteomic data. Can you give a little background on how this idea came about?
The foundation really started back in graduate school, when I was just getting into computational biology, and where I first learned to blend different perspectives—basic science, computational biology, systems biology, and engineering.
It's truly a culmination of all these different viewpoints. It began when a colleague, who specializes in mass spectrometry, mentioned that he had phosphoproteomic data that might be useful in understanding cancer. Unfortunately, existing analysis approaches were not helping, and we needed a way to find the links between kinases and disease with greater accuracy.
From there, I recruited an undergraduate student, Sweta Ravi, and she developed an initial prototype that looked, at the time, really promising. But we weren’t sure it would work. With support from NCI, we have since moved this initial idea to a fully functional algorithm, which appears to work well at predicting kinase activity.
We then applied our algorithm to data from breast cancer patients, as we had a good foundation of data to use as a control. We looked at human epidermal growth factor receptor 2 (HER2), a protein linked to many common types of cancer.
At high levels, HER2 is associated with tumor growth and metastasis. But high levels don’t necessarily reflect the receptor’s activity. After applying the algorithm, we frequently predicted HER2 activity in patients with HER2-negative tumors. And, in some instances, we saw no evidence of activity, even in HER2-positive patient tumors.
Predicting this activity could help us better target treatment. Using various independent studies, we found that HER2-negative patients with predicted HER2 activity were more likely to respond to treatment, whereas HER2-positive patients who lacked net HER2 activity did not respond to therapy.
Looking back and knowing there were so many places for potential errors, and pieces that were missing in what we know and what the data showed, I’m really surprised at how accurate the algorithm turned out to be.
But algorithms at their core are just predictions. They aren’t the ground truth. Our next steps will be to further validate the model and learn how best to apply it to precision medicine. This idea that you can combine systems-level data with the right algorithms and patient data to make predictions to help people—that’s what drives my research.
For readers not familiar with the importance of kinases, can you briefly explain their role in cancer?
Kinases are enzymes that attach phosphate groups to proteins. They act in concert with phosphatases, which remove phosphate groups. These reactions, phosphorylation and dephosphorylation, are organized into complex, interwoven networks. These networks respond to internal and environmental factors to alter cellular function, ultimately turning protein activity “on” or “off.” Most protein kinases promote cell growth and proliferation. When kinase activity goes awry, it can lead to carcinogenesis and metastasis in many types of cancer.
Medications that inhibit protein kinases have been intensively studied and used effectively for cancer treatment for decades. Yet, despite our progress in understanding kinases in oncology, more needs to be known to better predict how and which inhibitor will work best in any given patient at any given time. For example, it’s common for an inhibitor to be more effective in some patients than in others. And some patients develop a resistance to a particular inhibitor after taking it for some time.
If we can use personalized phosphoproteomic data to better understand a patient’s likely response, we can further target treatment. We also may be able to avoid the unwanted side effects that accompany kinase inhibitors and head off tolerance before it develops.
This is not a simple problem. Kinases are highly complex and are regulated by a whole host of factors, internal and external. Using kinase activity alone as a proxy for predicting how a patient will respond to a particular kinase inhibitor is problematic. In fact, we initially were really surprised that our algorithm was so effective in identifying kinase activity.
You noted that you were surprised at how effective KSTAR has been in detecting kinase activity. What surprised you the most?
It was surprising because we based the algorithm only on the predictions that were known to exist. But we know so very little about which phosphorylation sites are phosphorylated by which kinases. We had to infer the rest, extrapolating what we think based on kinase activity. We knew too that there were problems with the data. Many of the pipelines used to capture the data were based on shotgun proteomics.
So we knew heading into this that we’d have a lot of missing data. For example, in the proteomics from the breast cancer patients, there was a huge scarcity of data. In 107 breast cancer patients, only about 5% of thousands of phosphorylation sites were common across all the data samples. Those odds would seem insurmountable, and yet, the algorithm was able to predict a likely kinase profile that held true across samples from different labs and using different techniques.
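To make the missing-data problem concrete, here is a minimal sketch of how one might measure the share of phosphorylation sites detected in every sample. It is not part of KSTAR; the function name `shared_site_fraction`, the sample names, and the site labels are all made up for illustration.

```python
from typing import Dict, Set


def shared_site_fraction(samples: Dict[str, Set[str]]) -> float:
    """Return the fraction of all observed sites that appear in every sample."""
    if not samples:
        return 0.0
    site_sets = list(samples.values())
    all_sites = set().union(*site_sets)      # every site seen in any sample
    if not all_sites:
        return 0.0
    common = set.intersection(*site_sets)    # sites seen in all samples
    return len(common) / len(all_sites)


# Toy data (hypothetical site labels, not real measurements)
samples = {
    "patient_A": {"PROT1_S10", "PROT2_Y20", "PROT3_T30", "PROT4_S40"},
    "patient_B": {"PROT1_S10", "PROT3_T30", "PROT5_Y50"},
    "patient_C": {"PROT1_S10", "PROT4_S40", "PROT6_S60"},
}

fraction = shared_site_fraction(samples)
print(f"{fraction:.0%} of observed sites were detected in all samples")
```

In this toy case only one of six sites is shared by all three samples. With more samples and thousands of sites, the shared fraction tends to shrink further, which is the situation described above.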
You mentioned breast cancer; are you testing this in other areas as well?
Yes, we’re now using a mouse model to make sure we can see through the tea leaves and confirm that what we believe we are seeing is true. In one analysis, led by our collaborator, Dr. Cynthia Ma, the researchers looked at four patient-derived xenograft (PDX) breast cancer mouse models. Two were HER2 positive, and two were HER2 negative. They were all treated with the same HER2-directed medication, and, surprisingly, one HER2-negative tumor responded. KSTAR confirmed this, however, as it predicted that the tumor was, in fact, HER2 active. We’re collaborating with Dr. Ma and colleagues to test additional PDX models, including a triple-negative tumor, a type of cancer that shows negative results on all three tests for the receptors commonly found in breast cancer, and which is particularly difficult to treat. Our algorithm suggests that the model is HER2 active.
We’re also looking at ovarian and lung cancer data. We’re hoping that KSTAR predictions will help design more effective clinical studies and treatments. Specifically, some kinase inhibitors have failed in treating ovarian cancer, and evidence of drug resistance often appears in lung cancer. We wonder if past clinical trials have failed because of our inability to identify the subpopulations of patients who might respond best (that is, patients who have high levels of activity in the kinase the drug targets) or, alternatively, those whose activity profiles predict they’re less likely to respond.
Do you think this research will help boost translational medicine?
That’s our hope. We had very robust findings when we tested this. Most importantly, from a clinical perspective, we saw results quickly thanks to a study by Matthew Ellis and colleagues who developed a microbiopsy approach and sampled patients both pre- and post-treatment. Within just 72 hours of treatment with an inhibitor, we see kinase activity literally disappear in patients who went on to become pathologically complete responders, lacking all signs of cancer in tissue samples. Given that many inhibitors have significant side effects, it’s incredibly useful to see if you’re getting a response to therapy quickly.
We’re generally a basic science lab, so this is our closest foray into clinical translation. It’s been really rewarding. We’re hopeful that we can help identify more effective treatments, especially for those patients who have run out of therapy options.
Who should attend this seminar?
I’d encourage anyone interested in data science to attend. But really, I hope to spark interest in a much broader audience. We need people with very diverse backgrounds to collaborate on these complex scientific questions if we’re going to come up with workable solutions.
So I’d encourage basic scientists to attend, clinical oncologists, data scientists, computational biologists—really anyone who has an interest in finding new therapeutic targets for cancer or other diseases or in developing algorithms for systems-level data.
Where do you see this research headed in the next few years?
Just as we engaged people from many different backgrounds to get to where we are today, I think the next stage will be to bring in people who will offer their own unique skill sets and to find new ways to adopt and test the technology. Just as my mass spectrometry colleague brought us a problem that we solved by developing an algorithm, I hope to collaborate with others, especially clinical oncologists, to move this approach forward.
Where can we find out more about this research?
In addition to attending the Data Science Seminar Series on May 19, details on the KSTAR algorithm will be posted soon as a preprint manuscript at bioRxiv.
We also recently published an article, “KinPred: A unified and sustainable approach for harnessing proteome-level human kinase-substrate predictions,” summarizing currently available kinase-specific predictive algorithms. It’s intended as a resource for researchers who wish to explore specific kinase-substrate hypotheses or for use in developing algorithms that rely on these unique networks.
Register for the May 19 Data Science Seminar