Cancer Data Science Pulse
Three Pillars of Cloud Computing—People, Processes, and Technology
The title of your seminar is “Growing Pains as Research and Researchers Adopt and Adapt to Cloud and Commons.” Can you give us a brief description of what you plan to discuss in your webinar and how this relates to the California Teachers Study?
For more than two decades, NCI has funded the California Teachers Study (CTS), one of many large, prospective cancer epidemiology cohorts in the NCI portfolio. These cohorts have supported a wide range of cancer research projects over the years.
In 2016, we transitioned our CTS data strategy to the cloud, and we’ve learned a lot since then. The transition changed our way of thinking, and even our day-to-day research activities, in positive ways. At first, we underestimated how much of an impact the move to cloud computing would have.
I’ll share what we learned from this process and describe how the experience has changed the way we think about collaborative research. I hope our experience will help others avoid reinventing the wheel and will spark a community-wide discussion about ways to make our collective research more productive and efficient.
You mention that you encountered some issues that made transitioning to the cloud difficult. Can you elaborate on that?
We were one of the first cohorts to go all-in on a cloud environment. When we started, even NCI’s Cancer Research Data Commons was still in its early stages.
Our first challenge was to find ways for our diverse team of researchers, cloud providers, data warehouse architects, and data analysts to learn each other’s languages. We needed to understand the problems each of us wanted to solve. Our research goals were new to the computer specialists, and their tools were new to us researchers, but we learned together and made it work.
We also learned how essential it was to be more outcome-focused. Rather than thinking about the cloud as simply a way to provide access, data, or tools, we needed to think about our users. Where were they in the process, where did they need to go, and how would they get there? As a group, researchers are optimists who believe we can figure out a solution, even if we have to kick a few problems down the road before finding the answers. It became clear right away that configuring a new cloud environment left no room for that. Instead, we had to frontload all of the questions and unknowns and address those challenges at the start.
My talk will focus on how this research lifecycle is common across all types of studies, whether epidemiology, clinical research, or laboratory. I’ll describe the importance of taking a full-circle view of the process and looking at outcomes in new ways.
Who should attend the webinar? What do you hope attendees will take away from this talk?
Project managers, principal investigators, co-investigators, data managers, data analysts, and really anyone who is part of a team that wants to use the cloud or cloud-based resources for their studies. I hope they will come away optimistic about transitioning to the cloud, convinced that it’s doable and beneficial and that there are multiple approaches that can meet their needs.
One huge benefit of cloud computing is that it helps us all collaborate in one space. That means those spaces need to be welcoming to communities beyond our immediate or local teams. The shift to designing for our peers and the larger community does not happen overnight, and it won’t look the same for every team. But the more we, as researchers, think about our community’s needs as well as our own team’s, the better our solutions and collaborations will be.
Did you encounter any surprises along the way?
I was surprised by how strong the tendency is to focus on technology first. I’m guilty of this sometimes, too. I’ve seen it over and over again when the cloud comes up. The initial focus is all on data and technology.
The bigger surprise was how much the people and process components ended up being the rate-limiting steps of our transition. Ironically, technology is often the easiest part.
Thinking about all three together (people, process, and technology) was part of our data security strategy from the beginning, because a secure infrastructure requires a people component, a process component, and a technology component. We eventually realized that this people-process-technology framework needed to be the foundation for all of our problem-solving, and we often asked, “Is this a people issue, a process issue, or a technology issue?” This approach really helped make our transition possible.
Is there something about your background that sparked an interest in this field? How did you come to this project and to where you are today?
My path was circuitous. I was a cancer epidemiologist, happily working on cohort studies, but I was beginning to get frustrated by some common “pain points” when it came to getting timely information and data out of large studies.
I moved to City of Hope with plans to collect biospecimens within the CTS. As we started planning, a co-investigator on our team suggested we use Salesforce, a cloud-based customer relationship management (CRM) platform. CRM platforms are best known for helping companies use data to improve their interactions with their customers. I had never heard of it before, but we all agreed it had a lot of potential for our type of research. I ended up as the principal investigator on that project, more by default than anything else. The project was a success and put us on the path to innovation.
Are there any particular skills that have helped you in your work on this cloud project?
I tend to learn through analogies and rely on them a lot. That has probably made me especially open to new perspectives and ideas. Collecting thousands of biospecimens in a cancer epidemiology cohort seems like a relatively niche project, but we realized early on that it had some real similarities with sales, marketing, and logistics.
Granted, we weren’t selling anything, but inviting our CTS participants to donate blood required us to have a strong “sales pitch.” Recruiting participants requires marketing, too.
Once a participant agreed to provide blood, logistics kicked in. We needed a rigorous and scalable protocol for collecting, shipping, processing, and storing those samples. Today, because of the pandemic, many of us have a better appreciation of the importance of logistics and of how logistics problems can disrupt the supply chain.
That biobanking project was my crash course in logistics. We managed all of those biobanking tasks in the cloud, where our people-process-technology approach was very effective at bringing transparency to this multisite effort. We could track tasks from our team and our laboratory colleagues, and even FedEx shipments and weather delays. Our cloud-based platform allowed us to develop data-driven solutions to get the results we needed.
We’re also constantly looking for new ideas, and sometimes they’re right in front of us. SEER*Explorer is a great interactive tool that lets researchers choose and explore tons of combinations of cancer statistics. Its visualizations and data are easy to use yet elegant. One challenge the CTS and other cohorts face is that every project wants custom data. We kept coming back to SEER*Explorer and asking, “What if we had a SEER*Explorer-like tool that enabled researchers to choose their own data?”
Earlier this year we rolled out a self-service web application (for more information on this app, see the archived webinar, “Push Button Data Sharing: Web-Based Self-Service and Automated Data Delivery in the California Teachers Study”). This tool lets users select their own cohort and delivers custom data in minutes. Although this is a different tool and a different task, the inspiration was 100% from SEER*Explorer.
The second thing that helped us was getting the right team in place. We all had to learn a lot. There were lots of growing pains, but we stuck with it and trusted each other.
What is your hope for this project in the next 5–10 years?
I’d love to see more widespread adoption of the cloud for cancer epidemiology and for all types of population research. There are legitimate hurdles to cloud computing, but none of them is insurmountable. How do I get started? Where do I find the help I need? How do I get up to speed on all of the technology? NIH is already helping us all navigate these questions. CBIIT’s Data Science Seminar Series is one example, as it offers a forum for discussing lessons learned. NIH’s Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative also offers resources to help.
I’d also like to see more members of the community help by sharing their experiences and lessons learned. More collaboration could really speed things up.
Where can people go for more information?
For more information, visit our website. Whether it’s feedback or a request to use our data, we’d love to hear from the community.