Cancer Data Science Pulse
A Tail-Wagging Good Time—Working on the Integrated Canine Data Commons
We asked three key members of NCI’s Integrated Canine Data Commons (ICDC), software engineer Ming Ying and website specialists Hannah Stogsdill and Ambar Rana, to share their experiences working on the project. Together, they offer insight into what it’s like to design, develop, implement, and maintain the features that make up the ICDC ecosystem.
How long have you been working on the ICDC and what’s your primary responsibility?
Ming Ying: I’ve been working on ICDC since the beginning of the project. I’m a backend/data engineer. Every day, I work with data, databases, and application programming interfaces (APIs). When people here think of data, I’m the person they usually come to. ICDC’s backend is in maintenance mode now. My main tasks in ICDC are validating and processing data submitted to us and improving the extract, transform, and load (ETL) pipeline.
Hannah Stogsdill: Like Ming, I’ve been working on this project since the beginning. My main responsibility is to serve as senior user experience (UX) specialist and design lead.
Within each project, I have two scenarios that must be considered simultaneously. I look at the “micro-view,” where I focus on the details and catch every pixel and inconsistency that could impact our users’ experience with the data portal. I also take a “macro-view” approach, where I work to continually understand the user’s mind. When both views are in sync, I’m doing a good job.
Right now, I’m re-fashioning the data model environment to give users an improved experience—one that is streamlined, intuitive, and similar to the other NCI data commons. We’ve also worked hard to establish an ICDC “brand” in terms of efficiency, visual engagement, and ease of access to data. And we’ve made changes to the homepage, such as adding topics of interest to encourage connectivity and engagement with the research community that goes beyond simple data access.
Ambar Rana: I’ve worked on this project for just over 1 year. As a developer, my primary responsibility is to develop interactive frontend applications—that is, the “user side” of the website. My role requires constant interaction with all team members, including the lead product owner, product manager, UX designer, technical lead, quality assurance, and fellow developers.
ICDC is in the active development phase, so most of the tasks I’m working on are related to new features. Recently, I’ve been involved in developing interactive features (like viewing genomic files on JBrowse and a data model explorer). Other tasks are related to fixing and refining (such as code refactoring and addressing bugs).
Is your work on the ICDC different from other data commons?
Ambar Rana: The work I’m doing is related to the Bento project. (Bento is a term that loosely refers to a collection/family of templates/applications used to support certain website functions. It’s a modular, open-source, data-agnostic, cloud-enabled software framework developed by NCI’s Frederick National Laboratory for Cancer Research for building platforms to support data sharing for the Cancer Research Data Commons projects.) All the frontend (Bento) applications in the data commons share core features, so apps developed in one project can be easily reused in other Bento applications. This ability to “plug and play” saves a lot of resources across the Cancer Research Data Commons.
ICDC developers are currently developing an interactive feature called the Data Model Navigator, which is specific to ICDC. The navigator’s core objective is to ease the data loading process. It includes a data loading template, examples of acceptable values, and a PDF with further information. The navigator also has a graph view with an interactive feature and a table view with details on each node’s properties and organization. Now, because it’s published as a Node Package Manager (npm) package, people can easily integrate it into other applications.
Ming Ying: I’d add, too, that our patients are unique. They are humans’ best friends: dogs!
Hannah Stogsdill: Yes, through ICDC, I’ve had the good fortune of working on a topic that’s close to my heart. Many of us on the team are pet owners, including myself, so that combined with the fight against cancer makes ICDC a particularly meaningful subject. I’ve also had the privilege to see ICDC evolve and mature over the last 3 years. We’ve added many new features and functionalities that are helping distinguish ICDC as a trailblazer in data commons.
Are there tasks that you find particularly challenging?
Hannah Stogsdill: In many ways, our work is considered niche, with unique user behavior and best practices. Our users’ needs don’t always align with general UX best practices. Thankfully, members of our team have expertise that gives us insight into what our users need and expect in the system. That expertise is a significant advantage because, in our line of work, finding a pool of test users to help with our user research is challenging. Still, without ongoing user research, we’re not going to catch every issue or avoid possible blind spots. The best design solutions flow from understanding our users well. Increasing user research is an area we’re actively seeking to improve.
Ming Ying: Another key challenge we face is that users often submit data in different formats and don’t use the same controlled vocabularies. So much of the data submitted to us has all kinds of errors, from non-standard terminologies to missing required properties. To address this challenge, we are perfecting our Data Model Navigator with its data template download function. Hopefully, we can mitigate some of these problems for future submissions.
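To make the kinds of checks Ming describes more concrete, here is a minimal Python sketch of validating one submitted record for missing required properties and non-standard terms. It is an illustration only, not ICDC's actual pipeline: the `NodeSchema` class, the `validate_record` function, and the property names and permitted values (`case_id`, `breed`, `sex`) are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NodeSchema:
    """Hypothetical schema for one node type in a data model."""
    required: set[str]
    # Maps a property name to its set of permitted (controlled) values.
    vocab: dict[str, set[str]] = field(default_factory=dict)


def validate_record(record: dict[str, str], schema: NodeSchema) -> list[str]:
    """Return human-readable problems found in one submitted record."""
    problems = []
    # Required properties must be present and non-empty.
    for prop in sorted(schema.required):
        if not record.get(prop, "").strip():
            problems.append(f"missing required property: {prop}")
    # Properties with a controlled vocabulary must use a permitted term.
    for prop in sorted(schema.vocab):
        value = record.get(prop, "").strip()
        if value and value not in schema.vocab[prop]:
            problems.append(f"non-standard term for {prop}: {value!r}")
    return problems


# Illustrative schema and records; none of these names come from ICDC.
CASE_SCHEMA = NodeSchema(
    required={"case_id", "breed"},
    vocab={"sex": {"Male", "Female", "Unknown"}},
)

good = {"case_id": "C-001", "breed": "Beagle", "sex": "Female"}
bad = {"case_id": "C-002", "breed": "", "sex": "F"}

print(validate_record(good, CASE_SCHEMA))
print(validate_record(bad, CASE_SCHEMA))
```

A check like this returns a list of problems instead of stopping at the first one, so a submitter could correct everything in a single pass. A downloadable template serves the same goal from the other direction, by showing the expected properties and values before the data is submitted.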
Ambar Rana: Accurately estimating the complexity and time required for new and challenging tasks can be hard. Engineering requirements and the need for testing make it difficult to set realistic estimates. Sometimes this work goes very smoothly, but sometimes you’re met with problems you never expected. This is a key challenge, but it’s also what makes work interesting. For example, the Data Model Navigator that Ming mentioned was inherited from a Gen3 code base. Estimating the work involved in bringing this application into the minimum viable product phase so it would be useful for ICDC was a challenge.
Your skills could be applied at a number of firms. Why did you choose to work here instead of a company like Google, Amazon, or Microsoft?
Hannah Stogsdill: I believe in NCI’s commitment to the cause of eradicating cancer! I think everyone here wants our work to mean something and have lasting impact. I also value the caliber of developers here (frontend and backend), and their work ethic aligns very closely with my own. The combination of talent and integrity can be difficult to find in my field of UX design. I also value that our work is respected. Team members constantly share recognition and appreciation with one another, which adds to team morale and the desire to do good and be better.
Ambar Rana: Both my academic and professional careers have been focused on health/medical informatics. Like Hannah, I feel like this is an area where I can make a difference. Early on, I wanted to become a research technician in the field of molecular diagnosis. But after working in data analysis and visualization as part of that research, I found I really liked computer programming. I’m still attracted to the biomedical or health domain because it’s what brought me to computer programming in the first place.
Ming Ying: I lost my mom to breast cancer over 15 years ago when she was only 58 years old. In addition, my mother-in-law is a breast cancer survivor. I choose to work at NCI because I feel my work can help us defeat cancer one day, hopefully soon. When my kids ask what I do in my work, I proudly tell them, “Daddy works to defeat cancer!”
What advice would you give to someone hoping to break into the field?
Ambar Rana: In addition to learning different programming languages, I think it’s very helpful to learn the basic concepts of artificial intelligence, machine learning, and database architecture. These concepts help in analyzing, decomposing, and generating solutions for addressing new challenges in the data science field.
Ming Ying: I’d add that data quality is critical for a successful data science project. “Garbage in, garbage out,” as they say. From a data engineer’s point of view, solid data manipulation skills are essential. Python is a very useful programming language for any data engineer/scientist.
Hannah Stogsdill: People in the design world might think data systems are “dry” and maybe not a clear choice for those looking to perform “cutting-edge” work. But scientists and researchers are actually people too! They want to be wowed by intuitive, functional, and beautiful UX, and not treated like robots presented with endless text and flat scrolling tables. This world that I work in truly lives in innovation, and opportunities for designing visually engaging systems are quickly expanding.