Cancer Data Science Pulse
The Network of "BioThings"
Biomedical knowledge is typically organized around biological entity types such as genes, genetic variants, drugs, and diseases. Collectively, we refer to them as "BioThings." The volume of biomedical data has grown explosively, thanks to the efforts of many researchers and consortia. That growth spans many data types, formats, and standards, which makes the disparate sources difficult to unify. Researchers may miss valuable data resources or spend significant effort wrangling multiple data sources locally. Moreover, such efforts are often duplicated across research groups, consuming time that could have gone into their own downstream analyses.
To facilitate biomedical discovery, we developed a framework that regularly pools multiple data sources and organizes the annotation data by individual BioThings object, such as a gene, a variant, or a drug. These pre-aggregated BioThings objects can then be queried via high-performance web Application Programming Interfaces (APIs). For example, researchers can easily get gene-specific annotations (~50 sources) using the MyGene.info API [Fig 1], human variant annotations (~20 sources) using the MyVariant.info API, or chemical and drug annotations (~10 sources) using the MyChem.info API. Collectively, these "BioThings APIs" now serve millions of requests from thousands of unique IPs every month. To keep up with the different biological data types and data resources, we've abstracted and standardized our APIs and built a Software Development Kit (SDK) so that researchers can create similarly powerful APIs, for any BioThings type, to suit their needs. In this manner, our BioThings APIs and SDK can help researchers use existing data resources more efficiently, as well as share their own data with the community in an accessible and reusable way.
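To make the querying step concrete, here is a minimal Python sketch of how a request to one of these APIs might be assembled using only the standard library. The version prefixes in the base URLs (`/v3`, `/v1`), the `/query` endpoint, and the `q`, `fields`, and `size` parameters reflect our understanding of the public services and should be checked against the current API documentation; the helper names `build_query_url` and `fetch_json` are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URLs for the three APIs named in the text.
# Assumption: the version prefixes below match the currently deployed services.
BASE_URLS = {
    "gene": "https://mygene.info/v3",
    "variant": "https://myvariant.info/v1",
    "chem": "https://mychem.info/v1",
}


def build_query_url(kind, query, fields=None, size=None):
    """Build a /query URL for one of the BioThings APIs (illustrative helper)."""
    params = {"q": query}
    if fields:
        # Restrict the response to the listed annotation fields.
        params["fields"] = ",".join(fields)
    if size is not None:
        # Cap the number of hits returned.
        params["size"] = size
    return f"{BASE_URLS[kind]}/query?{urlencode(params)}"


def fetch_json(url, timeout=10):
    """Send a GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Example URL: look up the human gene CDK2 by symbol, keeping only three fields.
example_url = build_query_url(
    "gene", "symbol:CDK2", fields=["symbol", "name", "entrezgene"], size=1
)
```

Passing `example_url` to `fetch_json` would perform the actual network request. To our knowledge, the matching records come back under a `hits` key in the JSON response, but that layout is worth confirming against a live response before relying on it.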
While we are continuously expanding the scope of BioThings APIs, other groups have made excellent APIs available as well. How to find the relevant APIs (findability) and, more importantly, how to use them together efficiently (interoperability) are issues the community has generally not addressed. Through the SmartAPI and BioThings Explorer applications, we provide tools for API providers to describe their APIs in a standard way and to add biomedical-specific semantic annotations, such as which biomedical identifiers an API parameter accepts and which biomedical entity types an API response contains. This enables researchers to identify the APIs they need and build their own knowledge extraction workflows.
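As a rough illustration of why such semantic annotations help with findability, the sketch below filters a small in-memory registry by the identifier type a researcher has and the entity type they want. Everything in it is hypothetical: the `ApiDescription` record, the API names, and the identifier and entity labels are invented for this example and do not reproduce the actual SmartAPI metadata schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiDescription:
    """Hypothetical, simplified stand-in for an annotated API description."""
    name: str
    accepts: frozenset  # identifier types an input parameter accepts
    returns: frozenset  # entity types the response contains


# Invented registry entries, for illustration only.
REGISTRY = [
    ApiDescription(
        "gene-annotator",
        frozenset({"entrez_gene", "gene_symbol"}),
        frozenset({"gene", "pathway"}),
    ),
    ApiDescription(
        "variant-annotator",
        frozenset({"hgvs"}),
        frozenset({"variant", "gene"}),
    ),
    ApiDescription(
        "drug-annotator",
        frozenset({"inchikey", "drug_name"}),
        frozenset({"drug", "gene"}),
    ),
]


def find_apis(registry, input_id, output_type):
    """Return names of APIs that accept `input_id` and return `output_type`."""
    return [
        api.name
        for api in registry
        if input_id in api.accepts and output_type in api.returns
    ]


# Which APIs can take an HGVS variant identifier and return gene information?
matches = find_apis(REGISTRY, "hgvs", "gene")
```

Because each description states its inputs and outputs explicitly, the same lookup could in principle be repeated step by step to chain APIs into a workflow, with the output entity type of one call serving as the input identifier type of the next.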
As web APIs have been rapidly adopted as a new way of disseminating biomedical knowledge, we envision that an API development ecosystem [Fig 2] will play a key role in the transition toward making biomedical data FAIR (Findable, Accessible, Interoperable, and Reusable) and in building an interconnected, distributed knowledge base in the form of the network of BioThings.
More about the BioThings project can be found at http://biothings.io.