Title: BT101: SourceData: Making Data discoverable
Publication Type: Conference Paper
Year of Publication: 2016
Authors: George N, El-Gebali S, Lemberger T
Conference Name: International Conference on Biomedical Ontology and BioCreative (ICBO BioCreative 2016)
Date Published: 11/30/16
Volume: 1747
Other Numbers: Vol-1747|urn:nbn:de:0074-1747-1

In molecular and cell biology, most of the data presented in published papers are not available in formats that allow direct analysis and systematic mining. The goal of the SourceData project is to make published data easier to find, to connect papers that contain related information, and to promote the reuse and novel analysis of published data. The main concept underlying the project is that the structure of a dataset provides information about the design of the study in question and can be exploited in powerful data-oriented search strategies. SourceData has therefore developed tools to generate machine-readable descriptive metadata from figures in published manuscripts. Experimentally tested hypotheses are represented as directed relationships between standardized biological entities. Once figures have been processed, a comprehensive ‘scientific knowledge graph’ can be generated from these metadata (see demo video), making the body of data efficiently searchable. Importantly, this graph is objectively grounded in published data rather than in potentially subjective interpretations of the results.
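
As a rough illustration of the idea that tested hypotheses can be stored as directed relationships between standardized entities and then searched as a graph, the following minimal sketch may help. It is not the SourceData implementation or data model: the class names, the `ex:` identifiers, the panel labels, and the helper functions are all hypothetical and chosen only for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A standardized biological entity (the identifier scheme is hypothetical)."""
    identifier: str
    name: str


@dataclass(frozen=True)
class TestedHypothesis:
    """A directed relationship between two entities, tied to one figure panel."""
    source: Entity
    target: Entity
    panel: str  # hypothetical label for the figure panel holding the data


def build_graph(hypotheses):
    """Index the directed relationships by their source entity."""
    graph = defaultdict(list)
    for hypothesis in hypotheses:
        graph[hypothesis.source].append(hypothesis)
    return graph


def panels_testing(graph, source, target):
    """Return the sorted panel labels that link source -> target."""
    return sorted(h.panel for h in graph.get(source, []) if h.target == target)


# Hypothetical example data: three panels drawn from two papers.
protein_a = Entity("ex:A", "protein A")
protein_b = Entity("ex:B", "protein B")
protein_c = Entity("ex:C", "protein C")

graph = build_graph([
    TestedHypothesis(protein_a, protein_b, "paper-1/fig-2B"),
    TestedHypothesis(protein_a, protein_b, "paper-2/fig-1A"),
    TestedHypothesis(protein_b, protein_c, "paper-2/fig-3C"),
])

# Panels from different papers that tested the same directed relationship.
print(panels_testing(graph, protein_a, protein_b))
```

Because each relationship is directed, querying for protein B to protein A in this sketch returns nothing, even though the reverse direction matches two panels. Keying the search on the pair of entities is what lets panels from different papers be connected when they tested the same relationship.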