ΔTissue Demonstrator (v0.4)


© 2022 Andra Waagmeester and Josh Moore.

This is the first release of the ΔTissue Demonstrator, which provides starting points for exploring public linked data. Linked data is a key component of FAIR data sharing. In this release we use Wikidata as a central hub for linked data related to the ΔTissue disease areas: tuberculosis (TB), triple-negative breast cancer (TNBC), and glioblastoma (GBM).
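
As a rough illustration of how Wikidata can act as a hub, the sketch below looks up the item carrying the Disease Ontology ID `DOID:399` (tuberculosis) and collects its labels in every language. It is not one of the demonstrator's own queries. The helper names, the User-Agent string, and the choice of the public Wikidata Query Service endpoint are assumptions made for this example; `wdt:P699` is Wikidata's Disease Ontology ID property.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_label_query(doid):
    """Build a SPARQL query listing all labels of the item with this DOID."""
    return (
        "SELECT ?disease ?label (LANG(?label) AS ?lang) WHERE {\n"
        f'  ?disease wdt:P699 "{doid}" ;\n'
        "           rdfs:label ?label .\n"
        "}\n"
        "ORDER BY ?lang"
    )


def run_query(query):
    """Send the query to the endpoint and return the parsed JSON results."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    request = urllib.request.Request(
        url,
        headers={
            # Hypothetical client name; replace it with your own contact details.
            "User-Agent": "delta-tissue-example/0.1",
            "Accept": "application/sparql-results+json",
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def labels_by_language(results):
    """Map language code to label from SPARQL JSON results."""
    return {
        binding["lang"]["value"]: binding["label"]["value"]
        for binding in results["results"]["bindings"]
    }


# Usage (requires network access):
#   labels = labels_by_language(run_query(build_label_query("DOID:399")))
```

Keeping query construction, transport, and result parsing in separate functions means the query text can be pasted into the Wikidata Query Service directly, and the parsing can be checked without network access.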

Resources that are reachable via data links include WikiPathways [1], Reactome [2], UniProt [3], OpenCitations [4], Cellosaurus [5], NCBI Gene [6], Ensembl [7], PubChem [8], and cBioPortal [9]. Other resources, such as the Genomic Data Commons (GDC) Data Portal and the sfaira portal, provide access to large quantities of data that use the same (or compatible) identifiers but are not directly linkable. Finally, sites like Scholia provide enhanced visualizations of the existing data links, for example a view of the topics of all ΔTissue authors who could be found in Wikidata.

For this demonstrator, workflows have been developed:

  1. to complete publication records for a list of authors and publications
  2. to make biological pathways published in the scientific literature machine readable
  3. to query GDC entities via GraphQL and download related tabular data
  4. to download datasets listed on the sfaira portal and load them with the sfaira Python library
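
Workflow 3 above could look roughly like the following sketch, which builds a GraphQL request for GDC cases and flattens the response into tab-separated text. It is not the demonstrator's actual workflow. The endpoint URL, the query shape (`explore`, `cases`, `hits`, `edges`, `node`), the `FiltersArgument` type, the filter field, and the `TCGA-BRCA` project ID are all assumptions that should be checked against the GDC API documentation before use.

```python
import csv
import io
import json
import urllib.request

# Assumption: the public GDC GraphQL endpoint.
GDC_GRAPHQL_ENDPOINT = "https://api.gdc.cancer.gov/v0/graphql"

# Assumption: field names follow the GDC schema; verify before relying on them.
CASES_QUERY = """
query Cases($filters: FiltersArgument, $first: Int) {
  explore {
    cases {
      hits(filters: $filters, first: $first) {
        edges { node { case_id submitter_id } }
      }
    }
  }
}
"""


def build_payload(project_id, first=10):
    """Build the JSON body for a GraphQL request restricted to one project."""
    filters = {
        "op": "in",
        "content": {"field": "cases.project.project_id", "value": [project_id]},
    }
    return {"query": CASES_QUERY, "variables": {"filters": filters, "first": first}}


def post_graphql(payload):
    """POST the payload to the endpoint and return the parsed JSON response."""
    request = urllib.request.Request(
        GDC_GRAPHQL_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def cases_to_rows(response):
    """Extract one flat dict per case from the nested GraphQL response."""
    edges = response["data"]["explore"]["cases"]["hits"]["edges"]
    return [edge["node"] for edge in edges]


def rows_to_tsv(rows):
    """Serialize the rows as tab-separated text with a header line."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Usage (requires network access):
#   print(rows_to_tsv(cases_to_rows(post_graphql(build_payload("TCGA-BRCA")))))
```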

In some cases, entries have been created in Wikidata and elsewhere in order to establish initial links. This, however, has not been done systematically, since doing so requires the input of domain experts.

The demonstrator is a proof of concept (POC) for exploring existing linked data on the ΔTissue disease areas in Wikidata. Currently, Wikidata's coverage of these disease areas appears to be incomplete. Relevant data either needs to be added systematically or the existing data needs to be updated, keeping in mind that Wikidata applies a CC0 license while many other resources do not. To render a full picture of the linked data cloud related to the disease areas, either more public data must be added or a linked-data resource that hosts non-CC0 data will be needed.

Please note that the queries in this repository were written by the authors, who are not domain experts in the disease areas. Suggestions for different queries on the disease areas are welcome using this form.

Contents

  1. An Introduction (NEW)
    1. What is Linked Data?
    2. Toy example
    3. Initial project
    4. Outlook
    5. Terminology
  2. Publication Record
    1. Leap group leaders
    2. Group leaders and their past and present affiliations
    3. Group leaders and their publication records
    4. Publications and their topics
    5. Author name strings and their publications
  3. Triple-negative breast cancer
    1. Genes associated with TNBC through gene variant annotations (negative result)
    2. Therapies associated with positive and negative predictors (partial result)
    3. Gene list linked through pathways on triple-negative breast cancer
    4. Concatenated list of TNBC genes
    5. Chemical compounds part of a pathway on triple-negative breast cancer
    6. Genes and the cellular components where encoded gene products are found in Pathways on TNBC
    7. Cell types related to breast cancer via markers
  4. Glioblastoma
    1. Genes associated with Glioblastoma through gene variant annotations
    2. Gene list linked through pathways on glioblastoma
    3. Concatenated list of GBM genes
    4. Chemical compounds part of a pathway on glioblastoma
    5. All cell lines associated with glioblastoma
  5. Tuberculosis
    1. Translations of the Disease Ontology term DOID:399 (Tuberculosis)
    2. Top 100 authors of publications covering tuberculosis (according to Wikidata)
    3. Genes involved in the immune response to tuberculosis
    4. Concatenated list of tuberculosis genes
  6. Data Resources
    1. sfaira
    2. TCGA
      1. All breast cancer records (NEW)
      2. Mutations and slides for TNBC cases

Future work

Future editions of this demonstrator will include:

  • Direct linked-data validation and enrichment using Shape Expressions
  • Federated querying of linked data where the data is hosted on multiple sources.
  • Direct download of linked data from the sources.
  • Example queries on a revived linked-tcga SPARQL endpoint.
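
To give a feel for the federated querying mentioned in the list above, the sketch below builds a SPARQL 1.1 query that uses the standard `SERVICE` keyword to delegate a pattern to a remote endpoint. It is only an illustration, not a planned query of the demonstrator. The function name is hypothetical, and the use of the Wikidata Query Service as the remote endpoint is an assumption; the resulting query would be submitted to some other SPARQL endpoint that permits federation.

```python
# Assumption: Wikidata's public SPARQL endpoint serves as the remote source.
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

# Prefixes are declared explicitly because the query is meant to run on an
# endpoint other than Wikidata, which may not predefine them.
PREFIXES = (
    "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
)


def federated_label_query(doid, language="en", remote=WIKIDATA_ENDPOINT):
    """Build a query that fetches a disease label from a remote endpoint."""
    return (
        PREFIXES
        + "SELECT ?disease ?label WHERE {\n"
        + f"  SERVICE <{remote}> {{\n"
        + f'    ?disease wdt:P699 "{doid}" ;\n'
        + "             rdfs:label ?label .\n"
        + f'    FILTER(LANG(?label) = "{language}")\n'
        + "  }\n"
        + "}\n"
    )


# Usage: print(federated_label_query("DOID:399")) and submit the text to a
# SPARQL endpoint that allows SERVICE calls to the remote endpoint.
```

In a real federated query, the patterns outside the `SERVICE` block would match data hosted on the local endpoint and join with the remote results through shared variables.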

Impressum

This demonstration is written in Markdown with additional instructions consisting of SPARQL queries that are dynamically loaded from https://www.wikidata.org/. While the website itself is licensed under CC-BY-SA, all SPARQL queries in this resource can be used under the CCZero license/waiver. Feedback can be sent via this GitHub repository.

References

[1] Pico AR, Kelder T, van Iersel MP, Hanspers K, Conklin BR, Evelo C (2008) WikiPathways: Pathway Editing for the People. PLoS Biol 6(7). https://doi.org/10.1371/journal.pbio.0060184

[2] Marc Gillespie, Bijay Jassal, Ralf Stephan, Marija Milacic, Karen Rothfels, Andrea Senff-Ribeiro, Johannes Griss, Cristoffer Sevilla, Lisa Matthews, Chuqiao Gong, Chuan Deng, Thawfeek Varusai, Eliot Ragueneau, Yusra Haider, Bruce May, Veronica Shamovsky, Joel Weiser, Timothy Brunson, Nasim Sanati, Liam Beckman, Xiang Shao, Antonio Fabregat, Konstantinos Sidiropoulos, Julieth Murillo, Guilherme Viteri, Justin Cook, Solomon Shorser, Gary Bader, Emek Demir, Chris Sander, Robin Haw, Guanming Wu, Lincoln Stein, Henning Hermjakob, Peter D’Eustachio, The reactome pathway knowledgebase 2022, Nucleic Acids Research, Volume 50, Issue D1, 7 January 2022, Pages D687–D692, https://doi.org/10.1093/nar/gkab1028

[3] UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 49:D1 (2021). https://doi.org/10.1093/nar/gkaa1100

[4] Silvio Peroni, David Shotton (2020). OpenCitations, an infrastructure organization for open scholarship. Quantitative Science Studies, 1(1): 428-444. https://doi.org/10.1162/qss_a_00023

[5] The Cellosaurus, a cell line knowledge resource. J. Biomol. Tech. 29:25-38 (2018). https://doi.org/10.7171/jbt.18-2902-002

[6] Agarwala R, Barrett T, Beck J, Benson DA, Bollin C, Bolton E, Bourexis D, Brister JR, Bryant SH, Canese K, Cavanaugh M, Charowhas C, Clark K, Dondoshansky I, Feolo M, Fitzpatrick L, Funk K, Geer LY, Gorelenkov V, Graeff A, Hlavina W, Holmes B, Johnson M, Kattman B, Khotomlianski V, Kimchi A, Kimelman M, Kimura M, Kitts P, Klimke W, Kotliarov A, Krasnov S, Kuznetsov A, Landrum MJ, Landsman D, Lathrop S, Lee JM, Leubsdorf C, Lu Z, Madden TL, Marchler-Bauer A, Malheiro A, Meric P, Karsch-Mizrachi I, Mnev A, Murphy T, Orris R, Ostell J, O’Sullivan C, Palanigobu V, Panchenko AR, Phan L, Pierov B, Pruitt KD, Rodarmer K, Sayers EW, Schneider V, Schoch CL, Schuler GD, Sherry ST, Siyan K, Soboleva A, Soussov V, Starchenko G, Tatusova TA, Thibaud-Nissen F, Todorov K, Trawick BW, Vakatov D, Ward M, Yaschenko E, Zasypkin A, Zbicz K; NCBI Resource Coordinators (2018) Database resources of the National Center for Biotechnology Information. Nucleic Acids Research 46:D8–D13. https://doi.org/10.1093/nar/gkx1095

[7] Zerbino DR, Achuthan P, Akanni W, Amode MR, Barrell D, Bhai J, Billis K, Cummins C, Gall A, Girón CG, Gil L, Gordon L, Haggerty L, Haskell E, Hourlier T, Izuogu OG, Janacek SH, Juettemann T, To JK, Laird MR, Lavidas I, Liu Z, Loveland JE, Maurel T, McLaren W, Moore B, Mudge J, Murphy DN, Newman V, Nuhn M, Ogeh D, Ong CK, Parker A, Patricio M, Riat HS, Schuilenburg H, Sheppard D, Sparrow H, Taylor K, Thormann A, Vullo A, Walts B, Zadissa A, Frankish A, Hunt SE, Kostadima M, Langridge N, Martin FJ, Muffato M, Perry E, Ruffier M, Staines DM, Trevanion SJ, Aken BL, Cunningham F, Yates A, Flicek P (2018) Ensembl 2018. Nucleic Acids Research 46:D754–D761. https://doi.org/10.1093/nar/gkx1098

[8] Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Bryant SH (2009) PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Research 37:W623–W633. https://doi.org/10.1093/nar/gkp456

[9] Cerami et al. The cBio Cancer Genomics Portal: An Open Platform for Exploring Multidimensional Cancer Genomics Data. Cancer Discovery. May 2012 2; 401.
