Announcing CovidMine – analyse integrated COVID genomic and geographical distribution data

We’re excited to announce that a project we’ve been working on for the last few weeks is ready for public consumption: CovidMine, an InterMine dedicated to COVID-19 / SARS-CoV-2 data. Data is updated daily, Monday to Saturday, at 6 PM UK time. You can try CovidMine out now, or read more about it below.

So, what’s it all about, and why another COVID resource? 

We thought about this a lot at first, because a huge number of initiatives are already making COVID data available and visualising it. In the end it came down to a few reasonably simple facts: InterMine already has tools to draw data from many sources and integrate it, it offers an interface that will be familiar if you’ve used any of the other InterMines out there, and it has API client libraries for multiple programming languages, including R, Python, Perl and JavaScript.
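Every InterMine exposes the same REST web services, so queries are not tied to any single client library. The code below is a minimal sketch that uses only the Python standard library. It serialises an InterMine PathQuery as XML and builds a request URL for the `/query/results` endpoint. Several details are assumptions about CovidMine's data model, not confirmed field names: the service URL is a placeholder, and the `Gene.symbol` and `Gene.length` paths and the gene symbol `S` are illustrative. In practice, the official client libraries build these requests for you.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

# Hypothetical placeholder: substitute the real CovidMine service address.
SERVICE = "https://covidmine.example.org/covidmine/service"


def build_path_query(view, constraints, model="genomic"):
    """Serialise an InterMine PathQuery as an XML string.

    view: list of output paths, e.g. ["Gene.symbol", "Gene.length"].
    constraints: list of (path, op, value) tuples.
    """
    query = ET.Element("query", model=model, view=" ".join(view))
    for path, op, value in constraints:
        ET.SubElement(query, "constraint", path=path, op=op, value=value)
    return ET.tostring(query, encoding="unicode")


def results_url(service, query_xml, fmt="json"):
    """Build a GET URL for the /query/results endpoint."""
    return f"{service}/query/results?" + urlencode({"query": query_xml, "format": fmt})


# Illustrative query: symbol and length of a gene with the (assumed) symbol "S".
xml = build_path_query(["Gene.symbol", "Gene.length"], [("Gene.symbol", "=", "S")])
url = results_url(SERVICE, xml)
print(url)
```

Fetching `url` with any HTTP client would return the results as JSON. The XML form shown here is the same one the web interface lets you export from its query builder.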

Data sources include confirmed COVID-19 cases, deaths, new confirmed cases and new deaths by country from Our World in Data [1]; state-level data for the United States from the COVID Tracking Project [2]; the SARS-CoV-2 reference genome [3]; and nucleotide sequences from isolates deposited in GenBank [4].

If you’re aware of other data sets that might make this more useful please contact us to suggest them.

Jump straight in

We’ve prepared a few template queries to help you get started with your analysis.
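Template queries can also be run through the web services, outside the browser. The code below is a hedged, standard-library sketch that builds a URL for the `/template/results` endpoint. Each editable constraint is passed as numbered `constraintN`, `opN` and `valueN` parameters. The service URL, the template name `Country_CasesOverTime` and the `Country.name` path are all hypothetical, so use the names shown for the templates in CovidMine itself.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: substitute the real CovidMine service address.
SERVICE = "https://covidmine.example.org/covidmine/service"


def template_url(service, name, constraints, fmt="json"):
    """Build a GET URL for the /template/results endpoint.

    constraints: list of (path, op, value) tuples; they are numbered
    from 1 as constraint1/op1/value1, constraint2/op2/value2, and so on.
    """
    params = {"name": name, "format": fmt}
    for i, (path, op, value) in enumerate(constraints, start=1):
        params[f"constraint{i}"] = path
        params[f"op{i}"] = op
        params[f"value{i}"] = value
    return f"{service}/template/results?{urlencode(params)}"


# Hypothetical template and constraint path, for illustration only.
url = template_url(SERVICE, "Country_CasesOverTime", [("Country.name", "eq", "New Zealand")])
print(url)
```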

What’s still missing and how can I help? 

We’re officially focusing our efforts on developing tools for CovidMine in our new user interface, BlueGenes, rather than the legacy JSP interface. 

A few things we’d like to add to the UI:

  • A data visualisation showing all results on a map.
  • A visualisation showing change over time in countries or regions for confirmed cases, recoveries and deaths.
  • A genome browser (JBrowse 1)

These visualisations would update according to the filters applied to your results table.
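A change-over-time view like this would need to derive daily changes from cumulative counts for whichever country a user has filtered to. The code below is a minimal sketch of that step. It assumes results arrive as (country, date, cumulative cases) rows. The row layout and the sample values are hypothetical, not CovidMine output.

```python
from datetime import date

# Hypothetical rows as they might appear in a filtered results table:
# (country, date, cumulative confirmed cases). Values are made up.
sample_rows = [
    ("Country A", date(2020, 6, 1), 100),
    ("Country A", date(2020, 6, 2), 130),
    ("Country B", date(2020, 6, 1), 50),
    ("Country A", date(2020, 6, 3), 145),
]


def daily_change(rows, country):
    """Keep one country's rows and return (date, new cases) pairs.

    Rows are sorted by date; each entry's new cases are its cumulative
    total minus the previous entry's total, so the first entry has no pair.
    """
    series = sorted((d, total) for c, d, total in rows if c == country)
    return [(d, total - prev) for (_, prev), (d, total) in zip(series, series[1:])]


print(daily_change(sample_rows, "Country A"))
```

Sorting before differencing matters because result tables are not guaranteed to be in date order.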

Data updates: 

  • Find and integrate a data source that provides province-level data for China

Bioschemas Markup

We have added structured data in JSON-LD format, using the Bioschemas.org Dataset, Gene and Protein profiles. It is currently available only in the legacy JSP interface, but it will be integrated into the new interface soon.
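For readers unfamiliar with JSON-LD, the code below sketches the basic shape of schema.org `Dataset` markup and how it is embedded in a page. It is illustrative only. The name, description and URL are placeholders, not the markup CovidMine actually serves. Bioschemas profiles also add further required and recommended properties on top of the plain schema.org type, and those are not shown here.

```python
import json

# Illustrative metadata only: these values are placeholders,
# not CovidMine's actual Bioschemas markup.
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "CovidMine",
    "description": "Integrated COVID-19 case data and SARS-CoV-2 genomic data.",
    "url": "https://covidmine.example.org/covidmine",
    "keywords": ["COVID-19", "SARS-CoV-2", "InterMine"],
}

# JSON-LD is embedded in an HTML page inside a script element of this type.
snippet = '<script type="application/ld+json">\n' + json.dumps(dataset, indent=2) + "\n</script>"
print(snippet)
```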

If you’re aware of other data sets that might make this more useful, or other visualisations that might be exciting, please contact us to suggest them! 

References:

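  1. Our World in Data: https://covid.ourworldindata.org
  2. The COVID Tracking Project: https://covidtracking.com
  3. SARS-CoV-2 reference genome (NCBI GenBank): https://www.ncbi.nlm.nih.gov/genbank/sars-cov-2-seqs/#reference-genome
  4. SARS-CoV-2 nucleotide sequences (NCBI Virus): https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus?SeqType_s=Nucleotide&VirusLineage_ss=SARS-CoV-2,%20taxid:2697049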

Published by

yoyehudi

I make software & yarn things. Heart open*, scifi, cycling, running, veggies. 🇳🇿🇮🇱🇬🇧🇪🇺🏳️‍🌈. @yoyehudi on Twitter, but also @codeisscience & @intermineorg. @softwaresaved Fellow.