Bioschemas

Justin and Gos attended the BioSchemas kick-off meeting at Hinxton this week. As well as giving a short talk about InterMine (the slides are on FigShare), Justin managed to jot down a few thoughts about the event:


The aim of Bioschemas is to come up with simple metadata that can be embedded in webpages (as JSON-LD, RDFa, etc.) to make data easier to find. For instance, suppose you wanted datasets concerning the effect of a certain drug. At the moment, if you search Google for that drug name, you will find relevant datasets, but possibly also datasets that merely happen to use that drug as part of their protocol, and more besides.

But if you can embed a schema which names the subject of the dataset in a structured way (e.g. puts an ontology term URL in a dataset.subject field) then you can pull up more relevant data.

However, there is a strong concern to keep such markup as light as possible so that people aren’t put off annotating their datasets. Hence a notional rule that there should be only six properties per class (e.g. just name, description, url, keywords, variableMeasured and creator.name for Dataset).
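To make this concrete, here is a minimal sketch of what such lightweight markup might look like, generated with Python. It is an illustration only, not an agreed Bioschemas profile: the six properties are the ones listed above, while the function names, the example values, the choice of `Person` for the creator, and the use of schema.org's generic `about` property to carry the ontology term URL (standing in for the notional dataset.subject field) are all my own assumptions.

```python
import json


def build_dataset_markup(name, description, url, keywords,
                         variable_measured, creator_name, subject_url=None):
    """Build a schema.org Dataset description using only six properties.

    Hypothetical helper: the property set follows the notional
    "six properties per class" rule, not a published Bioschemas profile.
    """
    markup = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": keywords,
        "variableMeasured": variable_measured,
        # Assumption: the creator is a person; it could equally be an Organization.
        "creator": {"@type": "Person", "name": creator_name},
    }
    if subject_url is not None:
        # Assumption: schema.org's "about" carries the ontology term URL.
        markup["about"] = {"@id": subject_url}
    return markup


def to_script_tag(markup):
    """Wrap the markup in the script element used to embed JSON-LD in a page."""
    # Escape "</" so the JSON cannot close the script element early.
    body = json.dumps(markup, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # All values below are made up for illustration.
    example = build_dataset_markup(
        name="Example drug response dataset",
        description="Expression changes after treatment with an example drug.",
        url="https://example.org/datasets/1",
        keywords=["drug response", "gene expression"],
        variable_measured="gene expression level",
        creator_name="A. Researcher",
        subject_url="https://example.org/ontology/EXAMPLE_0001",
    )
    print(to_script_tag(example))
```

A search engine that understands the markup could then tell a dataset that is about the drug from one that merely mentions it in a protocol.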

As such, Bioschemas won’t be replacing any of the existing ‘heavyweight’ schemas (DATS, DataCite’s model, the OMICSdi model, our own InterMine data model), as it’s not meant to be used as an internal data model.

  • Bioschemas is about getting properties into schema.org (as supported by Google, Bing, Yahoo, etc.). However, schema.org is a big bag of potential metadata with few constraints (no cardinality, for instance!). It’s up to Bioschemas to come up with constraints for our purposes, especially on generic metadata types such as Dataset and DataCatalog.
  • Bioschemas is also largely about general areas at the moment (datasets, data sources, etc.), though there is some specific work on protein annotations and samples. There is nothing yet on genes and proteins themselves, for instance (though presumably protein annotations would need some metadata for proteins...).
  • Things are at an early stage – the next year will (hopefully) see some Bioschemas definitions.  There is still debate about exactly what it can be used for beyond search.  For instance, we (InterMine) may be able to use them to improve our data integration process if all sources start embedding common metadata in their download files as well as on webpages.
  • Bioschemas is more than schema work. The initiative also covers topics such as identifiers, citation, metrics and tools, which will be relevant to us in the future. We can get value from these other areas as well: for instance, there was discussion of a standard way of announcing that a data source (UniProt, etc.) has been updated, which would be very useful to us in building mines that update automatically. There was also talk of metadata that specifies when a data source changes its format, another thing that would be tremendously useful for us.
  • The presentations were interesting and the group friendly. InterMine seemed to be well received. The group doesn’t have many actual data consumers and integrators, so I think we can make a valuable contribution from that perspective.
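As a rough illustration of the cardinality point in the first bullet, the sketch below shows the kind of check a Bioschemas-style constraint could enable on top of plain schema.org markup. The constraint table is entirely hypothetical (nothing here was decided at the meeting), as are the function names; it only shows the idea of giving each property a minimum and maximum count.

```python
# Hypothetical constraints: property -> (minimum count, maximum count).
# None as the maximum means "no upper limit".
DATASET_CONSTRAINTS = {
    "name": (1, 1),
    "description": (1, 1),
    "url": (1, 1),
    "keywords": (0, None),
    "variableMeasured": (0, None),
    "creator": (1, None),
}


def count_values(value):
    """Count how many values a property has in a JSON-LD style dict."""
    if value is None:
        return 0
    if isinstance(value, list):
        return len(value)
    return 1


def check_cardinality(markup, constraints):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for prop, (minimum, maximum) in constraints.items():
        found = count_values(markup.get(prop))
        if found < minimum:
            problems.append(f"{prop}: expected at least {minimum}, found {found}")
        elif maximum is not None and found > maximum:
            problems.append(f"{prop}: expected at most {maximum}, found {found}")
    return problems


if __name__ == "__main__":
    # Made-up markup with a missing name and two URLs.
    broken = {
        "@type": "Dataset",
        "description": "An example dataset.",
        "url": ["https://example.org/a", "https://example.org/b"],
        "creator": {"@type": "Person", "name": "A. Researcher"},
    }
    for problem in check_cardinality(broken, DATASET_CONSTRAINTS):
        print(problem)
```

For a data integrator, a check like this is what would make embedded metadata dependable enough to drive an automated build.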

BioJS Workshop Dec 2015

After the excitement around BiVi, I’d be remiss if I didn’t mention all the work put into both the BioJS presentation at BiVi and the BioJS workshop that followed it in the afternoon.

We’re already avid BioJS fans at InterMine, because BioJS provides easy plug-in visualisations (for example, Cytoscape). I’d expected that a Venn diagram of the BioJS crowd and the BiVi crowd would overlap almost perfectly, so I was surprised to find that they were actually completely separate groups.

The difference was explained to me by Manny Corpas as follows: while BioJS, given the (mostly) browser-based nature of JavaScript, is indeed largely about visualisations, not all of it is dedicated to visual things. BioJS modules can handle data parsing, for example.

BiVi, on the other hand, is about visualisation, no matter the language. Indeed, quite a few of the demos we saw at BiVi were desktop or server based and had nothing to do with JavaScript at all.

The workshop covered the basics of JavaScript development and showed how to include and interact with BioJS components on a webpage, but the most interesting session for me (as someone who makes a living out of writing JavaScript, among other things) was definitely the one at the end, where we were talked through creating our very own BioJS component.

Dennis Schwartz bravely live-coded a pie chart using d3.js on a projector, which is not an easy task! We started by setting up the scaffolding of the project using the BioJS Slush generator. This created examples, set up a build process, and ensured the BioJS prerequisites were present, such as a licence and tags (the tags allow the BioJS registry sniffer to pick up BioJS packages from the npm registry). Despite having only an hour or so to get it all done, by the end we had each coded a functioning basic component.

The workshop finished off nicely with group pizza to feed hungry BioJS hackers. Unfortunately I was unable to attend the hackathon the next day, but if its quality was anything like the workshop’s, I’m sure it must have been a fabulous success.

 

BiVi 2015

It’s the first year someone from InterMine has attended BiVi (Biology Visualisation), which is in its second year of a (currently) three-year plan to create a community around biology visualisation.

The atmosphere was pleasantly lively, with probably 20 or 30 attendees from Cambridge, Edinburgh, London, Oxford, and even Institut Curie in France. I think it’s safe to say that most of the attendees were people with a computer science background who worked in biology-related fields, although that wasn’t a strict rule.

Two themes were particularly popular this year:

  1. 3D molecule visualisation. Several talks and groups of BiVians had exciting developments to show us: Haptimol, a visualisation with haptic feedback; EZMol, with an easy-to-use wizard for producing visualisations; Foldsynth, a physics-based engine from Goldsmiths; MARender, a JavaScript-driven biomedical imaging library; and BioBlox, which gamifies protein docking to allow crowd-sourced research.
  2. Usability. Biology is a rich, complex field. Computer people making tools for biologists need to keep things easy to use. EZMol, mentioned above, was created because of a notable lack of simple, usable 3D visualisation tools. Our upcoming InterMine 2.0 release is another push towards creating a better user experience.

One of the most impressive tools demonstrated was Zegami, a tool to annotate, view, and filter images. It may have started as a biological tool, but it’s equally functional for your holiday snaps or for sorting and filtering Netflix movies. It’s a shame we don’t really have any image data in InterMine (well, in Fly or HumanMine at least) given how cool it looked.

Figure: Example from zegami.com (South Australian Museum invertebrates).

A few other tools / demos of note included:

  • Reactome pathway browser, which is fully embeddable into your own web app.
  • Jalview, ‘a free program for multiple sequence alignment editing, visualisation and analysis’. The main Jalview dev is also one of the organisers of the VizBi biology vis conference, next taking place in Germany.
  • Aequatus-vis uses Ensembl web services to visualise homologous gene families.

On day 2, I gave a short talk on the future of InterMine, focusing on why we want to revamp our UI, and what we think we’re doing better in InterMine 2.0. The slides are available on Google Drive (better format) or Slideshare.