Bioschemas

Justin and Gos attended the BioSchemas kick-off meeting at Hinxton this week. As well as giving a short talk about InterMine (the slides are on FigShare), Justin managed to jot down a few thoughts about the event:


The aim of the Bioschemas schema is to come up with simple metadata that can be embedded in webpages (JSON-LD, RDFa, etc.) to make it easier to find data. For instance, suppose you wanted datasets that concerned the effect of a certain drug.  At the moment, if you search in Google for that drug name, you will find relevant datasets, but also possibly datasets that happen to use that drug as part of their protocol and more things besides.

But if you can embed a schema which names the subject of the dataset in a structured way (e.g. puts an ontology term URL in a dataset.subject field) then you can pull up more relevant data.

However, there is a strong concern with keeping such markup as light as possible so that people aren’t put off annotating their datasets.  Hence, a notional rule that there should only be 6 properties per class (e.g. just name, description, url, keywords, variableMeasured, creator.name for dataset).
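To give a feel for how light this markup is, here is a minimal sketch in Python that builds a JSON-LD description of a dataset using only the six properties mentioned above and wraps it in the script tag a webpage would embed. The helper names (`dataset_jsonld`, `to_script_tag`) and all of the example values are our own illustrations, and typing the creator as a `Person` is an assumption; none of this is an agreed Bioschemas definition.

```python
import json


def dataset_jsonld(name, description, url, keywords, variable_measured, creator_name):
    """Build a minimal schema.org Dataset description as a JSON-LD dict.

    Only the six properties named in the post are used. Typing the creator
    as a Person is an assumption made for this illustration.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": keywords,
        "variableMeasured": variable_measured,
        "creator": {"@type": "Person", "name": creator_name},
    }


def to_script_tag(doc):
    """Serialise the dict and wrap it in the tag used to embed JSON-LD in HTML."""
    body = json.dumps(doc, indent=2)
    # "\/" is a valid JSON escape for "/", so this keeps a stray "</script>"
    # inside a value from closing the tag early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    example = dataset_jsonld(
        name="Example drug response dataset",
        description="Expression changes after treatment with an example drug.",
        url="https://example.org/datasets/1",
        keywords=["drug response", "gene expression"],
        variable_measured=["expression level"],
        creator_name="A. Researcher",
    )
    print(to_script_tag(example))
```

The point of the sketch is how little a data provider has to write: one small block per dataset page, with no change to the internal data model.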

As such, bioschemas won’t be replacing any of the existing ‘heavyweight’ schemas (DATS, DataCite’s model, OMICSdi model, our own InterMine data model), as it’s not meant to be used as an internal data model.

  • Bioschemas is about getting properties into schema.org (as supported by Google, Bing, Yahoo, etc.).  However, schema.org is a big bag of potential metadata with few constraints (no cardinality, for instance!).  It’s up to Bioschemas to come up with constraints for our purposes, especially on generic metadata such as Dataset and DataCatalog.
  • Bioschemas is also largely about general areas at the moment (datasets, data sources, etc.) though there is some specific work on protein annotations and samples.  But not, for instance, on genes and proteins at this stage (though presumably protein annotations would need some metadata for proteins….)
  • Things are at an early stage – the next year will (hopefully) see some Bioschemas definitions.  There is still debate about exactly what it can be used for beyond search.  For instance, we (InterMine) may be able to use them to improve our data integration process if all sources start embedding common metadata in their download files as well as on webpages.
  • Bioschemas is more than schema work. The initiative covers topics such as identifiers, citation, metrics and tools too (which will be relevant to us in the future).  We can get value from these other areas too – for instance there was discussion of a standard way of providing notification that a data source (uniprot, etc.) had updated, which would be very useful to us in building mines that automatically update.  There was also talk of having metadata that specifies when a data source changes its format – another thing that would be tremendously useful for us.
  • The presentations were interesting and the group friendly.  InterMine seemed to be well received.  The group doesn’t have many actual data consumers and integrators, so I think that we can make a valuable contribution from that perspective.
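As a rough illustration of the kind of constraint Bioschemas might layer on top of schema.org, the sketch below checks a dataset description against minimum and maximum cardinalities. The profile and its numbers are invented for this example (no Bioschemas definitions exist yet), and the function names are hypothetical.

```python
# Hypothetical profile: property name -> (minimum, maximum) number of values.
# A maximum of None means "no upper limit". These numbers are made up for
# illustration and are not taken from any Bioschemas specification.
DATASET_PROFILE = {
    "name": (1, 1),
    "description": (1, 1),
    "url": (1, 1),
    "keywords": (0, None),
    "variableMeasured": (0, None),
    "creator": (1, None),
}


def count_values(value):
    """Count how many values a property holds (a list counts each item)."""
    if value is None:
        return 0
    if isinstance(value, list):
        return len(value)
    return 1


def check_profile(doc, profile):
    """Return a list of human-readable cardinality problems (empty if none)."""
    problems = []
    for prop, (minimum, maximum) in profile.items():
        found = count_values(doc.get(prop))
        if found < minimum:
            problems.append(f"{prop}: expected at least {minimum}, found {found}")
        elif maximum is not None and found > maximum:
            problems.append(f"{prop}: expected at most {maximum}, found {found}")
    return problems


if __name__ == "__main__":
    # A deliberately incomplete description: two names and no url.
    doc = {
        "@type": "Dataset",
        "name": ["First name", "Second name"],
        "description": "An example description.",
        "creator": {"name": "A. Researcher"},
    }
    for problem in check_profile(doc, DATASET_PROFILE):
        print(problem)
```

A data integrator such as InterMine could run a check like this before trusting embedded metadata from a source, which is one reason agreed cardinalities would be useful to us.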

Google Summer of Code at InterMine

We’re pleased to announce that we’ll be participating in Google Summer of Code 2017 as a mentor organisation, under the umbrella of Open Genome Informatics. Here’s the full ideas list for Open Genome Informatics Projects – InterMine projects are numbers 3 to 9.

Information for students:

About us:

InterMine is an open source biological data warehouse, based in the University of Cambridge. There are nearly thirty instances of public InterMines, covering a range of subjects from organisms like mice and rats, mines dedicated to plants such as the soybean, insects like the fruitfly or bees and wasps, and even mines dedicated to mitochondrial DNA and discovering drug targets.

We’re interested in mentoring students from a bioinformatics, computational biology, or computer science background.

You don’t have to be a biologist to work on InterMine related projects – many of the full time developers on the team didn’t come from a biology background – but biological knowledge is an advantage.

We use a range of languages in our projects, but most commonly you’ll see Java, PostgreSQL, Clojure/ClojureScript, and JavaScript. Each instance of InterMine has its own set of web services, and there are client libraries in five different languages, with a sixth in final stages of development.

Browse through our GitHub repos to see more of our projects: https://github.com/intermine

Getting started:

If you’re interested in applying for one of our projects, drop an email to the people named in the project description to introduce yourself, and explain which of the project(s) you’re interested in. There’s already been quite a lot of interest in the Similarity project from multiple students, so you might want to consider one of the other projects as a backup if you think you’d particularly like InterMine.

When you mail us, please make sure to include as many of the following as possible:

  • A CV / Resume. Tell us about yourself!
  • Links to GitHub, BitBucket, LinkedIn or similar.
  • Sample code. If you don’t have GitHub/Bitbucket etc. we’d still like to see what you can do. A class coding assignment or personal project you’re proud of is a great alternative.

A great way to familiarise yourself with the basics of building InterMine is to run through our tutorial: http://intermine.readthedocs.io/en/latest/get-started/tutorial/ – or alternatively you could try familiarising yourself with the web interface for your preferred InterMine. You can find the full list of InterMines at intermine.org, or try our experimental interface here: http://redgenes.apps.intermine.org/

We’ve also set up a few tickets on the core InterMine repo with the tag “Good first bug” if you’d like to get your hands dirty. Pop a note on the ticket and make a pull request when you think you’re ready. We have some guidelines for contributing that you should read before you make the pull request.

Finally, if you have any ideas or questions, please don’t hesitate to email us.

Useful links:

– Our twitter feed: https://twitter.com/intermineorg
– Here’s a blog post about some of the cool things the community has done with InterMine resources: https://intermineorg.wordpress.com/2016/11/22/cool-intermine-features-roundup/
– Our interactive web services docs: http://iodocs.apps.intermine.org/
– Our very in-the-works new ClojureScript UI. Demo: http://redgenes.apps.intermine.org/ repo: github.com/intermine/redgenes
– Developer documentation: http://intermine.readthedocs.io/en/latest/

Upcoming Events

March and April 2017 are looking to be quite exciting for the InterMine team. Here’s where to spot us over the next couple of months:

6-8 March 2017: BioSchemas kick-off meeting – If you’re there, look out for Gos and Justin!

8 March 2017: As part of the EBI’s Introduction to Omics Data Integration course, Rachel will be delivering a session on InterMine: “Open tools for data integration – hands on example with InterMine”

20 March 2017: Scientific Computation in the University of Cambridge Seminar Day – Daniela will be giving the talk “InterMine: Best Practices for Open Source Software”

29 March 2017 – 2 April 2017: Our very own InterMine Dev Workshop and Hackathon. Registrations are still open at the moment (21 Feb 2017), but the early bird room rate will be expiring in early March, so try not to delay if you’re planning on coming!

24-27 April 2017: Josh will be attending the HUPO-PSI 2017 meetup in Beijing.

2016 holiday period

Quick FYI:

We’ll have several people out of the office – but not everyone – over the week before Christmas, starting from today (16 December). If you need to contact us, you may get a prompter response by emailing our lists (e.g. the dev list) which will reach all of us, rather than emailing individuals directly.

The annual InterMine office closure will run from the evening of 23rd December until the morning of 3rd January, UK time. During this period no one will be in the office, and we may not be able to respond to emails, tweets, blogs, etc.

See you all in 2017!!

 

Cool InterMine features roundup

I’ve said this before, but I’ll proudly say it again: one of the greatest things about being open source is the community. People are continually creative and resourceful with the tools we’ve built, and we love seeing all the different things you guys do with InterMine. Here’s a quick roundup of some of the things we’ve seen so far this year:

TargetMine’s Auxiliary Toolkit

TargetMine’s Auxiliary toolkit offers advanced analysis for networks and enrichment

TargetMine links out from report pages to provide external enrichment and interaction tools. Read more about it here, or  browse the tutorials: [Enrichment] [Interaction Network].

The Beany Mines:

The beany mines (Soy, Peanut, Legume, and Bean) recently added a shared motif search, as well as a couple of other great visualisations.

 

R and SOLR

Colin of HymenopteraMine and BovineMine did a great blog post about using our R client, InterMineR, and then continued to impress by making efforts to upgrade InterMine to use Solr.

MOLD

Ever wondered what Model Organism Linked Data might look like?  MOLD includes a queryable SPARQL endpoint and draws from multiple different InterMines to create a single dataset.

Tip: Make it generic

Generic tools are ones that aren’t hard-coded to a specific Mine or model. We’re always on the lookout for new and exciting features, whether it’s a visualisation or a web service or a database tweak. If you think it’s good, you can email us to discuss it or simply create a pull request, and bask in glory forever after.

We’d love to see more!

This list is awesome (thanks everyone!!) but by no means exhaustive. If you think we’ve missed something out, or you’re doing something new at the moment, drop us a line and we’ll add you to the next roundup. We’d also love to hear from others who might be interested in guest-blogging an InterMine related feature.

Save the date: 29-31 March 2017

Remember the big International InterMine meetup we were tentatively discussing a few months back? Thanks again to everyone who responded to the survey, as it helped us a lot. We’re still in the process of nailing down the details, but here’s the rough program we are expecting (with details potentially subject to change but hopefully they won’t…)

Wednesday 29th March 2017: Arrival at Berkeley in preparation for the fun ahead. Hopefully there will be some limited accommodation on site available for early birds.

Thursday 30th and Friday 31st March 2017: The conference itself. Details to be confirmed.

Saturday 1st and Sunday 2nd April: Hackathon! Entirely optional. Put your thinking caps on and start looking for fun ideas.

We’ll post more details as things become concrete, and we’d love to hear from you if you have any ideas or thoughts regarding the conference and its content. We still need to think of a catchy name and hashtag for twitter!

Finally for now, we’d like to give massive thanks to the folks at JGI for helping us to coordinate this.