California Dreaming: InterMine Dev Conf 2017 Report – Day 1

2017’s developer conference has been and gone; time to pay my dues in a blog post or two.

Day 0: Welcome dinner, 29 March 2017

The Cambridge InterMine team arrived at Walnut Creek without a hitch, and after a jetlagged attempt at a night’s sleep we sat down to a mega-grant-writing session in the hotel lobby, fuelled by several pots of coffee and plates of nachos.

By 7PM, people had begun to gather in the lobby to head to the inaugural conference dinner at the delicious Walnut Creek Yacht Club. We had to change the venue quite late on in the game, meaning we decided to wander down the street to collect some of the InterMiners who had ended up at the original venue (sorry!!). By the end of the meal, most of the UK contingent was dead on their feet – 10pm California time worked out to be 6am according to our body clocks, so when Joe offered to give several of us a lift back to the hotel, it was impossible to decline.


Day 1: Workshop Intro

The day started with intros from our PI, Gos, and our host, David Goodstein. 

Josh and I followed up by introducing BlueGenes, the UI we’ve been working on to replace InterMine’s older JSP-based UI. You can view Josh’s slide deck, try out a live demo, or browse / check out the source on GitHub.

Next came one of my favourite parts: short talks from InterMiners.

Short community talks

Doppelgangers – Joel Richardson, MGI

Joel gave a great presentation about Doppelgangers in InterMine: depending on your data sets and config, you can occasionally end up with duplicate or strange / incomplete InterMine objects in your mine. He followed up with explanations of the root causes and mitigation methods – a great resource for any InterMiner working on data source integration!

Genetic data in Mines – Sam Hokin, NCGR/LegFed

Next up was Sam’s talk about his various beany mines, including CowpeaMine, which has only genetics data, rather than the more typical InterMine genomic data. He’s also implemented several custom data visualisations on gene report pages – check out the slides or mines for more details.

JBrowse and Inter-mine communication – Vivek Krishnakumar, JCVI

Vivek focused on some great cross-InterMine collaborations (slides here), including the technical challenges of integrating JBrowse into InterMine, as well as a method to link to other InterMines using synteny rather than InterMine’s typical homology approach.

InterMine at JGI – Joe Carlson, Phytozome, JGI

Joe has the privilege to run the biggest InterMine, covering (currently) 72 data sets on 69 organisms. Compared to most InterMines, this is massive! Unsurprisingly, this scale comes with a few hitches many of the other mines don’t encounter. Joe’s slides give a great overview of the problems you might encounter in a large-scale InterMine and their solutions.

Afternoon sessions

FAIR and the semantic web – Daniela & Justin

After a yummy lunch at a nearby cafe, Justin introduced the concept of FAIR, and discussed InterMine’s plans for a FAIRer future (slides). Discussion topics included:

  • How to make stable URIs (InterMine object IDs are transient and will change between builds)
  • Enhanced embedded metadata in webpages and query results (data provenance, licensing)
  • Better Findability (the F in FAIR) by registering InterMine resources with external registries
  • RDF generation / SPARQL querying

This was followed up by Daniela’s introduction to RDF and SPARQL, which explained the two concepts in an easily understood way. I really loved these slides, and I reckon they’d be a good introduction for anyone who wants to learn what RDF and SPARQL are, whether or not they’re interested in InterMine.
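To make the stable URI and RDF generation topics above a little more concrete, here is a minimal Python sketch. It is not how InterMine does (or will do) this: the base URL, the record and its identifiers are all made up for illustration, and the only idea it shows is building a URI from a persistent identifier instead of the transient object ID, then writing one statement as an N-Triples line.

```python
from urllib.parse import quote

# Hypothetical namespace; a real mine would pick its own base URL.
BASE = "http://example.org/mine"


def stable_uri(class_name: str, primary_identifier: str) -> str:
    """Build a URI from the class name and a persistent identifier.

    The internal object ID is deliberately left out, because it can
    change from one build to the next.
    """
    return f"{BASE}/{quote(class_name, safe='')}/{quote(primary_identifier, safe='')}"


def to_ntriple(subject: str, predicate: str, value: str) -> str:
    """Serialise one statement with a plain literal object as N-Triples.

    Only backslashes and double quotes are escaped here, which is
    enough for simple single-line labels.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject}> <{predicate}> "{escaped}" .'


# Made-up record: "id" stands in for the transient object ID.
gene = {"id": 1234567, "class": "Gene", "primaryIdentifier": "ABC0001", "symbol": "abc1"}

uri = stable_uri(gene["class"], gene["primaryIdentifier"])
print(to_ntriple(uri, "http://www.w3.org/2000/01/rdf-schema#label", gene["symbol"]))
```

Because the URI depends only on the class and the primary identifier, rebuilding the database would leave it unchanged, as long as the identifier itself is stable at the source.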

Extending the InterMine Core Data Model – Sergio

Sergio ran the final session, “Extending the InterMine Core Data Model”. Shared models allow for easier cross-InterMine queries, as demoed in the GO tool prototype.

This discussion raised several interesting talking points:

  • Should model extensions be created via community RFC?
  • If so, who is involved? Developers, community members, curators, other?
  • Homologue or homolog? Who knew a simple “ue” could cause incompatibility problems? Most InterMines use the “ue” variation, with the exception of PhytoMine. An answer to this problem was presented in the “friendly mine” section of Vivek’s talk earlier in the day.
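As a toy illustration of the homologue / homolog mismatch, here is a short Python sketch. It is not the approach from Vivek’s talk and not part of InterMine: the alias table, function names and model lists are all hypothetical, and the idea is simply to map spelling variants onto one canonical name before comparing two models.

```python
# Hypothetical alias table: spelling variants mapped to one canonical name.
ALIASES = {"homolog": "homologue", "homologs": "homologues"}


def canonical(name: str) -> str:
    """Return the canonical form of a class or field name, ignoring case."""
    key = name.lower()
    return ALIASES.get(key, key)


def shared_names(model_a, model_b):
    """Names two models have in common once spelling variants are unified."""
    return {canonical(n) for n in model_a} & {canonical(n) for n in model_b}


# Made-up class lists standing in for two mines' data models.
mine_a = ["Gene", "Homologue", "Protein"]
mine_b = ["Gene", "Homolog", "Pathway"]

print(sorted(shared_names(mine_a, mine_b)))
```

Without the alias step, the two made-up models would only appear to share Gene; with it, the homologue class lines up as well, which is the kind of overlap a cross-InterMine query relies on.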

Another great output was Siddartha Basu’s gist on setting up InterMine – outlining some pain points and noting the good bits.

Most of us met up for dinner afterwards at Kevin’s Noodle House – highly recommended for meat eaters, less so for veggies.


A flurry of deadlines: Grants, GSoC, workshops, and more…

We blogged in February that we had a lot of events coming up over the March / April period. Here’s a recap:

  • Attending conferences: Amongst the team we attended Bioschemas, the Elixir all-hands, and the Cambridge Scientific Computation Day.
  • InterMine training: We delivered a training workshop about using InterMine at the EBI, part of their Introduction to Omics data integration week-long course.
    • This went well despite a server-room meltdown which conveniently timed itself for the morning of the same day (the training session was in the afternoon, so we thankfully had time to get the servers back up!).
    • In contrast to previous years, every single hand went up when we asked if the participants wrote code as part of their job. Next time, we will try to allow for a longer session on using InterMine web services, rather than the 15-minute slot we allocated this time!
  • Developer Workshop and Hackathon: 5 days in sunny California, spending time with InterMiners from around the world. Longer blog posts to follow, but in the meantime you can browse the agenda for links to slides from each session, or the storify summary of tweets.
  • Google Summer of Code: We’re participating in Google Summer of Code (GSoC) this year (previously) as a mentoring organisation. We had over 50 interested students and 30 distinct applications, many of which were simply brilliant. The deadline for students applying, naturally, was the day after the hackathon, making finding time to provide student feedback a challenge. Maybe there’s a reason to be grateful for jet-lag induced wakefulness at odd hours!
  • Grants: A tale of two grants…
    • New application: We had a grant application deadline that was, once again, the day after the hackathon. Uh-oh! Feverish figure fixes, tentative typo tweaks and word-count winnowing were squeezed in at every opportunity.
    • Good news about an old application: Meanwhile, we got the news that we’d been fortunate enough to have our hard work pay off: a grant we’d applied for last year as part of the BBSRC BBR 2016 call was agreed to! Hint: the future of InterMine is looking very FAIR, possibly even SPARQLing. More details in a later post.

Events coming up soon:

New Blog!

We’ve decided to streamline our blogging experience a little bit. Rather than maintaining several separate but mostly similar blogs for HumanMine, FlyMine, and InterMine, this blog will act as a combined stream.

Don’t worry – this doesn’t mean you’ll be forced to view irrelevant updates if you’re only interested in one of the sub categories. WordPress is great about filtering via tag or category. Here are a few quick links:

InterMine-specific updates: https://intermineorg.wordpress.com/category/intermine/

FlyMine-specific updates: https://intermineorg.wordpress.com/category/flymine/

HumanMine-specific updates: https://intermineorg.wordpress.com/category/humanmine/ (It’s empty now, but coming soon!)

2016 InterMine RoadMap

We have a brand new blog and so would like to take this opportunity to tell you our grand plans for 2016.

InterMine 2.0

Gradle

Currently InterMine is built with a series of Ant commands, and dependencies are managed manually. This is not ideal, so we plan to replace Ant with Gradle and have it manage our dependencies automatically. This change should make builds faster, easier and more efficient.

For those of you with InterMines of your own, this means that you will use different commands for building your databases and deploying your webapps. We’ll provide the new commands along with documentation, and aim to make the transition as easy as possible.

Keyword Search

We currently use Lucene for our search index but plan to greatly expand our utilisation of this great library — making search on InterMine more robust, sensitive and powerful.

The Cloud

Some have already deployed their InterMine to the cloud. We intend to make this process much easier, probably by creating a custom InterMine buildpack which pre-configures a Docker container with all of InterMine’s dependencies.

New Data Sources

We are always adding new data sources and would like to hear your suggestions. On our list right now is:

And of course we will continue to update our current data source library as file formats and data change.

New User Interface

We’ve developed a new user interface which should be ready for beta testing in early 2016. It’ll exist alongside the current interface for some time, allowing you to feed back ideas, suggestions, and critiques in the new interface, whilst still being able to rely on the old one.

Here’s a sneak preview (subject to plenty of change, of course!):

Sneak preview: Homepage for the (work-in-progress) InterMine 2.0 UI.

New Tools

To go along with our new interface, we’re going to be adding a lot of new tools for you to use. Our wish list so far (not in order of priority):

  1. Advanced Search / Query builder / Guided search
  2. Recommendation engine (which gene is like my gene?)
  3. Complex Interaction viewer
  4. More powerful region search
  5. Phenotype viewer
  6. InterMine search tool
  7. R plug-in
  8. Text mining tool
  9. JBrowse / other genome browsers
  10. UniProt protein browser

We’d like to hear which tools are important to you. We will also improve the tools we currently have, making them easier to adapt to your data sets.

2017 and beyond

Genomes are being sequenced every day, technology is moving at an ever more rapid pace and everyone is facing a challenging funding environment. We don’t know quite what the world will look like in the next five years but we are working hard to be future proof. We’ve always had a deep commitment to openness, flexibility and collaboration, and feel that this will help us meet any future challenges.

Towards this end, we are running a pilot program to test out various graph databases and to explore the semantic web. We will keep you posted on our progress as always, and would like to hear your thoughts.

Thanks to our great community for all of their support over the years! We look forward to a really exciting year!