California Dreaming: InterMine Dev Conf 2017 Report – Day 1

2017’s developer conference has been and gone; time to pay my dues in a blog post or two.

Day 0: Welcome dinner, 29 March 2017

The Cambridge InterMine team arrived at Walnut Creek without a hitch, and after a jetlagged attempt at a night’s sleep we sat down to a mega-grant-writing session in the hotel lobby, fuelled by several pots of coffee and plates of nachos.

By 7pm, people had begun to gather in the lobby to head to the inaugural conference dinner at the delicious Walnut Creek Yacht Club. We had to change the venue quite late on in the game, so we wandered down the street to collect some of the InterMiners who had ended up at the original venue (sorry!!). By the end of the meal, most of the UK contingent was dead on their feet – 10pm California time worked out to be 6am according to our body clocks, so when Joe offered to give several of us a lift back to the hotel, it was impossible to decline.


Day 1: Workshop Intro

The day started with intros from our PI, Gos, and our host, David Goodstein. 

Josh and I followed up by introducing BlueGenes, the UI we’ve been working on to replace InterMine’s older JSP-based UI. You can view Josh’s slide deck, try out a live demo, or browse / check out the source on GitHub.

Next came one of my favourite parts: short talks from InterMiners.

Short community talks

Doppelgangers – Joel Richardson, MGI

Joel gave a great presentation about Doppelgangers in InterMine – that is, depending on your data sets and config, you can occasionally end up with duplicate or strange / incomplete InterMine objects in your mine. He followed up with explanations of the root causes and mitigation methods – a great resource for any InterMiner who is working in data source integration!

Genetic data in Mines – Sam Hokin, NCGR/LegFed

Next up was Sam’s talk about his various beany mines, including CowpeaMine, which has only genetics data, rather than the more typical InterMine genomic data. He’s also implemented several custom data visualisations on gene report pages – check out the slides or mines for more details.

JBrowse and Inter-mine communication – Vivek Krishnakumar, JCVI

Vivek focused on some great cross-InterMine collaborations (slides here), including the technical challenges integrating JBrowse into InterMine, as well as a method to link to other InterMines using synteny rather than InterMine’s typical homology approach.

InterMine at JGI – Joe Carlson, Phytozome, JGI

Joe has the privilege to run the biggest InterMine, covering (currently) 72 data sets on 69 organisms. Compared to most InterMines, this is massive! Unsurprisingly, this scale comes with a few hitches many of the other mines don’t encounter. Joe’s slides give a great overview of the problems you might encounter in a large-scale InterMine and their solutions.

Afternoon sessions

FAIR and the semantic web – Daniela & Justin

After a yummy lunch at a nearby cafe, Justin introduced the concept of FAIR, and discussed InterMine’s plans for a FAIRer future (slides). Discussion topics included:

  • How to make stable URIs (InterMine object IDs are transient and will change between builds)
  • Enhanced embedded metadata in webpages and query results (data provenance, licensing)
  • Better Findability (the F in FAIR) by registering InterMine resources with external registries
  • RDF generation / SPARQL querying
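To make the first discussion point concrete, here is a minimal sketch of one way a stable URI could be minted: from an object's class and a persistent identifier, rather than from the internal object ID that changes between builds. The base URL, the URI layout and the function name are assumptions for illustration, not an InterMine API.

```python
from urllib.parse import quote


def stable_uri(base_url: str, class_name: str, primary_identifier: str) -> str:
    """Build a URI from values that survive a database rebuild.

    The "<base>/<class>/<identifier>" layout is hypothetical.
    """
    # Percent-encode both parts so identifiers containing ':' or '/'
    # stay inside a single path segment.
    return "{}/{}/{}".format(
        base_url.rstrip("/"),
        quote(class_name.lower(), safe=""),
        quote(primary_identifier, safe=""),
    )


# The internal, per-build object ID is deliberately not an input.
uri = stable_uri("https://example.org/mine/", "Gene", "FBgn0000606")
```

Because the URI depends only on the class and the identifier, rebuilding the database would leave existing links intact, provided the identifier itself stays the same.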

This was followed up by Daniela’s introduction to RDF and SPARQL, which provided a great basic intro to the two concepts in an easily-understood manner. I really loved these slides, and I reckon they’d be a good introduction for anyone interested in learning more about what RDF and SPARQL are, whether or not you’re interested in InterMine.
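As a rough illustration of the two ideas (not taken from Daniela’s slides), the sketch below stores a few facts as subject / predicate / object triples, which is the core of RDF, and matches them against a pattern with wildcards, which is roughly what a basic SPARQL query does. The example.org URIs and the `match` helper are made up for illustration; a real project would use an RDF library and a SPARQL endpoint.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

VOCAB = "http://example.org/vocab/"

# Hypothetical data: every statement is (subject, predicate, object).
TRIPLES: List[Triple] = [
    ("http://example.org/gene/eve", VOCAB + "symbol", "eve"),
    ("http://example.org/gene/eve", VOCAB + "organism", "http://example.org/organism/7227"),
    ("http://example.org/gene/zen", VOCAB + "organism", "http://example.org/organism/7227"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None behaves like a query variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Roughly: SELECT ?gene WHERE { ?gene vocab:organism organism:7227 }
genes = [s for s, _, _ in match(TRIPLES, p=VOCAB + "organism",
                                o="http://example.org/organism/7227")]
```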

Extending the InterMine Core Data Model – Sergio

Sergio ran the final session, “Extending the InterMine Core Data Model”. Shared models allow for easier cross-InterMine queries, as demoed in the GO tool prototype.

This discussion raised several interesting talking points:

  • Should model extensions be created via community RFC?
  • If so, who is involved? Developers, community members, curators, other?
  • Homologue or homolog? Who knew a simple “ue” could cause incompatibility problems? Most InterMines use the “ue” variation, with the exception of PhytoMine. An answer to this problem was presented in the “friendly mine” section of Vivek’s talk earlier in the day.
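To show why two letters matter, here is a small sketch of one possible workaround: mapping spelling variants of a class name onto a single canonical name before comparing models. This is not the approach from Vivek’s talk, and the alias table and function names are hypothetical.

```python
from typing import Iterable, Set

# Hypothetical alias table: variant spelling -> canonical class name.
CLASS_ALIASES = {"Homolog": "Homologue"}


def canonical_class(name: str) -> str:
    """Return the canonical spelling of a class name, or the name unchanged."""
    return CLASS_ALIASES.get(name, name)


def shared_classes(model_a: Iterable[str], model_b: Iterable[str]) -> Set[str]:
    """Class names two models have in common once variants are normalised."""
    return ({canonical_class(n) for n in model_a}
            & {canonical_class(n) for n in model_b})


# A plain set intersection of these two models would contain only "Gene".
common = shared_classes({"Gene", "Homologue"}, {"Gene", "Homolog"})
```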

Another great output was Siddartha Basu’s gist on setting up InterMine – outlining some pain points and noting the good bits.

Most of us met up for dinner afterwards at Kevin’s Noodle House – highly recommended for meat eaters, less so for veggies.

A flurry of deadlines: Grants, GSoC, workshops, and more…

We blogged in February commenting that we had a lot of events over the March / April period. Here’s a recap:

  • Attending conferences: Amongst the team we attended Bioschemas, the Elixir all-hands, and the Cambridge Scientific Computation Day.
  • InterMine training: We delivered a training workshop about using InterMine at the EBI, part of their Introduction to Omics data integration week-long course.
    • This went well despite a server-room meltdown which conveniently timed itself for the morning of the same day (the training session was in the afternoon, so we thankfully had time to get the servers back up!).
    • In contrast to previous years, every single hand went up when we asked if the participants wrote code as part of their job. Next time, we will try to allow for a longer session on using InterMine web services, rather than the 15-minute slot we allocated this time!
  • Developer Workshop and Hackathon: 5 days in sunny California, spending time with InterMiners from around the world. Longer blog posts to follow, but in the meantime you can browse the agenda for links to slides from each session, or the storify summary of tweets.
  • Google Summer of Code: We’re participating in Google Summer of Code (GSoC) this year (previously) as a mentoring organisation. We had over 50 interested students and 30 distinct applications, many of which were simply brilliant. The deadline for students applying, naturally, was the day after the hackathon, making finding time to provide student feedback a challenge. Maybe there’s a reason to be grateful for jet-lag induced wakefulness at odd hours!
  • Grants: A tale of two grants… :
    • New application: We had a grant application deadline that was, once again, the day after the hackathon. Uh-oh! Feverish figure fixes, tentative typo tweaks and word-count winnowing were squeezed in at every opportunity.
    • Good news about an old application: Meanwhile, we got the news that we’d been fortunate enough to have our hard work pay off: a grant we’d applied for last year as part of the BBSRC BBR 2016 call was agreed to! Hint: the future of InterMine is looking very FAIR, possibly even SPARQLing. More details in a later post.

Events coming up soon:

Goal in sight: better uptime

TL;DR: We’re planning to host our CDN and an InterMine registry on AWS in the future. For more specifics, read on…

Over the course of 2015, we had several noteworthy service outages.

First of all there was a downpour in Cambridge that caused massive flooding, which took out the university’s network connection. The server room actually had to be pumped out. Yikes.

Next, the machine room had a broken chiller – this was so severe that the servers began to heat up almost immediately and we had to go into shutdown straight away whilst repair people worked on it. Services came back up the next morning, but only just in time for our training workshop.

Then came the big nasty DDoS attack against Janet, a higher education network which spans the UK. From the university, we could access some websites, but many more were down. Crucially, no one could reach our servers from the outside, even though this time they were running.

Apart from taking down HumanMine, FlyMine, and our websites, this also took down our CDN, meaning that anyone who hosted their own InterMine but relied upon us for their CDN was also affected by the outages.

Amazon Web Services to the rescue

We’re always working our hardest to make sure things run as smoothly as possible, but sometimes things are beyond our control. This number of outages is obviously less than ideal, and we’d like to do better in 2016.

One solution is to set up your own InterMine CDN, but could we do something to help you on our end, something that doesn’t require extra work for others? After a bit of thinking, we realised we probably could.

So as part of a plan to offer a more reliable service, our CDN will be migrated to live on AWS in the next few months. AWS has a pretty darned good uptime track record. What’s more, Amazon serves a massive chunk of the web – so if AWS ever is down, the likelihood is that much of the web will be down as well, and the outage won’t be surprising to any potential users.

An InterMine registry

Another feature that’s been in the works for a while is an InterMine registry. Like the CDN, it’s pretty important that a dynamic discoverable registry have near-perfect uptime. So hopefully you can expect to see the hardcoded mine lists disappearing as we start to take advantage of a hosting service with high availability.
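The idea can be sketched as follows: ask a registry for the current list of mines, and fall back to a built-in list only if the registry cannot be reached. The registry is only at the planning stage here, so the fetch function, its return format and the fallback entries are all assumptions.

```python
from typing import Callable, Dict

# Hypothetical hardcoded list of the kind a registry would replace.
FALLBACK_MINES: Dict[str, str] = {
    "FlyMine": "https://example.org/flymine",
    "HumanMine": "https://example.org/humanmine",
}


def list_mines(fetch_registry: Callable[[], Dict[str, str]]) -> Dict[str, str]:
    """Prefer the live registry; use the hardcoded list if it is unavailable."""
    try:
        mines = fetch_registry()
    except OSError:
        # Typical network failures (refused connections, timeouts,
        # urllib's URLError) are subclasses of OSError.
        return dict(FALLBACK_MINES)
    # An empty answer is treated as a failure too.
    return mines or dict(FALLBACK_MINES)
```

Passing the fetch function in as an argument keeps the sketch independent of any particular HTTP client, and makes the fallback path easy to exercise without a network.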

Watch this space for the official announcement and release dates for the CDN upgrade and registry release!

A final note about outages

While email was okay when Janet was playing up, the machine room breakdowns took out our mail server, and hence the mailing list – so the “Hey guys, is your server down?” queries didn’t reach us until after everything had recovered.

If you’re aware of a problem, the best place to find out more will be our Twitter account, @intermineorg. We’ll always make an effort to update our Twitter feed – by mobile network if necessary – with details of outages, whether planned or emergency.

Featured image courtesy of Torkild Retvedt via Flickr. Licence CC-BY-SA 2.0

BiVi 2015

It’s the first year someone from InterMine has attended BiVi (Biology Visualisation), which is in its second year of a (currently) three-year plan to create a community around biology visualisation.

The atmosphere was pleasantly thriving, with probably 20 or 30 attendees from Cambridge, Edinburgh, London, Oxford, and even Institut Curie in France. I think it was safe to say that most of the attendees were people with a computer science background who worked in biology-related fields, although that wasn’t a strict rule.

Two themes were particularly popular this year:

  1. 3D molecule visualisation. Multiple talks / groups of BiVians had exciting developments to show us, from Haptimol, a visualisation with haptic feedback, to EZMol, with an easy-to-use wizard to produce visualisations, Foldsynth, a physics-based engine from Goldsmiths, MARender, a JavaScript-driven biomedical imaging library, and BioBlox, which gamifies protein docking to allow crowd-sourced research.
  2. Usability. Biology is a rich, complex field. Computer people making tools for biologists need to keep things easy to use. EZMol, mentioned above, was created due to a notable lack of simple, usable 3D visualisation tools. Our upcoming InterMine 2.0 release is another push towards creating a better user experience.

One of the most impressive tools demonstrated was Zegami, a tool to annotate, view, and filter images. It may have started as a biological tool, but it’s equally functional for your holiday snaps or for sorting and filtering Netflix movies. It’s a shame we don’t really have any image data in InterMine (well, in Fly or HumanMine at least) given how cool it looked.

Example from zegami.com: South Australian Museum invertebrates.

A few other tools / demos of note included:

  • Reactome pathway browser, which is fully embeddable into your own web app.
  • Jalview, ‘a free program for multiple sequence alignment editing, visualisation and analysis’. The main Jalview dev is also one of the organisers of the VizBi biology vis conference, next taking place in Germany.
  • Aequatus-vis uses Ensembl web services to visualise homologous gene families.

On day 2, I gave a short talk on the future of InterMine, focusing on why we want to revamp our UI, and what we think we’re doing better in InterMine 2.0. The slides are available on Google Drive (better format) or Slideshare.


New Blog!

We’ve decided to streamline our blogging experience a little bit. Rather than maintaining several separate but mostly similar blogs for HumanMine, FlyMine, and InterMine, this blog will act as a combined stream.

Don’t worry – this doesn’t mean you’ll be forced to view irrelevant updates if you’re only interested in one of the sub categories. WordPress is great about filtering via tag or category. Here are a few quick links:

InterMine-specific updates: https://intermineorg.wordpress.com/category/intermine/

FlyMine-specific updates: https://intermineorg.wordpress.com/category/flymine/

HumanMine-specific updates: https://intermineorg.wordpress.com/category/humanmine/ (It’s empty now, but coming soon!)

2016 InterMine RoadMap

We have a brand new blog and so would like to take this opportunity to tell you our grand plans for 2016.

InterMine 2.0

Gradle

Currently InterMine is built with a series of Ant commands, and dependencies are managed manually. This of course is not ideal, and we plan to use Gradle to replace Ant and manage our dependencies automatically. This change will make builds faster, easier and more efficient.

For those of you with InterMines of your own, this means that you will use different commands for building your databases and deploying your webapps. We’ll provide the new commands along with documentation, and aim to make the transition as easy as possible.

Keyword Search

We currently use Lucene for our search index but plan to greatly expand our utilisation of this great library — making search on InterMine more robust, sensitive and powerful.

The Cloud

Some have already deployed their InterMine to the cloud. We intend to make this process much easier, probably by creating a custom InterMine buildpack which pre-configures a Docker container with all of InterMine’s dependencies.

New Data Sources

We are always adding new data sources and would like to hear your suggestions. On our list right now is:

And of course we will continue to update our current data source library as file formats and data change.

New User Interface

We’ve developed a new user interface which should be ready for beta testing in early 2016. It’ll exist alongside the current interface for some time, allowing you to feed back ideas, suggestions, and critiques in the new interface, whilst still being able to rely on the old one.

Here’s a sneak preview (subject to plenty of change, of course!):

Sneak preview: Homepage for the (work-in-progress) InterMine 2.0 UI.

New Tools

To go along with our new interface, we’re going to be adding a lot of new tools for you to use. Our wish list so far (not in order of priority):

  1. Advanced Search / Query builder / Guided search
  2. Recommendation engine (which gene is like my gene?)
  3. Complex Interaction viewer
  4. More powerful region search
  5. Phenotype viewer
  6. InterMine search tool
  7. R plug-in
  8. Text mining tool
  9. JBrowse / other genome browsers
  10. UniProt protein browser

We’d like to hear which tools are important to you. We also will improve the tools we currently have, making them easier to adapt to your data sets.

2017 and beyond

Genomes are being sequenced every day, technology is moving at an ever more rapid pace and everyone is facing a challenging funding environment. We don’t know quite what the world will look like in the next five years but we are working hard to be future-proof. We’ve always had a deep commitment to openness, flexibility and collaboration, and feel that this will help us meet any future challenges.

Towards this end, we are running a pilot program to test out various graph databases and to explore the semantic web. We will keep you posted on our progress as always, and would like to hear your thoughts.

Thanks to our great community for all of their support over the years! We look forward to a really exciting year!