InterMineR package

InterMine data can be accessed via command line programs like cURL and client libraries for five programming languages (Java, JavaScript, Perl, Python and Ruby.) Aiming to expand the functionality of InterMine framework, an R package, InterMineR, had been started that provided basic access to InterMine instances through the R programming environment. (You could run template queries, but not much else!)

However, in order to fully utilize the statistical and graphical capabilities of the R language and make the InterMine framework available to an even greater number of life scientists, the goals were set to:

  1. Further develop and publish the InterMineR package to Bioconductor, a widely used, open source software project based in R, which aims to facilitate the integrative analysis of biological data derived from high-throughput assays.
  2. Add visualisation capabilities, e.g. “What features are close to my feature of interest?”
  3. Add enrichment analysis in InterMineR, a feature that will provide R users with access to the InterMine enrichment analysis widgets and can be effectively combined with the graphical capabilities of R libraries.

InterMineR performs a call to the InterMine Registry to retrieve up-to-date information about the available Mines. The information retrieved are then used to connect the Mines with the R environment using the InterMine web services.

Queries

The InterMineR package can be used to perform complicated queries on a Mine. The process is facilitated by the retrieval of the data model and the ready-to-use template queries of the respective Mine. The R functions setConstraints and setQuery have been created along with the formal class InterMineR, to create new or modify existing queries, store them as Intermine-class objects and apply them to the Mine with the runQuery method.

Genomic Coordinates

r_gviz

Figure 1: Gene visualisation done via InterMineR AND GVIZ

InterMineR can retrieve genomic coordinates and gene expression analysis data which can be converted to:

with the R functions convertToGRanges and convertToRangedSummarizedExperiment respectively. This way an interaction layer between InterMineR and other Bioconductor packages (e.g. GenomicRanges and SummarizedExperiment) is established, allowing for rapid analysis of the retrieved InterMine data.

Enrichment + GeneAnswers

InterMineR also retrieves InterMine enrichment widgets and facilitates the enrichment analysis on an InterMine instance using the R functions getWidgets and doEnrichment, respectively. With the usage of the R function convertToGeneAnswers the results of the enrichment analysis are converted to a GeneAnswers-class object, therefore allowing the visualization of:

  • Pie charts
  • Bar plots
  • Concept-gene networks
  • Annotation category (e.g. GO terms, KEGG pathways) – interaction networks
  • Gene interaction networks

by using R functions from the GeneAnswers R package.

geneanswers_go_structure_network

Figure 2: GeneAnswers GO structure network, generated via InterMineR

geneanswers_concept_gene_network_colors

Figure 3: GeneAnswers gene network generated using InterMineR

Final steps: Bioconductor & Vignettes

The updated InterMineR package complies to the instructions for submitting new packages to Bioconductor, has passed all automated checks (R CMD build, check and BiocCheck) and is currently under the process of manual review for Bioconductor submission.

Documentation of each function along with examples of its usage are available in the GitHub repo and as help files upon the installation of the package. Furthermore, a detailed vignette and tutorials concerning the new functionality of InterMineR package are currently available at the intermine/InterMineR/vignettes folder of the GitHub dev branch, and will be shortly available on the GitHub master branch as well.

This project is part of Google Summer of Code, still under development by me, Konstantinos Kyritsis, PhD student at the Aristotle University of Thessaloniki, under the mentoring of Julie Sullivan and Rachel Lyne. The GitHub repository of the InterMineR package can be found at https://github.com/intermine/InterMineR

Google Analytics in BlueGenes: what should we track?

TL;DR: We’re implementing analytics tracking in BlueGenes. We can probably track anything you like, within reason. Leave a comment or email us if you have anything you’d like to see! Must adhere our privacy policy.

Longer version:

InterMine’s JSP pages (the current, older UI) are set up with a couple of different types of tracking:

  1. Google Analytics, which currently anonymously records things like:
    1. Number of users and their locations
    2. Pages viewed
    3. With a bit of effort you can figure out what items were searched for by analysing query strings.
  2. InterMine home-brew internal analytics (to view in your own mine, log in as the super user and select the “usage” tab.) It tracks:
    1. Logins (anonymously)
    2. Keyword search terms
    3. Popular templates
    4. Count of custom queries executed
    5. List views by InterMine object type (but not list contents)
    6. Count of lists created, by type

So we have a couple of questions we’d love some feedback on, as we implement Google Analytics in BlueGenes:

  1. Do you use the current analytics? Which, or both?
  2. What would you *like* to record? Here’s a list of ideas

Things that are probably okay to track

  • Pageviews including counts and times – e.g. “17 views for /region-search on Monday the 13th at 10:pm”
  • Logins (anonymously)
  • Visitor location
  • Tools used (e.g. report page tools interacted with)
  • Popular templates
  • Mine used / switched to a different mine

Things we’re not sure about – what do you think?

  • Keyword search contents (anonymously). Pros: interesting analyses like this one. Cons: Could someone avoid InterMine out of fear someone would notice their gene is getting too much attention?
  • List contents (anon, as above).
  • What about mistyped identifier names in list upload?
  • Region search
  • Queries built in the query builder

I’m sure I’ve missed off quite a few things from both lists. We’d love to hear your input and feelings, both with regards to privacy and with ideas about useful trackable events and pages.

 

 

 

 

InterMine Registry

At the beginning of the development of this project, there was no place from where all the up-to-date InterMine instances information like name, url, description, versions, organism, colors, logo, could be retrieved at once. This lead to hard-coded information, and inefficient processes in order to get these data. Motivated by these problems, InterMine Registry idea was conceived. InterMine Registry is a place where all the up-to-date instances information is stored and can be consumed by applications like Blue Genes, iOS, InterMine R, the friendly mine tool or available to everyone who needs it.

The core of InterMine Registry is its RESTful API (http://registry.intermine.org/api-docs/). Running over Node.js integrated with MongoDB, it contains methods (endpoints) to administer the instances on the registry (add, update & delete) and search among them. Maintaining the registry up-to-date is critical. In order to achieve this goal, the Registry provides automatic updates of all the instances every 24 hours. In addition to this, all or one instances can be manually updated by using the API  synchronization methods. It should be noted that in order to administer instances, an authentication process must be done.

To complement the API, a fully responsive front-end web application is being developed (http://registry.intermine.org/), from which everyone can see all the InterMine instances and search among them. Instances are presented in a list and grid view, both of them having the same purpose but with different aspect. Moreover, a world view is presented, from which the users can see the InterMine instances location on a world map. In addition to this, authenticated users can administer the instances (add, update & delete) with a nice user interface.

This project is part of Google Summer of Code, still under development by me, Leonardo Kuffó, undergraduate student at ESPOL university (Guayaquil, Ecuador), under the mentoring of Daniela Butano. The source code of the application can be found at https://github.com/intermine/intermine-registry

 

California Dreaming: InterMine Dev Conf 2017 Report – Day 1

2017’s developer conference has been and gone; time to pay my dues in a blog post or two.

Day 0: Welcome dinner, 29 March 2017

The Cambridge InterMine arrived at Walnut Creek without a hitch, and after a jetlagged attempt at a night’s sleep we sat down to a mega-grant-writing session in the hotel lobby, fuelled by several pots of coffee and plates of nachos.

By 7PM, people had begun to gather in the lobby to head to the inaugural conference dinner at the delicious Walnut Creek Yacht Club. We had to change the venue quite late on in the game, meaning we decided to wander down the street to collect some of the InterMiners who had ended up at the original venue (sorry!!). By the end of the meal, most of the UK contingent was dead on their feet – 10pm California time worked out to be 6am according to our body clocks, so when Joe offered to give several of us a lift back to the hotel, it was impossible to decline.

20170329_221945

Day 1: Workshop Intro

The day started with intros from our PI, Gos, and our host, David Goodstein. 

Josh and I followed up by introducing BlueGenes, the UI we’ve been working on to replace InterMine’s older JSP-based UI. You can view Josh’s slide deck , try out a live demoor browse / check out the source on GitHub.

Next came one of my favourite parts: short talks from InterMiners.

Short community talks

Doppelgangers – Joel Richardson, MGI

Joel gave a great presentation about Doppelgangers in InterMine – that is, occasionally, depending on your data sets and config, you can end up with duplicate or strange / incomplete InterMine objects in your mine. He follows up with explanations of the root causes and mitigation methods – a great resource for any InterMiner who is working in data source integration! 

Genetic data in Mines – Sam Hokin, NCGR/LegFed

Next up was Sam’s talk about his various beany mines, including CowpeaMine, which has only genetics data, rather than the more typical InterMine genomic data. He’s also implemented several custom data visualisations on gene report pages – check out the slides or mines for more details.

JBrowse and Inter-mine communication – Vivek Krishnakumar, JCVI

Vivek focused on some great cross-InterMine collaborations (slides here), including the technical challenges integrating JBrowse into InterMine, as well as a method to link to other InterMines using synteny rather than InterMine’s typical homology approach.

InterMine at JGI – Joe Carlson, Phytozome, JGI

Joe has the privilege to run the biggest InterMine, covering (currently) 72 data sets on 69 organisms. Compared to most InterMines, this is massive! Unsurprisingly, this scale comes with a few hitches many of the other mines don’t encounter. Joe’s slides give a great overview of the problems you might encounter in a large-scale InterMine and their solutions.

Afternoon sessions

FAIR and the semantic web – Daniela & Justin

After a yummy lunch at a nearby cafe, Justin introduced the concept of FAIR, and discussed InterMine’s plans for a FAIRer future (slides). Discussion topics included:

  • How to make stable URIs (InterMine object IDs are transient and will change between builds)
  • Enhanced embedded metadata in webpages and query results (data provenance, licencing)
  • Better Findablility (the F in FAIR) by registering InterMine resources with external registries
  • RDF generation / SPARQL querying

This was followed up by Daniela’s introduction to RDF and SPARQL, which provided a great basic intro to the two concepts in an easily-understood manner. I really loved these slides, and I reckon they’d be a good introduction for anyone interested in learning more about what RDF and SPARQL are, whether or not you’re interested in InterMine .

Extending the InterMine Core Data Model – Sergio

Sergio ran the final session, “Extending the InterMine Core Data Model“. Shared models allow for easier cross-InterMine queries, as demoed in the GO tool prototype:

This discussion raised several interesting talking points:

  • Should model extensions be created via community RFC?
  • If so, who is involved? Developers, community members, curators, other?
  • Homologue or homolog? Who knew a simple “ue” could cause incompatibility problems? Most InterMine use the “ue” variation, with the exception of PhytoMine. An answer to this problem was presented in the “friendly mine” section of Vivek’s talk earlier in the day.

Another great output was Siddartha Basu’s gist on setting up InterMine – outlining some pain points and noting the good bits.

Most of us met up for dinner afterwards at Kevin’s Noodle House – highly recommended for meat eaters, less so for veggies.

A flurry of deadlines: Grants, GSoC, workshops, and more…

We blogged in February commenting that we had a lot of events over the March / April period. Here’s a re-cap:

  • Attending conferences: Amongst the team we attended Bioschemas, the Elixir all-hands, and the Cambridge Scientific Computation Day.
  • InterMine training: We delivered a training workshop about using InterMine at the EBI, part of their Introduction to Omics data integration week-long course.
    • This went well despite a server-room meltdown which conveniently timed itself for the morning of the same day (the training session was in the afternoon, so we thankfully had time to get the servers back up!).
    • In contrast to previous years, every single hand went up when we asked if the participants wrote code as part of their job. Next time, we will try to allow for a longer session on using InterMine web services, rather than the 15 minute slot we allocated this time!
  • Developer Workshop and Hackathon: 5 days in sunny California, spending time with InterMiners from around the world. Longer blog posts to follow, but in the meantime you can browse the agenda for links to slides from each session, or the storify summary of tweets.
  • Google Summer of Code: We’re participating in Google Summer of Code (GSoC) this year (previously) as a mentoring organisation. We had over 50 interested students and 30 distinct applications, many of which were simply brilliant. The deadline for students applying, naturally, was the day after the hackathon, making finding time to provide student feedback a challenge. Maybe there’s a reason to be grateful for jet-lag induced wakefulness at odd hours!
  • Grants: A tale of two grants… :
    • New application: We had a grant application deadline that was, once again, the day after the hackathon. Uh-oh! Feverish figure fixes, tentative typo tweaks and word-count winnowing was squeezed in at every opportunity.
    • Good news about an old application: Meanwhile, we got the news that we’d been fortunate enough to have our hard work pay off: a grant we’d applied for last year as part of the BBSRC BBR 2016 call was agreed to! Hint: the future of InterMine is looking very FAIR, possibly even SPARQLing. More details in a later post.

Events coming up soon:

Goal in sight: better uptime

TL;DR: We’re planning to host our CDN and an InterMine registry on AWS in the future. For more specifics, read on…

Over the course of 2015, we had several note-worthy service outages.

First of all there was a downpour in Cambridge that caused massive flooding, which took out the university’s network connection. The server room actually had to be pumped out. Yikes.

Next, the machine room had a broken chiller – this was so severe that the servers began to heat up almost immediately and we had to go into shutdown straight away whilst repair people worked on it. Services came back up the next morning, but only just in time for our training workshop.

Then came the big nasty DDOS attack against Janet, a higher education network which spans the UK. From the university, we could access some websites, but many more were down. Crucially, no one could reach our servers from the outside, even though this time they were running.

Apart from taking down HumanMine, FlyMine, and our websites, this also took down our CDN, meaning that anyone who hosted their own InterMine but relied upon us for their CDN was also affected by the outages.

Amazon Web Services to the rescue

We’re always working our hardest to make sure things run as smoothly as possible, but sometimes things are beyond our control. This number of outages is obviously less than ideal, and we’d like to do better in 2016.

One solution is to set up your own InterMine CDN, but could we do something to help you on our end, something that doesn’t require extra work for others? After a bit of thinking, we realised we probably could.

So as part of a plan to offer a more reliable service, our CDN will be migrated to live on AWS in the next few months. AWS has a pretty darned good uptime track record. What’s more, Amazon serves a massive chunk of the web – so if AWS ever is down, the likelihood is that much of the web will be down as well, and the outage won’t be surprising to any potential users.

An InterMine registry

Another feature that’s been in planning works for a while is an InterMine registry. Like the CDN, it’s pretty important that a dynamic discoverable registry have near-perfect uptime. So hopefully you can expect to see the hardcoded mine lists disappearing as we start to take advantage of a hosting service with high availability.

Watch this space for the official announcement and release dates for the CDN upgrade and registry release!

A final note about outages

While email was okay when Janet was playing up, the machine room breakdowns took out our mail server, and hence the mailing list – so the “Hey guys, is your server down?” queries didn’t reach us until after everything had recovered.

If you’re aware of a problem, the best place to find out more will be our Twitter account, @intermineorg. We’ll always make an effort to update our twitter feed – by mobile network if necessary – with details of outages, whether planned or emergency.

Featured image courtesy of Torkild Retvedt via Flickr. Licence CC-BY-SA 2.0

BiVi 2015

It’s the first year someone from InterMine has attended BiVi (Biology Visualisation), which is in its second year of a (currently) three-year plan to create a community around biology visualisation.

The atmosphere was pleasantly thriving, with probably 20 or 30 attendees from Cambridge, Edinburgh, London, Oxford, and even Institut Curie in France. I think it was safe to say that most of the attendees were people with a computer science background who worked in biology-related fields, although that wasn’t a strict rule.

Two themes were particularly popular this year:

  1. 3d molecule visualisation. Multiple different talks / groups of BiVians had exciting developments to show us, from Hapitimol, a visualisation with haptic feedback, to EZMol, with an easy-to-use wizard to produce visualisations, Foldsynth, a physics based engine from Goldsmiths, MARender, a Javascript-driven biomedical imaging library, and BioBlox, which gamifies protein docking to allow crowd-sourced research.
  2. Usability. Biology is a rich, complex field. Computer people making tools for biologists need to keeps things easy to use. EZMol, mentioned above, was created due to a notable lack of simple usable 3d visualisation tools. Our upcoming InterMine 2.0 release is another push towards creating a better user experience.

One of the most impressive tools demonstrated was Zegami, a tool to annotate, view, and filter images. It may have started as a biological tool, but it’s equally functional for your holiday snaps or for sorting and filtering Netflix movies. It’s a shame we don’t really have any image data in InterMine (well, in Fly or HumanMine at least) given how cool it looked.

zegami.png
Example from zegami.com: South Australian Museum invertebrates.

A few other tools / demos of note included:

  • Reactome pathway browser, which is fully embeddable into your own web app.
  • Jalview, ‘a free program for multiple sequence alignment editing, visualisation and analysis’. The main Jalview dev is also one of the organisers of the VizBi biology vis conference, next taking place in Germany.
  • Aequatus-vis uses Ensembl web services to visualise homologous gene families.

On day 2, I gave a short talk on the future of InterMine, focusing on why we want to revamp our UI, and what we think we’re doing better in InterMine 2.0. The slides are available on Google Drive (better format) or Slideshare.