The new release includes a better integration with Galaxy: we can import data into Galaxy from any InterMine of our choice (either starting from InterMine or Galaxy), and we can export a list of identifiers from Galaxy to any InterMine of our choice through the InterMine registry. No need to configure anything any more: all the Galaxy properties have been moved to InterMine core. No need to create a mine-specific Galaxy tool anymore, use the NEW intermine tool instead. Please read here for more details. A simple InterMine tutorial will be published soon in the Galaxy Training Material, under the Data Manipulation topic.
This release offers the integration with ELIXIR AAI (Authentication and Authorisation Infrastructure) allowing the researchers to log in the InterMine instances using their ELIXIR profile. You will need:
an ELIXIR identity
register the InterMine client in order to obtain the client-id and the client-secret which must be set in the mine properties file.
GSoC 2019 was fun and I learned a lot from the InterMine Cloud project. In this blog post, I am going to summarise the work that I did on the project. A detailed technical description of all the work done will be published elsewhere.
InterMine is a powerful data warehousing, integration and analysis tool used to store and share genomics data. However, setting up an instance of InterMine is a time consuming and error prone process. It also requires technical knowledge and some familiarity with Java, Postgres, Solr, Perl and shell scripts. These issues create a barrier for entry and friction in adoption of InterMine by the bioinformatics community. To solve these issues, we went back to the drawing board and spent two months planning and searching for simple and feasible solutions.
So, the first thing that we did was packaging InterMine into Docker containers.
Packaging InterMine into Docker containers helped us to reduce required dependencies to set up an InterMine to just two (Docker and Docker Compose). Previously you had to go through tens of pages of InterMine docs to get everything set up and configured correctly to start a new InterMine.
But, packaging InterMine into Docker containers was not a trivial task. Unlike other applications where we can have a single generic container image that can be used by different users, InterMine needs to be custom built for every user. Also, the build requires coordination with other services like Postgres and Solr.
So, instead of having a single Docker image, we now have a set of Docker images that can be orchestrated together to build custom InterMines. These Docker images can be configured easily using environment variables and config files for easier cloud deployments.
Usage instructions for these Docker containers are documented here.
After packaging InterMine in Docker containers, the second thing we did was to write the cloud infrastructure needed for deploying InterMine as Code.
To achieve an easy to use and reproducible cloud infrastructure setup and deployments, we used three technologies: Terraform, Kubernetes and Helm.
Terraform is used to define required infrastructure as code. We now have Terraform scripts that can be used to spin up a Kuberenetes cluster on Google Cloud Platform with correct configs in just minutes.
Kubernetes is a production-grade container orchestration platform. It makes easier to manage containers on cloud.
Helm is like a package manager for Kubernetes. We wrote helm charts for deploying single InterMine instances and also entire InterMine Cloud components. Using these charts, users can deploy a custom InterMine in just minutes now.
Doing all this work standardised the cloud deployment process for InterMine. But, we didn’t stopped here though. We took this one step further, which finally brings us to InterMine Cloud.
InterMine Cloud is a SaaS platform that offers InterMines as a service to its users. It brings a whole new way to use InterMines and makes it accessible to a much larger group of users. We envisioned a completely new user workflow that removes all the technical burden from a user.
The work we did on InterMine Cloud is completely reusable and we encourage others in to community to host their own InterMine Clouds. The diagram below gives you a brief overview of the architecture.
InterMine Cloud has four main components:
Compose is responsible for authentication, authorisation and building custom InterMines using config files generated by InterMine Configurator. It also acts as a proxy to InterMine Configurator and the underlying kubernetes environment.
InterMine Configurator and Wizard
My mentors wrote configurator and wizard. Together they are responsible for generating a mine config that is used by InterMine Compose. Wizard asks a series of relevant question to the user about the data file, which is then processed by configurator to generate a config.
The underlying Kubernetes environment is a standard Kubernetes cluster with few InterMine cloud specific components added. These specific components includes a Solr service and a distributed shared filesystem enabled by Rook.
InterMine cloud is functional but a work in progress. It will take few more weeks to reach alpha. We have planned to add few more features before a public release and also actively looking for community feedback and suggestions.
Our Google Summer of Code students presented their work at a special edition of the community call yesterday. You can catch up on the entire recording on YouTube – or scroll down to see individual presentations. The agenda and notes accompanying the call (including code and slides links) is in Google Docs.
After weeks and weeks of fabulous work, our six Google Summer of Code projects are approaching the finish line. As in previous years (2018, 2017), our students will be sharing their work in a series of 5-minutes presentations at an InterMine Community Call. Everyone from the InterMine community is encouraged to come and see what our fantastic students have been up to.
Joining the call
The call will be on the 14th of August 2019. (Note we previously advertised the call as being on the 15th; this was an error – the call is definitely on Wednesday the 14th of August).
It’s been a while since we posted our last (rather optimistic) update around BlueGenes, so we thought we’d share a quick update, starting with the basics.
As a reminder, the long-term goal of BlueGenes is to replace the existing JSP-based UI with a more modern interface – one that works well with mobiles, one that hopefully responds more quickly and is easier to use, and perhaps most importantly, is easy to update and customise.
Some of the questions we’ve had in the last few months:
Q: Will BlueGenes replace the current JSP UI?
A: Yes, eventually. Once we reach official beta/prod release (we’re currently in alpha), we anticipate running them concurrently for a couple of years, but we probably will only provide small fixes for the JSP UI during this period, focusing most of our development effort on BlueGenes.
Q: Do I have to run my own BlueGenes, or can I use the central one at apps.intermine.org?
A: Since BlueGenes is powered purely by web services, it will probably be possible to run your InterMine as a server/api-only service and use BlueGenes at bluegenes.apps.intermine.org/. You can also run your own BlueGenes on your servers and domains, allowing you to customise it so it’s suitable for your data, and not having to rely on our uptime. Either (or both) should work fine. There will be some version requirements related to what version of InterMine can access all the features of BlueGenes – see the next point.
Q: What version of InterMine do I need to have to run BlueGenes?
A: BlueGenes will require a minimum version of InterMine to run. The original release of InterMine web services focused primarily on providing a way to give JSP users access to their data programmatically, but at the time there wasn’t an anticipated need for application level services such as superuser actions. There are a few web services and authentication-layer services we still need to implement, so it’s likely BlueGenes will need API version 31+ or higher in order to be fully-featured. InterMines with API version 27 or higher can run a basic version of BlueGenes. You can check out this table to see if your InterMine is configured to work with BlueGenes.
Q: Ok, so what’s left to do before BlueGenes is released as a public beta?
A: Mostly authentication, superuser and MyMine features – things like saving and updating personal templates, sorting lists in folders, updating preferences and passwords. Some of these features require updates to InterMine itself in order to work – hence the minimum version noted in the previous question. Once these are ready we’ll move to the public beta stage.
Your input here will be incredibly welcome, too – the more feedback we get early on, the more polished we hope BlueGenes can be.
Q: Will BlueGenes work nicely with HTTPS InterMines?
A: You will be able to run BlueGenes without HTTPS, but in order to avoid inadvertently exposing user passwords, the login button will only be available over HTTPS connections. We’re also working with a student over the next few months, to implement a pilot InterMine Single Sign On service. You can read about it in our interview with Rahul Yadav.
Q: Will I be able to customise the way BlueGenes looks?
A: Totally! There are two ways you can do this. One is to make sure you have your logo and colour settings configured in your web properties. We have a nice guide for that. This’ll tell us what your preferred highlight colours are – FlyMine is purple, HumanMine green, etc. If you’re really dedicated and would like to write your own CSS, you can do that too, if you’re running your own InterMine/BlueGenes combo.
Q: I have some nice custom visualisation tools in my InterMine. I don’t want to have to re-write them!
Update: If you missed the call, the recording is available on YouTube
InterMine runs quarterly Community Outreach calls, targeted to interest people in the overlap between life sciences, open source, and open science, with a few InterMine specific updates sprinkled in as well. Most times we host one or two guest speakers who work in scientific outreach or have done an interesting InterMine-related project we’d like to spotlight. We aim to welcome anyone interested in community outreach and generally try to avoid overly techie themes.
The next outreach call coming up on June 6th, 5PM UK time, with two exciting guest speakers, plus we’ll briefly mention the six Google Summer of Code students working with us over the next few months.
Please help to spread the word by sharing with colleagues and friends – the more the merrier!
Malvika Sharan is a computational biologist and a community outreach coordinator for EMBL Bio-IT, which fosters a community of bioinformaticians. She will be talking about inconclusiveness in Open Science Communities using examples from her work at EMBL and her involvement in The Carpentries, SSI, and Mozilla.
Emmy Tsang: I’m the new Innovation Community Manager at eLife and am now running the eLife Innovation Initiative. I’m going to talk about the #eLifeSprint– our effort to drive and support collaborations in developing open-source software for open science, and the latest developments of Reproducible Document Stack project, which include a roadmap towards sharing reproducible research.
This is our blog series interviewing our 2019 Google Summer of Code students, who will be working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Akshat Bhargava, who will be creating data visualisations for BlueGenes.
Hi Akshat! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?
Apart from coding, I love reading about psychology, history and watching horror movies.
What interested you about GSoC with InterMine?
I feel it magical how numbers show their true faces when seen via a meaningful visualization, and this is why I’m most excited for this summer with InterMine.
Real World Bio Data + Data Viz = Something big coming in! ❤
Another reason for my interest in InterMine, is that I applied to InterMine last year too for Cross InterMine Search Tool and couldn’t make it, but understood it’s community and how they work. The mentors are very helpful and supportive to everyone, so I directly jumped here this year. 😀
Tell us about the project you’re planning to do for InterMine this summer.
InterMine has tons of different types of biological data, this summer I’ll mostly be working on discussing and developing visualizations for data, making it easier to biologists to understand it in a easier way, and draw relevant conclusions with a single sight to the graphs.
As an example of how useful is what I’m doing, one of the visualizations I’ll be developing will help us understand how the expression of a particular gene is distributed among different tissues. This information is helpful for cancer biologists that want to assess if a gene is highly expressed across different tissues of an organism, because that gives a relative picture on to what degree it’s implicated in diseases.
Are there any challenges you anticipate for your project? How do you plan to overcome them?
Data visualization is something that requires you to understand the data properly first in order to be able to actually create some meaningful visualization out of it, and since I’m not very familiar with InterMine’s data model and the related biological terms, I’ll face some difficulties during my thought process of “what and why” to visualize. To overcome this, I’m already exploring more and more of the InterMine’s data model, trying to understand how to deal with different types of data, and how to create the appropriate visualization for them. Mentors are really helping me out with this (overall in terms of tech, viz and everything). 🙂