The new release includes better integration with Galaxy: we can import data into Galaxy from any InterMine of our choice (starting from either InterMine or Galaxy), and we can export a list of identifiers from Galaxy to any InterMine of our choice through the InterMine registry. There is no longer anything to configure: all the Galaxy properties have been moved to InterMine core. There is also no need to create a mine-specific Galaxy tool any more; use the new intermine tool instead. Please read here for more details. A simple InterMine tutorial will be published soon in the Galaxy Training Material, under the Data Manipulation topic.
This release offers integration with the ELIXIR AAI (Authentication and Authorisation Infrastructure), allowing researchers to log in to InterMine instances using their ELIXIR profile. You will need:
an ELIXIR identity
to register the InterMine client in order to obtain the client ID and client secret, which must be set in the mine properties file.
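In practice, registering the client gives you two values to add to your mine's properties file. The property names below are illustrative only; check the InterMine documentation for the exact keys your version expects:

```properties
# Illustrative sketch -- consult the InterMine docs for the exact
# property names used by your InterMine version.
oauth2.providers = ELIXIR
oauth2.ELIXIR.client-id = <your-client-id>
oauth2.ELIXIR.client-secret = <your-client-secret>
```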
GSoC 2019 was fun and I learned a lot from the InterMine Cloud project. In this blog post, I am going to summarise the work that I did on the project. A detailed technical description of all the work done will be published elsewhere.
InterMine is a powerful data warehousing, integration and analysis tool used to store and share genomics data. However, setting up an instance of InterMine is a time-consuming and error-prone process. It also requires technical knowledge and some familiarity with Java, Postgres, Solr, Perl and shell scripts. These issues create a barrier to entry and friction in the adoption of InterMine by the bioinformatics community. To solve them, we went back to the drawing board and spent two months planning and searching for simple and feasible solutions.
So, the first thing that we did was packaging InterMine into Docker containers.
Packaging InterMine into Docker containers helped us reduce the dependencies required to set up an InterMine to just two (Docker and Docker Compose). Previously you had to go through tens of pages of InterMine docs to get everything set up and configured correctly to start a new InterMine.
But packaging InterMine into Docker containers was not a trivial task. Unlike other applications, where a single generic container image can serve different users, InterMine needs to be custom built for every user. The build also requires coordination with other services like Postgres and Solr.
So, instead of having a single Docker image, we now have a set of Docker images that can be orchestrated together to build custom InterMines. These Docker images can be configured easily using environment variables and config files for easier cloud deployments.
Usage instructions for these Docker containers are documented here.
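As a rough sketch, a Compose file wiring these containers together might look like the following. Image names, versions and environment variables here are placeholders, not the real ones; refer to the linked documentation for the actual setup:

```yaml
# Illustrative docker-compose sketch -- image and variable names are
# placeholders; see the InterMine container documentation for the real ones.
version: "3"
services:
  postgres:
    image: postgres:11
    environment:
      POSTGRES_USER: intermine
      POSTGRES_PASSWORD: intermine
  solr:
    image: solr:8
  intermine:
    image: intermine/intermine:latest   # hypothetical image name
    depends_on:
      - postgres
      - solr
    ports:
      - "8080:8080"
```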
After packaging InterMine in Docker containers, the second thing we did was to write the cloud infrastructure needed for deploying InterMine as code.
To achieve an easy to use and reproducible cloud infrastructure setup and deployments, we used three technologies: Terraform, Kubernetes and Helm.
Terraform is used to define the required infrastructure as code. We now have Terraform scripts that can spin up a Kubernetes cluster on Google Cloud Platform with the correct configs in just minutes.
Kubernetes is a production-grade container orchestration platform. It makes it easier to manage containers in the cloud.
Helm is like a package manager for Kubernetes. We wrote Helm charts for deploying single InterMine instances, as well as the entire set of InterMine Cloud components. Using these charts, users can now deploy a custom InterMine in just minutes.
Doing all this work standardised the cloud deployment process for InterMine. But we didn't stop there. We took this one step further, which finally brings us to InterMine Cloud.
InterMine Cloud is a SaaS platform that offers InterMines as a service to its users. It brings a whole new way to use InterMine and makes it accessible to a much larger group of users. We envisioned a completely new user workflow that removes all the technical burden from the user.
The work we did on InterMine Cloud is completely reusable, and we encourage others in the community to host their own InterMine Clouds. The diagram below gives you a brief overview of the architecture.
InterMine Cloud has four main components:
InterMine Compose
Compose is responsible for authentication, authorisation and building custom InterMines using config files generated by the InterMine Configurator. It also acts as a proxy to the InterMine Configurator and the underlying Kubernetes environment.
InterMine Configurator and Wizard
My mentors wrote the Configurator and the Wizard. Together they are responsible for generating a mine config that is used by InterMine Compose. The Wizard asks the user a series of relevant questions about their data file, which the Configurator then processes to generate a config.
The underlying Kubernetes environment is a standard Kubernetes cluster with a few InterMine Cloud-specific components added. These include a Solr service and a distributed shared filesystem enabled by Rook.
InterMine Cloud is functional but a work in progress. It will take a few more weeks to reach alpha. We plan to add a few more features before a public release, and we are actively looking for community feedback and suggestions.
Our Google Summer of Code students presented their work at a special edition of the community call yesterday. You can catch up on the entire recording on YouTube – or scroll down to see individual presentations. The agenda and notes accompanying the call (including code and slides links) are in Google Docs.
We are excited to publish the latest version of InterMine, version 4.0.
It’s a collection of our efforts to make InterMine more “FAIR“. As an open source data warehouse, InterMine’s raison d’être is to be a framework that enables people to quickly and easily provide public access to their data in a user friendly manner. Therefore InterMine has always strived to make data Findable, Accessible, Interoperable and Reusable and this push is meant to formally apply the FAIR principles to InterMine.
What’s included in this release?
Generate globally unique and stable URLs to identify InterMine data objects in order to provide more findable and accessible data.
Apply suitable ontologies to the core InterMine data model to make the semantics of InterMine data explicit and facilitate data exchange and interoperability
Embed metadata in InterMine web pages to make data more findable
Improve accessibility of data licenses for integrated sources via web interface and REST web-service.
More details below!
How to upgrade?
This is a non-disruptive release, but there are additions to the data model. Therefore, you’ll want to increment your version, then build a new database when upgrading. No other action is required.
However, keep reading for how to take advantage of the new FAIR features in this release.
Currently these URLs are used only in the "share" button on report pages and in the web page markup. In the future, this will be the only URL seen in the browser location bar.
For details on how to configure your mine’s URLs, see the docs here.
See our previous blog posts on unique identifiers.
Decorating the InterMine data model with ontology terms
InterMine 4.0 introduces the ability to annotate your InterMine data model with ontology terms.
While these data are not used (yet), it’s an important feature in that it’s going to facilitate cross-InterMine querying, and eventually cross-database analysis — allowing us to answer questions like “Is the ‘gene’ in MouseMine the same ‘gene’ at the EBI?”.
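Concretely, the annotation takes the form of ontology term attributes attached to classes (and fields) in the model. A hedged sketch of what a model additions entry might look like, using the Sequence Ontology term for "gene"; the exact attribute syntax and the term URL for the field are illustrative, so check the docs before copying:

```xml
<!-- Illustrative sketch: annotating the Gene class with an ontology
     term. Check the InterMine docs for the exact additions syntax. -->
<classes>
  <class name="Gene" is-interface="true"
         term="http://purl.obolibrary.org/obo/SO_0000704">
    <!-- hypothetical term URL for the symbol attribute -->
    <attribute name="symbol" type="java.lang.String"
               term="http://example.org/terms/gene-symbol"/>
  </class>
</classes>
```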
For details on how to add ontologies to your InterMine data model, see the docs here.
Embedding metadata in InterMine webpages
We've added structured data to web pages in the form of JSON-LD to make data more findable; these data are indexed by Google Dataset Search. Bioschemas.org is extending Schema.org with life-science-specific types, adding required properties and cardinalities to the types. For more details see the docs here.
By default this feature is disabled. For details on how to enable embedding metadata in your webpages, see the docs here.
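To make the idea concrete, the snippet below builds a small Bioschemas-style JSON-LD object of the kind that would be embedded in a report page's `<script type="application/ld+json">` tag. The type, property names, gene and URL are illustrative; the exact fields an InterMine instance emits may differ:

```python
import json

# Illustrative Bioschemas/Schema.org-style metadata for a gene report page.
# All values here are made up for the sketch.
gene_metadata = {
    "@context": "https://schema.org",
    "@type": "Dataset",  # Bioschemas also defines more specific types
    "name": "Gene PPARG (Homo sapiens)",
    "identifier": "PPARG",
    "url": "https://mymine.example.org/gene:PPARG",  # hypothetical permanent URL
    "includedInDataCatalog": {
        "@type": "DataCatalog",
        "name": "MyMine",
    },
}

# Serialise as it would appear inside the page markup.
json_ld = json.dumps(gene_metadata, indent=2)
print(json_ld)
```

Crawlers such as Google Dataset Search read exactly this kind of block out of the HTML, which is what makes the data findable without any API call.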
In our ongoing effort to make the InterMine system more FAIR, we have started working on improving the accessibility of data licences, retaining licence information supplied by the data sources integrated in InterMine, and making it available to humans via our web application and machines via queries.
For details on how to add data licences to your InterMine, see the docs.
Future FAIR plans
Provide an RDF representation of stored data, lists and query results, plus a bulk download of an entire InterMine in RDF form, in order to allow users to import InterMine resources into their local triplestore
Provide the infrastructure for a SPARQL endpoint where users can perform federated queries over multiple data sets
The next InterMine version will likely be ready in the Fall/Winter and include some user interface updates.
In this blog post I would like to give a brief overview of what I’m currently working on.
Knowledge Transfer Partnership: what & why?
First, to give context to this post: last year InterMine at the University of Cambridge and STORM Therapeutics, a University of Cambridge spin-out working on small-molecule drugs against RNA-modifying enzymes for the treatment of cancer, were awarded a Knowledge Transfer Partnership (KTP) by the UK Government (read this post for more information). The objective of this award is to help STORM Therapeutics advance their efforts in cancer research and contribute to their ultimate goal of drug target validation.
As part of the KTP Award, a KTP Associate needs to be appointed by both the knowledge base (the University of Cambridge) and the company (STORM). The KTP Associate acts as the KTP Project Manager and is in charge of the successful delivery of the project. For this project, I was appointed as the KTP Associate, with a Research Software Engineer / Research Associate role at the University of Cambridge, for the total duration of the project: 3 years.
Machine learning and a new mine: StormMine
Now that you know what the KTP project is about, and who is delivering it, let's move on to more interesting matters. To successfully deliver this project, the idea is to use the InterMine data warehouse to build a knowledge base for the company that enables their scientists to have all the relevant data for their research in a single, integrated place. To this end, several new data sources will be integrated into STORM's deployment of the InterMine data warehouse (StormMine, from now on), and appropriate data visualisations will be added.
Then, once the data is integrated, we can think about analysing it to gather insights that serve the company's goals, such as applying statistical and machine learning methods to extract information from the data and building computational intelligence models. This leads the way towards what I've been working on since my start in February, which will continue until July 2019.
In general terms, I'm currently focused on building machine learning models that learn to differentiate between known drug targets and non-targets from available biological data. This part of the work will also form my Master's thesis, which I hope to deliver in July! Moreover, with this analysis we will be able to answer three questions that are extremely relevant to STORM and that are guiding the current work on the project. These questions are:
Which are the most promising target genes for a cancer type?
Which features are most informative in predicting novel targets?
Given a gene, for which cancer types is it most relevant?
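As a toy illustration of the first question, the sketch below ranks candidate genes by how closely their feature vectors resemble the centroid of known drug targets. This is a deliberately simplified stand-in written in plain Python with invented gene names and feature values, not the actual models used in the project:

```python
# Toy nearest-centroid scorer: rank candidate genes by the distance of their
# feature vectors to the centroid of known drug targets. All names and
# feature values are invented for illustration.
from math import dist

known_targets = {
    "GENE_A": [0.9, 0.8, 0.7],
    "GENE_B": [0.8, 0.9, 0.6],
}
candidates = {
    "GENE_X": [0.85, 0.80, 0.65],
    "GENE_Y": [0.10, 0.20, 0.10],
}

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def rank_candidates(targets, cands):
    """Return candidate names sorted by distance to the target centroid,
    closest (most target-like) first."""
    c = centroid(list(targets.values()))
    return sorted(cands, key=lambda gene: dist(cands[gene], c))

ranking = rank_candidates(known_targets, candidates)
print(ranking)  # GENE_X resembles the known targets more closely than GENE_Y
```

In the real project the features come from the integrated StormMine data and the models are far richer, but the shape of the problem is the same: score unlabelled genes against what is known about validated targets.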
If you are interested in learning more about this work, stay tuned for the next posts, and don't hesitate to contact me, either by email (firstname.lastname@example.org) or by connecting with me on LinkedIn (click here)!
After the fabulous experience we’ve had with GSoC in 2017 and 2018, we’re delighted to announce that we’ll be mentoring again this year. It’s almost impossible to describe the breadth of experience, quality, and insight students bring us every year and we’re so excited to meet a whole new batch of students again in 2019.
If you’re a student interested in working with us, your first port of call is our GSoC site. Most of our students hang out at chat.intermine.org too.
We have a Q&A webinar coming up on March 12, 2019 at 3PM UK time (when is it in your timezone?) where we’ll share tips for good applications, GSoC alumni from previous years will share their experiences, and we’ll briefly describe all of the project ideas and answer any questions. If you can’t make it, add your questions to the agenda before the call and we’ll answer them during the call anyway! Here’s the agenda and joining instructions.
Interested in mentoring?
Generally we expect mentors to come from our community – InterMine users, developers, or previous students. If you fit into one of those categories and want to help mentor, email email@example.com. Not sure if you’d be a good fit? We’re still happy to discuss any ideas!