InterMine Cloud: Making InterMine cloud-native and easing deployments

GSoC 2019 was fun and I learned a lot from the InterMine Cloud project. In this blog post, I am going to summarise the work that I did on the project. A detailed technical description of all the work done will be published elsewhere.

InterMine is a powerful data warehousing, integration and analysis tool used to store and share genomics data. However, setting up an instance of InterMine is a time consuming and error prone process. It also requires technical knowledge and some familiarity with Java, Postgres, Solr, Perl and shell scripts. These issues create a barrier for entry and friction in adoption of InterMine by the bioinformatics community.
To solve these issues, we went back to the drawing board and spent two months planning and searching for simple and feasible solutions.

So, the first thing that we did was packaging InterMine into Docker containers.

InterMine on Docker

Repo: https://github.com/intermine/docker-intermine-gradle
Commits: https://github.com/intermine/docker-intermine-gradle/commits?author=leoank


Packaging InterMine into Docker containers helped us to reduce required dependencies to set up an InterMine to just two (Docker and Docker Compose). Previously you had to go through tens of pages of InterMine docs to get everything set up and configured correctly to start a new InterMine.

But, packaging InterMine into Docker containers was not a trivial task. Unlike other applications where we can have a single generic container image that can be used by different users, InterMine needs to be custom built for every user. Also, the build requires coordination with other services like Postgres and Solr.

So, instead of having a single Docker image, we now have a set of Docker images that can be orchestrated together to build custom InterMines. These Docker images can be configured easily using environment variables and config files for easier cloud deployments.

Usage instructions for these Docker containers are documented here.

After packaging InterMine in Docker containers, the second thing we did was to write the cloud infrastructure needed for deploying InterMine as Code.

InterMine Cloud Infrastructure as Code

Repo: https://github.com/intermine/intermine-cloud
Commits: https://github.com/intermine/intermine-cloud/commits?author=leoank

To achieve an easy to use and reproducible cloud infrastructure setup and deployments, we used three technologies: Terraform, Kubernetes and Helm.

Terraform is used to define required infrastructure as code. We now have Terraform scripts that can be used to spin up a Kuberenetes cluster on Google Cloud Platform with correct configs in just minutes.

Kubernetes is a production-grade container orchestration platform. It makes easier to manage containers on cloud.

Helm is like a package manager for Kubernetes. We wrote helm charts for deploying single InterMine instances and also entire InterMine Cloud components. Using these charts, users can deploy a custom InterMine in just minutes now.

Doing all this work standardised the cloud deployment process for InterMine. But, we didn’t stopped here though. We took this one step further, which finally brings us to InterMine Cloud.

InterMine Cloud

Repos:
Compose: https://github.com/intermine/intermine_compose
Configurator: https://github.com/intermine/intermine_configurator
Wizard: https://github.com/intermine/wizard

Commits:
Compose: https://github.com/intermine/intermine_compose/commits?author=leoank
Configurator: https://github.com/intermine/intermine_configurator/commits?author=leoank
Wizard: https://github.com/intermine/wizard/commits?author=leoank

InterMine Cloud is a SaaS platform that offers InterMines as a service to its users. It brings a whole new way to use InterMines and makes it accessible to a much larger group of users. We envisioned a completely new user workflow that removes all the technical burden from a user.

InterMine Cloud Workflow

The work we did on InterMine Cloud is completely reusable and we encourage others in to community to host their own InterMine Clouds. The diagram below gives you a brief overview of the architecture.

InterMine Cloud Architecture Overview

InterMine Cloud has four main components:

  • InterMine Compose
  • InterMine Configurator
  • Wizard
  • Kubernetes environment

InterMine Compose

Compose is responsible for authentication, authorisation and building custom InterMines using config files generated by InterMine Configurator. It also acts as a proxy to InterMine Configurator and the underlying kubernetes environment.

InterMine Configurator and Wizard

My mentors wrote configurator and wizard. Together they are responsible for generating a mine config that is used by InterMine Compose. Wizard asks a series of relevant question to the user about the data file, which is then processed by configurator to generate a config.

Kubernetes environment

The underlying Kubernetes environment is a standard Kubernetes cluster with few InterMine cloud specific components added. These specific components includes a Solr service and a distributed shared filesystem enabled by Rook.

Future Work

InterMine cloud is functional but a work in progress. It will take few more weeks to reach alpha. We have planned to add few more features before a public release and also actively looking for community feedback and suggestions.

Call recording available: GSoC 2019 Final Presentations

Our Google Summer of Code students presented their work at a special edition of the community call yesterday. You can catch up on the entire recording on YouTube – or scroll down to see individual presentations. The agenda and notes accompanying the call (including code and slides links) is in Google Docs.

Prabodh Kotasthane – Spring Migration

Prabodh’s presentations starts at 3:54: https://youtu.be/ZzV6JmVRQmA?t=234

Slides

Ankur Kumar – InterMine Cloud

Ank’s presentation starts at 13:12: https://youtu.be/ZzV6JmVRQmA?t=792

Laksh Singla – Upgrading imjs & im-tables

Laksh’s presentation starts at 21:08: https://youtu.be/ZzV6JmVRQmA?t=1268

Rahul Yadav – Single Sign-In

Rahul’s presentation starts at 27:39 https://youtu.be/ZzV6JmVRQmA?t=1659

Deepak Kumar – InterMine Schema Validator

Deepak’s presentation starts at 24:11 https://youtu.be/ZzV6JmVRQmA?t=2051

Akshat Bhargava – Data Visualisations

Akshat’s presentation starts at 41:30 https://youtu.be/ZzV6JmVRQmA?t=2490

14 August: InterMine Community Call and Google Summer of Code Final Presentations

After weeks and weeks of fabulous work, our six Google Summer of Code projects are approaching the finish line. As in previous years (2018, 2017), our students will be sharing their work in a series of 5-minutes presentations at an InterMine Community Call. Everyone from the InterMine community is encouraged to come and see what our fantastic students have been up to.

Joining the call

The call will be on the 14th of August 2019. (Note we previously advertised the call as being on the 15th; this was an error – the call is definitely on Wednesday the 14th of August).

Time: 17:00 UK time / 21:30 IST / or check your time zone here: https://arewemeetingyet.com/London/2019-08-14/17:00/Final%20presentations

Agenda and joining instructions: https://docs.google.com/document/d/14KAdYACPowLxcIhOe6yVzeYsHMnSy2X0WzuJ124KZ30/edit#heading=h.x7mc3otkj1bu

Here’s a sneak preview of what our students have been working on:

GSoC Interview: Akshat Bhargava on new data visualisations for BlueGenes

This is our blog series interviewing our 2019 Google Summer of Code students, who will be working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Akshat Bhargava, who will be creating data visualisations for BlueGenes.

Hi Akshat! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

akshat1Hi InterMine team, I’m very excited too for this upcoming summer! I’m a Computer Science undergraduate going to start my 3rd year this August. I’m primarily a Javascript Developer (Web & Hybrid Mobile) and have been working with it for the last 2.5+ years, but the real me is a person who loves to solve problems in general, may they be related to programming or not. I’ve been exploring the field of data visualization for the last few months and I am in love with it. Have a look at IPL (cricket) data viz I created a few months back here.

Apart from coding, I love reading about psychology, history and watching horror movies.

What interested you about GSoC with InterMine?

I feel it magical how numbers show their true faces when seen via a meaningful visualization, and this is why I’m most excited for this summer with InterMine.

Real World Bio Data + Data Viz = Something big coming in! ❤

Another reason for my interest in InterMine, is that I applied to InterMine last year too for Cross InterMine Search Tool and couldn’t make it, but understood it’s community and how they work. The mentors are very helpful and supportive to everyone, so I directly jumped here this year. 😀

Tell us about the project you’re planning to do for InterMine this summer.

InterMine has tons of different types of biological data, this summer I’ll mostly be working on discussing and developing visualizations for data, making it easier to biologists to understand it in a easier way, and draw relevant conclusions with a single sight to the graphs.

There is a software called BlueGenes, which is already developed and helps explore different mines, it provides a tool API which allows Javascript developers to create additional visualization tools on top of it, which can be integrated on any Gene or Protein result page. My goal for this summer is to create a different variety of such visualization tools in order to enrich the visualization of different types of data.

As an example of how useful is what I’m doing, one of the visualizations I’ll be developing will help us understand how the expression of a particular gene is distributed among different tissues. This information is helpful for cancer biologists that want to assess if a gene is highly expressed across different tissues of an organism, because that gives a relative picture on to what degree it’s implicated in diseases.

Are there any challenges you anticipate for your project? How do you plan to overcome them?

Data visualization is something that requires you to understand the data properly first in order to be able to actually create some meaningful visualization out of it, and since I’m not very familiar with InterMine’s data model and the related biological terms, I’ll face some difficulties during my thought process of “what and why” to visualize. To overcome this, I’m already exploring more and more of the InterMine’s data model, trying to understand how to deal with different types of data, and how to create the appropriate visualization for them. Mentors are really helping me out with this (overall in terms of tech, viz and everything). 🙂

Share a meme or gif that represents your project

akshatmeme1

GSoC Interview: Migrating from Struts to Spring with Prabodh Kotasthane

This is our blog series interviewing our 2019 Google Summer of Code students, who working remotely for InterMine for 3 months on on a variety of projects. We’ve interviewed Prabodh Kotasthane, who will be working on a project to migrate InterMine’s RESTful web services from Struts to Spring.

Hi Prabodh! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

I am a final year Computer Science Engineering (B.Tech.) student from Birla Institute of Technology, Mesra, India. I have been coding in  JAVA and related frameworks, C/C++ and python since last 4 years. I have contributed to Open Source Community, some software development projects and hackathon projects at college level. Prabodh
I successfully completed GSoC 2018 with OpenMRS. My project was OAuth Module Enhancement and SMART Apps Support. Details about the project can be found here:

https://pkatgithub.github.io/GSoC-2018-Final-Evaluations/

Presently I am doing internship under Microland Limited, Bengaluru, India till end of May 2019. Here I am working around graph databases and technologies like Apache Kafka and Neo4j with all the coding part done in python.
Apart from coding, I have many other interests and hobbies which include singing, cooking, photography, fine arts, basketball and writing.

What interested you about GSoC with InterMine?

It was around mid Jan this year when I got to know about InterMine.
Previously, I have worked with Java Spring Framework and hence I am comfortable with the same. So, I was searching for GSoC organisations which have something to do with Spring and then I got to know about InterMine and their project in which they were planning to migrate the web-services from Struts to Spring.
I read more about InterMine and also about the project, and I found it interesting. I went through the documentation of the project and joined the discord handle of InterMine so that I could connect with the mentors and the community.
I had a warm welcome into the community. Julie, Daniela and Yo are always excited to chat and exchange thoughts and it’s been such a good time with them till now.
All in all, this community, this project and the people associated to this project made me believe that I can do a GSoC with InteMine!

Tell us about the project you’re planning to do for InterMine this summer.

Presently InterMine uses Struts framework which is outdated. InterMine provides RESTful web-services which facilitates to execute custom or templated queries, search keywords, manage lists, discover metadata, perform enrichment statistics and manage user profiles.
The main objective of this project is to migrate the web-services from Struts to Spring framework and document the APIs with Swagger in compliance with OpenAPI Specifications.
Spring framework is evolving all the time and is more robust and flexible as compared to the Struts framework.
OpenAPI specifications are easy to write and Swagger Codegen, which supports Spring, makes the job of developer easy by generating the code stubs which can be modified to render the services.

Are there any challenges you anticipate for your project? How do you plan to overcome them?

InterMine has a lot of web-services, a total of 70, with various different functionalities.
The business logic of web-services is strongly dependent upon the
classes in webcore and in order to migrate a web-service, the knowledge of underlying logic layer is a must.This is going to a real challenge. It is a requirement to give proper time and understand this business logic layer of the project.
Apart from this, writing tests is also a time taking job. I wish I could get some help in that! 😛

Share a meme or gif that represents your project

PrabodhMeme

GSoC Interview: InterMine Schema Validator with Deepak Kumar

This is our blog series interviewing our 2019 Google Summer of Code students, who working remotely for InterMine for 3 months on on a variety of projects. We’ve interviewed Deepak Kumar, who will be working on the InterMine Schema Validator.

Hi Deepak! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

Hi, Thank you for this opportunity, Let me first talk about myself, My name is Deepak Kumar, I live in Ahmedabad, India with my family. I started coding when I was in 17, I had two great teachers in my school days who introduced me to computer programming, and from that time I got interested in this field.

I completed my graduation in Computer Applications from St. Xavier’s College, Ahmedabad and currently I’m doing  Post Graduate Program MSC.IT(Information Technology) at DA-IICT, Gandhinagar, India.

Now talking about my technical details, I love working on challenging projects, I’ve worked on several projects, One of my favourite project that I created while pursuing my bachelors was ‘Smallscript’, It’s a compiled programming language that compiles to bytecode and runs on JVM that makes it platform-independent. It’s my favourite project because It was challenging and when I started with the project I didn’t know any technical detail about compilers, so I had to start from very scratch.

I’ve also worked with a startup company, where I worked as a backend-developer with a team of 8 people and our team was really fantastic, I worked on two projects there, and I really enjoyed it, working with a big team wonderful experience.

I’ve recently started my open source journey with GSoC 2019. Though I’m new to open source, I’ve started contributing to ‘JabRef’ and as I’m selected for GSoC 2019, I’m also going to work with Intermine this summer, and have future plan to contribute to Intermine after completion of GSOC. I also regularly participate in coding contests and hackathon, In one of the AI contest, I built an AI game that ranked 68 among thousands of participants.

Currently, I’m working at OpenXcell Technolabs as an Intern, which is part of my MSC.IT Master’s program. I love reading, travelling, table-tennis and working with new technologies.

What interested you about GSoC with InterMine?

When GSoC 2019 was about to start, I had already bookmarked a few of the previous year organizations I was interested in, and hoping that Intermine will be part of GSoC 2019 too. When the organization list came out, I was super excited to see Intermine in the list. After going through the Intemine’s idea list, I found myself very interested in ‘Intemine Schema Validator Project’, So it was really the Intermine’s project that made me interested in the community.

Tell us about the project you’re planning to do for InterMine this summer.

I’ll be working on a project named ‘Schema Validator’ for Intermine this summer. Well, the project is quite simple to explain, it’s going to be a library that takes a file as input and outputs whether that file is following a particular schema or not. While working on the project my goal from the first day would be to create this project as general as possible, so that the project can be easily extended to support other schemas as well.

Are there any challenges you anticipate for your project? How do you plan to overcome them?

Yes, there are few challenges that I will face while working on this project, One of the biggest challenges which I’m currently trying to solve is about performance. As the purpose of this project is to validate schema files, then the problem is how will I handle larger files that are filled with the content of like 10GB or more. I need to discuss this problem with my mentors that what is their expectation about the performance of the library.

Currently, I’m thinking about the solution to this problem. Maybe I can boost the performance by concurrently running multiple instances of a Schema Validator, Although it doesn’t matter how I implement it If the library is validating a 10GB file that it is definitely going to take a little amount of time.

Then there are also a few challenges regarding the implementation of the schema rules.

GSoC Interview: Laksh Singla on imjs and imtables upgrades

This is our blog series interviewing our 2019 Google Summer of Code students, who will be working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Laksh Singla, who will be working on upgrading imjs and imtables.

Hi Laksh! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

pic

Hi, I am excited to be a part of the team too!! I am a Computer Science undergraduate student studying at BITS Pilani, India. I will be entering my third year in August. I was originally passionate about web development, but after entering my sophomore year, I was exposed to a wide variety of fields in computer science, and hence my current interest is primarily divided between exploring new web technologies, understanding internals of computer systems and a little bit of data science (read above as I am confused :/ ).

I listen to rock music a lot and my favorite band at the moment (and maybe forever) is Queen. I used to play Basketball too but left it soon after entering college. I am constantly looking to diversify my interests.

What interested you about GSoC with InterMine?

After getting to know about open source, I was determined to actively take part in GSoC. One of the primary reasons why I was interested in InterMine was the friendly and helpful community of mentors and volunteers who enthusiastically answered all my doubts. Moreover, bioinformatics is a field that I have never explored and I thought it would be fun to gain some insight into it without getting much out of my comfort zone.

Tell us about the project you’re planning to do for InterMine this summer.

My project over this summer has multifold tasks, all towards a single goal – maintenance of the im-tables and imjs libraries. Following are the major tasks which I plan to complete over the summers:

  • Upgrade current dependencies of the libraries
  • Improving the test suite of imjs libraries
  • Updating current docs to be more newcomer friendly (user side for imjs, developer side for im-tables
  • Adding a few helper functions to query the intermine-registry data

Are there any challenges you anticipate for your project? How do you plan to overcome them?

One of the serious challenges that I will face would be fully upgrading dependencies of both of the libraries, as it has been a pretty long time since they were last updated and the Javascript/Web ecosystem moves fairly quickly. Mocha (for imjs) and CoffeeScript (for im-tables) on being upgraded broke the library. Although the errors encountered during upgrading Mocha were decent in number (approximately 200 total errors, 5-6 distinct errors), I was able to debug some of them down giving me a little bit of confidence that they could be overcome.

For CoffeeScript however, the whole grunt system has gone obsolete and the error messages are esoteric and non-informative. I am not certain that all of the dependencies for im-tables would be able to get updated, and might require a rehaul of the library, something that is not possible during the timeline stipulated by GSoC. If such a case occurs, I will make sure to create a doc highlighting issues faced, long term goals regarding those pending upgrades and hopefully vulnerabilities present in the old (i.e. currently used) versions of those libraries.

Share a meme or gif that represents your project

unnamed