InterMine Cloud: Making InterMine cloud-native and easing deployments

GSoC 2019 was fun and I learned a lot from the InterMine Cloud project. In this blog post, I am going to summarise the work that I did on the project. A detailed technical description of all the work done will be published elsewhere.

InterMine is a powerful data warehousing, integration and analysis tool used to store and share genomics data. However, setting up an instance of InterMine is a time-consuming and error-prone process. It also requires technical knowledge and some familiarity with Java, Postgres, Solr, Perl and shell scripts. These issues create a barrier to entry and add friction to the adoption of InterMine by the bioinformatics community.
To solve these issues, we went back to the drawing board and spent two months planning and searching for simple and feasible solutions.

So, the first thing that we did was packaging InterMine into Docker containers.

InterMine on Docker

Repo: https://github.com/intermine/docker-intermine-gradle
Commits: https://github.com/intermine/docker-intermine-gradle/commits?author=leoank


Packaging InterMine into Docker containers helped us reduce the dependencies required to set up an InterMine to just two (Docker and Docker Compose). Previously, you had to go through tens of pages of InterMine docs to get everything set up and configured correctly before starting a new InterMine.

But, packaging InterMine into Docker containers was not a trivial task. Unlike other applications where we can have a single generic container image that can be used by different users, InterMine needs to be custom built for every user. Also, the build requires coordination with other services like Postgres and Solr.

So, instead of having a single Docker image, we now have a set of Docker images that can be orchestrated together to build custom InterMines. These Docker images can be configured easily using environment variables and config files for easier cloud deployments.
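As a rough illustration of configuration through environment variables, here is a minimal Python sketch of how a container entrypoint might read its settings with sensible defaults. The variable names (MINE_NAME, PGHOST, PGPORT, SOLR_HOST) and the default values are assumptions made for this example; they are not taken from the docker-intermine-gradle images.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class MineConfig:
    """Settings a build container might need (all names are hypothetical)."""
    mine_name: str
    pg_host: str
    pg_port: int
    solr_host: str


def load_config(env=None) -> MineConfig:
    """Build a MineConfig from environment variables, falling back to defaults."""
    # Accept any mapping so the function can be exercised without touching os.environ.
    env = os.environ if env is None else env
    return MineConfig(
        mine_name=env.get("MINE_NAME", "biotestmine"),
        pg_host=env.get("PGHOST", "postgres"),
        pg_port=int(env.get("PGPORT", "5432")),
        solr_host=env.get("SOLR_HOST", "solr"),
    )
```

Calling load_config() with no arguments reads the real process environment, so the same image can be pointed at different Postgres or Solr services without being rebuilt.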

Usage instructions for these Docker containers are documented here.

After packaging InterMine in Docker containers, the second thing we did was to define the cloud infrastructure needed for deploying InterMine as code.

InterMine Cloud Infrastructure as Code

Repo: https://github.com/intermine/intermine-cloud
Commits: https://github.com/intermine/intermine-cloud/commits?author=leoank

To achieve an easy to use and reproducible cloud infrastructure setup and deployments, we used three technologies: Terraform, Kubernetes and Helm.

Terraform is used to define the required infrastructure as code. We now have Terraform scripts that can be used to spin up a Kubernetes cluster on Google Cloud Platform with the correct configs in just minutes.

Kubernetes is a production-grade container orchestration platform. It makes it easier to manage containers in the cloud.

Helm is like a package manager for Kubernetes. We wrote Helm charts for deploying single InterMine instances and also the entire set of InterMine Cloud components. Using these charts, users can now deploy a custom InterMine in just minutes.

Doing all this work standardised the cloud deployment process for InterMine. But we didn’t stop there. We took this one step further, which finally brings us to InterMine Cloud.

InterMine Cloud

Repos:
Compose: https://github.com/intermine/intermine_compose
Configurator: https://github.com/intermine/intermine_configurator
Wizard: https://github.com/intermine/wizard

Commits:
Compose: https://github.com/intermine/intermine_compose/commits?author=leoank
Configurator: https://github.com/intermine/intermine_configurator/commits?author=leoank
Wizard: https://github.com/intermine/wizard/commits?author=leoank

InterMine Cloud is a SaaS platform that offers InterMines as a service to its users. It brings a whole new way to use InterMines and makes it accessible to a much larger group of users. We envisioned a completely new user workflow that removes all the technical burden from a user.

InterMine Cloud Workflow

The work we did on InterMine Cloud is completely reusable and we encourage others in the community to host their own InterMine Clouds. The diagram below gives you a brief overview of the architecture.

InterMine Cloud Architecture Overview

InterMine Cloud has four main components:

  • InterMine Compose
  • InterMine Configurator
  • Wizard
  • Kubernetes environment

InterMine Compose

Compose is responsible for authentication, authorisation and building custom InterMines using config files generated by InterMine Configurator. It also acts as a proxy to InterMine Configurator and the underlying Kubernetes environment.

InterMine Configurator and Wizard

My mentors wrote Configurator and Wizard. Together they are responsible for generating a mine config that is used by InterMine Compose. Wizard asks the user a series of relevant questions about the data file, and Configurator then processes the answers to generate a config.

Kubernetes environment

The underlying Kubernetes environment is a standard Kubernetes cluster with a few InterMine Cloud-specific components added. These components include a Solr service and a distributed shared filesystem enabled by Rook.

Future Work

InterMine Cloud is functional but still a work in progress. It will take a few more weeks to reach alpha. We plan to add a few more features before a public release, and we are actively looking for community feedback and suggestions.

Call recording available: GSoC 2019 Final Presentations

Our Google Summer of Code students presented their work at a special edition of the community call yesterday. You can catch up on the entire recording on YouTube – or scroll down to see individual presentations. The agenda and notes accompanying the call (including code and slides links) are in Google Docs.

Prabodh Kotasthane – Spring Migration

Prabodh’s presentation starts at 3:54: https://youtu.be/ZzV6JmVRQmA?t=234

Slides

Ankur Kumar – InterMine Cloud

Ank’s presentation starts at 13:12: https://youtu.be/ZzV6JmVRQmA?t=792

Laksh Singla – Upgrading imjs & im-tables

Laksh’s presentation starts at 21:08: https://youtu.be/ZzV6JmVRQmA?t=1268

Rahul Yadav – Single Sign-In

Rahul’s presentation starts at 27:39: https://youtu.be/ZzV6JmVRQmA?t=1659

Deepak Kumar – InterMine Schema Validator

Deepak’s presentation starts at 34:11: https://youtu.be/ZzV6JmVRQmA?t=2051

Akshat Bhargava – Data Visualisations

Akshat’s presentation starts at 41:30: https://youtu.be/ZzV6JmVRQmA?t=2490

14 August: InterMine Community Call and Google Summer of Code Final Presentations

After weeks and weeks of fabulous work, our six Google Summer of Code projects are approaching the finish line. As in previous years (2018, 2017), our students will be sharing their work in a series of 5-minute presentations at an InterMine Community Call. Everyone from the InterMine community is encouraged to come and see what our fantastic students have been up to.

Joining the call

The call will be on the 14th of August 2019. (Note we previously advertised the call as being on the 15th; this was an error – the call is definitely on Wednesday the 14th of August).

Time: 17:00 UK time / 21:30 IST / or check your time zone here: https://arewemeetingyet.com/London/2019-08-14/17:00/Final%20presentations

Agenda and joining instructions: https://docs.google.com/document/d/14KAdYACPowLxcIhOe6yVzeYsHMnSy2X0WzuJ124KZ30/edit#heading=h.x7mc3otkj1bu

Here’s a sneak preview of what our students have been working on:

GSoC Interview: Akshat Bhargava on new data visualisations for BlueGenes

This is our blog series interviewing our 2019 Google Summer of Code students, who will be working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Akshat Bhargava, who will be creating data visualisations for BlueGenes.

Hi Akshat! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

Hi InterMine team, I’m very excited for this upcoming summer too! I’m a Computer Science undergraduate, starting my 3rd year this August. I’m primarily a JavaScript developer (web & hybrid mobile) and have been working with it for the last 2.5+ years, but the real me is a person who loves to solve problems in general, whether or not they are related to programming. I’ve been exploring the field of data visualization for the last few months and I am in love with it. Have a look at the IPL (cricket) data viz I created a few months back here.

Apart from coding, I love reading about psychology, history and watching horror movies.

What interested you about GSoC with InterMine?

I find it magical how numbers show their true faces when seen through a meaningful visualization, and this is why I’m most excited for this summer with InterMine.

Real World Bio Data + Data Viz = Something big coming in! ❤

Another reason for my interest in InterMine is that I applied last year too, for the Cross InterMine Search Tool project, and couldn’t make it, but I got to understand its community and how they work. The mentors are very helpful and supportive to everyone, so I came straight here this year. 😀

Tell us about the project you’re planning to do for InterMine this summer.

InterMine has tons of different types of biological data. This summer I’ll mostly be discussing and developing visualizations for that data, making it easier for biologists to understand and to draw relevant conclusions from a single glance at the graphs.

There is an existing application called BlueGenes, which helps users explore different mines. It provides a tool API that allows JavaScript developers to create additional visualization tools on top of it, which can be integrated into any Gene or Protein result page. My goal for this summer is to create a variety of such visualization tools in order to enrich the visualization of different types of data.

As an example of how useful this work can be, one of the visualizations I’ll be developing will help us understand how the expression of a particular gene is distributed among different tissues. This information is helpful for cancer biologists who want to assess whether a gene is highly expressed across different tissues of an organism, because that gives a relative picture of the degree to which it’s implicated in diseases.

Are there any challenges you anticipate for your project? How do you plan to overcome them?

Data visualization requires you to understand the data properly first in order to create a meaningful visualization out of it, and since I’m not very familiar with InterMine’s data model and the related biological terms, I’ll face some difficulties when deciding “what and why” to visualize. To overcome this, I’m already exploring more and more of InterMine’s data model, trying to understand how to deal with different types of data and how to create the appropriate visualization for them. The mentors are really helping me out with this (overall in terms of tech, viz and everything). 🙂

Share a meme or gif that represents your project

[Image: meme]

GSoC Interview: Migrating from Struts to Spring with Prabodh Kotasthane

This is our blog series interviewing our 2019 Google Summer of Code students, who will be working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Prabodh Kotasthane, who will be working on a project to migrate InterMine’s RESTful web services from Struts to Spring.

Hi Prabodh! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

I am a final-year Computer Science Engineering (B.Tech.) student at Birla Institute of Technology, Mesra, India. I have been coding in Java and related frameworks, C/C++ and Python for the last 4 years. I have contributed to the open source community, to some software development projects and to hackathon projects at college level.
I successfully completed GSoC 2018 with OpenMRS. My project was OAuth Module Enhancement and SMART Apps Support. Details about the project can be found here:

https://pkatgithub.github.io/GSoC-2018-Final-Evaluations/

At present I am doing an internship at Microland Limited, Bengaluru, India, until the end of May 2019. Here I am working with graph databases and technologies like Apache Kafka and Neo4j, with all the coding done in Python.
Apart from coding, I have many other interests and hobbies, which include singing, cooking, photography, fine arts, basketball and writing.

What interested you about GSoC with InterMine?

It was around mid-January this year when I got to know about InterMine.
I have worked with the Java Spring Framework before and am comfortable with it. So I was searching for GSoC organisations that have something to do with Spring, and that is how I got to know about InterMine and their project to migrate the web services from Struts to Spring.
I read more about InterMine and about the project, and I found it interesting. I went through the documentation of the project and joined InterMine’s Discord so that I could connect with the mentors and the community.
I had a warm welcome into the community. Julie, Daniela and Yo are always excited to chat and exchange thoughts, and it’s been such a good time with them so far.
All in all, this community, this project and the people associated with this project made me believe that I can do a GSoC with InterMine!

Tell us about the project you’re planning to do for InterMine this summer.

At present, InterMine uses the Struts framework, which is outdated. InterMine provides RESTful web services that make it possible to execute custom or templated queries, search keywords, manage lists, discover metadata, perform enrichment statistics and manage user profiles.
The main objective of this project is to migrate the web services from Struts to the Spring framework and document the APIs with Swagger in compliance with the OpenAPI Specification.
The Spring framework is evolving all the time and is more robust and flexible than the Struts framework.
OpenAPI specifications are easy to write, and Swagger Codegen, which supports Spring, makes the developer’s job easier by generating code stubs that can be modified to implement the services.

Are there any challenges you anticipate for your project? How do you plan to overcome them?

InterMine has a lot of web-services, a total of 70, with various different functionalities.
The business logic of the web services depends strongly on the classes in webcore, so in order to migrate a web service, knowledge of the underlying logic layer is a must. This is going to be a real challenge. It will require proper time to understand this business logic layer of the project.
Apart from this, writing tests is also a time-consuming job. I wish I could get some help with that! 😛

Share a meme or gif that represents your project

[Image: meme]

GSoC Interview: InterMine Schema Validator with Deepak Kumar

This is our blog series interviewing our 2019 Google Summer of Code students, who will be working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Deepak Kumar, who will be working on the InterMine Schema Validator.

Hi Deepak! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

Hi, thank you for this opportunity. Let me first talk about myself. My name is Deepak Kumar, and I live in Ahmedabad, India with my family. I started coding when I was 17. I had two great teachers in my school days who introduced me to computer programming, and from that time I got interested in this field.

I completed my undergraduate degree in Computer Applications at St. Xavier’s College, Ahmedabad, and I’m currently doing the postgraduate M.Sc. IT (Information Technology) programme at DA-IICT, Gandhinagar, India.

As for my technical background, I love working on challenging projects, and I’ve worked on several. One of my favourite projects, which I created while pursuing my bachelor’s degree, was ‘Smallscript’. It’s a programming language that compiles to bytecode and runs on the JVM, which makes it platform-independent. It’s my favourite project because it was challenging: when I started, I didn’t know any technical details about compilers, so I had to start from scratch.

I’ve also worked with a startup company as a backend developer in a team of 8 people, and our team was really fantastic. I worked on two projects there and really enjoyed it; working with a big team was a wonderful experience.

I’ve recently started my open source journey with GSoC 2019. Though I’m new to open source, I’ve started contributing to ‘JabRef’, and as I’ve been selected for GSoC 2019, I’m also going to work with InterMine this summer. I plan to keep contributing to InterMine after GSoC is over. I also regularly participate in coding contests and hackathons. In one AI contest, I built an AI game that ranked 68th among thousands of participants.

Currently, I’m working at OpenXcell Technolabs as an intern, which is part of my M.Sc. IT programme. I love reading, travelling, table tennis and working with new technologies.

What interested you about GSoC with InterMine?

When GSoC 2019 was about to start, I had already bookmarked a few of the previous year’s organizations that I was interested in, and I was hoping that InterMine would be part of GSoC 2019 too. When the organization list came out, I was super excited to see InterMine on it. After going through InterMine’s ideas list, I found myself very interested in the ‘InterMine Schema Validator’ project. So it was really the project that got me interested in the community.

Tell us about the project you’re planning to do for InterMine this summer.

I’ll be working on a project named ‘Schema Validator’ for InterMine this summer. The project is quite simple to explain: it’s going to be a library that takes a file as input and reports whether that file follows a particular schema. My goal from the first day will be to make the project as general as possible, so that it can easily be extended to support other schemas as well.

Are there any challenges you anticipate for your project? How do you plan to overcome them?

Yes, there are a few challenges that I will face while working on this project. One of the biggest, which I’m currently trying to solve, is performance. As the purpose of this project is to validate files against a schema, the problem is how to handle larger files of 10GB or more. I need to discuss with my mentors what they expect of the library’s performance.

Currently, I’m thinking about the solution to this problem. Maybe I can boost performance by running multiple instances of the Schema Validator concurrently. However I implement it, though, validating a 10GB file is definitely going to take some time.

Then there are also a few challenges regarding the implementation of the schema rules.
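To make the large-file concern above concrete, here is a minimal Python sketch of one common approach: validating a file line by line, so that memory use stays roughly constant however large the input is. This is an illustration only and not the Schema Validator's actual design. The tab-separated format, the fixed column count and the function names are all assumptions made for the example.

```python
from typing import Iterable, Iterator, List, Tuple


def validate_lines(lines: Iterable[str], expected_columns: int) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, message) for every line that breaks the rule.

    Lines are consumed one at a time, so the whole file is never held in memory.
    The single rule checked here (a fixed number of tab-separated columns) is a
    stand-in for real schema rules.
    """
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and comment lines (a hypothetical convention).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != expected_columns:
            yield number, f"expected {expected_columns} columns, found {len(fields)}"


def validate_file(path: str, expected_columns: int) -> List[Tuple[int, str]]:
    """Validate a file on disk, streaming it through validate_lines."""
    with open(path, encoding="utf-8") as handle:
        return list(validate_lines(handle, expected_columns))
```

Because validate_lines is a generator, a caller can stop at the first error or collect all of them, and independent chunks of a file could be handed to separate workers if concurrency turns out to be worthwhile.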

GSoC Interview: Laksh Singla on imjs and imtables upgrades

This is our blog series interviewing our 2019 Google Summer of Code students, who will be working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Laksh Singla, who will be working on upgrading imjs and imtables.

Hi Laksh! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

Hi, I am excited to be a part of the team too!! I am a Computer Science undergraduate student studying at BITS Pilani, India. I will be entering my third year in August. I was originally passionate about web development, but after entering my sophomore year, I was exposed to a wide variety of fields in computer science, and hence my current interest is primarily divided between exploring new web technologies, understanding internals of computer systems and a little bit of data science (read above as I am confused :/ ).

I listen to rock music a lot and my favorite band at the moment (and maybe forever) is Queen. I used to play Basketball too but left it soon after entering college. I am constantly looking to diversify my interests.

What interested you about GSoC with InterMine?

After getting to know about open source, I was determined to actively take part in GSoC. One of the primary reasons why I was interested in InterMine was the friendly and helpful community of mentors and volunteers who enthusiastically answered all my doubts. Moreover, bioinformatics is a field that I have never explored and I thought it would be fun to gain some insight into it without getting much out of my comfort zone.

Tell us about the project you’re planning to do for InterMine this summer.

My project this summer has several tasks, all directed towards a single goal: maintenance of the im-tables and imjs libraries. The major tasks I plan to complete over the summer are:

  • Upgrading the current dependencies of the libraries
  • Improving the test suite of the imjs library
  • Updating the current docs to be more newcomer-friendly (user side for imjs, developer side for im-tables)
  • Adding a few helper functions to query the intermine-registry data

Are there any challenges you anticipate for your project? How do you plan to overcome them?

One of the serious challenges I will face is fully upgrading the dependencies of both libraries, as it has been a long time since they were last updated and the JavaScript/web ecosystem moves fairly quickly. Upgrading Mocha (for imjs) and CoffeeScript (for im-tables) broke the libraries. Although upgrading Mocha produced a fair number of errors (approximately 200 in total, of 5-6 distinct kinds), I was able to debug some of them, which gives me a little confidence that they can be overcome.

For CoffeeScript, however, the whole Grunt build system has become obsolete and the error messages are esoteric and uninformative. I am not certain that all of the dependencies of im-tables can be updated; doing so might require an overhaul of the library, which is not possible within the GSoC timeline. If that turns out to be the case, I will make sure to create a doc highlighting the issues faced, the long-term goals for the pending upgrades and, hopefully, the vulnerabilities present in the old (i.e. currently used) versions of those libraries.

Share a meme or gif that represents your project

[Image: meme]

GSoC Interview: Ankur Kumar on putting InterMine in the cloud

This is our blog series interviewing our 2019 Google Summer of Code students, who will be working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Ankur Kumar, who will be working on the project “InterMine Cloud: Making InterMine cloud-native and easing deployments”.

Hi Ankur! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

Namaste everyone! I am a second-year undergraduate student at the Indian Institute of Engineering Science and Technology, Shibpur. I am pursuing a Bachelor’s degree in Mechanical Engineering.

Honestly, introducing myself properly is always hard for me. I do not associate myself with a single identity of a particular subject, stream of study or profession. I design bike frames, refrigeration systems and power generation plants. But I also code control algorithms for the motors that power those bikes, and path planning algorithms that are used by autonomous bikes and robots. I grow plants in controlled environments with the help of various sensors and actuators to enhance their yield and study their response to different stresses, and I connect those sensors to the cloud as IoT devices to analyse the collected data. I have a huge interest in commerce, the workings of businesses and financial markets, and I spend a good amount of my time learning about these things.

This list is not exhaustive. As a mandatory disclaimer, I have not figured out everything yet about the things that I just mentioned. I hope that one day I will, and then I will move on to new projects. So, to put it poetically, I am a curious explorer who is ready to embark on any journey without even knowing the destination, as long as the journey has plenty of surprises to momentarily satisfy my curiosity.

I know what you are thinking after reading this: why and how do you do all this? (Other than that I am too ambitious, a show-off or just insane 😅) Well, I do not have a proper or detailed answer to these questions. I just keep trying to do things and they eventually happen. But I have a better question for everyone instead: why not? It is too much fun to live this way. I promise!

What interested you about GSoC with InterMine?

I always wanted to work on a project at the intersection of computer science and biology. Both of these fields attract me equally. I had a really hard time choosing between them when I was filling in my admission form for senior secondary. I eventually went for biology, if you are wondering. InterMine is a perfect place for me to explore both of these fields. But this is not the most important thing that made me choose InterMine. The most important thing is the people. InterMine has an awesome and very friendly community. The mentors are very supportive and responsive, and I had a great experience discussing the details of my project with them. I can confidently say that my mentors are the best. If anyone thinks otherwise, I am ready for a debate!!

Tell us about the project you’re planning to do for InterMine this summer.

My project forms part of the InterMine team’s larger effort to make InterMine more accessible to its users. More specifically, my project aims to create a service that offers managed InterMine instances in the cloud. The work done on my project will also be used to create a CLI tool that eases the creation of InterMine instances locally, using the same cloud technologies.

Are there any challenges you anticipate for your project? How do you plan to overcome them?

The most important one is time. I have a long list of tasks that need to be completed. I also need to coordinate with two other projects, which can be tricky. To overcome these challenges, I worked hard to come up with a very detailed timeline and design documentation. So now my plan for the coding period is simple: while tasks remain, pick one task at a time, work hard on it, complete it on time and then party hard on weekends.

Share a meme or gif that represents your project

[Image: Replacing a lightbulb]

GSoC Student Interview spotlight: Single Sign-in for InterMine + Rahul Yadav

This is our blog series interviewing our 2019 Google Summer of Code students, who will be working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Rahul Yadav, who will be working on the InterMine single sign-in project.

Hi Rahul! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

Hi! Excited to be on the team. I am a third-year undergraduate student, pursuing my Bachelor of Technology in Computer Science at USICT (GGSIPU, Delhi). I love being in front of my laptop. I can certainly spend more time writing code than doing anything else, but football and basketball have always been an exception.

I have done many projects during the past academic year in order to use and explore my skill set. I have always loved contributing to open source because it is such a huge community of amazing developers who are always there to help you out.
Apart from this, I worked on an OAuth2 implementation during my internship last summer, where I used Java to connect Google services such as Drive and Hangouts with the company codebase. I have always been fascinated by cloud services, so I have kept working with GCP, AWS, Azure and others.

What interested you about GSoC with InterMine?

To be honest, I never thought I would get an opportunity to work with a community like InterMine. But when I saw the list of projects, it intrigued me, and I found this very interesting single sign-in project, whose requirements and tech seemed very familiar to me. Because of that, I kept digging into the project requirements and did lots of research on it, and with every minute spent my interest grew. Eureka! I finally came up with a solution, which helped me become a part of this amazing community.

Tell us about the project you’re planning to do for InterMine this summer.

At the moment, a user logs in to the desired InterMine and saves their results and data. The problem arises when the same user wants to access a different InterMine: they have to register again on the new mine and log in again. Currently, the InterMine community does not have a single common sign-in mechanism, so each mine authenticates users either with tokens (temporary and permanent) or through Google sign-in. This project will modify the existing token mechanism by making InterMine an OAuth2 provider, with a single common authorization server for all 30 mines, so that a user can access all the mines with a single set of credentials, i.e. just one registration.

Are there any challenges you anticipate for your project? How do you plan to overcome them?

This project is related to security, and above all it deals with user credentials, which means a single wrong piece of logic or a single wrong step can expose our users. Implementing a fully secure system is therefore a major challenge for this project.

I’m going to consider all the possible threats and vulnerabilities during the development phase of the system, and will focus on lots of testing and debugging to search for any kind of loophole and fix it before deployment.

Share a meme or gif that represents your project

 

 

GSoC 2019 with InterMine is ON!

After the fabulous experience we’ve had with GSoC in 2017 and 2018, we’re delighted to announce that we’ll be mentoring again this year. It’s almost impossible to describe the breadth of experience, quality, and insight students bring us every year and we’re so excited to meet a whole new batch of students again in 2019.

Prospective student?

If you’re a student interested in working with us, your first port of call is our GSoC site. Most of our students hang out at chat.intermine.org too.

We have a Q&A webinar coming up on March 12, 2019 at 3PM UK time (when is it in your timezone?) where we’ll share tips for good applications, GSoC alumni from previous years will share their experiences, and we’ll briefly describe all of the project ideas and answer any questions. If you can’t make it, add your questions to the agenda before the call and we’ll answer them during the call anyway! Here’s the agenda and joining instructions.

Interested in mentoring?

Generally we expect mentors to come from our community – InterMine users, developers, or previous students. If you fit into one of those categories and want to help mentor, email yo@intermine.org. Not sure if you’d be a good fit? We’re still happy to discuss any ideas!