Levelling up: From GSoC student to mentor

We’re really proud of our ongoing engagement with GSoC students from previous years, and we always encourage our students to stay involved in any way that suits them, from writing papers about their work to taking summer internships in the office and even joining the team. Here, we’ve interviewed Aman Dwivedi, Arunan Sugunakumar, and Adrián Rodríguez-Bazaga, all of whom were mentors in 2019 and came with the special perspective of having been InterMine students in 2018. It won’t be long until we’re thinking about GSoC 2020!

Hi all – thanks for volunteering to be interviewed! What motivated you to return as a mentor after having been a student?

Adrián: As a result of being a student under the InterMine umbrella during GSoC 2018, I gained invaluable skills that contributed to my professional career, and eventually to getting a job at the mentor organization itself. One of these skills is the ability to communicate, cooperate and, in general terms, work with a software development organization in an international setting. This is a highly demanded skill – both in industry and in academia – that I couldn’t really get anywhere else before GSoC.

Secondly, the opportunity to learn how to contribute back to an open-source project with a (huge) codebase and a decent number of contributors, both with code and ideas, was a unique chance to add this top-tier ability to my skill-stack. Because of the high impact that GSoC had on my career, I wanted to go back and help other prospective students by mentoring them and sharing my experience, something that my current position at InterMine has helped me to do.

Aman: As a developer, I think we often use open source software and we don’t really get a chance to give back to the community. It becomes difficult to keep contributing to open source in our day to day professional work. Being a part of GSoC in the past, I have realised the importance of open source projects and the communities running them. Returning as a mentor for GSoC this year gave me a reason and a chance to contribute again. I always wanted to be a part of the GSoC journey again and this gave me an opportunity to welcome new contributors to the community.

Arunan: Being part of an organisation which is on the other side of the planet is always an exciting thing to do. I understood the full meaning of the term ‘Globalization’ when I was a student at InterMine last year, thanks to GSoC. I loved our meetings, the guidance I received, the project outcome and the level of satisfaction I got. I wanted to have the same experience again this year as a mentor with the organisation I am familiar with.

Did you feel like you had any special insights into what students were going through, having been in the same position in previous years?

Adrián: Having been in the same situation as the students during GSoC was indeed very helpful for finding and understanding the needs that they might have. To illustrate, one common difficulty among already-accepted GSoC students arises when they are unsure how to continue their progress through the program – either how to get past obstacles or how to contribute new features. They often don’t feel “brave” enough to ask the mentor about those problems directly, and instead prefer to find their way through independently, maybe because some of them feel that asking how to proceed or fix something is a “signal of lack of knowledge”. In my opinion this is totally wrong, as mentors are there precisely to help you get around these situations!

Aman: Having gone from being a GSoC student to stepping into the shoes of a GSoC mentor, I was already aware of the problems faced by a student. Being a first-time contributor in an open source organisation is just like entering a room full of unknown people. Sometimes the student might not know when to ask for help or feedback. Communication becomes the main barrier in such cases.

Arunan: As a student, the hardest part was selecting an organisation and working with them before submitting a proposal. GSoC has gained more and more popularity over the years and the competition is very tough. This might discourage many students and they might postpone their idea of participating in GSoC to the following year. Students should learn to overcome this fear and start trying. Once you have passed a threshold point of getting to know the organisation, the path becomes clear and easy. Once you reach this point, you get all the motivation in the world to start and complete the project because it is an exciting journey.

What advice would you give to a student who is applying for GSoC? Is there something you’d go back and tell yourself when you were a student? 

Adrián: In my view, and re-iterating what I’ve stated in my answer to the previous question, I encourage students to communicate with mentors constantly, and ask about any issue that may arise during the program, while still keeping a high degree of independence.

Aman: GSoC is about open source communities. The student should keep in mind that his/her code would be used by a lot of people all over the world. Each and every aspect of the student’s work has a great impact on a lot of people and a lot of dependent projects. With this thought, comes a great responsibility of ownership. The student should work passionately and should ask for feedback and suggestions from other community members to enhance his/her work.

What tips would you give to first-time mentors? 

Adrián: For first-time mentors, I strongly advise being proficient enough with the tech stack and having a clear idea of what the desired output from the project is – especially if the project has not been proposed by you – so that you are able to guide the student through the program. In addition to that, make sure to stay in close communication with at least one senior mentor in the organization, so that any matters that arise can be cleared up.

Aman: Mentors should understand the project thoroughly. Understanding the various components of the project is extremely necessary. One should be in sync with the core team of the organisation and should discuss the expectations for the project. Selection of students is the most important part of GSoC. It is always better to discuss the various students with the other team members before making the final selection.

Arunan: Mentoring might seem hard, especially if you are not part of the internal InterMine team. But if you are comfortable with the project and the tech stack, then mentoring won’t be a problem. Mentors need to be up-to-date on the project all the time and should have some patience when the student struggles. If you are a first-time mentor, it is better to co-mentor with a person who is in the internal InterMine team so that decision making is easy and aligns with the future work of the organisation.

Interested in participating as a mentor or student yourself?

Mentoring: If you’re interested in mentoring, please email yo@intermine.org to discuss your project ideas. Generally we expect mentors to be known to us and/or have had some involvement in the InterMine community before participating as a mentor. You can also read through our Guidance for Mentors.

Interested student / intern: Check out our guide for student applicants. In 2020 we may well be participating in Outreachy as well as GSoC – so you don’t have to be a student to apply!

InterMine 4.1.1 – patch release

We’ve released a small batch of bug fixes and added the Code of Conduct.

Thank you to our contributor Asher Pasha (ThaleMine).

Fixes

  • ncbi-gff bio source updated due to data change
  • intermine plugin updated to allow you to build and deploy your InterMine instance using Gradle 4.9. To update the Gradle version on your mine, please read the upgrade instructions
  • merged PRs from Asher Pasha (ThaleMine) aimed at streamlining ThaleMine production.

This is a non-disruptive release.

See release notes for detailed information.

InterMine 4.1.0

The InterMine team has just released InterMine 4.1.0.

The new release includes better integration with Galaxy: we can import data into Galaxy from any InterMine of our choice (starting from either InterMine or Galaxy), and we can export a list of identifiers from Galaxy to any InterMine of our choice through the InterMine registry. There is no need to configure anything any more: all the Galaxy properties have been moved to InterMine core. There is also no need to create a mine-specific Galaxy tool; use the NEW intermine tool instead. Please read here for more details. A simple InterMine tutorial will be published soon in the Galaxy Training Material, under the Data Manipulation topic.

This release offers integration with ELIXIR AAI (Authentication and Authorisation Infrastructure), allowing researchers to log in to InterMine instances using their ELIXIR profile. You will need:

  1. an ELIXIR identity
  2. a registered InterMine client, which gives you the client-id and the client-secret that must be set in the mine properties file.

More details here in the OpenAuth2 Settings section of the documentation.
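As a rough illustration of the second requirement, the sketch below checks whether a properties file contains the client credentials. The key names (`oauth2.providers`, `oauth2.ELIXIR.client-id`, `oauth2.ELIXIR.client-secret`) and the helper functions are assumptions made for this example, not taken from the release itself, so please confirm the exact names in the documentation before relying on them.

```python
# Sketch: check that a mine properties file sets the OAuth2 client credentials.
# The key names below are assumptions for illustration; confirm them in the docs.
REQUIRED_KEYS = (
    "oauth2.providers",
    "oauth2.ELIXIR.client-id",
    "oauth2.ELIXIR.client-secret",
)


def parse_properties(text):
    """Parse simple `key = value` lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_oauth2_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in the properties text."""
    props = parse_properties(text)
    return [key for key in required if not props.get(key)]


# Hypothetical excerpt from a mine properties file (values are placeholders).
example = """
oauth2.providers = ELIXIR
oauth2.ELIXIR.client-id = my-client-id
"""
print(missing_oauth2_keys(example))
```

Here the excerpt leaves out the client secret, so the helper reports that one key as missing.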

Also new in this version is the Gradle wrapper 4.9, which is compatible with Java 11. This only affects users who compile/install InterMine code.

Thank you so much to our contributor Joe Carlson for improving the generateUpdateTriggers task.

The release also contains a few bug fixes.

Bug Fixes

  • Solved the error caused by obsolete terms in the gene ontology
  • Fasta query result: CDS translation option + extra view parameter
  • The ONE OF constraint works properly when editing a template
  • The default queries configuration has been migrated to JSON
  • The task generateUpdateTriggers has been improved

See the release notes for the complete list and detailed information.

This is a non-disruptive release. To update your mine with these new changes, see the upgrade instructions.

InterMine Cloud: Making InterMine cloud-native and easing deployments

GSoC 2019 was fun and I learned a lot from the InterMine Cloud project. In this blog post, I am going to summarise the work that I did on the project. A detailed technical description of all the work done will be published elsewhere.

InterMine is a powerful data warehousing, integration and analysis tool used to store and share genomics data. However, setting up an instance of InterMine is a time-consuming and error-prone process. It also requires technical knowledge and some familiarity with Java, Postgres, Solr, Perl and shell scripts. These issues create a barrier to entry and friction in the adoption of InterMine by the bioinformatics community.
To solve these issues, we went back to the drawing board and spent two months planning and searching for simple and feasible solutions.

So, the first thing that we did was packaging InterMine into Docker containers.

InterMine on Docker

Repo: https://github.com/intermine/docker-intermine-gradle
Commits: https://github.com/intermine/docker-intermine-gradle/commits?author=leoank


Packaging InterMine into Docker containers reduced the dependencies required to set up an InterMine to just two (Docker and Docker Compose). Previously you had to go through tens of pages of InterMine docs to get everything set up and configured correctly to start a new InterMine.

But, packaging InterMine into Docker containers was not a trivial task. Unlike other applications where we can have a single generic container image that can be used by different users, InterMine needs to be custom built for every user. Also, the build requires coordination with other services like Postgres and Solr.

So, instead of having a single Docker image, we now have a set of Docker images that can be orchestrated together to build custom InterMines. These Docker images can be configured easily using environment variables and config files for easier cloud deployments.

Usage instructions for these Docker containers are documented here.

After packaging InterMine in Docker containers, the second thing we did was to define the cloud infrastructure needed for deploying InterMine as code.

InterMine Cloud Infrastructure as Code

Repo: https://github.com/intermine/intermine-cloud
Commits: https://github.com/intermine/intermine-cloud/commits?author=leoank

To achieve an easy to use and reproducible cloud infrastructure setup and deployments, we used three technologies: Terraform, Kubernetes and Helm.

Terraform is used to define the required infrastructure as code. We now have Terraform scripts that can be used to spin up a Kubernetes cluster on Google Cloud Platform with the correct configs in just minutes.

Kubernetes is a production-grade container orchestration platform. It makes it easier to manage containers in the cloud.

Helm is like a package manager for Kubernetes. We wrote helm charts for deploying single InterMine instances and also entire InterMine Cloud components. Using these charts, users can deploy a custom InterMine in just minutes now.

Doing all this work standardised the cloud deployment process for InterMine. But we didn’t stop there. We took this one step further, which finally brings us to InterMine Cloud.

InterMine Cloud

Repos:
Compose: https://github.com/intermine/intermine_compose
Configurator: https://github.com/intermine/intermine_configurator
Wizard: https://github.com/intermine/wizard

Commits:
Compose: https://github.com/intermine/intermine_compose/commits?author=leoank
Configurator: https://github.com/intermine/intermine_configurator/commits?author=leoank
Wizard: https://github.com/intermine/wizard/commits?author=leoank

InterMine Cloud is a SaaS platform that offers InterMines as a service to its users. It brings a whole new way to use InterMines and makes it accessible to a much larger group of users. We envisioned a completely new user workflow that removes all the technical burden from a user.

InterMine Cloud Workflow

The work we did on InterMine Cloud is completely reusable and we encourage others in the community to host their own InterMine Clouds. The diagram below gives you a brief overview of the architecture.

InterMine Cloud Architecture Overview

InterMine Cloud has four main components:

  • InterMine Compose
  • InterMine Configurator
  • Wizard
  • Kubernetes environment

InterMine Compose

Compose is responsible for authentication, authorisation and building custom InterMines using config files generated by InterMine Configurator. It also acts as a proxy to InterMine Configurator and the underlying Kubernetes environment.

InterMine Configurator and Wizard

My mentors wrote Configurator and Wizard. Together they are responsible for generating a mine config that is used by InterMine Compose. Wizard asks the user a series of relevant questions about the data file, and Configurator then processes the answers to generate a config.

Kubernetes environment

The underlying Kubernetes environment is a standard Kubernetes cluster with a few InterMine Cloud-specific components added. These components include a Solr service and a distributed shared filesystem enabled by Rook.

Future Work

InterMine Cloud is functional but still a work in progress. It will take a few more weeks to reach alpha. We plan to add a few more features before a public release and are also actively looking for community feedback and suggestions.

Call recording available: GSoC 2019 Final Presentations

Our Google Summer of Code students presented their work at a special edition of the community call yesterday. You can catch up on the entire recording on YouTube – or scroll down to see individual presentations. The agenda and notes accompanying the call (including code and slides links) are in Google Docs.

Prabodh Kotasthane – Spring Migration

Prabodh’s presentation starts at 3:54: https://youtu.be/ZzV6JmVRQmA?t=234

Slides

Ankur Kumar – InterMine Cloud

Ank’s presentation starts at 13:12: https://youtu.be/ZzV6JmVRQmA?t=792

Laksh Singla – Upgrading imjs & im-tables

Laksh’s presentation starts at 21:08: https://youtu.be/ZzV6JmVRQmA?t=1268

Rahul Yadav – Single Sign-In

Rahul’s presentation starts at 27:39: https://youtu.be/ZzV6JmVRQmA?t=1659

Deepak Kumar – InterMine Schema Validator

Deepak’s presentation starts at 34:11: https://youtu.be/ZzV6JmVRQmA?t=2051

Akshat Bhargava – Data Visualisations

Akshat’s presentation starts at 41:30: https://youtu.be/ZzV6JmVRQmA?t=2490

InterMine 4.0 – InterMine as a FAIR framework

We are excited to publish the latest version of InterMine, version 4.0.

It’s a collection of our efforts to make InterMine more “FAIR”. As an open source data warehouse, InterMine’s raison d’être is to be a framework that enables people to quickly and easily provide public access to their data in a user-friendly manner. InterMine has therefore always strived to make data Findable, Accessible, Interoperable and Reusable, and this push is meant to formally apply the FAIR principles to InterMine.

What’s included in this release?

  1. Generate globally unique and stable URLs to identify InterMine data objects in order to provide more findable and accessible data.
  2. Apply suitable ontologies to the core InterMine data model to make the semantics of InterMine data explicit and facilitate data exchange and interoperability.
  3. Embed metadata in InterMine web pages to make data more findable.
  4. Improve accessibility of data licences for integrated sources via the web interface and REST web service.

More details below!

How to upgrade?

This is a non-disruptive release, but there are additions to the data model. Therefore, you’ll want to increment your version, then build a new database when upgrading. No other action is required.

However, keep reading to find out how to take advantage of the new FAIR features in this release.

Unique and stable URLs

We’ve added a beautiful new user-friendly URL.

Example: http://beta.flymine.org/beta/gene:FBgn0000606

Currently this is used only in the “share” button in the report pages and in the web pages markup. In the future, this will be the only URL seen in the browser location bar.

For details on how to configure your mine’s URLs, see the docs here.

See our previous blog posts on unique identifiers.
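To make the shape of the new URL concrete, here is a minimal sketch that builds and splits URLs following the `<base>/<class>:<identifier>` pattern seen in the example above. The function names are hypothetical, and the rules used here (lower-casing the class name, percent-encoding the identifier) are assumptions inferred from that single example rather than InterMine’s actual implementation.

```python
from urllib.parse import quote, unquote


def permanent_url(base_url, class_name, identifier):
    """Build a `<base>/<class>:<identifier>` URL like the example above."""
    # Assumption: the class name is lower-cased and the identifier is
    # percent-encoded so that unusual characters stay URL-safe.
    path = f"{class_name.lower()}:{quote(identifier, safe='')}"
    return f"{base_url.rstrip('/')}/{path}"


def split_permanent_url(url):
    """Return (class_name, identifier) from the last path segment of the URL."""
    last_segment = url.rstrip("/").rsplit("/", 1)[1]
    class_name, _, identifier = last_segment.partition(":")
    return class_name, unquote(identifier)


url = permanent_url("http://beta.flymine.org/beta", "Gene", "FBgn0000606")
print(url)
print(split_permanent_url(url))
```

With the FlyMine example values, the first call reproduces the URL shown above, and the second recovers the class and identifier from it.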

Decorating the InterMine data model with ontology terms

InterMine 4.0 introduces the ability to annotate your InterMine data model with ontology terms.

While these data are not used (yet), it’s an important feature in that it’s going to facilitate cross-InterMine querying, and eventually cross-database analysis — allowing us to answer questions like “Is the ‘gene’ in MouseMine the same ‘gene’ at the EBI?”.

For details on how to add ontologies to your InterMine data model, see the docs here.

Embedding metadata in InterMine webpages

We’ve added structured data to web pages in JSON-LD format to make data more findable, and these data are indexed by Google data search. Bioschemas.org is extending Schema.org with life science-specific types, adding required properties and cardinality on the types. For more details see the docs here.

By default this feature is disabled. For details on how to enable embedding metadata in your webpages, see the docs here.
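For readers unfamiliar with JSON-LD, the sketch below shows the general technique of embedding structured data in a web page as a `<script type="application/ld+json">` element. The `DataCatalog` type and its `name`, `url` and `description` properties come from Schema.org, but the choice of type, the property values and the function name are illustrative assumptions; the markup that InterMine actually emits is described in the docs.

```python
import json


def jsonld_script(name, url, description):
    """Render a Schema.org DataCatalog description as an embeddable script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "DataCatalog",  # illustrative choice of type
        "name": name,
        "url": url,
        "description": description,
    }
    body = json.dumps(data, indent=2)
    # Escape "</" so a value can never close the surrounding script element;
    # "\/" is a valid JSON escape for "/", so the payload still parses.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder values for a hypothetical mine.
print(jsonld_script("ExampleMine", "https://example.org/examplemine",
                    "An example InterMine instance"))
```

Search engines that understand JSON-LD can read this block from the page source without it affecting what users see.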

Data licences

In our ongoing effort to make the InterMine system more FAIR, we have started working on improving the accessibility of data licences, retaining licence information supplied by the data sources integrated in InterMine, and making it available to humans via our web application and machines via queries.

See our previous blog post on data licences.

For details on how to add data licences to your InterMine, see the docs.

Future FAIR plans

  1. Provide an RDF representation of stored data, lists and query results, and a bulk download of all InterMine data in RDF form, in order to allow users to import InterMine resources into their local triplestore
  2. Provide an infrastructure for a SPARQL endpoint where the user can perform federated queries over multiple data sets

Upcoming Releases

The next InterMine version will likely be ready in the Fall/Winter and include some user interface updates.

Docs

To update your mine with these new changes, see upgrade instructions. This is a non-disruptive release.

See release notes for detailed information.

InterMine 3.1.2 – patch release

We’ve released a small batch of bug fixes and small features. Thank you so much to our contributors: Sam Hokin, Arunan Sugunakumar and Joe Carlson!

Features

  • Templates can be tagged by any user, not just the super user. (Via webservice only – for now)

Fixes

  • When searching our docs, sometimes the “.html” extension was dropped. This was fixed by our beautiful documentation hosters – readthedocs.org
  • Installing the “bio” project via Gradle does not fail if you do not have the test properties file.
  • Gradle logs error fixed
  • Removed old GAF 1.0 code
  • Fixed XML library issue:  java.lang.ClassCastException for org.apache.xerces
  • Set converter.class correctly
  • Updated the protein atlas expression graph
  • Handle NULL values returned by NCBI web services
  • Updated Solr to support new Solr versions
  • Removed unneeded Gretty plugin
  • Better error handling for CHEBI web services
  • Publication abstract is longer than postgres index
  • Removed phenotype key, as it’s not in the core model and has a conflicting key
  • Updated ObjectStoreSummary to handle ignored fields consistently.

Upcoming Releases

InterMine 4.0 is scheduled for release the week of 7 May 2019.

Docs

To update your mine with these new changes, see upgrade instructions. This is a non-disruptive release.

See release notes for detailed information.