Outreachy Internship blog: Everybody Struggles!

Hello! This blog is part of the series of blogs I am writing during my Outreachy 2020 summer internship with the Intermine Boot project in the Intermine organization.

No matter how experienced or novice a person is, everybody experiences struggle at some point in their journey. The statement seems pretty easy to admit for many people. But when you are a beginner setting foot in the mammoth field of software development, it’s very difficult to acknowledge that even your mentors or other senior developers have ever struggled with basic problems like you do. This gap in acknowledgement creates an inferiority complex and makes your journey to the top much more difficult than it should be.

Today, I’ll be sharing one such incident where I was stuck on an issue for quite a long time just because I was hesitant to ask someone else. As you read on, I recommend ignoring any technical jargon in the coming paragraphs that you don’t understand, as it’s not essential to the point I want to make. There can be a lot of similar situations.

I am in my third week of internship with Intermine. I have been doing some form of coding for the past 4 years or more (mostly as part of my course curriculum), but I am still very much a beginner in most domains. To give some context for the following discussion, the intermine_boot project is a command line tool that eases the build process for Intermine instances. It fetches an already built Docker image, or builds one if needed, and runs a Docker container from that image to get the Intermine instance running. I was working on a task to modify the build file for a Docker image so that a new image is only built if a build folder does not already exist on the system. To test the changes, I had to run the intermine_boot command in such a way that a rebuild of the image is triggered and I can see whether the changes are taking effect. My mentor, Kevin, gave me instructions on how to test this. The instructions, although clear overall, involved a number of steps, one of which wasn’t clear to me even after going through the explanation multiple times. The fear of asking a stupid question kicked in and I thought I’d just go on with whatever I understood.
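
For readers who do want the technical detail, the build rule I was working on can be sketched in a few lines of Python. This is only a minimal illustration under assumptions, not the actual intermine_boot code: the function names and the `build_image` callback are hypothetical stand-ins.

```python
from pathlib import Path
from typing import Callable


def needs_build(build_dir: Path) -> bool:
    """Return True when no build folder exists yet, so an image must be built."""
    return not build_dir.is_dir()


def ensure_image(build_dir: Path, build_image: Callable[[Path], None]) -> bool:
    """Build only when the build folder is missing; report whether a build ran.

    `build_image` is a hypothetical callback standing in for the real
    Docker build step.
    """
    if needs_build(build_dir):
        build_image(build_dir)
        return True
    return False
```

Testing a rule like this means forcing the "folder is missing" branch, which is why the rebuild had to be triggered on purpose.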

I started my 16-hour-long debugging journey by modifying the code and testing the functionality. I followed the instructions and tested my build and it failed (obviously, as I was missing that piece). I searched for the error online to no avail and landed on some Stack Overflow results. I tried to make the suggested changes without understanding them, which resulted in other errors. Finally, I gave up and took a nap for the second time. After waking up, I started attaching the errors to a message to ask my mentor again. But, voila! While putting everything together for the question, I realized what the fix might be, and it worked. I realized that I had become frantic and started trying a lot of things without understanding them.

I took away the following lessons from this incident and consciously try to follow them.

  1. When you don’t understand what the other person has said, don’t just assume that you will figure it out. Just ask them again to clarify and that will save you a lot of time.
  2. When stuck on an issue, you can become frantic and start trying random solutions. Just take a small break or nap and see the magic.
  3. Don’t code before understanding what you are trying to do. It’s a recipe for failure.

The Struggle you are in today is developing the Strength you need for tomorrow

– Robert Tew

Google Season of Docs 2020

We’re pleased to announce that, after participating in Google Summer of Code (GSoC) for three fantastic years, and in the Outreachy mentoring program which is running right now, we will be participating, for the first time, in Google Season of Docs 2020 as a mentor organization.
InterMine will be under the umbrella of the INCF organization; here you can find the full ideas list for INCF projects including InterMine projects (numbers 3 and 4).

InterMine Projects

  1. InterMine user training docs. For more details, please see here.
  2. Review, update, and integrate InterMine developer documentation. For more details, please see here.

If you’re interested in applying for one of our two projects, please drop an email to the people named in the project document to introduce yourself, and explain which of the project(s) you’re interested in.

Deadline for technical writer applications is the 9th of July.

If you have any ideas or questions, please don’t hesitate to email us.

Announcing CovidMine – analyse integrated COVID genomic and geographical distribution data

We’re excited to announce that a project we’ve been working on for the last few weeks is ready for public consumption: CovidMine, an InterMine dedicated to COVID-19 / SARS-CoV-2 data. Data is updated on a daily basis Monday-Saturday at 6PM UK time. You can try CovidMine out now, or read more about it below. 

So, what’s it all about, and why another COVID resource? 

This is something we thought about a lot, initially – there have been a massive number of initiatives going into making data available and visualising it already. In the end it came down to a couple of reasonably simple facts: InterMine already has tools to draw data from a lot of sources and integrate it, but it also offers a familiar interface if you’ve used any of the other InterMines out there, and we have API language bindings for multiple programming languages, including R, Python, Perl, and JavaScript.

Data sources include confirmed COVID-19 cases, deaths, new confirmed cases and new deaths for countries from Our World In Data [1], separate data for individual states (for the United States only) from the COVID Tracking Project [2], the SARS-CoV-2 reference genome [3] and nucleotide sequences from isolates deposited in GenBank [4].

If you’re aware of other data sets that might make this more useful please contact us to suggest them.

Jump straight in

We’ve prepared a few template queries to help you get started with your analysis –

What’s still missing and how can I help? 

We’re officially focusing our efforts on developing tools for CovidMine in our new user interface, BlueGenes, rather than the legacy JSP interface. 

A few things we’d like to add to the UI:

  • A data visualisation showing all results on a map.
  • A visualisation that shows change over time in countries or regions, for known cases, recovered, and deaths. 
  • A genome browser (JBrowse 1)

These visualisations would update based on the filters applied to the table showing your data.

Data updates: 

  • Find and integrate a data source which provides data for China separately for individual provinces

Bioschemas Markup

We have applied structured data in JSON-LD format, using the Bioschemas.org profiles DataSet, Gene and Protein. It’s available in the legacy JSP interface only, but it will be integrated in the new interface soon.
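
To illustrate what this kind of markup looks like, here is a minimal sketch that builds a JSON-LD description of a data set in Python. The property names come from the schema.org Dataset type, which the Bioschemas profile builds on; the helper name and the example values are assumptions for illustration, not CovidMine's actual markup.

```python
import json


def dataset_jsonld(name: str, description: str, url: str) -> str:
    """Serialise a minimal schema.org Dataset description as JSON-LD."""
    markup = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }
    return json.dumps(markup, indent=2)


# Example values are placeholders for illustration only.
print(dataset_jsonld(
    "Example data set",
    "Confirmed cases and deaths per country.",
    "https://example.org/dataset",
))
```

A string like this is typically embedded in a page inside a `<script type="application/ld+json">` element so that search engines and registries can read it.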

If you’re aware of other data sets that might make this more useful, or other visualisations that might be exciting, please contact us to suggest them! 

References:

  1. https://covid.ourworldindata.org
  2. https://covidtracking.com
  3. https://www.ncbi.nlm.nih.gov/genbank/sars-cov-2-seqs/#reference-genome
  4. https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus?SeqType_s=Nucleotide&VirusLineage_ss=SARS-CoV-2,%20taxid:2697049

Outreachy Interview: John Mendez on Improving the InterMine Data Browser

This is our blog series interviewing our 2020 Outreachy interns, who are working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed John Mendez, who will be working on the InterMine Data Browser.

Hi John! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself? 

I’m a US Army disabled veteran, a lucky husband, and proud father of two forever puppies, Didgy and Delilah. I started self-learning to code 3 years ago on FreeCodeCamp as a way to transition into a different career, and ended up founding a startup with my wife in our spare time. At first, coding was just a means to an end for me, but after coming into contact with the open-source community, I became enthralled with the prospect of giving something back to humanity through code.

People often ask me what the ideal scenario for our startup is. To that, I always answer, “hopefully it’s successful enough that we can hire under-represented talent to contribute to open source”. I genuinely believe that code can be used to uplift humanity, or enslave it. Hopefully, I can contribute more to the former.

What interested you about Outreachy with InterMine?

I came across Outreachy through a FreeCodeCamp post. I had no idea what to expect, and thought it would be a good way to gain the validation I needed to properly transition into a new career. My only interaction with OSS was through using it in my own project, so I assumed I would be working on codebases geared towards developers.

Then I came across InterMine, and my heart quite practically leaped for joy. You see, my father suffered from heart problems and passed away early this year. Then the coronavirus pandemic hit NY, with one of my aunts being the first in our family to become infected. 

So when I came across InterMine, I really fell in love with the mission to make data more readily available to biologists. Honestly, I didn’t even know it wasn’t. I never thought a non-scientist, beginner programmer like me would get accepted, so I continued to look for other projects. But a thought kept nagging me, “how many more lives could be saved if scientists could analyse data at the speed of their thoughts?”. 

This is why even though I highly doubted I would get accepted, I still had to make the effort. Because at this point in my life it would be the most impactful thing I’d be capable of doing.

Tell us about the project you’re planning to do for InterMine this summer.

My project is to bring the InterMine Data Browser web app and stack to more contemporary norms. The core of the project is already well-executed in jQuery, so mainly it’s a minor re-architecture using React. I do hope to finish that quickly so that I can continue to add more features though.

Are there any challenges you anticipate for your project? How do you plan to overcome them?

My biggest hurdle will be overcoming my lack of scientific terms. During the pre-internship phase, I would sometimes feel I was reading alien hieroglyphics, and my brain literally would ache lol. 

To overcome this gap, I will need to rely on my mentors to help me develop proper test cases to ensure the data is being properly analysed. With those test cases, and binging Wikipedia articles, I feel I can become proficient enough with the terminology to make adequate progress.

My 2nd hurdle will be my perfectionism. It tends to stand in the way of making progress, and at times I’ve ended up tinkering so much that I’ve made it worse! The only way to overcome that will be with tough deadlines I suppose, as well as understanding when the requirements have been met.

Share a meme or gif that represents your project

image2

Announcing the InterMine 2020 Interns 🚀

As we mentioned in an earlier blog post, this year InterMine is participating in Outreachy for our May-August (northern hemisphere summer) internships. This year we’ll have five Outreachy interns, as well as a couple of in-house interns working with us. Please give them all a huge round of congratulations! 

Our interns and their projects 👩‍💻👨‍💻

InterMine training portal – Qian

Qian will be working on the InterMine training resources, creating new programmer and UI-oriented tutorials and updating/organising existing tutorials. 

Qian will be mentored by Yo and Asher. 

CLI tool for managing InterMine instances – Pooja Gaur

Contributing to the InterMine cloud project, Pooja will be orchestrating better automated builds for InterMines that will also work nicely in Travis or other CI. 

Pooja will be mentored by Kevin and Ank.

Improving the UX and UI of BlueGenes – Roshni Prajapati

BlueGenes, the new InterMine UI, has improved drastically since Kevin joined the team and began adding and improving features. Roshni will conduct research into the usability and UX of BlueGenes, and redesign / improve interfaces where needs are identified.

Roshni will be mentored by Yo and Kevin. 

Improving the InterMine Data Browser – John Mendez

The InterMine Data Browser was a 2018 GSoC project led by Adrián, designed to make it easier for people new to InterMine to learn and explore what types of data are present in an InterMine. John will be helping update the Data Browser with new features and migrate it to React.

John will be mentored by Adrián, Nikhil, and Aman. 

Javascript Data Visualisations – Sakshi Srivastava

Last year, Akshat Bhargava created a suite of data visualisation tools for BlueGenes. Sakshi will be extending this work, refining some of the Tool API specifications and adding new data visualisations. 

Sakshi will be mentored by Akshat, Kevin, and Asher.

With thanks to our sponsors

We’re grateful to the Wellcome Trust for awarding a Diversity Enrichment grant that enabled us to sponsor three Outreachy interns, as well as Outreachy themselves who were able to sponsor another two internship positions.  

In-house interns  

We’ll also be working with Ank Kumar (working on InterMine Cloud related projects) and Celia Sanchez Laorden (InterMineR). 

What about GSoC? 

In previous years we also participated in Google Summer of Code – this year InterMine wasn’t accepted as a mentoring organisation. It is common for GSoC to “rest” applicants occasionally, so we’ll continue to apply and hopefully we will be back in GSoC next year.

Once we knew we weren’t accepted into GSoC for 2020 we joined the Open Bioinformatics Foundation GSoC org, who act as an umbrella organisation for GSoC mentoring organisations, and we had several strong applicants apply via the OBF’s organisation. Unfortunately the number of slots awarded to the OBF was lower than the number of promising students the OBF was ready to accept. This meant that InterMine was unable to take on any GSoC students, even though there were high-quality applications.

InterMine 4.1.2 – patch release

We’ve released a small batch of bug fixes and improvements.

If you host your own CDN please update it with the latest version of imjs (v3.18.1) and im-tables (v2.1.0).

Thank you to our contributors Joe Carlson, Paulo Nuin and Asher Pasha!

Fixes

  • DataSet URLs appear in tables
  • LOOP query in webapp has been fixed
  • Complex Displayer fixed
  • Updated chebiWS-client and jami-interactionviewer-json versions
  • Licence dataset doesn’t display null
  • Runtime exception in BagManager.getBags is now caught
  • Fixed bug in the report page which allowed JavaScript to be executed (Asher’s contribution)
  • Cast conversion corrected when updating serial (Joe’s contribution)

Enhancements

  • Update to Java11 (Asher’s contribution)
  • WebservicePythonCodeGenerator updated according to Python’s code styling PEP8 (Paulo’s contribution)
  • In the Export section, option “Upload to GenomeSpace” removed
  • ThaleMine updated to psi BioSource and BioGrid (Asher’s contribution)
  • From FAIR side: json-ld home page updated + use of the registry to set provider/support in the home page markup, ‘Shared link’ configuration improved
  • Libraries such as im.js, imtables.js, imtables-dep.js removed from intermine-webapp
  • gff source added to the bio/source multi gradle project
  • Improved the logs when post processes related to solr fail

This is a non-disruptive release.

See release notes for detailed information.

Levelling up: From GSoC student to mentor

We’re really proud of our ongoing engagement with GSoC students from previous years, and we always encourage our students to stay involved in any way that suits them, from writing papers about their work to summer internships in the office and even joining the team. Here, we’ve interviewed Aman Dwivedi, Arunan Sugunakumar, and Adrián Rodríguez-Bazaga, all of whom were mentors in 2019, but came from the special perspective of having been InterMine students in 2018. It’s not long at all until we’ll be thinking about GSoC 2020!

Hi all – thanks for volunteering to be interviewed! What motivated you to return as a mentor after having been a student?

Adrián: As a result of being a student under the InterMine umbrella during GSoC 2018, I gained invaluable skills that contributed to my professional career, and eventually to getting a job at the mentor organization itself. One of these skills is the ability to communicate, cooperate, and in general terms, to work with a software development organization in an international setting. This is a highly demanded skill – both in industry and in academia – that I couldn’t really get anywhere else before GSoC.

Secondly, the opportunity to learn how to contribute back to an open-source project with a (huge) codebase and a decent number of contributors, both with code and ideas, was a unique chance to add this top-tier ability to my skill-stack. For this reason, and given the high impact that GSoC had on my career, I wanted to go back and help other prospective students by mentoring them and sharing my experience, something that my current position at InterMine helped with.

Aman: As a developer, I think we often use open source software and we don’t really get a chance to give back to the community. It becomes difficult to keep contributing to open source in our day to day professional work. Being a part of GSoC in the past, I have realised the importance of open source projects and the communities running them. Returning as a mentor for GSoC this year gave me a reason and a chance to contribute again. I always wanted to be a part of the GSoC journey again and this gave me an opportunity to welcome new contributors to the community.

Arunan: Being part of an organisation which is on the other side of the planet is always an exciting thing to do. I understood the full meaning of the term ‘Globalization’ when I was a student at InterMine last year, thanks to GSoC. I loved our meetings, the guidance I received, the project outcome and the level of satisfaction I got. I wanted to have the same experience again this year as a mentor with the organisation I am familiar with.

Did you feel like you had any special insights into what students were going through, having been in the same position in previous years?

Adrián: Having been in the same situation as the students during GSoC was indeed very helpful for finding and understanding the potential needs that they might have. As an illustration, one difficulty that is common among already-accepted GSoC students is that when they face issues with how to continue their progress through the program – either fixing obstacles that they might find or contributing new features – they often don’t feel “brave” enough to ask the mentor about those problems directly, and instead prefer to find their way through independently. Maybe some of them feel that asking how to proceed or fix something is a “signal of lack of knowledge”. In my opinion this is totally wrong, as mentors are there precisely to help you get around these situations!

Aman: From being a GSoC student to stepping into the shoes of a GSoC mentor, I already was aware of the problems faced by a student. Being a first time contributor in an open source organisation is just like entering a room full of unknown people. Sometimes the student might not know when to ask for help or feedback. Communication becomes the main barrier in such cases.

Arunan: As a student, the hardest part was selecting an organisation and working with them before submitting a proposal. GSoC has gained more and more popularity over the years and the competition is very tough. This might discourage many students and they might postpone their idea of participating in GSoC to the following year. Students should learn to overcome this fear and start trying. Once you have passed a threshold point of getting to know the organisation, the path becomes clear and easy. Once you reach this point, you get all the motivation in the world to start and complete the project because it is an exciting journey.

What advice would you give to a student who is applying for GSoC? Is there something you’d go back and tell yourself when you were a student? 

Adrián: In my view, and re-iterating what I’ve stated in my answer to the previous question, I encourage students to communicate with mentors constantly, and ask about any issue that may arise during the program, while still keeping a high degree of independence.

Aman: GSoC is about open source communities. The student should keep in mind that their code would be used by a lot of people all over the world. Each and every aspect of the student’s work has a great impact on a lot of people and a lot of dependent projects. With this thought comes a great responsibility of ownership. The student should work passionately and should ask for feedback and suggestions from other community members to enhance their work.

What tips would you give to first-time mentors? 

Adrián: For first-time mentors, I strongly advise being proficient enough with the tech stack and having a clear idea of what the desired output from the project is – especially if the project has not been proposed by you, so that you are able to guide the student through the program. In addition to that, make sure to continuously be in close communication with at least one senior mentor in the organization, so that any arising matters can be cleared.

Aman: Mentors should understand the project thoroughly. Understanding the various components of the project is extremely necessary. One should be in sync with the core team of the organisation and should discuss the expectations for the project. Selection of students is the most important part of GSoC. It is always better to discuss the various students with the other team members before making the final selection.

Arunan: Mentoring might seem hard, especially if you are not part of the internal InterMine team. But if you are comfortable with the project and the tech stack, then mentoring wouldn’t be a problem. Mentors need to be up-to-date on the project all the time and should have some patience when the student struggles. If you are a first-time mentor, it is better to co-mentor with a person who is in the internal InterMine team so that decision making is easier and aligns with the future work of the organisation.

Interested in participating as a mentor or student yourself?

Mentoring: If you’re interested in mentoring, please email yo@intermine.org to discuss your project ideas. Generally we expect mentors to be known to us and/or have had some involvement in the InterMine community before participating as a mentor. You can also read through our Guidance for Mentors.

Interested student / intern: Check out our guide for student applicants. In 2020 we may well be participating in Outreachy as well as GSoC – so you don’t have to be a student to apply!

InterMine 4.1.1 – patch release

We’ve released a small batch of bug fixes and added the Code of Conduct.

Thank you to our contributor Asher Pasha (ThaleMine).

Fixes

  • ncbi-gff bio source updated due to data change
  • intermine plugin updated to allow you to build and deploy your InterMine instance using Gradle 4.9. To update the Gradle version on your mine, please read the upgrade instructions
  • merged PRs from Asher Pasha (ThaleMine) aimed at streamlining ThaleMine production.

This is a non-disruptive release.

See release notes for detailed information.

InterMine 4.1.0

The InterMine team has just released InterMine 4.1.0.

The new release includes a better integration with Galaxy: we can import data into Galaxy from any InterMine of our choice (either starting from InterMine or Galaxy), and we can export a list of identifiers from Galaxy to any InterMine of our choice through the InterMine registry. No need to configure anything any more: all the Galaxy properties have been moved to InterMine core. No need to create a mine-specific Galaxy tool anymore, use the NEW intermine tool instead. Please read here for more details. A simple InterMine tutorial will be published soon in the Galaxy Training Material, under the Data Manipulation topic.

This release offers integration with ELIXIR AAI (Authentication and Authorisation Infrastructure), allowing researchers to log in to InterMine instances using their ELIXIR profile. You will need to:

  1. have an ELIXIR identity
  2. register the InterMine client in order to obtain the client-id and the client-secret, which must be set in the mine properties file.

More details here in the OpenAuth2 Settings section of the documentation.

Also new in this version is the gradle wrapper 4.9, which is compatible with Java11. This only affects users who compile/install InterMine code.

Thank you so much to our contributor Joe Carlson for improving the generateUpdateTriggers task.

The release also contains a few bug fixes.

Bug Fixes

  • Solved the error caused by obsolete terms in the gene ontology
  • Fasta query result: CDS translation option + extra view parameter
  • The ONE OF constraint works properly when editing a template
  • The default queries configuration has been migrated to json
  • The task generateUpdateTriggers has been improved

See the release notes for the complete list and detailed information.

This is a non-disruptive release. To update your mine with these new changes, see the upgrade instructions.

InterMine Cloud: Making InterMine cloud-native and easing deployments

GSoC 2019 was fun and I learned a lot from the InterMine Cloud project. In this blog post, I am going to summarise the work that I did on the project. A detailed technical description of all the work done will be published elsewhere.

InterMine is a powerful data warehousing, integration and analysis tool used to store and share genomics data. However, setting up an instance of InterMine is a time consuming and error prone process. It also requires technical knowledge and some familiarity with Java, Postgres, Solr, Perl and shell scripts. These issues create a barrier for entry and friction in adoption of InterMine by the bioinformatics community.
To solve these issues, we went back to the drawing board and spent two months planning and searching for simple and feasible solutions.

So, the first thing that we did was packaging InterMine into Docker containers.

InterMine on Docker

Repo: https://github.com/intermine/docker-intermine-gradle
Commits: https://github.com/intermine/docker-intermine-gradle/commits?author=leoank


Packaging InterMine into Docker containers helped us reduce the dependencies required to set up an InterMine to just two (Docker and Docker Compose). Previously you had to go through tens of pages of InterMine docs to get everything set up and configured correctly to start a new InterMine.

But, packaging InterMine into Docker containers was not a trivial task. Unlike other applications where we can have a single generic container image that can be used by different users, InterMine needs to be custom built for every user. Also, the build requires coordination with other services like Postgres and Solr.

So, instead of having a single Docker image, we now have a set of Docker images that can be orchestrated together to build custom InterMines. These Docker images can be configured easily using environment variables and config files for easier cloud deployments.
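
As a rough illustration of configuration through environment variables, the sketch below reads connection settings for the companion services and falls back to defaults when a variable is unset. The variable names and default values are hypothetical placeholders, not the ones used by the docker-intermine-gradle images; the usage documentation lists the real ones.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class MineConfig:
    mine_name: str
    postgres_host: str
    postgres_port: int
    solr_host: str


def load_config(env: Mapping[str, str] = os.environ) -> MineConfig:
    """Read settings from the environment, falling back to defaults.

    All variable names below are hypothetical placeholders.
    """
    return MineConfig(
        mine_name=env.get("MINE_NAME", "examplemine"),
        postgres_host=env.get("PGHOST", "postgres"),
        postgres_port=int(env.get("PGPORT", "5432")),
        solr_host=env.get("SOLR_HOST", "solr"),
    )
```

Passing the environment in as a mapping keeps the function easy to exercise with a plain dictionary, without touching the real process environment.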

Usage instructions for these Docker containers are documented here.

After packaging InterMine in Docker containers, the second thing we did was to write the cloud infrastructure needed for deploying InterMine as Code.

InterMine Cloud Infrastructure as Code

Repo: https://github.com/intermine/intermine-cloud
Commits: https://github.com/intermine/intermine-cloud/commits?author=leoank

To achieve an easy to use and reproducible cloud infrastructure setup and deployments, we used three technologies: Terraform, Kubernetes and Helm.

Terraform is used to define the required infrastructure as code. We now have Terraform scripts that can be used to spin up a Kubernetes cluster on Google Cloud Platform with correct configs in just minutes.

Kubernetes is a production-grade container orchestration platform. It makes it easier to manage containers in the cloud.

Helm is like a package manager for Kubernetes. We wrote helm charts for deploying single InterMine instances and also entire InterMine Cloud components. Using these charts, users can deploy a custom InterMine in just minutes now.
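
To give a feel for what deploying from a chart involves, here is a small sketch that assembles a `helm install` command line in Python, using the Helm 3 form `helm install RELEASE CHART --set key=value`. The release name, chart path and value key are hypothetical; they are not taken from the intermine-cloud repository, and the command is only printed, not run.

```python
import shlex
from typing import Dict, List


def helm_install_command(release: str, chart: str, values: Dict[str, str]) -> List[str]:
    """Build the argument list for `helm install RELEASE CHART --set k=v ...`."""
    command = ["helm", "install", release, chart]
    # Sort the overrides so the generated command is deterministic.
    for key, value in sorted(values.items()):
        command += ["--set", f"{key}={value}"]
    return command


command = helm_install_command(
    "examplemine",                # hypothetical release name
    "./charts/intermine",         # hypothetical chart path
    {"mineName": "examplemine"},  # hypothetical chart value
)
print(shlex.join(command))
```

Building the argument list separately from executing it makes it easy to review exactly what would be sent to the cluster.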

Doing all this work standardised the cloud deployment process for InterMine. But we didn’t stop there. We took this one step further, which finally brings us to InterMine Cloud.

InterMine Cloud

Repos:
Compose: https://github.com/intermine/intermine_compose
Configurator: https://github.com/intermine/intermine_configurator
Wizard: https://github.com/intermine/wizard

Commits:
Compose: https://github.com/intermine/intermine_compose/commits?author=leoank
Configurator: https://github.com/intermine/intermine_configurator/commits?author=leoank
Wizard: https://github.com/intermine/wizard/commits?author=leoank

InterMine Cloud is a SaaS platform that offers InterMines as a service to its users. It brings a whole new way to use InterMines and makes it accessible to a much larger group of users. We envisioned a completely new user workflow that removes all the technical burden from a user.

InterMine Cloud Workflow

The work we did on InterMine Cloud is completely reusable and we encourage others in the community to host their own InterMine Clouds. The diagram below gives you a brief overview of the architecture.

InterMine Cloud Architecture Overview

InterMine Cloud has four main components:

  • InterMine Compose
  • InterMine Configurator
  • Wizard
  • Kubernetes environment

InterMine Compose

Compose is responsible for authentication, authorisation and building custom InterMines using config files generated by InterMine Configurator. It also acts as a proxy to InterMine Configurator and the underlying kubernetes environment.

InterMine Configurator and Wizard

My mentors wrote Configurator and Wizard. Together they are responsible for generating a mine config that is used by InterMine Compose. Wizard asks the user a series of relevant questions about the data file, which is then processed by Configurator to generate a config.

Kubernetes environment

The underlying Kubernetes environment is a standard Kubernetes cluster with a few InterMine Cloud-specific components added. These components include a Solr service and a distributed shared filesystem enabled by Rook.

Future Work

InterMine Cloud is functional but still a work in progress. It will take a few more weeks to reach alpha. We plan to add a few more features before a public release and are also actively looking for community feedback and suggestions.