Internships Summer 2020: Closing thoughts and final presentations recording

Last week marked the end of the Outreachy May 2020 Internship Round, which we closed with a final presentations call for our interns, mentors and community. Because participants were spread across a wide range of time zones, from New York to Singapore, no single time was optimal for everyone, so we extend our sincere gratitude to all the students for being willing to attend the call!

We are thankful to a Wellcome Trust Diversity and Inclusion Grant for funding three of our interns, Outreachy for matching this by funding two interns, and our main Wellcome grant for making it possible to fund two more. This brings us up to a count of seven interns, the highest number that we’ve had the pleasure of working with at one time! This wouldn’t have been possible without help from our external mentors (Akshat Bhargava, Aman Dwivedi, Ankur Kumar, Asher Pasha and Nikhil Vats), who will all be receiving a small prize as a token of our gratitude.

During the internship period, we had to say goodbye to our valued InterMine team member for 5 years, Yo Yehudi, who drove our internship scheme to its current state and has left to become the Technical Lead for Open Source at the Wellcome Trust. Thanks to the efforts of our mentors for their support of our interns, and to Rachel for running the final presentation call, we were able to provide a closing to the internship period we can all be proud of.

A recording of the final presentation call is available on our YouTube channel and embedded below.

Additional information on our interns

Many of our interns have been writing blog posts throughout their internship, which you may find an enjoyable read:

The GitHub accounts of all our interns are listed below, if you wish to check out their contributions:

In closing

It’s been a joy to work with so many talented people, including everyone who contributed during the Outreachy contribution period prior to intern selection. Many valuable contributions to InterMine projects were made during this period, and we regret we weren’t able to offer everyone an internship.

We hope the next year of internships will be as successful as this one, and look forward to coming up with more exciting internship projects, as well as working with more fantastic interns and mentors. Until then, let’s enjoy the fruits of this labour!

Outreachy Internship blog: It’s a Wrap

Hello! This blog is part of the series of blogs I am writing during my Outreachy 2020 summer internship with Intermine Boot project in the Intermine Organization.

This is the last week of the Outreachy internship. From filling in the initial application form on a gut feeling, even though I thought I lacked the technical skills to apply for an internship, to picking the Intermine org and getting selected, the memories seem a bit fuzzy.

If someone had asked me before I applied to Outreachy whether I’d be able to get selected and successfully complete the internship, my answer would probably have been no.

In the span of a mere few months, I feel like I have come quite a long way in terms of my confidence, approach to new problems and skills. I’ll reflect on three parts in this blog: my before and after thoughts on the internship, how Outreachy helped me grow, and the progress of my project.

Apart from my little stint at Hacktoberfest 2019, I did not have any notable experience working with open source. I feel lucky to have landed in such an encouraging org. I was worried about communicating during the project, as I have mostly communicated in my mother tongue, so there was the difficulty of understanding different accents and the fear of mispronouncing words or forming sentences incorrectly. But I never saw any judgement from anyone based on how correct my communication was. The only important thing is to express yourself. This really helped my self-confidence. Before applying to Outreachy, I thought it was for people who have good open source experience and skills, who have worked on reasonably big projects and are able to carry a project completely independently. Now, I would say Outreachy is for anyone who has basic skills and is trying to learn new things. The collaborative environment of open source will surprise you.

I think Outreachy has helped me grow in a multitude of ways. It has definitely boosted my self-confidence. It taught me how to work on collaborative projects, and that it is OK to ask for help when stuck and not feel bad about it. On the technical side, I have learnt about Docker, docker-py and GitHub Actions, and picked up more Bash and Python along the way.

The concept of having a mentor (actually two mentors!) is great. I have a background in research where most of the time I am all on my own to move the project forward (codewise) if stuck. Discussions with the advisor are mainly on the theoretical and experiment part. I am very grateful for my mentors being patient and guiding me during the internship.

I have definitely deviated from my initial project proposal. Some old tasks were removed, making way for new tasks picked up along the way. The main task left untouched was setting up the wizard and configurator, because they are not ready to be integrated. Running any mine (and not just mines based on biotestmine) would still require a bit of a tussle. So there are definitely many open tasks awaiting contributions.

I have got a taste of open source from Outreachy and hope to carry it forward. 

All good things come to an end eventually, but the next experience awaits

Anonymous

Outreachy Internship blog: After the internship

Hello! This blog is part of the series of blogs I am writing during my Outreachy 2020 summer internship with Intermine Boot project in the Intermine Organization.

In this blog, I reflect on my academic journey so far and what I would like to do after the Outreachy internship ends.

I did my schooling in Rajasthan, India. After appearing for the RPET (Rajasthan Pre Engineering Test), a popular entrance exam for engineering college admissions in Rajasthan, I was admitted to the Information Technology branch at Govt. Women College in Rajasthan, which is one of the best women-only colleges in India.

I worked on some projects in college, but my exposure to diverse technologies was very limited. I felt compelled to pursue a master’s degree, both to increase my job opportunities and to dive deeper into one of the domains of computer science.

After completing my BTech I worked at a startup. After that, I was admitted to IIIT Hyderabad, India in 2017 as a part-time MTech student in computer science. As my interest developed, I talked to different research advisors to explore research opportunities, and converted to a full-time MS student in data science under Prof P. Krishna Reddy. My MS should tentatively finish around May/June 2021.

I am looking towards joining the industry in an engineering role. I am interested both in data science and software development. My college organizes campus placement sessions around December every year. After Outreachy, I am planning to polish my data structures and algorithms knowledge and sit for campus placement.

I feel that since coming to IIIT, I have gained a lot of skills. I have worked on projects in distributed systems and machine learning. I was head teaching assistant for the Distributed Systems course, where I was responsible for setting assignments and evaluating papers for a class of around 150 students.

I am open to learning new skills to advance my career. I am also open to remote jobs. I can communicate in Hindi and English.

Github: https://github.com/22PoojaGaur
Linkedin: https://www.linkedin.com/in/pooja-gaur-b5848048/

To everybody reading this: please feel free to contact me if you want to ask anything or feel that I could be a part of your organization!

The only way to do great work is to love what you do. If you haven’t found it yet, keep looking. Don’t settle.

Steve Jobs

Outreachy Internship blog: Half Time

Hello! This blog is part of the series of blogs I am writing during my Outreachy 2020 summer internship with Intermine Boot project in the Intermine Organization.

I started the Outreachy internship with Intermine on 19th May 2020. As I write this blog, I have covered a little over half of this 3 month journey. In this blog, I will look back on what has been done, which goals have been achieved, which goals have changed, and what remains for the second half of Outreachy. Let’s get started!

I am working on the Intermine boot project in the Intermine organization. As part of the internship, the goal is to improve the intermine boot functionality for local instance setup and integrate it with the cloud API for other features. The intermine boot project is at a very early stage of development. Let’s break down the tasks into a few big chunks:
– Implement versioned mine data uploading to cloud storage.
– Create Continuous Integration setup case
– Add wizard and configurator to get custom configuration from project
– Use all of the above to orchestrate docker containers for usage.

As with all projects, you cannot plan everything beforehand, and requirements change or emerge as the project evolves. Right from the start, one additional task was added: moving the intermine_boot project from using a docker-compose file to using docker-py for setting up the intermine instance. Another extra piece of work was adding docker-intermine-gradle as a submodule of the project.

The wizard and configurator are part of the Intermine Cloud project. They are not yet at a state where we can integrate them into intermine_boot. So they’ve been moved back in the project plan, and that task will be pursued if they reach a state where they can be integrated. In their place, my mentor has added other tasks which are needed to improve the usability of intermine_boot.

I have submitted 6 pull requests for the project which cover some of the tasks mentioned above. I am fairly happy with my progress. There were roadblocks which led to slower progress than I would have initially assumed but I think I am getting better equipped to handle the coming half of the project.

I feel very happy that I got accepted to such an amazing community and got a great mentor!

Now, buckling up for the next half of the internship to get stuff done!!

Start by doing what’s necessary, then do what’s possible; and suddenly you are doing the impossible 

Saint Francis of Assisi

Outreachy Internship blog: A beginner’s guide to Intermine Boot

Hello! This blog is part of the series of blogs I am writing during my Outreachy 2020 summer internship with Intermine Boot project in the Intermine Organization.

Data is of paramount importance in research. In biological research domains, there are multiple research communities working on and generating new biological datasets for DNA, yeast, mice, etc. At the same time, there are many researchers who need to work with these datasets for their research projects.

One way to share data may be to just hand over archived datasets. In this case, there are numerous problems: how do you understand the data format, how do you clean the data in case of any inconsistency, how do you search through it, how do you integrate it with different datasets, and how do you store such huge amounts of data?

Intermine is a biological data warehouse which aims to resolve these issues and make accessing data easier for researchers. Once a dataset is added to intermine, users can perform complex queries over it to get the required information.

There are different intermines for different types of data like FlyMine, YeastMine, HumanMine, WormMine. The intermine project is open source and it allows research organizations to set up intermine instances dedicated to their datasets.

An intermine instance provides both a web app and a web service, where you can host data and clients can make queries to get integrated biological data. Now that we have covered the basics, let’s move on to why the project I am working on is relevant!

Setting up your own instance of intermine is a time-consuming and complex process requiring a fair amount of Linux administration skills. We want to make this process easier so that people with very little programming knowledge can do it. The Intermine Cloud project attempts to solve this and lower the barrier to running an intermine instance.

Intermine Cloud is composed of three main parts – wizard, configurator and compose. The wizard provides an easy way for setting custom configuration for the new intermine instance. The configurator is the backend of the wizard which creates necessary configuration files required to build the intermine instance. Once an intermine instance is built, the compose handles deploying and managing intermine instances on the cloud.

At times, a user may want to set up the intermine instance locally, either to see how the project will look or while making customizations to extend intermine for different use cases, or may want to host the intermine instance on their own servers. That’s where the Intermine Boot project comes in.

Intermine Boot is a command line tool which aims to let users easily set up local intermine instances inside docker containers, upload data archives to the cloud, and use other convenience features.

Let’s understand the use case with an example. Suppose that, as an end user, you get interested in intermine. You want to set up and host your intermine instance on your servers. You dig into the documentation and start setting up postgresql, gradle, perl, solr, etc. Meanwhile, you are also polluting your system’s environment if you are not using docker or some other form of virtualization. Intermine boot aims to make this process as easy as running a few commands in the terminal. Below is a meme version to explain the benefits in a funny way!

You can find the intermine boot at https://github.com/intermine/intermine_boot and all intermine org projects at https://github.com/intermine

That is enough of an introduction to Intermine and Intermine boot. Feel free to dive into the project now, we have a lot of interesting things going on!

If you can’t explain it simply, you don’t understand it well enough.

– Albert Einstein

BlueGenes 0.10.0 release

This release was made to coincide with the InterMine 4.2.0 release, which included many updates to web services important to BlueGenes. While BlueGenes aims to retain backward compatibility with InterMine instances all the way back to API version 27 (appropriate messages are displayed if your instance doesn’t support a feature), many new features depend on being up to date with InterMine releases.

We are still working towards the production release of BlueGenes, at which point we can recommend it for future deployments over the legacy user interface. This recent year has brought with it a plethora of necessary technical improvements and bug fixes, along with new additions to bring the user interface towards feature parity with the current webapp. The following details the most visible changes to BlueGenes in the last release, which you can explore by updating your local instance or using the public BlueGenes instance.

Visualization tools

  • New version of Tool API to allow list and query results page tools that use IDs from multiple classes
  • Tools on list and query results page should work properly for all classes now
  • Tools on list and query results page now update when editing im-table
  • Initialisation of tools has been made more performant
  • CovidMine visualization for Cases

im-tables

  • Better selection of constraint operation when creating filter
  • Filter manager for adding and modifying constraints and logic
  • Overly wide table contents are now hidden behind a scrollbar
  • Helpful messages and options when something goes wrong
  • Histogram in numeric column summary has been fixed and more features added
  • Calendar for Date type constraints
  • Searchable dropdown for single and multiple value constraints

Query builder page

  • Build queries with outer join and sorting
  • Save queries to your account
  • Load recently run queries from your current session
  • Data browser for selecting the root class
  • Import query from XML
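
To illustrate what a query with an outer join and a sort order might look like when imported from XML, here is a minimal Python sketch that builds such a document with the standard library. The element and attribute names (query, join, constraint, view, sortOrder, style="OUTER") follow the PathQuery XML format as we understand it, and the paths used (Gene.proteins and so on) are examples only, so please check them against your mine's data model before relying on them.

```python
import xml.etree.ElementTree as ET


def build_query_xml(model, view, constraints=(), outer_joins=(), sort_order=None):
    """Serialise a simple query description to PathQuery-style XML.

    view        -- list of column paths, e.g. ["Gene.symbol"]
    constraints -- iterable of (path, op, value) tuples
    outer_joins -- iterable of paths to join with OUTER style
    sort_order  -- optional (path, "asc" or "desc") tuple
    """
    query = ET.Element("query", {"model": model, "view": " ".join(view)})
    if sort_order is not None:
        path, direction = sort_order
        query.set("sortOrder", f"{path} {direction}")
    for path in outer_joins:
        ET.SubElement(query, "join", {"path": path, "style": "OUTER"})
    for path, op, value in constraints:
        ET.SubElement(query, "constraint", {"path": path, "op": op, "value": value})
    return ET.tostring(query, encoding="unicode")


# Example paths only; they may not exist in every mine's data model.
example_xml = build_query_xml(
    model="genomic",
    view=["Gene.symbol", "Gene.proteins.name"],
    constraints=[("Gene.organism.name", "=", "Drosophila melanogaster")],
    outer_joins=["Gene.proteins"],
    sort_order=("Gene.symbol", "asc"),
)
```

The resulting string in example_xml is the kind of text one would expect to paste into an XML import box; whether a given mine accepts it depends on its model and version.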

Profile page (new)

  • Change your password or preferences
  • Delete your account
  • Register a new account for a mine

Lists

  • Folder hierarchy for your lists in My Data
  • Add and edit list descriptions

Interactive tool store

  • Currently placed in the developer page, but we intend to move it to an admin page in the future
  • Manage the installation, updating and removal of BlueGenes visualization tools using a web interface
  • Rich information on each tool, where they’ll be visible, and any compatibility issues with the currently active mine
  • All Tool API compliant npm packages with the bluegenes-intermine-tool tag are shown (only tools under the @intermine scope are installed by default)
  • Only superusers are allowed to make changes
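
The listing rule described above can be sketched in a few lines of Python. This is only an illustration, not the BlueGenes implementation: it assumes the tag is exposed as an entry in each package's npm keywords, and the package names below are made up.

```python
TOOL_TAG = "bluegenes-intermine-tool"
DEFAULT_SCOPE = "@intermine/"


def listed_tools(packages):
    """Packages carrying the tool tag are the ones shown in the store."""
    return [p for p in packages if TOOL_TAG in p.get("keywords", [])]


def installed_by_default(packages):
    """Of the listed tools, only those under the @intermine scope."""
    return [p for p in listed_tools(packages) if p["name"].startswith(DEFAULT_SCOPE)]


# Hypothetical package metadata, for illustration only.
packages = [
    {"name": "@intermine/example-viewer", "keywords": [TOOL_TAG]},
    {"name": "community-example-tool", "keywords": [TOOL_TAG, "chart"]},
    {"name": "unrelated-package", "keywords": ["react"]},
]
```

With this sample data, both tagged packages would be listed, while only the one under the @intermine scope would be installed by default.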

Report page

  • Show FASTA information on report page when available (we intend to make drastic changes to the current report page in the near future)

Technical

  • Much improved handling of mines that are unresponsive or have erroneous web services
  • Java 11 support and a docker container

Previous minor releases

There have been some notable changes in prior minor releases. As they haven’t been mentioned in a blog post, we will include them here.

  • Dynamic page titles (the text displayed in the tab or window title) based on the current page and its contents
  • Improvements to the keyword search page
    • Filters should work as expected when applied
    • Multiple filter support
    • Endlessly display more results by scrolling down
    • Restoration of scroll position when returning to search page
  • Reworked routing
    • New and improved URL paths
    • Deep linking to pages of specific mines
  • Stability improvements to mine switching and initialising
  • HTTPS support

InterMine 4.2.0 release

We are pleased to announce the new InterMine release 4.2.0.
It includes new functionality to support the upcoming BlueGenes release 0.10.0, some improvements on the FAIR side and a few bug fixes.
This is a non-disruptive release.
Thank you so much to our contributors: Ahmed Hafez, Asher Pasha and Sam Hokin!

BlueGenes related improvements

  1. Added /login web service that merges the anonymous session with the user logged in.
  2. Added /logout web service.
  3. Added a new webservice to change the user’s password.
  4. Updated the existing /lists webservice to allow modifying the list description.
  5. Improvements on the Date type (to support CovidMine).

BlueGenes 0.10.0 will be released soon and announced in a separate blog.

FAIR related improvements

  1. Simplified the webservice that generates Bioschemas markup for the report page.
  2. Adopted DataRecord in the report page.
  3. Added Gene, Protein markup in the report page.
  4. Added BioChemEntity markup in the report page (only if configured).
  5. Added the ontology licences to the obo converters.

Bug Fixes / Improvements

  1. Added a new bio source to load ISA files in json format
  2. Fixed organism short name generation (Ahmed Hafez)
  3. Fixed a bug related to long fields in the report page (Asher Pasha)
  4. Removed BioEntity.ontologyAnnotations because it was redundant (Sam Hokin)
  5. Fixed src.data.dir.include (gff3 and xml) and src.data.dir (intermine-items-xml-file)
  6. UniProtFastaLoader works with organism names longer than 2 words (for example Severe acute respiratory syndrome coronavirus 2)
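
To show why long organism names need care, here is a small Python sketch (not the actual UniProtFastaLoader code, which is part of InterMine's Java sources) that reads the organism name from a UniProt-style FASTA header. It assumes the usual header layout, where the name follows OS= and runs until the next two-letter field such as OX=; the function name is hypothetical.

```python
import re
from typing import Optional

# The organism name sits between "OS=" and the next " XX=" field (or end of line).
_OS_FIELD = re.compile(r"OS=(.+?)(?=\s[A-Z]{2}=|$)")


def organism_name(header: str) -> Optional[str]:
    """Return the full organism name from a FASTA header, however many words it has."""
    match = _OS_FIELD.search(header)
    return match.group(1).strip() if match else None


# Sample header written in the UniProt style, for illustration.
header = (
    ">sp|P0DTC2|SPIKE_SARS2 Spike glycoprotein "
    "OS=Severe acute respiratory syndrome coronavirus 2 OX=2697049 GN=S PE=1 SV=1"
)
name = organism_name(header)
```

Matching up to the next field, rather than taking a fixed number of words, is what keeps a six-word name like this one intact.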

See release notes for detailed information.

Upcoming releases

For more information about the upcoming releases, please visit the InterMine Development Roadmap. More details on the roadmap here.

Outreachy Internship blog: Everybody Struggles!

Hello! This blog is part of the series of blogs I am writing during my Outreachy 2020 summer internship with Intermine Boot project in the Intermine Organization.

No matter how experienced or novice a person is, everybody experiences struggle at some point in their journey. The statement seems pretty easy to admit for many people. But when you are a beginner stepping your foot in the mammoth field of software development, it’s very difficult to acknowledge that even your mentors or other senior developers would have ever struggled at basic problems like you do. This gap in acknowledgement creates an inferiority complex and makes your journey to the top much more difficult than it should be.

Today, I’ll be sharing one such incident where I was stuck on an issue for quite a long time just because I was hesitant to ask someone else. As you read on, I recommend ignoring any technical jargon in the coming paragraphs that you don’t understand, as it is not essential to the point I want to make. There can be a lot of similar situations.

I am in my third week of the internship with Intermine. I have been doing some form of coding for the past 4 years or more (mostly as part of my course curriculum), but I am still very much a beginner in most domains. To give some context for the following discussion, the intermine_boot project is a command line tool to ease the build process for Intermine instances. It fetches an already built docker image, or builds one if needed, and runs a docker container with the image to get the intermine instance running. I was working on a task to modify the build file for a docker image so that a new image is only built if a build folder does not already exist on the system. To test the changes, I had to run the intermine_boot command in such a way that a rebuild of the image was triggered, so I could see whether the changes were taking effect. My mentor, Kevin, gave me instructions on how to test this. The instructions, although clear, involved a number of steps, one of which wasn’t clear to me even after going through the explanation multiple times. The fear of asking a stupid question kicked in, and I decided to just go on with whatever I had understood.
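
For readers who do want the technical part, the rule "only build if the build folder does not already exist" can be sketched in Python as below. This is a simplified illustration and not the actual intermine_boot code: the function name and the force_rebuild flag are hypothetical.

```python
import tempfile
from pathlib import Path


def needs_image_build(build_dir: Path, force_rebuild: bool = False) -> bool:
    """Decide whether a new image should be built.

    A build is needed when the build folder is missing, or when the
    caller explicitly asks for a rebuild (hypothetical flag, handy for
    testing changes to the build itself).
    """
    if force_rebuild:
        return True
    return not build_dir.is_dir()


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    build_dir = Path(tmp) / "build"
    first_run = needs_image_build(build_dir)    # folder missing, so build
    build_dir.mkdir()
    second_run = needs_image_build(build_dir)   # folder present, so skip
    forced = needs_image_build(build_dir, force_rebuild=True)
```

Testing a change like this means getting back into the first state, where the folder is missing or a rebuild is forced, which is exactly the step that is easy to overlook.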

I started my 16-hour-long journey of debugging my changes by modifying the code and testing the functionality. I followed the instructions and tested my build, and it failed (obviously, as I was missing that piece). I searched for the error online to no avail and landed on some Stack Overflow results. I tried to make the suggested changes without understanding them, which resulted in other errors. Finally, I gave up and took a nap for the second time. After waking up, I started attaching the errors to a message to ask my mentor again. But, voila! While putting everything together for the question, I realized what the fix could be, and it worked. I realized that I had become frantic and started trying a lot of things without understanding them.

I took away the following lessons from this incident, and I consciously try to follow them.

  1. When you don’t understand what the other person has said, don’t just assume that you will figure it out. Just ask them again to clarify and that will save you a lot of time.
  2. When stuck on an issue, you can become frantic and start trying random solutions. Just take a small break or nap and see the magic.
  3. Don’t code before understanding what you are trying to do. It’s a recipe for failure.

The Struggle you are in today is developing the Strength you need for tomorrow

– Robert Tew

Google Season of Docs 2020

We’re pleased to announce that, after participating in Google Summer of Code (GSoC) for three fantastic years, and in the Outreachy mentoring program which is running right now, we will be participating, for the first time, in Google Season of Docs 2020 as a mentor organization.
InterMine will be under the umbrella of the INCF organization; here you can find the full ideas list for INCF projects including InterMine projects (numbers 3 and 4).

InterMine Projects

  1. InterMine user training docs. For more details, please see here.
  2. Review, update, and integrate InterMine developer documentation. For more details, please see here.

If you’re interested in applying for one of our two projects, please drop an email to the people named in the project document to introduce yourself, and explain which of the project(s) you’re interested in.

Deadline for technical writer applications is the 9th of July.

If you have any ideas or questions, please don’t hesitate to email us.

Announcing CovidMine – analyse integrated COVID genomic and geographical distribution data

We’re excited to announce that a project we’ve been working on for the last few weeks is ready for public consumption: CovidMine, an InterMine dedicated to COVID-19 / SARS-CoV-2 data. Data is updated on a daily basis Monday-Saturday at 6PM UK time. You can try CovidMine out now, or read more about it below. 

So, what’s it all about, and why another COVID resource? 

This is something we thought about a lot initially, as there have already been a massive number of initiatives to make data available and visualise it. In the end it came down to a couple of reasonably simple facts: InterMine already has tools to draw data from a lot of sources and integrate it, it offers a familiar interface if you’ve used any of the other InterMines out there, and we have API language bindings for multiple programming languages, including R, Python, Perl, and JavaScript.

Data sources include confirmed COVID-19 cases, deaths, new confirmed cases and new deaths for countries from Our World In Data [1], separate data for individual states (for the United States only) from the COVID Tracking Project [2], the SARS-CoV-2 reference genome [3] and nucleotide sequences from isolates deposited in GenBank [4].

If you’re aware of other data sets that might make this more useful please contact us to suggest them.

Jump straight in

We’ve prepared a few template queries to help you get started with your analysis –

What’s still missing and how can I help? 

We’re officially focusing our efforts on developing tools for CovidMine in our new user interface, BlueGenes, rather than the legacy JSP interface. 

A few things we’d like to add to the UI:

  • A data visualisation showing all results on a map.
  • A visualisation that shows change over time in countries or regions, for known cases, recovered, and deaths. 
  • A genome browser (JBrowse 1)

These visualisations would update based on the filters applied to the table showing your data.

Data updates: 

  • Find and integrate a data source which provides China data separately for individual states

Bioschemas Markup

We have applied structured data in JSON-LD format, using the Bioschemas.org profiles DataSet, Gene and Protein. It’s available in the legacy JSP interface only, but it will be integrated in the new interface soon.
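
As a rough idea of what such markup looks like, the Python sketch below builds a small JSON-LD description of a gene and wraps it in the script element normally used to embed JSON-LD in a web page. The property set shown is a minimal assumption on our part (please consult the Bioschemas Gene profile for the required and recommended properties), and the values are placeholders rather than CovidMine data.

```python
import json


def gene_markup(identifier, name, url):
    """Build a minimal JSON-LD description of a gene."""
    return {
        "@context": "https://schema.org",
        "@type": "Gene",
        "identifier": identifier,
        "name": name,
        "url": url,
    }


def as_script_tag(markup):
    """Wrap JSON-LD in the script element used to embed it in HTML."""
    body = json.dumps(markup, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder values, not taken from CovidMine.
example = gene_markup("EXAMPLE:0001", "example gene", "https://example.org/gene/0001")
snippet = as_script_tag(example)
```

Search engines and other crawlers that understand structured data can then read the report page's content without scraping its layout.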

If you’re aware of other data sets that might make this more useful, or other visualisations that might be exciting, please contact us to suggest them! 

References:

  1. https://covid.ourworldindata.org
  2. https://covidtracking.com
  3. https://www.ncbi.nlm.nih.gov/genbank/sars-cov-2-seqs/#reference-genome
  4. https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus?SeqType_s=Nucleotide&VirusLineage_ss=SARS-CoV-2,%20taxid:2697049