Outreachy Internship blog: A beginner’s guide to Intermine Boot

Hello! This blog is part of the series of blogs I am writing during my Outreachy 2020 summer internship with Intermine Boot project in the Intermine Organization.

Data is of paramount importance in research. In biological research, many communities generate new datasets for DNA, yeast, mice and more. At the same time, many researchers need to work with these datasets for their own projects.

One way to share data is simply to hand over archived datasets. This raises numerous problems: how do you understand the data format, clean the data when it is inconsistent, search through it, integrate it with other datasets, and store such a large volume of data?

Intermine is a biological data warehouse which aims to resolve these issues and make accessing data easier for researchers. Once a dataset is added to intermine, users can perform complex queries over it to get the required information.

There are different intermines for different types of data like FlyMine, YeastMine, HumanMine, WormMine. The intermine project is open source and it allows research organizations to set up intermine instances dedicated to their datasets.

An intermine instance provides both a web app and a web service, where you can host data and clients can make queries to get integrated biological data. Now that we have covered the basics, let’s move on to why the project I am working on is relevant!
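To make the web service side a little more concrete, here is a minimal sketch of how a client might address a query to a mine over HTTP. It only builds the request URL and sends nothing. The FlyMine service root, the `query/results` path, the `format` and `size` parameters, and the data model paths in the XML are assumptions for illustration; check them against the documentation of the mine you are using.

```python
from urllib.parse import urlencode

# Assumed service root of a public mine; substitute your own instance.
FLYMINE_SERVICE = "https://www.flymine.org/flymine/service"


def build_query_url(service_root: str, query_xml: str, fmt: str = "json", size: int = 10) -> str:
    """Return a GET URL for the (assumed) query results endpoint. No request is sent."""
    params = urlencode({"query": query_xml, "format": fmt, "size": size})
    return f"{service_root.rstrip('/')}/query/results?{params}"


# A small query expressed as XML: gene symbols for one organism (illustrative paths).
QUERY_XML = (
    '<query model="genomic" view="Gene.symbol Gene.primaryIdentifier">'
    '<constraint path="Gene.organism.name" op="=" value="Drosophila melanogaster"/>'
    "</query>"
)

url = build_query_url(FLYMINE_SERVICE, QUERY_XML)
print(url)
```

Keeping URL construction separate from the network call makes it easy to inspect what would be sent before pointing the same code at a different mine.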

Setting up your own instance of intermine is a time-consuming and complex process that requires a fair amount of Linux administration skill. We want to make this process easier so that people with very little programming knowledge can do it. The Intermine Cloud project attempts to solve this and lower the barrier to running an intermine instance.

Intermine Cloud is composed of three main parts – wizard, configurator and compose. The wizard provides an easy way for setting custom configuration for the new intermine instance. The configurator is the backend of the wizard which creates necessary configuration files required to build the intermine instance. Once an intermine instance is built, the compose handles deploying and managing intermine instances on the cloud.

At times, a user may want to set up an intermine instance locally, either to see how the project will look or while making customizations to extend intermine for different use cases. They may also want to host the intermine instance on their own servers. That’s where the Intermine Boot project comes in.

Intermine Boot is a command line tool which aims to let users easily set up local intermine instances inside docker containers, upload data archives to the cloud, and use other convenience features.

Let’s understand the use case with an example. Suppose that, as an end user, you become interested in intermine. You want to set up and host your intermine instance on your own servers. You dig into the documentation and start setting up postgresql, gradle, perl, solr etc. Meanwhile, you are also polluting your system’s environment if you are not using docker or another form of virtualization. Intermine Boot aims to make this process as easy as running a few commands in a terminal. Below is a meme version to explain the benefits in a funny way!

You can find the intermine boot at https://github.com/intermine/intermine_boot and all intermine org projects at https://github.com/intermine

That is enough of an introduction to Intermine and Intermine Boot. Feel free to dive into the project now; we have a lot of interesting things going on!

If you can’t explain it simply, you don’t understand it well enough.

– Albert Einstein

BlueGenes 0.10.0 release

This release was made to coincide with the InterMine 4.2.0 release, which included many updates to webservices important to BlueGenes. While BlueGenes aims to retain backward compatibility with InterMine instances all the way back to API version 27 (appropriate messages are displayed if your instance doesn’t support a feature), many new features depend on being up to date with InterMine releases.

We are still working towards the production release of BlueGenes, at which point we can recommend it for future deployments over the legacy user interface. The past year has brought a plethora of necessary technical improvements and bug fixes, along with new additions that bring the user interface towards feature parity with the current webapp. The following details the most visible changes to BlueGenes in the last release, which you can explore by updating your local instance or using the public BlueGenes instance.

Visualization tools

  • New version of Tool API to allow list and query results page tools that use IDs from multiple classes
  • Tools on list and query results page should work properly for all classes now
  • Tools on list and query results page now update when editing im-table
  • Initialisation of tools has been made more performant
  • CovidMine visualization for Cases

im-tables

  • Better selection of constraint operation when creating filter
  • Filter manager for adding and modifying constraints and logic
  • Overly wide table contents are now hidden behind a scrollbar
  • Helpful messages and options when something goes wrong
  • Histogram in numeric column summary has been fixed and more features added
  • Calendar for Date type constraints
  • Searchable dropdown for single and multiple value constraints

Query builder page

  • Build queries with outer join and sorting
  • Save queries to your account
  • Load recently run queries from your current session
  • Data browser for selecting the root class
  • Import query from XML
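As a rough illustration of what a query with an outer join and sorting could look like when exchanged as XML, the sketch below assembles one with Python's standard library. The element and attribute names (`query`, `join`, `style="OUTER"`, `sortOrder`) and the data model paths are assumptions based on the general shape of InterMine query XML, not an excerpt from BlueGenes; compare against a query exported from your own mine before relying on them.

```python
import xml.etree.ElementTree as ET


def build_path_query(views, sort_path, outer_join_path, direction="asc"):
    """Build query XML with one sort column and one outer join (assumed schema)."""
    query = ET.Element(
        "query",
        {
            "model": "genomic",
            "view": " ".join(views),
            "sortOrder": f"{sort_path} {direction}",
        },
    )
    # An outer join keeps rows for genes even when they have no proteins.
    ET.SubElement(query, "join", {"path": outer_join_path, "style": "OUTER"})
    return ET.tostring(query, encoding="unicode")


xml_text = build_path_query(
    ["Gene.symbol", "Gene.proteins.name"],
    sort_path="Gene.symbol",
    outer_join_path="Gene.proteins",
)
print(xml_text)
```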

Profile page (new)

  • Change your password or preferences
  • Delete your account
  • Register a new account for a mine

Lists

  • Folder hierarchy for your lists in My Data
  • Add and edit list descriptions

Interactive tool store

  • Currently placed in the developer page, but we intend to move it to an admin page in the future
  • Manage the installation, updating and removal of BlueGenes visualization tools using a web interface
  • Rich information on each tool, where they’ll be visible, and any compatibility issues with the currently active mine
  • All Tool API compliant npm packages with the bluegenes-intermine-tool tag are shown (only tools under the @intermine scope are installed by default)
  • Only superusers are allowed to make changes

Report page

  • Show FASTA information on report page when available (we intend to make drastic changes to the current report page in the near future)

Technical

  • Much improved handling of mines that are unresponsive or have erroneous web services
  • Java 11 support and a docker container

Previous minor releases

There have been some notable changes in prior minor releases. As they haven’t been mentioned in a blog post, we will include them here.

  • Dynamic page titles (the text displayed in the tab or window title) based on the current page and its contents
  • Improvements to the keyword search page
    • Filters should work as expected when applied
    • Multiple filter support
    • Endlessly display more results by scrolling down
    • Restoration of scroll position when returning to search page
  • Reworked routing
    • New and improved URL paths
    • Deep linking to pages of specific mines
  • Stability improvements to mine switching and initialising
  • HTTPS support

InterMine 4.2.0 release

We are pleased to announce the new InterMine release 4.2.0.
It includes new functionalities to support the upcoming BlueGenes release 0.10.0, some improvements on the FAIR side and a few bug fixes.
This is a non-disruptive release.
Thank you so much to our contributors: Ahmed Hafez, Asher Pasha and Sam Hokin!

BlueGenes related improvements

  1. Added /login web service that merges the anonymous session with the user logged in.
  2. Added /logout web service.
  3. Added a new webservice to change the user’s password.
  4. Updated the existing /lists webservice which allows modifying the list description.
  5. Improvements on the Date type (to support CovidMine).

BlueGenes 0.10.0 will be released soon and announced in a separate blog.

FAIR related improvements

  1. Simplified the webservice that generates Bioschemas markup for the report page.
  2. Adopted DataRecord in the report page.
  3. Added Gene, Protein markup in the report page.
  4. Added BioChemEntity markup in the report page (only if configured).
  5. Added the ontology licences to the obo converters.

Bug Fixes / Improvements

  1. Added a new bio source to load ISA files in json format
  2. Fixed organism short name generation (Ahmed Hafez)
  3. Fixed a bug related to long fields in the report page (Asher Pasha)
  4. Removed BioEntity.ontologyAnnotations because redundant (Sam Hokin)
  5. Fixed src.data.dir.include (gff3 and xml) and src.data.dir (intermine-items-xml-file)
  6. UniProtFastaLoader works with organism names longer than 2 words (for example Severe acute respiratory syndrome coronavirus 2)

See release notes for detailed information.

Upcoming releases

For more information about the upcoming releases, please visit the InterMine Development Roadmap. More details on the roadmap here.

Outreachy Internship blog: Everybody Struggles!

Hello! This blog is part of the series of blogs I am writing during my Outreachy 2020 summer internship with Intermine Boot project in the Intermine Organization.

No matter how experienced or novice a person is, everybody experiences struggle at some point in their journey. The statement seems pretty easy to admit for many people. But when you are a beginner stepping your foot in the mammoth field of software development, it’s very difficult to acknowledge that even your mentors or other senior developers would have ever struggled at basic problems like you do. This gap in acknowledgement creates an inferiority complex and makes your journey to the top much more difficult than it should be.

Today, I’ll be sharing one such incident where I was stuck on an issue for quite a long time just because I was hesitant to ask someone else. As you read on, I recommend ignoring any technical jargon in the coming paragraphs that you don’t follow, as it’s not essential to the point I want to make. There can be a lot of similar situations.

I am in the third week of my internship with Intermine. I have been doing some form of coding for the past 4 years or more (mostly as part of my course curriculum), but I am still very much a beginner in most domains. To give some context for the following discussion, the intermine_boot project is a command line tool that eases the build process for Intermine instances. It fetches an already built docker image, or builds one if needed, and runs a docker container from that image to get the intermine instance running. I was working on a task to modify the build file for a docker image so that a new image is only built if a build folder does not already exist on the system. To test the changes, I had to run the intermine_boot command in such a way that a rebuild of the image is triggered, so I could see whether the changes were taking effect. My mentor, Kevin, gave me instructions on how to test this. The instructions, although clear, involved a number of steps, one of which wasn’t clear to me even after going through the explanation multiple times. The fear of asking a stupid question kicked in, and I thought I would just go on with whatever I had understood.
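For readers who do want the technical part, the "build only if the build folder is missing" idea can be sketched in a few lines of Python. This is not the actual intermine_boot code; the function names and the stand-in build step are hypothetical, and the real project works with docker images rather than a plain directory.

```python
import tempfile
from pathlib import Path


def needs_build(build_dir: Path) -> bool:
    """A build is needed only when the build folder is absent."""
    return not build_dir.is_dir()


def ensure_built(build_dir: Path, build) -> bool:
    """Run build(build_dir) only if the folder is missing; return True if it ran."""
    if not needs_build(build_dir):
        return False
    build(build_dir)
    return True


def fake_build(build_dir: Path) -> None:
    """Hypothetical stand-in for the real (slow) image build."""
    build_dir.mkdir(parents=True)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "build"
    first = ensure_built(target, fake_build)   # folder missing, so the build runs
    second = ensure_built(target, fake_build)  # folder present, so the build is skipped

print(first, second)
```

Testing a change like this means forcing the "folder missing" branch on purpose, which is exactly the step I had trouble with.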

I started my 16-hour-long debugging journey by modifying the code and testing the functionality. I followed the instructions and tested my build, and it failed (obviously, as I was missing that piece). I searched for the error online to no avail and landed on some Stack Overflow results. I tried to make the suggested changes without understanding them, which resulted in other errors. Finally, I gave up and took a nap for the second time. After waking up, I started attaching the errors to a message to ask my mentor again. But, voila! While putting everything together for the question, I realized what the fix could be, and it worked. I also realized that I had become frantic and had started trying a lot of things without understanding them.

I took away the following lessons from this incident, and I consciously try to follow them.

  1. When you don’t understand what the other person has said, don’t just assume that you will figure it out. Ask them again to clarify; that will save you a lot of time.
  2. When stuck on an issue, you can become frantic and start trying random solutions. Just take a small break or nap and see the magic.
  3. Don’t code before understanding what you are trying to do. It’s a recipe for failure.

The Struggle you are in today is developing the Strength you need for tomorrow

– Robert Tew

Google Season of Docs 2020

We’re pleased to announce that, after participating in Google Summer of Code (GSoC) for three fantastic years, and in the Outreachy mentoring program which is running right now, we will be participating, for the first time, in Google Season of Docs 2020 as a mentor organization.
InterMine will be under the umbrella of the INCF organization; here you can find the full ideas list for INCF projects including InterMine projects (numbers 3 and 4).

InterMine Projects

  1. InterMine user training docs. For more details, please see here.
  2. Review, update, and integrate InterMine developer documentation. For more details, please see here.

If you’re interested in applying for one of our two projects, please drop an email to the people named in the project document to introduce yourself, and explain which of the project(s) you’re interested in.

Deadline for technical writer applications is the 9th of July.

If you have any ideas or questions, please don’t hesitate to email us.

Announcing CovidMine – analyse integrated COVID genomic and geographical distribution data

We’re excited to announce that a project we’ve been working on for the last few weeks is ready for public consumption: CovidMine, an InterMine dedicated to COVID-19 / SARS-CoV-2 data. Data is updated on a daily basis Monday-Saturday at 6PM UK time. You can try CovidMine out now, or read more about it below. 

So, what’s it all about, and why another COVID resource? 

This is something we thought about a lot initially, since a massive number of initiatives are already making data available and visualising it. In the end it came down to a couple of reasonably simple facts: InterMine already has tools to draw data from many sources and integrate it, it offers a familiar interface if you’ve used any of the other InterMines out there, and we have API language bindings for multiple programming languages, including R, Python, Perl, and JavaScript.

Data sources include confirmed COVID-19 cases, deaths, new confirmed cases and new deaths for countries from Our World In Data [1], data separately for individual states (for the United States only) from the COVID Tracking Project [2], the SARS-CoV-2 reference genome [3] and nucleotide sequences from isolates deposited in GenBank [4].

If you’re aware of other data sets that might make this more useful please contact us to suggest them.

Jump straight in

We’ve prepared a few template queries to help you get started with your analysis –

What’s still missing and how can I help? 

We’re officially focusing our efforts on developing tools for CovidMine in our new user interface, BlueGenes, rather than the legacy JSP interface. 

A few things we’d like to add to the UI:

  • A data visualisation showing all results on a map.
  • A visualisation that shows change over time in countries or regions, for known cases, recovered, and deaths. 
  • A genome browser (JBrowse 1)

These visualisations would update based on the filters in the table showing your data.

Data updates: 

  • Find and integrate a data source which provides China data separately for individual states

Bioschemas Markup

We have applied structured data in JSON-LD format, using the Bioschemas.org profiles DataSet, Gene and Protein. It’s available in the legacy JSP interface only, but it will be integrated in the new interface soon.
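For readers unfamiliar with JSON-LD, the sketch below shows roughly what structured markup for a gene could look like when embedded in a page. It is illustrative only: the property set is a small assumed subset rather than the full Bioschemas Gene profile, and the identifier, name and URL are placeholders, not real CovidMine records.

```python
import json


def gene_jsonld(identifier: str, name: str, url: str) -> dict:
    """Return a minimal JSON-LD description of a gene (assumed property subset)."""
    return {
        "@context": "https://schema.org",
        "@type": "Gene",
        "identifier": identifier,
        "name": name,
        "url": url,
    }


# Placeholder values for illustration only.
markup = gene_jsonld("GENE0001", "example gene", "https://example.org/gene/GENE0001")

# JSON-LD is typically embedded in HTML inside a script tag of this type.
OPEN_TAG = '<script type="application/ld+json">'
CLOSE_TAG = "</script>"
script_tag = OPEN_TAG + json.dumps(markup) + CLOSE_TAG
print(script_tag)
```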

If you’re aware of other data sets that might make this more useful, or other visualisations that might be exciting, please contact us to suggest them! 

References:

  1. https://covid.ourworldindata.org
  2. https://covidtracking.com
  3. https://www.ncbi.nlm.nih.gov/genbank/sars-cov-2-seqs/#reference-genome
  4. https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus?SeqType_s=Nucleotide&VirusLineage_ss=SARS-CoV-2,%20taxid:2697049

Outreachy Interview: Sakshi Srivastava on JavaScript data visualisations for BlueGenes

This is our blog series interviewing our 2020 Outreachy interns, who are working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Sakshi Srivastava, who will be working on data visualisations for BlueGenes.

Hi Sakshi Srivastava! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself? 

Corona Namaste everybody! Delighted to be a part of the InterMine team. I’m an undergraduate pursuing a Bachelor of Technology in computer science at Guru Gobind Singh Indraprastha University, Delhi, India. I’ve been working with JavaScript and the web ecosystem for the last 2 years. I like to take part in tech meet-ups and hackathons (and have won a few of them). I like to solve puzzles that involve logical and mathematical questions. I’m also doing competitive programming to improve my problem-solving ability. I love to draw and paint, although I haven’t done it for the past few months, as it’s my best way to escape from the real world and take a break from everything going on in life. I like to listen to soft relaxing music and play guitar sometimes. When I’m not on my laptop, you will mainly see me sleeping (mostly :P), deep in some interesting chat with friends, or day-dreaming. I’m in the phase of exploring different technology sectors to discover the one that suits me the most. One of my favourite projects in the field of data visualisation is IPLDataVizProject, which was given to me as an interview task.

What interested you about Outreachy with InterMine?

Biologists study life on scales from single molecules to whole organisms to entire ecosystems. I’ve never explored the bioinformatics world much, but getting acquainted with the science behind life always interests me. InterMine fits me like a glove. Also, JavaScript is exactly where my interest revolves. I wanted to strengthen my skills and increase my capability to bring more and more conversions. Consequently, this perfect opportunity will give me a chance to get familiar with the underlying scientific notions by applying my computer science skills. But this is not the only reason I chose InterMine. The primary reason was the optimistic environment at InterMine, which meant I never even explored any other organisation during the application process. The mentors are highly admirable; they always entertain ideas, doubts and requests elegantly and motivate others to be awesome. The time spent with them discussing the details of the project was intriguing. They are one of the most indispensable parts of the InterMine community.

Tell us about the project you’re planning to do for InterMine this summer.

The complexity of biological problems requires understanding and then analysing networks and interactions. But when the data is huge, it becomes difficult to get good insights easily. The aim of my project is to create different visualisation tools that turn cluttered and chaotic data into an understandable form. This will help biologists understand the networks and interactions between different entities more easily and draw relevant conclusions from a single glance at the graph.

Are there any challenges you anticipate for your project? How do you plan to overcome them?

As we know, InterMine has tons of biological data from around the world. Obtaining and comprehending the data is essential in order to mold it into meaningful visualisations and get better insights. I will try to get familiar with the biological entities before beginning each viz by studying InterMine’s data models and with the help of my mentors. This will help me write better documentation, and it might even spark new viz ideas.

I also came up with an interesting idea to use storybook.js to showcase all our visualisation tools in one place for demo purposes without actually needing anybody to run the tools locally. I’ve started exploring monorepo techniques and how we can actually integrate it with our visualisation tools. This is going to be a new and engaging challenge for me as I’ve never worked with monorepos before. This is going to be fun.

Share a meme or gif that represents your project

Outreachy Interview: John Mendez on Improving the InterMine Data Browser

This is our blog series interviewing our 2020 Outreachy interns, who are working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed John Mendez, who will be working on the InterMine Data Browser.

Hi John! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself? 

I’m a US Army disabled veteran, a lucky husband, and proud father of two forever puppies, Didgy and Delilah. I started self-learning to code 3 years ago on FreeCodeCamp as a way to transition into a different career, and ended up founding a startup with my wife in our spare time. At first, coding was just a means to an end for me, but after coming into contact with the open-source community, I became enthralled with the prospect of giving something back to humanity through code.

People often ask me what the ideal scenario for our startup is. To that, I always answer, “hopefully it’s successful enough that we can hire under-represented talent to contribute to open source”. I genuinely believe that code can be used to uplift humanity, or enslave it. Hopefully, I can contribute more to the former.

What interested you about Outreachy with InterMine?

I came across Outreachy through a FreeCodeCamp post. I had no idea what to expect, and thought it would be a good way to gain the validation I needed to properly transition into a new career. My only interaction with OSS was through using it in my own project, so I assumed I would be working on codebases geared towards developers.

Then I came across InterMine, and my heart quite practically leaped for joy. You see, my father suffered from heart problems and passed away early this year. Then the coronavirus pandemic hit NY, with one of my aunts being the first in our family to become infected. 

So when I came across InterMine, I really fell in love with the mission to make data more readily available to biologists. Honestly, I didn’t even know it wasn’t. I never thought a non-scientist, beginner programmer like me would get accepted, so I continued to look for other projects. But a thought kept nagging me, “how many more lives could be saved if scientists could analyse data at the speed of their thoughts?”. 

This is why even though I highly doubted I would get accepted, I still had to make the effort. Because at this point in my life it would be the most impactful thing I’d be capable of doing.

Tell us about the project you’re planning to do for InterMine this summer.

My project is to bring the InterMine Data Browser web app and stack to more contemporary norms. The core of the project is already well-executed in jQuery, so mainly it’s a minor re-architecture using React. I do hope to finish that quickly so that I can continue to add more features though.

Are there any challenges you anticipate for your project? How do you plan to overcome them?

My biggest hurdle will be overcoming my lack of scientific terms. During the pre-internship phase, I would sometimes feel I was reading alien hieroglyphics, and my brain literally would ache lol. 

To overcome this gap, I will need to rely on my mentors to help me develop proper test cases to ensure the data is being properly analysed. With those test cases, and binging Wikipedia articles, I feel I can become proficient enough with the terminology to make adequate progress.

My 2nd hurdle will be my perfectionism. It tends to stand in the way of making progress, and at times I’ve ended up tinkering so much that I’ve made things worse! The only way to overcome that will be with tough deadlines I suppose, as well as understanding when the requirements have been met.

Share a meme or gif that represents your project


Roshni Prajapati on BlueGenes UX, user research, and saving people from bad design

This is our blog series interviewing our 2020 Outreachy interns, who are working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Roshni Prajapati, who will be working on UX research and recommendations for BlueGenes.

Hi Roshni! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself? 

Hi team, can’t deny the wait for a totally new experience is driving me crazy here! I’m an IT undergraduate pursuing a Bachelor of Technology at IIIT Allahabad and will be entering my third year from August onwards. My interest lies primarily in interaction design, user research, product thinking and a bit of graphic design, though I don’t mind banging out lines of code to build stuff that interests me. A few of my works can be seen here & here.

Some days I try to solve user issues by merging aesthetics, a bit mathematics & data in unequal proportions while other days I can be spotted preparing for my upcoming hackathon, lying all day watching cartoons or enjoying 70s-80s classical playlist. 

Other than this I’m a wanderlust person, a guitarist, a painter, an intermediate football and Table Tennis player and a coffee addict 😛

What interested you about Outreachy with InterMine?

I have this craving to improve things and redefine work for living, breathing humans, i.e. to save them from bad design. In the case of InterMine, while surfing through the mine sites I noticed that they mostly comprise analytical data and its representation. The current website has several user issues and pain points; the look is naive, and the presentation of the data is not apt and even violates some design rules. This made me dive deeper into real-world biodata and its visualization to improve the usability of the website, added to the fact that the organization itself had registered a design issue (driving me more to work).

One fact is that design analysis needs views from both users and developers, so it becomes important that the community interacts. I needed a better understanding of real-world bio data (new to me), and the mentors willingly helped. This ever-ready response brought an optimistic vibe to working for the team and organization.

Tell us about the project you’re planning to do for InterMine this summer.

The content in the current website design needs to be placed strategically to make it easier for users to go through. Since the site contains heavy analytical data and a variety of biological data, my task will be to organize the website content so that users can find things with ease, improving the overall user experience. So basically I will try to carry out my process in the following phases:

Discover & Define: Carry out questionnaire sessions and meetings for collecting user experience observations then interpret the observations and define insights. I will try to convey my ideas through user personas & stories and finally set my design challenges.

Develop & Deliver: Discuss ideas further, then sketch, experiment and prototype, working on feedback iteratively.

Are there any challenges you anticipate for your project? How do you plan to overcome them?

Previously I have worked on several projects of my own and this would be the first time I would be working with a community. So collecting user experience observations remotely through unit testing and other methods is gonna be quite a challenge for me. One of the major tasks also includes my contribution in implementation of design of which I’m concerned. Since this is gonna take some time, it could be counted as another challenge still I’m pretty much sure that work would get done under the time duration provided 🙂 

Share a meme or gif that represents your project

Outreachy Interview: Pooja on the CLI tool for managing InterMine instances

This is our blog series interviewing our 2020 Outreachy interns, who are working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Pooja Gaur, who will be working on the InterMine Boot CLI tool project.

Hi Pooja! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself? 

Hello!! Excited to join the Intermine team. I am from Ajmer, Rajasthan, India. I am pursuing an MS by research at IIIT Hyderabad, India. I completed my B.Tech (Honours) at Govt. Women Engineering College Ajmer, Rajasthan. After that I worked for two years in a startup, where I worked on automating common queries by pattern matching. Right now, I am a Research Student in the Data Science and Analytics lab at IIIT Hyderabad. My current research work deals with increasing revenue and user satisfaction for retail stores. My interests range from research in data organization, data mining and analytics to web development. I developed an interest in open source after participating in Hacktoberfest 2019. I came to know about Outreachy from one of my friends in college. I like dancing and visiting new places. I used to take part in regional dance competitions before joining college.

What interested you about Outreachy with InterMine?

I was browsing the past projects on the Outreachy site. From a first look, I shortlisted around 7 to 8 projects. Intermine’s documentation was clear about how to contribute, so I started digging deeper and developed more interest in this organization over time. I liked the idea of providing tech power to biologists to improve their workflow and ease their work.

When the projects list was out, I saw the CLI tool project. I had set up intermine manually, which is a laborious process, and I realised that this project would be very helpful for end users. My current knowledge is also aligned with this project, and it would help me extend that knowledge.

Tell us about the project you’re planning to do for InterMine this summer.

My project is “Create a CLI tool for managing InterMine instances”. Building an intermine is a laborious process and requires a lot of system knowledge, but not every user has deep knowledge of the system. Intermine Boot is part of the Intermine Cloud project. It is a convenience tool which provides a single-command setup to easily create and manage intermine instances locally. Along with local instance creation, the project supports building instances inside a docker container, e.g. for use in Continuous Integration.

My aim is to extend the intermine boot to implement the Continuous Integration use case. Here, a CI pipeline will be written (using travis) and a docker image will be created which can be loaded during CI pipeline to run tests. Along with it, I will integrate wizard and configurator with intermine boot to ease the configuration and setup of local instances of Intermine.

Are there any challenges you anticipate for your project? How do you plan to overcome them?

Although I am comfortable with python scripting and development, my experience with docker and continuous integration is minimal which could create a steeper learning curve.

To overcome these issues, I have already started digging a little deeper into the project requirements and picking up the required knowledge of docker and continuous integration.

Share a meme or gif that represents your project!