Outreachy Interview: Sakshi Srivastava on JavaScript data visualisations for BlueGenes

This is our blog series interviewing our 2020 Outreachy interns, who are working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Sakshi Srivastava, who will be working on data visualisations for BlueGenes.

Hi Sakshi Srivastava! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself? 

Corona Namaste everybody! I'm delighted to be a part of the InterMine team. I'm an undergraduate pursuing a Bachelor of Technology in Computer Science at Guru Gobind Singh Indraprastha University, Delhi, India. I've been working with JavaScript and the web ecosystem for the last 2 years. I like to take part in tech meet-ups and hackathons (and have won a few of them). I enjoy solving logical and mathematical puzzles, and I'm also doing competitive programming to improve my problem-solving ability. I love to draw and paint, although I haven't done so for the past few months; it's my favourite way to escape from the real world and take a break from everything going on in life. I like listening to soft, relaxing music and sometimes play the guitar. When I'm not on my laptop, you will mainly find me sleeping (mostly :P), deep in an interesting chat with friends, or day-dreaming. I'm currently exploring different technology sectors to discover the one that suits me best. One of my favourite projects in the field of data visualisation is IPLDataVizProject, which was given to me as an interview task.

What interested you about Outreachy with InterMine?

Biologists study life on scales from single molecules to whole organisms to entire ecosystems. I've never explored the bioinformatics world much, but learning about the science behind life has always interested me, so InterMine fits me like a glove. JavaScript is also exactly where my interests lie. I wanted to strengthen my skills and increase my capacity to make more and more contributions, and this opportunity will let me become familiar with the underlying scientific concepts while applying my computer science skills. But this is not the only reason I chose InterMine. The main reason was the positive environment at InterMine, which meant I never even explored any other organisation during the application process. The mentors are admirable: they always handle ideas, doubts and requests gracefully and motivate others to be awesome. The time I spent discussing the details of the project with them was fascinating. They are one of the most indispensable parts of the InterMine community.

Tell us about the project you’re planning to do for InterMine this summer.

The complexity of biological problems requires understanding and then analysing networks and interactions. But when the data is huge, it becomes difficult to gain insights easily. The aim of my project is to create different visualisation tools that turn cluttered and chaotic data into an understandable form. This will help biologists understand the networks and interactions between different entities more easily, and consequently draw relevant conclusions at a glance.

Are there any challenges you anticipate for your project? How do you plan to overcome them?

As we know, InterMine holds tons of biological data from around the world. Obtaining and understanding this data is essential in order to mould it into meaningful visualisations and gain better insights. Before beginning each viz, I will try to get familiar with the biological entities involved by studying InterMine's data models, with the help of my mentors. This will help me write better documentation, and it might even spark new viz ideas.

I also came up with an interesting idea: using storybook.js to showcase all our visualisation tools in one place for demo purposes, without anybody needing to run the tools locally. I've started exploring monorepo techniques and how we could integrate them with our visualisation tools. This will be a new and engaging challenge for me, as I've never worked with monorepos before. It's going to be fun.

Share a meme or gif that represents your project

GSoC Interview: Akshat Bhargava on new data visualisations for BlueGenes

This is our blog series interviewing our 2019 Google Summer of Code students, who will be working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Akshat Bhargava, who will be creating data visualisations for BlueGenes.

Hi Akshat! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

Hi InterMine team, I'm very excited for this summer too! I'm a Computer Science undergraduate starting my 3rd year this August. I'm primarily a JavaScript developer (web and hybrid mobile) and have been working with it for the last 2.5+ years, but the real me is a person who loves solving problems in general, whether or not they're related to programming. I've been exploring the field of data visualization for the last few months and I'm in love with it. Have a look at the IPL (cricket) data viz I created a few months back here.

Apart from coding, I love reading about psychology and history, and watching horror movies.

What interested you about GSoC with InterMine?

I find it magical how numbers show their true faces when seen through a meaningful visualization, and this is why I'm most excited about this summer with InterMine.

Real World Bio Data + Data Viz = Something big coming in! ❤

Another reason for my interest in InterMine is that I applied to InterMine last year too, for the Cross InterMine Search Tool, and couldn't make it, but I came to understand its community and how it works. The mentors are very helpful and supportive to everyone, so this year I came straight here. 😀

Tell us about the project you’re planning to do for InterMine this summer.

InterMine has tons of different types of biological data. This summer I'll mostly be discussing and developing visualizations for this data, making it easier for biologists to understand it and draw relevant conclusions at a glance.

BlueGenes is an existing application that helps users explore different mines. It provides a tool API that allows JavaScript developers to build additional visualization tools on top of it, which can be integrated into any Gene or Protein result page. My goal for this summer is to create a variety of such visualization tools to enrich the visualization of different types of data.

As an example of how this work can be useful, one of the visualizations I'll be developing will help us understand how the expression of a particular gene is distributed among different tissues. This information is helpful for cancer biologists who want to assess whether a gene is highly expressed across different tissues of an organism, because that gives a relative picture of the degree to which it is implicated in diseases.
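To make the tissue-distribution idea concrete, here is a minimal Python sketch of the calculation behind such a chart (the real tool would be JavaScript inside BlueGenes, and the interview does not describe its implementation). It assumes expression values arrive as hypothetical (tissue, value) pairs and computes each tissue's share of the gene's total expression; the helper name and the numbers are made up for illustration.

```python
from collections import defaultdict


def expression_share_by_tissue(records):
    """Return each tissue's fraction of a gene's total expression.

    records: iterable of (tissue, expression_value) pairs (assumed layout).
    """
    totals = defaultdict(float)
    for tissue, value in records:
        totals[tissue] += value  # sum repeated measurements per tissue
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {tissue: 0.0 for tissue in totals}
    return {tissue: total / grand_total for tissue, total in totals.items()}


# Made-up values for one gene, purely for illustration.
records = [("brain", 12.0), ("liver", 3.0), ("brain", 4.0), ("testis", 1.0)]
shares = expression_share_by_tissue(records)

# A crude text "bar chart", highest share first.
for tissue, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{tissue:<8}{share:7.1%} {'#' * round(share * 20)}")
```

Normalising to shares rather than plotting raw values keeps tissues comparable on one axis, which is the "relative picture" described above.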

Are there any challenges you anticipate for your project? How do you plan to overcome them?

Data visualization requires you to understand the data properly before you can create a meaningful visualization of it. Since I'm not very familiar with InterMine's data model and the related biological terms, I expect to face some difficulties when deciding "what and why" to visualize. To overcome this, I'm already exploring InterMine's data model, learning how to deal with different types of data and how to choose the appropriate visualization for each. The mentors are really helping me out with this (with tech, viz and everything else). 🙂

Share a meme or gif that represents your project


InterMineR package

InterMine data can be accessed via command-line programs like cURL and via client libraries for five programming languages (Java, JavaScript, Perl, Python and Ruby). To expand the functionality of the InterMine framework, an R package, InterMineR, had been started, providing basic access to InterMine instances through the R programming environment. (You could run template queries, but not much else!)
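Since template queries were the one thing the early package could run, here is a rough Python sketch of how such a request can be expressed as a plain web-service URL, the kind you could fetch with cURL. It assumes the `/service/template/results` endpoint with numbered `constraintN`/`opN`/`valueN` parameters; the helper name, the FlyMine service root and the template name `Gene_Pathways` are hypothetical, so check a Mine's own web-service documentation before relying on them.

```python
from urllib.parse import urlencode


def template_results_url(service_root, template_name, constraints, fmt="tab"):
    """Build a URL asking a Mine to run a saved template query.

    constraints is a list of (path, op, value) tuples; they are numbered
    1, 2, ... in the order given (assumed parameter naming, see lead-in).
    """
    params = {"name": template_name, "format": fmt}
    for i, (path, op, value) in enumerate(constraints, start=1):
        params[f"constraint{i}"] = path
        params[f"op{i}"] = op
        params[f"value{i}"] = value
    # urlencode percent-escapes characters such as "=" so the URL stays valid.
    return f"{service_root.rstrip('/')}/template/results?{urlencode(params)}"


# Hypothetical example: look up pathways for the gene "eve" on FlyMine.
url = template_results_url(
    "https://www.flymine.org/flymine/service",
    "Gene_Pathways",
    [("Gene", "LOOKUP", "eve")],
)
print(url)  # then, for example: curl "<printed url>"
```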

However, in order to fully utilize the statistical and graphical capabilities of the R language and make the InterMine framework available to an even greater number of life scientists, the goals were set to:

  1. Further develop and publish the InterMineR package to Bioconductor, a widely used, open source software project based in R, which aims to facilitate the integrative analysis of biological data derived from high-throughput assays.
  2. Add visualisation capabilities, e.g. “What features are close to my feature of interest?”
  3. Add enrichment analysis to InterMineR, a feature that will provide R users with access to the InterMine enrichment analysis widgets and can be effectively combined with the graphical capabilities of R libraries.

InterMineR performs a call to the InterMine Registry to retrieve up-to-date information about the available Mines. The retrieved information is then used to connect the Mines to the R environment through the InterMine web services.
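A minimal sketch of the registry step, written in Python rather than R. It assumes the registry's JSON response contains an `instances` list whose entries carry a `name` and a base `url`, and that each Mine's web services live under `<base url>/service`. The response shape, the sample data and the `mine_service_urls` helper are illustrative assumptions, and the sample is parsed offline rather than fetched from the registry.

```python
import json

# Illustrative stand-in for a registry response; the real schema may differ.
SAMPLE_REGISTRY_JSON = """
{
  "instances": [
    {"name": "FlyMine", "url": "https://www.flymine.org/flymine"},
    {"name": "HumanMine", "url": "https://www.humanmine.org/humanmine/"}
  ]
}
"""


def mine_service_urls(registry_json):
    """Map each Mine's name to its web-service root (base URL + "/service")."""
    data = json.loads(registry_json)
    return {
        inst["name"]: inst["url"].rstrip("/") + "/service"
        for inst in data.get("instances", [])
    }


urls = mine_service_urls(SAMPLE_REGISTRY_JSON)
for name, root in sorted(urls.items()):
    print(name, root)
```

Looking Mines up at run time, instead of hard-coding their addresses, is what keeps the connection list up to date as Mines move or new ones appear.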

Queries

The InterMineR package can be used to perform complex queries on a Mine. The process is made easier by retrieving the data model and the ready-to-use template queries of the respective Mine. The R functions setConstraints and setQuery have been created, along with the formal class InterMineR, to create new queries or modify existing ones, store them as InterMineR-class objects and apply them to the Mine with the runQuery method.
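InterMine's web services accept queries written as path-query XML: a `query` element with a `model` and a space-separated `view`, plus one `constraint` element per filter. As a rough Python illustration of the kind of query that setConstraints and setQuery help assemble (this is not InterMineR's code), the hypothetical helper below builds such XML. The model name `genomic`, the paths and the values are examples only.

```python
import xml.etree.ElementTree as ET
from string import ascii_uppercase


def build_path_query(model, view, constraints):
    """Return InterMine-style path-query XML as a string.

    view: output paths, e.g. ["Gene.symbol", "Gene.primaryIdentifier"].
    constraints: (path, op, value) tuples, labelled A, B, C, ... in order.
    """
    if len(constraints) > len(ascii_uppercase):
        raise ValueError("this sketch only labels up to 26 constraints")
    query = ET.Element("query", model=model, view=" ".join(view))
    for code, (path, op, value) in zip(ascii_uppercase, constraints):
        ET.SubElement(query, "constraint", path=path, op=op, value=value, code=code)
    # ElementTree escapes characters such as ">" in attribute values for us.
    return ET.tostring(query, encoding="unicode")


xml = build_path_query(
    "genomic",
    ["Gene.symbol", "Gene.primaryIdentifier", "Gene.length"],
    [("Gene.symbol", "=", "eve"), ("Gene.length", ">", "1000")],
)
print(xml)
```

Building the XML with a library rather than string concatenation avoids escaping bugs when operators such as `<` or `>` appear in constraints.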

Genomic Coordinates


Figure 1: Gene visualisation done via InterMineR and Gviz

InterMineR can retrieve genomic coordinates and gene expression analysis data, which can be converted to:

  • GRanges-class objects
  • RangedSummarizedExperiment-class objects

with the R functions convertToGRanges and convertToRangedSummarizedExperiment respectively. This way an interaction layer between InterMineR and other Bioconductor packages (e.g. GenomicRanges and SummarizedExperiment) is established, allowing for rapid analysis of the retrieved InterMine data.
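Once coordinates are in an interval structure (which is what convertToGRanges provides in R), questions like "What features are close to my feature of interest?" become simple interval arithmetic. The Python sketch below illustrates only that underlying idea; it is not InterMineR or GenomicRanges code. `Feature`, `gap` and `features_near` are hypothetical names, 1-based inclusive coordinates are an assumption, and the genes and positions are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Feature:
    """A named genomic interval (1-based, inclusive coordinates assumed)."""
    name: str
    chrom: str
    start: int
    end: int


def gap(a: Feature, b: Feature) -> Optional[int]:
    """Coordinate difference between the closest ends of two intervals.

    Returns 0 if they overlap and None if they are on different chromosomes.
    """
    if a.chrom != b.chrom:
        return None
    if a.end < b.start:
        return b.start - a.end
    if b.end < a.start:
        return a.start - b.end
    return 0


def features_near(target: Feature, features, max_gap: int):
    """(gap, name) pairs for features within max_gap of target, closest first."""
    hits = []
    for f in features:
        if f == target:
            continue
        d = gap(target, f)
        if d is not None and d <= max_gap:
            hits.append((d, f.name))
    return sorted(hits)


# Made-up coordinates for illustration only.
target = Feature("geneA", "2L", 1000, 2000)
others = [
    Feature("geneB", "2L", 2500, 3000),
    Feature("geneC", "2L", 1500, 1800),
    Feature("geneD", "3R", 1000, 2000),
    Feature("geneE", "2L", 9000, 9500),
]
print(features_near(target, others, max_gap=1000))
```

This linear scan is fine for a handful of features; interval libraries such as GenomicRanges use indexed overlap searches so the same question scales to whole genomes.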

Enrichment + GeneAnswers

InterMineR also retrieves InterMine enrichment widgets and facilitates the enrichment analysis on an InterMine instance using the R functions getWidgets and doEnrichment, respectively. With the usage of the R function convertToGeneAnswers the results of the enrichment analysis are converted to a GeneAnswers-class object, therefore allowing the visualization of:

  • Pie charts
  • Bar plots
  • Concept-gene networks
  • Annotation category (e.g. GO terms, KEGG pathways) – interaction networks
  • Gene interaction networks

by using R functions from the GeneAnswers R package.
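For readers curious about what an enrichment analysis computes before anything is plotted, here is a minimal Python sketch of one standard approach: a one-sided hypergeometric test per annotation term, followed by a Bonferroni correction. This is not the code behind getWidgets or doEnrichment, and InterMine's widgets may offer other correction methods. The function names, gene identifiers and term names are hypothetical toy data.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    total = comb(N, n)
    upper = min(K, n)
    # comb() returns 0 for impossible draws, so no extra bounds checks needed.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def enrichment(gene_list, annotations, background_size):
    """Return (term, overlap, Bonferroni-adjusted p) tuples, most significant first.

    gene_list: set of genes of interest.
    annotations: dict mapping term -> set of background genes with that term.
    """
    n = len(gene_list)
    raw = []
    for term, annotated in annotations.items():
        k = len(gene_list & annotated)  # list genes carrying this term
        if k == 0:
            continue  # terms with no overlap are not tested
        raw.append((term, k, hypergeom_sf(k, background_size, len(annotated), n)))
    m = len(raw)  # number of tests actually performed
    adjusted = [(term, k, min(1.0, p * m)) for term, k, p in raw]
    return sorted(adjusted, key=lambda r: r[2])


# Toy data: a 100-gene background and two annotation terms.
background = {f"g{i}" for i in range(100)}
annotations = {
    "term_A": {f"g{i}" for i in range(10)},
    "term_B": {f"g{i}" for i in range(50, 90)},
}
my_list = {"g0", "g1", "g2", "g3", "g4"}
results = enrichment(my_list, annotations, len(background))
for term, k, p_adj in results:
    print(term, k, f"{p_adj:.2e}")
```

The adjusted p-values from a step like this are what the pie charts, bar plots and networks listed above summarise visually.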


Figure 2: GeneAnswers GO structure network, generated via InterMineR


Figure 3: GeneAnswers gene network generated using InterMineR

Final steps: Bioconductor & Vignettes

The updated InterMineR package complies with the guidelines for submitting new packages to Bioconductor, has passed all automated checks (R CMD build, R CMD check and BiocCheck) and is currently undergoing manual review for Bioconductor submission.

Documentation for each function, along with examples of its usage, is available in the GitHub repo and as help files once the package is installed. Furthermore, a detailed vignette and tutorials covering the new functionality of the InterMineR package are currently available in the intermine/InterMineR/vignettes folder of the GitHub dev branch, and will shortly be available on the GitHub master branch as well.

This project is part of Google Summer of Code and is still being developed by me, Konstantinos Kyritsis, PhD student at the Aristotle University of Thessaloniki, under the mentoring of Julie Sullivan and Rachel Lyne. The GitHub repository of the InterMineR package can be found at https://github.com/intermine/InterMineR.

Commits made by Konstantinos can be found here: https://github.com/intermine/InterMineR/commits/master?author=kostaskyritsis

Cool InterMine features roundup

I’ve said this before, but I’ll proudly say it again: one of the greatest things about being open source is the community. People are continually creative and resourceful with the tools we’ve built, and we love seeing all the different things you guys do with InterMine. Here’s a quick roundup of some of the things we’ve seen so far this year:

TargetMine’s Auxiliary Toolkit

TargetMine’s Auxiliary toolkit offers advanced analysis for networks and enrichment

TargetMine links out from report pages to provide external enrichment and interaction tools. Read more about it here, or browse the tutorials: [Enrichment] [Interaction Network].

The Beany Mines:

The beany mines (Soy, Peanut, Legume, and Bean) recently added a shared motif search, as well as a couple of other great visualisations.


R and Solr

Colin of HymenopteraMine and BovineMine wrote a great blog post about using our R client, InterMineR, and then continued to impress by working on upgrading InterMine to use Solr.

MOLD

Ever wondered what Model Organism Linked Data might look like? MOLD includes a queryable SPARQL endpoint and draws on several InterMines to create a single dataset.


Tip: Make it generic

Generic tools are ones that aren't hard-coded to a specific Mine or model. We're always on the lookout for new and exciting features, whether it's a visualisation, a web service or a database tweak. If you think it's good, you can email us to discuss it or simply create a pull request, and bask in glory forever after.

We’d love to see more!

This list is awesome (thanks everyone!!) but by no means exhaustive. If you think we've missed something out, or you're doing something new at the moment, drop us a line and we'll add you to the next roundup. We'd also love to hear from others who might be interested in guest-blogging about an InterMine-related feature.