Internships Summer 2020: Closing thoughts and final presentations recording

Last week was the end of the Outreachy May 2020 Internship Round, where we held a final presentations call for our interns, mentors and community. Due to the wide range of time zones, from New York to Singapore, no one time was optimal for everyone, so we extend our sincere gratitude that all the students were willing to attend the call!

We are thankful to a Wellcome Trust Diversity and Inclusion Grant for funding three of our interns, Outreachy for matching this by funding two interns, and our main Wellcome grant for making it possible to fund two more. This brings us up to a count of seven interns, the highest number that we’ve had the pleasure of working with at one time! This wouldn’t have been possible without help from our external mentors, Akshat Bhargava, Aman Dwivedi, Ankur Kumar, Asher Pasha and Nikhil Vats, who will all be receiving a small prize as a token of our gratitude.

During the internship period, we had to say goodbye to Yo Yehudi, a valued InterMine team member of five years, who drove our internship scheme to its current state and has left to become the Technical Lead for Open Source at the Wellcome Trust. Thanks to our mentors’ support of our interns, and to Rachel for running the final presentation call, we were able to bring the internship period to a close we can all be proud of.

A recording of the final presentation call is available on our YouTube channel and embedded below.

Additional information on our interns

Many of our interns have been writing blog posts throughout their internship, which you may find an enjoyable read:

The GitHub accounts of all our interns are listed below, if you wish to check out their contributions:

In closing

It’s been a joy to work with so many talented people, and this includes everyone who contributed during the Outreachy contribution period prior to intern selection. Many valuable contributions to InterMine projects were made during this period, and we regret we weren’t able to offer everyone an internship.

We hope the next year of internships will be as successful as this one, and look forward to coming up with more exciting internship projects, as well as working with more fantastic interns and mentors. Until then, let’s enjoy the fruits of this labour!

Announcing CovidMine – analyse integrated COVID genomic and geographical distribution data

We’re excited to announce that a project we’ve been working on for the last few weeks is ready for public consumption: CovidMine, an InterMine dedicated to COVID-19 / SARS-CoV-2 data. Data is updated on a daily basis Monday-Saturday at 6PM UK time. You can try CovidMine out now, or read more about it below. 

So, what’s it all about, and why another COVID resource? 

This is something we thought about a lot initially, as a massive number of initiatives are already making data available and visualising it. In the end it came down to a few reasonably simple facts: InterMine already has tools to draw data from a lot of sources and integrate it, it offers a familiar interface if you’ve used any of the other InterMines out there, and we have API language bindings for multiple programming languages, including R, Python, Perl, and JavaScript.

Data sources include confirmed COVID-19 cases, deaths, new confirmed cases and new deaths for countries from Our World In Data [1], state-level data (for the United States only) from the COVID Tracking Project [2], the SARS-CoV-2 reference genome [3] and nucleotide sequences from isolates deposited in GenBank [4].

If you’re aware of other data sets that might make this more useful please contact us to suggest them.

Jump straight in

We’ve prepared a few template queries to help you get started with your analysis –

What’s still missing and how can I help? 

We’re officially focusing our efforts on developing tools for CovidMine in our new user interface, BlueGenes, rather than the legacy JSP interface. 

A few things we’d like to add to the UI:

  • A data visualisation showing all results on a map.
  • A visualisation that shows change over time in countries or regions, for known cases, recovered, and deaths. 
  • A genome browser (JBrowse 1)

These visualisations would update based on the filters applied to the table showing your data.

Data updates: 

  • Find and integrate a data source which provides data for China broken down by individual province

Bioschemas Markup

We have applied structured data in JSON-LD format, using the Bioschemas.org profiles DataSet, Gene and Protein. It’s available in the legacy JSP interface only, but it will be integrated in the new interface soon.

If you’re aware of other data sets that might make this more useful, or other visualisations that might be exciting, please contact us to suggest them! 

References:

  1. https://covid.ourworldindata.org
  2. https://covidtracking.com
  3. https://www.ncbi.nlm.nih.gov/genbank/sars-cov-2-seqs/#reference-genome
  4. https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus?SeqType_s=Nucleotide&VirusLineage_ss=SARS-CoV-2,%20taxid:2697049

Outreachy Interview: Sakshi Srivastava on JavaScript data visualisations for BlueGenes

This is our blog series interviewing our 2020 Outreachy interns, who are working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Sakshi Srivastava, who will be working on data visualisations for BlueGenes.

Hi Sakshi Srivastava! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself? 

Corona Namaste everybody! Delighted to be a part of the InterMine team. I’m an undergraduate pursuing a Bachelor of Technology in computer science at Guru Gobind Singh Indraprastha University, Delhi, India. I’ve been working with JavaScript and the web ecosystem for the last 2 years. I like to take part in tech meet-ups and hackathons (and have won a few of them). I like to solve puzzles that involve logical and mathematical questions. I’m also doing competitive programming to increase my problem-solving ability. I love to draw and paint, although I haven’t done it for the past few months, as it’s my best way to escape from the real world and take a break from everything going on in life. I like to listen to soft relaxing music and play guitar sometimes. When I’m not on my laptop, you will mainly see me sleeping (mostly :P), deep in some interesting chat with friends, or day-dreaming. I’m in the phase of exploring different technology sectors to discover the one which suits me the most. One of my magnificent projects in the field of data visualisation is IPLDataVizProject, which was given to me as an interview task.

What interested you about Outreachy with InterMine?

Biologists study life on scales from single molecules to whole organisms to entire ecosystems. I’ve never explored the bioinformatics world much, but getting acquainted with the science behind life always interests me. InterMine fits me like a glove. Also, JavaScript is exactly where my interest revolves. I wanted to strengthen my skills and increase my capability to bring more and more conversions. Consequently, this perfect opportunity will give me a chance to get familiar with the underlying scientific notions by applying my computer science skills. But this is not the only reason that made me choose InterMine. The primary reason was the optimistic environment at InterMine, which meant I never even explored any other organisation during the application process. The mentors are highly admirable: they always entertain ideas, doubts and requests graciously and motivate others to be awesome. The time spent with them discussing the details of the project was intriguing. They are one of the most indispensable parts of the InterMine community.

Tell us about the project you’re planning to do for InterMine this summer.

The complexity of biological problems requires understanding and then analysis of networks and interactions. But when the data is huge it becomes difficult to get better insights easily. The aim of my project is to create different visualisation tools to turn cluttered and chaotic data into an understandable form. This will help biologists to understand the networks and interactions between different entities in an easier way and consequently draw relevant conclusions with a single glance at the graph.

Are there any challenges you anticipate for your project? How do you plan to overcome them?

As we know InterMine has tons of biological data worldwide. The procurement and comprehension of data are essential in order to mold it into meaningful visualisations and get better insights. I will try to get familiar with the biological entities prior to beginning each viz by studying the InterMine’s data models and with the help of mentors. This will help me to write better documentation or maybe it could light me with new viz ideas in my mind.

I also came up with an interesting idea to use storybook.js to showcase all our visualisation tools in one place for demo purposes without actually needing anybody to run the tools locally. I’ve started exploring monorepo techniques and how we can actually integrate it with our visualisation tools. This is going to be a new and engaging challenge for me as I’ve never worked with monorepos before. This is going to be fun.

Share a meme or gif that represents your project

Roshni Prajapati on BlueGenes UX, user research, and saving people from bad design

This is our blog series interviewing our 2020 Outreachy interns, who are working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Roshni Prajapati, who will be working on UX research and recommendations for BlueGenes.

Hi Roshni! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself? 

Hi team, can’t deny the wait for a totally new experience is driving me crazy here! I’m an IT undergraduate pursuing bachelors of Technology at IIIT Allahabad and will be onboarded to IIIrd year from August onwards. Primarily my interest lies in interaction design, user research, product thinking and a bit of graphics design still I don’t mind banging lines of code to build stuff that interest me. Few of my works could be seen here & here.

Some days I try to solve user issues by merging aesthetics, a bit mathematics & data in unequal proportions while other days I can be spotted preparing for my upcoming hackathon, lying all day watching cartoons or enjoying 70s-80s classical playlist. 

    Other than this I’m a wanderlust person, a guitarist, a painter, an intermediate football and Table Tennis player and a coffee addict 😛 

What interested you about Outreachy with InterMine?

I have this craving to improve things and redefine work for living, breathing humans, i.e. to save them from bad design. With InterMine, while surfing through the mine sites I noticed they mostly comprise analytical data and its representation. The current website has several user issues & pain points; the look and presentation of the data is also naive and even violates some design rules. This made me dive deeper into real-world biodata and its visualization for better usability of the website, added to the fact that the organization itself had registered a design issue (driving me more to work).

    One of the facts is that design analysis needs views from users and developers and it becomes important that the community interacts. So I needed a better understanding of the real world bio data (new to me) and mentors willingly helped, this everready response brought an optimistic vibe to work for the team and organization.

Tell us about the project you’re planning to do for InterMine this summer.

The content layout in the current website design needs to be strategically placed in order to make it easier for users to go through. Since the site contains heavy analytical and a variety of biological data, my task will be to organize the website content such that users can find the things at ease, improving overall user experience. So basically I would try to carry out my process in following phases-

Discover & Define: Carry out questionnaire sessions and meetings for collecting user experience observations then interpret the observations and define insights. I will try to convey my ideas through user personas & stories and finally set my design challenges.

Develop & Deliver: I will then discuss ideas and develop them through sketching, experimenting and prototyping, working on feedback iteratively.

Are there any challenges you anticipate for your project? How do you plan to overcome them?

Previously I have worked on several projects of my own and this would be the first time I would be working with a community. So collecting user experience observations remotely through unit testing and other methods is gonna be quite a challenge for me. One of the major tasks also includes my contribution in implementation of design of which I’m concerned. Since this is gonna take some time, it could be counted as another challenge still I’m pretty much sure that work would get done under the time duration provided 🙂 

Share a meme or gif that represents your project

Outreachy Interview: Pooja on the CLI tool for managing InterMine instances

This is our blog series interviewing our 2020 Outreachy interns, who are working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Pooja Gaur, who will be working on the InterMine Boot CLI tool project.

Hi Pooja! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself? 

Hello!! Excited to join the InterMine team. I am from Ajmer, Rajasthan, India. I am pursuing an MS by research at IIIT Hyderabad, India. I completed my BTech Honours at Govt. Women Engineering College Ajmer, Rajasthan. After that I worked for two years in a startup, where I worked on automating common queries by pattern matching. Right now, I am a Research Student in the Data Science and Analytics lab at IIIT Hyderabad. My current research work deals with increasing revenue and user satisfaction for retail stores. My interests vary from research in data organization, data mining and analytics to web development. I developed an interest in open source after participating in Hacktoberfest 2019. I came to know about Outreachy from one of my friends in college. I like dancing and visiting new places. I used to take part in regional dance competitions before joining college. 

What interested you about Outreachy with InterMine?

I was browsing the past projects on the Outreachy site. From a first look, I shortlisted around 7 to 8 projects. InterMine’s documentation was clear about how to contribute, so I started digging deeper and developed more interest in this organization over time. I liked the idea of providing tech power to biologists to improve their workflow and ease their work.

When the projects list was out, I saw the CLI tool project. I had set up InterMine manually, which is a laborious process, and I realised that this project would be very helpful for end users. My current knowledge is also aligned with this project, and it would help me extend that knowledge.

Tell us about the project you’re planning to do for InterMine this summer.

My project is to create a CLI tool for managing InterMine instances. Building an InterMine is a laborious process and requires a lot of system knowledge, but not every user has deep knowledge of the system. InterMine Boot is part of the InterMine Cloud project. It is a convenience tool which provides a single-command setup to easily create and manage InterMine instances locally. Along with local instance creation, the project supports building instances inside a Docker container, e.g. for use in Continuous Integration.

My aim is to extend InterMine Boot to implement the Continuous Integration use case. Here, a CI pipeline will be written (using Travis) and a Docker image will be created which can be loaded during the CI pipeline to run tests. Along with this, I will integrate the wizard and configurator with InterMine Boot to ease the configuration and setup of local InterMine instances.

Are there any challenges you anticipate for your project? How do you plan to overcome them?

Although I am comfortable with Python scripting and development, my experience with Docker and continuous integration is minimal, which could create a steeper learning curve.

To overcome these issues, I have already started digging a little deeper into the project requirements and picking up the required knowledge of Docker and continuous integration.

Share a meme or gif that represents your project!

Outreachy Interview: Qian on the InterMine Training Portal

This is our blog series interviewing our 2020 Outreachy interns, who are working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Qian, who will be working on the InterMine Training Portal.

Hi Qian! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself? 

Hi InterMine team! I feel so excited to be part of the team! I am a Computer Science undergraduate from National University of Singapore. Next semester I will be a sophomore. I transferred from Shanghai Jiao Tong University to NUS last year. Previously, I majored in chemistry and biology. So I have some biology background. I feel so happy to take advantage of both my major backgrounds to contribute to InterMine!

I did an HTML-related project last semester. This is the link. This is the pr I cooperated with my partner. We dealt with generating an HTML textbook in seconds which is our introductory book to programming. I learned Java and JavaScript this year. I also helped a professor to deal with huge data using Jupyter Notebooks last semester, from which I learned python. (To be frank, python is much easier than Java. I was so frustrated by Java final :(. )

I am a newbie in computer science. I feel very lucky there are many open CS resources  to help me learn basic concepts. And, open source platforms are good places to find people with similar interests. I learned much from InterMine discord chat!

Apart from studying, I love playing the piano. My biggest goal is to be able to play La campanella fluently. This is my favorite version.

What interested you about Outreachy with InterMine?

Among many projects, this project almost does not restrict anything. I can design the portal according to my own ideas (of course I get advice from mentors). This is my first internship so I want to try to be more flexible.

Another reason is InterMine is about biology data processing. I have a biology background and I am interested in genes. I think cooperating with computer science, especially data analysis, is the future of biology.

Further, I love the atmosphere in this community. Yo is a good mentor as she is so helpful and kind. Members in InterMine are all warm-hearted and enthusiastic about new ideas. I learned much during the application period, especially during this hard period. I got mental relaxation when communicating with people in InterMine!

Tell us about the project you’re planning to do for InterMine this summer.

  1. Change the layout of the training portal page to be more useful and beautiful. 
  2. Make text and video tutorials for different languages.
  3. Add some features to the page.
  4. Combine different tutorials together. 

Are there any challenges you anticipate for your project? How do you plan to overcome them?

I have to rewrite tutorials in different languages which I am not very familiar with. So I am learning Perl and R these days! 

Another challenge I think I will meet is I am not sure about the time arrangement. As this is my first internship, I don’t have experience in arranging a schedule by myself previously. I hope to finish as I wrote in my plan. To overcome it, I am going to get advice from my mentors and volunteers. As this is a 3-month internship, I think I can have better anticipation of productivity with the guidance of mentors after 2 or 3 weeks. Then I will adjust my plan timely.

Share a meme or gif that represents your project!

InterMine 4.1.0

The InterMine team has just released InterMine 4.1.0.

The new release includes a better integration with Galaxy: we can import data into Galaxy from any InterMine of our choice (either starting from InterMine or Galaxy), and we can export a list of identifiers from Galaxy to any InterMine of our choice through the InterMine registry. No need to configure anything any more: all the Galaxy properties have been moved to InterMine core. No need to create a mine-specific Galaxy tool anymore, use the NEW intermine tool instead. Please read here for more details. A simple InterMine tutorial will be published soon in the Galaxy Training Material, under the Data Manipulation topic.

This release offers integration with ELIXIR AAI (Authentication and Authorisation Infrastructure), allowing researchers to log in to InterMine instances using their ELIXIR profile. You will need to:

  1. have an ELIXIR identity
  2. register the InterMine client in order to obtain the client-id and the client-secret, which must be set in the mine properties file.

More details here in the OpenAuth2 Settings section of the documentation.

Also new in this version is the Gradle wrapper 4.9, which is compatible with Java 11. This only affects users who compile or install InterMine code.

Thank you so much to our contributor Joe Carlson for improving the generateUpdateTriggers task.

The release also contains a few bug fixes.

Bug Fixes

  • Solved the error caused by obsolete terms in the gene ontology
  • Fasta query result: CDS translation option + extra view parameter
  • The ONE OF constraint works properly when editing a template
  • The default queries configuration has been migrated to JSON
  • The task generateUpdateTriggers has been improved

See the release notes for the complete list and detailed information.

This is a non-disruptive release. To update your mine with these new changes, see the upgrade instructions.

Status update for BlueGenes

It’s been a while since we posted our last (rather optimistic) update around BlueGenes, so we thought we’d share a quick update, starting with the basics.

As a reminder, the long-term goal of BlueGenes is to replace the existing JSP-based UI with a more modern interface – one that works well with mobiles, one that hopefully responds more quickly and is easier to use, and perhaps most importantly, is easy to update and customise.

Some of the questions we’ve had in the last few months:

Q: Will BlueGenes replace the current JSP UI?

A: Yes, eventually. Once we reach official beta/prod release (we’re currently in alpha), we anticipate running them concurrently for a couple of years, but we probably will only provide small fixes for the JSP UI during this period, focusing most of our development effort on BlueGenes.

Q: Do I have to run my own BlueGenes, or can I use the central one at apps.intermine.org?

A: Since BlueGenes is powered purely by web services, it will probably be possible to run your InterMine as a server/api-only service and use BlueGenes at bluegenes.apps.intermine.org/. You can also run your own BlueGenes on your servers and domains, allowing you to customise it so it’s suitable for your data, and not having to rely on our uptime. Either (or both) should work fine. There will be some version requirements related to what version of InterMine can access all the features of BlueGenes – see the next point.

Q: What version of InterMine do I need to have to run BlueGenes?

A: BlueGenes will require a minimum version of InterMine to run. The original release of InterMine web services focused primarily on giving JSP users programmatic access to their data, and at the time there wasn’t an anticipated need for application-level services such as superuser actions. There are a few web services and authentication-layer services we still need to implement, so it’s likely BlueGenes will need API version 31 or higher in order to be fully-featured. InterMines with API version 27 or higher can run a basic version of BlueGenes. You can check out this table to see if your InterMine is configured to work with BlueGenes.
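
The version thresholds in this answer could be expressed as a small check. This is only a sketch: the function name and the returned labels are illustrative and not part of InterMine or BlueGenes. Only the two thresholds (27 and 31) come from the answer above, and the higher one is described there as likely rather than final.

```python
# Thresholds taken from the answer above; all names are illustrative only.
BASIC_MIN_API_VERSION = 27
FULL_MIN_API_VERSION = 31  # described as "likely" in the post, so subject to change


def bluegenes_support(api_version: int) -> str:
    """Classify how much of BlueGenes a given InterMine API version can run."""
    if api_version >= FULL_MIN_API_VERSION:
        return "full"
    if api_version >= BASIC_MIN_API_VERSION:
        return "basic"
    return "unsupported"


if __name__ == "__main__":
    for version in (26, 27, 30, 31):
        print(version, bluegenes_support(version))
```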

Q: Ok, so what’s left to do before BlueGenes is released as a public beta?

A: Mostly authentication, superuser and MyMine features – things like saving and updating personal templates, sorting lists in folders, updating preferences and passwords. Some of these features require updates to InterMine itself in order to work – hence the minimum version noted in the previous question. Once these are ready we’ll move to the public beta stage.

Your input here will be incredibly welcome, too – the more feedback we get early on, the more polished we hope BlueGenes can be.

Q: Will BlueGenes work nicely with HTTPS InterMines?

A: You will be able to run BlueGenes without HTTPS, but in order to avoid inadvertently exposing user passwords, the login button will only be available over HTTPS connections. We’re also working with a student over the next few months, to implement a pilot InterMine Single Sign On service. You can read about it in our interview with Rahul Yadav.
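
The rule in this answer amounts to a scheme check on the page URL. A minimal sketch of that check follows; it is illustrative only, is not taken from the BlueGenes code base, and the function name is hypothetical.

```python
from urllib.parse import urlparse


def login_available(page_url: str) -> bool:
    """Return True only when the page is served over HTTPS."""
    # Hypothetical helper: offer the login button on HTTPS pages only,
    # so passwords are never submitted over an unencrypted connection.
    return urlparse(page_url).scheme.lower() == "https"


if __name__ == "__main__":
    print(login_available("https://bluegenes.apps.intermine.org/"))
    print(login_available("http://example.org/bluegenes"))
```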

Q: Will I be able to customise the way BlueGenes looks?

A: Totally! There are two ways you can do this. One is to make sure you have your logo and colour settings configured in your web properties. We have a nice guide for that. This’ll tell us what your preferred highlight colours are – FlyMine is purple, HumanMine green, etc. If you’re really dedicated and would like to write your own CSS, you can do that too, if you’re running your own InterMine/BlueGenes combo.

Q: I have some nice custom visualisation tools in my InterMine. I don’t want to have to re-write them!

A: We don’t want you to re-write them either! It depends how they’re implemented in your mine, but we’ve designed the BlueGenes Tool API with you in mind, and many Javascript-powered tools will require only a few lines of code to become BlueGenes ready.

As an example, the Cytoscape interaction viewer currently used in some InterMines only requires 20 lines of code to import into BlueGenes, plus a few lines of config – all the other files (and most of the config too) are boilerplate that we auto-generated.

InterMine 4.0 – InterMine as a FAIR framework

We are excited to publish the latest version of InterMine, version 4.0.

It’s a collection of our efforts to make InterMine more “FAIR”. As an open source data warehouse, InterMine’s raison d’être is to be a framework that enables people to quickly and easily provide public access to their data in a user-friendly manner. InterMine has therefore always strived to make data Findable, Accessible, Interoperable and Reusable, and this push is meant to formally apply the FAIR principles to InterMine.

What’s included in this release?

  1. Generate globally unique and stable URLs to identify InterMine data objects in order to provide more findable and accessible data.
  2. Apply suitable ontologies to the core InterMine data model to make the semantics of InterMine data explicit and facilitate data exchange and interoperability.
  3. Embed metadata in InterMine web pages to make data more findable
  4. Improve accessibility of data licenses for integrated sources via web interface and REST web-service.

More details below!

How to upgrade?

This is a non-disruptive release, but there are additions to the data model. Therefore, you’ll want to increment your version, then build a new database when upgrading. No other action is required.

However, keep reading to find out how to take advantage of the new FAIR features in this release.

Unique and stable URLs

We’ve added a beautiful new user-friendly URL.

Example: http://beta.flymine.org/beta/gene:FBgn0000606

Currently this is used only in the “share” button in the report pages and in the web pages markup. In the future, this will be the only URL seen in the browser location bar.

For details on how to configure your mine’s URLs, see the docs here.

See our previous blog posts on unique identifiers.
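
The example URL in this section has the shape `<base>/<class>:<identifier>`. The sketch below builds a URL of that shape. The pattern is inferred from the single example above rather than from the InterMine documentation, and the function name is hypothetical, so treat it as an illustration and not as InterMine's own implementation.

```python
from urllib.parse import quote


def permanent_url(base_url: str, class_name: str, identifier: str) -> str:
    """Build a URL shaped like the example above: <base>/<class>:<identifier>."""
    base = base_url.rstrip("/")
    # Percent-encode the identifier so unusual characters stay URL-safe.
    return f"{base}/{class_name.lower()}:{quote(identifier, safe='')}"


print(permanent_url("http://beta.flymine.org/beta", "Gene", "FBgn0000606"))
# prints http://beta.flymine.org/beta/gene:FBgn0000606
```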

Decorating the InterMine data model with ontology terms

InterMine 4.0 introduces the ability to annotate your InterMine data model with ontology terms.

While these data are not used (yet), it’s an important feature in that it’s going to facilitate cross-InterMine querying, and eventually cross-database analysis — allowing us to answer questions like “Is the ‘gene’ in MouseMine the same ‘gene’ at the EBI?”.

For details on how to add ontologies to your InterMine data model, see the docs here.

Embedding metadata in InterMine webpages

We’ve added structured data to web pages in JSON-LD format to make data more findable, and these data are indexed by Google data search. Bioschemas.org is extending Schema.org with life science-specific types, adding required properties and cardinality on the types. For more details see the docs here.

By default this feature is disabled. For details on how to enable embedding metadata in your webpages, see the docs here.
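
To illustrate what embedding JSON-LD in a web page looks like in general, here is a minimal sketch that wraps a metadata dictionary in a `<script type="application/ld+json">` element. The function name and every value in the example record are placeholders. The properties InterMine actually emits, and the Bioschemas profiles it follows, are described in the docs and are not reproduced here.

```python
import json


def jsonld_script_tag(metadata: dict) -> str:
    """Wrap a metadata dictionary in a JSON-LD <script> element."""
    payload = json.dumps(metadata, indent=2)
    # Escape "</" so the payload cannot close the script element early;
    # "\/" is a valid JSON escape for "/".
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical data set description; every value is a placeholder.
example = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example data set",
    "description": "Placeholder description of a data set integrated in a mine.",
    "url": "https://example.org/mine/dataset:example",
}

print(jsonld_script_tag(example))
```

A page template would place the returned string inside the page's HTML so that crawlers can read the metadata without it affecting what visitors see.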

Data licences

In our ongoing effort to make the InterMine system more FAIR, we have started working on improving the accessibility of data licences, retaining licence information supplied by the data sources integrated in InterMine, and making it available to humans via our web application and machines via queries.

See our previous blog post on data licences.

For details on how to add data licences to your InterMine, see the docs.

Future FAIR plans

  1. Provide an RDF representation of stored data, lists and query results, and a bulk download of all InterMine data in RDF form, so that users can import InterMine resources into their local triplestore
  2. Provide an infrastructure for a SPARQL endpoint where the user can perform federated queries over multiple data sets

Upcoming Releases

The next InterMine version will likely be ready in the Fall/Winter and include some user interface updates.

Docs

To update your mine with these new changes, see upgrade instructions. This is a non-disruptive release.

See release notes for detailed information.

InterMine 3.1.2 – patch release

We’ve released a small batch of bug fixes and small features. Thank you so much to our contributors: Sam Hokin, Arunan Sugunakumar and Joe Carlson!

Features

  • Templates can be tagged by any user, not just the super user. (Via webservice only – for now)

Fixes

  • When searching our docs, sometimes the “.html” extension was dropped. This was fixed by our beautiful documentation hosters – readthedocs.org
  • Installing the “bio” project via Gradle does not fail if you do not have the test properties file.
  • Gradle logs error fixed
  • Removed old GAF 1.0 code
  • Fixed XML library issue:  java.lang.ClassCastException for org.apache.xerces
  • Set converter.class correctly
  • Updated the protein atlas expression graph
  • Handle NULL values returned by NCBI web services
  • Updated Solr to support new Solr versions
  • Removed unneeded Gretty plugin
  • Better error handling for CHEBI web services
  • Fixed an error that occurred when a publication abstract is longer than the Postgres index allows
  • Removed phenotype key, as it’s not in the core model and has a conflicting key
  • Updated ObjectStoreSummary to handle ignored fields consistently.

Upcoming Releases

InterMine 4.0 is scheduled for release the week of 7 May 2019.

Docs

To update your mine with these new changes, see upgrade instructions. This is a non-disruptive release.

See release notes for detailed information.