InterMine 3.1 – Extending the Core InterMine Data Model with Multiple Genome Versions, Strains

Advances in sequencing technologies mean that genome sequence and annotation data for multiple strains of a species are now often available. We therefore decided to update the InterMine core data model to allow Strain data to be added where it is available, without affecting InterMines that do not have this data.

We decided to add a new class, Strain, which references Organism and SequenceFeature and is referenced by them in turn. This provides the flexibility required now and allows further data to be added, and the model expanded, later if required.

 


The Strain class has the following features/advantages:

  • SequenceFeature entities, such as Genes, would continue to reference Organism, but would also reference the new Strain class, allowing for queries returning SequenceFeatures for a specific strain.
  • Providing strain information as a separate class allows individual InterMines to reference other information as required, such as Genotype and Stocks.
  • The Strain class extends BioEntity, so it includes strain-relevant attributes such as PrimaryIdentifier and Name, and references other collections such as synonym.
  • Minimal changes to the user interface will be required as, to our knowledge, SequenceFeatures in individual strains always have a unique identifier. With the help of templates if necessary, users will be able to identify particular SequenceFeatures and which strain they originate from.
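To make these relationships concrete, here is a minimal Python sketch of the classes described above. It is illustrative only: InterMine defines its model in its own data model files, not in Python, and the field names (primary_identifier, strains, features) and helper functions (add_strain, add_feature, features_in_strain) are hypothetical. It shows Strain inheriting BioEntity attributes, references running in both directions, and a SequenceFeature keeping its Organism reference while also pointing at a Strain.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BioEntity:
    primary_identifier: str
    name: Optional[str] = None
    synonyms: List[str] = field(default_factory=list)


@dataclass
class Organism:
    name: str
    # Reverse reference to this organism's strains (hypothetical field name).
    # Excluded from repr/eq so circular references do not recurse.
    strains: List["Strain"] = field(default_factory=list, repr=False, compare=False)


@dataclass
class Strain(BioEntity):
    organism: Optional[Organism] = None
    # Reverse reference to the features annotated on this strain.
    features: List["SequenceFeature"] = field(default_factory=list, repr=False, compare=False)


@dataclass
class SequenceFeature(BioEntity):
    organism: Optional[Organism] = None
    strain: Optional[Strain] = None  # stays None in mines without strain data


def add_strain(organism: Organism, strain: Strain) -> None:
    """Link a strain and its organism in both directions."""
    strain.organism = organism
    organism.strains.append(strain)


def add_feature(strain: Strain, feature: SequenceFeature) -> None:
    """Attach a feature to a strain; the feature keeps its Organism reference too."""
    feature.strain = strain
    feature.organism = strain.organism
    strain.features.append(feature)


def features_in_strain(features: List[SequenceFeature], strain_id: str) -> List[SequenceFeature]:
    """Return the features annotated on the strain with the given primary identifier."""
    return [f for f in features
            if f.strain is not None and f.strain.primary_identifier == strain_id]
```

For example, with two hypothetical strains STRAIN-1 and STRAIN-2, each carrying its own copy of a gene with a unique identifier, features_in_strain filters a mixed list down to the copies from one strain, while every feature still answers to its Organism.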

 

To update your mine with these changes, see the upgrade instructions. This is a non-disruptive release.

See the release notes and the notes from the community call for more details. Please join our community calls if you’d like to be part of future data model decisions! (Details of upcoming calls are available via our developer mailing list.)


InterMine, Oracle and the Future of Java

There have been a few questions about Oracle’s announcements on the future of Java, so this post covers what has actually changed and how it affects InterMine as a software package.

In short, these changes do not impact InterMine negatively, but we should be aware of these issues.

Oracle JDK 11 is not free for use in production; use OpenJDK instead

Oracle has changed its licensing. Starting with Java 11, Oracle releases its two JDKs under different licences:

  1. OpenJDK (open source under GPL)
  2. Oracle JDK (commercial licence)

(Previously, Oracle JDK was released under the Binary Code License (BCL), which allowed a mix of free and commercial use, so you only had to pay “sometimes”.)

To use Oracle JDK 11 in a production environment, you now need to purchase a commercial licence. You are still allowed to use it for development, demos, etc., but Oracle JDK 11 is NOT free to use in production.

We develop InterMine against (and recommend people use) OpenJDK instead of the commercial JDK Oracle provides. As of Java 11, the two JDKs are virtually identical, so this is safe.

Oracle JDK 8: “End of Public Updates”; use OpenJDK instead

Oracle will provide public updates of Oracle JDK 8 through at least December 2020 for personal desktop use and January 2019 for commercial use. You can continue to use Oracle’s JDK indefinitely without updates, but that’s a bad idea for security and functionality reasons. If you want updates to Java 8, switch to OpenJDK: free OpenJDK builds are available from other providers such as AdoptOpenJDK, Azul, IBM, Red Hat and various Linux distributions.

OpenJDK binaries from Oracle will only be provided until the next JDK release; use OpenJDK from a non-Oracle provider

Oracle has changed its release schedule to twice a year, and it will not provide an LTS (long-term support) release of OpenJDK. Oracle will not provide updates to older OpenJDK versions, i.e. versions more than six months old. This includes security fixes!

This is troubling as the InterMine release schedule is such that it’s not feasible to update Java versions every six months. But we can’t ignore needed security fixes.

However, Red Hat announced in September that it would take a leadership role in this area, and others, such as AdoptOpenJDK (https://adoptopenjdk.net), plan to offer OpenJDK LTS releases for free. So there will be OpenJDK LTS releases available, just not from Oracle.

What does this all mean for InterMine? Not Much!

We’ll keep monitoring the situation, but this seems like the usual way that companies manage open source projects: providing open software plus additional paid support. So there is nothing to be alarmed about. OpenJDK is open source, so we are safe.

People are (rightly?) concerned about Oracle’s true commitment to Java and open source going forward. What if Oracle changes its mind and stops releasing updates to OpenJDK? For InterMine this isn’t too scary: in the worst case, we could use an older stable version of Java. More likely, in this nightmare scenario Java would be forked and we could carry on.

Future InterMine plans

We have no plans to migrate away from Java and will continue to develop using OpenJDK as normal. We develop against the Java specification rather than a specific version, so we aren’t tied to a particular Java release. For now, we recommend staying with OpenJDK 8, but we plan to start testing with Java 11 soon.
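If you are not sure which JDK a server is running, the first line of `java -version` output tells you. The Python sketch below parses that line; it is a helper of our own for illustration, not an InterMine tool. It assumes OpenJDK builds print `openjdk version "11.0.1"` while Oracle JDK prints `java version "11.0.1"` (other vendors may also print `java version`, so treat the prefix as a hint only), and it handles both the old `1.8.0_192` numbering and the newer scheme used from Java 9 onwards, such as `11.0.1`. Note that `java -version` writes to standard error, so capture it with e.g. `subprocess.run(["java", "-version"], capture_output=True, text=True).stderr` (Python 3.7+).

```python
import re
from typing import Tuple

# Matches e.g.: openjdk version "1.8.0_192"  or  java version "11.0.1" 2018-10-16 LTS
_VERSION_LINE = re.compile(r'^(\w+) version "([^"]+)"')


def parse_java_version_line(line: str) -> Tuple[str, int]:
    """Parse the first line of `java -version` output.

    Returns (launcher_prefix, major_version), e.g. ("openjdk", 8).
    """
    match = _VERSION_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised java -version line: {line!r}")
    prefix, version = match.groups()
    parts = re.split(r"[._\-+]", version)
    # Java 8 and earlier report "1.<major>..."; Java 9+ start with the major number.
    major = int(parts[1]) if parts[0] == "1" else int(parts[0])
    return prefix, major
```

A result of ("openjdk", 8) matches the current recommendation; ("java", 11) suggests an Oracle JDK 11, which needs a commercial licence in production.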

Although some are suspicious of Oracle due to past experiences, we are optimistic about the future of Java, as the community really seems to be responding to the need for a secure and open Java.

More reading:

 

 

InterMine Releases – Winter 2018 Update (Solr, Strains and being more FAIR)

Here’s a list of recent and upcoming InterMine releases.

InterMine 3.0 – Solr

Just released! This is the Solr project we discussed over the summer, done as part of Google Summer of Code (thanks again, Arunan!). See our blog post for details.

InterMine 3.1 – Strains

This will be released next week. The release will include the data model changes we discussed on the last community call: we’ve added Strain to the core data model, with references to Organism and SequenceFeature.

Sam’s built a test mine you can query to preview the updates.

This will not be a disruptive release, except you may want to update your strains to match the core InterMine data model.

InterMine 3.1.1 – Bug fixes

3.1.1 is a small release comprising a few very small but useful bug fixes and features. If you have something specific you need done, please ask!

This will not be a disruptive release.

InterMine 4.0 – FAIR

We’ve been making InterMine more FAIR! This release will include things like licence information for data sets and ontologies to describe the data model. More details soon! We hope to have this release ready in late January 2019.

This will not be a disruptive release.

Thanks for reading! As always, if you have any questions, please hop onto our Discord server (chat.intermine.org) or drop us an email.

Helpful Links:

Release Notes

Upgrade Instructions

 

InterMine 3.0 – Solr search

InterMine 3.0 is now available and features a brand new search powered by Solr.

The default search configuration works well, but Solr can be configured extensively to suit your specific needs.

The first search after deployment is now instant, you can inspect the search index directly (via http://localhost:8983/solr/), and there is a facet web service (via /service/facet-list and /service/facets?q=gene). Certain bugs, such as searching for the gene “OR”, are also now fixed.
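The facet web service endpoints above are plain HTTP GET requests. Here is a minimal Python sketch for building and calling them, assuming your mine is served at a hypothetical base URL (MINE_URL) and that the service returns JSON; check the web service docs for the actual response format.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Hypothetical base URL: replace with the address of your own mine.
MINE_URL = "https://example.org/mymine"


def facet_list_url(base_url: str = MINE_URL) -> str:
    """Build the URL of the /service/facet-list endpoint."""
    return base_url.rstrip("/") + "/service/facet-list"


def facets_url(query: str, base_url: str = MINE_URL) -> str:
    """Build the URL of the /service/facets endpoint for a keyword query."""
    # urlencode escapes the query, e.g. spaces become "+".
    return base_url.rstrip("/") + "/service/facets?" + urlencode({"q": query})


def fetch_json(url: str, timeout: float = 30.0):
    """GET a URL and decode the body as JSON (assumes a JSON response)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For a local mine you might call fetch_json(facets_url("gene", "http://localhost:8080/mymine")); the host, port and mine path there are placeholders.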

New Configuration Option – optimize

There is a new keyword search configuration setting: index.optimize. If set to `true`, the index is reorganised so that its chunks are stored together, which may improve search times (similar to defragmenting a hard disk). See the configuration docs for more details.

Docs

Installing Solr

Configuring the keyword search

InterMine 3.0 upgrade instructions | release notes

A big thank you to our clever and hard-working 2018 Google Summer of Code student Arunan Sugunakumar — who did the bulk of the work as part of his summer project. Great job!

STORM + InterMine: A partnership in the fight against cancer

In July 2018 Innovate UK awarded InterMine at the University of Cambridge and STORM Therapeutics a Knowledge Transfer Partnership (KTP). A KTP is a government program that helps businesses in the UK by linking them with an academic organisation — enabling them to bring in new skills and the latest academic thinking to deliver a specific, strategic innovation project.

The key objective of this particular project is to develop an analysis platform using the data warehouse InterMine to help STORM advance their cancer research.

Here we talk with Hendrik Weisser, Senior Bioinformatician at STORM, about this collaboration.

Can you tell me about this project?

Sure, my company (STORM) is partnering with InterMine in this project. We are going to develop a computational knowledge base for cancer drug discovery and RNA epigenetics, based on InterMine’s HumanMine database. We will extend InterMine by adding analysis tools, more biomedical data etc. to make it a bespoke platform to help us identify and validate drug targets.

Can you tell me more about STORM?

STORM Therapeutics is a drug discovery company focused on RNA epigenetics, developing small-molecule inhibitors of RNA-modifying enzymes for the treatment of cancer. We are a spin-out of Cambridge University, founded in 2015 by professors Eric Miska and Tony Kouzarides from the Gurdon Institute. You can find more information – and a cool animated video about RNA epigenetics – on our website, www.stormtherapeutics.com.

What do you hope to achieve?

For STORM, convenient access to available data on RNA-modifying enzymes, their roles in RNA epigenetics, and their associations to different cancers – both direct and via interaction partners – is vital for our efforts in target validation, indication prioritisation and patient stratification. A large amount of relevant data is publicly available but is scattered over many sources and not integrated, thus difficult and time-consuming to fully utilise. STORM’s vision is to develop an integrated database of relevant human biomedical data, that should enable our scientists to quickly view and interrogate the most pertinent data on target genes/proteins, but also allow us to easily perform bioinformatic analyses on these data.

What attracted you to InterMine? What makes InterMine a useful tool for drug discovery?

I found out about InterMine’s existence by chance and then quickly signed up to an InterMine training course at Cambridge University to learn more. I was impressed by the wealth of functionality offered by InterMine and by its sophisticated architecture that enables huge flexibility in dealing with different kinds of biological data. InterMine really represents the state of the art in terms of large-scale complex biomedical data integration. By focusing on extensibility and customisation and on enabling local installations, InterMine is able to serve a variety of research communities. These capabilities also make it an ideal fit for STORM’s requirements for an internal data management system that integrates diverse public data. The fact that InterMine is open-source, i.e. the code is and will stay available, is also important for us because it helps to ensure long-term maintainability.

---

For more information see STORM’s website.

 

 

InterMine 2.1.0 release

We pushed out a few bug fixes and improvements:

  • FIX – “Update publications” data source failed when too many PubMed IDs were sent to the Entrez web service (Thank you to Norbert Auer!)
  • FIX – Small bug when generating Python code
  • FIX – FASTA query web service times out when extensions are used (Thanks to Joel Richardson!)
  • FIX – Discontiguous CDS sequences and lengths not set properly
  • FIX – Some SO terms were not updated in 2.0 release
  • FIX – Region search: leaving the “extended region” field blank results in an error

See GitHub for details:

https://github.com/intermine/intermine/releases

How to Upgrade

Throughout the InterMine code, the InterMine version number is set via a system property. Here’s an example from a Gradle build file:

// Gradle downloads the bio-core JAR for the version given by the bioVersion system property
compile group: 'org.intermine', name: 'bio-core', version: System.getProperty("bioVersion")

To change which InterMine version you are using, update the values of the system properties “imVersion” and “bioVersion”. These are located in the “gradle.properties” file for your mine:

# gradle.properties in your mine
systemProp.imVersion=2.1.+
systemProp.bioVersion=2.1.+

Gradle will now download the latest matching version of the bio-core JAR, e.g. “bio-core-2.1.0.jar”.

If you set the property to “2.1.+” you will get any small point releases that are published in the future. You can set the property to be 2.1.0 if you ONLY want to use version “2.1.0” and do not want to receive updates:

# gradle.properties to only get specific version
systemProp.imVersion=2.1.0
systemProp.bioVersion=2.1.0
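To see why “2.1.+” picks up new point releases while “2.1.0” does not, here is a simplified Python model of Gradle’s prefix version selector. It is a sketch, not Gradle’s actual algorithm: real Gradle ordering also handles qualifiers such as -SNAPSHOT or -rc1, whereas this only compares purely numeric versions. The version lists used with it are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Sort key for plain numeric versions, e.g. "2.1.10" -> (2, 1, 10)."""
    return tuple(int(part) for part in version.split("."))


def resolve(selector: str, available: Iterable[str]) -> Optional[str]:
    """Return the version a selector would pick from the published versions.

    "2.1.+" is a prefix match: the highest version starting with "2.1.".
    Any other selector, e.g. "2.1.0", is treated as an exact version.
    """
    versions = list(available)
    if selector.endswith("+"):
        prefix = selector[:-1]  # "2.1.+" -> "2.1."
        candidates = [v for v in versions if v.startswith(prefix)]
    else:
        candidates = [v for v in versions if v == selector]
    # Compare numerically, so 2.1.10 is newer than 2.1.9.
    return max(candidates, key=version_key, default=None)
```

With published versions 2.1.0, 2.1.1 and 2.10.0, resolve("2.1.+", ...) returns "2.1.1": the prefix includes the trailing dot, so 2.10.0 is not matched, while a pinned "2.1.0" keeps returning 2.1.0 no matter what is published later.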

Here is an example:

HumanMine upgrade to use the latest version.

InterMine 3.0 – Solr

The next InterMine release will be InterMine 3.0, which will include Solr. See our Solr blog post for details.

We are currently testing Solr with InterMine and should have a version ready for public beta testing early next month.

 

InterMine 2.0

We are excited to announce the official release of InterMine 2.0!

InterMine 2.0 includes some model updates, a big change in how InterMine itself is built, lots of new features, such as a new UI, and a long list of bug fixes. See the full list of updates here.

This release represents a large milestone for the InterMine team! Not only did we make big, fundamental changes to the core InterMine data model and build system, but this release also represents a major shift in philosophy for us. Previously, InterMine was a big, monolithic, single piece of software: you downloaded the whole of InterMine, you compiled the whole of InterMine, and you got the whole of InterMine. Now we are moving towards modularity and responsiveness: smaller, independent libraries that are interconnected but can be used separately for individual tools and features, or linked together.

Smaller, decoupled InterMine packages will allow us to develop more features faster and with fewer errors. InterMine maintainers should then have the flexibility to include (or leave out) features in their mine, plug in their own tools, etc.

Version 2.0 represents a big step towards this goal!

A New Interface

A new feature in InterMine 2.0 is the ability to run our new UI, nicknamed “Blue Genes”. This app is in addition to the current webapp and offers a new and responsive search environment for your InterMine data.

Blue Genes is a new UI, built in Clojure, that provides a modern user experience:

  • Super fast response times
  • Interactive list upload
  • Redesigned “My account” section
  • Search autocomplete
  • Template and query builder result previews
  • and lots more!

Once you have your InterMine updated to InterMine 2.0, there is a single command that will launch Blue Genes for your mine.

We are actively seeking feedback on Blue Genes. It’s still very much in beta, so please get in touch once you have some opinions!

Special Thanks

Thanks to everyone who helped test this release! Thanks Howie Motenko at MGI for your alpha testing and model insights. And a BIG thank you goes to Sam Hokin from the NCGR who spent a lot of time and effort helping improve InterMine! Thanks Sam and Howie! You are much appreciated.

Helpful Links

What exactly we changed (blog post)

Full list of GitHub tickets included in this release

Docs on how to upgrade to 2.0

 

As always, please contact us if you have any questions or comments! We have an active Twitter account, a Discord server at chat.intermine.org, and a low-traffic mailing list.