InterMine 4.1.0

The InterMine team has just released InterMine 4.1.0.

The new release includes better integration with Galaxy: we can import data into Galaxy from any InterMine of our choice (starting from either InterMine or Galaxy), and we can export a list of identifiers from Galaxy to any InterMine of our choice through the InterMine registry. There is no need to configure anything any more: all the Galaxy properties have been moved to InterMine core. There is also no need to create a mine-specific Galaxy tool; use the new intermine tool instead. Please read here for more details. A simple InterMine tutorial will be published soon in the Galaxy Training Material, under the Data Manipulation topic.

This release integrates with ELIXIR AAI (Authentication and Authorisation Infrastructure), allowing researchers to log in to InterMine instances using their ELIXIR profile. You will need to:

  1. have an ELIXIR identity
  2. register the InterMine client in order to obtain the client-id and the client-secret, which must be set in the mine properties file.

More details here in the OpenAuth2 Settings section of the documentation.

Also new in this version is the Gradle wrapper 4.9, which is compatible with Java 11. This only affects users who compile or install InterMine code.

Thank you so much to our contributor Joe Carlson for improving the generateUpdateTriggers task.

The release also contains a few bug fixes.

Bug Fixes

  • Solved the error caused by obsolete terms in the gene ontology
  • Fasta query result: CDS translation option + extra view parameter
  • The ONE OF constraint works properly when editing a template
  • The default queries configuration has been migrated to JSON
  • The task generateUpdateTriggers has been improved

See the release notes for the complete list and detailed information.

This is a non-disruptive release. To update your mine with these new changes, see the upgrade instructions.

Status update for BlueGenes

It’s been a while since we posted our last (rather optimistic) update around BlueGenes, so we thought we’d share a quick update, starting with the basics.

As a reminder, the long-term goal of BlueGenes is to replace the existing JSP-based UI with a more modern interface – one that works well with mobiles, one that hopefully responds more quickly and is easier to use, and perhaps most importantly, is easy to update and customise.

Some of the questions we’ve had in the last few months:

Q: Will BlueGenes replace the current JSP UI?

A: Yes, eventually. Once we reach official beta/prod release (we’re currently in alpha), we anticipate running them concurrently for a couple of years, but we probably will only provide small fixes for the JSP UI during this period, focusing most of our development effort on BlueGenes.

Q: Do I have to run my own BlueGenes, or can I use the central one at apps.intermine.org?

A: Since BlueGenes is powered purely by web services, it will probably be possible to run your InterMine as a server/api-only service and use BlueGenes at bluegenes.apps.intermine.org/. You can also run your own BlueGenes on your servers and domains, allowing you to customise it so it’s suitable for your data, and not having to rely on our uptime. Either (or both) should work fine. There will be some version requirements related to what version of InterMine can access all the features of BlueGenes – see the next point.

Q: What version of InterMine do I need to have to run BlueGenes?

A: BlueGenes will require a minimum version of InterMine to run. The original release of InterMine web services focused primarily on giving JSP users programmatic access to their data, and at the time there wasn’t an anticipated need for application-level services such as superuser actions. There are a few web services and authentication-layer services we still need to implement, so it’s likely BlueGenes will need API version 31 or higher in order to be fully featured. InterMines with API version 27 or higher can run a basic version of BlueGenes. You can check out this table to see if your InterMine is configured to work with BlueGenes.
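To make the version thresholds concrete, here is a minimal Python sketch of the check described in this answer. The function name and the returned labels are hypothetical, and the threshold of 31 is only the "likely" value mentioned above, so treat it as an assumption rather than a final requirement.

```python
# Thresholds taken from the answer above; 31 is described there as "likely",
# so it may change before BlueGenes reaches public beta.
FULL_FEATURE_API_VERSION = 31
BASIC_API_VERSION = 27


def bluegenes_support(api_version: int) -> str:
    """Classify an InterMine API version (hypothetical helper, not part of BlueGenes)."""
    if api_version >= FULL_FEATURE_API_VERSION:
        return "full"
    if api_version >= BASIC_API_VERSION:
        return "basic"
    return "unsupported"


if __name__ == "__main__":
    for version in (25, 27, 30, 31):
        print(version, bluegenes_support(version))
```

How you obtain the API version of a given mine is left out on purpose; the table mentioned above is the reference for that.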

Q: Ok, so what’s left to do before BlueGenes is released as a public beta?

A: Mostly authentication, superuser and MyMine features – things like saving and updating personal templates, sorting lists in folders, updating preferences and passwords. Some of these features require updates to InterMine itself in order to work – hence the minimum version noted in the previous question. Once these are ready we’ll move to the public beta stage.

Your input here will be incredibly welcome, too – the more feedback we get early on, the more polished we hope BlueGenes can be.

Q: Will BlueGenes work nicely with HTTPS InterMines?

A: You will be able to run BlueGenes without HTTPS, but in order to avoid inadvertently exposing user passwords, the login button will only be available over HTTPS connections. We’re also working with a student over the next few months, to implement a pilot InterMine Single Sign On service. You can read about it in our interview with Rahul Yadav.

Q: Will I be able to customise the way BlueGenes looks?

A: Totally! There are two ways you can do this. One is to make sure you have your logo and colour settings configured in your web properties. We have a nice guide for that. This’ll tell us what your preferred highlight colours are – FlyMine is purple, HumanMine green, etc. If you’re really dedicated and would like to write your own CSS, you can do that too, if you’re running your own InterMine/BlueGenes combo.

Q: I have some nice custom visualisation tools in my InterMine. I don’t want to have to re-write them!

A: We don’t want you to re-write them either! It depends how they’re implemented in your mine, but we’ve designed the BlueGenes Tool API with you in mind, and many Javascript-powered tools will require only a few lines of code to become BlueGenes ready.

As an example, the Cytoscape interaction viewer currently used in some InterMines only requires 20 lines of code to import into BlueGenes, plus a few lines of config – all the other files (and most of the config too) are boilerplate that we auto-generated.

InterMine 4.0 – InterMine as a FAIR framework

We are excited to publish the latest version of InterMine, version 4.0.

It’s a collection of our efforts to make InterMine more “FAIR”. As an open source data warehouse, InterMine’s raison d’être is to be a framework that enables people to quickly and easily provide public access to their data in a user-friendly manner. InterMine has therefore always strived to make data Findable, Accessible, Interoperable and Reusable, and this push is meant to formally apply the FAIR principles to InterMine.

What’s included in this release?

  1. Generate globally unique and stable URLs to identify InterMine data objects in order to provide more findable and accessible data.
  2. Apply suitable ontologies to the core InterMine data model to make the semantics of InterMine data explicit and facilitate data exchange and interoperability.
  3. Embed metadata in InterMine web pages to make data more findable
  4. Improve accessibility of data licenses for integrated sources via web interface and REST web-service.

More details below!

How to upgrade?

This is a non-disruptive release, but there are additions to the data model. Therefore, you’ll want to increment your version, then build a new database when upgrading. No other action is required.

However, keep reading for how to take advantage of the new FAIR features in this release.

Unique and stable URLs

We’ve added a beautiful new user-friendly URL.

Example: http://beta.flymine.org/beta/gene:FBgn0000606

Currently this is used only in the “share” button in the report pages and in the web pages markup. In the future, this will be the only URL seen in the browser location bar.
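As a rough illustration of the URL pattern in the example above, the following Python sketch builds a link of the form base/class:identifier. The function name is hypothetical, and lower-casing the class name is an assumption inferred from the single example shown; the real scheme is whatever your mine's configuration defines.

```python
from urllib.parse import quote


def permanent_url(base_url: str, class_name: str, identifier: str) -> str:
    """Build a '<base>/<class>:<identifier>' link (hypothetical helper).

    Assumption: the class name is lower-cased, as in the 'gene:FBgn0000606' example.
    """
    base = base_url.rstrip("/")
    # Percent-encode the identifier so unusual characters cannot break the URL.
    return f"{base}/{class_name.lower()}:{quote(identifier, safe='')}"


if __name__ == "__main__":
    print(permanent_url("http://beta.flymine.org/beta", "Gene", "FBgn0000606"))
```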

For details on how to configure your mine’s URLs, see the docs here.

See our previous blog posts on unique identifiers.

Decorating the InterMine data model with ontology terms

InterMine 4.0 introduces the ability to annotate your InterMine data model with ontology terms.

While these data are not used (yet), it’s an important feature in that it’s going to facilitate cross-InterMine querying, and eventually cross-database analysis — allowing us to answer questions like “Is the ‘gene’ in MouseMine the same ‘gene’ at the EBI?”.

For details on how to add ontologies to your InterMine data model, see the docs here.

Embedding metadata in InterMine webpages

We’ve added structured data in JSON-LD format to web pages to make data more findable, and these data are indexed by Google data search. Bioschemas.org is extending Schema.org with life science-specific types, adding required properties and cardinality on the types. For more details see the docs here.

By default this feature is disabled. For details on how to enable embedding metadata in your webpages, see the docs here.
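For readers who have not seen JSON-LD embedded in a page before, here is a small Python sketch of the general idea. It is not the markup InterMine actually emits: the choice of the "Gene" type and of the identifier and url properties is an assumption for illustration, and both function names are hypothetical.

```python
import json


def gene_jsonld(identifier: str, url: str) -> dict:
    """Describe a gene as a JSON-LD document (illustrative property choice only)."""
    return {
        "@context": "https://schema.org",
        "@type": "Gene",  # assumed type name, used here purely as an example
        "identifier": identifier,
        "url": url,
    }


def jsonld_script_tag(document: dict) -> str:
    """Wrap a JSON-LD document in the script element used to embed it in HTML."""
    return '<script type="application/ld+json">' + json.dumps(document) + "</script>"


if __name__ == "__main__":
    doc = gene_jsonld("FBgn0000606", "http://beta.flymine.org/beta/gene:FBgn0000606")
    print(jsonld_script_tag(doc))
```

Crawlers read the script element and ignore it for display, so the visible page is unchanged.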

Data licences

In our ongoing effort to make the InterMine system more FAIR, we have started working on improving the accessibility of data licences, retaining licence information supplied by the data sources integrated in InterMine, and making it available to humans via our web application and machines via queries.

See our previous blog post on data licences.

For details on how to add data licences to your InterMine, see the docs.

Future FAIR plans

  1. Provide an RDF representation of data stored, lists and query results, and the bulk download of all InterMine in RDF form, in order to allow the users to import InterMine resources into their local triplestore
  2. Provide an infrastructure for a SPARQL endpoint where the user can perform federated queries over multiple data sets

Upcoming Releases

The next InterMine version will likely be ready in the Fall/Winter and include some user interface updates.

Docs

To update your mine with these new changes, see upgrade instructions. This is a non-disruptive release.

See release notes for detailed information.

InterMine 3.1.2 – patch release

We’ve released a small batch of bug fixes and small features. Thank you so much to our contributors: Sam Hokin, Arunan Sugunakumar and Joe Carlson!

Features

  • Templates can be tagged by any user, not just the super user. (Via webservice only – for now)

Fixes

  • When searching our docs, sometimes the “.html” extension was dropped. This was fixed by our beautiful documentation hosts – readthedocs.org
  • Installing the “bio” project via Gradle does not fail if you do not have the test properties file.
  • Gradle logs error fixed
  • Removed old GAF 1.0 code
  • Fixed XML library issue:  java.lang.ClassCastException for org.apache.xerces
  • Set converter.class correctly
  • Updated the protein atlas expression graph
  • Handle NULL values returned by NCBI web services
  • Updated Solr to support new Solr versions
  • Removed unneeded Gretty plugin
  • Better error handling for CHEBI web services
  • Publication abstract is longer than postgres index
  • Removed phenotype key, as it’s not in the core model and has a conflicting key
  • Updated ObjectStoreSummary to handle ignored fields consistently.

Upcoming Releases

InterMine 4.0 is scheduled for release the week of 7 May 2019.

Docs

To update your mine with these new changes, see upgrade instructions. This is a non-disruptive release.

See release notes for detailed information.

Data integration and Machine Learning for drug target validation

Hi!

In this blog post I would like to give a brief overview of what I’m currently working on.

Knowledge Transfer Partnership: what & why?

First, in order to give context to this post, last year InterMine at University of Cambridge and STORM Therapeutics, a spin-out of University of Cambridge working on small modulating RNA enzymes for the treatment of cancer, were awarded a Knowledge Transfer Partnership (KTP) from the UK Government (read this post for more information). With this award, the objective is to help STORM Therapeutics advance their efforts in cancer research, and contribute to their ultimate goal of drug target validation.

As part of the KTP Award, a KTP Associate needs to be appointed by both the knowledge base (University of Cambridge) and the company (STORM). The KTP Associate acts as the KTP Project Manager and is in charge of the successful delivery of the project. For this project, I was appointed as the KTP Associate, with a Research Software Engineer / Research Associate role at the University of Cambridge, for the total duration of the project: 3 years.

Machine learning and a new mine: StormMine

Now that you know what the KTP project is about, and who is delivering it, let’s move on to more interesting matters. In order to successfully deliver this project, the idea is to use the InterMine data warehouse to build a knowledge base for the company, STORM, that enables their scientists to have all the relevant data for their research in a single, integrated place. For this reason, several new data sources will be integrated into STORM’s deployment of the InterMine data warehouse (StormMine, from now on), and appropriate data visualizations will be added.

Then, once the data is integrated, we can think towards analysing the data to gather insights that may help the company goals, such as applying statistical and Machine Learning methods to gather information from the data, as well as building computational intelligence models. This leads the way towards what I’ve been working on since my start in February, and will continue until July 2019.

In general terms, I’m currently focused on building Machine Learning models that learn to differentiate between known drug targets and non-targets from available biological data. This part of the work is going to be used as my Master’s Thesis, which I will hopefully deliver in July! Moreover, with this analysis, we will be able to answer three questions that are extremely relevant for STORM and that are leading the current work on the project. These questions are:

  1. Which are the most promising target genes for a cancer type?
  2. Which features are most informative in predicting novel targets?
  3. Given a gene, for which cancer types is it most relevant?

If you are interested in learning more about this work, stay tuned for the next posts, and don’t hesitate to contact me, either by email (ar989@cam.ac.uk) or on LinkedIn (click here)!

 

InterMine 3.1 – Extending the Core InterMine Data Model with Multiple Genome Versions, Strains

Advances in sequencing technologies mean that genome sequence and annotation data for multiple strains of a species are now often available. It was decided to update the InterMine core data model so that Strain data can be added where it is available, without affecting InterMines that do not have this data.

The update adds a new class, Strain, which is referenced by Organism and SequenceFeature and vice versa. This provides the flexibility required and leaves room for further data and expansion if needed.

The Strain class has the following features/advantages:

  • SequenceFeature entities, such as Genes, would continue to reference Organism, but would also reference the new Strain class, allowing for queries returning SequenceFeatures for a specific strain.
  • Providing strain information as a separate class allows individual InterMines to reference other information as required, such as Genotype and Stocks.
  • The Strain class extends BioEntity so will include strain-relevant attributes such as PrimaryIdentifier and Name and will reference other collections such as synonym.
  • Minimal changes to the user interface will be required as, to our knowledge, SequenceFeatures in individual strains always have a unique identifier. With the help of templates if necessary, users will be able to identify particular SequenceFeatures and which strain they originate from.
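The relationships in the list above can be pictured with a small Python sketch. This is only an illustration of the shape of the model, not InterMine code: the real model is defined in InterMine's own model files, the field names below are simplified, and all identifiers in the usage example are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BioEntity:
    primary_identifier: str
    name: Optional[str] = None
    synonyms: List[str] = field(default_factory=list)


@dataclass
class Organism:
    name: str
    strains: List[Strain] = field(default_factory=list)  # Organism references Strain


@dataclass
class Strain(BioEntity):
    organism: Optional[Organism] = None  # ...and Strain references Organism back


@dataclass
class SequenceFeature(BioEntity):
    organism: Optional[Organism] = None  # existing reference, unchanged
    strain: Optional[Strain] = None      # new reference; None for mines without strain data


def features_in_strain(features: List[SequenceFeature], strain_id: str) -> List[SequenceFeature]:
    """Return the features whose strain has the given primary identifier."""
    return [
        f for f in features
        if f.strain is not None and f.strain.primary_identifier == strain_id
    ]


if __name__ == "__main__":
    organism = Organism(name="example organism")        # hypothetical data
    strain_a = Strain("strain-A", organism=organism)
    organism.strains.append(strain_a)
    genes = [
        SequenceFeature("gene-1", organism=organism, strain=strain_a),
        SequenceFeature("gene-2", organism=organism),   # no strain information
    ]
    print([g.primary_identifier for g in features_in_strain(genes, "strain-A")])
```

Because the strain reference is optional, features without strain information keep working exactly as before, which mirrors the "without affecting InterMines which do not have this data" goal.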

To update your mine with these new changes, see upgrade instructions. This is a non-disruptive release.

See release notes and the notes from the community call for more details. Please join our community calls if you’d like to be part of future data model decisions! (Details of upcoming calls are available via our developer mailing list).

InterMine 3.0 – Solr search

InterMine 3.0 is now available and features a brand new search powered by Solr.

The default search configuration will work well, but Solr allows for endless configuration for your specific needs.

Now the first search after deployment is instant, you can inspect the search index directly (via http://localhost:8983/solr/) and there’s a facet web service (via /service/facet-list and /service/facets?q=gene). Certain bugs, e.g. searching for the gene “OR”, are also now fixed.
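If you want to try the facet web service from a script, the sketch below shows how the request URLs mentioned above could be assembled in Python. The mine base URL is a placeholder and the helper names are hypothetical; only the /service/facet-list and /service/facets?q=... paths come from this post, and the sketch does not make any network request.

```python
from urllib.parse import urlencode


def facet_list_url(mine_base_url: str) -> str:
    """URL of the facet-list web service for a mine (hypothetical helper)."""
    return f"{mine_base_url.rstrip('/')}/service/facet-list"


def facets_url(mine_base_url: str, query: str) -> str:
    """URL of the facets web service for a search term, with the term URL-encoded."""
    return f"{mine_base_url.rstrip('/')}/service/facets?{urlencode({'q': query})}"


if __name__ == "__main__":
    base = "http://localhost:8080/mymine"  # placeholder: replace with your mine's base URL
    print(facet_list_url(base))
    print(facets_url(base, "gene"))
```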

New Configuration Option – optimize

There is a new keyword search configuration setting: index.optimize. If set to `true`, it reorganises the index so that chunks are placed together in storage, which might improve search time (similar to defragmenting a hard disk). See the configuration docs for more details.

Docs

Installing Solr

Configuring the keyword search

InterMine 3.0 upgrade instructions | release notes

A big thank you to our clever and hard-working 2018 Google Summer of Code student Arunan Sugunakumar — who did the bulk of the work as part of his summer project. Great job!