InterMine Registry

When this project began, there was no single place from which up-to-date information about all InterMine instances (name, URL, description, versions, organism, colors, logo) could be retrieved at once. This led to hard-coded information and inefficient processes for obtaining these data. The InterMine Registry was conceived to solve these problems. It is a place where up-to-date information about every instance is stored and can be consumed by applications such as BlueGenes, the iOS app, InterMine R and the friendly mines tool, or by anyone else who needs it.

The core of the InterMine Registry is its RESTful API (http://registry.intermine.org/api-docs/). Running on Node.js with MongoDB, it provides methods (endpoints) to administer the instances in the registry (add, update and delete) and to search among them. Keeping the registry up to date is critical, so the Registry automatically updates all instances every 24 hours. In addition, a single instance or all instances can be updated manually using the API synchronization methods. Note that administering instances requires authentication.
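As a rough illustration of how a client application might consume the registry, the sketch below builds a search URL and pulls a few fields out of a JSON response. The `/service/instances` path, the `q` parameter and the response field names are assumptions and should be checked against the API docs linked above. The sample payload is made up for demonstration.

```python
import json
from urllib.parse import urlencode

# Assumed base URL of the registry service; check the API docs for the real path.
REGISTRY_BASE = "http://registry.intermine.org/service"


def build_search_url(query=None):
    """Return the URL for listing instances, optionally filtered by a search term."""
    url = f"{REGISTRY_BASE}/instances"
    if query:
        url += "?" + urlencode({"q": query})
    return url


def summarise_instances(payload):
    """Extract (name, url) pairs from a registry response body (assumed shape)."""
    data = json.loads(payload)
    return [(inst.get("name"), inst.get("url")) for inst in data.get("instances", [])]


# Made-up sample response, for illustration only.
SAMPLE = json.dumps({
    "instances": [
        {"name": "FlyMine", "url": "http://www.flymine.org/flymine"},
        {"name": "HumanMine", "url": "http://www.humanmine.org/humanmine"},
    ]
})

if __name__ == "__main__":
    print(build_search_url("fly"))
    for name, url in summarise_instances(SAMPLE):
        print(name, url)
```

A real client would fetch the URL (for example with `urllib.request.urlopen`) and pass the response body to `summarise_instances`.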

To complement the API, a fully responsive front-end web application is being developed (http://registry.intermine.org/), where anyone can browse all the InterMine instances and search among them. Instances are presented in a list view and a grid view, which serve the same purpose but differ in appearance. A world view also shows the location of each InterMine instance on a world map. In addition, authenticated users can administer the instances (add, update and delete) through the user interface.

This project is part of Google Summer of Code and is still under development by me, Leonardo Kuffó, an undergraduate student at ESPOL university (Guayaquil, Ecuador), mentored by Daniela Butano. The source code of the application can be found at https://github.com/intermine/intermine-registry

 

New Branding Parameters – Mine Update Needed

We have written several non-InterMine applications that require mine-specific displays. For example:

  • iOS app needs colour and logo to distinguish between mines
  • BlueGenes app needs config from the mine to brand the site
  • InterMine home page
  • Registry UI
  • InterMine R – shiny app
  • Friendly mines tool

And there may be more applications in the future!

To make your logo and mine colour available to these applications, please set these properties in your web.properties file:

  • branding.images.logo: the logo image, which should be 45px by 45px. Defaults to the InterMine logo.
  • branding.colors.header.main: the main colour for your mine. Defaults to grey, #595455.
  • branding.colors.header.text: a text colour designed to be readable against your main colour. Defaults to white, #fff.

You will have to restart your webapp for these to take effect. You can view these parameters at the /branding API endpoint, e.g. flymine.org/flymine/service/branding
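To make the defaults concrete, here is a minimal sketch of how a consuming application might read the three branding properties and fall back to the documented defaults. It parses a properties-style text directly. The parser is deliberately simplified, and the example logo URL and colour are made up.

```python
# Defaults as described above. The default logo is the InterMine logo,
# represented here by None because its location is not given in this post.
DEFAULTS = {
    "branding.images.logo": None,
    "branding.colors.header.main": "#595455",
    "branding.colors.header.text": "#fff",
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comment lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def resolve_branding(text):
    """Return the three branding values, falling back to the defaults."""
    props = parse_properties(text)
    return {key: props.get(key, default) for key, default in DEFAULTS.items()}


# Hypothetical web.properties fragment; the logo URL and colour are made up.
EXAMPLE = """
# branding
branding.images.logo = https://example.org/mymine/logo-45x45.png
branding.colors.header.main = #5c0075
"""

if __name__ == "__main__":
    for key, value in resolve_branding(EXAMPLE).items():
        print(key, "=", value)
```

Because the example fragment does not set the text colour, `resolve_branding` returns the default white for it.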

Here are the docs on the web.properties file, and here is FlyMine’s web.properties file. There’s also an example on Codepen.

If you need help finding the right colour, we can help, or try a colour picker!

 

InterMine 2.0: Proposed Model Changes (II)

We have several new additions and changes to the InterMine core data model coming in InterMine 2.0 (due Fall 2017).

We had a great discussion on Thursday about the proposed changes. Below are the decisions we made.

Multiple Genome Versions

Many InterMine instances have several different genome versions.

Proposed addition to the InterMine core data model

  <class name="Organism" is-interface="true">
    <attribute name="annotationVersion" type="java.lang.String"/>
    <attribute name="assemblyVersion" type="java.lang.String"/>
  </class>

Multiple Varieties / Subspecies / Strains

We were going to add variety to the Organism data type to indicate subtypes that have the same taxon ID. However, some people expressed concern that this term wasn’t generic enough.

Proposed addition to the InterMine core data model

  <class name="Organism" is-interface="true">
    <attribute name="variety" type="java.lang.String"/>
  </class>

Other suggestions:

  1. Strain
  2. Subspecies
  3. Stock
  4. Line
  5. Accession
  6. Subtype
  7. Ecotype
  8. Isolate
  9. Others? …

It was suggested that we take a vote to choose the name. Please note that you can overwrite attribute names locally. But it would be better if we could all (mostly) agree!

User Interface

Both the above changes will require updates to the core InterMine code where it is assumed that Organism.taxonID is the unique field. This assumption will be replaced so that the new fields in Organism, where present, are used for the primary key.

For user friendliness, it will be necessary to assign unique organism names. Users will then be able to easily identify distinct versions in template queries and widgets.
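The idea of a key built from more than the taxon ID can be sketched as follows. This is only an illustration of the principle, not InterMine's key configuration: the Python class, its method names and the example values are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Organism:
    taxon_id: int
    name: str
    annotation_version: Optional[str] = None
    assembly_version: Optional[str] = None
    variety: Optional[str] = None

    def primary_key(self) -> Tuple:
        """Taxon ID combined with the new fields (None where a field is absent)."""
        return (self.taxon_id, self.variety, self.assembly_version, self.annotation_version)

    def display_name(self) -> str:
        """A human-readable label that stays unique across versions and varieties."""
        extras = (self.variety, self.assembly_version, self.annotation_version)
        return " ".join([self.name] + [value for value in extras if value])


if __name__ == "__main__":
    # Two entries that share a taxon ID but differ in variety (illustrative values).
    a = Organism(4577, "Zea mays", variety="B73")
    b = Organism(4577, "Zea mays", variety="Mo17")
    print(a.primary_key() != b.primary_key())
    print(a.display_name(), "|", b.display_name())
```

With the taxon ID alone these two organisms would collide. The extra fields keep both the keys and the displayed names distinct.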

Syntenic Regions

Proposed addition to the InterMine core data model

  <class name="SyntenicRegion" extends="SequenceFeature" is-interface="true">
    <reference name="syntenyBlock" referenced-type="SyntenyBlock" reverse-reference="syntenicRegions"/>
  </class>

  <class name="SyntenyBlock" is-interface="true">
    <collection name="syntenicRegions" referenced-type="SyntenicRegion" reverse-reference="syntenyBlock"/>
  </class>

  • We decided against making SyntenyBlock a bio-entity, even though it would benefit from inheriting some references.
  • We also decided against the SyntenicRegion1 / SyntenicRegion2 format; instead, the regions will be held in a collection.
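The block-holds-a-collection design can be pictured with a small object sketch. The Python classes and the `add_region` helper are hypothetical and only mirror the reference and reverse reference in the model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class SyntenyBlock:
    # Collection of regions, mirroring the "syntenicRegions" collection.
    syntenic_regions: List["SyntenicRegion"] = field(default_factory=list)

    def add_region(self, region: "SyntenicRegion") -> None:
        """Add a region and set its reverse reference back to this block."""
        region.synteny_block = self
        self.syntenic_regions.append(region)


@dataclass(eq=False)
class SyntenicRegion:
    primary_identifier: str
    # Reverse reference, mirroring the "syntenyBlock" reference.
    synteny_block: Optional[SyntenyBlock] = None


if __name__ == "__main__":
    block = SyntenyBlock()
    for identifier in ("region-1", "region-2", "region-3"):
        block.add_region(SyntenicRegion(identifier))
    print([region.primary_identifier for region in block.syntenic_regions])
```

A collection puts no limit on the number of regions in a block, which a fixed pair of numbered references would.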

GO Evidence Codes

Currently the GO evidence codes are only a controlled vocabulary and are limited to the code abbreviation, e.g. IEA. However, UniProt and other data sources have started to use ECO ontology terms to represent the GO evidence codes instead.

We decided against changing the GO Evidence Code to be an ECO ontology term.

  • The ECO ontology is not comprehensive
  • Some mines have a specific data model for evidence terms

Instead we are going to add attributes to the GO Evidence Code:

  • Add full description of the GO Evidence Code
  • Add a link to more information on the GO evidence codes
  • (Optional) add a link to the ECO term, if available.

  <class name="GOEvidenceCode" is-interface="true">
    <attribute name="code" type="java.lang.String" />
    <attribute name="description" type="java.lang.String" />
    <attribute name="URL" type="java.lang.String" />
  </class>
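As a small sketch of what the extra attributes buy a user interface, the class below mirrors the three attributes and shows the full description next to the abbreviation. The Python class and its `label` method are hypothetical, and the URL is a placeholder rather than a real documentation link.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GOEvidenceCode:
    code: str
    description: Optional[str] = None
    url: Optional[str] = None

    def label(self) -> str:
        """Show the description alongside the code when one is available."""
        if self.description:
            return f"{self.code} ({self.description})"
        return self.code


IEA = GOEvidenceCode(
    code="IEA",
    description="Inferred from Electronic Annotation",
    # Placeholder link; substitute the GO documentation page for this code.
    url="https://example.org/go-evidence-codes#iea",
)

if __name__ == "__main__":
    print(IEA.label())
```

A code loaded without a description still displays as the bare abbreviation, so existing data keeps working.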

IEA evidence code example

Ontology Annotations – Subject

Currently you can only reference BioEntities, e.g. Proteins and Genes, in an annotation. This is unsuitable as any object in InterMine can be annotated, e.g. Protein Domains. To solve this problem, we will add a new data type, Annotatable.

  <class name="Annotatable" is-interface="true">
    <collection name="ontologyAnnotations" referenced-type="OntologyAnnotation" reverse-reference="subject"/>
  </class>

  <class name="OntologyAnnotation" is-interface="true">
    <reference name="subject" referenced-type="Annotatable" reverse-reference="ontologyAnnotations"/>
  </class>

  <class name="BioEntity" is-interface="true" extends="Annotatable"/>

This will add complexity to the data model, but it will be hidden from casual users by templates.
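The effect of the new type can be sketched with a small class hierarchy. The Python classes are illustrative only. In particular, making ProteinDomain a direct subtype of Annotatable is an assumption made for this example, and the identifiers are placeholders.

```python
class Annotatable:
    """Anything that can be the subject of an ontology annotation."""

    def __init__(self, primary_identifier):
        self.primary_identifier = primary_identifier
        self.ontology_annotations = []


class BioEntity(Annotatable):
    pass


class Gene(BioEntity):
    pass


class ProteinDomain(Annotatable):
    """Not a BioEntity, but still annotatable (assumed placement)."""


class OntologyAnnotation:
    def __init__(self, subject, term_identifier):
        if not isinstance(subject, Annotatable):
            raise TypeError("subject must be Annotatable")
        self.subject = subject
        self.term_identifier = term_identifier
        # Maintain the reverse reference on the subject.
        subject.ontology_annotations.append(self)


if __name__ == "__main__":
    gene = Gene("gene-1")
    domain = ProteinDomain("domain-1")
    OntologyAnnotation(gene, "GO:0005515")
    OntologyAnnotation(domain, "GO:0005515")
    print(len(gene.ontology_annotations), len(domain.ontology_annotations))
```

Both the gene and the protein domain can now carry annotations, while the annotation itself only needs to know that its subject is Annotatable.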


If you would like to be involved in these discussions, please do join our community calls or add your comments to the GitHub tickets. We want to hear from you!

Researchers connected in Berlin

I really enjoyed attending the Neo4j Life & Health Sciences Workshop, organized in Berlin this week by Michael and Petra: a day rich with great presentations about the application and utility of graph technology in several research areas. Here are just a few examples:

  • The Ontology Lookup Service, a repository for biomedical ontologies, is implemented with graph databases and with Apache Solr for indexing: different technologies for different purposes.
  • In the Lamond lab (University of Dundee), they model proteomics data with graph databases in order to understand protein behaviour under different conditions and dimensions of analysis.
  • MetaProteomeAnalyzer (MPA), a tool for analyzing and visualizing metaproteomics data, uses Neo4j as its backend.
  • Tabloid Proteome is a database of associated protein pairs, derived from mass-spectrometry-based proteomics experiments and implemented using a graph database. It can also help to discover proteins that are connected indirectly, or that carry information you were not looking for!
  • Reactome is a pathway database that has recently migrated from MySQL to Neo4j, with a notable performance improvement. You can access the data via the GraphCore open source Java library, developed with Spring Data Neo4j, or via the Neo4j browser.

I’ve lost count of how many times I heard sentences like: “Biology systems are complex and growing and graphs are the native data model” or “Graph database technology is an effective tool for modelling highly connected data as we have in biology systems”. We already knew this, but it was very encouraging to hear it again from so many researchers and practitioners with more experience in graph technologies than us.

In the afternoon, I attended the workshop “Data modelling with Neo4j”. Starting from the data sources we usually work with, we tried to model the entities and relationships in order to answer some relevant questions. Modelling can be very challenging and, in some cases, the right model depends on the questions you have to answer!

Before the end, I had the chance to give a short presentation about our experience with Neo4j.

Thanks again Michael and Petra for organizing such a great event!

InterMine 2.0: PROPOSED Model Changes

We have several new additions and changes to the InterMine core data model coming in InterMine 2.0 (due Fall 2017).

You can follow the detailed conversation for each change on GitHub. Please note, these are only the proposals and will be discussed further on community calls. Join the conversation!

Multiple Genome Versions

Many InterMine instances have several different genome versions.

Proposed addition to the InterMine core data model

  <class name="Organism" is-interface="true">
    <attribute name="annotationVersion" type="java.lang.String"/>
    <attribute name="assemblyVersion" type="java.lang.String"/>
  </class>

Multiple Varieties / Subspecies / Strains

We’re going to add variety to the Organism data type to indicate two strains that have the same taxon ID.

Proposed addition to the InterMine core data model

  <class name="Organism" is-interface="true">
    <attribute name="variety" type="java.lang.String"/>
  </class>

User Interface

Both the above changes will require updates to the core InterMine code where it is assumed that Organism.taxonID is the unique field. This assumption will be replaced so that the new fields in Organism, where present, are used for the primary key.

For user friendliness, it will be necessary to assign unique organism names. Users will then be able to easily identify distinct versions in template queries and widgets.

Syntenic Regions

Proposed addition to the InterMine core data model

  <class name="SyntenicRegion" extends="SequenceFeature" is-interface="true">
    <reference name="partner" referenced-type="SyntenicRegion" reverse-reference="partner" />    
    <reference name="syntenyBlock" referenced-type="SyntenyBlock"/>
  </class>
  
  <class name="SyntenyBlock" is-interface="true">
    <attribute name="medianKs" type="java.lang.Double"/>    
    <collection name="syntenicRegions" referenced-type="SyntenicRegion"/>
  </class>

GO Evidence Codes

Currently the GO evidence codes are only a controlled vocabulary and are limited to the code abbreviation, e.g. IEA. However, UniProt and other data sources have started to use ECO ontology terms to represent the GO evidence codes instead.

Current model

<class name="GOEvidence" is-interface="true">
 <reference name="code" referenced-type="GOEvidenceCode"/>
</class>

Proposed change to the InterMine core data model

<class name="GOEvidence" is-interface="true">
 <reference name="code" referenced-type="ECOTerm"/>
</class>

The ECO term would have the GO evidence code abbreviation along with the full description.

IEA evidence code example

Not many GO annotation data sets use ECO terms (yet), but InterMine will implement a lookup service to replace the traditional GO evidence codes with the corresponding ECO term during data loading.
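In its simplest form, a lookup of this kind could be a table keyed by evidence code and applied to each annotation as it is loaded. The sketch below is a hypothetical illustration, not the planned InterMine implementation. The two ECO identifiers are given from memory of the GO-to-ECO mapping and should be verified against the official mapping file before use.

```python
# Partial, illustrative mapping from GO evidence codes to ECO term identifiers.
# Verify these identifiers against the official GO-to-ECO mapping.
GO_TO_ECO = {
    "IEA": "ECO:0000501",
    "IDA": "ECO:0000314",
}


def lookup_eco(code):
    """Return the ECO identifier for a GO evidence code, or None if unmapped."""
    return GO_TO_ECO.get(code.strip().upper())


def replace_evidence(annotations):
    """Swap evidence codes for ECO identifiers; keep unmapped codes unchanged."""
    converted = []
    for subject, go_term, code in annotations:
        eco = lookup_eco(code)
        converted.append((subject, go_term, eco if eco is not None else code))
    return converted


if __name__ == "__main__":
    rows = [("gene-1", "GO:0005515", "IEA"), ("gene-2", "GO:0005515", "XYZ")]
    for row in replace_evidence(rows):
        print(row)
```

Keeping unmapped codes unchanged means that a gap in the table does not silently drop evidence during loading.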


If you would like to be involved in these discussions, please do join our community calls or add your comments to the GitHub tickets. We want to hear from you!

Out and about: where to find InterMiners over June and July 2017

We recently added a public Google Calendar you can subscribe to if you’re interested in knowing what we’re up to, or when public holidays might mean we’re out of the office. Here’s a quick lowdown on upcoming events:

20 June 2017: InterMine community dev call.

21 June 2017: Neo4j Life and Health Sciences day in Berlin. Keep your eyes peeled for Daniela!

28 June 2017: Daniela will be presenting on our experiences with Neo4j at the London Neo4j GraphDB meetup.

4 and 18 July 2017: InterMine community dev calls.

22-23 July 2017: I’ll be presenting a poster at BOSC/ISMB about BlueGenes, with the fantastically witty title “Forever in BlueGenes: a next-generation genomic data interface powered by InterMine”. 👖


If you’re a GSoC student or mentor, there will also be the evaluation periods at the end of each month, but you’re doubtless well aware of those!

Further in the future, you may find us at SWAT4LS, ISWC, and further Bioschemas events. We’ll keep you posted!

Are you attending any fun events? Let us know!

If you’re going to be at an event this year where you’ll be telling others about your work with InterMine and might like some InterMine stickers or handouts – or perhaps you’d like to guest-blog about it or share your slides – please ping us.

 

 

 

InterMine community roundup: June 2017

Here are some of the exciting things that have been happening in the InterMine community recently:

Thanks to everyone who has contributed including students and their mentors. You guys are awesome!

Have you done anything exciting with InterMine lately? Email info [at] intermine [dot] org, tweet us at @intermineorg, or pop into chat.intermine.org to tell us about it… we’d love to feature you in a future round-up!