InterMine 2.0: Proposed Model Changes (III)

We have several new additions and changes to the InterMine core data model coming in InterMine 2.0 (due Fall 2017).

We had a great discussion on Thursday about the proposed changes. Below are the decisions we made.

Multiple Genome Versions, Varieties / Subspecies / Strains

 

We were not able to come to an agreement, but everyone still felt there might be a core data model that can allow for single and multiple genomes that will be useful for all InterMines.

The fundamental question is do we want organism to be per genome version, or allow organism to have several genome versions. In the latter case, we’d also need a helper class, e.g. “Strain”, that would contain information about the genome.

This topic is sufficiently complex that we’ve agreed to try a more formal process here, listing our different options, their potential impact etc. More information on this process soon!

Syntenic Regions

Proposed addition to the InterMine core data model

<class name="SyntenicRegion" extends="SequenceFeature" is-interface="true">
    <reference name="syntenyBlock" referenced-type="SyntenyBlock" reverse-reference="syntenicRegions"/>
  </class>
  
  <class name="SyntenyBlock" is-interface="true">    
   <collection name="syntenicRegions" referenced-type="SyntenicRegion" reverse-reference="syntenyBlock" />
   <collection name="dataSets" referenced-type="DataSet" />
   <collection name="publications" referenced-type="Publication" />
  </class>
  • We decided against making a SyntenyBlock a bio-entity, even though it would benefit from inheriting some references.
  • We also decided against the SyntenicRegion1 / SyntenicRegion1 format and instead they will be in a collection of regions.

GO Evidence Codes

Currently the GO evidence codes are only a controlled vocabulary and are limited to the code abreviation, e.g IEA. However UniProt and other data sources have started to use ECO ontology terms to represent the GO evidence codes instead.

We decided against changing the GO Evidence Code to be an ECO ontology term.

  • The ECO ontology is not comprehensive
  • Some mines have a specific data model for evidence terms

Instead we are going to add attributes to the GO Evidence Code:

  • Add a link to more information on the GO evidence codes
  • Add the full name of the evidence code.
  • Change GOEvidenceCode to be OntologyAnnotationEvidenceCode

We decided against loading a full description of the evidence code. The description on the GO site is a full page. We tried shortening but then it didn’t really add much information. Also there is no text file with the description available.

We are also going to move evidence to Ontology Annotation.

GOEvidenceCode will be renamed OntologyAnnotationEvidenceCode:

<class name="OntologyAnnotationEvidenceCode" is-interface="true">
 <attribute name="code" type="java.lang.String" />
 <attribute name="name" type="java.lang.String" />
 <attribute name="URL" type="java.lang.String" />
</class>

GOEvidence will be renamed OntologyEvidence:

<class name="OntologyEvidence" is-interface="true">
 <reference name="code" referenced-type="OntologyAnnotationEvidenceCode"/>
 <collection name="publications" referenced-type="Publication"/>
</class>

Evidence will move to OntologyAnnotation from GOAnnotation:

<class name="OntologyAnnotation" is-interface="true">
 <collection name="evidence" referenced-type="OntologyEvidence"/>
</class>

 

Ontology Annotations – Subject

Currently you can only reference BioEntities, e.g. Proteins and Genes, in an annotation. This is unsuitable as any object in InterMine can be annotated, e.g. Protein Domains. To solve this problem, we will add a new data type, Annotatable.

<class name="Annotatable" is-interface="true"> <collection name="ontologyAnnotations" referenced-type="OntologyAnnotation" reverse-reference="subject"/> </class> <class name="OntologyAnnotation" is-interface="true"> <reference name="subject" referenced-type="Annotatable" reverse-reference="ontologyAnnotations"/> </class> <class name="BioEntity" is-interface="true" extends="Annotatable"/>

This will add complexity to the data model but this would be hidden from casual users with templates.

Protein molecular weight

Protein.molecularWeight is going to be changed from an integer to a float.

Timeline

October

  • Julie makes changes to core InterMine data model and parsers
  • On ‘model-changes’ branch

November

  • Release beta FlyMine with new model changes for community review
    • Sam will help test Synteny changes
  • Finalise changes. Move changes from ‘model-changes’ branch to ‘release-candidate’ branch
  • InterMine 2.0 will be tested on a staging branch (‘release-candidate’) because the changes are so disruptive:
    • New software build system – Gradle
    • Require updated software dependencies, e.g. Java 8, Tomcat 8, Postgres 9.x
    • Model changes

December

  • “Code freeze”
    • All 2.0 changes tested on ‘release-candidate’ branch
    • Need help testing!
  • InterMine 2.0 release
    • Move changes from dev branch to master branch
    • Before Xmas

If you would like to be involved in these discussions, please do join our community calls or add your comments to the GitHub tickets. We want to hear from you!

Advertisements