InterMine 2.0: PROPOSED Model Changes

We have several new additions and changes to the InterMine core data model coming in InterMine 2.0 (due Fall 2017).

You can follow the detailed conversation for each change on GitHub. Please note, these are only the proposals and will be discussed further on community calls. Join the conversation!

Multiple Genome Versions

Many InterMine instances have several different genome versions.

Proposed addition to the InterMine core data model

  <class name="Organism" is-interface="true">
    <attribute name="annotationVersion" type="java.lang.String"/>
    <attribute name="assemblyVersion" type="java.lang.String"/>
  </class>

Multiple Varieties / Subspecies / Strains

We’re going to add variety to the Organism data type to indicate two strains that have the same taxon ID.

Proposed addition to the InterMine core data model

  <class name="Organism" is-interface="true">
    <attribute name="variety" type="java.lang.String"/>
  </class>

User Interface

Both the above changes will require updates to the core InterMine code where it is assumed that Organism.taxonID is the unique field. This assumption will be replaced so that the new fields in Organism, where present, are used for the primary key.

For user friendliness, it will be necessary to assign unique organism names. Users will then be able to easily identify distinct versions in template queries and widgets.

Syntenic Regions

Proposed addition to the InterMine core data model

  <class name="SyntenicRegion" extends="SequenceFeature" is-interface="true">
    <reference name="partner" referenced-type="SyntenicRegion" reverse-reference="partner" />    
    <reference name="syntenyBlock" referenced-type="SyntenyBlock"/>
  </class>
  
  <class name="SyntenyBlock" is-interface="true">
    <attribute name="medianKs" type="java.lang.Double"/>    
    <collection name="syntenicRegions" referenced-type="SyntenicRegion"/>
  </class>

GO Evidence Codes

Currently the GO evidence codes are only a controlled vocabulary and are limited to the code abreviation, e.g IEA. However UniProt and other data sources have started to use ECO ontology terms to represent the GO evidence codes instead.

Current model

<class name="GOEvidence" is-interface="true">
 <reference name="code" referenced-type="GOEvidenceCode"/>
</class>

Proposed change to the InterMine core data model

<class name="GOEvidence" is-interface="true">
 <reference name="code" referenced-type="ECOTerm"/>
</class>

The ECO term would have the GO evidence code abbreviation along with the full description.

IEA evidence code example

Not many GO annotation data sets use ECO terms (yet) but InterMine will implement a lookup-service to replace the traditional GO evidence codes with the corresponding ECO term during data loading.


If you would like to be involved in these discussions, please do join our community calls or add your comments to the GitHub tickets. We want to hear from you!

Advertisements