InterMine and modENCODE

InterMine will be used as a major component of the Data Coordination Centre for the $57m US NIH modENCODE project (www.modencode.org). G. Micklem has been awarded two posts for four years to apply the technology developed in the FlyMine project to the dissemination of modENCODE data. This programme will generate an unprecedented amount of data about functional elements in the genomes of the fruit fly Drosophila melanogaster and the nematode Caenorhabditis elegans.

Release 7.1

Release 7.1 includes a number of new user interface features. Results of queries can be ordered by a selected column and results tables have a new summary button which brings up statistics on the values in that column. Many template queries now have more descriptive column titles in results tables.

  • User Interface
    • NEW – Column summaries in results tables. Each column of a results table now has a summary icon. Clicking it brings up a box with more information about the data in the column. For numerical data it shows the minimum value, maximum value, mean and standard deviation. For text it displays the number of unique values and the most commonly occurring values with their frequencies.
    • NEW – Sorting query results. The QueryBuilder allows you to select an element from the output to sort results by. A sort button lets you choose ascending or descending order, for example to display results with the highest confidence score or most recent publication first.
    • NEW – Results column titles. Many template queries are now configured to have more descriptive column headings in results tables. The full path can be seen by hovering the mouse pointer over the description.
    • NEW – Chromosome distribution viewer. The gene Bag Details page for D. melanogaster and A. gambiae now includes a chromosome distribution viewer. This shows how many genes from the bag are found on each chromosome; click on a bar to see a list of the genes. The graph also shows an expected number of genes for each chromosome, based on the distribution of all genes between chromosomes and the size of the bag.
    • NEW – Accurate counts on the results page. The results page used to show only an approximate number of rows returned from a query (unless the ‘Last’ link was clicked). The estimate is now updated to give an accurate ‘Total rows’ figure once it has been calculated.
    • UPDATE – The trail (e.g. Query -> Results -> Gene -> Protein) is now more complete to allow easy navigation back to recently viewed queries, results, object details or bag pages.
    • FIX – Performance has been improved when saving and viewing large bags of objects.
    • FIX – Renaming bags now works correctly.
    • FIX – Missing export options from results pages have been fixed. Genome features can be exported as FASTA or GFF3; protein interactions can be exported in Cytoscape SIF format.
    • FIX – Sequences from Translation objects can now be exported as FASTA.
    • FIX – Some minor issues with display on Internet Explorer have been fixed.
  • Tools
    • FIX – TFModules from REDfly are now shown in GBrowse and have a GBrowse image on their details pages.
  • Known issues
    • There are currently no known problems with release 7.1.
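
The expected gene counts shown by the chromosome distribution viewer can be illustrated with a small sketch. It assumes the expected count for a chromosome is simply the bag size multiplied by that chromosome's share of all genes. The release notes do not spell out the exact calculation, so the function name, inputs and toy numbers below are illustrative only.

```python
from collections import Counter


def chromosome_distribution(bag_chromosomes, all_gene_chromosomes):
    """Return {chromosome: (observed, expected)} for a bag of genes.

    bag_chromosomes: chromosome name for each gene in the bag.
    all_gene_chromosomes: chromosome name for every gene in the genome.
    Assumption: expected = bag size * (genes on chromosome / all genes).
    """
    observed = Counter(bag_chromosomes)
    background = Counter(all_gene_chromosomes)
    total_genes = sum(background.values())
    bag_size = len(bag_chromosomes)

    result = {}
    for chrom in sorted(background):
        expected = bag_size * background[chrom] / total_genes
        result[chrom] = (observed.get(chrom, 0), expected)
    return result


# Toy numbers, not real genome statistics.
all_genes = ["2L"] * 50 + ["2R"] * 30 + ["X"] * 20
bag = ["2L"] * 2 + ["X"] * 8
dist = chromosome_distribution(bag, all_genes)

for chrom, (obs, exp) in dist.items():
    print(f"{chrom}: observed {obs}, expected {exp:.1f}")
```

In this toy bag the X chromosome holds far more genes than its share of the genome would predict, which is the kind of skew the graph is meant to make visible.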

Release 7.0

Release 7.0 updates the D. melanogaster genome to 5.1 annotation and other genome annotation sources have been re-mapped. GO enrichment and KEGG pathway viewers have been added to the gene Bag Details page.

  • Data
    • UPDATE – The D. melanogaster genome has been updated to annotation version 5.1. Data from DrosDel, FlyReg, REDfly and the microarray tiling path have been re-mapped using UCSC LiftOver.
    • NEW – UniProt keywords (e.g. Acetylation, Sulfate transport) and protein features (e.g. HELIX, DNA_BIND) have been added.
    • NEW – KEGG pathway information added for D. melanogaster.
    • NEW – FlyAtlas now has data for three more tissues – larval fat body, larval tubule and male accessory gland.
    • UPDATE – InParanoid orthologues have been updated to a release from January 2007.
    • UPDATE – Four more D. melanogaster RNAi screens from the DRSC have been added to the RNAi aspect.
    • UPDATE – GO annotation, protein-protein interactions and UniProt protein data are all updated to recent releases.
    • FIX – Missing protein structure data has been added; protein structures can now be viewed with JMol again.
    • FIX – Missing protein interaction detection method has been replaced.
    • FIX – Missing INDAC oligo sequences added.
  • User Interface
    • FIX – Drosophila gene names now work in the quick search box, and a number of minor problems with quick search have been fixed.
    • NEW – The trail (e.g. Query -> Results -> Gene) has been improved for easy navigation between queries, results, and details pages.
    • NEW – GO enrichment widget on the gene Bag Details page. For the genes in the bag this lists the number of genes with a particular GO term and a p-value which is the probability that this number of genes were annotated with the GO term by chance, given the abundance of the GO term in a reference population.
    • NEW – KEGG pathway widget on the gene Bag Details page. This shows the number of genes in the bag that are associated with a particular KEGG pathway. Links give the list of genes and more information about the pathway.
    • UPDATE – The constraint editor pane of the QueryBuilder has been made clearer.
    • NEW – An ‘Import query from XML’ link has been added to the FlyMine home page.
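
The p-value reported by the GO enrichment widget can be illustrated with a short sketch. It assumes a one-sided hypergeometric test, which is a common choice for this kind of enrichment; the release notes do not name the statistic FlyMine actually uses, and no multiple-testing correction is applied here. The function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(bag_size, bag_hits, population_size, population_hits):
    """Probability of seeing at least `bag_hits` annotated genes by chance.

    bag_size: number of genes in the bag.
    bag_hits: genes in the bag annotated with the GO term.
    population_size: genes in the reference population.
    population_hits: genes in the reference population with the GO term.
    Assumption: genes are drawn without replacement (hypergeometric model).
    """
    max_hits = min(bag_size, population_hits)
    all_draws = comb(population_size, bag_size)
    tail = sum(
        comb(population_hits, k)
        * comb(population_size - population_hits, bag_size - k)
        for k in range(bag_hits, max_hits + 1)
    )
    return tail / all_draws


# Toy example: 4 of 10 reference genes carry the term, 2 of 3 bag genes do.
p = enrichment_p_value(bag_size=3, bag_hits=2,
                       population_size=10, population_hits=4)
print(f"p = {p:.4f}")
```

A small p-value means the GO term appears in the bag more often than its abundance in the reference population would suggest.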

Release 6.1

FlyMine 6.1 contains a major overhaul of the way bags are handled. Bags can now only contain actual objects rather than identifiers or symbols. This means that any object in a bag has already been found in FlyMine which should reduce confusion. A sophisticated bag upload system has been added to aid in creating bags from external lists of identifiers.

PLEASE NOTE – most saved user content is automatically upgraded between FlyMine releases. In this case it was not possible to port some types of bags. These are still available in the 6.0 archive. Please contact support [at] flymine.org if you have any queries about transferring bags.

Bags also now have a type (class) assigned to them – for example Gene, Protein, GOTerm. This means that when editing a constraint in a template query only bags of the correct type will be listed – so if the template requires you to enter a Gene identifier the bags dropdown will list any Gene bags in your profile. The same is true when creating/editing a query in the QueryBuilder, just add a constraint on the identifier, name, etc of a class and you will see available bags.

The ‘Bags’ page in MyMine (select ‘Bags’ or ‘MyMine’ from the top menu bar) now allows you to paste in a list of identifiers and select a type for the new bag. The input can be a mixture of different identifier types; for example, if you wish to create a bag of Drosophila genes it can be a mixture of CGxx, FBgnxx and symbols. Where no object can be found that matches a particular input identifier, FlyMine will attempt to help. For example, if the input list contains a UniProt protein identifier but you choose to make a gene bag, the website will attempt to find a related gene. Any matches found in this way will be reported so that you can choose which are added to your bag.

As an example, when creating a Gene bag from the identifiers “zen CG2328 FBgn0015379 Q8IML9_DROME unknown_name”, FlyMine will find a gene directly for each of the first three identifiers and will find the gene related to the Q8IML9_DROME protein. The “unknown_name” will be reported as not found.
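
The lookup behaviour in this example can be sketched roughly as follows. This is not FlyMine's implementation: the function, the two lookup tables and the placeholder gene labels (“gene-1” and so on) are invented for illustration and do not reflect real FlyMine objects or mappings.

```python
def build_gene_bag(identifiers, gene_index, protein_to_gene):
    """Sort input identifiers into direct matches, converted matches and misses.

    gene_index: maps any gene identifier or symbol to a gene object.
    protein_to_gene: maps a protein identifier to its related gene object.
    Both tables are hypothetical stand-ins for database lookups.
    """
    direct, converted, not_found = {}, {}, []
    for ident in identifiers:
        if ident in gene_index:
            # The identifier names a gene directly.
            direct[ident] = gene_index[ident]
        elif ident in protein_to_gene:
            # Wrong type for a gene bag, but a related gene exists;
            # report it so the user can decide whether to add it.
            converted[ident] = protein_to_gene[ident]
        else:
            not_found.append(ident)
    return direct, converted, not_found


# Toy lookup tables with placeholder gene labels.
gene_index = {"zen": "gene-1", "CG2328": "gene-2", "FBgn0015379": "gene-3"}
protein_to_gene = {"Q8IML9_DROME": "gene-4"}

direct, converted, not_found = build_gene_bag(
    "zen CG2328 FBgn0015379 Q8IML9_DROME unknown_name".split(),
    gene_index,
    protein_to_gene,
)
print("direct:", sorted(direct))
print("converted:", sorted(converted))
print("not found:", not_found)
```

Keeping converted matches separate from direct ones mirrors the behaviour described above, where matches found through a related object are reported for the user to confirm.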

Also new are bag details pages. These are accessible for any of your saved bags in the ‘Bags’ tab of MyMine. They have a similar layout to object details pages but run templates for all objects in your bag. On the page for gene bags is the first of many ‘widgets’ we plan to add for viewing and analysis of data in bags. Currently there is a widget that graphs the genes from a bag that are over/under expressed in different tissues according to the FlyAtlas data set (www.flyatlas.org). Note that clicking on any of the bars in this graph allows you to create a new bag of genes in that category. More functionality will be available on these pages in release 7.0.

Release 6.0

FlyMine 6.0 adds data from seven Drosophila RNAi screens from the DRSC, microarray-based gene expression data from FlyAtlas and transcriptional cis-regulatory modules (CRMs) from REDfly. Template queries have a new, succinct naming scheme and identifiers/names in results tables are now links to object details pages. Other data sources have been updated/added and there are numerous interface improvements.

NOTE – from release 6.0 FlyMine does not support saved bags of objects. Users’ object bags have not been ported to release 6.0 but can still be retrieved from the release 5.0 archive. If you wish to use information from old object bags, we recommend exporting it from 5.0 as a list of appropriate identifiers and uploading that list to release 6.0. Please contact us if you require any help with this. All other saved information has been transferred to release 6.0.

An upcoming release of FlyMine will introduce a new system for uploading and managing bags of data.

  • Data
    • NEW – Microarray-based gene expression data for D. melanogaster from FlyAtlas.
    • NEW – High-throughput cell-based RNAi screens from the Drosophila RNAi Screening Center.
    • NEW – Transcriptional cis-regulatory modules (CRMs) for D. melanogaster from REDfly.
    • NEW – Probe sets from the Affymetrix GeneChip Drosophila Genome 2.0 Array.
    • NEW – Syntenic regions between D. melanogaster and D. pseudoobscura.
    • NEW – Orthologues and GO annotation for S. pombe and P. falciparum.
    • NEW – Anatomy ontology for Drosophila.
    • UPDATE – D. melanogaster genome annotation updated to version 4.3.
    • UPDATE – UniProt protein data is now at version 8.9.
    • UPDATE – Worm RNAi data from WormBase 30th September 2006.
    • UPDATE – D. melanogaster BACs are now loaded.
    • UPDATE – Protein fragments from UniProt are now identified by an isFragment true/false attribute.
    • UPDATE – Proteins now have a length (in amino acids) and molecularWeight (in Daltons).
    • FIX – Protein interaction confidence scores restored.
    • FIX – PCRProducts now reference the TilingPathSpan they create.
    • FIX – Some unpublished microarray experiments have been removed.
  • User Interface
    • NEW – Names and identifiers in results tables now link to object details pages.
    • NEW – Template names have been changed so that they now have a short name and a longer description. The short name is the main name and is displayed on the aspect pages, template search results and the object details pages. This shorter name has the format ‘query starting point(s)’ –> ‘query output’ and allows easier scanning of templates to find the one required. The longer description can be viewed when the template form is accessed.
    • NEW – When running a template query the title and description are displayed above the results.
    • NEW – The trail now includes a ‘Query’ link back to the template or Query Builder, i.e. now ‘Query -> Results’.
    • UPDATE – It is no longer possible to create bags of objects from results or upload.
    • UPDATE – Chromosomal locations are displayed in a more compact format (e.g. 2R:1598168-1676472).
    • UPDATE – ‘History’ has been renamed to ‘My Mine’ and the layout improved.
      • improved query and template XML import/export options.
      • added a ‘Change Password’ tab.
      • bag listing and upload pages have been combined.
    • FIX – Searches in bags of strings are no longer case sensitive.
    • FIX – After running a template query ‘Current Query’ will now return to the template form instead of the Query Builder page.
    • FIX – To address a performance issue, bag size is limited to 100 entries unless you are logged in (when the limit is much higher).
  • Tools
    • There are no changes to tools available in release 6.0.
  • Known issues
    • There are currently no known issues with data in release 6.0.
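
For anyone processing exported results, the compact chromosomal location format introduced in the release 6.0 user interface (e.g. 2R:1598168-1676472) can be read and written with a few lines of code. The sketch below assumes the format is always chromosome, colon, start, hyphen, end; the class and function names are hypothetical and not part of FlyMine.

```python
from typing import NamedTuple


class Location(NamedTuple):
    chromosome: str
    start: int
    end: int


def format_location(loc):
    """Render a location in the compact chromosome:start-end form."""
    return f"{loc.chromosome}:{loc.start}-{loc.end}"


def parse_location(text):
    """Parse a compact location string back into its three parts."""
    chromosome, _, span = text.rpartition(":")
    start, _, end = span.partition("-")
    if not chromosome or not start or not end:
        raise ValueError(f"not a chromosome:start-end location: {text!r}")
    return Location(chromosome, int(start), int(end))


loc = parse_location("2R:1598168-1676472")
print(loc.chromosome, loc.start, loc.end)
print(format_location(loc))
```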