Where’s Wally?

Last week InterMine attended the first RSE (Research Software Engineer) conference (look at the picture…we are there!)


But what’s an RSE? In the main introductory talk on the first day, Caroline Jay from the University of Manchester defined RSEs as “the coalface of ensuring that computational science is accurate, reliable and reproducible, and their views on making progress in this domain are therefore particularly valuable.” Particularly valuable because, as a slogan that everyone loved puts it, “Software can exist without a paper but a paper and the results can’t. If the software is wrong, the science is wrong”.

As promised by the organizers, the conference focused exclusively on the issues that affect people who write and use software in research, not people who write papers. Over the two days there were a lot of interesting talks and workshops about how research software engineers can grow a project for science, best practices, the software development process, Docker…

As the InterMine team, we contributed to the conference by sharing our story: what “open” really means to us, why we chose open source and how we try to be open. The image below shows our vision of open source.


We also shared the best practices we’ve learned over the years in designing, writing and maintaining open source software for science, hoping that people embarking on their first open source project could benefit from them. [Slides from our talk]

We had a great time talking and meeting with a lot of very friendly and passionate people, sharing ideas, best practices, issues and doubts.

Thanks to the UK-RSE folks for organizing such a great event!

See you next year!

Save the date: 29-31 March 2017

Remember the big International InterMine meetup we were tentatively discussing a few months back? Thanks again to everyone who responded to the survey, as it helped us a lot. We’re still in the process of nailing down the details, but here’s the rough program we are expecting (with details potentially subject to change but hopefully they won’t…)

Wednesday 29th March 2017: Arrival at Berkeley in preparation for the fun ahead. Hopefully there will be some limited accommodation on site available for early birds.

Thursday 30th and Friday 31st March 2017: The conference itself. Details to be confirmed.

Saturday 1st and Sunday 2nd April: Hackathon! Entirely optional. Put your thinking caps on and start looking for fun ideas.

We’ll post more details as things become concrete, and we’d love to hear from you if you have any ideas or thoughts regarding the conference and its content. We still need to think of a catchy name and hashtag for Twitter!

Finally for now, we’d like to give massive thanks to the folks at JGI for helping us to coordinate this.


HumanMine moving to HTTPS

What is happening?

To improve security and privacy, HumanMine is moving all of its Web sites and services, including Web APIs, to HTTPS only by 30 September, 2016.

If you use HumanMine only through a Web browser (like Safari, Firefox, Chrome, Internet Explorer, Opera, etc.), this document is not of interest to you. The only changes you should notice after the deadline are that a green lock icon should appear in the address bar, and that the web addresses of the HumanMine pages you visit will start with https://.

If you maintain software that uses HumanMine APIs or accesses HumanMine servers through the Web, you should understand the change and act before the deadline to ensure uninterrupted service.

Applications that access HumanMine web servers using http:// URLs instead of https:// URLs may fail partially or completely after HumanMine switches to HTTPS-only.
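For client code that stores HumanMine addresses, one way to prepare is to upgrade the URL scheme before making any request. The following is a minimal sketch in Python using only the standard library. The helper name to_https and the example service path are assumptions made for illustration; they are not part of any HumanMine client library.

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(url: str) -> str:
    """Return url with an http scheme upgraded to https; other URLs are unchanged."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return url
    # Drop an explicit default HTTP port, which would be wrong for HTTPS.
    netloc = parts.netloc
    if netloc.endswith(":80"):
        netloc = netloc[:-3]
    return urlunsplit(("https", netloc, parts.path, parts.query, parts.fragment))


# Hypothetical service URL, used only as an example.
print(to_https("http://www.humanmine.org/humanmine/service/version"))
```

Rewriting the address in one place like this keeps the change small, but any hard-coded http:// links elsewhere in an application would still need to be found and updated.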



The HTTP protocol does not provide encryption, so anyone who can see web traffic between a client (for example, a web browser) and a server can intercept potentially sensitive information, and/or inject malware into users’ browsers or operating systems. HTTPS solves this problem. It works just like HTTP, except that traffic is encrypted in both directions, so observers between the client and the server can’t intercept or tamper with the requests or responses. It also provides authentication, ensuring that the client is communicating with the intended server given by the hostname, and not some impostor. (Source)

Please contact us with any questions or concerns!






Exploring Blazegraph

While we’ve been testing Neo4j with all FlyMine data and with PhytoMine to verify how well it performs and scales with big databases, we started exploring another open source implementation for graph databases: Blazegraph.

Blazegraph overview

Blazegraph is an open source high-performance graph database supporting the RDF data model.

RDF is a model to describe and store data: in this model, you express facts, also known as “statements”, composed of three parts and known as triples. Each triple is composed of a subject (the resource), a predicate (the property name of the resource) and an object (the property value). For this reason, Blazegraph is also called a “triple store”.

Subject                                        Predicate     Object
http://flymine.intermine.org/flymine/1007664   :hasSymbol    “zen”
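As a rough illustration of the subject, predicate and object structure, the sketch below models one statement in Python and writes it out as a single N-Triples line. The vocabulary namespace is an assumption (the example only shows the abbreviated predicate :hasSymbol), and the escaping handles just backslashes and double quotes.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource
    predicate: str  # URI of the property
    obj: str        # literal value of the property


def to_ntriples(t: Triple) -> str:
    """Serialize a triple whose object is a plain string literal as one N-Triples line."""
    escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{t.subject}> <{t.predicate}> "{escaped}" .'


# Assumed namespace; the real vocabulary URI is not given in this post.
VOCAB = "http://example.org/vocab#"

zen = Triple("http://flymine.intermine.org/flymine/1007664", VOCAB + "hasSymbol", "zen")
print(to_ntriples(zen))
```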

Blazegraph supports SPARQL (pronounced “sparkle”), a rich and expressive query language for RDF, standardized by the W3C. Using query operations like union, sort, filter and aggregation, the user can query the data in a very flexible way. With federated queries, the user can aggregate information by executing queries distributed over different SPARQL endpoints and consequently discover more data across the web.

Blazegraph provides a SPARQL endpoint where the user can remotely explore, access, and download the stored data using the SPARQL language; the Blazegraph workbench provides a graphical interface for the REST APIs.
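To give an idea of what a remote query could look like, the sketch below builds the URL for a SPARQL query sent as an HTTP GET request with a query parameter, as described by the SPARQL protocol. The endpoint address and the vocabulary prefix are assumptions for a local test installation, and the code only constructs the URL without contacting any server.

```python
from urllib.parse import urlencode

# Assumed address of a local Blazegraph SPARQL endpoint.
ENDPOINT = "http://localhost:9999/blazegraph/sparql"

# Assumed vocabulary prefix; the query looks up resources whose symbol is "zen".
QUERY = """
PREFIX : <http://example.org/vocab#>
SELECT ?gene WHERE { ?gene :hasSymbol "zen" }
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Return a GET URL carrying the SPARQL query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


print(build_query_url(ENDPOINT, QUERY))
```

The resulting URL can be fetched with any HTTP client; asking for a specific result format is typically done with an Accept header.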

Blazegraph and Neo4j: different graph modelling

In Neo4j, a node in the graph corresponds to an entity in a domain. Both nodes and the relationships between them can contain properties describing the objects they represent.

By contrast, in Blazegraph, nodes don’t contain properties: each property value is a separate node holding primitive data such as a string, integer or date.

In Neo4j we’ve represented the gene entity and its relation with the organism in this way:



In Blazegraph the same concept will be represented as:


with the following statements:

Only one statement represents the relation between the gene and the organism (the one containing the predicate hasOrganism); the others describe the properties of the two entities.

The resources represented in RDF are identified by unique HTTP URIs (in the example, http://flymine.intermine.org/flymine/1007664).
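The difference between the two models can be sketched in Python: each property of a property-graph node becomes its own statement, while the relationship between two nodes becomes a single statement linking their URIs. The organism id, the organism name and the predicate :hasName are hypothetical values chosen for illustration; only the gene URI, the symbol “zen” and the predicate hasOrganism come from the examples in this post.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

BASE = "http://flymine.intermine.org/flymine/"  # URI prefix from the example URI


def property_triples(node_id: int, properties: Dict[str, str]) -> List[Triple]:
    """Turn each property of a property-graph node into its own statement."""
    subject = f"{BASE}{node_id}"
    return [
        (subject, ":has" + name[0].upper() + name[1:], value)
        for name, value in properties.items()
    ]


def relationship_triple(source_id: int, predicate: str, target_id: int) -> Triple:
    """Represent a relationship between two nodes as one statement."""
    return (f"{BASE}{source_id}", predicate, f"{BASE}{target_id}")


GENE_ID = 1007664      # id taken from the example URI
ORGANISM_ID = 1000001  # hypothetical id, for illustration only

triples = (
    property_triples(GENE_ID, {"symbol": "zen"})
    + property_triples(ORGANISM_ID, {"name": "Drosophila melanogaster"})
    + [relationship_triple(GENE_ID, ":hasOrganism", ORGANISM_ID)]
)
for triple in triples:
    print(triple)
```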

Exporting FlyMine data: Intermine-RDFizer

We have exported all FlyMine data using Intermine-RDFizer.

Intermine-RDFizer can query any InterMine endpoint via the InterMine API, download the tables as TSV files and transform them into RDF N-Quads based on the XML object model file.


The InterMine-RDFizer script converts every row in a table into an RDF resource. The resource type is based on the class name (e.g. Gene, Organism) and the resource URI is built using the column “id”. The script converts the columns into resource properties and builds an RDF literal typed with the column’s name.
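The row-to-resource idea can be approximated with a short Python sketch. This is not the RDFizer code: the vocabulary namespace, the function name and the sample TSV are assumptions, the output is plain triples rather than N-Quads, and literal typing is left out.

```python
import csv
import io
from typing import Dict, List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://flymine.intermine.org/flymine/"  # prefix from the example URI in this post
VOCAB = "http://example.org/vocab#"             # assumed namespace for classes and properties


def row_to_statements(class_name: str, row: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Build one type statement plus one statement per non-empty column of a row."""
    subject = BASE + row["id"]  # the resource URI is built from the "id" column
    statements = [(subject, RDF_TYPE, VOCAB + class_name)]
    for column, value in row.items():
        if column != "id" and value:
            statements.append((subject, VOCAB + column, value))
    return statements


# Small sample table in TSV form, standing in for a downloaded file.
tsv = "id\tsymbol\n1007664\tzen\n"
rows = csv.DictReader(io.StringIO(tsv), delimiter="\t")
statements = [s for row in rows for s in row_to_statements("Gene", row)]
for statement in statements:
    print(statement)
```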

For FlyMine, we have created roughly 365 million triples and imported them into Blazegraph using the REST APIs provided.


We’ve started testing Blazegraph performance using all FlyMine data imported via InterMine-RDFizer and comparing the results with Neo4j.

As usual, we will keep you updated!


InterMine in Orlando: TAGC16

As many of you may know, TAGC (The Allied Genetics Conference) is happening in July in Orlando, Florida, USA, covering multiple model organisms.
We’ll be there, with a booth at stand 403: (Link to interactive floorplan)


Come say hi at the booth and maybe grab some swag, like these fabulous stickers.

We’re planning to arrive and set things up late-ish Wednesday the 13th. The booth will always be staffed, but you’ll be able to catch us at the following additional sessions:

Thursday 14th July: 

  • 1:30 – 2:30 PM: Cypress Ballroom – Poster Session

Friday 15th July:

  • 2:50 – 3:30 PM: Cypress Ballroom – Poster Session

Saturday 17th July:



HumanMine 3.0

HumanMine has been updated to the latest version of NCBI Entrez Gene. All other data sets have also been updated to the newest versions and we have fixed a few bugs. See the data sources page for a full list of data and their versions. All data can be accessed through our comprehensive library of template searches or by building your own queries using the query builder.


New Data Source: ClinVar

We added a new data source linking genes with their alleles and associated diseases. Here’s an example query:


Human Data Sources Switch

We switched Entrez and Ensembl gene identifiers around. Please see our blog for details. If you have questions or problems, please contact us.

Complex Interaction Viewer

We’ve added a nice viewer for complexes. Source: http://interactionviewer.org



We have docs and videos, and for a full list of data sources available in HumanMine see the data sources list.

However, please do not hesitate to contact us should you require any further assistance. For all types of help and feedback, email support@humanmine.org

FlyMine 43.0

FlyMine has been updated to the latest version of FlyBase. All other data sets have also been updated to the newest versions and we have fixed a few bugs. See the data sources page for a full list of data and their versions. All data can be accessed through our comprehensive library of template searches or by building your own queries using the query builder.

If you have any questions, please see our docs and videos. Please do not hesitate to contact us should you require any further assistance. For all types of help and feedback, email support@flymine.org