To improve security and privacy, HumanMine is moving all of its Web sites and services, including Web APIs, to HTTPS only by 30 September 2016.
If you use HumanMine only through a Web browser (like Safari, Firefox, Chrome, Internet Explorer, Opera, etc.), this change should not affect you. After the deadline, the only difference you should notice is a green lock icon in your browser’s address bar, and the addresses of the HumanMine pages you visit will start with https://.
If you maintain software that uses HumanMine APIs or accesses HumanMine servers through the Web, you should read on and act before the deadline to ensure uninterrupted service.
Applications that access HumanMine web servers using http:// URLs instead of https:// URLs may fail partially or completely once HumanMine switches to HTTPS only.
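For scripts that keep a HumanMine base URL in their configuration, the fix is usually just a change of scheme. The sketch below uses only Python’s standard library to rewrite an http:// URL to https:// and leave any other URL alone. The base URL in it is an assumed example, so check the one your application actually uses. It also drops an explicit :80, since that is the default HTTP port rather than the HTTPS one.

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(url: str) -> str:
    """Return url with an http scheme replaced by https; other URLs are unchanged."""
    parts = urlsplit(url)  # urlsplit lowercases the scheme for us
    if parts.scheme != "http":
        return url
    netloc = parts.netloc
    # ":80" is the HTTP default port; drop it so the HTTPS default (443) applies.
    if netloc.endswith(":80"):
        netloc = netloc[: -len(":80")]
    return urlunsplit(("https", netloc, parts.path, parts.query, parts.fragment))


# Assumed example of a base URL stored in an application's configuration.
OLD_BASE = "http://www.humanmine.org/humanmine/service"
print(to_https(OLD_BASE))
```

Rewriting the stored URL once is simpler and more predictable than relying on the server to redirect every request.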
The HTTP protocol does not provide encryption, so anyone who can see web traffic between a client (for example, a web browser) and a server can intercept potentially sensitive information, and/or inject malware into users’ browsers or operating systems. HTTPS solves this problem. It works just like HTTP, except that traffic is encrypted in both directions, so observers between the client and the server can’t intercept or tamper with the requests or responses. It also provides authentication, ensuring that the client is communicating with the intended server given by the hostname, and not some impostor. (Source)
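In client code, the authentication guarantee only holds if certificate and hostname checks stay switched on. Here is a minimal Python sketch, assuming the standard library’s ssl and urllib.request modules: ssl.create_default_context() enables both checks by default, and the hypothetical fetch() helper refuses to fall back to plain HTTP.

```python
import ssl
import urllib.request


def make_https_context() -> ssl.SSLContext:
    """Build a TLS context that verifies the server certificate and hostname."""
    # create_default_context() loads the system's trusted CA certificates and
    # enables both certificate verification and hostname checking.
    return ssl.create_default_context()


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Hypothetical helper: fetch url over verified HTTPS, never plain HTTP."""
    if not url.lower().startswith("https://"):
        raise ValueError("refusing non-HTTPS URL: " + url)
    with urllib.request.urlopen(url, timeout=timeout, context=make_https_context()) as resp:
        return resp.read()


ctx = make_https_context()
print(ctx.check_hostname, ctx.verify_mode == ssl.CERT_REQUIRED)  # True True
```

Turning these checks off (for example, setting verify_mode to ssl.CERT_NONE to silence a certificate error) would bring back exactly the impostor problem HTTPS is meant to solve.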
HumanMine has been updated to the latest version of NCBI Entrez Gene. All other data sets have also been updated to the newest versions and we have fixed a few bugs. See the data sources page for a full list of data and their versions. All data can be accessed through our comprehensive library of template searches or by building your own queries using the query builder.
New Data Source: ClinVar
We added a new data source linking genes with their alleles and associated diseases. Here’s an example query:
FlyMine has been updated to the latest version of FlyBase. All other data sets have also been updated to the newest versions and we have fixed a few bugs. See the data sources page for a full list of data and their versions. All data can be accessed through our comprehensive library of template searches or by building your own queries using the query builder.
If you have any questions, please see our docs and videos, and do not hesitate to contact us if you need further assistance. For all types of help and feedback, email firstname.lastname@example.org
Currently, genes in the human InterMine (humanmine.org) have the Ensembl gene identifier (e.g. ENSG00000000003) as the “primary” identifier and the NCBI gene identifier (e.g. 7105) as the “secondary” identifier. In the next release of HumanMine, this will be reversed: the NCBI gene identifier will become the primary identifier and the Ensembl identifier the secondary one.
A small change! But it may affect your lists of genes, so please contact us if you are worried. We will keep the current version of HumanMine available for the next few months, just in case.
Why not just use both identifier schemes?
This is what we have done, and will continue to do. The problem is that the two organisations do not completely agree on the genome annotation: what Ensembl considers a gene may not be considered a gene by the NCBI. In fact, the relationship is many-to-many. Some NCBI identifiers map to zero, one or several Ensembl identifiers and, conversely, some Ensembl identifiers map to zero, one or several NCBI gene identifiers.
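To see this many-to-many relationship in your own data, you can count how many partners each identifier has in the other scheme. The sketch below is illustrative only: it assumes you have a list of (NCBI ID, Ensembl ID) links plus the full ID list for each scheme, and the identifiers in it are made up.

```python
from collections import defaultdict


def cardinality_report(pairs, ncbi_ids, ensembl_ids):
    """Classify identifiers by how many partners they have in the other scheme.

    pairs       -- iterable of (ncbi_id, ensembl_id) links
    ncbi_ids    -- every NCBI gene ID, so unmapped ones can be found
    ensembl_ids -- every Ensembl gene ID, likewise
    """
    ncbi_to_ens = defaultdict(set)
    ens_to_ncbi = defaultdict(set)
    for ncbi_id, ens_id in pairs:
        ncbi_to_ens[ncbi_id].add(ens_id)
        ens_to_ncbi[ens_id].add(ncbi_id)
    return {
        # IDs that never appear in a link map to zero partners.
        "ncbi_unmapped": sorted(set(ncbi_ids).difference(ncbi_to_ens)),
        "ensembl_unmapped": sorted(set(ensembl_ids).difference(ens_to_ncbi)),
        # IDs linked to two or more partners are the "several" case.
        "ncbi_multi": sorted(k for k, v in ncbi_to_ens.items() if len(v) > 1),
        "ensembl_multi": sorted(k for k, v in ens_to_ncbi.items() if len(v) > 1),
    }


# Made-up identifiers, not real NCBI or Ensembl records.
links = [("N1", "E1"), ("N2", "E2"), ("N2", "E3"), ("N3", "E4"), ("N4", "E4")]
report = cardinality_report(links, ["N1", "N2", "N3", "N4", "N5"],
                            ["E1", "E2", "E3", "E4", "E5"])
print(report)
```

Here N2 links to two Ensembl IDs, E4 links to two NCBI IDs, and N5 and E5 have no partner at all, so all three cases described above show up.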
Why did you pick Ensembl identifiers?
Many high-quality data sets use Ensembl identifiers. Not using Ensembl identifiers means that we may lose information from these valuable studies.
Why did you switch to using NCBI identifiers?
We are part of a BD2K pilot for the NIH Commons project involving six major model organism databases: fly (FlyBase), mouse (MGI), rat (RGD), worm (WormBase), yeast (SGD) and zebrafish (ZFIN). All of these databases use NCBI identifiers for human genes, so for interoperability we decided to use NCBI identifiers as well.
What were the final numbers? How much data was “lost” or gained?
[Table: genes loaded into HumanMine database; genes with both an NCBI and an Ensembl identifier]
Only 36 NCBI genes do not have a corresponding HGNC symbol.
There were 94 Ensembl identifiers assigned to more than one NCBI gene.
There were approximately 100 NCBI genes associated with more than one Ensembl identifier. In these cases, we did not assign an Ensembl identifier to the gene. Instead, we stored the associated Ensembl identifiers as synonyms, so users can still search for and find the relevant genes.
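The rule described above amounts to a split on the number of Ensembl identifiers per NCBI gene. This is a hypothetical Python sketch of that rule, not HumanMine’s actual loader; the function name and data shapes are assumptions. It does not cover the Ensembl identifiers shared by several NCBI genes, since their handling isn’t described here.

```python
from collections import defaultdict


def assign_ensembl(pairs):
    """Hypothetical sketch: split NCBI->Ensembl links into identifiers and synonyms.

    Returns (identifier, synonyms): identifier maps an NCBI ID to its single
    Ensembl ID; synonyms maps an NCBI ID to a sorted list of several Ensembl IDs.
    """
    by_ncbi = defaultdict(set)
    for ncbi_id, ens_id in pairs:
        by_ncbi[ncbi_id].add(ens_id)
    identifier, synonyms = {}, {}
    for ncbi_id, ens_ids in by_ncbi.items():
        if len(ens_ids) == 1:
            # Exactly one Ensembl ID: record it as the gene's Ensembl identifier.
            identifier[ncbi_id] = next(iter(ens_ids))
        else:
            # Several Ensembl IDs: keep them all as searchable synonyms instead.
            synonyms[ncbi_id] = sorted(ens_ids)
    return identifier, synonyms


# Made-up identifiers, not real records.
ids, syns = assign_ensembl([("N1", "E1"), ("N2", "E2"), ("N2", "E3")])
print(ids, syns)
```

Storing the ambiguous Ensembl IDs as synonyms rather than picking one keeps them searchable without asserting a one-to-one match the two annotations don’t support.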
Why do I care?
If you have a saved list using Ensembl IDs, there may be data loss. We will keep the current version of HumanMine available for your convenience for the next few months just in case — so you aren’t in danger of losing any of your saved data.
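A quick way to tell whether a list may be affected is to check it for Ensembl-style identifiers. A minimal sketch, assuming you have exported the list’s identifiers as plain strings: human Ensembl gene IDs have the form ENSG followed by 11 digits (like ENSG00000000003 above), sometimes with a version suffix.

```python
import re

# Human Ensembl gene IDs: "ENSG" plus 11 digits, with an optional ".N" version suffix.
ENSEMBL_GENE_ID = re.compile(r"ENSG\d{11}(?:\.\d+)?")


def ensembl_ids_in(identifiers):
    """Return the (whitespace-stripped) entries that look like human Ensembl gene IDs."""
    stripped = (s.strip() for s in identifiers)
    return [s for s in stripped if ENSEMBL_GENE_ID.fullmatch(s)]


saved_list = ["ENSG00000000003", "7105", " ENSG00000000003.14 ", "ENSG123"]
print(ensembl_ids_in(saved_list))
```

If a list returns any matches, it is worth comparing it against the current version of HumanMine while that version is still available.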
We have a brand new blog and so would like to take this opportunity to tell you our grand plans for 2016.
Currently InterMine is built with a series of Ant commands, and dependencies are managed manually. This is of course not ideal, so we plan to replace Ant with Gradle, which will manage our dependencies automatically. This change will make builds faster, easier and more efficient.
For those of you with InterMines of your own, this means that you will use different commands for building your databases and deploying your webapps. We’ll provide the new commands along with documentation, and aim to make the transition as easy as possible.
We currently use Lucene for our search index, and we plan to greatly expand our use of this library to make search in InterMine more robust, sensitive and powerful.
Some of you have already deployed your InterMines to the cloud. We intend to make this process much easier, probably by creating a custom InterMine buildpack that pre-configures a Docker container with all of InterMine’s dependencies.
New Data Sources
We are always adding new data sources and would like to hear your suggestions. On our list right now is:
And of course we will continue to update our current data source library as file formats and data change.
New User Interface
We’ve developed a new user interface, which should be ready for beta testing in early 2016. It will run alongside the current interface for some time, so you can send us ideas, suggestions and critiques about the new interface while still relying on the old one.
Here’s a sneak preview (subject to plenty of change, of course!):
To go along with our new interface, we’re going to be adding a lot of new tools for you to use. Our wish list so far (not in order of priority):
Advanced Search / Query builder / Guided search
Recommendation engine (which gene is like my gene?)
Complex Interaction viewer
More powerful region search
InterMine search tool
Text mining tool
JBrowse / other genome browsers
UniProt protein browser
We’d like to hear which tools are important to you. We also will improve the tools we currently have, making them easier to adapt to your data sets.
2017 and beyond
Genomes are being sequenced every day, technology is moving at an ever faster pace and everyone is facing a challenging funding environment. We don’t know quite what the world will look like in the next five years, but we are working hard to be future-proof. We’ve always had a deep commitment to openness, flexibility and collaboration, and we believe this will help us meet any future challenges.
To this end, we are running a pilot program to test various graph databases and to explore the semantic web. As always, we will keep you posted on our progress, and we would like to hear your thoughts.
Thanks to our great community for all of their support over the years! We look forward to a really exciting year!