GSoC 2018 – Improved InterMine Search with Solr

Currently, InterMine uses the Apache Lucene library (v3.0.2) to index its data and provide keyword-style search over all of it. The goal of this project is to introduce Apache Solr into InterMine so that indexing and searching become faster. Unlike Lucene, which is a library, Apache Solr is a separate server application, similar to a database server. Solr (v7.2.1) is set up and configured independently of the application, and Solr clients handle communication between the application and the Solr instance.

Here SolrJ (v7.2.1), a Java client for Solr, is used to communicate between InterMine and Solr. We also removed the Bobo facet library, which was used with Lucene, since Solr provides faceted search itself. The implementation has been designed so that InterMine is not heavily coupled to Solr. If you want to switch to a different search engine in the future, you only have to write new implementations of the interfaces defined.
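To illustrate the decoupling idea, here is a minimal sketch in plain Java. All names in it (`SearchBackend`, `InMemoryBackend`, `SearchDemo`) are hypothetical and are not InterMine's actual interfaces. The toy in-memory backend simply stands in for a real engine such as Solr; the point is that the calling code only ever sees the interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical interface: the application talks to this, never to a concrete engine.
interface SearchBackend {
    void index(String id, String text);
    List<String> search(String term);
}

// Toy in-memory implementation standing in for a real engine such as Solr.
class InMemoryBackend implements SearchBackend {
    private final List<String[]> docs = new ArrayList<>();

    @Override
    public void index(String id, String text) {
        // Store the id together with a lower-cased copy of the text.
        docs.add(new String[] {id, text.toLowerCase(Locale.ROOT)});
    }

    @Override
    public List<String> search(String term) {
        // Case-insensitive substring match; returns matching ids in insertion order.
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String[] doc : docs) {
            if (doc[1].contains(needle)) {
                hits.add(doc[0]);
            }
        }
        return hits;
    }
}

public class SearchDemo {
    // Application code depends only on the interface, so the backend can be swapped.
    static List<String> run(SearchBackend backend) {
        backend.index("gene-1", "zen homeobox gene");
        backend.index("gene-2", "eve pair-rule gene");
        return backend.search("homeobox");
    }

    public static void main(String[] args) {
        System.out.println(run(new InMemoryBackend()));
    }
}
```

Swapping the search engine then means writing another class that implements the same interface and passing it to `run`, without touching the calling code.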

The search index and autocomplete index processes now use Solr to index the data. Indexing time has improved significantly compared to before. For example, with Lucene, FlyMine takes around 1900 seconds (32 mins) to index its data, whereas with Solr it takes only 1250 seconds (21 mins), a 34% reduction. Query time has also improved: a query of “*:*” in FlyMine used to take around 30-40 seconds, and with Solr it takes less than 1 second. Previously, with Lucene, the indexed data had to be retrieved from the database during the first search after starting the webapp, which took some time. With Solr this is no longer the case, and results are returned instantly.
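The 34% figure follows directly from the two FlyMine timings quoted above. A small sketch of the arithmetic (the class and method names are made up for illustration):

```java
import java.util.Locale;

public class IndexTimeReduction {
    // Relative reduction in percent: (before - after) / before * 100.
    static double reductionPercent(double beforeSeconds, double afterSeconds) {
        return (beforeSeconds - afterSeconds) / beforeSeconds * 100.0;
    }

    public static void main(String[] args) {
        // FlyMine timings from the text: 1900 s with Lucene, 1250 s with Solr.
        double percent = reductionPercent(1900, 1250);
        System.out.printf(Locale.ROOT, "%.1f%%%n", percent); // prints 34.2%
    }
}
```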

In addition to the above, two web services have been implemented. The Facet service returns only the facet counts for a particular query rather than all the results. The Facet List service is similar, but it returns all the facets available in a mine. It is useful when you want to know all the facets in a mine before you run an actual search.
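For readers unfamiliar with how a facet-only request looks on the Solr side, the sketch below builds a Solr `select` URL that asks for facet counts and no result rows, using Solr's standard `q`, `rows`, `facet` and `facet.field` parameters. This is only an illustration of the underlying Solr request, not the InterMine web service itself; the core URL and the field names (`Category`, `Organism`) are assumptions made for the example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class FacetQueryUrl {
    // Builds a Solr select URL that returns facet counts only (rows=0).
    static String build(String coreUrl, String query, List<String> facetFields) {
        StringBuilder url = new StringBuilder(coreUrl)
                .append("/select?q=").append(encode(query))
                .append("&rows=0")       // no documents, only counts
                .append("&facet=true");  // switch faceting on
        for (String field : facetFields) {
            url.append("&facet.field=").append(encode(field));
        }
        return url.toString();
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical core URL and facet fields, for illustration only.
        String url = build("http://localhost:8983/solr/example-core",
                "*:*", List.of("Category", "Organism"));
        System.out.println(url);
    }
}
```

Setting `rows=0` is a common way to skip the document list when only the counts are of interest, which matches the purpose described for the Facet service.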

All these changes were made against InterMine 2.0. They will be included in an InterMine release in the near future, but those who want to try them immediately can check out this branch on GitHub and follow these instructions. All changes were tested with Apache Solr (v7.2.1).

References :