Data integration and Machine Learning for drug target validation

Hi!

In this blog post I would like to give a brief overview of what I’m currently working on.

Knowledge Transfer Partnership: what & why?

First, in order to give context to this post, last year InterMine at University of Cambridge and STORM Therapeutics, a spin-out of University of Cambridge working on small modulating RNA enzymes for the treatment of cancer, were awarded a Knowledge Transfer Partnership (KTP) from the UK Government (read this post for more information). With this award, the objective is to help STORM Therapeutics advance their efforts in cancer research, and contribute to their ultimate goal of drug target validation.

As part of the KTP Award, a KTP Associate needs to be appointed by both the knowledge base (University of Cambridge) and the company (STORM). The role of the KTP Associate is to act as the KTP Project Manager and is in charge of the successful delivery of the project. For this project, I was appointed as the KTP Associate, with a Research Software Engineer / Research Associate role at the University of Cambridge, for the total duration of the project: 3 years.

Machine learning and a new mine: StormMine

Now that you know what the KTP project is about, and who is delivering it, let’s move on to more interesting matters. In order to successfully delivering this project, the idea is to use the InterMine data warehouse to build a knowledge base for the company, STORM, that enables their scientist to have all the relevant data for their research in a single, integrated, place. For this reason, several new data sources will be integrated into a STORM’s deployment of the InterMine data warehouse (StormMine, from now on), and appropiate data visualizations will be added.

Then, once the data is integrated, we can think towards analysing the data to gather insights that may help the company goals, such as applying statistical and Machine Learning methods to gather information from the data, as well as building computational intelligence models. This leads the way towards what I’ve been working on since my start in February, and will continue until July 2019.

In general terms, I’m currently focused on building Machine Learning models that are able to learn how to differentiate between known drug targets and non-targets from available biological data. This part of work is going to be used as my Master’s Thesis, which I hopefully will deliver in July! Moreover, with this analysis, we will be able to answer three extremely relevant questions for STORM, and which are the questions leading the current work on the project. These questions are

  1. Which are the most promising target genes for a cancer type?
  2. Which features are most informative in predicting novel targets?
  3. Given a gene, for which cancer types is it most relevant?

If you are interested in learning more about this work, stay tuned for next posts, and don’t hesitate contacting me, either by email (ar989@cam.ac.uk) or connect with me in LinkedIn (click here)!

 

InterMine 3.0 – Solr search

InterMine 3.0 is now available and features a brand new search powered by Solr.

Default search configuration will work well, but Solr allows for endless configuration for your specific needs.

Now the first search after deployment is instant, you can inspect the search index directly (via http://localhost:8983/solr/) and there’s a facet web service (via /service/facet-list and /service/facets?q=gene). Certain bugs, e.g. searching for the gene “OR”, are also now fixed.

New Configuration Option – optimize

There is a new keyword search configuration setting: index.optimize. If set to `true`, reorganises the index so chunks are placed together in storage which might improve the search time. (Similar to defragmentation of a hard disk.) See the configuration docs for more details.

Docs

Installing Solr

Configuring the keyword search

InterMine 3.0 upgrade instructions | release notes

A big thank you to our clever and hard-working 2018 Google Summer of Code student Arunan Sugunakumar — who did the bulk of the work as part of his summer project. Great job!

New Blog!

We’ve decided to streamline our blogging experience a little bit. Rather than maintaining several separate but mostly similar blogs for HumanMine, FlyMine, and InterMine, this blog will act as a combined stream.

Don’t worry – this doesn’t mean you’ll be forced to view irrelevant updates if you’re only interested in one of the sub categories. WordPress is great about filtering via tag or category. Here are a few quick links:

InterMine-specific updates: https://intermineorg.wordpress.com/category/intermine/

FlyMine-specific updates: https://intermineorg.wordpress.com/category/flymine/

HumanMine-specific updates: https://intermineorg.wordpress.com/category/humanmine/ (It’s empty now, but coming soon!)