GSoC’18: Cross InterMine Search Tool (Progress so far)


It has been three weeks since the GSoC’18 coding phase began, and a lot of progress has been made so far. The project has been deployed here.

This week has been very productive. I have worked on several parts of the application this week. My main focus was to add filters to the application, but I ended up with some more cool stuff too 😉
Let’s have a look at all of them one by one.

1. Search Rating/Relevance Score
The application knows what you are searching for 😉

Relevance score is a value calculated by the InterMine QuickSearch API endpoint for every result of a keyword search. It determines how relevant a result item is to the search keyword. The QuickSearch API response provides it as a floating-point ‘relevance’ parameter. The individual InterMine search portals use a formula to turn this into a search score, and I have used the same formula to convert the floating relevance value into a score out of 5. The formula is here:
=> Math.round(Math.max(0.1, Math.min(1, relevance)) * 5)
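As a minimal JavaScript sketch of the formula above (the function names `relevanceToScore` and `scoreToStars` are hypothetical, not taken from the project's code), the conversion could look like this. Because relevance is clamped to at least 0.1 before scaling, and `Math.round(0.5)` is 1 in JavaScript, every result ends up with at least one star.

```javascript
// Convert the QuickSearch 'relevance' float into a score out of 5 using the
// formula above: clamp to [0.1, 1], multiply by 5, then round.
function relevanceToScore(relevance) {
  const clamped = Math.max(0.1, Math.min(1, relevance));
  return Math.round(clamped * 5);
}

// Render an integer score as filled/empty stars, e.g. 4 -> "★★★★☆".
function scoreToStars(score, outOf = 5) {
  return "★".repeat(score) + "☆".repeat(outOf - score);
}

console.log(relevanceToScore(0.73)); // 0.73 * 5 is about 3.65, rounds to 4
console.log(scoreToStars(relevanceToScore(0.73))); // ★★★★☆
```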

2. Different colors for different Categories in Results

The world is colorful and hence our app too 😉

This was feedback received from the InterMine community. Shading each category’s results in a different color makes them easier to explore. The application can sometimes return a long list of result items, and separate colors for separate categories help us scan it faster, since our eyes tend to pick up colors more quickly than text.
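One possible way to assign category colors, sketched in JavaScript under stated assumptions: the palette and the `KNOWN_COLORS` entries are hypothetical, since the post does not list the colors the application actually uses. Hashing the category name gives any category the API returns a stable color without hard-coding every one.

```javascript
// Hypothetical palette and overrides: the colors the app really uses are
// not given in the post.
const PALETTE = ["#e57373", "#64b5f6", "#81c784", "#ffb74d", "#ba68c8", "#4db6ac"];
const KNOWN_COLORS = new Map([
  ["Gene", "#e57373"],
  ["Protein", "#64b5f6"],
]);

// Return a color for a category; unknown categories get a palette entry
// chosen by a simple string hash, so the same name always maps to the same color.
function categoryColor(category) {
  if (KNOWN_COLORS.has(category)) return KNOWN_COLORS.get(category);
  let hash = 0;
  for (const ch of category) {
    // Multiply-and-add string hash, kept as an unsigned 32-bit integer.
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0;
  }
  return PALETTE[hash % PALETTE.length];
}
```

Using a `Map` rather than a plain object for the overrides avoids accidental matches on inherited keys such as `toString`.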

3. External links to reports

Every result item has a link that opens the item’s report page on its own InterMine portal. This report page contains more detailed information about the result item. Clicking the icon opens the report in a new browser tab. The result link is generated dynamically:
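The project's own link-building code isn't reproduced here. As a hedged sketch, assuming a mine's report pages follow the `<mine base URL>/report.do?id=<internal object ID>` pattern of HumanMine URLs such as http://www.humanmine.org/humanmine/report.do?id=1157771, a link could be assembled like this (`reportLink` is a hypothetical name):

```javascript
// Build the report-page link for a result on its own mine.
// Assumes the "<base URL>/report.do?id=<internal object ID>" pattern.
function reportLink(mineBaseUrl, objectId) {
  // A trailing slash makes URL resolution append to the mine path
  // instead of replacing its last segment.
  const base = mineBaseUrl.endsWith("/") ? mineBaseUrl : mineBaseUrl + "/";
  const url = new URL("report.do", base);
  url.searchParams.set("id", String(objectId));
  return url.toString();
}

console.log(reportLink("http://www.humanmine.org/humanmine", 1157771));
// http://www.humanmine.org/humanmine/report.do?id=1157771
```

In the page itself, an anchor with `target="_blank"` (plus `rel="noopener"`) is the usual way to open such a link in a new tab.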

4. Metadata about search results

Metadata for BMAP mine (search: ‘brca1’)

Having metadata about the returned results is always handy. Every tab in the application loads a set of metadata, as in the screenshot above. This can certainly help in understanding the presence of a search term in each mine in a unified way.
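The exact metadata fields shown in each tab aren't listed in this post. As an illustrative sketch, a per-mine summary could be computed from the result list, assuming each result carries a `type` (its category) and a `relevance` value; `summariseResults` and its output fields are hypothetical.

```javascript
// Summarise one mine's results: total hits, hits per category, top relevance.
// Assumes each result looks like { type: "Gene", relevance: 0.87, ... }.
function summariseResults(results) {
  const counts = new Map();
  let maxRelevance = 0;
  for (const r of results) {
    counts.set(r.type, (counts.get(r.type) || 0) + 1);
    maxRelevance = Math.max(maxRelevance, r.relevance);
  }
  return {
    total: results.length,
    byCategory: Object.fromEntries(counts),
    maxRelevance,
  };
}

// Illustrative input only.
const summary = summariseResults([
  { type: "Gene", relevance: 0.9 },
  { type: "Gene", relevance: 0.4 },
  { type: "Protein", relevance: 0.6 },
]);
console.log(summary); // total 3, Gene: 2, Protein: 1, maxRelevance 0.9
```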

5. Search/Relevance Score Filter

Score filter on sidebar of the application

With the addition of score/relevance to the application, it was essential to add an option to filter the results by it. This section provides radio buttons to filter the results based on their relevance score. I hope this feature will be really useful for the community members out there. 🙂
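A sketch of the score filter in JavaScript. The post does not say whether a radio button matches an exact score or a minimum, so this assumes (hypothetically) "this score or higher", with an "All" option mapped to `null`; `filterByScore` is an illustrative name.

```javascript
// Same score formula as in the relevance section above.
function relevanceToScore(relevance) {
  return Math.round(Math.max(0.1, Math.min(1, relevance)) * 5);
}

// Keep results scoring at least `minScore`; null (an "All" option) keeps everything.
function filterByScore(results, minScore) {
  if (minScore == null) return results;
  return results.filter((r) => relevanceToScore(r.relevance) >= minScore);
}
```

Filtering on the derived 1 to 5 score rather than the raw float keeps the radio buttons consistent with the stars shown next to each result.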

6. Category Filter

Every mine contains data from many different categories, so this feature was also a necessary requirement for a full-fledged search tool. This section is loaded dynamically based on the categories returned by the API: the application uses the search result metadata received from the API to generate the category checkboxes, so we need not worry about hard-coding any of them.
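A minimal sketch of generating the checkboxes from API metadata, assuming that metadata arrives as a plain object mapping category names to hit counts and that each result has a `type` field (both hypothetical shapes; `categoryOptions` and `filterByCategory` are illustrative names, not the project's):

```javascript
// Turn category metadata, assumed to look like { Gene: 12, Protein: 3 },
// into checkbox descriptors, sorted by name so the sidebar order is stable.
// Every box starts ticked (a hypothetical default).
function categoryOptions(categoryCounts) {
  return Object.entries(categoryCounts)
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([category, count]) => ({ category, count, checked: true }));
}

// Keep only results whose category is currently ticked.
function filterByCategory(results, checkedCategories) {
  const wanted = new Set(checkedCategories);
  return results.filter((r) => wanted.has(r.type));
}
```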


So, that’s the progress so far. Almost everything in the application is mobile-optimized and ready to use. I will most probably also extend the scope of this project by adding a REST API service for searching multiple mines. The project would then become a full-fledged Cross InterMine Search Tool: a package of a client-driven search interface and a back-end REST API service. You can find the project repository here. Please have a look at the application here. I would love to have more feedback from the community. (For feedback or suggestions, kindly email me at dwivedi.aman96@gmail.com.) Thanks for reading. Happy Coding! 🙂
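If the planned REST service is built, one of its steps would be combining results from several mines. A hypothetical JavaScript sketch (`mergeMineResults` is not from the project), assuming each mine's results are already fetched and each item has a `relevance` field:

```javascript
// Merge { mineName: [results...] } into one list, tagging each result with
// its mine and ordering by relevance (highest first).
function mergeMineResults(resultsByMine) {
  const merged = [];
  for (const [mine, results] of Object.entries(resultsByMine)) {
    for (const r of results) merged.push({ ...r, mine });
  }
  // Array.prototype.sort is stable (ES2019+), so equal scores keep mine order.
  return merged.sort((a, b) => b.relevance - a.relevance);
}
```

Relevance scores computed independently by different mines may not be directly comparable, so ranking a merged list purely by relevance is a simplifying assumption; grouping results per mine, as the tabbed interface does, avoids that question.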


GSoC 2018 Students Announced! 🌞☀️

After last year’s great success, we’re really excited to welcome six Google Summer of Code students to work with us again this year:

Aman Dwivedi will be working on a Cross-InterMine search tool. This will use the registry to allow users to search multiple InterMines at once, and should be a good way to figure out which mine has the data you’re looking for. Aman will be mentored by Nadia Yudina, herself a graduate of last year’s InterMine+GSoC program.

Adrián Rodríguez Bazaga will be working on something we’ve always wanted: an InterMine data browser – hopefully a tool that will allow users to learn a bit more about data inside an InterMine without having to know the data model. Yay for easier learning curves! Adrián’s mentor will be Yo Yehudi.

Arunan Sugunakumar is going to explore hooking InterMine up to a more modern search package, probably Solr or ElasticSearch. Our current version of Lucene is very old, and we know there are better options out there!  Daniela Butano will mentor this project.

Jake Macneal is going to work on a prototype to convert natural language questions into InterMine PathQuery – it would be exciting to have a user type “Show me all the genes associated with diabetes” into an InterMine, and get a sensible set of results back! Aaron Golden will mentor Jake.

Nupur Gunwant will be adding additional features to our python client, such as registry communication, a query manager, and visualisations. Julie Sullivan will be Nupur’s mentor for this project.

Ankit Kumar Lohani will be working on Buzzbang – a search engine to crawl multiple biological sources including, but not exclusively, InterMine instances. Justin Clark-Casey will be Ankit’s mentor.

We’re also planning to post a short interview series highlighting each student and their plans for the summer. We can’t wait to get started!!


How did InterMine determine its FAIR milestones?

life-sciences-fair

(Cross-posted from my blog here)

At InterMine, a life sciences data integration platform, we’re working on a BBSRC grant to make data available through InterMine ‘FAIR’. What does this mean? Well, firstly, FAIR is an initiative to make data Findable, Accessible, Interoperable and Reusable (I’ve written a lot more about this here).

Taken on its face this is a bit woolly – isn’t InterMine data already FAIR? You can find data (type some text in its general search box or perform a structured query), access it (click the web link), interoperate with it (run a live query on its API) and reuse it (hey the data’s there, download it). Well, one of the great things about FAIR is that it has specific principles and recommendations on how to make data findable, accessible, interoperable and reusable. These place a heavy emphasis on uniformity so that software can much more easily use and combine data across the countless distinct data sources hosted by different organizations across the planet.

So in applying for the grant, how did we propose to apply these recommendations to InterMine? Essentially, we performed a gap analysis between the 15 guiding principles documented in the original FAIR paper and InterMine’s current capabilities, coming up with a plan for how we would bridge this gap.

Let’s take the first findability and accessibility FAIR guiding principles as an example:

F1. (meta)data are assigned a globally unique and persistent identifier

A1. (meta)data are retrievable by their identifier using a standardized communications protocol

One way to fulfil these principles, and something popular in the semantic web world, is to make identifiers URLs. So great, InterMine already has URLs that have a 1-to-1 mapping to biological data objects! Search for the gene MYH7 in HumanMine, for instance, and the report page you get back has this URL (stripping away some non-essential tracking information).

http://www.humanmine.org/humanmine/report.do?id=1157771

Look at another biological object and that ID number will change, since this is the internal ID used to track objects within an InterMine database.

But there’s a problem here. These ID numbers are not persistent, as required by principle F1. When the data in an InterMine installation like HumanMine is updated, this is not done additively; rather, the entire database is rebuilt, since data sources need to be integrated anew. And on this rebuild, MYH7 is no longer guaranteed to have the internal InterMine ID 1157771. In fact, it’s very likely to be different.

So part of our proposal was to resolve this problem. For InterMine, as a data integration platform rather than a primary data provider, it’s a very complex topic, particularly as we’re generic and model-driven (so in principle you could host something completely different, like a company database, in InterMine!). I won’t delve into the possible solutions too much here, but at the moment it looks like a tradeoff between trying to make our internal ID persistent (e.g. by maintaining the mapping to biological objects between database rebuilds) and trying to incorporate external IDs such as MYH7 directly into the InterMine URL, as specified by the InterMine instance operator, something like

http://www.humanmine.org/humanmine/gene/MYH7
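To make the second option more concrete, here is a hypothetical JavaScript sketch, not InterMine’s implementation: `buildIdIndex`, `resolve`, and the object fields `className`, `externalId` and `internalId` are invented for illustration. A path such as `gene/MYH7` is looked up in an index regenerated after every rebuild, so the public identifier stays stable even though the internal ID it points to changes.

```javascript
// Build a lookup from "<class>/<external ID>" to the internal object ID of
// the *current* build. Hypothetical: regenerated after every rebuild.
function buildIdIndex(objects) {
  const index = new Map();
  for (const o of objects) {
    index.set(`${o.className.toLowerCase()}/${o.externalId}`, o.internalId);
  }
  return index;
}

// Resolve a path such as "gene/MYH7"; returns null if it is unknown.
function resolve(index, path) {
  const match = /^([A-Za-z]+)\/([^/]+)$/.exec(path);
  if (!match) return null;
  const key = `${match[1].toLowerCase()}/${match[2]}`;
  return index.has(key) ? index.get(key) : null;
}

// Illustrative only: the internal ID MYH7 had in one particular build.
const index = buildIdIndex([{ className: "Gene", externalId: "MYH7", internalId: 1157771 }]);
console.log(resolve(index, "gene/MYH7")); // 1157771
```

Much of the real difficulty lies in choosing which external identifier is authoritative for each class of object, which this sketch simply takes as given.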

We’ll be reporting more on this in the future.

This was a fairly straightforward example. Some of the other principles, such as

I3. (meta)data include qualified references to other (meta)data

required more interpretation, and in our proposal we related actions broadly to the principles (i.e. whether they addressed one or more of findability, accessibility, etc.) rather than specific FAIR clauses.

However, we wrote our proposal some time ago. Things are moving rapidly, and many of the original FAIR paper authors are working on the FAIR metrics initiative, which will measure FAIRness with programmatic and quantitative tests. I think this is a great step, and something anybody looking to FAIRify their data resource should now look at closely. We’ll be looking to apply these metrics to our own work as we continue development.


Cambridge Science Festival 2018: A fruity crime of passion 🍏🍋🍊🍓

TL;DR: Science Festival was great & kids loved it. You can re-use our materials, here.

Longer version: Last weekend was InterMine’s very first year at Cambridge’s famous Science Festival, an event designed to enthuse younger people and adults alike with awe for science. We split our time across two locations, working at our home department, Genetics, on the Saturday, and at the Cambridge Guildhall on the Sunday.

Our theme was around open science, with an activity designed to reinforce the idea that shared data (and therefore more data from different sources) results in better science. For adults we had a couple of great posters about the importance of data sharing, designed by Julie and Rachel. The posters are available freely online for re-use under a CC0 licence.

The Story: A party is rudely interrupted

Meanwhile, for kids (and some adults too!) we had a crime-solving activity. In our scenario, a dastardly fruit villain had stolen the passionfruit in the midst of an otherwise enjoyable soirée. In their haste to flee, the culprit knocked over a tin of blue paint and left tracks behind; they also injured themselves as they jumped out the window, leaving DNA evidence. We had four fruity suspects:

(Image: suspects sheet)

Solving the crime

Step 1: footprints in the paint

In order to solve the crime using science, our young detectives were invited to examine the footprints left by the culprit:

Fruit tracks at the crime scene. Excuse the glare from the plastic!

It was usually pretty easy to rule out the apple, and after thinking a little more, the strawberry could be ruled out too, but the orange and the lemon both looked rather similar.

Step 2: Juice found at the scene

Since the devilish thief had hurt themselves, we had samples to analyse. Our criminal investigators took strips of litmus paper and carefully examined the evidence:


Once again, the evidence wasn’t quite conclusive (and was very sticky). Still, it was fun! Let’s move on to the next bit of evidence…

Step 3: the skin

With sample fruits to compare, our enterprising criminologists got a step closer to the solution. Could the skin be from a lemon? Hmmm.


Step 4: We have samples, so let’s sequence the DNA!

Okay, so you may have guessed that we didn’t sequence the DNA of the suspects ourselves – but thankfully the lab had four profiles for us to compare to and they managed to quickly provide a DNA fragment from the crime scene evidence, too. This fragment was far more conclusive than the others, pointing unequivocally to the shadiest character of the bunch – Lithium Lemon.

Step 5: Putting the puzzle pieces together, and sabotage!

As our sleuths solved each different activity, we gave them a puzzle piece. At this stage they had four pieces of the puzzle, but they were still missing a couple of critical bits: the two central pieces. It turns out there had been some CCTV footage – but it had been stolen! After looking around, our vigilant investigators discovered where the crime scene video had been hidden (under the table) and managed to put the entire story together. Once again, shown front and centre of the puzzle was our suspect, Lithium Lemon.


Wrap up

While the shady character was hauled off in cuffs to the county jail, successful detectives were rewarded with candy, some awesome stickers, and a handout that had a child-oriented activity sheet on one side and a small copy of our open knowledge posters on the other, for the slightly more grown-up folks.

What we learned

Our tables were generally very busy, and the kids seemed to have a great time examining the evidence and putting together the puzzle pieces one by one. I’m not sure how many of them quite perceived the data sharing theme, but some of the adults definitely did, and appreciated the posters as well.

I think one of the biggest surprises for us was how busy we all were! Genetics had a steady flow of people, but the Guildhall had even more. We haven’t heard numbers for this year yet, but in 2017 apparently there were around 3,000 people. What that meant in practical terms for us: two tables with identical versions of the activity, two InterMine team members acting as detective wranglers at each table, and often two separate groups of people working through the activity simultaneously at each table. After several hours of this we were all ready for a nap! Next time, six staff might be better, to allow people to have a breather.

We also learned to keep a good eye on our puzzles: five puzzles left the office on Sunday morning, but only four returned. Hopefully it’ll be cherished at someone’s house as a memory of a great activity…

Our materials are open!

Given that our activity was designed to advocate openly sharing your science, we’ve shared our materials online too, and you’re welcome to re-use them.

https://github.com/intermine/science-festival/

This includes:

  • The fruit images (lovingly created by Rachel’s daughter!)
  • Handouts
  • Posters
  • Guidance sheets and in-depth “sciencey details” about each activity.

If you do re-use them, we’d love to hear about it! You can email info@intermine.org, tweet @intermineorg, or even open an issue on the GitHub repository.

Finally, I’d like to thank Rachel again for all the work she put into designing this scenario. It was creative, exciting, and overall seemed to be a hit!


Google Summer of Code: Let the enquiries commence!

Last month we applied for InterMine to join Google Summer of Code (GSoC) as a mentor organisation, and we’re pleased to report that we have officially been accepted!

Students: Interested in working with us for GSoC?

Our GSoC site has a project ideas list and the student application guidance, which hopefully will answer most of your questions.

Want to learn more?

  • You can also read our GSoC blog posts from last year to learn more about how things went.
  • If you still have questions:
    • If the question is project-specific: email both listed mentors of the given project.
    • If the question is about GSoC in general, see the student manual.
    • We’ll be running a GSoC question and answer video call session where students can learn more about the specific projects. Updates about the exact date and time will go out on this blog, our mailing lists, and twitter.

We’ll look forward to hearing from you!


‘Twas the week before Christmas… [aka InterMine availability over the holiday period]

… and all through the lab, not an organism was stirring, not even a… crab?*

Emails and support: Just a quick blog reminder that the office will be pretty empty from around now until the second of January, so don’t be surprised if we take a while to reply to messages. Some of us may be in the office or working from home, but it’s pretty patchy over the holiday season, and I don’t think any of us will be answering emails between the 23rd and 26th of December, nor on the first of January.

Developer calls: There is no developer call this week (normally it would be scheduled for Thursday the 21st). I’m not sure at this point, but the call on the 4th of January may (or may not) be cancelled as well.

Be good, have fun, and we’ll see you next year!

*I can only apologise for the terrible rhyme. Apparently nothing rhymes with “Cambridge” except “drainage” (and even then it’s somewhat weak), so I tried “uni” and that only rhymed with “loony” and “goonie”. Finding rhymes for the word “office” was no better.

Looking ahead: InterMine+Google Summer of Code 2018. Could you be a mentor?

2017 is coming to an end, and I have to say it’s been a fabulous one! I’ll probably post a “cool things InterMine did this year” round-up in a week or two – but in the meantime, here’s my final Google Summer of Code blog for you all!  We’ll cover the InterMine swag just sent out across the globe, as well as plans for next year – and how you can help out.

Thank-you gifts for mentors and students

Last week, we posted care packages to all our GSoC mentors and summer students, in the form of t-shirts, stickers, and pens. The postal-service-wrinkled shirt shown above is the women’s fit shirt printed on black; unisex shirts are a slightly lighter grey colour. If you filled out the swag survey when it was sent to you, your gift should be with you soon! Tweet us your images of the items in use for extra InterMine Cool Points 😎.

GSoC 2018 – call for project ideas and mentors!

In early 2017, we put together an ideas list for GSoC projects – InterMine’s projects are numbers 3 to 9. If you want to get more of an idea what it’s like to apply (or be a mentor), read our application guidance from last year.

Do you have a nifty idea, or an InterMine itch you’d like to scratch?

Please share it with us! Add it to our 2018 Google Summer of Code ideas list, or if you need to sound things out and discuss them a little bit, comment on the GitHub issue, or email the dev list. You can even propose several ideas, if you like! Please add all ideas by the end of the 14th of December (end of this week).

Would you like to try mentoring?

Fancy a chance to earn some nifty exclusive swag like pictured above? Add your name as a possible mentor to an existing idea (or your own new idea). You can always drop us a line if you want to discuss things first. We like projects to have more than one mentor if possible.

Maybe you’re a student thinking of GSoC?

Awesome! If you have your own InterMine project idea (whether it’s brand new or you’ve already started it), or if one of the ideas on our ideas list lights your fire, it’s not too early to start talking with potential mentors about it. The application guidance we mentioned above would be a good read, too.