InterMine 4.0 – InterMine as a FAIR framework

We are excited to publish the latest version of InterMine, version 4.0.

It’s a collection of our efforts to make InterMine more “FAIR“. As an open source data warehouse, InterMine’s raison d’être is to be a framework that enables people to quickly and easily provide public access to their data in a user friendly manner. Therefore InterMine has always strived to make data Findable, Accessible, Interoperable and Reusable and this push is meant to formally apply the FAIR principles to InterMine.

What’s included in this release?

  1. Generate globally unique and stable URLs to identify InterMine data objects in order to provide more findable and accessible data.
  2. Apply suitable ontologies to the core InterMine data model to make the semantic of InterMine data explicit and facilitate data exchange and interoperability
  3. Embed metadata in InterMine web pages to make data more findable
  4. Improve accessibility of data licenses for integrated sources via web interface and REST web-service.

More details below!

How to upgrade?

This is a non-disruptive release, but there are additions to the data model. Therefore, you’ll want to increment your version, then build a new database when upgrading. No other action is required.

However, keep reading for how to take advantages of the new FAIR features in this release.

Unique and stable URLs

We’ve added a beautiful new user-friendly URL.

Example: http://beta.flymine.org/beta/gene:FBgn0000606

Currently this is used only in the “share” button in the report pages and in the web pages markup. In the future, this will be the only URL seen in the browser location bar.

For details on how to configure your mine’s URLs, see the docs here.

See our previous blog posts on unique identifiers.

Decorating the InterMine data model with ontology terms

InterMine 4.0 introduces the ability to annotate your InterMine data model with ontology terms.

While these data are not used (yet), it’s an important feature in that it’s going to facilitate cross-InterMine querying, and eventually cross-database analysis — allowing us to answer questions like “Is the ‘gene’ in MouseMine the same ‘gene’ at the EBI?”.

For details on how to add ontologies to your InterMine data model, see the docs here.

Embedding metadata in InterMine webpages

We’ve added structured data to web pages in format of JSON-LD to make data more findable, and these data are indexed by Google data search. Bioschemas.org is extending Schema.org with life science-specific types, adding required properties and cardinality on the types. For more details see the docs here.

By default this feature is disabled. For details on how to enable embedding metadata in your webpages, see the docs here.

Data licences

In our ongoing effort to make the InterMine system more FAIR, we have started working on improving the accessibility of data licences, retaining licence information supplied by the data sources integrated in InterMine, and making it available to humans via our web application and machines via queries.

See our previous blog post on data licences.

For details on how to add data licences to your InterMine, see the docs.

Future FAIR plans

  1. Provide a RDF representation of data stored, lists and query results, and the bulk download of all InterMine in RDF form, in order to allow the users to import InterMine resources into their local triplestore
  2. Provide an infrastructure for a SPARQL endpoint where the user can perform federated queries over multiple data sets

Upcoming Releases

The next InterMine version will likely be ready in the Fall/Winter and include some user interface updates.

Docs

To update your mine with these new changes, see upgrade instructions. This is a non-disruptive release.

See release notes for detailed information.

We’re adopting a code of conduct!

TL;DR:

Please read our Code of Conduct draft and comment if you need to.

Longer:

A growing and important movement in open source communities is to adopt a code of conduct, which generally governs behaviour amongst community members, and provides backing to enforce necessary actions if anyone within the community behaves in an unacceptable or unwelcoming manner. We haven’t had any problems, and we’d like things to stay that way in the future.

If the past is anything to go by, we’ll set this code of conduct up and rarely or never need to enforce anything, but it’s better to have clear guidelines in place and not need them, than vice versa. We’d also like to get this in place before anything happens, rather than as an obvious too-late response to an incident – not that we’re anticipating anything!

The draft we’ve put together is adapted quite closely from the Django code of conduct. We’re particularly grateful to them for licencing it under a creative commons attribution licence so we could re-use it.

Read the InterMine Code of Conduct draft here.

Questions?

If you’d like more info about codes of conduct – why they’re important, what topics they cover, etc., please see:

Comments, or questions that weren’t answered by the links above?

Feel free to comment on this post, tweet us, email yo@intermine.org, or info@intermine.org. Please comment by the 19th of March 2019.

 

Header image from flickr, taken by Mike McSharry and licenced under CC-BY-2.0 https://www.flickr.com/photos/mikemcsharry/5360225083/

InterMine, Oracle and the Future of Java

There have been a few questions about Oracle’s announcements on the future of Java, so this post hopes to cover what actually has changed and how this impacts InterMine as a software package.

In short, these changes do not impact InterMine negatively, but we should be aware of these issues.

Oracle JDK 11 is not free for use in production; Use OpenJDK instead

Oracle changed its licencing a bit. Starting with Java 11, Oracle now releases its two JDKs under different licences:

  1. OpenJDK (open source under GPL)
  2. Oracle JDK (commercial licence)

(Previously, Oracle had released these both under the BCL licence which allows a mix of free and commercial use, so you only had to pay “sometimes”).

To use the Oracle JDK 11 in a production environment, you now need to purchase a commercial licence. You are still allowed to use this JDK in development, for demos etc but the Oracle JDK 11 is NOT free to use in production.

We develop InterMine against (and recommend people use) OpenJDK instead of the commercial JDK Oracle provides. As of Java 11, these two JDKs are now virtually identical so this is safe.

Oracle JDK 8 — “End of Public Updates”; Use OpenJDK instead

Oracle will provide public updates of Oracle JDK 8 through at least December 2020 for personal desktop use and January 2019 for commercial use. You can continue to use Oracle’s JDK indefinitely without updates, but that’s a bad idea for security and functionality reasons. If you want updates to Java 8, switch to OpenJDK, there are free OpenJDK builds from other providers like AdoptOpenJDK, Azul, IBM, Red Hat, other Linux distros etc.

OpenJDK binaries from Oracle will only be provided until the next JDK release; Use OpenJDK from a non-Oracle provider

Oracle changed their release schedule to be twice a year, and they will not provide a LTS release for OpenJDK. Oracle will not provide updates to older Open JDK versions, e.g. versions older than six months. This includes security fixes!

This is troubling as the InterMine release schedule is such that it’s not feasible to update Java versions every six months. But we can’t ignore needed security fixes.

However, RedHat announced in September that they would take a leadership role in this area. Some, e.g. https://adoptopenjdk.net, plan to offer an OpenJDK LTS releases for free. So there will be OpenJDK LTSs available, just not from Oracle.

What does this all mean for InterMine? Not Much!

We’ll keep monitoring the situation but this seems like the usual way that companies manage open source projects — providing open software and additional paid support. So nothing to be alarmed about. OpenJDK is open source, so we are safe.

People are (rightly?) concerned about Oracle’s true commitment to Java and open source going forward. What if they change their mind and don’t release updates to OpenJDK? For InterMine this isn’t too scary because worst case scenario we could use an older stable version of Java. However in this nightmare scenario it’s likely that Java would be forked and we could carry on.

Future InterMine plans

We have no plans to migrate away from Java and will continue to develop using the OpenJDK as normal. We develop against the Java specification not the version so we aren’t tied to a specific Java version. For now, we’re recommending staying with OpenJDK 8 but plan to start testing with Java 11 soon.

Although some are suspicious of Oracle due to past experiences, we are optimistic about the future of Java, as the community really seems to be responding to the need for a secure and open Java.

More reading:

 

 

InterMine at #GCCBOSC Portland – 7 days of fun, sun, and code…

BOSC (the Bioinformatics Open Source Conference) is normally part of ISMB (Intelligent Systems for Molecular Biology), but for the first time this year, it teamed up with The Galaxy Community Conference (GCC) instead. For us, this presented an exciting opportunity – like a regular BOSC but with the added bonus of training days and the chance to interact with Galaxy contributors during the CollaborationFest hackathon (and the rest of the conference too).

Our agenda at the conference ended up being quite full:

Handling integrated biological data using Python (or R) and InterMine

We delivered a training session on the 26th of June: Handling integrated biological data using Python (or R) and InterMine. Leyla Ruzicka from ZFIN was kind enough to travel up from Eugene to Portland, to help us deliver the UI portion of the training. Once we’d familiarised users with how InterMine worked a little bit, Daniela introduced the API side of things, and then we spent the remainder of the session working through a series of exercises in Jupyter notebooks, live-coding on a projector so others could learn about our code and follow along themselves.

While we did recommend to people that they try to install the InterMine Python client, we also managed to work around the issue for anyone who didn’t have things installed, thanks to binder. You can still see the tutorial exercise notebooks and work through them, and we have the same set of notebooks with answers if you get stuck or need a hint. This was the first time we worked through the exercises interactively onscreen this way, but it seemed to work well! I’m hopeful we can continue providing the API portion of our tutorial this way in the future.

We had planned to do an R section, but actually ran out of time to do this – the tutorial was about two and a half hours in total. If an R tutorial is something of interest in the future though, please do let us know! You can do this via comments on this article, twitter, pop by chat.intermine.org, or email us at info – at – intermine – dot – org.

InterMine 2.0: More than fifteen years of open biological data integration

[Slides link] We were very pleased to have a talk accepted as well as the training, giving us a chance to introduce InterMine to others and talk about its history. While I was talking I mentioned that we were ranking at just under 300 stars on our main GitHub repo, and the audience kindly help bump it up and over 300!

intermine-stars

One of the topics I focused on during the talk included a massive thanks to all of the work our broader community does to help keep InterMine become and remain a great resource. Afterwards, Lorena Pantano raised the question: how do you get others to adopt your work and contribute to it?

Personally, I’ve been working at InterMine for three years now, so I certainly can’t attest to the entirely of the history – much of this is doubtless down to the team’s great work and Gos’s great vision (and grant writing!) – but I also think one of the most important parts is probably down to making it easy for others to use your work: good developer docs, tickets that explain issues clearly, help documentation for end-users, etc. I’d love to hear more thoughts about this in the comments!

Birds of a Feather sessions

Daniela and Yo both ran separate Birds of a Feather unconference-style sessions over lunch. Yo’s BoF focused on getting (and keeping) more open source contributors – Nicole Vasilevsky was kind enough to keep notes for this session. Thanks, Nicole!

Meanwhile Daniela shared  the InterMine approach to implement stable and persistent URIs and the possible related issues, inspired by other data integrators and the lessons learnt in the Identifiers for the 21st century paper; some attendees have also contributed providing their own solutions.

Hackathon

42394043775_eeb59807ee_o
Group meeting session at CoFest. Try to spot Daniela! 😉

During the CollaborationFest hackathon, Daniela and Yo were able to complete (yeahhhh!!) the integration between Galaxy and InterMine thanks to invaluable help of Daniel Blankenberg.
On the next Galaxy release, the new InterMine plugin will be available and will allow to import data (from InterMine) into Galaxy and export lists of identifiers (e.g. proteins, genes) from Galaxy (into InterMine) by selecting the mine instance from the InterMine registry. Watch this space – we’ll hopefully arrange to get some details on the Galaxy training network to explain how to run the data imports in each direction.

All GCCBOSC photographs in this post are from Berenice Batut’s Flickr album, under a CC-BY-SA licence

GSoC 2018 Students Announced! 🌞☀️

After last year’s great success, we’re really excited to welcome six Google Summer of Code students to work with us again this year:

Aman Dwivedi will be working on a Cross-InterMine search tool. This will use the registry to allow users to search multiple InterMines at once, and should be a good way to figure out which mine has the data you’re looking for. Aman will be mentored by Nadia Yudina, herself a graduate of one of last year’s InterMine+GSoC program.

Adrián Rodríguez Bazaga will be working on something we’ve always wanted: an InterMine data browser – hopefully a tool that will allow users to learn a bit more about data inside an InterMine without having to know the data model. Yay for easier learning curves! Adrian’s mentor will be Yo Yehudi.

Arunan Sugunakumar is going to explore hooking InterMine up to a more modern search package, probably Solr or ElasticSearch. Our current version of Lucene is very old, and we know there are better options out there!  Daniela Butano will mentor this project.

Jake Macneal is going to work on a prototype to convert natural language questions into InterMine PathQuery – it would be exciting to have a user type “Show me all the genes associated with diabetes” into an InterMine, and get a sensible set of results back! Aaron Golden will mentor Jake.

Nupur Gunwant will be adding additional features to our python client, such as registry communication, a query manager, and visualisations. Julie Sullivan will be Nupur’s mentor for this project.

Ankit Kumar Lohani will be working on Buzzbang – a search engine to crawl multiple biological sources including, but not exclusively, InterMine instances. Justin Clark-Casey will be Ankit’s mentor.

We’re also planning to post a short interview series highlighting each student and their plans for the summer. We can’t wait to get started!!

 

 

 

 

 

 

 

Cambridge Science Festival 2018: A fruity crime of passion 🍏🍋🍊🍓

TL;DR: Science Festival was great & kids loved it. You can re-use our materials, here.

Longer version: Last weekend was InterMine’s very first year at Cambridge’s famous Science Festival, an event designed to enthuse younger people and adults alike with awe for science. We split our time across two locations,working at our home department, Genetics, on the Saturday, and at the Cambridge Guildhall on the Sunday.

Our theme was around open science, with an activity designed to reinforce the idea that shared data (and therefore more data from different sources) results in better science. For adults we had a couple of great posters about the importance of data sharing, designed by Julie and Rachel. The posters are available freely online for re-use under a CC0 licence.

The Story: A party is rudely interrupted

Meanwhile, for kids (and some adults too!) we had a crime-solving activity. In our scenario, a dastardly fruit villain had stolen the passionfruit in the midst of an otherwise enjoyable soirée. In their haste to flee, the culprit knocked over a tin of blue paint, leaving tracks behind, as well as injuring themselves and leaving DNA evidence behind as they jumped out the window. We had four fruity suspects:

suspects-sheet.png

Solving the crime

Step 1: footprints in the paint

In order to solve the crime using science, our young detectives were invited to examine the footprints left by the culprit:

Fruit tracks at the crime scene. Excuse the glare from the plastic!
Fruit tracks at the crime scene. Excuse the glare from the plastic!

It was usually pretty easy to rule out the apple, and after thinking a little more, the strawberry could be ruled out too, but the orange and the lemon both looked rather similar.

Step 2: Juice found at the scene

Since the devilish thief had hurt themselves, we had samples to analyse. Our criminal investigators took strips of litmus paper and carefully examined the evidence:

20180318_105143

Once again, the evidence wasn’t quite conclusive (and was very sticky). Still, it was fun! Let’s move on to the next bit of evidence…

Step 3: the skin

With sample fruits to compare, our enterprising criminologists got a step closer to the solution. Could the skin be from a lemon? Hmmm.

20180318_105151

Step 4: We have samples, so let’s sequence the DNA!

Okay, so you may have guessed that we didn’t sequence the DNA of the suspects ourselves – but thankfully the lab had four profiles for us to compare to and they managed to quickly provide a DNA fragment from the crime scene evidence, too. This fragment was far more conclusive than the others, pointing unequivocally to the shadiest character of the bunch – Lithium Lemon.

Step 5: Putting the puzzle pieces together, and sabotage!

As our sleuths solved each different activity, we gave them a puzzle piece. At this stage they had four pieces of the puzzle, but they were still missing a couple of critical bits: the two central pieces. It turns out there had been some CCTV footage – but it had been stolen! After looking around, our vigilant investigators discovered where the crime scene video had been hidden (under the table) and managed to put the entire story together. Once again, shown front and centre of the puzzle was our suspect, Lithium Lemon.

fruit-bowl

 

Wrap up

While the shady character wad hauled off in cuffs to the county jail, successful detectives were rewarded with candy, some awesome stickers,  and a handout that had a child-oriented activity sheet on one side, with a small copy of our open knowledge posters on the other side, for the slightly more grown-up folks.

What we learned

Our tables were generally very busy, and the kids seemed to have a great time examining the evidence and putting together the puzzle pieces one by one. I’m not sure how many of them quite perceived the data sharing theme, but some of the adults definitely did, and appreciated the posters as well.

I think one of the biggest surprises for use was how busy we all were! Genetics had a steady flow of people, but the Guildhall had even more. We haven’t heard numbers for this year yet, but in 2017 apparently there were around 3,000 people. What that meant in practical terms for us: Two tables with identical versions of the activity, two InterMine team members acting as detective wranglers at each table, and often two separate groups of people working through the activity simultaneously at each table. After several hours of this we were all ready for a nap! Next time, six staff might be better to allow people to have a breather.

We also learned to keep a good eye on our puzzles: Five puzzles left the office on Sunday morning but only four returned. Hopefully it’ll be cherished at someone’s house as memories of a great activity…. ?

Our materials are open!

Given that our activity was designed to advocate openly sharing your science, we’ve shared our materials online too, and you’re welcome to re-use them.

https://github.com/intermine/science-festival/

This includes:

  • The fruit images (lovingly created by Rachel’s daughter!)
  • Handouts
  • Posters
  • Guidance sheets and in-depth “sciencey details” about each activity.

If you do re-use them, we’d love to hear about it! You can email info@intermine.org, tweet @intermineorg, or even open an issue on the GitHub repository.

Finally, I’d like to thank Rachel again for all the work she put into designing this scenario. It was creative, exciting, and overall seemed to be a hit!

 

#OpenConCam: Where open (science | access | source | data) meet.

What is OpenCon?

OpenCon is a yearly event designed to bring together people who are dedicated to open in all its incarnations. It’s in such high demand, the only way to get in is by application, and most attendees are provided with scholarships to help with travel/accommodation costs.

We weren’t able to attend the international event, but thankfully there was a great satellite event running in Cambridge – OpenConCam.

OpenConCam was in itself a day filled with memorable talks and worthwhile collaborations, including:

PeerJ – (Sierra Williams)

PeerJ is an open access journal which focuses on methodological rigour  when publishing, rather than preferring groundbreaking new science – something particularly important for early career researchers. One of my favourite points from her talk was when she demonstrated the checklist that PeerJ uses to help authors disseminate their content effectively:

Open access in developing nations (Tapoka Mkandawire)

Many of us know from personal experience that accessing scientific publications even in wealthier western countries can be controversially difficult, so it’s hard to imagine how much more difficult this must be in developing countries. Thankfully, there are initiatives such as Africa Information Highway, Eifl, and Hinari which aim to make data and publications more accessible. She also discussed the cultural concept of ubuntu – sharing and caring for each other as a concept that works hand-in-hand with the open* movement.

Bullied into Bad Science (Laurent Gatto)

Bullied Into Bad Science is a campaign to help early career researchers who may be under pressure to omit or tweak their scientific results in order to gain a desired outcome or exciting publication. Laurent was clearly passionate about this subject: Sometimes the system pressures mean that successful academics are not necessarily good scientists – and things really shouldn’t be this way.

Queen B

This session was frantic! The basic premise was that the room divided into groups of 4, nominated a “queen bee” who presented a problem (in one minute), and then the group broke up and discussed possible solutions with others in the room for three minutes, reporting back over the span of two minutes. Lather, rinse, repeat until all members in a group have been queen bees. Topics I recall discussing included getting humanities more involved in open science, open source code in science, how to inspire people to publish in journals with strict open policies when they could go for a less principled journal more easily, and how to sell open* to the disinterested.

Hitting a moving target in Open Access advocacy  (Danny Kingsley)

Danny shared something dear to our hearts: Getting others involved in open. While she was specifically referring to open access, most points could easily be applied to open science, data, and source too. Her focus was on figuring out how to get the most “bang for buck” – that is, find and influence people who will pay off the most for the least effort.

Undergrads, for example, aren’t great targets as they mostly don’t continue in academia, but PIs, and government bodies may be more useful, because they have much more influence if they’re sold on open access. Similarly, sometimes it makes more sense to influence decision makers and get them to evangelise for you, if you don’t have enough authority to impress people. Make sensible decisions, and don’t run up against brick walls repeatedly if it isn’t paying off!

Focus Groups

After lunch, we had an unconference-style set of sessions, where everyone nominated topics they were interested in, and added stars beside ideas they themselves were interested in attending. The resulting sessions were:

  • Self-care in Open: Many of us volunteer time outside a normal 9-5 job to help promote open, and the environment can be discouraging or rough sometimes – not everyone is as keep on open as we are! Suggestions presented by Kirstie Whitaker included working with micro-ambitions (turning your work into small, achievable chunks rather than trying to conquer everything), and thinking of success as a spectrum. A small win is still a win!
  • Open + inclusive: Laurent Gatto pointed out in a blog post earlier this year that the Open movements aren’t always as…. open as they should be. Sometimes Open Science can fall down in the same places less open science falls down – not making sure to have a decent balance of ethnicities, genders, sexual orientation, etc. Can we do better?

  • Open source code in science: If you’re an InterMiner, you’re probably already pretty keen on open source scientific software and can see the benefit of it – but not everyone does. Many, many papers that use code to produce their scientific results don’t expose that code. But if the code isn’t in the paper, or linked to it openly in some way… how was it peer reviewed? If the code is wrong, so is the science it produces. I proposed this discussion topic, and really enjoyed perspectives from my team mates. Some of the ideas generated included:
    • Share dummy data to run your code on, if the data are proprietary or there are privacy issues.
    • Try to encourage journals to have software availability statements
    • Encouraging researchers to share their code, even if it’s only a few lines. After all, if you’ve written 6 lines of code to configure an R plot, whilst it might seem insignificant – that’s actually really easy to peer review and correct mistakes! By comparison, bigger software packages can be hundreds, thousands, or even millions of lines of code. The thought of trying to review that (beyond reviewing quality metrics like testing, documentation, and commenting) makes me a bit scared.
  • Open in the humanities: This is a fascinating subject, and I don’t think many (any?) of the audience members were in the humanities. We raised a lot of questions about the shape of humanities data.

https://twitter.com/lauren_cad/status/931169637210972161

Opening the lab door (Christie Bahlai)

After the focus groups, Christie Bahlai skyped in to talk about running an open lab. She shared some of the different types of pushback against open science:

  • Those who consider themselves too busy to share
  • People who have been pushed from ‘busy’ status to actively hostile against open science, perhaps when they were asked to participate further and didn’t wish to
  • The worried –  people who have legitimate concerns about open science (I’m sure I’m not the only person who doesn’t really believe in “anonymised personal data”).
  • The unheard – those who are disadvantaged and marginalised already worry that practising open will marginalise them further. How can we protect these people?

She also talked about getting people involved in open as early as possible, including introductions to open as part of the undergrad curriculum:

A few more of her tips:

  • Get students’ feet wet in open science by slowly introducing them to the concepts using examples in their own fields – examples they’ll care about.
  • Share your lab policies openly and don’t tolerate the “brilliant jerk” – at the end of the day no matter how productive they are, they’re still jerks.
  • Keep science a kind place. Show others that you too can fail publicly, and fail often.
  • Share your lesson plans openly, too! Christie’s “Reproducible quantitative methods” curriculum is designed to provide a good introduction to open, reproducible data wrangling using R and GitHub.

The open source investigation revolution (Eliot Higgins)

This talk was an out-of-the-blue surprise. Rather than focusing on academia like most of the previous talks, Eliot shared how open videos, photos, and “facts” on the web can be verified for journalism. If you’ve heard of doxxing, you’ll know a bit about the techniques Eliot described, using social media, satellite imagery, and other online tools to track people who don’t want to be tracked – but this time, for Good. He described how some of the white supremacist rally leaders were identified, as well as verifying missile attacks in Syria – including who perpetrated them and who was lying about it.

This talk stilled twitter’s usually vibrant #OpenConCam discussions to a halt, probably due to the riot of emotions it induced in most of the participants. We’d been shown highly disturbing images, felt fear wondering how these techniques could be misused, and we awed by the massive importance of what we’re seeing, no matter how awful it was. I’m sure I wasn’t the only person torn between wishing I’d never seen it and knowing that I had to watch it, because burying our heads in the sand isn’t an option either.

Wrap-up

OpenCon 2018 hasn’t been announced yet, but this year, all around the world, there are still satellite events like the one I attended. If you haven’t attended a conference about working openly before, this is a great way to get a taste – or if you’re a die-hard enthusiast, you’ll get the chance to meet like-minded individuals and be inspired!