Looking ahead: InterMine+Google Summer of Code 2018. Could you be a mentor?

2017 is coming to an end, and I have to say it’s been a fabulous one! I’ll probably post a “cool things InterMine did this year” round-up in a week or two – but in the meantime, here’s my final Google Summer of Code blog post for you all! We’ll cover the InterMine swag we’ve just sent out across the globe, as well as plans for next year, and how you can help out.

Thank-you gifts for mentors and students

Last week, we posted care packages to all our GSoC mentors and summer students, in the form of t-shirts, stickers, and pens. The postal-service-wrinkled shirt shown above is the women’s fit shirt printed on black; unisex shirts are a slightly lighter grey colour. If you filled out the swag survey when it was sent to you, your gift should be with you soon! Tweet us your images of the items in use for extra InterMine Cool Points 😎.

GSoC 2018 – call for project ideas and mentors!

Early in 2017, we put together an ideas list for GSoC projects – InterMine’s projects are numbers 3 to 9. If you’d like a better idea of what it’s like to apply (or to be a mentor), read our application guidance from last year.

Do you have a nifty idea, or an InterMine itch you’d like to scratch?

Please share it with us! Add it to our 2018 Google Summer of Code ideas list, or if you need to sound things out and discuss them a little bit, comment on the GitHub issue, or email the dev list. You can even propose several ideas, if you like! Please add all ideas by the end of 14 December (the end of this week).

Would you like to try mentoring?

Fancy a chance to earn some nifty exclusive swag like the items pictured above? Add your name as a possible mentor to an existing idea (or your own new idea). You can always drop us a line if you want to discuss things first. We like projects to have more than one mentor if possible.

Maybe you’re a student thinking of GSoC?

Awesome! If you have your own InterMine project idea (whether it’s brand new or you’ve already started it), or if one of the ideas on our ideas list lights your fire, it’s not too early to start talking with potential mentors about it. The application guidance we mentioned above would be a good read, too.




#OpenConCam: Where open (science | access | source | data) meet.

What is OpenCon?

OpenCon is a yearly event designed to bring together people who are dedicated to open in all its incarnations. It’s in such high demand that the only way to get in is by application, and most attendees are provided with scholarships to help with travel/accommodation costs.

We weren’t able to attend the international event, but thankfully there was a great satellite event running in Cambridge – OpenConCam.

OpenConCam was in itself a day filled with memorable talks and worthwhile collaborations, including:

PeerJ (Sierra Williams)

PeerJ is an open access journal that focuses on methodological rigour when publishing, rather than favouring groundbreaking new science – something particularly important for early career researchers. One of my favourite points from Sierra’s talk was when she demonstrated the checklist that PeerJ uses to help authors disseminate their content effectively:

Open access in developing nations (Tapoka Mkandawire)

Many of us know from personal experience that accessing scientific publications can be notoriously difficult even in wealthier western countries, so it’s hard to imagine how much more difficult this must be in developing countries. Thankfully, there are initiatives such as Africa Information Highway, Eifl, and Hinari which aim to make data and publications more accessible. Tapoka also discussed the cultural concept of ubuntu (sharing and caring for each other), which works hand-in-hand with the open* movement.

Bullied into Bad Science (Laurent Gatto)

Bullied Into Bad Science is a campaign to help early career researchers who may be under pressure to omit or tweak their scientific results in order to gain a desired outcome or exciting publication. Laurent was clearly passionate about this subject: sometimes the pressures of the system mean that successful academics are not necessarily good scientists, and things really shouldn’t be this way.

Queen B

This session was frantic! The basic premise was that the room divided into groups of four, and each group nominated a “queen bee” who presented a problem (in one minute). The group then broke up to discuss possible solutions with others in the room for three minutes, and reported back over two minutes. Lather, rinse, repeat until every member of a group has been queen bee. Topics I recall discussing included getting the humanities more involved in open science, open source code in science, how to inspire people to publish in journals with strict open policies when they could more easily go for a less principled journal, and how to sell open* to the uninterested.

Hitting a moving target in Open Access advocacy (Danny Kingsley)

Danny shared something dear to our hearts: getting others involved in open. While she was specifically referring to open access, most of her points could easily be applied to open science, data, and source too. Her focus was on getting the most “bang for buck”: finding and influencing the people who will pay off the most for the least effort.

Undergrads, for example, aren’t great targets as most don’t continue in academia, but PIs and government bodies may be more useful, because they have much more influence once they’re sold on open access. Similarly, if you don’t have enough authority to impress people yourself, it can make more sense to influence decision makers and get them to evangelise for you. Make sensible decisions, and don’t keep running up against brick walls if it isn’t paying off!

Focus Groups

After lunch, we had an unconference-style set of sessions: everyone nominated topics, then added stars beside the ones they wanted to attend. The resulting sessions were:

  • Self-care in Open: Many of us volunteer time outside a normal 9-5 job to help promote open, and the environment can be discouraging or rough sometimes – not everyone is as keen on open as we are! Suggestions presented by Kirstie Whitaker included working with micro-ambitions (turning your work into small, achievable chunks rather than trying to conquer everything), and thinking of success as a spectrum. A small win is still a win!
  • Open + inclusive: Laurent Gatto pointed out in a blog post earlier this year that the Open movements aren’t always as… open as they should be. Open science can fall down in the same places as less open science, such as failing to ensure a decent balance of ethnicities, genders, sexual orientations, etc. Can we do better?
  • Open source code in science: If you’re an InterMiner, you’re probably already pretty keen on open source scientific software and can see the benefit of it – but not everyone does. Many, many papers that use code to produce their scientific results don’t expose that code. But if the code isn’t in the paper, or openly linked to it in some way… how was it peer reviewed? If the code is wrong, so is the science it produces. I proposed this discussion topic, and really enjoyed the perspectives of my teammates. Some of the ideas generated included:
    • Share dummy data to run your code on, if the data are proprietary or there are privacy issues.
    • Encourage journals to have software availability statements.
    • Encourage researchers to share their code, even if it’s only a few lines. After all, if you’ve written 6 lines of code to configure an R plot, it might seem insignificant, but that’s actually really easy to peer review and correct mistakes in! By comparison, bigger software packages can be hundreds, thousands, or even millions of lines of code. The thought of trying to review that (beyond reviewing quality metrics like testing, documentation, and commenting) makes me a bit scared.
  • Open in the humanities: This is a fascinating subject, and I don’t think many (any?) of the audience members were in the humanities. We raised a lot of questions about the shape of humanities data.

Opening the lab door (Christie Bahlai)

After the focus groups, Christie Bahlai Skyped in to talk about running an open lab. She shared some of the different types of pushback against open science:

  • Those who consider themselves too busy to share
  • People who have been pushed from ‘busy’ status to actively hostile against open science, perhaps when they were asked to participate further and didn’t wish to
  • The worried – people who have legitimate concerns about open science (I’m sure I’m not the only person who doesn’t really believe in “anonymised personal data”).
  • The unheard – those who are already disadvantaged and marginalised worry that practising open will marginalise them further. How can we protect these people?

She also talked about getting people involved in open as early as possible, including introductions to open as part of the undergrad curriculum:

A few more of her tips:

  • Get students’ feet wet in open science by slowly introducing them to the concepts using examples in their own fields – examples they’ll care about.
  • Share your lab policies openly and don’t tolerate the “brilliant jerk” – at the end of the day no matter how productive they are, they’re still jerks.
  • Keep science a kind place. Show others that you too can fail publicly, and fail often.
  • Share your lesson plans openly, too! Christie’s “Reproducible quantitative methods” curriculum is designed to provide a good introduction to open, reproducible data wrangling using R and GitHub.

The open source investigation revolution (Eliot Higgins)

This talk was an out-of-the-blue surprise. Rather than focusing on academia like most of the previous talks, Eliot shared how open videos, photos, and “facts” on the web can be verified for journalism. If you’ve heard of doxxing, you’ll know a bit about the techniques Eliot described, using social media, satellite imagery, and other online tools to track people who don’t want to be tracked – but this time, for Good. He described how some of the white supremacist rally leaders were identified, and how missile attacks in Syria were verified, including who perpetrated them and who was lying about it.

This talk brought Twitter’s usually vibrant #OpenConCam discussion to a halt, probably due to the riot of emotions it induced in most of the participants. We’d been shown highly disturbing images, felt fear wondering how these techniques could be misused, and were awed by the massive importance of what we were seeing, no matter how awful it was. I’m sure I wasn’t the only person torn between wishing I’d never seen it and knowing that I had to watch it, because burying our heads in the sand isn’t an option either.


OpenCon 2018 hasn’t been announced yet, but this year, all around the world, there are still satellite events like the one I attended. If you haven’t attended a conference about working openly before, this is a great way to get a taste – or if you’re a die-hard enthusiast, you’ll get the chance to meet like-minded individuals and be inspired!

Community Outreach: What we’re up to & how you can participate

A large part of working in open source and science is sharing what you do with others – it’s not just about code and papers. We have quite a bit going on and coming up that we’d like to share and get your ideas about.

Community outreach calls

We’ll be trialling a community outreach call on December 7th at 5PM GMT. It takes the slot our normal developer call would usually occupy, but it will focus specifically on community members and ways to communicate with and help them, rather than on technical issues or code.

Developers are still entirely welcome, but please encourage your curators, enthusiastic users, and outreach people to come along too! Agenda

Open outreach repo on GitHub

We’ve created a GitHub repository dedicated to outreach-related topics. The idea is to take discussions about what we’re doing out into the open, so others can chime in and/or re-use our work. Examples include:

Science Festival – March 2018

We’ll be participating in the Cambridge Science Festival, teaching about better data enabling better science. The basic idea is to teach this through gameplay with puzzles, rewarded with candy and stickers. Do you have kids who might be willing to playtest our ideas? Let us know!

Webinars and tutorials

We’ve run in-person workshops, including a developer workshop; this time we’d like to try something online! What formats interest you and your users the most?

  • A series of short 5-minute-ish webinars covering various topics
  • A longer training session covering querying InterMine via the website and/or the API (in Perl, Python or R?)
  • Other? Share your feelings in a comment, contact us, or add to the GitHub issue
  • Maybe you’d like to volunteer to run one!

Google Summer of Code

Do you have an idea for a fun InterMine project that would only take a couple of months? Or maybe you would like to mentor a project over the summer? We had a great time during GSoC this year, and we’re planning to apply to do it again next year. Interested? More info on GitHub.

Rachel’s world tour of the UK

As part of the upcoming ISA-InterMine cloud grant, Rachel will be visiting bioinformatics cores and labs to try to gather use-cases from people working with biological data on the front line. Want to help out, or invite us to your lab? Get in touch.

Guest blogging

Come tell our followers about the awesome InterMine thing you just did. A conference? A talk? A new feature or exciting dataset in your mine? We’d love to be the platform for your voice!



Talks and Workshops: Sharing our materials for re-use

Would you like to grab some ready-made slides or InterMine training workshop materials? We’ve rounded up some recent materials. Feel free to remix them for your own talks and outreach efforts. If you do use them, we’d love to see the result!


You should have permission to make a copy; if not, please contact us / tweet us / pop by the chat to poke us with a stick.

3-min lightning talk at GSoC Mentor Summit: Citable version on Figshare | Google Drive (editable) version

Better Science Through Better Data: Citable version on Figshare | Google Drive (editable) version | The featured image above was live-scribed during the talk. The licence is CC-BY from Springer Nature, and the image is available from https://figshare.com/articles/Better_Science_through_Better_Data_2017_scidata17_scibe_images/5558653

Blank InterMine-branded slides: Get ’em here.


BlueGenes Poster: This poster was presented at BOSC 2017. Citable version on F1000 | Inkscape editable version (download Inkscape here: https://inkscape.org/en/release/0.92.2/)

InterMine Poster for Elixir UK All Hands 2017: PDF version | Inkscape editable version 

Workshop learning materials

We run an InterMine training workshop every term, covering the basics of using the webapp, as well as discussing how to draw data from the API. If you’re near Cambridge, keep your eyes open on the blog or twitter feed, as we’ll always announce them well in advance.

Workshop training materials in PDF: Workshop Exercises (handouts with answers) | Workshop slides. Note that these exercises were all correct with data from HumanMine in October 2017. Result counts may change if we add or update data sources in the future, but apart from those counts, the majority of the materials should remain correct.

You can download the original OpenOffice files as well if you’d like to adapt the materials for your own workshops, or feel free to contact us if you’d like to coordinate some training with us.

Side note: We’re also delivering a half-day workshop training session as part of the EBI’s 4-day Introduction to Multiomics Data Integration course – applications are open now until 01 December 2017.


Scientific Data (2017): Better Science through Better Data 2017 (#scidata17) scribe images. figshare. Retrieved 15:48, Nov 06, 2017 (GMT).

Where to find InterMiners: September-December 2017 edition

We’re busy as ever, and Gos is away at the #biohack2017 in Japan right now – you can spot him in a gold shirt sitting towards the back of the room here:

Other places to find InterMiners over the next few months include:


12 September: FAIR in practice focus group – Research support professionals. Daniela will be at the British Library participating in this consultation. You may also be interested in the researchers focus group on the 13th. It looks like tickets are still available! (More)

21 September: **Cancelled**. The usual community call is cancelled this week. We’ll be back as normal with updates in October, though!

25-27 September: Justin and Yo will be attending the Cambridge Bioinformatics Hackathon.


2-3 October: You’ll be able to find Justin at the Bioschemas Elixir implementation meeting in Hinxton.

5 October: InterMine dev community call – back to our normally scheduled calls. Agenda.

13-15 October: Find Yo at the 2017 GSoC mentors summit in Sunnyvale, California

19 October: It’s another community developer call, yay! 

21-25 October: Justin will be representing us at ISWC in Vienna.

25 October: Better Science through Better Data in London – we’ll be sharing the story of InterMine in a lightning talk. Open data is awesome and InterMine couldn’t exist without it!

27 October: We’ll be delivering an InterMine training course in Cambridge, including an all-new API training section. Please spread the word about this one!


1-2 November: You’ll be able to spot Justin at the Elixir UK All Hands in Edinburgh.


4-7 December: Get your Semantic Web on with Daniela at SWAT4LS in Rome!

Phew, that’s a lot!




Blog: InterMine Cloud + ISATools: coming to a cloud near you

As many InterMiners may remember, due to some unlucky timing, we had a grant deadline that occurred during the InterMine Developer Workshop earlier this year, with seemingly countless group-work sessions like the one pictured here:

Thankfully, it looks like all our caffeine-fuelled hard work paid off the way we hoped: we are extremely excited to announce that the Wellcome Trust awarded us the grant! Here are a few highlights of what we plan to work on when the project starts in April next year:

Collaboration and metadata

This grant was written with Susanna-Assunta Sansone (Oxford e-Research Centre, University of Oxford) to support a collaboration with the ISA-Tools group. ISA provides tools to structure metadata, covering the Investigation, Study and Assay of experiments; we will integrate the ISA format into InterMine and the ISA team will develop web-based tools to make metadata creation easier than ever.

Make your own InterMine with less sweat and toil

InterMines can be a challenge to set up right now unless you’re a developer. We’d like to do better. For simpler data formats, we’re hoping to create a UI-based wizard that allows you to drag and drop your files, select a few settings, and start a build – no need to touch a text editor. The more advanced / custom data formats will probably still require hands-on developer time.

InterMine in the cloud

If setting up your own InterMine becomes pleasantly easy, why not do it online? It’d be awesome if you could go to a website, click the “New InterMine” button, upload a few files (or paste their URLs!), and end up with my-new-experiment.some-cloud.org for all your lab’s InterMine needs. Data will be merged with external supplementary data sources, and you’ll be able to analyse your data using visualisations (we’ll dedicate a chunk of time specifically to adding more datavis), our famous results tables, and familiar tools like gene set enrichment.

Enhanced import and export

We’d like to smooth out your way to sharing data with the community. Maybe you want to import data from Galaxy, or perhaps you want to export your InterMine as a virtual machine to be powered up and re-examined at a later date. Maybe you’d prefer to export data for publication as an ISA archive, bundled neatly with metadata, or maybe even generate the scaffold of a data paper automatically from your datasets. These are all things we’ll be working on with Susanna’s group.

Tell us what you think

As ever, we are keen for input from the community, and as we gear up for the next phase of development, now would be a particularly good time to hear from you! Tweet us, leave a comment, email the developer list, or pop by for a chat.

Google Analytics in BlueGenes: what should we track?

TL;DR: We’re implementing analytics tracking in BlueGenes. We can probably track anything you like, within reason. Leave a comment [comments now closed] or email us if there’s anything you’d like to see! Suggestions must adhere to our privacy policy.

Longer version:

InterMine’s JSP pages (the current, older UI) are set up with a couple of different types of tracking:

  1. Google Analytics, which currently anonymously records things like:
    1. Number of users and their locations
    2. Pages viewed
    3. With a bit of effort you can figure out what items were searched for by analysing query strings.
  2. InterMine home-brew internal analytics (to view in your own mine, log in as the super user and select the “usage” tab.) It tracks:
    1. Logins (anonymously)
    2. Keyword search terms
    3. Popular templates
    4. Count of custom queries executed
    5. List views by InterMine object type (but not list contents)
    6. Count of lists created, by type
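To make the difference between counting an event and recording its contents concrete, here is a minimal Python sketch of an aggregate-only counter covering the home-brew categories listed above. It is hypothetical: the class, method, and template names are illustrative, and this is not InterMine’s actual analytics code or anything from the Google Analytics API.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Sequence


@dataclass
class UsageCounts:
    """Hypothetical aggregate-only usage tracker (not InterMine's real code).

    It mirrors the home-brew analytics categories but keeps only counts,
    never user identities or list contents.
    """

    logins: int = 0
    custom_queries: int = 0
    keyword_searches: Counter = field(default_factory=Counter)
    templates: Counter = field(default_factory=Counter)
    list_views_by_type: Counter = field(default_factory=Counter)
    lists_created_by_type: Counter = field(default_factory=Counter)

    def record_login(self, username: str) -> None:
        # The username is accepted but deliberately discarded: only a count is kept.
        self.logins += 1

    def record_keyword_search(self, term: str) -> None:
        # Normalise so "PPARG" and " pparg" count as the same search term.
        self.keyword_searches[term.strip().lower()] += 1

    def record_template(self, template_name: str) -> None:
        self.templates[template_name] += 1

    def record_custom_query(self) -> None:
        self.custom_queries += 1

    def record_list_view(self, object_type: str, contents: Sequence[str]) -> None:
        # List contents are ignored on purpose; only the object type is counted.
        self.list_views_by_type[object_type] += 1

    def record_list_created(self, object_type: str) -> None:
        self.lists_created_by_type[object_type] += 1


if __name__ == "__main__":
    usage = UsageCounts()
    usage.record_login("alice")
    usage.record_keyword_search("PPARG")
    usage.record_keyword_search(" pparg")
    usage.record_template("Gene_Pathways")  # hypothetical template name
    usage.record_list_view("Gene", ["BRCA1", "TP53"])
    usage.record_list_created("Gene")
    print(usage.logins, usage.keyword_searches.most_common(1))  # 1 [('pparg', 2)]
```

Note that keyword search terms are still stored verbatim here, which is exactly the kind of “anonymous but possibly revealing” data the questions below ask about.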

So we have a couple of questions we’d love some feedback on, as we implement Google Analytics in BlueGenes:

  1. Do you use the current analytics? Which one, or both?
  2. What would you *like* to record? Here’s a list of ideas:

Things that are probably okay to track

  • Pageviews including counts and times – e.g. “17 views for /region-search on Monday the 13th at 10pm”
  • Logins (anonymously)
  • Visitor location
  • Tools used (e.g. report page tools interacted with)
  • Popular templates
  • Mine used / switched to a different mine

Things we’re not sure about – what do you think?

  • Keyword search contents (anonymously). Pros: interesting analyses like this one. Cons: Could someone avoid InterMine out of fear that someone would notice their gene is getting too much attention?
  • List contents (anon, as above).
  • What about mistyped identifier names in list upload?
  • Region search
  • Queries built in the query builder

I’m sure I’ve missed quite a few things from both lists. We’d love to hear your input and feelings, both with regard to privacy and with ideas about useful trackable events and pages. Tweet us, comment on the web services tracking GitHub issue, email the dev group, or contact us some other way: http://intermine.readthedocs.io/en/latest/about/contact-us/