Research software engineering as a career, #RSE2018, and programming languages as poetry…

Research software engineering as a career path

What is an RSE (Research Software Engineer), you may ask? It’s a role that has existed for decades, but has only had this name for a few years. As RSEs, we tend to be software engineers who work in academia, or perhaps academics who write production-ready code – or maybe both.

A common theme seems to be universities establishing RSE groups that work in a consultancy style: academics who have code, or need code, approach the group and get help with their tasks, whether that means refactoring old, messy or slow code, offering suggestions, or writing code to make their research easier. The RSE group may also provide training in programming languages, version control, best practices and other computational basics that make researchers’ lives easier.

Whilst I think most or all of us at InterMine would consider ourselves RSEs, we don’t really fit this model – we all write code and we all contribute to papers, but all of our sub-projects and work centre on a single primary project. Some of us are working to make InterMine more FAIR, others to make it easy to launch InterMine on the cloud, but it’s still all InterMine. I’m sure we’re not the only group like this, and it makes me wonder if there should be names for the different flavours of RSE groups out there. Central RSE groups vs. dedicated RSE groups? Consultancy / support / advocacy RSEs vs. RSE specialist groups? I’m not sure if any of these are quite right, and I’d be curious to hear what others think.

RSE 2018: a grassroots conference for research software engineers

Moving on from musing about job titles, though – a bit about the recent conference. RSE2018 is the third annual UK conference for Research Software Engineers, but it’s the first time I’ve attended, personally. It made a change to have a conference where everyone around was working in research and software development, but not all of it was open source or bioinformatics related. I relished the chance to meet and discuss career paths with others, and enjoyed it perhaps a little too much when the late-night conference dinner descended into attempts to assign poetry genres to different programming languages. Java is obviously epic poetry, but others get trickier: terse Clojure might be a haiku, and perhaps Python, with its significant whitespace, is a form of concrete poetry?

The conference keynotes varied: an introduction to Oracc, a digital humanities project that hosts annotated and transcribed cuneiform; a look at the Microsoft HoloLens and some of the challenges and history of its creation; a talk about Google DeepMind; and a keynote on the sustainability of research software, which I particularly enjoyed. Given how much chaos dependencies introduce, it’s no wonder that maintaining software takes a significant amount of time and money!

There were some hands-on tutorials and workshops, but I mostly attended RSE-community related sessions. A couple that stood out to me, in no particular order:

  • Diversity in recruiting RSEs. We had speakers from Microsoft talking about their efforts to make their research staffing pool more diverse, which included gruelling-sounding half-day sessions where each candidate was interviewed by four different interviewers in an attempt to remove bias. Somewhat entertainingly, the room this session was held in – the senate chamber – had red throne-like seats and eight large portraits on the walls, every single one depicting an older white man. The irony was not lost on the session attendees!
  • The RSE community AGM. Rather than being an informal gathering of individuals, the UK RSE group will soon be re-launching as an official society that members can join for a nominal fee. The AGM gave us a chance to hear about some of their plans (you can sign up to hear about the launch date), as well as the opportunity to share your wish list of likes, dislikes, and comments on the activities the group performs. I’m looking forward to interacting with the society and seeing where they head!

It’s a conference I’d definitely like to attend again. If you missed out, you can catch up with many of the relevant points on Twitter, under the hashtag #RSE18.


Google Summer of Code is over for another year – and well done to all!

One of the goals of Google Summer of Code (GSoC) is to help turn students into long-term open source maintainers and contributors. I suspect we’ve managed this with our current batch of students, who have contributed to our projects across a broad range of topics, whether it was querying InterMine using natural language sentences, updating our search capabilities (both UIs and search backends), or adding new features to the InterMine Python client.

From the start of the application process, our fabulous pool of applicants spent time interacting with each other and even helping each other out before anyone had been officially accepted. We received numerous PRs, tickets, and suggestions on our GitHub repos, and this year some of our mentors were themselves former GSoC students. It’s almost hard to believe we hadn’t participated before 2017, given all of the great work and enthusiasm GSoC brings, all while paying students for their time and giving them valuable work experience.

To wrap up this year’s great set of projects we had a community call [agenda & notes here] where our students presented their work in slots of roughly five minutes. You can catch up on each of the recorded presentations in our GSoC 2018 playlist, or here are direct links to each of the videos:

InterMine NLP – create InterMine queries by asking questions (Jake Macneal)

InterMine Data Browser Faceted Search tool (Adrián Rodríguez Bazaga)

Improving the InterMine Python Client (Nupur Gunwant)

InterMine Solr Search (Arunan Sugunakumar)

Buzzbang biological data search (Ankit Lohani)


All our talented students deserve a massive round of applause for all the hard work they put into this!


Coming up soon: InterMine 2.0 release webinar, community calls, and GSoC presentations

What’s coming up soon in InterMineLand? Here are a few of the highlights:

Upgrading to 2.0 – Thursday 2nd August

With the release of InterMine 2.0 RC1, we’ll be dedicating the InterMine Developer call to an InterMine 2.0 Upgrade Webinar, spending around 20 minutes discussing how to upgrade an InterMine 1.x installation to use the newer (and much more easygoing) Gradle dependency management system, followed by Q&A so you can learn everything you’ve been burning to know. [Call in information]

This call will be recorded so anyone who couldn’t make it can catch up.

GSoC Student project presentations – Thursday 16th August

Six students, six awesome projects. Our students have been blogging prolifically while working over the last three months, and they’ll be presenting their work on the developer call, with five-minute slots per student plus time for Q&A afterwards. [Agenda here]

This call will be recorded so anyone who couldn’t make it can catch up.

Community Outreach Call – 6th September 

Once a quarter we host non-techie calls where we focus on interesting things the community has been doing, as well as community engagement in general. This time we’ll be featuring Kevin Macpherson, who runs some fantastic community outreach at SGD, including amazing webinar video use-cases. [Agenda, still a work in progress]

Previous featured speakers include Jacqueline Campbell talking about her approach to community engagement, Wayne Decatur demonstrating InterMine code in Jupyter notebooks, and Abby Cabunoc Mayes, Mozilla’s Working Open Practice Lead.

We’re still looking for speakers for this call and the next one, in December. If you have a topic you’d like to share about InterMine, open science/source, or bioinformatics in general, ping us to pitch the idea.

InterMine at #GCCBOSC Portland – 7 days of fun, sun, and code…

BOSC (the Bioinformatics Open Source Conference) is normally part of ISMB (Intelligent Systems for Molecular Biology), but for the first time this year, it teamed up with the Galaxy Community Conference (GCC) instead. For us, this presented an exciting opportunity – like a regular BOSC but with the added bonus of training days and the chance to interact with Galaxy contributors during the CollaborationFest hackathon (and the rest of the conference too).

Our agenda at the conference ended up being quite full:

Handling integrated biological data using Python (or R) and InterMine

We delivered a training session on the 26th of June: Handling integrated biological data using Python (or R) and InterMine. Leyla Ruzicka from ZFIN was kind enough to travel up from Eugene to Portland to help us deliver the UI portion of the training. Once we’d familiarised attendees with how InterMine works, Daniela introduced the API side of things, and then we spent the remainder of the session working through a series of exercises in Jupyter notebooks, live-coding on a projector so attendees could learn about our code and follow along themselves.

While we did recommend that people install the InterMine Python client beforehand, thanks to Binder we were also able to work around this for anyone who didn’t have it installed. You can still see the tutorial exercise notebooks and work through them, and we have the same set of notebooks with answers if you get stuck or need a hint. This was the first time we’d worked through the exercises interactively onscreen this way, but it seemed to work well! I’m hopeful we can continue delivering the API portion of our tutorial this way in the future.

We had planned to do an R section, but ran out of time – the tutorial was about two and a half hours in total. If an R tutorial is something you’d be interested in for the future, please do let us know! You can do this via comments on this article, Twitter, pop by, or email us at info – at – intermine – dot – org.

InterMine 2.0: More than fifteen years of open biological data integration

[Slides link] We were very pleased to have a talk accepted as well as the training, giving us a chance to introduce InterMine to others and talk about its history. During the talk I mentioned that our main GitHub repo was sitting at just under 300 stars, and the audience kindly helped bump it over 300!


One of the things I focused on during the talk was a massive thanks to our broader community for all of the work they do to help InterMine become and remain a great resource. Afterwards, Lorena Pantano raised the question: how do you get others to adopt your work and contribute to it?

Personally, I’ve been working at InterMine for three years now, so I certainly can’t attest to the entirety of its history – much of this is doubtless down to the team’s great work and Gos’s great vision (and grant writing!) – but I also think one of the most important factors is making it easy for others to use your work: good developer docs, tickets that explain issues clearly, help documentation for end users, etc. I’d love to hear more thoughts about this in the comments!

Birds of a Feather sessions

Daniela and Yo both ran separate Birds of a Feather unconference-style sessions over lunch. Yo’s BoF focused on getting (and keeping) more open source contributors – Nicole Vasilevsky was kind enough to keep notes for this session. Thanks, Nicole!

Meanwhile, Daniela shared InterMine’s approach to implementing stable and persistent URIs and the issues that can arise, inspired by other data integrators and the lessons learnt in the Identifiers for the 21st century paper; some attendees also contributed their own solutions.


Group meeting session at CoFest. Try to spot Daniela! 😉

During the CollaborationFest hackathon, Daniela and Yo were able to complete (yeahhhh!!) the integration between Galaxy and InterMine, thanks to the invaluable help of Daniel Blankenberg.
In the next Galaxy release, the new InterMine plugin will be available. It will let users import data from InterMine into Galaxy, and export lists of identifiers (e.g. proteins, genes) from Galaxy into InterMine, by selecting the mine instance from the InterMine registry. Watch this space – we’ll hopefully arrange to get some details onto the Galaxy Training Network explaining how to run the data imports in each direction.

All GCCBOSC photographs in this post are from Berenice Batut’s Flickr album, under a CC-BY-SA licence

GSoC Student Interview spotlight: Natural Language to InterMine Queries + Jake Macneal

This is our blog series interviewing our 2018 Google Summer of Code students, who will be working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Jake Macneal, who will be working on converting natural language phrases to InterMine PathQuery.

Hi Jake! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

Excited to be joining the team! I’m an undergraduate studying computer engineering at McGill University in Montreal, about to enter my final semester in the fall. I’m originally from Philadelphia in the US, and I’ll be hopping around North America a little bit this summer (currently in Toronto). I’ve got a passion for robotics and artificial intelligence, which led to me joining my university’s robotics team to help design and build a Mars rover. Additionally, I’ve had the opportunity to intern at NASA Johnson Space Center in Texas, where I worked on a project which uses machine learning to track sensors around the space station (hopefully it’ll be put into use soon).

Aside from those technical interests, I enjoy soccer/football (both playing and watching), classical guitar, analog synthesizers (just getting into this but they’re really fun and fascinating), and the field of space exploration. Part of me is still holding on to the hope of becoming an astronaut some day.

What interested you about GSoC with InterMine?

I searched the GSoC organizations page for projects looking for a Clojure developer, and this was unsurprisingly one of the only ones. However, a language is hardly enough motivation to become passionate about a project. I’ve never had the chance to work in bioinformatics, but I did have a beloved computer science professor (Matthieu Blanchette) whose research was in that field, and he often spoke during lectures about his research. When I read through the organization and task descriptions I immediately thought of him, and knew that this would be a cool project to join. Nothing is more rewarding to me than the thought of using software as a tool to help others do good.

Tell us about the project you’re planning to do for InterMine this summer.

InterMine uses a graph query language (PathQuery) to retrieve information from the database. My project is to implement a more user-friendly alternative, allowing non-technical users to interact with an InterMine database without the need for esoteric queries crafted by an experienced programmer or system administrator. This will take the form of a natural language to PathQuery translation tool, written as a Clojure library. In addition, I’ll be building a proof-of-concept interface allowing novice users to submit English queries which will be translated and then submitted to the query engine. This simple app will be integrated with the InterMine web app, similar to the graphical query builder.
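To make the idea of translating English into PathQuery a little more concrete, here is a minimal rule-based sketch in Python. It is not Jake’s implementation (which is a Clojure library), and real natural language processing is far more involved; the two regular-expression rules, the `to_pathquery` function, and the choice of the `genomic` model with `Gene.*` paths are all illustrative assumptions. Each rule maps one question shape to a PathQuery XML string with a single constraint.

```python
import re
from typing import Optional
from xml.sax.saxutils import quoteattr


def _single_constraint_query(view: str, path: str, value: str) -> str:
    """Build PathQuery XML with one '=' constraint (model name 'genomic' is an assumption)."""
    return (
        f'<query model="genomic" view="{view}">'
        f'<constraint path="{path}" op="=" value={quoteattr(value)}/>'
        "</query>"
    )


def _genes_in_organism(m: re.Match) -> str:
    # Normalise e.g. "drosophila MELANOGASTER" to "Drosophila melanogaster".
    genus, species = m.group("organism").split()
    organism = f"{genus.capitalize()} {species.lower()}"
    return _single_constraint_query(
        "Gene.primaryIdentifier Gene.symbol", "Gene.organism.name", organism
    )


def _gene_by_symbol(m: re.Match) -> str:
    return _single_constraint_query(
        "Gene.primaryIdentifier Gene.symbol Gene.organism.name",
        "Gene.symbol",
        m.group("symbol"),
    )


# Hypothetical rules: (question pattern, function that builds the PathQuery).
RULES = [
    (
        re.compile(
            r"^(?:show|list|find)(?: me)?(?: all)? genes (?:in|from) "
            r"(?P<organism>[a-z]+ [a-z]+)\??$",
            re.IGNORECASE,
        ),
        _genes_in_organism,
    ),
    (
        re.compile(
            r"^(?:what is|tell me about)(?: the)? gene (?P<symbol>[\w-]+)\??$",
            re.IGNORECASE,
        ),
        _gene_by_symbol,
    ),
]


def to_pathquery(question: str) -> Optional[str]:
    """Return PathQuery XML for a recognised question, or None if no rule matches."""
    text = " ".join(question.split())  # collapse runs of whitespace
    for pattern, build in RULES:
        match = pattern.match(text)
        if match:
            return build(match)
    return None
```

Keeping the rules as a list of (pattern, builder) pairs means supporting a new question shape is a matter of appending another pair, which suits the test-driven approach Jake describes for the project.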

Are there any challenges you anticipate for your project? How do you plan to overcome them?

Natural language processing is a difficult field, and from working on a compiler, I learned that the key to building a correct parser and code generator is a huge number of test cases. Fortunately, the basic principle behind testing a translation tool is simple: assemble a set of English queries, along with the intended output (in the form of a PathQuery string). However, actually assembling such a set of tests which are useful and demonstrate realistic/important queries requires interacting with actual users in the community. I hope to spend much of my initial weeks working with the community to figure out the syntax they’d like to see supported, as well as the types of queries already being written in PathQuery.
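The testing approach described here, pairing English questions with the PathQuery they should produce, can be sketched as a small table-driven harness. This is a hypothetical Python sketch: the `run_cases` helper and the case format are assumptions, not part of InterMine. It compares parsed XML structure rather than raw strings, so differences in attribute order or whitespace don’t cause spurious failures, while the order of `view` columns (an attribute value) still matters.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, List, Optional, Tuple

# A test case pairs an English question with the PathQuery XML it should produce.
Case = Tuple[str, str]


def _canonical(xml: str) -> tuple:
    """Reduce XML to (tag, sorted attributes, children) so formatting differences don't matter."""

    def walk(element: ET.Element) -> tuple:
        return (
            element.tag,
            tuple(sorted(element.attrib.items())),
            tuple(walk(child) for child in element),
        )

    return walk(ET.fromstring(xml))


def run_cases(
    translate: Callable[[str], Optional[str]], cases: Iterable[Case]
) -> List[str]:
    """Run every case through `translate` and return human-readable failure messages."""
    failures = []
    for question, expected in cases:
        actual = translate(question)
        if actual is None:
            failures.append(f"no translation: {question!r}")
            continue
        try:
            matches = _canonical(actual) == _canonical(expected)
        except ET.ParseError:  # malformed XML counts as a mismatch
            matches = False
        if not matches:
            failures.append(f"mismatch: {question!r} -> {actual}")
    return failures
```

An empty list means every case passed, so the same table of community-sourced questions can double as a regression suite as the translator grows.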



GSoC Student Interview spotlight: Buzzbang Bioschemas search + Ankit Lohani

This is our blog series interviewing our 2018 Google Summer of Code students, who will be working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Ankit Kumar Lohani, who will be working on Buzzbang.

Hi Ankit! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

Hello InterMiners, I’m Ankit Lohani, a final-year undergraduate student at the Indian Institute of Technology, Kharagpur, India. I will be completing my undergraduate studies in a few months, with a major in Chemical Engineering.

Right from my first year, I have been interested in robotics and programming, and my interest in this field has only grown with time. Initially, I spent over a year working on the hardware side and then shifted to writing path planners for our soccer-playing bots. Though my academic background is quite different, it has only pushed me to work harder and learn more. My interests lean towards natural language processing and information retrieval.

Apart from these, I love travelling and trekking. I am also planning to complete my 3rd trek this summer, this time above 14,000 feet.

What interested you about GSoC with InterMine?

I have never worked on an open source project before, and I realized that GSoC is the best place to start learning and to see my work in action. Honestly, while looking for organisations I might be able to contribute to, I came across InterMine and the various projects listed here. The application domain of InterMine is very appealing, and I could relate to this organisation for two key reasons. First, my past internship was on information retrieval, where I touched upon topics like semantics, ontologies, UMLS (Unified Medical Language System), PubMed and Named-Entity Recognition for biological terms. Second, I was already familiar with the technologies used in this project, like Solr, Elasticsearch and Docker, from my coursework and term projects. Apart from these, the project itself has unique potential to create a breakthrough in the way complex scientific data is organised on the internet.

Tell us about the project you’re planning to do for InterMine this summer.

My project, Buzzbang, is significantly different from a typical InterMine instance: it focuses on scraping data on the internet that is marked up with Bioschemas and indexing it in a search tool, Apache Solr. So far, a basic scraping module and an indexing engine are up and running. I am planning to integrate Scrapy for crawling and indexing new paths, and to upgrade the Solr search tool in this project. Towards the end of the project, I will make sure all the changes are reflected in the front end as well.
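Bioschemas markup is commonly embedded in pages as JSON-LD inside `<script type="application/ld+json">` tags (microdata and RDFa are also possible, and aren’t handled here). As a rough, standard-library-only Python sketch of the scrape-then-index idea, the hypothetical `extract_documents` function below pulls those JSON-LD blocks out of a page and flattens each item into a simple dictionary that could then be sent to a search index such as Solr. The chosen fields (`id`, `url`, `type`, `name`, `identifier`) are assumptions, not Buzzbang’s actual schema.

```python
import json
from html.parser import HTMLParser
from typing import Any, Dict, List, Optional, Tuple


class JsonLdExtractor(HTMLParser):
    """Collect parsed JSON from every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: List[str] = []
        self.blocks: List[Any] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data: str) -> None:
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag: str) -> None:
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed JSON-LD instead of failing the whole page


def extract_documents(url: str, html: str) -> List[Dict[str, Any]]:
    """Flatten each JSON-LD item on a page into a dict suitable for a search index."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()

    items: List[Any] = []
    for block in parser.blocks:
        for item in block if isinstance(block, list) else [block]:
            # Some pages wrap several entities in an "@graph" array.
            if isinstance(item, dict) and isinstance(item.get("@graph"), list):
                items.extend(item["@graph"])
            else:
                items.append(item)

    documents = []
    for item in items:
        if not isinstance(item, dict):
            continue
        documents.append(
            {
                "id": item.get("@id") or f"{url}#item{len(documents)}",
                "url": url,
                "type": item.get("@type"),
                "name": item.get("name"),
                "identifier": item.get("identifier"),
            }
        )
    return documents
```

Separating extraction from indexing keeps the parsing testable offline; the crawler (Scrapy, in Buzzbang’s case) only needs to hand each fetched page to a function like this.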

Are there any challenges you anticipate for your project? How do you plan to overcome them?

I believe there could be some serious challenges with Scrapy. Building a general-purpose scraping tool looks easy when the data has markup, but the way this data is organised varies across domains, and crawling some of those domains might not be simple. Moreover, we are also planning to introduce some degree of parallel processing to this module. Though my focus will be on the EBI BioSamples domain, which should make my task easier, I will try to keep the crawler as general and powerful as I can. Additionally, I suspect I will need some help from the community in planning the architecture for re-crawling and re-indexing, as I am not yet sure what level of automation would be desirable there.
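One possible shape for the parallel crawling and re-crawling mentioned above is sketched below, assuming a simple policy of re-fetching any page that has never been crawled or was last crawled longer ago than a fixed age. Both functions are hypothetical, and `fetch` is injected so the sketch makes no assumptions about networking. In practice Scrapy manages request concurrency itself (for example through its `CONCURRENT_REQUESTS` setting), so this only illustrates the scheduling decision and failure handling.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterable, List, Optional


def due_for_recrawl(
    last_crawled: Dict[str, Optional[datetime]], now: datetime, max_age: timedelta
) -> List[str]:
    """URLs never crawled (None) or last crawled more than max_age ago, sorted for stable order."""
    return sorted(
        url
        for url, when in last_crawled.items()
        if when is None or now - when > max_age
    )


def crawl_parallel(
    urls: Iterable[str], fetch: Callable[[str], str], workers: int = 4
) -> Dict[str, Optional[str]]:
    """Fetch pages on a thread pool; a failed fetch maps to None so it can be retried later."""

    def safe_fetch(url: str) -> Optional[str]:
        try:
            return fetch(url)
        except Exception:  # one misbehaving domain shouldn't abort the whole crawl
            return None

    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = list(pool.map(safe_fetch, url_list))
    return dict(zip(url_list, pages))
```

How much of this to automate (for instance, running `due_for_recrawl` on a timer and re-indexing whatever changed) is exactly the open design question raised in the interview.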

GSoC Student Interview spotlight: Cross InterMine Search Tool + Aman Dwivedi

This is our blog series interviewing our 2018 Google Summer of Code students, who will be working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Aman Dwivedi, who will be working on the Cross-InterMine Search tool.

Hi Aman! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

Heya! I’m Aman Dwivedi, a final-year undergraduate student at Jabalpur Engineering College, India. I’m a web enthusiast and a JavaScript lover (JS is love <3). I have worked with two startups as a Full Stack Node.js Developer Intern, and I’m a proud member of the Mozilla Open Source Community (I have worked on the renowned Mozilla Firefox project). I have worked with many great programmers in the past and I’m extremely excited to work with the InterMine team.

What interested you about GSoC with InterMine?

I believe a good open source community comes from its members sharing ideas and helping each other throughout. The sign of a good team is a friendly, yet productive environment. The best thing about InterMine is its team and its proud contributors: everyone has a great helping attitude. The application phase was awesome, and I have never had such a great experience with any of the teams I’ve worked with before. Everyone is so enthusiastic about new features and new implementations all the time. One more brownie point is that my work here will affect a very large part of society (this is the most important motivating factor for me <3).

Tell us about the project you’re planning to do for InterMine this summer.

I will be working on the Cross InterMine Search Tool, which will be developed from scratch. It will use the InterMine APIs and the registry to fire concurrent requests to all the selected InterMines for a search query. The project will be developed in Vue.js. It should have a big impact, as there is currently no tool capable of searching multiple mines at once; it will make it much easier for InterMiners and researchers to search and browse through genomic data across all the InterMine instances.
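The core of a cross-mine search is a fan-out: send the same query to every selected mine concurrently, then collect whatever comes back. Aman’s tool is written in Vue.js, so this Python `asyncio` version only illustrates the pattern. The `search_all_mines` function is hypothetical, and the per-mine `search` coroutine is injected because which InterMine web service endpoint to call, and how to parse its response, isn’t specified here. A per-mine timeout keeps one slow or unavailable mine from holding up results from the others.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

# A search coroutine takes (mine service URL, search term) and returns a list of hits.
# Its implementation (endpoint, parsing) is deliberately left open in this sketch.
SearchFn = Callable[[str, str], Awaitable[List[Dict[str, Any]]]]


async def search_all_mines(
    mines: Dict[str, str], term: str, search: SearchFn, timeout: float = 10.0
) -> Dict[str, Dict[str, Any]]:
    """Query every mine concurrently; split the outcome into per-mine results and errors."""

    async def query_one(name: str, url: str):
        try:
            hits = await asyncio.wait_for(search(url, term), timeout)
            return name, hits, None
        except Exception as exc:  # includes asyncio.TimeoutError for slow mines
            return name, None, exc

    outcomes = await asyncio.gather(*(query_one(n, u) for n, u in mines.items()))
    return {
        "results": {name: hits for name, hits, err in outcomes if err is None},
        "errors": {name: err for name, _, err in outcomes if err is not None},
    }
```

Returning failures separately from results lets a UI show partial results straight away and flag the mines that didn’t respond, rather than failing the whole search.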

Are there any challenges you anticipate for your project? How do you plan to overcome them?

The most important thing in the development of an open source project is the community. I will need suggestions and user reviews from the community to make the project better; my first priority is always the community’s user experience. Suggestions will be really valuable throughout development, testing and documentation.