Internships Summer 2020: Closing thoughts and final presentations recording

Last week marked the end of the Outreachy May 2020 internship round, where we held a final presentations call for our interns, mentors and community. With attendees spanning time zones from New York to Singapore, no single time was optimal for everyone, so we extend our sincere gratitude to all the interns for being willing to attend the call!

We are thankful to a Wellcome Trust Diversity and Inclusion Grant for funding three of our interns, to Outreachy for matching this by funding two interns, and to our main Wellcome grant for making it possible to fund two more. This brings us up to seven interns, the highest number we’ve had the pleasure of working with at one time! This wouldn’t have been possible without help from our external mentors – Akshat Bhargava, Aman Dwivedi, Ankur Kumar, Asher Pasha and Nikhil Vats – who will all be receiving a small prize as a token of our gratitude.

During the internship period, we said goodbye to Yo Yehudi, a valued InterMine team member of five years, who drove our internship scheme to its current state and has left to become the Technical Lead for Open Source at the Wellcome Trust. Thanks to the efforts of our mentors in supporting our interns, and to Rachel for running the final presentations call, we were able to bring the internship period to a close that we can all be proud of.

A recording of the final presentation call is available on our YouTube channel and embedded below.

Additional information on our interns

Many of our interns have been writing blog posts throughout their internship, which you may find an enjoyable read:

The GitHub accounts of all our interns are listed below, if you wish to check out their contributions:

In closing

It’s been a joy to work with so many talented people, and this includes all the contributions during the Outreachy contribution period prior to intern selection. Many valuable contributions to InterMine projects were made during this period, and we regret we weren’t able to offer everyone an internship.

We hope the next year of internships will be as successful as this one, and look forward to coming up with more exciting internship projects, as well as working with more fantastic interns and mentors. Until then, let’s enjoy the fruits of this labour!

Outreachy Internship blog: It’s a Wrap

Hello! This blog is part of a series I am writing during my Outreachy 2020 summer internship with the InterMine Boot project in the InterMine organization.

This is the last week of the Outreachy internship. From filling in the initial application form on gut feeling, even though I thought I lacked the technical skills to apply, to picking the InterMine org and getting selected, the memories already seem a bit fuzzy.

If someone had asked me before applying to Outreachy whether I’d be able to get selected and successfully complete the internship, my answer would probably have been no.

In the mere span of a few months, I feel I have come quite a long way in terms of my confidence, my approach to new problems, and my skills. I’ll reflect on three things in this blog: my thoughts before and after the internship, how Outreachy helped me grow, and the progress of my project.

Apart from my little stint at Hacktoberfest 2019, I did not have any notable experience working with open source. I feel lucky to have landed in such an encouraging org. I worried about communicating during the project, as I have mostly communicated in my mother tongue, so the fear of misunderstanding accents, mispronouncing words or misforming sentences was there. But I never saw any judgement from anyone based on how correct my communication was; the only important thing is to express yourself. This really helped my self-confidence.

Before applying to Outreachy, I thought it was for people with solid open source experience and skills, who had worked on reasonably big projects and could carry a project forward completely independently. Now, I would say Outreachy is for anyone with basic skills who is trying to learn new things. The collaborative environment of open source will surprise you.

I think Outreachy has helped me grow in a multitude of ways. It has definitely boosted my self-confidence. It taught me how to work on collaborative projects, and that it is OK to ask for help when stuck and not feel bad about it. Among technical skills, I have learnt about Docker, docker-py and GitHub Actions, and picked up more Bash and Python along the way.

The concept of having a mentor (actually two mentors!) is great. I have a background in research, where most of the time I am on my own to move the project forward (code-wise) if stuck; discussions with the advisor are mainly about theory and experiments. I am very grateful to my mentors for being patient and guiding me during the internship.

I have definitely deviated from my initial project proposal. Some old tasks were removed, paving the way for new tasks picked up along the way. The main piece left untouched was setting up the wizard and configuration, because those components are not yet ready to be integrated. The task of running any mine (and not just mines based on biotestmine) would still require a bit of a tussle. So there are definitely many open tasks awaiting contributions.

I have got a taste of open source from Outreachy and hope to carry it forward. 

All good things come to an end eventually, but the next experience awaits

Anonymous

Outreachy Internship blog: After the internship

Hello! This blog is part of a series I am writing during my Outreachy 2020 summer internship with the InterMine Boot project in the InterMine organization.

In this blog, I reflect on my academic journey so far and what I would like to do after the Outreachy internship ends.

I did my schooling in Rajasthan, India. After appearing for the RPET (Rajasthan Pre Engineering Test), a popular entrance exam for engineering college admissions in Rajasthan, I was admitted to the Information Technology branch at a Govt. Women’s College in Rajasthan, one of the best women-only colleges in India.

I had worked on some projects in college, but exposure to diverse technologies was limited. I felt compelled to pursue a master’s degree, both to increase job opportunities and to dive deeper into one of the domains of computer science.

After completing my BTech I worked at a startup. In 2017 I was admitted to IIIT Hyderabad, India as a part-time MTech student in computer science. As my interest in research developed, I talked to different research advisors and converted to being a full-time MS student in data science under Prof. P. Krishna Reddy. My MS should tentatively finish around May/June 2021.

I am looking forward to joining industry in an engineering role. I am interested in both data science and software development. My college organizes campus placement sessions around December every year. After Outreachy, I am planning to polish my data structures and algorithms knowledge and sit for campus placements.

I feel that after coming to IIIT, I have gained a lot of skills. I have worked on projects in distributed systems and machine learning. I was head teaching assistant for the Distributed Systems course, managing assignment creation and paper evaluation for a class of around 150 students.

I am open to learning new skills to advance my career, and I am also open to remote jobs. I can communicate in Hindi and English.

Github: https://github.com/22PoojaGaur
Linkedin: https://www.linkedin.com/in/pooja-gaur-b5848048/

To everybody reading this: please feel free to contact me if you want to ask anything, or if you feel that I could be a part of your organization!

The only way to do great work is to love what you do. If you haven’t found it yet, keep looking. Don’t settle.

Steve Jobs

Outreachy Internship blog: Half Time

Hello! This blog is part of a series I am writing during my Outreachy 2020 summer internship with the InterMine Boot project in the InterMine organization.

I started the Outreachy internship with InterMine on 19th May 2020. As I write this blog, I have covered a little over half of this 3-month journey. Through this blog, I will look back on what has been done, which goals have been achieved, which goals have changed, and what remains for the coming half of Outreachy. Let’s get started!

I am working on the intermine_boot project in the InterMine organization. As part of the internship, the goal is to improve intermine_boot’s functionality for local instance setup and integrate it with a cloud API for other features. The intermine_boot project is at a very early stage of development. Let’s break down the tasks into a few big chunks:
– Implement versioned mine data uploading to cloud storage.
– Create a continuous integration setup.
– Add a wizard and configurator to get custom configuration for the project.
– Use all of the above to orchestrate Docker containers.

As with all projects, you cannot plan everything beforehand, and requirements change or emerge as the project evolves. Right from the start, one additional task was added: moving the intermine_boot project from using a docker-compose file to using docker-py for setting up the InterMine instance. Another extra piece of work was adding docker-intermine-gradle as a submodule of the project.

The wizard and configurator are part of the InterMine Cloud project. They are not yet at a state where we can integrate them with intermine_boot, so that task has been deferred and will be pursued if they reach a state where they can be integrated. In its place, my mentor has added other tasks needed to improve the usability of intermine_boot.

I have submitted 6 pull requests for the project, covering some of the tasks mentioned above. I am fairly happy with my progress. There were roadblocks which led to slower progress than I had initially assumed, but I think I am becoming better equipped to handle the coming half of the project.

I feel very happy that I got accepted to such an amazing community and got a great mentor!

Now, buckling up for the next half of the internship to get stuff done!!

Start by doing what’s necessary, then do what’s possible; and suddenly you are doing the impossible 

Saint Francis of Assisi

Outreachy Internship blog: A beginner’s guide to Intermine Boot

Hello! This blog is part of a series I am writing during my Outreachy 2020 summer internship with the InterMine Boot project in the InterMine organization.

Data is of paramount importance in research. In biological research domains, there are multiple research communities working on and generating new biological datasets for DNA, yeast, mice and so on. At the same time, there are many researchers who need to work with these datasets for their research projects.

One way to share data is to simply hand over archived datasets. In this case, there are numerous problems: how do you understand the data format? How do you clean the data in case of inconsistencies? How do you search through it, integrate it with other datasets, or store such huge data?

InterMine is a biological data warehouse that aims to resolve these issues and make accessing data easier for researchers. Once a dataset is added to an InterMine instance, users can perform complex queries over it to get the required information.

There are different InterMine instances for different types of data, such as FlyMine, YeastMine, HumanMine and WormMine. The InterMine project is open source, and it allows research organizations to set up InterMine instances dedicated to their datasets.

An InterMine instance provides both a web app and a web service, where you can host data and clients can make queries to get integrated biological data. Now that we have covered the basics, let’s move on to why the project I am working on becomes relevant!
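To make that concrete, here is a minimal sketch of how a query can be expressed as a web service call. The `/query/results` endpoint and the query XML shape follow the public InterMine API, but FlyMine and the gene symbol are purely illustrative, and we only build the URL here rather than send the request:

```python
from urllib.parse import urlencode

# Illustrative base URL of an InterMine web service (FlyMine as an example).
BASE = "https://www.flymine.org/flymine/service"

# A path query asking for the identifier and symbol of a gene, constrained
# by symbol. This XML format is how InterMine queries are serialised.
query_xml = (
    '<query model="genomic" view="Gene.primaryIdentifier Gene.symbol">'
    '<constraint path="Gene.symbol" op="=" value="zen"/>'
    '</query>'
)

# The query is URL-encoded and sent to the results endpoint; clients can
# request JSON, TSV and other formats.
params = urlencode({"query": query_xml, "format": "json"})
url = f"{BASE}/query/results?{params}"
print(url)
```

In practice you would use one of the InterMine client libraries (Python, R, Java, JavaScript) rather than constructing URLs by hand, but the underlying web service call looks like this.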

Setting up your own instance of InterMine is a time-consuming and complex process requiring a fair amount of Linux administration skills. We want to make this process easier so that people with very little programming knowledge can do it. The InterMine Cloud project attempts to solve this and lower the barrier to running an InterMine instance.

InterMine Cloud is composed of three main parts – wizard, configurator and compose. The wizard provides an easy way to set custom configuration for a new InterMine instance. The configurator is the backend of the wizard, creating the configuration files required to build the InterMine instance. Once an InterMine instance is built, compose handles deploying and managing it on the cloud.

At times, a user may want to set up an InterMine instance locally to see how the project will look, to make customizations that extend InterMine for different use cases, or to host the instance on their own servers. That’s where the InterMine Boot project comes in.

InterMine Boot is a command line tool which aims to let users easily set up local InterMine instances inside Docker containers, upload data archives to the cloud, and provide other convenience features for users.

Let’s understand the use case with an example. Suppose that, as an end user, you get interested in InterMine. You want to set up and host your InterMine instance on your servers. You dig into the documentation and start setting up PostgreSQL, Gradle, Perl, Solr and so on. Meanwhile, you are also polluting your system’s environment if you are not using Docker or another form of virtualization. intermine_boot aims to make this process as easy as running a few commands in the terminal. Below is a meme version to explain the benefits in a funny way!

You can find intermine_boot at https://github.com/intermine/intermine_boot and all InterMine org projects at https://github.com/intermine

That’s enough introduction to InterMine and InterMine Boot. Feel free to dive into the project now – we have a lot of interesting things going on!

If you can’t explain it simply, you don’t understand it well enough.

– Albert Einstein

Outreachy Internship blog: Everybody Struggles!

Hello! This blog is part of a series I am writing during my Outreachy 2020 summer internship with the InterMine Boot project in the InterMine organization.

No matter how experienced or novice a person is, everybody experiences struggle at some point in their journey. That statement seems easy enough for many people to accept. But when you are a beginner setting foot in the mammoth field of software development, it’s very difficult to acknowledge that even your mentors or other senior developers have ever struggled with basic problems like you do. This gap in acknowledgement creates an inferiority complex and makes your journey to the top much more difficult than it should be.

Today, I’ll share one such incident, where I was stuck on an issue for quite a long time just because I was hesitant to ask someone else. As you read on, feel free to ignore the technical jargon in the coming paragraphs if you don’t follow it; it’s not essential to the point I want to make. There are lots of similar situations out there.

I am in my third week of the internship with InterMine. I have been doing some form of coding for the past 4 years or more (mostly as part of my course curriculum), but I am still very much a beginner in most domains. To give some context to the following discussion: the intermine_boot project is a command line tool to ease the process of building InterMine instances. It fetches an already-built Docker image, or builds one if needed, and runs a Docker container with that image to get the InterMine instance running.

I was working on a task to modify the build file for a Docker image so that a new image is only built if a build folder does not already exist on the system. To test the changes, I had to run the intermine_boot command in such a way that a rebuild of the image was triggered, so I could see whether the changes were taking effect. My mentor, Kevin, gave me instructions on how to test this. The instructions, although clear, involved a number of steps, one of which wasn’t clear to me even after going through the explanation multiple times. The fear of asking a stupid question kicked in, and I thought I’d just carry on with whatever I understood.

So began my 16-hour journey of debugging my changes, modifying the code and testing the functionality. I followed the instructions and tested my build, and it failed (obviously, as I was missing that piece). I searched for the error online to no avail and landed on some Stack Overflow results. I tried to make the suggested changes without understanding them, which only resulted in other errors. Finally, I gave up and took a nap for the second time. After waking up, I was attaching the errors to a message to ask my mentor again. But, voila! As I started putting everything together to ask, I spotted the fix that might work, and it did. I realized that I had become frantic and had been trying lots of things without understanding them.

I took away the following lessons from this incident and consciously try to follow them.

  1. When you don’t understand what the other person has said, don’t just assume you will figure it out. Ask them to clarify; it will save you a lot of time.
  2. When stuck on an issue, you can become frantic and start trying random solutions. Take a small break or a nap and see the magic.
  3. Don’t code before understanding what you are trying to do. It’s a recipe for failure.

The Struggle you are in today is developing the Strength you need for tomorrow

– Robert Tew

Status update for BlueGenes

It’s been a while since we posted our last (rather optimistic) update around BlueGenes, so we thought we’d share a quick update, starting with the basics.

As a reminder, the long-term goal of BlueGenes is to replace the existing JSP-based UI with a more modern interface – one that works well with mobiles, one that hopefully responds more quickly and is easier to use, and perhaps most importantly, is easy to update and customise.

Some of the questions we’ve had in the last few months:

Q: Will BlueGenes replace the current JSP UI?

A: Yes, eventually. Once we reach an official beta/production release (we’re currently in alpha), we anticipate running them concurrently for a couple of years, but we will probably only provide small fixes for the JSP UI during this period, focusing most of our development effort on BlueGenes.

Q: Do I have to run my own BlueGenes, or can I use the central one at apps.intermine.org?

A: Since BlueGenes is powered purely by web services, it will probably be possible to run your InterMine as a server/API-only service and use BlueGenes at bluegenes.apps.intermine.org/. You can also run your own BlueGenes on your own servers and domains, allowing you to customise it so it’s suitable for your data, without having to rely on our uptime. Either (or both) should work fine. There will be some version requirements relating to which versions of InterMine can access all the features of BlueGenes – see the next point.

Q: What version of InterMine do I need to have to run BlueGenes?

A: BlueGenes will require a minimum version of InterMine to run. The original release of InterMine web services focused primarily on giving JSP users programmatic access to their data, and at the time there wasn’t an anticipated need for application-level services such as superuser actions. There are a few web services and authentication-layer services we still need to implement, so it’s likely BlueGenes will need API version 31 or higher in order to be fully featured. InterMines with API version 27 or higher can run a basic version of BlueGenes. You can check out this table to see if your InterMine is configured to work with BlueGenes.

Q: Ok, so what’s left to do before BlueGenes is released as a public beta?

A: Mostly authentication, superuser and MyMine features – things like saving and updating personal templates, sorting lists in folders, and updating preferences and passwords. Some of these features require updates to InterMine itself in order to work – hence the minimum version noted in the previous question. Once these are ready we’ll move to the public beta stage.

Your input here will be incredibly welcome, too – the more feedback we get early on, the more polished we hope BlueGenes can be.

Q: Will BlueGenes work nicely with HTTPS InterMines?

A: You will be able to run BlueGenes without HTTPS, but in order to avoid inadvertently exposing user passwords, the login button will only be available over HTTPS connections. We’re also working with a student over the next few months, to implement a pilot InterMine Single Sign On service. You can read about it in our interview with Rahul Yadav.

Q: Will I be able to customise the way BlueGenes looks?

A: Totally! There are two ways you can do this. One is to make sure you have your logo and colour settings configured in your web properties. We have a nice guide for that. This’ll tell us what your preferred highlight colours are – FlyMine is purple, HumanMine green, etc. If you’re really dedicated and would like to write your own CSS, you can do that too, if you’re running your own InterMine/BlueGenes combo.

Q: I have some nice custom visualisation tools in my InterMine. I don’t want to have to re-write them!

A: We don’t want you to re-write them either! It depends on how they’re implemented in your mine, but we’ve designed the BlueGenes Tool API with you in mind, and many JavaScript-powered tools will require only a few lines of code to become BlueGenes-ready.

As an example, the Cytoscape interaction viewer currently used in some InterMines requires only 20 lines of code to import into BlueGenes, plus a few lines of config – all the other files (and most of the config too) are boilerplate that we auto-generated.

Data integration and Machine Learning for drug target validation

Hi!

In this blog post I would like to give a brief overview of what I’m currently working on.

Knowledge Transfer Partnership: what & why?

First, to give context to this post: last year, InterMine at the University of Cambridge and STORM Therapeutics, a University of Cambridge spin-out developing small-molecule drugs that modulate RNA enzymes for the treatment of cancer, were awarded a Knowledge Transfer Partnership (KTP) by the UK Government (read this post for more information). The objective of this award is to help STORM Therapeutics advance their efforts in cancer research, and to contribute to their ultimate goal of drug target validation.

As part of the KTP award, a KTP Associate needs to be appointed jointly by the knowledge base (the University of Cambridge) and the company (STORM). The KTP Associate acts as the KTP project manager and is in charge of the successful delivery of the project. For this project, I was appointed as the KTP Associate, with a Research Software Engineer / Research Associate role at the University of Cambridge, for the total duration of the project: 3 years.

Machine learning and a new mine: StormMine

Now that you know what the KTP project is about, and who is delivering it, let’s move on to more interesting matters. To deliver this project successfully, the idea is to use the InterMine data warehouse to build a knowledge base for the company that enables their scientists to have all the data relevant to their research in a single, integrated place. To this end, several new data sources will be integrated into STORM’s deployment of the InterMine data warehouse (StormMine, from now on), and appropriate data visualizations will be added.

Then, once the data is integrated, we can move towards analysing it to gather insights that serve the company’s goals, such as applying statistical and machine learning methods to extract information from the data, as well as building computational intelligence models. This leads the way to what I’ve been working on since my start in February, and will continue until July 2019.

In general terms, I’m currently focused on building machine learning models that can learn to differentiate between known drug targets and non-targets from available biological data. This part of the work is going to be used as my Master’s thesis, which I will hopefully deliver in July! Moreover, with this analysis we will be able to answer three extremely relevant questions for STORM, which are the questions guiding the current work on the project. These questions are:

  1. Which are the most promising target genes for a cancer type?
  2. Which features are most informative in predicting novel targets?
  3. Given a gene, for which cancer types is it most relevant?

If you are interested in learning more about this work, stay tuned for the next posts, and don’t hesitate to contact me, either by email (ar989@cam.ac.uk) or on LinkedIn (click here)!


BlueGenes OAuth2 Authentication: Community feedback requested!

BlueGenes development is at the point where we need to store BlueGenes specific data to a database. This is an important step because it paves the way for customisation, branding, and tool configuration, and an enhanced My Data section to let users manage all of their InterMine assets.

There are a few architecture and design decisions that need to be made now, and be made correctly. In particular: OAuth2 Authentication. If you’re up to speed on how InterMine and BlueGenes authenticate then feel free to skip to the bottom.

Background

The current InterMine web application is a monolith. Users log in to the UI with a username and password, and their identity is stored in memory on the server (the “session”). When they perform a query or upgrade a list, the JSP code sends messages to the Java layer along with the user’s identity, which is used to retrieve data from the object store and user profile.

For example, when Sally views her list page today, the workflow looks something like:

Figure 1: today.png

Everything you see in InterMine today lives somewhere layered between the JSP Web App and the Object Store.

BlueGenes works differently. It communicates with the Java layer, object store, and user profile entirely through web services known as the InterMine API. No exceptions. This cleaves the dependency between the visual tools that we develop and the lower level operations of InterMine such as handling queries.

When Sally views her list page in BlueGenes, the workflow looks more like this:

Figure 2: tomorrow.png

BlueGenes lives in the browser, not on the server. InterMine’s web services respond with raw data about her lists in JSON format and BlueGenes renders the page in the browser. This is equivalent to running Python scripts in your console to fetch your lists, resolve IDs, perform a search, etc.

Web services (InterMine or otherwise) are stateless by design. They can’t tell if requests are made by a new user or a revisiting one. In order for a web service to authorise a user the request must contain some sort of secret token as seen in Figure 2. Like any good web application, InterMine provides web services for authenticating a user and retrieving their identity token which can be used in future requests rather than a username and password.
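As a hedged sketch of what that statelessness means in practice, here is how a script might fetch Sally’s lists: every request carries her API token (here as a query parameter, which the InterMine API accepts). The base URL and token value are placeholders, and we only build the request rather than send it:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Illustrative mine URL and a placeholder API token, not real credentials.
BASE = "https://www.flymine.org/flymine/service"
API_TOKEN = "sally-api-token"

# Because the web service keeps no session, the token must accompany
# every request that touches user data, such as the lists endpoint.
url = f"{BASE}/lists?{urlencode({'token': API_TOKEN, 'format': 'json'})}"
req = Request(url, headers={"Accept": "application/json"})

# urllib.request.urlopen(req) would return Sally's lists as JSON;
# here we only construct the request.
print(req.full_url)
```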

BlueGenes Authentication

Now it gets a bit trickier. BlueGenes has its own small web server to provide the actual javascript application, and it requires database access to store BlueGenes specific information such as additional MyMine data, tool config, etc. It really looks more like this:

Figure 3: blugenes_server.png

A user can authenticate using InterMine’s web services via the browser, but if they want to save user-specific data to BlueGenes’s database using BlueGenes’s web services, then they need to provide an identity. BlueGenes does not have access to the user profile directly, so the authentication request needs to be piped through the BlueGenes server.

Figure 4: auth.png

When Sally logs into BlueGenes she provides her username and password which is sent to the BlueGenes server rather than the InterMine server. If BlueGenes successfully authenticates as Sally then it sends her back her InterMine API token embedded in a signed JSON Web Token (JWT). All future requests between BlueGenes and InterMine will contain her API token, and all requests to the BlueGenes server will contain the signed JWT.
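To illustrate the JWT step, here is a stdlib-only sketch of signing and verifying an HS256 token (a real deployment would use a proper JWT library): the BlueGenes server signs a payload containing the user’s InterMine API token with a server-side secret, and checks the signature on later requests. All names and the secret below are made up:

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # JWTs use unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: bytes) -> str:
    # header.payload is signed with HMAC-SHA256 (the HS256 algorithm).
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    sig = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"

def verify_jwt(token: str, secret: bytes) -> dict:
    # Recompute the signature; reject the token if it doesn't match.
    signing_input, _, sig = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(b64url(expected), sig):
        raise ValueError("bad signature")
    payload_b64 = signing_input.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

secret = b"bluegenes-server-secret"  # illustrative only
token = sign_jwt({"sub": "sally", "api-token": "sally-api-token"}, secret)
claims = verify_jwt(token, secret)
```

Because only the BlueGenes server knows the secret, a tampered token (say, with a swapped API token) fails verification, which is what lets the server trust the identity embedded in each request without storing any session state.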

It sounds a bit complicated, but this only happens when logging in and remains hidden from the user. This configuration protects BlueGenes from storing passwords and doesn’t require direct access to the user profile.

The problem: OAuth2 Authentication

Logging into InterMine using your Google account uses the OAuth2 framework. For it to work you must configure Google’s developer console with a hardcoded URL that redirects users back to the application after they’ve authenticated. This redirection page is given a token that is exchanged by the servers for the user’s Google identity (email address and Google ID). We can do the same in BlueGenes:

  1. We put a Google Signin button in BlueGenes.
  2. Sally clicks it and is redirected to Google.
  3. Upon authentication Sally is sent back to BlueGenes with an authentication token.
  4. BlueGenes server exchanges the token for Sally’s Google ID.

So far so good. She can update her tool configurations and tags which are stored in the BlueGenes database.

Now Sally wants to save a list which is an action performed in InterMine, not BlueGenes. This requires an API token which she doesn’t yet have.

  • She can’t authenticate with InterMine using a username and password because she doesn’t have one (she’s a Google user).
  • She has no way of exchanging her Google ID with InterMine’s web services for an API token because InterMine has no way of trusting who she is. Anyone could access the end point and get a user’s API token if they knew their Google ID.
  • BlueGenes can’t fetch her API token from the user profile because it doesn’t have access (by design).

There are a few workaround solutions, but they couple BlueGenes to a single InterMine instance to varying degrees.

Solution 1: JWTs and sharing secrets

InterMine server gets a new end point that accepts a user ID and a JSON Web Token. The user’s API token is returned only if the signature on the JWT is valid.

Pain point: Both BlueGenes server and InterMine server will need matching secret keys. A third party cannot host their own BlueGenes and point it at a remote mine while supporting OAuth2 without knowing that mine’s secret key (aka access to all accounts).

InterMine admins could potentially whitelist third-party instances of BlueGenes by generating secret keys for them, but this would be an active process of curation and would still give third parties full access to all Google accounts.

Solution 2: Shared database

BlueGenes accesses the user profile directly.

Pain point: This requires database access, which entirely rules out remote instances of BlueGenes.

Solution 3: Double Login

InterMine has a URL redirect for Google authentication. It accepts a URL of a BlueGenes instance and generates a link with an embedded API key.

  1. A user clicks Google Login on BlueGenes and is redirected to Google
  2. After authenticating the user is redirected back to the BlueGenes server.
  3. BlueGenes generates a JWT containing the user’s identity.
  4. A mandatory button is then shown to “Authorise My Account to use Remote Data Sources” (which means InterMine server).
  5. Clicking the button sends the user to a /service/google-auth end point on the remote mine with a return_to parameter containing the URL of BlueGenes.
  6. The return_to parameter is stored in the session and the user is sent back to Google Login where they authorise for the second time.
  7. After authenticating the user is redirected to an InterMine /service/google-auth-redirect end point.
  8. The /service/google-auth-redirect page automatically redirects the user back to the BlueGenes URL stored in the session, with the API token as a parameter.

A workflow would look something like this:

[Image: solution3.png (Solution 3 workflow)]

There are quite a few steps, but steps 5+ are automatic.
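The redirect plumbing behind steps 5 and 8 is mostly careful URL construction. A rough sketch follows; the end point path and return_to parameter come from the proposal above, while the api_token parameter name, the helper functions and the example hostnames are assumptions:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_auth_url(mine_base: str, bluegenes_url: str) -> str:
    # Step 5: send the user to the mine's google-auth end point,
    # telling it where to send them back afterwards.
    return f"{mine_base}/service/google-auth?" + urlencode({"return_to": bluegenes_url})


def build_redirect_back(return_to: str, api_token: str) -> str:
    # Step 8: the mine redirects the user back to the BlueGenes URL
    # stored in the session, attaching the freshly minted API token.
    # (Naively assumes return_to has no query string of its own.)
    return f"{return_to}?" + urlencode({"api_token": api_token})


def extract_token(redirect_url: str) -> str:
    # BlueGenes reads the token out of the redirect URL's query string.
    return parse_qs(urlsplit(redirect_url).query)["api_token"][0]
```

Because everything travels in URLs, a remote BlueGenes never needs the mine's secrets, which is what keeps the two servers decoupled.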

Pain point: Users will have to authenticate twice the first time they log in to BlueGenes, but we can make this as painless as possible. Also, if an admin is running both InterMine server and BlueGenes server, they'll need two OAuth2 projects in their Google developer console (also a one-time activity).

Solution 4: Outsource

We use a third-party single sign-on vendor such as Auth0 (https://auth0.com/).

Pain point: We can’t guarantee that InterMine admins will remain within the Terms of Service of the vendor’s free offering for open source projects, and the paid tiers are very expensive.

Solution 3 seems to be the most feasible and keeps InterMine and BlueGenes completely decoupled. (Thanks, Yo!)

Does anyone feel strongly about a particular solution, or have other advice for bridging the OAuth2 gap? Feel free to leave a comment or join the discussion on our mailing list (subscribe here: https://lists.intermine.org/mailman/listinfo/dev).

California Dreaming: InterMine Dev Conf 2017 Report – Day 1

2017’s developer conference has been and gone; time to pay my dues in a blog post or two.

Day 0: Welcome dinner, 29 March 2017

The Cambridge InterMine arrived at Walnut Creek without a hitch, and after a jetlagged attempt at a night’s sleep we sat down to a mega-grant-writing session in the hotel lobby, fuelled by several pots of coffee and plates of nachos.

By 7PM, people had begun to gather in the lobby to head to the inaugural conference dinner at the delicious Walnut Creek Yacht Club. We had to change the venue quite late in the game, so we wandered down the street to collect some of the InterMiners who had ended up at the original venue (sorry!!). By the end of the meal, most of the UK contingent was dead on their feet – 10pm California time worked out to be 6am according to our body clocks, so when Joe offered to give several of us a lift back to the hotel, it was impossible to decline.


Day 1: Workshop Intro

The day started with intros from our PI, Gos, and our host, David Goodstein. 

Josh and I followed up by introducing BlueGenes, the UI we’ve been working on to replace InterMine’s older JSP-based UI. You can view Josh’s slide deck, try out a live demo, or browse the source on GitHub.

Next came one of my favourite parts: short talks from InterMiners.

Short community talks

Doppelgangers – Joel Richardson, MGI

Joel gave a great presentation about Doppelgangers in InterMine – that is, occasionally, depending on your data sets and config, you can end up with duplicate or strange / incomplete InterMine objects in your mine. He followed up with explanations of the root causes and mitigation methods – a great resource for any InterMiner working on data source integration!

Genetic data in Mines – Sam Hokin, NCGR/LegFed

Next up was Sam’s talk about his various beany mines, including CowpeaMine, which has only genetics data, rather than the more typical InterMine genomic data. He’s also implemented several custom data visualisations on gene report pages – check out the slides or mines for more details.

JBrowse and Inter-mine communication – Vivek Krishnakumar, JCVI

Vivek focused on some great cross-InterMine collaborations (slides here), including the technical challenges integrating JBrowse into InterMine, as well as a method to link to other InterMines using synteny rather than InterMine’s typical homology approach.

InterMine at JGI – Joe Carlson, Phytozome, JGI

Joe has the privilege of running the biggest InterMine, currently covering 72 data sets across 69 organisms. Compared to most InterMines, this is massive! Unsurprisingly, this scale comes with a few hitches many of the other mines don’t encounter. Joe’s slides give a great overview of the problems you might encounter in a large-scale InterMine and their solutions.

Afternoon sessions

FAIR and the semantic web – Daniela & Justin

After a yummy lunch at a nearby cafe, Justin introduced the concept of FAIR, and discussed InterMine’s plans for a FAIRer future (slides). Discussion topics included:

  • How to make stable URIs (InterMine object IDs are transient and will change between builds)
  • Enhanced embedded metadata in webpages and query results (data provenance, licencing)
  • Better Findability (the F in FAIR) by registering InterMine resources with external registries
  • RDF generation / SPARQL querying

This was followed up by Daniela’s introduction to RDF and SPARQL, which provided a great basic intro to the two concepts in an easily understood manner. I really loved these slides, and I reckon they’d be a good introduction for anyone interested in learning more about what RDF and SPARQL are, whether or not you’re interested in InterMine.

Extending the InterMine Core Data Model – Sergio

Sergio ran the final session, “Extending the InterMine Core Data Model”. Shared models allow for easier cross-InterMine queries, as demoed in the GO tool prototype.

This discussion raised several interesting talking points:

  • Should model extensions be created via community RFC?
  • If so, who is involved? Developers, community members, curators, other?
  • Homologue or homolog? Who knew a simple “ue” could cause incompatibility problems? Most InterMines use the “ue” variation, with the exception of PhytoMine. An answer to this problem was presented in the “friendly mine” section of Vivek’s talk earlier in the day.

Another great output was Siddartha Basu’s gist on setting up InterMine – outlining some pain points and noting the good bits.

Most of us met up for dinner afterwards at Kevin’s Noodle House – highly recommended for meat eaters, less so for veggies.