Looking ahead: InterMine+Google Summer of Code 2018. Could you be a mentor?

2017 is coming to an end, and I have to say it’s been a fabulous one! I’ll probably post a “cool things InterMine did this year” round-up in a week or two – but in the meantime, here’s my final Google Summer of Code blog for you all! We’ll cover the InterMine swag just sent out across the globe, as well as plans for next year – and how you can help out.

Thank-you gifts for mentors and students

Last week, we posted care packages to all our GSoC mentors and summer students, in the form of t-shirts, stickers, and pens. The postal-service-wrinkled shirt shown above is the women’s fit shirt printed on black; unisex shirts are a slightly lighter grey colour. If you filled out the swag survey when it was sent to you, your gift should be with you soon! Tweet us your images of the items in use for extra InterMine Cool Points 😎.

GSoC 2018 – call for project ideas and mentors!

In early 2017, we put together an ideas list for GSoC projects; InterMine’s projects are numbers 3 to 9. If you want a better idea of what it’s like to apply (or to be a mentor), read our application guidance from last year.

Do you have a nifty idea, or an InterMine itch you’d like to scratch?

Please share it with us! Add it to our 2018 Google Summer of Code ideas list, or if you’d like to sound things out and discuss them a little first, comment on the GitHub issue or email the dev list. You can even propose several ideas, if you like! Please add all ideas by the end of the 14th of December (the end of this week).

Would you like to try mentoring?

Fancy a chance to earn some nifty exclusive swag like that pictured above? Add your name as a possible mentor to an existing idea (or your own new idea). You can always drop us a line if you want to discuss things first. We like projects to have more than one mentor if possible.

Maybe you’re a student thinking of GSoC?

Awesome! If you have your own InterMine project idea (whether it’s brand new or you’ve already started it), or if one of the ideas on our ideas list lights your fire, it’s not too early to start talking with potential mentors about it. The application guidance we mentioned above would be a good read, too.

 

 


Community Outreach: What we’re up to & how you can participate

A large part of working in open source and science is sharing what you do with others – it’s not just about code and papers. We have quite a bit going on and coming up that we’d like to share and get your ideas about.

Community outreach calls

We’ll be trialling a community outreach call on December 7th at 5PM GMT. It takes the slot our normal developer call would usually occupy, but focuses specifically on community members and on ways to communicate with them and help them out. It will not focus on technical issues or code.

Developers are still entirely welcome to come along, but please encourage your curators, enthusiastic users, and outreach people to come along too! Agenda

Open outreach repo on GitHub

We’ve created a GitHub repository dedicated to outreach-related topics. The idea is to take discussions about what we’re doing out into the open, so others can chime in and/or re-use our work. Examples include:

Science Festival – March 2018

We’ll be participating in the Cambridge Science Festival, teaching about better data enabling better science. The basic idea is to teach this through gameplay with puzzles, rewarded with candy and stickers. Do you have kids who might be willing to playtest our ideas? Let us know!

Webinars and tutorials

We’ve run in-person workshops and a developer workshop; this time we’d like to try something online! What formats interest you / your users the most?

  • A series of short 5-minute-ish webinars covering various topics
  • A longer training session, covering querying InterMine via website and/or API? Perl, Python or R?
  • Other? Share your feelings in a comment, contact us, or add to the GitHub issue
  • Maybe you’d like to volunteer to run one!

Google Summer of Code

Do you have an idea for a fun InterMine project that would only take a couple of months? Or maybe you would like to mentor a project over the summer? We had a great time during GSoC this year, and we’re planning to apply to do it again next year. Interested? More info on GitHub.

Rachel’s world tour of the UK

As part of the upcoming ISA-InterMine cloud grant, Rachel will be visiting bioinformatics cores and labs to try and solicit use-cases from people who are working with biological data right at the front. Want to help out or invite us to your lab? Get in touch.

Guest blogging

Come tell our followers about the awesome InterMine thing you just did. A conference? A talk? A new feature or exciting dataset in your mine? We’d love to be the platform for your voice!

 

 

Talks and Workshops: Sharing our materials for re-use

Would you like to grab some ready-made slides or InterMine training workshop materials? We’ve rounded up some recent materials from talks and workshops. Feel free to remix them for your own talks and outreach efforts. If you do use them, we’d love to see the result!

Slides

You should have permissions to make a copy; if not, please contact us / tweet us / pop by chat to poke us with a stick.

3-min lightning talk at GSoC Mentor Summit: Citable version on Figshare | Google Drive (editable) version

Better Science Through Better Data: Citable version on Figshare | Google Drive (editable) version | The featured image above was live-scribed during the talk. Licence is CC-BY from Springer Nature, and the image is available from https://figshare.com/articles/Better_Science_through_Better_Data_2017_scidata17_scibe_images/5558653

Blank InterMine-branded slides: Get ’em here.

Posters

BlueGenes Poster: This poster was presented at BOSC 2017. Citable version on F1000 | Inkscape editable version (download Inkscape here: https://inkscape.org/en/release/0.92.2/)

InterMine Poster for Elixir UK All Hands 2017: PDF version | Inkscape editable version 

Workshop learning materials

We run an InterMine training workshop every term, covering the basics of using the webapp, as well as discussing how to draw data from the API. If you’re near Cambridge, keep your eyes open on the blog or twitter feed, as we’ll always announce them well in advance.

Workshop training materials in PDF: Workshop Exercises (handouts with answers) | Workshop slides. Note that these exercises were all correct with data from HumanMine in October 2017. Result counts may change as we add or update data sources in the future, but apart from those counts the materials should remain largely correct.

You can download the original OpenOffice files as well if you’d like to adapt the materials for your own workshops, or feel free to contact us if you’d like to coordinate some training with us.

Side note: We’re also delivering a half-day workshop training session as part of the EBI’s 4-day Introduction to Multiomics Data Integration course – applications are open now until 01 December 2017.

Refs:

Data, Scientific (2017): Better Science through Better Data 2017 (#scidata17) scribe images. figshare. https://doi.org/10.6084/m9.figshare.5558653.v1 (retrieved 15:48, Nov 06, 2017, GMT)

InterMine 2017 Fall Workshop – Biological Data Analysis using InterMine

The University of Cambridge is hosting an InterMine workshop on 27 October 2017.

The course is aimed at bench biologists and bioinformaticians who need to analyse their own data against large biological datasets, or who need to search against several biological datasets to gain knowledge of a gene/gene set, biological process or function. The exercises will mainly use the fly, human and mouse databases, but the course is applicable to anyone working with data for which an InterMine database is available.

The workshop is composed of two parts:

Part 1 (2.5 – 3 hours) will introduce participants to all aspects of the user interface, starting with some simple exercises and building up to more complex analysis encompassing several analysis tools and comparative analysis across organisms. No previous experience is necessary for this part of the workshop.

The following features of the InterMine web interface will be covered:

  • Search interfaces and advanced query builder
  • Automated analysis of sets, e.g. gene sets, including enrichment statistics
  • Analysis workflows
  • Tools for cross-organism analysis between InterMine databases
  • Web services

Part 2 (1 hour) will focus on the InterMine API and introduce running InterMine searches through Python and Perl scripts. While complete beginners are welcome, some basic knowledge of Perl and/or Python would be an advantage. The InterMineR package will also be introduced. Those not interested in this part of the workshop are welcome to leave; alternatively, a more advanced exercise using the web interface will be available.

See here for details: https://www.gen.cam.ac.uk/events/intermine-training

 

 

InterMine 2.0 – Summer update

InterMine 2.0 is a large, disruptive release scheduled for this autumn, before the Xmas holidays.

There are lots of exciting features, but they will require InterMine maintainers to update their mines. Usually devs are able to update their mines with a simple git pull. In this case, they’ll have to take specific actions to make sure their software is up to date.

Model changes

Several changes and additions to the core InterMine data model were discussed and approved by the community. See here for specific details on the new core data model.

This means that it’s likely that an InterMine 2.0 webapp will require a database built by InterMine 2.0 code.

BlueGenes

InterMine 2.0 will come with detailed instructions on how to deploy the new InterMine user interface.

Come to the next InterMine community call to see a demo of the latest features!

Gradle

We’ve got a new software build system in the works. This will change the commands you use to build a data source and deploy your webapp. See a previous blog post for details.

Closer to the time, we’ll release detailed instructions on how to update your build system to work with the new tools. And as always, the InterMine team will be on hand to answer questions and help with any issues on the community calls, the dev list and chat.

We hope to make the transition as easy as possible!

Software Dependencies

All software dependencies will need to be updated to the following versions:

  • Java 8
  • Tomcat 8.5.x
  • Postgres 9.4+

API Changes

We are making some non-backwards-compatible changes to our API.

/user/queries will be moved to /queries

The following three endpoints have a parameter called xml, which holds the XML query. We are going to rename this parameter to query (as we now accept JSON queries!) to match the syntax of all the other endpoints:

/query/upload
/template/upload
/user/queries (POST)
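For anyone with client scripts that call these endpoints, the change could be handled along the following lines. This is only a rough sketch based on the description above, not InterMine client code: the helper name build_request and its intermine_2 switch are made up for illustration, and we assume here that /user/queries (POST) picks up both the path move and the parameter rename.

```python
# Endpoints listed above whose `xml` parameter is being renamed to `query`.
RENAMED_PARAM_ENDPOINTS = {"/query/upload", "/template/upload", "/user/queries"}

# Path move described above: /user/queries -> /queries.
MOVED_PATHS = {"/user/queries": "/queries"}


def build_request(path, query_text, intermine_2=True):
    """Return (path, params) for one of the affected endpoints.

    `build_request` and `intermine_2` are hypothetical names for this
    sketch; they are not part of any InterMine library.
    """
    if path not in RENAMED_PARAM_ENDPOINTS:
        raise ValueError(f"{path} is not one of the affected endpoints")
    if intermine_2:
        # New behaviour: moved path (if any) and the `query` parameter.
        return MOVED_PATHS.get(path, path), {"query": query_text}
    # Old behaviour: original path and the `xml` parameter.
    return path, {"xml": query_text}


print(build_request("/user/queries", "<query/>"))
print(build_request("/query/upload", "<query/>", intermine_2=False))
```

Keeping the path and parameter name in one place like this means a script only needs a single switch flipped once its mine has been upgraded.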

If this update is going to cause you any trouble at all, please let us know ASAP!

 

If you have any questions or concerns about any of these changes, please contact us or come along to the community calls.

 

 

 

Toxygates: exposing toxicogenomics datasets and linking with InterMine

This is a guest post from our colleague Johan Nyström-Persson, who works with Toxygates and the NIBIOHN in Japan.

Toxygates (http://toxygates.nibiohn.go.jp) has been developed as a user-friendly toxicogenomics analysis platform at the Mizuguchi Lab, National Institutes of Biomedical Innovation, Health and Nutrition (NIBIOHN) in Osaka since 2012. The first public release was in 2013. At that time, the main focus of Toxygates was exposing the Open TG-GATEs dataset, a large, systematically organised toxicogenomics dataset compiled over more than a decade by the Japanese Toxicogenomics Project (http://toxico.nibiohn.go.jp). This dataset consists of over 24,000 microarray samples. To make use of such a large dataset without time-consuming data manipulation and programming, it is necessary to have a rich user interface and access to many kinds of secondary data.

Toxygates allows anyone with a web browser to explore and analyse this data in context. Various kinds of filtering and statistical testing are available, allowing users to discover and refine gene sets of interest, with respect to particular compounds. For a reasonably sized data selection, hierarchical clustering and heat-maps can be displayed directly in the browser. Through TargetMine (http://targetmine.nibiohn.go.jp) integration (based on the InterMine framework), enrichment of various kinds is possible. Compounds can also be ranked according to how they influence genes of interest.

To support all of these functions, we came up with the concept of a “hybrid” data model which recognises that, while gene expression values by themselves may be viewed as a large matrix with a flat structure, secondary annotations of genes and samples, such as proteins, pathways, GO terms or pathological findings, have an open-ended structure. Thus, we combine an efficient key-value store (for gene expressions) with RDF and linked data (for gene and sample annotations) to allow for both high performance and a flexible data structure.
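To give a feel for the hybrid idea, here is a toy sketch in Python. It is not Toxygates code: a plain dict stands in for the key-value store, a list of (subject, predicate, object) tuples stands in for RDF, and every identifier (probe_1, sample_A, inPathway and so on) is an invented placeholder.

```python
# Flat part: expression values keyed by (probe, sample), like a key-value store.
expression = {
    ("probe_1", "sample_A"): 2.5,
    ("probe_1", "sample_B"): 0.4,
    ("probe_2", "sample_A"): 1.1,
}

# Open-ended part: annotations as (subject, predicate, object) triples.
# New kinds of annotation can be added without changing any schema.
annotations = [
    ("probe_1", "inPathway", "pathway_X"),
    ("probe_2", "inPathway", "pathway_Y"),
    ("probe_1", "hasGOTerm", "go_term_1"),
    ("sample_A", "hasFinding", "finding_1"),
]


def subjects_with(triples, predicate, obj):
    """Return every subject linked to `obj` by `predicate`."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def expression_for(values, probes, sample):
    """Look up expression values for the given probes in one sample."""
    return {
        probe: values[(probe, sample)]
        for probe in sorted(probes)
        if (probe, sample) in values
    }


# Use the annotations to pick probes, then the flat store to fetch values.
pathway_probes = subjects_with(annotations, "inPathway", "pathway_X")
result = expression_for(expression, pathway_probes, "sample_A")
print(result)  # {'probe_1': 2.5}
```

The flexible side answers "which probes are interesting?", and the flat side answers "what are their values?" with a direct key lookup.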

Today, the project continues to evolve in new directions as a general transcriptomics data analysis platform. We have integrated Toxygates not only with TargetMine, but also with HumanMine, RatMine and MouseMine. Users can now also upload their own transcriptomics data and analyse it in context alongside Open TG-GATEs data. We may also add more datasets in the future.

The current project members are Kenji Mizuguchi (project leader) and Chen Yi-An (NIBIOHN), Johan Nyström-Persson and Yuji Kosugi (Level Five), and Yayoi Natsume-Kitatani and Yoshinobu Igarashi (NIBIOHN).

InterMine 2.0 – Gradle

NB: To upgrade to InterMine 2.0 you must not have custom code in the core InterMine repository.

We have been planning out the tasks for the future of InterMine, and there are a lot of exciting projects on the horizon: making InterMine more FAIR, putting InterMine in Docker and the cloud, our beautiful new user interface, Semantic Web and so on.

However, a prerequisite for these exciting features is to update our build system. We are still using Ant, and the build has grown, let’s say, “organically” over the years, making updates and maintenance expensive and tedious.

After careful consideration, and after looking very seriously at other build and dependency management systems, we’ve decided on Gradle. Gradle is hugely popular with a great community, and it’s used by such projects as Android, Spring and Hibernate. We were really impressed with Gradle’s power and flexibility: being able to run scripts in Gradle will give us the power we need to accomplish all our lofty goals.

Our goals for moving to Gradle

Managed dependencies

Our dependencies are currently managed manually: if we need a third-party library, we copy the JAR by hand into our /lib directory. This is unsustainable for modern software and has resulted in lots of duplication and general heartache. With Gradle we can instead fetch dependencies automatically from online repositories.

A smaller repository

Implementing Gradle will allow us to replace many of our custom Ant-based facilities with Gradle infrastructure and widely-supported plugins. Our codebase will become smaller and more maintainable as a result.

A faster build

Currently, due to the way that InterMine implemented a custom project dependency system in Ant, every InterMine JAR is compiled on every build and every time a webapp is deployed. This is unnecessary and wastes developer time. We will use Gradle’s sophisticated dependency management system to make the InterMine build more robust and efficient.
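The general idea behind a dependency-aware build can be sketched in a few lines of Python. This is purely an illustration of the concept, not how Gradle is implemented, and the module names below are made up rather than taken from the InterMine codebase: given which modules depend on which, only the modules that changed, plus anything that depends on them, need rebuilding.

```python
from collections import deque

# Hypothetical module graph: each module maps to the modules it depends on.
DEPENDS_ON = {
    "core": [],
    "model": ["core"],
    "bio": ["model"],
    "webapp": ["model"],
}


def modules_to_rebuild(depends_on, changed):
    """Return the changed modules plus everything that depends on them."""
    # Invert the graph so we can walk from a module to its dependents.
    dependents = {name: [] for name in depends_on}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(module)

    stale = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in stale:
                stale.add(dependent)
                queue.append(dependent)
    return stale


print(sorted(modules_to_rebuild(DEPENDS_ON, ["bio"])))    # ['bio']
print(sorted(modules_to_rebuild(DEPENDS_ON, ["model"])))  # ['bio', 'model', 'webapp']
```

In this toy graph, a change to a leaf module such as bio rebuilds only that module, whereas today’s Ant setup would recompile everything.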

Maintainable, extensible, documented

The current Ant-based InterMine build system has been extended over the years as needed in an ad hoc manner, and unfortunately no documentation exists. Adding a new Ant task is a challenge, and debugging the current build process is time-consuming and difficult. Moving to Gradle will base InterMine on a well-maintained, extensible, documented and widely used build system.

Simpler to run test suite

Currently, developers have to create property files and databases to run the full system tests, and these steps are not straightforward. With Gradle’s help we hope to make this much easier, so that the wider InterMine community can benefit from running the InterMine test suite on their installations and code patches.

Simplicity

Finally, under Gradle conventions the tests live in the same project as the main code, so the number of separate projects will be cut in half. In addition, the tests will be run automatically when building.

As an example, here is the standard Gradle directory layout we will be adopting:

src/main/java
src/main/resources
src/test/java
src/test/resources

Currently our main and test projects are in different packages but in InterMine 2.0 these will be unified under single projects, as per standard practice.

What does this mean for you and your InterMine?

If you are currently maintaining an InterMine, moving to InterMine 2.0 is going to require a bit of effort on your part.

Operationally, commands such as database building and web application publishing are very likely to use Gradle commands rather than Ant targets or custom scripts. Users who have scripts to manage InterMine installations will need to adjust them appropriately. This shouldn’t require too much work.

InterMine users who have custom projects in the bio/sources directory to load data sources will need to make more adjustments. Project structures in InterMine 2.0 will not be the same as in earlier versions, since they will follow Gradle conventions rather than custom InterMine ones. However, the changes will not be major and we will provide a script to do as much automatic updating of custom sources as possible.

The greatest migration work will come for the most sophisticated operators who have directly patched core InterMine code. In this case, there are two options. Firstly, they can continue to patch and build core InterMine JARs themselves, though they will need to make adjustments for the Gradle build process. Secondly, we can work with them to add new configuration parameters to core InterMine to make such patching unnecessary, wherever possible. In both cases work will be required but the effort should not be large, since it is largely the structure of code that is changing rather than core logic or functionality.

 

This is a significant transition but one that should put InterMine on a solid base that lowers long-term maintenance costs and makes lots of exciting stuff possible in the future. As ever, please contact us if you have any concerns and we look forward to discussing this and any other subjects on community calls, blog comments, in our Discord chat and on the mailing list!