GSoC Student Interview spotlight: Natural Language to InterMine Queries + Jake Macneal

This is our blog series interviewing our 2018 Google Summer of Code students, who will be working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Jake Macneal, who will be working on converting natural language phrases to InterMine PathQuery.

Hi Jake! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

Excited to be joining the team! I’m an undergraduate studying computer engineering at McGill University in Montreal, about to enter my final semester in the fall. I’m originally from Philadelphia in the US, and I’ll be hopping around North America a little bit this summer (currently in Toronto). I’ve got a passion for robotics and artificial intelligence, which led to me joining my university’s robotics team to help design and build a Mars rover. Additionally, I’ve had the opportunity to intern at NASA Johnson Space Center in Texas, where I worked on a project which uses machine learning to track sensors around the space station (hopefully it’ll be put into use soon).

Aside from those technical interests, I enjoy soccer/football (both playing and watching), classical guitar, analog synthesizers (just getting into this but they’re really fun and fascinating), and the field of space exploration. Part of me is still holding on to the hope of becoming an astronaut some day.

What interested you about GSoC with InterMine?

I searched the GSoC organizations page for projects looking for a Clojure developer, and this was unsurprisingly one of the only ones. However, a language is hardly enough motivation to become passionate about a project. I’ve never had the chance to work in bioinformatics, but I did have a beloved computer science professor (Matthieu Blanchette) whose research was in that field, and he often spoke during lectures about his research. When I read through the organization and task descriptions I immediately thought of him, and knew that this would be a cool project to join. Nothing is more rewarding to me than the thought of using software as a tool to help others do good.

Tell us about the project you’re planning to do for InterMine this summer.

InterMine uses a graph query language (PathQuery) to retrieve information from the database. My project is to implement a more user-friendly alternative, allowing non-technical users to interact with an InterMine database without the need for esoteric queries crafted by an experienced programmer or system administrator. This will take the form of a natural language to PathQuery translation tool, written as a Clojure library. In addition, I’ll be building a proof-of-concept interface allowing novice users to submit English queries which will be translated and then submitted to the query engine. This simple app will be integrated with the InterMine web app, similar to the graphical query builder.

Are there any challenges you anticipate for your project? How do you plan to overcome them?

Natural language processing is a difficult field, and from working on a compiler, I learned that the key to building a correct parser and code generator is a huge number of test cases. Fortunately, the basic principle behind testing a translation tool is simple: assemble a set of English queries, along with the intended output (in the form of a PathQuery string). However, actually assembling such a set of tests which are useful and demonstrate realistic/important queries requires interacting with actual users in the community. I hope to spend much of my initial weeks working with the community to figure out the syntax they’d like to see supported, as well as the types of queries already being written in PathQuery.
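
The testing principle described above can be sketched as a table of English queries paired with their intended output. This is only a minimal illustration, not the project's actual design: the `translate` function is a hypothetical toy with a single rule, the query is represented as a simplified dictionary rather than real PathQuery syntax, and the class and field names (`Gene`, `primaryIdentifier`, `organism.name`) are assumptions chosen for the example.

```python
import re

# Hypothetical toy translator: understands only
# "show me all <class>s from <organism>".
# The returned dict is a simplified stand-in for a PathQuery.
def translate(english):
    match = re.fullmatch(r"show me all (\w+?)s from (.+)",
                         english.strip(), flags=re.IGNORECASE)
    if match is None:
        return None  # phrase not understood
    root = match.group(1).capitalize()
    return {
        "from": root,
        "select": [f"{root}.primaryIdentifier"],
        "where": [(f"{root}.organism.name", "=", match.group(2))],
    }

# Each test case pairs an English query with the intended output.
TEST_CASES = [
    ("Show me all genes from Drosophila melanogaster",
     {"from": "Gene",
      "select": ["Gene.primaryIdentifier"],
      "where": [("Gene.organism.name", "=", "Drosophila melanogaster")]}),
    ("Show me all proteins from Homo sapiens",
     {"from": "Protein",
      "select": ["Protein.primaryIdentifier"],
      "where": [("Protein.organism.name", "=", "Homo sapiens")]}),
    # Phrases outside the supported syntax should be rejected, not guessed.
    ("What is the weather today", None),
]

def run_tests(cases):
    """Return the (query, expected, actual) triples that did not match."""
    failures = []
    for english, expected in cases:
        actual = translate(english)
        if actual != expected:
            failures.append((english, expected, actual))
    return failures

failures = run_tests(TEST_CASES)
```

An empty `failures` list means every query translated as intended. In practice the table would be filled with queries collected from the community, and each new supported phrasing would arrive together with its test case.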

Share a meme or gif that represents your project

[Image: jake-meme]

GSoC Student Interview spotlight: Buzzbang Bioschemas search + Ankit Lohani

This is our blog series interviewing our 2018 Google Summer of Code students, who will be working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Ankit Kumar Lohani, who will be working on Buzzbang.

Hi Ankit! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

Hello InterMiners, I’m Ankit Lohani, a final year undergraduate student at the Indian Institute of Technology, Kharagpur, India. I will be completing my undergraduate studies in a few months with my major in Chemical Engineering.

Right from my first year, I have been interested in robotics and programming and my interest in this field has only grown with time. Initially, I spent over a year working on the hardware front and then shifted to making path planners for our soccer-playing bots. Though my academic background has been completely different, it has only pushed me forward to work harder and to learn more. My interests are inclined towards natural language processing and information retrieval.

Apart from these, I love travelling and trekking. I am also planning to complete my 3rd trek this summer, this time above 14,000 feet.

What interested you about GSoC with InterMine?

I have never worked on an open source project, and I realized that GSoC is the best place to start learning and to see my work in use. Honestly, while looking for organisations to which I might be able to contribute, I came across InterMine and the various projects listed here. The application domain of InterMine is very appealing, and I could relate to this organisation for two key reasons. First, my past internship was on information retrieval on clinicaltrials.gov data, where I touched upon topics such as semantics, ontologies, UMLS (Unified Medical Language System), PubMed, and Named-Entity Recognition for biological terms. Second, the technologies used in this project, such as Solr, Elasticsearch, and Docker, are ones I have become familiar with through my course and term projects. Apart from these, the project itself has unique potential to create a breakthrough in the way complex scientific data is organised on the internet.

Tell us about the project you’re planning to do for InterMine this summer.

My project, Buzzbang, is significantly different from all other InterMine instances: it focuses on scraping the data on the internet that is marked up with bioschemas.org and indexing it in a search tool, Apache Solr. So far, a basic scraping module and an indexing engine are up and running. I am planning to integrate Scrapy for crawling and indexing new paths, and to upgrade the Solr search tool in this project. Towards the end of this project, I will make sure all the changes are reflected in the front-end as well.
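
The scrape-then-index idea can be illustrated with a small sketch. It makes several assumptions that are not from the interview: it uses Python's standard `html.parser` in place of Scrapy, it handles only markup embedded as JSON-LD `<script>` blocks (structured markup can also be embedded in other ways), it assumes each block holds a single JSON object, and it stops at building a flat document without sending anything to Solr. The sample page, URL, and field names are invented for the example.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed markup rather than abort the crawl

def to_search_doc(url, item):
    """Flatten one JSON-LD object into a flat dict suitable for indexing."""
    return {
        "id": item.get("@id", url),
        "type": item.get("@type"),
        "name": item.get("name"),
        "source_url": url,
    }

# Invented sample page standing in for a crawled document.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "DataRecord",
 "@id": "sample:123", "name": "Example sample"}
</script>
</head><body><p>Sample page</p></body></html>
"""

parser = JsonLdExtractor()
parser.feed(SAMPLE_PAGE)
docs = [to_search_doc("https://example.org/sample/123", item)
        for item in parser.items]
```

Each entry in `docs` is a flat record of the kind a search index typically expects. A real crawler would also need to follow links, respect crawl policies, and decide when to re-crawl.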

Are there any challenges you anticipate for your project? How do you plan to overcome them?

I believe there could be some serious challenges that I might face with Scrapy. Making a generalised scraping tool looks easy when the data has bioschemas.org markup, but the organisation of this data varies across domains, and crawling some of those domains might not be a simple task. Moreover, we are also planning to introduce some degree of parallel processing to this module. Though my focus will be on the EBI BioSamples domain, which should make my task easier, I will try to keep the crawler as general and powerful as I can. Additionally, I suspect I will need some help from the community in planning the architecture for the re-crawling and re-indexing part. I am not very sure what level of automation would be desirable for that part of the project.

GSoC Student Interview spotlight: Cross InterMine Search Tool + Aman Dwivedi

This is our blog series interviewing our 2018 Google Summer of Code students, who will be working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Aman Dwivedi, who will be working on the Cross-InterMine Search tool.

Hi Aman! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

Heya! I’m Aman Dwivedi, a final year undergraduate student from Jabalpur Engineering College, India. I’m a web enthusiast and a JavaScript lover (JS is love <3). I have worked with two startup companies as a Full Stack Node.js Developer Intern in the past. I’m also a proud member of the Mozilla Open Source Community (I have worked on the renowned Mozilla Firefox project). I have worked with many great programmers in the past and I’m extremely excited to work with the InterMine team.

What interested you about GSoC with InterMine?

I believe that a good open source community is built on its members sharing ideas and helping each other throughout. The sign of a good team is a friendly, yet productive environment. The best thing about InterMine is its team and its proud contributors. Everyone has a great helping attitude. The Application Phase was awesome, and I never had such a great experience in any of the past teams I worked with. Everyone is so enthusiastic about new features and new implementations all the time. Also, one more brownie point is that my project work here will affect a large part of society (this is the most important motivating factor for me <3).

Tell us about the project you’re planning to do for InterMine this summer.

I will be working on the Cross InterMine Search Tool. This project will be developed from scratch. It will use the InterMine APIs and the registry to fire concurrent requests to all the selected InterMines for a search query. The project will be developed in Vue.js. It will have a great impact, as currently there is no tool capable of searching multiple mines at once. It will make it much easier for InterMiners and researchers to search and browse genomic data across all the InterMine instances.
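
The pattern of firing one search at several mines concurrently can be sketched as follows. This is a hedged illustration only: the actual tool is planned in Vue.js, whereas this sketch uses Python threads, and the mine names, their contents, and the `search_mine` function are hypothetical stand-ins for the registry and for real HTTP requests.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the registry: mine name -> records it holds.
FAKE_MINES = {
    "MineA": ["BRCA1 gene", "TP53 gene"],
    "MineB": ["brca1 ortholog", "adh gene"],
    "MineC": ["zen gene"],
}

def search_mine(mine_name, term):
    """Stand-in for one search request sent to a single mine."""
    records = FAKE_MINES[mine_name]
    return [record for record in records if term.lower() in record.lower()]

def search_all(mine_names, term):
    """Send the same search to every selected mine concurrently."""
    workers = max(1, len(mine_names))  # the executor needs at least one worker
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(search_mine, name, term)
                   for name in mine_names}
        # Collect results per mine so the caller can see which mine has hits.
        return {name: future.result() for name, future in futures.items()}

results = search_all(["MineA", "MineB", "MineC"], "brca1")
```

Keeping the results grouped by mine, rather than merged into one list, matches the goal of showing which mine holds the data a user is looking for.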

Are there any challenges you anticipate for your project? How do you plan to overcome them?

The most important thing in the development of an open source project is the community. I will need suggestions and user reviews from the community to make the project better. My first priority is always the Community User experience. Suggestions will be really valuable throughout the project development, testing and the documentation phase.

[Image: mamandebug.png]

GSoC Student Interview spotlight: ElasticSearch / Solr Project + Arunan Sugunakumar

This is our blog series interviewing our 2018 Google Summer of Code students, who will be working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Arunan Sugunakumar, who will be working on upgrading InterMine’s search facilities.

[Image: arunan-architecture.png]

Hi Arunan! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

I am Arunan Sugunakumar, an undergraduate from the Department of Computer Science and Engineering, University of Moratuwa. I am attracted to the concept of open source because I get to learn a lot by seeing contributions from other people all over the world and I learn by contributing myself. I did my internship at WSO2, an open source middleware company. I mostly contribute to Java, Python and JavaScript related projects. I am also interested in Internet of Things and Big Data stuff.

I like to read books in my spare time. It helps me to clear my mind. I also like to play Scrabble, which is a popular word game.

What interested you about GSoC with InterMine?

I came to know about InterMine through a friend, and when I went through the project ideas and the community, I made up my mind to try to become part of this organization. Most of the project ideas were associated with the core InterMine product rather than trial and error projects. So I knew that if I became a part of it, my contributions would be there in all InterMine instances. That gave me most of the excitement, and the mentors were also very friendly and supportive.

Tell us about the project you’re planning to do for InterMine this summer.

Currently InterMine uses an outdated library to handle bio data search. My project aims to improve the search feature using modern search engines like Apache Solr / ElasticSearch. The existing architecture in InterMine has to be modified to handle the new approach, and it should reduce the complexity for the user.

Are there any challenges you anticipate for your project? How do you plan to overcome them?

The main challenge for me is to understand the existing code base so that I can change it without breaking the workflow. I need to work closely with my mentor and need to update them with every change I make. Also I have to communicate my doubts to the community in a friendly manner so that I can get input from everyone.

Another challenge that I might face is choosing the appropriate search engine. There are many open source search engines out there and all of them are best in their own way. So I need to discuss with my mentor to select an appropriate search engine that would be suitable for the project.

Share a meme or gif that represents your project

[Image: apache-solr-spongebob.gif]

GSoC Student Interview Spotlight: InterMine Data Browser + Adrián Rodríguez Bazaga

This is our blog series interviewing our 2018 Google Summer of Code students, who will be working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Adrián Rodríguez Bazaga, who will be working on the InterMine Data Browser.

Hi Adrian! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

Hi, I am Adrián Rodríguez Bazaga, a student from UPC, Barcelona (Spain). I am a Computer Scientist who is currently pursuing a Master’s degree in Data Science and Machine Learning. One of the things that characterizes me is my passion for science and for solving problems using technology and, when it’s possible, doing it in collaboration with other enthusiasts in the field, which is how the Open Source philosophy works!

Apart from this, I love animals, especially cats and Cavalier King Charles Spaniels, I could spend all day long cuddling them if I had the time. I also love to play chess and any kind of board games in my spare time!

What interested you about GSoC with InterMine?

As a student, I’m still learning about everything that interests me, and although my major is Computer Science (and Artificial Intelligence related topics), I’m very interested in the bioinformatics world, the landscape in which InterMine sits. This gives me the perfect opportunity to learn the “bio-concepts” behind the project in which I am involved by applying my Computer Science skills.

Tell us about the project you’re planning to do for InterMine this summer.

Currently, the InterMine services offer a query builder to search for biological data over the different mines. Although this is a very useful tool, the user needs to know how the data is structured (data model) on each mine, in order to create the desired queries. Since knowing the data model is mandatory to use this query builder, it can, indeed, become overwhelming for new users who want to search for some specific information in the data.

Building on this idea, my project is to implement a faceted search tool to display the data from an InterMine database, allowing users to search easily within the different mines available around InterMine. We have already made some progress on that, as you can check in the following picture:

Are there any challenges you anticipate for your project? How do you plan to overcome them?

The main challenge of my project lies in the fact that the data browser is intended to be used by beginners, without the added difficulty of needing to know the data model in depth. This means that I will need to deploy an application that offers the full functionality of searching InterMine repositories but with an easy-to-use interface, which is by itself a great challenge.

Share a meme or gif that represents your project

GSoC Student Interview spotlight: InterMine Python Client + Nupur Gunwant

This is our blog series interviewing our 2018 Google Summer of Code students, who will be working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Nupur Gunwant, who will be working on the InterMine Python Client.

Hi Nupur! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

I am Nupur Gunwant, a student from IIT Kharagpur, India. I am pursuing an Integrated Masters degree in Mathematics and Computing. I am an open source enthusiast and a maths lover. I love to solve problems and talk about ideas. I firmly believe in the power of Python and admire its versatility.

Apart from that, I am a lover of art. I want to further pursue my studies in the intersection of my artistic and technical interests. And most importantly, I always carry a book wherever I am.

What interested you about GSoC with InterMine?

I was deeply intrigued by the work InterMine does and as a student, I wanted to work with an organization with such a huge impact on society. Another thing that motivated me to prepare hard to work with InterMine as a student developer was the fact that it’s such a healthy and friendly community, where ideas are appreciated and one is always motivated to work on them. I think that made InterMine the most desirable place to work.

Tell us about the project you’re planning to do for InterMine this summer.

I will be working on adding functionality to the Python Client, a very important part of InterMine at present. I will begin by creating a link between the InterMine Registry and the Python Client, so that the user can make use of the Registry features from the terminal.

Further, I will build a Query Manager that will be a key tool for performing operations on user queries from the terminal, and lastly, I will add visualisation to the Python Client using matplotlib.

Are there any challenges you anticipate for your project? How do you plan to overcome them?

The biggest challenge is to meet all the needs of the user for the Client in all three subparts of my project. I am planning to make community interaction and feedback the main source of review of my work; given the community’s experience with user experience, this should help a great deal in overcoming this problem.

Share a meme or gif that represents your project

[Image: Webp.net-gifmaker (1)]

GSoC 2018 Students Announced! 🌞☀️

After last year’s great success, we’re really excited to welcome six Google Summer of Code students to work with us again this year:

Aman Dwivedi will be working on a Cross-InterMine search tool. This will use the registry to allow users to search multiple InterMines at once, and should be a good way to figure out which mine has the data you’re looking for. Aman will be mentored by Nadia Yudina, herself a graduate of last year’s InterMine+GSoC program.

Adrián Rodríguez Bazaga will be working on something we’ve always wanted: an InterMine data browser – hopefully a tool that will allow users to learn a bit more about data inside an InterMine without having to know the data model. Yay for easier learning curves! Adrian’s mentor will be Yo Yehudi.

Arunan Sugunakumar is going to explore hooking InterMine up to a more modern search package, probably Solr or ElasticSearch. Our current version of Lucene is very old, and we know there are better options out there! Daniela Butano will mentor this project.

Jake Macneal is going to work on a prototype to convert natural language questions into InterMine PathQuery – it would be exciting to have a user type “Show me all the genes associated with diabetes” into an InterMine, and get a sensible set of results back! Aaron Golden will mentor Jake.

Nupur Gunwant will be adding additional features to our Python client, such as registry communication, a query manager, and visualisations. Julie Sullivan will be Nupur’s mentor for this project.

Ankit Kumar Lohani will be working on Buzzbang – a search engine to crawl multiple biological sources including, but not exclusively, InterMine instances. Justin Clark-Casey will be Ankit’s mentor.

We’re also planning to post a short interview series highlighting each student and their plans for the summer. We can’t wait to get started!!