Roshni Prajapati on BlueGenes UX, user research, and saving people from bad design

This is our blog series interviewing our 2020 Outreachy interns, who are working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Roshni Prajapati, who will be working on UX research and recommendations for BlueGenes.

Hi Roshni! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself? 

Hi team, I can’t deny that the wait for a totally new experience is driving me crazy here! I’m an IT undergraduate pursuing a Bachelor of Technology at IIIT Allahabad, and I’ll be starting my third year in August. My interests lie primarily in interaction design, user research, product thinking and a bit of graphic design, but I don’t mind banging out lines of code to build stuff that interests me. A few of my works can be seen here & here.

Some days I try to solve user issues by merging aesthetics, a bit of mathematics and data in unequal proportions, while on other days I can be spotted preparing for my upcoming hackathon, lying around all day watching cartoons or enjoying a 70s-80s classical playlist.

Other than this, I’m a wanderer at heart, a guitarist, a painter, an intermediate football and table tennis player and a coffee addict 😛

What interested you about Outreachy with InterMine?

I have this craving to improve things and redefine work for living, breathing humans, i.e. to save them from bad design. In InterMine’s case, while browsing the mine sites I noticed that they mostly consist of analytical data and its representation. The current website has several user issues and pain points; the look and presentation of the data is also naive and even violates some design rules. This made me dive deeper into real-world biodata and its visualization to improve the usability of the website. On top of that, the organization had itself registered a design issue, which drove me even more to work on it.

Design analysis needs views from both users and developers, so it is important that the community interacts. I needed a better understanding of real-world bio data (which was new to me), and the mentors willingly helped. This ever-ready response gave me an optimistic feeling about working with the team and the organization.

Tell us about the project you’re planning to do for InterMine this summer.

The content in the current website design needs to be strategically placed to make it easier for users to go through. Since the site contains heavy analytical data and a variety of biological data, my task will be to organize the website content so that users can find things with ease, improving the overall user experience. I plan to carry out my process in the following phases:

Discover & Define: Carry out questionnaire sessions and meetings to collect user experience observations, then interpret the observations and define insights. I will try to convey my ideas through user personas and stories, and finally set my design challenges.

Develop & Deliver: Discuss ideas further through sketching, experimenting and prototyping, working on feedback iteratively.

Are there any challenges you anticipate for your project? How do you plan to overcome them?

Previously I have worked on several projects of my own, and this will be the first time I work with a community. So collecting user experience observations remotely through unit testing and other methods is gonna be quite a challenge for me. One of the major tasks also includes contributing to the implementation of the design, which I’m concerned about. Since this is gonna take some time, it could be counted as another challenge, but I’m pretty sure the work will get done within the time provided 🙂

Share a meme or gif that represents your project

GSoC Interview: Akshat Bhargava on new data visualisations for BlueGenes

This is our blog series interviewing our 2019 Google Summer of Code students, who will be working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Akshat Bhargava, who will be creating data visualisations for BlueGenes.

Hi Akshat! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

Hi InterMine team, I’m very excited too for this upcoming summer! I’m a Computer Science undergraduate going to start my 3rd year this August. I’m primarily a JavaScript developer (web and hybrid mobile) and have been working with it for the last 2.5+ years, but the real me is a person who loves to solve problems in general, whether they are related to programming or not. I’ve been exploring the field of data visualization for the last few months and I am in love with it. Have a look at the IPL (cricket) data viz I created a few months back here.

Apart from coding, I love reading about psychology, history and watching horror movies.

What interested you about GSoC with InterMine?

I find it magical how numbers show their true faces when seen through a meaningful visualization, and this is why I’m most excited for this summer with InterMine.

Real World Bio Data + Data Viz = Something big coming in! ❤

Another reason for my interest in InterMine is that I applied to InterMine last year too, for the Cross InterMine Search Tool, and couldn’t make it, but I got to understand its community and how they work. The mentors are very helpful and supportive to everyone, so I jumped straight in here this year. 😀

Tell us about the project you’re planning to do for InterMine this summer.

InterMine has tons of different types of biological data. This summer I’ll mostly be working on discussing and developing visualizations for that data, making it easier for biologists to understand it and draw relevant conclusions from a single look at the graphs.

There is an existing piece of software called BlueGenes that helps users explore different mines. It provides a tool API which allows JavaScript developers to create additional visualization tools on top of it, and these tools can be integrated on any Gene or Protein result page. My goal for this summer is to create a variety of such visualization tools in order to enrich the visualization of different types of data.

As an example of how useful this work can be, one of the visualizations I’ll be developing will help us understand how the expression of a particular gene is distributed among different tissues. This information is helpful for cancer biologists who want to assess whether a gene is highly expressed across different tissues of an organism, because that gives a relative picture of the degree to which it’s implicated in diseases.

Are there any challenges you anticipate for your project? How do you plan to overcome them?

Data visualization requires you to understand the data properly first in order to create a meaningful visualization out of it. Since I’m not very familiar with InterMine’s data model and the related biological terms, I’ll face some difficulties when thinking through the “what and why” of what to visualize. To overcome this, I’m already exploring more and more of InterMine’s data model, trying to understand how to deal with different types of data and how to create the appropriate visualization for them. The mentors are really helping me out with this (overall in terms of tech, viz and everything). 🙂

Share a meme or gif that represents your project

GSoC Interview: Migrating from Struts to Spring with Prabodh Kotasthane

This is our blog series interviewing our 2019 Google Summer of Code students, who will be working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Prabodh Kotasthane, who will be working on a project to migrate InterMine’s RESTful web services from Struts to Spring.

Hi Prabodh! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

I am a final year Computer Science Engineering (B.Tech.) student from Birla Institute of Technology, Mesra, India. I have been coding in Java and related frameworks, C/C++ and Python for the last 4 years. I have contributed to the open source community, some software development projects and hackathon projects at college level.
I successfully completed GSoC 2018 with OpenMRS. My project was OAuth Module Enhancement and SMART Apps Support. Details about the project can be found here:

https://pkatgithub.github.io/GSoC-2018-Final-Evaluations/

Presently I am doing an internship at Microland Limited, Bengaluru, India, until the end of May 2019. Here I am working with graph databases and technologies like Apache Kafka and Neo4j, with all the coding done in Python.
Apart from coding, I have many other interests and hobbies which include singing, cooking, photography, fine arts, basketball and writing.

What interested you about GSoC with InterMine?

It was around mid Jan this year when I got to know about InterMine.
Previously, I have worked with Java Spring Framework and hence I am comfortable with the same. So, I was searching for GSoC organisations which have something to do with Spring and then I got to know about InterMine and their project in which they were planning to migrate the web-services from Struts to Spring.
I read more about InterMine and about the project, and I found it interesting. I went through the documentation of the project and joined InterMine’s Discord so that I could connect with the mentors and the community.
I had a warm welcome into the community. Julie, Daniela and Yo are always excited to chat and exchange thoughts and it’s been such a good time with them till now.
All in all, this community, this project and the people associated with this project made me believe that I can do a GSoC with InterMine!

Tell us about the project you’re planning to do for InterMine this summer.

Presently InterMine uses the Struts framework, which is outdated. InterMine provides RESTful web services that make it possible to execute custom or templated queries, search keywords, manage lists, discover metadata, perform enrichment statistics and manage user profiles.
The main objective of this project is to migrate the web services from Struts to the Spring framework and document the APIs with Swagger in compliance with the OpenAPI Specification.
The Spring framework is evolving all the time and is more robust and flexible than the Struts framework.
OpenAPI specifications are easy to write, and Swagger Codegen, which supports Spring, makes the developer’s job easier by generating code stubs that can then be modified to implement the services.

Are there any challenges you anticipate for your project? How do you plan to overcome them?

InterMine has a lot of web services, 70 in total, with a wide range of functionality.
The business logic of the web services depends strongly on the classes in webcore, so migrating a web service requires knowledge of the underlying logic layer. This is going to be a real challenge. I will need to set aside proper time to understand this business logic layer of the project.
Apart from this, writing tests is also a time-consuming job. I wish I could get some help with that! 😛

Share a meme or gif that represents your project

GSoC Interview: InterMine Schema Validator with Deepak Kumar

This is our blog series interviewing our 2019 Google Summer of Code students, who will be working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Deepak Kumar, who will be working on the InterMine Schema Validator.

Hi Deepak! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

Hi, thank you for this opportunity. Let me first talk about myself. My name is Deepak Kumar, and I live in Ahmedabad, India with my family. I started coding when I was 17. I had two great teachers in my school days who introduced me to computer programming, and from that time I have been interested in this field.

I graduated in Computer Applications from St. Xavier’s College, Ahmedabad, and I’m currently doing the postgraduate M.Sc. IT (Information Technology) program at DA-IICT, Gandhinagar, India.

On the technical side, I love working on challenging projects, and I’ve worked on several. One of my favourite projects, which I created while pursuing my bachelor’s, was ‘Smallscript’. It’s a compiled programming language that compiles to bytecode and runs on the JVM, which makes it platform-independent. It’s my favourite project because it was challenging: when I started, I didn’t know any technical details about compilers, so I had to start from scratch.

I’ve also worked with a startup company as a backend developer in a team of 8 people, and our team was really fantastic. I worked on two projects there and really enjoyed it; working with a big team was a wonderful experience.

I’ve recently started my open source journey with GSoC 2019. Though I’m new to open source, I’ve started contributing to ‘JabRef’, and as I’ve been selected for GSoC 2019, I’m also going to work with InterMine this summer, and I plan to keep contributing to InterMine after GSoC is over. I also regularly participate in coding contests and hackathons. In one AI contest, I built an AI game that ranked 68th among thousands of participants.

Currently, I’m working at OpenXcell Technolabs as an intern, which is part of my M.Sc. IT master’s program. I love reading, travelling, table tennis and working with new technologies.

What interested you about GSoC with InterMine?

When GSoC 2019 was about to start, I had already bookmarked a few of the previous year’s organizations I was interested in, and was hoping that InterMine would be part of GSoC 2019 too. When the organization list came out, I was super excited to see InterMine on it. After going through InterMine’s idea list, I found myself very interested in the ‘InterMine Schema Validator’ project, so it was really InterMine’s project that got me interested in the community.

Tell us about the project you’re planning to do for InterMine this summer.

I’ll be working on a project named ‘Schema Validator’ for InterMine this summer. The project is quite simple to explain: it’s going to be a library that takes a file as input and reports whether that file follows a particular schema or not. From the first day, my goal will be to make this project as general as possible, so that it can easily be extended to support other schemas as well.
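To make the "general and extensible" idea concrete, here is a minimal sketch in Python. It is only an illustration, not the project’s actual design: the `SCHEMAS` registry, the `register_schema` and `validate` functions, and the toy `toy-tsv` schema are all hypothetical names invented for this example.

```python
from typing import Callable, Dict, Iterable, List, Optional

# A rule inspects one line and returns an error message, or None if the line is fine.
Rule = Callable[[str], Optional[str]]

# Hypothetical registry mapping a schema name to the rules that define it.
SCHEMAS: Dict[str, List[Rule]] = {}


def register_schema(name: str, rules: List[Rule]) -> None:
    """Add a new schema without touching the validation loop."""
    SCHEMAS[name] = rules


def validate(lines: Iterable[str], schema: str) -> List[str]:
    """Return error messages; an empty list means the input follows the schema."""
    errors = []
    for number, line in enumerate(lines, start=1):
        for rule in SCHEMAS[schema]:
            message = rule(line.rstrip("\n"))
            if message is not None:
                errors.append(f"line {number}: {message}")
    return errors


# Toy schema, invented for illustration: each line needs three tab-separated fields.
def three_columns(line: str) -> Optional[str]:
    if len(line.split("\t")) == 3:
        return None
    return "expected 3 tab-separated fields"


register_schema("toy-tsv", [three_columns])
```

Supporting another schema then only means calling `register_schema` with a different list of rules; `validate` itself stays unchanged.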

Are there any challenges you anticipate for your project? How do you plan to overcome them?

Yes, there are a few challenges that I will face while working on this project. One of the biggest, which I’m currently trying to solve, is performance. As the purpose of this project is to validate files against a schema, the problem is how to handle larger files with 10GB of content or more. I need to discuss with my mentors what they expect from the library in terms of performance.

Currently, I’m thinking about the solution to this problem. Maybe I can boost performance by running multiple instances of the Schema Validator concurrently. However I implement it, though, validating a 10GB file is definitely going to take some time.
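One way to picture the large-file problem is to validate the input as a stream of fixed-size batches, so memory use stays bounded no matter how big the file is. The sketch below is in Python and is an assumption about how this could look, not the library’s implementation; `batches`, `count_invalid` and `validate_stream` are hypothetical names, and the validity check is passed in by the caller.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batches(lines: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` lines without loading everything."""
    iterator = iter(lines)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def count_invalid(batch: List[str], is_valid: Callable[[str], bool]) -> int:
    """Count the lines in one batch that fail the check."""
    return sum(1 for line in batch if not is_valid(line))


def validate_stream(
    lines: Iterable[str],
    is_valid: Callable[[str], bool],
    batch_size: int = 10_000,
) -> int:
    """Return the number of invalid lines, holding only one batch in memory at a time."""
    return sum(count_invalid(batch, is_valid) for batch in batches(lines, batch_size))
```

Because a Python file object yields its lines lazily, `validate_stream(open(path), is_valid)` never holds more than one batch at once. Each batch is also independent of the others, so handing batches to a pool of workers would be one way to try the concurrent approach described above.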

Then there are also a few challenges regarding the implementation of the schema rules.

GSoC Interview: Laksh Singla on imjs and imtables upgrades

This is our blog series interviewing our 2019 Google Summer of Code students, who will be working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Laksh Singla, who will be working on upgrading imjs and imtables.

Hi Laksh! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

Hi, I am excited to be a part of the team too!! I am a Computer Science undergraduate student studying at BITS Pilani, India. I will be entering my third year in August. I was originally passionate about web development, but after entering my sophomore year, I was exposed to a wide variety of fields in computer science, and hence my current interest is primarily divided between exploring new web technologies, understanding internals of computer systems and a little bit of data science (read above as I am confused :/ ).

I listen to rock music a lot and my favorite band at the moment (and maybe forever) is Queen. I used to play basketball too, but gave it up soon after entering college. I am constantly looking to diversify my interests.

What interested you about GSoC with InterMine?

After getting to know about open source, I was determined to actively take part in GSoC. One of the primary reasons I was interested in InterMine was the friendly and helpful community of mentors and volunteers, who enthusiastically answered all my questions. Moreover, bioinformatics is a field that I have never explored, and I thought it would be fun to gain some insight into it without straying too far from my comfort zone.

Tell us about the project you’re planning to do for InterMine this summer.

My project this summer consists of several tasks, all towards a single goal: maintenance of the im-tables and imjs libraries. These are the major tasks I plan to complete over the summer:

  • Upgrade the current dependencies of the libraries
  • Improve the test suite of the imjs library
  • Update the current docs to be more newcomer friendly (user side for imjs, developer side for im-tables)
  • Add a few helper functions to query the intermine-registry data

Are there any challenges you anticipate for your project? How do you plan to overcome them?

One of the serious challenges I will face is fully upgrading the dependencies of both libraries, as it has been a pretty long time since they were last updated and the JavaScript/web ecosystem moves fairly quickly. Upgrading Mocha (for imjs) and CoffeeScript (for im-tables) broke the libraries. Although upgrading Mocha produced a fair number of errors (approximately 200 in total, 5-6 distinct ones), I was able to debug some of them, which gives me a little confidence that they can be overcome.

For CoffeeScript, however, the whole grunt system has become obsolete and the error messages are esoteric and uninformative. I am not certain that all of the dependencies for im-tables can be updated; doing so might require an overhaul of the library, which is not possible within the timeline stipulated by GSoC. If that happens, I will make sure to create a doc highlighting the issues faced, the long-term goals regarding those pending upgrades and, hopefully, the vulnerabilities present in the old (i.e. currently used) versions of those libraries.

Share a meme or gif that represents your project

GSoC Interview: Ankur Kumar on putting InterMine in the cloud

This is our blog series interviewing our 2019 Google Summer of Code students, who will be working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Ankur Kumar, who will be working on the project “Intermine Cloud: Making Intermine cloud native and easing deployments”.

Hi Ankur! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

Namaste everyone! I am a second-year undergraduate student at the Indian Institute of Engineering Science and Technology, Shibpur, pursuing a Bachelor’s degree in Mechanical Engineering. Honestly, introducing myself properly is always hard for me. I do not associate myself with a single identity tied to a particular subject, stream of study or profession.

I design bike frames, refrigeration systems and power generation plants. But I also code control algorithms for the motors that power those bikes, and path planning algorithms used by autonomous bikes and robots. I grow plants in a controlled environment with the help of various sensors and actuators to enhance their yield and study their response to different stresses, and I connect those sensors to the cloud as IoT devices to analyse the collected data. I have a huge interest in commerce, the workings of businesses and financial markets, and I spend a good amount of my time learning about these things.

This list is not exhaustive. But as a mandatory disclaimer, I have not yet figured out everything about the things I just mentioned. I hope that one day I will, and then I will move on to new projects. So, to put it poetically, I am a curious explorer who is ready to embark on any journey without even knowing the destination, as long as the journey has plenty of surprises to momentarily satisfy my curiosity.

I know what you are thinking after reading this: why and how do you do all this? (Other than that I am too ambitious, a show-off or just insane 😅) Well, I do not have a proper or detailed answer to these questions. I just keep trying to do things and they eventually happen. But I have a better question for everyone instead: why not? It is too much fun to live this way. I promise!

What interested you about GSoC with InterMine?

I always wanted to work on a project at the intersection of computer science and biology. Both of these fields attract me equally. I had a really hard time choosing between them when I was filling in my admission form for senior secondary. I eventually went for biology, if you are wondering. InterMine is a perfect place for me to explore both of these fields. But this is not the most important thing that made me choose InterMine. The most important thing is the people. InterMine has an awesome and very friendly community. The mentors are very supportive and responsive, and I had a great experience discussing the details of my project with them. I can confidently say that my mentors are the best. If anyone thinks otherwise, I am ready for a debate!!

Tell us about the project you’re planning to do for InterMine this summer.

My project forms part of a larger effort by the InterMine team to make InterMine more accessible to its users. More specifically, my project aims to create a service that offers managed InterMine instances in the cloud. The work done on my project will also be used to create a CLI tool that eases the creation of InterMine instances locally, using the same cloud technologies.

Are there any challenges you anticipate for your project? How do you plan to overcome them?

The most important one is time. I have a long list of tasks that need to be completed. I also need to coordinate with two other projects, which can be tricky. To overcome these challenges, I worked hard to come up with a very detailed timeline and design documentation. So now my plan for the coding period is simple: while tasks remain, pick one task at a time, work hard on it, complete tasks on time and then party hard on weekends.

Share a meme or gif that represents your project

Replacing a lightbulb - Imgur

GSoC Student Interview spotlight: Single Sign-in For Intermine + Rahul Yadav

This is our blog series interviewing our 2019 Google Summer of Code students, who will be working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Rahul Yadav, who will be working on the InterMine single sign-in project.

Hi Rahul! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

Hi! Excited to be on the team. I am a third-year undergraduate student, pursuing my Bachelor of Technology in Computer Science at USICT (GGSIPU, Delhi). I love being in front of my laptop. I can certainly spend more time writing code than doing anything else, but football and basketball have always been an exception.

I did many projects during the past academic year in order to use and explore my skill set. I have always loved contributing to open source because it is such a huge community of amazing developers who are always there to help you out.
Apart from this, I worked on an OAuth2 implementation during my internship last summer, where I used Java to connect Google services like G-Drive, Hangout and others with the company codebase. I have always been fascinated by cloud services, so I have kept working with GCP, AWS, Azure and others.

What interested you about GSoC with InterMine?

To be honest, I never thought I would get an opportunity to work with a community like InterMine. But when I saw the list of projects, it intrigued me, and I found myself drawn to one very interesting project, single sign-in, whose requirements and tech seemed very familiar to me. Because of that I kept digging into the project requirements and did lots of research on it. With every minute spent on this, my interest escalated exponentially, and Eureka! I finally came up with a solution, which helped me become a part of this amazing community.

Tell us about the project you’re planning to do for InterMine this summer.

In the current scenario, a user logs in to the desired InterMine and saves results and other required data. The problem arises when the same user wants to access a different InterMine: he/she has to register again on the new mine and log in again. Currently, the InterMine community does not have a single common sign-in mechanism, so users are authenticated with the help of tokens (temporary and permanent ones) or by using Google to log in. This project will modify the existing token mechanism by making InterMine an OAuth2 provider with a single common authorization server for all 30 mines, so that a user can access all the mines with a single set of credentials, i.e. just one registration.

Are there any challenges you anticipate for your project? How do you plan to overcome them?

This project is related to security, and the most important part is that it is all about user credentials, which means a single wrong piece of logic or a single wrong step can compromise our security. So implementing a fully secure system is a major challenge for this project.

I’m going to consider all the possible threats and vulnerabilities during the development phase of the system, and will focus on lots of testing and debugging to search for any kind of loophole and fix it before deployment.

Share a meme or gif that represents your project

 

 

STORM + InterMine: A partnership in the fight against cancer

In July 2018 Innovate UK awarded InterMine at the University of Cambridge and STORM Therapeutics a Knowledge Transfer Partnership (KTP). A KTP is a government program that helps businesses in the UK by linking them with an academic organisation — enabling them to bring in new skills and the latest academic thinking to deliver a specific, strategic innovation project.

The key objective of this particular project is to develop an analysis platform using the data warehouse InterMine to help STORM advance their cancer research.

Here we talk with Hendrik Weisser, Senior Bioinformatician at STORM, about this collaboration.

Can you tell me about this project?

Sure, my company (STORM) is partnering with InterMine in this project. We are going to develop a computational knowledge base for cancer drug discovery and RNA epigenetics, based on InterMine’s HumanMine database. We will extend InterMine by adding analysis tools, more biomedical data etc. to make it a bespoke platform to help us identify and validate drug targets.

Can you tell me more about STORM?

STORM Therapeutics is a drug discovery company focused on RNA epigenetics, developing small-molecule inhibitors of RNA-modifying enzymes for the treatment of cancer. We are a spin-out of Cambridge University, founded in 2015 by professors Eric Miska and Tony Kouzarides from the Gurdon Institute. You can find more information – and a cool animated video about RNA epigenetics – on our website, www.stormtherapeutics.com.

What do you hope to achieve?

For STORM, convenient access to available data on RNA-modifying enzymes, their roles in RNA epigenetics, and their associations to different cancers – both direct and via interaction partners – is vital for our efforts in target validation, indication prioritisation and patient stratification. A large amount of relevant data is publicly available but is scattered over many sources and not integrated, thus difficult and time-consuming to fully utilise. STORM’s vision is to develop an integrated database of relevant human biomedical data, that should enable our scientists to quickly view and interrogate the most pertinent data on target genes/proteins, but also allow us to easily perform bioinformatic analyses on these data.

What attracted you to InterMine? What makes InterMine a useful tool for drug discovery?

I found out about InterMine’s existence by chance and then quickly signed up to an InterMine training course at Cambridge University to learn more. I was impressed by the wealth of functionality offered by InterMine and by its sophisticated architecture that enables huge flexibility in dealing with different kinds of biological data. InterMine really represents the state of the art in terms of large-scale complex biomedical data integration. By focusing on extensibility and customisation and on enabling local installations, InterMine is able to serve a variety of research communities. These capabilities also make it an ideal fit for STORM’s requirements for an internal data management system that integrates diverse public data. The fact that InterMine is open-source, i.e. the code is and will stay available, is also important for us because it helps to ensure long-term maintainability.

—-

For more information see STORM’s website.

 

 

GSoC Student Interview spotlight: Natural Language to InterMine Queries + Jake Macneal

This is our blog series interviewing our 2018 Google Summer of Code students, who will be working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Jake Macneal, who will be working on converting natural language phrases to InterMine PathQuery.

Hi Jake! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

Excited to be joining the team! I’m an undergraduate studying computer engineering at McGill University in Montreal, about to enter my final semester in the fall. I’m originally from Philadelphia in the US, and I’ll be hopping around North America a little bit this summer (currently in Toronto). I’ve got a passion for robotics and artificial intelligence, which led to me joining my university’s robotics team to help design and build a Mars rover. Additionally, I’ve had the opportunity to intern at NASA Johnson Space Center in Texas, where I worked on a project which uses machine learning to track sensors around the space station (hopefully it’ll be put into use soon).

Aside from those technical interests, I enjoy soccer/football (both playing and watching), classical guitar, analog synthesizers (just getting into this but they’re really fun and fascinating), and the field of space exploration. Part of me is still holding on to the hope of becoming an astronaut some day.

What interested you about GSoC with InterMine?

I searched the GSoC organizations page for projects looking for a Clojure developer, and this was unsurprisingly one of the only ones. However, a language is hardly enough motivation to become passionate about a project. I’ve never had the chance to work in bioinformatics, but I did have a beloved computer science professor (Matthieu Blanchette) whose research was in that field, and he often spoke during lectures about his research. When I read through the organization and task descriptions I immediately thought of him, and knew that this would be a cool project to join. Nothing is more rewarding to me than the thought of using software as a tool to help others do good.

Tell us about the project you’re planning to do for InterMine this summer.

InterMine uses a graph query language (PathQuery) to retrieve information from the database. My project is to implement a more user-friendly alternative, allowing non-technical users to interact with an InterMine database without the need for esoteric queries crafted by an experienced programmer or system administrator. This will take the form of a natural language to PathQuery translation tool, written as a Clojure library. In addition, I’ll be building a proof-of-concept interface allowing novice users to submit English queries which will be translated and then submitted to the query engine. This simple app will be integrated with the InterMine web app, similar to the graphical query builder.

Are there any challenges you anticipate for your project? How do you plan to overcome them?

Natural language processing is a difficult field, and from working on a compiler, I learned that the key to building a correct parser and code generator is a huge number of test cases. Fortunately, the basic principle behind testing a translation tool is simple: assemble a set of English queries, along with the intended output for each (in the form of a PathQuery string). However, assembling a set of tests that is useful and demonstrates realistic, important queries requires interacting with actual users in the community. I hope to spend much of my initial weeks working with the community to figure out the syntax they’d like to see supported, as well as the types of queries already being written in PathQuery.
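
As a rough illustration of that testing principle, here is a minimal sketch in Python (chosen only for illustration; the project itself is a Clojure library). Everything in it is an assumption rather than Jake's actual design: the `translate` function is a hypothetical toy that understands a single sentence shape, and the PathQuery XML layout and the `Gene.symbol` / `Gene.organism.name` paths are assumed for the example. The point is the shape of the test table: pairs of English input and intended PathQuery output, compared structurally rather than character by character.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Hypothetical toy translator: it recognises exactly one sentence shape,
# e.g. "Show all genes in <organism>". A real tool would cover far more.
PATTERN = re.compile(
    r"^(?:show|list|find) (?:all )?genes (?:in|from) (?P<organism>.+?)\.?$",
    re.IGNORECASE,
)


def translate(english):
    """Return a PathQuery XML string, or None if the sentence is unsupported."""
    match = PATTERN.match(english.strip())
    if match is None:
        return None
    organism = match.group("organism")
    # Assumed PathQuery layout: a <query> element with one <constraint>.
    return (
        '<query model="genomic" view="Gene.symbol">'
        f'<constraint path="Gene.organism.name" op="=" value={quoteattr(organism)}/>'
        "</query>"
    )


def canonical(xml_text):
    """Reduce XML to nested tuples so attribute order and spacing don't matter."""
    if xml_text is None:
        return None

    def walk(element):
        attributes = tuple(sorted(element.attrib.items()))
        return (element.tag, attributes, tuple(walk(child) for child in element))

    return walk(ET.fromstring(xml_text))


# Each test case pairs an English query with its intended PathQuery output.
# None marks a query this toy translator is expected to reject.
TEST_CASES = [
    (
        "Show all genes in Drosophila melanogaster.",
        '<query model="genomic" view="Gene.symbol">'
        '<constraint path="Gene.organism.name" op="=" value="Drosophila melanogaster"/>'
        "</query>",
    ),
    (
        "list genes from Homo sapiens",
        '<query view="Gene.symbol" model="genomic">'
        '<constraint op="=" value="Homo sapiens" path="Gene.organism.name"/>'
        "</query>",
    ),
    ("Which proteins interact with BRCA1?", None),
]


def run_suite(translator, cases):
    """Run every case and return a list of (english, expected, actual) failures."""
    failures = []
    for english, expected in cases:
        actual = translator(english)
        if canonical(actual) != canonical(expected):
            failures.append((english, expected, actual))
    return failures


if __name__ == "__main__":
    for english, expected, actual in run_suite(translate, TEST_CASES):
        print(f"FAIL: {english!r}\n  expected: {expected}\n  actual:   {actual}")
```

Because `run_suite` takes the translator as an argument, the same table could be replayed against any implementation, and queries gathered from the community would simply become new rows in `TEST_CASES`.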

Share a meme or gif that represents your project

[Image: Jake’s meme]

GSoC Student Interview spotlight: Buzzbang Bioschemas search + Ankit Lohani

This is our blog series interviewing our 2018 Google Summer of Code students, who will be working remotely for InterMine for 3 months on a variety of projects. We’ve interviewed Ankit Kumar Lohani, who will be working on Buzzbang.

Hi Ankit! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

Hello InterMiners, I’m Ankit Lohani, a final-year undergraduate student at the Indian Institute of Technology, Kharagpur, India. I will be completing my undergraduate studies in a few months, with a major in Chemical Engineering.

Right from my first year, I have been interested in robotics and programming, and my interest in this field has only grown with time. Initially, I spent over a year working on the hardware front and then shifted to making path planners for our soccer-playing bots. Though my academic background has been completely different, it has only pushed me to work harder and to learn more. My interests lean towards natural language processing and information retrieval.

Apart from these, I love travelling and trekking. I am also planning to complete my 3rd trek this summer, this time above 14,000 feet.

What interested you about GSoC with InterMine?

I have never worked on an open source project, and I realized that GSoC is the best place to start learning and to see my work in use. Honestly, while looking for organisations to which I might be able to contribute, I came across InterMine and the various projects listed here. The application domain of InterMine is very appealing, and I could relate to this organisation for two key reasons. First, my past internship was on information retrieval on clinicaltrials.gov data, where I touched upon topics such as semantics, ontologies, UMLS (Unified Medical Language System), PubMed, and named-entity recognition for biological terms. Second, the technologies used in this project, like Solr, Elasticsearch and Docker, are ones I have become familiar with through my course and term projects. Apart from these, the project itself has unique potential to create a breakthrough in the way complex scientific data is organised on the internet.

Tell us about the project you’re planning to do for InterMine this summer.

My project, Buzzbang, is significantly different from all other InterMine instances: it focuses on scraping all the data on the internet that is marked up with bioschemas.org and indexing it in a search tool, Apache Solr. So far, a basic scraping module and an indexing engine are up and running. I am planning to integrate “Scrapy” for crawling and indexing new paths, and to upgrade the Solr search tool in this project. Towards the end of the project, I will make sure all the changes are reflected in the front-end as well.
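
To make the scrape-then-index idea concrete, here is a minimal Python sketch using only the standard library. It is not Buzzbang's code, and it rests on several assumptions: that the page carries its bioschemas.org markup as JSON-LD inside a `<script type="application/ld+json">` tag (microdata and RDFa are not handled), that the sample page, its identifier and its URL are made up, and that the output field names (`id`, `type`, `name`, `description`, `source_url`) are a hypothetical index schema.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect every JSON-LD object embedded in an HTML page."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                parsed = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return  # Skip malformed markup instead of aborting the crawl.
            candidates = parsed if isinstance(parsed, list) else [parsed]
            self.items.extend(c for c in candidates if isinstance(c, dict))


def to_index_doc(item, url):
    """Flatten one JSON-LD object into a flat document (hypothetical schema)."""
    return {
        "id": item.get("@id") or item.get("identifier") or url,
        "type": item.get("@type"),
        "name": item.get("name"),
        "description": item.get("description"),
        "source_url": url,
    }


def extract_docs(html, url):
    """Parse a page and return one index document per JSON-LD object found."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    return [to_index_doc(item, url) for item in extractor.items]


# Made-up page used only to exercise the sketch.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org",
 "@type": "Dataset",
 "identifier": "EXAMPLE-0001",
 "name": "Example sample record",
 "description": "A made-up record for illustration."}
</script>
</head><body></body></html>
"""

if __name__ == "__main__":
    for doc in extract_docs(SAMPLE_HTML, "https://example.org/sample/1"):
        print(json.dumps(doc, indent=2))
```

In a fuller pipeline, a crawler such as Scrapy would supply the pages, and the resulting documents could then be sent to Solr for indexing. Both steps are left out here to keep the sketch self-contained.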

Are there any challenges you anticipate for your project? How do you plan to overcome them?

I believe I might face some serious challenges with Scrapy. Making a generalised scraping tool looks easy when the data has bioschemas.org markup, but the organisation of this data varies across domains, and crawling across some of those domains might not be a simple task. Moreover, we are also planning to introduce some degree of parallel processing to this module. Though my focus will be on the EBI BioSamples domain, which should make my task easier, I will try to keep the crawler as general and powerful as I can. Additionally, I suspect I will need some help from the community in planning the architecture for the re-crawling and re-indexing part. I am not yet sure what level of automation would be desirable there.