GSoC Interview: InterMine Schema Validator with Deepak Kumar

This is our blog series interviewing our 2019 Google Summer of Code students, who working remotely for InterMine for 3 months on on a variety of projects. We’ve interviewed Deepak Kumar, who will be working on the InterMine Schema Validator.

Hi Deepak! We’re really excited to have you on board as part of the team this summer. Can you introduce yourself?

Hi, Thank you for this opportunity, Let me first talk about myself, My name is Deepak Kumar, I live in Ahmedabad, India with my family. I started coding when I was in 17, I had two great teachers in my school days who introduced me to computer programming, and from that time I got interested in this field.

I completed my graduation in Computer Applications from St. Xavier’s College, Ahmedabad and currently I’m doing  Post Graduate Program MSC.IT(Information Technology) at DA-IICT, Gandhinagar, India.

Now talking about my technical details, I love working on challenging projects, I’ve worked on several projects, One of my favourite project that I created while pursuing my bachelors was ‘Smallscript’, It’s a compiled programming language that compiles to bytecode and runs on JVM that makes it platform-independent. It’s my favourite project because It was challenging and when I started with the project I didn’t know any technical detail about compilers, so I had to start from very scratch.

I’ve also worked with a startup company, where I worked as a backend-developer with a team of 8 people and our team was really fantastic, I worked on two projects there, and I really enjoyed it, working with a big team wonderful experience.

I’ve recently started my open source journey with GSoC 2019. Though I’m new to open source, I’ve started contributing to ‘JabRef’ and as I’m selected for GSoC 2019, I’m also going to work with Intermine this summer, and have future plan to contribute to Intermine after completion of GSOC. I also regularly participate in coding contests and hackathon, In one of the AI contest, I built an AI game that ranked 68 among thousands of participants.

Currently, I’m working at OpenXcell Technolabs as an Intern, which is part of my MSC.IT Master’s program. I love reading, travelling, table-tennis and working with new technologies.

What interested you about GSoC with InterMine?

When GSoC 2019 was about to start, I had already bookmarked a few of the previous year organizations I was interested in, and hoping that Intermine will be part of GSoC 2019 too. When the organization list came out, I was super excited to see Intermine in the list. After going through the Intemine’s idea list, I found myself very interested in ‘Intemine Schema Validator Project’, So it was really the Intermine’s project that made me interested in the community.

Tell us about the project you’re planning to do for InterMine this summer.

I’ll be working on a project named ‘Schema Validator’ for Intermine this summer. Well, the project is quite simple to explain, it’s going to be a library that takes a file as input and outputs whether that file is following a particular schema or not. While working on the project my goal from the first day would be to create this project as general as possible, so that the project can be easily extended to support other schemas as well.

Are there any challenges you anticipate for your project? How do you plan to overcome them?

Yes, there are few challenges that I will face while working on this project, One of the biggest challenges which I’m currently trying to solve is about performance. As the purpose of this project is to validate schema files, then the problem is how will I handle larger files that are filled with the content of like 10GB or more. I need to discuss this problem with my mentors that what is their expectation about the performance of the library.

Currently, I’m thinking about the solution to this problem. Maybe I can boost the performance by concurrently running multiple instances of a Schema Validator, Although it doesn’t matter how I implement it If the library is validating a 10GB file that it is definitely going to take a little amount of time.

Then there are also a few challenges regarding the implementation of the schema rules.