Internship and new members

If you are a new intern in the lab, you should follow these general guidelines. They are based on our previous experience with internships and supervision.

Python programming

New interns must know how to program in Python, which is effectively mandatory for machine learning work. You should also be familiar with common scientific libraries such as NumPy, SciPy, pandas and scikit-learn, and you should know common design patterns and object-oriented programming concepts.

If you feel you lack these basic programming skills, taking a crash course on any of these subjects will definitely help you in the near future. There are also a few books you can read to get familiar with applied machine learning in Python. Don’t try to skip these steps and go directly into your project. Basic programming skills, such as using libraries and reading their documentation, should be mastered before you start working on your project, especially in applied machine learning.

Getting familiar with in-house libraries

Once you are confident in your programming skills and software engineering concepts, you are ready to explore the libraries we developed in the lab: Kerosene and SAMITorch. The documentation of both libraries is still a work in progress, but you should be able to start using them quickly. Kerosene is a PyTorch wrapper for accelerating research code development, while SAMITorch is geared toward medical imaging and works with NumPy arrays and PyTorch tensors. At this step, knowing how to use PyTorch becomes mandatory.

Your code will likely use Kerosene and SAMITorch. Knowing how to work with these Python libraries will ease and accelerate your work, help you avoid mistakes and give you the tools to succeed in your project. You can also contribute to their development by proposing new features in a GitHub issue or implementing them in a pull request. We would like all members to take part in the development of these libraries and to use them in their research projects. These libraries rest on solid software engineering concepts and are built with strong software quality in mind. They also push you to code more carefully and more efficiently, so that you achieve better results sooner.

Don’t skip steps and do the right things

Don’t focus only on getting your work published. If you do, you will likely fail to understand and master basic machine learning concepts and software engineering skills. You may code fast, but your work will be practically unusable by other members of the lab and you will probably make a lot of mistakes. This isolates you with your project and makes it nearly impossible for others to troubleshoot and debug it. Another member may then need a lot of time to take over your work, which slows everyone down when you leave the lab, especially if your work is close to being published. You can avoid this by using the lab’s software stack and mastering basic programming skills. Coding with the libraries developed in the lab (Kerosene and SAMITorch), which other members already know, creates a coding standard in the lab: other members can understand your code quickly and debug it far more easily when you need help. We also eventually want these libraries to be released and used in other research labs at ETS, and this begins with using and promoting them ourselves.

Skipping steps in your own understanding of machine learning is also risky. To master your project, you must first master your area of expertise.

Starting a project during your first week of internship might not be the best idea. At this point, your understanding of your project and of the research problem may not be sufficient to dive directly into it. Take the time to do a literature review and to master the concepts underlying the problem you are trying to solve. Instead of coding right away in your personal project, a better starting point is to write small code modules in SAMITorch or Kerosene that you know you are going to need later. This code will be reviewed by the libraries’ authors and will almost certainly require some adjustments, which are an integral part of your learning curve and adaptation. You will also have to write unit and/or integration tests when submitting your code. Writing these tests helps you catch mistakes at the very beginning of your work, which could save you an enormous amount of time over the course of your internship, master’s or Ph.D.
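As an illustration of what a small module with its unit tests can look like, here is a minimal sketch using only the Python standard library. The function dice_coefficient and the test class are hypothetical examples written for this guide; they are not part of SAMITorch or Kerosene, and the convention used for two empty masks is an assumption of this sketch.

```python
import unittest


def dice_coefficient(prediction, target):
    """Dice overlap between two binary masks given as flat sequences of 0/1.

    Hypothetical helper for illustration only; not a SAMITorch or Kerosene API.
    """
    if len(prediction) != len(target):
        raise ValueError("prediction and target must have the same length")
    # Number of positions where both masks are 1.
    intersection = sum(p * t for p, t in zip(prediction, target))
    total = sum(prediction) + sum(target)
    # Convention assumed in this sketch: two empty masks overlap perfectly.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


class DiceCoefficientTest(unittest.TestCase):
    def test_identical_masks_give_one(self):
        self.assertEqual(dice_coefficient([1, 0, 1, 1], [1, 0, 1, 1]), 1.0)

    def test_disjoint_masks_give_zero(self):
        self.assertEqual(dice_coefficient([1, 1, 0, 0], [0, 0, 1, 1]), 0.0)

    def test_partial_overlap(self):
        # intersection = 1, mask sizes = 2 and 1, so Dice = 2 * 1 / 3.
        self.assertAlmostEqual(dice_coefficient([1, 1, 0], [1, 0, 0]), 2.0 / 3.0)

    def test_length_mismatch_raises(self):
        with self.assertRaises(ValueError):
            dice_coefficient([1, 0], [1])
```

Saved in a file, these tests can be run with python -m unittest. Each test checks one behavior, including the error case, so a reviewer can see at a glance what the function is expected to do.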

Also, always keep in mind that you must work on something useful. Always ask yourself: “Is what I’m doing useful for my personal research project?” and “Is what I’m doing useful for the lab?”. If the answer to both questions is no, you need to adjust course.

It will take time

Writing clean code and tests obviously takes more time than not caring about them. Software quality usually reflects the amount of time you spent cleaning your code and writing relevant tests for it. You may feel that you are not moving forward quickly enough for your teachers, or that you are not delivering within the expected time frame. This is completely normal and you should not worry about it. Teachers tend to underestimate the effort you put into the coding phase of your project. Sometimes a lot of trial and error is needed before you reach publishable-grade work, and these attempts naturally take time. Keep in mind that the time you spend on testing and clean coding can save you a very large amount of time later and will allow the other members to help you if needed.