Open Source Data Science Master – The Plan
Design your own Open Source Data Science Master
Editor’s Note: This post was originally published on June 11th here and written by Fras and Sabine. They are a couple in their mid 20s living in Thailand and designing their own DIY Data Science Masters. Follow their journey here.
Free education platforms have put some of the world’s most prestigious courses online in the last few years. This is our plan to use them to create our own custom open source data science Master. Before we begin, though, in the spirit of openness we should explain where we are starting from:
We both have Physics degrees and are comfortable with maths, logic, algorithms and manipulating data. But perhaps most importantly, we enjoy this type of work. The program we have designed does not require any prior knowledge of the topics below; however, we do feel it is an advantage to come from a numerical background.
So what does it take to become a data scientist?
Statistics
Statistics is perhaps the starting point of data science. For most questions in the world we have neither measured every phenomenon nor asked every person what they think; instead we have a small recorded subset of conversations and measurements. Statistics helps us understand what we can, and just as importantly what we cannot, reasonably learn from that smaller group.
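As a rough illustration of learning from a smaller group, here is a minimal Python sketch. Everything in it is an assumption made for the example: the "population" is made-up data, `mean_with_ci` is a hypothetical helper name, and the interval uses the simple normal approximation (z = 1.96 for roughly 95%).

```python
import math
import random
import statistics


def mean_with_ci(sample, z=1.96):
    """Return (mean, lower, upper): the sample mean with an approximate
    95% confidence interval based on the normal approximation."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    std_err = statistics.stdev(sample) / math.sqrt(n)
    return mean, mean - z * std_err, mean + z * std_err


random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 made-up heights in cm.
population = [random.gauss(170, 10) for _ in range(100_000)]

# We only get to "measure" 200 of them.
sample = random.sample(population, 200)

mean, lower, upper = mean_with_ci(sample)
print(f"sample mean: {mean:.1f} cm, approx. 95% CI: ({lower:.1f}, {upper:.1f})")
```

The width of the interval is the "what we cannot learn" part: a sample of 200 pins the average down only to within a range, and that range shrinks as the sample grows.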
Visualisation
“There is no such thing as information overload. There is only bad design.” Edward Tufte
There is no point having a story to tell but being unable to tell it. With the number of new visualisation tools springing up each year there is no excuse not to make your story beautiful and compelling. This involves elements of design and artistic principles, not things you pick up in your average Physics warehouse.
Programming
As far as I understand, programming is making the computer do what you can’t be bothered to do, or do not have a long enough life span to do, yourself. Most analysis is crunched by programs, and most beautiful data visuals are now drawn by them. Although we had done basic programming before (simple loops and stringing together conditional statements), we needed programming as the glue that ties everything else together. As a data scientist, you could probably get away with a certain level of R or Python, but you’d be reliant on your back-end developers to retrieve and manipulate data and on your front-end developers to showcase it for you! Limiting, huh?
Data Manipulation
Having both worked as data scientists before leaving for Thailand, we quickly understood that the majority of Data Science is actually finding, cleaning and reformatting data. Although it doesn’t sound exciting, a thorough understanding of current data formats, querying databases and building interfaces for your data models will allow you to work well in a team and more importantly actually leave the office on time!
Machine Learning & Algorithms
This topic is broad, with a master’s worth of active research in numerous fields and areas. However, it is also where our motivation to become data scientists came from: building self-driving cars, identifying people based on their ear lobes 🙂. What these all share is teaching a computer to recognise patterns as the human brain does, whether it is oncoming traffic or YouTube cats. We will be studying some of the most common current tools for uncovering patterns, but it is an active field and a lifelong learning process.
The Plan
This is our study plan: it is divided into two terms (circa 2 months each) and contains the courses we have found to be most suitable for learning each of the above topics. NB: It also includes some time to work on projects that apply what we are learning, to mimic the structure of a Masters that would include a final project.
Pre-Requisites & Pre-Read
We completed the courses below in the evenings and at weekends whilst still in our corporate jobs, before fully embarking on this journey. We therefore consider them pre-requisites, because some of the new courses build upon them.
– Programming Methodology: CS106A
Also, as we didn’t have previous JavaScript experience and needed some concepts for the visualisation course, we set reading Eloquent JavaScript as a pre-read.
Term 1
Time | Mon | Tue | Wed | Thu | Fri | Sat |
9 – 10 | NLP | CS106B | NLP | CS106B | CS106B | |
10 – 11 | NLP | CS106B | NLP | CS106B | CS106B | Spanish [1] |
11 – 12 | NLP | CS106B | NLP | CS106B | Stats Work | Spanish |
12 – 13 | Lunch | Lunch | Lunch | Lunch | Lunch | Lunch |
13 – 14 | NLP | Stats | NLP | Stats | Vis | |
14 – 15 | DB | Stats | DB | Stats | Vis | Catch-up [2] |
15 – 16 | DB | Stats Work | DB | Stats Work | DB | Catch-up |
16 – 17 | CS106B | Vis | CS106B | Vis | DB | |
17 – 18 | CS106B | Vis | CS106B | Vis | | |
Term 2
Time | Mon | Tue | Wed | Thu | Fri | Sat | Sun |
9 – 10 | CS169 [3] | Ruby | CS169 | Project [4] | CS169 | Project | |
10 – 11 | CS169 | Ruby | CS169 | Project | CS169 | Project | Spanish |
11 – 12 | CS169 | RoR | CS169 | Project | CS169 | Project | Spanish |
12 – 13 | Lunch | RoR | Lunch | Project | Lunch | Project | Lunch |
13 – 14 | CS169 | Lunch | RoR | Project | Lunch | Project | |
14 – 15 | API | Web Dev [5] | API | Project | API | Project | |
15 – 16 | Choice [6] | CS169 | Choice | Project | Choice | Project | |
16 – 17 | Choice | CS169 | Choice | Project | Choice | Project | |
17 – 18 | Choice | CS169 | Choice | Project | Choice | Project | |
[1] Yep, we thought it would be fun and a useful skill to learn Spanish!
[2] Whatever work needs catching up on, if and when we fall behind…
[3] Part 2 of CS169 is also running on edX.
[4] Project work hasn’t yet been defined, as we hope ideas will take shape as we work through the programme.
[5] This hour will comprise a mix of web development skills, such as Twitter Bootstrap.
[6] This course will be a choice of the following, or another that we decide on nearer the time as our skills and interests become more defined:
– Programming Paradigms CS106C
– Probability and Random Variables