Data Science is a rapidly developing field, and demand for data scientists is rising steadily. According to LinkedIn, job openings for data scientists increased by 56% over the past year. More and more people want to start a career in Data Science or plan to use some Data Science techniques in their work. The question that arises for them is, “Where do I start learning Data Science?” There is no simple answer to this question. Data Science is a complex, multidisciplinary field. It employs techniques and theories from statistics, multivariable calculus, linear algebra, and Machine Learning. A data scientist needs good knowledge of the fields mentioned above, as well as strong programming and data visualization skills. There are many offline and online university programs for those who want to earn a degree in Data Science. In this article, we consider the case of a person who already has sufficient background in math, statistics, and programming, and we focus on online resources specifically for Data Science.
The basic concepts and techniques of Data Science can be learned in different ways, but in general it is better to use a resource that gives a complete picture of the subject, such as a MOOC. E-books are also very useful for understanding the basic concepts of Data Science. Books usually cover a subject in more depth, but with less breadth, than MOOCs. So, in my opinion, the best way to start is to find a MOOC or e-book that matches your level in the prerequisite skills mentioned above. Below we list some MOOC platforms, courses, and e-books that can be helpful for beginners.
· Machine Learning by Stanford University. This is an extremely popular (2M+ enrolled participants) and highly rated (4.9) course. It includes multiple case studies and applications.
· Applied Data Science with Python Specialization by the University of Michigan. This program contains 5 courses that offer a good introduction to Data Science through the Python programming language.
· Statistics and Data Science by Massachusetts Institute of Technology. This advanced, graduate-level program includes probability and statistics courses that make it a well-rounded Data Science curriculum.
· Microsoft Professional Program in Data Science by Microsoft. This program is focused mostly on learning key Data Science tools and widely used programming languages.
· Python for Data Science and Machine Learning Bootcamp. This course is a good way to learn how to use NumPy, Pandas, Seaborn, Matplotlib, Scikit-Learn, and other tools used in Data Science.
This list of courses is incomplete, and many new courses appear each year. You can also search for courses on other platforms such as DataCamp, fast.ai, and Udacity. One last observation: courses on the Coursera and edX platforms are usually better for theory and foundational material. That is not surprising, since they are made with the participation of leading universities. Udemy courses are generally better for more applied learning material.
- Python Data Science Handbook and Python for Data Analysis. These books introduce the core libraries for working with data in Python. It is worth mentioning that O’Reilly Media has published many wonderful books on Data Science.
- Understanding Machine Learning: From Theory to Algorithms. This book helps you understand machine learning algorithms more deeply.
Both beginners and advanced data scientists regularly need detailed explanations of specific functionality in Data Science tools. These can be found in the documentation of the tools themselves, such as scikit-learn, Python, NumPy, pandas, matplotlib, IPython, and NLTK. And of course, another widely used source for fixing errors and looking for ideas is Stack Overflow (Cross Validated is also helpful).
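Much of this documentation is also available directly from a Python session through docstrings. The following is a minimal sketch of that idea, not a recommended workflow: the helper `first_doc_line` is a hypothetical name introduced here, and the standard-library function `statistics.mean` is used only as a stand-in for any documented function from the tools above.

```python
import inspect
import statistics


def first_doc_line(obj):
    """Return the first line of an object's docstring, or '' if it has none."""
    doc = inspect.getdoc(obj) or ""
    return doc.splitlines()[0] if doc else ""


# statistics.mean is only a stand-in (an assumption for illustration);
# the same calls work on any Python function that has a docstring.
summary = first_doc_line(statistics.mean)
signature = str(inspect.signature(statistics.mean))

print(f"mean{signature}: {summary}")
```

In an interactive session, the built-in `help()` function prints the full docstring instead of just its first line.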
Along with techniques and tools, another important part of Data Science is the data itself. In particular, we mean datasets that can be used for learning Data Science. There are several standard datasets that are mostly clean, small enough to fit into memory and to review in a spreadsheet, and well suited to demonstrating a new learning technique. Most of the sources listed above use datasets such as the Wine Quality Dataset, the Iris Flowers Dataset, the Banknote Dataset, and some others. The problem is that with these standard datasets you cannot really get a feel for how well your technique performs when the complexity, size, and noisiness of the data are scaled up. These datasets are also extremely boring. So as soon as you start learning, you will face the problem of testing your knowledge on real-life datasets. So, where can you find such datasets?
There are plenty of data sources available online. Most governments give free access to some of their data: European government datasets, US Gov Data, Indian Government Dataset, and many others. It is also possible to find datasets from companies such as Amazon, Google, Microsoft, etc. One of the best places to find plenty of real-life datasets is Kaggle. This platform has a big community where you can discuss data, and there you can find public code with algorithms that solve the prediction problem for a specific dataset.
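To make the earlier point about clean standard datasets concrete, here is a minimal Python sketch of how the same technique can look perfect on tidy data and much weaker once noise grows. Everything in it is an assumption made for illustration: the data is synthetic (two 2-D clusters), the nearest-centroid classifier is chosen only because it is short, and the function names and noise levels are hypothetical.

```python
import random


def make_blobs(n_per_class, noise, rng):
    """Generate labelled 2-D points around two class centres with Gaussian noise."""
    centres = {0: (0.0, 0.0), 1: (4.0, 4.0)}
    data = []
    for label, (cx, cy) in centres.items():
        for _ in range(n_per_class):
            point = (rng.gauss(cx, noise), rng.gauss(cy, noise))
            data.append((point, label))
    return data


def fit_centroids(train):
    """Compute the mean point of each class in the training data."""
    sums = {}
    for (x, y), label in train:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def predict(centroids, point):
    """Return the label of the centroid closest to the point."""
    x, y = point
    return min(
        centroids,
        key=lambda label: (x - centroids[label][0]) ** 2
        + (y - centroids[label][1]) ** 2,
    )


def accuracy(noise, seed=0, n_per_class=200):
    """Train on one synthetic sample and score on another with the same noise."""
    rng = random.Random(seed)
    train = make_blobs(n_per_class, noise, rng)
    test = make_blobs(n_per_class, noise, rng)
    centroids = fit_centroids(train)
    correct = sum(predict(centroids, p) == label for p, label in test)
    return correct / len(test)


# Illustrative noise levels: tightly separated clusters vs. heavily overlapping ones.
clean_accuracy = accuracy(noise=0.1)
noisy_accuracy = accuracy(noise=5.0)

print(f"clean data: {clean_accuracy:.2f}")
print(f"noisy data: {noisy_accuracy:.2f}")
```

Real-life datasets bring further difficulties that this toy setup does not model, such as missing values and sheer size, which is why practicing on them is worthwhile.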
Many data scientists believe that the fastest way to learn Data Science is by working on competitions. Data competitions are a great way to learn best practices and gather feedback on your work. Data competition platforms such as Kaggle, DrivenData, CodaLab, CrowdANALYTICX, etc. are also sources of datasets and professional communities.
It is also worth noting some useful online platforms where you can find articles about Data Science, code examples, and Data Science news: KDnuggets, Quora, Reddit, Data Science Central, Dataconomy, and many others on Twitter, Facebook, YouTube, etc.
In conclusion, there are different job types for data scientists according to their roles in a Data Science team: Data Analyst, Data Scientist, Data Engineer, Machine Learning Engineer. Their learning paths are slightly different, but the starting point described above is common to all of these job types.