"Basics of Data Science" is designed to provide participants with a comprehensive overview of the fundamental challenges, concepts and tools of data science. The content is organized into three main areas of data science:
Initially, a brief overview is given of data science infrastructure, which is concerned with volume and velocity. Topics include instrumentation, big data infrastructures and distributed systems, databases and data management. The main challenge is to make things scalable and instant.
The main focus of the course is on data analysis, which is concerned with extracting knowledge from data. Key topics covered are data exploration and visualization, data preprocessing, data quality issues and transformations, various supervised learning techniques with a focus on their evaluation, unsupervised learning, clustering, pattern mining, process mining and text mining. The main challenge of data analysis is to provide answers to known and unknown unknowns.
Finally, data science affects people, organizations, and society. The course concludes by discussing challenges and providing guidelines and techniques for applying data science responsibly, with a focus on confidentiality and fairness. Topics include ethics & privacy, IT law, human-technology interaction, operations management, business models, and entrepreneurship. The main challenge is to do all of the above in a responsible manner.
Throughout the course, the ideas and concepts conveyed in the videos are complemented by hands-on exercises using Python (Jupyter notebooks). Participants will be guided to apply the presented techniques to artificial and real-life data sets to gain valuable practical experience.
After the course, participants should have a good overview of the best practices, challenges, goals and concepts of the broader data science field, providing a strong foundation for further study or professional development in this rapidly evolving area. Combined with hands-on experience with commonly used Python libraries, this will enable participants to conceptualize and implement various basic data analysis techniques in their own projects and to accurately evaluate and interpret analysis results.