
University of Colorado Boulder

Fundamental Tools of Data Wrangling

University of Colorado Boulder via Coursera

Overview

Data wrangling is a crucial step in data analysis: it transforms and prepares raw data into a format suitable for analysis. The "Fundamental Tools of Data Wrangling" course is designed to give participants the essential skills and knowledge to manipulate, clean, and analyze data effectively. Participants are introduced to the fundamental tools commonly used in data wrangling, including Python, data structures, NumPy, and pandas. Through hands-on exercises and practical examples, they gain the proficiency needed to work with various data formats and prepare data for analysis.

In this course, participants work with data using Python as the primary programming language. They learn about data structures such as lists, dictionaries, and arrays, and how to use them to store and organize different types of data. They also explore Python packages such as random and math for generating data and performing mathematical operations on it, and are introduced to NumPy, a library for numerical computing, to work efficiently with multi-dimensional arrays and matrices.

A significant focus of the course is pandas, a versatile library for data manipulation and analysis. Participants learn techniques to clean, reshape, and aggregate data using pandas, enabling them to derive insights from messy datasets.
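
As a rough illustration of the kind of material the overview mentions (lists, dictionaries, and the random and math packages), the following minimal sketch uses only the Python standard library. It is not taken from the course; the variable names and the "sensor readings" scenario are assumptions made for this example.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# A list holds ordered values; here, five hypothetical sensor readings.
readings = [round(random.uniform(10.0, 20.0), 2) for _ in range(5)]

# A dictionary maps names to values; here, simple summary statistics.
mean = sum(readings) / len(readings)
summary = {
    "count": len(readings),
    "mean": mean,
    # Population standard deviation, computed with math.sqrt.
    "std": math.sqrt(sum((x - mean) ** 2 for x in readings) / len(readings)),
}

print(readings)
print(summary)
```

The list keeps the raw values in order, while the dictionary gives each derived quantity a readable key, which is the usual division of labor between the two structures.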

Syllabus

  • Python
    • This week provides an introduction to the Python programming language, covering fundamental concepts and practical applications. You will gain a solid understanding of Python's syntax and semantics, enabling you to write efficient and concise code. We will also cover essential topics such as basic variables and operations, flow control structures, functions, and the utilization of external packages to enhance Python's capabilities.
  • Data Structures
    • The "Data Structures" week provides you with a comprehensive understanding of commonly used data structures for efficient organization and manipulation of data. You will explore various data structures, including strings, lists, sets, and dictionaries. Through theoretical explanations and practical examples, you will grasp the advantages of using each data structure and learn the fundamental operations associated with them.
  • NumPy
    • The "NumPy" week serves as an introduction to the fundamental concepts and practical applications of NumPy, a powerful library for numerical computing in Python. You will gain insights into the advantages of using NumPy for efficient data manipulation and mathematical operations. The week will cover the underlying data structure of NumPy arrays and guide you through basic array operations, including accessing and manipulating elements. You will also cover more advanced operations, such as masking and filtering, to perform complex data manipulations effectively.
  • Pandas
    • The "Pandas" week provides you with a comprehensive introduction to Pandas, a powerful and widely used library for data manipulation and analysis in Python. You will explore the advantages of using Pandas for handling structured data efficiently. The week will cover the core data structures of Pandas, namely DataFrames and Series, and guide you through basic data operations, including accessing and manipulating data. You will also cover more advanced data manipulations, such as masking, filtering, aggregating, and pivot tables, to effectively analyze and transform datasets.
  • Case Study
    • The "Case Study" week offers you the opportunity to apply the knowledge you have gained throughout the course in a practical simulation case study. Through hands-on exercises and real-world scenarios, you will use Python and relevant packages to create a dummy dataset that mimics a real dataset you might encounter in data analysis or scientific research. Throughout the case study, you will face challenges commonly encountered in real-world data analysis and will be encouraged to employ critical thinking and problem-solving skills to overcome them. This practical exercise will not only consolidate your understanding of Python and relevant packages but also foster a deeper appreciation for the importance of data preparation and analysis in various domains.
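
The operations named in the syllabus (creating a dummy dataset, masking, filtering, aggregating, and pivot tables) might look something like the sketch below. This is not the course's own case study: the column names, sites, years, and values are hypothetical, and the example assumes NumPy and pandas are installed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # seeded generator for reproducibility

# Hypothetical dummy dataset: 12 measurements across 3 sites and 2 years.
df = pd.DataFrame({
    "site": np.repeat(["A", "B", "C"], 4),
    "year": np.tile([2021, 2021, 2022, 2022], 3),
    "value": rng.normal(loc=50.0, scale=10.0, size=12),
})

# Simulate a messy dataset by blanking out two values.
df.loc[[1, 7], "value"] = np.nan

# Masking and filtering: keep only rows with a recorded value.
mask = df["value"].notna()
clean = df[mask]

# Aggregating: mean value per site.
site_means = clean.groupby("site")["value"].mean()

# Pivot table: sites as rows, years as columns, mean value in each cell.
pivot = clean.pivot_table(index="site", columns="year",
                          values="value", aggfunc="mean")

print(site_means)
print(pivot)
```

A boolean mask is used for the filtering step because the same pattern works on both NumPy arrays and pandas objects, which matches the way the syllabus introduces masking in both weeks.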

Taught by

Di Wu

Reviews

4.7 rating at Coursera based on 15 ratings
