Intermediate Data Analysis Techniques with Pandas

Packt via Coursera

Overview

This Pandas course focuses on DataFrame functionality, starting with an in-depth comparison of Series and DataFrame methods. You'll learn essential skills such as selecting columns, adding data, and using methods like value_counts and fillna for data cleaning. Advanced topics include filtering data, optimizing memory usage, handling missing values, and working with MultiIndex and text data. By exploring techniques for merging and concatenating DataFrames, you'll become proficient at complex data analysis tasks. The course is aimed at data analysts, data scientists, and other professionals who want to strengthen their Pandas skills for real-world data work.
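
As a rough illustration of the two cleaning methods named above, here is a minimal sketch. It is not taken from the course materials; the DataFrame, its column names, and the fill values are all made up for the example.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
df = pd.DataFrame({
    "city": ["Oslo", "Lima", None, "Oslo"],
    "sales": [10.0, None, 7.5, 12.0],
})

# Count how often each city appears; missing values are excluded by default.
city_counts = df["city"].value_counts()

# Replace missing values column by column; fillna returns a new DataFrame
# and leaves the original untouched.
cleaned = df.fillna({"city": "Unknown", "sales": 0.0})

print(city_counts)
print(cleaned)
```

Passing a dictionary to fillna lets each column get its own replacement value, which is usually safer than one blanket value for mixed text and numeric columns.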

Syllabus

  • DataFrames I: Introduction
    • In this module, we will explore the foundational concepts of working with DataFrames in Pandas, starting with a comparison of Series and DataFrame methods and attributes. You will learn to select and manipulate both single and multiple columns, and add new columns to your DataFrames. We will cover the use of value_counts for column analysis and strategies for handling missing values. Additionally, you'll master data type conversions using the astype method, sorting DataFrames with sort_values and sort_index, and ranking values within columns using the rank method.
  • DataFrames II: Filtering Data
    • In this module, we will dive into filtering data within DataFrames. You'll be introduced to the dataset and learn memory optimization techniques. We will cover filtering rows based on conditions and using logical operators like AND (&) and OR (|). Advanced filtering methods such as isin, isnull, and notnull will be explored. You'll also learn to filter data within a range using the between method, identify and handle duplicates with duplicated and drop_duplicates, and find and count unique values using unique and nunique methods.
  • DataFrames III: Data Extraction
    • In this module, we will explore essential data extraction techniques in Pandas. You'll start with an introduction to the dataset and learn to set and reset indices using the set_index and reset_index methods. We will cover retrieving rows by index position with iloc and by label with loc, and using the second argument of each accessor to select specific columns. You'll learn to overwrite individual and multiple values, rename index labels or columns, and delete rows or columns. Further extraction techniques will also be covered: sampling with the sample method, extracting specific rows with nsmallest and nlargest, conditional filtering with where, and executing functions across DataFrame rows or columns with apply.
  • Working with Text Data
    • In this module, we will focus on working with text data in Pandas. You'll start with an introduction to the dataset and learn to use common string methods for text data manipulation. We will cover filtering DataFrame rows using string methods and applying these methods to DataFrame indices and columns. You'll master the split method to divide text data into multiple parts and enhance your skills with additional practice exercises. Finally, you'll learn to customize text splitting using the expand and n parameters of the split method for more detailed analysis.
  • MultiIndex
    • In this module, we will explore the advanced capabilities of MultiIndex in Pandas, starting with an introduction to its concepts. You'll learn to create and manage MultiIndex DataFrames for complex data grouping and analysis. We will cover techniques to extract and rename index level values for clarity, and how to sort and extract specific rows for better data organization. Additionally, you'll master methods like transpose, stack, and unstack to reshape DataFrames, and apply pivot, melt, and pivot_table methods to reorganize and transform data efficiently.
  • GroupBy
    • In this module, we will delve into the GroupBy functionality in Pandas, starting with an introduction to its essential concepts for data aggregation. You'll learn to use the groupby method to group data and retrieve specific groups with the get_group method. We will explore various aggregation methods available on GroupBy objects and cover techniques for grouping data by multiple columns. Additionally, you'll master the agg method to apply multiple operations on grouped data and learn to iterate through groups for individual data processing.
  • Merging DataFrames
    • In this module, we will explore essential techniques for merging DataFrames in Pandas. You'll begin with an introduction to the various merging methods, followed by a detailed look at using the pd.concat function to concatenate DataFrames along a specified axis. We will cover left joins and the use of the left_on and right_on parameters to match differently named columns, as well as inner joins, which combine DataFrames on the keys they share. Additionally, you'll learn about full outer joins, which keep all keys from both frames, and how to merge on indexes using the left_index and right_index parameters. Finally, you'll be introduced to the join method as a simpler alternative for merging DataFrames.
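
The merging techniques listed in the final module can be sketched roughly as follows. This is an illustrative example under stated assumptions, not course code: the orders and customers tables and all of their column names are hypothetical.

```python
import pandas as pd

# Hypothetical tables, invented for illustration only.
orders = pd.DataFrame({"order_id": [1, 2, 3], "cust": [10, 20, 40]})
customers = pd.DataFrame(
    {"customer_id": [10, 20, 30], "name": ["Ann", "Bo", "Cy"]}
)

# Concatenate along the row axis; ignore_index renumbers the result.
more_orders = pd.DataFrame({"order_id": [4], "cust": [30]})
all_orders = pd.concat([orders, more_orders], axis=0, ignore_index=True)

# Left join: keep every order, matching differently named key columns.
left = orders.merge(
    customers, how="left", left_on="cust", right_on="customer_id"
)

# Inner join: keep only keys present in both frames.
inner = orders.merge(
    customers, how="inner", left_on="cust", right_on="customer_id"
)

# Full outer join: keep all keys from both frames.
outer = orders.merge(
    customers, how="outer", left_on="cust", right_on="customer_id"
)

# Merge a column on the left against the index on the right.
by_index = orders.merge(
    customers.set_index("customer_id"),
    how="left",
    left_on="cust",
    right_index=True,
)

print(left)
print(outer)
```

Rows without a match on the other side are filled with missing values, so the left and outer results here contain NaN where an order has no customer record or a customer has no order.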

Taught by

Packt - Course Instructors
