MOOC List is learner-supported. When you buy through links on our site, we may earn an affiliate commission.
Learning Objectives:
1. Understand the importance of data processing and manipulation in the data analysis pipeline.
2. Learn techniques to handle missing values in datasets, including imputation and exclusion strategies.
3. Detect outliers and assess their impact on data analysis and decision-making.
4. Explore sampling methods and dimension reduction techniques for large datasets and high-dimensional data.
5. Apply data scaling techniques to normalize and standardize variables for meaningful comparisons.
6. Utilize discretization to transform continuous data into categorical representations, simplifying analysis.
7. Understand the concept of a data cube and perform multidimensional aggregation for exploratory analysis.
8. Create pivot tables to summarize and reshape data, gaining valuable insights from complex datasets.
Throughout the course, students will actively engage in practical exercises and projects, allowing them to apply data processing and manipulation techniques to real-world datasets. By the end of the course, participants will be well-equipped to effectively prepare, clean, and transform data for subsequent analysis tasks and data-driven decision-making.
This course is part of the Data Wrangling with Python Specialization.
What you'll learn
- Understand the importance of data processing and manipulation in the data analysis pipeline.
- Learn techniques for handling missing values and outliers, reducing data, and scaling and discretizing variables.
- Understand the concept of a data cube and perform multidimensional aggregation for exploratory analysis.
Syllabus
Missing Values and Outliers
Module 1
The "Missing Values and Outliers" week focuses on how to handle missing values and detect outliers using the Pandas library. You will learn essential techniques to identify and address missing data effectively, as well as methods to detect and manage outliers in datasets.
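As a rough illustration of what this module covers, the sketch below counts missing values, contrasts exclusion (dropping rows) with imputation (filling with the column median), and flags outliers with the 1.5 × IQR rule. The toy data, the choice of median imputation, and the IQR threshold are assumptions for illustration, not the course's own material.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing value per column and one extreme price.
df = pd.DataFrame({
    "price": [10.0, 12.0, np.nan, 11.0, 13.0, 250.0],
    "qty": [1, 2, 2, np.nan, 3, 2],
})

# Count missing values per column.
missing_counts = df.isna().sum()

# Exclusion strategy: drop every row that contains a missing value.
dropped = df.dropna()

# Imputation strategy: fill each column's gaps with that column's median.
imputed = df.fillna(df.median())

# Outlier detection with the 1.5 * IQR rule on the imputed price column.
q1 = imputed["price"].quantile(0.25)
q3 = imputed["price"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (imputed["price"] < lower) | (imputed["price"] > upper)

print(missing_counts)
print(imputed[is_outlier])  # only the 250.0 row is flagged
```

Median imputation is used here because it is less sensitive to the extreme value than the mean would be.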
Data Reduction
Module 2
The "Data Reduction" week focuses on how to reduce data through sampling and dimensionality reduction using the Pandas library. You will learn essential techniques to obtain manageable subsets of data while preserving meaningful information for analysis and visualization.
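A minimal sketch of both ideas, assuming synthetic data: `DataFrame.sample` draws a reproducible random subset, and the columns are then reduced with principal component analysis (PCA). Pandas has no built-in PCA, so this sketch computes it with NumPy's SVD; the course may use a different reduction method or library.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: 1,000 rows, 5 numeric columns; "c" and "d" nearly duplicate "a".
a = rng.normal(size=1000)
df = pd.DataFrame({
    "a": a,
    "b": rng.normal(size=1000),
    "c": 2 * a + rng.normal(scale=0.01, size=1000),
    "d": -a + rng.normal(scale=0.01, size=1000),
    "e": rng.normal(size=1000),
})

# Simple random sampling: keep 10% of the rows, reproducibly.
sample = df.sample(frac=0.1, random_state=42)

# PCA via SVD of the column-centered sample.
centered = sample - sample.mean()
u, s, vt = np.linalg.svd(centered.to_numpy(), full_matrices=False)
explained = s**2 / np.sum(s**2)  # share of total variance per component

# Keep the first k components as the reduced representation.
k = 3
reduced = pd.DataFrame(
    centered.to_numpy() @ vt[:k].T,
    index=sample.index,
    columns=[f"pc{i + 1}" for i in range(k)],
)

print(explained.round(3))
print(reduced.shape)  # (100, 3)
```

Because three of the five columns are nearly collinear, three components retain almost all of the variance in the sample.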
Scaling and Discretization
Module 3
The "Scaling and Discretization" week focuses on the importance of data scaling and discretization in data preprocessing. You will learn why and how to scale data to normalize variables and handle features measured on different scales. Additionally, you will explore the concept of data discretization and its application in transforming continuous data into categorical representations.
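The sketch below shows common forms of both operations in Pandas: min-max normalization to [0, 1], z-score standardization, and discretization with `pd.cut` (equal-width or explicit bins) and `pd.qcut` (equal-frequency bins). The age values and bin edges are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy column of ages.
ages = pd.Series([18, 25, 33, 47, 52, 61, 70], name="age")

# Min-max normalization to the range [0, 1].
minmax = (ages - ages.min()) / (ages.max() - ages.min())

# Z-score standardization: mean 0, standard deviation 1
# (Series.std uses the sample standard deviation, ddof=1).
zscore = (ages - ages.mean()) / ages.std()

# Equal-width discretization into 3 bins.
equal_width = pd.cut(ages, bins=3)

# Explicit, labeled bins; intervals are right-closed by default.
labeled = pd.cut(ages, bins=[0, 30, 60, 120], labels=["young", "middle", "senior"])

# Equal-frequency (quantile) discretization into 3 bins.
equal_freq = pd.qcut(ages, q=3, labels=["low", "mid", "high"])
```

Equal-width bins are easy to interpret but can be unevenly populated; quantile bins balance the counts at the cost of irregular edges.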
Data Warehouse
Module 4
The "Data Warehouse" week focuses on the concepts and methodologies of organizing data using data cubes and pivot tables in Pandas. You will learn the importance of data warehousing for efficient data management and analysis, as well as how to construct data cubes and pivot tables to facilitate multidimensional data exploration.
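A minimal sketch, assuming a small hypothetical sales table: `pd.pivot_table` summarizes revenue by region and product with totals, while a grouped, multi-indexed Series stands in for a data cube that can be rolled up (summing out a dimension) or sliced (fixing one dimension's value). Mapping cube operations onto `groupby` and `xs` is an assumption of this sketch; the course may structure its cubes differently.

```python
import pandas as pd

# Hypothetical sales records with three dimensions and one measure.
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "East", "West"],
    "product": ["A", "B", "A", "B", "A", "A"],
    "year": [2022, 2022, 2022, 2023, 2023, 2023],
    "revenue": [100, 150, 200, 120, 130, 90],
})

# Pivot table: regions as rows, products as columns, summed revenue,
# plus row and column totals labeled "All".
pivot = pd.pivot_table(sales, index="region", columns="product",
                       values="revenue", aggfunc="sum", fill_value=0,
                       margins=True)

# A small "data cube": revenue aggregated over every region/product/year combination.
cube = sales.groupby(["region", "product", "year"])["revenue"].sum()

# Roll-up: sum out the "year" dimension.
rollup = cube.groupby(level=["region", "product"]).sum()

# Slice: fix year = 2023 and keep the remaining dimensions.
slice_2023 = cube.xs(2023, level="year")

print(pivot)
```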