Throughout the course, students will work on their data wrangling project, applying the knowledge and skills gained in each module to achieve a refined and well-prepared dataset. By the end of the course, participants will be proficient in the data wrangling process and ready to tackle real-world data challenges in diverse domains.
This course is part of the Data Wrangling with Python Specialization.
What you'll learn
- Initiate and conduct a data wrangling project from raw data to a refined dataset for analysis.
- Apply data wrangling techniques learned in the specialization to handle real-life data scenarios.
- Utilize Python libraries and tools effectively for data wrangling tasks.
- Communicate and present data wrangling results effectively to stakeholders.
Data Wrangling Pipeline
In this introductory week, you will gain an understanding of the data wrangling pipeline, which serves as a structured approach to transform raw data into a cleaned and organized dataset for analysis. You will learn the key stages involved in the pipeline, setting the foundation for the rest of the course.
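The pipeline stages described above can be sketched in pandas. This is a minimal illustration, not course material: the records, column names, and the three stages shown (clean, filter, organize) are assumptions chosen to make the idea concrete.

```python
import pandas as pd

# Hypothetical raw records; names and values are illustrative only.
raw = pd.DataFrame({
    "name": [" Alice ", "Bob", "Bob", None],
    "age": ["34", "29", "29", "41"],
})

# Stage 1: clean — normalize text and fix types.
df = raw.assign(name=raw["name"].str.strip(), age=raw["age"].astype(int))

# Stage 2: filter — drop duplicates and rows missing key fields.
df = df.drop_duplicates().dropna(subset=["name"])

# Stage 3: organize — order rows for analysis.
df = df.sort_values("age").reset_index(drop=True)
print(df)
```

Each stage takes a DataFrame and returns a DataFrame, which is what lets the stages chain into a pipeline.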
Identify Your Data
This week, you will learn how to define the scope and objectives of your data wrangling project. You will explore various data sources, understand their structure, and assess each source's suitability for the project.
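Assessing a source's structure and suitability usually starts with a quick inspection. A minimal sketch, assuming pandas and using an in-memory CSV as a stand-in for a real file or API response (the contents are made up):

```python
import io
import pandas as pd

# Stand-in for a real data source; contents are illustrative only.
csv_source = io.StringIO("id,city,temp_c\n1,Oslo,4.5\n2,Lima,22.1\n")

df = pd.read_csv(csv_source)

# Quick structural assessment: size, column types, sample rows.
print(df.shape)   # (rows, columns)
print(df.dtypes)  # inferred type per column
print(df.head())  # first few records
```

The same three checks (shape, dtypes, head) apply regardless of whether the source is a CSV file, a database table, or an API payload loaded into a DataFrame.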
Data Collection and Integration
This week covers the data collection and integration stage of the data wrangling process. You will learn techniques for collecting data, validating the collected data, and integrating data from multiple sources.
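Integrating data from multiple sources often comes down to joining on a shared key, with a validation step first. A small sketch with pandas; the two tables and their columns are invented for illustration:

```python
import pandas as pd

# Two hypothetical sources sharing a key column.
customers = pd.DataFrame({"cust_id": [1, 2, 3],
                          "name": ["Ana", "Ben", "Cy"]})
orders = pd.DataFrame({"cust_id": [1, 1, 3],
                       "amount": [20.0, 35.0, 12.5]})

# Validate before integrating: the key should be unique per customer.
assert customers["cust_id"].is_unique

# Integrate: a left join keeps every customer, with or without orders.
combined = customers.merge(orders, on="cust_id", how="left")
print(combined)
```

Choosing `how="left"` here is a deliberate design decision: it preserves customers with no orders (their `amount` becomes NaN), which is often what you want when the left table is the authoritative entity list.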
Data Understanding and Visualization
This week focuses on gaining a comprehensive understanding of the dataset through statistical analysis and data visualization. You will learn how to perform descriptive statistics, create informative visualizations, and conduct exploratory data analysis (EDA).
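Descriptive statistics of the kind mentioned above can be computed in one call with pandas. A minimal sketch on made-up measurements; the column name and values are assumptions:

```python
import pandas as pd

# Hypothetical measurements for illustration.
df = pd.DataFrame({"height_cm": [150, 162, 171, 180, 168, 175]})

# Descriptive statistics: count, mean, spread, min/max, quartiles.
stats = df["height_cm"].describe()
print(stats)

# For a quick visual EDA step, a histogram would follow, e.g.:
# df["height_cm"].plot.hist(bins=5)  # requires matplotlib
```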
Data Processing and Manipulation
This week, you will delve into essential data processing and manipulation techniques. You will learn how to handle missing values, detect and handle outliers, perform data sampling and dimensionality reduction, apply data scaling and discretization, and explore data cubes and pivot tables.
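Several of the techniques listed above fit in a short pandas sketch. The data is fabricated for illustration, and the specific choices (median imputation, the 1.5 × IQR outlier rule, min–max scaling) are common defaults rather than the course's prescribed methods:

```python
import numpy as np
import pandas as pd

# Hypothetical sales records with one missing value and one outlier.
df = pd.DataFrame({
    "region": ["N", "N", "S", "S", "S"],
    "sales":  [100.0, np.nan, 120.0, 110.0, 9000.0],
})

# Missing values: impute with the column median.
df["sales"] = df["sales"].fillna(df["sales"].median())

# Outliers: flag values beyond 1.5 * IQR from the quartiles.
q1, q3 = df["sales"].quantile([0.25, 0.75])
iqr = q3 - q1
df["outlier"] = (df["sales"] < q1 - 1.5 * iqr) | (df["sales"] > q3 + 1.5 * iqr)

# Scaling: min-max scale sales into [0, 1].
s = df["sales"]
df["sales_scaled"] = (s - s.min()) / (s.max() - s.min())

# Pivot table: mean sales per region.
pivot = df.pivot_table(index="region", values="sales", aggfunc="mean")
print(pivot)
```

Note the order of operations matters: imputing before flagging outliers means the imputed value itself is judged by the same rule, and scaling after outlier detection keeps the flags interpretable in the original units.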