Data Processing and Manipulation (Coursera)

The "Data Processing and Manipulation" course provides students with a comprehensive understanding of various data processing and manipulation concepts and tools. Participants will learn how to handle missing values, detect outliers, perform sampling and dimension reduction, apply scaling and discretization techniques, and explore data cube and pivot table operations. This course equips students with essential skills for efficiently preparing and transforming data for analysis and decision-making.

Learning Objectives:

  1. Understand the importance of data processing and manipulation in the data analysis pipeline.
  2. Learn techniques to handle missing values in datasets, including imputation and exclusion strategies.
  3. Identify and detect outliers to assess their impact on data analysis and decision-making.
  4. Explore sampling methods and dimension reduction techniques for large datasets and high-dimensional data.
  5. Apply data scaling techniques to normalize and standardize variables for meaningful comparisons.
  6. Utilize discretization to transform continuous data into categorical representations, simplifying analysis.
  7. Understand the concept of data cube and perform multidimensional aggregation for exploratory analysis.
  8. Create pivot tables to summarize and reshape data, gaining valuable insights from complex datasets.

Throughout the course, students will actively engage in practical exercises and projects, allowing them to apply data processing and manipulation techniques to real-world datasets. By the end of the course, participants will be well-equipped to effectively prepare, clean, and transform data for subsequent analysis tasks and data-driven decision-making.
This course is part of the Data Wrangling with Python Specialization.

What you'll learn

  • Understand the importance of data processing and manipulation in the data analysis pipeline.
  • Learn techniques to handle missing values and outliers, data reduction, and data scaling and discretization.
  • Understand the concept of data cube and perform multidimensional aggregation for exploratory analysis.

Syllabus

Missing Values and Outliers
Module 1
The "Missing Values and Outliers" week focuses on how to handle missing values and detect outliers using the Pandas library. You will learn essential techniques to identify and address missing data effectively, as well as methods to detect and manage outliers in datasets.
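As a rough illustration of the techniques this module names, the following sketch uses pandas on a made-up table. The column names, the values, the choice of median imputation, and the 1.5 x IQR outlier rule are all assumptions made for the example, not material taken from the course.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names and values are illustrative only.
df = pd.DataFrame({
    "age": [25, 31, np.nan, 42, 38, 29],
    "income": [45_000, 52_000, 48_000, np.nan, 51_000, 400_000],
})

# Identify missing data: count the missing values in each column.
missing_counts = df.isna().sum()

# Exclusion strategy: drop every row that has at least one missing value.
dropped = df.dropna()

# Imputation strategy: fill each column's gaps with that column's median.
imputed = df.fillna(df.median())

# IQR rule (an assumed convention): flag values that lie more than
# 1.5 * IQR below the first quartile or above the third quartile.
q1 = imputed["income"].quantile(0.25)
q3 = imputed["income"].quantile(0.75)
iqr = q3 - q1
is_outlier = (imputed["income"] < q1 - 1.5 * iqr) | (imputed["income"] > q3 + 1.5 * iqr)
outliers = imputed[is_outlier]
```

With this data, only the row with an income of 400,000 is flagged. Whether a flagged row should be dropped, capped, or kept depends on the analysis at hand.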

Data Reduction
Module 2
The "Data Reduction" week focuses on how to reduce data through sampling and dimensionality reduction using the Pandas library. You will learn essential techniques to obtain manageable subsets of data while preserving meaningful information for analysis and visualization.
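A minimal sketch of both ideas, under stated assumptions: the data is synthetic, the 10% sampling fraction is arbitrary, and the dimensionality reduction is a hand-rolled principal component analysis. pandas itself has no PCA routine, so the sketch borrows NumPy's SVD for that step; the course may well use a different method or library.

```python
import numpy as np
import pandas as pd

# Synthetic data (an assumption for this example): x2 is nearly a
# multiple of x1, so the three numeric columns carry redundant information.
rng = np.random.default_rng(0)
n = 1_000
df = pd.DataFrame({
    "group": rng.choice(["a", "b"], size=n, p=[0.8, 0.2]),
    "x1": rng.normal(size=n),
})
df["x2"] = 2 * df["x1"] + rng.normal(scale=0.1, size=n)
df["x3"] = rng.normal(size=n)

# Simple random sample: 10% of the rows, reproducible via random_state.
simple = df.sample(frac=0.1, random_state=0)

# Stratified sample: 10% of each group, which keeps the group proportions.
stratified = df.groupby("group").sample(frac=0.1, random_state=0)

# PCA by hand: center the numeric columns, then take the SVD.
features = df[["x1", "x2", "x3"]]
centered = features - features.mean()
_, singular_values, components = np.linalg.svd(
    centered.to_numpy(), full_matrices=False
)

# Share of the total variance captured by each principal component.
explained_ratio = singular_values**2 / np.sum(singular_values**2)

# Project the data onto the first two principal components.
reduced = pd.DataFrame(
    centered.to_numpy() @ components[:2].T,
    columns=["pc1", "pc2"],
    index=df.index,
)
```

Inspecting `explained_ratio` is one way to judge how many components are worth keeping before discarding the rest.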

Scaling and Discretization
Module 3
The "Scaling and Discretization" week focuses on the importance of data scaling and discretization in the data preprocessing process. You will learn why and how to perform data scaling to normalize variables and handle data with different scales. Additionally, you will explore the concept of data discretization and its application in transforming continuous data into categorical representations.
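The sketch below shows one way these operations might look in pandas. The table, the column names, the bin counts, and the bin labels are invented for illustration, and min-max and z-score scaling are only two of several possible scaling schemes.

```python
import pandas as pd

# Hypothetical data with two columns on very different scales.
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "income": [20_000.0, 35_000.0, 50_000.0, 65_000.0, 200_000.0],
})

# Min-max normalization: rescale each column to the range [0, 1].
min_max = (df - df.min()) / (df.max() - df.min())

# Standardization (z-score): zero mean and unit standard deviation per
# column. ddof=0 selects the population standard deviation.
z_score = (df - df.mean()) / df.std(ddof=0)

# Equal-width discretization: three intervals of the same width.
height_bins = pd.cut(df["height_cm"], bins=3, labels=["short", "medium", "tall"])

# Equal-frequency discretization: split at the median into two bins.
income_bins = pd.qcut(df["income"], q=2, labels=["low", "high"])
```

Equal-width bins are easy to interpret but sensitive to extreme values such as the 200,000 income above; quantile-based bins balance the bin sizes instead.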

Data Warehouse
Module 4
The "Data Warehouse" week focuses on the concepts and methodologies of organizing data using data cubes and pivot tables in Pandas. You will learn the importance of data warehousing for efficient data management and analysis, as well as how to construct data cubes and pivot tables to facilitate multidimensional data exploration.
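To make the data cube and pivot table ideas concrete, here is a small sketch in pandas. The sales table and its dimension names (region, product, quarter) are hypothetical, and representing the cube as a grouped Series with a MultiIndex is one possible approach rather than the course's prescribed one.

```python
import pandas as pd

# Hypothetical sales records with three dimensions and one measure.
sales = pd.DataFrame({
    "region": ["East", "East", "East", "West", "West", "West"],
    "product": ["A", "B", "A", "A", "B", "B"],
    "quarter": ["Q1", "Q1", "Q2", "Q1", "Q2", "Q2"],
    "revenue": [100, 150, 120, 90, 200, 50],
})

# A small "cube": revenue summed over every (region, product, quarter) cell.
cube = sales.groupby(["region", "product", "quarter"])["revenue"].sum()

# Roll-up: aggregate away product and quarter, leaving totals per region.
by_region = cube.groupby(level="region").sum()

# Slice: fix one dimension (quarter == "Q1") and keep the other two.
q1_slice = cube.xs("Q1", level="quarter")

# Pivot table: regions as rows, products as columns, plus grand totals.
pivot = sales.pivot_table(
    index="region",
    columns="product",
    values="revenue",
    aggfunc="sum",
    fill_value=0,
    margins=True,
    margins_name="Total",
)
```

The `margins=True` option adds the "Total" row and column, which correspond to rolling up along each axis of the table.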

Related Courses

Data Analysis and Visualization (Coursera)
University at Buffalo, The State University of New York

By the end of this course, learners will have a high-level overview of data analysis and visualization tools and will be prepared to discuss best practices and develop an action plan that addresses key discoveries. The course begins with common hurdles that obstruct adoption of a data-driven culture, then introduces data analysis tools (R, Minitab, MATLAB, and Python). It looks more closely at statistical process control (SPC), a method for studying variation over time. The course also addresses do’s and don’ts of presenting data visually, visualization software (Tableau, Excel, Power BI), and creating a data story.

Jun 8th 2026
4 Weeks

Crash Course on Python (Coursera)
Google

This course is designed to teach you the foundations in order to write simple programs in Python using the most common structures. No previous exposure to programming is needed. By the end of this course, you'll understand the benefits of programming in IT roles; be able to write simple programs using Python; figure out how the building blocks of programming fit together; and combine all of this knowledge to solve a complex programming problem.

Jun 9th 2026
5-12 Weeks

Basic Data Processing and Visualization (Coursera)
University of California, San Diego

This is the first course in the four-course specialization Python Data Products for Predictive Analytics, introducing the basics of reading and manipulating datasets in Python. In this course, you will learn what a data product is and go through several Python libraries to perform data retrieval, processing, and visualization.

Jun 8th 2026
5-12 Weeks

Identifying Patient Populations (Coursera)
University of Colorado System

This course teaches you the fundamentals of computational phenotyping, a biomedical informatics method for identifying patient populations. In this course you will learn how different clinical data types perform when trying to identify patients with a particular disease or trait. You will also learn how to program different data manipulations and combinations to increase the complexity and improve the performance of your algorithms.

Jun 8th 2026
5-12 Weeks

Introduction to Python Programming (Coursera)
University of Pennsylvania

This course provides an introduction to programming and the Python language. Students are introduced to core programming concepts like data structures, conditionals, loops, variables, and functions. This course includes an overview of the various tools available for writing and running Python, and gets students coding quickly. It also provides hands-on coding exercises using commonly used data structures, writing custom functions, and reading and writing to files.

Jun 8th 2026
4 Weeks

Business Intelligence Concepts, Tools, and Applications (Coursera)
University of Colorado System

This is the fourth course in the Data Warehousing for Business Intelligence specialization. Ideally, the courses should be taken in sequence. In this course, you will gain the knowledge and skills for using data warehouses for business intelligence purposes and for working as a business intelligence developer. You’ll have the opportunity to work with large data sets in a data warehouse environment and will learn to use MicroStrategy's Online Analytical Processing (OLAP) and visualization capabilities to create visualizations and dashboards.

Jun 8th 2026
5-12 Weeks

Python for Data Science, AI & Development (Coursera)
IBM

Kickstart your learning of Python for data science, as well as programming in general, with this beginner-friendly introduction to Python. Python is one of the world’s most popular programming languages, and there has never been greater demand for professionals with the ability to apply Python fundamentals to drive business solutions across industries.

Jun 9th 2026
5-12 Weeks

Fitting Statistical Models to Data with Python (Coursera)
University of Michigan

In this course, we will expand our exploration of statistical inference techniques by focusing on the science and art of fitting statistical models to data. We will build on the concepts presented in the Statistical Inference course (Course 2) to emphasize the importance of connecting research questions to our data analysis methods. We will also focus on various modeling objectives, including making inference about relationships between variables and generating predictions for future observations.

Jun 8th 2026
4 Weeks

Data Warehouse Concepts, Design, and Data Integration (Coursera)
University of Colorado System

This is the second course in the Data Warehousing for Business Intelligence specialization. Ideally, the courses should be taken in sequence. In this course, you will learn concepts and skills for designing data warehouses and creating data integration workflows. These are fundamental skills for data warehouse developers and administrators. You will gain hands-on experience with data warehouse design and use open source products for manipulating pivot tables and creating data integration workflows.

Jun 8th 2026
5-12 Weeks

Introduction to Data Science in Python (Coursera)
University of Michigan

This course will introduce the learner to the basics of the Python programming environment, including fundamental Python programming techniques such as lambdas, reading and manipulating CSV files, and the NumPy library. The course will introduce data manipulation and cleaning techniques using the popular pandas data science library and introduce the abstraction of the Series and DataFrame as the central data structures for data analysis, along with tutorials on how to use functions such as groupby, merge, and pivot tables effectively. By the end of this course, students will be able to take tabular data, clean it, manipulate it, and run basic inferential statistical analyses.

Jun 8th 2026
4 Weeks

Machine Learning: Classification (Coursera)
University of Washington

Case Studies: Analyzing Sentiment & Loan Default Prediction. In our case study on analyzing sentiment, you will create models that predict a class (positive/negative sentiment) from input features (text of the reviews, user profile information, and so on). In our second case study for this course, loan default prediction, you will tackle financial data and predict whether a loan is likely to be risky or safe for the bank.

Jun 8th 2026
5-12 Weeks