Coursera

Data Processing and Manipulation (Coursera)

Offered by the University of Colorado Boulder.

The "Data Processing and Manipulation" course provides students with a comprehensive understanding of various data processing and manipulation concepts and tools. Participants will learn how to handle missing values, detect outliers, perform sampling and dimension reduction, apply scaling and discretization techniques, and explore data cube and pivot table operations. This course equips students with essential skills for efficiently preparing and transforming data for analysis and decision-making.


Learning Objectives:

Understand the importance of data processing and manipulation in the data analysis pipeline.
Learn techniques to handle missing values in datasets, including imputation and exclusion strategies.
Detect outliers and assess their impact on data analysis and decision-making.
Explore sampling methods and dimension reduction techniques for large datasets and high-dimensional data.
Apply data scaling techniques to normalize and standardize variables for meaningful comparisons.
Use discretization to transform continuous data into categorical representations, simplifying analysis.
Understand the concept of a data cube and perform multidimensional aggregation for exploratory analysis.
Create pivot tables to summarize and reshape data, gaining valuable insights from complex datasets.

Throughout the course, students will actively engage in practical exercises and projects, allowing them to apply data processing and manipulation techniques to real-world datasets. By the end of the course, participants will be well-equipped to effectively prepare, clean, and transform data for subsequent analysis tasks and data-driven decision-making.
This course is part of the Data Wrangling with Python Specialization.

What you'll learn

Understand the importance of data processing and manipulation in the data analysis pipeline.
Learn techniques to handle missing values and outliers, data reduction, and data scaling and discretization.
Understand the concept of a data cube and perform multidimensional aggregation for exploratory analysis.

Syllabus

Missing Values and Outliers
Module 1
The "Missing Values and Outliers" week focuses on how to handle missing values and detect outliers using the Pandas library. You will learn essential techniques to identify and address missing data effectively, as well as methods to detect and manage outliers in datasets.
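
As a rough illustration of the kind of pandas workflow this module covers, here is a minimal sketch, assuming pandas and NumPy are installed. The DataFrame, its column names, and its values are hypothetical. Median imputation and the 1.5 × IQR rule (Tukey's fences) are two common choices picked for this example; the course may teach other imputation or outlier-detection methods.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset (illustrative values only).
df = pd.DataFrame({
    "age": [25, np.nan, 31, 29, 120, 27],
    "income": [48000, 52000, np.nan, 61000, 58000, 50000],
})

# 1. Identify missing values: count NaNs in each column.
missing_counts = df.isna().sum()

# 2a. Exclusion strategy: drop every row that contains a missing value.
dropped = df.dropna()

# 2b. Imputation strategy: fill each column's NaNs with that column's median.
imputed = df.fillna(df.median(numeric_only=True))

# 3. Detect outliers in "age" with the 1.5 * IQR rule (Tukey's fences).
q1 = imputed["age"].quantile(0.25)
q3 = imputed["age"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outlier_mask = (imputed["age"] < lower) | (imputed["age"] > upper)
outliers = imputed[outlier_mask]

print(missing_counts.to_dict())  # one NaN per column
print(outliers)                  # the row with age == 120
```

Dropping rows is simplest but discards data (here, two of six rows); the median is used for imputation because, unlike the mean, it is not pulled toward extreme values such as the age of 120.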

Data Reduction
Module 2
The "Data Reduction" week focuses on how to reduce data through sampling and dimensionality reduction using the Pandas library. You will learn essential techniques to obtain manageable subsets of data while preserving meaningful information for analysis and visualization.
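
A minimal sketch of both reduction ideas follows, assuming pandas and NumPy are installed. The dataset is synthetic and every column name is hypothetical. Sampling uses pandas' own `DataFrame.sample` and `GroupBy.sample` (the latter requires pandas 1.1 or later). For dimensionality reduction, the sketch computes principal component analysis (PCA) directly with NumPy's SVD. That is a substitution chosen for self-containment, and the course may instead use another library or technique.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset: 1,000 rows; "x_noisy_copy" is almost redundant with "x".
rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
df = pd.DataFrame({
    "x": x,
    "x_noisy_copy": x + rng.normal(scale=0.05, size=n),
    "y": rng.normal(size=n),
    "group": rng.choice(["a", "b"], size=n, p=[0.8, 0.2]),
})

# Simple random sampling: keep 10% of the rows (random_state makes it reproducible).
simple_sample = df.sample(frac=0.1, random_state=42)

# Stratified sampling: take 10% of each group so the a/b proportions are preserved.
stratified_sample = df.groupby("group").sample(frac=0.1, random_state=42)

# PCA via SVD: center each feature column first (standardize too if scales differ).
features = df[["x", "x_noisy_copy", "y"]]
centered = features - features.mean()
_, singular_values, vt = np.linalg.svd(centered.to_numpy(), full_matrices=False)

# Share of total variance captured by each principal component.
explained_ratio = singular_values**2 / np.sum(singular_values**2)

# Project onto the first two components: 3 feature columns -> 2 columns.
reduced = pd.DataFrame(
    centered.to_numpy() @ vt[:2].T, columns=["pc1", "pc2"], index=df.index
)

print(len(simple_sample), len(stratified_sample))
print(explained_ratio.round(3))
```

Because two of the three features carry nearly the same information, the first two components retain almost all of the variance, so little is lost by dropping the third.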

Scaling and Discretization
Module 3
The "Scaling and Discretization" week focuses on the importance of data scaling and discretization in data preprocessing. You will learn why and how to scale data to normalize variables and handle variables measured on different scales. Additionally, you will explore the concept of data discretization and its application in transforming continuous data into categorical representations.
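
The sketch below shows two common scaling formulas and two common binning strategies in pandas, assuming pandas is installed. The DataFrame, column names, bin counts, and labels are hypothetical. Min-max normalization maps each column to [0, 1] with (x − min) / (max − min), and z-score standardization uses (x − mean) / std. `pd.cut` produces equal-width bins, while `pd.qcut` produces quantile (equal-frequency) bins.

```python
import pandas as pd

# Hypothetical data: two variables measured on very different scales.
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "income": [20000.0, 35000.0, 50000.0, 65000.0, 80000.0],
})

# Min-max normalization: rescale each column to the range [0, 1].
min_max = (df - df.min()) / (df.max() - df.min())

# Z-score standardization: mean 0, standard deviation 1 (pandas .std() uses ddof=1).
z_score = (df - df.mean()) / df.std()

# Equal-width discretization: 3 bins of equal width spanning the observed range.
df["height_bin"] = pd.cut(df["height_cm"], bins=3, labels=["short", "medium", "tall"])

# Equal-frequency discretization: split at the median into 2 quantile bins.
df["income_bin"] = pd.qcut(df["income"], q=2, labels=["lower", "upper"])

print(min_max)
print(df[["height_bin", "income_bin"]])
```

After either scaling step, height and income become directly comparable. With only five rows, the quantile split cannot be perfectly even: the median value 50000 falls in the lower bin, giving three "lower" rows and two "upper" rows.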

Data Warehouse
Module 4
The "Data Warehouse" week focuses on the concepts and methodologies of organizing data using data cubes and pivot tables in Pandas. You will learn the importance of data warehousing for efficient data management and analysis, as well as how to construct data cubes and pivot tables to facilitate multidimensional data exploration.
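
A minimal sketch of both structures in pandas follows, assuming pandas is installed. The sales table and its dimensions (region, product, quarter) are hypothetical. The pivot table uses `pd.pivot_table`. The "cube" here is a simplification: a `groupby` over all three dimensions gives only the most detailed level of aggregation, and coarser views such as a roll-up or a slice are derived from it afterward. A full data cube would precompute every combination of subtotals.

```python
import pandas as pd

# Hypothetical sales records: three dimensions and one measure ("sales").
sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "East", "West"],
    "product": ["A", "B", "A", "B", "A", "A"],
    "quarter": ["Q1", "Q1", "Q1", "Q2", "Q2", "Q2"],
    "sales":   [100, 150, 200, 50, 120, 80],
})

# Pivot table: regions as rows, quarters as columns, total sales in each cell;
# margins=True appends an "All" row and column holding the grand totals.
pivot = pd.pivot_table(
    sales, index="region", columns="quarter", values="sales",
    aggfunc="sum", margins=True, fill_value=0,
)

# Most detailed level of the cube: total sales for every region/product/quarter combination.
cube = sales.groupby(["region", "product", "quarter"])["sales"].sum()

# Roll-up: aggregate away the product and quarter dimensions (totals per region).
roll_up = cube.groupby(level="region").sum()

# Slice: fix one dimension (quarter == "Q1") and keep the remaining two.
q1_slice = cube.xs("Q1", level="quarter")

print(pivot)
print(roll_up)
```

The pivot table is a two-dimensional view intended for reading. The grouped Series keeps every dimension in its index, which makes further roll-ups and slices straightforward.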



Related Courses

Coursera

Rice University

An Introduction to Interactive Programming in Python (Part 1) (Coursera)

CS: Programming

This two-part course is designed to help students with very little or no computing background learn the basics of building simple interactive applications. Our language of choice, Python, is an easy-to-learn, high-level computer language that is used in many of the computational courses offered on Coursera. To make learning Python easy, we have developed a new browser-based programming environment that makes developing interactive applications in Python simple.

Aug 10th 2026

5-12 Weeks

Programming Python Interactive Applications

Coursera

University of Illinois at Urbana-Champaign

Visualization for Data Journalism (Coursera)

Statistics & Data Analysis Data Science

While telling stories with data has been part of news practice since its earliest days, it is in the midst of a renaissance. Graphics desks, which used to be regarded as "the art department," a subfield outside the work of newsrooms, are becoming a core part of newsroom operations. The people who design news graphics (they go by various titles: data journalists, news artists, graphic reporters, developers, etc.) are expected to be full-fledged journalists and to work closely with reporters and editors.

Aug 10th 2026

5-12 Weeks

Python Storytelling Data Analysis

Coursera

Pontificia Universidad Católica de Chile

Introducción a la programación en Python I: Aprendiendo a programar con Python (Coursera)

CS: Software Engineering CS: Programming

Steve Jobs said that "everyone should learn to program a computer, because it helps you think." Today, programming is a fundamental tool for the development of modern technology. This course introduces you to the world of programming in the Python language.

Aug 10th 2026

5-12 Weeks

Programming Python Python Programming

Coursera

Rice University

Principles of Computing (Part 2) (Coursera)

CS: Software Engineering CS: Theory

This two-part course introduces the basic mathematical and programming principles that underlie much of Computer Science. Understanding these principles is crucial to the process of creating efficient and well-structured solutions for computational problems. To get hands-on experience working with these concepts, we will use the Python programming language. The main focus of the class will be weekly mini-projects that build upon the mathematical and programming principles that are taught in the class.

Aug 10th 2026

4 Weeks

Programming Python Computing

Coursera

Board Infinity

Excel Essentials and Beyond (Coursera)

Statistics & Data Analysis

Dive into "Excel Essentials and Beyond," a comprehensive exploration of Excel, the world's leading spreadsheet tool. This course is designed both for newcomers to Excel and for those aiming for mastery. We begin by introducing Excel's robust interface and foundational features, ensuring a firm grasp of data organization techniques. As you advance, you'll delve into data visualization, transforming raw data into captivating stories.

Aug 17th 2026

5-12 Weeks

Excel Microsoft Excel Data Visualization

Coursera

University of Michigan

Estructuras de datos de Python (Coursera)

CS: Software Engineering Computer Science

This course will introduce the core data structures of the Python programming language. We will cover the basics of procedural programming and explore how we can use Python's built-in data structures, such as lists, dictionaries, and tuples, to perform increasingly complex data analysis. This course covers chapters 6 to 10 of the textbook "Python para todos" (Python for Everybody). This course covers Python 3.

Aug 17th 2026

5-12 Weeks

Python Data Structures Dictionary

Coursera

Whizlabs

Selenium WebDriver with Python (Coursera)

CS: Software Engineering

"Selenium WebDriver with Python" is a foundational course that aims to provide a comprehensive understanding of Selenium and its components, as well as how Selenium WebDriver operates. The course begins by demonstrating an environment setup for Selenium WebDriver with Python, then briefly describes locating web elements and web interactions. It also gives an overview of testing frameworks used with Selenium WebDriver. Advanced topics such as handling pop-ups, alerts, multiple browser tabs, and mouse and keyboard interactions are also covered.

Aug 17th 2026

3 Weeks

Python Python Programming Selenium

Coursera

University of Michigan

Uso de bases de datos con Python (Coursera)

CS: Software Engineering Computer Science

This course will introduce students to the basics of the Structured Query Language (SQL), as well as basic database design for storing data as part of a multi-step effort to gather, analyze, and process data. The course will use SQLite3 as its database. We will also build web crawlers and multi-step data gathering and visualization processes. We will use the D3.js library to perform basic data visualization.

Aug 17th 2026

5-12 Weeks

Programming Python Databases

Coursera

Board Infinity

Dataplex by Google Cloud (Coursera)

CS: Information & Technology

Welcome to "Dataplex by Google Cloud," a comprehensive course designed to provide a thorough understanding of Google Cloud Dataplex, a platform for managing, monitoring, and analyzing data across various data systems in Google Cloud. Spanning two modules, the course begins with the fundamentals of Dataplex, including its setup, configuration, and basic functionalities.

Aug 17th 2026

2 Weeks

Data Management Monitoring Data Analytics

Coursera

The Hong Kong University of Science and Technology - HKUST

Python and Statistics for Financial Analysis (Coursera)

Economics & Finance Business

Python is becoming the leading programming language for data science. Thanks to its simplicity and high readability, Python is gaining importance in the financial industry. The course combines Python coding with statistical concepts and applies both to analyzing financial data, such as stock data.

Aug 10th 2026

4 Weeks

Python Statistics Inference

Coursera

Edureka

Gen AI for Code Generation for Python (Coursera)

CS: Software Engineering

Welcome to the 'Gen AI for Code Generation for Python' course, where you'll embark on a journey to explore and develop your skills in the art of code generation with Generative AI. Throughout this short course, you will delve into various techniques for generating Python code effortlessly, ranging from simple scripts to complete end-to-end projects.

Aug 17th 2026

1 Week

Python Artificial Intelligence Coding

Coursera

Rice University

An Introduction to Interactive Programming in Python (Part 2) (Coursera)

CS: Software Engineering CS: Programming

Aug 10th 2026

4 Weeks

Game Programming Python