Getting and Cleaning Data (Coursera)

Before you can work with data you have to get some. This course will cover the basic ways that data can be obtained. The course will cover obtaining data from the web, from APIs, from databases and from colleagues in various formats. It will also cover the basics of data cleaning and how to make data “tidy”.

Tidy data dramatically speeds downstream data analysis tasks. The course will also cover the components of a complete data set, including raw data, processing instructions, codebooks, and processed data. The course will cover the basics needed for collecting, cleaning, and sharing data.
This course can be applied to multiple Specializations or Professional Certificate programs.

What You Will Learn

  • Understand common data storage systems
  • Apply data cleaning basics to make data "tidy"
  • Use R for text and date manipulation
  • Obtain usable data from the web, APIs, and databases
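
To illustrate what making data "tidy" can mean in practice, here is a minimal sketch. The course itself uses R; this example uses Python with only the standard library. The sample table, the function name `to_tidy`, and the column names are hypothetical and chosen for illustration only. It reshapes a "wide" table, where years appear as column headers, into one row per observation.

```python
import csv
import io

# Hypothetical "wide" table: the year (a variable) is stored in the headers.
WIDE_CSV = """country,2019,2020
Norway,10,12
Chile,7,9
"""


def to_tidy(wide_text, id_column, variable_name, value_name):
    """Reshape wide CSV text into tidy rows: one observation per row."""
    reader = csv.DictReader(io.StringIO(wide_text))
    tidy_rows = []
    for row in reader:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is copied into every tidy row
            tidy_rows.append({
                id_column: row[id_column],
                variable_name: column,   # former header becomes a value
                value_name: int(value),  # assumes integer measurements
            })
    return tidy_rows


tidy = to_tidy(WIDE_CSV, "country", "year", "count")
for record in tidy:
    print(record)
```

Each resulting row holds one country, one year, and one count, which is the layout most downstream analysis tools expect.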

Syllabus

WEEK 1
In this first week of the course, we look at finding data and reading different file types.
WEEK 2
This week the primary goal is to introduce you to the most common data storage systems and the appropriate tools to extract data from the web or from databases like MySQL.
WEEK 3
This week the lectures will focus on organizing, merging, and managing the data you have collected using the techniques from the Week 1 and Week 2 lectures.
WEEK 4
This week we finish up with lectures on text and date manipulation in R. In this final week we will also focus on peer grading of Course Projects.
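
As a rough sketch of the kind of text and date manipulation Week 4 refers to: the course teaches this in R, while the example below uses Python's standard library. The sample records, the helper names `clean_name` and `parse_date`, and the two date formats are assumptions made for illustration, not course material.

```python
from datetime import datetime

# Hypothetical raw records with inconsistent spacing, case, and date formats.
RAW_RECORDS = [
    ("  Alice SMITH ", "2021-03-05"),
    ("bob  jones", "05/03/2021"),
]

# Assumed formats: ISO year-month-day, then day/month/year.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def clean_name(raw):
    """Collapse runs of whitespace and normalize capitalization."""
    return " ".join(raw.split()).title()


def parse_date(raw):
    """Try each assumed format in turn and return a date object."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")


cleaned = [
    (clean_name(name), parse_date(date).isoformat())
    for name, date in RAW_RECORDS
]
for row in cleaned:
    print(row)
```

Trying formats in a fixed order is a simple design choice; ambiguous strings such as 05/03/2021 are read here as day/month/year only because that format is the one assumed in `DATE_FORMATS`.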

Related Courses

Communicating Data Science Results (Coursera)
University of Washington

Making predictions is not enough! Effective data scientists know how to explain and interpret their results, and communicate findings accurately to stakeholders to inform business decisions. Visualization is the field of research in computer science that studies effective communication of quantitative results by linking perception, cognition, and algorithms to exploit the enormous bandwidth of the human visual cortex. In this course you will learn to recognize, design, and use effective visualizations.

Jun 8th 2026
3 Weeks

The Structured Query Language (SQL) (Coursera)
University of Colorado Boulder

In this course you will learn all about the Structured Query Language ("SQL"). We will review the origins of the language and its conceptual foundations, but we will focus primarily on learning all the standard SQL commands, their syntax, and how to use these commands to analyze the data within a relational database. Our scope includes not only the SELECT statement for retrieving data and creating analytical reports, but also the DDL ("Data Definition Language") and DML ("Data Manipulation Language") commands necessary to create and maintain database objects.

Jun 9th 2026
5-12 Weeks

Understanding China, 1700-2000: A Data Analytic Approach, Part 1 (Coursera)
The Hong Kong University of Science and Technology - HKUST

The purpose of this course is to summarize new directions in Chinese history and social science produced by the creation and analysis of big historical datasets based on newly opened Chinese archival holdings, and to organize this knowledge in a framework that encourages learning about China in comparative perspective. Our course demonstrates how a new scholarship of discovery is redefining what is singular about modern China and modern Chinese history.

Jun 8th 2026
5-12 Weeks

Fundamentals of Social Media Advertising (Coursera)
Facebook

This course takes a deep dive into paid advertising on social media. Learn how to start advertising on platforms like Facebook and Instagram by developing effective ads. Learn how to work with design teams by capturing the essence of your ad campaign in a creative brief, and understand how privacy policies may affect your ads. Complete the course with a project where you will produce a creative brief with assets you would deliver to a design team for your ad campaign. You’ll also create your first social media ad.

Jun 9th 2026
5-12 Weeks

Big Data, Genes, and Medicine (Coursera)
The State University of New York

This course distills for you the expert knowledge and skills mastered by professionals in Health Big Data Science and Bioinformatics. You will learn exciting facts about human biology and chemistry, genetics, and medicine, intertwined with the science of Big Data and the skills to harness the avalanche of openly available data that we are just starting to make sense of.

Jun 8th 2026
5-12 Weeks

Using Python to Interact with the Operating System (Coursera)
Google

By the end of this course, you’ll be able to manipulate files and processes on your computer’s operating system. You’ll also have learned about regular expressions -- a very powerful tool for processing text files -- and you’ll get practice using the Linux command line on a virtual machine. And, this might feel like a stretch right now, but you’ll also write a program that processes a bunch of errors in an actual log file and then generates a summary file. That’s a super useful skill for IT Specialists to know.

Jun 9th 2026
5-12 Weeks

Exploratory Data Analysis (Coursera)
Johns Hopkins University

This course covers the essential exploratory techniques for summarizing data. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data.

Jun 8th 2026
4 Weeks

Foundations of strategic business analytics (Coursera)
ESSEC Business School

Who is this course for? This course is designed for students, business analysts, and data scientists who want to apply statistical knowledge and techniques to business contexts. For example, it may suit experienced statisticians, analysts, and engineers who want to move into a more business-oriented role. You will find this course exciting and rewarding if you already have a background in statistics, can use R or another programming language, and are familiar with databases and data analysis techniques such as regression, classification, and clustering.

Jun 8th 2026
4 Weeks

Introducción a Data Science: Programación Estadística con R (Coursera)
Universidad Nacional Autónoma de México

This course will give you the foundations of the statistical programming language R, the lingua franca of statistics, which will let you write programs that read, manipulate, and analyze quantitative data. We will explain how to install the language; you will also see an introduction to the base graphics systems and to the ggplot2 plotting package for visualizing these data. You will also cover the use of RStudio, one of the most popular IDEs in the R user community.

Jun 8th 2026
4 Weeks

Introduction to Recommender Systems: Non-Personalized and Content-Based (Coursera)
University of Minnesota

This course, which is designed to serve as the first course in the Recommender Systems specialization, introduces the concept of recommender systems, reviews several examples in detail, and leads you through non-personalized recommendation using summary statistics and product associations, basic stereotype-based or demographic recommendations, and content-based filtering recommendations.

Jun 8th 2026
4 Weeks

Introduction to Data Science in Python (Coursera)
University of Michigan

This course will introduce the learner to the basics of the Python programming environment, including fundamental Python programming techniques such as lambdas, reading and manipulating CSV files, and the NumPy library. The course will introduce data manipulation and cleaning techniques using the popular pandas data science library and introduce the abstraction of the Series and DataFrame as the central data structures for data analysis, along with tutorials on how to use functions such as groupby, merge, and pivot tables effectively. By the end of this course, students will be able to take tabular data, clean it, manipulate it, and run basic inferential statistical analyses.

Jun 8th 2026
4 Weeks