Data Collection and Integration (Coursera)

Data Collection and Integration (Coursera)

The "Data Collection and Integration" course provides students with comprehensive techniques for gathering data from diverse sources, including files, relational databases, web pages, and APIs. Participants will gain practical experience in collecting and integrating data for further processing and analysis. The course emphasizes the utilization of appropriate tools and packages, such as Pandas, Beautiful Soup, and SQL, to effectively handle real-life datasets and address data integration challenges.

Class Deals by MOOC List - Click here and see Coursera's Active Discounts, Deals, and Promo Codes.

What you'll learn

  • How to utilize Python and Python packages to collect data from various sources
  • How to integrate data collected from various sources to a unified dataset for further processing and analysis

This course is part of the Data Wrangling with Python Specialization.

Syllabus

Collect Data From Files
Module 1
The "Collect Data from Files" week focuses on equipping you with the necessary skills to handle various file formats, such as txt, csv, json, xml, html, and more, for effective data collection. You will learn how to read, parse, and extract relevant data from different file types, enabling you to gather valuable information from diverse sources.

Collect Data From Web
Module 2
The "Collect Data from Web" week focuses on empowering you with the skills to extract data from various webpage formats using Python libraries like requests and Beautiful Soup. You will learn how to access web pages, retrieve HTML content, and parse the data to collect relevant information effectively.

Collect Data From Database
Module 3
The "Collect Data from Database" week focuses on equipping you with the skills to interact with various SQL-like databases using Python packages. You will learn how to connect to databases, execute queries, and retrieve data from different database systems, enabling you to collect and utilize data efficiently.

Collect Data From APIs
Module 4
The "Collect Data from APIs" week focuses on enabling you to interact with various websites that provide Application Programming Interfaces (APIs). You will learn how to access APIs, retrieve data in structured formats (e.g., JSON or XML), and utilize Python to process and extract valuable information from API responses.

Data Integration
Module 5
The "Data Integration" week focuses on the techniques and methodologies for integrating data collected from various sources. You will learn how to combine and merge datasets, handle data inconsistencies, and create a unified dataset for further analysis and decision-making.

Case Study
Module 6
The "Case Study" week offers you the opportunity to apply the knowledge you have learned throughout the course in a practical and comprehensive case study. You will engage in data collection from various sources, including files, SQL-like databases, and web APIs, and then integrate the collected data into a unified dataset for further analysis. This week serves as a culminating activity, allowing you to demonstrate your skills in data collection, integration, and preparation for analysis.

Go to Class
MOOC List is learner-supported. When you buy through links on our site, we may earn an affiliate commission.

Related Courses

Introduction to Data Science in Python (Coursera) Coursera
University of Michigan

Introduction to Data Science in Python (Coursera)

This course will introduce the learner to the basics of the python programming environment, including fundamental python programming techniques such as lambdas, reading and manipulating csv files, and the numpy library. The course will introduce data manipulation and cleaning techniques using the popular python pandas data science library and introduce the abstraction of the Series and DataFrame as the central data structures for data analysis, along with tutorials on how to use functions such as groupby, merge, and pivot tables effectively. By the end of this course, students will be able to take tabular data, clean it, manipulate it, and run basic inferential statistical analyses.

Jun 22nd 2026
4 Weeks
Machine Learning with Python (Coursera) Coursera
IBM

Machine Learning with Python (Coursera)

This course dives into the basics of machine learning using an approachable, and well-known programming language, Python. In this course, we will be reviewing two main components: First, you will be learning about the purpose of Machine Learning and where it applies to the real world. Second, you will get a general overview of Machine Learning topics such as supervised vs unsupervised learning, model evaluation, and Machine Learning algorithms.

Jun 22nd 2026
5-12 Weeks
Machine Learning: Regression (Coursera) Coursera
University of Washington

Machine Learning: Regression (Coursera)

Case Study - Predicting Housing Prices. In our first case study, predicting house prices, you will create models that predict a continuous value (price) from input features (square footage, number of bedrooms and bathrooms,...). This is just one of the many places where regression can be applied. Other applications range from predicting health outcomes in medicine, stock prices in finance, and power usage in high-performance computing, to analyzing which regulators are important for gene expression.

Jun 22nd 2026
5-12 Weeks
Managing Big Data with MySQL (Coursera) Coursera
Duke University

Managing Big Data with MySQL (Coursera)

This course is an introduction to how to use relational databases in business analysis. You will learn how relational databases work, and how to use entity-relationship diagrams to display the structure of the data held within them. This knowledge will help you understand how data needs to be collected in business contexts, and help you identify features you want to consider if you are involved in implementing new data collection efforts.

Jun 22nd 2026
5-12 Weeks
Hypothesis Testing with Python and Excel (Coursera) Coursera
Tufts University

Hypothesis Testing with Python and Excel (Coursera)

In today's job market, leaders need to understand the fundamentals of data to be competitive. An essential procedure to understand business and analytics is hypothesis testing. This short course, designed by Tufts University expert faculty, will teach the fundamentals of hypothesis testing of a population mean and a population proportion, using Excel and Python for calculations. You'll also discover the central limit theorem, which is essential for hypothesis testing. To conclude the course, you will apply your newfound skills by creating a plan for an experiment in your own workplace that uses hypothesis testing.

Jun 23rd 2026
1 Week
Relational database systems (Coursera) Coursera
Universidad Nacional Autónoma de México

Relational database systems (Coursera)

Welcome to the specialization course Relational Database Systems. This course will be completed on six weeks, it will be supported with videos and various documents that will allow you to learn in a very simple way how several types of information systems and databases are available to solve different problems and needs of the companies.

Jun 22nd 2026
5-12 Weeks
Big Data Analytical Platform on Alibaba Cloud (Coursera) Coursera
Alibaba Cloud Academy

Big Data Analytical Platform on Alibaba Cloud (Coursera)

Building an Analytical Platform on Alibaba Cloud can empower how you take in, analyze, and demonstrate clear metrics from a set of Big Data. This course is designed to teach engineers how to use Alibaba Cloud Big Data products. It covers basic distributed system theory and Alibaba Cloud's core products like MaxCompute, DataWorks, E-MapReduce as well as a bundle of ecosystem tools.

Jun 22nd 2026
5-12 Weeks
Data Manipulation at Scale: Systems and Algorithms (Coursera) Coursera
University of Washington

Data Manipulation at Scale: Systems and Algorithms (Coursera)

Data analysis has replaced data acquisition as the bottleneck to evidence-based decision making --- we are drowning in it. Extracting knowledge from large, heterogeneous, and noisy datasets requires not only powerful computing resources, but the programming abstractions to use them effectively. The abstractions that emerged in the last decade blend ideas from parallel databases, distributed systems, and programming languages to create a new class of scalable data analytics platforms that form the foundation for data science at realistic scales.

Jun 22nd 2026
4 Weeks
Surveillance Systems: Analysis, Dissemination, and Special Systems (Coursera) Coursera
Johns Hopkins University

Surveillance Systems: Analysis, Dissemination, and Special Systems (Coursera)

In this course, we'll build on the previous lessons in this specialization to focus on some very specific skills related to public health surveillance. We'll learn how to get the most out of surveillance data analysis, focusing specifically on interpreting time trend data to detect temporal aberrations as well as person, place, and time in the context of surveillance data.

Jun 22nd 2026
4 Weeks
Machine Learning Foundations: A Case Study Approach (Coursera) Coursera
University of Washington

Machine Learning Foundations: A Case Study Approach (Coursera)

Do you have data and wonder what it can tell you? Do you need a deeper understanding of the core ways in which machine learning can improve your business? Do you want to be able to converse with specialists about anything from regression and classification to deep learning and recommender systems? In this course, you will get hands-on experience with machine learning from a series of practical case-studies.

Jun 22nd 2026
5-12 Weeks
Surveillance Systems: The Building Blocks (Coursera) Coursera
Johns Hopkins University

Surveillance Systems: The Building Blocks (Coursera)

Epidemiology is often described as the cornerstone science and public health and public health surveillance is a cornerstone of epidemiology. This course will help you build your technical awareness and skills for working with a variety of surveillance systems. Along the way, we'll focus on system objectives, data reporting, the core surveillance attributes, and performance assessment.

Jun 22nd 2026
4 Weeks
Framework for Data Collection and Analysis (Coursera) Coursera
University of Maryland, College Park

Framework for Data Collection and Analysis (Coursera)

This course will provide you with an overview over existing data products and a good understanding of the data collection landscape. With the help of various examples you will learn how to identify which data sources likely matches your research question, how to turn your research question into measurable pieces, and how to think about an analysis plan.

Jun 22nd 2026
4 Weeks