Data Engineering Capstone Project (Coursera)

Offered by IBM

In this course you will apply a variety of data engineering skills and techniques you have learned as part of the previous courses in the IBM Data Engineering Professional Certificate. You will assume the role of a Junior Data Engineer who has recently joined the organization and be presented with a real-world use case that requires a data engineering solution.

Course 13 of 13 in the IBM Data Engineering Professional Certificate.

Syllabus

WEEK 1
Data Platform Architecture and OLTP Database
In this module, you will design a data platform that uses MySQL as its OLTP database, and you will use MySQL to store the OLTP data.

WEEK 2
Querying Data in NoSQL Databases
In this module, you will design a data platform that uses MongoDB as a NoSQL database. You will use MongoDB to store the e-commerce catalog data.

WEEK 3
Build a Data Warehouse
In this module, you will design and implement a data warehouse, and then generate reports from the data it holds.
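
As a rough illustration of what "build a warehouse, then report from it" can look like, the sketch below creates one dimension table and one fact table and runs an aggregate report over them. It uses Python's built-in sqlite3 module purely as a stand-in, since the listing does not name the warehouse product used in the course. The table names (dim_category, fact_sales), the columns, and the sample rows are all hypothetical.

```python
import sqlite3


def build_warehouse():
    """Create a tiny in-memory star schema (hypothetical tables and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_category (
            category_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        CREATE TABLE fact_sales (
            sale_id     INTEGER PRIMARY KEY,
            category_id INTEGER NOT NULL REFERENCES dim_category(category_id),
            year        INTEGER NOT NULL,
            amount      REAL NOT NULL
        );
        """
    )
    # Made-up sample rows, only here so the report has something to aggregate.
    conn.executemany(
        "INSERT INTO dim_category VALUES (?, ?)",
        [(1, "Books"), (2, "Toys")],
    )
    conn.executemany(
        "INSERT INTO fact_sales (category_id, year, amount) VALUES (?, ?, ?)",
        [(1, 2020, 100.0), (1, 2021, 150.0), (2, 2020, 80.0), (2, 2021, 20.0)],
    )
    conn.commit()
    return conn


def sales_by_category(conn):
    """Report: total sales per category, joined from fact to dimension."""
    return conn.execute(
        """
        SELECT c.name, SUM(f.amount) AS total
        FROM fact_sales AS f
        JOIN dim_category AS c ON c.category_id = f.category_id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()
```

Calling sales_by_category(build_warehouse()) returns one (category, total) row per category. A real warehouse would typically carry more dimensions (date, country, item), but the join-then-aggregate shape of the report stays the same.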

WEEK 4
Data Analytics
In this module, you will assume the role of a data engineer at an e-commerce company. Your company has finished setting up a data warehouse. Now you are assigned the responsibility to design a reporting dashboard that reflects the key metrics of the business.

WEEK 5
ETL & Data Pipelines
In this module, you will use the given Python script to perform various ETL operations that move data from RDBMS to NoSQL, from NoSQL to RDBMS, and from both RDBMS and NoSQL to the data warehouse. You will write a pipeline that analyzes the web server log file, extracts the required lines and fields, and transforms and loads the data.
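
The log-file pipeline described above can be sketched in a few lines of Python. This is not the script supplied in the course. It is a minimal sketch that assumes the log is in Common Log Format, that only GET requests are wanted, and that the load target is a CSV stream. The sample lines, the field selection, and the function names are all hypothetical.

```python
import csv
import io
import re

# Hypothetical sample in Common Log Format; the course's real log is not shown.
SAMPLE_LOG = """\
198.51.100.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
203.0.113.9 - - [10/Oct/2023:13:56:01 +0000] "POST /cart HTTP/1.1" 302 512
malformed line without enough fields
198.51.100.7 - - [10/Oct/2023:13:57:12 +0000] "GET /search?q=laptop HTTP/1.1" 200 1789
"""

# ip, identity, user, [timestamp], "method path protocol", status, size
LINE_PATTERN = re.compile(
    r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) \S+'
)
FIELDS = ["ip", "timestamp", "path", "status"]


def extract(log_text):
    """Yield (ip, timestamp, method, path, status) for well-formed lines only."""
    for line in log_text.splitlines():
        match = LINE_PATTERN.match(line)
        if match:
            yield match.groups()


def transform(rows):
    """Keep GET requests and convert the status code to an integer."""
    for ip, timestamp, method, path, status in rows:
        if method == "GET":
            yield {"ip": ip, "timestamp": timestamp,
                   "path": path, "status": int(status)}


def load(records, out):
    """Write records as CSV to a file-like object; return the row count."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for record in records:
        writer.writerow(record)
        count += 1
    return count
```

A run against the sample looks like load(transform(extract(SAMPLE_LOG)), io.StringIO()). Because each stage is a generator, lines flow through one at a time, and the CSV writer could be swapped for a database insert without touching the extract or transform steps.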

WEEK 6
Big Data Analytics with Spark
In this module, you will use the data from a webserver to analyse search terms. You will then load a pretrained sales forecasting model and predict the sales forecast for a future year.

WEEK 7
Final Submission and Peer Review
In this final module, you will submit screenshots from the hands-on labs for your peers to review. Once you have completed your submission, you will review and grade the submission of one of your peers.

Related Courses

Python and Machine Learning for Asset Management (Coursera) Coursera
EDHEC Business School

Python and Machine Learning for Asset Management (Coursera)

This course will enable you to master machine-learning approaches in the area of investment management. It has been designed by two thought leaders in their field, Lionel Martellini from EDHEC-Risk Institute and John Mulvey from Princeton University. Starting from the basics, they will help you build practical skills to understand data science so you can make the best portfolio decisions.

Jun 1st 2026
5-12 Weeks
Regression Modeling in Practice (Coursera) Coursera
Wesleyan University

Regression Modeling in Practice (Coursera)

This course focuses on one of the most important tools in your data analysis arsenal: regression analysis. Using either SAS or Python, you will begin with linear regression and then learn how to adapt when two variables do not present a clear linear relationship. You will examine multiple predictors of your outcome and be able to identify confounding variables, which can tell a more compelling story about your results. You will learn the assumptions underlying regression analysis, how to interpret regression coefficients, and how to use regression diagnostic plots and other tools to evaluate the quality of your regression model. Throughout the course, you will share with others the regression models you have developed and the stories they tell you.

Jun 5th 2026
4 Weeks
Data Management and Visualisation (Coursera) Coursera
Wesleyan University

Data Management and Visualisation (Coursera)

Whether being used to customize advertising to millions of website visitors or streamline inventory ordering at a small restaurant, data is becoming more integral to success. Too often, we’re not sure how to use data to find answers to the questions that will make us more successful in what we do. In this course, you will discover what data is and think about what questions you have that can be answered by the data – even if you’ve never thought about data before. Based on existing data, you will learn to develop a research question, describe the variables and their relationships, calculate basic statistics, and present your results clearly.

Jun 1st 2026
4 Weeks
Python Data Analysis (Coursera) Coursera
Rice University

Python Data Analysis (Coursera)

This course will continue the introduction to Python programming that started with Python Programming Essentials and Python Data Representations. We'll learn about reading, storing, and processing tabular data, which are common tasks. We will also teach you about CSV files and Python's support for reading and writing them. CSV files are a generic, plain text file format that allows you to exchange tabular data between different programs. These concepts and skills will help you to further extend your Python programming knowledge and allow you to process more complex data.

Jun 1st 2026
4 Weeks
Machine Learning for Accounting with Python (Coursera) Coursera
University of Illinois at Urbana-Champaign

Machine Learning for Accounting with Python (Coursera)

This course, Machine Learning for Accounting with Python, introduces machine learning algorithms (models) and their applications in accounting problems. It covers classification, regression, clustering, text analysis, and time series analysis. It also discusses model evaluation and model optimization. This course provides an entry point for students to be able to apply proper machine learning models on business-related datasets with Python to solve various problems.

Jun 1st 2026
5-12 Weeks
Big Data Science with the BD2K-LINCS Data Coordination and Integration Center (Coursera) Coursera
Icahn School of Medicine at Mount Sinai

Big Data Science with the BD2K-LINCS Data Coordination and Integration Center (Coursera)

In this course we briefly introduce the DCIC and the various Centers that collect data for LINCS. We then cover metadata and how metadata is linked to ontologies. We then present data processing and normalization methods to clean and harmonize LINCS data. This is followed by discussions about how data is served as RESTful APIs. Most importantly, the course covers computational methods including: data clustering, gene-set enrichment analysis, interactive data visualization, and supervised learning. Finally, we introduce crowdsourcing/citizen-science projects where students can work together in teams to extract expression signatures from public databases and then query such collections of signatures against LINCS data for predicting small molecules as potential therapeutics.

Jun 1st 2026
5-12 Weeks
Data Processing Using Python (Coursera) Coursera
Nanjing University

Data Processing Using Python (Coursera)

This course is mainly for non-computer majors. It starts with the basic syntax of Python, then moves on to how to acquire data in Python locally and from the network, how to present data, how to conduct basic and advanced statistical analysis and visualization of data, and finally how to design a simple GUI to present and process data, advancing level by level.

Jun 1st 2026
5-12 Weeks
Data-driven Decision Making (Coursera) Coursera
PwC

Data-driven Decision Making (Coursera)

Welcome to Data-driven Decision Making. In this course, you'll get an introduction to Data Analytics and its role in business decisions. You'll learn why data is important and how it has evolved. You'll be introduced to “Big Data” and how it is used. You'll also be introduced to a framework for conducting Data Analysis and what tools and techniques are commonly used. Finally, you'll have a chance to put your knowledge to work in a simulated business setting. This course was created by PricewaterhouseCoopers LLP with an address at 300 Madison Avenue, New York, New York, 10017.

Jun 1st 2026
4 Weeks
Python Basics (Coursera) Coursera
University of Michigan

Python Basics (Coursera)

This course introduces the basics of Python 3, including conditional execution and iteration as control structures, and strings and lists as data structures. You'll program an on-screen Turtle to draw pretty pictures. You'll also learn to draw reference diagrams as a way to reason about program executions, which will help to build up your debugging skills.

Jun 1st 2026
4 Weeks
Understanding and Visualizing Data with Python (Coursera) Coursera
University of Michigan

Understanding and Visualizing Data with Python (Coursera)

In this course, learners will be introduced to the field of statistics, including where data come from, study design, data management, and exploring and visualizing data. Learners will identify different types of data, and learn how to visualize, analyze, and interpret summaries for both univariate and multivariate data. Learners will also be introduced to the differences between probability and non-probability sampling from larger populations, the idea of how sample estimates vary, and how inferences can be made about larger populations based on probability sampling.

Jun 1st 2026
4 Weeks
Python Project for Data Science (Coursera) Coursera
IBM

Python Project for Data Science (Coursera)

This mini-course is intended for you to demonstrate foundational Python skills for working with data. The completion of this course involves working on a hands-on project where you will develop a simple dashboard using Python. This course is part of the IBM Data Science Professional Certificate and the IBM Data Analytics Professional Certificate.

Jun 4th 2026
1 Week