Building Batch Data Pipelines on Google Cloud (edX)

Offered by Google Cloud

This course is intended for developers responsible for designing pipelines and architectures for data processing. Data pipelines typically fall under one of the Extract-Load (EL), Extract-Load-Transform (ELT) or Extract-Transform-Load (ETL) paradigms. This course describes which paradigm should be used, and when, for batch data.

Furthermore, this course covers several technologies on Google Cloud for data transformation including BigQuery, executing Spark on Dataproc, pipeline graphs in Cloud Data Fusion and serverless data processing with Dataflow. Learners will get hands-on experience building data pipeline components on Google Cloud using Qwiklabs.
This course is part of the Google Cloud Data Engineer Learning Path Professional Certificate.
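
To make the three paradigms named in the description concrete, here is a minimal sketch, not taken from the course materials. It assumes a local in-memory SQLite database as a stand-in for a data warehouse; the table names, the sample rows, and the helper functions (`clean`, `run_el`, `run_etl`, `run_elt`, `fetch`) are all hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical raw records; names and values are illustrative only.
RAW_ROWS = [("alice", " 42 "), ("bob", "17"), ("carol", "n/a")]


def clean(rows):
    """Transform step: keep only rows whose score is a plain integer."""
    cleaned = []
    for name, score in rows:
        score = score.strip()
        if score.isascii() and score.isdigit():
            cleaned.append((name, int(score)))
    return cleaned


def run_el(conn, rows):
    """EL: load the data as-is, with no transformation."""
    conn.execute("CREATE TABLE el_scores (name TEXT, score TEXT)")
    conn.executemany("INSERT INTO el_scores VALUES (?, ?)", rows)


def run_etl(conn, rows):
    """ETL: transform outside the warehouse, then load the result."""
    conn.execute("CREATE TABLE etl_scores (name TEXT, score INTEGER)")
    conn.executemany("INSERT INTO etl_scores VALUES (?, ?)", clean(rows))


def run_elt(conn, rows):
    """ELT: load raw data first, then transform with SQL in the warehouse."""
    conn.execute("CREATE TABLE raw_scores (name TEXT, score TEXT)")
    conn.executemany("INSERT INTO raw_scores VALUES (?, ?)", rows)
    conn.execute(
        """
        CREATE TABLE elt_scores AS
        SELECT name, CAST(TRIM(score) AS INTEGER) AS score
        FROM raw_scores
        WHERE TRIM(score) <> '' AND TRIM(score) NOT GLOB '*[^0-9]*'
        """
    )


def fetch(conn, table):
    """Return all (name, score) rows of a table, ordered by name."""
    query = f"SELECT name, score FROM {table} ORDER BY name"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_el(conn, RAW_ROWS)
    run_etl(conn, RAW_ROWS)
    run_elt(conn, RAW_ROWS)
    for table in ("el_scores", "etl_scores", "elt_scores"):
        print(table, fetch(conn, table))
```

In this sketch, EL keeps all three raw rows untouched, while ETL and ELT end with the same cleaned table; they differ only in where the transformation runs. On Google Cloud the warehouse role would typically be played by BigQuery, and SQLite here is only a local stand-in.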

What you'll learn

  • Review different methods of data loading: EL, ELT and ETL and when to use what
  • Run Hadoop on Dataproc, leverage Cloud Storage, and optimize Dataproc jobs
  • Build your data processing pipelines using Dataflow
  • Manage data pipelines with Data Fusion and Cloud Composer

Prerequisites:
To benefit from this course, participants should have completed “Google Cloud Big Data and Machine Learning Fundamentals” or have equivalent experience.
Participants should also have:
• Basic proficiency with a common query language such as SQL.
• Experience with data modeling and ETL (extract, transform, load) activities.
• Experience with developing applications using a common programming language such as Python.
• Familiarity with machine learning and/or statistics.

Syllabus

  1. Introduction

In this module, we introduce the course and agenda.

  2. Introduction to Building Batch Data Pipelines

This module reviews different methods of data loading: EL, ELT and ETL and when to use what.

  3. Executing Spark on Dataproc

This module shows how to run Hadoop on Dataproc, how to leverage Cloud Storage, and how to optimize your Dataproc jobs.

  4. Serverless Data Processing with Dataflow

This module covers using Dataflow to build your data processing pipelines.

  5. Manage Data Pipelines with Cloud Data Fusion and Cloud Composer

This module shows how to manage data pipelines with Cloud Data Fusion and Cloud Composer.

  6. Course Summary

Course Summary

  7. Course Resources

PDF links to all modules

Related Courses

Smart Analytics, Machine Learning, and AI on Google Cloud (edX)
Google Cloud

Incorporating machine learning into data pipelines increases the ability of businesses to extract insights from their data. This course covers several ways machine learning can be included in data pipelines on Google Cloud, depending on the level of customization required.

Self-Paced

Data Storage and Processing (edX)
ITMO University, ITMOx

Master the culture of data representation, interpretation and outcomes evaluation. Learn the fundamentals of relational and NoSQL database management systems. Want to learn how to process data and interpret the results? This course is for you! Get acquainted with preparing and analyzing large amounts of data, as well as data storage fundamentals.

No sessions available
5-12 Weeks

Preparing for the Google Cloud Professional Data Engineer Exam (Coursera)
Google Cloud

From the course: "The best way to prepare for the exam is to be competent in the skills required of the job." This course uses a top-down approach to recognize knowledge and skills already known, and to surface information and skill areas for additional preparation. You can use this course to help create your own custom preparation plan. It helps you distinguish what you know from what you don't know. And it helps you develop and practice skills required of practitioners who perform this job.

Jun 13th 2026
5-12 Weeks

Data Engineering with Rust (Coursera)
Duke University

Are you a data engineer, software developer, or a tech enthusiast with a basic understanding of Rust, seeking to enhance your skills and dive deep into the realm of data engineering with Rust? Or are you a professional from another programming language background, aiming to explore the efficiency, safety, and concurrency features of Rust for data engineering tasks? If so, this course is designed for you.

Jun 11th 2026
4 Weeks

Machine Learning Operations 2 (MLOps2-AML): Data Pipeline Automation & Optimization using Microsoft Azure Machine Learning (AML) (edX)
Statistics.comX, Statistics.com

Most data science projects fail. There are various reasons why, but one of the primary reasons is the challenge of deployment. One piece to the deployment puzzle is understanding how to automate your pipeline’s functions and continuously optimize its performance, which is why we developed this course, MLOps2: Data Pipeline Automation & Optimization using Microsoft Azure Machine Learning (AML).

Self-Paced

Authoritative GCP (edX)
AI (Pragmatic AI Labs)

Master Google Cloud Architecture and prepare for the Professional Cloud Architect certification exam through hands-on labs and expert instruction. This comprehensive course, designed for cloud architects and developers, covers the essential skills needed to design, plan, and manage robust enterprise solutions on Google Cloud Platform (GCP).

Self-Paced

AI Skills for Engineers: Data Engineering and Data Pipelines (edX)
Delft University of Technology, DelftX

Good data is central to effective AI applications. This course teaches the basics of data for AI, covering what data is needed, how to extract data from existing databases and basic data skills including setup of a Python notebook environment, basic data exploration and simple data visualizations.

Self-Paced

Google Cloud Big Data and Machine Learning Fundamentals (edX)
Google Cloud

This course is intended for Data Analysts, Data Engineers, Data Scientists, and ML Engineers who are getting started with Google Cloud. It introduces the Google Cloud big data and machine learning products and services that support the data-to-AI lifecycle. It explores the processes, challenges, and benefits of building a big data pipeline and machine learning models with Vertex AI on Google Cloud.

Self-Paced