Serverless Data Processing with Dataflow: Develop Pipelines (edX)

Offered by Google Cloud

In this second installment of the Dataflow course series, we dive deeper into developing pipelines using the Beam SDK. We start with a review of Apache Beam concepts.

Next, we discuss processing streaming data using windows, watermarks, and triggers. We then cover options for sources and sinks in your pipelines, schemas to express your structured data, and how to do stateful transformations using the State and Timer APIs. We then move on to reviewing best practices that help maximize your pipeline performance. Towards the end of the course, we introduce SQL and DataFrames to represent your business logic in Beam, and show how to iteratively develop pipelines using Beam notebooks.
This course is part of the Google Cloud Data Engineer Learning Path Professional Certificate.

What you'll learn

  • Review main Apache Beam concepts covered in DE (Pipeline, PCollections, PTransforms, Runner; reading/writing, Utility PTransforms, side inputs, bundles & DoFn Lifecycle)
  • Review core streaming concepts covered in DE (unbounded PCollections, windows, watermarks, and triggers)
  • Select & tune the I/O of your choice for your Dataflow pipeline
  • Use schemas to simplify your Beam code & improve the performance of your pipeline
  • Implement best practices for Dataflow pipelines
  • Develop a Beam pipeline using SQL & DataFrames

Syllabus

  1. Introduction

This module introduces the course and course outline.

  2. Beam Concepts Review

Review main concepts of Apache Beam, and how to apply them to write your own data processing pipelines.

  3. Windows, Watermarks, and Triggers

In this module, you will learn how to process streaming data with Dataflow. There are three main concepts to learn: how to group data into windows, how the watermark indicates when a window is ready to produce results, and how triggers control when and how many times a window emits output.
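To make the relationship between windows and the watermark concrete, here is a minimal conceptual sketch in plain Python. It is not the Beam API: the class `FixedWindowSum`, the 60-second window size, and the choice to drop late data are all assumptions made for illustration. It groups timestamped values into fixed windows and emits a window's sum once the watermark passes the end of that window, which corresponds roughly to a default "emit once at the watermark" trigger.

```python
from collections import defaultdict


class FixedWindowSum:
    """Hypothetical helper: sums values per fixed event-time window."""

    def __init__(self, size):
        self.size = size                      # window length in seconds
        self.open_windows = defaultdict(int)  # window start -> running sum
        self.watermark = float("-inf")        # event time believed complete
        self.dropped_late = 0                 # elements that arrived too late

    def add(self, event_time, value):
        # Assign the element to the fixed window containing its event time.
        start = event_time - (event_time % self.size)
        if start + self.size <= self.watermark:
            # The window already closed; this sketch simply drops late data.
            self.dropped_late += 1
            return
        self.open_windows[start] += value

    def advance_watermark(self, new_watermark):
        # The watermark never moves backwards.
        self.watermark = max(self.watermark, new_watermark)
        ready = sorted(s for s in self.open_windows
                       if s + self.size <= self.watermark)
        # Emit (window_start, window_end, sum) for every closed window.
        return [(s, s + self.size, self.open_windows.pop(s)) for s in ready]


agg = FixedWindowSum(size=60)
agg.add(5, 1)
agg.add(42, 2)
agg.add(61, 10)
first = agg.advance_watermark(60)    # window [0, 60) is now complete
agg.add(30, 100)                     # late: its window has already fired
second = agg.advance_watermark(120)  # window [60, 120) is now complete
```

In a real pipeline, windowing, watermark tracking, and late-data handling are provided by the Beam SDK and the runner, and triggers let you emit early or repeated results instead of a single result per window.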

  4. Sources & Sinks

In this module, you will learn what makes up sources and sinks in Google Cloud Dataflow. The module goes over examples of TextIO, FileIO, BigQueryIO, PubsubIO, KafkaIO, BigtableIO, AvroIO, and Splittable DoFn, and points out useful features associated with each IO.

  5. Schemas

This module will introduce schemas, which give developers a way to express structured data in their Beam pipelines.
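As a rough illustration of why named, typed fields help, the following plain-Python sketch declares a record type with `typing.NamedTuple` and aggregates by field name rather than by tuple position. The `Purchase` type, its fields, and the `total_by_user` helper are hypothetical; this is only an analogy for structured records, not Beam code.

```python
from collections import defaultdict
from typing import NamedTuple


class Purchase(NamedTuple):
    """Hypothetical structured record with named, typed fields."""
    user_id: str
    item: str
    amount: float


def total_by_user(purchases):
    # Fields are referenced by name, so the logic stays readable and
    # does not depend on the position of each value in the record.
    totals = defaultdict(float)
    for p in purchases:
        totals[p.user_id] += p.amount
    return dict(totals)


purchases = [
    Purchase("u1", "book", 12.5),
    Purchase("u2", "pen", 1.5),
    Purchase("u1", "lamp", 30.0),
]
totals = total_by_user(purchases)
```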

  6. State and Timers

This module covers State and Timers, two powerful features that you can use in your DoFn to implement stateful transformations.
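The idea behind combining state and timers can be sketched in plain Python as follows. This is a conceptual simulation, not the Beam State and Timer APIs: the class `BufferingFn`, its parameters `max_batch` and `flush_after`, and the explicit `now` argument are all assumptions for illustration. Each key keeps a buffer (the state), and a per-key timer guarantees that a partially filled buffer is eventually flushed.

```python
class BufferingFn:
    """Hypothetical per-key batcher driven by state plus a timer."""

    def __init__(self, max_batch, flush_after):
        self.max_batch = max_batch      # flush when a buffer reaches this size
        self.flush_after = flush_after  # or this long after its first element
        self.buffers = {}               # per-key state: key -> list of values
        self.timers = {}                # per-key timer: key -> firing time

    def process(self, key, value, now):
        buf = self.buffers.setdefault(key, [])
        if not buf:
            # First element for this key: set a timer as a safety net.
            self.timers[key] = now + self.flush_after
        buf.append(value)
        if len(buf) >= self.max_batch:
            return [self._flush(key)]
        return []

    def fire_timers(self, now):
        # Flush every key whose timer is due.
        due = sorted(k for k, t in self.timers.items() if t <= now)
        return [self._flush(k) for k in due]

    def _flush(self, key):
        # Clear both the timer and the state for the key, emit the batch.
        self.timers.pop(key, None)
        return (key, self.buffers.pop(key))


fn = BufferingFn(max_batch=2, flush_after=10)
out1 = fn.process("a", 1, now=0)   # buffered, timer for "a" set to 10
out2 = fn.process("b", 7, now=1)   # buffered, timer for "b" set to 11
out3 = fn.process("a", 2, now=3)   # buffer for "a" is full, so it flushes
out4 = fn.fire_timers(now=11)      # timer for "b" fires and flushes it
```

In Beam itself, the runner stores the state and fires the timers per key and window; the sketch only mirrors the control flow.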

  7. Best Practices

This module will discuss best practices and review common patterns that maximize performance for your Dataflow pipelines.

  8. Dataflow SQL & DataFrames

This module introduces two new APIs to represent your business logic in Beam: SQL and DataFrames.

  9. Beam Notebooks

This module will cover Beam notebooks, an interface for Python developers to onboard onto the Beam SDK and develop their pipelines iteratively in a Jupyter notebook environment.

  10. Summary

This module provides a recap of the course.

Related Courses

Big Data Analysis Deep Dive (Coursera) Coursera
Alibaba Cloud Academy

Big Data Analysis Deep Dive (Coursera)

The job market for architects, engineers, and analytics professionals with Big Data expertise continues to grow. The Academy's Big Data Career path focuses on the fundamental tools and techniques needed to pursue a career in Big Data. This course includes: data processing with Python, writing and reading SQL queries, transmitting data with MaxCompute, analyzing data with Quick BI, using Hive, Hadoop, and Spark on E-MapReduce, and how to visualize data with data dashboards. Work through our course material, learn different aspects of the Big Data field, and get certified as a Big Data Professional!

Jun 8th 2026
5-12 Weeks
Hacking PostgreSQL: Data Access Methods (edX) EdX
Ural Federal University, UrFUx

Hacking PostgreSQL: Data Access Methods (edX)

Learn the science, engineering practices and hacking techniques of data access, a core aspect of information processing in a database. This course is about data storage and data processing technologies with examples from PostgreSQL. It is geared toward database core developers, operating system developers, system architects, and all those who want to understand databases in more detail.

No sessions available
13-24 Weeks
Big Data Science with the BD2K-LINCS Data Coordination and Integration Center (Coursera) Coursera
Icahn School of Medicine at Mount Sinai

Big Data Science with the BD2K-LINCS Data Coordination and Integration Center (Coursera)

In this course we briefly introduce the DCIC and the various Centers that collect data for LINCS. We then cover metadata and how metadata is linked to ontologies. We then present data processing and normalization methods to clean and harmonize LINCS data. This is followed by discussions about how data is served through RESTful APIs. Most importantly, the course covers computational methods including: data clustering, gene-set enrichment analysis, interactive data visualization, and supervised learning. Finally, we introduce crowdsourcing/citizen-science projects where students can work together in teams to extract expression signatures from public databases and then query such collections of signatures against LINCS data for predicting small molecules as potential therapeutics.

Jun 1st 2026
5-12 Weeks
Serverless Data Processing with Dataflow: Develop Pipelines (Coursera) Coursera
Google Cloud

Serverless Data Processing with Dataflow: Develop Pipelines (Coursera)

In this second installment of the Dataflow course series, we dive deeper into developing pipelines using the Beam SDK. We start with a review of Apache Beam concepts. Next, we discuss processing streaming data using windows, watermarks, and triggers. We then cover options for sources and sinks in your pipelines, schemas to express your structured data, and how to do stateful transformations using the State and Timer APIs.

Jun 1st 2026
3 Weeks
Serverless Data Processing with Dataflow: Foundations (Coursera) Coursera
Google Cloud

Serverless Data Processing with Dataflow: Foundations (Coursera)

This course is part 1 of a 3-course series on Serverless Data Processing with Dataflow. In this first course, we start with a refresher of what Apache Beam is and its relationship with Dataflow. Next, we talk about the Apache Beam vision and the benefits of the Beam Portability framework. The Beam Portability framework achieves the vision that a developer can use their favorite programming language with their preferred execution backend.

Jun 1st 2026
2 Weeks
Predictive Modeling and Machine Learning with MATLAB (Coursera) Coursera
MathWorks

Predictive Modeling and Machine Learning with MATLAB (Coursera)

In this course, you will build on the skills learned in Exploratory Data Analysis with MATLAB and Data Processing and Feature Engineering with MATLAB to increase your ability to harness the power of MATLAB to analyze data relevant to the work you do. These skills are valuable for those who have domain knowledge and some exposure to computational tools, but no programming background.

Jun 8th 2026
4 Weeks
Basic Data Processing and Visualization (Coursera) Coursera
University of California, San Diego

Basic Data Processing and Visualization (Coursera)

This is the first course in the four-course specialization Python Data Products for Predictive Analytics, introducing the basics of reading and manipulating datasets in Python. In this course, you will learn what a data product is and go through several Python libraries to perform data retrieval, processing, and visualization.

Jun 8th 2026
5-12 Weeks