Serverless Data Processing with Dataflow: Operations (Coursera)

MOOC List is learner-supported. When you buy through links on our site, we may earn an affiliate commission.

In the last installment of the Dataflow course series, we will introduce the components of the Dataflow operational model. We will examine tools and techniques for troubleshooting and optimizing pipeline performance. We will then review testing, deployment, and reliability best practices for Dataflow pipelines.

We will conclude with a review of Templates, which make it easy to scale Dataflow pipelines to organizations with hundreds of users. These lessons will help ensure that your data platform is stable and resilient to unanticipated circumstances.


What You Will Learn

- Perform monitoring, troubleshooting, testing and CI/CD on Dataflow pipelines.

- Deploy Dataflow pipelines with reliability in mind to maximize the stability of your data processing platform.


Course 3 of 3 in the Serverless Data Processing with Dataflow Specialization


Syllabus


WEEK 1

Introduction

This module covers the course outline.

Monitoring

In this module, we learn how to use the Jobs List page to filter for jobs that we want to monitor or investigate. We look at how the Job Graph, Job Info, and Job Metrics tabs collectively provide a comprehensive summary of your Dataflow job. Lastly, we learn how we can use Dataflow’s integration with Metrics Explorer to create alerting policies for Dataflow metrics.

Logging and Error Reporting

In this module, we learn how to use the Log panel at the bottom of both the Job Graph and Job Metrics pages, and learn about the centralized Error Reporting page.

Troubleshooting and Debug

In this module, we learn how to troubleshoot and debug Dataflow pipelines. We will also review the four common modes of failure for Dataflow: failure to build the pipeline, failure to start the pipeline on Dataflow, failure during pipeline execution, and performance issues.


WEEK 2

Performance

In this module, we will discuss performance considerations we should be aware of while developing batch and streaming pipelines in Dataflow.

Testing and CI/CD

This module will discuss unit testing your Dataflow pipelines. We also introduce frameworks and features available to streamline your CI/CD workflow for Dataflow pipelines.
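As a minimal sketch of what a unit test can look like, assume the pipeline's per-element logic is factored into a plain Python function (the name `extract_words` and its behavior are hypothetical, not taken from the course). Such a function could be passed to a transform like `beam.FlatMap`, and its logic can be checked with the standard `unittest` module without launching a Dataflow job:

```python
import re
import unittest
from typing import List


def extract_words(line: str) -> List[str]:
    """Split one input line into lowercase words.

    Hypothetical element-wise logic that a pipeline might apply with
    something like beam.FlatMap(extract_words). Keeping it a plain
    function lets it be unit-tested without running any pipeline.
    """
    return re.findall(r"[a-z']+", line.lower())


class ExtractWordsTest(unittest.TestCase):
    def test_lowercases_and_drops_punctuation(self):
        self.assertEqual(extract_words("Hello, hello world!"), ["hello", "hello", "world"])

    def test_empty_line_yields_no_words(self):
        self.assertEqual(extract_words(""), [])


if __name__ == "__main__":
    unittest.main()
```

Testing element-level functions this way is fast and needs no runner; checking a composed transform end to end is typically done with Apache Beam's `TestPipeline` together with `assert_that`.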

Reliability

In this module, we will discuss methods for building systems that are resilient to corrupted data and data center outages.
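One widely used way to tolerate corrupted input is the dead-letter pattern: records that fail to parse are set aside together with the error message instead of crashing the pipeline. The sketch below assumes JSON input with a required `id` field (both assumptions), and the function name `split_valid_and_dead_letter` is hypothetical. It uses plain Python lists, whereas a Beam pipeline would typically express the same split with tagged outputs from a `DoFn`:

```python
import json
from typing import Any, Dict, Iterable, List, Tuple


def split_valid_and_dead_letter(
    raw_records: Iterable[str],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, str]]]:
    """Return (valid_records, dead_letter_records) for a batch of raw strings."""
    valid: List[Dict[str, Any]] = []
    dead_letter: List[Dict[str, str]] = []
    for raw in raw_records:
        try:
            record = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
            # Hypothetical schema rule: every record must be an object with an "id".
            if not isinstance(record, dict) or "id" not in record:
                raise ValueError("missing required field 'id'")
            valid.append(record)
        except ValueError as err:
            # Keep the untouched payload and the reason so it can be inspected or replayed later.
            dead_letter.append({"payload": raw, "error": str(err)})
    return valid, dead_letter


if __name__ == "__main__":
    good, bad = split_valid_and_dead_letter(['{"id": 1, "v": 10}', "not json", '{"v": 3}'])
    print(good)  # [{'id': 1, 'v': 10}]
    print(len(bad))  # 2
```

Keeping the raw payload in the dead-letter output, rather than only an error count, is what allows bad records to be examined and reprocessed after the underlying issue is fixed.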

Flex Templates

This module covers Flex Templates, a feature that helps data engineering teams standardize and reuse Dataflow pipeline code. Many operational challenges can be solved with Flex Templates.

Summary

This module reviews the topics covered in the course.


Course Auditing
42.00 EUR/month