Data Engineering with Databricks (edX)

Become an expert in modern data engineering on Databricks' unified lakehouse platform. Master ETL pipelines, data transformations with Apache Spark, and Delta Lake for reliable data management.

Master Data Engineering on Databricks Lakehouse Platform

  • Learn Databricks architecture, cluster management & notebook analysis
  • Build reliable ETL pipelines with Delta Lake for data transformation
  • Implement advanced data processing techniques with Apache Spark

Course Highlights:

  • Create & scale Databricks clusters for workloads
  • Load data from diverse sources into notebooks
  • Explore, visualize & profile datasets with notebooks
  • Version control & share notebooks via Git integration
  • Read & ingest data in various file formats
  • Transform data with SQL & DataFrame operations
  • Handle complex data types like arrays, structs, timestamps
  • Deduplicate, join & flatten nested data structures
  • Identify & fix data quality issues with UDFs
  • Load cleansed data into Delta Lake for reliability
  • Build production-ready pipelines with Delta Live Tables
  • Schedule & monitor workloads using Databricks Jobs
  • Secure data access with Unity Catalog

Gain comprehensive skills in data engineering on Databricks through hands-on labs, real-world projects and best practices for the modern data lakehouse.
This course is part of the Large Language Model Operations (LLMOps) Professional Certificate.

What you'll learn

  • Use Databricks for data engineering and ML workloads
  • Create and design ML pipelines
  • Use Llamafile and other local LLMs like Mixtral

Syllabus

Module 1: Databricks Lakehouse Platform Fundamentals

  • Introduction to the Databricks Lakehouse Platform and its architecture
  • Creating, managing, and configuring clusters
  • Setting up and using Databricks with IntelliJ, RStudio, and the Databricks CLI
  • Introduction to notebooks, including execution, sharing, and multi-language support
  • Efficient data transformation with Spark SQL and the Catalog Explorer
  • Creating tables from files and querying external data sources
  • Reliable data pipelines with Delta Lake, ACID transactions, and Z-Ordering optimization

Module 2: Data Transformation and Pipelines

  • Automated pipelines with Delta Live Tables
  • Delta Live Tables components
  • Continuous vs. triggered pipelines
  • Configuring Auto Loader
  • Querying pipeline events
  • End-to-end example of Delta Live Tables
  • Vacuum and garbage collection
  • Orchestrating workloads with Databricks Jobs
  • Multi-task workflows and task dependencies
  • Viewing job history
  • Using dashboards
  • Handling failures and configuring retries
  • Unified data access with Unity Catalog
  • Catalogs vs. metastores
  • Unity Catalog quickstart in Python
  • Applying object security
  • Best practices for catalogs, connections, and business units
