EdX

Data Engineering with Databricks (edX)

Offered by AI (Pragmatic AI Labs)

Become an expert in modern data engineering on Databricks' unified lakehouse platform. Master ETL pipelines, data transformations with Apache Spark, and Delta Lake for reliable data management.

Master Data Engineering on the Databricks Lakehouse Platform

Learn Databricks architecture, cluster management & notebook analysis
Build reliable ETL pipelines with Delta Lake for data transformation
Implement advanced data processing techniques with Apache Spark

Course Highlights:

Create & scale Databricks clusters for workloads
Load data from diverse sources into notebooks
Explore, visualize & profile datasets with notebooks
Version control & share notebooks via Git integration
Read & ingest data in various file formats
Transform data with SQL & DataFrame operations
Handle complex data types like arrays, structs, timestamps
Deduplicate, join & flatten nested data structures
Identify & fix data quality issues with UDFs
Load cleansed data into Delta Lake for reliability
Build production-ready pipelines with Delta Live Tables
Schedule & monitor workloads using Databricks Jobs
Secure data access with Unity Catalog

Gain comprehensive skills in data engineering on Databricks through hands-on labs, real-world projects and best practices for the modern data lakehouse.
This course is part of the Large Language Model Operations (LLMOps) Professional Certificate.

What you'll learn

Use Databricks for data engineering and ML workloads
Create and design ML pipelines
Use Llamafile and other local LLMs like Mixtral

Syllabus

Module 1: Databricks Lakehouse Platform Fundamentals

Introduction to the Databricks Lakehouse Platform and its architecture
Creating, managing, and configuring clusters
Setting up and using Databricks with IntelliJ, RStudio, and the Databricks CLI
Introduction to notebooks, including execution, sharing, and multi-language support
Efficient data transformation with Spark SQL and the Catalog Explorer
Creating tables from files and querying external data sources
Reliable data pipelines with Delta Lake, ACID transactions, and Z-Ordering optimization

Module 2: Data Transformation and Pipelines

Automated pipelines with Delta Live Tables
Delta Live Tables components
Continuous vs triggered pipelines
Configuring Auto Loader
Querying pipeline events
End-to-end example of Delta Live Tables
Vacuum and garbage collection
Orchestrating workloads with Databricks Jobs
Multi-task workflows and task dependencies
Viewing job history
Using dashboards
Handling failures and configuring retries
Unified data access with Unity Catalog
Catalogs vs metastores
Unity Catalog quickstart in Python
Applying object security
Best practices for catalogs, connections, and business units

Related Courses

EdX

Google Cloud

Modernizing Data Lakes and Data Warehouses with Google Cloud (edX)

Computer Science

This course is intended for developers who are responsible for querying datasets, visualizing query results, and creating reports. Specific job roles include Data Engineer, Data Analyst, Database Administrator, and Big Data Architect.

Self-Paced

Data Warehouse Google Cloud Data Lake

EdX

AI (Pragmatic AI Labs)

Introduction to Generative AI (edX)

Robotics & Computer Vision

Unlock the Power of Generative AI: Master the Fundamentals and Explore Boundless Possibilities.

Self-Paced

Artificial Intelligence AI Prompt Engineering

Coursera

Microsoft

Extract, Transform and Load Data in Power BI (Coursera)

Statistics & Data Analysis Data Science

This course forms part of the Microsoft Power BI Analyst Professional Certificate. This Professional Certificate consists of a series of courses that offers a good starting point for a career in data analysis using Microsoft Power BI.

Jun 1st 2026

4 Weeks

Data Analysis Power BI Power Query

Coursera

Google

The Path to Insights: Data Models and Pipelines (Coursera)

Statistics & Data Analysis Data Science

This is the second of three courses in the Google Business Intelligence Certificate. In this course, you'll explore data modeling and how databases are designed. Then you’ll learn about extract, transform, load (ETL) processes that extract data from source systems, transform it into formats that enable analysis, and drive business processes and goals.

Jun 1st 2026

4 Weeks

Databases Data Management Data Modeling

EdX

IBM

Apache Spark for Data Engineering and Machine Learning (edX)

Computer Science

This short course introduces you to the fundamentals of Data Engineering and Machine Learning with Apache Spark, including Spark Structured Streaming, ETL for Machine Learning (ML) Pipelines, and Spark ML. By the end of the course, you will have hands-on experience applying Spark skills to ETL and ML workflows.

Self-Paced

ML Machine Learning Apache Spark

EdX

AI (Pragmatic AI Labs)

Cloud Computing Foundations (edX)

Computer Science

Learn the foundations of cloud computing and build websites using serverless, PaaS, and IaaS technologies. Apply DevOps principles and create continuous delivery pipelines for efficient cloud infrastructure management.

Self-Paced

Machine Learning Cloud Computing Cloud Infrastructures

EdX

AI (Pragmatic AI Labs)

Applied Local Large Language Models (edX)

Computer Science

Unlock the power of large language models on your machine. Master setup and interaction with cutting-edge LLMs through intuitive web interfaces and APIs. Explore diverse tools, programming languages, and frameworks like Hugging Face and Mozilla for seamless LLM integration. Gain invaluable skills for efficient local LLM deployment.

Self-Paced

Hugging Face Large Language Models LLMs

EdX

Microsoft

Data Science Essentials (edX)

Data Science

Explore data visualization and exploration concepts with experts from MIT and Microsoft, and get an introduction to machine learning. Demand for data science talent is exploding. Develop your career as a data scientist, as you explore essential skills and principles with experts from MIT and Microsoft. In this data science course, you will learn key concepts in data acquisition, preparation, exploration, and visualization. Plus, look at examples of how to build a cloud data science solution using Azure Machine Learning, R, and Python.

Course Not Available

Machine Learning Data Science Data Visualization

EdX

IBM

Big Data, Hadoop, and Spark Basics (edX)

Computer Science

This course provides foundational big data practitioner knowledge and analytical skills using popular big data tools, including Hadoop and Spark. Learn and practice your big data skills hands-on. Organizations need skilled, forward-thinking Big Data practitioners who can apply their business and technical skills to unstructured data such as tweets, posts, pictures, audio files, videos, sensor data, satellite imagery, and more, to identify the behaviors and preferences of prospects, clients, competitors, and others.

Self-Paced

Big Data Hadoop Apache Spark

EdX

AI (Pragmatic AI Labs)

Python and Pandas for Data Engineering (edX)

Computer Science

Master Python essentials and Pandas for data engineering. Learn to set up development environments, manipulate data, and efficiently solve real-world problems.

Self-Paced

Python Git Visual Studio Code

EdX

IBM

Building ETL and Data Pipelines with Bash, Airflow and Kafka (edX)

Engineering Computer Science

This course provides you with practical skills to build and manage data pipelines and Extract, Transform, Load (ETL) processes using shell scripts, Airflow and Kafka. Well-designed and automated data pipelines and ETL processes are the foundation of a successful Business Intelligence platform. Defining your data workflows, pipelines and processes early in the platform design ensures the right raw data is collected, transformed and loaded into desired storage layers and available for processing and analysis as and when required.

Self-Paced

ETL Kafka Bash

EdX

AI (Pragmatic AI Labs)

Large Language Models with Azure (edX)

Computer Science

Harness Azure's AI Power: Master Large Language Models (LLMs), Optimize Deployments, and Build Cutting-Edge Applications.

Self-Paced

Artificial Intelligence Azure Machine Learning Scalability