Spark, Hadoop, and Snowflake for Data Engineering (Coursera)

Offered by Duke University,
Spark, Hadoop, and Snowflake for Data Engineering (Coursera)

This is primarily aimed at first- and second-year undergraduates interested in engineering or science, along with high school students and professionals with an interest in programming. Gain the skills for building efficient and scalable data pipelines.

Class Deals by MOOC List - Click here and see Coursera's Active Discounts, Deals, and Promo Codes.

Explore essential data engineering platforms (Hadoop, Spark, and Snowflake) as well as learn how to optimize and manage them. Delve into Databricks, a powerful platform for executing data analytics and machine learning tasks, while honing your Python data science skills with PySpark. Finally, discover the key concepts of MLflow, an open-source platform for managing the end-to-end machine learning lifecycle, and learn how to integrate it with Databricks.
This course is designed for learners who want to pursue or advance their career in data science or data engineering, or for software developers or engineers who want to grow their data management skill set. In addition to the technologies you will learn, you will also gain methodologies to help you hone your project management and workflow skills for data engineering, including applying Kaizen, DevOps, and Data Ops methodologies and best practices.
With quizzes to test your knowledge throughout, this comprehensive course will help guide your learning journey to become a proficient data engineer, ready to tackle the challenges of today's data-driven world.

What you'll learn

  • Create scalable data pipelines (Hadoop, Spark, Snowflake, Databricks) for efficient data handling.
  • Optimize data engineering with clustering and scaling to boost performance and resource use.
  • Build ML solutions (PySpark, MLFlow) on Databricks for seamless model development and deployment.
  • Implement DataOps and DevOps practices for continuous integration and deployment (CI/CD) of data-driven applications, including automating processes.

Syllabus

Overview and Introduction to PySpark
This week, you will learn how to work with different data engineering platforms, such as Hadoop and Spark, and apply their concepts to real-world scenarios. First, you will explore the fundamentals of Hadoop to store and process big data. Next, you will delve into Spark concepts, distributed computing, deferred execution, and Spark SQL. By the end of the week, you will gain hands-on experience with PySpark DataFrames, DataFrame methods, and deferred execution strategies.

Snowflake
This week, you will explore the Snowflake platform, gaining insights into its architecture and key concepts. Through hands-on practice in the Snowflake Web UI, you'll learn to create tables, manage warehouses, and use the Snowflake Python Connector to interact with tables. By the end of this week, you'll solidify your understanding of Snowflake's architecture and practical applications, emerging with the ability to effectively navigate and leverage the platform for data management and analysis.

Azure Databricks and MLFLow
This week, you will practice the essential skills for seamlessly managing machine learning workflows using Databricks and MLFlow. First, you will create a Databricks workspace and configure a cluster, setting the stage for efficient data analysis. Next, you will load a sample dataset into the Databricks workspace using the power of PySpark, enabling data manipulation and exploration. Finally, you will install MLFlow either locally or within the Databricks environment, gaining the ability to orchestrate the entire machine learning lifecycle. By the end of this week, you will be able to craft, track, and manage machine learning experiments within Databricks, ensuring precision, reproducibility, and optimal decision-making throughout your data-driven journey.

DataOps and Operations Methodologies
This week, you will explore the concepts of Kaizen, DevOps, and DataOps and how these methodologies synergistically contribute to efficient and seamless data engineering workflows. Through practical examples, you will learn how Kaizen's continuous improvement philosophy, DevOps' collaborative practices, and DataOps' focus on data quality and integration converge to enhance the development, deployment, and management of data engineering platforms. By the end of this week, you will have the knowledge and perspective needed to optimize data engineering processes and deliver scalable, reliable, and high-quality solutions.

Go to Class
MOOC List is learner-supported. When you buy through links on our site, we may earn an affiliate commission.

Related Courses

Big Data Modeling and Management Systems (Coursera) Coursera
University of California, San Diego

Big Data Modeling and Management Systems (Coursera)

Once you’ve identified a big data issue to analyze, how do you collect, store and organize your data using Big Data solutions? In this course, you will experience various data genres and management tools appropriate for each. You will be able to describe the reasons behind the evolving plethora of new big data platforms from the perspective of big data management systems and analytical tools.

Jun 8th 2026
5-12 Weeks
Explore Core Data Concepts in Microsoft Azure (Coursera) Coursera
Microsoft

Explore Core Data Concepts in Microsoft Azure (Coursera)

In this course, you will learn the fundamentals of database concepts in a cloud environment, get basic skilling in cloud data services, and build your foundational knowledge of cloud data services within Microsoft Azure. You will identify and describe core data concepts such as relational, non-relational, big data, and analytics, and explore how this technology is implemented with Microsoft Azure. You will explore the roles, tasks, and responsibilities in the world of data.

Jun 15th 2026
2 Weeks
Machine Learning with Apache Spark (Coursera) Coursera
IBM

Machine Learning with Apache Spark (Coursera)

Explore the exciting world of machine learning with this IBM course. Start by learning ML fundamentals before unlocking the power of Apache Spark to build and deploy ML models for data engineering applications. Dive into supervised and unsupervised learning techniques and discover the revolutionary possibilities of Generative AI through instructional readings and videos.

Jun 15th 2026
4 Weeks
Hadoop Platform and Application Framework (Coursera) Coursera
University of California, San Diego

Hadoop Platform and Application Framework (Coursera)

This course is for novice programmers or business people who'd like to understand the core tools used to wrangle and analyze big data. With no prior experience, you'll have the opportunity to walk through hands-on examples with Hadoop and Spark frameworks, two of the most common in the industry. You will be comfortable explaining the specific components and basic processes of the Hadoop architecture, software stack, and execution environment.

Jun 15th 2026
5-12 Weeks
Data Engineering Capstone Project (Coursera) Coursera
IBM

Data Engineering Capstone Project (Coursera)

In this course you will apply a variety of data engineering skills and techniques you have learned as part of the previous courses in the IBM Data Engineering Professional Certificate. You will assume the role of a Junior Data Engineer who has recently joined the organization and be presented with a real-world use case that requires a data engineering solution.

Jun 15th 2026
5-12 Weeks
A Crash Course in Data Science (Coursera) Coursera
Johns Hopkins University

A Crash Course in Data Science (Coursera)

By now you have definitely heard about data science and big data. In this one-week class, we will provide a crash course in what these terms mean and how they play a role in successful organizations. This class is for anyone who wants to learn what all the data science action is about, including those who will eventually need to manage data scientists. The goal is to get you up to speed as quickly as possible on data science without all the fluff. We've designed this course to be as convenient as possible without sacrificing any of the essentials.

Jun 15th 2026
1 Week
Big Data Integration and Processing (Coursera) Coursera
University of California, San Diego

Big Data Integration and Processing (Coursera)

At the end of the course, you will be able to: Retrieve data from example database and big data management systems; Describe the connections between data management operations and the big data processing patterns needed to utilize them in large-scale analytical applications; Identify when a big data problem needs data integration; Execute simple big data integration and processing on Hadoop and Spark platforms.

Jun 15th 2026
5-12 Weeks
The Importance of Listening (Coursera) Coursera
Northwestern University

The Importance of Listening (Coursera)

In this second MOOC in the Social Marketing Specialization - "The Importance of Listening" - you will go deep into the Big Data of social and gain a more complete picture of what can be learned from interactions on social sites. You will be amazed at just how much information can be extracted from a single post, picture, or video.

Jun 8th 2026
4 Weeks
Big Data, Artificial Intelligence, and Ethics (Coursera) Coursera
University of California, Davis

Big Data, Artificial Intelligence, and Ethics (Coursera)

This course gives you context and first-hand experience with the two major catalyzers of the computational science revolution: big data and artificial intelligence. With more than 99% of all mediated information in digital format and with 98% of the world population using digital technology, humanity produces an impressive digital footprint.

Jun 15th 2026
4 Weeks
Google Cloud Platform Fundamentals: Core Infrastructure (Coursera) Coursera
Google

Google Cloud Platform Fundamentals: Core Infrastructure (Coursera)

This course introduces you to important concepts and terminology for working with Google Cloud Platform (GCP). You learn about, and compare, many of the computing and storage services available in Google Cloud Platform, including Google App Engine, Google Compute Engine, Google Kubernetes Engine, Google Cloud Storage, Google Cloud SQL, and BigQuery. You learn about important resource and policy management tools, such as the Google Cloud Resource Manager hierarchy and Google Cloud Identity and Access Management. Hands-on labs give you foundational skills for working with GCP.

Jun 15th 2026
1 Week
Accounting Analytics (Coursera) Coursera
University of Pennsylvania

Accounting Analytics (Coursera)

Accounting Analytics explores how financial statement data and non-financial metrics can be linked to financial performance. In this course, taught by Wharton’s acclaimed accounting professors, you’ll learn how data is used to assess what drives financial performance and to forecast future financial scenarios. While many accounting and financial organizations deliver data, accounting analytics deploys that data to deliver insight, and this course will explore the many areas in which accounting data provides insight into other business areas including consumer behavior predictions, corporate strategy, risk management, optimization, and more.

Jun 15th 2026
4 Weeks
Introdução ao Big Data (Coursera) Coursera
FIA Business School

Introdução ao Big Data (Coursera)

Este curso é indicado para profissionais que desejam entender de forma fácil o que é Big Data, conhecer algumas tecnologias de Big Data, ter acesso a algumas aplicações de Analytics, Internet das Coisas - IOT e de Big Data. Ao final do curso você será capaz de participar de um projeto de Big Data contribuindo com estratégias e direcionando o projeto para a escolha da adequada técnica de análise de dados.

Jun 15th 2026
4 Weeks