Machine Learning with Apache Spark (Coursera)

Offered by IBM

Explore the exciting world of machine learning with this IBM course. Start by learning ML fundamentals before unlocking the power of Apache Spark to build and deploy ML models for data engineering applications. Dive into supervised and unsupervised learning techniques and discover the revolutionary possibilities of Generative AI through instructional readings and videos.

Gain hands-on experience with Spark Structured Streaming, develop an understanding of data engineering and ML pipelines, and become proficient in evaluating ML models using SparkML.
In practical labs, you'll use SparkML for regression, classification, and clustering to build prediction and classification models. You'll connect to Spark clusters, analyze SparkSQL datasets, perform ETL activities, and create ML models using Spark ML and scikit-learn. Finally, you'll demonstrate your acquired skills in a final assignment.
This intermediate course is suitable for aspiring and experienced data engineers, as well as working professionals in data analysis and machine learning. Prior knowledge of Big Data, Hadoop, Spark, Python, and ETL is highly recommended.

What you'll learn

  • Describe ML, explain its role in data engineering, summarize generative AI, discuss Spark's uses, and analyze ML pipelines and model persistence.
  • Evaluate ML models, distinguish between regression, classification, and clustering models, and compare data engineering pipelines with ML pipelines.
  • Construct data analysis processes using Spark SQL, and perform regression, classification, and clustering using SparkML.
  • Demonstrate connecting to Spark clusters, building ML pipelines, performing feature extraction and transformation, and persisting models.

Syllabus

Get Started with Machine Learning
Module 1
In this module, you will learn about machine learning techniques that enable computers to perform tasks without explicit programming. You will explore the lifecycle of machine learning models and understand the crucial role of data engineering in machine learning projects. The module covers supervised and unsupervised learning techniques, including classification, regression, and clustering. You will also gain insight into Generative AI and its potential to transform multiple industries, enhance people's lives, and generate new, previously unimaginable data and experiences.

Machine Learning with Apache Spark
Module 2
This module introduces Spark and gives an overview of its key features and applications in data engineering. You will learn how to connect to a Spark cluster using SN Labs and work through regression (mileage prediction), classification (diabetic classification), and clustering (including clustering load data) using SparkML, constructing each of these models yourself. The module also covers GraphFrames on Apache Spark and guides you through hands-on labs.
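
To give a feel for the regression task mentioned in this module, here is a minimal sketch of fitting a straight line by least squares. It is not the course's lab code and it deliberately uses plain Python rather than Spark so that it runs anywhere; in SparkML the corresponding estimator is `pyspark.ml.regression.LinearRegression`. The function names `fit_line` and `predict` and the weight/mileage numbers are hypothetical, made up purely for illustration.

```python
# Toy illustration only: ordinary least squares in plain Python, not SparkML.
# All data values below are invented for demonstration purposes.

def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical training data: vehicle weight (thousands of pounds) vs. miles per gallon.
weights = [2.0, 2.5, 3.0, 3.5, 4.0]
mileage = [34.0, 30.0, 26.0, 22.0, 18.0]

model = fit_line(weights, mileage)
estimated_mpg = predict(model, 3.2)  # mileage estimate for an unseen weight
```

A distributed library such as SparkML solves the same kind of fitting problem, but over data partitioned across a cluster and usually with many input features instead of one.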

Data Engineering for Machine Learning using Apache Spark
Module 3
This module begins with Apache Spark Structured Streaming and its role in processing streaming data with Spark SQL, including the key terms associated with Structured Streaming. The module then covers the Extract-Transform-Load (ETL) process and provides hands-on experience in moving data from a source to a destination that uses a different format or structure. You will also gain a practical understanding of feature extraction and transformation using Spark's feature extractors and transformers. The module then turns to machine learning pipelines in Spark, demonstrating the process and its benefits. Lastly, you will learn the concept of model persistence and its role in machine learning.
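
The Extract-Transform-Load idea described in this module can be sketched in a few lines. The example below is an assumption-laden illustration, not course material: it uses only the Python standard library instead of Spark, and the sample data, column names, and the `extract`, `transform`, and `load` helpers are all hypothetical. It reads CSV text, cleans and converts it, and writes the result as JSON Lines, which is one instance of moving data between differing formats.

```python
import csv
import io
import json

# Hypothetical source data; a real job would read from a file, table, or stream.
RAW_CSV = """id,name,weight_lb
1,alpha,2200
2,beta,
3,gamma,3300
"""

def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with a missing weight and convert pounds to kilograms."""
    cleaned = []
    for row in rows:
        if not row["weight_lb"]:
            continue  # skip incomplete records
        cleaned.append({
            "id": int(row["id"]),
            "name": row["name"].upper(),
            "weight_kg": round(float(row["weight_lb"]) * 0.45359237, 1),
        })
    return cleaned

def load(rows):
    """Load: serialize the rows as JSON Lines, one record per line."""
    return "\n".join(json.dumps(r) for r in rows)

output = load(transform(extract(RAW_CSV)))
records = [json.loads(line) for line in output.splitlines()]
```

Keeping the three stages as separate functions mirrors how larger ETL jobs are usually organized, since each stage can then be tested or swapped independently.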

Final Project
Module 4
In this module, you will apply the data engineering skills and techniques you have acquired throughout the course. The course concludes with a final project and assignments that allow you to demonstrate your proficiency in these areas. You will step into the role of a data engineer working at a renowned aeronautics consulting company recognized for its adeptness in handling large datasets. Your role as a data engineer is crucial as the data scientists rely on your expertise to carry out ETL (Extract, Transform, Load) tasks and establish machine learning pipelines. While data scientists possess expertise in machine learning, they depend on your specialized knowledge to handle various algorithms and data formats. Your contribution plays a vital role in ensuring the smooth execution of their tasks.

Related Courses

Google Cloud Platform Fundamentals: Core Infrastructure (Coursera)
Google

This course introduces you to important concepts and terminology for working with Google Cloud Platform (GCP). You learn about, and compare, many of the computing and storage services available in Google Cloud Platform, including Google App Engine, Google Compute Engine, Google Kubernetes Engine, Google Cloud Storage, Google Cloud SQL, and BigQuery. You learn about important resource and policy management tools, such as the Google Cloud Resource Manager hierarchy and Google Cloud Identity and Access Management. Hands-on labs give you foundational skills for working with GCP.

Jun 1st 2026
1 Week

Mathematics for Machine Learning: Linear Algebra (Coursera)
Imperial College London

In this course on Linear Algebra we look at what linear algebra is and how it relates to vectors and matrices. Then we look at what vectors and matrices are and how to work with them, including the knotty problem of eigenvalues and eigenvectors, and how to use these to solve problems. Finally, we look at how to use these to do fun things with datasets, like how to rotate images of faces and how to extract eigenvectors to look at how the PageRank algorithm works.

Jun 1st 2026
5-12 Weeks

Introduction to Applied Machine Learning (Coursera)
Alberta Machine Intelligence Institute

This course is for professionals who have heard the buzz around machine learning and want to apply machine learning to data analysis and automation. Whether finance, medicine, engineering, business or other domains, this course will introduce you to problem definition and data preparation in a machine learning project.

Jun 1st 2026
4 Weeks

A Crash Course in Data Science (Coursera)
Johns Hopkins University

By now you have definitely heard about data science and big data. In this one-week class, we will provide a crash course in what these terms mean and how they play a role in successful organizations. This class is for anyone who wants to learn what all the data science action is about, including those who will eventually need to manage data scientists. The goal is to get you up to speed as quickly as possible on data science without all the fluff. We've designed this course to be as convenient as possible without sacrificing any of the essentials.

Jun 1st 2026
1 Week

Machine Learning: Regression (Coursera)
University of Washington

Case Study - Predicting Housing Prices. In our first case study, predicting house prices, you will create models that predict a continuous value (price) from input features (square footage, number of bedrooms and bathrooms,...). This is just one of the many places where regression can be applied. Other applications range from predicting health outcomes in medicine, stock prices in finance, and power usage in high-performance computing, to analyzing which regulators are important for gene expression.

Jun 1st 2026
5-12 Weeks

Data Science Companion (Coursera)
MathWorks

The Data Science Companion provides an introduction to data science. You will gain a quick background in data science and core machine learning concepts, such as regression and classification. You’ll be introduced to the practical knowledge of data processing and visualization using low-code solutions, as well as an overview of the ways to integrate multiple tools effectively to solve data science problems.

Jun 5th 2026
4 Weeks

Introduction to Machine Learning (Coursera)
Duke University

This course will provide you with a foundational understanding of machine learning models (logistic regression, multilayer perceptrons, convolutional neural networks, natural language processing, etc.) and demonstrate how these models can solve complex problems in a variety of industries, from medical diagnostics to image recognition to text prediction.

Jun 5th 2026
5-12 Weeks

Advanced Algorithms and Complexity (Coursera)
University of California, San Diego, Higher School of Economics - HSE University

You've learned the basic algorithms and are now ready to step into the area of more complex problems and the algorithms that solve them. Advanced algorithms build upon basic ones and use new ideas. We will start with network flows, which are used in typical applications such as optimal matchings, finding disjoint paths, and flight scheduling, as well as in more surprising ones like image segmentation in computer vision.

Jun 1st 2026
5-12 Weeks

Python and Machine Learning for Asset Management (Coursera)
EDHEC Business School

This course will enable you to master machine learning approaches in the area of investment management. It has been designed by two thought leaders in their field, Lionel Martellini from EDHEC-Risk Institute and John Mulvey from Princeton University. Starting from the basics, they will help you build practical skills to understand data science so you can make the best portfolio decisions.

Jun 1st 2026
5-12 Weeks

Matrix Factorization and Advanced Techniques (Coursera)
University of Minnesota

In this course you will learn a variety of matrix factorization and hybrid machine learning techniques for recommender systems. Starting with basic matrix factorization, you will understand both the intuition and the practical details of building recommender systems based on reducing the dimensionality of the user-product preference space. Then you will learn about techniques that combine the strengths of different algorithms into powerful hybrid recommenders.

Jun 1st 2026
5-12 Weeks

Convolutional Neural Networks (Coursera)
DeepLearning.AI

This course will teach you how to build convolutional neural networks and apply them to image data. Thanks to deep learning, computer vision is working far better than just two years ago, and this is enabling numerous exciting applications ranging from safe autonomous driving, to accurate face recognition, to automatic reading of radiology images.

Jun 1st 2026
4 Weeks