Coursera

Data Engineering and Machine Learning using Spark (Coursera)

Offered by IBM

Organizations need skilled, forward-thinking Big Data practitioners who can apply their business and technical skills to unstructured data such as tweets, posts, pictures, audio files, videos, sensor data, satellite imagery, and more to identify the behaviors and preferences of prospects, clients, competitors, and others. In this short course, you'll gain practical skills as you learn how to work with Apache Spark for Data Engineering and Machine Learning (ML) applications. You will work hands-on with Spark MLlib, Spark Structured Streaming, and more to perform extract, transform, and load (ETL) tasks as well as regression, classification, and clustering.

The course culminates in a project where you will apply your Spark skills to an ETL for ML workflow use-case.
NOTE: This course requires that you have foundational skills for working with Apache Spark and Jupyter Notebooks. The Introduction to Big Data with Spark and Hadoop course from IBM will equip you with these skills and it is recommended that you have completed that course or similar prior to starting this one.
This course can be applied to multiple Specializations or Professional Certificates programs. Completing this course will count towards your learning in any of the following programs:

What You Will Learn

Glean insights into how streaming data and Spark Structured Streaming empower machine learning and AI tasks.
Delve into graph theory and Apache Spark GraphFrames, used for motif finding in genetics and biological sciences, and learn to identify data suitable for GraphFrames processing.
Discover how ETL processes work with Apache Spark and machine learning and extend that knowledge to Spark MLlib capabilities and related benefits.
Explore supervised learning and unsupervised learning, clustering, and learn how to use the k-means clustering algorithm with Spark MLlib.

Syllabus

WEEK 1
Spark for Data Engineering
In this first of two modules, learn what streaming data is and get the essential knowledge to use Spark Structured Streaming. Learn about data sources, streaming output modes, and supported data destinations. Learn about data operations considerations and discover how Spark Structured Streaming listeners and checkpointing benefit streaming data processing. Discover how graph theory works with streaming data. You'll gain insights into the advantages that Apache Spark GraphFrames offers and learn what qualities make data suitable for GraphFrames processing. Then, explore ETL and learn how to use Apache Spark for data extraction, transformation, and loading. Finally, put your newfound knowledge into practice and gain practical, real-world skills in the ETL for Machine Learning Pipelines hands-on lab.

WEEK 2
SparkML
This module demystifies the concepts and practices related to machine learning using SparkML and the Spark Machine Learning library. Explore both supervised and unsupervised machine learning. Explore classification and regression tasks and learn how SparkML supports them. Gain insights into unsupervised learning, with a focus on clustering, and discover how to apply the k-means clustering algorithm using Spark MLlib. Complete the module with a lab that solidifies your learning and gives you real-world experience with Spark ML.
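As a rough illustration of the k-means idea mentioned above, the sketch below implements the basic assign-and-update loop (Lloyd's algorithm) in plain Python. It is not Spark MLlib's API or implementation, and it is not taken from the course materials; the function names `squared_distance`, `nearest_index`, and `kmeans`, the sample points, and the random initialization are all assumptions made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, iterations=20, seed=0):
    """Hypothetical helper: cluster points into k groups.

    Returns the final centroids and one cluster label per input point.
    """
    rng = random.Random(seed)
    # Assumption: initialise centroids from k distinct input points.
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:  # no centroid moved: converged
            break
        centroids = updated
    labels = [nearest_index(p, centroids) for p in points]
    return centroids, labels


# Two well-separated groups of illustrative 2-D points.
sample_points = [(0, 0), (0, 1), (1, 0), (100, 100), (100, 101), (101, 100)]
centroids, labels = kmeans(sample_points, k=2)
print(centroids, labels)
```

In Spark MLlib the same steps run distributed over a DataFrame rather than a Python list; this single-machine version is only meant to show what the algorithm does at each iteration.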

WEEK 3
Final Project
This final project provides real-world experience where you'll create your own Apache Spark application. You will build this application as an end-to-end use case that follows the extract, transform, and load (ETL) process, including data acquisition, transformation, model training, and deployment using IBM Watson Machine Learning.

Related Courses

Coursera

DeepLearning.AI

Introduction to TensorFlow for Artificial Intelligence, Machine Learning, and Deep Learning (Coursera)

CS: Software Engineering Computer Science

If you are a software developer who wants to build scalable AI-powered algorithms, you need to understand how to use the tools to build them. This course is part of the upcoming Machine Learning in Tensorflow Specialization and will teach you best practices for using TensorFlow, a popular open-source framework for machine learning.

Aug 17th 2026

4 Weeks

Artificial Intelligence Machine Learning Neural Networks

Coursera

UNSW Sydney - University of New South Wales

Remote Sensing Image Acquisition, Analysis and Applications (Coursera)

Engineering

Welcome to Remote Sensing Image Acquisition, Analysis and Applications, in which we explore the nature of imaging the earth's surface from space or from airborne vehicles. This course covers the fundamental nature of remote sensing and the platforms and sensor types used. It also provides an in-depth treatment of the computational algorithms employed in image understanding, ranging from the earliest historically important techniques to more recent approaches based on deep learning.

Aug 17th 2026

13-24 Weeks

Analysis Algorithms Machine Learning

Coursera

University of Michigan

Applied Text Mining in Python (Coursera)

Statistics & Data Analysis Data Science

This course will introduce the learner to text mining and text manipulation basics. The course begins with an understanding of how text is handled by Python, the structure of text both to the machine and to humans, and an overview of the NLTK framework for manipulating text. The second week focuses on common manipulation needs, including regular expressions (searching for text), cleaning text, and preparing text for use by machine learning processes. The third week will apply basic natural language processing methods to text, and demonstrate how text classification is accomplished. The final week will explore more advanced methods for detecting the topics in documents and grouping them by similarity (topic modelling).

Aug 17th 2026

4 Weeks

Programming Python Machine Learning

Coursera

Google Cloud

Encoder-Decoder Architecture (Coursera)

CS: Information & Technology

This course gives you a synopsis of the encoder-decoder architecture, which is a powerful and prevalent machine learning architecture for sequence-to-sequence tasks such as machine translation, text summarization, and question answering. You learn about the main components of the encoder-decoder architecture and how to train and serve these models. In the corresponding lab walkthrough, you’ll code in TensorFlow a simple implementation of the encoder-decoder architecture for poetry generation from the beginning.

Aug 17th 2026

1 Week

Machine Learning TensorFlow Coursera Plus

Coursera

Edureka

Build a Data Warehouse in AWS (Coursera)

CS: Information & Technology

Embark on a transformative journey with our "Build a Data Warehouse in AWS" course, immersing yourself in the landscape of Amazon Redshift. This comprehensive course equips you with the essential skills not only to navigate but also to harness the full potential of this robust cloud-based data warehousing solution.

Aug 10th 2026

4 Weeks

SQL Data Warehousing Data Warehouse

Coursera

École Polytechnique Fédérale de Lausanne

Big Data Analysis with Scala and Spark (Scala 2 version) (Coursera)

CS: Theory Data Science

Manipulating big data distributed over a cluster using functional concepts is rampant in industry, and is arguably one of the first widespread industrial uses of functional ideas. This is evidenced by the popularity of MapReduce and Hadoop, and most recently Apache Spark, a fast, in-memory distributed collections framework written in Scala. In this course, we'll see how the data parallel paradigm can be extended to the distributed case, using Spark throughout.

Aug 17th 2026

4 Weeks

SQL Scala Big Data

Coursera

Pontificia Universidad Católica de Chile

Introducción a la Minería de Datos (Coursera)

CS: Information & Technology

In this course, you will learn the basic concepts of Data Mining gradually and practically, along with the algorithms most widely used today. By the end of the course, you will be able to understand the importance of managing information and to explore various real databases on your own. This course is the first step toward becoming a professional with the basic skills of a data scientist, so that you can open the door to the future.

Aug 10th 2026

5-12 Weeks

Algorithms Clustering Data Management

Coursera

University of London, Goldsmiths, University of London

Foundations of Data Science: K-Means Clustering in Python (Coursera)

Data Science

This MOOC, designed by an academic team from Goldsmiths, University of London, will quickly introduce you to the core concepts of Data Science to prepare you for intermediate and advanced Data Science courses. It focuses on the basic mathematics, statistics and programming skills that are necessary for typical data analysis tasks.

Aug 10th 2026

5-12 Weeks

Programming Python Machine Learning

Coursera

Stanford University

Probabilistic Graphical Models 1: Representation (Coursera)

Statistics & Data Analysis Data Science

Probabilistic graphical models (PGMs) are a rich framework for encoding probability distributions over complex domains: joint (multivariate) distributions over large numbers of random variables that interact with each other. These representations sit at the intersection of statistics and computer science, relying on concepts from probability theory, graph algorithms, machine learning, and more. They are the basis for the state-of-the-art methods in a wide variety of applications, such as medical diagnosis, image understanding, speech recognition, natural language processing, and many, many more. They are also a foundational tool in formulating many machine learning problems.

Aug 3rd 2026

5-12 Weeks

MATLAB Octave Machine Learning

Coursera

EDUCBA

Regression & Forecasting for Data Scientists using Python (Coursera)

CS: Information & Technology Data Science

This course provides comprehensive training in regression analysis and forecasting techniques for data science, emphasizing Python programming. You will master time-series analysis, forecasting, linear regression, and data preprocessing, enabling you to make data-driven decisions across industries.

Aug 10th 2026

4 Weeks

Python Regression Linear Regression

Coursera

Stanford University

Introduction to Statistics (Coursera)

Statistics & Data Analysis Data Science

Stanford's "Introduction to Statistics" teaches you statistical thinking concepts that are essential for learning from data and communicating insights. By the end of the course, you will be able to perform exploratory data analysis, understand key principles of sampling, and select appropriate tests of significance for multiple contexts. You will gain the foundational skills that prepare you to pursue more advanced topics in statistical thinking and machine learning.

Aug 10th 2026

5-12 Weeks

Statistics Analysis Probability

Coursera

New York University

Guided Tour of Machine Learning in Finance (Coursera)

Data Science

This course aims to provide a broad introductory overview of the field of ML with a focus on applications in Finance. Supervised Machine Learning methods are used in the capstone project to predict bank closures. While this course can be taken on its own, it also serves as a preview of topics that are covered in more detail in subsequent modules of the specialization Machine Learning and Reinforcement Learning in Finance.

Aug 10th 2026

4 Weeks

ML Artificial Intelligence Machine Learning