MOOC List is learner-supported. When you buy through links on our site, we may earn an affiliate commission.
The course culminates in a project where you will apply your Spark skills to an ETL-for-machine-learning workflow use case.
NOTE: This course requires foundational skills in working with Apache Spark and Jupyter Notebooks. The Introduction to Big Data with Spark and Hadoop course from IBM will equip you with these skills, and we recommend completing that course or a similar one before starting this one.
This course can be applied to multiple Specializations or Professional Certificate programs. Completing this course will count towards your learning in any of the following programs:
- NoSQL, Big Data, and Spark Foundations Specialization
- IBM Data Engineering Professional Certificate
What You Will Learn
- Glean insights into how streaming data and Spark Structured Streaming empower machine learning and AI tasks.
- Delve into graph theory and Apache Spark GraphFrames, which are used for motif finding in genetics and the biological sciences, and learn to identify data suitable for GraphFrames processing.
- Discover how ETL processes work with Apache Spark and machine learning and extend that knowledge to Spark MLlib capabilities and related benefits.
- Explore supervised and unsupervised learning, including clustering, and learn how to use the k-means clustering algorithm with Spark MLlib.
Syllabus
WEEK 1
Spark for Data Engineering
In this first of two modules, learn what streaming data is and gain the essential knowledge to use Spark Structured Streaming. Learn about data sources, streaming output modes, and supported data destinations. Learn about data operations considerations and discover how Spark Structured Streaming listeners and checkpointing benefit streaming data processing. Discover how graph theory works with streaming data. You'll gain insights into the advantages that Apache Spark GraphFrames offers and learn what qualities make data suitable for GraphFrames processing. Then, explore ETL and learn how to use Apache Spark for data extraction, transformation, and loading. Finally, put your newfound knowledge into practice and gain practical, real-world skills in the ETL for Machine Learning Pipelines hands-on lab.
WEEK 2
SparkML
This module demystifies the concepts and practices of machine learning with SparkML and the Spark machine learning library (MLlib). Explore both supervised and unsupervised machine learning. Explore classification and regression tasks and learn how SparkML supports them. Gain insights into unsupervised learning, with a focus on clustering, and discover how to apply the k-means clustering algorithm using Spark MLlib. Finish the module with a hands-on lab that consolidates what you have learned and gives you real-world experience with Spark ML.
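To make the k-means idea in this module concrete, here is a minimal plain-Python sketch of Lloyd's k-means. It is not Spark code: the function name `kmeans` and its parameters `max_iter` and `seed` are hypothetical, and it uses simple random initialization. It shows the two alternating steps: assign each point to its nearest center, then move each center to the mean of its assigned points.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Hypothetical helper: Lloyd's k-means on a list of numeric tuples."""
    rng = random.Random(seed)
    # Initialization (an assumption for this sketch): pick k distinct input points at random.
    centers = rng.sample(points, k)
    assignments = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of the points assigned to it.
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, new_assignments) if a == c]
            if members:
                new_centers.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                # Keep an empty cluster's center where it was.
                new_centers.append(centers[c])
        # Stop once neither the assignments nor the centers change.
        if new_assignments == assignments and new_centers == centers:
            break
        assignments, centers = new_assignments, new_centers
    return centers, assignments


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, 2)
print(sorted(centers), labels)
```

In PySpark, the counterpart is `pyspark.ml.clustering.KMeans`, which runs the same kind of iteration over a distributed DataFrame with a vector column of features. By default it uses the k-means|| initialization rather than the plain random choice shown here.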
WEEK 3
Final Project
In this final project, you'll gain real-world experience by creating your own Apache Spark application. You will build it as an end-to-end use case that follows the extract, transform, and load (ETL) process, including data acquisition, transformation, model training, and deployment using IBM Watson Machine Learning.