EdX

Apache Spark for Data Engineering and Machine Learning (edX)

Offered by IBM

This short course introduces you to the fundamentals of Data Engineering and Machine Learning with Apache Spark, including Spark Structured Streaming, ETL for Machine Learning (ML) Pipelines, and Spark ML. By the end of the course, you will have hands-on experience applying Spark skills to ETL and ML workflows.

Apache® Spark™ is a fast, flexible, and developer-friendly open-source platform for large-scale SQL, batch processing, stream processing, and machine learning. Users can take advantage of its open-source ecosystem, speed, ease of use, and analytic capabilities to work with Big Data in new ways.
In this short course, you explore concepts and gain hands-on skills to use Spark for data engineering and machine learning applications. You'll learn about Spark Structured Streaming, including data sources, output modes, and operations. Then, explore how graph theory works and discover how GraphFrames supports Spark DataFrames and popular algorithms.
Organizations can acquire data from structured and unstructured sources and deliver the data to users in formats they can use. Learn how to use Spark to extract, transform, and load (ETL) data. Then, you'll hone your newly acquired skills during your "ETL for Machine Learning Pipelines" lab.
Next, discover why machine learning practitioners prefer Spark. You'll learn how to create pipelines and quickly implement feature extraction, selection, and transformation on structured data sets. Discover how to perform classification and regression using Spark. You'll be able to define and identify both supervised and unsupervised learning. Learn about clustering and how to apply the k-means clustering algorithm using Spark MLlib. You'll reinforce your knowledge with focused, hands-on labs and a final project where you will apply Spark to a real-world inspired problem.
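For readers new to clustering, the k-means idea mentioned above can be sketched in a few lines of plain Python. This is an illustrative sketch only: it does not use Spark MLlib, and the function name `kmeans`, its parameters, and the choice to pass starting centroids explicitly are assumptions made for this example rather than anything taken from the course.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans(points, centroids, max_iter=100):
    """Plain k-means (Lloyd's iteration); hypothetical helper, not the Spark MLlib API."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # assignments are stable, so the iteration has converged
        centroids = new_centroids
    return centroids, clusters
```

Calling `kmeans(points, starting_centroids)` returns the final centroids together with the points grouped by cluster. Spark MLlib applies the same assign-then-update idea, distributed across a cluster of machines.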
Prior to taking this course, please ensure you have foundational Spark knowledge and skills, for example, by first completing the IBM course titled "Big Data, Hadoop and Spark Basics."
This course is part of the NoSQL, Big Data and Spark Fundamentals Professional Certificate.

What you'll learn

Describe the features, benefits, limitations, and application of Apache Spark Structured Streaming
Describe graph theory and explain how GraphFrames benefits developers
Explain how developers can apply extract, transform, and load (ETL) processes using Spark
Describe how Spark ML supports machine learning development
Apply Spark ML for regression and classification
Differentiate between supervised and unsupervised machine learning
Explain how Spark ML uses clustering
Demonstrate hands-on working knowledge of using Spark for ETL processes

Related Courses

EdX

IBM

PyTorch Basics for Machine Learning (edX)

Statistics & Data Analysis Data Science

This course is the first part of a two-part course and will teach you the fundamentals of PyTorch. In this course, you will implement classic machine learning algorithms, focusing on how PyTorch creates and optimizes models. You will quickly iterate through different aspects of PyTorch, giving you strong foundations and all the prerequisites you need before you build deep learning models.

Self-Paced

ML Machine Learning Linear Regression

EdX

Universitat Politècnica de València, UPValenciaX

Introducción a R para ciencia de datos (edX)

Statistics & Data Analysis

This is an introductory course on the R language, one of the most widely used programming languages in scientific research and a great tool for getting started in the world of machine learning.

Self-Paced

Programming Machine Learning Data Analysis

EdX

HarvardX, Harvard University

Data Science: Capstone (edX)

Statistics & Data Analysis Data Science

Show what you’ve learned from the Professional Certificate Program in Data Science. To become an expert data scientist you need practice and experience. By completing this capstone project you will get an opportunity to apply the knowledge and skills in R data analysis that you have gained throughout the series. This final project will test your skills in data visualization, probability, inference and modeling, data wrangling, data organization, regression, and machine learning.

Self-Paced

Probability Machine Learning Regression

EdX

IBM

Python for Data Engineering Project (edX)

Statistics & Data Analysis

An opportunity to apply your foundational Python skills via a project, using various techniques to collect and work with data. Journey into the realm of becoming a Data Engineer and apply your basic Python knowledge of working with data. You will exercise various techniques in Python to extract data in multiple file formats from different sources, transform it into specific datatypes, and then prepare it for loading into a database.

Self-Paced

Python APIs Jupyter Notebooks

EdX

IBM

Data Science and Machine Learning Capstone Project (edX)

Statistics & Data Analysis Data Science

Create a project that you can use to showcase your Data Science skills to prospective employers. Apply various data science and machine learning techniques to analyze and visualize a data set involving a real-life business scenario and build a predictive model. Now that you've taken several courses on data science and machine learning, it’s time to put your learning to work on a data problem involving a real-life scenario. Employers really care about how well you can apply your knowledge and skills to solve real-world problems, and the work you do in this capstone project will make you stand out in the job market.

Self-Paced

Python Machine Learning Data Science

EdX

HarvardX, Harvard University

CS50's Introduction to Artificial Intelligence with Python (edX)

Robotics & Computer Vision

Learn to use machine learning in Python in this introductory course on artificial intelligence. AI is transforming how we live, work, and play. By enabling new technologies like self-driving cars and recommendation systems or improving old ones like medical diagnostics and search engines, the demand for expertise in AI and machine learning is growing rapidly. This course will enable you to take the first step toward solving important real-world problems and future-proofing your career.

Self-Paced

Python Artificial Intelligence Machine Learning

EdX

École Polytechnique Fédérale de Lausanne, EPFLx

Selected Topics on Discrete Choice (edX)

Engineering

Discrete choice models are used extensively in many disciplines where it is important to predict human behavior at a disaggregate level. This course is a follow-up to the online course “Introduction to Discrete Choice Models”. We have selected some important advanced topics that are presented in detail.

Self-Paced

Machine Learning Sampling Discrete Choice

EdX

University of Adelaide, AdelaideX

Big Data Analytics (edX)

Statistics & Data Analysis Data Science

Learn key technologies and techniques, including R and Apache Spark, to analyse large-scale data sets to uncover valuable business information. Gain essential skills in today’s digital age to store, process and analyse data to inform business decisions.

Self-Paced

Big Data R Language Statistical Analysis

EdX

HarvardX, Harvard University

MLOps for Scaling TinyML (edX)

Data Science Computer Science

This course introduces learners to Machine Learning Operations (MLOps) through the lens of TinyML (Tiny Machine Learning). Learners explore best practices to deploy, monitor, and maintain (tiny) Machine Learning models in production at scale.

Self-Paced

Machine Learning TinyML Tiny Machine Learning

EdX

Université de Montréal, UMontrealX

Machine Learning Use Cases in Finance (edX)

Economics & Finance Data Science

In the last six years, the financial sector has seen an increase in the use of machine learning models in financial, banking and insurance contexts. Data science and advanced analytics teams in the financial and insurance community are implementing these models regularly and have found a place for them in their toolbox.

Self-Paced

Finance Machine Learning Neural Networks

EdX

IBM

Deep Learning with Python and PyTorch (edX)

Statistics & Data Analysis Data Science

This course is the second part of a two-part course on how to develop Deep Learning models using PyTorch. In the first course, you learned the basics of PyTorch; in this course, you will learn how to build deep neural networks in PyTorch. You will also learn how to train these models using state-of-the-art methods.

Self-Paced

Python ML Machine Learning

EdX

IBM

Data Engineering Capstone Project (edX)

Computer Science

This Capstone Project is designed for you to apply and demonstrate your Data Engineering skills and knowledge in SQL, NoSQL, RDBMS, Bash, Python, ETL, Data Warehousing, BI tools and Big Data.

Self-Paced

Python NoSQL SQL