Apache Spark for Data Engineering and Machine Learning (edX)

Offered by IBM

This short course introduces you to the fundamentals of Data Engineering and Machine Learning with Apache Spark, including Spark Structured Streaming, ETL for Machine Learning (ML) Pipelines, and Spark ML. By the end of the course, you will have hands-on experience applying Spark skills to ETL and ML workflows.

Apache® Spark™ is a fast, flexible, and developer-friendly open-source platform for large-scale SQL, batch processing, stream processing, and machine learning. Users can take advantage of its open-source ecosystem, speed, ease of use, and analytic capabilities to work with Big Data in new ways.
In this short course, you'll explore concepts and gain hands-on skills for using Spark in data engineering and machine learning applications. You'll learn about Spark Structured Streaming, including data sources, output modes, and operations. Then, you'll explore how graph theory works and discover how GraphFrames supports Spark DataFrames and popular graph algorithms.
Organizations can acquire data from structured and unstructured sources and deliver that data to users in formats they can use. Learn how to use Spark to extract, transform, and load (ETL) data. Then, you'll hone your newly acquired skills in the "ETL for Machine Learning Pipelines" lab.
Next, discover why machine learning practitioners prefer Spark. You'll learn how to create pipelines and quickly implement feature extraction, selection, and transformation on structured datasets. Discover how to perform classification and regression using Spark. You'll be able to define and distinguish between supervised and unsupervised learning. Learn about clustering and how to apply the k-means clustering algorithm using Spark MLlib. You'll reinforce your knowledge with focused, hands-on labs and a final project in which you apply Spark to a real-world-inspired problem.
Prior to taking this course, please ensure you have foundational Spark knowledge and skills, for example, by first completing the IBM course titled "Big Data, Hadoop and Spark Basics."
This course is part of the NoSQL, Big Data and Spark Fundamentals Professional Certificate.

What you'll learn

  • Describe the features, benefits, limitations, and applications of Apache Spark Structured Streaming
  • Describe graph theory and explain how GraphFrames benefits developers
  • Explain how developers can apply extract, transform, and load (ETL) processes using Spark
  • Describe how Spark ML supports machine learning development
  • Apply Spark ML for regression and classification
  • Differentiate between supervised and unsupervised machine learning
  • Explain how Spark ML uses clustering
  • Demonstrate hands-on working knowledge of using Spark for ETL processes

Related Courses

Data Science: Machine Learning (edX)
HarvardX, Harvard University

Build a movie recommendation system and learn the science behind one of the most popular and successful data science techniques. Perhaps the most popular data science methodologies come from machine learning. What distinguishes machine learning from other computer-guided decision processes is that it builds prediction algorithms using data.

Self-Paced
Python for Data Science (edX)
University of California, San Diego, UC San DiegoX

Learn to use powerful open-source Python tools, including Pandas, Git, and Matplotlib, to manipulate, analyze, and visualize complex datasets. In the information age, data is all around us. Within this data are answers to compelling questions across many societal domains (politics, business, science, etc.). But if you had access to a large dataset, would you be able to find the answers you seek?

Self-Paced
High-Dimensional Data Analysis (edX)
HarvardX, Harvard University

A focus on several techniques that are widely used in the analysis of high-dimensional data. If you’re interested in data analysis and interpretation, then this is the data science course for you. We start by learning the mathematical definition of distance and use this to motivate the use of the singular value decomposition (SVD) for dimension reduction and multidimensional scaling, and its connection to principal component analysis.

Self-Paced
Data Analysis: Statistical Modeling and Computation in Applications (edX)
MIT, MITx

A hands-on introduction to the interplay between statistics and computation for the analysis of real data. Part of the MITx MicroMasters program in Statistics and Data Science. Data science requires multidisciplinary skills ranging from mathematics, statistics, machine learning, and problem solving to programming, visualization, and communication. In this course, learners will combine these foundational and practical skills with domain knowledge to ask and answer questions using real data.

May 13th 2024
13-24 Weeks
Applied Quantum Computing III: Algorithm and Software (edX)
Purdue University, PurdueX

Learn domain-specific quantum algorithms and how to run them on present-day quantum hardware. This course is part III of a series of quantum computing courses covering topics from fundamentals to present-day hardware platforms to quantum software and programming. The goal of part III is to discuss some of the key domain-specific algorithms that have been developed by exploiting the fundamental quantum phenomena (e.g., entanglement) and computing models discussed in part I.

Mar 25th 2024
5-12 Weeks
Computing for Data Analysis (edX)
Georgia Institute of Technology, GTx

A hands-on introduction to basic programming principles and practice relevant to modern data analysis, data mining, and machine learning. The modern data analysis pipeline involves collection, preprocessing, storage, analysis, and interactive visualization of data. In the course, you’ll see how computing and mathematics come together.

Aug 19th 2024
13-24 Weeks
Data Analytics and Visualization in Health Care (edX)
Rochester Institute of Technology, RITx

Learn best practices in data analytics, informatics, and visualization to gain literacy in data-driven, strategic imperatives that affect all facets of health care. Big data is transforming the health care industry's efforts to improve quality of care and reduce costs, which are key objectives for most organizations. Employers are searching for professionals who can extract, analyze, and interpret data from patient health records, insurance claims, financial records, and more to tell a compelling and actionable story using health care data analytics.

Self-Paced
Data Science and Machine Learning Capstone Project (edX)
IBM

Create a project that you can use to showcase your data science skills to prospective employers. Apply various data science and machine learning techniques to analyze and visualize a dataset involving a real-life business scenario and build a predictive model. Now that you've taken several courses on data science and machine learning, it’s time to put your learning to work on a data problem involving a real-life scenario. Employers care about how well you can apply your knowledge and skills to solve real-world problems, and the work you do in this capstone project will make you stand out in the job market.

Self-Paced
Artificial Intelligence (AI) (edX)
Columbia University, ColumbiaX

Learn the fundamentals of artificial intelligence (AI), and apply them. Design intelligent agents to solve real-world problems, including search, games, machine learning, logic, and constraint satisfaction problems. What do self-driving cars, face recognition, web search, industrial robots, missile guidance, and tumor detection have in common? They are all complex real-world problems being solved with applications of artificial intelligence (AI).

This course is archived
5-12 Weeks