EdX

Big Data, Hadoop, and Spark Basics (edX)

Offered by IBM

This course provides foundational big data practitioner knowledge and analytical skills using popular big data tools, including Hadoop and Spark. Learn and practice your big data skills hands-on. Organizations need skilled, forward-thinking Big Data practitioners who can apply their business and technical skills to unstructured data such as tweets, posts, pictures, audio files, videos, sensor data, and satellite imagery to identify the behaviors and preferences of prospects, clients, competitors, and others.

This course introduces you to Big Data concepts and practices. You will understand the characteristics, features, benefits, and limitations of Big Data and explore some of the Big Data processing tools. You'll explore how Hadoop, Hive, and Spark can help organizations overcome Big Data challenges and benefit from the data they collect.
Hadoop, an open-source framework, enables distributed processing of large data sets across clusters of computers using simple programming models. Each computer, or node, offers local computation and storage, allowing datasets to be processed faster and more efficiently. Hive, a data warehouse system, provides an SQL-like interface for efficiently querying and manipulating large data sets in the various databases and file systems that integrate with Hadoop.
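As a rough illustration of the kind of "simple programming model" Hadoop is known for, the sketch below mimics a MapReduce-style word count in plain Python. It is not Hadoop code and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample strings are hypothetical, and each string simply stands in for a block of data stored on a different node.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(chunks):
    """Run map, shuffle, and reduce over a list of text chunks."""
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


# Hypothetical sample data: each string stands in for one node's local block.
chunks = ["big data tools", "big data skills", "spark and hadoop tools"]
counts = word_count(chunks)
print(counts)
```

In a real cluster the map calls would run on the nodes that hold the data and the shuffle would move records over the network; here everything runs in one process so the three phases are easy to follow.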
Apache Spark is an open-source processing engine built around speed, ease of use, and analytics that gives users newer ways to store and use big data.
You will discover how to leverage Spark to deliver reliable insights. The course provides an overview of the platform, going into the different components that make up Apache Spark. In this course, you will also learn how Resilient Distributed Datasets, known as RDDs, enable parallel processing across the nodes of a Spark cluster.
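The idea behind processing a dataset in parallel across nodes can be sketched without Spark itself. The example below is a minimal plain-Python analogy, not the RDD API: it splits a list into partitions, processes each partition on a separate worker thread, and combines the partial results. The helper names (`partition`, `sum_of_squares`, `parallel_sum_of_squares`) and the default of four partitions are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, num_partitions):
    """Split a list into roughly equal contiguous slices."""
    # Ceiling division, with a floor of 1 so empty input is handled safely.
    size = max(1, -(-len(data) // num_partitions))
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(part):
    """Work done independently on a single partition."""
    return sum(x * x for x in part)


def parallel_sum_of_squares(data, num_partitions=4):
    """Process each partition on its own worker, then combine the results."""
    parts = partition(data, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(sum_of_squares, parts))
    return sum(partials)


total = parallel_sum_of_squares(list(range(1, 101)))
print(total)
```

In Spark, the partitions would live on different machines and the engine would handle scheduling and recovery; the threads here only stand in for those workers.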
You'll gain practical skills when you learn how to analyze data in Spark using PySpark and Spark SQL and how to create a streaming analytics application using Spark Streaming, and more.

What you'll learn
"After completing this course, a learner will be able to..."

Describe Big Data, its impact, processing methods and tools, and use cases.
Describe Hadoop architecture, ecosystem, practices, and applications, including Distributed File - -
Describe Spark programming basics, including parallel programming basics, for DataFrames, data sets, and SparkSQL.
Describe how Spark uses RDDs, creates data sets, and uses Catalyst and Tungsten to optimize SparkSQL.
Apply Apache Spark development and runtime environment options.

This course is part of the NoSQL, Big Data and Spark Fundamentals Professional Certificate

Syllabus

Module 1 – What is Big Data?
o What is Big Data?
o Impact of Big Data
o Parallel Processing, Scaling, and Data Parallelism
o Tools of Big Data
o Beyond the Hype
o Big Data Use Cases
o Viewpoints about Big Data

Module 2 – Introduction to the Hadoop Ecosystem
o What is Hadoop
o An introduction to MapReduce
o The Hadoop Ecosystem and Common Components: Introducing HDFS, Hive, HBase, Spark, and other modules
o Working with HDFS
o Working with HBase
o Lab: MapReduce

Module 3 – Introduction to Apache Spark
o Why use Apache Spark?
o Functional Programming Basics
o Parallel Programming using Resilient Distributed Datasets
o Scale-out / Data Parallelism in Apache Spark
o DataFrames and SparkSQL
o Lab: Practical examples with PySpark

Module 4 – DataFrames and SparkSQL
o Introduction to DataFrames & SparkSQL
o RDDs in Parallel Programming and Spark
o DataFrames and Datasets
o Catalyst and Tungsten
o ETL with DataFrames
o Lab: ETL with DataFrames
o Real-world usage of SparkSQL
o Lab: SparkSQL

Module 5 – Development and Runtime Environment options
o Apache Spark architecture
o Overview of Apache Spark Cluster Modes
o How to Run an Apache Spark Application
o Using Apache Spark on IBM Cloud
o Lab: Scale-out on IBM Spark Environment in Watson Studio
o Setting Apache Spark Configuration
o Running Spark on Kubernetes
o Lab: Spark on Kube

Related Courses

EdX

HECMontrealX,HEC Montréal

L'analyse de données UX (edX)

Statistics & Data Analysis

Become a UX data scientist! From qualitative data analysis to Big Data analysis, you will be able to draw insights from data in order to make recommendations on an empirical basis.

Self-Paced

Big Data User Experience ANOVA

EdX

University of Adelaide,AdelaideX

Programming for Data Science (edX)

CS: Programming Data Science

Learn how to apply fundamental programming concepts, computational thinking and data analysis techniques to solve real-world data science problems. There is a rising demand for people with the skills to work with Big Data sets and this course can start you on your journey through our Big Data MicroMasters program towards a recognised credential in this highly competitive area. Using practical activities you will learn how digital technologies work and will develop your coding skills through engaging and collaborative assignments.

Self-Paced

Programming Big Data Data Analysis

EdX

University of Pennsylvania,PennX

Big Data and Education (edX)

Education Statistics & Data Analysis

Learn the methods and strategies for using large-scale educational data to improve education and make discoveries about learning. Online and software-based learning tools have been used increasingly in education. This movement has resulted in an explosion of data, which can now be used to improve educational effectiveness and support basic research on learning.

Self-Paced

Education Big Data Data Mining

EdX

Columbia University,ColumbiaX

Enabling Technologies for Data Science and Analytics: The Internet of Things (edX)

Statistics & Data Analysis

Discover the relationship between Big Data and the Internet of Things (IoT). The Internet of Things is rapidly growing. It is predicted that more than 25 billion devices will be connected by 2020. In this data science course, you will learn about the major components of the Internet of Things and how data is acquired from sensors. You will also examine ways of analyzing event data, sentiment analysis, facial recognition software and how data generated from devices can be used to make decisions.

Self-Paced

Analysis Big Data Data Analysis

EdX

IBM

NoSQL Database Basics (edX)

Computer Science

This course introduces you to the fundamentals of NoSQL, including the four key non-relational database categories. By the end of the course you will have hands-on skills for working with MongoDB, Cassandra and IBM Cloudant NoSQL databases.

Self-Paced

MongoDB NoSQL NoSQL Databases

EdX

University of California, San Diego,UC San DiegoX

Big Data Analytics Using Spark (edX)

Statistics & Data Analysis Data Science

Learn how to analyze large datasets using Jupyter notebooks, MapReduce and Spark as a platform. In data science, data is called “big” if it cannot fit into the memory of a single standard laptop or workstation. The analysis of big datasets requires using a cluster of tens, hundreds or thousands of computers. Effectively using such clusters requires the use of distributed file systems, such as the Hadoop Distributed File System (HDFS), and corresponding computational models, such as Hadoop, MapReduce and Spark.

Dec 5th 2023

5-12 Weeks

Machine Learning Big Data Hadoop

EdX

Universidad Carlos III de Madrid - UC3M,UC3Mx

Introduction to Management Information Systems (MIS): A Survival Guide (edX)

Management & Leadership

Gain the skills and knowledge needed to succeed in an MIS-dominated corporate world. This MIS course will cover supporting tech infrastructures (Cloud, Databases, Big Data), the MIS development/procurement process, and the main integrated systems (ERPs), such as SAP®, Oracle® or Microsoft Dynamics Navision®, as well as their relationship with Business Process Redesign.

Self-Paced

Cloud Databases Big Data

EdX

HarvardX,Harvard University

Tangible Things: Discovering History Through Artworks, Artifacts, Scientific Specimens, and the Stuff Around You (edX)

Humanities Art and Culture

Gain an understanding of history, museum studies, and curation by looking at, organizing, and interpreting art, artifacts, scientific curiosities, and the stuff of everyday life. Have you ever wondered about how museum, library, and other kinds of historical or scientific collections all come together? Or how and why curators, historians, archivists, and preservationists do what they do?

Self-Paced

History Artifacts Artwork

EdX

University of Adelaide,AdelaideX

Big Data Fundamentals (edX)

Statistics & Data Analysis Data Science

Learn how big data is driving organisational change and essential analytical tools and techniques, including data mining and PageRank algorithms. Organizations now have access to massive amounts of data and it’s influencing the way they operate. They are realizing in order to be successful they must leverage their data to make effective business decisions.

Self-Paced

Big Data Data Mining MapReduce

EdX

The Hong Kong University of Science and Technology - HKUST,HKUSTx

Big Data Technology Capstone Project (edX)

Statistics & Data Analysis Computer Science

The Big Data Technology Capstone Project will allow you to apply the techniques and theory you have gained from the four courses in this MicroMasters program to a medium-scale project. In this capstone course, you will get an opportunity to apply the knowledge and skills that you have gained throughout this MicroMasters program.

Self-Paced

Big Data Data Mining Data Analysis

EdX

Tecnológico de Monterrey,TecdeMonterreyX

Visualización de Datos y Storytelling (edX)

Engineering Statistics & Data Analysis

In this online course, learn what data visualization is and what it is used for, the elements that make it up, and how to use it to support the best data-driven decisions for businesses. Say you need to understand big data, thousands or even millions of rows of data, and you have little time to do it. The data may come from your team, in which case you may already be familiar with what you are measuring and the results that are expected. Or it may come from another team, or perhaps from several teams at once, and be completely unfamiliar.

Self-Paced

Big Data Storytelling Data Science

EdX

Galileo University,GalileoX

Herramientas de la Inteligencia de Negocios (edX)

Statistics & Data Analysis

Learn the process of extracting and transforming data to generate inputs and make decisions. Use software, tools, and support systems. In this course you will learn to make successful business decisions. To do so, you will learn the complete process, from extracting data through its integration, visualization, cleansing, analysis, and use. You will be able to transform raw data into inputs for decision-making. You will master the use of software, tools, and support systems.

Self-Paced

Big Data Business Intelligence Power BI