Udacity

Deploying a Hadoop Cluster (Udacity)

Offered by Udacity

Analyze Data with Hadoop and MapReduce. Learn how to tackle big data problems with your own Hadoop clusters! In this course, you’ll deploy Hadoop clusters in the cloud and use them to gain insights from large datasets.


Using massive datasets to guide decisions is becoming more and more important for modern businesses. Hadoop and MapReduce are fundamental tools for working with big data. By knowing how to deploy your own Hadoop clusters, you’ll be able to start exploring big data on your own.

What You Will Learn

Lesson 1
Deploying a Hadoop cluster on Amazon EC2
Learn how to deploy a small Hadoop cluster on Amazon EC2 instances.

Lesson 2
Deploy a Hadoop cluster with Ambari
Use Apache Ambari to automatically deploy a larger, more powerful Hadoop cluster.

Lesson 3
On-demand Hadoop clusters
Use Amazon’s Elastic MapReduce (EMR) to deploy a Hadoop cluster on demand.

Lesson 4
Analyzing a big dataset with Hadoop and MapReduce
Use Hadoop and MapReduce to analyze a 150 GB dataset of Wikipedia page views.
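
The kind of analysis described in Lesson 4 could be sketched as a map step and a reduce step in Python, in the style of Hadoop Streaming. This is a minimal illustration and not the course's own code: the function names (map_line, mapper, reducer, run_local) are hypothetical, and the input layout of four space-separated fields (project, page title, view count, bytes) is an assumption about the page-view files. The sample lines are made up for the demo.

```python
from itertools import groupby


def map_line(line):
    """Turn one raw line into a (page_title, views) pair, or None if unusable.

    Assumed layout (not confirmed by the course page):
    "<project> <page_title> <view_count> <bytes>"
    """
    parts = line.split()
    if len(parts) != 4:
        return None
    project, title, views, _size = parts
    # Keep only English Wikipedia rows with a numeric view count.
    if project != "en" or not views.isdigit():
        return None
    return title, int(views)


def mapper(lines):
    """Map step: emit (title, views) for every usable input line."""
    for line in lines:
        pair = map_line(line)
        if pair is not None:
            yield pair


def reducer(sorted_pairs):
    """Reduce step: sum views per title.

    The input must already be sorted by title, which is what the
    shuffle-and-sort phase provides between map and reduce.
    """
    for title, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        yield title, sum(views for _, views in group)


def run_local(lines):
    """Simulate map -> sort -> reduce on one machine for small test inputs."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # Made-up sample rows, only to show the data flow.
    sample = [
        "en Main_Page 5 100",
        "en Hadoop 3 50",
        "fr Hadoop 9 10",
        "en Main_Page 2 40",
    ]
    for title, total in run_local(sample):
        print(f"{title}\t{total}")
```

On a real cluster the map and reduce steps would run as separate scripts reading standard input, with the framework doing the sort between them; run_local only imitates that flow so the logic can be checked on a laptop before paying for cluster time.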

Prerequisites and Requirements
This course is intended for students with some experience with Hadoop and MapReduce, Python, and bash commands. You’ll need to be able to work with HDFS and write MapReduce programs; you can learn both in our Intro to Hadoop and MapReduce course. The MapReduce programs in the course are written in Python. Java and other languages also work, but we suggest Python at the level of our Intro to Computer Science course. You’ll also be using remote cloud machines, so you’ll need to know these bash commands: ssh, scp, cat, head, and tail.
You’ll also need to be able to work in an editor such as vim or nano. You can learn about these in our Linux Command Line Basics course.



Related Courses

Coursera

Universidade de São Paulo, Brasil

Redes Ecológicas (Coursera)

Statistics & Data Analysis Data Science

All living beings are connected to one another by ecological interactions, forming Darwin’s “entangled bank,” a metaphor inspired by Humboldt’s “web of life.” Untangling this complexity is a challenging but feasible task, provided you use the right tools. Network science offers excellent conceptual and operational tools to help.

Aug 17th 2026

4 Weeks

Networks Data Analysis Network Analysis

Coursera

Johns Hopkins University

Mathematical Biostatistics Boot Camp 1 (Coursera)

Sci: Mathematics

This class presents the fundamental probability and statistical concepts used in elementary data analysis. It will be taught at an introductory level for students with junior or senior college-level mathematical training including a working knowledge of calculus. A small amount of linear algebra and programming are useful for the class, but not required.

Aug 17th 2026

4 Weeks

Math Statistics Probability

Udacity

Introduction to Machine Learning Course (Udacity)

Data Science

This class will teach you the end-to-end process of investigating data through a machine learning lens. Learn online, with Udacity. Machine Learning is a first-class ticket to the most exciting careers in data analysis today. As data sources proliferate along with the computing power to process them, going straight to the data is one of the most straightforward ways to quickly gain insights and make predictions.

Self Paced

Self-Paced

Statistics Machine Learning Clustering

EdX

Georgia Institute of Technology, GTx

Computing for Data Analysis (edX)

CS: Software Engineering Statistics & Data Analysis

A hands-on introduction to basic programming principles and practice relevant to modern data analysis, data mining, and machine learning. The modern data analysis pipeline involves collection, preprocessing, storage, analysis, and interactive visualization of data. In the course, you’ll see how computing and mathematics come together.

Aug 24th 2026

13-24 Weeks

Programming Python Computing

Udacity

Intro to Statistics (Udacity)

Statistics & Data Analysis

Get ready to analyze, visualize, and interpret data! Thought-provoking examples and chances to combine statistics and programming will keep you engaged and challenged.

Self Paced

Self-Paced

Math Algebra Statistics

Coursera

Duke University

Bayesian Statistics (Coursera)

Statistics & Data Analysis Data Science

This course describes Bayesian statistics, in which one's inferences about parameters or hypotheses are updated as evidence accumulates. You will learn to use Bayes’ rule to transform prior probabilities into posterior probabilities, and be introduced to the underlying theory and perspective of the Bayesian paradigm.

Aug 17th 2026

5-12 Weeks

Statistics Data Analysis R Programming

Coursera

University of Cape Town

Julia Scientific Programming (Coursera)

Statistics & Data Analysis Data Science

This four-module course introduces users to Julia as a first language. Julia is a high-level, high-performance dynamic programming language developed specifically for scientific computing. This language will be particularly useful for applications in physics, chemistry, astronomy, engineering, data science, bioinformatics and many more.

Aug 17th 2026

4 Weeks

IPython Data Analysis Computer Programming

Coursera

Universidad Austral

Fundamentos de Excel para Negocios (Coursera)

Statistics & Data Analysis Data Science

By the end of this course you will have gained a wide range of skills, such as entering, sorting, and manipulating information; performing calculations of many kinds (mathematical, trigonometric, statistical, financial, engineering, probabilistic); drawing conclusions; working with dates and times; building charts; printing reports; and many more.

Aug 17th 2026

5-12 Weeks

Business Excel Data Analysis

Coursera

Johns Hopkins University

Mathematical Biostatistics Boot Camp 2 (Coursera)

Statistics & Data Analysis Data Science

Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.

Aug 17th 2026

4 Weeks

Math Statistics Probability

Coursera

Johns Hopkins University

Advanced Reproducibility in Cancer Informatics (Coursera)

Statistics & Data Analysis Data Science

This course introduces tools that help enhance reproducibility and replicability in the context of cancer informatics. It uses hands-on exercises to demonstrate in practical terms how to get acquainted with these tools but is by no means meant to be a comprehensive dive into these tools. The course introduces tools and their concepts such as git and GitHub, code review, Docker, and GitHub actions.

Aug 17th 2026

5-12 Weeks

Github Data Analysis R Language

Udacity

Udacity, Insight

Spark (Udacity)

Data Science

Learn how to work with big data and build machine learning models at scale using Spark! In this course, you’ll wrangle and model massive datasets with PySpark, the Python library for interacting with Spark. In the first lesson, you will learn about big data and how Spark fits into the big data ecosystem. In lesson two, you will practice processing and cleaning datasets to get comfortable with Spark’s SQL and dataframe APIs. In the third lesson, you will debug and optimize your Spark code when running on a cluster. In lesson four, you will use Spark’s Machine Learning Library to train machine learning models at scale.

Self Paced

Self-Paced

Python Debugging Machine Learning

Udacity

Udacity, Twitter

Real-Time Analytics with Apache Storm (Udacity)

Statistics & Data Analysis Data Science

The world is trending in real time! Learn from Twitter how to scalably process tweets, or any big data stream, in real time to drive d3 visualizations using Apache Storm, the "Hadoop of Real Time." Storm is free, open source, and fun to use! Learn from Karthik Ramasamy about the distributed, fault-tolerant, and flexible technology used to power Twitter’s real-time data flow pipeline. Twitter open sourced Storm in 2011, and it graduated to a top-level Apache project in September 2014.

Self Paced

Self-Paced

Data Analysis Hadoop Data Science