Intro to Hadoop and MapReduce (Udacity)

Offered by Udacity and Cloudera

How to Process Big Data. The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. Learn the fundamental principles behind it, and how you can use its power to make sense of your Big Data.

What You Will Learn

Lesson 1
Big Data

  • What is Big Data?
  • The problems big data creates.
  • How Apache Hadoop addresses these problems.

Lesson 2
HDFS and MapReduce

  • Discover how HDFS distributes data over multiple computers.
  • Learn how MapReduce enables analyzing datasets in parallel across multiple machines.
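
As a rough illustration of the MapReduce idea in this lesson, the sketch below simulates the map, shuffle, and reduce phases in plain Python on a single machine. The sample records, the store/amount layout, and the function names (map_record, shuffle, reduce_group, run_job) are assumptions made for this example and are not taken from the course materials. A real Hadoop job would run the map and reduce steps on different machines over data stored in HDFS.

```python
from collections import defaultdict

# Hypothetical input: one "store<TAB>amount" record per line.
RECORDS = [
    "Miami\t12.50",
    "Austin\t7.25",
    "Miami\t3.00",
    "Boston\t20.00",
    "Austin\t1.75",
]

def map_record(line):
    """Map phase: turn one input line into a (key, value) pair."""
    store, amount = line.split("\t")
    return store, float(amount)

def shuffle(pairs):
    """Shuffle phase: group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_group(key, values):
    """Reduce phase: collapse one key's values into a single result."""
    return key, sum(values)

def run_job(records):
    """Run the three phases in sequence and return totals per key."""
    mapped = [map_record(line) for line in records]
    grouped = shuffle(mapped)
    return dict(reduce_group(k, v) for k, v in sorted(grouped.items()))

totals = run_job(RECORDS)
print(totals)
```

Because each call to map_record looks at one line only, and each call to reduce_group looks at one key only, both phases can in principle be spread over many machines without the calls needing to coordinate.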

Lesson 3
MapReduce code

  • Write your own MapReduce code.
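
One common way to write MapReduce code in Python is as a mapper and a reducer that consume text lines and emit tab-separated key/value lines, which is the convention Hadoop Streaming uses. The sketch below assumes that style; whether the course uses exactly this layout is an assumption, and the names mapper and reducer and the sample sentences are hypothetical. Both steps are written as functions over iterables of lines, rather than reading standard input directly, so they can be tried without a cluster.

```python
def mapper(lines):
    """Emit 'word<TAB>1' for every word in the input lines."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    """Sum the counts for each key.

    The input must already be sorted by key, which is how a
    reducer receives its data after the sort/shuffle step.
    """
    current_key, total = None, 0
    for line in sorted_lines:
        key, value = line.rstrip("\n").split("\t")
        if key != current_key:
            # A new key starts: flush the total for the previous one.
            if current_key is not None:
                yield f"{current_key}\t{total}"
            current_key, total = key, 0
        total += int(value)
    if current_key is not None:
        yield f"{current_key}\t{total}"

sample = ["big data big ideas", "Data moves"]
mapped = sorted(mapper(sample))  # stand-in for the sort/shuffle step
counts = list(reducer(mapped))
print(counts)
```

The reducer keeps only one running total in memory at a time, which is why it relies on the input being sorted by key.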

Lesson 4
MapReduce Design Patterns

  • Use common patterns for MapReduce programs to analyze Udacity forum data.

What you will learn:

  • Understand how Hadoop fits into the world (recognize the problems it solves)
  • Understand the concepts of HDFS and MapReduce (find out how Hadoop solves those problems)
  • Write MapReduce programs (see how we solve the problems)
  • Practice solving problems on your own

Prerequisites and Requirements
Lesson 1 does not have technical prerequisites and is a good overview of Hadoop and MapReduce for managers. To get the most out of the class, however, you need basic programming skills in Python at the level provided by introductory courses like our Introduction to Computer Science course.

Related Courses

Intro to AJAX (Udacity)
Udacity

Making Asynchronous Requests with jQuery. In this course you will learn how to make asynchronous requests with JavaScript (using jQuery’s AJAX functionality), and gain a better understanding of what’s actually happening when you do so. You will also learn how to use data APIs so you can take advantage of freely accessible data in your applications, including photo results, news articles and up-to-date data about the world around us.

Self-Paced

Intro to jQuery (Udacity)
Udacity

Manipulating Websites with Ease. jQuery is the most popular JavaScript library today, in use by over 60% of the top 100,000 most visited websites. This course will teach you how to use jQuery’s core features: DOM element selection, traversal, and manipulation. You'll also learn how to read and make sense of jQuery's documentation, making it easy for you to go beyond the methods taught in this class and take advantage of jQuery's full array of features!

Self-Paced

C++ For Programmers (Udacity)
Udacity

Learn features and constructs for C++. C++ for Programmers is designed for students who are familiar with a programming language and wish to learn C++. This course focuses on 'how' as opposed to 'what'. For example, in the lesson on functions, we do not teach what a function is, but rather how to create a function in C++. The lessons are taught by several different instructors who have used C++ in their professional careers, so students get to experience different perspectives.

Self-Paced

Website Performance Optimization (Udacity)
Udacity, Google

The Critical Rendering Path. You will learn how to optimize any website for speed by diving into the details of how mobile and desktop browsers render pages. In this short course, you’ll learn about the Critical Rendering Path, or the set of steps browsers must take to convert HTML, CSS and JavaScript into living, breathing websites. From there, you’ll start exploring and experimenting with tools to measure performance and simple strategies to deliver the first pixels to the screen as early as possible.

Self-Paced

Model Building and Validation (Udacity)
Udacity

Advanced Techniques for Analyzing Data. This course will teach you how to start from scratch in answering questions about the real world using data. Machine learning happens to be a small part of this process. The model building process involves setting up ways of collecting data, understanding and paying attention to what is important in the data to answer the questions you are asking, and finding a statistical, mathematical, or simulation model to gain understanding and make predictions.

Self-Paced

HTML5 Canvas (Udacity)
Udacity

From Pixels to Animation! Canvas is an HTML5 element that gives you a drawable surface inside your web pages that you can control with JavaScript. It is powerful enough to use for compositing images and even creating games. In this course, through several sample projects, you’ll learn how to use the canvas; how to make compositions using shapes, images, and text; how to create effects and filters on images; and how to create animations.

Self-Paced

Spark (Udacity)
Udacity, Insight

Master how to work with big data and build machine learning models at scale using Spark! In this course, you’ll learn how to wrangle and model massive datasets with PySpark, the Python library for interacting with Spark. In the first lesson, you will learn about big data and how Spark fits into the big data ecosystem. In lesson two, you will practice processing and cleaning datasets to get comfortable with Spark’s SQL and DataFrame APIs. In the third lesson, you will debug and optimize your Spark code when running on a cluster. In lesson four, you will use Spark’s Machine Learning Library to train machine learning models at scale.

Self-Paced

Data Wrangling with MongoDB (Udacity)
Udacity, MongoDB University

In this course, we will explore how to wrangle data from diverse sources and shape it to enable data-driven applications. Some data scientists spend the bulk of their time doing this! Students will learn how to gather and extract data from widely used data formats. They will learn how to assess the quality of data and explore best practices for data cleaning. We will also introduce students to MongoDB, covering the essentials of storing data and the MongoDB query language together with exploratory analysis using the MongoDB aggregation framework.

Self-Paced

Real-Time Analytics with Apache Storm (Udacity)
Udacity, Twitter

The world is trending in real time! Learn from Twitter how to scalably process tweets, or any big data stream, in real time to drive d3 visualizations using Apache Storm, the "Hadoop of Real Time." Storm is free, open source, and fun to use! Learn from Karthik Ramasamy about the distributed, fault-tolerant, and flexible technology used to power Twitter’s real-time data flow pipeline. Twitter open sourced Storm in 2011, and it graduated to a top-level Apache project in September 2014.

Self-Paced