Mar 21st 2016

Processing Big Data with Hadoop in Azure HDInsight (edX)

Learn how to use Hadoop technologies in Microsoft Azure HDInsight to process big data in this five-week, hands-on course.

More and more organizations are taking on the challenge of analyzing big data. This course teaches you how to use the Hadoop technologies in Microsoft Azure HDInsight to build batch processing solutions that cleanse and reshape data for analysis. In this five-week course, you’ll learn how to use technologies like Hive, Pig, Sqoop, and Oozie with Hadoop in HDInsight, and how to orchestrate big data processing from Windows PowerShell.

This course is the first in a series that explores big data and advanced analytics techniques with HDInsight, and it focuses on batch processing techniques. Later courses in the series will build on the concepts taught here to cover predictive analytics with R and Mahout, and real-time big data processing with Storm and HBase.

Note: To complete the hands-on elements in this course, you will need an Azure subscription and a Windows client computer. You can sign up for a free Azure trial subscription (a valid credit card is required for verification, but you will not be charged for Azure services). The free trial is not available in all regions. It is possible to complete the course and earn a certificate without completing the hands-on practices.

In this course, you’ll learn how to:

- Provision an HDInsight cluster.

- Use PowerShell to manage HDInsight and run data processing jobs.

- Create and query Hive tables.

- Use advanced Hive techniques.

- Process data using Pig.

- Use custom Python user-defined functions from Hive and Pig.

- Transfer data between HDInsight and databases using Sqoop.

- Define and run workflows for data processing using Oozie.


Prerequisites:

- Familiarity with database concepts and basic SQL query syntax

- Familiarity with programming fundamentals (for example, variable assignment, loops, conditional logic)

- Experience with Microsoft technologies such as Windows and Excel

- Experience with Visual Studio and Azure is preferable but not required

- A willingness to learn actively and persevere when troubleshooting technical problems is essential