Apr 3rd 2017

Managing Big Data with R and Hadoop (FutureLearn)

Learn how to manage and analyse big data using the R programming language and the Hadoop programming framework. This online course will introduce you to various high-performance computing (HPC) facilities for big data analysis. These include R (a programming language renowned for its simplicity, elegance and community support) and Hadoop (an open-source, Java-based programming framework for processing large data sets).

You will find out how to use these tools and avoid common pitfalls, saving time and money.

What topics will you cover?

- First steps in R and RStudio

- Working with Apache Hadoop, part 1: Fundamentals

- Working with Apache Hadoop, part 2: RHadoop

- Statistical learning using RHadoop

By the end of the course, you will:

- Understand how modern supercomputers achieve their performance

- Understand the basic functionality of the Bash terminal window

- Understand the basic functionality of Apache Hadoop for scalable, distributed computing

- Understand the basic functionality of RHadoop

- Understand the basic problems of supervised and unsupervised learning

- Perform basic clustering, regression and classification with RHadoop.