Managing Big Data with R and Hadoop (FutureLearn)

Free Course
This course is designed for people interested in data science, computational statistics and machine learning.

MOOC List is learner-supported. When you buy through links on our site, we may earn an affiliate commission.

Learn how to manage and analyse big data using the R programming language and the Hadoop framework. This course gives you access to a virtual environment with Hadoop, R and RStudio installed, so you can get hands-on experience with big data management. Several unique examples from statistical learning, together with the related R code for map-reduce operations, will be available for testing and learning.


Those with basic knowledge of statistical learning and R will better understand the underlying methods and how to run them in parallel using map-reduce functions and Hadoop data storage. At the end of the course you will get access to RHadoop on a supercomputer at the University of Ljubljana.


Syllabus


Week 1: Welcome to BIG DATA

Week 2: Working with Hadoop

Week 3: First steps in R and RHadoop

Week 4: Statistical learning with RHadoop: clustering

Week 5: Statistical learning with RHadoop: regression and classification


By the end of the course, you will:

- Explore the basic functionality of Apache Hadoop and RHadoop

- Experiment with achieving the performance of modern supercomputing

- Experiment with regression, clustering and classification in RHadoop

- Investigate the basic functionality of the Bash terminal window

- Apply knowledge of statistical learning to data instances provided by the educators

- Perform big data management with RHadoop on a real supercomputer provided by the University of Ljubljana


Who is the course for?

This course is designed for people who are interested in data science, computational statistics and machine learning and who have basic experience with them. It will also be useful for advanced undergraduate students and first-year PhD students in data analysis, statistics or bioinformatics who wish to understand how to manage big data with Hadoop using the R programming language.

We expect that learners will also have basic experience with Linux and Bash, and working experience with R and matrix operations. They should also be able to download and run a virtual machine.


