MOOC List is learner-supported. When you buy through links on our site, we may earn an affiliate commission.
In this course, part of the Big Data MicroMasters program, you will learn how big data is driving organizational change and the key challenges organizations face when trying to analyse massive data sets.
You will learn fundamental techniques, such as data mining and stream processing. You will also learn how to design and implement PageRank algorithms using MapReduce, a programming paradigm that allows for massive scalability across hundreds or thousands of servers in a Hadoop cluster. You will learn how big data has improved web search and how online advertising systems work.
By the end of this course, you will have a better understanding of the various applications of big data methods in industry and research.
This course is part of the Big Data MicroMasters.
What you'll learn
- Knowledge and application of MapReduce
- Understanding the rate of occurrence of events in big data
- How to design algorithms for stream processing and counting of frequent elements in Big Data
- Understand and design PageRank algorithms
- Understand underlying random walk algorithms
Prerequisites: Candidates interested in pursuing the MicroMasters program in Big Data are advised to complete "Programming for Data Science" and "Computational Thinking and Big Data" before undertaking this course.
Course Syllabus
Section 1: The basics of working with big data
Understand the four V’s of Big Data (Volume, Velocity, Variety, and Veracity)
Build models for data
Understand the occurrence of rare events in random data
Section 2: Web and social networks
Understand characteristics of the web and social networks
Model social networks
Apply algorithms for community detection in networks
Section 3: Clustering big data
Clustering social networks
Apply hierarchical clustering
Apply k-means clustering
Section 4: Google web search
Understand the concept of PageRank
Implement the basic PageRank algorithm for strongly connected graphs
Implement PageRank with taxation for graphs that are not strongly connected
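As an illustration of the topic in Section 4, here is a minimal sketch of PageRank with taxation in plain Python. It is not taken from the course materials: the function name `pagerank`, the damping value `beta=0.85`, the fixed iteration count, and the handling of dead ends (spreading their rank evenly over all nodes) are assumptions made for this example.

```python
def pagerank(links, beta=0.85, iterations=50):
    """Iterative PageRank with taxation (teleportation).

    links maps each node to the list of nodes it links to.
    beta is the probability of following a link; with probability
    1 - beta the random surfer jumps to a uniformly chosen node.
    """
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node starts with the "tax" share redistributed by teleporting.
        new_rank = {node: (1.0 - beta) / n for node in nodes}
        for node in nodes:
            outs = links.get(node, [])
            if outs:
                share = beta * rank[node] / len(outs)
                for target in outs:
                    new_rank[target] += share
            else:
                # Assumption: a dead end spreads its rank over all nodes.
                share = beta * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: D links to C but nothing links to D.
example_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(example_graph)
```

In this small graph, page D receives only the teleportation share, so it ends up with the lowest rank, while the ranks of all pages still sum to 1.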
Section 5: Parallel and distributed computing using MapReduce
Understand the architecture for massive distributed and parallel computing
Apply MapReduce using Hadoop
Compute PageRank using MapReduce
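To give a feel for the MapReduce paradigm covered in Section 5, the following sketch simulates the map, shuffle, and reduce phases in memory with plain Python. It does not use Hadoop, and the helper names `run_mapreduce`, `word_mapper`, and `count_reducer` are hypothetical names chosen for this example.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Run a single in-memory map -> shuffle -> reduce pass."""
    groups = defaultdict(list)
    # Map phase: each record yields zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle phase: group values by key.
            groups[key].append(value)
    # Reduce phase: combine all values that share a key.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def count_reducer(word, counts):
    """Sum the partial counts for one word."""
    return sum(counts)


counts = run_mapreduce(["big data", "big clusters"], word_mapper, count_reducer)
```

On a real cluster the map and reduce calls would run on many machines and the shuffle would move data across the network; the sketch only shows how the three phases divide the work.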
Section 6: Computing similar documents in big data
Measure importance of words in a collection of documents
Measure similarity of sets and documents
Apply locality-sensitive hashing to compute similar documents
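One common way to measure the similarity of sets and documents, as named in Section 6, is the Jaccard similarity of character shingles. The sketch below shows only that similarity measure, not a full hashing pipeline; the function names and the shingle length `k=3` are assumptions made for this example.

```python
def shingles(text, k=3):
    """Return the set of all k-character substrings of the text."""
    text = " ".join(text.lower().split())  # normalise case and whitespace
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of intersection over size of union."""
    if not a and not b:
        return 1.0  # assumption: two empty sets count as identical
    return len(a & b) / len(a | b)


similarity = jaccard(shingles("abcd", k=2), shingles("abce", k=2))
```

Here the two strings share the shingles "ab" and "bc" out of four distinct shingles in total, so the similarity is 0.5.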
Section 7: Products frequently bought together in stores
Understand the importance of frequent item sets
Design association rules
Implement the A-priori algorithm
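The core idea of the A-priori algorithm from Section 7 can be sketched for item pairs: a pair can only be frequent if both of its items are frequent, so the second pass counts only pairs of frequent items. The function name `frequent_pairs`, the `min_support` parameter, and the sample baskets are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Two-pass A-priori for frequent item pairs."""
    # Pass 1: count individual items (each item once per basket).
    item_counts = Counter(item for basket in baskets for item in set(basket))
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}

    # Pass 2: count only pairs whose items are both frequent.
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(set(basket) & frequent_items)
        pair_counts.update(combinations(kept, 2))

    return {pair: c for pair, c in pair_counts.items() if c >= min_support}


# Hypothetical shopping baskets.
baskets = [
    ["milk", "bread", "jam"],
    ["milk", "bread", "eggs"],
    ["bread", "eggs"],
    ["milk", "eggs"],
]
pairs = frequent_pairs(baskets, min_support=2)
```

Because "jam" appears in only one basket, it is discarded after the first pass and no pair containing it is ever counted.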
Section 8: Movie and music recommendations
Understand the differences between types of recommendation systems
Design content-based recommendation systems
Design collaborative filtering recommendation systems
Section 9: Google's AdWords™ System
Understand the AdWords System
Analyse online algorithms in terms of competitive ratio
Use online matching to solve the AdWords problem
Section 10: Mining rapidly arriving data streams
Understand types of queries for data streams
Analyse sampling methods for data streams
Count distinct elements in data streams
Filter data streams
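A Bloom filter is one widely used structure for filtering data streams, the last topic in Section 10; whether the course uses this exact structure is an assumption. The sketch below is a simplified version: the class name, the default sizes, and the use of salted SHA-256 digests as hash functions are choices made for this example.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive several hash positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True means "possibly seen"; False means "definitely not seen".
        return all(self.bits[pos] for pos in self._positions(item))


allowed = BloomFilter()
allowed.add("alice@example.com")
seen = allowed.might_contain("alice@example.com")
```

An item that was added is always reported as possibly present, while an item that was never added is usually, but not always, rejected; the false-positive rate depends on the number of bits, hash functions, and stored items.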