Learn the theory of big data systems and gain hands-on experience with them, using Spark as the exemplary platform. Big data systems such as Hadoop and Spark have emerged as enabling technologies for managing massive amounts of data across hundreds or even thousands of computing nodes. Meanwhile, cloud computing platforms have made these technologies easily accessible to individuals as well as large enterprises.
This course gives students both a theoretical grounding in big data systems and hands-on experience with them, using Spark as the exemplary platform.
What you'll learn
- Spark programming using both RDD and DataFrame APIs
- Useful packages including ML, GraphX/GraphFrames, and Spark Streaming
- Spark internals and performance optimizations
- Algorithm design for big data systems
Syllabus
Week 1: Overview, MapReduce, and Hadoop
Weeks 2-3: Spark Basics and RDD
Week 4: Spark SQL and MLlib
Week 5: Spark internals
Week 6: Algorithm design for big data
Week 7: GraphX/GraphFrames
Week 8: Spark Streaming