Working with huge data volumes is hard: to move a mountain, you have to deal with a lot of small stones. But why strain yourself? MapReduce and Spark solve only part of the problem, which leaves room for higher-level tools. Stop struggling to make your big data workflow productive and efficient; make use of the tools we are offering you.
This course will teach you how to:
- Warehouse your data efficiently using Hive, Spark SQL and Spark DataFrames.
- Work with large graphs, such as social graphs or networks.
- Optimize your Spark applications for maximum performance.
Specifically, you will build skills in:
- Writing and executing Hive & Spark SQL queries;
- Reasoning about how queries are translated into actual execution primitives (be they MapReduce jobs or Spark transformations);
- Organizing your data in Hive to optimize disk space usage and execution times;
- Constructing Spark DataFrames and using them to write ad-hoc analytical jobs easily;
- Processing large graphs with Spark GraphFrames;
- Debugging, profiling and optimizing Spark application performance.
Still in doubt? Check this out. Become a data ninja by taking this course!
Syllabus
WEEK 1: Welcome to the Second Course: Big Data Analysis; Big Data SQL: Hive
WEEK 2: Big Data SQL: Hive (practice week)
WEEK 3: Spark SQL and Spark DataFrame
WEEK 4: Graph Analysis from Big Data Perspective
WEEK 5: PageRank and Recent Advances
WEEK 6: Spark Internals and Optimization