Managing Big Data in Clusters and Cloud Storage (Coursera)

Offered by Cloudera

In this course, you'll learn how to manage big datasets, how to load them into clusters and cloud storage, and how to apply structure to the data so that you can run queries on it using distributed SQL engines like Apache Hive and Apache Impala. You’ll learn how to choose the right data types, storage systems, and file formats based on which tools you’ll use and what performance you need.

By the end of the course, you will be able to:
• use different tools to browse existing databases and tables in big data systems;
• use different tools to explore files in distributed big data filesystems and cloud storage;
• create and manage big data databases and tables using Apache Hive and Apache Impala; and
• describe and choose among different data types and file formats for big data systems.

To use the hands-on environment for this course, you need to download and install a virtual machine and the software on which to run it. Before continuing, be sure that you have access to a computer that meets the following hardware and software requirements:
• Windows, macOS, or Linux operating system (iPads and Android tablets will not work)
• 64-bit operating system (32-bit operating systems will not work)
• 8 GB RAM or more
• 25 GB free disk space or more
• Intel VT-x or AMD-V virtualization support enabled (on Mac computers with Intel processors, this is always enabled; on Windows and Linux computers, you might need to enable it in the BIOS)
• For Windows XP computers only: You must have an unzip utility such as 7-Zip or WinZip installed (Windows XP’s built-in unzip utility will not work)

Course 3 of 3 in the Modern Big Data Analysis with SQL Specialization.

Syllabus

WEEK 1: Orientation to Data in Clusters and Cloud Storage
WEEK 2: Defining Databases, Tables, and Columns
WEEK 3: Data Types and File Types
WEEK 4: Managing Datasets in Clusters and Cloud Storage
WEEK 5: Optimizing Hive and Impala (Honors)

Related Courses

Graph Analytics for Big Data (Coursera) Coursera
University of California, San Diego

Want to understand your data network structure and how it changes under different conditions? Curious to know how to identify closely interacting clusters within a graph? Have you heard of the fast-growing area of graph analytics and want to learn more? This course gives you a broad overview of the field of graph analytics so you can learn new ways to model, store, retrieve and analyze graph-structured data.

Jun 8th 2026
5-12 Weeks
Programming Mobile Applications for Android Handheld Systems: Part 2 (Coursera) Coursera
University of Maryland, College Park

This course introduces you to the design and implementation of Android applications for mobile devices. You will build upon concepts from the prior course, including handling notifications, using multimedia and graphics and incorporating touch and gestures into your apps.

Jun 8th 2026
5-12 Weeks
Inferential Statistics (Coursera) Coursera
University of Amsterdam

Inferential statistics are concerned with making inferences from relations found in the sample to relations in the population. Inferential statistics help us decide, for example, whether the differences between groups that we see in our data are strong enough to provide support for our hypothesis that group differences exist in general, in the entire population. We will start by considering the basic principles of significance testing: the sampling and test statistic distribution, p-value, significance level, power and type I and type II errors. Then we will consider a large number of statistical tests and techniques that help us make inferences for different types of data and different types of research designs.

Jun 8th 2026
5-12 Weeks
The Importance of Listening (Coursera) Coursera
Northwestern University

In this second MOOC in the Social Marketing Specialization - "The Importance of Listening" - you will go deep into the Big Data of social and gain a more complete picture of what can be learned from interactions on social sites. You will be amazed at just how much information can be extracted from a single post, picture, or video.

Jun 8th 2026
4 Weeks
Business Intelligence Concepts, Tools, and Applications (Coursera) Coursera
University of Colorado System

This is the fourth course in the Data Warehouse for Business Intelligence specialization. Ideally, the courses should be taken in sequence. In this course, you will gain the knowledge and skills for using data warehouses for business intelligence purposes and for working as a business intelligence developer. You’ll have the opportunity to work with large data sets in a data warehouse environment and will learn the use of MicroStrategy's Online Analytical Processing (OLAP) and Visualization capabilities to create visualizations and dashboards.

Jun 8th 2026
5-12 Weeks
Research Data Management and Sharing (Coursera) Coursera
University of Edinburgh, University of North Carolina

This course will provide learners with an introduction to research data management and sharing. After completing this course, learners will understand the diversity of data and their management needs across the research data lifecycle, be able to identify the components of good data management plans, and be familiar with best practices for working with data including the organization, documentation, and storage and security of data. Learners will also understand the impetus and importance of archiving and sharing data as well as how to assess the trustworthiness of repositories.

Jun 8th 2026
5-12 Weeks
Data Manipulation at Scale: Systems and Algorithms (Coursera) Coursera
University of Washington

Data analysis has replaced data acquisition as the bottleneck to evidence-based decision making; we are drowning in data. Extracting knowledge from large, heterogeneous, and noisy datasets requires not only powerful computing resources, but also the programming abstractions to use them effectively. The abstractions that emerged in the last decade blend ideas from parallel databases, distributed systems, and programming languages to create a new class of scalable data analytics platforms that form the foundation for data science at realistic scales.

Jun 8th 2026
4 Weeks
GIS Data Formats, Design and Quality (Coursera) Coursera
University of California, Davis

In this course, the second in the Geographic Information Systems (GIS) Specialization, you will learn to design data tables and to separate and join data in a relational database; write query strings to subset data; create and work with raster data; and create web maps.

Jun 8th 2026
4 Weeks
Machine Learning Foundations: A Case Study Approach (Coursera) Coursera
University of Washington

Do you have data and wonder what it can tell you? Do you need a deeper understanding of the core ways in which machine learning can improve your business? Do you want to be able to converse with specialists about anything from regression and classification to deep learning and recommender systems? In this course, you will get hands-on experience with machine learning from a series of practical case-studies.

Jun 8th 2026
5-12 Weeks
Statistical Inference (Coursera) Coursera
Johns Hopkins University

Statistical inference is the process of drawing conclusions about populations or scientific truths from data. There are many modes of performing inference, including statistical modeling, data-oriented strategies, and explicit use of designs and randomization in analyses. Furthermore, there are broad theories (frequentist, Bayesian, likelihood, design-based, …) and numerous complexities (missing data, observed and unobserved confounding, biases) for performing inference.

Jun 8th 2026
4 Weeks
Big Data, Genes, and Medicine (Coursera) Coursera
The State University of New York

This course distills for you expert knowledge and skills mastered by professionals in Health Big Data Science and Bioinformatics. You will learn exciting facts about human biology and chemistry, genetics, and medicine, intertwined with the science of Big Data and the skills to harness the avalanche of openly available data that we are just starting to make sense of.

Jun 8th 2026
5-12 Weeks
Cloud Computing Applications, Part 2: Big Data and Applications in the Cloud (Coursera) Coursera
University of Illinois at Urbana-Champaign

Welcome to the Cloud Computing Applications course, the second part of a two-course series designed to give you a comprehensive view of the world of Cloud Computing and Big Data! In this second course we continue Cloud Computing Applications by exploring how the Cloud opens up data analytics of huge volumes of data that are static or streamed at high velocity and represent an enormous variety of information. Cloud applications and data analytics represent a disruptive change in the ways that society is informed by, and uses, information.

Jun 8th 2026
4 Weeks