Coursera

Cloud Computing Applications, Part 2: Big Data and Applications in the Cloud (Coursera)

Offered by the University of Illinois at Urbana-Champaign.

Welcome to the Cloud Computing Applications course, the second part of a two-course series designed to give you a comprehensive view of the world of Cloud Computing and Big Data! In this second course we continue Cloud Computing Applications by exploring how the Cloud opens up analytics on huge volumes of data that are static or streamed at high velocity and that represent an enormous variety of information. Cloud applications and data analytics represent a disruptive change in the ways that society is informed by, and uses, information.


We start the first week by introducing some major systems for data analysis, including Spark, and the major frameworks and distributions of analytics applications, including Hortonworks, Cloudera, and MapR. By the middle of week one we introduce HDFS, the robust distributed file system used by many applications such as Hadoop. We finish week one by exploring the powerful MapReduce programming model and how distributed operating systems like YARN and Mesos support a flexible and scalable environment for Big Data analytics.

In week two, our course introduces large scale data storage and the difficulties of reaching consensus in enormous stores that use large numbers of processors, memories, and disks. We discuss eventual consistency, ACID, and BASE, along with the consensus algorithms and coordination services used in data centers, including Paxos and ZooKeeper. Our course presents distributed key-value stores and in-memory databases like Redis, which are used in data centers for performance. Next we present NoSQL databases. We visit HBase, the scalable, low-latency database that supports database operations in applications that use Hadoop. We then show how Spark SQL can run SQL queries on huge data sets. We finish week two with a presentation on distributed publish/subscribe systems using Kafka, a distributed log messaging system that is finding wide use in connecting Big Data and streaming applications together to form complex systems.

Week three moves to fast data and real-time streaming. It introduces Storm, a technology used widely at companies such as Yahoo. We continue with Spark Streaming, Lambda and Kappa architectures, and a presentation of the streaming ecosystem.

Week four focuses on graph processing, machine learning, and deep learning. We introduce the ideas of graph processing and present Pregel, Giraph, and Spark GraphX. Then we move to machine learning with examples from Mahout and Spark; k-means, Naive Bayes, and frequent pattern mining (FPM) are given as examples. Spark ML and MLlib continue the theme of programmability and application construction. The last topic in week four introduces deep learning technologies including Theano, TensorFlow, CNTK, MXNet, and Caffe on Spark.

Course 4 of 6 in the Cloud Computing Specialization.

Syllabus

WEEK 1
Course Orientation
You will become familiar with the course, your classmates, and our learning environment. The orientation will also help you obtain the technical skills required for the course.
Spark, Hortonworks, HDFS, CAP
In Module 1, we introduce you to the world of Big Data applications. We start by introducing you to Apache Spark, a common framework used for many different tasks throughout the course. We then introduce some Big Data distro packages, the HDFS file system, and finally the idea of batch-based Big Data processing using the MapReduce programming paradigm.
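To make the MapReduce paradigm mentioned above more concrete, here is a minimal single-process sketch of the classic word-count job. It is not course material: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real framework such as Hadoop would run the map and reduce steps in parallel across many machines rather than in one Python process.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, standing in for the framework's shuffle step."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values that share a key into a single count."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over a small in-memory input."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data in the cloud", "big data analytics"]))
```

The point of the split is that `map_phase` and `reduce_phase` each see only one record or one key at a time, which is what lets a framework distribute them over a cluster.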

WEEK 2
Large Scale Data Storage
In this module, you will learn about large scale data storage technologies and frameworks. We start by exploring the challenges of storing large data in distributed systems. We then discuss in-memory key/value storage systems, NoSQL distributed databases, and distributed publish/subscribe queues.

WEEK 3
Streaming Systems
This module introduces you to real-time streaming systems, also known as Fast Data. We cover Apache Storm at length, followed by Apache Spark Streaming and the Lambda and Kappa architectures. Finally, we compare these technologies as parts of a streaming ecosystem.

WEEK 4
Graph Processing and Machine Learning
In this module, we discuss the applications of Big Data. In particular, we focus on two topics: graph processing, where massive graphs (such as the web graph) are processed for information, and machine learning, where massive amounts of data are used for tasks such as clustering and frequent pattern mining. We also introduce you to deep learning, where large data sets are used to train neural networks with effective results.

