Big Data: adquisición y almacenamiento de datos (Coursera)

Big Data: adquisición y almacenamiento de datos (Coursera)

¿Estás interesado en tener un conocimiento más detallado sobre las herramientas y aplicaciones Big Data? En este curso aprenderás los principios para comprender la terminología, conceptos básicos y herramientas más importantes para resolver problemas de análisis de datos enfocándonos en los problemas y las aplicaciones. El objetivo es proporcionar una visión de sistema para entender los retos más importantes que nos encontramos cuando trabajamos en entornos con grandes volúmenes de datos.

Class Deals by MOOC List - Click here and see Coursera's Active Discounts, Deals, and Promo Codes.

En el curso se plantea una introducción a diversas herramientas utilizadas de forma común en la comunidad como Hadoop, Spark o Hive y tendrás que resolver diferentes retos de análisis de datos mediante su uso.
Al terminar el curso habrás adquirido conocimientos sobre el ecosistema de herramientas Big Data incluyendo ejemplos de uso con problemas industriales y científicos. Tendrás una serie de recursos sobre cómo un análisis a realizar se traduce en una serie de operaciones de recolección de datos, monitorización, almacenamiento, análisis y creación de informes sobre los resultados obtenidos. También adquirirás un criterio para elegir cuál es la herramienta más adecuada para resolver un cierto problema de análisis de datos a partir de los requerimientos de uso de las herramientas.
El curso está orientado tanto a estudiantes universitarios de primeros cursos de estudios universitarios relacionados con la informática, la ingeniería o las matemáticas, como a otros estudiantes con conocimientos de programación, interesados en aprender cómo utilizar de análisis de datos con herramientas de código abierto. Para realizar los ejercicios es necesario utilizar una máquina virtual que deberá ser instalada en tu ordenador.
Course 2 of 5 in the Big Data – Introducción al uso práctico de datos masivos Specialization.

Syllabus

WEEK 1
Introducción
La máquina virtual
A lo largo de estos cursos vamos a trabajar con un conjunto de herramientas contenidas en la máquina virtual Cloudera. En este apartado te explicamos cómo descargar e instalar dicha máquina virtual en tu ordenador. La MV-Cloudera requiere disponer de un equipo con las siguientes características: (1) máquina de 64 bits, (2) mínimo 6G de memoria (recomendable 8G), y (3) 20G disponibles en disco.
Ten en cuenta que bajar e instalar la máquina virtual te llevará tiempo dado el tamaño y complejidad de la misma
Introducción al ecosistema Apache Hadoop
En este módulo se van a introducir los conceptos básicos sobre el uso de Apache Hadoop y su utilización para plantear análisis de grandes conjuntos de datos. Se van a presentar las herramientas principales y la arquitectura del sistema.Visualiza los vídeos, contesta el cuestionario tantas veces como quieras, realiza el ejercicio práctico sobre Hadoop y HDFS, y accede a los foros para discutir los temas que te parezcan más interesantes.

WEEK 2
Tecnologías SQL y NoSQL. Consistencia, fiabilidad y escalabilidad
En este módulo se introducen conceptos básicos sobre la naturaleza de los datos a tratar y de qué forma los sistemas NoSQL se diferencian de las bases de datos relacionales. Se presenta el teorema CAP y se muestra su importancia en el contexto de los sistemas distribuidos. Finalmente, se muestran una serie de sistemas junto con su uso en la industria actual. Visualiza los vídeos, contesta el cuestionario tantas veces como quieras, y accede a los foros para discutir los temas que te parezcan más interesantes.

WEEK 3
Adquisición de datos
En este módulo se presentan los desafíos que hay que resolver a la hora de incorporar datos a los sistemas NoSQL y una breve introducción a las herramientas asociadas al ecosistema Hadoop más importantes. Visualiza los vídeos, contesta el cuestionario tantas veces como quieras, realiza el ejercicio práctico sobre Apache Scoop, y accede a los foros para discutir los temas que te parezcan más interesantes.

WEEK 4
Herramientas para el análisis de datos industrial
En este módulo se presenta el análisis industrial de grandes volúmenes de datos y se introducen una serie de herramientas y sistemas de segunda generación dedicados a resolver necesidades específicas de la industria.Visualiza los vídeos, contesta el cuestionario tantas veces como quieras, realiza los ejercicios prácticos sobre Apache Hive y Sparck, y accede a los foros para discutir los temas que te parezcan más interesantes.

Go to Class
MOOC List is learner-supported. When you buy through links on our site, we may earn an affiliate commission.

Related Courses

Statistical Inference (Coursera) Coursera
Johns Hopkins University

Statistical Inference (Coursera)

Statistical inference is the process of drawing conclusions about populations or scientific truths from data. There are many modes of performing inference including statistical modeling, data oriented strategies and explicit use of designs and randomization in analyses. Furthermore, there are broad theories (frequentists, Bayesian, likelihood, design based, …) and numerous complexities (missing data, observed and unobserved confounding, biases) for performing inference.

Jun 1st 2026
4 Weeks
Process Data from Dirty to Clean (Coursera) Coursera
Google

Process Data from Dirty to Clean (Coursera)

This is the fourth course in the Google Data Analytics Certificate. These courses will equip you with the skills needed to apply to introductory-level data analyst jobs. In this course, you’ll continue to build your understanding of data analytics and the concepts and tools that data analysts use in their work. You’ll learn how to check and clean your data using spreadsheets and SQL as well as how to verify and report your data cleaning results. Current Google data analysts will continue to instruct and provide you with hands-on ways to accomplish common data analyst tasks with the best tools and resources.

Jun 2nd 2026
5-12 Weeks
A Crash Course in Data Science (Coursera) Coursera
Johns Hopkins University

A Crash Course in Data Science (Coursera)

By now you have definitely heard about data science and big data. In this one-week class, we will provide a crash course in what these terms mean and how they play a role in successful organizations. This class is for anyone who wants to learn what all the data science action is about, including those who will eventually need to manage data scientists. The goal is to get you up to speed as quickly as possible on data science without all the fluff. We've designed this course to be as convenient as possible without sacrificing any of the essentials.

Jun 1st 2026
1 Week
Exploratory Data Analysis (Coursera) Coursera
Johns Hopkins University

Exploratory Data Analysis (Coursera)

This course covers the essential exploratory techniques for summarizing data. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data.

Jun 1st 2026
4 Weeks
Cloud Computing Concepts: Part 2 (Coursera) Coursera
University of Illinois at Urbana-Champaign

Cloud Computing Concepts: Part 2 (Coursera)

Cloud computing systems today, whether open-source or used inside companies, are built using a common set of core techniques, algorithms, and design philosophies—all centered around distributed systems. Learn about such fundamental distributed computing "concepts" for cloud computing. Some of these concepts include: Clouds, MapReduce, key-value stores, Classical precursors, Widely-used algorithms, Classical algorithms, Scalability, Trending areas, And more!

Jun 1st 2026
5-12 Weeks
Foundations: Data, Data, Everywhere (Coursera) Coursera
Google

Foundations: Data, Data, Everywhere (Coursera)

This is the first course in the Google Data Analytics Certificate. These courses will equip you with the skills you need to apply to introductory-level data analyst jobs. Organizations of all kinds need data analysts to help them improve their processes, identify opportunities and trends, launch new products, and make thoughtful decisions. In this course, you’ll be introduced to the world of data analytics through hands-on curriculum developed by Google. The material shared covers plenty of key data analytics topics, and it’s designed to give you an overview of what’s to come in the Google Data Analytics Certificate. Current Google data analysts will instruct and provide you with hands-on ways to accomplish common data analyst tasks with the best tools and resources.

Jun 2nd 2026
5-12 Weeks
Accounting Analytics (Coursera) Coursera
University of Pennsylvania

Accounting Analytics (Coursera)

Accounting Analytics explores how financial statement data and non-financial metrics can be linked to financial performance. In this course, taught by Wharton’s acclaimed accounting professors, you’ll learn how data is used to assess what drives financial performance and to forecast future financial scenarios. While many accounting and financial organizations deliver data, accounting analytics deploys that data to deliver insight, and this course will explore the many areas in which accounting data provides insight into other business areas including consumer behavior predictions, corporate strategy, risk management, optimization, and more.

Jun 1st 2026
4 Weeks
Teaching Impacts of Technology: Workplace of the Future (Coursera) Coursera
University of California, San Diego

Teaching Impacts of Technology: Workplace of the Future (Coursera)

In this course you’ll focus on how the Internet has enabled new careers and changed expectations in traditional work settings, creating a new vision for the workplace of the future. This will be done through a series of paired teaching sections, exploring a specific “Impact of Computing” in your typical day and the “Technologies and Computing Concepts” that enable that impact, all at a K12-appropriate level.

Jun 3rd 2026
4 Weeks
Share Data Through the Art of Visualization (Coursera) Coursera
Google

Share Data Through the Art of Visualization (Coursera)

This is the sixth course in the Google Data Analytics Certificate. These courses will equip you with the skills needed to apply to introductory-level data analyst jobs. You’ll learn how to visualize and present your data findings as you complete the data analysis process. This course will show you how data visualizations, such as visual dashboards, can help bring your data to life. You’ll also explore Tableau, a data visualization platform that will help you create effective visualizations for your presentations.

Jun 2nd 2026
4 Weeks
Machine Learning for Data Analysis (Coursera) Coursera
Wesleyan University

Machine Learning for Data Analysis (Coursera)

Are you interested in predicting future outcomes using your data? This course helps you do just that! Machine learning is the process of developing, testing, and applying predictive algorithms to achieve this goal. Make sure to familiarize yourself with course 3 of this specialization before diving into these machine learning concepts. Building on Course 3, which introduces students to integral supervised machine learning concepts, this course will provide an overview of many additional concepts, techniques, and algorithms in machine learning, from basic classification to decision trees and clustering.

Jun 1st 2026
4 Weeks
Big Data Science with the BD2K-LINCS Data Coordination and Integration Center (Coursera) Coursera
Icahn School of Medicine at Mount Sinai

Big Data Science with the BD2K-LINCS Data Coordination and Integration Center (Coursera)

In this course we briefly introduce the DCIC and the various Centers that collect data for LINCS. We then cover metadata and how metadata is linked to ontologies. We then present data processing and normalization methods to clean and harmonize LINCS data. This follow discussions about how data is served as RESTful APIs. Most importantly, the course covers computational methods including: data clustering, gene-set enrichment analysis, interactive data visualization, and supervised learning. Finally, we introduce crowdsourcing/citizen-science projects where students can work together in teams to extract expression signatures from public databases and then query such collections of signatures against LINCS data for predicting small molecules as potential therapeutics.

Jun 1st 2026
5-12 Weeks