Big Data: capstone project (Coursera)

In this final course of the Big Data Specialization, students apply some of the tools and methods learned in the previous courses to a practical case. The goal of this Capstone Project is to show an example of the day-to-day work carried out in the Cosmology department of the Port d’Informació Científica in Barcelona.

The task is to build a classifier for galaxy images, using data from the GalaxyZoo project together with images and data from the Sloan Digital Sky Survey telescope. Guided assignments and exercises lead students through exploring and analyzing these data, ending with an automatic Machine Learning tool.
The process followed in this course could be applied in any other discipline, for example in the social sciences, in market research, or in any field that involves making decisions from a large volume of data.
Course 5 of 5 in the Big Data – Introducción al uso práctico de datos masivos Specialization.

Syllabus

WEEK 1
Introduction
The Virtual Machine
NOTE: If you already installed the virtual machine in the previous course of the Specialization, you do not need to install it again. Otherwise, this section explains how to download and install it on your computer. The MV-Cloudera virtual machine requires a computer with: (1) a 64-bit machine, (2) at least 6 GB of memory (8 GB recommended), and (3) 20 GB of free disk space. Keep in mind that downloading and installing the virtual machine will take time, given its size and complexity.
Data Exploration
This week we get to know the project and make a first exploration of some of the data we will be working with. We become familiar with the contents of these files and do the preliminary work needed to apply it later to large volumes of data.

WEEK 2
Data Model
This week we learn to load the data into Hive, build its data model, and understand the task of classifying a galaxy by its shape.

WEEK 3
Classification
This week we normalize a data model, study in depth the votes provided by users, and generate the information needed to build an automatic classifier.
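
As a rough illustration of how user votes might be turned into training labels, here is a minimal Python sketch. It is not taken from the course material: the class names, the `vote_fractions` and `assign_label` helpers, and the 0.8 agreement threshold are all assumptions chosen for the example.

```python
def vote_fractions(votes):
    """Turn raw vote counts per class into fractions of the total.

    `votes` maps a class name to a vote count. The class names used
    below are hypothetical, not the course's actual column names.
    """
    total = sum(votes.values())
    if total == 0:
        # No votes at all: every fraction is zero.
        return {label: 0.0 for label in votes}
    return {label: count / total for label, count in votes.items()}


def assign_label(votes, threshold=0.8):
    """Return the majority class if its vote fraction reaches `threshold`.

    The 0.8 default is an assumed cut-off; galaxies with weaker
    agreement among voters are marked "uncertain".
    """
    fractions = vote_fractions(votes)
    if not fractions:
        return "uncertain"
    best = max(fractions, key=fractions.get)
    return best if fractions[best] >= threshold else "uncertain"


# Hypothetical vote counts for a single galaxy.
example_votes = {"elliptical": 45, "spiral": 5, "merger": 0}
example_label = assign_label(example_votes)
```

Galaxies labelled "uncertain" could be left out of the training set so that the classifier learns only from objects on which voters clearly agree.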

WEEK 4
Machine Learning
This week we introduce the galaxy image dataset and prepare two Artificial Intelligence algorithms to classify galaxies automatically from an image.

WEEK 5
Final Project
It is time to prepare the final report on the work done so far. You will need to have the assignments from the previous weeks at hand.

