Big Data Science with the BD2K-LINCS Data Coordination and Integration Center (Coursera)

In this course we briefly introduce the DCIC and the various Centers that collect data for LINCS. We then cover metadata and how metadata is linked to ontologies. Next, we present data processing and normalization methods to clean and harmonize LINCS data, followed by a discussion of how data is served through RESTful APIs. Most importantly, the course covers computational methods including data clustering, gene-set enrichment analysis, interactive data visualization, and supervised learning. Finally, we introduce crowdsourcing/citizen-science projects in which students work together in teams to extract expression signatures from public databases and then query these collections of signatures against LINCS data to predict small molecules as potential therapeutics.

The Library of Integrated Network-based Cellular Signatures (LINCS) is an NIH Common Fund program. The idea is to perturb different types of human cells with many different kinds of perturbations, such as drugs and other small molecules; genetic manipulations, such as knockdown or overexpression of single genes; manipulation of the extracellular microenvironment, for example growing cells on different surfaces; and more. These perturbations are applied to various types of human cells, including induced pluripotent stem cells from patients that are differentiated into lineages such as neurons or cardiomyocytes. Then, to better understand the molecular networks affected by these perturbations, changes in the levels of many different variables are measured, including mRNAs, proteins, and metabolites, as well as cellular phenotypic changes such as changes in cell morphology. The BD2K-LINCS Data Coordination and Integration Center (DCIC) is commissioned to organize, analyze, visualize, and integrate these data with other relevant, publicly available resources.

Syllabus

WEEK 1
The Library of Integrated Network-based Cellular Signatures (LINCS) Program Overview
This module provides an overview of the concept behind the LINCS program, as well as tutorials on how to get started using the LINCS L1000 dataset.
Metadata and Ontologies
This module includes a broad, high-level description of the concepts behind metadata and ontologies and how these are applied to LINCS datasets.
Serving Data with APIs
In this module we explain the concept of accessing data through an application programming interface (API).
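Accessing data through a RESTful API usually comes down to building a URL with query parameters, issuing an HTTP GET, and decoding the JSON response. The sketch below shows that pattern with the Python standard library only. The base URL, the `signatures` resource, the `cell_line`/`limit` parameters, and the JSON shape are hypothetical placeholders, not the actual LINCS API; consult the real service's documentation for its endpoints.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical base URL: a placeholder, not a real LINCS endpoint.
BASE_URL = "https://api.example.org/lincs"


def build_query_url(resource: str, **params: str) -> str:
    """Compose a GET URL of the form BASE_URL/resource?key=value&..."""
    query = urlencode(sorted(params.items()))
    return f"{BASE_URL}/{resource}?{query}" if query else f"{BASE_URL}/{resource}"


def parse_records(body: str) -> list[dict]:
    """Decode a JSON array of records (the assumed response shape)."""
    data = json.loads(body)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of records")
    return data


def fetch(resource: str, **params: str) -> list[dict]:
    """Perform the HTTP GET and parse the response (needs network access)."""
    with urlopen(build_query_url(resource, **params), timeout=30) as resp:
        return parse_records(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(build_query_url("signatures", cell_line="MCF7", limit="10"))
    print(parse_records('[{"id": "sig-1", "cell_line": "MCF7"}]'))
```

Keeping URL construction and response parsing separate from the network call makes both easy to test offline.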

WEEK 2
Bioinformatics Pipelines
This module describes the important concept of a bioinformatics pipeline.
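A bioinformatics pipeline chains processing steps so that each step's output becomes the next step's input. The minimal sketch below composes two illustrative steps, filtering out low-expression genes and applying a log2(x + 1) transform; the step names, the gene-to-values table layout, and the threshold are assumptions for illustration, not steps prescribed by the course.

```python
import math
from typing import Callable

Table = dict[str, list[float]]  # gene -> measured values across samples
Step = Callable[[Table], Table]


def drop_low_expression(min_mean: float) -> Step:
    """Return a step that removes genes whose mean value is below min_mean."""
    def step(table: Table) -> Table:
        return {g: v for g, v in table.items() if sum(v) / len(v) >= min_mean}
    return step


def log2_transform(table: Table) -> Table:
    """Apply log2(x + 1) to every value."""
    return {g: [math.log2(x + 1) for x in v] for g, v in table.items()}


def pipeline(*steps: Step) -> Step:
    """Chain steps so each one consumes the previous step's output."""
    def run(table: Table) -> Table:
        for step in steps:
            table = step(table)
        return table
    return run


if __name__ == "__main__":
    counts = {"GENE_A": [0.0, 1.0], "GENE_B": [3.0, 7.0]}
    process = pipeline(drop_low_expression(min_mean=1.0), log2_transform)
    print(process(counts))  # GENE_A is filtered out; GENE_B becomes [2.0, 3.0]
```

Because every step has the same signature, steps can be reordered, swapped, or benchmarked independently.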
The Harmonizome
This module describes a project that integrates many resources that contain knowledge about genes and proteins.

WEEK 3
Data Normalization
This module describes the mathematical concepts behind data normalization.
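One widely used normalization technique for expression data is quantile normalization, which forces every sample (column) to share the same distribution of values: the k-th smallest value in each column is replaced by the mean of the k-th smallest values across all columns. Whether the course presents this exact method is an assumption. The pure-Python sketch below takes a genes-by-samples matrix as a list of rows and breaks ties by row order, a simplification of full tie handling.

```python
def quantile_normalize(matrix: list[list[float]]) -> list[list[float]]:
    """Quantile-normalize the columns (samples) of a rows-by-columns matrix.

    Ties are broken by row order, a simplification of full tie handling.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    # For each column, the row indices ordered from smallest to largest value.
    orders = [sorted(range(n_rows), key=col.__getitem__) for col in columns]
    # Reference distribution: mean of the k-th smallest values across columns.
    reference = [
        sum(columns[j][orders[j][k]] for j in range(n_cols)) / n_cols
        for k in range(n_rows)
    ]
    # Give each entry the reference value that matches its rank in its column.
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for rank, i in enumerate(orders[j]):
            result[i][j] = reference[rank]
    return result


if __name__ == "__main__":
    print(quantile_normalize([[2.0, 4.0], [1.0, 6.0], [3.0, 8.0]]))
```

For this input the reference distribution is [2.5, 4.0, 5.5], so both normalized columns contain exactly those values, each placed according to the original rank order within its column.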
Data Clustering
This module describes the mathematical concepts behind data clustering, that is, unsupervised learning: the identification of patterns within data without considering the labels associated with the data.
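k-means is one common clustering algorithm: it alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points, never looking at labels. The sketch below is Lloyd's algorithm in pure Python; seeding the centroids with the first k points is an assumption made for reproducibility (practical implementations typically use random or k-means++ seeding), and the course may emphasize other methods such as hierarchical clustering.

```python
import math


def kmeans(points: list[tuple[float, ...]], k: int, iterations: int = 100):
    """Lloyd's k-means: alternate assignment and centroid-update steps."""
    centroids = [list(p) for p in points[:k]]  # deterministic seeding (assumption)
    labels: list[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans(pts, k=2))
```

On the four example points the algorithm converges in three passes to the two obvious groups, with centroids at (0, 0.5) and (10, 10.5).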
Midterm Exam
The Midterm Exam consists of 45 multiple-choice questions that cover modules 1-7. Some of the questions may require you to apply the methods you learned throughout the course to new datasets.

WEEK 4
Enrichment Analysis
This module introduces the important concept of performing gene set enrichment analyses. Enrichment analysis is the process of querying gene sets from genomics and proteomics studies against annotated gene sets collected from prior biological knowledge.
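A common way to score the overlap between a query gene set and one annotated gene set is the one-sided Fisher's exact test, which equals the hypergeometric probability of seeing at least the observed overlap by chance. The sketch below computes that tail with `math.comb`; it assumes both sets are drawn from a background of `background_size` genes, and the gene names in the example are hypothetical. Real analyses repeat this for every annotated set in a library and correct for multiple testing.

```python
from math import comb


def enrichment_p_value(query: set[str], annotated: set[str], background_size: int) -> float:
    """One-sided Fisher's exact test: P(overlap >= observed) under the
    hypergeometric null, with N background genes, K annotated genes,
    n query genes and k observed overlapping genes."""
    N, K, n = background_size, len(annotated), len(query)
    k = len(query & annotated)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    query = {"G1", "G2", "G3", "G4", "G5"}
    pathway = {"G1", "G2", "G3", "G4", "G5"}
    # Full overlap of 5 out of 5 in a background of 20 genes: 1 / C(20, 5).
    print(enrichment_p_value(query, pathway, background_size=20))
```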
Machine Learning
This module describes the mathematical concepts of supervised machine learning: the process of making predictions from examples that associate observations (features or attributes) with one or more properties that we wish to learn or predict.
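The idea of learning from labeled examples can be illustrated with k-nearest neighbors, chosen here only because it is short; it is not necessarily a method the course covers. A new observation receives the majority label among the k training examples closest to it in feature space. The feature values and the `control`/`treated` labels below are hypothetical.

```python
import math
from collections import Counter


def knn_predict(
    train_x: list[tuple[float, ...]],
    train_y: list[str],
    query: tuple[float, ...],
    k: int = 3,
) -> str:
    """Predict the label of `query` by majority vote of its k nearest examples."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    y = ["control"] * 3 + ["treated"] * 3
    print(knn_predict(X, y, (0.5, 0.5)))  # control
    print(knn_predict(X, y, (5.5, 5.5)))  # treated
```

In practice the labeled data would be split into training and held-out test sets so that prediction accuracy is measured on examples the model has not seen.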

WEEK 5
Benchmarking
This module discusses how bioinformatics pipelines can be compared and evaluated.
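When two pipelines each produce a ranked list of predictions and a set of known true positives is available, one common benchmark is the area under the ROC curve (AUC). The sketch below computes AUC as the probability that a randomly chosen positive scores higher than a randomly chosen negative, counting ties as one half; the scores and labels are made-up illustrations, and the course may use other metrics as well.

```python
def roc_auc(scores: list[float], labels: list[bool]) -> float:
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    truth = [True, True, False, False]
    print(roc_auc([0.9, 0.8, 0.3, 0.1], truth))  # pipeline A: 1.0
    print(roc_auc([0.9, 0.2, 0.8, 0.1], truth))  # pipeline B: 0.75
```

This pairwise form is quadratic in the number of examples but transparent; for large benchmarks a rank-based computation gives the same value faster.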
Interactive Data Visualization
This module provides programming examples on how to get started with creating interactive, web-based data visualizations and figures.
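Interactive web figures are usually built with dedicated JavaScript or Python plotting libraries. As a dependency-free sketch of the underlying idea, the function below writes an HTML page containing an SVG scatter plot in which hovering over a point shows its label, via the SVG `<title>` element. The function name, the canvas size, and the styling are illustrative assumptions.

```python
import html


def scatter_html(points: list[tuple[float, float, str]], size: int = 300, pad: int = 20) -> str:
    """Return an HTML page with an SVG scatter plot; hovering a point shows its label."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = min(xs), min(ys)
    x_span = (max(xs) - x0) or 1.0  # avoid division by zero for constant data
    y_span = (max(ys) - y0) or 1.0
    circles = []
    for x, y, label in points:
        # Map data coordinates to pixels; SVG's y axis points down, so flip it.
        cx = pad + (x - x0) / x_span * (size - 2 * pad)
        cy = size - pad - (y - y0) / y_span * (size - 2 * pad)
        circles.append(
            f'<circle cx="{cx:.1f}" cy="{cy:.1f}" r="5" fill="steelblue">'
            f"<title>{html.escape(label)}</title></circle>"
        )
    return (
        "<!DOCTYPE html><html><body>"
        f'<svg width="{size}" height="{size}" xmlns="http://www.w3.org/2000/svg">'
        + "".join(circles)
        + "</svg></body></html>"
    )
```

To use it, write the returned string to a file such as `scatter.html` and open it in a browser; labels are HTML-escaped so sample names containing `<` or `&` render safely.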

WEEK 6
Crowdsourcing Projects
This final module describes opportunities to work on LINCS related projects that go beyond the course.

WEEK 7
Final Exam
The Final Exam consists of 60 multiple-choice questions that cover all modules of the course. Some of the questions may require you to apply the methods you learned throughout the course to new datasets.

Related Courses

Python Project for Data Science (Coursera)
IBM

This mini-course is intended for you to demonstrate foundational Python skills for working with data. Completing this course involves working on a hands-on project in which you will develop a simple dashboard using Python. This course is part of the IBM Data Science Professional Certificate and the IBM Data Analytics Professional Certificate.

Jun 4th 2026
1 Week
Bioinformatic Methods I (Coursera)
University of Toronto

Large-scale biology projects such as the sequencing of the human genome and gene expression surveys using RNA-seq, microarrays and other technologies have created a wealth of data for biologists. However, the challenge facing scientists is analyzing and even accessing these data to extract useful information pertaining to the system being studied. This course focuses on employing existing bioinformatic resources – mainly web-based programs and databases – to access the wealth of data to answer questions relevant to the average biologist, and is highly hands-on.

Jun 1st 2026
5-12 Weeks
Internet History, Technology, and Security (Coursera)
University of Michigan

The impact of technology and networks on our lives, culture, and society continues to increase. The very fact that you can take this course from anywhere in the world requires a technological infrastructure that was designed, engineered, and built over the past sixty years. To function in an information-centric world, we need to understand the workings of network technology. This course will open up the Internet and show you how it was created, who created it and how it works. Along the way we will meet many of the innovators who developed the Internet and Web technologies that we use today.

Jun 1st 2026
5-12 Weeks
Data-driven Decision Making (Coursera)
PwC

Welcome to Data-driven Decision Making. In this course, you'll get an introduction to Data Analytics and its role in business decisions. You'll learn why data is important and how it has evolved. You'll be introduced to “Big Data” and how it is used. You'll also be introduced to a framework for conducting Data Analysis and what tools and techniques are commonly used. Finally, you'll have a chance to put your knowledge to work in a simulated business setting. This course was created by PricewaterhouseCoopers LLP with an address at 300 Madison Avenue, New York, New York, 10017.

Jun 1st 2026
4 Weeks
Plant Bioinformatics (Coursera)
University of Toronto

The past 15 years have been exciting ones in plant biology. Hundreds of plant genomes have been sequenced, RNA-seq has enabled transcriptome-wide expression profiling, and a proliferation of "-seq"-based methods has permitted protein-protein and protein-DNA interactions to be determined cheaply and in a high-throughput manner. These data sets in turn allow us to generate hypotheses at the click of a mouse.

Jun 1st 2026
5-12 Weeks
Machine Learning for Accounting with Python (Coursera)
University of Illinois at Urbana-Champaign

This course, Machine Learning for Accounting with Python, introduces machine learning algorithms (models) and their applications to accounting problems. It covers classification, regression, clustering, text analysis, and time series analysis. It also discusses model evaluation and model optimization. This course provides an entry point for students to apply appropriate machine learning models to business-related datasets with Python to solve various problems.

Jun 1st 2026
5-12 Weeks
Machine Learning with Python (Coursera)
IBM

This course dives into the basics of machine learning using an approachable and well-known programming language, Python. In this course, we will be reviewing two main components: first, you will learn about the purpose of machine learning and where it applies to the real world; second, you will get a general overview of machine learning topics such as supervised vs. unsupervised learning, model evaluation, and machine learning algorithms.

Jun 1st 2026
5-12 Weeks
Teaching Impacts of Technology: Data Collection, Use, and Privacy (Coursera)
University of California, San Diego

In this course you’ll focus on how constant data collection and big data analysis have impacted us, exploring the interplay between using your data and protecting it, as well as thinking about what it could do for you in the future. This will be done through a series of paired teaching sections, exploring a specific “Impact of Computing” in your typical day and the “Technologies and Computing Concepts” that enable that impact, all at a K12-appropriate level.

Jun 3rd 2026
4 Weeks
Google Cloud Platform Fundamentals: Core Infrastructure (Coursera)
Google

This course introduces you to important concepts and terminology for working with Google Cloud Platform (GCP). You learn about, and compare, many of the computing and storage services available in Google Cloud Platform, including Google App Engine, Google Compute Engine, Google Kubernetes Engine, Google Cloud Storage, Google Cloud SQL, and BigQuery. You learn about important resource and policy management tools, such as the Google Cloud Resource Manager hierarchy and Google Cloud Identity and Access Management. Hands-on labs give you foundational skills for working with GCP.

Jun 1st 2026
1 Week
Introduction to Graph Theory (Coursera)
University of California, San Diego, Higher School of Economics - HSE University

We invite you on a fascinating journey into Graph Theory, an area that connects the elegance of painting and the rigor of mathematics; it is simple, but not unsophisticated. Graph Theory gives us both an easy way to pictorially represent many major mathematical results and insights into the deep theories behind them. In this course, among other intriguing applications, we will see how GPS systems find shortest routes, how engineers design integrated circuits, how biologists assemble genomes, and why a political map can always be colored using a few colors. We will study Ramsey Theory, which proves that in a large system, complete disorder is impossible!

Jun 1st 2026
5-12 Weeks
Genome Assembly Programming Challenge (Coursera)
University of California, San Diego, Higher School of Economics - HSE University

In Spring 2011, thousands of people in Germany were hospitalized with a deadly disease that started as food poisoning with bloody diarrhea and often led to kidney failure. It was the beginning of the deadliest outbreak in recent history, caused by a mysterious bacterial strain that we will refer to as E. coli X. Soon, German officials linked the outbreak to a restaurant in Lübeck, where nearly 20% of the patrons had developed bloody diarrhea in a single week. At this point, biologists knew that they were facing a previously unknown pathogen and that traditional methods would not suffice – computational biologists would be needed to assemble and analyze the genome of the newly emerged pathogen.

Jun 1st 2026
3 Weeks