Introduction to Reproducibility in Cancer Informatics (Coursera)

The course is intended for students in the biomedical sciences and researchers who use informatics tools in their research and have not had training in reproducibility tools and methods.

This course is written for individuals who:

  • Have some familiarity with R or Python - have written some scripts.
  • Have not had formal training in computational methods.
  • Have limited or no familiarity with GitHub, Docker, or package management tools.

Motivation
Data analyses are generally not reproducible without direct contact with the original researchers and a substantial amount of time and effort (Beaulieu-Jones et al., 2017). Reproducibility in cancer informatics, as in other fields, is still not monitored or incentivized, even though it is fundamental to the scientific method. Despite this lack of incentive, many researchers strive for reproducibility in their own work but often lack the skills or training to achieve it effectively.
Equipping researchers with the skills to create reproducible data analyses increases the efficiency of everyone involved. Reproducible analyses are more likely to be understood, applied, and replicated by others. This expedites the scientific process by helping researchers avoid false-positive dead ends. Openly shared, clearly documented methods also save researchers time, so they do not have to reinvent the proverbial wheel for methods that everyone in the field is already performing.
Curriculum
This course introduces the concepts of reproducibility and replicability in the context of cancer informatics. It uses hands-on exercises to demonstrate in practical terms how to increase the reproducibility of data analyses. The course also introduces tools relevant to reproducibility, including analysis notebooks, package managers, Git, and GitHub.
The course includes hands-on exercises that show learners how to apply reproducible code concepts to their own code. Learners are encouraged to complete these activities as they follow along with the course material to help increase the reproducibility of their analyses.
Goal of this course:
Equip learners with reproducibility skills they can apply to their existing analysis scripts and projects. This course opts for an "ease into it" approach. We attempt to give learners doable, incremental steps to increase the reproducibility of their analyses.
What is not the goal
This course is meant to introduce learners to reproducibility tools, but it does not necessarily represent the absolute end-all, be-all best practices for the use of these tools. In other words, this course gives a starting point with these tools, but not an ending point. The advanced version of this course is the next step toward incrementally "better practices".
How to use the course
This course is designed with busy professional learners in mind, who may have to pick up and put down the course as their schedule allows.
Each exercise gives you the option to continue with the example files as you have been editing them in each chapter, OR to download fresh chapter files that have been edited in accordance with the relevant part of the course. This way, if you decide to skip a chapter or find that the files you have been working on no longer make sense, you have a fresh starting point at each exercise.

What You Will Learn

  • Create reproducible data analyses
  • Apply reproducibility skills to existing analysis scripts and projects

Syllabus

WEEK 1
Introduction to this Course
In this first section, we will discuss the goals of this course and define what we mean by reproducibility.
Organizing your project
In this section we discuss motivation and strategies for project organization.

WEEK 2
Using notebooks
In this section we discuss the motivation for using notebooks and integrated development environments to enhance the reproducibility of your project.
Making your project open source with GitHub
In this section we will describe how GitHub can make a project open source and encourage reproducibility.

WEEK 3
Managing package versions
In this section we discuss two strategies for managing package versions in a project.

WEEK 4
Writing durable code
In this section we discuss aspects of code that can make it more durable to enhance the reproducibility of a project.

WEEK 5
Code review
This section discusses the importance of code review for creating reproducible analyses.
Documenting analysis
This section discusses how to document analyses to enhance their reproducibility.
