In this class, we will compare DNA from an individual against a reference human genome to find potentially disease-causing mutations. We will also learn how to identify the function of a protein even when it has accumulated so many mutations, relative to similar proteins of known function, that it has become barely recognizable.
In previous courses in the Specialization, we have discussed how to sequence and compare genomes. This course will cover advanced topics in finding mutations lurking within DNA and proteins.
In the first half of the course, we will ask how an individual's genome differs from the "reference genome" of the species. Our goal is to take small fragments of DNA (reads) from the individual and "map" them to the reference genome, that is, find where in the reference each fragment came from. We will see that the combinatorial pattern matching algorithms that solve this problem are elegant and extremely efficient, requiring surprisingly little runtime and memory.
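To make "mapping reads to a reference" concrete, here is a minimal sketch of one classical combinatorial pattern matching structure, the suffix array: the reference genome is the text, each read is a pattern, and all occurrences of a pattern form one contiguous block of sorted suffixes that binary search can find. The function names `suffix_array` and `find_occurrences` are hypothetical, the construction is deliberately naive, and only exact matches are handled; this illustrates the idea rather than the course's own implementation.

```python
def suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes of text, in lexicographic order of the suffixes."""
    # Naive construction by sorting suffix slices; quadratic memory, fine for toy inputs.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """All start positions of pattern in text, found by binary search over the suffix array."""
    m = len(pattern)
    # First suffix that is lexicographically >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # Starting there, find the first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    # Every suffix in sa[first:lo] begins with pattern.
    return sorted(sa[first:lo])


genome = "panamabananas$"
sa = suffix_array(genome)
print(find_occurrences(genome, sa, "ana"))  # [1, 7, 9]
```

Once the suffix array is built, each read costs only O(m log n) string comparisons, which is why preprocessing the reference pays off when millions of reads are mapped.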
In the second half of the course, we will learn how to identify the function of a protein even when it has accumulated so many mutations, relative to similar proteins of known function, that it has become barely recognizable. This is the case, for example, in HIV studies, because the virus mutates so quickly that researchers often struggle to study it. The approach we will use is based on a powerful machine learning tool called a hidden Markov model.
Finally, you will learn how to apply popular bioinformatics software tools that use hidden Markov models to compare a protein against a family of related proteins.
In this class, we will consider the following two central biological questions (the computational approaches needed to solve them are shown in parentheses):
How Do We Locate Disease-Causing Mutations? (Combinatorial Pattern Matching)
Why Have Biologists Still Not Developed an HIV Vaccine? (Hidden Markov Models)
Graded: How Do We Find Disease-Causing Mutations? (Week 1)
Graded: Open in order to Sync Your Progress: Stepik Interactive Text for Week 1
The Burrows-Wheeler Transform
This week, we will introduce the Burrows-Wheeler transform; after seeing how it can be used for string compression, we will demonstrate that it is also the foundation of modern read-mapping algorithms.
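A minimal sketch of the transform and its inverse, assuming the text ends with a unique sentinel `$` that sorts before all letters. The transform takes the last column of the sorted cyclic rotations; inversion relies on the last-to-first property (the k-th occurrence of a symbol in the last column corresponds to its k-th occurrence in the first column). The helper `run_length_encode` is a hypothetical addition showing why the output compresses well: the transform tends to group equal symbols into runs. All names here are illustrative.

```python
from itertools import groupby


def bwt(text: str) -> str:
    """Last column of the lexicographically sorted cyclic rotations of text.
    Assumes text ends with a unique '$', which sorts before letters in ASCII."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last: str) -> str:
    """Recover the original text from its BWT using the last-to-first property."""
    # Tag each symbol with its occurrence number, e.g. ('a', 2) is the second 'a'.
    seen = {}
    tagged_last = []
    for ch in last:
        seen[ch] = seen.get(ch, 0) + 1
        tagged_last.append((ch, seen[ch]))
    # The k-th occurrence of a symbol in the last column is its k-th occurrence in the first column.
    row_in_first = {tag: row for row, tag in enumerate(sorted(tagged_last))}
    # Row 0 starts with '$'; each step reads the symbol that precedes the current one in the text.
    row, backwards = 0, []
    for _ in range(len(last)):
        tag = tagged_last[row]
        backwards.append(tag[0])
        row = row_in_first[tag]
    # Reversing gives '$' followed by text[:-1], so move the '$' back to the end.
    forward = "".join(reversed(backwards))
    return forward[1:] + forward[0]


def run_length_encode(s: str) -> str:
    """Collapse runs of equal symbols, e.g. 'aaab' -> '3a1b'."""
    return "".join(f"{len(list(group))}{ch}" for ch, group in groupby(s))


transformed = bwt("panamabananas$")
print(transformed)                     # smnpbnnaaaaa$a
print(run_length_encode(transformed))  # 1s1m1n1p1b2n5a1$1a
print(inverse_bwt(transformed))        # panamabananas$
```

Sorting full rotations is quadratic in memory; practical tools derive the transform from a suffix array instead, but the output is the same.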
Graded: Open in order to Sync Your Progress: Stepik Interactive Text for Week 2
Speeding Up Burrows-Wheeler Read Mapping
Last week, we saw how the Burrows-Wheeler transform could be applied to multiple pattern matching. This week, we will speed up our algorithm and generalize it to the case where patterns contain errors, which models the biological problem of mapping error-containing reads to a reference genome.
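One standard way to speed up Burrows-Wheeler matching is to precompute, for each symbol, the first row where it appears in the first column and a running count of its occurrences in the last column, so that each pattern symbol narrows the matching range of rows in constant time. The sketch below, with the hypothetical name `better_bw_matching`, counts exact occurrences only; the error-tolerant generalization is not shown.

```python
def better_bw_matching(last: str, pattern: str) -> int:
    """Count occurrences of pattern in the text whose Burrows-Wheeler transform is `last`."""
    symbols = sorted(set(last))
    # first_occurrence[c]: first row whose first-column symbol is c.
    first_occurrence = {}
    for row, ch in enumerate(sorted(last)):
        first_occurrence.setdefault(ch, row)
    # count[c][i]: number of occurrences of c in last[:i].
    count = {c: [0] * (len(last) + 1) for c in symbols}
    for i, ch in enumerate(last):
        for c in symbols:
            count[c][i + 1] = count[c][i] + (1 if c == ch else 0)
    # Process the pattern right to left, shrinking the range of matching rows.
    top, bottom = 0, len(last) - 1
    for ch in reversed(pattern):
        if ch not in first_occurrence:
            return 0
        top = first_occurrence[ch] + count[ch][top]
        bottom = first_occurrence[ch] + count[ch][bottom + 1] - 1
        if top > bottom:  # no row in the current range ends with ch
            return 0
    return bottom - top + 1


print(better_bw_matching("smnpbnnaaaaa$a", "ana"))  # 3 (BWT of "panamabananas$")
```

Storing a full count array per symbol costs memory proportional to the alphabet size times the text length; a common refinement is to store counts only at periodic checkpoints and recompute the rest on demand.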
Graded: How Do We Find Disease-Causing Mutations? (Weeks 2-3)
Graded: Open in order to Sync Your Progress: Stepik Interactive Text for Week 3
Introduction to Hidden Markov Models
This week, we will start examining the case of aligning sequences with many mutations, such as related genes from different HIV strains, and see that our formulation of sequence alignment is not adequate for highly diverged sequences.
To improve our algorithms, we will introduce a machine-learning paradigm called a hidden Markov model and see how dynamic programming helps us answer questions about these models.
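As one example of a question that dynamic programming answers about a hidden Markov model, the sketch below finds the most probable hidden path for a sequence of observed symbols (the Viterbi algorithm), working in log space to avoid numeric underflow. The two-state fair/biased coin model and its probabilities are made up for illustration, states are assumed to be single-character labels, and every probability is assumed to be nonzero; the course may formulate the same computation differently, for example as a longest path in a graph.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden path for `observations`, via dynamic programming in log space.
    Assumes every probability used is strictly positive (math.log(0) would raise)."""
    # score[k][s]: best log-probability of a path that emits observations[:k+1] and ends in s.
    score = [{s: math.log(start_p[s] * emit_p[s][observations[0]]) for s in states}]
    back = [{}]  # back[k][s]: predecessor of s on that best path
    for k in range(1, len(observations)):
        score.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: score[k - 1][r] + math.log(trans_p[r][s]))
            score[k][s] = (score[k - 1][prev] + math.log(trans_p[prev][s])
                           + math.log(emit_p[s][observations[k]]))
            back[k][s] = prev
    # Trace back from the highest-scoring final state.
    state = max(states, key=lambda s: score[-1][s])
    path = [state]
    for k in range(len(observations) - 1, 0, -1):
        state = back[k][state]
        path.append(state)
    return "".join(reversed(path))


# Hypothetical model: a fair coin (F) and a biased coin (B) that is switched rarely.
states = "FB"
start_p = {"F": 0.5, "B": 0.5}
trans_p = {"F": {"F": 0.9, "B": 0.1}, "B": {"F": 0.1, "B": 0.9}}
emit_p = {"F": {"H": 0.5, "T": 0.5}, "B": {"H": 0.75, "T": 0.25}}
print(viterbi("HHHHHHHH", states, start_p, trans_p, emit_p))  # BBBBBBBB
```

The table has one entry per (position, state) pair and each entry looks at every possible predecessor, so the running time grows linearly with the sequence length and quadratically with the number of states.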
Graded: Stepik Code Challenges for Week 4
Profile HMMs for Sequence Alignment
Last week, we introduced hidden Markov models. This week, we will see how hidden Markov models can be applied to sequence alignment using a profile HMM. We will then consider some advanced topics in this area that are related to the advanced clustering methods we considered in a previous course.
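A profile HMM is built from a multiple alignment of a protein family. The sketch below shows only a first step, under stated assumptions: columns whose fraction of gap symbols is below a hypothetical threshold `theta` are treated as match states, and each match state's emission probabilities are the symbol frequencies among the non-gap entries of its column. Insertion and deletion states, transition probabilities, and pseudocounts are omitted, and the function names are illustrative, so the course's full construction will differ.

```python
def match_columns(alignment: list[str], theta: float) -> list[int]:
    """Indices of alignment columns whose gap fraction is below theta (assumes 0 < theta <= 1)."""
    n_rows = len(alignment)
    return [j for j in range(len(alignment[0]))
            if sum(row[j] == "-" for row in alignment) / n_rows < theta]


def match_emissions(alignment: list[str], theta: float, alphabet: str) -> list[dict[str, float]]:
    """Emission probabilities of each match state: symbol frequencies among non-gap entries."""
    profile = []
    for j in match_columns(alignment, theta):
        # A match column always has at least one non-gap entry, so len(column) > 0.
        column = [row[j] for row in alignment if row[j] != "-"]
        profile.append({a: column.count(a) / len(column) for a in alphabet})
    return profile


family = ["ACDE", "AC-E", "GC-E", "A--E"]  # toy alignment, made up for illustration
print(match_columns(family, 0.35))               # [0, 1, 3]
print(match_emissions(family, 0.35, "ACDEG")[0])  # {'A': 0.75, 'C': 0.0, 'D': 0.0, 'E': 0.0, 'G': 0.25}
```

The mostly-gap column 2 is excluded from the match states; in a full profile HMM, its single residue would be emitted by an insertion state instead.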
Graded: Why Have Biologists Still Not Developed an HIV Vaccine? (Weeks 4-5)
Graded: Stepik Code Challenges for Week 5
Week 6: Bioinformatics Application Challenge
This week brings our Application Challenge, in which we apply the HMM sequence alignment algorithms that we have developed.