After sequencing genomes, we would like to compare them. We will see that dynamic programming is a powerful algorithmic tool when we compare two genes (i.e., short sequences of DNA) or two proteins. When we "zoom out" to compare entire genomes, we will employ combinatorial algorithms.
Once we have sequenced genomes in the previous course, we would like to compare them to determine how species have evolved and what makes them different.
In the first half of the course, we will compare two short biological sequences, such as genes (i.e., short sequences of DNA) or proteins. We will encounter a powerful algorithmic tool called dynamic programming that will help us determine the number of mutations that have separated the two genes/proteins.
In the second half of the course, we will "zoom out" to compare entire genomes, where we see large scale mutations called genome rearrangements, seismic events that have heaved around large blocks of DNA over millions of years of evolution. Looking at the human and mouse genomes, we will ask ourselves: just as earthquakes are much more likely to occur along fault lines, are there locations in our genome that are "fragile" and more susceptible to be broken as part of genome rearrangements? We will see how combinatorial algorithms will help us answer this question.
Finally, you will learn how to apply popular bioinformatics software tools to solve problems in sequence alignment, including BLAST.
This course is primarily aimed at undergraduate-level learners in computer science, biology, or a related field who are interested in learning about how the intersection of these two disciplines represents an important frontier in modern science.
Introduction to Sequence Alignment
If you joined us in the previous course in this Specialization, then you became an expert at assembling genomes and sequencing antibiotics. The next natural question to ask is how to compare DNA and amino acid sequences. This question will motivate this week's discussion of sequence alignment, which is the first of two questions that we will ask in this class (the algorithmic methods used to answer them are shown in parentheses):
How Do We Compare DNA Sequences? (Dynamic Programming)
Are There Fragile Regions in the Human Genome? (Combinatorial Algorithms)
Graded: Week 1 Quiz
Graded: Open in order to Sync Your Progress: Stepik Interactive Text for Week 1
From Finding a Longest Path to Aligning DNA Strings
Last week, we saw how touring around Manhattan and making change in a Roman shop help us find a longest common subsequence of two DNA or protein strings.
This week, we will study how to find a highest scoring alignment of two strings. We will see that regardless of the underlying assumptions that we make regarding how the strings should be aligned, we will be able to phrase our alignment problem as an instance of finding the longest path in a directed acyclic graph.
Graded: Week 2 Quiz
Graded: Open in order to Sync Your Progress: Stepik Interactive Text for Week 2
Advanced Topics in Sequence Alignment
Last week, we saw how a variety of different applications of sequence alignment can all be reduced to finding the longest path in a Manhattan-like graph.
This week, we will conclude the current chapter by considering a few advanced topics in sequence alignment. For example, if we need to align long strings, our current algorithm will consume a huge amount of memory. Can we find a more memory-efficient approach? And what should we do when we move from aligning just two strings at a time to aligning many strings?
Graded: Week 3 Quiz
Graded: Open in order to Sync Your Progress: Stepik Interactive Text for Week 3
Genome Rearrangements and Fragility
You now know how to compare two DNA (or protein) strings. But what if we wanted to compare entire genomes? When we "zoom out" to the genome level, we find that substitutions, insertions, and deletions don't tell the whole story of evolution: we need to model more dramatic evolutionary events known as genome rearrangements, which wrench apart chromosomes and put them back together in a new order. A natural question to ask is whether there are "fragile regions" hidden in your genome where chromosome breakage has occurred more often over millions of years. This week, we will begin addressing this question by asking how we can compute the number of rearrangements on the evolutionary path connecting two species.
Graded: Week 4 Quiz
Graded: Open in order to Sync Your Progress: Stepik Interactive Text for Week 4
Applying Genome Rearrangement Analysis to Find Genome Fragility
Last week, we asked whether there are fragile regions in the human genome. Then, we took a lengthy detour to see how to compute a distance between species genomes, a discussion that we will continue this week.
It is probably unclear how computing the distance between two genomes can help us understand whether fragile regions exist. If so, please stay tuned -- we will see that the connection between these two concepts will yield a surprising conclusion to the class.
Graded: Week 5 Quiz
Graded: Open in order to Sync Your Progress: Stepik Interactive Text for Week 5
Week 6: Bioinformatics Application Challenge
In the sixth and final week of the course, we will apply sequence alignment algorithms to infer the non-ribosomal code.
"Assembling DNA and Proteins" is the suggested prerequisite for taking this course, but it is not a strict prerequisite, especially if you have some programming experience. The programming assignments in this class can be solved using any programming lang
Join us on the frontier of bioinformatics and learn how to look for hidden messages in DNA without ever needing to put on a lab coat. In the first half of this course, we'll investigate DNA replication, and ask the question, where in the genome does DNA replication begin? You will learn how to answer this question for many bacteria using straightforward algorithms to look for hidden messages in the genome.
This course begins a series of classes illustrating the power of computing in modern biology. Please join us on the frontier of bioinformatics to look for hidden messages in DNA without ever needing to put on a lab coat.
World and internet is full of textual information. We search for information using textual queries, we read websites, books, e-mails. All those are strings from the point of view of computer science. To make sense of all that information and make search efficient, search engines use many string algorithms. Moreover, the emerging field of personalized medicine uses many search algorithms to find disease-causing mutations in the human genome.
Are you interested in learning how to program (in Python) within a scientific setting? This course will cover algorithms for solving various biological problems along with a handful of programming challenges helping you implement these algorithms in Python. Each of the four weeks in the course will consist of two required components. First, an interactive textbook provides Python programming challenges that arise from real biological problems.
This course distills for you expert knowledge and skills mastered by professionals in Health Big Data Science and Bioinformatics. You will learn exciting facts about the human body biology and chemistry, genetics, and medicine that will be intertwined with the science of Big Data and skills to harness the avalanche of data openly available at your fingertips and which we are just starting to make sense of.
In this course, we will see how evolutionary trees resolve quandaries from finding the origin of a deadly virus to locating the birthplace of modern humans. We will then use methods from computational proteomics to test whether we can reconstruct Tyrannosaurus rex proteins and prove that birds evolved from dinosaurs.
This course introduces you to the basic biology of modern genomics and the experimental tools that we use to measure it. We'll introduce the Central Dogma of Molecular Biology and cover how next-generation sequencing can be used to measure DNA, RNA, and epigenetic patterns. You'll also get an introduction to the key concepts in computing and data science that you'll need to understand how data from next-generation sequencing experiments are generated and analyzed.
To acquire an understanding of the fundamental concepts of genomics and biotechnology, and their implications for human biology, evolution, medicine, social policy and individual life path choices in the 21st century.
The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).
The course covers basic algorithmic techniques and ideas for computational problems arising frequently in practical applications: sorting and searching, divide and conquer, greedy algorithms, dynamic programming. We will learn a lot of theory: how to sort data and how it helps for searching; how to break a large problem into pieces and solve them recursively; when it makes sense to proceed greedily; how dynamic programming is used in genomic studies. You will practice solving computational problems, designing new algorithms, and implementing solutions efficiently (so that they run in less than a second).
Biologists still cannot read the nucleotides of an entire genome as you would read a book from beginning to end. However, they can read short pieces of DNA. In this course, we will see how graph theory can be used to assemble genomes from these short pieces. We will further learn about brute force algorithms and apply them to sequencing mini-proteins called antibiotics. Finally, you will learn how to apply popular bioinformatics software tools to sequence the genome of a deadly Staphylococcus bacterium.
A good algorithm usually comes together with a set of good data structures that allow the algorithm to manipulate the data efficiently. In this course, we consider the common data structures that are used in various computational problems. You will learn how these data structures are implemented in different programming languages and will practice implementing them in our programming assignments.
MOOCs – Massive Open Online Courses – enable students around the world to take university courses online. This guide, by the instructors of edX’s most successful MOOC in 2013-2014, Principles of Written English (based on both enrollments and rate of completion), advises current and future students how to get the most out of their online study, covering areas such as what types of courses are offered and who offers them, what resources students need, how to register, how to work effectively with other students, how to interact with professors and staff, and how to handle assignments. This second edition offers a new chapter on how to stay motivated. This book is suitable for both native and non-native speakers of English, and is applicable to MOOC classes on any subject (and indeed, for just about any type of online study).