Genomic Data Science and Clustering (Bioinformatics V)

Overview

How do we infer which genes orchestrate various processes in the cell? How did humans migrate out of Africa and spread around the world? In this class, we will see that these two seemingly different questions can be addressed using similar algorithmic and machine learning techniques arising from the general problem of dividing data points into distinct clusters. In the first half of the course, we will introduce algorithms for clustering a group of objects into a collection of clusters based on their similarity, a classic problem in data science, and see how these algorithms can be applied to gene expression data. In the second half of the course, we will introduce another classic tool in data science called principal components analysis. It can be used to preprocess multidimensional data before clustering, greatly reducing the number of dimensions without losing much of the "signal" in the data. Finally, you will learn how to apply popular bioinformatics software tools to solve a real problem in clustering.
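The overview mentions principal components analysis as a way to reduce dimensionality before clustering. The following is a rough, pure-Python sketch for illustration only, not the course's own code. It mean-centers the data and builds the sample covariance matrix. It then finds the top k principal components by power iteration with deflation, which is only one simple way to compute eigenvectors; production code would normally call a linear-algebra library instead. The helper names (`mean_center`, `covariance`, `top_eigenpair`, `pca`) are hypothetical, and the sketch assumes the input is a list of equal-length numeric rows.

```python
import math
import random

def mean_center(data):
    """Subtract each column's mean so that every feature has mean zero."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data]

def covariance(centered):
    """Sample covariance matrix (d x d) of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenpair(matrix, iterations=500, seed=0):
    """Power iteration: approximate the largest eigenvalue and its unit
    eigenvector for a symmetric positive semidefinite matrix (hypothetical helper)."""
    rng = random.Random(seed)
    d = len(matrix)
    v = [rng.random() + 0.1 for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]  # start from a random unit vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # no variance left along v; stop early
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * matrix[i][j] * v[j] for i in range(d) for j in range(d))
    return eigenvalue, v

def pca(data, k):
    """Project data onto its top-k principal components (hypothetical helper).

    Returns (projected rows, component vectors, variance along each component).
    """
    centered = mean_center(data)
    cov = covariance(centered)
    d = len(cov)
    components, variances = [], []
    for _ in range(k):
        lam, v = top_eigenpair(cov)
        components.append(v)
        variances.append(lam)
        # Deflation: subtract lam * v v^T so the next pass finds the next component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    projected = [[sum(row[j] * comp[j] for j in range(d)) for comp in components]
                 for row in centered]
    return projected, components, variances

if __name__ == "__main__":
    points = [[1, 2], [2, 4], [3, 6], [4, 8]]  # toy data lying on the line y = 2x
    projected, components, variances = pca(points, k=1)
    print([round(p[0], 3) for p in projected])  # [-3.354, -1.118, 1.118, 3.354]
```

Each entry of `variances` is the variance captured along the corresponding component. Comparing these values with the total variance gives a rough sense of how many dimensions can be dropped before clustering.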

Syllabus

Week 1: Introduction to Clustering Algorithms

Welcome to class!
At the beginning of the class, we will see how algorithms for clustering a set of data points can help us determine how yeast became such good wine-makers. The Bioinformatics Cartoon for this chapter, courtesy of Randall Christopher, also serves as a chapter header in the Specialization's bestselling print companion. How did the monkey lose a wine-drinking contest to a tiny mammal? Why have Pavel and Phillip become cavemen? And will flipping a coin help them escape their eternal boredom until they can return to the present? Start learning to find out!

Week 2: Advanced Clustering Techniques

Welcome to week 2 of class!

This week, we will see how to move from a "hard" assignment of points to clusters toward a "soft" assignment that allows the boundaries of the clusters to blend. We will then adapt the Lloyd algorithm that we encountered in the first week to produce an algorithm for soft clustering. Finally, we will introduce another clustering algorithm, called "hierarchical clustering", that groups objects into progressively larger clusters.
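To make the hard-to-soft idea concrete, here is a minimal pure-Python sketch of one common way to soften the Lloyd algorithm. It is not necessarily the exact formulation taught in the course. The sketch alternates two steps. First, each point splits a total weight of 1 across the clusters, in proportion to exp(-beta * distance). Second, each center moves to the weighted average of all points. The "stiffness" parameter `beta`, the use of plain Euclidean distance, and the helper names `distance` and `soft_kmeans` are our assumptions. As `beta` grows, each point's weight concentrates on its nearest center, so the loop behaves like the hard Lloyd algorithm.

```python
import math

def distance(p, q):
    """Euclidean distance between two points of the same dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def soft_kmeans(points, initial_centers, beta, iterations=100):
    """Lloyd-style soft clustering (hypothetical helper).

    beta is an assumed "stiffness" parameter: larger beta pushes the
    responsibilities toward a hard 0/1 assignment.
    Returns (centers, responsibilities), where responsibilities[p][i] is the
    fraction of point p assigned to cluster i.
    """
    centers = [list(c) for c in initial_centers]
    dim = len(points[0])
    responsibilities = []
    for _ in range(iterations):
        # Centers -> soft clusters: split each point's unit weight across
        # clusters in proportion to exp(-beta * distance).
        responsibilities = []
        for p in points:
            dists = [distance(p, c) for c in centers]
            nearest = min(dists)  # shift exponents so the largest weight is 1 (avoids underflow)
            weights = [math.exp(-beta * (dist - nearest)) for dist in dists]
            total = sum(weights)
            responsibilities.append([w / total for w in weights])
        # Soft clusters -> centers: move each center to the
        # responsibility-weighted mean of all points.
        for i in range(len(centers)):
            mass = sum(r[i] for r in responsibilities)
            if mass == 0:
                continue  # no point claims this center; leave it in place
            centers[i] = [sum(r[i] * p[j] for r, p in zip(responsibilities, points)) / mass
                          for j in range(dim)]
    return centers, responsibilities

if __name__ == "__main__":
    pts = [(1, 1), (1.5, 2), (0.5, 1.5), (8, 8), (8.5, 9), (9, 8)]  # two toy groups
    centers, resp = soft_kmeans(pts, [(0, 0), (10, 10)], beta=1.0)
    print([[round(x, 2) for x in c] for c in centers])  # [[1.0, 1.5], [8.5, 8.33]]
```

Lowering `beta` (for example to 0.1) spreads each point's responsibility more evenly across clusters, which is the "blending" of cluster boundaries described above.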

Week 3: Introductory Algorithms in Population Genetics

Taught by

Pavel Pevzner and Phillip Compeau

Reviews

3.5 rating, based on 2 Class Central reviews

4.2 rating at Coursera based on 91 ratings


Anonymous

Highly recommend the course and the specialization to all learners who are serious about learning algorithms. This course goes deeply into developing hard and soft k-means clustering algorithms. Very tough course.

Related Courses

Genome Sequencing (Bioinformatics II)

Finding Hidden Messages in DNA (Bioinformatics I)

Finding Mutations in DNA and Proteins (Bioinformatics VI)

Introduction to Genomic Data Science

Molecular Evolution (Bioinformatics IV)

Comparing Genes, Proteins, and Genomes (Bioinformatics III)
