ABOUT THE COURSE: This course provides a broad overview of the fundamental algorithms and data structures used to analyse large biological datasets. Several major questions in modern biology, such as (i) how to find mutations in a genome sequence or (ii) how to trace evolutionary relationships among species, can only be answered using efficient algorithms. The course is particularly relevant for computer science and applied mathematics students who wish to pursue a career in designing algorithmic solutions for scientific applications. It includes hands-on programming exercises that convey the complexity of real-world data such as the SARS-CoV-2 genome database.
INTENDED AUDIENCE: Students interested in developing algorithms and fast software for emerging biology and genomics applications.
PREREQUISITES: Elementary knowledge of discrete mathematics, basic algorithms, and data structures is required, including algorithms for sorting, searching, hashing, and graph traversal. Programming proficiency in C, C++, Java, or Python is required.
INDUSTRY SUPPORT: Companies developing software for molecular biology and omics applications (e.g., Google Health, Strand Life Sciences).
Syllabus
Week 1: Introduction
- Brief review of the fundamentals of molecular biology and genetics.
- Examples of widely used software, algorithms, databases.
- Suffix trees, suffix arrays, BWT, and their applications
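As a rough illustration of how a suffix array and the Burrows-Wheeler transform (BWT) relate, here is a minimal Python sketch. It is not course material: the function names are hypothetical, the `$` sentinel is an assumed convention, and the suffix array is built by naive sorting rather than by the linear-time constructions used in practice.

```python
def suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order.

    Naive construction by sorting the suffixes directly; fine for short
    strings, far too slow for genome-scale input.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt(text, sentinel="$"):
    """Return the BWT of text, using a sentinel assumed to sort before every symbol."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    t = text + sentinel
    # The BWT character for each sorted suffix is the character that precedes it.
    # For the suffix starting at 0, t[-1] wraps around to the sentinel.
    return "".join(t[i - 1] for i in suffix_array(t))


if __name__ == "__main__":
    print(suffix_array("banana$"))
    print(bwt("banana"))
```

Reading the BWT off the suffix array, as done here, is one reason the two structures are usually introduced together.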
Week 3: Pairwise Sequence Alignment
- Classic dynamic programming ideas for pairwise sequence alignment.
- Statistical measures of alignment significance.
- Mathematical ideas underlying heuristic sequence aligners.
- Applications of sequence alignment for mutation finding and disease diagnosis.
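The classic dynamic programming idea for pairwise alignment can be sketched as a global alignment (Needleman-Wunsch style) score computation. This is an illustrative sketch only: the function name and the scoring values (match +1, mismatch -1, linear gap -1) are assumptions, not parameters taken from the course.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of strings a and b.

    Uses a linear gap penalty and keeps only two rows of the DP table,
    so memory is O(len(b)) while time is O(len(a) * len(b)).
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # Column 0: prefix of a against the empty prefix of b.
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Recovering the alignment itself, rather than only its score, would require keeping the full table (or traceback pointers), which this sketch omits for brevity.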
Week 5: Genome reconstruction using graph algorithms
- De Bruijn graphs, overlap graphs, shortest common superstring
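To make the de Bruijn graph idea concrete, the sketch below builds one from a set of reads: each k-mer contributes an edge from its (k-1)-length prefix to its (k-1)-length suffix. The function name and the adjacency-list representation are assumptions chosen for illustration; real assemblers also handle sequencing errors, reverse complements, and k-mer multiplicities, none of which appear here.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph as a dict mapping (k-1)-mer -> list of successor (k-1)-mers.

    Every k-mer occurrence in every read adds one edge, so repeated k-mers
    produce parallel edges (repeated entries in the successor list).
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])  # prefix node -> suffix node
    return dict(graph)


if __name__ == "__main__":
    print(de_bruijn_graph(["ACGT", "CGTA"], 3))
```

In this formulation, reconstructing a genome corresponds to finding a walk that uses the edges of the graph, which is why graph traversal appears in the prerequisites.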
Week 6: Evolutionary tree construction
- Multiple sequence alignment – formulations, optimal and approximation algorithms
- Classical and contemporary algorithms for inferring evolutionary trees.
Week 7: Sequence models and classification
- Gene finding. Hidden Markov models
- Large language models for biological sequences
Week 9: Discussion of research papers
Week 10: Discussion of research papers (cont.)
Week 11: Discussion of research papers (cont.)
Week 12: Discussion of research papers (cont.)
Taught by
Prof. Chirag Jain