Computational Genomics
Indian Institute of Science Education and Research Bhopal and NPTEL via Swayam
Overview
ABOUT THE COURSE: With the availability of large amounts of biological data, including sequences, genomes, and transcriptomes, students and researchers need the skills to analyze this data comprehensively. This course therefore emphasizes building concepts and providing insight into the process of genomic analysis, and understanding the algorithms and basic methods commonly needed for biological data analysis and computational genomics.
INTENDED AUDIENCE: BSc, MSc, MPhil, and PhD students
PREREQUISITES: Basic knowledge of biology, such as courses in Molecular Biology, Microbiology, Biochemistry, or Genetics
INDUSTRY SUPPORT: Any bioinformatics or life science company will find this course relevant, for example TCS Life Sciences, Reliance Life Sciences, WIPRO Life Sciences, and Siemens Healthineers.
Syllabus
Week 1:
Day 1: Introduction to Computational genomics, Transcriptomics, Proteomics, Epigenomics, Metagenomics and their applications, The BIG data of biological sciences
Day 2: Organization of genetic information in prokaryotic and eukaryotic cells, genome maps, Eukaryotic genome structure, High-throughput technologies to translate this information into genomic data
Day 3: How genomic data is organized in public databases, Genomics web resources, Nucleic acid and protein sequence databases, gene expression databases, Metabolic and metabolomic databases. Examples: NCBI GenBank and Expasy, EBI, Ensembl, UCSC, KEGG
Week 2:
Day 1: First- and second-generation sequencing technologies, including Sanger and Illumina, and their data output
Day 2: Long read sequencing and linked read sequencing (Nanopore, PacBio, TELL-Seq)
Day 3: Sequence formats: FASTA, GenBank, EMBL, XML, FASTQ, FAST5, etc., genomic database versions and archives, NCBI SRA, BioProject, accessions, data retrieval using wget, FTP, FileZilla, and scripts provided by the database team for genomic analysis
Week 3:
Day 1: Introduction to Linux, basic commands for file handling
Day 2: Running jobs on Linux, processing, installation of genomic packages
Day 3: Introduction to R, commonly used packages, applications in genomic analysis
Week 4:
Day 1: Introduction to genomes and packages for genomic analysis such as EMBOSS; Specifications of workstations needed for genomic analysis, Introduction to High Performance Computing and servers, and their need in genomic analysis
Day 2: Overview and concepts in genomic and transcriptomic analysis of an organism with examples and case studies
Day 3: Sample collection, DNA extraction and quantification, and identification of the species to be sequenced. RNA extraction and transcriptome sequencing approaches
Week 5:
Day 1: Methods to estimate the amount of sequencing coverage needed for genomic assembly, use of hybrid sequencing approaches for appropriate coverage and assembly
Day 2: Short and long reads, paired-end reads, quality filtering of sequence data, Genome complexity assessment, Jellyfish and GenomeScope for generating k-mer count histograms and calculating genomic heterozygosity
Day 3: Concept of genome assembly, contigs, scaffolds, complete genome, draft genome, chromosomal level assembly, Genome assembly algorithms such as de Bruijn graph, Overlap-layout-consensus (OLC), Hybrid assembly
Week 6:
Day 1: Introduction to common assembly tools: ABySS, SOAPdenovo, Flye, Supernova
Day 2: 10x Genomics linked-read sequencing, use of the proc10xG set of Python scripts to pre-process the 10x Genomics raw reads, removal of barcode sequences
Day 3: Nanopore long reads analysis: Guppy for base calling of raw reads, adaptor removal using Porechop, Genome assembly workflow using three different assemblers: wtdbg, SMARTdenovo, and Flye, parameters for assembly
Week 7:
Day 1: de novo assembly using Supernova, parameters, usage of genomic and transcriptomic reads to increase assembly contiguity
Day 2: Merging assemblies to create hybrid assembly, gap closing of assembly and polishing, correction of small indels, base errors, and local misassemblies, determining the quality of assembly using N50, BUSCO scores, coverage, etc.
Day 3: Chromosomal level assembly using Hi-C, concept of reference genome, finished genome, draft genome, case studies
Week 8:
Day 1: Annotation of repeats in final genome assembly using RepeatMasker, Determining the simple and complex repeat content of a genome
Day 2: de novo transcriptome assembly, Determining the coding gene set using MAKER pipeline
Day 3: Prediction of tRNA, rRNA and miRNA in a genome, Identification of metabolic pathways by KEGG
Week 9:
Day 1: Comprehensive functional annotation of predicted genes or protein sequences by homology-based alignment using BLAST or BLAT, COGs, Gene Ontology based annotation, InterProScan, PROSITE, Pfam, PRINTS, patterns, motifs and fingerprints
Day 2: Evolutionary analysis using homologs, paralogs and orthologs, Multiple signs of adaptation, gene family expansion and contraction
Day 3: Taxonomic classification, marker sequences such as 16S rDNA and ITS, taxonomic hierarchy, Phylogeny reconstruction using multiple sequence alignment, Distance based approaches such as Neighbour joining, Character based approaches such as Maximum parsimony, Maximum likelihood, RAxML
Week 10:
Day 1: Epigenetics, ChIP-seq, transcriptome and microarrays for regulation of expression
Day 2: Single cell genomics, 10X Chromium linked-reads and Illumina sequencing, single cell gene expression
Day 3: Application of multiomics approaches in human health and diseases such as cancer, diabetes, etc.
Week 11:
Day 1: Prokaryotic genome sequencing and assembly approaches, draft and complete genomes, taxonomic identification
Day 2: Gene prediction approaches and common methods, annotation of a bacterial genome, tRNA, rRNA, operon prediction
Day 3: Phylogenetic, metabolic and comparative analysis
Week 12:
Day 1: Microbiome and Metagenome, Human, organismal and environmental microbiomes
Day 2: Sequencing and assembly of metagenomes, gene prediction, annotation, MAGs
Day 3: Taxonomic analysis using amplicon sequence variants, Statistical analysis
Taught by
Prof. Vineet Kumar Sharma