The past 15 years have been exciting ones in plant biology. Hundreds of plant genomes have been sequenced, RNA-seq has enabled transcriptome-wide expression profiling, and a proliferation of "-seq"-based methods has permitted protein-protein and protein-DNA interactions to be determined cheaply and in a high-throughput manner. These data sets in turn allow us to generate hypotheses at the click of a mouse. For instance, knowing where and when a gene is expressed can help us narrow down the phenotypic search space when we don't see a phenotype in a gene mutant under "normal" growth conditions. Coexpression analyses and association networks can provide high-quality candidate genes involved in a biological process of interest. Using Gene Ontology enrichment analysis and pathway visualization tools can help us make sense of our own 'omics experiments and answer the question "what processes/pathways are being perturbed in our mutant of interest?"
Structure: each of the 6 weekly hands-on modules consists of a ~2 minute intro, a ~20 minute theory mini-lecture, a 1.5 hour hands-on lab, an optional ~20 minute lab discussion for those experiencing difficulties with the lab, and a ~2 minute summary.
Tools covered [Material updated in June 2024]:
Module 1: GENOMIC DBs / PRECOMPUTED GENE TREES / PROTEIN TOOLS. Araport, TAIR, Gramene, EnsemblPlants Compara, PLAZA; SUBA5 and Cell eFP Browser, 1001 Genomes Browser
Module 2: EXPRESSION TOOLS. eFP Browser / eFP-Seq Browser, Araport, ARDB, TraVA DB, NCBI Genome Data Viewer for exploring RNA-seq data for many plant species, MPSS database for small RNAs
Module 3: COEXPRESSION TOOLS. ATTED-II, Expression Angler, AraNet, AtCAST2
Module 4: PROMOTER ANALYSIS. Cistome, MEME, ePlant
Module 5: GO ENRICHMENT ANALYSIS AND PATHWAY VISUALIZATION. AgriGO, AmiGO, Classification SuperViewer, TAIR, g:Profiler, AraCyc, MapMan (optional: Plant Reactome)
Module 6: NETWORK EXPLORATION. Arabidopsis Interactions Viewer 2, ePlant, TF2Network, Virtual Plant, GeneMANIA
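To give a flavour of what the GO enrichment tools in Module 5 compute, here is a minimal Python sketch of an over-representation test based on the hypergeometric distribution. It is a generic illustration only: the function name and the toy numbers are assumptions made for this example, and the individual tools listed above may use different statistics and multiple-testing corrections.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population: number of genes in the background set
    annotated:  number of background genes carrying the GO term
    sample:     number of genes in our gene list
    hits:       number of genes in our list carrying the GO term
    """
    max_hits = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability of seeing `hits` or more annotated genes by chance.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 20 background genes, 5 of them annotated with
# the term, and a list of 5 genes of which 3 carry the term.
p_value = hypergeom_enrichment_p(population=20, annotated=5, sample=5, hits=3)
print(f"enrichment p-value: {p_value:.4f}")
```

A small p-value suggests the term appears in the gene list more often than expected by chance. In a real analysis many terms are tested at once, so the p-values would also need a multiple-testing correction.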
Overview
Syllabus
- Plant Genomic Databases, and useful sites for info about proteins
- In this module we'll be exploring several plant databases including Ensembl Plants, Gramene, PLAZA, SUBA, TAIR and Araport. The information in these databases allows us to easily identify functional regions within gene products, view subcellular localization, find homologs in other species, and even explore pre-computed gene trees to see if our gene of interest has undergone a gene duplication event in another species, all at the click of a mouse!
- Expression Analysis
- Vast databases of gene expression and nifty visualization tools allow us to explore where and when a gene is expressed. Often this information can be used to help guide a search for a phenotype if we don't see a phenotype in a gene mutant under "normal" growth conditions. We explore several tools for Arabidopsis data (eFP Browser, ARDB, TraVA DB, Araport) along with NCBI's Genome Data Viewer for RNA-seq data for other plant species. We also examine the MPSS database of small RNAs and degradation products to see if our example gene has any potential microRNA targets.
- Coexpression Tools
- Being able to group genes by similar patterns of expression across expression data sets using algorithms like WGCNA is a very useful way of organizing the data. Clusters of genes with similar patterns of expression can then be subject to Gene Ontology term enrichment analysis (see Module 5) or examined to see if they are part of the same pathway. What's even more powerful is being able to identify genes with similar patterns of expression without doing a single expression profiling experiment, by mining gene expression databases! There are several tools that allow you to do this in many plant species simply by entering a query gene identifier. The genes that are returned are often in the same biological process as the query gene, and thus this "guilt-by-association" paradigm is an excellent tool for hypothesis generation.
- Sectional Quiz 1
- Promoter Analysis
- The regulation of gene expression is one of the main ways by which a plant can control the abundance of a gene product (post-translational modifications and protein degradation are some others). When and where a gene is expressed is controlled to a large extent by the presence of short sequence motifs, called cis-elements, present in the promoter of the gene. These in turn are bound by transcription factors that may be induced in response to environmental stresses or during specific developmental programs. Thus, understanding which transcription factors can bind to which promoters can help us understand the role the downstream genes might be playing in a biological system.
- Functional Classification and Pathway Visualization
- Often the results of 'omics experiments are large lists of genes, such as those that are differentially expressed. We can use a "cherry picking" approach to explore individual genes in those lists but it's nice to be able to have an automated way of analyzing them. Here tools for performing Gene Ontology enrichment analysis are invaluable and can tell you if any particular biological processes or molecular functions are over-represented in your gene list. We'll explore AgriGO, AmiGO, tools at TAIR and the BAR, and g:Profiler, which all allow you to do such analyses. Another useful analysis is to map your gene lists (along with associated data, e.g. expression values) onto pathway representations, and we'll use AraCyc and MapMan to do this. In this way it is easy to see if certain biosynthetic reactions are upregulated, which can help you interpret your 'omics data!
- Network Exploration (PPIs, PDIs, GRNs)
- Molecules inside the cell rarely operate in isolation. Proteins act together to form complexes, or are part of signal transduction cascades. Transcription factors bind to cis-elements in promoters or elsewhere and can act as activators or repressors of transcription. MicroRNAs can affect gene expression in other ways. One of the main themes to have emerged in the past two decades in biology is that of networks. In terms of protein-protein interaction networks, often proteins that are highly connected with others are crucial for biological function – when these “hubs” are perturbed, we see large phenotypic effects. The way that transcription factors interact with downstream promoters, some driving the expression of other transcription factors that in turn regulate genes combinatorially with upstream transcription factors, can have an important biological effect in terms of modulating the kind of output achieved. The tools described in this lab can help us to explore molecular interactions in a network context, perhaps with the eventual goal of modeling the behaviour of a given system.
- Sectional Quiz 2 and Final Assignment
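As an illustration of the "guilt-by-association" idea from the Coexpression Tools module, the following minimal Python sketch ranks genes by how closely their expression profiles track a query gene, using the Pearson correlation coefficient. The gene names and expression values are made up for this example, and the function names are hypothetical; the tools covered in the course may use other similarity measures and far larger expression compendia.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no coexpression signal
    return cov / sqrt(var_x * var_y)


def rank_coexpressed(query, expression, top_n=3):
    """Rank all other genes by correlation with the query gene's profile."""
    query_profile = expression[query]
    scores = [
        (gene, pearson(query_profile, profile))
        for gene, profile in expression.items()
        if gene != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical expression values across five samples (made-up numbers).
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.1, 3.9, 6.2, 8.0, 9.8],  # rises together with GENE_A
    "GENE_C": [5.0, 4.0, 3.0, 2.0, 1.0],  # mirror image of GENE_A
    "GENE_D": [3.0, 3.0, 3.0, 3.0, 3.0],  # flat across all samples
}

for gene, r in rank_coexpressed("GENE_A", expression):
    print(f"{gene}\t{r:.2f}")
```

Genes at the top of the ranking are candidates for involvement in the same biological process as the query gene, which is a hypothesis to test rather than a conclusion.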
Taught by
Nicholas James Provart