Yeasts are unicellular fungi that do not form fruiting bodies. Although the yeast lifestyle has evolved multiple times, most known species belong to the subphylum Saccharomycotina (hereafter yeasts). This diverse group includes the premier eukaryotic model system, Saccharomyces cerevisiae; the common human commensal and opportunistic pathogen, Candida albicans; and over 1,000 other known species (with more continuing to be discovered). Yeasts are found in every biome and continent and are more genetically diverse than either plants or bilaterian animals. Ease of culture, simple life cycles, and small genomes (10–20 Mbp) have made yeasts exceptional models for molecular genetics, biotechnology, and evolutionary genomics. Since only a tiny fraction of yeast biodiversity and metabolic capabilities has been tapped by industry and science, expanding the taxonomic breadth of deep genomic investigations will further illuminate how genome function evolves to encode the diverse metabolisms and ecologies of yeasts. As part of the National Science Foundation’s Dimensions of Biodiversity program, we have undertaken a large-scale comparative genomic study to uncover the genetic basis of metabolic diversity across the entire Saccharomycotina subphylum. In my talk, I will discuss the team’s evolutionary analyses of 332 genomes spanning the diversity of the subphylum. These include establishing a robust genus-level phylogeny and timetree for the subphylum, quantifying the extent of horizontal gene transfer, and characterizing the evolution of approximately 50 metabolic traits (and, in some cases, their underlying genes and pathways). These analyses allow us, for the first time, to infer the key metabolic characteristics of the Last Yeast Common Ancestor (LYCA) and to characterize the tempo and mode of genome evolution across an entire subphylum.
Date, time & location: Thu 25 Sep 2014, 12 noon, Darwin Building Room 114
Host: Christophe Dessimoz
Abstract: The advent of high-throughput sequencing (HTS) has boosted the variety of sequencing projects in molecular biology and medicine. Mapping reads to a reference genome is one of the fundamental steps in most HTS-related analyses. We show that highly polymorphic regions hinder accurate alignment of HTS reads, thus biasing subsequent analyses (e.g. SNP detection and transcript abundance estimation). We studied the effect of this bias by identifying highly polymorphic genomic regions in an F1 cross from inbred lines of D. m. Mel 6 x D. m. RAL774. To this end, we determined all heterozygous positions in the F1 from a genomic alignment. Each such heterozygous SNP was then categorized according to the number of SNPs in its vicinity. Mapping the RNA-seq data of the F1 cross to Mel 6D showed that neither BWA nor Tophat2 could recover most of the heterozygous SNPs, and that the detection probability depended on the variability around the SNP. Moreover, the heterozygous SNPs show an overrepresentation of the Mel 6D variant, so the SNP frequencies deviate substantially from the expected value of 0.5 per site. Furthermore, we demonstrate that a highly polymorphic region in a gene influences the estimation of its transcript abundance. To conclude, we demonstrate that some read mappers are affected by highly polymorphic regions. We also show that mappers like NextGenMap are less affected and thus more suitable for reliable analyses of HTS data.
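As an illustration only, the two quantities mentioned in the abstract (the number of SNPs in the vicinity of each heterozygous SNP, and the per-site frequency of the reference variant) could be computed roughly as follows. This is a minimal sketch, not the speakers' pipeline: the function names, the 50 bp window, and the input format (a list of SNP positions on one chromosome, plus per-site read counts) are all assumptions made for the example.

```python
from bisect import bisect_left, bisect_right


def neighbor_counts(positions, window=50):
    """Count, for each SNP, the other SNPs within +/- `window` bp.

    `positions` is assumed to hold unique SNP coordinates on a single
    chromosome; the window size is a hypothetical choice.
    """
    pos = sorted(positions)
    counts = {}
    for p in pos:
        lo = bisect_left(pos, p - window)   # first SNP at or after p - window
        hi = bisect_right(pos, p + window)  # one past the last SNP at or before p + window
        counts[p] = hi - lo - 1             # subtract 1 to exclude the SNP itself
    return counts


def reference_allele_fraction(ref_reads, alt_reads):
    """Fraction of reads carrying the reference variant at one site.

    For a heterozygous site with unbiased mapping the expected value is 0.5.
    Returns None when no reads cover the site.
    """
    total = ref_reads + alt_reads
    if total == 0:
        return None
    return ref_reads / total


if __name__ == "__main__":
    print(neighbor_counts([100, 120, 130, 500]))
    print(reference_allele_fraction(30, 10))
```

A fraction well above 0.5 at sites with many neighbouring SNPs would be consistent with the reference bias described in the abstract.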