We are delighted to host Dr. Fritz Sedlazeck, from Prof. Arndt von Haeseler’s lab, for a special seminar:
- Title: The impact of highly polymorphic regions on high throughput sequencing related studies.
- Speaker: Dr. Fritz Sedlazeck
- Date, time & location: Thu 25 Sep 2014, 12 noon, Darwin Building Room 114
- Host: Christophe Dessimoz
- Abstract: The advent of high-throughput sequencing (HTS) has boosted the variety of sequencing projects related to molecular biology and medicine. Mapping reads to a reference genome is one of the fundamental steps in most HTS-related analyses. We show that highly polymorphic regions hinder an accurate alignment of HTS reads, thus biasing subsequent analyses (e.g. SNP detection and transcript abundance estimation). We studied the effect of this bias by identifying highly polymorphic genomic regions in an F1 cross from inbred lines of D. m. Mel 6 x D. m. RAL774. To this end, we determined all heterozygous positions in the F1 from a genomic alignment. Each such heterozygous SNP was then categorized according to the number of SNPs in its vicinity. Mapping the RNA-seq data of the F1 cross to Mel 6D showed that neither BWA nor TopHat2 could recover most of the heterozygous SNPs, and that the detection probability depended on the variability around the SNP. Moreover, the heterozygous SNPs show an overrepresentation of the Mel 6D variant, so the SNP frequencies deviate substantially from the expected value of 0.5 per site. Furthermore, we demonstrate that a highly polymorphic region in a gene influences the estimation of its transcript abundance. To conclude, we demonstrate that some read mappers are affected by highly polymorphic regions. We also show that mappers like NextGenMap are less affected and thus more suitable for reliable analyses of HTS data.
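
For readers who want a feel for the categorization step described in the abstract, here is a minimal sketch, not the speaker's actual pipeline. It counts, for each heterozygous SNP, how many other SNPs lie within a fixed window, and computes the fraction of reads carrying the reference allele (expected to be about 0.5 at a heterozygous site). The function names, the 50 bp window, and the example positions are all illustrative assumptions; the abstract does not say how "vicinity" was defined.

```python
from bisect import bisect_left, bisect_right


def neighbor_counts(positions, window=50):
    """Count, for each SNP, the other SNPs within +/- `window` bp.

    `positions` are distinct SNP coordinates on one chromosome.
    The 50 bp default is an illustrative assumption, not a value
    taken from the abstract.
    """
    pos = sorted(positions)
    counts = {}
    for p in pos:
        lo = bisect_left(pos, p - window)   # first SNP at or after p - window
        hi = bisect_right(pos, p + window)  # one past the last SNP at or before p + window
        counts[p] = hi - lo - 1             # exclude the SNP itself
    return counts


def reference_fraction(ref_reads, alt_reads):
    """Fraction of reads supporting the reference allele, or None if no reads."""
    total = ref_reads + alt_reads
    return ref_reads / total if total else None


# Hypothetical example: three clustered SNPs and one isolated SNP.
counts = neighbor_counts([100, 120, 130, 500], window=50)
frac = reference_fraction(ref_reads=30, alt_reads=10)
```

A reference fraction well above 0.5 at SNPs with many neighbors would be the kind of bias toward the reference variant that the abstract reports.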