Arrays as tools for Natural Variation studies: Mapping, Haplotyping, and gene expression
Justin Borevitz, University of Chicago
naturalvariation.org

Talk Outline
• Single Feature Polymorphisms (SFPs) – Potential deletions
• Bulk Segregant Mapping – eXtreme Array Mapping
• Haplotyping – Selection
• Transcriptional profiling – for QTL candidate genes

What is Array Genotyping?
• Affymetrix expression GeneChips contain 202,806 unique 25 bp oligonucleotides.
• 11 features per probe set for 21,546 genes.
• Newer arrays have even more.
• Genomic DNA is randomly labeled with biotin; the labeled product is ~50 bp.
• 3 independent biological replicates are compared to the reference strain, Col.

GeneChip Potential Deletions
[Figures: spatial correction of spatial artifacts; improved reproducibility]
Next: quantile normalization.

False Discovery and Sensitivity
Notes: the Cereon call may be a sequencing error; a TIGR match is a match; PM probes only.

SAM threshold 5% FDR
                      Total     SFPs   nonSFPs
GeneChip                       3,806    89,118
Sequence                817      121       696
  Polymorphic           340      117       223
  Non-polymorphic       477        4       473
Sensitivity at Cereon marker accuracy of 100% / 90% / 80% / 70%: 34% / 41% / 53% / 85%
False discovery rate: 3%
Test for independence of all factors: Chisq = 177.34, df = 1, p-value = 1.845e-40

SAM threshold 18% FDR
                      Total     SFPs   nonSFPs
GeneChip                      10,627    82,297
Sequence                817      223       594
  Polymorphic           340      195       145
  Non-polymorphic       477       28       449
Sensitivity at Cereon marker accuracy of 100% / 90% / 80% / 70%: 57% / 67% / 85% / 100%
False discovery rate: 13%
Test for independence of all factors: Chisq = 265.13, df = 1, p-value = 1.309e-59

3 of 4 Cvi markers were also confirmed in PHYB.

Chip genotyping of a Recombinant Inbred Line
[Figure: 29 kb interval]
Discovery: 6 replicates × $500 / 12,000 SFPs = $0.25 per SFP
Typing: 1 replicate × $500 / 12,000 SFPs ≈ $0.042 per SFP
[Figure: LIGHT1 NIL]

Potential Deletions
• >500 potential deletions
• 45 confirmed by Ler sequence
• 23 (of 114) transposons
• Disease Resistance (R) gene clusters
• Single R gene deletions
• Genes involved in secondary metabolism
• Unknown genes

Potential Deletions Suggest Candidate Genes
[Figure: MAF1 natural deletion in the FLOWERING1 QTL; x-axis Chr1 (bp)]
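The sensitivity, false discovery rate and chi-square values in the two SAM-threshold tables can be recomputed from their 2×2 counts of sequence status (polymorphic or not) against array call (SFP or nonSFP). A minimal sketch, assuming sensitivity = polymorphic markers called as SFPs / all polymorphic markers, FDR = non-polymorphic markers called as SFPs / all SFP calls, and a Pearson chi-square test without continuity correction (the form R's `summary()` of a contingency table prints as "Test for independence of all factors"). These definitions are inferred from the numbers rather than stated on the slides, the function name `sfp_validation` is hypothetical, and only the 100% marker-accuracy sensitivity is reproduced.

```python
from math import erfc, sqrt


def sfp_validation(poly_sfp, poly_non, nonpoly_sfp, nonpoly_non):
    """Summarize array SFP calls against sequence-based marker status.

    Rows: markers polymorphic / non-polymorphic by sequence.
    Columns: feature called an SFP / not called (nonSFP).
    """
    a, b, c, d = poly_sfp, poly_non, nonpoly_sfp, nonpoly_non
    n = a + b + c + d
    sensitivity = a / (a + b)  # polymorphic markers detected as SFPs
    fdr = c / (a + c)          # SFP calls that are not polymorphic
    # Pearson chi-square for a 2x2 table, no continuity correction.
    chisq = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chisq / 2))  # upper tail of chi-square with df = 1
    return sensitivity, fdr, chisq, p_value


for label, counts in [("SAM 5% FDR", (117, 223, 4, 473)),
                      ("SAM 18% FDR", (195, 145, 28, 449))]:
    sens, fdr, chisq, p = sfp_validation(*counts)
    print(f"{label}: sensitivity={sens:.0%} FDR={fdr:.0%} "
          f"Chisq={chisq:.2f} p={p:.3e}")
```

Run as-is, the loop should reproduce the slide values: sensitivities of 34% and 57%, FDRs of 3% and 13%, and Chisq = 177.34 and 265.13.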
MAF1 flowering-time QTL caused by a natural deletion in MAF1.

Fast Neutron deletions
[Figures: FKF1 80 kb deletion, CHR1; Het; cry2 10 kb deletion, CHR1]

Map bibb
• 100 bibb mutant plants
• 100 wild-type plants
Bulk segregant mapping using chip hybridization
[Figure: bibb mapping]
bibb maps to Chromosome 2 near ASYMMETRIC LEAVES1 (AS1).

ChipMap: BIBB = ASYMMETRIC LEAVES1
AS1 (ASYMMETRIC LEAVES1) is a MYB gene closely related to PHANTASTICA, located at 64 cM.
Sequencing the AS1 coding region from bib-1 found a g → a change that would introduce a stop codon in the MYB domain.
[Figure: AS1 MYB domain with bib-1 W49* and as1-101 Q107*; panels: bibb, as1-101]

stamenstay (Sarah Liljegren)
[Figure: allele frequency of the stamenstay mutant pool along Chromosomes 1–5 (cM); label: Ler]
Mapping confirmed.

ein6 een double mutant (Ramlah Nehring)
[Figure: allele frequency of the ein6 F2 mutant pool along Chromosomes 1–5 (cM)]
Mapping confirmed.

eXtreme Array Mapping
[Figure: histogram of hypocotyl length (mm) in red light for Kas/Col RILs]
15 tallest RILs pooled vs. 15 shortest RILs pooled.

eXtreme Array Mapping: Chromosome 2
[Figure: LOD along Chromosome 2 (cM) at the RED2 QTL; composite interval mapping, RED2 QTL, 12 cM]
15 tallest RILs pooled vs. 15 shortest RILs pooled.
Allele frequencies determined by SFP genotyping.
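The pooled-mapping plots appear to show an allele-frequency difference between two pools (mutant vs. wild type, or tallest vs. shortest RILs) along each chromosome. A minimal sketch of that idea, assuming each pool's hybridization intensity at a polymorphic feature is a linear mix of the two parental intensities (an assumption, not the talk's stated model). The helper names `pool_allele_freq` and `bulk_scan` and the toy intensities are hypothetical.

```python
from statistics import mean


def pool_allele_freq(pool, parent_a, parent_b):
    """Fraction of parent-A alleles in a pool at one polymorphic feature.

    Hypothetical model: the pool's intensity is a linear mix of the two
    parental intensities, so the fraction is found by interpolation.
    """
    span = parent_a - parent_b
    if span == 0:
        raise ValueError("feature is not polymorphic between the parents")
    return min(1.0, max(0.0, (pool - parent_b) / span))  # clamp to [0, 1]


def bulk_scan(markers, window=5):
    """Smoothed allele-frequency difference (pool 1 minus pool 2) by position.

    markers: (position_cM, pool1, pool2, parent_a, parent_b) tuples sorted by
    position. Each output value averages the raw differences of markers within
    window // 2 positions on either side (truncated at chromosome ends).
    """
    diffs = [pool_allele_freq(p1, a, b) - pool_allele_freq(p2, a, b)
             for _, p1, p2, a, b in markers]
    half = window // 2
    scan = []
    for i, marker in enumerate(markers):
        lo, hi = max(0, i - half), min(len(diffs), i + half + 1)
        scan.append((marker[0], mean(diffs[lo:hi])))
    return scan


# Toy data: pool 1 = mutant pool, pool 2 = wild-type pool (values invented).
markers = [
    (10, 0.50, 0.50, 1.0, 0.0),
    (30, 0.45, 0.55, 1.0, 0.0),
    (64, 0.00, 0.67, 1.0, 0.0),  # mutant pool fixed for the parent-B allele
    (80, 0.40, 0.60, 1.0, 0.0),
]
peak = max(bulk_scan(markers, window=1), key=lambda m: abs(m[1]))
print(peak[0])  # → 64, the position of the largest pool difference
```

With few markers, a wide window lets edge markers dominate the average; real scans would use many SFPs per chromosome and a threshold set by simulation, as the slides note.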
Thresholds set by simulations.
Red light QTL RED2 mapped in 100 Kas/Col RILs.

Fine Mapping with Arrays
[Figures: genotype along Chromosomes 1–5 (cM) and across the fine-mapping region (kb)]
• Single additive gene
• 1000 F2s
• Select recombinants by PCR
• 1 Mb region

Barley SFPs gDNA
• 9 arrays, randomly labeled genomic DNA
• 3 wild type, 3 parent 1, 3 parent 2
• Hope to verify some RNA SFPs
• Pairs plots, correlation matrix
• SFP table

Just better than permutations
delta   ori.data   perm.data   difference   FDR
0.10        2866      2114.2        751.8   0.74
0.15        1870       578.4       1291.6   0.31
0.20        1274       269.3       1004.7   0.21
0.25         991       174.7        816.3   0.18
0.30         816       126.8        689.2   0.16
0.35         660        95.8        564.2   0.15
0.40         554        75.8        478.2   0.14
(ori.data: SFPs called in the observed data; perm.data: calls in permuted data; difference = ori.data − perm.data; FDR = perm.data / ori.data)

Increase specific activity with other labeling methods.
Perform more replicates.

• Single Feature Polymorphisms
  – Improve with replicates (easy)
  – Improved statistical models
• Genotyping
  – Precisely define recombination breakpoints
  – Fine mapping
• Potential Deletions
  – Candidate genes / induced mutations
• Bulk segregant Mapping
  – eXtreme Array Mapping, F2s, etc.

Array Haplotyping
• What about diversity/selection across the genome?
• A genome-wide estimate of population genetics parameters: θW, π, Tajima's D, ρ
• LD decay, haplotype block size
• Deep population structure?
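For the population-genetics parameters listed above, Watterson's θW, nucleotide diversity π and Tajima's D can be computed per genomic window. The sketch below applies the standard Tajima (1989) formulas to a hypothetical 0/1 matrix (one row per inbred accession, one column per SFP in the window). Treating array SFPs as segregating sites is an assumption; because SFPs are called relative to Col, the result is likely affected by ascertainment bias and is at best "Tajima's D like". The function name `diversity_stats` is hypothetical.

```python
from itertools import combinations
from math import sqrt


def diversity_stats(haplotypes):
    """Return (theta_w, pi, tajimas_d) for equal-length 0/1 haplotypes.

    Each haplotype is one accession's calls across the sites of a window.
    Values are per window, not per base pair.
    """
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 haplotypes")
    n_sites = len(haplotypes[0])
    # Segregating sites: columns where not all accessions share a call.
    s = sum(1 for j in range(n_sites) if len({h[j] for h in haplotypes}) > 1)
    # pi: mean number of differences over all pairs of accessions.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(x != y for x, y in zip(h1, h2)) for h1, h2 in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    theta_w = s / a1
    if s == 0:
        return theta_w, pi, float("nan")  # D is undefined without variation
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    d = (pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))
    return theta_w, pi, d


# Hypothetical windows: 4 accessions x 3 SFPs each.
singletons = [[0, 0, 0], [0, 0, 0], [0, 0, 0], [1, 1, 1]]
balanced = [[0, 0, 0], [0, 0, 0], [1, 1, 1], [1, 1, 1]]
print(diversity_stats(singletons))  # D < 0: excess of rare variants
print(diversity_stats(balanced))    # D > 0: excess of intermediate-frequency variants
```

Sliding this over 50 kb windows would give chromosome-wide profiles; the sign of D contrasts π, which weights intermediate-frequency variants, with θW, which counts all segregating sites equally.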
Accessions: Col, Lz, Ler, Bay, Shah, Cvi, Kas, C24, Est, Kin, Mt, Nd, Sorbo, Van, Ws2
[Figure: pairwise correlation between and within replicates]

Array Haplotyping: Chromosome 1, ~500 kb
Inbred lines have low effective recombination due to partial selfing, so LD blocks are extensive.
[Figure: haplotypes of Col, Ler, Cvi, Kas, Bay, Shah, Lz, Nd]

Distribution of T-stats
[Figure: histogram of T statistics, actual vs. null (permutation); labels: 32,427 calls; 208,729]
[Figure labels: Not Col, 12,250 SFPs; Col; NA; duplications]

Sequence confirmation of SFPs
Accession   FDR    Sensitivity   SNP   Total
bay         0.0%   43%            51     563
c24         0.2%   39%            64     580
cvi         0.0%   38%            91     543
est         0.0%   59%            39     548
kas         1.9%   44%            66     577
kendl       3.1%   33%            57     545
ler         0.0%   49%            43     562
lz          0.0%   53%            51     573
mt          0.2%   61%            49     570
nd          0.0%   47%            49     568
shah        0.0%   24%            80     548
sorbo       0.0%   45%            55     526
van         0.2%   29%            92     571
ws2         0.0%   49%            57     514

SFPs for reverse genetics
14 accessions, 30,950 SFPs
http://naturalvariation.org/sfp

Chromosome Wide Diversity
[Figures: diversity and a Tajima's D-like statistic in 50 kb windows (labels: Self-Incompatibility locus, RPS4, unknown); θW and Tajima's D for R genes vs. bHLH genes (label: RPS4)]

Summary: Haplotyping
• Patterns of variation across accessions
• Natural reverse genetics – polymorphism database
• Increased polymorphism in centromeres
• Selection on R genes

Transcription based cloning
• Look for gene expression differences between genotypes
• Identify candidate genes that map to the mutation
• Downstream targets that map elsewhere
Differences may be due to expression or to hybridization.
PAG1 is down-regulated in Cvi; a PALE GREEN1 knockout has a long hypocotyl in red light.

SFPs from RNA
• Barley Affy array, 22,801 probe sets
  – Most probe sets have 11 probes
  – Background correction: "rma2"
  – Quantile normalization
• 36 arrays total
  – 3 replicates
  – 6 tissues: leaf, crown, root, radicle, gem, col?
  – 2 genotypes: Golden Promise (7,459 ESTs) and Morex (52,695 ESTs)

Look at some plots: raw data; remove probe effect; remove tissue + genotype effect.

SAM False Discovery Rate
delta   ori.data   perm.data   difference   FDR
0.1        13210     1210.34     11999.66   0.091623013
0.2         7903      183.95      7719.05   0.023275971
0.3         5462       49.18      5412.82   0.009004028
0.4         4036       18.31      4017.69   0.004536670
0.5         3024        8.49      3015.51   0.002807540
0.6         2285        3.85      2281.15   0.001684902
Both + and – SFPs are called, since there is no reference comparison.
Need to compare with ESTs.

Review
• Single Feature Polymorphisms (SFPs) can be used to identify recombination breakpoints and potential deletions, for eXtreme Array Mapping, and for haplotyping.
• Expression analysis, taking polymorphisms into account, identifies QTL candidate genes and downstream responses.

Universal Whole Genome Array
• RNA: gene discovery, gene model correction, non-coding/micro-RNA, antisense transcription
• DNA: chromatin immunoprecipitation (ChIP-chip), methylation
• Transcriptome atlas: expression levels, tissue specificity, alternative splicing
• Polymorphism: SFP discovery/genotyping, comparative genome hybridization (CGH), insertions/deletions
~19 bp tile, both strands; repeat regions eliminated; "good" binding oligos

Transcriptome Atlas: Improved Genome Annotation
[Figure: ORFa, ORFb, start, conservation; SFP, SNP and deletion positions along the chromosome (bp)]

ChipViewer: Mapping of transcriptional units of the ORFeome
At1g09750: from the 2000 version (MIPS) to the latest AGI
annotation.

NaturalVariation.org
Syngenta: Hur-Song Chang, Tong Zhu
Salk: Jon Werner, Todd Mockler, Sarah Liljegren, Ramlah Nehring, Joanne Chory, Detlef Weigel, Joseph Ecker
UC Davis: Julin Maloof
UC San Diego: Charles Berry
University of Guelph, Canada: Dave Wolyn
Scripps: Sam Hazen, Elizabeth Winzeler