A biophysical approach to predicting intrinsic and extrinsic nucleosome positioning signals

Alexandre V. Morozov
Department of Physics & Astronomy and the BioMaPS Institute for Quantitative Biology, Rutgers University
IPAM, Nov. 26 2007

Introduction to chromatin scales
Electron micrograph of D. melanogaster chromatin: arrays of regularly spaced nucleosomes, each ~80 Å across.

Overview of gene regulation
[Diagram: RNA Pol II + TAFs, TF1, TF2, TF3, nucleosomes, gene, mRNA]
Prediction and design of gene expression levels from DNA sequence:
1. Prediction of transcription factor and nucleosome occupancies in vitro and in vivo from genomic sequence
2. Prediction of levels of mRNA production from transcription factor and nucleosome occupancies

Data for modeling eukaryotic gene regulation
Available data sources:
• DNA sequence data for multiple organisms (…accagtttacgt…)
• Genome-wide transcription factor occupancy data (ChIP-chip)
• Structural data for 100s of protein-DNA complexes
• Nucleosome positioning data: MNase digestion + sequencing or microarrays

Biophysical picture of gene transcription
Wray, G. A. et al. Mol Biol Evol 2003 20:1377-1419

Chromatin Structure & Nucleosomes

Structure of the nucleosome core particle (NCP)
Left-handed super-helix (1.84 turns, 147 bp, R = 41.9 Å, P = 25.9 Å)
PDB code: 1kx5
T.J.Richmond: K.Luger et al.
Nature 1997 (2.8 Å); T.J.Richmond & C.A.Davey Nature 2003 (1.9 Å)

Gene regulation through chromatin structure
Transcription factor-DNA interactions are affected by the chromatin:
• Chromatin remodeling by ATP-dependent complexes
• Histone variants (H2A.Z)
• Post-translational histone modifications (“histone code”)
[Figure: histones H2A, H2B, H3, H4; H3 tail]

Experimental validation of the histone-DNA interaction model
Jon Widom
• Adding key dinucleotide motifs increases nucleosome affinity
• Deleting dinucleotide motifs or disrupting their spacing decreases affinity
[Figure: designed sequence sets f1-f3, g1-g5 and h1-h3 aligned against a position ruler marked with the dyad, with bar charts of relative affinity (fold to f1, fold to g1, fold to h1)]

Histone-DNA interaction model and DNA flexibility
[Figure: nucleosomal sequences around the dyad with AA/TT/TA and GC dinucleotide motifs marked]
• Nucleosome affinity depends on the presence and spacing of key dinucleotide motifs (e.g. TA, CA)
• Nucleosome affinity can be explained by DNA flexibility

Base-pair steps are fundamental units for DNA mechanics

Data-driven model for DNA elastic energy (DNABEND)
Geometry distributions for TA steps in ~100 non-homologous protein-DNA complexes:
• mean = \langle\theta\rangle
• width ~ \langle(\theta - \langle\theta\rangle)^2\rangle^{-1}
• Matrix of force constants: F
Quadratic sequence-specific DNA elastic energy:
E_{el} = \frac{1}{2} \sum_{bs} \sum_{i,j=1}^{6} F_{ij} (\theta_i - \langle\theta_i\rangle)(\theta_j - \langle\theta_j\rangle)
W.K. Olson et al., PNAS 1998

Elastic rod model
DNA looping induced by a Lac repressor tetramer

Elastic energy and geometry of DNA constrained to follow an arbitrary curve (DNABEND)
“Constraint” energy: E_{constr} = \sum_{bp} (\Delta r_{bp})^2
Sequence-specific DNA elastic energy: E_{el}
E_{tot} = E_{el} + w E_{constr}
Minimize E_{tot} to determine energy & geometry: \partial E_{tot} / \partial \theta_i = 0
System of linear equations: 6N_{bs} \times 6N_{bs}

Example of DNA geometry prediction: nucleosome structure
[Figure: ideal superhelix vs. prediction for NCP (1kx5)]

Predictions of nucleosome binding affinities
Experimental techniques:
nucleosome dialysis A.Thastrom et al., J.Mol.Biol.
1999, 2004; P.T.Lowary & J.Widom, J.Mol.Biol. 1998
nucleosome exchange T.E.Shrader & D.M.Crothers PNAS 1989; T.E.Shrader & D.M.Crothers J.Mol.Biol. 1990

Alignment model (Segal E. et al. Nature 2006):
• Collect nucleosome-bound sequences in yeast
• Center align sequences
• Construct nucleosome-DNA model using observed dinucleotide frequencies

Alignment Model (in vivo selection)
• MNase digestion
• Extract DNA, clone into plasmids
• Sequence and center-align (142-152 bp)
AGGTTTATAG..
AGGTTAATCG..
AGGTAAATAA..
………………..
Di-nucleotide log score: \sum_{i=1}^{L-1} \log[ P(S_{i+1} | S_i) / P_B(S_{i+1}) ]

From nucleosome energies to probabilities and occupancies
[Figure: nucleosome energy vs. chromosomal coordinate; nucleosome probability & occupancy vs. chromosomal coordinate]
Use dynamic programming to find the partition function Z and thus probabilities and occupancies of each DNA-binding factor, e.g. nucleosomes:
Z = \sum_{conf} \exp[-E(conf)]

Nucleosome occupancy is dynamic
• Nucleosome-free site (TGACGTCA)
• Nucleosome-occluded site (TGACGTCA)
• Nucleosome is displaced by the bound TF (TGACGTCA)

Nucleosome occupancy of TATA boxes explains gene expression levels
Nucleosome occupancy in the vicinity of genes
Nucleosome occupancy in the vicinity of TATA boxes: default repression

Functional sites by ChIP-chip: in vivo genome-wide measurements of TF occupancy
• Genome-wide occupancies for 203 transcription factors in yeast by ChIP-chip (Harbison et al., Nature 2004: “Transcriptional regulatory code”)
• MacIsaac et al., BMC Bioinformatics 2006: “An improved map of phylogenetically conserved regulatory sites” (98 factor specificities + 26 more from the literature)

Nucleosome occupancy of transcription factor binding sites: default repression
• <Occ(functional sites)> - <Occ(non-functional sites)>
• In vitro: nucleosomes compete for DNA sequence only with each other
DNABEND: Nucleosomes, p < 0.05

Nucleosome occupancy of transcription factor binding sites
• <Occ(functional sites)> - <Occ(non-functional sites)>
• In vivo: nucleosomes compete for
DNA sequence with TFs
DNABEND: Nucleosomes + TFs, p < 0.05

Functional transcription factor sites are clustered
DNABEND: Nucleosomes + TFs, randomized functional sites, p < 0.05
[Figure legend: functional sites, non-functional sites]
Clustering!

Functional transcription factor sites are not occupied by nucleosomes in vivo
[Figure panels: Yuan et al. microarray experiment; DNABEND + Transcription Factors; DNABEND; Alignment model]

Nucleosome-induced cooperativity
• Nucleosome-occluded TF sites: no separate binding
• Nucleosome-occluded TF sites: cooperative binding
[Figure: TF sites TGACGTCA and TAAGGCCT]
Miller and Widom, Mol.Cell.Biol. 2003

Nucleosome occupancy of TF sites in a model system
[Figure: TF sites, pCYC1]

Nucleosome-induced cooperativity: example

Nucleosome position predictions: GAL1-10 locus
[Figure: GAL10, GAL1; nucleosomes in vitro, nucleosomes in vivo, TBP, GAL4]

Nucleosome position predictions: HIS3-PET56 locus
[Figure: nucleosomes in vitro, nucleosomes in vivo, TBP, GCN4]

Conclusions
• Predicted histone-DNA binding affinities and genome-wide nucleosome occupancies using a DNA mechanics model + a thermodynamic model of nucleosomes competing with other factors for genomic sequence
• Chromatin structure around ORF starts is consistent with microarray-based measurements of nucleosome positions, and can be explained with a simple model of nucleosomes “phasing off” bound TBPs
• Nucleosome-induced cooperativity (brought about by clustering of functional transcription factor binding sites) is responsible for the increased accessibility of functional sites

Future Directions
• Lots of nucleosome positioning sequences [soon to become] available: can a better model of dinucleotide (base stacking) energies be built? {Anirvan Sengupta, Rutgers}
• Can such a model be used to inform a better DNA mechanics model? Conversely, can a DNA mechanics model be “compressed”, i.e. encapsulated in a simple set of dinucleotide energies? {Anirvan Sengupta, Rutgers}
• DNABEND extensions to non-nucleosome systems, e.g. nucleoid proteins, DNA loops?
{John Marko, Jon Widom, Northwestern}
• Prediction of in vivo nucleosome positions in gene expression libraries {Ligr et al., Genetics 2006: random libraries of yeast promoters; Lu Bai et al., unpublished}

Acknowledgements
PEOPLE:
Eric Siggia (Rockefeller University)
Jon Widom (Northwestern University)
Harmen Bussemaker (Columbia University)
FUNDING:
Leukemia & Lymphoma Society Fellowship
BioMaPS Institute, Rutgers University

Nucleosome occupancy of chromosomal regions

Induced periodicity of stable nucleosomes
[Figure labels: stable, stable]

Nucleosome position predictions: summary
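
Worked sketch: from nucleosome energies to occupancies
The slide "From nucleosome energies to probabilities and occupancies" says that dynamic programming gives the partition function Z and from it the probabilities and occupancies. The Python sketch below is one way that step might look. It is not the DNABEND code. It assumes a single particle type (nucleosomes only, no TFs), steric exclusion as the only interaction, energies already expressed in units of kT with any chemical potential folded in, and plain (not log-space) arithmetic. The function name nucleosome_occupancy and the parameter footprint are hypothetical.

```python
import math


def nucleosome_occupancy(energies, footprint):
    """Forward/backward dynamic programming for non-overlapping particles.

    energies[i] is the (assumed, kT-scaled) energy of a nucleosome whose
    first base pair is i; footprint is the number of bp it covers.
    Returns (Z, start probabilities, per-bp occupancy).
    """
    n_starts = len(energies)
    n_bp = n_starts + footprint - 1
    weights = [math.exp(-e) for e in energies]  # Boltzmann weights

    # forward[i]: partition function of the first i base pairs
    forward = [1.0] * (n_bp + 1)
    for i in range(1, n_bp + 1):
        forward[i] = forward[i - 1]  # bp i-1 left empty
        if i >= footprint:           # or a nucleosome ends at bp i-1
            forward[i] += forward[i - footprint] * weights[i - footprint]

    # backward[i]: partition function of base pairs i .. n_bp-1
    backward = [1.0] * (n_bp + 1)
    for i in range(n_bp - 1, -1, -1):
        backward[i] = backward[i + 1]  # bp i left empty
        if i + footprint <= n_bp:      # or a nucleosome starts at bp i
            backward[i] += weights[i] * backward[i + footprint]

    z = forward[n_bp]

    # Probability that a nucleosome starts at bp i
    p_start = [
        forward[i] * weights[i] * backward[i + footprint] / z
        for i in range(n_starts)
    ]

    # Occupancy of bp j: sum over all nucleosomes that cover j
    occupancy = [0.0] * n_bp
    for i, p in enumerate(p_start):
        for j in range(i, i + footprint):
            occupancy[j] += p
    return z, p_start, occupancy


if __name__ == "__main__":
    z, p_start, occ = nucleosome_occupancy([0.0, 0.0], footprint=2)
    print(z, p_start, occ)
```

In the toy call, two start positions of zero energy and a 2 bp footprint allow three configurations (empty, nucleosome at 0, nucleosome at 1), so Z = 3, each start probability is 1/3, and the middle base pair has occupancy 2/3. A genome-scale version would presumably work in log space to avoid overflow and would add further particle types for competing TFs.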
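
Worked sketch: di-nucleotide log score
The alignment-model slide scores a sequence by summing log[P(S_{i+1} | S_i) / P_B(S_{i+1})] over positions. The Python sketch below illustrates that sum on a toy alignment. Several choices are assumptions made for the example and are not taken from the talk: the conditional probabilities are estimated separately at each alignment position, a pseudocount of 1 is added to every dinucleotide count, and the background P_B is uniform. The function names train_conditionals and log_score are hypothetical, and the three training sequences are only the ten bases shown on the slide, not full-length nucleosomal sequences.

```python
import math
from collections import Counter

BASES = "ACGT"


def train_conditionals(aligned, pseudocount=1.0):
    """Estimate P(next base | previous base) at each alignment position.

    aligned: equal-length, center-aligned sequences (toy data here).
    Returns a list of dicts, one per position i, keyed by (S_i, S_{i+1}).
    """
    length = len(aligned[0])
    model = []
    for i in range(length - 1):
        counts = Counter((s[i], s[i + 1]) for s in aligned)
        table = {}
        for prev in BASES:
            total = sum(counts[(prev, nxt)] for nxt in BASES) + 4 * pseudocount
            for nxt in BASES:
                table[(prev, nxt)] = (counts[(prev, nxt)] + pseudocount) / total
        model.append(table)
    return model


def log_score(seq, model, background):
    """Sum over i of log[P(S_{i+1} | S_i) / P_B(S_{i+1})]."""
    return sum(
        math.log(model[i][(seq[i], seq[i + 1])] / background[seq[i + 1]])
        for i in range(len(seq) - 1)
    )


if __name__ == "__main__":
    # First ten bases of the three example sequences shown on the slide
    aligned = ["AGGTTTATAG", "AGGTTAATCG", "AGGTAAATAA"]
    model = train_conditionals(aligned)
    uniform = {b: 0.25 for b in BASES}  # assumed background
    print(log_score("AGGTTTATAG", model, uniform))
    print(log_score("CCCCCCCCCC", model, uniform))
```

A sequence that resembles the alignment scores higher than an unrelated one. With real data the background would be estimated from genomic base frequencies rather than set to 0.25.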