PROTEIN STRUCTURE PREDICTION
WITH A FOCUS ON ROSETTA
This presentation was prepared by: Xavier Ambroggio, ambroggiox@niaid.nih.gov
OFFICE OF CYBER INFRASTRUCTURE AND COMPUTATIONAL BIOLOGY
NATIONAL INSTITUTE OF ALLERGY AND INFECTIOUS DISEASES
Fall 2011 Computational Structural Biology Seminar Series
9 – 11 AM, T/Th in 12A/B51 http://training.cit.nih.gov
Week | Day | Date | Course | Instructor | CIT Course #
Week 1 | Tues | Aug. 23 | Fundamentals, Data Sources, and Visualization of Macromolecular Structure | Darrell Hurt | SS260-11001
Week 1 | Thurs | Aug. 25 | Generating Protein Structures from Homology | Darrell Hurt | SS270-11001
Week 2 | Tues | Aug. 30 | Predicting Protein Structures from Amino Acid Sequences | Xavier Ambroggio | SS660-11001
Week 2 | Thurs | Sept. 1 | Predicting Macromolecular Complexes from Uncomplexed Structures | Xavier Ambroggio | SS670-11001
Week 3 | Tues | Sept. 6 | Design and Analysis of Macromolecular Interfaces | Xavier Ambroggio | SS770-11001
Week 3 | Thurs | Sept. 8 | Analysis and Advanced Visualization of Macromolecular Structure | Darrell Hurt | SS330-11001
Week 4 | Tues | Sept. 13 | Computational Drug Design | Mike Dolan | SS340-11001
Week 4 | Thurs | Sept. 15 | Introduction to Molecular Dynamics | Mike Dolan | TBA
Week 5 | Thurs | Sept. 22 | Advanced Molecular Dynamics | Mike Dolan | TBA
Bioinformatics and Computational Biosciences Branch
Scientific Collaboration
Scientific Training
Custom Scientific Software & Infrastructure
•  Structural Biology
•  Phylogenetics
•  Statistics
•  Sequence Analysis
•  Microarray Analysis
•  NGS Analysis
•  Bioinformatics
•  Biological Networks
•  Function Prediction
•  …
Ab Initio Structure Prediction:
Given an amino acid sequence, find the tertiary structure
“Protein folding problem”
CASP: Critical Assessment of protein Structure Prediction
http://predictioncenter.org
•  Double-blind experiment (…competition)
•  World-wide scientific community
•  Unbiased assessment of techniques in structure prediction
•  Biennial (every even year)
•  “Pulse” of the prediction community
•  What can be predicted?
•  Which servers/algorithms perform best?
CASP Overview
Blutsbrüder Design
CASP Top Free-Modeling Servers
Why Rosetta focus?
•  Standalone
•  Versatile
  - RNA
  - design
  - dock
  - …
•  Open Source
•  Substantial Literature
•  Shared methodology
Use any and all available servers!!!
Das & Baker Annu. Rev. Biochem 2008
Rosetta: multipurpose macromolecular modeling suite
[Figure labels: prediction, design; CIT Course # SS660-11001, SS670-11001, SS770-11001]
Brief Description of Select Rosetta Functions
Function | Description
ab initio | predict the structure from sequence
relax | refine the structure using Rosetta energy functions
idealize | replace bond geometries with ideal values
loop modeling | build and refine local structurally variable regions in context of a structural template
design | optimize sequence given a structure with a fixed backbone
docking | structure prediction for a protein-protein complex given subunits
ligand | ligand docking
ddG prediction | ddG calculations for mutations at protein-protein interfaces and for protein stability
scoring | score input conformations with Rosetta energy functions
RNA | predict RNA structures from sequences and design sequences from fixed structures
clustering | group input structures by RMSD to each other for structure prediction analysis
backrub | generate alternate backbone conformations based on sets of rotations
membrane ab initio | predict the structures of helical membrane proteins
enzyme design | redesign a protein around a ligand
domain assembly | fixed domains connected by variable regions
antibody | automated antibody homology modeling
XML parsing | parse XML scripts into protocols
What types of protein domains can Rosetta fold?
Small, globular, soluble protein domains…
Small, simple membrane protein domains…
…but not complex domains or multi-domain proteins.
T4-lysozyme C-terminal domain
V-type Na+ ATP synthase subunit
rhodopsin
Slide content adapted from Stephanie Hirst at the 2011 Vanderbilt Rosetta Workshop
A B C
What are the success rates?
High resolution predictions are achievable
•  targets ≤100 residues
•  success rate ~30%
•  success rate with accurate secondary structure ~50%
•  a hallmark of accuracy: convergence
Slide content courtesy Rhiju Das, Baker Lab
What types of protein domains can no one fold?
CASP9: domains with no good FM predictions
Slide content adapted from talk given by Lisa Kinch of the Grishin lab at CASP9 meeting: http://predictioncenter.org/casp9/
•  Non-globular
•  Trimeric
•  Fe stabilized
•  High contact order
  Many residues close in 3D, far in 1D
•  + elongated sheet?
T0591d1, 3MWT
T0550d2, 3NQK
T0629d2, 2XGF
1.  Select fragments consistent with local sequence preferences
2.  Assemble fragments into models with native-like global properties
3.  Identify the best model from the population of decoys
Slide content adapted from Ora Schueler-Furman’s “Workshop in Structural Computational Biology”
Figures adapted from Charlie Strauss;
Protein structure prediction using ROSETTA, Rohl et al (2004) Methods in Enzymology, 383:66
Basic Ab Initio Rosetta protocol
Fragment-Based Structure Prediction
Rosetta, Quark, …
[Figure labels: Fragment, Assembly, Decoy]
Homology modeling:
[Figure labels: Template(s), Alignment, Model]
First atomic-resolution model
Target 0281 CASP6
•  Topology sampled by ab initio trajectory of homolog sequence (rmsd=2.2Å)
•  Full atom refinement reduces rmsd to 1.5Å
•  Side chain packing accurately recovered
Slide content adapted from Ora Schueler-Furman’s “Workshop in Structural Computational Biology”
Figures adapted from Bradley P, Malmström L, Qian B, Schonbrun J, Chivian D, Kim DE, Meiler J, Misura KM, Baker D. Free modeling with Rosetta in CASP6. Proteins.
Folding Theory: Sequence-Structure Relationships
•  Secondary structure formation is the earliest part of the folding process
•  Local sequence codes for local structures… i.e. fragments
  - helical sequences in a folded protein tend to be helical in isolation
•  Secondary structure prediction algorithms have ~70-80% accuracy
  - Partial failure due to tertiary interactions stabilizing secondary structure elements
Rosetta fragments
•  3 and 9 residue fragments matched to query sequence
•  database created from crystal structures
  - < 2.5Å resolution
  - < 50% sequence identity
•  low resolution modeling
  - centroid representation of side chains
•  ranked by:
  - alignment
  - secondary structure predictions
    •  PSIPRED
    •  SAM-T02
    •  JUFO
    •  PHD
query    KVFGRCELAAAMKRHGLDNYRGYSLGNWVC...
sec str  EEEE TT S EEEEEEE TT HH...
3-mer windows: KVF, VFG, FGR, GRC, ...
9-mer windows: KVFGRCELA, VFGRCELAA, FGRCELAAA, GRCELAAAM, ...
Slide content courtesy David Hoover, CIT, NIH
Sliding fragment windows
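The sliding windows can be reproduced in a few lines. This is only an illustration of how overlapping 3-mer and 9-mer windows tile the query; it uses the visible 30-residue prefix of the query sequence (the slide truncates the rest), and the function name is hypothetical. Matching each window against a structure database is not shown.

```python
def sliding_windows(sequence, size):
    """Return every overlapping window of `size` residues, in sequence order."""
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]

# Visible prefix of the query sequence from the slide (the remainder is elided).
query = "KVFGRCELAAAMKRHGLDNYRGYSLGNWVC"

three_mers = sliding_windows(query, 3)
nine_mers = sliding_windows(query, 9)

print(three_mers[:4])  # first four 3-residue windows
print(nine_mers[:4])   # first four 9-residue windows
```

A sequence of N residues yields N - 2 three-residue windows and N - 8 nine-residue windows, one fragment list per window.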
# Rank G K L M Q E R A
13 1000 G K L
25 821 G R L
46 1000 K L M
21 635 R L M
43 923 K V M
26 523 R V M
15 970 M Q E
26 934 E R A
Separate 3-mer and 9-mer libraries generated
Slide content courtesy David Hoover, CIT, NIH
Example 3-mer fragment library
Making Fragment Libraries with Robetta
http://robetta.bakerlab.org/
Slide content adapted from Stephanie Hirst at the 2011 Vanderbilt Rosetta Workshop
Making Fragment Libraries on Biowulf
Slide content by David Hoover from: http://biowulf.nih.gov/apps/Rosetta23.html#RosettaFragments
•  Levinthal paradox:
  - Given either alpha, beta, or loop conformation for each residue, a protein of nres residues has 3^nres possible conformations.
  - If nres = 100, sampling one conformation every 10^-13 seconds would take ~10^27 years to fold
  - The universe is ~10^10 years old.
  - Folding is non-random and cooperative.
•  Many different combinations of secondary structure elements have similar stabilities
  - Tertiary (side-chain level) interactions drive folding towards the native topology
  - Phase transition results in a substantial energy gap between native and non-native structures
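The Levinthal estimate is a short piece of arithmetic. The sketch below uses the numbers from the slide (3 states per residue, 100 residues, one conformation per 10^-13 seconds); the function name and the 365.25-day year are choices made here for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def levinthal_years(n_res, states_per_residue=3, seconds_per_sample=1e-13):
    """Years needed to enumerate every conformation by exhaustive sampling."""
    conformations = states_per_residue ** n_res
    return conformations * seconds_per_sample / SECONDS_PER_YEAR

years = levinthal_years(100)
print(f"{years:.1e} years")  # on the order of 10^27 years
```

Against a universe age of roughly 10^10 years, exhaustive search is off by about 17 orders of magnitude, which is why folding must be non-random.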
Folding Theory: The Folding Landscape
•  Cyrus Levinthal, J. Chim. Phys. 65, 44; 1968
•  Hue Sun Chan and Ken A. Dill, Protein Folding in the Landscape Perspective: Chevron Plots and Non-Arrhenius Kinetics, Proteins: Structure, Function, and Genetics, Volume 30, No. 1, January 1998, pp 2-33.
Implications and requirements for folding algorithm:
•  Fast conformational sampling algorithm
•  Accurate scoring function
•  Full-atom modeling
early centroid models centroid models final full-atom models
Assembly Coarse funnel to native-like decoys Fine-grained funnel to near-native decoys
Major Classes of Energy Functions in Rosetta
Low resolution: reduced atom representation (centroid)
  - simplified energy function
  - used for aggressive search of state space
High resolution: full-atom representation
  - detailed energy function
  - local search of state space
  - refinement and minimization
General
  - weighted sum of linear terms: Energy = w1*term1 + w2*term2 + …
  - pairwise decomposable (speed)
  - weighted for task, e.g. ligand docking
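The "weighted sum of linear terms" form, with weights chosen per task, can be sketched as follows. The term names, values, and weight sets are hypothetical and are not taken from any Rosetta weights file; the point is only that the same terms are re-weighted for different tasks.

```python
def total_energy(terms, weights):
    """Weighted sum: Energy = w1*term1 + w2*term2 + ...

    Terms without an entry in `weights` get weight 0 (an assumption of this sketch).
    """
    return sum(weights.get(name, 0.0) * value for name, value in terms.items())

# Hypothetical per-term values for one conformation.
terms = {"vdw": 2.0, "hbond": -3.0, "rgyr": 1.5}

# Hypothetical task-specific weight sets.
folding_weights = {"vdw": 1.0, "hbond": 1.0, "rgyr": 0.5}
docking_weights = {"vdw": 1.0, "hbond": 2.0}

energy_folding = total_energy(terms, folding_weights)
energy_docking = total_energy(terms, docking_weights)
print(energy_folding, energy_docking)
```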
Low resolution (centroid) folding
  - Fragment insertion
  - conformation modification occurs in torsion space
  - initial insertions result in large changes in dihedrals
  - 9-mers inserted first, followed by 3-mers later in the process
  - later insertions purposefully result in small changes in dihedrals
[Figure label: random insertion]
•  Sss + SHS: sheet and helix-sheet geometries
•  Scβ: density/compactness of structure
•  Svdw: no clashes
•  SRgyr: radius of gyration (Rgyr), globular structure
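Fragment insertion in torsion space can be sketched as a Metropolis Monte Carlo loop: pick a window at random, copy a library fragment's (phi, psi) values into it, and keep the change if the score improves or, with Boltzmann probability, if it gets worse. Everything below is a toy under stated assumptions: the score is a made-up function that favors helical torsions, the library holds two hand-written 3-residue fragments per window, and the Metropolis acceptance rule is a generic choice rather than a description of Rosetta's actual protocol.

```python
import math
import random

def insert_fragment(torsions, start, fragment):
    """Copy a fragment's (phi, psi) pairs into a copy of the torsion list."""
    if start + len(fragment) > len(torsions):
        raise ValueError("fragment does not fit at this position")
    trial = list(torsions)
    trial[start:start + len(fragment)] = fragment
    return trial

def fragment_monte_carlo(torsions, library, score, n_steps, kT=1.0, rng=None):
    """Metropolis search over random fragment insertions.

    `library` maps a start position to the candidate fragments for that window.
    """
    rng = rng or random.Random()
    current = list(torsions)
    current_score = score(current)
    best, best_score = current, current_score
    positions = sorted(library)
    for _ in range(n_steps):
        start = rng.choice(positions)
        trial = insert_fragment(current, start, rng.choice(library[start]))
        trial_score = score(trial)
        delta = trial_score - current_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score

def toy_score(torsions):
    """Hypothetical score favoring helical (phi, psi) = (-60, -45); not a Rosetta term."""
    return sum((phi + 60.0) ** 2 + (psi + 45.0) ** 2 for phi, psi in torsions) / 1000.0

helix = [(-60.0, -45.0)] * 3
strand = [(-120.0, 130.0)] * 3
library = {0: [helix, strand], 3: [helix, strand], 6: [helix, strand]}
extended = [(180.0, 180.0)] * 9
best, best_score = fragment_monte_carlo(extended, library, toy_score, 200,
                                        rng=random.Random(0))
print(best_score)
```

The same loop works for 9-residue fragments; only the library contents change, which mirrors the 9-mers-first, 3-mers-later schedule.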
Slide content adapted from Ora Schueler-Furman’s “Workshop in Structural Computational Biology”
Driving assembly towards native-like decoys
Low-resolution homolog folding improves prediction
•  Collect homologs
•  Create low-resolution models
  - cluster
•  Thread query sequence onto models
•  Proceed to full-atom refinement
Slide content adapted from Ora Schueler-Furman’s “Workshop in Structural Computational Biology”
Low resolution (centroid) folding example
Clustering:
Graphical representation
High resolution (full-atom) refinement
Chen Y et al. Nucl. Acids Res. 2004;32:5147-5162
evaluating/optimizing specific atom-atom interactions
e.g. hydrogen bonding:
Comparison of low resolution, relax, and abrelax folding example
Examples from the Rosetta@home archive of top predictions
Note: massively parallel computation
rosetta prediction
crystal structure
Detailed ab initio Rosetta Workflow
INPUT
•  amino acid sequence
•  secondary structure prediction(s)
•  fragment library
•  constraints from experimental data
•  NMR
•  biochemical/biophysical studies
•  ...
LOW RESOLUTION FOLDING
•  fragment insertions
•  scoring
•  filters
CLUSTERING
•  groups of decoys with low RMSD to each other
•  lowest energy decoy of clusters selected for further refinement or prediction
HIGH RESOLUTION REFINEMENT
•  backbone minimization
•  rotamer optimization
ADDITIONAL MODELING
•  identifying variable regions
•  rebuilding
>10^3-10^6 trajectories
automated
manual
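The clustering step (group decoys with low RMSD to each other, then take the lowest energy decoy of each cluster) can be sketched with a simple greedy scheme. The scheme below, which repeatedly takes the decoy with the most unassigned neighbors within a radius, is one common approach chosen here for illustration and is not claimed to be the algorithm Rosetta uses. The RMSD matrix and energies are made-up values.

```python
def cluster_decoys(rmsd, radius):
    """Greedy clustering on a pairwise RMSD matrix (list of lists).

    Repeatedly pick the unassigned decoy with the most unassigned neighbors
    within `radius`, and make it and its neighbors one cluster.
    """
    unassigned = set(range(len(rmsd)))
    clusters = []
    while unassigned:
        def neighbors(i):
            return [j for j in unassigned if rmsd[i][j] <= radius]
        center = max(sorted(unassigned), key=lambda i: len(neighbors(i)))
        members = sorted(neighbors(center))
        clusters.append(members)
        unassigned -= set(members)
    return clusters

def lowest_energy_per_cluster(clusters, energies):
    """Index of the lowest energy decoy in each cluster."""
    return [min(members, key=lambda i: energies[i]) for members in clusters]

# Hypothetical pairwise RMSDs (angstroms) for five decoys: {0, 1, 2} and {3, 4}.
rmsd = [
    [0.0, 1.0, 1.0, 8.0, 8.0],
    [1.0, 0.0, 1.0, 8.0, 8.0],
    [1.0, 1.0, 0.0, 8.0, 8.0],
    [8.0, 8.0, 8.0, 0.0, 1.5],
    [8.0, 8.0, 8.0, 1.5, 0.0],
]
energies = [-10.0, -12.5, -11.0, -9.0, -9.5]  # hypothetical scores

clusters = cluster_decoys(rmsd, radius=2.0)
selected = lowest_energy_per_cluster(clusters, energies)
print(clusters, selected)
```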
Computational Considerations
Protocol | Utility | Caveats
Centroid | fast; widely sample conformational space | possibility of no near-native models after low resolution folding; no discrimination by energy
Full-atom refinement | near-native decoys separated by energy | more computationally demanding; must have near-native in starting decoy pool
Combined | streamlined; for powerful and massively parallel computing | most computationally demanding; improvement only with sufficient sampling
A ~1000-fold increase in computational power
“brute force” approach
Architect of Rosetta@home: David Kim
[Figure labels: Native (CheY); Lowest energy Rosetta structure]
Slide content courtesy Rhiju Das, Baker Lab
Computational power vs. accuracy in ab initio structure prediction
Cα RMSD of lowest energy model to the native structure vs. sample size
[Plot axes: Sample Size; RMSD to native]
Category 1: Successful high-resolution predictions
Category 2: Successful high-resolution predictions with additional sampling
Category 3: Unsuccessful predictions (with any amount of sampling)
Kim DE, Blum B, Bradley P, Baker D. Sampling bottlenecks in de novo protein structure prediction. J Mol Biol. 2009 Oct 16;393(1):249-60.
“De novo” phasing: large-scale tests
Tests on 30 data sets (covering 16 proteins)
Slide content courtesy Rhiju Das, Baker Lab; Qian et al., Nature 2007.
TF Z-score | Have I solved it?
< 5 | no
5 - 6 | unlikely
6 - 7 | possibly
7 - 8 | probably
> 8 | definitely
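The TF Z-score table translates directly into a lookup. This is a minimal sketch; the function name is hypothetical, and how the exact boundary values (5, 6, 7, 8) are assigned is an assumption, since the table does not say which bin they belong to.

```python
def interpret_tfz(tfz):
    """Map a translation-function Z-score to the table's verdict.

    Boundary handling (e.g. exactly 6.0 counts as "possibly") is an assumption.
    """
    if tfz < 5:
        return "no"
    if tfz < 6:
        return "unlikely"
    if tfz < 7:
        return "possibly"
    if tfz <= 8:
        return "probably"
    return "definitely"

for value in (4.2, 5.5, 6.5, 7.5, 9.1):
    print(value, interpret_tfz(value))
```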
[Figure: 1hz5-sf.cif, axis in Å. Legend: Rosetta-refined native (positive controls); Rosetta-refined de novo models; Rosetta-refined de novo models, fragments with correct native 2° structure]
Success in 14/30 data sets
Preparation for folding simulations
•  proper secondary structure assignment
•  constraints
•  limit search space
•  increase sampling efficiency
•  decrease CPU time
Constraints
•  There are constraint types and function types
  - Constraint types: AtomPair, Angle, Dihedral, etc.
  - Function types: Bounded, Spline, Harmonic, Gaussian, etc.
•  Each constraint is scored individually and the total constraint score is the sum of all individual scores
•  Each constraint can have its own constraint type and function type.
  - In some cases, such as when using the Spline function, each constraint can have its own weight
•  How you define a constraint and how it is scored depend on the constraint type; the same is true of the function type.
Slide content adapted from Stephanie Hirst at the 2011 Vanderbilt Rosetta Workshop
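The idea that each constraint pairs a constraint type with a function type, and that the total is the sum of individually scored constraints, can be sketched as below. This is a simplified illustration, not Rosetta's implementation: only atom-pair distance constraints are shown, the harmonic form ((x - x0) / sd)^2 is the textbook one, and all names, coordinates, and parameters are hypothetical.

```python
import math

def harmonic(x, x0, sd):
    """Textbook harmonic penalty: zero at x0, growing quadratically away from it."""
    return ((x - x0) / sd) ** 2

def total_constraint_score(coords, constraints):
    """Sum of individually scored atom-pair constraints.

    coords: dict mapping an atom label to (x, y, z).
    constraints: list of (atom_i, atom_j, func, weight), where func maps a
    distance to a score; each constraint carries its own function and weight.
    """
    total = 0.0
    for atom_i, atom_j, func, weight in constraints:
        distance = math.dist(coords[atom_i], coords[atom_j])
        total += weight * func(distance)
    return total

# Hypothetical coordinates and constraints.
coords = {"A": (0.0, 0.0, 0.0), "B": (3.0, 4.0, 0.0), "C": (0.0, 0.0, 10.0)}
constraints = [
    ("A", "B", lambda d: harmonic(d, 5.0, 1.0), 1.0),  # satisfied: distance is 5
    ("A", "C", lambda d: harmonic(d, 8.0, 2.0), 0.5),  # violated by 2 angstroms
]
score = total_constraint_score(coords, constraints)
print(score)
```

Passing the scoring function as a value is what lets each constraint choose its own function type independently of its constraint type.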
Constraint file example: EPR data
<cst type> <atom1> <res1> <atom2> <res2> <cst_func> <RosettaEPR> <Dcb> <weight> <bin>
AtomPair CB 32 CB 36 SPLINE EPR_DISTANCE 16.0 1.0 0.5
AtomPair CB 59 CB 74 SPLINE EPR_DISTANCE 19.0 1.0 0.5
AtomPair CB 62 CB 71 SPLINE EPR_DISTANCE 19.0 1.0 0.5
AtomPair CB 62 CB 74 SPLINE EPR_DISTANCE 25.0 1.0 0.5
AtomPair CB 63 CB 74 SPLINE EPR_DISTANCE 14.0 1.0 0.5
AtomPair CB 66 CB 74 SPLINE EPR_DISTANCE 23.0 1.0 0.5
AtomPair CB 83 CB 90 SPLINE EPR_DISTANCE 13.0 1.0 0.5
Constraint info Constraint Function info
Slide content adapted from Stephanie Hirst at the 2011 Vanderbilt Rosetta Workshop
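A constraint line in the layout above can be split into named fields for inspection. The parser below is a convenience sketch, not part of Rosetta; the field names are taken from the header row of the example, and reading `Dcb` as a distance and `bin` as a bin size is an assumption based on those labels.

```python
from typing import NamedTuple

class EprConstraint(NamedTuple):
    cst_type: str
    atom1: str
    res1: int
    atom2: str
    res2: int
    cst_func: str
    epr_tag: str
    dcb: float       # assumed: the <Dcb> distance value
    weight: float
    bin_size: float  # assumed: the <bin> value

def parse_constraint_line(line):
    """Parse one whitespace-separated constraint line with ten fields."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}: {line!r}")
    return EprConstraint(
        fields[0], fields[1], int(fields[2]), fields[3], int(fields[4]),
        fields[5], fields[6], float(fields[7]), float(fields[8]), float(fields[9]),
    )

cst = parse_constraint_line("AtomPair CB 32 CB 36 SPLINE EPR_DISTANCE 16.0 1.0 0.5")
print(cst.res1, cst.res2, cst.dcb)
```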
Membrane protein ab initio
•  RosettaMembrane divides the protein into:
  - hydrophobic
  - hydrophilic
  - soluble layers
•  Specific scoring function for each layer
Slide content adapted from Stephanie Hirst at the 2011 Vanderbilt Rosetta Workshop
Figure from Yarov-Yarovoy, Schonbrun, and Baker 2006.
Input Files
Spanfile - *.span
  --transmembrane topology prediction file generated using octopus2span.pl script
  --Input OCTOPUS topology file is generated at http://octopus.cbr.su.se using protein sequence as input.
Lipophilicity prediction file - *.lips4
  --Generate using run_lips.pl script
  --Need input FASTA file, spanfile, blastpgp and nr (NCBI) database to run
Fragment generation
  --Advised to use SAM but not JUFO or PSIPRED, which predict TMH regions poorly
Slide content adapted from Stephanie Hirst at the 2011 Vanderbilt Rosetta Workshop
Folding and studying folding with molecular dynamics
Specialized hardware, ANTON capable of continuous ms length trajectories
Standard simulations:
1 - 3 µs simulations ~ months of HPC
Approximate Rates of Folding:
1 µs helix
10 µs sheet
100 µs fast folding protein
1+ ms typical protein
D E Shaw et al. Science 2010;330:341-346
simulation of villin at 300 K
2-8 µs folder
simulation of FiP35 at 337 K
20-80 µs folder
Blue: x-ray structures
Red: last frame of MD simulation
Folding proteins at x-ray resolution
Published by AAAS
tip of hairpin 1 (12-18, blue)
hairpin 1 (8-22, green)
hairpin 2 (19-30, orange)
full protein (2-33, red)
D E Shaw et al. Science 2010;330:341-346
Reversible folding simulation of FiP35.
Thank You
For questions or comments please contact:
ScienceApps@niaid.nih.gov
301.496.4455
 
EGYPTIAN IMPRINT IN SPAIN Lecture by Dr Abeer Zahana
EGYPTIAN IMPRINT IN SPAIN Lecture by Dr Abeer ZahanaEGYPTIAN IMPRINT IN SPAIN Lecture by Dr Abeer Zahana
EGYPTIAN IMPRINT IN SPAIN Lecture by Dr Abeer Zahana
 
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
 
Ultrastructure and functions of Chloroplast.pptx
Ultrastructure and functions of Chloroplast.pptxUltrastructure and functions of Chloroplast.pptx
Ultrastructure and functions of Chloroplast.pptx
 
Unveiling the Cannabis Plant’s Potential
Unveiling the Cannabis Plant’s PotentialUnveiling the Cannabis Plant’s Potential
Unveiling the Cannabis Plant’s Potential
 
Timeless Cosmology: Towards a Geometric Origin of Cosmological Correlations
Timeless Cosmology: Towards a Geometric Origin of Cosmological CorrelationsTimeless Cosmology: Towards a Geometric Origin of Cosmological Correlations
Timeless Cosmology: Towards a Geometric Origin of Cosmological Correlations
 
Total Legal: A “Joint” Journey into the Chemistry of Cannabinoids
Total Legal: A “Joint” Journey into the Chemistry of CannabinoidsTotal Legal: A “Joint” Journey into the Chemistry of Cannabinoids
Total Legal: A “Joint” Journey into the Chemistry of Cannabinoids
 
final waves properties grade 7 - third quarter
final waves properties grade 7 - third quarterfinal waves properties grade 7 - third quarter
final waves properties grade 7 - third quarter
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
 
Observational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsObservational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive stars
 
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
 
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
 
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep LearningCombining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
 
Pests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPRPests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPR
 
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
 
Probability.pptx, Types of Probability, UG
Probability.pptx, Types of Probability, UGProbability.pptx, Types of Probability, UG
Probability.pptx, Types of Probability, UG
 
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
 
Measures of Central Tendency.pptx for UG
Measures of Central Tendency.pptx for UGMeasures of Central Tendency.pptx for UG
Measures of Central Tendency.pptx for UG
 

Protein structure prediction with a focus on Rosetta

  • 1. PROTEIN STRUCTURE PREDICTION WITH A FOCUS ON ROSETTA. This presentation was prepared by: Xavier Ambroggio, ambroggiox@niaid.nih.gov. OFFICE OF CYBER INFRASTRUCTURE AND COMPUTATIONAL BIOLOGY, NATIONAL INSTITUTE OF ALLERGY AND INFECTIOUS DISEASES
  • 2. Fall 2011 Computational Structural Biology Seminar Series. 9 – 11 AM, T/Th in 12A/B51. http://training.cit.nih.gov
      Week | Day | Date | Course | Instructor | CIT Course #
      Week 1 | Tues | Aug. 23 | Fundamentals, Data Sources, and Visualization of Macromolecular Structure | Darrell Hurt | SS260-11001
      Week 1 | Thurs | Aug. 25 | Generating Protein Structures from Homology | Darrell Hurt | SS270-11001
      Week 2 | Tues | Aug. 30 | Predicting Protein Structures from Amino Acid Sequences | Xavier Ambroggio | SS660-11001
      Week 2 | Thurs | Sept. 1 | Predicting Macromolecular Complexes from Uncomplexed Structures | Xavier Ambroggio | SS670-11001
      Week 3 | Tues | Sept. 6 | Design and Analysis of Macromolecular Interfaces | Xavier Ambroggio | SS770-11001
      Week 3 | Thurs | Sept. 8 | Analysis and Advanced Visualization of Macromolecular Structure | Darrell Hurt | SS330-11001
      Week 4 | Tues | Sept. 13 | Computational Drug Design | Mike Dolan | SS340-11001
      Week 4 | Thurs | Sept. 15 | Introduction to Molecular Dynamics | Mike Dolan | TBA
      Week 5 | Thurs | Sept. 22 | Advanced Molecular Dynamics | Mike Dolan | TBA
  • 3. Bioinformatics and Computational Biosciences Branch: Scientific Collaboration; Scientific Training; Custom Scientific Software & Infrastructure •  Structural Biology •  Phylogenetics •  Statistics •  Sequence Analysis •  Microarray Analysis •  NGS Analysis •  Bioinformatics •  Biological Networks •  Function Prediction •  …
  • 4. Ab Initio Structure Prediction: Given an amino acid sequence, find the tertiary structure (the “protein folding problem”)
  • 5. CASP: Critical Assessment of protein Structure Prediction http://predictioncenter.org •  Double-blind experiment (…competition) •  World-wide scientific community •  Unbiased assessment of techniques in structure prediction •  Biennial (every even year) •  “Pulse” of the prediction community •  What can be predicted? •  Which servers/algorithms perform best?
  • 7. CASP Top Free-Modeling Servers. Why Rosetta focus? •  Standalone •  Versatile (RNA, design, dock, …) •  Open Source •  Substantial Literature •  Shared methodology. Use any and all available servers!!!
  • 8. Rosetta: multipurpose macromolecular modeling suite for prediction and design (Das & Baker, Annu. Rev. Biochem. 2008). CIT Course # SS660-11001, CIT Course # SS670-11001, CIT Course # SS770-11001
  • 9. Brief Description of Select Rosetta Functions
      ab initio: predict the structure from sequence
      relax: refine the structure using Rosetta energy functions
      idealize: replace bond geometries with ideal values
      loop modeling: build and refine local structurally variable regions in context of a structural template
      design: optimize sequence given a structure with a fixed backbone
      docking: structure prediction for a protein-protein complex given subunits
      ligand: ligand docking
      ddG prediction: protein-protein interface and protein stability ddG stability calculations for mutations
      scoring: score input conformations with Rosetta energy functions
      RNA: predict RNA structures from sequences and design sequences from fixed structures
      clustering: grouping input structures by RMSD to each other for structure prediction analysis
      backrub: generate alternate backbone conformations based on sets of rotations
      membrane ab initio: predict the structures of helical membrane proteins
      enzyme design: redesign a protein around a ligand
      domain assembly: fixed domains connected by variable regions
      antibody: automated antibody homology modeling
      XML parsing: parse XML scripts into protocols
  • 10. What types of protein domains can Rosetta fold? Small, globular, soluble protein domains… Small, simple membrane protein domains… …but not complex domains or multi-domain proteins. Figure panels A, B, C: T4-lysozyme C-terminal domain, V-type Na+ ATP synthase subunit, rhodopsin. Slide content adapted from Stephanie Hirst at the 2011 Vanderbilt Rosetta Workshop
  • 11. What are the success rates? High resolution predictions are achievable •  targets ≤100 residues •  success rate ~30% •  success rate with accurate secondary structure ~50% •  a hallmark of accuracy: convergence. Slide content courtesy Rhiju Das, Baker Lab
  • 12. What types of protein domains can no one fold? CASP9: domains with no good FM predictions •  Non-globular •  Trimeric •  Fe stabilized •  High contact order (many residues close in 3D, far in 1D) •  + elongated sheet? Examples: T0591d1, 3MWT; T0550d2, 3NQK; T0629d2, 2XGF. Slide content adapted from talk given by Lisa Kinch of the Grishin lab at CASP9 meeting: http://predictioncenter.org/casp9/
  • 13. Basic Ab Initio Rosetta protocol: 1. Select fragments consistent with local sequence preferences 2. Assemble fragments into models with native-like global properties 3. Identify the best model from the population of decoys. Slide content adapted from Ora Schueler-Furman’s “Workshop in Structural Computational Biology”. Figures adapted from Charlie Strauss; Protein structure prediction using ROSETTA, Rohl et al (2004) Methods in Enzymology, 383:66
  • 14. Fragment-Based Structure Prediction (Rosetta, Quark, …): Fragments → Assembly → Decoys. Homology modeling: Template(s) → Alignment → Model
  • 15. First atomic-resolution model Target 0281 CASP6 •  Topology sampled by ab initio trajectory of homolog sequence (rmsd=2.2Å) •  Full atom refinement reduces rmsd to 1.5Å •  Side chain packing accurately recovered Slide content adapted from Ora Schueler-Furman’s “Workshop in Structural Computational Biology” Figures adapted from Bradley P, Malmström L, Qian B, Schonbrun J, Chivian D, Kim DE, Meiler J, Misura KM, Baker D. Free modeling with Rosetta in CASP6. Proteins.
  • 16. Folding Theory: Sequence-Structure Relationships •  Secondary structure formation is the earliest part of the folding process •  Local sequence codes for local structures… i.e. fragments (helical sequences in a folded protein tend to be helical in isolation) •  Secondary structure prediction algorithms have ~70-80% accuracy (partial failure due to tertiary interactions stabilizing secondary structure elements)
  • 17. Rosetta fragments •  3 and 9 residue fragments matched to query sequence •  database created from crystal structures (< 2.5Å resolution, < 50% sequence identity) •  low resolution modeling (centroid representation of side chains) •  ranked by: alignment; secondary structure predictions (PSI-PRED, SAM-T02, Jufo, PHD)
  • 18. Sliding fragment windows. query: KVFGRCELAAAMKRHGLDNYRGYSLGNWVC... 3-residue windows: KVF, VFG, FGR, GRC. 9-residue windows: KVFGRCELA, VFGRCELAA, FGRCELAAA, GRCELAAAM. sec str: EEEE TT S EEEEEEE TT HH... Slide content courtesy David Hoover, CIT, NIH
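The sliding windows on slide 18 can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `sliding_windows` is an assumption, and real fragment picking also matches each window against a structure database, which is not shown here.

```python
def sliding_windows(sequence, size):
    """Return every contiguous window of `size` residues, in sequence order."""
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]


# Visible part of the query sequence from the slide (the slide truncates it).
query = "KVFGRCELAAAMKRHGLDNYRGYSLGNWVC"

three_mers = sliding_windows(query, 3)
nine_mers = sliding_windows(query, 9)

print(three_mers[:4])
print(nine_mers[:4])
```

Each window position would then get its own ranked list of candidate fragments, which is why separate 3-mer and 9-mer libraries are generated.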
  • 19. Example 3-mer fragment library. Separate 3-mer and 9-mer libraries generated. Query: G K L M Q E R A
      # | Rank | Fragment
      13 | 1000 | G K L
      25 | 821 | G R L
      46 | 1000 | K L M
      21 | 635 | R L M
      43 | 923 | K V M
      26 | 523 | R V M
      15 | 970 | M Q E
      26 | 934 | E R A
      Slide content courtesy David Hoover, CIT, NIH
  • 20. Making Fragment Libraries with Robetta http://robetta.bakerlab.org/ Slide content adapted from Stephanie Hirst at the 2011 Vanderbilt Rosetta Workshop
  • 21. Making Fragment Libraries on Biowulf Slide content by David Hoover from: http://biowulf.nih.gov/apps/Rosetta23.html#RosettaFragments
  • 22. Folding Theory: The Folding Landscape •  Levinthal paradox: Given either alpha, beta, or loop conformation for each residue, a protein of nres residues has 3^nres possible conformations. If nres = 100, sampling a conformation every 10^-13 seconds = 10^27 years to fold. Universe is 10^10 years old. Folding is non-random and cooperative. •  Many different combinations of secondary structure elements have similar stabilities. Tertiary (side-chain level) interactions drive folding towards the native topology. Phase transition results in a substantial energy gap between native and non-native structures. Implications and requirements for folding algorithm: •  Fast conformational sampling algorithm •  Accurate scoring function •  Full-atom modeling. References: •  Cyrus Levinthal, J. Chim. Phys. 65, 44; 1968 •  Hue Sun Chan and Ken A. Dill, Protein Folding in the Landscape Perspective: Chevron Plots and Non-Arrhenius Kinetics, Proteins: Structure, Function, and Genetics, Volume 30, No. 1, January 1998, pp 2-33.
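The Levinthal estimate on slide 22 is simple arithmetic, and the short sketch below checks its order of magnitude. The function name and the year length of 365.25 days are assumptions made for illustration; the three states per residue and the sampling time come from the slide.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length


def levinthal_years(n_residues, states_per_residue=3, seconds_per_sample=1e-13):
    """Years needed to enumerate every conformation at a fixed sampling rate."""
    conformations = states_per_residue ** n_residues
    return conformations * seconds_per_sample / SECONDS_PER_YEAR


years = levinthal_years(100)
print(f"{years:.1e} years")  # on the order of 10^27 years
```

The result is about 17 orders of magnitude longer than the 10^10-year age of the universe quoted on the slide, which is the point of the paradox: folding cannot be a random search.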
  • 23. Assembly: early centroid models, centroid models, final full-atom models. Coarse funnel to native-like decoys; fine-grained funnel to near-native decoys
  • 24. Major Classes of Energy Functions in Rosetta. Low resolution: reduced atom representation (centroid); simplified energy function; used for aggressive search of state space. High resolution: full-atom representation; detailed energy function; local search of state space; refinement and minimization. General: weighted sum of linear terms: Energy = w1*term1 + w2*term2 + …; pairwise decomposable (speed); weighted for task, e.g. ligand docking
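The weighted-sum form Energy = w1*term1 + w2*term2 + … from slide 24 can be sketched as follows. The term names, values, and weight sets here are hypothetical and chosen only for illustration; they are not Rosetta's actual score terms or weights.

```python
def total_energy(terms, weights):
    """Energy = w1*term1 + w2*term2 + ... over the terms named in `weights`."""
    return sum(weights[name] * terms[name] for name in weights)


# Hypothetical per-term scores for one conformation.
terms = {"vdw": 2.0, "hbond": -4.0, "solvation": 1.5}

# Hypothetical weight sets: the same terms, re-weighted for a different task.
default_weights = {"vdw": 1.0, "hbond": 1.0, "solvation": 0.5}
docking_weights = {"vdw": 0.8, "hbond": 2.0, "solvation": 0.5}

print(total_energy(terms, default_weights))
print(total_energy(terms, docking_weights))
```

Keeping the weights separate from the term values is what allows one set of terms to be re-weighted per task, as the slide notes for ligand docking.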
  • 25. Low resolution (centroid) folding: fragment insertion •  conformation modification occurs in torsion space •  initial insertions result in large changes in dihedrals •  9-mers inserted first, followed by 3-mers later in process •  later insertions purposefully result in small changes in dihedrals. Figure label: random insertion
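A toy version of fragment insertion in torsion space, following the 9-mers-then-3-mers order on slide 25, might look like the sketch below. Everything here is an assumption for illustration: the library layout (fragment size, then start position, then candidate fragments), the starting torsions, and the move counts are hypothetical, and no scoring or acceptance test is applied, unlike a real folding run.

```python
import random

EXTENDED = (-150.0, 150.0)  # assumed starting (phi, psi) for every residue
HELIX = (-60.0, -45.0)      # assumed torsions used by the toy fragments


def insert_fragment(torsions, fragment, start):
    """Return a copy of `torsions` with `fragment` overwriting a window at `start`."""
    updated = list(torsions)
    updated[start:start + len(fragment)] = fragment
    return updated


def fold(n_residues, library, schedule, rng):
    """Apply random fragment insertions; `schedule` lists (fragment size, number of moves)."""
    torsions = [EXTENDED] * n_residues
    for size, n_moves in schedule:
        for _ in range(n_moves):
            start = rng.randrange(n_residues - size + 1)
            fragment = rng.choice(library[size][start])
            torsions = insert_fragment(torsions, fragment, start)
    return torsions


n = 12
# Hypothetical library: one all-helix candidate per window position and size.
library = {
    size: {start: [[HELIX] * size] for start in range(n - size + 1)}
    for size in (9, 3)
}
final = fold(n, library, schedule=[(9, 5), (3, 20)], rng=random.Random(0))
print(len(final))
```

Because each move replaces torsion angles rather than Cartesian coordinates, a 9-residue insertion can swing the whole downstream chain, while the later 3-residue insertions make smaller adjustments.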
  • 26. Driving assembly towards native-like decoys •  S_ss + S_HS: sheet and helix-sheet geometries •  S_cβ: density/compactness of structure •  S_vdw: no clashes •  S_Rgyr: radius of gyration (Rgyr), globular structure. Slide content adapted from Ora Schueler-Furman’s “Workshop in Structural Computational Biology”
  • 27. Low-resolution homolog folding improves prediction •  Collect homologs •  Create low-resolution models (cluster) •  Thread query sequence onto models •  Proceed to full-atom refinement. Slide content adapted from Ora Schueler-Furman’s “Workshop in Structural Computational Biology”
  • 28. Low resolution (centroid) folding example
  • 30. High resolution (full-atom) refinement: evaluating/optimizing specific atom-atom interactions, e.g. hydrogen bonding. Chen Y et al. Nucl. Acids Res. 2004;32:5147-5162
  • 31. Comparison of low resolution, relax, and abrelax folding example
  • 32. Examples from the Rosetta@home archive of top predictions. Note: massively parallel computation. Figure labels: rosetta prediction, crystal structure
  • 33. Detailed ab initio Rosetta Workflow. INPUT •  amino acid sequence •  secondary structure prediction(s) •  fragment library •  constraints from experimental data (NMR, biochemical/biophysical studies, ...). LOW RESOLUTION FOLDING •  fragment insertions •  scoring •  filters. CLUSTERING •  groups of decoys with low RMSD to each other •  lowest energy decoy of clusters selected for further refinement or prediction. HIGH RESOLUTION REFINEMENT •  backbone minimization •  rotamer optimization. ADDITIONAL MODELING •  identifying variable regions •  rebuilding. Figure annotations: >10^3-10^6 trajectories; automated; manual
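The clustering step on slide 33 (group decoys with low RMSD to each other, then take the lowest-energy decoy of each cluster) can be sketched with a simple greedy scheme. This is an assumption-laden illustration, not Rosetta's clustering application: the greedy rule, the 2.0 Å cutoff, and the precomputed pairwise RMSD matrix and energies are all hypothetical.

```python
def cluster_by_rmsd(energies, rmsd, cutoff):
    """Greedy clustering: visit decoys from lowest to highest energy; each decoy
    joins the first cluster whose lowest-energy member is within `cutoff`,
    otherwise it starts a new cluster."""
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    clusters = []  # each cluster is a list of decoy indices, lowest energy first
    for i in order:
        for members in clusters:
            if rmsd[i][members[0]] <= cutoff:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical energies and pairwise RMSDs (in Å) for four decoys.
energies = [-10.0, -12.0, -8.0, -11.0]
rmsd = [
    [0.0, 1.0, 6.0, 5.5],
    [1.0, 0.0, 6.2, 5.8],
    [6.0, 6.2, 0.0, 1.5],
    [5.5, 5.8, 1.5, 0.0],
]

clusters = cluster_by_rmsd(energies, rmsd, cutoff=2.0)
representatives = [members[0] for members in clusters]
print(clusters, representatives)
```

Because decoys are visited in energy order, the first member of each cluster is its lowest-energy decoy, which is the one the slide selects for further refinement or prediction.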
  • 34. Computational Considerations
      Protocol | Utility | Caveats
      Centroid | fast; widely sample conformational space | possibility of no near-native models after low resolution folding; no discrimination by energy
      Full-atom refinement | near-native decoys separated by energy | more computationally demanding; must have near-native in starting decoy pool
      Combined | streamlined; for powerful and massively parallel computing | most computationally demanding; improvement only with sufficient sampling
  • 35. A ~1000-fold increase in computational power. Figure label: Native (CheY). Slide content courtesy Rhiju Das, Baker Lab
  • 36. A ~1000-fold increase in computational power (“brute force” approach). Architect of Rosetta@home: David Kim. Figure labels: Native (CheY); Lowest energy Rosetta structure
  • 37. Computational power vs. accuracy in ab initio structure prediction. Plot: Cα RMSD of lowest energy model to the native structure (y-axis: RMSD to native) vs. sample size (x-axis). Category 1: Successful high-resolution predictions. Category 2: Successful high-resolution predictions with additional sampling. Category 3: Unsuccessful predictions (with any amount of sampling). Kim DE, Blum B, Bradley P, Baker D. Sampling bottlenecks in de novo protein structure prediction. J Mol Biol. 2009 Oct 16;393(1):249-60.
  • 38. “De novo” phasing: large-scale tests. Tests on 30 data sets (covering 16 proteins). TF Z-score vs. “Have I solved it?”: < 5 no; 5 - 6 unlikely; 6 - 7 possibly; 7 - 8 probably; > 8 definitely. Slide content courtesy Rhiju Das, Baker Lab; Qian et al., Nature 2007.
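The TF Z-score table on slide 38 maps directly onto a small lookup function. The slide does not say which band a score of exactly 5, 6, or 7 belongs to, so assigning those values to the higher band is an assumption; the function name is also hypothetical.

```python
def interpret_tfz(tfz):
    """Map a translation-function Z-score to the slide's "Have I solved it?" verdict."""
    if tfz < 5:
        return "no"
    if tfz < 6:
        return "unlikely"
    if tfz < 7:
        return "possibly"
    if tfz <= 8:
        return "probably"
    return "definitely"  # strictly greater than 8, as on the slide


for score in (4.2, 5.5, 6.5, 7.5, 9.1):
    print(score, interpret_tfz(score))
```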
  • 39. “De novo” phasing: large-scale tests. Tests on 30 data sets (covering 16 proteins). Figure: data set 1hz5-sf.cif (axis in Å); series: Rosetta-refined native (positive controls); Rosetta-refined de novo models. Slide content courtesy Rhiju Das, Baker Lab; Qian et al., Nature 2007.
  • 40. “De novo” phasing: large-scale tests. Tests on 30 data sets (covering 16 proteins). Success in 14/30 data sets. Figure: data set 1hz5-sf.cif (axis in Å); series: Rosetta-refined native (positive controls); Rosetta-refined de novo models. Slide content courtesy Rhiju Das, Baker Lab; Qian et al., Nature 2007.
  • 41. “De novo” phasing: large-scale tests. Tests on 30 data sets (covering 16 proteins). Figure: data set 1hz5-sf.cif (axis in Å); series: Rosetta-refined native (positive controls); Rosetta-refined de novo models; Rosetta-refined de novo models, fragments with correct native 2° structure. Slide content courtesy Rhiju Das, Baker Lab; Qian et al., Nature 2007.
  • 42. Preparation for folding simulations •  proper secondary structure assignment •  constraints •  limit search space •  increase sampling efficiency •  decrease CPU time
  • 43. Constraints •  There are constraint types and function types. Constraint types: AtomPair, Angle, Dihedral, etc. Function types: Bounded, Spline, Harmonic, Gaussian, etc. •  Each constraint is scored individually and the total constraint score is the sum of all individual scores •  Each constraint can have its own constraint type and function type. In some cases, such as when using the Spline function, each constraint can have its own weight •  How you define a constraint and how it is scored depend on the constraint type; the same is true of the function type. Slide content adapted from Stephanie Hirst at the 2011 Vanderbilt Rosetta Workshop
  • 44. Constraint file example: EPR data. Columns 1-5 give the constraint info; columns 6-10 give the constraint function info.
      <cst type> <atom1> <res1> <atom2> <res2> <cst_func> <RosettaEPR> <Dcb> <weight> <bin>
      AtomPair CB 32 CB 36 SPLINE EPR_DISTANCE 16.0 1.0 0.5
      AtomPair CB 59 CB 74 SPLINE EPR_DISTANCE 19.0 1.0 0.5
      AtomPair CB 62 CB 71 SPLINE EPR_DISTANCE 19.0 1.0 0.5
      AtomPair CB 62 CB 74 SPLINE EPR_DISTANCE 25.0 1.0 0.5
      AtomPair CB 63 CB 74 SPLINE EPR_DISTANCE 14.0 1.0 0.5
      AtomPair CB 66 CB 74 SPLINE EPR_DISTANCE 23.0 1.0 0.5
      AtomPair CB 83 CB 90 SPLINE EPR_DISTANCE 13.0 1.0 0.5
      Slide content adapted from Stephanie Hirst at the 2011 Vanderbilt Rosetta Workshop
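To make the ten-column layout of the EPR constraint lines on slide 44 concrete, here is a minimal parsing sketch. It is not part of Rosetta: the class name, field names, and function name are hypothetical, and the parser assumes whitespace-separated lines shaped exactly like the slide's examples.

```python
from typing import NamedTuple


class EprConstraint(NamedTuple):
    # Constraint info (columns 1-5)
    cst_type: str
    atom1: str
    res1: int
    atom2: str
    res2: int
    # Constraint function info (columns 6-10)
    cst_func: str
    epr_tag: str
    distance: float  # the <Dcb> column
    weight: float
    bin_size: float


def parse_constraint_line(line):
    """Split one whitespace-separated constraint line into typed fields."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}: {line!r}")
    return EprConstraint(
        fields[0], fields[1], int(fields[2]), fields[3], int(fields[4]),
        fields[5], fields[6], float(fields[7]), float(fields[8]), float(fields[9]),
    )


cst = parse_constraint_line("AtomPair CB 32 CB 36 SPLINE EPR_DISTANCE 16.0 1.0 0.5")
print(cst.res1, cst.res2, cst.distance)
```

Parsing into named fields makes it easy to sanity-check a constraint file (for example, that residue numbers fall inside the sequence) before starting a folding run.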
  • 45. Membrane protein ab initio •  RosettaMembrane divides the protein into hydrophobic, hydrophilic, and soluble layers •  Specific scoring function for each layer. Slide content adapted from Stephanie Hirst at the 2011 Vanderbilt Rosetta Workshop. Figure from Yarov-Yarovoy, Schonbrun, and Baker 2006.
  • 46. Input Files. Spanfile - *.span: transmembrane topology prediction file generated using octopus2span.pl script; input OCTOPUS topology file is generated at http://octopus.cbr.su.se using protein sequence as input. Lipophilicity prediction file - *.lips4: generate using run_lips.pl script; need input FASTA file, spanfile, blastpgp and nr (NCBI) database to run. Fragment generation: advised to use SAM but not JUFO or PSIPRED, which predict TMH regions poorly. Slide content adapted from Stephanie Hirst at the 2011 Vanderbilt Rosetta Workshop
  • 47. Folding and studying folding with molecular dynamics. Specialized hardware: ANTON, capable of continuous ms length trajectories. Standard simulations: 1 - 3 µs simulations ~ months of HPC. Approximate Rates of Folding: 1 µs helix; 10 µs sheet; 100 µs fast folding protein; 1+ ms typical protein
  • 48. Folding proteins at x-ray resolution. Simulation of villin at 300 K (2-8 µs folder); simulation of FiP35 at 337 K (20-80 µs folder). Blue: x-ray structures. Red: last frame of MD simulation. D E Shaw et al. Science 2010;330:341-346
  • 49. Reversible folding simulation of FiP35. Legend: tip of hairpin 1 (12-18, blue); hairpin 1 (8-22, green); hairpin 2 (19-30, orange); full protein (2-33, red). D E Shaw et al. Science 2010;330:341-346
  • 50. Thank You. For questions or comments please contact: ScienceApps@niaid.nih.gov, 301.496.4455