© 2009 Illumina, Inc. All rights reserved.
Illumina, illuminaDx, Solexa, Making Sense Out of Life, Oligator, Sentrix, GoldenGate, GoldenGate Indexing, DASL, BeadArray, Array of Arrays, Infinium, BeadXpress, VeraCode, IntelliHyb,
iSelect, CSPro, and GenomeStudio are registered trademarks or trademarks of Illumina, Inc. All other brands and names contained herein are the property of their respective owners.
COMPANY CONFIDENTIAL – INTERNAL USE ONLY
Genome Informatics Alliance 2013
Defining Genomic Big Data and its Impact on Scientific Progress
From Whence We Came…
ATGCCGTTT…
CCGGTTAAT…
GAATTGCAG…
6:A2567C
12:C123T
20:T4678A
30-40TB
~5TB
600GB
~20GB
Genomic Big Data
Large amounts of data generated in genomics: multiple samples, size of data, etc.
Integration of digital data to enrich the context of samples: DNA, RNA, methylation, time courses, spatial distributions with samples, …
Fusion of digital data and categorical data: combination rules (categories), extraction from unstructured inputs, …
Tools and techniques appropriate for the resultant data sets: visualization, model building, exploration, …
Advances require data mining rather than the one-at-a-time hypothesis testing approaches of today.
Genomic Big Data and Personal Genome Information
PERSONAL SEQUENCE
(owned by individual/doctor)
Issued: 01 MAR 07 Recommended next check: 28 FEB 10
PGI id: 5910322 – 61215923014
RISK VARIANTS
(approved for clinical use)
Human Genome, Clinical studies, Populations, Sequencing, Functional annotation
[Figure: PPARg locus, axis from 3: 12,300 to 3: 12,400 (kb)]
GENOMIC ANNOTATION
(in public domain)
Variant: C3 : 12,450,610 : T0.7/C0.3 : PPARG : Pro/Leu
Medical consequence: Associated with severe insulin resistance, diabetes mellitus, hypertension
Pharmacological consequence: Resistant to thiazolidinediones
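The variant line above packs several fields into one colon-separated record. A minimal sketch of parsing it follows, assuming that `C3` means chromosome 3, that `T0.7/C0.3` lists allele frequencies, and that `Pro/Leu` is the amino-acid change. The `RiskVariant` class and `parse_variant` function are hypothetical names used for illustration, not part of any Illumina format.

```python
from dataclasses import dataclass


@dataclass
class RiskVariant:
    chromosome: str
    position: int
    allele_freqs: dict
    gene: str
    amino_acids: tuple


def parse_variant(record):
    """Parse 'C3 : 12,450,610 : T0.7/C0.3 : PPARG : Pro/Leu' style text."""
    # Split on colons and drop empty pieces (the slide text ends with a colon).
    fields = [f.strip() for f in record.split(":") if f.strip()]
    chrom, pos, freqs, gene, aa = fields
    return RiskVariant(
        chromosome=chrom.lstrip("Cc"),  # assumed: leading "C" marks a chromosome
        position=int(pos.replace(",", "")),
        # Each allele entry is one base letter followed by its frequency.
        allele_freqs={a[0]: float(a[1:]) for a in freqs.split("/")},
        gene=gene,
        amino_acids=tuple(aa.split("/")),
    )


variant = parse_variant("C3 : 12,450,610 : T0.7/C0.3 : PPARG : Pro/Leu :")
print(variant)
```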
CLINICAL DECISION
Consultation
Consent
Clinical assessment
Selected risk information
Platinum Genomes as a Truth Reference
Creating a catalogue of highly-accurate SNPs, indels & SVs
Sequencing a 17-member three-generation pedigree.
– Ultra deep sequencing improves sensitivity
– Leveraging inheritance information improves accuracy
– Data and results made publicly available
Identifying ultra accurate genomic variants is enabling rapid improvements in technology and software
This data will allow us to assess accuracy for many FDA submissions
We are collaborating with NIST & CDC to develop a public resource for quantifying sequencing accuracy
Reduction from 40 Q-scores to 8 Q-scores is becoming accepted
Sequencing output is still increasing exponentially, so further compression is likely to be required
Platinum genome work suggests ~95% of the genome is consistently called (this 95% is known as the platinum regions)
Regions which are reliably called may not need 8 Q-score resolution
– we can reduce “well sequenced” regions to 2 Q-scores
Start with 8 Q-score BAM file:
– Reduce the platinum regions to 2 Q-scores (keep non-platinum at 8 Q-scores)
– Reduce the platinum regions to 1 Q-score
– Whole genome 2 Q-score
– Reduce platinum region to 2 Q-scores but also keep original Q-scores of mismatches (MM) and anomalous reads
– ~40Gb (20Gb CRAM)
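The region-dependent reduction listed above can be sketched in a few lines. This is a minimal illustration, not Illumina's implementation: the bin boundaries and representative values in `EIGHT_BINS` and `TWO_BINS`, the half-open region intervals, and all function names are assumptions chosen for the example. The slides only state how many distinct Q-scores are kept (8, 2 or 1).

```python
from bisect import bisect_right

# Hypothetical 8-level binning: (lower bound, representative Q-score).
# The boundaries are illustrative assumptions, not values from the slides.
EIGHT_BINS = [(0, 0), (2, 6), (10, 15), (20, 22), (25, 27), (30, 33), (35, 37), (40, 40)]
# Hypothetical 2-level binning: low versus high quality.
TWO_BINS = [(0, 10), (20, 35)]


def bin_quality(q, bins):
    """Map a raw Q-score to the representative value of its bin."""
    lowers = [lo for lo, _ in bins]
    idx = bisect_right(lowers, q) - 1
    return bins[max(idx, 0)][1]


def in_regions(pos, regions):
    """True if pos falls in any half-open [start, end) interval."""
    return any(start <= pos < end for start, end in regions)


def compress_read(start, quals, platinum_regions):
    """Use 2 Q-score levels inside platinum regions and 8 levels elsewhere."""
    out = []
    for offset, q in enumerate(quals):
        bins = TWO_BINS if in_regions(start + offset, platinum_regions) else EIGHT_BINS
        out.append(bin_quality(q, bins))
    return out


# A read starting at position 98 that crosses into a platinum region at 100.
print(compress_read(98, [38, 38, 12, 31], [(100, 200)]))
```

Fewer distinct values per position make the quality string more repetitive, which is what lets a general-purpose compressor shrink it further.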
Data Reduction Via Vertical Compression (NA12882)
Build                                       Total SNPs (>Q20)      SNPs diff genotype (>Q20)  Not called in Q-score compressed build (>Q20)  Not called in 8 Q-score build (>Q20)
8 Q-score                                   3,735,575 (3,627,165)  -                          -                                              -
8 Q-score technical replicate               3,734,849 (3,626,485)  45,584 (22,400)            80,131 (29,211)                                79,405 (28,845)
Platinum Genome 2 Q-score                   3,732,568 (3,620,612)  3,255 (161)                3,417 (63)                                     410 (127)
Platinum Genome 1 Q-score                   3,764,928 (3,626,468)  4,002 (584)                2,605 (75)                                     31,958 (2,964)
Whole Genome 2 Q-score                      3,712,636 (3,598,400)  25,175 (1,912)             24,237 (166)                                   1,298 (112)
Platinum 2 Q-score keep MM and anom. reads  3,735,684 (3,627,226)  197 (123)                  142 (35)                                       251 (102)
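To put the table in proportion, the differing-genotype counts can be expressed as a fraction of each build's total SNP calls. The sketch below uses only the all-SNP counts from the table (the first number in each cell) and assumes the comparison baseline is the 8 Q-score build, which the table implies but does not state. The dictionary layout and the helper name `diff_rate` are illustrative.

```python
# (total SNPs, SNPs with a different genotype), copied from the table above.
# The >Q20 values in parentheses are left out of this sketch.
BUILDS = {
    "8 Q-score technical replicate": (3_734_849, 45_584),
    "Platinum Genome 2 Q-score": (3_732_568, 3_255),
    "Platinum Genome 1 Q-score": (3_764_928, 4_002),
    "Whole Genome 2 Q-score": (3_712_636, 25_175),
    "Platinum 2 Q-score, keep MM and anomalous reads": (3_735_684, 197),
}


def diff_rate(total, differing):
    """Fraction of a build's SNP calls whose genotype differs."""
    return differing / total


for name, (total, differing) in BUILDS.items():
    print(f"{name}: {100 * diff_rate(total, differing):.3f}% differing genotypes")
```

On these numbers, every compressed build differs from the 8 Q-score build by less than the technical replicate does.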
Faster Data – DNA to Result in <2 Days
Sample -> Sequence -> Analyze -> Annotate
PCR Free library: 4.5 hr
HiSeq2500: 27 hr
Isaac analysis overnight: 8 hr
Total: 40 hr
12 core server, 64Gb RAM
Fast turnaround is required for clinical applications
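The step times on this slide can be added up as a quick check. This sketch assumes that the 4.5 hr, 27 hr and 8 hr figures belong to library preparation, sequencing and analysis respectively, which is how the slide labels appear to line up. The dictionary keys are illustrative.

```python
# Step durations in hours, as read from the slide.
# The assignment of each time to a step is an assumption.
STEPS = {
    "PCR-free library prep": 4.5,
    "HiSeq2500 sequencing": 27.0,
    "Isaac analysis": 8.0,
}

total_hours = sum(STEPS.values())
print(f"total: {total_hours} hr = {total_hours / 24:.2f} days")
```

The 39.5 hr sum is consistent with the roughly 40 hr total on the slide and with the under-2-days claim in its title.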
Importance of Non-coding Mutations – Bigger Data!
WGS reveals somatic mutations in TERT gene promoter of melanoma patients
Form a novel transcription factor binding motif
Recurrence in melanoma is as high as any known coding mutation
[Figure: TERT gene, axis from -200 to +200]
Gene (mutation)   Incidence in melanoma
TERT (promoter)   52%
BRAF (V600E)      53%
CDKN2A            50%
NRAS (Q61R)       28%
TERT (coding)     1%
Horn et al. & Huang et al., Science 2013
Complexity of Data
Surveillance of Leukaemia (CLL) – More Data Complexity!
[Figure: event timeline with ticks at 0, 62, 63, 64, 65 and 66, marking Birth, Diagnosis, Treatment (three times) and Death, with sequencing at time points a to e]
[Figure: Abundance (0 to 250) against Time points a to e for NORMAL and CLASS 1 to CLASS 4, showing changing subclonal populations; inset for time point c on a 0 to 5 scale]
“Remission” has disease
Schuh et al., Oxford
A Deeper Complexity of Genomic Data
Utility Requires Complex Composite Information
Annotate:
Genetic Variants: dbSNP
Allele Frequency in populations: www.1000genomes.org
Functional Effects: ensembl.org, genome.ucsc.edu, encode.org
Disease association: genome.gov
Medical/Risk data (with expert review): HGMD, PharmGKB
ANNOTATED GENOME (gVCF) <1Gbyte
Interpret: Ancestry, Tissue type, Risk, Carrier status, Diagnosis, Drug response
Disseminate: iPad, Plug and Play, Cloud
Genomic Big Data Ecosystems
Apps, Public Genomic Databases, Users, EMR, Support & Engineering, Instruments
Genomic Big Data Status
Researcher
Treatment choice
Clinician
Patient
Knowledge
Information
Challenges for this Meeting to Address
What data frameworks and models are required?
How will genomes (DNA, RNA, methylation states, etc.) be aggregated and compared?
How will collaboration and data sharing evolve?
Where will the technology go, and how must the community respond to leverage the benefits?
Brainstorming of ideas
Sessions from groups that have experience from many fields
Next steps!!
Actively participate and enjoy the entire experience!

Scott Kahn Genomic Big Data.gia.052913

  • 1. © 2009 Illumina, Inc. All rights reserved. Illumina, illuminaDx, Solexa, Making Sense Out of Life, Oligator, Sentrix, GoldenGate, GoldenGate Indexing, DASL, BeadArray, Array of Arrays, Infinium, BeadXpress, VeraCode, IntelliHyb, iSelect, CSPro, and GenomeStudio are registered trademarks or trademarks of Illumina, Inc. All other brands and names contained herein are the property of their respective owners. COMPANY CONFIDENTIAL – INTERNAL USE ONLY Genome Informatics Alliance 2013 Defining Genomic Big Data and its Impact on Scientific Progress
  • 2. From Whence We Came…
      – Sequences: ATGCCGTTT… CCGGTTAAT… GAATTGCAG…
      – Variants: 6:A2567C, 12:C123T, 20:T4678A
      – Data sizes shown: 30-40TB, ~5TB, 600GB, ~20GB
  • 3. Genomic Big Data
      – Large amounts of data generated in genomics; multiple samples, size of data, etc.
      – Integration of digital data to enrich context of samples; DNA, RNA, methylation, time courses, spatial distributions with samples, …
      – Fusion of digital data and categorical data; combination rules (categories), extraction from unstructured inputs, …
      – Tools and techniques appropriate for resultant data sets; visualization, model building, exploration, …
      – Advances require data mining rather than the one-at-a-time hypothesis testing approaches of today
  • 4. Genomic Big Data and Personal Genome Information
      – PERSONAL SEQUENCE (owned by individual/doctor). Issued: 01 MAR 07; Recommended next check: 28 FEB 10; PGI id: 5910322 – 61215923014
      – RISK VARIANTS (approved for clinical use)
      – Human Genome, Clinical studies, Populations, Sequencing, Functional annotation
      – [Figure: locus view, 3: 12,300 to 3: 12,400 (kb), PPARg]
      – GENOMIC ANNOTATION (in public domain)
      – Variant: C3 : 12,450,610 : T0.7/C0.3 : PPARG : Pro/Leu
      – Medical consequence: Associated with severe insulin resistance, diabetes mellitus, hypertension
      – Pharmacological consequence: Resistant to thiazolidinediones
      – CLINICAL DECISION: Consultation, Consent, Clinical assessment, Selected risk information
  • 5. Platinum Genomes as a Truth Reference: Creating a catalogue of highly accurate SNPs, indels & SVs
      – Sequencing a 17-member three-generation pedigree.
          – Ultra-deep sequencing improves sensitivity
          – Leveraging inheritance information improves accuracy
          – Data and results made publicly available
      – Identifying ultra-accurate genomic variants is enabling rapid improvements in technology and software
      – This data will allow us to assess accuracy for many FDA submissions
      – We are collaborating with NIST & CDC to develop a public resource for quantifying sequencing accuracy
  • 6. Data Reduction Via Vertical Compression (NA12882)
      – Reduction from 40 Q-scores to 8 Q-scores is becoming accepted
      – Sequencing output is still increasing exponentially, so further compression is likely to be required
      – Platinum genome work suggests ~95% of the genome is consistently called (this 95% is known as the platinum regions)
      – Regions which are reliably called may not need 8 Q-score resolution; we can reduce “well sequenced” regions to 2 Q-scores
      – Start with 8 Q-score bam file:
          – Reduce the platinum regions to 2 Q-scores (keep non-platinum at 8 Q-scores)
          – Reduce the platinum regions to 1 Q-score
          – Whole genome 2 Q-score
          – Reduce platinum region to 2 Q-scores but also keep original Q-scores of mismatches (MM) and anomalous reads
          – ~40Gb (20Gb CRAM)
      Table (values in parentheses are counts at >Q20):
      Build | Total SNPs | SNPs diff genotype | Not called in Q-score compressed build | Not called in 8 Q-score build
      8 Q-score | 3,735,575 (3,627,165) | - | - | -
      8 Q-score technical replicate | 3,734,849 (3,626,485) | 45,584 (22,400) | 80,131 (29,211) | 79,405 (28,845)
      Platinum Genome 2 Q-score | 3,732,568 (3,620,612) | 3,255 (161) | 3,417 (63) | 410 (127)
      Platinum Genome 1 Q-score | 3,764,928 (3,626,468) | 4,002 (584) | 2,605 (75) | 31,958 (2,964)
      Whole Genome 2 Q-score | 3,712,636 (3,598,400) | 25,175 (1,912) | 24,237 (166) | 1,298 (112)
      Platinum 2 Q-score, keep MM and anom. reads | 3,735,684 (3,627,226) | 197 (123) | 142 (35) | 251 (102)
  • 7. Faster Data – DNA to Result in <2 Days
      – Fast turnaround is required for clinical applications
      – [Figure: workflow with stages Sample, Sequence, Analyze, Annotate; labels: PCR Free library (4.5 hr), HiSeq2500, Isaac analysis overnight, 12 core server with 64Gb RAM; timings shown: 27 hr, 8 hr, 40 hr]
  • 8. Importance of Non-coding Mutations – Bigger Data!
      – WGS reveals somatic mutations in TERT gene promoter of melanoma patients
      – Form a novel transcription factor binding motif
      – Recurrence in melanoma is as high as any known coding mutation
      – [Figure: TERT gene, positions -200 to +200]
      Gene (mutation) | Incidence in melanoma
      TERT (promoter) | 52%
      BRAF (V600E) | 53%
      CDKN2A | 50%
      NRAS (Q61R) | 28%
      TERT (coding) | 1%
      Horn et al. & Huang et al., Science 2013
  • 9. Complexity of Data
  • 10. Surveillance of Leukaemia (CLL) – More Data Complexity!
      – [Figure: event timeline with Birth, Diagnosis, Treatment (three times), Death and Sequencing events; axis labels 0, 62, 63, 64, 65, 66]
      – [Figure: "Changing subclonal populations"; Abundance (0 to 250) against Time points a to e, for NORMAL and CLASS 1 to CLASS 4]
      – “Remission” has disease
      – Schuh et al., Oxford
  • 11. A Deeper Complexity of Genomic Data
  • 12. Utility Requires Complex Composite Information
      – Stages: Annotate, Disseminate, Interpret
      – Allele Frequency in populations: www.1000genomes.org
      – Medical/Risk data (with expert review): Hgmd, pharmgkb
      – Genetic Variants: dbSNP
      – Functional Effects: ensembl.org, genome.ucsc.edu, encode.org
      – Disease association: genome.gov
      – ANNOTATED GENOME (gVCF), <1Gbyte
      – Ancestry, Tissue type, Risk, Carrier status, Diagnosis, Drug response
      – iPad, Plug and Play, Cloud
  • 13. Genomic Big Data Ecosystems
      – [Figure labels: Apps, Public Genomic Databases, Users, EMR, Support & Engineering, Instruments]
  • 14. Genomic Big Data Status
      – [Figure labels: Researcher, Treatment choice, Clinician, Patient, Knowledge, Information]
  • 15. Challenges for this Meeting to Address
      – What data frameworks and models are required?
      – How will genomes (DNA, RNA, methylation states, etc.) be aggregated and compared?
      – How will collaboration and data sharing evolve?
      – Where will the technology go, and how must the community respond to lever the benefits?
      – Brainstorming of ideas
      – Sessions from groups that have experiences from many fields
      – Next steps!!
      – Actively participate and enjoy the entire experience!
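
The Q-score reduction described on slide 6 can be sketched in a few lines of Python. This is only a minimal illustration, not the method used to produce the slide's results. The bin edges, the representative scores, the 2-level threshold of Q20, and the names `bin_quality`, `reduce_read` and `confident_regions` are all assumptions made up for this example. The sketch bins every base quality to one of 8 levels, then collapses it further to 2 levels when the base falls inside a reliably called ("platinum") region. It ignores the slide's variant that keeps the original Q-scores of mismatches and anomalous reads.

```python
from bisect import bisect_right

# Hypothetical 8-level scheme: lower edge of each bin and the score stored for it.
EIGHT_BIN_EDGES = [0, 2, 10, 20, 25, 30, 35, 40]
EIGHT_BIN_VALUES = [0, 6, 15, 22, 27, 33, 37, 40]

# Hypothetical 2-level scheme: "low" below Q20, "high" at or above Q20.
TWO_BIN_EDGES = [0, 20]
TWO_BIN_VALUES = [6, 37]


def bin_quality(q, edges, values):
    """Map a Phred quality score to the representative value of its bin."""
    idx = bisect_right(edges, q) - 1
    return values[max(idx, 0)]


def in_any_interval(pos, intervals):
    """True if pos lies in any half-open [start, end) interval."""
    return any(start <= pos < end for start, end in intervals)


def reduce_read(qualities, start, confident_regions):
    """Bin a read's qualities: 2 levels inside confident regions, 8 elsewhere.

    qualities: per-base Phred scores; start: reference position of the first
    base (assumes an ungapped alignment, a simplification for this sketch).
    """
    reduced = []
    for offset, q in enumerate(qualities):
        q8 = bin_quality(q, EIGHT_BIN_EDGES, EIGHT_BIN_VALUES)
        if in_any_interval(start + offset, confident_regions):
            reduced.append(bin_quality(q8, TWO_BIN_EDGES, TWO_BIN_VALUES))
        else:
            reduced.append(q8)
    return reduced


if __name__ == "__main__":
    # First two bases fall inside the confident region [100, 102).
    print(reduce_read([38, 12, 31, 5], start=100, confident_regions=[(100, 102)]))
```

Fewer distinct quality values means longer runs of identical symbols in the quality string, which is what lets a general-purpose or CRAM-style compressor shrink the file.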