© 2014 MapR Technologies 1
Biomedical Research Goal: Improve Fitness
Therapeutics => Diagnostics => Prognostics
• Therapeutics => traditional medicine
• Diagnostics => personalized medicine
– NextGen public health
– Requires hi-res mechanical knowledge
– Reverse engineer how genetic variation leads to (un)desired traits
• Prognostics => GATTACA (dys/eu)topia
– Managed populations / NextGen eugenics
Biomedical & Advertising Tech Overarching Themes*
*Obligatory movie references… shout-out to my hometown LA
• Eugenics & Determinism
• Free will vs. Determinism
• Media Tech & Privacy
Star Wars III: Revenge of the Sith
Star Wars V: The Empire Strikes Back
Health ~ Fitness
Genes => Traits => Behaviors => Fitness
Human Genetics & Big Data
Human Genetics & Ethics
Today we talk about technology
Me, Us
• Allen Day, Principal Data Scientist, MapR
5yr Hadoop Dev, R project contributor
PhD, Human Genetics, UCLA Medicine
• MapR
Distributes open source components for Hadoop
Adds major technology for performance, high availability (HA), and industry-standard APIs
• See Also
– “allenday” most places (twitter, github, etc.)
– @mapR
Genetic Basis of Facial Features
self-reported values of {sex, ancestry}
+ observer scores of {race, sex}
+ 3D facial scan
+ genome scan
______________________________
Allelic model of 20 genes that determine facial characteristics
Claes, et al. 2014. Modeling 3D Facial Shape from DNA
So Get Ready…
www.theness.com
DTRA102-007 – Forensic DNA Analysis Kit for Genetic Intelligence
• Sex
• Blood type
• Ancestry
• Hair morphology
• Dimples
• Freckles
• Shoe size
• Flat-footedness
• Vision correction
• Ear lobe attachment
• Ear lobe crease
• 5th digit clinodactyly
• Eye color, hair color, skin color
• Height, handedness
• Etc.
https://sbirsource.com/grantiq#/topics/85383
DTRA102-007: Sex and Ancestry
Trends & Events
Trends and Events: Even Moore’s Law
Stein. 2010. The case for cloud computing in genome informatics
“Even Moore’s” begins in 2004 with Solexa (acquired by ILMN in 2007)
[Chart: storage (MB/$) vs. DNA sequencing (bp/$) over time; annotations: ILMN HiSeq XTen (Jan 2014), $1000 Genome]
Trends and Events: US Federal Funding
NIH Research Funding Trends.
http://www.faseb.org/Policy-and-Government-Affairs/Data-Compilations/NIH-Research-Funding-Trends.aspx
[Chart annotation: “You are here”]
More Data
Less Federal $
Trends and Events: The $1000 Genome
• Physicians want to use patient genomes to improve care
• Scientists say personalized medicine breakthroughs require hundreds of thousands to millions of genomes
• Healthcare mandates efficacy and efficiency (early majority)
These forces converge at $1000 for a clinically usable genome
Trends and Events: ILMN HiSeq XTen Specs
• Sold in sets of 10 units ONLY (XTen =10 sequencers)
~ $10 million/XTen, shipments began in Jan 2014
• XTen produces 600 GBases/day @ 30x oversampling
= 1.8 TBases per 3-day cycle
= 54 TBytes per 3-day cycle
= $1000 per genome
= 18,000 genomes/year/XTen
~ 4,000,000 births/year (US, 2012)
=> Neonatal sequencing is a reality (with roughly 200 of today’s XTen systems)
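The chain of figures on the XTen specs slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated on the slide; the variable names are illustrative, and the bytes-per-base ratio is merely implied by the slide's figures, not stated on it.

```python
# Figures as stated on the slide; names are illustrative assumptions.
GBASES_PER_DAY = 600                 # throughput at 30x oversampling
CYCLE_DAYS = 3                       # length of one sequencing cycle
TBYTES_PER_CYCLE = 54                # storage produced per cycle
GENOMES_PER_YEAR_PER_XTEN = 18_000   # yearly capacity of one XTen set
US_BIRTHS_PER_YEAR = 4_000_000       # US, 2012

# 600 GBases/day over a 3-day cycle, expressed in TBases
tbases_per_cycle = GBASES_PER_DAY * CYCLE_DAYS / 1000

# Storage per sequenced base implied by the two per-cycle figures
bytes_per_base = TBYTES_PER_CYCLE / tbases_per_cycle

# Number of XTen sets needed to sequence every US newborn
xten_systems_needed = US_BIRTHS_PER_YEAR / GENOMES_PER_YEAR_PER_XTEN

print(f"{tbases_per_cycle:.1f} TBases per cycle")
print(f"{bytes_per_base:.0f} bytes per sequenced base (implied)")
print(f"{xten_systems_needed:.0f} XTen systems for all US births")
```

The last figure comes out slightly above the slide's rounded count of 200 systems.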
Summary: Major Impact on Social Fabric
Soon to be gone:
• Muscular dystrophy
• Cystic fibrosis
• Albinism
• PKU (phenylketonuria)
• Hemophilia
• Huntington’s Disease (keep?)
• Paternity tests => http://pandawhale.com/post/13851/my-report-card-came-in-my-paternity-test-came-in
Fact: US paternity fraud rate is 1 in 25
http://www.nature.com/scitable/topicpage/rare-genetic-disorders-learning-about-genetic-disease-979
Summary: A Perfect Storm
• LESS public funding (NIH)
• MORE DNA sequencing efficiency (HiSeq XTen)
• Predicted DNA sequencing demand VALIDATED (medicine)
• MORE VC investment ($1000/genome force confluence)
• DNA sequencing capacity consolidating into genome “factories” (e.g. Broad, ILMN) => REQUIRES new infrastructure
The Evolving Genomics Workload
Sboner, et al, 2011. The real cost of sequencing: higher than you think!
<= 1º analytics
“current high ROI use cases”
<= 2º analytics
“next-gen high ROI use cases”
Clinical Application of Human Genetics
Clinical Sequencing Business Process Workflow
[Workflow diagram; recoverable labels: Physician, Patient, Clinic, blood/saliva, Clinical Lab, extract, Analytics]
One Bad MTHFR
MTHFR C677T
Methylfolate helps make neurotransmitters in your brain. When methylfolate levels are low, so are your neurotransmitters. Low production of neurotransmitters may cause conditions of addictive behavior, depression, anxiety, ADHD, mania, irritability, insomnia, learning disorders and others.
Everyone should get tested. Why? Because 1 in 2 people are affected and if one knows they have a MTHFR polymorphism, they know they have to be very proactive in taking care of themselves.
http://thyroid.about.com/od/MTHFR-Gene-Mutations-and-Polymorphisms/fl/The-Link-Between-MTHFR-Gene-Mutations-and-Disease-Including-Thyroid-Health.htm
Clinical Genomics, Information Systems Perspective
[Diagram; recoverable labels:]
• Compressed, structured Base4 data
• extract
• Base4=>Base2 converter [[ DE-STRUCTURES ]]
• Uncompressed, unstructured Base2 data
• “BI” reporting and visualization tools
• Physician, Patient, Analyst, Stakeholder
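One way to read “Base4 => Base2” is that DNA is a four-letter alphabet, so each base carries 2 bits, while sequence data stored as plain text spends 8 bits per base. The sketch below illustrates that reading only; the 2-bit code table and the function names are assumptions, not something the slide specifies.

```python
# Hypothetical 2-bit code for the four nucleotides (assumed for illustration).
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"  # index in this string equals the 2-bit code


def pack(seq: str) -> int:
    """Pack a DNA string into a single integer, 2 bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def unpack(value: int, length: int) -> str:
    """Recover a DNA string of the given length from its packed form."""
    out = []
    for shift in range(2 * (length - 1), -1, -2):
        out.append(BASES[(value >> shift) & 0b11])
    return "".join(out)


seq = "GATTACA"
packed = pack(seq)
bits_packed = 2 * len(seq)  # size in the 2-bit representation
bits_text = 8 * len(seq)    # size as one-byte-per-base text

print(bin(packed), bits_packed, bits_text)
print(unpack(packed, len(seq)))
```

Under this reading the text form is four times larger than the packed form, which is one sense in which the converted data is “uncompressed”.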
Clinical Genomics, Information Systems Perspective
Users: Physician, Patient, Analyst, Stakeholder
Components: ETL, Data Store, Analytics, Reporting and Viz
1º analytics
2º analytics
Not much in this presentation, see also: http://slidesha.re/1sC2BOX
1º Analytics: Why MapReduce?
The Essence of the Problem:
What is the (Probable) Color of Each Column?
Next-Gen Human Genetics – Population Scale
The Evolving Genomics Workload
Sboner, et al, 2011. The real cost of sequencing: higher than you think!
<= 2º analytics
“next-gen high ROI use cases”
MapR Data Platform Advantage, Clinical Genomics
[Architecture diagram; recoverable labels:]
• Epidemiological, Actuarial Analyses
• Denormalization for Search, Viz, Research
• ETL
• Clinical Reporting
• Web Tier
• Clinical Reporting Systems
• Clinical Treatment of Patients
• Researchers
• National Pop. Database
• Index Shards
• Prognostic Capability
Co-expression (10K samples) and Linkage
Co-expression + Linkage => Gene Annotation / Set Completion
[Figure: gene-by-gene co-expression matrix; both axes carry the same gene symbols, including BMP6, BMP2, MMP3, ACAN, COL11A2, COL10A1, SOX5 and FZD10]
Disease gene characterization through large-scale co-expression analysis.
http://www.ncbi.nlm.nih.gov/pubmed/20046828
If they were unlabeled, would you know which is which?
Friend. 2010. The Need for Precompetitive Integrative Bionetwork Disease Model Building
NPR. 2011. The Search For Analysts To Make Sense Of 'Big Data'
http://www.npr.org/2011/11/30/142893065
• Identify network structures
• Label them
• Observe stimulus=>response space mapping
• Purposefully target
• $$$$ Twitter’s Business Model
These are Linear Algebra / Machine Learning Problems
A Quick Digression: Recommender Systems
HOW RECOMMENDATIONS WORK
Behavior of a crowd helps us understand what individuals will do
History Matrix (A)
[Table: one row per user, one column per item]
Alice ✔ ✔ ✔
Bob ✔ ✔
Charles ✔ ✔
Co-occurrence Matrix (AᵀA)
1 2
1 1
1
1
2 1
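The co-occurrence matrix is the product of the transposed history matrix with the history matrix itself. A minimal sketch follows; the item names and the positions of the check marks are assumptions chosen only to match the row counts (three, two and two), since the slide's columns are not recoverable.

```python
# Hypothetical history matrix A: rows are users, columns are items.
# Item names and check-mark positions are assumed for illustration.
users = ["Alice", "Bob", "Charles"]
items = ["item1", "item2", "item3", "item4"]
A = [
    [1, 1, 1, 0],  # Alice
    [1, 0, 0, 1],  # Bob
    [0, 1, 0, 1],  # Charles
]


def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Plain matrix product of two list-of-lists matrices."""
    yt = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in yt] for row in x]


# Entry [i][j] counts the users who interacted with both item i and item j;
# the diagonal counts the users per item.
cooccurrence = matmul(transpose(A), A)

for name, row in zip(items, cooccurrence):
    print(name, row)
```

The result is an items-by-items matrix, symmetric by construction.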
<Normalize>
(filter to identify only unusual co-occurrences)
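The slide does not say which test is used to decide that a co-occurrence is unusual. One common choice is the log-likelihood ratio (G²) on a 2x2 contingency table; the sketch below assumes that choice, and the function names and example counts are illustrative.

```python
import math


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) is 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio score for a 2x2 table of counts.

    k11: users with both items      k12: users with item A only
    k21: users with item B only     k22: users with neither
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding
    return max(0.0, 2.0 * (row + col - mat))


print(llr(10, 10, 10, 10))  # independent items score near zero
print(llr(10, 0, 0, 10))    # items that always appear together score high
```

Pairs with a score above a chosen threshold would be kept as indicators; the threshold is a tuning decision that the slide does not cover.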
HOW CROSS-RECOMMENDATIONS WORK
Behavior of a crowd helps us understand what individuals will do
Example Multi-modal Inputs
• Overlap in restaurant visits is useful
• Big spender cues
• Cuisine as an indicator
• Review text as an indicator
People do more than one kind of thing
• Different kinds of behaviors give different quality, quantity and kind of information
– Restaurant visits
– Movie reviews
• We don’t have to do co-occurrence
• We can do cross-occurrence
• Result is cross-recommendation
For example
• Users enter queries (A)
– (actor = user, item=query)
• Users view videos (B)
– (actor = user, item=video)
• AᵀA gives query recommendation
– “did you mean to ask for”
• BᵀB gives video recommendation
– “you might like these videos”
The punch-line
• BᵀA recommends videos in response to a query
– (isn’t that a search engine?)
– (not quite, it doesn’t look at content or meta-data)
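The cross-occurrence product can be sketched in a few lines. Here A is users by queries and B is users by videos, so the transpose of B times A is videos by queries: each column scores videos for one query using behavior only, with no content or meta-data. The data and the helper name `recommend_videos` are hypothetical.

```python
# Hypothetical data: 3 users, 2 query terms, 3 videos.
A = [  # users x queries: 1 if the user entered the query
    [1, 0],
    [1, 0],
    [0, 1],
]
B = [  # users x videos: 1 if the user viewed the video
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]


def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Plain matrix product of two list-of-lists matrices."""
    yt = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in yt] for row in x]


# videos x queries: entry [v][q] counts users who entered q and viewed v
BtA = matmul(transpose(B), A)


def recommend_videos(query_index):
    """Return video indices for a query, highest cross-occurrence first."""
    scores = [row[query_index] for row in BtA]
    ranked = sorted(range(len(scores)), key=lambda v: scores[v], reverse=True)
    return [v for v in ranked if scores[v] > 0]


print(BtA)
print(recommend_videos(0), recommend_videos(1))
```

In practice the raw counts would be filtered for unusual cross-occurrences before ranking, as in the normalization step for co-occurrence.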
Real-life example
• Query: “Paco de Lucia”
• Conventional meta-data search results:
– “hombres del paco” times 400
– not much else
• Recommendation based search:
– Flamenco guitar and dancers
– Spanish and classical guitar
– Van Halen doing a classical/flamenco riff
Previous Click Histories
[Click matrix: rows user1 to user5, columns content items 1 to 8]
Detect similar content: 2 & 8
[Click matrix: rows user1 to user5, columns content items 1 to 8]
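Detecting similar content from click histories can be sketched by comparing the sets of users who clicked each item. The click data below is hypothetical, built so that items 2 and 8 share the same audience, and Jaccard similarity is an assumed measure; the slide does not name one.

```python
from itertools import combinations

# Hypothetical click histories: user -> set of clicked content ids (1 to 8).
clicks = {
    "user1": {1, 2, 8},
    "user2": {2, 3, 4, 8},
    "user3": {3, 5, 7},
    "user4": {2, 5, 6, 8},
    "user5": {1, 4, 6, 7},
}


def audience(item):
    """Set of users who clicked the given item."""
    return {user for user, seen in clicks.items() if item in seen}


def jaccard(a, b):
    """Size of the intersection over size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Score every pair of items by how much their audiences overlap
pairs = {
    (i, j): jaccard(audience(i), audience(j))
    for i, j in combinations(range(1, 9), 2)
}
best_pair = max(pairs, key=pairs.get)

print(best_pair, pairs[best_pair])
```

With real click data the best pair would rarely score a perfect 1.0, so a threshold or a top-k cut would replace the single maximum.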
Call to Action – Request Clicks
[Click matrix: rows user1 to user5, columns content items 1 to 8]
Show me more: sports, comedy, technology
“Under Construction”
Build Navigational Ontology (estimate content labels):
4=sports ; 2 & 7=comedy
[Click matrix: rows user1 to user5, columns content items 1 to 8]
Show me more: sports (4), comedy (2 & 7), technology (under construction)
Matrices A (U*Q) and B (U*V)
Query Term = Clicked Term
A: Users × Query Terms
B: Users × Clicked Videos
Relate Q to V
[Matrix: Users × Query Terms]
Relate Q to V: it’s a Cross-Recommender
[Matrix: Query Terms × Videos]
Population-level Inference
Typical Dimensions in Genetics/Medicine
• Genotype
• Gene Expression
• Samples
• Phenotypes (traits/behavior)
Typical Dimensions in Behavioral Data
• Genotype
• Gene Expression
• Samples => Individuals
• Phenotype
– Traits
– Behaviors
Incidence/Co-occurrence in Behavioral Data
• Individual * Individual
– Genealogy
• Trait * Behavior => [Netflix]
– User/Content Topic Modeling
• Genotype * Behavior => [Psychometrics]
– Genetics of personality, intelligence, aptitude
• Behavior * Outcome => [Korn-Ferry]
– Job effectiveness
• Phenotype (trait/behavior) * Outcome => [eHarmony]
– Reproductive fitness
Traits and Behaviors:
Content Topic Modeling / UX Personalization
Behaviors and Outcomes:
Economic Fitness (Korn/Ferry)
Korn/Ferry ProSpective
http://linkedin.kornferry.com
Genes
Job Performance
(Traits/Behaviors) and Outcomes
Reproductive Fitness (eHarmony)
eHarmony @ Hadoop World: Data Science of Love
http://eharmony.com
Genes
Reproductive Outcomes
Genes => Traits => Behaviors => Fitness
Job Performance
Psychometrics
Movie Preferences
Medicine
Forensics
Fitness
Reproductive Outcomes
ENCODE
http://www.nature.com/news/encode-the-human-encyclopaedia-1.11312
Robot Scientist
Sparkes, et al. 2010. Towards Robot Scientists for autonomous scientific discovery
Robot (Data?) Scientist
Sparkes, et al. 2010. Towards Robot Scientists for autonomous scientific discovery
Thanks

 
Big Data in Pediatric Critical Care by Mohit Mehra
Big Data in Pediatric Critical Care by Mohit MehraBig Data in Pediatric Critical Care by Mohit Mehra
Big Data in Pediatric Critical Care by Mohit Mehra
 
Delivering improved patient outcomes through advanced analytics 6.26.18
Delivering improved patient outcomes through advanced analytics 6.26.18Delivering improved patient outcomes through advanced analytics 6.26.18
Delivering improved patient outcomes through advanced analytics 6.26.18
 





Human Genetics & Big Data [sans Ethics]

  • 1. © 2014 MapR Technologies 1
  • 2. Biomedical Research Goal: Improve Fitness
    Therapeutics => Diagnostics => Prognostics
    • Therapeutics => traditional medicine
    • Diagnostics => personalized medicine
      – NextGen public health
      – Requires hi-res mechanical knowledge
      – Reverse engineer how genetic variation leads to (un)desired traits
    • Prognostics => GATTACA (dys/eu)topia
      – Managed populations / NextGen eugenics
  • 3. Biomedical & Advertising Tech Overarching Themes*
    Eugenics & Determinism; Free will vs. Determinism; Media Tech & Privacy
    *Obligatory movie references… shout-out to my hometown LA
  • 4. Star Wars III: Revenge of the Sith
  • 5. Star Wars V: The Empire Strikes Back
  • 6. Health ~ Fitness
    Genes => Traits => Behaviors => Fitness
  • 7. © 2014 MapR Technologies 7
  • 8. Human Genetics & Big Data
    Human Genetics & Ethics
    Today we talk about technology
  • 9. Me, Us
    • Allen Day, Principal Data Scientist, MapR
      5yr Hadoop Dev, R project contributor
      PhD, Human Genetics, UCLA Medicine
    • MapR
      Distributes open source components for Hadoop
      Adds major technology for performance, HA, industry standard APIs
    • See Also
      – “allenday” most places (twitter, github, etc.)
      – @mapR
  • 10. Genetic Basis of Facial Features
    self-reported values of {sex, ancestry}
    + observer scores {race, sex}
    + 3D facial scan
    + genome scan
    ______________________________
    Allelic model of 20 genes that determine facial characteristics
    Claes, et al. 2014. Modeling 3D Facial Shape from DNA
  • 11. Genetic Basis of Facial Features
    Claes, et al. 2014. Modeling 3D Facial Shape from DNA
  • 12. So Get Ready…
    www.theness.com
  • 13. DTRA102-007 – Forensic DNA Analysis Kit for Genetic Intelligence
    • Sex
    • Blood type
    • Ancestry
    • Hair morphology
    • Dimples
    • Freckles
    • Shoe size
    • Flat-footedness
    • Vision correction
    • Ear lobe attachment
    • Ear lobe crease
    • 5th digit clinodactyly
    • Eye color, hair color, skin color
    • Height, handedness
    • Etc
    https://sbirsource.com/grantiq#/topics/85383
  • 14. DTRA102-007: Sex and Ancestry
  • 15. Trends & Events
  • 16. Trends and Events: Even Moore’s Law
    [Figure labels: Storage: MB/$; DNA: bp/$; ILMN HiSeq XTen (Jan 2014); $1000 Genome]
    “Even Moore’s” begins in 2004 with Solexa (acquired by ILMN 2007)
    Stein. 2010. The case for cloud computing in genome informatics
  • 17. Trends and Events: US Federal Funding
    [Figure annotation: You are here]
    NIH Research Funding Trends. http://www.faseb.org/Policy-and-Government-Affairs/Data-Compilations/NIH-Research-Funding-Trends.aspx
  • 18. More Data
    Less Federal $
  • 19. Trends and Events: The $1000 Genome
    • Physicians want to use patient genomes to improve care
    • Scientists say personalized medicine breakthroughs require 100Ks to MMs of genomes
    • Healthcare mandates efficacy and efficiency (early majority)
    These forces converge at $1000 for a clinically usable genome
  • 20. Trends and Events: ILMN HiSeq XTen Specs
    • Sold in sets of 10 units ONLY (XTen = 10 sequencers)
      ~ $10 million/XTen, shipments began in Jan 2014
    • XTen produces 600 GBases/day @ 30x oversampling
      = 1.8 TBases per 3-day cycle
      = 54 TBytes per 3-day cycle
      = $1000 per genome
      = 18,000 genomes/year/XTen
    ~ 4,000,000 births/year (US, 2012)
    => Neonatal sequencing is a reality (with 200 of today’s systems)
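The capacity claim on slide 20 can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted on the slide, taken at face value; the variable names are illustrative, and the capital-cost line assumes the quoted "~ $10 million/XTen" price with no operating costs.

```python
import math

# Figures as quoted on slide 20 (not independently verified here).
GBASES_PER_DAY = 600                 # "600 GBases/day"
CYCLE_DAYS = 3                       # "3-day cycle"
GENOMES_PER_XTEN_PER_YEAR = 18_000   # "18,000 genomes/year/XTen"
US_BIRTHS_PER_YEAR = 4_000_000       # "4,000,000 births/year (US, 2012)"
XTEN_PRICE_USD = 10_000_000          # "~ $10 million/XTen" (assumed flat price)

# Bases produced per sequencing cycle, in TBases.
tbases_per_cycle = GBASES_PER_DAY * CYCLE_DAYS / 1000

# Whole XTen systems needed to sequence every US newborn in a year.
systems_needed = math.ceil(US_BIRTHS_PER_YEAR / GENOMES_PER_XTEN_PER_YEAR)

# Rough up-front hardware cost for that many systems.
capital_cost_usd = systems_needed * XTEN_PRICE_USD

print(f"{tbases_per_cycle} TBases per cycle")
print(f"{systems_needed} XTen systems needed")
print(f"${capital_cost_usd:,} in sequencers")
```

Under these assumptions the result is a little over 220 systems, in line with the slide's rounded "200 of today's systems".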
  • 21. Summary: Major Impact on Social Fabric
    Soon to be gone:
    • Muscular dystrophy
    • Cystic fibrosis
    • Albinism
    • PKU (phenylketonuria)
    • Paternity Tests => http://pandawhale.com/post/13851/my-report-card-came-in-my-paternity-test-came-in
    http://www.nature.com/scitable/topicpage/rare-genetic-disorders-learning-about-genetic-disease-979
    • Hemophilia
    • Huntington’s Disease (keep?)
    Fact: US paternity fraud rate is 1 in 25
  • 22. Summary: A Perfect Storm
    • LESS public funding (NIH)
    • MORE DNA sequencing efficiency (HiSeq XTen)
    • Predicted DNA sequencing demand VALIDATED (medicine)
    • MORE VC investment ($1000/genome force confluence)
    • DNA sequencing capacity consolidating into genome “factories” (e.g. Broad, ILMN)
    => REQUIRES new infrastructure
  • 23. The Evolving Genomics Workload
    <= 1º analytics “current high ROI use cases”
    <= 2º analytics “next-gen high ROI use cases”
    Sboner, et al, 2011. The real cost of sequencing: higher than you think!
  • 24. The Evolving Genomics Workload
    <= 1º analytics “current high ROI use cases”
    <= 2º analytics “next-gen high ROI use cases”
    Sboner, et al, 2011. The real cost of sequencing: higher than you think!
  • 25. Clinical Application of Human Genetics
  • 26. Clinical Sequencing Business Process Workflow
    [Diagram labels: Physician; Patient; Clinic; blood/saliva; Clinical Lab; Analytics; extract]
  • 27. One Bad MTHFR
    MTHFR C677T
    Methylfolate helps make neurotransmitters in your brain. When methylfolate levels are low, so are your neurotransmitters. Low production of neurotransmitters may cause conditions of addictive behavior, depression, anxiety, ADHD, mania, irritability, insomnia, learning disorders and others.
    Everyone should get tested. Why? Because 1 in 2 people are affected and if one knows they have a MTHFR polymorphism, they know they have to be very proactive in taking care of themselves.
    http://thyroid.about.com/od/MTHFR-Gene-Mutations-and-Polymorphisms/fl/The-Link-Between-MTHFR-Gene-Mutations-and-Disease-Including-Thyroid-Health.htm
  • 28. One Bad MTHFR
    MTHFR C677T
    Methylfolate helps make neurotransmitters in your brain. When methylfolate levels are low, so are your neurotransmitters. Low production of neurotransmitters may cause conditions of addictive behavior, depression, anxiety, ADHD, mania, irritability, insomnia, learning disorders and others.
    Everyone should get tested. Why? Because 1 in 2 people are affected and if one knows they have a MTHFR polymorphism, they know they have to be very proactive in taking care of themselves.
    http://thyroid.about.com/od/MTHFR-Gene-Mutations-and-Polymorphisms/fl/The-Link-Between-MTHFR-Gene-Mutations-and-Disease-Including-Thyroid-Health.htm
  • 29. Clinical Sequencing Business Process Workflow
    [Diagram labels: Physician; Patient; Clinic; blood/saliva; Clinical Lab; Analytics; extract]
  • 30. Clinical Genomics, Information Systems Perspective
    [Diagram labels: Compressed Structured Base4 Data; Uncompressed Unstructured Base2 Data; extract; Base4=>Base2 Converter [[ DE-STRUCTURES ]]; “BI” Reporting and Visualization tools; Physician; Patient; Analyst; Stakeholder]
  • 31. Clinical Genomics, Information Systems Perspective
    [Diagram labels: Physician; Patient; Analyst; Stakeholder; ETL; Reporting and Viz; Data Store; Analytics]
  • 32. Clinical Genomics, Information Systems Perspective
    [Diagram labels: Physician; Patient; Analyst; Stakeholder; ETL; Reporting and Viz; Data Store; Analytics; 1º analytics; 2º analytics]
    Not much in this presentation, see also: http://slidesha.re/1sC2BOX
  • 33. 1º Analytics: Why MapReduce?
  • 34. Clinical Genomics, Information Systems Perspective
    [Diagram labels: Physician; Patient; Analyst; Stakeholder; ETL; Reporting and Viz; Data Store; Analytics; 1º analytics; 2º analytics]
    see also: http://slidesha.re/1sC2BOX
  • 35. The Essence of the Problem: What is the (Probable) Color of Each Column?
  • 36. Next-Gen Human Genetics – Population Scale
  • 37. The Evolving Genomics Workload
    <= 2º analytics “next-gen high ROI use cases”
    Sboner, et al, 2011. The real cost of sequencing: higher than you think!
  • 38. MapR Data Platform Advantage, Clinical Genomics
    [Diagram labels: Epidemiological, Actuarial Analyses; Denormalization for Search, Viz, Research; ETL; Clinical Reporting; WEB TIER; Clinical Reporting Systems; CLINICAL TREATMENT OF PATIENTS; RESEARCHERS; National Pop. Database; INDEX SHARDS; Prognostic Capability]
  • 39. Co-expression (10K samples) and Linkage => Gene Annotation / Set Completion
    [Figure: gene-by-gene matrix; both axes carry the same gene symbols, running from BMP6, BMP2, MMP3, LIF, … to HSPC159, SLC28A3, FZD10]
    Disease gene characterization through large-scale co-expression analysis. http://www.ncbi.nlm.nih.gov/pubmed/20046828
  • 40. If they were unlabeled, would you know which is which?
    Friend. 2010. The Need for Precompetitive Integrative Bionetwork Disease Model Building
    NPR. 2011. The Search For Analysts To Make Sense Of ‘Big Data’ http://www.npr.org/2011/11/30/142893065
  • 41. If they were unlabeled, would you know which is which?
    Friend. 2010. The Need for Precompetitive Integrative Bionetwork Disease Model Building
    • Identify network structures
    • Label them
    • Observe stimulus=>response space mapping
    • Purposefully target
    • $$$$
    Twitter’s Business Model
  • 42. These are Linear Algebra / Machine Learning Problems
  • 43. A Quick Digression: Recommender Systems
  • 44. HOW RECOMMENDATIONS WORK
    Behavior of a crowd helps us understand what individuals will do
  • 45. History Matrix (A)
    [Figure: check-mark matrix; labels: Alice, Bob, Charles; 7 check marks]
  • 46. Co-occurrence Matrix (AᵀA)
    [Figure: matrix with visible entries 1 2 1 1 1 1 2 1]
  • 47. <Normalize>
    (filter to identify only unusual co-occurrences)
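The history and co-occurrence matrices on slides 45 to 47 can be sketched in plain Python. The three-item history below is made up for illustration (the slide's actual check marks cannot be recovered from the transcript), and the normalization step from slide 47 is not implemented because the deck does not say which filter it uses.

```python
# Hypothetical history matrix A: one row per user, one column per item.
# A[u][i] == 1 means user u interacted with item i.
USERS = ["Alice", "Bob", "Charles"]
HISTORY = [
    [1, 1, 0],  # Alice
    [1, 1, 1],  # Bob
    [0, 0, 1],  # Charles
]


def cooccurrence(history):
    """Return A^T A: entry [i][j] counts users who touched both item i and item j."""
    n_items = len(history[0])
    counts = [[0] * n_items for _ in range(n_items)]
    for row in history:
        for i in range(n_items):
            for j in range(n_items):
                counts[i][j] += row[i] * row[j]
    return counts


COOC = cooccurrence(HISTORY)
for row in COOC:
    print(row)
```

The diagonal holds each item's popularity and the off-diagonal entries hold raw pair counts, which is why a normalization step is needed before popular items dominate every recommendation.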
  • 48. HOW CROSS-RECOMMENDATIONS WORK Behavior of a crowd helps us understand what individuals will do
  • 49. Example Multi-modal Inputs • Overlap in restaurant visits is useful • Big spender cues • Cuisine as an indicator • Review text as an indicator
  • 50. People do more than one kind of thing • Different kinds of behaviors give different quality, quantity and kind of information – Restaurant visits – Movie reviews • We don’t have to do co-occurrence • We can do cross-occurrence • Result is cross-recommendation
  • 51. For example • Users enter queries (A) – (actor = user, item = query) • Users view videos (B) – (actor = user, item = video) • AᵀA gives query recommendation – “did you mean to ask for” • BᵀB gives video recommendation – “you might like these videos”
  • 52. The punch-line • BᵀA recommends videos in response to a query – (isn’t that a search engine?) – (not quite, it doesn’t look at content or meta-data)
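The cross-occurrence product BᵀA can be sketched as follows. The query and video names are invented for illustration, and `recommend_videos` is a hypothetical helper; the sketch uses raw counts and skips the normalization step.

```python
# Hypothetical behavior logs for four users.
queries = ["flamenco", "guitar", "cooking"]
videos = ["paco_live", "van_halen_riff", "paella_howto"]

# A: users x queries (1 = user entered the query).
A = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
]
# B: users x videos (1 = user viewed the video).
B = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
]


def cross_occurrence(b, a):
    """Return B^T A: entry [v][q] counts users who viewed v and entered q."""
    b_cols = list(zip(*b))  # one tuple of user flags per video
    a_cols = list(zip(*a))  # one tuple of user flags per query
    return [[sum(x * y for x, y in zip(bv, aq)) for aq in a_cols] for bv in b_cols]


def recommend_videos(query, bta):
    """Rank videos by cross-occurrence count with a query, dropping zeros."""
    q = queries.index(query)
    scored = [(bta[v][q], videos[v]) for v in range(len(videos)) if bta[v][q] > 0]
    return [name for score, name in sorted(scored, key=lambda t: -t[0])]


BtA = cross_occurrence(B, A)
print(recommend_videos("flamenco", BtA))
```

Nothing here inspects video titles or tags: the ranking comes purely from which users issued the query and which videos those same users watched.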
  • 53. Real-life example • Query: “Paco de Lucia” • Conventional meta-data search results: – “hombres del paco” times 400 – not much else • Recommendation based search: – Flamenco guitar and dancers – Spanish and classical guitar – Van Halen doing a classical/flamenco riff
  • 54. Real-life example
  • 55. Previous Click Histories [Figure: click matrix, users user1–user5 × items 1–8]
  • 56. Detect similar content: 2 & 8 [Figure: click matrix, users user1–user5 × items 1–8]
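Detecting similar content from click histories amounts to comparing which users clicked each item. The slide does not say which similarity measure is used, so the sketch below assumes Jaccard similarity on per-item user sets. The click data is invented, arranged so that items 2 and 8 come out as the closest pair, as on the slide.

```python
from itertools import combinations

# Hypothetical click histories: user -> set of clicked item ids.
clicks = {
    "user1": {1, 2, 8},
    "user2": {2, 3, 8},
    "user3": {4, 5},
    "user4": {2, 6, 8},
    "user5": {5, 7},
}


def users_by_item(click_log):
    """Invert the log: item id -> set of users who clicked it."""
    inverted = {}
    for user, items in click_log.items():
        for item in items:
            inverted.setdefault(item, set()).add(user)
    return inverted


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b)


def most_similar_pair(click_log):
    """Return the pair of items whose clicking users overlap the most."""
    inv = users_by_item(click_log)
    return max(
        combinations(sorted(inv), 2),
        key=lambda pair: jaccard(inv[pair[0]], inv[pair[1]]),
    )


print(most_similar_pair(clicks))
```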
  • 57. Call to Action – Request Clicks [Figure: click matrix, users user1–user5 × items 1–8, with the prompt “Show me more: sports, comedy, technology”; marked “Under Construction”]
  • 58. Build Navigational Ontology (estimate content labels): 4 = sports; 2 & 7 = comedy [Figure: click matrix, users user1–user5 × items 1–8, with the prompt “Show me more: sports, comedy, technology”; marked “Under construction”]
  • 59. Matrices A (U*Q) and B (U*V) Query Term = Clicked Term Users Query Terms Users Clicked Videos
  • 60. Relate Q to V Users Query Terms
  • 61. Relate Q to V Users Query Terms
  • 62. Users Query Terms
  • 63. Relate Q to V: it’s a Cross-Recommender QueryTerms Videos
  • 64. Population-level Inference
  • 65. Typical Dimensions in Genetics/Medicine • Genotype • Gene Expression • Samples • Phenotypes (traits/behavior)
  • 66. Typical Dimensions in Behavioral Data • Genotype • Gene Expression • Samples Individuals • Phenotype – Traits – Behaviors
  • 67. Incidence/Co-occurrence in Behavioral Data • Individual * Individual – Genealogy • Trait * Behavior => [Netflix] – User/Content Topic Modeling • Genotype * Behavior => [Psychometrics] – Genetics of personality, intelligence, aptitude • Behavior * Outcome => [Korn-Ferry] – Job effectiveness • Phenotype (trait/behavior) * Outcome => [eHarmony] – Reproductive fitness
  • 68. Traits and Behaviors: Content Topic Modeling / UX Personalization
  • 69. Behaviors and Outcomes: Economic Fitness (Korn/Ferry) Korn/Ferry ProSpective http://linkedin.kornferry.com Allen =>
  • 70. Genes Job Performance
  • 71. (Traits/Behaviors) and Outcomes Reproductive Fitness (eHarmony) eHarmony @ Hadoop World: Data Science of Love http://eharmony.com
  • 72. Genes Reproductive Outcomes
  • 73. Genes => Traits => Behaviors => Fitness Job Performance Psychometrics Movie Preferences Medicine Forensics
  • 74. Genes => Traits => Behaviors => Fitness Job Performance Psychometrics Movie Preferences Medicine Forensics Fitness Reproductive Outcomes
  • 75.
  • 76. ENCODE http://www.nature.com/news/encode-the-human-encyclopaedia-1.11312
  • 77. Robot Scientist Sparkes, et al. 2010. Towards Robot Scientists for autonomous scientific discovery
  • 78. Robot (Data?) Scientist Sparkes, et al. 2010. Towards Robot Scientists for autonomous scientific discovery
  • 79. Thanks

Editor's notes

  1. The genomic position (x-axis) of probesets within a 6-megabase region centered at the location of TTN, a gene known to be associated with LGMD2, is plotted versus the Pearson correlation coefficient (y-axis) to a list of probesets targeting other genes known to be associated with LGMD2 (excluding TTN) across 11,636 HG-U133_Plus_2 microarrays. Solid circles: probesets targeting TTN; a second marker (symbol lost in extraction): probesets for genes of unknown function; open circles: probesets for known genes in the interval.
  2. Allen: this is the transitional slide from talking about more than one input to one step further: cross recommendation. I doubt you want to use it as is, but I’ve included it FYI
  3. Allen: additional transitional slide
  4. Allen: What do you plan to say about this? General example without anything proprietary?
  5. Allen: What do you plan to say about this? General example without anything proprietary?
  6. Allen: What do you plan to say about this? General example without anything proprietary?
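
Note 1 describes scoring each probeset by its Pearson correlation to probesets for genes already known to be associated with the disease. A minimal sketch of that scoring follows. The expression values are invented, and averaging the per-gene correlations into one score is an assumption, since the note does not say how the list is aggregated.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mean_correlation_to_set(candidate, known_set):
    """Average correlation of one probeset to a set of known probesets.

    Averaging is an assumed aggregation, chosen here for illustration.
    """
    return sum(pearson(candidate, k) for k in known_set) / len(known_set)


# Hypothetical expression levels across five microarrays.
known_disease_probesets = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
]
candidate_a = [1.1, 2.0, 2.9, 4.2, 5.0]  # tracks the known probesets
candidate_b = [5.0, 1.0, 4.0, 2.0, 3.0]  # does not

print(mean_correlation_to_set(candidate_a, known_disease_probesets))
print(mean_correlation_to_set(candidate_b, known_disease_probesets))
```

A candidate that is co-expressed with the known set scores close to 1, which is the signal the plot in note 1 uses to pick out candidate genes in the interval.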