Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Andrew Su, Ph.D.
@andrewsu
asu@scripps.edu
http://sulab.org
May 14, 2014
CBIIT
Slides: slideshare.net/andrewsu
Citizen Science!
Few genes are well annotated…
[Figure: GO annotation counts per gene, with genes sorted by decreasing counts; 20,473 protein-coding genes; percentages marked on the plot: 41% and 65%; labeled genes: CTNNB1, VEGFA, SIRT1, FGFR2, TGFB1, TP53, MEF2C, BMP4, LEF1, WNT5A, TNF]
Data: NCBI, February 2013
… because the literature is sparsely curated?
[Figure: Number of new PubMed-indexed articles per year, 1983 to 2013; y-axis from 0 to 1,200,000]
… because the literature is sparsely curated?
[Figure: Average capacity of human scientist, 1983 to 2013; y-axis from 0 to 40]
311,696 articles (1.5% of PubMed)
have been cited by GO annotations
Sooner or later, the
research community will
need to be involved in the
annotation effort to scale
up to the rate of data
generation.
The Long Tail is a prolific source of content
[Figure: Content produced vs. contributors (sorted), split into a Short Head and a Long Tail]
Category          Short Head         Long Tail
News              Newspapers         Blogs
Video             TV/Hollywood       YouTube
Product reviews   Consumer reports   Amazon reviews
Food reviews      Food critics       Yelp
Talent judging    Olympics           American Idol
Wikipedia is reasonably accurate
Wikipedia has breadth and depth
[Figure: Articles and words (millions), Wikipedia vs. Britannica Online]
http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons, July 2008
We can harness the
Long Tail of scientists
to directly participate in
the gene annotation
process.
From crowdsourcing to structured data
The Gene Wiki
Citizen Science
Filtering, extracting, and summarizing PubMed
[Diagram labels: Documents, Concepts, Review article]
Wiki success depends on a positive feedback
[Diagram: feedback cycle linking Gene Wiki page utility, number of users, and number of contributors]
10,000 gene “stubs” within Wikipedia
[Screenshot: gene stub page, with callouts for protein structure, symbols and identifiers, tissue expression pattern, Gene Ontology annotations, links to structured databases, gene summary, protein interactions, and linked references]
Huss, PLoS Biol, 2008
Gene Wiki has a critical mass of readers
Total: 4.0 million views / month
Huss, PLoS Biol, 2008; Good, NAR, 2011
Gene Wiki has a critical mass of editors
Increase of ~10,000 words / month from >1,000 edits
Currently 1.42 million words
Approximately equal to 230 full-length articles
Good, NAR, 2011
[Figure: Editor count and edit count; axis labels: Editors, Edits]
A review article for every gene is powerful
References to the literature
Hyperlinks to related concepts
Reelin: 98 editors, 703 edits since July 2002
Heparin: 358 editors, 654 edits since June 2003
AMPK: 109 editors, 203 edits since March 2004
RNAi: 394 editors, 994 edits since October 2002
Making the Gene Wiki more computable
[Diagram labels: Structured annotations, Free text]
Filling the gaps in gene annotation
[Diagram labels: Wikilink; GO exact match (GO:0006897); Gene Wiki mapping (NCBI Entrez Gene: 334); Candidate assertion]
6319 novel GO annotations
2147 novel DO annotations
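The mapping idea on this slide can be sketched in Python as follows. This is a minimal illustration, not the pipeline used for the reported counts: it assumes a Gene Wiki page is already mapped to an Entrez Gene ID, and that a wikilink whose target exactly matches a GO term name yields a candidate assertion. The function name, the one-entry term lookup, and the sample wikitext are hypothetical.

```python
import re

# Hypothetical lookup: lowercase GO term name -> GO identifier.
# A real run would load the full ontology; one entry is enough for the sketch.
GO_TERMS = {"endocytosis": "GO:0006897"}

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def candidate_assertions(entrez_gene_id, wikitext, go_terms=GO_TERMS):
    """Return (gene, GO id) pairs for wikilinks whose target exactly matches a GO term name."""
    found = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip().lower()
        go_id = go_terms.get(target)
        if go_id is not None and (entrez_gene_id, go_id) not in found:
            found.append((entrez_gene_id, go_id))
    return found


# Made-up wikitext for a page assumed to be mapped to Entrez Gene 334.
sample = "The protein takes part in [[Endocytosis|uptake]] at the [[cell membrane]]."
assertions = candidate_assertions(334, sample)
```

Each pair is only a candidate: an exact name match says the page mentions the concept, not that the gene has that function, so candidates still need review.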
Gene Wiki content improves enrichment analysis
[Figure: p-value (PubMed only) vs. p-value (PubMed + GW); regions marked “More significant PubMed + GW” and “More significant PubMed only”; highlighted term: Muscle contraction]
Good BM et al., BMC Genomics, 2011
Making the Gene Wiki more computable
[Diagram labels: Structured annotations, Free text, Analyses]
Expansion through outreach and incentives
SP-A1, SP-A2, KIF11, LIG3, MIR155, EPHX2
Cardiovascular Gene Wiki Portal
• CAMK2D -- CaM kinase II subunit delta
• CSRP3 -- Cysteine and glycine-rich protein 3
• GJA1 -- Gap junction alpha-1 protein / Connexin-43
• MAPK14 -- Mitogen-activated protein kinase 14 / p38-α
• MYL7 -- Myosin regulatory light chain 2, atrial isoform
• MYL2 -- Myosin regulatory light chain 2, ventricular/cardiac isoform
• PECAM1 -- Platelet endothelial cell adhesion molecule/CD31
• RYR2 -- Ryanodine receptor 2
• ATP2A2 -- Sarcoplasmic/endoplasmic reticulum calcium ATPase 2 / SERCA2
• TNNI3 -- Troponin I, cardiac muscle
• TNNT2 -- Troponin T, cardiac muscle
Peipei Ping
UCLA
The
Long Tail of scientists
is a valuable source of
information on gene
function
From crowdsourcing to structured data
The Gene Wiki
Citizen Science
Gene databases are numerous and overlapping
… and hundreds
more …
Why is there so much redundancy?
[Diagram labels: Users, Requests, Resources, Time, Community development]
BioGPS emphasizes community extensibility
Why do developers define the gene report view?
BioGPS emphasizes user customizability
http://biogps.org
Community extensibility and user customizability
[Diagram labels: Utility, Users, Contributors]
Utility: A simple and universal plugin interface
Total of > 540 gene-centric online
databases registered as BioGPS plugins
Users: BioGPS has critical mass
• > 6400 registered users
• 14,000 unique visitors per month
• 155,000 page views per month
Top 10 organizations
1. Harvard
2. NIH
3. UCSD
4. Scripps
5. MIT
6. Cambridge
7. U Penn
8. Stanford
9. Wash U
10. UNC
[Figure: Daily pageviews]
Contributors: Explicit and implicit knowledge
540 plugins registered
(>300 publicly shared)
by over 120 users
spanning 280+ domains
Gene Annotation Query as a Service
http://mygene.info
• High performance
• 3M hits/month
• Highly scalable
• 13k species
• 16M genes
• Weekly data updates
• JSON output
• REST interface
• Python/R/JS libraries
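As a rough illustration of the REST interface and JSON output listed above, a query can be built with the Python standard library alone. This is a sketch under assumptions: the "v3/query" path and the "q", "species" and "fields" parameters are not stated on the slide and reflect how the public service is commonly called, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the versioned query path is not given on the slide.
BASE_URL = "http://mygene.info/v3/query"


def build_query_url(term, species="human", fields=("symbol", "name", "entrezgene")):
    """Build a gene query URL; parameter names are assumptions about the service."""
    params = {"q": term, "species": species, "fields": ",".join(fields)}
    return BASE_URL + "?" + urlencode(params)


def query_genes(term, **kwargs):
    """Fetch and decode the JSON response (needs network access)."""
    with urlopen(build_query_url(term, **kwargs)) as response:
        return json.load(response)


url = build_query_url("symbol:cdk2")
```

Calling `query_genes("symbol:cdk2")` would send the request; the Python/R/JS libraries mentioned on the slide wrap the same kind of call.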
The
Long Tail of
bioinformaticians
can collaboratively
build a gene portal.
From crowdsourcing to structured data
The Gene Wiki
Citizen Science
The biomedical literature is growing fast
[Figure: Number of new PubMed-indexed articles per year, 1983 to 2013; y-axis from 0 to 1,200,000]
Information Extraction
1. Find mentions of high-level concepts in text
2. Map mentions to specific terms in ontologies
3. Identify relationships between concepts
Disease mentions in PubMed abstracts
NCBI Disease corpus
• 793 PubMed abstracts
• (100 development, 593 training, 100 test)
• 12 expert annotators (2 annotate each abstract)
6,900 “disease” mentions
Doğan, Rezarta, and Zhiyong Lu. "An improved corpus of disease mentions in
PubMed citations." Proceedings of the 2012 Workshop on Biomedical Natural
Language Processing. Association for Computational Linguistics.
Four types of disease mentions
Specific Disease:
• “Diastrophic dysplasia”
Disease Class:
• “Cancers”
Composite Mention:
• “prostatic , skin , and lung cancer”
Modifier:
• ..the “familial breast cancer” gene , BRCA2..
Doğan, Rezarta, and Zhiyong Lu. "An improved corpus of disease mentions in
PubMed citations." Proceedings of the 2012 Workshop on Biomedical Natural
Language Processing. Association for Computational Linguistics.
Question: Can a group of non-scientists
collectively perform concept recognition in
biomedical texts?
The Turk
http://en.wikipedia.org/wiki/The_Turk
Amazon Mechanical Turk (AMT)
Requester
For each task, specify:
• a qualification test
• how many workers per task
• how much we will pay per task
Amazon
Manages:
• parallel execution of jobs
• worker access to tasks via qualification tests
• payments
• task advertising
Workers
1. Create tasks
2. Execute
3. Aggregate
Instructions to workers
• Highlight all diseases and disease abbreviations
  – “...are associated with Huntington disease ( HD )... HD patients received...”
  – “The Wiskott-Aldrich syndrome ( WAS ) , an X-linked immunodeficiency…”
• Highlight the longest span of text specific to a disease
  – “... contains the insulin-dependent diabetes mellitus locus …”
• Highlight disease conjunctions as single, long spans.
  – “... a significant fraction of familial breast and ovarian cancer , but undergoes…”
• Highlight symptoms - physical results of having a disease
  – “XFE progeroid syndrome can cause dwarfism, cachexia, and microcephaly. Patients often display learning disabilities, hearing loss, and visual impairment.”
Qualification test
Test #1: “Myotonic dystrophy ( DM ) is associated with a ( CTG ) n
trinucleotide repeat expansion in the 3′-untranslated region of a protein
kinase-encoding gene , DMPK , which maps to chromosome 19q13 . 3 . ”
Test #2: “Germline mutations in BRCA1 are responsible for most cases of
inherited breast and ovarian cancer . However , the function of the BRCA1
protein has remained elusive . As a regulated secretory protein , BRCA1
appears to function by a mechanism not previously described for tumour
suppressor gene products.”
Test #3: “We report about Dr . Kniest , who first described the condition in
1952 , and his patient , who , at the age of 50 years is severely
handicapped with short stature , restricted joint mobility , and blindness but
is mentally alert and leads an active life . This is in accordance with
molecular findings in other patients with Kniest dysplasia and…”
26 yes / no questions
Qualification test results
[Figure: Qualification test results for workers and qualified workers, with the threshold for passing marked]
33/194 passed (17%)
Simple annotation interface
[Screenshot: annotation interface, with callouts “Click to see instructions” and “Highlight disease mentions”]
Experimental design
• Task: Identify the disease mentions in the 593 abstracts from the NCBI disease corpus
– $0.06 per Human Intelligence Task (HIT)
– HIT = annotate one abstract from PubMed
– 5 workers annotate each abstract
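The worker payments implied by this design can be checked with a few lines of Python. The sketch counts only the per-HIT payments listed above; it assumes every abstract received exactly 5 paid annotations, and the variable names are hypothetical. The slides later report a total cost of $192.90, and the difference from this figure is not itemized there.

```python
from decimal import Decimal

N_ABSTRACTS = 593              # abstracts to annotate
WORKERS_PER_ABSTRACT = 5       # workers annotating each abstract
PAY_PER_HIT = Decimal("0.06")  # dollars per Human Intelligence Task

# One HIT assignment per (abstract, worker) pair.
n_assignments = N_ABSTRACTS * WORKERS_PER_ABSTRACT

# Payments to workers only; platform fees or bonuses are not modeled.
worker_pay = n_assignments * PAY_PER_HIT
```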
Aggregation function based on simple voting
[Figure: The same abstract shown with the spans highlighted by at least K of 5 workers, for K=1 (1 or more votes), K=2, K=3 and K=4]
This molecule inhibits the growth of a broad
panel of cancer cell lines, and is particularly
efficacious in leukemia cells, including
orthotopic leukemia preclinical models as
well as in ex vivo acute myeloid leukemia
(AML) and chronic lymphocytic leukemia
(CLL) patient tumor samples. Thus, inhibition
of CDK9 may represent an interesting
approach as a cancer therapeutic target
especially in hematologic malignancies.
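A minimal Python sketch of the simple voting rule on this slide. It assumes each worker's highlights are stored as (start, end) character offsets and that two highlights count as the same span only when the offsets match exactly; the slides do not say how spans are compared. The function name and sample data are hypothetical.

```python
from collections import Counter


def aggregate_spans(worker_annotations, k):
    """Keep each (start, end) span that at least k workers highlighted."""
    votes = Counter()
    for spans in worker_annotations:
        votes.update(set(spans))  # one vote per worker per span
    return sorted(span for span, count in votes.items() if count >= k)


# Five hypothetical workers annotating one abstract.
annotations = [
    [(0, 8), (20, 28)],
    [(0, 8)],
    [(0, 8), (20, 28), (40, 45)],
    [(20, 28)],
    [],
]

loose = aggregate_spans(annotations, k=1)   # any single vote is enough
strict = aggregate_spans(annotations, k=4)  # near-unanimous spans only
```

Raising K trades recall for precision: a low threshold keeps spans that only one worker marked, a high threshold keeps only spans most workers agree on.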
Comparison to gold standard
F = 0.81, k = 2, N = 5
• 593 documents
• 7 days
• 17 workers
• $192.90
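For reference, a score like the F value above can be computed from a set of predicted spans and a set of gold-standard spans. The sketch below assumes F is the balanced F1 measure and that a predicted span counts as correct only on an exact match; neither detail is stated on the slide, and the function name and sample spans are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall and F1 between two collections of spans."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical spans: 2 of 3 predictions are correct, 2 of 4 gold spans are found.
predicted_spans = [(0, 8), (20, 28), (40, 45)]
gold_spans = [(0, 8), (20, 28), (60, 70), (80, 90)]
precision, recall, f1 = precision_recall_f1(predicted_spans, gold_spans)
```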
Comparisons to text-mining algorithms
Comparisons to human annotators
Average level of agreement between expert annotators (stage 1): F = 0.76
Comparisons to human annotators
Average level of agreement between expert annotators (stage 1): F = 0.76
Average level of agreement between expert annotators (stage 2): F = 0.87
In aggregate, our worker
ensemble is faster, cheaper
and as accurate as a single
expert annotator for disease
concept recognition.
Information Extraction
1. Find mentions of high-level concepts in text
2. Map mentions to specific terms in ontologies
3. Identify relationships between concepts
Annotating the relationships
This molecule inhibits the growth of a broad
panel of cancer cell lines, and is particularly
efficacious in leukemia cells, including
orthotopic leukemia preclinical models as
well as in ex vivo acute myeloid leukemia
(AML) and chronic lymphocytic leukemia
(CLL) patient tumor samples. Thus, inhibition
of CDK9 may represent an interesting
approach as a cancer therapeutic target
especially in hematologic malignancies.
[Diagram: relationship annotated as subject (GENE), predicate (therapeutic target), object (DISEASE)]
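One way to hold such a relationship as structured data is a subject-predicate-object record. The Python sketch below is only an illustration: the class names are hypothetical, and the choice of "CDK9" and "hematologic malignancies" as the two entities is an assumption read from the example abstract, not something the slide spells out.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    text: str  # the mention as it appears in the abstract
    kind: str  # concept type, e.g. "GENE" or "DISEASE"


class Relation(NamedTuple):
    subject: Entity
    predicate: str
    obj: Entity


# Assumed reading of the example abstract, for illustration only.
relation = Relation(
    subject=Entity("CDK9", "GENE"),
    predicate="therapeutic target",
    obj=Entity("hematologic malignancies", "DISEASE"),
)
```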
Citizen Science at Mark2Cure.org
The
Long Tail of
citizen scientists
can collaboratively
annotate biomedical
text.
Collaborators
Doug Howe, ZFIN
John Hogenesch, U Penn
Jon Huss, GNF
Luca de Alfaro, UCSC
Angel Pizzaro, U Penn
Faramarz Valafar, SDSU
Pierre Lindenbaum, Fondation Jean Dausset
Michael Martone, Rush
Konrad Koehler, Karo Bio
Warren Kibbe, Simon Lim, Northwestern
Lynn Schriml, U Maryland
Paul Pavlidis, U British Columbia
Peipei Ping, UCLA
Many Wikipedia editors
WP:MCB Project
Group members
Katie Fisch
Karthik Gangavarapu
Louis Gioia
Ben Good
Salvatore Loguercio
Adam Mark
Max Nanis
Ginger Tseung
Chunlei Wu
Key group alumni
Adriel Carolino
Erik Clarke
Jon Huss
Marc Leglise
Maximilian Ludvigsson
Ian MacLeod
Camilo Orozco
Contact
http://sulab.org
asu@scripps.edu
@andrewsu
+Andrew Su
Citizen Science logo based on http://thenounproject.com/term/teamwork/39543/
Funding and Support
(BioGPS: GM83924, Gene Wiki: GM089820, DA036134)
Related AMT work
• [1] Zhai et al. 2013 used a similar protocol to tag medication names in clinical trial descriptions. F = 0.88 compared to gold standard.
• [2] Burger et al. used microtask workers to identify relationships between genes and mutations.
• [3] Aroyo & Welty used workers to identify relations between concepts in medical text.
[1] Zhai, H., et al. (2013) “Web 2.0-Based Crowdsourcing for High-Quality Gold Standard Development in Clinical Natural Language Processing.” J Med Internet Res.
[2] Burger, John, et al. (2014) “Hybrid curation of gene-mutation relations combining automated extraction and crowdsourcing.” Mitre technical report.
[3] Aroyo, Lora, and Chris Welty. (2013) “Harnessing disagreement in crowdsourcing a relation extraction gold standard.” Tech. Rep. RC25371 (WAT1304-058), IBM Research.

Más contenido relacionado

La actualidad más candente

NGS-Based Clinical Analysis
NGS-Based Clinical AnalysisNGS-Based Clinical Analysis
NGS-Based Clinical AnalysisDelaina Hawkins
 
From N=1 to N=100: What I Have Learned from Quantifying My Superorganism Body
From N=1 to N=100: What I Have Learned from Quantifying My Superorganism BodyFrom N=1 to N=100: What I Have Learned from Quantifying My Superorganism Body
From N=1 to N=100: What I Have Learned from Quantifying My Superorganism BodyLarry Smarr
 
2015 bioinformatics personal_genomics_wim_vancriekinge
2015 bioinformatics personal_genomics_wim_vancriekinge2015 bioinformatics personal_genomics_wim_vancriekinge
2015 bioinformatics personal_genomics_wim_vancriekingeProf. Wim Van Criekinge
 
Cancer genome databases & Ecological databases
Cancer genome databases & Ecological databases Cancer genome databases & Ecological databases
Cancer genome databases & Ecological databases Waliullah Wali
 
20141218 Methylation Sequencing Analysis
20141218  Methylation Sequencing Analysis20141218  Methylation Sequencing Analysis
20141218 Methylation Sequencing AnalysisYi-Feng Chang
 
The key considerations of crispr genome editing
The key considerations of crispr genome editingThe key considerations of crispr genome editing
The key considerations of crispr genome editingChris Thorne
 
Using Supercomputers to Discover the 100 Trillion Bacteria Living Within Each...
Using Supercomputers to Discover the 100 Trillion Bacteria Living Within Each...Using Supercomputers to Discover the 100 Trillion Bacteria Living Within Each...
Using Supercomputers to Discover the 100 Trillion Bacteria Living Within Each...Larry Smarr
 
Robert Pesich_PAVA_Stanford Resume v. 8_22_16
Robert Pesich_PAVA_Stanford Resume v. 8_22_16Robert Pesich_PAVA_Stanford Resume v. 8_22_16
Robert Pesich_PAVA_Stanford Resume v. 8_22_16Robert Pesich
 
Using Citizen Science to organize biomedical knowledge
Using Citizen Science to organize biomedical knowledgeUsing Citizen Science to organize biomedical knowledge
Using Citizen Science to organize biomedical knowledgeAndrew Su
 
Hao Liu Resume 2017-02
Hao Liu Resume 2017-02Hao Liu Resume 2017-02
Hao Liu Resume 2017-02Hao Liu
 
NetBioSIG2013-Talk Thomas Kelder
NetBioSIG2013-Talk Thomas KelderNetBioSIG2013-Talk Thomas Kelder
NetBioSIG2013-Talk Thomas KelderAlexander Pico
 
Python meetup 2014
Python meetup 2014Python meetup 2014
Python meetup 2014eilosei
 
Human Genome Project
Human Genome ProjectHuman Genome Project
Human Genome Projectkhamere
 
Quantifying the Time Progression of the Interaction of the Human Immune Syste...
Quantifying the Time Progression of the Interaction of the Human Immune Syste...Quantifying the Time Progression of the Interaction of the Human Immune Syste...
Quantifying the Time Progression of the Interaction of the Human Immune Syste...Larry Smarr
 
Errors and Limitaions of Next Generation Sequencing
Errors and Limitaions of Next Generation SequencingErrors and Limitaions of Next Generation Sequencing
Errors and Limitaions of Next Generation SequencingNixon Mendez
 

La actualidad más candente (20)

NGS-Based Clinical Analysis
NGS-Based Clinical AnalysisNGS-Based Clinical Analysis
NGS-Based Clinical Analysis
 
From N=1 to N=100: What I Have Learned from Quantifying My Superorganism Body
From N=1 to N=100: What I Have Learned from Quantifying My Superorganism BodyFrom N=1 to N=100: What I Have Learned from Quantifying My Superorganism Body
From N=1 to N=100: What I Have Learned from Quantifying My Superorganism Body
 
2015 bioinformatics personal_genomics_wim_vancriekinge
2015 bioinformatics personal_genomics_wim_vancriekinge2015 bioinformatics personal_genomics_wim_vancriekinge
2015 bioinformatics personal_genomics_wim_vancriekinge
 
Cancer genome databases & Ecological databases
Cancer genome databases & Ecological databases Cancer genome databases & Ecological databases
Cancer genome databases & Ecological databases
 
20141218 Methylation Sequencing Analysis
20141218  Methylation Sequencing Analysis20141218  Methylation Sequencing Analysis
20141218 Methylation Sequencing Analysis
 
The key considerations of crispr genome editing
The key considerations of crispr genome editingThe key considerations of crispr genome editing
The key considerations of crispr genome editing
 
Using Supercomputers to Discover the 100 Trillion Bacteria Living Within Each...
Using Supercomputers to Discover the 100 Trillion Bacteria Living Within Each...Using Supercomputers to Discover the 100 Trillion Bacteria Living Within Each...
Using Supercomputers to Discover the 100 Trillion Bacteria Living Within Each...
 
Robert Pesich_PAVA_Stanford Resume v. 8_22_16
Robert Pesich_PAVA_Stanford Resume v. 8_22_16Robert Pesich_PAVA_Stanford Resume v. 8_22_16
Robert Pesich_PAVA_Stanford Resume v. 8_22_16
 
Using Citizen Science to organize biomedical knowledge
Using Citizen Science to organize biomedical knowledgeUsing Citizen Science to organize biomedical knowledge
Using Citizen Science to organize biomedical knowledge
 
2015 04 22_time_labs_shared
2015 04 22_time_labs_shared2015 04 22_time_labs_shared
2015 04 22_time_labs_shared
 
20150115_JQO_NYAPopulationGenomics
20150115_JQO_NYAPopulationGenomics20150115_JQO_NYAPopulationGenomics
20150115_JQO_NYAPopulationGenomics
 
Hao Liu Resume 2017-02
Hao Liu Resume 2017-02Hao Liu Resume 2017-02
Hao Liu Resume 2017-02
 
NetBioSIG2013-Talk Thomas Kelder
NetBioSIG2013-Talk Thomas KelderNetBioSIG2013-Talk Thomas Kelder
NetBioSIG2013-Talk Thomas Kelder
 
Mi rvar
Mi rvarMi rvar
Mi rvar
 
Python meetup 2014
Python meetup 2014Python meetup 2014
Python meetup 2014
 
2015 03 13_puurs_v_public
2015 03 13_puurs_v_public2015 03 13_puurs_v_public
2015 03 13_puurs_v_public
 
Human Genome Project
Human Genome ProjectHuman Genome Project
Human Genome Project
 
Mohit_CV
Mohit_CVMohit_CV
Mohit_CV
 
Quantifying the Time Progression of the Interaction of the Human Immune Syste...
Quantifying the Time Progression of the Interaction of the Human Immune Syste...Quantifying the Time Progression of the Interaction of the Human Immune Syste...
Quantifying the Time Progression of the Interaction of the Human Immune Syste...
 
Errors and Limitaions of Next Generation Sequencing
Errors and Limitaions of Next Generation SequencingErrors and Limitaions of Next Generation Sequencing
Errors and Limitaions of Next Generation Sequencing
 

Destacado

ScholarMate - A social research management tool
ScholarMate - A social research management toolScholarMate - A social research management tool
ScholarMate - A social research management toolJing Wang
 
social media cafe / organize your author identities
 social media cafe / organize your author identities social media cafe / organize your author identities
social media cafe / organize your author identitiesHugo Besemer
 
ScholarMate科研之友: 科研社交媒体推广平台
ScholarMate科研之友: 科研社交媒体推广平台ScholarMate科研之友: 科研社交媒体推广平台
ScholarMate科研之友: 科研社交媒体推广平台Jing Wang
 
Social media and altmetrics for scientists
Social media and altmetrics for scientistsSocial media and altmetrics for scientists
Social media and altmetrics for scientistsWouter Gerritsma
 
How to increase your h index and paper citation
How to increase your h index and paper citation How to increase your h index and paper citation
How to increase your h index and paper citation zwentang
 
Academic social networking sites
Academic social networking sitesAcademic social networking sites
Academic social networking sitesKaty Jordan
 
A Remedy for Death--Playing God with Body, Soul & Bio-tech. A science technot...
A Remedy for Death--Playing God with Body, Soul & Bio-tech. A science technot...A Remedy for Death--Playing God with Body, Soul & Bio-tech. A science technot...
A Remedy for Death--Playing God with Body, Soul & Bio-tech. A science technot...Michael McGaulley
 
Ebola virus November 2014- A final update?
Ebola virus November 2014- A final update?Ebola virus November 2014- A final update?
Ebola virus November 2014- A final update?VAIBHAV RAJHANS
 
general organization and characterstics of virus
general organization and characterstics of virusgeneral organization and characterstics of virus
general organization and characterstics of virusMohd Asif Kanth
 
The Future of Research (Science and Technology)
The Future of Research (Science and Technology)The Future of Research (Science and Technology)
The Future of Research (Science and Technology)Duncan Hull
 
Interesting facts about Mars
Interesting facts about MarsInteresting facts about Mars
Interesting facts about MarsStefan Andrei
 
The role and importance of social media in science
The role and importance of social media in science The role and importance of social media in science
The role and importance of social media in science Jari Laru
 
Earth images from space 2014 (2014年 太空拍的地球照片)
Earth images from space 2014 (2014年 太空拍的地球照片)Earth images from space 2014 (2014年 太空拍的地球照片)
Earth images from space 2014 (2014年 太空拍的地球照片)Chung Yen Chang
 
How to set up your Google Scholar profile (Google Scholar Citations)
How to set up your Google Scholar profile (Google Scholar Citations)How to set up your Google Scholar profile (Google Scholar Citations)
How to set up your Google Scholar profile (Google Scholar Citations)SarahG_SS
 
Communicating Science (or anything else) Online
Communicating Science (or anything else) OnlineCommunicating Science (or anything else) Online
Communicating Science (or anything else) OnlineGary Schroeder
 
UFOs: The Reality of the Impossible
UFOs: The Reality of the ImpossibleUFOs: The Reality of the Impossible
UFOs: The Reality of the ImpossibleMichael Hughes
 

Destacado (20)

SaaS: Science as a Service
SaaS: Science as a Service SaaS: Science as a Service
SaaS: Science as a Service
 
ScholarMate - A social research management tool
ScholarMate - A social research management toolScholarMate - A social research management tool
ScholarMate - A social research management tool
 
social media cafe / organize your author identities
 social media cafe / organize your author identities social media cafe / organize your author identities
social media cafe / organize your author identities
 
ScholarMate科研之友: 科研社交媒体推广平台
ScholarMate科研之友: 科研社交媒体推广平台ScholarMate科研之友: 科研社交媒体推广平台
ScholarMate科研之友: 科研社交媒体推广平台
 
GODAN action wp1
GODAN action wp1GODAN action wp1
GODAN action wp1
 
Social media and altmetrics for scientists
Social media and altmetrics for scientistsSocial media and altmetrics for scientists
Social media and altmetrics for scientists
 
How to increase your h index and paper citation
How to increase your h index and paper citation How to increase your h index and paper citation
How to increase your h index and paper citation
 
Academic social networking sites
Academic social networking sitesAcademic social networking sites
Academic social networking sites
 
A Remedy for Death--Playing God with Body, Soul & Bio-tech. A science technot...
A Remedy for Death--Playing God with Body, Soul & Bio-tech. A science technot...A Remedy for Death--Playing God with Body, Soul & Bio-tech. A science technot...
A Remedy for Death--Playing God with Body, Soul & Bio-tech. A science technot...
 
Ebola virus November 2014- A final update?
Ebola virus November 2014- A final update?Ebola virus November 2014- A final update?
Ebola virus November 2014- A final update?
 
general organization and characterstics of virus
general organization and characterstics of virusgeneral organization and characterstics of virus
general organization and characterstics of virus
 
Vírus
VírusVírus
Vírus
 
The Future of Research (Science and Technology)
The Future of Research (Science and Technology)The Future of Research (Science and Technology)
The Future of Research (Science and Technology)
 
Interesting facts about Mars
Interesting facts about MarsInteresting facts about Mars
Interesting facts about Mars
 
The role and importance of social media in science
The role and importance of social media in science The role and importance of social media in science
The role and importance of social media in science
 
Earth images from space 2014 (2014年 太空拍的地球照片)
Earth images from space 2014 (2014年 太空拍的地球照片)Earth images from space 2014 (2014年 太空拍的地球照片)
Earth images from space 2014 (2014年 太空拍的地球照片)
 
How to set up your Google Scholar profile (Google Scholar Citations)
How to set up your Google Scholar profile (Google Scholar Citations)How to set up your Google Scholar profile (Google Scholar Citations)
How to set up your Google Scholar profile (Google Scholar Citations)
 
Communicating Science (or anything else) Online
Communicating Science (or anything else) OnlineCommunicating Science (or anything else) Online
Communicating Science (or anything else) Online
 
UFOs: The Reality of the Impossible
UFOs: The Reality of the ImpossibleUFOs: The Reality of the Impossible
UFOs: The Reality of the Impossible
 
Ebola virus
Ebola virusEbola virus
Ebola virus
 

Similar a Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen Science

Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.orgCrowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.orgAndrew Su
 
UCSD / DBMI seminar 2015-02-6
UCSD / DBMI seminar 2015-02-6UCSD / DBMI seminar 2015-02-6
UCSD / DBMI seminar 2015-02-6Andrew Su
 
2015 6 bd2k_biobranch_knowbio
2015 6 bd2k_biobranch_knowbio2015 6 bd2k_biobranch_knowbio
2015 6 bd2k_biobranch_knowbioBenjamin Good
 
Functional annotation of invertebrate genomes
Functional annotation of invertebrate genomesFunctional annotation of invertebrate genomes
Functional annotation of invertebrate genomesSurya Saha
 
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...Andrew Su
 
Quantified Self On Being A Personal Genomic Observatory
Quantified Self On Being A Personal Genomic ObservatoryQuantified Self On Being A Personal Genomic Observatory
Quantified Self On Being A Personal Genomic ObservatoryLarry Smarr
 
01. Introduction to Bioinformatics.pptx
01. Introduction to Bioinformatics.pptx01. Introduction to Bioinformatics.pptx
01. Introduction to Bioinformatics.pptxHussainTaqi1
 
Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Ian Foster
 
Next Gen Sequencing and Associated Big Data / AI problem
Next Gen Sequencing and Associated Big Data / AI problemNext Gen Sequencing and Associated Big Data / AI problem
Next Gen Sequencing and Associated Big Data / AI problemSubhendu Dey
 
Introduction to Gene Mining Part A: BLASTn-off!
Introduction to Gene Mining Part A: BLASTn-off!Introduction to Gene Mining Part A: BLASTn-off!
Introduction to Gene Mining Part A: BLASTn-off!adcobb
 
Microbiome Isolation and DNA Enrichment Protocol: Pathogen Detection Webinar ...
Microbiome Isolation and DNA Enrichment Protocol: Pathogen Detection Webinar ...Microbiome Isolation and DNA Enrichment Protocol: Pathogen Detection Webinar ...
Microbiome Isolation and DNA Enrichment Protocol: Pathogen Detection Webinar ...QIAGEN
 
Microtask crowdsourcing for disease mention annotation in PubMed abstracts
Microtask crowdsourcing for disease mention annotation in PubMed abstractsMicrotask crowdsourcing for disease mention annotation in PubMed abstracts
Microtask crowdsourcing for disease mention annotation in PubMed abstractsBenjamin Good
 
Microbiome Profiling with the Microbial Genomics Pro Suite
Microbiome Profiling with the Microbial Genomics Pro SuiteMicrobiome Profiling with the Microbial Genomics Pro Suite
Microbiome Profiling with the Microbial Genomics Pro SuiteQIAGEN
 
Introduction to Bioinformatics.
 Introduction to Bioinformatics. Introduction to Bioinformatics.
Introduction to Bioinformatics.Elena Sügis
 
From Genomics to Medicine: Advancing Healthcare at Scale
From Genomics to Medicine: Advancing Healthcare at ScaleFrom Genomics to Medicine: Advancing Healthcare at Scale
From Genomics to Medicine: Advancing Healthcare at ScaleDatabricks
 
UKSG 2023 - Will artificial intelligence change how readers use the research ...
UKSG 2023 - Will artificial intelligence change how readers use the research ...UKSG 2023 - Will artificial intelligence change how readers use the research ...
UKSG 2023 - Will artificial intelligence change how readers use the research ...UKSG: connecting the knowledge community
 

Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen Science

  • 1. Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org. Andrew Su, Ph.D. (@andrewsu, asu@scripps.edu, http://sulab.org). May 14, 2014, CBIIT. Slides: slideshare.net/andrewsu. Citizen Science!
  • 2. Few genes are well annotated… [Figure: GO annotation counts for 20,473 protein-coding genes, sorted by decreasing counts; labeled genes: CTNNB1, VEGFA, SIRT1, FGFR2, TGFB1, TP53, MEF2C, BMP4, LEF1, WNT5A, TNF; callouts: 41%, 65%. Data: NCBI, February 2013]
  • 3. … because the literature is sparsely curated? [Figure: number of new PubMed-indexed articles, 1983 to 2013; y-axis from 0 to 1,200,000]
  • 4. … because the literature is sparsely curated? [Figure: average capacity of human scientist, 1983 to 2013; y-axis from 0 to 40]
  • 5. 311,696 articles (1.5% of PubMed) have been cited by GO annotations
  • 6. Sooner or later, the research community will need to be involved in the annotation effort to scale up to the rate of data generation.
  • 7. The Long Tail is a prolific source of content [Figure: content produced vs. contributors (sorted), with a Short Head and a Long Tail] Short Head vs. Long Tail examples: • News: Newspapers vs. Blogs • Video: TV/Hollywood vs. YouTube • Product reviews: Consumer reports vs. Amazon reviews • Food reviews: Food critics vs. Yelp • Talent judging: Olympics vs. American Idol
  • 9. Wikipedia has breadth and depth [Figure: articles and words (millions), Wikipedia vs. Britannica Online. Source: http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons, July 2008]
  • 10. We can harness the Long Tail of scientists to directly participate in the gene annotation process.
  • 11. From crowdsourcing to structured data: The Gene Wiki, Citizen Science
  • 12. Filtering, extracting, and summarizing PubMed [Diagram labels: Documents, Concepts, Review article]
  • 13. Filtering, extracting, and summarizing PubMed [Diagram labels: Documents, Concepts]
  • 14. Wiki success depends on a positive feedback [Diagram labels: Gene wiki page utility, Number of users, Number of contributors]
  • 15. 10,000 gene “stubs” within Wikipedia [Page elements: Protein structure; Symbols and identifiers; Tissue expression pattern; Gene Ontology annotations; Links to structured databases; Gene summary; Protein interactions; Linked references] (Huss, PLoS Biol, 2008) [Feedback diagram: Utility, Users, Contributors]
  • 16. Gene Wiki has a critical mass of readers. Total: 4.0 million views / month (Huss, PLoS Biol, 2008; Good, NAR, 2011)
  • 17. Gene Wiki has a critical mass of editors • Increase of ~10,000 words / month from >1,000 edits • Currently 1.42 million words • Approximately equal to 230 full-length articles (Good, NAR, 2011) [Figure labels: Editors, Editor count, Edits, Edit count]
  • 18. A review article for every gene is powerful • References to the literature • Hyperlinks to related concepts • Reelin: 98 editors, 703 edits since July 2002 • Heparin: 358 editors, 654 edits since June 2003 • AMPK: 109 editors, 203 edits since March 2004 • RNAi: 394 editors, 994 edits since October 2002
  • 19. Making the Gene Wiki more computable [Labels: Free text, Structured annotations]
  • 20. Filling the gaps in gene annotation [Diagram: Gene Wiki mapping to NCBI Entrez Gene: 334; Wikilink with GO exact match: GO:0006897; result: Candidate assertion] • 6319 novel GO annotations • 2147 novel DO annotations
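The mining step on slide 20 could be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes a Gene Wiki page has already been mapped to an NCBI Entrez Gene ID and that its wikilinks are available as a list of link titles. The function name `candidate_annotations`, the toy `GO_TERMS` dictionary, and the case-insensitive matching rule are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Toy lookup from GO term name to GO ID (a hand-picked subset for
# illustration; a real run would load the full ontology).
GO_TERMS: Dict[str, str] = {
    "endocytosis": "GO:0006897",
    "apoptotic process": "GO:0006915",
}


def candidate_annotations(
    entrez_gene_id: int,
    wikilinks: Iterable[str],
    go_terms: Dict[str, str] = GO_TERMS,
) -> List[Tuple[int, str]]:
    """Return (gene ID, GO ID) pairs for wikilinks that exactly match a GO term name.

    Matching is case-insensitive and ignores surrounding whitespace
    (an assumption of this sketch). Each GO ID is reported once per gene.
    """
    seen = set()
    candidates = []
    for link in wikilinks:
        go_id = go_terms.get(link.strip().lower())
        if go_id is not None and go_id not in seen:
            seen.add(go_id)
            candidates.append((entrez_gene_id, go_id))
    return candidates
```

Calling `candidate_annotations(334, ["Endocytosis", "Protein"])` yields the single candidate assertion `(334, "GO:0006897")`, mirroring the example on the slide. Candidates produced this way would still need review before being treated as annotations.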
  • 21. Gene Wiki content improves enrichment analysis [Figure: p-value (PubMed only) vs. p-value (PubMed + GW); regions labeled “More significant PubMed + GW” and “More significant PubMed only”; labeled term: Muscle contraction] (Good BM et al., BMC Genomics, 2011)
  • 22. Making the Gene Wiki more computable [Labels: Free text, Structured annotations, Analyses]
  • 23. Expansion through outreach and incentives: SP-A1, SP-A2, KIF11, LIG3, MIR155, EPHX2
  • 24. Cardiovascular Gene Wiki Portal • CAMK2D -- CaM kinase II subunit delta • CSRP3 -- Cysteine and glycine-rich protein 3 • GJA1 -- Gap junction alpha-1 protein / Connexin-43 • MAPK14 -- Mitogen-activated protein kinase 14 / p38-α • MYL7 -- Myosin regulatory light chain 2, atrial isoform • MYL2 -- Myosin regulatory light chain 2, ventricular/cardiac isoform • PECAM1 -- Platelet endothelial cell adhesion molecule/CD31 • RYR2 -- Ryanodine receptor 2 • ATP2A2 -- Sarcoplasmic/endoplasmic reticulum calcium ATPase 2 / SERCA2 • TNNI3 -- Troponin I, cardiac muscle • TNNT2 -- Troponin T, cardiac muscle (Peipei Ping, UCLA)
  • 25. The Long Tail of scientists is a valuable source of information on gene function
  • 26. From crowdsourcing to structured data: The Gene Wiki, Citizen Science
  • 27. Gene databases are numerous and overlapping … and hundreds more …
  • 28. Why is there so much redundancy? [Diagram labels: Users, Requests, Resources, Time, Community development] BioGPS emphasizes community extensibility
  • 29. Why do developers define the gene report view? BioGPS emphasizes user customizability
  • 31–35. Utility: A simple and universal plugin interface
  • 36. Utility: A simple and universal plugin interface. Total of > 540 gene-centric online databases registered as BioGPS plugins
  • 37. Users: BioGPS has critical mass • > 6400 registered users • 14,000 unique visitors per month • 155,000 page views per month • Top 10 organizations: 1. Harvard 2. NIH 3. UCSD 4. Scripps 5. MIT 6. Cambridge 7. U Penn 8. Stanford 9. Wash U 10. UNC [Figure: daily pageviews]
  • 38. Contributors: Explicit and implicit knowledge. 540 plugins registered (>300 publicly shared) by over 120 users spanning 280+ domains
  • 39. Gene Annotation Query as a Service (http://mygene.info) • High performance • 3M hits/month • Highly scalable • 13k species • 16M genes • Weekly data updates • JSON output • REST interface • Python/R/JS libraries
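As an illustration of the REST interface and JSON output listed on slide 39, here is a minimal sketch using only the Python standard library. The `/v3/query` path and the `q`, `species`, `fields`, and `size` parameters follow the service's current public API as best understood here, and may differ from the version available when these slides were given. The helper names `build_query_url` and `parse_hits` are hypothetical, and the sample response in the usage note is hand-written rather than captured from the service.

```python
import json
from urllib.parse import urlencode

# Assumption: base path of the current public API version.
BASE_URL = "https://mygene.info/v3"


def build_query_url(term, species="human", fields="symbol,name,entrezgene", size=5):
    """Build a gene query URL; the parameter names are assumed from the public API."""
    params = {"q": term, "species": species, "fields": fields, "size": size}
    return f"{BASE_URL}/query?{urlencode(params)}"


def parse_hits(response_text):
    """Extract (id, symbol, name) tuples from a JSON query response body."""
    payload = json.loads(response_text)
    return [
        (hit.get("_id"), hit.get("symbol"), hit.get("name"))
        for hit in payload.get("hits", [])
    ]
```

To try it against the live service, the URL from `build_query_url("cdk2")` can be fetched with `urllib.request.urlopen` and the decoded body passed to `parse_hits`. On a hand-written body such as `{"hits": [{"_id": "1017", "symbol": "CDK2", "name": "cyclin dependent kinase 2"}]}`, `parse_hits` returns one tuple per hit.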
  • 40. The Long Tail of bioinformaticians can collaboratively build a gene portal.
  • 41. From crowdsourcing to structured data: The Gene Wiki, Citizen Science
  • 42. The biomedical literature is growing fast [Figure: number of new PubMed-indexed articles, 1983 to 2013; y-axis from 0 to 1,200,000]
  • 43. Information Extraction: 1. Find mentions of high-level concepts in text 2. Map mentions to specific terms in ontologies 3. Identify relationships between concepts
  • 44. Disease mentions in PubMed abstracts. NCBI Disease corpus: • 793 PubMed abstracts (100 development, 593 training, 100 test) • 12 expert annotators (2 annotate each abstract) • 6,900 “disease” mentions. Doğan, Rezarta, and Zhiyong Lu. "An improved corpus of disease mentions in PubMed citations." Proceedings of the 2012 Workshop on Biomedical Natural Language Processing. Association for Computational Linguistics.
  • 45. Four types of disease mentions • Specific Disease: “Diastrophic dysplasia” • Disease Class: “Cancers” • Composite Mention: “prostatic , skin , and lung cancer” • Modifier: ..the “familial breast cancer” gene , BRCA2.. Doğan, Rezarta, and Zhiyong Lu. "An improved corpus of disease mentions in PubMed citations." Proceedings of the 2012 Workshop on Biomedical Natural Language Processing. Association for Computational Linguistics.
  • 46. Question: Can a group of non-scientists collectively perform concept recognition in biomedical texts?
  • 49. Amazon Mechanical Turk (AMT). Requester: for each task, specify: • a qualification test • how many workers per task • how much we will pay per task. Amazon manages: • parallel execution of jobs • worker access to tasks via qualification tests • payments • task advertising. Workers. Steps: 1. Create tasks 2. Execute 3. Aggregate
  • 50. Instructions to workers • Highlight all diseases and disease abbreviations: “...are associated with Huntington disease ( HD )... HD patients received...”; “The Wiskott-Aldrich syndrome ( WAS ) , an X-linked immunodeficiency…” • Highlight the longest span of text specific to a disease: “... contains the insulin-dependent diabetes mellitus locus …” • Highlight disease conjunctions as single, long spans: “... a significant fraction of familial breast and ovarian cancer , but undergoes…” • Highlight symptoms (physical results of having a disease): “XFE progeroid syndrome can cause dwarfism, cachexia, and microcephaly. Patients often display learning disabilities, hearing loss, and visual impairment.”
  • 51. Qualification test 54 Test #1: “Myotonic dystrophy ( DM ) is associated with a ( CTG ) in trinucleotide repeat expansion in the 3-untranslated region of a protein kinase-encoding gene , DMPK , which maps to chromosome 19q13 . 3 . ” Test #2: “Germline mutations in BRCA1 are responsible for most cases of inherited breast and ovarian cancer . However , the function of the BRCA1 protein has remained elusive . As a regulated secretory protein , BRCA1 appears to function by a mechanism not previously described for tumour suppressor gene products.” Test #3: “We report about Dr . Kniest , who first described the condition in 1952 , and his patient , who , at the age of 50 years is severely handicapped with short stature , restricted joint mobility , and blindness but is mentally alert and leads an active life . This is in accordance with molecular findings in other patients with Kniest dysplasia and…” 26 yes / no questions
  • 52. Qualification test results 55 • 33/194 workers passed the threshold for passing (17% qualified)
  • 53. Simple annotation interface 56 Click to see instructions Highlight disease mentions
  • 54. Experimental design • Task: Identify the disease mentions in the 593 abstracts from the NCBI disease corpus – $0.06 per Human Intelligence Task (HIT) – HIT = annotate one abstract from PubMed – 5 workers annotate each abstract 57
  • 55. Aggregation function based on simple voting 58 • Example passage, shown once per vote threshold: “This molecule inhibits the growth of a broad panel of cancer cell lines, and is particularly efficacious in leukemia cells, including orthotopic leukemia preclinical models as well as in ex vivo acute myeloid leukemia (AML) and chronic lymphocytic leukemia (CLL) patient tumor samples. Thus, inhibition of CDK9 may represent an interesting approach as a cancer therapeutic target especially in hematologic malignancies.” • Thresholds shown: 1 or more votes (K=1), K=2, K=3, K=4, 5
  • 56. Comparison to gold standard 59 F = 0.81, k = 2, N = 5 • 593 documents • 7 days • 17 workers • $192.90
  • 57.
  • 58.
  • 59.
  • 60.
  • 61. Comparisons to text-mining algorithms 64
  • 62. Comparisons to human annotators 65 Average level of agreement between expert annotators (stage 1) F = 0.76
  • 63. Comparisons to human annotators 66 F = 0.76 F = 0.87 Average level of agreement between expert annotators (stage 2)
  • 64. 67 In aggregate, our worker ensemble is faster and cheaper than, and as accurate as, a single expert annotator for disease concept recognition.
  • 65. Information Extraction 68 1. Find mentions of high level concepts in text 2. Map mentions to specific terms in ontologies 3. Identify relationships between concepts
  • 66. Annotating the relationships 69 This molecule inhibits the growth of a broad panel of cancer cell lines, and is particularly efficacious in leukemia cells, including orthotopic leukemia preclinical models as well as in ex vivo acute myeloid leukemia (AML) and chronic lymphocytic leukemia (CLL) patient tumor samples. Thus, inhibition of CDK9 may represent an interesting approach as a cancer therapeutic target especially in hematologic malignancies. therapeutic target subject predicate object GENE DISEASE
  • 67. Citizen Science at Mark2Cure.org 70
  • 68. The Long Tail of citizen scientists can collaboratively annotate biomedical text. 71
  • 69. 72 Collaborators: Doug Howe, ZFIN; John Hogenesch, U Penn; Jon Huss, GNF; Luca de Alfaro, UCSC; Angel Pizzaro, U Penn; Faramarz Valafar, SDSU; Pierre Lindenbaum, Fondation Jean Dausset; Michael Martone, Rush; Konrad Koehler, Karo Bio; Warren Kibbe, Simon Lim, Northwestern; Lynn Schriml, U Maryland; Paul Pavlidis, U British Columbia; Peipei Ping, UCLA; many Wikipedia editors, WP:MCB Project • Group members: Katie Fisch, Karthik Gangavarapu, Louis Gioia, Ben Good, Salvatore Loguercio, Adam Mark, Max Nanis, Ginger Tseung, Chunlei Wu • Key group alumni: Adriel Carolino, Erik Clarke, Jon Huss, Marc Leglise, Maximilian Ludvigsson, Ian MacLeod, Camilo Orozco • Contact: http://sulab.org, asu@scripps.edu, @andrewsu, +Andrew Su • Citizen Science logo based on http://thenounproject.com/term/team work/39543/ • Funding and Support (BioGPS: GM83924, Gene Wiki: GM089820, DA036134)
  • 70. Related AMT work 73 • [1] Zhai et al. 2013 used a similar protocol to tag medication names in clinical trial descriptions. F = 0.88 compared to gold standard • [2] Burger et al. used microtask workers to identify relationships between genes and mutations. • [3] Aroyo & Welty used workers to identify relations between concepts in medical text. [1] Zhai H. et al. (2013) “Web 2.0-Based Crowdsourcing for High-Quality Gold Standard Development in Clinical Natural Language Processing.” J Med Internet Res [2] Burger, John, et al. (2014) “Hybrid curation of gene-mutation relations combining automated extraction and crowdsourcing.” Mitre technical report [3] Aroyo, Lora, and Chris Welty (2013) “Harnessing disagreement in crowdsourcing a relation extraction gold standard.” Tech. Rep. RC25371 (WAT1304-058), IBM Research
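
The simple-voting aggregation on slide 55 can be sketched in Python. This is a minimal sketch under assumptions, not the authors' implementation: each worker's highlights are represented as exact (start, end) character spans, and a span is kept when at least K of the N workers marked it. The slides do not say whether votes were counted per span or per character, and the names `aggregate_by_vote` and `worker_spans` are hypothetical.

```python
from collections import Counter


def aggregate_by_vote(worker_spans, k):
    """Keep every span that at least k workers highlighted.

    worker_spans: one set of (start, end) character offsets per worker.
    Assumption: votes are counted on exact span matches.
    """
    # Each worker contributes at most one vote per span.
    votes = Counter(span for spans in worker_spans for span in set(spans))
    return {span for span, count in votes.items() if count >= k}


# Toy data (invented for illustration): five workers, one abstract.
workers = [
    {(10, 18), (40, 62)},
    {(10, 18), (40, 62)},
    {(10, 18)},
    {(40, 62), (70, 75)},
    {(10, 18), (40, 62)},
]

consensus = aggregate_by_vote(workers, k=2)
```

With these toy inputs, K=1 keeps all three spans, while K=2 drops the span that only one worker marked.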

Editor's notes

  1. We are very early in our efforts to comprehensively annotate human gene function. Why important? Genome-scale surveys aren’t biased toward well-studied genes, so there is a huge opportunity for biomedical discovery. No IEA.
  2. If you believe that greater than 1.5% of articles have relevance to gene function, then there is a bottleneck in our curation efforts. Numbers updated 7/15/2011.
  3. For each resource: briefly describe the unstructured resource; describe the structuring approach.
  4. Relying on the entire community of scientists to digest the biomedical literature: identification, filtering, extraction, summarization.
  5. Relying on the entire community of scientists to digest the biomedical literature: identification, filtering, extraction, summarization.
  6. Structured annotations enable pathway analysis, statistical analyses, cross-species comparisons
  7. Tried on 773 GO categories, significant in 356 cases (46%)
  8. We extended this analysis to all 773 GO terms used in human gene annotations and found a consistent improvement in the enrichment scores
  9. Structured annotations enable pathway analysis, statistical analyses, cross-species comparisons
  10. Also want to convince you that the Long Tail of bioinformatics developers is valuable too, but first have to convince you that there is a bottleneck in tool development.
  11. For each resource: briefly describe the unstructured resource; describe the structuring approach.
  12. Developer resources do not scale with usage. Practical effects: core developers’ time is always the rate-limiting step; addition of new features and data always feels slow; eventually, new databases are created to fill the gap. 80% duplication for 20% innovation.
  13. MODs and portals
  14. Genetics resources
  15. Literature resources
  16. Protein resources
  17. Pathway and expression databases
  18. Pathway and expression databases
  19. Also want to convince you that the Long Tail of bioinformatics developers is valuable too, but first have to convince you that there is a bottleneck in tool development.
  20. For each resource: briefly describe the unstructured resource; describe the structuring approach.
  21. … but the amount of knowledge that is amenable to query and computation is tiny. We would like to have more efficient methods for information extraction.
  22. F is the harmonic mean of the precision and recall. Measured on the 593-abstract training corpus.
  23. On 100 development data set
  24. On 100 development data set
  25. On 100 development data set
  26. On 100 development data set
  27. Phase 1: pairs of annotators work independently on computationally pre-annotated documents. Phase 2: annotators get to see each other’s annotations and then make changes. Phase 3: all remaining inconsistencies are resolved collaboratively.
  28. Also want to convince you that the Long Tail of bioinformatics developers is valuable too, but first have to convince you that there is a bottleneck in tool development.
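
Note 22 describes the reported F score as the harmonic mean of precision and recall. Written out, with TP, FP and FN as the counts of true positives, false positives and false negatives (standard symbols, not taken from the slides):

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F = \frac{2 P R}{P + R}
```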