Is that a scientific report or just some cool
pictures from the lab? Reproducibility and
computational chemistry
Gregory Landrum Ph.D.
NIBR IT
Novartis Institutes for BioMedical Research
Basel
2013 CADD Gordon Conference, Mount Snow VT
23 July, 2013
Publishing…
Scientific publications have at least two goals: (i) to announce a result and (ii)
to convince readers that the result is correct. Mathematics papers are
expected to contain a proof complete enough to allow knowledgeable
readers to fill in any details. Papers in experimental science should describe
the results and provide a clear enough protocol to allow successful repetition
and extension.
Mesirov, J. P. Accessible Reproducible Research. Science 327,
415–416 (2010).
Outline
§  Reproducibility?
§  Requirements for reproducibility of published research
§  Practical aspects
Landrum, G. A. & Stiefl, N. Is that a scientific publication or an advertisement?
Reproducibility, source code and data in the computational chemistry literature. Future
Medicinal Chemistry 4, 1885–1887 (2012).
Reproducibility
http://en.wikipedia.org/wiki/Reproducibility
Reproducibility is the ability of an entire experiment or
study to be reproduced, either by the researcher or by
someone else working independently. It is one of the main
principles of the scientific method.
Reproducibility
An author’s central obligation is to present an accurate and complete account
of the research performed, absolutely avoiding deception, including the data
collected or used, as well as an objective discussion of the significance of the
research. Data are defined as information collected or used in generating
research conclusions. The research report and the data collected should
contain sufficient detail and reference to public sources of information to
permit a trained professional to reproduce the experimental observations.
ACS “Ethical Guidelines to Publication of Chemical Research”
Reproducibility
Experimental reproducibility is the coin of the scientific realm. The extent to
which measurements or observations agree when performed by different
individuals defines this important tenet of the scientific method. The formal
essence of experimental reproducibility was born of the philosophy of logical
positivism or logical empiricism, which purports to gain knowledge of the world
through the use of formal logic linked to observation. A key principle of logical
positivism is verificationism, which holds that every truth is verifiable by
experience. In this rational context, truth is defined by reproducible experience,
and unbiased scientific observation and determinism are its underpinnings.
…
The assumption that objectively true scientific observations must be reproducible
is implicit, yet direct tests of reproducibility are rarely found in the published
literature. This lack of published evidence of reproducibility stems from the
limited appeal of studies reproducing earlier work to most funding bodies and to
most editors. Furthermore, many readers of scientific journals— especially of
higher-impact journals—assume that if a study is of sufficient quality to pass the
scrutiny of rigorous reviewers, it must be true; this assumption is based on the
inferred equivalence of reproducibility and truth described above.
Loscalzo, J. Irreproducible Experimental Results: Causes, (Mis)interpretations,
and Consequences. Circulation 125, 1211–1214 (2012).
If it’s not reproducible science?
“Let me show you some cool pictures from my lab…”
Requirements for Reproducibility
thanks to Martin Stahl for the picture
A great start
(1) Wherever possible, source code should be provided for new computational methods. The
source code can be a reference implementation of a method or algorithm and does not need to include a
graphical interface. If it is not possible to release the source code for a new method, authors should
provide a sufficient justification. Reviewers and editors will then consider this explanation. Any paper that
does not comply with the reproducibility guidelines will include this explanation when published. In cases
where it is not possible to release code due to intellectual property or other limitations, an executable
version of the new method should be readily accessible. Commercial products should provide time limited
licenses to facilitate evaluation and comparison of published methods.
(2) Any chemical structures and data mentioned in the paper should be made available in a
commonly used (SDF or SMILES) format. Distribution of data in pdf format is not sufficient.
(3) Any publications that employ commercial or open-source software should include scripts
or parameter files as well as data files that will enable others to easily reproduce the work.
(4) A clear easy to follow description of any new method should be a key criterion during the
review process. Wherever possible, a paper should contain a simple worked example that
demonstrates the application of the method. Parameter values and intermediate results for example
compounds should be included as part of the supporting material.
(5) Reviewers should put particular emphasis on the reproducibility of the method described
in a manuscript. Each reviewer should evaluate the description of the method, as well as the presence
of associated code, data, or executables, to ensure that the results can be independently reproduced.
Walters, W. P. Modeling, Informatics, and the Quest for Reproducibility. J.
Chem. Inf. Model. (2013). doi:10.1021/ci400197w
Requirements for Reproducibility
§ Data used
§ Code/algorithm description
§ Results
Requirements for Reproducibility:
Data
As a condition of publication, authors must agree to make available all data
necessary to understand and assess the conclusions of the manuscript to
any reader of Science. Data must be included in the body of the paper or in
the supplementary materials, where they can be viewed free of charge by all
visitors to the site. Certain types of data must be deposited in an approved
online database, including DNA and protein sequences, microarray data,
crystal structures, and climate records.
http://www.sciencemag.org/site/feature/contribinfo/faq/index.xhtml#data_faq
Requirements for Reproducibility:
Data
§  This is a no-brainer, right?
§  Unless the data are completely unprocessed (or the processing is part of the
detailed method description/code), it’s better to include the actual data
§  “Ligands from PDB structures X, Y, and Z” is probably not good enough
§  For sources like ChEMBL, a version number and the SQL used to grab the data
are probably adequate
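One way to act on the last point is to store the database release and the exact query text together. The following is a minimal sketch, not a prescribed format: the release number and the target ID are placeholders, the record layout is an assumption, and the table and column names follow the public ChEMBL schema as best understood here and should be checked against the release actually used.

```python
import hashlib
import json

# Placeholder values: substitute the release and target actually used.
CHEMBL_RELEASE = "16"
QUERY = """
SELECT cs.canonical_smiles, act.standard_value, act.standard_units
FROM activities act
JOIN assays a ON act.assay_id = a.assay_id
JOIN target_dictionary td ON a.tid = td.tid
JOIN compound_structures cs ON act.molregno = cs.molregno
WHERE td.chembl_id = 'CHEMBL0000'
  AND act.standard_type = 'IC50'
ORDER BY act.activity_id
"""


def provenance_record(release, query):
    """Bundle the database release, the query text, and a hash of the query."""
    # Collapse whitespace so that reformatting the SQL does not change the hash.
    normalized = " ".join(query.split())
    return {
        "source": "ChEMBL",
        "release": release,
        "sql": normalized,
        "sql_sha256": hashlib.sha256(normalized.encode("utf-8")).hexdigest(),
    }


record = provenance_record(CHEMBL_RELEASE, QUERY)
print(json.dumps(record, indent=2))
```

A record like this can go into the supplementary material next to the extracted data, so a reader can rerun the query against the same release.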
Requirements for Reproducibility:
Data
Goodman, L., Lawrence, R. & Ashley, K. Data-set visibility: Cite links to
data in reference lists. Nature 492, 356 (2012).
A huge amount of work goes into creating data sets. It is crucial that these data,
big or small, should be more prominently linked to their associated research
articles as standard practice.
To achieve this, data can be cited directly in a publication's reference section using
a permanent identifier such as a digital object identifier (DOI; see, for example,
go.nature.com/vnyidi and go.nature.com/zdfbcl). So far, however, only very few
journals do this.
Publishers, funders, researchers and institutions all need to recognize that data
sets constitute a valuable scholarly resource. Authors should be credited for these
career-making contributions. Enhanced data-set visibility would also benefit
referees and readers by raising standards of data analysis, promoting more
detailed review, encouraging data curation and boosting reproducibility and data
reuse.
Requirements for Reproducibility:
Data
§  What about chemical structures?
•  a table with drawings of molecules?
•  names instead of structures?
§  Why not include the structures in a machine-readable format?
This expanded use of electronic resources offers an excellent opportunity to make chemical
information more accessible and user-friendly to readers of scientific papers.
To take advantage of these opportunities, we have developed several online features that expand
the usefulness of chemical compound information for Nature Chemical Biology readers … In all
original research papers, compounds that are relevant to the background or results of the paper
are assigned a bolded, Arabic numeral that serves as a unique identifier for the compound. Each
numerical abbreviation in the HTML and PDF versions of the article is linked to a Compound Data
page, which shows the structure and the IUPAC or common name of the chemical compound.
From there, readers can download a ChemDraw file of the compound…To provide readers with
rapid access to all of the chemical compounds discussed in an article, we feature a Compound
Data Index page, which is accessible from the Compound Data page, the table of contents entry
for the paper, and the navigation tools on the right side of the Nature Chemical Biology website.
http://www.nature.com/nchembio/journal/v3/n6/full/nchembio0607-297.htm
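As a rough illustration of what "machine-readable" could mean in practice, the sketch below writes numbered compounds to a tab-separated SMILES table using only the Python standard library. The three compounds are illustrative and not taken from any paper cited here; the column layout is an assumption, not a journal requirement.

```python
import csv
import io

# Illustrative compounds only: (compound number as used in the text, name, SMILES).
COMPOUNDS = [
    ("1", "ethanol", "CCO"),
    ("2", "aspirin", "CC(=O)Oc1ccccc1C(=O)O"),
    ("3", "caffeine", "Cn1cnc2c1c(=O)n(C)c(=O)n2C"),
]


def to_smiles_table(compounds):
    """Return tab-separated text: SMILES first, then compound number and name."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["smiles", "compound_id", "name"])
    for compound_id, name, smiles in compounds:
        writer.writerow([smiles, compound_id, name])
    return buffer.getvalue()


text = to_smiles_table(COMPOUNDS)
print(text)
```

Keeping the compound number from the manuscript in the file ties each structure back to the place where it is discussed.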
Requirements for Reproducibility:
Chemical Data
From Nature Chemical Biology
Requirements for Reproducibility:
Chemical Data
From Nature Chemistry
Huigens, R. W., et al. A ring-distortion strategy to construct stereochemically complex and
structurally diverse compounds from natural products. Nature Chemistry 5, 195–202 (2013).
doi:10.1038/nchem.1549
It’s not always easy
Data Sets. For this study we arbitrarily chose 18 Merck data sets
shown in Table 1. These include a mix of on-target data sets and
ADME data sets. Some data sets are so large (>100,000) that we
randomly selected a smaller subset of compounds (50,000) to
expedite the study. It is useful to use proprietary data sets for two
reasons:
1.  We wanted data sets which are realistically large and have a
realistic level of noise but are not as noisy as high-throughput
data sets.
2.  Time-splitting requires dates of testing, and these are almost
impossible to find in public domain data sets.
Chen, B., Sheridan, R. P., Hornak, V. & Voigt, J. H. Comparison of Random
Forest and Pipeline Pilot Naïve Bayes in Prospective QSAR Predictions. J.
Chem. Inf. Model. 52, 792–803 (2012).
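To show why test dates matter for the second reason, here is a minimal sketch of a time split, read here as training on the compounds tested earliest and evaluating on those tested later. The records, field names, and the 75% training fraction are all assumptions made for illustration; this is not the authors' code.

```python
from datetime import date

# Hypothetical records; compound IDs, dates, and activities are made up.
RECORDS = [
    {"id": "cmpd-3", "tested": date(2011, 6, 1), "activity": 6.2},
    {"id": "cmpd-1", "tested": date(2009, 1, 15), "activity": 5.1},
    {"id": "cmpd-4", "tested": date(2012, 3, 9), "activity": 7.0},
    {"id": "cmpd-2", "tested": date(2010, 8, 30), "activity": 4.8},
]


def time_split(records, train_fraction=0.75):
    """Train on the earliest-tested records, evaluate on the most recent ones."""
    ordered = sorted(records, key=lambda r: r["tested"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


train, test = time_split(RECORDS)
print([r["id"] for r in train], [r["id"] for r in test])
```

Without a test date per compound there is no key to sort on, which is the obstacle the quoted passage describes for public data sets.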
Requirements for Reproducibility
§ Data used
§ Code/algorithm description
§ Results
Requirements for Reproducibility:
Code
Stahl, M. & Bajorath, J. Computational Medicinal Chemistry. J. Med.
Chem. 54, 1-2 (2011).
Computational methods must be described in sufficient
detail for the reader to reproduce the results.
Requirements for Reproducibility:
Code
Ince, D. C., Hatton, L. & Graham-Cumming, J. The case for open
computer programs. Nature 482, 485–488 (2012).
We argue that, with some exceptions, anything less
than the release of source programs is intolerable for
results that depend on computation. The vagaries of
hardware, software and natural language will always
ensure that exact reproducibility remains uncertain, but
withholding code increases the chances that efforts to
reproduce results will fail.
Requirements for Reproducibility:
Code
Data and materials availability All data necessary to understand, assess,
and extend the conclusions of the manuscript must be available to any
reader of Science. All computer codes involved in the creation or
analysis of data must also be available to any reader of Science.
After publication, all reasonable requests for data and materials must be
fulfilled. Any restrictions on the availability of data, codes, or materials,
including fees and original data obtained from other sources (Materials
Transfer Agreements), must be disclosed to the editors upon submission.
http://www.sciencemag.org/site/feature/contribinfo/prep/gen_info.xhtml#dataavail
Requirements for Reproducibility:
Code
An inherent principle of publication is that others should be able to
replicate and build upon the authors' published claims. Therefore, a
condition of publication in a Nature journal is that authors are required to
make materials, data and associated protocols promptly available to
readers without undue qualifications. Any restrictions on the availability of
materials or information must be disclosed to the editors at the time of
submission. Any restrictions must also be disclosed in the submitted
manuscript, including details of how readers can obtain materials and
information. If materials are to be distributed by a for-profit company, this
must be stated in the paper.
http://www.nature.com/authors/policies/availability.html
In the meantime, researchers must, when they are arranging the
commercialization of their work, bear in mind the implications that these
deals may have on their freedom to publish to the standards that the
community is entitled to expect.
http://www.nature.com/nature/journal/v442/n7098/full/442001a.html
Requirements for Reproducibility:
Code
§  “Black box” code sharing: installing the software on a publicly
accessible server, or providing executables for people to test
§  Does this help with reproducibility?
§  Doesn’t demonstrate that the implementation corresponds to the
algorithm description
§  Not cut and dried.
The Recomputation Manifesto
From Ian Gent, University of St. Andrews
1.  Computational experiments should be recomputable for all time
2.  Recomputation of recomputable experiments should be very easy
3.  It should be easier to make experiments recomputable than not to
4.  Tools and repositories can help recomputation become standard
5.  The only way to ensure recomputability is to provide virtual
machines
6.  Runtime performance is a secondary issue
http://www.software.ac.uk/blog/2013-07-09-recomputation-manifesto
http://arxiv.org/pdf/1304.3674v1.pdf
Requirements for Reproducibility
§ Data used
§ Code/algorithm description
§ Results
Requirements for Reproducibility:
Results
§  Including the actual results is even more of a no-brainer, right?
Homology Models of Human All-Trans Retinoic Acid Metabolizing Enzymes
CYP26B1 and CYP26B1 Spliced Variant
Homology models of CYP26B1 (cytochrome P450RAI2) and CYP26B1 spliced variant were
derived using the crystal structure of cyanobacterial CYP120A1 as template for the model building.
The quality of the homology models generated were carefully evaluated, and the natural substrate
all-trans-retinoic acid (atRA), several tetralone-derived retinoic acid metabolizing blocking agents
(RAMBAs), and a well-known potent inhibitor of CYP26B1 (R115866) were docked into the
homology model of full-length cytochrome P450 26B1. The results show that in the model of the
full-length CYP26B1, the protein is capable of distinguishing between the natural substrate (atRA),
R115866, and the tetralone derivatives. The spliced variant of CYP26B1 model displays a reduced
affinity for atRA compared to the full-length enzyme, in accordance with recently described
experimental information.
This paper, presenting two new homology models, does not
include either model.
Unfortunately, I didn’t have to search long to find this example.
Requirements for Reproducibility:
Results
§ This is the primary output of the research
§ Helps dampen some of the arguments about statistics
§ Need the unprocessed data
§ All of it
How are we doing?
§  Survey of recent publications:
•  Everything in JCIM vol 52 #10
•  Everything in JCAMD vol 26 #10
•  Journal of Cheminformatics from July 2012-Nov 4 2012
§  Big differences between journals
§  Plenty of room for improvement
§  Analysis is presence/absence of full results
Journal    Type of paper   Count   Full Data   Partial Data   Missing Data   Code?
JCIM       Method          13      6           3              4              1
JCIM       Non-method      16      10          3              3              0
JCAMD      Method          3       3           0              0              0
JCAMD      Non-method      4       0           3              1              0
JChemInf   Method          12      7           3              3              8
JChemInf   Non-method      3       0           0              0              0
Practical considerations
§  Where to put the data and code?
•  Supplementary material
•  Code-sharing sites (sourceforge.net, Google Code, GitHub)
•  Data sharing: Zenodo/Labarchives.com
•  A hybrid: Figshare
§  Considerations:
•  It needs to still be there 5+ years from now
•  Having a solid connection to the original paper is good
•  Others have to actually be able to do something with it
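One low-tech way to support the last two considerations is to deposit a manifest alongside the files, naming the paper and giving a checksum for every file. This is a minimal sketch under stated assumptions: the manifest layout is invented for illustration, the DOI is a placeholder, and the demo file is made up.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def build_manifest(directory, paper_doi):
    """List every file under `directory` with its size and SHA-256 digest."""
    root = Path(directory)
    files = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        files.append({
            "path": path.relative_to(root).as_posix(),
            "bytes": path.stat().st_size,
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        })
    return {"paper_doi": paper_doi, "files": files}


# Demonstration on a throwaway directory holding one small, made-up data file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "ligands.smi").write_text("CCO\tethanol\n")
    manifest = build_manifest(tmp, "10.0000/placeholder")
    print(json.dumps(manifest, indent=2))
```

Anyone who downloads the deposit years later can recompute the digests and confirm they have the files the paper was based on.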
Some stuff to look at
§  Vagrant (VirtualBox configuration and provisioning):
http://www.vagrantup.com/
§  OpenShift (cloud-based application deployment):
https://www.openshift.com/
§  Wakari (IPython in the cloud): https://wakari.io/
Tools for reproducible research
KNIME
§  Open-source workflow tool
§  Strong data manipulation and mining capabilities
§  Data and results can be stored with the workflow.
Tools for reproducible research
IPython notebook
§  Python session running in a browser
•  Tab completion
•  Access to docstrings
§  Text formatting options available for including discussion or capturing
mathematics (access to LaTeX for formatting math)
§  Captures all data transformations and displays output
§  Tight integration with matplotlib
Tools for reproducible research
IPython notebook
Here’s a cool picture from my lab.
… and here’s how you can make it too.
Pat’s not completely off the hook
Walters, W. P. Modeling, Informatics, and the Quest for Reproducibility. J.
Chem. Inf. Model. (2013). doi:10.1021/ci400197w
No data
No code
No algorithm description
Results only as a figure
Acknowledgements
§  NIBR:
•  Nik Stiefl (GDC/CADD)
•  Nikolas Fechner (NIBR IT/IS Sigma)
•  Sereina Riniker (NIBR IT/IS Sigma)
§  Matthias Rarey
§  Pat Walters
Perhaps the biggest barrier to reproducible research
is the lack of a deeply ingrained culture that simply
requires reproducibility for all scientific claims.
Peng, R. D. Reproducible Research in Computational Science.
Science 334, 1226–1227 (2011).

Más contenido relacionado

La actualidad más candente

Machine learning in the life sciences with knime
Machine learning in the life sciences with knimeMachine learning in the life sciences with knime
Machine learning in the life sciences with knimeGreg Landrum
 
Reproducible research: First steps.
Reproducible research: First steps. Reproducible research: First steps.
Reproducible research: First steps. Richard Layton
 
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsPaul Groth
 
PA webinar on benefits & costs of FAIR implementation in life sciences
PA webinar on benefits & costs of FAIR implementation in life sciences PA webinar on benefits & costs of FAIR implementation in life sciences
PA webinar on benefits & costs of FAIR implementation in life sciences Pistoia Alliance
 
DataONE Education Module 01: Why Data Management?
DataONE Education Module 01: Why Data Management?DataONE Education Module 01: Why Data Management?
DataONE Education Module 01: Why Data Management?DataONE
 
Beyond Proofs of Concept for Biomedical AI
Beyond Proofs of Concept for Biomedical AIBeyond Proofs of Concept for Biomedical AI
Beyond Proofs of Concept for Biomedical AIPaul Agapow
 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicinePaul Groth
 
Sources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsSources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsPaul Groth
 
SEEKing our way to better presentation of data and models from scientific inv...
SEEKing our way to better presentation of data and models from scientific inv...SEEKing our way to better presentation of data and models from scientific inv...
SEEKing our way to better presentation of data and models from scientific inv...Natalie Stanford
 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataPaul Groth
 
Why should researchers care about data curation?
Why should researchers care about data curation?Why should researchers care about data curation?
Why should researchers care about data curation?Varsha Khodiyar
 
Some Ideas on Making Research Data: "It's the Metadata, stupid!"
Some Ideas on Making Research Data: "It's the Metadata, stupid!"Some Ideas on Making Research Data: "It's the Metadata, stupid!"
Some Ideas on Making Research Data: "It's the Metadata, stupid!"Anita de Waard
 
Ilya Kupershmidt speaks at the Molecular Medicine Tri-Conference
Ilya Kupershmidt speaks at the Molecular Medicine Tri-ConferenceIlya Kupershmidt speaks at the Molecular Medicine Tri-Conference
Ilya Kupershmidt speaks at the Molecular Medicine Tri-ConferenceNextBio
 
On community-standards, data curation and scholarly communication" Stanford M...
On community-standards, data curation and scholarly communication" Stanford M...On community-standards, data curation and scholarly communication" Stanford M...
On community-standards, data curation and scholarly communication" Stanford M...Susanna-Assunta Sansone
 
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Robert Grossman
 
The Fourth Paradigm - Deltares Data Science Day, 31 October 2014
The Fourth Paradigm - Deltares Data Science Day, 31 October 2014The Fourth Paradigm - Deltares Data Science Day, 31 October 2014
The Fourth Paradigm - Deltares Data Science Day, 31 October 2014Microsoft Azure for Research
 
Record matching over query results from Web Databases
Record matching over query results from Web DatabasesRecord matching over query results from Web Databases
Record matching over query results from Web Databasestusharjadhav2611
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? Robert Grossman
 
NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better ScienceNC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better ScienceSusanna-Assunta Sansone
 

La actualidad más candente (20)

Machine learning in the life sciences with knime
Machine learning in the life sciences with knimeMachine learning in the life sciences with knime
Machine learning in the life sciences with knime
 
Reproducible research: First steps.
Reproducible research: First steps. Reproducible research: First steps.
Reproducible research: First steps.
 
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
 
PA webinar on benefits & costs of FAIR implementation in life sciences
PA webinar on benefits & costs of FAIR implementation in life sciences PA webinar on benefits & costs of FAIR implementation in life sciences
PA webinar on benefits & costs of FAIR implementation in life sciences
 
DataONE Education Module 01: Why Data Management?
DataONE Education Module 01: Why Data Management?DataONE Education Module 01: Why Data Management?
DataONE Education Module 01: Why Data Management?
 
Beyond Proofs of Concept for Biomedical AI
Beyond Proofs of Concept for Biomedical AIBeyond Proofs of Concept for Biomedical AI
Beyond Proofs of Concept for Biomedical AI
 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicine
 
Sources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsSources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization Systems
 
SEEKing our way to better presentation of data and models from scientific inv...
SEEKing our way to better presentation of data and models from scientific inv...SEEKing our way to better presentation of data and models from scientific inv...
SEEKing our way to better presentation of data and models from scientific inv...
 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture Data
 
Why should researchers care about data curation?
Why should researchers care about data curation?Why should researchers care about data curation?
Why should researchers care about data curation?
 
Some Ideas on Making Research Data: "It's the Metadata, stupid!"
Some Ideas on Making Research Data: "It's the Metadata, stupid!"Some Ideas on Making Research Data: "It's the Metadata, stupid!"
Some Ideas on Making Research Data: "It's the Metadata, stupid!"
 
Ilya Kupershmidt speaks at the Molecular Medicine Tri-Conference
Ilya Kupershmidt speaks at the Molecular Medicine Tri-ConferenceIlya Kupershmidt speaks at the Molecular Medicine Tri-Conference
Ilya Kupershmidt speaks at the Molecular Medicine Tri-Conference
 
On community-standards, data curation and scholarly communication" Stanford M...
On community-standards, data curation and scholarly communication" Stanford M...On community-standards, data curation and scholarly communication" Stanford M...
On community-standards, data curation and scholarly communication" Stanford M...
 
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
 
The Fourth Paradigm - Deltares Data Science Day, 31 October 2014
The Fourth Paradigm - Deltares Data Science Day, 31 October 2014The Fourth Paradigm - Deltares Data Science Day, 31 October 2014
The Fourth Paradigm - Deltares Data Science Day, 31 October 2014
 
Record matching over query results from Web Databases
Record matching over query results from Web DatabasesRecord matching over query results from Web Databases
Record matching over query results from Web Databases
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care?
 
NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better ScienceNC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
 
Scibite - We Do.
Scibite - We Do.Scibite - We Do.
Scibite - We Do.
 

Is that a scientific report or just some cool pictures from the lab? Reproducibility and computational chemistry

  • 1. Is that a scientific report or just some cool pictures from the lab? Reproducibility and computational chemistry Gregory Landrum Ph.D. NIBR IT Novartis Institutes for BioMedical Research Basel 2013 CADD Gordon Conference, Mount Snow VT 23 July, 2013
  • 3. Publishing… Scientific publications have at least two goals: (i) to announce a result and (ii) to convince readers that the result is correct. Mathematics papers are expected to contain a proof complete enough to allow knowledgeable readers to fill in any details. Papers in experimental science should describe the results and provide a clear enough protocol to allow successful repetition and extension. Mesirov, J. P. Accessible Reproducible Research. Science 327, 415–416 (2010).
  • 4. Outline §  Reproducibility? §  Requirements for reproducibility of published research §  Practical aspects Landrum, G. A. & Stiefl, N. Is that a scientific publication or an advertisement? Reproducibility, source code and data in the computational chemistry literature. Future Medicinal Chemistry 4, 1885–1887 (2012).
  • 5. Reproducibility http://en.wikipedia.org/wiki/Reproducibility Reproducibility is the ability of an entire experiment or study to be reproduced, either by the researcher or by someone else working independently. It is one of the main principles of the scientific method.
  • 6. Reproducibility An author’s central obligation is to present an accurate and complete account of the research performed, absolutely avoiding deception, including the data collected or used, as well as an objective discussion of the significance of the research. Data are defined as information collected or used in generating research conclusions. The research report and the data collected should contain sufficient detail and reference to public sources of information to permit a trained professional to reproduce the experimental observations. ACS “Ethical Guidelines to Publication of Chemical Research”
  • 7. Reproducibility Experimental reproducibility is the coin of the scientific realm. The extent to which measurements or observations agree when performed by different individuals defines this important tenet of the scientific method. The formal essence of experimental reproducibility was born of the philosophy of logical positivism or logical empiricism, which purports to gain knowledge of the world through the use of formal logic linked to observation. A key principle of logical positivism is verificationism, which holds that every truth is verifiable by experience. In this rational context, truth is defined by reproducible experience, and unbiased scientific observation and determinism are its underpinnings. … The assumption that objectively true scientific observations must be reproducible is implicit, yet direct tests of reproducibility are rarely found in the published literature. This lack of published evidence of reproducibility stems from the limited appeal of studies reproducing earlier work to most funding bodies and to most editors. Furthermore, many readers of scientific journals— especially of higher-impact journals—assume that if a study is of sufficient quality to pass the scrutiny of rigorous reviewers, it must be true; this assumption is based on the inferred equivalence of reproducibility and truth described above. Loscalzo, J. Irreproducible Experimental Results: Causes, (Mis) interpretations, and Consequences. Circulation 125, 1211–1214 (2012).
  • 8. If it’s not reproducible science?
  • 9. “Let me show you some cool pictures from my lab…”
  • 10. Requirements for Reproducibility thanks to Martin Stahl for the picture
  • 11. A great start (1) Wherever possible, source code should be provided for new computational methods. The source code can be a reference implementation of a method or algorithm and does not need to include a graphical interface. If it is not possible to release the source code for a new method, authors should provide a sufficient justification. Reviewers and editors will then consider this explanation. Any paper that does not comply with the reproducibility guidelines will include this explanation when published. In cases where it is not possible to release code due to intellectual property or other limitations, an executable version of the new method should be readily accessible. Commercial products should provide time limited licenses to facilitate evaluation and comparison of published methods. (2) Any chemical structures and data mentioned in the paper should be made available in a commonly used (SDF or SMILES) format. Distribution of data in pdf format is not sufficient. (3) Any publications that employ commercial or open-source software should include scripts or parameter files as well as data files that will enable others to easily reproduce the work. (4) A clear easy to follow description of any new method should be a key criterion during the review process. Wherever possible, a paper should contain a simple worked example that demonstrates the application of the method. Parameter values and intermediate results for example compounds should be included as part of the supporting material. (5) Reviewers should put particular emphasis on the reproducibility of the method described in a manuscript. Each reviewer should evaluate the description of the method, as well as the presence of associated code, data, or executables, to ensure that the results can be independently reproduced. Walters, W. P. Modeling, Informatics, and the Quest for Reproducibility. J. Chem. Inf. Model. (2013). doi:10.1021/ci400197w
  • 12. Requirements for Reproducibility § Data used § Code/algorithm description § Results
  • 13. Requirements for Reproducibility: Data As a condition of publication, authors must agree to make available all data necessary to understand and assess the conclusions of the manuscript to any reader of Science. Data must be included in the body of the paper or in the supplementary materials, where they can be viewed free of charge by all visitors to the site. Certain types of data must be deposited in an approved online database, including DNA and protein sequences, microarray data, crystal structures, and climate records. http://www.sciencemag.org/site/feature/contribinfo/faq/index.xhtml#data_faq
  • 14. Requirements for Reproducibility: Data §  This is a no brainer, right? §  Unless it’s completely unprocessed (or the processing is part of the detailed method description/code), it’s better to include the actual data §  “Ligands from PDB structures X, Y, and Z” probably not good enough §  For sources like ChEMBL, a version number and SQL to grab the data are probably adequate
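The ChEMBL suggestion on slide 14 (a version number plus the SQL used) can be sketched as a small manifest that travels with the paper. This is only an illustration: the in-memory SQLite database, its table layout, and the identifiers `TOY_DB_1`, `CPD-*`, and `TGT-*` are hypothetical stand-ins, not the real ChEMBL schema.

```python
import hashlib
import json
import sqlite3

# Hypothetical stand-in for a local database copy: a tiny in-memory SQLite
# database with a simplified, made-up schema (NOT the real ChEMBL tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE version (name TEXT);
    CREATE TABLE activities (
        compound_id TEXT, target_id TEXT, standard_type TEXT, value_nm REAL
    );
    INSERT INTO version VALUES ('TOY_DB_1');
    INSERT INTO activities VALUES
        ('CPD-1', 'TGT-A', 'IC50', 12.0),
        ('CPD-2', 'TGT-A', 'IC50', 850.0),
        ('CPD-3', 'TGT-B', 'IC50', 3.5),
        ('CPD-4', 'TGT-A', 'Ki', 40.0);
""")

# The exact query is part of the method, so keep it as a publishable string.
QUERY = """
    SELECT compound_id, value_nm
    FROM activities
    WHERE target_id = ? AND standard_type = ?
    ORDER BY compound_id
"""
PARAMS = ("TGT-A", "IC50")

rows = conn.execute(QUERY, PARAMS).fetchall()
db_version = conn.execute("SELECT name FROM version").fetchone()[0]

# A checksum of the retrieved rows lets a reader confirm they got the same set.
digest = hashlib.sha256(json.dumps(rows).encode("utf-8")).hexdigest()

manifest = {
    "database_version": db_version,
    "query": " ".join(QUERY.split()),
    "parameters": list(PARAMS),
    "row_count": len(rows),
    "sha256_of_rows": digest,
}
print(json.dumps(manifest, indent=2))
```

Publishing the manifest alongside the query means a reader who runs the same SQL against the same release can check the row count and checksum before going any further.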
  • 15. Requirements for Reproducibility: Data Goodman, L., Lawrence, R. & Ashley, K. Data-set visibility: Cite links to data in reference lists. Nature 492:356–6 (2012). A huge amount of work goes into creating data sets. It is crucial that these data, big or small, should be more prominently linked to their associated research articles as standard practice. To achieve this, data can be cited directly in a publication's reference section using a permanent identifier such as a digital object identifier (DOI; see, for example, go.nature.com/vnyidi and go.nature.com/zdfbcl). So far, however, only very few journals do this. Publishers, funders, researchers and institutions all need to recognize that data sets constitute a valuable scholarly resource. Authors should be credited for these career-making contributions. Enhanced data-set visibility would also benefit referees and readers by raising standards of data analysis, promoting more detailed review, encouraging data curation and boosting reproducibility and data reuse.
  • 16. Requirements for Reproducibility: Data §  What about chemical structures? •  a table with drawings of molecules? •  names instead of structures? §  Why not include the structures in a machine-readable format? This expanded use of electronic resources offers an excellent opportunity to make chemical information more accessible and user-friendly to readers of scientific papers. To take advantage of these opportunities, we have developed several online features that expand the usefulness of chemical compound information for Nature Chemical Biology readers … In all original research papers, compounds that are relevant to the background or results of the paper are assigned a bolded, Arabic numeral that serves as a unique identifier for the compound. Each numerical abbreviation in the HTML and PDF versions of the article is linked to a Compound Data page, which shows the structure and the IUPAC or common name of the chemical compound. From there, readers can download a ChemDraw file of the compound…To provide readers with rapid access to all of the chemical compounds discussed in an article, we feature a Compound Data Index page, which is accessible from the Compound Data page, the table of contents entry for the paper, and the navigation tools on the right side of the Nature Chemical Biology website. http://www.nature.com/nchembio/journal/v3/n6/full/nchembio0607-297.htm
  • 17. Requirements for Reproducibility: Chemical Data From Nature Chemical Biology
  • 18. Requirements for Reproducibility: Chemical Data From Nature Chemistry Huigens, R. W., et al. A ring-distortion strategy to construct stereochemically complex and structurally diverse compounds from natural products. Nature Chemistry 5:195-202 (2013). doi:10.1038/nchem.1549
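As a rough illustration of the machine-readable alternative raised on slide 16, the compound table of a paper could be shipped as a tab-separated SMILES file instead of drawings or names. The sketch below uses only the Python standard library; the numeric identifiers and the column layout are assumptions made for the example, and the three molecules are simply familiar ones.

```python
import csv
import io

# Hypothetical compound table: the identifiers are made up for illustration;
# the SMILES strings are standard representations of well-known molecules.
compounds = [
    {"id": "1", "name": "ethanol", "smiles": "CCO"},
    {"id": "2", "name": "benzene", "smiles": "c1ccccc1"},
    {"id": "3", "name": "aspirin", "smiles": "CC(=O)Oc1ccccc1C(=O)O"},
]


def to_smiles_file(records):
    """Return the text of a tab-separated file with columns smiles, id, name."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["smiles", "id", "name"])
    for rec in records:
        writer.writerow([rec["smiles"], rec["id"], rec["name"]])
    return buffer.getvalue()


def read_smiles_file(text):
    """Parse the tab-separated text back into a list of dictionaries."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text), delimiter="\t")]


smiles_text = to_smiles_file(compounds)
print(smiles_text)

# Round trip: what a reader parses is exactly what the author wrote out.
recovered = read_smiles_file(smiles_text)
```

A file like this can be read directly by cheminformatics toolkits, which is not true of a table of structure drawings in a PDF.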
  • 19. It’s not always easy Data Sets. For this study we arbitrarily chose 18 Merck data sets shown in Table 1. These include a mix of on-target data sets and ADME data sets. Some data sets are so large (>100,000) that we randomly selected a smaller subset of compounds (50,000) to expedite the study. It is useful to use proprietary data sets for two reasons: 1. We wanted data sets which are realistically large and have a realistic level of noise but are not as noisy as high-throughput data sets. 2. Time-splitting requires dates of testing, and these are almost impossible to find in public domain data sets. Chen, B., Sheridan, R. P., Hornak, V. & Voigt, J. H. Comparison of Random Forest and Pipeline Pilot Naïve Bayes in Prospective QSAR Predictions. J. Chem. Inf. Model. 52, 792–803 (2012).
  • 20. Requirements for Reproducibility § Data used § Code/algorithm description § Results
  • 21. Requirements for Reproducibility: Code Stahl, M. & Bajorath, J. Computational Medicinal Chemistry. J. Med. Chem. 54, 1-2 (2011). Computational methods must be described in sufficient detail for the reader to reproduce the results.
  • 22. Requirements for Reproducibility: Code Ince, D. C., Hatton, L. & Graham-Cumming, J. The case for open computer programs. Nature 482, 485–488 (2012). We argue that, with some exceptions, anything less than the release of source programs is intolerable for results that depend on computation. The vagaries of hardware, software and natural language will always ensure that exact reproducibility remains uncertain, but withholding code increases the chances that efforts to reproduce results will fail.
  • 23. Requirements for Reproducibility: Code Data and materials availability All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science. All computer codes involved in the creation or analysis of data must also be available to any reader of Science. After publication, all reasonable requests for data and materials must be fulfilled. Any restrictions on the availability of data, codes, or materials, including fees and original data obtained from other sources (Materials Transfer Agreements), must be disclosed to the editors upon submission. http://www.sciencemag.org/site/feature/contribinfo/prep/gen_info.xhtml#dataavail
  • 24. Requirements for Reproducibility: Code An inherent principle of publication is that others should be able to replicate and build upon the authors' published claims. Therefore, a condition of publication in a Nature journal is that authors are required to make materials, data and associated protocols promptly available to readers without undue qualifications. Any restrictions on the availability of materials or information must be disclosed to the editors at the time of submission. Any restrictions must also be disclosed in the submitted manuscript, including details of how readers can obtain materials and information. If materials are to be distributed by a for-profit company, this must be stated in the paper. http://www.nature.com/authors/policies/availability.html In the meantime, researchers must, when they are arranging the commercialization of their work, bear in mind the implications that these deals may have on their freedom to publish to the standards that the community is entitled to expect. http://www.nature.com/nature/journal/v442/n7098/full/442001a.html
  • 25. Requirements for Reproducibility: Code §  “Black box” code sharing: installing the software on a publicly accessible server, or providing executables for people to test §  Does this help with reproducibility? §  Doesn’t demonstrate that the implementation corresponds to the algorithm description §  Not cut and dried.
  • 26. The Recomputation Manifesto From Ian Gent, University of St. Andrews 1.  Computational experiments should be recomputable for all time 2.  Recomputation of recomputable experiments should be very easy 3.  It should be easier to make experiments recomputable than not to 4.  Tools and repositories can help recomputation become standard 5.  The only way to ensure recomputability is to provide virtual machines 6.  Runtime performance is a secondary issue http://www.software.ac.uk/blog/2013-07-09-recomputation-manifesto http://arxiv.org/pdf/1304.3674v1.pdf
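The manifesto argues that only a virtual machine ensures recomputability. A much weaker but cheap complement, sketched here as an assumption rather than anything the manifesto proposes, is to store a record of the software environment next to the results. The package names passed in are arbitrary examples; a package that is not installed is recorded as `None`.

```python
import json
import platform
from importlib import metadata


def environment_record(packages):
    """Collect interpreter, OS, and package versions for a results archive."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": versions,
    }


# Example call; the second name is deliberately one that should not exist.
record = environment_record(["numpy", "a-package-that-does-not-exist"])
print(json.dumps(record, indent=2))
```

Such a record does not make an experiment recomputable by itself, but it tells a reader what to rebuild when no virtual machine image is available.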
  • 27. Requirements for Reproducibility § Data used § Code/algorithm description § Results
  • 28. Requirements for Reproducibility: Results §  Including the actual results is even more of a no brainer, right?
  • 29. Requirements for Reproducibility: Results §  Including the actual results is even more of a no brainer, right? Homology Models of Human All-Trans Retinoic Acid Metabolizing Enzymes CYP26B1 and CYP26B1 Spliced Variant Homology models of CYP26B1 (cytochrome P450RAI2) and CYP26B1 spliced variant were derived using the crystal structure of cyanobacterial CYP120A1 as template for the model building. The quality of the homology models generated were carefully evaluated, and the natural substrate all-trans-retinoic acid (atRA), several tetralone-derived retinoic acid metabolizing blocking agents (RAMBAs), and a well-known potent inhibitor of CYP26B1 (R115866) were docked into the homology model of full-length cytochrome P450 26B1. The results show that in the model of the full-length CYP26B1, the protein is capable of distinguishing between the natural substrate (atRA), R115866, and the tetralone derivatives. The spliced variant of CYP26B1 model displays a reduced affinity for atRA compared to the full-length enzyme, in accordance with recently described experimental information. This paper, presenting two new homology models, does not include either model. Unfortunately I didn’t have to search long to find this example
  • 30. Requirements for Reproducibility: Results § This is the primary output of the research § Helps dampen some of the arguments about statistics § Need the unprocessed data § All of it
  • 31. How are we doing? §  Survey of recent publications: •  Everything in JCIM vol 52 #10 •  Everything in JCAMD vol 26 #10 •  Journal of Cheminformatics from July 2012-Nov 4 2012 §  Big differences between journals §  Plenty of room for improvement §  Analysis is presence/absence of full results
      Journal    Type of paper   Count   Full Data   Partial Data   Missing Data   Code?
      JCIM       Method          13      6           3              4              1
      JCIM       Non-method      16      10          3              3              0
      JCAMD      Method          3       3           0              0              0
      JCAMD      Non-method      4       0           3              1              0
      JChemInf   Method          12      7           3              3              8
      JChemInf   Non-method      3       0           0              0              0
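The survey counts on slide 31 can be turned into per-journal fractions with a few lines of Python. The numbers below are copied as they appear on the slide, including the JChemInf method row, whose three data columns sum to 13 rather than the stated count of 12; the aggregation into per-journal totals is an illustration and not part of the original analysis.

```python
# Columns: journal, paper type, count, full data, partial data, missing data, code
survey = [
    ("JCIM", "Method", 13, 6, 3, 4, 1),
    ("JCIM", "Non-method", 16, 10, 3, 3, 0),
    ("JCAMD", "Method", 3, 3, 0, 0, 0),
    ("JCAMD", "Non-method", 4, 0, 3, 1, 0),
    ("JChemInf", "Method", 12, 7, 3, 3, 8),
    ("JChemInf", "Non-method", 3, 0, 0, 0, 0),
]


def summarize(rows):
    """Sum paper counts, full-data counts, and code counts per journal."""
    totals = {}
    for journal, _kind, count, full, _partial, _missing, code in rows:
        entry = totals.setdefault(journal, {"papers": 0, "full": 0, "code": 0})
        entry["papers"] += count
        entry["full"] += full
        entry["code"] += code
    return totals


summary = summarize(survey)
for journal, entry in summary.items():
    full_pct = 100 * entry["full"] / entry["papers"]
    code_pct = 100 * entry["code"] / entry["papers"]
    print(f"{journal}: {entry['papers']} papers, "
          f"{full_pct:.0f}% with full data, {code_pct:.0f}% with code")
```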
  • 32. Practical considerations §  Where to put the data and code? •  Supplementary material •  Code-sharing sites (sourceforge.net, google code, github) •  Data sharing: Zenodo/Labarchives.com •  A hybrid: Figshare §  Considerations: •  It needs to still be there 5+ years from now •  Having a solid connection to the original paper is good •  Others have to actually be able to do something with it
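One way to keep a "solid connection to the original paper" wherever the files end up is to deposit a checksum manifest with them. The sketch below is a minimal example under stated assumptions: the file names, their contents, and the DOI `10.0000/hypothetical-doi` are invented for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory, paper_doi):
    """Map every file under `directory` to its checksum, tagged with the DOI."""
    root = Path(directory)
    files = sorted(p for p in root.rglob("*") if p.is_file())
    return {
        "paper_doi": paper_doi,
        "files": {p.relative_to(root).as_posix(): sha256_of(p) for p in files},
    }


# Demonstration on a throwaway directory with two hypothetical deposit files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "compounds.smi").write_bytes(b"CCO\t1\n")
    (Path(tmp) / "results.csv").write_bytes(b"id,score\n1,0.5\n")
    manifest = build_manifest(tmp, "10.0000/hypothetical-doi")

print(json.dumps(manifest, indent=2))
```

If the archive is later moved between hosting sites, the manifest lets anyone verify that the files are still the ones the paper refers to.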
  • 34. Some stuff to look at §  vagrant (virtual box configuration and provisioning): http://www.vagrantup.com/ §  openshift (cloud-based application deployment): https://www.openshift.com/ §  wakari (ipython in the cloud): https://wakari.io/
  • 35. Tools for reproducible research Knime §  Open-source workflow tool §  Strong data manipulation and mining capabilities §  Data and results can be stored with the workflow.
  • 36. Tools for reproducible research IPython notebook §  Python session running in a browser •  Tab completion •  Access to docstrings §  Text formatting options available for including discussion or capturing mathematics (access to LaTeX for formatting math) §  Captures all data transformations and displays output §  Tight integration with matplotlib
  • 37. Tools for reproducible research IPython notebook
  • 38. Tools for reproducible research IPython notebook
  • 39. Here’s a cool picture from my lab. … and here’s how you can make it too.
  • 40. A great start (1) Wherever possible, source code should be provided for new computational methods. The source code can be a reference implementation of a method or algorithm and does not need to include a graphical interface. If it is not possible to release the source code for a new method, authors should provide a sufficient justification. Reviewers and editors will then consider this explanation. Any paper that does not comply with the reproducibility guidelines will include this explanation when published. In cases where it is not possible to release code due to intellectual property or other limitations, an executable version of the new method should be readily accessible. Commercial products should provide time limited licenses to facilitate evaluation and comparison of published methods. (2) Any chemical structures and data mentioned in the paper should be made available in a commonly used (SDF or SMILES) format. Distribution of data in pdf format is not sufficient. (3) Any publications that employ commercial or open-source software should include scripts or parameter files as well as data files that will enable others to easily reproduce the work. (4) A clear easy to follow description of any new method should be a key criterion during the review process. Wherever possible, a paper should contain a simple worked example that demonstrates the application of the method. Parameter values and intermediate results for example compounds should be included as part of the supporting material. (5) Reviewers should put particular emphasis on the reproducibility of the method described in a manuscript. Each reviewer should evaluate the description of the method, as well as the presence of associated code, data, or executables, to ensure that the results can be independently reproduced. Walters, W. P. Modeling, Informatics, and the Quest for Reproducibility. J. Chem. Inf. Model. (2013). doi:10.1021/ci400197w
  • 41. Requirements for Reproducibility § Data used § Code/algorithm description § Results
  • 42. Pat’s not completely off the hook Walters, W. P. Modeling, Informatics, and the Quest for Reproducibility. J. Chem. Inf. Model. (2013). doi:10.1021/ci400197w
  • 43. Pat’s not completely off the hook Walters, W. P. Modeling, Informatics, and the Quest for Reproducibility. J. Chem. Inf. Model. (2013). doi:10.1021/ci400197w No data No code No algorithm description Results only as a figure
  • 44. Acknowledgements §  NIBR: •  Nik Stiefl (GDC/CADD) •  Nikolas Fechner (NIBR IT/IS Sigma) •  Sereina Riniker (NIBR IT/IS Sigma) §  Matthias Rarey §  Pat Walters
  • 45. Perhaps the biggest barrier to reproducible research is the lack of a deeply ingrained culture that simply requires reproducibility for all scientific claims. Peng, R. D. Reproducible Research in Computational Science. Science 334, 1226–1227 (2011).