Chemical Information
  Retrieval 2012

            First Class
 CHEM367/767 Drexel University

       Jean-Claude Bradley
     Associate Professor of Chemistry
            Drexel University

          September 28, 2012
Finding reliable
chemical information
 can be really hard
After this class,
 you should feel that
   you can never
    blindly trust
chemical data sources
         again
But…
You will learn how to do
  the best you can with
 imperfect information
The Chemical Information Validation Sheet

       567 curated and referenced measurements from
       Fall 2010 Chemical Information Retrieval course
Discovering outliers for melting points
          (stdev/average)
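
The stdev/average screen named on this slide can be sketched in a few lines. This is only an illustration, not the course's actual spreadsheet formula: the function names, the 0.05 threshold, the conversion to kelvin, and the sample values are all assumptions made for the example.

```python
from statistics import mean, stdev


def relative_spread(melting_points_c):
    """Return stdev/average for melting points given in Celsius.

    Values are shifted to kelvin first (an assumption of this sketch) so the
    average cannot sit near zero and inflate the ratio.
    """
    kelvin = [t + 273.15 for t in melting_points_c]
    return stdev(kelvin) / mean(kelvin)


def flag_outliers(measurements, threshold=0.05):
    """Return compound names whose relative spread exceeds the threshold."""
    flagged = []
    for name, values in measurements.items():
        # A spread needs at least two measurements.
        if len(values) >= 2 and relative_spread(values) > threshold:
            flagged.append(name)
    return flagged


# Hypothetical illustration values, not taken from the validation sheet.
measurements = {
    "compound A": [121.0, 122.5, 122.0],
    "compound B": [5.0, -30.0, 125.0],
}
flagged = flag_outliers(measurements)
```

Compounds whose sources agree closely give a ratio near zero, while widely scattered reports stand out and can be investigated by hand.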
Investigating the m.p. inconsistencies of EGCG
Investigating the m.p. inconsistencies of
             cyclohexanone
Most popular data sources
Alfa Aesar donates melting points to the public
Open Melting Point Explorer




                        (Andrew Lang)
Outliers
MDPI dataset
EPI (donated all data to public also)
Outliers for ethanol: Alfa Aesar and Oxford
                   MSDS
Inconsistencies and SMILES problems within
               MDPI dataset
MDPI Dataset labeled with High Trust Level
Open Melting Point Datasets
Currently 20,000 compounds with Open MPs
What is the melting point of 4-benzyltoluene?



  American Petroleum Institute           5 C
  PHYSPROP                             -30 C
  PHYSPROP                             125 C
  peer reviewed journal (2008)        97.5 C
  government database                  -30 C
  government database                 4.58 C
The quest to resolve the melting point
of 4-benzyltoluene: liquid at room temperature
       and can be frozen below -30 C
Open Lab Notebook page measuring the
   melting point of 4-benzyltoluene
Motivation: Faster Science, Better
             Science
Ruling out all melting points above -15 C?
Oops – 4-benzyltoluene freezes after 16 days at -15 C!
Measuring the melting point by slowly heating
            from -15 C gives 5 C
There are NO FACTS,
  only measurements embedded
        within assumptions

Open Notebook Science maintains
the integrity of data provenance by
   making assumptions explicit
“Simple” aldol condensation synthesis



                                Top Hit
                                (no reports
                                of synthesis)



                               In top ten
                               (a few reports
                               of synthesis)


                            (Andrew Lang)
Information from the literature on the target synthesis
An example of a “failed experiment” in an
 Open Notebook with useful information
A successful synthesis by avoiding water, dramatically
      increasing NaOH and long reaction time
Open Random Forest modeling of Open Melting Point
           data using CDK descriptors
                 (Andrew Lang)
   R² = 0.78; TPSA and nHdon most important
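
For readers unfamiliar with the R² figure quoted here, the sketch below shows one common definition, the coefficient of determination 1 - SS_res/SS_tot. The slide does not say which definition or validation split was used, so treat this as an assumption. The function name and the sample numbers are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: how far predictions fall from measurements.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the measurements around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are all identical")
    return 1 - ss_res / ss_tot


# Hypothetical measured vs. predicted melting points (Celsius), illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
score = r_squared(observed, predicted)
```

A value of 1 means the predictions reproduce the measurements exactly; under this definition, 0.78 would mean the model accounts for roughly 78% of the variance in the measured melting points.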
Melting point prediction service
Web services for summary data




                      (Andrew Lang)
Using a Google Spreadsheet as a “dashboard interface”
          for reaction planning and analysis
Calling Google Apps Scripts




                   (Andrew Lang and Rich
                         Apodaca)
Google Apps Scripts for conveniently
   exploring melting point data
Comparison of model with triple validated measurements
           Straight chain carboxylic acids from 1 to 10 carbons




              Straight chain alcohols from 1 to 10 carbons
Cyclic primary amines from 3 to 6 carbons (cyclobutylamine flagged for
               validation – only single source available)
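
The single-source check mentioned for cyclobutylamine could be automated along the following lines. This is a minimal sketch under stated assumptions: the record layout, the function name, and the sample records are hypothetical and are not the format of the Open Melting Point datasets.

```python
from collections import defaultdict


def flag_single_source(records):
    """Return sorted compound names backed by fewer than two distinct sources."""
    sources = defaultdict(set)
    for compound, source, _value in records:
        sources[compound].add(source)
    return sorted(name for name, found in sources.items() if len(found) < 2)


# Hypothetical records: (compound, source label, melting point in Celsius).
records = [
    ("compound X", "source 1", 10.0),
    ("compound X", "source 2", 11.0),
    ("compound Y", "source 1", -5.0),
    ("compound Y", "source 1", -4.5),
]
needs_validation = flag_single_source(records)
```

Counting distinct sources rather than rows matters here, because repeated values copied from one source do not count as independent confirmation.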
Open Melting Points in Supplementary Data Pages
          of Wikipedia (Martin Walker)
Google Apps Scripts web services
Integration of Multiple Web Services to
  Recommend Solvents for Reactions




                             (Andrew Lang)
What are good solvents to recrystallize benzoic acid?




                                      (Andrew Lang)
Click on the solvent to see temp curve




                             (Andrew Lang)
Deliver melting point data via App




                           (Andrew Lang)
Web services from data collected in this
      class will be added here
In this class you will learn


How to search Science1.0 resources

     •Peer-Reviewed journals
     •Commercial databases
     •Patents
     •Conference Proceedings
In this class you will learn


  How to participate in Science2.0

•wikis (Wikipedia, class wiki)
•blogs
•interactive databases (ChemSpider)
•social software (Twitter, FriendFeed)
In this class you will learn
   How to leverage Science3.0
 •machine-readable web services




(via collaboration with Andrew Lang)
Now let's take a look
  at the class wiki
