The document discusses an app that recommends solvents for recrystallizing compounds using open data sources and algorithms. It summarizes how the app works. The app looks up the solvent boiling point, looks up or predicts the solute's solubility and melting point, and uses these values to predict the recrystallization yield. The discussion emphasizes the importance of open data, models and software in chemistry, with examples of using open data sources and models to predict properties, validate data and enable new applications. The conclusions advocate for more openness in chemistry to make science more efficient.
Deploying an App from Open Data Feeds and Algorithms to Recommend Recrystallization Solvents
1. The deployment of an app from Open Data feeds and algorithms: Recommending recrystallization solvents
ACS-CINF Symposium
Jean-Claude Bradley
Associate Professor of Chemistry
Drexel University
December 13, 2012
2. The importance of recrystallization
• Generally preferred if there is a known solvent that gives a good yield
• Scales up much more easily and cheaply than chromatography
• However, for new compounds much trial and error may be needed
7. How does it work?
1. Look up the solvent boiling point
2. Look up the solute's room-temperature solubility in the solvent, or predict it from Abraham descriptors, which are themselves predicted by a model using the CDK (Chemistry Development Kit)
3. Look up the solute melting point, or predict it with a model using the CDK
4. Use the melting point and the room-temperature solubility to predict the solubility at the solvent's boiling point
5. Calculate the predicted recrystallization yield
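Steps 2 to 5 can be sketched in Python as below. This is not the app's code, and several parts are assumptions:
• The Abraham-type relation takes solvent coefficients that are hypothetical placeholders.
• The temperature extrapolation in step 4 is an assumed model: ideal solubility with Walden's rule (an entropy of melting of about 56.5 J/(mol·K)). The ratio of actual to ideal solubility is held constant with temperature, and molar solubility is treated as proportional to mole fraction.
• The yield in step 5 is taken as the fraction of solute recovered when a solution saturated at the solvent's boiling point is cooled to room temperature.
All function names are illustrative.

```python
import math

R = 8.314        # gas constant, J/(mol*K)
DS_MELT = 56.5   # ASSUMED entropy of melting (Walden's rule), J/(mol*K)


def abraham_log_solubility(solute, coef):
    """log10 S = c + e*E + s*S + a*A + b*B + v*V.

    solute: Abraham descriptors E, S, A, B, V (e.g. predicted from CDK descriptors).
    coef: solvent coefficients c, e, s, a, b, v; HYPOTHETICAL here, assumed
    fitted directly to log10 molar solubility in that solvent.
    """
    return (coef["c"] + coef["e"] * solute["E"] + coef["s"] * solute["S"]
            + coef["a"] * solute["A"] + coef["b"] * solute["B"]
            + coef["v"] * solute["V"])


def ideal_log10_x(t_k, mp_k):
    """log10 of ideal mole-fraction solubility; above the melting point x = 1."""
    if t_k >= mp_k:
        return 0.0
    return -DS_MELT * (mp_k - t_k) / (math.log(10) * R * t_k)


def solubility_at(t_k, s_rt, t_rt_k, mp_k):
    """Scale room-temperature solubility to t_k, ASSUMING the ratio of
    actual to ideal solubility does not depend on temperature."""
    return s_rt * 10 ** (ideal_log10_x(t_k, mp_k) - ideal_log10_x(t_rt_k, mp_k))


def recrystallization_yield(s_rt, bp_c, mp_c, rt_c=25.0):
    """Percent of solute recovered when a solution saturated at the solvent's
    boiling point is cooled to room temperature (handling losses ignored)."""
    t_rt, t_bp, t_mp = rt_c + 273.15, bp_c + 273.15, mp_c + 273.15
    s_bp = solubility_at(t_bp, s_rt, t_rt, t_mp)
    return 100.0 * (s_bp - s_rt) / s_bp


def rank_solvents(candidates, mp_c):
    """candidates: (name, boiling point in C, room-temperature solubility) tuples.
    Returns (name, predicted yield %) pairs, best first."""
    scored = [(name, recrystallization_yield(s_rt, bp_c, mp_c))
              for name, bp_c, s_rt in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Under these assumptions, a higher-boiling solvent with the same room-temperature solubility gives a higher predicted yield. The sketch ignores practical limits such as the solute oiling out when the solvent boils above the solute's melting point.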
8. Openness in Chemistry
The Recrystallization App produces and uses Open Data:
• Open Solubility Collection and Models
• Open Melting Point Collection and Models
• Modeling depends mainly on the CDK (Open Source Software with Open Descriptors)
• Open Notebook Science
WHY?
9. Open Data Collections are essential for this strategy
[Diagram: Open Data → open, transparent data transformation → Open Data, giving a transparent chain of provenance]
11. What is the melting point of 4-benzyltoluene? Reported values:
• American Petroleum Institute: 5 °C
• PHYSPROP: -30 °C
• PHYSPROP: 125 °C
• Peer-reviewed journal (2008): 97.5 °C
• Government database: -30 °C
• Government database: 4.58 °C
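One crude way to make this disagreement visible is to compare each report with the median of all reports and flag the outliers. The sketch below does that with the values listed above. The 5 °C tolerance and the function name are hypothetical.

A median vote is only a heuristic. If several databases copied the same value, that value is counted more than once, so the result cannot replace tracing where each number came from.

```python
from statistics import median

# Reported melting points (C) for 4-benzyltoluene, as listed on the slide
REPORTS = [
    ("American Petroleum Institute", 5.0),
    ("PHYSPROP", -30.0),
    ("PHYSPROP", 125.0),
    ("peer-reviewed journal (2008)", 97.5),
    ("government database", -30.0),
    ("government database", 4.58),
]


def flag_outliers(reports, tolerance_c=5.0):
    """Return (median, kept, flagged); tolerance_c is a HYPOTHETICAL cutoff in C."""
    center = median(value for _, value in reports)
    kept = [r for r in reports if abs(r[1] - center) <= tolerance_c]
    flagged = [r for r in reports if abs(r[1] - center) > tolerance_c]
    return center, kept, flagged
```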
18. There are NO FACTS, only measurements embedded within assumptions
Open Notebook Science maintains the integrity of data provenance by making assumptions explicit
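In a data structure, making assumptions explicit can simply mean storing them next to the number, together with a link to the value it was derived from. The sketch below assumes that approach. The class, field names and example sources are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Measurement:
    """A value that carries its source and assumptions (HYPOTHETICAL fields)."""
    quantity: str
    value: float
    unit: str
    source: str  # e.g. a notebook page or database entry
    assumptions: List[str] = field(default_factory=list)
    derived_from: Optional["Measurement"] = None  # previous link in the chain


def provenance_chain(m):
    """Walk back from a derived value to the original measurement."""
    chain = []
    while m is not None:
        chain.append(m.source)
        m = m.derived_from
    return chain
```

For example, a solubility extrapolated to the boiling point keeps a pointer to the room-temperature measurement and states the extrapolation assumption. A reader can then check or reject the assumption without losing the measured value.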
19. Open Random Forest modeling of Open Melting Point data using CDK descriptors (Andrew Lang)
R² = 0.78; TPSA (topological polar surface area) and nHdon (number of hydrogen-bond donors) are the most important descriptors
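The sketch below shows, in pure Python, what random forest regression does with a descriptor table, plus an R² helper. It grows bootstrapped regression trees, tries a random subset of descriptors at each split and averages the tree predictions.

This is not the model from the slide. In that work the rows would be compounds, the columns CDK descriptors such as TPSA and nHdon, and the target the melting point. The sketch uses none of that data, computes no descriptor importances, and its defaults (trees, depth, descriptors tried per split) are assumptions. A meaningful R² should be computed on compounds held out from training.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def _sse(values):
    """Sum of squared deviations from the mean (node impurity)."""
    m = _mean(values)
    return sum((v - m) ** 2 for v in values)


def _best_split(X, y, features, min_leaf):
    best = None  # (impurity, feature, threshold)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, depth, max_depth, min_leaf, n_try, rng):
    """Regression tree; each split searches a random subset of n_try descriptors."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return _mean(y)
    n_total = len(X[0])
    best = _best_split(X, y, rng.sample(range(n_total), n_try), min_leaf)
    if best is None:  # sampled descriptors were constant here; try them all
        best = _best_split(X, y, range(n_total), min_leaf)
    if best is None:
        return _mean(y)
    _, f, t = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, min_leaf, n_try, rng)

    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t, grow(left), grow(right))


def predict_tree(node, row):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=50, max_depth=6, min_leaf=2, seed=0):
    rng = random.Random(seed)
    n_try = max(1, len(X[0]) // 3)  # descriptors tried per split (ASSUMED default)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, min_leaf, n_try, rng))
    return forest


def predict_forest(forest, row):
    return _mean([predict_tree(tree, row) for tree in forest])


def r_squared(y_true, y_pred):
    m = _mean(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - m) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot
```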
28. Comparison of the model with triple-validated measurements
• Straight-chain carboxylic acids from 1 to 10 carbons
• Straight-chain alcohols from 1 to 10 carbons
29. Cyclic primary amines from 3 to 6 carbons (cyclobutylamine is flagged for validation: only a single source is available)
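The slides do not define "triple validated". The sketch below assumes it means at least three distinct sources that agree within a tolerance, and it flags compounds backed by a single source. Sources are counted by name, so two entries from the same database count once. The rule, tolerance and labels are all hypothetical.

```python
from statistics import median


def validation_status(reports, tolerance_c=1.0):
    """Classify one compound from (source, melting point in C) reports.

    HYPOTHETICAL rule: 'triple validated' needs at least three distinct
    sources within tolerance_c of the median report.
    """
    if not reports:
        return "no data"
    if len({source for source, _ in reports}) < 2:
        return "flag for validation: single source"
    center = median(value for _, value in reports)
    agreeing = {s for s, v in reports if abs(v - center) <= tolerance_c}
    if len(agreeing) >= 3:
        return "triple validated"
    return "flag for validation: sources disagree or too few"
```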
30. Open Melting Points in Supplementary Data Pages of Wikipedia (Martin Walker)
35. Searching for aldol condensations of acetone in the Reaction Attempts database (about 90% of reactions in Open Notebooks are “not successful”) (Andrew Lang)
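Because failed attempts are kept in the Reaction Attempts database, a search returns them alongside the successes. The sketch below shows such a filter over hypothetical records. The record fields, page names and example reactions are invented for illustration and do not reflect the database's real schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReactionAttempt:
    """HYPOTHETICAL record for one reaction attempt in an Open Notebook."""
    page: str
    reaction_type: str
    reactants: tuple
    successful: bool


def search(attempts, reaction_type, reactant):
    """All attempts of a given type that involve a reactant, failures included."""
    return [a for a in attempts
            if a.reaction_type == reaction_type and reactant in a.reactants]


def failure_fraction(attempts):
    """Share of attempts recorded as not successful."""
    return sum(not a.successful for a in attempts) / len(attempts) if attempts else 0.0
```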
36. An example of a “failed experiment” in an Open Notebook with useful information
50. Open Chemical Property Matrix (OCPM)
[Diagram: boiling point, vapor pressure, flash point, melting point, Abraham descriptors, logP, aqueous solubility and octanol solubility, linked in one property matrix]
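The slide groups these eight properties into one matrix. A minimal way to hold such a matrix assumes compounds as rows and the slide's properties as columns. Model predictions are added only where no measured value exists, and each entry is tagged with its origin so predicted and measured values stay distinguishable. The function names, the "origin" tag and the example compound are hypothetical.

```python
PROPERTIES = (
    "boiling point", "vapor pressure", "flash point", "melting point",
    "Abraham descriptors", "logP", "aqueous solubility", "octanol solubility",
)


def missing_properties(matrix, compound):
    """Properties with no entry yet for this compound."""
    row = matrix.get(compound, {})
    return [p for p in PROPERTIES if p not in row]


def fill_from_model(matrix, compound, prop, model):
    """Add a model prediction only where no value exists, tagging its origin."""
    row = matrix.setdefault(compound, {})
    if prop not in row:
        row[prop] = {"value": model(compound), "origin": "predicted"}
    return row[prop]
```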
55. Conclusions
More openness in chemistry can make science more efficient
Provide interfaces that make sense to end users:
• Open Data, Open Models and Open Source Software for modelers
• Apps (smartphones, Google Apps Script, etc.) for chemists at the bench
Acknowledgements
Andrew Lang (code, modeling)
Bill Acree (modeling, solubility data contribution)
Antony Williams (ChemSpider services, mp data curation)
Matthew McBride and Rida Atif (recrystallization and synthesis)
Kayla Gogarty (OCPM)