3. Chemistry on the Internet
Where do you source chemistry information?
What can you trust online?
How can you recognize potential issues?
Cross-referencing and curating data
6. Molfiles
Molfiles are the primary exchange format between structure drawing packages
Can differ between drawing packages
Most commonly carry X,Y coordinates for layout
Can support polymers, organometallics, etc.
Can carry 3D coordinates
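To illustrate how a Molfile carries coordinates, here is a minimal sketch of a reader for the fixed-width V2000 layout. The ethanol record and its coordinates are hypothetical sample data, and `parse_molfile` is an illustrative helper rather than any drawing package's own reader; real files contain many more fields than are read here.

```python
# Hypothetical V2000 Molfile for ethanol (heavy atoms only, 2D layout).
ETHANOL_MOLFILE = "\n".join([
    "ethanol",
    "  hypothetical-example",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.2990    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.5981    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  2  3  1  0",
    "M  END",
])


def parse_molfile(text):
    """Read atoms and bonds from a V2000 Molfile using fixed-width columns."""
    lines = text.splitlines()
    counts = lines[3]                  # line 4 is the counts line
    n_atoms = int(counts[0:3])
    n_bonds = int(counts[3:6])

    atoms = []
    for line in lines[4:4 + n_atoms]:
        x = float(line[0:10])
        y = float(line[10:20])
        z = float(line[20:30])
        symbol = line[31:34].strip()
        atoms.append((symbol, x, y, z))

    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        # Atom numbers in the bond block are 1-based.
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds


def is_3d(atoms):
    """Treat a record as 3D if any atom has a non-zero Z coordinate."""
    return any(z != 0.0 for _, _, _, z in atoms)


atoms, bonds = parse_molfile(ETHANOL_MOLFILE)
```

Because every atom carries X, Y and Z values, a 2D layout is just the special case where all Z values are zero.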
7. SMILES (http://en.wikipedia.org/wiki/SMILES)
SMILES is a common format
Can support polymers, organometallics, etc.
Does NOT carry X, Y or Z coordinates for layout, so requires layout algorithms, which can be problematic!
SMILES strings generally differ between drawing packages
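The variability noted above can be sketched with a toy example: several different SMILES strings can describe the same molecule, and none of them says where the atoms sit. The parser below is an assumption-laden illustration that handles only a tiny subset (the atoms C, N and O, implicit single bonds, and parentheses for branches); `parse_simple_smiles` and `same_structure` are hypothetical helpers, and real work should rely on a cheminformatics toolkit.

```python
from itertools import permutations


def parse_simple_smiles(smiles):
    """Parse a tiny SMILES subset into (atoms, bonds). No rings, charges or bond symbols."""
    atoms, bonds, branch_stack = [], set(), []
    previous = None
    for ch in smiles:
        if ch in "CNO":
            atoms.append(ch)
            index = len(atoms) - 1
            if previous is not None:
                bonds.add(frozenset((previous, index)))
            previous = index
        elif ch == "(":
            branch_stack.append(previous)    # remember where the branch starts
        elif ch == ")":
            previous = branch_stack.pop()    # return to the branch point
        else:
            raise ValueError(f"unsupported character in toy parser: {ch!r}")
    return atoms, bonds


def same_structure(smiles_a, smiles_b):
    """Brute-force graph comparison; only sensible for very small molecules."""
    atoms_a, bonds_a = parse_simple_smiles(smiles_a)
    atoms_b, bonds_b = parse_simple_smiles(smiles_b)
    if len(atoms_a) != len(atoms_b) or len(bonds_a) != len(bonds_b):
        return False
    n = len(atoms_a)
    for perm in permutations(range(n)):
        if all(atoms_a[i] == atoms_b[perm[i]] for i in range(n)):
            mapped = {frozenset(perm[i] for i in bond) for bond in bonds_a}
            if mapped == bonds_b:
                return True
    return False


# Three spellings of ethanol, and dimethyl ether for contrast.
ethanol_variants = ["CCO", "OCC", "C(O)C"]
```

The parsed result holds only atoms and bonds; any drawing produced from it needs coordinates generated by a separate layout step.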
12. InChI
SINGLE code base managed by IUPAC and integrated into drawing packages. Unlike SMILES, no variability between packages
InChI strings can be converted back to structures, with the same problem as SMILES: no layout
Well adopted by the community (databases, publishers, blogs, Wikipedia); good for searching the internet
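An InChI string is built from slash-separated layers, which is part of what makes it easy to search for as plain text. The sketch below splits the standard InChI for ethanol into its layers; `split_inchi` and the `LAYER_NAMES` table are illustrative helpers (not part of the IUPAC software), and the code assumes a formula layer is present.

```python
# Short descriptions of common layer prefixes (illustrative, not exhaustive).
LAYER_NAMES = {
    "c": "connectivity",
    "h": "hydrogens",
    "q": "charge",
    "p": "protonation",
    "b": "double-bond stereo",
    "t": "tetrahedral stereo",
    "i": "isotopes",
    "f": "fixed hydrogens",
}


def split_inchi(inchi):
    """Return (version, formula, [(prefix, body), ...]) for an InChI string."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    version = parts[0]                  # "1S" marks a standard InChI
    formula = parts[1] if len(parts) > 1 else ""
    # A list is used because prefixes can repeat in some InChI strings.
    layers = [(part[0], part[1:]) for part in parts[2:] if part]
    return version, formula, layers


version, formula, layers = split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
for prefix, body in layers:
    print(LAYER_NAMES.get(prefix, "other"), body)
```

Note that the layers describe atoms, bonds and hydrogens only; as with SMILES, there are no coordinates to draw from.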
26. InChI
No support for polymers, organometallics
Many option settings can lead to variability and make integration across databases difficult; the FixedH option is especially problematic
“Slight” chance of collisions of InChIKeys
VERY USEFUL FOR INTEGRATING THE WEB
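As a sketch of why InChIKeys help integrate the web, the code below checks the shape of a key and links records from different sources that share one. The source names are hypothetical, and `split_inchikey` and `link_records` are illustrative helpers. Because the key is a truncated hash of the InChI, two different structures could in principle share a key, so a match is strong evidence rather than proof.

```python
import re
from collections import defaultdict

# 14 letters (connectivity hash), 8 letters (remaining layers), a standard/non-standard
# flag, a version letter, then one letter for protonation.
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


def split_inchikey(key):
    """Split an InChIKey into its blocks; raise ValueError if it is malformed."""
    match = INCHIKEY_RE.match(key)
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    connectivity, detail, flag, version, protonation = match.groups()
    return {
        "connectivity": connectivity,
        "detail": detail,
        "standard": flag == "S",
        "version": version,
        "protonation": protonation,
    }


def link_records(records):
    """Group (source, inchikey) pairs by key so matching records can be merged."""
    linked = defaultdict(list)
    for source, key in records:
        split_inchikey(key)             # validate before linking
        linked[key].append(source)
    return dict(linked)


# Hypothetical sources; the keys are the standard InChIKeys of ethanol and benzene.
records = [
    ("vendor_catalogue", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"),
    ("encyclopedia", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"),
    ("property_db", "UHOVQNZJYSORNB-UHFFFAOYSA-N"),
]
linked = link_records(records)
```

A key generated with non-standard options (for example FixedH) will not match a standard key for the same compound, which is one reason option settings complicate integration.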
31. Where is chemistry online?
Encyclopedic articles (Wikipedia)
Chemical vendor databases
Metabolic pathway databases
Property databases
Patents with chemical structures
Drug Discovery data
Scientific publications
Compound aggregators
Blogs/Wikis and Open Notebook Science
33. How do we build it?
We deal in Molfiles or SDF files – with coordinates
Valence checking, charge imbalance
We have our own “business logic” to standardize structures
We use InChI to “aggregate tautomers” into one record
We link out to external sites using their IDs
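The valence and charge checks mentioned above can be sketched roughly as follows. This is not ChemSpider's actual business logic: the record format (atoms as `(symbol, formal_charge)` pairs, bonds as `(i, j, order)` with 0-based indices and explicit hydrogens), the `TYPICAL_VALENCE` table, and both function names are assumptions made for illustration.

```python
# Typical maximum valences for neutral atoms (a simplified, assumed table).
TYPICAL_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1}


def valence_warnings(atoms, bonds):
    """Return (index, symbol, total_bond_order) for neutral atoms over their typical valence."""
    totals = [0] * len(atoms)
    for i, j, order in bonds:
        totals[i] += order
        totals[j] += order

    warnings = []
    for index, ((symbol, charge), total) in enumerate(zip(atoms, totals)):
        limit = TYPICAL_VALENCE.get(symbol)
        # Charged atoms and unknown elements are skipped in this sketch.
        if limit is not None and charge == 0 and total > limit:
            warnings.append((index, symbol, total))
    return warnings


def net_charge(atoms):
    """Sum of formal charges; non-zero suggests a missing counterion or a drawing error."""
    return sum(charge for _, charge in atoms)


# A deliberately wrong record: carbon with five hydrogens attached.
bad_atoms = [("C", 0)] + [("H", 0)] * 5
bad_bonds = [(0, k, 1) for k in range(1, 6)]
print(valence_warnings(bad_atoms, bad_bonds))
```

Flagged records are candidates for human review rather than automatic rejection, since unusual valences can be legitimate.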
34. Searches: The INTERNET
All ChemSpider and Internet searches are “simply algorithms”, but synonym searching is based on an assertion
36. Validating structures
Check for “full stereo” and use stereo descriptors, especially for checking!
Check for quality of associated data sources
Check against reference literature when available, but it can be wrong
Question EVERYTHING!
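One way to sketch the “full stereo” check is to compare the InChI strings that different sources give for the same name and flag those lacking stereo layers (`/b`, `/t`, `/m`, `/s`) when another source supplies them. The source names are hypothetical, the two strings are believed to be InChIs for alanine with and without stereochemistry, and `stereo_layers` and `flag_missing_stereo` are illustrative helpers.

```python
STEREO_PREFIXES = {"b", "t", "m", "s"}


def stereo_layers(inchi):
    """Return the stereo layers found in an InChI string as {prefix: body}."""
    parts = inchi.split("/")[2:]        # skip the version and formula
    return {p[0]: p[1:] for p in parts if p and p[0] in STEREO_PREFIXES}


def flag_missing_stereo(records):
    """List sources with no stereo layers when at least one other source has them."""
    with_stereo = [source for source, inchi in records if stereo_layers(inchi)]
    without_stereo = [source for source, inchi in records if not stereo_layers(inchi)]
    return without_stereo if with_stereo else []


# Hypothetical sources reporting a structure for the same compound name.
records = [
    ("reference_db", "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"),
    ("vendor_catalogue", "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)"),
]
print(flag_missing_stereo(records))
```

A flagged source is not necessarily wrong (it may describe a racemate), and a source with stereo layers is not necessarily right, so the result is a prompt to check further rather than a verdict.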
38. Contributing to The Quality of Data
What is the Structure of Vitamin K?
A lipid cofactor that is required for normal blood clotting. Several forms of vitamin K have been identified: VITAMIN K1 (phytomenadione), derived from plants; VITAMIN K2 (menaquinone), from bacteria; and synthetic naphthoquinone provitamins, VITAMIN K3 (menadione).