In recent years, the growth of scientific data and the increasing need for data sharing and collaboration in the field of environmental chemistry have led to the creation of various software and databases that facilitate research and development into the safety and toxicity of chemicals. The US-EPA Center for Computational Toxicology and Exposure has been developing software and databases that serve the chemistry community for many years. This presentation will focus on several web-based software applications that have been developed at the US-EPA and made available to the community. While the primary software application from the Center is the CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard), almost a dozen proof-of-concept applications have been built to provide various capabilities. The publicly accessible Cheminformatics Modules (https://www.epa.gov/chemical-research/cheminformatics) provide access to six individual modules that allow hazard comparison for sets of chemicals, structure/substructure/similarity searching, structural alerts, and batch QSAR prediction of both physicochemical and toxicity endpoints. Other applications in development include a chemical transformations database and a database of analytical methods and open mass spectral data. Each of these depends on the underlying DSSTox chemicals database, a rich source of chemistry data for over 1.2 million chemical substances. To further extend the accessibility and usability of this vast repository, we have developed RESTful public APIs hosted on the secure cloud.gov environment, enabling seamless integration of these data into computational biology pipelines and web visualizations. We will provide an overview of all tools in development, the integrated nature of the applications built on the underlying chemistry data, and the API, which is now publicly available. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
Chemistry data delivery from the US-EPA to support environmental chemistry
1. The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
Antony Williams
(…but I represent many contributors!)
Government Cheminformatics Conference, October 2023
2. This talk is to open discussions…
• There are many tools developed by our cheminformatics team
and across other centers in EPA. I will represent ours only…
• We have production level public-facing tools, proof-of-concept
public-facing tools, and many tools in development…
• From proof-of-concept to public-facing can take a while
• This talk (and others from our team) is to make you aware of
our efforts and encourage discussions
3. Free-Access Cheminformatics Tools
• The Center for Computational Toxicology and Exposure has
delivered many tools
– CompTox Chemicals Dashboard
– Proof-of-Concept cheminformatics modules
• Chemicals Hazard Profiling
• AnalyticalQC data
• Chemical Transformations Database
• Structure standardizer
• Chemical Safety Profiling
• All chemicals are stored/curated in DSSTox
5. Accessing DSSTox chemistry: CompTox Chemicals Dashboard
• A publicly accessible website delivering:
– 1.2M chemicals with related property data
– Related substances: transformation products, monomers/polymers
– Experimental/predicted physicochemical property data
– Experimental human and ecological hazard data
– Integration with “biological assay data” (ToxCast/Tox21)
– Information regarding chemicals in consumer products
– Links to other agency websites and public data resources
– “Batch searching” for tens to thousands of chemicals
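Batch searching starts from a pasted list of mixed identifiers. The sketch below is a hypothetical pre-processing step, not the Dashboard's own parser: it assumes each input line is a DTXSID, a CAS Registry Number, an InChIKey, or a chemical name, and sorts lines by pattern. The CAS check-digit test is the standard CAS RN checksum; the regular expressions and category labels are our own assumptions.

```python
import re

# Illustrative patterns only (assumptions); the Dashboard's batch-search
# parsing rules are not described in this talk.
DTXSID_RE = re.compile(r"^DTXSID\d+$")
CASRN_RE = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def casrn_checksum_ok(casrn: str) -> bool:
    """Check a CAS Registry Number against its final check digit."""
    match = CASRN_RE.match(casrn)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weights 1, 2, 3, ... are applied from the rightmost digit leftwards.
    total = sum(w * int(d) for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def classify(identifier: str) -> str:
    """Guess which kind of identifier a single batch-input line holds."""
    text = identifier.strip()
    if DTXSID_RE.match(text):
        return "dtxsid"
    if CASRN_RE.match(text):
        return "casrn" if casrn_checksum_ok(text) else "invalid_casrn"
    if INCHIKEY_RE.match(text):
        return "inchikey"
    return "name"  # anything else is treated as a chemical name


def classify_batch(lines):
    """Group non-blank batch-input lines by detected identifier type."""
    groups = {}
    for line in lines:
        if line.strip():
            groups.setdefault(classify(line), []).append(line.strip())
    return groups
```

For example, `classify_batch(["50-00-0", "DTXSID7020182", "water"])` separates the CAS RN, the DTXSID and the name into three groups, and a CAS RN with a wrong check digit is flagged rather than searched.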
7. 1 of ~1.2M Chemical Pages
Experimental/Predicted Properties
8. Lots of “proof-of-concept” tools in development
• PoCs are research software builds to prove approaches
before moving into production software environments
• PoCs help us figure out how to address specific questions
• Assemble data, develop data model(s), test user interface
approaches, work with test user base to garner feedback
• Since PoCs are internal-access, data refreshes and application
updates can be more
• Underlying APIs are being used in our research
9. PoCs have been rebuilt for production
• Examples of PoCs integrated into production apps
– WebTEST predictions on the Dashboard
– Structure/substructure/similarity search
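Similarity searching of this kind is commonly built on fingerprint comparison. As a minimal sketch, assuming fingerprints are already available as sets of "on" bit positions (the fingerprint type used by the Dashboard is not stated here), Tanimoto ranking and a substructure pre-screen can be written as follows; the function names and the 0.8 default threshold are illustrative assumptions.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient between two fingerprint bit sets."""
    if not a and not b:
        return 0.0  # convention for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_search(query: set, library: dict, threshold: float = 0.8):
    """Return (identifier, score) pairs at or above threshold, best first."""
    hits = [(cid, tanimoto(query, fp)) for cid, fp in library.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def substructure_screen(query: set, library: dict):
    """Fingerprint pre-screen only: a true substructure hit must contain
    every query bit, but survivors still need an atom-by-atom match."""
    return [cid for cid, fp in library.items() if query <= fp]
```

The pre-screen is cheap set containment, which is why it is usually run before the expensive graph match.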
20. WebTEST Batch Prediction
• Batch prediction of all WebTEST endpoints
• Display of experimental and predicted data and reports
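WebTEST batch prediction applies several QSAR methods to each chemical. In T.E.S.T., the consensus value is, to our understanding, the average of the individual method predictions that fall within their applicability domains. The sketch below assumes each method's output arrives as a float, or None when out of domain; the method names and the `min_methods` cutoff are hypothetical.

```python
from statistics import mean
from typing import Dict, Optional

Predictions = Dict[str, Optional[float]]  # method name -> value or None


def consensus(predictions: Predictions, min_methods: int = 2) -> Optional[float]:
    """Average the available method predictions for one chemical.

    None marks a method that made no prediction (e.g. outside its
    applicability domain). min_methods is an assumed cutoff, not T.E.S.T.'s.
    """
    values = [v for v in predictions.values() if v is not None]
    if len(values) < min_methods:
        return None
    return mean(values)


def batch_consensus(batch: Dict[str, Predictions]) -> Dict[str, Optional[float]]:
    """Apply the consensus step to every chemical in a batch."""
    return {chem: consensus(preds) for chem, preds in batch.items()}
```

Usage: `batch_consensus({"chem1": {"hierarchical": 1.0, "nearest_neighbor": 3.0, "group_contribution": None}})` averages the two available values for `chem1`.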
21. QSAR-Ready/MS-Ready Standardizer
• “QSAR and MS-Ready” standardization underpins models and linking
• MS-Ready is ESSENTIAL to our support of Non-Targeted Analysis
• QSAR-Ready rules need tweaking
https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0299-2
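MS-Ready forms are typically produced by operations such as removing salts and counterions, stripping stereochemistry and neutralizing charges, so that related forms of a substance collapse to one structure for mass-spectral matching. The following is a deliberately crude, string-level sketch of only the first two steps, assuming simple SMILES input. It is not the published workflow, does not neutralize charges or handle tautomers, and its output is not canonical; a real pipeline should use a cheminformatics toolkit.

```python
def largest_fragment(smiles: str) -> str:
    """Keep the longest dot-separated fragment (a crude proxy for the
    parent structure; counterions such as Cl or [Na+] are usually short)."""
    return max(smiles.split("."), key=len)


def strip_stereo(smiles: str) -> str:
    """Remove tetrahedral (@, @@) and double-bond (/ and backslash) markers.
    Extended chirality tags such as @TH1 are NOT handled by this sketch."""
    return smiles.replace("@", "").replace("/", "").replace("\\", "")


def ms_ready_sketch(smiles: str) -> str:
    """Desalt, then strip stereochemistry (hypothetical helper)."""
    return strip_stereo(largest_fragment(smiles))
```

For instance, an alanine hydrochloride SMILES with a stereocentre reduces to a single stereo-free fragment, so the salt and the free form would match the same mass.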
26. Excel report of the models for each data set
• Cover sheet with model metadata
• Training and test set statistics
• Prediction results for each method
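The training and test set statistics in each report are computed from observed and predicted values. The exact statistics on the report are not listed here, so this sketch assumes two common regression metrics, the coefficient of determination (R²) and the root-mean-square error (RMSE); the function names are hypothetical.

```python
from math import sqrt


def r_squared(observed, predicted) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot


def rmse(observed, predicted) -> float:
    """Root-mean-square error between observed and predicted values."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def set_statistics(observed, predicted) -> dict:
    """Summary row for one training or test set."""
    return {"n": len(observed), "R2": r_squared(observed, predicted), "RMSE": rmse(observed, predicted)}
```

Calling `set_statistics` once on the training set and once on the test set gives the two rows a report like this would show side by side.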
35. ChET Visual Reaction Maps
• Compare and overlap maps
• Load all maps containing a
particular chemical
• Prune and filter maps
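One way to support loading, overlapping and pruning transformation maps is to hold each map as a set of parent-to-product edges. This is a hypothetical data model for illustration, since ChET's internal representation is not described in the talk; chemical identifiers are plain strings here, which in practice might be DTXSIDs.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (parent chemical, transformation product)


def chemicals_in(edges: Set[Edge]) -> Set[str]:
    """All chemicals appearing anywhere in a map."""
    return {chem for edge in edges for chem in edge}


def maps_containing(maps: Dict[str, Set[Edge]], chemical: str) -> List[str]:
    """Names of every map that includes the given chemical."""
    return sorted(name for name, edges in maps.items() if chemical in chemicals_in(edges))


def overlap(map_a: Set[Edge], map_b: Set[Edge]) -> Set[Edge]:
    """Transformations shared by two maps."""
    return map_a & map_b


def prune(edges: Set[Edge], keep: Set[str]) -> Set[Edge]:
    """Keep only edges whose parent and product both pass the filter."""
    return {(parent, product) for parent, product in edges if parent in keep and product in keep}
```

Storing edges rather than drawings makes "compare and overlap" a set intersection and "load all maps containing a chemical" a simple membership scan.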
36. Chemical Space Mapping (CheMSTER)
Chemical Mapping of Space Translated into Enhanced
Representations
• Initially built to support
NTA research
• Functionality to overlap
and compare datasets
• Selection of chemicals
based on variables
(predicted properties)
• Pluggable, growing set of
models adds variables for
comparison
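Dataset comparison and property-based selection, as described for CheMSTER, reduce to set operations and range filters over predicted property values. The sketch assumes each dataset is a mapping from chemical ID to predicted properties; property names such as `logP` and `MW` are illustrative placeholders, not CheMSTER's actual variables.

```python
from typing import Dict, Iterable, Set, Tuple

Props = Dict[str, float]  # property name -> predicted value


def select_by_ranges(dataset: Dict[str, Props], ranges: Dict[str, Tuple[float, float]]) -> Set[str]:
    """IDs whose predicted properties all fall within the inclusive ranges.
    Chemicals lacking a requested property are excluded."""
    return {
        cid
        for cid, props in dataset.items()
        if all(name in props and lo <= props[name] <= hi for name, (lo, hi) in ranges.items())
    }


def compare_datasets(a: Iterable[str], b: Iterable[str]) -> Dict[str, Set[str]]:
    """Overlap and differences between two sets of chemical IDs."""
    ids_a, ids_b = set(a), set(b)
    return {"shared": ids_a & ids_b, "only_a": ids_a - ids_b, "only_b": ids_b - ids_a}
```

Adding a new model then only means adding another key to each chemical's property dictionary, which the range filter picks up without changes.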
37. Perfect Example of FAIR Data and APIs
• We owe a lot to FAIR data and the availability of information
• We curate a lot of our chemistry data using public resources
such as PubChem, ChEBI, Common Chemistry and others
• The availability of Public APIs takes things to another level!
• We have been using the PubChem API to harvest data so
we can build new applications, like the Safety Module
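PubChem's PUG REST interface is one of the public APIs that makes this kind of harvesting possible. A minimal request for computed properties by compound name is sketched below; the property list is only an example, the Safety Module's actual queries are not shown here, and PubChem's usage policy limits request rates (no more than five requests per second), so bulk harvesting should pause between calls.

```python
import json
import urllib.request
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(name: str, properties=("MolecularFormula", "MolecularWeight", "IUPACName")) -> str:
    """Build a PUG REST URL for computed properties looked up by name."""
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/property/{','.join(properties)}/JSON"


def fetch_properties(name: str) -> list:
    """Fetch the property records (one per matching CID) for a name."""
    with urllib.request.urlopen(property_url(name), timeout=30) as resp:
        payload = json.load(resp)
    return payload["PropertyTable"]["Properties"]
```

`fetch_properties("caffeine")` performs a live network call; building the URL separately keeps the request logic testable offline.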
39. The CompTox API is now public
https://api-ccte.epa.gov/docs/index.html
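A minimal sketch of calling the public CompTox API with only the Python standard library. The endpoint path (`/chemical/detail/search/by-dtxsid/`) and the `x-api-key` header reflect our reading of the API documentation linked above and should be checked there before use; an API key is required, and the key value in any example is a placeholder.

```python
import json
import urllib.request

BASE_URL = "https://api-ccte.epa.gov"  # host from the documentation URL above


def build_request(dtxsid: str, api_key: str) -> urllib.request.Request:
    """Prepare a GET request for one chemical's details.
    Assumed endpoint path and header name; confirm against the API docs."""
    url = f"{BASE_URL}/chemical/detail/search/by-dtxsid/{dtxsid}"
    return urllib.request.Request(url, headers={"x-api-key": api_key, "Accept": "application/json"})


def get_chemical_details(dtxsid: str, api_key: str) -> dict:
    """Send the request and decode the JSON response (live network call)."""
    with urllib.request.urlopen(build_request(dtxsid, api_key), timeout=30) as resp:
        return json.load(resp)
```

Separating request construction from the network call makes it easy to swap in other documented endpoints or to batch many DTXSIDs through the same helper.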
40. Conclusions
• The underpinning chemistry data come from the DSSTox database
• The CompTox Chemicals Dashboard provides public access to DSSTox
and other related databases
• Proof-of-Concept (PoC) tools are built to prove approaches
• This approach is both cost- and time-efficient
• Everything is increasingly API driven and APIs are now public
41. Acknowledgments
• Our DSSTox curation team
• AMOS – Greg Janesch and Tyler Carr
• AnalyticalQC Viewer – Christian Ramsland
• Cheminformatics Modules – Nate Charest, Charlie Lowe,
Todd Martin
• ChET – Adam Edelman-Munoz, Caroline Stevens and team
• CheMSTER – Nate Charest and Adam Edelman-Munoz
• Our SCDCD colleagues and DevOps team