Over the past decade, cheminformatics development at the US EPA has produced a data and chemistry software infrastructure built on the underlying DSSTox database and a series of publicly available web-based applications. The primary application delivering these data to the community is the CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard/), which provides access to integrated data sources for chemicals, hazard data, bioactivity screening data, and experimental and predicted property data. A number of additional proof-of-concept applications have been developed to test novel ways of integrating and delivering data for specific use cases. One example is the delivery of integrated hazard and safety data for chemical substances covered by the Clean Water Act (CWA) regulation being released in 2024 for facilities storing CWA hazardous substances, which includes approximately 300 substances. Many of these hazardous substances are stored in aboveground storage tanks that are potentially under threat because climate change can increase the flooding potential on coastal and inland waterways, as a result of a rise in sea level and the potential risk of subsidence. This presentation will give an overview of the data and capabilities of the hazard and safety cheminformatics modules, which will be integrated into a tool to help emergency planners and responders, water utilities, and hazardous substance facility owners/operators better prepare for and respond to potential discharges. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
Cheminformatics tools supporting dissemination of data associated with US EPA Clean Water Act hazardous substances
1. The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
Cheminformatics tools supporting
dissemination of data associated with US EPA
Clean Water Act hazardous substances
Antony Williams1, Fran Kremer2, Jason Lambert1, Jace Cuje3 and Valery Tkachenko4
1. Center for Computational Toxicology and Exposure, US-EPA
2. Center for Environmental Solutions and Emergency Response, US-EPA
3. Office of Science Advisor, Policy and Engagement. US-EPA
4. ScienceDataExperts Inc.
March 2024: Spring Meeting, New Orleans, LA
2. Data, Model and Tool Development
• I work for the Center for Computational Toxicology and Exposure
in the Computational Chemistry and Cheminformatics Branch
• There are many tools developed by our cheminformatics team and
across other centers in EPA. I will represent ours only…
• We have production level public-facing tools, proof-of-concept
public-facing tools, and many tools in development…
• We focus on FAIR data, releasing it to the community and making
it available through public APIs
3. Free-Access Cheminformatics Tools
• The Center for Computational Toxicology and Exposure has
delivered many tools including
– CompTox Chemicals Dashboard
– Proof-of-Concept cheminformatics modules
• Chemicals Hazard Profiling
• Chemical Transformations Database
• Analytical Methods and Spectra
• Chemical Safety Profiling
7. Curating Chemistry into the DSSTox Database
• Chemistry underpins all of our tools
• Data assembly and curation is critical
• DSSTox assembled over 25 years
9. The Charge for the Dashboard
• Develop a “first-stop-shop” for environmental chemical data to
support EPA and partner decision making:
– Centralized location for relevant chemical data
– Chemistry, exposure, hazard and dosimetry
– Combination of existing data and predictive models
– Publicly accessible, periodically updated, curated
• Easy access to data improves efficiency and ultimately
accelerates chemical risk assessment
11. Experimental and Predicted Data
• Physchem and Fate & Transport
experimental and predicted data
• Data can be downloaded as Excel,
TSV and CSV files
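Once a TSV or CSV export has been downloaded, it can be read with standard tooling. The following is a minimal sketch only: the column names (DTXSID, PROPERTY, SOURCE, VALUE, UNIT), the placeholder identifiers, and the numeric values are assumptions made for illustration and do not reproduce the actual export layout. The rule of preferring an experimental value over a predicted one is likewise an illustrative choice, not a stated policy.

```python
import csv
import io

# Hypothetical excerpt of a tab-separated export. Column names, identifiers
# and values are placeholders, not real Dashboard data.
SAMPLE_TSV = (
    "DTXSID\tPROPERTY\tSOURCE\tVALUE\tUNIT\n"
    "DTXSID_EXAMPLE_1\tWater Solubility\texperimental\t0.5\tmol/L\n"
    "DTXSID_EXAMPLE_1\tWater Solubility\tpredicted\t0.4\tmol/L\n"
    "DTXSID_EXAMPLE_2\tVapor Pressure\tpredicted\t12.0\tmmHg\n"
)


def load_property_rows(text, delimiter="\t"):
    """Parse delimited text into a list of dicts with numeric VALUE fields."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for row in reader:
        row["VALUE"] = float(row["VALUE"])
        rows.append(row)
    return rows


def prefer_experimental(rows):
    """Keep one row per (chemical, property), preferring experimental data."""
    best = {}
    for row in rows:
        key = (row["DTXSID"], row["PROPERTY"])
        current = best.get(key)
        if current is None or (
            row["SOURCE"] == "experimental"
            and current["SOURCE"] != "experimental"
        ):
            best[key] = row
    return best


rows = load_property_rows(SAMPLE_TSV)
selected = prefer_experimental(rows)
```

Passing `delimiter=","` to `load_property_rows` would handle a CSV file with the same assumed columns.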
16. Chemical Lists
• Chemical lists are focused on regulations, specific research
efforts and categories
• 425 lists and growing
– TSCA Inventory
– Clean Water Act Hazardous Substances
– Consumer Products database
– Chemicals of Emerging Concern
– PFAS lists
– Extractables and Leachables
– …lists are versioned and updated and new lists added
21. Harvesting Data en masse
• Harvesting data for CWAHS related chemicals
–Physicochemical properties
–Fate and transport
–Toxicity values
–Exposure data
–Chemical identifiers
–Links to regulatory assessments
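One way to picture harvesting en masse is a loop that requests each data category for batches of chemicals and merges the results into one profile per substance. The sketch below is hypothetical throughout: the category keys are simply the bullets above turned into names, and the fetcher callables are stand-ins supplied by the caller rather than real EPA services.

```python
from collections import defaultdict

# Category names derived from the bullets above; the keys are assumptions.
CATEGORIES = (
    "physicochemical_properties",
    "fate_and_transport",
    "toxicity_values",
    "exposure_data",
    "chemical_identifiers",
    "regulatory_links",
)


def chunked(items, size):
    """Yield successive batches so a long list can be requested piecewise."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def harvest(identifiers, fetchers, batch_size=50):
    """Build one profile per chemical by calling each fetcher on each batch.

    `fetchers` maps a category name to a callable that takes a list of
    identifiers and returns {identifier: data}. No real service is assumed.
    """
    profiles = defaultdict(dict)
    for batch in chunked(list(identifiers), batch_size):
        for category, fetch in fetchers.items():
            for identifier, data in fetch(batch).items():
                profiles[identifier][category] = data
    return dict(profiles)


def missing_categories(profile):
    """List the categories that a harvested profile still lacks."""
    return [c for c in CATEGORIES if c not in profile]


# Stand-in fetchers returning placeholder strings instead of real data.
demo_fetchers = {
    "chemical_identifiers": lambda ids: {i: "identifiers for " + i for i in ids},
    "toxicity_values": lambda ids: {i: "toxicity for " + i for i in ids[:1]},
}
demo = harvest(["CHEM_A", "CHEM_B", "CHEM_C"], demo_fetchers, batch_size=2)
```

Calling `missing_categories` on each profile gives a quick view of where the harvested record for a substance is still incomplete.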
27. We supply predicted data for many endpoints
• Property prediction – e.g., water solubility, vapor pressure
• Fate and Transport – e.g., bioaccumulation, bioconcentration
• Bioactivity – e.g., endocrine disruption
• Models are constantly updated with fresh data, are transparent
in their data, and are open source
28. QSAR Modeled Data are available
• We build models, then apply them to our curated datasets
for release, PLUS deliver the models for real-time use
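The two delivery modes, batch predictions over a curated dataset and on-demand prediction for a new input, can be illustrated with a deliberately tiny stand-in. This is not one of the actual QSAR models: it is a one-descriptor least-squares line fitted to synthetic numbers, and all names and values are assumptions chosen only to show the fit-then-apply pattern.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Apply a fitted (slope, intercept) model to one descriptor value."""
    slope, intercept = model
    return slope * x + intercept


# Synthetic (descriptor, property) pairs lying exactly on y = 2x + 1.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]
model = fit_line(train_x, train_y)

# Batch mode: precompute predictions for a (hypothetical) curated dataset.
curated = {"CHEM_A": 4.0, "CHEM_B": 5.5}
batch_predictions = {name: predict(model, x) for name, x in curated.items()}

# Real-time mode: the same model answers a single new query on demand.
single = predict(model, 10.0)
```

Real QSAR models use many descriptors and validated training sets; the point here is only that one fitted model object serves both the released predictions and interactive use.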
29. Where do we use predictions like this?
• Models are used in many places in our computational
toxicology research
• They are used in the analytical labs to help guide non-targeted analysis
• By stakeholders for hazard profiling of chemicals
• Predictions for breakdown
products in the environment
31. Lots of “proof-of-concept” tools in development
• PoCs are research software builds to prove approaches
before moving into production software environments
• PoCs are to figure out how to address specific questions
• Assemble data, develop data model(s), test user interface
approaches, work with test user base to garner feedback
• Since PoCs are internal access, data refreshes and application
updates can be more frequent
• Underlying APIs are being used in our research
43. Perfect Example of FAIR Data and APIs
• We owe a lot to FAIR data and availability of information
• We curate a lot of our chemistry data using public resources
such as PubChem, ChEBI, Common Chemistry and others
• The availability of Public APIs takes things to another level!
• We have been using the PubChem API to harvest data so
we can build new applications, like the Safety Module
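As a rough illustration of harvesting from PubChem, the sketch below builds request URLs for the public PUG REST and PUG View services using only the Python standard library. The slides do not say which endpoints or sections were actually used, so the choice of a name-to-CID lookup and the "Safety and Hazards" heading is an assumption, and the helper names are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
PUG_VIEW = "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view"


def cid_lookup_url(name):
    """URL that resolves a chemical name to PubChem compound IDs (CIDs)."""
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/cids/JSON"


def safety_section_url(cid, heading="Safety and Hazards"):
    """URL for one annotated section of a compound record via PUG View.

    The default heading is an assumption about which section a safety
    module would want; other section headings can be passed instead.
    """
    query = urlencode({"heading": heading})
    return f"{PUG_VIEW}/data/compound/{int(cid)}/JSON?{query}"


def fetch_json(url, timeout=30):
    """Download and decode a JSON response (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Building the URLs needs no network; 2244 is just an arbitrary example CID.
lookup_url = cid_lookup_url("sulfuric acid")
section_url = safety_section_url(2244)
```

When harvesting a whole list this way, requests should be spaced out in line with PubChem's published usage policy, and the responses cached so that repeated application builds do not re-query the service.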
46. You want to know more…
• Lots of resources available
– Presentations: https://tinyurl.com/w5hqs55
– Communities of Practice Videos: https://rb.gy/qsbno1
– Manual: https://rb.gy/4fgydc
– Latest News: https://comptox.epa.gov/dashboard/news_info
47. This talk is an overview
• This talk is a high-level overview only. We
can provide training on the individual
modules and data as required
• LOTS of training materials are available
https://www.epa.gov/chemical-research/new-approach-methods-nams-training
48. Conclusions
• Underpinning chemistry data is from the DSSTox database
• The CompTox Chemicals Dashboard provides public access to DSSTox
and other related databases
• Proof-of-Concept (PoC) tools are built to prove approaches
• Everything is increasingly API driven and APIs are now public
49. Contact Information
• Contact info: williams.antony@epa.gov
• Slides available at: https://www.slideshare.net/AntonyWilliams/
• Obtain articles from Google Scholar Profile