Tox24 Challenge

Goal: The Tox24 Challenge[1] is designed to assess progress in computational methods for predicting the in vitro activity of compounds. The development of such methods has steadily gained momentum in computational chemistry since the groundbreaking Tox21 Challenge[2]. Since then, the field has seen significant advances, including new data[3], novel modelling strategies[4] and diverse applications[5], all of which are important topics for the Chemical Research in Toxicology (ChemResTox) community of contributors and readers. As in the previous challenge, the goal of Tox24 is to "crowdsource" methods and approaches from independent researchers in order to reveal how well they can predict the interference of compounds with biochemical pathways using only chemical structure data.
Data: The chemicals tested for binding to transthyretin (TTR)[6] within the “Toxicology in the 21st Century” (Tox21) initiative by the EPA[3] will be used as the training and test sets for the Tox24 challenge. The initial dataset (1512 compounds) has been split into a training set (1012 compounds), a leaderboard set (200 compounds) and a blind set (300 compounds). The structural information (SMILES) for these compounds was originally retrieved from the U.S. EPA’s CompTox Chemicals Dashboard[7] v. 2.4 (https://comptox.epa.gov/dashboard) for the ToxCast libraries (ph1_v2, ph2, and e1k). However, several compounds in the original set had no SMILES in ToxCast, so their structures were retrieved from PubChem based on CAS RN and names. Since this step could be error-prone, participants may search for more accurate structural representations of the compounds to improve the accuracy of their models, if required.
Teams: Each participant can join only one team. One participant from each team can register for an account (or use an existing account) on the OCHEM website under their full first and last name. This account will be used to submit predictions for the test sets. This participant should provide a list of team members with their full names, e-mails and affiliations before September 1st. If no such list is provided, the team will be assumed to consist of one participant. Alternatively, this information (along with the predictions as a CSV file) can be sent by e-mail to challenge@icann2024.org.
Winning solution: The winning model will be the one with the lowest RMSE between the predicted and experimentally measured single-point values for the blind set compounds. The final prediction submitted by each team will be used to score the blind test results. RMSE will be rounded to three digits (e.g., 21.7%); in case of equal results, the first eligible final result to be submitted, whether via the website or by e-mail, will be considered the winning contribution. Alternatively, predictions for the blind set can be sent by e-mail to challenge@icann2024.org together with a list of team members and their affiliations. Results submitted by e-mail will be evaluated only after web submission closes (August 31).
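The scoring rule above can be sketched in Python. This is a minimal illustration, not the Organisers' scoring code: it assumes that "rounded to three digits" means three significant digits (consistent with the 21.7% example), and the `Submission` record, its field names and all helper functions are hypothetical.

```python
import math
from dataclasses import dataclass
from datetime import datetime


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def round_sig(x, digits=3):
    """Round x to `digits` significant digits (e.g. 21.74 -> 21.7)."""
    if x == 0:
        return 0.0
    return round(x, digits - 1 - math.floor(math.log10(abs(x))))


@dataclass
class Submission:
    team: str               # hypothetical field names, for illustration only
    submitted_at: datetime  # time the submission was received
    rmse: float             # blind-set RMSE of this submission


def final_per_team(submissions):
    """Keep only the latest (final) submission of each team."""
    latest = {}
    for s in submissions:
        if s.team not in latest or s.submitted_at > latest[s.team].submitted_at:
            latest[s.team] = s
    return list(latest.values())


def rank(submissions):
    """Sort final submissions by rounded RMSE; ties go to the earliest submission."""
    finals = final_per_team(submissions)
    return sorted(finals, key=lambda s: (round_sig(s.rmse), s.submitted_at))


if __name__ == "__main__":
    subs = [
        Submission("A", datetime(2024, 8, 30), 21.74),
        Submission("B", datetime(2024, 8, 29), 21.66),
        Submission("C", datetime(2024, 8, 20), 21.81),
        Submission("C", datetime(2024, 8, 10), 20.00),  # superseded by C's later submission
    ]
    print([s.team for s in rank(subs)])  # -> ['B', 'A', 'C']
```

Here teams A and B both round to 21.7, so B wins the tie by having submitted earlier, while team C is scored on its final submission even though an earlier one had a lower RMSE.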
Validation of the winning model: To be eligible, the winning solution should be released as open source and/or be independently reproducible by the Challenge Organisers following instructions provided by the team members. If this requires additional licences, they should be provided free of charge, solely for model testing. A reproduced accuracy that differs slightly from the submitted one will not disqualify the model, provided the difference is not statistically significant (p > 0.05, as determined by bootstrap evaluation[8]), e.g., when it arises from the random initialisation of neural network weights. Models that cannot be reproduced will be disqualified. The use of any related data to increase model accuracy (e.g., within multi-task modelling, pretraining, etc.) is allowed, except for data measured in exactly the same assay. Such data should be made publicly available during model validation.
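A paired bootstrap over the blind-set compounds is one way to check whether two RMSE values differ significantly, for example submitted versus reproduced predictions, or the winning model versus another team's model. The sketch below is a generic percentile-style paired bootstrap and is not necessarily the exact procedure described in [8]; the function names, the number of resamples and the fixed seed are illustrative assumptions.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def bootstrap_rmse_pvalue(y_true, pred_a, pred_b, n_boot=2000, seed=0):
    """Two-sided paired bootstrap p-value for 'RMSE(a) equals RMSE(b)'.

    Both prediction sets are evaluated on the same resampled compounds. The
    p-value is twice the smaller fraction of resampled RMSE differences lying
    on either side of zero, capped at 1.
    """
    n = len(y_true)
    if n == 0 or not (n == len(pred_a) == len(pred_b)):
        raise ValueError("inputs must be non-empty and of equal length")
    rng = random.Random(seed)  # fixed seed so the check itself is reproducible
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample compounds with replacement
        yt = [y_true[i] for i in idx]
        a = [pred_a[i] for i in idx]
        b = [pred_b[i] for i in idx]
        diffs.append(rmse(yt, a) - rmse(yt, b))
    le = sum(d <= 0 for d in diffs) / n_boot
    ge = sum(d >= 0 for d in diffs) / n_boot
    return min(1.0, 2 * min(le, ge))


if __name__ == "__main__":
    y = [float(i) for i in range(50)]
    submitted = [v + 0.1 for v in y]
    reproduced = [v + 0.1 for v in y]
    print(bootstrap_rmse_pvalue(y, submitted, reproduced))  # -> 1.0
```

Under the rule above, a reproduced model would not be disqualified when the returned p-value exceeds 0.05; the same comparison could be applied between the winning model and other teams' models.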
Prize: The winning team will be announced during ICANN2024 and will be awarded a prize of 1,000€ sponsored by the AIDD project. A member of the winning team will be invited to give a lecture at the "AI in drug discovery" workshop, either in person if attending ICANN2024 or via Zoom. The winning team, as well as other teams whose models achieve an RMSE not significantly different from that of the winning solution, will be invited to publish their studies or protocols in ChemResTox. By participating in this challenge, each team therefore agrees to contribute such an article to ChemResTox if it develops a winning model.
Organisers: The challenge is co-organised by the Marie Skłodowska-Curie Innovative Training Network European Industrial Doctorate “Advanced machine learning for Innovative Drug Discovery” (AIDD, grant agreement No. 956832), the Horizon Europe Marie Skłodowska-Curie Actions Doctoral Network “Explainable AI for Molecules” (AiChemist, grant agreement No. 101120466), Chemical Research in Toxicology and ICANN2024.
References

(1) Tetko, I. V. Tox24 Challenge. Chem. Res. Toxicol. 2024, 37, 825–826. https://doi.org/10.1021/acs.chemrestox.4c00192

(2) Huang, R.; Xia, M. Editorial: Tox21 Challenge to Build Predictive Models of Nuclear Receptor and Stress Response Pathways As Mediated by Exposure to Environmental Toxicants and Drugs. Front. Environ. Sci. 2017, 5. https://doi.org/10.3389/fenvs.2015.00085

(3) Richard, A. M.; Huang, R.; Waidyanatha, S.; Shinn, P.; Collins, B. J.; Thillainadarajah, I.; Grulke, C. M.; Williams, A. J.; Lougee, R. R.; Judson, R. S.; Houck, K. A.; Shobair, M.; Yang, C.; Rathman, J. F.; Yasgar, A.; Fitzpatrick, S. C.; Simeonov, A.; Thomas, R. S.; Crofton, K. M.; Paules, R. S.; Bucher, J. R.; Austin, C. P.; Kavlock, R. J.; Tice, R. R. The Tox21 10K Compound Library: Collaborative Chemistry Advancing Toxicology. Chem. Res. Toxicol. 2020. https://doi.org/10.1021/acs.chemrestox.0c00264

(4) Kleinstreuer, N. C.; Tetko, I. V.; Tong, W. Introduction to Special Issue: Computational Toxicology. Chem. Res. Toxicol. 2021, 34 (2), 171–175. https://doi.org/10.1021/acs.chemrestox.1c00032

(5) Klambauer, G.; Clevert, D.-A.; Shah, I.; Benfenati, E.; Tetko, I. V. Introduction to the Special Issue: AI Meets Toxicology. Chem. Res. Toxicol. 2023, 36 (8), 1163–1167. https://doi.org/10.1021/acs.chemrestox.3c00217

(6) Eytcheson, S. A.; Zosel, A. D.; Olker, J. H.; Hornung, M. W.; Degitz, S. J. Screening the ToxCast Chemical Libraries for Binding to Transthyretin. Chem. Res. Toxicol. 2024, 37 (10), 1670–1681.

(7) Williams, A. J.; Grulke, C. M.; Edwards, J.; McEachran, A. D.; Mansouri, K.; Baker, N. C.; Patlewicz, G.; Shah, I.; Wambaugh, J. F.; Judson, R. S.; Richard, A. M. The CompTox Chemistry Dashboard: A Community Data Resource for Environmental Chemistry. J. Cheminformatics 2017, 9 (1), 61. https://doi.org/10.1186/s13321-017-0247-6

(8) Vorberg, S.; Tetko, I. V. Modeling the Biodegradability of Chemical Compounds Using the Online CHEmical Modeling Environment (OCHEM). Mol. Inform. 2014, 33 (1), 73–85. https://doi.org/10.1002/minf.201300030

