Tox24 Challenge Data

Data correction 21.06.24: We reviewed and corrected several structures following remarks from challenge participants. The corrections concerned mainly mixtures and salts. The structure check was extended to prioritise structures as provided by the ACS CAS service. In cases where the CAS suggestions were ambiguous, we used structures as retrieved from PubChem. The structural information and descriptors (see below) were also updated.
The leaderboard set was released on 15.08.24. You may use this set to increase the accuracy of your model.
The Challenge data were collected from the EPA dataset, which was split into training, leaderboard and blind sets.
The datasets (SMILES and activities for the training set) can be retrieved from the original EPA dataset or downloaded from these links:
tox24_challenge_train.csv (1012 compounds)
tox24_challenge_leaderboard.csv (released 15.08.24, 200 compounds with data)
tox24_challenge_test.csv (released 01.09.24, blind set, 300 compounds with data)
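A minimal sketch for reading one of these CSV files with Python's standard `csv` module. The column names `SMILES` and `activity` are assumptions (check the header of the downloaded file and pass the actual names), and `load_tox24_set` is a hypothetical helper name. Rows with an empty activity field are kept with `None`.

```python
import csv
import io
from typing import List, Optional, TextIO, Tuple


def load_tox24_set(stream: TextIO, smiles_col: str = "SMILES",
                   activity_col: str = "activity") -> List[Tuple[str, Optional[float]]]:
    """Read (SMILES, activity) pairs from a challenge CSV stream.

    Hypothetical helper: the default column names are assumptions.
    An empty or missing activity value is returned as None.
    """
    records = []
    for row in csv.DictReader(stream):
        value = (row.get(activity_col) or "").strip()
        records.append((row[smiles_col].strip(), float(value) if value else None))
    return records


# In-memory example with made-up rows (not challenge data).
sample = io.StringIO("SMILES,activity\nCCO,12.5\nc1ccccc1O,\n")
print(load_tox24_set(sample))  # [('CCO', 12.5), ('c1ccccc1O', None)]
```

With a downloaded file, the same function would be called on `open("tox24_challenge_train.csv", newline="")`.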
They are also available as OCHEM baskets:
Training set (1012 compounds)
Leaderboard set (200 compounds)
Blind test set (300 compounds)
You can use these sets to develop and test models directly within OCHEM and/or to calculate and export descriptors for external model development.
A preliminary analysis (OCHEM results for the training set) showed that the five best models had Root Mean Squared Error (RMSE) values of 24-25% in cross-validation on the training set and 21-23% on the leaderboard set. Data for the leaderboard set were released on August 15th. The higher accuracy obtained for the leaderboard set could be the result of (a) more accurate structural information for the molecules in this set and/or (b) the data split procedure used. The leaderboard performance of the baseline model, a Random Forest developed with AlogPS + OEstate descriptors, is shown as the entry of user itetko_acs. This user did not participate in the challenge, nor did the other organisers of the Challenge.
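The RMSE values quoted above follow the usual definition, the square root of the mean squared difference between observed and predicted activities, so they are expressed in the same units as the activity. A minimal sketch of that formula; that the leaderboard computes it exactly this way is an assumption, and `rmse` is a hypothetical helper name.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Squared Error between observed and predicted activities."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Squared residual for each compound, then the square root of their mean.
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up values: every prediction is off by 2, so the RMSE is 2.
print(rmse([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 32.0, 38.0]))  # prints 2.0
```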
The AlogPS + OEstate descriptors used by the baseline model are available: tox24_alogps_oestate.csv
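A minimal sketch of a comparable baseline outside OCHEM, assuming scikit-learn and NumPy are installed: a Random Forest regressor scored by cross-validated RMSE. This is not the OCHEM implementation; the number of trees, the fold count and the helper name `cv_rmse_random_forest` are hypothetical choices. The column layout of tox24_alogps_oestate.csv is not described here, so building the descriptor matrix `X` (one row per compound, aligned with its activity in `y`) is left out and synthetic data stand in for it.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_predict


def cv_rmse_random_forest(X, y, n_folds=5, n_trees=100, seed=0):
    """Cross-validated RMSE of a Random Forest trained on descriptor matrix X.

    Hypothetical helper: hyperparameters are illustrative, not the
    settings of the OCHEM baseline model.
    """
    model = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
    folds = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    # Out-of-fold predictions: each compound is predicted by a model
    # that was trained without it.
    y_pred = cross_val_predict(model, X, y, cv=folds)
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.mean((y - y_pred) ** 2)))


# Synthetic stand-in for descriptors (100 compounds x 5 descriptors);
# only the first column carries signal. Not challenge data.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
y_demo = 10.0 * X_demo[:, 0] + rng.normal(size=100)
print(cv_rmse_random_forest(X_demo, y_demo))
```

Out-of-fold prediction is used so that every compound contributes exactly one prediction from a model that did not see it, which mirrors how a cross-validation RMSE is usually reported.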
