Toxicity prediction from CellPainting data

Aug 3, 2023 | Knowledge

In chemical research, computational methods have become indispensable for predicting toxicity and exploring vast chemical spaces. Two tools are particularly useful for predicting the toxicity of chemicals: ProTox-II and ChemProp. ProTox-II is a web server that predicts the toxicity of chemicals, while ChemProp is a machine learning framework for molecular property prediction. Used in conjunction with CellPainting data, these tools can deepen our understanding of chemical compounds and accelerate drug discovery and the development of novel compounds.


The ProTox-II web server employs several computational techniques to predict toxicity endpoints for chemicals. Its predictive models are based on the relationship between the chemical structure of a compound and its known biological activity, with the activity data acquired from both in vitro and in vivo assays. Statistical methods are used to build models that can flag potentially toxic compounds. ProTox-II combines molecular similarity, pharmacophores, fragment propensities, and machine learning models to predict acute toxicity, hepatotoxicity, cytotoxicity, carcinogenicity, mutagenicity, immunotoxicity, adverse outcome pathways, and toxicity targets. Given a PubChem name, a canonical SMILES string, or a two-dimensional chemical structure as input, ProTox-II generates a toxicity profile along with confidence scores, offering valuable insights for toxicity assessment in drug discovery.

Figure 1. ProTox-II offers simple predictions of several different levels of toxicity.

Figure 2. Toxicity prediction for cytochalasin D.

Figure 3. The figure shows the input compound’s molecular weight (MW) and predicted median lethal dose (LD50) compared to the dataset’s mean values.

Figure 4. The toxicity model report summarizes the predicted activity calculated by several machine learning models.

Figure 5. The toxicity radar chart visually represents the confidence of positive toxicity results in relation to the class average.

Figure 6. The diagrams of three compounds that closely resemble the input molecule, accompanied by their chemical structures and properties.
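The molecular similarity component of ProTox-II, which is what surfaces the closest compounds to an input molecule, can be illustrated with a small sketch. The example below assumes that each compound is represented by the set of "on" bits of a structural fingerprint and ranks a library by Tanimoto similarity. The fingerprints, compound names, and the choice of Tanimoto similarity are illustrative assumptions and do not describe the ProTox-II implementation itself.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nearest_neighbours(query: set, library: dict, k: int = 3) -> list:
    """Return the k library compounds most similar to the query, best first."""
    scored = [(name, tanimoto(query, bits)) for name, bits in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical fingerprints: each integer is the index of a set bit.
query_bits = {1, 4, 7, 9, 12}
library = {
    "compound_A": {1, 4, 7, 9, 13},
    "compound_B": {1, 4, 20, 21},
    "compound_C": {2, 3, 5},
    "compound_D": {1, 4, 7, 9, 12, 15},
}

top3 = nearest_neighbours(query_bits, library)
for name, score in top3:
    print(f"{name}: {score:.2f}")
```

In practice the fingerprints would be computed from the chemical structures with a cheminformatics toolkit rather than written by hand.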


ChemProp is a machine learning framework designed for predicting molecular properties. It analyzes molecular structures to predict a wide range of properties such as drug-likeness, solubility, toxicity, and biological activity. Its deep learning architecture, built around a directed message passing neural network (D-MPNN), enables ChemProp to capture intricate relationships between molecular features and properties. We can use ChemProp to train a machine learning model on the information acquired from CellPainting experiments. This involves feeding the toxicity data and the data extracted from CellPainting images into the model and allowing it to learn the patterns that are associated with toxicity. Once the model has been trained, we can use it to predict the toxicity of new, untested compounds.
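As a rough illustration of the data preparation step, the sketch below writes two row-aligned CSV files: one with SMILES strings and a toxicity label, and one with per-compound features derived from CellPainting images. The compounds, labels, feature names, and file names are all invented placeholders, not real measurements. The command line shown in the final comment follows the ChemProp v1 interface as we understand it and should be checked against the installed version, since newer releases use different command and flag names.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical records: SMILES string, placeholder binary toxicity label, and
# a few CellPainting-derived features (names and values are invented).
records = [
    {"smiles": "CCO", "toxic": 0, "features": [0.12, 1.05, -0.30]},
    {"smiles": "c1ccccc1", "toxic": 1, "features": [0.88, -0.42, 1.10]},
    {"smiles": "CC(=O)O", "toxic": 0, "features": [-0.05, 0.33, 0.07]},
]
feature_names = ["nuclei_area", "er_intensity", "mito_texture"]


def write_training_files(records, feature_names, out_dir):
    """Write a SMILES/label CSV and a feature CSV with rows in the same order."""
    out_dir = Path(out_dir)
    data_path = out_dir / "data.csv"
    features_path = out_dir / "features.csv"
    with open(data_path, "w", newline="") as data_file, \
            open(features_path, "w", newline="") as feat_file:
        data_writer = csv.writer(data_file)
        feat_writer = csv.writer(feat_file)
        data_writer.writerow(["smiles", "toxic"])
        feat_writer.writerow(feature_names)
        for rec in records:
            # Every compound needs exactly one value per named feature.
            if len(rec["features"]) != len(feature_names):
                raise ValueError(f"feature count mismatch for {rec['smiles']}")
            data_writer.writerow([rec["smiles"], rec["toxic"]])
            feat_writer.writerow(rec["features"])
    return data_path, features_path


out_dir = tempfile.mkdtemp()
data_path, features_path = write_training_files(records, feature_names, out_dir)
print(data_path, features_path)

# Assumed ChemProp v1-style command line (verify against the installed version):
#   chemprop_train --data_path data.csv --features_path features.csv \
#       --dataset_type classification --save_dir checkpoints
```

Keeping the image-derived features in a separate file that mirrors the row order of the label file lets the model combine its learned molecular representation with the CellPainting readout for each compound.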

ChemProp's versatility also extends beyond toxicity prediction to antibiotic discovery. A recent study published in Nature Chemical Biology demonstrates the potential of ChemProp to accelerate the discovery of new antibiotics. The researchers screened around 7,500 molecules for growth inhibition of A. baumannii, trained a neural network on the resulting data, and used its in silico predictions to identify structurally new molecules with activity against the pathogen. This led to abaucin, a narrow-spectrum antibacterial compound that specifically targets A. baumannii. Further investigation showed that abaucin could control an A. baumannii infection in a mouse wound model. This work showcases the utility of ChemProp and highlights its role in uncovering promising leads against challenging Gram-negative pathogens.

Overall, when it comes to drug discovery, ChemProp can aid in identifying and prioritizing potential drug candidates, accelerating the development process and reducing costs.


Another machine learning model that we can use for toxicity prediction is DeepTox. It is based on deep learning algorithms and has shown high performance in computational toxicity prediction. Its ability to learn abstract representations of the input data is effective in capturing the complex chemical features associated with toxicity. One of the key features of DeepTox is multi-task learning: a single neural network learns to predict multiple toxic effects at once, which can lead to more accurate predictions. Additionally, DeepTox uses an ensemble approach that combines the best models to further improve prediction accuracy.
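The combination of multi-task outputs and ensembling can be sketched in a few lines. In the example below, each model returns a probability for several toxicity endpoints at once, and the ensemble prediction is the plain average per endpoint. The probabilities, endpoint names, and the 0.5 decision threshold are hypothetical, and simple averaging stands in for whatever model selection and weighting DeepTox actually uses.

```python
from statistics import mean

# Hypothetical per-model outputs for one compound: each model predicts a
# probability for several toxicity endpoints at once (multi-task output).
model_predictions = [
    {"hepatotoxicity": 0.70, "mutagenicity": 0.20, "cytotoxicity": 0.55},
    {"hepatotoxicity": 0.80, "mutagenicity": 0.10, "cytotoxicity": 0.45},
    {"hepatotoxicity": 0.60, "mutagenicity": 0.30, "cytotoxicity": 0.65},
]


def ensemble_average(predictions):
    """Average the predicted probability of each endpoint across all models."""
    tasks = predictions[0].keys()
    return {task: mean(p[task] for p in predictions) for task in tasks}


def call_toxic(probabilities, threshold=0.5):
    """Turn averaged probabilities into yes/no calls (threshold is an assumption)."""
    return {task: prob >= threshold for task, prob in probabilities.items()}


averaged = ensemble_average(model_predictions)
calls = call_toxic(averaged)
for task in averaged:
    print(f"{task}: p={averaged[task]:.2f} toxic={calls[task]}")
```

Averaging smooths out the errors of individual models, which is one reason ensembles tend to be more reliable than any single member.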

Using methods such as ProTox-II, ChemProp, and DeepTox in conjunction with CellPainting data represents a significant advance in toxicity prediction and drug discovery. It empowers us to make informed decisions early in the development process, saving time, costs, and resources. Moreover, this integrated approach opens the door to exploring vast chemical spaces and identifying compounds with desired properties.