Preprocessing the features extracted from Cell Painting images is an essential step in image-based profiling because it ensures that the data are of sufficient quality for further analysis. It helps to remove sources of error or bias, such as differences in the amount of sample material or in the sensitivity of the measurement method. It also reduces the complexity of the data and focuses the analysis on the most relevant features, which makes the results easier to interpret.
For instance, if your feature-extraction method yields non-finite values (NaN or INF), which represent results that could not be computed, many statistical methods and machine-learning algorithms cannot be applied directly. If only a small number of cells have missing values, you can consider excluding those cells from further analysis, but keep in mind that they could also indicate a valid and relevant phenotype. If a large proportion of cells have missing values for a particular feature, you can consider that feature insufficiently informative and remove it from your analysis. Again, be mindful of unexpected cell phenotypes. Another option is to substitute zeros or the mean value of the feature for the missing values. Although removing cells or features is more common than substituting missing values, the decision generally depends on your experimental evaluation and empirical observations.
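As a minimal sketch of the removal strategy described above, the following Python example uses NumPy to first drop features that are non-finite in too many cells and then drop the remaining cells that still contain a non-finite value. The function name `clean_non_finite` and the threshold `max_missing_fraction` are hypothetical and chosen only for illustration; a suitable threshold depends on your experiment.

```python
import numpy as np

def clean_non_finite(features, max_missing_fraction=0.05):
    """Drop features with many non-finite values, then cells with any left.

    `features` is a 2D array with one row per cell and one column per feature.
    `max_missing_fraction` is an illustrative threshold, not a recommendation.
    """
    features = np.asarray(features, dtype=float)
    non_finite = ~np.isfinite(features)  # True for NaN, +inf and -inf

    # Fraction of cells with a non-finite value, computed per feature.
    missing_per_feature = non_finite.mean(axis=0)
    keep_features = missing_per_feature <= max_missing_fraction

    # Among the retained features, keep only cells without non-finite values.
    keep_cells = ~non_finite[:, keep_features].any(axis=1)

    cleaned = features[np.ix_(keep_cells, keep_features)]
    return cleaned, keep_cells, keep_features

# Toy data: 4 cells x 3 features. The third feature is mostly missing,
# and the second cell has an infinite value in the second feature.
cells = np.array([
    [1.0, 2.0, np.nan],
    [1.5, np.inf, np.nan],
    [0.9, 2.2, np.nan],
    [1.1, 2.1, 3.0],
])
cleaned, keep_cells, keep_features = clean_non_finite(
    cells, max_missing_fraction=0.5
)
```

Returning the two boolean masks alongside the cleaned matrix makes it possible to inspect which cells and features were removed before discarding them, which is useful when the removed cells might represent a real phenotype.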
In high-throughput assays such as Cell Painting, multiwell plates can be subject to edge effects and gradient artifacts. The position of a sample on the plate can affect the measured features, due to factors such as variations in the amount of sample material or in the sensitivity of the measurement across the plate, so it is often useful to adjust the data for plate layout. To correct these spatial biases you can use local smoothing techniques such as running averages or 2D polynomial regression. Alternatively, you can use two-way median polish, which iteratively subtracts row and column medians to remove positional effects; dividing each well's residual by the median absolute deviation (MAD) of the plate's residuals then yields a B-score. The B-score expresses how far a well deviates from the rest of its plate once positional effects have been removed, which makes it comparable across wells, plates, and conditions. A B-score with a large absolute value marks a well whose signal differs strongly from the bulk of the plate, whereas a B-score close to zero indicates a well that behaves like most other wells on the plate.
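The median-polish procedure can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions rather than a reference implementation: it treats one feature on one plate as a 2D array of well values, runs a fixed maximum number of row and column sweeps, and divides the residuals by the unscaled MAD of the residuals. The function names, the iteration limit, and the simulated plate are all hypothetical.

```python
import numpy as np

def median_polish_residuals(plate, n_iter=10, tol=1e-6):
    """Iteratively remove row and column medians from a plate of well values."""
    residuals = np.array(plate, dtype=float)  # copy, so the input is untouched
    for _ in range(n_iter):
        row_med = np.nanmedian(residuals, axis=1, keepdims=True)
        residuals -= row_med
        col_med = np.nanmedian(residuals, axis=0, keepdims=True)
        residuals -= col_med
        # Stop early once a sweep no longer changes the residuals noticeably.
        if np.abs(row_med).max() < tol and np.abs(col_med).max() < tol:
            break
    return residuals

def b_score(plate, **kwargs):
    """Residuals after median polish, divided by the MAD of those residuals."""
    residuals = median_polish_residuals(plate, **kwargs)
    mad = np.nanmedian(np.abs(residuals - np.nanmedian(residuals)))
    if mad == 0:
        raise ValueError("MAD of residuals is zero; B-score is undefined")
    return residuals / mad

# Simulated 8 x 12 plate (96 wells): a baseline, additive row and column
# gradients, random noise, and one well with a strong extra signal.
rng = np.random.default_rng(0)
n_rows, n_cols = 8, 12
row_effect = np.linspace(0.0, 3.0, n_rows)[:, None]
col_effect = np.linspace(0.0, 2.0, n_cols)[None, :]
plate = 100.0 + row_effect + col_effect + rng.normal(0.0, 1.0, (n_rows, n_cols))
plate[3, 5] += 20.0

scores = b_score(plate)
hit = np.unravel_index(np.argmax(np.abs(scores)), scores.shape)
```

In this simulation the well with the added signal receives the largest absolute B-score, while the row and column gradients are absorbed by the polish. Some implementations multiply the MAD by 1.4826 so that it estimates a standard deviation under normality; that scaling is omitted here.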
Feature transformation and normalization
Morphological profiles can include features whose values follow very different statistical distributions. We recommend applying feature transformation and normalization to make the features comparable and to improve the quality and reliability of your data.
Feature transformation is the process of applying mathematical transformations to the values of the features. This can be done for a variety of reasons, such as to stabilize variance, to meet the assumptions of statistical tests, or to make the data more interpretable. Some common transformations include taking the logarithm, the square root, or the inverse of the values.
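To illustrate the effect of these transformations, the sketch below applies the logarithm, square root, and inverse to a simulated right-skewed feature and compares the sample skewness before and after. The simulated log-normal data and the helper `skewness` are assumptions made for this example only. Note that the logarithm and the inverse require strictly positive values, and the square root requires non-negative values.

```python
import numpy as np

def skewness(values):
    """Sample skewness: third central moment over variance to the power 1.5."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5

# Simulated strictly positive, right-skewed feature (for example an intensity).
rng = np.random.default_rng(0)
intensity = rng.lognormal(mean=2.0, sigma=1.0, size=5000)

log_transformed = np.log(intensity)       # needs values > 0
sqrt_transformed = np.sqrt(intensity)     # needs values >= 0
inverse_transformed = 1.0 / intensity     # needs non-zero values

for name, values in [
    ("raw", intensity),
    ("log", log_transformed),
    ("sqrt", sqrt_transformed),
    ("inverse", inverse_transformed),
]:
    print(f"{name:8s} skewness = {skewness(values):.2f}")
```

For this particular simulated feature the logarithm gives a nearly symmetric distribution and the square root a moderately skewed one. Which transformation is appropriate for a real feature depends on its distribution, so it is worth checking the result rather than applying one transformation to every feature.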
Normalization is the process of adjusting the values of the features to a common scale. This can be done in a variety of ways, such as by dividing each value by the total intensity of all features measured in the sample, or by subtracting the mean value and dividing by the standard deviation. Normalization can help to eliminate biases that might arise from differences in the amount of sample material or in the sensitivity of the measurement method.
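The two normalization schemes mentioned above can be sketched as follows. The function names and the toy profile matrix are hypothetical; the sketch assumes one row per sample and one column per feature, and that every sample has a non-zero total when dividing by the total.

```python
import numpy as np

def standardize(features):
    """Subtract each feature's mean and divide by its standard deviation."""
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    std = features.std(axis=0, ddof=1)
    # Constant features have zero spread; map them to 0 instead of NaN.
    std = np.where(std == 0, 1.0, std)
    return (features - mean) / std

def normalize_by_total(features):
    """Divide each value by the sum of all feature values in the same sample."""
    features = np.asarray(features, dtype=float)
    totals = features.sum(axis=1, keepdims=True)  # assumed to be non-zero
    return features / totals

# Toy profiles: 4 samples x 3 features; the last feature is constant.
profiles = np.array([
    [10.0, 200.0, 5.0],
    [12.0, 180.0, 5.0],
    [8.0, 220.0, 5.0],
    [11.0, 210.0, 5.0],
])

z_scores = standardize(profiles)
fractions = normalize_by_total(profiles)
```

After standardization every non-constant feature has mean 0 and standard deviation 1, so features measured on very different scales contribute comparably to distance-based analyses. After normalization by the total, each sample's values sum to 1.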