Verifying image quality by hand is virtually impossible in high-throughput experiments, so you can use automated methods to objectively flag and remove images and cells that are affected by artifacts. These strategies reduce the risk that inaccurate measurements will distort your data.
Field-of-view quality control
Out-of-focus images differ from in-focus ones mainly in sharpness: in-focus images contain sharp features and steep intensity gradients. To separate the two, we can apply a Laplacian-of-Gaussian (LoG) filter and then inspect the distribution of pixel values in the filtered result. Because the filter responds strongly to edges, the in-focus image produces more extreme values, so its distribution has a higher variance and a higher maximum than that of the out-of-focus image (see the figure with the graph and two example images). High variance and a high maximum therefore indicate clearly defined edges in an unblurred image. Other measures for detecting blur include the ratio of the mean to the s.d. of each image's pixel intensities, the normalized variance of the intensities, and the correlation of the image across its subregions.
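The LoG measure and two of the intensity-based measures can be sketched in Python with NumPy and SciPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function names, the default `sigma=2.0`, and the definition of normalized variance as variance divided by mean intensity are assumptions, and the subregion-correlation measure is not shown. The LoG part computes the variance and maximum of the `scipy.ndimage.gaussian_laplace` response.

```python
import numpy as np
from scipy import ndimage


def log_focus_score(image: np.ndarray, sigma: float = 2.0) -> dict:
    """Variance and maximum of the Laplacian-of-Gaussian (LoG) response.

    Sharper, in-focus images give larger values. `sigma` (in pixels) is an
    assumed default; it should roughly match the scale of the edges of interest.
    """
    img = np.asarray(image, dtype=np.float64)
    response = ndimage.gaussian_laplace(img, sigma=sigma)
    return {"variance": float(response.var()), "maximum": float(response.max())}


def mean_sd_ratio(image: np.ndarray) -> float:
    """Ratio of the mean to the standard deviation of pixel intensities."""
    img = np.asarray(image, dtype=np.float64)
    sd = img.std()
    return float(img.mean() / sd) if sd > 0 else float("inf")


def normalized_variance(image: np.ndarray) -> float:
    """Intensity variance divided by mean intensity (one common definition)."""
    img = np.asarray(image, dtype=np.float64)
    mean = img.mean()
    return float(img.var() / mean) if mean > 0 else 0.0


if __name__ == "__main__":
    # Synthetic example: a bright square and a blurred copy of it.
    sharp = np.zeros((64, 64))
    sharp[20:44, 20:44] = 1.0
    blurry = ndimage.gaussian_filter(sharp, sigma=4)
    for name, img in [("sharp", sharp), ("blurry", blurry)]:
        print(name, log_focus_score(img), mean_sd_ratio(img), normalized_variance(img))
```

Because absolute scores depend on staining, magnification, and exposure, a cut-off separating in-focus from blurry fields would have to be chosen per assay, for example by inspecting the score distribution across all images of an experiment.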
Cell-level quality control
The goal of cell-level quality control in high-throughput image-based cell profiling is to identify and exclude outlier cells that may have been damaged or otherwise compromised during sample preparation or imaging, because such cells may not provide accurate or representative data. Removing them from the analysis improves the overall accuracy and reliability of your data and keeps poor-quality cells from biasing your results.
Outlier detection is a way of identifying cells that are unusual or different from the majority of cells in a sample. There are two main approaches to outlier detection: methods that do not rely on pre-defined statistical models or assumptions about the data, and methods that involve training a statistical model on normal samples.
The first approach uses techniques such as distance-based methods, density-based methods, or clustering algorithms. Rather than relying on predetermined assumptions about the data, these methods flag cells based on their distance from the majority of cells in the sample, the density of cells in their neighborhood, or their similarity to other cells in the sample. In practice, this can be done with univariate statistical tools, including the 3- or 5-s.d. rules, Winsorizing, and the adjusted box-plot rule, as well as multivariate methods such as principal component analysis (PCA) and Mahalanobis-based outlier detection.
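The univariate and multivariate rules named above can be sketched with NumPy and SciPy as follows. All function names and default parameters (`k`, the 1st/99th percentile Winsorizing limits, `alpha`, `n_components`) are illustrative assumptions. The adjusted box-plot rule uses the medcouple skewness statistic with the fences published by Hubert and Vandervieren (2008); the medcouple here is a naive O(n²) approximation, so for large numbers of cells a dedicated implementation such as `statsmodels.stats.stattools.medcouple` may be preferable. The PCA variant flags cells with a large reconstruction error outside the top components, which is only one of several ways to use PCA for outlier detection.

```python
import numpy as np
from scipy import stats


def sd_rule(x, k=3.0):
    """Flag values more than k standard deviations from the mean (k = 3 or 5)."""
    x = np.asarray(x, dtype=float)
    return np.abs(x - x.mean()) > k * x.std()


def winsorize(x, lower=0.01, upper=0.99):
    """Clip values to the given quantiles instead of dropping them."""
    x = np.asarray(x, dtype=float)
    lo, hi = np.quantile(x, [lower, upper])
    return np.clip(x, lo, hi)


def medcouple(x):
    """Naive O(n^2) medcouple (a robust measure of skewness).

    Approximation: pairs in which both values equal the median are skipped,
    so the result can differ slightly from the exact medcouple when values
    tie with the median.
    """
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    xi = x[x <= med][:, None]  # values at or below the median
    xj = x[x >= med][None, :]  # values at or above the median
    with np.errstate(divide="ignore", invalid="ignore"):
        h = ((xj - med) - (med - xi)) / (xj - xi)
    return float(np.median(h[np.isfinite(h)]))  # drop the 0/0 pairs


def adjusted_boxplot(x, c=1.5):
    """Skewness-adjusted box-plot fences (Hubert and Vandervieren, 2008)."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.quantile(x, [0.25, 0.75])
    iqr, mc = q3 - q1, medcouple(x)
    if mc >= 0:
        lo, hi = q1 - c * np.exp(-4 * mc) * iqr, q3 + c * np.exp(3 * mc) * iqr
    else:
        lo, hi = q1 - c * np.exp(-3 * mc) * iqr, q3 + c * np.exp(4 * mc) * iqr
    return (x < lo) | (x > hi)


def mahalanobis_outliers(X, alpha=0.001):
    """Flag cells whose squared Mahalanobis distance exceeds a chi-square quantile."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    prec = np.linalg.pinv(np.cov(X, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", diff, prec, diff)
    return d2 > stats.chi2.ppf(1 - alpha, df=X.shape[1])


def pca_outliers(X, n_components=2, k=3.0):
    """Flag cells poorly reconstructed by the top principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    V = vt[:n_components].T                   # loadings of the kept components
    resid = Xc - Xc @ V @ V.T                 # part not explained by those components
    err = np.linalg.norm(resid, axis=1)
    return err > err.mean() + k * err.std()   # one-sided: only large errors
```

The Mahalanobis sketch uses the classical mean and covariance, which the outliers themselves can inflate; a robust covariance estimate (for example, scikit-learn's `MinCovDet`) is a common alternative. The chi-square cut-off also assumes roughly multivariate-normal features.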
The second approach builds a statistical model of what is considered normal by training it on normal samples, and then flags cells that do not fit that model as outliers.
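One simple way to realize this second approach is to fit a multivariate Gaussian to cells assumed to be normal and flag new cells with unusually low likelihood. This is a swapped-in example technique, not one prescribed by the text; which cells count as normal (for example, cells from negative-control wells), the hypothetical class name `GaussianNormalModel`, and the 1% cut-off are all illustrative choices. Other models, such as one-class classifiers, follow the same pattern of training on normal cells and scoring the rest.

```python
import numpy as np


class GaussianNormalModel:
    """Multivariate Gaussian fitted to cells assumed to be normal (hypothetical helper).

    New cells whose log-density falls below the `quantile`-th quantile of the
    training cells' log-densities are flagged as outliers.
    """

    def __init__(self, quantile=0.01, ridge=1e-6):
        self.quantile = quantile  # fraction of normal cells allowed below the cut-off
        self.ridge = ridge        # small diagonal term that keeps the covariance invertible

    def fit(self, X_normal):
        X = np.asarray(X_normal, dtype=float)
        self.mean_ = X.mean(axis=0)
        cov = np.atleast_2d(np.cov(X, rowvar=False)) + self.ridge * np.eye(X.shape[1])
        self.precision_ = np.linalg.inv(cov)
        _, self.logdet_ = np.linalg.slogdet(cov)
        # Cut-off taken from the distribution of the training cells' own scores.
        self.threshold_ = np.quantile(self.log_density(X), self.quantile)
        return self

    def log_density(self, X):
        X = np.asarray(X, dtype=float)
        diff = X - self.mean_
        d2 = np.einsum("ij,jk,ik->i", diff, self.precision_, diff)
        return -0.5 * (d2 + self.logdet_ + X.shape[1] * np.log(2 * np.pi))

    def predict_outlier(self, X):
        return self.log_density(X) < self.threshold_


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    reference = rng.normal(size=(1000, 4))  # synthetic stand-in for normal cells
    model = GaussianNormalModel().fit(reference)
    new_cells = np.vstack([rng.normal(size=(5, 4)), [[8.0, 8.0, 8.0, 8.0]]])
    print(model.predict_outlier(new_cells))
```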
Outlier cells can be identified and removed from the analysis if they result from errors or if their presence would significantly distort the results. Keep in mind that cell-outlier detection should be performed on the full population of cells rather than separately for each well, replicate, or plate: thresholds computed within a single subset adapt to that subset's own distribution and are therefore not comparable across the experiment. Care should also be taken not to eliminate cells and samples with interesting and unique phenotypes.
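The difference between pooled and per-subset thresholds can be illustrated with a sketch that applies the s.d. rule once over all cells and, for comparison only, separately within each well. The data, well labels, and helper names are hypothetical. The `flagged_fraction` helper reports the share of flagged cells per well, which is one way to spot a well whose cells are being removed wholesale and might carry a genuine phenotype rather than an artifact.

```python
import numpy as np


def pooled_outliers(values, k=5.0):
    """Flag cells using one mean and s.d. computed over the full population."""
    values = np.asarray(values, dtype=float)
    return np.abs(values - values.mean()) > k * values.std()


def per_group_outliers(values, groups, k=5.0):
    """For comparison only: thresholds recomputed inside each well or plate."""
    values, groups = np.asarray(values, dtype=float), np.asarray(groups)
    flags = np.zeros(values.shape, dtype=bool)
    for g in np.unique(groups):
        m = groups == g
        v = values[m]
        flags[m] = np.abs(v - v.mean()) > k * v.std()
    return flags


def flagged_fraction(flags, groups):
    """Share of flagged cells per group; a high share may signal a real phenotype."""
    groups = np.asarray(groups)
    return {str(g): float(flags[groups == g].mean()) for g in np.unique(groups)}


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    # Hypothetical data: wells A and B are typical, well C (20 cells) is strongly shifted.
    values = np.concatenate([rng.normal(size=990), rng.normal(size=990),
                             rng.normal(20.0, 1.0, size=20)])
    groups = np.array(["A"] * 990 + ["B"] * 990 + ["C"] * 20)
    print("pooled:", flagged_fraction(pooled_outliers(values), groups))
    print("per well:", flagged_fraction(per_group_outliers(values, groups), groups))
```

In this synthetic example the per-well rule cannot flag any cell in well C: with 20 cells, no value can lie more than sqrt(19), about 4.4 s.d., from its own well's mean. The pooled rule flags the whole well instead, and the per-well flagged fraction makes that visible so the well can be inspected before its cells are discarded.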