A primary reason for using two-color microarrays is that the use of two samples labeled with different dyes on the same slide, that bind to probes on the same spot, is supposed to adjust for many factors that introduce noise and errors into the analysis. Most users assume that any differences between the dyes can be adjusted out by standard methods of normalization, so that measures such as log ratios on the same slide are reliable measures of comparative expression. However, even after the normalization, there are still probe specific dye and slide variation among the data. We define a method to quantify the amount of the dye-by-probe and slide-by-probe interaction. This serves as a diagnostic, both visual and numeric, of the existence of probe-specific dye bias. We show how this improved the performance of two-color array analysis for arrays for genomic analysis of biological samples ranging from rice to human tissue.
We develop a procedure for quantifying the extent of probe-specific dye and slide bias in two-color microarrays. The primary output is a graphical diagnostic of the extent of the bias which called ECDF (Empirical Cumulative Distribution Function), though numerical results are also obtained.
We show that the dye and slide biases were high for human and rice genomic arrays in two gene expression facilities, even after the standard intensity-based normalization, and describe how this diagnostic allowed the problems causing the probe-specific bias to be addressed, and resulted in important improvements in performance. The R package LMGene which contains the method described in this paper has been available to download from Bioconductor.