Generalization of DNA microarray dispersion properties: microarray equivalent of t-distribution

Jaroslav P Novak, Seon-Young Kim, Jun Xu, Olga Modlich, David J Volsky, David Honys, Joan L. Slonczewski, Douglas A. Bell, Fred R Blattner, Eduardo Blumwald, Marjan Boerma, Manuel Cosio, Zoran Gatalica, Marian Hajduch, Juan Hildago, Roderick R McInnes, Merrill C. Miller III, Michael Rolph, Milena Penkowa, Jordan Sottosanto, Rene St-Arnaud, Michael J Szego, David Twell, Charles Wang

Research output: Contribution to journal (Article)

11 Citations (Scopus)

Abstract

Background: DNA microarrays are a powerful technology that can provide a wealth of gene expression data for disease studies, drug development, and a wide range of other investigations. Because of the large volume and inherent variability of DNA microarray data, many new statistical methods have been developed for evaluating the significance of observed differences in gene expression. Until now, however, little attention has been given to characterizing the dispersion of DNA microarray data.

Results: Here we examine expression data obtained from 682 Affymetrix GeneChips of 22 different types and demonstrate that the Gaussian (normal) frequency distribution is characteristic of the variability of gene expression values, although typically 5 to 15% of the samples deviate from normality. Furthermore, we show that the frequency distributions of the differences in expression within subsets of ordered, consecutive pairs of genes (consecutive samples) in pair-wise comparisons of replicate experiments are also normal. We describe a consecutive sampling method, which is used to calculate a characteristic function approximating the standard deviation, and show that the standard deviation derived from the consecutive samples is equivalent to the standard deviation obtained from individual genes. Finally, we determine the boundaries of probability intervals and demonstrate that the coefficients defining the intervals are independent of sample characteristics, variability of data, laboratory conditions and type of chip. These coefficients are very closely correlated with Student's t-distribution.

Conclusion: In this study we ascertained that the non-systematic variations follow a Gaussian distribution, determined the probability intervals, and demonstrated that the Kα coefficients defining these intervals are invariant; these coefficients offer a convenient universal measure of the dispersion of data. The fact that the Kα distributions are so close to the t-distribution and independent of conditions and type of array suggests that the quantitative data provided by Affymetrix technology give a "true" representation of the physical processes involved in the measurement of RNA abundance.
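The link between interval coefficients and Student's t-distribution can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' procedure: the abstract does not define the coefficients precisely, so here a coefficient K is taken to be the multiplier of a sample standard deviation that bounds a further replicate difference with 95% probability. The window size of 10, the noise range, and the names `empirical_k` and `theoretical_k` are all hypothetical. For Gaussian noise, the classical prediction-interval result gives K = t × sqrt(1 + 1/n), where t is the two-sided critical value with n - 1 degrees of freedom.

```python
import math
import random

# Tabulated two-sided 95% critical value of Student's t with 9 degrees of freedom.
T_975_DF9 = 2.262


def empirical_k(n=10, windows=20000, coverage=0.95, seed=0):
    """Estimate K such that |new - mean| <= K * sd holds with the given coverage.

    Each window stands in for one hypothetical 'consecutive sample' of n
    replicate differences. The noise level differs from window to window
    to mimic variability that changes with expression level.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(windows):
        sigma = rng.uniform(0.05, 2.0)  # assumed per-window noise level
        diffs = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(diffs) / n
        sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
        new_diff = rng.gauss(0.0, sigma)  # one further difference, same noise
        ratios.append(abs(new_diff - mean) / sd)
    ratios.sort()
    index = min(int(coverage * len(ratios)), len(ratios) - 1)
    return ratios[index]


def theoretical_k(n=10, t_crit=T_975_DF9):
    """Prediction-interval coefficient; t_crit must match n - 1 degrees of freedom."""
    return t_crit * math.sqrt(1.0 + 1.0 / n)


if __name__ == "__main__":
    print("empirical K:", round(empirical_k(), 3))
    print("t-based K:  ", round(theoretical_k(), 3))
```

Because the ratio is scaled by each window's own standard deviation, the simulated coefficient does not depend on the noise level chosen for a window. That is consistent with the invariance described in the abstract, though it holds here only because the simulated noise is exactly Gaussian.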
Original language: English
Pages (from-to): 1-24
Number of pages: 24
Journal: Biology Direct
Volume: 1
Issue number: 27
DOI: https://doi.org/10.1186/1745-6150-1-27
Publication status: Published - 2006
Externally published: Yes

Fingerprint

DNA Microarray
t-distribution
Microarrays
Oligonucleotide Array Sequence Analysis
Gene expression
Microarray
Consecutive
DNA
Normal Distribution
Standard deviation
Interval Probability
Gene Expression
Genes
Microarray Data
Physical Phenomena
Technology
Coefficient
gene expression
Comparison of Experiments
Gaussian distribution

Cite this

Novak, J. P., Kim, S-Y., Xu, J., Modlich, O., Volsky, D. J., Honys, D., ... Wang, C. (2006). Generalization of DNA microarray dispersion properties: microarray equivalent of t-distribution. Biology Direct, 1(27), 1-24. https://doi.org/10.1186/1745-6150-1-27
@article{323dbd3c732348349332e32b6b7f3dc9,
title = "Generalization of DNA microarray dispersion properties: microarray equivalent of t-distribution",
author = "Novak, {Jaroslav P} and Seon-Young Kim and Jun Xu and Olga Modlich and Volsky, {David J} and David Honys and Slonczewski, {Joan L.} and Bell, {Douglas A.} and Blattner, {Fred R} and Eduardo Blumwald and Marjan Boerma and Manuel Cosio and Zoran Gatalica and Marian Hajduch and Juan Hildago and McInnes, {Roderick R} and {Miller III}, {Merrill C.} and Michael Rolph and Milena Penkowa and Jordan Sottosanto and Rene St-Arnaud and Szego, {Michael J} and David Twell and Charles Wang",
year = "2006",
doi = "10.1186/1745-6150-1-27",
language = "English",
volume = "1",
pages = "1--24",
journal = "Biology Direct",
issn = "1745-6150",
publisher = "BioMed Central",
number = "27",

}
