PURPOSE: This study estimates the inter-rater and test-retest reliability of Chalmers' quality score scale in the context of bone mass loss and fracture rate in postmenopausal women. METHODS: An exhaustive literature search was performed on Medline to locate clinical trials studying the effect of medication use on bone mass loss and fracture rate in postmenopausal women. Twenty articles were randomly selected, and four raters independently assessed the quality of each article with Chalmers' scale. Of the 20 articles, 10 were blinded with respect to authors' names, journal, year of publication, and source of funding. Raters were also asked to assess all 20 articles a second time, two months after the first evaluation. Intraclass correlation coefficients (ICC) and test-retest correlation coefficients were calculated. RESULTS: The overall inter-rater ICC was 0.66 (95% CI: 0.55, 0.79). The overall test-retest reliability of Chalmers' scale was 0.81 (95% CI: 0.67, 0.98). When ratings were stratified according to articles' blinding status, blinded assessments generated a smaller inter-rater ICC than non-blinded assessments: 0.30 (95% CI: 0.17, 0.53) vs. 0.80 (95% CI: 0.71, 0.90). In addition, analyzing sub-scales separately generated different estimates of reliability. CONCLUSIONS: This study shows that the reliability of the quality scale developed by Chalmers varies substantially between sub-scales and is highly dependent on articles' blinding status. The possibility of bias in rating non-blinded articles cannot be ruled out. The reliability of the scale may also depend on the outcome studied. © 2000 Elsevier Science Inc.
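
For readers unfamiliar with the inter-rater ICC reported above, the following is a minimal sketch of how such a coefficient can be computed from an articles-by-raters score matrix. The abstract does not state which ICC form was used; the two-way random-effects, absolute-agreement, single-rater form (often written ICC(2,1)) is assumed here purely for illustration. The function name `icc_2_1` and the example scores are hypothetical and are not data from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one row per article, one column per rater.
    Assumed form only; the study's actual ICC variant is not specified.
    """
    n = len(ratings)        # number of articles (subjects)
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)

    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)              # between-article mean square
    ms_cols = ss_cols / (k - 1)              # between-rater mean square
    ms_err = ss_err / ((n - 1) * (k - 1))    # residual mean square

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


if __name__ == "__main__":
    # Hypothetical quality scores: 5 articles rated by 4 raters
    example_scores = [
        [0.62, 0.58, 0.65, 0.60],
        [0.40, 0.45, 0.38, 0.50],
        [0.75, 0.70, 0.80, 0.72],
        [0.55, 0.60, 0.50, 0.58],
        [0.30, 0.35, 0.28, 0.40],
    ]
    print(f"ICC(2,1) = {icc_2_1(example_scores):.2f}")
```

Under this assumed form, the stratified estimates in the RESULTS would correspond to applying the same computation separately to the score matrices of the blinded and non-blinded articles.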