OBJECTIVE: To examine the proposition that identical summary statistics (mean and/or SD) in different randomized controlled trials (RCT) or clinical cohorts can be explained by common or homogeneous source populations.
STUDY DESIGN: We estimated the probability of identical summary data in studies with high proportions of identical summary statistics, in simulations, and in control datasets.
RESULTS: The probability of both an identical mean and an identical SD for a variable in separate RCT is low (less than approximately 3%), unless the variable is rounded to 1 significant figure. In two RCT with identical summary statistics for 16 of 39 shared variables, simulations indicated the probability of the observed matches was <1 in 100,000. In 34 clinical cohorts with publication integrity concerns, the proportion of summary statistics from variables reported in ≥10 studies that were identical in ≥2 cohorts was high (42% for means, 52% for SD, and 29% for both); these proportions were improbable based on simulations and comparisons to control datasets.
CONCLUSIONS: The likelihood of multiple identical summary statistics within an individual RCT or across a body of RCT or cohort studies by the same research group is low, especially when both the mean and the SD are identical, unless the variables are rounded to 1 significant figure.
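The kind of simulation referred to in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes a normally distributed variable, two independent samples drawn from the same source population, and rounding to a fixed number of decimal places rather than significant figures. The function name `simulate_match_probability` and all parameter values (population mean 60, SD 10, 50 participants per study) are hypothetical and chosen only for illustration.

```python
import random
import statistics


def simulate_match_probability(pop_mean, pop_sd, n_per_study, decimals,
                               n_sim=2000, seed=0):
    """Estimate how often two independent samples from one normal population
    report an identical rounded mean, an identical rounded SD, and both.

    Assumption (not from the source): normal data, rounding by decimal places.
    """
    rng = random.Random(seed)
    same_mean = same_sd = same_both = 0
    for _sim in range(n_sim):
        summaries = []
        for _study in range(2):
            sample = [rng.gauss(pop_mean, pop_sd) for _i in range(n_per_study)]
            summaries.append((round(statistics.fmean(sample), decimals),
                              round(statistics.stdev(sample), decimals)))
        (mean_a, sd_a), (mean_b, sd_b) = summaries
        mean_match = mean_a == mean_b
        sd_match = sd_a == sd_b
        same_mean += mean_match
        same_sd += sd_match
        same_both += mean_match and sd_match
    return same_mean / n_sim, same_sd / n_sim, same_both / n_sim


if __name__ == "__main__":
    # Hypothetical age-like variable reported to 1 decimal place.
    p_mean, p_sd, p_both = simulate_match_probability(60.0, 10.0, 50, decimals=1)
    print(f"identical mean: {p_mean:.3f}, identical SD: {p_sd:.3f}, "
          f"both: {p_both:.3f}")
```

Under these assumptions, even a perfectly homogeneous source population leaves sampling variation in the mean and SD, so exact agreement after rounding should be uncommon unless the rounding is coarse. Rerunning with `decimals=0` shows how coarser rounding raises the match rate.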
Number of pages: 9
Journal: Journal of Clinical Epidemiology
Early online date: 15 May 2021
Publication status: Published - 1 Aug 2021
- Statistical method
- research integrity
- identical data
- summary statistic
- fabricated data
- data integrity