Identical summary statistics were uncommon in randomized trials and cohort studies

Mark J Bolland; Greg D Gamble; Alison Avenell; Andrew Grey

doi:10.1016/j.jclinepi.2021.05.002

Mark J Bolland^* (Corresponding Author), Greg D Gamble, Alison Avenell, Andrew Grey

^*Corresponding author for this work

Auckland University of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

OBJECTIVE: To examine the proposition that identical summary statistics (mean and/or SD) in different randomized controlled trials (RCT) or clinical cohorts can be explained by common or homogeneous source populations.

STUDY DESIGN: We estimated the probability of identical summary data in studies with high proportions of identical summary statistics, in simulations, and in control datasets.

RESULTS: The probability of both an identical mean and an identical SD for a variable in separate RCT is low (<~3%), unless the variable is rounded to 1 significant figure. In two RCT with identical summary statistics for 16 of 39 shared variables, simulations indicated the probability of the observed matches was <1 in 100,000. In 34 clinical cohorts with publication integrity concerns, the proportion of summary statistics from variables reported in ≥10 studies that were identical in ≥2 cohorts was high (42% for means, 52% for SD, and 29% for both), and improbable based on simulations and comparisons to control datasets.

CONCLUSIONS: The likelihood of multiple identical summary statistics within an individual RCT or across a body of RCT or cohort studies by the same research group is low, especially when both the mean and the SD are identical, unless the variables are rounded to 1 significant figure.
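
The kind of probability discussed in the abstract can be explored with a small Monte Carlo simulation. The sketch below is not the authors' code or method. It assumes, purely for illustration, that two studies each draw an independent sample from the same normal population, and it counts how often the rounded mean, the rounded SD, or both coincide. The function name `match_probability` and all default values (sample size, population mean and SD, rounding precision, number of trials) are hypothetical.

```python
import random
import statistics


def match_probability(n=50, mu=70.0, sigma=10.0, decimals=1, trials=2000, seed=0):
    """Estimate how often two independent samples from one normal population
    report the same rounded mean, the same rounded SD, and both.

    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)  # seeded so that repeated runs give the same estimate

    def summary():
        # Draw one simulated study and return its reported (rounded) mean and SD.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        return (round(statistics.mean(sample), decimals),
                round(statistics.stdev(sample), decimals))

    same_mean = same_sd = same_both = 0
    for _ in range(trials):
        (m1, s1), (m2, s2) = summary(), summary()
        same_mean += m1 == m2
        same_sd += s1 == s2
        same_both += (m1 == m2) and (s1 == s2)
    return same_mean / trials, same_sd / trials, same_both / trials


if __name__ == "__main__":
    p_mean, p_sd, p_both = match_probability()
    print(f"P(mean) = {p_mean:.3f}  P(SD) = {p_sd:.3f}  P(both) = {p_both:.3f}")
```

Lowering `decimals` makes the rounding coarser, so matches become more frequent. This is a rough analogue of the abstract's caveat about variables rounded to 1 significant figure, although the sketch rounds to a fixed number of decimal places rather than to significant figures.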

Original language	English
Pages (from-to)	180-188
Number of pages	9
Journal	Journal of Clinical Epidemiology
Volume	136
Early online date	15 May 2021
DOIs	https://doi.org/10.1016/j.jclinepi.2021.05.002
Publication status	Published - 1 Aug 2021

Bibliographical note

Acknowledgments
We thank Dorit Naot and Susannah O'Sullivan for providing the animal raw data used in the simulations.

Funding
This research received no specific funding. MB is a recipient of an HRC Clinical Practitioners Fellowship. The Health Services Research Unit is funded by the Chief Scientist Office of the Scottish Government Health and Social Care Directorates. The authors are independent of the HRC. The HRC had no role in design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Keywords

Statistical method
research integrity
identical data
summary statistic
fabricated data
data integrity

Access to Document

10.1016/j.jclinepi.2021.05.002 (Licence: Unspecified)

Bolland_etal_JCE_Identical_Summary_Statitics_AAM
© 2021. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/
Accepted author manuscript, 197 KB (Licence: CC BY-NC-ND)

Cite this

@article{1c1222ef2f5d4101aba6dd6a18b50988,

title = "Identical summary statistics were uncommon in randomized trials and cohort studies",

abstract = "OBJECTIVE: To examine the proposition that identical summary statistics (mean and/or SD) in different randomized controlled trials (RCT) or clinical cohorts can be explained by common or homogeneous source populations. STUDY DESIGN: We estimated the probability of identical summary data in studies with high proportions of identical summary statistics, in simulations, and in control datasets. RESULTS: The probability of both an identical mean and an identical SD for a variable in separate RCT is low (<~3%), unless the variable is rounded to 1 significant figure. In two RCT with identical summary statistics for 16 of 39 shared variables, simulations indicated the probability of the observed matches was <1 in 100,000. In 34 clinical cohorts with publication integrity concerns, the proportion of summary statistics from variables reported in ≥10 studies that were identical in ≥2 cohorts was high (42% for means, 52% for SD, and 29% for both), and improbable based on simulations and comparisons to control datasets. CONCLUSIONS: The likelihood of multiple identical summary statistics within an individual RCT or across a body of RCT or cohort studies by the same research group is low, especially when both the mean and the SD are identical, unless the variables are rounded to 1 significant figure.",

keywords = "Statistical method, research integrity, identical data, summary statistic, fabricated data, data integrity",

author = "Bolland, {Mark J} and Gamble, {Greg D} and Avenell, Alison and Grey, Andrew",

note = "Acknowledgments: We thank Dorit Naot and Susannah O'Sullivan for providing the animal raw data used in the simulations. Funding: This research received no specific funding. MB is a recipient of an HRC Clinical Practitioners Fellowship. The Health Services Research Unit is funded by the Chief Scientist Office of the Scottish Government Health and Social Care Directorates. The authors are independent of the HRC. The HRC had no role in design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.",

year = "2021",

month = aug,

day = "1",

doi = "10.1016/j.jclinepi.2021.05.002",

language = "English",

volume = "136",

pages = "180--188",

journal = "Journal of Clinical Epidemiology",

issn = "0895-4356",

publisher = "Elsevier USA",

}

TY - JOUR

T1 - Identical summary statistics were uncommon in randomized trials and cohort studies

AU - Bolland, Mark J

AU - Gamble, Greg D

AU - Avenell, Alison

AU - Grey, Andrew

N1 - Acknowledgments: We thank Dorit Naot and Susannah O'Sullivan for providing the animal raw data used in the simulations. Funding: This research received no specific funding. MB is a recipient of an HRC Clinical Practitioners Fellowship. The Health Services Research Unit is funded by the Chief Scientist Office of the Scottish Government Health and Social Care Directorates. The authors are independent of the HRC. The HRC had no role in design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

PY - 2021/8/1

Y1 - 2021/8/1

AB - OBJECTIVE: To examine the proposition that identical summary statistics (mean and/or SD) in different randomized controlled trials (RCT) or clinical cohorts can be explained by common or homogeneous source populations. STUDY DESIGN: We estimated the probability of identical summary data in studies with high proportions of identical summary statistics, in simulations, and in control datasets. RESULTS: The probability of both an identical mean and an identical SD for a variable in separate RCT is low (<~3%), unless the variable is rounded to 1 significant figure. In two RCT with identical summary statistics for 16 of 39 shared variables, simulations indicated the probability of the observed matches was <1 in 100,000. In 34 clinical cohorts with publication integrity concerns, the proportion of summary statistics from variables reported in ≥10 studies that were identical in ≥2 cohorts was high (42% for means, 52% for SD, and 29% for both), and improbable based on simulations and comparisons to control datasets. CONCLUSIONS: The likelihood of multiple identical summary statistics within an individual RCT or across a body of RCT or cohort studies by the same research group is low, especially when both the mean and the SD are identical, unless the variables are rounded to 1 significant figure.

KW - Statistical method

KW - research integrity

KW - identical data

KW - summary statistic

KW - fabricated data

KW - data integrity

U2 - 10.1016/j.jclinepi.2021.05.002

DO - 10.1016/j.jclinepi.2021.05.002

M3 - Article

C2 - 34000386

SN - 0895-4356

VL - 136

SP - 180

EP - 188

JO - Journal of Clinical Epidemiology

JF - Journal of Clinical Epidemiology

ER -
