Baseline P value distributions in randomized trials were uniform for continuous but not categorical variables

Mark J. Bolland (Corresponding Author), Greg D. Gamble, Alison Avenell, Andrew Grey, Thomas Lumley

Research output: Contribution to journal › Article › peer-review

10 Citations (Scopus)


OBJECTIVE: Comparing observed and expected distributions of baseline variables in randomized controlled trials (RCTs) has been used to investigate possible research misconduct, although the validity of this approach has been questioned. We explored this technique and introduced a novel metric to compare P values from baseline variables between treatment arms.

STUDY DESIGN AND SETTING: We compared observed with expected distributions of baseline P values using a one-way chi-square test and by comparing the area under the curve (AUC) of the cumulative distribution function in 13 RCTs conducted by our group, two groups of RCTs known to contain fabricated data, and simulations.
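
The two comparisons described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the choice of 10 equal-width bins, and the sign convention (observed AUC minus the expected 0.5) are assumptions, since the abstract does not specify them. The sketch relies on the fact that the area under the empirical cumulative distribution function on [0, 1] equals 1 minus the mean of the P values, and that this area is 0.5 for a uniform distribution.

```python
def chi_square_uniform(p_values, n_bins=10):
    """One-way chi-square statistic for P values against a uniform distribution.

    n_bins is an assumed choice; returns (statistic, degrees_of_freedom).
    """
    counts = [0] * n_bins
    for p in p_values:
        # P values of exactly 1.0 go into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
    expected = len(p_values) / n_bins  # equal expected count in every bin
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, n_bins - 1


def auc_difference(p_values):
    """Observed AUC of the empirical CDF minus the uniform expectation of 0.5.

    The integral of the empirical CDF over [0, 1] is 1 - mean(p).
    """
    auc = 1.0 - sum(p_values) / len(p_values)
    return auc - 0.5


# Hypothetical inputs: evenly spread P values versus P values clustered near 1.
even = [(i + 0.5) / 100 for i in range(100)]
clustered = [0.95] * 100

print(chi_square_uniform(even), auc_difference(even))
print(chi_square_uniform(clustered), auc_difference(clustered))
```

A statistic near zero and an AUC difference near zero are what a uniform set of P values would produce; P values clustered near 1, as can occur when treatment arms are implausibly similar, give a large statistic and a negative difference under this sign convention.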

RESULTS: In our 13 RCTs, the distribution of P values from baseline continuous variables was consistent with the expected theoretical uniform distribution (P = 0.19, difference from expected AUC -0.03, 95% confidence interval [-0.04, 0.04]). For categorical variables, the P value distribution was not uniform. The distributions of P values from RCTs with fabricated data were highly unusual and not consistent with the uniform distribution for continuous variables, nor with the expected distribution for categorical variables, nor with the distribution of P values in genuine RCTs.
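
One general reason P values from categorical variables need not be uniform is that the underlying test statistic is discrete, so only a limited set of P values can occur. The sketch below illustrates this with an exact calculation for a single binary baseline variable. It assumes a two-sided Fisher's exact test with fixed margins and hypothetical arm sizes and event counts; the abstract does not state which test or sample sizes were used, so this illustrates the general phenomenon rather than reproducing the authors' analysis.

```python
from math import comb


def fisher_p_value_distribution(n_a, n_b, events):
    """Exact null distribution of the two-sided Fisher P value for a 2x2 table.

    n_a, n_b: arm sizes; events: total number of participants with the trait.
    Returns {attainable P value: probability of observing it}.
    """
    total = n_a + n_b
    lo = max(0, events - n_b)
    hi = min(events, n_a)
    denom = comb(total, events)
    # Hypergeometric probability of x events falling in arm A.
    prob = {x: comb(n_a, x) * comb(n_b, events - x) / denom
            for x in range(lo, hi + 1)}
    dist = {}
    for x, px in prob.items():
        # Two-sided P value: total probability of tables no more likely than this one.
        p_val = sum(q for q in prob.values() if q <= px * (1 + 1e-9))
        p_val = round(min(p_val, 1.0), 12)
        dist[p_val] = dist.get(p_val, 0.0) + px
    return dict(sorted(dist.items()))


# Hypothetical trial: 20 participants per arm, 10 with the trait overall.
dist = fisher_p_value_distribution(20, 20, 10)
for p_val, weight in dist.items():
    print(f"P = {p_val:.4f} occurs with probability {weight:.4f}")
mean_p = sum(p * w for p, w in dist.items())
print("mean P value:", round(mean_p, 4))
```

In this hypothetical setting only a handful of P values are attainable, a substantial share of the probability sits at P = 1, and the mean P value exceeds the 0.5 expected under uniformity. That is why the observed distribution for categorical variables has to be compared with an expected distribution that accounts for discreteness rather than with the uniform distribution.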

CONCLUSIONS: Assessing baseline P values in groups of RCTs can identify highly unusual distributions that might raise or reinforce concerns about randomization and data integrity.

Original language: English
Pages (from-to): 67-76
Number of pages: 10
Journal: Journal of Clinical Epidemiology
Early online date: 21 May 2019
Publication status: Published - Aug 2019


KEYWORDS:

  • Statistical methods
  • Research integrity
  • Fabricated data
  • Data integrity
  • P values
  • Randomization
  • BONE

