P-Hacking Lexical Richness Through Definitions of "Type" and "Token"

Stud Health Technol Inform. 2019 Aug 21;264:1433-1434. doi: 10.3233/SHTI190470.

Abstract

"P-hacking" is the repeated analysis of data until a statistically significant result is achieved. We show that p-hacking can also occur during data generation, sometimes unintentionally. We use the type-token ratio (the number of distinct types divided by the total number of tokens) to demonstrate that differences in the definitions of "type" and "token" can produce significantly different results. Because these terms are rarely defined in the biomedical literature, the body of literature that uses this measure cannot be meaningfully interpreted.
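
The sensitivity described above can be illustrated with a minimal Python sketch. It is not the authors' method: the two tokenization rules, the lowercasing step, the helper names, and the sample sentence are all assumptions chosen for illustration. The only point is that the same text yields different type-token ratios depending on how "token" and "type" are defined.

```python
import re


def type_token_ratio(text, tokenize, normalize=lambda token: token):
    """Return distinct types divided by total tokens under the given definitions."""
    tokens = [normalize(token) for token in tokenize(text)]
    if not tokens:
        raise ValueError("text produced no tokens")
    return len(set(tokens)) / len(tokens)


def whitespace_tokens(text):
    # Hypothetical definition A: a token is any whitespace-delimited string,
    # so attached punctuation and capitalization create separate types.
    return text.split()


def alphanumeric_tokens(text):
    # Hypothetical definition B: a token is a run of ASCII letters or digits,
    # so punctuation is dropped and "patient's" splits into "patient" and "s".
    return re.findall(r"[A-Za-z0-9]+", text)


# Invented sample text, not data from the paper.
sample = "The patient's pain improved. The pain, the patient said, was mild."

ttr_a = type_token_ratio(sample, whitespace_tokens)
ttr_b = type_token_ratio(sample, alphanumeric_tokens, normalize=str.lower)

print(f"Definition A: {ttr_a:.3f}")
print(f"Definition B: {ttr_b:.3f}")
```

Under definition A the sample has 11 tokens and 10 types; under definition B it has 12 tokens and 8 types. Neither choice is inherently correct, which is why a reported ratio is hard to interpret unless the definitions are stated.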

Keywords: Language; vocabulary.

MeSH terms

  • Computer Security*
  • Vocabulary