Despite publication of many well-argued critiques of null hypothesis testing (NHT), behavioral science researchers continue to rely heavily on this set of practices. Although we agree with most critics' catalogs of NHT's flaws, this article also …
The effect size (ES) is the magnitude of a study outcome or research finding, such as the strength of the relationship obtained between an independent variable and a dependent variable. Two types of ES indicators are sampled here: the difference-type …
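As a hedged illustration of what a difference-type indicator can look like, the sketch below computes a standardized mean difference (Cohen's d with a pooled standard deviation) for two independent groups. The choice of Cohen's d, the function name `cohens_d`, and the toy scores are assumptions made for illustration; they are not taken from the abstract.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Standardized mean difference between two independent groups.

    Uses the pooled sample standard deviation as the standardizer.
    (Illustrative helper, not from the source abstract.)
    """
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(group_b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(pooled_var)

# Hypothetical scores for a treatment and a control group.
treatment = [2, 4, 6]
control = [1, 3, 5]
d = cohens_d(treatment, control)
print(f"Cohen's d = {d:.2f}")
```

With these toy scores the group means differ by 1 and the pooled standard deviation is 2, so the standardized difference is half a standard deviation.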
This course aims to introduce students to current controversies and new developments in recommended scientific practices. The course is meant to help students think critically about how to conduct better empirical research and how to draw …
In theory, a comparison of two experimental effects requires a statistical test on their difference. In practice, this comparison is often based on an incorrect procedure involving two separate tests in which researchers conclude that effects differ …
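The contrast between the two procedures can be sketched numerically. The example below is a minimal sketch, assuming two independent effect estimates with known standard errors and a normal approximation; the function names and the numbers are hypothetical and do not come from the abstract. It tests each effect separately and then tests their difference directly.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def difference_test(est1, se1, est2, se2):
    """z-test on the difference of two independent effect estimates.

    Assumes the estimates are independent, so their variances add.
    Returns (z, p) for the difference est1 - est2.
    """
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    z = (est1 - est2) / se_diff
    return z, two_sided_p(z)

# Hypothetical effects: (estimate, standard error).
effect_a = (25.0, 10.0)
effect_b = (10.0, 10.0)

# Separate tests: one effect is "significant", the other is not.
p_a = two_sided_p(effect_a[0] / effect_a[1])
p_b = two_sided_p(effect_b[0] / effect_b[1])

# Direct test on the difference between the two effects.
z_diff, p_diff = difference_test(*effect_a, *effect_b)

print(f"effect A alone: p = {p_a:.3f}")
print(f"effect B alone: p = {p_b:.3f}")
print(f"difference A - B: z = {z_diff:.2f}, p = {p_diff:.3f}")
```

With these made-up numbers, effect A alone falls below the .05 threshold and effect B alone does not, yet the test on their difference gives a p-value of roughly 0.29. Concluding that the effects differ from the two separate tests would therefore not be supported by the direct comparison.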
Scientists, being human, make mistakes. We transcribe things incorrectly, we make errors in our code, and we intend to do things and then forget. The consequences of errors in research may be as minor as wasted time and annoyance, but may be as …
While Open Science has arguably initiated positive changes at some stages of the research process (e.g., increasing transparency through preregistration), problematic behaviors during data collection are still almost impossible to detect and pose a …
Validity evidence based on test content is critical to meaningful interpretation of test scores. Within high-stakes testing and accountability frameworks, content-related validity evidence is typically gathered via alignment studies, with panels of …
This book is less a how-to-do-it manual than a why-to-do-it. Our main goal is to instill in the reader an awareness of the numerous sources of bias that can lead to mistaken conclusions when evaluating interventions. Real-life examples are …
Importance: The use and misuse of P values have generated extensive debates. Objective: To evaluate on a large scale the P values reported in the abstracts and full text of biomedical research articles over the past 25 years and determine how frequently …