2 Conceptual and Statistical Knowledge

7 sub-clusters · 82 references

Attaining a grounding in fundamental statistics and measurement and their implications, encompassing conceptual knowledge, application, interpretation, and communication of statistical analyses. There are 7 sub-clusters, which further parse the learning and teaching process:

Effect sizes, statistical power, simulations, & confidence intervals. 19 / 19

Statistics offer more than p-values: we need other benchmarks to determine the statistical and practical relevance of an effect. Emphasizes effect sizes, confidence intervals, power analysis, and simulations to design adequately powered studies and to communicate practical significance.
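To make the simulation approach concrete, here is a minimal sketch in base R (the language the tutorials in this sub-cluster use): power is estimated by simulating many datasets under an assumed true effect and counting how often the test is significant. The effect size (d = 0.4), group sizes, and alpha below are illustrative assumptions, not values from any cited paper.

```r
set.seed(1)

sim_power <- function(n, d, alpha = 0.05, nsim = 5000) {
  pvals <- replicate(nsim, {
    x <- rnorm(n)                 # control group, true mean 0
    y <- rnorm(n, mean = d)       # treatment group, true effect d (in SD units)
    t.test(x, y)$p.value
  })
  mean(pvals < alpha)             # share of significant results = estimated power
}

sim_power(n = 50, d = 0.4)          # roughly 0.50: underpowered for d = 0.4
sim_power(n = 100, d = 0.4)         # roughly 0.80: the conventional benchmark
power.t.test(n = 100, delta = 0.4)  # analytic cross-check in base R
```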

evidence Editorial
Sample size estimation revisited
This editorial examines the prevalence and reproducibility of sample size estimations within the Journal of Sports Sciences, revealing that only a small minority of studies report enough detail to reproduce the calculations. It highlights a critical gap between established editorial guidelines and the actual reporting practices of researchers in the field.
practice/tools Editorial
Power, precision, and sample size estimation in sport and exercise science research
This resource offers a technical guide for sport and exercise scientists on determining sample sizes using both frequentist power and precision-based approaches. It acts as a practical primer for researchers to justify their study designs and align with rigorous statistical standards during the submission process.
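As a rough illustration of the precision-based route this primer describes, the sketch below (a large-sample approximation with assumed values) picks the per-group n that makes a 95% confidence interval for a mean difference no wider than a target half-width.

```r
n_per_group <- function(sd, half_width, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)           # large-sample critical value
  ceiling(2 * (z * sd / half_width)^2)     # per-group n for a two-group design
}
n_per_group(sd = 1, half_width = 0.3)      # ~86 per group for a CI of +/- 0.3 SD
```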
evidence Paper
Quantitative Political Science Research Is Greatly Underpowered
This large-scale meta-research study provides empirical evidence that quantitative political science research is severely underpowered, with a median power of only 10% across thousands of tests. The findings demonstrate that only a small fraction of tests in the discipline meet the standard 80% power threshold required to detect consensus effects.
advocacy Paper
Behavioural science is unlikely to change the world without a heterogeneity revolution
This article argues that the impact of behavioral science on real-world problems is hindered by a neglect of treatment effect heterogeneity. It advocates for a shift in research priorities toward understanding how and why effects vary across different contexts and populations, proposing a framework to improve the generalizability of findings.
practice/tools Paper
Power Analysis and Effect Size in Mixed Effects Models: A Tutorial
This tutorial addresses the difficulty of conducting power analysis for experimental designs that include both participant and stimulus samples, common in cognitive psychology. It provides researchers with practical methods and literature reviews to accurately estimate power and effect sizes when using mixed-effects models.
practice/tools Preprint
Accuracy in Parameter Estimation and Simulation Approaches for Sample Size Planning with Multiple Stimuli
This resource introduces simulation-based approaches and Accuracy in Parameter Estimation (AIPE) as alternatives for sample size planning in research studies with multiple stimuli. It provides tools to determine necessary sample sizes for precise parameter estimation when traditional power formulas are insufficient or inapplicable to complex designs.
evidence Paper
Power failure: why small sample size undermines the reliability of neuroscience
This study presents a meta-research analysis quantifying the prevalence of low statistical power across the neuroscience literature and its role in undermining reproducibility. It demonstrates how underpowered studies lead to inflated effect sizes and waste resources, calling for systemic changes in how neuroscience research is conducted and reported.
Caldwell, A. R., Lakens, D., Parlett‑Pelleriti, C. M., Prochilo, G., & Aust, F. (2022). Power analysis with Superpower. https://aaroncaldwell.us/SuperpowerBook/
practice/tools Paper
Understanding Mixed-Effects Models Through Data Simulation
This tutorial provides a practical guide to using data simulation to better understand and interpret linear mixed-effects models that include random effects for both subjects and stimuli. By walking through R code and parameter interpretation, it helps researchers build intuition for complex models and correctly apply them to their own experimental data.
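A minimal sketch of the kind of simulation this tutorial walks through, assuming the lme4 package: generate data with crossed random intercepts for subjects and stimuli, then check that the fitted model recovers the (made-up) generating parameters.

```r
library(lme4)
set.seed(2)

n_subj <- 40; n_item <- 20
d <- expand.grid(subj = 1:n_subj, item = 1:n_item)
d$cond <- ifelse(d$item %% 2 == 0, 0.5, -0.5)   # deviation-coded, varies by item

b0 <- 400; b1 <- 30                      # assumed fixed intercept and effect (ms)
u_subj <- rnorm(n_subj, sd = 50)         # by-subject random intercepts
u_item <- rnorm(n_item, sd = 20)         # by-item random intercepts
d$rt <- b0 + b1 * d$cond + u_subj[d$subj] + u_item[d$item] +
        rnorm(nrow(d), sd = 80)          # residual noise

m <- lmer(rt ~ cond + (1 | subj) + (1 | item), data = d)
summary(m)   # fixed and random effects should sit near the generating values
```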
overview Paper
Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations
This resource clarifies common misconceptions regarding frequentist statistical indicators by identifying and correcting twenty prevalent misinterpretations of p-values, confidence intervals, and power. It serves as a rigorous pedagogical guide to help researchers avoid incorrect shortcut definitions that lead to invalid scientific conclusions.
practice/tools Paper
Conducting Simulation Studies in the R Programming Environment
This resource provides a practical tutorial for using the R programming environment to conduct simulation studies, making these techniques accessible to researchers without advanced programming backgrounds. It includes annotated code to help users estimate statistical power and assess the appropriateness of various analytical methods for their specific research questions.
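In the spirit of that tutorial, here is a small self-contained simulation study with made-up settings: both groups are drawn from the same skewed null distribution, so the empirical rejection rates of the t-test and the Wilcoxon test estimate their Type I error under skew.

```r
set.seed(3)
res <- replicate(5000, {
  x <- rlnorm(20); y <- rlnorm(20)          # same skewed null distribution
  c(t = t.test(x, y)$p.value,
    w = wilcox.test(x, y)$p.value)
})
rowMeans(res < 0.05)   # empirical Type I error rates; both should be near 0.05
```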
overview Paper
Heterogeneity in effect size estimates
This resource proposes a framework that decomposes heterogeneity in effect sizes into three distinct sources: population, design, and analytical variation. It provides a theoretical foundation for understanding how these different forms of uncertainty limit the generalizability of research findings and affect the cumulative probability that a tested hypothesis is true.
practice/tools Paper
Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs
This primer provides step-by-step instructions for calculating and reporting various effect size measures for t-tests and ANOVAs, specifically distinguishing between metrics like Cohen’s d and partial eta-squared. It emphasizes how transparent effect size reporting directly enables cumulative science by facilitating accurate a priori power analyses and the inclusion of findings in meta-analyses.
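For instance, the pooled-standard-deviation version of Cohen's d that such primers cover can be computed by hand; the data below are invented for illustration.

```r
x <- c(5.1, 6.0, 5.5, 6.2, 5.8, 6.4)   # invented data, group 1
y <- c(4.8, 5.2, 4.9, 5.6, 5.0, 5.3)   # invented data, group 2
n1 <- length(x); n2 <- length(y)
sd_pooled <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
d <- (mean(x) - mean(y)) / sd_pooled   # Cohen's d, pooled-SD form
d   # report alongside the t-test so meta-analysts can reuse it
```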
overview Paper
Sample Size Justification
This article details six different approaches for justifying sample sizes in quantitative research, moving beyond simple power analysis to include strategies based on accuracy planning, resource constraints, and point estimates. It provides researchers with a structured decision-making framework and standardized vocabulary to transparently communicate the rationale for their data collection plans.
evidence Paper
With Low Power Comes Low Credibility? Toward a Principled Critique of Results From Underpowered Tests
This paper develops a principled framework for critiquing results from underpowered tests, examining when low statistical power genuinely undermines the credibility of a reported finding. It cautions against blanket dismissals of underpowered studies and spells out the conditions under which power-based critiques are warranted.
practice/tools Paper
Reporting effect sizes in original psychological research: A discussion and tutorial.
This tutorial offers practical guidance for psychological researchers on the reporting and interpretation of effect sizes and their associated confidence intervals. It specifically emphasizes the importance of unstandardized effect sizes and provides recommendations for selecting measures that best address specific research questions.
practice/tools Paper
Safeguard Power as a Protection Against Imprecise Power Estimates
This article introduces 'safeguard power analysis,' a practical method for sample size planning that accounts for the inherent uncertainty in effect size estimates. By using the lower bound of a confidence interval around an effect size, the tool helps researchers avoid the common problem of designing underpowered studies based on potentially inflated initial results.
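A sketch of the safeguard idea with assumed pilot values: plan the new study on a lower confidence bound for the pilot effect size (the paper suggests, for example, the lower limit of a 60% interval) rather than on the point estimate. The normal approximation to the standard error of d used here is a simplification.

```r
d_pilot <- 0.5; n1 <- 30; n2 <- 30       # assumed pilot result
se_d <- sqrt((n1 + n2) / (n1 * n2) +
             d_pilot^2 / (2 * (n1 + n2)))          # approximate SE of d
d_safeguard <- d_pilot - qnorm(0.8) * se_d         # lower limit of a 60% CI
power.t.test(delta = d_safeguard, power = 0.8)     # ~200 per group, vs ~64
                                                   # if we trusted d = 0.5
```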
critique Paper
Evaluating Research in Personality and Social Psychology: Considerations of Statistical Power and Concerns About False Findings
This article critiques the application of False Finding Rate (FFR) calculations as a primary criterion for evaluating the quality of research in personality and social psychology. It argues that the assumptions underlying these power-based evaluations often fail to reflect the practical realities of the discipline, potentially leading to the unfair dismissal of valid research.
evidence Paper
On the importance of modeling the invisible world of underlying effect sizes
This resource uses formal modeling and simulations to demonstrate that headline replication rates cannot be meaningfully interpreted without considering the underlying distribution of true effect sizes and statistical power. It provides a meta-research framework for understanding how observed replication failures can emerge from the mathematical properties of original study designs rather than solely from questionable research practices.
Exploratory and confirmatory analyses 8 / 8

Confirmatory analyses test a priori hypotheses against a pre-specified analysis plan (ideally preregistered/Registered Report); any deviations are documented. Exploratory analyses probe patterns, generate hypotheses, and build models after seeing the data.

advocacy Book
The Seven Deadly Sins of Psychology
This work identifies and analyzes systemic flaws in psychological science, such as publication bias and lack of transparency, that contribute to the replication crisis. It makes a strong case for institutional reform and the adoption of open science practices, such as Registered Reports, to improve the reliability of the field.
Feest, U., & Devezer, B. (2025). Toward a more accurate notion of exploratory research (and why it matters). PhilSci Archive. https://philsci-archive.pitt.edu/24482/
critique Paper
A critique of using the labels confirmatory and exploratory in modern psychological research
This paper critiques the binary categorization of research as either exploratory or confirmatory, arguing that these labels are too simplistic for modern psychological research involving complex statistical models. It highlights how these terms can mask the nuanced relationship between theory and data analysis, potentially obstructing methodological progress.
Lin, W., & Green, D. P. (2016). Standard Operating Procedures: A Safety Net for Pre-Analysis Plans. Political Science and Politics, 49(3), 495–500. https://doi.org/10.1017/S1049096516000810
critique Paper
Exploratory hypothesis tests can be more compelling than confirmatory hypothesis tests
This paper challenges the prevailing hierarchy that favors confirmatory testing over exploratory testing, arguing that the latter can often produce more compelling scientific insights. It provides a theoretical counterpoint to the idea that preregistration is the primary determinant of research quality or certainty.
critique Paper
Arrested Theory Development: The Misguided Distinction Between Exploratory and Confirmatory Research
This resource argues that psychology’s replicability crisis stems from "flexible theories" rather than a failure to distinguish between exploration and confirmation. It critiques current trends that prioritize methodological fixes like preregistration over the fundamental need for developing rigorous, "hard to vary" theories.
advocacy Paper
The Creativity-Verification Cycle in Psychological Science: New Methods to Combat Old Idols
This article advocates for the adoption of preregistration in psychological science as a necessary safeguard against pervasive cognitive biases like hindsight and confirmation bias. It argues for a clear structural separation between the 'creativity' of exploratory data analysis and the 'verification' of confirmatory hypothesis testing.
advocacy Paper
An Agenda for Purely Confirmatory Research
This resource advocates for the adoption of purely confirmatory research designs to prevent researchers from fine-tuning analyses to fit observed data. It highlights how the lack of pre-commitment to specific statistical tests undermines the validity of research claims in psychology.
Limitations and benefits of NHST, Bayesian & Likelihood approaches. 11 / 11

Alongside frequentist statistics, there are other quantitative approaches, each with different assumptions and goals. This sub-cluster summarizes the benefits and limitations of each.

advocacy Paper
The New Statistics
This publication advocates for the adoption of "the new statistics," urging researchers to move away from null-hypothesis significance testing in favor of estimation and effect size reporting. It presents a clear case for research integrity reforms, including the prespecification of studies and the active encouragement of replication to improve literature reliability.
practice/tools Paper
How to become a Bayesian in eight easy steps: An annotated reading list
This resource provides a curated and annotated reading list designed to guide researchers through the transition from frequentist to Bayesian statistical thinking. It offers a structured pathway for self-study by identifying foundational texts and explaining their significance in mastering Bayesian inference.
overview Paper
Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations
This resource clarifies common misconceptions regarding frequentist statistical indicators by identifying and correcting twenty prevalent misinterpretations of p-values, confidence intervals, and power. It serves as a rigorous pedagogical guide to help researchers avoid incorrect shortcut definitions that lead to invalid scientific conclusions.
overview Paper
Bayes Factors
This paper provides a comprehensive review of the Bayes factor as a tool for quantifying scientific evidence in favor of a hypothesis. It discusses the practical application of Bayesian hypothesis testing across various research contexts and provides guidelines for interpreting the strength of evidence.
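As a concrete illustration, the BayesFactor R package is one widely used implementation of default Bayes factor tests; the data below are simulated and the default Cauchy prior is assumed.

```r
library(BayesFactor)
set.seed(4)
x <- rnorm(40, mean = 0.3); y <- rnorm(40)
bf <- ttestBF(x = x, y = y)   # default Cauchy prior on the effect size
bf                            # BF10: evidence for a difference over the point null
1 / extractBF(bf)$bf          # BF01: evidence for the null over the alternative
```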
practice/tools Paper
Using Bayes factor hypothesis testing in neuroscience to establish evidence of absence
This resource demonstrates how Bayesian hypothesis testing can be applied specifically within neuroscience to distinguish between inconclusive results and genuine evidence for the absence of an effect. It provides a practical alternative to frequentist methods, which are inherently unable to provide statistical evidence in support of a null hypothesis.
overview Paper
Scientific method: Statistical errors
This resource provides an accessible overview of the common misinterpretations of p-values and how the rigid reliance on statistical significance thresholds fuels the reproducibility crisis. It explains the mathematical vulnerability of 'near-significant' results and suggests moving toward more nuanced statistical reporting that avoids binary thinking.
advocacy Paper
The Creativity-Verification Cycle in Psychological Science: New Methods to Combat Old Idols
This article advocates for the adoption of preregistration in psychological science as a necessary safeguard against pervasive cognitive biases like hindsight and confirmation bias. It argues for a clear structural separation between the 'creativity' of exploratory data analysis and the 'verification' of confirmatory hypothesis testing.
critique Paper
Statistical Nonsignificance in Empirical Economics
This article critiques the standard practice in empirical economics of prioritizing statistically significant rejections over non-significant findings. It demonstrates that in the context of large economic datasets, the failure to reject a point null is often more scientifically informative than a rejection, challenging the traditional hierarchy of evidence.
policies Paper
Interrogating the “cargo cult science” metaphor
This paper critically examines Feynman's "cargo cult science" metaphor and its frequent invocation in debates about research integrity and the replication crisis. It questions what the metaphor illuminates, and what it obscures, about how questionable research practices arise and how scientific communities should respond to them.
practice/tools Paper
Valid replications require valid methods: Recommendations for best methodological practices with lab experiments.
This resource provides actionable methodological recommendations for conducting lab experiments to ensure they serve as a solid foundation for valid replications. It highlights specific practices in experimental design and implementation that are essential for producing reliable and reproducible findings.
evidence Paper
With Low Power Comes Low Credibility? Toward a Principled Critique of Results From Underpowered Tests
This paper develops a principled framework for critiquing results from underpowered tests, examining when low statistical power genuinely undermines the credibility of a reported finding. It cautions against blanket dismissals of underpowered studies and spells out the conditions under which power-based critiques are warranted.
Philosophy of science 20 / 20

Approaches to assessing the reliability of scientific theories, reasoning, and methods, and their ability to make predictions about the natural and social world. Introduces how differing philosophies (positivist, post-positivist, constructivist, etc.) influence what scientists consider valid evidence, and how open science challenges some traditional norms.

critique Paper
Open Science From a Qualitative, Feminist Perspective: Epistemological Dogmas and a Call for Critical Examination
This article evaluates the alignment between open science frameworks and the priorities of qualitative and feminist research within the field of psychology. It specifically questions whether existing open science dogmas inadvertently marginalize transgressive research methods and calls for a critical examination of how these frameworks impact radical inquiry.
practice/tools Paper
Towards Open Science for the Qualitative Researcher: From a Positivist to an Open Interpretation
This resource provides a practical reflection on data handling and pseudonymization in qualitative research, detailed through a case study of custom software development. It bridges technical implementation with epistemological inquiry to demonstrate how open research data guidelines can be successfully adapted to qualitative workflows.
Feest, U., & Devezer, B. (2025). Toward a more accurate notion of exploratory research (and why it matters). PhilSci Archive. https://philsci-archive.pitt.edu/24482/
advocacy Preprint
Subjectivity is a Feature, not a Flaw: A Call to Unsilence the Human Element in Science
This resource advocates for the recognition of researcher subjectivity as an inherent and valuable component of science rather than a contaminant to be purged. It challenges the traditional myth of the detached scientist and encourages the explicit use of reflexivity to enhance scientific integrity.
advocacy Paper
How Computational Modeling Can Force Theory Building in Psychological Science
This article promotes the adoption of computational modeling as a vital tool for advancing theory building within psychological science. It demonstrates how formalizing theories into models forces researchers to clarify vague intuitions and specify assumptions that often remain unexamined in purely verbal theories.
overview Paper
What Makes a Good Theory, and How Do We Make a Theory Good?
This resource proposes a formal ontology of criteria, known as a metatheoretical calculus, to evaluate the quality and robustness of scientific theories. It specifically outlines categories such as metaphysical commitment and discursive survival to help researchers move beyond vague assessments and toward rigorous theoretical adjudication.
critique Paper
Approaching psychology’s current crises by exploring the vagueness of psychological concepts: Recommendations for advancing the discipline.
This resource argues that the replication, theory, and universality crises in psychology are fundamentally linked to the vagueness of psychological concepts. It suggests that advancing the discipline requires a focus on theoretical and philosophical refinement rather than just methodological or statistical changes.
advocacy Paper
Moving beyond 20 questions: We (still) need stronger psychological theory.
This resource argues that psychology continues to struggle with fragmented findings and emphasizes the persistent need for robust, unifying theories to replace the "20 questions" style of empirical research. It advocates for a shift in focus from isolated experimental effects toward the development of comprehensive theoretical frameworks.
critique Paper
Open Science and Epistemic Diversity: Friends or Foes?
This work explores how the current implementation of open science may marginalize diverse research traditions by privileging specific inquiry styles over others. It identifies four reference points—such as local specificity and data provenance—to help open science frameworks better accommodate epistemic diversity.
Leonelli, S. (2023). Philosophy of open science. Cambridge University Press. http://philsci-archive.pitt.edu/id/eprint/21986
Mackenzie, N., & Knipe, S. (2006). Research dilemmas: Paradigms, methods and methodology. Issues in Educational Research, 16(2), 193–205. http://www.iier.org.au/iier16/mackenzie.html
critique Paper
Metascience Is Not Enough – A Plea for Psychological Humanities in the Wake of the Replication Crisis
This article critiques the reliance on metascience as the primary solution to the replication crisis, arguing that it overlooks deep-seated epistemic problems within psychology. It advocates for integrating perspectives from the psychological humanities to address the conceptual and historical complexities that quantitative metascientific approaches may fail to capture.
critique Paper
The quantitative paradigm and the nature of the human mind. The replication crisis as an epistemological crisis of quantitative psychology in view of the ontic nature of the psyche
This paper frames the replication crisis in psychology as a fundamental epistemological mismatch between the complex nature of the human psyche and the quantitative methods used to measure it. It moves beyond statistical explanations to argue that the crisis stems from underlying philosophical and ontological assumptions that remain largely unaddressed in the field.
critique Paper
Theory-Testing in Psychology and Physics: A Methodological Paradox
This seminal paper identifies a methodological paradox where increased experimental precision in psychology, unlike in physics, actually makes theory corroboration more difficult when relying on null hypothesis significance testing. It critiques the logical foundations of how psychological theories are tested, arguing that 'statistical significance' is often an inadequate substitute for genuine theoretical progress.
critique Paper
Is replication possible in qualitative research? A response to Makel et al. (2022)
Serving as a direct rebuttal to advocacy pieces, this response highlights three core areas where the logic of replication conflicts with the goals of qualitative research. It provides a critical perspective on how open research practices developed for quantitative work may not be appropriate for educational or qualitative methodologies.
advocacy Paper
Building better theories
This resource argues that the replication crisis is fundamentally a crisis of theory, advocating for a shift in focus toward more rigorous theory construction and specification. It highlights how strengthening the theoretical foundations of psychological research is essential for creating more robust, falsifiable, and reproducible scientific findings.
critique Paper
Rethinking Transparency and Rigor from a Qualitative Open Science Perspective
This paper critiques the quantitative-centric definition of transparency in open science, arguing that current frameworks do not align with the epistemic goals of qualitative research. It proposes a broader perspective that emphasizes researcher reflexivity and contextual data interpretation as essential components of rigor.
critique Paper
Psychological models and their distractors
This paper critiques the current use of formal models in psychology, arguing that they often serve as 'distractors' that mask a lack of theoretical depth rather than resolving it. It challenges researchers to ensure that their mathematical models are genuinely grounded in coherent psychological theory rather than being used as mere technical window dressing.
Yarkoni, T. (2020). The generalizability crisis. Behavioral and Brain Sciences, 45. https://doi.org/10.1017/S0140525X20001685
Lakatos, I. (1978). The Methodology of Scientific Research Programmes. https://doi.org/10.1017/CBO9780511621123
Questionable measurement practices (QMPs), validity & reliability issues. 8 / 8

The quality of our measures affects the validity of our results and offers another avenue for addressing potential questionable practices. Examines how measurement choices shape the credibility of findings, addressing Questionable Measurement Practices (QMPs) such as ad-hoc scale trimming, unvalidated instruments, poor reliability reporting, and ignored measurement invariance, and their impact on construct validity, reliability, and generalizability.

practice/tools Paper
Measurement Schmeasurement: Questionable Measurement Practices and How to Avoid Them
This resource defines and categorizes Questionable Measurement Practices (QMPs), illustrating how hidden decisions in the measurement process can threaten the validity of scientific conclusions. It provides a practical framework for researchers to increase measurement transparency and offers guidance on how to avoid these common pitfalls during study design and reporting.
evidence Paper
Construct Validation in Social and Personality Research
The authors present empirical meta-research by auditing a representative sample of social and personality psychology papers to evaluate the state of construct validation. The study reveals a significant gap between the common use of latent variable measurement and the lack of rigorous, ongoing evidence provided by researchers to justify those measures.
critique Paper
Approaching psychology’s current crises by exploring the vagueness of psychological concepts: Recommendations for advancing the discipline.
This resource argues that the replication, theory, and universality crises in psychology are fundamentally linked to the vagueness of psychological concepts. It suggests that advancing the discipline requires a focus on theoretical and philosophical refinement rather than just methodological or statistical changes.
evidence Paper
Hidden Invalidity Among 15 Commonly Used Measures in Social and Personality Psychology
This study presents empirical evidence of 'hidden invalidity' by showing that widely used psychological scales often fail structural validity tests despite having acceptable internal consistency. By analyzing a uniquely large dataset, it demonstrates that standard metrics like Cronbach's alpha often mask significant psychometric flaws in social and personality psychology measures.
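To make the contrast concrete, Cronbach's alpha can be computed directly from its definition; as the toy example below (with invented, unidimensional data) shows, a high alpha is silent about the structural properties the study examines.

```r
cronbach_alpha <- function(items) {   # items: respondents x items numeric matrix
  k <- ncol(items)
  (k / (k - 1)) * (1 - sum(apply(items, 2, var)) / var(rowSums(items)))
}

set.seed(5)
latent <- rnorm(200)
items  <- sapply(1:5, function(i) latent + rnorm(200))  # five noisy indicators
cronbach_alpha(items)   # ~0.8: high consistency, yet silent on dimensionality
```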
Parsons, S. (2022). Exploring reliability heterogeneity with multiverse analyses: Data processing decisions unpredictably influence measurement reliability. Meta-Psychology, 6. https://doi.org/10.15626/MP.2020.2577
advocacy Paper
Psychological Science Needs a Standard Practice of Reporting the Reliability of Cognitive-Behavioral Measurements
This paper advocates for the establishment of a standard reporting practice for measurement reliability within cognitive-behavioral research to improve the robustness of psychological science. It argues that transparently reporting reliability is a necessary prerequisite for properly evaluating statistical inferences and ensuring that research findings are not merely artifacts of measurement error.
critique Paper
Unreliability as a threat to understanding psychopathology: The cautionary tale of attentional bias.
This resource critiques the reliance on unreliable behavioral measures in psychopathology research, using attentional bias as a primary example of how poor metrics hinder scientific progress. It highlights the specific threat that measurement error poses to major clinical initiatives, such as the RDoC, which depend on high-reliability measures for individual difference research and mediation analysis.
practice/tools Paper
Crowdsourcing multiverse analyses to explore the impact of different data-processing and analysis decisions: A tutorial.
This resource provides a practical tutorial on implementing multiverse analyses to test the robustness of research findings against various data-processing and analytical choices. It demonstrates how exploring multiple plausible analysis paths can reveal the sensitivity of results to arbitrary decisions, thereby improving the transparency and generalizability of empirical research.
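A toy sketch of the multiverse logic, with invented data and arbitrary processing choices: rerun the same test across every combination of defensible decisions and inspect how the result moves.

```r
set.seed(6)
d <- data.frame(rt = rlnorm(300, meanlog = 6, sdlog = 0.4),
                group = rep(c("a", "b"), 150))

specs <- expand.grid(cutoff = c(1000, 2000, 3000),   # outlier-trimming rules
                     log_rt = c(TRUE, FALSE))        # transform or not
specs$p <- mapply(function(cutoff, log_rt) {
  dd <- d[d$rt < cutoff, ]                 # one trimming rule per universe
  y  <- if (log_rt) log(dd$rt) else dd$rt  # one transform rule per universe
  t.test(y ~ dd$group)$p.value
}, specs$cutoff, specs$log_rt)
specs   # one row per universe: how stable is the conclusion across choices?
```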
Research design, sampling methods, & its implications for inferences. 4 / 4

How design choices and sampling strategies shape bias, precision, and generalizability. Includes threats to internal and external validity, selection bias, clustering/design effects, weighting, and transparent reporting and preregistration. Design and sampling decisions determine the credibility and scope of statistical inference, so this sub-cluster emphasizes adequate power and sample-size planning (e.g., safeguard power), transparent pre-analysis planning to constrain researcher degrees of freedom, and rigorous, valid methods as prerequisites for meaningful replication: reducing bias, increasing precision, and improving generalizability across lab and field work.

evidence Paper
A Powerful Nudge? Presenting Calculable Consequences of Underpowered Research Shifts Incentives Toward Adequately Powered Designs
This study uses a stylized hiring scenario to evaluate empirically how researchers weigh statistical power against individual productivity in hiring decisions. It demonstrates that explicitly presenting the scientific consequences of underpowered research can shift professional incentives toward adequately powered experimental designs.
practice/tools Paper
Valid replications require valid methods: Recommendations for best methodological practices with lab experiments.
This resource provides actionable methodological recommendations for conducting lab experiments to ensure they serve as a solid foundation for valid replications. It highlights specific practices in experimental design and implementation that are essential for producing reliable and reproducible findings.
practice/tools Paper
Safeguard Power as a Protection Against Imprecise Power Estimates
This article introduces 'safeguard power analysis,' a practical method for sample size planning that accounts for the inherent uncertainty in effect size estimates. By using the lower bound of a confidence interval around an effect size, the tool helps researchers avoid the common problem of designing underpowered studies based on potentially inflated initial results.
practice/tools Paper
Degrees of Freedom in Planning, Running, Analyzing, and Reporting Psychological Studies: A Checklist to Avoid p-Hacking
This resource provides an extensive checklist of 34 specific researcher degrees of freedom that can lead to p-hacking across various stages of the research process. It serves as a practical tool for psychologists to preemptively identify and minimize opportunistic choices during study planning, data collection, analysis, and reporting.
The logic of null hypothesis testing, p-values, Type I and II errors (and when and why they might happen). 12 / 12

Frequentist statistics are typically the default in quantitative research; they come with particular assumptions and implications and are often misinterpreted. This sub-cluster clarifies the logic of NHST, the meaning of p-values, and when and why Type I and Type II errors arise, extending to Type S and Type M errors. It links these to design and power choices and outlines practical steps for better inference and reporting.
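The two error types are easy to see operationally in a short simulation; the true effect (d = 0.5) and sample size below are illustrative assumptions.

```r
set.seed(7)
p_null <- replicate(5000, t.test(rnorm(30), rnorm(30))$p.value)       # H0 true
p_alt  <- replicate(5000, t.test(rnorm(30), rnorm(30, 0.5))$p.value)  # d = 0.5
mean(p_null < 0.05)   # Type I error rate: ~0.05 by construction
mean(p_alt >= 0.05)   # Type II error rate: ~0.53 here, so power is only ~0.47
```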

Banerjee, A., Chitnis, U., Jadhav, S., Bhawalkar, J., & Chaudhury, S. (2009). Hypothesis testing, type I and type II errors. Industrial Psychiatry Journal, 18(2), 127. https://doi.org/10.4103/0972-6748.62274
critique Paper
Understanding the Replication Crisis as a Base Rate Fallacy
This paper presents a theoretical critique of the standard narrative that the replication crisis is primarily caused by poor scientific conduct or questionable research practices. It uses the logic of the base rate fallacy to argue that high failure rates in replications are a predictable mathematical outcome in fields that investigate a large proportion of unlikely hypotheses.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997–1003. https://doi.org/10.1037/0003-066X.49.12.997
critique Paper
Type I Error Rates are Not Usually Inflated
This article challenges the conventional wisdom that questionable research practices like p-hacking necessarily inflate Type I error rates. It introduces nuanced distinctions between different types of statistical errors to argue that many criticized practices do not impact the error rates relevant to the researchers' specific hypotheses.
practice/tools Paper
Beyond Power Calculations
This paper proposes a move beyond traditional power analysis by introducing "design calculations" to estimate Type S (sign) and Type M (magnitude) errors. These metrics help researchers understand the risk of obtaining results that are either in the wrong direction or grossly exaggerated in magnitude, particularly in small-sample studies.
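A simulation-flavored sketch of these design calculations (Gelman and Carlin also give analytic versions; the true effect and standard error below are assumed): among estimates that reach significance, how often is the sign wrong, and by how much is the magnitude exaggerated?

```r
retrodesign_sim <- function(true_effect, se, alpha = 0.05, nsim = 1e5) {
  est <- rnorm(nsim, mean = true_effect, sd = se)   # sampling dist. of estimates
  sig <- abs(est) > qnorm(1 - alpha / 2) * se       # which reach "significance"
  c(power  = mean(sig),
    type_s = mean(sign(est[sig]) != sign(true_effect)),  # wrong-sign rate
    type_m = mean(abs(est[sig])) / abs(true_effect))     # exaggeration ratio
}

set.seed(8)
retrodesign_sim(true_effect = 0.1, se = 0.2)   # small effect, noisy design
```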
policies Paper
Interrogating the “cargo cult science” metaphor
This paper critically examines Feynman's "cargo cult science" metaphor and its frequent invocation in debates about research integrity and the replication crisis. It questions what the metaphor illuminates, and what it obscures, about how questionable research practices arise and how scientific communities should respond to them.
critique Book Chapter
The Null Ritual: What You Always Wanted to Know About Significance Testing but Were Afraid to Ask
This publication critiques the institutionalized "null ritual," which it describes as an incoherent amalgamation of incompatible Fisherian and Neyman-Pearson statistical frameworks. It explains how this ritualized practice suppresses critical thinking and fosters the illusion that statistical significance is a substitute for scientific evidence and theoretical reasoning.
critique Paper
Mindless statistics
This article critiques the "null ritual" prevalent in the social sciences, where statistical procedures are applied mindlessly as a requirement for social group identification rather than scientific inquiry. It highlights how rigid adherence to significance levels leads to collective confusion among researchers and undermines the quality of statistical reasoning in scientific publications.
Lakens, D. Improving your statistical inferences. Online course. https://www.coursera.org/learn/statistical-inferences
evidence Paper
With Low Power Comes Low Credibility? Toward a Principled Critique of Results From Underpowered Tests
This paper develops a principled framework for critiquing results from underpowered tests, examining when low statistical power genuinely undermines the credibility of a reported finding. It cautions against blanket dismissals of underpowered studies and spells out the conditions under which power-based critiques are warranted.
critique Paper
Inconsistent multiple testing corrections: The fallacy of using family-based error rates to make inferences about individual hypotheses
This resource highlights a specific logical inconsistency where researchers apply familywise error rate corrections to individual hypothesis tests rather than joint union hypotheses. It argues that this practice leads to inappropriate inferential conclusions and clarifies the intended purpose of alpha-level adjustments.
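A brief illustration with made-up p-values: base R's p.adjust performs familywise corrections, and on the paper's reading the adjusted values license an inference about the family of tests jointly, not a sharper claim about any single hypothesis.

```r
p <- c(0.004, 0.020, 0.035, 0.300)   # made-up p-values from a family of 4 tests
p.adjust(p, method = "holm")         # Holm correction: controls the FWER
p.adjust(p, method = "bonferroni")   # Bonferroni: simpler, more conservative
```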
policies Paper
The ASA Statement on p-Values: Context, Process, and Purpose
This official statement from the American Statistical Association provides six principles to guide the use and interpretation of p-values in scientific research. It serves as a formal policy document intended to improve the transparency and reproducibility of statistical analysis across various disciplines.