2 Conceptual and Statistical Knowledge

7 sub-clusters · 82 references

A grounding in fundamental statistics and measurement and their implications, encompassing conceptual knowledge and the application, interpretation, and communication of statistical analyses. Seven sub-clusters further parse the learning and teaching process:

Effect sizes, statistical power, simulations, & confidence intervals.

Statistics are more than p-values; other benchmarks are needed to judge the statistical and practical relevance of an effect. This sub-cluster emphasizes effect sizes, confidence intervals, power, and simulation for designing adequately powered studies and communicating practical significance (a minimal simulation sketch follows the reading list below).

  • Abt, G., Boreham, C., Davison, G., Jackson, R., Jobson, S., Wallace, E., & Williams, M. (2025). Sample size estimation revisited. Journal of Sports Sciences, 43(21), 2511–2516. https://doi.org/10.1080/02640414.2025.2499403
  • Abt, G., Boreham, C., Davison, G., Jackson, R., Nevill, A., Wallace, E., & Williams, M. (2020). Power, precision, and sample size estimation in sport and exercise science research. Journal of Sports Sciences, 38(17), 1933–1935. https://doi.org/10.1080/02640414.2020.1776002
  • Arel-Bundock, V., Briggs, R. C., Doucouliagos, H., Aviña, M. M., & Stanley, T. D. (2026). Quantitative Political Science Research Is Greatly Underpowered. The Journal of Politics, 88(1), 36–46. https://doi.org/10.1086/734279
  • Bryan, C. J., Tipton, E., & Yeager, D. S. (2021). Behavioural science is unlikely to change the world without a heterogeneity revolution. Nature Human Behaviour, 5(8), 980–989. https://doi.org/10.1038/s41562-021-01143-3
  • Brysbaert, M., & Stevens, M. (2018). Power Analysis and Effect Size in Mixed Effects Models: A Tutorial. Journal of Cognition, 1(1). https://doi.org/10.5334/joc.10
  • Buchanan, E. M., Elsherif, M. M., Geller, J., Aberson, C., Gurkan, N., Ambrosini, E., Heyman, T., Montefinese, M., vanpaemel, wolf, Barzykowski, K., Batres, C., Fellnhofer, K., Huang, G., McFall, J. P., Ribeiro, G., Röer, J. P., Ulloa Fulgeri, J. L., Roettger, T. B., Valentine, K. D., … Lewis, S. C. (2023). Accuracy in Parameter Estimation and Simulation Approaches for Sample Size Planning with Multiple Stimuli. https://doi.org/10.31219/osf.io/e3afx
  • Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafò, M. R. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365–376. https://doi.org/10.1038/nrn3475
  • Caldwell, A. R., Lakens, D., Parlett-Pelleriti, C. M., Prochilo, G., & Aust, F. (2022). Power analysis with Superpower. https://aaroncaldwell.us/SuperpowerBook/
  • DeBruine, L. M., & Barr, D. J. (2021). Understanding Mixed-Effects Models Through Data Simulation. Advances in Methods and Practices in Psychological Science, 4(1). https://doi.org/10.1177/2515245920965119
  • Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. European Journal of Epidemiology, 31(4), 337–350. https://doi.org/10.1007/s10654-016-0149-3
  • Hallgren, K. A. (2013). Conducting Simulation Studies in the R Programming Environment. Tutorials in Quantitative Methods for Psychology, 9(2), 43–60. https://doi.org/10.20982/tqmp.09.2.p043
  • Holzmeister, F., Johannesson, M., Böhm, R., Dreber, A., Huber, J., & Kirchler, M. (2024). Heterogeneity in effect size estimates. Proceedings of the National Academy of Sciences, 121(32). https://doi.org/10.1073/pnas.2403490121
  • Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4. https://doi.org/10.3389/fpsyg.2013.00863
  • Lakens, D. (2022). Sample Size Justification. Collabra: Psychology, 8(1). https://doi.org/10.1525/collabra.33267
  • Lengersdorff, L. L., & Lamm, C. (2025). With Low Power Comes Low Credibility? Toward a Principled Critique of Results From Underpowered Tests. Advances in Methods and Practices in Psychological Science, 8(1). https://doi.org/10.1177/25152459241296397
  • Pek, J., & Flora, D. B. (2018). Reporting effect sizes in original psychological research: A discussion and tutorial. Psychological Methods, 23(2), 208–225. https://doi.org/10.1037/met0000126
  • Perugini, M., Gallucci, M., & Costantini, G. (2014). Safeguard Power as a Protection Against Imprecise Power Estimates. Perspectives on Psychological Science, 9(3), 319–332. https://doi.org/10.1177/1745691614528519
  • Wegener, D. T., Fabrigar, L. R., Pek, J., & Hoisington-Shaw, K. (2021). Evaluating Research in Personality and Social Psychology: Considerations of Statistical Power and Concerns About False Findings. Personality and Social Psychology Bulletin, 48(7), 1105–1117. https://doi.org/10.1177/01461672211030811
  • Wilson, B. M., & Wixted, J. T. (2023). On the importance of modeling the invisible world of underlying effect sizes. Social Psychological Bulletin, 18. https://doi.org/10.32872/spb.9981
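
As a minimal illustration of the simulation-based approach to power planning advocated in several of these readings (e.g., Hallgren, 2013; DeBruine & Barr, 2021), the Python sketch below estimates power for a two-sample comparison by repeatedly simulating data under an assumed true effect. The effect size, sample sizes, and alpha are illustrative assumptions, not recommendations.

```python
# Minimal simulation-based power estimate for a two-sample t-test.
# All parameter values are illustrative assumptions, not recommendations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)

def simulated_power(d=0.4, n_per_group=50, alpha=0.05, n_sims=10_000):
    """Estimate power as the proportion of simulated studies with p < alpha."""
    hits = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, 1.0, n_per_group)    # standardized control group
        treatment = rng.normal(d, 1.0, n_per_group)    # true effect = d (Cohen's d)
        _, p = stats.ttest_ind(treatment, control)
        hits += p < alpha
    return hits / n_sims

for n in (20, 50, 100, 200):
    print(f"n per group = {n:>3}: estimated power = {simulated_power(n_per_group=n):.2f}")
```

The same scaffolding generalizes to mixed-effects designs by swapping in a richer data-generating model for participants and stimuli, as in DeBruine and Barr (2021).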

Exploratory and confirmatory analyses

Confirmatory analyses test a priori hypotheses against a pre-specified analysis plan (ideally preregistered or a Registered Report), with any deviations documented. Exploratory analyses probe patterns, generate hypotheses, and build models after seeing the data.

Limitations and benefits of NHST, Bayesian & Likelihood approaches.

Beyond frequentist statistics, there are other quantitative approaches, each with different assumptions and goals. This sub-cluster summarizes the benefits and limitations of each (a brief Bayes-factor sketch follows the reading list below).

  • Abadie, A. (2020). Statistical Nonsignificance in Empirical Economics. American Economic Review: Insights, 2(2), 193–208. https://doi.org/10.1257/aeri.20190252
  • Cumming, G. (2014). The New Statistics: Why and How. Psychological Science, 25(1), 7–29. https://doi.org/10.1177/0956797613504966
  • Etz, A., Gronau, Q. F., Dablander, F., Edelsbrunner, P. A., & Baribault, B. (2017). How to become a Bayesian in eight easy steps: An annotated reading list. Psychonomic Bulletin & Review, 25(1), 219–234. https://doi.org/10.3758/s13423-017-1317-5
  • Gelman, A., & Higgs, M. (2025). Interrogating the “cargo cult science” metaphor. Theory and Society, 54(2), 197–207. https://doi.org/10.1007/s11186-025-09614-6
  • Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. European Journal of Epidemiology, 31(4), 337–350. https://doi.org/10.1007/s10654-016-0149-3
  • Harmon-Jones, E., Harmon-Jones, C., Amodio, D. M., Gable, P. A., & Schmeichel, B. J. (2025). Valid replications require valid methods: Recommendations for best methodological practices with lab experiments. Motivation Science, 11(3), 235–245. https://doi.org/10.1037/mot0000398
  • Kass, R. E., & Raftery, A. E. (1995). Bayes Factors. Journal of the American Statistical Association, 90(430), 773–795. https://doi.org/10.2307/2291091
  • Keysers, C., Gazzola, V., & Wagenmakers, E.-J. (2020). Using Bayes factor hypothesis testing in neuroscience to establish evidence of absence. Nature Neuroscience, 23(7), 788–799. https://doi.org/10.1038/s41593-020-0660-4
  • Lengersdorff, L. L., & Lamm, C. (2025). With Low Power Comes Low Credibility? Toward a Principled Critique of Results From Underpowered Tests. Advances in Methods and Practices in Psychological Science, 8(1). https://doi.org/10.1177/25152459241296397
  • Nuzzo, R. (2014). Scientific method: Statistical errors. Nature, 506(7487), 150–152. https://doi.org/10.1038/506150a
  • Wagenmakers, E.-J., Dutilh, G., & Sarafoglou, A. (2018). The Creativity-Verification Cycle in Psychological Science: New Methods to Combat Old Idols. Perspectives on Psychological Science, 13(4), 418–427. https://doi.org/10.1177/1745691618771357
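
To make the contrast between approaches concrete, the sketch below computes a frequentist p-value alongside an approximate Bayes factor obtained via the BIC approximation, a rough device; dedicated tools covered in the Etz et al. (2017) reading list are preferable in practice. The simulated data and true effect are illustrative assumptions.

```python
# p-value vs. approximate Bayes factor for a two-group mean difference.
# Uses the BIC approximation BF01 ~= exp((BIC1 - BIC0) / 2); all data are
# simulated under assumed (illustrative) parameter values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
a = rng.normal(0.0, 1.0, 60)   # assumed control group
b = rng.normal(0.3, 1.0, 60)   # assumed treatment group, true d = 0.3

# Frequentist test
t, p = stats.ttest_ind(b, a)

# BIC for H0 (common mean) and H1 (separate means), Gaussian likelihood
x = np.concatenate([a, b])
n = x.size
rss0 = np.sum((x - x.mean()) ** 2)
rss1 = np.sum((a - a.mean()) ** 2) + np.sum((b - b.mean()) ** 2)
bic0 = n * np.log(rss0 / n) + 2 * np.log(n)   # parameters: mean, variance
bic1 = n * np.log(rss1 / n) + 3 * np.log(n)   # parameters: two means, variance
bf01 = np.exp((bic1 - bic0) / 2)              # evidence for H0 over H1

print(f"p = {p:.3f}, approximate BF01 = {bf01:.2f}")
```

A BF01 above 1 quantifies evidence for the null, which is how Bayes factors can support claims of "evidence of absence" (Keysers et al., 2020), something a nonsignificant p-value alone cannot do.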

Philosophy of science

Approaches to assessing the reliability of scientific theories, reasoning, and methods, and to understanding science's ability to make predictions about the natural and social world. Introduces how differing philosophies (positivist, post-positivist, constructivist, etc.) shape what scientists consider valid evidence, and how open science challenges some traditional norms.

Questionable measurement practices (QMPs), validity & reliability issues.

The quality of our measures affects the validity of our results and offers another avenue for addressing potential questionable practices. This sub-cluster examines how measurement choices shape the credibility of findings, covering Questionable Measurement Practices (QMPs) such as ad hoc scale trimming, unvalidated instruments, poor reliability reporting, and ignored measurement invariance, and their impact on construct validity, reliability, and generalizability (a reliability-estimation sketch follows the reading list below).

  • Flake, J. K., & Fried, E. I. (2020). Measurement Schmeasurement: Questionable Measurement Practices and How to Avoid Them. Advances in Methods and Practices in Psychological Science, 3(4), 456–465. https://doi.org/10.1177/2515245920952393
  • Flake, J. K., Pek, J., & Hehman, E. (2017). Construct Validation in Social and Personality Research. Social Psychological and Personality Science, 8(4), 370–378. https://doi.org/10.1177/1948550617693063
  • Heyman, T., Pronizius, E., Lewis, S. C., Acar, O. A., Adamkovič, M., Ambrosini, E., Antfolk, J., Barzykowski, K., Baskin, E., Batres, C., Boucher, L., Boudesseul, J., Brandstätter, E., Collins, W. M., Filipović Ðurđević, D., Egan, C., Era, V., Ferreira, P., Fini, C., … Buchanan, E. M. (2025). Crowdsourcing multiverse analyses to explore the impact of different data-processing and analysis decisions: A tutorial. Psychological Methods. https://doi.org/10.1037/met0000770
  • Hussey, I., & Hughes, S. (2020). Hidden Invalidity Among 15 Commonly Used Measures in Social and Personality Psychology. Advances in Methods and Practices in Psychological Science, 3(2), 166–184. https://doi.org/10.1177/2515245919882903
  • Hutmacher, F., & Franz, D. J. (2025). Approaching psychology’s current crises by exploring the vagueness of psychological concepts: Recommendations for advancing the discipline. American Psychologist, 80(2), 220–231. https://doi.org/10.1037/amp0001300
  • Parsons, S. (2022). Exploring reliability heterogeneity with multiverse analyses: Data processing decisions unpredictably influence measurement reliability. Meta-Psychology, 6. https://doi.org/10.15626/MP.2020.2577
  • Parsons, S., Kruijt, A.-W., & Fox, E. (2019). Psychological Science Needs a Standard Practice of Reporting the Reliability of Cognitive-Behavioral Measurements. Advances in Methods and Practices in Psychological Science, 2(4), 378–395. https://doi.org/10.1177/2515245919879695
  • Rodebaugh, T. L., Scullin, R. B., Langer, J. K., Dixon, D. J., Huppert, J. D., Bernstein, A., Zvielli, A., & Lenze, E. J. (2016). Unreliability as a threat to understanding psychopathology: The cautionary tale of attentional bias. Journal of Abnormal Psychology, 125(6), 840–851. https://doi.org/10.1037/abn0000184
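
Because several of these readings call for routine reliability reporting (e.g., Parsons, Kruijt, & Fox, 2019), the sketch below computes one common internal-consistency estimate, Cronbach's alpha, from a respondents-by-items matrix. The simulated single-factor data and item count are illustrative assumptions.

```python
# Cronbach's alpha for a respondents x items matrix.
# Data are simulated under illustrative assumptions (a single common factor).
import numpy as np

rng = np.random.default_rng(11)
n_respondents, n_items = 200, 6
latent = rng.normal(size=(n_respondents, 1))                  # common trait
items = 0.7 * latent + rng.normal(scale=0.7, size=(n_respondents, n_items))

def cronbach_alpha(data):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = data.shape[1]
    item_vars = data.var(axis=0, ddof=1)
    total_var = data.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

print(f"Cronbach's alpha = {cronbach_alpha(items):.2f}")
```

Estimating and reporting reliability in each new sample, rather than citing values from the original validation study, is one concrete antidote to the hidden invalidity documented above.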

Research design, sampling methods, & its implications for inferences.

Design and sampling decisions determine the credibility and scope of statistical inference, shaping bias, precision, and generalizability. This sub-cluster covers threats to internal and external validity, selection bias, clustering/design effects, and weighting. It emphasizes adequate power and sample-size planning (e.g., safeguard power), transparent preregistration and pre-analysis planning to constrain researcher degrees of freedom, and rigorous, valid methods as prerequisites for meaningful replication across lab and field work (a safeguard-power sketch follows the reading list below).

  • Gervais, W. M., Jewell, J. A., Najle, M. B., & Ng, B. K. L. (2015). A Powerful Nudge? Presenting Calculable Consequences of Underpowered Research Shifts Incentives Toward Adequately Powered Designs. Social Psychological and Personality Science, 6(7), 847–854. https://doi.org/10.1177/1948550615584199
  • Harmon-Jones, E., Harmon-Jones, C., Amodio, D. M., Gable, P. A., & Schmeichel, B. J. (2025). Valid replications require valid methods: Recommendations for best methodological practices with lab experiments. Motivation Science, 11(3), 235–245. https://doi.org/10.1037/mot0000398
  • Perugini, M., Gallucci, M., & Costantini, G. (2014). Safeguard Power as a Protection Against Imprecise Power Estimates. Perspectives on Psychological Science, 9(3), 319–332. https://doi.org/10.1177/1745691614528519
  • Wicherts, J. M., Veldkamp, C. L. S., Augusteijn, H. E. M., Bakker, M., van Aert, R. C. M., & van Assen, M. A. L. M. (2016). Degrees of Freedom in Planning, Running, Analyzing, and Reporting Psychological Studies: A Checklist to Avoid p-Hacking. Frontiers in Psychology, 7. https://doi.org/10.3389/fpsyg.2016.01832
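
As one concrete tool from this list, the sketch below implements the core idea of safeguard power (Perugini et al., 2014): plan the sample size for the lower confidence bound of a previously observed effect rather than its point estimate. The observed effect, pilot sample sizes, and confidence level are illustrative assumptions; consult the paper for recommended settings.

```python
# Safeguard power (Perugini et al., 2014): plan the sample size for the lower
# confidence bound of a previously observed effect, not its point estimate.
# The observed d, pilot n's, and CI level below are illustrative assumptions.
import numpy as np
from scipy import stats

def d_lower_bound(d, n1, n2, level=0.60):
    """Lower bound of a two-sided CI for Cohen's d (normal approximation)."""
    se = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    z = stats.norm.ppf(1 - (1 - level) / 2)
    return d - z * se

def n_for_power(d, power=0.90, alpha=0.05):
    """Smallest per-group n giving the target power for a two-sample t-test."""
    for n in range(4, 100_000):
        df = 2 * n - 2
        ncp = d * np.sqrt(n / 2)                 # noncentrality parameter
        t_crit = stats.t.ppf(1 - alpha / 2, df)
        achieved = (1 - stats.nct.cdf(t_crit, df, ncp)
                    + stats.nct.cdf(-t_crit, df, ncp))
        if achieved >= power:
            return n
    raise ValueError("target power not reached")

d_obs = 0.45                                     # effect from a small prior study
d_safe = d_lower_bound(d_obs, n1=25, n2=25)
print(f"safeguard d = {d_safe:.2f}")
print(f"n per group (point estimate): {n_for_power(d_obs)}")
print(f"n per group (safeguard):      {n_for_power(d_safe)}")
```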

The logic of null hypothesis testing, p-values, Type I and II errors (and when and why they might happen).

Frequentist statistics are typically the default in quantitative research, but they come with assumptions and implications and are often misinterpreted. This sub-cluster clarifies the logic of NHST, the meaning of p-values, and when and why Type I and Type II errors arise, extending to Type S (sign) and Type M (magnitude) errors. It links these to design and power choices and outlines practical steps for better inference and reporting.
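
Because the paragraph above extends Type I/II errors to Type S and Type M errors, the sketch below simulates an underpowered design to show how conditioning on statistical significance exaggerates effect estimates and can even flip their sign. The true effect, sample size, and number of simulations are illustrative assumptions.

```python
# Type S (wrong sign) and Type M (exaggeration) errors under low power,
# estimated by simulating many studies with a small true effect.
# All parameter values are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_d, n_per_group, alpha, n_sims = 0.15, 20, 0.05, 20_000

sig_estimates = []
for _ in range(n_sims):
    a = rng.normal(0.0, 1.0, n_per_group)
    b = rng.normal(true_d, 1.0, n_per_group)
    _, p = stats.ttest_ind(b, a)
    if p < alpha:
        sig_estimates.append(b.mean() - a.mean())   # observed effect (sd = 1)

sig = np.array(sig_estimates)
print(f"power ~= {sig.size / n_sims:.2f}")
print(f"Type S rate (wrong sign | significant): {np.mean(sig < 0):.2f}")
print(f"Type M exaggeration ratio: {np.mean(np.abs(sig)) / true_d:.1f}x")
```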
