2 Conceptual and Statistical Knowledge

7 sub-clusters · 82 references

A grounding in fundamental statistics and measurement and their implications, encompassing conceptual knowledge and the application, interpretation, and communication of statistical analyses. Seven sub-clusters further parse the learning and teaching process:

Effect sizes, statistical power, simulations, & confidence intervals.

Statistics are more than p-values; other benchmarks are needed to judge the statistical and practical relevance of an effect. This sub-cluster emphasizes effect sizes, confidence intervals, power, and simulation for designing adequately powered studies and communicating practical significance (a minimal simulation sketch follows the reading list below).

  • Abt, G., Boreham, C., Davison, G., Jackson, R., Jobson, S., Wallace, E., & Williams, M. (2025). Sample size estimation revisited. Journal of Sports Sciences, 43(21), 2511–2516. https://doi.org/10.1080/02640414.2025.2499403
  • Abt, G., Boreham, C., Davison, G., Jackson, R., Nevill, A., Wallace, E., & Williams, M. (2020). Power, precision, and sample size estimation in sport and exercise science research. Journal of Sports Sciences, 38(17), 1933–1935. https://doi.org/10.1080/02640414.2020.1776002
  • Arel-Bundock, V., Briggs, R. C., Doucouliagos, H., Aviña, M. M., & Stanley, T. D. (2026). Quantitative Political Science Research Is Greatly Underpowered. The Journal of Politics, 88(1), 36–46. https://doi.org/10.1086/734279
  • Bryan, C. J., Tipton, E., & Yeager, D. S. (2021). Behavioural science is unlikely to change the world without a heterogeneity revolution. Nature Human Behaviour, 5(8), 980–989. https://doi.org/10.1038/s41562-021-01143-3
  • Brysbaert, M., & Stevens, M. (2018). Power Analysis and Effect Size in Mixed Effects Models: A Tutorial. Journal of Cognition, 1(1). https://doi.org/10.5334/joc.10
  • Buchanan, E. M., Elsherif, M. M., Geller, J., Aberson, C., Gurkan, N., Ambrosini, E., Heyman, T., Montefinese, M., vanpaemel, wolf, Barzykowski, K., Batres, C., Fellnhofer, K., Huang, G., McFall, J. P., Ribeiro, G., Röer, J. P., Ulloa Fulgeri, J. L., Roettger, T. B., Valentine, K. D., … Lewis, S. C. (2023). Accuracy in Parameter Estimation and Simulation Approaches for Sample Size Planning with Multiple Stimuli. https://doi.org/10.31219/osf.io/e3afx
  • Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafò, M. R. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365–376. https://doi.org/10.1038/nrn3475
  • Caldwell, A. R., Lakens, D., Parlett-Pelleriti, C. M., Prochilo, G., & Aust, F. (2022). Power analysis with Superpower. https://aaroncaldwell.us/SuperpowerBook/
  • DeBruine, L. M., & Barr, D. J. (2021). Understanding Mixed-Effects Models Through Data Simulation. Advances in Methods and Practices in Psychological Science, 4(1). https://doi.org/10.1177/2515245920965119
  • Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. European Journal of Epidemiology, 31(4), 337–350. https://doi.org/10.1007/s10654-016-0149-3
  • Hallgren, K. A. (2013). Conducting Simulation Studies in the R Programming Environment. Tutorials in Quantitative Methods for Psychology, 9(2), 43–60. https://doi.org/10.20982/tqmp.09.2.p043
  • Holzmeister, F., Johannesson, M., Böhm, R., Dreber, A., Huber, J., & Kirchler, M. (2024). Heterogeneity in effect size estimates. Proceedings of the National Academy of Sciences, 121(32). https://doi.org/10.1073/pnas.2403490121
  • Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4. https://doi.org/10.3389/fpsyg.2013.00863
  • Lakens, D. (2022). Sample Size Justification. Collabra: Psychology, 8(1). https://doi.org/10.1525/collabra.33267
  • Lengersdorff, L. L., & Lamm, C. (2025). With Low Power Comes Low Credibility? Toward a Principled Critique of Results From Underpowered Tests. Advances in Methods and Practices in Psychological Science, 8(1). https://doi.org/10.1177/25152459241296397
  • Pek, J., & Flora, D. B. (2018). Reporting effect sizes in original psychological research: A discussion and tutorial. Psychological Methods, 23(2), 208–225. https://doi.org/10.1037/met0000126
  • Perugini, M., Gallucci, M., & Costantini, G. (2014). Safeguard Power as a Protection Against Imprecise Power Estimates. Perspectives on Psychological Science, 9(3), 319–332. https://doi.org/10.1177/1745691614528519
  • Wegener, D. T., Fabrigar, L. R., Pek, J., & Hoisington-Shaw, K. (2021). Evaluating Research in Personality and Social Psychology: Considerations of Statistical Power and Concerns About False Findings. Personality and Social Psychology Bulletin, 48(7), 1105–1117. https://doi.org/10.1177/01461672211030811
  • Wilson, B. M., & Wixted, J. T. (2023). On the importance of modeling the invisible world of underlying effect sizes. Social Psychological Bulletin, 18. https://doi.org/10.32872/spb.9981
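
As a minimal illustration of the simulation-based approach to power planning advocated in several of these readings (e.g., Hallgren, 2013; DeBruine & Barr, 2021), the Python sketch below estimates power for a two-sample comparison by repeatedly simulating data under an assumed true effect. The effect size, sample sizes, and alpha are illustrative assumptions, not recommendations.

```python
# Minimal simulation-based power estimate for a two-sample t-test.
# All parameter values are illustrative assumptions, not recommendations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)

def simulated_power(d=0.4, n_per_group=50, alpha=0.05, n_sims=10_000):
    """Estimate power as the proportion of simulated studies with p < alpha."""
    hits = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, 1.0, n_per_group)    # standardized control group
        treatment = rng.normal(d, 1.0, n_per_group)    # true effect = d (Cohen's d)
        _, p = stats.ttest_ind(treatment, control)
        hits += p < alpha
    return hits / n_sims

for n in (20, 50, 100, 200):
    print(f"n per group = {n:>3}: estimated power = {simulated_power(n_per_group=n):.2f}")
```

The same scaffolding generalizes to mixed-effects designs by swapping in a richer data-generating model for participants and stimuli, as in DeBruine and Barr (2021).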

Exploratory and confirmatory analyses

Confirmatory analyses test a priori hypotheses against a pre-specified analysis plan (ideally preregistered or a Registered Report), with any deviations documented. Exploratory analyses probe patterns, generate hypotheses, and build models after seeing the data.

Limitations and benefits of NHST, Bayesian & Likelihood approaches.

Beyond frequentist statistics, there are other quantitative approaches, each with different assumptions and goals. This sub-cluster summarizes the benefits and limitations of each (a brief Bayes-factor sketch follows the reading list below).

  • Abadie, A. (2020). Statistical Nonsignificance in Empirical Economics. American Economic Review: Insights, 2(2), 193–208. https://doi.org/10.1257/aeri.20190252
  • Cumming, G. (2014). The New Statistics: Why and How. Psychological Science, 25(1), 7–29. https://doi.org/10.1177/0956797613504966
  • Etz, A., Gronau, Q. F., Dablander, F., Edelsbrunner, P. A., & Baribault, B. (2017). How to become a Bayesian in eight easy steps: An annotated reading list. Psychonomic Bulletin & Review, 25(1), 219–234. https://doi.org/10.3758/s13423-017-1317-5
  • Gelman, A., & Higgs, M. (2025). Interrogating the “cargo cult science” metaphor. Theory and Society, 54(2), 197–207. https://doi.org/10.1007/s11186-025-09614-6
  • Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. European Journal of Epidemiology, 31(4), 337–350. https://doi.org/10.1007/s10654-016-0149-3
  • Harmon-Jones, E., Harmon-Jones, C., Amodio, D. M., Gable, P. A., & Schmeichel, B. J. (2025). Valid replications require valid methods: Recommendations for best methodological practices with lab experiments. Motivation Science, 11(3), 235–245. https://doi.org/10.1037/mot0000398
  • Kass, R. E., & Raftery, A. E. (1995). Bayes Factors. Journal of the American Statistical Association, 90(430), 773–795. https://doi.org/10.2307/2291091
  • Keysers, C., Gazzola, V., & Wagenmakers, E.-J. (2020). Using Bayes factor hypothesis testing in neuroscience to establish evidence of absence. Nature Neuroscience, 23(7), 788–799. https://doi.org/10.1038/s41593-020-0660-4
  • Lengersdorff, L. L., & Lamm, C. (2025). With Low Power Comes Low Credibility? Toward a Principled Critique of Results From Underpowered Tests. Advances in Methods and Practices in Psychological Science, 8(1). https://doi.org/10.1177/25152459241296397
  • Nuzzo, R. (2014). Scientific method: Statistical errors. Nature, 506(7487), 150–152. https://doi.org/10.1038/506150a
  • Wagenmakers, E.-J., Dutilh, G., & Sarafoglou, A. (2018). The Creativity-Verification Cycle in Psychological Science: New Methods to Combat Old Idols. Perspectives on Psychological Science, 13(4), 418–427. https://doi.org/10.1177/1745691618771357
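
To make the contrast between approaches concrete, the sketch below computes a frequentist p-value alongside an approximate Bayes factor obtained via the BIC approximation, a rough device; dedicated tools covered in the Etz et al. (2017) reading list are preferable in practice. The simulated data and true effect are illustrative assumptions.

```python
# p-value vs. approximate Bayes factor for a two-group mean difference.
# Uses the BIC approximation BF01 ~= exp((BIC1 - BIC0) / 2); all data are
# simulated under assumed (illustrative) parameter values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
a = rng.normal(0.0, 1.0, 60)   # assumed control group
b = rng.normal(0.3, 1.0, 60)   # assumed treatment group, true d = 0.3

# Frequentist test
t, p = stats.ttest_ind(b, a)

# BIC for H0 (common mean) and H1 (separate means), Gaussian likelihood
x = np.concatenate([a, b])
n = x.size
rss0 = np.sum((x - x.mean()) ** 2)
rss1 = np.sum((a - a.mean()) ** 2) + np.sum((b - b.mean()) ** 2)
bic0 = n * np.log(rss0 / n) + 2 * np.log(n)   # parameters: mean, variance
bic1 = n * np.log(rss1 / n) + 3 * np.log(n)   # parameters: two means, variance
bf01 = np.exp((bic1 - bic0) / 2)              # evidence for H0 over H1

print(f"p = {p:.3f}, approximate BF01 = {bf01:.2f}")
```

A BF01 above 1 quantifies evidence for the null, which is how Bayes factors can support claims of "evidence of absence" (Keysers et al., 2020), something a nonsignificant p-value alone cannot do.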

Philosophy of science

Approaches to assessing the reliability of scientific theories, reasoning, and methods, and to understanding science's ability to make predictions about the natural and social world. Introduces how differing philosophies (positivist, post-positivist, constructivist, etc.) shape what scientists consider valid evidence, and how open science challenges some traditional norms.

Questionable measurement practices (QMPs), validity & reliability issues.

The quality of our measures affects the validity of our results and offers another avenue for addressing potential questionable practices. This sub-cluster examines how measurement choices shape the credibility of findings, covering Questionable Measurement Practices (QMPs) such as ad hoc scale trimming, unvalidated instruments, poor reliability reporting, and ignored measurement invariance, and their impact on construct validity, reliability, and generalizability (a reliability-estimation sketch follows the reading list below).

  • Flake, J. K., & Fried, E. I. (2020). Measurement Schmeasurement: Questionable Measurement Practices and How to Avoid Them. Advances in Methods and Practices in Psychological Science, 3(4), 456–465. https://doi.org/10.1177/2515245920952393
  • Flake, J. K., Pek, J., & Hehman, E. (2017). Construct Validation in Social and Personality Research. Social Psychological and Personality Science, 8(4), 370–378. https://doi.org/10.1177/1948550617693063
  • Heyman, T., Pronizius, E., Lewis, S. C., Acar, O. A., Adamkovič, M., Ambrosini, E., Antfolk, J., Barzykowski, K., Baskin, E., Batres, C., Boucher, L., Boudesseul, J., Brandstätter, E., Collins, W. M., Filipović Ðurđević, D., Egan, C., Era, V., Ferreira, P., Fini, C., … Buchanan, E. M. (2025). Crowdsourcing multiverse analyses to explore the impact of different data-processing and analysis decisions: A tutorial. Psychological Methods. https://doi.org/10.1037/met0000770
  • Hussey, I., & Hughes, S. (2020). Hidden Invalidity Among 15 Commonly Used Measures in Social and Personality Psychology. Advances in Methods and Practices in Psychological Science, 3(2), 166–184. https://doi.org/10.1177/2515245919882903
  • Hutmacher, F., & Franz, D. J. (2025). Approaching psychology’s current crises by exploring the vagueness of psychological concepts: Recommendations for advancing the discipline. American Psychologist, 80(2), 220–231. https://doi.org/10.1037/amp0001300
  • Parsons, S. (2022). Exploring reliability heterogeneity with multiverse analyses: Data processing decisions unpredictably influence measurement reliability. Meta-Psychology, 6. https://doi.org/10.15626/MP.2020.2577
  • Parsons, S., Kruijt, A.-W., & Fox, E. (2019). Psychological Science Needs a Standard Practice of Reporting the Reliability of Cognitive-Behavioral Measurements. Advances in Methods and Practices in Psychological Science, 2(4), 378–395. https://doi.org/10.1177/2515245919879695
  • Rodebaugh, T. L., Scullin, R. B., Langer, J. K., Dixon, D. J., Huppert, J. D., Bernstein, A., Zvielli, A., & Lenze, E. J. (2016). Unreliability as a threat to understanding psychopathology: The cautionary tale of attentional bias. Journal of Abnormal Psychology, 125(6), 840–851. https://doi.org/10.1037/abn0000184
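
Because several of these readings call for routine reliability reporting (e.g., Parsons, Kruijt, & Fox, 2019), the sketch below computes one common internal-consistency estimate, Cronbach's alpha, from a respondents-by-items matrix. The simulated single-factor data and item count are illustrative assumptions.

```python
# Cronbach's alpha for a respondents x items matrix.
# Data are simulated under illustrative assumptions (a single common factor).
import numpy as np

rng = np.random.default_rng(11)
n_respondents, n_items = 200, 6
latent = rng.normal(size=(n_respondents, 1))                  # common trait
items = 0.7 * latent + rng.normal(scale=0.7, size=(n_respondents, n_items))

def cronbach_alpha(data):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = data.shape[1]
    item_vars = data.var(axis=0, ddof=1)
    total_var = data.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

print(f"Cronbach's alpha = {cronbach_alpha(items):.2f}")
```

Estimating and reporting reliability in each new sample, rather than citing values from the original validation study, is one concrete antidote to the hidden invalidity documented above.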

Research design, sampling methods, & its implications for inferences.

Design and sampling decisions determine the credibility and scope of statistical inference, shaping bias, precision, and generalizability. This sub-cluster covers threats to internal and external validity, selection bias, clustering/design effects, and weighting. It emphasizes adequate power and sample-size planning (e.g., safeguard power), transparent preregistration and pre-analysis planning to constrain researcher degrees of freedom, and rigorous, valid methods as prerequisites for meaningful replication across lab and field work (a safeguard-power sketch follows the reading list below).

  • Gervais, W. M., Jewell, J. A., Najle, M. B., & Ng, B. K. L. (2015). A Powerful Nudge? Presenting Calculable Consequences of Underpowered Research Shifts Incentives Toward Adequately Powered Designs. Social Psychological and Personality Science, 6(7), 847–854. https://doi.org/10.1177/1948550615584199
  • Harmon-Jones, E., Harmon-Jones, C., Amodio, D. M., Gable, P. A., & Schmeichel, B. J. (2025). Valid replications require valid methods: Recommendations for best methodological practices with lab experiments. Motivation Science, 11(3), 235–245. https://doi.org/10.1037/mot0000398
  • Perugini, M., Gallucci, M., & Costantini, G. (2014). Safeguard Power as a Protection Against Imprecise Power Estimates. Perspectives on Psychological Science, 9(3), 319–332. https://doi.org/10.1177/1745691614528519
  • Wicherts, J. M., Veldkamp, C. L. S., Augusteijn, H. E. M., Bakker, M., van Aert, R. C. M., & van Assen, M. A. L. M. (2016). Degrees of Freedom in Planning, Running, Analyzing, and Reporting Psychological Studies: A Checklist to Avoid p-Hacking. Frontiers in Psychology, 7. https://doi.org/10.3389/fpsyg.2016.01832
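
As one concrete tool from this list, the sketch below implements the core idea of safeguard power (Perugini et al., 2014): plan the sample size for the lower confidence bound of a previously observed effect rather than its point estimate. The observed effect, pilot sample sizes, and confidence level are illustrative assumptions; consult the paper for recommended settings.

```python
# Safeguard power (Perugini et al., 2014): plan the sample size for the lower
# confidence bound of a previously observed effect, not its point estimate.
# The observed d, pilot n's, and CI level below are illustrative assumptions.
import numpy as np
from scipy import stats

def d_lower_bound(d, n1, n2, level=0.60):
    """Lower bound of a two-sided CI for Cohen's d (normal approximation)."""
    se = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    z = stats.norm.ppf(1 - (1 - level) / 2)
    return d - z * se

def n_for_power(d, power=0.90, alpha=0.05):
    """Smallest per-group n giving the target power for a two-sample t-test."""
    for n in range(4, 100_000):
        df = 2 * n - 2
        ncp = d * np.sqrt(n / 2)                 # noncentrality parameter
        t_crit = stats.t.ppf(1 - alpha / 2, df)
        achieved = (1 - stats.nct.cdf(t_crit, df, ncp)
                    + stats.nct.cdf(-t_crit, df, ncp))
        if achieved >= power:
            return n
    raise ValueError("target power not reached")

d_obs = 0.45                                     # effect from a small prior study
d_safe = d_lower_bound(d_obs, n1=25, n2=25)
print(f"safeguard d = {d_safe:.2f}")
print(f"n per group (point estimate): {n_for_power(d_obs)}")
print(f"n per group (safeguard):      {n_for_power(d_safe)}")
```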

The logic of null hypothesis testing, p-values, Type I and II errors (and when and why they might happen).

Frequentist statistics are typically the default in quantitative research, but they come with assumptions and implications and are often misinterpreted. This sub-cluster clarifies the logic of NHST, the meaning of p-values, and when and why Type I and Type II errors arise, extending to Type S (sign) and Type M (magnitude) errors. It links these to design and power choices and outlines practical steps for better inference and reporting.
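
Because the paragraph above extends Type I/II errors to Type S and Type M errors, the sketch below simulates an underpowered design to show how conditioning on statistical significance exaggerates effect estimates and can even flip their sign. The true effect, sample size, and number of simulations are illustrative assumptions.

```python
# Type S (wrong sign) and Type M (exaggeration) errors under low power,
# estimated by simulating many studies with a small true effect.
# All parameter values are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_d, n_per_group, alpha, n_sims = 0.15, 20, 0.05, 20_000

sig_estimates = []
for _ in range(n_sims):
    a = rng.normal(0.0, 1.0, n_per_group)
    b = rng.normal(true_d, 1.0, n_per_group)
    _, p = stats.ttest_ind(b, a)
    if p < alpha:
        sig_estimates.append(b.mean() - a.mean())   # observed effect (sd = 1)

sig = np.array(sig_estimates)
print(f"power ~= {sig.size / n_sims:.2f}")
print(f"Type S rate (wrong sign | significant): {np.mean(sig < 0):.2f}")
print(f"Type M exaggeration ratio: {np.mean(np.abs(sig)) / true_d:.1f}x")
```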
