5 Planning and Conducting Reproductions and Replications
Planning depends on whether the focus is on a certain method or a theory, that is, whether the replication will be close or conceptual. Table 2 provides an overview of reproduction and replication types, or more generally “repetitive research” (Schöch 2023), drawn from several sources (e.g., Dreber and Johannesson 2024; Hüffmeier, Mazei, and Schultze 2016; for an alternative taxonomy see Cortina, Köhler, and Aulisi 2023). The decision between these types is the first step in planning.
In addition, the formation of the replication team is important, as replications can require substantial resources. Notably, repetitive research has been conducted successfully in collaboration with graduate and undergraduate students (e.g., Boyce et al. 2024; Hawkins et al. 2018; Jekel et al. 2020; Moreau and Wiebels 2023), and we recommend using replication studies to engage students at different levels in conducting and publishing research.
Table 2
Types of repetitive research, ordered by reproduction and replication and by closeness to the original study.
Type | Description | Goals |
---|---|---|
Computational reproduction | Reanalysis of the same data with the same code | Check the correctness of the original report |
Recoding reproduction | Reanalysis of the same data with new (equivalent) code | Check the correctness of the original report |
Robustness reproduction | Reanalysis of the same data with new coding choices; can vary in closeness | Assess the robustness of the original finding and its sensitivity to different analytical decisions or software |
Multiverse analysis | Analyze the data in all sensible ways (i.e., a large number of different robustness reproductions; see the sketch below the table) | Assess the robustness and generalizability of the original finding; identify potential moderators or sources of effect variability |
Internal replication | Replicate one of your own studies as closely as possible | Demonstrate the generalizability of one’s findings across studies and rule out false positives (e.g., for new discoveries) |
Close / direct / exact replication | Conduct a new study (based on work by other researchers) that is as close as possible to the original study | Rule out that the original finding is a false positive; validate the original materials or design; check generalizability/external validity across theoretically irrelevant variables (e.g., population, year of data collection) |
Close replication with extension | Add a variable or procedure to a close replication | Rule out that the original finding is a false positive; test the generalizability of the original finding |
Conceptual / constructive replication | Conduct a study with changes that may be theoretically relevant but that tests the same hypothesis (e.g., a different operationalization) | Assess the generalizability of the original finding and the validity of the theory |
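To make the multiverse idea concrete, here is a minimal Python sketch using simulated data and two hypothetical processing choices (minimum age and outlier cut-off); a real multiverse analysis would enumerate whichever decisions are actually defensible for the study at hand.

```python
# A minimal multiverse sketch with simulated data and two hypothetical
# processing choices; every combination of decisions is analyzed once.
from itertools import product

import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(2024)  # fixed seed so the sketch reproduces
df = pd.DataFrame({
    "age": rng.integers(18, 65, 500),
    "score": rng.normal(100, 15, 500),
    "group": rng.integers(0, 2, 500),
})

# Hypothetical analytic decisions; a real multiverse enumerates the
# choices that are actually defensible for the study at hand.
choices = {"min_age": [18, 21], "outlier_sd": [2.0, 3.0]}

results = []
for min_age, outlier_sd in product(*choices.values()):
    sub = df[df["age"] >= min_age]
    keep = np.abs(stats.zscore(sub["score"])) <= outlier_sd  # outlier rule
    sub = sub[keep]
    t, p = stats.ttest_ind(sub.loc[sub["group"] == 1, "score"],
                           sub.loc[sub["group"] == 0, "score"])
    results.append({"min_age": min_age, "outlier_sd": outlier_sd,
                    "t": round(t, 3), "p": round(p, 4), "n": len(sub)})

print(pd.DataFrame(results))  # one row per "universe"
```

Each row of the output is one “universe”; if the effect’s sign and significance hold across rows, the finding is robust to these particular choices.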
5.1 Post-Publication Conversations
When planning the replication study, additional knowledge, such as any discussion of the original finding, should be taken into account. Other studies may cite the original study, criticize it, disconfirm its underlying theory, identify errors, reinterpret the finding, or suggest replications. All of these might highlight considerations for designing a replication study that robustly tests the original claim or its generalizability.
Thus, replication researchers should look for post-publication discussions of the target study, such as published comments and reviews, blog posts, or discussions on social media. These can often be found via Altmetric (https://www.altmetric.com) and other tools that let researchers quickly identify discussions on social media or in news outlets beyond scientific journals (e.g., PubPeer, Hypothes.is), or via the in-development platform Alphaxiv.org; for a review, see Henriques et al. (2023).
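As an illustration, discussions indexed by Altmetric can be retrieved programmatically. The following Python sketch assumes Altmetric’s public v1 DOI endpoint and two of its documented response fields; check Altmetric’s current documentation, terms, and rate limits before relying on it.

```python
# Sketch: retrieve Altmetric attention data for a DOI. This assumes
# Altmetric's public v1 endpoint (https://api.altmetric.com/v1/doi/<doi>),
# which returns HTTP 404 when no mentions are tracked; field names are
# taken from Altmetric's public documentation and may change.
import requests

def altmetric_mentions(doi: str) -> dict | None:
    resp = requests.get(f"https://api.altmetric.com/v1/doi/{doi}", timeout=10)
    if resp.status_code == 404:
        return None  # Altmetric tracks no mentions for this DOI
    resp.raise_for_status()
    return resp.json()

data = altmetric_mentions("10.1038/nature.2012.9872")  # illustrative DOI
if data:
    # counts of posts and of distinct accounts mentioning the work
    print(data.get("cited_by_posts_count"), data.get("cited_by_tweeters_count"))
```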
5.2 Reproduction before Replication
Many features of a replication study rest on the correctness of the original report. A reproduction allows researchers to investigate this, as it can uncover coding errors or even fraud and probe a finding’s robustness to analytical decisions and its generalizability. To make efficient use of resources, we encourage researchers to investigate the original finding’s reproducibility and robustness first. In other words, reproductions should ideally take place before planning and conducting a replication study. Depending on the availability of the code and data, a reproduction can take anywhere from several minutes to weeks.
If the original code and dataset are available, researchers can try to numerically reproduce the results. Beware, however, that differences in software versions or default settings may lead to slight deviations or require corrections in some cases (for a large-scale test of reproducibility, see Brodeur et al. 2024). Similarly, if no seed was set for the random number generator, analyses relying on random numbers (e.g., bootstrapping) cannot be exactly reproduced. If no analysis script is available, the analyses need to be recreated from the descriptions in the report (recoding reproduction). In this case, special attention should be paid to processing steps such as the exclusion of outliers, the transformation of variables, and the handling of missing data. However, in many research areas information on these steps is incomplete (Field et al. 2019), and older research tends to be especially limited in the methodological detail it provides.

In addition, we recommend testing the robustness of the original finding by making small alterations to the data processing and analysis procedure (robustness reproductions). For example, if the analyses were run on a subset of the data (e.g., participants aged 21 to 30, or excluding outliers beyond ±3 standard deviations), this subset can be changed (e.g., participants aged 18 to 30, or excluding outliers beyond ±2 standard deviations). The initial focus should be on choices that are not determined by the theory being tested, though such alterations can also be used to explore the generalizability of some aspects of the theory. Finally, if the original study was preregistered and the original code is available, reproduction researchers can check whether the original analyses adhere to the preregistered analysis plan.
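As a minimal illustration of these points, the following Python sketch (all numbers are made up) fixes a random seed so that a bootstrap can be rerun exactly, and counts an estimate as reproduced when it matches the reported value within rounding tolerance rather than expecting bit-identical output.

```python
# A minimal numerical-reproduction sketch; all numbers are made up.
import math

import numpy as np

rng = np.random.default_rng(42)  # if the original code set no seed,
                                 # bootstrap results cannot be reproduced exactly
sample = rng.normal(0.35, 1.0, 200)  # stand-in for the original dataset
boot_means = [rng.choice(sample, size=sample.size, replace=True).mean()
              for _ in range(5000)]
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])

def matches(reproduced: float, reported: float, tol: float = 0.005) -> bool:
    """Count a value as reproduced if it lies within tol of the report
    (tol = half the last reported digit, i.e., rounding to two decimals)."""
    return math.isclose(reproduced, reported, abs_tol=tol)

# 0.36 stands in for the estimate printed in the original report.
print(matches(float(sample.mean()), 0.36), (float(ci_low), float(ci_high)))
```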
If neither code nor data are available (or shared by the authors), no reproduction is possible. Researchers can still use automated tools to compare reported p-values with those that can be recomputed from the reported test statistics, via the website statcheck.io (where documents can be uploaded), the corresponding R package (Nuijten and Polanin 2020), or the actively maintained papercheck (DeBruine and Lakens 2025).
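The core of the check that such tools automate can be sketched in a few lines of Python; the reported values below are hypothetical, and the actual tools additionally extract statistics from documents and apply more refined consistency criteria.

```python
# A sketch of the core check that statcheck-style tools automate:
# recompute the p-value implied by a reported test statistic and its
# degrees of freedom, then compare it with the reported p-value.
# The reported values below are hypothetical.
from scipy import stats

def recomputed_p_two_sided(t: float, df: int) -> float:
    """Two-sided p-value implied by a reported t statistic."""
    return 2 * stats.t.sf(abs(t), df)

reported_t, reported_df, reported_p = 2.20, 100, 0.030  # as "reported"
p = recomputed_p_two_sided(reported_t, reported_df)
# Flag a potential inconsistency if the recomputed value does not round
# to the reported one (three decimals here).
consistent = abs(p - reported_p) <= 0.0005
print(f"recomputed p = {p:.4f} ->", "consistent" if consistent else "inconsistent")
```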
Figure 3
5.3 Close Replication before Conceptual Replication
If the goal is to increase the generalizability of a specific finding, we also suggest starting with replications that adhere as closely as possible to the original study (e.g., close replications) and only later conducting conceptual replications. Based on Hüffmeier, Mazei, and Schultze (2016), we propose the typology and order of replication attempts in Figure 3. Importantly, replications at any stage should not compromise any aspects of the original study, but rather (at the latest from the third study stage [constructive replications] onwards) try to improve one or more of its aspects, such as “[…] more valid measures, more critical control variables, a more realistic task, a more representative sample, or a design that allows for stronger conclusions regarding causality” (Köhler and Cortina 2021, 494). Köhler and Cortina term such replications “constructive replications” and caution against “quasi-random” replications that vary features without a clear rationale.
Finally, there may be cases where this sequence of replications is not necessary, or where the context of the replication team requires a focus on generalizability to a specific context (see section The Role of Differences for the Interpretation of Findings).
Figure 4
Note: This is an adaptation and update of the typology of replication studies by Hüffmeier, Mazei, and Schultze (2016). The typology is conceptualized as a hierarchy of studies that together help to (i) establish the validity and replicability of new effects, (ii) exclude alternative explanations, (iii) test relevant boundary conditions, and (iv) test generalizability.