5 Planning and Conducting Reproductions and Replications
Planning depends on whether the focus is on a certain method or a theory, that is, whether the replication will be close or conceptual. Table 2 provides an overview of reproduction and replication types, or more generally “repetitive research” (Schöch 2023), drawn from several sources (e.g., Dreber and Johannesson 2024; Hüffmeier, Mazei, and Schultze 2016; for an alternative taxonomy see Cortina, Köhler, and Aulisi 2023). The decision between these types is the first step in planning.
In addition, the formation of the replication team is important, as replications can require substantial resources. Notably, repetitive research has been conducted successfully in collaboration with graduate and undergraduate students (e.g., Boyce et al. 2024; Hawkins et al. 2018; Jekel et al. 2020; Moreau and Wiebels 2023), and we recommend using replication studies to engage students at different levels in conducting and publishing research.
Table 2
Types of repetitive research, ordered by reproduction and replication and by closeness to the original study.
Type | Description | Goals |
---|---|---|
Computational reproduction | Reanalysis of the same data with the same code | Check the correctness of the original report |
Recoding reproduction | Reanalysis of the same data with new (equivalent) code | Check the correctness of the original report |
Robustness reproduction | Reanalysis of the same data with new coding choices; can vary in closeness | Assess the robustness of the original finding and its sensitivity to different analytical decisions or software |
Multiverse analysis | Analyze the data in all sensible ways (i.e., a large number of different robustness reproductions; see the sketch below the table) | Assess the robustness and generalizability of the original finding; identify potential moderators or sources of effect variability |
Internal replication | Replicate one of your own studies as closely as possible | Demonstrate the generalizability of one’s findings across studies and rule out false positives (e.g., for new discoveries) |
Close / direct / exact replication | Conduct a new study (based on work by other researchers) that is as close as possible to the original study | Rule out that the original finding is a false positive; validate the original materials or design; check generalizability/external validity across theoretically irrelevant variables (e.g., population, year of data collection) |
Close replication with extension | Add a variable or procedure to a close replication | Rule out that the original finding is a false positive; test the generalizability of the original finding |
Conceptual / constructive replication | Conduct a study with changes that may be theoretically relevant but that tests the same hypothesis (e.g., a different operationalization) | Assess the generalizability of the original finding and the validity of the theory |
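To make the multiverse idea concrete, here is a minimal Python sketch using simulated data and two hypothetical processing choices (minimum age and outlier cut-off); a real multiverse analysis would enumerate whichever decisions are actually defensible for the study at hand.

```python
# A minimal multiverse sketch with simulated data and two hypothetical
# processing choices; every combination of decisions is analyzed once.
from itertools import product

import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(2024)  # fixed seed so the sketch reproduces
df = pd.DataFrame({
    "age": rng.integers(18, 65, 500),
    "score": rng.normal(100, 15, 500),
    "group": rng.integers(0, 2, 500),
})

# Hypothetical analytic decisions; a real multiverse enumerates the
# choices that are actually defensible for the study at hand.
choices = {"min_age": [18, 21], "outlier_sd": [2.0, 3.0]}

results = []
for min_age, outlier_sd in product(*choices.values()):
    sub = df[df["age"] >= min_age]
    keep = np.abs(stats.zscore(sub["score"])) <= outlier_sd  # outlier rule
    sub = sub[keep]
    t, p = stats.ttest_ind(sub.loc[sub["group"] == 1, "score"],
                           sub.loc[sub["group"] == 0, "score"])
    results.append({"min_age": min_age, "outlier_sd": outlier_sd,
                    "t": round(t, 3), "p": round(p, 4), "n": len(sub)})

print(pd.DataFrame(results))  # one row per "universe"
```

Each row of the output is one “universe”; if the effect’s sign and significance hold across rows, the finding is robust to these particular choices.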
5.1 Post-Publication Conversations
When planning the replication study, additional knowledge, such as any discussion of the original finding, should be taken into account. Other studies may cite the original study, criticize it, disconfirm its underlying theory, identify errors, reinterpret the finding, or suggest replications. All of these might highlight considerations for designing a replication study that robustly tests the original claim or its generalizability.
Thus, replication researchers should look for post-publication discussions of the target study, such as published comments and reviews, blog posts, or discussions on social media. These can often be found via Altmetric (https://www.altmetric.com) and other tools that let researchers quickly identify discussions on social media or in news outlets beyond scientific journals (e.g., PubPeer, Hypothes.is), or via the in-development platform Alphaxiv.org; for a review, see Henriques et al. (2023).
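As an illustration, discussions indexed by Altmetric can be retrieved programmatically. The following Python sketch assumes Altmetric’s public v1 DOI endpoint and two of its documented response fields; check Altmetric’s current documentation, terms, and rate limits before relying on it.

```python
# Sketch: retrieve Altmetric attention data for a DOI. This assumes
# Altmetric's public v1 endpoint (https://api.altmetric.com/v1/doi/<doi>),
# which returns HTTP 404 when no mentions are tracked; field names are
# taken from Altmetric's public documentation and may change.
import requests

def altmetric_mentions(doi: str) -> dict | None:
    resp = requests.get(f"https://api.altmetric.com/v1/doi/{doi}", timeout=10)
    if resp.status_code == 404:
        return None  # Altmetric tracks no mentions for this DOI
    resp.raise_for_status()
    return resp.json()

data = altmetric_mentions("10.1038/nature.2012.9872")  # illustrative DOI
if data:
    # counts of posts and of distinct accounts mentioning the work
    print(data.get("cited_by_posts_count"), data.get("cited_by_tweeters_count"))
```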
5.2 Reproduction before Replication
Many features of a replication study rest on the correctness of the original report. A reproduction allows researchers to investigate this, as it can uncover coding errors or even fraud and probe a finding’s robustness to analytical decisions and its generalizability. To make efficient use of resources, we encourage researchers to investigate the original finding’s reproducibility and robustness first. In other words, reproductions should ideally take place before planning and conducting a replication study. Depending on the availability of the code and data, a reproduction can take anywhere from several minutes to weeks.
If the original code and dataset are available, researchers can try to numerically reproduce the results. Beware, however, that differences in software versions or default settings may lead to slight deviations or require corrections in some cases (for a large-scale test of reproducibility, see Brodeur et al. 2024). Similarly, if no seed was set for the random number generator, analyses relying on random numbers (e.g., bootstrapping) cannot be exactly reproduced. If no analysis script is available, the analyses need to be recreated from the descriptions in the report (recoding reproduction). In this case, special attention should be paid to processing steps such as the exclusion of outliers, the transformation of variables, and the handling of missing data. However, in many research areas information on these steps is incomplete (Field et al. 2019), and older research tends to be especially limited in the methodological detail it provides.

In addition, we recommend testing the robustness of the original finding by making small alterations to the data processing and analysis procedure (robustness reproductions). For example, if the analyses were run on a subset of the data (e.g., participants aged 21 to 30, or excluding outliers beyond ±3 standard deviations), this subset can be changed (e.g., participants aged 18 to 30, or excluding outliers beyond ±2 standard deviations). The initial focus should be on choices that are not determined by the theory being tested, though such alterations can also be used to explore the generalizability of some aspects of the theory. Finally, if the original study was preregistered and the original code is available, reproduction researchers can check whether the original analyses adhere to the preregistered analysis plan.
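As a minimal illustration of these points, the following Python sketch (all numbers are made up) fixes a random seed so that a bootstrap can be rerun exactly, and counts an estimate as reproduced when it matches the reported value within rounding tolerance rather than expecting bit-identical output.

```python
# A minimal numerical-reproduction sketch; all numbers are made up.
import math

import numpy as np

rng = np.random.default_rng(42)  # if the original code set no seed,
                                 # bootstrap results cannot be reproduced exactly
sample = rng.normal(0.35, 1.0, 200)  # stand-in for the original dataset
boot_means = [rng.choice(sample, size=sample.size, replace=True).mean()
              for _ in range(5000)]
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])

def matches(reproduced: float, reported: float, tol: float = 0.005) -> bool:
    """Count a value as reproduced if it lies within tol of the report
    (tol = half the last reported digit, i.e., rounding to two decimals)."""
    return math.isclose(reproduced, reported, abs_tol=tol)

# 0.36 stands in for the estimate printed in the original report.
print(matches(float(sample.mean()), 0.36), (float(ci_low), float(ci_high)))
```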
If neither code nor data are available (or shared by the authors), no reproduction is possible. Researchers can still use automated tools to compare reported p-values with those that can be recomputed from the reported test statistics, via the website statcheck.io (where documents can be uploaded), the corresponding R package (Nuijten and Polanin 2020), or the actively maintained papercheck (DeBruine and Lakens 2025).
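The core of the check that such tools automate can be sketched in a few lines of Python; the reported values below are hypothetical, and the actual tools additionally extract statistics from documents and apply more refined consistency criteria.

```python
# A sketch of the core check that statcheck-style tools automate:
# recompute the p-value implied by a reported test statistic and its
# degrees of freedom, then compare it with the reported p-value.
# The reported values below are hypothetical.
from scipy import stats

def recomputed_p_two_sided(t: float, df: int) -> float:
    """Two-sided p-value implied by a reported t statistic."""
    return 2 * stats.t.sf(abs(t), df)

reported_t, reported_df, reported_p = 2.20, 100, 0.030  # as "reported"
p = recomputed_p_two_sided(reported_t, reported_df)
# Flag a potential inconsistency if the recomputed value does not round
# to the reported one (three decimals here).
consistent = abs(p - reported_p) <= 0.0005
print(f"recomputed p = {p:.4f} ->", "consistent" if consistent else "inconsistent")
```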
Figure 3
5.3 Close Replication before Conceptual Replication
If the goal is to increase the generalizability of a specific finding, we also suggest starting with replications that adhere as closely as possible to the original study (e.g., close replications) and only later conducting conceptual replications. Based on Hüffmeier, Mazei, and Schultze (2016), we propose the typology and order of replication attempts in Figure 3. Importantly, replications at any stage should not compromise any aspects of the original study, but rather (at the latest from the third study stage [constructive replications] onwards) try to improve one or more of its aspects, such as “[…] more valid measures, more critical control variables, a more realistic task, a more representative sample, or a design that allows for stronger conclusions regarding causality” (Köhler and Cortina 2021, 494). Köhler and Cortina term such replications “constructive replications” and caution against “quasi-random” replications that vary features without a clear rationale.
Finally, there may be cases where this sequence of replications is not necessary, or where the context of the replication team requires a focus on generalizability to a specific context (see section The Role of Differences for the Interpretation of Findings).
Figure 4
Note: This is an adaptation and update of the typology of replication studies by Hüffmeier, Mazei, and Schultze (2016). The typology is conceptualized as a hierarchy of studies that together help to (i) establish the validity and replicability of new effects, (ii) exclude alternative explanations, (iii) test relevant boundary conditions, and (iv) test generalizability.