Framework for Open and Reproducible Research Training


Replications & Reversals


Summary


Replications of previous scientific work are at the core of the Open Scholarship movement. However, as replication efforts become more widespread, it can be challenging for scholars and educators to keep up to date with which effects in their field replicate and which do not. FORRT’s Replications & Reversals project aims to collate replications, and specifically so-called reversal effects, in social science. Reversals are, in the context of a replication, effects whose original direction is flipped. The extent of such reversals and non-replicated effects is already apparent in the social science literature, with even replicated effects being only half the size of the originally reported effect (Ioannidis, 2005; Open Science Collaboration, 2015). Although such failures to replicate are far less costly to society than, for example, medical ones (Prasad & Cifu, 2011), they broadly hinder science’s goal of accumulating knowledge and contribute to the waste of scarce resources. This resource aims to be a “living”, freely available, crowd-sourced, and community-driven collection of effects that have either not been replicated or even been reversed through empirical research across the social sciences. Scholars from varied backgrounds and areas of social science are invited to contribute prevalent effects from their respective fields.


Motivation

The purpose of collating these reversal effects in social science is to encourage educators to incorporate replications of these effects into their students' projects (e.g., third-year, thesis, course work). Doing so gives students the opportunity to experience the research process directly, lets educators assess students' ability to perform and report scientific research, and helps evaluate the robustness of the original study, thereby also helping students become good consumers of research. The below crowdsourced and community-curated resource aims to satisfy three of FORRT’s Goals:

  • Support scholars in their efforts to learn and stay up-to-date on best practices regarding open and reproducible research;
  • Facilitate conversations about the ethics and social impact of teaching substantive topics with due regard to scientific openness, epistemic uncertainty and the credibility revolution;
  • Foster social justice through the democratization of scientific educational resources and its pedagogies.

and four parts of FORRT’s Mission:

  • Dismantling hierarchies surrounding research, teaching, and service;
  • Building community among educators and various non-academic communities working to improve scientific communication and literacy across academia and the general public;
  • Building capacity for advocacy; and
  • Advocacy for the creation and maintenance of educational resources.

Current Status

This is a dynamic project that is organized in four stages. Currently, we are in stage 2:

  1. Proof of Concept Phase (adaptation of original project into FORRT, inclusion of effects from social and cognitive psychology, using Gavin Leech’s collection as a basis) → ~150 entries finished in 2021.

  2. Team Science Expansion Phase Across Disciplines (crowd-sourcing new entries and refining existing entries), started at the end of 2021 and planned until the end of 2023. Drafting of first ‘output’ piece. Currently including a total of 485+ effects spanning 22 disciplines.

  3. Review Phase (open review to identify inconsistencies, missing data, and errors), planned for 2024. Publishing of first ‘output’ piece.

  4. Regular Update Phases (dynamically adding new effects), planned for 2025 and beyond.

We have received the Open Science Fund 2023 from the Dutch Research Foundation (NWO)! Check out our project “Tracking and Mainstreaming Replications across the Social, Behavioral and Cognitive Sciences” here!
We have received a SIPS 2023 commendation from the Society for the Improvement of Psychological Science! Check out the awards page for the other commendations!

How to contribute?

Anyone can add new effects or edit existing effects by joining our initiative on Slack and then following the instructions in our reversals g-doc.

All past or current project contributors are listed at the bottom of this page.




All Effects (sorted by discipline)


You can find a list of all effects we are working on here. To check whether an effect already exists in our collection, use Ctrl-F or the search function together with a keyword related to the effect (e.g. “Macbeth” or “Priming”). Please note that not all effects in the spreadsheet are listed below, as this is a work in progress (last site update: 13th of July 2023).




Social Psychology

  • Elderly priming. Hearing about old age makes people walk slower.

    • Status: reversed
    • Original paper: ‘Automaticity of social behaviour’, Bargh 1996; 2 experiments with Study 2a: n = 30, Study 2b: n = 30. [citations = 5938 (GS, October 2021)].
    • Critiques: Doyen 2012 [n=120, citations=757(GS, October 2021)]. Lakens 2017 [meta analysis: citations = 21(GS, October 2021)]. Pashler et al. 2011 [n=66, citations=21(GS, October 2021)].
    • Original effect size: not reported.
    • Replication effect size: Doyen: walking speed: η2 = .01/d = 0.10 [calculated, using this conversion]. Lakens: r = .29/d = .61. Pashler: not reported.

  • Hostility priming (unscrambled sentences). Exposing participants to more hostility-related stimuli caused them subsequently to interpret ambiguous behaviours as more hostile.

    • Status: not replicated
    • Original paper: ‘The role of category accessibility in the interpretation of information about persons: Some determinants and implications’, Srull and Wyer, Jr. 1979; 2 experiments with Study 1: n = 96; Study 2: n = 96. [citations = 2409 (GS, November 2021)].
    • Critiques: McCarthy et al. 2018 [n = 7,373 for Study 1, citations = 40 (GS, November 2021)]. McCarthy et al. 2021 (see Figure) [n = 1,402 for close replication; n = 1,641 for conceptual replication, citations = 2 (GS, November 2021)].
    • Original effect size: 2.99 (1.58%).
    • Replication effect size: All effect sizes are located in McCarthy et al. 2018: Acar: d = 0.16. Aczel: d = 0.12. Birt: d = -0.11. Evans: d = -.22. Ferreira-Santos: d = 0.01. Gonzalez-Iraizoz: d = -.21. Holzmeister: d = .11. Klein Selle and Rozmann: d = -0.51. Koppel: d = -.14. Laine: d = -.27. Loschelder: XX = -.07. McCarthy: d = -.10. Meijer: d = .03. Ozdogru: d = .22. Pennington: d = -.52. Roets: d = -.01. Suchotzki: d = .10. Sutan: d = .49. Vanpaemel: d = .17. Verschuere: d = -.14. Wick: d = .07. Wiggins: d = .01. Average replication effect size: d = -0.08. McCarthy et al. 2021: d = 0.06.

  • Intelligence priming (contemplation) (professor priming). Participants primed with a category associated with intelligence (e.g. “professor”) performed 13% better on a trivia test than participants primed with a category associated with a lack of intelligence (“soccer hooligans”).

    • Status: not replicated
    • Original paper: ‘The relation between perception and behavior, or how to win a game of trivial pursuit’, Dijksterhuis and van Knippenberg 1998; 4 experiments with Study 1: n = 60; Study 2: n = 58; Study 3: n = 95; Study 4: n = 43. [citations = 1124 (GS November 2021)].
    • Critiques: O’Donnell et al. 2018 [n = 4,493 who met the inclusion criteria; n = 6,454 in supplementary materials, citations = 71(GS November 2021)].
    • Original effect size: PD = 13.20%.
    • Replication effect size: All effect sizes are located in O’Donnell et al. 2018: Aczel: PD = -1.35%. Aveyard: PD = -3.99%. Baskin: PD = 4.08%. Bialobrzeska: PD = -.12%. Boot: PD = -4.99%. Braithwaite: PD = 4.01%. Chartier: PD = 3.23%. DiDonato: PD = 3.14%. Finnigan: PD = 2.89%. Karpinski: PD = 1.38%. Keller: PD = .17%. Klein: PD = .88%. Koppel: PD = -.20%. McLatchie: PD = -2.16%. Newell: PD = 1.66%. O’Donnell: PD = 1.58%. Phillipp: PD = 43%. Ropovik: PD = -.48%. Saunders: PD = -1.87%. Schulte-Mecklenbeck: PD = 4.24%. Shanks: PD = .11%. Steele: PD = -.58%. Steffens: PD = -.84%. Susa: PD = -.63%. Tamayo: PD = 1.41%. Meta-analytic estimate: PD = 0.02%.

  • Moral priming (cleanliness). Participants exposed to physical cleanliness were shown to reduce the severity of their moral judgments. Direct, well-powered replications did not find evidence for the phenomenon.

    • Status: not replicated
    • Original paper: ‘With a Clean Conscience: Cleanliness Reduces the Severity of Moral Judgments’, Schnall, Benton, and Harvey 2008; 2 experiments with Study 1: n = 40, Study 2: n = 44. [citations = 645 (GS, November 2021)].
    • Critiques: Johnson et al. 2014, [Study 1: n = 208, Study 2: n = 126. citations=128(GS November 2021)].
    • Original effect size: Study 1: d = -0.60, 95% CI [-1.23, 0.04]; Study 2: d = -0.85, 95% CI [-1.47, -0.22]
    • Replication effect size: Study 1: d = -0.01, 95% CI [-0.28, 0.26]; Study 2: d = 0.01, 95% CI [-0.34, 0.36]

  • Moral priming (contemplation). Participants exposed to a moral-reminder prime would demonstrate reduced cheating.

    • Status: not replicated
    • Original paper: ‘The Dishonesty of Honest People: A Theory of Self-Concept Maintenance’, Mazar et al. 2008; 6 experiments with Study 1: n = 229; Study 2: n = 207; Study 3: n = 450; Study 4: n = 44; Study 5: n = 108; Study 6: n = 326. [citations= 3072 (GS November 2021)].
    • Critiques: Verschuere et al. 2018 [n = 5786 replication of Experiment 1, citations = 65(GS November 2021)].
    • Original effect size: not reported; commandments-cheat versus books-cheat: d = -1.45 [-2.61, -0.29] [obtained from Verschuere et al.’s 2018 meta-analysis, Figure 2], commandments-cheat versus commandments-control: d = -0.35 [-1.26, 0.57] [obtained from Verschuere et al.’s 2018 meta-analysis, Figure 3].
    • Replication effect size: All effect sizes are located in Verschuere et al. 2018: Commandments-cheat versus books-cheat: Aczel: d = -0.26 [-1.22, 0.69]. Birt: d = 0.41 [-0.58, 1.39]. Evans: d = 0.85 [-0.13, 1.83]. Ferreira-Santos: d = -0.19 [-1.14, 0.77]. Gonzalez-Iraizoz: d = 0.26 [-0.77, 1.28]. Holzmeister: d = 1.11 [-0.30, 2.52]. Klein Selle and Rozmann: d = -0.27 [-1.11, 0.58]. Koppel: d = 0.39 [-0.40, 1.17]. Laine: d = -0.37 [-1.18, 0.44]. Loschelder: d = -0.11 [-0.86, 0.65]. McCarthy: d = 0.57 [-0.87, 2.02]. Meijer: d = -0.15 [-0.75, 0.44]. Ozdogru: d = 1.19 [0.01, 2.37]. Suchotzki: d = 0.00 [-0.93, 0.93]. Sutan: d = 0.02 [-0.79, 0.83]. Vanpaemel: d = 0.17 [-0.55, 0.88]. Verschuere: d = 0.18 [-0.55, 0.91]. Wick: d = -0.09 [-1.06, 0.87]. Wiggins: d = 0.19 [-0.51, 0.90]. Meta-analytic estimate: d = 0.11 [-0.09, 0.31]. Commandments-cheat versus commandments-control: Aczel: d = 0.05 [-0.77, 0.88]. Birt: d = 0.83 [-0.10, 1.75]. Evans: d = 0.60 [-0.39, 1.59]. Ferreira-Santos: d = -0.33 [-1.41, 0.74]. Gonzalez-Iraizoz: d = 1.11 [0.14, 2.08]. Holzmeister: d = 1.30 [-0.17, 2.78]. Klein Selle and Rozmann: d = -0.15 [-0.79, 1.09]. Koppel: d = 0.51 [-0.20, 1.22]. Laine: d = 0.10 [-0.63, 0.83]. Loschelder: d = -0.24 [-1.38, 0.90]. McCarthy: d = 1.10 [-0.20, 2.41]. Meijer: d = -0.31 [-0.89, 0.25]. Ozdogru: d = 1.15 [-0.10, 2.41]. Suchotzki: d = -0.05 [-0.86, 0.75]. Sutan: d = 0.41 [-0.41, 1.23]. Vanpaemel: d = 0.36 [-0.37, 1.09]. Verschuere: d = 0.13 [-0.61, 0.87]. Wick: d = -0.14 [-0.94, 0.67]. Wiggins: d = -0.08 [-1.02, 0.87]. Meta-analytic estimate: d = 0.24 [0.03, 0.44].

  • Distance priming. Participants primed with distance compared to closeness produced greater enjoyment of media depicting embarrassment (Study 1), less emotional distress from violent media (Study 2), lower estimates of the number of calories in unhealthy food (Study 3), and weaker reports of emotional attachments to family members and hometowns (Study 4).

  • Flag priming. Participants primed by a flag are more likely to take conservative positions than those in the control condition.

    • Status: mixed
    • Original paper: ‘A Single Exposure to the American Flag Shifts Support Toward Republicanism up to 8 Months Later’, Carter et al. 2011; experimental design, 2 studies with Experiment 1: n = 191 completed three sessions and 71 completed the fourth session; Experiment 2: n = 70. [citations = 186 (GS, October 2021)].
    • Critique: Klein et al. 2014 [n=6,082, citations = 957 (GS, October 2021)].
    • Original effect size: d = 0.50.
    • Replication effect size: All effect sizes are located in ManyLabs: Adams and Nelson: d = .02. Bernstein: d = 0.07. Bocian and Frankowska: d = .19 (Study 1). Bocian and Frankowska: d = -.22 (Study 2). Brandt et al.: d = .21. Brumbaugh and Storbeck: d = -.22 (Study 1). Brumbaugh and Storbeck: d = .02 (Study 2). Cemalcilar: d = .14. Cheong: d = -.11. Davis and Hicks: d = -.27 (Study 1). Davis and Hicks: d = -.03 (Study 2). Devos: d = -.11. Furrow and Thompson: d = .09. Hovermale and Joy-Gaba: d = -.07. Hunt and Krueger: d = .27. Huntsinger and Mallett: d = .06. John and Skorinko: d = .08. Kappes: d = .04. Klein et al.: d = -.11. Kurtz: d = .04. Levitan: d = -.01. Morris: d = .09. Nier: d = -.45. Packard: d = .04. Pilati: d = 0.00. Rutchick: d = -.07. Schmidt and Nosek (PI): d = .03. Schmidt and Nosek (MTURK): d = .09. Schmidt and Nosek (UVA): d = -.15. Smith: d = .27. Swol: d = -.03. Vaughn: d = -.17. Vianello and Galliani: d = .49. Vranka: d = -.03. Wichman: d = .11. Woodzicka: d = -.09. Average replication effect size: d = 0.03.

  • Fluency priming. Objects that are fluent (e.g., conceptually fluent, visually fluent) are perceived more concretely than objects that are disfluent (disfluent objects are perceived more abstractly).

  • Money priming. Images or phrases related to money cause increased faith in capitalism, and the belief that victims deserve their fate.

    • Status: not replicated
    • Original paper: ‘Mere exposure to money increases endorsement of free-market systems and social inequality’, Caruso 2013; experimental design, n between 30 and 168. [citations ~161 (GS, November 2021)].
    • Critiques: Rohrer 2015 [n=136, citations = 82 (GS, November 2021)]. Meta-analysis: Lodder 2019 [k=246, citations = 64 (GS, November 2021)].
    • Original effect size: system justification d = 0.8, just world d = 0.44, dominance d = 0.51, fair market ideology: not reported, d = 0.70 [obtained from Rohrer’s 2015 Experiment 4 results section].
    • Replication effect size: Rohrer et al. (Experiment 1): d = 0.07 [0.41, 0.27] for system justification, d = 0.06 [-0.14, 0.25] for belief in a just world, d = -0.06 for social dominance, social dominance: d = 0.06 [0.37, 0.26], fair market ideology, d = 0.14 [-0.23, 0.50]. For 47 preregistered experiments in Lodder: g = 0.01 [-0.03, 0.05] for system justification, g = 0.11 [-0.08, 0.3] for belief in a just world, g = 0.07 [-0.02, 0.15] for fair market ideology.

  • Commitment priming (recall). Participants exposed to a high-commitment prime would exhibit greater forgiveness.

  • Mortality Salience (Death Priming/Terror Management Theory). Reminders of death lead to subconscious changes in attitudes and behaviour, for example in the form of increased in-group bias and behaviour that serves to defend an individual’s cultural worldview.

  • Spatial priming for emotional closeness. Plotting points closer together led to participants reporting they were closer to their own family members than those who plotted points farther apart.

    • Status: not replicated
    • Original paper: ‘Keeping One’s Distance: The effect of spatial distance cues on affect and emotion’, Lawrence and Bargh 2008; 4 experiments with Study 1: n = 73; Study 2: n = 42; Study 3: n = 59; Study 4: n = 84. [citations = 583 (GS, January 2022)].
    • Critiques: Pashler et al. 2012 [n = 92, citations = 188 (GS, January 2022)]. Open Science Collaboration 2015 [n=125, citations = 6148(GS, January 2022)].
    • Original effect size: Study 1: η2 = .09/d = 0.10 [converted from partial eta squared to Cohen’s d using this conversion]; Study 2: η2 = .18/d = 0.22 [converted from partial eta squared to Cohen’s d using this conversion]; Study 3: η2 = .10/d = 0.11 [converted from partial eta squared to Cohen’s d using this conversion]; Study 4: η2 = .11/d = 0.12 [converted from partial eta squared to Cohen’s d using this conversion].
    • Replication effect size: Pashler et al.: η2 = 0.01/d = 0.01 [converted from partial eta squared to Cohen’s d using this conversion]. Joy-Gaba et al.’s effect sizes are located in Open Science Collaboration 2015 for Study 4: η2 = .00/ d = .00.
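
Several entries in this collection report Cohen’s d values converted from partial eta squared. The converter linked in those entries is not reproduced here, so the following is only a minimal sketch of one common textbook conversion, assuming a two-group design with equal group sizes: d = 2 × sqrt(η² / (1 − η²)). The function name and the example input are hypothetical, and it is an assumption that this formula matches the conversion the contributors used; other converters can give different values.

```python
import math


def eta_squared_to_d(eta_sq: float) -> float:
    """Convert (partial) eta squared to Cohen's d.

    Assumes two equal-sized groups, where Cohen's f = sqrt(eta^2 / (1 - eta^2))
    and d = 2 * f. Hypothetical helper, not the converter linked in the entries.
    """
    if not 0.0 <= eta_sq < 1.0:
        raise ValueError("eta squared must lie in [0, 1)")
    cohens_f = math.sqrt(eta_sq / (1.0 - eta_sq))
    return 2.0 * cohens_f


if __name__ == "__main__":
    # Illustrative input only (not taken from any study listed here).
    print(round(eta_squared_to_d(0.20), 2))  # 1.0
```

Because the result depends on the design assumed (number of groups, equal or unequal group sizes), a converted d is best read alongside the originally reported statistic.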

  • Implicit God prime increases self-reported risky behaviour. Implicitly priming God using the scrambled-sentence paradigm increases self-reported risk taking.

    • Status: not replicated
    • Original paper: ‘Anticipating divine protection? Reminders of god can increase nonmoral risk taking’, Kupor et al. 2015; experimental design, Experiment 1a: n=61 and Experiment 1b: n=202. [citations=76 (GS, November 2022)].
    • Critiques: Gervais et al. 2020 [Experiment 1a: n=556, Experiment 1b: n=548, citations=9 (GS, November 2022)].
    • Original effect size: Experiment 1a: d=0.574 [0.05, 1.09]; Experiment 1b: d=0.323 [0.04,0.60].
    • Replication effect size: Gervais et al.: Experiment 1a: d=0.14 [-0.07, 0.34]; Experiment 1b: d=-0.11 [-0.31, 0.09].

  • Implicit God prime increases actual risky behaviour. Implicitly priming God using the scrambled-sentence paradigm increases willingness to engage in risky behaviour for financial reward.

    • Status: not replicated
    • Original paper: ‘Anticipating divine protection? Reminders of god can increase nonmoral risk taking’, Kupor et al. 2015; Experiment 3: n=101. [citations=76 (GS, November 2022)].
    • Critiques: Gruneau Brulin et al. 2018 [Experiment 1b: n = 160, Experiment 2b: n = 264, citations = 19 (GS, November 2022)].
    • Original effect size: Experiment 3: b=0.61.
    • Replication effect size: Gruneau Brulin et al.: Experiment 1b: d = -0.11 [-0.31, 0.09]; Experiment 2b: b = 0.14 [-0.07, 0.34].

  • Heat priming. Exposure to words related to hot temperatures increases aggressive thoughts and hostile perceptions. This effect suggests that people mentally associate heat-related constructs with aggression-related constructs.

    • Status: not replicated
    • Original paper: ‘Hot under the collar in a lukewarm environment: Words associated with hot temperature increase aggressive thoughts and hostile perceptions’, DeWall & Bushman 2009; 2 experiments in which participants were first exposed to words related to either heat, cold, or neutral concepts and then completed a word stem completion task (Study 1; n=127) or rated a person’s hostility based on an ambiguous description of that person (Study 2; n=72). [citations=76 (GS, June 2022)].
    • Critiques: McCarthy 2014 [n=182, citations=14 (GS, June 2022)]; including meta-analyses [n=499].
    • Original effect size: Study 1: d = 0.47 (hot vs. cold words), d = 0.46 (hot vs. neutral words); Study 2: d = 0.67 (hot vs. cold words), d = 0.63 (hot vs. neutral words).
    • Replication effect size: McCarthy: Study 2A: d = -0.12 (hot vs. cold words), d = -0.02 (hot vs. neutral words); Study 2B: d = -0.06 (hot vs. cold words), d = 0.00 (hot vs. neutral words) (both experiments replicate procedure from Study 2); Meta-analysis: d = 0.18.

  • Honesty priming (goal-priming, social priming). Exposure to honesty-related words increases honest reporting of embarrassing behaviours.

    • Status: not replicated
    • Original paper: ‘Using implicit goal priming to improve the quality of self-report data’, Rasinski et al. 2005; between-subjects, n = 64. [citations = 111 (GS, October 2022)].
    • Critiques: Pashler et al. 2013 [Direct replication, Experiment 1 n=149 and Experiment 2 n=152, and conceptual replication, Experiment 3 n=151 and Experiment 4 n=153, citations = 66 (GS, October 2022)]. Dalal and Hakel 2016 [Experiment 1 n = 590, conceptual replication, citations = 41 (GS, March 2023)].
    • Original effect size: d = 1.21 (estimated from test-statistics in paper).
    • Replication effect size: Pashler et al.: Experiment 1: d = 0.18 (non-significant; not replicated); Experiment 2: d = -0.14 (non-significant; opposite direction); Experiment 3: Measure 1: d = -0.14 (non-significant; opposite direction; estimated from test statistics in paper), Measure 2: d = -0.13 (non-significant; opposite direction; estimated from test statistics in paper); Experiment 4: Measure 1: d = 0.04 (non-significant; not replicated; estimated from descriptive statistics), Measure 2: d = -0.14 (non-significant; opposite direction; estimated from descriptive statistics). Dalal and Hakel: d = -0.07 (non-significant; opposite direction; estimated from descriptive statistics in Table 2 (to get N of groups) and Table 3 (to get the means and standard deviations)).

  • Achievement priming (goal priming, high-performance goal priming). Exposing individuals to words that are success oriented (e.g., win, strive) will increase their performance on a task compared to those exposed to neutral words (e.g., carpet, shampoo).

    • Status: mixed.
    • Original paper: ‘The Automated Will: Nonconscious Activation and Pursuit of Behavioral Goals’, Bargh et al. 2001; between-subjects, Experiment 1: n=78, Experiment 2: n=60, Experiment 3: n=288, Experiment 4: n=76, Experiment 5: n=65. [citations = 2,987 (GS, October 2022)].
    • Critiques: Shantz and Latham 2009 [Pilot Study: n = 52, Field Experiment: n = 81, citations = 221 (GS, October 2022)]. Harris et al. 2013 [Experiment 1: n = 98, Experiment 2: n = 66, citations = 199 (GS, October 2022)]. Weingarten et al. 2016 [meta-analysis, n = NA, k = 133 studies, citations = 333 (GS, October 2022)].
    • Original effect size (estimated from test-statistics reported): Experiment 1: d = 0.72 (priming of high-performance words led to more words being found); Experiment 2: d = 0.53 (priming of cooperation words led to more cooperation between players); Experiment 3: d = 0.52 (adding delay between word exposure and task increased performance in the high-performance words group); Experiment 4: d = 0.76 (when given the stop signal, those in the high-performance word group continued to work on the task; note: the statistics for this experiment suggest that it had more than 76 participants: the authors fit a 2 x 2 ANOVA and report residual degrees of freedom of 75, whereas 76 participants would give residual degrees of freedom of 72; the effect size here was estimated using the corrected residual degrees of freedom value); Experiment 5: d = 0.68 (when interrupted, the high-performance word group was more likely to return to their task than the neutral group).
    • Replication effect size: Shantz and Latham: Participants either shown a picture of a woman winning a race or not to prime achievement, Pilot Study: d = 0.84 (replicated); Field Experiment: d = 0.43 (replicated). Harris et al.: Experiment 1 (direct replication of Experiment 1 in Bargh et al., 2001): d = -0.24 [0.15, -0.64] (not replicated); Experiment 2 (direct replication of Experiment 3 in Bargh et al., 2001): d = -0.03 [0.45, -0.52] (not replicated). Weingarten et al.: The meta-analysis looked at all priming experiments that examined behaviour (i.e., not just achievement priming). It found that there is a small effect of behavioural priming (d = 0.35 [0.29, 0.41]). Factors that affected the priming effects were: Publication status - Published (n = 255 studies): d = 0.39 [0.33, 0.44], Unpublished (n = 88 studies): d = 0.10 [0.01, 0.20]; Liminality - Supraliminal (n = 255 studies): d = 0.30 [0.24, 0.36; this is the method used in Bargh et al., 2001], Subliminal (n = 88 studies): d = 0.40 [0.30, 0.51]; Use of neutral control - No neutral control (n = 38 studies): d = 0.44 [0.27, 0.60], With neutral control (n = 307 studies): d = 0.31 [0.25, 0.37].

  • Weapons priming effect (weapons effect). Stimuli or cues associated with aggression, such as weapons, can elicit aggressive responses.

    • Status: mixed (the effect is smaller than originally believed)
    • Original paper: ‘Weapons as aggression-eliciting stimuli’, Berkowitz and LePage 1967; between-subjects design, n = 100 (male university students). [citations = 1161 (GS, October 2022)].
    • Critiques: Turner and Simons 1974 [n = 60, citations = 11 (GS, October 2022)]. Frodi 1975 [n = 100, citations = 50 (GS, October 2022)]. Carlson et al. 1990 [meta-analysis; n = 628 (fail-safe), k = 56 studies, citations = 339 (GS, October 2022)]. Benjamin et al. 2018 [meta-analysis; n = 7,668 participants, k = 78 studies, citations = 12 (GS, October 2022)]. Ariel et al. 2019 [RCT of taser presence and the police force; n = 678 officers, citations = 42 (GS, October 2022)].
    • Original effect size: all reported effect sizes are found in Carlson et al.: d = 0.76 to 1.06.
    • Replication effect size: Turner and Simons: d = -1.17 to 0.64 (reported in Carlson et al., 1990); the greater the evaluation apprehension, the less likely aggressive behaviour was observed (mixed). Frodi: d = 0.91 (reported in Carlson et al., 1990) (replicated). Carlson et al.: d = 0.38 (replicated). Benjamin et al.: d = 0.29 [0.21, 0.36] (replicated); the effect is moderated by several variables: smaller when behaviour was the outcome (d = 0.25 [0.07, 0.43]), reduced for “field” experiments (d = 0.22 [-0.07, 0.51]), and larger when photos were used (d = 0.35 [0.26, 0.44]) rather than actual weapons (d = 0.12 [-0.08, 0.31]). Ariel et al.: The presence of a taser on the officer led to increased use of force, IRR = 1.48 [1.27, 1.72] (replicated), and increased injury to officers, IRR = 2.11 [1.53, 2.91] (replicated).

  • Goal priming effect (goal contagion, goal inspiration, behavioural inspiration). Observing others’ behaviour (e.g., you observe someone jogging in the park) may lead the observer to infer the goal behind it (“This person wants to keep fit.”) and to adopt the same goal (“Maybe I should do some sports too.”).

    • Status: not replicated
    • Original paper: ‘Goal Contagion: Perceiving Is for Pursuing’, Aarts, Gollwitzer and Hassin 2004; Study 1, 2 (need for money: high vs low) x 2 (goal: money vs control) between-subjects ANOVA with dependent variable of earning money goal; note: the main effect of the manipulation is of interest, n=83. [citations=824 (GS, December 2022)].
    • Critiques: Brohmer et al. 2021 [meta-analysis, total n=4751, citations=2 (GS, December 2022)]. Corcoran et al. 2020 [n=300, citations=10 (GS, December 2022)].
    • Original effect size: main effect of goal vs control: g = 0.38 [-0.05, 0.81] (based on F statistics: F(1,79) = 3.14, p < .08; original authors also report Goal x Need interaction effect, F(1, 79) = 5.32, p < .03).
    • Replication effect size: Corcoran et al.: g = -0.20 [-0.42, 0.02] (reported in Brohmer et al.’s meta-analysis); Brohmer et al.: g = 0.30 [0.21, 0.40], but the bias-corrected meta-analytic summary effect (selection model approach) is g = 0.15 [-0.02, 0.32].

  • Verbal framing (temporal tense). Participants who read what a person was doing (relative to those who read what a person did) showed enhanced accessibility of intention-related concepts and attributed more intentionality to the person.

    • Status: mixed
    • Original paper: ‘Learning about what others were doing: Verb aspect and attributions of mundane and criminal intent for past actions’, Hart and Albarracin (2011): 3 experiments with Study 1: n = 5458; Study 2: n = 37; Study 3: n = 48. [citations = 37, (GS, January 2022)].
    • Critiques: Eerland et al. (2016) [meta analysis (total n= 685 for perfective-aspect condition; n = 681 imperfective-aspect condition) of Study 3 citations = 70, (GS, January, 2022)]
    • Original effect size: Study 1: d = 1.00 for intentionality in imperfective-aspect condition; Study 2: d = 1.23 for imagery in imperfective-aspect condition; Study 3: d = 1.20 for intentionality, d = 0.92 for imagery and d = 0.55 for intention attribution in imperfective-aspect condition.
    • Replication effect size: All effect sizes are located in Eerland et al. 2016: Intentionality: Arnal (lab): d = -0.35; Berger (lab): d = -0.98; Birt and Aucoin (lab): d = -0.38; Eerland et al. (lab): d = 0.16; Eerland et al. (online): d = -0.33; Ferretti (lab): d = -0.01; Knepp (lab): d = -0.95; Kurby and Kibbe (lab): d = -0.14; Melcher (lab): d = 0.65; Michael (lab): d = -0.41; Poirier et al. (lab): d = 0.32; Prenoveau and Carlucci (lab): d = -0.38. Meta-analytic estimate for laboratory replications only: d = -0.24. Imagery: Arnal (lab): d = -0.01; Berger (lab): d = -0.45; Birt and Aucoin (lab): d = -0.40; Eerland et al. (lab): d = -0.01; Eerland et al. (online): d = -0.13; Ferretti (lab): d = 0.33; Knepp (lab): d = 0.00; Kurby and Kibbe (lab): d = 0.02; Melcher (lab): d = -0.16; Michael (lab): d = -0.08; Poirier et al. (lab): d = -0.19; Prenoveau and Carlucci (lab): d = -0.02. Meta-analytic estimate for laboratory replications only: d = -0.08. Intention attribution: Arnal (lab): d = -0.15; Berger (lab): d = -0.15; Birt and Aucoin (lab): d = 0.08; Eerland et al. (lab): d = -0.01; Eerland et al. (online): d = 0.02; Ferretti (lab): d = -0.19; Knepp (lab): d = -0.29; Kurby and Kibbe (lab): d = 0.00; Melcher (lab): d = 0.12; Michael (lab): d = 0.13; Poirier et al. (lab): d = 0.06; Prenoveau and Carlucci (lab): d = 0.03. Meta-analytic estimate for laboratory replications: d = 0.00.

  • Reference framing. Risk preferences change depending on whether a choice is presented in terms of gains or losses, even when the prospects of the options are held constant.

  • Prosocial spending. Spending money on other people leads to greater happiness than spending money on oneself.

    • Status: replicated
    • Original paper: ‘Spending Money on Others Promotes Happiness’, Dunn et al. 2008; cross-sectional survey, n=632. [citations = 2008 (GS, March 2022)].
    • Critiques: Aknin et al. 2020 [3 Experiments, Experiment 1: n = 712, Experiment 2: n = 1,950, Experiment 3: n = 5,199, citations = 51 (GS, March 2022)].
    • Original effect size: b = 0.11.
    • Replication effect size: Experiment 1: positive affect: d = .36, positive emotion: d = .32; Experiment 2: positive affect: d = .03, positive emotion: d = .02; Experiment 3: positive affect: d = .06, positive emotion: d = .06, positive emotion after spending one’s own money: d = .17.

  • Gustatory disgust on moral judgement. Gustatory disgust triggers a heightened sense of moral wrongness.

    • Status: not replicated
    • Original paper: ‘A Bad Taste in the Mouth: Gustatory Disgust Influences Moral Judgment’, Eskine et al. 2011; experiment, n = 57. [citations = 564 (GS, January 2022)].
    • Critiques: Ghelfi et al. 2020 [meta-analysis, total n = 1137, citations = 18 (GS, January 2022)]. Johnson et al. 2016 [Study 1: n = 478, Study 2: n = 934, citations=52 (GS January 2022)].
    • Original effect size: Cohen’s d = 1.12 (comparison to control group); Cohen’s d = 1.28 (comparison to sweet taste).
    • Replication effect size: Johnson et al.: Cohen’s d = 0.04 (Study 1 - comparison to control group), Cohen’s d = 0.05 (Study 2 - comparison to control group). All other effect sizes are located in Ghelfi et al. 2020: Comparison to sweet group: Christopherson: Hedges’ g = 0.53. Christopherson: Hedges’ g = 0.04. Fischer: Hedges’ g = 0.25. Guberman: Hedges’ g = -0.30. de Haan: Hedges’ g = -0.13. Legate: Hedges’ g = 0.99. Legate: Hedges’ g = -0.02. Lenne: Hedges’ g = -0.19. Urry: Hedges’ g = -0.13. Wagemans: Hedges’ g = 0.03. Weber: Hedges’ g = -0.27. Meta-analytic estimate: Hedges’ g = -0.05. Comparison to control group: Christopherson: Hedges’ g = 0.68. Christopherson: Hedges’ g = -0.19. Fischer: Hedges’ g = -0.01. Guberman: Hedges’ g = -0.12. de Haan: Hedges’ g = -0.24. Legate: Hedges’ g = 0.79. Legate: Hedges’ g = 0.37. Lenne: Hedges’ g = -0.13. Urry: Hedges’ g = 0.08. Wagemans: Hedges’ g = -0.11. Weber: Hedges’ g = -0.04. Meta-analytic estimate: Hedges’ g = 0.10.

  • Macbeth effect. Moral aspersions induce literal physical hygiene.

    • Status: mixed
    • Original paper: ‘Washing away your sins: threatened morality and physical cleansing’, Zhong and Liljenquist 2006; 4 experiments with Study 1: n=60, Study 2: n=27, Study 3: n=32, Study 4: n=45. [citation = 1407 (GS, January 2022)].
    • Critiques: Siev et al. 2018 [meta-analysis: n=1,746, citations = 17 (GS, January 2022)].
    • Original effect size: Study 1: g = 0.38; Study 2: g = 0.75; Study 3: g = 0.38; Study 4: g = 0.33.
    • Replication effect size: Siev et al.: g = 0.17 [0.04, 0.31]. All effect sizes are located in Siev et al. 2018: Earp et al.: Study 1: g = 0.02 [-0.30, 0.34], Study 2: g = 0.05 [-0.27, 0.37], Study 3: g = 0.13 [-0.11, 0.37]. Fayard et al.: Study 1: g = 0.11 [-0.20, 0.43]. Gamez et al.: Study 1: g = 0.02 [-0.54, 0.56], Study 2: g = -0.01 [-0.64, 0.63], Study 3: g = 0.55 [-0.26, 1.37]. Lee and Schwarz: Study 2: g = 0.22 [-0.20, 0.64]. Schaefer: Study 2: g = 0.71 [0.18, 1.23]. Siev et al. (unpublished): Study 1: g = -0.06 [-0.27, 0.15], Study 2: g = -0.18 [-0.56, 0.20]. Zhong (unpublished): Study 2: g = 0.28.

  • Signing at the beginning rather than end makes ethics salient. Signing a statement of honest intent before providing information rather than after can reduce dishonesty.

  • Social class on prosocial behaviour. Individuals from a high social class are more likely to exhibit prosocial behavior than those from a low social class, but there is a U-shaped curve between social class and prosocial behavior that sometimes appears. The final study in the critique section below reported two pre-registered replications of Piff et al., 2010 with different results. There are more studies than those described here, but these should provide a good sense of the current state of the science.

    • Status: mixed
    • Original papers: ‘Volunteering in public health: An analysis of volunteers' characteristics and activities’, Ramirez-Valles, 2006; random-digit dialling in Illinois, US, n = 609. [citations = 9 (GS, June 2022)].
    • Critiques: Gittell & Tebaldi 2006 [n=NA, citations = 161 (GS, June 2022)]. James III & Sharpe 2007 [n = 16,442 households, citations=171 (GS, June 2022)]. Piff et al. 2010 [4 experiments with Experiment 1: n = 115; Experiment 2: n = 81; Experiment 3: n = 155; Experiment 4: n = 91, citations=1572 (GS, June 2022)]. Guinote et al. 2015 [Experiment 1: n = 44; Experiment 4: n = 48 children, citations=185 (GS, June 2022)]. Chen et al. 2013 [n = 469 kindergarten children, citations=110 (GS, June 2022)]. Korndörfer et al. 2015 [8 studies, n1 = 9260 German households, n2 = 32,090 US households, n3 = 3975 (objective) & 3,857 (subjective) US persons, n4 = 33,072 German persons, n5 = 3,983 (objective) & n = 3,964 (subjective) US persons, n6 = 32,257 persons in 28 countries, n7 = 3,902 (objective) & n = 3,886 (subjective) US persons, n8 = 1,421 German persons, citations=238 (GS, March 2023)]. Stamos et al., 2020 [Experiment 1: n = 300, Experiment 2: n = 200, citations=31 (GS, March 2023)].
    • Original effect size: Ramirez-Valles: household income on past-12-month volunteering in public health OR = 1.22; education NS OR = 1.02.
    • Replication effect size: Gittell and Tebaldi: correlation between income and volunteer rate (-.13), regression coefficients for personal income (769.1) and education (29.35) on average charitable contribution per tax filer. Piff et al.: Experiment 1 - subjective SES on dictator game resource allocation: β = -.23; Experiment 2 - self-reported family income: β = -.27 and manipulated social class: β = -.23 on attitudes toward charitable giving; Experiment 3 - combined education and income on trust game with arbitrary points: r = -.18; Experiment 4 - combined past and current income on ambiguous task helping: β = -.43. Guinote et al.: Experiment 1 - manipulated department rank on picking up pens for experimenter: d = 1.16; Experiment 4 - random winner on sticker donation T1: calculated d = 0.657, losing status: ηp2 = 0.34, gaining status: ηp2 = 0.38, NS differences at T2. Chen et al.: family income on sticker allocation in dictator game: Spearman’s ρ = -.10; parents’ education/migrant status: NS.
Korndörfer et al.: Experiment 1 - household objective social class for each household on self-reported donation behavior for the previous year: OR = 2.07, NS quadratic term, on relative amount of donation, both standardized score: b = .158 and its quadratic term, b = .073; Experiment 2 - household objective social class on self-reported donation behavior for the previous year: OR = 1.99, NS quadratic term, on relative amount of donation, standardized score: b = .078, NS quadratic term; Experiment 3 - Model 1, objective social class for each person on self-reported donation behavior for the previous year: OR = 2.54, NS quadratic term and frequency: b = .392, quadratic term: b = -.064; Model 2, four-category subjective social class for each person on self-reported donation behavior for the previous year: OR = 1.61, quadratic term: OR = 0.90 and frequency: b = .230, quadratic term: b = -.039; Experiment 4 - objective social class for each person on self-reported volunteering: OR = 2.03, quadratic term: OR = 0.91 and frequency: b = .336, quadratic term: b = -0.48; Experiment 5 - same models as Experiment 3 but with a volunteering outcome: Model 1: OR = 1.64, NS quadratic term and frequency: b = .248, NS quadratic term; Model 2: OR = 1.29, NS quadratic term and frequency: b = .135, NS quadratic term; Experiment 6 - Model 1, objective social class for each person on past 12 month volunteering: OR = 1.18, quadratic term: OR = 0.97 and frequency: b = 0.94, quadratic term: b = -.012; Model 2, six-category subjective social class on volunteering: OR = 1.15, NS quadratic term and frequency: b = 0.76, NS quadratic term; Experiment 7 - same models as Experiments 3 and 5 but with a single everyday helping outcome: Model 1, b = .397, NS quadratic term; Model 2: NS and NS quadratic term; Experiment 8 - objective social class for each person on behavior in a trust game, player 1: b = .468, player 2: b = .421.
Stamos et al.: d = .36 (manipulated subjective SES), opposite direction: r = -.02 (family income).

  • Stanford Prison Experiment. A simulation of a prison environment was used to examine the psychological effects of coercive situations. Using role-playing, labeling and social expectations, it reported that one third of participants in the role of prison guards displayed aggressive and dehumanizing behaviour.

    • Status: NA
    • Original paper: ‘Interpersonal dynamics in a simulated prison’, Haney, Banks, Zimbardo 1973; experimental and observational study, n=24. [citations = 2115 (including highly referenced publications) (GS, January 2022)].
    • Critiques: Le Texier 2019 [commentary, n=NA, citations= 38 (GS, January 2022)]. Banuazizi & Movahedi 1975 [methodological analysis, n=NA, citations= 118 (GS, January 2022)]. Festinger 1980 [book, n=NA, citations= 132 (GS, January 2022)]. Haslam, Reicher, & Van Bavel 2019 [methodological analysis, n=NA, citations = 37 (GS, January 2022)]. Griggs & Whitehead 2014 [textbook analysis, n=NA, citations = 37 (GS, January 2022)]. Griggs 2014 [textbook analysis, n=NA, citations = 48 (GS, January 2022)]. Blum 2018 [media coverage, n=NA, citations = 31 (GS, January 2022)]. Le Texier 2020 [preprint, citations= 0 (GS, January 2022)]. Izydorczak & Wicher 2020 [preprint, citations= 0 (GS, January 2022)]. Reicher & Haslam 2011 [experimental case study but not exact replication of SPE; n = 15, citations ~435 (GS, January 2022)]. Lovibond, Adams, & Adams 1979 [original research but not exact replication of SPE; n = 60, citations= 55 (GS, January 2022)].
    • Original effect size: Key claims were insinuation plus a battery of difference in means tests at up to 20% significance(!). n = 24, data analysis on 21.
    • Replication effect size: N/A. First, the study has been criticised for its lack of adherence to experimental methodology. Although the study has been widely described as an ‘experiment’, it lacks many defining features: 1) it does not define the precise set of manipulated variables, 2) it manipulates multiple variables at a time without proper control over the effects of each one, 3) it does not define the dependent variable or how it will be measured, 4) it does not state any clear hypotheses. It is noteworthy that in the original paper, the authors present their work as a “demonstration”, not an experiment. The second group of serious issues is the degree to which researchers’ ad-hoc interventions influenced the behaviour of the participants. One of the leading researchers, Philip G. Zimbardo, took part in the experimental procedure as the prison’s “Superintendent”. Another close collaborator of the research team, David Jaffe, who initially conceived the idea of the mock-prison study, played the role of the “Warden”. Considering that these people knew the goal of the study and were, as later admitted, interested in a particular outcome (a call for reform of the prison system), the ad-hoc interventions, such as encouraging some of the guards to be more strict and ‘tough’, cast reasonable doubt on the role of experimenters’ expectations in the final results of the study. The third group of issues is sampling. Namely, the study was conducted on a small (n=24, n per condition = 12) and largely unrepresentative sample (all males, all college students of similar age, all residents of the United States). Also, despite the screening of the voluntarily applying candidates, it is still possible that strong ‘demand characteristics’ and ‘self-selection bias’ affected the composition of the sample. All the participants had responded to a newspaper ad seeking help in a “psychological study of prison life”.
The last issue with the Stanford Prison Experiment is the interpretation of the results. Even if the discovered effect is trustworthy (and the above-mentioned issues call this into question), there is no clear theoretical interpretation of what this finding actually proves. Some critics argue that the violent behaviour of the guards may be rooted in their following of strong leadership, rather than in their immersion in an assigned social role.

  • Milgram experiment. A study examining the influence of authority on immoral behaviour. Participants were assigned the role of ‘teachers’ and were instructed by the experimenter to administer electric shocks of 15-450 V whenever the ‘learner’ made a mistake. There were several variants of the study. In the most basic one, 100% of participants agreed to administer a 300 V shock and 65% agreed to apply the maximum shock of 450 V.

    • Status: mixed
    • Original paper: ‘Behavioral Study of Obedience’, Milgram 1963; experimental study, n=40 (the full range of conditions was n=740). [citations = 8502 (GS, March 2023)].
    • Critiques: Burger 2011 [n=62 transcripts from the earlier experiment, citations= 108 (GS, March 2023)]. Perry 2012 [book, n=NA, citations= 261 (GS, March 2023)]. Brannigan 2013 [n=NA, citations= 14 (GS, January 2022)]. Griggs 2016 [n=NA, citations= 28 (GS, March 2023)]. Caspar 2020 [n=NA, citations= 25 (GS, March 2023)]. Doliński et al. 2017 [n=80, citations= 122 (GS, March 2023)]. Blass 1999 [n=NA, citations= 595 (GS, March 2023)].
    • Original effect size: 65% of subjects reportedly administered the maximum, dangerous voltage.
    • Replication effect size: Various sources (Burger, Perry, Brannigan, Griggs, Caspar): the experiment included many researcher degrees of freedom, going off-script, implausible agreement between very different treatments, and “only half of the people who undertook the experiment fully believed it was real and of those, 66% disobeyed the experimenter.” Doliński et al.: comparable effects to Milgram. Burger: similar levels of compliance to Milgram, but the level didn’t scale with the strength of the experimenter prods. Blass: average compliance of 63%, but the included studies suffer from the usual publication bias and tiny samples. (Selection was by a student of Milgram.) The most you can say is that there’s weak evidence for compliance, rather than obedience. (“Milgram’s interpretation of his findings has been largely rejected.”)

  • Robbers Cave Study. Utilized arbitrary groupings to demonstrate that tribalism between groups arises spontaneously, and depending on the context, it can result in group competition (e.g., in case of scarce resources) or group cooperation (e.g., in case of superordinate goals and common obstacles).

    • Status: NA
    • Original paper: ‘Superordinate Goals in the Reduction of Intergroup Conflict’, Sherif 1958; field experiment, n=22. [citations= 1,010 (GS, February 2022)]. In addition to the original paper, some related books from the author(s) are also highly cited, including: ‘Groups in harmony and tension’, Sherif & Sherif 1958 [citations=2,280 (GS, February 2022)] and ‘Intergroup Conflict and Co-operation’, Sherif et al. 1961 [citations= 253 (GS, February 2022)]. Overall, the effect accounts for more than 4000 total citations including the SciAm piece.
    • Critiques: Billig 1976 in passing [book, n=NA, citations= 808 (GS, February, 2022), see media mention by Haslam 2018]. Perry 2018 in passing [book, citations= 25 (GS, February, 2022), see also media summary by Shariatmadari 2018 and Haslam 2018]. Tavris 2014 [n=NA, citations= 11(GS, March 2023)] also claims that the underlying “realistic conflict theory” is otherwise confirmed. No definitive conclusion can be reached.
    • Original effect size: N/A. Not reported in conventional format. (Rationale: “results obtained through observational methods were cross-checked with results obtained through sociometric technique, stereotype ratings of in-groups and outgroups, and through data obtained by techniques adapted from the laboratory. Unfortunately, these procedures cannot be elaborated here.")
    • Replication effect size: N/A. Various sources (Billig, Perry, Tavris): No good evidence that tribalism arises spontaneously following arbitrary groupings and scarcity, within weeks, and leads to inter-group violence. The “spontaneous” conflict among children at Robbers Cave was orchestrated by experimenters; tiny sample (maybe 70?); an exploratory study taken as inferential; no control group; there were really three experimental groups - that is, the experimenters had full power to set expectations and endorse deviance; results from their two other studies, with negative results, were not reported. Set aside the ethics: the total absence of consent - the boys and parents had no idea they were in an experiment - or the plan to set the forest on fire and leave the boys to it.

  • Digital technology use and adolescent wellbeing. Adolescents who spent more time on new media (including social media and electronic devices such as smartphones) are more likely to report mental health issues.

  • Anthropomorphism for inanimate objects. Individuals who are lonely are more likely than people who are not lonely to attribute humanlike traits (e.g., free will) to nonhuman agents (e.g., an alarm clock), to fulfill unmet needs for belongingness.

  • Hurricane names. Female-named hurricanes are more deadly than male-named ones. Original effect size was a 176% increase in deaths, driven entirely by four outliers; reanalysis using a greatly expanded historical dataset found a nonsignificant decrease in deaths from female named storms.

    • Status: reversed
    • Original paper: ‘Female hurricanes are deadlier than male hurricanes’, Jung 2014; observational study, n=92 hurricanes discarding two important outliers. [citations = 113(GS, Mar 2022)].
    • Critiques: Christensen 2014 [same data, citations = 114(GS, March 2022)]. Smith 2016 [same data, citations = 8(GS, March 2022)].
    • Original effect size: d=0.65: 176% increase in deaths from flipping names from relatively masculine to relatively feminine.
    • Replication effect size: Smith: 264% decrease in deaths (Atlantic); 103% decrease (Pacific).

  • Implicit bias testing for racism. Implicit bias scores poorly predict actual bias, r = 0.15. The operationalisations used to measure that predictive power are often unrelated to actual discrimination (e.g. ambiguous brain activations). Test-retest reliability of 0.44 for race, which is usually classed as “unacceptable”. This isn’t news; the original study also found very low test-criterion correlations.

  • Pygmalion effect (Rosenthal Effect, self-fulfilling prophecy). Expectations about performance (e.g., academic achievement) impact performance. Specifically, teachers' expectations about their students’ abilities affect those students’ academic achievement; teacher beliefs impact their behaviour which in turn impacts student beliefs and behaviour.

    • Status: not replicated
    • Original paper: ‘Pygmalion in the classroom’, Rosenthal and Jacobson 1968; between-subjects experiment, N = 320. [citations = 13625 (GS, January 2023)]. ‘Teachers’ expectancies: Determinants of pupils’ IQ gains’, Rosenthal and Jacobson 1966, n around 320. [citations=881, but the popularisation has 13,792 (GS, March 2023)].
    • Critiques: Raudenbush 1984 [n=findings from 18 experiments, citations= 598(GS, March 2023)]. Thorndike 1986 [review, n=NA, citations= 496(GS, March 2023)]. Spitz 1999 [review, n=NA, citations= 147(GS, March 2023)]. Jussim and Harber 2005 [review, n=NA, citations= 1,760(GS, March 2023)].
    • Original effect size: Average +3.8 IQ, d=0.25.
    • Replication effect size: Raudenbush: d=0.11 for students new to the teacher, tailing to d=0 otherwise. Snow: median effect d=0.035.

  • Stereotype threat on Asian women’s mathematical performance, i.e. the interaction between race, gender and stereotyping. This study found that Asian-American women performed better on a math test when their ethnic identity was activated, but worse when their gender identity was activated, compared with a control group who had neither identity activated.

    • Status: Mixed
    • Original paper: ‘Domain-specific Effects of Stereotypes on Performance’, Shih et al. 1999; two between-subjects experiments, n1=46, n2=19. [citations = 2,073 (GS, March 2023)].
    • Critiques: Gibson et al. 2014 [n=127, citations= 81(GS, March 2023)]. Moon and Roeder 2014 [n=139, citations= 50(GS, March 2023)].
    • Original effect size: Asian-identity-salient > control > female-identity-salient, r=.27; Asian-identity-salient > female-identity-salient, r=.35.
    • Replication effect size: Gibson et al.: No group differences, η2=.01; Asian-primed vs. female-primed, p=.18, d=.27; Including only those who were aware of the stereotypes, group accuracy p=.02, η2=.04, and the means followed the predicted pattern, Asian (M=.63), Control (M=.55), and Female (M=.51); Likewise, female-primed participants performed worse than Asian-primed participants, p=.02, d=.53. Moon & Roeder: Group accuracy, p=.44, η2=.004; female-primed and Asian-primed conditions, p=.43, d=.17; Analysing just those who were aware of the stereotype, p=.28, η2=.012; female-primed participants vs. Asian-primed participants, p=.28, d=.27.

  • Stereotype threat on girls’ mathematical performance. A situational phenomenon whereby priming a negative gender stereotype (e.g., “women are bad at math”) has a detrimental impact on mathematical performance.

    • Status: mixed
    • Original paper: ‘Stereotype Threat and Women’s Math Performance’, Spencer et al. 1999; Experiment 2, n=30 women. [citations=5076 (GS, June 2022)].
    • Critiques: Stoet & Geary 2012 [meta-analysis, k = 23, citations= 286 (GS, March 2023)]. Flore & Wicherts 2015 [meta-analysis, n=47 measurements, citations= 357 (GS, March 2023)]. Flore et al. 2018 [Registered Report, n=2064 Dutch high school students, citations= 89 (GS, March 2023)]. Agnoli et al. 2021 [conceptual replication with n = 164 ninth grade and n = 164 eleventh grade Italian high school students, citations= 6 (GS, March 2023)]. Other reported null results in the literature but not explicit replications, e.g. Ganley 2013 [n=931 across three studies, citations= 195 (GS, March 2023)].
    • Original effect size: not reported; Experiment 2: Fig. 2 does not report specific values but appears to be control-group-women (M = 17, SD = 20) compared to experiment-group-women (M = 5, SD = 15), which translates to approximately d= −0.7 (calculated).
    • Replication effect size: Stoet and Geary: d = −0.61 for adjusted and −0.17 [−0.27, −0.07] for unadjusted scores. Together, only the group of studies with adjusted scores confirmed a statistically significant effect of stereotype threat. Flore and Wicherts: g = −0.22 [−0.21, 0.06] and significantly different from zero, but g = −0.07 [−0.21, 0.06] and not statistically significant after accounting for publication bias. Flore et al.: d = −0.05 [−0.18, 0.07]. Agnoli et al.: Both estimated stereotype threat effects were nonsignificant (see also Table S22; https://osf.io/3u2jd), Z = 1.53, p = .25 for ninth grade female participants and Z = .70, p = .97 for eleventh grade female participants.
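
The approximate original effect size quoted in this entry (d of about −0.7, read off Fig. 2 of Spencer et al.) can be reproduced with a short calculation. The sketch below is only an illustration: it assumes equal group sizes, so the pooled SD is the root mean square of the two SDs, and the function name `cohens_d_equal_n` is hypothetical rather than taken from any source.

```python
import math


def cohens_d_equal_n(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d for group a minus group b.

    Assumption: both groups have the same size, so the pooled SD
    is the root mean square of the two group SDs.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Values as read from the entry: threat-condition women (M = 5, SD = 15)
# versus control-condition women (M = 17, SD = 20).
d_spencer = cohens_d_equal_n(5, 15, 17, 20)
print(round(d_spencer, 2))  # -0.68
```

With unequal group sizes the pooled SD would weight each variance by its degrees of freedom, so the value could shift slightly.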

  • Increase in narcissism (leadership, vanity, entitlement) in young people over the last thirty years. It’s an ancient hypothesis. The basic counterargument is that proponents are misidentifying an age effect as a cohort effect (the narcissism construct apparently decreases by about a standard deviation between adolescence and retirement): “every generation is Generation Me”.

    • Status: not replicated
    • Original paper: ‘The Evidence for Generation Me and Against Generation We’, Twenge 2013; review of various studies, including national surveys [citations=251(GS, March 2022)].
    • Critiques: Donnellan and Trzesniewski [k = 5, n=477,380, citations = 432 (GS, March 2022)]. Arnett 2013 [unsystematic review, citations=171 (GS, March 2022)]. Roberts 2017 [reanalysis of original data and analysis of new sample n = 476, citations=195 (GS, March 2022)]. Wetzel 2017 [1990s: n = 1,166; 2000s: n = 33,647; 2010s: n = 25,412, citations=101 (GS, March 2022)]. (~660 total citations.) Meta-analysis: Hamamura et al. 2020 [total n = 24990, citations = 5 (GS, March 2022)].
    • Original effect size: d=0.37 increase in NPI scores (1980-2010), n=49,000.
    • Replication effect size: Roberts doesn’t give a d, but it’s near 0: something like d = 0.03 ((15.65 - 15.44) / 6.59). Wetzel: d = -0.27 (1990 - 2010). Hamamura: d(leadership) = -0.26, d(vanity) = -0.39, d(entitlement) = -0.23.
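
The rough d given for Roberts in this entry is a plain standardized mean difference: the gap between two mean scores divided by a single standard deviation. A minimal sketch of that arithmetic follows, using the three numbers from the entry. The function and argument names are hypothetical, and which mean belongs to which cohort is not stated in the entry, so they are labelled only as "a" and "b".

```python
def standardized_difference(mean_a, mean_b, sd):
    """Difference between two means expressed in units of one SD.

    Assumption: a single SD is used as the standardizer, as in the
    parenthetical calculation in the entry.
    """
    return (mean_a - mean_b) / sd


d_roberts = standardized_difference(15.65, 15.44, 6.59)
print(round(d_roberts, 2))  # 0.03
```

A difference of about 0.03 SD is far smaller than the d = 0.37 listed as the original effect size.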

  • Minimal group effect (Minimal group paradigm). An intergroup bias that manifests as ingroup favouritism (i.e., a tendency to prefer ingroup members) when participants are assigned to previously unfamiliar, experimentally created and largely meaningless social identities. In essence, the paradigm investigates the impact of social categorization on intergroup relations in the absence of realistic conflicts of interests, showing that mere social categorization is sufficient to produce ingroup favouritism.

    • Status: replicated
    • Original paper: ‘Arousal of ingroup-outgroup bias by a chance win or loss’, Rabbie and Horwitz 1969; experimental study, n=112. [citations= 679 (GS, January 2023)].
    • Critiques: Balliet et al. 2014 [meta-analysis, k=212, citations= 930 (GS, March 2023)]. Billig and Tajfel 1973 [experimental design, n=75, citations=2232 (GS, January 2023)]. Falk et al. 2014 [n1 = 324 Japanese, n2 = 594 Americans, citations= 58 (GS, March 2023)]. Fischer and Derham 2016 [meta-analysis, n = 21,266, citations=70 (GS, March 2023)]. Lazić et al. 2021 [meta-analysis, k = 69, N = 5268, citations=5 (GS, March 2023)]. Kerr et al., 2018 [n = 412, citations=21 (GS, January 2023)]. Mullen et al. 1992 [meta-analysis, _k_ = 137, citations= 1,867 (GS, March 2023)]. Tajfel 1970 [n=64, citations= 4094 (GS, January 2023)]. Tajfel et al. 1971 [n1=64, n2=48, citations=8126 (GS, January 2023)].
    • Original effect size: N/A
    • Replication effect size: Balliet et al.: d = 0.19 (for situations with no mutual interdependence between group members) and d = 0.42 (for situations with strong mutual interdependence between group members). Fischer and Derham: d = 0.369 [0.33, 0.41]. Mullen et al.: r = 0.264. The ingroup bias effect was obtained from a meta-analysis on 74 hypothesis tests derived from artificial groups. Lazić, Purić & Krstić: d = 0.22 [0.07, 0.38]. Kerr et al.: comparing US vs Australian sample, highlights the importance of context-dependent factors (like differences in methodological approach) and cultural variation of MGE; significant main effects of categorization (Group vs. No-group) on allocation measures, ηp2 = 0.031 to 0.081; the ingroup favouritism effect was present in both Context conditions, but was stronger in the public (ηp2 = 0.072) than in the private context (ηp2 = 0.020). Falk et al.: culture was a significant predictor of resource allocation such that Americans chose more in-group favouring strategies than did Japanese, b = 1.43, z = 9.52, p < .00; American participants were also more likely to show an in-group bias in group identification (in-group vs. out-group comparison, _d_ = .94), perceived group intelligence (d = .44), and perceived group personality traits (b = .15, z = 17.51) than Japanese participants (d = .50, d = -.003, b = .04, z = 2.75, respectively).

  • Solomon Asch’s conformity study. The degree to which a person’s own opinions are influenced by those of a group.

    • Status: replicated
    • Original paper: ‘Studies of independence and conformity: I. A minority of one against a unanimous majority’, Asch, 1956; experimental design, n = 123. [citations = 6558 (GS, October 2021)].
    • Critiques: Friend et al. 1990 [n = 99 accounts in social psychology textbooks, citations = 156 (GS, November 2021)]. Griggs 2015 [n = 20 introductory psychology textbooks and 10 introductory social psychology, citations = 12 (GS, November 2021)]. Bond and Smith, 1996 [meta-analysis, _k_ = 137, citations = 2,228 (GS, March 2023)]. Criticism focuses on the fact that textbooks exaggerate and misquote evidence of conformity and omit or diminish evidence of independence.
    • Original effect size: 36.8% of the responses were incorrect (influenced by the majority). The effect has been interpreted by the author as evidence for the prevalence of independence (“The preponderance of judgments was independent, evidence that under the present conditions the force of the perceived data far exceeded that of the majority.”, Asch, 1956, p.24).
    • Replication effect size: Bond and Smith: d = .92[.89-.96], average rate of incorrect answers: 25%. Friend et al./Griggs: The majority of academic textbooks present the study as evidence for overwhelming conformity, failing to report the evidence of independent tendencies among participants. A common practice seen in many academic textbooks and popular writings is to report the value of “75%” or “76%” as the general indicator of conformity. In reality, this is the fraction of respondents who yielded to the majority in at least one of the twelve trials. The reversal of this value (rarely mentioned in the literature) would be 24% - a fraction of completely independent respondents or 95% - a fraction of respondents who remain independent in at least one of twelve trials.

  • Dynamic norms. Information about increasing minority norms increases interest/engagement in minority behaviour.

  • Social comparison. No robust evidence for an interaction effect between body dissatisfaction and social comparison on fat talk.

    • Status: Mixed
    • Original paper: ‘Who is most likely to fat talk? A social comparison perspective’, Corning and Gondoli 2012; correlational study, n = 143. [citations = 61 (GS, February 2022)].
    • Critiques: Pollet et al. 2021 [n = 189 and n = 371, citations=0(GS, February 2022)].
    • Original effect size: Hedges’ g = 0.36.
    • Replication effect size: Pollet et al.: Hedges’ g (meta-analysis) = 0.02. Hedges’ g = -0.13 to 0.18.

  • Bystander effect: claims that the feeling of responsibility diffuses with an increasing number of other observers. Research about the bystander effect was sparked by the 1964 murder of Catherine “Kitty” Genovese. See this New York Times article for details. Here’s a more detailed resource.

    • Status: mixed
    • Original paper(s): ‘Bystander Interventions in Emergencies: Diffusion of Responsibility’, Darley & Latané 1968, experiment, n = 59. [citations = 4413 (GS, August 2022)].
    • Critiques: Fischer et al. 2011 meta analysis [n = 7700, citations = 963 (GS, August 2022)].
    • Original effect size: not reported.
    • Replication effect size: Fischer et al.: Hedges’ g= -0.35; Although the present meta-analysis shows that the presence of bystanders reduces helping responses, the picture is not as bleak as conventionally assumed. (…) bystander inhibition is less pronounced especially in dangerous emergencies.

  • Colour red on attractiveness. Viewing the colour red enhances men’s attraction to women. On this account, red may act as a lingua franca carrying amorous meaning in the human mating game.

    • Status: mixed
    • Original paper: ‘Romantic red: Red enhances men’s attraction to women’, Elliot and Niesta 2008; experiment, N = 42. [citations = 66 (GS, February 2022)].
    • Critiques: Peperkoorn et al. 2016 [n=830, citations=48 (GS, February 2022)]. Pazda et al. 2021 [experiment 1: n = 116; experiment 2: n = 230; experiment 3: n = 230, citations= 3 (GS, January 2023)].
    • Original effect size: Experiment 1: d = 1.11; Experiment 2: η2p = .08; Experiment 3: attractiveness: η2p = .11, sexual desire: η2p = .19, desired sexual behaviour η2p = .13; Experiment 4: attractiveness: d = 0.73, sexual desire: d = 1.55, desired sexual behaviour d = 1.11; Experiment 5: attractiveness: d = 0.86, sexual desire: d = 1.00, desired sexual behaviour d = 1.11; asking someone on a date: _d_ = 0.95; spending on a date: _d_ = 1.35.
    • Replication effect size: Peperkoorn et al.: Study 1: η2p = .03 (in support of white more attractive than red); Study 2: F = .07; Study 3: d = −.12. Pazda et al.: Experiment 1: sexually receptive: d = .42, attractive: d = .30, sexually appealing d = .47; Experiment 2: sexually receptive: d = .25, attractive: d = .16, perceptions of sexually appealing d = .23; Experiment 3: sexually receptive: d = .75, attractive: d = .54, sexually appealing d = .63.

  • Big brother effect. Being watched makes someone more likely to cooperate.

  • Imagined Contact - Bias. Imagining social contact (instead of having actual contact) with someone from an outgroup (based on e.g., ethnicity, sexuality, religion, age) can reduce intergroup bias.

    • Status: mixed
    • Original paper: ‘Imagining intergroup contact can improve intergroup attitudes’, Turner et al. 2007; three experiments, Study 1: N = 28, Study 2: N = 24, Study 3: N = 27. [citations = 633 (GS, October 2022)].
    • Critiques: Firat and Ataca 2020 [N = 335, citations = 9 (GS, October 2022)]. Hoffarth and Hodson 2016 [Study 1: N = 261, Study 2: N = 320, citations = 36 (GS, October 2022)]. Miles and Crisp 2014 [meta-analysis, k = 71, N = 5,770, citations = 450 (GS, October 2022)].
    • Original effect size: Study 1: d = .42. Study 2: ηp² = 0.20. Study 3: d= 0.86 (as calculated for this entry, using Lakens’ tool).
    • Replication effect size: Firat and Ataca: ηp2 = .01. Hoffarth and Hodson: Study 1 (concerning gay people): many outcomes, largest β = .10; Study 2 (concerning Muslims): many outcomes, largest β = .095. Miles and Crisp: overall d = .35 [0.26, 0.44].

  • Imagined Contact - Intentions. The claim that imagining social contact (instead of having actual contact) with someone from an outgroup (based on e.g., ethnicity, sexuality, religion, age) can increase contact intentions.

    • Status: mixed
    • Original paper: ‘Elaboration enhances the imagined contact effect’, Husnu and Crisp 2010; two experiments, Study 1: n = 33, Study 2: n = 60. [citations = 278 (GS, October 2022)].
    • Critiques: Klein et al. 2014 Many Labs study [n = 6344, citations = 1082 (GS, June 2022)]. Crisp et al. 2014 [citations = 16 (GS, October 2022)] replied to Klein et al., stating that the effect size was significant and comparable to that obtained in the Miles and Crisp 2014 [citations = 450 (GS, October 2022)] meta-analysis for the relevant outgroup, suggesting that the Many Labs project may provide stronger evidence than originally thought.
    • Original effect size: Study 1: d= 0.86, Study 2: d= 1.13.
    • Replication effect size: Klein et al.: d= 0.13 [0.00, 0.19] (NB: original study focused on ‘British Muslims’ - this on Muslims across cultures). Miles and Crisp: d= 0.35 and estimate for religious groups, d= 0.22. Crisp et al.: the observed effect size of 0.13 in the Many Labs study is substantially different from the original Husnu and Crisp study, and from our overall estimate of 0.35, but not from the most appropriate comparison: The meta-analytic estimate for religious outgroups (0.22).

  • Stereotype susceptibility effects. Awareness of stereotypes about a person’s in-group can affect a person’s behaviour and performance when they complete a stereotype-relevant task.​

  • Positive mood-boost helping effect. People are more likely to do good when feeling good.

    • Status: mixed
    • Original paper: Isen and Levin 1972; experiment, Experiment 1: n = 52 male undergraduates, Experiment 2: n = 41 adults. [citations=1,881 (GS, October 2022)]​.
    • Critiques: Batson et al. 1979 [n = 40, citations=132 (GS, June 2022)]. Blevins and Murphy 1974 [n = 51, citations=50 (GS, October 2022)]. Carlson et al. 1988; meta-analysis [k = 61 from 34 papers (N not reported), citations = 862(GS, March 2023)]​. Weyant and Clark 1977 [Study 1 n = 64, Study 2 n = 106, citations=39 (GS, October 2022)]. Failed replications: Job 1987 [n=100 letters placed under the windshield wipers of cars, citations=38(GS, March 2023)].​
    • Original effect size, calculated: Study 1: OR = 2.25, Study 2: OR = 168 [no typo, both calculated].
    • Replication effect size: Batson et al.: OR = 4.3 [calculated]. Carlson et al.: d= .54 [reported]. Weyant & Clark: Study 1: OR = 4.2 (calculated, between dime and no-dime, excl. 2 other conditions), Study 2: OR = 0.7 [calculated]. Blevins & Murphy: OR = 0.9 [calculated]. Job: negative mood increases helping behaviour so that control vs neutral might be insufficient.
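
The odds ratios marked "calculated" above come from 2×2 tables of helping versus not helping in each condition. A minimal sketch of that calculation follows. The cell counts are not given in this entry; they are assumed here for illustration, chosen so that they sum to the listed n = 41 for Study 2 and reproduce the listed OR = 168. The function name `odds_ratio` is likewise hypothetical.

```python
def odds_ratio(helped_a: int, not_helped_a: int,
               helped_b: int, not_helped_b: int) -> float:
    """Odds ratio for a 2x2 table: odds of helping in condition A
    divided by the odds of helping in condition B."""
    if not_helped_a == 0 or helped_b == 0:
        raise ValueError("zero cell: the odds ratio is undefined without a correction")
    # Cross-product form (a * d) / (b * c) of the odds ratio.
    return (helped_a * not_helped_b) / (not_helped_a * helped_b)


# Assumed counts (not stated in the entry above): 14 of 16 people helped in the
# positive-mood condition, 1 of 25 helped in the control condition.
study2_or = odds_ratio(helped_a=14, not_helped_a=2, helped_b=1, not_helped_b=24)
print(study2_or)  # 168.0
```

With cells this small, a single additional helper in the control condition would roughly halve the odds ratio, which is one reason such a large value is compatible with a small sample.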

  • Superiority-of-unconscious decision-making effect (deliberation without attention effect). While conscious reflection produces better choices on simple tasks, complex choices “should be left to unconscious thought”.​

    • Status: mixed
    • Original paper: ‘On Making the Right Choice: The Deliberation-Without-Attention Effect’, Dijksterhuis et al. 2005; two studies (Study 1: n = 80, Study 2: n = 59) that show better choices (and two surveys that show greater satisfaction, not the focus here). [citations = 1807 (GS, October 2022)].
    • Critiques: Meta-analysis: Acker 2008 [n = 888 across 17 studies, citations = 233 (GS, November 2022)]. Meta-analysis: Nieuwenstein et al. 2015 [n = 4518 across 67 studies, citations = 103 (GS, November 2022)].
    • Original effect size: All reported in Acker: Study 1: ηp2 = 0.06 / g = 0.434; Study 2: ηp2 = 0.11 / g = 0.242, for the interaction between choice complexity and deliberation. Main effects and descriptives not reported.
    • Replication effect size: All reported in Acker: Acker: g = 0.471. Ham et al.: g = 0.883 to g = 1.055. Lerouge: g = -0.064 to g = 1.116. Newell et al.: g = -0.504 to g = 0.722. Payne et al.: g = -0.483 to g = 0.722. Phillips et al.: g = -0.251. The mean effect size was g = .251.​ All reported in Nieuwenstein et al.: Abadie et al.: g = -0.62 to g = 0.22. Aczel et al.: g = -0.35. Ashby et al.: g = -0.21 to g = 1.00. Bos et al.: g = -0.10 to g = 1.48. Calvillo and Penaloza: g = -0.29 to g = -0.09. Dijksterhuis: g = 0.24 to g = 0.42. Dijksterhuis et al.: g = 0.70 to g = 0.86. González et al.: g = 0.00. Hasford: g = 0.43. Hess et al.: g = -0.14. Huizenga et al.: g = -0.50 to g = -0.33. Lassiter et al.: g = 0.27 to g = 0.51. Lerouge: g = 0.38 to g = 0.47. McMahon et al.: g = 0.62 to g = 0.67. Messner et al.: g = 0.63. Newell et al.: g = -0.50 to g = 0.17. Newell and Rakow: g = -0.37 to g = 0.31. Nieuwenstein and Van Rijn: g = -0.74 to g = 0.87. Nieuwenstein et al.: g = -0.01. Nordgren et al.: g = 0.27 to g = 0.36. Payne et al.: g = -0.10. Queen and Hess: g = -0.21. Rey et al.: g = 0.27. Smith et al.: g = 0.25 to g = 0.32. Strick et al.: g = 0.58 to g = 1.21. Thorsteinson and Withrow: g = 0.18 to g = 0.34. Usher et al.: g = 0.78 to g = 1.04. Waroquier et al.: g = -0.09 to g = 0.35. Pooled effect size of g = 0.15 [0.03, 0.26].

  • Behavioural-consequences-of-automatic-evaluation (affective compatibility effect). Automatic classification of stimuli as either good or bad has direct behavioural consequences. Automatic evaluation results directly in behavioural predispositions toward the stimulus, such that positive evaluations produce immediate approach tendencies, and negative evaluations produce immediate avoidance tendencies.

    • Status: mixed
    • Original paper: ‘Consequences of Automatic Evaluation: Immediate Behavioral Predispositions to Approach or Avoid the Stimulus’, Chen and Bargh 1999; two mixed design experiments, Study 1: n = 42, Study 2: n = 50. [citations = 1943 (GS, October 2022)].
    • Critiques: Rotteveel et al. 2015 [Study 1: n=100, Study 2: n=50, citations = 35(GS, October 2022)]. Meta-analysis: Phaf et al. 2014 [N=1538 across 29 studies, citations=271(GS, October 2022)].
    • Original effect size: Study 1 (conscious evaluation) – congruence factor main effect ηp2 = 0.168 / d = 0.44 [ηp2 calculated from reported F statistic and converted using this conversion]; Study 2 (automatic evaluation) – congruence factor main effect ηp2 = 0.078 / d = 0.29 [ηp2 calculated from reported F statistic and converted using this conversion].
    • Replication effect size: Rotteveel et al.: Study 1 – Evaluative judgement × Lever movement interaction effect ηp2 = 0.030 [reported, non-significant] / d = 0.17 [converted using this conversion], Study 2 – Affective valence × Lever movement interaction effect ηp2 = 0.057 [reported, marginally significant] / d = 0.24 [converted using this conversion]. Phaf et al.: Positive emotions – The average effect size differed significantly from zero for explicit instructions to evaluate (g = 0.287; p = 0.0001; 95% CI = 0.204, 0.369) and for explicit-converted instructions (g= 0.287; p= 0.0001; 95% CI = 0.146, 0.429), but not for implicit instructions (g= 0.028; p= 0.572) [all reported]. Negative emotions – Effect sizes differed significantly from zero for explicit-converted instructions (g= 0.389; p= 0.001; 95% CI = 0.155, 0.624) and for explicit instructions (g= 0.249; p = 0.0001; 95% CI = 0.159, 0.339), but not for implicit instructions (g= 0.103; p= 0.0959) [all reported]. Both emotions – The average effect size differed significantly from zero for explicit-converted instructions (g= 0.433; p = 0.0001; 95% CI = 0.295, 0.571) and explicit instructions (g= 0.403; p = 0.0001; 95% CI = 0.286, 0.521), but not for implicit instructions (g= 0.076; p= 0.148) [all reported].
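
Several entries convert ηp2 into a standardised mean difference. The conversion tool linked in the entry is not reproduced here, so the sketch below uses the textbook relations instead: Cohen's f = sqrt(ηp2 / (1 − ηp2)) and, for two equal-sized groups, d = 2f. The function names are hypothetical, and whether the linked conversion reports f or d is not established here. Under these relations, the second figure in each original-effect pair above (0.44 and 0.29) is close to f.

```python
import math


def eta_p2_to_f(eta_p2: float) -> float:
    """Cohen's f from partial eta squared: f = sqrt(eta_p2 / (1 - eta_p2))."""
    if not 0.0 <= eta_p2 < 1.0:
        raise ValueError("partial eta squared must lie in [0, 1)")
    return math.sqrt(eta_p2 / (1.0 - eta_p2))


def f_to_d_two_groups(f: float) -> float:
    """Cohen's d for two equal-sized groups, using d = 2 * f (an assumption
    that does not hold for designs with more than two groups)."""
    return 2.0 * f


# Partial eta squared values listed for the original Study 1 and Study 2.
for label, eta_p2 in [("Study 1", 0.168), ("Study 2", 0.078)]:
    f = eta_p2_to_f(eta_p2)
    print(f"{label}: eta_p2 = {eta_p2:.3f}, f = {f:.3f}, 2f = {f_to_d_two_groups(f):.3f}")
```

Reporting both f and 2f side by side makes it easy to see which scale a published "d" was computed on before comparing original and replication effects.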

  • Self-control relies on glucose effect. Acts of self-control decrease blood glucose levels; low levels of blood glucose predict poor performance on self-control tasks; initial acts of self-control impair performance on subsequent self-control tasks, but consuming a glucose drink eliminates these impairments.

    • Status: mixed
    • Original paper: ‘Self-control relies on glucose as a limited energy source: Willpower is more than a metaphor’, Gailliot et al. 2007; 9 experiments with: Study 1 (self-control decreases blood glucose): n= 103; Study 2 (self-control decreases blood glucose): n= 37; Study 3 (low levels of blood glucose predict poor performance on self-control tasks): n= 15; Study 4 (low levels of blood glucose predict poor performance on self-control tasks): n= 10; Study 5 (low levels of blood glucose predict poor performance on self-control tasks): n= 19; Study 6 (low levels of blood glucose predict poor performance on self-control tasks): n= 15; Study 7 (glucose consumption): n= 61; Study 8 (glucose consumption): n= 72; Study 9 (glucose consumption): n= 17. [citations=1956(GS, June, 2022)].
    • Critiques: Meta-analysis: Hagger et al. 2010 [citations = 2638 (GS, June 2022)]. Lange and Egger 2014 [n = 70, citations = 114 (GS, June 2022)]. Lange and Egger also point to statistical mistakes in the meta-analysis of Hagger et al.
    • Original effect size: Study 1 (self-control decreases blood glucose): ηp2 = 0.057 [calculated from the reported F(1, 100) = 6.08 using this conversion]; Study 2: discussing a sensitive topic with a member of a different race used up a significant amount of glucose among people with low scores on the Internal Motivation to Respond Without Prejudice scale (IMS), b = -3.28; Study 3 (low levels of blood glucose predict poor performance on self-control tasks): r = -0.62; Study 4 (low levels of blood glucose predict poor performance on self-control tasks): r = 0.56; Study 5 (low levels of blood glucose predict poor performance on self-control tasks): r = 0.45; Study 6 (low levels of blood glucose predict poor performance on self-control tasks): r = 0.43; Study 7 (glucose consumption): ηp2 = 0.081 [calculated]; Study 8 (glucose consumption): ηp2 = 0.073 [calculated]; Study 9 (glucose consumption): d = 1.518 [calculated].
    • Replication effect size: Hagger et al.: for glucose consumption: d = 0.75 (includes the original study); for decrease of blood glucose levels: d= -0.87 (includes the original study). Lange & Egger: for glucose consumption: ηp2 = 0.02.
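
The ηp2 values marked "calculated" in this entry are derived from reported F statistics. A minimal sketch of the standard relation, ηp2 = (F × df_effect) / (F × df_effect + df_error), is shown below with the F(1, 100) = 6.08 reported for Study 1. The function name is hypothetical, and the linked conversion tool itself is not reproduced here.

```python
def partial_eta_squared(f_stat: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic and its degrees of freedom."""
    if f_stat < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    return (f_stat * df_effect) / (f_stat * df_effect + df_error)


# Study 1 of the original paper: F(1, 100) = 6.08.
study1_eta_p2 = partial_eta_squared(6.08, df_effect=1, df_error=100)
print(round(study1_eta_p2, 3))  # 0.057
```

The result matches the ηp2 = 0.057 listed for Study 1, so the same function can be used to check other entries where only F and the degrees of freedom are reported.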

  • Physical warmth promotes interpersonal warmth. Exposure to physical warmth will lead to more positive judgments of strangers and an increase in prosocial behaviour (e.g., gift-giving).

    • Status: not replicated.
    • Original paper: ‘Experiencing physical warmth promotes interpersonal warmth’, Williams and Bargh 2008; between-subjects experiments, n1=41, n2=53 [citations = 1,894 (GS, October 2022)].
    • Critiques: Chabris et al. 2018 [Experiment 1 (attempted to replicate Experiment 1 of Williams and Bargh 2008): n = 128, Experiment 2 (attempted to replicate Experiment 2 of Williams and Bargh 2008): n = 177, citations = 53 (GS, October 2022)]. Lynott et al. 2014 [Sample 1: n = 306 (Ohio, USA), Sample 2: n = 250 (Michigan State University, USA), Sample 3: n = 305 (University of Manchester, UK), citations = 140 (GS, October 2022)] (Note: All samples attempted to replicate Experiment 2 of Williams and Bargh 2008).
    • Original effect size: Experiment 1 (estimated from test-statistic): d = 0.65 (people tended to give more positive ratings after holding a warm drink), Experiment 2 (converted from Lynott et al. 2014’s OR reported for this study): d = 0.65 (people were more likely to give a gift to a friend than themselves after holding a warming pad).
    • Replication effect size: Chabris et al.: Experiment 1: d = -0.06 (not replicated, converted from r statistic reported), Experiment 2: d = 0.04 (not replicated, converted from r statistic reported). Lynott et al.: Sample 1: d = -0.27 (opposite direction, converted from OR reported in paper), Sample 2: d = -0.05 (not replicated, converted from OR reported in paper), Sample 3: d = -0.14 (not replicated, converted from OR reported in paper).
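
The d values in this entry were converted from odds ratios and from r statistics. The entry does not say which formulas were used, so the sketch below assumes two common ones: the logit method, d = ln(OR) × √3 / π, and d = 2r / √(1 − r²). The function names and the input values in the usage lines are hypothetical and are not taken from Chabris et al. or Lynott et al.

```python
import math


def odds_ratio_to_d(odds_ratio: float) -> float:
    """Convert an odds ratio to Cohen's d with the logit method:
    d = ln(OR) * sqrt(3) / pi."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return math.log(odds_ratio) * math.sqrt(3.0) / math.pi


def r_to_d(r: float) -> float:
    """Convert a correlation r to Cohen's d: d = 2r / sqrt(1 - r^2)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 2.0 * r / math.sqrt(1.0 - r * r)


# Hypothetical inputs, for illustration only.
print(f"OR = 0.50 -> d = {odds_ratio_to_d(0.50):.2f}")
print(f"r  = -0.03 -> d = {r_to_d(-0.03):.2f}")
```

An odds ratio below 1 or a negative r yields a negative d, which is how an effect in the opposite direction to the original shows up after conversion.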

  • Power impairs perspective-taking effect. Individuals made to feel high in power were more likely to inaccurately assume that others view the social world from the same perspective as they do.

    • Status: not replicated
    • Original paper: ‘Power and Perspectives Not Taken’, Galinsky et al. 2006; 3 between-subjects experiments, each with two conditions, Experiment 1: n = 57, Experiment 2a: n = 42, Experiment 2b: n = 51, Experiment 3: n = 70. [citations = 1550 (GS, June 2022)].
    • Critiques: Experiment 2a: Ebersole et al. 2016 [n = 2,969, citations = 438 (GS, June 2022)].
    • Original effect size: d = .77 [0.12,1.41] obtained from Ebersole et al. (2016).
    • Replication effect size: Ebersole et al.: d = .03 [−0.04, 0.10].

  • Status-legitimacy effect. Members of low-status, disadvantaged, and marginalised groups are more likely to perceive their social systems as legitimate than their high-status and advantaged counterparts under certain circumstances. People who are most disadvantaged by the status quo, due to the greatest psychological need to reduce ideological dissonance, are most likely to support, defend, and justify existing social systems, authorities, and outcomes.

    • Status: mixed
    • Original paper: ‘Social inequality and the reduction of ideological dissonance on behalf of the system: evidence of enhanced system justification among the disadvantaged’, Jost et al. 2003; five cross-sectional / correlational studies, Study 1: n = 1345, Study 2: n = 2485, Study 3: n = 1396, Study 4: n = 2223, Study 5: n = 788. [citations = 927 (GS, October 2022)].
    • Critiques: Brandt 2013 [n = 151,794, citations = 271 (GS, October 2022)]. Caricati 2017 [n = 38,967, citations = 50 (GS, October 2022)]. Henry and Saul 2006 (https://link.springer.com/article/10.1007/s11211-006-0012-x) [n = 356, citations = 156 (GS, October 2022)].
    • Original effect size: Study 1 – effect of income, b = -0.22, race (European Americans vs. African Americans), b = -0.73, and education, b = -0.30, on willingness to limit the press; effect of income, b = -0.31, race (European Americans vs. African Americans), b = -1.01, and education, b = -0.38, on attitudes toward the rights of citizens; Study 2 – effect of income, b = 0.06, and education, b = -0.08, on trust in government officials among Latinos; Study 3 – effect of income on belief that large income differences are necessary to get people to work hard, b = 0.04, and as an incentive for individual effort, b = 0.02; Study 4 – main effects of region (North vs. South), ηp2 = 0.128 / d = 0.38, and income, ηp2 = 0.09 / d = 0.31, on meritocratic beliefs among African Americans [ηp2 calculated from the reported F statistic and converted using this conversion]; Study 5 – effect of socio-economic status, b = -0.34, and race (White versus Black), b = -0.25, on legitimation of income inequality.
    • Replication effect size: Henry and Saul: group status effects on the support for of the dissent, ηp2 = 0.019 / d = 0.14, government approval, ηp2 = 0.024 / d = 0.16, and alienation from government, ηp2 = 0.024 / d = 0.16 [ηp2 calculated from the reported F statistic and converted using this conversion] (replicated).​ Caricati: effects of the top-bottom self-placement, b = 0.117, social class, b = 0.075, and personal income, b = 0.022, on perceived fairness of income distribution [all significant, reversed]. Brandt: effects of income on trust in government and confidence in societal institutions in various multilevel regression models b= -0.014 to b= 0.005 [all non-significant, not replicated]; effects of education on trust in government and confidence in societal institutions in various multilevel regression models b= -0.044 [significant, replicated] to b= 0.021 [significant, reversed]; effects of social class on trust in government and confidence in societal institutions in various multilevel regression models b= 0.055 [significant, reversed] to b= 0.110 [significant, reversed]; effects of race on trust in government and confidence in societal institutions in various multilevel regression models b= -0.019 [non-significant, not replicated] to b= 0.017 [significant, reversed]; Overall, only one effect out of the 14 was supportive, six effects were significant and positive (reversed) and the remaining seven effects were not significantly different from zero.

  • Red impairs cognitive performance. The colour red impairs performance on achievement tasks, as red is associated with the danger of failure and evokes avoidance motivation.

  • Reduced prosociality of high SES effect. Higher socioeconomic status predicts decreased prosocial behaviour. Affluence may be linked with reduced empathy and poverty may be linked with increased empathy.

    • Status: mixed
    • Original paper: ‘Having less, giving more: the influence of social class on prosocial behavior’, Piff et al. 2010; correlational and experimental design: self-report and behavioural measure of altruism, total N = 394. [citations=1633(GS, October 2022)]​.
    • Critiques: Andreoni et al. 2021 field experiment [n=360, citations=27(GS, October 2022)]. Stamos et al. 2020, preregistered replications [Study 1 n=300, Study 2 n=200, citations=25(GS, October 2022)].
    • Original effect size: mean r= −0.215.
    • Replication effect size: Andreoni et al.: mean r=.37 (reversed).​ Stamos et al.: r =0.01 (non-significant).

  • Moral licensing effect (self-licensing, moral self-licensing, licensing effect) is the effect that acting in a moral way makes people more likely to excuse and perform subsequent immoral, unethical, or otherwise problematic behaviours.

    • Status: not replicated
    • Original paper: ‘Sinning Saints and Saintly Sinners’, Sachdeva et al. 2009; three experiments using a priming-task where participants write a story about themselves using neutral/negative/positive traits, US student sample, Study 1 & 3: n = 46. [citations=919 (GS, June 2022)].
    • Critiques: Blanken et al. 2014 (direct replication of 2 of the original studies, 3 replication studies with 2 different populations) [Study 1: n = 105, Study 2: n = 150, Study 3: n = 940, citations = 81(GS, June 2022)]. Blanken et al. 2015 [meta-analysis, total n = 7,397, citations = 470(GS, June 2022)]. Simbrunner and Schlegelmilch 2015 [meta-analysis, k = 106 (n data points not reported), citations = 37(GS, June 2022)]. Kuper and Bott 2019 [re-analysis of the meta-analyses above, adjustment for publication bias, k=76 citations = 27(GS, June 2022)]. Urban et al. 2019 [failed conceptual replication of Mazar and Zhong 2010, moral licensing in the domain of environmental behaviour, 3 studies, total n  =  1274, citations = 62(GS, March 2023)]. Rotella and Barclay 2020 [failed pre-registered conceptual replication of the effect, n = 562, citations = 21(GS, March 2022)].
    • Original effect size: Study 1: d = 0.62 [-0.11, 1.35]; Study 3: d = 0.59 [-0.12, 1.30] (effect sizes taken from replication paper by Blanken et al.).
    • Replication effect size: Blanken et al. 2014: replication Study 1 (Dutch student sample): d = -0.03 [-0.51, 0.45]; replication Study 2 (Dutch student sample): d = -0.31 [-0.70, 0.08]; replication Study 1 & 3 (US MTurk sample): d = 0.05 [-0.15, 0.25]. Blanken et al. 2015: meta-analysis, mean effect of d = 0.31 [0.23, 0.38]. Kuper and Bott: adjusted effect sizes: d = -0.05 (PET-PEESE) and d = 0.18 (3-PSM). Simbrunner and Schlegelmilch: mean effect of d = 0.319 [0.229, 0.408].

  • Colour on approach/avoidance. Red (versus blue) colour induces primarily an avoidance (versus approach) motivation and enhances performance on a detail-oriented task, whereas blue enhances performance on a creative task.​

    • Status: not replicated
    • Original paper: ‘Blue or Red? Exploring the Effect of Color on Cognitive Task Performances’, Mehta and Zhu 2009; six studies, studies 1-5 between-subject experiments, study 6 correlational, Study 1: n = 69, Study 2: n = 208, Study 3: n = 118, Study 4: n = 42, Study 5: n = 161, Study 6: n = 68. [citations=1003 (GS, November 2022)]​.
    • Critiques: Steele et al. 2010 direct replication of Mehta and Zhu Study 1 [n = 172, citations = 2 (GS, November 2022)]. Steele et al. 2013 direct replication of Mehta and Zhu Study 1 [n = 263, citations = 2 (GS, November 2022)]. Steele 2014 direct replication of Mehta and Zhu Study 1 [n = 263, citations = 45 (GS, November 2022)].
    • Original effect size: Study 1 – blue versus red condition comparison for approach-related anagrams, d = 0.81, and for avoidance-related anagrams, d = 0.96; Study 2 – blue versus red condition on detail-oriented task, d = 0.64, and on creative task, d = 0.6; Study 3 – blue versus red condition on detail-oriented task, d = 1.05, and on creative task, d = 0.69; Study 4 – blue versus red condition on the practicality of the designed toy, d = 0.64, and on the originality/novelty of the designed toy, d = 0.67; Study 5 – blue versus red condition on detail-oriented processing style, d = 0.42, and creative thinking, d = 0.56.
    • Replication effect size: Steele et al. 2010: colour by word-type interaction ηp2 = 0.038 / d = 0.20 [ηp2 calculated from the reported F statistics and converted using this conversion] (not replicated). Steele et al. 2013: colour by word-type interaction ηp2 = 0.014 / d = 0.12 [ηp2 calculated from the reported F statistics and converted using this conversion] (not replicated). Steele 2014: colour by word-type interaction ηp2 = 0.007 [reported] / d = 0.083 [converted using this conversion] (not replicated).

  • Playboy Effect. Men exposed to erotic images of the opposite sex will report lower ratings of love for their partner and lower ratings of their partner’s sexual attractiveness compared to men exposed to abstract art. This effect was not found in women in either the original or replication attempts.

    • Status: not replicated
    • Original paper: ‘Influence of Popular Erotica on Judgments of Strangers and Mates’, Kenrick et al. 1989; between-subjects design, Experiment 2: n = 30. [citations = 399 (GS, October 2022)].
    • Critiques: Balzarini et al. 2017 [Experiment 1: n = 124, Experiment 2: n = 170, Experiment 3: n = 121, meta-analysis n = 445, citations = 37 (GS, October 2022)].
    • Original effect size: Reduced sexual attraction to partner (d= 1.05), reduced love for partner (d= 0.77). Effect sizes estimated from test-statistics reported in paper.
    • Replication effect size: Balzarini et al.: Sexual attraction to partner: Experiment 1: d= 0.07 [-0.29, 0.42] (not replicated); Experiment 2: d= -0.10 [-0.40, 0.20] (opposite direction, but non-significant); Experiment 3: d= -0.15 [-0.51, 0.21] (opposite direction, but non-significant); Meta-analysis: d= 0.02 [-0.21, 0.24] (not replicated); Love for partner: Experiment 1: d= -0.19 [-0.55, 0.16] (opposite direction, but non-significant); Experiment 2: d= -0.10 [-0.40, 0.20] (opposite direction, but non-significant); Experiment 3: d= 0.16 [-0.20, 0.52] (not replicated); Meta-analysis: d= 0.02 [-0.22, 0.26] (not replicated). (Note: For the effects reported, only male effects are considered given these were the only significant effects found. As such, the number of subjects reported for the studies and the effect sizes account for only the male participants.)

  • Self-protective subjective temporal distance effect. Participants reported that negative events in their own lives felt farther away than positive events in their own lives, and this effect was stronger for participants higher in self-esteem.

    • Status: not replicated
    • Original paper: ‘It feels like yesterday: Self-esteem, valence of personal past experiences, and judgments of subjective distance’, Ross and Wilson 2002; Three studies: Study 1: N = 557 was a correlational study; Study 2: N = 357 was an experiment with two main predictors: recalled grade condition (best vs. worst; between-subjects) and self-esteem (measured); Study 3: N = 107 was an experiment with three main predictors: agent (self vs. acquaintance; between-subjects), valence of recalled experience (positive vs. negative; between-subjects), and self-esteem (measured). [citations = 462 (GS, June 2022)]. Study 2 was the one Many Labs 3 replicated.
    • Critiques: Ebersole et al. 2016 [n = 3433, citations = 438 (GS, June 2022)].
    • Original effect size: ηp2 = .0185 (based on transforming from beta of -.136).
    • Replication effect size: Ebersole et al.: ηp2 = .0001.
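
The original effect size in this entry was obtained by transforming a beta of -.136 into ηp2. The listed value is numerically consistent with simply squaring the standardised coefficient. The sketch below assumes that this is the transformation meant, which the entry does not confirm; the function name is hypothetical.

```python
def beta_to_eta_p2_approx(beta: float) -> float:
    """Approximate eta squared for a single predictor by squaring its
    standardised regression coefficient (an assumption, see lead-in)."""
    return beta * beta


# Beta reported for the original Study 2 interaction.
original_eta_p2 = beta_to_eta_p2_approx(-0.136)
print(round(original_eta_p2, 4))  # 0.0185
```

Squaring discards the sign of the coefficient, so the direction of the effect has to be tracked separately when comparing the original estimate with the replication.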

  • Trait loneliness hot shower effect. People self-regulate their feelings of social warmth (connectedness to others) through applications of physical warmth of shower or bath, without explicit awareness of this substitution. Loneliness as a form of “social coldness” can be relieved by applying physical warmth.

    • Status: not replicated
    • Original paper: ‘The substitutability of physical and social warmth in daily life’, Bargh and Shalev 2012; 4 experiments, n=403 across 4 experiments. [citations=414(GS, October 2022)]​.
    • Critiques: Donnellan et al. 2014 replicated Study 1 [n = 3073 across 9 studies, citations = 104 (GS, October 2022)]. See also the reply to Donnellan et al. 2014 by Shalev and Bargh 2015 [n = 555 across three samples, citations = 6 (GS, October 2022)]. Wortman et al. 2014 replicated Study 2 [n = 260, citations = 19 (GS, October 2022)].
    • Original effect size: r = .57 (Study 1a; n = 51) and r = .37 (Study 1b; n = 41).
    • Replication effect size: Donnellan et al.: r = -.01 to .10 (but statistically indistinguishable from zero). Shalev and Bargh: loneliness-warmth index correlation for showering r = .143 and for baths r = .093 (replicated). Wortman et al.: warm vs. cold condition d = 0.02 [reported, non-significant].

  • American flag priming boosts Republican support. Subtle exposure to the American flag causes people to report more conservative, Republican beliefs and attitudes.

    • Status: not replicated
    • Original paper: ‘A Single Exposure to the American Flag Shifts Support Toward Republicanism up to 8 Months Later’, Carter et al. 2011; two experiments; Experiment 1: N = 235 in Session 1 (exposure to prime), 197 in Session 2 (appx. two weeks later, right before the 2008 presidential election), 191 in Session 3 (the week after this election), 75 in Session 4 (eight months after this election); Experiment 2: N = 70. [citations = 197 (GS, June 2022)].
    • Critiques:​ Klein et al. 2014 [n = 6344, citations = 1082 (GS, June 2022)]
    • Original effect size: d= .50
    • Replication effect size: Klein et al.: median d = .02.

  • Superstition boosts performance effect. The irrational belief that certain objects (e.g., lucky charms) or beliefs (e.g., religion) will benefit performance in a task.​

    • Status: not replicated
    • Original paper: ‘Keep Your Fingers Crossed!: How Superstition Improves Performance’, Damisch et al. 2010; between-subjects, Experiment 1: n = 28, Experiment 2: n = 51, Experiment 3: n = 41, Experiment 4: n = 31. [citations = 386 (GS, February 2023)].
    • Critiques: Aruguete et al. 2012 [Experiment 1: n = 141, Experiment 2: n = 139, citations = 12 (GS, October 2022)]. Calin-Jageman and Caldwell 2014 [Experiment 1: n = 124, Experiment 2: n = 111, Meta-analysis: n = ~719 participants, k = 11 studies, citations = 42 (GS, October 2022)]. Dickhäuser et al. 2020 [Experiment 1: n = 101, Experiment 2: n = 175, citations = 0 (GS, October 2022)]. Lee et al. 2011 [n = 40, citations = 78 (GS, October 2022)].
    • Original effect size: Experiment 1: d = 0.83; Experiment 2: d = 0.72 to d = 0.98; Experiment 3: d = 0.66; Experiment 4: d = 0.77.
    • Replication effect size: Aruguete et al.: Experiment 1: d = -0.07 (estimated from test-statistic regarding logical reasoning test) (not replicated); Experiment 2: ηp2 = 0.01 (estimated from test-statistic regarding logical reasoning test before exploratory analysis) (not replicated). Calin-Jageman and Caldwell: Experiment 1: d = 0.05 (not replicated); Experiment 2: d = 0.05 (not replicated); Meta-analysis: d = 0.40 [0.14, 0.65] (notably, this is heavily biased by the effect size estimates of Damisch et al., 2010). Dickhäuser et al.: ES = NA. Unable to access article, but the abstract suggests both studies failed to replicate the Damisch et al. (2010) effect (not replicated). Lee et al. (Supplementary Materials): d = 0.74 (replicated).

  • Unethicality darkens perception of light (El Greco fallacy). Recalling abstract concepts such as evil (as exemplified by unethical deeds) and goodness (as exemplified by ethical deeds) can influence the sensory experience of the brightness of light. Recalling unethical behaviour led participants to see the room as darker and to desire more light-emitting products (e.g., a flashlight) compared to recalling ethical behaviour.

    • Status: not replicated.
    • Original paper: ‘Is It Light or Dark? Recalling Moral Behavior Changes Perception of Brightness’, Banerjee et al. 2012; two between-subjects experiments, Experiment 1: n = 40, Experiment 2: n = 74. [Citations= 194 (GS, October 2022)]​.
    • Critiques: Brandt et al. 2014 [online Study 1: n=475, online Study 2: n=482, lab Study 1: n=100, lab Study 2: n=121; meta-analysis: k=11, N not reported, citations=31(GS, October 2022)]. Firestone & Scholl 2013 [Experiment 4 n=89, Experiment 5 n=91, citations=266(GS, October 2022)].
    • Original effect size: perceived brightness – d = 0.65 [reported], estimated watts – d = 0.64 [reported], lamp preference – d = 1.23 [reported], candle preference – d = 0.79 [reported], flashlight preference – d = 1.33 [reported].
    • Replication effect size: All effect sizes reported in Brandt et al.: ​Brandt et al.: d= 0.12 [-0.46, 0.10] (non-significant, online study 1). Brandt et al.: d= -0.11 [-0.50, 0.28] (non-significant, lab study 1). estimated watts –Brandt et al.​: d= 0.05 [-0.15, 0.25] (online study 2). Brandt et al.: d= 0.03 [-0.36, 0.42] (non-significant, lab study 2). lamp preference – Brandt et al.​: d= -0.03 [-0.23, 0.17] (non-significant, online study 2). Brandt et al.: d= -0.11 [-0.35, 0.33] (non-significant, lab study 2). candle preference –Brandt et al.: d= 0.03 [-0.31, 0.37] (non-significant, online study 1). Brandt et al.: d= 0.01 [-0.33, 0.35] (non-significant, lab study 2). flashlight preference –Brandt et al.: d= -0.10 [-0.30, 0.10] (non-significant, online study 2). Brandt et al.: d= -0.09 [-0.25, 0.43] (non-significant, lab study 2). Meta-analytic estimate: effects on brightness judgements mean d = 0.14 [0.002, 0.28], desirability of light-emitting products mean effect size of d = 0.13 [-0.04, 0.29]. perceived brightness – Firestone and Scholl: d= 0.38 [-0.06, 0.82] (non-significant, study 4)​. Firestone and Scholl: d= 0.46 [0.02, 0.90] (study 5).

  • Fertility on voting (Ovulation effect). The ovulatory (or high-fertility) phase of the menstrual cycle affects voting preferences and has different effects on women who are single than on women who are in committed relationships. Single women were more likely to vote for Barack Obama (liberal/Democrat candidate) if they were ovulating than if they were not, while the opposite was true for women in committed relationships: ovulation made them more likely to vote for Mitt Romney (conservative/Republican candidate).

    • Status: mixed
    • Original paper: ‘The fluctuating female vote: politics, religion, and the ovulatory cycle’, Durante et al. 2013; between-subjects design, two studies, Study 1: n = 275 women, Study 2: n = 502 women. [Citations = 117 (GS, October 2022)].
    • Critiques: Harris et al. 2014 [n = 1,206, citations=15 (GS, October 2022)].​
    • Original effect size: single women d = 0.32 [reported], women in relationships d = 0.37 [reported]​.
    • Replication effect size: Harris et al.: hypothetical voting preferences – single women d = 0.01 [reported; non-significant], women in relationships d = 0.37 [reported]​; actual voting behaviour - single women d = 0.40 [reported], women in relationships d = 0.02 [reported; non-significant]​.​

  • Modulation of 1/f noise on the weapon identification task. Making an effort to modulate the use of racial information decreases the emission of 1/f noise.

    • Status: not replicated
    • Original paper: ‘1/f noise and effort on implicit measures of bias’, Correll 2008; mixed-model design experiment, Study 2: n= 71. [citations=88 (GS, October 2022)].
    • Critiques: Madurski and LeBel 2015 [Sample 1: n=148, Sample 2: n=148, citations=6 (GS, October 2022)].
    • Original effect size: d=.59.
    • Replication effect size: Madurski and LeBel: Sample 1: d=.16; Sample 2: d=-0.09.​

  • Time is money effect. Putting a price on time can influence enjoyment of leisure activities as individuals get more impatient if they are compensated for engaging in these activities. ​

    • Status: not replicated
    • Original paper: ‘Time, money, and happiness: How does putting a price on time affect our ability to smell the roses?’, DeVoe et al. 2012; 3 experimental studies, Study 1: N = 53; Study 2: N = 401; Study 3: N = 205. [citations = 119 (GS, June 2022)].
    • Critiques: Connors et al. 2016 [replication attempt 1: N = 266; replication attempt 2: N = 254; citations = 29(GS, June, 2022)].​
    • Original effect size: Study 1: ηp2=.119; Study 2: ηp2 =.019; Study 3: ηp2 =.031.
    • Replication effect size: Connors et al.: Replication of Study 3, attempt 1: ηp2 = .026; attempt 2: ηp2 = .010.

  • Embodiment of secrets (secrets-as-burdens). Secrets are experienced as physical burdens, influencing how people perceive and act in the world.​ People who recalled, were preoccupied with, or suppressed an important secret estimated hills to be steeper and perceived distances to be farther.

    • Status: mixed.
    • Original paper: ‘The Physical Burdens of Secrecy’, Slepian et al. 2012; studies 1, 2 and 4 experimental mixed model design, study 3 correlational, study 1 n = 40, study 2 n = 36, study 3 n = 40, study 4 n = 30. [citations=113 (GS, November 2022)]​.
    • Critiques: LeBel and Wilbur 2014, direct replication of Slepian et al. 2012 Study 1 [Study 1 n = 240, Study 2 n = 90, citations = 24 (GS, November 2022)]. Pecher et al. 2015, direct replication of Slepian et al. 2012 Study 1 and Study 2 [Study 1 n = 100, Study 2 n = 100, Study 3 n = 118, citations = 11 (GS, November 2022)]. Slepian et al. 2014 [Study 1 n = 83, Study 2 n = 174, citations = 51 (GS, November 2022)]. Slepian et al. 2015 [Study 1 n = 100, Study 2 n = 100, Study 3 n = 100, Study 4 n = 352, citations = 42 (GS, November 2022)].
    • Original effect size: Study 1 – Big/meaningful vs. small/trivial secret hill steepness comparisons d = 0.78 (calculated from M and SD data in the paper, also reported in LeBel and Wilbur 2014); Study 2 – Big/meaningful vs. small/trivial secret distance perception comparisons d = 0.67 (calculated from M and SD data in the paper, also reported in Pecher et al. 2015); Study 3 – effects of the frequency of thoughts of infidelity on estimated effort required by physical tasks R2 = .21 / d = 1.03 [converted using this conversion]; Study 4 – more burdensome vs. less burdensome secret concealment effects on willingness to help others with physical tasks r = .44 / d = 0.98 [converted using this conversion].
    • Replication effect size: LeBel and Wilbur: Study 1 – Big/meaningful vs. small/trivial secret hill steepness comparisons d = 0.176 [-.08, .43] [reported] (not replicated); Study 2 – Big/meaningful vs. small/trivial secret hill steepness comparisons d = -0.319 [-.73, .10] [reported] (not replicated). Pecher et al.: Study 1 – Big/meaningful vs. small/trivial secret hill steepness comparisons d = 0.08 [-0.31, 0.47] [reported] (not replicated); Study 2 – Big/meaningful vs. small/trivial secret hill steepness comparisons d = 0.21 [-0.18, 0.60] [reported] (not replicated); Study 3 – Big/meaningful vs. small/trivial secret perceived distance comparisons d = 0.21 [-0.15, 0.57] [reported] (not replicated). Slepian et al. 2014: Study 1 – Big/meaningful secret recollection condition effects on hill slant estimation in comparison to revealing a secret, r = .29 [reported] / d = 0.61, and control condition r = .34 [reported] / d = 0.72 [d values converted using this conversion] (replicated); Study 2 – Big/meaningful secret recollection condition effects on distance estimation in comparison to revealing a secret, r = .24 [reported] / d = 0.49, and control condition r = .30 [reported] / d = 0.62 [d values converted using this conversion] (replicated). Slepian et al. 2015: Study 1 – Big/meaningful vs. small/trivial secret hill steepness comparisons d = 0.31 (calculated from M and SD data in the paper, non-significant) (not replicated); Study 2 – Big/meaningful vs. small/trivial secret hill steepness comparisons r = .28 [reported] / d = 0.58 [converted using this conversion] (replicated); Study 3 – Recalling preoccupying vs. non-preoccupying secret effects on hill slant judgements r = .23 [reported] / d = 0.47 [converted using this conversion] (replicated); Study 4 – Recalling preoccupying vs. non-preoccupying secret effects on hill slant judgements r = .11 [reported] / d = 0.22 [converted using this conversion] (replicated).
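
Several values in this entry are reported as r with a converted d alongside. The conversion tool linked from the entry is not reproduced here, so the following is only a minimal sketch, assuming the common formula d = 2r / √(1 − r²), which itself assumes two groups of equal size. The function name `r_to_d` is illustrative and not part of the resource. Under that assumption the results agree with the converted values listed above to within about 0.01.

```python
import math


def r_to_d(r: float) -> float:
    """Convert a correlation r to Cohen's d (assumes two equal-sized groups)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 2.0 * r / math.sqrt(1.0 - r * r)


if __name__ == "__main__":
    # r values listed in this entry (Slepian et al. 2012, 2014, 2015)
    for r in (0.44, 0.29, 0.34, 0.24, 0.28, 0.23, 0.11):
        print(f"r = {r:.2f} -> d = {r_to_d(r):.2f}")

    # An R2 value is first turned into r by taking the square root
    print(f"R2 = 0.21 -> d = {r_to_d(math.sqrt(0.21)):.2f}")
```

Because the formula assumes equal group sizes, converted values for unbalanced designs should be read as approximations.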

  • Warmer-hearts-warmer-room effect. Priming “warm” communal traits (vs. other traits) led participants to report that the room in which they were taking the study was warmer.

  • Treating-prejudice-with-imagery effect. Imagining a positive encounter with a member of a stigmatised group promotes positive perceptions when it is preceded by an imagined negative encounter.

    • Status: not replicated.
    • Original paper: ‘“Treating” Prejudice: An Exposure-Therapy Approach to Reducing Negative Reactions Toward Stigmatized Groups’, Birtel and Crisp 2012; three between-subjects experiments, Experiment 1: n = 29, Experiment 2a: n = 32, Experiment 2b: n = 30. [Citations = 100 (GS, October 2022)].
    • Critiques: McDonald et al. 2014 [Study 1: n = 240, Study 2: n = 175, citations = 24 (GS, October 2022)].
    • Original effect size: All effect sizes reported in McDonald et al.: anxiety adult with schizophrenia d= 0.76, anxiety homosexual men d = 1.08, contact homosexual men d = -0.88.
    • Replication effect size: McDonald et al.: anxiety adult with schizophrenia d = 0.10 (non-significant), anxiety homosexual men d = -0.19 (non-significant), contact homosexual men d = 0.01 (non-significant).​

  • Grammar influences perceived intentionality. Describing a person’s behaviours in terms of what the person _was doing_ (rather than what the person did) enhances intentionality attributions in the context of both mundane and criminal behaviours. Participants judged actions described in the imperfective as being more intentional and they imagined these actions in more detail.

    • Status: mixed.
    • Original paper: ‘Learning About What Others Were Doing: Verb Aspect and Attributions of Mundane and Criminal Intent for Past Actions’, Hart and Albarracín 2011; three between-subject experiments, Experiment 1 n = 54, Experiment 2 n = 37, Experiment 3 n = 48. [citations=40(GS, October 2022)]​.
    • Critiques: Eerland et al. 2016, multi-lab direct replication of Study 3 [N = 685 across 12 studies, citations = 82 (GS, October 2022)]. Sherrill et al. 2015 [N = 699 across 4 experiments, citations = 14 (GS, October 2022)]. (https://journals.sagepub.com/doi/full/10.1177/1745691615605826)
    • Original effect size: Experiment 1 – accessibility to intention-relevant concepts d= 1.00 [reported]; Experiment 2 – attribution of intentionality d= 1.00 [reported], detailed segmentation of behaviour descriptions d= 1.23 [reported]; Experiment 3 – criminal intentionality d= 0.76 [reported], intention attributions d= 0.66 [reported], imagery d= 0.73 [reported].
    • Replication effect size: Eerland et al.: intentionality d= -0.98 to d= 0.65 [reported], Meta-analytic effect for laboratory replications d= -0.24 [-0.50, 0.02] [non-significant, reported]; imagery d= -0.45 to d= 0.33, Meta-analytic effect for laboratory replications d= -0.08 [-0.23, 0.07] [non-significant, reported]; intention attribution d= -0.29 to d= 0.19, Meta-analytic effect for laboratory replications d= 0.00 [-0.07, 0.08] [non-significant, reported]. Sherrill et al.: Experiment 2 – murder intentionality judgement in imperfective vs. perfective condition ηp2 = 0.036 [reported] / d = 0.19 [converted using this conversion] (replicated); Experiment 3 – murder intentionality judgement in imperfective vs. perfective condition ηp2 = 0.040 [reported] / d = 0.20 [converted using this conversion] (replicated); Experiment 4 – imperfective murder vs. perfective murder condition d= 0.15 [non-significant, reported].

  • Attachment-warmth embodiment effect (anxious attachment warm food effect). Attachment anxiety positively predicts sensitivity to temperature cues. Individuals with high (but not low) attachment anxiety report higher desires for warm foods (but not neutral foods) when attachment is activated.

  • Social and personal power. Social power (power over other people) and personal power (freedom from other people) have opposite associations with independence and interdependence; they have opposite effects on stereotyping (social power decreases and personal power increases stereotyping), but parallel effects on behavioural approach (both types of power increase it).

    • Status: mixed
    • Original paper: ‘Differentiating Social and Personal Power: Opposite Effects on Stereotyping, but Parallel Effects on Behavioral Approach Tendencies’, Lammers et al. 2009; Study 1 between-subjects experiment, Study 2 field/correlational study, n1 = 113, n2 = 3,082. [citations = 233 (GS, December 2022)].
    • Critiques: Mayiwar and Lai 2019, direct replication of Lammers et al. Study 1 [n = 295, citations = 5 (GS, December 2022)].
    • Original effect size: Study 1 – effect of the power manipulations on participants’ stereotyping ηp2 = 0.23 [reported] / d = 0.54 [converted using this conversion]; effect of the power manipulations on participants’ behavioural approach tendencies ηp2 = 0.13 [reported] / d = 0.38 [converted using this conversion]; Study 2 – significant effects of personal power, b = 0.05 [0.01, 0.09], and social power, b = -0.04 [-0.08, -0.01], on stereotyping; significant effects of personal power, b = 0.22 [0.19, 0.26], and social power, b = 0.18 [0.13, 0.24], on behavioural approach tendencies.
    • Replication effect size: Mayiwar and Lai: effect of the power manipulations on participants’ stereotyping ηp2 = 0.056 [reported] / d = 0.24 [converted using this conversion] (replicated); effect of the power manipulations on participants’ behavioural approach tendencies ηp2 = 0.017 [reported, not significant] / d = 0.13 [converted using this conversion] (not replicated).

  • Classical anchoring effect (anchoring and adjustment). Assimilation of numeric estimates toward previously considered numeric values. ​

  • Incidental environmental anchoring effect (incidental anchoring [Critcher & Gilovich, 2008], basic anchoring [Wilson, Houston, Etling, & Brekke, 1996]). “Anchor values that are incidentally present in the environment can affect a person’s numerical estimates (…) these effects were not qualified by participants’ expertise in the relevant domain (study 1) or by their ability to subsequently recall the anchor value (study 3).” (Critcher & Gilovich, 2008).

    • Status: not replicated
    • Original paper: ‘Incidental environmental anchors’, Critcher and Gilovich, 2008; 3 studies with between-subjects manipulation of incidental anchors, n = 265 (Study 1) + 207 (Study 2) + 194 (Study 3) = 666. [citations = 261 (GS, August 2022)].
    • Critiques: Shanks et al., 2020 [n(Restaurant item)= 69 (Study 1) + 125 (Study 2) + 422 (Study 3) = 616, citations=11(GS, August, 2022)].
    • Original effect size: d = 0.49 [0.25, 0.72] according to Shanks et al., 2020, p. 9.
    • Replication effect size: Shanks et al.: d = -0.02 [-0.08, 0.05].

  • Subliminal anchoring effect (subliminal anchoring). Numeric estimates are biassed toward previously, subliminally presented numbers that could not be perceived by respondents.​

  • Facial redness increased perceived anger. When people rate faces, rated anger is positively associated with the faces’ redness.

    • Status: mixed
    • Original paper: ‘Facial redness, expression, and masculinity influence perceptions of anger and health’, Young et al., 2018; full within-subjects design, n = 40 (Study 1) + 44 (Study 2). [citations = 23 (GS, October 2022)].
    • Critiques: Effect could not be replicated with natural shades of red: Wolf et al., 2021 [n = 609, citations = 1 (GS, October 2022)]. Effect persisted only in a within-subjects design: Wolf et al., 2022 [n = 40 (Study 1) + 329 (Study 2), citations = 0 (GS, October 2022)].
    • Original effect size: Cohen’s f = .35.
    • Replication effect size: Wolf et al.: ηp2 = 0.04 [0.01, 0.06].
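
The original effect here is given as Cohen’s f (.35) and the replication as ηp2 (0.04), so the two numbers are on different scales as printed. As a minimal sketch, assuming the standard relation f² = ηp2 / (1 − ηp2), the two can be put on a common metric. The function names are illustrative, and the comparison ignores design differences between the original and replication studies.

```python
import math


def f_to_partial_eta_sq(f: float) -> float:
    """Convert Cohen's f to partial eta squared via eta = f^2 / (1 + f^2)."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return f * f / (1.0 + f * f)


def partial_eta_sq_to_f(eta_sq: float) -> float:
    """Convert partial eta squared to Cohen's f via f = sqrt(eta / (1 - eta))."""
    if not 0.0 <= eta_sq < 1.0:
        raise ValueError("partial eta squared must be in [0, 1)")
    return math.sqrt(eta_sq / (1.0 - eta_sq))


if __name__ == "__main__":
    # Original effect (Young et al., 2018) expressed as partial eta squared
    print(f"f = 0.35 -> partial eta squared = {f_to_partial_eta_sq(0.35):.3f}")
    # Replication effect (Wolf et al.) expressed as Cohen's f
    print(f"partial eta squared = 0.04 -> f = {partial_eta_sq_to_f(0.04):.3f}")
```

On either scale the replication estimate is smaller than the original one, which is consistent with the entry’s “mixed” status.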

  • Romeo and Juliet effect. Greater love and commitment towards a romantic partner when others (e.g., parents, friends) are observed to interfere with, or disapprove of, the relationship.​

    • Status: reversed
    • Original paper: ‘Parental interference and romantic love: The Romeo and Juliet effect’, Driscoll et al. 1972; within-subjects, n = 140 (couples). [citations = 490 (GS, October 2022)].
    • Critiques: Parks et al. 1983 [n = 193, citations = 260 (GS, February, 2023)]. Sinclair et al. 2014 [Experiment: n = 396 (direct replication), Meta-analysis: n = NA, k = 22 studies, citations = 56 (GS, October 2022)].
    • Original effect size: Romantic love and parental interference: r = .34; Commitment and parental interference: r = .30.
    • Replication effect size: Parks et al.: All effects correlated with romantic love: Own family approval: _r_ = .47 (opposite direction); Partner’s family approval: _r_ = .42 (opposite direction); Own friend approval: _r_ = .51 (opposite direction); Partner’s friends’ approval: _r_ = .49 (opposite direction); Own network approval: _r_ = .63 (opposite direction). Sinclair et al.: Experiment: Romantic love - Parental interference: _r_ = -.05 (not replicated); Friend interference: _r_ = -.07 (not replicated); Commitment - Parental interference: _r_ = -.09 (not replicated); Friend interference: _r_ = -.06 (not replicated). Meta-analysis: Romantic love and network approval (_k_ = 11 studies): _g_ = 0.49 [0.26, 0.72] (opposite direction); Commitment and network approval (_k_ = 16 studies): _g_ = 0.62 [0.50, 0.74] (opposite direction).

  • Stereotype activation effect. Judgments of targets that follow gender-congruent primes are made faster than judgments of targets that follow gender-incongruent primes.​

    • Status: not replicated
    • Original paper: ‘Automatic Stereotyping’, Banaji and Hardin 1996; semantic priming procedure, n = 68. [citations = 1060 (GS, November 2022)].
    • Critiques: Müller and Rothermund, 2014 [n=294, citations=32(GS, November 2022)].
    • Original effect size: Prime Gender x Target Gender: F(2, 144) = 15.28, p<.001
    • Replication effect size: Müller and Rothermund: Prime Gender × Target Gender interaction: F(1, 293) = 39.68, p = 1.09 × 10−9

  • Sex difference in distress to infidelity. Men, compared to women, are more distressed by sexual than emotional infidelity, and this sex difference continued into older age.​

  • Content effect for cheater detection. Performance on the Wason selection task improves when the task involves cheater detection. College students were better able to complete the selection task for unfamiliar scenarios if the scenario involved detecting a cheater rather than a descriptive scenario.

  • Dissenting deviant social rejection effect. Groups reject opinion deviates from future interaction.

    • Status: mixed
    • Original paper: ‘Deviation, rejection, and communication’, Schachter 1951; experiment, sample size = 198. [citations=2209 (GS, November 2022)]​.
    • Critiques: Wesselmann 2014 [n=80, citations=37 (GS, November 2022)].
    • Original effect size: d = 1.84 (source: meta-analysis by Tata et al. 1996)
    • Replication effect size: Wesselmann: replicated: Communication Pattern - effect for change over time in overall communication to the confederates F(5, 80) = 1.23, p = 0.30, ηp2 = 0.07; effect for the groups’ differential communication between the confederates F(2, 32) = 20.83, p < 0.01, ηp2 = 0.57; interaction between communication to the different confederates and the point of the conversation F(10, 160) = 0.99, p = 0.45, ηp2 = 0.06; not replicated: Committee Nomination Measure χ2(4) = 0.79, p = 0.94; replicated: Sociometric Test χ2(2) = 14.74, p < .01.

  • Sex differences in implicit maths attitudes. College students, especially women, demonstrated negativity toward maths and science relative to arts and language on implicit measures.​

    • Status: replicated
    • Original paper: ‘Math = male, me = female, therefore math ≠ me’, Nosek et al. 2002; study design = experiment, n = 170. [citations = 1428 (GS, November 2022)].
    • Critiques: Klein et al. 2014 [n = 5842, citations = 1129 (GS, November 2022)].
    • Original effect size: d = 1.01 [.54, 1.48] (reported in Klein et al. 2014).
    • Replication effect size: Klein et al.: d = 0.56 [0.45, 0.68].

  • Low versus high category scale effect on behaviour self-report. Response scales serve informative functions. The response categories suggest a range of “usual” or “expected” behaviours, and this information affects respondents' behavioural reports as well as related judgments.

  • Information source on attitudes effect. The source of information has a major impact on how that information is perceived and evaluated.

    • Status: replicated
    • Original paper: ‘Prestige, Suggestion, and Attitudes’, Lorge and Curtiss 1936; experiment, sample size = 99. [citations=242(GS, November 2022)]​.
    • Critiques: Klein et al. 2014 [n=6325, citations=1129 (GS, November 2022)].
    • Original effect size: NA.
    • Replication effect size: Klein et al.: d = 0.31[0.19, 0.42].

  • Door-in-the-face effect. The door-in-the-face effect occurs when making a larger initial request and then afterwards scaling back and asking a more moderate request increases compliance (with the moderate request) compared to either starting with the moderate request or starting with a small request.

  • Foot-in-the-door effect. The foot-in-the-door effect occurs when getting people to comply with a very small initial request increases the likelihood that they will agree to a larger request (compared to starting with the larger request).

    • Status: mixed
    • Original paper: ‘Compliance without pressure: The foot-in-the-door technique’, Freedman and Fraser 1966; between-subjects manipulation of whether or not there is a very small initial request, n=156. [citations=2,667(GS, January 2023)].
    • Critiques: Gamian-Wilk and Dolinski 2019 [between-subjects manipulation of whether or not there is a very small initial request, n = 60 in each of 4 replication studies; 240 total, citations = 3 (GS, January 2023)].
    • Original effect size: OR = 3.912; d = 2.16.
    • Replication effect size: Out of 4 replication attempts, only 1 succeeded with p < .05, although most were directional and had small sample sizes. Aggregating across all 4 replications (which probably makes the most sense given the small sample sizes): OR = 8.76, d = 4.83. In the successful replication alone: OR = 33.14 (due to only 1 person complying with the large request in the control condition).

  • Ingroup-outgroup norm of reciprocity effect. “When confronted with a decision about allowing or denying the same behaviour to an ingroup and outgroup, people may feel an obligation to reciprocity, or consistency in their evaluation of the behaviours.”

    • Status: replicated
    • Original paper: ‘The Current Status of American Public Opinion’, Hyman and Sheatsley, 1950; experiment, n = NA. [citations = 161 (GS, November 2022)]. An online version of the original paper could not be located.
    • Critiques: Klein et al. 2014 [n=6276, citations=1129 (GS, November 2022)].
    • Original effect size: d = 0.16 [0.06, 0.27].
    • Replication effect size: Klein et al.: d = 0.27 [0.18, 0.36].

  • Social dominance-status (verticality effects). Vertical dimension of human relations (such as dominance and submission) and nonverbal behaviour are intimately and fruitfully linked; nonverbal behaviour, such as gazing, smiling, touching, and various body positions can signal high and low verticality.

    • Status: mixed
    • Original paper: ‘Body Politics: Power, Sex, and Nonverbal Communication’, Henley 1977; book/theoretical and anecdotal evidence, n=NA. [citations=2284(GS, May 2023)]​.
    • Critiques: Hall et al. 2005 [meta-analysis, k=211, citations=1103(GS, May 2023)].
    • Original effect size: NA.
    • Replication effect size: Hall et al.: beliefs (perceptions) about the relation of verticality to nonverbal behavior (average r, weighted by sample size) – smiling r=-.25 [-.29, -.21], gazing r=.10 [.06, .14], raised brows r=-.36 [-.41, -.31], nodding r=.12 [.00, .18], self touch r=-.09 [-.24, -.06], other touch r=.21 [.17, .29], hand/arm gestures r=.37 [.25, .49], postural relaxation r=-.09 [-.04, .24], body/leg shifting r=.10 [-.29, -.21], interpersonal distance r=-.34 [-.43, -.25], facing orientation r=.10 [-.01, .21], vocal variability r=.24 [.16, .32], loudness r=.47 [.39, .55], interruptions r=.61 [.52, .70], pausing/latency to speak r=-.78 [-.94, -.62], rate of speech r=.09 [.03, .15], pitch r=-.10 [-.19, -.01], vocal relaxation r=.33 [.18, .48]; actual relations between verticality and nonverbal behavior (average r, weighted by sample size) – smiling r=-.03 [-.09, .03], gazing r=-.01 [-.09, .07], raised brows r=-.06 [-.25, .18], nodding r=.03 [-.05, .17], self touch r=-.04 [-.10, .10], other touch r=-.02 [-.10, .16], hand/arm gestures r=.05 [-.06, .10], openness r=.13 [.03, .23], postural relaxation r=.02 [-.08, .12], interpersonal distance r=-.17 [-.24, -.20], loudness r=.24 [.16, .32], interruptions r=.04 [-.02, .10], overlaps r=.06 [-.06, .81], pausing/latency to speak r=-.06 [-.24, .12], back-channel responses r=.03 [-.07, .13], speech errors r=.02 [-.10, .14], rate of speech r=-.06 [-.15, .03].

  • Personal cognitive dissonance - free-choice paradigm. Personal cognitive dissonance, from the cognitive dissonance theory (Festinger, 1957), suggests that an inconsistency between two cognitions (e.g., an attitude and a past behaviour) creates an unpleasant psychological state (i.e., personal dissonance) that the individual is motivated to reduce (e.g., by changing one of the elements to fit the other). This personal cognitive dissonance has been studied in the literature through different paradigms, including the following three main ones: the free-choice, induced-compliance and induced-hypocrisy paradigms. In the free-choice paradigm, the mere act of choosing between equally desirable options can arouse dissonance in the individual, because choosing option A implies the rejection of option B (in other words, choosing option A means accepting both its advantages and its disadvantages, while also giving up the advantages of option B). In order to reduce dissonance, subjects will increase the perceived gap between options (i.e., spreading of alternatives) by overestimating the chosen option and/or underestimating the rejected option.

    • Status: NA
    • Original paper: ‘Postdecision changes in the desirability of alternatives’, Brehm 1956; experimental design, n = 225. [citations = 1987 (GS, February 2023)].
    • Critiques: Enisman et al. 2021 [meta-analysis; n = 43 studies, citations = 11 (GS, February 2023)]. Izuma and Murayama 2013 [meta-analysis, k = 3 studies, citations = 109 (GS, February 2023)].
    • Original effect size: NA.
    • Replication effect size: Enisman et al.: Effect of free-choice paradigm on spreading of alternatives: d = 0.40 [0.32, 0.49].

  • Personal cognitive dissonance - induced-compliance paradigm. In this paradigm, subjects are led to perform, in a context of free choice, an act inconsistent with their own norms or social norms (e.g., agreeing to perform a counter-attitudinal act). Dissonance can be resolved through multiple modes of reduction (e.g., social support, trivialization, etc.), but attitude change remains the most studied mode of reduction.

    • Status: replicated
    • Original paper: ‘Dissonance arousal: Physiological evidence’, Croyle and Cooper 1983; between-subjects design, n1 = 30, n2=30. [citations= 447 (GS, February 2023)]​.
    • Critiques: Kenworthy et al. 2011 [meta-analysis; n = 31 studies, citations = 71 (GS, February 2023)]. Kim et al. 2014 [meta-analysis; k = 230 effects, citations = 11 (GS, February 2023)]. Vaidis et al. 2018 [multi-lab replication; in preparation].
    • Original effect size: Effect of induced-compliance on attitude change: d = 2.40 [1.40, 3.37].
    • Replication effect size: Kenworthy et al.: Effect of induced-compliance on dissonance effects: d= 0.81 [0.70, 0.91]​.​ Kim et al.: Effect of induced-compliance on attitude change: r= .22 [.21, .24]​.​

  • Personal cognitive dissonance - Induced-hypocrisy paradigm. In this paradigm, dissonance is aroused by making individuals aware of the discrepancy between a socially desirable behaviour (e.g., not wasting water; stage 1: normative commitment phase) and their own past transgressive behaviours (e.g., remembering one’s past water waste; stage 2: transgression salience phase). Most of the dissonance reduction work is done through behavioural means, and leads subjects to express behavioural intentions, and/or perform behaviours in the direction of the socially desirable behaviours expressed in step 1 (i.e., allowing for the reduction of the inconsistency between the norm, step 1, and the recall of transgressions, step 2).​

    • Status: replicated
    • Original paper: ‘Overcoming denial and increasing the intention to use condoms through the induction of hypocrisy’, Aronson et al., 1991; between-subjects design, n = 80. [citations= 485 (GS, February 2023)]​.
    • Critiques: Priolo et al. 2019 [meta-analysis; k = 38 studies, citations = 38 (GS, February 2023)].
    • Original effect size: Effect of induced-hypocrisy on behavioural intention: r= .58 [.32, .75]​.
    • Replication effect size: Priolo et al.: Effect of induced-hypocrisy on behavioural intention: r= .35 [.22, .46]​; Effect of induced-hypocrisy on behaviour: r= .30 [.10, .48]​.​

  • Vicarious cognitive dissonance - induced-compliance paradigm. Vicarious cognitive dissonance, from the cognitive dissonance theory (Festinger, 1957; see “personal cognitive dissonance”), suggests that it would be possible for an individual to experience dissonance vicariously when they witness the performance of an inconsistent act (e.g., counter-attitudinal or counter-normative behaviour) on the part of an in-group member with whom they strongly identify. As with personal cognitive dissonance, the inconsistency between two cognitions (e.g., between attitude and observed behaviour) creates an unpleasant psychological state (i.e., vicarious dissonance) that the individual is motivated to reduce (e.g., by changing one of the elements to fit the other). This vicarious cognitive dissonance has been studied in the literature through different paradigms, including the following two main ones: the induced-compliance and induced-hypocrisy paradigms. In the induced-compliance paradigm, subjects are led to observe a member of their in-group performing, in a context of free choice, an act inconsistent with their own norms or social norms (e.g., agreeing to perform a counter-attitudinal act). Dissonance can be resolved through multiple modes of reduction (e.g., social support, trivialization, etc.), but attitude change remains the most studied mode of reduction.

    • Status: mixed
    • Original paper: ‘Vicarious dissonance: Attitude change from the inconsistency of others’, Norton et al. 2003; experimental design, exp 1: n = 50, exp 2: n = 43, exp 3: n = 108. [citations = 344 (GS, February 2023)].
    • Critiques: Jaubert 2022 [paper not found; n = 102, citations = NA]. Jaubert et al. 2020 [meta-analysis in submission; k = 13 studies, citations = 0 (GS, 2023)].
    • Original effect size: Effect of vicarious dissonance on attitude change: d = 0.70 [0.21, 1.26].
    • Replication effect size: Jaubert: Effect of vicarious dissonance on attitude change: ηp2 = 0.07. Jaubert et al.: Effect of vicarious dissonance toward the induced-compliance paradigm: d = 0.35 [0.15, 0.54]. Global effect of vicarious dissonance: d = 0.41 [0.27, 0.54], with lower estimated effects when correcting for publication bias (d = 0.22 [0.008, 0.43]).

  • Vicarious cognitive dissonance - Induced-hypocrisy paradigm. In this paradigm, subjects are made to observe a member of their group becoming aware of the discrepancy between a socially desirable behaviour (e.g., not wasting water; stage 1: normative commitment phase) and their own past transgressive behaviours (e.g., remembering one’s past water waste; stage 2: transgression salience phase). Most of the dissonance reduction work is done through behavioural means, and leads subjects to express behavioural intentions, and/or perform behaviours in the direction of the socially desirable behaviours expressed in step 1 (i.e., allowing for the reduction of the inconsistency between the norm, step 1, and the recall of transgressions, step 2).​

    • Status: mixed
    • Original paper: ‘Vicarious hypocrisy: Bolstering attitudes and taking action after exposure to a hypocritical ingroup member’, Focella et al., 2016; experimental design, exp 1: n = 161, exp 2: n = 68, exp 3: n = 64, exp 4: n = 68. [citations = 34 (GS, February 2023)].
    • Critiques: Gaffney et al. 2012 [n = 78, citations = 17 (GS, November 2022)]. Jaubert 2022 [n = 133, citations = NA]. Jaubert et al. 2020 [meta-analysis in submission; _k_ = 13 studies, citations = 0 (GS, March 2023)]. Monin et al. 2004 [study 1 n = 57, study 2 n = 25, citations = 97 (GS, November 2022)].
    • Original effect size: Effect of induced-hypocrisy on behavioural intention: d = 0.70 [0.12, 1.26].
    • Replication effect size: Gaffney et al.: group membership × response to the hypocrisy interaction effect on attitudes ηp2 = 0.142 / d = 0.40 [calculated from the F statistics and converted using this conversion] (replicated). Jaubert: Effect of vicarious dissonance on attitude change: ηp2 = 0.03. Jaubert et al.: Effect of vicarious dissonance toward the induced-compliance paradigm: d = 0.46 [0.27, 0.64]. Monin et al.: study 1 disagree versus agree condition attitude change comparison d = 0.30 [calculated from the t-test values using this conversion]; study 2 no consequence versus consequence condition attitude change comparison d = 0.46 [calculated from the t-test values using this conversion] (replicated).

  • Imposter phenomenon. People who perform outstandingly both academically and professionally believe that in fact, they are not really bright and that they have fooled anyone who thinks otherwise. This phenomenon might be especially persistent in women. Key conclusion: Therapeutic interventions might help to overcome imposter syndrome.

    • Status: not replicated
    • Original paper: ‘The imposter phenomenon in high achieving women: Dynamics and therapeutic intervention’, Clance and Imes 1978; Therapeutic interventions (but not described in detail), n=178. [Citations = 2709 (GS, January 2023)]​.
    • Critiques: Bravata et al. 2020 [systematic review; n = 62 studies, citations = 272 (GS, January 2023)].
    • Original effect size: NA; no effect sizes are mentioned in the original study since no statistical analyses were performed.
    • Replication effect size: Bravata et al.: NA, but the imposter phenomenon was present in both men and women and was particularly high among ethnic minority groups (the original study described white middle-class women).

  • Ability EI as a factor of intelligence. Ability EI is a collection of cognitive abilities relating to the recognition, understanding and management of emotions. There have been many controversies in attempting to contextualise Ability EI within models of intelligence/cognitive ability. MacCann et al. (2014) empirically tested multiple models of how various cognitive abilities interact, including hierarchical and bi-factor models, and the data demonstrated closest fit to a hierarchical structure where Ability EI was contextualised as a second-stratum factor. A recent replication repeated this modelling process and drew the same conclusion.

  • Matilda effect (Matthew Matilda effect). Male scientists and masculine topics are frequently perceived as demonstrating higher scientific quality.

    • Status: mixed
    • Original paper: The name of the effect shows up first in ‘The Matthew Matilda Effect in Science’, Rossiter, 1993; theoretical paper, n=NA. [citations=114 (GS, February 2023)].
    • Critiques: Unpublished replication in Feeley and Lee 2015 [n=1177 articles across 3 journals, citations=5 (GS, February 2023)]. Related article also in communication research is Feeley and Yang 2022 [n=3324 articles across 10 journals, citations=2(GS, February 2023)]. Knobloch-Westerwick and Glynn 2013; correlational design, n=1020 articles across 2 journals [citations=114 (GS, February 2023)]. Rajko et al. 2023 [n=5,500 communication scholars from 11 countries, citations=0 (GS, February 2023)].
    • Original effect size: NA.
    • Replication effect size: Knobloch-Westerwick and Glynn: Publications with female lead authors were cited 12.77 times on average (SD = 20.57), whereas publications with male lead authors were cited 17.73 times on average (SD = 35.34), _η_² = 0.006; male-typed publications received significantly more citations (M = 21.04, SD = 38.63 vs. M = 14.44, SD = 28.08), _η_² = 0.006; publications with at least one male author received significantly more citations, with M = 17.11 (SD = 33.38), compared with M = 11.93 citations (SD = 19.84) for publications from female authors, _η_² = 0.006. Feeley and Lee: Female lead authors were cited, on average, 19.34 times (SD = 30.22) compared to male lead authors (M = 18.05, SD = 25.98) (non-significant); male-typed topics (M = 22.43, SD = 36.55) received more citations than female-typed topics (M = 17.87, SD = 25.80), _η_² = 0.003. Feeley and Yang: 2 out of the 8 journals examined exhibited Matilda effects. Rajko et al.: After controlling for country, the total number of papers, and the total number of views, female scholars have significantly lower levels of citations than male peers (β = −.05; p < .001).

  • Being slightly behind increases the chance of winning. The original study found that being slightly behind at halftime significantly increases the chance of winning in professional basketball.

    • Status: mixed
    • Original paper: ‘Can Losing Lead to Winning?’, Berger and Pope 2011; natural experiment, N = 11968. [citations = 271 (GS, February 2023)].
    • Critiques: Teeselink et al. 2022 [natural experiment, N = 17535, citations = 9 (GS, February 2023)].
    • Original effect size: Teams behind by one point at halftime win between 5.8 and 8 percentage points more often than expected in the NBA across four models (between 2.1 and 2.5 percentage points in the NCAA). The result is statistically significant in all specifications in the NBA and in 3 out of 4 specifications in the NCAA.
    • Replication effect size: Teeselink et al.: With a larger sample, teams behind by one point at halftime win 5 percentage points more often than expected in the NBA (0.8 percentage points in the NCAA). The result is statistically significant for the large NBA sample (95% CI: 0.007, 0.094), but not for the large NCAA sample (95% CI: −0.025, 0.041). The replication also tests the hypothesis in other leagues and sports. Out of 12 leagues, the effect is only significant in the NBA.

  • Ethnoracial diversity and trust. Ethnoracial diversity negatively affects trust and social capital.

    • Status: mixed
    • Original paper: ‘E Pluribus Unum: Diversity and Community in the Twenty-first Century The 2006 Johan Skytte Prize Lecture’, Putnam 2007; observational study, N = 23260. [citations = 6860 (GS, February 2023)].
    • Critiques: Abascal et al. 2015 [observational study, N = 29733, citations = 331 (GS, February 2023)].
    • Original effect size: Trust in neighbours increases by 0.18 (on a 4-point scale) when switching from a maximally heterogeneous to a maximally homogeneous community in the USA, t = 5.1 (see Table 3).
    • Replication effect size: On the full US sample, the authors find a similar result: trust in neighbours decreases by 0.12 in heterogeneous compared to homogeneous communities. However, when using random subsamples of the US population, they only find a significant effect in 4 out of 30 models (average t = −0.76).

  • Greed moderates the relationship between SES and unethical behaviour. The original study found that people of higher socio-economic status are more likely to engage in unethical behaviour, but that this relationship is moderated by greed. When study participants were primed to think positively about greed, those of lower SES became more likely to engage in unethical behaviour than those of higher SES.

    • Status: not replicated
    • Original paper: ‘Higher social class predicts increased unethical behavior’, Piff et al. 2012; experiment, N = 90. [citations = 1273 (GS, February 2023)].
    • Critiques: Balakrishnan et al. 2017 [experiment/meta-analysis, n1=264, n2=257, n3=306, n4=114, citations = 18 (GS, February 2023)].
    • Original effect size: Interaction effect between greed and SES: b=−0.24 [−0.44 , −0.04].
    • Replication effect size: Interaction effect between greed and SES in replication 1: unstandardized b=0.11 [−0.02 , 0.24], in replication 2: b=−0.06 [−0.16 , 0.04], in replication 3: b=0.01 [−0.10 , 0.12], in replication 4: b=0.10 [−0.10 , 0.29]. I.e., the interaction effect was not replicated in any of the four studies.

  • Women’s education increases domestic violence. Women with more education report higher levels of psychological violence at home.

    • Status: not replicated
    • Original paper: ‘For Better or for Worse?: Education and the Prevalence of Domestic Violence in Turkey’, Erten and Keskin 2018; natural experiment (RDD/IV), N = 1462. [citations = 153 (GS, February 2023)].
    • Critiques: Akyol et al. 2020 [natural experiment (RDD/IV), N = 1093, citations = 4 (GS, February 2023)].
    • Original effect size: With a regression discontinuity design, the authors determine that 1 more year of education leads to a 0.12 standard deviation increase in reported domestic psychological violence (SE: 0.057) for Turkish women living in rural areas, which is significant at the 5% level.
    • Replication effect size: Akyol et al.: With the same design and sample, but a different definition of rural areas, the authors find only a 0.099 standard deviation increase in reported domestic violence (SE: 0.061), which is not significant at the 5% or 10% level.
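
The significance statements in this entry can be checked from the reported coefficients and standard errors. The following is a minimal sketch, assuming a two-sided z-test under a normal approximation (the cited papers may have used a different reference distribution or clustered standard errors); the helper name `two_sided_p` is illustrative and does not come from either paper.

```python
from statistics import NormalDist


def two_sided_p(estimate: float, se: float) -> float:
    """Two-sided p-value for H0: effect = 0, using a normal approximation."""
    z = estimate / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Coefficients and standard errors as reported in the entry above.
p_original = two_sided_p(0.12, 0.057)      # Erten and Keskin 2018
p_replication = two_sided_p(0.099, 0.061)  # Akyol et al. 2020

print(f"original:    p = {p_original:.3f}")
print(f"replication: p = {p_replication:.3f}")
```

Under these assumptions the original estimate falls below the .05 threshold, while the replication estimate stays above .10, which matches the description in the entry.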

  • Easterlin paradox (national income associated with happiness). When comparing across countries, higher levels of income are associated with higher levels of subjective well-being, yet this association does not show up across time.

    • Status: mixed
    • Original paper: ‘Does Economic Growth Improve the Human Lot? Some Empirical Evidence’, Easterlin 1974; observational study, n=25 time series observations for the United States from 1946-1970. [citations=8686 (GS, February 2023)].
    • Critiques: Easterlin 2005 [focused on descriptive statistics, citations=470 (GS, February 2023)]. Hagerty and Veenhoven 2003 [n=336 (21 countries including the United States from 1973-1996), citations=765 (GS, February 2023)]. A more comprehensive analysis using a variety of analyses and datasets can be found in Sacks et al. 2012 [n=79 countries spanning 1980 to 2004, citations=525 (GS, February 2023)]. A rejoinder can be found in Veenhoven and Hagerty 2006 [focused on trends in the United States, Western Europe, and 8 developing nations, citations=470 (GS, February 2023)].
    • Original effect size: No effect size explicitly reported, but Tables 8 to 10 of Easterlin 1974 contain time series patterns.
    • Replication effect size: Easterlin: No effect size reported, but refers to the lack of association between subjective well-being and income in Figure 1 for the United States. Hagerty and Veenhoven: regression coefficient b=1.26 [Z-statistic=2.67]. Sacks et al.: β=0.505, SE=0.109 for the World Values Survey, β=0.278, SE=0.164 for the Eurobarometer. Veenhoven and Hagerty: No effect size reported, but Tables 1 to 4 report trends.

  • Humour style clusters. A number of works have attempted to determine whether individuals can be categorised into different types of humour user. The first was by Galloway (2010) and suggested four types of humour user through use of cluster analysis: (1) above average on all of the styles, or (2) below average on all of the styles, or (3) above average on the positive styles (Affiliative and Self-enhancing), and below average on the negative styles (Aggressive and Self-defeating), or (4) above average on the negative styles and below average on the positive styles.

    • Status: mixed
    • Original paper: ‘Individual differences in personal humor styles: Identification of prominent patterns and their associates’, Galloway, 2010; cross-sectional study, n=318. [Citations = 149 (GS, January 2023)].
    • Critiques: Chang et al. 2015 [n=1252, citations = 47 (GS, March 2023)]. Evans & Steptoe-Warren 2018 [n=202, citations = 49 (GS, March 2023)]. Evans et al. 2020 [n=863, citations = 3 (GS, March 2023)]. Fox et al. 2016 [n=1108, citations = 37 (GS, March 2023)]. Leist and Muller 2013 [n=305, citations = 119 (GS, March 2023)]. Sirigatti et al. 2016 [n=244, citations = 35 (GS, March 2023)].
    • Original effect size: NA.
    • Replication effect size: Chang et al.: NA, but the four-cluster solution described was replicated. Evans and Steptoe-Warren: three managerial humour clusters. Evans et al.: inconsistencies in the humour style profiles across countries tested and the extant literature, possibly indicative of cultural differences in the behavioural expression of trait humour. Fox et al.: NA, the presence of distinctive humour types in childhood. Leist and Muller: evidence for three humour types (endorsers, humour deniers, and self-enhancers). Sirigatti et al.: three humour styles identified.

  • Social referencing effect. Crosby et al. (2008) found that hearing an offensive remark caused subjects to look longer at a potentially offended person, but only if that person could hear the remark. On the basis of this result, they argued that people use social referencing to assess the offensiveness.

    • Status: mixed
    • Original paper: ‘Where do we look during potentially offensive behavior?’, Crosby et al. 2008; experimental design, n=25. [Citations = 95 (GS, Jan 2023)].
    • Critiques: Jonas and Skorinko 2015 [n=58, Citations = 0 (GS, Jan 2023)]. Rabagliati et al. 2020 [n = 283, Citations = 2 (GS, Jan 2023)].
    • Original effect size: F(3,69)=5.15, p<.005.
    • Replication effect size: Jonas and Skorinko: F(1.86, 101.7) = 0.07, p=0.917. Rabagliati et al.: χ2(3) = 22.11, p < .001, pseudo-R2 = .85.

  • Other-race effect (cross-race effect, own-race bias). Humans are better at distinguishing between faces of two individuals of their own race than two faces of another race.

    • Status: replicated
    • Original paper: ‘Recognition for faces of own and other race’, Malpass and Kravitz 1969; forced-choice recognition task, n=40. [Citations=1036 (GS, Feb 2023)].
    • Critiques: Lee and Penrod 2022 [various recognition tasks n=24937, Citations=1 (GS, Feb 2023)].
    • Original effect size: η2p=0.291 to 0.350 (insufficient information for CI).
    • Replication effect size: Lee and Penrod: Hedges’ g=0.54.

  • Unethical amnesia. Memories of unethical behaviour are less clear and vivid than memories of good deeds.

    • Status: not replicated
    • Original paper: ‘Memories of unethical actions become obfuscated over time’, Kouchaki and Gino 2016; nine studies, between-subjects (Study 1a, 1b, 3, 4, 5, 6, 7a, 7b) and correlational design (Study 2), total n = 2,109. [citations=177 (GS, May 2023)].
    • Critiques: Stanley et al. 2018 replication of Kouchaki and Gino 2016 study 5 [n1=228, n2=232, n3=228, citations=22(GS, May 2023)].
    • Original effect size: Study 1a - ηp2= 0.06; Study 1b – in the “self” conditions, participants had less clear memory of their unethical actions, ηp2 = 0.06; Study 2 –cheaters reported lower clarity of memory, ηp2 = 0.04; Study 3 –participants in the self-unethical condition had a less clear recall of thoughts and feelings than did participants in the self-ethical condition, ηp2 = 0.05; Study 4 - participants who read that they had cheated indicated they had a less clear memory than those who did not cheat, ηp2 = 0.20; Study 5 – objective memory score was lower for those who read in the story that they had cheated than for those who read that they had behaved honestly, d = 0.43; Study 6 - participants in the likely-cheating condition recalled the die-throwing task less precisely than those in the no-cheating condition, d = 0.57; Study 7 - participants in the likely-cheating condition recalled the die-throwing task less precisely than those in the no-cheating condition, d = 0.38 (Study 7a), d = 0.33 (Study 7b).
    • Replication effect size: Stanley et al.: no significant differences in objective memory score between participants who read the vignette depicting the ethical behaviour versus the unethical cheating behaviour, Study 1 - d = .26 (n.s.), Study 2 - d = .04 (n.s.), Study 3 - d = .17 (n.s.).

  • Feeling dirty after networking. People feel uncomfortable networking because networking triggers a state of “moral impurity,” which translates into feelings of “dirtiness” and a heightened desire for “cleansing”.

  • Public exposure influences shame and guilt differently. Public exposure (implicit and explicit) of transgression increases experienced shame more than guilt.

    • Status: not replicated
    • Original paper: ‘The role of public exposure in moral and nonmoral shame and guilt’, Smith et al. 2002; 4 studies, between-subject design (Study 1 and 4), within-subject design (Study 2), content analysis (Study 3), Study 1: n=168, Study 2: n=56, Study 3: n=510 passages, Study 4: n=60. [citations=690 (GS, June 2023)].
    • Critiques: Zhang et al. 2022 [n=1727, citations=0(GS, June 2023)].
    • Original effect size: Study 1: shame f = .39 and guilt f = .27.
    • Replication effect size: Zhang et al.: Study 1 - shame ηp² = .14 [.11, .17] and guilt ηp² = .13 [.10, .16].


Positive Psychology

  • Power pose. Taking on a power pose lowers cortisol and risk tolerance, while it raises testosterone and feelings of power.

    • Status: not replicated
    • Original paper: ‘Power Posing: Brief Nonverbal Displays Affect Neuroendocrine Levels and Risk Tolerance’, Carney et al. 2010; between-subjects design, n=42 mixed sexes. [citations = 1450 (GS, April 2022)].
    • Critiques: Garrison et al. 2016 [n=305, citations = 70 (GS, April 2022)]. Metzler and Grezes 2019 [n = 82 men, citations = 3 (GS, April 2022)]. Ranehill 2015 [total n=200, citations = 291 (GS, April 2022)]. Ronay 2017 [n=108, citations = 38 (GS, April 2022)].
    • Original effect sizes: Φ = 0.30 in risk-taking from Carney et al. (2010), Sources unknown: d = -0.30 for cortisol, d=0.35 for testosterone, d=0.79 for feelings of power.
    • Replication effect size: Garrison et al.: feeling of power: ηp2 = .016. Metzler and Grezes: cortisol: ηp2 = 0.02, testosterone: ηp2 = 0.01. Ranehill: cortisol: d = -0.157, feelings of power: d = 0.34; risk taking: d = -0.176, testosterone: d = -0.200. Ronay: cortisol: d = 0.034, feeling of power: d = 0.226, testosterone: d = 0.121.

  • Facial Feedback. Smiling causes a good mood, while pouting produces a bad mood.

    • Status: not replicated
    • Original paper: ‘Inhibiting and Facilitating Conditions of the Human Smile: A Nonobtrusive Test of the Facial Feedback Hypothesis’, Strack et al. 1988; 2 experimental studies, n1= 92, n2=83. [citation= 2577(GS, February, 2022)].
    • Critiques: Coles et al. 2019 [meta-analysis k = 98, n=3878, citation= 115 (GS, February 2022)]. Wagenmakers et al. 2016 [meta-analysis n=1894, citation=349 (GS, February 2022)]. Schimmack 2017 [replicability analysis k=19, citation=0 (GS, March 2023)].
    • Original effect size: Study 1: d = 0.82, d = 0.43 (0.82 out of 9).
    • Replication effect size: Wagenmakers et al.: 0.03 out of 9, CI overlapping 0. Coles et al.: A meta-analysis of 98 studies finds d = 0.2 [0.14, 0.26] with an extremely low p value and no evidence of publication bias. The latter point is hard to reconcile with the data: given d = 0.2 and the convention of targeting 80% power to detect a real phenomenon, a study would need a very large sample (n > 500), yet almost all of the included studies have N < 100. All reported in Coles et al.: Andréasson & Dimberg: _d_ =-0.22. Andréasson: _d_ =-0.05 to _d_ =0.49. Baumeister et al.: _d_ =0.63 to _d_ =1.26. Bodenhausen et al.: _d_ =0.55. Bush et al.: _d_ =0.16. Butler et al.: _d_ =-0.1 to _d_ =-0.83. Butler et al.: _d_ =-0.03. Cai et al.: _d_ =-0.08. Ceschi & Scherer: _d_ =0.74. Clapp: _d_ =0.08 to _d_ =0.69. Laird & Crosby: _d_ =-0.13 to _d_ =0.35. Davey et al.: _d_ =-0.25 to _d_ =0.73. Davis: _d_ =-0.19 to _d_ =0.99. Davis: _d_ =-0.19 to _d_ =0.87. Davis et al.: _d_ =0.07 to _d_ =0.51. Davis et al.: _d_ =-0.15 to _d_ =0.1. Davis et al.: _d_ =-0.16. Demaree et al.: _d_ =0.16 to _d_ =0.62. Demaree et al.: _d_ =-0.64 to _d_ =0.06. Dillon et al.: _d_ =0.11. Dimberg & Söderkvist: _d_ =0.1 to _d_ =0.51. Duncan & Laird: _d_ =0.38 to _d_ =0.51. Duncan & Laird: _d_ =0.44 to _d_ =0.59. Dzokoto et al.: _d_ =0.2 to _d_ =1.07. Flack, Laird & Cavallaro: _d_ =-0.49 to _d_ =1.31. Flack, Laird & Cavallaro: _d_ =0.29 to _d_ =1.41. Flack: _d_ =0.35 to _d_ =0.72. Gan et al.: _d_ =-0.11. Goldin et al.: _d_ =0.8. Gross & Levenson: _d_ =0.04 to _d_ =0.37. Gross: _d_ =-0.23 to _d_ =0.37. Gross: _d_ =0.18. Harris: _d_ =0.07. Hawk et al.: _d_ =0.85. Helt & Fein: _d_ =0.42. Hendricks & Buchanan: _d_ =-0.08. Hendricks: _d_ =0.02. Henry et al.: _d_ =-0.49 to _d_ =0.25. Henry et al.: _d_ =-0.05 to _d_ =0.53. Hess et al.: _d_ =-0.28 to _d_ =0.14. Hofmann et al.: _d_ =-0.03. Ito et al.: _d_ =-0.39 to _d_ =-0.25. Kalokerinos et al.: _d_ =-0.06 to _d_ =1.32. Kao et al.: _d_ =-0.67 to _d_ =0.98. Kircher et al.: _d_ =1.14 to _d_ =1.89. Korb et al.: _d_ =0.21.
Labott & Teleha: _d_ =0.04 to _d_ =0.91. Laird: _d_ =0.13 to _d_ =0.55. Lalot et al.: _d_ =-0.17. Larsen et al.: _d_ =0.43. Lee: _d_ =-0.27 to _d_ =0.48. Lewis & Bowler: _d_ =1.35. Lewis: _d_ =0.56 to _d_ =0.71. Ma: _d_ =-0.21. Maldonado et al.: _d_ =0.12. Marmolejo-Ramos & Dunn: _d_ =-0.07 to _d_ =0.38. Martijn et al.: _d_ =-0.24. McCanne & Anderson: _d_ =-2.16 to _d_ =4.73. McCaul et al.: _d_ =0.25. McIntosh et al.: _d_ =0.54. Meeten et al.: _d_ =0.49. Miyamoto: _d_ =0.49 to _d_ =0.17. Moore & Zoellner: _d_ =-0.87. Kappas: _d_ =0.08 to _d_ =0.74. Ohira & Kurono: _d_ =-1.38 to _d_ =1.23. Paredes et al.: _d_ =0.85. Paul et al.: _d_ =0.91. Pedder et al.: _d_ =0.07 to _d_ =0.22. Phillips et al.: _d_ =0.08 to _d_ =0.18. Reisenzein & Studtmann: _d_ =-0.08 to _d_ =0.034. Richards, Butler & Gross: _d_ =-0.12 to _d_ = 0.19. Richards & Gross: _d_ =-0.1 to d=0.36. Richards & Gross: _d_ =-0.12 to _d_ =0.39. Richards & Gross: _d_ =0.34. Roberts et al.: _d_ =0.07. Robinson & Demaree: _d_ =-0.04 to _d_ =0. Roemer: _d_ =0.29 to _d_ =0.58. Rohrmann et al.: _d_ =0.13 to _d_ =0.16. Rummer et al.: _d_ =0.46 to _d_ =0.57. Schmeichel , Vohs, & Baumeister: _d_ =-0.23. Schmeichel et al.: _d_ =0.1. Söderkvist & Dimberg (unpublished): _d_ =0.36. Söderkvist et al.: _d_ =0.17 to _d_ =0.34. Soussignan: _d_ =-0.17 to _d_ =1.11. Stel et al.: _d_ =1 to _d_ =1.11. Strack et al.: _d_ =-0.51 to _d_ =0.55. Tamir et al.: _d_ =-0.16. Tourangeau & Ellsworth: _d_ =0.3. Trent: _d_ =-0.22 to _d_ =-0.06. Vieillard et al.: _d_ =-0.12 to _d_ =0.66. Wagenmakers et al.: _d_ =-0.17 to _d_ =0.25. Wittmer: _d_ =-0.36 to _d_ =-0.21. Yartz: _d_ =-0.18 to _d_ =0.5. Zajonc et al.: _d_ =0.31 to _d_ =1.27. Zariffa et al.: _d_ =-0.57 to _d_ =-0.14. Zhu et al.: _d_ =1.74. Schimmack: strong evidence of publication bias on a subset of these papers, using a proper power analysis. Direct replications of the original pen-in-mouth protocol fail; but new conceptual replications appear to work. 
Coles et al.: effect is an absolute 5% increase in a happiness metric.
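
The sample-size argument in the Coles et al. note can be made concrete. The following is a minimal sketch, assuming a two-sided comparison of two independent groups at α = .05 and the usual normal approximation to the power of a t-test (an exact calculation based on the noncentral t distribution gives slightly larger numbers). The helper `n_per_group` is illustrative and is not taken from any of the cited papers.

```python
import math
from statistics import NormalDist


def n_per_group(d: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Approximate per-group n needed to detect a standardized mean
    difference d in a two-sided, two-sample test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for target power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


n = n_per_group(0.2)
print(f"per group: {n}, total: {2 * n}")
```

Under these assumptions, d = 0.2 requires 393 participants per group (786 in total), which is consistent with the "n > 500" figure quoted above and far beyond the N < 100 typical of the included studies.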

  • Positive affirmation on mood. Positive self-statements boost mood for people with high self-esteem and reduce mood for people with low self-esteem.

    • Status: not replicated
    • Original paper: ‘Positive Self-Statements: Power for Some, Peril for Others’, Wood et al. 2009; 3 experiments with Study 1: n = 249, Study 2: n = 68, Study 3: n = 116. [citation=294 (GS, February 2022)].
    • Critiques: Flynn and Bordieri 2020 [experiment: n = 462, citations=4(GS, February 2022)].
    • Original effect size: Study 1: not reported [g = 0.53]; Study 2: not reported [g = 1.00 calculated]; Study 3: not reported [g = 0.86, d= -0.74, d= 2.13, d = -0.49 calculated]. A meta-analysis combining the studies suggested that participants with high self-esteem did receive some benefit, Z = 2.51, p < .013, d = 0.66 (for participants with low self-esteem, Z = −3.21, p < .002, d = 0.72).
    • Replication effect size: Flynn and Bordieri: Study 1: not reported [g = 0.53 calculated]; Study 2: not reported [g = 1.00 calculated].

  • Mindfulness for mental health. Mindfulness, the practice of paying attention to the present moment in a non-judgemental way, is thought to have a beneficial effect on mental health outcomes, including but not limited to helping individuals reduce stress and anxiety and manage emotional states more effectively.

    • Status: mixed
    • Original paper: ‘An Outpatient Program in Behavioral Medicine for Chronic Pain Patients Based on the Practice of Mindfulness Meditation’, Kabat-Zinn, 1982; longitudinal study, n=51. [citations = 5726 (PubMed, January 2023)].
    • Critiques: Hoffmann & Witt 2010 [meta-analysis, k=39 studies, n=1,140, citations= 5337 (GS, May 2023)]. Khoury et al. 2013 [meta-analysis, k= 209 studies, n=12,145, citations= 2484 (GS, May 2023)]. Strauss et al., 2014 [meta-analysis, k= 19 studies, n=578, citations= 659 (GS, May 2023)]. Fumero et al. 2020 [n= 91, citations= 30 (PubMed, March 2023)]. Tao et al. 2022 [meta-analysis, k=7 trials, n=502, citations= 2 (GS, May 2023)]. Chayadi et al., 2022 [meta-analysis, k=36, n=1677, citations= 8 (GS, May 2023)]. Coronado-Montoya 2016 [n=357, citations= 21 (GS, March 2023)]. Britton 2019 [n= NA, citations= 169 (GS, March 2023)]. Hsiao et al. 2020 [n = NA, citations= 17(GS, May 2023)]. Kuyken et al. 2022 [n=8376, citations= 21 (PubMed, March, 2023)]. Sephton et al. 2007 [n= 91, citations= 495 (PubMed, March, 2023)].
    • Original effect size: prima facie, d = 0.3 for anxiety or depression.
    • Replication effect size: Fumero et al.: 75% (9/12) of reviews revealed a positive effect of mindfulness-based interventions (MBIs), comparing pre-post intervention anxiety scores and compared with a control group, SMD = 0.57 [0.22, 0.89]; for those with negative results, SMD = -0.27 [-0.52, 0.02]. The range of effect sizes for studies that yielded positive results for mindfulness in comparison to control or other intervention groups was diverse: 20% of studies exhibited a large range of effect size, 50% displayed a moderate range, and 30% demonstrated a small range. Khoury et al.: meta-analysis on pre-post comparisons revealed a mean effect size on anxiety for ten pre-post studies, Hedges’ g = .89 [.71, 1.08], p < .001. Hoffmann & Witt: average pre-post effect size estimate (Hedges’ g) based on 10 studies was 0.67 [0.47, 0.87], p < .01. Tao et al.: MBIs (MBSR: Mindfulness-based Stress Reduction, MBCT: Mindfulness-based Cognitive Therapy) had a positive effect on depression in poststroke depression patients compared with the control group (MBSR: n = 196, Hedges’ g = 0.49 [0.42, 0.56], p < 0.01; MBCT: n = 301, Hedges’ g = 0.85 [0.71, 1.00], p < 0.01). Chayadi et al.: medium effect sizes of MBIs on reducing anxiety (Hedges’ g = 0.56, SE = 0.107 [0.35, 0.77], p < 0.01) and depression (Hedges’ g = 0.43, SE = 0.059 [0.32, 0.55], p < 0.01) among cancer patients with fatigue. Strauss et al.: medium effect sizes of MBIs for depressive symptom severity (Hedges’ g = −0.73 [−0.09, −1.36]) but not for anxiety symptom severity (Hedges’ g = −0.55 [0.09, −1.18]) among cancer patients.
Coronado-Montoya: indications of possible reporting bias; 108 (87%) of 124 published trials reported ≥1 positive outcome in the abstract, and 109 (88%) concluded that mindfulness-based therapy was effective, 1.6 times greater than the expected number of positive trials based on effect size d = 0.55 (expected number positive trials = 65.7); of 21 trial registrations, 13 (62%) remained unpublished 30 months post-trial completion. Britton: ES=NA; a number of mindfulness-related processes—including, mindful attention (observing awareness, interoception), mindfulness qualities, mindful emotion regulation (prefrontal control, decentering, exposure, acceptance), and meditation practice—show signs of non-monotonicity, boundary conditions, or negative effects under certain conditions. Hsiao et al.: the effects of Mindfulness-Based Relapse Prevention for Substance Use Disorders were small-to-medium in Study 1 (d = .08 to .48) and were much smaller in Study 2 (d =.03 to .21). Kuyken et al.: no evidence of school-based mindfulness training being superior to teaching-as-usual at 1 year; standardised mean differences (intervention minus control) were: 0.005 [-0.05 to 0.06] for risk for depression; 0.02 [-0.02 to 0.07] for social-emotional-behavioural functioning; and 0.02 [-0.03 to 0.07] for well-being. Sephton et al.: The MBSR treatment significantly reduced basal electrodermal (skin conductance level; SCL) activity (t = 3.298, p = .005) and SCL activity during meditation (t = 4.389, p = .001), consistent with reduced basal sympathetic (SNS) activation among women with fibromyalgia.


Cognitive Psychology

  • Verbal overshadowing effect. In a series of six experiments, verbalising the appearance of previously seen visual stimuli impaired subsequent recognition performance.

    • Status: replicated
    • Original paper: ‘Verbal overshadowing of visual memories: Some things are better left unsaid’, Schooler and Engstler-Schooler 1990; experiment, n = 117 (study 4), n = 88 (study 1), n = 104 (study 2). [citations=1218 (GS, November 2022)].
    • Critiques: Experiment 1 and 4: Alogna 2014 [n=1105 (experiment 1), n = 663 (experiment 2), citations=192 (GS, November 2022)]. Meta-analysis.
    • Original effect size: Experiment 1: -22%, Experiment 2: -25%.
    • Replication effect size: Alogna: Experiment 1: −4.01% [−7.15%, −0.87%]. Experiment 2: −16.31% [−20.47%, −12.14%].

  • Age of acquisition effects - influence on free recall (pure block). Early-acquired items are recalled more accurately than late-acquired items when early-acquired items are presented in a separate block and late-acquired items are presented in a separate block.

  • Age of acquisition effects - influence on free recall (mixed block). Early-acquired items are recalled more accurately than late-acquired items when early-acquired items are mixed with late-acquired items in a block.

  • Age of acquisition effects - influence on recognition (mixed block). Early-acquired items are recalled more accurately than late-acquired items.

    • Status: reversed
    • Original paper: ‘Word imagery but not age of acquisition affects episodic memory’, Coltheart and Winograd 1986; experiment, Experiment 2: n = 102. [citations=44 (GS, November 2022)].
    • Critiques: Dewhurst et al. 1998 [Experiment 1: n=30, Experiment 2: n = 30; citations=117(GS, November 2022)]. Macmillan et al. 2022 [n = 44, citations = 9 (GS, November 2022)].
    • Original effect size: ηp² = .03 [ηp2 calculated from reported F statistic and converted using this conversion].
    • Replication effect size: Dewhurst et al.: Experiment 1: Hits: ηp² = 0.42 [ηp² calculated from reported F statistic and converted using this conversion], False alarms: F < 1, d’: ηp² = 0.31 [ηp² calculated from reported F statistic and converted using this conversion]; Experiment 2: Hits: _d_ = 0.74 [_d_ calculated from reported t statistic and converted using this conversion], False alarms: _d_ = 0.57 [_d_ calculated from reported t statistic and converted using this conversion], d’: _d_ = 0.09 [_d_ calculated from reported t statistic and converted using this conversion]. Macmillan et al.: Hits: _d_ = 0.023, False alarms: _d_ = 0.56, d’: _d_ = 0.44, C = 0.35, da = 0.65, slope = 0.25.
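
Several entries report effect sizes "calculated from reported F statistic" or "from reported t statistic". The linked conversion tool is not reproduced in this text, so the following is only a sketch of the standard textbook formulas, which may differ from the tool the contributors used: ηp² = (F · df_effect) / (F · df_effect + df_error), and, for two independent groups of equal size, d = 2t / √df. The function names and the input values are hypothetical and are not taken from any cited study.

```python
import math


def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def cohens_d_from_t(t_value: float, df: int) -> float:
    """Cohen's d from an independent-samples t statistic.

    Assumes two groups of equal size; within-subject designs
    need a different formula.
    """
    return 2 * t_value / math.sqrt(df)


# Hypothetical inputs, chosen only to illustrate the arithmetic.
eta_p2 = partial_eta_squared(f_value=4.0, df_effect=1, df_error=36)  # 4 / (4 + 36)
d = cohens_d_from_t(t_value=2.0, df=16)                              # 4 / 4

print(f"partial eta squared = {eta_p2:.2f}")
print(f"Cohen's d = {d:.2f}")
```

Because the between-subjects formula for d assumes equal group sizes, values converted from within-subject or unbalanced designs should be treated with more caution.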

  • Age of acquisition influences the pre-conceptual stages of lexical retrieval (progressive demasking). Early-acquired items are identified more accurately than late-acquired items, using a progressive demasking task. A progressive demasking task is a type of perceptual identification task where participants are presented with a series of words that are gradually revealed over time and their ability to identify words at each stage of the task is measured. Words learned at an earlier age are thought to be easier to demask than those learned later in life, perhaps because the individual has gained more experience and exposure to the word, which can make it easier to recognize.

    • Status: not replicated
    • Original paper: ‘Word age-of-acquisition and visual recognition threshold’, Gilhooly and Logie 1981a; experiments, Experiment 1: n = 36, Experiment 2: n = 18. [citations=32 (GS, December 2022)].
    • Critiques: Gilhooly and Logie 1981b [n = 16, citations = 101 (GS, December 2022)]. Ghyselinck et al. 2004 [n = 21, citations = 192 (GS, December 2022)]. Chen et al. 2009 [n = 30, citations = 28 (GS, December 2022)]. Ploetz and Yates 2016 [n = 64, citations = 1 (GS, December 2022)].
    • Original effect size: Experiment 1: Beta = 0.05; Experiment 2: Beta = 0.03.
    • Replication effect size: Gilhooly and Logie: Beta = 0.09. Ghyselinck et al.: ηp² = 0.58 [ηp² calculated from reported F statistic and converted using this conversion]. Chen et al.: ηp² = 0.27 [ηp² calculated from reported F statistic and converted using this conversion]. Ploetz and Yates: ηp² = .124.

  • Age of acquisition influence on the pre-conceptual stages of lexical retrieval (object decision). The age at which one acquires the concept of an object does not contribute to the speed and accuracy of deciding whether an object is a real object or an unreal object with chimeric features.

    • Status: not replicated
    • Original paper: ‘Age of acquisition, not word frequency, affects object naming, not object recognition’, Morrison et al. 1992; experiment, n = 20. [citations=495 (GS, December 2022)].
    • Critiques: Catling and Johnston 2009 [Experiment 2: n = 20; citations = 54 (GS, December 2022)]. Holmes and Ellis 2006 [Experiment 2: n = 20, Experiment 3: n = 20, Experiment 7: n = 46, citations = 87 (GS, December 2022)]. Moore et al. 2004 [Experiment 1: n = 39, Experiment 2: n = 38, citations = 79 (GS, December 2022)]. Vitkovitch and Tyrrell 1995 [n = 16, citations = 211 (GS, December 2022)].
    • Original effect size: Beta = .044.
    • Replication effect size: Catling and Johnston: ηp² = 0.18 [ηp² calculated from reported F statistic and converted using this conversion]. Holmes and Ellis: Experiment 2: d = 1.18 [d calculated from t statistic and converted using this conversion], Experiment 3: d = 1.44 [d calculated from t statistic and converted using this conversion], Experiment 7: ηp² = 0.38 [ηp² calculated from reported F statistic and converted using this conversion]. Moore et al.: ηp² = 0.27 [ηp² calculated from reported F statistic and converted using this conversion]. Vitkovitch and Tyrrell: Beta = .426.

  • Age of acquisition influence on the pre-conceptual stages of lexical retrieval (anagram solution). Age of acquisition is thought to affect lexical retrieval through its impact on anagram (word jumbles) solutions, such that words acquired at an earlier age tend to be solved more quickly and accurately in anagram tasks than those learned later in life. This may be because words learned earlier in life are more deeply encoded and may therefore be more easily accessed.

  • Age of acquisition influence on the pre-conceptual stages of lexical retrieval (visual duration threshold). Early-acquired items are identified more accurately than late-acquired items, using visual duration threshold task.

  • Age of acquisition influence on the pre-conceptual stages of lexical retrieval (category verification). The age at which one acquires an object does not contribute to the speed and accuracy of category verification during a semantic categorisation task (where objects have to be decided whether they represent one group or another, e.g. tools vs. furniture).

  • Age of acquisition influence on the pre-conceptual stages of lexical retrieval (category falsification). The age at which one acquires the name of an object does not contribute to the speed and accuracy of category falsification (i.e. deciding that a different word and the picture of the acquired concept do not match; e.g. the picture of the acquired concept of a rabbit, paired with the non-matching word “mouse”).

  • Age of acquisition influence on face recognition. Early-acquired faces are recognised more quickly and accurately than late-acquired faces.

  • Age of acquisition influence on face familiarity decision. Early-acquired faces are recognised as familiar faces more quickly than late-acquired faces when the task is to discriminate between familiar and unfamiliar faces.

  • Age of acquisition influence on face gender decision. The age at which a celebrity face is acquired does not affect the speed to recognise a celebrity’s face, using a gender decision task (is this face male or female?).

  • Age of acquisition influence on semantic decision. Early-acquired semantic concepts are categorised more quickly and accurately than later acquired concepts.

    • Status: replicated
    • Original paper: ‘Age-of-acquisition effects in semantic processing tasks’, Brysbaert et al. 2000; experimental design, Experiment 2: n = 36. [citations = 307 (GS, December 2022)].
    • Critiques: Bai et al. 2013 [Experiment 3: n = 32, citations = 6 (GS, December 2022)]. Chen et al. 2007 [Experiment 2: n = 28, citations = 43 (GS, December 2022)]. De Deyne and Storms 2007 [young adult: n = 21, older adult: n = 21, citations = 35 (GS, December 2022)]. Ghyselinck et al. 2004 [n = 20, citations = 192 (GS, December 2022)]. Izura and Hernandez-Munoz 2017 [first categorisation task: n = 30, second categorisation task: n = 26, citations = 1 (GS, December 2022)].
    • Original effect size: Experiment 2: ηp2 = 0.75 [ηp2 calculated from reported F statistic and converted using this conversion].
    • Replication effect size: Bai et al.: ηp2 = 0.10 [ηp2 calculated from reported F statistic and converted using this conversion]. Chen et al.: ηp2 = 0.57 [ηp2 calculated from reported F statistic and converted using this conversion]. De Deyne and Storms: young adult: beta = 11.68, older adult: beta = 4.09. Ghyselinck et al.: ηp2 = 0.47[ηp2 calculated from reported F statistic and converted using this conversion]. Izura and Hernandez-Munoz: first categorisation task: Beta = .256, second categorisation task: Beta = -0.017.

  • Age of acquisition influence on the conceptual stages of lexical retrieval in opaque languages (spoken picture naming in opaque language). Early-acquired objects are named more quickly and accurately than late-acquired objects in opaque languages or deep orthography (i.e. spelling-sound correspondence is not direct where one is able to pronounce the word correctly based on the spelling; e.g. English, French).

    • Status: replicated
    • Original paper: ‘Age-of-acquisition norms for 220 picturable nouns’, Carroll and White 1973; experiment, n = 62. [citations = 339 (GS, January 2023)].
    • Critiques: Alario et al. 2004 [n = 46, citations = 372 (GS, January 2023)]. Bonin et al. 2001 [n = 30, citations = 166 (GS, December 2022)]. Bonin et al. 2003 [n = 30, citations = 381 (GS, January 2023)]. Catling and Elsherif 2020 [Experiment 1b: n = 48, citations = 12 (GS, December 2022)]. Catling and Johnston 2009 [Experiment 4: n = 24, citations = 54 (GS, December 2022)]. Johnston et al. 2010 [n = 25, citations = 35 (GS, January 2023)]. Karimi and Diaz 2020 [n = 212, citations = 9 (GS, January 2023)]. Perret et al. 2014 [n = 21, citations = 42 (GS, December 2022)]. Schwitter et al. 2004 [n = 31, citations = 52 (GS, January 2023)]. Snodgrass and Yuditsky 1996 [n = 84, citations = 403 (GS, January 2023)].
    • Original effect size: ratings: r = -.771, objective: r = .773.
    • Replication effect size: Alario et al.: beta = 69.4. Bonin et al.: beta = .194. Bonin et al.: ηp2 = 0.81 [ηp2 calculated from reported F statistic and converted using this conversion]. Catling and Elsherif: Experiment 2b: d = 1.15 [d calculated from reported t statistic and converted using this conversion]. Catling and Johnston: Experiment 4: d = 0.45. Johnston et al.: beta = .341. Karimi and Diaz: beta = .072. Perret et al.: d = 0.82 [d calculated from reported t statistic and converted using this conversion]. Schwitter et al.: beta = .222. Snodgrass and Yuditsky: beta = .30.

  • Age of acquisition influence on the conceptual stages of lexical retrieval in logographic languages (spoken picture naming in logographic languages). Early-acquired names of objects are produced more quickly and accurately than late-acquired names in logographic languages such as Japanese and Chinese.

    • Status: replicated
    • Original paper: ‘Predictors of timed picture naming in Chinese’, Weekes et al. 2007; experiments, Experiment 1: n = 30, Experiment 2: n = 100. [citations = 78 (GS, December 2022)].
    • Critique: Liu et al. 2011 [n = 30, citations = 84 (GS, December 2022)].
    • Original effect size: Experiment 1: beta = .19, Experiment 2: beta = .24.
    • Replication effect size: Liu et al.: objective AoA: r = .591, rated AoA: r = .475.

  • Age of acquisition influence on the conceptual stages of lexical retrieval in transparent languages (spoken picture naming in transparent language). Early-acquired objects are named more quickly and accurately than late-acquired objects in transparent languages or shallow orthography (i.e. spelling-sound correspondence is direct where one is able to pronounce the word correctly based on the spelling; e.g. Spanish, Turkish, Italian).

  • Age of acquisition influence on the conceptual stages of lexical retrieval (written picture naming). Early-acquired object names are written more quickly and accurately than late-acquired names.

    • Status: replicated
    • Original paper: ‘Age of Acquisition and Word Frequency in Written Picture Naming’, Bonin et al. 2001; experiment, n = 30. [citations = 166 (GS, December 2022)].
    • Critiques: Bonin et al. 2002 [n = 72, citations = 257 (GS, December 2022)]. Catling and Elsherif 2020 [Experiment 2b: n = 48, citations = 12 (GS, December 2022)]. Perret et al. 2014 [n = 20, citations = 42 (GS, December 2022)].
    • Original effect size: ηp2 = 0.68 [ηp2 calculated from reported F statistic and converted using this conversion].
    • Replication effect size: Bonin et al.: Beta = 0.341. Catling and Elsherif: Experiment 2b: d = 0.80 [d calculated from reported t statistic and converted using this conversion]. Perret et al.: d = 0.79 [d calculated from reported t statistic and converted using this conversion].
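
Several entries report Cohen's d "calculated from reported t statistic". Which conversion applies depends on the design, and the list does not say which one was used for each study. As a hedged sketch, two common textbook conversions are shown below: d_z = t / √n for a paired (within-participant) t-test and d = t × √(1/n1 + 1/n2) for an independent-samples t-test. The function names and inputs are hypothetical and are not drawn from any listed study.

```python
import math


def d_from_paired_t(t_value: float, n: int) -> float:
    """Cohen's d_z for a paired / one-sample t-test: d_z = t / sqrt(n)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return t_value / math.sqrt(n)


def d_from_independent_t(t_value: float, n1: int, n2: int) -> float:
    """Cohen's d for an independent-samples t-test: d = t * sqrt(1/n1 + 1/n2)."""
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least 2 observations")
    return t_value * math.sqrt(1.0 / n1 + 1.0 / n2)


# Hypothetical examples (illustration only)
print(f"paired:      d_z = {d_from_paired_t(4.0, n=16):.2f}")
print(f"independent: d   = {d_from_independent_t(3.0, n1=18, n2=18):.2f}")
```

The two conversions can give quite different values for the same t, so a d derived this way is only comparable across studies when the design is taken into account.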

  • Age of acquisition influence on the conceptual stages of lexical retrieval (typing). Early-acquired object names are typed more quickly than late-acquired object names. Typing allows a more precise measure of response execution, whereas written picture naming is a measure of lexical retrieval.

    • Status: replicated
    • Original paper: ‘Naming times for the Snodgrass and Vanderwart pictures’, Snodgrass and Yuditsky 1996; experiment, Experiment 2: n = 96. [citations = 403 (GS, December 2022)].
    • Critiques: Scaltritti et al. 2016 [n = 86, citations = 26 (GS, December 2022)].
    • Original effect size: RT: beta = 0.19; accuracy: beta = -0.31.
    • Replication effect size: Scaltritti et al.: onset latency: d = 0.66 [d calculated from reported t statistic and converted using this conversion]; interkeystroke interval: not reported.

  • Age of acquisition influence on the post-lexical stages of lexical retrieval (delayed spoken picture naming). Early-acquired words should not differ from late-acquired words in terms of accuracy and response speed of spoken naming, when using a delayed picture naming task that requires participants to name a picture a few seconds (e.g. 2-4 sec) after seeing the actual picture. This task enables researchers to assess whether any naming delay arises at an articulatory level, as opposed to a conceptual level or lexical retrieval stage.

  • Age of acquisition influence on the post-lexical stages of lexical/sublexical retrieval (delayed spoken word naming). Early-acquired words should not differ from late-acquired words, when using delayed word naming. This enables researchers to assess whether the lexical/sublexical effects arise at an articulatory level.

  • Age of acquisition influence on lexical retrieval (written word naming). Early-acquired words are written and spelled more quickly and accurately than late-acquired words. In contrast to written picture naming, written word naming involves access to the lexical and sublexical pathways that are not accessed in typing or written picture naming.

  • Age of acquisition influence on lexical retrieval in opaque languages (immediate spoken word naming in opaque language). Early-acquired words are named more quickly and accurately than late-acquired words in opaque languages or deep orthography (i.e. spelling-sound correspondence is not direct where one is able to pronounce the word correctly based on the spelling; e.g. English, French).

  • Age of acquisition influence on lexical retrieval (immediate spoken word naming in transparent language). Early-acquired words are named more quickly and accurately than late-acquired words in transparent languages or shallow orthography (i.e. spelling-sound correspondence is direct where one is able to pronounce the word correctly based on the spelling; e.g. Italian, Spanish).

    • Status: mixed
    • Original paper: ‘Word Frequency Affects Naming Latency in Dutch when Age of Acquisition is Controlled’, Brysbaert 1996; experiment, n = 22. [citations = 73 (GS, January 2023)].
    • Critiques: Brysbaert et al. 2000 [n = 20, citations = 227 (GS, January 2023)]. Cuetos and Barbon 2006 [n = 53, citations = 96(GS, January 2023)]. De Luca et al. 2008 [n = 51, citations = 55(GS, January 2023)]. Ghyselinck et al. 2004 [n = 21, citations = 192(GS, January 2023)]. Raman 2006 [n = 28, citations = 69(GS, January 2023)]. Wilson et al. 2012 [Experiment 1: n = 40, Experiment 2: n = 32, citations = 25(GS, January 2023)]. Wilson et al. 2013 [Experiment 1: n = 27, Experiment 4: n = 33, citations = 37(GS, January 2023)].
    • Original effect size: beta = -0.58.
    • Replication effect size: Brysbaert et al.: ηp2= 0.30 [ηp2 calculated from reported F statistic and converted using this conversion]. Cuetos and Barbon: objective AoA r = .316, subjective AoA: r = .384. De Luca et al.: not reported. Ghyselinck et al.: ηp2 = 0.24 [ηp2 calculated from reported F statistic and converted using this conversion]. Raman: d = 0.48 [d calculated from reported t statistic and converted using this conversion]. Wilson et al.: Experiment 1: ηp2 = 0.07 [ηp2 calculated from reported F statistic and converted using this conversion], Experiment 2: ηp2 = 0.33 [ηp2 calculated from reported F statistic and converted using this conversion]. Wilson et al.: Experiment 1: ηp2 = 0.21 [ηp2 calculated from reported F statistic and converted using this conversion], Experiment 4: ηp2 = 0.48 [ηp2 calculated from reported F statistic and converted using this conversion].

  • Age of acquisition influence on lexical retrieval in logographic languages (spoken character naming in logographic languages). Early-acquired characters are named more quickly and accurately than late-acquired characters in logographic languages such as Japanese and Chinese.

  • Age of acquisition influence on speeded phonological retrieval (transparent language). Early-acquired words are responded to more quickly and accurately than late-acquired words, using a speeded naming paradigm, where participants must name the items as quickly as possible within a short timeframe (e.g. 400 milliseconds). This effect is argued to reduce the influence of semantics on phonological activation, which is argued to accumulate over the word naming process.

    • Status: replicated
    • Original paper: ‘Age-of-acquisition and frequency effects in speeded word naming’, Gerhand and Barry 1999; experiment, n = 30. [citations = 118 (GS, December 2022)].
    • Critiques: Ghyselinck et al. 2004 [n = 23, citations = 192 (GS, December 2022)]. Wilson et al. 2013 [Experiment 2: n = 35, citations = 37(GS, December 2022)].
    • Original effect size: ηp2 = 0.61[ηp2 calculated from reported F statistic and converted using this conversion].
    • Replication effect size: Ghyselinck et al.: ηp2= 0.25 [ηp2 calculated from reported F statistic and converted using this conversion]. Wilson et al.: ηp2= 0.09 [ηp2 calculated from reported F statistic and converted using this conversion].

  • Age of acquisition influence on lexical retrieval (auditory lexical decision task). Early-acquired words are heard and responded to more quickly and accurately than late-acquired words, using auditory lexical decision tasks where participants have to judge whether they heard a real word or not.

    • Status: replicated
    • Original paper: ‘Lexical Search Speed in Children and Adults’, Cirrin 1984; experimental study, kindergarten children: n = 12; first grade children: n = 11; third grade children: n = 11; adults: n = 11. [citations = 50(GS, December 2022)].
    • Critiques: Baumgaertner and Tompkins 1998 [n = 35, citations = 15 (GS, December 2022)]. Turner et al. 1998 [n = 20, citations = 161 (GS, December 2022)].
    • Original effect size: kindergarten children: r = .485, first grade children: r = .172, third grade children: r = .369, adults: r = .379
    • Replication effect size: Baumgaertner and Tompkins: r = 0.66. Turner et al.: d = 1.30 [d calculated from reported t statistic and converted using this conversion].

  • Age of acquisition influence on lexical retrieval (visual lexical decision in opaque languages). Early-acquired words are seen and responded to more quickly and accurately than late-acquired words in opaque languages or deep orthography (i.e. spelling-sound correspondence is not direct where one is able to pronounce the word correctly based on the spelling; e.g. English, French), using a visual lexical decision task. Participants have to decide whether they saw a word or not.

  • Age of acquisition influence on lexical retrieval (visual lexical decision in transparent language). Early-acquired words are responded to more quickly and accurately than late-acquired words in transparent languages or shallow orthography (i.e. spelling-sound correspondence is direct where one is able to pronounce the word correctly based on the spelling; e.g. Spanish, Turkish, Italian).

  • Age of acquisition influence on lexical retrieval (visual lexical decision in logographic languages). Early-acquired logograms are responded to more quickly and accurately than late-acquired logograms in logographic languages such as Chinese and Japanese, using a visual lexical decision task.

  • Age of acquisition influence on silent reading (eye-tracking). Early-acquired words show shorter fixations, gaze and total reading times than late-acquired words in sentences and paragraphs, using eye-tracking.

    • Status: replicated
    • Original paper: ‘Investigating the effects of a set of intercorrelated variables on eye fixation durations in reading’, Juhasz and Rayner 2003; experiment, n = 40. [citations = 311 (GS, December 2022)].
    • Critiques: Dirix and Duyck 2017 [n = 14, citations = 17 (GS, December 2022)]. Juhasz 2018 [n = 45, citations = 24 (GS, December 2022)]. Juhasz and Rayner 2006 [Experiment 1: n = 32, Experiment 2: n = 40, citations = 185 (GS, December 2022)]. Juhasz and Sheridan 2020 [n = 47, citations = 2 (GS, December 2022)].
    • Original effect size: First fixation: beta = 4.01, Single fixation: beta = 6.55, Gaze duration: beta = 6.62, Total duration: beta = 7.00.
    • Replication effect size: Dirix and Duyck: Single fixation: d = 0.95 [d calculated from reported t statistic and converted using this conversion], Gaze duration: d = 0.81 [d calculated from reported t statistic and converted using this conversion], total reading time: d = 0.71 [d calculated from reported t statistic and converted using this conversion]. Juhasz: First fixation: d = 0.27 [d calculated from reported t statistic and converted using this conversion], single fixation: d = 0.29 [d calculated from reported t statistic and converted using this conversion], gaze duration: d = 0.34 [d calculated from reported t statistic and converted using this conversion], total fixation: d = 0.37 [d calculated from reported t statistic and converted using this conversion]. Juhasz and Rayner: Experiment 1: First fixation: ηp2 = 0.29 [ηp2 calculated from reported F statistic and converted using this conversion], Single fixation: ηp2 = 0.23 [ηp2 calculated from reported F statistic and converted using this conversion], Gaze duration: ηp2 = 0.43 [ηp2 calculated from reported F statistic and converted using this conversion], Total duration: ηp2= 0.34 [ηp2 calculated from reported F statistic and converted using this conversion], Experiment 2: First fixation: d = 0.39 [d calculated from reported t statistic and converted using this conversion], Single fixation: d = 0.42 [d calculated from reported t statistic and converted using this conversion], Gaze duration: d = 0.35 [d calculated from reported t statistic and converted using this conversion], Total duration: d = 0.31 [d calculated from reported t statistic and converted using this conversion]; Juhasz and Sheridan: First fixation: d = 0.22, single fixation: d = 0.20, gaze duration: d = 0.20, total fixation: d = 0.27; skipping percentage: d = .17.

  • Age of acquisition influence on name retrieval. The earlier an individual learns a celebrity name and face, the more quickly and accurately the participant will name the celebrity.

    • Status: replicated
    • Original paper: ‘The Effect of Age of Acquisition on Speed and Accuracy of Naming Famous Faces’, Moore and Valentine 1998; experiments, Experiment 1: n = 30, Experiment 2: n = 24, Experiment 3: n = 24. [citations = 83 (GS, December 2022)].
    • Critiques: Smith-Spark et al. 2012 [Experiment: n = 72, citations = 3 (GS, December 2022)].
    • Original effect size: Experiment 1: RT: ηp2 = 0.37 [ηp2 calculated from reported F statistic and converted using this conversion], accuracy: ηp2 = 0.40 [ηp2 calculated from reported F statistic and converted using this conversion]; Experiment 2: RT: ηp2 = 0.63 [ηp2 calculated from reported F statistic and converted using this conversion], accuracy: ηp2 = 0.27 [ηp2 calculated from reported F statistic and converted using this conversion]; Experiment 3: RT: ηp2 = 0.38 [ηp2 calculated from reported F statistic and converted using this conversion], accuracy: ηp2 = 0.24 [ηp2 calculated from reported F statistic and converted using this conversion].
    • Replication effect size: Smith-Spark et al.: accuracy: ηp2= 0.13[ηp2 calculated from reported F statistic and converted using this conversion]; RT: ηp2= 0.43[ηp2 calculated from reported F statistic and converted using this conversion].

  • Age of acquisition influence on lexical-semantic processes (translation task). Compared to late-acquired words, early-acquired words in a native or other language are translated more quickly to the other language or the native language, respectively.

    • Status: replicated
    • Original paper: ‘Characteristics of words determining how easily they will be translated into a second language’, Murray 1986; experiment, n = 16 [citations= 12(GS, December 2022)].
    • Critiques: Izura and Ellis 2004 [Experiment 1: n = 20, Experiment 3: n = 20, citations = 102 (GS, December 2022)]. Bowers and Kennison 2011 [n = 36, citations = 19 (GS, December 2022)].
    • Original effect size: L1 AoA and translate to L1: r = .27, L1 AoA and translate to L2: r = .19.
    • Replication effect size: Izura and Ellis: Experiment 1: ηp² = 0.03 for L1 AoA [ηp2 calculated from reported F statistic and converted using this conversion], ηp² = 0.73 for L2 AoA [ηp2 calculated from reported F statistic and converted using this conversion]; Experiment 3: ηp² = 0.33 for L1 AoA [ηp2 calculated from reported F statistic and converted using this conversion], ηp² = 0.47 for L2 AoA [ηp2 calculated from reported F statistic and converted using this conversion]. Bowers and Kennison: ηp² = 0.90.

  • Age of acquisition influence on lexical-semantic processes (picture-word interference). The pictures of objects whose concept is acquired earlier show smaller semantic interference with simultaneously appearing semantically related words, compared to when the task is done using pictures of objects whose concept is acquired later.

  • Age of acquisition influence on lexical change. In contrast to the meaning of late-acquired words, the meaning of early-acquired words is less likely to change over time in the conceptual representation of the speaker and community.

  • Age of acquisition influence on learning (conceptual learning). The earlier a concept is learned, the more strongly it is consolidated and the more likely it is to be recalled.

    • Status: replicated
    • Original paper: ‘Order of acquisition in learning perceptual categories: A laboratory analogue of the age-of-acquisition effect?’, Stewart and Ellis 2008; experiment, n = 27. [citations = 35 (GS, December 2022)].
    • Critiques: Catling et al. 2013 [n = 16, citations = 19 (GS, December 2022)]. Izura et al. 2011 [Experiment 1: n = 25, Experiment 2: n = 26, Experiment 3: n = 24, citations = 78 (GS, December 2022)].
    • Original effect size: ηp2= 0.17 [ηp2 calculated from reported F statistic and converted using this conversion].
    • Replication effect size: Catling et al.: naming: ηp2= 0.23 [ηp2 calculated from reported F statistic and converted using this conversion], visual duration threshold ηp2= 0.27 [ηp2 calculated from reported F statistic and converted using this conversion]. Izura et al.: Experiment 1: ηp2= 0.49, Experiment 2: ηp2= 0.20, Experiment 3: delayed picture naming: d = 0.36 [d calculated from reported t statistic and converted using this conversion]/ ηp2= 0.12 [ ηp2 calculated from Cohen’s d and converted using this conversion], immediate picture naming: F < 1, corrected latencies: ηp2= 0.15, lexical decision: ηp2= 0.32, semantic categorisation: ηp2= 0.20.

  • Age of acquisition influence on learning (procedural). The order in which the new actions of a procedure are learned influences the speed and accuracy of recalling the correct position.

    • Status: replicated
    • Original paper: ‘Acquisition and long-term retention of a simple serial perceptual-motor skill’, Neumann and Amons 1956; experiment, n = 20. [citations = 56 (GS, December 2022)].
    • Critiques: Magil 1976 [n = 105, citations = 13 (GS, December 2022)].
    • Original effect size: ηp2 = 0.23 [ηp2 calculated from reported F statistic and converted using this conversion].
    • Replication effect size: Magil: ηp2= 0.06 for position 1 and 2 in block number 3 [ηp2 calculated from reported F statistic and converted using this conversion], ηp2= 0.06 for position 1 and 3 in block number 3 [ηp2 calculated from reported F statistic and converted using this conversion], F < 1 for position 2 and 3 in block number 3, ηp2= 0.05 for position 1 and 2 in block number 4 [ηp2 calculated from reported F statistic and converted using this conversion], ηp2= 0.11 for position 1 and 3 in block number 3 [ηp2 calculated from reported F statistic and converted using this conversion], ηp2= 0.01 for position 2 and 3 in block number 4.

  • Ego depletion. Self-control is a limited resource that can be depleted by efforts to inhibit a thought, emotion or behaviour.

    • Status: not replicated
    • Original paper: ‘Ego Depletion: Is the Active Self a Limited Resource?’, Baumeister 1998, n=67 [citations = 7141 (GS, September 2022)].
    • Critique: Xu et al. 2014, 4 conceptual replications with high-power to detect medium-large effects [citations = 7136 (GS, September 2022)]. Hagger 2016, 23 independent conceptual replications [citations = 1027 (GS, September 2022)]. Vohs et al. 2021, multisite project, n = 3,531 [citations = 63 (GS, September 2022)].
    • Original effect size: not reported (calculated d = -1.96 between control and worst condition).
    • Replication effect size: Xu et al. 2014: hand grip persistence, community adults d = −0.30, young adults d = −0.002, combined difference d = −0.20; Stroop interference, community adults d = −0.15, young adults d = .21, combined difference d = −0.06. Hagger 2016: d = 0.04 [−0.07, 0.14] (NB: not testing the construct the same way). Vohs et al. 2021: d = 0.06.
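
Where an original paper reports no effect size, entries such as this one give a d "calculated" between two conditions. The list does not state how that calculation was done. One common approach, sketched here under that assumption, is Cohen's d with a pooled standard deviation computed from each condition's mean, SD and sample size. All names and numbers below are hypothetical and do not come from any listed study.

```python
import math


def cohens_d_pooled(mean1: float, sd1: float, n1: int,
                    mean2: float, sd2: float, n2: int) -> float:
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least 2 observations")
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    if pooled_var <= 0:
        raise ValueError("pooled variance must be positive")
    return (mean1 - mean2) / math.sqrt(pooled_var)


# Hypothetical example (illustration only): group 1 scores lower than group 2.
d = cohens_d_pooled(mean1=10.0, sd1=2.0, n1=20, mean2=12.0, sd2=2.0, n2=20)
print(f"d = {d:.2f}")
```

The sign of d depends only on which condition is entered first, so negative values in this list indicate direction rather than anything unusual about the data.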

  • Dunning-Kruger effect. A cognitive bias whereby people with limited knowledge or competence in a given intellectual or social domain greatly overestimate their own knowledge or competence in that domain relative to objective criteria or to the performance of their peers or of people in general.

    • Status: replicated
    • Original paper: ‘Unskilled and unaware of it: how difficulties in recognizing one’s own incompetence lead to inflated self-assessments’, Kruger & Dunning 1999. This contains claims (1), (2), and (5) but no hint of (3) or (4) [n = 334 undergrads, citations = 8376 (GS, September 2022)].
    • Critiques: Gignac 2020 [n = 929, citations = 53 (GS, September 2022)]; Nuhfer 2016 and Nuhfer 2017 [n = 1154, citations = 34 (GS, September 2022)]; Luu 2015; Greenberg 2018, n = 534; Yarkoni 2010; Jansen 2021 [2 studies, n = 2000 each study, citations = 26 (GS, October 2022)]; Muller 2020 [n = 56, citations = 20 (GS, October 2022)].
    • Original effect size: not reported. Study 1 on humor (n = 15): difference between the actual and estimated performance of “incompetent” (bottom quartile) participants d = 2.58 [calculated], while for “competent” (top quartile) participants d = -0.55 [calculated]. Study 2 on logical reasoning (n = 45): difference between the actual and estimated performance of “incompetent” (bottom quartile) participants d = 5.44 (perceived logical reasoning ability) [calculated], d = 3.48 (test performance) [calculated], while for “competent” (top quartile) participants d = -1.12 [calculated], d = -0.79 (perceived test performance) [calculated]. Study 3 on grammar (n = 84): difference between the actual and estimated performance of “incompetent” (perceived bottom quartile) participants d = 3.42 (perceived ability) [calculated], d = 3.94 (perceived test performance) [calculated], while for “competent” (top quartile) participants d = -1.18 (perceived ability) [calculated], d = -1.27 (perceived test performance) [calculated].
    • Replication effect size: Gignac 2020 (for IQ): when using statistical analysis as in Kruger & Dunning 1999, η2 = 0.20, but running two less-confounded tests, r = −0.05/d = -0.1 [calculated] between P and errors, and r = 0.02/d = 0.04 [calculated] for a quadratic relationship between self-described performance and actual performance. Jansen 2021 (for grammar and logical reasoning): not reported (Bayesian models support the existence of the effect in the data and replicate claim 1). Muller 2020 (for recognition memory): the difference between the actual and estimated performance of “incompetent” (bottom quartile) participants d = 4.73 [calculated], while for “competent” (top quartile) participants d = -0.88 [calculated].
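
The Gignac entry pairs each correlation with a calculated d (r = −0.05 with d = −0.1, r = 0.02 with d = 0.04). The list does not say which conversion was used, but these pairs are consistent with the standard formula d = 2r / √(1 − r²). The sketch below assumes that formula; the function name is hypothetical.

```python
import math


def r_to_d(r: float) -> float:
    """Convert a correlation r to Cohen's d via d = 2r / sqrt(1 - r^2)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 2.0 * r / math.sqrt(1.0 - r * r)


# The two correlations quoted in the entry above.
for r in (-0.05, 0.02):
    print(f"r = {r:+.2f} -> d = {r_to_d(r):+.2f}")
```

For correlations this small, d is almost exactly 2r; the two measures diverge as |r| grows.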

  • Depressive realism effect. Increased predictive accuracy or decreased cognitive bias among the clinically depressed.

    • Status: reversed
    • Original paper: ‘Judgment of contingency in depressed and nondepressed students: sadder but wiser?’, Alloy and Abramson (1979): 4 experiments with Study 1: n1 = 48, n2 = 48, Study 2: n1 = 32, n2 = 32; Study 3: n1 = 32, n2 = 32; Study 4: n1 = 32, n2 = 32 [citations = 2855 (GS, June 2022)].
    • Critiques: Moore and Fresco 2012 [meta analysis, n = 7305, citations = 311 (GS, June 2022)]
    • Original effect size: not reported. [d= -0.32 calculated for bias about ‘contingency’, how much the outcome actually depends on what you do]
    • Replication effect size: Moore and Fresco 2012: d = -0.07.

  • Hungry judge effect. Judges’ acquittals are reported to be massively reduced just before lunch. Critics note that case order isn’t independent of acquittal probability (“unrepresented prisoners usually go last and are less likely to be granted parole”); that favourable cases may take predictably longer and so are pushed until after recess; that the effect size is implausible on priors; and that the proposed explanation involved ego depletion.

    • Status: NA
    • Original paper: ‘Extraneous factors in judicial decisions’, Danziger et al. 2011 [n = 8 judges, 1122 judicial rulings, citations = 1626 (GS, October 2022)].
    • Critiques: Weinshall-Margel 2011 [n= 227 decisions, citations= 79 (GS, October, 2022)], Glöckner 2016, Lakens 2017.
    • Original effect size: d= 1.96, “the probability of a favorable ruling steadily declines from ≈0.65 to [0.05] and jumps back up to ≈0.65 after a break for a meal”, n=8 judges with n=1122 cases.
    • Replication effect size: NA.

  • Multiple intelligences. This theory suggests that there are multiple types of intelligence that can be distinguished from one another, rather than a single general intelligence that underlies all cognitive abilities. Some of the proposed types of intelligence by Gardner are linguistic intelligence, logical-mathematical intelligence, musical intelligence, bodily-kinesthetic intelligence, interpersonal intelligence, intrapersonal intelligence, and naturalistic intelligence. More broadly, this theory can be taken to suggest people have different cognitive strengths and weaknesses.

    • Status: NA
    • Original paper: ‘Frames of Mind: The Theory of Multiple Intelligences’, Gardner 1983; book/theoretical work, n = NA. [citations = 45591 (GS, March 2022)].
    • Critiques: Shearer and Karanian 2017 [n = 172 neuroscience reports, citations = 92 (GS, February 2023)]. Sternberg 1994 [n = NA, citations = 103 (GS, February 2023)]. Tirri and Nokelainen 2008 [n = 410, citations = 92 (GS, February 2023)]. Visser et al. 2006 [n = 200, citations = 379 (GS, February 2023)]. Waterhouse 2006 [n = NA, citations = 103 (GS, February 2023)].
    • Original effect size: No empirical data collected [Allix, 2000; Lubinski & Benbow, 1995; Sternberg 1994; Waterhouse, 2006; Gardner acknowledged lack of empirical data, 2004 (p. 214)].
    • Replication effect size: Shearer and Karanian: NA; descriptive statistics, neural patterns consistent with Gardner’s hypothesis. Tirri and Nokelainen: NA; Confirmatory Factor Analysis; supports existence of logical-mathematical and spatial intelligences. Visser et al.: NA; Factor Analysis, modest support for Gardner. Strong loadings on g factor.

  • Brain training on intelligence - far transfer from daily computer training games to fluid intelligence. Transfer of knowledge and skills from daily computer training games to fluid intelligence in general, in particular from the Dual n-Back game.

    • Status: mixed
    • Original paper: ‘Improving fluid intelligence with training on working memory’, Jaeggi 2008; experimental design, n=70. [citations= 2840 (GS, October 2022)].
    • Critiques: Melby-Lervåg 2013 [meta-analysis of 23 studies, citations = 2156 (GS, October 2022)]. Gwern 2012 [meta-analysis of 45 studies, citations = NA (GS, April 2023)]. Redick 2013 [n = 73, citations = 824 (GS, October 2022)]. Lampit 2014 [meta-analysis of 52 studies, n = 4885, citations = 809 (GS, October 2022)]. Berger 2020 [n = 572, citations = 22 (GS, October 2022)]. Simons 2016 [comprehensive review of literature, n = NA, citations = 1015 (GS, October 2022)].
    • Original effect size: d= 0.4 over control, 1-2 days after training.
    • Replication effect size: Melby-Lervåg: d= 0.19 [0.03, 0.37] nonverbal; d= 0.13 [-0.09, 0.34] verbal. Gwern: d= 0.1397 [-0.0292, 0.3085], among studies using active controls. Reddick: found “no positive transfer to any of the cognitive ability tests”, all ηp2 < 0.054. Lampit : g=  0.24 [0.09, 0.38] nonverbal memory; g=  0.08 [0.01, 0.15] verbal memory; g =  0.22 [0.09, 0.35] working memory; g = 0.31 [0.11, 0.50] processing speed; g =  0.30 [0.07, 0.54] visuospatial skills. Berger(RCT in 6-7 year olds): d= 0.2 to 0.4, but many of the apparent far-transfer effects come only 6-12 months later, i.e. well past the end of most prior studies.

  • Brain training on intelligence - music lessons improve intelligence. An original experimental study found an increase in IQ for children who received a year of music lessons, compared to children who were randomly assigned to drama lessons or no lessons.

    • Status: not replicated
    • Original paper: ‘Music lessons enhance IQ’, Schellenberg 2004; randomised control trial, n=144. [citations = 1424 (GS, December 2021)].
    • Critiques: Mehr et al. 2013 [Study 1 n=29, Study 2 n=55, citations=52 (GS, December 2021)]. D’Souza & Wiseheart 2018 [n=75, citations=20 (GS, December 2021)].
    • Original effect size: d= 1.948.
    • Replication effect size: Mehr et al.: Wilks' λ = .851/ηp2 = 0.077 [calculated]. D’Souza & Wiseheart: for task switching: Bayes Factor (BF) inclusion= 1.964 (weak evidence); for processing speed: BF inclusion= 0.757 (box completion task), 0.243 (symbol copy task), 0.213 (symbol coding task) (weak evidence); for working memory: BF inclusion= 0.216 (digit span forward task), 0.138 (digit span backward task), 0.004 (self-ordered pointing task) (weak evidence); for interference control: BF inclusion= 0.137 (flanker task), 0.007 (Stroop task) (weak evidence); for nonverbal intelligence: BF inclusion= 0.778 (Peabody Picture Vocabulary Test) (weak evidence).

  • Bilingual advantages in executive control - inhibition. Speaking two languages improves general cognitive control processes (executive control).

    • Status: mixed
    • Original paper: ‘How does bilingualism improve executive control? A comparison of active and reactive inhibition mechanisms’, Colzato et al. 2008; 3 experiments, Study 1: n1 = 16 monolinguals and n2 = 16 bilinguals; Study 2: n1 = 12 bilinguals and n2 = 18 monolinguals; Study 3: n1 = 18 monolinguals and n2 = 18 bilinguals. [citations = 421 (GS, October 2021)].
    • Critiques: De Bruin et al. 2015 [meta-analysis, n=128, citations=547 (GS, May 2022)]. Gunnerud et al. 2020 [meta-analysis, n=143 independent group comparisons comprising 583 EF effect sizes, citations=102 (GS, December 2021)]. Kappes 2015 [Experiment 3: 38 bilingual, 40 monolingual, citations=0 (unpublished)]. Paap et al. 2013 [n=286, citations=1007 (GS, December 2021)]. Sanchez-Azanza et al. 2017 [systematic review, n=189, citations=38 (GS, May 2022)]. Bialystok et al. 2004 [n=40 in study 1, n=94 in study 2, n=20 in study 3, citations=2350 (GS, January 2023)].
    • Original effect size: r = .22 ± .48.
    • Replication effect size: De Bruin et al.: ηp2 = .073 (challenge vs. support), ηp2 = .089 (all 4 result outcomes). Gunnerud et al.: The bilingual advantage in overall EF was significant, albeit marginal (g = 0.06), and there were indications of publication bias. Kappes: r = .06 ± .36. Paap et al.: Inhibitory control (Simon task) ηp2=.69, Mixing cost η2=.52, Switching cost η2=.67. Sanchez-Azanza et al.: ηp2 = .363 (paper category), ηp2 = .281 (year), ηp2 = .155 (paper category and year interaction).

  • Bilingual advantages in executive control - Non-verbal task switching. The idea that bilingual language switching on a daily basis makes bilinguals better at general non-verbal task switching, compared to monolinguals who do not perform this extensive daily language switching.

    • Status: mixed
    • Original paper: ‘Bilingual language switching in naming: Asymmetrical costs of language selection’, Meuter and Allport 1999 (conceptual original article); within-group design, sample size = 16. [citations = 1557(GS, January 2023)]​.
    • Critiques: de Bruin et al. 2015 [n1 = 28, n2 = 24, n3 = 24, citations = 110(GS, January 2023)]. Paap and Greenberg 2013 [study 1: n1 = 30, n2 = 44; Study 2: n1 = 31; n2 = 49; study 3: n1 = 48; n2 = 51, citations = 1135(GS, January 2023)]. Prior and Macwhinney 2009 [n1 = 32, n2 = 47, citations = 782(GS, January 2023)]. Stasenko et al. 2017 [n1 = 80, n2 = 80, citations = 55(GS, January 2023)]. Timmermeister et al. 2020 [n1 = 27, n2 = 27, citations = 8(GS, January 2023)].
    • Original effect size: NA.
    • Replication effect size: de Bruin et al.: ηp2 (language group X trial type) = .74; ηp2 (raw switching costs) = .09; ηp2 (proportional switching) = ns; ηp2 (language group X trial type) = ns; (mixed). Paap and Greenberg: ηp2 (study 1) = .001; ηp2 (study 2) = .014; ηp2 (study 3) = .000; ηp2 (all bilingual vs. monolingual) = .004; (not replicated). Stasenko et al.: ηp2 (CTI) = .892; ηp2 (trial type) = .488; ηp2 (half) = .339; ηp2 (CTI X language group) = .037; ηp2 (CTI X half) = .259; ηp2 (CTI X trial type) = .079; ηp2 (CTI X trial type X half) = .025; ηp2 (trial type X half X group) = .044; d (language group in trials half 1) = .34; d (language group in trials half 2) = ns; ηp2 (CTI X group, on only switch trials) = .55; ηp2 (CTI X group, on only switch trials) = ns; ηp2 (CTI X half, on error rates for bilinguals only) = .059; (mixed). Timmermeister et al.: ηp2 (accuracy and switching costs) = 0.10; ηp2 (MANCOVA with the previous factors, and SES and knowledge of Dutch as covariates) = 0.03; ηp2 (RTs and mixing costs) = 0.13; ηp2 (MANCOVA with the previous factors, and SES and knowledge of Dutch as covariates) = 0.06; (not replicated).

  • Bilingual advantages - theory of mind. Bilingual children are more likely to score higher in Theory of Mind tasks than monolingual counterparts, using an unexpected transfer task.

    • Status: mixed
    • Original paper: ‘The effects of bilingualism on theory of mind development’, Goetz 2003; experiment, English monolinguals: n = 32, Mandarin monolinguals: n = 32, English-Mandarin bilinguals: n = 40. [citations = 486 (GS, January 2023)].
    • Critiques: Dahlgren et al. 2017 [Monolinguals: n = 14, bilinguals: n = 14, citations = 23 (GS, January 2023)]. Diaz and Farrar 2018 [Monolinguals: n = 33, Bilinguals: n = 32, citations = 23(GS, January 2023)]. Farhadian et al. 2010 [Monolinguals: n = 65, bilinguals: n = 98, citations = 61 (GS, January 2023)]. Gordon 2016 [Monolinguals: n = 26, bilinguals n = 26, citations = 24(GS, January, 2023)].
    • Original effect size: ηp2 = 0.06 [ηp2 calculated from reported F statistic and converted using this conversion].
    • Replication effect size: Dahlgren et al.: not reported. Diaz and Farrar: ηp2 = .063. Farhadian et al.: d = 0.40 [d calculated from mean differences and standard deviation and converted using this conversion]. Gordon: d = 0.123.

  • Bilingual advantages - perspective taking in referential communication. Bilingual children are more likely to score higher than their monolingual counterparts on the director task.

    • Status: replicated
    • Original paper: ‘The exposure advantage: Early exposure to a multilingual environment promotes effective communication’, Fan et al. 2015; experiment, English monolingual children: n = 24, monolingual children exposed to other languages: n = 24, bilingual children: n = 24. [citations = 260 (GS, January 2023)].
    • Critiques: Navarro and Conway 2021 [Monolingual adults: n = 26, bilinguals n = 28, citations=10(GS, January, 2023)].
    • Original effect size: bilingual vs. monolingual: d = 0.83, bilingual vs. monolingual exposed to other languages: d = 0.02.
    • Replication effect size: Navarro and Conway: director task experimental condition: d = -0.51, director task control condition: d = 0.29. Non-director task: ηp2 = .01.

  • Exposure to another language in social communication - perspective taking in referential communication. Children who are exposed to a second language are more likely to score higher on the director task than children who are not exposed to a second language.

    • Status: replicated
    • Original paper: ‘The exposure advantage: Early exposure to a multilingual environment promotes effective communication’, Fan et al. 2015; experiment, English monolingual children: n = 24, monolingual children exposed to other languages: n = 24, bilingual children: n = 24. [citations = 260 (GS, January 2023)].
    • Critiques: Agostini et al. 2022 [preprint, high exposure for monolingual children: n =32, lower exposure for monolingual children: n = 29, no exposure monolingual children: n = 38, citations=0 (GS, January, 2023)].
    • Original effect size: monolinguals exposed to other languages vs. monolingual: d = 0.74.
    • Replication effect size: Agostini et al.: T1: not reported, T2: not reported.

  • Bilingual disadvantages in creativity - fluency. Monolinguals are more likely to rapidly produce a large number of ideas or solutions to a problem than bilinguals, using the Torrance Test.

    • Status: mixed
    • Original paper: ‘An Intercultural Study of Non-Verbal Ideational Fluency’, Gowan and Torrance 1965; experiment, monolingual children: n = 853, bilingual children: n = 555. [citations=35(GS, January 2023)]​.
    • Critiques: Kharkhurin 2008 [bilingual adults: n =103, monolingual adults: n = 47, citations=163(GS, January 2023)]. Kharkhurin 2017 [bilingual adults: n =58, monolingual adults: n = 28, citations=27(GS, January 2023)]. Torrance et al. 1970 [monolingual children: n = 527, bilingual children: n = 536, citations=241(GS, January 2023)].
    • Original effect size: not reported.
    • Replication effect size: Kharkhurin: ηp2 = 0.07 [ηp2 calculated from reported F statistic and converted using this conversion]. Kharkhurin: not reported. Torrance et al.: d = 0.27 [d calculated from reported t statistic and converted using this conversion].

  • Monolingual advantages in creativity - Flexibility. Monolinguals are more likely to consider a variety of approaches to a problem simultaneously than bilinguals, using the Torrance Test.

    • Status: reversed
    • Original paper: ‘Creative functioning of monolingual and bilingual children in Singapore’, Torrance et al. 1970; experiment study design, monolingual children: n = 527, bilingual children: n = 536. [citations=241(GS, January 2023)]​.
    • Critiques: Kharkhurin 2008 [bilingual adults: n =103, monolingual adults: n = 47, citations=163(GS, January 2023)]. Kharkhurin 2017 [bilingual adults: n =58, monolingual adults: n = 28, citations=27(GS, January 2023)].
    • Original effect size: Torrance et al.: d = 0.20 [d calculated from reported t statistic and converted using this conversion].
    • Replication effect size: Kharkhurin: ηp2 = 0.04 [ηp2 calculated from reported F statistic and converted using this conversion]. Kharkhurin: ηp2 = 0.07.

  • Null bilingual advantages in creativity - Originality. There should be no difference between bilinguals and monolinguals in the tendency to produce ideas different from those of most other people, using the Torrance Test.

    • Status: replicated
    • Original paper: ‘Creative functioning of monolingual and bilingual children in Singapore’, Torrance et al. 1970; experiment study design, monolingual children: n = 527, bilingual children: n = 536. [citations=241(GS, January 2023)].
    • Critiques: Kharkhurin 2008 [bilingual adults: n =103, monolingual adults: n = 47, citations=163(GS, January 2023)]. Kharkhurin 2017 [bilingual adults: n =58, monolingual adults: n = 28, citations=27(GS, January 2023)].
    • Original effect size: Torrance et al.: d = 0.03 [d calculated from reported t statistic and converted using this conversion].
    • Replication effect size: Kharkhurin: not reported. Kharkhurin: not reported.

  • Null bilingual advantages in creativity - Elaboration. There should be no difference between bilinguals and monolinguals in the tendency to think through the details of an idea, using the Torrance Test.

    • Status: mixed
    • Original paper: ‘Creative functioning of monolingual and bilingual children in Singapore’, Torrance et al. 1970; experiment study design, monolingual children: n = 527, bilingual children: n = 536. [citations=241(GS, January 2023)].
    • Critiques: Kharkhurin 2008 [bilingual adults: n =103, monolingual adults: n = 47, citations=163(GS, January 2023)]. Kharkhurin 2017 [bilingual adults: n =58, monolingual adults: n = 28, citations=27(GS, January 2023)].
    • Original effect size: Torrance et al.: d = 0.06 [d calculated from reported t statistic and converted using this conversion].
    • Replication effect size: Kharkhurin: ηp2 = 0.01 [ηp2 calculated from reported F statistic and converted using this conversion]. Kharkhurin: not reported.

  • Mozart effect. Listening to Mozart’s sonata for two pianos in D major (KV 448) enhances performance on spatial tasks in standardised tests.

    • Status: not replicated
    • Original paper: ‘Music and spatial task performance’, Rauscher et al. 1993; experimental design, n=36. [citations= 2110 (GS, November 2021)].
    • Critiques: Pietschnig et al. 2010 [meta analysis: k=39, citations= 235 (GS, November 2021)]. Steele et al. 1999a [n=86, citations=555 (GS, November 2021)]. Steele et al. 1999b [n=206, citations=126 (GS, November 2021)].
    • Original effect size: d= 1.5 [0.65, 2.35].
    • Replication effect size: All reported in Pietschnig et al.: Adlmann: d = 0.57 [0.25, 0.89]. Carstens: Study 1: d = -0.22 [-0.89, 0.45]; Study 2: d = 0.47 [-0.23, 1.17]. Cooper: d = 0.42 [-0.23, 1.08]. Flohr: Study 1: d = 0.14 [-0.35, 0.63]; Study 2: d = 0.16 [-0.26, 0.58]. Gileta: Study 1: d = 0.13 [-0.26, 0.51]; Study 2: d = -0.05 [-0.43, 0.34]. Ivanov: d = 0.77 [0.20, 1.34]. Jones: d = 0.92 [0.27, 1.56]. Jones: d = 0.54 [0.11, 0.97]. Kenealy: d = -0.22 [-1.08, 0.64]. Knell: d = 0.45 [0.13, 0.77]. Lints: d = -0.37 [-0.75, 0.02]. McClure: d = 0.46 [-0.02, 0.95]. Nantais: Study 1: d = 0.77 [-0.07, 1.61]; Study 2: d = 0.06 [-0.72, 0.84]. Rauscher and Hayes: d = 0.52 [0.18, 0.86]. Rauscher and Ribar: Study 1: d = 1.81 [1.24, 2.37]; Study 2: d = 0.93 [0.46, 1.39]. Rideout: d = 1.54 [-0.67, 3.75]. Rideout: d = 1.01 [0.19, 1.82]. Rideout: d = 1.01 [-0.21, 2.23]. Rideout: d = 0.28 [-1.04, 1.60]. Siegel: d = 0.26 [-0.39, 0.91]. Spitzer: d = 0.01 [-0.32, 0.33]. Steele et al.: d = 0.85 [0.41, 1.30]. Steele, Dalla Bella, et al.: Study 1: d = 0.49 [-0.01, 1.00]; Study 2: d = -0.41 [-1.15, 0.33]. Steele, Dalla Bella, et al.: d = 0.85 [0.41, 1.30]. Steele, Brown and Stoecker: d = 0.20 [-0.08, 0.48]. Sweeny: Study 1: d = -0.43 [-0.93, 0.07]; Study 2: d = -0.06 [-0.56, 0.42]; Study 3: d = 0.14 [-0.37, 0.65]. Twomey: d = 0.63 [-0.01, 1.27]. Wells: d = -0.18 [-0.83, 0.47]. Wilson: d = 0.85 [-0.44, 2.13]. Pietschnig et al.: meta-analytic estimate: d = 0.37 [0.23, 0.52].
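
    For readers who want to see how a list of study-level d values and confidence intervals can be combined into one estimate, the following is a minimal sketch of inverse-variance (fixed-effect) pooling. It uses only five of the studies listed above, assumes the listed intervals are 95% CIs, and recovers each standard error from the interval width. It is an illustration only: it is not the model used by Pietschnig et al. and it will not reproduce their estimate. The names STUDIES, Z_95 and pool_fixed_effect are introduced here for illustration.

```python
import math

# (study label, d, CI lower, CI upper), copied from the list above.
# Only a subset is used; this is an illustration, not a re-analysis.
STUDIES = [
    ("Adlmann", 0.57, 0.25, 0.89),
    ("Cooper", 0.42, -0.23, 1.08),
    ("Ivanov", 0.77, 0.20, 1.34),
    ("Knell", 0.45, 0.13, 0.77),
    ("Spitzer", 0.01, -0.32, 0.33),
]

Z_95 = 1.96  # assumption: the listed intervals are 95% confidence intervals


def pool_fixed_effect(studies):
    """Return the inverse-variance pooled d and its 95% CI bounds."""
    total_weight = 0.0
    weighted_sum = 0.0
    for _label, d, lower, upper in studies:
        se = (upper - lower) / (2 * Z_95)  # standard error from CI width
        weight = 1.0 / se**2               # more precise studies weigh more
        total_weight += weight
        weighted_sum += weight * d
    pooled = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled - Z_95 * pooled_se, pooled + Z_95 * pooled_se


pooled, low, high = pool_fixed_effect(STUDIES)
print(f"pooled d = {pooled:.2f} [{low:.2f}, {high:.2f}]")
```

    A random-effects model, which allows the true effect to differ between studies, would usually be preferred for a heterogeneous literature like this one; the fixed-effect version is shown only because it is the shortest to write down.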

  • Education enhances intelligence. Education has a consistent positive effect on intelligence. A meta-analysis suggests that one additional year of education corresponds to a gain of approximately 1 to 5 IQ points (contingent on study design, inclusion of moderators, and publication-bias correction).

  • Automatic imitation. The observation of the topographical features of an action facilitates the execution of a similar action in the observer. Humans are prone to automatically imitate others. Automatic imitation differs from spatial compatibility effects and provides an important tool for the investigation of the mirror neuron system, motor mimicry, and complex forms of imitation.

    • Status: mixed
    • Original paper: ‘Evidence for visuomotor priming effect’, Craighero et al. 1996; visuomotor priming, n = 17. [citations = 219 (GS, June 2022)].
    • Critiques: Akzel 2012 [n=114, citations=13(GS, June 2022)]. Akzel 2015 [n=102, citations=7(GS, June 2022)]. Brass et al. 2000 [n1 = 8, n2 = 8, n3 = 8, citations = 885 (GS, June 2022)]. Meta-analysis: Cracco et al. 2018 [n=226 experiments, citations=134 (GS, June 2022)].
    • Original effect size: N/A.
    • Replication effect size: Akzel: n.s. Brass et al.: Experiment 1 ηp2 = 0.93, Experiment 2 ηp2 = 0.94, Experiment 3 ηp2 = 0.39 (n.s.) [all ηp2 calculated from reported F statistic and converted using this conversion]. Cracco et al.: gz = 0.95 [0.88, 1.02].

  • Congruency sequence effect (conflict adaptation or Gratton effect). A cognitive phenomenon in which the processing of a stimulus is affected by the stimuli that preceded it; e.g., congruency effects are smaller following incongruent trials than following congruent trials.

    • Status: mixed
    • Original paper: ‘Semantic priming and retrieval from lexical memory: Roles of inhibitionless spreading activation and limited-capacity attention’, Neely 1977; speeded word–nonword classification task, n = 120. [citation = 3963 (PSYCNET.APA, January 2023)].
    • Critiques: Aczel et al. 2021 [Kan et al. 2013 replication, n=103, 70 and 38 participants for Experiments 1, 2 and 3, citations=4(GS, Feb 2022)]. Gratton et al. 1992 [n1=6, n2=5, n3a=6, n3b= 8, citation = 2004 (GS, April 2023)]. Gyurkovics et al. 2020 [n=489 over four tasks, citations=3(GS, April 2023)]. Kan et al. 2013 [n = 41 in Experiment 1; n = 28 in Experiment 2; n = 15 in Experiment 3, citation=81(GS, February 2022)]
    • Original effect size: Greatest facilitation in Non-shift-Expected-Related word X target condition and greatest inhibition effects in Shift-Unexpected-Unrelated and Nonshift-Unexpected-Unrelated conditions; η2 = 0.689 (calculated from the reported F(4, 84) = 46.85, using this conversion).
    • Replication effect size: Aczel et al.: The congruency sequence effect for the RT analysis was inconclusive in all three experiments, ηp2 = 0.00 to 0.02 (calculated from the reported F statistic), and for the accuracy in two out of three experiments, ηp2 = 0.00 to 0.04 (calculated from the reported F statistic). Gratton et al.: compatible vs. incompatible trials, reaction time ηp2 = 0.88 to 0.94 (calculated from the reported F statistic), error rate ηp2 = 0.59 to 0.98 (calculated from the reported F statistic). Gyurkovics et al.: ηp2 = .40 to .96. Kan et al.: congruent vs. incongruent trials, Stroop accuracy ηp2 = 0.14 to 0.57 (calculated from the reported F statistic), Stroop reaction time ηp2 = 0.19 to 0.46 (calculated from the reported F statistic).
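
    Several entries in this collection report ηp2 values "calculated from the reported F statistic". The linked conversion is not reproduced on this page, so the following is a sketch under the assumption that it is the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). Applied to the F(4, 84) = 46.85 listed above for Neely 1977, it gives about 0.69, close to the listed value of 0.689. The function name partial_eta_squared is introduced here for illustration.

```python
def partial_eta_squared(f_stat: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared from an F statistic and its degrees of freedom.

    Uses the standard identity
        eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    Assumption: this is the conversion the entries refer to.
    """
    if f_stat < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df values positive")
    return (f_stat * df_effect) / (f_stat * df_effect + df_error)


# F(4, 84) = 46.85, as listed under "Original effect size" above.
neely_1977 = partial_eta_squared(46.85, df_effect=4, df_error=84)
print(f"eta_p^2 = {neely_1977:.3f}")
```

    The same function can be used to check any other entry for which the F value and both degrees of freedom are reported in the source paper.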

  • Action-sentence Compatibility Effect (ACE). Participants’ movements are faster when the direction of the described action (e.g., Mark dealt the cards to you) matches the response direction (e.g., toward).

    • Status: not replicated
    • Original paper: ‘Grounding language in action’, Glenberg and Kaschak 2002; experimental design, Experiment 1: n= 44, Experiment 2A: n= 70, Experiment 2B: n= 72. [citations= 2870 (GS, October, 2022)].
    • Critiques: Morey et al. 2022 [pre-registered multi-lab replication, 18 labs, n= 1278, citations= 30 (GS, October 2022)].
    • Original effect size: Experiment 1: ηp2= 0.186 [calculated]. Experiment 2A: ηp2 = 0.051 [calculated].
    • Replication effect size: Morey et al.: for native English speakers d= 0.0036; for non-native English speakers d= -0.019.

  • The attentional spatial-numerical association of response codes (Att-SNARC) effect. Participants detect left-side targets more quickly when they are preceded by small numbers and right-side targets more quickly when they are preceded by large numbers. This finding triggered many assumptions about number representations being grounded in bodily experience.

    • Status: mixed
    • Original paper: ‘The mental representation of parity and number magnitude’, Dehaene et al. 1993; 9 experiments of timed odd-even judgements investigated how parity and number magnitude were accessed from Arabic and verbal numerals, Experiment 1: n=20, Experiment 2: n=20, Experiment 3: n=12, Experiment 4: n=20, Experiment 5: n=10, Experiment 6: n=8, Experiment 7: n=20, Experiment 8: n=24, Experiment 9: n=24. [citations= 3233 (GS, January 2023)].
    • Critiques: Fischer et al. 2003 [n=15, citations= 857 (GS, January 2023)]. Colling et al. 2020 [n=1105 at 17 labs, citations= 34, (GS, January 2023)]. Wood et al. 2008 [n=46 studies (meta analysis), citations= 545, (GS, January 2023)].
    • Original effect size: NA.
    • Replication effect size: Fischer et al.: not reported. All reported in Colling et al.: The estimate for a 250 ms interstimulus-interval (ISI) condition [90% CI]: Fischer et al.: −5.00 ms [−12.48, 2.48]. Ansari: 1.22 ms [−1.74, 4.19]. Bryce: −0.25 ms [−3.20, 2.71]. Chen: −2.59 ms [−5.25, 0.06]. Cipora: 2.65 ms [−0.15, 5.44]. Colling (Szucs):−1.93 ms [−4.39, 0.54]. Corballis: −0.25 ms [−3.03, 2.53]. Hancock: 0.55 ms [−2.50, 3.61]. Holmes: −0.67 ms [−3.34, 2.00]. Lindemann: 0.13 ms [−3.33, 3.59]. Lukavský: −0.06 ms [−2.52, 2.40]. Mammarella: −1.66 ms [−3.95, 0.63]. Mieth: 1.01 ms [−1.30, 3.31]. Moeller: −0.34 ms [−3.32, 2.64]. Ocampo: −0.44 ms [−3.05, 2.18]. Ortiz-Tudela: 0.51 ms [−2.27, 3.28]. Toomarian: 0.37 ms [−2.35, 3.08]. Treccani: 0.38 ms [−2.70, 3.46]. Model 1 (No Moderators): −0.05 ms [−0.82, 0.71]. Model 2 (Consistent Right-Starter): 0.29 ms [−0.89, 1.47]. Model 2 (Consistent Left-Starter): 0.12 ms [−1.24, 1.48]. Model 3 (Left-to-Right): 0.10 ms [−0.87, 1.06]. Model 3 (Not Left-to-Right): −1.65 ms [−3.58, 0.28]. Model 4 (Left-Handed): −1.83 ms [−3.88, 0.22]. Model 4 (Right-Handed): −0.03 ms [−0.72, 0.66]; the estimate for a 500 ms interstimulus-interval (ISI) condition: Fischer et al.:18.00 ms [7.51, 28.49]. Ansari: 0.72 ms [−1.89, 3.32]. Bryce: −0.13 ms [−2.78, 2.52]. Chen: 2.79 ms [0.45, 5.12]. Cipora: 0.27 ms [−1.79, 2.33]. Colling (Szucs):−0.48 ms [−3.45, 2.49]. Corballis: 0.09 ms [−2.33, 2.52]. Hancock: 2.21 ms [−0.29, 4.71]. Holmes: 0.99 ms [−1.95, 3.94]. Lindemann: −1.56 ms [−5.31, 2.19]. Lukavský: −1.10 ms [−3.61, 1.40]. Mammarella: 1.54 ms [−0.08, 3.16]. Mieth: 4.19 ms [2.23, 6.14]. Moeller: 0.57 ms [−2.88, 4.01]. Ocampo: 3.88 ms [1.54, 6.23]. Ortiz-Tudela: −3.43 ms [−6.30, −0.55]. Toomarian: 3.16 ms [0.53, 5.80]. Treccani: −0.42 ms [−2.61, 1.77]. Model 1 (No Moderators): 1.06 ms [0.34, 1.78]. Model 2 (Consistent Right-Starter): 1.24 ms [0.15, 2.32]. Model 2 (Consistent Left-Starter): 0.18 ms [−1.03, 1.39]. 
Model 3 (Left-to-Right): 0.91 ms [−0.02, 1.83]. Model 3 (Not Left-to-Right): 2.21 ms [−0.27, 4.69]. Model 4 (Left-Handed): 1.69 ms [−0.28, 3.65]. Model 4 (Right-Handed): 0.95 ms [0.07, 1.84]; the estimate for a 750 ms interstimulus-interval (ISI) condition: Fischer et al.:23.00 ms [8.30, 37.70]. Ansari: −4.07 ms [−6.76, −1.37]. Bryce: −0.69 ms [−3.19, 1.82]. Chen: 0.08 ms [−2.56, 2.72]. Cipora: −1.58 ms [−3.68, 0.53]. Colling (Szucs):0.70 ms [−1.53, 2.94]. Corballis: 0.30 ms [−2.51, 3.11]. Hancock: −1.44 ms [−4.02, 1.14]. Holmes: 0.35 ms [−2.48, 3.19]. Lindemann: 2.45 ms [−0.43, 5.33]. Lukavský: 1.48 ms [−1.29, 4.24]. Mammarella: −0.60 ms [−2.47, 1.26]. Mieth: 0.61 ms [−1.17, 2.39]. Moeller: 0.66 ms [−1.57, 2.88]. Ocampo: 5.75 ms [3.44, 8.06]. Ortiz-Tudela: −1.73 ms [−4.93, 1.48]. Toomarian: 0.35 ms [−2.61, 3.31]. Treccani: −2.18 ms [−4.36, 0.01]. Model 1 (No Moderators): 0.19 ms [−0.53, 0.90]. Model 2 (Consistent Right-Starter): 0.13 ms [−0.97, 1.23]. Model 2 (Consistent Left-Starter): −0.03 ms [−1.23, 1.18]. Model 3 (Left-to-Right): 0.24 ms [−0.68, 1.17]. Model 3 (Not Left-to-Right): −2.25 ms [−4.31, −0.20]. Model 4 (Left-Handed): −1.92 ms [−4.03, 0.19]. Model 4 (Right-Handed): 0.24 ms [−0.84, 1.31]; the estimate for a 1,000 ms interstimulus-interval (ISI) condition: Fischer et al.:11.00 ms [1.47, 20.53]. Ansari: 1.22 ms [−1.03, 3.48]. Bryce: 0.53 ms [−1.90, 2.96]. Chen: −1.71 ms [−3.90, 0.49]. Cipora: −1.09 ms [−3.31, 1.12]. Colling (Szucs):2.48 ms [0.28, 4.68]. Corballis: 0.67 ms [−1.55, 2.89]. Hancock: −0.18 ms [−2.78, 2.42]. Holmes: 0.36 ms [−1.97, 2.69]. Lindemann: 2.06 ms [−0.83, 4.95]. Lukavský: −3.86 ms [−7.10, −0.63]. Mammarella: 1.42 ms [−0.34, 3.18]. Mieth: −0.57 ms [−2.66, 1.51]. Moeller: 0.97 ms [−2.31, 4.25]. Ocampo: −1.34 ms [−3.84, 1.15]. Ortiz-Tudela: −0.39 ms [−2.99, 2.21]. Toomarian: 2.44 ms [0.11, 4.76]. Treccani: −1.39 ms [−3.53, 0.74]. Model 1 (No Moderators): −1.27 ms [−3.29, 0.75]. Model 2 (Consistent Right-Starter): 0.12 ms [−1.12, 1.35]. 
Model 2 (Consistent Left-Starter): 0.42 ms [−0.71, 1.55]. Model 3 (Left-to-Right): 0.50 ms [−0.54, 1.54]. Model 3 (Not Left-to-Right): 0.29 ms [−0.62, 1.19]. Model 4 (Left-Handed): 0.18 ms [−0.51, 0.88]. Model 4 (Right-Handed): −2.51 ms [−4.59, −0.43]. Wood et al.: Pooled size of the SNARC effects - Parity d = -0.99; Magnitude classification (fixed standard) d = -1.04; Magnitude comparison (variable standard) d = -0.59; Tasks without semantic manipulation d = -0.60; bimanual response d = -0.79; eye saccades latency d = -1.20; eye saccade amplitudes d = -0.07; manual bisection d = -1.08; pointing RT d = -1.02; pointing MT d = -0.94; unimanual finger response d = -1.69; naming d = 0.09; foot response d = -1.59; grip aperture d = -3.29. All reported in Wood et al.: Shaki and Petrusic: intermixed adj. R2 = .45; negative blocked adj. R2 = .94; positive blocked adj. R2 = .94. Shaki et al.: adj. R2 = .92. Bachot et al.: control children adj. R2 = .42; VSD children adj. R2 = .24. Gevers et al.: adj. R2 = .82. Castronovo & Seron: blind participants adj. R2 = .92; sighted participants adj. R2 = .93. Nuerk et al.: adj. R2 = .96. Fischer and Rottmann: whole interval adj. R2 = .69; negative interval adj. R2 = 0.01. Bull et al.: deaf participants adj. R2 = .94; hearing participants adj. R2 = .60. Ito and Hatta: adj. R2 = .16. Bächthold et al.: ruler task adj. R2 = .96; clock-face task adj. R2 = .97.

  • Scarcity effect - Attention. Having too few resources leads individuals to misallocate attention, leading to consequences such as overborrowing. Study 1 examined whether scarcity causes greater cognitive fatigue, measured by poorer performance on a cognitive ability task.

    • Status: mixed
    • Original paper: ‘Some consequences of having too little’, Shah et al. 2012; 5 experiments with Study 1: n=60; Study 2: n=68; Study 3: n=143; Study 4: n=118; Study 5: n=137. [citations=1403 (GS, April 2022)].
    • Critiques: Camerer et al. 2018 [n=619, citations=855(GS, November 2021)]. O’Donnell et al. 2021 [n=668, citations=0(GS, November 2021)]. Shah et al. 2019 [n=997, citations=19(GS, November 2021)].
    • Original effect size: r = .267.
    • Replication effect size: Camerer et al.: r = -.015; O’Donnell et al.: r= -.039; Shah et al.: η2 = .004.

  • Scarcity effect - Meaning in life. Threats to people’s sense that they can afford the things they need in the present and foreseeable future undermine perceptions of meaning in life.

  • Scarcity effect - Discounting. A negative income shock was associated with increased discounting rates for gains and losses.

  • Scarcity effect - Physical pain. Higher economic insecurity is associated with greater physical pain.

  • Scarcity effect - Self expansion. Lower self-concept clarity (conceptualised as a finite resource) is associated with lower self-expansion.

  • Scarcity effect - Wellbeing. Imagining having less time available in one’s current city is positively associated with well-being.

  • Scarcity effect - Decision making. Lacking time or money can lead to making worse decisions.

    • Status: not replicated
    • Original paper: ‘Poverty impedes cognitive function’, Mani et al. 2013; experimental design, Study 4, n=96. [citation=2332(GS, November 2021)]​.
    • Critiques: O’Donnell et al. 2021 [n=229, citations=0(GS, November 2021)].
    • Original effect size: r= .205.
    • Replication effect size: r= .024.

  • Scarcity effect - Opportunity costs. Poor people are more likely to consider opportunity costs spontaneously.

  • Scarcity effect - Conscious thoughts. Thoughts triggered by financial concerns intrude more often into consciousness of poorer individuals than for wealthier individuals.​

    • Status: not replicated
    • Original paper: ‘Money in the mental lives of the poor’, Shah et al. 2018; experimental design, Study 3, n=568. [citation=78(GS, November 2021)]​.
    • Critiques: O’Donnell et al. 2021 [n=1334, citations=0(GS, November 2021)].
    • Original effect size: r = .111.
    • Replication effect size: r = .027.

  • Scarcity effect - Absoluteness of losses. Poorer individuals view losses in more absolute, rather than relative, terms than do wealthier individuals.​

    • Status: not replicated
    • Original paper: ‘Scarcity frames value’, Shah et al. 2015; experimental design, study 6, n=73. [citation=315(GS, November 2021)]​.
    • Critiques: O’Donnell et al. 2021 [n=209, citations=0(GS, November 2021)].
    • Original effect size: r= .264.
    • Replication effect size: r= .090.
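
    To make the size of the drop in the three entries above easier to compare, the following sketch expresses each replication correlation as a fraction of the original one. The r values are copied from the Decision making, Conscious thoughts and Absoluteness of losses entries; the names EFFECTS and retained_fraction are introduced here for illustration. A simple ratio ignores sampling error, so it should be read as a rough descriptive summary rather than a formal test of replication.

```python
# (effect, original r, replication r), copied from the three entries above
# that each report one original and one replication correlation.
EFFECTS = [
    ("Decision making", 0.205, 0.024),
    ("Conscious thoughts", 0.111, 0.027),
    ("Absoluteness of losses", 0.264, 0.090),
]


def retained_fraction(original_r: float, replication_r: float) -> float:
    """Replication effect expressed as a fraction of the original effect."""
    if original_r == 0:
        raise ValueError("original effect size must be non-zero")
    return replication_r / original_r


fractions = {name: retained_fraction(orig, rep) for name, orig, rep in EFFECTS}
for name, fraction in fractions.items():
    print(f"{name}: replication is {fraction:.0%} of the original r")
```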

  • Bottomless soup bowl. Visual cues related to portion size increase intake volume of soup.

  • Simon effect. Faster responses are observed when the stimulus and response are on the same side than when the stimulus and response are on opposite sides.

    • Status: mixed
    • Original paper: ‘Choice reaction time as a function of angular stimulus-response correspondence and age’, Simon and Wolf 1963; experimental design, n1 = 20, n2 = 20. [citation=289(GS, June 2022)]​.
    • Critiques: Ehrenstein 1994 [n1=12, n2=14, citations=27(GS, June 2022)]. Marble and Proctor 2000 [n1=48, n2=20, n3=32, n4=80, citations=89(GS, June 2022)]. Proctor et al. 2000 [n1=64, n2=64, citations=74(GS, June 2022)]. Theeuwes et al. 2014 [n1=30, n2=30, n3=30, n4=30, citations=30(GS, June 2022)].
    • Original effect size: not reported but could be calculated.
    • Replication effect size: Ehrenstein: not reported but could be calculated. Marble and Proctor: not reported but could be calculated. Proctor et al.: not reported but could be calculated. Theeuwes et al.: ηp2 (the compatible S-R instructions condition vs. the incompatible S-R instructions condition) = .12; ηp2 (the compatible S-R instructions condition vs. the incompatible practised S-R instructions condition) = .07; ηp2 (the incompatible S-R instructions condition vs. the compatible S-R instructions condition) = .21; ηp2 (the incompatible practised S-R instructions condition vs. the compatible S-R instructions condition) = .11.

  • ERPs in lie detection. In the literature using Guilty Knowledge Tests, the P300 ERP component in particular has been linked to the conscious recognition of crime-related targets as meaningful and salient stimuli, based on crime-related episodic memories.

    • Status: mixed
    • Original paper: ‘Late Vertex Positivity in Event-Related Potentials as a Guilty Knowledge Indicator: A New Method of Lie Detection’, Rosenfeld et al. 1987; experimental design, n1=10, n2=6. [citation=126(GS, May 2022)]​.
    • Critiques: Abootalebi et al. 2006 [n=62, citations=159(GS, May 2022)]. Bergström et al. 2013 [n1=24, n2=24; citations=61(GS, May 2022)]. Mertens & Allen 2008 [n=79, citations=187(GS, May 2022)]. Rosenfeld et al. 2004 [n-ex1=33; n-ex2.1=12, n-ex2.2=10, citations=419(GS, May 2022)]. Wang et al. 2016 [n=28, citations=61(GS, May 2022)].
    • Original effect size: N/A.
    • Replication effect size: Abootalebi et al.: not reported but could be calculated. Bergström et al.: d = 2.89 (effort in uncooperative recall suppression); d = 2.28 (success in uncooperative recall suppression); partial η2 = 0.20 (experiment 1 - voluntary modulations of P300); partial η2 = 0.31 (experiment 2 - voluntary modulations of P300); d = 0.48 and d = 0.31 (experiment 1 - cooperative phase); d = 0.03 (experiment 1 - uncooperative phase); d = 0.14 (experiment 1 - innocent phase); d = 0.77 (experiment 1: targets vs. probes - innocent phase); d = 0.71 (experiment 1: targets vs. probes - uncooperative phase); d = 1.03 and d = 0.48 (experiment 2 - cooperative phase); d = 0.48 and d = 0.99 (experiment 2 - uncooperative phase); d = 1.81 (experiment 2 - innocent phase); d = 0.50 (experiment 1: cooperative vs. uncooperative); d = 0.52 (experiment 2: cooperative vs. uncooperative); d = 0.07 (experiment 1: uncooperative vs. innocent); d = 0.57 (experiment 2: uncooperative vs. innocent); d < 0.17 (targets vs. irrelevants for experiment 1 and 2). Mertens and Allen: not reported but could be calculated. Rosenfeld et al.: not reported but could be calculated. Wang et al.: not reported but could be calculated.

  • Evaluative conditioning. Implicit and explicit attitudes are sensitive to different kinds of information. Explicit attitudes are formed and changed in response to the valence of consciously accessible, verbally presented behavioural information, whereas implicit attitudes are formed and changed in response to the valence of subliminally presented primes.

    • Status: mixed
    • Original paper: ‘Of Two Minds: Forming and Changing Valence-Inconsistent Implicit and Explicit Attitudes’, Rydell et al. 2006; mixed design experiment with n=50. [citation=403(GS, November 2022)]​.
    • Critiques: Heycke et al. 2018 [n1=51, n2=57, citations=32(GS, November 2022)].
    • Original effect size: Explicit attitudes: two-way interaction between condition and time η2 = 0.71 [reported] / d= 1.54 [converted using this conversion]​; Implicit attitudes: two-way interaction between condition and time η2 = 0.13 [reported] d= 0.38 [converted using this conversion].
    • Replication effect size: Heycke et al.: Explicit attitudes: time of measurement X valence condition – Experiment 1: η2 = 0.757 [reported] d = 1.75 [converted using this conversion] (replicated); Experiment 2: η2 = 0.828 [reported] d = 2.17 [converted using this conversion] (replicated); Implicit attitudes: 2-way interaction of time of measurement and condition – Experiment 1: η2 = 0.075 [reported] d = 0.28 [converted using this conversion] (reversed); Experiment 2: η2 = 0.102 [reported] d = 0.33 [converted using this conversion] (reversed).

  • Bilingual deficit in lexical retrieval. Compared to monolinguals, bilinguals have often been found to be slower or less accurate in accessing the meaning of a certain word or the word for a certain representation under certain conditions. ​

    • Status: mixed
    • Original paper: ‘Memory in a monolingual mode: When are bilinguals at a disadvantage?’, Ransdell and Fischler, 1987; between-group multi-experiment study, with monolingual and bilingual young adults, n1 = 28, n2 = 28. [citations=216(GS, May 2022)]​.
    • Critiques: Bialystok et al. 2007 [study 1: n1=24, n2=24; study 2: n1=50, n2=16, citations=338(GS, May 2022)]. Gollan et al. 2002 [n1=30, n2=30, citations=584(GS, May 2022)]. Gollan et al. 2005 [study 1: n1=31, n2=31; study 2: n1=36, n2=36, citations=665(GS, May 2022)]. Rosselli et al. 2000 [n1=45, n2=18, n3=19, citations=341(GS, May 2022)]. Rosselli et al. 2002 [n1=45, n2=18, n3=19, citations=151(GS, May 2022)].
    • Original effect size: not reported but could be calculated.
    • Replication effect size: Bialystok et al. 2007: not reported but could be calculated. Gollan et al. 2002: not reported but could be calculated. Gollan et al. 2005: not reported but could be calculated. Rosselli et al. 2000: not reported but could be calculated. Rosselli et al. 2002: not reported but could be calculated.

  • Nostalgia as a positive emotional experience. A predominantly positive, albeit bittersweet, emotion that arises from personally relevant and longed-for memories of one’s past. Nostalgia was once considered a disease or mental illness, but it has been shown to counteract loneliness, boredom and anxiety.

    • Status: replicated.
    • Original paper: ‘Nostalgia: A Psychological Perspective’, Batcho 1995; Cross-sectional survey to assess nostalgia for 20 aspects of experience, n=648. [citations=399(GS, February 2023)]​.
    • Critiques: Wildschut et al. 2006 [Total N=504 over seven studies, citations=1460(GS, February 2023)].
    • Original effect size: Factor analysis suggested that nostalgia is composed of five factors reflecting different spheres and levels of experience. ES not reported, although the regression coefficient for nostalgia on judgement of the past was positive (0.22, p < .0001), suggesting that nostalgia increases as the past is perceived more favourably.
    • Replication effect size: Wildschut et al.: Nostalgic autobiographical narratives were richer in expressions of positive than negative affect, ηp2 = 0.783 [calculated from the reported F statistic, F(1, 41) = 147.62, using this conversion]; participants expressed significantly more positive than negative affect when describing how writing a nostalgic narrative made them feel, ηp2 = 0.535 [calculated from the reported F statistic, F(1, 171) = 196.56, using this conversion], and reported more positive than negative affect on the PANAS measure, ηp2 = 0.633 [calculated from the reported F statistic, F(1, 171) = 294.61, using this conversion]; relative to participants in the control condition, those in the nostalgia condition scored higher on measures of social bonding, ηp2 = 0.205 [calculated from the reported F statistic, F(1, 50) = 12.88, using this conversion], positive self-regard, ηp2 = 0.238 [calculated from the reported F statistic, F(1, 50) = 15.63, using this conversion], and positive affect, ηp2 = 0.139 [calculated from the reported F statistic, F(1, 50) = 8.05, using this conversion].
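
Several entries in this collection report a partial eta squared that was "calculated from the reported F statistic". The page links to a conversion tool that is not reproduced here, so the following is only a minimal sketch, assuming the tool applies the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The function name and the dictionary labels are illustrative, not part of the original resource; the F values and degrees of freedom are the ones quoted in the Wildschut et al. entry above.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Convert a reported F statistic into partial eta squared.

    Assumes the standard identity
    eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    """
    if df_effect <= 0 or df_error <= 0:
        raise ValueError("degrees of freedom must be positive")
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# (F, df_effect, df_error) as quoted in the Wildschut et al. entry.
# The keys are shorthand labels invented for this sketch.
reported = {
    "narrative affect": (147.62, 1, 41),
    "felt affect": (196.56, 1, 171),
    "PANAS affect": (294.61, 1, 171),
    "social bonding": (12.88, 1, 50),
    "positive self-regard": (15.63, 1, 50),
    "positive affect": (8.05, 1, 50),
}

converted = {
    label: round(partial_eta_squared(f, df1, df2), 3)
    for label, (f, df1, df2) in reported.items()
}

for label, value in converted.items():
    print(f"{label}: partial eta squared = {value:.3f}")
```

Under that assumption the sketch reproduces the six values listed in the entry (0.783, 0.535, 0.633, 0.205, 0.238, 0.139), which is a quick way to check a converted effect size before contributing it.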

  • Spacing effect. Long-term memory is enhanced when learning events are spaced apart in time rather than massed in immediate succession.​

    • Status: replicated
    • Original paper: ‘Memory: A contribution to experimental psychology’, Ebbinghaus 1964; series of single-case studies, n=1. [citations=6103 (GS, September 2022)].
    • Critiques: Cepeda et al. 2006, meta-analysis [n= 184 articles, citations=1894 (GS, September 2022)]. Janiszewski et al. 2003, meta-analysis [n= 97 verbal learning studies, citations= 373 (GS, September 2022)].
    • Original effect size: N/A.
    • Replication effect size: Cepeda et al.: Cohen’s d for the difference in accuracy between massed and spaced learning trials in verbal recall tasks = 0.567 (calculated). Janiszewski et al.: ηp2 = 0.093 (calculated from the reported F(1, 478) = 49.23, p < .01 using this conversion) for a linear relationship between the number of lags between learning events and the accuracy of recall; ηp2 = 0.051 for the log relationship (calculated from the reported F(1, 478) = 25.69, p < .01 using this conversion).

  • False memories - eyewitness testimony. A phenomenon of recalling a real event that differs from what actually happened or an event that never occurred.

  • Context-dependent memories. The improved recall or recognition of information when cues in the environment are the same during both encoding and retrieval.

    • Status: mixed (replicated, but smaller effect-size).
    • Original paper: ‘Context-dependent memory in two natural environments: On land and underwater’, Godden and Baddeley 1975; experimental design, Experiment 1 n = 18, Experiment 2 n = 16. [citation = 2,447 (GS, October 2022)]​.
    • Critiques: Godden and Baddeley 1980 [n = 16, citations = 449 (GS, October 2022)]. Isarida et al. 2012 [Experiment 1 n = 80, citations = 24 (GS, October 2022)]. Martin and Aggleton 1993 [n = 40, citations = 42 (GS, October 2022)]. Murre 2021 [n = 16, citations = 3 (GS, October 2022)]. Smith and Vila 2001 [meta-analysis; k = 93 studies, citations = 1,046 (GS, October 2022)].
    • Original effect size: Not reported in the paper, but can be estimated from the test-statistics. The original effect size was dz = 1.35. This is calculated using the test-statistics provided: F(1, 12) = 22.0, p < .001.
    • Replication effect size: Godden and Baddeley: NA. This study found no significant difference in recognition performance across contexts (not replicated). Isarida et al.: ηp2 = .05 (replicated). Martin and Aggleton: d = 0.69 (estimated from test-statistics) (replicated). Murre: d = 0.37 (estimated from test-statistics); the result was non-significant (p > .050), but the effect size is similar to that of meta-analyses (mixed). Smith and Vila: d = 0.28 [0.23, 0.33] overall; Recall: d = 0.29 [0.21, 0.37], Recognition: d = 0.27 [0.18, 0.36] (replicated).

  • Motor priming. Motor priming refers to the phenomenon where a previous motor action influences the subsequent execution of a motor task. Scientific findings have shown that motor priming can have a moderate to large effect on task performance. The effect size can depend on the specific task being used, the population being studied, and the experimental design.

    • Status: mixed
    • Original paper: ‘A priming method for investigating the selection of motor responses’, Rosenbaum and Kornblum 1982; experimental design, n=6. [citations=227 (GS, March 2023)].
    • Critiques: da Silva et al. 2020 [n=814 (36 articles, meta-analysis), citations = 10 (PubMed, January 2023)]. Kiesel et al. 2007 [Theoretical paper, n=NA, citations=138 (GS, March 2023)]. Stoykov and Madhavan 2015 [Review, n=NA, citations=148 (GS, March 2023)].
    • Original effect size: not reported.
    • Replication effect size: da Silva: Mean Difference = 8.64 [10.85, 16.43], Z = 2.17, p = .003, d=0.30 (estimated from the Z value using d=Z/sqrt(n) equation). Kiesel et al.: not reported. Stoykov and Madhavan: not reported.​

  • Flanker task. The Flanker task is a measure of inhibition of prepotent responses. Response times to target stimuli flanked by irrelevant stimuli of the opposite response set (incongruent) are significantly more impaired than when they are flanked by irrelevant stimuli of the same response set (congruent).

    • Status: replicated
    • Original paper: ‘Effects of noise letters upon the identification of a target letter in a non-search task’, Eriksen and Eriksen 1974; within-subject design, n=6. [citations= 8085 (GS, August 2022)]​.
    • Critiques: Miller 1991 [Experiment 1: n=36, Experiment 2: n=42, Experiment 3: n=24, Experiment 4: n=32, Experiment 5: n=32, Experiment 6: n=32, citations=370 (GS, August 2022)].
    • Original effect size: Spacing condition: ES = 2.96, Noise condition: ES = 2.09.
    • Replication effect size: Miller: For the noise condition only (i.e., response compatible/incompatible): Reaction times: ES: Experiment 1 = 0.89, Experiment 2 = 0.23, Experiment 3 = 0.74, Experiment 4 = 0.58, Experiment 5 = 1.28, Experiment 6 = 0.47; Percent Accurate: ES: Experiment 1 = 0.39, Experiment 2 = 0.23, Experiment 4 = 0.72, Experiment 5 = 0.83, Experiment 6 = 0.35; For the spacing condition: Experiment 1, wide separation, ES = 0.40.

  • Mere Exposure Effect. Participants who are repeatedly exposed to the same stimuli rate them more positively than stimuli that have not been presented before.

    • Status: replicated
    • Original paper: ‘Attitudinal effects of mere exposure’, Zajonc 1968; correlational and experimental evidence, n=NA. [citation=9458(GS, February 2022)].
    • Critiques: Bornstein 1989 [Meta-analysis, total N = 33047, citation=2944(GS, February 2022)].
    • Original effect size: Experiment 1, Nonsense words, ηp2 = 0.078 [ηp2 calculated from reported F(5,355) = 5.64, p < .001 using this conversion] ; Experiment 2, Chinese characters ηp2 = 0.066 [ηp2 calculated from reported F(5, 335) = 4.72, p < .001 using this conversion]; Experiment 3, Photographs ηp2 = 0.129 [ηp2 calculated from reported F(5, 355) = 9.96, p < .001 using this conversion].
    • Replication effect size: Combined effect size r = .260.

  • Cocktail Party Effect. Participants hear their own name being presented in the irrelevant message during a dichotic listening task.

  • Mental simulation - mismatch advantage: object colour. Readers verify pictures more quickly when they match rather than mismatch the object colour from the preceding sentence.

  • Mental simulation - match advantage object orientation. Readers verify pictures more quickly when they match rather than mismatch the object orientation from the preceding sentence.

    • Status: replicated
    • Original paper: ‘The effect of implied orientation derived from verbal context on picture recognition’, Stanfield and Zwaan 2001; experimental design, n=40. [citation=897(GS, November 2022)]​.
    • Critiques: de Koning et al. 2017 [n = 160, citation = 14(GS, November, 2022)]. Rommers et al. 2013 [Experiment 1: n = 52, Experiment 2: n = 44, Experiment 3: n = 88, citation = 48(GS, November 2022)]. Zwaan and Pecher 2012 [Experiment 1a: n=176; Experiment 1b: n=176, citations=192 (GS, November 2022)].​
    • Original effect size: d = .13.
    • Replication effect size: de Koning et al.: d = .07. Rommers et al.: Experiment 1: d = .14, Experiment 2: d = .12, Experiment 3: d = .14 [calculated using this conversion]. Zwaan and Pecher: Experiment 1a: d = .10; Experiment 1b: d = .09.​

  • Mental simulation - match advantage object distance. Readers verify small pictures more quickly when they are far from the protagonist, in contrast to big pictures, while big pictures are verified more quickly when closer to the protagonist, as opposed to smaller pictures.

  • Mental simulation - match advantage object number. Verification response was faster for concept-object match when there was numerical congruence (compared with incongruence) between the number word and quantity.

    • Status: mixed
    • Original paper: ‘The conceptual representation of number’, Patson et al. 2014; experimental design, n = 63. [citation = 29(GS, November 2022)]​.
    • Critiques: Beg et al. 2021 [Experiment 1: n = 63, Experiment 2: n = 68, Experiment 3: n =42, citation = 5(GS, November 2022)]​. Patson et al. 2016 [Experiment 1: n = 63, Experiment 2: n = 63, citation = 11(GS, November 2022)]​. Patson 2021 [Experiment 1: n = 62, Experiment 2: n = 83, citation = 1(GS, November 2022)]. Šetić and Domijan 2017 [Experiment 1: n = 48, Experiment 2: n = 33, citation = 10(GS, November 2022)].
    • Original effect size: ηp2 = 0.11.
    • Replication effect size: Beg et al.: Experiment 1: ηp2 = 0.11, Experiment 2: ηp2 not reported, Experiment 3: ηp2 = 0.05. Patson et al.: Experiment 1: ηp2 = 0.03, Experiment 2: ηp2 = 0.06. Patson: Experiment 1: ηp2 = 0.08, Experiment 2: ηp2 = 0.12. Šetić and Domijan: Experiment 1: ηp2 = 0.13, Experiment 2: ηp2 = 0.21.

  • Mental simulation - match advantage object shape. Readers verify pictures more quickly when they match rather than mismatch the object shape from the preceding sentence.

  • Mental simulation - match advantage object size. Readers verify small imagined pictures more quickly when they are small real pictures, in contrast to big real pictures, while big imagined pictures are verified more quickly when they are big real pictures, as opposed to small real pictures.

  • Mental simulation - bigger is better effect. Items that are big in real size are processed more quickly than items that are small in real size. ​

  • Transposed word effect. Responses to transposed word sequences (e.g. “you that read wrong”) are more error-prone and judged as ungrammatical compared with a control sequence (e.g. “you that read worry”).

  • Personality > intelligence predicting life outcomes. Personality is generally more predictive than IQ on a variety of important life outcomes, such as educational attainment and wage.

    • Status: not replicated
    • Original paper: ‘What grades and achievement tests measure’, Borghans et al. 2016; correlational study, n=23,023 over four large-scale survey datasets. [citations=265(GS, January 2023)]​.
    • Critiques: Zisman and Ganzach 2022 [n=26,600 over six large-scale datasets, citations=5(GS, January 2023)].​
    • Original effect size: Personality more predictive of education, R2 = 0.143, grades, R2 = 0.028 to R2 = 0.093, and wage, R2 = 0.021 to R2 = 0.053, than intelligence (education – R2 = 0.108; grades – R2 = 0.009 to R2 = 0.216; wage – R2 = 0.024 to R2 = 0.18).
    • Replication effect size: Zisman & Ganzach: Intelligence more predictive of educational attainment, R2 = 0.120 to R2 = 0.328 (average R2 = 0.232), grades, R2 = 0.175 to R2 = 0.268 (average R2 = 0.229), and pay, R2 = 0.031 to R2 = 0.148 (average R2 = 0.080), than personality (educational attainment – R2 = 0.029 to R2 = 0.079, average R2 = 0.053; grades – R2 = 0.011 to R2 = 0.041, average R2 = 0.024; pay – R2 = 0.021 to R2 = 0.079, average R2 = 0.040) (not replicated).

  • Error salience (epistemic contextualism effects). Judgments about “knowledge” are sensitive to the salience of error possibilities. This is explained by the fact that salience shifts the evidential standard required to truthfully say someone “knows” something when those possibilities are made salient.​

    • Status: mixed.
    • Original paper: ‘Knowledge Ascriptions and the Psychological Consequences of thinking about Error’, Nagel 2010; theoretical paper, n=NA. [citations=133(GS, May 2023)]​.
    • Critiques: Feltz & Zarpentine 2010 [n1=152, citations=128(GS, May 2023)]. Hansen & Chemla 2013 [n1=40, citations=66(GS, May 2023)].​ Alexander et al. 2014 [n1=40, n2=187, n3=93 (not relevant here), n4=126, citations=34(GS, May 2023)]. Buckwalter 2017 [review paper, n=NA, citations=20(GS, May 2023)]. Buckwalter 2021 [n1=99, n2=201, n3=203, citations=5(GS, May 2023)].
    • Original effect size: NA​.
    • Replication effect size: Feltz & Zarpentine: Experiment 1 - low versus high practical consequences and error salience d=0.29 (n.s., calculated from the reported t(71)=1.213, p=0.23 using this conversion). Hansen & Chemla: Positive polarity sentences η2 = 0.39 to η2 = 0.59; Negative polarity sentences η2 = 0.11 to η2 = 0.50 (calculated from the reported F statistics in Figure 5 using this conversion). Alexander et al.: Study 1 – d = 1.36; Study 2 – η2 = 0.12 (calculated from the reported F(4, 209) = 7.28, p < .001 using this conversion); Study 4 – d = 1.26. Buckwalter 2017: ES=NA, mixed evidence reported. Buckwalter 2021: Experiment 1 – truth statements d=0.75, belief statements d=0.85, evidence statements d=1.42, actionability statements d=-1.05, knows statements d=1.47; Experiment 2 – truth statements d=0.62, belief statements d=0.06, evidence statements d=0.58, actionability statements d=-0.25, knows statements d=0.43; Experiment 3 – truth statements d=-0.09, belief statements d=-0.39, evidence statements d=0.38, actionability statements d=-0.71, knows statements d=0.41.

  • Gettier intuition effect. Participants attributed knowledge in Gettier-type cases (where an individual is justified in believing something to be true but their belief was only correct due to luck) at rates similar to cases of justified true belief.

    • Status: mixed.
    • Original paper: ‘The folk conception of knowledge’, Starmans and Friedman 2012; between-subject experiments, n1a=144, n1b=133, n1c=46, n2=51, n3=43. [citations=183(GS, March 2023)].
    • Critiques: Turri et al. 2015 [n1=135, n2=141, n3=576, n4= 813, citations = 97 (GS, March 2023)]. Hall et al. 2018 (pre-print) [n=4724, Citations=4 (GS, March 2023)].
    • Original effect size: Experiment 1a: knowledge attribution exceeded chance in both the Gettier and Control conditions; in the False Belief condition knowledge was attributed less than in the Gettier condition and at rates less than would be expected by chance (ηp2 =0.47, calculated from the reported F(2,141) = 63.65, p < .001 using this conversion); Experiment 1b: participants attributing knowledge equally in the Control and in the Gettier condition, but less in the False Belief condition than in the Gettier condition (ηp2 =0.34, calculated from the reported F(2,91) = 23.75, p < .001 using this conversion); Experiment 1c: laypeople consider Gettier cases to be instances of knowledge (d = 1.75, calculated from the reported t(45) = 5.93, p < .001 using this conversion); Experiment 2: participants attributed knowledge in High justification condition, but not in the Low justification condition (ηp2 =0.19, calculated from the reported F(1,49) = 11.75, p = .001 using this conversion); Experiment 3: participants readily attributed knowledge when the Gettiered individual formed a belief based on authentic evidence as compared to apparent evidence (ηp2 =0.29, calculated from the reported F(1,42) = 17.51, p < .001 using this conversion).
    • Replication effect size: Turri et al.: knowledge attributions are surprisingly insensitive to lucky events that threaten, but ultimately fail to change, the explanation for why a belief is true, Experiment 1: Cramér’s V = .509, Experiment 2: Cramér’s V = .534, Experiment 3: Cramér’s V = .406, Experiment 4: Cramér’s V = .546 (all replicated). Hall et al.: participants were more likely to attribute knowledge in standard cases of justified true belief than in Gettier cases, Pseudo-R2 = 0.12 - 0.15.

  • Left-cradling bias (Child cradling lateralization). Humans preferentially hold their child on the left body side. This is hypothesised to be modulated by handedness as the dominant hand is preferably free for mundane tasks.

    • Status: replicated
    • Original paper: ‘Handedness as a major determinant of functional cradling bias’, van der Meer and Husby 2007; laboratory study in which left- and right-handers were asked to cradle a baby doll and the side of holding was recorded, n=765. [citations = 67(GS, June 2022)].
    • Critiques: Packheiser et al. 2019 [meta-analysis, n=6799, citations = 27(GS, June 2022)]. ​
    • Original effect size: d = 1.06.
    • Replication effect size: Packheiser et al.: d = 0.34.
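
When the original and the replication report the same metric, the two numbers can be compared directly. The following is a minimal sketch, not a method prescribed by this resource: it computes the replication-to-original ratio and checks whether the sign flipped, which is how a reversal is defined in the summary of this page. The function names are illustrative, and the only data used are the two Cohen's d values from the left-cradling entry above.

```python
def replication_ratio(original: float, replication: float) -> float:
    """Replication effect size expressed as a fraction of the original."""
    if original == 0:
        raise ValueError("original effect size must be non-zero")
    return replication / original


def direction_flipped(original: float, replication: float) -> bool:
    """True when the two effects have opposite signs (a reversal)."""
    return original * replication < 0


# Cohen's d values from the left-cradling bias entry.
original_d = 1.06     # original paper
replication_d = 0.34  # Packheiser et al. meta-analysis

ratio = replication_ratio(original_d, replication_d)
flipped = direction_flipped(original_d, replication_d)

print(f"replication / original = {ratio:.2f}")
print(f"direction flipped: {flipped}")
```

Here the meta-analytic estimate is roughly a third of the original and points in the same direction, consistent with the entry's "replicated" status despite the smaller effect. The comparison is only meaningful when both values are on the same scale (for example, both Cohen's d).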

  • Handedness differences - schizophrenia. Non-right-handedness is more prevalent in individuals with schizophrenia compared to the healthy population.

  • Handedness differences - depression. Being left-handed is associated with a higher likelihood of being depressed.

    • Status: mixed
    • Original paper: ‘Cerebral laterality and depression: Differences in perceptual asymmetry among diagnostic subtypes’, Bruder et al. 1989; analysis of different patterns of brain lateralization between depressed individuals and controls, n = 70. [citations=202 (GS, January 2023)].
    • Critiques: Denny 2009 [n= 27,482, citations = 49 (Tandfonline, June 2022)]. Elias et al. 2001 [n=541, citations = 37 (ScienceDirect, June 2022)]. Packheiser et al. 2021 [meta-analysis, k=87, n = 35501, citations = 1 (ScienceDirect, June 2022)].
    • Original effect size: d= 0.57.
    • Replication effect size: Elias et al.: No main effect but a significant interaction with sex; left-handed men show higher depression scores (no effect size). Denny: being left-handed is associated with a higher level of depressive symptoms, no significant interaction with sex (no effect size). Packheiser et al.: No link between handedness and depression (OR = 1.04 [0.95 - 1.15]).

  • Handedness differences - stuttering. The rate of stuttering was much higher in left-handers than in right-handers.

  • Handedness differences - dyslexia. The rate of learning disabilities was much higher in left-handers than in right-handers. ​

  • Handedness differences - intelligence. Left-handedness is associated with lower scores in fluid intelligence.

    • Status: replicated
    • Original paper: ’Handedness and Intelligence’, Hicks and Beveridge, 1978; correlational survey, n = 67. [citations = 29 (Science Direct, June 2022)].
    • Critiques: Ntolka and Papadatou-Pastou 2017 [systematic review of 36 studies, n = 65,519, citations = 20 (Science Direct, June 2022)]. Papadatou-Pastou & Tomprou 2015 [meta-analysis, n = 16,076, citations = 39 (Science Direct, June 2022)]. Somers et al. 2015 [meta-analysis, k=30, n = 359,890, citations = 63 (Science Direct, June 2022)].
    • Original effect size: N/A.
    • Replication effect size: Ntolka & Papadatou-Pastou: for a subset of n = 19,744, statistically significant but marginal differences in IQ were found between the right-handed and the left-handed (d = -.07) and between the right-handed and the not-right-handed (d = -.06), each time in favour of the right-handed. Papadatou-Pastou & Tomprou: d = -.09 (for a subset of n = 195). No effect size could be calculated for the rest of the studies in this meta-analysis. Overall, there were higher levels of non-right-handedness among the intellectually impaired, but the level was not different between typically developed individuals and gifted individuals. Somers et al.: No significant differences in overall verbal ability: Hedges’ g = −0.03; spatial ability was significantly higher for right-handed individuals: Hedges’ g = −0.14.

  • Handedness differences - cognitive ability. Difference in spatial ability between left and right handers. Left handers have a supposed deficit in spatial ability.

    • Status: mixed
    • Original paper: ‘Possible Basis for the Evolution of Lateral Specialization of the Human Brain’, Levy 1969; comparison of spatial IQ (WAIS) between left- and right-handed graduate students, n=25. [citations=857 (GS, January 2023)].
    • Critiques: Briggs et al. 1976 [n = 34, citations = 114 (GS, January 2023)]. Inglis and Lawson 1984 [n=1880, citations=37 (GS, January 2023)]. Somers et al. 2015 [meta-analysis, n = 218,351, citations=97 (GS, April 2023)].
    • Original effect size: d = 1.42.
    • Replication effect size: Briggs et al.: no difference between left and right handers in spatial ability. Inglis and Lawson: no difference between left and right handers in spatial ability. Somers et al.: g = 0.14 (effect was significant, but did not survive sensitivity analyses).

  • Handedness differences - sexual orientation. Intrauterine testosterone levels may determine both handedness and sexuality, with homosexuals having an increased rate of left-handedness.

    • Status: mixed
    • Original paper: ‘Cerebral Lateralization Biological Mechanisms, Associations, and Pathology: II. A Hypothesis and a Program for Research’, Geschwind and Galaburda 1985; theory paper, so no sample size. [citations = 780 (GS, October 2022)].
    • Critiques: Becker et al. 1992 [n = 1,612, citations = 43 (GS, October 2022)]. Lalumière et al. 2001 [n = 23,410 (meta-analysis), citations = 301 (GS, October 2022)]. Lindesay 1987 [n = 194, citations = 101 (GS, October 2022)]. Lippa and Blanchard 2007 [n = 159,779, citations = 159 (GS, October 2022)]. Marchant-Haycox et al. 1991 [n = 774, citations = 53 (GS, October 2022)]. Rosenstein and Bigler 1987 [n = 89, citations = 31 (GS, October 2022)]. Satz et al. 1991 [n = 993, citations = 62 (GS, October 2022)]. Tran et al. 2019 [n = 3,870, citations = 7 (GS, October 2022)].
    • Original effect size: NA (based on anecdotal correspondence between Geschwind and Galaburda and the homosexual community).
    • Replication effect size: Lindesay: Significantly more homosexual men were left-handed than heterosexual men (χ2(1) = 6.2, p = .013) (replicated). Rosenstein and Bigler: r = .06 (not replicated). Marchant-Haycox et al.: No ES available, but non-significant relationship found between handedness and homosexuality (χ2(1) = 2.6, p = .107) (not replicated). Satz et al.: No ES available, but non-significant effect found between handedness and sexuality (not replicated). Becker et al.: φ = .08 to .11 (replicated). Lalumière et al.: OR = 1.39 (replicated). Lippa and Blanchard: φ(Males) = .02, φ(Females) = .05. Tran et al.: OR(Men) = 0.98 (p > .050), OR(Women) = 1.96 (p < .010). Homosexual women found to be more likely to be “mixed handed” (ambidextrous) (not replicated).

  • Handedness differences - twins. Handedness differences between twins and singletons. Twins have been suggested to show increased rates of left handedness compared to singletons.​

    • Status: mixed
    • Original paper: ‘Handedness in Twins: a Meta-analysis’, Sicotte et al. 1999; meta-analysis on rates of atypical handedness, n = 85,371. [citations = 137 (GS, January 2023)].
    • Critiques: Zheng et al. 2020 [n=631, citations=8 (GS, January 2023)]. Medland et al. 2003 [n = 9176, citations=50 (GS, January 2023)]. De Kovel et al. 2019 [UK Biobank study with n ~500,000, citations=107 (GS, April 2023)]. Pfeifer et al. 2022 [meta-analysis, n = 189,422, citations=10 (GS, April 2023)].
    • Original effect size: OR = 1.43 [1.23 - 1.66]​.
    • Replication effect size: Zheng et al.: No difference between singletons and twins. Medland et al.: No difference between singleton and twins. De Kovel et al.: OR = 1.20. Pfeifer et al.: OR = 1.40 [1.26 - 1.57] (replicated).

  • Handedness differences - sex. Handedness differences between men and women. Men have been suggested to show increased rates of left-handedness compared to women.

    • Status: mixed
    • Original paper: ‘Measuring handedness with questionnaires’, Bryden 1977; questionnaire study to assess handedness using factor analysis, n=1106. [citations=963 (GS, January 2023)].
    • Critiques: Cornell and McManus 1992 [n = 266, citations = 11 (GS, January 2023)]. Green and Young 2001 [n=284, citations = 93 (GS, January 2023)]. Holtzen 1994 [n = 260, citations = 45 (GS, January 2023)]. Papadatou-Pastou et al. 2008 [meta-analysis, k = 144 studies, totaling N = 1,787,629 participants, citations = 323(GS, January 2023)].
    • Original effect size: OR = 1.38​.
    • Replication effect size: Green and Young: similar rates of handedness between men and women. Holtzen: similar rates of handedness between men and women. Cornell and McManus: similar rates of handedness between men and women. Papadatou-Pastou et al.: OR = 1.23 [1.19 - 1.27] (replicated).
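
Several handedness entries report an odds ratio with an interval in square brackets. A common reading is that an interval spanning 1 is compatible with no difference between groups. The sketch below applies that check to the intervals quoted in the depression, twins and sex entries above. It assumes the brackets are confidence intervals (the entries do not state the level); the class and function names are illustrative only.

```python
from typing import NamedTuple


class RatioEstimate(NamedTuple):
    label: str
    estimate: float
    ci_low: float
    ci_high: float


def interval_excludes_null(result: RatioEstimate, null_value: float = 1.0) -> bool:
    """True when the whole interval lies on one side of the null value."""
    return result.ci_low > null_value or result.ci_high < null_value


# Odds ratios and intervals as quoted in the entries above.
entries = [
    RatioEstimate("Packheiser et al. (depression)", 1.04, 0.95, 1.15),
    RatioEstimate("Sicotte et al. (twins, original)", 1.43, 1.23, 1.66),
    RatioEstimate("Pfeifer et al. (twins)", 1.40, 1.26, 1.57),
    RatioEstimate("Papadatou-Pastou et al. (sex)", 1.23, 1.19, 1.27),
]

for entry in entries:
    verdict = "excludes 1" if interval_excludes_null(entry) else "includes 1"
    print(
        f"{entry.label}: OR = {entry.estimate:.2f} "
        f"[{entry.ci_low:.2f}, {entry.ci_high:.2f}] -> {verdict}"
    )
```

Only the depression meta-analysis has an interval that includes 1, which matches its description as finding no link between handedness and depression. The other three intervals sit entirely above 1.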

  • Overlooking of subtractive change. People systematically default to searching for additive transformations, and consequently overlook subtractive transformations. A tendency to generate and/or select additive ideas over subtractive ones.

    • Status: replicated
    • Original paper: ‘People systematically overlook subtractive changes’, Adams et al. 2021; between-subjects design, N = 2261 across 8 studies. [citation = 75 (GS, October 2022)].
    • Critiques: Fillon et al. 2022 [n=477, citations = 0 (GS, October 2022)].
    • Original effect size: χ² between 9.71 and 13.63.
    • Replication effect size: Fillon et al.: χ² between 0.11 and 13.8; 5 out of the 6 effects are statistically significant.

  • Heterogeneity reduces perceived quantity. Sets of multiple colourful or different objects (e.g., stars, squares, triangles) seem less with respect to their quantity than the same sets that consist of only one type of object (e.g., only red triangles).

    • Status: not replicated
    • Original paper: ‘The presence of variety reduces perceived quantity’, Redden and Hoch, 2009; within-subjects design, Study 1: n = 80, Study 2: n = 57, Study 3: n = 105, Study 4: n = 64. [citations=90(GS, October 2022)]​.
    • Critiques: Röseler et al., in press [Study 1: n = 104, Study 2: n = 199, Study 3: n = 144, Study 4: n = 82, Study 5: n = 45, Study 6: n = 84, citations=2(GS, October 2022)].
    • Original effect size: d = 0.394 to d = 2.377.
    • Replication effect size: Röseler et al.: d = -0.302 to d = 0.108​.

  • Eye movements and false memories. Lateral eye movements increase false memory rates.

  • Gaze-liking effect. People are more likely to rate objects as more likeable when they have seen a person repeatedly gaze toward, as opposed to away from the object.

  • Phonological working memory impairment in dyslexic adults. Dyslexic individuals show lower scores on phonological working memory, as measured by a nonword repetition task.

    • Status: replicated
    • Original paper: ‘Pseudoword repetition ability in learning-disabled children’, Taylor et al. 1989; experiment, neurotypical: n1 = 20, dyslexic: n2 = 24. [citations=142(GS, March 2023)]​.
    • Critiques: Elsherif et al. 2021 [AWS: n = 30, NT: n = 84, AWD: n = 50, citations=10(GS, March 2023)].
    • Original effect size: d = 0.97 [d calculated from reported t statistic and converted using this conversion].
    • Replication effect size: Elsherif et al.: d = 1.54. ​

  • Phonological monitoring impairment in dyslexic adults. Dyslexic adults show lower scores on phonological monitoring than neurotypical adults.

  • Phonological awareness impairment in dyslexic adults. Dyslexic adults show lower scores on phonological awareness than neurotypical adults.

  • Phonemic fluency impairment in dyslexic adults. Dyslexic adults show lower scores on phonemic fluency tasks than neurotypical adults. Phonemic fluency tasks are a type of verbal fluency task, where people are asked to generate as many words as possible according to a specific criterion relating to phonemes, for instance words starting with the letter ‘M’.

  • Semantic fluency impairment in dyslexic adults. Dyslexic adults show lower scores on semantic fluency than neurotypical adults. Semantic fluency tasks are a type of verbal fluency task, where people are asked to generate as many words as possible according to a specific criterion, for instance items that are part of the same category, such as foods.

  • Lexical precision lexical competition. The direction and magnitude of inhibitory priming in word targets with dense neighbourhoods is moderated by spelling.

  • Placebo Effect. Refers to the phenomenon in which a treatment or intervention that has no specific therapeutic effect (such as a sugar pill or saline injection) can still produce a therapeutic response in some individuals. The concept of the placebo effect can be traced back to the 18th century, when physicians and researchers began to notice that patients often reported improvements in their symptoms after receiving treatments that did not have any known physiological effects.

    • Status: replicated
    • Original paper: ‘The powerful placebo’, Beecher 1955; analysis of 15 studies on patients receiving either a placebo or an active treatment for various conditions such as pain, nausea, and anxiety. [citations= 2885 (GS, January 2023)]​.
    • Critiques: Hróbjartsson 2001 [a systematic review of 130 clinical trials, n = 4730, citations=2011 (GS, January 2023)]. Tang 2022 [meta-analysis of studies on pain, discomfort, sleep difficulty, and anxiety, k= 15, n= 1506, citations = 3 (GS, January, 2023)]. Yeung 2018 [meta-analysis on placebo on insomnia symptoms, k= 15, n= 566, citations= 56 (GS, January, 2023)].
    • Original effect size: NA.
    • Replication effect size: Hróbjartsson: compared with no treatment, placebo had no significant effect on binary outcomes (pooled relative risk of an unwanted outcome with placebo: 0.95 [0.88, 1.02]); in trials with continuous outcomes, placebo had a beneficial effect (pooled standardised mean difference in the value for an unwanted outcome between the placebo and untreated groups: MD = -0.28 [-0.38, -0.19]); in trials involving the treatment of pain, placebo had a beneficial effect (MD = -0.27 [-0.40, -0.15]). Tang: g = .298 (replicated). Yeung: placebo treatment led to improved perceived sleep onset latency (g = 0.272), total sleep time (g = 0.322), and global sleep quality (g = 0.581).

  • Placebo empathy analgesia. Downregulating first-hand pain perception via placebo analgesia (administration of an inert treatment such as a sugar pill)​ also dampens empathy for another person in pain.

  • Nocebo effect. This phenomenon is said to occur when negative expectations of an individual about an experience (e.g. a medical treatment) cause the experience to have a more negative effect than it would have otherwise.

    • Status: replicated
    • Original paper: ‘The nocebo reaction’, Kennedy 1961; editorial, n=NA. [citations=325(GS, January 2023)].
    • Critiques: Petersen et al. 2014 [meta-analysis, n=334, citations=206(GS, January 2023)]. Horváth et al. 2021 [meta-analysis, n=1999, citations=3(GS, January 2023)].
    • Original effect size: NA.
    • Replication effect size: Petersen et al.: lowest d = 0.65 [0.24, 1.05], highest d = 1.07 [0.65, 1.48]. Horváth et al.: nocebo effects on motor performance, mean effect size = 0.60 [mean ES calculation method was not reported].

  • Stroop Effect. A phenomenon in which it takes longer to name the ink colour of a word when the word itself is a colour name that is different from the ink colour (e.g. the word “red” printed in blue ink). The Stroop effect is considered a classic demonstration of the interference between different types of information processing.

    • Status: replicated
    • Original paper: ‘Studies of interference in serial verbal reactions’, Stroop 1935; participants were shown lists of colour words (e.g. “red”, “blue”, “green”) printed in different ink colours and asked to name the ink colour as quickly as possible, n = 70. [citations = 24125 (PSYCNET, January 2023)].
    • Critiques: Damen 2021 [n=66, citations= 1 (GS, April 2023)]. Epp et al. 2012 [meta-analysis, k=47, citations= 235 (GS, April 2023)]. Homack and Riccio 2004 [meta-analysis, k=33, citations= 520 (GS, April 2023)]. MacKenna and Sharma 2004 [n=176, citations= 376 (PUBMED, January 2023)]. MacLeod 1991 [n = NA, citations = 7389 (PsycNet, January 2023)].
    • Original effect size: NA.
    • Replication effect size: Damen: ηp2 = 0.541 [0.369, 0.652]. Epp et al.: Emotional Stroop task in depression (replicated): on negative stimuli, g=.98, and on positive stimuli, g=.87. Homack and Riccio: individuals with ADHD fairly consistently perform worse than normal controls on the Stroop (mean weighted effect size of 0.50 or greater). MacKenna and Sharma: cast doubt on the fast and non-conscious nature of the emotional Stroop effect.

  • Disfluency effect. Disfluency, the subjective experience of difficulty associated with cognitive operations, leads to deeper cognitive processing. If information is processed with difficulty or disfluently (e.g. when written in hard-to-read fonts), this experience serves as a cue that the task is difficult or that one’s intuitive (System 1) response is likely to be wrong, thereby activating more elaborate (System 2) processing, resulting in more positive cognitive outcomes.

    • Status: not replicated
    • Original paper: ‘Overcoming intuition: Metacognitive difficulty activates analytic reasoning’, Alter et al. 2007; four between-subject experiments, Study 1 n=40, Study 2 n=42, Study 3 n=150, Study 4 n=41. [citations=1196(GS, January 2023)]​.
    • Critiques: Kühl and Eitel 2016 [n=1,079 across 13 studies, citations=64 (GS, January 2023)]. Meyer et al. 2015 [n=7,177 across 13 studies, citations=114(GS, January 2023)].​ Thompson et al. 2013 [n=579 across three studies (2c, 3a and 3b), citations=261 (GS, January 2023)]. (https://psycnet.apa.org/record/2015-13746-007)
    • Original effect size: Study 1 – participants answered more items on the Cognitive Reflection Test (CRT) correctly in the disfluent font condition than in the fluent font condition, η2 = 0.056 / d = 0.71 [reported in Meyer et al.].
    • Replication effect size: Kühl and Eitel: no disfluency effect on cognitive and metacognitive processes and outcomes in any of the thirteen studies reviewed; effect size estimates not reported (not replicated). Meyer et al.: the effect of disfluent font on cognitive reflection test scores in 13 studies from d= -0.25 to d= 0.12 (reported, all non-significant) [not replicated]. Pooled effect of the 17 studies (including Thompson et al. and original Alter et al. study) d = -0.01 (non-significant). Thompson et al.: the effects of disfluent font on cognitive reflection test scores in three studies from d= -0.19 to d= 0.25 (d’s reported in Meyer et al., all non-significant) [not replicated].

  • Retrieval-induced forgetting (RIF). Forgetting of some items is in part a consequence of remembering other items.

    • Status: mixed
    • Original paper: ‘Remembering can cause forgetting: Retrieval dynamics in long-term memory’, Anderson et al. 1994; tested retrieval-induced forgetting, three experiments, n = 148. [citations=2065 (GS, January 2023)].
    • Critiques: Jonker et al. 2013 [n=30 across two experiments, citations=175 (GS, December 2022)]. Rowland et al. 2014 [n=72 (experiment 1); n=140 (experiment 2); n=70 (experiment 3), citations=18 (GS, January 2023)].
    • Original effect size: NA.
    • Replication effect size: Jonker et al.: reported ηp2, Experiment 1: 0.25; Experiment 2a: 0.29; Experiment 2b: 0.19; Experiment 3: standard condition: 0.43, study reinstatement condition: 0.31. Rowland et al.: reported Cohen’s d, Experiment 1: 0.31; Experiment 2: 0.38.

  • Mood-dependent retrieval (mood-dependent memory, state dependent memory, encoding specificity). Memory is enhanced when an individual’s mood (i.e., emotional state) at retrieval matches their mood at encoding.

    • Status: mixed
    • Original paper: ‘Emotional mood as a context for learning and recall’, Bower et al. 1978; three between-subjects experiments, Exp. 1: n = 10, Exp. 2: n = 16, Exp. 3: n = 24. [citations = 741 (GS, January 2023)].
    • Critiques: Bower and Mayer 1985 [failed replication of Exp 3, n = 48, citations = 308 (GS, January 2023)]. Eich 1995 [review of 48 studies with mixed evidence, citations = 476 (GS, January 2023)].
    • Original effect size: NA.
    • Replication effect size: Bower and Mayer: not reported. Eich: not reported (theoretical review).

  • Perky effect. Mental imagery interferes with perception. If persons were asked to describe their images of common objects while dim facsimiles of the objects were presented before them, they reported only an “imagery,” not a “perceptual,” experience; imagery and stimuli are indistinguishable.​

    • Status: replicated.
    • Original paper: ‘An experimental study of imagination’, Perky 1910; experimental design, Experiment 1 n=3 children, Experiment 2 n=24, Experiment 3 n=5. [citations=933(GS, March 2023)]​.
    • Critiques: Craver-Lemley and Reeves 1987 [n=125, citations=109(GS, March 2023)]. Okada and Matsuoka 1992 [n=14, citations=26(GS, March 2023)].​ Reeves et al. 2020 [n=111, citations=4(GS, March 2023)].​ Segal and Fusella 1970 [n1=8, n2=6, citations=579(GS, March 2023)]. Segal and Gordon 1969 [n1=24, n2=24, citations=52(GS, March 2023)]. (https://psycnet.apa.org/doiLanding?doi=10.1037%2Fh0028840)
    • Original effect size: ES not reported but the data in all three experiments showed that respondents mistook the perceptual for the imaginative consciousness; they did not report a perception, but the image described resembled the unreported stimulus.
    • Replication effect size: Craver-Lemley and Reeves: mean accuracy for reporting the offset of vertical line targets declined from 80% to 65% when subjects were asked to imagine vertical lines near fixation (replicated). Okada and Matsuoka: the Perky effect described in the auditory modality; auditory imagery of a pure tone affected detection only when the frequency of the imaged tone was the same as that of the detected tone (ηp2 = 0.346, calculated from the reported F(4,52) = 6.90, p < .01 using this conversion) (replicated). Reeves et al.: visual imagery interferes with acuity when performance is good but facilitates it when performance is poor; the mean Perky effect for the 47 subjects who scored over 80% in the No Imagery condition was 21%; the average correlation between Perky effects and baseline accuracy level across 111 subjects was r = 0.63 (replicated). Segal and Fusella: mental imagery was found to block detection of both visual and auditory signals; Experiment 1: sensitivity (d') was lower during visual (1.70) and auditory imaging (2.13) than in either the preceding (1.93) or following discrimination tasks (1.72) (all ps < .001) (replicated); Experiment 2: sensitivity (d') was lower during visual (1.48) and auditory imaging (1.68) than in either the preceding (2.64) or following discrimination tasks (2.84) (all ps < .001) (replicated). Segal and Gordon: Experiment 1: significant difference in perceptual sensitivity (d') between the Perky condition (0.74) and the informed task (2.03) (replicated); Experiment 2: greater sensitivity in the discrimination task (d' = 2.39) than in the imaging procedures, experimenter-projection (d' = 1.54) and self-projection (d' = 1.19) (replicated).
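
The Okada and Matsuoka value above is described as calculated from a reported F statistic. A minimal sketch of the usual conversion, ηp2 = (F × df_effect) / (F × df_effect + df_error), is given here. It is an assumption that this is the conversion behind the listed value, and the function name is illustrative.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F(4, 52) = 6.90 as listed for Okada and Matsuoka in this entry.
eta_p2 = partial_eta_squared(6.90, df_effect=4, df_error=52)
print(f"partial eta squared = {eta_p2:.3f}")
```

With these inputs the formula gives a value of about 0.347, in line with the 0.346 listed in the entry (the small difference is consistent with truncation rather than rounding).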

  • Positive emotions broaden scope of attention. People experiencing positive emotions exhibit broader scopes of attention than do people experiencing no particular emotion.

    • Status: mixed
    • Original paper: ‘Positive emotions broaden the scope of attention and thought‐action repertoires’, Fredrickson and Branigan, 2005; between-subjects design, n=104. [citations=5037 (GS, March 2023)].
    • Critiques: Bruyneel et al. 2013 [Exp 1: n=35, Exp 2: n=38, Exp 3: n=25, citations=83 (GS, March 2023)]. Huntsinger 2013 [review, citations=137 (GS, March 2023)]. Huntsinger et al. 2010 [Exp 1: n=62, Exp 2: n=72, citations=160 (GS, March 2023)].
    • Original effect size: d = 0.375 (calculated by using this calculator).
    • Replication effect size: Bruyneel et al.: Across three experiments, positive affect consistently failed to exert any impact on selective attention, Exp 1: ηp2 = 0.04, Exp 2: ηp2 = 0.001, Exp 3: ηp2 = 0.01 (null effects). Huntsinger: Rather than having fixed effects on the scope of attention, the impact of positive and negative affect is surprisingly flexible. Huntsinger et al.: Positive affect empowers whatever focus is momentarily dominant, Exp 1: d= 0.58, Exp 2: d= 0.71.

  • Emotional information facilitates response inhibition. Response inhibition refers to suppression of prepotent responses which are inappropriate to current task demands. In the lab setting, this is investigated with a stop signal task. The effect showed that both fearful and happy faces as stop signals facilitated response inhibition relative to neutral ones.

    • Status: not replicated
    • Original paper: ‘Interactions between cognition and emotion during response inhibition’, Pessoa et al. 2012; within-subjects design, n=36. [citations=245 (GS, March 2023)].
    • Critiques: Pandey and Gupta 2022 [n=54, citations=3 (GS, March 2023)]. Williams et al. 2020 [Study 1: n=40, Study 2: n=40, Study 3: n=42 (only younger adults sample), citations=12 (GS, March 2023)].
    • Original effect size: η2= 0.17, d= 0.44 (fearful vs neutral), d= 0.33 (happy vs neutral) (Calculated using this calculator).
    • Replication effect size: Pandey and Gupta: Angry faces as stop signal impaired response inhibition compared to happy faces, d = 0.35. Williams et al.: Fearful faces impaired response inhibition compared to happy faces, Study 1: d= 0.03 (fearful vs neutral), d= 0.04 (happy vs neutral), d = 0.08 (fearful vs happy), Study 2: d= 0.11 (fearful vs neutral), d= 0.04 (happy vs neutral), d= 0.15 (fearful vs happy), Study 3: d= 0.56 (fearful vs neutral), d= 0.04 (happy vs neutral), d= 0.58 (fearful vs happy).

  • Inhibition induced devaluation. Inhibition-induced devaluation refers to reduced response to stimuli which were previously inhibited. This effect results in participants bidding less for shapes that were paired with stop-signals, giving less trustworthiness rating for faces previously paired with stop signals. This effect has several implications for behaviour modification techniques.

    • Status: replicated
    • Original paper: ‘When approach motivation and behavioral inhibition collide: Behavior regulation through stimulus devaluation’, Veling et al. 2008; within-subjects design, Exp 1: n=33, Exp 2: n=47, Study 3: n=40. [citations=189 (GS, March 2023)].
    • Critiques: Chen et al. 2016 [Exp 1: n=45, Exp 2: n=48, Study 3: n=40, citations=122 (GS, March 2023)]. Wessel et al. 2014 [Exp 1: n=36, Exp 2: n=27, citations=64 (GS, March 2023)].
    • Original effect size: Exp 1: nogo vs go: d= -0.49, nogo vs new: d= 0.48, Exp 2: nogo vs go: d= -0.33, nogo vs new: d= -0.53.
    • Replication effect size: Wessel et al.: Exp 1: η2= 0.25, Exp 2: η2= 0.24. Chen et al.: Exp 1: nogo vs go: d= -0.39 [-0.71, -0.08], nogo vs untrained: d= -0.57 [-0.71, -0.08], Exp 2: nogo vs go: d= -0.91 [-1.31, -0.55], nogo vs untrained: d = -0.60 [-0.97, -0.27].

  • Inhibition induced forgetting. Inhibition-induced forgetting refers to impaired memory for the stimuli to which responses were inhibited.

    • Status: mixed
    • Original paper: ‘Inhibition-induced forgetting: when more control leads to less memory’, Chiu and Egner 2015; within-subjects design, Exp 1: n=54, Exp 2: n=54, Exp 3: n=53. [citations=77 (GS, March 2023)].
    • Critiques: Le and Cho 2020 [Exp 1: n=40, Exp 2: n=48, Exp 3: n=40, citations=1 (GS, March 2023)].
    • Original effect size: Exp 1: d= 0.28, Exp 2: d= 0.45, Exp 3: d= 0.3.
    • Replication effect size: Le and Cho: showed inhibition-induced forgetting when stimuli were task-relevant, Exp 1: d= 0.06, Exp 2: d= 0.32, Exp 3: d= 0.31 (calculated using this calculator).

  • Body-object interaction (BOI) effect in lexical-semantic processing. Words that have higher ratings on the BOI measure receive faster responses (RTs) in lexical-semantic tasks (e.g., lexical decision, semantic decision). The BOI quantifies the ease with which the human body can physically interact with a word’s referent. The BOI effect is thought to show that sensorimotor information contributes to word meaning, providing support for embodied theories of semantic representation.

    • Status: replicated
    • Original paper: ‘Evidence for the activation of sensorimotor information during visual word recognition: The body–object interaction effect’, Siakaluk et al. 2008; experimental within-subjects design (high-BOI vs low-BOI), Study 1: n=30, Study 2: n = 30. [citations = 172 (GS, April 2023)].
    • Critiques: Siakaluk et al. 2010 [Study 1: n = 35, Study 2: n = 35, Study 3 task: n = 35, citations = 104 (GS, April 2023)]. Tousignant and Pexman 2012 [Study 1 Entity: n = 41, Study 2: n = 39, Study 3: n = 39, Study 4: n = 40, citations = 47 (GS, April 2023)]. Wellsby et al. 2010 [Study 1: n = 25, Study 2: n = 25, Study 3: n = 25, citations = 35 (GS, April 2023)].
    • Original effect size: Study 1: η2 = .33. Study 2: η2 = .30.
    • Replication effect size: Siakaluk et al.: Study 1a: η2 = .38, Study 1b: η2 = .38, Study 2: η2 = .57. Tousignant and Pexman: Study 1: η2 = 0.69, Study 2: η2 = 0.25, Study 3: η2 = 0.16, Study 4: d = 0.05 (not reported; calculated from the M and SD reported in Table 3, without taking the within-subject design into account). Wellsby et al.: Study 1: η2 = .32, Study 2: η2 = .32, Study 3: η2 = .33.

  • False memory implantation (false memory fabrication). People fabricate false memories of an event after it is suggested to them that the event happened. After discussing their memories with a researcher, participants reported a false memory.

    • Status: replicated
    • Original paper: ‘The formation of false memories’, Loftus & Pickrell 1995; one experimental condition, n = 24. [citations=1813(GS, June 2023)].
    • Critiques: Murphy et al. 2023 [n= 123, citations=1(GS, June 2023)]​.
    • Original effect size: 25% of participants ‘remembered’ the false memory.
    • Replication effect size: Murphy et al.: 35% of participants ‘remembered’ the false memory.

  • Serial dependence. Serial dependence describes a visual bias in which a reported item (e.g., orientation) is systematically attracted towards the previously reported item.

    • Status: replicated
    • Original paper: ‘Serial dependence in visual perception’, Fischer & Whitney 2014; experiment, n=12. [citations=654 (GS, June 2023)]​.
    • Critiques: Fritsche et al. 2017 [n=25, citations=326 (GS, June 2023)]. Czoschke et al. 2019 [n1=15, n2=19, citations=44 (GS, June 2023)]. Fischer et al. 2020 [n1=20, n2=49, citations=58 (GS, June 2023)].
    • Original effect size: a (height parameter of fitted Derivative-of-Gaussian curve)=8.19°[NA], a=6.76°[6.25, 7.28] (calculated), a=8.75°[8.22, 9.28] (calculated).
    • Replication effect size: Fritsche et al.: a=1.15°[NA] – a=1.17°[NA]. Czoschke et al.: a=6.11°[5.28 – 6.94] (calculated), a=4.11°[3.04 – 5.18] (calculated). Fischer et al.: effect of previous target on the current target a=2.99°[2.80 – 3.18] (calculated), dest=1.351, R2 = 0.140, a=2.00°[1.94 – 2.06] (calculated), dest=1.123, R2 = 0.118.

  • Modality-switching cost (modality switch effect). When verifying object properties, processing is slowed when the modality being processed is different from the modality processed in the preceding trial. The presence of the switching cost suggests that people represent semantic information in a modality-specific, rather than amodal or abstract, manner.​

    • Status: replicated
    • Original paper: ‘Verifying different-modality properties for concepts produces switching costs’, Pecher et al. 2003; repeated measures design, Experiment 1: n = 64, Experiment 2: n = 88. [citations=485 (GS, June 2023)].
    • Critiques: Vermeulen et al. 2007 [n=81, citations=104(GS, June 2023)]. Lynott & Connell 2009 [n=24, citations=279(GS, June 2023)]. Ambrosi et al. 2011 [n=40, citations=8(GS, June 2023)].
    • Original effect size: Experiment 1a - d = 0.18 (29ms), Experiment 1b - d= 0.15 (20ms), Experiment 2 - d= 0.27 (41ms).
    • Replication effect size: Vermeulen et al.: d= 0.5, consistent with the original, with Switch trials being slower than Non-Switch trials. Lynott & Connell: d = 0.36. Ambrosi et al.: d= 0.2 (Adults), dm= 0.24 (Children).

  • Tactile Disadvantage (conceptual tactile disadvantage). Participants find it more difficult and are slower to process words strongly related to the tactile modality (e.g., sticky), compared to processing words from other modalities (visual, auditory etc.). The presence of this conceptual tactile disadvantage mirrors a similar disadvantage observed in perceptual processing, where tactile stimuli are slower and more difficult to process than visual or auditory stimuli.​

  • Attentional blink. The attentional blink describes the phenomenon that, in a rapid serial visual presentation of items, humans show a reduced ability to detect the second of two targets among distractors if the second target follows approximately 200–500 ms after the first target. This effect is interpreted as displaying one of the limitations of human visual processing.


Developmental Psychology

  • Growth mindset. Believing that skill is improvable raises attainment.

    • Status: mixed
    • Original paper: ‘Implicit theories and their role in judgments and reactions: A world from two perspectives’, Dweck 1995; correlational and test-retest design, n=638. [citations=3311 (GS, April 2023)].
    • Critiques: Folioano et al. 2019 [a big study of the intervention in English schools, n=4584, citations=46 (GS, April 2023)]. Sisk 2018 [a pair of meta-analyses on both questions, n=365,915, citations=834 (GS, April 2023)].
    • Original effect size: Hard to establish, but reported up to r = 0.54 / d=0.95 in some papers.
    • Replication effect size: Folioano: d = 0.00 [-0.02, 0.02]. Sisk: r = 0.10 [0.08, 0.13] for the (nonexperimental) correlation; d = 0.08 [0.02, 0.14].

  • Expertise attained after 10,000 hours practice. The notion that it takes around 10,000 hours of practice to become an expert in a particular field or domain. Specifically, that deliberate practice explains most or all of the variance in (expert) performance in sports.

  • Tailoring to learning styles. Tailoring teaching to students’ preferred learning styles has an effect on objective measures of attainment.

  • Neonate imitation. Babies are born with the ability to imitate.

    • Status: NA
    • Original paper: ‘Imitation of facial and manual gestures by human neonates’, Meltzoff and Moore 1977; observational and experimental design, two studies, Study 1: n=6, Study 2: n=12. [citations=5311 (GS, December 2021)].
    • Critiques: Oostenbroek et al. 2016 [n=106, citations=259 (GS, December 2021)].
    • Original effect size: Not reported​.
    • Replication effect size: Not reported.

  • Violent media content on aggression. Violent content in media can make people more aggressive. Notably, the studies of this effect differ by media (TV, games, etc.) and by whether long-, medium-, or short-term effects have been investigated. The variety of methods/tests further complicates the literature. Distinct media types are marked for each reference below.

    • Status: mixed
    • Original paper: ‘Transmission of aggression through imitation of aggressive models’, Bandura et al. 1961; experimental design, n = 72. [citations = 4151 (GS, March, 2022)].
    • Critiques: Anderson and Bushman 2001 [meta-analysis, n= 4,262, citations = 3908(GS, April 2023)]. Bushman 2016 [meta-analysis, k=37, n=10,410, citations = 92 (GS, April 2023)]. Drummond et al. 2020 [gaming, n ≈ 21000, citations = 35 (GS, June 2022)]. Elson and Ferguson 2014 [gaming, review paper, n=NA, citations = 266 (GS, June 2022)]. Ferguson and Kilburn 2009 [meta-analysis, k=25, n=12,436, citations = 497(GS, April 2023)]. Greitemeyer and Mügge 2014 [meta-analysis, k=98, n= 36,965, citations = 984(GS, April 2023)]. Hilgard et al. 2016 [gaming, meta-analysis, k=7 to k=40 for various gaming effects, citations = 161 (GS, June 2022)]. Savage and Yancey 2008 [media, n= 26 independent samples of subjects, citations = 255 (GS, June 2022)]. Strasburger and Wilson 2014 [review paper, n= NA, citations = 23(GS, April 2023)].
    • Original effect size: Bandura et al.: r2 = 8.96, p < .02.
    • Replication effect size: Anderson and Bushman: r = 0.27. Bushman: r = 0.20. Drummond et al.: r = 0.059. Elson and Ferguson: evidence regarding the impact of violent digital games on player aggression is, at best, mixed and cannot support unambiguous claims that such games are harmful or represent a public health crisis. Ferguson and Kilburn: r = 0.08. Greitemeyer and Mügge: r = 0.19. Hilgard et al.: substantial publication bias in experimental research on the effects of violent games on aggressive affect and aggressive behavior detected; after adjustment for bias, the effects of violent games on aggressive behavior in experimental research are estimated as being very small, and estimates of effects on aggressive affect are much reduced. Martins & Weaver: r = 0.15. Savage & Yancey: r = 0.07 (nonsignificant). Strasburger & Wilson: r = 0.3 (review outcome, not meta-analysis).

  • Mutual exclusivity bias. When presented with two objects, one of which has a known label and one which does not, infants (by ~20mo) are more likely to choose the object without a known label when prompted with a novel label.

  • Mutual exclusivity bias in bilinguals. Bilingual (and multilingual) infants exhibit delayed and/or lower mutual exclusivity bias than monolingual infants.

  • Abstract rule learning. After training on strings of stimuli generated from a particular repetition pattern (e.g., AAB), young infants attend longer to novel stimuli generated from a novel pattern (e.g., ABA) than to novel stimuli generated from the original pattern.

    • Status: replicated
    • Original paper: ‘Rule learning by seven-month-old infants’, Marcus et al. 1999; preferential looking time, Experiment 1: n=16. [Citations=1707 (GS, February 2023)].
    • Critiques: Rabagliati et al. 2018 [meta-analysis, n=1318, citations=53 (GS, February 2023)].
    • Original effect size: η2p=0.647 [0.252, 0.789].
    • Replication effect size: Rabagliati et al.: g=0.25 [0.09, 0.40].

  • Theory of mind in below-4 year olds (false belief). Children under four years of age fail at theory of mind tasks.

  • Theory of mind over development (false belief). Success rate at theory of mind tasks increases over age within the first 6 years of life.

  • Preference for prosocial agents. Infants and toddlers prefer prosocial agents over neutral or antisocial agents.

    • Status: mixed (overall positive effect but much smaller effect size)
    • Original paper: ‘Social evaluation by preverbal infants’, Hamlin et al. 2007; preferential reaching task, n(6mo)=12, n(10mo)=16. [Citations=1890 (GS, February 2023)].
    • Critiques: Margoni and Surian 2018 [meta-analysis, n=1244, citations=117 (GS, February 2023)]. Schlingloff et al. 2020 [replication of Hamlin et al., n(14–16mo)=32, citations=40 (GS, February 2023)]. Sitch 2018 [unpublished direct replication of Hamlin & Wynn 2011, n(6–9mo)=115, citations=0 (GS, February 2023)].
    • Original effect size: log OR(6mo)=1.77, log OR(10mo)=1.07.
    • Replication effect size: Margoni & Surian: log OR=0.15 [0.11, 0.18]. Schlingloff et al.: log OR=0.00. Sitch: log OR=0.20.
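
The effect sizes in this entry are reported as log odds ratios, which can be hard to compare with the d values used elsewhere in this list. As a rough sketch, a log odds ratio can be exponentiated to an odds ratio, or converted to an approximate d with the logistic conversion d = ln(OR) × √3 / π. Applying that conversion to preferential-reaching data is an assumption made here for illustration only; it is not taken from the cited papers, and the function names are hypothetical.

```python
import math


def odds_ratio(log_or: float) -> float:
    """Odds ratio corresponding to a log odds ratio."""
    return math.exp(log_or)


def d_from_log_or(log_or: float) -> float:
    """Approximate Cohen's d via the logistic conversion ln(OR) * sqrt(3) / pi."""
    return log_or * math.sqrt(3) / math.pi


# Log odds ratios as listed in this entry.
estimates = {
    "original, 6 months": 1.77,
    "original, 10 months": 1.07,
    "Margoni and Surian meta-analysis": 0.15,
}

for label, log_or in estimates.items():
    print(f"{label}: OR = {odds_ratio(log_or):.2f}, approx. d = {d_from_log_or(log_or):.2f}")
```

On this scale the original 6-month estimate corresponds to an odds ratio near 5.9, while the meta-analytic estimate corresponds to an odds ratio only slightly above 1, which is consistent with the "much smaller effect size" noted in the status line.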

  • Flynn effect. The observed rise over time in standardised intelligence test scores in the successive versions of Stanford-Binet and Wechsler intelligence tests. This effect points toward an increase in “intelligence” in the next generation compared to the previous generation.

    • Status: replicated
    • Original paper: ‘The Mean IQ of Americans: Massive Gains 1932 to 1978’, Flynn, 1984; meta-analysis, k=73, n=7431. [citations=1852 (GS, March 2023)].
    • Critiques: Fletcher et al. 2010 [meta-analysis, k=14 studies, n=2169, citations=27 (GS, March 2023)]. Trahan et al. 2014 [meta-analysis, k=285 studies, n=14,031, citations=369 (GS, March 2023)].
    • Original effect size: 3.1 (IQ gain per decade).
    • Replication effect size: Fletcher et al.: 2.80 [2.50, 3.09]. Trahan et al.: 2.93 [2.3, 3.5].
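
One simple way to compare an original estimate with a replication is to check whether the original point estimate lies inside the replication's confidence interval. The sketch below applies that check to the values listed in this entry. It is only one of several possible replication criteria, it is not the criterion used to assign the status above, and the names are illustrative.

```python
def contains(ci: tuple[float, float], value: float) -> bool:
    """True if value lies within the closed interval ci."""
    low, high = ci
    return low <= value <= high


# IQ gain per decade, as listed in this entry.
original_gain = 3.1
replication_cis = {
    "Fletcher et al. 2010": (2.50, 3.09),
    "2014 meta-analysis": (2.3, 3.5),
}

results = {label: contains(ci, original_gain) for label, ci in replication_cis.items()}
for label, inside in results.items():
    print(f"{label}: original estimate inside CI = {inside}")
```

With the listed numbers the original estimate falls inside the 2014 interval and just outside the 2010 interval. The latter miss is at the level of rounding (3.1 against an upper bound of 3.09), so it should not be over-interpreted.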


Differential psychology

  • 2D:4D ratio of the fingers and its correlation with psychological traits. This ratio was used as a predictor for different interindividual (e.g., intelligence) and especially gender differences.

    • Status: mixed
    • Original paper: ‘Finger length as an index of assertiveness in women’, Wilson 1983; correlational study, n = 985 females. [citations=125 (GS, January 2022)].
    • Critiques: Consumers' choices of gender-imaged products: Aspara et al. 2014 [n=588, citations=43 (GS, January 2022)]. Associations with different psychological traits/biological states: Sex role identity: Csathó et al. 2003 [n=46, citations=185 (GS, October 2021)]. Attractiveness in men: Ferdenzi et al. 2011 [n=49 males, citations=35 (GS, January 2022)]. Critical comment: Jones et al. 2020 [n=NA, citations=4 (GS, January 2022)]. Gender and Intelligence: Grisilda et al. 2021 [n=88 (50:50 f/m), citations=0 (GS, June 2022)]. Celebrity worship: Huh 2012 [n=106 (adolescence), citations=14 (GS, January 2022)]. Signal of male reproductive fitness: Longman et al. 2015 [n=542, citations=21 (GS, January 2022)]. Gender and COVID-19 mortality: Manning and Fink 2020 [n=41 nations, largest BBC Internet study, citations=22 (GS, January 2022)]; from the same group and again a BBC study: Manning et al. 2021 [n=169,467, BBC Internet study 2005, citations=1 (GS, January 2022)]; Smoliga et al. 2021 [n=176, citations=3 (GS, January 2022)]. Empirical studies & Meta-analysis: Voracek et al. 2011 [3 studies, n=1867, citations=55 (GS, October 2021)]. Wilson 2010 [Review, n=NA, citations=14 (GS, October 2021)].
    • Original effect size: d=0.13 (estimated from the reported χ² (1, N=896)=3.79).
    • Replication effect size: (all |r| < .04); Aspara et al.: for males β = -.18. Csathó et al.: the relationship between Bem sex role inventory and second to fourth digit ratio, Right hand b=0.29, Left hand b=0.26 (n.s.), Mean ratio b=0.31. Ferdenzi et al.: r2 = [-.25, -.43]. Grisilda et al.: No ES reported. Huh: females r = .50. Jones et al.: no significant association between male 2D:4D and % of male deaths (left hand: r = −0.32, p = .079; right hand: r = −0.20, p = .283). Longman et al.: r = .23 for females, .43 for males. Manning and Fink: R2 = .12-.40; no ES for (2021). Manning et al.: the association of income inequality (operationalized as relative parental income) and children’s 2D:4D, in males: right hand F(3,109668)=12.55, p<0.0001; left hand F(3,109668)=14.86, p<0.000; in females: right hand F(3,89642)=4.74, p=0.003; left hand F(3,89642)=4.02, p=0.007. Smoliga et al.: r = .28. Voracek et al.: a general critique combined with an empirical study. The authors show that 2D:4D can be associated with almost anything, e.g., good luck: they found a significant association between 2D:4D and “poker hand”. However, they stress that this and similar associations arise by chance alone and are one of the reasons for the replication crisis in psychology.
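
The original effect size for this entry is described as estimated from a reported χ² with one degree of freedom. A minimal sketch of one common route is shown here: first r = sqrt(χ² / N), then d = 2r / sqrt(1 − r²). It is an assumption that this is the route used for the listed value, and the function names are illustrative.

```python
import math


def r_from_chi2(chi2: float, n: int) -> float:
    """Correlation-type effect size (phi) from a 1-df chi-square statistic."""
    return math.sqrt(chi2 / n)


def d_from_r(r: float) -> float:
    """Cohen's d from a correlation coefficient."""
    return 2 * r / math.sqrt(1 - r ** 2)


# Chi-square (1, N = 896) = 3.79 as listed in this entry.
r = r_from_chi2(3.79, 896)
d = d_from_r(r)
print(f"r = {r:.3f}, d = {d:.2f}")
```

With these inputs the conversion gives a d of about 0.13, which agrees with the listed original effect size.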

  • Personality traits and consequential life outcomes. Personality traits (i.e., characteristic patterns of thinking, feeling, or behaving that tend to be consistent over time and across relevant situations), particularly the Big Five factors, are linked with consequential individual, interpersonal, and social-institutional outcomes.

    • Status: mixed
    • Original paper: ‘Personality and the prediction of consequential outcomes’, Ozer and Benet-Martínez, 2006; review summarizes 86 associations between the Big Five traits and 49 life outcomes. [citation=2754(GS, August 2022)].
    • Critiques: Roberts et al. 2007, review of prospective longitudinal studies, 34 studies that link personality traits to mortality/longevity, N=117713. [citation=2482 (GS, August 2022)]. Soto 2019 [n=1504, citations=256 (GS, August 2022)].
    • Original effect size: r = .10 to .40.
    • Replication effect size: Soto: Shrank by 20% (r = .06 to .60).

  • Reading prevents cancer. Combining bibliotherapy and short-term individual therapy can reduce the probability of dying from cancer and may prolong lives of those already suffering from cancer.​

  • Graphology. The analysis of handwriting with the attempt to determine someone’s personality traits.

    • Status: not replicated
    • Original paper: ‘Graphological analysis and psychiatry: an experimental study’, Eysenck 1945; participants filled in a personality questionnaire and copied the questionnaire without filling it in, graphologist analysed it, N=50. [citations=44 (GS, January 2023)].
    • Critiques: Dazzi & Pedrabissi 2009 [N=203 across 2 studies, citations= 52 (GS, January 2023)]. Furnham and Gunter 1987 [N=64, citations=52 (GS, January 2023)].
    • Original effect size: not reported.​
    • Replication effect size: Furnham and Gunter: not reported. Dazzi and Pedrabissi: Study 1 - Spearman’s rho correlation coefficients between the scores on the Big Five Questionnaire and the graphologists’ ratings of personality traits based on the handwriting samples varied from –.07 to .21, with a mean value of .04 (first graphologist) and .07 (second graphologist). Only one of 15 coefficients differed significantly from zero; Study 2 - Spearman’s rho correlation coefficients between the scores on the Big Five Questionnaire and the ratings of traits given by the first graphologist varied from –.11 to .20, with a mean of .05, and only one of 15 coefficients was statistically significant; coefficients of the second graphologist ranged from –.11 to .14, with a mean of .03, and no coefficient was significantly different from zero.


Judgment and Decision Making

  • Default effect. In a choice scenario between two alternatives, when an alternative is presented as a default option, people stick with it rather than change it. For example, ‘Opt Out’ default organ donation policies increase organ donations.

    • Status: replicated
    • Original paper: ‘Do defaults save lives?’, Johnson and Goldstein 2003; 3 between-subjects experiment, N = 161. [citations=2649 (GS, June 2022)]​.
    • Critiques: DellaVigna & Linos 2022 [Meta-analysis of 241 nudges based on 23.5 million participants, citations = 185 (GS, July 2022)]. Chandrashekar et al. 2022 [N = 1920, citations = 1 (GS, April 2023)].
    • Original effect size: OR = 5.15 to OR = 5.93​.
    • Replication effect size: DellaVigna & Linos: The published literature on nudge effects tends to report false positives and inflated effect sizes. Chandrashekar et al.: OR = 1.38 to OR = 1.67.

  • Decoy effect (alternatives: asymmetric dominance; attraction effect). The decoy effect is a cognitive bias in which an individual’s preference between two options is influenced by the presence of a third, asymmetrically dominated option (i.e., a decoy similar but inferior to one of the original options). Individuals are more likely to choose the option that is similar to the decoy when the decoy is present than when it is absent. The decoy effect has been replicated in different studies and contexts, though its magnitude varies with the specific features of the options being considered and the context in which the decisions are made.

  • Nudges. Choice architecture interventions that promote beneficial decisions.

    • Status: mixed
    • Original paper: ‘Nudge: Improving Decisions about Health, Wealth, and Happiness’, Thaler & Sunstein, 2008, Book [citations=23376 (Google Scholar, October 2022)].
    • Critiques: Mertens et al. (2021) [citations=55 (Google Scholar, October 2022)] conducted a meta-analysis on nudges and found a medium effect size across all types of nudges. They conducted several publication bias tests, the most severe of which indicated a very small but significant effect size. Maier et al. (2022) [citations=15 (Google Scholar, October 2022)] re-analysed the data reviewed by Mertens et al. (2021) and found no nudging effect after adjusting for publication bias.
    • Original effect size: No effect sizes were provided in the original book.
    • Replication effect size: Mertens et al. 2021: d= 0.37 to d= 0.46. Maier et al., 2022: d= 0.00 to d= 0.14.

  • Risky Choice Framing Effect (term used by Levin et al., 1998), alt-term = framing effect in risky decision-making. Under a loss frame, people are risk-seeking, whereas under a gain frame, people are risk-averse. In framing studies, logically equivalent choice situations are described differently and the resulting preferences are studied (Kühberger, 1998). In risky choice problems, the way a choice is presented influences the decision (e.g., saving 10 people out of 100 vs losing 90 people out of 100).

    • Status: replicated
    • Original paper: ‘The framing of decisions and the psychology of choice’, Tversky & Kahneman, 1981; experimental design, P1: 152; P2: 155; P3: 150; P4: 86; P5: 77; P6: 85; P7: 81; P8: 183; P9: 200; P10.1: 93; P10.2: 88; Total = 1350* (unclear if those samples are different samples) [citations = 24617 (GS, October 2022)].
    • Critiques: Meta-analysis: Kühberger, 1998 [Total studies reviewed=136, citations=1554 (GS, June 2022)]. The author finds that certain characteristics of framing studies are crucial to obtaining a consistent framing effect, and that the closer a methodology is to the original, the better the chance of replicating the original effect. Large-scale replication in Klein et al., 2014 [Total replication studies = 36, citations=1082 (GS, June 2022)].
    • Original effect size: Tversky and Kahneman (1981): d = 1.13, 95% CI [0.89, 1.37] (based on the Klein et al., 2014 calculation). Meta-analytical effect size (many close and conceptual replications): Steiger & Kühberger (2018): d = 0.52 to 0.56.
    • Replication effect size: Kühberger, 1998: d = .308; revised in Steiger & Kühberger, 2018 to d = .522 with only 81 of the 136 studies; Klein et al., 2014: d = .60 (95% CI 0.53-0.67); Steiger & Kühberger, 2018: d = .56.

  • Risk and Goal Message Framing. a) For illness detection behaviors, loss framing (presenting information of negative consequences with undesirable behaviors / without desirable behaviors) would be more effective than gain framing (presenting information of benefits through engaging in desirable behaviors) in encouraging healthy attitudes, intentions, and behaviors (perhaps because illness detection behaviors are riskier, Rothman & Salovey, 1997), whereas b) for health-affirming behaviors, gain framing would be more effective than loss framing in motivating healthy attitudes, intentions, behaviors (perhaps because health-affirming behaviors are less risky, Rothman & Salovey, 1997).

    • Status: Mixed, depending on operationalizations, DVs, and method (meta-analysis vs empirical study). The conceptual replication failed to provide support for the interaction, but this may be due to limited power.
    • Original paper: Rothman et al. (1999), between-subject design, sample size: 120 (Study 2) [citations=548(GS, October 2022)]​.
    • Critiques: van Riet et al. (2016) criticized the reasoning of applying Kahneman and Tversky (1981) Prospect Theory (which was more suitable and applicable for risky choice framing) to goal message framing. Van Riet et al. (2016) also reviewed direct empirical and meta-analytical evidence, and it appears the evidence for the risk-framing hypothesis in message framing is not conclusive.
    • Original effect size: Rothman et al. (1999): partial eta squared=0.03, 90% CI [0.00, 0.10], to partial eta squared=0.06, 90% CI [0.01, 0.14].
    • Replication effect size: Cox et al. (2006): author: partial eta squared =0.03, 90% CI [0.00, 0.12], non-significant, but may be due to limited power.

  • Status quo effect (status quo bias). A cognitive bias that leads people to prefer the current state of affairs, even when change may be beneficial.

    • Status: replicated
    • Original paper: ‘Status quo bias in decision making’, Samuelson and Zeckhauser 1988; series of decision-making experiments, n = 486. [citations=2707(SPRINGER LINK, January 2023)]​.
    • Critiques: Review: Bostrom and Ord 2006 [n=NA, citations = 354 (GS, January 2023)]. Godefroid et al. 2022 [n = NA, citations=4(GS, February 2023)]​. Johnson and Goldstein 2003 [n = 161, citations = 2824 (GS, February 2023)]. Xiao et al. 2021 [Experiment 1: n = 311, Experiment 2: n = 316, citations = 4 (GS, January 2023)].
    • Original effect size: Cohen’s h from .16 to .79 (recalculated in Xiao 2021).
    • Replication effect size: Bostrom and Ord: no ES (replicated). Godefroid et al.: NA. Johnson and Goldstein: no ES (but replicated as default effect). Xiao et al.: Cohen’s h from .45 to .62.

  • Temporal action-inaction effect. The proposed phenomenon that people associate or experience stronger regret with action compared to inaction in the short-term, but stronger regret with inaction compared to action in the long-term. ​

    • Status: mixed
    • Original paper: ‘The temporal pattern to the experience of regret’, Gilovich and Medvec 1994; hypothetical scenario experiments and real-life experience studies, Study 1: n =60, Study 2: n=77, Study 3: n= 80, Study 4: n=34, Study 5: n=32. [citations=564(GS, June 2022)]​.
    • Critiques: Bonnefon and Zhang 2008 [n=957, citations = 23 (GS, April 2023)]. Feldman et al. 1999 [n1=157, n2=622, citations = 97 (GS, April 2023)]. Towers et al. 2016 [n=500, citations = 31 (GS, April 2023)]. Yeung and Feldman 2022 [n=988, citations = 0 (GS, April 2023)]. Zeelenberg et al. 1998 [n1=165, n2=75, n3=100, n4=150, citations = 455(GS, April 2023)].
    • Original effect sizes: Study 1: V = 0.50, Study 3: V = 0.28 to V = 0.53, Study 4: V = 0.24 to V = 0.53, Study 5: V = 0.06 to V = 0.56​ (reported in Yeung and Feldman 2022).
    • Replication effect sizes: Bonnefon and Zhang: The intensity of recent regrets is predicted by the consequences of the behaviour, and especially so for actions. The intensity of distant regrets is predicted by the consequences of the behaviour and by its justification, the effect of justification being stronger for actions than for inactions; failed to find support for temporal pattern. Feldman et al.: Participants reported more inaction than action regrets, and, contrary to prior research findings, regrets produced by actions and inactions were equally intense; failed to find support for temporal pattern. Towers et al.: Although regrets of inaction were more frequent than regrets of action, regrets relating to actions were slightly more intense; failed to find support for temporal pattern. Yeung and Feldman: Study 1: V = 0.25, Study 3: V = 0.15 to V = 0.23, Study 4: V = 0.10 to V = 0.24, Study 5: V = 0.04 to V = 0.05. Zeelenberg et al.: found support for temporal pattern of regret with real-life experience studies; when prior outcomes were positive or absent, people attributed more regret to action than to inaction; however, following negative prior outcomes, more regret was attributed to inaction, a finding that the authors label the inaction effect.

  • Money market versus goods/social market. The money market relationship refers to an exchange in which effort level is determined by the level of compensation. By contrast, the social market relationship is an exchange in which effort level is influenced mostly by altruistic motivations rather than by the compensation level. Heyman and Ariely (2004) proposed and showed that when the money market is primed with monetary compensation, people display more effort the more compensation they receive. Yet when the social market is primed with non-monetary compensation (i.e., goods), effort level does not depend on compensation level.

    • Status: mixed
    • Original paper: ‘Effort for Payment: A Tale of Two Markets’, Heyman and Ariely 2004; between-subjects design, n = 614 (Experiment 1). [citations=1261(GS, June 2022)].​
    • Critiques: Imada et al. 2022 [n=2203 (study 1) and 999 (study 2), citations=1(GS, June 2022)].
    • Original effect size: Money market: d = -0.59 [-0.89, -0.29]; Social market: d = -0.25 [-0.55, 0.05].
    • Replication effect size: Imada et al.: replicated the original finding that people expect others to be more willing to help when they are offered a medium amount of cash compared to a small amount of cash; contrary to Heyman and Ariely 2004, they found that people expect others to be more willing to help when they are offered a medium amount of goods compared to a small amount of goods; the effect size (medium vs. small) for money was much larger than that for goods, and their replications overall supported the original claim that the sensitivity to compensation level differs depending on money vs. social market relationships; Study 1: Money market: d = -1.25 [-1.43, -1.06]; Social market: d = -0.43 [-0.60, -0.26]; Study 2: Money market: d = -1.30 [-1.39, -1.22]; Social market: d = -0.87 [-0.94, -0.79].

  • Risk and benefit perceptions (affect heuristic). Increasing the risks of a hazard leads people to judge its benefits as lower and, vice versa, increasing its benefits leads people to judge its risks as lower.

    • Status: replicated
    • Original paper: ‘The affect heuristic in judgments of risks and benefits’, Finucane et al. 2000; A mixed 4 (between - affective information: high risk/low risk/high benefit/low benefit) x 3 (within - technologies: nuclear power/natural gas/food preservatives), n=213 participants. [citations = 3694 (GS, June 2022)]​.
    • Critiques: Efendić et al. 2021 [n=1552, citations = 2(GS, November 2022)].
    • Original effect size: r = -0.74 [-0.92,-0.30].
    • Replication effect size: Efendić et al.: Study 1: r = -0.87 [-0.96, -0.59]; Study 2: r = -0.84 [-0.95, -0.50].

  • Temporal value asymmetry (TVA). The phenomenon that contemplating future events elicits stronger emotions than contemplating past events has been coined “temporal value asymmetry” (TVA). TVA was robust in between-persons comparisons and absent in within-persons comparisons.

    • Status: not replicated
    • Original paper: ‘A Wrinkle in Time: Asymmetric Valuation of Past and Future Events’, Caruso et al. 2008; 2x2 between-subject design, Study 1 n=121, Study 4 n=182. [citations=252 (GS, July 2022)].
    • Critiques: Caruso 2010 [Study 1: n=116, citations=112 (GS, July 2022)]. Kvam et al. 2022 [n=70, citations = 2 (GS, April 2023)].
    • Original effect sizes: Between-subjects analysis - Study 1: Monetary Valuation: d = 0.41 [0.04, 0.76]; Difficulty: d = 0.08 [-0.27, 0.44]; Qualification: d = 0.19 [-0.17, 0.55]. Study 4 (DV = Monetary Value): Relevance: ηp² = 0.03 [0.00, 0.10]; Temporal Location: ηp² = 0.05 [0.01, 0.13]; Relevance × Temporal Location: ηp² = 0.02 [0.00, 0.08]; Study 4 (DV = Stress): Relevance: N/A, Temporal Location: N/A; Relevance × Temporal Location: ηp² = 0.02 [0.01, 0.12]. Study 1: Fairness: d = 0.43 [0.06, 0.80]; Negative Emotions: d = 0.37 [0.003, 0.74]; Brand’s Intentions: d = 0.33 [-0.03, 0.70].
    • Replication effect sizes: Caruso et al. (within-subject analysis): Study 1: Monetary Valuation: d = 0.03 [-0.24, 0.30]; Difficulty: d = 0.01 [-0.27, 0.26]; Qualification: d = 0.18 [-0.09, 0.45]; Study 4 (DV = Monetary Value): Relevance: ηp² = 0.00 [0.00, 0.02]; Temporal Location: ηp² = 0.00 [0.00, 0.02]; Relevance × Temporal Location: ηp² = 0.00 [0.00, 0.01]; Study 4 (DV = Stress): Relevance: ηp² = 0.01 [0.00, 0.04]; Temporal Location: ηp² = <0.001 [0.00, 0.01]; Relevance × Temporal Location: ηp² = 0.001 [0.00, 0.02]. Caruso (within-subject analysis): Study 1: Fairness: d = 0.13 [-0.06, 0.32]; Negative Emotions: d = 0.01 [-0.19, 0.20]; Brand’s Intentions: d = -0.09 [-0.28, 0.10]. Kvam et al.: “Our work provides a direct counterpoint to both the empirical phenomenon and this theoretical explanation for the temporal value asymmetry. First, we show systematic reversals of the temporal value asymmetry where participants sometimes preferred past outcomes. There were multiple instances where participants indicated that they preferred payoffs that could have occurred in the past to ones that could occur in the future in perfectly matched pairs – where the same dollar amount could be received in either the past or future at the same temporal distance (X days ago vs X days from now). Second, these reversals occurred as both the magnitude of and distance to the past / future payoffs were manipulated. Participants favoured past events when payoffs were small and temporal distance was large ($11, 2 years ago / from now), and favoured future events when payoffs were large or temporal distance was small ($10k, 7 days ago / from now). This may explain apparent replication failures related to the temporal value asymmetry (El Halabi et al., 2021) – not because the phenomenon is not real, but because different stimuli can cause it to reverse, and thus on average, fail to appear. Third and finally, a model comparison showed that framing the temporal value asymmetry in terms of hyperbolic (or even the more general hyperboloid) discounting is insufficient to account for the patterns of behaviour we observed.”

  • Exceptionality effect (emotional amplification, normality bias, exceptional-routine effect). The affective response to an event is enhanced if its causes are abnormal. The exceptionality effect is the phenomenon that people associate stronger negative affect with a negative outcome when it results from an exception (abnormal behaviour) than when it results from routine (normal behaviour). Exceptionality enhances emotional responses to an event (regret, self-blame) as well as cognitive responses (victim compensation, offender punishment).

    • Status: replicated
    • Original paper: ‘Norm Theory: Comparing Reality to Its Alternatives’, Kahneman and Miller, 1986; within subject design (exceptional vs. normal), n=92 participants. [citations=4427(GS, July 2022)]​.
    • Critiques: Fillon et al. 2020 [meta-analysis, k= 48, citations = 10 (GS, July 2022)]. Kutscher and Feldman 2019 [exact replication (within-subject), n1= 342, n2 = 342, citations = 19 (GS, April 2023)].
    • Original effect size: Hedges’ g = 1.09 to g = 2.78.
    • Replication effect size: Kutscher and Feldman: d= 1.58 to d= 3.12. Fillon et al.: g= 0.41 to g= 0.79; the effect of exceptionality on counterfactuals was not significant and close to zero (k = 5, g = 0.39 [0.08, 0.70]). They also found that the effect for between-participants design is half the size of studies with a within-subject design.

  • Temporal differences in trait self-ascriptions. Much like how we are more likely to ascribe dispositional traits, as opposed to situational variables, when explaining the behaviour of other people compared to ourselves, the same asymmetry can also be observed when making trait assessments about our temporally distant selves (e.g. past or future). People are more likely to ascribe dispositional traits, compared to situational explanations, when making judgements about their past or future self.

    • Status: mixed (social distance replicated, but weaker | self-enhancement replicated | self-enhancement over temporal distance reversed | core hypothesis about temporal distance not replicated)
    • Original paper: ‘Temporal differences in trait self-ascription : When the self is seen as an other’, Pronin and Ross, 2006; study 1: 2 (between - temporal distance: past vs. present) x 2 (between - social distance: self vs friend); study 2: between, temporal distance - present vs future self; study 3: 3 (between - temporal distance: past vs present vs future self) x 2 (within - valence of trait attributions : positive vs negative), study 1: n=167 (students = 123, staff = 44); study 2: n=40; study 3: n=75. [citations=345(GS, June 2022)]​.
    • Critiques: Adelina and Feldman 2021 [n=911, citations = 1(GS, November 2022)].
    • Original effect size: People attribute more dispositional traits to others compared to themselves (social distance), f = 0.35 [0.03, 0.16]; People attribute more positive, compared to negative, traits when making self-assessments (self-enhancement), f = 0.77 [0.29, 1.25], and this ratio does not increase with temporal distance, f = 0.16 [0.00, 0.36]; People ascribe more dispositional traits when making assessments about their temporally distant self compared to their present self (temporal distance), f = 0.54 [0.27, 0.77].
    • Replication effect size: Adelina and Feldman: Social distance: f = 0.10 [0.03, 0.16]; Self-enhancement: f = 0.88 [0.50, 1.26], ratio increases with temporal distance, f = 0.33 [0.22, 0.42]; Temporal distance: f = 0.02 [0.00, 0.06].

  • Bias Blind Spot. The phenomenon that people perceive stronger biases in others than in themselves. Pronin et al. (2002) found support for self-other asymmetries in perceived biases but failed to find support for self-other asymmetries in perceived personal shortcomings. Chandrashekar et al. (2021) found support for self-other asymmetries for both biases and personal shortcomings.

    • Status: replicated
    • Original paper: ‘Perceptions of bias in self versus others’, Pronin et al. 2002; within-subject design, Study 1: n = 24 , Study 2: n = 30 . [citations=1406 (GS, October 2022)].
    • Critiques: Chandrashekar et al. 2021 [N = 969, citations=10 (GS, April 2022)].
    • Original effect size: d = -0.86 for biases, d = 0.28 for personal shortcomings.
    • Replication effect size: Chandrashekar et al.: d = -1.00 for biases, d = -0.34 for personal shortcomings.

  • Hindsight Bias. Hindsight bias refers to the tendency to perceive an event outcome as more probable after being informed of that outcome.

    • Status: replicated
    • Original paper: ‘On the psychology of experimental surprise’, Slovic and Fischhoff 1977; between-subjects design, n=184. [citations = 591 (GS, October 2022)].
    • Critiques: Chen et al. 2021 [n = 608, citations = 1 (GS, October 2022)].
    • Original effect size: d=0.36 to d=0.61.
    • Replication effect size: Chen et al.: d=0.05 to d=0.32.

  • Disjunction Effect. The sure-thing principle (STP) posits that if decision-makers are willing to make the same decision regardless of whether an external event happens or not, then they should also be willing to make the same decision when the outcome of the event is uncertain. People regularly violate the STP: uncertainty about an outcome influences decisions.

    • Status: mixed
    • Original paper: ‘The Disjunction Effect in Choice under Uncertainty’, Tversky and Shafir 1992; within and between subject design, n1=199, n2=98, n3=213, n4=171. [citations=860(GS, March 2023)]​​.
    • Critiques: Kühberger et al. 2001 [n1=177, n2=184, n3=35, n4=97, citations=58(GS, March 2023)]. Lambdin and Burdsal 2007 [N=55, citations=51(GS, March 2023)]. Ziano et al. 2021 [N=890, citations=3(GS, March 2023)].
    • Original effect size: “Paying-to-know” paradigm – participants were willing to pay a small fee to postpone a decision about a vacation package promotion when outcome of an exam was uncertain, despite preferences to purchase the package regardless of exam outcome, Cramer’s V = 0.22 [0.14, 0.32] (reported in Ziano et al. 2021); “Choice under risk” problem – facing uncertainty about the outcome of an initial bet led to less willingness to again accept the exact same bet, compared to when having learned the outcome of the first bet, Cramer’s V = 0.26 [0.14, 0.39] (reported in Ziano et al. 2021)​.
    • Replication effect size: “Choice under risk” problem: Kühberger et al.: ES not reported but failed to replicate the “choice under risk” problem in four experiments. Lambdin & Burdsal: ES not reported but failed to replicate. Ziano et al.: Cramer’s V = 0.11 [-0.07, 0.20] (not replicated). “Paying-to-know” paradigm: Ziano et al.: Cramer’s V = 0.30 [0.24, 0.37] (replicated).

  • Money Illusion. The inability of individuals to account for inflation or deflation when making decisions. Under inflation, money loses value over time, yet people fail to consider the impact of inflation on the real value of money.

    • Status: replicated
    • Original paper: ‘Money Illusion’, Shafir et al. 1997; experiment, Problem 1: n = 358; Problem 2: n = 431 ; Problem 3: n =362; Problem 4: n = 139. [citations=1190(GS, November 2022)]​.
    • Critiques: Ziano et al. 2021 [n=604, citations=8(GS, November 2022)]. ​
    • Original effect size: Problem 1: Cramer’s V = 0.26 [0.17, 0.37]; Problem 2: XX = 48% [42%, 52%]; Problem 3: buy: 38% [33%, 43%] and sell: 43% [38%, 48%]; and Problem 4: V = 0.25 [0.13, 0.42] (obtained from Ziano et al. 2021).
    • Replication effect size: Ziano et al.: Problem 1: V = 0.28 [0.21, 0.36]; Problem 2: 70% [66%, 73%]; Problem 3: buy: 47% [43%, 51%] and sell: 43% [39%, 47%]; and Problem 4: V = 0.17 [0.10, 0.25].

  • Choosing versus rejecting (Framing effects, compatibility principle). People are inconsistent in their preferences when faced with choosing versus rejecting decision-making scenarios.

    • Status: not replicated
    • Original paper: ‘Choosing versus rejecting: Why some options are both better and worse than others’, Shafir 1993; 8 experiments with between-subjects design (i.e., two between-subjects conditions), across 8 studies sample size ranged from 170 to 398. [citations=779 (GS, July 2022)]​.
    • Critiques: Chandrashekar et al. 2021 [n = 1026, citations = 1 (GS, July 2022)], who proposed and tested alternative theoretical predictions to those of Shafir 1993. Ganzach 1995 [n = 41 & 96, citations = 94 (GS, July 2022)]. Wedell 1997 [n1=225, n2=125, citations = 79 (GS, July 2022)].
    • Original effect size: d= 0.22 to 0.51​.
    • Replication effect size: Chandrashekar et al.: d = 0.01. Ganzach: Experiment 1 - ηp² = 0.065 (calculated from the reported F(1,39)=6.6, p<.01 using this conversion), Experiment 2 - ηp² = 0.105 (calculated from the reported F(1,94)=11, p<.001 using this conversion).
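
The entry above refers to a conversion from a reported F statistic to partial eta squared. A minimal sketch of one standard conversion, ηp² = (F · df1) / (F · df1 + df2), shown here for the Experiment 2 statistic; the function name `partial_eta_squared` is hypothetical, and whether this is the same conversion the entry links to is an assumption:

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# Experiment 2 statistic quoted in the entry: F(1, 94) = 11
print(round(partial_eta_squared(11, 1, 94), 3))  # 0.105
```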

  • Conjunction bias (conjunction fallacy). The fallacy consists of judging the conjunction of two events as more likely than either of the two events alone, violating one of the most fundamental tenets of probability theory.

  • Direct versus indirect harm. Individuals believe that causing indirect harm is more moral than direct harm, regardless of outcomes, intentions, or self-presentational concern.

    • Status: mixed
    • Original paper: ‘The Preference for Indirect Harm’, Royzman and Baron 2002; within-subjects, n = 54. [citations = 223 (GS, January 2023)]. Three experiments were conducted but Experiment 2 is the key experiment.
    • Critiques: Ames and Fiske (2013) [between-subjects; Experiment 1: n = 80 (Intentional harm = 39, Unintentional harm = 41), Experiment 2: n = 93 (Intentional harm = 46, Unintentional harm = 47), Experiment 3: n = 79 (Intentional harm = 41, Unintentional harm = 38), citations = 182 (GS, January 2023)]. Ziano et al. 2021 [Experiment 1: n = 46, Experiment 2: n = 314, Meta-Analysis: n = 414, k = 3 experiments (original and the two conducted by the experimenters), citations = 5 (GS, January 2023)] replicated Experiment 2 of Royzman and Baron 2002.
    • Original effect size: d = 0.70 [0.40, 1.00] (estimated from test-statistic: t(53) = 5.12, p < .001).
    • Replication effect size: Ames and Fiske: Experiment 1: d = 0.74 [0.29, 1.19] (estimated using descriptive statistics) (replicated); Experiment 2: d = 0.22 [-0.18, 0.63] (estimated using descriptive statistics) (not replicated); Experiment 3: d = 0.48 [0.03, 0.93] (estimated using descriptive statistics) (replicated). Ziano et al.: Experiment 1: Scenario 1: d = 0.55 [0.24, 0.86] (replicated); Scenario 2: d = 0.41 [0.11, 0.71] (replicated); Experiment 2: Scenario 1: d = 0.24 [0.13, 0.35] (replicated, but smaller); Scenario 2: d = 0.36 [0.24, 0.47] (replicated); Meta-Analysis: Scenario 1: d = 0.47 [0.18, 0.75] (replicated); Scenario 2: d = 0.46 [0.26, 0.65] (replicated).
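
The original effect size in this entry is described as estimated from a within-subjects t statistic. A minimal sketch, assuming the standardized mean difference for paired data, d_z = t / √n with n = df + 1; the function name `dz_from_paired_t` is hypothetical, and the choice of d_z rather than another variant of d is an assumption:

```python
import math


def dz_from_paired_t(t: float, df: int) -> float:
    """Standardized paired difference (d_z) from a within-subjects t statistic."""
    n = df + 1                # number of participants in a paired design
    return t / math.sqrt(n)


# Statistic quoted in the entry: t(53) = 5.12
print(round(dz_from_paired_t(5.12, 53), 2))  # 0.7
```

Under this assumption the result agrees with the listed point estimate of d = 0.70.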

  • Distinction bias. When evaluating how happy options would make them, people who evaluated the options simultaneously predicted greater happiness for the good options and lower happiness for the bad options, whereas people who evaluated the options separately (i.e., only evaluated one option) showed little difference between the options.

    • Status: mixed
    • Original paper: ‘Distinction bias: Misprediction and mischoice due to joint evaluation’, Hsee and Zhang 2004; Study 1: 5 conditions, pairwise comparisons between groups, total sample size n = 249; Study 2: 9 conditions, pairwise comparisons between groups, total sample size n = 360. [citations = 380 (GS, January 2023)].
    • Critiques: Anvari et al. 2021 [n = 824, citations = 6 (GS, April 2023)].
    • Original effect size: Study 1: d = 1.17 and 3.26; Study 2: d = 0.60, 0.75, 0.91, and 1.20.
    • Replication effect size: Anvari et al.: Study 1: d = 2.60 and 4.13; Study 2: d= 0.45, 0.02, 1.55, and 0.02.

  • Inaction inertia effect. Forgoing an offer that is less appealing, but still desirable, than a previous offer. For example, if you missed the opportunity to attend a skiing trip that was offered at £40 rather than the usual £100​, you would reject the offer of going to the same ski trip when offered for £80 rather than the usual £100.

    • Status: mixed (but mostly replicable).
    • Original paper: ‘Inaction inertia: Foregoing future benefits as a result of an initial failure to act’, Tykocinski et al. 1995; between-subject design, Experiment 1: n = 108, Experiment 2: n = 120, Experiment 3: n = 135, Experiment 4: n = 76, Experiment 5: n = 61, Experiment 6: n = 165. [citations = 238 (GS, February 2023)]​.
    • Critiques: Chen et al. 2021 [Experiment 1: n = 43 (between-subjects), Experiment 2: n = 309 (between-subjects), Experiment 3: n = 1,203 (mixed-design), Between-subjects: n = 603, Within-subjects: n = 600, Meta-Analysis: n = 1,555, k = 4 (mini analysis of own studies), citations = 10 (GS, February 2023)]. Zeelenberg et al. 2006 [Experiment 1: n = 80 (between-subjects), Experiment 3: n = 80 (between-subjects), Experiment 4: n = 120 (between-subjects), Experiment 5: n = 159 (between-subjects), citations = 96 (GS, February 2023)].
    • Original effect size: Experiment 1 (Ski resort): ωp² = .08 [.01, .19] (estimated from test-statistic: F(2, 105) = 5.92, p = .004); Experiment 2: Car: ωp² = .05 [.00, .14] (estimated from test-statistic: F(2, 117) = 4.12, p = .019), Frequent flyer: ωp² = .07 [.00, .16] (estimated from test-statistic: F(2, 117) = 5.46, p = .005), Fitness centre: ωp² = .05 [.00, .13] (estimated from test-statistic: F(2, 117) = 3.92, p = .022); Experiment 3 (Ski resort): d = 0.59 [0.24, 0.94] (estimated from test-statistic: F(1, 131) = 11.28, p < .001); Experiment 4 (Fitness centre): d = 0.44 [-0.02, 0.90] (reported as significant in the paper, but estimated from test-statistic: F(1, 75) = 3.68, p = .059); Experiment 5 (Betting): Betting amount: d = 0.73 [0.19, 1.27] (estimated from test-statistic: F(1, 56) = 7.54, p = .008), Likelihood of placing bet: d = 0.61 [0.07, 1.14] (estimated from test-statistic: F(1, 56) = 5.20, p = .026); Experiment 6 (Frequent flyer): d = 0.39 [0.08, 0.71] (estimated from test-statistic: F(1, 159) = 6.18, p = .014).
    • Replication effect size: Chen et al.: Experiment 1: Ski resort: η² = .02 [.00, .12] (not replicated), Car: η² = .38 [.13, .54] (replicated), Frequent flyer: η² = .10 [.00, .26] (not replicated), Fitness centre: η² = .20 [.01, .38] (replicated); Experiment 2: Ski resort: η² = .10 [.04, .16] (replicated), Car: η² = .10 [.04, .16] (replicated), Frequent flyer: η² = .02 [.00, .05] (not replicated), Fitness centre: η² = .14 [.07, .21]; Experiment 3: Between-subjects: Ski resort: η² = .06 [.02, .09] (replicated), Car: η² = .05 [.02, .08] (replicated), Frequent flyer: η² = .01 [.00, .03] (not replicated), Fitness centre: η² = .14 [.09, .19] (replicated); Within-subjects (supplementary materials): Ski resort: η² = .10 [.06, .14] (replicated), Car: η² = .06 [.03, .10] (replicated), Frequent flyer: η² = .00 [.00, .01] (not replicated), Fitness centre: η² = .16 [.11, .21] (replicated); Meta-Analysis: Large-difference versus small-difference – d = 0.49 [0.32, 0.67] (inaction inertia as described by Tykocinski et al., 1995). Zeelenberg et al.: Experiment 1: d = 1.18 [0.69, 1.66] (estimated from test-statistic: F(1, 76) = 26.50, p < .001) (replicated); Experiment 3: d = 1.57 [1.05, 2.08] (estimated from test-statistic: F(1, 76) = 46.59, p < .001) (replicated); Experiment 4: d = 0.63 [0.25, 1.00] (estimated from test-statistic: F(1, 116) = 11.44, p = .001) (replicated); Experiment 5: d = 1.12 [0.78, 1.46] (estimated from test-statistic: F(1, 151) = 47.48, p < .001) (replicated).
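
Several effect sizes in this entry are described as estimated from F statistics. A minimal sketch of two conversions that are consistent with the listed point estimates, assuming d = 2·√(F / df_error) for a two-group contrast with roughly equal group sizes, and partial omega squared = df1·(F − 1) / (df1·(F − 1) + N) with N = df1 + df2 + 1 for a one-way between-subjects design. The function names are hypothetical, and whether the entry's authors used these exact formulas is an assumption; the confidence intervals are not reproduced here.

```python
import math


def d_from_f(f: float, df_error: int) -> float:
    """Cohen's d from a 1-numerator-df F test, assuming two roughly equal groups."""
    return 2 * math.sqrt(f / df_error)


def partial_omega_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial omega squared from F, assuming a one-way between-subjects design."""
    n_total = df_effect + df_error + 1   # total N implied by the degrees of freedom
    numerator = df_effect * (f - 1)
    return numerator / (numerator + n_total)


# Statistics quoted in the entry for Tykocinski et al. (1995)
print(round(d_from_f(11.28, 131), 2))                 # 0.59 (Experiment 3)
print(round(d_from_f(3.68, 75), 2))                   # 0.44 (Experiment 4)
print(round(partial_omega_squared(5.92, 2, 105), 2))  # 0.08 (Experiment 1)
print(round(partial_omega_squared(4.12, 2, 117), 2))  # 0.05 (Experiment 2, Car)
```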

  • Mere ownership effect. The mere ownership effect refers to an individual’s tendency to evaluate an object more favourably merely because he or she owns it.

    • Status: replicated
    • Original paper: ‘On the social nature of nonsocial perception: The mere ownership effect’, Beggan 1992; 3 experimental studies with study 3 not relevant here, study 1: n=43, study 2: n=59. [citations=1132(GS, June 2022)]​.
    • Critiques: Białek et al. 2022 [n=3024, citations=3(GS,June 2022)], pre-registered meta-analysis/preprint.
    • Original effect size: Study 1: r = 0.33; Study 2: r = 0.35.
    • Replication effect size: Białek et al.: g = 0.55 [0.43, 0.66].

  • Omission Bias (alternative terms: action-inaction effect). The tendency to view harmful actions as worse than inactions, despite the result being the same.

    • Status: mixed (but mostly replicated).
    • Original paper: ‘Omission and commission in judgment and choice’, Spranca et al. 1991; within-subjects design, Experiment 1: n = 38, Experiment 4: n = 48 [citations = 1,178 (GS, February 2023)].
    • Critiques: Jamison et al. 2020 [n = 313; citations = 19 (GS, February 2023)]. Yeung et al. 2022 [meta-analysis, n = 1,999 participants, k = 21 studies, citations = 8 (GS, February 2023)].
    • Original effect size: (effect sizes generated using a formula in Rosenthal & DiMatteo (2001, p. 71) using the conversion from r to Cohen’s d) Experiment 1: Scenario 1: d = 0.63 (estimated from frequencies reported in report: 37 vs. 20, χ2(1, N = 57) = 5.07, p = .024), Scenario 2: d = 0.80 (estimated from frequencies reported in report: 39 vs. 18, χ2(1, N = 57) = 7.74, p = .005) (*For both scenarios the effect size was estimated from the reported frequencies, generating a chi-square value, converting this to Pearson’s r and then to Cohen’s d for comparison with other experiments); Experiment 4: d = 0.94 [0.33, 1.55] (estimated from reported ANOVA: F(1,46) = 10.2, p = .003).
    • Replication effect size: Jamison et al.: Scenario 1: d= 0.45 (replicated), Scenario 2: d = 0.47 (replicated). Yeung et al.: The overall effect size was g = 0.45 [0.14, 0.77]; Measure Used: Morality (k= 14): g = 0.45 [0.14, 0.77] (replicated); Blame (k = 7): g= 0.32 [0.01, 0.64] (replicated); Decision (k = 4): g= 0.30 [-0.62, 1.21] (not replicated).
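
The original effect sizes for Experiment 1 are described as going from frequencies to a chi-square value, then to Pearson's r, then to Cohen's d. A minimal sketch of that chain follows. It assumes a one-degree-of-freedom goodness-of-fit test against an even split, r = √(χ2/N), and d = 2r/√(1 − r²). These are standard conversions, but the exact formulas used by the contributors are an assumption, and the function names are illustrative.

```python
import math


def chi2_two_cells(count_a: int, count_b: int) -> float:
    """Chi-square goodness-of-fit (df = 1) against an even split of two cells."""
    expected = (count_a + count_b) / 2.0
    return ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected


def r_from_chi2(chi2: float, n: int) -> float:
    """Pearson's r (phi) from a df = 1 chi-square and total N."""
    return math.sqrt(chi2 / n)


def d_from_r(r: float) -> float:
    """Cohen's d from Pearson's r."""
    return 2.0 * r / math.sqrt(1.0 - r ** 2)


for label, a, b in [("Scenario 1", 37, 20), ("Scenario 2", 39, 18)]:
    chi2 = chi2_two_cells(a, b)
    r = r_from_chi2(chi2, a + b)
    # Second d value rounds r to two decimals before converting.
    print(label, round(chi2, 2), round(d_from_r(r), 2), round(d_from_r(round(r, 2)), 2))
```

Without intermediate rounding the resulting d values fall about 0.01 below the listed ones; rounding r to two decimals before converting gives the listed 0.63 and 0.80.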

  • Identifiable victim effect. Refers to the phenomenon that people are more likely to offer greater help to specific, identifiable victims than to anonymous victims.

    • Status: not replicated
    • Original paper: ‘Sympathy and callousness: The impact of deliberative thought on donations to identifiable and statistical victims’, Small et al. 2007; field experiments, N = 280 split into Study 1: n=121 and Study 3: n=159. [citations=1126 (GS, June 2022)].
    • Critiques: Lee and Feeley 2016 [meta-analysis, k=41, citations=131 (GS, April 2023)]​.
    • Original effect size: ηp2 = 0.06 (Study 1) to ηp2 = 0.07 (Study 3).
    • Replication effect size: Lee and Feeley: ηp2 = 0.00 (Study 1), ηp2 = 0.01 (Study 3); the effect is much weaker and less robust than previously thought, with many mixed findings, failed replications, null findings and numerous boundary conditions; the overall effect is significant yet modest, r = .05; after adjusting for publication bias with robust Bayesian meta-analysis, there is evidence against an effect in this meta-analysis.

  • Psychophysical numbing. People prefer to save lives if they are a higher proportion of the total (e.g. do people prefer to save 4,500 lives out of 11,000 or 4,500 lives out of 250,000?).

    • Status: mixed
    • Original paper: ‘Insensitivity to the value of human life: A study of psychophysical numbing’, Fetherstonhaugh et al. 1997; 3 studies with within-subjects design, 2 of which are split into Part A and Part B with n’s = 1: 54; 196 ; 2: 162; Experiment 3: n=165. [citations = 468 (GS, December 2021)].
    • Critique: Ziano et al. 2021 [n=4799, citations = 0 (GS, December 2021)].
    • Original effect size: Study 1: ηp2 = 0.14.
    • Replication effect size: Study 1a: ηp2 = 0.06, Study 1b: ηp2 = 0.21; Study 1c: ηp2 = 0.13.

  • Signing at the top and dishonesty. Signing a veracity statement at the top, instead of at the end, of a form/document encourages honest reporting.

    • Status: not replicated
    • Original paper: ‘Signing at the beginning makes ethics salient and decreases dishonest self-reports in comparison to signing at the end’, Shu et al. 2012 (Retracted 09/2021); 3 experiments with Study 1: n = 101, Study 2: n = 60, Study 3: n = 13,488. [citations=482 (GS, June 2022)].
    • Critiques: Kristal et al. 2020 [five conceptual replications, n = 4,559, and one highly powered, preregistered, direct replication, n = 1,235, citations=62 (GS, April 2023)]. Data Colada post about evidence of fraud in the data for Study 3 of the original paper.
    • Original effect size: d = -1.05 (Study 1); d = -.53 (Study 2); d = -.20 (Study 3).
    • Replication effect size: Kristal et al.: no effect of signing first on honest reporting; d = .11 (Study 1); d = -.01 (Study 2); d = .05 (Study 3); d = -.05 (Study 4); d = .01 (Study 5); d = -.04 (Study 6).

  • Loss aversion. The subjective value of losses exceeds the subjective values of gains. This phenomenon can denote a stronger preference for avoiding losses rather than acquiring gains. Loss aversion is still mostly replicable but with weaker effects for some people and in some situations (see Mrkva et al., 2020).

    • Status: mixed
    • Original paper: ‘Prospect theory: An analysis of decision under risk’, Kahneman and Tversky 1979; multiple experiments with n between 66 and 95 [citations=72861(GS, June 2022)].
    • Critiques: Brown et al. 2021 [n=607 estimates from 150 articles, citations=10 (GS, December 2021)]. Meta-analyses: Nieuwenstein et al. 2020 [total n = 399, citations=109(GS, April 2023)]. Walasek et al. 2018 [k=19 studies, citations=11 (GS, December 2021)].
    • Original effect size: λ = 2.25 (reported in Walasek et al. 2018).
    • Replication effect size: Brown et al.: λ = 1.955 [1.824, 2.104]. Walasek et al.: λ = 1.31 [1.10, 1.53]. All reported in Nieuwenstein et al.: Abadie et al.: g=-0.62 to g=0.22. Acker: g =-0.47. Aczel et al.: g =-0.35. Ashby et al.: g =-0.21 to g =1.48. Bos et al.: g =-0.10. Bos et al.: g =1.48. Calvillo & Penaloza: g =-0.28 to g =-0.09. Dijksterhuis: g =0.24 to g =0.46. Dijksterhuis et al.: g =0.70 to g =0.86. González Vallejo et al.: g =0.00. Hasford: g =0.43. Hess et al.: g =-0.14. Huizenga et al.: g =-0.26 to g =-0.50. Lassiter et al.: g =0.27 to g =0.51. Lerouge: g =0.38 to g =0.47. McMahon et al.: g =0.62 to g =0.67. Messner et al.: g =0.63. Newell et al.: g =-0.50 to g =0.17. Newell and Rakow: g =-0.37 to g =0.31. Nieuwenstein and Van Rijn: g =-0.74 to g =0.87. Nieuwenstein et al.: g =-0.01. Nordgren et al.: g =0.27 to g =0.36. Payne et al.: g =-0.10. Queen&Hess: g =-0.21. Rey et al.: g =0.27. Smith et al.: g =0.25 to g =0.32. Strick et al.: g =0.58 to g =1.21. Thorsteinson & Withrow: g =0.18 to g =0.34. Usher et al.: g =0.78 to g =1.04. Waroquier et al.: g =-0.56 to g =0.35.

  • Effort heuristic. People judge products that took longer to complete as higher in quality and monetary value.

    • Status: mixed
    • Original paper: ‘The effort heuristic’, Kruger et al. 2004; Study 1: Between-subject design, n = 144, Study 2: Mixed design, n = 66. [Citations = 404 (GS, October 2022)].
    • Critiques: Ziano et al. 2022 [total N = 1405, citations=0 (GS, April 2023)].
    • Original effect sizes: Study 1: d = 0.34 [0.00, 0.68] (liking/quality), d = 0.33 [-0.02, 0.67] (monetary value); Study 2: ηp2 = 0.09 [0.01, 0.21] (liking/quality), ηp2 = 0.15 [0.03, 0.28] (calculated by Ziano et al. 2022).
    • Replication effect sizes: Ziano et al.: Study 1 MTurk: d = -0.05 [-0.21, 0.11] (liking/quality), d = 0.02 [-0.14, 0.18] (monetary value); Study 1 Prolific: d = 0.23 [0.08, 0.38] (liking/quality), d = 0.08 [-0.07, 0.22] (monetary value); Study 2: ηp2= 0.02 [0.00, 0.04] (liking/quality), ηp2= 0.04 [0.02, 0.07] (monetary value).

  • Unconscious thought advantage (“deliberation-without-attention”). The idea that for complex choices (with more features to take into account), not deliberating leads to better decisions (as defined by the research team, i.e., normatively).​

    • Status: not replicated
    • Original paper: ‘On making the right choice’, Dijksterhuis, 2006; two experiments and two quasi-experiments, n = 80, 59, 93, 115. [citations = 605, WoK (October 2021)].
    • Critiques: Nieuwenstein and van Rijn 2012 [n = 48, 24, 32, 24, citations = 12 (WoK, October 2021)]. Nieuwenstein et al. 2015 [meta-analysis, k=61 studies, n = 40-399, replication study, n = 423, citations = 49 (WoK, October 2021)]. See also González-Vallejo et al. 2008 for a theoretical critique [n=NA, citations = 51 (WoK, October 2021)].
    • Original effect size: Experiment 1: g= .86; Experiment 2: g= .70 (as per Nieuwenstein et al. 2015).
    • Replication effect size: Nieuwenstein & van Rijn: g= 0.10, g= -0.55, g= 0.87, g= -0.74. Nieuwenstein et al.: g= -0.01, after trim-and-fill, meta-analysis pooled Hedges’ g = 0.018 [−0.10, 0.14]. ​

  • Self-interest is overestimated. People overestimate how much personal benefits affect policy preferences and behaviours.

    • Status: replicated
    • Original paper: ‘The Disparity Between the Actual and Assumed Power of Self-Interest’, Miller and Ratner 1998; ns around 50 for 2- and 4-cell experiments across multiple studies (very underpowered). [citations = 552(GS, April 2023)].
    • Critiques: Studies 1 and 4 were run and successfully replicated in Brick et al. 2021 [two samples, UK and US, n = 800 each, citations = 4(GS, April 2023)].
    • Original effect size: Effect sizes cannot be calculated as no variance was provided, but the effects looked large.
    • Replication effect size: Brick et al.: Overestimation of the importance of payment for blood donation in Study 1, d = 0.59 [0.51, 0.66], 0.57 [0.49, 0.64]; and of smoking status for smoking policy preferences in Study 4, d = 0.75 [0.59, 0.90], 0.84 [0.73, 0.96].

  • Marshmallow experiment (self-imposed delay of gratification). A child’s success in delaying the gratification of eating marshmallows or a similar treat is related to better outcomes in later life. ​Outcomes that have been studied include coping, social, and academic competence, substance use, borderline personality features, BMI, executive functioning, and neural activation patterns.

  • Differential reinforcement of alternate behaviour (DRA). DRA procedures reduce a certain behaviour by reinforcing an appropriate alternative behaviour that serves the same function.

    • Status: replicated
    • Original paper: ‘The alteration of behavior in a special classroom situation’, Zimmerman and Zimmerman 1962; case studies, N = 2. [citations= 372 (GS, March 2023)].
    • Critiques: Allen and Harris 1965 [N = 2, citations= 307 (GS, March 2023)]. Fisher et al. 1993 [N = 4, citations= 375 (GS, March 2023)]. Hagopian et al.1998 [N = 21, citations= 475 (GS, March 2023)]. Hall et al. 1968 [N = 6, citations= 993 (GS, March 2023)].
    • Original effect size: NA.
    • Replication effect size: Allen and Harris = NA. Hall et al.= NA. Fisher et al.= NA. Hagopian et al.= NA.

  • Differential reinforcement of incompatible behaviour (DRI). DRI reinforces a physically incompatible behaviour to replace the unwanted behaviour.

  • Differential reinforcement of low rates of behaviour (DRL). DRL is a technique in which a positive reinforcer is delivered at the end of a specific interval if a target behaviour has occurred at a criterion rate.

  • Extinction bursts. Extinction is an intervention procedure to reduce tantrum behaviours by removing reinforcement (e.g., ignoring a child's crying), and an extinction burst is a temporary increase in the frequency or intensity of that behaviour.

    • Status: mixed
    • Original paper: ‘The elimination of tantrum behavior by extinction procedures’, Williams 1959; single-case experimental design, specifically a multiple-baseline design across behaviours (case report), n=1. [citation=519(GS, March 2023)]​.
    • Critiques: Arin et al. 1966 [n=16, citation =778(GS, March 2023)]. Katz and Lattal 2020 [n1=9, n2=20, n3=20, citation=13(GS, March 2023)]. Lerman and Iwata 1996 [meta-analysis of 113 sets of extinction data, citation=266(GS, March 2023)]. Lerman et al. 1999 [case report: n=1, citation=49(GS, March 2023)].
    • Original effect size: NA, case report.
    • Replication effect size: Arin et al.: Pigeons exhibited aggression towards nearby pigeons or models after being conditioned to peck a response key. This aggression was caused by the transition from food reinforcement to extinction. Various factors influenced the duration and frequency of attack. Katz and Lattal: Response increases relative to baseline during the first 20 min of a 324.75-min extinction session (Experiment 1) or during the first 30-min extinction session (Experiments 2 and 3) were rare and unsystematic. The results reinforce earlier meta-analyses concluding that extinction bursts may be a less ubiquitous early effect of extinction than has been suggested. Lerman and Iwata: Reported an initial increase in the frequency of the target response in 24% of the cases when extinction was implemented. Lerman et al.: Pattern of behaviour is consistent with what has been observed in studies of extinction bursts, where an initial increase in the targeted behaviour is often observed following the introduction of an extinction procedure.

  • Above-Average Effect (Better-Than-Average Effect). People have the tendency to perceive themselves as superior in comparison to the average peer.

    • Status: replicated
    • Original paper: ‘Global self-evaluation as determined by the desirability and controllability of trait adjectives’, Alicke 1985; within-subject design, n=164. [citations = 1589(GS, January 2023)].
    • Critiques: Meta-analysis: Zell et al. 2020 [n=124 published articles, 291 independent samples, and more than 950,000 participants, citations = 84 (GS, February 2022)]. Replication and extension by Ziano et al. 2021 [n = 1,573, citations = 14(GS, January 2023)]. Korbmacher et al. 2022 [n=756, citations = 0 (GS, February 2022)].
    • Original effect size: For the trait desirability effect, ηp2 = .78 [.73, .81]; for the effect of desirability being stronger for more controllable traits, ηp2 = .21 [.12, .28].
    • Replication effect size: Zell et al.: dz = 0.78 [0.71, 0.84]. Ziano et al.: For the trait desirability effect, sr2 = .54 [.43, .65]; for the effect of desirability being stronger for more controllable traits, sr2 = .07 [.02, .12]. Korbmacher et al.: Own ability & comparative ability r= .99, Domain difficulty and comparative ability r= -.85; Easy domains: from d = 0.54 to d = 1.18, Difficult domains: from d =0.11 (non-sig) to d = -0.65.

  • Below-Average Effect. The tendency of a person to underestimate their intellectual or social abilities when comparing to other people.

    • Status: mixed
    • Original paper: ‘Lake Wobegon be gone! The “below-average effect” and the egocentric nature of comparative ability judgements’, Kruger 1999; in studies 1 and 2 participants compared themselves with peers on ability domains, in study 3 cognitive load was added as a condition, study 1: N=37, study 2: N=104, study 3: N=49. [citations=1258 (GS, May 2023)].
    • Critiques: Eriksson and Funke 2015 [study 1: n=800, study 2: n=193, citations=25 (GS, May 2023)]. Windschitl et al. 2002 [study 1: N=40, study 2: N=40, study 3: N=40, study 4: N=87, study 5: N=82, study 6: N=90, experiment 7: N=206, citations= 36 (GS, May 2023)]. Korbmacher et al. 2022 [n=756, citations = 0 (GS, February 2022)].
    • Original effect size: Study 1: participants thought they were above average (i.e., above the 50th percentile) in the easy ability domains, but below average in difficult domain (all p’s < .01); Easy domain percentile estimates - Using mouse = 58.8, Driving = 65.4, Riding bicycle = 64, Saving money = 61.5; Difficult domain - Telling jokes = 46.4 (n.s.), Playing chess = 27.8, Juggling = 26.5, Computer programming = 24.8; Study 2: beta coefficient= .90; Study 3: participants thought that they were above average in terms of the easy abilities (M percentile = 78.4), and below average in terms of the difficult abilities (M percentile = 23.1).
    • Replication effect size: Eriksson and Funke: Study 1: ES not reported, but comparison of above-ingroup measures with zero levels show that Democrats exhibited a statistically significant below-average effect on warmth and a null effect on competence; Republicans, on the other hand, exhibited a significant above-average effect on warmth and a null effect on competence; Study 2: ES not reported, but Democrats exhibited a statistically significant below-average effect on warmth (but no significant effect on competence); Republicans, on the other hand, exhibited significant below-average effect on competence (but no significant effect on warmth). Windschitl et al.: not reported.​ Korbmacher et al.: Own ability & comparative ability r = .99, Domain difficulty and comparative ability r= -.85; Easy domains: from d = 0.54 to d = 1.18, Difficult domains: from d =0.11 (non-sig) to d = -0.65.

  • Overconfidence (“unskilled and unaware of it” effect, overplacement, overprecision, calibration of subjective probabilities, realism of confidence). The overestimation of one’s actual ability, performance, level of control, or chance of success in any given situation.

    • Status: mixed
    • Original paper: ‘Do Those Who Know More Also Know More about How Much They Know?’, Lichtenstein & Fischhoff 1977; five experiments with various designs, Experiment 1a: n= 92, Experiment 1b: n= 63, Experiment 2: n= 57, Experiment 3: n= 120, Experiment 4: n= 50, Experiment 5: n= 93. [citations=1548 (GS, March 2023)].
    • Critiques: Gigerenzer et al. 1991 [n=2081, citations=2 (GS, March 2023)]. Dawes & Mulford 1996 [n=145, citations=383 (GS, March 2023)]. Klayman et al. 1999 [Experiment 1: n=32, Experiment 2: n= 54, Experiment 3: n=32, citations=942 (GS, March 2023)]. Olsson 2014 [n=NA(review study), citations=74(GS, March 2023)].
    • Original effect size: proportions of over-/underconfidence: Experiment 1a – +.15; Experiment 1b – +.18, Experiment 2 – training (+.07), no training (+.14); Experiment 3 – best subjects (+.05), middle subjects (+.07), worst subjects (+.15), best subjects-easy items (-.05), best subjects-hard items (+.14), middle subjects-easy items (-.05), middle subjects-hard items (+.19), worst subjects-easy items (+.03), worst subjects-hard items (+.25); Experiment 4 – best subjects-easy items (-.06), best subjects-hard items (+.05), worst subjects-easy items (-.03), worst subjects-hard items (+.17); Experiment 5 – Easy test (-.02), Hard test (+.12).
    • Replication effect size: Gigerenzer et al.: Experiment 1 – percentage correct was 52.9, mean confidence was 66.7, and overconfidence was 13.8; Experiment 2 – In the selected set, mean confidence was 71.6%, and percentage correct was 56.2, overconfidence 15.4; in the representative set, overconfidence largely disappeared (2.8%), mean confidence was 78.1% and percentage correct was 75.3%. Dawes and Mulford: ES not reported; regression effects account for the evidence cited in support of overconfidence. Klayman et al.: Experiment 1 – overconfidence between -.073 and +.130 on forty questions, mean overconfidence +.046; Experiment 2 – July temperatures +.150, State poverty levels -.014, Mountain heights -.115, State populations +.023, Shampoo prices -.025, Presidential sequence +.008; Experiment 3 – With confidence-range questions, overconfidence is large, on the order of 45%; differences between domains and between individuals are strong as well. Olsson: ES not reported; methodological and statistical artefacts can explain many of the observed instances of apparent overconfidence.
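
The overconfidence scores listed for Gigerenzer et al. are consistent with the usual calibration measure: mean confidence minus percentage correct, where positive values indicate overconfidence and negative values underconfidence. A minimal sketch using the figures from this entry (the function name and the dictionary layout are illustrative):

```python
def overconfidence(mean_confidence: float, percent_correct: float) -> float:
    """Overconfidence score: positive = overconfident, negative = underconfident."""
    return mean_confidence - percent_correct


# (mean confidence %, percentage correct) as listed for Gigerenzer et al.
gigerenzer = {
    "Experiment 1": (66.7, 52.9),
    "Experiment 2, selected set": (71.6, 56.2),
    "Experiment 2, representative set": (78.1, 75.3),
}

for label, (confidence, correct) in gigerenzer.items():
    print(label, round(overconfidence(confidence, correct), 1))
```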

  • Better-than-average effect. People tend to rate themselves as better than average on desirable traits and skills.​

    • Status: replicated
    • Original paper: ‘Are we all less risky and more skillful than our fellow drivers?’, Svenson 1981; participants in a US (n = 81) and a Swedish (n = 80) sample rated their driving safety or driving skill compared to others. [citations = 2699 (GS, June 2022)]​.
    • Critiques: Koppel et al. 2021 [n = 1,203, citations = 0 (GS, June 2022)]. Meta-analysis: Zell et al. 2020 [n = 965,307, citations = 100 (GS, June 2022)]​.
    • Original effect size: Hedges’s g = 0.41 to Hedges’s g = 1.25 (calculated from statistics reported in original paper).
    • Replication effect size: Koppel et al.: Hedges’s g = 1.18 to Hedges’s g = 1.70. Zell et al.: robust across studies (dz = 0.78 [0.71, 0.84]), with little evidence of publication bias.

  • Accuracy of information (truth discernment). Asking people to think about the accuracy of a single headline improves “truth discernment” in their intentions to share news headlines about COVID-19.

    • Status: mixed
    • Original paper: ‘Fighting COVID-19 Misinformation on Social Media: Experimental Evidence for a Scalable Accuracy-Nudge Intervention’, Pennycook et al. 2021; 2 survey studies with Study 1: n = 853, Study 2: n = 856 [citations=887(GS, March 2022)]​.
    • Critiques: Roozenbeek et al. 2021 [n=1583, citations=22(GS, March 2022)].
    • Original effect size: Study 1: d = 0.657 [0.477, 0.836] on accuracy judgement; d = 0.121 [0.030, 0.212] on sharing intention; Study 2: control condition: d = 0.050 [−0.033, 0.133]; treatment condition: d = 0.142 [0.049, 0.235].
    • Replication effect size: Roozenbeek et al.: Study 1: F = 1.53; Study 2: treatment: d = −0.14 [−0.17, −0.12], control: d = −0.10 [−0.13, −0.078].

  • Resultant moral luck. The phenomenon of moral judgments being influenced by factors beyond the agent’s control that affect the outcome of their actions. Kneer and Machery (2019) claim that there is no evidence for resultant moral luck and that the puzzle of moral luck is not a genuine problem.​

    • Status: NA
    • Original paper: ‘No luck for moral luck’, Kneer and Machery 2019; two experiments conducted, between-subjects (1a) and within-subjects (1b), 1a: n=196 and 1b: n=95. [citations=76(GS, January 2023)].
    • Critiques: Laves 2020 [theoretical/review paper, n=NA, citations=12(GS, January 2023)].
    • Original effect size: Between-subjects Design (1a): Wrongness: d=0.44 [0.16, 0.72], Blame: d=0.39 [0.17, 0.58], Permissibility: d=0.26 [-0.02, 0.55], Punishment: d=0.79 [0.50, 1.08]; Within-subjects Design (1b): Wrongness: d=0.16 [0.004, 0.27], Blame: d=0.24 [0.09, 0.38], Permissibility: d=0.06 [-0.02, 0.14], Punishment: d=0.47 [0.30, 0.64]; Within paper replications.
    • Replication effect size: Laves: NA; Laves argues that Kneer and Machery’s experiments do not dissolve the puzzle of moral luck, but rather show that people have inconsistent intuitions about moral luck depending on the context and framing of the scenarios; he also questions the validity and reliability of the measures used by Kneer and Machery, and suggests that their results are influenced by confounding factors such as moral emotions, causal responsibility, and moral principles. No replication studies conducted as of January 2023. ​

  • Incidental disgust (Amplification hypothesis). Irrelevant feelings of disgust can amplify the severity of moral condemnation.

    • Status: mixed
    • Original paper: ‘Hypnotic disgust makes moral judgments more severe’, Wheatley and Haidt 2005; two between-subject experiments, Study 1: n=64, Study 2: n=94. [citations=1437(GS, February 2023)].
    • Critiques: Chapman et al. 2009 [n=87 across three experiments, citations=919(GS, February 2023)]. Eskine et al. 2011 [n=57 (with 3 dropped for guessing the hypothesis), citations=600(GS, February 2023)]. Ghelfi et al. 2020 [n=1137 (across 11 studies), citations=32(GS, February 2023)]. Meta-analysis Landy and Goodwin 2015 [n=5,102 across 51 studies, citations=391(GS, February 2023)].
    • Original effect size: Study 1 – participants rated vignettes as being more morally wrong when the hypnotic disgust word was present than when the word was absent, r = .34 [d = 0.53, reported in Landy and Goodwin 2015]; Study 2 – r = .36 [d = 0.25, reported in Landy and Goodwin 2015].
    • Replication effect size: Eskine et al.: Regression slope coefficient of physical disgust measure on moral judgement = 0.525, t(52) = 4.445, p < .001. Ghelfi et al.: Linear mixed effects regression slope coefficient of standardised disgust ratings on standardised moral-wrongness judgments = 0.07, p = .014. Landy and Goodwin: effects of disgust induction in different sensory modalities on moral judgements: d = -0.38 to d = 1.44; weighted mean of the effect sizes across 51 studies d = 0.11. Chapman et al.: Increasing disgust with increasing offer unfairness [ηp2 = 0.324, calculated from the reported F(1,135) = 64.8, p < 0.001].
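
The Chapman et al. value is described as calculated from the reported F statistic. A minimal sketch of the standard conversion, ηp2 = (F × df_effect) / (F × df_effect + df_error), which appears to reproduce the listed figure; that this is the conversion the contributors used is an assumption, and the function name is illustrative.

```python
def eta_p2_from_f(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic and its degrees of freedom."""
    effect = f_value * df_effect
    return effect / (effect + df_error)


# Chapman et al.: F(1, 135) = 64.8, listed as ηp2 = 0.324
print(round(eta_p2_from_f(64.8, 1, 135), 3))
```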

  • Single-exposure musical conditioning. An important study, which employed classical conditioning theory, proposed that a person’s preference for a product can be influenced by the type of music they hear while being exposed to it. A follow-up experiment differentiated between scenarios to see whether classical conditioning or information processing might be a better explanation for product preference.

    • Status: mixed
    • Original paper: ‘The Effects of Music in Advertising on Choice Behavior: A Classical Conditioning Approach’, Gorn 1982; Experiment 1: n = 195, Experiment 2: n = 122. [citations = 1592 (GS, January 2023)].
    • Critiques: Vermeulen and Beukeboom 2015 [Experiment 1: n = 182, Experiment 2: n = 224, Experiment 3: n = 127, citations = 42 (GS, January 2023)]. Reported here without considering participants that were excluded due to deviant musical taste.
    • Original Effect Size: Considering participants that were excluded due to deviant musical taste, the log OR was 2.67 from Gorn’s original analysis.
    • Replication Effect Size: Vermeulen and Beukeboom: The results reported for both experiments exclude participants with deviant musical taste. For Experiment 1, a chi-square test showed a significant effect of music on choice (χ2(1, N = 132) = 4.93, p = .026, 𝜙 = .19, log OR = .79 [.09, 1.48]). In line with the effect for the full sample, this effect is reliably smaller than the 𝜙 = .49, log OR = 2.67 from Gorn’s original paper. For Experiment 2, a significant effect of music on choice was found (χ2(1) = 4.57, p = .033, 𝜙 = .17, log OR = .70 [.06, 1.35]). However, the obtained effect size was also reliably smaller than the one reported by Gorn (log OR = 2.67).

  • Rational Expectations. The extent to which participants in an experiment choose the action with the highest expected payoff based on their private signal and the choices and outcomes of previous participants.

    • Status: mixed
    • Original paper: ‘Do We Follow Others When We Should? A Simple Test of Rational Expectations’, Weizsäcker 2010; a laboratory experiment with 24 sessions and 12 participants per session, n=288. [citations=153(GS, January 2023)].
    • Critiques: Ziegelmeyer et al. 2013 [n = 30,683 decisions made by 2,948 participants in 13 information cascade experiments, citations=18(GS, March 2023)].
    • Original effect size: When the expected payoff from contradicting one’s signal is higher than 1/2 (where it is empirically optimal for the participants to contradict their own private information), the respondents choose this action (among two alternatives) in less than half of the cases; the frequency of the optimal choice is 0.44; when the expected payoff is lower than 1/2, the participants follow the signal nine out of ten times; they make the optimal choice with a frequency of 0.91.
    • Replication effect size: Ziegelmeyer et al.: Where the empirical payoff for contradicting one’s own signal is greater than 1/2, the relative frequency of optimal choice is 0.60 (partly replicated, participants are moderately successful in learning from others); where the empirical payoff for contradicting one’s own signal is lower than 1/2, the optimal choice occurs with a relative frequency of 0.92 (replicated).

  • Decreased sense of free will reduces personal responsibility. Vohs and Schooler (2008) asked participants to read an article either debunking free will or a control passage, and found that those reading the former cheated more on an experimental task. It was suggested that the decreased sense of free will as a result of reading the text reduced perceptions of personal responsibility.

    • Status: not replicated.
    • Original paper: ‘The value of believing in free will: Encouraging a belief in determinism increases cheating’, Vohs and Schooler 2008; experimental design, n1=30, n2=122. [Citations=1044(GS, January 2023)].
    • Critiques: Buttrick et al. 2020 [n = 621, citations= 11 (GS, January 2023)]. Embley et al. 2015, part of the Open Science Collaboration [n = 58, citations=5(GS, January 2023)].
    • Original effect size: d=.88.
    • Replication effect size: Buttrick et al.: no differences in cheating behaviour using a more rigorous measurement approach, d = 0.076 [−0.082, 0.22]. Embley et al.: no differences in cheating behaviour between the two experimental conditions, d = 0.20 [−0.33, 0.74], p = .44.

  • Unrealistic optimism (Optimism bias). The tendency to overestimate the likelihood of experiencing positive outcomes and underestimating negative ones.​

    • Status: mixed (largely replicated, but contextual factors influence size).
    • Original paper: ‘Unrealistic optimism about future life events’, Weinstein 1980; within-subjects design, two experiments conducted but Experiment 1 is most relevant, n = 258. [citations = 7,521 (GS, January 2023)].
    • Critiques: Klein and Helweg-Larsen 2002 [meta-analysis, n = 5,142 participants, k = 27 studies, citations = 546 (GS, January 2023)]. Maksim et al. 2022 [preprint, Experiment 1: n = 105, Experiment 2: n = 71, citations = 0 (GS, January 2023)]. Shepperd et al. 1996 [Experiment 1: n = 83 (mixed-design; 31 sophomores, 22 juniors and 29 seniors), Experiment 2: n = 144 (mixed design, but only looking at within-subject’s comparisons), citations = 491 (GS, January 2023)].
    • Original effect size: Overestimating positive outcomes: d = 0.43 [0.30, 0.55] (estimated from test-statistic: t(255) = 6.8, p < .001); Underestimating negative outcomes: d = 0.87 [0.73, 1.01] (estimated from test-statistic: t(255) = 13.9, p < .001).
    • Replication effect size: Klein and Helweg-Larsen: Overall effect size: d = 0.64 [0.60, 0.68] (replicated); reported moderators: the effect was larger in student samples (d = 0.94 [0.87, 1.02]) than in non-student samples (d = 0.50 [0.45, 0.55]), and larger in the US (d = 1.23 [1.16, 1.13]) than elsewhere (d = 0.36 [0.31, 0.41]). Maksim et al.: Experiment 1: d = 0.34 (replicated), Experiment 2: d = 0.37 (replicated). Shepperd et al.: Experiment 1: Students asked to predict their likely starting salary post-graduation; Beginning of semester: Sophomores: d = 1.26 [0.78, 1.73] (estimated from test-statistic: t(30) = 6.91, p < .001) (replicated), Juniors: d = 1.20 [0.63, 1.75] (estimated from test-statistic: t(21) = 5.50, p < .001) (replicated), Seniors: d = 0.53 [0.13, 0.93] (estimated from test-statistic: t(28) = 2.83, p = .004) (replicated); Two weeks prior to graduation: Sophomores: d = 1.02 [0.57, 1.45] (estimated from test-statistic: t(30) = 5.57, p < .001) (replicated), Juniors: d = 0.86 [0.35, 1.35] (estimated from test-statistic: t(21) = 3.95, p < .001) (replicated), Seniors: d = 0.28 [-0.10, 0.65] (estimated from test-statistic: t(28) = 1.48, p = .075) (not replicated); Experiment 2: Students asked to predict their exam score; Students overestimated their test score 1 month prior to the examination: d = 0.63 (estimated from test-statistic: t(88) = 5.91, p < .001) (replicated), Students underestimated their test score 3 seconds before scores were released: d = 0.16 [-0.01, 0.32] (estimated from test-statistic: t(145) = 1.87, p = .016, one-tailed) (opposite direction).
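
The d values in this entry that are marked "estimated from test-statistic" appear consistent with the conversion d = t/√df for within-subjects or one-sample t tests. The sketch below checks a few of them; that this is the formula the contributors used is an assumption inferred from the listed numbers, and the function name is illustrative.

```python
import math


def d_from_t(t_value: float, df: int) -> float:
    """Standardised mean difference from a one-sample or paired t test."""
    return t_value / math.sqrt(df)


# (label, t, df, d as listed in the entry)
examples = [
    ("Weinstein, positive outcomes", 6.8, 255, 0.43),
    ("Weinstein, negative outcomes", 13.9, 255, 0.87),
    ("Shepperd et al., sophomores, start of semester", 6.91, 30, 1.26),
    ("Shepperd et al., seniors, before graduation", 1.48, 28, 0.28),
]

for label, t_value, df, listed in examples:
    print(f"{label}: computed d = {d_from_t(t_value, df):.2f}, listed d = {listed:.2f}")
```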

  • Miles per gallon illusion (MPG illusion, kilometres per litre illusion). People misperceive how much fuel and money will be saved by improving fuel efficiency, because they assume that fuel savings scale linearly with MPG, whereas in reality, increasing by a few MPG saves much more gas at low levels of MPG (e.g., 12 to 14 MPG) compared to high levels (e.g., 30 to 32 MPG).

    • Status: replicated
    • Original paper: ‘The MPG Illusion’, Larrick and Soll 2008; experimental design, n=171 (for the experimental manipulation of GPM frame vs. MPG frame and choice between Program A and Program B). [citations=465(GS, January 2023)].
    • Critique: Murata 2016 [n=66, citations = 0(GS, January 2023)].
    • Original effect size: OR = 5.27.
    • Replication effect size: Murata: OR = 2.09 (for the experimental manipulation scenario–choice between Program A and Program B); the other studies replicated even better than this one, but are more difficult to convert to effect sizes.
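
The non-linearity behind the illusion can be checked with simple arithmetic: fuel used equals distance divided by MPG, so the same MPG gain saves less fuel at higher MPG. A minimal sketch, assuming an illustrative distance of 10,000 miles (the distance and the function name `gallons_saved` are assumptions for illustration, not values from Larrick and Soll):

```python
def gallons_saved(mpg_old: float, mpg_new: float, miles: float = 10_000) -> float:
    """Fuel saved over `miles` when efficiency improves from mpg_old to mpg_new."""
    return miles / mpg_old - miles / mpg_new

# The two 2-MPG improvements mentioned in the description above
low = gallons_saved(12, 14)
high = gallons_saved(30, 32)

print(f"12 -> 14 MPG saves {low:.1f} gallons per 10,000 miles")
print(f"30 -> 32 MPG saves {high:.1f} gallons per 10,000 miles")
```

Under these assumptions the improvement at the low end saves several times more fuel than the same 2-MPG improvement at the high end.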

  • Certainty effect. The tendency to overweight the importance of an increase from 99% to 100% probability that some prospect/event will occur.

    • Status: replicated
    • Original paper: ‘Prospect Theory: An Analysis of Decision Under Risk’, Kahneman and Tversky 1979; between-subject manipulation of choice problems, n=66. [citations = 75,571 (GS, January 2023)].
    • Critique: Ruggeri et al. 2020 [n=4,098, citations = 135 (GS, January 2023)].
    • Original effect size: Log OR = -1.36 (for the 1st certainty effect demonstration; item 1 vs. 2 contrast).
    • Replication effect size: Ruggeri et al.: Log OR = -1.72 [-1.97, -1.58] (pooled across many samples from different countries). The other certainty effect contrasts also replicated successfully.
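
Log odds ratios can be hard to read at a glance. As a small sketch, exponentiating a log OR gives the odds ratio itself; the two point estimates below are copied from this entry, and the helper name `odds_ratio_from_log` is hypothetical:

```python
import math

def odds_ratio_from_log(log_or: float) -> float:
    """Convert a log odds ratio to an odds ratio."""
    return math.exp(log_or)

for label, log_or in [("Original", -1.36), ("Ruggeri et al.", -1.72)]:
    print(f"{label}: log OR = {log_or}, OR = {odds_ratio_from_log(log_or):.2f}")
```

A log OR of 0 corresponds to an OR of 1 (no difference in odds), so more negative values indicate a stronger shift in choices between the two problems.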

  • Overweighting small probabilities. People tend to overweight/overreact to changes in probability from 0 to very small probabilities. (In other words, whereas classical economic theory would suggest that changing from a 0% to a 1% chance should have the same impact as a change from a 20% to a 21% chance, people respond much more strongly to the former change.)

    • Status: replicated
    • Original paper: ‘Prospect Theory: An Analysis of Decision Under Risk’, Kahneman and Tversky 1979; between-subject manipulation of choice problems, n=66. [citations = 75,571 (GS, January 2023)].
    • Critique: Ruggeri et al. 2020 [n=4,098, citations = 135 (GS, January 2023)].
    • Original effect size: Log OR= -1.23.
    • Replication effect size: Ruggeri et al.: Log OR= -2.78 [-2.54, -2.62] (3 out of the 4 demonstrations of this phenomenon were successfully replicated in Ruggeri et al., while 1 was not).

  • Positive affect increases patience. Watching a positive affect-inducing video will increase patience in an intertemporal choice task.

  • Slow to anger, fast to forgive. Adding noise/uncertainty to the reason for a person’s action leads people to be more lenient (slow to anger, quick to forgive). A secondary finding is that under these circumstances with noise/uncertainty, cooperative strategies also lead to higher payoffs (in situations with cooperative equilibria).

    • Status: replicated
    • Original paper: ‘Slow to anger, fast to forgive’, Fudenberg et al. 2012; experimental play of the repeated prisoner’s dilemma, n=384 [citations=372 (GS, January 2023)].
    • Critique: Camerer et al. 2016 [n=128, citations=1,206(GS, April 2023)].
    • Original effect size: b = -.627.
    • Replication effect size: Camerer et al.: b = -.605.

  • Isolation effect (Von Restorff effect). The isolation effect occurs when people focus on differences between options rather than similarities. We not only remember the differences between two stimuli, but also tend to give them greater weight. For example, we notice the single yellow apple that stands out in a batch of red apples.

  • Magnitude effect (magnitude perception). People are sensitive to relative as well as absolute magnitude. Most people find the difference between $100 and $200 more meaningful than the difference between $1,100 and $1,200; the marginal value of the outcome generally scales with magnitude.​

  • Reflection effect. People tend to be risk averse when choosing between gains, but risk seeking when choosing between losses. The preference between negative prospects is the mirror image of the preference between positive prospects – the reflection of prospects around 0 reverses the preference order.

    • Status: mixed
    • Original paper: ‘Prospect Theory: An Analysis of Decision Under Risk’, Kahneman and Tversky 1979; between-subject manipulation of choice problems, Problem 3 n = 95, Problem 4 n = 95, Problem 5 n = 72, Problem 6 n = 72, Problem 7 n = 66, Problem 8 n = 66, Problem 9 n = 95, Problem 10 n = 141. [citations=76,664 (GS, March 2023)].
    • Critiques: Ruggeri et al. 2020 [multinational replication study n=4,098, citations=143(GS, March 2023)].​
    • Original effect size: Different choice problems comparisons log OR = -3.772 to log OR = 0.949 (reported in Ruggeri et al. 2020, data available at https://osf.io/esxc4/).
    • Replication effect size: Ruggeri et al.: log OR = -3.332 to log OR = 0.026; four out of five comparisons are replicated.

  • Unusual disease problem (Asian disease problem). When survival is communicated (positive framing), people tend to choose an option with a certain outcome (risk averse decision). In contrast, when mortality is communicated (negative framing), people tend to choose an option with an uncertain outcome (risk-seeking decision).​

    • Status: replicated
    • Original paper: ‘The Framing of Decisions and the Psychology of Choice’, Tversky & Kahneman 1981; correlational, two “Asian disease problem” situations N1=152, N2=155. [citations=25,330(GS, February 2023)].
    • Critiques: Diederich et al. 2018 [N = 43, citations=31(GS, February 2023)]. Meta-analysis: Kühberger 1998 [n≈30,000 respondents over 136 empirical papers, citations=1607(GS, February 2023)]. Otterbring et al. 2021 [Study 1 N = 200, Study 2 N=800, citations=27(GS, February 2023)]. Peterson and Tollefson 2023 [N = 1,209, citations=0(GS, February 2023)].​
    • Original effect size: d=1.16 [reported in Kühberger 1998].
    • Replication effect size: Diederich et al.: Gain versus Loss effect on Risky option preference significant in two regression models, β=-0.206 and β=-0.303, respectively (replicated). Kühberger: mean effect size for the 80 studies using Asian disease problems, d=0.57 [0.53, 0.61] (replicated). Otterbring et al.: Study 1 - statistically significant effect of framing on participants’ choice of program (b = 1.52, Z = 4.80, p < 0.001), such that a larger proportion of participants chose the risky program under conditions of negative (78.0%) compared to positive framing (44.0%); Study 2 - statistically significant effect of framing on participants’ choice of program (b = 1.82, Z = 11.13, p < 0.001), such that a larger proportion of participants chose the risky program under conditions of negative (68.8%) compared to positive framing (26.6%) (replicated). Peterson and Tollefson: d = 0.26 [calculated from the reported chi-square and sample size, χ2 = 17.41, N = 1021] (replicated).
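
The Peterson and Tollefson value is described as calculated from the reported chi-square and sample size, but the formula is not given. One common route, assumed here, converts a 1-df chi-square to r = sqrt(χ2 / N) and then to d = 2r / sqrt(1 - r²). A minimal sketch (the function name `d_from_chi2` is hypothetical):

```python
import math

def d_from_chi2(chi2: float, n: int) -> float:
    """Cohen's d from a 1-df chi-square statistic and total sample size.

    Assumption: r = sqrt(chi2 / n), then d = 2r / sqrt(1 - r^2).
    The entry does not state which conversion was actually used.
    """
    r = math.sqrt(chi2 / n)
    return 2 * r / math.sqrt(1 - r ** 2)

# Values copied from the Peterson and Tollefson note above
print(f"d = {d_from_chi2(17.41, 1021):.2f}")
```

This conversion only applies to chi-square tests with one degree of freedom, such as a 2 × 2 framing-by-choice table.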

  • Last place aversion. A phenomenon where individuals are averse to being in last place and choose gambles with the potential to move them out of last place that they reject when randomly placed in other parts of the distribution.

    • Status: replicated
    • Original paper: ‘“Last-Place Aversion”: Evidence and Redistributive Implications’, Kuziemko et al. 2014; laboratory experiment, N=84. [citations=358 (GS, March 2023)].
    • Critiques: Bull 2020 [Study 1: n=1144, Study 2-4: n=1203, citations=30 (GS, March 2023)].
    • Original effect size: Paper does not provide enough information to convert to effect size. Lottery experiment (Appendix Table 2): Relevant coefficient: “Last or fifth place”, Coefficient value: 0.448, P-value: < 0.01.
    • Replication effect size: Bull: Study 1: Observational analysis of customers queuing at a grocery store, where the author recorded the queue positions, wait times, and switching and abandonment behaviours of 1,144 customers. The results showed that being last in line doubled the probability of switching queues and quadrupled the chances of leaving the line altogether. The last-place indicator has a coefficient of 1.255 with a p-value < 0.05. This suggests that, holding all else constant, customers were 3.5 times more likely to switch queues when they were in the last place compared to having a single person waiting in line behind them. Studies 2-4: All were online experiments in which participants waited in a virtual queue for a chance to win a gift card. The studies show that being in last place 1) reduced wait satisfaction, increased abandonment rates 2) increased perceived value of the service, reduced 3) queue transparency moderated effects.

  • Ikea effect. When compared to objectively similar goods not produced by themselves, consumers place a higher value on goods they have assembled. Consumers show a higher willingness-to-pay when they assemble products themselves.

    • Status: replicated.
    • Original paper: ‘The “IKEA Effect”: When Labor Leads to Love’, Norton et al. 2012; four between-subject experiments, N1a=52, N1b=106, N2=118, N3=39. [citations=1,358 (GS, February 2023)].
    • Critiques: Mochon et al. 2012 [four experiments N1=79, N2=135, N3a=75, N3b=41, citations=262(GS, February 2023)]. Sarstedt et al. 2016 conceptual replication [N=103, citations=26(GS, February 2023)].​
    • Original effect size: Experiment 1a: builders bid significantly more for their boxes (M=$0.78, SD=0.63) than non-builders (M=$0.48, SD=0.40), d= 0.59 (calculated from reported t statistic, t(50)=2.12, p<.05); Experiment 1b: builders' valuation of their origami (M=$0.23, SD= 0.25) was nearly five times higher than what nonbuilders were willing to pay for these creations (M=$0.05, SD= 0.07), ηp2 = 0.096 / d= 0.32 (calculated from reported F statistic, F(2, 100)=5.34, p<.01 and converted to d using this conversion); Experiment 2 – bids overall were higher in the build condition than in the unbuild and prebuilt conditions, ηp2=0.126 / d= 0.38 (calculated from F(2, 106)=7.68, p<.01 and converted to d using this conversion); Experiment 3 – builders bid significantly more for their boxes (M= $1.46, SD= 1.46) than incomplete builders (M= $0.59, SD=0.70), d =0.75 (calculated from reported t statistic, t(37)=2.35, p<.05).
    • Replication effect size: Mochon et al.: Study 1 – builders were willing to pay significantly more for their cars (M=$1.20, SD=1.35) than non-builders (M=$0.57, SD=.76), d= 0.56 (calculated from reported t statistic, t(73)=2.44, p<.05) [replicated]; Study 2 – builders (M = $0.72, SD = .45) were willing to pay significantly more than non-builders (M=$0.46, SD=.50) in no-affirmation condition, d =0.54 (calculated from reported t statistic, t(52)=1.99, p=.05) [replicated]. Sarstedt et al.: Participants in the experimental group (assembly group) offered significantly more money for the loom bands than the control participants, mean difference = 1.36, p < 0.01, d =1.68 (calculated from the M, SD and n data given in Table 5 in the Supplementary material) [replicated].

  • Endowment effect. People are more likely to retain an object they own than acquire that same object when they do not own it. This implies that the value that an individual assigns to objects appears to increase substantially as soon as that individual is given the object.

    • Status: replicated
    • Original paper: ‘Experimental Tests of the Endowment Effect and the Coase Theorem’, Kahneman et al. 1990; experimental design, Experiment 1: n=42, Experiment 2: n=38, Experiment 3: n=26, Experiment 4: n=74 [citations=6392 (GS, March 2023)].
    • Critiques: Carmon and Ariely 2000 [study 1: n=91, study 2: n=472, study 3: n=75, study 4: n=250, citations=776 (GS, March 2023)]. Shogren et al. 1994 [n=142, citations=776 (GS, March 2023)].
    • Original effect size: 5 (selling price divided by buying price).
    • Replication effect size: Carmon and Ariely: study 1: d = 0.03 (calculated from converting Pearson’s r to Cohen’s d through this calculator); study 2: NA; study 3: NA; study 4: NA. Shogren et al.: 1.05 over trial 5 (selling price divided by buying price); d = -0.069 (calculated from M and SD reported in Table 2).


Marketing

  • Choice overload. The idea that giving people too many choices can lead to certain undesirable consequences such as reduced purchasing intentions.

    • Status: mixed (duelling meta-analyses, mix of successful and failed replications).
    • Original study: ‘When choice is demotivating’, Iyengar and Lepper, 2000; field experiment, 3 experiments, Study 1: n=502, Study 2: n=197, Study 3: n=134. [citations = 2460 (GS, April 2023)].
    • Critiques: Chernev et al. 2010 [commentary, n=NA, citations=98 (GS, April 2023)]. Chernev et al. 2015 [meta-analysis of 99 observations, N = 7202, citations=717 (GS, April 2023)]. Scheibehenne 2008 [three replications in the field and in the lab with a total of n= 850 participants and six laboratory experiments with n=595, citations=50 (GS, April 2023)]. Greifeneder 2008 [unpublished manuscript (link not available), n=NA, citations=4 (GS, April 2023)]. Scheibehenne et al. 2010 [meta-analysis of 63 conditions from 50 published and unpublished experiments, N = 5,036, citations=1241 (GS, April 2023)]. Simonsohn et al. 2014 [n=NA, citations=681 (GS, April 2023)].
    • Original effect size: d=0.77 (study1) and d=0.29 (study2), and d=0.88 (study3) (as calculated from the χ2 values in the text with this online calculator).
    • Replication effect sizes: Scheibehenne: failed to directly replicate the Iyengar and Lepper (2000) jam study. Greifeneder: a lab experiment with chocolates also failed to conceptually replicate the effect. Scheibehenne et al. 2010: “We found a mean effect size of virtually zero” (d=.02). Chernev et al. 2010: argue that this null result arises because many of the included studies were designed to show instances in which there is no effect, so the data need to be split into “choice is good” vs. “choice is bad” studies. Simonsohn et al.: agree with Chernev et al. on splitting the data, but find that the “choice is bad” studies (choice overload) lack collective evidential value (uniform p-curve). Chernev et al. 2015: without addressing Simonsohn et al. 2014, conclude that choice overload is a reliable effect under certain conditions (moderators).

  • Mate guarding. Women use conspicuous luxurious goods to deter female rivals by signalling to other women they have a devoted partner.

    • Status: reversed
    • Original paper: ‘Conspicuous Consumption, Relationships, and Rivals: Women’s Luxury Products as Signals to Other Women’, Wang & Griskevicius 2014; 5 experimental studies, Study 1: N=69, Study 2: N=137, Study 3: N=115, Study 4: N=75, Study 5: N=175. [citation=450 (GS, January 2022)]​.
    • Critiques: Tunka & Yanar 2020 [conceptual Study 1, N= 250, and direct replications Study 2, N=255, of study 1 of Wang & Griskevicius, citations=2 (GS, January 2022)].
    • Original effect size: d=0.24.
    • Replication effect size: Tunka & Yanar: Study 1: did not replicate the original findings that women with luxurious goods are perceived by other women as having devoted partners (d =0.13); Study 2: a reversal is observed, such that women with non-designer possessions were perceived to have a more devoted partner than women with designer possessions (d=-0.27).

  • Scarcity effect - Overborrowing. Perceived financial scarcity causes consumers to overborrow.

  • Scarcity effect - Resource allocation. Poor economic conditions favour resource allocations to daughters over sons.

  • Scarcity effect - Planning. Consumers who feel resource constrained shift to engage in relatively more priority planning, rather than efficiency planning.

  • Scarcity effect - Competition/threat. Exposure to limited-quantity promotion advertising prompts consumers to perceive other shoppers as competitive threats.

  • Scarcity effect - Brand attitudes. Observing luxury brand consumers whose consumption arises from unearned financial resources reduces observers’ brand attitudes when observers place a high value on fairness.​

  • Scarcity effect - Product use creativity. Scarcity salience is associated with greater creativity.​

  • Scarcity effect - Wage rates. The difference in implied wage rates based on a time elicitation versus a money elicitation procedure is reduced as the time horizon increases.​

  • Scarcity effect - Selfishness. Reminders of scarcity cause selfish behaviour to a greater extent in people with low social value orientation.

  • Scarcity effect - Preference for material goods. Scarcity leads to a preference for material goods over experiential goods.​

  • Scarcity effect - Preference polarisation. Perceived scarcity leads to greater preference polarisation and stronger preference for a preferred option over a less preferred option.​

  • Product size and status. People who are low in power choose supersized foods and drinks to signal status.

    • Status: not replicated.
    • Original paper: ‘Super size me: product size as a signal of status’, Dubois et al. 2012; 6 experimental studies, Study 1: n = 183, Study 2: n = 142, Study 3: n = 89, Study 4: n = 269, Study 5: n = 134, Study 6: n = 104. [citation=384 (GS, November 2022)].
    • Critique: Tunca et al. 2022 [direct replication of study 1 of Dubois et al. 2012; n=415, citations=0 (GS, November 2022)].
    • Original effect size: Study 1: small vs. large product size: d=1.49 [1.09, 1.89]; medium vs. large product size: d=0.89 [0.52, 1.26]; small vs. medium product size: d=0.62 [0.26, 0.98].
    • Replication effect size: Tunca et al.: small vs. large product size: d = 0.09 [-0.15, 0.33]; medium vs. large product size: d = 0.11 [-0.13, 0.34]; small vs. medium: d=-0.01 [-0.25, 0.23].

  • Less-is-better effect. People are willing to pay more money for a product that contains a lower quantity when it looks more full, such as an overfilled ice cream cup with 7 oz rather than an underfilled larger ice cream cup with 8 oz.

    • Status: replicated
    • Original paper: ‘Less is better: When low-value options are valued more highly than high-value options’, Hsee 1998; between-subjects manipulation of lower amount with more complete/full vs. higher amount with less complete/broken/full, Study 1 n=83, Study 2 n= 69, Study 3 n=98, Study 4 n=104. [citations=491(GS, January 2023)].
    • Critique: Study 1 was also replicated in Klein et al. 2018 [aggregate replication sample N=7,646, citations=779(GS, February 2023)]. Vonasch et al 2023 [preprint, Study 1 n=132, Study 2 n=133, Study 4 n = 131, citations=0(GS, January 2023)].
    • Original effect sizes: Study 1: d = 0.70 [0.24, 1.15]; Study 2: d = 0.74 [0.12, 1.35]; Study 4: d = 0.97 [0.43, 1.50].
    • Replication effect sizes: Klein et al.: Study 1 also replicated with d=.78 [.74, .83]. Vonasch et al.: Study 1: d = 0.99 [0.72, 1.25]; Study 2: d = 0.32 [0.05, 0.56]; Study 4: d = .76 [.50, 1.02].

  • Left digit bias. The leftmost digit of a number disproportionately influences decision making. Consumers judge the difference between $4.00 and $2.99 to be larger than that between $4.01 and $3.00, even though the numeric differences are identical; it is this change in the left digit, rather than the one cent drop, that affects the magnitude perception.

    • Status: replicated.
    • Original paper: ‘Penny Wise and Pound Foolish: The Left-Digit Effect in Price Cognition’, Thomas and Morwitz 2005; five experiments with various designs, Study 1a: n= 52, Study 1b: n=63, Study 2: n = 154, Study 3: n = 53, Study 4: n= 27. [citations=474(GS, February 2023)]​.
    • Critiques: Bhattacharya et al. 2012 [n≈100 million stock transactions, citations=109(GS, February 2023)]. Lacetera et al. 2012 [n=22 million wholesale used-car transactions, citations=386(GS, February 2023)].​ (https://pubsonline.informs.org/doi/abs/10.1287/mnsc.1110.1364) Manning and Sprott 2009 [Study 1: n=442, Study 2a: n=409, Study 2b: n = 329, citations=149(GS, February 2023)]. Sokolova et al. 2020 [Study 1: n=145, Study 2: n=120, Study 3: n=99, Study 4: n=150, Study 5: n=201, Study 6: n=15,236 choices across 3 product categories, citations=30(GS, February 2023)].​
    • Original effect size: Study 1a – Significant effect of price ending (nine vs zero ending) on price magnitude perception when left digits differed, η ²=.16; Study 1b – Significant left-digit change caused by a nine-ending price effects on the price’s magnitude perception, when the comparison standard is perceived to be close, η ²=.15; Study 2 – For all levels of comparison standard, nine-ending target prices were perceived to have lower magnitude than zero-ending ones, η ²=.15; Study 3 – the significant effects of nine-ending numbers on response times when the distance between the target number and the comparison standard is small, η ²=.08; Study 4 – Significant effect of nine-ending in the target on the perceived quality ratings differences, when the psychological distance was low, η²=.41.
    • Replication effect size: Bhattacharya et al.: Buys outnumber sells at trade prices immediately below a round number, regression coefficients from b=0.270 to b=1.493, and sells outnumber buys at trade prices immediately above a round number, regression coefficients from b=-0.177 to b=-0.367; the highest ratio of buys to sells by liquidity demanders occurs at trade prices ending in .99, and the lowest ratio of buys to sells by liquidity demanders occurs at trade prices ending in .01 (replicated). Lacetera et al.: significant discontinuous drops in car value/auction price for car sale at mileage thresholds where left digits change (e.g., 10,000-mile marks) in two regression models (average -$157 and -$173, respectively) (replicated). Manning and Sprott: Study 1 – ES not reported, but significant effects of price endings on the choice of products were found (replicated); Study 2a - perceived price difference between the two products was affected by the price endings manipulation, β = -.32 (replicated); Study 2b - perceived price differences were affected by the price endings manipulation, β = -.32 (replicated). Sokolova et al.: Study 1 – Left-digit effect size in private label price evaluation is stronger in stimulus-based, ηp2 = 0.054, than in memory-based evaluations, ηp2 = 0.001; Study 2 – Left-digit effect size in discount evaluation is stronger in stimulus-based, ηp2 = 0.209, than in memory-based evaluations, ηp2 = 0.057; Study 3 – Left-digit effect size in discount evaluation is stronger in stimulus-based, ηp2 = 0.455, than in memory-based evaluations, ηp2 = 0.296; Study 4 – Left digits affect response times more in stimulus-based, ηp2 = 0.001, than in memory-based evaluations, ηp2 = 0.00004; left-digit effect size in precise memory-based evaluations similar to stimulus-based, ηp2 = 0.004; Study 5 – Left-digit effect size in price evaluations is stronger in stimulus-based, ηp2 = 0.033, than in memory-based evaluations, ηp2 = 0.001; Study 6 – Left-digit bias is stronger among light category users, b = -0.15 to b = -0.55, than in heavy category users, b = 0.00 to b = -0.40 (replicated).


Neuroscience (humans)

  • One mind per hemisphere (split-brain syndrome). Surgical severing of the corpus callosum leads to the split-brain phenomenon, which is characterised by 1) a response × visual field interaction, 2) strong hemispheric specialisation, 3) confabulations after left-hand actions, 4) split attention, and 5) the inability to compare stimuli across the midline. Together, these reported effects have been interpreted as evidence for split consciousness. Critics argue that the surgical procedure does not result in the development of two independent minds or consciousnesses within one brain; instead, the findings suggest that the two hemispheres continue to work together, even in the absence of the corpus callosum.

    • Status: mixed.
    • Original paper: ‘Some functional effects of sectioning the cerebral commissures in man’, Gazzaniga et al. 1962; case study, n=1. [citations=617(GS, October 2022)].
    • Critiques: de Haan et al. 2020 [review paper, n=NA, citations=44(GS, April 2023)]. Pinto et al. 2017 [review paper, n=NA, citations=43(GS, April 2023)]. Pinto et al., 2017 [n=2, citations=39(GS, October 2022)].
    • Original effect size: NA (verbal descriptions, no quantitative data)​.
    • Replication effect size: de Haan et al.: NA, body of evidence is insufficient to answer this question, different theories of consciousness have different predictions on the unity of mind in split-brain patients, and await the results of further investigation into this intriguing phenomenon. Pinto et al.: argue that the data could instead be indicative of a single undivided consciousness experiencing two parallel and unintegrated perceptual streams. Pinto et al.: replicated (no ES; replicate the standard finding that stimuli cannot be compared across visual half-fields, indicating that each hemisphere processes information independently of the other).

  • Hydrocephaly. The effect of massive volume loss improving cognition. Hydrocephalus, also known as “water on the brain,” is a condition in which there is an abnormal accumulation of cerebrospinal fluid (CSF) in the ventricular system of the brain. This can cause the ventricles to become enlarged, putting pressure on the brain and causing a wide range of symptoms, depending on the severity of the condition and the age of the individual. There are two types of Hydrocephalus: congenital and acquired. Congenital Hydrocephalus is present at birth and is caused by a genetic or developmental abnormality. Acquired Hydrocephalus develops later in life and can be caused by a variety of factors, such as a brain tumour, infection, or injury.

    • Status: NA
    • Original paper: No paper; instead a documentary and a profile of the claimant, John Lorber. Also ‘Wittgenstein’s Certainty is Uncertain: Brain Scans of Cured Hydrocephalics Challenge Cherished Assumptions’, Forsdyke 2015; review paper, n=NA. [citations = 20 (Springerlink, January 2023)].
    • Critiques: de Oliveira et al. 2012 [review, fraudulent/retracted, n = NA, citations = 42 (GS, April, 2023)]. Feuillet et al. 2007 [n=1, citations = 192 (GS, April 2023)]. Hawks 2007 [blog, n=NA, citations = 0 (GS, April 2023)]. Gwern 2019 [blog, n=NA, citations = 0 (GS, April 2023)]. Neuroskeptic 2015 [journal article, n=NA, citations = 0 (GS, April 2023)].
    • Original effect size: NA.
    • Replication effect size: NA. Crucially, for meta-analyses on improvements after Hydrocephaly treatment see Zhang (2020) or Tabatabaee et al. (2019). Hawks: The reported cases do not apparently involve significant gray matter tissue loss. A “thin” cortex does not necessarily imply functionally small cortical volume, even with substantial white tissue loss. Neuroskeptic: While the enormous “holes” in these brains seem dramatic, the bulk of the grey matter of the cerebral cortex, around the outside of the brain, appears to be intact and in the correct place; no detailed post-mortem studies of their brain tissue have been published. Gwern: the cases turn out to be suspiciously unverifiable (Lorber), likely fraudulent (Oliveira), or actually low intelligence (Feuillet). It is unclear if high-functioning cases of hydrocephalus even have less brain mass, as opposed to lower proxy measures like brain volume. Feuillet et al.: man who got to 44 years old before anyone realised his severe hydrocephaly, through marriage and employment. IQ 75 (i.e. d=-1.7).
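
The d = -1.7 figure given for the Feuillet et al. case can be read as the IQ score expressed in standard-deviation units. A minimal sketch, assuming the conventional IQ scaling of mean 100 and SD 15 (the entry does not state the scaling, and the function name `iq_to_d` is hypothetical):

```python
def iq_to_d(iq: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Express an IQ score as a standardised distance from the population mean.

    Assumption: IQ is normed to mean 100 and SD 15.
    """
    return (iq - mean) / sd

print(f"IQ 75 corresponds to d = {iq_to_d(75):.1f}")
```

This is a single-case standardised score, not a group-level effect size, so it is not directly comparable to the d values reported for experiments elsewhere in this list.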

  • Readiness potentials. Readiness potentials (RPs) are neural signals that are observed in the brain prior to voluntary movements. They are typically measured using electroencephalography (EEG) and are thought to reflect the neural activity associated with preparing for a movement. They begin several hundred milliseconds before the movement occurs, suggesting that the brain prepares for the movement before the person is consciously aware of the decision to move. RPs have been observed in various regions of the brain, including the primary motor cortex, supplementary motor area, and premotor cortex (see Schurger et al. (2021) for a glossary). The discovery of RPs has been used to argue against the concept of free will, as it suggests that the neural activity associated with a voluntary movement starts before the person is consciously aware of the decision to move.

    • Status: replicated
    • Original paper: ‘Hirnpotentialänderungen bei Willkürbewegungen und passiven Bewegungen des Menschen: Bereitschaftspotential und reafferente Potentiale’, Kornhuber and Deecke 1965; 12 healthy subjects in 94 experiments. [citations = 1410 (SPRINGERLINK, January 2023)].
    • Critiques: Alexander et al. 2016 [n=17, citations = 83 (GS, April 2023)]. Fried et al. 2011 [n=12, citations = 589 (GS, April 2023)]. Libet et al. (1964/1983) [6 different experimental sessions with each of 5 subjects, citations = 3814 (PUBMED, January 2023)]. McGilchrist 2012 [n=NA, citations = 55 (GS, April 2023)]. Travers et al. 2020 [n=19, citations = 25 (GS, April 2023)].
    • Original effect size: NA.
    • Replication effect size: Fried et al.: replicated (no ES). Travers et al.: replicated (no ES). McGilchrist/ Alexander et al.: The neural activity observed may not necessarily be associated with the preparation for a voluntary movement, but rather with a cognitive process such as attention or decision making. Some studies have suggested that RP may reflect the neural activity associated with attentional processes rather than motor preparation, and that the relationship between RP and voluntary movement is not as clear-cut as initially thought.

  • Left-brain vs. Right-brain Hypothesis. Individuals may be left-brain dominant or right-brain dominant based on personality and cognitive style. Specifically, the hypothesis proposes that the two hemispheres of the brain have different functions and abilities, with the left hemisphere being associated with logical, analytical, and verbal skills, and the right hemisphere being associated with creative, intuitive, and spatial skills. This idea is widespread in popular culture, but it is not supported by scientific evidence.

    • Status: not replicated
    • Original paper: No original paper based on brain data; the idea seems to have evolved from early studies by Broca and Wernicke on language localization in the brain and became a mainstream popular idea, but is not backed by evidence (Source). Cognitive styles original papers use questionnaires and non-brain based measures to determine “hemispheric dominance”. No-brain-data early paper: ‘Hemispheric dominance in recall and recognition’, Zenhausen and Gebhardt 1979; within-subjects design, n = 20. [citations = 34 (GS, June 2022)].
    • Critiques: Nielsen et al. 2013 [n=1,011, citations= 446(GS, April 2023)].
    • Original effect size: N/A [Zenhausen & Gebhardt, not provided].
    • Replication effect size: Nielsen et al.: N/A, data are not consistent with a whole-brain phenotype of greater “left-brained” or greater “right-brained” network strength across individuals [no specific result in Nielsen et al. 2013].

  • Oxytocin on trust. Intranasal administration of oxytocin increases trust in strangers in a laboratory setting.

    • Status: not replicated
    • Original paper: ‘Oxytocin increases trust in humans’, Kosfeld et al. 2005; experiment, n = 128. [citations = 4800 (GS, April 2022)].
    • Critiques: Declerck et al. 2020 [n = 677, citations =57 (GS, April 2022)]. Lane et al. 2015 [n1 = 95, n2= 61, citations =63 (GS, April 2022)].
    • Original effect size: Not reported but could be calculated: “In fact, our data show that oxytocin increases investors' trust considerably. Out of the 29 subjects, 13 (45%) in the oxytocin group showed the maximal trust level, whereas only 6 of the 29 subjects (21%) in the placebo group showed maximal trust (Fig. 2a). In contrast, only 21% of the subjects in the oxytocin group had a trust level below 8 monetary units (MU), but 45% of the subjects in the control group showed such low levels of trust.”
    • Replication effect size: Declerck et al.: No support for the hypothesis that OT increases trust in the minimal social contact condition β= −0.136 [−0.952, –0.682]. Lane et al.: Study 1 - no significant effect, F(1,93) = .229, p = .663; Study 2 - no significant effect, F(1,59) = .295, p = .589.

  • Structural brain-behaviour correlations - the association between behavioural activation and white matter integrity. Individual differences in the sensitivity to signals of reward as indexed by BAS-Total and in the tendency to seek out potentially rewarding experiences as measured by BAS-Fun are positively correlated with diffusion measures of several white matter pathways.

    • Status: not replicated
    • Original paper: ‘White matter integrity and behavioral activation in healthy subjects’, Xu et al. 2012; correlational design, n = 51. [citations = 29 (GS, May 2022)].
    • Critiques: Corrigendum of Boekel et al. 2015 (https://doi.org/10.1016/j.cortex.2017.03.007) [n=36, citations = 196 (GS, May 2022)]. Keuken et al. 2017 [n = 34-35, citations = 1 (GS, May 2022)].
    • Original effect size: BAS-Total correlation with parallel diffusivity in the left corona radiata (CR)/superior longitudinal fasciculus (SLF): r = .51; BAS-Fun correlation with: fractional anisotropy in the left CR/SLF: r = .52, parallel diffusivity in the left CR/SLF: r = .58, mean diffusivity in the left SLF/inferior fronto-occipital fasciculus (IFOF): r = .51.
    • Replication effect size: Keuken et al.: BAS-Total correlation with parallel diffusivity in the left CR/SLF: r = -.15; BAS-Fun correlation with: fractional anisotropy in the left CR/SLF: r = -.15, parallel diffusivity in the left CR/SLF: r = -.04, mean diffusivity in the left SLF/inferior fronto-occipital fasciculus (IFOF): r = .05.

  • Structural brain-behaviour correlations - the association between social network size and grey matter volume. Individual differences in the number of Facebook friends (FBN) are positively correlated with grey matter volume in several brain areas: left middle temporal gyrus (MTG), right superior temporal sulcus (STS), right entorhinal cortex (EC), left and right amygdala.

    • Status: mixed
    • Original paper: ‘Online social network size is reflected in human brain structure’, Kanai et al. 2012; correlational design, n = 125. [citations= 411 (GS, May 2022)].
    • Critiques: Boekel et al. 2015 [n = 34-35, citations = 196 (GS, May 2022)]. Kanai et al. 2012 [n = 40, citations= 411 (GS, May 2022)].
    • Original effect size: left MTG: r =.35; right STS: r = .35; right EC: r = .35, left amygdala: r = .30; right amygdala: r = .32.
    • Replication effect size: Kanai et al.: left MTG: r =.38; right STS: r = .44; right EC: r = .48; left amygdala: r = .33; right amygdala: r = .48. Boekel et al.: left MTG: r = .18; right STS: r = .11; right EC: r = .06; left amygdala: r = -.14; right amygdala: r = .02.

  • Structural brain-behaviour correlations - the association between distractibility and grey matter volume. Variability in self-reported distractibility is positively correlated with grey matter volume in the left superior parietal lobule (SPL) and negatively correlated with grey matter volume in medial pre-frontal cortex (mPFC).

  • Structural brain-behaviour correlations - the association between attention and cortical thickness. Individual differences in executive control are negatively correlated with cortical thickness in left anterior cingulate cortex (ACC), left superior temporal gyrus (STG), and right middle temporal gyrus (MTG), whereas variation in alerting scores is negatively correlated with cortical thickness in the left superior parietal lobule (SPL).

  • Structural brain-behaviour correlations - the association between control over speed/accuracy of perceptual decisions and white matter tracts strength. Individual differences in control over speed and accuracy of perceptual decisions are positively correlated with the strength of white matter tracts between the right presupplementary motor area (pre-SMA) and the right striatum.

  • Structural brain-behaviour associations - the association between executive function and grey matter volume. Grey matter volume in the rostral dorsal premotor cortex is associated with individual differences in executive function as measured by the trail making test.

  • Fear conditioning - Amygdala. Animal research suggests that fear conditioning activates the amygdala (LeDoux, 1993), which has been replicated in some (but not all) human fMRI fear conditioning studies.

    • Status: mixed
    • Original paper: ‘Human Amygdala Activation during Conditioned Fear Acquisition and Extinction: a Mixed-Trial fMRI Study’, LaBar et al. 1998; differential fear conditioning paradigm, N=18. [citations=1826 (GS, March 2023)]. Note that amygdala activation habituated over time, as would be expected from research in animals; this methodological consideration has been neglected in many replication attempts.
    • Critiques: Fullana et al. 2016 [meta-analysis, total n=677 from 27 studies, citations=503 (GS, March 2023)]. Mechias et al. 2010 [meta-analysis, total n=360, citations=430 (GS, March 2023)]. Öhman et al. 2009 [n=NA, citations=20 (GS, March 2023)]. Phelps et al. 2004 [replication, n=18, citations=2144 (GS, March 2023)]. Sehlmeyer et al. 2009 [systematic review, n=NA, citations=612 (GS, March 2023)]. The following studies demonstrated that amygdala activation can be detected in fear conditioning experiments when amygdala habituation over time is explicitly modelled/considered: Armony and Dolan 2001 [n=8, citations=60 (GS, March 2023)]. Büchel et al. 1998 [n=9, citations=1264 (GS, March 2023)]. Büchel et al. 1999 [n=11, citations=561 (GS, March 2023)]. Sperl et al. 2019 [n=21, citations=33 (GS, March 2023)]. Yin et al. 2018 [n=18, citations=21 (GS, March 2023)].
    • Original effect size: NA.
    • Replication effect size: Armony and Dolan: the presence of the aversive visual context was associated with enhanced activity in parietal cortex, which may reflect an increase in attention to the presence of environmental threat stimuli. Büchel et al.: Differential evoked responses, related to conditioning, were found in the anterior cingulate and the anterior insula, regions with known involvement in emotional processing. Büchel et al.: Differential responses (CS+ vs CS−), related to conditioning, were observed in anterior cingulate and anterior insula, regions previously implicated in delay fear conditioning; differential responses were also observed in the amygdala and hippocampus that were best characterized with a time × stimulus interaction, indicating rapid adaptation of CS+-specific responses in medial temporal lobe. Fullana et al.: no robust and consistent involvement of the amygdala in fear acquisition across studies; see effect size maps (Fig. 1, 2, 3, and 4; maps are difficult to convert into numbers). Mechias et al.: consistent activation in rostral dmPFC but not in the other candidate areas; discussing methodological constraints. Öhman et al.: excellent overview about early replications and methodological considerations to capture amygdala activity in humans. Phelps et al.: amygdala activation was correlated across subjects with the conditioned response in both acquisition and early extinction. Sehlmeyer et al.: A network consisting of fear-related brain areas, such as amygdala, insula, and anterior cingulate cortex, is activated independently of design parameters. However, some neuroimaging studies do not report these findings in the presence of methodological heterogeneities. Furthermore, other brain areas are differentially activated, depending on specific design parameters. These include stronger hippocampal activation in trace conditioning and tactile stimulation. Furthermore, tactile unconditioned stimuli enhance activation of pain related, motor, and somatosensory areas. Sperl et al.: Fear and extinction recall as indicated by theta explained 60% of the variance for the analogous effect in the right amygdala.

  • Fear conditioning - vmPFC. Animal research suggests that fear extinction activates the vmPFC (Morgan et al., 1993); based on these findings from animal research, some (but not all) human fMRI fear conditioning/extinction studies found that the vmPFC becomes activated during fear extinction recall.

    • Status: mixed
    • Original paper: ‘Extinction Learning in Humans: Role of the Amygdala and vmPFC’, Phelps et al. 2004; differential fear conditioning and extinction paradigm, N=18. [citations=2144 (GS, March 2023)].
    • Critiques: Diekhof et al. 2011 [meta-analysis, total n=154, citations=323 (Elsevier, March 2023)]. Fullana et al. 2018 [meta-analysis, total n>1,300 participants, citations=210 (GS, March 2023)]. See also methodological comments by Morriss et al., 2018 and Fullana et al., 2019.
    • Original effect size: NA.
    • Replication effect size: Diekhof et al.: evidence that fear extinction activates vmPFC subregions in humans. Fullana et al.: there is support that fear extinction recall is associated with vmPFC activation, but vmPFC extinction effects seem to be more nuanced than previously assumed and vmPFC effects seem to depend on paradigm characteristics; see effect size maps (Fig. 1, 2, 3, and 4; maps are difficult to convert into numbers).

  • Fear conditioning - Theta oscillations. Animal research suggests that fear conditioning evokes prefrontal theta activity, which can be measured with EEG in humans.

    • Status: replicated
    • Original paper: ‘Prefrontal Oscillations during Recall of Conditioned and Extinguished Fear in Humans’, Mueller et al. 2014; two-day differential fear conditioning study, n=42. [citations=87 (GS, January 2023)].
    • Critiques: Bierwirth et al. 2021 [n=60, citations=11 (GS, January 2023)]. Chen et al. 2021 [n=13, citations=16(GS, January 2023)]. Mueller and Pizzagalli 2016 [n=16, citations=26(GS, January 2023)]. Sperl et al. 2021 [n=21, citations=31(GS, January 2023)]. Starita et al. 2023 [n=20, citations=0(Wiley; January 2023)].
    • Original effect size: NA.
    • Replication effect size: Bierwirth et al.: results replicated and extended by the influence of sex hormones, ηp2 (estradiol status for E2 level by group) = .076; d (one-sided; MC women vs. OC women) = .698; d (one-sided; MC women vs. men) = .756; d (one-sided; OC women vs. men) = .077. ηp2 (estradiol status for P4 level by group) = .099; d (MC women vs. OC women) = .718; d (MC women vs. men) = .867; d (OC women vs. men) = .219; ηp2 (estradiol status for testosterone level by group) = .681; d (men vs. MC women) = 3.303; d (men vs. OC women) = 3.303; d (MC women vs. OC women) = .598; ηp2 (skin conductance responses, day 1: contingency) = .426; ηp2 (skin conductance responses, day 1: estradiol status) = .139; ηp2 (skin conductance responses, day 1: estradiol status X contingency) = .115; d (diffSCR, men vs. OC women) = .797; d (diffSCR, men vs. MC women) = .665; ηp2 (extinction learning, day 1: contingency) = .277; ηp2 (extinction learning, day 1: contingency X estradiol status) = .250; d (diffSCR during learning, men vs. OC women) = .937; d (diffSCR during learning, men vs. MC women) = 1.175; d (one-sided; diffSCR during extinction learning vs. fear acquisition) = .353; ηp2 (skin conductance responses, day 2: contingency) = .491; ηp2 (skin conductance responses, day 2: contingency X extinction status) < .001 (ns); d (diffSCR, extinction learning vs. extinction recall) = .269; d (diffSCR, fear acquisition vs. extinction recall) = .069; ηp2 (diffSCR, day 2: estradiol status) = .137; d (one-sided; diffSCR, MC women vs. OC women) = .615; d (one-sided; diffSCR, MC women vs. men) = .879; ηp2 (estradiol status, FRI vs. ERI) = .103; d (one-sided; FRI, MC women vs. OC women) = .639; d (one-sided; ERI, MC women vs. OC women) = .547; d (one-sided; FRI, MC women vs. men) = .928; d (one-sided; ERI, MC women vs. men) = .796; ηp2 (theta oscillations, electrode) = .119; ηp2 (theta oscillations, electrode X contingency) = .730; ηp2 (dACC source, contingency effect) = .090; ηp2 (dACC source, contingency X extinction status) = .023; ηp2 (dACC source, contingency X estradiol status) = .100; d (one-sided; differential theta power in dACC, MC women vs. OC women) = .609; d (one-sided; differential theta power in dACC, MC women vs. men) = .741; ηp2 (frontal theta power during extinction learning, contingency factor) = .061. Chen et al.: NA; results replicated by intracranial EEG. Mueller and Pizzagalli: ES=NA; replicated during a fear recall test one year after conditioning. Sperl et al.: ES=NA; results replicated and extended by simultaneous EEG-fMRI. Starita et al.: replicated and extended by reversal learning, main effect of CS type in midcingulate cortex ηp2 = .37 [0.08, 0.56].

  • Fear conditioning - Late Positive Potential. Fear conditioning leads to elevated amplitudes during the time period of the Late Positive Potential (LPP), i.e., a positive-going event-related brain potential (ERP) component that can be recorded using electroencephalography (EEG).

  • Fear conditioning - Bradycardia / heart rate modulation. Fear conditioning leads to heart rate slowing (fear-conditioned bradycardia).

    • Status: replicated
    • Original paper: ‘Conditioned heart rate response in human beings during experimental anxiety’, Notterman et al. 1952; fear conditioning paradigm with heart rate recording, n= 20. [citations=122(GS, April 2023)]; see also Notterman et al., 1952b.
    • Critiques: Castegnetti et al. 2016 [n=99, citations=31 (Wiley, January 2023)]. Deane & Zeaman 1958 [n=10, citations=51 (GS, January 2023)]. Gruss et al. 2016 [n=63, citations=86 (Elsevier, January 2023)]. Mueller et al. 2019 [n=86, citations=28 (GS, January 2023)]. Panitz et al. 2015 [n=22, citations=27 (GS, January 2023)]. Panitz et al. 2018 [n=87, citations=24 (GS, January 2023)]. Schipper et al. 2019 [n=104, citations=10 (PNAS, January 2023)]. Sperl et al. 2021 [n=24, citations=13 (GS, January 2023)]. Thigpen et al. 2017 [n=17, citations=29 (GS, January 2023)]. Yin et al. 2018 [n=18, citations=19 (GS, January 2023)].
    • Original effect size: ES=NA.
    • Replication effect size: Castegnetti et al.: NA (replicated). Deane and Zeaman: NA (replicated). Gruss et al.: NA (replicated, heart rate modulation depending on COMT genotype). Panitz et al.: ηp2=.281 (replicated, heart rate modulation depending on COMT genotype). Mueller et al.: study 1: ηp2=.08, study 2: ηp2=.07 (replicated, use of an aversive imagery unconditioned stimulus; paper includes two datasets with independent samples, effect can be replicated). Schipper et al.: NA (replicated, heart rate modulation depending on 5-HTTLPR genotype). Sperl et al.: d = 1.17 (replicated). Thigpen et al.: all d’s>.8 (replicated). Yin et al.: NA (replicated).

  • Bouba/kiki effect (sound symbolism). When presented with sounds (e.g., the words “bouba” and “kiki”) and visual objects (e.g., a curvy shape and a spiky shape), humans make non-arbitrary mappings between sounds and objects (e.g., the curvy shape is consistently called “bouba”).

    • Status: replicated
    • Original paper: ‘Synaesthesia—A window into perception, thought and language’, Ramachandran and Hubbard 2001; two-alternative forced choice task (claim made about ‘95% of participants,’ but actual study not described), n=NA. [citations=2218 (GS, February 2023)].
    • Critiques: Ćwiek et al. 2022 [n=976 (replication), citations=30 (GS, February 2023)]. Fort et al. 2018 [n=425 (meta-analysis), citations=48 (GS, February 2023)].
    • Original effect size: NA.
    • Replication effect size: Ćwiek et al.: Hedges’ g=0.106 [0.029, 0.154] (calculated). Fort et al.: Hedges’ g=0.163 [0.088, 0.238].

  • Human freezing behaviour (postural sway). A physiological response that occurs in response to a perceived threat. Overall, freezing-like behaviour has been replicated in multiple studies across different species and contexts, and is considered a robust and reliable measure of fear and anxiety.

  • Glucose amplification of cortisol stress reactivity. After fasting, the administration of glucose prior to psychosocial stress or a nicotine challenge led to an increased cortisol stress response (in comparison to water administration). Blood glucose levels were positively associated with the cortisol stress response triggered by the Trier Social Stress Test (TSST).

    • Status: mixed
    • Original paper: ‘Effects of Fasting and Glucose Load on Free Cortisol Responses to Stress and Nicotine’, Kirschbaum et al. 1997; laboratory experiment, administration of glucose (100 g) prior to the Trier Social Stress Test (TSST) or smoking two cigarettes, TSST: N = 25, smoking: N = 12. [citations=178 (GS, April 2023)].
    • Critiques: Bentele et al. 2021 [n=122, citations=10(GS, April 2023)]. Gonzalez-Bono et al. 2001 [n=37, citations = 158 (GS, April 2023)]. Meier et al. 2022 [n=152, citations=5(GS, April 2023)]. Rüttgens and Wolf 2022 [n=72, citations=1(GS, April 2023)]. Von Dawans et al. 2021 [n=151, citations=14(GS, April 2023)]. Zänkert et al. 2020 [n=103, citations=27(GS, April 2023)].
    • Original effect size: NA.
    • Replication effect size: Bentele et al.: NA, replicated the enhancing effect of glucose administration on the cortisol stress response to the TSST in females. Gonzalez-Bono et al.: NA, but replicated both the increased cortisol stress response after glucose consumption and the significant association between blood glucose levels and the cortisol stress response. Meier et al.: NA, replicated the enhancing effect of glucose administration on the cortisol stress response to the TSST in females; no correlation between blood glucose levels and the cortisol stress response. Rüttgens and Wolf: NA, could neither replicate the enhancing effect of glucose on the cortisol response to a socially-evaluated cold pressor test (SE-CPT), nor did they find a correlation between blood glucose levels and the cortisol stress response. Von Dawans et al.: ηp2=0.042, replicated the increased cortisol stress response after glucose administration in response to the TSST and the cold pressor test (CPT), yet they note that the effect seems to be descriptively stronger for the psychosocial stressor (TSST). Zänkert et al.: η2=0.077-0.082, replicated the increased cortisol stress response after grape juice and glucose administration.

  • Resting-state functional connectivity patterns can accurately classify individuals diagnosed with depression. Multivariate pattern analyses can identify patterns of resting-state functional connectivity that successfully differentiate individuals with depression from healthy controls. This finding demonstrates the potential utility of resting-state functional connectivity as a biomarker of depression.

    • Status: mixed
    • Original paper: ‘Disease state prediction from resting state functional connectivity’, Craddock et al. 2009; quasi-experimental design, ncontrols = 20, nclinical = 20. [citations=464 (GS, April 2023)].
    • Critiques: Bhaumik et al. 2017 [ncontrols = 29, nclinical = 38, citations=58 (GS, April 2023)]. Cao et al. 2014 [ncontrols = 37, nclinical = 39, citations=51 (GS, April 2023)]. Guo et al. 2014 [ncontrols = 27, nclinical = 36, citations=73 (GS, April 2023)]. Lord et al. 2012 [ncontrols = 22, nclinical = 21, citations=177 (GS, April 2023)]. Ma et al. 2013 [ncontrols = 29, nclinical = 24, citations=89 (GS, April 2023)]. Qin et al. 2015 [ncontrols = 29, nclinical = 24, citations=38 (GS, April 2023)]. Ramasubbu et al. 2016 [ncontrols = 19, nclinical = 45, citations=51 (GS, April 2023)]. Sundermann et al. 2017 [ncontrols = 180/60 (whole sample/severe symptoms only), nclinical = 180/60, citations=17 (GS, April 2023)]. Yu et al. 2013 [ncontrols = 38, nclinical = 19, citations=69 (GS, April 2023)]. Zeng et al. 2012 [ncontrols = 29, nclinical = 24, citations=753 (GS, April 2023)]. Zeng et al. 2014 [ncontrols = 29, nclinical = 24, citations=169 (GS, April 2023)].
    • Original effect size: 62.5% - 95% Classification Accuracy (cross-validation; CV); 16.7–83.3% (Hold-out validation).
    • Replication effect size: Bhaumik et al.: 76.1% (CV); 77.8% (hold-out validation). Lord et al.: 99.3% (CV). Zeng et al./Zeng et al./Ma et al./Qin et al.: 69.8–96.2% (CV). Yu et al.: 80.9% (CV). Guo et al.: 90.5% (CV). Cao et al.: 84.2% (CV). Ramasubbu et al.: 49–66% (CV), mixed, only significant in group with most severe symptoms. Sundermann et al.: no significant results in main analysis on whole sample (ES not reported); only significant in group with most severe symptoms, 40.8–65.0% (CV), 54.2–61.7% (hold-out validation).
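
The oxytocin entry above notes that the original effect size was not reported but could be calculated from the quoted counts (13 of 29 investors at maximal trust under oxytocin versus 6 of 29 under placebo). A minimal sketch of one way to do this is shown here, assuming Cohen's h and an odds ratio are acceptable metrics for two independent proportions; the function names are illustrative and this is not the calculation used by the original authors or by the curators of this list.

```python
import math


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def odds_ratio(hits_a: int, n_a: int, hits_b: int, n_b: int) -> float:
    """Odds of a 'hit' in group A divided by the odds in group B."""
    return (hits_a / (n_a - hits_a)) / (hits_b / (n_b - hits_b))


# Counts quoted in the Kosfeld et al. (2005) entry: maximal-trust investors.
oxytocin_max, placebo_max, group_n = 13, 6, 29

h = cohens_h(oxytocin_max / group_n, placebo_max / group_n)
oratio = odds_ratio(oxytocin_max, group_n, placebo_max, group_n)

print(f"Cohen's h = {h:.2f}, odds ratio = {oratio:.2f}")
```

With these counts the sketch gives h of about 0.52 and an odds ratio of about 3.11. Both describe only the share of investors at the maximal trust level, not the full distribution of transfers, so they should be read as rough indications rather than the study's effect size.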


Psychiatry

  • Low self-esteem on poor mental health/psychological outcomes. Poor self-esteem results in a decrease in self-appreciation, producing self-defeating attitudes, poor mental health, social problems or risk behaviours.

    • Status: not replicated
    • Original paper: ‘Pygmalion in the classroom’, Rosenthal and Jacobson 1968; teachers were told that certain children would perform better based on a test that was actually non-existent, N = NA. [citations=13722 (GS, March 2023)].
    • Critiques: Baumeister et al. 2003 [review paper, n=unclear (the review started with 15,000 sources and narrowed these down, but the final number included is not clear), citations=6484 (GS, February 2023)]. Keane and Loades 2017 [systematic review, N = 10 studies, citations=102 (GS, February 2023)].
    • Original effect size: NA.
    • Replication effect size: Baumeister et al.: theoretical review, effect size not reported; showed some mixed evidence but mostly refuted the claims. They found that self-esteem was not related to smoking, alcohol, or drug use and seemed to be only minimally associated with interpersonal success; the relationship with school performance seems to be that better school performance leads to higher self-esteem rather than the other way around. Self-esteem was moderately correlated with depression. Keane and Loades: d = 0.37 - 2.26 for the co-occurrence of self-esteem and mental health diagnoses (i.e., anxiety and depressive disorders).

  • Rorschach Test (Rorschach inkblot test). A diagnostic tool for psychiatric conditions in which subjects' perceptions of inkblots are recorded and analysed using psychological interpretation, complex algorithms, or both.

    • Status: NA
    • Original paper: ‘Psychodiagnostik’ (Psychodiagnostics), Rorschach 1921; book, n=NA. [citations=536 (GS, April 2023)].
    • Critiques: Garb 1998 [book, n=NA, citations=1059 (GS, April 2023)]. Lilienfeld et al. 2006 [n=NA, citations=36 (GS, April 2023)]. Mihura et al. 2013 [systematically reviewed the validity over 53 meta-analyses examining variables against externally assessed criteria (e.g., observer ratings, psychiatric diagnosis), k = 770, and 42 meta-analyses examining variables against introspectively assessed criteria (e.g., self-report), k = 386, citations=499 (GS, April 2023)]. Wood et al. 2000 [review paper, n=NA, citations=180 (GS, April 2023)].
    • Original effect size: NA.
    • Replication effect size: Garb/Lilienfeld et al.: These indicate that when clinicians with access to questionnaire data or life histories of patients also use data from the Rorschach test, their predictive accuracy actually decreases, possibly because they place more weight on the Rorschach results, which are of lower quality than data from other sources. Mihura et al.: the mean validity was r = .27 (for externally assessed criteria) as compared to r = .08 (for introspectively assessed criteria, e.g., self-report). Wood et al.: The test has some merit in detecting thinking disorders (although this is thought to be non-projective rather than projective, which is meant to be the intention of the test; Dawes 1994) but is not related to other conditions such as depression, anxiety, or antisocial personality disorder.

  • Lunar effect (Transylvania effect). Full moons lead to strange occurrences in human behaviour.

    • Status: mixed
    • Original paper: ‘The Lunar effect’, Lieber 1978; correlational study, N ≈ 26,000. [Citations= 118 (GS, January 2023)].
    • Critiques: Gutiérrez-Garcia and Tusell 1997 [n=897 deaths by suicide, citations=64 (GS, January 2023)]. Kung and Mrazek 2005 [n=1,826 nights (186 nights fit the definition of the full-moon effect), citations=15 (GS, January 2023)]. Kamat et al. 2014 [n=559, citations=17 (GS, January 2023)]. Rotton and Kelly 1985 [meta-analysis, N≈781,000 across 37 studies, citations=262 (GS, January 2023)].
    • Original effect size: Effects on criminal offences - (Cohen’s) h=.03 (reported in Rotton and Kelly 1985); Suicide and Self-harm: ES not reported; Psychiatric admissions: ES not reported (but study found disproportionate number of episodes during full moon).
    • Replication effect size: All reported in Rotton and Kelly (1985): Homicides - Frey et al.: (Cohen’s) h=.06. Lester: rpb=.10. Lieber and Sherin: h=.00 to h=.02. Pokorny: h=-.01. Pokorny and Jachimczyk: h=.00. Tasso and Miller: h=.18. Combined probabilities for lunar indexes (full moon) Unweighted Z = 0.93 (n.s.); Criminal offences - Forbes and Lebo: h=-.01. Frey et al.: h=.00. Purpura: h=.03. Tasso and Miller: h=.04. Combined probabilities for lunar indexes (full moon) Unweighted Z = 2.78 (significant); Suicide and Self-harm - DeVoge and Mikawa: h=-.02. Frey et al.: h=.00. Garth and Lester: h=.01. Jones and Jones: h=-.01. Lester: rpb=-.03. Lester et al.: h=.07. Ossenkamp and Ossenkamp: h=.01. Pokorny: h=.03. Taylor and Diespecker: h=.04. Combined probabilities for lunar indexes (full moon) Unweighted Z = 0.80 (n.s.); Psychiatric disturbances - Angus: h=-.04 to h=.08. Chapman: h=.09. Frey et al.: h=.00. Gilbert: r=-.13 to r=-.02. Templer and Veleber: h=.08. Combined probabilities for lunar indexes (full moon) Unweighted Z = 0.37 (n.s.); Psychiatric admissions - Bauer and Hornick: h=.00. Blackman and Catalina: rpb=.54. Climent and Plutchik: h=.00 to h=-.04. Edelstein et al.: h=.03. Geller and Shannon: h=.04. Osborn: h=.06. Pokorny: h=-.02. Walters et al.: rpb=-.67. Weiskott and Tipton: h=.03. Combined probabilities for lunar indexes (full moon) Unweighted Z = -0.09 (n.s.); Crisis calls - Angus: h=.02. DeVoge and Mikawa: h=-.03. Michelson et al.: r=.01. Templer and Veleber: h=-.11. Weiskott: h=.08. Combined probabilities for lunar indexes (full moon) Unweighted Z = 0.27 (n.s.). Gutiérrez-Garcia and Tusell: ES not reported, but no relationship between lunar phases and suicide. Kamat et al. 2014: ES not reported, but no statistical difference when comparing the number of psychiatric-related visits between the actual full moon day and the controls at 7 and 10 days (P = 0.7608 and P = 0.8323, respectively). Kung and Mrazek: ES not reported, but t test (not reported) showed no significant differences between the number of patients seen on full-moon (M=2.30) and non-full-moon nights (M=2.32).

  • Lack of a Theory of Mind is universal in autism. All autistic people fail to understand that other people have a mind or that they themselves have a mind.


Parapsychology

  • Precognition. Undergraduates improve memory test performance by studying after the test.

    • Status: not replicated
    • Original paper: ‘Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect’, Bem 2011; 9 experiments with: Study 1: n = 100; Study 2: n = 150; Study 3: n = 100; Study 4: n = 100; Study 5: n = 100; Study 6: n = 150; Study 7: n = 200; Study 8: n = 100; Study 9: n = 50. [citations = 1216 (GS, March 2022)].
    • Critiques: Gelman and Loken 2013 [newspaper article, n=NA, citations=0 (GS, April 2023)]. Ritchie et al. 2012 [3 replications: Replication 1: n = 50, Replication 2: n = 50; Replication 3: n = 50, total n=150, citations=235 (GS, December 2021)]. Schimmack and Chen 2018 [blog, n=NA, citations=0 (GS, April 2023)].
    • Original effect size: Study 1: d = 0.25; Study 2: d = 0.20; Study 3: d = 0.26; Study 4: d = 0.23; Study 5: d = 0.22; Study 6: negative trials: d = 0.15, erotic trials: d = 0.14; Study 7: d = 0.09; Study 8: d = 0.19; Study 9: d = 0.42; mean effect size d = 0.22.
    • Replication effect size: All effect sizes are reported in Ritchie et al.: Replication 1: d = 0.30, Replication 2: d = -0.39, combined: d = 0.04 (converted using this).
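
The Precognition entry lists one effect size per study (two for Study 6) together with a mean of d = 0.22. As a quick consistency check, the sketch below recomputes an unweighted mean from the listed values. It assumes, without confirmation from the source, that the two Study 6 values are averaged first so that each study counts once; the variable names are illustrative.

```python
from statistics import mean

# Per-study effect sizes (Cohen's d) as listed in the entry above.
# Study 6 reports two values (negative and erotic trials).
study_d = {
    1: [0.25],
    2: [0.20],
    3: [0.26],
    4: [0.23],
    5: [0.22],
    6: [0.15, 0.14],
    7: [0.09],
    8: [0.19],
    9: [0.42],
}

# Assumption: average within a study first, then across studies (unweighted).
per_study = [mean(values) for values in study_d.values()]
mean_d = mean(per_study)

print(f"unweighted mean d across {len(per_study)} studies = {mean_d:.2f}")
```

Under that assumption the recomputed mean rounds to 0.22, matching the listed value. A sample-size-weighted mean would give more weight to the larger studies and need not agree with this figure.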

Evolutionary psychology

  • Fertility facial-preferences effect. Women prefer more masculine rather than feminised faces of potential partners during the fertile phase of their menstrual cycle. The preference for secondary sexual traits in male face shapes varies with the probability of conception.

    • Status: mixed
    • Original paper: ‘Menstrual cycle alters face preference’, Penton-Voak et al. 1999; 2 studies including two sessions (during low vs. high conception risk in ovulatory cycle): (1) on Japanese subjects n=39 and (2) on British subjects n=65. [citations=992 (GS, June 2022)].
    • Critiques: Harris 2011 [n=853 (effect tested on 258 women), citations=14 (GS, June 2022)].
    • Original effect size: not reported; ηp2 = 0.20 (Japanese sample) and ηp2 = 0.04 (British sample) [calculated for main effect of conception risk using Lakens 2018].
    • Replication effect size: Harris: not reported; d = 0.29 (for Caucasian faces) and d = 0.00 (for Japanese faces) [calculated for main effect of conception risk using Lakens 2018].

  • Dunbar’s number. The number of neocortical neurons limits the organism’s information-processing capacity and this then limits the number of relationships that an individual can monitor simultaneously. Humans are cognitively or emotionally limited to 150 relationships with other people.

    • Status: not replicated
    • Original paper: ‘Neocortex size as a constraint on group size in primates’, Dunbar 1992; correlational design, n=38 genera of primates with mean brain volumes, few brains per genus. [citations=3168 (GS, December 2022)].
    • Critiques: Lindenfors et al. 2021 [different datasets with n= 71 to n =142, citations=23 (GS, December 2022)].
    • Original effect size: neocortex ratio and mean group size r2 = 0.764 [translates to an expected human group size of ≈150 members, reported in Lindenfors et al.].
    • Replication effect size: Lindenfors et al.: human group size average of 69.2 individuals [3.8, 292.0].

  • Romantic priming. Looking at attractive women increases men’s conspicuous consumption, time discounting and risk-taking.

    • Status: not replicated
    • Original paper: ‘Do pretty women inspire men to discount the future?’, Wilson and Daly 2003; within-subjects design, n=209. [citation=613 (GS, February 2022)].
    • Critiques: Shanks et al. 2015 [the authors show that the 43 previous studies have an implausibly skewed funnel plot and also ran 8 failed replications; citations=109 (GS, April 2023)].
    • Original effect size: d=0.55 [-0.04, 1.13] for the difference between men and women; meta-analytic d= 0.57 [0.49, 0.65].
    • Replication effect size: Shanks et al.: d=0.00 [-0.12, 0.11].

  • Implicit religious priming. Implicitly priming god concepts by unscrambling sentences with words relating to religion increases prosocial behaviour in an anonymous economic game.

  • Implicit analytic priming. Implicitly priming analytic thinking by seeing a photo of Auguste Rodin’s The Thinker decreases belief in God.

    • Status: not replicated
    • Original paper: ‘Analytic thinking promotes religious disbelief’, Gervais and Norenzayan 2012; experimental design, n=57. [citation=601 (GS, December 2021)].
    • Critiques: Camerer et al. 2018 [n=224 and n=531, citations=871 (GS, December 2021)]. Sanchez et al. 2017 [n=941, citations=59 (GS, December 2021)].
    • Original effect size: d=-0.25 to d=0.12.
    • Replication effect size: Camerer et al.: Study 1 r=-0.055, Study 2 r=-0.035. Sanchez et al.: d=-0.25 to d=0.12.

  • Menstrual cycle version of the dual-mating-strategy. Hypothesis that “heterosexual women show stronger preferences for uncommitted sexual relationships [with more masculine men] during the high-fertility ovulatory phase of the menstrual cycle, while preferring long-term relationships at other points”.

    • Status: replicated
    • Original paper: ‘Menstrual cycle variation in women’s preferences for the scent of symmetrical men’, Gangestad and Thornhill 1998; within-subjects design, n=100. [citations=673 (GS, April 2023)].
    • Critiques: Gildersleeve et al. 2014 [meta-analysis, total sample of 134 effects from 38 published and 12 unpublished studies, n=5471, citations=491(GS, April 2023)]. Jones et al. 2018 [review paper, n=NA, citations=111 (GS, April 2023)]. Wood et al. 2014 [meta-analysis, k=58, citations=277(GS, April 2023)].
    • Original effect size: the greater the fertility risk of a woman, the greater her preference for scent associated with male symmetry, r =0.54.
    • Replication effect size: Gildersleeve et al.: women’s preference for men with characteristics that reflected genetic quality ancestrally was approximately 0.15 of a standard deviation stronger at high fertility than at low fertility; weighted mean g in a short-term context g = 0.21, SE = 0.06, in an unspecified relationship context g = 0.16, SE = 0.05, and in a long-term context was near zero g = 0.06, SE = 0.06, p = .32; comparing the three contexts revealed that the weighted mean g was larger in a short-term context than in a long-term context, and this difference was statistically significant (p = .002). Jones et al.: NA. Wood et al.: Preferences of fertile versus nonfertile women – Testosterone g=0.11 [−0.20, 0.42], Masculinity g=0.08 [−0.01, 0.16], Dominance g=0.05 [−0.06, 0.16], Symmetry g=0.22 [0.05, 0.39], Kindness g=0.07 [−0.04, 0.18], Health g=−0.19 [−0.29, −0.09]; Between-phase effect sizes across short term, long term, and no relationship contexts – Masculinity: Short term g=0.09 [−0.07, 0.24], Long term g=0.03 [−0.08, 0.13], No context g=0.09 [−0.03, 0.20]; Dominance: Short term g=0.02 [−0.14, 0.18], Long term g=−0.01 [−0.16, 0.14], No context g=0.10 [−0.14, 0.34]; Symmetry: Short term g=0.11 [−0.18, 0.40], Long term g=0.06 [−0.13, 0.25], No context g=0.32 [0.09, 0.55]; Kindness: Short term g=0.11 [−0.07, 0.28], Long term g=0.06 [−0.08, 0.20], No context g=−0.004 [−0.27, 0.26]; Health: Short term g=−0.33 [−0.67, 0.02], Long term g=0.00 [−0.20, 0.20], No context g=−0.24 [−0.34, −0.13].

  • Menstrual cycle and lunar influence. Women with a 29.5+/-1 day menstrual cycle tend to menstruate during a full moon.

    • Status: not replicated
    • Original paper: ‘Lunar Influences on the Reproductive Cycle in Women’, Cutler et al. (1987); cross-sectional design, N = 229. [citation=54(GS, March 2022)]​.
    • Critiques: Komada et al. 2021 [n=3163, citations=3(GS, March 2022)].
    • Original effect size: not reported, two plots showed significant difference from uniform distribution, suggesting that the menstrual phase coincided more often with the full moon.
    • Replication effect size: Komada et al.: z = −0.58.​

  • Large parents have more sons. Bigger and taller parents have more sons.

    • Status: not replicated
    • Original paper: ‘Big and tall parents have more sons: Further generalizations of the Trivers–Willard hypothesis’, Kanazawa 2005; correlational design, N=22,680. [citations=105 (GS, December 2022)].
    • Critiques: Denny 2007 [N=8,249, citations=22 (GS, December 2022)].
    • Original effect size: effects of weight on number of boys b = 0.0183 and number of female foetuses b = -0.0154; effects of height on number of girls b = -0.0152 and number of male foetuses b = 0.0182.
    • Replication effect size: Denny: effects of father’s height on having male children not significant in three regression models Z = 0.080 to Z = 0.310; effects of mother’s height on having male children not significant in three regression models Z = 0.010 to Z = 0.230; effects of father’s BMI on having male children not significant in three regression models Z = 0.320 to Z = 0.400; effects of mother’s BMI on having male children not significant in three regression models Z = 0.230 to Z = 0.440.

  • Men’s strength in particular predicts opposition to egalitarianism. Muscular men are less likely to support social and economic equality.

    • Status: not replicated
    • Original paper: ‘The Ancestral Logic of Politics: Upper Body Strength Regulates Men’s Assertion of Self-Interest Over Economic Redistribution’, Petersen et al. 2013; cross-sectional design, n = 213 in Argentina, 486 in United States, 793 in Denmark, N = 1392. [citation=213(GS, May 2022)].
    • Critiques: Gelman and Loken 2019 [n = not reported, citations = 726(GS, May 2022)]. Petersen and Laustsen 2019 [n = 12 different samples from multiple countries, citations = 45(GS, April 2023)].
    • Original effect size: not reported.
    • Replication effect size: Gelman and Loken: not reported, but the effect disappears once participant age is included. Petersen and Laustsen: the authors focus on statistical significance rather than effect size. The overall male effect was b = 0.17 and the female effect was b = 0.11, with a nonsignificant difference between the two (p = 0.09). (They prefer to emphasise the lab studies over the online studies, which showed a stronger difference.) Notably, strength or “formidability” has an effect in both genders, whether or not their main claim about a gender difference holds up.

  • Sex differences in mate preferences. Men and women differ in their preferences for a potential mate, which reflects different evolutionary selection pressures. Across 33 countries in the original study (45 in the replication), researchers found universal sex differences such as: men, more than women, prefer attractive, young mates, and women, more than men, prefer older mates with financial prospects.

    • Status: replicated
    • Original paper: ‘Sex differences in human mate preferences: Evolutionary hypotheses tested in 37 cultures’, Buss 1989; questionnaire asking about preferences concerning potential mates such as: good financial prospect, ambition and industriousness, age difference between self and spouse, physical attractiveness, chastity, etc., n=10,047. [citations=6,136(GS, June 2022)]​.
    • Critiques: Walter et al. 2021 [n=14,399, citations=96(GS, June 2022)].
    • Original effect size: N/A. The author provided only country-level t-tests separately for each characteristic.
    • Replication effect size: Walter et al.: b = -0.30 [Mate preferences were standardised across countries prior to analysis, so this and all b values can be interpreted as equivalent to Cohen’s d’s].

  • Men’s preference for competition. Men are more likely to select tournaments than women, because women tend to avoid competition and men look for competition.

    • Status: not replicated
    • Original paper: ‘Do women shy away from competition? Do men compete too much?’, Niederle and Vesterlund 2007; within-subject experiment, n=80. [citations=4001(GS, February 2023)]​.
    • Critiques: Price 2020 direct replication of Niederle & Vesterlund study [n=60, citations=23 (GS, February 2023)].
    • Original effect size: Men are more likely to select tournaments than women – probit regression coefficients in various regression models -.38 to -.16.
    • Replication effect size: Price: A female in the replication has a 22.4 percentage point higher probability (58.0% versus 35.6%) of entering into the tournament than a female in NV (original study) after controlling for task performance; probit regression coefficient for the NV*Female interaction term from -0.65 (non-significant) to -0.99 (significant) [not replicated].

  • Orgasm gap (Orgasm equality). There is a gendered orgasm gap, with men experiencing orgasm more frequently than women in heterosexual sexual encounters.​

    • Status: replicated.
    • Original paper: ‘The Incidental Orgasm: The Presence of Clitoral Knowledge and the Absence of Orgasm for Women’, Wade et al. 2005; correlational study, n=833 [citations=135(GS, May 2023)]​.
    • Critiques: Garcia et al. 2014 [n=2,850 single individuals, citations=134(GS, May 2023)]. ​Mahar et al. 2020 [systematic review, n=NA, citations=74(GS, May 2023)].
    • Original effect size: The orgasm gap was 52 percent: 39 percent of women, compared to 91 percent of men, usually or always experienced orgasm in partnered sex; d= -1.26 [-1.44, -1.07] (estimated from the data in Table 6 and using this conversion).
    • Replication effect size: Garcia et al.: Compared with women, men reported a significantly higher mean occurrence rate of orgasm frequency, η2 = 0.12. Mahar et al.: ES not reported; six covered studies all showed that males report more frequent orgasm than females.


Psychophysiology

  • Sympathetic nervous system activity predicts political ideology. There are psychophysiological correlates of political ideology – conservatives react with higher levels of electrodermal activity (EDA)/ skin conductance to threatening stimuli than liberals.

    • Status: mixed
    • Original paper: ‘Political Attitudes Vary with Physiological Traits’, Oxley et al. 2008; correlational study, n=46. [citations=858(GS, December 2022)].
    • Critiques: Osmundsen et al. 2022 [n=318, meta-analyses: n = 484 across seven studies, citations=40(GS, December 2022)].
    • Original effect size: Effects of physiological reactions on support for socially protective policies β = 0.377.
    • Replication effect size: Osmundsen et al.: significant OLS effects of the EDA responses to threat on the Wilson-Patterson Battery, b = .33, and the Social Conservatism Scale, b = .33, but in the US sample only; not significant in the Dutch sample (b = -.2 and b = -.22, respectively) and overall (b = .13 and b = .11, respectively) [partly replicated]. All reported in Osmundsen et al.: OLS effects of EDA on social conservatism – Aaroe: 0.28 [-0.11, 0.66]. Dodd: 0.68 [0.34, 1.02]. Knoll: -0.17 [-0.43, 0.08]. Oxley: 0.54 [0.25, 0.84]. Petersen: -0.23 [-0.53, 0.07]. Smith: 0.18 [-0.14, 0.49]. OLS effects of EDA on left-right self-placement: Aaroe: 0.11 [-0.25, 0.47]. Coe: -0.00 [-0.18, 0.17]. Dodd: 0.53 [0.26, 0.81]. Smith: 0.29 [-0.01, 0.58]. Petersen: -0.09 [-0.38, 0.20]. OLS effects of EDA on economic conservatism: Knoll: 0.29 [0.05, 0.53]. Petersen: -0.19 [-0.48, 0.10]. Smith: -0.12 [-0.33, 0.08].

Behavioural Genetics

  • 5-HTT Gene-by-Environment Interaction (5-HTT G x E). Polymorphisms in the serotonin transporter gene-linked promoter region (5-HTTLPR) moderate the experience of depression after stressful life events. People homozygous for the “short” allele (s/s) are significantly more likely to experience depression than people homozygous for the “long” allele (l/l) after multiple stressful life events; heterozygotes (s/l) demonstrate an intermediate response.

    • Status: not replicated
    • Original paper: ‘Influence of Life Stress on Depression: Moderation by a Polymorphism in the 5-HTT Gene’, Caspi et al. 2003; quasi-experimental analysis of how genotype (3 levels: s/s, s/l, l/l) and number of stressful life events (5 levels: 0, 1, 2, 3, 4+) across various categories (employment, financial, housing, health, relationship) that occurred after the 21st birthday and before the 26th birthday interacted to produce depression (measured by self-reported depression symptoms, probability of major depression episode, probability of suicide ideation/attempt, and informant reports of depression), 847 26-year-old participants from the Dunedin Study (a longitudinal study of 1037 non-Maori babies born between 1972-04-01 and 1973-03-31 in New Zealand). [citations = 6223 (Dimensions, June 2022)].
    • Critiques: Chabris et al. 2012 [n=samples of 5,571, 1,759, and 2,441 individuals, citations=347 (GS, April 2022)] for intelligence. Culverhouse et al. 2017 [n = 28,252 (21 studies), meta-analysis, citations = 173 (Crossref, June 2022)]. No good evidence that 5-HTTLPR is strongly linked to depression, insomnia, PTSD, anxiety, and more. Farrell et al. 2015 [n=25 historical candidate genes for schizophrenia, citations=345 (GS, April 2022)] for schizophrenia. Serretti et al. 2007 [review paper, n=NA, citations=178 (GS, April 2022)] for the overall phenotypic profile of HTR2A variant carriers.
    • Original effect size: N/A (unreported by Caspi; however, an OR for Caspi’s data was calculated by Risch et al. 2009 and displayed in Figure 2C approximately 1.3).
    • Replication effect size: Chabris et al.: We sought to replicate published associations between g and 12 specific genetic variants (in the genes DTNBP1, CTSD, DRD2, ANKK1, CHRM2, SSADH, COMT, BDNF, CHRNA4, DISC1, APOE, and SNAP25) using data sets from three independent, well-characterized longitudinal studies. Of 32 independent tests across all three data sets, only 1 was nominally significant. Culverhouse et al.: OR = 1.05 [0.94 - 1.16]. Farrell et al.: historical candidate gene literature did not yield clear insights into the genetic basis of schizophrenia.

Applied Linguistics

  • Critical period hypothesis. Grammar-learning ability remains intact until the cusp of adulthood (17.4 years) and then declines steadily.

  • Motivational role of L2 vision. Mental imagery of oneself as a successful language user in the future can enhance one’s motivation and performance.

    • Status: not replicated
    • Original paper: ‘Motivation, vision and gender’, You et al. 2016; correlational design, n=10,569 [citations=172 (GS, October 2022)].
    • Critiques: Hiver and Al-Hoorie 2020 [n=1297, citations=36(GS, Oct 2022)].
    • Original effect size: η2 =0.08.
    • Replication effect size: Hiver and Al-Hoorie: Model 1 - Ideal L2 self on grades (β=0.38) and intended effort (β=0.75), Ought-to L2 self on grades (β=-0.11) and intended effort (β=0.21); Model 2 - Ideal L2 self on grades (β=0.44), Ought-to L2 self on grades (β=-0.10).


Educational Psychology

  • Flipped learning. Students learn better if they do homework about a lesson before coming to class to study that lesson.

    • Status: replicated
    • Original paper: ‘Flip Your Classroom: Reach Every Student in Every Class Every Day’, Sams and Bergmann 2012; book, n = NA. [citation=6585(GS, December 2021)]​.
    • Critiques: Cheng et al. 2019 [n=7912, citations=195 (GS, January 2022)]. Låg and Sæle 2019 [n=not reported, number of reports=272, citations=106 (GS, January 2022)]. Lo and Hew 2017 [n=NA, citations=423 (GS, December 2021)]. Lo and Hew 2019 [n=5329, citations=43 (GS, January 2022)]. Shi et al. 2020 [n=6947, citations=60 (GS, January 2022)]. Strelan et al. 2020 [n=33678, citations=107 (GS, January 2022)]. van Alten et al. 2019 [n=24771, citations=239 (GS, January 2022)]. Vitta & Al-Hoorie 2020 [n=4220, citations=17 (GS, January 2022)]. Xu et al. 2019 [n=4295, citations=33 (GS, January 2022)].
    • Original effect size: NA, theoretical paper/book (although the authors report descriptive data suggesting that the flipped class model helped students with lower maths skills perform at a similar level to a group with higher maths skills in a mathematically heavy science class).
    • Replication effect size: Strelan et al.: g = 0.50 [0.42, 0.52] cross-disciplinary. Cheng et al.: g = 0.19 [0.11, 0.27] cross-disciplinary. Låg and Sæle: g = 0.35 [0.31, 0.40] cross-disciplinary. Lo and Hew: g = 0.29 [0.17, 0.41] engineering education. Shi et al.: g = 0.53 [0.36, 0.70] cross-disciplinary. van Alten et al.: g = 0.36 [0.28, 0.44] cross-disciplinary. Xu et al.: d = 1.79 [1.32, 2.27] nursing education in China. Vitta and Al-Hoorie: g = 0.99 [0.81, 1.16] second language learning. In Vitta and Al-Hoorie’s study, Trim and Fill suggested possible publication bias inflating the results, but the adjusted effect size remained sizable: g = 0.58 [0.37, 0.78].

  • Mindsets. People’s beliefs about whether their talents and abilities are subject to growth and improvement. In recent years, mindset proponents have argued that interventions work but only for low SES populations or low-performing students.

    • Status: mixed
    • Original paper: ‘Implicit Theories of Intelligence Predict Achievement Across an Adolescent Transition’, Blackwell et al. 2007; Study 1: longitudinal study, n=373; Study 2: n=99. [citations=4643 (GS, October 2021)].
    • Critiques: Foliano et al. 2019 [n=4,454, citations=45 (GS, October 2022)]. Li and Bates 2017 [Study 1: n=190, Study 2: n=222, Study 3: n=212, citations=30 (GS, January 2023)]. Macnamara and Burgoyne 2022 [meta-analysis of 63 studies, n = 97,672; citations=2 (GS, January 2023)]. Sisk et al. 2018 [two meta-analyses, k = 273, N = 365,915 and k = 43, N = 57,155, citations=837 (GS, April 2023)].
    • Original effect size: (between growth mindset and achievement) β=.10-.70, r=.12-.20.
    • Replication effect size: Foliano et al.: found no effect of mindset intervention on school grades (NA = 0). Li and Bates: no effect of growth mindset on grades in all three studies. Macnamara and Burgoyne: significant effect: d = 0.033 to 1.51; significant but opposite direction: d = -0.98 to -0.68; non-significant: d = -0.56 to 1.39; overall effect non-significant after correcting for publication bias. Sisk et al.: The relationship between mindsets and academic achievement is weak: of the 129 studies that they analysed, only 37% found a positive relationship between mindset and academic outcomes. Furthermore, 58% of the studies found no relationship and 6% found a negative relationship between mindset and academic outcomes. Evidence on the efficacy of mindset interventions is not promising: of the 29 studies reviewed, only 12% had a positive effect, 86% of the studies found no effect of the intervention and 2% found a negative effect of the intervention.

  • First instinct fallacy. The false belief that one’s initial thoughts/ideas/answers are right and closer to the truth than revised thoughts/ideas/answers. Surveys have shown that students generally believe that changing answers on a multiple-choice test lowers test scores, but research seems to show that most people who change their answers improve their test scores.

    • Status: replicated
    • Original paper: ‘Counterfactual thinking and the first instinct fallacy’, Kruger et al. 2005; Study 1: compared the anticipated and actual outcome of sticking versus switching, n=1561; Studies 2-4: tested the counterfactual thinking interpretation of the first instinct fallacy, total N=118. [citation=183(GS, November 2022)]​.
    • Critiques: Couchman et al. 2016 [n=62, citations=11 (GS, March 2023)].
    • Original effect size: Study 1 (that answer changes from wrong to right outnumber changes from right to wrong, changing benefits; not provided in paper, calculated here): Comparing changes from right to wrong and wrong to right: d ≈ 0.25, Comparing changes from right to wrong and wrong to wrong: d ≈ 0.64; Study 3 (analysing discrepancy between actual outcomes and participants' memory of those outcomes): Consequences of switching vs. sticking: η ² = .22, Memory consequences: η ² = .31.
    • Replication effect size: Couchman et al.: d=0.58. Students who were confident in their first answers were usually right, and those who were not confident were usually wrong. Changing answers from low to high confidence improved scores, and changing from high to low confidence lowered scores. The authors argue that metacognition, or thinking about one’s own thinking, helped students decide when to change answers, and that the first instinct fallacy depended on how confident students were in their first answers. The study additionally shows that although participants were generally better at answering questions when they were confident in their first instinct, revising answers can still improve performance, especially when participants are less confident in their initial response.

  • Mirror reading and writing in dyslexia (Strephosymbolia). Strephosymbolia is a term coined to describe a learning disorder in which symbols and especially phrases, words, or letters appear to be reversed or transposed in reading. The incidence of strephosymbolia seems to be widely documented.​

    • Status: replicated
    • Original paper: ‘“Word-blindness” in school children’, Orton 1928; experimental clinic observational, n=15. [citations=535 (GS, March 2023)].
    • Critiques: Lavers 1981 [anecdotal account ~1-2/year, citations=3 (GS, March 2023)]. Rosen 1955 [n=1, citations=61 (GS, March 2023)].
    • Original effect size: NA.
    • Replication effect size: NA.

  • Pen mightier than the screen. Learning for conceptual-application questions is more effective when taking longhand notes than with a laptop.

    • Status: mixed
    • Original paper: ‘The Pen Is Mightier Than the Keyboard: Advantages of Longhand Over Laptop Note Taking’, Mueller and Oppenheimer 2014; Experiment 1, n = 65 participants after exclusions, Experiment 2, n = 149; Experiment 3, n = 109. [citation=1585 (GS, October 2022)]​.
    • Critiques: Urry et al. 2021 [N =145, n=68 in laptop condition, n = 74 in longhand condition, citations = 12 (GS, October 2022)].
    • Original effect size: Participants perform better on conceptual application questions when taking longhand notes compared to using a laptop, d = .38, p = .046 [calculated]. Experiment 2, Conceptual application questions, d = .41 [calculated]. Experiment 3 assessed the interaction between note-taking medium (longhand vs. laptop) and studying: among students who had the opportunity to study, longhand note takers did significantly better than laptop note takers, p = .024, d = 0.64 on conceptual questions. Non-significant effects are reported for factual recall and therefore not reported here.
    • Replication effect size: Urry et al.: The effect in the replication study was negligible in the opposite direction (Hedges’s g = −0.13 [−0.45, 0.20]), significantly different from the original effect, t(139.03) = −2.78, p = .003; A mini-meta analysis of eight studies found that taking notes longhand as opposed to with a laptop boosted total quiz performance across factual and conceptual item types to a negligible degree (Hedges’s g = 0.04 [−0.13, 0.20]); this effect was not statistically significant (z = 0.46, p = .645).

  • Sleep-assisted learning (hypnopedia). Memorising new information while sleeping.

    • Status: mixed
    • Original paper: Various sources, but one of the earliest published studies is ‘The breaking of a habit by suggestion during sleep’, Leshan 1942; between-subject experimental design, n=40 boys. [citations=36 (GS, March 2023)].
    • Critiques: Aarons 1976 review paper [n = 11 studies, citations=120(GS, March 2023)]. Arzi et al. 2012 [n = 69 - 14 exclusions = 55 total, citations=261(GS, March 2023)]. Simon and Emmons 1955 review paper [n=10 studies, citations=80(GS, March 2023)]. Wood et al. 1992 [n = 31, citations=261(GS, March 2023)].
    • Original effect size: ES not reported, but 40% of boys broke their fingernail-biting habit after listening to “My fingernails taste terribly bitter” message during 54 nights, as compared to 0% in two control groups.
    • Replication effect size: Aarons: ES not reported; unclear if sleep-assisted learning works, preliminary results indicate that age, sex, health, wake learning capacity, suggestibility, and motivation to learn are important factors. Arzi et al.: sleeping subjects learned novel associations between tones and odours: Odour pleasantness is processed during sleep in a pattern resembling that during wake – Sniff volume during sleep was greater following pleasant odorants than unpleasant odorants (d = 1.39, estimated from the reported t(27) = 3.7, p < 0.001 using this conversion); Participants learned a novel association and acted on this learning, all during sleep – the sniff volume during sleep was larger after a tone that was previously paired during sleep with a pleasant odour than after a tone that was previously paired during sleep with an unpleasant odour (d = 1.29, estimated from the t(19) = 2.9, p < 0.01 using this conversion). Simon and Emmons: ES not reported, but the critical analysis of ten sleep-learning studies concludes that it is highly speculative whether or not the studies reviewed in this paper have presented any acceptable evidence that learning during sleep is possible; even when results are found they cannot be construed as sleep-learning. Wood et al.: Homophone test - presentation of items affected the implicit memory scores of waking controls but not sleeping subjects, indicating the absence of implicit learning during sleep (ηp² = 0.85, estimated from the reported F(1, 20) = 118.41, p < .001); Category-instance task - waking controls but not sleeping subjects reported the paired instance more frequently for presented than non-presented items (ηp² = 0.77, estimated from the reported F(1, 20) = 67.8, p < .001 using this conversion); Recognition tests – Waking subjects were significantly more likely to recognize the presented homophones (ηp² = 0.84, estimated from the reported F(1, 22) = 106.46, p < .001 using this conversion), and the presented categories of the category-instance pairs (ηp² = 0.77, estimated from the reported F(1, 23) = 70.1, p < .001 using this conversion).

  • Dr. Fox Effect. Students rate educators higher based on qualities beyond the educational content itself (e.g., charisma, enthusiasm, entertainment).

    • Status: replicated.
    • Original paper: ‘The Doctor Fox lecture: A paradigm of educational seduction’, Naftulin et al. 1973; within-subjects, Experiment 1: n = 11, Experiment 2: n = 11, Experiment 3: n = 33. [citations = 743 (GS, February 2023)]​.
    • Critiques: Peer and Babad 2014 [n = 46, citations = 28 (GS, February 2023)]. Ware and Williams 1975 [n = 207, citations = 306 (GS, February 2023)].
    • Original effect size: ES = Not able to estimate. However, people placed greater weight on the presentation style than the content.
    • Replication effect size: Peer and Babad: Cohen’s w = 0.35 (estimated from test-statistic: χ²(1, N = 46) = 5.57, p = .018) (replicated). Ware and Williams: ωp² = 0.27 [0.17, 0.36] (estimated from test-statistic: F(1,201) = 74.2, p < .001) (replicated); However, there was an interaction between content-coverage and characteristics: ωp² = 0.03 [0.00, 0.09] (estimated from test-statistic: F(2,201) = 4.3, p = .015), specifically, medium sized content and positive characteristics led to the best overall rating.
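
The effect sizes in the Dr. Fox entry above are described as estimated from reported test statistics. The sketch below shows one common way to do such conversions; it is not necessarily the procedure the contributors used. It assumes Cohen's w = sqrt(χ² / N) and the F-based approximation of partial omega squared, (F − 1)·df_effect / ((F − 1)·df_effect + df_error + 1). The function names are illustrative only.

```python
import math


def cohens_w(chi2: float, n: int) -> float:
    """Cohen's w from a chi-square statistic and the total sample size."""
    return math.sqrt(chi2 / n)


def partial_omega_squared(f: float, df_effect: int, df_error: int) -> float:
    """Approximate partial omega squared from an F statistic and its degrees of freedom."""
    numerator = (f - 1.0) * df_effect
    return numerator / (numerator + df_error + 1)


# Test statistics as quoted in the entry above.
print(round(cohens_w(5.57, 46), 2))                    # Peer and Babad
print(round(partial_omega_squared(74.2, 1, 201), 2))   # Ware and Williams, main effect
print(round(partial_omega_squared(4.3, 2, 201), 2))    # Ware and Williams, interaction
```

Under these assumed formulas the three values round to 0.35, 0.27 and 0.03, in line with the figures listed in the entry; the confidence intervals require additional steps not shown here.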


Health Psychology

  • Stress as the main/sole cause of peptic ulcers. Stress was the main cause of peptic ulcers (with secondary contributing factors thought to be excess stomach acid, spicy food).

    • Status: mixed.
    • Original paper: Multiple sources, but one of the earliest published is ‘Chronic stress and peptic ulcer’, Gray et al. 1951; experimental and observational study, n=7. [citations=279 (GS, March 2023)].
    • Critiques: Skillman et al. 1969 [n=150, citations=406(GS, March 2023)]. Marshall and Warren 1983 [n=100, citations=7515(GS, March 2023)]. Gough et al. 1984 [n=484, citations=145(GS, March 2023)]. Levenstein et al. 1997 [n=484, citations=113(GS, March 2023)]. Levenstein et al. 1999 [n=4,595, citations=84(GS, March 2023)]. Levenstein et al. 2015 [n=3,379, citations=188(GS, March 2023)].
    • Original effect size: ES not reported but the studies suggested that chronic emotional and physical stress is transmitted to the stomach by a hormonal mechanism mediated through the adrenal gland and may induce gastrointestinal ulceration through the hypothalamic-pituitary-gastric pathway.
    • Replication effect size: Skillman et al.: Stress ulceration described; increased secretion of acid may be an important cause of this disease (replicated). Marshall and Warren: ES not reported, but the bacterium Helicobacter pylori was identified as correlated with ulcers and was present in almost all patients with active chronic gastritis, duodenal ulcer, or gastric ulcer and thus may be an important factor in the aetiology of these diseases (not replicated). Gough et al.: treating ulcers with antibiotics reduced recurrence by approximately 90-95% (not replicated). Levenstein et al.: five baseline psychological measures (depression, hostility, ego resiliency, social alienation or anomy, and personal uncertainty) had significant age-adjusted associations with incident ulcer [OR = 1.8 to 2.6]. Levenstein et al.: ES not reported, but the authors conclude that in most ulcer cases where stress is involved, H. pylori is likely to be present as well. The impact of the two factors may be additive (replicated). Levenstein et al.: ulcer incidence was significantly higher among subjects in the highest tertile of stress scores (3.5%) than the lowest tertile (1.6%) (adjusted odds ratio, 2.2; 95% confidence interval [CI], 1.2–3.9; P < .01); Stress had similar effects on ulcers associated with H. pylori infection and those unrelated to either H. pylori or use of nonsteroidal anti-inflammatory drugs (replicated).

  • Graphic images and cigarette use (Graphic warning labels). Introducing graphic warning labels (GWL) on cigarette packages reduces smoking prevalence.​

    • Status: mixed
    • Original paper: ‘Cigarette graphic warning labels and smoking prevalence in Canada: a critical examination and reformulation of the FDA regulatory impact analysis’, Huang et al. 2014; quasi-experimental study, n= adult smoking prevalence data from the USA and Canada for 1991–2009. [citations=122(GS, February 2023)]​.
    • Critiques: Harris et al. 2015 [n= 31,230 from nationwide registry of all pregnancies in Uruguay during 2007–2013, citations=61(GS, February 2023)]. Shang et al. 2017 [n= 21,683 across 18 countries, citations=18(GS, February 2023)]. Beleche et al. 2018 [n=data on smoking rates in Canada and USA for the population 15 years and older in the years 1994 to 2009, citations=2(GS, February 2023)].​
    • Original effect size: estimates of graphic warning labels (GWL) effects are statistically significant in all regression models and range from β=−0.13 to β=−0.22 corresponding to reduced smoking prevalence between 12.1% and 19.6%.
    • Replication effect size: Harris et al.: estimated effects of graphic image warning in different regression models on probability of quitting smoking during pregnancy significant and range from β=0.028 (non-significant) to β=0.038 [replicated]. Shang et al.: graphic warnings were associated with a 10.0% (OR = 0.89 [0.81, 0.97], p ≤ 0.01) lower cigarette smoking prevalence among adults with less than a secondary education or no formal education [replicated]. Beleche et al.: estimates of graphic warning labels effects in different regression models range from β=−0.03 (non-significant) to β=−0.22 (significant) [partly replicated].
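
The graphic warning label entry above pairs regression coefficients (β = −0.13 to −0.22) with percentage reductions in smoking prevalence (12.1% to 19.6%). One possible reading, offered only as an assumption and not confirmed by the entry, is that the outcome was modelled on a natural-log scale, in which case a coefficient β corresponds to a relative change of exp(β) − 1. The sketch below illustrates that conversion; the function name is hypothetical.

```python
import math


def percent_change_from_log_coefficient(beta: float) -> float:
    """Relative change (in percent) implied by a coefficient on a log-scale outcome.

    Assumes the dependent variable is the natural log of prevalence,
    which is an assumption made for illustration.
    """
    return (math.exp(beta) - 1.0) * 100.0


# Rounded coefficients as quoted in the entry above.
for beta in (-0.13, -0.22):
    change = percent_change_from_log_coefficient(beta)
    print(f"beta = {beta:+.2f} -> {change:.1f}% change in prevalence")
```

With the rounded coefficients this gives reductions of about 12.2% and 19.7%, close to the reported 12.1% and 19.6%; the small gap would be expected if the published percentages were computed from unrounded coefficients.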


Political Psychology

  • Stereotype threat on gender differences in political knowledge. Making gender stereotypes about political knowledge salient decreases women’s performance on political knowledge tests.

    • Status: Not replicated
    • Original paper: ‘Gender Differences in Political Knowledge: Bringing Situation Back In’, Ihme and Tausendpfund 2018; 2 experiments with Study 1: N=603; Study 2: N=377. [citations=18 (GS, February 2022)].
    • Critiques: Azevedo et al. [Preprint, not available] [n=1502, citations=NA].
    • Original effect size: Study 1: partial η2 =0.12; Study 2: partial η2 =0.03 ​
    • Replication effect size: Azevedo et al.: partial η2 =0.00​.

  • Avoidance of dissonance-arousing situations (Ideological asymmetry in dissonance avoidance). There are differences in how liberals and conservatives respond to dissonance-arousing situations – conservatives are more strongly motivated to avoid dissonance-arousing tasks than liberals.

    • Status: not replicated
    • Original paper: ‘“Not for All the Tea in China!” Political Ideology and the Avoidance of Dissonance-Arousing Situations’, Nam et al. 2013; two mixed-design experiments, n1=180, n2=159. [citations=147 (GS, December 2022)].
    • Critiques: Brandt and Crawford 2013 pre-print [n=451, citations=46 (GS, December 2022)].
    • Original effect size: Study 1 – Bush vs. Obama preference effects on compliance with the dissonance-arousing, politically relevant task in high choice situation b = -2.36; effect of political orientation on likelihood of compliance with the instruction to write a counter-attitudinal political essay under the circumstance of high perceived choice b = -.31; Study 2 – Reagan vs. Clinton preference effects on compliance with the dissonance-arousing, politically relevant task in high choice situation b = -1.36; effect of political orientation on likelihood of compliance with the instruction to write a counter-attitudinal political essay under the circumstance of high perceived choice b = -.01 (non-significant).
    • Replication effect size: Brandt and Crawford: non-significant effects on compliance behaviour of ideology (Democrat vs. Republican), b = -0.02, interaction between political ideology and essay topic (computers vs politics), b = 0.06, interaction between ideology and essay type (attitude consistent vs attitude inconsistent), b =0.05, and interaction between ideology, essay type and essay topic, b = -0.002 (not replicated).

  • Depressed-entitlement effect among women. In the absence of clear-cut standards of comparison, women reward/pay themselves significantly less money than do men for the same amount of work.

    • Status: replicated.
    • Original paper: ‘Sex, age, and equity behavior’, Leventhal & Lane 1970; between-subjects experiment, N = 61. [citation=269 (GS, October 2022)].
    • Critiques: Callahan-Levy and Messe 1979 [Study 1 n=126, Study 2 n=80, citation=222 (GS, October 2022)]. Jost 1997 [N=132, citation=192 (GS, October 2022)]. Hogue and Yoder 2003 [N=180, citation=78 (GS, October 2022)]. Lane and Messe 1971 [N=128, citation=174 (GS, October 2022)]. Major et al. 1984 [Experiment 1 n=76, Experiment 2 n=80, citation=408 (GS, October 2022)].
    • Original effect size: main effects of sex on rewards taking ηp2 = 0.22 / d = 0.53 [ηp2 calculated from F statistic values and Table 1 data and converted using this conversion].
    • Replication effect size: Callahan-Levy and Messe: Study 1 – Target X Sex of Target interaction effects on actual pay ηp2 = 0.16 / d = 0.43 [ηp2 calculated from F statistic values and converted using this conversion], Target X Sex of Target interaction effects on fair pay estimates ηp2 = 0.16 / d = 0.41 [ηp2 calculated from F statistic values and converted using this conversion] (replicated); Study 2 – main effect of sex on pay allocation ηp2 = 0.18 / d = 0.47 [ηp2 calculated from F statistic values and converted using this conversion], main effect of sex on fair pay estimates ηp2 = 0.11 / d = 0.35 [ηp2 calculated from F statistic values and converted using this conversion] (replicated). Jost: ηp2 = 0.044 / d = 0.21 [ηp2 calculated from F statistic values in Table 1 and converted using this conversion] (replicated). Hogue and Yoder: ηp2 = 0.165 [reported] / d = 0.87 [calculated from M, SD and (sub)sample size data for two independent samples in control condition] (replicated). Lane and Messe: main effects of sex on self-interest responses ηp2 = 0.099 / d = 0.33 [ηp2 calculated from F statistic values and converted using this conversion] (replicated). Major et al.: Study 1 – ηp2 = 0.13 / d = 0.38 [Sex X Social Comparison Condition interaction effects, ηp2 calculated from F statistic values and converted using this conversion]; Study 2 – ηp2 = 0.18 / d = 0.47 [main effects of Sex on amount of work for the same/fixed amount of money, ηp2 calculated from F statistic values and converted using this conversion] (replicated).
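
The entries above pair each ηp2 with a d obtained through a linked conversion that is not reproduced in this text. As a rough sketch only, the snippet below shows one standard route: recover ηp2 from an F statistic and its degrees of freedom, then compute Cohen's f = sqrt(ηp2 / (1 − ηp2)). Several of the d values listed above are numerically consistent with this f (for example, ηp2 = 0.22 gives 0.53); whether the linked tool uses exactly this formula is an assumption. For two equal-sized groups, textbooks usually give d = 2f, so the second column is shown for comparison. All function names are illustrative.

```python
import math


def eta_p2_from_f(f_stat: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_stat * df_effect) / (f_stat * df_effect + df_error)


def cohens_f(eta_p2: float) -> float:
    """Cohen's f = sqrt(eta_p2 / (1 - eta_p2))."""
    if not 0.0 <= eta_p2 < 1.0:
        raise ValueError("eta_p2 must lie in [0, 1)")
    return math.sqrt(eta_p2 / (1.0 - eta_p2))


def d_two_equal_groups(eta_p2: float) -> float:
    """Textbook conversion for two equal-sized groups, d = 2 * f (assumed design)."""
    return 2.0 * cohens_f(eta_p2)


if __name__ == "__main__":
    # eta_p2 values taken from the entries above
    for eta in (0.22, 0.18, 0.11, 0.044):
        print(f"eta_p2={eta:.3f}  f={cohens_f(eta):.2f}  2f={d_two_equal_groups(eta):.2f}")
```

Because the two conversions differ by a factor of two, it is worth checking which one a given tool applies before comparing d values across entries.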

  • Gender effects of political candidates. Voters evaluate political candidates based on their gender or sex.

  • Race/ethnicity effects of political candidates. Voters evaluate political candidates based on their race or ethnicity.

    • Status: not replicated
    • Original paper: ‘When White Voters Evaluate Black Candidates: The Processing Implications of Candidate Skin Color, Prejudice, and Self-Monitoring’, Terkildsen 1993; experiment, n = 348. [citations = 567 (GS, February 2023)].
    • Critiques: van Oosten et al. 2023 [meta-analysis, n = 253627, citations = 0 (GS, February 2023)].
    • Original effect size: Light-skinned black candidates were rated 0.7 points lower on an 11 point scale than white candidates (p<0.05). Dark-skinned black candidates were rated 0.6 points lower than white candidates (p<0.05).
    • Replication effect size: van Oosten et al.: Coefficient for Ethnic minority candidate: 0.002 (not significant at the 5% level); Coefficient for Black candidate: 0.007 (not significant at the 5% level); Coefficient for Latinx candidate: -0.002 (not significant at the 5% level); Coefficient for Asian candidate: 0.008 (p<0.05). The reference category in all analyses is white candidates. In sum, the meta-analysis found no support for the hypothesis that voters discriminate against ethnic minority political candidates, and in the case of Asian candidates, voters even favour them over white candidates. One caveat when comparing this replication to the original is that the replication considers voters of all races, while the original specifically singles out white voters. The meta-analysis does, however, single out minority voters in a separate analysis and finds that they prefer political candidates of the same race/ethnicity (b = 0.079, p<0.05).

  • The democratic peace. Citizens of democratic states are significantly more likely to approve of war against non-democratic states than war against other democracies.

    • Status: replicated
    • Original paper: ‘Why Don’t Democracies Fight Each Other? An Experimental Assessment of the “Political Incentive” Explanation.’, Mintz and Geva 1993; experiment, n = 44/34/39. [citations = 268 (GS, February 2023)]
    • Critiques: Tomz and Weeks 2013 [n = 762/1273/1944, citations = 426 (GS, February 2023)].
    • Original effect size: First experiment: -1.73 approval (on a scale from 0 to 10) to use force against democracies (p<0.05), compared to autocracies; Second experiment: -3.06 approval (p<0.01); Third experiment: -2.68 approval (p<0.01).
    • Replication effect size: Tomz and Weeks: First experiment: -13.3% approval to use force against democracies [-19.6, -6.9], compared to autocracies; Second experiment: -11.4% approval [-17, -5.9]; Third experiment: -11.5% approval [14.7, 5.3].

  • Moral foundations across the political spectrum. Moral Foundations Theory is a framework that claims that humans have five innate and universal moral values: Care, Fairness, Loyalty, Authority, and Purity. While extremely influential, it has failed to replicate, with studies critiquing its factor structure (Harper and Rhodes 2020) and universality (Davis et al. 2016).

    • Status: Not replicated
    • Original paper: ‘Liberals and conservatives rely on different sets of moral foundations.’, Graham et al. 2009; web-based survey (n=2,212), web-based survey (n=3,303), web-based survey (n=2,030), text analysis (n=88). [citation=4858 (GS, March 2023)].
    • Critiques: Curry 2019 [n=NA, citations=8 (GS, May 2023)]. Davis et al. 2019 [n=1183, citations=80 (GS, March 2023)]. Harper and Rhodes 2021 [n=750, citations=31 (GS, March 2023)].
    • Original effect size: Study 1 - β = 0.16-0.34. Average β = 0.25; Study 2 - β = 0.16-0.52; Study 3 - β = 0.14 (for the difference in average willingness to violate Harm foundation between liberals and conservatives), β = 0.58-0.91 (for the difference in average willingness to violate the three binding foundations between liberals and conservatives), β = 0.36 (for the difference in aggregated moral sacredness ratings between individualising and binding foundations), β = 0.18 (for the moderation of the above effect by politics), β = 0.43 (for the difference in average willingness to violate the foundations between libertarians and conservatives), β = 0.09 (for the difference in average willingness to violate the foundations between libertarians and liberals); Study 4 - β = 0.56 (for Harm foundation), β = 0.65 (for Fairness foundation), β = 1.27 (for Ingroup foundation), β = 0.81 (for Authority foundation), β = 0.99 (for Purity foundation).
    • Replication effect size: Curry: critiques MFT for (1) an ad hoc approach that cherry-picks moral values, (2) failure to include four types of cooperation relevant to morality, and (3) a lack of clear definitions and operationalizations, as well as a flawed questionnaire that measures preferences rather than values and confounds moral relevance with agreement. Davis et al.: White participants: β = 0.47, Black participants: β = 0.19; the researchers examined the applicability of MFT to Black Americans and found that it did not replicate well in Black samples, suggesting a bias toward White American morality. The six foundations of MFT did not capture the moral concerns of Black Americans, and the researchers suggested incorporating more dimensions of oppression and resistance. Harper and Rhodes: β = 0.14-0.50. They found that only three meaningful clusters emerged in their analysis: traditionalism, compassion, and liberty. They suggested that the MFQ may not be a valid measure of MFT and that the theory may need to be revised (failed to replicate; low effect size).

  • Backfire. Being confronted with corrections of previously held political misconceptions can strengthen, rather than weaken, belief in those misconceptions.

    • Status: not replicated
    • Original paper: ‘When Corrections Fail: The Persistence of Political Misconceptions’, Nyhan and Reifler 2010; 2 studies with different news piece stimuli and moderator variables, study 2 was divided into 3 parts (1: moderator = mortality salience, weapons of mass destruction in Iraq; 2: moderator = media source, A: weapons of mass destruction in Iraq, B: tax cuts increase government revenue, C: President Bush bans stem cell research), study 1: n=130, study 2: n=197. [citation=2,683 (GS, June 2022)].
    • Critiques: Wood and Porter 2019 [n=10,100, citations=564 (GS, June 2022)].
    • Original effect size: b = .198 to b = 0.359.
    • Replication effect size: Wood and Porter: b=3.5 to b=9.7.

  • Conspiracy mentality and political extremism. Refers to a positive relationship between the general tendency to endorse conspiracy theories (i.e., conspiracy mentality) and the political ideologies at either side of the political spectrum (i.e., extreme political ideologies).

    • Status: mixed
    • Original paper: ‘Political extremism predicts belief in conspiracy theories’, Van Prooijen et al. 2015; cross-sectional design, Study 1 n=207, Study 2a n=1,010, Study 2b n=1,297, Study 3 n=268. [citations=525 (GS, January 2023)].
    • Critiques: Bartlett and Miller 2010 [theoretical paper, n=NA, citations = 248 (GS, January 2023)]. Imhoff 2015 [book, n=NA, citations = 61 (GS, January 2023)]. Van der Linden et al. 2021 [total N = 5049, citations = 183 (GS, April 2023)]. Van et al. 2015 [n=7,553, citations = 147 (GS, January 2023)].
    • Original effect size: Study 1 (n=207 US citizens, β = .58, p = .04); Study 2a (n=1010 Dutch citizens, β = .35, p = .005); Study 2b (n=1297 Dutch sample, β = .53, p < .001); and Study 3 (n=268 Dutch sample, β = 0.70, p = 0.01), all evidenced a quadratic relationship between political ideology and conspiracy beliefs.
    • Replication effect size: Imhoff et al.: Study 1 (β = 0.062, s.e. 0.017, p = 0.001 [0.029–0.095]), Study 2 (β = 0.220, s.e. 0.031, p < 0.001, [0.160–0.281]). Hence, in a large-scale, cross-sectional study, performed across 26 countries (N=104,253), the authors showed that the predicted quadratic relationship between political orientation and conspiracy mentality was indeed significant. Van der Linden et al.: highlight the asymmetric relationship, reporting that US conservatives (rather than US liberals) are more likely to endorse specific conspiracy theories, and they were also more likely to espouse conspiratorial worldviews in general (r = .27 [0.24, 0.30]). Importantly, extreme conservatives were significantly more likely to engage in conspiratorial thinking than extreme liberals (Hedges' g = .77, SE = .07, p < .001).

  • Voters elect rather than affect policies. Elected politicians do not change their policies (measured through roll call votes) in response to changes in the median voter’s policy preferences. This means that voters merely elect politicians with policies that they support, but that they do not affect these politicians’ policies afterward.

    • Status: replicated
    • Original paper: ‘Do Voters Affect or Elect Policies? Evidence from the U.S. House’, Lee et al. 2004; quasi-experiment, N = 915. [citations = 983 (GS, February 2023)].
    • Critiques: Button 2017 [N = 915, citations = 3 (GS, February 2023)].
    • Original effect size: “Elect” Component: 22.84 (SE: 2.2); “Affect” Component: -1.64 (SE: 2.0). That is, the “Elect” component is statistically significantly different from zero, but the “Affect” component is not.
    • Replication effect size: Button: Replication with same method: Elect: 23.11 (SE: 2.02), Affect: -1.82 (SE: 1.47); Replication with local linear regression and triangular kernel: Elect: 20.12 (SE: 1.93), Affect: -1.66 (SE: 1.57); Replication with local linear regression and rectangular kernel: Elect: 18.98 (SE: 2.28), Affect: -1.31 (SE: 1.92); Replication with conventional nonparametric regression discontinuity design: Elect: 19.72 (SE: 4.3), Affect: -1.08 (SE: 3.56); Replication with bias-corrected regression discontinuity design: Elect: 19.33 (SE: 4.29), Affect: -0.97 (SE: 3.97). That is, the original results are replicated in every model specification.
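
The significance statements in this entry can be roughly checked from the reported estimates and standard errors alone. The following is a minimal sketch that assumes a normal approximation (a Wald z-test); the papers may have used different inference, and `wald_z` is an illustrative helper rather than anything from the original analyses.

```python
import math


def wald_z(estimate: float, se: float) -> tuple[float, float]:
    """Return (z, two-sided p) for estimate / se under a standard normal approximation."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = estimate / se
    # P(|Z| > |z|) for a standard normal equals erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    # (label, estimate, SE) taken from the entry above
    reported = [
        ("Lee et al. Elect", 22.84, 2.2),
        ("Lee et al. Affect", -1.64, 2.0),
        ("Button Elect (same method)", 23.11, 2.02),
        ("Button Affect (same method)", -1.82, 1.47),
    ]
    for label, est, se in reported:
        z, p = wald_z(est, se)
        print(f"{label}: z = {z:.2f}, p = {p:.3f}")
```

Under this approximation the “Elect” estimates sit far from zero while the “Affect” estimates fall within about one standard error of it, in line with the pattern described above.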

  • Personality correlates of sociopolitical attitudes. Liberals tend to socially conform whereas conservatives are more willing to violate established societal conventions.

    • Status: reversed.
    • Original paper: ‘The nature of the relationship between personality traits and political attitudes’, Verhulst et al. 2010; correlational study, N = 20,559 / 7234 twins. [citations=127 (GS, February 2023)] (same data and sample used in later studies by the same authors, Verhulst et al. 2012).
    • Critiques: Ludeke and Rasmussen 2016 [Review paper and independent study, n=1,085, citations=12 (GS, February 2023)]. Verhulst et al. 2016 Erratum [n=NA, citations=12 (GS, February 2023)].
    • Original effect size: correlations between General ideology (higher scores –more liberal) and Psychoticism (r = -.495), Extraversion (r =-.061), Neuroticism (r =-.008) and Social desirability (r =.261) among men and among women (r =-.566, r =-.177, r =.001, r =.357, respectively) (all significant at .01 or better).
    • Replication effect size: Ludeke and Rasmussen: Literature review: ES not reported but general conclusion that those on the right/conservative are typically higher in Conscientiousness, behavioural constraint (as measured by the Orderliness and Politeness aspects in the Big Five model of personality and the low pole of Eysenck’s Psychoticism construct) and on moralistic bias measures such as IM and EPQ-Lie (Reversed); Big Five measures and socio-political attitudes: Openness negatively correlated with Conservative self-placement (r = -.21), Authoritarianism (r = -.29), and Social Dominance Orientation (r = -.39), Conscientiousness positively correlated with Conservative self-placement (r = .21), and Authoritarianism (r = .19), Agreeableness negatively correlated with Social Dominance Orientation (r = -.39) (ps <.001); Eysenckian measures (EPQ): Psychoticism correlated negatively with Conservative ideological self-placement (r = -.22) and with Authoritarianism (r = -.20), but correlated positively with Social Dominance Orientation (r = .11) (ps <.001); the Lie scale correlated positively with Conservative ideological self-placement (r = .10) and with Authoritarianism (r = .17) (ps <.001) (Reversed). Verhulst et al.: coding error in the original manuscript: the descriptive analyses report that those higher in Eysenck’s psychoticism are more conservative, but they are actually more liberal; and where the original manuscript reports those higher in neuroticism and social desirability are more liberal, they are, in fact, more conservative (reversed).


Comparative Psychology

  • Gaze following in monkeys. Monkeys fail to follow the gaze of another agent, using the object choice task.

  • Pointing following in monkeys. Monkeys fail to follow the point of another agent, using the object choice task.

  • Gaze following in nonhuman primates. Nonhuman primates fail to follow the gaze of another agent, using the object choice task.

    • Status: mixed
    • Original paper: ‘Chimpanzee gaze following in an object-choice task’, Call et al. 1998; experimental design, n = 6 (same chimpanzees for both studies). [citation = 317(GS, June 2023)].
    • Critiques: Itakura (2004) [review, no n of studies reported, citations=87 (GS, June 2023)]. Kano et al. 2018 [n=29 in experiment 1, n=18 in experiment 2, n=38, citations= 29 (GS, June 2023)].
    • Original effect size: not reported.
    • Replication effect size: Itakura: not reported (review). Kano et al.: Experiment 1 – main effect of phase: ηp2 =.71. Experiment 2– main effect of phase: ηp2=.71. Main effect of condition: ηp2=.58. Main effect of phase with additional factor group: ηp2=.67. First-look-responses: Main effect of AOI and Condition: ηp2=.30 and .17 respectively. Viewing-time-responses: Main effect of AOI and Condition: ηp2=.15 and .45 respectively. Main effect of group: ηp2=.24. Interaction group and condition: ηp2=.18. Experiment 3– Main effect of phase: ηp2=.64. First-look-responses: main effect of condition: ηp2=.27. Viewing-time-responses: Main effect of condition: ηp2=.59. Main effect of group: ηp2=.31. Interaction group and condition: ηp2=.57.

  • Pointing following in nonhuman primates. Nonhuman primates fail to follow the point of another agent, using the object choice task.

    • Status: mixed
    • Original paper: ‘Production and comprehension of referential pointing by orangutans (Pongo pygmaeus)’, Call and Tomasello 1995; experimental design, n = 2 (same animals in both studies). [citation = 464(GS, June 2023)].
    • Critiques: Miklosi & Soproni 2005. [systematic review, 24 studies on different species, citations=468 (GS, June 2023)]. Clark et al. 2019 [meta-analysis, n= 470, citations= 21 (GS, June 2023)].
    • Original effect size: NA.
    • Replication effect size: Miklosi & Soproni: not reported (they only included some images in their review). Clark et al.: Temporal cue properties: r=.30; Ipsilateral pointing cues: r=.28.

  • Gaze following in domesticated dogs. Dogs follow the gaze of another agent, using the object choice task.

  • Pointing following in domesticated dogs. Dogs follow the point of another agent, using the object choice task.

    • Status: mixed
    • Original paper: ‘Use of experimenter-given cues in dogs’, Miklosi et al. 1998; experimental design, Experiment 1: n=3, Experiment 2: n=6. [citations=554 (GS, June 2023)].
    • Critiques: McKinley & Sambrook 2000 [n=16 domesticated dogs, citations= 303 (GS, June 2023)]. Miklosi and Soproni 2005 [systematic review, 24 studies on different species, citations=468 (GS, June 2023)]. Schneider et al. 2011 [N=48 domesticated dogs, citations= 101(GS, June 2023)]. Clark et al. 2019 [meta-analysis, n= 470, citations= 21 (GS, June 2023)].
    • Original effect size: NA.
    • Replication effect size: Miklosi & Soproni: NA. McKinley & Sambrook: NA. Clark et al.: Within dogs, those categorised as “close” (N = 174, Mdn Z = 1.26) scored higher than those categorised as “seldom” (N = 14, Mdn Z = -0.63) (Mann-Whitney U = 13.97, p <.001). For contralateral pointing cues, in contrast, within nonhuman primates and dogs, those categorised as “occasional” (N = 95, Mdn Z = 1.89) outperformed those categorised as “close” (N = 6, Mdn Z = 0.00), Mann-Whitney U = 136.5, p = 0.029. Subjects categorised as “close” (N = 356, Mdn Z = 0.89) scored higher than those categorised as “seldom” (N = 22, Mdn Z = -0.63), Mann-Whitney U = 1235.5, p < 0.001.

  • Pointing following in coyotes. Coyotes do not follow the pointing of a human agent, using the object choice task.

  • Gaze following in wolves. Wolves do not follow the gaze of a human agent, using the object choice task.

  • Pointing following in wolves. Wolves do not follow the pointing of a human agent, using the object choice task.

    • Status: mixed
    • Original paper: ‘The Domestication of Social Cognition in Dogs’, Hare et al. 2002; experiment, n=7. [citation=1335 (GS, February 2023)].
    • Critiques: Gácsi et al. 2009 [Study 1: n=9, Study 2: n=7, Study 3: n=8, citations=229(GS, February 2023)]. Lampe et al. 2017 [n=12, citations=60(GS, February 2023)]. Miklósi et al. 2003 [n=4, citations=852(GS, February 2023)]. Udell et al. 2008 [n=8, citations=405(GS, February 2023)]. Udell et al. 2012 [n=7, citations=54(GS, February 2023)]. Virányi et al. 2008 [Study 1: n=9, Study 2: n=7, Study 3: n=4, Study 4: n=10, citations=327(GS, February 2023)].
    • Original effect size: not reported.
    • Replication effect size: Gácsi et al.: not reported. Lampe et al.: not reported. Miklósi et al.: not reported. Udell et al. (2008): not reported. Udell et al. (2012): not reported. Virányi et al.: not reported.

  • Pointing following in Asian elephants. Asian elephants do not follow the pointing of a human agent, using the object choice task.

  • Pointing following in African elephants. African elephants follow the pointing of a human agent, using the object choice task.

  • Gaze following in horses. Horses do not follow the gaze of a human agent, using the object choice task.

  • Pointing following in horses. Horses follow the pointing of a human agent, using the object choice task.

  • Gaze following in domesticated pigs. Domesticated pigs do not follow the gaze of a human agent, using the object choice task.

    • Status: mixed
    • Original paper: ‘The effect of domestication and ontogeny in swine cognition (Sus scrofa scrofa and S. s. domestica)’, Albiach-Serrano et al. 2012; experiment, n=27. [citation=65(GS, January 2023)].
    • Critiques: Nawroth et al. 2014 [n=13, citations=69(GS, January 2023)]. Nawroth et al. 2016 [n=4, citations=13(GS, January 2023)].
    • Original effect size: not reported.
    • Replication effect size: Nawroth et al. (2014): not reported. Nawroth et al. (2016): not reported. For gaze following in domesticated pigs, the original paper shows no better than chance group performance, Nawroth et al. (2014) showed better than chance group performance, and Nawroth et al. (2016) suggested that one of four pigs used gaze while the others were at chance level.

  • Pointing following in pigs. Domesticated pigs do not follow the pointing of a human agent, using the object choice task.

  • Pointing following in goats. Goats follow the pointing of a human agent, using the object choice task.

  • Gaze following in goats. Goats do not follow the gaze of a human agent, using the object choice task.

  • Eye narrowing in felines. Felines are more likely to narrow their eyes following a slow blink from humans.

    • Status: replicated
    • Original paper: ‘The role of cat eye narrowing movements in cat–human communication’, Humphreys et al. 2020a; experiments, experiment 1: n = 21, experiment 2: n = 24. [citation=21 (GS, January 2023)].
    • Critiques: Humphreys et al. 2020b [n = 24, citations = 7 (GS, January 2023)].
    • Original effect size: Experiment 1: d = 0.56; Experiment 2: half-blinks vs. neutral: d = 0.75 [d calculated from mean differences and standard deviation using this conversion], eye narrowing vs. neutral: d = 0.90 [d calculated from mean differences and standard deviation and converted using this conversion].
    • Replication effect size: Humphreys et al.: slow blinking versus control trials: d = 0.71, slow blinking versus neutral trials: d = 0.52.

  • Pointing following in felines. Felines do not follow the pointing of another agent, using the object choice task.

  • Mirror-self recognition in magpies (Pica pica). Magpies have been suggested to be able to recognize themselves in the mirror, implying a self-representation akin to that of chimpanzees.

    • Status: not replicated
    • Original paper: ‘Mirror-induced behavior in the magpie (Pica pica): evidence of self-recognition’, Prior et al. 2008; exposure of marked magpies to a mirror, testing for self-directed behaviour, n = 5. [citations = 732 (GS, January 2023)].
    • Critiques: Gallup and Anderson 2020 [review paper, citations = 52 (GS, January 2023)]. Soler et al. 2020 [direct replication of Prior et al. 2008, n = 8 magpies, citations = 24 (GS, April 2023)].
    • Original effect size: 2 out of 5 animals showed self-recognition during mark test.
    • Replication effect size: Gallup and Anderson: no reproducible evidence that magpies can recognize themselves in the mirror. Soler et al.: no animal showed self-directed behaviour in front of the mirror during mark test (effects are descriptive only).

  • Right-bias in hand use in chimpanzees (Population handedness asymmetry). Chimpanzees have been proposed to be right-handed on the population level similar to humans.

    • Status: replicated
    • Original paper: ‘Chimpanzee Hand Preference in Throwing and Infant Cradling: Implications for the Origin of Human Handedness’, Hopkins et al. 1993; observational study in captive primates with respect to throwing and cradling behaviour, n = 36. [citations=139 (GS, January 2023)].
    • Critiques: Corballis 2003 [review article, n=NA, citations=792 (GS, January 2023)]. Hopkins et al. 1994 [experiment, n = 140, citations = 138 (GS, January 2023)]. Palmer 2002 [reexamination of within-population variation from Hopkins 1994, n = 140 individual captive chimpanzees, citations=140 (GS, April 2023)].
    • Original effect size: d = 0.69 to 0.97.
    • Replication effect size: Hopkins et al.: d = 0.52, n = 434, replication in three different colonies of chimpanzees. Corballis: population level asymmetries in handedness are a uniquely human feature. Palmer: right-sided asymmetries might have been statistical artefacts from animals with few observations, only from a single population.

  • Cache protection in Eurasian jays (Garrulus glandarius). Eurasian jays may opt to cache in out-of-view locations to reduce the likelihood of conspecifics pilfering their caches.

  • Desire-state attribution may govern food sharing in Eurasian jays (Garrulus glandarius). Male Eurasian jays may share food with their female partners in line with the female’s current desire.

  • Reasoning about hidden causal agents in New Caledonian Crows (Corvus moneduloides). New Caledonian Crows “showed greater vigilance towards an area from which they had previously witnessed a threatening “stick attack” if a hidden causal agent (a human) could still be present in that area compared to when a human person had visibly left.”

  • Owners predict spatial impulsivity in dogs (spatial discounting, spatial discount, distance discounting). This phenomenon explores whether owner ratings of impulsivity in their dogs correlate with behavioural measures of the distance their dogs travel in a spatial impulsivity task. The original study (Brady et al., 2018) found that owner ratings of adult dog impulsivity (using the Dog Impulsivity Assessment Scale; Wright et al., 2011) matched levels of impulsivity in a spatial impulsivity task (but not for young dogs). Two subsequent studies using similar methods did not replicate this correlation, and an overall meta-analysis did not find evidence for an effect.

    • Status: not replicated
    • Original paper: ‘A spatial discounting test to assess impulsivity in dogs’, Brady et al. 2018; experiment (correlation), n = 24 (Laboratory study 1), n = 13 (Simplified field study), n = 23 (Laboratory study 2). [citations = 15 (GS, January 2023)].
    • Critiques: The first replication, Mongillo et al. 2019 [n = 48, citations = 3 (GS, January 2023)], was not an intentional replication but independently used a similar design (though a slightly different task) to measure the same two concepts. The second replication, Stevens et al. 2022 [n = 65 (Study 1), n = 43 (Study 2), citations = 1 (GS, January 2023)], was an intentional replication and included a meta-analysis of six effect sizes from all three studies.
    • Original effect size: r = −0.46 (Laboratory study 1), r = −0.61 (Simplified field study), r = −0.05 (Laboratory study 2).
    • Replication effect size: Mongillo et al.: r = 0.01. Stevens et al.: r = −0.10 [−0.34, 0.15] (Study 1), r = 0.04 [−0.26, 0.34] (Study 2), r = −0.11 [−0.27, 0.04] (meta-analysis).
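
The bracketed intervals for the Stevens et al. correlations can be approximately reproduced with the Fisher z-transformation. This is a sketch under the assumption that each r is a Pearson correlation based on the full study n listed in this entry; the authors' actual interval method is not stated here, and `fisher_ci` is an illustrative name.

```python
import math


def fisher_ci(r: float, n: int, z_crit: float = 1.959964) -> tuple[float, float]:
    """Approximate 95% CI for a correlation via the Fisher z-transformation."""
    if n <= 3:
        raise ValueError("n must exceed 3 for the Fisher z standard error")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)                 # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


if __name__ == "__main__":
    # (label, r, n) taken from the entry above
    for label, r, n in [("Study 1", -0.10, 65), ("Study 2", 0.04, 43)]:
        lo, hi = fisher_ci(r, n)
        print(f"{label}: r = {r:+.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

With these inputs the sketch gives roughly [−0.34, 0.15] for Study 1 and [−0.26, 0.34] for Study 2, which agrees with the intervals listed. Both intervals include zero, consistent with the "not replicated" status.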

  • Numerical ratio effects on quantity discrimination in elephants (Number discrimination, quantity/number judgments). The numerical ratio between quantities has been shown to predict the ability to discriminate quantities across a wide range of species. However, Irie-Sugimoto et al. (2009) found that elephants did not follow this pattern because numerical ratio did not predict performance. Perdue et al. (2012) replicated the study in elephants with a similar design and found that ratio predicted performance, reversing the findings of the original study.

    • Status: reversed
    • Original paper: ‘Relative quantity judgment by Asian elephants (Elephas maximus)’, Irie-Sugimoto et al. 2009; experiment, n = 3. [citations = 107 (GS, January 2023)].
    • Critiques: Perdue et al. 2012 [n = 2, citations = 114 (GS, January 2023)].
    • Original effect size: individual correlations for each subject: r = 0.23, r = 0.01, r = 0.02.
    • Replication effect size: Perdue et al.: individual correlations for each subject: r = −0.64, −0.96.

  • Contagious yawning in dogs. Dogs have been shown to yawn more after observing a person yawn than after a control condition with other mouth movements. The original study (Joly-Mascheroni et al., 2008) found 72% of dogs yawned after watching a human yawn but none yawned after watching other mouth movements. Two subsequent studies did not replicate a difference in yawning between conditions (Harr et al., 2009; O’Hara and Reeve, 2010), but a third did find a difference (Madsen and Persson, 2012).

    • Status: mixed
    • Original paper: ‘Dogs catch human yawns’, Joly-Mascheroni et al. 2008; experimental, n = 29. [citations = 223 (GS, January 2023)].
    • Critiques: Harr et al. 2009 [n = 15, citations = 104 (GS, April 2023)]. O’Hara and Reeve 2010 [n = 19, citations = 84 (GS, April 2023)]. Madsen and Persson 2012 [n = 32, citations = 89 (GS, April 2023)].
    • Original effect size: d = 1.02 [0.567, 1.47].
    • Replication effect size: Harr et al.: d = 0.67. O’Hara and Reeve: d = 0.36 [−0.11, 0.82] (calculated). Madsen and Persson: d = 0.61 [0.27, 0.96] (calculated).

  • Temporal preferences in chimpanzees and bonobos (temporal/time discounting, intertemporal choice, delay choice). Chimpanzees (Pan troglodytes) wait longer than bonobos (Pan paniscus) in intertemporal choice tasks providing choices between smaller, sooner and larger, later food rewards. Rosati et al. (2007) found chimpanzees waited longer than bonobos at the Leipzig Zoo. Rosati and Hare (2013) confirmed this finding in a group of chimpanzees and bonobos at Tchimpounga Chimpanzee Sanctuary in Republic of Congo.

  • Risk preferences in chimpanzees and bonobos (risky choice). Chimpanzees (Pan troglodytes) choose risky options more than bonobos (Pan paniscus) in risky choice tasks providing choices between guaranteed and risky food rewards. Heilbronner et al. (2008) found chimpanzees preferred risky options more than bonobos at the Leipzig Zoo. Rosati and Hare (2013) confirmed this finding in a group of chimpanzees and bonobos at Tchimpounga Chimpanzee Sanctuary in Republic of Congo.


Evolutionary Linguistics

  • Typological Prevalence Hypothesis. The typological prevalence hypothesis proposes that certain structural features are more common across languages than others because of factors such as ease of learning, processing, and use. These more prevalent structural features are thought to be more likely to be retained in a language over time. The hypothesis claims that cross-linguistically more prevalent distinctions are easier to learn: the more common a distinction or way of categorising is across languages, the more cognitively natural (and easily learnable) it should be for humans. This effect is explored for the grammatical structure of evidentiality, that is, the grammatical marking of information source in an utterance.


Speech Language Therapy

  • Stuttering and bilingualism. Bilingual children have an increased risk of stuttering and a lower chance of recovery from stuttering than language-exclusive and monolingual speakers.

  • Stuttering and self-esteem. Children who stutter have higher self-esteem than children who do not stutter. However, the self-esteem of children who stutter declines once they reach adolescence.

    • Status: NA
    • Original paper: ‘Selbstwert von stotternden Kindern und Jugendlichen’ (in German), Zückner 2011; case-control study (comparison against norm scores), n = 171. [citations = 3 (GS, February 2022)].
    • Critiques: Cook and Howell 2014 [n = 59, citations = 16 (GS, February 2022)].
    • Original effect size: stuttering boys: M (SD) = 56.5 (25.9), boys norm group: M (SD) = 36.5 (25.9); stuttering girls: M (SD) = 43.1 (35.8), girls norm group: M (SD) = 27.7 (25.7).
    • Replication effect size: Cook and Howell: M (SD) = 2.9 (0.49) (children: adolescent); r(bullying, self-esteem) = .387.

  • Stuttering and dyslexia co-occurrence. People who stutter show higher co-occurrence with dyslexia than neurotypical adults.

  • Stuttering and phonological working memory impairment. Adults who stutter score lower on phonological working memory, as measured by a nonword repetition task.

    • Status: mixed
    • Original paper: ‘Nonword repetition abilities of children who stutter: an exploratory study’, Hakim and Bernstein Ratner 2004; experiment, CWS: n = 8, CWNS: n = 8. [citations = 233 (GS, November 2022)].
    • Critiques: Anderson et al. 2006 [CWS: n = 12, CWNS: n = 12; citations = 170 (GS, November 2022)]. Bakhtiar et al. 2008 [CWS: n = 12, CWNS: n = 12; citations = 78 (GS, November 2022)]. Byrd et al. 2012 [AWS: n = 14, AWNS: n = 14; citations = 97 (GS, November 2022)]. Byrd et al. 2015 [AWS: n = 20, AWNS: n = 20; citations = 58 (GS, November 2022)]. Coalson and Byrd 2017 [AWS: n = 26, AWNS: n = 26; citations = 16 (GS, November 2022)]. Choopanian et al. 2019 [AWS: n = 20, AWNS: n = 30; citations = 1 (GS, November 2022)]. Elsherif et al. 2021 [AWS: n = 30, NT: n = 84, AWD: n = 50; citations = 9 (GS, November 2022)]. Gerwin and Weber 2022 [CWS: n = 88, CWNS: n = 53; citations = 2 (GS, November 2022)]. Sasisekaran et al. 2019 [CWS: n = 13, CWNS: n = 13; citations = 6 (GS, November 2022)]. Smith et al. 2012 [CWS: n = 31, CWNS: n = 22; citations = 144 (GS, November 2022)]. Pelczarski and Yaruss 2016 [CWS: n = 16, CWNS: n = 13; citations = 54 (GS, November 2022)]. Sakhai et al. 2021 [CWS: n = 30, CWNS: n = 30; citations = 4 (GS, November 2022)]. Sasisekaran 2013 [AWS: n = 9, AWNS: n = 9; citations = 57 (GS, November 2022)]. Spencer and Weber-Fox 2014 [CWS: n = 40, CWNS: n = 25; citations = 109 (GS, November 2022)]. Sugathan and Maruthy 2020 [CWS: n = 17, CWNS: n = 17; citations = 7 (GS, November 2022)].
    • Original effect size: d = 1.417/r = .578.
    • Replication effect size: Anderson et al. (2006): bisyllable ηp2 = 0.20, trisyllable ηp2 = 0.18, quadsyllable ηp2 = 0.13, pentasyllable ηp2 = 0.05; Bakhtiar et al. (2008): bisyllabic: d = 0.38, trisyllabic: d = 0.13; Byrd et al. (2012): partial η2 = .150; Byrd et al. (2015): vocal: partial η2 = .382, non-vocal: partial η2 < .0001; Coalson and Byrd (2017): d = .32; Choopanian et al. (2019): words: r = 0.01, r = 0.86 [calculated using the conversion from Mann-Whitney U test to r]; Elsherif et al. (2021): AWS vs NT: Δ = 1.26, AWS vs AWD: Δ = 0.26; Gerwin et al. (2022): η2 = 0.018 [η2 calculated from reported F statistic and converted using this conversion]; Sasisekaran et al. (2019): partial η2 = 0.22; Smith et al. (2012): monosyllable: ηp2 = 0.06, bisyllable: ηp2 = 0.10, trisyllable: ηp2 = 0.10, quadsyllable: ηp2 = 0.03 [each η2 calculated from reported F statistic and converted using this conversion]; Pelczarski and Yaruss (2016): r = 0.52 [r calculated from reported Wilcoxon signed-rank test, Z statistic, and converted using this conversion]; Sakhai et al. (2021): analysing correct nonwords: Afshar Nonword Repetition Task: trisyllable ηp2 = 0.11, quadsyllable ηp2 = 0.06, Adapted Version of the Yazdani Nonword Repetition Task: bisyllable ηp2 = 0.36, trisyllable ηp2 = 0.47, quadsyllable ηp2 = 0.42, Masumi-Kashani Nonword Repetition Task: bisyllable ηp2 = 0.13, trisyllable ηp2 = 0.31, quadsyllable ηp2 = 0.26, pentasyllable ηp2 = 0.18, analysing correct phonemes: Afshar Nonword Repetition Task: trisyllable ηp2 = 0.12, quadsyllable ηp2 = 0.14, Adapted Version of the Yazdani Nonword Repetition Task: bisyllable ηp2 = 0.40, trisyllable ηp2 = 0.46, quadsyllable ηp2 = 0.46, Masumi-Kashani Nonword Repetition Task: bisyllable ηp2 = 0.17, trisyllable ηp2 = 0.29, quadsyllable ηp2 = 0.28, pentasyllable ηp2 = 0.35; Sasisekaran (2013): ηp2 = 0.008 [η2 calculated from reported F statistic and converted using this conversion]; Spencer and Weber-Fox (2014): F < 1; Sugathan and Maruthy (2020): ηp2 = .169.
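
Several effect sizes in the entry above are marked as calculated or converted from reported test statistics. The exact conversions the contributors used are not reproduced here, so the following is only a minimal sketch using standard textbook formulas: Cohen's d to r (assuming equal group sizes) and partial eta squared from an F statistic and its degrees of freedom. The function names are hypothetical and the F value in the last example is made up for illustration. Under the equal-groups assumption, the d-to-r conversion is consistent with the original effect size pairing of d = 1.417 and r = .578.

```python
import math


def d_to_r(d: float) -> float:
    """Convert Cohen's d to a correlation r, assuming two equal-sized groups."""
    return d / math.sqrt(d ** 2 + 4)


def r_to_d(r: float) -> float:
    """Convert a correlation r back to Cohen's d (same equal-groups assumption)."""
    return 2 * r / math.sqrt(1 - r ** 2)


def partial_eta_squared_from_f(f: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from a reported F(df_effect, df_error)."""
    return (f * df_effect) / (f * df_effect + df_error)


if __name__ == "__main__":
    # Original effect size reported in the entry above: d = 1.417
    print(round(d_to_r(1.417), 3))  # 0.578

    # Illustrative (made-up) test statistic: F(1, 36) = 4.0
    print(round(partial_eta_squared_from_f(4.0, 1, 36), 3))  # 0.1
```

With unequal group sizes the constant 4 in the d-to-r formula no longer applies, so conversions for studies with unbalanced samples would need the group-size-corrected version.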

  • Stuttering and phonological monitoring impairment. Adults who stutter show lower scores on phonological monitoring than neurotypical adults.

  • Stuttering and phonological awareness impairment (vs. neurotypicals). Adults who stutter show lower scores on phonological awareness than neurotypical adults.

    • Status: mixed
    • Original paper: Phonological encoding of young children who stutter, Pelczarski and Yaruss (2014); experiment, [CWS: n = 10, CWNS: n = 10; citations = 37 (GS, November 2022)].
    • Critiques: Elsherif et al. (2021) [AWS: n = 30, NT: n = 84, AWD: n = 50; citations = 9 (GS, November 2022)].
    • Original effect size: Pelczarski and Yaruss (2014): d = −1.0.
    • Replication effect size: Elsherif et al. (2021): NT vs AWS: Δ = 0.43.

  • Stuttering and phonological awareness impairment (vs. dyslexia). Adults who stutter show similar scores on phonological awareness to dyslexic adults.

    • Status: replicated
    • Original paper: Do dyslexia and stuttering share a processing deficit?, Elsherif et al. (2021); experiment, [AWS: n = 30, NT: n = 84, AWD: n = 50; citations = 9 (GS, November 2022)].
    • Critiques: Choo et al. (2022) [Adult struggling readers: n = 98, Adult struggling readers who stutter: n = 22; citations = 0 (GS, November 2022)].
    • Original effect size: Elsherif et al. (2021): AWD vs. AWS: Δ = 0.07.
    • Replication effect size: Choo et al. (2022): d = 0.130.

  • Stuttering and reading fluency impairment (vs. dyslexia). Adults who stutter show similar scores on reading fluency to dyslexic adults.

  • Struggling readers and stuttering co-occurrence. Dyslexic adults show higher co-occurrence with stuttering than neurotypical adults.


Experimental Philosophy

  • Fake barn cases. Older participants are less likely than younger participants to attribute knowledge in fake-barn cases.

    • Status: not replicated
    • Original paper: Epistemic Intuitions in Fake-Barn Thought Experiments, Colaço et al. (2014); between-subjects, n = 234 (n = 85 in the relevant analysis) [citations = 181 (GS, October 2022)].
    • Critiques: Bergenholtz et al. (2021) [n = 348, citations = 0 (GS, October 2022)].
    • Original effect size: r = −0.32.
    • Replication effect size: No effect size given (because of non-significant effect).

  • Stakes effect (alternative terms = a subset of interest-relative invariantism, interest-relativity of knowledge, bank cases). Knowledge is sensitive to stakes. According to the Stanford Encyclopedia of Philosophy, “a number of early findings from the experimental epistemology literature suggested that people’s ordinary knowledge attributions actually don’t depend on stakes.”

    • Status: mixed
    • Original paper: Practical Interests, Relevant Alternatives, and Knowledge Attributions: An Empirical Study, May et al. (2010); between-subjects and within-subjects design, n = 241 (approximately 60 in each condition). [citations = 146 (GS, January 2023)].
    • Critiques: Sripada and Stanley (2012) [n = 300 (50 in each condition), citations = 111 (GS, January 2023)].
    • Original effect size: Responses are on a 7-point Likert scale. Mean response in the low-stakes (LS-NA) group = 5.07 versus mean response in the high-stakes (HS-NA) group = 5.33.
    • Replication effect size: Sripada and Stanley (2012): stakes in the Implicit/Explicit vignette: d = 0.34 [calculated, using this conversion], the Ignorant vignette: d = 0.36 [calculated, using this conversion], the Basic vignette: d = 0.03 [calculated, using this conversion].


Personality Psychology

  • Fear conditioning - effect of trait anxiety/neuroticism on conditioning. High trait anxiety/neuroticism leads to better fear conditioning. Evidence is mixed; some papers even find the reversed effect, depending on the experimental paradigm (in particular, single-cue conditioning versus differential conditioning).

    • Status: mixed
    • Original paper: Theoretical paper by Eysenck, 1962 [citations = 20 (Wiley, January 2023)].
    • Critiques: Gazendam et al. 2015 [n = 236, citations = 33 (GS, January 2023)]; Haaker et al. 2015; Sperl et al. 2016 [n = 32, citations = 56 (GS, February 2023)]; Panitz et al. 2018 [n = 87, citations = 24 (GS, February 2023)]; Pineles et al. 2017; Sjouwerman et al. 2020 [n = 469, citations = 21 (nature.com, January 2023)]; Torrents-Rodas et al. 2012 [n = 126, citations = 78 (Elsevier, January 2023)].
    • Original effect size: NA
    • Replication effect size:
    • Torrents-Rodas et al. (2012): ηp2 reported: Acquisition: FPS: ηp2 = 0.15, SCR: ηp2 = 0.25, Risk ratings: ηp2 = 0.69. Generalisation: FPS: ηp2 = 0.13, SCR: ηp2 = 0.09, Risk ratings: ηp2 = 0.67.
    • Gazendam et al. (2015): Acquisition: Stress reaction (SR) β = −0.05, interaction term Stress reaction × Harm avoidance (HA) β = 0.07; Stress reaction (SR) × Time β = −0.02. Extinction: interaction term SR × HA × Time β = 0.02.
    • Sperl et al. (2016): Habituation: ηp2 = 0.163. Acquisition: Main effect: ηp2 = 0.532. Valence ratings: ηp2 = 0.242. Skin conductance response: Main effect of time: ηp2 = 0.379. Heart period: ηp2 = 0.219. Extinction: Arousal: ηp2 = 0.206. Valence: ηp2 = 0.98. Skin conductance response: Main effect of time: ηp2 = 0.393. Heart period: ηp2 = 0.360. Recall test phase: Arousal: ηp2 = 0.89. Valence ratings: ηp2 = 0.28. Skin conductance responses: ηp2 = 0.112. Heart period: ηp2 = 0.147.
    • Panitz et al. (2018): see table 1. Acquisition: Contingency: ηp2 = 0.276. Fear bradycardia: Main effect of contingency: ηp2 = 0.281. Extinction: Contingency: ηp2 = 0.91. Fear bradycardia: Contingency: ηp2 = 0.111. SCR: Contingency: ηp2 = 0.64. LPP: ηp2 = 0.37.
    • Sjouwerman et al. (2020): Study 1: see figure 1 for r reported. Study 2: d = 0.95, see figure 4 for r reported. See also figure 2, figure 3 and figure 5.

  • Fear conditioning - effect of trait extraversion on conditioning. Low trait extraversion leads to better fear conditioning.

    • Status: mixed
    • Original paper: Theoretical paper (therefore no n reported) by Eysenck, 1962 [citations = 59 (GS, February 2023)].
    • Critiques: Martinez et al. (2012) [n = 46, citations = 24 (GS, March 2023)]. Otto et al. (2007) [n = 72, citations = 93 (GS, March 2023)]. Pineles et al. (2009) [n = 217, citations = 64 (GS, March 2023)].
    • Original effect size: NA
    • Replication effect size: Martinez et al. (2012): SCL and extraversion: r2 = .15. Otto et al. (2007): r = −.13 for general conditions, and r = −.16 for differential conditioning. Pineles et al. (2009): partial r = .14 for warmth, partial r = .17 for activity.




Further literature


Contributors

Project coordinators

  • Flavio Azevedo
  • Helena Hartmann

Active project managers

  • Zoran Pavlovic
  • Alaa Aldoh
  • Aleksandrina Skvortsova

Past project managers

  • Mahmoud M. Elsherif
  • Meng Liu
  • Charlotte R. Pennington
  • Shilaan Alzahawi

Collaborators

  • Gavin Leech
  • Siu Kit Yeung
  • Samuel Guay
  • Leticia Micheli
  • Kamil Izydorczak
  • Balazs Aczel
  • Amélie Gourdon-Kanhukamwe
  • Biljana Gjoneska
  • Aoife O’Mahony
  • Chun-Yu Lin
  • Yvonne Oberholzer
  • Sau-Chin Chen
  • Robert M. Ross
  • Ekaterina Pronizius
  • Steven Verheyen
  • Merle-Marie Pittelkow
  • Tamara Kalandadze
  • Annalise LaPlume
  • Bradley J. Baker
  • Mirela Zaneva
  • Cameron Brick
  • Ali H. Al-Hoorie
  • Oscar Lecuona
  • Arnon Weinberg
  • Maria Montefinese
  • Jan P. Röer
  • Anya Butler
  • Max Charles David Gattie
  • David Moreau
  • Patrícia Arriaga
  • Kathleen Schmidt
  • Nihan Albayrak-Aydemir
  • Veronica Diveica
  • Gerald H. Vineyard
  • Zlatomira G. Ilchovska
  • Gilad Feldman
  • Maximilian Maier
  • Hirotaka Imada
  • Julian Packheiser
  • Emir Efendic
  • Lina Koppel
  • Leigh Ann Vaughn
  • Nadia Adelina
  • Matthew C. Makel
  • Anabel Belaus
  • Elena Richert
  • Kai Li Chung
  • Anna Exner
  • Lukas Wallrich
  • Alina Herderich
  • Willem Plomp
  • Clove Haviva
  • Paul E. Plonski
  • David Zimmermann
  • David J. Bauer
  • Kimberly Lewis Meidenbauer
  • Aleksandra Tołopiło
  • Samuel Alarie
  • Veli-Matti Karhulahti
  • Malak El Halabi
  • Adrien Fillon
  • Subramanya Prasad Chandrashekar
  • Burak Tunca
  • Lukas Röseler
  • Niyatee Narkar
  • Jamie P. Cockcroft
  • Nuño Sempere
  • Sarah Jaubert
  • Wolf Vanpaemel
  • Marina Tiulpakova
  • Andis Draguns
  • Hilmar Brohmer
  • Farid Anvari
  • Felipe Fontana Vieira
  • Valeria Agostini
  • Alina Koppold
  • Christopher J. Graham
  • Alexandros Kastrinogiannis
  • Julia Wolska
  • Jason Hausenloy
  • Matthias F. J. Sperl
  • Maria Meier
  • Kevin Kamermans
  • Andrew Pua
  • Alma Jeftic
  • Jeffrey R. Stevens
  • Thomas Rhys Evans
  • Ben G. Farrar
  • Joris Frese
  • Kellen Mrkva
  • Jake Floyd
  • Alvin W. M. Tan
  • Halil Emre Kocalar
  • Shubham Pandey
  • Ligayaa Breemer
  • Benjamin Brummernhenrich
  • Julia Beitner
  • Dermot Lynott
  • Johanna Tomczak
  • Vaitsa Giannouli
  • Maren Klingelhöfer-Jens
  • Monika Nemcova