

Replications & Reversals


Replications of previous scientific work are at the core of the Open Scholarship movement. However, as replication efforts become more widespread, it can be challenging for scholars and educators to keep up to date with which effects in their field replicate and which do not. FORRT’s Replications & Reversals project aims to collate replications and, specifically, so-called reversal effects in social science. Reversals are, in the context of a replication, effects whose original direction is flipped. The extent of such reversals and non-replicated effects is already apparent in the social science literature, where even successfully replicated effects are often only about half the size originally reported (Ioannidis, 2005; Open Science Collaboration, 2015). Although such failures to replicate are far less costly to society than, for example, medical ones (Prasad & Cifu, 2011), they broadly hinder science’s goal of accumulating knowledge and contribute to the waste of scarce resources. This resource aims to be a “living”, freely available, crowd-sourced, and community-driven collection of effects that have either not been replicated or have even been reversed through empirical research across the social sciences. Scholars from varied backgrounds and areas of social science are invited to contribute prevalent effects from their respective fields.


The purpose of collating these reversal effects in social science is to encourage educators to incorporate replications of these effects into their students' projects (e.g., third-year, thesis, course work). Doing so gives students the opportunity to experience the research process directly, lets educators assess students' ability to perform and report scientific research, and helps evaluate the robustness of the original study, thereby also helping students become good consumers of research. The crowdsourced and community-curated resource below aims to satisfy three of FORRT’s Goals:

  • Support scholars in their efforts to learn and stay up-to-date on best practices regarding open and reproducible research;
  • Facilitate conversations about the ethics and social impact of teaching substantive topics with due regard to scientific openness, epistemic uncertainty and the credibility revolution;
  • Foster social justice through the democratization of scientific educational resources and its pedagogies.

and four elements of FORRT’s Mission:

  • Dismantling hierarchies surrounding research, teaching, and service;
  • Building community among educators and various non-academic communities working to improve scientific communication and literacy across academia and the general public;
  • Building capacity for advocacy; and
  • Advocacy for the creation and maintenance of educational resources.

Current Status

This is a dynamic project whose first phase is organized in four stages. Currently, we are in stage 2 of phase 1:

  1. Proof of Concept Phase (adaptation of original project into FORRT, inclusion of effects from social and cognitive psychology, using Gavin Leech’s collection as a basis) → ~150 entries finished in 2021.

  2. Team Science Expansion Phase Across Disciplines (crowd-sourcing new entries and refining existing entries), started at the end of 2021 and planned until the end of June 2022. Draft first ‘output’ piece.

  3. Review Phase (open review to identify inconsistencies, missing data, and errors), planned for the end of 2022. Finish first ‘output’ piece. End of Phase 1.

  4. Regular Update Phases (dynamically adding new effects), planned for 2023 and beyond.

How to contribute?

Anyone can add reversals or replications by joining our initiative on Slack and then following the instructions in our reversals g-doc.

All Effects (sorted by discipline)

To search whether an effect already exists in our collection, use Ctrl-F and a keyword in relation to the effect (e.g. “Macbeth” or “Priming”). Please note that not all effects contain all available information, as this is a work in progress.

Social Psychology

There is no good evidence for many forms of priming, that is, automatic behavior change from ‘related’ (often only metaphorically related) stimuli. Semantic priming is still solid, but the effect lasts only seconds.

  • Elderly priming, that hearing about old age makes people walk slower.

    • Status: reversed
    • Original paper: ‘Automaticity of social behavior’, Bargh (1996); 2 experiments with Study 2a: n = 30, Study 2b: n = 30 [citations = 5938 (GS, October 2021)].
    • Critiques: Doyen (2012) [experiment: n = 120, citations = 757 (GS, October 2021)]; Lakens (2017) [meta-analysis: citations = 21 (GS, October 2021)]; Pashler et al. (2011) [experiment: n = 66, citations = 21 (GS, October 2021)].
    • Original effect size: not reported.
    • Replication effect size: Doyen: walking speed: η2 = .01; Lakens (2017): r = .29/d = .61; Pashler: not reported.
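
Entries in this collection sometimes report the same effect on two scales, as with r = .29 and d = .61 for Lakens (2017) in the elderly priming entry. The following is a minimal sketch of how such a pair relates, assuming two groups of equal size (an assumption not stated in the collection) and using the standard conversion d = 2r / sqrt(1 - r^2). The function names are illustrative only and are not part of any FORRT tooling.

```python
import math


def r_to_d(r: float) -> float:
    """Convert a correlation r to Cohen's d, assuming two equal-sized groups."""
    return 2 * r / math.sqrt(1 - r ** 2)


def d_to_r(d: float) -> float:
    """Convert Cohen's d back to r under the same equal-group-size assumption."""
    return d / math.sqrt(d ** 2 + 4)


# Values reported for Lakens (2017) in the elderly priming entry.
print(round(r_to_d(0.29), 2))  # r = .29 expressed as d
print(round(d_to_r(0.61), 2))  # d = .61 expressed as r
```

When group sizes are unequal, the constant 4 in the second function no longer applies, so converted values should be read as approximations.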

  • Hostility priming (unscrambled sentences). Exposing participants to more hostility-related stimuli caused them subsequently to interpret ambiguous behaviours as more hostile.

    • Status: not replicated
    • Original paper: The role of category accessibility in the interpretation of information about persons: Some determinants and implications. Srull and Wyer, Jr. (1979); 2 experiments with Study 1: n = 96; Study 2: n = 96 [ 2409 citations (GS, November 2021)].
    • Critique: McCarthy et al. 2018 [n = 7,373 for Study 1, citations = 40 (GS, November 2021)]; McCarthy et al. 2021 (see Figure) [n = 1,402 for close replication; n = 1,641 for conceptual replication, citations = 2 (GS, November 2021)].
    • Original effect size: 2.99 (XX = 1.58%)
    • Replication effect size: All effect sizes are located in McCarthy et al. 2018: Acar: XX = 0.16. Aczel: XX = 0.12. Birt: XX = -0.11. Evans: XX = -.22. Ferreira-Santos: XX = 0.01. Gonzalez-Iraizoz: XX = -.21. Holzmeister: XX = .11. klein Selle and Rozmann: XX = -0.51. Koppel: XX = -.14. Laine: XX = -.27. Loschelder: XX = -.07. McCarthy: XX = -.10. Meijer: XX = .03. Ozdogru: XX = .22. Pennington: XX = -.52. Roets: XX = -.01. Suchotzki: XX = .10. Sutan: XX = .49. Vanpaemel: XX = .17. Verschuere: XX = -.14. Wick: XX = .07. Wiggins: XX = .01. Average replication effect size: XX = -0.08; McCarthy et al. 2021: XX = 0.06.

  • Intelligence priming (contemplation), alt term = professor priming. Participants primed with a category associated with intelligence (e.g. “professor”) performed 13% better on a trivia test than participants primed with a category associated with a lack of intelligence (“soccer hooligans”).

    • Status: not replicated
    • Original paper: The relation between perception and behavior, or how to win a game of trivial pursuit, Dijksterhuis and van Knippenberg, 1998, 4 experiments with Study 1: n = 60; Study 2: n = 58; Study 3: n = 95; Study 4: n = 43. [citations = 1124 (GS November 2021)].
    • Critiques: O’Donnell et al., 2018 [n = 4,493 who met the inclusion criteria; n = 6,454 in supplementary materials, citations = 71 (GS November 2021)].
    • Original effect size: PD = 13.20%.
    • Replication effect size: All effect sizes are located in O’Donnell et al. 2018: Aczel: PD = -1.35%; Aveyard: PD = -3.99%; Baskin: PD = 4.08%; Bialobrzeska: PD = -.12%; Boot: PD = -4.99%; Braithwaite: PD = 4.01%; Chartier: PD = 3.23%; DiDonato: PD = 3.14%; Finnigan: PD = 2.89%; Karpinski: PD = 1.38%; Keller: PD = .17%; Klein: PD = .88%; Koppel: PD = -.20%; McLatchie: PD = -2.16%; Newell: PD = 1.66%; O’Donnell: PD = 1.58%; Phillipp: PD = 43%; Ropovik: PD = -.48%; Saunders: PD = -1.87%; Schulte-Mecklenbeck: PD = 4.24%; Shanks: PD = .11%; Steele: PD = -.58%; Steffens: PD = -.84%; Susa: PD = -.63%; Tamayo: PD = 1.41%; Meta-analytic estimate: PD = 0.02%.

  • Moral priming (contemplation). Participants exposed to a moral-reminder prime would demonstrate reduced cheating.

    • Status: not replicated
    • Original paper: The Dishonesty of Honest People: A Theory of Self-Concept Maintenance, Mazar et al. 2008; 6 experiments with Study 1: n = 229; Study 2: n = 207; Study 3: n = 450; Study 4: n = 44; Study 5: n = 108; Study 6: n = 326. [citations= 3072 (GS November 2021)].
    • Critiques: Verschuere et al. 2018 [n = 5786 replication of Experiment 1, citations = 65(GS November 2021)].
    • Original effect size: d = -1.45.
    • Replication effect size: d = 0.18.
    • All effect sizes are located in Verschuere et al. 2018: Aczel: d = -0.26; Birt: d = 0.41; Evans: d = 0.85; Ferreira-Santos: d = -0.19; Gonzalez-Iraizoz: d = 0.26; Holzmeister: d = 1.11; klein Selle and Rozmann: d = -0.27; Koppel: d = 0.39; Laine: d = -0.37; Loschelder: d = -0.11; McCarthy: d = 0.57; Meijer: d = -0.15; Ozdogru: d = 1.19; Suchotzki: d = 0.00; Sutan: d = 0.02; Vanpaemel: d = 0.17; Verschuere: d = 0.18; Wick: d = -0.09; Wiggins: d = 0.19; Meta-analytic estimate: d = 0.11.

  • Moral priming (cleanliness). Participants exposed to physical cleanliness were shown to reduce the severity of their moral judgments. Direct, well-powered replications did not find evidence for the phenomenon.

    • Status: not replicated
    • Original paper: With a Clean Conscience: Cleanliness Reduces the Severity of Moral Judgments, Schnall, Benton, and Harvey, 2008; 2 experiments with Study 1: n = 40, Study 2: n = 44. [citations=645 (GS November 2021)].
    • Critiques: Johnson et al. 2014, [Study 1: n = 208, Study 2: n = 126. citations=128(GS November 2021)].
    • Original effect size: Study 1: d = -0.60, 95% CI [-1.23, 0.04]; Study 2: d = -0.85, 95% CI [-1.47, -0.22]
    • Replication effect size: Study 1: d = -0.01, 95% CI [-0.28, 0.26]; Study 2: d = 0.01, 95% CI [-0.34, 0.36]

  • Distance priming. Participants primed with distance compared to closeness produced greater enjoyment of media depicting embarrassment (Study 1), less emotional distress from violent media (Study 2), lower estimates of the number of calories in unhealthy food (Study 3), and weaker reports of emotional attachments to family members and hometowns (Study 4).

  • Flag priming. Participants primed by a flag are more likely to endorse conservative positions than those in the control condition.

    • Status: mixed
    • Original paper: A Single Exposure to the American Flag Shifts Support Toward Republicanism up to 8 Months Later, Carter et al. 2011; 2 experiments with Experiment 1: n = 191 completed three sessions and 71 completed the fourth session; Experiment 2: n = 70. [citations = 186 (GS, October 2021)].
    • Critique: Klein et al. 2014 [n = 6,082, citations = 957 (GS, October 2021)].
    • Original effect size: d = 0.50
    • Replication effect size: All effect sizes are located in ManyLabs: Adams and Nelson: d = .02. Bernstein: d = 0.07. Bocian and Frankowska: d = .19 (Study 1). Bocian and Frankowska: d = -.22 (Study 2). Brandt et al.: d = .21. Brumbaugh and Storbeck: d = -.22 (Study 1). Brumbaugh and Storbeck: d = .02 (Study 2). Cemalcilar: d = .14. Cheong: d = -.11. Davis and Hicks: d = -.27 (Study 1). Davis and Hicks: d = -.03 (Study 2). Devos: d = -.11. Furrow and Thompson: d = .09. Hovermale and Joy-Gaba: d = -.07. Hunt and Krueger: d = .27. Huntsinger and Mallett: d = .06. John and Skorinko: d = .08. Kappes: d = .04. Klein et al.: d = -.11. Kurtz: d = .04. Levitan: d = -.01. Morris: d = .09. Nier: d = -.45. Packard: d = .04. Pilati: d = 0.00. Rutchick: d = -.07. Schmidt and Nosek (PI): d = .03. Schmidt and Nosek (MTURK): d = .09. Schmidt and Nosek (UVA): d = -.15. Smith: d = .27. Swol: d = -.03. Vaughn: d = -.17. Vianello and Galliani: d = .49. Vranka: d = -.03. Wichman: d = .11. Woodzicka: d = -.09. Average replication effect size: d = 0.03.

  • Fluency priming. Objects that are fluent (e.g., conceptually fluent, visually fluent) are perceived more concretely than objects that are disfluent (disfluent objects are perceived more abstractly).

  • Money priming. “Images or phrases related to money cause increased faith in capitalism, and the belief that victims deserve their fate”.

    • Status: not replicated
    • Original paper: ‘Mere exposure to money increases endorsement of free-market systems and social inequality’, Caruso 2013; n between 30 and 168 [~161 citations (GS, November 2021)].
    • Critiques: Rohrer 2015 [n = 136, citations = 82 (GS, November 2021)]. Meta-analysis: Lodder 2019 [citations = 64 (GS, November 2021)].
    • Original effect size: system justification d = 0.8, just world d = 0.44, dominance d = 0.51.
    • Replication effect size: Rohrer et al. (Experiment 1): d = 0.07 for system justification, d = 0.06 for belief in a just world, d = -0.06 for social dominance, d = 0.14 for fair market ideology. For 47 preregistered experiments in Lodder: g = 0.01 for system justification, g = 0.11 [-0.08, 0.3] for belief in a just world, g = 0.07 [-0.02, 0.15] for fair market ideology.

  • Commitment priming (recall). Participants exposed to a high-commitment prime would exhibit greater forgiveness.

  • Death priming, alt term = Mortality Salience/Terror Management Theory. Participants not exposed to mortality primes would show higher fear of death.

    • Status: not replicated
    • Original paper: ‘Role of Consciousness and Accessibility of Death-Related Thoughts in Mortality Salience Effects’, Greenberg et al. 1994; 4 experiments with Study 1: n = 58; Study 2: n = 87; Study 3: n = 59; Study 4: n = 37. [citations = 1237 (GS November 2021)].
    • Critiques: Klein et al. 2018; [n = 2281 for Experiment 1, citations = 70(GS November 2021)].
    • Original effect size: d = XX.
    • Replication effect size: Exclusion Set 1: Hedges’ g = 0.03, 95% CI = [-0.06, 0.12]; Exclusion Set 2: Hedges’ g = 0.06, 95% CI = [-0.06, 0.17]; Exclusion Set 3: Hedges’ g = 0.04, 95% CI = [-0.07, 0.16].

  • Spatial priming for emotional closeness. Spatial distance cues were used as a prime for participants’ feelings regarding their emotional closeness to their families (Williams & Bargh, 2008). Participants were asked to plot points on a paper grid, either closer together or further apart. They were then asked to rate how emotionally close they felt to their family members.

    • Status: not replicated
    • Original paper: Keeping One’s Distance: The effect of spatial distance cues on affect and emotion, Williams and Bargh (2008), 4 experiments with Study 1: n = 73; Study 2: n = 42; Study 3: n = 59; Study 4: n = 84. [citations = 583 (GS, January 2022)].
    • Critiques: Pashler et al. 2012 [n = 92, citations = 188 (GS, January 2022)]. Open Science Collaboration 2015 [total n = 125, citations = 6148 (GS, January 2022)].
    • Original effect size: Study 1: η2 = .09; Study 2: η2 = .18; Study 3: η2 = .10; Study 4: η2 = .11.
    • Replication effect size: Pashler et al.: η2 = 0.01. Joy-Gaba et al.’s effect sizes are located in Open Science Collaboration 2015 for Study 4: η2 = .00.

  • Verbal framing (temporal tense). Participants who read what a person was doing (relative to those who read what a person did) showed enhanced accessibility of intention-related concepts and attributed more intentionality to the person.

    • Status: mixed
    • Original paper: ‘Learning about what others were doing: Verb aspect and attributions of mundane and criminal intent for past actions’, Hart and Albarracin (2011): 3 experiments with Study 1: n = 5458; Study 2: n = 37; Study 3: n = 48. [citations = 37, (GS, January 2022)].
    • Critiques: Eerland et al. (2016) [meta-analysis of Study 3 (total n = 685 for perfective-aspect condition; n = 681 for imperfective-aspect condition), citations = 70 (GS, January 2022)].
    • Original effect size: Study 1: d = 1.00 for intentionality in imperfective-aspect condition; Study 2: d = 1.23 for imagery in imperfective-aspect condition; Study 3: d = 1.20 for intentionality, d = 0.92 for imagery and d = 0.55 for intention attribution in imperfective-aspect condition.
    • Replication effect size: All effect sizes are located in Eerland et al. 2016: Intentionality: Arnal (lab): d = -0.35; Berger (lab): d = -0.98; Birt and Aucoin (lab): d = -0.38; Eerland et al. (lab): d = 0.16; Eerland et al. (online): d = -0.33; Ferretti (lab): d = -0.01; Knepp (lab): d = -0.95; Kurby and Kibbe (lab): d = -0.14; Melcher (lab): d = 0.65; Michael (lab): d = -0.41; Poirier et al. (lab): d = 0.32; Prenoveau and Carlucci (lab): d = -0.38. Meta-analytic estimate for laboratory replications only: d = -0.24. Imagery: Arnal (lab): d = -0.01; Berger (lab): d = -0.45; Birt and Aucoin (lab): d = -0.40; Eerland et al. (lab): d = -0.01; Eerland et al. (online): d = -0.13; Ferretti (lab): d = 0.33; Knepp (lab): d = 0.00; Kurby and Kibbe (lab): d = 0.02; Melcher (lab): d = -0.16; Michael (lab): d = -0.08; Poirier et al. (lab): d = -0.19; Prenoveau and Carlucci (lab): d = -0.02. Meta-analytic estimate for laboratory replications only: d = -0.08. Intention attribution: Arnal (lab): d = -0.15; Berger (lab): d = -0.15; Birt and Aucoin (lab): d = 0.08; Eerland et al. (lab): d = -0.01; Eerland et al. (online): d = 0.02; Ferretti (lab): d = -0.19; Knepp (lab): d = -0.29; Kurby and Kibbe (lab): d = 0.00; Melcher (lab): d = 0.12; Michael (lab): d = 0.13; Poirier et al. (lab): d = 0.06; Prenoveau and Carlucci (lab): d = 0.03. Meta-analytic estimate for laboratory replications: d = 0.00.

  • Prosocial spending. Spending money on other people leads to greater happiness than spending money on oneself.

    • Status: replicated (on the basis of three studies, NB: effect sizes smaller than original)
    • Original paper: Spending Money on Others Promotes Happiness (Dunn, Aknin, Norton, 2008) [citations = 2008 (GS, March 2022)].
    • Critiques: Aknin et al., 2020; 3 experiments [citations = 51 (GS, March 2022)].
    • Original effect size: b = 0.11, p < 0.01.
    • Replication effect size: Experiment 1: n = 712, Cohen’s d = .36, .32; Experiment 2: n = 1950, Cohen’s d = .03, .02; Experiment 3: n = 5,199, Cohen’s d = .06, .06, .17.

  • Gustatory disgust on moral judgement. Gustatory disgust triggers a heightened sense of moral wrongness.

    • Status: not replicated
    • Original paper: A Bad Taste in the Mouth: Gustatory Disgust Influences Moral Judgment, Eskine et al. (2011); experiment, n = 57. [citations = 564 (GS, January 2022)].
    • Critiques: Ghelfi et al., 2020 [meta-analysis, total n = 1137, citations = 18 (GS, January 2022)]; Johnson et al., 2016 [Study 1: n = 478, Study 2: n = 934, citations = 52 (GS January 2022)].
    • Original effect size: Cohen’s d = 1.12 (comparison to control group); Cohen’s d = 1.28 (comparison to sweet taste).
    • Replication effect size: Johnson et al.: Cohen’s d = 0.04 (Study 1, comparison to control group), Cohen’s d = 0.05 (Study 2, comparison to control group). All effect sizes are located in Ghelfi et al. 2020: Comparison to sweet group: Christopherson: Hedges’ g = 0.53; Christopherson: Hedges’ g = 0.04; Fischer: Hedges’ g = 0.25; Guberman: Hedges’ g = -0.30; de Haan: Hedges’ g = -0.13; Legate: Hedges’ g = 0.99; Legate: Hedges’ g = -0.02; Lenne: Hedges’ g = -0.19; Urry: Hedges’ g = -0.13; Wagemans: Hedges’ g = 0.03; Weber: Hedges’ g = -0.27. Meta-analytic estimate: Hedges’ g = -0.05. Comparison to control group: Christopherson: Hedges’ g = 0.68; Christopherson: Hedges’ g = -0.19; Fischer: Hedges’ g = -0.01; Guberman: Hedges’ g = -0.12; de Haan: Hedges’ g = -0.24; Legate: Hedges’ g = 0.79; Legate: Hedges’ g = 0.37; Lenne: Hedges’ g = -0.13; Urry: Hedges’ g = 0.08; Wagemans: Hedges’ g = -0.11; Weber: Hedges’ g = -0.04. Meta-analytic estimate: Hedges’ g = 0.10.

  • Macbeth effect. Moral aspersions induce literal physical hygiene.

    • Status: mixed
    • Original paper: ‘Washing away your sins: threatened morality and physical cleansing’, Zhong and Liljenquist (2006): 4 experiments with Study 1: n=60; Study 2: n=27; Study 3: n=32; Study 4: n=45. [citation = 1407, (GS, January 2022)].
    • Critiques: Siev et al. 2018 [meta-analysis: n=1,746, citations = 17(GS, January 2022)].
    • Original effect size: Study 1: g = 0.53; Study 2: g = 1.00; Study 3: g = 0.86 [0.05, 1.68]; Study 4: g = XX.
    • Replication effect size: Siev et al. (2018): g = 0.17, 95% CI [0.04, 0.31]. All effect sizes are located in Siev et al. 2018: Earp et al. (2014): Study 1: g = 0.02, 95% CI [-0.30, 0.34], Study 2: g = 0.05, 95% CI [-0.27, 0.37], Study 3: g = 0.13, 95% CI [-0.11, 0.37]; Fayard et al. (2009): Study 1: g = 0.11 [-0.20, 0.43]; Gamez et al. (2011): Study 1: g = 0.02, 95% CI [-0.54, 0.56], Study 2: g = -0.01, 95% CI [-0.64, 0.63], Study 3: g = 0.55, 95% CI [-0.26, 1.37]; Lee and Schwarz (2010): Study 2: g = 0.22, 95% CI [-0.20, 0.64]; Schaefer (2019): Study 2: g = 0.71, 95% CI [0.18, 1.23]; Siev et al. (unpublished): Study 1: g = -0.06, 95% CI [-0.27, 0.15], Study 2: g = -0.18, 95% CI [-0.56, 0.20]; Zhong (unpublished): Study 2: g = 0.28.

  • Signing at the beginning rather than end makes ethics salient. Signing a statement of honest intent before providing information rather than after can reduce dishonesty.

  • Stanford Prison Experiment. A simulation of a prison environment was used to examine the psychological effects of coercive situations. Utilizing role-playing, labeling, and social expectations, it showed that one third of participants in the role of prison guards displayed aggressive and dehumanizing behaviour.

    • Status: NA
    • Original paper: ‘Interpersonal dynamics in a simulated prison’, Haney, Banks, Zimbardo (1973) [n=24, citations: 2115 (including highly referenced publications), (GS, January, 2022)].
    • Critiques: First, the study has been criticized for its lack of adherence to experimental methodology. Although it has been widely described as an ‘experiment’, it lacks many defining features: 1) it does not define a precise set of manipulated variables, 2) it manipulates multiple variables at a time without proper control over the effects of each one, 3) it does not define the dependent variable or how it will be measured, and 4) it does not state any clear hypotheses. It is noteworthy that in the original paper, the authors present their work as a “demonstration”, not an experiment. The second group of serious issues is the degree to which the researchers’ ad-hoc interventions influenced the behaviour of the participants. One of the leading researchers, Philip G. Zimbardo, took part in the experimental procedure as the prison’s “Superintendent”. Another close collaborator of the research team, David Jaffe, who initially conceived the idea of the mock-prison study, played the role of the “Warden”. Considering that these people knew the goal of the study and were, as later admitted, interested in a particular outcome (a call for reform of the prison system), ad-hoc interventions such as encouraging some of the guards to be more strict and ‘tough’ raise reasonable concern that experimenter expectations shaped the final results of the study. The third group of issues is sampling. The study was conducted on a small (n=24, n per condition = 12) and largely unrepresentative sample (all males, all college students of similar age, all residents of the United States). Also, despite the screening of the voluntarily applying candidates, it is still possible that strong ‘demand characteristics’ and ‘self-selection bias’ affected the composition of the sample: all the participants had responded to a newspaper ad seeking help with a “psychological study of prison life”. The last issue with the Stanford Prison Experiment is the interpretation of the results. Even if the discovered effect is trustworthy (and the above-mentioned issues put this into question), there is no clear theoretical interpretation of what this finding actually proves. Some critics argue that the violent behaviour of the guards may have stemmed from their following strong leadership rather than from their immersion in an assigned social role. Some specific works addressing criticism of the original study are listed as follows:
      • Le Texier (2019) [commentary; citations: 38 (GS, January 2022)]
      • Banuazizi, Movahedi (1975) [methodological analysis; citations: 118 (GS, January 2022)]
      • Festinger 1980 [book; citations: 132 (GS, January 2022)]
      • Haslam, Reicher, Van Bavel 2019 [methodological analysis; citations: 37 (GS, January 2022)]
      • Griggs, Whitehead 2014 [textbook analysis; citations: 37 (GS, January 2022)]
      • Griggs 2014 [textbook analysis; citations: 48 (GS, January 2022)]
      • Blum 2018 [media coverage; citations: 31 (GS, January 2022)]
      • Le Texier 2020 [preprint; citations: 0 (GS, January 2022)]
      • Izydorczak, Wicher 2020 [preprint; citations: 0 (GS, January 2022)]
      • Reicher and Haslam 2011 [experimental case study but not exact replication of SPE; n = 15, citations: ~435 (GS, January 2022)]
      • Lovibond, Adams, Adams 1979 [original research but not exact replication of SPE; n = 60, citations: 55 (GS, January 2022)]
    • Original effect size: Key claims were insinuation plus a battery of difference in means tests at up to 20% significance(!). n = 24, data analysis on 21.
    • Replication effect size: N/A

  • Milgram experiment. A study examining the influence of authority on immoral behaviour. Participants were assigned the role of ‘teachers’ and were instructed by the experimenter to administer electric shocks of 15-450 V whenever the ‘learner’ made a mistake. There were several variants of the study. In the most basic one, 100% of participants agreed to administer a 300 V shock and 65% agreed to apply the maximum shock of 450 V.

    • Status: mixed
    • Original paper: Behavioral Study of obedience, Milgram 1963. n=40 (~6600 citations). (The full range of conditions was n=740.)
    • Critiques: The experiment was riddled with researcher degrees of freedom, going off-script, implausible agreement between very different treatments, and “only half of the people who undertook the experiment fully believed it was real and of those, 66% disobeyed the experimenter.” Sources: Burger 2011, Perry 2012, Brannigan 2013; Griggs 2016 (total citations: ~240), but see also Caspar 2020.
    • Original effect size: 65% of subjects were reported to administer the maximum, dangerous voltage.
    • Replication effect size: Doliński 2017 is relatively careful, n=80, and found comparable effects to Milgram. Burger (n=70) also finds similar levels of compliance to Milgram, but the level didn’t scale with the strength of the experimenter prods (see Table 5: the only real order among the prompts led to universal disobedience), so whatever was going on, it’s not obedience. One selection of follow-up studies found average compliance of 63%, but these suffer from the usual publication bias and tiny samples. (Selection was by a student of Milgram.) The most you can say is that there’s weak evidence for compliance, rather than obedience. (“Milgram’s interpretation of his findings has been largely rejected.”)

  • Robbers Cave Study. Utilized arbitrary groupings to demonstrate that tribalism between groups arises spontaneously, and depending on the context, it can result in group competition (e.g., in case of scarce resources) or group cooperation (e.g., in case of superordinate goals and common obstacles).

    • Status: NA
    • Original paper: ‘Superordinate Goals in the Reduction of Intergroup Conflict’, Sherif (1958) [n=22, citations: 1,010 (GS, February 2022)]. In addition to the original paper, some related books from the author(s) are also highly cited, including ‘Groups in harmony and tension’ by Sherif & Sherif (1958) [citations: 2,280 (GS, February 2022)] and ‘Intergroup Conflict and Co-operation’ by Sherif et al. (1961) [citations: 253 (GS, February 2022)]. Overall, the effect accounts for more than 4000 total citations including the SciAm piece.
    • Critiques: No good evidence that tribalism arises spontaneously following arbitrary groupings and scarcity, within weeks, and leads to inter-group violence. The “spontaneous” conflict among children at Robbers Cave was orchestrated by experimenters; tiny sample (maybe 70?); an exploratory study taken as inferential; no control group; there were really three experimental groups - that is, the experimenters had full power to set expectations and endorse deviance; results from their two other studies, with negative results, were not reported. Set aside the ethics: the total absence of consent - the boys and parents had no idea they were in an experiment - or the plan to set the forest on fire and leave the boys to it. Some specific works addressing criticism to the original study are listed as follows:
      • Billig (1976) in passing [book; citations: 808, (GS, February, 2022), see media mention by Haslam (2018)];
      • Perry (2018) in passing [book; citations: 25, (GS, February, 2022), see also media summary by Shariatmadari (2018) and Haslam (2018)].
      • Tavris also claims that the underlying “realistic conflict theory” is otherwise confirmed. No definitive conclusion can be reached.
    • Original effect size: N/A. Not reported in conventional format. (Rationale: “results obtained through observational methods were cross-checked with results obtained through sociometric technique, stereotype ratings of in-groups and outgroups, and through data obtained by techniques adapted from the laboratory. Unfortunately, these procedures cannot be elaborated here.")
    • Replication effect size: N/A

  • Digital technology use and adolescent wellbeing. Adolescents who spent more time on new media (including social media and electronic devices such as smartphones) are more likely to report mental health issues.

  • Anthropomorphism. Individuals who are lonely are more likely than people who are not lonely to attribute humanlike traits (e.g., free will) to nonhuman agents (e.g., an alarm clock), to fulfill unmet needs for belongingness.

  • Female-named hurricanes are more deadly than male-named ones. Original effect size was a 176% increase in deaths, driven entirely by four outliers; reanalysis using a greatly expanded historical dataset found a nonsignificant decrease in deaths from female named storms.

    • Status: reversed
    • Original paper: ‘Female hurricanes are deadlier than male hurricanes’, Jung 2014; observational study with n=92 hurricanes discarding two important outliers [citations = 113 (GS, March 2022)].
    • Critiques: Christensen 2014 [same data, citations = 114 (GS, March 2022)]. Smith 2016 [same data, citations = 8 (GS, March 2022)].
    • Original effect size: d=0.65: 176% increase in deaths from flipping names from relatively masculine to relatively feminine.
    • Replication effect size: Smith: 264% decrease in deaths (Atlantic); 103% decrease (Pacific)

  • Implicit bias testing for racism. Implicit bias scores poorly predict actual bias, r = 0.15. The operationalisations used to measure that predictive power are often unrelated to actual discrimination (e.g. ambiguous brain activations). Test-retest reliability of 0.44 for race, which is usually classed as “unacceptable”. This isn’t news; the original study also found very low test-criterion correlations.

  • The Pygmalion effect, the effect of a teacher’s expectations on a student’s performance, is at most small, temporary, and inconsistent, r<0.1 with a reset after weeks. Rosenthal’s original claims about massive IQ gains, persisting for years, are straightforwardly false (“The largest gain… 24.8 IQ points in excess of the gain shown by the controls.”), and used an invalid test battery. Jussim: “90%–95% of the time, students are unaffected by teacher expectations”.

  • Stereotype threat on Asian women’s mathematical performance, i.e. the interaction between race, gender and stereotyping. This study found that Asian-American women performed better on a math test when their ethnic identity was activated, but worse when their gender identity was activated, compared with a control group who had neither identity activated.

    • Status: Mixed
    • Original paper: ‘Domain-specific Effects of Stereotypes on Performance’, Shih et al. 1999
    • Critiques: Gibson et al. 2014; Moon and Roeder 2014
    • Original effect size: Asian-identity-salient > control > female-identity-salient, r=.27; Asian-identity-salient > female-identity-salient, r=.35.
    • Replication effect size: Gibson et al. 2014: no group differences, η2=.01; Asian-primed vs. female-primed, p=.18, d=.27; including only those who were aware of the stereotypes, group accuracy p=.02, η2=.04, and the means followed the predicted pattern, Asian (M=.63), Control (M=.55), and Female (M=.51); likewise, female-primed participants performed worse than Asian-primed participants, p=.02, d=.53. Moon & Roeder (2014): group accuracy, p=.44, g2=.004; female-primed and Asian-primed conditions, p=.43, d=.17; analysing just those who were aware of the stereotype, p=.28, g2=.012; female-primed participants vs. Asian-primed participants, p=.28, d=.27.

  • Stereotype threat on gender differences in political knowledge, the idea that making gender stereotypes about political knowledge salient decreases women’s performance on political knowledge tests. The replication effort showed no significant effect of gender stereotype activation on women’s performance on a political knowledge test.

    • Status: Not replicated.
    • Original paper: ‘Gender Differences in Political Knowledge: Bringing Situation Back In’, Ihme & Tausendpfund (2018). Study 1: N=603, shows that women are rated as less politically knowledgeable than men. Study 2: N=377; female and male participants are randomly assigned to one of three conditions (control with no stereotype activated; stereotype activated by asking participants to report their gender; stereotype activated by a statement that there are gender differences in performance on the test participants are about to take) and answer a questionnaire assessing political knowledge [citations=17 (GS, January 2022)].
    • Critiques: Azevedo, Micheli & Bolesta [Preprint] [n=1502, citations=NA]. Results showed a non-significant interaction between stereotype activation and gender on political knowledge scores.
    • Original effect size: partial η2 = 0.33.
    • Replication effect size: partial η2 = 0.00.

  • Increase in narcissism (leadership, vanity, entitlement) in young people over the last thirty years. It’s an ancient hypothesis. The basic counterargument is that they’re misidentifying an age effect as a cohort effect: the narcissism construct apparently decreases by about a standard deviation between adolescence and retirement, so “every generation is Generation Me”.

    • Status: not replicated
    • Original paper: ‘The Evidence for Generation Me and Against Generation We’, Twenge 2013, review of various studies, including national surveys [citations=251 (GS, March 2022)].
    • Critiques: Donnellan and Trzesniewski [k = 5, n=477,380, citations = 432 (GS, March 2022)]. Arnett 2013 [unsystematic review, citations=171 (GS, March 2022)]. Roberts 2017 [reanalysis of original data and analysis of new sample n = 476, citations=195 (GS, March 2022)]. Wetzel 2017 [1990s: n = 1,166; 2000s: n = 33,647; 2010s: n = 25,412, citations=101 (GS, March 2022)] (~660 total citations). Meta-analysis: Hamamura et al. 2020 [total n = 24,990, citations = 5 (GS, March 2022)].
    • Original effect size: d=0.37 increase in NPI scores (1980-2010), n=49,000.
    • Replication effect size: Roberts doesn’t give a d, but it’s near 0: something like d=0.03 ((15.65 - 15.44) / 6.59). Wetzel: d = -0.27 (1990 - 2010). Hamamura: d(leadership) = -0.26, d(vanity) = -0.39, d(entitlement) = -0.23.
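
The rough figure given for Roberts is a standardized mean difference: the gap between two mean scores divided by a standard deviation. A minimal sketch of that arithmetic, assuming (as the parenthetical suggests) that 15.65 and 15.44 are the two sample means and 6.59 is the standard deviation used as the denominator. The function name is illustrative only.

```python
def standardized_mean_difference(mean_a: float, mean_b: float, sd: float) -> float:
    """Cohen's-d-style effect size: difference in means in units of one SD."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (mean_a - mean_b) / sd


# Values quoted in the entry; their roles (two means, one SD) are assumed.
d_roberts = standardized_mean_difference(15.65, 15.44, 6.59)
print(f"d = {d_roberts:.2f}")  # prints "d = 0.03"
```

A difference of 0.21 scale points against a spread of 6.59 is why the entry treats this estimate as effectively zero next to the original d=0.37.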

  • Minimal group effect (MGE), alt-term = Minimal group paradigm. Ingroup bias (i.e., the tendency to prefer ingroup members) that appears when participants are assigned to previously unfamiliar, experimentally created and largely meaningless social identities.

    • Status: replicated
    • Original paper: Experiments in intergroup discrimination by Tajfel (1970), postulated that… The finding was confirmed in several meta-analytic studies (Mullen, Brown & Smith, 1992).
    • Critiques: Related to the cultural ubiquity of MGE. Studies by Kerr, Ao, Hogg, & Zhang, 2018 (comparing US and Australia), and Falk, Heine, & Takemura, 2014, emphasized the cultural variation of MGE.

  • Solomon Asch’s conformity study. The study investigated the degree to which a person’s own opinions are influenced by those of a group. The original study is regarded as credible and the main effect has been confirmed multiple times in many cultural contexts (see Bond and Smith, 1996). Nevertheless, the main effect of the original study has been widely misinterpreted and incorrectly referred to in both academic and popular literature.

    • Status: reversed
    • Original paper: ‘Studies of independence and conformity: I. A minority of one against a unanimous majority’, Asch, 1956; n = 123 [citations = 6558, GS, October 2021].
    • Critiques: Friend et al., 1990 [citations = 156, GS, November 2021]; Griggs, 2015 [citations = 12, GS, November 2021].
    • Original effect size: 36.8% of the responses were incorrect (influenced by the majority). The effect has been interpreted by the author as evidence for the prevalence of independence (“The preponderance of judgments was independent, evidence that under the present conditions the force of the perceived data far exceeded that of the majority.”, Asch, 1956, p.24). Nevertheless, the majority of academic textbooks present the study as evidence for overwhelming conformity, failing to report the evidence of independent tendencies among participants (see: Friend et al., 1990, Griggs, 2015). A common practice seen in many academic textbooks and popular writings is to report the value of “75%” or “76%” as the general indicator of conformity. In reality, this is the fraction of respondents who yielded to the majority in at least one of the twelve trials. The reversal of this value (rarely mentioned in the literature) would be 24% - a fraction of completely independent respondents or 95% - a fraction of respondents who remain independent in at least one of twelve trials.
    • Replication effect size: Bond and Smith, 1996: d = .92, 95%CI[.89-.96], average rate of incorrect answers: 25%.

  • Dynamic norms. Information about increasing minority norms increases interest/engagement in minority behaviour.​

  • Social comparison. No robust evidence for an interaction effect between body dissatisfaction and social comparison on fat talk.

  • Bystander effect. Claims that the feeling of responsibility diffuses with an increasing number of other observers. Research about the bystander effect was sparked by the 1964 murder of Catherine “Kitty” Genovese. See this New York Times article for details. Here’s a more detailed resource.

  • Color red on attractiveness. Viewing the color red enhances men’s attraction to women. In a lingua franca this effect may reflect the amorous meaning in the human mating game. ​

    • Status: Mixed
    • Original paper: ‘Romantic red: Red enhances men’s attraction to women’, Elliot and Niesta (2008); experiment, N = 42 [citations=66 (GS, February 2022)].
    • Critiques: Peperkoorn et al. (2016) [n=830, citations=48 (GS, February 2022)]. ​
    • Original effect size: Cohen’s d = .66 to ES = X​.
    • Replication effect size: Peperkoorn et al. (2016; study 1): partial η2 = .03 (in support of white more attractive than red). Peperkoorn et al. (2016; study 2): F = .07. Peperkoorn et al. (2016; study 3): d = −.12.

  • Big brother effect. An original study reported that being watched makes someone more likely to cooperate. People who were watched by a pair of eyes (even a picture of eyes rather than a real person) were three times more likely to contribute to an honesty box used to collect money for drinks than participants who instead saw a picture of flowers, but later meta-analyses using very large sample sizes did not find this result.

    • Status: not replicated
    • Original paper: ‘Cues of being watched enhance co-operation in a real-world setting’, Bateson et al., 2006; experimental design, n=48 [citations = 1604 (Google Scholar, Dec 2021)].
    • Critiques: Carbon & Hesslinger, 2006 [n=138, citations=52 (Google scholar, December 2021)], Northover et al., 2017 [1st meta-analysis total n=2700, 2nd meta-analysis total n=20,000, citations=135 (Google scholar, December 2021)].
    • Original effect size: d=1.948.
    • Replication effect size: Northover et al., 1st meta-analysis: g=03. Northover et al., 2nd meta-analysis: g=0.13.

  • Decoy effect

  • Imagined Contact: the claim that imagining contact (instead of having actual contact) with someone from an outgroup (based on e.g., ethnicity, sexuality, religion, age) can increase contact intention.

    • Status: mixed
    • Original paper: Turner, Crisp and Lambert, 2007; three experiments, n = [28, 24, 27]. [citations = 263, Web of Knowledge, 02/2022]
    • Critiques: Husnu & Crisp, 2010 [n = [study 1, 33; study 2, 60; study 3, different effect, not examined here], citations = 129, Web of Knowledge, 02/2022]; Klein et al., 2014 [n = 6344, citations = 540, Web of Knowledge, 02/2022]
    • Original effect size: [ηp² = 0.15, ηp² = 0.20, d= 0.86] (as calculated for this entry, using Lakens’ tool).
    • Replication effect size: Husnu & Crisp, study 1, d= 0.86 (CI = [0.14; 1.57] as per Klein et al.), study 2, d= 1.13; Klein et al., d= 0.13, CI = [0.00;0.19].
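
The wide interval reported for Husnu & Crisp study 1 follows largely from its small sample. A hedged sketch using the common large-sample approximation for the standard error of Cohen's d. The 16/17 split of the 33 participants is an assumption made only for illustration, and the formula is a textbook approximation, not necessarily the method Klein et al. used.

```python
import math


def cohens_d_ci(d: float, n1: int, n2: int, z: float = 1.959964):
    """Approximate 95% CI for Cohen's d from two independent groups.

    Uses the large-sample standard error
    sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))).
    """
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se


# d = 0.86 and n = 33 come from the entry; the 16/17 group split is assumed.
low, high = cohens_d_ci(0.86, 16, 17)
print(f"approximate 95% CI: [{low:.2f}; {high:.2f}]")
```

Under these assumptions the interval runs from roughly 0.15 to 1.57, close to the [0.14; 1.57] quoted above, which shows how little a study of this size constrains the true effect.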

  • Complementarity (opposites attract)

  • Stereotype susceptibility effects. Awareness of stereotypes about a person’s in-group can affect a person’s behavior and performance when they complete a stereotype-relevant task.

Positive Psychology

  • Power pose. Taking on a power pose lowers cortisol and risk tolerance, while it raises testosterone and feelings of power.

    • Status: not replicated
    • Original paper: ‘Power Posing: Brief Nonverbal Displays Affect Neuroendocrine Levels and Risk Tolerance’, Carney et al. (2010), n=42 mixed sexes [citations = 1450 (GS, April 2022)].
    • Critiques: Garrison et al. (2016) [n=305, citations = 70 (GS, April 2022)]; Metzler and Grezes (2019) [n = 82 men, citations = 3 (GS, April 2022)]; Ranehill (2015) [total n=200, citations = 291 (GS, April 2022)]; Ronay (2017) [n=108, citations = 38 (GS, April 2022)].
    • Original effect sizes: Φ = 0.30 in risk-taking from Carney et al. (2010). Sources unknown: d = -0.30 for cortisol, d = 0.35 for testosterone, d = 0.79 for feelings of power.
    • Replication effect size: Garrison et al. (2016): feelings of power: ηp2 = .016; Metzler and Grezes (2019): cortisol: ηp2 = 0.02, testosterone: ηp2 = 0.01; Ranehill (2015): cortisol: d = -0.157, feelings of power: d = 0.34, risk taking: d = -0.176, testosterone: d = -0.200; Ronay (2017): cortisol: d = 0.034, feelings of power: d = 0.226, testosterone: d = 0.121.
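
The replication entries above mix Cohen's d with partial eta squared, which makes them hard to compare with the original d values. A rough sketch of one standard conversion, assuming a simple contrast between two equal-sized groups (Cohen's f = sqrt(η² / (1 − η²)) and d = 2f). Whether that assumption fits the design of each replication is not established here, so treat the output as an order-of-magnitude guide only.

```python
import math


def eta_squared_to_d(eta_sq: float) -> float:
    """Convert (partial) eta squared to Cohen's d for a two-group contrast.

    Cohen's f = sqrt(eta_sq / (1 - eta_sq)); with two equal groups d = 2 * f.
    """
    if not 0 <= eta_sq < 1:
        raise ValueError("eta_sq must be in [0, 1)")
    return 2 * math.sqrt(eta_sq / (1 - eta_sq))


# Garrison et al. (2016), feelings of power: partial eta squared = .016
d_garrison = eta_squared_to_d(0.016)
print(f"d is roughly {d_garrison:.2f}")
```

On this reading the Garrison estimate for feelings of power is about a quarter of a standard deviation, well below the d = 0.79 listed as the original effect.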

  • Facial Feedback. Smiling causes a good mood, while pouting produces a bad mood.

    Study | Publication status | N | d
    Andréasson & Dimberg (2008) | published | 112 | -0.22
    Andréasson (2010) Study 3 | unpublished | 48 | -0.05
    Andréasson (2010) Study 3 | unpublished | 48 | -0.35
    Andréasson (2010) Study 4 | unpublished | 44 | 0.49
    Andréasson (2010) Study 4 | unpublished | 44 | 0.31
    Baumeister et al. (2016) | published | 10 | 1.26
    Baumeister et al. (2016) | published | 10 | 0.63
    Bodenhausen et al. (1994) | published | 51 | 0.55
    Bush et al. (1989) | published | 69 | 0.16
    Butler et al. (2003) Study 1 | published | 24 | -0.1
    Butler et al. (2003) Study 2 | published | 42 | -0.83
    Butler et al. (2006) | published | 69 | -0.03
    Cai et al. (2016) | published | 68 | -0.08
    Ceschi & Scherer (2003) | published | 64 | 0.74
    Clapp (2012) | unpublished | 99 | 0.69
    Clapp (2012) | unpublished | 93 | 0.08
    Clapp (2012) | unpublished | 93 | 0.17
    Clapp (2012) | unpublished | 99 | 0.27
    Laird & Crosby (1974) Study 1 | published | 26 | -0.13
    Laird & Crosby (1974) Study 2 | published | 26 | 0.35
    Davey et al. (2013) Study 1 | published | 28 | 0.41
    Davey et al. (2013) Study 1 | published | 14 | 0.62
    Davey et al. (2013) Study 1 | published | 28 | 0.52
    Davey et al. (2013) Study 1 | published | 14 | 0.13
    Davey et al. (2013) Study 1 | published | 28 | 0.69
    Davey et al. (2013) Study 1 | published | 14 | 0.42
    Davey et al. (2013) Study 1 | published | 28 | 0.35
    Davey et al. (2013) Study 1 | published | 14 | 0.14
    Davey et al. (2013) Study 2 | published | 29 | 0.73
    Davey et al. (2013) Study 2 | published | 15 | 0.63
    Davey et al. (2013) Study 2 | published | 29 | 0.4
    Davey et al. (2013) Study 2 | published | 15 | 0
    Davey et al. (2013) Study 2 | published | 29 | 0.08
    Davey et al. (2013) Study 2 | published | 15 | -0.25
    Davey et al. (2013) Study 2 | published | 29 | 0.03
    Davey et al. (2013) Study 2 | published | 15 | -0.06
    Davis (2008) Study 1 | unpublished | 28 | 0.99
    Davis (2008) Study 1 | unpublished | 28 | 0.87
    Davis (2008) Study 2 | unpublished | 31 | 0.26
    Davis (2008) Study 2 | unpublished | 30 | -0.19
    Davis et al. (2009) | published | 69 | 0.07
    Davis et al. (2009) | published | 69 | 0.51
    Davis et al. (2010) | published | 68 | 0.1
    Davis et al. (2010) | published | 68 | 0.05
    Davis et al. (2010) | published | 68 | -0.15
    Davis et al. (2015) | published | 18 | -0.16
    Demaree et al. (2004) | published | 53 | 0.62
    Demaree et al. (2004) | published | 50 | 0.16
    Demaree et al. (2006) | published | 32 | -0.64
    Demaree et al. (2006) | published | 35 | 0.06
    Demaree et al. (2006) | published | 37 | -0.38
    Dillon et al. (2007) | published | 36 | 0.11
    Dimberg & Söderkvist (2011) Study 1 | published | 48 | 0.51
    Dimberg & Söderkvist (2011) Study 2 | published | 96 | 0.1
    Dimberg & Söderkvist (2011) Study 2 | published | 96 | 0.32
    Dimberg & Söderkvist (2011) Study 3 | published | 61 | 0.06
    Dimberg & Söderkvist (2011) Study 3 | published | 61 | 0.31
    Dimberg & Söderkvist (2011) Study 3 | published | 61 | 0.34
    Duncan & Laird (1977) | published | 31 | 0.44
    Duncan & Laird (1977) | published | 31 | 0.38
    Duncan & Laird (1977) | published | 31 | 0.51
    Duncan & Laird (1980) | published | 60 | 0.59
    Duncan & Laird (1980) | published | 60 | 0.44
    Dzokoto et al. (2014) | published | 70 | 1.02
    Dzokoto et al. (2014) | published | 59 | 0.07
    Dzokoto et al. (2014) | published | 35 | 1.07
    Dzokoto et al. (2014) | published | 51 | 0.2
    Flack, Laird & Cavallaro (1999b) Study 1 | published | 60 | 1.2
    Flack, Laird & Cavallaro (1999b) Study 1 | published | 60 | 0.7
    Flack, Laird & Cavallaro (1999b) Study 1 | published | 60 | 0.31
    Flack, Laird & Cavallaro (1999b) Study 1 | published | 60 | 0.86
    Flack, Laird & Cavallaro (1999b) Study 1 | published | 60 | 1.31
    Flack, Laird & Cavallaro (1999b) Study 2 | published | 29 | 0.39
    Flack, Laird & Cavallaro (1999b) Study 2 | published | 29 | 0.23
    Flack, Laird & Cavallaro (1999b) Study 2 | published | 29 | -0.16
    Flack, Laird & Cavallaro (1999b) Study 2 | published | 29 | -0.49
    Flack, Laird & Cavallaro (1999b) Study 2 | published | 29 | 0.25
    Flack, Laird & Cavallaro (1999a) | published | 54 | 1.41
    Flack, Laird & Cavallaro (1999a) | published | 54 | 0.29
    Flack, Laird & Cavallaro (1999a) | published | 54 | 1.18
    Flack, Laird & Cavallaro (1999a) | published | 54 | 1.21
    Flack (2006) | published | 51 | 0.72
    Flack (2006) | published | 51 | 0.35
    Flack (2006) | published | 51 | 0.59
    Flack (2006) | published | 51 | 0.68
    Gan et al. (2015) | published | 34 | -0.11
    Goldin et al. (2008) | published | 17 | 0.8
    Gross & Levenson (1993) | published | 85 | 0.04
    Gross & Levenson (1997) | published | 180 | 0.37
    Gross & Levenson (1997) | published | 180 | 0.16
    Gross (1993) | unpublished | 180 | 0.37
    Gross (1993) | unpublished | 180 | 0.09
    Gross (1993) | unpublished | 180 | 0.2
    Gross (1993) | unpublished | 180 | 0.16
    Gross (1993) | unpublished | 180 | -0.23
    Gross (1998) | published | 80 | 0.18
    Harris (2001) | published | 36 | 0.07
    Hawk et al. (2012) | published | 41 | 0.85
    Helt & Fein (2016) | published | 43 | 0.42
    Hendricks & Buchanan (2016) | published | 79 | -0.08
    Hendricks (2013) | unpublished | 79 | 0.02
    Henry et al. (2007) | published | 30 | -0.49
    Henry et al. (2007) | published | 30 | 0.25
    Henry et al. (2009)a | published | 26 | -0.05
    Henry et al. (2009)a | published | 26 | 0.53
    Henry et al. (2009)b | published | 20 | -0.05
    Henry et al. (2009)b | published | 20 | 0.48
    Hess et al. (1992) | published | 28 | -0.28
    Hess et al. (1992) | published | 28 | 0.14
    Hess et al. (1992) | published | 28 | -0.26
    Hess et al. (1992) | published | 28 | -0.16
    Hofmann et al. (2009) | published | 134 | -0.03
    Ito et al. (2006) | published | 40 | -0.39
    Ito et al. (2006) | published | 33 | -0.25
    Kalokerinos et al. (2015) Study 1 | published | 133.67b | -0.06
    Kalokerinos et al. (2015) Study 1 | published | 133.67b | -0.02
    Kalokerinos et al. (2015) Study 2 | published | 295 | 1.32
    Kalokerinos et al. (2015) Study 2 | published | 295 | 0.2
    Kao et al. (2017) | published | 41 | 0.09
    Kao et al. (2017) | published | 41 | -0.39
    Kao et al. (2017) | published | 41 | 0.8
    Kao et al. (2017) | published | 41 | -0.34
    Kao et al. (2017) | published | 41 | 0.98
    Kao et al. (2017) | published | 41 | -0.67
    Kircher et al. (2012) | published | 27 | 1.89
    Kircher et al. (2012) | published | 27 | 1.14
    Korb et al. (2012) | published | 22 | 0.21
    Labott & Teleha (1996) | published | 19 | 0.04
    Labott & Teleha (1996) | published | 16 | 0.91
    Laird (1974) Study 1 | published | 38 | 0.46
    Laird (1974) Study 1 | published | 38 | 0.44
    Laird (1974) Study 1 | published | 38 | 0.39
    Laird (1974) Study 2 | published | 26 | 0.55
    Laird (1974) Study 2 | published | 26 | 0.13
    Lalot et al. (2014) | published | 45 | -0.17
    Larsen et al. (1992) | published | 27 | 0.43
    Lee (2011) | unpublished | 52 | 0.48
    Lee (2011) | unpublished | 44 | 0.17
    Lee (2011) | unpublished | 52 | -0.27
    Lee (2011) | unpublished | 44 | -0.26
    Lewis & Bowler (2009) | published | 25 | 1.35
    Lewis (2012) | published | 24 | 0.71
    Lewis (2012) | published | 24 | 0.56
    Ma (2011) | unpublished | 42.67b | -0.21
    Ma (2011) | unpublished | 42.67b | -0.21
    Ma (2011) | unpublished | 42.67b | -0.21
    Ma (2011) | unpublished | 42.67b | -0.21
    Maldonado et al. (2015) | unpublished | 157.33b | 0.12
    Marmolejo-Ramos & Dunn (2013) Study 1 | published | 100 | -0.07
    Marmolejo-Ramos & Dunn (2013) Study 2 | published | 106 | -0.07
    Marmolejo-Ramos & Dunn (2013) Study 3 | published | 104 | -0.07
    Marmolejo-Ramos & Dunn (2013) Study 4 | published | 100 | -0.07
    Marmolejo-Ramos & Dunn (2013) Study 5 | published | 66 | 0.27
    Marmolejo-Ramos & Dunn (2013) Study 6 | published | 67 | 0.38
    Martijn et al. (2002) | published | 33 | -0.24
    McCanne & Anderson (1987) | published | 30 | -2.16
    McCanne & Anderson (1987) | published | 30 | -2.07
    McCanne & Anderson (1987) | published | 30 | 4.73
    McCanne & Anderson (1987) | published | 30 | 1.67
    McCanne & Anderson (1987) | published | 30 | 2.48
    McCanne & Anderson (1987) | published | 30 | -0.25
    McCaul et al. (1982) | published | 27 | 0.25
    McIntosh et al. (1997) | published | 26 | 0.54
    Meeten et al. (2015) | published | 71 | 0.49
    Miyamoto (2006) Study 1 | unpublished | 40 | 0.17
    Miyamoto (2006) Study 1 | unpublished | 40 | 0.53
    Miyamoto (2006) Study 2 | unpublished | 77 | 0.49
    Moore & Zoellner (2012) | published | 23.33b | -0.87
    Kappas (1989) | unpublished | 32 | 0.08
    Kappas (1989) | unpublished | 32 | 0.26
    Kappas (1989) | unpublished | 32 | 0.27
    Kappas (1989) | unpublished | 32 | 0.1
    Kappas (1989) | unpublished | 32 | 0.17
    Kappas (1989) | unpublished | 32 | 0.52
    Kappas (1989) | unpublished | 32 | 0.62
    Kappas (1989) | unpublished | 32 | 0.74
    Kappas (1989) | unpublished | 32 | 0.18
    Kappas (1989) | unpublished | 32 | 0.42
    Ohira & Kurono (1993) Study 1 | published | 20 | 1.23
    Ohira & Kurono (1993) Study 1 | published | 20 | 0.31
    Ohira & Kurono (1993) Study 2 | published | 20 | 1.61
    Ohira & Kurono (1993) Study 2 | published | 20 | -1.38
    Paredes et al. (2013) | published | 31 | 0.85
    Paul et al. (2013) | published | 20 | 0.91
    Pedder et al. (2016) | published | 68 | 0.7
    Pedder et al. (2016) | published | 68 | 0.22
    Phillips et al. (2008) | published | 32 | 0.18
    Phillips et al. (2008) | published | 32 | 0.08
    Reisenzein & Studtmann (2007) Study 1 | published | 53 | 0.18
    Reisenzein & Studtmann (2007) Study 1 | published | 55 | 0.34
    Reisenzein & Studtmann (2007) Study 1 | published | 55 | -0.08
    Reisenzein & Studtmann (2007) Study 1 | published | 55 | 0.3
    Reisenzein & Studtmann (2007) Study 1 | published | 53 | -0.12
    Reisenzein & Studtmann (2007) Study 1 | published | 53 | 0.22
    Reisenzein & Studtmann (2007) Study 1 | published | 52 | -0.04
    Reisenzein & Studtmann (2007) Study 1 | published | 52 | -0.09
    Reisenzein & Studtmann (2007) Study 3 | published | 40 | -0.74
    Richards, Butler & Gross (2003) | published | 59 | 0.19
    Richards, Butler & Gross (2003) | published | 59 | -0.12
    Richards & Gross (1999) Study 1 | published | 58 | -0.1
    Richards & Gross (1999) Study 1 | published | 58 | 0.25
    Richards & Gross (1999) Study 1 | published | 58 | 0.36
    Richards & Gross (1999) Study 2 | published | 85 | 0.13
    Richards & Gross (1999) Study 2 | published | 85 | 0.24
    Richards & Gross (1999) Study 2 | published | 85 | 0.06
    Richards & Gross (2000) Study 1 | published | 53 | -0.12
    Richards & Gross (2000) Study 2 | published | 61 | 0.39
    Richards & Gross (2006) | published | 131 | 0.34
    Roberts et al. (2008) | published | 160 | 0.07
    Robinson & Demaree (2009) | published | 102 | -0.04
    Robinson & Demaree (2009) | published | 102 | 0.03
    Robinson & Demaree (2009) | published | 102 | 0
    Robinson & Demaree (2009) | published | 102 | 0
    Roemer (2014) | unpublished | 44 | 0.58
    Roemer (2014) | unpublished | 44 | 0.29
    Rohrmann et al. (2009) | published | 36 | 0.16
    Rohrmann et al. (2009) | published | 36 | 0.13
    Rummer et al. (2014) | published | 74 | 0.57
    Rummer et al. (2014) | published | 74 | 0.46
    Schmeichel, Vohs, & Baumeister (2003) | published | 37 | -0.23
    Schmeichel et al. (2008) | published | 50 | 0.1
    Söderkvist & Dimberg (unpublished) | unpublished | 32 | 0.36
    Söderkvist et al. (2018) Study 1a | unpublished | 32 | 0.34
    Söderkvist et al. (2018) Study 2a | unpublished | 64 | 0.17
    Soussignan (2002) | published | 33 | -0.17
    Soussignan (2002) | published | 33 | 0.48
    Soussignan (2002) | published | 33 | 0.47
    Soussignan (2002) | published | 33 | 0.44
    Soussignan (2002) | published | 32 | 0.53
    Soussignan (2002) | published | 32 | 1.1
    Soussignan (2002) | published | 32 | 1.11
    Soussignan (2002) | published | 32 | 0.94
    Stel et al. (2008) Study 2 | published | 18.67b | 1.11
    Stel et al. (2008) Study 3 | published | 24 | 1
    Strack et al. (1988) Study 1 | published | 76.67b | 0.43
    Strack et al. (1988) Study 2 | published | 83 | -0.15
    Strack et al. (1988) Study 2 | published | 41.5 | 0.55
    Strack et al. (1988) Study 2 | published | 41.5 | -0.51
    Tamir et al. (2004) | published | 72 | -0.16
    Tourangeau & Ellsworth (1979) | published | 20.5b | 0.3
    Tourangeau & Ellsworth (1979) | published | 20.5b | 0.3
    Tourangeau & Ellsworth (1979) | published | 20.5b | 0.3
    Tourangeau & Ellsworth (1979) | published | 20.5b | 0.3
    Trent (2010) | unpublished | 107.33b | -0.22
    Trent (2010) | unpublished | 107.33b | -0.22
    Trent (2010) | unpublished | 107.33b | -0.06
    Trent (2010) | unpublished | 107.33b | -0.06
    Vieillard et al. (2015) | published | 31 | 0.25
    Vieillard et al. (2015) | published | 31 | 0.66
    Vieillard et al. (2015) | published | 30 | 0.21
    Vieillard et al. (2015) | published | 30 | 0.14
    Vieillard et al. (2015) | published | 31 | -0.05
    Vieillard et al. (2015) | published | 31 | -0.5
    Vieillard et al. (2015) | published | 30 | 0.07
    Vieillard et al. (2015) | published | 30 | -0.12
    Wagenmakers et al. (2016) Albohn site | published | 139 | 0.09
    Wagenmakers et al. (2016) Allard site | published | 125 | 0.09
    Wagenmakers et al. (2016) Benning site | published | 115 | -0.01
    Wagenmakers et al. (2016) Bulnes site | published | 101 | 0.09
    Wagenmakers et al. (2016) Capaldi site | published | 117 | -0.07
    Wagenmakers et al. (2016) Chasten site | published | 94 | -0.04
    Wagenmakers et al. (2016) Holmes site | published | 99 | 0.15
    Wagenmakers et al. (2016) Koch site | published | 100 | -0.14
    Wagenmakers et al. (2016) Korb site | published | 101 | 0.01
    Wagenmakers et al. (2016) Lynott site | published | 126 | 0.23
    Wagenmakers et al. (2016) Oosterwijk site | published | 110 | -0.17
    Wagenmakers et al. (2016) Ozdogru site | published | 87 | -0.3
    Wagenmakers et al. (2016) Pacheco-Unguetti site | published | 120 | -0.08
    Wagenmakers et al. (2016) Talarico site | published | 112 | 0.02
    Wagenmakers et al. (2016) Wagenmakers site | published | 130 | 0.13
    Wagenmakers et al. (2016) Wayand site | published | 110 | -0.14
    Wagenmakers et al. (2016) Zeelenberg site | published | 108 | 0.25
    Wittmer (1985) | unpublished | 30 | -0.36
    Wittmer (1985) | unpublished | 30 | -0.21
    Yartz (2004) | unpublished | 28 | -0.05
    Yartz (2004) | unpublished | 30 | -0.18
    Yartz (2004) | unpublished | 28 | -0.08
    Yartz (2004) | unpublished | 30 | -0.09
    Yartz (2004) | unpublished | 28 | 0.04
    Yartz (2004) | unpublished | 30 | 0.5
    Zajonc et al. (1989) Study 3 | published | 37 | 1.27
    Zajonc et al. (1989) Study 4 | published | 26 | 0.47
    Zajonc et al. (1989) Study 4 | published | 26 | 0.31
    Zariffa et al. (2014) | published | 24 | -0.57
    Zariffa et al. (2014) | published | 24 | -0.14
    Zhu et al. (2015) | published | 55 | 1.74
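
One simple way to summarise a block of rows from the facial feedback table is a sample-size-weighted mean of d. The sketch below does this for the 17 Wagenmakers et al. (2016) sites only. Reading each row as an N followed by a d is an assumption of this sketch, and weighting by N is a simplification: a proper meta-analysis would normally use inverse-variance weights and a random-effects model.

```python
# (site, N, d) for the Wagenmakers et al. (2016) rows of the table.
# The split of each row into N and d is assumed, not confirmed by the source.
sites = [
    ("Albohn", 139, 0.09), ("Allard", 125, 0.09), ("Benning", 115, -0.01),
    ("Bulnes", 101, 0.09), ("Capaldi", 117, -0.07), ("Chasten", 94, -0.04),
    ("Holmes", 99, 0.15), ("Koch", 100, -0.14), ("Korb", 101, 0.01),
    ("Lynott", 126, 0.23), ("Oosterwijk", 110, -0.17), ("Ozdogru", 87, -0.3),
    ("Pacheco-Unguetti", 120, -0.08), ("Talarico", 112, 0.02),
    ("Wagenmakers", 130, 0.13), ("Wayand", 110, -0.14), ("Zeelenberg", 108, 0.25),
]


def weighted_mean_d(rows):
    """Mean of d across rows, weighting each row by its sample size."""
    total = sum(n for _, n, _ in rows)
    return sum(n * d for _, n, d in rows) / total


total_n = sum(n for _, n, _ in sites)
pooled_d = weighted_mean_d(sites)
print(f"total N = {total_n}, weighted mean d = {pooled_d:.3f}")
```

Across these sites the weighted mean is close to zero (about 0.01 on 1,894 participants), in contrast to the many small studies in the table that report d above 0.5.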

  • No good evidence for Blue Monday, that the third week in January is the peak of depression or low affect ‘as measured by a simple mathematical formula developed on behalf of Sky Travel’. You’d need a huge sample size, in the thousands, to detect the effect reliably and this has never been done.

  • Reason to be cautious about mindfulness for mental health. Most studies are low quality and use inconsistent designs, there’s higher heterogeneity than other mental health treatments, and there’s strong reason to suspect reporting bias (see also this relevant review; Van Dam, 2017). None of the 36 meta-analyses before 2016 mentioned publication bias.

  • Mindfulness for wellbeing - effectiveness of MBIs on wellbeing

  • Mindfulness and mental health - Correlates of mindfulness and mental health

  • Mindfulness and wellbeing - Correlates of mindfulness and well-being

Cognitive Psychology

  • Ego depletion, that willpower is limited in a muscle-like fashion.

    • Original paper: ‘Ego Depletion: Is the Active Self a Limited Resource?’, Baumeister 1998, n=67 (~5700 citations)
    • Critiques: Hagger 2016, 23 independent conceptual replications
      (total citations: ~6
    • Critique: Vohs et al. 2021, multisite project, n = 3,531 over 36 sites.
    • Original effect size: something like d = -1.96 between control and worst condition. (I hope I’m calculating that wrong.)
    • Replication effect size: d = 0.04 [−0.07, 0.14]. (NB: not testing the construct the same way.)
    • Replication effect size (Vohs et al. 2021): d = 0.06.

  • Dunning-Kruger effect. No evidence for the “Mount Stupid” misinterpretation.

    • Original paper: ‘Unskilled and unaware of it: how difficulties in recognizing one’s own incompetence lead to inflated self-assessments.’, Dunning & Kruger 1999, n=334 undergrads. This contains claims (1), (2), and (5) but no hint of (3) or (4). (~5660 citations)
    • Critiques: Gignac 2020, n=929; Nuhfer 2016 and Nuhfer 2017, n=1154; Luu 2015; Greenberg 2018, n=534; Yarkoni 2010.
      (total citations: ~20)
    • Original effect size: No sds reported so I don’t know. 2 of the 4 experiments showed a positive relationship between score and perceived ability; 2 showed no strong relationship. And the best performers tended to underestimate their performance. This replicates: the correlation between your IQ and your assessment of it is around r ≃ 0.3. (3) and (4) are not at all warranted.
      (5) is much shakier than (1). The original paper concedes that there’s a purely statistical explanation for (1): just that it is much easier to overestimate a low number which has a lower bound! And the converse: if I am a perfect performer, I am unable to overestimate myself. D&K just think there’s something notable left when you subtract this.
      It’s also confounded by (2)
    • Replication effect size (for claim 1): 3 of the 4 original studies can be explained by noisy tests, bounded scales, and artefacts in the plotting procedure. (“the primary drivers of errors in judging relative standing are general inaccuracy and overall biases tied to task difficulty”.) Only about 5% of low-performance people were very overconfident (more than 30% off) in the Nuhfer data
    • Gignac & Zajenkowski use IQ rather than task performance, and run two less-confounded tests, finding r = −0.05 between P and errors, and r = 0.02 for a quadratic relationship between self-described performance and actual performance
    • Jansen (2021) find independent support for claim 1 (n=3500) (the “performance-dependent estimation model”) and also argue for (5), since they find less evidence for an alternative explanation, Bayesian reasoning towards a prior of “I am mediocre”. (Fig 5b follows the original DK plot style, and is very unclear as a result.)
    • Muller (2020) replicate claim (1) and add some EEG stuff.
    • Some suggestions that claim (2) is WEIRD only.
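
The purely statistical explanation mentioned above (noisy tests and bounded scales) can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reanalysis of any cited dataset: true skill is standard normal, and both the test score and the self-estimate are just skill plus independent noise, so no simulated person is systematically over- or under-confident. All names and parameter values are hypothetical.

```python
import random
import statistics


def percentile_ranks(values):
    """Map each value to its percentile rank (0 to 100) within the list."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = 100.0 * position / (len(values) - 1)
    return ranks


def simulate_quartile_gaps(n=2000, noise_sd=1.0, seed=0):
    """Mean (perceived - actual) percentile for the bottom and top quartiles.

    Test score and self-estimate are both true skill plus independent noise,
    so any gap comes from noise and the bounded scale, not from bias.
    """
    rng = random.Random(seed)
    skill = [rng.gauss(0, 1) for _ in range(n)]
    test_score = [s + rng.gauss(0, noise_sd) for s in skill]
    self_estimate = [s + rng.gauss(0, noise_sd) for s in skill]

    actual = percentile_ranks(test_score)
    perceived = percentile_ranks(self_estimate)

    bottom = [p - a for a, p in zip(actual, perceived) if a < 25]
    top = [p - a for a, p in zip(actual, perceived) if a >= 75]
    return statistics.mean(bottom), statistics.mean(top)


bottom_gap, top_gap = simulate_quartile_gaps()
print(f"bottom quartile: {bottom_gap:+.1f} percentile points")
print(f"top quartile:    {top_gap:+.1f} percentile points")
```

The bottom quartile comes out "overestimating" and the top quartile "underestimating" even though nobody in the simulation is biased, which is the pattern the classic plot shows. What remains after subtracting this artefact is the part that needs a psychological explanation.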

  • Depressive realism effect, of increased predictive accuracy or decreased cognitive bias among the clinically depressed.

    • Original paper: ‘Judgment of contingency in depressed and nondepressed students: sadder but wiser?’, 1979 (2450 citations).
    • Critiques: Moore & Fresco 2012
      (211 total citations)
    • Original effect size: d= -0.32 for bias about ‘contingency’, how much the outcome actually depends on what you do,
      n=96 students, needlessly binarised into depressed and nondepressed based on Beck score > 9. (Why?)
    • Replication effect size: d = -0.07 with massive sd=0.46, n=7305, includes a trim-and-fill correction for publication bias. “Overall, however, both dysphoric/depressed individuals (d= .14) and nondysphoric/nondepressed individuals evidenced a substantial positive bias (d= .29)”

  • Hungry judge effect, of massively reduced acquittals (d=2) just before lunch. Case order isn’t independent of acquittal probability (“unrepresented prisoners usually go last and are less likely to be granted parole”); favourable cases may take predictably longer and so are pushed until after recess; effect size is implausible on priors; explanation involved ego depletion.

  • No good evidence for multiple intelligences (in the sense of statistically independent components of cognition). Gardner, the inventor: “Nor, indeed, have I carried out experiments designed to test the theory… I readily admit that the theory is no longer current. Several fields of knowledge have advanced significantly since the early 1980s.”

  • Generalized cognitive improvements following brief interventions. Cognitive improvements elicited by many interventions are not reliable, and their ecological validity remains limited (Moreau 2021). In general, be highly suspicious of anything that claims a positive permanent effect on adult or child IQ. Even in children the absolute maximum is 4-15 points for a powerful single intervention (iodine supplementation during pregnancy in deficient populations).

  • At most weak evidence for brain training, as in “far transfer” from daily computer training games to fluid intelligence in general, in particular from the Dual n-Back game.

    • Status: mixed
    • Original paper: ‘Improving fluid intelligence with training on working memory’, Jaeggi 2008, n=70. (2200 citations).
    • Critiques: Melby-Lervåg 2013 (meta-analysis of 23 studies), Gwern 2012 (meta-analysis of 45 studies).
    • Original effect size: d=0.4 over control, 1-2 days after training
    • Replication effect size: Melby: d=0.19 [0.03, 0.37] nonverbal; d=0.13 [-0.09, 0.34] verbal. Gwern: d=0.1397 [-0.0292, 0.3085], among studies using active controls. Reddick et al 2013 found “no positive transfer to any of the cognitive ability tests”.
    • Maybe some effect on non-Gf skills of the elderly.
      A 2020 RCT on 572 first-graders finds an effect (d=0.2 to 0.4), but many of the apparent far-transfer effects come only 6-12 months later, i.e. well past the end of most prior studies. Also, see Simons et al 2016 for a comprehensive, authoritative review. This sentence sums it up: “Based on this examination, we find extensive evidence that brain-training interventions improve performance on the trained tasks, less evidence that such interventions improve performance on closely related tasks, and little evidence that training enhances performance on distantly related tasks or that training improves everyday cognitive performance.”
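
A quick way to read the replication estimates in this entry is to check whether each bracketed interval excludes zero. The sketch below does that for the three intervals quoted above. Treating them as 95% confidence intervals is an assumption, since the entry does not state the level, and the labels are shorthand for this illustration.

```python
# (label, point estimate d, lower bound, upper bound) as quoted in the entry.
estimates = [
    ("Melby nonverbal", 0.19, 0.03, 0.37),
    ("Melby verbal", 0.13, -0.09, 0.34),
    ("Gwern active controls", 0.1397, -0.0292, 0.3085),
]


def excludes_zero(lower: float, upper: float) -> bool:
    """True when the whole interval lies on one side of zero."""
    return lower > 0 or upper < 0


results = {label: excludes_zero(lo, hi) for label, _, lo, hi in estimates}
for label, flag in results.items():
    print(f"{label}: interval excludes zero = {flag}")
```

Only the nonverbal estimate clears zero, and only barely, which fits the entry's "at most weak evidence" summary.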

  • Music lessons improve intelligence. An original experimental study found an increase in IQ for children who received a year of music lessons, compared to children who were randomly assigned to drama lessons or no lessons. Later studies did not replicate this when comparing music lessons to other forms of training such as dance lessons. Observational studies do suggest that adult musicians have enhanced cognitive abilities compared to non-musicians or even bilinguals (D’Souza et al., 2018), but this work is correlational. One possible reason for such positive effects is that the personality traits of people who self-select into long-term music training (e.g., conscientiousness) also drive them to success in other areas. This finding is a good case study of the problem of miscommunication of scientific research in the press (Mehr, 2015), because the original study has not been replicated in over five attempts, yet it continues to be cited and used by the media and educators.

    • Status: not replicated
    • Original paper: ‘Music lessons enhance IQ’, Schellenberg, 2004; randomized controlled trial, n=144. [citations = 1424 (Google Scholar, December 2021)].
    • Critiques: Mehr et al., 2013 [Study 1 n=29, Study 2 n=55, citations=52 (Google scholar, December 2021)], D’Souza & Wiseheart, 2018 [n=75, citations=20 (Google scholar, December 2021)].
    • Original effect size: d=1.948.
    • Replication effect size: Mehr et al., 2013: Wilks' λ = .851. D’Souza & Wiseheart: d=0.11 to d=0.55.

  • Brain training: video gaming enhances cognitive ability.

    • Status:
    • Original papers: Studies have reported enhanced performance in a range of abilities, including visual processing (Green & Bavelier, 2003, 2007), attention (Belchior et al., 2013), spatial ability (Goldstein et al., 1997; Okagaki & Frensch, 1994), and executive function (Basak, Boot, Voss, & Kramer, 2008; Green, Sugarman, Medford, Klobusicky, & Bavelier, 2012).
    • Critiques: Sala et al., 2016: “The first meta-analysis (k = 310) examined the correlation between video game skill and cognitive ability. The second meta-analysis (k = 315) dealt with the differences between video game players and nonplayers in cognitive ability. The third meta-analysis (k = 359) investigated the effects of video game training on participants’ cognitive ability. Small or null overall effect sizes were found in all three models.” “Importantly, we found no evidence of a causal relationship between playing video games and enhanced cognitive ability.”

  • Bilingual advantages in executive control. The popular hypothesis was that speaking two languages also improves general cognitive control processes (executive control). However, this was challenged by a growing body of systematic studies, which showed no bilingual advantage across different executive control tasks, or even a small bilingual disadvantage. The lack of an effect was even found in exact replications of the original tasks, especially as sample size increases (Paap et al., 2015), and after accounting for the main moderators proposed by the bilingualism literature (Gunnerud et al., 2020).

  • Mozart effect. Listening to Mozart’s sonata for two pianos in D major (KV 448) enhances performance on spatial tasks in standardized tests.

    • Status: not replicated
    • Original paper: ‘Music and spatial task performance’, Rauscher, Shaw, and Ky (1993) with n=36. [citations= 2110 (GS, November 2021)].
    • Critiques: Steele et al. (1999a) [n=86, citations=555 (GS, November 2021)], Steele et al. (1999) [n=206, citations=126 (GS, November 2021)], Meta-analysis: Pietschnig et al. (2010) [meta analysis: 39 studies, citations= 235 (GS, November 2021)]
    • The effect sizes are calculated in Pietschnig et al. (2010):
    • Original effect size: d= 1.5 [0.65, 2.35]
    • Replication effect size:
      • Adlmann (2006): d = 0.57 [0.25, 0.89]
      • Carstens (1998) Study 1: d = -0.22 [-0.89, 0.45]
      • Carstens (1998) Study 2: d = 0.47 [-0.23, 1.17]
      • Cooper (2004): d = 0.42 [-0.23, 1.08]
      • Flohr (1995) Study 1: d = 0.14 [-0.35, 0.63]
      • Flohr (1995) Study 2: d = 0.16 [-0.26, 0.58]
      • Gileta (2003) Study 1: d = 0.13 [-0.26, 0.51]
      • Gileta (2003) Study 2: d = -0.05 [-0.43, 0.34]
      • Ivanov (2003): d = 0.77 [0.20, 1.34]
      • Jones (2006): d = 0.92 [0.27, 1.56]
      • Jones (2007): d = 0.54 [0.11, 0.97]
      • Kenealy (1994): d = -0.22 [-1.08, 0.64]
      • Knell (2006): d = 0.45 [0.13, 0.77]
      • Lints (2003): d = -0.37 [-0.75, 0.02]
      • McClure (2004): d = 0.46 [-0.02, 0.95]
      • Nantais (1999) Study 1: d = 0.77 [-0.07, 1.61]
      • Nantais (1999) Study 2: d = 0.06 [-0.72, 0.84]
      • Rauscher and Hayes (1999): d = 0.52 [0.18, 0.86]
      • Rauscher and Ribar (1999) Study 1: d = 1.81 [1.24, 2.37]
      • Rauscher and Ribar (1999) Study 2: d = 0.93 [0.46, 1.39]
      • Rideout (1996): d = 1.54 [-0.67, 3.75]
      • Rideout (1997): d = 1.01 [0.19, 1.82]
      • Rideout (1998a): d = 1.01 [-0.21, 2.23]
      • Rideout (1998b): d = 0.28 [-1.04, 1.60]
      • Siegel (1999): d = 0.26 [-0.39, 0.91]
      • Spitzer (2003): d = 0.01 [-0.32, 0.33]
      • Steele et al.: d = 0.85 [0.41, 1.30]
      • Steele, Dalla Bella, et al. (1999a) Study 1: d = 0.49 [-0.01, 1.00]
      • Steele, Dalla Bella, et al. (1999a) Study 2: d = -0.41 [-1.15, 0.33]
      • Steele, Dalla Bella, et al. (1999b): d = 0.85 [0.41, 1.30]
      • Steele, Brown and Stoecker (1999): d = 0.20 [-0.08, 0.48]
      • Sweeny (2006) Study 1: d = -0.43 [-0.93, 0.07]
      • Sweeny (2006) Study 2: d = -0.06 [-0.56, 0.42]
      • Sweeny (2006) Study 3: d = 0.14 [-0.37, 0.65]
      • Twomey (2002): d = 0.63 [-0.01, 1.27]
      • Wells (1995): d = -0.18 [-0.83, 0.47]
      • Wilson (1997): d = 0.85 [-0.44, 2.13]
      • Pietschnig et al.: meta-analytic estimate: d = 0.37 [0.23, 0.52]
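
For readers who want to see how a list of per-study estimates turns into a single pooled number, the following is a minimal sketch of inverse-variance (fixed-effect) pooling. It assumes each interval is a symmetric 95% CI, uses only a handful of the studies listed for the Mozart effect, and is not the model Pietschnig et al. (2010) fitted, so it will not reproduce their estimate. The function names are illustrative.

```python
import math

# (label, d, ci_low, ci_high) copied from a few Mozart-effect entries;
# this subset is for illustration only, not the full meta-analysis.
studies = [
    ("Adlmann (2006)", 0.57, 0.25, 0.89),
    ("Cooper (2004)", 0.42, -0.23, 1.08),
    ("Ivanov (2003)", 0.77, 0.20, 1.34),
    ("Spitzer (2003)", 0.01, -0.32, 0.33),
    ("Wells (1995)", -0.18, -0.83, 0.47),
]


def se_from_ci(low, high, z=1.96):
    """Approximate the standard error from a symmetric 95% CI (assumption)."""
    return (high - low) / (2 * z)


def pool_fixed_effect(rows):
    """Return the inverse-variance weighted mean of d and its 95% CI."""
    weights = [1 / se_from_ci(lo, hi) ** 2 for _, _, lo, hi in rows]
    total = sum(weights)
    pooled = sum(w * d for w, (_, d, _, _) in zip(weights, rows)) / total
    se = math.sqrt(1 / total)
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se


pooled_d, low, high = pool_fixed_effect(studies)
print(f"pooled d = {pooled_d:.2f} [{low:.2f}, {high:.2f}]")
```

Precise studies (narrow intervals) dominate the weighted mean, which is why a few small studies with very large effects move the pooled estimate less than their point estimates suggest.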

  • Enhancing effects on IQ

  • Automatic imitation

  • Congruency sequence effect

  • Action-sentence Compatibility Effect (ACE). Participants’ movements are faster when the sentence meaning is consistent with the direction in which they move.

  • The attentional spatial-numerical association of response codes (Att-SNARC) effect is the finding that participants detect left-side targets faster when they are preceded by small numbers and right-side targets faster when they are preceded by large numbers. This finding triggered many assumptions about number representations being grounded in bodily experience.

  • Scarcity effect. Having too few resources leads individuals to misallocate attention, leading to consequences such as overborrowing. Study 1 examined whether scarcity causes greater cognitive fatigue, measured by poorer performance on a cognitive ability task.

    • Status: not replicated (reversal for the Shah et al. 2019 replication)
    • Original paper: ‘Some consequences of having too little’, Shah et al. 2012; 5 experiments with Study 1 n=60; Study 2 n=68; Study 3 n=143; Study 4 n=118; Study 5 n=137. Replication attempts on Study 1.
    • Critiques: Camerer et al. 2018 [n=619, citations=855(GS, November 2021)]; O’Donnell et al. 2021 [n=668, citations=0(GS, November 2021)]; Shah et al. 2019 [n=997, citations=19(GS, November 2021)]
    • Original effect size: r= .267
    • Replication effect size: Camerer et al. r= -.015; O’Donnell et al. r= -.039; Shah et al. η2 = .004

  • Scarcity effect - Meaning in life. Threats to people’s sense that they can afford the things they need in the present and foreseeable future undermine perceptions of meaning in life.

  • Scarcity effect - Discounting. A negative income shock was associated with increased discounting rates for gains and losses.

  • Scarcity effect - Physical pain. Higher economic insecurity is associated with greater physical pain.

  • Scarcity effect - Self expansion. Lower self-concept clarity (conceptualized as a finite resource) is associated with lower self-expansion.

  • Scarcity effect - Wellbeing. Imagining having less time available in one’s current city is positively associated with well-being.

  • Scarcity effect - Decision making. Lacking time or money can lead to making worse decisions.

  • Scarcity effect - Opportunity costs. Poor people are more likely to consider opportunity costs spontaneously.

  • Scarcity effect - Conscious thoughts. Thoughts triggered by financial concerns intrude into the consciousness of poorer individuals more often than into that of wealthier individuals.

  • Scarcity effect - Absoluteness of losses. Poorer individuals view losses in more absolute, rather than relative, terms than do wealthier individuals.

    • Status: not replicated
    • Original paper: ‘Scarcity frames value’, Shah et al. (2015) with n=73. [citation=315(GS, November 2021)]​.
    • Critiques: O’Donnell et al. 2021 [n=209, citations=0(GS, November 2021)]
    • Original effect size: r = .264
    • Replication effect size: r = .090

  • Bottomless soup bowl. Visual cues related to portion size increase intake volume of soup.

  • Simon effect. Faster responses are observed when the stimulus and response are on the same side than when the stimulus and response are on opposite sides.

    • Status: mixed
    • Original paper: ‘Choice reaction time as a function of angular stimulus-response correspondence and age’, Simon and Wolf 1963; with n1 = 20, n2 = 20. [citation=289(GS, June 2022)].
    • Critiques: Ehrenstein 1994 [n1=12, n2=14; citations=27(GS, June 2022)]; Marble & Proctor 2000 [n1=48; n2=20; n3=32, n4=80; citations=89(GS, June 2022)]; Proctor et al. 2000 [n1=64, n2=64; citations=74(GS, June 2022)]; Theeuwes et al. 2014 [n1=30, n2=30, n3=30, n4=30; citations=30(GS, June 2022)].
    • Original effect size: not reported but could be calculated.
    • Replication effect size: Ehrenstein: not reported but could be calculated; Marble and Proctor: not reported but could be calculated; Proctor et al.: not reported but could be calculated; Theeuwes et al.: ηp²(the compatible S-R instructions condition vs. the incompatible S-R instructions condition)=.12; ηp²(the compatible S-R instructions condition vs. the incompatible practiced S-R instructions condition)=.07; ηp²(the incompatible S-R instructions condition vs. the compatible S-R instructions condition)=.21; ηp²(the incompatible practiced S-R instructions condition vs. the compatible S-R instructions condition)=.11.

  • ERP and lie detectors

  • Evaluative conditioning

  • Bilingual deficit in lexical retrieval. Compared to monolinguals, bilinguals have often been found to be slower or less accurate in accessing the meaning of a certain word or the word for a certain representation under certain conditions.

    • Status: mixed
    • Original paper: ‘Memory in a monolingual mode: When are bilinguals at a disadvantage?’, Ransdell and Fischler, 1987; between-group multi-experiment study, with monolingual and bilingual young adults, n1 = 28, n2 = 28. [citations=216(GS, May 2022)]​.
    • Critiques: Bialystok et al. 2007 [study 1: n1=24, n2=24; study 2: n1=50, n2=16; citations=338(GS, May 2022)]; Gollan et al. 2002 [n1=30, n2=30; citations=584(GS, May 2022)]; Gollan et al. 2005 [study 1: n1=31, n2=31; study 2: n1=36, n2=36; citations=665(GS, May 2022)]; Rosselli et al. 2000 [n1=45, n2=18, n3=19; citations=341(GS, May 2022)]; Rosselli et al. 2002 [n1=45, n2=18, n3=19; citations=151(GS, May 2022)].
    • Original effect size: not reported but could be calculated.
    • Replication effect size: Bialystok et al.: not reported but could be calculated; Rosselli et al. 2000: not reported but could be calculated; Rosselli et al. 2002: not reported but could be calculated; Gollan et al. 2002: not reported but could be calculated; Gollan et al. 2005: not reported but could be calculated.

  • Visiting a place will cue memories

  • Spacing effect

  • False memories

  • Motor priming

  • Associative priming

  • Repetition priming

  • Shape simulation

  • Flanker task

  • Mere Exposure Effect. Participants who are repeatedly exposed to the same stimuli rate them more positively than stimuli that have not been presented before.

    • Status: replicated
    • Original paper: Zajonc, R. B. (1968). Attitudinal effects of mere exposure. Journal of Personality and Social Psychology, 9(2), 1–27 [citation=9458(GS, February 2022)]​.
    • Critiques: Bornstein (1989). Meta-Analysis, total N = 33047 [citation=2944(GS, February 2022)]
    • Original effect size: Experiment 1, Nonsense words [F(5,355) = 5.64, p < .001], Experiment 2, Chinese characters [F(5, 335) = 4.72, p < .001], Experiment 3, Photographs [F(5, 355) = 9.96, p < .001]
    • Replication effect size: Combined effect size r = .260 (Bornstein, 1989)
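
The original paper reports only F statistics, which are hard to compare across studies. As a rough sketch, partial eta squared can be recovered from an F value and its degrees of freedom with ηp² = (F × df_effect) / (F × df_effect + df_error). The code below applies that standard formula to the three F tests exactly as printed in this entry; the result is not directly comparable to the meta-analytic r, and the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and degrees of freedom copied as printed in the entry for Zajonc (1968).
reported = {
    "Experiment 1 (nonsense words)": (5.64, 5, 355),
    "Experiment 2 (Chinese characters)": (4.72, 5, 335),
    "Experiment 3 (photographs)": (9.96, 5, 355),
}

for label, (f_value, df_effect, df_error) in reported.items():
    eta = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: partial eta squared = {eta:.3f}")
```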

  • Cocktail Party Effect. Approximately one third of participants hear their own name when it is presented in the irrelevant message during a dichotic listening task. Sometimes the impression is given that all participants demonstrate the effect, a point noted for example by Conway, Cowan & Bunting (2001): “Contrary to popular belief, not all subjects demonstrate this cocktail party effect.” In fact, both in the original study and in the replications, fewer than half of the participants reported hearing their own name (29-43 percent).

    • Status: replicated
    • Original paper: Moray, N. (1959). Attention in dichotic listening: Affective cues and the influence of instructions. The Quarterly Journal of Experimental Psychology, 11, 56-60. [citation=1972 (GS, February 2022)]​.
    • Critiques: Wood & Cowan (1995) Replication [citation=467 (GS, February 2022)]; Conway et al. (2001) Replication [citation=1195 (GS, February 2022)]; Röer & Cowan (2021) Preregistered Replication [citation=3 (GS, February 2022)]
    • Original effect size: No effect size is given, only the detection rate: 33 percent
    • Replication effect size: Wood & Cowan (1995) 35 percent, Conway et al. (2001) 43 percent, Röer & Cowan (2021) 29 percent.

Developmental Psychology

  • Some evidence for a tiny effect of growth mindset (thinking that skill is improvable) on attainment. Really we should distinguish the correlation of the mindset with attainment vs. the effect of a 1-hour class about the importance of growth-mindset on attainment. I cover the latter but check out Sisk for evidence against both.

  • “Expertise attained after 10,000 hours practice” (Gladwell). Disowned by the supposed proponents.

  • Tailoring teaching to students’ preferred learning styles has an effect on objective measures of attainment.

    • No good evidence that tailoring teaching to students’ preferred learning styles has any effect on objective measures of attainment. There are dozens of these inventories, and really you’d have to look at each. (I won’t.)
    • Original paper: Multiple origins, e.g. the ‘Learning style inventory: technical manual’ (Kolb), ~4200 citations. The VARK questionnaire (Fleming). But it is ubiquitous in Western educational practice.
    • Critiques: Willingham 2015; Pashler 2009; Knoll 2017 (n=54); Husmann 2019 (total citations: ~2400)
    • Original effect size:
    • Replication effect size:

  • Neonate imitation. Babies are born with the ability to imitate.

    • Status: NA
    • Original paper: ‘Imitation of facial and manual gestures by human neonates’, Meltzoff and Moore, 1977; 2 studies with: Study 1: n=6, Study 2: n=12. [citation=5311 (GS, December 2021)].
    • Critiques: Oostenbroek et al., 2016 [n=106, citations=259 (GS, December 2021)].
    • Original effect size: Not reported​.
    • Replication effect size: Not reported.

  • Watching violence on aggression

Differential psychology

Judgment and Decision Making

In general, the replication success rate in Judgment and Decision Making (JDM) is decent: 68% according to Collaborative Open-science Research (2022), a mass replication project led by Dr. Gilad Feldman. That said, as in other subfields of the social sciences, many effects are overestimated and there is publication bias.

  • The effect of “nudges” (choice architecture that promotes beneficial decisions) may be exaggerated in general. One comprehensive review by DellaVigna and Linos (2020) found that average effects reported by Nudge Units were six times smaller than those in studies published in academic journals. (Not saying there are no big effects.) See also the systematic review by Szaszi et al. (2018), as well as Hummel and Maedche (2019).

  • Identifiable victim effect is much weaker and less robust than previously thought, with lots of mixed findings, failed replications, null findings and numerous boundary conditions. See Lee and Feeley (2016) meta-analysis, with r = .05.

  • Psychophysical numbing. People prefer to save lives if they are a higher proportion of the total (e.g. people prefer to save 4,500 lives out of 11,000 rather than 4,500 lives out of 250,000).

    • Status: mixed (Study 2 was successfully replicated but Study 1 was a replication failure)
    • Original paper: ‘Insensitivity to the value of human life: A study of psychophysical numbing’, Fetherstonhaugh et al. 1997; 3 studies, 2 of which are split into Part A and Part B, with n’s = Study 1: 54, 196; Study 2: 162; Study 3: 165 [citations = 468 (GS, December 2021)].
    • Critique: Ziano et al. 2021 [n=4799, citations = 0 (GS, December 2021)]
    • Original Study 1 effect size: ηp² = 0.14
    • Replication effect size: Study 1a: ηp² = 0.06; Study 1b: ηp² = 0.21; Study 1c: ηp² = 0.13; all were reversals.

  • Here are a few cautionary pieces on whether, aside from the pure question of reproducibility, behavioural science is ready to steer policy.

  • Moving the signature box to the top of forms does not decrease dishonest reporting in the rest of the form.

  • Loss aversion. The subjective value of losses exceeds the subjective values of gains.

    • Original paper:
    • Critiques:
      Meta-analyses: Nieuwenstein et al., 2020 [total n = 399]; Walasek et al. 2018 [19 studies, citations=11, Dec 2021], Brown et al. 2021 [607 estimates from 150 articles, citations=10, Dec 2021]
    • Original effect size:
    • Replication effect size:
      Walasek et al. 2018: λ = 1.31, 95% CI [1.10, 1.53]
      Brown et al. 2021: λ = 1.955, 95% credible interval [1.824, 2.104]
    • Loss aversion is still mostly replicable but with weaker effects for some people and in some situations (see Mrkva et al., 2020).
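
To make the two λ estimates concrete, here is a small sketch of what they imply for a simple 50/50 gamble. It assumes a piecewise-linear value function (gains at face value, losses multiplied by λ), which is a simplification chosen for illustration and not necessarily the model used in either meta-analysis. The function names are hypothetical.

```python
def subjective_value(amount, loss_aversion):
    """Piecewise-linear value (assumption): losses are scaled by lambda, gains are not."""
    return amount if amount >= 0 else loss_aversion * amount


def min_gain_to_accept(loss, loss_aversion):
    """Smallest gain at which a 50/50 gain-or-loss gamble has non-negative value."""
    # 0.5 * gain + 0.5 * (-loss_aversion * loss) >= 0  implies  gain >= loss_aversion * loss
    return loss_aversion * loss


for label, lam in [("Walasek et al. 2018", 1.31), ("Brown et al. 2021", 1.955)]:
    needed = min_gain_to_accept(100, lam)
    print(f"{label}: risking 100 on a coin flip requires a gain of at least {needed:.1f}")
```

Under this simplification, the two estimates differ by roughly 65 units of required compensation per 100 units at risk, which shows why the choice of estimate matters for applied predictions.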

  • Unconscious Thought Advantage, or “deliberation-without-attention”, the idea that for complex choices (with more features to take into account), not deliberating leads to better decisions (as defined by the research team, i.e., normatively).​

    • Status: not replicated
    • Original paper: ‘On making the right choice’, Dijksterhuis, 2006; two experiments and two quasi-experiments, n = [80, 59, 93, 115]. [citations = 605, Web of Knowledge, 10/2021]
    • Critiques: Nieuwenstein & van Rijn (2012) [n = [48, 24, 32, 24], citations = 12, Web of Knowledge, 10/2021]; Nieuwenstein et al. (2015) [meta-analysis, 61 studies, n = [40-399]; replication study, n = 423; citations = 49, Web of Knowledge, 10/2021]; see also González-Vallejo et al. (2008) for a theoretical critique [citations = 51, Web of Knowledge, 10/2021]
    • Original effect size: g = [.86, .70] [as per Nieuwenstein et al., 2015].
    • Replication effect size: Nieuwenstein & van Rijn: g = [0.10, -0.55, 0.87, -0.74]; Nieuwenstein et al.: replication g = -0.01; meta-analysis pooled Hedges’ g after trim-and-fill = 0.018 with CI = [−0.10, 0.14].

  • Self-interest is Overestimated: how much do personal benefits affect policy preferences and behaviors?

  • Marshmallow experiment

  • Differential reinforcement

  • Extinction bursts

  • Functional communication training

  • Derived relational responding

  • Schedules of R+

  • Above- and Below-Average Effect. Above-and-below-average effects arise when comparing oneself to others, whereby people rate themselves as above average for easy abilities and below average for difficult abilities. All standardized beta-values from multiple regression predicting judgmental weight of own and others’ abilities from mean comparative ability estimates in different ability domains were consistent with the original. Additionally, different statistical tests resulted in slightly smaller effects in the same direction as the original.

    • Status: replicated
    • Original paper: Kruger (1999); 3 experiments with n=37. [citations = 1190 (GS, February 2022)]
    • Critiques: Korbmacher, Kwan & Feldman (2022) [citations = 0 (GS, February 2022)]. Review: Sundström (2008) [citations = 138 (GS, February 2022)]. Meta-analysis: Zell et al. (2020) [focuses only on above average effect, citations = 84 (GS, February 2022)]
    • Original effect sizes:
      • Correlations: Own ability & comparative ability r = .95, Domain difficulty and comparative ability r = -.96
      • One sample t-tests: Easy domains d = 0.90, Difficult domains d = -1.44
    • Replication effect sizes:
      Zell et al. (2020): dz = 0.78, 95% CI [0.71, 0.84]
      Korbmacher et al. (2022):
      • Correlations: Own ability & comparative ability r = .99, Domain difficulty and comparative ability r= -.85
      • One sample t-tests: Easy domains: from d = 0.54 to d = 1.18, Difficult domains: from d = 0.11 (non-sig) to d = -0.65.

  • Accuracy of information (truth discernment). Asking people to think about the accuracy of a single headline improves the “truth discernment” of their intentions to share news headlines about COVID-19.

    • Status: mixed
    • Original paper: ‘Fighting COVID-19 Misinformation on Social Media: Experimental Evidence for a Scalable Accuracy-Nudge Intervention’, Pennycook et al. (2021); 2 studies with n’s = 853;856 [citations=887(GS, March 2022)]​.
    • Critiques: Roozenbeek et al. (2021) [n=1583, citations=22(GS, March 2022)].
    • Original effect size: Study 1: d = 0.657, 95% confidence interval (CI) = [0.477, 0.836] on accuracy judgment; d = 0.121, 95% CI = [0.030, 0.212] on sharing intention; Study 2: control condition: d = 0.050, 95% CI = [−0.033, 0.133]; treatment condition: d = 0.142, 95% CI = [0.049, 0.235].
    • Replication effect size: Roozenbeek et al.: Study 1: F = 1.53; Study 2: treatment: d = −0.14, 95% CI = [−0.17, −0.12], control: d = −0.10, 95% CI = [−0.13, −0.078].

Marketing records all retractions and replications in marketing.

  • Brian Wansink accidentally admitted gross malpractice; fatal errors were found in 50 of his lab’s papers. These include flashy results about increased portion size massively reducing satiety.

  • Choice overload, the idea that giving people too many choices can lead to certain undesirable consequences such as reduced purchasing intentions, is in doubt, but most people don’t consider it to be discredited.

    • Status: Mixed (dueling meta-analyses, mix of successful and failed replications). It would probably require a systematic, multi-lab replication approach to sort this out at this point.
    • Original study: When choice is demotivating, Iyengar & Lepper, 2000. In the original field experiment with exotic jams where # flavors were manipulated (24 vs. 6), more stopped to browse the larger selection (60% vs 40%), but more purchased from the smaller selection (30% (31) vs. 3% (4)) (Iyengar and Lepper 2000). 3 experiments with n=249,193,. [citations=4897(GS,October 2021)].
    • Failed replications: Scheibehenne (2008) failed to directly replicate Iyengar and Lepper (2000) jam study. Greifeneder (2008) did a lab experiment with chocolates and also failed to conceptually replicate. These replication failures are not definitive because there have been many studies (too many to list) in which the effect was (conceptually) replicated in some fashion.
    • Meta-analyses:
      • Scheibehenne et al. 2010: “We found a mean effect size of virtually zero” (d=.02) [citations=1049(GS,October 2021)].
      • Chernev et al. 2010: That’s because many of the studies were designed to show instances when there is no effect. You need to split the data into “choice is good” vs. “choice is bad.”
      • Simonsohn et al. 2014: We agree with Chernev et al. (2010). When we split it up, we found that the choice is bad studies (choice overload) lack collective evidential value (uniform p-curve).
      • Chernev et al. 2015: <ignoring Simonsohn et al. 2014> Choice overload is a reliable effect under certain conditions (moderators).
    • Original effect size: d=0.77 (Study 1), d=0.29 (Study 2), and d=0.88 (Study 3) (as calculated from the χ² values in the text with this online calculator)
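
For readers who want to repeat a conversion from χ² to d by hand, one common route is a sketch like the following: a 1-degree-of-freedom χ² is first turned into a phi coefficient (φ = sqrt(χ² / N)) and then into d = 2φ / sqrt(1 − φ²). Whether the online calculator mentioned for choice overload uses exactly this formula is an assumption, and the inputs below are hypothetical, not values from Iyengar & Lepper (2000).

```python
import math


def chi2_to_d(chi2, n):
    """Convert a 1-df chi-square statistic to Cohen's d via the phi coefficient."""
    if not 0 <= chi2 < n:
        raise ValueError("chi2 must be non-negative and smaller than n")
    phi = math.sqrt(chi2 / n)
    return 2 * phi / math.sqrt(1 - phi ** 2)


# Hypothetical inputs for illustration only (not taken from the original paper).
example_d = chi2_to_d(chi2=4.0, n=100)
print(round(example_d, 3))
```

The conversion is only meaningful for 2×2 tables (1 degree of freedom); larger tables need a different effect size such as Cramér's V.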

  • Mate guarding, the idea that women use conspicuous luxurious goods to deter female rivals by signaling to other women they have a devoted partner.

    • Status: reversed
    • Original paper: ‘Conspicuous Consumption, Relationships, and Rivals: Women’s Luxury Products as Signals to Other Women’, Wang & Griskevicius 2014; 5 studies (Study 1: N=69; shows that a woman was perceived by other women as having a more devoted partner when she had a designer-brand outfit accessory vs. a non-designer-brand accessory. Study 2: N=137; women in the mate guarding condition are asked to imagine they are at a party with their date and another woman is flirting with their date. The activation of a mate guarding motive increases women’s desire for conspicuous consumption. Study 3: N=115; replicates Study 2 and shows that a mate guarding motive only increases desire for conspicuous goods. Study 4: N=75; the activation of a mate guarding motive increases women’s spending on luxurious accessories, but only when these accessories are visible to other women. Study 5: N=175; shows that displaying luxurious goods dissuades other women from pursuing a relationship with a taken man). [citation=450 (Google Scholar, January 2022)].
    • Critiques: Tunca & Yanar (2020) [conceptual (Study 1, N=250) and direct (Study 2, N=255) replications of Study 1 of Wang & Griskevicius, citations=2 (GS, January 2022)]. Study 1 did not replicate the original finding that women with luxurious goods are perceived by other women as having devoted partners. In Study 2, a reversal is observed, such that women with non-designer possessions were perceived to have a more devoted partner than women with designer possessions.
    • Original effect size: d=0.24
    • Replication effect size: Study 1 (d =0.13); Study 2 (d=-0.27).

  • Super size me. Larger food options are associated with higher status.

    • Status: Not replicated.
    • Original paper: Super Size Me: Product Size as a Signal of Status, Dubois et al. (2012); 6 studies with n’s = 183; 142; 89; 269; 134; 104 [citations = 325 (GS, January 2022)].
    • Critiques: Tunca et al. (2021) [Preprint] [direct replication of study 1 of Dubois et al. (2012); N= 415, citations=1 (GS, January 2022)].
    • Original effect size: Study 1: Large vs. small product size: d=1.10; large vs. medium product size: d=0.65; medium vs. small: d=0.46.
    • Replication effect size: Large vs. small product size: d=-0.1, 95% CI [-0.15, 0.33]; large vs. medium: d=-0.11, 95% CI [-0.13, 0.34]; medium vs. small: d=-0.01, 95% CI [-0.25, 0.23].

  • Scarcity effect - Overborrowing. Perceived financial scarcity causes consumers to overborrow.

  • Scarcity effect - Resource allocation. Poor economic conditions favour resource allocations to daughters over sons.

  • Scarcity effect - Planning. Consumers who feel resource constrained shift to engage in relatively more priority planning, rather than efficiency planning.

  • Scarcity effect - Competition/threat. Exposure to limited-quantity promotion advertising prompts consumers to perceive other shoppers as competitive threats.

  • Scarcity effect - Brand attitudes. Observing luxury brand consumers whose consumption arises from unearned financial resources reduces observers’ brand attitudes when observers place a high value on fairness.​

  • Scarcity effect - Product use creativity. Scarcity salience is associated with greater creativity.​

  • Scarcity effect - Wage rates. The difference in implied wage rates based on a time elicitation versus a money elicitation procedure is reduced as the time horizon increases.​

  • Scarcity effect - Selfishness. Reminders of scarcity causes selfish behaviour to a greater extent in people with low social value orientation.​

  • Scarcity effect - Preference for material goods. Scarcity leads to a preference for material goods over experiential goods.​

  • Scarcity effect - Preference polarization. Perceived scarcity leads to greater preference polarization and stronger preference for a preferred option over a less preferred option.​


  • One mind per hemisphere. The corpus callosotomy studies which purported to show “two consciousnesses” inhabiting the same brain were badly overinterpreted.

  • Existence of high-functioning (IQ ~ 100) hydrocephalic people. The hypothesis begins from extreme prior improbability; the effect of massive volume loss is claimed to be on average positive for cognition; the case studies are often questionable and involve little detailed study of the brains (e.g. 1970 scanners were not capable of the precision claimed).

    • Status: NA
    • Original paper: No paper; instead a documentary and a profile of the claimant, John Lorber. Also Forsdyke 2015 and the fraudulent de Oliveira 2012.
    • Critiques: Hawks 2007; Neuroskeptic 2015; Gwern 2019
    • Alex Maier writes in with a cool 2007 case study of a man who got to 44 years old before anyone realized his severe hydrocephaly, through marriage and employment. IQ 75 (i.e. d=-1.7), which is higher than I expected, but still far short of the original claim, d=0.
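
The d values quoted for the hydrocephaly case study can be reproduced with a one-line standardization. This sketch assumes the conventional IQ scaling of mean 100 and standard deviation 15, which the entry does not state explicitly.

```python
IQ_MEAN = 100  # conventional population mean (assumption)
IQ_SD = 15     # conventional standard deviation (assumption)


def iq_to_d(iq):
    """Express an IQ score as a standardized distance from the population mean."""
    return (iq - IQ_MEAN) / IQ_SD


print(round(iq_to_d(75), 2))   # -1.67
print(round(iq_to_d(100), 2))  # 0.0
```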

  • Readiness potentials seem to be actually causal, not diagnostic. So Libet’s studies also do not show what they purport to. We still don’t have free will (since random circuit noise can tip us when the evidence is weak), but in a different way.

  • No good evidence for left/right hemisphere dominance correlating with personality differences. No clear hemisphere dominance at all in this study.

  • Oxytocin on trust. Intranasal administration of oxytocin increases trust in strangers in a laboratory setting.

    • Status: not replicated
    • Original paper: ‘Oxytocin increases trust in humans’, Kosfeld et al. (2005); experiment, n = 128. [citations = 4800, April 2022]
    • Critiques: Declerck et al. (2020) [n = 677, citations = 57 (GS, April 2022)]; Lane et al. (2015) [n = 95, citations = 63 (GS, April 2022)].
    • Original effect size: Not reported but could be calculated: “In fact, our data show that oxytocin increases investors' trust considerably. Out of the 29 subjects, 13 (45%) in the oxytocin group showed the maximal trust level, whereas only 6 of the 29 subjects (21%) in the placebo group showed maximal trust (Fig. 2a). In contrast, only 21% of the subjects in the oxytocin group had a trust level below 8 monetary units (MU), but 45% of the subjects in the control group showed such low levels of trust.” Kosfeld et al. (2005)
    • Replication effect size: not reported
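
Since the original oxytocin effect size is described as "could be calculated", here is one possible calculation from the counts in the quoted passage (13 of 29 vs. 6 of 29 investors showing maximal trust). It is a sketch only: the odds ratio and Cohen's h are two of several reasonable choices, they describe only the "maximal trust" contrast, and they are not the statistics reported by Kosfeld et al. (2005).

```python
import math


def odds_ratio(events_a, n_a, events_b, n_b):
    """Odds of the event in group A divided by the odds in group B."""
    return (events_a / (n_a - events_a)) / (events_b / (n_b - events_b))


def cohens_h(p_a, p_b):
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p_a)) - 2 * math.asin(math.sqrt(p_b))


# Counts taken from the quoted passage: maximal trust in 13/29 vs. 6/29 investors.
oxytocin_max, placebo_max, group_n = 13, 6, 29

or_max_trust = odds_ratio(oxytocin_max, group_n, placebo_max, group_n)
h_max_trust = cohens_h(oxytocin_max / group_n, placebo_max / group_n)

print(f"odds ratio = {or_max_trust:.2f}, Cohen's h = {h_max_trust:.2f}")
```

With only 29 participants per group, any such estimate carries a wide uncertainty interval, which is worth keeping in mind when comparing it with the much larger replication samples.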

  • Structural brain-behaviour correlations - the association between behavioural activation and white matter integrity. Individual differences in the sensitivity to signals of reward as indexed by BAS-Total and in the tendency to seek out potentially rewarding experiences as measured by BAS-Fun are positively correlated with diffusion measures of several white matter pathways.

    • Status: not replicated
    • Original paper: Xu et al. (2012) [n = 51, citations = 29 (GS, May 2022)]​.
    • Critiques: Boekel et al. (2015) [citations = 196 (GS, May 2022)]; Keuken et al. (2017) [n = 34-35, citations = 1 (GS, May 2022)].
    • Original effect size:
      • BAS-Total correlation with parallel diffusivity in the left corona radiata (CR)/superior longitudinal fasciculus (SLF): r = .51.
      • BAS-Fun correlation with:
        • fractional anisotropy in the left CR/SLF: r = .52
        • parallel diffusivity in the left CR/SLF: r = .58
        • mean diffusivity in the left SLF/inferior fronto-occipital fasciculus (IFOF): r = .51
    • Replication effect size: Keuken et al. (2017):
      • BAS-Total correlation with parallel diffusivity in the left CR/SLF: r = -.15
      • BAS-Fun correlation with:
        • fractional anisotropy in the left CR/SLF: r = -.15
        • parallel diffusivity in the left CR/SLF: r = -.04
        • mean diffusivity in the left SLF/inferior fronto-occipital fasciculus (IFOF): r = .05
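
One way to judge whether a replication correlation differs from the original is to compare the two with a Fisher z test for independent correlations. The sketch below is illustrative only and is not taken from Xu et al. (2012) or Keuken et al. (2017): the function name is hypothetical, and using n = 34 for the replication (the lower end of the reported 34-35) is an assumption.

```python
import math

def fisher_z_difference(r1, n1, r2, n2):
    """z statistic and two-sided p-value for the difference between
    two independent Pearson correlations (Fisher r-to-z transform)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z_stat = (z1 - z2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z_stat) / math.sqrt(2))
    return z_stat, p_value

# BAS-Total and parallel diffusivity in the left CR/SLF:
# original r = .51 (n = 51); replication r = -.15 (n assumed to be 34).
z_stat, p_value = fisher_z_difference(0.51, 51, -0.15, 34)
print(f"z = {z_stat:.2f}, two-sided p = {p_value:.4f}")
```

The same function can be applied to the other pairs of correlations listed in this entry by swapping in the corresponding r values.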

  • Structural brain-behaviour correlations - the association between social network size and grey matter volume. Individual differences in the number of Facebook friends (FBN) are positively correlated with grey matter volume in several brain areas: left middle temporal gyrus (MTG), right superior temporal sulcus (STS), right entorhinal cortex (EC), left and right amygdala.

    • Status: mixed
    • Original paper: Kanai et al. (2012) [n = 125, citations = 411 (GS, May 2022)].
    • Critiques: Boekel et al. (2015) [n = 34-35, citations = 196 (GS, May 2022)]; Kanai et al. (2012) [n = 40, citations = 411 (GS, May 2022)].
    • Original effect size: left MTG: r = .35; right STS: r = .35; right EC: r = .35; left amygdala: r = .30; right amygdala: r = .32.
    • Replication effect size:
      • Kanai et al. (2012): left MTG: r = .38; right STS: r = .44; right EC: r = .48; left amygdala: r = .33; right amygdala: r = .48.
      • Boekel et al. (2015): left MTG: r = .18; right STS: r = .11; right EC: r = .06; left amygdala: r = -.14; right amygdala: r = .02.

  • Structural brain-behaviour correlations - the association between distractibility and grey matter volume. Variability in self-reported distractibility is positively correlated with grey matter volume in the left superior parietal lobule (SPL) and negatively correlated with grey matter volume in medial pre-frontal cortex (mPFC).

    • Status: not replicated
    • Original paper: Kanai et al. (2011) [n = 155, citations = 110 (GS, May 2022)].
    • Critiques: Boekel et al. (2015) [n = 36, citations = 196 (GS, May 2022)].
    • Original effect size: left SPL: r = .38; mPFC: r = -.28.
    • Replication effect size: Boekel et al. (2015): left SPL: r = .22; mPFC: r = -.19.

  • Structural brain-behaviour correlations - the association between attention and cortical thickness. Individual differences in executive control are negatively correlated with cortical thickness in left anterior cingulate cortex (ACC), left superior temporal gyrus (STG), and right middle temporal gyrus (MTG), whereas variation in alerting scores is negatively correlated with cortical thickness in the left superior parietal lobule (SPL).

    • Status: not replicated.
    • Original paper: Westlye et al. (2011) [n = 132, citations = 190 (GS, May 2022)].
    • Critiques: Boekel et al. (2015) [n = 35, citations = 196 (GS, May 2022)].
    • Original effect size:
      • Executive control scores and cortical thickness in left ACC: r = -.21; left STG: r = -.15; right MTG: r = -.13.
      • Alerting scores and cortical thickness in left SPL: r = -.26
    • Replication effect size: Boekel et al. (2015):
      • Executive control scores and cortical thickness in left ACC: r = -.18; left STG: r = -.14; right MTG: r = -.19.
      • Alerting scores and cortical thickness in left SPL: r = .16.

  • Structural brain-behaviour correlations - the association between control over speed/accuracy of perceptual decisions and white matter tracts strength. Individual differences in control over speed and accuracy of perceptual decisions are positively correlated with the strength of white matter tracts between the right presupplementary motor area (pre-SMA) and the right striatum.

    • Status: mixed
    • Original paper: Forstmann et al. (2010) [n = 9, citations = 387 (GS, May 2022)].
    • Critiques: corrigendum of Boekel et al. (2015) [citations = 196 (GS, May 2022)]; Forstmann et al. (2010) [n = 12, citations = 387 (GS, May 2022)]; Keuken et al. (2017) [n = 32, citations = 1 (GS, May 2022)].
    • Original effect size: r = .93.
    • Replication effect size: Forstmann et al. (2010): r = .76; Keuken et al. (2017): r = -.08.


  • At most extremely weak evidence that psychiatric hospitals (of the 1970s) could not detect sane patients in the absence of deception.

  • Low self-esteem on poor mental health/psychological outcomes. Small amount of slightly mixed evidence for some outcomes but not supported for most outcomes e.g. alcohol/smoking/drug use etc.

    • Status:
    • Original paper:
    • Critiques: Baumeister, Campbell, Krueger & Vohs (2003), Does High Self-Esteem Cause Better Performance, Interpersonal Success, Happiness, or Healthier Lifestyles? The total number of studies included in their review is unclear: they started with 15,000 sources and narrowed these down, but the final number is not clearly stated. The review showed some mixed evidence but mostly refuted the claims. Self-esteem was not related to smoking, alcohol, or drug use, and seemed to be only minimally associated with interpersonal success. The relationship with school performance seems to be that better school performance leads to higher self-esteem rather than the other way around. Self-esteem was moderately correlated with depression.

  • Rorschach Test, as a diagnostic tool for psychiatric conditions.

    • Status: NA
    • Original paper:
    • Critiques: Wood, Lilienfeld, Garb, & Nezworski (2000), The Rorschach test in clinical diagnosis: a critical review, with a backward look at Garfield (1947). The test has some merit in detecting thinking disorders (although this use is thought to be non-projective rather than projective, contrary to the intention of the test; Dawes 1994), but it is not related to other conditions such as depression, anxiety, or antisocial personality disorder. Garb (1998) and Lilienfeld et al. (2006) indicate that when clinicians with access to questionnaire data or life histories of patients also use data from the Rorschach test, their predictive accuracy actually decreases, possibly because they place more weight on the Rorschach results, which are lower quality than data from other sources. References: Lilienfeld, Wood, & Garb (2006), Why questionable psychological tests remain popular, Scientific Review of Alternative Medicine; Garb (1998), Studying the clinician: judgement research & psychological assessment, APA.

  • Lunar effect (alternative term: Transylvania effect). This suggests there is a correlation between the full moon and strange occurrences, particularly in human behaviour. It is thought to have existed as a folk belief for centuries and is widely believed today (e.g., according to Owen & McGowan, 81% of mental health professionals believe in this effect, with 69% of mental health nurses believing that full moons are associated with an increase in patient admissions; Francescani & Bacon, 2008).

    • Status: not replicated
    • Original paper: The lunar effect was popularised by Arnold Lieber: Lieber, A. L. (1978). The Lunar Effect. Anchor Press. Lieber, A. L. & Agel, J. (1996). How the moon affects you. Hastings House. Can’t access the cited books to add study details/sample sizes etc. Variety of other papers are available by other authors on this effect but Lieber reference used as it’s mostly attributed to him.
    • Critiques: Rotton & Kelly (1985), Much ado about the full moon: A meta-analysis of lunar-lunacy research; meta-analysis of 37 studies. Gutiérrez-Garcia & Tusell (1997), Suicides and the Lunar Cycle; n = 897 (deaths by suicide). Kung & Mrazek (2005), Psychiatric Emergency Department Visits on Full-Moon Nights.
    • Original effect size:
    • Replication effect size:

  • Lack of a Theory of Mind is universal in autism. All autistic people fail to understand that other people have a mind or that they themselves have a mind.


  • Precognition, undergraduates improving memory test performance by studying after the test.

    • Status: not replicated
    • Original paper: ‘Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect’, Bem (2011); 9 experiments with: Study 1: n = 100; Study 2: n = 150; Study 3: n = 100; Study 4: n = 100; Study 5: n = 100; Study 6: n = 150; Study 7: n = 200; Study 8: n = 100; Study 9: n = 50 [citations = 1216 (GS, March 2022)].
    • Critiques: Ritchie et al. (2012), 3 replications: Replication 1: n = 50; Replication 2: n = 50; Replication 3: n = 50 [experiment; n = 150, citations = 235 (GS, December 2021)]; Gelman (2013) [newspaper article]; Schimmack (2018) [blog].
    • Original effect size: Study 1: d = 0.25; Study 2: d = 0.20; Study 3: d = 0.26; Study 4: d = 0.23; Study 5: d = 0.22; Study 6: negative trials: d = 0.15, erotic trials: d = 0.14; Study 7: d = 0.09; Study 8: d = 0.19; Study 9: d = 0.42; mean effect size: d = 0.22.
    • Replication effect size: All effect sizes are reported in Ritchie et al. 2012: Replication 1: d = 0.30, Replication 2: d = -0.39, combined: d = 0.04 (converted using this).
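
The mean effect size given for the original studies can be checked against the individual values listed in this entry. The snippet below is a minimal sketch that assumes the mean was computed as a simple unweighted average of the ten listed d values (Study 6 contributes two); how the figure was actually derived is not stated here.

```python
from statistics import fmean

# Cohen's d values as listed for Studies 1-9 (Study 6 has two values).
bem_d_values = [0.25, 0.20, 0.26, 0.23, 0.22, 0.15, 0.14, 0.09, 0.19, 0.42]

# Unweighted average; a sample-size-weighted mean would differ slightly.
mean_d = fmean(bem_d_values)
print(f"unweighted mean d = {mean_d:.3f}")
```

The unweighted average comes to roughly 0.215, which is consistent with the listed mean of 0.22.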

Evolutionary psychology

  • Romantic priming, that looking at attractive women increases men’s conspicuous consumption, time discounting, and risk-taking. Weak, despite there being 43 independent confirmatory studies: one of the strongest cases of publication bias / p-hacking ever found.

    • Original paper: ‘Do pretty women inspire men to discount the future?’, Wilson and Daly 2003. n=209 (but only n=52 for each cell in the 2x2) (~560 citations).
    • Critiques: Shanks et al. (2015) show that the 43 previous studies have an unbelievably bad funnel plot. They also ran 8 failed replications. (total citations: ~80)
    • Original effect size: d=0.55 [-0.04, 1.13] for the difference between men and women. Meta-analytic d= 0.57 [0.49, 0.65]
    • Replication effect size: 0.00 [-0.12, 0.11]

  • Implicit religious priming. Implicitly priming god concepts by unscrambling sentences with words relating to religion increases prosocial behaviour in an anonymous economic game.

  • Implicit analytic priming, that implicitly priming analytic thinking by seeing a photo of Auguste Rodin’s The Thinker decreases belief in God.

    • Status: not replicated
    • Original paper: ‘Analytic thinking promotes religious disbelief’, Gervais and Norenzayan 2012; n=57 [citation=601 (Google Scholar, December 2021)].
    • Critiques: Sanchez et al [n=941, citations=59 (Google Scholar, December 2021)]. Camerer et al 2018; 2 experiments, n=224 and n=531 [citations=871 (Google Scholar, December 2021)].
    • Original effect size: d=-0.25 to d=0.12.
    • Replication effect size: Sanchez et al 2017, d=-0.25 to d=0.12. Camerer et al 2018, study 1 r=-0.055, study 2 r=-0.035.
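
Entries in this list report effect sizes on different scales (here Cohen's d for one replication and Pearson's r for the other), which makes them hard to compare directly. A common approximate conversion, sketched below, assumes two groups of equal size; the function names are illustrative, and this is not necessarily the conversion used by the cited authors.

```python
import math

def r_to_d(r):
    """Convert a correlation r to Cohen's d (equal group sizes assumed)."""
    return 2 * r / math.sqrt(1 - r ** 2)

def d_to_r(d):
    """Convert Cohen's d to a correlation r (equal group sizes assumed)."""
    return d / math.sqrt(d ** 2 + 4)

# Camerer et al. (2018) replication correlations from this entry.
camerer_r = {"study 1": -0.055, "study 2": -0.035}
camerer_d = {name: r_to_d(r) for name, r in camerer_r.items()}

for name, d in camerer_d.items():
    print(f"{name}: r = {camerer_r[name]:+.3f} -> d = {d:+.3f}")
```

For correlations this close to zero, d is approximately twice r, so both replication estimates remain very small on either scale.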

  • Menstrual cycle version of the dual-mating-strategy hypothesis (that “heterosexual women show stronger preferences for uncommitted sexual relationships [with more masculine men] during the high-fertility ovulatory phase of the menstrual cycle, while preferring long-term relationships at other points”). Studies are usually tiny (median n=34, mostly over one cycle). Funnel plot looks ok though.

    • Original paper: ‘Menstrual cycle variation in women’s preferences for the scent of symmetrical men’, Gangestad and Thornhill (1998). (602 citations).
    • Critiques: Jones et al (2018) (total citations: 32)
    • Original effect size: g = 0.15, SE = 0.04, n=5471 in the meta-analysis. Massive battery of preferences included (…)
    • Replication effect size: Not a meta-analysis, just a list of recent well-conducted “null” studies and a plausible alternative explanation.
    • Note from a professor friend: the idea of a dual-mating hypothesis itself is not in trouble: the specific menstrual cycle research doesn’t seem to replicate well. However, to my knowledge the basic pattern of short vs long term relationship goals predicting [women’s] masculinity preferences is still robust.

  • Menstrual cycle and lunar influence

  • Large parents have more sons (Kanazawa); original analysis makes several errors and reanalysis shows near-zero effect. (Original effect size: 8% more likely.)

  • Men’s strength in particular predicts opposition to egalitarianism.

    • Original paper: Petersen et al (194 citations).
    • Critiques: Measurement was of arm circumference in students, and effect disappeared when participant age is included. (total citations: 605)
    • Original effect size: N/A, battery of F-tests.
    • Replication effect size: Gelman: none as in zero. The same lab later returned with 12 conceptual replications on a couple of measures of (anti-)egalitarianism. They are very focussed on statistical significance instead of effect size. Overall male effect was b = 0.17 and female effect was b = 0.11, with a nonsignificant difference between the two (p = 0.09). (They prefer to emphasise the lab studies over the online studies, which showed a stronger difference.) Interesting that strength or “formidability” has an effect in both genders, whether or not their main claim about gender difference holds up.


  • Sympathetic nervous system activity predicts political ideology. In particular, subjects’ skin conductance reaction to threatening or disgusting visual prompts.

    • Original paper: Oxley et al, n=46 ( citations). p=0.05 on a falsely binarised measure of ideology.
    • Critiques: Six replications so far (Knoll et al; 3 from Bakker et al), five negative as in nonsignificant, one forking (“holds in US but not Denmark”) (total citations: )
    • Original effect size:
    • Replication effect size:

Behavioral Genetics

  • No good evidence that 5-HTTLPR is strongly linked to depression, insomnia, PTSD, anxiety, and more. See also COMT and APOE for intelligence, BDNF for schizophrenia, 5-HT2a for everything.

  • Be very suspicious of any such “candidate gene” finding (post-hoc data mining showing large >1% contributions from a single allele). 0/18 replications in candidate genes for depression. 73% of candidates failed to replicate in psychiatry in general. One big journal won’t publish them anymore without several accompanying replications. A huge GWAS, n=1 million: “We find no evidence of enrichment for genes previously hypothesized to relate to risk tolerance.”

Applied Linguistics

  • Critical period hypothesis. How grammar-learning ability changes with age, finding that it is intact to the crux of adulthood (17.4 years) and then declines steadily.

  • Motivational role of L2 vision. Mental imagery of oneself as a successful language user in the future can enhance one’s motivation and performance.

Educational Psychology

  • Flipped learning, students learn better if they do homework about a lesson before coming to class to study that lesson.

    • Status: replicated
    • Original paper: Flip Your Classroom: Reach Every Student in Every Class Every Day [citations = 6585 (Google Scholar, Dec 2021)].
    • Critiques: Lo & Hew [citations=423(Google Scholar, Dec 2021)], Strelan et al. [n=33678, citation=107(Google Scholar, Jan 2022)], Cheng et al. [n=7912, citation=195(Google Scholar, Jan 2022)], Låg & Sæle [n=not reported, number of reports=272, citation=106(Google Scholar, Jan 2022)], Lo & Hew [n=5329, citation=43(Google Scholar, Jan 2022)], Shi et al. [n=6947, citation=60(Google Scholar, Jan 2022)], van Altren et al. [n=24771, citation=239(Google Scholar, Jan 2022)], Xu et al. [n=4295, citation=33(Google Scholar, Jan 2022)], Vitta & Al-Hoorie [n=4220, citation=17(Google Scholar, Jan 2022)].
    • Meta-analysis effect size: Strelan et al.: g = 0.50 (0.42, 0.52) cross-disciplinary. Cheng et al.: g = 0.19 (0.11, 0.27) cross-disciplinary. Låg & Sæle: g = 0.35 (0.31, 0.40) cross-disciplinary. Lo & Hew: g = 0.29 (0.17, 0.41) engineering education. Shi et al.: g = 0.53 (0.36, 0.70) cross-disciplinary. van Altren et al.: g = 0.36 (0.28, 0.44) cross-disciplinary. Xu et al.: d = 1.79 (1.32, 2.27) nursing education in China. Vitta & Al-Hoorie: g = 0.99 (0.81, 1.16) second language learning. In Vitta & Al-Hoorie’s study, Trim and Fill suggested possible publication bias inflating the results, but the adjusted effect size remained sizable: g = 0.58 (0.37, 0.78).

  • Mindsets, people’s beliefs about whether their talents and abilities are subject to growth and improvement. According to the meta-analysis by Sisk and colleagues (2018), the relationship between mindsets and academic achievement is weak: of the 129 studies that they analyzed, only 37% found a positive relationship between mindset and academic outcomes, 58% found no relationship, and 6% found a negative relationship. Evidence on the efficacy of mindset interventions is not promising either: of the 29 studies reviewed, only 12% found a positive effect of the intervention, 86% found no effect, and 2% found a negative effect. It should be noted that interventions seemed to work for low SES populations.

  • First instinct fallacy

  • Strephosymbolia/ mirror reading and writing in dyslexia

  • Pen mightier than the screen

  • Sleep assisted learning

Health Psychology

  • Stress as the main/sole cause of peptic ulcers. Until the 1980s, stress was believed to be the main cause of peptic ulcers (with secondary contributing factors thought to be excess stomach acid and spicy food). Stress may play some role in the development and/or healing of ulcers, but the evidence shows it is not the primary cause as was previously believed.

Political Psychology

  • Stereotype threat on gender differences in political knowledge. Making gender stereotypes about political knowledge salient decreases women’s performance on political knowledge tests.

    • Status: Not replicated.
    • Original paper: Gender Differences in Political Knowledge: Bringing Situation Back In, Ihme and Tausendpfund (2018); 2 experiments, Study 1: N = 603; Study 2: N = 377 [citations = 18 (GS, February 2022)].
    • Critiques: Azevedo et al. [Preprint, not available] [n=1502, citations=NA].
    • [n=120, citations=757(GS, October 2021)],
    • Original effect size: Study 1: partial η2 = 0.12; Study 2: partial η2 = 0.03.
    • Replication effect size: partial η2 = 0.00.
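
Partial η2 values can be expressed as Cohen's f to make them easier to compare with conventional benchmarks. The following is a minimal sketch using the standard relation f = sqrt(η2 / (1 - η2)); the function name is illustrative, and applying this conversion to these studies is an assumption rather than something reported by the cited authors.

```python
import math

def partial_eta_squared_to_f(eta_sq):
    """Convert partial eta squared to Cohen's f."""
    if not 0 <= eta_sq < 1:
        raise ValueError("partial eta squared must be in [0, 1)")
    return math.sqrt(eta_sq / (1 - eta_sq))

# Partial eta squared values as listed in this entry.
f_study1 = partial_eta_squared_to_f(0.12)
f_study2 = partial_eta_squared_to_f(0.03)
f_replication = partial_eta_squared_to_f(0.00)

print(f"original study 1: f = {f_study1:.2f}")
print(f"original study 2: f = {f_study2:.2f}")
print(f"replication:      f = {f_replication:.2f}")
```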

  • Avoidance of dissonance-arousing situations

  • Depressed-entitlement effect among women

  • Moral foundations across political spectrum

Comparative Psychology

  • Joint attention in non-human primates, non-human primates fail to follow the gaze or pointing of another agent, using the object choice task.

Evolutionary Linguistics

  • Learnability (reproduction) of universal hierarchies of the grammatical systems of the world languages.

Speech Language Therapy

  • Bilingualism and stuttering. Bilingual children had an increased risk of stuttering and a lower chance of recovery from stuttering than language exclusive and monolingual speakers.

  • Self esteem and stuttering. Children who stutter have higher self-esteem than children who do not stutter. However, the self-esteem of children who stutter declines once they reach adolescence.

    • Status: NA
    • Original paper: Selbstwert von stotternden Kindern und Jugendlichen (in German). Zückner (2011). Case-control study - comparison against norm scores with n = 171[citations = 3, (GS, February 2022)].
    • Critiques: Cook and Howell 2014 [observational, n=59, citations=16, (GS, February 2022)]
    • Original effect size: M(SD) stuttering boys = 56.5 (25.9), M(SD) boys norm group = 36.5 (25.9); M(SD) stuttering girls = 43.1 (35.8), M(SD) girls norm group = 27.7 (25.7).
    • Replication effect size: Cook and Howell M(SD) = 2.9(0.49) (children: adolescent: r(bullying, self-esteem = .387)
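
The original entry reports means and standard deviations rather than a standardised effect size. A rough Cohen's d can be derived from them, as sketched below. Because the group sizes for boys and girls are not listed separately here, the pooled standard deviation assumes equal group sizes; that assumption and the function name are illustrative and not taken from Zückner (2011).

```python
import math

def cohens_d_equal_n(mean1, sd1, mean2, sd2):
    """Cohen's d using a pooled SD that assumes equal group sizes."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean1 - mean2) / pooled_sd

# Means and SDs as listed: stuttering group versus norm group.
d_boys = cohens_d_equal_n(56.5, 25.9, 36.5, 25.9)
d_girls = cohens_d_equal_n(43.1, 35.8, 27.7, 25.7)

print(f"boys: d = {d_boys:.2f}, girls: d = {d_girls:.2f}")
```

With the actual group sizes, the pooled standard deviation should be weighted by degrees of freedom instead.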

Further literature