Framework for Open and Reproducible Research Training


Replications & Reversals


Summary


Replications of previous scientific work are at the core of the Open Scholarship movement. However, as replication efforts become more widespread, it can be challenging for scholars and educators to keep up to date with which effects in their field replicate and which do not. FORRT’s Replications & Reversals project aims to collate replications and, specifically, so-called reversal effects in social science. In the context of a replication, a reversal is an effect whose original direction has been flipped. The extent of such reversals and non-replicated effects is already apparent in the social science literature, where even successfully replicated effects are often only about half the size of the originally reported effects (Ioannidis, 2005; Open Science Collaboration, 2015). Although such failures to replicate are far less costly to society than, for example, failures in medicine (Prasad & Cifu, 2011), they hinder science’s goal of accumulating knowledge and waste scarce resources. This resource aims to be a “living”, freely available, crowd-sourced, and community-driven collection of effects that have either failed to replicate or even been reversed by empirical research across the social sciences. Scholars from varied backgrounds and areas of social science are invited to contribute prevalent effects from their respective fields.


Motivation

The purpose of collating these reversal effects in social science is to encourage educators to incorporate replications of these effects into their students' projects (e.g., third-year projects, theses, coursework). Doing so gives students the opportunity to experience the research process directly, to assess their ability to perform and report scientific research, and to help evaluate the robustness of the original studies, thereby also helping them become good consumers of research. The crowdsourced and community-curated resource below aims to satisfy three of FORRT’s Goals:

  • Support scholars in their efforts to learn and stay up-to-date on best practices regarding open and reproducible research;
  • Facilitate conversations about the ethics and social impact of teaching substantive topics with due regard to scientific openness, epistemic uncertainty and the credibility revolution;
  • Foster social justice through the democratization of scientific educational resources and their pedagogies.

and four aspects of FORRT’s Mission:

  • Dismantling hierarchies surrounding research, teaching, and service;
  • Building community among educators and various non-academic communities working to improve scientific communication and literacy across academia and the general public;
  • Building capacity for advocacy; and
  • Advocating for the creation and maintenance of educational resources.

Current Status

This is a dynamic project that is organized in four stages. Currently, we are in stage 2:

  1. Proof of Concept Phase (adaptation of original project into FORRT, inclusion of effects from social and cognitive psychology, using Gavin Leech’s collection as a basis) → ~150 entries finished in 2021.

  2. Team Science Expansion Phase Across Disciplines (crowd-sourcing new entries and refining existing ones), started at the end of 2021 and planned to run until mid-2023. Draft of the first ‘output’ piece. Currently, a total of 320+ effects are documented, 220+ of which are finished.

  3. Review Phase (open review to identify inconsistencies, missing data, and errors), planned for the end of 2023. Finish first ‘output’ piece.

  4. Regular Update Phases (dynamically adding new effects), planned for 2024 and beyond.

For the latest updates, please see this summary email, which was sent to every contributor in October 2022.


How to contribute?

Anyone can add new effects or edit existing effects by joining our initiative on Slack and then following the instructions in our reversals g-doc. Currently we are focusing on finishing already existing effects.




All Effects (sorted by discipline)


You can find a list of all effects we are working on here. To check whether an effect already exists in our collection, use Ctrl-F with a keyword related to the effect (e.g., “Macbeth” or “Priming”). Please note that not all effects are listed here, and not all entries contain all available information, as this is a work in progress (last site update: 28th of November 2022).




Social Psychology

  • Elderly priming. Hearing about old age makes people walk more slowly.

  • Status: reversed
  • Original paper: ‘Automaticity of social behavior’, Bargh et al. (1996); 2 experiments with Study 2a: n = 30, Study 2b: n = 30 [citations = 5938 (GS, October 2021)].
  • Critiques: Doyen et al. (2012) [experiment: n = 120, citations = 757 (GS, October 2021)]; Lakens (2017) [meta-analysis: citations = 21 (GS, October 2021)]; Pashler et al. (2011) [experiment: n = 66, citations = 21 (GS, October 2021)].
  • Original effect size: not reported.
  • Replication effect size: Doyen et al.: walking speed: η² = .01; Lakens (2017): r = .29/d = .61; Pashler et al.: not reported.
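Lakens (2017) is listed with the same estimate expressed both as r and as d. The two are linked by the standard conversion d = 2r / √(1 − r²), which assumes a comparison of two groups of roughly equal size. A minimal Python sketch of that conversion (the function name `r_to_d` is illustrative, not taken from any of the cited papers):

```python
import math


def r_to_d(r: float) -> float:
    """Convert a correlation r to Cohen's d.

    Assumes a two-group comparison with roughly equal group sizes.
    Valid only for -1 < r < 1.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 2.0 * r / math.sqrt(1.0 - r ** 2)


# Lakens (2017) reports r = .29 alongside d = .61
print(round(r_to_d(0.29), 2))
```

Applied to r = .29, the conversion gives d ≈ .61, matching the pair of values listed for Lakens (2017).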
  • Hostility priming (unscrambled sentences). Exposing participants to more hostility-related stimuli caused them subsequently to interpret ambiguous behaviours as more hostile.

  • Status: not replicated
  • Original paper: ‘The role of category accessibility in the interpretation of information about persons: Some determinants and implications’, Srull and Wyer, Jr. (1979); 2 experiments with Study 1: n = 96; Study 2: n = 96 [citations = 2409 (GS, November 2021)].
  • Critique: McCarthy et al. 2018 [experiment: n = 7,373 for Study 1, citations = 40(GS, November 2021)]. McCarthy et al. 2021 (see Figure) [experiment: n = 1,402 for close replication; n = 1,641 for conceptual replication, citations = 2(GS, November 2021)].
  • Original effect size: 2.99 (XX = 1.58%)
  • Replication effect size: All effect sizes are located in McCarthy et al. 2018: Acar: d = 0.16. Aczel: d = 0.12. Birt: d = -0.11. Evans: d = -.22. Ferreira-Santos: d = 0.01. Gonzalez-Iraizoz: d = -.21. Holzmeister: d = .11. Klein Selle and Rozmann: d = -0.51. Koppel: d = -.14. Laine: d = -.27. Loschelder: XX = -.07. McCarthy: d = -.10. Meijer: d = .03. Ozdogru: d = .22. Pennington: d = -.52. Roets: d = -.01. Suchotzki: d = .10. Sutan: d = .49. Vanpaemel: d = .17. Verschuere: d = -.14. Wick: d = .07. Wiggins: d = .01. Average replication effect size: d = -0.08; McCarthy et al. 2021: d = 0.06.
  • Intelligence priming (contemplation), alternative term: professor priming. Participants primed with a category associated with intelligence (e.g., “professor”) performed 13% better on a trivia test than participants primed with a category associated with a lack of intelligence (“soccer hooligans”).

  • Status: not replicated
  • Original paper: The relation between perception and behavior, or how to win a game of trivial pursuit, Dijksterhuis and van Knippenberg, 1998, 4 experiments with Study 1: n = 60; Study 2: n = 58; Study 3: n = 95; Study 4: n = 43. [citations = 1124 (GS November 2021)].
  • Critiques: O’Donnell et al., 2018 [n = 4,493 who met the inclusion criteria; n = 6,454 in supplementary materials, citations = 71 (GS November 2021)].
  • Original effect size: PD = 13.20%.
  • Replication effect size: All effect sizes are located in O’Donnell et al. 2018: Aczel: PD = -1.35%; Aveyard: PD = -3.99%; Baskin: PD = 4.08%; Bialobrzeska: PD = -.12%; Boot: PD = -4.99%; Braithwaite: PD = 4.01%; Chartier: PD = 3.23%; DiDonato: PD = 3.14%; Finnigan: PD = 2.89%; Karpinski: PD = 1.38%; Keller: PD = .17%; Klein: PD = .88%; Koppel: PD = -.20%; McLatchie: PD = -2.16%; Newell: PD = 1.66%; O’Donnell: PD = 1.58%; Phillipp: PD = 43%; Ropovik: PD = -.48%; Saunders: PD = -1.87%; Schulte-Mecklenbeck: PD = 4.24%; Shanks: PD = .11%; Steele: PD = -.58%; Steffens: PD = -.84%; Susa: PD = -.63%; Tamayo: PD = 1.41%; Meta-analytic estimate: PD = 0.02%.
  • Moral priming (contemplation). Participants exposed to a moral-reminder prime would demonstrate reduced cheating.

  • Status: not replicated
  • Original paper: ‘The Dishonesty of Honest People: A Theory of Self-Concept Maintenance’, Mazar et al. 2008; 6 experiments with Study 1: n = 229; Study 2: n = 207; Study 3: n = 450; Study 4: n = 44; Study 5: n = 108; Study 6: n = 326 [citations = 3072 (GS November 2021)].
  • Critiques: Verschuere et al. 2018 [n = 5,786; replication of Experiment 1, citations = 65 (GS November 2021)].
  • Original effect size: d = -1.45.
  • Replication effect size: d = 0.18.
  • All effect sizes are located in Verschuere et al. 2018: Aczel: d = -0.26; Birt: d = 0.41; Evans: d = 0.85; Ferreira-Santos: d = -0.19; Gonzalez-Iraizoz: d = 0.26; Holzmeister: d = 1.11; Klein Selle and Rozmann: d = -0.27; Koppel: d = 0.39; Laine: d = -0.37; Loschelder: d = -0.11; McCarthy: d = 0.57; Meijer: d = -0.15; Ozdogru: d = 1.19; Suchotzki: d = 0.00; Sutan: d = 0.02; Vanpaemel: d = 0.17; Verschuere: d = 0.18; Wick: d = -0.09; Wiggins: d = 0.19; Meta-analytic estimate: d = 0.11.
  • Moral priming (cleanliness). Participants exposed to physical cleanliness were shown to reduce the severity of their moral judgments. Direct, well-powered replications did not find evidence for the phenomenon.

  • Status: not replicated
  • Original paper: With a Clean Conscience: Cleanliness Reduces the Severity of Moral Judgments, Schnall, Benton, and Harvey, 2008; 2 experiments with Study 1: n = 40, Study 2: n = 44. [citations=645 (GS November 2021)].
  • Critiques: Johnson et al. 2014 [Study 1: n = 208, Study 2: n = 126, citations = 128 (GS November 2021)].
  • Original effect size: Study 1: d = -0.60, 95% CI [-1.23, 0.04]; Study 2: d = -0.85, 95% CI [-1.47, -0.22]
  • Replication effect size: Study 1: d = -0.01, 95% CI [-0.28, 0.26]; Study 2: d = 0.01, 95% CI [-0.34, 0.36]
  • Distance priming. Participants primed with distance compared to closeness produced greater enjoyment of media depicting embarrassment (Study 1), less emotional distress from violent media (Study 2), lower estimates of the number of calories in unhealthy food (Study 3), and weaker reports of emotional attachments to family members and hometowns (Study 4).

  • Flag priming. Participants primed with a flag are more likely to endorse conservative positions than those in the control condition.

  • Status: mixed
  • Original paper: ‘A Single Exposure to the American Flag Shifts Support Toward Republicanism up to 8 Months Later’, Carter et al. 2011; 2 experiments: in Experiment 1, n = 191 completed three sessions and n = 71 completed the fourth session; Experiment 2: n = 70 [citations = 186 (GS, October 2021)].
  • Critique: Klein et al. 2014 [n = 6,082, citations = 957 (GS, October 2021)].
  • Original effect size: d = 0.50
  • Replication effect size: All effect sizes are located in Many Labs (Klein et al., 2014): Adams and Nelson: d = .02. Bernstein: d = 0.07. Bocian and Frankowska: d = .19 (Study 1). Bocian and Frankowska: d = -.22 (Study 2). Brandt et al.: d = .21. Brumbaugh and Storbeck: d = -.22 (Study 1). Brumbaugh and Storbeck: d = .02 (Study 2). Cemalcilar: d = .14. Cheong: d = -.11. Davis and Hicks: d = -.27 (Study 1). Davis and Hicks: d = -.03 (Study 2). Devos: d = -.11. Furrow and Thompson: d = .09. Hovermale and Joy-Gaba: d = -.07. Hunt and Krueger: d = .27. Huntsinger and Mallett: d = .06. John and Skorinko: d = .08. Kappes: d = .04. Klein et al.: d = -.11. Kurtz: d = .04. Levitan: d = -.01. Morris: d = .09. Nier: d = -.45. Packard: d = .04. Pilati: d = 0.00. Rutchick: d = -.07. Schmidt and Nosek (PI): d = .03. Schmidt and Nosek (MTURK): d = .09. Schmidt and Nosek (UVA): d = -.15. Smith: d = .27. Swol: d = -.03. Vaughn: d = -.17. Vianello and Galliani: d = .49. Vranka: d = -.03. Wichman: d = .11. Woodzicka: d = -.09. Average replication effect size: d = 0.03.
  • Fluency priming. Objects that are fluent (e.g., conceptually fluent, visually fluent) are perceived more concretely than objects that are disfluent (disfluent objects are perceived more abstractly).

  • Money priming. “Images or phrases related to money cause increased faith in capitalism, and the belief that victims deserve their fate”.

  • Status: not replicated
  • Original paper: ‘Mere exposure to money increases endorsement of free-market systems and social inequality’, Caruso et al. 2013; n between 30 and 168 [~161 citations (GS, November 2021)].
  • Critiques: Rohrer et al. 2015 [n = 136, citations = 82 (GS, November 2021)]. Meta-analysis: Lodder et al. 2019 [citations = 64 (GS, November 2021)].
  • Original effect size: system justification d = 0.8, just world d = 0.44, dominance d = 0.51.
  • Replication effect size: Rohrer et al. (Experiment 1): d = 0.07 for system justification, d = 0.06 for belief in a just world, d = -0.06 for social dominance, and d = 0.14 for fair market ideology.
  • Lodder et al. (47 preregistered experiments): g = 0.01 for system justification; g = 0.11 [-0.08, 0.30] for belief in a just world; g = 0.07 [-0.02, 0.15] for fair market ideology.
  • Commitment priming (recall). Participants exposed to a high-commitment prime would exhibit greater forgiveness.

  • Mortality Salience, alt-terms = Death Priming/Terror Management Theory. Reminders of death lead to subconscious changes in attitudes and behaviour, for example in the form of increased in-group bias and behaviour that serves to defend an individual’s cultural worldview.

  • Spatial priming for emotional closeness. Spatial distance cues were used as a prime for participants’ feelings of emotional closeness to their families (Williams & Bargh, 2008). Participants were asked to plot points on a grid on paper that were either close together or far apart. They were then asked to rate how emotionally close they felt towards their family members.

  • Status: not replicated
  • Original paper: ‘Keeping One’s Distance: The Influence of Spatial Distance Cues on Affect and Evaluation’, Williams and Bargh (2008), 4 experiments with Study 1: n = 73; Study 2: n = 42; Study 3: n = 59; Study 4: n = 84 [citations = 583 (GS, January 2022)].
  • Critiques: Pashler et al. 2012 [n = 92, citations = 188 (GS, January 2022)]. Open Science Collaboration 2015 [total n = 125, citations = 6148 (GS, January 2022)].
  • Original effect size: Study 1: η² = .09; Study 2: η² = .18; Study 3: η² = .10; Study 4: η² = .11.
  • Replication effect size: Pashler et al.: η² = .01. Joy-Gaba et al.’s replication of Study 4, located in Open Science Collaboration 2015: η² = .00.
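The effects in this entry are reported as η², whereas most entries on this page use d. When η² comes from a single-degree-of-freedom comparison between two groups, it can be converted with d = 2√(η² / (1 − η²)). A minimal sketch under that assumption (the function name is illustrative; for partial η² from multi-factor designs the result is only approximate):

```python
import math


def eta_squared_to_d(eta_sq: float) -> float:
    """Convert eta squared to Cohen's d.

    Assumes eta_sq comes from a one-df contrast between two groups;
    for partial eta squared in multi-factor designs this is an approximation.
    """
    if not 0.0 <= eta_sq < 1.0:
        raise ValueError("eta_sq must be in [0, 1)")
    return 2.0 * math.sqrt(eta_sq / (1.0 - eta_sq))


# Original Study 1 versus the Pashler et al. replication
for label, eta_sq in [("original Study 1", 0.09), ("Pashler et al.", 0.01)]:
    print(f"{label}: d = {eta_squared_to_d(eta_sq):.2f}")
```

Under this assumption, η² = .09 corresponds to roughly d = 0.63 and η² = .01 to roughly d = 0.20.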
  • Implicit God prime increases self-reported risky behaviour. Implicitly priming God using the scrambled-sentence paradigm increases self-reported risk taking.

  • Implicit God prime increases actual risky behaviour. Implicitly priming God using the scrambled-sentence paradigm increases willingness to engage in risky behaviour for financial reward.

  • Implicit God prime increases prosocial behaviour. Implicitly priming God using the scrambled-sentence paradigm increases prosocial behaviour in an anonymous economic game.

  • Heat priming. Exposure to words related to hot temperatures increases aggressive thoughts and hostile perceptions. This effect suggests that people mentally associate heat-related constructs with aggression-related constructs.

  • Status: not replicated
  • Original paper: ‘Hot under the collar in a lukewarm environment: Words associated with hot temperature increase aggressive thoughts and hostile perceptions’, DeWall & Bushman 2009; 2 experiments in which participants were first exposed to words related to heat, cold, or neutral concepts and then either completed a word-stem completion task (Study 1; n = 127) or rated a person’s hostility based on an ambiguous description of that person (Study 2; n = 72) [citations = 76 (Google Scholar, June 2022)].
  • Critiques: McCarthy 2014 [n = 182, citations = 14 (Google Scholar, June 2022)], including a meta-analysis [n = 499].
  • Original effect size: Study 1: d = 0.47 (hot vs. cold words), d = 0.46 (hot vs. neutral words); Study 2: d = 0.67 (hot vs. cold words), d = 0.63 (hot vs. neutral words)
  • Replication effect size: McCarthy: Study 2A: d = -0.12 (hot vs. cold words), d = -0.02 (hot vs. neutral words); Study 2B: d = -0.06 (hot vs. cold words), d = 0.00 (hot vs. neutral words) (both experiments replicate procedure from Study 2); Meta-analysis: d = 0.18.
  • Honesty priming (alternative terms: goal priming, social priming). Exposure to honesty-related words increases honesty in reporting embarrassing behaviours.

  • Status: not replicated.
  • Original paper: ‘Using implicit goal priming to improve the quality of self-report data’, Rasinski et al. (2005), between-subjects, n = 64 [citations = 111 (Google Scholar, October 2022)].
  • Critiques:
    • Pashler et al. (2013) [citations = 66 (Google Scholar, October 2022)].
      • Experiment 1 (n = 149, direct replication)
      • Experiment 2 (n = 152, direct replication)
      • Experiment 3 (n = 151, conceptual replication)
      • Experiment 4 (n = 153, conceptual replication – supplementary experiment)
    • Dalal and Hakel (2016) [Experiment 1 (n = 590), conceptual replication, citations = (Google Scholar, October 2022)].
  • Original effect size: d = 1.21 (estimated from test-statistics in paper).
  • Replication effect size:
    • Pashler et al. (2013):
      • Experiment 1: d = 0.18 (non-significant; not replicated).
      • Experiment 2: d = -0.14 (non-significant; opposite direction).
      • Experiment 3:
        • Measure 1: d = -0.14 (non-significant; opposite direction; estimated from test statistics in paper).
        • Measure 2: d = -0.13 (non-significant; opposite direction; estimated from test statistics in paper).
      • Experiment 4:
        • Measure 1: d = 0.04 (non-significant; not replicated; estimated from descriptive statistics)
        • Measure 2: d = -0.14 (non-significant; opposite direction; estimated from descriptive statistics)
    • Dalal and Hakel (2016): d = -0.07 (non-significant; opposite direction; estimated from descriptive statistics, using Table 2 for the group sizes and Table 3 for the means and standard deviations).
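Several effect sizes in this entry were estimated from test statistics or descriptive statistics rather than taken directly from the papers. The sketch below illustrates two common routes for a two-group, between-subjects design: from an independent-samples t statistic, d = t·√(1/n₁ + 1/n₂), and from group means and standard deviations using the pooled standard deviation. The function names are hypothetical, and the exact procedure behind the estimates above is not documented here, so this is an illustration rather than a reproduction.

```python
import math


def d_from_t(t: float, n1: int, n2: int) -> float:
    """Cohen's d from an independent-samples t statistic (two groups)."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)


def d_from_descriptives(m1: float, sd1: float, n1: int,
                        m2: float, sd2: float, n2: int) -> float:
    """Cohen's d from group means and SDs, standardized by the pooled SD.

    Positive values mean group 1 scored higher than group 2.
    """
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)


# Illustrative numbers only, not taken from any study on this page
print(d_from_t(2.0, 8, 8))                               # 1.0
print(d_from_descriptives(10.0, 2.0, 20, 8.0, 2.0, 20))  # 1.0
```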
  • Achievement priming (alternative terms: goal priming, high-performance goal priming). Exposing individuals to success-oriented words (e.g., win, strive) increases their performance on a task compared with exposure to neutral words (e.g., carpet, shampoo).

  • Status: mixed.
  • Original paper: ‘The Automated Will: Nonconscious Activation and Pursuit of Behavioral Goals’, Bargh et al. (2001); between-subjects [citations = 2,987 (Google Scholar, October 2022)]. Five experiments:
    • Experiment 1: n = 78.
    • Experiment 2: n = 60.
    • Experiment 3: n = 288.
    • Experiment 4: n = 76.
    • Experiment 5: n = 65.
  • Critiques:
    • Shantz and Latham (2009) [Pilot Study (n = 52), Field Experiment (n = 81), citations = 221 (Google Scholar, October 2022)].
    • Harris et al. (2013) [Experiment 1 (n = 98), Experiment 2 (n = 66), citations = 199 (Google Scholar, October 2022)].
    • Weingarten et al. (2016) [meta-analysis; n = NA, k = 133 studies, citations = 333 (Google Scholar, October 2022)].
  • Original effect size (estimated from test-statistics reported):
    • Experiment 1: d = 0.72 (priming of high-performance words led to more words being found).
    • Experiment 2: d = 0.53 (priming of cooperation words led to more cooperation between players).
    • Experiment 3: d = 0.52 (adding delay between word exposure and task increased performance in the high-performance words group).
    • Experiment 4: d = 0.76 (when given the stop signal, those in the high-performance word group continued to work on the task. Note: the reported statistics suggest more than 76 participants. The authors fit a 2 × 2 ANOVA and report 75 residual degrees of freedom, whereas 76 participants would give 72 residual degrees of freedom. The effect size was estimated using the corrected residual degrees of freedom value.)
    • Experiment 5: d = 0.68 (when interrupted, the high-performance word group were more likely to return to their task than the neutral group).
  • Replication effect size:
    • Shantz and Latham (2009): Participants were either shown a picture of a woman winning a race (to prime achievement) or not.
      • Pilot Study: d = 0.84 (replicated).
      • Field Experiment: d = 0.43 (replicated).
    • Harris et al. (2013):
      • Experiment 1 (direct replication of Experiment 1 in Bargh et al., 2001): d = -0.24, 95% CI [-0.64, 0.15] (not replicated).
      • Experiment 2 (direct replication of Experiment 3 in Bargh et al., 2001): d = -0.03, 95% CI [-0.52, 0.45] (not replicated).
    • Weingarten et al. (2016): The meta-analysis covered all priming experiments that measured behaviour (i.e., not just achievement priming). It found a small effect of behavioural priming (d = 0.35, 95% CI [0.29, 0.41]). Moderators of the priming effect included:
      • Publication status:
        • Published (n = 255 studies): d = 0.39, 95% CI [0.33, 0.44];
        • Unpublished (n = 88 studies): d = 0.10, 95% CI [0.01, 0.20].
      • Liminality:
        • Supraliminal (n = 255 studies): d = 0.30, 95% CI [0.24, 0.36] (the method used in Bargh et al., 2001);
        • Subliminal (n = 88 studies): d = 0.40, 95% CI [0.30, 0.51].
      • Use of neutral control:
        • No neutral control (n = 38 studies): d = 0.44, 95% CI [0.27, 0.60];
        • With neutral control (n = 307 studies): d = 0.31, 95% CI [0.25, 0.37].
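The note on Experiment 4 of Bargh et al. (2001) rests on simple arithmetic: in a fully between-subjects 2 × 2 ANOVA, the residual degrees of freedom equal the total N minus the number of cells (four), so 76 participants give 72 residual df, while a reported residual df of 75 implies 79 participants. For a single-df effect in such a design, a common approximation converts F to d as d = 2√(F / df_error), assuming roughly equal groups. A small sketch of both calculations (the function names are illustrative):

```python
import math


def residual_df(n_total: int, n_cells: int = 4) -> int:
    """Residual df for a fully between-subjects factorial ANOVA (2 x 2 has 4 cells)."""
    return n_total - n_cells


def implied_n(df_resid: int, n_cells: int = 4) -> int:
    """Total sample size implied by a reported residual df."""
    return df_resid + n_cells


def d_from_f(f_value: float, df_error: int) -> float:
    """Approximate Cohen's d from a one-df F test, assuming equal group sizes."""
    return 2.0 * math.sqrt(f_value / df_error)


print(residual_df(76))  # 72
print(implied_n(75))    # 79
```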
  • Weapons priming effect (alternative term: weapons effect). Stimuli or cues associated with aggression, such as weapons, can elicit aggressive responses.

  • Status: mixed (the effect is smaller than originally believed)
  • Original paper: ‘Weapons as aggression-eliciting stimuli’, Berkowitz and LePage (1967); between-subjects design, n = 100 (male university students) [citations = 1161 (Google Scholar, October 2022)].
  • Critiques:
    • Turner and Simons (1974) [n = 60, citations = 11 (Google Scholar, October 2022)]
    • Frodi (1975) [n = 100, citations = 50 (Google Scholar, October 2022)]
    • Carlson et al. (1990) [meta-analysis; n = 628 (fail-safe), k = 56 studies, citations = 339 (Google Scholar, October 2022)].
    • Benjamin et al. (2018) [meta-analysis; n = 7,668 participants, k = 78 studies, citations = 12 (Google Scholar, October 2022)].
    • Ariel et al. (2019) [RCT of taser presence and the police force; n = 678 officers, citations = 42 (Google Scholar, October 2022)].
  • Original effect size: d = 0.76 to 1.06 (taken from Carlson et al., 1990).
  • Replication effect size:
    • Turner and Simons (1974): d = -1.17 to 0.64 (taken from Carlson et al., 1990). The greater the evaluation apprehension, the less likely aggressive behaviour was observed. (mixed)
    • Frodi (1975): d = 0.91 (taken from Carlson et al., 1990). (replicated)
    • Carlson et al. (1990): d = 0.38 (replicated)
    • Benjamin et al. (2018): d = 0.29, 95% CI [0.21, 0.36] (replicated)
      • The effect is moderated by several variables (some listed below):
        • Smaller for behavioural outcomes (d = 0.25, 95% CI [0.07, 0.43])
        • Reduced for “field” experiments (d = 0.22, 95% CI [-0.07, 0.51])
        • Larger when photos used (d = 0.35, 95% CI [0.26, 0.44]) rather than actual weapons (d = 0.12, 95% CI [-0.08, 0.31]).
    • Ariel et al. (2019): The presence of a taser on the officer led to:
      • Increased use of force: IRR = 1.48, 95% CI [1.27, 1.72] (replicated).
      • Increased injury to officers: IRR = 2.11, 95% CI [1.53, 2.91] (replicated).
  • Verbal framing (temporal tense). Participants who read what a person was doing (relative to those who read what the person did) showed enhanced accessibility of intention-related concepts and attributed more intentionality to the person.

  • Status: mixed
  • Original paper: ‘Learning about what others were doing: Verb aspect and attributions of mundane and criminal intent for past actions’, Hart and Albarracin (2011): 3 experiments with Study 1: n = 5458; Study 2: n = 37; Study 3: n = 48. [citations = 37, (GS, January 2022)].
  • Critiques: Eerland et al. (2016) [meta-analysis of Study 3 (total n = 685 in the perfective-aspect condition; n = 681 in the imperfective-aspect condition), citations = 70 (GS, January 2022)]
  • Original effect size: Study 1: d = 1.00 for intentionality in the imperfective-aspect condition; Study 2: d = 1.23 for imagery in the imperfective-aspect condition; Study 3: d = 1.20 for intentionality, d = 0.92 for imagery, and d = 0.55 for intention attribution in the imperfective-aspect condition.
  • Replication effect size: All effect sizes are located in Eerland et al. 2016. Intentionality: Arnal (lab): d = -0.35; Berger (lab): d = -0.98; Birt and Aucoin (lab): d = -0.38; Eerland et al. (lab): d = 0.16; Eerland et al. (online): d = -0.33; Ferretti (lab): d = -0.01; Knepp (lab): d = -0.95; Kurby and Kibbe (lab): d = -0.14; Melcher (lab): d = 0.65; Michael (lab): d = -0.41; Poirier et al. (lab): d = 0.32; Prenoveau and Carlucci (lab): d = -0.38. Meta-analytic estimate for laboratory replications only: d = -0.24. Imagery: Arnal (lab): d = -0.01; Berger (lab): d = -0.45; Birt and Aucoin (lab): d = -0.40; Eerland et al. (lab): d = -0.01; Eerland et al. (online): d = -0.13; Ferretti (lab): d = 0.33; Knepp (lab): d = 0.00; Kurby and Kibbe (lab): d = 0.02; Melcher (lab): d = -0.16; Michael (lab): d = -0.08; Poirier et al. (lab): d = -0.19; Prenoveau and Carlucci (lab): d = -0.02. Meta-analytic estimate for laboratory replications only: d = -0.08. Intention attribution: Arnal (lab): d = -0.15; Berger (lab): d = -0.15; Birt and Aucoin (lab): d = 0.08; Eerland et al. (lab): d = -0.01; Eerland et al. (online): d = 0.02; Ferretti (lab): d = -0.19; Knepp (lab): d = -0.29; Kurby and Kibbe (lab): d = 0.00; Melcher (lab): d = 0.12; Michael (lab): d = 0.13; Poirier et al. (lab): d = 0.06; Prenoveau and Carlucci (lab): d = 0.03. Meta-analytic estimate for laboratory replications: d = 0.00.
  • Prosocial spending. Spending money on other people leads to greater happiness than spending money on oneself.

  • Status: replicated (on the basis of three studies, NB: effect sizes smaller than original)
  • Original paper: ‘Spending Money on Others Promotes Happiness’, Dunn, Aknin, and Norton (2008) [citations = 2008 (GS, March 2022)].
  • Critiques: Aknin et al., 2020; 3 experiments [citations = 51 (GS, March 2022)]
  • Original effect size: b = 0.11, p < 0.01
  • Replication effect size: Experiment 1: n = 712, Cohen’s d = .36, .32; Experiment 2: n = 1950, Cohen’s d = .03, .02; Experiment 3: n = 5,199, Cohen’s d = .06, .06, .17.
  • Gustatory disgust on moral judgement. Gustatory disgust triggers a heightened sense of moral wrongness.

  • Status: not replicated
  • Original paper: ‘A Bad Taste in the Mouth: Gustatory Disgust Influences Moral Judgment’, Eskine et al. (2011); experiment, n = 57 [citations = 564 (GS, January 2022)].
  • Critiques: Ghelfi et al., 2020 [meta-analysis, total n = 1137, citations = 18 (GS, January 2022)]; Johnson et al., 2016 [Study 1: n = 478, Study 2: n = 934, citations = 52 (GS January 2022)].
  • Original effect size: Cohen’s d = 1.12 (comparison to control group); Cohen’s d = 1.28 (comparison to sweet taste).
  • Replication effect size: Johnson et al.: Cohen’s d = 0.04 (Study 1, comparison to control group), Cohen’s d = 0.05 (Study 2, comparison to control group). The following effect sizes are located in Ghelfi et al. 2020. Comparison to sweet group: Christopherson: Hedges’ g = 0.53; Christopherson: Hedges’ g = 0.04; Fischer: Hedges’ g = 0.25; Guberman: Hedges’ g = -0.30; de Haan: Hedges’ g = -0.13; Legate: Hedges’ g = 0.99; Legate: Hedges’ g = -0.02; Lenne: Hedges’ g = -0.19; Urry: Hedges’ g = -0.13; Wagemans: Hedges’ g = 0.03; Weber: Hedges’ g = -0.27. Meta-analytic estimate: Hedges’ g = -0.05. Comparison to control group: Christopherson: Hedges’ g = 0.68; Christopherson: Hedges’ g = -0.19; Fischer: Hedges’ g = -0.01; Guberman: Hedges’ g = -0.12; de Haan: Hedges’ g = -0.24; Legate: Hedges’ g = 0.79; Legate: Hedges’ g = 0.37; Lenne: Hedges’ g = -0.13; Urry: Hedges’ g = 0.08; Wagemans: Hedges’ g = -0.11; Weber: Hedges’ g = -0.04. Meta-analytic estimate: Hedges’ g = 0.10.
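This entry mixes Cohen’s d (original study, Johnson et al.) and Hedges’ g (multi-lab replication). Hedges’ g applies a small-sample correction to d, commonly approximated as g = J·d with J = 1 − 3 / (4(n₁ + n₂) − 9); the two are nearly identical for large samples, while g is noticeably smaller for small ones. A minimal sketch using that common approximation (not necessarily the exact correction used by any of the cited papers; the function name is illustrative):

```python
def hedges_g(d: float, n1: int, n2: int) -> float:
    """Convert Cohen's d to Hedges' g with the usual small-sample correction.

    Uses J = 1 - 3 / (4 * (n1 + n2) - 9), a standard approximation
    to the exact gamma-function-based correction factor.
    """
    correction = 1.0 - 3.0 / (4.0 * (n1 + n2) - 9.0)
    return d * correction


# Illustrative only: the correction shrinks d more for small groups
print(round(hedges_g(0.5, 10, 10), 3))    # 0.479
print(round(hedges_g(0.5, 200, 200), 3))  # 0.499
```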
  • Macbeth effect. Threats to one’s moral purity induce a desire for literal physical cleansing.

  • Status: mixed
  • Original paper: ‘Washing away your sins: threatened morality and physical cleansing’, Zhong and Liljenquist (2006): 4 experiments with Study 1: n = 60; Study 2: n = 27; Study 3: n = 32; Study 4: n = 45 [citations = 1407 (GS, January 2022)].
  • Critiques: Siev et al. 2018 [meta-analysis: n = 1,746, citations = 17 (GS, January 2022)].
  • Original effect size: Study 1: g = 0.53; Study 2: g = 1.00; Study 3: g = 0.86 [0.05, 1.68]; Study 4: g = XX.
  • Replication effect size: Siev et al. (2018): g = 0.17, 95% CI [0.04, 0.31].
  • All per-study effect sizes are located in Siev et al. 2018: Earp et al. (2014): Study 1: g = 0.02, 95% CI [-0.30, 0.34], Study 2: g = 0.05, 95% CI [-0.27, 0.37], Study 3: g = 0.13, 95% CI [-0.11, 0.37]; Fayard et al. (2009): Study 1: g = 0.11 [-0.20, 0.43]; Gamez et al. (2011): Study 1: g = 0.02, 95% CI [-0.54, 0.56], Study 2: g = -0.01, 95% CI [-0.64, 0.63], Study 3: g = 0.55, 95% CI [-0.26, 1.37]; Lee and Schwarz (2010): Study 2: g = 0.22, 95% CI [-0.20, 0.64]; Schaefer (2019): Study 2: g = 0.71, 95% CI [0.18, 1.23]; Siev et al. (unpublished): Study 1: g = -0.06, 95% CI [-0.27, 0.15], Study 2: g = -0.18, 95% CI [-0.56, 0.20]; Zhong (unpublished): Study 2: g = 0.28.
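Most per-study estimates in this entry come with a 95% confidence interval, from which a standard error can be recovered as SE ≈ (upper − lower) / (2 × 1.96), assuming a symmetric normal-theory interval. The sketch below uses those standard errors for a simple fixed-effect, inverse-variance weighted average. It is a simplification for illustration: it is not necessarily the model used by Siev et al. (2018) and should not be read as reproducing their pooled estimate. The function names are hypothetical.

```python
import math
from typing import Iterable, Tuple


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error implied by a symmetric 95% confidence interval."""
    return (upper - lower) / (2.0 * z)


def fixed_effect_pool(studies: Iterable[Tuple[float, float, float]]) -> Tuple[float, float]:
    """Inverse-variance weighted mean of (estimate, ci_lower, ci_upper) triples.

    Returns the pooled estimate and its standard error.
    """
    weights_sum = 0.0
    weighted_sum = 0.0
    for estimate, lower, upper in studies:
        weight = 1.0 / se_from_ci(lower, upper) ** 2
        weights_sum += weight
        weighted_sum += weight * estimate
    if weights_sum == 0.0:
        raise ValueError("need at least one study")
    return weighted_sum / weights_sum, math.sqrt(1.0 / weights_sum)


# Example input: the three Earp et al. (2014) studies listed above
earp = [(0.02, -0.30, 0.34), (0.05, -0.27, 0.37), (0.13, -0.11, 0.37)]
pooled, se = fixed_effect_pool(earp)
print(f"g = {pooled:.2f}, 95% CI [{pooled - 1.96 * se:.2f}, {pooled + 1.96 * se:.2f}]")
```

A random-effects model, which allows the true effect to vary between studies, would usually give wider intervals when studies disagree.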
  • Signing at the beginning rather than end makes ethics salient. Signing a statement of honest intent before providing information rather than after can reduce dishonesty.

  • Social class on prosocial behaviour. Individuals from a high social class are more likely to exhibit prosocial behavior than those from a low social class, although a U-shaped relationship between social class and prosocial behavior sometimes appears. The final study in the critique section below reported two pre-registered replications of Piff et al., 2010 with different results. There are more studies than those described here, but these should give a good sense of the current state of the science.

  • Status: mixed
  • Original papers:
    • ‘Volunteering in public health: An analysis of volunteers' characteristics and activities’, Ramirez-Valles, 2006; random-digit dialing in Illinois, US (household income on past-12-month volunteering in public health OR = 1.22; education NS OR = 1.02): n = 609 [citations = 9 (GS, June 2022)].
    • ‘Charitable giving: Factors influencing giving in US states’, Gittell & Tebaldi, 2006; analysis by US state using public data for charitable giving from IRS 2000-2002, for volunteer rate from Points of Light Foundation, 2004/Current Population Survey 2002, for personal income from BEA 2001, and for MA or PhD education from U.S. Bureau of Census, 2000. Effect sizes unclear: “simple correlation” between income and volunteer rate (-.13), regression coefficients for personal income (769.1) and education (29.35) on average charitable contribution per tax filer [citations = 161 (GS, June 2022)].
    • ‘The Nature and Causes of the U-Shaped Charitable Giving Profile’, James III & Sharpe, 2007; n = 16,442 households [citations = 171 (GS, June 2022)].
    • ‘Having Less, Giving More: The Influence of Social Class on Prosocial Behavior’, Piff et al. 2010; 4 experiments with Study 1 (subjective SES on dictator game resource allocation beta = -.23): n = 115; Study 2 (self-reported family income beta = -.27 and manipulated social class beta = -.23 on attitudes toward charitable giving): n = 81; Study 3 (combined education and income on trust game with arbitrary points r = -.18): n = 155; Study 4 (combined past and current income on ambiguous task helping beta = -.43): n = 91 [citations = 1572 (GS, June 2022)].
    • ‘Social status modulates prosocial behavior and egalitarianism in preschool children and adults’, Guinote et al. 2015; 4 experiments with Study 1 (manipulated department rank on picking up pens for experimenter d = 1.16): n = 44; Study 2 (not prosocial behavior outcome); Study 3 (not prosocial behavior outcome); Study 4 (random winner on sticker donation T1 calculated d = 0.657, losing status ηp² = 0.34, gaining status ηp² = 0.38, NS differences at T2): n = 48 children mean age 4.7 years (SD = .56) [citations = 185 (GS, June 2022)].
    • ‘Family Income Affects Children’s Altruistic Behavior in the Dictator Game’, Chen et al. 2013 (family income on sticker allocation in dictator game Spearman’s ρ = -.10; parents’ education/migrant status NS); n = 469 kindergarten children [citations = 110 (GS, June 2022)].
  • Critiques:
    • ‘A Large Scale Test of the Effect of Social Class on Prosocial Behavior’ Korndörfer et al. 2015; 8 studies:
      • Study 1* (objective social class for each household, standardized composite of three indicators: income, education, and occupational prestige): self-reported donation behavior for the previous year OR = 2.07, NS quadratic term; relative amount of donation (both standardized scores) _b_ = .158, quadratic term _b_ = .073; n = 9,260 German households.
      • Study 2* (objective social class for each household, standardized composite of two indicators: income and, if available, education of the reference person): self-reported donation behavior for the previous year OR = 1.99, NS quadratic term; relative amount of donation (standardized score) _b_ = .078, NS quadratic term; n = 32,090 US households.
      • Study 3: Model 1 (objective social class for each person, standardized composite of three indicators: income, education, and occupational prestige) on self-reported donation behavior for the previous year OR = 2.54, NS quadratic term, and frequency _b_ = .392, quadratic term _b_ = -.064; Model 2 (four-category subjective social class for each person) OR = 1.61, quadratic term OR = 0.90, and frequency _b_ = .230, quadratic term _b_ = -.039; n = 3,975 (objective) and n = 3,857 (subjective) US persons.
      • Study 4 (objective social class for each person, standardized composite of three indicators: income, education, and occupational prestige) on self-reported volunteering OR = 2.03, quadratic term OR = 0.91, and frequency _b_ = .336, quadratic term _b_ = -0.48; n = 33,072 German persons asked about volunteering one to four times (82,966 observations).
      • Study 5 (same models as Study 3 but with a volunteering outcome): Model 1 OR = 1.64, NS quadratic term, and frequency _b_ = .248, NS quadratic term; Model 2 OR = 1.29, NS quadratic term, and frequency _b_ = .135, NS quadratic term; n = 3,983 (objective) and n = 3,964 (subjective) US persons.
      • Study 6: Model 1 (objective social class for each person, standardized composite of three indicators: income, education, and occupational prestige) on past-12-month volunteering OR = 1.18, quadratic term OR = 0.97, and frequency _b_ = 0.94, quadratic term _b_ = -.012; Model 2 (six-category subjective social class) on volunteering OR = 1.15, NS quadratic term, and frequency _b_ = 0.76, NS quadratic term; n = 32,257 persons in 28 countries.
      • Study 7 (same models as Studies 3 and 5 but with a single everyday-helping outcome): Model 1 _b_ = .397, NS quadratic term; Model 2 NS _b_ and NS quadratic term; n = 3,902 (objective) and n = 3,886 (subjective) US persons.
      • Study 8 (objective social class for each person, standardized composite of three indicators: income, education, and occupational prestige) on behavior in a trust game: player 1 _b_ = .468, player 2 _b_ = .421; n = 1,421 German persons.
      • * Additional results are available for donor and non-donor households separately, as well as from tobit regressions.
    • ‘Having less, giving more? Two preregistered replications of the relationship between social class and prosocial behavior’ Stamos et al., 2020; 2 experiments with Study 1 (subjective SES ladder on dictator game performance, NS β = 0.007, multiverse analysis available): n = 300; Study 2 (manipulated subjective SES on attitudes toward charitable donations, d = .36, in the direction opposite to that of Piff et al., 2010; measured family income added to the model, NS b = -0.12): n = 200.
  • Original effect size: outcome of attitudes toward charitable donation, d = .53 (manipulated subjective SES), partial r = -.23 (family income controlling for ethnicity) Piff et al., 2010.
  • Replication effect size: d = .36 (manipulated subjective SES), opposite direction, r = -.02 (family income) Stamos et al., 2020
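Because the original and replication results above mix Cohen's d and correlation coefficients, converting between the two can help put them on one scale. The sketch below uses the standard conversions d = 2r / √(1 − r²) and r = d / √(d² + 4), which assume two groups of roughly equal size; applying them to a partial correlation from a regression model, as here, is only an approximation, and the helper names are our own.

```python
import math


def r_to_d(r: float) -> float:
    """Convert a correlation r to Cohen's d (assumes two equal-sized groups)."""
    return 2 * r / math.sqrt(1 - r ** 2)


def d_to_r(d: float) -> float:
    """Convert Cohen's d back to a correlation r (same equal-n assumption)."""
    return d / math.sqrt(d ** 2 + 4)


# Family-income correlations reported in this entry.
original_r = -0.23     # Piff et al., 2010 (partial r)
replication_r = -0.02  # Stamos et al., 2020

print(f"Piff et al.:   r = {original_r} -> d = {r_to_d(original_r):.2f}")
print(f"Stamos et al.: r = {replication_r} -> d = {r_to_d(replication_r):.2f}")
```

On this rough scale, the family-income association shrinks from about d = -0.47 to about d = -0.04.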
</div>
  • The Stanford Prison Experiment employed a simulation of a prison environment to examine the psychological effects of coercive situations. Using role-playing, labeling, and social expectations, it reported that one third of the participants in the role of prison guards displayed aggressive and dehumanizing behaviour.

  • Status: NA
  • Original paper: ‘Interpersonal dynamics in a simulated prison’, Haney, Banks, Zimbardo (1973) [n=24, citations: 2115 (including highly referenced publications), (GS, January, 2022)].
  • Critiques: First, the study has been criticized for its lack of adherence to experimental methodology. Although the study has been widely described as an ‘experiment’, it lacks many defining features: 1) it does not define the precise set of manipulated variables, 2) it manipulates multiple variables at a time without proper control over the effect of each one, 3) it does not define the dependent variable or how it will be measured, and 4) it does not state any clear hypotheses. Notably, in the original paper the authors present their work as a “demonstration”, not an experiment. A second group of serious issues concerns the researchers’ ad-hoc interventions, which influenced the participants’ behaviour. One of the leading researchers, Philip G. Zimbardo, took part in the procedure as the prison’s “Superintendent”. Another close collaborator of the research team, David Jaffe, who initially conceived the idea of the mock-prison study, played the role of the “Warden”. Considering that these people knew the goal of the study and were, as later admitted, interested in a particular outcome (a call for reform of the prison system), ad-hoc interventions such as encouraging some of the guards to be more strict and ‘tough’ raise reasonable doubt about how far the experimenters’ expectations shaped the final results. The third group of issues concerns sampling. The study was conducted on a small (n = 24, n per condition = 12) and largely unrepresentative sample (all male, all college students of similar age, all residents of the United States). Also, despite the screening of the volunteer applicants, strong ‘demand characteristics’ and ‘self-selection bias’ may still have affected the composition of the sample: all the participants had responded to a newspaper ad seeking volunteers for a “psychological study of prison life”.
The last issue with the Stanford Prison Experiment is the interpretation of the results. Even if the discovered effect is trustworthy (and the issues above call this into question), there is no clear theoretical interpretation of what the finding actually shows. Some critics argue that the guards’ violent behaviour may stem from following strong leadership rather than from immersion in an assigned social role. Some specific works criticising the original study are listed as follows:
    • Le Texier (2019) [commentary; citations: 38 (GS, January 2022)]
    • Banuazizi & Movahedi (1975) [methodological analysis; citations: 118 (GS, January 2022)]
    • Festinger (1980) [book; citations: 132 (GS, January 2022)]
    • Haslam, Reicher & Van Bavel (2019) [methodological analysis; citations: 37 (GS, January 2022)]
    • Griggs & Whitehead (2014) [textbook analysis; citations: 37 (GS, January 2022)]
    • Griggs (2014) [textbook analysis; citations: 48 (GS, January 2022)]
    • Blum (2018) [media coverage; citations: 31 (GS, January 2022)]
    • Le Texier (2020) [preprint; citations: 0 (GS, January 2022)]
    • Izydorczak & Wicher (2020) [preprint; citations: 0 (GS, January 2022)]
    • Reicher & Haslam (2011) [experimental case study but not an exact replication of the SPE; n = 15; citations: ~435 (GS, January 2022)]
    • Lovibond, Adams & Adams (1979) [original research but not an exact replication of the SPE; n = 60; citations: 55 (GS, January 2022)]
  • Original effect size: Key claims rested on insinuation plus a battery of difference-in-means tests at significance levels of up to 20% (!); n = 24, with data analysed for 21.
  • Replication effect size: N/A
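To see why difference-in-means tests at significance levels of up to 20% are a problem when run as a battery, the sketch below computes the chance of at least one false positive among k tests when no real effect exists, 1 − (1 − α)^k. It assumes the tests are independent, which related outcome measures rarely are, and the value k = 10 is purely illustrative: this entry does not state how many tests the original paper ran.

```python
def familywise_error(alpha: float, k: int) -> float:
    """P(at least one false positive) across k independent tests when every null hypothesis is true."""
    return 1 - (1 - alpha) ** k


K = 10  # hypothetical number of tests, for illustration only
for alpha in (0.05, 0.20):
    print(f"alpha = {alpha:.2f}, k = {K}: {familywise_error(alpha, K):.2f}")
```

At α = 0.20, ten independent tests would produce at least one "significant" difference about 89% of the time even if no true differences existed.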
</div>
  • The Milgram experiment was a study examining the influence of authority on immoral behaviour. Participants were assigned the role of ‘teacher’ and were instructed by the experimenter to administer electric shocks of 15 to 450 V whenever the ‘learner’ made a mistake. There were several variants of the study. In the most basic one, 100% of participants agreed to administer a 300 V shock and 65% agreed to administer the maximum shock of 450 V.

  • Status: mixed
  • Original paper: ‘Behavioral Study of obedience’, Milgram 1963; n = 40 (~6,600 citations). (Across the full range of conditions, n = 740.)
  • Critiques: The experiment was riddled with researcher degrees of freedom, experimenters going off-script, and implausible agreement between very different treatments; moreover, “only half of the people who undertook the experiment fully believed it was real and of those, 66% disobeyed the experimenter.” Sources: Burger 2011, Perry 2012, Brannigan 2013, Griggs 2016 (total citations: ~240); but see also Caspar 2020.
  • Original effect size: 65% of subjects were said to have administered the maximum, dangerous voltage.
  • Replication effect size: Doliński 2017 is relatively careful (n = 80) and found effects comparable to Milgram’s. Burger (n = 70) also found levels of compliance similar to Milgram’s, but compliance did not scale with the strength of the experimenter’s prods (see Table 5: the only real order among the prompts led to universal disobedience), so whatever was going on, it was not obedience. One selection of follow-up studies found average compliance of 63%, but these studies suffer from the usual publication bias and tiny samples. (The selection was made by a student of Milgram.) The most one can say is that there is weak evidence for compliance rather than obedience (“Milgram’s interpretation of his findings has been largely rejected.”).
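Small samples leave a lot of uncertainty around a single percentage. As a sketch, the Wilson score interval below gives an approximate 95% confidence interval for the 65% full-compliance rate in Milgram’s basic condition, taking 65% of n = 40 as 26 participants; the helper function is our own and is not taken from any of the cited papers.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


low, high = wilson_interval(successes=26, n=40)  # 65% of 40 participants
print(f"95% CI: {low:.0%} to {high:.0%}")
```

This gives roughly 50% to 78%, so a single study of this size is compatible with fairly different true compliance rates.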
</div>
  • Robbers Cave Study. Used arbitrary groupings to demonstrate that tribalism between groups arises spontaneously and, depending on the context, can result in group competition (e.g., in the case of scarce resources) or group cooperation (e.g., in the case of superordinate goals and common obstacles).

  • Status: NA
  • Original paper: ‘Superordinate Goals in the Reduction of Intergroup Conflict’, Sherif (1958) [n = 22, citations: 1,010 (GS, February 2022)]. In addition to the original paper, some related books by the author(s) are also highly cited, including ‘Groups in harmony and tension’ by Sherif & Sherif (1958) [citations: 2,280 (GS, February 2022)] and ‘Intergroup Conflict and Co-operation’ by Sherif et al. (1961) [citations: 253 (GS, February 2022)]. Overall, this work accounts for more than 4,000 total citations, including the SciAm piece.
  • Critiques: There is no good evidence that tribalism arises spontaneously within weeks following arbitrary groupings and scarcity, or that it leads to inter-group violence. The “spontaneous” conflict among children at Robbers Cave was orchestrated by the experimenters; the sample was tiny (maybe 70?); an exploratory study was taken as inferential; there was no control group; there were really three experimental groups, that is, the experimenters had full power to set expectations and endorse deviance; and the results of their two other studies, which were negative, were not reported. All this is before considering the ethics: the total absence of consent (the boys and their parents had no idea they were in an experiment), or the plan to set the forest on fire and leave the boys to deal with it. Some specific works addressing criticism of the original study are listed as follows:
    • Billig (1976), in passing [book; citations: 808 (GS, February 2022); see media mention by Haslam (2018)].
    • Perry (2018), in passing [book; citations: 25 (GS, February 2022); see also media summary by Shariatmadari (2018) and Haslam (2018)].
    • Tavris also claims that the underlying “realistic conflict theory” is otherwise confirmed. No definitive conclusion can be reached.
  • Original effect size: N/A. Not reported in conventional format. (Rationale: “results obtained through observational methods were cross-checked with results obtained through sociometric technique, stereotype ratings of in-groups and outgroups, and through data obtained by techniques adapted from the laboratory. Unfortunately, these procedures cannot be elaborated here.”)
  • Replication effect size: N/A
</div>
  • Digital technology use and adolescent wellbeing. Adolescents who spend more time on new media (including social media and electronic devices such as smartphones) are more likely to report mental health issues.

</div>
  • Anthropomorphism for inanimate objects. Individuals who are lonely are more likely than people who are not lonely to attribute humanlike traits (e.g., free will) to nonhuman agents (e.g., an alarm clock), to fulfill unmet needs for belongingness.

</div>
  • Hurricane names. Female-named hurricanes are more deadly than male-named ones. The original effect size was a 176% increase in deaths, driven entirely by four outliers; a reanalysis using a greatly expanded historical dataset found a nonsignificant decrease in deaths from female-named storms.

  • Status: reversed
  • Original paper: ‘Female hurricanes are deadlier than male hurricanes’, Jung et al. 2014; observational study with n = 92 hurricanes, discarding two important outliers [citations = 113 (GS, March 2022)].
  • Critiques: Christensen 2014 [same data, citations = 114 (GS, March 2022)]. Smith 2016 [same data, citations = 8 (GS, March 2022)].
  • Original effect size: d = 0.65; a 176% increase in deaths from flipping names from relatively masculine to relatively feminine.
  • Replication effect size: Smith: 264% decrease in deaths (Atlantic); 103% decrease (Pacific)
</div>
  • Implicit racism bias testing. Implicit bias scores poorly predict actual bias, r = 0.15. The operationalisations used to measure that predictive power are often unrelated to actual discrimination (e.g. ambiguous brain activations). Test-retest reliability of 0.44 for race, which is usually classed as “unacceptable”. This isn’t news; the original study also found very low test-criterion correlations.

</div>
  • The Pygmalion effect, the effect of a teacher’s expectations on a student’s performance, is at most small, temporary, and inconsistent (r < 0.1, with a reset after weeks). Rosenthal’s original claims about massive IQ gains persisting for years are straightforwardly false (“The largest gain… 24.8 IQ points in excess of the gain shown by the controls.”), and the study used an invalid test battery. Jussim: “90%–95% of the time, students are unaffected by teacher expectations”.

</div>
  • Stereotype threat on Asian women’s mathematical performance, i.e. the interaction between race, gender and stereotyping. This study found that Asian-American women performed better on a math test when their ethnic identity was activated, but worse when their gender identity was activated, compared with a control group who had neither identity activated.

  • Status: Mixed
  • Original paper: ‘Domain-specific Effects of Stereotypes on Performance’, Shih et al. 1999
  • Critiques: Gibson et al. 2014; Moon and Roeder 2014
  • Original effect size: Asian-identity-salient > control > female-identity-salient, r=.27; Asian-identity-salient > female-identity-salient, r=.35.
  • Replication effect size: Gibson et al. 2014: no group differences, η2 = .01; Asian-primed vs. female-primed, p = .18, d = .27; including only those who were aware of the stereotypes, group accuracy p = .02, η2 = .04, and the means followed the predicted pattern: Asian (M = .63), Control (M = .55), and Female (M = .51); likewise, female-primed participants performed worse than Asian-primed participants, p = .02, d = .53. Moon & Roeder (2014): group accuracy, p = .44, η2 = .004; female-primed vs. Asian-primed conditions, p = .43, d = .17; analysing just those who were aware of the stereotype, p = .28, η2 = .012; female-primed vs. Asian-primed participants, p = .28, d = .27.
</div>
  • Stereotype threat on girls’ mathematical performance. A situational phenomenon whereby priming a negative gender stereotype (e.g., “women are bad at math”) has a detrimental impact on mathematical performance.

  • Status: mixed
  • Original paper: ‘Stereotype Threat and Women’s Math Performance’, Spencer et al. 1999, Experiment 2, n=30 women (~5076 GS citations as of June 2022).
  • Critiques: Stoet & Geary 2012, meta-analysis of 23 studies. Flore & Wicherts 2015, meta-analysis of 47 measurements. Flore et al. 2018, Registered Report, n = 2064 Dutch high school students. Agnoli et al. 2021, conceptual replication with n = 164 ninth-grade and n = 164 eleventh-grade Italian high school students. Other null results have been reported in the literature but are not explicit replications (e.g., Ganley 2013, n = 931 across three studies).
  • Original effect size: Exact statistics not reported. For Experiment 2, Fig. 2 does not report specific values, but they appear to be M = 17, SD = 20 for women in the control group and M = 5, SD = 15 for women in the experimental group, which translates to approximately d = −0.7 (calculated).
  • Replication effect size: Stoet & Geary 2012: d = −0.61 for adjusted and d = −0.17 [−0.27, −0.07] for unadjusted scores; taken together, only the group of studies with adjusted scores showed a statistically significant effect of stereotype threat. Flore & Wicherts 2015: g = −0.22 [95 CI = −0.21; 0.06) and significantly different from zero, but g = −0.07 [−0.21; 0.06] and not statistically significant after accounting for publication bias. Flore et al. 2018: d = −0.05 [−0.18, 0.07]. Agnoli et al. 2021: both estimated stereotype threat effects were nonsignificant (see also Table S22; https://osf.io/3u2jd), Z = 1.53, p = .25 for ninth-grade female participants and Z = .70, p = .97 for eleventh-grade female participants.
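The calculated d ≈ −0.7 for Spencer et al.’s Experiment 2 can be reproduced from the means and SDs read off Fig. 2. The sketch below assumes equal group sizes, since per-condition n is not given here, so the pooled SD is the root mean of the two variances; if group sizes are known, the weighted pooled-SD formula is used instead.

```python
import math
from typing import Optional


def cohens_d(m1: float, sd1: float, m2: float, sd2: float,
             n1: Optional[int] = None, n2: Optional[int] = None) -> float:
    """Standardized mean difference (m1 - m2) / pooled SD."""
    if n1 is None or n2 is None:
        # Equal group sizes assumed: pooled SD is the root mean of the two variances.
        pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    else:
        # Weighted pooled SD for known group sizes.
        pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd


# Approximate values read off Spencer et al. (1999), Fig. 2: experimental vs. control women.
d = cohens_d(m1=5, sd1=15, m2=17, sd2=20)
print(f"d = {d:.2f}")
```

The result, about −0.68, is consistent with the approximate d = −0.7 given above.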
</div>
  • Narcissism increase. Leadership, vanity, and entitlement have increased in young people over the last thirty years. It is an ancient hypothesis. The basic counterargument is that proponents misidentify an age effect as a cohort effect (the narcissism construct apparently decreases by about a standard deviation between adolescence and retirement): “every generation is Generation Me”.

  • Status: not replicated
  • Original paper: ‘The Evidence for Generation Me and Against Generation We’, Twenge 2013, review of various studies, including national surveys [citations = 251 (GS, March 2022)].
  • Critiques: Donnellan and Trzesniewski [k = 5, n = 477,380, citations = 432 (GS, March 2022)]; Arnett 2013 [unsystematic review, citations = 171 (GS, March 2022)]; Roberts 2017 [reanalysis of original data and analysis of a new sample, n = 476, citations = 195 (GS, March 2022)]; Wetzel 2017 [1990s: n = 1,166; 2000s: n = 33,647; 2010s: n = 25,412; citations = 101 (GS, March 2022)] (~660 total citations). Meta-analysis: Hamamura et al. 2020 [total n = 24,990, citations = 5 (GS, March 2022)].
  • Original effect size: d=0.37 increase in NPI scores (1980-2010), n=49,000.
  • Replication effect size: Roberts does not report a d, but it is near 0, roughly d = 0.03 ((15.65 - 15.44) / 6.59). Wetzel: d = -0.27 (1990s vs. 2010s). Hamamura: d(leadership) = -0.26, d(vanity) = -0.39, d(entitlement) = -0.23.
</div>
  • Minimal group effect (MGE), alt-term = Minimal group paradigm. An intergroup bias that manifests as ingroup favouritism (i.e., a tendency to prefer ingroup members) when participants are assigned to previously unfamiliar, experimentally created and largely meaningless social identities.

  • Status: replicated
  • Original papers: Experiments on ingroup favoritism (or the broader construct of intergroup discrimination) by Rabbie & Horwitz (1969) [citations = 662 (GS, July 2022)], Tajfel (1970) [citations = 3920 (GS, July 2022)], Tajfel, Billig, Bundy, & Flament (1971) [citations = 7766 (GS, July 2022)], and Billig & Tajfel (1973) [citations = 2134 (GS, July 2022)]. The finding has been confirmed in several meta-analytic studies (e.g., Mullen, Brown & Smith, 1992).
  • Critiques: Related to the cultural ubiquity of the MGE: studies by Kerr, Ao, Hogg, & Zhang (2018) comparing US and Australian samples [citations = 17 (GS, July 2022)] and by Falk, Heine, & Takemura (2014) emphasised the cultural variation of the MGE.
  • Original effect size: N/A
  • Replication effect size:
</div>
  • Solomon Asch’s conformity study. The study investigated the degree to which a person’s own opinions are influenced by those of a group. The original study is regarded as credible and the main effect has been confirmed multiple times in many cultural contexts (see Bond and Smith, 1996). Nevertheless, the main effect of the original study has been widely misinterpreted and incorrectly described in both academic and popular literature.

  • Status: reversed
  • Original paper: ‘Studies of independence and conformity: I. A minority of one against a unanimous majority’, Asch, 1956; n = 123 [citations = 6558, GS, October 2021].
  • Critiques: Friend et al., 1990 [citations = 156, GS, November 2021]; Griggs, 2015 [citations = 12, GS, November 2021].
  • Original effect size: 36.8% of the responses were incorrect (influenced by the majority). The author interpreted the effect as evidence for the prevalence of independence (“The preponderance of judgments was independent, evidence that under the present conditions the force of the perceived data far exceeded that of the majority.”, Asch, 1956, p. 24). Nevertheless, most academic textbooks present the study as evidence of overwhelming conformity and fail to report the evidence of independent tendencies among participants (see Friend et al., 1990; Griggs, 2015). A common practice in many academic textbooks and popular writings is to report “75%” or “76%” as the general indicator of conformity. In reality, this is the fraction of respondents who yielded to the majority in at least one of the twelve trials. The reverse readings of the same data (rarely mentioned in the literature) are 24%, the fraction of completely independent respondents, and 95%, the fraction of respondents who remained independent in at least one of the twelve trials.
  • Replication effect size: Bond and Smith, 1996: d = .92, 95% CI [.89, .96]; average rate of incorrect answers: 25%.
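The gap between 36.8% of responses and the “75%” of people comes from counting at different levels. The sketch below computes all four summaries from a per-participant record of critical trials; the four-participant data set is made up purely to show that one data set can be described both as “75% conformed” and as “75% stayed independent”.

```python
def asch_summary(responses):
    """
    responses: one list per participant, one bool per critical trial
    (True = gave the majority's incorrect answer on that trial).
    """
    n_people = len(responses)
    n_trials = sum(len(r) for r in responses)
    n_yields = sum(sum(r) for r in responses)
    return {
        # Response-level rate: the kind of figure behind Asch's 36.8%.
        "incorrect_responses": n_yields / n_trials,
        # Participant-level: yielded on at least one trial (the "75%" figure).
        "yielded_at_least_once": sum(any(r) for r in responses) / n_people,
        # Participant-level: never yielded (completely independent).
        "never_yielded": sum(not any(r) for r in responses) / n_people,
        # Participant-level: stayed independent on at least one trial.
        "independent_at_least_once": sum(not all(r) for r in responses) / n_people,
    }


# Hypothetical data: 4 participants, 12 critical trials each,
# yielding on 0, 12, 3 and 1 trials respectively.
data = [[True] * k + [False] * (12 - k) for k in (0, 12, 3, 1)]
for name, value in asch_summary(data).items():
    print(f"{name}: {value:.0%}")
```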
</div>
  • Dynamic norms. Information about increasing minority norms increases interest in, and engagement with, minority behaviour.

</div>
  • Social comparison. No robust evidence for an interaction effect between body dissatisfaction and social comparison on fat talk.

</div>
  • Bystander effect. Claims that the feeling of responsibility diffuses with an increasing number of other observers. Research about the bystander effect was sparked by the 1964 murder of Catherine “Kitty” Genovese. See this New York Times article for details. Here’s a more detailed resource.

</div>
  • Color red on attractiveness. Viewing the color red enhances men’s attraction to women. Put simply, this effect may reflect the amorous meaning of red in the human mating game.

  • Status: Mixed
  • Original paper: ‘Romantic red: Red enhances men’s attraction to women’, Elliot and Niesta (2008); experiment, N = 42 [citations = 66 (GS, February 2022)].
  • Critiques: Peperkoorn et al. (2016) [n = 830, citations = 48 (GS, February 2022)].
  • Original effect size: Cohen’s d = .66 to ES = X.
  • Replication effect size: Peperkoorn et al. (2016; study 1): partial η2 = .03 (in support of white being more attractive than red). Peperkoorn et al. (2016; study 2): F = .07. Peperkoorn et al. (2016; study 3): d = −.12.
</div>
  • Big brother effect. An original study reported that being watched makes people more likely to cooperate. People who were viewed by a pair of eyes (even a picture of eyes rather than a real person) were three times more likely to contribute to an honesty box used to collect money for drinks (compared with participants who instead saw a picture of flowers), but later meta-analyses with very large sample sizes did not find this result.

  • Status: not replicated
  • Original paper: ‘Cues of being watched enhance co-operation in a real-world setting’, Bateson et al., 2006; experimental design, n = 48 [citations = 1604, Google Scholar, Dec 2021].
  • Critiques: Carbon & Hesslinger, 2006 [n=138, citations=52 (Google scholar, December 2021)], Northover et al., 2017 [1st meta-analysis total n=2700, 2nd meta-analysis total n=20,000, citations=135 (Google scholar, December 2021)].
  • Original effect size: d=1.948.
  • Replication effect size: Northover et al., 1st meta-analysis: g = 0.03. Northover et al., 2nd meta-analysis: g = 0.13.
</div>
  • Imagined Contact - Bias, the claim that imagining social contact (instead of having actual contact) with someone from an outgroup (based on e.g., ethnicity, sexuality, religion, age) can reduce intergroup bias.

  • Status: mixed
  • Original paper: ‘Imagining intergroup contact can improve intergroup attitudes’, Turner, Crisp, & Lambert (2007), three experiments, n = [28, 24, 27] [citations = 633, Google Scholar, 10/2022].
  • Critiques: Hoffarth & Hodson (2016) [Study 1: N = 261; Study 2: N = 320; GS: 36 citations, 10/2022], Miles & Crisp (2014) [meta-analysis, k = 71, N = 5,770; GS: 450 citations, 10/2022], Firat & Ataca (2020) [N = 335; GS: 9 citations, 10/2022].
  • Original effect size: Study 1, age: _d_ = .42; Study 2, elderly: ηp² = 0.20; Study 3: _d_ = 0.86 (as calculated for this entry, using Lakens’ tool).
  • Replication effect size:
    • Hoffarth & Hodson (2016, Study 1, concerning gay people): many outcomes, all n.s., largest beta = .10, Hoffarth & Hodson (2016, Study 2, concerning Muslims): many outcomes, all n.s., largest beta = .095
    • Miles & Crisp (2014): overall _d_ = .35 (95% CI [0.26, 0.44])
    • Firat & Ataca (2020): ηp2 = .01 (n.s.)
</div>
  • Imagined Contact - Intentions, the claim that imagining social contact (instead of having actual contact) with someone from an outgroup (based on e.g., ethnicity, sexuality, religion, age) can increase contact intentions.

  • Status: Mixed
  • Original paper: ‘Elaboration enhances the imagined contact effect’, Husnu & Crisp, 2010, [Experiment 1, n= 33; Experiment 2, n = 60, citations = 278, GS, 10/2022]
  • Critiques: Klein et al., 2014 Many Labs study [n = 6344, citations = 1082, GS, 06/2022]; Crisp et al. (2014) [citations = 16, GS, 10/2022] reply to Klein et al. stating that the effect size was significant and comparable to that obtained in the Miles and Crisp (2014) [citations = 450, GS, 10/2022] meta-analysis for the relevant outgroup, suggesting that the Many Labs project may provide stronger evidence than originally thought.
  • Original effect size: Husnu & Crisp, study 1, _d_ = 0.86; study 2, _d_ = 1.13.
  • Replication effect size: Klein et al., _d_ = 0.13, CI = [0.00, 0.19] (NB: the original study focused on ‘British Muslims’; this one on Muslims across cultures). Meta-analysis: Miles and Crisp (2014), _d_ = 0.35, and the estimate for religious groups, _d_ = 0.22. Crisp et al. (2014): “the observed effect size of 0.13 in the Many Labs study is substantially different from the original Husnu and Crisp study, and from our overall estimate of 0.35, but not from the most appropriate comparison: The meta-analytic estimate for religious outgroups (0.22).”
</div>
  • Stereotype susceptibility effects. Awareness of stereotypes about a person’s in-group can affect that person’s behavior and performance when they complete a stereotype-relevant task.

</div>
  • Positive mood-boost helping effect. People are more likely to do good when feeling good.

  • Status: mixed
  • Original paper: Isen & Levin (1972) [Experiment 1 n = 52 male undergraduates; Experiment 2 n = 41 adults; citations = 1,881 (GS, 10/2022)].
  • Critiques:
  • Original effect size, calculated: Study 1: OR = 2.25, Study 2: OR = 168 [no typo, both calculated]
  • Replication effect size: Batson et al. (1979): OR = 4.3 [calculated], Carlson et al. (1988): _d_ = .54 [reported], Weyant & Clark (1977): Study 1: OR = 4.2 (calculated, between dime and no-dime, excl. 2 other conditions), Study 2: OR = 0.7 [calculated], Blevins & Murphy (1974): OR = 0.9 [calculated].
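Several odds ratios in this entry are marked as calculated. The sketch below shows the usual way to compute an odds ratio from a 2×2 table of helped / did not help by condition; the counts in the example are hypothetical and not taken from Isen & Levin (1972) or any replication. Very large values such as OR = 168 tend to arise when a cell count is zero or close to it; adding 0.5 to every cell (the Haldane-Anscombe correction) is one common workaround, but we do not know whether it was applied for the values above.

```python
def odds_ratio(a: float, b: float, c: float, d: float, correction: float = 0.5) -> float:
    """
    Odds ratio for a 2x2 table:
                    helped   did not help
      treatment       a          b
      control         c          d
    If any cell is zero, `correction` is added to every cell (Haldane-Anscombe).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + correction for x in (a, b, c, d))
    return (a * d) / (b * c)


print(odds_ratio(12, 4, 6, 10))   # hypothetical counts
print(odds_ratio(10, 0, 2, 10))   # hypothetical counts with an empty cell
```

With the first set of hypothetical counts the odds of helping are 3 in the treatment group and 0.6 in the control group, giving OR = 5.0.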
</div>
  • Superiority-of-unconscious decision-making effect (alt-term = deliberation without attention effect). While conscious reflection produces better choices on simple tasks, complex choices “should be left to unconscious thought”.

  • Status: mixed.
  • Original paper: ‘On Making the Right Choice: The Deliberation-Without-Attention Effect’, Dijksterhuis et al., 2005; 2 experiments (Ns = 80 & 59, undergraduates) that show better choices (and two surveys that show greater satisfaction, not focus here) [citations = 1807 in GS; October 2022].
  • Critiques: Acker 2008 [n = 78, citations = 233 (Google Scholar, November 2022)].
    • Meta-analysis: Acker 2008 [n=888 across 17 studies, citations=233(Google Scholar, November 2022)]
    • Meta-analysis: Nieuwenstein et al. 2015 [n=4518 across 67 studies, citations=103(Google Scholar, November 2022)]
  • Original effect size: ηp2 = 0.06 (Study 1; g = 0.434, reported in Acker 2008) to ηp2 = 0.11 (Study 2; g = 0.242, reported in Acker 2008) for the interaction between choice complexity and deliberation. Main effects and descriptives not reported.
  • Replication effect size:
    • All reported in Acker 2008: Acker: g = 0.471. Ham et al.: g = 0.883 to g = 1.055. Lerouge: g = -0.064 to g = 1.116. Newell et al.: g = -0.504 to g = 0.722. Payne et al.: g = -0.483 to g = 0.722. Phillips et al.: g = -0.251. The mean effect size was g = .251.
    • All reported in Nieuwenstein et al. 2015: Abadie et al. _g_ = -0.62 to _g_ = 0.22. Aczel et al. _g_ = -0.35. Ashby et al. _g_ = -0.21 to _g_ = 1.00. Bos et al. _g_ = -0.10 to _g_ = 1.48. Calvillo & Penaloza _g_ = -0.29 to _g_ = -0.09. Dijksterhuis _g_ = 0.24 to _g_ = 0.42. Dijksterhuis et al. _g_ = 0.70 to _g_ = 0.86. González et al. _g_ = 0.00. Hasford _g_ = 0.43. Hess et al. _g_ = -0.14. Huizenga et al. _g_ = -0.50 to _g_ = -0.33. Lassiter et al. _g_ = 0.27 to _g_ = 0.51. Lerouge _g_ = 0.38 to _g_ = 0.47. McMahon et al. _g_ = 0.62 to _g_ = 0.67. Messner et al. _g_ = 0.63. Newell et al. _g_ = -0.50 to _g_ = 0.17. Newell and Rakow _g_ = -0.37 to _g_ = 0.31. Nieuwenstein and Van Rijn _g_ = -0.74 to _g_ = 0.87. Nieuwenstein et al. _g_ = -0.01. Nordgren et al. _g_ = 0.27 to _g_ = 0.36. Payne et al. _g_ = -0.10. Queen & Hess _g_ = -0.21. Rey et al. _g_ = 0.27. Smith et al. _g_ = 0.25 to _g_ = 0.32. Strick et al. _g_ = 0.58 to _g_ = 1.21. Thorsteinson & Withrow _g_ = 0.18 to _g_ = 0.34. Usher et al. _g_ = 0.78 to _g_ = 1.04. Waroquier et al. _g_ = -0.09 to _g_ = 0.35. Pooled effect size of _g_ = 0.15 [0.03; 0.26].
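The replication results above are reported as Hedges’ g, which is Cohen’s d multiplied by a small-sample correction factor. The sketch below uses the common approximation J = 1 − 3 / (4(n1 + n2) − 9); the example d and group sizes are hypothetical, chosen only to show how much the correction matters at different sample sizes.

```python
def hedges_g(d: float, n1: int, n2: int) -> float:
    """Apply the approximate small-sample correction J = 1 - 3 / (4(n1 + n2) - 9) to Cohen's d."""
    j = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * j


# Hypothetical d = 0.5 with equal group sizes of 10, 20 and 100 per group.
for n in (10, 20, 100):
    print(f"n per group = {n}: g = {hedges_g(0.5, n, n):.3f}")
```

The correction shrinks d only slightly, and mainly for small samples such as those in the original two experiments.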
</div>
  • Behavioural-consequences-of-automatic-evaluation effect (alt-term = affective compatibility effect). Automatic classification of stimuli as either good or bad has direct behavioural consequences. Automatic evaluation results directly in behavioural predispositions toward the stimulus, such that positive evaluations produce immediate approach tendencies and negative evaluations produce immediate avoidance tendencies.

  • Status: mixed.
  • Original paper: ‘Consequences of Automatic Evaluation: Immediate Behavioral Predispositions to Approach or Avoid the Stimulus’, Chen & Bargh 1999; two mixed-design experiments, study 1 n = 42, study 2 n = 50 [citations = 1943 (Google Scholar, October 2022)].
  • Critiques:
    • Rotteveel et al. 2015 [Study 1 n = 100, Study 2 n = 50, citations = 35 (Google Scholar, October 2022)].
    • Meta-analysis: Phaf et al. 2014 [N = 1538 across 29 studies, citations = 271 (Google Scholar, October 2022)].
  • Original effect size:
    • Study 1 (conscious evaluation) – congruence factor main effect ηp² = 0.168 / d = 0.44 [ηp² calculated from the reported F statistic and converted using this conversion]
    • Study 2 (automatic evaluation) – congruence factor main effect ηp² = 0.078 / d = 0.29 [ηp² calculated from the reported F statistic and converted using this conversion]
  • Replication effect size:
    • Rotteveel et al.:
      • Study 1 – Evaluative judgement × Lever movement interaction effect ηp² = 0.030 [reported, non-significant] / d = 0.17 [converted using this conversion].
      • Study 2 – Affective valence × Lever movement interaction effect ηp² = 0.057 [reported, marginally significant] / d = 0.24 [converted using this conversion].
    • Phaf et al.:
      • Positive emotions – The average effect size differed significantly from zero for explicit instructions to evaluate (g = 0.287; p < 0.0001; 95% CI = 0.204, 0.369) and for explicit-converted instructions (g = 0.287; p = 0.0001; 95% CI = 0.146, 0.429), but not for implicit instructions (g = 0.028; p = 0.572) [all reported].
      • Negative emotions – Effect sizes differed significantly from zero for explicit-converted instructions (g = 0.389; p = 0.001; 95% CI = 0.155, 0.624) and for explicit instructions (g = 0.249; p = 0.0001; 95% CI = 0.159, 0.339), but not for implicit instructions (g = 0.103; p = 0.0959) [all reported].
      • Both emotions – The average effect size differed significantly from zero for explicit-converted instructions (g = 0.433; p = 0.0001; 95% CI = 0.295, 0.571) and for explicit instructions (g = 0.403; p = 0.0001; 95% CI = 0.286, 0.521), but not for implicit instructions (g = 0.076; p = 0.148) [all reported].
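The ηp² values marked "calculated from the reported F statistic", and the d values paired with them, can be reproduced with two short formulas. For an effect with df_effect numerator degrees of freedom, ηp² = F·df_effect / (F·df_effect + df_error). The linked conversion is not reproduced here. However, every d listed alongside an ηp² in this section matches √(ηp² / (1 − ηp²)) to within 0.01 (for example, ηp² = 0.078 gives 0.291, listed as d = 0.29). That quantity is Cohen's f. For two equal-sized independent groups, the conventional d is 2f, so it may be worth checking which conversion was intended before comparing these values with d from other sources. The function names below are hypothetical, and this is a sketch rather than the contributors' procedure.

```python
import math


def partial_eta_squared_from_f(f_stat, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_stat * df_effect) / (f_stat * df_effect + df_error)


def cohens_f_from_eta_squared(eta_sq):
    """Cohen's f = sqrt(eta^2 / (1 - eta^2))."""
    if not 0 <= eta_sq < 1:
        raise ValueError("eta squared must lie in [0, 1)")
    return math.sqrt(eta_sq / (1.0 - eta_sq))


def two_group_d_from_eta_squared(eta_sq):
    """For two equal-sized independent groups (a 1-df effect), d = 2f."""
    return 2.0 * cohens_f_from_eta_squared(eta_sq)
```

For a hypothetical F(1, 36) = 4.0, partial_eta_squared_from_f(4.0, 1, 36) returns 0.1. Mixed or within-subject designs, such as those in Chen & Bargh, need design-specific conversions.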
</div>
  • Self-control relies on glucose effect. Acts of self-control decrease blood glucose levels; low levels of blood glucose predict poor performance on self-control tasks; initial acts of self-control impair performance on subsequent self-control tasks, but consuming a glucose drink eliminates these impairments.

  • Status: mixed
  • Original paper: ‘Self-control relies on glucose as a limited energy source: Willpower is more than a metaphor’, Gailliot et al. (2007); 9 experiments: Study 1 (self-control decreases blood glucose) n = 103; Study 2 (self-control decreases blood glucose) n = 37; Study 3 (low levels of blood glucose predict poor performance on self-control tasks) n = 15; Study 4 (low levels of blood glucose predict poor performance on self-control tasks) n = 10; Study 5 (low levels of blood glucose predict poor performance on self-control tasks) n = 19; Study 6 (low levels of blood glucose predict poor performance on self-control tasks) n = 15; Study 7 (glucose consumption) n = 61; Study 8 (glucose consumption) n = 72; Study 9 (glucose consumption) n = 17 [citations = 1956 (GS, June 2022)].
  • Critiques: Meta-analysis: Hagger et al., 2010 [citations = 2638 (GS, June 2022)]. Lange & Egger, 2014 [n = 70, citations = 114 (GS, June 2022)]; Lange & Egger also point to statistical errors in the Hagger et al. meta-analysis.
  • Original effect size: Study 1 (self-control decreases blood glucose): ηp² = 0.057 [calculated]. Study 3 (low levels of blood glucose predict poor performance on self-control tasks): r = -0.62. Studies 4–6 (low levels of blood glucose predict poor performance on self-control tasks): r = 0.56, r = 0.45, r = 0.43, respectively. Studies 7, 8, 9 (glucose consumption): ηp² = 0.081, ηp² = 0.073, d = 1.518, respectively [all calculated].
  • Replication effect size: Meta-analysis by Hagger et al., 2010: for glucose consumption, d = 0.75 (includes the original study); for decrease of blood glucose levels, d = -0.87 (includes the original study). Lange & Egger 2014 for glucose consumption: ηp² = 0.02.
</div>
  • Physical warmth promotes interpersonal warmth. Exposure to physical warmth will lead to more positive judgments of strangers and an increase in prosocial behaviour (e.g., gift-giving).

  • Status: not replicated.
  • Original paper: ‘Experiencing physical warmth promotes interpersonal warmth’, Williams and Bargh (2008); between-subjects [citations = 1,894 (Google Scholar, October 2022)].
    • Experiment 1: n = 41.
    • Experiment 2: n = 53.
  • Critiques:
    • Lynott et al. (2014) [citations = 140 (Google Scholar, October 2022)]
      • Sample 1: n = 306 (Ohio, USA).
      • Sample 2: n = 250 (Michigan State University, USA).
      • Sample 3: n = 305 (University of Manchester, UK).
        • Note: All samples attempted to replicate Experiment 2 of Williams and Bargh (2008).
    • Chabris et al. (2018) [citations = 53 (Google Scholar, October 2022)]
      • Experiment 1: n = 128 (attempted to replicate Experiment 1 of Williams and Bargh, 2008).
      • Experiment 2: n = 177 (attempted to replicate Experiment 2 of Williams and Bargh, 2008).
  • Original effect size:
    • Experiment 1 (estimated from the test statistic): d = 0.65 (people tended to give more positive ratings after holding a warm drink).
    • Experiment 2 (converted from the OR that Lynott et al. 2014 report for this study): d = 0.65 (people were more likely to give a gift to a friend than to themselves after holding a warming pad).
  • Replication effect size:
    • Lynott et al. (2014):
      • Sample 1: d = -0.27 (opposite direction, converted from OR reported in paper)
      • Sample 2: d = -0.05 (not replicated, converted from OR reported in paper)
      • Sample 3: d = -0.14 (not replicated, converted from OR reported in paper)
    • Chabris et al. (2018):
      • Experiment 1: d = -0.06 (not replicated, converted from r statistic reported)
      • Experiment 2: d = 0.04 (not replicated, converted from r statistic reported).
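The replication values for this effect were converted from odds ratios (Lynott et al., 2014) and from correlations (Chabris et al., 2018), but the formulas used are not stated. Below is a minimal sketch that assumes two common conversions: the logit conversion d = ln(OR)·√3/π and the equal-group conversion d = 2r/√(1 − r²). The function names are hypothetical.

```python
import math


def d_from_odds_ratio(odds_ratio):
    """Logit-based conversion: d = ln(OR) * sqrt(3) / pi."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return math.log(odds_ratio) * math.sqrt(3) / math.pi


def d_from_r(r):
    """Point-biserial r to d, assuming two equal-sized groups."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return 2 * r / math.sqrt(1 - r ** 2)
```

As a check on the second formula, d_from_r(-0.05) is about -0.10 and d_from_r(0.02) about 0.04. These match the conversions listed for Gignac (2020) in the Dunning-Kruger entry. An odds ratio below 1 maps to a negative d, which is how an "opposite direction" result such as Lynott et al.'s Sample 1 appears. Unequal group sizes change the r-to-d conversion.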
</div>
  • Power impairs perspective-taking effect. Individuals made to feel high in power were more likely to inaccurately assume that others view the social world from the same perspective as they do.

  • Status: not replicated
  • Original paper: ‘Power and Perspectives Not Taken’, Galinsky et al. 2006; 3 between-subjects experiments (Experiment 2 in two parts, 2a and 2b), each with two conditions: Experiment 1 n = 57, Experiment 2a n = 42, Experiment 2b n = 51, Experiment 3 n = 70 [citations = 1550 (GS, June 2022)].
  • Critiques: Ebersole et al. (2016), replication of Experiment 2a [n = 2,969, citations = 438 (GS, June 2022)].
  • Original effect size: d = 0.77 [0.12, 1.41], as reported by Ebersole et al. (2016).
  • Replication effect size: Ebersole et al. (2016): d = 0.03 [−0.04, 0.10].
</div>
  • Status-legitimacy effect. Under certain circumstances, members of low-status, disadvantaged, and marginalized groups are more likely than their high-status and advantaged counterparts to perceive their social systems as legitimate. Because they have the greatest psychological need to reduce ideological dissonance, the people most disadvantaged by the status quo are the most likely to support, defend, and justify existing social systems, authorities, and outcomes.

  • Status: mixed.
  • Original paper: ‘Social inequality and the reduction of ideological dissonance on behalf of the system: evidence of enhanced system justification among the disadvantaged’, Jost et al. 2003; five cross-sectional/correlational studies, n1 = 1345, n2 = 2485, n3 = 1396, n4 = 2223, n5 = 788 [citations = 927 (Google Scholar, October 2022)].
  • Critiques: Henry & Saul 2006 [n = 356, citations = 156 (Google Scholar, October 2022)]. Brandt 2013 [n = 151,794, citations = 271 (Google Scholar, October 2022)]. Caricati 2017 [n = 38,967, citations = 50 (Google Scholar, October 2022)].
  • Original effect size:
    • Study 1 – effects of income, B = -0.22, race (European Americans vs. African Americans), B = -0.73, and education, B = -0.30, on willingness to limit the press; effects of income, B = -0.31, race (European Americans vs. African Americans), B = -1.01, and education, B = -0.38, on attitudes toward the rights of citizens.
    • Study 2 – effects of income, B = 0.06, and education, B = -0.08, on trust in government officials among Latinos.
    • Study 3 – effects of income on the beliefs that large income differences are necessary to get people to work hard, B = 0.04, and necessary as an incentive for individual effort, B = 0.02.
    • Study 4 – main effects of region (North vs. South), ηp² = 0.128 / d = 0.38, and income, ηp² = 0.09 / d = 0.31, on meritocratic beliefs among African Americans [ηp² calculated from the reported F statistic and converted using this conversion].
    • Study 5 – effects of socio-economic status, B = -0.34, and race (White vs. Black), B = -0.25, on legitimation of income inequality.
  • Replication effect size:
    • Henry & Saul: group status effects on support for dissent, ηp² = 0.019 / d = 0.14, government approval, ηp² = 0.024 / d = 0.16, and alienation from government, ηp² = 0.024 / d = 0.16 [ηp² calculated from the reported F statistic and converted using this conversion] (replicated).
    • Caricati: effects of top-bottom self-placement, B = 0.117, social class, B = 0.075, and personal income, B = 0.022, on perceived fairness of income distribution [all significant, reversed].
    • Brandt:
      • effects of income on trust in government and confidence in societal institutions in various multilevel regression models, b = -0.014 to b = 0.005 [all non-significant, not replicated];
      • effects of education on trust in government and confidence in societal institutions in various multilevel regression models, b = -0.044 [significant, replicated] to b = 0.021 [significant, reversed];
      • effects of social class on trust in government and confidence in societal institutions in various multilevel regression models, b = 0.055 [significant, reversed] to b = 0.110 [significant, reversed];
      • effects of race on trust in government and confidence in societal institutions in various multilevel regression models, b = -0.019 [non-significant, not replicated] to b = 0.017 [significant, reversed];
      • Overall, only one of the 14 effects was supportive; six were significant and positive (reversed), and the remaining seven did not differ significantly from zero.
</div>
  • Red impairs cognitive performance. The color red impairs performance on achievement tasks, as red is associated with the danger of failure and evokes avoidance motivation.

</div>
  • Reduced prosociality of high SES effect. Higher socioeconomic status predicts decreased prosocial behavior. Affluence may be linked with reduced empathy, and poverty with increased empathy.

  • Status: mixed
  • Original paper: ‘Having less, giving more: the influence of social class on prosocial behavior’, Piff, Kraus, Côté, Cheng, & Keltner (2010); correlational and experimental designs with self-report and behavioral measures of altruism; total N = 394 [citations = 1633 (Google Scholar, October 2022)].
  • Critiques: Stamos, Lange, Huang, & Dewitte (2020), preregistered replications [Study 1 n = 300, Study 2 n = 200, citations = 25 (Google Scholar, October 2022)]. Andreoni, Nikiforakis, & Stoop (2021), field experiment [n = 360, citations = 27 (Google Scholar, October 2022)].
  • Original effect size: mean r = −0.215.
  • Replication effect size: Stamos et al. (2020): r = 0.01 (non-significant). Andreoni et al. (2021): mean r = 0.37 (reversed).
</div>
  • Moral licensing effect (alt-terms = self-licensing, moral self-licensing, licensing effect). Acting in a moral way makes people more likely to excuse and perform subsequent immoral, unethical, or otherwise problematic behaviors.

  • Status: not replicated
  • Original paper: ‘Sinning Saints and Saintly Sinners’, Sachdeva et al. (2009); three experiments using a priming task in which participants write a story about themselves using neutral, negative, or positive traits; US student sample; Study 1 & 3: n = 46 [citations = 919 (Google Scholar, June 2022)].
  • Critiques:
    • Blanken et al., 2014 (direct replication of 2 of the original studies; 3 replication studies with 2 different populations) [Study 1: n = 105, Study 2: n = 150, Study 3: n = 940, citations = 81 (Google Scholar, June 2022)].
    • Blanken et al., 2015 [meta-analysis estimating a mean effect of d = 0.31, 95% CI = [0.23, 0.38], total n = 7,397, citations = 470 (Google Scholar, June 2022)].
    • Simbrunner & Schlegelmilch 2015 [meta-analysis estimating a mean effect of d = 0.319, 95% CI = [0.229, 0.408], k = 106 (n data points not reported), citations = 37 (Google Scholar, June 2022)].
    • Kuper & Bott 2019 [re-analysis of the meta-analyses above with adjustment for publication bias; adjusted effect sizes d = -0.05 (PET-PEESE) and d = 0.18 (3-PSM), citations = 27 (Google Scholar, June 2022)].
    • Urban et al., 2019 [failed conceptual replication of Mazar & Zhong, 2010, on moral licensing in the domain of environmental behavior; 3 studies, total n = 1274].
    • Rotella & Barclay, 2020 [failed pre-registered conceptual replication of the effect, n = 562].
  • Original effect size: Study 1: Cohen’s d = 0.62, 95% CI = [-0.11, 1.35]. Study 3: Cohen’s d = 0.59, 95% CI = [-0.12, 1.30] (effect sizes taken from the replication paper by Blanken et al.).
  • Replication effect size: Blanken et al., replication Study 1 (Dutch student sample): Cohen’s d = -0.03, 95% CI = [-0.51, 0.45]; replication Study 2 (Dutch student sample): Cohen’s d = -0.31, 95% CI = [-0.70, 0.08]; replication Study 1 & 3 (US MTurk sample): Cohen’s d = 0.05, 95% CI = [-0.15, 0.25].
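Kuper & Bott (2019) report bias-adjusted estimates from PET-PEESE. The sketch below is hedged: the helper names are hypothetical and this is not the authors' code. PET takes the intercept of an inverse-variance-weighted regression of effect sizes on their standard errors. PEESE does the same with the sampling variances (SE²) as the predictor. In both, the intercept estimates the effect expected from a study with no sampling error. The full procedure uses a significance test on the PET intercept to decide which estimate to report. That step, and the 3-PSM selection model, are omitted here.

```python
def weighted_intercept(xs, ys, weights):
    """Intercept of the weighted least-squares line y = a + b * x."""
    if len(set(xs)) < 2:
        raise ValueError("predictor must vary across studies")
    sw = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / sw
    y_bar = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar


def pet_peese(effects, standard_errors):
    """Return (PET, PEESE) intercepts using inverse-variance weights."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    variances = [se ** 2 for se in standard_errors]
    pet = weighted_intercept(standard_errors, effects, weights)    # effect ~ SE
    peese = weighted_intercept(variances, effects, weights)        # effect ~ SE^2
    return pet, peese
```

If small studies report systematically larger effects, both intercepts fall below the naive pooled mean. If effects do not depend on precision, both intercepts stay close to it.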
</div>

Positive Psychology

  • Power pose. Taking on a power pose lowers cortisol while raising testosterone, risk tolerance, and feelings of power.

  • Status: not replicated
  • Original paper: ‘Power Posing: Brief Nonverbal Displays Affect Neuroendocrine Levels and Risk Tolerance’, Carney et al. (2010); n = 42, mixed sexes [citations = 1450 (GS, April 2022)].
  • Critiques: Garrison et al. (2016) [n = 305, citations = 70 (GS, April 2022)]; Metzler and Grezes (2019) [n = 82 men, citations = 3 (GS, April 2022)]; Ranehill (2015) [total n = 200, citations = 291 (GS, April 2022)]; Ronay 2017 [n = 108, citations = 38 (GS, April 2022)].
  • Original effect sizes: Φ = 0.30 for risk-taking (Carney et al., 2010); source unknown: d = -0.30 for cortisol, d = 0.35 for testosterone, d = 0.79 for feelings of power.
  • Replication effect size: Garrison et al. (2016): feelings of power ηp² = 0.016; Metzler and Grezes (2019): cortisol ηp² = 0.02, testosterone ηp² = 0.01; Ranehill (2015): cortisol d = -0.157, feelings of power d = 0.34, risk taking d = -0.176, testosterone d = -0.200; Ronay (2017): cortisol d = 0.034, feelings of power d = 0.226, testosterone d = 0.121.
</div>
  • Facial Feedback. Smiling causes a good mood, while pouting produces a bad mood.

  • Status: not replicated
  • Original paper: ‘Inhibiting and Facilitating Conditions of the Human Smile: A Nonobtrusive Test of the Facial Feedback Hypothesis’, Strack et al. (1988); 2 studies with n = 92 and n = 83 [citations = 2577 (GS, February 2022)].
  • Critiques: Coles et al. (2019) [meta-analysis, k = 98, citations = 115 (GS, February 2022)]; Wagenmakers et al. (2016) [registered replication across 17 labs, n = 1894, citations = 349 (GS, February 2022)]; Schimmack (2017).
  • Original effect size: Study 1: d = 0.43 (a raw difference of 0.82 points on a rating scale out of 9).
  • Replication effect size: 0.03 out of 9, with a CI overlapping 0. Coles et al.'s (2019) meta-analysis of 98 studies finds d = 0.2 [0.14, 0.26] with an extremely low p value and no evidence of publication bias. The latter finding is hard to believe: with d = 0.2 and the convention of targeting 80% power, a study would need a very large sample (n > 500), yet almost all of the included studies have N < 100. Using a proper power analysis, Schimmack finds strong evidence of publication bias in a subset of these papers. Direct replications of the original pen-in-mouth protocol fail, but newer conceptual replications appear to work. Coles et al. (2022): n = 3,878 participants across 17 collaborating labs; the effect is an absolute 5% increase in a happiness measure.
Study | Publication status | N | d
Andréasson & Dimberg (2008) | published | 112 | -0.22
Andréasson (2010) Study 3 | unpublished | 48 | -0.05
Andréasson (2010) Study 3 | unpublished | 48 | -0.35
Andréasson (2010) Study 4 | unpublished | 44 | 0.49
Andréasson (2010) Study 4 | unpublished | 44 | 0.31
Baumeister et al. (2016) | published | 10 | 1.26
Baumeister et al. (2016) | published | 10 | 0.63
Bodenhausen et al. (1994) | published | 51 | 0.55
Bush et al. (1989) | published | 69 | 0.16
Butler et al. (2003) Study 1 | published | 24 | -0.1
Butler et al. (2003) Study 2 | published | 42 | -0.83
Butler et al. (2006) | published | 69 | -0.03
Cai et al. (2016) | published | 68 | -0.08
Ceschi & Scherer (2003) | published | 64 | 0.74
Clapp (2012) | unpublished | 99 | 0.69
Clapp (2012) | unpublished | 93 | 0.08
Clapp (2012) | unpublished | 93 | 0.17
Clapp (2012) | unpublished | 99 | 0.27
Laird & Crosby (1974) Study 1 | published | 26 | -0.13
Laird & Crosby (1974) Study 2 | published | 26 | 0.35
Davey et al. (2013) Study 1 | published | 28 | 0.41
Davey et al. (2013) Study 1 | published | 14 | 0.62
Davey et al. (2013) Study 1 | published | 28 | 0.52
Davey et al. (2013) Study 1 | published | 14 | 0.13
Davey et al. (2013) Study 1 | published | 28 | 0.69
Davey et al. (2013) Study 1 | published | 14 | 0.42
Davey et al. (2013) Study 1 | published | 28 | 0.35
Davey et al. (2013) Study 1 | published | 14 | 0.14
Davey et al. (2013) Study 2 | published | 29 | 0.73
Davey et al. (2013) Study 2 | published | 15 | 0.63
Davey et al. (2013) Study 2 | published | 29 | 0.4
Davey et al. (2013) Study 2 | published | 15 | 0
Davey et al. (2013) Study 2 | published | 29 | 0.08
Davey et al. (2013) Study 2 | published | 15 | -0.25
Davey et al. (2013) Study 2 | published | 29 | 0.03
Davey et al. (2013) Study 2 | published | 15 | -0.06
Davis (2008) Study 1 | unpublished | 28 | 0.99
Davis (2008) Study 1 | unpublished | 28 | 0.87
Davis (2008) Study 2 | unpublished | 31 | 0.26
Davis (2008) Study 2 | unpublished | 30 | -0.19
Davis et al. (2009) | published | 69 | 0.07
Davis et al. (2009) | published | 69 | 0.51
Davis et al. (2010) | published | 68 | 0.1
Davis et al. (2010) | published | 68 | 0.05
Davis et al. (2010) | published | 68 | -0.15
Davis et al. (2015) | published | 18 | -0.16
Demaree et al. (2004) | published | 53 | 0.62
Demaree et al. (2004) | published | 50 | 0.16
Demaree et al. (2006) | published | 32 | -0.64
Demaree et al. (2006) | published | 35 | 0.06
Demaree et al. (2006) | published | 37 | -0.38
Dillon et al. (2007) | published | 36 | 0.11
Dimberg & Söderkvist (2011) Study 1 | published | 48 | 0.51
Dimberg & Söderkvist (2011) Study 2 | published | 96 | 0.1
Dimberg & Söderkvist (2011) Study 2 | published | 96 | 0.32
Dimberg & Söderkvist (2011) Study 3 | published | 61 | 0.06
Dimberg & Söderkvist (2011) Study 3 | published | 61 | 0.31
Dimberg & Söderkvist (2011) Study 3 | published | 61 | 0.34
Duncan & Laird (1977) | published | 31 | 0.44
Duncan & Laird (1977) | published | 31 | 0.38
Duncan & Laird (1977) | published | 31 | 0.51
Duncan & Laird (1980) | published | 60 | 0.59
Duncan & Laird (1980) | published | 60 | 0.44
Dzokoto et al. (2014) | published | 70 | 1.02
Dzokoto et al. (2014) | published | 59 | 0.07
Dzokoto et al. (2014) | published | 35 | 1.07
Dzokoto et al. (2014) | published | 51 | 0.2
Flack, Laird & Cavallaro (1999b) Study 1 | published | 60 | 1.2
Flack, Laird & Cavallaro (1999b) Study 1 | published | 60 | 0.7
Flack, Laird & Cavallaro (1999b) Study 1 | published | 60 | 0.31
Flack, Laird & Cavallaro (1999b) Study 1 | published | 60 | 0.86
Flack, Laird & Cavallaro (1999b) Study 1 | published | 60 | 1.31
Flack, Laird & Cavallaro (1999b) Study 2 | published | 29 | 0.39
Flack, Laird & Cavallaro (1999b) Study 2 | published | 29 | 0.23
Flack, Laird & Cavallaro (1999b) Study 2 | published | 29 | -0.16
Flack, Laird & Cavallaro (1999b) Study 2 | published | 29 | -0.49
Flack, Laird & Cavallaro (1999b) Study 2 | published | 29 | 0.25
Flack, Laird & Cavallaro (1999a) | published | 54 | 1.41
Flack, Laird & Cavallaro (1999a) | published | 54 | 0.29
Flack, Laird & Cavallaro (1999a) | published | 54 | 1.18
Flack, Laird & Cavallaro (1999a) | published | 54 | 1.21
Flack (2006) | published | 51 | 0.72
Flack (2006) | published | 51 | 0.35
Flack (2006) | published | 51 | 0.59
Flack (2006) | published | 51 | 0.68
Gan et al. (2015) | published | 34 | -0.11
Goldin et al. (2008) | published | 17 | 0.8
Gross & Levenson (1993) | published | 85 | 0.04
Gross & Levenson (1997) | published | 180 | 0.37
Gross & Levenson (1997) | published | 180 | 0.16
Gross (1993) | unpublished | 180 | 0.37
Gross (1993) | unpublished | 180 | 0.09
Gross (1993) | unpublished | 180 | 0.2
Gross (1993) | unpublished | 180 | 0.16
Gross (1993) | unpublished | 180 | -0.23
Gross (1998) | published | 80 | 0.18
Harris (2001) | published | 36 | 0.07
Hawk et al. (2012) | published | 41 | 0.85
Helt & Fein (2016) | published | 43 | 0.42
Hendricks & Buchanan (2016) | published | 79 | -0.08
Hendricks (2013) | unpublished | 79 | 0.02
Henry et al. (2007) | published | 30 | -0.49
Henry et al. (2007) | published | 30 | 0.25
Henry et al. (2009a) | published | 26 | -0.05
Henry et al. (2009a) | published | 26 | 0.53
Henry et al. (2009b) | published | 20 | -0.05
Henry et al. (2009b) | published | 20 | 0.48
Hess et al. (1992) | published | 28 | -0.28
Hess et al. (1992) | published | 28 | 0.14
Hess et al. (1992) | published | 28 | -0.26
Hess et al. (1992) | published | 28 | -0.16
Hofmann et al. (2009) | published | 134 | -0.03
Ito et al. (2006) | published | 40 | -0.39
Ito et al. (2006) | published | 33 | -0.25
Kalokerinos et al. (2015) Study 1 | published | 133.67b | -0.06
Kalokerinos et al. (2015) Study 1 | published | 133.67b | -0.02
Kalokerinos et al. (2015) Study 2 | published | 295 | 1.32
Kalokerinos et al. (2015) Study 2 | published | 295 | 0.2
Kao et al. (2017) | published | 41 | 0.09
Kao et al. (2017) | published | 41 | -0.39
Kao et al. (2017) | published | 41 | 0.8
Kao et al. (2017) | published | 41 | -0.34
Kao et al. (2017) | published | 41 | 0.98
Kao et al. (2017) | published | 41 | -0.67
Kircher et al. (2012) | published | 27 | 1.89
Kircher et al. (2012) | published | 27 | 1.14
Korb et al. (2012) | published | 22 | 0.21
Labott & Teleha (1996) | published | 19 | 0.04
Labott & Teleha (1996) | published | 16 | 0.91
Laird (1974) Study 1 | published | 38 | 0.46
Laird (1974) Study 1 | published | 38 | 0.44
Laird (1974) Study 1 | published | 38 | 0.39
Laird (1974) Study 2 | published | 26 | 0.55
Laird (1974) Study 2 | published | 26 | 0.13
Lalot et al. (2014) | published | 45 | -0.17
Larsen et al. (1992) | published | 27 | 0.43
Lee (2011) | unpublished | 52 | 0.48
Lee (2011) | unpublished | 44 | 0.17
Lee (2011) | unpublished | 52 | -0.27
Lee (2011) | unpublished | 44 | -0.26
Lewis & Bowler (2009) | published | 25 | 1.35
Lewis (2012) | published | 24 | 0.71
Lewis (2012) | published | 24 | 0.56
Ma (2011) | unpublished | 42.67b | -0.21
Ma (2011) | unpublished | 42.67b | -0.21
Ma (2011) | unpublished | 42.67b | -0.21
Ma (2011) | unpublished | 42.67b | -0.21
Maldonado et al. (2015) | unpublished | 157.33b | 0.12
Marmolejo-Ramos & Dunn (2013) Study 1 | published | 100 | -0.07
Marmolejo-Ramos & Dunn (2013) Study 2 | published | 106 | -0.07
Marmolejo-Ramos & Dunn (2013) Study 3 | published | 104 | -0.07
Marmolejo-Ramos & Dunn (2013) Study 4 | published | 100 | -0.07
Marmolejo-Ramos & Dunn (2013) Study 5 | published | 66 | 0.27
Marmolejo-Ramos & Dunn (2013) Study 6 | published | 67 | 0.38
Martijn et al. (2002) | published | 33 | -0.24
McCanne & Anderson (1987) | published | 30 | -2.16
McCanne & Anderson (1987) | published | 30 | -2.07
McCanne & Anderson (1987) | published | 30 | 4.73
McCanne & Anderson (1987) | published | 30 | 1.67
McCanne & Anderson (1987) | published | 30 | 2.48
McCanne & Anderson (1987) | published | 30 | -0.25
McCaul et al. (1982) | published | 27 | 0.25
McIntosh et al. (1997) | published | 26 | 0.54
Meeten et al. (2015) | published | 71 | 0.49
Miyamoto (2006) Study 1 | unpublished | 40 | 0.17
Miyamoto (2006) Study 1 | unpublished | 40 | 0.53
Miyamoto (2006) Study 2 | unpublished | 77 | 0.49
Moore & Zoellner (2012) | published | 23.33b | -0.87
Kappas (1989) | unpublished | 32 | 0.08
Kappas (1989) | unpublished | 32 | 0.26
Kappas (1989) | unpublished | 32 | 0.27
Kappas (1989) | unpublished | 32 | 0.1
Kappas (1989) | unpublished | 32 | 0.17
Kappas (1989) | unpublished | 32 | 0.52
Kappas (1989) | unpublished | 32 | 0.62
Kappas (1989) | unpublished | 32 | 0.74
Kappas (1989) | unpublished | 32 | 0.18
Kappas (1989) | unpublished | 32 | 0.42
Ohira & Kurono (1993) Study 1 | published | 20 | 1.23
Ohira & Kurono (1993) Study 1 | published | 20 | 0.31
Ohira & Kurono (1993) Study 2 | published | 20 | 1.61
Ohira & Kurono (1993) Study 2 | published | 20 | -1.38
Paredes et al. (2013) | published | 31 | 0.85
Paul et al. (2013) | published | 20 | 0.91
Pedder et al. (2016) | published | 68 | 0.7
Pedder et al. (2016) | published | 68 | 0.22
Phillips et al. (2008) | published | 32 | 0.18
Phillips et al. (2008) | published | 32 | 0.08
Reisenzein & Studtmann (2007) Study 1 | published | 53 | 0.18
Reisenzein & Studtmann (2007) Study 1 | published | 55 | 0.34
Reisenzein & Studtmann (2007) Study 1 | published | 55 | -0.08
Reisenzein & Studtmann (2007) Study 1 | published | 55 | 0.3
Reisenzein & Studtmann (2007) Study 1 | published | 53 | -0.12
Reisenzein & Studtmann (2007) Study 1 | published | 53 | 0.22
Reisenzein & Studtmann (2007) Study 1 | published | 52 | -0.04
Reisenzein & Studtmann (2007) Study 1 | published | 52 | -0.09
Reisenzein & Studtmann (2007) Study 3 | published | 40 | -0.74
Richards, Butler & Gross (2003) | published | 59 | 0.19
Richards, Butler & Gross (2003) | published | 59 | -0.12
Richards & Gross (1999) Study 1 | published | 58 | -0.1
Richards & Gross (1999) Study 1 | published | 58 | 0.25
Richards & Gross (1999) Study 1 | published | 58 | 0.36
Richards & Gross (1999) Study 2 | published | 85 | 0.13
Richards & Gross (1999) Study 2 | published | 85 | 0.24
Richards & Gross (1999) Study 2 | published | 85 | 0.06
Richards & Gross (2000) Study 1 | published | 53 | -0.12
Richards & Gross (2000) Study 2 | published | 61 | 0.39
Richards & Gross (2006) | published | 131 | 0.34
Roberts et al. (2008) | published | 160 | 0.07
Robinson & Demaree (2009) | published | 102 | -0.04
Robinson & Demaree (2009) | published | 102 | 0.03
Robinson & Demaree (2009) | published | 102 | 0
Robinson & Demaree (2009) | published | 102 | 0
Roemer (2014) | unpublished | 44 | 0.58
Roemer (2014) | unpublished | 44 | 0.29
Rohrmann et al. (2009) | published | 36 | 0.16
Rohrmann et al. (2009) | published | 36 | 0.13
Rummer et al. (2014) | published | 74 | 0.57
Rummer et al. (2014) | published | 74 | 0.46
Schmeichel, Vohs, & Baumeister (2003) | published | 37 | -0.23
Schmeichel et al. (2008) | published | 50 | 0.1
Söderkvist & Dimberg (unpublished) | unpublished | 32 | 0.36
Söderkvist et al. (2018) Study 1a | unpublished | 32 | 0.34
Söderkvist et al. (2018) Study 2a | unpublished | 64 | 0.17
Soussignan (2002) | published | 33 | -0.17
Soussignan (2002) | published | 33 | 0.48
Soussignan (2002) | published | 33 | 0.47
Soussignan (2002) | published | 33 | 0.44
Soussignan (2002) | published | 32 | 0.53
Soussignan (2002) | published | 32 | 1.1
Soussignan (2002) | published | 32 | 1.11
Soussignan (2002) | published | 32 | 0.94
Stel et al. (2008) Study 2 | published | 18.67b | 1.11
Stel et al. (2008) Study 3 | published | 24 | 1
Strack et al. (1988) Study 1 | published | 76.67b | 0.43
Strack et al. (1988) Study 2 | published | 83 | -0.15
Strack et al. (1988) Study 2 | published | 41.5 | 0.55
Strack et al. (1988) Study 2 | published | 41.5 | -0.51
Tamir et al. (2004) | published | 72 | -0.16
Tourangeau & Ellsworth (1979) | published | 20.5b | 0.3
Tourangeau & Ellsworth (1979) | published | 20.5b | 0.3
Tourangeau & Ellsworth (1979) | published | 20.5b | 0.3
Tourangeau & Ellsworth (1979) | published | 20.5b | 0.3
Trent (2010) | unpublished | 107.33b | -0.22
Trent (2010) | unpublished | 107.33b | -0.22
Trent (2010) | unpublished | 107.33b | -0.06
Trent (2010) | unpublished | 107.33b | -0.06
Vieillard et al. (2015) | published | 31 | 0.25
Vieillard et al. (2015) | published | 31 | 0.66
Vieillard et al. (2015) | published | 30 | 0.21
Vieillard et al. (2015) | published | 30 | 0.14
Vieillard et al. (2015) | published | 31 | -0.05
Vieillard et al. (2015) | published | 31 | -0.5
Vieillard et al. (2015) | published | 30 | 0.07
Vieillard et al. (2015) | published | 30 | -0.12
Wagenmakers et al. (2016) Albohn site | published | 139 | 0.09
Wagenmakers et al. (2016) Allard site | published | 125 | 0.09
Wagenmakers et al. (2016) Benning site | published | 115 | -0.01
Wagenmakers et al. (2016) Bulnes site | published | 101 | 0.09
Wagenmakers et al. (2016) Capaldi site | published | 117 | -0.07
Wagenmakers et al. (2016) Chasten site | published | 94 | -0.04
Wagenmakers et al. (2016) Holmes site | published | 99 | 0.15
Wagenmakers et al. (2016) Koch site | published | 100 | -0.14
Wagenmakers et al. (2016) Korb site | published | 101 | 0.01
Wagenmakers et al. (2016) Lynott site | published | 126 | 0.23
Wagenmakers et al. (2016) Oosterwijk site | published | 110 | -0.17
Wagenmakers et al. (2016) Ozdogru site | published | 87 | -0.3
Wagenmakers et al. (2016) Pacheco-Unguetti site | published | 120 | -0.08
Wagenmakers et al. (2016) Talarico site | published | 112 | 0.02
Wagenmakers et al. (2016) Wagenmakers site | published | 130 | 0.13
Wagenmakers et al. (2016) Wayand site | published | 110 | -0.14
Wagenmakers et al. (2016) Zeelenberg site | published | 108 | 0.25
Wittmer (1985) | unpublished | 30 | -0.36
Wittmer (1985) | unpublished | 30 | -0.21
Yartz (2004) | unpublished | 28 | -0.05
Yartz (2004) | unpublished | 30 | -0.18
Yartz (2004) | unpublished | 28 | -0.08
Yartz (2004) | unpublished | 30 | -0.09
Yartz (2004) | unpublished | 28 | 0.04
Yartz (2004) | unpublished | 30 | 0.5
Zajonc et al. (1989) Study 3 | published | 37 | 1.27
Zajonc et al. (1989) Study 4 | published | 26 | 0.47
Zajonc et al. (1989) Study 4 | published | 26 | 0.31
Zariffa et al. (2014) | published | 24 | -0.57
Zariffa et al. (2014) | published | 24 | -0.14
Zhu et al. (2015) | published | 55 | 1.74
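The sample-size argument in the facial feedback entry (d = 0.2, 80% power, n > 500) can be checked with a standard normal-approximation power calculation. Below is a minimal sketch that assumes a two-sided α = .05 and a between-subjects comparison of two equal-sized groups. Several of the tabled studies used other designs, so the result is only indicative. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)


def approx_power(d, n_each, alpha=0.05):
    """Approximate power of a two-sided two-sample test with n_each per group."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    # Ignores the negligible chance of rejecting in the wrong direction.
    return z.cdf(d * math.sqrt(n_each / 2) - z_alpha)
```

Under these assumptions, n_per_group(0.2) returns 393 per group, about 786 participants in total, which is consistent with the n > 500 figure. By contrast, approx_power(0.2, 50) is roughly 0.17, so a study with 100 participants would detect a true d = 0.2 only about one time in six.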
</div>
  • Positive affirmation on mood. Positive self-statements boost mood for people with high self-esteem and lower mood for people with low self-esteem.

  • Status: not replicated
  • Original paper: ‘Positive Self-Statements: Power for Some, Peril for Others’, Wood et al. (2009); 3 experiments: Study 1 n = 249, Study 2 n = 68, Study 3 n = 116 [citations = 294 (GS, February 2022)].
  • Critiques: Flynn and Bordieri (2020) [experiment: n = 462, citations = 4 (GS, February 2022)].
  • Original effect size: Study 1: not reported [g = 0.53 calculated]; Study 2: not reported [g = 1.00 calculated]; Study 3: not reported [g = 0.86, d = -0.74, d = 2.13, d = -0.49 calculated]. A meta-analysis combining the studies suggested that participants with high self-esteem did receive some benefit, Z = 2.51, p < .013, d = 0.66 (participants with low self-esteem: Z = −3.21, p < .002, d = 0.72).
  • Replication effect size: Flynn and Bordieri 2020: Study 1: not reported [g = 0.53 calculated]; Study 2: not reported [g = 1.00 calculated].
</div>

Cognitive Psychology

  • Ego depletion. Self-control is a limited resource that can be depleted by efforts to inhibit a thought, emotion or behaviour.

  • Status: not replicated
  • Original paper: ‘Ego Depletion: Is the Active Self a Limited Resource?’, Baumeister et al. 1998; n = 67 [citations = 7141 (GS, September 2022)].
  • Critiques: Xu et al. 2014, 4 conceptual replications with high power to detect medium-to-large effects [citations = 7136 (GS, September 2022)]. Hagger 2016, 23 independent conceptual replications [citations = 1027 (GS, September 2022)]. Vohs et al. 2021, multisite project, n = 3,531 [citations = 63 (GS, September 2022)].
  • Original effect size: not reported (calculated d = -1.96 between control and worst condition).
  • Replication effect size: Xu et al. 2014: hand-grip persistence, community adults d = −0.30, young adults d = −0.002, combined difference d = −0.20; Stroop interference, community adults d = −0.15, young adults d = 0.21, combined difference d = −0.06. Hagger 2016: d = 0.04 [−0.07, 0.14] (NB: not testing the construct the same way). Vohs et al. 2021: d = 0.06.
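Several original effect sizes in this list are marked "calculated" (e.g. d = −1.96 between the control and the worst condition above). The exact procedure is not stated. The sketch below assumes a between-subjects comparison standardized by the pooled standard deviation. It also applies the small-sample correction that turns d into Hedges' g, the metric used in several of the meta-analyses in this list. The function names are hypothetical. Within-subject contrasts, such as actual versus estimated performance in the Dunning-Kruger entry, would need a different standardizer.

```python
import math


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d times the approximate correction J = 1 - 3 / (4 * df - 1)."""
    df = n1 + n2 - 2
    return cohens_d(mean1, sd1, n1, mean2, sd2, n2) * (1 - 3 / (4 * df - 1))
```

With hypothetical groups (mean 10 vs. 8, SD 2 in both, n = 10 each), cohens_d returns 1.0 and hedges_g about 0.96. The correction matters mainly for small samples like those in many original studies listed here.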
</div>
  • Dunning-Kruger effect. A cognitive bias whereby people with limited knowledge or competence in a given intellectual or social domain greatly overestimate their own knowledge or competence in that domain relative to objective criteria or to the performance of their peers or of people in general.

  • Status: replicated
  • Original paper: ‘Unskilled and unaware of it: how difficulties in recognizing one’s own incompetence lead to inflated self-assessments’, Dunning & Kruger 1999. This contains claims (1), (2), and (5) but no hint of (3) or (4) [n=334 undergrads, citations = 8376 (GS, September, 2022)].
  • Critiques: Gignac 2020, [n=929,citations = 53 (GS, September, 2022)]; Nuhfer 2016 and Nuhfer 2017, [n=1154, citations = 34 (GS, September, 2022)]; Luu 2015; Greenberg 2018, n=534; Yarkoni 2010, Jansen 2021 [2 studies, n=2000 each study, citations= 26 (GS, October2022)], Muller 2020 [n= 56, citations= 20 (GS, October 2022)]
  • Original effect size: not reported. Study 1 on humor (n= 15): difference between the actual and estimated performance of “incompetent” (bottom quartile) participants d= 2.58 [calculated], while for “competent” (top quartile) participants d= -0.55 [calculated]. Study 2 on logical reasoning ( n= 45): difference between the actual and estimated performance of “incompetent” (bottom quartile) participants d= 5.44 (percieved logical reasoning ability) [calculated], d= 3.48 (test performance) [calculated], while for “competent” (top quartile) participants d= -1.12 [calculated], d= -0.79 (percieved test performance) [calculated]. Study 3 on grammar (n= 84): difference between the actual and estimated performance of “incompetent” (percieved bottom quartile) participants d= 3.42 (percieved ability) [calculated], d= 3.94 (percieved test performance) [calculated], while for “competent” (top quartile) participants d= -1.18 (percieved ability) [calculated], d= -1.27 (perceived test performance) [calculated].
  • Replication effect size: Gignac 2020 (for IQ): with the same statistical analysis as Dunning & Kruger 1999, η2 = 0.20, but two less-confounded tests gave r = −0.05/d = -0.1 [calculated] between objectively measured performance and the size of self-assessment errors, and r = 0.02/d = 0.04 [calculated] for a quadratic relationship between self-assessed performance and actual performance. Jansen 2021 (for grammar and logical reasoning): not reported (Bayesian models support the existence of the effect in the data and replicate claim 1). Muller 2020 (for recognition memory): the difference between the actual and estimated performance of “incompetent” (bottom quartile) participants d = 4.73 [calculated], while for “competent” (top quartile) participants d = -0.88 [calculated].
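The [calculated] d values for Gignac 2020 are consistent with the standard conversion from a correlation to a standardized mean difference, d = 2r / √(1 − r²). A minimal sketch, assuming that conversion (the entry does not say which formula was actually used):

```python
import math


def r_to_d(r: float) -> float:
    """Convert a Pearson correlation r to Cohen's d via d = 2r / sqrt(1 - r^2)."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return 2 * r / math.sqrt(1 - r ** 2)


# Correlations listed for Gignac 2020 in the entry above
for r in (-0.05, 0.02):
    print(f"r = {r:+.2f} -> d = {r_to_d(r):+.2f}")
```

This prints d = −0.10 and d = +0.04, in line with the rounded values above.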
</div>
  • Depressive realism effect. Increased predictive accuracy or decreased cognitive bias among the clinically depressed.

  • Status: reversed
  • Original paper: ‘Judgment of contingency in depressed and nondepressed students: sadder but wiser?’, Alloy and Abramson (1979): 4 experiments with Study 1: n1 = 48, n2 = 48; Study 2: n1 = 32, n2 = 32; Study 3: n1 = 32, n2 = 32; Study 4: n1 = 32, n2 = 32 [citations = 2855 (GS, June 2022)].
  • Critiques: Moore and Fresco 2012 [meta-analysis, n = 7305, citations = 311 (GS, June 2022)].
  • Original effect size: not reported. [d= -0.32 calculated for bias about ‘contingency’, how much the outcome actually depends on what you do]
  • Replication effect size: Moore and Fresco 2012: d = -0.07.
</div>
  • Hungry judge effect: favourable parole rulings drop sharply just before judges' meal breaks and recover afterwards. Criticisms: case order isn’t independent of the probability of a favourable ruling (“unrepresented prisoners usually go last and are less likely to be granted parole”); favourable cases may take predictably longer and so are pushed until after the break; the effect size is implausible on priors; and the proposed explanation invoked ego depletion.

  • Status: NA
  • Original paper: ‘Extraneous factors in judicial decisions’, Danziger, Levav & Avnaim-Pesso, 2011 [n = 8 judges, 1122 judicial rulings, citations = 1626 (GS, October 2022)].
  • Critiques: Weinshall-Margel 2011 [n= 227 decisions, citations= 79 (GS, October, 2022)], Glöckner 2016, Lakens 2017.
  • Original effect size: d= 1.96, “the probability of a favorable ruling steadily declines from ≈0.65 to [0.05] and jumps back up to ≈0.65 after a break for a meal”, n=8 judges with n=1122 cases.
  • Replication effect size: N/A.
</div>
  • “Far transfer”, transfer of knowledge and skills from daily computer training games to fluid intelligence in general, in particular from the Dual n-Back game.

  • Status: mixed
  • Original paper: ‘Improving fluid intelligence with training on working memory’, Jaeggi 2008 [n=70, citations= 2840 (GS, October 2022)].
  • Critiques: Melby-Lervåg 2013 [meta-analysis of 23 studies, citations = 2156 (GS, October 2022)], Gwern 2012 [meta-analysis of 45 studies], Reddick 2013 [n = 73, citations = 824 (GS, October 2022)], Lampit 2014 [meta-analysis of 52 studies, n = 4885, citations = 809 (GS, October 2022)], Berger 2020 [n = 572, citations = 22 (GS, October 2022)], Simons 2016 [comprehensive review of literature, citations = 1015 (GS, October 2022)].
  • Original effect size: d= 0.4 over control, 1-2 days after training
  • Replication effect size: Melby-Lervåg 2013: d = 0.19 [0.03, 0.37] nonverbal; d = 0.13 [-0.09, 0.34] verbal. Gwern 2012: d = 0.1397 [-0.0292, 0.3085], among studies using active controls. Reddick 2013: found “no positive transfer to any of the cognitive ability tests”, all η2p < 0.054. Lampit 2014 (meta-analysis of studies in elderly adults): g = 0.24 (95% CI 0.09 to 0.38) nonverbal memory; g = 0.08 (95% CI 0.01 to 0.15) verbal memory; g = 0.22 (95% CI 0.09 to 0.35) working memory; g = 0.31 (95% CI 0.11 to 0.50) processing speed; g = 0.30 (95% CI 0.07 to 0.54) visuospatial skills. Berger 2020 (RCT in 6- to 7-year-olds): d = 0.2 to 0.4, but many of the apparent far-transfer effects appear only 6-12 months later, i.e., well past the end of most prior studies.
</div>
  • Music lessons improve intelligence. An original experimental study found an increase in IQ for children who received a year of music lessons, compared to children who were randomly assigned to drama lessons or no lessons.

  • Status: not replicated
  • Original paper: ‘Music lessons enhance IQ’, [Schellenberg, 2004](https://doi.org/10.1111/j.0956-7976.2004.00711.x); randomised controlled trial, n = 144 [citations = 1424 (GS, December 2021)].
  • Critiques: Mehr et al., 2013 [Study 1 n=29, Study 2 n=55, citations=52 (GS, December 2021)]. D’Souza & Wiseheart, 2018 [n=75, citations=20 (GS, December 2021)].
  • Original effect size: d= 1.948.
  • Replication effect size: Mehr et al., 2013: Wilks' λ = .851/η2p = 0.077 [calculated]. D’Souza & Wiseheart, 2018: for task switching, Bayes factor (BF) inclusion = 1.964 (weak evidence); for processing speed, BF inclusion = 0.757 (box completion task), 0.243 (symbol copy task), 0.213 (symbol coding task) (weak evidence); for working memory, BF inclusion = 0.216 (digit span forward task), 0.138 (digit span backward task), 0.004 (self-ordered pointing task) (weak evidence); for interference control, BF inclusion = 0.137 (flanker task), 0.007 (Stroop task) (weak evidence); for nonverbal intelligence, BF inclusion = 0.778 (Peabody Picture Vocabulary Test) (weak evidence).
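The η2p reported for Mehr et al. (2013) is marked [calculated] from Wilks' λ = .851. One common conversion is the multivariate partial eta squared, η2p = 1 − λ^(1/s), where s = min(number of dependent variables, effect degrees of freedom). The value of s is not given in the entry, so the sketch below treats it as an assumption and tries both s = 1 and s = 2:

```python
def wilks_to_partial_eta_sq(wilks_lambda: float, s: int) -> float:
    """Multivariate partial eta squared, 1 - lambda ** (1 / s).

    s is min(number of dependent variables, effect degrees of freedom).
    """
    if not 0 < wilks_lambda <= 1:
        raise ValueError("Wilks' lambda must lie in (0, 1]")
    if s < 1:
        raise ValueError("s must be a positive integer")
    return 1 - wilks_lambda ** (1 / s)


# Mehr et al. (2013) lambda from the entry; s is an assumption, not reported here
for s in (1, 2):
    print(s, round(wilks_to_partial_eta_sq(0.851, s), 4))
```

With s = 1 the result is 0.149; with s = 2 it is about 0.0775, close to the reported 0.077, which suggests (but does not confirm) that the calculation used s = 2.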
</div>
  • Bilingual advantages in executive control. The popular hypothesis was that speaking two languages also improves general cognitive control processes (executive control). However, this was challenged by a growing body of systematic studies, which showed no bilingual advantage across different executive control tasks, or even a small bilingual disadvantage. The lack of an effect was even found in exact replications of the original tasks, especially as sample size increases (Paap et al., 2015), and after accounting for the main moderators proposed by the bilingualism literature (Gunnerud et al., 2020).

</div>
  • Mozart effect. Listening to Mozart’s sonata for two pianos in D major (KV 448) enhances performance on spatial tasks in standardized tests.

  • Status: not replicated
  • Original paper: ‘Music and spatial task performance’, Rauscher, Shaw, and Ky (1993) with n=36. [citations= 2110 (GS, November 2021)].
  • Critiques: Steele et al. (1999a) [n = 86, citations = 555 (GS, November 2021)]; Steele et al. (1999) [n = 206, citations = 126 (GS, November 2021)]; meta-analysis: Pietschnig et al. (2010) [39 studies, citations = 235 (GS, November 2021)].
  • The effect sizes below are taken from Pietschnig et al. (2010) [d, 95% CI]:
  • Original effect size: d = 1.5 [0.65, 2.35]
  • Replication effect size:
    • Adlmann (2006): d = 0.57 [0.25, 0.89]
    • Carstens (1998) Study 1: d = -0.22 [-0.89, 0.45]
    • Carstens (1998) Study 2: d = 0.47 [-0.23, 1.17]
    • Cooper (2004): d = 0.42 [-0.23, 1.08]
    • Flohr (1995) Study 1: d = 0.14 [-0.35, 0.63]
    • Flohr (1995) Study 2: d = 0.16 [-0.26, 0.58]
    • Gileta (2003) Study 1: d = 0.13 [-0.26, 0.51]
    • Gileta (2003) Study 2: d = -0.05 [-0.43, 0.34]
    • Ivanov (2003): d = 0.77 [0.20, 1.34]
    • Jones (2006): d = 0.92 [0.27, 1.56]
    • Jones (2007): d = 0.54 [0.11, 0.97]
    • Kenealy (1994): d = -0.22 [-1.08, 0.64]
    • Knell (2006): d = 0.45 [0.13, 0.77]
    • Lints (2003): d = -0.37 [-0.75, 0.02]
    • McClure (2004): d = 0.46 [-0.02, 0.95]
    • Nantais (1999) Study 1: d = 0.77 [-0.07, 1.61]
    • Nantais (1999) Study 2: d = 0.06 [-0.72, 0.84]
    • Rauscher and Hayes (1999): d = 0.52 [0.18, 0.86]
    • Rauscher and Ribar (1999) Study 1: d = 1.81 [1.24, 2.37]
    • Rauscher and Ribar (1999) Study 2: d = 0.93 [0.46, 1.39]
    • Rideout (1996): d = 1.54 [-0.67, 3.75]
    • Rideout (1997): d = 1.01 [0.19, 1.82]
    • Rideout (1998a): d = 1.01 [-0.21, 2.23]
    • Rideout (1998b): d = 0.28 [-1.04, 1.60]
    • Siegel (1999): d = 0.26 [-0.39, 0.91]
    • Spitzer (2003): d = 0.01 [-0.32, 0.33]
    • Steele et al.: d = 0.85 [0.41, 1.30]
    • Steele, Dalla Bella, et al. (1999a) Study 1: d = 0.49 [-0.01, 1.00]
    • Steele, Dalla Bella, et al. (1999a) Study 2: d = -0.41 [-1.15, 0.33]
    • Steele, Dalla Bella, et al. (1999b): d = 0.85 [0.41, 1.30]
    • Steele, Brown and Stoecker (1999): d = 0.20 [-0.08, 0.48]
    • Sweeny (2006) Study 1: d = -0.43 [-0.93, 0.07]
    • Sweeny (2006) Study 2: d = -0.06 [-0.56, 0.42]
    • Sweeny (2006) Study 3: d = 0.14 [-0.37, 0.65]
    • Twomey (2002): d = 0.63 [-0.01, 1.27]
    • Wells (1995): d = -0.18 [-0.83, 0.47]
    • Wilson (1997): d = 0.85 [-0.44, 2.13]
    • Pietschnig et al.: meta-analytic estimate d = 0.37 [0.23, 0.52]
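To show how a pooled estimate relates to per-study values like those listed above, here is a minimal fixed-effect, inverse-variance sketch. It recovers each study's standard error from its 95% CI as (upper − lower) / (2 × 1.96), which assumes symmetric normal-theory intervals. Pietschnig et al.'s d = 0.37 comes from their own meta-analytic model, which this sketch does not attempt to reproduce; the three studies used are an arbitrary illustrative subset:

```python
import math


def pooled_fixed_effect(studies):
    """Fixed-effect inverse-variance pooled d from (d, ci_low, ci_high) tuples."""
    weights, weighted_ds = [], []
    for d, lo, hi in studies:
        se = (hi - lo) / (2 * 1.96)  # SE recovered from a symmetric 95% CI
        w = 1 / se ** 2              # inverse-variance weight
        weights.append(w)
        weighted_ds.append(w * d)
    pooled = sum(weighted_ds) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)


# Adlmann (2006), Knell (2006), Spitzer (2003) from the list above (illustration only)
subset = [(0.57, 0.25, 0.89), (0.45, 0.13, 0.77), (0.01, -0.32, 0.33)]
d, ci = pooled_fixed_effect(subset)
print(f"pooled d = {d:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
```

A random-effects model would add a between-study variance term to each weight, which widens the interval when studies disagree.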
</div>
  • Congruence Sequence effect

</div>
  • Action-sentence Compatibility Effect (ACE). Participants’ movements are faster when the direction of the described action (e.g., Mark dealt the cards to you) matches the response direction (e.g., toward).

  • Status: not replicated
  • Original paper: “Grounding language in action”, Glenberg & Kaschak (2002); Experiment 1: n= 44, Experiment 2A: n= 70, Experiment 2B: n= 72 [citations= 2870 (GS, October, 2022)].
  • Critiques: Morey et al. (2022) [pre-registered multi-lab replication, 18 labs, n= 1278, citations= 30 (GS, October 2022)].
  • Original effect size: Experiment 1: η2p= 0.186 [calculated]. Experiment 2A: η2p = 0.051 [calculated].
  • Replication effect size: Morey et al. (2022): for native English speakers d= 0.0036; for non-native English speakers d = -0.019.
</div>
  • The attentional spatial-numerical association of response codes (Att-SNARC) effect is the finding that participants detected left-side targets faster when they were preceded by small numbers, and right-side targets faster when they were preceded by large numbers. This finding prompted many assumptions that number representations are grounded in bodily experience.

</div>
  • Scarcity effect - Attention. Having too few resources leads individuals to misallocate attention, with consequences such as overborrowing. Study 1 examined whether scarcity causes greater cognitive fatigue, measured by poorer performance on a cognitive ability task.

  • Status: mixed
  • Original paper: ‘Some consequences of having too little’, Shah et al. (2012); 5 experiments with Study 1: n=60; Study 2: n=68; Study 3: n=143; Study 4: n=118; Study 5: n=137 [citations=1403 (GS, April 2022)].
  • Critiques: Camerer et al. (2018) [n=619, citations=855(GS, November 2021)]; O’Donnell et al. (2021) [n=668, citations=0(GS, November 2021)]; Shah et al. (2019) [n=997, citations=19(GS, November 2021)]
  • Original effect size: r = .267
  • Replication effect size: Camerer et al.: r = -.015; O’Donnell et al.: r= -.039; Shah et al.: η2 = .004.
</div>
  • Scarcity effect - Meaning in life. Threats to people’s sense that they can afford what they need in the present and foreseeable future undermine perceptions of meaning in life.

</div>
  • Scarcity effect - Discounting. A negative income shock was associated with increased discounting rates for gains and losses.

</div>
  • Scarcity effect - Physical pain. Higher economic insecurity is associated with greater physical pain.

</div>
  • Scarcity effect - Self-expansion. Lower self-concept clarity (conceptualized as a finite resource) is associated with lower self-expansion.

</div>
  • Scarcity effect - Wellbeing. Imagining having less time available in one’s current city is positively associated with well-being.

</div>
  • Scarcity effect - Decision making. Lacking time or money can lead to making worse decisions.

</div>
  • Scarcity effect - Opportunity costs. Poor people are more likely to consider opportunity costs spontaneously.

</div>
  • Scarcity effect - Conscious thoughts. Thoughts triggered by financial concerns intrude more often into the consciousness of poorer individuals than into that of wealthier individuals.

</div>
  • Scarcity effect - Absoluteness of losses. Poorer individuals view losses in more absolute, rather than relative, terms than do wealthier individuals.

  • Status: not replicated
  • Original paper: ‘Scarcity frames value’, Shah et al. (2015), n = 73 [citations = 315 (GS, November 2021)].
  • Critiques: O’Donnell et al. 2021 [n=209, citations=0(GS, November 2021)]
  • Original effect size: r = .264
  • Replication effect size: r = .090
</div>
  • Bottomless soup bowl. Visual cues related to portion size increase intake volume of soup.

</div>
  • Simon effect. Faster responses are observed when the stimulus and response are on the same side than when the stimulus and response are on opposite sides.

  • Status: mixed
  • Original paper: ‘Choice reaction time as a function of angular stimulus-response correspondence and age’, Simon and Wolf 1963; n1 = 20, n2 = 20 [citations = 289 (GS, June 2022)].
  • Critiques: Ehrenstein 1994 [n1 = 12, n2 = 14; citations = 27 (GS, June 2022)]; Marble & Proctor 2000 [n1 = 48, n2 = 20, n3 = 32, n4 = 80; citations = 89 (GS, June 2022)]; Proctor et al. 2000 [n1 = 64, n2 = 64; citations = 74 (GS, June 2022)]; Theeuwes et al. 2014 [n1 = 30, n2 = 30, n3 = 30, n4 = 30; citations = 30 (GS, June 2022)].
  • Original effect size: not reported but could be calculated.
  • Replication effect size: Ehrenstein: not reported but could be calculated; Marble and Proctor: not reported but could be calculated; Proctor et al.: not reported but could be calculated; Theeuwes et al.: ηp² = .12 (compatible S-R instructions condition vs. incompatible S-R instructions condition); ηp² = .07 (compatible S-R instructions condition vs. incompatible practiced S-R instructions condition); ηp² = .21 (incompatible S-R instructions condition vs. compatible S-R instructions condition); ηp² = .11 (incompatible practiced S-R instructions condition vs. compatible S-R instructions condition).
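Several entries above note that the Simon effect was "not reported but could be calculated". In its simplest form it is a reaction-time difference: mean RT on non-corresponding trials (stimulus and response on opposite sides) minus mean RT on corresponding trials. A minimal sketch with hypothetical trial data; the tuple layout and all values are invented for illustration:

```python
from statistics import mean

# Hypothetical trials: (stimulus_side, response_side, reaction_time_ms)
trials = [
    ("left", "left", 412), ("right", "right", 398),
    ("left", "right", 451), ("right", "left", 447),
    ("left", "left", 405), ("right", "left", 439),
]


def simon_effect(trials):
    """Mean RT on non-corresponding trials minus mean RT on corresponding trials."""
    corresponding = [rt for stim, resp, rt in trials if stim == resp]
    non_corresponding = [rt for stim, resp, rt in trials if stim != resp]
    return mean(non_corresponding) - mean(corresponding)


print(f"Simon effect: {simon_effect(trials):.1f} ms")
```

A positive value means responses were slower when stimulus and response were on opposite sides, which is the direction the effect predicts.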
</div>
  • ERPs in lie detection. In the literature using Guilty Knowledge Tests, the P300 ERP component in particular has been linked to the conscious recognition of crime-related targets as meaningful and salient stimuli, based on crime-related episodic memories.

  • Status: mixed
  • Original paper: ‘Late Vertex Positivity in Event-Related Potentials as a Guilty Knowledge Indicator: A New Method of Lie Detection’, Rosenfeld et al. (1987), n1 = 10, n2 = 6 [citations = 126 (GS, May 2022)].
  • Critiques: Abootalebi et al., 2006 [n = 62, citations = 159 (GS, May 2022)]; Bergström et al., 2013 [n1 = 24, n2 = 24; citations = 61 (GS, May 2022)]; Mertens & Allen, 2008 [n = 79, citations = 187 (GS, May 2022)]; Rosenfeld et al., 2004 [Experiment 1: n = 33; Experiment 2: n = 12 and n = 10; citations = 419 (GS, May 2022)]; Wang et al., 2016 [n = 28, citations = 61 (GS, May 2022)].
  • Original effect size: N/A
  • Replication effect size: Abootalebi et al.: not reported but could be calculated; Bergström et al.: d = 2.89 (effort in uncooperative recall suppression); d = 2.28 (success in uncooperative recall suppression); partial η2 = 0.20 (experiment 1 - voluntary modulations of P300); partial η2 = 0.31 (experiment 2 - voluntary modulations of P300); d = 0.48 and d = 0.31 (experiment 1 - cooperative phase); d = 0.03 (experiment 1 - uncooperative phase); d = 0.14 (experiment 1 - innocent phase); d = 0.77 (experiment 1: targets vs. probes - innocent phase); d = 0.71 (experiment 1: targets vs. probes - uncooperative phase); d = 1.03 and d = 0.48 (experiment 2 - cooperative phase); d = 0.48 and d = 0.99 (experiment 2 - uncooperative phase); d = 1.81 (experiment 2 - innocent phase); d = 0.50 (experiment 1: cooperative vs. uncooperative); d = 0.52 (experiment 2: cooperative vs. uncooperative); d = 0.07 (experiment 1: uncooperative vs. innocent); d = 0.57 (experiment 2: uncooperative vs. innocent); d < 0.17 (targets vs. irrelevants, experiments 1 and 2); Mertens and Allen: not reported but could be calculated; Rosenfeld et al.: not reported but could be calculated; Wang et al.: not reported but could be calculated.
</div>
  • Bilingual deficit in lexical retrieval. Compared to monolinguals, bilinguals have often been found, under some conditions, to be slower or less accurate at accessing the meaning of a given word or at retrieving the word for a given representation.

  • Status: mixed
  • Original paper: ‘Memory in a monolingual mode: When are bilinguals at a disadvantage?’, Ransdell and Fischler, 1987; between-group multi-experiment study with monolingual and bilingual young adults, n1 = 28, n2 = 28 [citations = 216 (GS, May 2022)].
  • Critiques: Bialystok et al. 2007 [Study 1: n1 = 24, n2 = 24; Study 2: n1 = 50, n2 = 16; citations = 338 (GS, May 2022)]; Gollan et al. 2002 [n1 = 30, n2 = 30; citations = 584 (GS, May 2022)]; Gollan et al. 2005 [Study 1: n1 = 31, n2 = 31; Study 2: n1 = 36, n2 = 36; citations = 665 (GS, May 2022)]; Rosselli et al. 2000 [n1 = 45, n2 = 18, n3 = 19; citations = 341 (GS, May 2022)]; Rosselli et al. 2002 [n1 = 45, n2 = 18, n3 = 19; citations = 151 (GS, May 2022)].
  • Original effect size: not reported but could be calculated.
  • Replication effect size: Bialystok et al.: not reported but could be calculated; Rosselli et al. 2000: not reported but could be calculated; Rosselli et al. 2002: not reported but could be calculated; Gollan et al. 2002: not reported but could be calculated; Gollan et al. 2005: not reported but could be calculated.
</div>
  • Mere exposure effect. The finding that participants who are repeatedly exposed to the same stimuli rate them more positively than stimuli that have not been presented before.

  • Status: replicated
  • Original paper: Zajonc, R. B. (1968). Attitudinal effects of mere exposure. Journal of Personality and Social Psychology, 9(2), 1–27 [citations = 9458 (GS, February 2022)].
  • Critiques: Bornstein (1989) [meta-analysis, total N = 33047, citations = 2944 (GS, February 2022)].
  • Original effect size: Experiment 1, Nonsense words [F(5,355) = 5.64, p < .001], Experiment 2, Chinese characters [F(5, 335) = 4.72, p < .001], Experiment 3, Photographs [F(5, 355) = 9.96, p < .001]
  • Replication effect size: Combined effect size r = .260 (Bornstein, 1989)
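Zajonc (1968) reports F tests rather than effect sizes. Partial eta squared can be recovered from each F and its degrees of freedom with the standard identity η2p = F·df_effect / (F·df_effect + df_error). The values this produces are our own calculation, not figures reported in the original paper, and they are not on the same scale as Bornstein's r:

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# F tests listed for Zajonc (1968) in the entry above: (F, df_effect, df_error)
reported = {
    "Exp. 1, nonsense words": (5.64, 5, 355),
    "Exp. 2, Chinese characters": (4.72, 5, 335),
    "Exp. 3, photographs": (9.96, 5, 355),
}
for label, (f, df1, df2) in reported.items():
    print(f"{label}: partial eta^2 = {partial_eta_squared(f, df1, df2):.3f}")
```

This gives roughly .074, .066, and .123 for the three experiments.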
</div>
  • Cocktail party effect. The finding that approximately one third of participants hear their own name when it is presented in the irrelevant message during a dichotic listening task. The impression is sometimes given that all participants demonstrate the effect, which is why Conway, Cowan & Bunting (2001) point out: “Contrary to popular belief, not all subjects demonstrate this cocktail party effect.” Indeed, both in the original study and in the replications, fewer than half of the participants reported hearing their own name (29-43 percent).

  • Status: replicated
  • Original paper: Moray, N. (1959). Attention in dichotic listening: Affective cues and the influence of instructions. The Quarterly Journal of Experimental Psychology, 11, 56-60 [citations = 1972 (GS, February 2022)].
  • Critiques: Wood & Cowan (1995) Replication [citation=467 (GS, February 2022)]; Conway et al. (2001) Replication [citation=1195 (GS, February 2022)]; Röer & Cowan (2021) Preregistered Replication [citation=3 (GS, February 2022)]
  • Original effect size: No effect size is given, only the detection rate: 33 percent
  • Replication effect size: Wood & Cowan (1995) 35 percent, Conway et al. (2001) 43 percent, Röer & Cowan (2021) 29 percent.
</div>

Developmental Psychology

  • Growth mindset (thinking that skill is improvable) on attainment.

</div>
  • Neonate imitation. Babies are born with the ability to imitate.

  • Status: NA
  • Original paper: ‘Imitation of facial and manual gestures by human neonates’, Meltzoff and Moore, 1977; 2 studies: Study 1: n = 6, Study 2: n = 12 [citations = 5311 (GS, December 2021)].
  • Critiques: Oostenbroek et al., 2016 [n=106, citations=259 (GS, December 2021)].
  • Original effect size: Not reported.
  • Replication effect size: NA.
</div>
  • Violent media content and aggression. Violent content in media can make people more aggressive. Notably, studies of this effect differ by medium (TV, games, etc.) and by whether long-, medium-, or short-term effects were investigated. The variety of methods and tests further complicates the literature. Distinct media types are marked for each reference below.

</div>

Differential psychology

  • 2D:4D ratio of the fingers and its correlation with psychological traits. This ratio has been used as a predictor of various individual differences (e.g., intelligence) and especially of gender differences.

</div>
  • Personality traits and consequential life outcomes. Personality traits (i.e., characteristic patterns of thinking, feeling, or behaving that tend to be consistent over time and across relevant situations), particularly the Big Five factors, are linked with consequential individual, interpersonal, and social-institutional outcomes.

</div>

Judgment and Decision Making

  • Nudges. Choice architecture interventions that promote beneficial decisions.

  • Status: mixed
  • Original paper: ‘Nudge: Improving Decisions about Health, Wealth, and Happiness’, Thaler & Sunstein, 2008, book [citations = 23376 (Google Scholar, October 2022)].
  • Critiques: Mertens et al. (2021) [citations = 55 (Google Scholar, October 2022)] conducted a meta-analysis on nudges and found a medium effect size across all types of nudges. They conducted several publication bias tests; the most severe indicated a very small but significant effect size. Maier et al. (2022) [citations = 15 (Google Scholar, October 2022)] re-analysed the data reviewed by Mertens et al. (2021) and found no nudging effect after adjusting for publication bias.
  • Original effect size: No effect sizes were provided in the original book.
  • Replication effect size: Mertens et al. 2021: d = 0.37 to d = 0.46. Maier et al., 2022: d = 0.00 to d = 0.14.
</div>
  • Risky choice framing effect (term used by Levin et al., 1998; also called the framing effect in risky decision making). Under a loss frame, people are risk-seeking, whereas under a gain frame, people are risk-averse. In framing studies, logically equivalent choice situations are described differently and the resulting preferences are studied (Kühberger, 1998). In risky choice problems, the way a choice is presented influences the decision (e.g., saving 10 people out of 100 vs. losing 90 people out of 100).

  • Status: replicated
  • Original paper: ‘The framing of decisions and the psychology of choice’, Tversky & Kahneman, 1981; experimental design, P1: 152; P2: 155; P3: 150; P4: 86; P5: 77; P6: 85; P7: 81; P8: 183; P9: 200; P10.1: 93; P10.2: 88; Total = 1350* (unclear whether these are different samples) [citations = 24617 (GS, October 2022)].
  • Critiques: Meta-analysis: Kühberger, 1998 [total studies reviewed = 136, citations = 1554 (GS, June 2022)]. The author finds that certain characteristics of framing studies are crucial to obtaining a consistent framing effect, and that the closer a methodology is to the original one, the better the chance of replicating the original effect. Large-scale replication: Klein et al., 2014 [total replication studies = 36, citations = 1082 (GS, June 2022)].
  • Original effect size: Kahneman and Tversky (1982): d = 1.13, 95% CI [0.89, 1.37] (based on Klein et al., 2014 calculation) Meta-analytical effect size (many close and conceptual replications): Steiger & Kühberger (2018): d = 0.52 to 0.56.
  • Replication effect size: Kühberger, 1998: d = .308; revised in Steiger & Kühberger, 2018 to d = .522 with only 81 of the 136 studies; Klein et al., 2014: d = .60 (95% CI 0.53-0.67); Steiger & Kühberger, 2018: d = .56.
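Framing outcomes are binary (risky vs. sure option), so d values like those above are often obtained by converting an odds ratio. One common conversion is d = ln(OR) × √3 / π (the logistic-distribution method); whether Kühberger, Klein et al., or Steiger & Kühberger used exactly this conversion is not stated here. A minimal sketch with hypothetical choice proportions:

```python
import math


def odds_ratio(p_risky_loss: float, p_risky_gain: float) -> float:
    """Odds of choosing the risky option under the loss frame relative to the gain frame."""
    odds_loss = p_risky_loss / (1 - p_risky_loss)
    odds_gain = p_risky_gain / (1 - p_risky_gain)
    return odds_loss / odds_gain


def log_odds_ratio_to_d(or_value: float) -> float:
    """Convert an odds ratio to Cohen's d via d = ln(OR) * sqrt(3) / pi."""
    return math.log(or_value) * math.sqrt(3) / math.pi


# Hypothetical proportions choosing the risky option (not taken from any cited study)
example_or = odds_ratio(p_risky_loss=0.60, p_risky_gain=0.40)
print(round(example_or, 2), round(log_odds_ratio_to_d(example_or), 3))
```

An odds ratio of 1 (no framing effect) maps to d = 0, and the sign of d follows the direction of the odds ratio.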
</div>
  • Risk and Goal Message Framing. a) For illness detection behaviors, loss framing (presenting information about the negative consequences of engaging in undesirable behaviors or of not engaging in desirable ones) would be more effective than gain framing (presenting information about the benefits of engaging in desirable behaviors) in encouraging healthy attitudes, intentions, and behaviors (perhaps because illness detection behaviors are riskier, Rothman & Salovey, 1997), whereas b) for health-affirming behaviors, gain framing would be more effective than loss framing in motivating healthy attitudes, intentions, and behaviors (perhaps because health-affirming behaviors are less risky, Rothman & Salovey, 1997).

  • Status: Mixed, depending on operationalizations, DVs, and method (meta-analysis vs empirical study). The conceptual replication failed to provide support for the interaction, but this may be due to limited power.
  • Original paper: Rothman et al. (1999), between-subject design, sample size: 120 (Study 2) [citations = 548 (GS, October 2022)].
  • Critiques: van Riet et al. (2016) criticized the reasoning behind applying Kahneman and Tversky's (1981) Prospect Theory (which is more suitable for risky choice framing) to goal message framing. Van Riet et al. (2016) also reviewed direct empirical and meta-analytical evidence, and the evidence for the risk-framing hypothesis in message framing appears inconclusive.
  • Original effect size: Rothman et al. (1999): partial eta squared = 0.03, 90% CI [0.00, 0.10], to partial eta squared = 0.06, 90% CI [0.01, 0.14].
  • Replication effect size: Cox et al. (2006): partial eta squared = 0.03, 90% CI [0.00, 0.12], non-significant, but possibly due to limited power.
</div>
  • Psychophysical numbing. People prefer to save lives when those lives make up a higher proportion of the total (e.g., people prefer saving 4,500 lives out of 11,000 to saving 4,500 lives out of 250,000).

  • Status: mixed (Study 2 was successfully replicated but Study 1 was a replication failure)
  • Original paper: ‘Insensitivity to the value of human life: A study of psychophysical numbing’, Fetherstonhaugh et al. 1997; 3 studies, 2 of which are split into Part A and Part B, with n’s: Study 1: 54 and 196; Study 2: 162; Study 3: 165 [citations = 468 (GS, December 2021)].
  • Critique: Ziano et al. 2021 [n=4799, citations = 0 (GS, December 2021)]
  • Original Study 1 effect size: η2p= 0.14
  • Replication effect size: Study 1a: η2p= 0.06, Study 1b: η2p= 0.21; Study 1c: η2p= 0.13, all were reversals.
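The example in this entry turns on proportions rather than absolute numbers: the same 4,500 lives are a large share of 11,000 but a small share of 250,000. A quick check of that arithmetic (the helper name is ours, for illustration):

```python
def proportion_saved(saved: int, at_risk: int) -> float:
    """Share of the at-risk population that is saved."""
    return saved / at_risk


for at_risk in (11_000, 250_000):
    print(f"4,500 of {at_risk:,}: {proportion_saved(4_500, at_risk):.1%}")
```

This prints 40.9% and 1.8%: the absolute benefit is identical, yet the proportion differs by more than a factor of 20.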
</div>
  • Loss aversion. The subjective value of losses exceeds the subjective values of gains.

  • Original paper:
  • Critiques:
    Meta-analyses: Nieuwenstein et al., 2020 [total n = 399]; Walasek et al. 2018 [19 studies, citations=11, Dec 2021], Brown et al. 2021 [607 estimates from 150 articles, citations=10, Dec 2021]
  • Original effect size:
  • Replication effect size:
    Walasek et al. 2018: λ = 1.31, 95% CI [1.10, 1.53]
    Brown et al. 2021: λ = 1.955, 95% credible interval [1.824, 2.104]
  • Loss aversion is still mostly replicable but with weaker effects for some people and in some situations (see Mrkva et al., 2020).
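The λ estimates above are loss-aversion coefficients: in a prospect-theory value function, a loss of a given size is weighted λ times as heavily as an equal gain. A minimal sketch, assuming the power-function form with the same curvature α for gains and losses; α = 0.88 is Tversky and Kahneman's (1992) estimate and is used here only as an illustrative assumption:

```python
def prospect_value(x: float, lam: float, alpha: float = 0.88) -> float:
    """Prospect-theory value: x**alpha for gains, -lam * (-x)**alpha for losses.

    alpha = 0.88 is an assumed curvature, not a value reported in this entry.
    """
    return x ** alpha if x >= 0 else -lam * (-x) ** alpha


for lam in (1.31, 1.955):  # Walasek et al. 2018; Brown et al. 2021
    gain, loss = prospect_value(100, lam), prospect_value(-100, lam)
    print(f"lambda = {lam}: |v(-100)| / v(100) = {abs(loss) / gain:.3f}")
```

Because gains and losses share the same α here, the ratio of a loss to an equal gain is exactly λ; with different curvatures for gains and losses, the ratio would vary with the stake size.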
</div>
  • Unconscious Thought Advantage, or “deliberation-without-attention”: the idea that for complex choices (with more features to take into account), not deliberating leads to better decisions (as defined by the research team, i.e., normatively).

  • Status: not replicated
  • Original paper: ‘On making the right choice’, Dijkterhuis et al., 2006; two experiments and two quasi-experiments, n = [80, 59, 93, 115] [citations = 605, Web of Knowledge, 10/2021].
  • Critiques: Nieuwenstein & van Rijn (2012) [n = [48, 24, 32, 24], citations = 12, Web of Knowledge, 10/2021]; Nieuwenstein et al. (2015) [meta-analysis, 61 studies, n = [40-399]; replication study, n = 423; citations = 49, Web of Knowledge, 10/2021]; see also González-Vallejo et al. (2008) for a theoretical critique [citations = 51, Web of Knowledge, 10/2021]
  • Original effect size: g = [.86, .70] [as per Nieuwenstein et al., 2015].
  • Replication effect size: Nieuwenstein & van Rijn: g = [0.10, -0.55, 0.87, -0.74]; Nieuwenstein et al.: g = -0.01; after trim-and-fill, the meta-analytic pooled Hedges’ g = 0.018, CI = [−0.10, 0.14].
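Effect sizes in this entry are Hedges' g, which is Cohen's d multiplied by a small-sample correction factor J. A minimal sketch using the common approximation J = 1 − 3 / (4(n1 + n2) − 9); the inputs are hypothetical, and the exact variant used by Nieuwenstein et al. is not stated here:

```python
def hedges_g(d: float, n1: int, n2: int) -> float:
    """Small-sample corrected standardized mean difference (Hedges' g)."""
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # correction factor; approaches 1 as n grows
    return j * d


# Hypothetical example: d = 0.50 with 12 participants per condition
print(round(hedges_g(0.50, 12, 12), 3))
```

With 24 participants the correction shrinks d = 0.50 to g ≈ 0.483; with large samples g and d are nearly identical.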
</div>
  • Self-interest is Overestimated: how much do personal benefits affect policy preferences and behaviors?

</div>
  • Above- and below-average effect. Above- and below-average effects arise when people compare themselves to others: they rate themselves as above average for easy abilities and below average for difficult abilities. All standardized beta-values from multiple regressions predicting the judgmental weight of own and others’ abilities from mean comparative ability estimates in different ability domains were consistent with the original. Additionally, different statistical tests resulted in slightly smaller effects in the same direction as the original.

  • Status: replicated
  • Original paper: Kruger (1999); 3 experiments with n=37. [citations = 1190 (GS, February 2022)]
  • Critiques: Korbmacher, Kwan & Feldman (2022) [citations = 0 (GS, February 2022)]. Review: Sundström (2008) [citations = 138 (GS, February 2022)]. Meta-analysis: Zell et al. (2020) [focuses only on the above-average effect, citations = 84 (GS, February 2022)].
  • Original effect sizes:
    • Correlations: Own ability & comparative ability r = .95, Domain difficulty and comparative ability r = -.96
    • One sample t-tests: Easy domains d = 0.90, Difficult domains d = -1.44
  • Replication effect sizes:
    Zell et al. (2020): dz = 0.78, 95% CI [0.71, 0.84]
    Korbmacher et al. (2022):
    • Correlations: Own ability & comparative ability r = .99, Domain difficulty and comparative ability r= -.85
    • One sample t-tests: Easy domains: from d = 0.54 to d = 1.18, Difficult domains: from d = 0.11 (non-sig) to d = -0.65.
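The one-sample d values above compare mean comparative-ability ratings with the point on the scale that stands for "average". A minimal sketch of that calculation, also showing the identity d = t / √n for a one-sample t test; the ratings and the reference value of 50 are hypothetical:

```python
import math
from statistics import mean, stdev


def one_sample_d(scores, reference):
    """Cohen's d for a one-sample test: (mean - reference) / sample SD."""
    return (mean(scores) - reference) / stdev(scores)


def one_sample_t(scores, reference):
    """One-sample t statistic for the same comparison."""
    return (mean(scores) - reference) / (stdev(scores) / math.sqrt(len(scores)))


# Hypothetical comparative-ability ratings; 50 is assumed to mean "average"
ratings = [55, 60, 52, 58, 65, 50]
d = one_sample_d(ratings, 50)
t = one_sample_t(ratings, 50)
print(f"d = {d:.2f}, t = {t:.2f}, t / sqrt(n) = {t / math.sqrt(len(ratings)):.2f}")
```

A positive d corresponds to an above-average effect and a negative d to a below-average effect.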
</div>
  • Accuracy of information (truth discernment). Asking people to think about the accuracy of a single headline improves the “truth discernment” of their intentions to share news headlines about COVID-19.

  • Status: mixed
  • Original paper: ‘Fighting COVID-19 Misinformation on Social Media: Experimental Evidence for a Scalable Accuracy-Nudge Intervention’, Pennycook et al. (2021); 2 studies with n’s = 853; 856 [citations = 887 (GS, March 2022)].
  • Critiques: Roozenbeek et al. (2021) [n=1583, citations=22(GS, March 2022)].
  • Original effect size: Study 1: d = 0.657, 95% confidence interval (CI) = [0.477, 0.836] on accuracy judgment; d = 0.121, 95% CI = [0.030, 0.212] on sharing intention; Study 2: control condition: d = 0.050, 95% CI = [−0.033, 0.133]; treatment condition: d = 0.142, 95% CI = [0.049, 0.235].
  • Replication effect size: Roozenbeek et al.: Study 1: F = 1.53; Study 2: treatment: d = −0.14, 95% CI = [−0.17, −0.12], control: d = −0.10, 95% CI = [−0.13, −0.078].
</div>

Marketing

  • Choice overload, the idea that giving people too many choices can lead to certain undesirable consequences such as reduced purchasing intentions, is in doubt, but most people don’t consider it to be discredited.

  • Status: Mixed (dueling meta-analyses, mix of successful and failed replications). It would probably require a systematic, multi-lab replication approach to sort this out at this point.
  • Original study: ‘When choice is demotivating’, Iyengar & Lepper, 2000. In the original field experiment with exotic jams, in which the number of flavors was manipulated (24 vs. 6), more people stopped to browse the larger selection (60% vs. 40%), but more purchased from the smaller selection (30% (31) vs. 3% (4)) (Iyengar and Lepper 2000). 3 experiments with n = 249, 193, … [citations = 4897 (GS, October 2021)].
  • Failed replications: Scheibehenne (2008) failed to directly replicate Iyengar and Lepper (2000) jam study. Greifeneder (2008) did a lab experiment with chocolates and also failed to conceptually replicate. These replication failures are not definitive because there have been many studies (too many to list) in which the effect was (conceptually) replicated in some fashion.
  • Meta-analyses:
    • Scheibehenne et al. 2010: “We found a mean effect size of virtually zero” (d=.02) [citations=1049(GS,October 2021)].
    • Chernev et al. 2010: That’s because many of the studies were designed to show instances when there is no effect. You need to split the data into “choice is good” vs. “choice is bad.”
    • Simonsohn et al. 2014: We agree with Chernev et al. (2010). When we split it up, we found that the choice is bad studies (choice overload) lack collective evidential value (uniform p-curve).
    • Chernev et al. 2015: <ignoring Simonsohn et al. 2014> Choice overload is a reliable effect under certain conditions (moderators).
  • Original effect size: d = 0.77 (Study 1), d = 0.29 (Study 2), and d = 0.88 (Study 3) (as calculated from the χ² values in the text with an online calculator).
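The χ²-to-d conversion mentioned above can be sketched as follows. This assumes the online calculator used the common phi-based route for a 1-df χ² from a 2×2 table (φ = √(χ²/N), then d = 2φ/√(1 − φ²)); the function names and the counts in the example are hypothetical, not the original study data.

```python
import math

def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

def chi_square_to_d(chi2: float, n: int) -> float:
    """Convert a 1-df chi-square to Cohen's d via phi (requires 0 <= phi < 1)."""
    phi = math.sqrt(chi2 / n)
    return 2 * phi / math.sqrt(1 - phi ** 2)

# Hypothetical table (not the original data): rows = condition, columns = bought / did not buy.
chi2 = chi_square_2x2(30, 10, 10, 30)
print(round(chi2, 2), round(chi_square_to_d(chi2, 80), 3))  # prints 20.0 1.155
```

Note that this route ignores the direction of the effect; the sign has to be assigned from which condition had the higher proportion.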
  • Mate guarding, the idea that women use conspicuous luxury goods to deter female rivals by signalling to other women that they have a devoted partner.

  • Status: reversed
  • Original paper: Conspicuous Consumption, Relationships, and Rivals: Women’s Luxury Products as Signals to Other Women, Wang & Griskevicius 2014; 5 studies (Study 1: N=69; shows that a woman was perceived by other women as having a more devoted partner when she had a designer-brand outfit accessory vs. a non-designer-brand accessory. Study 2: N=137; women in the mate-guarding condition are asked to imagine they are at a party with their date and another woman is flirting with their date. Activating a mate-guarding motive increases women’s desire for conspicuous consumption. Study 3: N=115; replicates Study 2 and shows that a mate-guarding motive increases desire only for conspicuous goods. Study 4: N=75; activating a mate-guarding motive increases women’s spending on luxury accessories, but only when these accessories are visible to other women. Study 5: N=175; shows that displaying luxury goods dissuades other women from pursuing a relationship with a taken man.) [citations=450 (Google Scholar, January 2022)].
  • Critiques: Tunca & Yanar (2020) [conceptual (Study 1, N=250) and direct (Study 2, N=255) replications of Study 1 of Wang & Griskevicius; citations=2 (GS, January 2022)]. Study 1 did not replicate the original finding that women with luxury goods are perceived by other women as having devoted partners. In Study 2, a reversal was observed: women with non-designer possessions were perceived to have a more devoted partner than women with designer possessions.
  • Original effect size: d=0.24
  • Replication effect size: Study 1 (d =0.13); Study 2 (d=-0.27).
  • Super size me. Larger food options are associated with higher status.

  • Status: Not replicated.
  • Original paper: Super Size Me: Product Size as a Signal of Status, Dubois et al. (2012); 6 studies with n’s = 183; 142; 89; 269; 134; 104 [citations = 325 (GS, January 2022)].
  • Critiques: Tunca et al. (2021) [Preprint] [direct replication of study 1 of Dubois et al. (2012); N= 415, citations=1 (GS, January 2022)].
  • Original effect size: Study 1: Large vs. small product size: d=1.10; large vs. medium product size: d=0.65; medium vs. small: d=0.46.
  • Replication effect size: large vs. small product size: d = -0.1, 95% CI [-0.15, 0.33]; large vs. medium: d = -0.11, 95% CI [-0.13, 0.34]; medium vs. small: d = -0.01, 95% CI [-0.25, 0.23].
  • Scarcity effect - Overborrowing. Perceived financial scarcity causes consumers to overborrow.

  • Scarcity effect - Resource allocation. Poor economic conditions favour resource allocations to daughters over sons.

  • Scarcity effect - Planning. Consumers who feel resource constrained shift to engage in relatively more priority planning, rather than efficiency planning.

  • Scarcity effect - Competition/threat. Exposure to limited-quantity promotion advertising prompts consumers to perceive other shoppers as competitive threats.

  • Scarcity effect - Brand attitudes. Observing luxury brand consumers whose consumption arises from unearned financial resources reduces observers’ brand attitudes when observers place a high value on fairness.

  • Scarcity effect - Product use creativity. Scarcity salience is associated with greater creativity.

  • Scarcity effect - Wage rates. The difference in implied wage rates based on a time elicitation versus a money elicitation procedure is reduced as the time horizon increases.

  • Scarcity effect - Selfishness. Reminders of scarcity cause selfish behaviour to a greater extent in people with low social value orientation.

  • Scarcity effect - Preference for material goods. Scarcity leads to a preference for material goods over experiential goods.

  • Scarcity effect - Preference polarization. Perceived scarcity leads to greater preference polarization and stronger preference for a preferred option over a less preferred option.


Neuroscience (humans)

  • Existence of high-functioning (IQ ~ 100) hydrocephalic people. The hypothesis begins from extreme prior improbability; the effect of massive volume loss is claimed to be on average positive for cognition; the case studies are often questionable and involve little detailed study of the brains (e.g., 1970s scanners were not capable of the precision claimed).

  • Status: NA
  • Original paper: No paper; instead a documentary and a profile of the claimant, John Lorber. Also Forsdyke 2015 and the fraudulent de Oliveira 2012.
  • Critiques: Hawks 2007; Neuroskeptic 2015; Gwern 2019
  • Alex Maier writes in with a 2007 case study of a man who reached the age of 44, married and employed, before anyone realized he had severe hydrocephalus. His IQ was 75 (i.e., d = -1.7), which is higher than I expected, but still far short of the original claim of d = 0.
  • Oxytocin on trust. Intranasal administration of oxytocin increases trust in strangers in a laboratory setting.

  • Status: not replicated
  • Original paper: Oxytocin increases trust in humans, Kosfeld et al. (2005); experiment, n = 128 [citations = 4800, April 2022].
  • Critiques: Declerck et al. (2020) [n = 677, citations = 57 (GS, April 2022)]; Lane et al. (2015) [n = 95, citations = 63 (GS, April 2022)].
  • Original effect size: Not reported but could be calculated: “In fact, our data show that oxytocin increases investors' trust considerably. Out of the 29 subjects, 13 (45%) in the oxytocin group showed the maximal trust level, whereas only 6 of the 29 subjects (21%) in the placebo group showed maximal trust (Fig. 2a). In contrast, only 21% of the subjects in the oxytocin group had a trust level below 8 monetary units (MU), but 45% of the subjects in the control group showed such low levels of trust.” Kosfeld et al. (2005)
  • Replication effect size: not reported
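Since the original effect size is said to be calculable, here is one possible sketch using only the counts quoted above (13 of 29 vs. 6 of 29 subjects at the maximal trust level). The choice of metric is an assumption: Cohen's h for two proportions and a logit-converted odds ratio are shown for illustration, not as the measure Kosfeld et al. or the replicators used, and both ignore the rest of the trust distribution.

```python
import math

def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h for the difference between two proportions (arcsine transform)."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

def odds_ratio_to_d(odds_ratio: float) -> float:
    """Convert an odds ratio to Cohen's d with the logit method: d = ln(OR) * sqrt(3) / pi."""
    return math.log(odds_ratio) * math.sqrt(3) / math.pi

# Counts quoted from Kosfeld et al. (2005): subjects showing the maximal trust level.
oxytocin_max, oxytocin_n = 13, 29
placebo_max, placebo_n = 6, 29

h = cohens_h(oxytocin_max / oxytocin_n, placebo_max / placebo_n)

# Odds of maximal trust in each group, then their ratio.
odds_oxytocin = oxytocin_max / (oxytocin_n - oxytocin_max)
odds_placebo = placebo_max / (placebo_n - placebo_max)
odds_ratio = odds_oxytocin / odds_placebo
d = odds_ratio_to_d(odds_ratio)

print(f"h = {h:.2f}, OR = {odds_ratio:.2f}, d = {d:.2f}")
```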
  • Structural brain-behaviour correlations - the association between behavioural activation and white matter integrity. Individual differences in the sensitivity to signals of reward as indexed by BAS-Total and in the tendency to seek out potentially rewarding experiences as measured by BAS-Fun are positively correlated with diffusion measures of several white matter pathways.

  • Status: not replicated
  • Original paper: Xu et al. (2012) [n = 51, citations = 29 (GS, May 2022)].
  • Critiques: corrigendum of Boekel et al. (2015) (https://doi.org/10.1016/j.cortex.2017.03.007) [citations = 196 (GS, May 2022)]; Keuken et al. (2017) [n = 34-35, citations = 1 (GS, May 2022)].
  • Original effect size:
    • BAS-Total correlation with parallel diffusivity in the left corona radiata (CR)/superior longitudinal fasciculus (SLF): r = .51.
    • BAS-Fun correlation with:
      • fractional anisotropy in the left CR/SLF: r = .52
      • parallel diffusivity in the left CR/SLF: r = .58
      • mean diffusivity in the left SLF/inferior fronto-occipital fasciculus (IFOF): r = .51
  • Replication effect size: Keuken et al. (2017):
    • BAS-Total correlation with parallel diffusivity in the left CR/SLF: r = -.15
    • BAS-Fun correlation with:
      • fractional anisotropy in the left CR/SLF: r = -.15
      • parallel diffusivity in the left CR/SLF: r = -.04
      • mean diffusivity in the left SLF/inferior fronto-occipital fasciculus (IFOF): r = .05
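To see how far the replication estimates fall from the original ones, approximate 95% confidence intervals can be computed with the Fisher z-transformation. This is a minimal sketch of a standard textbook approximation, not the analysis used by the replication studies; taking n = 34 for Keuken et al. (2017) is an assumption based on the "34-35" listed above.

```python
import math

def correlation_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate 95% CI for a Pearson r via the Fisher z-transformation."""
    z = math.atanh(r)               # transform r to an approximately normal scale
    se = 1 / math.sqrt(n - 3)       # standard error of Fisher's z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# BAS-Fun vs. parallel diffusivity in the left CR/SLF.
print("original   ", correlation_ci(0.58, 51))   # Xu et al. (2012), n = 51
print("replication", correlation_ci(-0.04, 34))  # Keuken et al. (2017), n = 34 assumed
```

With n = 51, even r = .58 comes with a wide interval, which is one reason a single small-sample correlation is a weak basis for a brain-behaviour claim.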
  • Structural brain-behaviour correlations - the association between social network size and grey matter volume. Individual differences in the number of Facebook friends (FBN) are positively correlated with grey matter volume in several brain areas: left middle temporal gyrus (MTG), right superior temporal sulcus (STS), right entorhinal cortex (EC), and left and right amygdala.

  • Status: mixed
  • Original paper: Kanai et al. (2012) [n = 125, citations = 411 (GS, May 2022)].
  • Critiques: Boekel et al. (2015) [n = 34-35, citations = 196 (GS, May 2022)]; Kanai et al. (2012) [n = 40, citations = 411 (GS, May 2022)].
  • Original effect size: left MTG: r =.35; right STS: r = .35; right EC: r = .35, left amygdala: r = .30; right amygdala: r = .32.
  • Replication effect size:
    • Kanai et al. (2012): left MTG: r = .38; right STS: r = .44; right EC: r = .48; left amygdala: r = .33; right amygdala: r = .48.
    • Boekel et al. (2015): left MTG: r = .18; right STS: r = .11; right EC: r = .06; left amygdala: r = -.14; right amygdala: r = .02.
  • Structural brain-behaviour correlations - the association between distractibility and grey matter volume. Variability in self-reported distractibility is positively correlated with grey matter volume in the left superior parietal lobule (SPL) and negatively correlated with grey matter volume in medial pre-frontal cortex (mPFC).

  • Status: not replicated
  • Original paper: Kanai et al. (2011) [n = 155, citations = 110 (GS, May 2022)].
  • Critiques: Boekel et al. (2015) [n = 36, citations = 196 (GS, May 2022)].
  • Original effect size: left SPL: r =.38; mPFC: r = -.28.
  • Replication effect size: Boekel et al. (2015): left SPL: r =.22; mPFC: r = -.19.
  • Structural brain-behaviour correlations - the association between attention and cortical thickness. Individual differences in executive control are negatively correlated with cortical thickness in left anterior cingulate cortex (ACC), left superior temporal gyrus (STG), and right middle temporal gyrus (MTG), whereas variation in alerting scores is negatively correlated with cortical thickness in the left superior parietal lobule (SPL).

  • Status: not replicated.
  • Original paper: Westlye et al. (2011) [n = 132, citations = 190 (GS, May 2022)].
  • Critiques: Boekel et al. (2015) [n = 35, citations = 196 (GS, May 2022)].
  • Original effect size:
    • Executive control scores and cortical thickness in left ACC: r = -.21; left STG: r = -.15; right MTG: r = -.13.
    • Alerting scores and cortical thickness in left SPL: r = -.26
  • Replication effect size: Boekel et al. (2015):
    • Executive control scores and cortical thickness in left ACC: r = -.18; left STG: r = -.14; right MTG: r = -.19.
    • Alerting scores and cortical thickness in left SPL: r = .16.
  • Structural brain-behaviour correlations - the association between control over the speed/accuracy of perceptual decisions and white matter tract strength. Individual differences in control over the speed and accuracy of perceptual decisions are positively correlated with the strength of white matter tracts between the right presupplementary motor area (pre-SMA) and the right striatum.

  • Status: mixed
  • Original paper: Forstmann et al. (2010) [n = 9, citations = 387 (GS, May 2022)].
  • Critiques: corrigendum of Boekel et al. (2015) [citations = 196 (GS, May 2022)]; Forstmann et al. (2010) [n = 12, citations = 387 (GS, May 2022)]; Keuken et al. (2017) [n = 32, citations = 1 (GS, May 2022)].
  • Original effect size: r = .93.
  • Replication effect size: Forstmann et al. (2010): r = .76; Keuken et al. (2017): r = -.08.
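With an original sample of n = 9, even r = .93 is imprecise. One way to ask whether an original and a replication correlation actually differ is the classic Fisher z test for two independent correlations, sketched below. Treating the samples as independent and using these sample sizes are assumptions, and this is not the analysis reported by Boekel et al. or Keuken et al.

```python
import math

def compare_correlations(r1: float, n1: int, r2: float, n2: int) -> tuple[float, float]:
    """Two-sided z-test for the difference between two independent Pearson correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)          # Fisher z-transform each r
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))      # SE of the difference in z
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))             # two-sided normal p-value
    return z, p

# Forstmann et al. (2010) original sample vs. Keuken et al. (2017).
z, p = compare_correlations(0.93, 9, -0.08, 32)
print(f"z = {z:.2f}, p = {p:.4f}")
```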

Psychiatry

  • Effect of low self-esteem on poor mental health and psychological outcomes. There is a small amount of somewhat mixed evidence for some outcomes, but the claim is not supported for most outcomes (e.g., alcohol, smoking, and drug use).

  • Status:
  • Original paper:
  • Critiques: Baumeister, Campbell, Krueger & Vohs 2003. Does High Self-Esteem Cause Better Performance, Interpersonal Success, Happiness, or Healthier Lifestyles? https://journals.sagepub.com/doi/10.1111/1529-1006.01431 The total number of studies included in the review is unclear: the authors started with 15,000 sources and narrowed these down, but the final number included is not clearly stated. The review showed some mixed evidence but mostly refuted the claims. Self-esteem was not related to smoking, alcohol, or drug use and was only minimally associated with interpersonal success; for school performance, the relationship seems to be that better performance leads to higher self-esteem rather than the other way around. Self-esteem was moderately correlated with depression.
  • Rorschach Test, as a diagnostic tool for psychiatric conditions.

  • Status: NA
  • Original paper:
  • Critiques: Wood, Lilienfeld, Garb, & Nezworski, 2000. The Rorschach test in clinical diagnosis: a critical review, with a backward look at Garfield (1947). https://pubmed.ncbi.nlm.nih.gov/10726675/ The test has some merit in detecting thought disorders (although this use is thought to be non-projective, whereas the test is intended to be projective; Dawes 1994), but it is not related to other conditions such as depression, anxiety, or antisocial personality disorder. Garb 1998 and Lilienfeld et al. 2006 indicate that when clinicians with access to questionnaire data or patients’ life histories also use data from the Rorschach test, their predictive accuracy actually decreases, possibly because they place more weight on the Rorschach results, which are of lower quality than data from other sources. Lilienfeld, Wood, & Garb 2006. Why questionable psychological tests remain popular. Scientific Review of Alternative Medicine. https://www.semanticscholar.org/paper/Why-questionable-psychological-tests-remain-popular-Lilienfeld-Wood/fedcdcac7efcc42b25160004c5c07bf4174f51c6. Garb 1998. Studying the clinician: Judgment research and psychological assessment. APA.
  • Lunar effect (also called the Transylvania effect), the idea that there is a correlation between the full moon and strange occurrences, particularly in human behaviour. This is thought to have existed as a folk belief for centuries and is widely believed today (e.g., according to Owen & McGowan, 81% of mental health professionals believe in this effect, and 69% of mental health nurses believe that full moons are associated with an increase in patient admissions; Francescani & Bacon, 2008).

  • Status: not replicated
  • Original paper: The lunar effect was popularised by Arnold Lieber: Lieber, A. L. (1978). The Lunar Effect. Anchor Press. Lieber, A. L. & Agel, J. (1996). How the moon affects you. Hastings House. The cited books could not be accessed to add study details, sample sizes, etc. A variety of papers on this effect by other authors are available, but the Lieber reference is used because the effect is mostly attributed to him.
  • Critiques: Rotton & Kelly 1985. Much ado about the full moon: A meta-analysis of lunar-lunacy research https://doi.apa.org/doiLanding?doi=10.1037%2F0033-2909.97.2.286 Meta-analysis of 37 studies. Gutiérrez-Garcia & Tusell 1997. Suicides and the Lunar Cycle. https://journals.sagepub.com/doi/abs/10.2466/pr0.1997.80.1.243 n=897 (deaths by suicide). Kung & Mrazek 2005. Psychiatric Emergency Department Visits on Full-Moon Nights. https://ps.psychiatryonline.org/doi/full/10.1176/appi.ps.56.2.221-a
  • Original effect size:
  • Replication effect size:
  • Lack of a Theory of Mind is universal in autism. All autistic people fail to understand that other people have a mind or that they themselves have a mind.


Parapsychology

  • Precognition, undergraduates improving memory test performance by studying after the test.

  • Status: not replicated
  • Original paper: ‘Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect’, Bem 2011, 9 experiments: Study 1: n = 100; Study 2: n = 150; Study 3: n = 100; Study 4: n = 100; Study 5: n = 100; Study 6: n = 150; Study 7: n = 200; Study 8: n = 100; Study 9: n = 50 [citations = 1216 (GS, March 2022)].
  • Critiques: Ritchie et al. 2012, 3 replications: Replication 1: n = 50; Replication 2: n = 50; Replication 3: n = 50 [experiment; n = 150, citations = 235 (GS, December 2021)]; Gelman, 2013 [newspaper article]; Schimmack, 2018 [blog].
  • Original effect size: Study 1: d = 0.25; Study 2: d = 0.20; Study 3: d = 0.26; Study 4: d = 0.23; Study 5: d = 0.22; Study 6: negative trials: d = 0.15, erotic trials: d = 0.14; Study 7: d = 0.09; Study 8: d = 0.19; Study 9: d = 0.42; mean effect size = 0.22.
  • Replication effect size: All effect sizes are reported in Ritchie et al. 2012: Replication 1: d = 0.30; Replication 2: d = -0.39; combined: d = 0.04 (converted using an online calculator).
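The reported mean effect size of 0.22 for Bem's studies can be reproduced as a simple unweighted average of the values listed above, assuming Study 6's two values are first averaged into one study-level value; weighting by the listed sample sizes gives a somewhat smaller figure. This is a sketch for checking the arithmetic only: neither average is a proper meta-analytic estimate, which would weight by inverse variance.

```python
from statistics import mean

# Effect sizes (d) and sample sizes as listed above for Bem's nine experiments.
# Study 6 reports two values (negative and erotic trials); averaging them into one
# study-level value is an assumption about how the reported mean was formed.
studies = {
    1: (0.25, 100), 2: (0.20, 150), 3: (0.26, 100), 4: (0.23, 100), 5: (0.22, 100),
    6: (mean([0.15, 0.14]), 150), 7: (0.09, 200), 8: (0.19, 100), 9: (0.42, 50),
}

effects = [d for d, _ in studies.values()]
unweighted_mean = mean(effects)

# Weighting by sample size gives larger studies more influence on the average.
total_n = sum(n for _, n in studies.values())
weighted_mean = sum(d * n for d, n in studies.values()) / total_n

print(f"unweighted mean d = {unweighted_mean:.2f}, n-weighted mean d = {weighted_mean:.2f}")
```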

Evolutionary psychology

  • Romantic priming, the idea that looking at attractive women increases men’s conspicuous consumption, time discounting, and risk-taking. The evidence is weak despite there being 43 independent confirmatory studies, which show one of the strongest signs of publication bias or p-hacking ever found.

  • Original paper: ‘Do pretty women inspire men to discount the future?’, Wilson and Daly 2003. n=209 (but only n=52 for each cell in the 2x2) (~560 citations).
  • Critiques: Shanks et al. (2015) show that the 43 previous studies have an extremely asymmetric funnel plot. They also report 8 failed replications. (total citations: ~80)
  • Original effect size: d=0.55 [-0.04, 1.13] for the difference between men and women. Meta-analytic d= 0.57 [0.49, 0.65]
  • Replication effect size: 0.00 [-0.12, 0.11]
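Funnel-plot asymmetry like that described by Shanks et al. is often quantified with Egger's regression test: each study's standardized effect (effect / SE) is regressed on its precision (1 / SE), and an intercept far from zero suggests that small studies report systematically larger effects. The sketch below returns only the point estimates (no standard error or p-value for the intercept); it illustrates the general technique, is not the analysis Shanks et al. ran, and the study values are hypothetical.

```python
def egger_test(effects: list[float], std_errors: list[float]) -> tuple[float, float]:
    """Egger's regression: OLS of effect/SE on 1/SE. Returns (intercept, slope)."""
    z = [e / s for e, s in zip(effects, std_errors)]   # standardized effects
    precision = [1 / s for s in std_errors]
    n = len(z)
    mean_x = sum(precision) / n
    mean_y = sum(z) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(precision, z))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

ses = [0.1, 0.2, 0.3, 0.4]
# Hypothetical, no small-study effect: every study estimates d = 0.5, so the intercept is ~0.
print(egger_test([0.5, 0.5, 0.5, 0.5], ses))
# Hypothetical, small-study effect: noisier studies report larger d, so the intercept moves away from 0.
print(egger_test([0.2, 0.4, 0.6, 0.8], ses))
```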
  • Implicit religious priming. Implicitly priming god concepts by unscrambling sentences with words relating to religion increases prosocial behaviour in an anonymous economic game.

  • Implicit analytic priming, that implicitly priming analytic thinking by seeing a photo of Auguste Rodin’s The Thinker decreases belief in God.

  • Status: not replicated
  • Original paper: ‘Analytic thinking promotes religious disbelief’, Gervais and Norenzayan 2012; n=57 [citation=601 (Google Scholar, December 2021)].
  • Critiques: Sanchez et al. (2017) [n=941, citations=59 (Google Scholar, December 2021)]; Camerer et al. (2018), 2 experiments, n=224 and n=531 [citations=871 (Google Scholar, December 2021)].
  • Original effect size: d=-0.25 to d=0.12.
  • Replication effect size: Sanchez et al 2017, d=-0.25 to d=0.12. Camerer et al 2018, study 1 r=-0.055, study 2 r=-0.035.
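The Camerer et al. results are reported as r while the other estimates are reported as d. A sketch of the common conversion d = 2r/√(1 − r²) puts them on the same scale; it assumes two groups of roughly equal size, and whether that matches each study's design is an assumption.

```python
import math

def r_to_d(r: float) -> float:
    """Convert a correlation to Cohen's d (assumes two roughly equal-sized groups)."""
    return 2 * r / math.sqrt(1 - r ** 2)

# Camerer et al. (2018) replication correlations listed above.
for label, r in [("study 1", -0.055), ("study 2", -0.035)]:
    print(label, round(r_to_d(r), 3))
```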
  • Menstrual cycle version of the dual-mating-strategy hypothesis (that “heterosexual women show stronger preferences for uncommitted sexual relationships [with more masculine men] during the high-fertility ovulatory phase of the menstrual cycle, while preferring long-term relationships at other points”). Studies are usually tiny (median n=34, mostly over one cycle). Funnel plot looks ok though.

  • Original paper: ‘Menstrual cycle variation in women’s preferences for the scent of symmetrical men’, Gangestad and Thornhill (1998). (602 citations).
  • Critiques: Jones et al (2018) (total citations: 32)
  • Original effect size: g = 0.15, SE = 0.04, n=5471 in the meta-analysis. Massive battery of preferences included (…)
  • Replication effect size: Not a meta-analysis, just a list of recent well-conducted “null” studies and a plausible alternative explanation.
  • Note from a professor friend: the idea of a dual-mating hypothesis itself is not in trouble: the specific menstrual cycle research doesn’t seem to replicate well. However, to my knowledge the basic pattern of short vs long term relationship goals predicting [women’s] masculinity preferences is still robust.
  • Men’s strength in particular predicts opposition to egalitarianism.

  • Original paper: Petersen et al. (194 citations).
  • Critiques: Strength was measured as arm circumference in students, and the effect disappeared when participant age was included. (total citations: 605)
  • Original effect size: N/A, battery of F-tests.
  • Replication effect size: Gelman: none, as in zero. The same lab later returned with 12 conceptual replications on a couple of measures of (anti-)egalitarianism. They focus heavily on statistical significance rather than effect size. The overall male effect was b = 0.17 and the female effect was b = 0.11, with a nonsignificant difference between the two (p = 0.09). (They prefer to emphasise the lab studies over the online studies, which showed a stronger difference.) Interestingly, strength or “formidability” has an effect in both genders, whether or not their main claim about a gender difference holds up.

Psychophysiology

  • Sympathetic nervous system activity predicts political ideology; in particular, subjects’ skin conductance reactions to threatening or disgusting visual prompts.

  • Original paper: Oxley et al., n=46; p=0.05 on a falsely binarised measure of ideology.
  • Critiques: Six replications so far (Knoll et al.; three from Bakker et al.): five negative (nonsignificant) and one forking (“holds in US but not Denmark”).
  • Original effect size:
  • Replication effect size:

Applied Linguistics

  • Critical period hypothesis. Grammar-learning ability changes with age: it is claimed to remain intact until the cusp of adulthood (17.4 years) and then decline steadily.

  • Motivational role of L2 vision. Mental imagery of oneself as a successful language user in the future can enhance one’s motivation and performance.


Educational Psychology

  • Flipped learning, students learn better if they do homework about a lesson before coming to class to study that lesson.

  • Status: replicated
  • Original paper: Flip Your Classroom: Reach Every Student in Every Class Every Day. [citations=6585 (Google Scholar, Dec 2021)].
  • Critiques: Lo & Hew [citations=423 (Google Scholar, Dec 2021)], Strelan et al. [n=33678, citations=107 (Google Scholar, Jan 2022)], Cheng et al. [n=7912, citations=195 (Google Scholar, Jan 2022)], Låg & Sæle [n=not reported, number of reports=272, citations=106 (Google Scholar, Jan 2022)], Lo & Hew [n=5329, citations=43 (Google Scholar, Jan 2022)], Shi et al. [n=6947, citations=60 (Google Scholar, Jan 2022)], van Alten et al. [n=24771, citations=239 (Google Scholar, Jan 2022)], Xu et al. [n=4295, citations=33 (Google Scholar, Jan 2022)], Vitta & Al-Hoorie [n=4220, citations=17 (Google Scholar, Jan 2022)].
  • Meta-analysis effect size: Strelan et al.: g = 0.50 (0.42, 0.52) cross-disciplinary. Cheng et al.: g = 0.19 (0.11, 0.27) cross-disciplinary. Låg and Sæle: g = 0.35 (0.31, 0.40) cross-disciplinary. Lo & Hew: g = 0.29 (0.17, 0.41) engineering education. Shi et al.: g = 0.53 (0.36, 0.70) cross-disciplinary. van Alten et al.: g = 0.36 (0.28, 0.44) cross-disciplinary. Xu et al.: d = 1.79 (1.32, 2.27) nursing education in China. Vitta & Al-Hoorie: g = 0.99 (0.81, 1.16) second language learning. In Vitta & Al-Hoorie’s study, a trim-and-fill analysis suggested possible publication bias inflating the results, but the adjusted effect size remained sizable: g = 0.58 (0.37, 0.78).
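Because these meta-analyses are summarised with intervals rather than standard errors, a rough standard error can be recovered from each 95% CI, assuming the interval is symmetric on the g scale. The midpoint check is a quick consistency test when transcribing such values: the Strelan et al. interval quoted above (0.42 to 0.52) is not centred on g = 0.50, so that entry may be worth verifying against the source. Function names are illustrative only.

```python
def se_from_ci(lower: float, upper: float, z_crit: float = 1.96) -> float:
    """Back-calculate a standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z_crit)

def midpoint_gap(estimate: float, lower: float, upper: float) -> float:
    """Distance between a point estimate and its CI midpoint (near 0 for symmetric CIs)."""
    return estimate - (lower + upper) / 2

# Pooled estimates quoted above: (g, lower, upper).
estimates = {
    "Cheng et al.": (0.19, 0.11, 0.27),
    "Låg & Sæle": (0.35, 0.31, 0.40),
    "Strelan et al.": (0.50, 0.42, 0.52),
}
for name, (g, lo, hi) in estimates.items():
    print(f"{name}: SE ~ {se_from_ci(lo, hi):.3f}, midpoint gap = {midpoint_gap(g, lo, hi):+.3f}")
```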
  • Mindsets, people’s beliefs about whether their talents and abilities are subject to growth and improvement. According to the meta-analysis by Sisk and colleagues (2018), the relationship between mindsets and academic achievement is weak: of the 129 studies that they analyzed, only 37% found a positive relationship between mindset and academic outcomes, 58% found no relationship, and 6% found a negative relationship. Evidence on the efficacy of mindset interventions is not promising: of the 29 studies reviewed, only 12% found a positive effect of the intervention, 86% found no effect, and 2% found a negative effect. It should be noted, however, that interventions seemed to work for low-SES populations.


Health Psychology

  • Stress as the main/sole cause of peptic ulcers. Until the 1980s, stress was believed to be the main cause of peptic ulcers (with excess stomach acid and spicy food thought to be secondary contributing factors). Stress may play some role in the development and/or healing of ulcers, but the evidence shows it is not the primary cause, as was previously believed.


Political Psychology

  • Stereotype threat on gender differences in political knowledge, the idea that making gender stereotypes about political knowledge salient decreases women’s performance on political knowledge tests. The replication effort showed no significant effect of gender stereotype activation on women’s performance on a political knowledge test.

  • Status: Not replicated.
  • Original paper: Gender Differences in Political Knowledge: Bringing Situation Back In, Ihme & Tausendpfund (2018). Study 1: N=603; shows that women are rated as less politically knowledgeable than men. Study 2: N=377; female and male participants are randomly assigned to one of three conditions (stereotype not activated - control; stereotype activated by asking participants to report their gender; stereotype activated by a statement that there are gender differences in performance on the test participants are about to take) and answer a questionnaire assessing political knowledge. [citations=17 (GS, January 2022)].
  • Critiques: Azevedo, Micheli & Bolesta [Preprint] [n=1502, citations=NA]. Results showed a non-significant interaction between stereotype activation and gender on political knowledge scores.
  • Original effect size: partial η² = 0.33.
  • Replication effect size: partial η² = 0.00.
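When a paper reports only an F statistic and its degrees of freedom, partial η² can be recovered with the standard identity η²p = (F · df_effect) / (F · df_effect + df_error), and Cohen's f follows as √(η²p / (1 − η²p)). A minimal sketch follows; the F value and degrees of freedom in the example are hypothetical, not taken from Ihme & Tausendpfund (2018) or the replication.

```python
import math

def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic: F*df_effect / (F*df_effect + df_error)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

def cohens_f(eta_p2: float) -> float:
    """Cohen's f from partial eta squared."""
    return math.sqrt(eta_p2 / (1 - eta_p2))

# Hypothetical interaction test: F(2, 100) = 10.
eta = partial_eta_squared(10.0, 2, 100)
print(f"partial eta^2 = {eta:.3f}, f = {cohens_f(eta):.3f}")
```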

Evolutionary Linguistics

  • Typological Prevalence Hypothesis. Claims that cross-linguistically more prevalent distinctions are easier to learn: the more common a distinction or way of categorizing is across languages, the more cognitively natural (and easily learnable) it should be for humans. This effect has been explored for the grammatical structure of evidentiality, the grammatical marking of the information source in an utterance.


Speech Language Therapy

  • Bilingualism and stuttering. Bilingual children have an increased risk of stuttering and a lower chance of recovery from stuttering than language-exclusive and monolingual speakers.

  • Self-esteem and stuttering. Children who stutter have higher self-esteem than children who do not stutter. However, the self-esteem of children who stutter declines once they reach adolescence.

  • Status: NA
  • Original paper: Selbstwert von stotternden Kindern und Jugendlichen [Self-esteem of children and adolescents who stutter] (in German). Zückner (2011). Case-control study comparing against norm scores, n = 171 [citations = 3 (GS, February 2022)].
  • Critiques: Cook and Howell 2014 [observational, n=59, citations=16, (GS, February 2022)]
  • Original effect size: stuttering boys: M = 56.5 (SD = 25.9); norm-group boys: M = 36.5 (SD = 25.9); stuttering girls: M = 43.1 (SD = 35.8); norm-group girls: M = 27.7 (SD = 25.7).
  • Replication effect size: Cook and Howell: M = 2.9 (SD = 0.49) (children); adolescents: r(bullying, self-esteem) = .387.
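Since the original entry lists only means and standard deviations, a rough standardized mean difference can be computed from them. This sketch assumes equal group sizes (how the n = 171 splits across groups is not given here) and treats the scores as interval-scaled; with known group sizes, a sample-size-weighted pooled SD would be more appropriate.

```python
import math

def cohens_d(m1: float, sd1: float, m2: float, sd2: float) -> float:
    """Standardized mean difference using the average of the two variances.

    Assumes equal group sizes because the group sizes are not listed.
    """
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / pooled_sd

# Zückner (2011) values listed above: stuttering group vs. norm group.
d_boys = cohens_d(56.5, 25.9, 36.5, 25.9)
d_girls = cohens_d(43.1, 35.8, 27.7, 25.7)
print(f"boys: d = {d_boys:.2f}, girls: d = {d_girls:.2f}")
```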

Experimental Philosophy

  • Fake barn cases. “Older participants are less likely than younger participants to attribute knowledge in fake-barn cases”

  • Status: not replicated
  • Original paper: Epistemic Intuitions in Fake-Barn Thought Experiments, Colaço et al. (2014); between-subjects, n=234 (n=85 in the relevant analysis) [citations=181 (GS, October 2022)].
  • Critiques: Bergenholtz et al. (2021) [n=348, citations=0 (GS, October 2022)].
  • Original effect size: r = −0.32
  • Replication effect size: No effect size given (because of non-significant effect)



Further literature