A recipe for extremely reproducible enrichment analysis (and why we need it)

Abstract

Unreliable and irreproducible research is a significant problem that wastes resources and risks undermining the public perception of science. Previous work has highlighted that published enrichment analyses frequently suffer from statistical and reporting flaws. We sought to determine whether this translates into irreproducibility by examining whether the findings of 20 open-access articles published in 2019 that described enrichment analysis with the popular DAVID suite could be reproduced. We find that only four articles exhibited a high degree of concordance, while seven exhibited major discrepancies, which we mainly ascribe to deficiencies in methodological reporting. As the tool version used is no longer available, all articles using this tool pre-2021 (~20,800 studies), including this sample of 20 articles, can no longer be reproduced with the original tools. Based on this, we suggest that results from web-based tools without long-term preservation features should not be included in scientific publications, due to the threat of link decay and a short reproducibility horizon. Relying exclusively on webtools for analysis may also be in breach of institutional and funder data preservation mandates. We advocate for the adoption of extremely reproducible research workflows, and we provide a detailed protocol for achieving this in enrichment analysis using a combination of best practices including literate programming, version control, containerisation, documentation and persistent sharing of data and software.
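To illustrate the spirit of the scripted, version-documented workflows the abstract advocates (this is a minimal sketch, not the authors' protocol), the snippet below runs a single over-representation test with Fisher's exact test on a hypothetical gene list and pathway, and records the software versions alongside the result. The gene identifiers and set sizes are invented for illustration.

```python
# Minimal sketch of a scripted over-representation test with environment logging.
# Inputs are hypothetical; replace with real gene lists and annotations.
import platform
import scipy
from scipy.stats import fisher_exact

# Hypothetical inputs: genes of interest, one pathway's members, and the background.
genes_of_interest = {"TP53", "BRCA1", "MYC", "EGFR"}
pathway_genes = {"TP53", "MYC", "CDK4", "RB1"}
background = {f"GENE{i}" for i in range(1, 20001)} | genes_of_interest | pathway_genes

# Build the 2x2 contingency table for the over-representation test.
in_both = len(genes_of_interest & pathway_genes)
in_list_only = len(genes_of_interest - pathway_genes)
in_pathway_only = len(pathway_genes - genes_of_interest)
in_neither = len(background) - in_both - in_list_only - in_pathway_only

odds_ratio, p_value = fisher_exact(
    [[in_both, in_list_only], [in_pathway_only, in_neither]],
    alternative="greater",
)

# Record the result together with the software environment so the run can be repeated.
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3g}")
print(f"python {platform.python_version()}, scipy {scipy.__version__}")
```

In a full workflow this script would live in a version-controlled repository, run inside a container with pinned dependency versions, and be archived with its inputs and outputs in a persistent repository.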

Link to resource: https://osf.io/preprints/osf/r6kxg

Type of resources: Reading

Education level(s): Graduate / Professional, Career / Technical

Primary user(s): Teacher, Librarian

Subject area(s): Life Science

Language(s): English