Data overuse in aging research: Emerging issues and potential solutions


Aging and lifespan development researchers have been fortunate to have public access to many longitudinal datasets. These data are valuable and widely used, but that popularity has a considerable downside: many of the datasets are heavily overused. Overuse of publicly available datasets creates dependency among published research papers, giving the false impression of independent contributions to knowledge when the same associations are reported across multiple papers. This is a potentially serious problem in the aging literature, given the heavy use of a relatively small number of well-known studies. Any irregularities or sampling biases in these few samples have an outsize influence on perceived answers to key aging questions. We detail this problem, focusing on dependency among studies, sampling bias and overfitting, and contradictory estimates of the same effect from the same data in independent publications. We propose solutions, including greater use of data sharing, pre-registration, holdout samples, split-sample cross-validation, and coordinated analysis. We argue that these valuable datasets are public resources that are being diminished by overuse, with parallels in environmental science. Taking a conservation perspective, we hold that practices such as pre-registration and holdout samples can preserve data resources for future generations of researchers.

Link to resource:

Type of resources: Reading

Education level(s): College / Upper Division (Undergraduates), Graduate / Professional, Career / Technical, Adult Education

Primary user(s): Student, Teacher

Subject area(s): Life Science, Social Science

Language(s): English