| Abstract | BACKGROUND AND OBJECTIVES: The last 3 years have seen an explosion in published manuscripts analyzing open-access health datasets, in many cases presenting misleading or biologically implausible findings. There is a growing evidence base to suggest that this is due in part to artificial intelligence-assisted and formulaic workflows, and publishers are responding by discouraging submissions that employ open-access health datasets. METHODS: Here we use a scientometric analysis to investigate which datasets have seen publication rates deviate from previous trends, especially where this coincides with changes in authors' geographical origins and increases in formulaic titles. RESULTS: Across 36 datasets, we identify nine showing hallmarks of paper mill exploitation (FDA Adverse Event Reporting System, National Health and Nutrition Examination Survey, UK Biobank, FinnGen, the Global Burden of Disease Study, Medical Information Mart for Intensive Care, China Health and Retirement Longitudinal Study, Centers for Disease Control and Prevention Wide-ranging Online Data for Epidemiologic Research, and TriNetX). In 2025, these nine datasets had a combined count of 23,005 publications indexed in the OpenAlex database. This represents an excess of 11,577 publications above the autoregressive integrated moving average (ARIMA) forecast trend, and a 3.0-fold increase on the 7,655 publications recorded for these nine datasets in 2022. We also identified a notable difference in the fold change for China (4.2x) vs. the rest of the world (1.9x) and an increase in formulaic titles. CONCLUSION: These findings highlight potential risks to research integrity in areas such as public health and drug safety, and especially to the accessibility and interoperability principles central to Open Science and Findable, Accessible, Interoperable and Reusable data practices. We argue that permissive open-access data policies naturally facilitate exploitative workflows and that these findings add to the case for safeguarding mechanisms that preserve the goals of Open Science. |
| Journal name | Journal of Clinical Epidemiology |
| Date added to PubMed | 2026/2/26 |
| Authors | Spick, Matt; Onoja, Anthony; Harrison, Charlie; Stender, Stefan; Byrne, Jennifer; Geifman, Nophar |
| Affiliations | Faculty of Health and Medical Sciences, School of Health Sciences, University of Surrey, Guildford GU2 7XH, United Kingdom. Electronic address: matt.spick@surrey.ac.uk; Surrey, Guildford GU2 7XH, United Kingdom; Department of Computer Science, Aberystwyth University, Aberystwyth, Ceredigion SY23 3DB, UK; Department of Clinical Biochemistry, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark; Faculty of Medicine and Health, School of Medical Sciences, The University of Sydney, Camperdown, New South Wales, Australia; NSW Health Statewide Biobank, NSW Health Pathology, Camperdown, New South Wales, Australia. |
| PubMed link | https://www.ncbi.nlm.nih.gov/pubmed/41740900/ |
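
The headline figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function names `fold_change` and `implied_forecast` are hypothetical, and it assumes the reported excess of 11,577 refers to the 2025 count alone, so that the forecast value is simply the observed count minus the excess. It does not reproduce the ARIMA modelling itself.

```python
def fold_change(current: float, baseline: float) -> float:
    """Ratio of a current count to a baseline count."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return current / baseline


def implied_forecast(observed: int, excess: int) -> int:
    """Forecast implied by an observed count and its reported excess.

    Assumes excess = observed - forecast for the same year.
    """
    return observed - excess


# Figures quoted in the abstract for the nine flagged datasets.
COUNT_2025 = 23_005
COUNT_2022 = 7_655
EXCESS_OVER_FORECAST = 11_577

fc = fold_change(COUNT_2025, COUNT_2022)
forecast = implied_forecast(COUNT_2025, EXCESS_OVER_FORECAST)

print(f"fold change 2022 -> 2025: {fc:.1f}x")
print(f"implied 2025 forecast: {forecast:,}")
print(f"observed / implied forecast: {COUNT_2025 / forecast:.2f}")
```

Under that assumption, the observed 2025 count is roughly double what the trend forecast would have predicted, which is consistent with the abstract's description of a deviation from previous trends.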