Model Collapse in the Age of Synthetic Data: Risks and Consequences

Bozhidar Bahov


Author(s): Bozhidar Bahov
Subject(s): Social Sciences, Economy, Business Economy / Management, Sociology, Evaluation research, Social Informatics, ICT Information and Communications Technologies, Socio-Economic Research
Published by: University of National and World Economy (UNWE; Университет за национално и световно стопанство, УНСС)
Keywords: model collapse; Model Autophagy Disorder (MAD); synthetic data; generative models; data contamination
Summary/Abstract: This paper presents a literature review on model collapse, sometimes termed Model Autophagy Disorder (MAD), which is observed when generative models are recursively trained on synthetic data produced by earlier versions of themselves. Drawing on evidence from recent studies in language modeling and computer vision, we highlight how reliance on model-generated content can gradually degrade performance metrics and reduce expressive diversity, impacting downstream tasks such as text generation, image classification, and captioning. We examine the theoretical basis of model collapse, including the mechanisms that drive distributional drift and the loss of tail events, and survey the empirical findings demonstrating its effects on large language models and modern image generators. Finally, we discuss the broader risks and consequences, ranging from ethical concerns to long-term threats to AI system reliability, and propose strategies for mitigating collapse, including the inclusion of fresh real-world data, rigorous data curation, and detection techniques such as watermarking.


Book: Selected Papers from the 14th International Conference on Application of Information and Communication Technology and Statistics in Economy and Education (ICAICTSEE-2024), December 2-3rd, 2024, UNWE, Sofia, Bulgaria

Page Range: 248-256
Page Count: 9
Publication Year: 2025
Language: English

