244. Duplicate Data and I.I.D.
medium

A language model is trained on web text in which the same article appears multiple times because of web-crawl duplicates. How does this violate the i.i.d. assumption, and what is the practical consequence?
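Repeated copies of an article are not independent draws from the data distribution. A document that appears k times gets roughly k times its intended weight in the empirical training loss, which pushes the model toward memorizing it. If copies also land on both sides of a train/test split, held-out metrics overstate generalization. The sketch below makes both effects measurable. It assumes duplicates are exact after case and whitespace normalization. The helpers `normalize`, `doc_hash`, `duplicate_weights`, `train_test_overlap`, and `dedupe` are hypothetical names chosen for illustration, and near-duplicates (which usually need something like MinHash) are not handled.

```python
import hashlib
from collections import Counter


def normalize(text: str) -> str:
    # Hypothetical normalization: lowercase and collapse all whitespace runs,
    # so trivially reformatted copies of an article compare equal.
    return " ".join(text.lower().split())


def doc_hash(text: str) -> str:
    # Fingerprint of the normalized text; exact duplicates share a hash.
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def duplicate_weights(corpus: list[str]) -> dict[str, float]:
    # Share of training examples each unique (normalized) document occupies.
    # Under i.i.d. sampling of distinct articles these would all be equal.
    counts = Counter(normalize(d) for d in corpus)
    total = len(corpus)
    return {text: c / total for text, c in counts.items()}


def train_test_overlap(train: list[str], test: list[str]) -> float:
    # Fraction of test documents whose exact copy also appears in training,
    # i.e. how much of the held-out score may reflect memorization.
    train_hashes = {doc_hash(d) for d in train}
    return sum(doc_hash(d) in train_hashes for d in test) / len(test)


def dedupe(corpus: list[str]) -> list[str]:
    # Keep the first occurrence of each fingerprint, preserving order.
    seen: set[str] = set()
    out: list[str] = []
    for d in corpus:
        h = doc_hash(d)
        if h not in seen:
            seen.add(h)
            out.append(d)
    return out
```

For example, with `corpus = ["Article A", "article  a", "Article B", "Article C"]`, `duplicate_weights(corpus)` gives the duplicated article a weight of 0.5 versus 0.25 for each of the others. `train_test_overlap(corpus, ["ARTICLE A", "Article D"])` reports that half the test set was already seen in training. Running `dedupe` before splitting addresses both problems in this toy setting.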