244. Duplicate Data and I.I.D.
medium
A language model is trained on web text where the same article appears multiple times due to web crawl duplicates. How does this violate i.i.d. and what is the practical consequence?
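As a rough illustration of the setup in the question: i.i.d. training assumes each example is an independent draw from the underlying distribution, but a crawl duplicate is a copy of an earlier draw rather than a new one. As a result, the empirical distribution the model is trained on gives a duplicated article more weight than it has in the underlying distribution. The sketch below is a minimal toy example, not a production pipeline. It assumes documents are plain strings and handles only exact duplicates (near-duplicates would need a fuzzier method, which is not shown). The helper names `empirical_weights` and `exact_dedup` are hypothetical.

```python
import hashlib
from collections import Counter


def empirical_weights(docs):
    """Fraction of training examples occupied by each distinct document."""
    counts = Counter(docs)
    total = len(docs)
    return {doc: n / total for doc, n in counts.items()}


def exact_dedup(docs):
    """Keep the first occurrence of each document, keyed by a SHA-256 hash of its text.

    Only byte-identical copies are removed; near-duplicates survive.
    """
    seen = set()
    unique = []
    for doc in docs:
        key = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


# Toy corpus: "article A" was crawled three times.
corpus = ["article A", "article B", "article A", "article A", "article C"]

weights = empirical_weights(corpus)  # "article A" gets 3 of 5 examples
deduped = exact_dedup(corpus)        # one copy of each article, original order kept
```

In this toy corpus, "article A" makes up 60% of the training examples, although it is only one of three distinct articles. After exact deduplication, each article appears once.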