Large AI training data set removed after study finds child abuse material

LAION-5B, a large artificial intelligence data set used to train several popular text-to-image generators, was found to contain child sexual abuse material.

A widely used artificial intelligence data set that trained Stable Diffusion, Imagen and other AI image generator models has been removed by its creator after a study found it contained thousands of instances of suspected child sexual abuse material.

LAION, short for Large-scale Artificial Intelligence Open Network, is a German nonprofit organization that produces open-source artificial intelligence models and data sets used to train several popular text-to-image models.

A Dec. 20 report from researchers at the Stanford Internet Observatory’s Cyber Policy Center said they identified 3,226 instances of suspected CSAM (child sexual abuse material) in the LAION-5B data set, “much of which was confirmed as CSAM by third parties,” according to the Stanford Cyber Policy Center’s big data architect and chief technologist, David Thiel.


Source: https://cointelegraph.com/news/laion-5b-ai-data-set-removed-child-sexual-abuse-material