Brazil: Children’s Personal Photos Misused to Power AI Tools

Two young girls are playing with their cameras in a garden, Osterode, Germany, January 8, 2016. © 2016 Frank May/picture-alliance/dpa/AP Photo

(São Paulo, Brazil) – The personal photos of Brazilian children are being used to create powerful artificial intelligence (AI) tools without the children’s knowledge or consent, Human Rights Watch said today. These photos are being scraped off the web into a large data set that companies then use to train their AI tools. In turn, others are using these tools to create malicious deepfakes that put even more children at risk of exploitation and harm.

“Children should not have to live in fear that their photos might be stolen and weaponized against them,” said Hye Jung Han, children’s rights and technology researcher and advocate at Human Rights Watch. “The government should urgently adopt policies to protect children’s data from AI-fueled misuse.”

Analysis by Human Rights Watch found that LAION-5B, a data set used to train popular AI tools and built by scraping most of the internet, contains links to identifiable photos of Brazilian children. Some children’s names are listed in the accompanying caption or the URL where the image is stored. In many cases, their identities are easily traceable, including information on when and where the child was at the time their photo was taken.

One such photo features a 2-year-old girl, her lips parted in wonder as she touches the tiny fingers of her newborn sister. The caption and information embedded in the photo reveal not only both children’s names but also the name and precise location of the hospital in Santa Catarina where the baby was born nine years ago on a winter afternoon.

Human Rights Watch found 170 photos of children from at least 10 states: Alagoas, Bahia, Ceará, Mato Grosso do Sul, Minas Gerais, Paraná, Rio de Janeiro, Rio Grande do Sul, Santa Catarina, and São Paulo. This is likely to be a significant undercount of the total amount of children’s personal data that exists in LAION-5B, as Human Rights Watch reviewed less than 0.0001 percent of the 5.85 billion images and captions contained in the data set.

The photos reviewed span the entirety of childhood. They capture intimate moments of babies being born into the gloved hands of doctors, young children blowing out candles on their birthday cake or dancing in their underwear at home, students giving a presentation at school, and teenagers posing for photos at their high school’s carnival.

Many of these photos were originally seen by few people and appear to have previously had a measure of privacy. They do not appear to be otherwise possible to find through an online search. Some of these photos were posted by children, their parents, or their family on personal blogs and photo- and video-sharing sites. Some were uploaded years or even a decade before LAION-5B was created.

Once their data is swept up and fed into AI systems, these children face further threats to their privacy due to flaws in the technology. AI models, including those trained on LAION-5B, are notorious for leaking private information; they can reproduce identical copies of the material they were trained on, including medical records and photos of real people. Guardrails set by some companies to prevent the leakage of sensitive data have been repeatedly broken.

These privacy risks pave the way for further harm. Training on photos of real children has enabled AI models to create convincing clones of any child, based on a handful of photos or even a single image. Malicious actors have used LAION-trained AI tools to generate explicit imagery of children using innocuous photos, as well as explicit imagery of child survivors whose images of sexual abuse were scraped into LAION-5B.

Likewise, the presence of Brazilian children in LAION-5B contributes to the ability of AI models trained on this data set to produce realistic imagery of Brazilian children. This substantially amplifies the existing risk children face that someone will steal their likeness from photos or videos of themselves posted online and use AI to manipulate them into saying or doing things that they never said or did.

At least 85 girls from Alagoas, Minas Gerais, Pernambuco, Rio de Janeiro, Rio Grande do Sul, and São Paulo have reported harassment by their classmates, who used AI tools to create sexually explicit deepfakes of the girls based on photos taken from their social media profiles and then circulated the faked images online.

Fabricated media have always existed, but they once required time, resources, and specialized expertise to create and were rarely realistic. Today’s AI tools create lifelike outputs in seconds, are often free, and are easy to use, risking the proliferation of nonconsensual deepfakes that could recirculate online for a lifetime and inflict lasting harm.

In response, LAION, the German nonprofit that manages LAION-5B, confirmed that the data set contained the children’s personal photos found by Human Rights Watch and pledged to remove them. It disputed that AI models trained on LAION-5B could reproduce personal data verbatim. LAION also said that children and their guardians were responsible for removing children’s personal photos from the internet, which it argued was the most effective protection against misuse.

Lawmakers have proposed banning the nonconsensual use of AI to generate sexually explicit images of people, including children. These efforts are urgent and important, but they only tackle one symptom of the deeper problem that children’s personal data remain largely unprotected from misuse. As written, Brazil’s data protection law – the Lei Geral de Proteção de Dados Pessoais or the General Personal Data Protection Law – does not provide sufficient protections for children.

The government should bolster the data protection law by adopting additional, comprehensive safeguards for children’s data privacy. In April, the National Council for the Rights of Children and Adolescents, a deliberative body established by law to protect children’s rights, published a resolution directing itself and the Ministry of Human Rights and Citizenship to develop a national policy to protect the rights of children and adolescents in the digital environment within 90 days. They should do so.

The new policy should prohibit scraping children’s personal data into AI systems, given the privacy risks involved and the potential for new forms of misuse as the technology evolves. It should also prohibit the nonconsensual digital replication or manipulation of children’s likenesses. And it should provide children who experience harm with mechanisms to seek meaningful justice and remedy.

Brazil’s Congress should also ensure that proposed AI regulations incorporate data privacy protections for everyone, especially children.

“Generative AI is still a nascent technology, and the associated harm that children are already experiencing is not inevitable,” Han said. “Protecting children’s data privacy now will help to shape the development of this technology into one that promotes, rather than violates, children’s rights.”