SZA calls out Diplo after The Atlantic finds 21M recordings in AI datasets

The Atlantic says 21 million recordings from four datasets helped train AI. SZA says Black artists are being exploited.

By Maha Al-Juhani, Entertainment Correspondent, The Executives Brief

about 10 hours ago · 3 min read


Executive summary

SZA warned that AI is exploiting Black artists after The Atlantic reported that training datasets include recordings from major artists. For decision-makers, the issue is not just ethics but how data sourcing can trigger reputational and regulatory blowback.

Last week, The Atlantic published a report that landed like a brick in the inbox of anyone who cares about music, culture, or the future of AI. It identified more than 21 million recordings across four datasets being used to train AI. And once people started searching those datasets, the pattern stopped being abstract and started feeling personal.

That is the backdrop for SZA speaking out. After searching her name, SZA warned that AI is exploiting Black artists, and she also called out Diplo. In other words, this is not only about what a model learned in a lab. It is about who is showing up in the training data without meaningful consent, and how that shows up when real artists recognize themselves inside systems they do not control.

Why this matters beyond the music bubble: training data is the raw material of modern AI, and raw material is where incentives live. If datasets are assembled at scale from existing catalogs, the industry quickly runs into a high-stakes question. Do artists get compensated, asked, or protected, or do they just get harvested? The scope The Atlantic reported, more than 21 million recordings across four datasets, suggests that harvesting is plausible at scale. It also hints at why this conversation is accelerating right now: the more recordings are included, the harder it becomes to pretend the process is “neutral” or accidental.

And the report does not stop at a handful of names. It points to major artists spanning genres and eras, from Bad Bunny to Nirvana. That matters because it frames the controversy as an industry-wide training-data problem, not a niche complaint. It also helps explain why the issue is sticking in public consciousness: when datasets appear to include widely recognized artists, audiences stop seeing this as a technical footnote and start treating it as a debate about cultural theft.

Stereogum’s piece adds another concrete layer. The writer did a quick search for personal favorites like Squirrel Flower and the Cramps, and found that their music was included. That kind of “oh wow, my artists are in there too” discovery is exactly how these stories spread. It converts the argument from policy language into something immediate: if your favorite artists show up, you are not discussing AI in the abstract anymore. You are discussing consent, ownership, and leverage.

This is also where regulators and legal teams start paying attention, even when the conversation begins online. When training datasets are publicly described with large totals and multiple sources, it invites scrutiny over licensing, data provenance, and whether artists had a realistic path to opt out or get compensated. Even without assuming new rules will land tomorrow, the signal for decision-makers is that “we trained on what existed” is no longer a safe posture. Reputational risk and compliance risk can rise together, especially when artists go public and name specific people.

For boards and executives, SZA’s callout of Diplo underscores how quickly these debates turn into relationship and governance problems. In the AI era, brands are exposed not only through product outcomes but through the supply chain of data. If prominent creators interpret inclusion as exploitation, it can trigger backlash that hits funding sentiment, partnerships, and customer trust. The second-order effect is that governance frameworks for data sourcing become a competitive advantage, not a legal afterthought.

The strategic stake is simple: the training-data choices companies make now will shape how the public, artists, and policymakers judge them later. If more artists search datasets and publicly react, controversy can become a recurring cost center. If companies proactively demonstrate transparent sourcing and fair compensation pathways, they reduce the odds that their story becomes the next viral “this was in the dataset” moment. For peers building or licensing AI systems, the message is clear, even if the technical details are dense: data is strategy, and culture is not waiting for a press release to decide what it thinks.


Tagged: ai, music, sza, diplo, dataset, copyright, stereogum, the-atlantic
