SZA calls out Diplo after The Atlantic finds 21M recordings in AI datasets
The Atlantic says 21 million recordings from four datasets helped train AI. SZA says Black artists are being exploited.

SZA warned that AI is exploiting Black artists after The Atlantic reported that training datasets include recordings from major artists. For decision-makers, the issue is not just ethics but how data sourcing can trigger reputational and regulatory blowback.
Last week, The Atlantic published a report that landed like a brick in the inbox of anyone who cares about music, culture, or the future of AI. The report identified over 21 million recordings across four datasets being used to train AI. And once people started searching those datasets, the pattern stopped being abstract and started feeling personal.
That is the backdrop for SZA speaking out. After searching her name, SZA warned that AI is exploiting Black artists, and she also called out Diplo. In other words, this is not only about what a model learned in a lab. It is about who appears in the training data without meaningful consent, and what happens when real artists recognize themselves inside systems they do not control.
Why this matters beyond the music bubble: training data is the raw material of modern AI, and raw material is where incentives live. If datasets are assembled at scale from existing catalogs, the industry quickly runs into a high-stakes question. Do artists get compensated, asked, or protected, or do they just get harvested? The Atlantic's reported scope, over 21 million recordings across four datasets, suggests that harvesting is plausible at scale. It also hints at why this conversation is accelerating right now: the more recordings are included, the harder it becomes to pretend the process is “neutral” or accidental.
And the report's examples are not confined to one genre or era. It points to major artists ranging from Bad Bunny to Nirvana. That matters because it frames the controversy as an industry-wide training-data problem, not a niche complaint. It also helps explain why the issue is sticking in public consciousness: when datasets appear to include widely recognized artists, audiences stop seeing this as a technical footnote and start treating it as a debate about cultural theft.
Stereogum’s piece adds another concrete layer. The writer did a quick search for personal favorites like Squirrel Flower and the Cramps, and found that their music was included. That kind of “oh wow, my artists are in there too” discovery is exactly how these stories spread. It converts the argument from policy language into something immediate: if your favorite artists show up, you are not discussing AI in the abstract anymore. You are discussing consent, ownership, and leverage.
This is also where regulators and legal teams start paying attention, even when the conversation begins online. When training datasets are publicly described with large totals and multiple sources, it invites scrutiny over licensing, data provenance, and whether artists had a realistic path to opt out or get compensated. Even without assuming new rules will land tomorrow, the signal for decision-makers is that “we trained on what existed” is no longer a safe posture. Reputational risk and compliance risk can rise together, especially when artists go public and name specific people.
For boards and executives, SZA’s callout of Diplo underscores how quickly these debates turn into relationship and governance problems. In the AI era, brands are exposed not only through product outcomes but through the supply chain of data. If prominent creators interpret inclusion as exploitation, it can trigger backlash that hits funding sentiment, partnerships, and customer trust. The second-order effect is that governance frameworks for data sourcing become a competitive advantage, not a legal afterthought.
The strategic stake is simple: the training-data choices companies make now will shape how the public, artists, and policymakers judge them later. If more artists search datasets and publicly react, controversy can become a recurring cost center. If companies proactively demonstrate transparent sourcing and fair compensation pathways, they reduce the odds that their story becomes the next viral “this was in the dataset” moment. For peers building or licensing AI systems, the message is clear, even if the technical details are dense: data is strategy, and culture is not waiting for a press release to decide what it thinks.