Subtitles of thousands of YouTube videos used without permission to train AI

Major Silicon Valley companies have used YouTube videos to train their AI tools without the creators' knowledge, in violation of the platform's terms of service. More precisely, they used the videos' subtitles, according to an investigation published on Tuesday, July 16, by Burhan News, an American non-profit media organization funded by several foundations.

The companies named include three highly profitable giants: electronics maker Apple, customer-relationship software specialist Salesforce, and graphics card pioneer Nvidia, whose chips are widely used to train artificial intelligence. There is also a major startup: Anthropic, the company behind the conversational AI Claude, which received $4 billion in funding from Amazon in 2024.

Burhan News examined the research papers published by these companies. They clearly state that their researchers trained their AI on a dataset called YouTube Subtitles, which contains the subtitles of 173,536 videos. These videos come from more than 48,000 different YouTube channels, and some channels are heavily represented: YouTuber PewDiePie's channel, for example, accounts for several hundred of them.

Media and YouTubers

Journalists from Burhan News were able to download this collection and built a search engine that lets anyone browse it and identify the original channels. The sources are mainly English-language: educational channels such as those of MIT, Harvard and Khan Academy, media outlets such as the Wall Street Journal, TV channels like CBS and the BBC, and YouTube stars like MrBeast. According to Burhan News, the collection also includes videos from conspiracy channels claiming that the Earth is flat. A few videos are in French: the rare French media represented include Le Monde and AFP, and French YouTubers include Squeezie, Norman and Cyprien.

According to Burhan News, these subtitles were compiled by EleutherAI, a non-profit research group, which did not respond to the American outlet's questions. On its website, EleutherAI says it works to make advanced AI technologies accessible to smaller players, to keep the sector from being "dominated by a handful of large companies".

Anthropic and Salesforce confirmed to Burhan News that they used a dataset called The Pile, which includes these YouTube subtitles. Apple and Nvidia did not respond to reporters' questions. As for Google, which owns YouTube, a spokesperson simply said the company had taken action "to prevent" this type of practice, without addressing the specific case raised by Burhan News. In April 2024, an investigation by The New York Times showed that Google and OpenAI had also used transcripts of YouTube videos to train their AI.

Le Monde

Stan Shaw
