

How AI models are grabbing the world's data
Inside AI's data frenzy: The controversial practices fueling artificial intelligence | GZERO AI


In this episode of GZERO AI, Taylor Owen, host of the Machines Like Us podcast, examines the scale and implications of the historic data land grab happening in the AI sector. According to researcher Kate Crawford, AI is the largest superstructure ever built by humans, requiring immense human labor, natural resources, and staggering amounts of data. But how are tech giants like Meta and Google amassing this data?

So AI researcher Kate Crawford recently told me that she thinks AI is the largest superstructure that our species has ever built. This is because of the enormous amount of human labor that goes into building AI; the physical infrastructure needed for the compute behind these AI systems; the natural resources, energy, and water that go into this entire infrastructure; and, of course, the insane amounts of data needed to build our frontier models. It's increasingly clear that we're in the middle of a historic land grab for this data, essentially for all of the data that has ever been created by humanity. So where is all this data coming from, and how are these companies getting access to it? Well, first, they're clearly scraping the public internet. It's safe to say that if anything you've done has been posted to the internet publicly, it's inside the training data of at least one of these models.


The Reddit logo is displayed on a smartphone, with Reddit visible in the background, in this photo illustration taken in Brussels, Belgium, on March 17, 2024.

Jonathan Raa / Sipa USA via Reuters

Reddit raising eyebrows

The social media website Reddit is set to go public on March 21 at a valuation of $6.4 billion. But new AI-related troubles are brewing for the company.

The US Federal Trade Commission launched an investigation into Reddit’s practice of licensing its user data to AI companies, according to a regulatory filing by the company. On March 14, Reddit was informed of the FTC probe “focused on our sale, licensing, or sharing of user-generated content with third parties to train AI models.” The company said it’s not surprised by the inquiry due to the “novel nature of these technologies and commercial arrangements.”

