Google AI Training Under Fire for Ignoring Publisher Opt-Outs

In a U.S. antitrust trial, Google confirmed that it continues to train its AI-driven search features on publicly available web content, even when publishers have explicitly opted out of AI training. The admission has raised significant concerns about data privacy, transparency, and the ethical responsibilities of tech companies.

Google’s Testimony in DOJ Antitrust Trial

In recent antitrust proceedings, a Google executive revealed that the company’s search-based AI tools, such as AI Overviews, may be trained on publicly available information from across the internet, even if publishers choose not to contribute to AI training.

According to Bloomberg, Eli Collins, Vice President of Google DeepMind, explained that the company’s opt-out controls apply solely to DeepMind’s AI models. These controls do not extend to AI tools used within Google’s larger search operations. As a result, content removed from DeepMind’s training could still be utilized by Google’s search-related AI products.
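To make the scope of such an opt-out concrete, here is a minimal Python sketch. It assumes the publisher control in question is a robots.txt rule addressed to the `Google-Extended` token, which the testimony as reported here does not name; the sample rules, the `example.com` URL, and the `is_allowed` helper are all illustrative. It uses only the standard library's `urllib.robotparser` to show that a rule written for one token says nothing about a different crawler token such as `Googlebot`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative publisher site.
# Assumption: the opt-out is expressed against the Google-Extended token,
# while the regular search crawler (Googlebot) remains allowed.
SAMPLE_ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = SAMPLE_ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    article = "https://example.com/news/article"  # illustrative URL
    for agent in ("Google-Extended", "Googlebot"):
        print(agent, is_allowed(agent, article))
```

Under these assumed rules, the check returns `False` for `Google-Extended` and `True` for `Googlebot`. The sketch only evaluates robots.txt rules; it does not show how Google applies them internally.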

The disclosure came during a high-profile antitrust trial in which the U.S. Department of Justice (DOJ) is challenging Google’s dominance in internet search. The DOJ is seeking structural remedies, including potentially separating Google’s Chrome browser from its search business and blocking default search engine partnerships.

In court, DOJ attorney Aguilar asked Collins directly whether search-integrated AI systems like Gemini could still be trained on content from publishers who had opted out. Collins confirmed that such data remains available for use in search applications.

Publishers’ Control Undermined

The use of publisher content for AI training, even after publishers have opted out, is causing major concern in the digital publishing industry. Because the opt-out does not reach Google’s search-related AI, the practice calls into question how much control publishers really have over their content.

Many media sites rely on search engine visibility to drive traffic and generate advertising revenue. However, AI-generated overviews, which answer queries directly on the search results page, may reduce click-through rates to publishers’ sites.

This adds to the financial pressure on media organizations, which are already struggling with shrinking revenue streams.

Legal and Ethical Implications

Bloomberg reports that Collins testified as a Google witness in proceedings over the company’s search market dominance, which U.S. District Judge Amit Mehta ruled an illegal monopoly. The DOJ is pushing Google to sell its Chrome browser, share search data, and stop paying to be the default search engine on apps and devices, a restriction that would also cover its AI products such as Gemini.

On cross-examination, DOJ attorney Aguilar asked Collins about the data Google uses in its AI models and showed that Google had removed 80 billion of 160 billion tokens after excluding content from publishers who opted out. Collins confirmed that 50 percent of the tokens were removed for this reason.

Google’s attorney argued that the company’s search dominance did not prevent other AI firms from competing, noting that chatbots could provide accurate sports scores by relying on commercial agreements rather than web indexes.

However, additional testimony revealed that Google had considered enhancing its AI models using data accumulated from its search engine over the years.

Conclusion

As Google integrates AI into its search systems, tensions between innovation and content ownership are growing. Google’s decision to train its search AI on publisher content despite opt-out options highlights a central issue in the fight over data usage and intellectual property.

The digital landscape is changing, and legislative adjustments are expected to follow. Publishers, AI developers, and lawmakers must work together to preserve content producers’ rights while promoting technological advancement.

About The Author

Misbah Pandith
