NEWS ORGS BLOCK WEB ARCHIVE FROM AI TRAINING

AI DESK■ 2 MIN READ

THU, APR 30, 2026

■ AI-SUMMARIZED FROM 1 SOURCE ▸ TIMELINE

Major news outlets including CNN, NBC, and USA Today are taking action to prevent their content from being stored in web archives used by AI companies to train chatbots.

News organizations are escalating efforts to restrict access to their published content in web archives that artificial intelligence companies leverage for training data. CNN, NBC, and USA Today are among the major outlets joining a coordinated push to limit their content's availability in these archives. The move reflects growing tensions between media companies and AI developers over data usage and intellectual property rights. Web archives like the Internet Archive's Wayback Machine have long preserved digital content for historical and research purposes. However, AI companies have increasingly mined these repositories to train large language models and chatbots, raising questions about copyright and fair compensation. News organizations argue that their journalism should not be used to train AI systems without permission or payment. The content represents significant editorial investment and journalistic work that generates value for AI applications. The effort comes as several media outlets have already taken individual action. Some have modified their website code to prevent archiving, while others have sent legal notices demanding removal of their content from public archives. AI companies maintain that training on publicly available internet content falls within fair use protections. They argue that text used for machine learning differs fundamentally from direct republication and serves broader technological advancement. The conflict highlights a fundamental disagreement over digital content ownership in the AI era. News organizations contend they should control how their work is used commercially, while AI developers argue existing data is essential for developing competitive AI systems. Regulatory bodies and lawmakers are beginning to examine these disputes. The outcome could shape how AI companies source training data and potentially establish new licensing frameworks for digital content. The standoff remains unresolved, with neither side showing signs of backing down. The situation underscores broader debates about AI development, copyright law, and the rights of content creators in an increasingly automated digital landscape.

■ SOURCES

► Bloomberg Tech

■ SUMMARY WRITTEN BY AI FROM THE LINKS ABOVE

■ MORE FROM THE AI DESK

P537CLAUDE'S VALUES SHIFT WITH LANGUAGE, ANTHROPIC STUDY FINDS

Anthropic research reveals that Claude expresses different values depending on which language it uses, with Hindi conversations showing more warmth while Russian interactions demonstrate greater rigor.

JUST NOW— AI Desk

P539GOOGLE ROLLS OUT GEMINI IN CHROME TO UK USERS

Google has launched Gemini integration in Chrome for UK users. The AI assistant is now accessible directly within the browser.

JUST NOW— AI Desk

P535AI WORLD MODELS ATTRACT BILLIONS AS TECH LEADERS PUSH FORWARD

Startups led by prominent AI researchers like Yann LeCun are raising significant funding to develop world models—AI systems that learn to understand and simulate physical environments. The technology remains in early stages with key technical questions still unresolved.

JUST NOW— AI Desk

P532DEEPMIND CEO CALLS FOR AI REGULATORY BODY

Google Deepmind CEO Demis Hassabis has proposed creating a new US standards body to evaluate and oversee advanced AI development, citing uncertainty about the technology's trajectory.

JUST NOW— AI Desk

◄ BACK TO NEWS

NEWS ORGS BLOCK WEB ARCHIVE FROM AI TRAINING

■ MORE FROM THE AI DESK

■ SUBSCRIBE TO THE DAILY BRIEF