News Outlets vs. OpenAI: The AI Data Controversy
Posted on 31-08-2023

Clash Between News Media and OpenAI's ChatGPT Raises Questions on AI Data Sources

Introduction to Artificial Intelligence:

Artificial intelligence (AI) refers to the ability of computers or robots to perform tasks that normally require human intelligence, such as reasoning, comprehension, learning, and problem-solving. It involves developing systems that replicate human-like intellectual processes, including pattern recognition, inference, and learning from past experience, typically by training algorithms on extensive datasets. Once trained, AI algorithms identify patterns, predict outcomes, and offer recommendations in a way that resembles human cognition, often at much greater speed, although their accuracy depends on the quality of the data they were trained on.

Exploring OpenAI:

OpenAI, a prominent AI research organization, is best known for 'ChatGPT,' an advanced conversational AI. ChatGPT engages users in wide-ranging conversations and generates answers, narratives, and even programming code in response to their queries. It is built on 'Large Language Models' (LLMs), which require very large amounts of text for effective training. Companies that develop LLMs, such as Google, Meta, and OpenAI, often collect this data through automated software 'crawlers' that extract information from web pages.

Controversy over ChatGPT’s Data Acquisition:

The recent issue arose when news media outlets, including The New York Times, objected to OpenAI's use of their content for training ChatGPT. OpenAI's web crawler, 'GPTBot,' gathers publicly available data from websites, including those of news organizations. Unlike indexing by search engines such as Google, OpenAI's data extraction offers no reciprocal benefit to news companies, prompting their resistance.
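
As an illustration of how a website can signal that it does not want a particular crawler to collect its pages, the sketch below uses a robots.txt file, the standard opt-out mechanism that OpenAI says its GPTBot crawler honours. The robots.txt content, the domain news.example.com, the crawler name ExampleSearchBot, and the helper is_allowed are all hypothetical and chosen only for illustration; they are not taken from any real outlet's configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary news site: it blocks the
# "GPTBot" user agent everywhere and allows all other crawlers.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical article URL on the imaginary site.
ARTICLE_URL = "https://news.example.com/2023/08/some-article"

gptbot_allowed = is_allowed(ROBOTS_TXT, "GPTBot", ARTICLE_URL)
other_allowed = is_allowed(ROBOTS_TXT, "ExampleSearchBot", ARTICLE_URL)

print("GPTBot allowed:", gptbot_allowed)
print("ExampleSearchBot allowed:", other_allowed)
```

A robots.txt rule is a request rather than a technical barrier: it only has an effect when the crawler chooses to respect it, which is one reason the dispute also involves licensing and copyright.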

Reasoning Behind News Outlets’ Stance:

Search engines reproduce short portions of news articles in their results and link back to the source, which gives news websites exposure and traffic. In contrast, OpenAI's data collection offers news outlets no comparable benefit: their content is repurposed for model training without anything given in return. The conflict highlights the gap between mutually beneficial relationships and one-sided data collection by AI companies like OpenAI.

Seeking Resolution:

OpenAI attempted to address these concerns by entering a licensing agreement with The Associated Press, which gives it access to archival content for training purposes. However, legal disputes similar to earlier copyright infringement cases remain possible, and they would test the boundary between the use of data for AI training and intellectual property rights. Any such legal battles would have substantial implications for journalism, intellectual property, and the trajectory of artificial intelligence.


The clash between news media and OpenAI underscores the intricate interplay between AI data acquisition, journalism, and intellectual property rights. As technology evolves, the resolution of these conflicts will shape the landscape of AI development, its ethical implications, and its relationship with content creators.

