New Zealand’s Stuff media group has joined other leading news organisations around the world in restricting OpenAI from using its content to power the artificial intelligence tool ChatGPT.
A growing number of media companies globally have taken action to block OpenAI bots from crawling and scraping content from their news sites.
OpenAI is behind one of the most well-known and fastest-growing artificial intelligence chatbots, ChatGPT, released in late 2022.
“The scraping of any content from Stuff or its news masthead sites for commercial gain has always been against our policy,” said Stuff CEO Laura Maxwell. “But it is important in this new era of Generative AI that we take further steps to protect our intellectual property.”
Generative Artificial Intelligence (Gen AI) is the name given to technologies that use vast amounts of information scraped from the internet to train large language models (LLMs).
This enables them to generate seemingly original answers — in text, visuals or other media — to queries based on mathematically predicting the most likely right answer to a prompt or dialogue.
Some of the most well-known Gen AI tools include OpenAI’s ChatGPT and DALL-E, and Google’s Bard.
Surge of unease
There has been a surge of unease among news organisations, artists, writers and other creators of original content that their work has already been harvested without permission, knowledge or compensation by OpenAI or other tech companies seeking to build new commercial products through Gen AI technology.
“High quality, accurate and credible journalism is of great value to these businesses, yet the business model of journalism has been significantly weakened as a result of their growth off the back of that work,” said Maxwell.
“The news industry must learn from the mistakes of the past, namely what happened in the era of search engines and social media, where global tech giants were able to build businesses of previously unimaginable scale and influence off the back of the original work of others.
“We recognise the value of our work to OpenAI and others, and also the huge risk that these new tools pose to our existence if we do not protect our IP now.”
There is also increasing concern these tools will exacerbate the spread of disinformation and misinformation globally.
“Content produced by journalists here and around the world is the cornerstone of what makes these Gen AI tools valuable to the user,” Maxwell said.
“Without it, the models would be left to train on a sea of dross, misinformation and unverified information on the internet — and increasingly that will become the information that has itself been already generated by AI.
Risk of ‘eating itself’
“There is a risk the whole thing will end up eating itself.”
Stuff and other news companies have been able to block OpenAI’s access to their content because its web crawler, GPTBot, is identifiable.
But not all crawlers are clearly labelled.
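One common way to turn away an identifiable crawler is a rule in a site's robots.txt file. The article does not say which mechanism Stuff uses, so the sketch below is only an illustration: the robots.txt content and the news.example address are hypothetical, and "SomeOtherBot" is a made-up crawler name. It uses Python's standard urllib.robotparser to show how a crawler that honours robots.txt would read such a rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, not copied from any real site.
# It turns away the crawler that announces itself as "GPTBot"
# and leaves every other user agent unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "https://news.example/story"  # placeholder URL
    print("GPTBot allowed:", can_fetch("GPTBot", story))
    print("SomeOtherBot allowed:", can_fetch("SomeOtherBot", story))
```

A rule like this only works against crawlers that identify themselves and choose to comply. A crawler that uses a different or generic name falls under the catch-all rule, which is why unlabelled crawlers are harder to block.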
Stuff has also updated its site terms and conditions to expressly bar the use of its content to train AI models owned by any other company, as well as any other unauthorised use of its content for commercial purposes.
Earlier this year The Washington Post published a tool showing that all major New Zealand news websites were already being used by OpenAI.
OpenAI has entered into negotiations with some news organisations in the United States, notably Associated Press, to license their content to train ChatGPT.
So far these agreements have not been widespread, although a number of news companies globally are seeking licensing arrangements.
Maxwell said Stuff was looking forward to holding conversations around licensing its content in due course.