Online media brands including Yahoo, Quora, and Medium are taking a new step to prevent AI companies from copying and using their content to train models without their permission.
Publishers, including CNET parent company Ziff Davis, see the new tool, called RSL, as another way to ensure that big AI developers don’t exploit their work without compensation, an issue that has already sparked a series of lawsuits.
RSL, which stands for Really Simple Licensing, is inspired by Really Simple Syndication (RSS), a long-standing web standard that delivers automatic content updates in a machine-readable format. Like RSS, RSL is open, decentralized, and can work with almost any piece of content online, including web pages, videos, and datasets.
Currently, when an AI company’s roving internet bot, known as a crawler, wants to suck up information on a website, it is expected to check robots.txt first, a file that acts as a basic gatekeeper by telling bots which parts of a site they may access. AI companies have found ways around robots.txt or ignored it altogether, and some have been sued as a result. RSL aims to be a more robust layer of technology for dealing with AI crawlers, which now account for more than half of all internet traffic. (Disclosure: CNET parent company Ziff Davis filed a lawsuit against OpenAI in April, alleging that it infringed Ziff Davis copyrights in training and operating its AI systems.)
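To make the robots.txt step concrete, here is a minimal sketch of how a well-behaved crawler might consult the file before fetching a page, using Python's standard `urllib.robotparser` module. The robots.txt contents, the crawler name `ExampleAIBot`, and the `example.com` URL are invented for illustration. Nothing in the mechanism forces a crawler to run this check, which is the weakness described above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "ExampleAIBot" is a made-up crawler name.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The AI crawler is told to stay out; any other agent falls under the "*" rule.
ai_allowed = may_fetch(ROBOTS_TXT, "ExampleAIBot", "https://example.com/article")
other_allowed = may_fetch(ROBOTS_TXT, "SomeBrowser", "https://example.com/article")
print(ai_allowed, other_allowed)
```

The check is purely advisory: the site publishes its rules, and compliance depends on the crawler choosing to read and honor them.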
“RSL builds directly on the RSS legacy, providing the missing licensing layer for an AI-first internet,” said Tim O’Reilly, CEO of O’Reilly Media, in a press release. “It ensures that creators and publishers who drive AI innovation are not only part of the conversation, but are also fairly compensated for the value they create.”
Brands that have signed up to RSL include Reddit, People, Internet Brands, Fastly, wikiHow, O’Reilly, The Daily Beast, The MIT Press, Miso, Adweek, Ranker, Evolve Media, and Raptive.
“If AI is trained on the work of our writers, then it should pay for that work,” said Medium CEO Tony Stubblebine in a press release. “Right now, AI is operating on stolen content. Adopting this RSL standard is how we force these AI companies to either pay for what they use, stop using it, or shut down.”
RSL arrives as web traffic patterns shift under the dominance of Google and AI. Publishers have criticized the AI-generated answers Google places at the top of its search results for taking away clicks they would otherwise have received. Google claims that AI Overviews send “higher quality clicks” to sites, from people who are more engaged and stay on those sites longer. AI chatbots like ChatGPT also help with research and synthesis, meaning people no longer have to jump between sites to gather pieces of information.
“Broad adoption of the RSL standard will protect the integrity of original work and accelerate a win-win system for publishers and AI service providers,” said Vivek Shah, CEO of Ziff Davis.
In response to these traffic shifts, publishers are suing AI companies or signing licensing deals. In other cases, sites are turning to services like TollBit, which aim to charge AI crawlers each time they request a site’s content. Content delivery networks like Cloudflare, which help ensure people have fast access to websites, are blocking AI crawlers outright.
RSL co-founder Eckart Walther said the RSL standard and efforts like Cloudflare’s are complementary, with many of the same media companies participating in both. Walther compared tools like Cloudflare’s to bouncers that keep unwanted crawlers out of a site, while RSL simply lets the crawler understand the rules and the price of admission. “These compensation methods can also work together. For example, a publisher might want to charge a fee to crawl their content, and then require a royalty payment every time the AI model uses that content to answer a question,” Walther told CNET in an email.
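Walther’s combined example comes down to simple arithmetic, sketched below. This is only an illustration of the idea of stacking a crawl fee and a per-use royalty. The `LicenseTerms` class, its field names, and all prices and volumes are hypothetical and are not drawn from the RSL standard itself.

```python
from dataclasses import dataclass


@dataclass
class LicenseTerms:
    """Hypothetical pricing terms; amounts are whole cents to avoid float rounding."""
    crawl_fee_cents: int         # charged each time a crawler fetches the content
    royalty_cents_per_use: int   # charged each time a model uses it in an answer


def amount_owed_cents(terms: LicenseTerms, crawls: int, uses: int) -> int:
    """Total owed when both compensation methods apply together."""
    return terms.crawl_fee_cents * crawls + terms.royalty_cents_per_use * uses


# Made-up numbers: 2 cents per crawl, 1 cent per answer that uses the content.
terms = LicenseTerms(crawl_fee_cents=2, royalty_cents_per_use=1)
total_cents = amount_owed_cents(terms, crawls=1000, uses=5000)
print(f"Owed: ${total_cents / 100:.2f}")
```

With these invented figures the royalty portion outweighs the crawl fee, which suggests why a publisher might want both charges rather than a one-time fee alone.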