RAG to riches – how publishers adapt to AIs scraping their content
Generative AI is killing advertising as a publisher business model. Programmatic micro-royalties are the inevitable solution.
“Worried excitement” would have been an honest subheading for Hearst Ventures’ AI and media event at NYC’s TechWeek. On the panel sat representatives of Microsoft, journalists, AI startups, and media conglomerates. To distill an hour of deep discussion to a single takeaway:
Generative search is changing how we consume information in ways that undermine the publisher advertising business model.
Generative search’s threat to publishers
When we Google something nontrivial, we follow a pattern:
Click several result links
Read and synthesize the contents of the linked pages
Adjust query and repeat as needed
The ads shown while we read those pages pay for much of the internet.
Generative search replaces that pattern with a new one:
AI retrieves and scans text from linked pages
AI outputs a summary of its findings, with footnotes
AI suggests followup and disambiguation queries to the user as needed
Retrieval-augmented generation (RAG) lets AIs retrieve and analyze new information beyond their training data – automating the “read, compare, synthesize” steps in the old search flow. But when humans don’t visit publishers, publishers don’t get to show ads.
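To make the retrieve-then-summarize flow concrete, here is a toy sketch. It is not how any real search product works: the `Page` type, the `retrieve` and `answer` functions, and the word-overlap scoring are all hypothetical stand-ins, and a real system would hand the retrieved text to a language model rather than stitching it together.

```python
from dataclasses import dataclass


@dataclass
class Page:
    publisher: str  # who wrote the content
    text: str       # what the AI scrapes


def retrieve(query: str, pages: list[Page], k: int = 2) -> list[Page]:
    """Rank pages by shared words with the query (toy stand-in for real retrieval)."""
    words = set(query.lower().split())
    scored = [(len(words & set(p.text.lower().split())), p) for p in pages]
    scored = [(score, p) for score, p in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]


def answer(query: str, pages: list[Page]) -> tuple[str, list[str]]:
    """Return a stitched 'summary' plus footnotes naming each retrieved publisher."""
    hits = retrieve(query, pages)
    # Assumption: a real system would call a language model here.
    body = " ".join(f"{p.text} [{i}]" for i, p in enumerate(hits, 1))
    footnotes = [f"[{i}] {p.publisher}" for i, p in enumerate(hits, 1)]
    return body, footnotes
```

The footnotes are the only place the publishers appear, and the user never has to load their pages to get the answer.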
Currently, 64% of Google searches (77% on mobile) are answered on the search engine result page (SERP). This percentage has been creeping upward for a decade as Google incrementally adds AI to its SERP.
Evolution of search
Legacy search (DuckDuckGo) gives users a list of links:
Modern search (Google) includes AI summaries:
Generative search (Perplexity) yields comprehensive answers:
In each instance, the underlying content was written by Investopedia, Vanguard, and Schwab, but as the SERP evolves, fewer human users need to visit them (who reads the footnotes on Wikipedia?). This cannibalizes search ads to an extent, but for the underlying publishers the threat is existential. Any decrease in search traffic decreases publisher ad revenue.
The threat is so severe that Reddit is considering blocking Google from crawling and displaying its content. That's bad news for anyone who already appends "reddit" to their Google searches.
Programmatic RAG royalties
Blocking search traffic is a desperate move reflecting a breakdown in the internet’s business model. Search engines and AI builders know that without economically viable publishers, there won't be any content left to scrape. The internet needs a new business model.
This July, OpenAI paid the Associated Press to license access to its news stories. This is a good blueprint for a business model, but manually negotiating licenses between every AI company and every publisher is an O(n^2) complexity problem – analogous to direct digital ad deals in the late 90s before programmatic advertising.
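A quick back-of-the-envelope calculation shows why pairwise licensing doesn't scale. The counts below are made-up illustrations, not market data, and the function names are hypothetical.

```python
def direct_deals(ai_companies: int, publishers: int) -> int:
    """Every AI company negotiates separately with every publisher."""
    return ai_companies * publishers


def network_integrations(ai_companies: int, publishers: int) -> int:
    """Everyone integrates once with a shared programmatic network."""
    return ai_companies + publishers


# Hypothetical market: 20 AI companies and 10,000 publishers.
print(direct_deals(20, 10_000))          # 200000 separate negotiations
print(network_integrations(20, 10_000))  # 10020 one-time integrations
```

The gap widens with every new participant: one more publisher means 20 new negotiations under direct licensing, but only one new integration with a network.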
Direct licensing may work for the very biggest AIs and publishers, but the rest of the internet needs a programmatic solution that pays publishers by usage – just like how ads get paid by the number of impressions.
AI surfaces publishers through RAG
AI earns 1st party ad revenue through its own on-site ads or paid users
AI shares revenue with RAG'd publishers
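The three steps above can be sketched as a simple pro-rata split. This is one possible accounting rule under stated assumptions, not a description of any existing network: the `split_royalties` name, the 50% `publisher_share`, and paying strictly by retrieval count are all hypothetical choices.

```python
from collections import Counter


def split_royalties(
    retrievals: list[str],
    revenue_cents: int,
    publisher_share: float = 0.5,  # assumption: half of revenue goes to publishers
) -> dict[str, int]:
    """Divide the publisher pool in proportion to how often each publisher was retrieved."""
    counts = Counter(retrievals)
    total = sum(counts.values())
    if total == 0:
        return {}
    pool = int(revenue_cents * publisher_share)
    # Floor division keeps payouts in whole cents; any remainder stays with the AI.
    return {publisher: pool * n // total for publisher, n in counts.items()}


# One log entry per time a publisher's page was retrieved for an answer.
log = ["Investopedia", "Investopedia", "Vanguard", "Schwab"]
print(split_royalties(log, revenue_cents=1000))
# {'Investopedia': 250, 'Vanguard': 125, 'Schwab': 125}
```

Counting retrievals is the royalty analogue of counting ad impressions: simple to compute, but only as trustworthy as whoever keeps the log.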
Just as ads democratized digital publishing, RAG royalties let anyone get paid when AI uses their content.
Who counts the RAGs?
To quote playwright Tom Stoppard, "It's not the voting that's democracy; it's the counting."
Google is the obvious candidate to run a RAG network, likely as an outgrowth of AdSense and Google Analytics.
But as with ads, there's the issue of “grading your own homework”. If Google’s generative search is the biggest consumer of RAG content, can publishers trust Google to impartially measure and distribute revenue? At a minimum, independent measurement companies are needed, similar to the adtech companies that measure ad viewability and invalid traffic.
Maybe a fully decentralized, programmatic marketplace is the best platform. Somewhere, a web3 fund manager just sat up straighter.
The next five years
A programmatic RAG micro-royalty network won't be built in a day. Right now, publishers' best solution is creating interactive AI content that can't simply be scraped and RAG’d by third parties. Buzzfeed is experimenting with interactive formats; Direqt.ai has found traction providing publishers RAG chatbots for their own content. Incorporating genAI into traditional publishers increases user engagement and enables higher-performing ads that adapt to user interactions with AI.
Publishers will game RAG retrieval just like they game SEO. The flood of AI-generated made-for-advertising sites makes this harder.
Consequently, not all RAG'd content is equal. If five publishers are retrieved, but one provided the most valuable info, that's hard to measure. How do we reward the best content?
Premium publishers will demand higher royalties. Do real-time auctions decide what content gets RAG’d for which queries?
What do we call this method of tagging RAG’d publishers for payment? “RAG network” sounds like “ad network”. I also like "RAGtags".