RAG to riches – how publishers adapt to AIs scraping their content
Generative AI is killing advertising as a publisher business model. Programmatic micro-royalties are the inevitable solution.
“Worried excitement” would have been an honest subheading for Hearst Ventures’ AI and media event at NYC’s TechWeek. On the panel sat representatives of Microsoft, journalists, AI startups, and media conglomerates. To distill an hour of deep discussion to a single takeaway:
Generative search is changing how we consume information in ways that undermine the publisher advertising business model.
Generative search’s threat to publishers
When we Google something nontrivial, we follow a pattern:
Input query
Click several result links
Read and synthesize the contents of the linked pages
Adjust query and repeat as needed
The ads shown during Step 3 pay for much of the internet.
The new wave of "generative search" engines – exemplified by Perplexity.ai, You.com, and Bing Chat – changes the pattern:
Input query
AI retrieves and scans text from linked pages
AI outputs a summary of its findings, with footnotes
AI suggests followup and disambiguation queries to the user as needed
Retrieval-augmented generation (RAG) lets AIs retrieve and analyze new information beyond their training data – automating the “read, compare, synthesize” steps in the old search flow. But when humans don’t visit publishers, publishers don’t get to show ads.
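To make the new flow concrete, here is a toy Python sketch of the retrieve, summarize, and footnote steps. Everything in it is an illustrative assumption: the `Page` type, the `retrieve` and `answer` helpers, the `.example` publisher names and sample text, and the word-overlap scoring that stands in for a real search index and language model. The point is only that the engine knows exactly which publishers it drew on, even when no human ever clicks through.

```python
import re
from dataclasses import dataclass


@dataclass
class Page:
    publisher: str  # hypothetical publisher identifier
    text: str       # made-up page content


def words(s):
    """Lowercase a string and split it into a set of words."""
    return set(re.findall(r"\w+", s.lower()))


def retrieve(query, pages, k=2):
    """Keep the k pages sharing the most words with the query (toy ranking)."""
    terms = words(query)
    scored = [(len(terms & words(p.text)), p) for p in pages]
    scored = [(score, p) for score, p in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]


def answer(query, pages):
    """Stand-in for the LLM step: quote each retrieved page with a footnote."""
    hits = retrieve(query, pages)
    body = " ".join(f"{p.text} [{i}]" for i, p in enumerate(hits, 1))
    footnotes = [f"[{i}] {p.publisher}" for i, p in enumerate(hits, 1)]
    return body, footnotes


pages = [
    Page("publisher-a.example", "An index fund tracks a market index"),
    Page("publisher-b.example", "Index funds usually charge low fees"),
    Page("publisher-c.example", "How to bake sourdough bread"),
]
body, footnotes = answer("what is an index fund", pages)
print(body)
print(footnotes)  # ['[1] publisher-a.example', '[2] publisher-b.example']
```

The user gets an answer and two footnotes; neither publisher gets a page view.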
Currently, 64% of Google searches (77% on mobile) are answered on the search engine result page (SERP). This percentage has been creeping upward for a decade as Google incrementally adds AI to its SERP.
Evolution of search
Legacy search (DuckDuckGo) gives users a list of links:
Modern search (Google) includes AI summaries:
Generative search (Perplexity) yields comprehensive answers:
In each instance, the underlying content was written by Investopedia, Vanguard, Schwab; but as the SERP evolves, fewer human users need to visit them (who reads the footnotes on Wikipedia?). This cannibalizes search ads to an extent, but it is the underlying publishers who are existentially threatened. Any decrease in search traffic decreases publisher ad revenue.
The threat is so severe that Reddit is considering blocking Google from crawling and displaying its content. That's bad news for anyone who already appends "reddit" to their Google searches.
Programmatic RAG royalties
Blocking search traffic is a desperate move reflecting a breakdown in the internet's business model. Search engines and AI builders know that without economically viable publishers, there won't be any content left to scrape. The internet needs a new business model.
This July, OpenAI paid the Associated Press to license access to its news stories. This is a good blueprint for a business model, but manually negotiating licenses between every AI company and every publisher is an O(n^2) complexity problem – analogous to direct digital ad deals in the late 90s before programmatic advertising.
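A back-of-the-envelope sketch of that scaling argument, in Python. The function names and the example counts (50 AI companies, 10,000 publishers) are made up for illustration; the only claim is the arithmetic: bilateral deals grow with the product of the two sides, while a shared network needs roughly one relationship per participant.

```python
def direct_deals(ai_companies, publishers):
    """Every AI company negotiates separately with every publisher."""
    return ai_companies * publishers


def network_deals(ai_companies, publishers):
    """Everyone signs up once with a shared (hypothetical) royalty network."""
    return ai_companies + publishers


# Illustrative numbers only, not estimates of the real market.
print(direct_deals(50, 10_000))   # 500000
print(network_deals(50, 10_000))  # 10050
```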
Direct licensing may work for the very biggest AIs and publishers, but the rest of the internet needs a programmatic solution that pays publishers by usage – just like how ads get paid by the number of impressions.
Here's how:
AI surfaces publishers through RAG
AI earns first-party revenue through its own on-site ads or paying users
AI shares revenue with RAG'd publishers
Just as ads democratized digital publishing, RAG royalties let anyone get paid when AI uses their content.
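A minimal sketch of the three steps above, assuming the simplest possible rule: publishers split a fixed share of the AI's revenue in proportion to how often they were retrieved. The `split_royalties` function, the 50% publisher share, and the publisher names are hypothetical, not a description of how any existing network works.

```python
from collections import Counter


def split_royalties(retrieval_log, revenue, publisher_share=0.5):
    """Split a share of revenue pro rata by retrieval count.

    retrieval_log: one publisher id per retrieval event.
    publisher_share: assumed fraction of revenue paid out (hypothetical).
    """
    counts = Counter(retrieval_log)
    total = sum(counts.values())
    if total == 0:
        return {}  # nothing was retrieved, so nothing is owed
    pool = revenue * publisher_share
    return {pub: pool * n / total for pub, n in counts.items()}


# Step 1: the AI logs which publishers it surfaced through RAG.
log = ["publisher-a.example", "publisher-a.example",
       "publisher-b.example", "publisher-c.example"]

# Steps 2 and 3: it earns 100.00 and shares half with those publishers.
payouts = split_royalties(log, revenue=100.0)
print(payouts)
```

Counting every retrieval equally is the crudest option; it says nothing about which source actually contributed the most useful information.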
Who counts the RAGs?
To quote playwright Tom Stoppard, "It's not the voting that's democracy; it's the counting."
Google is the obvious candidate to run a RAG network, likely as an outgrowth of AdSense and Google Analytics.
But as with ads, there's the issue of “grading your own homework”. If Google’s generative search is the biggest consumer of RAG content, can publishers trust Google to impartially measure and distribute revenue? At a minimum, independent measurement companies are needed, similar to the adtech companies that measure ad viewability and invalid traffic.
Maybe a fully decentralized, programmatic marketplace is the best platform. Somewhere, a web3 fund manager just sat up straighter.
The next five years
A programmatic RAG micro-royalty network won't be built in a day. Right now, publishers' best solution is creating interactive AI content that can't simply be scraped and RAG’d by third parties. Buzzfeed is experimenting with interactive formats; Direqt.ai has found traction providing publishers RAG chatbots for their own content. Incorporating genAI into traditional publishers increases user engagement and enables higher-performing ads that adapt to user interactions with AI.
Problems remain:
Publishers will game RAG retrieval just like they game SEO. The flood of AI-generated made-for-advertising sites makes this harder.
Consequently, not all RAG'd content is equal. If five publishers are retrieved, but one provided the most valuable info, that's hard to measure. How do we reward the best content?
Premium publishers will demand higher royalties. Do real-time auctions decide what content gets RAG’d for which queries?
What do we call this method of tagging RAG’d publishers for payment? “RAG network” sounds like “ad network”. I also like "RAGtags".
Indeed, imagine a future where Substack has an agreement in place with Microsoft, and Copilot gives me a daily summary of what my friends are up to (like Facebook in the good old days) as part of my Copilot subscription. Substack agrees to license its content for a fee and distributes some of that fee to the authors. Every time I get a daily update about what Steven has published, Steven earns a fee indirectly from me, encouraging him to publish more.
I wonder if there's some base model in how MSN runs its news service (I think Apple does something similar). Users of the platforms get to read articles from publications like the Atlantic for free and at the back-end Microsoft and Apple reimburse those content providers for their articles. Users on MSN are invited to read more from that publication and are taken to the website to do so.
As Microsoft is looking to make Windows a subscription product, it could just seek to sign revenue-sharing partnerships with the content providers that help strengthen its AI modeling. Indeed, if they engage in "exclusive" content agreements it could even end up being super lucrative - imagine if the NYSE signed an agreement to provide equity data to Microsoft's generative AI platform ten minutes before Apple's.
Before, search engines were just providing information; it was hard to charge for that, and anyway people weren't used to paying for stuff. Now, perhaps, things have changed and more people are willing to pay, perhaps for context and an understanding of intent alongside information.
Instead of it being the end, it could be a cool new beginning.