NLWeb (Natural Language Web) is an open-source protocol from Microsoft that turns any website into a conversational AI endpoint — and it fundamentally changes how we should write, structure, and optimise content. Announced at Microsoft Build in May 2025 and created by R.V. Guha (the architect behind RSS, RDF, and Schema.org), NLWeb lets both human users and AI agents query website content using natural language instead of navigating pages or typing keywords. For those of us building on WordPress and Shopify, this is the most significant shift in content strategy since mobile-responsive design. Schema.org markup — once a “nice to have” for rich snippets — is fast becoming the foundational infrastructure that determines whether your content is visible to AI systems at all.
NLWeb is still early-stage. The protocol specification isn’t yet finalised, the W3C Community Group launched only in October 2025, and analysts project 2–3 years before substantial mainstream adoption. But the ecosystem is moving fast: Yoast announced WordPress integration in November 2025, Shopify is an official early adopter, Cloudflare offers one-click deployment, and companies like Tripadvisor, O’Reilly Media, and Eventbrite are already testing it. The window to build expertise and gain a competitive edge is open now.
How NLWeb actually works under the hood
NLWeb combines two components: a protocol layer exposing a REST endpoint (/ask) and an MCP endpoint (/mcp), and an implementation layer (currently a Python reference implementation on GitHub) that orchestrates the AI processing. The analogy Microsoft uses is deliberate: “NLWeb is to MCP/A2A what HTML is to HTTP.” MCP (Model Context Protocol, developed by Anthropic) handles communication; NLWeb handles the content and interaction logic on top of it.
The data pipeline starts with content a site already publishes. NLWeb ingests Schema.org JSON-LD markup, RSS feeds, XML sitemaps, or JSONL files, processes them through an embedding model, and stores the resulting vectors in a supported database (Qdrant, Elasticsearch, Postgres, Snowflake, Milvus, Azure AI Search, or Cloudflare AutoRAG). An incoming query then moves through a sophisticated pipeline: parallel pre-processing checks for relevancy, decontextualisation of the query against conversation history, semantic vector retrieval, then ranking through multiple targeted LLM calls. A single user query can trigger 50+ individual LLM calls, each with a narrow, specific job — checking relevancy, scoring individual results, resolving conversational context. This architecture avoids hallucination by returning only results drawn from the actual database, never generating answers from thin air.
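The ingest-and-retrieve idea can be sketched in a few lines. This is an illustration of semantic vector retrieval, not NLWeb's actual code: the toy embed() function, the sample documents, and the query are all invented for the example, and a real deployment would use a neural embedding model and one of the vector databases listed above.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words vector. Real NLWeb uses a neural
    embedding model; this stand-in just makes the retrieval step runnable."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Ingest: content the site already publishes (e.g. pulled from JSON-LD
# or an RSS feed), vectorised once and stored in the index.
documents = {
    "waterproof-jacket": "Lightweight waterproof jacket with taped seams",
    "wool-jumper": "Warm merino wool jumper for winter",
}
index = {doc_id: embed(text) for doc_id, text in documents.items()}

# Retrieve: embed the natural-language query and rank stored documents
# by semantic similarity rather than keyword matching.
query_vec = embed("is the jacket waterproof")
ranked = sorted(index, key=lambda d: cosine(query_vec, index[d]), reverse=True)
print(ranked[0])  # → waterproof-jacket
```

Only indexed documents can ever be returned, which is the mechanism behind the no-hallucination claim: ranking reorders real records rather than generating text.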
The protocol supports three query modes: list (ranked matches), summarise (a synthesis alongside the list), and generate (traditional RAG-style answer generation). Responses use Schema.org vocabulary — structured, machine-readable data, not just a text blob. The system is stateless; conversation context passes through a prev parameter rather than server-side sessions.
What makes NLWeb architecturally significant is its dual-purpose design. The /ask endpoint serves human visitors with a conversational search experience on any website. The /mcp endpoint simultaneously makes that same content available to external AI agents — ChatGPT, Copilot, Gemini, or any MCP-compatible system. Every NLWeb instance is automatically an MCP server. A website no longer just serves pages — it becomes a queryable knowledge source for the entire AI ecosystem.
WordPress gets NLWeb through Yoast, Shopify through partnership
For WordPress, the most significant development is the Yoast-NLWeb collaboration announced 25 November 2025. Yoast, installed on over 13 million websites, is working directly with Microsoft to make WordPress sites AI-readable through NLWeb. The integration will roll out in phases and — critically — will require no additional setup or new tools for existing Yoast users. That means the millions of sites already running Yoast could become NLWeb-enabled through standard plugin updates. Yoast has published a detailed explainer and offers early notification signup at yoast.com/nlweb-updates.
A separate community-built plugin called WPNLWeb (22 GitHub stars, GPL-2.0 licence) already exists and is pending WordPress.org review. It creates a standards-compliant REST endpoint at /wp-json/nlweb/v1/ask, supports Schema.org responses, claims sub-500ms response times with caching, and works with ChatGPT and Claude. It’s a viable option for anyone wanting to experiment before Yoast’s integration reaches full deployment.
Shopify’s situation is different. Shopify is confirmed as an official NLWeb early adopter and partner — listed in Microsoft’s announcement alongside Tripadvisor and Eventbrite. However, no dedicated Shopify App Store app exists yet. The involvement appears to sit at the platform partnership level. A community tutorial on Medium demonstrates integrating Shopify product catalogues with NLWeb via Langflow (a low-code AI agent platform), but this requires custom development. For Shopify clients right now, the practical path is ensuring comprehensive Schema.org markup and robust product feeds — positioning sites to benefit the moment native Shopify NLWeb support lands.
The most accessible deployment option today is Cloudflare AutoRAG, offering genuine one-click NLWeb deployment. Through the Cloudflare Dashboard (Compute & AI → AutoRAG → NLWeb Website), Cloudflare crawls up to 100,000 pages, vectorises the content, and deploys a Worker implementing the NLWeb protocol — complete with both /ask and /mcp endpoints, continuous re-indexing, and a preview interface. It works with any site on Cloudflare regardless of CMS, making it the fastest path to a working NLWeb implementation for prototyping.
Content strategy shifts from keywords to entities and answers
NLWeb doesn’t just add a new channel — it changes what “good content” means. Search Engine Land’s Elmer Boutin captures the shift in one sentence: “SEO is shifting from keyword-first to entity-first.” Because NLWeb uses vector databases that search by semantic meaning rather than keyword matching, content needs to clearly define entities (products, services, people, locations, topics) and their relationships — not simply target keyword phrases.
The practical implications break down across content types. Blog posts should use clear semantic heading hierarchies where each section can stand alone as an answer to a specific question. Front-load key information using inverted-pyramid style — NLWeb retrieves the most semantically relevant content segments, so burying insights at the end of long articles means they may never surface in conversational responses. Include FAQ sections with natural-language questions that match how people actually ask. Every article should carry comprehensive Article schema with author, datePublished, dateModified, and rich descriptions.
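An Article block carrying those properties might look like the following. The values are placeholders invented for illustration; the shape is standard Schema.org JSON-LD and would sit in a `<script type="application/ld+json">` tag in the page head:

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How long do SSL certificates last?",
  "description": "A plain-English guide to SSL certificate lifetimes, renewal windows, and what happens when one expires.",
  "author": { "@type": "Person", "name": "Jane Example" },
  "datePublished": "2025-07-11",
  "dateModified": "2025-11-20"
}
```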
Product descriptions need the most dramatic overhaul. Implement comprehensive Product schema including every available property — name, description, brand, offers, price, availability, reviews, specifications, images. Write descriptions that directly answer the questions AI agents will ask on behalf of shoppers: “Is this jacket waterproof?” “Does it fit true to size?” “What’s the return policy?” Microsoft’s own guidance states: “Make sure your site’s schema markup and feeds are complete and up-to-date (e.g., product names, descriptions, prices, availability). NLWeb will use those as the knowledge base.”
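Here is what "every available property" looks like in practice for a single product. All names, prices, and ratings below are invented placeholders; the property names are standard Schema.org Product vocabulary:

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Trailline Waterproof Jacket",
  "description": "Fully seam-taped waterproof shell. Fits true to size; 30-day free returns.",
  "brand": { "@type": "Brand", "name": "Trailline" },
  "image": "https://example.com/images/trailline-jacket.jpg",
  "offers": {
    "@type": "Offer",
    "price": "89.00",
    "priceCurrency": "GBP",
    "availability": "https://schema.org/InStock"
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.6",
    "reviewCount": "128"
  }
}
```

Notice how the description directly answers the waterproofing, sizing, and returns questions in machine-checkable prose as well as in the structured fields.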
Landing pages and marketing copy require a fundamental mindset adjustment. Creative, emotionally driven copy still matters for human visitors, but it also needs to be factually precise and unambiguous for machine consumption. AI systems extract meaning, not style. A clever headline that doesn’t literally describe the page’s topic will be invisible to NLWeb’s semantic retrieval. The solution isn’t to abandon creative copy — it’s to ensure Schema.org markup, meta descriptions, and heading structure carry the semantic payload machines need, while the body copy serves both audiences.
GRAYBOX captures the broader strategic framework well: “Start thinking of your content as data, your site as a database, and your users as researchers querying that database.” This doesn’t eliminate the need for compelling writing — it means compelling writing must now coexist with comprehensive, structured metadata.
Schema.org markup becomes non-negotiable infrastructure
If there’s one takeaway that should drive immediate action, it’s this: Schema.org JSON-LD markup is no longer optional. NLWeb relies entirely on structured data to understand and serve content. Neil Patel’s assessment is blunt: “Schema moved from a snippet enhancer to a foundational layer for machine understanding. Schema is now infrastructure.”
The required depth of markup has increased dramatically. Minimalist schema — a basic Organisation type on the homepage, maybe Article markup on blog posts — no longer cuts it. NLWeb demands entity-first, interconnected schema that maps the full knowledge graph of a business: how products relate to categories, how services connect to locations, how authors relate to expertise areas, how reviews connect to specific products. Search Engine Land describes NLWeb as “the connective layer that turns structured data into something AI can understand, enabling your website to function as an intelligent, conversational interface.”
This means conducting the entity-first schema audit that Search Engine Land says agencies should now mandate across every client site. The audit should evaluate integrity (is the markup valid?), completeness (are all relevant entities marked up?), and interconnectedness (do the entities reference each other through proper Schema.org relationships?). Beyond Schema.org, we need to ensure RSS feeds are active, well-structured, and include full content rather than excerpts. Keep XML sitemaps current. Use semantic HTML elements (<article>, <section>, <nav>) rather than generic <div> containers. And reduce JavaScript dependency — many AI crawlers don’t execute JavaScript, and NLWeb’s data ingestion pipeline works best with server-rendered, text-centric content.
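Interconnectedness in practice means entities pointing at each other through @id references, so a machine can walk the knowledge graph from brand to product to review. A minimal sketch with invented identifiers and placeholder names:

```json
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Organization",
      "@id": "https://example.com/#org",
      "name": "Example Ltd"
    },
    {
      "@type": "Product",
      "@id": "https://example.com/jacket/#product",
      "name": "Waterproof Jacket",
      "brand": { "@id": "https://example.com/#org" }
    },
    {
      "@type": "Review",
      "@id": "https://example.com/jacket/#review-1",
      "itemReviewed": { "@id": "https://example.com/jacket/#product" },
      "reviewRating": { "@type": "Rating", "ratingValue": "5" }
    }
  ]
}
```

Isolated, duplicated entity blocks on every page fail the interconnectedness test even when each block validates on its own.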
The relationship between NLWeb-era structured data and existing SEO best practices is complementary, not contradictory. Everything we already do well for rich results — Product schema, FAQ schema, HowTo schema, Review schema — directly serves NLWeb readiness. The difference is scope: previously, these were optimisations for Google’s SERP features. Now they form the content layer that determines visibility across the entire agentic web.
Where NLWeb sits in the GEO and AIO landscape
NLWeb doesn’t exist in isolation — it’s the technical infrastructure layer within a broader ecosystem of AI-driven content optimisation disciplines. Understanding the hierarchy matters:
- SEO optimises for traditional search engine rankings and click-through
- AEO (Answer Engine Optimisation) targets featured snippets and voice search answers
- GEO (Generative Engine Optimisation) structures content for visibility in AI-generated responses from systems like ChatGPT, Perplexity, and Google AI Overviews
- AIO (AI Optimisation) is the umbrella discipline ensuring content is legible to every AI system — assistants, copilots, RAG platforms, and agent frameworks
- NLWeb provides the technical protocol enabling direct AI agent access to website content
As DAC Group frames it: “If SEO made brands legible to search engines, and GEO makes them legible to AI search engines, AIO is the umbrella discipline that ensures your entire content ecosystem is legible to every AI system.” NLWeb is the infrastructure that makes AIO operationally possible at the website level. GRAYBOX’s advice is direct: “Engage in Generative Engine Optimisation (GEO) and track your performance” — and NLWeb provides the rails for GEO to work on.
The competitive landscape includes llms.txt, a simpler static-file standard (like robots.txt for AI) that directs LLMs to important content. Search Engine Land draws a clear distinction: NLWeb is a dynamic protocol where AI queries a site in real time, while llms.txt is a passive standard where AI reads a static file. NLWeb enables richer, transactional interactions. Both can coexist — implementing an llms.txt file is a quick complementary step — but NLWeb represents the more powerful and future-facing approach. Google’s A2A (Agent-to-Agent) protocol is another piece of the puzzle; NLWeb plans to support A2A alongside MCP, potentially making NLWeb-enabled sites accessible to both Microsoft/Anthropic and Google agent ecosystems.
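For comparison, an llms.txt file is simply a markdown document served at the site root that points AI systems at key pages. The structure below follows the llms.txt proposal (H1 title, blockquote summary, sections of annotated links); the business name and paths are placeholders:

```markdown
# Example Ltd

> UK web agency building WordPress and Shopify sites for SMEs.

## Services

- [Ecommerce design](https://example.com/services/ecommerce): Shopify builds for UK SMEs
- [SEO and AI readiness](https://example.com/services/seo): schema audits and GEO strategy

## Guides

- [SSL certificates explained](https://example.com/blog/ssl): what expiry means for your site
```

A static file like this takes minutes to publish, which is why it pairs well with, rather than replaces, a live NLWeb endpoint.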
The risks and the monetisation question
The enthusiasm around NLWeb is tempered by legitimate concerns. Ben Thompson of Stratechery identified the most fundamental one: NLWeb lacks native payments. If users consume content through conversational AI interfaces instead of visiting pages, the advertising-based business model that funds most web content breaks down. Thompson wrote that Microsoft’s proposal, by not including native payments, isn’t as compelling as it should be. TNL Mediagene is exploring AI content licensing and usage-based access as alternative revenue models, and IAB Tech Lab is developing “cost per crawl” APIs, but no proven monetisation framework exists yet.
Shelly Palmer, Syracuse University professor and LinkedIn Top Voice in Technology, raises the adoption risk: NLWeb’s success depends on widespread developer adoption amid competing standards, privacy concerns, and limited resources. Without a clear monetisation path or killer use case, NLWeb risks becoming another well-intentioned protocol that never escapes the GitHub demo stage.
Security is another concern. In May 2025, researchers discovered a critical path traversal vulnerability in the NLWeb reference implementation that exposed system configuration and cloud credentials to remote attackers. Microsoft patched it but did not issue a formal CVE, raising questions about security maturity for an early-stage project. Each query triggering 50+ LLM calls also raises cost concerns for site operators — running NLWeb at scale requires meaningful compute budgets.
The pragmatic read: NLWeb is real, Microsoft-backed, and built on proven web standards — but it’s not yet production-ready for most businesses. The right posture is prepare now, deploy selectively, and monitor closely.
What this means for you — and how we can help
The parallel to early SEO is apt. GRAYBOX states: “The opportunity of this moment is similar to the early days of SEO when many brands weren’t yet optimizing for Google — it left an opportunity to gain traction for early adopters.” Businesses that build NLWeb readiness into their digital strategy now will hold a genuine advantage as adoption accelerates through 2026–2027. We’re here to make that straightforward.
In the next 30 days
The single highest-impact action is a thorough audit of your Schema.org implementation — checking completeness and entity interconnectedness across every page. We’d also activate and optimise RSS feeds on your WordPress or Shopify site, and identify whether your content (product catalogues, knowledge bases, guides) makes you a strong candidate for early NLWeb experimentation.
In the next 90 days
We can build a full “AI Readiness” programme around your site: schema audits, structured data optimisation, and NLWeb consultation rolled into one. That includes deploying a proof-of-concept using Cloudflare AutoRAG (the fastest path) or the WPNLWeb WordPress plugin, shifting your content strategy from keyword-first to entity-first and question-first, implementing llms.txt files as a quick complementary win, and setting up tracking for new metrics — how AI tools reference your brand, conversational query patterns, and entity visibility.
Platform-specific next steps
If you’re on WordPress, we’ll monitor Yoast’s phased NLWeb rollout closely and ensure your site is properly configured the moment it ships. If you’re on Shopify, our focus right now is maximising Product schema depth and maintaining clean, complete product feeds — positioning you to benefit the moment native Shopify NLWeb support arrives.
The content we write together today, structured with comprehensive markup and answer-oriented clarity, is building the dataset that AI agents will query tomorrow. If any of this resonates, let’s have a conversation about where your site stands and what we can do to get you ahead of the curve.
Key sources and further reading
- Microsoft: Introducing NLWeb
- NLWeb GitHub repository (reference implementation)
- NLWeb REST API documentation
- W3C NLWeb Community Group
- Yoast: What is NLWeb?
- Yoast-NLWeb partnership announcement
- WPNLWeb WordPress plugin
- Cloudflare + NLWeb AutoRAG
- Search Engine Land: NLWeb makes schema your greatest SEO asset
- VentureBeat: NLWeb and what enterprises need to know
- GRAYBOX: What is NLWeb and why it will redefine your digital experience
- DAC Group: The dawn of NLWeb
- Shelly Palmer: Microsoft’s new NLWeb
- Glama: What is NLWeb? (Technical deep dive)
- Stratechery: The agentic web and original sin
- TNL Mediagene NLWeb adoption
- Neil Patel: Digital marketing news roundup – November 2025
- Simplified SEO Consulting: NLWeb and what it means for SEO
- FPT Software: The rise of NLWeb
- MOHA Software: Understanding NLWeb basics
- Business Today: Microsoft unveils NLWeb
- TollBit NLWeb documentation