WebMCP and the Web Agents Can Understand

Most websites are made for people.

That sounds obvious, but it becomes a problem when we ask agents to use them.

A person can look at a page and understand the intention behind it. We see a form, a button, a menu, a filter, a checkout flow, and usually infer what is happening. Even when the interface is not perfect, we compensate.

Agents do not have that same context.

They can inspect the DOM, read labels, click buttons, and try to follow the same path a human would. Sometimes it works. Sometimes it feels impressive. But it is still fragile, because the agent is often guessing what the interface means.

That is why WebMCP (Web Model Context Protocol) is interesting for us.

Not because it makes the web “agentic” overnight.

But because it points to a simple idea: websites should be able to expose what they can do, not only what they render.

```mermaid
flowchart LR
  subgraph H["Human Layer"]
    direction TB
    HU["Human User"]
    VI["Visual Interface<br/><small>HTML • CSS • JavaScript</small>"]

    HU --> VI
  end

  subgraph W["Web Application"]
    direction TB
    AL["Application Logic<br/><small>State, navigation, and product behavior</small>"]
    TOOLS{"WebMCP capability layer"}
    DT["Declarative WebMCP Tools<br/><small>HTML forms + annotations</small>"]
    IT["Imperative WebMCP Tools<br/><small>JavaScript-defined tools</small>"]

    AL --> TOOLS
    TOOLS --> DT
    TOOLS --> IT
  end

  subgraph A["Agent Layer"]
    direction TB
    AG["Agent"]
    BC["Browser Context<br/><small>Page • State • Session</small>"]
    WT["WebMCP Tools"]
    PC["Permissions & User Control"]
    SA["Structured Actions & Context"]

    AG --> BC
    BC --- WT
    WT --- PC
    PC --- SA
  end

  H --> W
  A --> W

  classDef node fill:#f1f5f9,stroke:#94a3b8,color:#111827,stroke-width:1.4px;
  classDef app fill:#ffffff,stroke:#e5e7eb,color:#030712,stroke-width:2px;
  classDef capability fill:#d1d5db,stroke:#f9fafb,color:#030712,stroke-width:2px;

  class HU,VI,AG,BC,WT,PC,SA node;
  class AL,DT,IT app;
  class TOOLS capability;

  style H fill:transparent,stroke:#64748b,color:#f8fafc,stroke-width:1.2px,stroke-dasharray:5 5
  style A fill:transparent,stroke:#64748b,color:#f8fafc,stroke-width:1.2px,stroke-dasharray:5 5
  style W fill:transparent,stroke:#e5e7eb,color:#f8fafc,stroke-width:1.6px,stroke-dasharray:5 5
```

From UI to Intent

Today, if an agent wants to interact with a website, it usually has to interpret the page from the outside.

It looks at the structure.

It reads the text.

It tries to understand what buttons, forms, and menus are supposed to do.

That works until it does not.

WebMCP proposes a more explicit path. A page can expose structured tools that describe useful actions available in the current context.

The WebMCP Declarative API is especially interesting because it starts from something very web-native: regular HTML forms. With the right annotations, a form can become a WebMCP tool an agent can understand, instead of just another piece of UI it has to guess through.
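
To make the idea concrete, here is a rough TypeScript sketch of the mapping, not of what the browser actually does. It takes a plain description of a form and derives a tool-like description from it. The `FormSpec` and `ToolDescription` types and the `formToTool` function are names I made up for illustration. The real Declarative API works through HTML annotations, and their exact names live in the WebMCP docs and may still change.

```typescript
// Hypothetical types: a simplified view of an annotated HTML form.
interface FieldSpec {
  name: string;
  type: "text" | "number" | "checkbox";
  label: string;
  required: boolean;
}

interface FormSpec {
  toolName: string;
  description: string;
  fields: FieldSpec[];
}

// Hypothetical output: a JSON-Schema-like description an agent could read.
interface ToolDescription {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string; description: string }>;
    required: string[];
  };
}

// Map form field types to JSON Schema primitive types.
const schemaType: Record<FieldSpec["type"], string> = {
  text: "string",
  number: "number",
  checkbox: "boolean",
};

function formToTool(form: FormSpec): ToolDescription {
  const properties: ToolDescription["inputSchema"]["properties"] = {};
  for (const field of form.fields) {
    properties[field.name] = {
      type: schemaType[field.type],
      description: field.label,
    };
  }
  return {
    name: form.toolName,
    description: form.description,
    inputSchema: {
      type: "object",
      properties,
      required: form.fields.filter((f) => f.required).map((f) => f.name),
    },
  };
}

const searchTool = formToTool({
  toolName: "search_orders",
  description: "Search the current user's orders by keyword.",
  fields: [
    { name: "query", type: "text", label: "Keyword to search for", required: true },
    { name: "includeArchived", type: "checkbox", label: "Include archived orders", required: false },
  ],
});

console.log(JSON.stringify(searchTool, null, 2));
```

The point is that the form already carries most of the contract: field names, types, labels, and which inputs are required.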

There is also the WebMCP Imperative API, which is closer to the kind of thing you would expect in a more dynamic web app. Instead of only annotating existing forms, you can define WebMCP tools in JavaScript for flows that depend on application state, navigation, or more complex product behavior.
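
A minimal sketch of the imperative side, with loud caveats. I am assuming the shape from the current proposal as I understand it: a `navigator.modelContext` object with a `registerTool` method that takes a name, a description, an input schema, and an `execute` callback. The result shape with a `content` array is also an assumption borrowed from MCP conventions. Check the draft specification before relying on any of it. The `apply_filters` tool and the `state` object are hypothetical.

```typescript
// Assumed descriptor shape, based on my reading of the WebMCP proposal.
interface WebToolDescriptor<Input> {
  name: string;
  description: string;
  inputSchema: object;
  execute: (input: Input) => Promise<{ content: { type: "text"; text: string }[] }>;
}

// Hypothetical application state the tool operates on.
const state = { filters: { status: "all", maxPrice: Infinity } };

const applyFilters: WebToolDescriptor<{ status: string; maxPrice?: number }> = {
  name: "apply_filters",
  description: "Filter the product list on this page by status and optional maximum price.",
  inputSchema: {
    type: "object",
    properties: {
      status: { type: "string", enum: ["all", "in_stock", "sold_out"] },
      maxPrice: { type: "number", minimum: 0 },
    },
    required: ["status"],
  },
  async execute(input) {
    // Update the same state the visible UI reads from.
    state.filters = { status: input.status, maxPrice: input.maxPrice ?? Infinity };
    return { content: [{ type: "text", text: `Filters applied: ${input.status}` }] };
  },
};

// Feature-detect: the API is experimental and absent in most environments.
const modelContext = (globalThis as any).navigator?.modelContext;
if (modelContext && typeof modelContext.registerTool === "function") {
  modelContext.registerTool(applyFilters);
}
```

The tool changes the same state the UI already reads, so the agent and the human end up looking at the same filtered list.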

The UI is still there.

The product surface still matters.

But now there is an additional layer of meaning for agents.

This feels very aligned with the web to me. The web has always worked best when meaning is not trapped inside visuals. Semantic HTML, forms, links, metadata, and accessibility APIs are all reminders that good interfaces are not only about how things look, but also about how well they can be understood by other systems.

WebMCP feels like part of that same family.

A small contract between the page and the agent.

Here is what this page can do.

Here are the inputs.

Here is the current context.

Here is the action you are allowed to take.

That contract matters because clicking is not the same as understanding.

Not MCP, Not Scraping

The name can be a little confusing at first.

WebMCP is related to the broader MCP conversation, but it is not the same thing as a regular MCP server. Chrome has a good explanation of when to use WebMCP and MCP, and I think the distinction is useful.

MCP is great when an agent needs to connect to tools, data, and services outside the browser.

WebMCP is about the live website.

It is closer to the product surface. It knows about the page the user is on, the state that exists there, and the actions the site chooses to expose in that moment.

That makes it different from scraping too.

Scraping is the agent trying to reverse-engineer meaning from what was rendered.

WebMCP is the site saying: this is the meaning I want to expose.

That does not remove the need for good UI. It just gives agents a better interface than guessing.
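
One way to picture "the actions the site chooses to expose in that moment" is a function from page state to a list of tool names. This is only an illustration: `PageState`, `toolsFor`, and the tool names are hypothetical and not part of WebMCP.

```typescript
// Hypothetical page state for a small shop.
interface PageState {
  route: "catalog" | "cart" | "checkout";
  cartItems: number;
  signedIn: boolean;
}

// Decide which tools make sense for the current state.
function toolsFor(state: PageState): string[] {
  const tools: string[] = [];
  if (state.route === "catalog") tools.push("apply_filters", "add_to_cart");
  if (state.route === "cart" && state.cartItems > 0) tools.push("update_quantity");
  // Checkout is only offered to signed-in users with something to buy.
  if (state.cartItems > 0 && state.signedIn) tools.push("start_checkout");
  return tools;
}

console.log(toolsFor({ route: "catalog", cartItems: 0, signedIn: false }));
// contains: apply_filters, add_to_cart
console.log(toolsFor({ route: "cart", cartItems: 2, signedIn: true }));
// contains: update_quantity, start_checkout
```

A scraper would have to infer all of this from what happens to be rendered. Here the site states it directly.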

The Frontend Becomes a Capability Layer

For frontend engineers, this is the part I find most interesting.

We are used to thinking about the frontend as the layer that turns product intent into UI: routes, forms, components, state, validation, accessibility, performance, and design systems.

But if agents are going to operate inside web products, frontend code also becomes a capability layer.

Not in a “replace the UI” way.

More like progressive enhancement.

The human interface remains the main experience. WebMCP becomes an additional contract that helps agents understand and operate parts of that experience more reliably.

That framing makes more sense to me than treating this as another AI feature bolted onto a product.

It is not just “add a chatbot”.

It is: expose product intent in a way an agent can understand.

And this is where the WebMCP best practices matter. The interesting part is not just registering a tool. It is describing it clearly, keeping the surface small enough to be reliable, and avoiding the temptation to expose everything just because it is technically possible.
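
Some of that discipline can be checked mechanically. The sketch below is a tiny review pass over a list of tools. The limits and the naming rule are my own hypothetical choices, not rules taken from the WebMCP best practices, and `reviewTools` is a made-up helper.

```typescript
interface ToolSummary {
  name: string;
  description: string;
}

// Hypothetical limits; tune them for your own product.
const MAX_TOOLS = 8;
const MIN_DESCRIPTION_LENGTH = 20;

function reviewTools(tools: ToolSummary[]): string[] {
  const problems: string[] = [];
  if (tools.length > MAX_TOOLS) {
    problems.push(`Too many tools: ${tools.length} > ${MAX_TOOLS}`);
  }
  const seen = new Set<string>();
  for (const tool of tools) {
    // Naming rule is an assumption: lowercase snake_case.
    if (!/^[a-z][a-z0-9_]*$/.test(tool.name)) {
      problems.push(`${tool.name}: use a lowercase snake_case name`);
    }
    if (seen.has(tool.name)) problems.push(`${tool.name}: duplicate name`);
    seen.add(tool.name);
    if (tool.description.trim().length < MIN_DESCRIPTION_LENGTH) {
      problems.push(`${tool.name}: description is too short to be useful`);
    }
  }
  return problems;
}

console.log(
  reviewTools([
    { name: "apply_filters", description: "Filter the product list by status and price." },
    { name: "DoThing", description: "Does it." },
  ])
);
```

A check like this will not tell you whether a description is good, only whether it is obviously too thin. It is still a cheap guard against a tool surface that grows without anyone noticing.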

Start With the Boring Parts

I would not start with the dramatic demos.

The best place for this is probably the boring parts of software. The moments where the user already knows what they want, but the UI asks for five small steps to get there.

Find the setting. Apply the filters. Open the support request with the current context. Run the diagnostic flow. Generate the report from what is already on the page.

Not magic.

Just less guessing.

And that is also where trust matters. A website should not expose every internal function as an agent tool. A good WebMCP surface should be curated around product intent, user control, and clear boundaries.

The goal is not to let agents do everything.

The goal is to make the right actions easier to understand.
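
The "user control and clear boundaries" part can be sketched as a wrapper that asks before a sensitive action runs. This is an illustration under my own assumptions: `withConfirmation` and the `confirm` hook are hypothetical, and WebMCP may offer its own permission mechanisms that would replace something like this.

```typescript
type Execute<I, O> = (input: I) => Promise<O>;

// Wrap a tool so it only runs after the user approves the action.
// `confirm` is a hypothetical hook; in a real page it could open a dialog.
function withConfirmation<I, O>(
  label: string,
  execute: Execute<I, O>,
  confirm: (message: string) => Promise<boolean>
): Execute<I, O> {
  return async (input) => {
    const approved = await confirm(`Allow the agent to ${label}?`);
    if (!approved) throw new Error(`User declined: ${label}`);
    return execute(input);
  };
}

// Hypothetical destructive action.
const deleted: string[] = [];
const deleteDraft = withConfirmation(
  "delete this draft",
  async (input: { id: string }) => {
    deleted.push(input.id);
    return `Deleted ${input.id}`;
  },
  async () => false // Stand-in for a dialog the user dismissed.
);

deleteDraft({ id: "draft-1" }).catch((err: Error) => console.log(err.message));
// logs "User declined: delete this draft"
```

The agent still gets a clear answer. It just does not get to skip the user.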

Testing the Agent Surface

One thing I like in the WebMCP docs is that they do not stop at “register a tool and call it done”.

There is a whole section on WebMCP evals, and that feels important.

If a website exposes tools to agents, the work is not finished when the schema compiles. You still need to know whether the agent understands when to call the tool, which parameters to use, what a good result looks like, and where the flow can fail.

That is a different kind of frontend quality bar.

Not only “does the button work?”

But also “does the agent understand the action?”

I like that framing because it makes the agent surface feel testable. Not perfect. Not deterministic in the same way a normal unit test is deterministic. But still something we can evaluate, debug, and improve.
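
To show what "evaluate" could mean in practice, here is a very small sketch of an eval loop. It is not the format the WebMCP evals docs use. `EvalCase`, `ToolCall`, `passRate`, and the stub agent are all hypothetical, and a real eval would replace the stub with actual model runs.

```typescript
// What we expect the agent to do for a given user request.
interface EvalCase {
  prompt: string;
  expectedTool: string;
  expectedArgs: Record<string, unknown>;
}

// What the agent actually chose. In a real eval this comes from a model run.
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

function matches(expected: EvalCase, actual: ToolCall): boolean {
  if (expected.expectedTool !== actual.tool) return false;
  // Every expected argument must be present with the same value.
  return Object.entries(expected.expectedArgs).every(
    ([key, value]) => JSON.stringify(actual.args[key]) === JSON.stringify(value)
  );
}

function passRate(cases: EvalCase[], pick: (prompt: string) => ToolCall): number {
  if (cases.length === 0) return 0;
  const passed = cases.filter((c) => matches(c, pick(c.prompt))).length;
  return passed / cases.length;
}

// Hypothetical cases and a stub "agent" that always picks the same tool.
const cases: EvalCase[] = [
  { prompt: "Show me items in stock", expectedTool: "apply_filters", expectedArgs: { status: "in_stock" } },
  { prompt: "Open a support ticket", expectedTool: "open_support_request", expectedArgs: {} },
];
const stubAgent = (_prompt: string): ToolCall => ({ tool: "apply_filters", args: { status: "in_stock" } });

console.log(passRate(cases, stubAgent)); // 0.5
```

Because a model is not deterministic, a number like this is more useful as a trend across repeated runs than as a single pass or fail.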

The same goes for Chrome DevTools support for WebMCP. If agents are going to interact with the product surface, developers need visibility into what tools were registered, whether the schema is valid, and what happened during invocation.

That is the difference between a demo and a real developer workflow.

Why I Care About This

I have been spending time around browser-native AI APIs while building web-ai-sdk, and WebMCP fits the same larger theme for me.

The browser is becoming more than a place where AI-powered UI is rendered.

It is also becoming a place where AI capabilities can run, where product context already exists, and where agents can interact with the same surfaces users already trust.

That does not mean every AI workflow should live in the browser.

But it does make the browser a much more interesting boundary.

Prompting, local models, tool calling, page context, user intent, product state, permissions, and UI are all closer together there than almost anywhere else.

That is the part I keep coming back to.

Not the hype around agents.

The interface between agents and the web.

A Small Shift in Vocabulary

For me, WebMCP is less about hype and more about vocabulary.

It gives us another way to describe the relationship between websites and agents.

Not scraping.

Not guessing.

Not replacing the interface.

Understanding it.

The API shape may still evolve, and that is fine. Early web primitives usually do. The early preview post, the proposal on GitHub, and the draft specification all make this feel like something still being shaped in public.

But the direction is useful.

If agents are going to help people inside real products, they need better affordances than “look at the screen and figure it out”.

And if product teams care about control, safety, and user trust, they probably do not want agents inventing workflows from raw page structure either.

WebMCP is one way to close that gap.

A way for websites to expose not only what users can see, but what agents can safely do.

That is why I am paying attention.

The agent-readable web does not need to replace the human web.

It just needs to make the web less guessable.

LoFM.