Who Owns the Surface?

Most people I’ve talked to think generative UI means letting a model render React components on the fly. That’s the worst possible version of the idea, and if that’s what you’re shipping, you’ve already lost.

The actual story is a taxonomy. Three patterns of how an agent and a UI fit together, distributed across a power law, each suited to a different kind of surface. And underneath the taxonomy, a quieter fight worth caring about: who controls the layer between the model and the pixels.

Let me walk through it.

The Inversion

There’s a real difference between “we shipped an AI assistant” and “we built a fullstack agentic application,” and it has nothing to do with marketing.

In the assistant-on-the-side version, the agent doesn’t know what’s on screen, doesn’t know what the user just clicked, and can’t do anything except return text into a sidebar. The application underneath couldn’t care less. The agent is a sticker.

In the fullstack version, the agent shares state with the app, drives the UI, and works alongside the user. You can’t draw a clean line between where the app ends and the agent begins. The agent isn’t a feature parked next to the product; it’s part of how the product runs.

That’s the inversion. The agent stops being a feature in the product. The product starts bending around the agent. The UI exists to give the agent surfaces to operate on.

That flips the design question entirely. It’s no longer “how do users prompt for components?” It’s: how does the agent know what UI to put in front of the user, when to put it there, and how to do it without breaking trust?

That question splits into three answers, depending on how predictable the surface is.

Three Patterns, One Power Law

The cleanest way I’ve seen this framed is as a frequency distribution. Most of the surfaces in any real product are predictable. You know what they need to look like and you’ve already designed them. Some have a roughly known shape but variable content. A few are genuinely open-ended.

flowchart LR
    A["Catalog<br/>predictable surfaces<br/>(~80% of the product)"]
    B["Vocabulary<br/>shape-known, content-varies<br/>(~15%, the tail)"]
    C["Sandbox<br/>third-party surfaces<br/>(~5%, platform edge)"]
    A --> B --> C
    style A fill:#dbeafe,stroke:#3b82f6,color:#000
    style B fill:#d1fae5,stroke:#10b981,color:#000
    style C fill:#fee2e2,stroke:#ef4444,color:#000

I’ll call them the Catalog, the Vocabulary, and the Sandbox, each named after what you actually hand the agent. A catalog of pre-built components. A vocabulary of primitives the agent composes with. A sandbox where third-party surfaces run.

The choice between them isn’t aesthetic. It’s a decision about predictability, trust, and how much control you hand to the model versus how much you keep.

Let me unpack each.

Pattern 1: The Catalog

This is where most teams should start, and where most agentic apps in production actually live today.

The shape of it: you declare a fixed catalog of components (bar chart, pie chart, contextual card, button, whatever your design system already contains), each one annotated with what it’s called, what it does, and what arguments it needs. A middleware layer translates that catalog into tool definitions the agent can see. The agent runs, decides one of the registered components is the right answer, and calls the corresponding tool. That frontend tool call pauses backend execution and hands control to the client, where the SDK maps the call to the component, fills the props, and mounts it.

flowchart LR
    U["User Request"] --> AG["Agent"]
    TB["Component Catalog<br/>(Bar Chart, Pie Chart,<br/>Contextual Card, ...)"] -. registered as tools .-> AG
    AG -- tool call --> SDK["Frontend SDK"]
    SDK --> R["Rendered Component"]
    style AG fill:#dbeafe,stroke:#3b82f6,color:#000
    style TB fill:#f3f4f6,stroke:#9ca3af,color:#000
    style R fill:#dbeafe,stroke:#3b82f6,color:#000

The open protocol forming around this pattern is AG-UI, an MIT-licensed event protocol that originated inside CopilotKit with LangGraph and CrewAI, and now has implementations across Mastra, Microsoft’s Agent Framework, AG2, Agno, and LlamaIndex. Framework-agnostic by design.

The thing to notice here: the model never generates UI code. It picks from a menu you’ve already designed, tested, and signed off on. The “generative” part is the choice and the data binding. Not the markup, not the layout, not the styling.
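
To make the menu idea concrete, here’s a minimal sketch under my own assumptions. Every name in it (`CatalogEntry`, `toToolDefinitions`, `resolveToolCall`, the two example components) is hypothetical; this is not AG-UI’s API or any SDK’s. It only shows the shape of the flow: a catalog becomes tool definitions, and a tool call becomes either a validated mount instruction or a rejection.

```typescript
// Hypothetical names throughout; none of this is AG-UI's or any SDK's real API.
type PropType = "string" | "number" | "string[]" | "number[]";

interface CatalogEntry {
  name: string;
  description: string;
  props: Record<string, PropType>;
}

interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, string>; // prop name -> expected type
}

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

type MountInstruction =
  | { ok: true; component: string; props: Record<string, unknown> }
  | { ok: false; error: string };

const catalog: CatalogEntry[] = [
  {
    name: "bar_chart",
    description: "Compare values across categories",
    props: { title: "string", labels: "string[]", values: "number[]" },
  },
  {
    name: "contextual_card",
    description: "Summarize a single entity",
    props: { title: "string", body: "string" },
  },
];

// Middleware step: expose each catalog entry as a tool the agent can call.
function toToolDefinitions(entries: CatalogEntry[]): ToolDefinition[] {
  return entries.map((entry): ToolDefinition => ({
    name: entry.name,
    description: entry.description,
    parameters: entry.props,
  }));
}

// Check one argument against the type the catalog declared for it.
function matchesType(value: unknown, expected: PropType): boolean {
  if (expected === "string" || expected === "number") {
    return typeof value === expected;
  }
  if (!Array.isArray(value)) return false;
  const itemType = expected === "string[]" ? "string" : "number";
  return value.every((item) => typeof item === itemType);
}

// Client step: map a tool call back to a registered component, or refuse it.
function resolveToolCall(call: ToolCall, entries: CatalogEntry[]): MountInstruction {
  const entry = entries.filter((e) => e.name === call.name)[0];
  if (entry === undefined) {
    return { ok: false, error: "not in catalog: " + call.name };
  }
  const props: Record<string, unknown> = {};
  for (const key of Object.keys(entry.props)) {
    if (!matchesType(call.args[key], entry.props[key])) {
      return { ok: false, error: "bad or missing prop: " + key };
    }
    props[key] = call.args[key]; // only declared props are passed through
  }
  return { ok: true, component: entry.name, props };
}

// The agent picked the bar chart and bound data to it; no markup involved.
const mounted = resolveToolCall(
  { name: "bar_chart", args: { title: "Q1", labels: ["Jan", "Feb"], values: [3, 5] } },
  catalog,
);
```

The design choice worth copying is the last function: anything not in the catalog, or not matching its declared props, never reaches the renderer.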

This is what you want for surfaces that are predictable, which is most of your product. The dashboards, the cards, the inline answers, the action buttons. The bar chart isn’t going to surprise you with a 3D rotating waveform that mangles your design system. It’s your bar chart. The agent just decided to use it.

Grafana Assistant is a clean production example: a sidebar chat that responds using Grafana’s existing components (panels, plots, queries, tables) instead of only free-form text or markup. The components are predefined. The agent’s job is to pick the right one for the conversation and wire it to the right data.

If you build one of these patterns, build this one. It covers the majority of the surface area of any real product, and it’s the only one where the runtime risk is genuinely bounded.

Pattern 2: The Vocabulary

Sometimes you can’t pre-register every possible component. The agent needs to compose a layout you didn’t anticipate. The data shape is different every time. The surface answers a request that doesn’t fit your catalog.

That’s where the declarative approach earns its keep. The open spec here is A2UI (Agent-to-UI), at a2ui.org, open-sourced by Google in collaboration with CopilotKit. The core idea is simple.

The agent doesn’t send code. It doesn’t pick from a fixed menu either. It emits a declarative description of intent (structured operations), and a renderer on the client materializes those operations into native UI components from your design system.

flowchart LR
    A["Agent"] -- describes intent --> O["A2UI Operations<br/>(declarative)"]
    O --> R["Client Renderer"]
    R --> U["Native UI<br/>(your components)"]
    style A fill:#d1fae5,stroke:#10b981,color:#000
    style O fill:#f3f4f6,stroke:#9ca3af,color:#000
    style U fill:#d1fae5,stroke:#10b981,color:#000

If you’ve worked with React’s internals, this should feel familiar. It’s reconciler-shaped, but with the agent as the source of truth instead of your component tree. The agent says “show a card with this title, this body, these three actions.” The renderer maps that to your design system. The model never touches a <div>, never writes a Tailwind class, never decides spacing.
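
A rough sketch of that renderer step, with loud caveats: the `UINode` shape, the `designSystem` map, and the `Ds*` component names are all invented for illustration. This is not the A2UI wire format. It only shows the idea of a client turning a declarative description into its own components and refusing anything it doesn’t recognize.

```typescript
// Hypothetical node shape for illustration; this is NOT the A2UI wire format.
interface UINode {
  type: string;
  props?: Record<string, string>;
  children?: UINode[];
}

interface RenderedNode {
  component: string;
  props: Record<string, string>;
  children: RenderedNode[];
}

// The client's side of the contract: declarative type -> native component.
const designSystem: Record<string, string> = {
  card: "DsCard",
  text: "DsText",
  button: "DsButton",
};

// Walk the agent's description and map each node onto the design system.
function materialize(node: UINode): RenderedNode {
  // hasOwnProperty avoids matching inherited keys such as "constructor".
  if (!Object.prototype.hasOwnProperty.call(designSystem, node.type)) {
    throw new Error("unsupported node type: " + node.type);
  }
  return {
    component: designSystem[node.type],
    props: node.props || {},
    children: (node.children || []).map((child) => materialize(child)),
  };
}

// "Show a card with this title, this body, these three actions."
const fromAgent: UINode = {
  type: "card",
  props: { title: "Renew plan?" },
  children: [
    { type: "text", props: { value: "Your plan expires Friday." } },
    { type: "button", props: { label: "Renew" } },
    { type: "button", props: { label: "Downgrade" } },
    { type: "button", props: { label: "Dismiss" } },
  ],
};

const tree = materialize(fromAgent);
```

Note what the agent’s payload doesn’t contain: no tags, no class names, no spacing. Those live entirely behind `designSystem`.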

Notion AI is a clean in-the-wild example of this: when you ask it to draft a structured page, it composes from Notion’s existing block vocabulary (headings, tables, callouts, toggles, databases) rather than generating markup. Notion’s renderer handles the actual UI.

There are two flavors of A2UI that matter more than they look at first:

Fixed schema: you build the layout in advance. The agent populates it with data at runtime. Predictable structure, dynamic content.
Dynamic schema: a second model pass produces the layout itself, adapted to whatever the conversation is actually about. The same spec-level constraints still hold, but the structure is decided in-flight.

Fixed schema is the safer default and probably what you want most of the time you reach for this pattern. Dynamic schema is what you reach for when the shape genuinely can’t be enumerated in advance. Even then, the spec keeps the model inside boundaries you defined.
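
The difference between the two flavors is easier to see side by side. The sketch below is mine, not A2UI’s: `LayoutNode`, `missingSlots`, `layoutViolations`, the allowed-type list, and the depth limit are hypothetical stand-ins for whatever the real spec and your renderer enforce.

```typescript
// Hypothetical shapes for illustration; not the A2UI schema.
interface LayoutNode {
  type: string;
  slot?: string; // name of the data field this node displays
  children?: LayoutNode[];
}

// Fixed schema: the layout is authored ahead of time by the developer...
const orderSummaryLayout: LayoutNode = {
  type: "card",
  children: [
    { type: "heading", slot: "customer" },
    { type: "text", slot: "status" },
  ],
};

// ...so the only runtime question is whether the agent filled every slot.
function missingSlots(layout: LayoutNode, data: Record<string, string>): string[] {
  let missing: string[] = [];
  const slot = layout.slot;
  if (slot !== undefined && !Object.prototype.hasOwnProperty.call(data, slot)) {
    missing.push(slot);
  }
  for (const child of layout.children || []) {
    missing = missing.concat(missingSlots(child, data));
  }
  return missing;
}

// Dynamic schema: the layout itself comes from the model, so it is checked
// against the allowed primitives and a depth limit before anything renders.
const allowedTypes = ["card", "heading", "text", "table"];
const maxDepth = 4;

function layoutViolations(node: LayoutNode, depth: number = 1): string[] {
  let problems: string[] = [];
  if (allowedTypes.indexOf(node.type) === -1) {
    problems.push("type not allowed: " + node.type);
  }
  if (depth > maxDepth) {
    problems.push("too deep at: " + node.type);
  }
  for (const child of node.children || []) {
    problems = problems.concat(layoutViolations(child, depth + 1));
  }
  return problems;
}

// Fixed: the agent supplied only part of the data.
const unfilled = missingSlots(orderSummaryLayout, { customer: "Ada" });

// Dynamic: the model proposed a layout containing a primitive you never allowed.
const violations = layoutViolations({ type: "card", children: [{ type: "iframe" }] });
```

In the fixed case the model can only get the data wrong. In the dynamic case it can also get the structure wrong, which is why the validation step carries more weight there.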

Either way, the layer between the model and the pixels is yours. That’s the part that matters.

Pattern 3: The Sandbox

The far end of the curve. The thing you reach for when the answer isn’t “render a component” or “compose a layout”. It’s “open this entire other app inside my app”.

The open standard here is MCP Apps, part of the broader Model Context Protocol, consolidating earlier work from MCP-UI and OpenAI into a single specification. Three components, with the boundaries between them doing the heavy lifting.

Worth flagging the spec status: MCP Apps already exists and is in production use, but the upcoming MCP 2026-07-28 spec revision (currently a release candidate, locked May 21, ships final July 28) formalizes it as a first-class extension with a dedicated lifecycle. The pattern is stable; the surrounding surface is consolidating.

The Server publishes both the tools and the UI surfaces themselves. Each tool carries a URI pointing at the embeddable app.
The Host is your frontend app. It mounts the embedded surface inside a sandboxed iframe and brokers messages between the agent and the embedded app.
The View is the surface itself, running in the sandboxed iframe. It can’t touch your DOM. It can’t read your cookies. It can’t do anything except talk back through postMessage.

flowchart LR
    S["Server<br/>MCP tools + UI surfaces<br/>(ui:// URIs)"]
    H["Host<br/>Your frontend app<br/>Mounts View, brokers messages"]
    V["View<br/>Sandboxed iframe<br/>No DOM, no cookies"]
    S -- tool call --> H
    H <-- postMessage / JSON-RPC --> V
    style S fill:#ede9fe,stroke:#8b5cf6,color:#000
    style H fill:#d1fae5,stroke:#10b981,color:#000
    style V fill:#fce7f3,stroke:#ec4899,color:#000

Tool calls between Server and Host. postMessage and JSON-RPC between Host and View. Two boundaries, three trust zones, no shared state across them.
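
Here’s roughly what the Host side of the Host/View channel can look like, as a sketch under assumptions. The envelope and the error codes (-32600, -32601) follow JSON-RPC 2.0. The method name `demo/getTheme` and the `brokerViewMessage` function are hypothetical; the real method set is whatever the MCP Apps spec defines. The sketch also ignores notifications (requests without an `id`) to stay short.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
  params?: unknown;
}

type JsonRpcResponse =
  | { jsonrpc: "2.0"; id: number | string | null; result: unknown }
  | { jsonrpc: "2.0"; id: number | string | null; error: { code: number; message: string } };

type Handler = (params: unknown) => unknown;

// Hypothetical allowlist: the only things this host lets a View ask for.
const allowedMethods: Record<string, Handler> = {
  "demo/getTheme": () => ({ mode: "dark" }),
};

// Structural check for a JSON-RPC 2.0 request carrying an id.
function isRequest(msg: unknown): msg is JsonRpcRequest {
  if (typeof msg !== "object" || msg === null) return false;
  const m = msg as Record<string, unknown>;
  return (
    m["jsonrpc"] === "2.0" &&
    (typeof m["id"] === "number" || typeof m["id"] === "string") &&
    typeof m["method"] === "string"
  );
}

// Host-side broker. In a browser this would receive event.data from a
// "message" listener, after checking event.source against the iframe's
// contentWindow so that only the mounted View is heard.
function brokerViewMessage(msg: unknown): JsonRpcResponse {
  if (!isRequest(msg)) {
    return { jsonrpc: "2.0", id: null, error: { code: -32600, message: "Invalid Request" } };
  }
  if (!Object.prototype.hasOwnProperty.call(allowedMethods, msg.method)) {
    return { jsonrpc: "2.0", id: msg.id, error: { code: -32601, message: "Method not found" } };
  }
  return { jsonrpc: "2.0", id: msg.id, result: allowedMethods[msg.method](msg.params) };
}

// An allowed request gets a result; anything else gets a structured refusal.
const themeReply = brokerViewMessage({ jsonrpc: "2.0", id: 1, method: "demo/getTheme" });
const refused = brokerViewMessage({ jsonrpc: "2.0", id: 2, method: "demo/readCookies" });
```

The point of the allowlist is that the View never gets a general-purpose handle into the host. It gets a short list of questions it’s allowed to ask.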

This is the right pattern when you’re building something platform-shaped: a host-of-hosts that orchestrates third-party experiences. The agent picks which embedded surface to open. The developer pre-wired the integration. The user sees a third-party app running right inside the agent’s surface, with the agent able to drive it.

Claude and ChatGPT both ship variations of this pattern in production today: both can host third-party UI surfaces (connected tools, MCP server apps) embedded inside the agent’s product, with the Host brokering messages to a View that runs in a sandboxed iframe.

It’s the most powerful pattern and the most dangerous one. You’re embedding code you don’t own. The iframe sandbox isn’t decoration. It’s load-bearing. The whole point of the architecture is that the View can’t reach in, can’t read out, can’t do anything except talk through the channel you opened. Take that boundary seriously or don’t ship this pattern at all.
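
One way to take the boundary seriously is to make the iframe’s `sandbox` attribute a policy decision in code, not a string someone typed once. The sketch below is an assumption about a conservative host, not something MCP Apps mandates: the `grantable` list and `sandboxAttribute` are hypothetical. The tokens themselves are standard HTML sandbox tokens.

```typescript
// Tokens this hypothetical host is willing to grant an embedded View.
// Deliberately absent: allow-same-origin, allow-top-navigation, allow-popups.
const grantable = ["allow-scripts", "allow-forms"];

// Build the value for <iframe sandbox="...">, refusing anything off the list.
function sandboxAttribute(requested: string[]): string {
  const rejected = requested.filter((token) => grantable.indexOf(token) === -1);
  if (rejected.length > 0) {
    throw new Error("refusing sandbox tokens: " + rejected.join(", "));
  }
  // An empty value is valid and applies every restriction.
  return requested.join(" ");
}

const viewSandbox = sandboxAttribute(["allow-scripts"]);
```

A View that asks for `allow-same-origin` on top of `allow-scripts` fails loudly here instead of quietly getting a wider sandbox.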

So Where Does Each Pattern Belong?

Pulling the implicit strategy out into the open:

Catalog for ~80% of your surface area. The components you’d build anyway. The agent just picks intelligently from your registered set.
Vocabulary (A2UI) for the tail. Surfaces where you know roughly what shape you’ll need but can’t enumerate every variation in advance.
Sandbox (MCP Apps) for the platform edge. When you’re building a host for other people’s products, not a product itself.

Most teams will live entirely in Catalog and never need the other two. That’s correct, actually. The mistake is jumping straight to “let the model render anything it wants” because it looks more impressive in a demo. That’s the path to UI that looks great in screenshots and falls apart in production within a quarter.

The further right you move on the curve, the more flexibility you get, and the more trust you outsource to the spec, the iframe, the third-party app. Move only as far right as the surface actually demands.
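
If it helps, the whole placement rule fits in a few lines. This is only my restatement of the guidance above as code; the `Surface` fields and `choosePattern` are hypothetical, and real surfaces will need more judgment than two booleans.

```typescript
type Pattern = "catalog" | "vocabulary" | "sandbox";

interface Surface {
  thirdPartyOwned: boolean; // someone else's app, embedded in yours
  fitsRegisteredComponent: boolean; // a component you already built covers it
}

// Move only as far right as the surface demands.
function choosePattern(surface: Surface): Pattern {
  if (surface.thirdPartyOwned) return "sandbox";
  if (surface.fitsRegisteredComponent) return "catalog";
  return "vocabulary";
}

const dashboardTile = choosePattern({ thirdPartyOwned: false, fitsRegisteredComponent: true });
```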

Same Shape, New Decision-Maker

This three-tier arc isn’t unique to generative UI. The same pattern played out in browser plugins, in embeds and iframes, in browser extensions, across thirty years of platform evolution: a curated core, a declarative layer, and an open sandbox. The architectural shape isn’t novel.

What is novel: the decision-maker choosing between those tiers is now a language model instead of a user with a mouse. The topology stayed the same. The thing in front of the topology, picking which tier to use and when, became something with its own opinions.

That’s the shift worth paying attention to. Not the patterns themselves, but who’s choosing between them.

The Real Concern Nobody Is Saying Out Loud

The interesting tension here isn’t technical. It’s about who owns the line.

When the agent decides what UI to show, even within constraints, the agent is doing product work that used to belong to designers and PMs. The constraints (your registered catalog, the A2UI primitives, the MCP sandbox) are the only thing standing between your product’s design integrity and a runtime that picks whatever feels right to a language model on a given Tuesday.

Whoever ships the spec, the renderer, and the constraint layer ends up controlling a meaningful slice of what every agentic product is allowed to look like. That’s what’s actually being decided at ag-ui.com, at a2ui.org, inside the MCP Apps spec, inside every framework racing to ship a generative UI primitive. It’s not really a UI standards conversation. It’s a control-plane conversation, wearing a UI library’s clothes.

(My guess: the spec layer consolidates around two or three contenders in the next eighteen months, the same way the JS framework wars did. A2UI has the head start, with Google’s backing and the open-source ecosystem already shipping around it. AG-UI is locking in the transport layer underneath. MCP Apps is becoming the standard at the platform edge. But the surface is wide open, and whoever wins the constraint layer ends up with the kind of leverage Stripe has over payments. That’s the actual prize.)

Where to Start

There’s a temptation, when a stack like this lands, to split the takeaway across roles. Engineers get one paragraph, PMs another, designers a third. I’m skipping that. The roles are converging anyway: the person making the surface decision, writing the code, and owning the product spec is increasingly one person, or a small group wearing all three hats. (More on that in a future post ☺️)

One synthesis instead:

Start with the Catalog. Most surfaces don’t need anything else. Build the registered set, expose it as tools, let the agent pick from a menu you designed. The “let the model render anything” demo looks great in screenshots and falls apart in production within a quarter.

When you genuinely need flexibility, reach for the Vocabulary before open-ended rendering. The constraint layer is what keeps the model from picking what feels right on a given Tuesday.

Keep the Sandbox for the moment your product becomes a host for other people’s products. When you ship that, take the iframe seriously. It isn’t decoration.

And every time you move further right on the gradient, the question is the same: what are you outsourcing trust to, and do you actually trust it?

The component-library era of “AI features” is ending. The agent-runtime era is starting. Build accordingly.

The three-tier framing borrows its shape from the Controlled/Declarative/Open spectrum laid out in Build Interactive Agents with Generative UI; the Catalog/Vocabulary/Sandbox labels are my own simplified restatement.