Why I Built web-ai-sdk

The browser became an AI runtime in 2026, and almost nobody’s tooling caught up.

Chrome 148 ships the Prompt API on by default, Edge runs the same family of APIs on Phi-4-mini, and Summarizer, Translator, and Language Detector have been stable since Chrome 138. A web page can now run a language model locally, with no API key, no server round trip, and no data leaving the device. The capability is real, it’s shipping across engines, and it’s on a standards track.

Building on it, though, is harder than it should be. Not because the APIs are bad, but because they’re new, and every app that touches them ends up rewriting the same plumbing before it gets to do anything interesting. So I built web-ai-sdk: a thin, typed, framework-agnostic layer over the Web’s Built-in AI APIs, one package per capability, with zero runtime dependencies. It’s also, a little oddly, designed to shrink over time.

Here’s the reasoning.

The Interesting Part Never Comes First

This is what shipping on Built-in AI looks like before a single useful token appears:

  1. Feature-detect the API, because it isn’t everywhere yet.
  2. Check availability, because the model might still be downloading.
  3. Create a session, and reuse it, because creating one is not free.
  4. Stream the chunks, and wire up abort.
  5. Clean up, so you don’t leak a model into memory.

Then, and only then, your feature.
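
For a sense of scale, here is a minimal sketch of those five steps written by hand. The `LanguageModelLike` and `SessionLike` types are hypothetical structural stand-ins for the small slice of the Prompt API this touches, written against its shape at the time of writing, which may change. `promptOnce` is an illustrative name, not part of web-ai-sdk, and the sketch assumes the streamed result can be consumed with `for await`.

```typescript
// Hypothetical structural types: an approximation of the Prompt API surface
// used below, not an authoritative definition.
type Availability = "unavailable" | "downloadable" | "downloading" | "available";

interface SessionLike {
  promptStreaming(input: string, options?: { signal?: AbortSignal }): AsyncIterable<string>;
  destroy(): void;
}

interface LanguageModelLike {
  availability(): Promise<Availability>;
  create(options?: { signal?: AbortSignal }): Promise<SessionLike>;
}

// Runs one prompt end to end. Resolves to null when the API is absent or
// reports itself unavailable, so callers can degrade gracefully.
async function promptOnce(
  api: LanguageModelLike | undefined,
  input: string,
  signal?: AbortSignal,
): Promise<string | null> {
  // 1. Feature-detect: the global may not exist in this browser.
  if (!api) return null;

  // 2. Check availability: the model may be missing entirely.
  if ((await api.availability()) === "unavailable") return null;

  // 3. Create a session. A real app would cache and reuse it.
  const session = await api.create({ signal });
  try {
    // 4. Stream the chunks; the signal lets the caller abort mid-stream.
    let output = "";
    for await (const chunk of session.promptStreaming(input, { signal })) {
      output += chunk;
    }
    return output;
  } finally {
    // 5. Clean up, even if streaming threw or was aborted.
    session.destroy();
  }
}
```

In a browser you would pass the global in, something like `promptOnce((globalThis as any).LanguageModel, "Hello")`. For brevity the sketch creates and destroys a session on every call, which is exactly the cost step 3 warns about.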

That sequence is identical whether you’re summarizing an article or running a chat turn, and none of it is your product. It’s the cost of admission, paid again in every codebase, in every browser, by every team. web-ai-sdk pays it once:

import { ask } from "@web-ai-sdk/prompt";
const { output } = await ask({ input: "Summarize this page in one sentence." });

Feature detection, availability, session reuse, streaming, abort, cleanup, and a no-op fallback when the API is absent all live underneath that call. You write against one stable surface instead of coupling your whole app to the exact shape the platform happens to have this quarter.

Young Platforms Move, and That’s the Point

The deeper reason for a layer isn’t the boilerplate. It’s that the surface is still settling, which is exactly what you’d expect from something this new. Building on it taught me where a thin abstraction earns its place right now.

The engines are converging but not identical. These APIs are a shared, standards-track effort and they already run across Chromium browsers, but the details still differ between Chrome’s Gemini Nano and Edge’s Phi-4-mini, down to the shape of a streamed chunk and the cleanup an output needs. A single typed surface absorbs those differences so your code doesn’t have to branch per browser.
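
To make "the shape of a streamed chunk" concrete, suppose, purely as an assumption for illustration and not as a claim about either engine today, that one implementation emits cumulative text (each chunk repeats everything so far) while another emits deltas. A small stateful normalizer can hand your code deltas either way. `createDeltaNormalizer` is a hypothetical helper, and the prefix check is a heuristic: a genuine delta that happens to start with all the text seen so far would be misread as cumulative.

```typescript
// Returns a function that converts each incoming chunk into a delta,
// whether the source streams cumulative text or deltas.
function createDeltaNormalizer(): (chunk: string) => string {
  let seen = ""; // everything emitted so far

  return (chunk: string): string => {
    // Cumulative style: the chunk restates what we already have.
    if (seen.length > 0 && chunk.startsWith(seen)) {
      const delta = chunk.slice(seen.length);
      seen = chunk;
      return delta;
    }
    // Delta style (or the very first chunk): pass it through.
    seen += chunk;
    return chunk;
  };
}
```

Feeding it `"Hel"`, `"Hello"`, `"Hello wo"` or `"Hel"`, `"lo"`, `" wo"` produces the same three deltas, so downstream code never needs to know which style it received.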

Sessions are still settling, too. Keeping a model warm so the next call is fast, recovering a session that has drifted, and budgeting input against the context window are all real concerns today, and they’re far easier to own in one place than to rediscover in every project.
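
Budgeting is the easiest of these to sketch. Assuming the session can report how much of the context window a given input would use, and assuming that cost never shrinks as the text gets longer, a binary search finds the longest prefix that fits. The `measure` callback below is a hypothetical stand-in for whatever measuring call the API offers, injected so the sketch doesn't depend on its current name. `fitToBudget` is likewise illustrative.

```typescript
// Returns the longest prefix of `text` whose measured cost is within `budget`.
// Assumes `measure` is monotonic: longer prefixes never cost less.
async function fitToBudget(
  text: string,
  budget: number,
  measure: (input: string) => Promise<number>,
): Promise<string> {
  // Fast path: the whole text already fits.
  if ((await measure(text)) <= budget) return text;

  // Invariant: a prefix of length `lo` fits, a prefix of length `hi` does not.
  // (The empty prefix is assumed to fit.)
  let lo = 0;
  let hi = text.length;
  while (hi - lo > 1) {
    const mid = Math.floor((lo + hi) / 2);
    if ((await measure(text.slice(0, mid))) <= budget) {
      lo = mid;
    } else {
      hi = mid;
    }
  }
  return text.slice(0, lo);
}
```

This cuts on character boundaries, which is crude. A real reader app would more likely trim on sentence or paragraph breaks, but the budgeting loop stays the same.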

Tools are the most interesting case, because they’re arriving from two directions at once. The Prompt API has a tools option for function calling (native execution still in origin trial), and WebMCP, through document.modelContext, lets a page expose agent-callable tools to a visiting agent. They’re two views of the same idea, and the two are already starting to meet. A library can bridge them today and quietly become a passthrough once they fully converge, and my bet is that convergence is where the agentic web actually gets built.
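
Here is roughly what "bridge them today" could mean in code: define a tool once, then hand the same definition to both sides. Everything in this sketch is an assumption for illustration. `ToolDef` guesses at a shape both directions could accept, `ToolRegistryLike` is a hypothetical stand-in for whatever object a page registers tools with, and `bridgeTools` is not web-ai-sdk's API.

```typescript
// One tool definition shared by both directions. Illustrative shape only.
interface ToolDef {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>; // JSON Schema describing the arguments
  execute(args: Record<string, unknown>): Promise<string>;
}

// Hypothetical stand-in for a page-level tool registry.
interface ToolRegistryLike {
  registerTool(tool: ToolDef): void;
}

// Exposes each tool to visiting agents when a registry exists, and returns
// the same list in a form that could be passed along at session creation.
function bridgeTools(
  tools: ToolDef[],
  registry?: ToolRegistryLike,
): { tools: ToolDef[] } {
  if (registry) {
    for (const tool of tools) registry.registerTool(tool);
  }
  return { tools };
}

// Example tool: adds two numbers and returns the sum as text.
const addTool: ToolDef = {
  name: "add",
  description: "Add two numbers.",
  inputSchema: {
    type: "object",
    properties: { a: { type: "number" }, b: { type: "number" } },
    required: ["a", "b"],
  },
  execute: async (args) => String(Number(args.a) + Number(args.b)),
};
```

The bridge is nearly a passthrough already, which is the point: if the two surfaces converge on one shape, the function has nothing left to do.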

The spec itself is moving too, on purpose, as names and options shift toward their final form. Absorbing that churn in one layer keeps your code stable while the standard matures underneath it.

The surface keeps growing as well. What began as text in and text out now takes images and audio, and hands back structured JSON. That gives a thin layer more to deal with, not less, at least until the ergonomics around each new capability settle.

None of this is a complaint. It’s what early looks like, and early is the right time to build the connective tissue, not the wrong one.

Two Rough Apps Beat One Polished Demo

To keep the SDK honest, I built two small apps next to it, and I’ll be clear that both are exploratory and alpha as of today.

pheed is an on-device RSS reader that writes reading notes locally, so it pulls on the batch and throughput side: many short generations, content budgeted into the context window, results cached and reused.

locala is an on-device chat agent, which pulls on the interactive and agentic side: streaming a conversation, holding session state across turns, and exercising the tool-calling path end to end.

They stress the platform from opposite directions, and that’s deliberate. A general-purpose SDK shouldn’t be shaped by one app’s wishlist. Building two rough apps instead of polishing a single demo is how you tell the concerns that are genuinely shared, and worth putting in a library, from the ones that belong in your own code.

We Have Seen This Movie Before

The shape of web-ai-sdk isn’t novel, and that’s the whole idea. The most durable utility libraries tend to follow the same arc: a platform primitive ships useful but rough, a library appears to smooth the rough edges, everyone adopts it because the primitive alone is painful, the platform slowly absorbs the library’s surface, and the library thins out until it does almost nothing.

lodash ceded most of itself to native array and object methods. moment gave way to Temporal, and date-fns is reorienting to be Temporal-first. The healthy end state is a thin shim over a platform primitive that keeps getting thinner as the primitive stabilizes.

web-ai-sdk starts from that endpoint on purpose. The Built-in AI APIs are the primitive, and the SDK only owns the lifecycle and ergonomics they currently leave rough: feature detection, session reuse, stream smoothing, safe cleanup. As the APIs stabilize and converge across engines, the right outcome is for this layer to get smaller, not larger.

There’s a Kit Coming

web-ai-sdk is the building blocks, deliberately small and unopinionated, one capability per package, with composition left to your code on purpose.

But the longer I build with the blocks, the more the same shapes reappear across apps: the same compositions, the same loops, the same handful of patterns you reach for the moment you go past a single call. pheed and locala didn’t only stress the SDK, they sketched what wants to sit on top of it. That’s roughly where web-ai-kit comes in.

I’m not going to over-describe it, because it isn’t designed yet, not deeply, and I’d rather ship something honest than promise something imaginary. The blocks come first, and the kit, when it arrives, will follow the same rules they do: own only what’s worth sharing, stay out of the way of how you build, and be ready to thin out when the platform catches up.

More on that soon. For now, the blocks are already enough to build real things, which is the point.

Where It Fits

The honest framing is progressive enhancement. Built-in AI is small-model, on-device inference, which makes it excellent for tasks that are lightweight, local, and frequent (a quick summary, a language guess, a tone rewrite, a proofread) and the wrong tool for anything that needs a frontier model. You use it where it shines and fall back to the cloud where it doesn’t.

That’s the shape web-ai-sdk is built for. Each capability is its own package, feature-detected, with a no-op fallback baked in, so “use it when available, degrade when not” is the default behavior rather than something you wire up by hand. You pick the blocks you need, skip the rest, and ship the same code everywhere.
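
The "use it when available, degrade when not" pattern is small enough to sketch in full. `preferLocal` is a hypothetical helper, not the SDK's API. It takes an optional on-device implementation and a fallback, which could be a cloud call or a genuine no-op that returns the input untouched.

```typescript
// Tries the on-device path first; falls back when it is absent or fails.
async function preferLocal<T>(
  local: (() => Promise<T>) | undefined,
  fallback: () => Promise<T>,
): Promise<T> {
  // Feature missing in this browser: degrade immediately.
  if (!local) return fallback();
  try {
    return await local();
  } catch {
    // The on-device path failed at runtime: degrade instead of surfacing it.
    return fallback();
  }
}
```

A proofreading feature, for example, might pass a fallback that simply resolves to the original text, so the page behaves identically minus the enhancement. Swallowing the error is a deliberate simplification here. Real code would probably want to log it.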

The Strange Goal

Here’s the part that’s unusual to say out loud: I’m building toward my own obsolescence.

Every concern web-ai-sdk smooths over today is one I expect to disappear, though not always the way you’d assume. Some of it goes away because the platform absorbs it. When sessions are resilient by default, that code goes; when tool calling is native and aligned with WebMCP, the bridge collapses into a passthrough and then into nothing.

And some of it goes away because the problem itself dissolves. Take streaming. We stream tokens for the same reason we stream HTML from a server: the thing producing the output is slow enough that you want to show progress while it works. It’s a latency patch, not a feature. If on-device models get fast enough to answer effectively instantly, streaming stops being worth doing at all, the way streaming SSR is pointless for a page that already renders in a millisecond. The smoothing layer wins either way, by normalizing the patch or by outliving the need for it.

The best version of this project is the one that, a year or two from now, does noticeably less, because the Built-in AI APIs underneath do more, and faster. Until then, there’s a lifecycle to own and a moving spec to track, and that’s what web-ai-sdk is for. The browser is an AI runtime now, and this is the layer that makes it pleasant to build on while it grows up.


The Built-in AI APIs are available in Chrome (Prompt 148+; Summarizer, Translator, and Language Detector 138+) and in Edge on Phi-4-mini and newer on-device models. The Prompt API now supports multimodal input and structured JSON output, with tool calling and image input in origin trial, and WebMCP is in origin trial from Chrome 149. web-ai-sdk wraps these behind one typed interface, with feature-detected no-op fallbacks so the same code ships everywhere. web-ai-sdk.dev

LoFM.