Streaming AI-Generated Code in Real Time: Inside RapidNative's Architecture
By Rishav
28th May 2026
Last updated: 27th May 2026
When you type a prompt into ChatGPT, the worst-case failure mode is a paragraph that reads slightly off. When you type a prompt into RapidNative, the output is the running mobile app. Every token that streams back is a character of TypeScript that some browser-side bundler is about to try to compile and render inside an iframe — while the model is still typing.
That gap between "model is generating" and "user sees a screen update" is where most AI builders break. They either wait for the whole response (slow, dead-feeling UX), or they ship partial code to the bundler and watch the preview flicker through a dozen syntax errors per second. We've spent a lot of cycles on the boring, invisible pipework that closes that gap.
This post documents the actual transport layer: how streaming AI-generated code gets from a streamText() call on the server to a working React Native preview in the iframe, with no waiting, no broken JSX flashing on screen, and no lost data when someone's Wi-Fi drops mid-generation. It's a deep technical companion to the 4-step LLM pipeline post — that post covers what the model does; this one covers how the bytes get to your screen.
What "real-time streaming" actually means here
A short answer paragraph, since this is the question every developer asks first:
Real-time streaming in RapidNative means the model emits tokens via Server-Sent Events (SSE), the client reads each chunk through a ReadableStream reader loop, parses out individual <file> blocks from a custom <CodeProject> envelope as they complete, runs each file through streaming-safe post-processors, writes them to a Redux-backed virtual file system, and the in-browser bundler rebundles and reloads the preview iframe — all while the model is still typing the next file. From first token to first visible UI update is typically under a second.
That's eight distinct stages between the LLM and the screen. Let's walk through each one.
Stage 1: The four-step server pipeline
The AI generation endpoint lives at src/app/api/user/ai/generate-v2/route.ts and runs four steps on every request:
- Context gathering: a small, fast model with tool access (get_files_content, batch_grep, read_skills) figures out what files matter for the user's request.
- Signal parsing: the context model's output is scanned for deterministic markers like NEW_SCREEN: yes, AUTH: yes, DB: yes that tell the rest of the pipeline what to do.
- Generation tools: if the request needs database schema or auth scaffolding, dedicated tools emit those files deterministically (no LLM creativity, just template expansion).
- Code generation: the main model runs with maxSteps: 1, maxTokens: 32000, and no tools. Just pure code streaming back as text.
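The signal-parsing step can be pictured as a small line scanner. What follows is a minimal sketch rather than RapidNative's parser: the `parseSignals` name, the `Signals` shape, and the exact `MARKER: yes|no` line format are assumptions. Only the three marker names come from the list above.

```typescript
// Hypothetical sketch: extract deterministic markers from the context model's text.
interface Signals {
  newScreen: boolean;
  auth: boolean;
  db: boolean;
}

// Marker names from the post, mapped to fields on the result object.
const MARKERS: Record<string, keyof Signals> = {
  NEW_SCREEN: 'newScreen',
  AUTH: 'auth',
  DB: 'db',
};

function parseSignals(text: string): Signals {
  const signals: Signals = { newScreen: false, auth: false, db: false };
  for (const line of text.split('\n')) {
    // Assumed line shape: "MARKER: yes" or "MARKER: no", alone on its line.
    const match = /^\s*([A-Z_]+):\s*(yes|no)\s*$/.exec(line);
    if (!match) continue;
    const key = MARKERS[match[1]];
    if (key) signals[key] = match[2] === 'yes';
  }
  return signals;
}
```

Anything that does not match the marker shape is ignored, so free-form reasoning around the markers cannot change the routing decision.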
We use Vercel's AI SDK (ai v4.3.19) with the @ai-sdk/anthropic provider as the primary path, plus @openrouter/ai-sdk-provider for multi-model fallback. The context-gathering call (step 1) looks like:
const step1Result = await streamText({
  model: contextModel,
  messages: contextMessages,
  tools: contextTools,
  toolChoice: 'auto',
  temperature: 0.7,
  maxTokens: 8000,
  maxSteps: 10,
  abortSignal: streamAbortController.signal,
});

for await (const chunk of step1Result.fullStream) {
  if (chunk.type === 'text-delta') accumulatedText += chunk.textDelta;
  if (chunk.type === 'tool-call') emitToolStatus(chunk);
  if (chunk.type === 'tool-result') captureResult(chunk);
}
The fullStream is an AsyncIterable that gives us a typed view of every event the model emits: text deltas, tool calls, tool results, finish reasons, usage stats. We don't pipe this straight to the client — we re-encode it as our own SSE event format so the client can distinguish UI status messages from actual code.
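The re-encoding step can be sketched as a pure function from stream chunks to SSE frames. This is an illustration, not the route's code: `StreamChunk` is a local stand-in covering only the fields used here, `encodeChunk` is a hypothetical name, and the payload shapes are assumptions (the `content` field matches what the client loop reads from `text` events).

```typescript
// Local stand-in for the chunk shapes used here; not the SDK's full type.
type StreamChunk =
  | { type: 'text-delta'; textDelta: string }
  | { type: 'tool-call'; toolName: string }
  | { type: 'tool-result'; toolName: string; result: unknown }
  | { type: 'other' };

function formatSSE(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Returns the SSE frame for a chunk, or null when the chunk is not forwarded.
function encodeChunk(chunk: StreamChunk): string | null {
  switch (chunk.type) {
    case 'text-delta':
      return formatSSE('text', { content: chunk.textDelta });
    case 'tool-call':
      return formatSSE('tool_call', { toolName: chunk.toolName });
    case 'tool-result':
      // Assumption: only the tool name is surfaced to the UI, not the raw result.
      return formatSSE('tool_result', { toolName: chunk.toolName });
    default:
      return null;
  }
}
```

Because JSON.stringify escapes newlines, each payload stays on a single data: line, which is what lets the client parse the stream line by line.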
Stage 2: The SSE event format
Every event the server emits to the browser is one of these:
function formatSSE(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}
The named events are: start, text, tool_call, tool_result, usage, done, error, and screen_limit_exceeded. Splitting them out (instead of mashing everything into a single data: channel) lets the client switch on event type without parsing JSON it doesn't need. The response headers are the standard SSE incantation plus one custom field:
return new Response(stream, {
  headers: {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache, no-transform',
    'Connection': 'keep-alive',
    'X-Accel-Buffering': 'no',
    'X-Stream-Id': streamId,
  },
});
X-Accel-Buffering: no matters more than people realise. Without it, a buffering proxy in front of the function (Nginx, or Vercel's edge proxy) can hold events in its buffer until the buffer fills or the response ends, which defeats the entire point. X-Stream-Id is the magic field that makes reconnection work, which brings us to the hardest part of streaming AI-generated code.
Stage 3: Resilient SSE — surviving network drops
A four-step LLM pipeline can run for 90 seconds on a complex request. In 90 seconds, plenty of users will switch Wi-Fi networks, walk into an elevator, or close their laptop. Naive SSE drops the entire generation on the floor when that happens.
We use persistent-request-response (v0.1.2) on top of Vercel KV (primary) and Upstash Redis (fallback). Every SSE event written to the response is also written to a KV-backed buffer keyed by streamId, with a 10-minute TTL:
export const resilientSSE = createResilientSSE({
  kv: resolveAdapter(),
  ttl: 600,
});
When the client reconnects, it sends back the last event ID it saw, and the server replays everything since that point from the KV buffer. From the user's perspective, the generation just continues — there's no "your request failed, please try again" toast, no half-finished file in the editor.
This is the single most impactful piece of infrastructure for perceived reliability. The model can be flawless and the LLM provider can be up, but if a 30-second mobile carrier handoff kills the stream, the user sees a failure. KV-backed SSE persistence turns that from a fatal error into a 200ms reconnect.
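The replay idea can be sketched with an in-memory map standing in for the KV store. This is not the API of persistent-request-response: `ReplayBuffer`, `append`, and `replayAfter` are hypothetical names, event IDs are assumed to be per-stream counters, and the TTL is left out.

```typescript
// Hypothetical in-memory stand-in for the KV-backed buffer; not the library's API.
interface BufferedEvent {
  id: number; // monotonically increasing per stream
  event: string;
  data: unknown;
}

class ReplayBuffer {
  private streams = new Map<string, BufferedEvent[]>();

  // Record an event and return the ID the client should remember.
  append(streamId: string, event: string, data: unknown): number {
    const events = this.streams.get(streamId) ?? [];
    const id = events.length + 1;
    events.push({ id, event, data });
    this.streams.set(streamId, events);
    return id;
  }

  // Everything the client missed: events with an ID greater than the last one it saw.
  replayAfter(streamId: string, lastEventId: number): BufferedEvent[] {
    return (this.streams.get(streamId) ?? []).filter((e) => e.id > lastEventId);
  }
}
```

On reconnect, a server built this way would first write the result of replayAfter(streamId, lastEventId) to the new response and then continue with live events, so the client sees one uninterrupted sequence.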
Stage 4: The client streaming consumer
On the client, we don't use useChat from ai/react. We need finer control over file extraction, debouncing, and abort handling than the hook gives us, so the streaming consumer is hand-rolled in a Redux thunk at src/modules/editor/store/thunks/editorThunks.ts.
The core loop is a textbook ReadableStreamDefaultReader.read() until done, but the SSE parsing is the interesting part — a tiny finite state machine that handles the line-by-line nature of the protocol without ever assuming a chunk arrives cleanly framed:
let sseBuffer = '';
let currentEvent = '';
let currentData = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  sseBuffer += decoder.decode(value, { stream: true });
  const lines = sseBuffer.split('\n');
  sseBuffer = lines.pop() || ''; // keep the incomplete trailing line
  for (const line of lines) {
    if (line.startsWith('event: ')) currentEvent = line.slice(7).trim();
    else if (line.startsWith('data: ')) {
      currentData = line.slice(6);
      if (currentEvent && currentData) {
        const data = JSON.parse(currentData);
        switch (currentEvent) {
          case 'text': streamedContent += data.content; break;
          case 'done': lastFinishReason = data.finishReason; break;
          case 'screen_limit_exceeded': showLimitToast(data); break;
          case 'error': throw new Error(data.message);
        }
      }
      currentEvent = '';
      currentData = '';
    }
  }
}
The trick is sseBuffer = lines.pop(). SSE frames are separated by \n\n, but the chunks returned by reader.read() follow network boundaries, not line boundaries, so a chunk can end halfway through a line. We always put the last line back in the buffer because it might be incomplete. This is the one bug that everyone implementing SSE manually hits the first time.
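Here is the same buffering idea isolated from the SSE parsing, so the behaviour is easy to see. `LineBuffer` is a hypothetical helper for illustration, not part of the thunk.

```typescript
// Sketch of the buffering idea in isolation: feed arbitrary chunks, get whole lines.
class LineBuffer {
  private pending = '';

  // Returns only complete lines; the unterminated tail waits for the next chunk.
  push(chunk: string): string[] {
    this.pending += chunk;
    const lines = this.pending.split('\n');
    this.pending = lines.pop() ?? '';
    return lines;
  }
}
```

Pushing 'event: te' yields no lines at all. Once the next chunk supplies the rest of the line and its newline, the completed 'event: text' line comes out whole, and whatever follows the last newline is held back again.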
Abort handling is wired through a standard AbortController. The Stop Generation button calls abortController.abort(), which fires a listener that calls reader.cancel(), which makes the read() promise resolve with { done: true }, which exits the loop cleanly. The same AbortSignal is also passed to persistentFetch so it stops trying to reconnect.
Stage 5: From text stream to file blocks
So far we've described how text gets from the model to a string in memory. But the user wants files, not text. The model emits structured output wrapped in custom tags:
<CodeProject>
```tsx file="app/index.tsx"
export default function HomeScreen() {
return <View><Text>Hello</Text></View>;
}
export const users = pgTable('users', { ... });
```
After every SSE text event, we re-run transformMessageContent(streamedContent) from src/shared/utils/messageTransformer.ts. This parser does not require well-formed input — it has to handle a half-finished file block where the closing ``` hasn't arrived yet. It returns an array of Block objects, each with a files array, and each file has an isComplete: boolean flag that tells downstream code whether to save it now or wait.
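A tolerant parser of this kind can be sketched in a few lines. This is not transformMessageContent itself: `extractFiles` and `StreamedFile` are hypothetical, the sketch returns a flat list instead of Block objects, and it assumes every file opens with a fence line carrying a file="..." attribute, as in the example above.

```typescript
interface StreamedFile {
  path: string;
  content: string;
  isComplete: boolean; // true once the closing fence has arrived
}

// Three backticks, built at runtime so this sample can sit inside a fenced block.
const FENCE = '`'.repeat(3);

function extractFiles(streamed: string): StreamedFile[] {
  const files: StreamedFile[] = [];
  let current: StreamedFile | null = null;
  let body: string[] = [];

  for (const line of streamed.split('\n')) {
    if (current === null) {
      // Assumed opening shape: fence, optional language, then file="path".
      const open = /^`{3}\w*\s+file="([^"]+)"\s*$/.exec(line);
      if (open) {
        current = { path: open[1], content: '', isComplete: false };
        body = [];
      }
    } else if (line.trim() === FENCE) {
      files.push({ ...current, content: body.join('\n'), isComplete: true });
      current = null;
    } else {
      body.push(line);
    }
  }
  // The stream stopped inside a file: report what we have, flagged incomplete.
  if (current !== null) {
    files.push({ ...current, content: body.join('\n'), isComplete: false });
  }
  return files;
}
```

Re-running this over the whole accumulated string after every text event is quadratic in principle, but it keeps the parser stateless, which makes reconnects and replays trivial to reason about.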
We treat different file types differently during streaming:
- Deterministic files (emitted by the auth/DB tools, not the main model): save once immediately on completion, no further processing.
- .ts, .js, .json files: save immediately when isComplete flips to true, no debounce. These often have ordering dependencies (a seeds/users.ts must save before seeds/index.ts references it).
- .tsx and .jsx files: 100ms debounced save. Every save triggers a Babel transform inside the bundler worker, which is the most expensive operation in the pipeline. Debouncing collapses many partial writes into one final transform per file.
- _layout.tsx files: skipped entirely during streaming. Layout files re-evaluate the entire Expo Router tree, and mid-stream layout updates cause visible flickering. They're written once at the end.
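The rules above reduce to a small classification function. A sketch, with `savePolicy` and the `SavePolicy` labels as hypothetical names; the rules themselves are the ones listed.

```typescript
type SavePolicy = 'immediate' | 'debounced' | 'deferred';

function savePolicy(path: string, isDeterministic = false): SavePolicy {
  // Tool-emitted files are final as soon as they exist.
  if (isDeterministic) return 'immediate';
  const name = path.split('/').pop() ?? path;
  // Layout files are written once, after the stream ends.
  if (name === '_layout.tsx') return 'deferred';
  // Component files trigger a Babel transform, so partial writes are collapsed (100ms).
  if (/\.(tsx|jsx)$/.test(name)) return 'debounced';
  // .ts, .js, .json: saved as soon as the block is complete.
  return 'immediate';
}
```

The layout check has to come before the extension check, otherwise _layout.tsx would be treated as an ordinary component file and debounced.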
Stage 6: Streaming-safe post-processors
When you have half a .tsx file in your hands, you can't just hand it to Babel and hope. It will throw on an unclosed JSX tag, a missing </View>, an import { useState, with a trailing comma and no closing brace. So before any partial file is committed to the virtual file system, it runs through a chain of streaming-safe processors in src/shared/utils/postProcessor.ts:
const STREAMING_SAFE_PROCESSORS = [
  jsonValidator,
  contentCleaner,      // strips leaked markdown fences
  jsxFixer,            // closes unclosed tags, adds missing parens
  importDeduplicator,
  importFixer,
  syntaxValidator,
];

export function postProcessFileStreaming(filename: string, content: string) {
  let files = [{ filename, content, isValid: true }];
  for (const processor of STREAMING_SAFE_PROCESSORS) {
    try { files = processor.fn(files).files; }
    catch (e) { /* never break streaming */ }
  }
  return { content: files[0]?.content, isValid: files[0]?.isValid !== false };
}
The contract for a streaming-safe processor is strict: never throw, never make the code worse, return isValid: false if you can't help. If isValid comes back false, we don't save that revision and wait for more tokens. This is what prevents the iframe from cycling through fifty syntax errors as the model types.
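To make the contract concrete, here is what one processor obeying it might look like. The real jsonValidator is not shown in this post, so `jsonValidatorSketch` is a guess at its shape; the `ProcessedFile` and `Processor` types are inferred from how postProcessFileStreaming calls `processor.fn(files).files`.

```typescript
interface ProcessedFile {
  filename: string;
  content: string;
  isValid: boolean;
}

interface Processor {
  name: string;
  fn: (files: ProcessedFile[]) => { files: ProcessedFile[] };
}

// Hypothetical processor: flags incomplete JSON, leaves every other file alone.
const jsonValidatorSketch: Processor = {
  name: 'jsonValidator',
  fn: (files) => ({
    files: files.map((file) => {
      if (!file.filename.endsWith('.json')) return file; // not ours: pass through untouched
      try {
        JSON.parse(file.content);
        return file;
      } catch {
        // Can't help with half a JSON document: report it, never throw, never edit.
        return { ...file, isValid: false };
      }
    }),
  }),
};
```

All three rules are visible here: the parse failure is caught inside the processor, the content is returned byte-for-byte unchanged, and the only signal sent downstream is the isValid flag.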
The JSX fixer in particular is doing a lot of work. It walks the partial source with @babel/parser in errorRecovery: true mode, finds the deepest unclosed JSX element on each branch, and synthesises closing tags. It's not a real fix — the next token might invalidate it — but it's "valid enough to compile" until the real closing tag arrives.
Stage 7: The virtual file system and Redux
Saved files don't go to disk. They go to a Redux-managed virtual file system (VFS) defined in src/modules/file/instances.ts. Every saveFile() thunk dispatch:
- Updates the Redux projectFiles slice with the new content.
- Fires VFS watchers registered via vfs.watch().
- The watcher sends a delta ({ path, content, type: 'change' }) to a Web Worker over postMessage.
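The watcher half of that flow can be sketched without Redux or a real worker. `TinyVfs` is a hypothetical stand-in for the VFS in src/modules/file/instances.ts; only the watch() name and the delta shape come from the description above, and skipping identical writes is an added assumption.

```typescript
interface FileDelta {
  path: string;
  content: string;
  type: 'change';
}

type Watcher = (delta: FileDelta) => void;

class TinyVfs {
  private files = new Map<string, string>();
  private watchers = new Set<Watcher>();

  // Register a watcher; the returned function unsubscribes it.
  watch(watcher: Watcher): () => void {
    this.watchers.add(watcher);
    return () => {
      this.watchers.delete(watcher);
    };
  }

  saveFile(path: string, content: string): void {
    // Assumption: identical content is a no-op, so the bundler is not woken for nothing.
    if (this.files.get(path) === content) return;
    this.files.set(path, content);
    this.watchers.forEach((watcher) => watcher({ path, content, type: 'change' }));
  }
}
```

In the browser, the watcher body would be a single worker.postMessage(delta) call. The delta is a plain object of strings, so it survives structured cloning as-is.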
We use Redux Toolkit (@reduxjs/toolkit v2.8.2) for state because the editor has dozens of components that need to react to file changes — the file tree, the open editor tabs, the diff view, the activity log. A single subscribable store with shallow equality checks is the pragmatic answer.
The thing nobody tells you about streaming UI: Redux is fast enough. We were worried that dispatching a setMessages action on every SSE text chunk (potentially 50/second) would tank performance. It doesn't. React 18's automatic batching plus selector memoization eats it. The expensive operation is downstream, in the bundler.
Stage 8: The browser bundler and iframe blob URLs
The preview is the part that surprises most engineers when they hear how it works. There is no server-side build. There is no Metro server hosted on Vercel. The bundler runs entirely in the user's browser, inside a Web Worker, using browser-metro (v1.0.18) and @shaper-studio/almostmetro — JavaScript ports of Metro's core logic.
When the VFS watcher sends a file delta to the worker, the worker:
- Updates its own in-memory file map.
- Re-runs the dependency graph from the entry point.
- Babel-transforms changed files.
- Emits a new combined bundle as a string.
- Calls back to the main thread via client.onBundle(code).
Back on the main thread, the bundle string gets wrapped in a Blob and a blob URL:
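client.onBundle(async (code: string) => {
  ps.latestBundle = code;
  // Release the previous bundle's blob URL so old bundles can be garbage-collected
  // (assumes nothing else still holds the old URL).
  if (ps.htmlBlobUrl) URL.revokeObjectURL(ps.htmlBlobUrl);
  const blob = new Blob([code], { type: 'application/javascript' });
  ps.htmlBlobUrl = URL.createObjectURL(blob);
  // Point every registered preview iframe at the new bundle.
  Object.values(localStore.iframeRefs).forEach((iframe) => {
    iframe.src = ps.htmlBlobUrl;
  });
});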
Setting iframe.src to the blob URL reloads the iframe with the latest bundle. From the iframe's perspective, it's just loading a static JS file — no service worker tricks, no HMR runtime, no message passing. The iframe boots a stripped-down Expo Router shell from tools/rapidnative-expo-router/, evaluates the bundle, and renders. The whole loop from "model emits closing JSX tag" to "iframe shows new component" is typically 200-400ms.
If a single file fails to compile (a syntax error we couldn't recover from), the bundler replaces it with a BrokenComponentStub in the dependency graph rather than failing the whole build. Other screens keep working; only the broken one shows a fallback. This matters because users frequently iterate on one screen while three others sit idle — they shouldn't all break together.
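The isolation behaviour amounts to a try/catch around each per-file transform. In this sketch, `transformAll`, `stubFor`, and the stub's module shape are all hypothetical; the post only names BrokenComponentStub, and the real bundler works on a dependency graph rather than a flat record.

```typescript
type Transform = (path: string, source: string) => string;

// Hypothetical stub module body; the real BrokenComponentStub is not shown in the post.
const stubFor = (path: string, message: string): string =>
  `module.exports = { __broken: true, path: ${JSON.stringify(path)}, message: ${JSON.stringify(message)} };`;

function transformAll(
  files: Record<string, string>,
  transform: Transform,
): { modules: Record<string, string>; broken: string[] } {
  const modules: Record<string, string> = {};
  const broken: string[] = [];
  for (const [path, source] of Object.entries(files)) {
    try {
      modules[path] = transform(path, source);
    } catch (error) {
      // One bad file becomes a stub; the rest of the build carries on.
      broken.push(path);
      modules[path] = stubFor(path, error instanceof Error ? error.message : String(error));
    }
  }
  return { modules, broken };
}
```

The bundle always contains an entry for every path, so modules that import the broken file still resolve it and get the fallback instead of a missing-module error.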
What this buys us over the obvious approach
The obvious approach is: wait for the full LLM response, save the files to the server, push a server-side build, send a "build complete" event to the client, reload the iframe. It would work. Every step would be simpler.
What it would cost:
| Metric | Streaming pipeline | Wait-for-complete approach |
|---|---|---|
| Time to first visible update | 200-800ms | 30-60s |
| Network resilience | Reconnect mid-stream | Restart from scratch |
| Perceived progress | Tokens visible as they arrive | Spinner |
| Mid-generation interruption | Partial work preserved | Lost |
| Bundler load | Spread across stream | Spike at the end |
The cost on the engineering side is real — streaming-safe post-processors, debounced VFS writes, an SSE finite state machine, KV-backed reconnection — but the UX gap between the two approaches is enormous. Users feel the difference within the first second of using the product.
What we'd build differently if starting today
Three things, in order:
1. Use the AI SDK's data stream protocol instead of a custom SSE format. When we built this, the SDK's stream protocol was less mature, so we rolled our own. The SDK has since standardised on a clean SSE format with built-in tool-call streaming. We could delete a few hundred lines of glue code by adopting it.
2. Move post-processors into a Web Worker. Right now JSX fixing runs on the main thread between SSE events. It's fast (single-digit milliseconds per file), but it blocks paint. A worker would let us run heavier processors — TypeScript type checking, ESLint autofix — without UI cost.
3. Stream partial bundles instead of full rebuilds. The bundler rebuilds the full graph on each file change. With dependency tracking, we could emit only the changed module and a tiny HMR shim instead. It'd cut bundle time on large projects from ~300ms to ~30ms.
See it for yourself
If you want to watch this pipeline in action, the easiest path is to open a new project on RapidNative, type a prompt like "build a habit tracker with a calendar view and a streak counter", and open your browser's Network tab to watch the SSE stream come back. The generate-v2 request will be open for 30-60 seconds; you'll see event: text lines flowing while the preview iframe keeps reloading with each new file the model commits.
For more on the LLM side of the pipeline, our 4-step LLM pipeline post covers context gathering, tool routing, and why we don't let the main model use tools. For the rendering side, our live preview deep dive covers how the browser bundler interacts with real devices over a QR-coded preview URL.
Start building a React Native app from a prompt and watch the streaming pipeline run on your own request.