Streaming AI-Generated Code in Real Time: Inside RapidNative's Architecture

By Rishav

28th May 2026

Last updated: 27th May 2026

When you type a prompt into ChatGPT, the worst-case failure mode is a paragraph that reads slightly off. When you type a prompt into RapidNative, the output is the running mobile app. Every token that streams back is a character of TypeScript that a browser-side bundler is about to compile and render inside an iframe, while the model is still typing.

That gap between "model is generating" and "user sees a screen update" is where most AI builders break. They either wait for the whole response (slow, dead-feeling UX), or they ship partial code to the bundler and watch the preview flicker through a dozen syntax errors per second. We've spent a lot of cycles on the boring, invisible pipework that closes that gap.

This post documents the actual transport layer: how streaming AI-generated code gets from a streamText() call on the server to a working React Native preview in the iframe, with no waiting, no broken JSX flashing on screen, and no lost data when someone's Wi-Fi drops mid-generation. It's a deep technical companion to the 4-step LLM pipeline post — that post covers what the model does; this one covers how the bytes get to your screen.

Code streaming on screen Photo by Markus Spiske on Unsplash

What "real-time streaming" actually means here

The short answer first, since this is the question every developer asks:

Real-time streaming in RapidNative means the model emits tokens via Server-Sent Events (SSE), the client reads each chunk through a ReadableStream reader loop, parses out individual <file> blocks from a custom <CodeProject> envelope as they complete, runs each file through streaming-safe post-processors, and writes them to a Redux-backed virtual file system. The in-browser bundler then rebundles and reloads the preview iframe, all while the model is still typing the next file. The time from first token to first visible UI update is typically under a second.

That's eight distinct stages between the LLM and the screen. Let's walk through each one.

Stage 1: The four-step server pipeline

The AI generation endpoint lives at src/app/api/user/ai/generate-v2/route.ts and runs four stages on every request:

  1. Context gathering — a small, fast model with tool access (get_files_content, batch_grep, read_skills) figures out what files matter for the user's request.
  2. Signal parsing — the context model's output is scanned for deterministic markers like NEW_SCREEN: yes, AUTH: yes, DB: yes that tell the rest of the pipeline what to do.
  3. Generation tools — if the request needs database schema or auth scaffolding, dedicated tools emit those files deterministically (no LLM creativity, just template expansion).
  4. Code generation — the main model runs with maxSteps: 1, maxTokens: 32000, and no tools. Just pure code streaming back as text.
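Step 2 is the easiest one to picture in code. Here is a minimal sketch of marker parsing, assuming each marker sits on its own line in the form NAME: yes or NAME: no. The marker names come from the list above; parseSignals and the Signals type are hypothetical names for illustration, not the production parser.

```typescript
// Hypothetical sketch: marker names are from the post, the parser shape is assumed.
type Signals = Record<string, boolean>;

// Scan the context model's output line by line for "NAME: yes" / "NAME: no".
function parseSignals(contextOutput: string): Signals {
  const signals: Signals = {};
  // Upper-case marker name, a colon, then yes or no, and nothing else on the line.
  const marker = /^([A-Z][A-Z_]*):\s*(yes|no)\s*$/;
  for (const line of contextOutput.split('\n')) {
    const match = marker.exec(line.trim());
    if (match) signals[match[1]] = match[2] === 'yes';
  }
  return signals;
}

const signals = parseSignals(
  'Plan: add a login screen\nNEW_SCREEN: yes\nAUTH: yes\nDB: no',
);
```

Because the markers are matched with a strict pattern, free-form prose such as the "Plan:" line is ignored, which is what makes the step deterministic.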

We use Vercel's AI SDK (ai v4.3.19) with the @ai-sdk/anthropic provider as the primary path, plus @openrouter/ai-sdk-provider for multi-model fallback. The context-gathering call (step 1) looks like:

const step1Result = await streamText({
  model: contextModel,
  messages: contextMessages,
  tools: contextTools,
  toolChoice: 'auto',
  temperature: 0.7,
  maxTokens: 8000,
  maxSteps: 10,
  abortSignal: streamAbortController.signal,
});

for await (const chunk of step1Result.fullStream) {
  if (chunk.type === 'text-delta')  accumulatedText += chunk.textDelta;
  if (chunk.type === 'tool-call')   emitToolStatus(chunk);
  if (chunk.type === 'tool-result') captureResult(chunk);
}

The fullStream is an AsyncIterable that gives us a typed view of every event the model emits: text deltas, tool calls, tool results, finish reasons, usage stats. We don't pipe this straight to the client — we re-encode it as our own SSE event format so the client can distinguish UI status messages from actual code.

Stage 2: The SSE event format

Every event the server emits to the browser goes through a single formatter:

function formatSSE(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}
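To make the re-encoding concrete, here is a rough sketch of how one model event could become one frame. The StreamPart union is a simplified subset of the SDK's stream parts, encodePart is a hypothetical name, and the tool_call payload shape is an assumption; only content and finishReason appear in our client code.

```typescript
// Simplified subset of the AI SDK v4 stream parts. Payload shapes other than
// `content` and `finishReason` are illustrative assumptions.
type StreamPart =
  | { type: 'text-delta'; textDelta: string }
  | { type: 'tool-call'; toolName: string }
  | { type: 'finish'; finishReason: string };

// Same helper as above, repeated so the snippet runs on its own.
function formatSSE(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Hypothetical mapping from one model event to one named SSE frame.
function encodePart(part: StreamPart): string {
  switch (part.type) {
    case 'text-delta': return formatSSE('text', { content: part.textDelta });
    case 'tool-call':  return formatSSE('tool_call', { tool: part.toolName });
    case 'finish':     return formatSSE('done', { finishReason: part.finishReason });
  }
}

// A code delta that contains a line break, as most do.
const frame = encodePart({ type: 'text-delta', textDelta: 'const a = 1;\nconst b = 2;' });

// JSON.stringify escapes the inner line break, so the only raw newlines are
// the one after the event line and the two that terminate the frame.
const rawNewlines = frame.split('\n').length - 1;
```

The JSON encoding is doing real work here: generated code is full of newlines, and escaping them keeps every payload on a single data: line, so the client never needs multi-line data handling.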

The named events are: start, text, tool_call, tool_result, usage, done, error, and screen_limit_exceeded. Splitting them out (instead of mashing everything into a single data: channel) lets the client switch on event type without parsing JSON it doesn't need. The response headers are the standard SSE incantation plus one custom field:

return new Response(stream, {
  headers: {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache, no-transform',
    'Connection': 'keep-alive',
    'X-Accel-Buffering': 'no',
    'X-Stream-Id': streamId,
  },
});

X-Accel-Buffering: no matters more than people realise — without it, Nginx/Vercel's edge proxy will hold your stream in a buffer until you flush, which defeats the entire point. X-Stream-Id is the magic field that makes reconnection work, which brings us to the hardest part of streaming AI-generated code.

Stage 3: Resilient SSE — surviving network drops

A four-step LLM pipeline can run for 90 seconds on a complex request. In 90 seconds, plenty of users will switch Wi-Fi networks, walk into an elevator, or close their laptop. Naive SSE drops the entire generation on the floor when that happens.

We use persistent-request-response (v0.1.2) on top of Vercel KV (primary) and Upstash Redis (fallback). Every SSE event written to the response is also written to a KV-backed buffer keyed by streamId, with a 10-minute TTL:

export const resilientSSE = createResilientSSE({
  kv: resolveAdapter(),
  ttl: 600,
});

When the client reconnects, it sends back the last event ID it saw, and the server replays everything since that point from the KV buffer. From the user's perspective, the generation just continues — there's no "your request failed, please try again" toast, no half-finished file in the editor.

This is the single most impactful piece of infrastructure for perceived reliability. The model can be flawless and the LLM provider can be up, but if a 30-second mobile carrier handoff kills the stream, the user sees a failure. KV-backed SSE persistence turns that from a fatal error into a 200ms reconnect.
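The replay idea itself is small. Below is an in-memory sketch of it, under the assumption that each event gets a monotonically increasing ID per stream. ReplayBuffer, append, and replaySince are hypothetical names; this is not the persistent-request-response API, and it leaves out the KV storage and the TTL.

```typescript
// Hypothetical in-memory stand-in for a KV-backed replay buffer.
// Names and shapes are assumptions made for illustration only.
interface BufferedEvent {
  id: number;    // monotonically increasing per stream
  frame: string; // the already-formatted SSE frame
}

class ReplayBuffer {
  private streams = new Map<string, BufferedEvent[]>();

  // Record a frame as it is written to the live response; returns its id.
  append(streamId: string, frame: string): number {
    const events = this.streams.get(streamId) ?? [];
    const id = events.length + 1;
    events.push({ id, frame });
    this.streams.set(streamId, events);
    return id;
  }

  // On reconnect, return every frame after the last one the client saw.
  replaySince(streamId: string, lastSeenId: number): string[] {
    const events = this.streams.get(streamId) ?? [];
    return events.filter((e) => e.id > lastSeenId).map((e) => e.frame);
  }
}

const buffer = new ReplayBuffer();
buffer.append('stream-1', 'event: start\ndata: {}\n\n');
buffer.append('stream-1', 'event: text\ndata: {"content":"a"}\n\n');
buffer.append('stream-1', 'event: text\ndata: {"content":"b"}\n\n');

// The client saw events 1 and 2 before the connection dropped.
const missed = buffer.replaySince('stream-1', 2);
```

Storing the formatted frame rather than the raw payload means a replay is a straight write to the new response, with no re-encoding step that could drift from what the first connection sent.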

Server room with cables Photo by Taylor Vick on Unsplash

Stage 4: The client streaming consumer

On the client, we don't use useChat from ai/react. We need finer control over file extraction, debouncing, and abort handling than the hook gives us, so the streaming consumer is hand-rolled in a Redux thunk at src/modules/editor/store/thunks/editorThunks.ts.

The core loop is a textbook ReadableStreamDefaultReader.read() until done, but the SSE parsing is the interesting part — a tiny finite state machine that handles the line-by-line nature of the protocol without ever assuming a chunk arrives cleanly framed:

let sseBuffer = '';
let currentEvent = '';
let currentData = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  sseBuffer += decoder.decode(value, { stream: true });
  const lines = sseBuffer.split('\n');
  sseBuffer = lines.pop() || '';  // keep the incomplete trailing line

  for (const line of lines) {
    if (line.startsWith('event: ')) currentEvent = line.slice(7).trim();
    else if (line.startsWith('data: ')) {
      currentData = line.slice(6);
      if (currentEvent && currentData) {
        const data = JSON.parse(currentData);
        switch (currentEvent) {
          case 'text': streamedContent += data.content; break;
          case 'done': lastFinishReason = data.finishReason; break;
          case 'screen_limit_exceeded': showLimitToast(data); break;
          case 'error': throw new Error(data.message);
        }
      }
      currentEvent = '';
      currentData = '';
    }
  }
}

The trick is sseBuffer = lines.pop(). SSE frames are separated by \n\n, but the byte chunks returned by reader.read() can end anywhere, including in the middle of a line. We always push the last line back into the buffer because it might be incomplete, and the next chunk completes it. This is the one bug that everyone implementing SSE manually hits the first time.
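The buffering behaviour is easy to verify in isolation. The sketch below pulls the same line-splitting logic into a standalone function so it can be fed deliberately awkward chunks. createSSEParser and SSEHandler are hypothetical names, and the switch on event type is replaced by a plain callback.

```typescript
type SSEHandler = (event: string, data: unknown) => void;

// Returns a feed() function that accepts arbitrarily split text chunks.
function createSSEParser(onEvent: SSEHandler): (chunk: string) => void {
  let buffer = '';
  let currentEvent = '';

  return (chunk: string) => {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop() || ''; // hold back the possibly incomplete last line

    for (const line of lines) {
      if (line.startsWith('event: ')) {
        currentEvent = line.slice(7).trim();
      } else if (line.startsWith('data: ') && currentEvent) {
        onEvent(currentEvent, JSON.parse(line.slice(6)));
        currentEvent = '';
      }
    }
  };
}

const received: Array<{ event: string; data: unknown }> = [];
const feed = createSSEParser((event, data) => {
  received.push({ event, data });
});

// One frame delivered in three pieces, split mid-line twice.
feed('event: te');
feed('xt\ndata: {"content":"<Vi');
feed('ew>"}\n\n');
```

Without the held-back line, the second chunk would try to JSON.parse a truncated payload and throw. With it, the handler fires exactly once, after the final chunk completes the data line.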

Abort handling is wired through a standard AbortController. The Stop Generation button calls abortController.abort(), which fires a listener that calls reader.cancel(), which makes the read() promise resolve with { done: true }, which exits the loop cleanly. The same AbortSignal is also passed to persistentFetch so it stops trying to reconnect.

Stage 5: From text stream to file blocks

So far we've described how text gets from the model to a string in memory. But the user wants files, not text. The model emits structured output wrapped in custom tags:

<CodeProject>
```tsx file="app/index.tsx"
export default function HomeScreen() {
  return <View><Text>Hello</Text></View>;
}
```
```ts file="…"
export const users = pgTable('users', { ... });
```
</CodeProject>

After every SSE text event, we re-run transformMessageContent(streamedContent) from src/shared/utils/messageTransformer.ts. This parser does not require well-formed input — it has to handle a half-finished file block where the closing ``` hasn't arrived yet. It returns an array of Block objects, each with a files array, and each file has an isComplete: boolean flag that tells downstream code whether to save it now or wait.
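A simplified sketch of that tolerance is shown here. It assumes file blocks open with a fence line carrying a file="..." attribute and close with a bare fence, as in the example above. extractFiles and StreamedFile are hypothetical names, the constants/colors.ts path is made up for the demo, and the real transformMessageContent handles more cases than this.

```typescript
interface StreamedFile {
  path: string;
  content: string;
  isComplete: boolean; // true once the closing fence has arrived
}

const FENCE = '`'.repeat(3);

// Hypothetical simplified parser: tolerates a final block with no closing fence.
function extractFiles(streamed: string): StreamedFile[] {
  const files: StreamedFile[] = [];
  let current: { path: string; lines: string[] } | null = null;

  for (const line of streamed.split('\n')) {
    if (current === null) {
      // Opening fence: three backticks, optional language, then file="path".
      const open = /^`{3}\w*\s+file="([^"]+)"\s*$/.exec(line);
      if (open) current = { path: open[1], lines: [] };
    } else if (line.trim() === FENCE) {
      files.push({ path: current.path, content: current.lines.join('\n'), isComplete: true });
      current = null;
    } else {
      current.lines.push(line);
    }
  }
  // A block still open at the end of the buffer is a file in progress.
  if (current !== null) {
    files.push({ path: current.path, content: current.lines.join('\n'), isComplete: false });
  }
  return files;
}

// The stream so far: one finished file, one cut off mid-tag.
const partial = [
  '<CodeProject>',
  `${FENCE}ts file="constants/colors.ts"`,
  "export const primary = '#2563eb';",
  FENCE,
  `${FENCE}tsx file="app/index.tsx"`,
  'export default function HomeScreen() {',
  '  return <View><Te',
].join('\n');

const files = extractFiles(partial);
```

Re-parsing the whole accumulated string on every event is wasteful on paper, but it keeps the parser stateless, and a stateless parser is much easier to reason about after a reconnect replays events.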

We treat different file types differently during streaming:

  • Deterministic files (emitted by the auth/DB tools, not the main model): save once immediately on completion, no further processing.
  • .ts, .js, .json files: save immediately when isComplete flips to true, no debounce. These often have ordering dependencies (a seeds/users.ts must save before seeds/index.ts references it).
  • .tsx and .jsx files: 100ms debounced save. Every save triggers a Babel transform inside the bundler worker, which is the most expensive operation in the pipeline. Debouncing collapses many partial writes into one final transform per file.
  • _layout.tsx files: skipped entirely during streaming. Layout files re-evaluate the entire Expo Router tree, and mid-stream layout updates cause visible flickering. They're written once at the end.
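The rules above boil down to a small classifier plus a "latest write wins" queue. This sketch covers model-generated files only (deterministic tool output is left out), and savePolicyFor and PendingWrites are hypothetical names. The timer itself is omitted: flush() stands in for what the 100ms debounce timer would call.

```typescript
type SavePolicy = 'immediate' | 'debounced' | 'deferred';

// Hypothetical classifier for model-generated files, mirroring the rules above.
function savePolicyFor(path: string): SavePolicy {
  const name = path.split('/').pop() || path;
  if (name === '_layout.tsx') return 'deferred';      // written once at the end
  if (/\.(tsx|jsx)$/.test(name)) return 'debounced';  // 100ms debounce
  return 'immediate';                                 // .ts, .js, .json
}

// What a debounce buys: only the newest content per path survives until the
// timer fires. flush() is what the debounce timer would call.
class PendingWrites {
  private latest = new Map<string, string>();

  queue(path: string, content: string): void {
    this.latest.set(path, content);
  }

  flush(save: (path: string, content: string) => void): number {
    const count = this.latest.size;
    this.latest.forEach((content, path) => save(path, content));
    this.latest.clear();
    return count;
  }
}

const pending = new PendingWrites();
pending.queue('app/index.tsx', '<View>');
pending.queue('app/index.tsx', '<View><Text>');
pending.queue('app/index.tsx', '<View><Text>Hello</Text></View>');

const saved: string[] = [];
const flushed = pending.flush((path, content) => {
  saved.push(`${path}: ${content}`);
});
```

Three partial revisions go in and one save comes out, which is exactly the trade the debounce makes: a little latency in exchange for one Babel transform instead of three.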

Stage 6: Streaming-safe post-processors

When you have half a .tsx file in your hands, you can't just hand it to Babel and hope. It will throw on an unclosed JSX tag, a missing </View>, an import { useState, with a trailing comma and no closing brace. So before any partial file is committed to the virtual file system, it runs through a chain of streaming-safe processors in src/shared/utils/postProcessor.ts:

const STREAMING_SAFE_PROCESSORS = [
  jsonValidator,
  contentCleaner,    // strips leaked markdown fences
  jsxFixer,          // closes unclosed tags, adds missing parens
  importDeduplicator,
  importFixer,
  syntaxValidator,
];

export function postProcessFileStreaming(filename: string, content: string) {
  let files = [{ filename, content, isValid: true }];
  for (const processor of STREAMING_SAFE_PROCESSORS) {
    try { files = processor.fn(files).files; }
    catch (e) { /* never break streaming */ }
  }
  return { content: files[0]?.content, isValid: files[0]?.isValid !== false };
}

The contract for a streaming-safe processor is strict: never throw, never make the code worse, return isValid: false if you can't help. If isValid comes back false, we don't save that revision and wait for more tokens. This is what prevents the iframe from cycling through fifty syntax errors as the model types.
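As an illustration of that contract, here is a sketch of what a JSON check could look like. The fn(files) shape matches the loop above; StreamFile, Processor, and jsonCompletenessCheck are hypothetical names, and this is not the actual jsonValidator. Partial JSON can't be repaired safely, so the processor reports it and leaves the content alone.

```typescript
interface StreamFile {
  filename: string;
  content: string;
  isValid: boolean;
}

interface Processor {
  fn: (files: StreamFile[]) => { files: StreamFile[] };
}

// Hypothetical JSON check: marks unparseable JSON invalid instead of throwing,
// and never modifies the content.
const jsonCompletenessCheck: Processor = {
  fn(files) {
    return {
      files: files.map((file) => {
        if (!file.filename.endsWith('.json')) return file; // not our concern
        try {
          JSON.parse(file.content);
          return file;
        } catch {
          return { ...file, isValid: false }; // report, never throw
        }
      }),
    };
  },
};

// Mid-stream: the file is cut off inside a string literal.
const partialJson = jsonCompletenessCheck.fn([
  { filename: 'app.json', content: '{"expo": {"name": "Hab', isValid: true },
]).files[0];

// A few tokens later: the same file, now complete.
const completeJson = jsonCompletenessCheck.fn([
  { filename: 'app.json', content: '{"expo": {"name": "Habits"}}', isValid: true },
]).files[0];
```

The first call comes back with isValid: false, so that revision is skipped. The second passes through untouched and gets saved.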

The JSX fixer in particular is doing a lot of work. It walks the partial source with @babel/parser in errorRecovery: true mode, finds the deepest unclosed JSX element on each branch, and synthesises closing tags. It's not a real fix — the next token might invalidate it — but it's "valid enough to compile" until the real closing tag arrives.

Developer at laptop Photo by Carlos Muza on Unsplash

Stage 7: The virtual file system and Redux

Saved files don't go to disk. They go to a Redux-managed virtual file system (VFS) defined in src/modules/file/instances.ts. Every saveFile() thunk dispatch:

  1. Updates the Redux projectFiles slice with the new content.
  2. Fires VFS watchers registered via vfs.watch().
  3. The watcher sends a delta ({ path, content, type: 'change' }) to a Web Worker over postMessage.
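Stripped of Redux, the watch-and-notify part of that flow looks roughly like this. TinyVFS is a hypothetical stand-in for the real VFS, the delta shape is the one from step 3, and skipping writes with unchanged content is an assumption of this sketch rather than documented behaviour.

```typescript
interface FileDelta {
  path: string;
  content: string;
  type: 'change';
}

type Watcher = (delta: FileDelta) => void;

// Hypothetical minimal VFS; the real one is backed by the Redux store.
class TinyVFS {
  private files = new Map<string, string>();
  private watchers = new Set<Watcher>();

  // Register a watcher; returns an unsubscribe function.
  watch(watcher: Watcher): () => void {
    this.watchers.add(watcher);
    return () => {
      this.watchers.delete(watcher);
    };
  }

  // Store the file and notify watchers, skipping writes that change nothing.
  write(path: string, content: string): void {
    if (this.files.get(path) === content) return;
    this.files.set(path, content);
    this.watchers.forEach((notify) => notify({ path, content, type: 'change' }));
  }
}

const vfs = new TinyVFS();
const sentToWorker: FileDelta[] = [];

// In the browser this callback would call worker.postMessage(delta).
const unwatch = vfs.watch((delta) => {
  sentToWorker.push(delta);
});

vfs.write('app/index.tsx', 'v1');
vfs.write('app/index.tsx', 'v1'); // identical content, no delta
vfs.write('app/index.tsx', 'v2');
unwatch();
vfs.write('app/index.tsx', 'v3'); // no watcher left to notify
```

Sending deltas instead of the whole file map keeps each postMessage payload proportional to the change, not to the size of the project.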

We use Redux Toolkit (@reduxjs/toolkit v2.8.2) for state because the editor has dozens of components that need to react to file changes — the file tree, the open editor tabs, the diff view, the activity log. A single subscribable store with shallow equality checks is the pragmatic answer.

The thing nobody tells you about streaming UI: Redux is fast enough. We were worried that dispatching a setMessages action on every SSE text chunk (potentially 50/second) would tank performance. It doesn't. React 18's automatic batching plus selector memoization eats it. The expensive operation is downstream, in the bundler.

Stage 8: The browser bundler and iframe blob URLs

The preview is the part that surprises most engineers when they hear how it works. There is no server-side build. There is no Metro server hosted on Vercel. The bundler runs entirely in the user's browser, inside a Web Worker, using browser-metro (v1.0.18) and @shaper-studio/almostmetro — JavaScript ports of Metro's core logic.

When the VFS watcher sends a file delta to the worker, the worker:

  1. Updates its own in-memory file map.
  2. Re-runs the dependency graph from the entry point.
  3. Babel-transforms changed files.
  4. Emits a new combined bundle as a string.
  5. Calls back to the main thread via client.onBundle(code).

Back on the main thread, the bundle string gets wrapped in a Blob and a blob URL:

client.onBundle(async (code: string) => {
  ps.latestBundle = code;
  const blob = new Blob([code], { type: 'application/javascript' });
  ps.htmlBlobUrl = URL.createObjectURL(blob);

  Object.entries(localStore.iframeRefs).forEach(([_, iframe]) => {
    iframe.src = ps.htmlBlobUrl;
  });
});

Setting iframe.src to the blob URL reloads the iframe with the latest bundle. From the iframe's perspective, it's just loading a static JS file — no service worker tricks, no HMR runtime, no message passing. The iframe boots a stripped-down Expo Router shell from tools/rapidnative-expo-router/, evaluates the bundle, and renders. The whole loop from "model emits closing JSX tag" to "iframe shows new component" is typically 200-400ms.

If a single file fails to compile (a syntax error we couldn't recover from), the bundler replaces it with a BrokenComponentStub in the dependency graph rather than failing the whole build. Other screens keep working; only the broken one shows a fallback. This matters because users frequently iterate on one screen while three others sit idle — they shouldn't all break together.
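The isolation logic can be sketched in a few lines. Everything here is illustrative: transformAll is a hypothetical name, BROKEN_STUB is a made-up placeholder for whatever the real stub module contains, and toyTransform stands in for Babel by rejecting sources with unbalanced braces.

```typescript
// A transform that throws on unrecoverable syntax errors, as Babel would.
type Transform = (source: string) => string;

// Hypothetical placeholder for the stub module the bundler substitutes.
const BROKEN_STUB =
  'module.exports = function BrokenComponentStub() { return null; };';

// Transform every file; a failure is contained to the module that caused it.
function transformAll(files: Record<string, string>, transform: Transform) {
  const modules: Record<string, string> = {};
  const broken: string[] = [];
  for (const path of Object.keys(files)) {
    try {
      modules[path] = transform(files[path]);
    } catch {
      modules[path] = BROKEN_STUB; // swap in the fallback, keep building
      broken.push(path);
    }
  }
  return { modules, broken };
}

// Toy transform standing in for Babel: rejects unbalanced braces.
const toyTransform: Transform = (source) => {
  const opens = source.split('{').length - 1;
  const closes = source.split('}').length - 1;
  if (opens !== closes) throw new Error('Unbalanced braces');
  return source;
};

const result = transformAll(
  {
    'app/index.tsx': 'export default function Home() { return null; }',
    'app/settings.tsx': 'export default function Settings() { return null;',
  },
  toyTransform,
);
```

The build still produces a module for every path, so the dependency graph stays intact and the home screen renders while the settings screen shows its fallback.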

What this buys us over the obvious approach

The obvious approach is: wait for the full LLM response, save the files to the server, push a server-side build, send a "build complete" event to the client, reload the iframe. It would work. Every step would be simpler.

What it would cost:

| Metric | Streaming pipeline | Wait-for-complete approach |
| --- | --- | --- |
| Time to first visible update | 200-800ms | 30-60s |
| Network resilience | Reconnect mid-stream | Restart from scratch |
| Perceived progress | Tokens visible as they arrive | Spinner |
| Mid-generation interruption | Partial work preserved | Lost |
| Bundler load | Spread across stream | Spike at the end |

The cost on the engineering side is real — streaming-safe post-processors, debounced VFS writes, an SSE finite state machine, KV-backed reconnection — but the UX gap between the two approaches is enormous. Users feel the difference within the first second of using the product.

What we'd build differently if starting today

Three things, in order:

1. Use the AI SDK's data stream protocol instead of a custom SSE format. When we built this, the SDK's stream protocol was less mature, so we rolled our own. The SDK has since standardised on a clean SSE format with built-in tool-call streaming. We could delete a few hundred lines of glue code by adopting it.

2. Move post-processors into a Web Worker. Right now JSX fixing runs on the main thread between SSE events. It's fast (single-digit milliseconds per file), but it blocks paint. A worker would let us run heavier processors — TypeScript type checking, ESLint autofix — without UI cost.

3. Stream partial bundles instead of full rebuilds. The bundler currently rebuilds the full graph on each file change. With dependency tracking, we could emit only the changed module and a tiny HMR shim instead. It'd cut bundle time on large projects from ~300ms to ~30ms.

See it for yourself

If you want to watch this pipeline in action, the easiest path is to open a new project on RapidNative, type a prompt like "build a habit tracker with a calendar view and a streak counter", and open your browser's Network tab to watch the SSE stream come back. The generate-v2 request will be open for 30-60 seconds; you'll see event: text lines flowing while the preview iframe keeps reloading with each new file the model commits.

For more on the LLM side of the pipeline, our 4-step LLM pipeline post covers context gathering, tool routing, and why we don't let the main model use tools. For the rendering side, our live preview deep dive covers how the browser bundler interacts with real devices over a QR-coded preview URL.

Start building a React Native app from a prompt and watch the streaming pipeline run on your own request.
