Under the Hood: How RapidNative Streams AI-Generated Components in Real Time
By Parth
7th May 2026
Last updated: 7th May 2026
Streaming AI code generation looks simple from the outside. Type a prompt, watch React Native components materialize on screen, see the preview update before the model has even finished its sentence. The reality is a deceptively complex pipeline: a streaming HTTP response, a partial-chunk parser, a debounced virtual file system, a Redux store racing the network, a bundler that has to compile invalid JSX gracefully, and a Redis-backed reconnection layer that lets a stream survive your phone falling asleep mid-generation.
This post walks through the actual architecture behind RapidNative's real-time component streaming — the file paths, the libraries, the trade-offs, and the small decisions that make the experience feel instant rather than sluggish.
What "real-time streaming" actually means in this context
Before unpacking the implementation, it helps to define the user-visible contract. When a developer or founder types a prompt in RapidNative, three things should be true:
- The chat panel updates as the model thinks — text, tool calls, and code arrive incrementally rather than after a 30-second wait.
- The live preview reflects the most recent valid state — files appearing partway through generation should not crash the bundler.
- A flaky network does not lose the response — closing a laptop, changing networks, or refreshing the browser should resume the stream rather than start over.
These three guarantees pull in different directions. Streaming text into the UI is easy. Streaming text into a live React Native preview without it blowing up is harder. Surviving a TCP reset mid-generation is harder still. The architecture is shaped by all three constraints simultaneously.
The high-level pipeline
At the top level, streaming AI code generation in RapidNative flows through six layers:
User prompt
│
▼
[1] Next.js API route /api/user/ai/generate-v2
│
▼
[2] Vercel AI SDK streamText() → LLM provider (OpenRouter, Anthropic, etc.)
│
▼
[3] Persistent SSE wrapper (Redis-backed event buffer)
│
▼
[4] Browser fetch reader → Redux thunk consumer
│
▼
[5] Message parser → Virtual File System (debounced writes)
│
▼
[6] Bundler → iframe preview hot reload
Each layer is independently swappable. The LLM provider can change without touching the parser. The SSE buffer can fail over from Vercel KV to Upstash Redis. The bundler can be replaced without rewriting the streaming consumer. That separation is intentional — it is what allowed the team to iterate on each piece without rewriting the whole pipeline.
Layer 1 and 2: from prompt to streamed tokens
The entry point is a single Next.js App Router route handler at src/app/api/user/ai/generate-v2/route.ts. It is one of the larger files in the codebase — over a thousand lines — because it coordinates everything that needs to happen before, during, and after the stream.
A request lands and four things happen in parallel:
- Auth and team resolution — NextAuth session lookup, team membership check
- Credit validation — the saas credit service checks whether the team has enough budget for an AI call
- Conversation history fetch — last four messages from the messages Postgres table
- Project file listing — paths only, capped at 100, to give the model an index of what exists
Running these in parallel rather than serially shaves hundreds of milliseconds off time-to-first-token. The credit check, in particular, used to be an early bottleneck — fixing it required moving from await chains to Promise.all and only awaiting the credit promise when its result was actually needed.
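Sketched out, the parallel fetch looks something like this; the helper names are placeholders for the real NextAuth, credit-service, and Prisma calls:

```ts
// Hypothetical helpers: only the shape of the parallelism reflects the real route handler.
const [session, credit, history, filePaths] = await Promise.all([
  getSessionAndTeam(req),                            // auth + team resolution
  checkTeamCredits(teamId),                          // credit validation
  getRecentMessages(projectId, 4),                   // last four conversation turns
  listProjectFilePaths(projectId, { limit: 100 }),   // paths only, capped at 100
]);

if (!credit.hasBudget) {
  return new Response('Insufficient credits', { status: 402 });
}
```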
Once context is gathered, generation kicks off using the Vercel AI SDK (ai v4.3.19) with its streamText() primitive. RapidNative runs a four-step pipeline rather than one big call:
1. Context gathering — a fast, cheap model with tool-calling enabled (get_files_content, batch_grep, get_images_by_keywords) explores the existing project and emits a structured context block.
2. Auth detection — a deterministic parse of the Step 1 output checks whether (auth) routes need to be generated.
3. Deterministic generation tools — for database schemas and auth screens, code is emitted from validated templates rather than freshly generated. This is faster, cheaper, and removes a class of model-hallucinated bugs.
4. Main code generation — the primary model receives the gathered context and streams <CodeProject> blocks containing TSX files. Critically, no tools are exposed in this step — all context is pre-fetched in Step 1, so the model can focus 100% of its tokens on generating code rather than calling tools mid-stream.
Splitting context-gathering and code-generation into separate calls is one of the more counterintuitive decisions in the system. Naively, it looks like overhead — two API calls instead of one, two cold-starts instead of one. In practice, it is faster end-to-end because Step 1 uses a cheaper model with a small token budget, and Step 4 never hits the maxSteps ceiling that long tool-calling chains tend to.
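A condensed sketch of the split, using the AI SDK primitives; the model handles, prompt constants, and tool objects here are placeholders rather than the production values:

```ts
import { generateText, streamText } from 'ai';

// Step 1: a cheap model with tools gathers context (tool definitions omitted).
const context = await generateText({
  model: cheapModel,
  system: CONTEXT_GATHERING_PROMPT,
  messages: history,
  tools: { get_files_content, batch_grep, get_images_by_keywords },
  maxSteps: 5,                          // illustrative cap on tool-call rounds
});

// Step 4: the primary model streams code with no tools at all;
// everything it needs is already inlined in the prompt.
const result = streamText({
  model: primaryModel,
  system: CODE_GENERATION_PROMPT,
  messages: [...history, { role: 'user', content: context.text }],
});
```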
Layer 3: persistent SSE, or why your stream survives a tunnel
A naive streamText() response uses toDataStreamResponse() and pipes directly to the browser. That works until the user's network glitches — then the connection dies and the entire generation is lost. For a 30-second code generation, that is a poor user experience. For a 2-minute generation involving database schemas and a 12-screen app, it is unacceptable.
RapidNative wraps the AI SDK output in a persistent SSE layer built on top of persistent-request-response (v0.1.2) plus a Redis-compatible KV store (Vercel KV in production, Upstash Redis as fallback).
The route handler creates a stream like this:
import { resilientSSE } from '@/lib/resilient-sse-instance';
const { stream, enqueue, close, streamId } = resilientSSE.createStream();
// Returned to client in response header:
// X-Stream-Id: <uuid>
Every Server-Sent Event the route produces — start, text, tool_call, usage, done, error — is enqueued into Redis with a 10-minute TTL before being flushed to the client. If the client disconnects, the stream keeps running on the server and events keep accumulating in Redis.
When the client comes back, it sends a follow-up POST with the original streamId and the lastEventId it received. The server replays events from that point onward and seamlessly attaches the live producer. From the user's perspective, the chat just keeps updating.
This is similar in spirit to the resumable streams pattern Vercel describes, but with a key difference — the buffer is persisted server-side rather than relying on the client to retry. That matters when the disconnect lasts longer than a TCP retry window.
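In outline, the producer side behaves roughly like the sketch below. The emit helper, Redis key scheme, and resume body are illustrative; the real wiring lives inside persistent-request-response.

```ts
// Hypothetical producer-side helper: the shape of the idea, not the library's API.
let lastEventId = 0;

async function emit(type: string, data: unknown) {
  const event = { id: ++lastEventId, type, data };
  // Buffer in Redis first (10-minute TTL), then flush to any attached client.
  await redis.rpush(`stream:${streamId}`, JSON.stringify(event));
  await redis.expire(`stream:${streamId}`, 600);
  enqueue(event);
}

// On reconnect, the client re-POSTs the stream id and the last event it saw;
// the server replays the Redis backlog past that id, then attaches the live feed.
// { "resume": { "streamId": "<uuid>", "lastEventId": 42 } }   // illustrative body
```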
Layer 4: how the browser reads the stream
On the client, a Redux thunk called sendMessage in src/modules/editor/store/thunks/editorThunks.ts is responsible for issuing the request and consuming the stream. Rather than using the AI SDK's useChat hook, RapidNative uses a custom thunk because the streamed payload contains structured XML blocks (<CodeProject>, <QuickEdit>, <Action>) that need bespoke parsing.
The flow is roughly:
const response = await persistentFetch(
  '/api/user/ai/generate-v2',
  { method: 'POST', body: JSON.stringify(body) },
  { projectId }
);

const reader = response.body?.getReader();
if (!reader) throw new Error('Response has no readable body');
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  // parse SSE event boundaries, dispatch to switch(currentEvent) handler
}
Each SSE event is dispatched into a switch block. text events append to the streaming message and trigger a Redux update so the chat panel re-renders with the new content. tool_call events render an "Analyzing project…" indicator. usage events are buffered for cost tracking. done flips isAiRequestInProgress to false.
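Condensed, the consumer's dispatch looks roughly like this (the action creators are simplified names, not the exact reducers in editorSlice.ts):

```ts
const usageEvents: unknown[] = [];   // collected across the stream for cost tracking

switch (event.type) {
  case 'text':
    // Append the chunk to the in-flight assistant message so the chat re-renders.
    dispatch(appendStreamingText(event.data));
    break;
  case 'tool_call':
    // Drives the "Analyzing project…" indicator.
    dispatch(setToolCallIndicator(event.data));
    break;
  case 'usage':
    // Buffered locally; written once after the stream ends.
    usageEvents.push(event.data);
    break;
  case 'error':
    dispatch(setGenerationError(event.data));
    break;
  case 'done':
    dispatch(setAiRequestInProgress(false));
    break;
}
```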
The Redux slices that hold this state live in src/modules/editor/store/slices/editorSlice.ts:
- messages — the full chat history, including the in-flight assistant message
- isAiRequestInProgress — boolean toggling spinners and the "Stop" button
- hasCodeProjectDetected — flips true the moment the parser sees <CodeProject>
- optimisticMessage — the user's prompt rendered immediately, before the server confirms
That last one matters. A user's message appears in the chat the instant they hit Enter, not after the server round-trip. If the request fails, the optimistic message is rolled back. This is a small touch but it makes the UI feel many hundreds of milliseconds faster than it actually is.
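The optimistic flow is the standard dispatch-then-rollback pattern; a sketch with simplified action names:

```ts
// Illustrative action names; the real slice tracks this in optimisticMessage.
const tempId = crypto.randomUUID();
dispatch(addOptimisticMessage({ id: tempId, role: 'user', content: prompt }));

try {
  await dispatch(sendMessage({ prompt, projectId })).unwrap();
  // Success: the server-confirmed message supersedes the optimistic one.
} catch {
  // Failure: roll the optimistic message back so the chat reflects reality.
  dispatch(removeOptimisticMessage({ id: tempId }));
}
```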
Layer 5: turning streamed text into files
Here is where things get interesting. The model's output is a wall of text. The preview needs files. Something has to turn one into the other while the stream is still in flight.
That something is transformMessageContent() in src/shared/utils/messageTransformer.ts. It is a stateful, line-by-line parser that scans the streaming buffer and emits typed blocks:
interface MessageBlock {
  type: 'text' | 'code' | 'codeproject' | 'quickedit'
      | 'projectmetadata' | 'action' | 'appicon';
  files?: CodeProjectFile[];
  quickEdit?: QuickEditBlock;
  // ...
}
The parser handles XML-like wrappers the model is instructed to emit:
<CodeProject>
```tsx file="app/index.tsx"
import { View, Text } from 'react-native';
export default function Home() { ... }
...
```
</CodeProject>
The trick is that the parser must tolerate incomplete input. Halfway through generation, the buffer looks like this:
<CodeProject>
```tsx file="app/index.tsx"
import { View, Te
The closing backticks have not arrived yet. The parser cannot wait for them — that would defeat real-time streaming. Instead, it recognizes a "currently being written" file, marks it as in-progress, and lets downstream consumers decide what to do. JSX/TSX files use a 100ms debounce. JSON and TypeScript files only commit on isComplete, because invalid JSON would crash the runtime.
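Conceptually, the per-file commit policy reduces to something like this sketch (the real parser tracks more state per file, and saveFile stands in for the virtual file system write):

```ts
// Simplified commit policy: eager for screens, complete-only for config-like files.
function handleStreamedFile(file: { path: string; content: string; isComplete: boolean }) {
  const ext = file.path.split('.').pop();

  if (ext === 'json' || ext === 'ts') {
    // A half-written JSON or plain TypeScript file would crash the runtime,
    // so these only commit once the closing fence has arrived.
    if (file.isComplete) saveFile(file.path, file.content);
    return;
  }

  // Screens and components (.tsx) are written eagerly via the per-file
  // debounce described in the next section, so the preview compiles early.
  debouncedSaveFile(file.path, () => saveFile(file.path, file.content));
}
```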
Why a 100ms debounce, specifically
This number was tuned empirically and it is one of the more interesting micro-decisions. Streaming chunks arrive faster than the bundler can compile. If every partial chunk triggered a save and a recompile, the bundler would thrash and the iframe would flicker.
Saving only at the end of the stream is the obvious alternative — but then the user stares at a stale preview for 20 seconds while watching code stream into the chat. That breaks the sense of liveness.
100 milliseconds turned out to be the sweet spot:
- Long enough that a typical "burst" of streamed tokens (tens of bytes) gets coalesced into one save
- Short enough that humans perceive the preview as live, not stale
- Forgiving enough that small syntax errors mid-stream get corrected by the next chunk before the bundler tries to compile
The debounce is implemented per-filename so two files streaming concurrently don't clobber each other:
// Debounce state is keyed per filename so concurrent files don't clobber each other.
const debounceStates = new Map<string, { timeoutId?: ReturnType<typeof setTimeout>; lastCall?: () => Promise<any> }>();

const debouncedSaveFile = (filename: string, op: () => Promise<any>) => {
  const state = debounceStates.get(filename) ?? {};
  if (state.timeoutId) clearTimeout(state.timeoutId);
  state.lastCall = op;
  state.timeoutId = setTimeout(async () => {
    if (state.lastCall) await state.lastCall();
  }, 100);
  debounceStates.set(filename, state);
};
There is one exception: route-defining files like app/(tabs)/_layout.tsx are deliberately not saved during the stream. A half-written _layout file would invalidate every screen route. Those files are buffered and only committed once the stream completes.
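A sketch of that exception, with an illustrative route-file check:

```ts
const deferredFiles = new Map<string, string>();
const isRouteDefiningFile = (path: string) => /_layout\.tsx$/.test(path);

if (isRouteDefiningFile(file.path)) {
  // Held back until the stream's `done` event; a partial _layout.tsx
  // would invalidate every screen route beneath it.
  deferredFiles.set(file.path, file.content);
} else {
  debouncedSaveFile(file.path, () => saveFile(file.path, file.content));
}
```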
Post-processing during streaming
Before each file hits the virtual file system, it goes through postProcessFileStreaming(). This step fixes a small set of predictable issues — missing imports, accidentally-stripped useState calls, JSX comments inside attribute values — that LLMs occasionally emit. It is intentionally narrow. The post-processor is not a linter and does not try to fix general bugs. It exists to compensate for a handful of model failure modes that would otherwise make the preview unrenderable.
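To make "intentionally narrow" concrete, one such fixup amounts to little more than the following; this is an illustrative reconstruction, not the actual implementation:

```ts
// Illustrative fixup: add a missing useState import when the hook is used but never imported.
function ensureUseStateImport(content: string): string {
  const usesHook = /\buseState\(/.test(content);
  const hasImport = /import\s+[^;]*\buseState\b[^;]*from\s+['"]react['"]/.test(content);
  if (usesHook && !hasImport) {
    return `import { useState } from 'react';\n` + content;
  }
  return content;
}
```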
Layer 6: bundler and iframe preview
Files saved to the virtual file system are watched by an in-browser bundler. New or updated TSX files trigger an iframe message that re-executes the affected screens. The iframe runs Expo Router on top of react-native-web, so the same code runs in the preview that ships to a real device.
The post-stream pass does one more thing: it snapshots iframe keys before generation starts, then after done, reloads only the iframes corresponding to newly created screens. Pre-existing screens are not reloaded — they stay on whatever route the user was inspecting. This preserves navigation state across generations, which is a small detail that adds up over a long session.
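In pseudocode, the selective reload looks roughly like this; the registry and helper names are assumptions, not the real identifiers:

```ts
// Snapshot taken before generation starts.
const screensBefore = new Set(Object.keys(iframeScreenRegistry));

// After the stream's `done` event: reload only brand-new screens.
for (const screenKey of Object.keys(iframeScreenRegistry)) {
  if (!screensBefore.has(screenKey)) {
    reloadPreviewIframe(screenKey);   // new screen: compile and mount it
  }
  // Pre-existing screens keep their current route and navigation state.
}
```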
For more on how the preview actually compiles streamed code in the browser, the real-time React Native live preview architecture post covers the bundler design in depth.
How do Server-Sent Events (SSE) compare to WebSockets for AI streaming?
SSE is a one-way protocol — server to client — built on plain HTTP. WebSockets are bidirectional and use their own framed protocol. For LLM streaming, SSE is almost always the right choice: it works through corporate proxies, plays nicely with HTTP/2, supports automatic reconnection at the browser level, and does not require a separate connection upgrade. WebSockets only win when the client also needs to push frequent, small, low-latency messages back to the server — which a code generation request does not.
Does streaming actually make the app faster?
Streaming does not reduce the total time the model spends generating tokens. What it changes is the perceived latency. A 25-second generation that streams its first text in 800ms feels dramatically faster than a 15-second generation that delivers everything at once. For AI mobile app builders, that perception gap is the difference between "I'm watching code happen" and "I'm waiting for a server."
There is a real performance benefit beyond perception, too: streaming files to the bundler in chunks means the first screens compile and render while the rest of the app is still being generated. By the time the model finishes, the early screens are already interactive in the preview.
Architecture decisions worth pulling out
A few of the choices made in the streaming pipeline are worth highlighting because they go against the obvious approach:
| Decision | Obvious alternative | Why we chose it |
|---|---|---|
| Custom Redux thunk for SSE | Vercel AI SDK useChat hook | Need to parse <CodeProject> and <QuickEdit> XML blocks the SDK doesn't know about |
| 100ms per-file debounce | Save on every chunk, or save at end | Avoids bundler thrash without making the preview feel stale |
| Persistent SSE with Redis buffer | Plain toDataStreamResponse() | Network glitches don't lose the entire generation |
| Four-step pipeline (gather, detect, deterministic, generate) | One streamText call with tools | Step 4 stays focused on code; deterministic steps remove a class of bugs |
| _layout.tsx saves deferred to end | Save like any other file | Half-written layout would invalidate every route mid-stream |
| Pre-fetch context in Step 1 | Tool calls inside Step 4 | Eliminates maxSteps exhaustion and tool-loop token waste |
Each of these emerged from a specific failure mode observed in production. The 100ms debounce came from bundler thrash. The persistent SSE came from users complaining about lost generations on flaky Wi-Fi. The deferred _layout saves came from one bad afternoon when every project would crash mid-generation.
Where it goes from here
The pieces that are most likely to evolve are the parser and the post-processor. As models get better at emitting clean code, the post-processor shrinks. As prompts get more structured (and as the AI SDK matures its structured streaming primitives), the parser may eventually be replaced with native streamObject() schemas.
The streaming protocol itself is more stable. SSE plus a Redis-backed buffer is well-understood, debuggable, and works in every browser. There is no obvious upside to switching to WebSockets or HTTP/3 push.
For a deeper look at how the underlying two-step AI pipeline and browser bundler feed into this streaming architecture — and at how multiple LLMs are routed per step — those posts dig into the model selection and compilation layers that bracket what is described here.
Try it
The fastest way to see streaming AI code generation in action is to type a prompt and watch the chat panel and preview update side by side. Start at rapidnative.com — there's a free tier, no credit card needed, and the first prompt will stream a working React Native screen in under a minute.
If you want to compare different input modes, try the same idea via whiteboard sketch, PRD, or screenshot — the streaming pipeline is identical regardless of input format.