Can I export the code?

Yes. RapidNative generates clean React Native and Expo code that you can export at any time. No lock-in, no proprietary format. Hand it to your developers or keep building inside RapidNative.

Is RapidNative free to use?

Yes. You can build apps on the free plan with no credit card required. Paid plans unlock unlimited AI generations, code export, and direct publishing to the App Store and Google Play.

Do I need to know how to code?

No. Most users build apps by describing what they want in plain English. Developers can drop into the code whenever they want more control, but coding is optional.

How long does it take to build an app?

Most users have a working first screen in under a minute. A full MVP usually takes a few hours instead of the weeks or months traditional development requires.

Why RapidNative Runs a 4-Step LLM Pipeline to Generate React Native Code

By Suraj Ahmed

24th May 2026

Last updated: 23rd May 2026

The cleanest mental model of an AI code generator is one model, one prompt, one file. Send the user's request to Claude or GPT-4, take whatever comes back, render it. It works in a demo. It breaks the moment a real user types "add a profile screen that pulls the logged-in user from the database" into a project that already has thirty files, an auth stack, and a half-finished settings tab.

A single model now has to do four jobs at once: figure out which files to read, decide whether auth is in play, plan a database migration, and write production-quality React Native code that compiles on the first try. Asking one LLM to do all of that means you're either burning tokens on a frontier model for tasks a small model could nail, or you're starving the actual code-generation step of context. Neither produces an app worth shipping.

This is why the LLM pipeline for code generation inside RapidNative is split into four distinct steps. The two steps that call an LLM are each routed to a model chosen for that specific job, and the two steps in between run as plain code. Below is a walkthrough of how that pipeline works in production, drawn from the codebase that powers it.

[Image: mobile developer working at laptop with code on screen] Modern AI code generation isn't a single model call; it's an orchestrated pipeline. Photo by David Pupăză on Unsplash.

The 4-step LLM pipeline at a glance

Every chat message from a user routes through src/app/api/user/ai/generate-v2/route.ts. Inside that route, the request flows through four ordered steps:

1. Context gathering: a fast, cheaper model equipped with file-reading tools figures out what the user is asking for and which existing files matter.
2. Auth and screen-limit detection: semantic signals (AUTH: yes, NEW_SCREEN: yes) parsed out of Step 1's output decide whether the next step needs to scaffold authentication, and whether a free-tier user is about to hit their screen cap.
3. Deterministic generation tools: pure code (no LLM) generates database schemas, migrations, and auth screens from a structured JSON description.
4. Final code generation: the strongest model writes the actual React Native screens, with no tools attached, given all the context the first three steps gathered.

Steps 1 and 4, the only two that call an LLM, each use a model selected for that step's specific tradeoff between speed, cost, and capability. The model assignments live in a Supabase table (ai_model_config), cached for five minutes, so the team can swap models without redeploying.

Here's why that split matters — and how each step works under the hood.

Step 1: Context gathering with a fast, tool-equipped model

When a prompt arrives, the very first thing the system needs is grounding. The user wrote "add a profile screen that shows the current user's stats." That sentence is meaningless without knowing:

What files already exist in the project
Whether there's already a users table in the database
Whether the app has authentication wired up
Whether the user means a brand-new screen or an edit to an existing one

A frontier model could answer all of this, but you'd waste the expensive context window on file reads and grep results. Instead, Step 1 runs on a smaller, cheaper model (typically something in the Llama 3.1 8B or Qwen 3 Coder tier on OpenRouter) with a tight, tool-equipped prompt.

The tools available in Step 1 are defined in src/modules/api/services/ai/providers/ToolsProvider.ts:

get_files_content({ paths }) — reads files (with optional line ranges) from the project's virtual file system
list_dir() — lists every file in the project
glob({ pattern }) — finds files by pattern
batch_grep({ pattern }) — regex search across the project
get_images_by_keywords({ keywords }) — pulls image URLs from the project asset library
list_skills, search_skills, read_skills — dynamic knowledge injection

Some models get a reduced tool set. src/modules/api/services/ai/llm/models.ts declares a MODEL_TOOLS map that disables batch_grep for models known to misuse regex — Qwen 3 Coder and Llama 3.3 70B both fall into this category.
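
A per-model tool restriction like this can be sketched in a few lines. The article names the MODEL_TOOLS map but doesn't show its shape, so everything here is an assumption for illustration: the deny-list layout, the model ID strings, and the toolsForModel helper are all hypothetical.

```typescript
// Hypothetical sketch: the real MODEL_TOOLS shape is not shown in the article.
type ToolName =
  | "get_files_content"
  | "list_dir"
  | "glob"
  | "batch_grep"
  | "get_images_by_keywords"
  | "list_skills"
  | "search_skills"
  | "read_skills";

const ALL_TOOLS: ToolName[] = [
  "get_files_content",
  "list_dir",
  "glob",
  "batch_grep",
  "get_images_by_keywords",
  "list_skills",
  "search_skills",
  "read_skills",
];

// Model IDs are illustrative placeholders, not the production keys.
const DISABLED_TOOLS = new Map<string, ToolName[]>([
  ["qwen-3-coder", ["batch_grep"]],
  ["llama-3.3-70b", ["batch_grep"]],
]);

// Returns the full tool set minus anything disabled for this model.
function toolsForModel(modelId: string): ToolName[] {
  const disabled = DISABLED_TOOLS.get(modelId) ?? [];
  return ALL_TOOLS.filter((tool) => !disabled.includes(tool));
}
```

A deny-list keeps the default permissive: a newly configured model gets every tool until someone observes it misbehaving.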

The Step 1 prompt is intentionally slim — around 3K tokens — and the model is instructed to output two structured signals inline before any prose: AUTH: yes|no and (for free-tier users near their screen limit) NEW_SCREEN: yes|no. Those signals get parsed out and fed to Step 2.
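
Here's a minimal sketch of what parsing those inline signals could look like. The Step1Signals type and the parseStep1Signals function are hypothetical names, and the sketch assumes each signal sits on its own line. The article only says the signals appear inline before the prose.

```typescript
interface Step1Signals {
  auth: boolean;
  // null when the model did not emit the signal at all
  // (for example, a user who is nowhere near the screen limit).
  newScreen: boolean | null;
}

// Assumption: each signal occupies its own line, e.g. "AUTH: yes".
function parseStep1Signals(output: string): Step1Signals {
  const authMatch = /^\s*AUTH:\s*(yes|no)\s*$/im.exec(output);
  const screenMatch = /^\s*NEW_SCREEN:\s*(yes|no)\s*$/im.exec(output);
  return {
    // A missing AUTH line is treated as "no" in this sketch.
    auth: authMatch?.[1]?.toLowerCase() === "yes",
    newScreen:
      screenMatch === null ? null : screenMatch[1]?.toLowerCase() === "yes",
  };
}
```

Treating a missing AUTH line as "no" is a design choice in this sketch, not something the article confirms; a stricter parser could reject the output instead.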

This step streams its tool calls and text deltas back to the client over SSE, so the user sees the "thinking" phase happen in real time. The code in generate-v2/route.ts uses streamText() from the Vercel AI SDK and pulls each chunk via for await (const chunk of step1Result.fullStream), dispatching text-delta, tool-call, and tool-result events as they arrive.
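
To make the SSE side concrete, here is a small sketch of turning one stream chunk into one SSE frame. The StreamChunk type is a simplified stand-in, not the Vercel AI SDK's actual chunk type, and toSseFrame is a hypothetical helper. The frame layout itself (an event line, a data line, a blank line) is standard Server-Sent Events.

```typescript
// Simplified stand-in for the chunk shapes; field names are assumptions.
type StreamChunk =
  | { type: "text-delta"; text: string }
  | { type: "tool-call"; toolName: string; args: unknown }
  | { type: "tool-result"; toolName: string; result: unknown };

// Formats one chunk as one SSE message: event name, JSON payload, blank line.
// JSON.stringify never emits raw newlines, so the payload fits on one data line.
function toSseFrame(chunk: StreamChunk): string {
  return `event: ${chunk.type}\ndata: ${JSON.stringify(chunk)}\n\n`;
}
```

Using the chunk type as the SSE event name lets the client attach one listener per kind of update instead of switching on a field inside every payload.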

Step 2: Cheap semantic gating before expensive work

Step 2 isn't a model call at all — it's a parser on top of Step 1's output. But it's worth naming as a distinct step because it's where the pipeline saves the most money.

Two gates run here:

The auth gate. If AUTH: yes appeared in Step 1's output, Step 3 will scaffold a (auth)/_layout.tsx, a sign-in screen, and a sign-up screen — and the root layout will get patched to guard protected routes. If AUTH: no, none of that runs.

The screen-limit gate. Free-tier users get five screens per project. Before kicking off Step 4 (the expensive one), the route counts the existing screens in the project's app/ directory and, if the user is already at the limit, checks NEW_SCREEN: yes from Step 1. If both conditions are true, the route emits an SSE screen_limit_exceeded event and bails out — before burning any tokens on the main generation model.

This kind of cheap, deterministic gate is the single biggest argument for splitting a code-generation pipeline across steps. A monolithic prompt to a frontier model would only discover the screen-limit problem after spending several thousand output tokens drafting a screen the user is never allowed to see.
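
The screen-limit gate is simple enough to sketch in full. The five-screen cap and the screen_limit_exceeded event name come from the description above; the function names, the GateResult type, and the rule for what counts as a screen (any .tsx file under app/ that isn't a layout) are assumptions.

```typescript
const FREE_TIER_SCREEN_LIMIT = 5; // five screens per free-tier project

type GateResult =
  | { allowed: true }
  | { allowed: false; event: "screen_limit_exceeded" };

// Assumption: a "screen" is any .tsx route file under app/ except layouts.
function countScreens(filePaths: string[]): number {
  return filePaths.filter(
    (path) =>
      path.startsWith("app/") &&
      path.endsWith(".tsx") &&
      !path.endsWith("_layout.tsx"),
  ).length;
}

// Blocks only when all three conditions hold: free tier, at the cap,
// and Step 1 signalled that the request creates a new screen.
function screenLimitGate(
  filePaths: string[],
  isFreeTier: boolean,
  newScreen: boolean,
): GateResult {
  const atLimit = countScreens(filePaths) >= FREE_TIER_SCREEN_LIMIT;
  if (isFreeTier && atLimit && newScreen) {
    return { allowed: false, event: "screen_limit_exceeded" };
  }
  return { allowed: true };
}
```

Note that a free-tier user at the cap can still edit existing screens, because the gate only trips when NEW_SCREEN is yes.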

[Image: mobile phone showing app interface] A real AI code generator has to decide what to build before deciding how to build it. Photo by Rodion Kutsaiev on Unsplash.

Step 3: Deterministic tools generate the boring (and risky) parts

Database schemas and auth boilerplate share a property that makes them dangerous to leave to an LLM: they're highly structured, and getting them subtly wrong breaks the app silently.

So Step 3 doesn't ask an LLM to write them. Instead, Step 1 has already (when relevant) emitted a structured JSON description like:

{
  "tables": [
    {
      "name": "posts",
      "columns": [
        { "name": "title", "type": "string", "required": true },
        { "name": "likes", "type": "integer" }
      ],
      "relationships": [
        { "name": "author", "type": "belongsTo", "target": "users", "foreignKey": "user_id" }
      ]
    }
  ]
}

Pure TypeScript code in tools/project-templates/fullstack/ai/tools/database.ts consumes that JSON and emits the actual files:

src/db/schema.ts in the project's DB client format
src/db/seeds/{table}.ts seed files
pocketbase/migrations/{timestamp}_create_{table}.js migration files
pocketbase/seeds/{table}.mjs PocketBase seed files

For follow-up messages, the tool parses the existing schema.ts, merges the AI's delta into it, and preserves unchanged tables — so adding a comments table to an app that already has posts and users doesn't accidentally drop the other two.
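
The merge behavior can be illustrated with a short sketch. It assumes the delta works at table granularity (a table named in the delta replaces or adds that table, and everything else is left alone). The article doesn't say whether the real tool merges at the table or column level, and mergeTables is a hypothetical name.

```typescript
interface ColumnSpec {
  name: string;
  type: string;
  required?: boolean;
}

interface TableSpec {
  name: string;
  columns: ColumnSpec[];
}

// Tables in the delta replace same-named tables or are appended;
// tables the delta does not mention are preserved untouched.
function mergeTables(existing: TableSpec[], delta: TableSpec[]): TableSpec[] {
  const byName = new Map<string, TableSpec>();
  for (const table of existing) byName.set(table.name, table);
  for (const table of delta) byName.set(table.name, table);
  // Map keeps first-insertion order, so existing tables stay in place.
  return Array.from(byName.values());
}
```

Starting from the existing tables and layering the delta on top is what guarantees nothing gets dropped: a table can only disappear if the code removes it explicitly.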

The auth tool in tools/project-templates/fullstack/ai/tools/auth.ts works the same way: if Step 1 said AUTH: yes, the tool emits the standard sign-in/sign-up screens and the route guard. No LLM, no hallucination risk, identical output every time.

Letting deterministic code own the structured parts means Step 4 — the only step where you really want a creative model — never has to think about migrations or auth boilerplate. It just writes screens.

Step 4: The main code generation with no tools and full context

By the time Step 4 starts, every piece of context the model needs has been packed into the prompt:

The full file content gathered in Step 1, formatted as --- FILE.tsx ---\n[content] blocks
The image URLs the user referenced (if any), already validated
The skills the model should follow — design patterns, layout docs, TanStack Query syntax, pre-output checklists — pulled from a compiled skill registry
The current file the user has open (for point-and-edit prompts)
A list of every file in the project, capped at 100 entries
The selected code range, if the user used point-and-edit to highlight a region
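
Two of those pieces, the file blocks and the capped file list, are easy to sketch. The block delimiter and the 100-entry cap come from the list above; the function names, the ProjectFile type, and the choice to join blocks with a blank line are assumptions.

```typescript
interface ProjectFile {
  path: string;
  content: string;
}

const MAX_LISTED_FILES = 100; // cap on the project file listing

// Renders gathered files as "--- path ---" headers followed by the content.
function formatFileBlocks(files: ProjectFile[]): string {
  return files
    .map((file) => `--- ${file.path} ---\n${file.content}`)
    .join("\n\n");
}

// Lists project paths one per line, truncated to the cap.
function formatFileList(allPaths: string[]): string {
  return allPaths.slice(0, MAX_LISTED_FILES).join("\n");
}
```

The cap keeps a large project from crowding out the file contents and skills that the model actually needs in order to write the screen.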

The system prompt for Step 4 is set explicitly to instruct the model that it has no tools — that's a critical change from Step 1. Tool use during code generation is a footgun: the model burns tokens deliberating about which file to read instead of writing the screen the user asked for, and maxSteps can run out before it finishes the actual code. Step 4 is pure text-out.

This is where the strongest model in the lineup runs. In production, that's typically Claude Sonnet 4.5 for projects that need design polish, with GLM-5 as a cost-effective alternative for simpler asks. The model is selected by getAIModelAsync('MAIN_GENERATION'), which reads from the cached ai_model_config map and instantiates the right provider — OpenRouter, Bedrock, Vertex, Azure, Anthropic direct, or OpenAI.

The output streams straight to the user's editor. Code blocks are rendered into files in real time using the streaming live preview that powers RapidNative's instant feedback loop — covered in more depth in Under the Hood: How RapidNative Streams AI-Generated Components in Real Time.

The model router: configuration-driven, not hard-coded

A pipeline that uses three different models is only useful if you can swap those models without shipping a new build of the app. The model router lives in src/modules/api/services/ai/llm/index.ts and exposes a small set of async helpers keyed by purpose, not by model:

const [modelToUse, modelId, providerOptions, cacheType] = await Promise.all([
  getAIModelAsync(modelPurpose),
  getAIModelIdAsync(modelPurpose),
  getProviderOptionsAsync(modelPurpose),
  getPromptCacheTypeAsync(modelPurpose),
]);

Three purposes are defined: MAIN_GENERATION, VISION (used when a user uploads a screenshot or sketch), and CONTEXT_GATHERING. Each purpose maps, in the database, to a (provider, modelId) pair plus any provider-specific options.

The router supports six providers via the Vercel AI SDK ecosystem — versions pinned in package.json:

@openrouter/ai-sdk-provider@^0.7.3 — unified access to OpenRouter's catalog (primary in production for cost flexibility)
@ai-sdk/anthropic@^1.2.12 — direct Anthropic API
@ai-sdk/amazon-bedrock@^1.1.6 — Claude via AWS Bedrock
@ai-sdk/google-vertex@^1.0.4 — Gemini and Claude via Google Cloud
@ai-sdk/azure@^1.3.25 — OpenAI and Claude via Azure
@fal-ai/client@^1.9.4 — Flux Schnell for app icon generation (not an LLM, but lives in the same orchestration layer)

The active config is wrapped in unstable_cache() with a 5-minute TTL and a revalidateTag('ai-model-configs') hook so a config edit takes effect within seconds. The team can route CONTEXT_GATHERING to a Vertex Gemini Flash one day and a Llama 3.1 8B on OpenRouter the next without a code change.
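
The caching behavior is easier to reason about with a small model of it. This sketch is a hand-rolled stand-in, not the unstable_cache wiring used in production: createModelConfigCache, the injectable clock, and the synchronous loader are all assumptions made to keep the example short and testable. The real loader would be an async Supabase query.

```typescript
type ModelPurpose = "MAIN_GENERATION" | "VISION" | "CONTEXT_GATHERING";

interface ModelAssignment {
  provider: string;
  modelId: string;
}

type ModelConfig = Record<ModelPurpose, ModelAssignment>;

const CONFIG_TTL_MS = 5 * 60 * 1000; // five-minute TTL

function createModelConfigCache(
  loadConfig: () => ModelConfig, // synchronous here only for brevity
  now: () => number = Date.now,
) {
  let cached: { config: ModelConfig; expiresAt: number } | null = null;

  return {
    // Callers ask by purpose, never by concrete model name.
    getAssignment(purpose: ModelPurpose): ModelAssignment {
      let entry = cached;
      if (entry === null || now() >= entry.expiresAt) {
        entry = { config: loadConfig(), expiresAt: now() + CONFIG_TTL_MS };
        cached = entry;
      }
      return entry.config[purpose];
    },
    // Plays the role of a tag revalidation: the next read reloads.
    invalidate(): void {
      cached = null;
    },
  };
}
```

The TTL bounds how stale a config can get if nobody invalidates it, while the explicit invalidate path is what lets an edit show up almost immediately.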

Prompt caching: a quiet 90% cost saver on follow-up messages

Most multi-turn AI products waste an enormous amount of money re-sending the same system prompt with every message. RapidNative's pipeline opts into Anthropic prompt caching when the routed model supports it.

The detection lives in the router:

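export async function getPromptCacheTypeAsync(purpose) {
  // providerType and modelId are resolved from the cached config for `purpose`
  // (that lookup is omitted from this excerpt).
  // Anthropic models reached through Azure, OpenRouter, or the direct API get the cache marker.
  if (providerType === 'azure' && modelId.includes('claude')) return 'anthropic';
  if (providerType === 'openrouter' && modelId.includes('anthropic')) return 'anthropic';
  if (providerType === 'anthropic') return 'anthropic';
  // Everything else runs without prompt caching.
  return null;
}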

When the cache type is 'anthropic', the Step 4 messages get an ephemeral cache marker attached to the system message:

(codeGenMessages[0] as any).experimental_providerMetadata = {
  anthropic: { cacheControl: { type: 'ephemeral' } },
};

The system prompt — around 15K tokens on the first message of a project (with the full first-message skill bundle) and ~9K on follow-ups — gets cached on Anthropic's side. Subsequent messages within the cache window hit the cache for the static portion and only pay full price for the dynamic context (gathered file content, the user's latest message, recent chat history).

For a project with twenty back-and-forth messages, that's roughly 9–15K tokens of system prompt billed at about 10% of the normal input rate on every turn that hits the cache. That is meaningful when, per the pricing table in the next section, each input token of Claude Sonnet costs roughly 30x what a context-gathering Llama token costs.
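
A quick calculator shows the size of the effect on one follow-up turn. It uses the Claude Sonnet input rate from the pricing table later in this article and the "roughly 10%" cache-read rate quoted above; the 3,000-token dynamic portion and the function name are assumptions, and cache-write surcharges are ignored.

```typescript
const SONNET_INPUT_PER_1K = 0.003; // USD, from the article's pricing table
const CACHE_READ_MULTIPLIER = 0.1; // cached tokens billed at ~10% of input

// Input-side cost of one turn: the static prompt may hit the cache,
// the dynamic context always pays the full rate.
function inputCost(
  staticTokens: number,
  dynamicTokens: number,
  cacheHit: boolean,
): number {
  const staticRate = cacheHit
    ? SONNET_INPUT_PER_1K * CACHE_READ_MULTIPLIER
    : SONNET_INPUT_PER_1K;
  return (
    (staticTokens / 1000) * staticRate +
    (dynamicTokens / 1000) * SONNET_INPUT_PER_1K
  );
}

// Follow-up turn: ~9K static system prompt plus an assumed 3K of dynamic context.
const uncachedTurn = inputCost(9000, 3000, false); // about $0.036
const cachedTurn = inputCost(9000, 3000, true); // about $0.0117
```

Under these assumptions the cache cuts the input bill for the turn by about two thirds, even though only the static portion gets the 90% discount.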

[Image: code on laptop screen with mobile phone nearby] Prompt caching only pays off when the static portion of the prompt is large enough to amortize the cache write. Photo by Carlos Muza on Unsplash.

The cost math, in concrete numbers

Pricing data for the pipeline lives in src/modules/api/services/ai/llm/models.ts as a PROVIDER_PRICING map, broken down by provider and purpose. A representative production configuration looks roughly like:

Step	Purpose	Typical model	Input $/1K	Output $/1K
1	`CONTEXT_GATHERING`	Llama 3.1 8B / Qwen 3 Coder	~$0.0001	~$0.0003
1 (vision)	`VISION`	Claude 3.5 Haiku	~$0.001	~$0.005
4	`MAIN_GENERATION`	Claude Sonnet 4.5	~$0.003	~$0.015

A typical chat turn on an existing project might burn ~3,000 input + 800 output tokens in Step 1 (call it ~$0.0006) and ~12,000 input + 2,500 output tokens in Step 4 (call it ~$0.074). If Step 1 ran on the same model as Step 4, a common architecture in simpler builders, that ~$0.0006 step would balloon to ~$0.021 at the same token counts, adding roughly 28% to the cost of every message for almost no quality gain.
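
The per-call arithmetic is worth spelling out, since it's just rate times tokens. This sketch plugs the table's approximate rates and the token counts above into a small helper; callCost and the Pricing type are hypothetical names. The frontier-model Step 1 figure covers a single pass over those token counts, and tool-calling loops that re-send context would push it higher.

```typescript
interface Pricing {
  inputPer1K: number; // USD per 1K input tokens
  outputPer1K: number; // USD per 1K output tokens
}

// Approximate rates from the table above.
const LLAMA_8B: Pricing = { inputPer1K: 0.0001, outputPer1K: 0.0003 };
const SONNET: Pricing = { inputPer1K: 0.003, outputPer1K: 0.015 };

function callCost(
  pricing: Pricing,
  inputTokens: number,
  outputTokens: number,
): number {
  return (
    (inputTokens / 1000) * pricing.inputPer1K +
    (outputTokens / 1000) * pricing.outputPer1K
  );
}

const step1OnSmallModel = callCost(LLAMA_8B, 3000, 800); // about $0.00054
const step1OnFrontierModel = callCost(SONNET, 3000, 800); // about $0.021
const step4OnFrontierModel = callCost(SONNET, 12000, 2500); // about $0.0735
```

At these rates the same Step 1 workload costs roughly 39x more on the frontier model than on the small one.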

This is the unsexy reason multi-model pipelines win. The capability ceiling of the best model isn't what makes the product work; the routing decisions around it are.

Beyond the main pipeline: background AI tasks

Three other AI workloads run alongside the main pipeline, each with its own model selected for the job:

App icon generation runs in parallel with Step 1 on the very first message of a project, via FAL's Flux Schnell endpoint. Three icon styles (flat modern, gradient glossy, 3D rendered) generate concurrently and land in the project's asset bucket. Code lives in src/modules/api/services/ai/IconGeneratorService.ts.
Sentiment analysis runs after every user message as a fire-and-forget background task in SentimentService.ts. It uses Llama 3.1 8B — the cheapest classifier in the lineup — to score the user's emotional state on a -1.0 to 1.0 scale, persisted as a rolling average per (user, project) pair. Slack alerts fire on outliers.
Free public tools under /api/tools/* (idea generators, color palette generators, etc.) call generateObject() with a Zod schema against Claude 3.5 Haiku via a separate OpenRouter API key, so usage on the marketing surface area never contends for quota with the in-app generation pipeline.
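
For the sentiment task, a rolling average per (user, project) pair could be kept along these lines. This is a sketch under stated assumptions: the article doesn't say how the rolling average is computed, so the fixed ten-score window, the in-memory Map, and the recordSentiment name are all hypothetical (the real service persists its state).

```typescript
const WINDOW_SIZE = 10; // assumed window; the article does not state one

// Keyed by "userId:projectId"; in-memory only for this sketch.
const recentScores = new Map<string, number[]>();

// Records one score in [-1, 1] and returns the updated rolling average.
function recordSentiment(
  userId: string,
  projectId: string,
  score: number,
): number {
  if (score < -1 || score > 1) {
    throw new RangeError("score must be within [-1, 1]");
  }
  const key = `${userId}:${projectId}`;
  const scores = recentScores.get(key) ?? [];
  scores.push(score);
  if (scores.length > WINDOW_SIZE) scores.shift(); // drop the oldest score
  recentScores.set(key, scores);
  return scores.reduce((sum, s) => sum + s, 0) / scores.length;
}
```

Keying on the pair rather than the user alone means frustration in one project doesn't drag down the reading for another.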

Each of these could in principle have been jammed into the main pipeline. Keeping them separate, with their own model selections, is what lets the team tune any one of them independently — bumping the icon model to a higher-fidelity Flux variant doesn't require redeploying anything that touches chat.

What this architecture actually buys you

A four-step LLM pipeline with per-step model selection looks like over-engineering on a whiteboard. Four steps instead of one model call. Three system prompts to maintain. A database table just to hold model assignments.

In practice, every one of those costs has paid for itself:

Cost. Routing context gathering to a $0.0003-per-1K-output model instead of a $0.015 model is a 50x discount on a step that runs every single turn.
Latency. A small model finishes its tool calls in 2–4 seconds; a frontier model with the same tools would take 8–15. The user sees code start streaming faster.
Reliability. Deterministic code, not an LLM, owns database schemas and auth screens. Those files don't hallucinate. They're identical on the 1st generation and the 100th.
Quality. Step 4's model only has to do one thing — write a screen — with all the context already prepared and no tool-call deliberation eating its maxSteps budget. The output quality goes up because the cognitive load goes down.
Flexibility. Swapping the MAIN_GENERATION model to a newly released frontier model is a single database row update, taking effect within five minutes.

This is the architecture that turns a prompt like "build a fitness app with a leaderboard and Stripe payments" into a working Expo project you can scan with your phone and use. The full breakdown of what happens after the LLMs finish — the browser bundler, the live preview, the export pipeline — lives in Inside RapidNative's Export Pipeline and How We Built Team Collaboration into an AI App Builder.

Try the pipeline yourself

The 4-step LLM pipeline isn't visible from the chat UI — it just looks like fast, accurate React Native code appearing on the screen. But every prompt you send to RapidNative is routing through it. You can try it now at rapidnative.com — start from a text prompt, upload a whiteboard sketch, feed it a screenshot, or paste in a PRD. The first prompt is free; pricing only kicks in if you want to keep building beyond the free tier.

If you want a side-by-side comparison with the previous generation of this architecture, the two-step pipeline and browser bundler post covers the version that ran in production before the four-step pipeline replaced it.

Frequently Asked Questions

What is RapidNative?

RapidNative is an AI-powered mobile app builder. Describe the app you want in plain English and RapidNative generates real, production-ready React Native screens you can preview, edit, and publish to the App Store or Google Play.
