Use cases

01

Chatbots and support assistants

The bread-and-butter LLM use case. A user types a question, your app sends it to a model with some context, the model streams back an answer. Wire it into a Discord bot, a Slackbot, a customer support widget, a help overlay inside your app, or a standalone chat product.

Recommended model

qai-flash for low-stakes chat; qai-pro for customer-facing.

Cost ballpark

~$0.20-$2 per 1,000 conversations on flash; ~$2-$10 on pro.

Architecture

User message → your app (build messages array with system prompt + history + new user msg) → Qai /v1/chat/completions with stream: true → pipe SSE chunks back to user → persist completed message

node - streaming chat

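import OpenAI from 'openai';

// Assumption: the official OpenAI SDK pointed at Qai's OpenAI-compatible endpoint.
const qai = new OpenAI({
  baseURL: 'https://llm.quickcasa.ai/v1',
  apiKey: process.env.QAI_KEY,
});

// Inside your request handler: `res` is the HTTP response for the client;
// `conversationHistory` and `userMessage` come from your app.
const stream = await qai.chat.completions.create({
  model: 'qai-pro',
  messages: [
    { role: 'system', content: 'You are a helpful support agent for Acme.' },
    ...conversationHistory,
    { role: 'user', content: userMessage },
  ],
  stream: true,
});

let reply = '';
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) {
    reply += delta;    // accumulate the full answer
    res.write(delta);  // stream to client
  }
}
res.end();
// Persist `reply` as the assistant message once the stream completes.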

Sharp edges

Conversation history is re-sent on every request, so per-request cost grows with each turn - cap it at the last 10-20 turns or summarise older context to avoid runaway token bills.
Streaming SSE through proxies / load balancers requires disabling response buffering. On nginx, send X-Accel-Buffering: no; other proxies have an equivalent setting.
Cache the system prompt as a constant; do not regenerate it per request.
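A minimal sketch of the history cap described above, assuming history is stored as a flat list of alternating user and assistant messages. The helper names trim_history and build_messages are illustrative, not part of any Qai SDK.

```python
from typing import Dict, List

Message = Dict[str, str]


def trim_history(history: List[Message], max_turns: int = 10) -> List[Message]:
    """Keep only the most recent `max_turns` user/assistant exchanges."""
    if max_turns <= 0:
        return []
    # One turn is a user message plus the assistant reply, so two messages.
    return history[-2 * max_turns:]


def build_messages(
    system_prompt: str,
    history: List[Message],
    user_message: str,
    max_turns: int = 10,
) -> List[Message]:
    """Assemble the messages array: system prompt, capped history, new user message."""
    return (
        [{"role": "system", "content": system_prompt}]
        + trim_history(history, max_turns)
        + [{"role": "user", "content": user_message}]
    )
```

Summarising older context instead of dropping it is the other option mentioned above; that would replace the slice with a call that condenses the discarded messages into a single short message.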

02

Content drafting

Blog posts, email replies, social captions, product descriptions, ad variants, internal docs. The "stare at the blank page" problem solved with a prompt and a model. Common pattern: user fills in a short brief, you generate 3 variants, they pick one and edit.

Recommended model

qai-pro as the default; qai-max when output goes to a customer unedited.

Cost ballpark

~$0.005 per generated blog paragraph on pro.

Architecture

User fills brief form → build prompt with brand voice + brief + format instruction → Qai /v1/chat/completions with n: 3 or three parallel calls → display variants → user picks & edits → save
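The prompt-building step can be sketched as a function that returns the request body for /v1/chat/completions. This is only an illustration: build_draft_request, extract_variants, and the STYLE_RULES constant are hypothetical names, and the response shape assumed here is the OpenAI-format choices array.

```python
from typing import Any, Dict, List

# Hypothetical style rules; tune these to your own brand.
STYLE_RULES = "No exclamation marks, no buzzwords, no em dashes."


def build_draft_request(
    brief: str,
    brand_voice: str,
    format_instruction: str,
    variants: int = 3,
) -> Dict[str, Any]:
    """Build a chat-completions request body that asks for several variants."""
    system_prompt = "\n".join(
        [
            f"Brand voice: {brand_voice}",
            f"Style rules: {STYLE_RULES}",
            f"Format: {format_instruction}",
        ]
    )
    return {
        "model": "qai-pro",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": brief},
        ],
        "n": variants,  # one call, several candidate drafts
    }


def extract_variants(response: Dict[str, Any]) -> List[str]:
    """Pull the draft texts out of an OpenAI-format response dict."""
    return [choice["message"]["content"] for choice in response["choices"]]
```

If you prefer three parallel calls over n: 3, send the same body with "n" set to 1 three times and collect the first choice from each response.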

Sharp edges

The model defaults to a "press release" voice - add explicit style instructions like "no exclamation marks, no buzzwords, no em dashes" to fight it.
Run output through the Humanize Text utility to strip AI typography tells before showing it to users.
Save drafts before edit - users often want the original back after they have over-edited.

03

Document Q&A and RAG

Retrieval-augmented generation: take a knowledge base of documents, find the chunks most relevant to a user's question, stuff them into the prompt, ask the model to answer using only those chunks. The standard pattern for "AI that knows about my company / product / data."

Recommended model

qai-pro for most cases; qai-think if answers require multi-step inference across chunks.

Cost ballpark

~$0.01-$0.05 per question depending on chunk size and tier.

Architecture (interim, pending Qai embed model)

Ingest docs → chunk → embed via any provider → store in vector DB → user asks question → embed query → retrieve top-K chunks → Qai /v1/chat/completions with chunks as system context → cite the chunk in the answer
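The retrieve-and-prompt steps can be sketched without committing to a particular vector DB. The example below assumes embeddings are already computed by whatever provider you use and held in memory as plain lists of floats; a real deployment would delegate the ranking to the vector store. All function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (source label, chunk text, embedding)
IndexedChunk = Tuple[str, str, Sequence[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: Sequence[float], index: List[IndexedChunk], k: int = 3
) -> List[Tuple[str, str]]:
    """Return the k (source, text) pairs most similar to the query embedding."""
    ranked = sorted(
        index, key=lambda c: cosine_similarity(query_vec, c[2]), reverse=True
    )
    return [(source, text) for source, text, _ in ranked[:k]]


def build_rag_messages(
    question: str, chunks: List[Tuple[str, str]]
) -> List[Dict[str, str]]:
    """Put the retrieved chunks into the system message with numbered citations."""
    context = "\n\n".join(
        f"[{i}] ({source}) {text}"
        for i, (source, text) in enumerate(chunks, start=1)
    )
    system = (
        "Answer ONLY from the provided context. If the answer is not in the "
        "context, say you do not know. Cite the chunk numbers you used, e.g. [1]."
        "\n\nContext:\n" + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```

Keeping the source label (for example filename and page) next to each chunk is what lets you show a verifiable citation to the user afterwards.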

Sharp edges

Chunk size matters more than chunk count. Start with ~500-token chunks with ~100-token overlap.
Always instruct the model "answer ONLY from the provided context" or you will get hallucinations leaking through.
Cite chunks back to the user (filename + page) so they can verify. This single change buys you 80% of the trust gain RAG promises.
Qai's own embed model is on the roadmap - you can switch over with a one-line change when it ships.
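As a rough illustration of the chunking advice above, here is a sliding-window chunker with overlap. It approximates tokens by whitespace-separated words, which is an assumption; swap in your embedding provider's tokenizer for accurate counts.

```python
from typing import List


def chunk_tokens(
    tokens: List[str], chunk_size: int = 500, overlap: int = 100
) -> List[List[str]]:
    """Split tokens into windows of `chunk_size` that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> List[str]:
    """Word-based approximation: split on whitespace, then rejoin each window."""
    return [" ".join(c) for c in chunk_tokens(text.split(), chunk_size, overlap)]
```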

04

Data extraction

Free-form input in, structured output out. Receipts to JSON. Resumes to candidate records. Customer notes to ticket fields. Emails to CRM entries. The pattern that quietly replaces a thousand regex rules.

Recommended model

qai-pro with a clear JSON schema in the prompt.

Cost ballpark

~$0.002 per extraction on a typical 1-2 page document.

Architecture

Input doc → your app (build prompt with desired schema + few-shot examples) → Qai /v1/chat/completions → /v1/utilities/clean-json to repair output → validate against schema → persist

python - resume to structured candidate

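import os

import requests
from openai import OpenAI

# Assumption: the OpenAI Python SDK pointed at Qai's OpenAI-compatible endpoint,
# with the key read from the QAI_KEY environment variable.
api_key = os.environ["QAI_KEY"]
client = OpenAI(base_url="https://llm.quickcasa.ai/v1", api_key=api_key)

# resume_text: the plain text of the resume, loaded by your app.
response = client.chat.completions.create(
    model="qai-pro",
    messages=[
        {"role": "system", "content": """Extract candidate info as JSON:
{
  "name": string,
  "email": string,
  "years_experience": number,
  "skills": string[],
  "current_role": string
}
Output JSON only. No prose."""},
        {"role": "user", "content": resume_text},
    ],
)

# Always run through clean-json to handle markdown fences etc.
clean_response = requests.post(
    "https://llm.quickcasa.ai/v1/utilities/clean-json",
    headers={"Authorization": f"Bearer {api_key}"},
    json={"input": response.choices[0].message.content},
    timeout=30,
)
clean_response.raise_for_status()  # fail loudly on HTTP errors
cleaned = clean_response.json()

candidate = cleaned["json"]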

Sharp edges

Always pipeline through /v1/utilities/clean-json - it catches the markdown fence problem for free.
Provide 1-2 few-shot examples in the system prompt. Quality jumps dramatically.
Validate the output against a real schema (zod, pydantic) before you trust it - models will occasionally invent fields.
For very long documents, chunk and extract per chunk then merge.
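To illustrate the validation step, here is a dependency-free stand-in for a pydantic or zod schema, covering the candidate record from the example above. In production a real schema library is the better choice; this sketch only shows the kind of checks involved, including rejecting fields the model invented.

```python
from typing import Any, Dict, List

# Expected field types for the candidate record in the resume example.
CANDIDATE_SCHEMA = {
    "name": str,
    "email": str,
    "years_experience": (int, float),
    "skills": list,
    "current_role": str,
}


def validate_candidate(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record passed."""
    errors: List[str] = []
    for field, expected in CANDIDATE_SCHEMA.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(record[field], bool) or not isinstance(record[field], expected):
            errors.append(f"wrong type for {field}")
    for field in record:
        if field not in CANDIDATE_SCHEMA:
            errors.append(f"unexpected field: {field}")
    skills = record.get("skills")
    if isinstance(skills, list) and not all(isinstance(s, str) for s in skills):
        errors.append("skills must contain only strings")
    return errors
```

A typical flow is to persist the record only when the returned list is empty, and otherwise retry the extraction or queue the document for manual review.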

05

Image generation in your app

Avatar generators, product mockups, AI art tools, social-media-card factories, marketing visual pipelines. With Qai-hosted media URLs, your app does not even need its own storage bucket - you call the API and embed the returned URL.

Recommended model

qai-imagine-turbo for batches and previews; qai-imagine-quality for hero images.

Cost ballpark

$0.04 per turbo image; $0.08 per quality image.

Architecture

User prompt or template → Qai /v1/images/generations with hostMedia: true → receive permanent CDN URL → embed directly in your app / email / social post

curl - hosted image

curl -X POST https://llm.quickcasa.ai/v1/images/generations \
  -H "Authorization: Bearer $QAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qai-imagine-turbo",
    "prompt": "a friendly robot mascot holding a coffee mug, flat illustration style",
    "hostMedia": true
  }'

# Response includes: { "data": [{ "url": "https://llm.quickcasa.ai/media/{id}" }] }
# That URL is permanent. Embed it anywhere.

Sharp edges

Without hostMedia: true, the returned URL is temporary and auto-expires. Always set it for production.
Image models do not yet handle consistent character identity across multiple generations - if you need "same person across N images", that is on the roadmap.
For batch generation, hit turbo with n: 8 in one call instead of 8 separate calls.

06

Short-form video

Marketing clips, product demo b-roll, social hooks, intro animations, ad creative. Six seconds at a time, hosted on the Qai CDN, ready to embed. Stitch a few together for a longer piece.

Recommended model

qai-motion

Cost ballpark

~$1.08 per 6-second clip.

Architecture (async with polling)

POST /v1/videos/generations with prompt → get jobId back → poll /v1/videos/generations/{jobId} every ~10s → status flips to "completed" → download from result.data[0].url
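The polling loop can be sketched as below. The HTTP call is injected as fetch_status so the loop stays testable; in a real app it would GET /v1/videos/generations/{jobId} with your bearer key and return the parsed JSON. The "completed" status and the result.data[0].url path come from the flow above; the "failed" status and the timeout value are assumptions.

```python
import time
from typing import Any, Callable, Dict


def wait_for_video(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_s: float = 10.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job completes, then return the video URL."""
    waited = 0.0
    while True:
        job = fetch_status()
        status = job.get("status")
        if status == "completed":
            return job["result"]["data"][0]["url"]
        if status == "failed":  # assumed name for a terminal failure status
            raise RuntimeError(f"video job failed: {job}")
        if waited >= timeout_s:
            raise TimeoutError(f"video job not finished after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Run this in a background worker rather than inside a user-facing request, and notify the user when it returns.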

Sharp edges

Video generation is async and slow (~90s end-to-end). Do not block a user-facing request on it; use a webhook-style pattern in your own app.
Always set hostMedia: true in production - same as images, the URL otherwise expires.
Stitching clips: keep prompts visually consistent (style, palette, subject) or seams will be obvious.
The model does not yet do reliable speech / voiceover. Generate the visuals only, layer audio separately.

07

Reasoning workflows

Anywhere you need a model to think before it speaks. Scheduling assistants, financial decisioning, multi-constraint planners, code review bots, debate referees. qai-think spends extra compute working through a problem instead of guessing the first plausible answer.

Recommended model

qai-think

Cost ballpark

~$0.01-$0.05 per query, varies with reasoning depth.

Architecture

User problem statement → Qai /v1/chat/completions with model: "qai-think" → show "thinking..." UX (response can take seconds) → render the answer with reasoning trace

Sharp edges

Reasoning models are not great at small-talk - use qai-pro or qai-flash for casual chat and route only hard questions to qai-think.
The reasoning trace is verbose. If your UI does not show it, instruct the model to "respond with only the final answer."
Latency is variable. Set a UI affordance for the wait (animated thinking indicator) so users do not assume it broke.
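One way to route only hard questions to qai-think is a cheap pre-check before the API call. The heuristic below (keyword hints plus a length threshold) is purely hypothetical and deliberately crude; a small classification call on qai-flash would be a more robust router.

```python
# Hypothetical hints that a message needs multi-step reasoning.
REASONING_HINTS = (
    "schedule",
    "plan",
    "constraint",
    "trade-off",
    "debug",
    "prove",
    "step by step",
)


def pick_model(message: str, customer_facing: bool = False) -> str:
    """Choose a model tier for a message using a crude, illustrative heuristic."""
    text = message.lower()
    looks_hard = len(text.split()) > 40 or any(hint in text for hint in REASONING_HINTS)
    if looks_hard:
        return "qai-think"
    # Casual chat: pro for customer-facing surfaces, flash otherwise.
    return "qai-pro" if customer_facing else "qai-flash"
```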

08

Automation pipelines

n8n, Make, Zapier, custom cron jobs, GitHub Actions, internal data pipelines - anywhere a workflow needs "an LLM call" as one of its steps. Qai's OpenAI-compatible endpoint slots into every automation platform that already supports OpenAI.

Recommended model

Depends on the step. qai-flash for classify/tag, qai-pro for generate/transform.

Cost ballpark

~$1-$10 per 10,000 automation runs.

Architecture (n8n example)

Trigger (webhook / cron / change) → data prep nodes → OpenAI node configured with base URL https://llm.quickcasa.ai/v1 → downstream nodes (send to Slack, write to DB, etc.)

Sharp edges

Create a dedicated API key per workflow with its own daily budget cap. If one workflow goes haywire, only its key gets locked, not your whole account.
Most automation platforms support custom base URLs but bury the setting - look for "OpenAI custom endpoint" or "compatible API" options.
Log the Qai response in your workflow so you can debug bad runs later. Most platforms have a "store output" toggle.

09

Coding assistants

Point Cursor, Continue, Aider, Cline, or any custom-base-URL coding tool at Qai and get a paid coding assistant for your team. Mix tiers across operations - cheap for autocomplete and quick refactors, qai-max for big architectural changes.

Recommended model

qai-pro for everyday; qai-max for refactors; qai-think for debugging tricky bugs.

Cost ballpark

~$5-$30 per developer per month at typical use.

Architecture (Cursor / Continue example)

IDE extension → user invokes a command (chat, edit, refactor) → extension sends OpenAI-format request to https://llm.quickcasa.ai/v1 → streamed response renders inline in editor

Sharp edges

Set a per-key daily budget. Coding assistants can burn tokens quickly during a "vibe coding" session.
Most IDE extensions assume OpenAI model names by default - configure them to use qai-pro / qai-max etc. in extension settings.
For autocomplete-style features, qai-flash is usually fast enough. Use qai-pro only for chat and code generation.
Watch out for inline "agent" extensions that fire many calls per minute - they can outpace your daily budget faster than you expect.
