The full Qai catalogue

One key. Eight models. Three modalities.

Five tiers of text generation, two tiers of image generation, one video model. Pick by what you need to ship - this page gives you the depth to know which one before you write the first prompt.

Text models

Five tiers from free sandbox to deep reasoning.

Every text model speaks the OpenAI API dialect, supports streaming, accepts system messages, and respects standard sampling controls (temperature, top_p, max_tokens, stop sequences). They differ in size, speed, and what they are good at.

qai-hello-world
Free sandbox

A free sandbox model for shaking out your integration. Unlimited in development. Capped at 100 calls per API key per UTC day in production. Use it to validate your wiring before you start spending real money.

Use it when
  • You are testing that your integration calls land at all
  • You are wiring up streaming and want to verify the SSE parsing
  • You want to demo Qai in a sales / blog / tutorial without burning credits
  • You are debugging your prompt template structure
Do not use it when
  • You need production-quality outputs
  • You will exceed 100 calls per key per day
  • You need long context windows
  • You are doing any task that requires nuanced reasoning
Pricing
Free · capped at 100/key/UTC day
Typical TTFT
~150ms
Streaming
Yes (SSE)
Best for
Integration testing
Sample prompt
"Say hi and tell me what model you are."
You will get a short greeting back identifying itself as qai-hello-world. Useful as a heartbeat / liveness check.
Hit the daily cap? Switch to qai-flash
Need real conversation quality? qai-pro
qai-flash
Text - fast tier

Cheap and quick. Optimised for chat, summarisation, classification, intent detection, and high-volume background tasks where you care more about latency and unit economics than getting the most clever answer.

Use it when
  • You are powering a chatbot that answers in 1-3 sentences
  • You need to label, classify, or tag thousands of items a day
  • You are summarising customer support tickets, emails, or short docs
  • Latency matters more than nuance
  • You are running an LLM call inside an inner loop / per-row pipeline
Do not use it when
  • You need long-form reasoning or multi-step argument chains
  • The task requires writing or critiquing code beyond simple snippets
  • You need very tight JSON-schema adherence on complex structures
  • The output needs to be impressive enough to send to a customer unedited
Input price
$0.20 per 1M input tokens
Output price
$0.50 per 1M output tokens
Typical TTFT
~200ms
Best for
High-volume chat · classification
Sample prompt
"Classify this support ticket as: bug, billing, feature-request, other. Reply with just the label."
Returns one word, fast. Great for triage pipelines, label-then-route flows, or feeding a downstream tool.
Need stronger reasoning? Up to qai-pro
Need cheaper testing? Down to qai-hello-world
qai-pro
Text - balanced default

The everyday workhorse. Solid at reasoning, code generation, structured output, instruction following, and the long tail of "I just need a good answer" tasks. If you are not sure which tier to pick, start here.

Use it when
  • You want a default that does most jobs well without over-paying
  • You are building customer-facing chat, generation, or assistant features
  • You need solid JSON / structured output following
  • You are writing code snippets, transforming data, or drafting prose
  • You have a hard latency budget under 1 second
Do not use it when
  • The task involves multi-hop reasoning across many constraints
  • You are doing very long-context work (lengthy documents, dense codebases)
  • You need cited / hallucination-free output - layer your own retrieval
Input price
$0.80 per 1M input tokens
Output price
$2.00 per 1M output tokens
Typical TTFT
~400ms
Best for
The thing most teams ship on
Sample prompt
"Draft a 3-sentence onboarding email for a new SaaS sign-up. Friendly but professional, no exclamation marks."
Returns a coherent, on-brief draft in well under a second. Strong instruction-following including the negative constraint.
Need heavier reasoning? Up to qai-max
Chain-of-thought required? qai-think
Cost-sensitive volume? Down to qai-flash
qai-max
Text - top capability

The heaviest lifter. Long context, deep reasoning, complex multi-step problems, hard code refactors, nuanced creative writing. Reach for it when you would rather wait an extra few hundred milliseconds than read a mediocre answer.

Use it when
  • The output is high-stakes enough to be worth the latency
  • You are doing complex code review, refactor planning, or architecture work
  • You are working with long documents or large code contexts
  • You need writing that holds together across many paragraphs
  • You are running an agent that needs strong instruction following over many turns
Do not use it when
  • You are running an inner-loop pipeline where every ms compounds
  • The task is simple - qai-pro will land at almost the same quality for half the cost
  • You need answers in under 500ms
Input price
$1.50 per 1M input tokens
Output price
$5.00 per 1M output tokens
Typical TTFT
~700ms
Best for
Quality matters more than speed
Sample prompt
"Refactor this 400-line React component to use hooks and split into smaller components. Keep the visual output identical."
Holds the full file in context, produces a coherent restructuring with explanation of the changes. The kind of task you do not want to give to a smaller model and then babysit.
Need chain-of-thought? qai-think
Simple task? Down to qai-pro
qai-think
Reasoning

A reasoning model. Spends extra compute thinking through a problem before answering. Best at math, formal logic, multi-constraint planning, and "let me check my work" style tasks where a wrong-but-confident first guess is the failure mode you most want to avoid.

Use it when
  • The problem has a single correct answer and you need to land on it
  • You are doing math, scheduling, optimisation, or symbolic reasoning
  • You need a model that will catch its own mistakes
  • You are running a workflow where one wrong step poisons the whole chain
  • You can wait a few seconds for the right answer
Do not use it when
  • You need streaming token-by-token output for a chat UX
  • The task is creative / open-ended (the reasoning overhead is wasted)
  • You are doing high-volume classification or summarisation
  • You need answers in under a second
Input price
$1.80 per 1M input tokens
Output price
$8.00 per 1M output tokens
Typical latency
Variable (think time)
Best for
Right answer over fast answer
Sample prompt
"I have meetings at 10-11am Eastern, 1-2pm Pacific, and 4-5pm London. Find one 30-minute slot that works for someone in Singapore (no later than 10pm local)."
qai-think will work through the time-zone conversions step by step before answering, instead of confidently guessing. The output usually includes its reasoning.
Need fast non-reasoning answers? qai-max or qai-pro
Image models

Two tiers - fast or premium.

Both image models go through the standard /v1/images/generations endpoint, both can return the image as a Qai-hosted permanent URL (set hostMedia: true), and both bill per image returned.

qai-imagine-turbo
Image - fast

A fast-inference image model. Good for previews, batch generation, real-time UX, and any time you want to put 10 candidate images in front of a user instead of one.

Use it when
  • You are showing many candidate images and letting the user pick
  • You are batch-generating thumbnails, avatars, or placeholders
  • Speed of feedback matters more than perfect detail
  • You are iterating on a prompt and need to see many variants quickly
Do not use it when
  • You need photorealistic detail or accurate text rendering
  • The output is going on a billboard, a print piece, or a customer-facing hero image
  • You need consistent styling across many generations
Price
$0.04 per image
Typical time
~3 seconds
Max size
1024x1024
Output
PNG · hosted on Qai CDN
Sample prompt
"a cozy reading nook with a window, golden hour lighting, soft watercolour style"
Returns a pleasant, stylised image in seconds. Great for moodboard pipelines.
Need higher fidelity? Up to qai-imagine-quality
qai-imagine-quality
Image - premium

Studio-grade image generation. Best for hero images, marketing assets, product mockups, and anywhere the image is the centrepiece of what your user sees.

Use it when
  • The image is the product, not a side decoration
  • You need photorealism, accurate text rendering, or fine detail
  • You are generating marketing visuals, hero images, or printed assets
  • Your user expects "professional photography" quality
Do not use it when
  • You need to generate 100 variants and pick the best - use turbo
  • The output will be small (thumbnail, icon) - the quality is wasted
  • You need it in under 5 seconds
Price
$0.08 per image
Typical time
~15 seconds
Max size
1328x1328
Output
PNG · hosted on Qai CDN
Sample prompt
"professional product photo of a stainless-steel pour-over kettle on a clean white background, studio lighting, sharp focus"
Output looks like it came from a product photography studio. Suitable for direct use in e-commerce or marketing without retouching.
Video models

Text to motion, hosted and ready to embed.

Video generation goes through an async job pattern: submit a request, get a job id, poll until it is done. Average end-to-end is around 90 seconds for a six-second clip.

qai-motion
Video

A text-to-video model that produces short clips with coherent motion, smooth playback, and credible scene continuity. Six seconds at a time, billed by the second.

Use it when
  • You are generating short marketing clips, product demos, or social-post hooks
  • You want generative b-roll for a longer piece you stitch together separately
  • The output is for ad creative or audience-facing motion
  • You want it hosted on a permanent URL out of the box
Do not use it when
  • You need 30+ second clips - it is not built for long-form yet
  • You need consistent character / scene identity across many generations
  • The output needs to be photo-perfect (motion can still hallucinate)
Price
$0.18 per video-second
Typical wait
~90 seconds end-to-end
Resolution
832x480
Duration
Up to 6 seconds
Sample prompt
"a barista pulling an espresso shot, slow-motion crema pouring into a glass cup, warm cafe lighting"
Six seconds of believable motion with consistent lighting and smooth subject tracking. Cinema-style without the cinema budget.

How to pick a tier

Q1.
Are you testing your integration? Use qai-hello-world. Free, capped at 100 calls/key/day.
Q2.
Are you doing high-volume chat, classification, or summarisation? Use qai-flash. Cheap and quick.
Q3.
Do you want a balanced default for general tasks? Use qai-pro. The model most teams ship on.
Q4.
Is quality non-negotiable on long-form or complex tasks? Use qai-max. Slowest but strongest.
Q5.
Does the problem need step-by-step reasoning? Use qai-think. Math, planning, multi-constraint logic.
Q6.
Need an image? Cheap and fast = qai-imagine-turbo. Studio-grade = qai-imagine-quality.
Q7.
Need a video? qai-motion, up to 6 seconds, billed per video-second.

Ready to try one?

The free sandbox lets you wire up your integration before any spend kicks in.

Get your API key