Models - Qai

Text models

Five tiers from free sandbox to deep reasoning.

Every text model speaks the OpenAI API dialect, supports streaming, accepts system messages, and respects standard sampling controls (temperature, top_p, max_tokens, stop sequences). They differ in size, speed, and what they are good at.

qai-hello-world

Free sandbox

A free sandbox model for shaking out your integration. Unlimited in development. Capped at 100 calls per API key per UTC day in production. Use it to validate your wiring before you start spending real money.

Use it when

You are testing that your integration calls land at all
You are wiring up streaming and want to verify the SSE parsing
You want to demo Qai in a sales / blog / tutorial without burning credits
You are debugging your prompt template structure

Do not use it when

You need production-quality outputs
You will exceed 100 calls per key per day
You need long context windows
You are doing any task that requires nuanced reasoning

Pricing

Free · capped at 100/key/UTC day

Typical TTFT

~150ms

Streaming

Yes (SSE)

Best for

Integration testing

Sample prompt

"Say hi and tell me what model you are."

You will get a short greeting back identifying itself as qai-hello-world. Useful as a heartbeat / liveness check.

Hit the daily cap? Switch to qai-flash

Need real conversation quality? qai-pro

qai-flash

Text - fast tier

Cheap and quick. Optimised for chat, summarisation, classification, intent detection, and high-volume background tasks where you care more about latency and unit economics than getting the most clever answer.

Use it when

You are powering a chatbot that answers in 1-3 sentences
You need to label, classify, or tag thousands of items a day
You are summarising customer support tickets, emails, or short docs
Latency matters more than nuance
You are running an LLM call inside an inner loop / per-row pipeline

Do not use it when

You need long-form reasoning or multi-step argument chains
The task requires writing or critiquing code beyond simple snippets
You need very tight JSON-schema adherence on complex structures
The output needs to be impressive enough to send to a customer unedited

Input price

$0.20 per 1M input tokens

Output price

$0.50 per 1M output tokens

Typical TTFT

~200ms

Best for

High-volume chat · classification

Sample prompt

"Classify this support ticket as: bug, billing, feature-request, other. Reply with just the label."

Returns one word, fast. Great for triage pipelines, label-then-route flows, or feeding a downstream tool.

Need stronger reasoning? Up to qai-pro

Need cheaper testing? Down to qai-hello-world

qai-pro

Text - balanced default

The everyday workhorse. Solid at reasoning, code generation, structured output, instruction following, and the long tail of "I just need a good answer" tasks. If you are not sure which tier to pick, start here.

Use it when

You want a default that does most jobs well without over-paying
You are building customer-facing chat, generation, or assistant features
You need solid JSON / structured output following
You are writing code snippets, transforming data, or drafting prose
You have a hard latency budget under 1 second

Do not use it when

The task involves multi-hop reasoning across many constraints
You are doing very long-context work (lengthy documents, dense codebases)
You need cited / hallucination-free output - layer your own retrieval

Input price

$0.80 per 1M input tokens

Output price

$2.00 per 1M output tokens

Typical TTFT

~400ms

Best for

The thing most teams ship on

Sample prompt

"Draft a 3-sentence onboarding email for a new SaaS sign-up. Friendly but professional, no exclamation marks."

Returns a coherent, on-brief draft in well under a second. Strong instruction-following including the negative constraint.

Need heavier reasoning? Up to qai-max

Chain-of-thought required? qai-think

Cost-sensitive volume? Down to qai-flash

qai-max

Text - top capability

The heaviest lifter. Long context, deep reasoning, complex multi-step problems, hard code refactors, nuanced creative writing. Reach for it when you would rather wait an extra few hundred milliseconds than read a mediocre answer.

Use it when

The output is high-stakes enough to be worth the latency
You are doing complex code review, refactor planning, or architecture work
You are working with long documents or large code contexts
You need writing that holds together across many paragraphs
You are running an agent that needs strong instruction following over many turns

Do not use it when

You are running an inner-loop pipeline where every ms compounds
The task is simple - qai-pro will land at almost the same quality for half the cost
You need answers in under 500ms

Input price

$1.50 per 1M input tokens

Output price

$5.00 per 1M output tokens

Typical TTFT

~700ms

Best for

Quality matters more than speed

Sample prompt

"Refactor this 400-line React component to use hooks and split into smaller components. Keep the visual output identical."

Holds the full file in context, produces a coherent restructuring with explanation of the changes. The kind of task you do not want to give to a smaller model and then babysit.

Need chain-of-thought? qai-think

Simple task? Down to qai-pro

qai-think

Reasoning

A reasoning model. Spends extra compute thinking through a problem before answering. Best at math, formal logic, multi-constraint planning, and "let me check my work" style tasks where a wrong-but-confident first guess is the failure mode you most want to avoid.

Use it when

The problem has a single correct answer and you need to land on it
You are doing math, scheduling, optimisation, or symbolic reasoning
You need a model that will catch its own mistakes
You are running a workflow where one wrong step poisons the whole chain
You can wait a few seconds for the right answer

Do not use it when

You need streaming token-by-token output for a chat UX
The task is creative / open-ended (the reasoning overhead is wasted)
You are doing high-volume classification or summarisation
You need answers in under a second

Input price

$1.80 per 1M input tokens

Output price

$8.00 per 1M output tokens

Typical latency

Variable (think time)

Best for

Right answer over fast answer

Sample prompt

"I have meetings at 10-11am Eastern, 1-2pm Pacific, and 4-5pm London. Find one 30-minute slot that works for someone in Singapore (no later than 10pm local)."

qai-think will work through the time-zone conversions step by step before answering, instead of confidently guessing. The output usually includes its reasoning.

Need fast non-reasoning answers? qai-max or qai-pro

Image models

Two tiers - fast or premium.

Both image models go through the standard /v1/images/generations endpoint, both can return the image as a Qai-hosted permanent URL (set hostMedia: true), and both bill per image returned.

qai-imagine-turbo

Image - fast

A fast-inference image model. Good for previews, batch generation, real-time UX, and any time you want to put 10 candidate images in front of a user instead of one.

Use it when

You are showing many candidate images and letting the user pick
You are batch-generating thumbnails, avatars, or placeholders
Speed of feedback matters more than perfect detail
You are iterating on a prompt and need to see many variants quickly

Do not use it when

You need photorealistic detail or accurate text rendering
The output is going on a billboard, a print piece, or a customer-facing hero image
You need consistent styling across many generations

Price

$0.04 per image

Typical time

~3 seconds

Max size

1024x1024

Output

PNG · hosted on Qai CDN

Sample prompt

"a cozy reading nook with a window, golden hour lighting, soft watercolour style"

Returns a pleasant, stylised image in seconds. Great for moodboard pipelines.

Need higher fidelity? Up to qai-imagine-quality

qai-imagine-quality

Image - premium

Studio-grade image generation. Best for hero images, marketing assets, product mockups, and anywhere the image is the centrepiece of what your user sees.

Use it when

The image is the product, not a side decoration
You need photorealism, accurate text rendering, or fine detail
You are generating marketing visuals, hero images, or printed assets
Your user expects "professional photography" quality

Do not use it when

You need to generate 100 variants and pick the best - use turbo
The output will be small (thumbnail, icon) - the quality is wasted
You need it in under 5 seconds

Price

$0.08 per image

Typical time

~15 seconds

Max size

1328x1328

Output

PNG · hosted on Qai CDN

Sample prompt

"professional product photo of a stainless-steel pour-over kettle on a clean white background, studio lighting, sharp focus"

Output looks like it came from a product photography studio. Suitable for direct use in e-commerce or marketing without retouching.

Video models

Text to motion, hosted and ready to embed.

Video generation goes through an async job pattern: submit a request, get a job id, poll until it is done. Average end-to-end is around 90 seconds for a six-second clip.

qai-motion

Video

A text-to-video model that produces short clips with coherent motion, smooth playback, and credible scene continuity. Six seconds at a time, billed by the second.

Use it when

You are generating short marketing clips, product demos, or social-post hooks
You want generative b-roll for a longer piece you stitch together separately
The output is for ad creative or audience-facing motion
You want it hosted on a permanent URL out of the box

Do not use it when

You need 30+ second clips - it is not built for long-form yet
You need consistent character / scene identity across many generations
The output needs to be photo-perfect (motion can still hallucinate)

Price

$0.18 per video-second

Typical wait

~90 seconds end-to-end

Resolution

832x480

Duration

Up to 6 seconds

Sample prompt

"a barista pulling an espresso shot, slow-motion crema pouring into a glass cup, warm cafe lighting"

Six seconds of believable motion with consistent lighting and smooth subject tracking. Cinema-style without the cinema budget.

How to pick a tier

Q1.

Are you testing your integration? Use qai-hello-world. Free, capped at 100 calls/key/day.

Q2.

Are you doing high-volume chat, classification, or summarisation? Use qai-flash. Cheap and quick.

Q3.

Do you want a balanced default for general tasks? Use qai-pro. The model most teams ship on.

Q4.

Is quality non-negotiable on long-form or complex tasks? Use qai-max. Slowest but strongest.

Q5.

Does the problem need step-by-step reasoning? Use qai-think. Math, planning, multi-constraint logic.

Q6.

Need an image? Cheap and fast = qai-imagine-turbo. Studio-grade = qai-imagine-quality.

Q7.

Need a video? qai-motion, up to 6 seconds, billed per video-second.

One key. Eight models. Three modalities.

Five tiers from free sandbox to deep reasoning.

Two tiers - fast or premium.

Text to motion, hosted and ready to embed.

How to pick a tier

Ready to try one?