Authentication
All API requests require authentication via an API key. You can pass your key using either the Authorization header (OpenAI-style) or the x-api-key header.
Option 1: Bearer Token (recommended)
Authorization: Bearer sk-your-api-key-here
Option 2: API Key Header
x-api-key: sk-your-api-key-here
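If you build requests by hand rather than through an SDK, a small helper can keep both header styles in one place. This is a minimal standard-library sketch. The function name `auth_headers` and its `style` parameter are hypothetical and not part of any Qai client.

```python
def auth_headers(api_key: str, style: str = "bearer") -> dict[str, str]:
    """Return the authentication header for a Qai request (hypothetical helper).

    style="bearer" produces the OpenAI-style Authorization header (recommended);
    style="x-api-key" produces the alternative x-api-key header.
    """
    if not api_key:
        raise ValueError("an API key is required")
    if style == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if style == "x-api-key":
        return {"x-api-key": api_key}
    raise ValueError(f"unknown auth style: {style!r}")


print(auth_headers("sk-your-api-key"))
# {'Authorization': 'Bearer sk-your-api-key'}
```

Send exactly one of the two headers. There is no reason to set both on the same request.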
Base URL
All API requests should be made to:
https://llm.quickcasa.ai
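Because the API is OpenAI-compatible, you can use the official OpenAI SDKs by changing only the base URL and the API key. When configuring an SDK, set the base URL to https://llm.quickcasa.ai/v1, because the SDK appends paths such as /chat/completions to it.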
Error Handling
Errors follow the OpenAI error format:
{
"error": {
"message": "Invalid API key",
"type": "invalid_request_error",
"code": "unauthorized"
}
}
Common HTTP status codes:
- 400 - Bad request (missing or invalid parameters)
- 401 - Unauthorised (invalid or missing API key)
- 429 - Rate limit exceeded
- 502 - Generation service error
- 503 - Service temporarily unavailable
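Because every error uses the same envelope, client code can turn any failed response into a single exception type. The sketch below assumes the response body has already been decoded from JSON. `QaiAPIError` and `raise_for_error` are hypothetical names. The retryable set mirrors the 429, 502, and 503 statuses listed above and is an editorial choice, not an official policy.

```python
from typing import Optional


class QaiAPIError(Exception):
    """Hypothetical client-side exception carrying the OpenAI-style error fields."""

    # Statuses from the list above that are usually worth retrying later.
    RETRYABLE_STATUSES = {429, 502, 503}

    def __init__(self, status: int, message: str, type_: Optional[str] = None, code: Optional[str] = None):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.message = message
        self.type = type_
        self.code = code

    @property
    def retryable(self) -> bool:
        return self.status in self.RETRYABLE_STATUSES


def raise_for_error(status: int, body: dict) -> dict:
    """Return the decoded body on a 2xx status; otherwise raise QaiAPIError."""
    if 200 <= status < 300:
        return body
    err = body.get("error") or {}
    raise QaiAPIError(status, err.get("message", "unknown error"), err.get("type"), err.get("code"))


try:
    raise_for_error(401, {"error": {"message": "Invalid API key", "type": "invalid_request_error", "code": "unauthorized"}})
except QaiAPIError as exc:
    print(exc.status, exc.code, exc.retryable)  # 401 unauthorized False
```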
List Models
Returns a list of all available models.
curl https://llm.quickcasa.ai/v1/models \
  -H "Authorization: Bearer sk-your-api-key"
Response
{
"object": "list",
"data": [
{ "id": "qai-hello-world", "object": "model", "owned_by": "quickcasa", "type": "text" },
{ "id": "qai-flash", "object": "model", "owned_by": "quickcasa", "type": "text" },
{ "id": "qai-pro", "object": "model", "owned_by": "quickcasa", "type": "text" },
{ "id": "qai-imagine-turbo", "object": "model", "owned_by": "quickcasa", "type": "image" },
{ "id": "qai-imagine-quality", "object": "model", "owned_by": "quickcasa", "type": "image" },
{ "id": "qai-motion", "object": "model", "owned_by": "quickcasa", "type": "video" }
]
}
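The `type` field makes it easy to choose models for one modality. The following is a minimal sketch that runs on the decoded /v1/models response shown above. The helper name `models_by_type` is hypothetical.

```python
import json

# The example /v1/models response from above, as it might arrive over the wire.
RAW = """{
  "object": "list",
  "data": [
    {"id": "qai-hello-world", "object": "model", "owned_by": "quickcasa", "type": "text"},
    {"id": "qai-flash", "object": "model", "owned_by": "quickcasa", "type": "text"},
    {"id": "qai-pro", "object": "model", "owned_by": "quickcasa", "type": "text"},
    {"id": "qai-imagine-turbo", "object": "model", "owned_by": "quickcasa", "type": "image"},
    {"id": "qai-imagine-quality", "object": "model", "owned_by": "quickcasa", "type": "image"},
    {"id": "qai-motion", "object": "model", "owned_by": "quickcasa", "type": "video"}
  ]
}"""


def models_by_type(listing: dict) -> dict[str, list[str]]:
    """Group model ids from a /v1/models response by their type field (hypothetical helper)."""
    groups: dict[str, list[str]] = {}
    for model in listing.get("data", []):
        groups.setdefault(model.get("type", "unknown"), []).append(model["id"])
    return groups


groups = models_by_type(json.loads(RAW))
print(groups["image"])  # ['qai-imagine-turbo', 'qai-imagine-quality']
```

Reading the catalogue at startup, instead of hard-coding ids, lets your app notice when a model is added or withdrawn.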
Chat Completions
Generate a text completion from a conversation. Fully compatible with the OpenAI chat completions API.
Request Body
| Parameter | Type | Description |
|---|---|---|
| model required | string | One of qai-hello-world (free), qai-flash, qai-pro, qai-max, or qai-think |
| messages required | array | Array of message objects with role and content |
| stream optional | boolean | Enable Server-Sent Events streaming. Default: false |
| temperature optional | number | Sampling temperature (0–2). Default: 1 |
| max_tokens optional | integer | Maximum tokens to generate |
| top_p optional | number | Nucleus sampling threshold |
| stop optional | string or array | Stop sequence(s) |
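Validating the body on the client catches common 400s, such as missing fields or an out-of-range temperature, before the request is sent. The sketch below follows the table above. `build_chat_request` is a hypothetical helper. The model set is copied from the table, so treat it as a snapshot and refresh it from GET /v1/models.

```python
from typing import Optional, Union

# Text models named in the table above; the live catalogue is GET /v1/models.
TEXT_MODELS = {"qai-hello-world", "qai-flash", "qai-pro", "qai-max", "qai-think"}


def build_chat_request(
    model: str,
    messages: list,
    *,
    stream: bool = False,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    top_p: Optional[float] = None,
    stop: Union[str, list, None] = None,
) -> dict:
    """Assemble a /v1/chat/completions body, leaving unset optional fields out."""
    if model not in TEXT_MODELS:
        raise ValueError(f"unknown text model: {model!r}")
    if not messages or not all("role" in m and "content" in m for m in messages):
        raise ValueError("messages must be a non-empty list of {role, content} objects")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    body = {"model": model, "messages": messages}
    if stream:
        body["stream"] = True
    optional = {"temperature": temperature, "max_tokens": max_tokens, "top_p": top_p, "stop": stop}
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


body = build_chat_request(
    "qai-flash",
    [{"role": "user", "content": "What is QuickCasa?"}],
    temperature=0.7,
)
```

When you omit unset fields, the server defaults listed in the table apply. The helper never sends its own guesses for those defaults.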
Example: Non-streaming
curl https://llm.quickcasa.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qai-flash",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is QuickCasa?"}
    ],
    "temperature": 0.7
  }'
Example: Streaming
curl https://llm.quickcasa.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "qai-pro",
    "messages": [
      {"role": "user", "content": "Write a haiku about apartments."}
    ],
    "stream": true
  }'
Response (non-streaming example)
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1700000000,
"model": "qai-flash",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "QuickCasa is a property management platform..."
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 25,
"completion_tokens": 42,
"total_tokens": 67
}
}
Image Generation
Generate images from a text prompt. Compatible with the OpenAI images API.
Request Body
| Parameter | Type | Description |
|---|---|---|
| model required | string | qai-imagine-turbo (fast, ~3s) or qai-imagine-quality (detailed, ~15s) |
| prompt required | string | A text description of the desired image |
| n optional | integer | Number of images to generate. Default: 1 |
| size optional | string | Image size as WxH (e.g. 1024x1024). Defaults vary by model. |
| response_format optional | string | b64_json or url. Default: b64_json |
Example
curl https://llm.quickcasa.ai/v1/images/generations \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qai-imagine-turbo",
    "prompt": "A modern apartment building at golden hour, photorealistic",
    "size": "1024x1024",
    "response_format": "b64_json"
  }'
Response
{
"created": 1700000000,
"data": [
{
"b64_json": "iVBORw0KGgoAAAANSUhEUg..."
}
]
}
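With the default b64_json format, each image arrives base64-encoded in data[i].b64_json. Below is a minimal standard-library sketch for decoding and saving the images. The helper names are hypothetical. The PNG check is only a heuristic: the sample payload begins with the base64 form of the PNG signature, but the docs do not state the output format, so the sketch falls back to a .bin extension.

```python
import base64
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_b64_image(b64_data: str) -> bytes:
    """Decode one data[i].b64_json value into raw image bytes."""
    return base64.b64decode(b64_data, validate=True)


def looks_like_png(raw: bytes) -> bool:
    """Cheap format check; PNG output is an assumption, not a documented guarantee."""
    return raw.startswith(PNG_SIGNATURE)


def save_images(response: dict, prefix: str = "image") -> list[Path]:
    """Write every image in an images/generations response to disk and return the paths."""
    paths = []
    for i, item in enumerate(response["data"]):
        raw = decode_b64_image(item["b64_json"])
        suffix = ".png" if looks_like_png(raw) else ".bin"
        path = Path(f"{prefix}-{i}{suffix}")
        path.write_bytes(raw)
        paths.append(path)
    return paths
```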
Video Generation
Generate short videos from a text prompt. This is a QuickCasa-specific endpoint (not part of the OpenAI spec). Videos are generated using our internal 14B-parameter video model.
Request Body
| Parameter | Type | Description |
|---|---|---|
| model required | string | qai-motion |
| prompt required | string | A text description of the desired video |
| size optional | string | Video resolution as WxH. Default: 832x480 |
| duration optional | number | Video duration in seconds. Default: 6 |
| fps optional | integer | Frames per second. Default: 16 |
Example
curl https://llm.quickcasa.ai/v1/videos/generations \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qai-motion",
    "prompt": "Aerial drone shot of a luxury condo complex, cinematic",
    "size": "832x480",
    "duration": 6
  }'
Response
{
"created": 1700000000,
"data": [
{
"url": "https://llm.quickcasa.ai/output/video/QC-Wan_00001.mp4",
"content_type": "video/mp4",
"duration_seconds": 6
}
]
}
Note: Video generation can take up to 90 seconds depending on the duration and resolution requested. The request will block until the video is ready.
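Because the call blocks until the video is ready, your client's timeout must exceed the generation time. The following is a standard-library sketch. `build_video_request` and `generate_video` are hypothetical helpers. The 180-second default is an assumption, chosen to leave headroom over the 90 seconds quoted above and the 120 seconds quoted in the FAQ.

```python
import json
import urllib.request

VIDEO_URL = "https://llm.quickcasa.ai/v1/videos/generations"


def build_video_request(api_key: str, prompt: str, duration: float = 6, size: str = "832x480") -> urllib.request.Request:
    """Prepare (but do not send) a POST to the video endpoint."""
    body = {"model": "qai-motion", "prompt": prompt, "size": size, "duration": duration}
    return urllib.request.Request(
        VIDEO_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def generate_video(api_key: str, prompt: str, timeout: float = 180.0) -> str:
    """Send the request, wait for the blocking response, and return the first video URL.

    timeout is a socket-level timeout, so it must exceed the longest silent wait.
    Non-2xx responses raise urllib.error.HTTPError.
    """
    req = build_video_request(api_key, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        result = json.load(resp)
    return result["data"][0]["url"]
```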
Python SDK
Use the official openai Python package and point it at https://llm.quickcasa.ai/v1.
pip install openai
Chat Completion
from openai import OpenAI
client = OpenAI(
    base_url="https://llm.quickcasa.ai/v1",
    api_key="sk-your-api-key",
)
response = client.chat.completions.create(
    model="qai-flash",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)
print(response.choices[0].message.content)
Streaming
stream = client.chat.completions.create(
model="qai-pro",
messages=[{"role": "user", "content": "Tell me a story."}],
stream=True,
)
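for chunk in stream:
    # The final usage chunk can arrive with an empty choices list, so guard the index.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)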
Image Generation
image = client.images.generate(
model="qai-imagine-turbo",
prompt="A cozy studio apartment with warm lighting",
size="1024x1024",
response_format="b64_json",
)
print(image.data[0].b64_json[:50])
Node.js SDK
Use the official openai npm package.
npm install openai
Chat Completion
import OpenAI from 'openai';
const client = new OpenAI({
  baseURL: 'https://llm.quickcasa.ai/v1',
  apiKey: 'sk-your-api-key',
});
const response = await client.chat.completions.create({
  model: 'qai-flash',
  messages: [
    { role: 'user', content: 'Hello!' },
  ],
});
console.log(response.choices[0].message.content);
Streaming
const stream = await client.chat.completions.create({
  model: 'qai-pro',
  messages: [{ role: 'user', content: 'Tell me a story.' }],
  stream: true,
});
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
}
Video Generation (fetch)
// Video generation uses a custom endpoint, so call it with fetch directly.
const response = await fetch('https://llm.quickcasa.ai/v1/videos/generations', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer sk-your-api-key',
  },
  body: JSON.stringify({
    model: 'qai-motion',
    prompt: 'Aerial tour of a modern apartment complex',
    duration: 6,
  }),
});
const result = await response.json();
// Errors use the OpenAI error shape, so surface the message instead of reading data[0].
if (!response.ok) {
  throw new Error(`Video generation failed (${response.status}): ${result.error?.message}`);
}
console.log(result.data[0].url);
cURL Quick Reference
Text
curl -X POST https://llm.quickcasa.ai/v1/chat/completions \
  -H "Authorization: Bearer $QC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"qai-flash","messages":[{"role":"user","content":"Hi!"}]}'
Image
curl -X POST https://llm.quickcasa.ai/v1/images/generations \
  -H "Authorization: Bearer $QC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"qai-imagine-turbo","prompt":"sunset over a city skyline"}'
Video
curl -X POST https://llm.quickcasa.ai/v1/videos/generations \
  -H "Authorization: Bearer $QC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"qai-motion","prompt":"walkthrough of a modern kitchen","duration":6}'
Streaming
Every Qai text model supports Server-Sent Events streaming. Set stream: true on your chat completions request and Qai forwards tokens as they are generated. The response uses the standard OpenAI streaming envelope, so any client library that already handles OpenAI streaming works with Qai unmodified.
Streamed responses include a final usage chunk with prompt and completion token counts, so streaming and non-streaming requests are billed with the same precision. If a particular model variant ever omits the usage chunk, Qai falls back to a token estimate so billing never silently records zero.
from openai import OpenAI
client = OpenAI(base_url="https://llm.quickcasa.ai/v1", api_key="sk-...")
stream = client.chat.completions.create(
    model="qai-pro",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
)
for chunk in stream:
    # The final usage chunk may carry an empty choices list, so check before indexing.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
const stream = await client.chat.completions.create({
  model: 'qai-pro',
  messages: [{ role: 'user', content: 'Tell me a story.' }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
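If you consume the stream without an SDK, each event is a `data:` line carrying one JSON chunk, and the stream ends with `data: [DONE]`. This is the standard OpenAI framing, which the paragraph above says Qai reuses. Below is a minimal parser sketch over already-decoded text lines. The name `collect_stream` is hypothetical, and the sample lines are made up for illustration.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Join content deltas from SSE lines and return (text, usage).

    usage is None if the stream carried no usage chunk.
    """
    parts = []
    usage = None
    for line in lines:
        line = line.strip()
        # Blank lines separate events; anything not starting with "data:" is ignored.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices") or []:
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(parts), usage


sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"Once"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" upon a time"}}]}',
    "",
    'data: {"choices":[],"usage":{"prompt_tokens":12,"completion_tokens":3,"total_tokens":15}}',
    "",
    "data: [DONE]",
]
text, usage = collect_stream(sample)
print(text)  # Once upon a time
```

The usage chunk in the sample has an empty `choices` list, which is why the loop never indexes `choices[0]` directly.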
Migrating from OpenAI
If you have an existing app on the OpenAI API, migrating takes two code changes, plus a Qai API key in place of your OpenAI key:
1. Change the base URL
Point your OpenAI client at https://llm.quickcasa.ai/v1 instead of https://api.openai.com/v1.
2. Swap the model name
Pick the Qai tier that matches your old model. Rough equivalents:
- gpt-4o-mini → qai-flash
- gpt-4o → qai-pro or qai-max
- gpt-4.1 → qai-max
- o1, o3 → qai-think
- dall-e-3 → qai-imagine-quality
Everything else - streaming, function calling, JSON mode, system prompts, tool calls - works identically.
const client = new OpenAI({
- baseURL: 'https://api.openai.com/v1',
+ baseURL: 'https://llm.quickcasa.ai/v1',
apiKey: process.env.QAI_API_KEY,
});
await client.chat.completions.create({
- model: 'gpt-4o',
+ model: 'qai-pro',
messages: [...],
});
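If model names are scattered across many call sites, you can keep the equivalence list above in one lookup table. This is a sketch. `QAI_EQUIVALENTS` and `to_qai_model` are hypothetical names. Where the list gives two options (gpt-4o), the sketch picks qai-pro to match the diff, so switch it to qai-max if you need the larger tier.

```python
# Rough equivalents from the migration list; gpt-4o could also map to qai-max.
QAI_EQUIVALENTS = {
    "gpt-4o-mini": "qai-flash",
    "gpt-4o": "qai-pro",
    "gpt-4.1": "qai-max",
    "o1": "qai-think",
    "o3": "qai-think",
    "dall-e-3": "qai-imagine-quality",
}


def to_qai_model(openai_model: str) -> str:
    """Return the Qai model for an OpenAI model name, or raise if none is listed."""
    try:
        return QAI_EQUIVALENTS[openai_model]
    except KeyError:
        raise ValueError(f"no listed Qai equivalent for {openai_model!r}; pick a tier manually") from None
```

The lookup matches exact names on purpose. A dated snapshot name raises an error instead of silently mapping to a tier you did not choose.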
Best practices
Match the tier to the job
Do not default to qai-max for everything just because it is the most capable. A summarisation pipeline runs fine on qai-flash for a tenth of the cost. Use the dashboard's per-model breakdown to spot where you are over-spending.
Set per-key spend caps in the dashboard
Especially for production keys. A misbehaving cron job or a leaked key can otherwise spend its way through your monthly budget overnight. Caps mean the worst-case is "service degrades to 402" instead of "service degrades to my Stripe statement."
Use the free sandbox during development
qai-hello-world is unlimited in dev and capped at 100 calls per key per day in production. Wire your dev environment to it - your real spend should only start when you go to production.
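One way to wire this up is to choose the model from an environment variable, so development never touches paid tiers. This is a sketch. The variable name `APP_ENV` and the helper `pick_model` are hypothetical, so substitute whatever your deployment already uses.

```python
import os
from typing import Mapping, Optional

SANDBOX_MODEL = "qai-hello-world"  # free tier; capped at 100 calls per key per UTC day in production


def pick_model(production_model: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Use the free sandbox model everywhere except production.

    APP_ENV is a hypothetical variable name; treat a missing value as development.
    """
    env = os.environ if env is None else env
    if env.get("APP_ENV", "development") == "production":
        return production_model
    return SANDBOX_MODEL
```

The production model is still passed explicitly at the call site, which keeps the advice below about pinning models intact.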
Stream when the user is waiting
For chat-style interactions, set stream=true. A 4-second response feels instant when the first token arrives in 200ms; it feels broken when nothing happens for 4 seconds.
Set hostMedia: true for production images and videos
Without it, generated media is temporary and auto-expires. With it, you get a permanent URL on the Qai CDN that you can embed directly into emails, apps, or social posts without re-hosting.
Pin to a model, not a default
Always pass an explicit model field rather than relying on a default. Models get revised and their behaviour can shift, so being explicit keeps your tests reproducible.
Errors reference
Qai returns errors in OpenAI-compatible shape: { "error": { "message", "type", "code" } }. Below is the full list of error codes you might see.
HTTP 400 - invalid_request_error
- missing_field - a required field (model, messages, prompt) was not in your request body.
- invalid_model - the model id is not one Qai serves. Hit GET /v1/models for the live catalogue.
- bad_request - the request body was malformed in some other way (bad JSON, wrong types, etc.).
HTTP 401 - invalid_request_error
- unauthorized - the Authorization header was missing, malformed, or referred to a key that does not exist or is disabled.
HTTP 403 - invalid_request_error
- forbidden - the key is valid but the associated account is inactive (cancelled, suspended, payment failed).
HTTP 429 - rate_limit_error
- rate_limit_exceeded - account-wide rate limit hit. Retry with exponential backoff.
- sandbox_quota_exceeded - the free qai-hello-world tier's per-key, per-UTC-day quota was hit. Wait until midnight UTC or switch to qai-flash.
HTTP 502 - api_error
- upstream_error - the generation service returned an error mid-request. The message field contains the underlying error text, which is usually safe to surface to your own users.
HTTP 500 - api_error
- internal_error - something on our side broke. If you see this repeatedly, email hi@quickcasa.ai with the timestamp and we will dig in.
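The codes above fall into a few buckets: fix the request, retry later, switch model, or check the account. Below is a sketch of that triage. The bucket names and `classify_error` are hypothetical. The grouping reflects one reading of the descriptions above, not an official policy.

```python
# Grouping of the documented error codes by what the caller should do next.
FIX_REQUEST = {"missing_field", "invalid_model", "bad_request"}
RETRY_LATER = {"rate_limit_exceeded", "upstream_error", "internal_error"}
CHECK_ACCOUNT = {"unauthorized", "forbidden"}


def classify_error(body: dict) -> str:
    """Map an OpenAI-style error body to a next step (hypothetical triage helper)."""
    code = (body.get("error") or {}).get("code")
    if code in FIX_REQUEST:
        return "fix_request"
    if code in RETRY_LATER:
        return "retry"
    if code == "sandbox_quota_exceeded":
        # The per-key sandbox cap lasts until midnight UTC; other models keep working.
        return "switch_model"
    if code in CHECK_ACCOUNT:
        return "check_account"
    return "unknown"
```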
Rate limits
Qai applies two layers of rate limiting:
Account-wide
Every API key shares an account-wide rate limit that scales with your account tier. Default is 100 requests per minute. If you exceed it, you get a 429 with code: rate_limit_exceeded. The response includes a Retry-After header.
Per-key sandbox quota
The free qai-hello-world tier is capped at 100 calls per API key per UTC day. When you hit the cap, that specific key returns 429 with code: sandbox_quota_exceeded until midnight UTC. Other models on the same key keep working.
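When a sandbox key returns sandbox_quota_exceeded, the wait until the quota resets is the time remaining until the next midnight UTC. Below is a small standard-library sketch. The helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def seconds_until_sandbox_reset(now: Optional[datetime] = None) -> float:
    """Seconds until the next midnight UTC, when the qai-hello-world quota resets."""
    now = now or datetime.now(timezone.utc)
    if now.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    now = now.astimezone(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)
    return (next_midnight - now).total_seconds()
```

The function rejects naive datetimes, because `astimezone` would otherwise interpret them as local time.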
How to handle 429s
- Respect the Retry-After header when present.
- Use exponential backoff with jitter for retries (start at 500ms, double up to 30s).
- If you regularly hit the account-wide limit, email us about lifting it - we can usually raise it within a business day.
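The retry policy above translates into a short loop. This sketch assumes "full jitter", meaning a random delay between zero and the current backoff ceiling, which is one common reading of "exponential backoff with jitter". `backoff_delay` and `call_with_retries` are hypothetical. `send` stands in for whatever function issues your request and returns (status, headers, body).

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

BASE_DELAY = 0.5  # seconds: start at 500 ms
MAX_DELAY = 30.0  # seconds: stop doubling at 30 s


def backoff_delay(attempt: int, retry_after: Optional[str] = None, rng: Optional[random.Random] = None) -> float:
    """Delay before retry number `attempt` (0-based), preferring the server's Retry-After."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Retry-After may also be an HTTP date; fall back to backoff.
    # Clamp the exponent so huge attempt counts cannot overflow a float.
    ceiling = min(MAX_DELAY, BASE_DELAY * 2 ** min(attempt, 10))
    return (rng or random).uniform(0, ceiling)


def call_with_retries(send: Callable[[], Tuple[int, Mapping[str, str], dict]], max_attempts: int = 5) -> dict:
    """Call send() until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return body
        if attempt == max_attempts - 1:
            break
        time.sleep(backoff_delay(attempt, headers.get("Retry-After")))
    raise RuntimeError("still rate limited after retries")
```

Real HTTP libraries usually expose headers through a case-insensitive mapping. If yours does not, normalise the header names before looking up Retry-After.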
Frequently asked questions
Is Qai really OpenAI-compatible?
Yes. Same endpoints, same request bodies, same streaming envelope, same error format. Drop in OpenAI's SDK, change the baseURL, and you are running on Qai. We test against the official openai-python and openai-node libraries.
Can I run Qai keys alongside OpenAI keys in the same app?
Of course - just instantiate two clients, each with its own base URL and API key. Some teams route cheap chat to Qai and reserve OpenAI for one specific feature they have already tuned. There is no exclusivity.
What is the actual latency?
Time-to-first-token on qai-flash and qai-pro is typically 200-400ms. qai-max and qai-think are slower (700ms+) since they are bigger models. Image generation is 3s (turbo) to 15s (quality). Video generation is 60-120s.
What if a model is temporarily unavailable?
You will get a 502 with a clear error message. Outages are rare but they happen - it is on the roadmap to automatically retry a failed generation against the next-best tier (e.g. qai-max failing over to qai-pro) so your app does not feel them.
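Until automatic failover ships, you can approximate it in your own client by retrying a 502 on the next tier down. This is a sketch under stated assumptions. The tier order (qai-max, then qai-pro, then qai-flash) extends the roadmap example and is an assumption. `UpstreamError` is a hypothetical exception your request code would raise on a 502. `generate` is any callable that takes a model name.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Assumed fallback order, extending the "qai-max failing over to qai-pro" example.
DEFAULT_CHAIN = ("qai-max", "qai-pro", "qai-flash")


class UpstreamError(Exception):
    """Hypothetical: raise this from your request code when Qai answers 502 upstream_error."""


def generate_with_failover(generate: Callable[[str], T], chain: Sequence[str] = DEFAULT_CHAIN) -> Tuple[str, T]:
    """Try each model in order, moving on only after a 502; return (model_used, result)."""
    last_error = None
    for model in chain:
        try:
            return model, generate(model)
        except UpstreamError as exc:
            last_error = exc
    raise RuntimeError(f"every model in {list(chain)} failed") from last_error


def flaky(model: str) -> str:
    # Stand-in for a real request: pretend qai-max is having an outage.
    if model == "qai-max":
        raise UpstreamError("502 upstream_error")
    return f"answer from {model}"


print(generate_with_failover(flaky))  # ('qai-pro', 'answer from qai-pro')
```

The function returns the model it actually used. A fallback changes output quality, so you will want to log which tier answered.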
Does Qai store my prompts?
We store the metadata needed for billing (model, token count, timestamp, API key id) but we do not store prompt content or completion content by default. If you want full request/response logging for your own debugging, that is opt-in per-key in your dashboard.
How do I cancel?
Sign in, head to your dashboard, click "Manage account" then "Cancel subscription". You stop being billed at the end of the current cycle. If you want all your data deleted too, email us and we will wipe everything.
I lost my API key. How do I recover it?
You cannot recover it - we only store the hashed version, not the raw key. Sign in, revoke the lost key, and create a new one. This is by design: even if our database leaked, your keys would not.
Can I get higher rate limits?
Yes. Email hi@quickcasa.ai with your account email and what you are building. Most lifts happen within a business day.