Documentation

Authentication

All API requests require authentication via an API key. You can pass your key using either the Authorization header (OpenAI-style) or the x-api-key header.

Option 1: Bearer Token (recommended)

Header

Authorization: Bearer sk-your-api-key-here

Option 2: API Key Header

Header

x-api-key: sk-your-api-key-here
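
As a minimal sketch of the two options above, the helper below builds the request headers for either style. The function name auth_headers and its style argument are illustrative and not part of the API.

```python
def auth_headers(api_key: str, style: str = "bearer") -> dict:
    """Return HTTP headers for one of the two documented auth options."""
    if style == "bearer":
        # Option 1: OpenAI-style Authorization header (recommended)
        return {"Authorization": f"Bearer {api_key}"}
    if style == "x-api-key":
        # Option 2: dedicated API key header
        return {"x-api-key": api_key}
    raise ValueError(f"unknown auth style: {style!r}")


print(auth_headers("sk-your-api-key-here"))
```

Pass the returned dictionary as the headers of whichever HTTP client you use.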

Base URL

All API requests should be made to:

URL

https://llm.quickcasa.ai

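Because the API is OpenAI-compatible, you can use the official OpenAI SDK by changing only the base_url and api_key. When configuring an SDK, include the /v1 path in the base URL (https://llm.quickcasa.ai/v1), as the SDK examples in this document do.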

Error Handling

Errors follow the OpenAI error format:

JSON

{
  "error": {
    "message": "Invalid API key",
    "type": "invalid_request_error",
    "code": "unauthorized"
  }
}

Common HTTP status codes:

400 - Bad request (missing or invalid parameters)
401 - Unauthorised (invalid or missing API key)
429 - Rate limit exceeded
502 - Generation service error
503 - Service temporarily unavailable
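
The error envelope above can be turned into a single log line with a few lines of Python. This is a sketch only: describe_error is a hypothetical helper name, and the fallback branch assumes that some failures (for example from a proxy in front of the API) may return a body that is not in the documented shape.

```python
import json


def describe_error(status: int, body: str) -> str:
    """Summarise an error response in the documented envelope."""
    try:
        err = json.loads(body)["error"]
        return f"HTTP {status} {err['type']}/{err['code']}: {err['message']}"
    except (ValueError, KeyError, TypeError):
        # Body was not JSON, or did not match the documented shape.
        return f"HTTP {status}: unrecognised error body"


sample = '{"error": {"message": "Invalid API key", "type": "invalid_request_error", "code": "unauthorized"}}'
print(describe_error(401, sample))
```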

List Models

GET /v1/models

Returns a list of all available models.

cURL

curl https://llm.quickcasa.ai/v1/models \
  -H "Authorization: Bearer sk-your-api-key"

Response

JSON

{
  "object": "list",
  "data": [
    { "id": "qai-hello-world", "object": "model", "owned_by": "quickcasa", "type": "text" },
    { "id": "qai-flash", "object": "model", "owned_by": "quickcasa", "type": "text" },
    { "id": "qai-pro", "object": "model", "owned_by": "quickcasa", "type": "text" },
    { "id": "qai-imagine-turbo", "object": "model", "owned_by": "quickcasa", "type": "image" },
    { "id": "qai-imagine-quality", "object": "model", "owned_by": "quickcasa", "type": "image" },
    { "id": "qai-motion", "object": "model", "owned_by": "quickcasa", "type": "video" }
  ]
}
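
A client will often want the catalogue grouped by modality. The sketch below does that for a response in the shape shown above; models_by_type is an illustrative name, and the "unknown" bucket is an assumption for entries that lack a type field.

```python
from collections import defaultdict


def models_by_type(listing: dict) -> dict:
    """Group model ids from a /v1/models response by their "type" field."""
    grouped = defaultdict(list)
    for model in listing.get("data", []):
        grouped[model.get("type", "unknown")].append(model["id"])
    return dict(grouped)


# Trimmed copy of the sample response above.
sample_listing = {
    "object": "list",
    "data": [
        {"id": "qai-hello-world", "object": "model", "owned_by": "quickcasa", "type": "text"},
        {"id": "qai-flash", "object": "model", "owned_by": "quickcasa", "type": "text"},
        {"id": "qai-imagine-turbo", "object": "model", "owned_by": "quickcasa", "type": "image"},
        {"id": "qai-motion", "object": "model", "owned_by": "quickcasa", "type": "video"},
    ],
}
print(models_by_type(sample_listing))
```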

Chat Completions

POST /v1/chat/completions

Generate a text completion from a conversation. Fully compatible with the OpenAI chat completions API.

Request Body

Parameter	Type	Description
model required	string	One of qai-hello-world (free), qai-flash, qai-pro, qai-max, or qai-think
messages required	array	Array of message objects with role and content
stream optional	boolean	Enable Server-Sent Events streaming. Default: false
temperature optional	number	Sampling temperature (0–2). Default: 1
max_tokens optional	integer	Maximum tokens to generate
top_p optional	number	Nucleus sampling threshold
stop optional	string | array	Stop sequence(s)
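
To illustrate how the parameters in the table fit together, here is a small sketch that assembles a request body and checks the one range the table states (temperature 0 to 2). build_chat_payload is a hypothetical helper, and leaving unset optional fields out of the body is a design choice of this sketch so that the server-side defaults apply.

```python
def build_chat_payload(model, messages, *, stream=False, temperature=None,
                       max_tokens=None, top_p=None, stop=None):
    """Assemble a /v1/chat/completions request body from the documented fields."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")

    payload = {"model": model, "messages": messages}
    if stream:
        payload["stream"] = True
    optional = {
        "temperature": temperature,
        "max_tokens": max_tokens,
        "top_p": top_p,
        "stop": stop,
    }
    # Only send the optional fields the caller actually set.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


print(build_chat_payload("qai-flash", [{"role": "user", "content": "Hi!"}], temperature=0.7))
```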

Example: Non-streaming

cURL

curl https://llm.quickcasa.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qai-flash",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is QuickCasa?"}
    ],
    "temperature": 0.7
  }'

Example: Streaming

cURL

curl https://llm.quickcasa.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "qai-pro",
    "messages": [
      {"role": "user", "content": "Write a haiku about apartments."}
    ],
    "stream": true
  }'

Response (non-streaming)

JSON

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "qai-flash",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "QuickCasa is a property management platform..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 42,
    "total_tokens": 67
  }
}

Image Generation

POST /v1/images/generations

Generate images from a text prompt. Compatible with the OpenAI images API.

Request Body

Parameter	Type	Description
model required	string	qai-imagine-turbo (fast, ~3s) or qai-imagine-quality (detailed, ~15s)
prompt required	string	A text description of the desired image
n optional	integer	Number of images to generate. Default: 1
size optional	string	Image size as WxH (e.g. 1024x1024). Defaults vary by model.
response_format optional	string	b64_json or url. Default: b64_json

Example

cURL

curl https://llm.quickcasa.ai/v1/images/generations \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qai-imagine-turbo",
    "prompt": "A modern apartment building at golden hour, photorealistic",
    "size": "1024x1024",
    "response_format": "b64_json"
  }'

Response

JSON

{
  "created": 1700000000,
  "data": [
    {
      "b64_json": "iVBORw0KGgoAAAANSUhEUg..."
    }
  ]
}
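
With response_format set to b64_json, each item in data has to be base64-decoded before it can be used as a file. A minimal sketch follows; save_images and the file naming are illustrative, and the .png extension is an assumption based on the PNG signature visible at the start of the sample payload above.

```python
import base64
from pathlib import Path


def save_images(response: dict, out_dir: str, stem: str = "image") -> list:
    """Decode every b64_json item in an images response and write it to disk."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, item in enumerate(response.get("data", [])):
        encoded = item.get("b64_json")
        if encoded is None:
            continue  # item was returned as a URL instead of inline data
        path = out / f"{stem}_{i}.png"  # assumption: the payload is a PNG
        path.write_bytes(base64.b64decode(encoded))
        paths.append(path)
    return paths
```

Calling save_images(result, "out") on a parsed response returns the list of files written.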

Video Generation

POST /v1/videos/generations

Generate short videos from a text prompt. This is a QuickCasa-specific endpoint (not part of the OpenAI spec). Videos are generated using our internal 14B-parameter video model.

Request Body

Parameter	Type	Description
model required	string	qai-motion
prompt required	string	A text description of the desired video
size optional	string	Video resolution as WxH. Default: 832x480
duration optional	number	Video duration in seconds. Default: 6
fps optional	integer	Frames per second. Default: 16

Example

cURL

curl https://llm.quickcasa.ai/v1/videos/generations \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qai-motion",
    "prompt": "Aerial drone shot of a luxury condo complex, cinematic",
    "size": "832x480",
    "duration": 6
  }'

Response

JSON

{
  "created": 1700000000,
  "data": [
    {
      "url": "https://llm.quickcasa.ai/output/video/QC-Wan_00001.mp4",
      "content_type": "video/mp4",
      "duration_seconds": 6
    }
  ]
}

Note: Video generation can take up to 90 seconds depending on the duration and resolution requested. The request will block until the video is ready.
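
Because the request blocks until the video is ready, the client-side timeout needs to be longer than the generation time. The sketch below uses only the Python standard library; the 180-second default is an assumption chosen to leave headroom, and build_video_request and generate_video are illustrative names.

```python
import json
import urllib.request

VIDEO_URL = "https://llm.quickcasa.ai/v1/videos/generations"


def build_video_request(api_key: str, prompt: str, duration: float = 6) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    body = json.dumps({"model": "qai-motion", "prompt": prompt, "duration": duration})
    return urllib.request.Request(
        VIDEO_URL,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def generate_video(api_key: str, prompt: str, duration: float = 6, timeout: float = 180.0) -> str:
    """Send the request, wait for generation to finish, and return the video URL."""
    request = build_video_request(api_key, prompt, duration)
    # timeout is an assumed value; it must exceed the time the server blocks for.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["data"][0]["url"]
```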

Python SDK

Use the official openai Python package. Just point it at our base URL.

bash

pip install openai

Chat Completion

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://llm.quickcasa.ai/v1",
    api_key="sk-your-api-key",
)

response = client.chat.completions.create(
    model="qai-flash",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)

print(response.choices[0].message.content)

Streaming

Python

stream = client.chat.completions.create(
    model="qai-pro",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
)

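for chunk in stream:
    # The final usage chunk may carry no choices, so guard before indexing.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)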

Image Generation

Python

image = client.images.generate(
    model="qai-imagine-turbo",
    prompt="A cozy studio apartment with warm lighting",
    size="1024x1024",
    response_format="b64_json",
)

print(image.data[0].b64_json[:50])

Node.js SDK

Use the official openai npm package.

bash

npm install openai

Chat Completion

TypeScript

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://llm.quickcasa.ai/v1',
  apiKey: 'sk-your-api-key',
});

const response = await client.chat.completions.create({
  model: 'qai-flash',
  messages: [
    { role: 'user', content: 'Hello!' },
  ],
});

console.log(response.choices[0].message.content);

Streaming

TypeScript

const stream = await client.chat.completions.create({
  model: 'qai-pro',
  messages: [{ role: 'user', content: 'Tell me a story.' }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
}

Video Generation (fetch)

TypeScript

// Video generation uses a custom endpoint, so use fetch directly
const response = await fetch('https://llm.quickcasa.ai/v1/videos/generations', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer sk-your-api-key',
  },
  body: JSON.stringify({
    model: 'qai-motion',
    prompt: 'Aerial tour of a modern apartment complex',
    duration: 6,
  }),
});

const result = await response.json();
console.log(result.data[0].url);

cURL Quick Reference

Text

bash

curl -X POST https://llm.quickcasa.ai/v1/chat/completions \
  -H "Authorization: Bearer $QC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"qai-flash","messages":[{"role":"user","content":"Hi!"}]}'

Image

bash

curl -X POST https://llm.quickcasa.ai/v1/images/generations \
  -H "Authorization: Bearer $QC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"qai-imagine-turbo","prompt":"sunset over a city skyline"}'

Video

bash

curl -X POST https://llm.quickcasa.ai/v1/videos/generations \
  -H "Authorization: Bearer $QC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"qai-motion","prompt":"walkthrough of a modern kitchen","duration":6}'

Streaming

Every Qai text model supports Server-Sent Events streaming. Set stream: true on your chat completions request and Qai forwards tokens as they are generated. The response uses the standard OpenAI streaming envelope, so any client library that already handles OpenAI streaming works with Qai unmodified.

Streamed responses include a final usage chunk with prompt and completion token counts, so streaming and non-streaming billing land at the same precision. If a particular model variant ever skips the usage chunk, Qai falls back to a token estimate so billing never silently records zero.

python - streaming

from openai import OpenAI

client = OpenAI(base_url="https://llm.quickcasa.ai/v1", api_key="sk-...")

stream = client.chat.completions.create(
    model="qai-pro",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
)

for chunk in stream:
    # The final usage chunk may carry no choices, so guard before indexing.
    delta = chunk.choices[0].delta.content if chunk.choices else None
    if delta:
        print(delta, end="", flush=True)

node - streaming

const stream = await client.chat.completions.create({
  model: 'qai-pro',
  messages: [{ role: 'user', content: 'Tell me a story.' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
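
If you want the token counts from the final usage chunk as well as the text, the loop has to keep both. The sketch below shows one way to do it. It assumes the usage chunk follows the OpenAI convention of an empty choices list with a populated usage attribute, which this document does not spell out; collect_stream and fake_chunk are illustrative names, and fake_chunk only stands in for SDK chunk objects so the example runs offline.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Return (full_text, usage) from an iterable of streamed chat chunks."""
    parts, usage = [], None
    for chunk in chunks:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage  # normally only present on the last chunk
        if chunk.choices:
            piece = chunk.choices[0].delta.content
            if piece:
                parts.append(piece)
    return "".join(parts), usage


def fake_chunk(text=None, usage=None):
    """Stand-in for an SDK chunk object (illustration only)."""
    choices = [] if text is None else [SimpleNamespace(delta=SimpleNamespace(content=text))]
    return SimpleNamespace(choices=choices, usage=usage)


demo = [
    fake_chunk("Once "),
    fake_chunk("upon a time"),
    fake_chunk(usage=SimpleNamespace(prompt_tokens=5, completion_tokens=4)),
]
text, usage = collect_stream(demo)
print(text, usage.completion_tokens)
```

With the real SDK, pass the stream object returned by client.chat.completions.create(..., stream=True) in place of demo.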

Migrating from OpenAI

If you have an existing app on the OpenAI API, the migration is two code changes, plus swapping in your Qai API key:

1. Change the base URL

Point your OpenAI client at https://llm.quickcasa.ai/v1 instead of https://api.openai.com/v1.

2. Swap the model name

Pick the Qai tier that matches your old model. Rough equivalents:

gpt-4o-mini → qai-flash
gpt-4o → qai-pro or qai-max
gpt-4.1 → qai-max
o1, o3 → qai-think
dall-e-3 → qai-imagine-quality
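
If model names are scattered through a codebase, the rough equivalents above can be kept in one lookup table. This is a sketch: to_qai_model is a hypothetical helper, and mapping gpt-4o to qai-pro rather than qai-max is an arbitrary choice between the two options listed.

```python
QAI_EQUIVALENTS = {
    "gpt-4o-mini": "qai-flash",
    "gpt-4o": "qai-pro",  # the list above also allows qai-max
    "gpt-4.1": "qai-max",
    "o1": "qai-think",
    "o3": "qai-think",
    "dall-e-3": "qai-imagine-quality",
}


def to_qai_model(openai_model: str) -> str:
    """Translate an OpenAI model name to its rough Qai equivalent."""
    try:
        return QAI_EQUIVALENTS[openai_model]
    except KeyError:
        # Fail loudly rather than silently guessing a tier.
        raise ValueError(f"no Qai equivalent listed for {openai_model!r}") from None


print(to_qai_model("gpt-4o-mini"))
```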

Everything else - streaming, function calling, JSON mode, system prompts, tool calls - works identically.

diff

  const client = new OpenAI({
-   baseURL: 'https://api.openai.com/v1',
+   baseURL: 'https://llm.quickcasa.ai/v1',
    apiKey: process.env.QAI_API_KEY,
  });

  await client.chat.completions.create({
-   model: 'gpt-4o',
+   model: 'qai-pro',
    messages: [...],
  });

Best practices

Match the tier to the job

Do not default to qai-max for everything just because it is the most capable. A summarisation pipeline runs fine on qai-flash for a tenth of the cost. Use the dashboard's per-model breakdown to spot where you are over-spending.

Set per-key spend caps in the dashboard

Especially for production keys. A misbehaving cron job or a leaked key can otherwise spend its way through your monthly budget overnight. Caps mean the worst-case is "service degrades to 402" instead of "service degrades to my Stripe statement."

Use the free sandbox during development

qai-hello-world is unlimited in dev and capped at 100 calls per key per day in production. Wire your dev environment to it - your real spend should only start when you go to production.
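
One way to wire this up is to pick the model from the deployment environment. The sketch below assumes an APP_ENV environment variable, which is a convention of this example rather than anything Qai defines; chat_model_for is likewise an illustrative name.

```python
import os
from typing import Optional

SANDBOX_MODEL = "qai-hello-world"


def chat_model_for(env: Optional[str] = None, production_model: str = "qai-flash") -> str:
    """Use the free sandbox model everywhere except production."""
    if env is None:
        # APP_ENV is an assumed variable name; adapt it to your own setup.
        env = os.environ.get("APP_ENV", "development")
    return production_model if env == "production" else SANDBOX_MODEL


print(chat_model_for("development"))
```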

Stream when the user is waiting

For chat-style interactions, set stream=true. A 4-second response feels instant when the first token arrives in 200ms; it feels broken when nothing happens for 4 seconds.

Set hostMedia: true for production images and videos

Without it, generated media is temporary and auto-expires. With it, you get a permanent URL on the Qai CDN that you can embed directly into emails, apps, or social posts without re-hosting.

Pin to a model, not a default

Always pass an explicit model field. Do not assume a default. Models can rev, behaviours can shift, and being explicit means your tests are reproducible.

Errors reference

Qai returns errors in OpenAI-compatible shape: { "error": { "message", "type", "code" } }. Below is the full list of error codes you might see.

HTTP 400 - invalid_request_error

missing_field - a required field (model, messages, prompt) was not in your request body.
invalid_model - the model id is not one Qai serves. Hit GET /v1/models for the live catalogue.
bad_request - the request body was malformed in some other way (bad JSON, wrong types, etc.).

HTTP 401 - invalid_request_error

unauthorized - the API key (sent in the Authorization or x-api-key header) was missing, malformed, or referred to a key that does not exist or is disabled.

HTTP 403 - invalid_request_error

forbidden - the key is valid but the associated account is inactive (cancelled, suspended, payment failed).

HTTP 429 - rate_limit_error

rate_limit_exceeded - account-wide rate limit hit. Retry with exponential backoff.
sandbox_quota_exceeded - the free qai-hello-world tier's per-key per-UTC-day quota was hit. Wait until midnight UTC or switch to qai-flash.

HTTP 502 - api_error

upstream_error - the generation service returned an error mid-request. The message field contains the underlying error text, which is usually safe to surface to your own users.

HTTP 500 - api_error

internal_error - something on our side broke. If you see this repeatedly, email hi@quickcasa.ai with the timestamp and we will dig in.
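
The codes above map naturally onto a small dispatch table in client code. The sketch below is one possible mapping, not official guidance: the action names are invented for this example, and treating upstream_error and internal_error as retryable is an assumption, since this reference only recommends retrying for rate_limit_exceeded.

```python
ERROR_ACTIONS = {
    "missing_field": "fix_request",
    "invalid_model": "fix_request",
    "bad_request": "fix_request",
    "unauthorized": "check_api_key",
    "forbidden": "check_account",
    "rate_limit_exceeded": "retry_with_backoff",
    "sandbox_quota_exceeded": "wait_or_switch_model",
    "upstream_error": "retry_with_backoff",  # assumption: transient
    "internal_error": "retry_with_backoff",  # assumption: transient
}


def action_for(error_code: str) -> str:
    """Suggest what a client should do for a given error code."""
    # Unknown codes are surfaced to a human rather than retried blindly.
    return ERROR_ACTIONS.get(error_code, "report")


print(action_for("rate_limit_exceeded"))
```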

Rate limits

Qai applies two layers of rate limiting:

Account-wide

Every API key shares an account-wide rate limit that scales with your account tier. Default is 100 requests per minute. If you exceed it, you get a 429 with code: rate_limit_exceeded. The response includes a Retry-After header.

Per-key sandbox quota

The free qai-hello-world tier is capped at 100 calls per API key per UTC day. When you hit the cap, that specific key returns 429 with code: sandbox_quota_exceeded until midnight UTC. Other models on the same key keep working.
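
Since the sandbox quota resets at midnight UTC, a client can compute how long to wait before trying that key again. A minimal sketch, with seconds_until_utc_midnight as an illustrative name; it expects a timezone-aware datetime when one is passed in.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_utc_midnight(now=None) -> float:
    """Seconds remaining until the next midnight UTC."""
    if now is None:
        now = datetime.now(timezone.utc)
    now = now.astimezone(timezone.utc)  # normalise other time zones to UTC
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now).total_seconds()


print(seconds_until_utc_midnight())
```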

How to handle 429s

Respect the Retry-After header when present.
Use exponential backoff with jitter for retries (start at 500ms, double up to 30s).
If you regularly hit the account-wide limit, email us about lifting it - we can usually raise it within a business day.
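
The first two points can be combined into a single delay calculation. The sketch below honours Retry-After when it is a number of seconds and otherwise falls back to exponential backoff starting at 500ms and capped at 30s. The "full jitter" variant (a random delay between zero and the current ceiling) is one common choice, not something this document prescribes, and retry_delay is an illustrative name.

```python
import random
from typing import Callable, Optional


def retry_delay(attempt: int,
                retry_after: Optional[str] = None,
                base: float = 0.5,
                cap: float = 30.0,
                rng: Callable[[], float] = random.random) -> float:
    """Seconds to sleep before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # header was not a plain number of seconds; use backoff
    ceiling = min(cap, base * 2 ** attempt)  # 0.5, 1, 2, 4, ... up to 30
    return ceiling * rng()  # full jitter: uniform in [0, ceiling)


for attempt in range(4):
    print(attempt, round(retry_delay(attempt), 3))
```

In a request loop, sleep for retry_delay(attempt, response_headers.get("Retry-After")) after each 429 and give up after a fixed number of attempts.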

Frequently asked questions

Is Qai really OpenAI-compatible?

Yes. Same endpoints, same request bodies, same streaming envelope, same error format. Drop in OpenAI's SDK, change the baseURL, and you are running on Qai. We test against the official openai-python and openai-node libraries.

Can I run Qai keys alongside OpenAI keys in the same app?

Of course - just instantiate two clients with different baseURLs. Some teams route cheap chat to Qai and reserve OpenAI for one specific feature they have already tuned. There is no exclusivity.

What is the actual latency?

Time-to-first-token on qai-flash and qai-pro is typically 200-400ms. qai-max and qai-think are slower (700ms+) since they are bigger models. Image generation is 3s (turbo) to 15s (quality). Video generation is 60-120s.

What if a model is temporarily unavailable?

You will get a 502 with a clear error message. Outages are rare but they happen - it is on the roadmap to automatically retry a failed generation against the next-best tier (e.g. qai-max failing over to qai-pro) so your app does not feel them.

Does Qai store my prompts?

We store the metadata needed for billing (model, token count, timestamp, API key id) but we do not store prompt content or completion content by default. If you want full request/response logging for your own debugging, that is opt-in per-key in your dashboard.

How do I cancel?

Sign in, head to your dashboard, click "Manage account" then "Cancel subscription". You stop being billed at the end of the current cycle. If you want all your data deleted too, email us and we will wipe everything.

I lost my API key. How do I recover it?

You cannot recover it - we only store the hashed version, not the raw key. Sign in, revoke the lost key, and create a new one. This is by design: even if our database leaked, your keys would not.

Can I get higher rate limits?

Yes. Email hi@quickcasa.ai with your account email and what you are building. Most lifts happen within a business day.
