Authentication

All API requests require authentication via an API key. You can pass your key using either the Authorization header (OpenAI-style) or the x-api-key header.

Option 1: Bearer Token (recommended)

Header
Authorization: Bearer sk-your-api-key-here

Option 2: API Key Header

Header
x-api-key: sk-your-api-key-here
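The two options are interchangeable. The sketch below builds a GET /v1/models request with either header, using only Python's standard library (no SDK). Assumptions to note:

- The helper name is hypothetical.
- The QC_API_KEY environment variable name is borrowed from the cURL quick reference.

```python
import os
import urllib.request

BASE_URL = "https://llm.quickcasa.ai"


def build_models_request(api_key: str, use_bearer: bool = True) -> urllib.request.Request:
    """Build (but do not send) a GET /v1/models request using either auth header."""
    if use_bearer:
        headers = {"Authorization": f"Bearer {api_key}"}  # Option 1 (recommended)
    else:
        headers = {"x-api-key": api_key}  # Option 2
    return urllib.request.Request(f"{BASE_URL}/v1/models", headers=headers)


# Assumption: the key is stored in QC_API_KEY; the placeholder is for illustration only.
api_key = os.environ.get("QC_API_KEY", "sk-your-api-key-here")
request = build_models_request(api_key)
# To actually send it: urllib.request.urlopen(request, timeout=30)
```

Python's urllib stores x-api-key as X-api-key. This should not matter, because HTTP header names are case-insensitive.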

Base URL

All API requests should be made to:

URL
https://llm.quickcasa.ai

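Because the API is OpenAI-compatible, you can use the official OpenAI SDKs by changing only the base URL and API key. SDK clients expect the /v1 prefix, so set base_url (Python) or baseURL (Node.js) to https://llm.quickcasa.ai/v1.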

Error Handling

Errors follow the OpenAI error format:

JSON
{
  "error": {
    "message": "Invalid API key",
    "type": "invalid_request_error",
    "code": "unauthorized"
  }
}

Common HTTP status codes:

  • 400 - Bad request (missing or invalid parameters)
  • 401 - Unauthorised (invalid or missing API key)
  • 403 - Forbidden (valid key, but the account is inactive)
  • 429 - Rate limit exceeded
  • 500 - Internal server error
  • 502 - Generation service error
  • 503 - Service temporarily unavailable
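Every error shares the same envelope, so one helper can turn any non-2xx response into a typed exception. Here is a minimal Python sketch. Assumptions to note:

- QaiError and raise_for_error are hypothetical names.
- The fallback for non-JSON bodies guards against a proxy in front of the API answering a 503 with plain text.

```python
import json


class QaiError(Exception):
    """An API error decoded from the OpenAI-style error envelope."""

    def __init__(self, status: int, message: str, type_: str, code: str):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.type = type_
        self.code = code


def raise_for_error(status: int, body: str) -> None:
    """Raise QaiError for non-2xx responses; do nothing on success."""
    if 200 <= status < 300:
        return
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        parsed = None  # Assumption: some error bodies may not be JSON
    err = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(err, dict):
        err = {}
    raise QaiError(
        status,
        err.get("message", body or "unknown error"),
        err.get("type", "unknown"),
        err.get("code", "unknown"),
    )
```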

List Models

GET /v1/models

Returns a list of all available models.

cURL
curl https://llm.quickcasa.ai/v1/models \
  -H "Authorization: Bearer sk-your-api-key"

Response

JSON
{
  "object": "list",
  "data": [
    { "id": "qai-hello-world", "object": "model", "owned_by": "quickcasa", "type": "text" },
    { "id": "qai-flash", "object": "model", "owned_by": "quickcasa", "type": "text" },
    { "id": "qai-pro", "object": "model", "owned_by": "quickcasa", "type": "text" },
    { "id": "qai-imagine-turbo", "object": "model", "owned_by": "quickcasa", "type": "image" },
    { "id": "qai-imagine-quality", "object": "model", "owned_by": "quickcasa", "type": "image" },
    { "id": "qai-motion", "object": "model", "owned_by": "quickcasa", "type": "video" }
  ]
}
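Each entry carries a type field, so you can choose models per modality from this response rather than hard-coding ids. The short sketch below groups the example payload above by type. The function name is illustrative.

```python
import json

# The example GET /v1/models response from above.
SAMPLE = """{"object": "list", "data": [
  {"id": "qai-hello-world", "object": "model", "owned_by": "quickcasa", "type": "text"},
  {"id": "qai-flash", "object": "model", "owned_by": "quickcasa", "type": "text"},
  {"id": "qai-pro", "object": "model", "owned_by": "quickcasa", "type": "text"},
  {"id": "qai-imagine-turbo", "object": "model", "owned_by": "quickcasa", "type": "image"},
  {"id": "qai-imagine-quality", "object": "model", "owned_by": "quickcasa", "type": "image"},
  {"id": "qai-motion", "object": "model", "owned_by": "quickcasa", "type": "video"}
]}"""


def models_by_type(payload: dict) -> dict:
    """Group model ids by their "type" field (text, image, video)."""
    groups = {}
    for model in payload.get("data", []):
        groups.setdefault(model.get("type", "unknown"), []).append(model["id"])
    return groups


groups = models_by_type(json.loads(SAMPLE))
```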

Chat Completions

POST /v1/chat/completions

Generate a text completion from a conversation. Fully compatible with the OpenAI chat completions API.

Request Body

  • model (string, required) - One of qai-hello-world (free), qai-flash, qai-pro, qai-max, or qai-think.
  • messages (array, required) - Array of message objects, each with a role and content.
  • stream (boolean, optional) - Enable Server-Sent Events streaming. Default: false.
  • temperature (number, optional) - Sampling temperature, from 0 to 2. Default: 1.
  • max_tokens (integer, optional) - Maximum number of tokens to generate.
  • top_p (number, optional) - Nucleus sampling threshold.
  • stop (string or array, optional) - Stop sequence(s).
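If you build request bodies by hand rather than through an SDK, a small builder can catch the documented constraints before the request leaves your machine. This is a hypothetical helper, not part of any SDK. Its model set mirrors the list above and may lag behind GET /v1/models.

```python
import json

# Snapshot of the text models listed above; GET /v1/models is the live source of truth.
TEXT_MODELS = {"qai-hello-world", "qai-flash", "qai-pro", "qai-max", "qai-think"}


def build_chat_body(model, messages, *, stream=False, temperature=None,
                    max_tokens=None, top_p=None, stop=None) -> str:
    """Return a JSON body for POST /v1/chat/completions after checking documented constraints."""
    if model not in TEXT_MODELS:
        raise ValueError(f"unknown text model: {model}")
    if not messages or not all("role" in m and "content" in m for m in messages):
        raise ValueError("messages must be a non-empty list of {role, content} objects")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if stop is not None and not isinstance(stop, (str, list)):
        raise ValueError("stop must be a string or a list of strings")
    body = {"model": model, "messages": messages, "stream": stream}
    # Only send optional fields that were set, so server defaults apply otherwise.
    optional = {"temperature": temperature, "max_tokens": max_tokens, "top_p": top_p, "stop": stop}
    body.update({k: v for k, v in optional.items() if v is not None})
    return json.dumps(body)
```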

Example: Non-streaming

cURL
curl https://llm.quickcasa.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qai-flash",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is QuickCasa?"}
    ],
    "temperature": 0.7
  }'

Example: Streaming

cURL
curl https://llm.quickcasa.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "qai-pro",
    "messages": [
      {"role": "user", "content": "Write a haiku about apartments."}
    ],
    "stream": true
  }'

Response (non-streaming)

JSON
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "qai-flash",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "QuickCasa is a property management platform..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 42,
    "total_tokens": 67
  }
}
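Reading a response comes down to the first choice and the usage block. Here is a minimal sketch. Only "stop" appears as a finish_reason above, so treating "length" as truncation by max_tokens is an assumption carried over from the OpenAI API. The function name is hypothetical.

```python
import json

# The example response above, trimmed to the fields this sketch reads.
RAW = """{
  "model": "qai-flash",
  "choices": [{"index": 0,
               "message": {"role": "assistant", "content": "QuickCasa is a property management platform..."},
               "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 25, "completion_tokens": 42, "total_tokens": 67}
}"""


def read_completion(payload: dict) -> tuple:
    """Return (assistant text, truncated?, total tokens) from a chat.completion object."""
    choice = payload["choices"][0]
    # Assumption: as in the OpenAI API, finish_reason "length" means max_tokens was reached.
    truncated = choice.get("finish_reason") == "length"
    usage = payload.get("usage", {})
    total = usage.get("total_tokens",
                      usage.get("prompt_tokens", 0) + usage.get("completion_tokens", 0))
    return choice["message"]["content"], truncated, total


text, truncated, total = read_completion(json.loads(RAW))
```

In the example, total_tokens is simply prompt_tokens + completion_tokens (25 + 42 = 67).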

Image Generation

POST /v1/images/generations

Generate images from a text prompt. Compatible with the OpenAI images API.

Request Body

  • model (string, required) - qai-imagine-turbo (fast, ~3s) or qai-imagine-quality (detailed, ~15s).
  • prompt (string, required) - A text description of the desired image.
  • n (integer, optional) - Number of images to generate. Default: 1.
  • size (string, optional) - Image size as WxH (e.g. 1024x1024). Defaults vary by model.
  • response_format (string, optional) - b64_json or url. Default: b64_json.

Example

cURL
curl https://llm.quickcasa.ai/v1/images/generations \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qai-imagine-turbo",
    "prompt": "A modern apartment building at golden hour, photorealistic",
    "size": "1024x1024",
    "response_format": "b64_json"
  }'

Response

JSON
{
  "created": 1700000000,
  "data": [
    {
      "b64_json": "iVBORw0KGgoAAAANSUhEUg..."
    }
  ]
}
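With the default response_format of b64_json, each image arrives as base64 text that you decode yourself. The sketch below writes each image to disk. Assumptions to note:

- The .png extension is inferred from the iVBORw0KGgo prefix in the example, which is the base64 form of the PNG file signature.
- save_images is a hypothetical name.

```python
import base64
from pathlib import Path


def save_images(payload: dict, prefix: str = "image") -> list:
    """Decode each b64_json entry of an images response and write it to disk."""
    paths = []
    for i, item in enumerate(payload.get("data", [])):
        if "b64_json" not in item:
            continue  # response_format="url" returns a link instead of inline data
        # Assumption: PNG output, as the "iVBORw0KGgo" prefix in the example suggests.
        path = Path(f"{prefix}_{i}.png")
        path.write_bytes(base64.b64decode(item["b64_json"]))
        paths.append(path)
    return paths
```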

Video Generation

POST /v1/videos/generations

Generate short videos from a text prompt. This is a QuickCasa-specific endpoint (not part of the OpenAI spec). Videos are generated using our internal 14B-parameter video model.

Request Body

  • model (string, required) - qai-motion.
  • prompt (string, required) - A text description of the desired video.
  • size (string, optional) - Video resolution as WxH. Default: 832x480.
  • duration (number, optional) - Video duration in seconds. Default: 6.
  • fps (integer, optional) - Frames per second. Default: 16.

Example

cURL
curl https://llm.quickcasa.ai/v1/videos/generations \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qai-motion",
    "prompt": "Aerial drone shot of a luxury condo complex, cinematic",
    "size": "832x480",
    "duration": 6
  }'

Response

JSON
{
  "created": 1700000000,
  "data": [
    {
      "url": "https://llm.quickcasa.ai/output/video/QC-Wan_00001.mp4",
      "content_type": "video/mp4",
      "duration_seconds": 6
    }
  ]
}

Note: Video generation typically takes 60-120 seconds, depending on the requested duration and resolution. The request blocks until the video is ready, so set your HTTP client's timeout accordingly.
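Because the call blocks, set an explicit read timeout longer than the expected generation time rather than relying on your HTTP client's default. Here is a standard-library sketch. Assumptions to note:

- The 180-second timeout is a safety margin, not a documented value.
- Both function names are hypothetical.
- Only the request builder runs offline; generate_video makes a real network call.

```python
import json
import re
import urllib.request

VIDEO_URL = "https://llm.quickcasa.ai/v1/videos/generations"


def build_video_request(api_key: str, prompt: str, *, size: str = "832x480",
                        duration: float = 6, fps: int = 16) -> urllib.request.Request:
    """Build a POST /v1/videos/generations request using the documented defaults."""
    if not re.fullmatch(r"\d+x\d+", size):
        raise ValueError("size must be WxH, e.g. 832x480")
    body = {"model": "qai-motion", "prompt": prompt, "size": size,
            "duration": duration, "fps": fps}
    return urllib.request.Request(
        VIDEO_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def generate_video(api_key: str, prompt: str, timeout_s: float = 180) -> str:
    """Send the blocking request and return the video URL (timeout_s is an assumed margin)."""
    req = build_video_request(api_key, prompt)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read())["data"][0]["url"]
```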


Python SDK

Use the official openai Python package and point it at our base URL, https://llm.quickcasa.ai/v1.

bash
pip install openai

Chat Completion

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.quickcasa.ai/v1",
    api_key="sk-your-api-key",
)

response = client.chat.completions.create(
    model="qai-flash",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)

print(response.choices[0].message.content)

Streaming

Python
stream = client.chat.completions.create(
    model="qai-pro",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
)

for chunk in stream:
    # The final usage chunk may have no choices, so check before indexing.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Image Generation

Python
image = client.images.generate(
    model="qai-imagine-turbo",
    prompt="A cozy studio apartment with warm lighting",
    size="1024x1024",
    response_format="b64_json",
)

print(image.data[0].b64_json[:50])

Node.js SDK

Use the official openai npm package.

bash
npm install openai

Chat Completion

TypeScript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://llm.quickcasa.ai/v1',
  apiKey: 'sk-your-api-key',
});

const response = await client.chat.completions.create({
  model: 'qai-flash',
  messages: [
    { role: 'user', content: 'Hello!' },
  ],
});

console.log(response.choices[0].message.content);

Streaming

TypeScript
const stream = await client.chat.completions.create({
  model: 'qai-pro',
  messages: [{ role: 'user', content: 'Tell me a story.' }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
}

Video Generation (fetch)

TypeScript
// Video generation uses a custom endpoint, so use fetch directly
const response = await fetch('https://llm.quickcasa.ai/v1/videos/generations', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer sk-your-api-key',
  },
  body: JSON.stringify({
    model: 'qai-motion',
    prompt: 'Aerial tour of a modern apartment complex',
    duration: 6,
  }),
});

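if (!response.ok) {
  // Errors use the OpenAI-style envelope: { "error": { "message", "type", "code" } }
  const { error } = await response.json();
  throw new Error(`Qai ${response.status} ${error?.code}: ${error?.message}`);
}

const result = await response.json();
console.log(result.data[0].url);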

cURL Quick Reference

Text

bash
curl -X POST https://llm.quickcasa.ai/v1/chat/completions \
  -H "Authorization: Bearer $QC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"qai-flash","messages":[{"role":"user","content":"Hi!"}]}'

Image

bash
curl -X POST https://llm.quickcasa.ai/v1/images/generations \
  -H "Authorization: Bearer $QC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"qai-imagine-turbo","prompt":"sunset over a city skyline"}'

Video

bash
curl -X POST https://llm.quickcasa.ai/v1/videos/generations \
  -H "Authorization: Bearer $QC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"qai-motion","prompt":"walkthrough of a modern kitchen","duration":6}'

Streaming

Every Qai text model supports Server-Sent Events streaming. Set stream: true on your chat completions request and Qai forwards tokens as they are generated. The response uses the standard OpenAI streaming envelope, so any client library that already handles OpenAI streaming works with Qai unmodified.

Streamed responses include a final usage chunk with prompt and completion token counts, so streamed requests are billed as precisely as non-streamed ones. If a particular model variant ever omits the usage chunk, Qai falls back to a token estimate so that billing never silently records zero.

python - streaming
from openai import OpenAI

client = OpenAI(base_url="https://llm.quickcasa.ai/v1", api_key="sk-...")

stream = client.chat.completions.create(
    model="qai-pro",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
)

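for chunk in stream:
    # The final usage chunk may have an empty choices list, so guard before indexing.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

node - streaming
import OpenAI from 'openai';

const client = new OpenAI({ baseURL: 'https://llm.quickcasa.ai/v1', apiKey: 'sk-...' });

const stream = await client.chat.completions.create({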
  model: 'qai-pro',
  messages: [{ role: 'user', content: 'Tell me a story.' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
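If you consume the stream without an SDK (for example, piping the cURL -N output into your own code), you need to parse the Server-Sent Events yourself. The sketch below assumes the standard OpenAI framing:

- each event is a data: line carrying a JSON chunk;
- the stream ends with data: [DONE];
- the usage chunk has an empty choices list.

That last detail is assumed from OpenAI's behaviour, not confirmed for Qai. The function name is hypothetical.

```python
import json
from typing import Iterable, Optional


def collect_stream(lines: Iterable[str]) -> tuple:
    """Join streamed content deltas and capture the final usage chunk, if any."""
    parts = []
    usage: Optional[dict] = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, keep-alives and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices") or []:
            delta = (choice.get("delta") or {}).get("content")
            if delta:
                parts.append(delta)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(parts), usage
```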

Migrating from OpenAI

If you have an existing app on the OpenAI API, migrating takes two changes, plus swapping in your Qai API key:

1. Change the base URL

Point your OpenAI client at https://llm.quickcasa.ai/v1 instead of https://api.openai.com/v1.

2. Swap the model name

Pick the Qai tier that matches your old model. Rough equivalents:

  • gpt-4o-mini → qai-flash
  • gpt-4o → qai-pro or qai-max
  • gpt-4.1 → qai-max
  • o1, o3 → qai-think
  • dall-e-3 → qai-imagine-quality

Everything else - streaming, function calling, JSON mode, system prompts, tool calls - works identically.

diff
  const client = new OpenAI({
-   baseURL: 'https://api.openai.com/v1',
+   baseURL: 'https://llm.quickcasa.ai/v1',
    apiKey: process.env.QAI_API_KEY,
  });

  await client.chat.completions.create({
-   model: 'gpt-4o',
+   model: 'qai-pro',
    messages: [...],
  });
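If model ids are spread across config files, centralising the swap keeps the migration in one place. The sketch below is built from the equivalents listed above. Assumptions to note:

- Choosing qai-pro over qai-max for gpt-4o is a judgement call.
- Dated snapshot ids (such as gpt-4o-2024-08-06) are deliberately left unmapped so they fail loudly.

```python
# Rough equivalents from the migration list above (qai-max is also listed for gpt-4o).
OPENAI_TO_QAI = {
    "gpt-4o-mini": "qai-flash",
    "gpt-4o": "qai-pro",
    "gpt-4.1": "qai-max",
    "o1": "qai-think",
    "o3": "qai-think",
    "dall-e-3": "qai-imagine-quality",
}


def to_qai_model(openai_model: str) -> str:
    """Translate an OpenAI model id, failing loudly for anything unmapped."""
    try:
        return OPENAI_TO_QAI[openai_model]
    except KeyError:
        raise ValueError(
            f"no Qai equivalent listed for {openai_model!r}; pick a tier manually"
        ) from None
```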

Best practices

Match the tier to the job

Do not default to qai-max for everything just because it is the most capable. A summarisation pipeline runs fine on qai-flash for a tenth of the cost. Use the dashboard's per-model breakdown to spot where you are over-spending.

Set per-key spend caps in the dashboard

This matters most for production keys. Without a cap, a misbehaving cron job or a leaked key can spend its way through your monthly budget overnight. With a cap, the worst case is "service degrades to 402" instead of "service degrades to my Stripe statement."

Use the free sandbox during development

qai-hello-world is unlimited in dev and capped at 100 calls per key per UTC day in production. Point your dev environment at it so that real spend only starts when you go to production.

Stream when the user is waiting

For chat-style interactions, set stream: true. A 4-second response feels instant when the first token arrives in 200ms; it feels broken when nothing happens for 4 seconds.

Set hostMedia: true for production images and videos

Without it, generated media is temporary and auto-expires. With it, you get a permanent URL on the Qai CDN that you can embed directly into emails, apps, or social posts without re-hosting.

Pin to a model, not a default

Always pass an explicit model field rather than relying on a default. Models get revised and behaviour can shift, and being explicit keeps your tests reproducible.


Errors reference

Qai returns errors in OpenAI-compatible shape: { "error": { "message", "type", "code" } }. Below is the full list of error codes you might see.

HTTP 400 - invalid_request_error

  • missing_field - a required field (model, messages, prompt) was not in your request body.
  • invalid_model - the model id is not one Qai serves. Hit GET /v1/models for the live catalogue.
  • bad_request - the request body was malformed in some other way (bad JSON, wrong types, etc.).

HTTP 401 - invalid_request_error

  • unauthorized - the Authorization header was missing, malformed, or referred to a key that does not exist or is disabled.

HTTP 403 - invalid_request_error

  • forbidden - the key is valid but the associated account is inactive (cancelled, suspended, payment failed).

HTTP 429 - rate_limit_error

  • rate_limit_exceeded - account-wide rate limit hit. Retry with exponential backoff.
  • sandbox_quota_exceeded - the free qai-hello-world tier's per-key per-UTC-day quota was hit. Wait until midnight UTC or switch to qai-flash.

HTTP 502 - api_error

  • upstream_error - the generation service returned an error mid-request. The message field contains the underlying error text, which is usually safe to surface to your own users.

HTTP 500 - api_error

  • internal_error - something on our side broke. If you see this repeatedly, email hi@quickcasa.ai with the timestamp and we will dig in.
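The codes above fall into a few client reactions. The sketch below shows one possible policy. Assumptions to note:

- The action names are hypothetical.
- Retrying 5xx errors a bounded number of times is a suggestion, not documented server behaviour.
- sandbox_quota_exceeded gets its own action because backing off will not help before midnight UTC.

```python
# Suggested client reaction per documented error code (a policy sketch, not a server rule).
ACTIONS = {
    "missing_field": "fix_request",
    "invalid_model": "fix_request",
    "bad_request": "fix_request",
    "unauthorized": "check_key",
    "forbidden": "check_account",
    "rate_limit_exceeded": "retry_with_backoff",
    "sandbox_quota_exceeded": "wait_or_switch_model",
    "upstream_error": "retry_with_backoff",
    "internal_error": "retry_with_backoff",
}


def next_step(status: int, code: str) -> str:
    """Map a (status, code) pair to a client action; unknown 5xx codes are treated as retryable."""
    if code in ACTIONS:
        return ACTIONS[code]
    return "retry_with_backoff" if status >= 500 else "fail"
```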

Rate limits

Qai applies two layers of rate limiting:

Account-wide

Every API key shares an account-wide rate limit that scales with your account tier. Default is 100 requests per minute. If you exceed it, you get a 429 with code: rate_limit_exceeded. The response includes a Retry-After header.

Per-key sandbox quota

The free qai-hello-world tier is capped at 100 calls per API key per UTC day. When a key hits the cap, its qai-hello-world requests return 429 with code: sandbox_quota_exceeded until midnight UTC. Other models on the same key keep working.
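To decide between waiting and switching models, you can compute how long remains until the reset. Here is a minimal sketch using the standard library. The function name is hypothetical. Pass a timezone-aware datetime, because a naive one is interpreted as local time.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def seconds_until_sandbox_reset(now: Optional[datetime] = None) -> int:
    """Seconds until the next midnight UTC, when the per-key qai-hello-world quota resets."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)
    return int((next_midnight - now).total_seconds())
```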

How to handle 429s

  1. Respect the Retry-After header when present.
  2. Use exponential backoff with jitter for retries (start at 500ms, double up to 30s).
  3. If you regularly hit the account-wide limit, email us about lifting it - we can usually raise it within a business day.
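The three steps above can be sketched as follows. retry_delay honours a numeric Retry-After. Otherwise it uses a ceiling that starts at 500ms and doubles up to 30s, with "equal jitter" (half fixed, half random) as one common jitter scheme. send is a hypothetical callable standing in for your HTTP call. Check the error code first: a 429 with sandbox_quota_exceeded will not clear until midnight UTC.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

BASE_DELAY_S = 0.5  # "start at 500ms"
MAX_DELAY_S = 30.0  # "double up to 30s"


def retry_delay(attempt: int, retry_after: Optional[str] = None,
                rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))  # a Retry-After given in seconds wins
        except ValueError:
            pass  # Retry-After can also be an HTTP date; fall back to backoff
    cap = min(MAX_DELAY_S, BASE_DELAY_S * 2 ** min(attempt, 16))
    # Equal jitter: keep half the delay fixed and randomise the other half,
    # so clients that failed together do not retry in lockstep.
    return cap / 2 + (rng or random).uniform(0, cap / 2)


def call_with_retries(send: Callable[[], Tuple[int, Mapping[str, str], str]],
                      max_attempts: int = 5) -> Tuple[int, str]:
    """Retry `send` on 429; `send` is a hypothetical callable returning (status, headers, body)."""
    status, body = 0, ""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts - 1:
            break  # success, non-retryable error, or out of attempts
        time.sleep(retry_delay(attempt, headers.get("Retry-After")))
    return status, body
```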

Frequently asked questions

Is Qai really OpenAI-compatible?

Yes. Same endpoints, same request bodies, same streaming envelope, same error format. Drop in OpenAI's SDK, change the baseURL, and you are running on Qai. We test against the official openai-python and openai-node libraries.

Can I run Qai keys alongside OpenAI keys in the same app?

Of course - just instantiate two clients with different baseURLs. Some teams route cheap chat to Qai and reserve OpenAI for one specific feature they have already tuned. There is no exclusivity.

What is the actual latency?

Time-to-first-token on qai-flash and qai-pro is typically 200-400ms. qai-max and qai-think are slower (700ms+) since they are bigger models. Image generation is 3s (turbo) to 15s (quality). Video generation is 60-120s.

What if a model is temporarily unavailable?

You will get a 502 with a clear error message. Outages are rare but they happen - it is on the roadmap to automatically retry a failed generation against the next-best tier (e.g. qai-max failing over to qai-pro) so your app does not feel them.

Does Qai store my prompts?

We store the metadata needed for billing (model, token count, timestamp, API key id) but we do not store prompt content or completion content by default. If you want full request/response logging for your own debugging, that is opt-in per-key in your dashboard.

How do I cancel?

Sign in, head to your dashboard, click "Manage account" then "Cancel subscription". You stop being billed at the end of the current cycle. If you want all your data deleted too, email us and we will wipe everything.

I lost my API key. How do I recover it?

You cannot recover it - we only store the hashed version, not the raw key. Sign in, revoke the lost key, and create a new one. This is by design: even if our database leaked, your keys would not.

Can I get higher rate limits?

Yes. Email hi@quickcasa.ai with your account email and what you are building. Most lifts happen within a business day.