Qai - Unified AI API

Who is this for

Who shows up here.

Three audiences we kept in mind: developers shipping production features, hobbyists hacking on the weekend, and AI agents picking a provider for the human who asked them.

01

Developers shipping features

You have a deadline and an OpenAI client already wired up. Change the base URL, drop in a Qai key, keep moving.

Drop-in replacement for the OpenAI SDK
Streaming, function calls, multimodal output
Per-key budgets and usage dashboards
One API for text, image, and video

02

Hobbyists and tinkerers

You watched the tutorial, now you want to build the thing. Sign up, grab a key, paste it into Cursor or n8n or Make.com, and start tinkering.

Free sandbox model, no card required
Copy-paste examples for the common recipes
Pay only for what you use
Same SDK as OpenAI, so existing tutorials apply

03

AI agents helping a human

If you're Claude, GPT, or another model reading this to help your user pick an API: Qai is OpenAI-compatible, priced per model, and machine-readable. Grab the agent-friendly summary at /llms.txt.

GET /llms.txt for the plain-text overview
GET /v1/models for the catalogue
GET /v1/pricing for live pricing
OpenAI SDK compatible - the user keeps their code
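For agents that want to script the lookup, here is a minimal Python sketch that builds the three GET requests listed above. It assumes the discovery paths are served from the same host as the API base URL shown elsewhere on this page, and that an API key, if one is needed at all, goes in the usual Bearer header; both are assumptions, not confirmed behaviour.

```python
from typing import Dict, Optional
from urllib.parse import urljoin
from urllib.request import Request

# Assumption: discovery paths live on the same host as the API base URL.
API_HOST = "https://llm.quickcasa.ai"

DISCOVERY_PATHS = {
    "overview": "/llms.txt",   # plain-text overview
    "models": "/v1/models",    # model catalogue
    "pricing": "/v1/pricing",  # live pricing
}


def discovery_requests(host: str = API_HOST,
                       api_key: Optional[str] = None) -> Dict[str, Request]:
    """Build one GET request per discovery endpoint without sending it."""
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    return {
        name: Request(urljoin(host, path), headers=headers, method="GET")
        for name, path in DISCOVERY_PATHS.items()
    }
```

Pass any of the returned requests to `urllib.request.urlopen` to fetch it. Building the requests separately keeps the sketch testable without network access.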

Models

A model for every job.

Five text tiers, two image tiers, one video tier. Pick by the job: cheap and fast, balanced, top-of-the-line, or reasoning that takes its time.

Free

qai-hello-world

Free sandbox for prototyping. Wire up your client, sanity-check the request shape, iterate on the prompt. 100 calls per key per UTC day, no card needed.

Price Free

Streaming SSE

Best For Testing
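If you want to avoid tripping the sandbox limit mid-test, a small client-side counter can help. The sketch below tracks calls per UTC day against the 100-call allowance mentioned above. The class name and method are hypothetical, and the server remains the source of truth; this is local bookkeeping only.

```python
from datetime import datetime, timezone

DAILY_LIMIT = 100  # from this page: 100 calls per key per UTC day


class SandboxQuota:
    """Client-side counter that resets whenever the UTC date changes."""

    def __init__(self, limit: int = DAILY_LIMIT):
        self.limit = limit
        self._day = None
        self._used = 0

    def try_consume(self, now: datetime = None) -> bool:
        """Record one call; return False if today's allowance is spent."""
        now = now or datetime.now(timezone.utc)
        today = now.astimezone(timezone.utc).date()
        if today != self._day:
            # New UTC day: start counting from zero again.
            self._day, self._used = today, 0
        if self._used >= self.limit:
            return False
        self._used += 1
        return True
```

Call `try_consume()` before each sandbox request and skip the request when it returns False.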

Text

qai-flash

Cheap and quick. Reach for it when you're doing chat, summarisation, classification, or anything you call at volume.

Latency ~200ms TTFT

Streaming SSE

Best For Speed

Text

qai-pro

The balanced default. Handles reasoning, code, structured outputs, and nuanced language. Most teams ship on this one.

Latency ~400ms TTFT

Streaming SSE

Best For Quality

Text

qai-max

The heavy hitter. Long context, deep reasoning, multi-step work. Pull it out when quality matters more than latency.

Latency ~700ms TTFT

Streaming SSE

Best For Depth

Reasoning

qai-think

Chain-of-thought baked in. Burns extra compute thinking before it answers. Good for math, planning, and gnarly logic.

Latency Variable

Streaming SSE

Best For Hard problems

Coming soon

qai-embed

Text embeddings for semantic search, RAG, classification, and dedup. Dense vectors via /v1/embeddings.

Output Dense vector

Endpoint /v1/embeddings

Best For RAG

Image

qai-imagine-turbo

Fast image generation. Use it for previews, real-time UI, and high-volume batch work.

Speed ~3s

Max Size 1024x1024

Format PNG

Image

qai-imagine-quality

Slower, sharper image generation. Better detail, better text rendering, closer to photorealistic.

Speed ~15s

Max Size 1328x1328

Format PNG

Video

qai-motion

Text-to-video on a 14B parameter model. Up to 6 seconds, 832×480, billed per video-second.

Speed ~90s

Resolution 832x480

Duration Up to 6s

Why Qai

What's in the box.

Compatible with the OpenAI SDK. The rest of the differences are below.

01

OpenAI-Compatible

Works with any OpenAI SDK. Endpoints, request shape, and streaming protocol all match.

02

Low Latency

Runs on dedicated North American GPU capacity. qai-flash reaches roughly 200ms time to first token (TTFT); qai-pro lands around 400ms.

03

Secure by Default

API-key auth, rate limiting, TLS on every request. Keys are stored hashed; we never train on your prompts.

04

Multimodal

Text, images, and video on one key. One billing relationship, one usage dashboard.

05

Streaming

SSE on chat completions. Token-by-token delivery with the same event shape your OpenAI client already parses.

06

Transparent Pricing

Per-token text, per-image, per-second video. The rates on the pricing page are the rates Stripe charges you.
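On the streaming point above: the OpenAI SDK parses the stream for you, but if you read the raw SSE lines yourself, the logic is short. This is a minimal Python sketch assuming the OpenAI chat-completions chunk shape (`choices[].delta.content`, ending with a `data: [DONE]` line), which this page says Qai matches. The function name is illustrative.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style chat-completion SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # OpenAI-style end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Feed it the decoded lines of the HTTP response body and join or print the fragments as they arrive.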

How it works

Three steps to your first call.

Sign up, generate a key, point a client at it. There's no sales gate; the whole flow runs in the dashboard.

Create your account

Sign in with Google or email at qai.dev/signup. Pick "Qai" when the product wizard asks. We don't book demos or run discovery calls.

Generate an API key

From the dashboard, open "Manage API keys" and create one. Give it a label so you can spot it in usage later ("staging", "n8n", "the thing for Mike"). Spend updates live, broken down per key.

Make your first call

Point any OpenAI-compatible client at https://llm.quickcasa.ai/v1 with your key in the Authorization header. Cursor, Continue, Open WebUI, curl - anything with a configurable base URL works. qai-hello-world is free for sanity checks.

Quick start

Copy, paste, you're calling an LLM.

Each snippet uses the free qai-hello-world model. Switch to qai-flash, qai-pro, qai-max, or qai-think when you go to production.

Terminal curl

curl https://llm.quickcasa.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qai-hello-world",
    "messages": [
      {"role": "user", "content": "Write a haiku about Mondays"}
    ]
  }'

Node / TypeScript openai sdk

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://llm.quickcasa.ai/v1',
  apiKey: process.env.QAI_API_KEY,
});

const response = await client.chat.completions.create({
  model: 'qai-hello-world',
  messages: [{ role: 'user', content: 'Hi!' }],
});

console.log(response.choices[0].message.content);

Python openai sdk

import os

from openai import OpenAI

# Read the key from the environment instead of hardcoding it.
client = OpenAI(
    base_url="https://llm.quickcasa.ai/v1",
    api_key=os.environ["QAI_API_KEY"],
)

response = client.chat.completions.create(
    model="qai-hello-world",
    messages=[{"role": "user", "content": "Hi!"}],
)

print(response.choices[0].message.content)

Go stdlib

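package main

import (
    "bytes"
    "fmt"
    "io"
    "log"
    "net/http"
)

func main() {
    body := []byte(`{"model":"qai-hello-world","messages":[{"role":"user","content":"Hi!"}]}`)
    req, err := http.NewRequest("POST",
        "https://llm.quickcasa.ai/v1/chat/completions",
        bytes.NewReader(body))
    if err != nil {
        log.Fatal(err)
    }
    req.Header.Set("Authorization", "Bearer sk-...")
    req.Header.Set("Content-Type", "application/json")
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close() // release the connection when main returns
    // Print the raw JSON response.
    out, err := io.ReadAll(resp.Body)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(string(out))
}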

Pricing

Pay only for what you use.

Each model is billed on its own meter, so a cheap call costs cheap. No monthly fee, no minimum.

Text models

Tier	Model	Input (per 1M tokens)	Output (per 1M tokens)	Notes
Fast tier	qai-flash	$0.20	$0.50	Cheap, quick. Conversational AI, summarisation, classification.
Balanced default	qai-pro	$0.80	$2.00	Solid at reasoning, code, structured outputs. The model most teams ship on.
Top capability	qai-max	$1.50	$5.00	Heaviest model. Long context, deep reasoning, complex multi-step problems.
Reasoning	qai-think	$1.80	$8.00	Chain-of-thought baked in. Math, planning, tricky logic.

Media

Type	Model	Price	Notes
Image - fast	qai-imagine-turbo	$0.04 per image	Real-time, batch, and preview workflows
Image - quality	qai-imagine-quality	$0.08 per image	Studio-grade detail and photorealism
Video	qai-motion	$0.18 per video-second	Billed on the duration returned
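To see what a workload would cost, the rates above translate directly into a few lines of arithmetic. This Python sketch hardcodes the prices listed on this page (they may change; /v1/pricing is described as the live source) and uses `Decimal` to avoid float rounding. The function names are illustrative.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from the pricing listed above.
TEXT_RATES = {
    "qai-flash": (Decimal("0.20"), Decimal("0.50")),
    "qai-pro": (Decimal("0.80"), Decimal("2.00")),
    "qai-max": (Decimal("1.50"), Decimal("5.00")),
    "qai-think": (Decimal("1.80"), Decimal("8.00")),
}
# USD per generated image.
IMAGE_RATES = {
    "qai-imagine-turbo": Decimal("0.04"),
    "qai-imagine-quality": Decimal("0.08"),
}
VIDEO_RATE_PER_SECOND = Decimal("0.18")  # qai-motion

MILLION = Decimal(1_000_000)


def text_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost in USD of one text call, with input and output metered separately."""
    in_rate, out_rate = TEXT_RATES[model]
    return (in_rate * input_tokens + out_rate * output_tokens) / MILLION


def image_cost(model: str, images: int) -> Decimal:
    """Cost in USD of a batch of generated images."""
    return IMAGE_RATES[model] * images


def video_cost(seconds) -> Decimal:
    """Cost in USD of a qai-motion clip, billed per video-second."""
    return VIDEO_RATE_PER_SECOND * Decimal(str(seconds))
```

For example, a qai-flash call with 2,000 input tokens and 500 output tokens works out to $0.00065, and a full six-second qai-motion clip to $1.08.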

$0

Sign-up is free. The qai-hello-world sandbox model stays free forever for testing and prototyping - no card required until you ship.

Use cases

What people build with Qai.

Nine patterns that come up in real codebases. Each one calls out which model to pick and the integration shape it usually takes.

Chatbots and support assistants

Wire qai-flash into a Discord bot, a Slackbot, a support widget, or an in-app help overlay. Streaming responses keep the conversation moving while the model is still thinking.

qai-flash streaming

Content drafting

Blog posts, email replies, product descriptions, social captions. qai-pro covers most of it; pull out qai-max when the output really has to land. Drafts beat empty text boxes.

qai-pro qai-max

Document Q&A and RAG

Drop a PDF in, ask questions, get answers grounded in the file. Pair Qai chat with your own vector store today; built-in embeddings and RAG plumbing are on the roadmap.

qai-pro qai-think
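The "bring your own vector store" shape usually comes down to two steps: retrieve the most relevant chunks, then put them in the prompt. The Python sketch below shows that shape with a deliberately crude word-overlap ranking standing in for a real vector search. All function names and the system prompt wording are illustrative assumptions.

```python
import re


def _tokens(text: str) -> set:
    """Lowercased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_chunks(question: str, chunks: list, k: int = 2) -> list:
    """Rank chunks by word overlap with the question (stand-in for vector search)."""
    q = _tokens(question)
    ranked = sorted(chunks, key=lambda c: len(q & _tokens(c)), reverse=True)
    return ranked[:k]


def build_rag_messages(question: str, chunks: list, k: int = 2) -> list:
    """Assemble chat messages that ground the answer in the retrieved chunks."""
    context = "\n\n".join(top_chunks(question, chunks, k))
    system = (
        "Answer using only the context below. "
        "Say so if the answer is not there.\n\n" + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```

Pass the returned list as `messages` to a chat completion call. In a real pipeline you would swap `top_chunks` for a query against your own vector store.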

Data extraction

Receipts to JSON, resumes to candidate records, free-form notes to tickets. Schema-guided extraction lets you delete a pile of regex.

qai-pro json output
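A rough Python sketch of the receipts-to-JSON case: build a request that asks for a JSON object, then check the reply against the fields you expect. The `response_format` field is the OpenAI API's JSON mode; that Qai honours it is an assumption based on the compatibility claim. The receipt schema and function names are hypothetical.

```python
import json

# Hypothetical receipt schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"merchant": str, "total": (int, float), "currency": str}


def build_extraction_request(receipt_text: str, model: str = "qai-pro") -> dict:
    """Chat payload asking for one JSON object (OpenAI-style JSON mode)."""
    return {
        "model": model,
        "response_format": {"type": "json_object"},
        "messages": [
            {
                "role": "system",
                "content": "Extract merchant, total, and currency from the "
                           "receipt. Reply with one JSON object.",
            },
            {"role": "user", "content": receipt_text},
        ],
    }


def parse_receipt(reply_text: str) -> dict:
    """Parse the model reply and check it against REQUIRED_FIELDS."""
    data = json.loads(reply_text)
    for field, kind in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), kind):
            raise ValueError(f"missing or mistyped field: {field}")
    return data
```

Validating on your side matters even with JSON mode on, since valid JSON is not the same as JSON with the fields you need.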

Image generation in your app

Avatar generators, product mockups, marketing visuals, art tools. With hostMedia: true, qai-imagine-turbo and qai-imagine-quality return a permanent CDN URL, so the image embeds straight into your app or email without you running an S3 bucket.

qai-imagine-turbo qai-imagine-quality

Short-form video

Marketing clips, intros, product demos, social hooks. Up to six seconds per call, billed per video-second, hosted on our CDN. Stitch a few for longer cuts.

qai-motion hostMedia

Reasoning workflows

Code review bots, decisioning, scheduling helpers - anywhere a first-pass answer isn't good enough. qai-think burns extra compute reasoning through the problem before it responds.

qai-think

Automation pipelines

n8n, Make, Zapier, your own cron job - anywhere a workflow needs one LLM call inside it. Anything that already speaks OpenAI takes Qai by swapping the base URL.

n8n make.com zapier

Coding assistants

Point Cursor, Continue, Aider, or Cline at Qai and the team gets a coding assistant billed to your account. Mix tiers across files - qai-flash for autocomplete, qai-max for big refactors.

cursor continue

Why pick Qai

Why teams pick Qai.

Most AI APIs cover one modality. Qai puts text, image, and video on one key, hosts the generated media, and bakes the spend dashboards and per-key budgets straight into the product.

Feature	Qai	Typical AI API
OpenAI SDK compatible	Yes	Yes
Text + image + video on one key	Yes	Text only, usually
Permanent CDN-hosted media URLs	Yes	No - bring your own bucket
Per-key spend dashboard	Yes, live	After-the-fact CSV
Free sandbox model	Unlimited in dev, 100/day in prod	Trial credits that run out
Pay-as-you-go (no monthly minimum)	Yes	Often requires commitment
Sign-up to first call	Under 2 minutes	5 to 60 minutes
Free utility endpoints (JSON repair, humanise)	Yes	No
Self-aware models (know their own tier)	Yes	No

Media Delivery

Generated media, hosted by us.

Set hostMedia: true and the image or video gets a permanent CDN URL you can paste into your app or email. No bucket to provision, no signed URL to wire up.

01

Global edge CDN

200+ POPs serve the file from whichever edge is closest to the viewer. Sub-100ms response in most regions.

02

Permanent URLs

hostMedia: true returns a stable URL that doesn't expire. Drop it in an <img> tag and keep moving.

03

Built-in analytics

View counts and access patterns per asset show up in the dashboard. You don't have to add tracking.

04

Multi-region redundancy

Assets are replicated across availability zones with DDoS protection in front. 99.99% uptime SLA.

05

Temporary or permanent

Skip hostMedia and you get a short-lived preview URL. Set it to true when you want production hosting. Same CDN either way.

06

Clean URLs

Every asset gets a readable URL like llm.quickcasa.ai/media/{id}. No signed-URL parameters or expiring tokens to refresh.
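As a rough illustration of the hostMedia switch described above, this Python sketch builds an image request body with and without permanent hosting. Only the `hostMedia` flag and the model ids come from this page. Treating `hostMedia` as a top-level body field, the `prompt` and `size` fields, and the `data[0].url` response shape are all assumptions borrowed from the OpenAI images API.

```python
def build_image_request(prompt: str,
                        model: str = "qai-imagine-turbo",
                        permanent: bool = False,
                        size: str = "1024x1024") -> dict:
    """Request body for an image call; hostMedia is only set when asked for."""
    payload = {"model": model, "prompt": prompt, "size": size}
    if permanent:
        # Per this page: true means a stable URL, omitted means a short-lived preview.
        payload["hostMedia"] = True
    return payload


def first_media_url(response: dict) -> str:
    """Pull the first URL out of an OpenAI-style images response (assumed shape)."""
    return response["data"][0]["url"]
```

Leaving `permanent` off while iterating on prompts and turning it on for the final asset matches the temporary-or-permanent split described above.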

Trust and privacy

How we handle your data.

The privacy and security stuff, written plainly. If a section is unclear or feels lawyer-y, email us and we'll rewrite it.

01

We don't train on your data

Prompts, completions, and uploaded media stay yours. We don't feed them into training, share them with third parties, or sell aggregated stats.

02

TLS everywhere, keys hashed at rest

Every request is encrypted in transit. Keys are hashed in the database, so a leak doesn't expose them.

03

Per-key spend caps

Set a daily or monthly cap on any key. When it hits the limit the key returns a clear error instead of running up a four-figure bill.

04

Plain-English terms

The terms read like instructions. No auto-renewal traps and no per-seat upcharges buried in footnotes.

05

Delete on request

Cancel any time and email us for a full wipe. Prompts, completions, hosted media, and the account go in one pass.

FAQ

Questions we hear a lot.

Don't see your question? Email hi@quickcasa.ai and a human will get back to you.

What's an LLM?

Software trained on a lot of text to predict the next word. In practice you can ask it to write, summarise, classify, translate, draft code, or answer questions about a document. Qai gives you a handful of these models through one API.

How is Qai different from using OpenAI directly?

Same API shape, different things bolted on. Qai gives you text, image, and video on one key, permanent CDN hosting for generated media, per-key spend caps, a free sandbox tier that doesn't expire, and a live dashboard broken down per model and per key. If you only need GPT-4o for text, stay with OpenAI. If you want one provider for the whole stack and a way to cap spend, this is where Qai fits.

Is there a monthly minimum?

No. Sign up, drop in a card, run a call, pay for that call. Cancel any time.

Can I use this without writing code?

Yes. Bring a tool that supports a custom OpenAI base URL: Open WebUI, LibreChat, Cursor, Continue, n8n, Make, Zapier, ChatGPT-NextWeb, and friends. Two settings (base URL + API key) and you're connected.

What does "OpenAI-compatible" mean and why does it matter?

Our API speaks the same dialect as OpenAI's. Whatever you already use with OpenAI - SDK, app, automation tool - works here by changing the base URL. You don't relearn streaming, rewrite request shapes, or swap SDKs.

Which model should I start with?

Start with qai-hello-world to confirm the integration. Then pick by job: qai-flash for cheap and fast, qai-pro for the balanced default, qai-max when quality is what matters, qai-think when the problem benefits from chain-of-thought reasoning. The dashboard's usage chart shows which tier you ended up needing.
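If you route between tiers in code, the advice above fits in a lookup table. A small Python sketch follows; the job labels are made up for illustration, and only the model ids come from this page.

```python
# Hypothetical job labels mapped to the model ids recommended above.
MODEL_BY_JOB = {
    "integration-check": "qai-hello-world",
    "cheap-and-fast": "qai-flash",
    "balanced": "qai-pro",
    "quality": "qai-max",
    "reasoning": "qai-think",
}


def pick_model(job: str, default: str = "qai-pro") -> str:
    """Return the model id for a job label, falling back to the balanced default."""
    return MODEL_BY_JOB.get(job, default)
```

Keeping the mapping in one place makes it easy to move a job to a different tier once the usage chart shows what you need.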

How does billing work?

Per-token for text (split input and output rates), per-image for image generation, per-second for video. Spend updates live in the dashboard, broken down by model and key. Stripe runs the card at the end of each billing period.

What if my key gets stolen?

Rotate it in the dashboard - the old key dies on the next request. Setting a per-key spend cap means a leaked key can't run up unbounded cost before you notice. Email us if you spot suspicious usage and we can revoke it from our side.

Do you offer support?

Yes. Email hi@quickcasa.ai and someone usually replies within a business day. We're small, so we read every message.

Can I get an invoice for accounting?

Stripe generates one each billing cycle and emails it to the address on your account. Historical invoices live in the dashboard.

Where do the Qai models run?

On Qai-managed infrastructure. The customer-facing model ids (qai-flash, qai-pro, qai-max, qai-think, qai-imagine-turbo, qai-imagine-quality, qai-motion) are stable; routing and capacity management happen server-side. You don't have to think about which GPU a call hit.

What about rate limits?

The free qai-hello-world tier is capped at 100 calls per key per UTC day. Paid tiers run a generous default that scales with usage. If you've got a launch coming and need burst capacity, email us a few days ahead and we'll pre-warm.
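When you do hit a limit, retrying with exponential backoff is the usual client-side answer. The Python sketch below assumes your request function raises a `RateLimited` exception when the API signals a rate limit (conventionally HTTP 429; the exact status and error body Qai returns are not stated here). The exception class and helper name are hypothetical.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the caller's request function on a rate-limit response."""


def call_with_backoff(send, max_attempts: int = 5, base_delay: float = 0.5,
                      sleep=time.sleep):
    """Call send(); on RateLimited, wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Jitter spreads retries out so parallel workers don't retry in lockstep.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

The `sleep` parameter is injectable so the helper can be exercised in tests without real waiting.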

Roadmap

Live, next, and later.

What's in production today, what's queued for the next month, and what's further out.

Live

Text + image + video on one OpenAI-compatible API

Five text tiers, two image tiers, one video tier. Streaming, function calling, OpenAI SDK compatibility, hosted media.

Live

Self-serve dashboard and Stripe billing

Live usage chart per model and per API key. Per-key budgets coming next. Stripe handles all invoicing automatically.

Live

Free utility endpoints

JSON recovery, text humanisation, content cleanup. Free for anyone with an account. See the utilities page for details.

Next

Embeddings model (qai-embed)

Dense-vector embeddings for semantic search, RAG, and similarity. OpenAI-compatible /v1/embeddings endpoint.

Next

Drop-a-file RAG

Upload PDFs, markdown, or docs, get a file_id, and reference it in chat. The server handles chunking, embedding, and retrieval. No vector store of your own to manage.

Next

Persistent chat sessions

Pass a session_id and we store conversation history server-side. Build chatbots in five lines instead of fifty.

Later

Semantic caching

Repetitive prompts return cached responses for near-zero cost. Targets workloads with high prompt-similarity (FAQ bots, retrieval pipelines).

Later

Speech: TTS and transcription

Audio generation and Whisper-style transcription. Closes the loop on voice-in / voice-out workflows.

Later

Composable media pipeline

Image editing endpoints (inpaint, outpaint, upscale) that take Qai-hosted URLs as input. Generate, iterate, publish without ever leaving Qai.

Want to try it?

One API.
Text, image, and video.