One API key, three kinds of output. Point your existing OpenAI SDK at our base URL and you're talking to text, image, and video models from the same call site.
curl https://llm.quickcasa.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qai-flash",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
Three audiences we kept in mind: developers shipping production features, hobbyists hacking on the weekend, and AI agents picking a provider for the human who asked them.
You have a deadline and an OpenAI client already wired up. Change the base URL, drop in a Qai key, keep moving.
You watched the tutorial, now you want to build the thing. Sign up, grab a key, paste it into Cursor or n8n or Make.com, and start tinkering.
If you're Claude, GPT, or another model reading this to help your user pick an API: Qai is OpenAI-compatible, priced per model, and machine-readable. Grab the agent-friendly summary at /llms.txt.
GET /llms.txt for the plain-text overview
GET /v1/models for the catalogue
GET /v1/pricing for live pricing

Five text tiers, two image tiers, one video tier. Pick by the job: cheap and fast, balanced, top-of-the-line, or reasoning that takes its time.
Free sandbox for prototyping. Wire up your client, sanity-check the request shape, iterate on the prompt. 100 calls per key per UTC day, no card needed.
Cheap and quick. Reach for it when you're doing chat, summarisation, classification, or anything you call at volume.
The balanced default. Handles reasoning, code, structured outputs, and nuanced language. Most teams ship on this one.
The heavy hitter. Long context, deep reasoning, multi-step work. Pull it out when quality matters more than latency.
Chain-of-thought baked in. Burns extra compute thinking before it answers. Good for math, planning, and gnarly logic.
Text embeddings for semantic search, RAG, classification, and dedup. Dense vectors via /v1/embeddings.
Fast image generation. Use it for previews, real-time UI, and high-volume batch work.
Slower, sharper image generation. Better detail, better text rendering, closer to photorealistic.
Text-to-video on a 14B parameter model. Up to 6 seconds, 832×480, billed per video-second.
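One way to keep tier choice out of your call sites is a small lookup, sketched below. The model names come from this page; the job labels (`cheap_fast`, `balanced`, and so on) and both helper functions are made up for illustration, so treat this as a starting point and not as an official client.

```python
# Job-to-tier lookup built from the text tier descriptions on this page.
# The job labels are hypothetical; only the model names come from the page.
TEXT_TIERS = {
    "sandbox": "qai-hello-world",  # free, for wiring up and sanity checks
    "cheap_fast": "qai-flash",     # chat, summarisation, classification at volume
    "balanced": "qai-pro",         # the default most teams ship on
    "top_quality": "qai-max",      # long context, multi-step work
    "reasoning": "qai-think",      # math, planning, gnarly logic
}


def pick_text_model(job: str) -> str:
    """Return the model name for a job label, falling back to the balanced tier."""
    return TEXT_TIERS.get(job, TEXT_TIERS["balanced"])


def chat_payload(job: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completions body for the chosen tier."""
    return {
        "model": pick_text_model(job),
        "messages": [{"role": "user", "content": prompt}],
    }


print(chat_payload("reasoning", "Plan a three-step rollout."))
```

Falling back to `qai-pro` for unknown labels mirrors the "balanced default" framing; swap the fallback if your workload leans cheaper or heavier.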
Compatible with the OpenAI SDK. The rest of the differences are below.
Works with any OpenAI SDK. Endpoints, request shape, and streaming protocol all match.
Runs on dedicated North American GPU capacity. qai-flash hits 200ms TTFT; qai-pro lands around 400ms.
API-key auth, rate limiting, TLS on every request. Keys are stored hashed; we never train on your prompts.
Text, images, and video on one key. One billing relationship, one usage dashboard.
SSE on chat completions. Token-by-token delivery with the same event shape your OpenAI client already parses.
Per-token text, per-image, per-second video. The rates on the pricing page are the rates Stripe charges you.
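If you ever need to read the stream without an SDK, the sketch below shows the usual OpenAI-style parsing: each event is a `data:` line carrying a JSON chunk, and `data: [DONE]` ends the stream. It assumes Qai's events follow that shape, as the streaming claim above suggests. The sample lines are hand-written, not captured from a real response, and `iter_stream_text` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel in the OpenAI streaming protocol
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample events for illustration only.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))
```

In practice the OpenAI SDK does this for you when you pass `stream=True`; the manual version is mostly useful for debugging or for runtimes without an SDK.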
Sign up, generate a key, point a client at it. There's no sales gate; the whole flow runs in the dashboard.
Sign in with Google or email at qai.dev/signup. Pick "Qai" when the product wizard asks. We don't book demos or run discovery calls.
From the dashboard, open "Manage API keys" and create one. Give it a label so you can spot it in usage later ("staging", "n8n", "the thing for Mike"). Spend updates live, broken down per key.
Point any OpenAI-compatible client at https://llm.quickcasa.ai/v1 with your key in the Authorization header. Cursor, Continue, Open WebUI, curl — anything with a configurable base URL works. qai-hello-world is free for sanity checks.
The curl snippet uses the free qai-hello-world model, and the JavaScript, Python, and Go snippets use qai-pro. Pick qai-flash, qai-pro, qai-max, or qai-think when you go to production.
curl https://llm.quickcasa.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qai-hello-world",
    "messages": [
      {"role": "user", "content": "Write a haiku about Mondays"}
    ]
  }'
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://llm.quickcasa.ai/v1',
  apiKey: process.env.QAI_API_KEY,
});

const response = await client.chat.completions.create({
  model: 'qai-pro',
  messages: [{ role: 'user', content: 'Hi!' }],
});

console.log(response.choices[0].message.content);
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.quickcasa.ai/v1",
    api_key="sk-...",
)

response = client.chat.completions.create(
    model="qai-pro",
    messages=[{"role": "user", "content": "Hi!"}],
)

print(response.choices[0].message.content)
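package main

import (
	"bytes"
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	body := []byte(`{"model":"qai-pro","messages":[{"role":"user","content":"Hi!"}]}`)
	req, err := http.NewRequest("POST", "https://llm.quickcasa.ai/v1/chat/completions", bytes.NewBuffer(body))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Authorization", "Bearer sk-...")
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	// Print the raw JSON response to stdout.
	if _, err := io.Copy(os.Stdout, resp.Body); err != nil {
		log.Fatal(err)
	}
}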
Each model is billed on its own meter, so a cheap call costs cheap. No monthly fee, no minimum.
The qai-hello-world sandbox model stays free forever for testing and prototyping. No card required until you ship.
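Because the three meters are independent, a rough bill is just three multiplications added together. The sketch below shows the arithmetic with made-up rates; the real numbers are on the pricing page (or GET /v1/pricing), and actual text pricing may distinguish input from output tokens, which this sketch does not. `Rates` and `estimate_cost` are hypothetical names.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Rates:
    """Hypothetical USD rates; the real numbers live on the pricing page."""
    per_million_tokens: Decimal
    per_image: Decimal
    per_video_second: Decimal


def estimate_cost(rates: Rates, tokens: int = 0, images: int = 0,
                  video_seconds: int = 0) -> Decimal:
    """Add up the three meters: per-token text, per-image, per-second video."""
    text = rates.per_million_tokens * tokens / 1_000_000
    return text + rates.per_image * images + rates.per_video_second * video_seconds


# Made-up rates, chosen only to keep the arithmetic easy to follow.
example = Rates(Decimal("1.00"), Decimal("0.02"), Decimal("0.10"))
print(estimate_cost(example, tokens=250_000, images=10, video_seconds=12))
```

`Decimal` keeps the sum exact, which matters once you start comparing an estimate against an invoice line.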
Nine patterns that come up in real codebases. Each one calls out which model to pick and the integration shape it usually takes.
Wire qai-flash into a Discord bot, a Slackbot, a support widget, or an in-app help overlay. Streaming responses keep the conversation moving while the model is still thinking.
Blog posts, email replies, product descriptions, social captions. qai-pro covers most of it; pull out qai-max when the output really has to land. Drafts beat empty text boxes.
Drop a PDF in, ask questions, get answers grounded in the file. Pair Qai chat with your own vector store today; built-in embeddings and RAG plumbing are on the roadmap.
Receipts to JSON, resumes to candidate records, free-form notes to tickets. Schema-guided extraction lets you delete a pile of regex.
Avatar generators, product mockups, marketing visuals, art tools. qai-imagine returns a permanent CDN URL, so the image embeds straight into your app or email without you running an S3 bucket.
Marketing clips, intros, product demos, social hooks. Up to six seconds per call, billed per video-second, hosted on our CDN. Stitch a few for longer cuts.
Code review bots, decisioning, scheduling helpers — anywhere a first-pass answer isn't good enough. qai-think burns extra compute reasoning through the problem before it responds.
n8n, Make, Zapier, your own cron job — anywhere a workflow needs one LLM call inside it. Anything that already speaks OpenAI takes Qai by swapping the base URL.
Point Cursor, Continue, Aider, or Cline at Qai and the team gets a coding assistant billed to your account. Mix tiers across files — qai-flash for autocomplete, qai-max for big refactors.
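For the video pattern above, stitching starts with deciding how many calls to make. Here is a minimal sketch that splits a target duration into clips no longer than the six-second ceiling. It assumes the endpoint accepts shorter whole-second durations for the last clip, which this page does not confirm, and `plan_clips` is a hypothetical helper.

```python
MAX_CLIP_SECONDS = 6  # per-call ceiling stated on this page


def plan_clips(total_seconds: int, max_clip: int = MAX_CLIP_SECONDS) -> list[int]:
    """Split a target duration into clip lengths no longer than max_clip."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    full, remainder = divmod(total_seconds, max_clip)
    # Full-length clips first, then one shorter clip for whatever is left over.
    return [max_clip] * full + ([remainder] if remainder else [])


clips = plan_clips(20)
print(clips, "billed seconds:", sum(clips))
```

Since billing is per video-second, the billed total equals the target duration however you split it; the split only changes how many calls you make.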
Most AI APIs cover one modality. Qai puts text, image, and video on one key, hosts the generated media, and bakes the spend dashboards and per-key budgets straight into the product.
| Feature | Qai | Typical AI API |
|---|---|---|
| OpenAI SDK compatible | Yes | Yes |
| Text + image + video on one key | Yes | Text only, usually |
| Permanent CDN-hosted media URLs | Yes | No - bring your own bucket |
| Per-key spend dashboard | Yes, live | After-the-fact CSV |
| Free sandbox model | Unlimited in dev, 100/day in prod | Trial credits that run out |
| Pay-as-you-go (no monthly minimum) | Yes | Often requires commitment |
| Sign-up to first call | Under 2 minutes | 5 to 60 minutes |
| Free utility endpoints (JSON repair, humanise) | Yes | No |
| Self-aware models (know their own tier) | Yes | No |
Set hostMedia: true and the image or video gets a permanent CDN URL you can paste into your app or email. No bucket to provision, no signed URL to wire up.
200+ POPs serve the file from whichever edge is closest to the viewer. Sub-100ms response in most regions.
hostMedia: true returns a stable URL that doesn't expire. Drop it in an <img> tag and keep moving.
View counts and access patterns per asset show up in the dashboard. You don't have to add tracking.
Assets are replicated across availability zones with DDoS protection in front. 99.99% uptime SLA.
Skip hostMedia and you get a short-lived preview URL. Set it to true when you want production hosting. Same CDN either way.
Every asset gets a readable URL like llm.quickcasa.ai/media/{id}. No signed-URL parameters or expiring tokens to refresh.
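To make the hostMedia switch concrete, here is a rough sketch of a request builder using only the standard library. Two things are assumptions and not confirmed by this page: the OpenAI-style `/images/generations` path, and `hostMedia` sitting as a top-level field in the JSON body. The model name qai-imagine is the one mentioned in the use cases. Nothing is sent over the network here; the function only constructs the request.

```python
import json
import urllib.request

BASE_URL = "https://llm.quickcasa.ai/v1"


def build_image_request(prompt: str, api_key: str,
                        host_media: bool = True) -> urllib.request.Request:
    """Build (but do not send) an image generation request."""
    body = {
        "model": "qai-imagine",
        "prompt": prompt,
        "hostMedia": host_media,  # assumption: top-level body field
    }
    return urllib.request.Request(
        f"{BASE_URL}/images/generations",  # assumption: OpenAI-style path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_image_request("A lighthouse at dusk", "sk-...")
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```

Pass the result to `urllib.request.urlopen` to send it. Check the API reference for the actual path and field placement before relying on this shape.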
The privacy and security stuff, written plainly. If a section is unclear or feels lawyer-y, email us and we'll rewrite it.
Prompts, completions, and uploaded media stay yours. We don't feed them into training, share them with third parties, or sell aggregated stats.
Every request is encrypted in transit. Keys are hashed in the database, so a leak doesn't expose them.
Set a daily or monthly cap on any key. When it hits the limit the key returns a clear error instead of running up a four-figure bill.
The terms read like instructions. No auto-renewal traps and no per-seat upcharges buried in footnotes.
Cancel any time and email us for a full wipe. Prompts, completions, hosted media, and the account go in one pass.
Don't see your question? Email hi@quickcasa.ai and a human will get back to you.
Start with qai-hello-world to confirm the integration. Then pick by job: qai-flash for cheap and fast, qai-pro for the balanced default, qai-max when quality is what matters, qai-think when the problem benefits from chain-of-thought reasoning. The dashboard's usage chart shows which tier you ended up needing.

The qai-hello-world tier is capped at 100 calls per key per UTC day. Paid tiers run a generous default that scales with usage. If you've got a launch coming and need burst capacity, email us a few days ahead and we'll pre-warm.

What's in production today, what's queued for the next month, and what's further out.
Five text tiers, two image tiers, one video tier. Streaming, function calling, OpenAI SDK compatibility, hosted media.
Live usage chart per model and per API key. Per-key budgets coming next. Stripe handles all invoicing automatically.
JSON recovery, text humanisation, content cleanup. Free for anyone with an account. See the utilities page for details.
Dense-vector embeddings for semantic search, RAG, and similarity. OpenAI-compatible /v1/embeddings endpoint.
Upload PDFs, markdown, or docs, get a file_id back, and reference it in chat. The server handles chunking, embedding, and retrieval. No vector store of your own to manage.
Pass a session_id and we store conversation history server-side. Build chatbots in five lines instead of fifty.
Repetitive prompts return cached responses for near-zero cost. Targets workloads with high prompt-similarity (FAQ bots, retrieval pipelines).
Audio generation and Whisper-style transcription. Closes the loop on voice-in / voice-out workflows.
Image editing endpoints (inpaint, outpaint, upscale) that take Qai-hosted URLs as input. Generate, iterate, publish without ever leaving Qai.
Sign up, drop in a card, grab a key. First API call lands in under a minute.