LLM gateway · control plane

One endpoint. Many models.

An OpenAI-compatible proxy in front of multiple providers, with routing, response caching, automatic fallback, streaming passthrough, and per-key rate limiting. Point any OpenAI client at /v1. The numbers below are read from the live process and refresh every 4 seconds.

Drop-in OpenAI endpoint

# -N turns off curl's output buffering so streamed chunks print as they arrive
curl -N "$API_BASE/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model":"llama-3.3-70b-versatile",
       "messages":[{"role":"user","content":"hi"}],
       "stream":true}'
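
The same streaming request can be made from code. The sketch below uses only the Python standard library and assumes the gateway emits the usual OpenAI streaming format: server-sent `data:` lines carrying JSON chunks with `choices[].delta.content`, ended by a `data: [DONE]` sentinel. The helper names `iter_deltas` and `stream_chat` are hypothetical, and no auth header is sent, mirroring the curl call above; add one if your deployment requires a key.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (assumed format)."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


def stream_chat(api_base: str, model: str, prompt: str) -> Iterator[str]:
    """POST a streaming chat completion and yield text as it arrives."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{api_base}/v1/chat/completions",
        data=body,  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The response object iterates line by line as bytes.
        yield from iter_deltas(resp)
```

To use it, loop over `stream_chat(os.environ["API_BASE"], "llama-3.3-70b-versatile", "hi")` and print each piece with `end=""` and `flush=True`. Keeping the parsing in `iter_deltas` separate from the network call lets the parser be exercised on canned lines without a running gateway.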