LLM gateway · control plane
One endpoint. Many models.
An OpenAI-compatible proxy in front of multiple providers, with routing, response caching, automatic fallback, streaming passthrough, and per-key rate limiting. Set any OpenAI client's base URL to /v1; the numbers below come straight from the live process and refresh every 4 s.
Drop-in OpenAI endpoint
curl -N "$API_BASE/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model":"llama-3.3-70b-versatile",
       "messages":[{"role":"user","content":"hi"}],
       "stream":true}'
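The same streaming call can be sketched in Python with only the standard library. This is a minimal illustration, not the gateway's own client. It assumes the response follows the usual OpenAI server-sent-event framing ("data: {...}" lines ending with "data: [DONE]"). The helper names (build_request, iter_deltas, stream_chat) and the http://localhost:8080 fallback for API_BASE are hypothetical. If your deployment requires a key for per-key rate limiting, add the appropriate header yourself.

```python
import json
import os
import urllib.request

# Assumption: API_BASE mirrors the $API_BASE variable used in the curl example.
# The localhost fallback is a placeholder, not a documented default.
API_BASE = os.environ.get("API_BASE", "http://localhost:8080")


def build_request(prompt, model="llama-3.3-70b-versatile", api_base=API_BASE):
    """Build the same streaming POST that the curl example sends."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }
    return urllib.request.Request(
        f"{api_base}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def iter_deltas(lines):
    """Yield text fragments from OpenAI-style server-sent-event lines."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


def stream_chat(prompt):
    """Send the request and print fragments as they arrive."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        for text in iter_deltas(resp):  # the response iterates line by line
            print(text, end="", flush=True)
    print()
```

With a gateway running, stream_chat("hi") prints the reply incrementally. The parsing lives in iter_deltas so it can be exercised against canned lines without a network connection.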