Rate limits
Per-tier API caps, headers, retry policy, and where to read the live remaining count.
Rate limiting is per-org and applied at the gateway. Three lever points: requests per minute, requests per day, and per-endpoint specialization (e.g. generation is more constrained than reads).
Per-tier defaults
| Tier | Req/min | Req/day | Concurrent generations |
|---|---|---|---|
| Free | 60 | 5,000 | 1 |
| Pro | 600 | 50,000 | 5 |
| Team | 3,000 | 250,000 | 25 |
| Enterprise | custom | custom | custom |
Generation endpoints (POST /v1/personas, POST /v1/personas/{id}/refine,
POST /v1/personas/{id}/fork) burn from the concurrent-generations
budget separately from the per-minute cap. Reads (GET *) don't.
Headers
Every response carries:
X-RateLimit-Limit: <per-minute cap>
X-RateLimit-Remaining: <remaining in current window>
X-RateLimit-Reset: <Unix timestamp when window resets>
On 429, additionally:
Retry-After: 12
(seconds until the next request is permitted)
Endpoint-specific caps
Some endpoints have tighter individual budgets to protect upstream LLM providers:
| Endpoint family | Pro cap |
|---|---|
POST /v1/personas (generation) | 60/hour, 5 concurrent |
POST /v1/personas/{id}/refine | 120/hour |
POST /v1/chat/sessions/{id}/messages | 600/min (counts against the general cap) |
POST /v1/personas/{id}/audit | 300/hour |
Cap multipliers per tier: Free × 0.1, Pro × 1, Team × 5, Enterprise custom.
Client-side backpressure
Don't wait for 429. The SDKs read X-RateLimit-Remaining after
every call; if you're below 10% of the cap with > 30 seconds to
reset, slow down. The SDK exposes a callback for custom
backpressure:
const client = new Moonborn({
apiKey: process.env.MOONBORN_API_KEY,
onRateLimitNearCap: ({ remaining, resetIn }) => {
if (remaining < 50) setMyOwnBackpressure(resetIn);
},
});Quota vs rate limit
These are different. Rate limit is per-minute / per-day shaped to
protect throughput. Quota is a tier-bound monthly cap on certain
operations (e.g. "100 generations/month on Pro"). A quota cap returns
quota_exceeded (429); the rate limit returns rate_limited (429).
Same status, different code in the envelope.
Enterprise
Per-org custom caps via contract. Configurable via
api.rate_limit.* config items (Owner role, Enterprise only).
Honest scope
Rate limits protect the system; they don't shape your application's UX. Buffer, queue, and retry in your code; treat 429 as routine, not exceptional.