Moonborn — Developers

Rate limits

Per-tier API caps, headers, retry policy, and where to read the live remaining count.

Rate limiting is per-org and applied at the gateway. Three lever points: requests per minute, requests per day, and per-endpoint specialization (e.g. generation is more constrained than reads).

Per-tier defaults

Tier	Req/min	Req/day	Concurrent generations
Free	60	5,000	1
Pro	600	50,000	5
Team	3,000	250,000	25
Enterprise	custom	custom	custom

Generation endpoints (POST /v1/personas, POST /v1/personas/{id}/refine, POST /v1/personas/{id}/fork) burn from the concurrent-generations budget separately from the per-minute cap. Reads (GET *) don't.

Headers

Every response carries:

X-RateLimit-Limit:     <per-minute cap>
X-RateLimit-Remaining: <remaining in current window>
X-RateLimit-Reset:     <Unix timestamp when window resets>

On 429, additionally:

Retry-After: 12

(seconds until the next request is permitted)

Endpoint-specific caps

Some endpoints have tighter individual budgets to protect upstream LLM providers:

Endpoint family	Pro cap
`POST /v1/personas` (generation)	60/hour, 5 concurrent
`POST /v1/personas/{id}/refine`	120/hour
`POST /v1/chat/sessions/{id}/messages`	600/min (counts against the general cap)
`POST /v1/personas/{id}/audit`	300/hour

Cap multipliers per tier: Free × 0.1, Pro × 1, Team × 5, Enterprise custom.

Client-side backpressure

Don't wait for 429. The SDKs read X-RateLimit-Remaining after every call; if you're below 10% of the cap with > 30 seconds to reset, slow down. The SDK exposes a callback for custom backpressure:

const client = new Moonborn({
  apiKey: process.env.MOONBORN_API_KEY,
  onRateLimitNearCap: ({ remaining, resetIn }) => {
    if (remaining < 50) setMyOwnBackpressure(resetIn);
  },
});

Quota vs rate limit

These are different. Rate limit is per-minute / per-day shaped to protect throughput. Quota is a tier-bound monthly cap on certain operations (e.g. "100 generations/month on Pro"). A quota cap returns quota_exceeded (429); the rate limit returns rate_limited (429). Same status, different code in the envelope.

Enterprise

Per-org custom caps via contract. Configurable via api.rate_limit.* config items (Owner role, Enterprise only).

Honest scope

Rate limits protect the system; they don't shape your application's UX. Buffer, queue, and retry in your code; treat 429 as routine, not exceptional.