Moonborn — Developers

Drift detection

Every chat reply is scored against the persona's voice fingerprint. Below the threshold, the reply ships; above it, the runtime alerts (and optionally enforces recovery).

A persona stays "in voice" by being measured. Each chat reply is embedded with the same model used to build the voice fingerprint, and the cosine distance between the two yields a drift score between 0 and 1.
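As a rough illustration of the scoring step, here is a minimal sketch of cosine distance over two embedding vectors. The function name and the toy vectors are illustrative assumptions, not Moonborn's implementation, and the embedding model itself is not shown. Plain cosine distance can reach 2 for opposed vectors; how the runtime keeps the score within 0 to 1 is not described here.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Toy 3-dimensional vectors (assumed values); real embeddings are much larger.
fingerprint = [1.0, 0.0, 0.0]
reply_embedding = [0.8, 0.6, 0.0]
drift_score = cosine_distance(fingerprint, reply_embedding)
```

With these toy vectors the score comes out at roughly 0.2, which would sit under the default threshold of 0.30.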

What you get per reply

{
  "driftScore": 0.12,
  "driftThreshold": 0.30,
  "driftAlert": false
}

driftScore is the cosine distance. driftThreshold is the workspace value of engine.pipeline.drift_detection.threshold (default 0.30). driftAlert is a boolean that is true when driftScore is above driftThreshold, so downstream consumers do not have to compare the two themselves.
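A downstream consumer might read these fields along the following lines. This is a sketch only: the DriftFields class and the headroom helper are hypothetical names, and the sample body contains just the three drift fields shown above rather than a full reply payload.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DriftFields:
    score: float
    threshold: float
    alert: bool


def parse_drift(payload: str) -> DriftFields:
    """Extract the three drift fields from a JSON body."""
    data = json.loads(payload)
    return DriftFields(
        score=float(data["driftScore"]),
        threshold=float(data["driftThreshold"]),
        alert=bool(data["driftAlert"]),
    )


def headroom(fields: DriftFields) -> float:
    """Margin left before the score reaches the threshold (negative once over)."""
    return fields.threshold - fields.score


sample = '{"driftScore": 0.12, "driftThreshold": 0.30, "driftAlert": false}'
fields = parse_drift(sample)
```

Tracking the headroom over a session, rather than only the boolean, can show a persona sliding toward the threshold before an alert fires.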

What causes drift

Long context. The system prompt's authority decays as conversation history grows.
Off-topic steering. Users push the persona into territory it was not generated for.
Provider model swap. Switching from Claude Opus to Sonnet changes the voice surface even with identical prompts.
Cross-tool calls. Tool responses inject system-like text that bleeds back into the reply tone.
High temperature. The variance reads as drift even when the persona is "still itself."

Recovery actions

engine.pipeline.drift_detection.action_on_alert controls what happens when the threshold trips:

warn (default) — the reply ships, the alert is logged, and the webhook event persona.audit_failed (drift variant) is emitted.
auto_recover — Moonborn runs a single low-temperature regeneration with the fingerprint reference re-injected.
block — the reply is not returned; the caller gets a 409 Conflict with the drift envelope.
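The three modes can be summarized as a small decision function. This is a model of the documented behavior for reasoning about client code, not the runtime's own logic; the Action enum and the outcome function are hypothetical, and treating a score exactly equal to the threshold as passing is an assumption.

```python
from enum import Enum


class Action(str, Enum):
    WARN = "warn"
    AUTO_RECOVER = "auto_recover"
    BLOCK = "block"


def outcome(drift_score: float, threshold: float, action: Action = Action.WARN) -> str:
    """Describe what happens to a reply for a given score and configured action."""
    # Assumption: a score equal to the threshold does not trip the alert.
    if drift_score <= threshold:
        return "reply ships"
    if action is Action.WARN:
        return "reply ships, alert logged, persona.audit_failed emitted"
    if action is Action.AUTO_RECOVER:
        return "one low-temperature regeneration with fingerprint re-injected"
    return "409 Conflict with drift envelope"
```

For example, outcome(0.42, 0.30, Action.BLOCK) describes the 409 path, so a caller running in block mode needs to handle that status alongside normal replies.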

Threshold tuning

The default 0.30 is a balanced middle. Tighten it for brand-safe surfaces and loosen it for creative ones:

Customer support, regulated content → 0.20.
General product chat → 0.30.
Creative play, character workshops → 0.45.

Per-persona overrides are set via the persona's runtime contract. See the Drift threshold tuning workshop.
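One way to picture the precedence is a lookup in which a persona override, when present, wins over the workspace value, which in turn wins over the documented default. The shape of the runtime contract is not specified here, so the override is passed as a plain optional number and the workspace config is assumed to be a nested mapping; both are illustrative assumptions.

```python
from collections.abc import Mapping
from typing import Any, Optional

DEFAULT_THRESHOLD = 0.30  # documented default
THRESHOLD_KEY = "engine.pipeline.drift_detection.threshold"


def lookup(config: Mapping, dotted_key: str) -> Optional[Any]:
    """Walk a nested mapping along a dotted key; return None if a part is missing."""
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, Mapping) or part not in node:
            return None
        node = node[part]
    return node


def effective_threshold(workspace: Mapping, persona_override: Optional[float] = None) -> float:
    """Resolve the threshold: persona override, then workspace value, then default."""
    if persona_override is not None:
        value = float(persona_override)
    else:
        found = lookup(workspace, THRESHOLD_KEY)
        value = DEFAULT_THRESHOLD if found is None else float(found)
    if not 0.0 <= value <= 1.0:
        raise ValueError("threshold must lie between 0 and 1")
    return value


# Hypothetical workspace config tuned for a customer support surface.
workspace = {"engine": {"pipeline": {"drift_detection": {"threshold": 0.20}}}}
support_threshold = effective_threshold(workspace)
workshop_threshold = effective_threshold(workspace, persona_override=0.45)
```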

API

Every chat reply (POST /v1/chat/sessions/{id}/messages) carries drift fields in its response.
Webhook event persona.audit_failed fires on alert (HMAC-signed, five retries).
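A receiver would typically verify the signature before trusting the event. The sketch below assumes HMAC-SHA256 over the raw request body with a hex-encoded signature. The algorithm, the encoding, and the header that carries the signature are not stated on this page, so treat all three as assumptions to check against the webhook reference.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 digest of the raw body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Compare signatures in constant time to avoid timing leaks."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode(), received_signature.encode())


# Hypothetical secret and payload, for illustration only.
secret = b"test-secret"
body = b'{"event": "persona.audit_failed"}'
signature = sign(secret, body)
is_valid = verify(secret, body, signature)
```

Because delivery is retried up to five times, the same event may arrive more than once, so the handler should be safe to run repeatedly.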

Honest scope

Drift detection measures how close the reply is to the persona's voice. It doesn't measure factual accuracy or content safety; that's the moderation pipeline's job. A factually wrong reply in a near-perfect voice can still score as low as 0.05.