Moonborn — Developers

Drift detection

Every chat reply is scored against the persona's voice fingerprint. Below the threshold, the reply ships; above it, the runtime raises an alert and can optionally regenerate or withhold the reply.

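A persona stays "in voice" by being measured. Each chat reply is embedded with the same model used to build the voice fingerprint. The cosine distance between the reply embedding and the fingerprint (1 minus their cosine similarity) is the drift score, a value between 0 and 1 where lower means closer to the persona's voice.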
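The scoring step itself is small. This is a minimal sketch that assumes the reply and the fingerprint are already available as plain float vectors of the same dimension. drift_score is a hypothetical helper, not a Moonborn API. Clamping to the 0 to 1 range is also an assumption: a raw cosine distance can reach 2 for vectors pointing in opposite directions, and the clamp is one way to keep it on the documented scale.

```python
import math
from typing import Sequence


def drift_score(reply: Sequence[float], fingerprint: Sequence[float]) -> float:
    """Cosine distance between a reply embedding and the voice fingerprint.

    Hypothetical helper: assumes both vectors come from the same embedding
    model and therefore share a dimension.
    """
    if len(reply) != len(fingerprint):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(reply, fingerprint))
    norm = math.sqrt(sum(a * a for a in reply)) * math.sqrt(sum(b * b for b in fingerprint))
    if norm == 0.0:
        raise ValueError("zero-length embedding")
    similarity = dot / norm
    # Raw cosine distance spans [0, 2]; clamp onto the documented 0..1 scale (assumption).
    return min(max(1.0 - similarity, 0.0), 1.0)


print(drift_score([1.0, 0.0], [1.0, 0.0]))  # same direction: 0.0
print(drift_score([1.0, 0.0], [0.0, 1.0]))  # orthogonal: 1.0
```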

What you get per reply

{
  "driftScore": 0.12,
  "driftThreshold": 0.30,
  "driftAlert": false
}

driftScore is the cosine distance. driftThreshold is the workspace value of engine.pipeline.drift_detection.threshold (default 0.30). driftAlert is true when driftScore is above driftThreshold. Downstream consumers can branch on this single boolean without repeating the comparison.
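A downstream consumer can pull these three fields out of the reply body. This is a minimal sketch that assumes the fields sit at the top level of the JSON response, as in the excerpt above, possibly alongside other fields. DriftResult, parse_drift and the headroom property are hypothetical names used only for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DriftResult:
    score: float
    threshold: float
    alert: bool

    @property
    def headroom(self) -> float:
        """How far the score sits below the threshold; negative once it is above."""
        return self.threshold - self.score


def parse_drift(body: str) -> DriftResult:
    """Pull the drift fields out of a chat-reply JSON body (hypothetical helper)."""
    data = json.loads(body)
    return DriftResult(
        score=float(data["driftScore"]),
        threshold=float(data["driftThreshold"]),
        # Trust the server's flag rather than re-deriving it from score and threshold.
        alert=bool(data["driftAlert"]),
    )


result = parse_drift('{"driftScore": 0.12, "driftThreshold": 0.30, "driftAlert": false}')
if result.alert:
    print("off voice")
else:
    print(f"in voice, headroom {result.headroom:.2f}")
```

Relying on driftAlert rather than recomputing the comparison keeps the client consistent with the server. This matters most when a per-persona override changes the threshold.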

What causes drift

  • Long context. The system prompt's authority decays as conversation history grows.
  • Off-topic steering. Users push the persona into territory it was not generated for.
  • Provider model swap. Switching from Claude Opus to Sonnet changes the voice surface even with identical prompts.
  • Cross-tool calls. Tool responses inject system-like text that bleeds back into the reply tone.
  • High temperature. The added sampling variance reads as drift even when the persona is "still itself."
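Some of these causes, long context in particular, build up over a session rather than showing up in a single reply. For that reason it can help to track the trend of driftScore across turns, not just individual alerts. The following is a minimal client-side sketch that assumes you collect each reply's driftScore yourself. The DriftTrend class, the window size and the warning margin are illustrative choices, not Moonborn settings.

```python
from collections import deque


class DriftTrend:
    """Rolling mean of recent driftScore values for one chat session (hypothetical helper)."""

    def __init__(self, window: int = 5) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        # Oldest scores fall off automatically once the window is full.
        self._scores = deque(maxlen=window)

    def add(self, score: float) -> float:
        """Record one reply's driftScore and return the current rolling mean."""
        self._scores.append(score)
        return sum(self._scores) / len(self._scores)

    def approaching(self, threshold: float, margin: float = 0.05) -> bool:
        """True once the rolling mean is within `margin` of the threshold."""
        if not self._scores:
            return False
        mean = sum(self._scores) / len(self._scores)
        return mean >= threshold - margin


trend = DriftTrend(window=3)
for score in [0.12, 0.22, 0.27, 0.29]:  # scores creeping up as the session grows
    trend.add(score)
print(trend.approaching(threshold=0.30))
```

A rising rolling mean is an early hint that the system prompt is losing authority, and it appears before any single reply crosses the threshold.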

Recovery actions

engine.pipeline.drift_detection.action_on_alert controls what happens when the threshold trips:

  • warn (default): the reply ships, the alert is logged, and the persona.audit_failed webhook event (drift variant) is emitted.
  • auto_recover: Moonborn runs a single low-temperature regeneration with the fingerprint reference re-injected.
  • block: the reply is not returned; the caller gets a 409 Conflict with the drift envelope.
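The three modes can be read as a small decision function. This is a client-side mental model, not Moonborn's implementation. The handle_reply function, the regenerate callback that stands in for the low-temperature retry, the log_alert callback and the (status, body) return shape are all hypothetical. The sketch also assumes the single regenerated reply is returned as-is, whether or not it clears the threshold.

```python
from typing import Callable, Tuple

Reply = dict  # e.g. {"text": ..., "driftScore": ..., "driftThreshold": ..., "driftAlert": ...}


def handle_reply(
    reply: Reply,
    action_on_alert: str,
    regenerate: Callable[[], Reply],
    log_alert: Callable[[Reply], None],
) -> Tuple[int, Reply]:
    """Model of engine.pipeline.drift_detection.action_on_alert (illustrative only)."""
    if not reply["driftAlert"]:
        return 200, reply  # under the threshold: the reply ships unchanged
    if action_on_alert == "warn":
        log_alert(reply)  # stands in for the log entry and persona.audit_failed webhook
        return 200, reply
    if action_on_alert == "auto_recover":
        # One low-temperature regeneration with the fingerprint re-injected;
        # assumed to be returned as-is (no second retry).
        return 200, regenerate()
    if action_on_alert == "block":
        # The reply text is withheld; only the drift fields go back with the 409.
        envelope = {k: reply[k] for k in ("driftScore", "driftThreshold", "driftAlert")}
        return 409, envelope
    raise ValueError(f"unknown action_on_alert: {action_on_alert!r}")
```

On the calling side, a block-mode workspace means a 409 must be treated as an expected outcome rather than a transport error.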

Threshold tuning

The default 0.30 is a middle ground. Lower it to tighten enforcement on brand-safe surfaces, and raise it where expressive range matters more than strict consistency:

  • Customer support, regulated content → 0.20.
  • General product chat → 0.30.
  • Creative play, character workshops → 0.45.

Per-persona overrides are set in the persona's runtime contract. See the Drift threshold tuning workshop.
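The tuning guidance and the override rule can be summarised as a lookup. This is a sketch under assumptions. The preset names and the effective_threshold function are hypothetical. The precedence (a per-persona override beats the workspace value, which falls back to the 0.30 default) is inferred from the word "override" and is not taken from a documented resolution order.

```python
from typing import Optional

DEFAULT_THRESHOLD = 0.30  # engine.pipeline.drift_detection.threshold default

# Starting points from the tuning guidance (preset names are illustrative).
PRESETS = {
    "support_regulated": 0.20,
    "general_chat": 0.30,
    "creative_play": 0.45,
}


def effective_threshold(
    workspace_value: Optional[float] = None,
    persona_override: Optional[float] = None,
) -> float:
    """Resolve the threshold a reply is checked against (assumed precedence)."""
    for value in (persona_override, workspace_value):
        if value is not None:
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"threshold out of range: {value}")
            return value
    return DEFAULT_THRESHOLD


# The same reply score trips the tight preset but not the looser ones.
score = 0.25
for name, threshold in PRESETS.items():
    print(name, score > threshold)
```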

API

  • Every chat reply (POST /v1/chat/sessions/{id}/messages) carries drift fields in its response.
  • Webhook event persona.audit_failed fires on alert (HMAC-signed, five retries).
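A receiver should check the HMAC before trusting a persona.audit_failed delivery. This page does not name the hash function, the signature header or the secret format, so the sketch assumes HMAC-SHA256 over the raw request body, hex-encoded. sign, verify_webhook, the secret and the sample payload are hypothetical placeholders; adapt them to Moonborn's actual signing scheme. hmac.compare_digest is used so the comparison does not leak timing information.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """HMAC-SHA256 of the raw request body, hex-encoded (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, body: bytes, signature: str) -> bool:
    """Constant-time check of a received signature against the expected one."""
    return hmac.compare_digest(sign(secret, body), signature)


secret = b"whsec_example"  # hypothetical shared secret
body = b'{"event": "persona.audit_failed", "driftScore": 0.42}'  # hypothetical payload
signature = sign(secret, body)
print(verify_webhook(secret, body, signature))
print(verify_webhook(secret, body + b" ", signature))  # any change to the body fails
```

Failed deliveries are retried up to five times, so the handler should also tolerate the same event arriving more than once. One way to do that is to deduplicate on an event identifier, if the payload carries one.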

Honest scope

Drift detection measures how close the reply is to the persona's voice. It doesn't measure factual accuracy or content safety; that's the moderation pipeline's job. A factually wrong reply delivered in perfect voice still gets a low drift score, such as 0.05.