Drift detection
Every chat reply scored against the persona's voice fingerprint. Below the threshold, the reply ships; above it, the runtime alerts (and optionally enforces recovery).
A persona stays "in voice" by being measured. Each chat reply is embedded with the same model used to build the voice fingerprint, and the cosine distance between the reply embedding and the fingerprint yields a drift score between 0 and 1 (lower means closer to the persona's voice).
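A minimal sketch of the scoring step, assuming the reply embedding and the fingerprint are plain float vectors of equal length. The helper name drift_score is hypothetical, and this is not Moonborn's implementation. Raw cosine distance (1 minus cosine similarity) can reach 2 for opposed vectors, so the sketch clamps the result to [0, 1] as an assumption to match the documented range.

```python
import math
from typing import Sequence


def drift_score(reply_vec: Sequence[float], fingerprint_vec: Sequence[float]) -> float:
    """Cosine distance between a reply embedding and the voice fingerprint.

    Hypothetical helper: assumes both vectors come from the same embedding
    model and therefore have the same dimensionality.
    """
    if len(reply_vec) != len(fingerprint_vec):
        raise ValueError("embeddings must have the same dimensionality")
    dot = sum(a * b for a, b in zip(reply_vec, fingerprint_vec))
    norm_reply = math.sqrt(sum(a * a for a in reply_vec))
    norm_fp = math.sqrt(sum(b * b for b in fingerprint_vec))
    if norm_reply == 0 or norm_fp == 0:
        raise ValueError("embeddings must be non-zero vectors")
    similarity = dot / (norm_reply * norm_fp)
    # Cosine distance = 1 - cosine similarity. Raw values span [0, 2];
    # clamping to [0, 1] is an assumption made to match the documented range.
    return min(max(1.0 - similarity, 0.0), 1.0)


print(drift_score([1.0, 0.0], [1.0, 0.0]))  # 0.0 (same direction)
print(drift_score([1.0, 0.0], [0.0, 1.0]))  # 1.0 (orthogonal)
```

In practice the vectors would come from the embedding model, and the fingerprint would be computed once per persona and reused for every reply.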
What you get per reply
```json
{
  "driftScore": 0.12,
  "driftThreshold": 0.30,
  "driftAlert": false
}
```

driftScore is the cosine distance. driftThreshold is the workspace
value of engine.pipeline.drift_detection.threshold (default 0.30).
driftAlert is true when driftScore is above driftThreshold, a convenience
boolean for downstream consumers.
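One way a downstream consumer might read these fields, sketched under the assumption that they sit at the top level of the reply body exactly as in the example above. DriftReport, parse_drift, and is_consistent are hypothetical names. The consistency check assumes the alert fires strictly above the threshold; confirm the boundary behavior for your workspace.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DriftReport:
    score: float
    threshold: float
    alert: bool


def parse_drift(body: str) -> DriftReport:
    """Extract the drift fields from a chat reply body (hypothetical helper)."""
    payload = json.loads(body)
    return DriftReport(
        score=float(payload["driftScore"]),
        threshold=float(payload["driftThreshold"]),
        alert=bool(payload["driftAlert"]),
    )


def is_consistent(report: DriftReport) -> bool:
    """Check the flag against the score; assumes alerting strictly above threshold."""
    return report.alert == (report.score > report.threshold)


report = parse_drift('{"driftScore": 0.12, "driftThreshold": 0.30, "driftAlert": false}')
print(report.alert, is_consistent(report))  # False True
```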
What causes drift
- Long context. The system prompt's authority decays as conversation history grows.
- Off-topic steering. Users push the persona into territory it was not generated for.
- Provider model swap. Switching from Claude Opus to Sonnet changes the voice surface even with identical prompts.
- Cross-tool calls. Tool responses inject system-like text that bleeds back into the reply tone.
- High temperature. Sampling variance reads as drift even when the persona is "still itself."
Recovery actions
engine.pipeline.drift_detection.action_on_alert controls what happens
when the threshold trips:
- warn (default): the reply ships, the alert is logged, and the webhook event persona.audit_failed (drift variant) is emitted.
- auto_recover: Moonborn runs a single low-temperature regeneration with the fingerprint reference re-injected.
- block: the reply is not returned; the caller gets a 409 Conflict with the drift envelope.
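To make the three modes concrete, here is a minimal sketch of how a wrapper might branch on action_on_alert. Only the mode names and the 409 status come from this page. The regenerate and emit_webhook callbacks and the DriftBlocked exception are hypothetical stand-ins, emitting the webhook in every mode is an assumption, and the sketch does not claim to mirror what Moonborn does internally (for example, whether a regenerated reply is re-scored).

```python
import logging
from typing import Callable

logger = logging.getLogger("drift")
VALID_ACTIONS = ("warn", "auto_recover", "block")


class DriftBlocked(Exception):
    """Hypothetical stand-in for the 409 Conflict returned in block mode."""

    status = 409

    def __init__(self, envelope: dict) -> None:
        super().__init__("reply blocked by drift detection")
        self.envelope = envelope


def handle_reply(
    reply: str,
    drift: dict,
    regenerate: Callable[[], str],
    emit_webhook: Callable[[str, dict], None],
    action_on_alert: str = "warn",  # warn is the documented default
) -> str:
    if action_on_alert not in VALID_ACTIONS:
        raise ValueError(f"unknown action_on_alert: {action_on_alert!r}")
    if not drift["driftAlert"]:
        return reply  # not above the threshold: the reply ships unchanged
    # Assumption: persona.audit_failed is emitted for every alert, whatever the mode.
    emit_webhook("persona.audit_failed", drift)
    if action_on_alert == "warn":
        logger.warning("drift alert: score=%.2f threshold=%.2f",
                       drift["driftScore"], drift["driftThreshold"])
        return reply
    if action_on_alert == "auto_recover":
        # One regeneration attempt; this sketch returns it without re-scoring.
        return regenerate()
    raise DriftBlocked(drift)  # block: the caller sees a 409 with the envelope
```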
Threshold tuning
The default 0.30 is a balanced middle. Tighten it for brand-safe
surfaces and loosen it for creative ones:
- Customer support, regulated content → 0.20.
- General product chat → 0.30.
- Creative play, character workshops → 0.45.
Individual personas can override the workspace threshold in their runtime contract. See the Drift threshold tuning workshop.
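A sketch of threshold resolution, assuming a persona-level override wins over the workspace setting, which in turn falls back to the 0.30 default. Representing engine.pipeline.drift_detection.threshold as a nested dictionary and naming the override key drift_threshold inside the runtime contract are both hypothetical; check the runtime contract reference for the real field.

```python
from typing import Optional

DEFAULT_THRESHOLD = 0.30


def resolve_threshold(workspace_config: dict, runtime_contract: Optional[dict]) -> float:
    """Pick the effective drift threshold for one persona (hypothetical helper)."""
    if runtime_contract and "drift_threshold" in runtime_contract:
        value = float(runtime_contract["drift_threshold"])  # per-persona override
    else:
        value = float(
            workspace_config.get("engine", {})
            .get("pipeline", {})
            .get("drift_detection", {})
            .get("threshold", DEFAULT_THRESHOLD)
        )
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"drift threshold out of range: {value}")
    return value


workspace = {"engine": {"pipeline": {"drift_detection": {"threshold": 0.30}}}}
print(resolve_threshold(workspace, {"drift_threshold": 0.20}))  # support persona: 0.2
print(resolve_threshold(workspace, None))  # no override: 0.3
```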
API
- Every chat reply (POST /v1/chat/sessions/{id}/messages) carries the drift fields in its response.
- The webhook event persona.audit_failed fires on alert (HMAC-signed, five retries).
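Receivers should verify the signature before trusting a persona.audit_failed payload. This page only says the event is HMAC-signed, so the sketch assumes HMAC-SHA256 over the raw request body, hex-encoded, with the signature read from a request header and compared in constant time. The scheme, encoding, and helper name verify_signature are assumptions; check the webhook reference for the actual details. Because deliveries are retried, handlers should also tolerate receiving the same event more than once.

```python
import hashlib
import hmac


def verify_signature(secret: bytes, raw_body: bytes, signature_hex: str) -> bool:
    """Check an HMAC-SHA256 hex signature (assumed scheme) in constant time."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest().encode()
    received = signature_hex.strip().lower().encode()
    return hmac.compare_digest(expected, received)
```

Verify against the raw bytes as received, before any JSON parsing, since re-serializing the payload can change whitespace or key order and break the signature.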
Honest scope
Drift detection measures how close the reply is to the persona's voice. It doesn't measure factual accuracy or content safety; that's the moderation pipeline's job. A factually wrong reply delivered in perfect voice can still score 0.05.