Quality pipeline
Three runtime quality gates — an LLM-as-judge audit across five dimensions, a 33-test provocation suite, and cosine-distance distinctiveness against a baseline. Wired into generation and edit by default; queryable on demand for QA.
What the pipeline catches
Three independent checks, each with its own threshold and webhook event:
- Audit — judges the persona across five layered dimensions, scored 0–5.
- Provocation tests — runs the persona through a 33-test catalog of role-breaking, contradictions, emotional load, jailbreak resistance, and more.
- Distinctiveness — measures cosine distance against a baseline persona.
Each runs automatically after generation and after every refine (configurable), and each is also exposed as a direct API for QA workflows.
LLM-as-judge audit
The judge — Claude Opus 4.7 — scores every freshly generated persona across five dimensions on a 0–5 scale:
| Dimension | What it scores |
|---|---|
| Coherence | Internal consistency across Soul / Self / Mask / Surface |
| Depth | Psychological richness; presence of contradiction and layered motivation |
| Cultural fidelity | Plausibility and groundedness of cultural surface details |
| Voice distinctiveness | Distinctness and consistency of the Mask voice profile |
| Realism | Believability — reads like a real person, not a stereotype |
Calibration is anchored to a curated golden set; the inter-rater reliability
target between the judge and human raters is Cohen's kappa ≥ 0.7. A weekly
`CalibrateJudgeUseCase` cron re-runs the calibration and surfaces drift.
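For reference, Cohen's kappa over matched judge and human labels is straightforward to compute. A minimal sketch, assuming scores are binned to integer labels; this is the statistic the calibration targets, not the internals of `CalibrateJudgeUseCase`:

```python
from collections import Counter

def cohens_kappa(judge: list[int], human: list[int]) -> float:
    """Cohen's kappa between two raters over the same items.
    Assumes scores are binned to integer labels (e.g. rounded 0-5
    audit scores); a sketch of the calibration target, not the
    service's actual implementation."""
    n = len(judge)
    observed = sum(j == h for j, h in zip(judge, human)) / n
    jc, hc = Counter(judge), Counter(human)
    expected = sum(jc[k] * hc[k] for k in jc.keys() | hc.keys()) / n ** 2
    return (observed - expected) / (1 - expected)

# Agreement at the kappa >= 0.7 calibration target
print(cohens_kappa([3, 4, 5, 2, 4], [3, 4, 4, 2, 4]))  # ~0.71
```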
A separate `BiasDetector` watches for systematic score deviation across gender, culture, and age groups — flagging anything beyond a 5% gap so the judge itself can be re-tuned.
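The 5% rule reduces to comparing per-group mean scores against the overall mean. A minimal sketch, assuming a relative gap; how `BiasDetector` actually groups and normalizes scores is not specified here:

```python
from statistics import mean

def biased_groups(scores_by_group: dict[str, list[float]],
                  max_gap: float = 0.05) -> list[str]:
    """Groups whose mean audit score deviates from the overall mean
    by more than max_gap, relative. The relative-gap definition is
    an assumption about how the 5% rule is measured."""
    overall = mean(s for scores in scores_by_group.values() for s in scores)
    return [group for group, scores in scores_by_group.items()
            if abs(mean(scores) - overall) / overall > max_gap]

# Overall mean 4.0; both groups sit 7.5% away, so both are flagged.
print(biased_groups({"a": [4.3], "b": [3.7]}))  # ['a', 'b']
```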
Config:
- `consistency.judge.enabled` — master toggle.
- `consistency.judge.model` (default `opus`) — judge model.
- `consistency.judge.rubric_version` (default `v1`).
- `consistency.judge.min_overall_score` (default `3.5`) — gate threshold.
- `consistency.judge.dimensions.*` — per-dimension toggles.
Endpoints:
POST /api/personas/{id}/audits # run or re-audit
GET /api/personas/{id}/audits   # audit history
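A minimal gate against these endpoints from a QA script. The base URL, auth header, and the `overall_score` field are assumptions; check your workspace's API reference:

```python
import requests

BASE = "https://api.moonborn.example"          # placeholder base URL
HEADERS = {"Authorization": "Bearer <token>"}  # auth scheme assumed

def reaudit_and_gate(persona_id: str, min_overall: float = 3.5) -> bool:
    """Re-run the audit and gate on the configured minimum score.
    The overall_score field is illustrative, not a documented schema."""
    run = requests.post(f"{BASE}/api/personas/{persona_id}/audits",
                        headers=HEADERS, timeout=60).json()
    return run["overall_score"] >= min_overall
```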
Provocation test suite

The default catalog has 33 tests across 15 categories (v2 adds five more categories, around humanness, entropy, and refusal synthesis):
`role_break`, `pressure`, `emotional_load`, `cultural_dissonance`, `persona_swap`, `factual_consistency`, `timeline_consistency`, `linguistic_drift`, `value_violation`, `jailbreak_resistance`, `humanness`, `entropy`, `vulnerability`, `suspicion_loop`, `refusal_synthesis`
Each test runs a scenario against the persona, and a judge (Claude Sonnet 4.6) rates the response as `pass`, `fail`, or `warn`. The suite fails when the aggregate pass rate drops below `consistency.test_suite.fail_threshold` (default `0.7`).
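The gate itself is a plain pass rate. A sketch of the aggregation, assuming `warn` results count toward the denominator but not as passes; how warns are actually weighted is not documented here:

```python
def suite_passes(results: list[str], fail_threshold: float = 0.7) -> bool:
    """results holds one of 'pass' | 'fail' | 'warn' per test.
    Assumption: warns are not passes but carry no extra weight."""
    pass_rate = results.count("pass") / len(results)
    return pass_rate >= fail_threshold

# 25 passes out of 33 tests -> 0.758, suite passes at the 0.7 default
assert suite_passes(["pass"] * 25 + ["warn"] * 5 + ["fail"] * 3)
```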
Config:
- `consistency.test_suite.enabled` — master toggle.
- `consistency.test_suite.run_on_create` (default `true`) — run post-generation.
- `consistency.test_suite.run_on_update` (default `true`) — run after every refine.
- `consistency.test_suite.run_periodic` (Team+) — weekly cron sweep.
- `consistency.test_suite.tests.{category}.enabled` — per-category toggles.
- `consistency.test_suite.tests.{custom_id}.*` (Team+) — author custom tests via `RegisterCustomTestUseCase`.
- `consistency.test_suite.cost_limit_usd` (default `1.00`) — per-run cost cap.
Endpoints:
POST /api/personas/{id}/test-suite # trigger run
GET /api/personas/{id}/test-suite # results
GET /api/audits/test-catalog        # list the provocation catalog
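The equivalent flow for the suite, with the same caveats: base URL, auth, the `p_123` persona ID, and response field names are all illustrative assumptions:

```python
import requests

BASE = "https://api.moonborn.example"          # placeholder base URL
HEADERS = {"Authorization": "Bearer <token>"}  # auth scheme assumed

# Inspect the catalog, trigger a run, then read the aggregate result.
catalog = requests.get(f"{BASE}/api/audits/test-catalog",
                       headers=HEADERS, timeout=30).json()
requests.post(f"{BASE}/api/personas/p_123/test-suite",
              headers=HEADERS, timeout=120)
results = requests.get(f"{BASE}/api/personas/p_123/test-suite",
                       headers=HEADERS, timeout=30).json()
print(results.get("pass_rate"))  # field name is illustrative
```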
Distinctiveness measurement

Distinctiveness is a single cosine-distance score between the persona and a baseline. The default baseline is `chatgpt-default` — answering "is this persona meaningfully different from a generic assistant?" The other built-in baselines are `claude-default` and `gemini-default`; teams can register a custom baseline persona by ID.
The score lives in [0, 1]. Below `consistency.distinctiveness.min_score` (default `0.40`) the persona is flagged. The default action on a low score is `warn`, but the threshold and action are tunable per workspace, and Team workspaces can also run `CompareWithOrgPersonasQuery` to catch drift toward existing personas in the same org.
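In embedding terms the score is one minus cosine similarity between the persona and the baseline. A minimal sketch; the embedding model, and any clamping that keeps the value in [0, 1], are not specified here:

```python
import numpy as np

def distinctiveness(persona_vec: np.ndarray, baseline_vec: np.ndarray) -> float:
    """Cosine distance: 0 means same direction as the baseline,
    higher means more distinct. Non-negative sentence embeddings
    keep the value inside [0, 1] in practice."""
    cos = float(np.dot(persona_vec, baseline_vec) /
                (np.linalg.norm(persona_vec) * np.linalg.norm(baseline_vec)))
    return 1.0 - cos

def is_flagged(score: float, min_score: float = 0.40) -> bool:
    """Mirrors consistency.distinctiveness.min_score with action warn."""
    return score < min_score
```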
Config:
- `consistency.distinctiveness.enabled` (default on at Pro+).
- `consistency.distinctiveness.baseline` (default `chatgpt-default`).
- `consistency.distinctiveness.min_score` (default `0.40`).
- `consistency.distinctiveness.metric` (default `cosine`).
- `consistency.distinctiveness.action_on_low_score` (default `warn`).
How it fits into generation
The pipeline runs at three moments:
- Post-generation — every fresh persona is audited and (by default) provocation-tested before it lands in the library.
- Post-edit — every refine triggers another audit; tests re-run if `run_on_update` is on.
- On demand — call the endpoints directly to re-run for a QA workflow, or run them in batch for monitoring.
When a persona fails generation-time audit, the pipeline retries generation up to three times. After three attempts, the persona is handed back to the user flagged — Moonborn does not silently regenerate forever or auto-mutate the output.
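In control-flow terms, the generation-time gate looks like the sketch below. The two callables stand in for internal steps that have no public API:

```python
from typing import Any, Callable, Tuple

def generate_with_gate(generate: Callable[[], Any],
                       audit_score: Callable[[Any], float],
                       min_overall: float = 3.5,
                       max_attempts: int = 3) -> Tuple[Any, str]:
    """Generate, audit, retry up to max_attempts, then hand back flagged.
    generate and audit_score are hypothetical stand-ins for internals."""
    persona = None
    for _ in range(max_attempts):
        persona = generate()
        if audit_score(persona) >= min_overall:
            return persona, "passed"
    # No silent endless regeneration and no auto-mutation of the output.
    return persona, "flagged"
```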
Webhook events
Two events surface failures to your integration layer:
- `persona.audit_failed` — emitted when an audit run drops below the configured `min_overall_score`.
- `persona.test_suite_failed` — emitted when a provocation run drops below the configured `fail_threshold`.
Both ride the standard webhook contract: HMAC-SHA256 signed, five retries with exponential backoff, dead-letter queue.
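Verifying either event's signature is a standard HMAC check. A minimal sketch, assuming a hex-encoded signature computed over the raw request body; take the actual header name and encoding from your webhook settings:

```python
import hashlib
import hmac

def verify_webhook(secret: str, body: bytes, signature_hex: str) -> bool:
    """Constant-time HMAC-SHA256 check of a webhook delivery.
    Hex encoding and raw-body signing base are assumptions."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)
```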
Dashboard
Two aggregate endpoints power the QA dashboard:
GET /api/audits/summary # 7-day pass rates across audit / provocation / distinctiveness
GET /api/audits/trends    # time-series quality metrics
Tiers

| Capability | Free | Pro | Team | Enterprise |
|---|---|---|---|---|
| Audit (5 dimensions) | ✓ | ✓ | ✓ | ✓ |
| Provocation suite (default 33 tests) | ✓ | ✓ | ✓ | ✓ |
| Distinctiveness (chatgpt-default baseline) | — | ✓ | ✓ | ✓ |
| Custom baselines | — | ✓ | ✓ | ✓ |
| Custom provocation tests | — | — | ✓ | ✓ |
| Periodic test runs (weekly cron) | — | — | ✓ | ✓ |
| Org-wide distinctiveness comparison | — | — | ✓ | ✓ |
Honest scope
The quality pipeline is a runtime quality gate, not a CI gate on code. There is no GitHub Action that runs audits on every commit; the pipeline operates at the product layer on generation and refine events. There is also no public leaderboard for persona quality scores, and no automated regeneration beyond the three-attempt retry on audit failure — by design.
Next
- Concept primer: Voice fingerprint for the runtime drift companion.
- For brand-team variants, pair this with Brand voice variants so every fork audits independently.