Moonborn — Developers

Quality pipeline

Three runtime quality gates — an LLM-as-judge audit across five dimensions, a 33-test provocation suite, and cosine-distance distinctiveness against a baseline. Wired into generation and edit by default; queryable on demand for QA.

What the pipeline catches

Three independent checks, each with its own threshold and webhook event:

  • Audit — judges the persona across five layered dimensions, scored 0–5.
  • Provocation tests — runs the persona through a 33-test catalog of role-breaking, contradictions, emotional load, jailbreak resistance, and more.
  • Distinctiveness — measures cosine distance against a baseline persona.

Each runs automatically after generation and after every refine (configurable), and each is also exposed as a direct API for QA workflows.

LLM-as-judge audit

The judge — Claude Opus 4.7 — scores every freshly generated persona across five dimensions on a 0–5 scale:

  • Coherence: internal consistency across Soul / Self / Mask / Surface
  • Depth: psychological richness; presence of contradiction and layered motivation
  • Cultural fidelity: plausibility and groundedness of cultural surface details
  • Voice distinctiveness: distinctness and consistency of the Mask voice profile
  • Realism: believability; reads like a real person, not a stereotype

Calibration is anchored to a curated golden set; the inter-rater reliability target between the judge and human raters is Cohen's kappa ≥ 0.7. A weekly CalibrateJudgeUseCase cron re-runs the calibration and surfaces drift.
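
Cohen's kappa measures agreement between the judge and the human raters after discounting agreement that would happen by chance. A minimal sketch of the calculation, assuming integer 0–5 ratings from both sides on the same golden-set personas:

// Cohen's kappa between judge scores and human scores on the same items.
function cohensKappa(judge: number[], human: number[]): number {
  if (judge.length !== human.length || judge.length === 0) {
    throw new Error("rating lists must be the same non-zero length");
  }
  const n = judge.length;
  const categories = [0, 1, 2, 3, 4, 5];

  // Observed agreement: fraction of items where both raters gave the same score.
  const observed = judge.filter((score, i) => score === human[i]).length / n;

  // Chance agreement, from each rater's marginal distribution over scores.
  let expected = 0;
  for (const c of categories) {
    const pJudge = judge.filter((s) => s === c).length / n;
    const pHuman = human.filter((s) => s === c).length / n;
    expected += pJudge * pHuman;
  }

  return (observed - expected) / (1 - expected);
}

// A calibration run meets the target when kappa >= 0.7.
const kappa = cohensKappa([4, 3, 5, 2, 4, 4], [4, 3, 4, 2, 4, 5]);
console.log(kappa >= 0.7 ? "within target" : "drift", kappa.toFixed(2));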

A separate BiasDetector watches systematic score deviation across gender, culture, and age groups — flagging anything beyond a 5% gap so the judge itself can be re-tuned.
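
One way to picture the check, assuming the 5% gap means a group's mean audit score deviating more than 5% from the overall mean (the exact definition is internal to BiasDetector), and with illustrative field names:

interface ScoredPersona {
  overallScore: number;   // 0–5 audit score
  gender: string;
  culture: string;
  ageGroup: string;
}

// Flag every group whose mean score sits more than maxGap (relative) away
// from the overall mean, for one demographic attribute at a time.
function flagBiasedGroups(
  personas: ScoredPersona[],
  attribute: "gender" | "culture" | "ageGroup",
  maxGap = 0.05,
): string[] {
  const overallMean =
    personas.reduce((sum, p) => sum + p.overallScore, 0) / personas.length;

  const groups = new Map<string, number[]>();
  for (const p of personas) {
    const scores = groups.get(p[attribute]) ?? [];
    scores.push(p.overallScore);
    groups.set(p[attribute], scores);
  }

  const flagged: string[] = [];
  for (const [group, scores] of groups) {
    const groupMean = scores.reduce((a, b) => a + b, 0) / scores.length;
    if (Math.abs(groupMean - overallMean) / overallMean > maxGap) flagged.push(group);
  }
  return flagged;
}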

Config:

  • consistency.judge.enabled — master toggle.
  • consistency.judge.model (default opus) — judge model.
  • consistency.judge.rubric_version (default v1).
  • consistency.judge.min_overall_score (default 3.5) — gate threshold.
  • consistency.judge.dimensions.* — per-dimension toggles.

Endpoints:

POST /api/personas/{id}/audits # run or re-audit
GET /api/personas/{id}/audits # audit history
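
A quick QA call against these endpoints might look like the sketch below; the base URL, bearer-token auth, and the overall_score response field are assumptions, not the documented contract.

// Illustrative: trigger a re-audit and gate on the configured minimum score.
async function auditPersona(personaId: string, apiKey: string): Promise<boolean> {
  const res = await fetch(`https://api.moonborn.example/api/personas/${personaId}/audits`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) throw new Error(`audit request failed: ${res.status}`);

  const audit = await res.json();
  // Same gate as consistency.judge.min_overall_score (default 3.5).
  return audit.overall_score >= 3.5;
}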

Provocation test suite

The default catalog has 33 tests across 15 categories (v2 adds five more around humanness, entropy, and refusal synthesis):

  • role_break, pressure, emotional_load, cultural_dissonance
  • persona_swap, factual_consistency, timeline_consistency
  • linguistic_drift, value_violation, jailbreak_resistance
  • humanness, entropy, vulnerability, suspicion_loop, refusal_synthesis

Each test runs a scenario against the persona, and a judge (Claude Sonnet 4.6) rates the response as pass, fail, or warn. The suite fails when the aggregate pass rate drops below consistency.test_suite.fail_threshold (default 0.7).
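
The aggregate gate is just a ratio of verdicts. A sketch, assuming a warn does not count toward the pass rate (how warns are weighted is not specified here):

type Verdict = "pass" | "fail" | "warn";
interface ProvocationResult { testId: string; verdict: Verdict }

// The suite passes while the share of outright passes stays at or above the threshold.
function suitePasses(results: ProvocationResult[], failThreshold = 0.7): boolean {
  const passes = results.filter((r) => r.verdict === "pass").length;
  return passes / results.length >= failThreshold;
}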

Config:

  • consistency.test_suite.enabled — master toggle.
  • consistency.test_suite.run_on_create (default true) — run post-generation.
  • consistency.test_suite.run_on_update (default true) — run after every refine.
  • consistency.test_suite.run_periodic (Team+) — weekly cron sweep.
  • consistency.test_suite.tests.{category}.enabled — per-category toggles.
  • consistency.test_suite.tests.{custom_id}.* (Team+) — author custom tests via RegisterCustomTestUseCase.
  • consistency.test_suite.cost_limit_usd (default 1.00) — per-run cost cap.

Endpoints:

POST /api/personas/{id}/test-suite # trigger run
GET /api/personas/{id}/test-suite # results
GET /api/audits/test-catalog # list the provocation catalog
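
A typical QA flow triggers a run and then reads the results back. In the sketch below, the polling loop, the pass_rate field, the auth header, and the base URL are all assumptions.

// Illustrative: kick off a provocation run, then poll the results endpoint.
async function runProvocationSuite(personaId: string, apiKey: string) {
  const base = "https://api.moonborn.example";            // placeholder base URL
  const headers = { Authorization: `Bearer ${apiKey}` };

  await fetch(`${base}/api/personas/${personaId}/test-suite`, { method: "POST", headers });

  // Poll until the latest run reports an aggregate pass rate (assumed field name).
  for (let attempt = 0; attempt < 10; attempt++) {
    const res = await fetch(`${base}/api/personas/${personaId}/test-suite`, { headers });
    const latest = await res.json();
    if (latest.pass_rate !== undefined) return latest;
    await new Promise((resolve) => setTimeout(resolve, 3000));
  }
  throw new Error("test-suite run did not complete in time");
}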

Distinctiveness measurement

Distinctiveness is a single cosine-distance score between the persona and a baseline. The default baseline is chatgpt-default — answering "is this persona meaningfully different from a generic assistant?" The other built-in baselines are claude-default and gemini-default; teams can register a custom baseline persona by ID.

The score lives in [0, 1]. Below consistency.distinctiveness.min_score (default 0.40) the persona is flagged. The default action on low score is warn, but the threshold and action are tunable per workspace, and Team workspaces can also run CompareWithOrgPersonasQuery to catch drift toward existing personas in the same org.
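
The score itself is one minus cosine similarity, presumably computed over embedding representations of the persona and the baseline; the embedding model is out of scope here. A minimal sketch of the gate:

// Cosine distance: 0 = identical direction, higher = more distinct.
function cosineDistance(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const normA = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0));
  const normB = Math.sqrt(b.reduce((sum, x) => sum + x * x, 0));
  return 1 - dot / (normA * normB);
}

// Apply the workspace threshold and default action.
function checkDistinctiveness(persona: number[], baseline: number[], minScore = 0.4) {
  const score = cosineDistance(persona, baseline);
  return { score, action: score < minScore ? "warn" : "ok" };
}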

Config:

  • consistency.distinctiveness.enabled (default on at Pro+).
  • consistency.distinctiveness.baseline (default chatgpt-default).
  • consistency.distinctiveness.min_score (default 0.40).
  • consistency.distinctiveness.metric (default cosine).
  • consistency.distinctiveness.action_on_low_score (default warn).

How it fits into generation

The pipeline runs at three moments:

  1. Post-generation — every fresh persona is audited and (by default) provocation-tested before it lands in the library.
  2. Post-edit — every refine triggers another audit; tests re-run if run_on_update is on.
  3. On demand — call the endpoints directly to re-run for a QA workflow, or run them in batch for monitoring.

When a persona fails generation-time audit, the pipeline retries generation up to three times. After three attempts, the persona is handed back to the user flagged — Moonborn does not silently regenerate forever or auto-mutate the output.
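
The retry behaviour reduces to a small loop. In the sketch below, generate and audit are hypothetical callbacks; only the three-attempt limit and the 3.5 default threshold come from the behaviour described above.

interface AuditResult { overall_score: number }

// Generate, audit, and retry up to the attempt limit; hand back a flagged
// persona instead of regenerating forever.
async function generateWithQualityGate<P>(
  generate: () => Promise<P>,
  audit: (persona: P) => Promise<AuditResult>,
  minOverallScore = 3.5,
  maxAttempts = 3,
): Promise<{ persona: P; flagged: boolean }> {
  let persona = await generate();
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await audit(persona);
    if (result.overall_score >= minOverallScore) {
      return { persona, flagged: false };
    }
    if (attempt < maxAttempts) persona = await generate();   // retry generation
  }
  return { persona, flagged: true };   // flagged after the final failed attempt
}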

Webhook events

Two events surface failures to your integration layer:

  • persona.audit_failed — emitted when an audit run drops below the configured min_overall_score.
  • persona.test_suite_failed — emitted when a provocation run drops below the configured fail_threshold.

Both ride the standard webhook contract: HMAC-SHA256 signed, five retries with exponential backoff, dead-letter queue.
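
Verifying a delivery on the receiving end is standard HMAC work. A Node sketch, assuming the signature arrives as a hex-encoded HMAC-SHA256 digest of the raw request body in a header such as X-Moonborn-Signature (the header name is an assumption):

import { createHmac, timingSafeEqual } from "node:crypto";

// Recompute the HMAC-SHA256 of the raw body and compare in constant time.
function verifyWebhookSignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest("hex");
  const a = Buffer.from(expected, "hex");
  const b = Buffer.from(signatureHex, "hex");
  return a.length === b.length && timingSafeEqual(a, b);
}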

Dashboard

Two aggregate endpoints power the QA dashboard:

GET /api/audits/summary # 7-day pass rates across audit / provocation / distinctiveness
GET /api/audits/trends # time-series quality metrics
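
Pulling the summary into an external monitor is a single authenticated GET; the base URL, auth header, and response fields in the sketch are assumptions.

// Illustrative: fetch the 7-day summary for an external monitoring job.
async function fetchQualitySummary(apiKey: string) {
  const res = await fetch("https://api.moonborn.example/api/audits/summary", {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) throw new Error(`summary request failed: ${res.status}`);
  return res.json();   // pass rates for audit, provocation, and distinctiveness
}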

Tier availability

  • Audit (5 dimensions)
  • Provocation suite (default 33 tests)
  • Distinctiveness against the chatgpt-default baseline (on by default at Pro+)
  • Custom baselines (Team+)
  • Custom provocation tests (Team+)
  • Periodic test runs (weekly cron) (Team+)
  • Org-wide distinctiveness comparison (Team+)

Honest scope

The quality pipeline is a runtime quality gate, not a CI gate on code. There is no GitHub Action that runs audits on every commit; the pipeline operates at the product layer on generation and refine events. There is also no public leaderboard for persona quality scores, and no automated regeneration beyond the three-attempt retry on audit failure — by design.
