Long-term memory
Session-scoped memory: short-term context, long-term pgvector retrieval, and the cold-tier rotation that keeps sessions grounded without blowing the context window.
Chat sessions accumulate. A long support conversation, a multi-week research panel, a creative writing project — all push past the model's working context. Moonborn's memory layer addresses this with three tiers.
Short-term: the active window
The last N turns ride in the prompt as-is, governed by
chat.memory.short_term.window_turns (default 12). This is the
fastest, highest-fidelity memory — it's literally in the LLM's
attention.
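The window is easy to picture as a slice over the turn history. A minimal sketch, assuming turns are stored oldest-first; the function name and dict shape are illustrative, not Moonborn's API, and `window_turns` mirrors `chat.memory.short_term.window_turns` (default 12):

```python
def short_term_window(turns: list[dict], window_turns: int = 12) -> list[dict]:
    """Return the most recent turns, which ride in the prompt verbatim.

    Everything before the window is a candidate for summarization
    into the long-term tier (illustrative; not Moonborn's internals).
    """
    return turns[-window_turns:]


# A 20-turn session keeps only the last 12 turns in working context:
history = [{"role": "user", "text": f"turn {i}"} for i in range(20)]
window = short_term_window(history)
```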
Long-term: pgvector retrieval
Earlier turns get summarized and embedded with voyage-3-large (the
default; configurable via engine.embedding.model). On each new turn,
the runtime retrieves the top-K relevant chunks via hybrid search:
- Semantic (cosine distance, pgvector).
- BM25 lexical match (Postgres tsvector).
- Rerank with a cross-encoder.
- MMR (Maximum Marginal Relevance) to avoid redundancy.
Tuned by chat.memory.long_term.{top_k, retrieval_strategy}.
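The MMR step is the least familiar of the four, so here is a sketch of the standard greedy formulation: pick the chunk that balances relevance to the query against similarity to chunks already selected. The scores, similarity matrix, and `lam` trade-off parameter are assumptions for illustration; Moonborn's internals may differ.

```python
def mmr(query_sim: list[float], pairwise_sim: list[list[float]],
        k: int, lam: float = 0.5) -> list[int]:
    """Greedy Maximal Marginal Relevance selection.

    query_sim[i]      -- similarity of chunk i to the query
    pairwise_sim[i][j] -- similarity between chunks i and j
    Returns the indices of up to k chunks, relevance-first but
    penalizing redundancy with what's already been picked.
    """
    selected: list[int] = []
    remaining = list(range(len(query_sim)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            # Redundancy = worst-case overlap with the selected set.
            redundancy = max((pairwise_sim[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Chunks 0 and 1 are near-duplicates (similarity 0.95); chunk 2 is
# less relevant but distinct. MMR picks 0, then skips 1 in favor of 2.
picked = mmr(query_sim=[0.9, 0.85, 0.3],
             pairwise_sim=[[1.0, 0.95, 0.1],
                           [0.95, 1.0, 0.1],
                           [0.1, 0.1, 1.0]],
             k=2)
```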
Cold tier
Chunks older than chat.memory.long_term.cold_tier_after_days
(default 90) move to a slower storage class. They remain queryable,
but the retrieval pass skips them unless the user explicitly references
something older.
User-initiated forget
GDPR + product UX both want this: DELETE /v1/chat/sessions/{id}/memory/{chunk_id}
removes a memory chunk. The persona forgets that specific fact for the
session (other sessions are unaffected — memory is session-scoped, not
persona-scoped).
API
- GET /v1/chat/sessions/{id}/memory — list memory chunks.
- DELETE /v1/chat/sessions/{id}/memory/{chunk_id} — forget.
- POST /v1/chat/sessions/{id}/memory/summarize — manually trigger summarization (rare; runs automatically by default).
Tier
Short-term: Free and up. Long-term retrieval + cold tier: Pro and up (higher tiers get larger retention windows + bigger chunk caps).
Honest scope
Memory is session-scoped by default. A persona doesn't remember
across sessions unless you wire it up; that's a
chat.memory.cross_session.enabled opt-in flag (Team+), and it
introduces complex privacy + provenance concerns. Read the
memory configuration guide before
turning it on.