How Astrocyte works

A mid-level overview of Astrocyte’s architecture — what happens when you call retain, recall, or reflect, and how the pieces fit together.


Most interactions with Astrocyte use the memory operations below. retain, recall, reflect, and forget are the basic loop; history, audit, compile, and the M21 live-memory surface (create_directive, mental model CRUD, observation CRUD) extend it.

┌─────────────────────────────────────────────────┐
│                Your agent / app                 │
│    (LangGraph, CrewAI, MCP, REST client, …)     │
└──────────────┬──────────────────────────────────┘
               │ retain / recall / reflect / forget / history / audit / compile
┌─────────────────────────────────────────────────┐
│                    Astrocyte                    │
│                                                 │
│  ┌──────────┐  ┌──────────┐  ┌───────────────┐  │
│  │  Policy  │  │ Pipeline │  │   Provider    │  │
│  │  layer   │→ │  stages  │→ │   adapters    │  │
│  └──────────┘  └──────────┘  └───────────────┘  │
└──────────────────────────────┬──────────────────┘
               ┌───────────────┼───────────────┐
               ▼               ▼               ▼
           pgvector          Neo4j          Qdrant
           (vector)    (graph, optional)   (vector)

| Operation | What it does |
| --- | --- |
| retain | Ingest text → route through Document or Conversation Engine → extract entities and facts → deduplicate → embed → store in a memory bank |
| recall | Parse query → search vector + keyword + optional graph stores → RRF fusion → rerank → return scored hits; optional as_of for point-in-time queries |
| reflect | Runs an agentic recall loop and synthesizes a finished natural-language answer with citations |
| forget | Soft-delete memories: writes a forgotten_at timestamp; hard delete available; compliance audit support |
| history | Convenience wrapper around recall(as_of=...) for point-in-time snapshots |
| audit | Reason about absence: scan coverage of a scope in a bank, surface missing topics and thin areas, return AuditResult(gaps, coverage_score) |
| compile | Optional wiki compile that materializes topic pages from raw memories for higher-quality recall |
| create_directive | Author a user-curated hard rule stored as a MentalModel(kind="directive"); consulted at recall time before raw memories |
| create/update/delete mental model | CRUD for curated, refreshable summaries with structured-doc schema and typed delta operations (M21) |
| create/list/delete observation | CRUD for autonomously or manually authored observations with trend tracking (M21) |

The audit operation is what separates Astrocyte from a retrieval index. A retrieval system finds what exists; audit surfaces what doesn’t.


A bank is an isolated namespace for memories. Think of it as a folder — each bank has its own access control, rate limits, and PII policies.

Common patterns:

  • One bank per user — tenant isolation in a SaaS app (user-alice, user-bob)
  • Shared team bank — collective knowledge for a team (team-engineering)
  • Agent-scoped bank — each agent gets its own memory (agent-researcher, agent-reviewer)
  • Layered — personal bank takes priority, falls back to team, then global

Banks are created automatically on first retain(), or explicitly in astrocyte.yaml. See Bank management for multi-bank queries and lifecycle patterns.
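The layered pattern can be pictured as a lookup that tries banks in priority order and stops at the first one that has an answer. The sketch below is only an illustration: the in-memory dict stands in for real banks, and `layered_lookup` and its data are hypothetical names, not part of Astrocyte's API.

```python
from typing import Optional

# Hypothetical in-memory stand-in for banks: bank_id -> {topic: memory text}.
BANKS: dict[str, dict[str, str]] = {
    "user-alice": {"editor": "Alice prefers Vim"},
    "team-engineering": {"editor": "Team default is VS Code", "ci": "CI runs on every push"},
    "global": {"holidays": "Offices close on 1 January"},
}


def layered_lookup(topic: str, bank_order: list[str]) -> Optional[tuple[str, str]]:
    """Return (bank_id, memory) from the first bank in bank_order that knows the topic."""
    for bank_id in bank_order:
        memory = BANKS.get(bank_id, {}).get(topic)
        if memory is not None:
            return bank_id, memory
    return None


order = ["user-alice", "team-engineering", "global"]
personal = layered_lookup("editor", order)  # answered by the personal bank
team = layered_lookup("ci", order)          # falls back to the team bank
missing = layered_lookup("payroll", order)  # no bank knows this topic
```

The personal bank wins whenever it has an entry, so a user-specific preference shadows the team default without deleting it.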


Astrocyte routes incoming content through one of two dedicated ingest engines before the Memory Engine stores it. Choosing the right engine gives the Memory Engine richer structural signals at recall time.

| Engine | Input shape | What it preserves |
| --- | --- | --- |
| Document Engine | PDFs, markdown, long-form text | PageIndex tree (title + nested section hierarchy, line anchors, adaptive summaries) |
| Conversation Engine | Chat transcripts, turn arrays | Speaker identity, turn ordering, session boundaries, per-turn timestamps |
| Memory Engine | Output of either engine | Facts, observations, mental models, section links, wiki pages, embeddings |

Select the engine via content_type on retain(): "text" / "document" routes through the Document Engine; "conversation" routes through the Conversation Engine. Both write to the same Memory Engine downstream — so a chat session with an attached PDF can be ingested through both engines into one bank.
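As a rough illustration of that routing rule, the sketch below maps a content_type value to an engine name and ingests a chat turn and a document into one shared list. `select_engine` and the list-based bank are hypothetical; rejecting unknown content types is this sketch's own assumption, since the overview does not say how other values are handled.

```python
def select_engine(content_type: str) -> str:
    """Map a content_type value to the ingest engine described above."""
    if content_type in ("text", "document"):
        return "document"
    if content_type == "conversation":
        return "conversation"
    # Other values are outside this overview; rejecting them is this sketch's choice.
    raise ValueError(f"unsupported content_type: {content_type!r}")


# A chat session with an attached PDF: two retain-style calls, one shared bank.
bank: list[tuple[str, str]] = []  # hypothetical stand-in for a memory bank
for content, content_type in [
    ("user: see the attached runbook", "conversation"),
    ("# Runbook\n## Rollback\nSteps...", "document"),
]:
    bank.append((select_engine(content_type), content))

engines_used = [engine for engine, _ in bank]
```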


When you call retain(), content flows through several stages:

  1. Policy check — access control, rate limits, token budgets
  2. PII scanning — regex and optional LLM-based detection; redact, warn, or reject
  3. Engine routing: content_type selects Document Engine (tree extraction) or Conversation Engine (turn-aware chunking)
  4. Fact + entity extraction — break content into discrete facts; optionally extract named entities and resolve aliases via LLM with evidence quotes
  5. Deduplication — skip facts that already exist in the bank
  6. Embedding — convert chunks to vectors via the configured LLM provider
  7. Storage — write to the configured backend; retained_at system timestamp is set at this point
  8. Observation consolidation — autonomous consolidator runs in the background; observations accumulate evidence and acquire computed trends (NEW / STRENGTHENING / STABLE / WEAKENING / STALE)

Each stage is configurable. The default pipeline works out of the box — override only what you need.
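To make two of these stages concrete, here is a minimal sketch of regex-based PII redaction followed by deduplication. It is not Astrocyte's implementation: the email pattern, the `[REDACTED_EMAIL]` placeholder, and hash-based exact-match deduplication are all assumptions chosen to keep the example small.

```python
import hashlib
import re

# Assumed pattern: a simple email matcher, standing in for a fuller PII scanner.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_pii(text: str) -> str:
    """Replace email addresses with a placeholder (the 'redact' action)."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


def fingerprint(fact: str) -> str:
    """Hash a case- and whitespace-normalized fact for exact-duplicate detection."""
    normalized = " ".join(fact.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def retain_facts(facts: list[str], seen: set[str]) -> list[str]:
    """Redact each fact, then keep only facts not already present in the bank."""
    stored = []
    for fact in facts:
        clean = redact_pii(fact)
        key = fingerprint(clean)
        if key in seen:
            continue  # deduplication: skip facts the bank already holds
        seen.add(key)
        stored.append(clean)
    return stored


seen_fingerprints: set[str] = set()
stored = retain_facts(
    [
        "Contact alice@example.com for access",
        "Deploys happen on Tuesdays",
        "deploys happen  on tuesdays",  # duplicate after normalization
    ],
    seen_fingerprints,
)
```

Redaction runs before fingerprinting here so that the stored text and the deduplication key agree; a real pipeline may order or implement these stages differently.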


When you call recall():

  1. Query analysis — extract intent, entities, and keywords
  2. Time-travel filter: if as_of is set, keep only memories with retained_at <= as_of AND (forgotten_at IS NULL OR forgotten_at > as_of), which scopes the search to the bank’s historical state
  3. Multi-store search — semantic (vector similarity), keyword (BM25), and optional graph neighborhood
  4. Fusion — merge results from all stores using reciprocal rank fusion (RRF)
  5. Reranking — boost proper nouns, keyword overlap, and recency
  6. Policy filtering — enforce access control and DLP on returned hits
  7. Token budgeting — trim results to stay within the caller’s token budget
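The fusion step can be sketched with the standard reciprocal rank fusion formula, where each result earns 1 / (k + rank) from every ranking it appears in. The constant k = 60 is a commonly used default and an assumption here; the value Astrocyte uses is not stated in this overview, and `rrf_fuse` is a hypothetical helper.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge ranked ID lists; each appearance contributes 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


semantic_hits = ["m1", "m2", "m3"]  # ranked by vector similarity
keyword_hits = ["m2", "m4", "m1"]   # ranked by BM25
fused = rrf_fuse([semantic_hits, keyword_hits])
fused_ids = [memory_id for memory_id, _ in fused]
```

Because only ranks are used, the vector and keyword scores never need to be on the same scale, which is the main reason RRF is a convenient way to combine heterogeneous stores.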

For multi-bank queries, you choose a strategy: parallel (search all banks at once), cascade (search in order, stop when enough results), or first_match (return from the first bank with hits).
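The three strategies differ only in when they stop. The sketch below assumes a hypothetical `search` callable that returns hits for one bank and an `enough` threshold for cascade; neither name comes from Astrocyte, and the parallel branch runs sequentially here for simplicity.

```python
from typing import Callable


def multi_bank(
    strategy: str,
    bank_ids: list[str],
    search: Callable[[str], list[str]],
    enough: int = 3,
) -> list[str]:
    """Combine per-bank hits according to the chosen strategy."""
    if strategy == "parallel":
        # Every bank is searched; a real implementation would do this concurrently.
        return [hit for bank_id in bank_ids for hit in search(bank_id)]
    if strategy == "cascade":
        hits: list[str] = []
        for bank_id in bank_ids:
            hits.extend(search(bank_id))
            if len(hits) >= enough:
                break  # stop once enough results have accumulated
        return hits
    if strategy == "first_match":
        for bank_id in bank_ids:
            hits = search(bank_id)
            if hits:
                return hits  # only the first bank with any hits is used
        return []
    raise ValueError(f"unknown strategy: {strategy!r}")


# Hypothetical per-bank results for a single query.
FAKE_HITS = {"user-alice": ["a1"], "team-engineering": ["t1", "t2"], "global": ["g1", "g2"]}
banks = ["user-alice", "team-engineering", "global"]

parallel_hits = multi_bank("parallel", banks, FAKE_HITS.__getitem__)
cascade_hits = multi_bank("cascade", banks, FAKE_HITS.__getitem__, enough=2)
first_match_hits = multi_bank("first_match", banks, FAKE_HITS.__getitem__)
```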

Time travel example:

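from datetime import datetime, timezone

# What did the team know about the deployment strategy before the incident?
# Assumes brain is an Astrocyte instance and this runs inside an async function.
hits = await brain.recall(
    "deployment strategy",
    bank_id="eng-team",
    as_of=datetime(2026, 3, 1, tzinfo=timezone.utc),
)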
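The as_of filter behind a query like this can be read as a visibility predicate over two timestamps. The sketch below assumes a hypothetical `Memory` record carrying only retained_at and forgotten_at; it illustrates the rule described in the recall steps, not the actual storage query.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Memory:
    text: str
    retained_at: datetime
    forgotten_at: Optional[datetime] = None  # None means never forgotten


def visible_as_of(memory: Memory, as_of: datetime) -> bool:
    """True if the memory was retained by as_of and not yet forgotten at as_of."""
    if memory.retained_at > as_of:
        return False
    return memory.forgotten_at is None or memory.forgotten_at > as_of


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


memories = [
    Memory("blue-green deploys", utc(2026, 1, 10)),
    Memory("canary rollout", utc(2026, 2, 1), forgotten_at=utc(2026, 2, 20)),
    Memory("manual deploys", utc(2025, 12, 1), forgotten_at=utc(2026, 3, 15)),
    Memory("post-incident freeze", utc(2026, 3, 10)),
]
as_of = utc(2026, 3, 1)
snapshot = [m.text for m in memories if visible_as_of(m, as_of)]
```

A memory forgotten after as_of still appears in the snapshot, which is what makes soft deletion compatible with point-in-time queries.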

brain.audit(scope, bank_id) asks: given what’s retained in this bank, what’s missing about this topic?

Unlike recall — which retrieves what exists — audit reasons about absence. It:

  1. Runs a scoped recall — fetch what the bank knows about the scope topic
  2. Invokes an LLM judge — given these memories, what should be here that isn’t? What’s thin? What contradicts?
  3. Returns AuditResult(gaps: list[GapItem], coverage_score: float)

result = await brain.audit("incident response procedures", bank_id="eng-team")
# AuditResult(
#     gaps=[
#         GapItem(topic="rollback procedures", severity="high", reason="no memories matching rollback or revert"),
#         GapItem(topic="escalation path for database incidents", severity="medium", reason="only one memory, from 2025"),
#     ],
#     coverage_score=0.54,
# )

The coverage_score is a float between 0 and 1. A score below 0.7 indicates significant gaps. Operators can alert on low scores for critical knowledge areas (compliance procedures, runbooks, customer commitments).
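One way an operator might act on an audit result is sketched below. The dataclasses only mirror the fields shown in the example output so the snippet runs without Astrocyte installed; `should_alert` and `topics_with_severity` are hypothetical helpers, and 0.7 is the threshold quoted above.

```python
from dataclasses import dataclass


@dataclass
class GapItem:  # local stand-in mirroring the fields in the example output
    topic: str
    severity: str
    reason: str


@dataclass
class AuditResult:  # local stand-in, not imported from Astrocyte
    gaps: list[GapItem]
    coverage_score: float


def should_alert(result: AuditResult, threshold: float = 0.7) -> bool:
    """Flag audits whose coverage falls below the significant-gap threshold."""
    return result.coverage_score < threshold


def topics_with_severity(result: AuditResult, severity: str) -> list[str]:
    """List gap topics at the given severity, in the order they were reported."""
    return [gap.topic for gap in result.gaps if gap.severity == severity]


result = AuditResult(
    gaps=[
        GapItem("rollback procedures", "high", "no memories matching rollback or revert"),
        GapItem("escalation path for database incidents", "medium", "only one memory, from 2025"),
    ],
    coverage_score=0.54,
)
alert = should_alert(result)
urgent = topics_with_severity(result, "high")
```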


Every operation passes through the policy layer before touching storage. The policy layer enforces:

| Policy | What it does |
| --- | --- |
| Access control | Principals, permissions, and per-bank grants |
| PII barriers | Detect and handle personally identifiable information |
| Rate limits | Per-bank and per-principal throttling |
| Token budgets | Cap LLM token usage per operation |
| Deduplication | Prevent storing redundant content |
| Signal quality | Minimum relevance thresholds for recall |

Policies are configured in astrocyte.yaml at the instance level and can be overridden per bank. See Configuration reference for the full schema.
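The instance-level defaults with per-bank overrides can be pictured as a simple merge. Every key name below is hypothetical and chosen for illustration; the real schema is in the Configuration reference, and a shallow merge is an assumption about how overrides combine.

```python
# Hypothetical policy keys; they do not reflect the actual astrocyte.yaml schema.
INSTANCE_POLICY = {
    "rate_limit_per_minute": 600,
    "pii_action": "redact",
    "max_recall_tokens": 4000,
}
BANK_OVERRIDES = {
    "user-alice": {"pii_action": "reject"},
    "team-engineering": {"max_recall_tokens": 8000},
}


def effective_policy(bank_id: str) -> dict:
    """Start from instance defaults, then apply any overrides for this bank."""
    merged = dict(INSTANCE_POLICY)  # copy so the defaults are never mutated
    merged.update(BANK_OVERRIDES.get(bank_id, {}))
    return merged


alice_policy = effective_policy("user-alice")
default_policy = effective_policy("agent-researcher")  # no overrides defined
```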


Astrocyte supports two kinds of providers:

Storage providers (you pick your backend; Astrocyte’s pipeline owns recall):

  • VectorStore — semantic search (pgvector, Qdrant)
  • GraphStore — relationship-aware recall and entity resolution (Neo4j). Optional — graph traversal over flat SQL tables (section_links, section_entities) is built-in without any graph adapter.
  • DocumentStore — full-text / keyword search (Elasticsearch, BM25)

Astrocyte runs its own pipeline (chunking, embedding, reranking, synthesis) and calls providers for storage and retrieval.

Engine providers (the engine owns the pipeline):

  • Full memory engines like Mem0 or custom implementations
  • Astrocyte delegates retain/recall to the engine and applies policy around it

Most users start with storage providers. See Storage backend setup for configuring backends.
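To show what "Astrocyte owns the pipeline, the provider owns storage" could look like in code, here is a minimal sketch of a storage-provider interface with an in-memory implementation. The `VectorStoreSketch` protocol and its method names are invented for illustration and are not Astrocyte's adapter interface.

```python
import math
from typing import Protocol


class VectorStoreSketch(Protocol):
    """Hypothetical storage-provider surface: store vectors, return nearest IDs."""

    def upsert(self, item_id: str, vector: list[float]) -> None: ...

    def search(self, query: list[float], limit: int) -> list[tuple[str, float]]: ...


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Toy backend satisfying the protocol; a real adapter would call pgvector or Qdrant."""

    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}

    def upsert(self, item_id: str, vector: list[float]) -> None:
        self._vectors[item_id] = vector

    def search(self, query: list[float], limit: int) -> list[tuple[str, float]]:
        scored = [(item_id, cosine(query, vec)) for item_id, vec in self._vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:limit]


store: VectorStoreSketch = InMemoryVectorStore()
store.upsert("a", [1.0, 0.0])
store.upsert("b", [0.0, 1.0])
store.upsert("c", [1.0, 1.0])
top_ids = [item_id for item_id, _ in store.search([1.0, 0.0], limit=2)]
```

The pipeline code depends only on the protocol, so swapping the backend does not touch chunking, fusion, or reranking.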


The Memory Intent Protocol (MIP) lets you declare rules that automatically route memories to the right bank with the right tags and policies — without routing logic in your application code.

rules:
  - name: support-tickets
    match:
      content_type: support_ticket
    action:
      bank: "support-{metadata.customer_id}"
      tags: [support]

Mechanical rules resolve deterministically with zero LLM cost. When no rule matches, an optional intent layer can ask an LLM to decide. See MIP developer guide for the full DSL.
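Deterministic rule matching of this kind can be sketched in a few lines. The example below hard-codes the rule shown above as a Python dict and resolves the bank template with str.format; treating the template as a Python format string and requiring exact equality on every match key are assumptions of this sketch, not a description of the MIP DSL.

```python
from types import SimpleNamespace
from typing import Optional

RULES = [
    {
        "name": "support-tickets",
        "match": {"content_type": "support_ticket"},
        "action": {"bank": "support-{metadata.customer_id}", "tags": ["support"]},
    }
]


def route(item: dict, rules: list[dict]) -> Optional[dict]:
    """Return the routing decision of the first matching rule, or None."""
    for rule in rules:
        if all(item.get(key) == value for key, value in rule["match"].items()):
            action = rule["action"]
            # str.format resolves {metadata.customer_id} through attribute access.
            metadata = SimpleNamespace(**item.get("metadata", {}))
            return {
                "rule": rule["name"],
                "bank": action["bank"].format(metadata=metadata),
                "tags": list(action.get("tags", [])),
            }
    return None  # no mechanical match; an optional LLM intent layer could decide here


ticket = {"content_type": "support_ticket", "metadata": {"customer_id": "acme"}}
decision = route(ticket, RULES)
unmatched = route({"content_type": "note"}, RULES)
```

No model call is involved on the matching path, which is why mechanical rules carry no LLM cost.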


| Model | When to use |
| --- | --- |
| Library | Embed in your Python app: brain = Astrocyte.from_config("astrocyte.yaml") |
| Standalone gateway | REST API via Docker; agents call /v1/retain, /v1/recall, /v1/reflect |
| Gateway plugin | Add memory to existing Kong/APISIX/Azure APIM; intercepts /chat/completions |

All three models use the same core, policy layer, and providers. The gateway adds HTTP auth, rate limiting, and health endpoints. See Quick Start for setup instructions.