
How Astrocyte works

A mid-level overview of Astrocyte’s architecture — what happens when you call retain, recall, or reflect, and how the pieces fit together.


The four operations

Every interaction with Astrocyte uses one of four operations:

┌─────────────────────────────────────────────┐
│              Your agent / app               │
│  (LangGraph, CrewAI, MCP, REST client, …)   │
└──────────────────────┬──────────────────────┘
                       │ retain / recall / reflect / forget
┌──────────────────────▼──────────────────────┐
│                  Astrocyte                  │
│                                             │
│ ┌────────┐   ┌──────────┐   ┌───────────┐   │
│ │ Policy │ → │ Pipeline │ → │ Provider  │   │
│ │ layer  │   │ stages   │   │ adapters  │   │
│ └────────┘   └──────────┘   └───────────┘   │
└──────────────────────┬──────────────────────┘
          ┌────────────┼────────────┐
          ▼            ▼            ▼
      pgvector       Qdrant       Neo4j
      (vector)      (vector)     (graph)
Operation   What it does
─────────   ────────────
retain      Ingest text → extract facts → deduplicate → chunk → embed → store in a memory bank
recall      Parse query → search vector + keyword + graph stores → rerank → return scored hits
reflect     Run recall, then pass hits to an LLM to synthesize a natural-language answer
forget      Delete or archive memories by ID, tag, date, or full bank wipe
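To make the semantics concrete, here is a toy, self-contained model of the four operations. This is not Astrocyte's actual API — crude keyword overlap stands in for the real extraction, embedding, and multi-store search:

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    id: int
    text: str
    tags: set[str] = field(default_factory=set)

class ToyBank:
    """Toy in-memory stand-in for one Astrocyte bank (illustration only)."""

    def __init__(self):
        self._memories: dict[int, Memory] = {}
        self._next_id = 0

    def retain(self, text: str, tags=()) -> int:
        # A real retain also extracts facts, dedupes, chunks, and embeds.
        self._next_id += 1
        self._memories[self._next_id] = Memory(self._next_id, text, set(tags))
        return self._next_id

    def recall(self, query: str) -> list[Memory]:
        # Crude keyword overlap in place of vector + BM25 + graph search.
        q = set(query.lower().split())
        scored = [(len(q & set(m.text.lower().split())), m)
                  for m in self._memories.values()]
        return [m for score, m in sorted(scored, key=lambda s: -s[0]) if score]

    def forget(self, *, tag: str) -> int:
        doomed = [i for i, m in self._memories.items() if tag in m.tags]
        for i in doomed:
            del self._memories[i]
        return len(doomed)

bank = ToyBank()
bank.retain("Alice prefers dark mode", tags={"prefs"})
bank.retain("The deploy runs at 09:00 UTC")
hits = bank.recall("when does the deploy run")
print(hits[0].text)              # → The deploy runs at 09:00 UTC
print(bank.forget(tag="prefs"))  # → 1
```

reflect is omitted here because it is just recall plus an LLM synthesis call over the hits.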

Memory banks

A bank is an isolated namespace for memories. Think of it as a folder — each bank has its own access control, rate limits, and PII policies.

Common patterns:

  • One bank per user — tenant isolation in a SaaS app (user-alice, user-bob)
  • Shared team bank — collective knowledge for a team (team-engineering)
  • Agent-scoped bank — each agent gets its own memory (agent-researcher, agent-reviewer)
  • Layered — personal bank takes priority, falls back to team, then global

Banks are created automatically on first retain(), or explicitly in astrocyte.yaml. See Bank management for multi-bank queries and lifecycle patterns.
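The layered pattern can be sketched as a simple fallback chain. The bank names and recall callables below are placeholders, not Astrocyte's API:

```python
def layered_recall(query, banks, min_hits=1):
    """Try banks in priority order; fall back when a bank returns too few hits.

    `banks` is an ordered list of (name, recall_fn) pairs — stand-ins for
    real per-bank recall calls."""
    for name, recall_fn in banks:
        hits = recall_fn(query)
        if len(hits) >= min_hits:
            return name, hits
    return None, []

banks = [
    ("user-alice", lambda q: []),  # nothing personal retained yet
    ("team-engineering", lambda q: ["runbook: restart the ingest worker"]),
    ("global", lambda q: ["generic on-call guide"]),
]
source, hits = layered_recall("ingest worker crashed", banks)
print(source)  # → team-engineering
```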


The retain pipeline

When you call retain(), content flows through several stages:

  1. Policy check — access control, rate limits, token budgets
  2. PII scanning — regex and optional LLM-based detection; redact, warn, or reject
  3. Fact extraction — break content into discrete facts (configurable profiles)
  4. Deduplication — skip facts that already exist in the bank
  5. Chunking — split into embeddable pieces (sentence, dialogue, or fixed-size)
  6. Embedding — convert chunks to vectors via the configured LLM provider
  7. Storage — write to the configured backend (pgvector, Qdrant, etc.)

Each stage is configurable. The default pipeline works out of the box — override only what you need.
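The flow above can be sketched end to end. Regex PII redaction, period-based chunking, and hash-based deduplication stand in for the real configurable stages; policy checks, fact extraction, and embedding are omitted:

```python
import hashlib
import re

def scan_pii(text: str) -> str:
    # Stage 2 (simplified): redact email addresses with a regex.
    return re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", "[REDACTED_EMAIL]", text)

def chunk(text: str) -> list[str]:
    # Stage 5 (simplified): sentence-style chunking on periods.
    return [s.strip() for s in text.split(".") if s.strip()]

def retain(text: str, store: dict) -> int:
    """Toy version of the retain flow: scan → chunk → dedupe → store.

    A real pipeline would call the configured embedding provider before
    the write; here the store maps content hashes straight to text."""
    added = 0
    for piece in chunk(scan_pii(text)):
        key = hashlib.sha256(piece.encode()).hexdigest()  # stage 4: dedupe key
        if key not in store:
            store[key] = piece  # stage 7: a real store writes vector + text
            added += 1
    return added

store: dict = {}
n1 = retain("Contact bob@example.com for access. Deploys run nightly.", store)
print(n1)  # → 2
n2 = retain("Deploys run nightly.", store)
print(n2)  # → 0 (duplicate fact skipped)
```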


The recall pipeline

When you call recall():

  1. Query analysis — extract intent, entities, and keywords
  2. Multi-store search — semantic (vector similarity), keyword (BM25), and optional graph neighborhood
  3. Fusion — merge results from all stores using reciprocal rank fusion (RRF)
  4. Reranking — boost proper nouns, keyword overlap, and recency
  5. Policy filtering — enforce access control and DLP on returned hits
  6. Token budgeting — trim results to stay within the caller’s token budget

For multi-bank queries, you choose a strategy: parallel (search all banks at once), cascade (search in order, stop when enough results), or first_match (return from the first bank with hits).
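The fusion step is easy to sketch: under reciprocal rank fusion, each store contributes 1/(k + rank) per document, with k = 60 as the commonly used constant:

```python
def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge per-store rankings with reciprocal rank fusion (RRF).

    Each input list is one store's results, best first. A document's fused
    score is the sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for results in ranked_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits  = ["m1", "m3", "m2"]   # semantic-similarity order
keyword_hits = ["m2", "m1"]         # BM25 order
graph_hits   = ["m3", "m1"]         # graph-neighborhood order

print(rrf_fuse([vector_hits, keyword_hits, graph_hits]))  # → ['m1', 'm3', 'm2']
```

m1 wins because it appears in all three lists, even though it tops only one of them — that is the point of RRF: agreement across stores beats a single high rank.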


The policy layer

Every operation passes through the policy layer before touching storage. The policy layer enforces:

Policy           What it does
──────           ────────────
Access control   Principals, permissions, and per-bank grants
PII barriers     Detect and handle personally identifiable information
Rate limits      Per-bank and per-principal throttling
Token budgets    Cap LLM token usage per operation
Deduplication    Prevent storing redundant content
Signal quality   Minimum relevance thresholds for recall

Policies are configured in astrocyte.yaml at the instance level and can be overridden per bank. See Configuration reference for the full schema.
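As one concrete policy, token budgeting (step 6 of recall) can be sketched as a greedy trim over already-ranked hits — whitespace word counts stand in for a real tokenizer:

```python
def apply_token_budget(hits, budget, count_tokens=lambda t: len(t.split())):
    """Keep top-scored hits until the token budget is exhausted.

    `count_tokens` is a stand-in; a real deployment would use the target
    model's tokenizer rather than whitespace word counts."""
    kept, used = [], 0
    for hit in hits:  # hits assumed already sorted best-first by the reranker
        cost = count_tokens(hit)
        if used + cost > budget:
            break
        kept.append(hit)
        used += cost
    return kept

hits = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
print(apply_token_budget(hits, budget=5))  # → ['alpha beta gamma', 'delta epsilon']
```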


The provider model

Astrocyte uses a two-tier provider model:

Tier 1 — Storage providers (you pick your backend):

  • VectorStore — semantic search (pgvector, Qdrant, Elasticsearch)
  • GraphStore — relationship queries (Neo4j)
  • DocumentStore — full-text / keyword search (Elasticsearch, BM25)

Astrocyte runs its own pipeline (chunking, embedding, reranking, synthesis) and calls providers for storage and retrieval.

Tier 2 — Memory engine providers (the engine owns the pipeline):

  • Full memory engines like Mem0 or custom implementations
  • Astrocyte delegates retain/recall to the engine and applies policy around it

Most users start with Tier 1. See Storage backend setup for configuring backends.
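The Tier 1 idea — Astrocyte owns the pipeline and calls a narrow storage interface — can be sketched with a hypothetical VectorStore protocol. The method names and signatures here are assumptions for illustration, not Astrocyte's actual adapter interface:

```python
import math
from typing import Protocol

class VectorStore(Protocol):
    """Hypothetical shape of a Tier 1 storage adapter (illustrative only)."""
    def upsert(self, doc_id: str, vector: list[float]) -> None: ...
    def search(self, vector: list[float], top_k: int) -> list[str]: ...

class InMemoryVectorStore:
    """Toy adapter; pgvector/Qdrant adapters would satisfy the same protocol."""

    def __init__(self):
        self._vectors: dict[str, list[float]] = {}

    def upsert(self, doc_id, vector):
        self._vectors[doc_id] = vector

    def search(self, vector, top_k):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.hypot(*a) * math.hypot(*b))
        ranked = sorted(self._vectors,
                        key=lambda d: cosine(vector, self._vectors[d]),
                        reverse=True)
        return ranked[:top_k]

store: VectorStore = InMemoryVectorStore()
store.upsert("m1", [1.0, 0.0])
store.upsert("m2", [0.0, 1.0])
print(store.search([0.9, 0.1], top_k=1))  # → ['m1']
```

Because the protocol is structural, swapping backends means swapping the adapter class — the pipeline code above it never changes.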


Memory Intent Protocol (MIP)

The Memory Intent Protocol (MIP) lets you declare rules that automatically route memories to the right bank with the right tags and policies — without routing logic in your application code.

rules:
  - name: support-tickets
    match:
      content_type: support_ticket
    action:
      bank: "support-{metadata.customer_id}"
      tags: [support]

Mechanical rules resolve deterministically with zero LLM cost. When no rule matches, an optional intent layer can ask an LLM to decide. See MIP developer guide for the full DSL.
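A mechanical rule like the one above can be resolved with plain dictionary matching and string templating — the field names here mirror the YAML, but the in-memory shapes are assumptions:

```python
import re

def resolve_bank(template: str, metadata: dict) -> str:
    # Expand "{metadata.x}" placeholders from the memory's metadata.
    return re.sub(r"\{metadata\.(\w+)\}",
                  lambda m: str(metadata[m.group(1)]), template)

def route(memory: dict, rules: list[dict]):
    """First-match mechanical routing: deterministic, zero LLM cost."""
    for rule in rules:
        if all(memory.get(k) == v for k, v in rule["match"].items()):
            action = rule["action"]
            return {
                "bank": resolve_bank(action["bank"], memory.get("metadata", {})),
                "tags": action.get("tags", []),
            }
    return None  # no rule matched → fall through to the optional intent layer

rules = [{
    "name": "support-tickets",
    "match": {"content_type": "support_ticket"},
    "action": {"bank": "support-{metadata.customer_id}", "tags": ["support"]},
}]
memory = {"content_type": "support_ticket", "metadata": {"customer_id": "acme"}}
print(route(memory, rules))  # → {'bank': 'support-acme', 'tags': ['support']}
```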


Deployment models

Model                When to use
─────                ───────────
Library              Embed in your Python app — brain = Astrocyte.from_config("astrocyte.yaml")
Standalone gateway   REST API via Docker — agents call /v1/retain, /v1/recall, /v1/reflect
Gateway plugin       Add memory to an existing Kong/APISIX/Azure APIM gateway — intercepts /chat/completions

All three models use the same core, policy layer, and providers. The gateway adds HTTP auth, rate limiting, and health endpoints. See Quick Start for setup instructions.