Astrocyte framework architecture

This document defines the layer boundaries, composition model, and relationship to adjacent systems (memory engines, LLM gateways, storage backends, optional outbound HTTP/credential gateways, authentication (AuthN) and authorization (AuthZ) integration) for the Astrocyte open-source framework.

AuthN / AuthZ in one sentence: proving who the caller is (AuthN) is the application’s job (IdP, tokens, API keys); Astrocyte consumes a principal string. Deciding what that principal may do on each memory bank (AuthZ) is enforced in the framework via configurable grants and optional external policy engines - see section 4.6, access-control.md, and identity-and-external-policy.md.

For the neuroscience foundations, see neuroscience-astrocyte.md. For the design principles these layers implement, see design-principles.md.


Astrocyte is an open-source memory framework that sits between AI agents and memory storage. It provides:

  • A stable API for agents to store, retrieve, and synthesize memories.
  • A built-in intelligence pipeline (embedding, entity extraction, multi-strategy retrieval, fusion, reranking) so users get a fully functional memory system with just Astrocyte + any storage backend.
  • A pluggable provider interface at two tiers: Tier 1 retrieval adapters (vector / graph / lexical stores, including warehouse or lakehouse serving surfaces when you implement the Retrieval SPI against their query APIs) and Tier 2 memory engine providers (Mystique, Mem0, Zep) that bring their own pipeline.
  • An optional outbound transport plugin surface for credential gateways and enterprise proxies (HTTP/TLS/proxy configuration shared by LLM adapters and other outbound HTTP) - orthogonal to memory tiers; see section 4.5 and outbound-transport.md.
  • An optional memory export sink surface for warehouses, lakehouses, and open table formats (event-oriented durability for BI and compliance—not online recall); see section 4.4, storage-and-data-planes.md, and memory-export-sink.md.
  • AuthN / AuthZ integration: Authentication (AuthN) - external IdPs and middleware map credentials to an opaque principal; Astrocyte does not validate passwords or issue tokens. Authorization (AuthZ) - per-bank read / write / forget / admin checks run in the core (access-control.md); an optional AccessPolicyProvider can delegate allow/deny to enterprise PDPs (OPA, Cerbos, Casbin, …) - see section 4.6 and identity-and-external-policy.md.
  • A policy layer that enforces neuroscience-inspired governance (homeostasis, barriers, pruning, observability) regardless of which backend is plugged in.

Astrocyte is not an LLM gateway. It does not route completion requests, track LLM spend, or normalize chat formats. That is the job of LLM gateways and model aggregators — for example LiteLLM, Portkey, OpenRouter, Vercel AI Gateway, or your cloud provider’s unified model APIs — and of direct first-party SDKs when you call each vendor without an intermediary.

Astrocyte is not an agent runtime. It does not define agent orchestration: graphs, steps, tool loops, checkpoints, scheduling, or multi-agent routing. Those concerns belong to agent frameworks and your application (LangGraph, CrewAI, Pydantic AI, custom orchestrators, …). The framework contract is memory, governance, and provider SPIs; thin adapters connect frameworks to that API - see agent-framework-middleware.md.

Context engineering vs harness engineering

Two labels often separate what the model sees from how the system runs around it:

  • Context engineering — Shaping the information that reaches the model when it acts: prompts, message structure, truncation, which retrieval hits to include, how memory snippets are worded in the window, tool observations as text, and token discipline for what is pasted into the next completion. Success is about relevance, faithfulness, and fit inside the context window.

  • Harness engineering — Building the runtime shell around the model: orchestration graphs, steps, tool or MCP wiring, checkpoints, retries, scheduling, multi-agent handoff, sandbox boundaries, edge AuthN, and telemetry. Success is about reliable control flow and safe repeatability.

The harness calls memory and tools and assembles the next prompt; context engineering chooses how results are distilled into that prompt. In practice teams blend both—the split is vocabulary, not a hard wall—but it clarifies what Astrocyte does not own (the harness) vs what it enables (governed evidence for the prompt).

Where Astrocyte sits: It is not a harness (see not an agent runtime above). It is the memory and retrieval substrate and policy layer that feeds context engineering: durable retain / recall / reflect, hybrid retrieval, fusion, reranking, token budgets inside the pipeline, and governance (PII, quotas, access control). Your application still owns the final chat layout—how recall hits become system vs user messages—Astrocyte supplies consistent, auditable memory, not the entire transcript design.

That boundary is the same “slot” many curricula label the Context & Memory plane: governed memory and cognition support—not “vectors only” or ad-hoc RAG. For a vendor-neutral eight-plane framing (and how it relates to control outside the agent loop), see the Applied AI Fellowship. A vocabulary crosswalk between that coursework and Astrocyte primitives is in Fellowship curriculum mapping.

Agent cards and catalogs: Many products describe agents with agent cards or registry metadata. Astrocyte does not execute those cards or own the catalog, but it does aim to understand them at the memory boundary: a small, explicit mapping from card identity to principal + memory bank (and optional defaults), declared in config and used by integrations, so memory calls stay consistent without one-off logic in every app. See agent-framework-middleware.md.
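A minimal sketch of what such a card-to-memory mapping could look like. The config shape and the names `AgentCardBinding`, `CARD_CONFIG`, and `resolve_card` are hypothetical illustrations, not Astrocyte API; the actual schema is described in agent-framework-middleware.md.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentCardBinding:
    """Hypothetical binding from an agent card identity to memory scope."""
    principal: str
    bank_id: str
    defaults: dict = field(default_factory=dict)


# Hypothetical config fragment, as it might look after loading YAML.
CARD_CONFIG = {
    "support-bot": {
        "principal": "agent:support-bot",
        "bank_id": "team-docs",
        "defaults": {"max_results": 5},
    },
    "research-agent": {"principal": "agent:research", "bank_id": "org-policies"},
}


def resolve_card(card_id: str, config: dict = CARD_CONFIG) -> AgentCardBinding:
    """Look up a card id and fail loudly instead of guessing a bank."""
    try:
        entry = config[card_id]
    except KeyError:
        raise LookupError(f"no memory binding declared for agent card {card_id!r}") from None
    return AgentCardBinding(
        principal=entry["principal"],
        bank_id=entry["bank_id"],
        defaults=dict(entry.get("defaults", {})),
    )
```

Failing on an undeclared card, rather than falling back to a default bank, keeps the mapping explicit, which is the point of declaring it in config.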

Sandbox awareness: Execution sandboxes (containers, gVisor, microVMs, WASM, OS permission fences) isolate code execution; they do not by themselves stop memory APIs from becoming an exfiltration path if recall is mis-scoped or egress is wide open. Astrocyte is sandbox-aware in the sense of binding principal + bank + environment/sandbox context consistently and documenting Backend for Frontend (BFF) and network expectations; see sandbox-awareness-and-exfiltration.md.

Implementation language: Astrocyte ships as two parallel implementations in this repository, intended as drop-in replacements at the framework contract: astrocyte-py/ (Python, PyPI package astrocyte) and astrocyte-rs/ (Rust). Portable DTOs, config, and SPI versioning keep them aligned. See implementation-language-strategy.md for constraints and packaging.

Astrocyte is the tripartite synapse (Principle 2): an active mediator at the exchange between agents and memory, responsible for both the intelligence pipeline and continuous environmental stewardship.


The central architectural decision: providers come in two tiers, and the framework adapts its behavior based on which tier is active.

Tier 1: Retrieval providers (retrieval backends)

Tier 1 vs blob storage: Tier 1 is not generic object or blob storage (e.g. S3). The Retrieval Provider SPI covers retrieval backends - databases you query for evidence: dense (embedding) search, sparse (lexical / keyword) search, and graph-structured traversal. Astrocyte splits that into three protocols:

| SPI | Role in hybrid retrieval | Dedicated backends (typical) | Warehouse / lakehouse serving layers (adapter targets) |
| --- | --- | --- | --- |
| VectorStore (required) | Dense retrieval: similarity over embeddings | Vector database, ANN index, semantic search | Warehouse vector columns + distance SQL; lake query engine (e.g. Trino-class SQL over Iceberg/Delta with vector-friendly schemas); OLAP serving tier fed from the lake; embedded SQL (e.g. DuckDB) over Parquet; Backend for Frontend (BFF) wrapping any of these APIs |
| GraphStore (optional) | Structured retrieval: entities and links | Knowledge graph, property graph | Few native graph traversal APIs in warehouses; common patterns: Backend for Frontend (BFF) to a graph database, or SQL-shaped entity/edge tables behind a lake/warehouse engine with an adapter that maps graph operations to joins / hops |
| DocumentStore (optional) | Sparse retrieval: keywords / full-text | BM25, inverted index, lexical search | Warehouse search / JSON / full-text features where available; sidecar lexical search (often beside lake exports); Backend for Frontend (BFF) to OpenSearch- or Elasticsearch-class indexes |

Serving layer is a deployment pattern, not a fourth SPI: whatever query or search API sits in front of curated tables (native warehouse endpoint, distributed SQL on the lake, OLAP acceleration, embedded analytics SQL, or your own HTTP façade). The three rows above stay the contract; the fourth column is where vendors often host those operations for warehouse / lakehouse estates.

Together these are the retrieval substrate the built-in pipeline orchestrates (multi-strategy retrieval, fusion, reranking). Adapters implement protocol methods against those surfaces, not raw object buckets.

Examples (dedicated retrieval infrastructure): pgvector, Pinecone, Qdrant, Weaviate, Neo4j, Memgraph. Plus custom Tier 1 packages that target warehouse / lakehouse serving (vector SQL, Trino-class lake SQL, OLAP, DuckDB-on-Parquet, …) when your adapter implements the protocols above with acceptable recall latency—see the fourth column of the table and Lakehouse and warehouse-backed recall below.

When to use: Users who want a fully functional memory system using their existing retrieval database infrastructure, without purchasing a commercial memory engine.

Lakehouse and warehouse-backed recall (serving layers)

Tier 1 adapters target VectorStore / GraphStore / DocumentStore, not a vendor brand. Lakehouse- or warehouse-backed recall is in scope when a serving layer exposes a query or search surface the adapter can implement: native warehouse vector SQL, a query engine on open tables (Trino-class), OLAP in front of the lake, or an HTTP Backend for Frontend (BFF) that runs the right calls—see storage-and-data-planes.md §1.

That online path is not the Memory export sink SPI: sinks emit / flush governed events for analytics; they do not substitute for search_similar unless you also operate this Tier 1-style retrieval integration with suitable latency SLOs. Teams often combine both—e.g. Tier 1 recall on pgvector and a sink to Iceberg/BigQuery—or Tier 1 backed by warehouse SQL and the same sink for long-term tables.

Tier 2: Memory engine providers

Tier 2 providers are full-stack memory engines that handle the entire pipeline internally, from content ingestion through retrieval and synthesis. When a memory engine provider is active, Astrocyte’s built-in pipeline steps aside. The framework only applies governance (policy layer), not intelligence.

Examples: Mystique (proprietary), Mem0, Zep, Letta, Cognee

When to use: Users who want a specialized memory engine with its own retrieval strategies, fusion algorithms, and synthesis capabilities.

Operational retrieval vs analytical persistence

This frame contrasts both agent-time tiers (Tier 1 and Tier 2) with the export plane—it is not a Tier-1-only topic.

Tier 1 / Tier 2 answer agent-time memory: indexed recall and hybrid retrieval backed by vector, graph, and lexical stores—or a full engine that owns those concerns.

Analytical persistence answers durable history and warehouse-scale analysis: writing governed events or snapshots to SQL warehouses, lakehouse tables (Iceberg, Delta, Hudi, Paimon, …; Parquet files), and similar systems for BI, compliance, and ML datasets. That path is orthogonal to the two-tier model: it does not replace VectorStore, and it is not generic blob storage for unstructured dumps. It uses a separate Memory Export Sink SPI (see §4.4 and memory-export-sink.md)—event-oriented (emit / flush), not search_similar.

When the built-in pipeline is active (provider_tier: storage), warehouse- or lake-backed online recall is implemented via Tier 1 Retrieval adapters; see Lakehouse and warehouse-backed recall (serving layers) above and storage-and-data-planes.md §1. A Tier 2 memory engine may still use a warehouse or lake internally; that storage choice is opaque to the Retrieval SPI (the engine replaces Tier 1 from Astrocyte’s perspective).

flowchart TD
  C["Caller: brain.recall(query)"] --> PL["Policy layer - always active"]
  PL --> T{Which tier?}
  T -->|Tier 1| S["Built-in pipeline: embed, multi-strategy retrieval, fusion, rerank, token budget"]
  T -->|Tier 2| E["Forward to memory engine provider; token budget still enforced"]
  S --> R["Return results"]
  E --> R

Config mapping: Tier 1 is provider_tier: storage (Retrieval SPI + built-in pipeline). Tier 2 is provider_tier: engine (memory engine provider).
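The dispatch in the flowchart can be sketched as follows. This is an illustration under stated assumptions, not the framework's implementation: `pipeline` and `engine` are hypothetical callables standing in for the built-in pipeline and a memory engine provider, and tokens are approximated by whitespace-separated words.

```python
from typing import Callable

# Hypothetical stand-ins: each takes a query and returns a list of text hits.
RecallFn = Callable[[str], list[str]]


def trim_to_budget(hits: list[str], max_tokens: int) -> list[str]:
    """Keep hits in order until a whitespace-token budget is exhausted."""
    kept, used = [], 0
    for hit in hits:
        cost = len(hit.split())
        if used + cost > max_tokens:
            break
        kept.append(hit)
        used += cost
    return kept


def recall(query: str, config: dict, pipeline: RecallFn, engine: RecallFn,
           max_tokens: int = 50) -> list[str]:
    """Dispatch on provider_tier; the token budget applies on both branches."""
    tier = config.get("provider_tier")
    if tier == "storage":      # Tier 1: Retrieval SPI + built-in pipeline
        hits = pipeline(query)
    elif tier == "engine":     # Tier 2: memory engine provider
        hits = engine(query)
    else:
        raise ValueError(f"unknown provider_tier: {tier!r}")
    return trim_to_budget(hits, max_tokens)
```

The budget is applied after the branch so that it holds regardless of which tier produced the hits, mirroring the "token budget still enforced" note on the Tier 2 path.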

Organization data vs user / agent context (banks—not tiers)

Tier 1 vs Tier 2 answers who runs the recall pipeline. Organization-facing corpora (policies, KBs, team playbooks) vs user- or agent-scoped context (episodic traces, preferences, session) is modeled with bank_ids, grants, and optional multi-bank orchestration—not by picking Tier 1 for one and Tier 2 for the other.

Typical pattern:

  • Declare separate banks (e.g. org-policies, team-docs, user-calvin-episodic, agent-session) and grant each principal the right read / write / forget on the banks they should see (access-control.md).
  • Use single-bank recall when only one slice is needed, or multi-bank cascade / parallel so one recall fans out across allowed banks and merges hits (multi-bank-orchestration.md).
  • Tier 1 still means the built-in pipeline issues retrieval against the stores backing those banks; Tier 2 means the engine does the same logical job using its internal storage. Either way, the mix of org, personal, and agent data that appears is determined by which banks are in scope, filtered by AuthZ.

flowchart TD
  C["recall(query) + AstrocyteContext — principal, bank_ids, optional multi-bank strategy"]

  C --> POL["Policy layer — always active"]

  POL --> AZ["Access control — only banks this principal may READ"]

  AZ --> SCOPE["In scope: you define the mix — e.g. org corpus, team KB, user episodic, agent session"]

  SCOPE --> MB{Multi-bank?}

  MB -->|no| ONE["Single bank — one slice of memory"]
  MB -->|yes| ORC["Cascade or parallel + cross-bank merge — see multi-bank-orchestration.md"]

  ONE --> TIER{Which tier?}
  ORC --> TIER

  TIER -->|Tier 1| P1["Built-in pipeline — hybrid retrieval + in-pipeline fusion of strategies"]
  TIER -->|Tier 2| P2["Memory engine — retrieval + fusion internal to provider"]

  P1 --> R["Return results"]
  P2 --> R

Cross-bank “fusion” (parallel multi-bank) merges evidence from different banks the principal may read. In-pipeline “fusion” on Tier 1 merges vector / graph / lexical hits within one recall path—the diagram’s diamond and two tier branches are unchanged; this block adds bank scoping above that split.
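A minimal sketch of the bank-scoping step followed by a parallel multi-bank merge. The grants table, the `Hit` tuple shape, and `recall_one` are hypothetical; it also assumes scores are comparable across banks, which a real merge strategy may not take for granted (see multi-bank-orchestration.md).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical grants table: principal -> bank_id -> set of permissions.
GRANTS = {
    "user:calvin": {
        "org-policies": {"read"},
        "user-calvin-episodic": {"read", "write", "forget"},
    },
}

Hit = tuple[float, str, str]  # (score, bank_id, text)


def readable_banks(principal: str, bank_ids: list[str]) -> list[str]:
    """Drop every requested bank the principal has no read grant on."""
    grants = GRANTS.get(principal, {})
    return [b for b in bank_ids if "read" in grants.get(b, set())]


def recall_parallel(principal: str, bank_ids: list[str], query: str,
                    recall_one: Callable[[str, str], list[Hit]],
                    limit: int = 5) -> list[Hit]:
    """Fan out over allowed banks, then merge hits by descending score."""
    banks = readable_banks(principal, bank_ids)
    if not banks:
        return []
    with ThreadPoolExecutor(max_workers=len(banks)) as pool:
        per_bank = list(pool.map(lambda b: recall_one(b, query), banks))
    merged = [hit for hits in per_bank for hit in hits]
    merged.sort(key=lambda h: h[0], reverse=True)
    return merged[:limit]
```

Access control runs before any bank is queried, so a bank the principal cannot read never contributes hits to the merge.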


flowchart TB
  subgraph EDGE["Optional - inbound HTTP API"]
    APIGW[API gateway - TLS, routing, coarse limits, edge JWT or API-key check]
  end

  subgraph APP["Application layer"]
    APP1["AuthN → principal; AuthZ in Astrocyte; agents, MCP, …"]
    APP2["Astrocyte.from_config - retain / recall / reflect"]
  end

  subgraph CORE["Astrocyte core (open source)"]
    API["Public API - retain, recall, reflect, forget, health"]
    POL["Policy layer - always active: homeostasis, barriers, pruning, escalation, observability"]
    PIPE["Built-in intelligence pipeline - when provider_tier = storage"]
    CAP["Capability negotiation - Tier 1 vs Tier 2"]
    SPI["SPIs: Retrieval, Memory Engine, LLM"]
    OTX["Outbound Transport SPI - optional shared HTTP or TLS for LLM and outbound HTTP"]
    MSK["Memory Export Sink SPI - optional warehouse / lake / table-format events"]
  end

  subgraph BACK["Provider backends"]
    VEC["Vector and graph DBs - pgvector, Pinecone, Qdrant, Weaviate, Neo4j, …"]
    ENG["Full engines - Mystique, Mem0, Zep, Letta, Cognee, …"]
    LLM["LLM backends - gateways or aggregators (LiteLLM, Portkey, OpenRouter, …), direct OpenAI / Anthropic / Bedrock / Azure adapters, local embedders, …"]
    WH["Warehouse / lakehouse sinks - Snowflake, BigQuery, Iceberg, Delta, … (analytics)"]
  end

  APP --> API
  API --> POL --> PIPE --> CAP --> SPI
  SPI --> VEC
  SPI --> ENG
  SPI --> LLM
  POL -.-> MSK
  MSK -.-> WH
  OTX -.-> LLM

  APIGW -.-> APP

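The dashed link from the API gateway to the application layer marks the gateway as optional: omit that box when callers embed Astrocyte in-process (library, local agent) with no HTTP edge. The other dashed links mark the optional Memory Export Sink and Outbound Transport paths.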

Where an API gateway sits (inbound): An API gateway (Kong, AWS API Gateway, Envoy, Azure APIM, …) is not part of the Astrocyte core. It appears in the diagram as optional inbound edge - in front of your HTTP or gRPC service (or Backend for Frontend (BFF)) that embeds Astrocyte. Typical roles: TLS termination, path routing, coarse rate limits, and sometimes JWT or API-key validation at the edge before requests hit your code. Your service then maps validated identity to an opaque principal on AstrocyteContext (section 4.6). Do not confuse this with LLM gateways (section 5 - outbound to models; see examples there) or outbound transport plugins (section 4.5 - how egress HTTP is built).


4. Memory SPIs, optional sinks, and outbound transport

Astrocyte defines three memory-related provider interfaces (Retrieval, Memory Engine, LLM), an optional Memory Export Sink SPI for warehouse / lakehouse / open table formats (durable export—not online retrieval), plus an optional Outbound Transport SPI that does not participate in memory tiers.

The Retrieval SPI provides low-level adapters for retrieval backends (see §2 Tier 1 table): dense vector search, optional graph traversal, optional lexical / full-text search. Astrocyte’s built-in pipeline orchestrates these.

  • VectorStore: store_vectors(), search_similar(), delete()
  • GraphStore (optional): store_entities(), store_links(), query_neighbors(), query_paths()
  • DocumentStore (optional): store_document(), get_document(), search_fulltext()

Users can mix and match: one vector store + one graph store + optional document store. The pipeline coordinates across them for hybrid retrieval.
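To make the VectorStore contract concrete, here is a sketch of an in-memory adapter. Only the three method names come from the list above; the parameter and return types are assumptions made for illustration, and the authoritative signatures are in provider-spi.md.

```python
import math
from typing import Protocol


class VectorStore(Protocol):
    """Method names come from the SPI list; the signatures are assumptions."""

    def store_vectors(self, items: dict[str, list[float]]) -> None: ...
    def search_similar(self, query: list[float], limit: int) -> list[tuple[str, float]]: ...
    def delete(self, ids: list[str]) -> None: ...


class InMemoryVectorStore:
    """Brute-force cosine similarity; fine for tests, not for production scale."""

    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}

    def store_vectors(self, items: dict[str, list[float]]) -> None:
        self._vectors.update(items)

    def search_similar(self, query: list[float], limit: int) -> list[tuple[str, float]]:
        def cosine(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        scored = [(vid, cosine(query, vec)) for vid, vec in self._vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:limit]

    def delete(self, ids: list[str]) -> None:
        for vid in ids:
            self._vectors.pop(vid, None)
```

Because `VectorStore` is a structural `Protocol`, the in-memory class satisfies it without inheriting from it; a pgvector or warehouse-backed adapter would do the same with real queries behind each method.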

Detailed in provider-spi.md.

The Memory Engine SPI is the high-level interface for full memory engines. The engine handles its own storage, retrieval, and optionally synthesis.

  • Required: retain(), recall(), health(), capabilities()
  • Optional: reflect(), forget(), consolidate()

When a memory engine provider is active, the Retrieval SPI and built-in pipeline are not used.

Detailed in provider-spi.md.
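A sketch of how a caller might handle an engine that implements only the required methods. `MinimalEngine`, the string-set return type of `capabilities()`, and the `complete` callable are hypothetical; the `local_llm` fallback name is taken from this document's description of fallback reflect.

```python
from typing import Callable


class MinimalEngine:
    """Hypothetical Tier 2 engine implementing only the required methods."""

    def __init__(self) -> None:
        self._memories: list[str] = []

    def retain(self, text: str) -> None:
        self._memories.append(text)

    def recall(self, query: str) -> list[str]:
        # Naive keyword match, standing in for the engine's own retrieval.
        words = query.lower().split()
        return [m for m in self._memories if any(w in m.lower() for w in words)]

    def health(self) -> bool:
        return True

    def capabilities(self) -> set[str]:
        return {"retain", "recall"}


def reflect(engine, query: str, fallback_strategy: str,
            complete: Callable[[str], str]) -> str:
    """Use the engine's reflect() if advertised; otherwise recall + LLM synthesis."""
    if "reflect" in engine.capabilities():
        return engine.reflect(query)
    if fallback_strategy != "local_llm":
        raise NotImplementedError("engine lacks reflect() and no fallback is configured")
    evidence = "\n".join(engine.recall(query))
    return complete(f"Answer using only this evidence:\n{evidence}\n\nQuestion: {query}")
```

Checking `capabilities()` instead of probing for the attribute keeps the decision explicit and lets the framework report which path produced the answer.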

The LLM SPI is a secondary plugin surface for LLM access. It is used by the Astrocyte core for:

  • Built-in pipeline operations (Tier 1): entity extraction, embedding generation, query analysis, reflect synthesis
  • Policy layer (both tiers): PII classification, signal quality scoring
  • Fallback reflect (Tier 2): when a memory engine provider lacks reflect() and fallback_strategy: local_llm

This is not an LLM gateway. It is a narrow internal dependency with two methods: complete() and embed(). Adapters exist for:

  • Unified gateways and aggregators: products that front many models behind one API or control plane — e.g. LiteLLM, Portkey, OpenRouter, Vercel AI Gateway, cloud AI Gateway / router services, or comparable layers — not only LiteLLM.
  • Direct SDKs: OpenAI, Anthropic, Google Gemini, Mistral, Cohere
  • Self-hosted: Any OpenAI-compatible endpoint (vLLM, Ollama, LM Studio, TGI) via the OpenAI adapter with custom api_base
  • Local embeddings: Built-in sentence-transformers support (no API cost for embeddings)

Completion and embedding providers can be configured separately - e.g., Claude for reasoning + local models for embeddings. See provider-spi.md section 4 for the full LLM SPI specification and gateway integration patterns.
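The two-method surface and the separate configuration of completion and embedding can be sketched like this. The method signatures, `SplitLLMProvider`, and the two stand-in backends are assumptions for illustration; the real specification is in provider-spi.md section 4.

```python
from typing import Protocol


class LLMProvider(Protocol):
    """Two-method surface named in the text; signatures are assumptions."""

    def complete(self, prompt: str) -> str: ...
    def embed(self, texts: list[str]) -> list[list[float]]: ...


class SplitLLMProvider:
    """Route complete() and embed() to separately configured backends."""

    def __init__(self, completion: LLMProvider, embedding: LLMProvider) -> None:
        self._completion = completion
        self._embedding = embedding

    def complete(self, prompt: str) -> str:
        return self._completion.complete(prompt)

    def embed(self, texts: list[str]) -> list[list[float]]:
        return self._embedding.embed(texts)


class EchoCompletion:
    """Stand-in completion backend used only for this example."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

    def embed(self, texts: list[str]) -> list[list[float]]:
        raise NotImplementedError("completion-only backend")


class LengthEmbedder:
    """Stand-in local embedder: one dimension, the text length."""

    def complete(self, prompt: str) -> str:
        raise NotImplementedError("embedding-only backend")

    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(t))] for t in texts]
```

With `SplitLLMProvider(completion=EchoCompletion(), embedding=LengthEmbedder())`, the pipeline sees a single provider while reasoning and embedding traffic go to different backends.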

Memory Export Sink SPI scope: event-oriented adapters that persist governed memory lifecycle data to data warehouses, lakehouses, and open table formats (SQL engines, Parquet, Iceberg, Delta, Hudi, Paimon, …) for analytics, compliance, and ML, not to serve low-latency recall.

  • MemoryExportSink: emit(), optional flush(), health(), optional capabilities()
  • Orthogonal to Tier 1 and Tier 2: sinks do not participate in provider_tier negotiation and are not VectorStore implementations over raw object storage

Wired from the policy / observability path (and aligned with event-hooks.md) after successful operations. Full specification: storage-and-data-planes.md (hub), memory-export-sink.md, and provider-spi.md section 5.
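A sketch of the event-oriented shape of a sink: it buffers events on `emit()` and writes them out on `flush()`. `JsonLinesSink`, its batching behaviour, and the event fields are hypothetical; a real sink would target a warehouse or table format rather than a text stream.

```python
import io
import json
from typing import TextIO


class JsonLinesSink:
    """Hypothetical MemoryExportSink: buffers events, writes JSON Lines on flush()."""

    def __init__(self, out: TextIO, batch_size: int = 100) -> None:
        self._out = out
        self._batch_size = batch_size
        self._buffer: list[dict] = []

    def emit(self, event: dict) -> None:
        """Queue one governed event; flush automatically when the batch fills."""
        self._buffer.append(event)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        for event in self._buffer:
            self._out.write(json.dumps(event, sort_keys=True) + "\n")
        self._buffer.clear()

    def health(self) -> bool:
        return not self._out.closed


# Usage: two events stay buffered until flush() is called.
stream = io.StringIO()
sink = JsonLinesSink(stream)
sink.emit({"op": "retain", "bank_id": "team-docs"})
sink.emit({"op": "forget", "bank_id": "team-docs"})
sink.flush()
```

Note that nothing here supports search: the sink only appends, which is why it cannot stand in for `search_similar`.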

Credential gateways (OneCLI-class products), corporate HTTP proxies, and TLS inspection stacks need to control how outbound HTTP leaves the process - proxies, custom CAs, optional gateway headers. That is not the job of the LLM Provider SPI (which defines complete() / embed()), and not a memory tier.

Astrocyte exposes an optional OutboundTransportProvider interface applied at a single choke point when building HTTP clients for LLM adapters and other outbound HTTP. Users who only need standard environment variables (HTTP_PROXY, HTTPS_PROXY, trust bundles) require no plugin. Full specification: outbound-transport.md and provider-spi.md section 6.
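The single choke point can be sketched as one function that every outbound client consults. `TransportSettings`, `GatewayTransport`, and `build_settings` are hypothetical names, and the plugin's `settings()` method is an assumption; `urllib.request.getproxies()` is the standard-library call that reads the proxy environment variables, and `SSL_CERT_FILE` is used here as one common way to point at a trust bundle.

```python
import os
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TransportSettings:
    proxies: dict[str, str]
    ca_bundle: Optional[str]
    headers: dict[str, str]


def env_transport() -> TransportSettings:
    """Default path: standard environment variables only, no plugin."""
    return TransportSettings(
        proxies=urllib.request.getproxies(),
        ca_bundle=os.environ.get("SSL_CERT_FILE"),
        headers={},
    )


class GatewayTransport:
    """Hypothetical plugin: route all egress via one proxy and add a header."""

    def __init__(self, proxy_url: str, ca_bundle: str, header: str, value: str) -> None:
        self._settings = TransportSettings(
            proxies={"http": proxy_url, "https": proxy_url},
            ca_bundle=ca_bundle,
            headers={header: value},
        )

    def settings(self) -> TransportSettings:
        return self._settings


def build_settings(plugin: Optional[GatewayTransport] = None) -> TransportSettings:
    """Single choke point: every outbound HTTP client gets its settings here."""
    return plugin.settings() if plugin is not None else env_transport()
```

Keeping the decision in one function means LLM adapters never need to know whether a credential gateway is present.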

4.6 Authentication (AuthN) and authorization (AuthZ)

Authentication (AuthN) - Astrocyte is not an identity provider. Proving identity (OIDC, SAML, API keys, workload identity, sessions) completes outside the framework. The application passes an opaque principal on AstrocyteContext after your middleware or gateway validates credentials (access-control.md §7). Open-source IAMs such as Casdoor fit here: you run Casdoor, validate tokens, map claims to user:… / agent:… strings.

Authorization (AuthZ) - Who may read / write / forget / administer which memory bank is decided by Astrocyte: default declarative grants in config, enforced before pipeline or engine calls. Teams may add an optional AccessPolicyProvider so allow/deny is delegated to remote PDPs (OPA, Cerbos, …) or in-process Casbin via astrocyte-access-policy-* packages; the framework still owns enforcement order and audit events. Full integration patterns: identity-and-external-policy.md.
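A sketch of the two halves together: mapping validated claims to a principal, then enforcing grants before any pipeline or engine call. The grants table, the `is_agent` claim, and the rule that an external policy's answer takes precedence over config grants are assumptions for illustration; the document does not state how the two are combined (see identity-and-external-policy.md).

```python
from typing import Callable, Optional

# Hypothetical declarative grants: (principal, bank_id) -> allowed actions.
BANK_GRANTS = {
    ("user:calvin", "user-calvin-episodic"): {"read", "write", "forget"},
    ("agent:support-bot", "team-docs"): {"read"},
}

# An external policy returns True/False, or None to defer to config grants.
PolicyFn = Callable[[str, str, str], Optional[bool]]


class AccessDenied(PermissionError):
    """Raised before any pipeline or engine call is made."""


def principal_from_claims(claims: dict) -> str:
    """Map already-validated token claims to an opaque principal string."""
    kind = "agent" if claims.get("is_agent") else "user"
    return f"{kind}:{claims['sub']}"


def authorize(principal: str, action: str, bank_id: str,
              external_policy: Optional[PolicyFn] = None,
              audit_log: Optional[list] = None) -> None:
    """Default-deny check; records the decision, then raises if denied."""
    decision = external_policy(principal, action, bank_id) if external_policy else None
    if decision is None:
        decision = action in BANK_GRANTS.get((principal, bank_id), set())
    if audit_log is not None:
        audit_log.append((principal, action, bank_id, decision))
    if not decision:
        raise AccessDenied(f"{principal} may not {action} {bank_id}")
```

The audit entry is written before the exception is raised, so denials are recorded as well as approvals.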


Astrocyte and LLM gateways (LiteLLM, Portkey, OpenRouter, Vercel AI Gateway, cloud model routers, …) occupy different layers with a narrow overlap:

| Concern | LLM gateway / aggregator | Astrocyte |
| --- | --- | --- |
| Normalize LLM provider APIs | Yes (primary job) | No |
| Route completion/embedding requests | Yes | No |
| Track LLM spend | Yes | No |
| Normalize memory provider APIs | No | Yes (primary job) |
| Built-in memory intelligence pipeline | No | Yes |
| Enforce memory governance policies | No | Yes |
| Memory-layer observability | No | Yes |
| Needs LLM access internally | N/A | Yes (for pipeline + policies) |

How they compose:

flowchart LR
  APIGW["API gateway - optional inbound"]
  A[Agent or app service]
  APIGW -.->|hosted HTTP API| A
  A --> AST[Astrocyte - memory + governance]
  AST --> T1["Tier 1 - retrieval stores (vector / graph / full-text)"]
  AST --> T2[Memory Engine Provider - Tier 2]
  AST --> LLM[LLM Provider - pipeline + policies]
  LLM --> OT["Outbound Transport - optional"]
  LLM --> SDK[Gateway, aggregator, or direct SDK]
  SDK --> UP[Upstream models]

5.1 Deployment options: API gateway placement vs secrets (Vault, OneCLI)

The high-level diagram above collapses inbound and outbound concerns. In practice, teams choose where the northbound API gateway sits relative to Astrocyte. Secret vaults (HashiCorp Vault, Azure Key Vault, AWS Secrets Manager, …) and credential gateways (OneCLI-class products wired through the Outbound Transport SPI) answer different questions: vaults store credentials; OneCLI / outbound transport controls how egress HTTPS is built from a workload. Neither replaces the other.

Option A - API gateway in front of Astrocyte (and usually the app)
Clients (or the app) reach Astrocyte through the same class of edge (Kong, APISIX, Azure APIM, …) as other APIs: separate routes or hosts for app vs memory. The gateway holds its own secrets (TLS, validation keys, policy). The app and Astrocyte each use a vault or workload identity for their credentials. OneCLI / Outbound Transport attaches to egress from Astrocyte (and optionally from the app) toward upstream LLM and HTTP APIs - not between the client and Astrocyte on the memory request path.

flowchart LR
  C[Clients]
  GW[API gateway - Kong / APISIX / APIM]
  APP[Agent / app service]
  AST[Astrocyte service]
  V[(Vault / Key Vault - per workload)]
  OT[Outbound Transport / OneCLI - egress]
  UP[Upstream LLM and HTTP APIs]
  C --> GW
  GW --> APP
  GW --> AST
  V -.->|runtime or deploy| GW
  V -.->|runtime or deploy| APP
  V -.->|runtime or deploy| AST
  AST --> OT --> UP

Option B - API gateway only in front of the agent/app; Astrocyte on a private path
External traffic hits only the app through the gateway. Agents and apps call Astrocyte over the private network (cluster DNS, VNet, service mesh, mTLS) without that northbound gateway in the path. Astrocyte still uses a vault for provider secrets and Outbound Transport / OneCLI for southbound calls to models and SaaS - same as Option A on the egress side.

flowchart LR
  C[Clients]
  GW[API gateway - Kong / APISIX / APIM]
  APP[Agent / app service]
  AST[Astrocyte service]
  V[(Vault / Key Vault - per workload)]
  OT[Outbound Transport / OneCLI - egress]
  UP[Upstream LLM and HTTP APIs]
  C --> GW --> APP
  APP -->|private network - no northbound gateway| AST
  V -.->|runtime or deploy| GW
  V -.->|runtime or deploy| APP
  V -.->|runtime or deploy| AST
  AST --> OT --> UP

Full specification for outbound credential gateways: outbound-transport.md.

Skip the API gateway when the agent embeds Astrocyte in-process (no public HTTP edge). An API gateway (inbound, in front of your API) is unrelated to LLM gateways (outbound to model APIs).

Key distinction: LLM gateways are stateless pass-through with policy. Astrocyte is stateful intelligence with policy. It owns the memory pipeline (or delegates it to a memory engine provider) and enforces governance. The gateway pattern does not apply - the tripartite synapse pattern does.

Credential gateways vs. LLM gateways: Products that inject API keys into outbound HTTP (OneCLI-class) are outbound transport concerns - they sit under whatever SDK the LLM adapter uses. They do not replace LLM gateways or direct provider adapters; see outbound-transport.md.

LLM gateways vs. multimodal / video / voice APIs: Gateways such as LiteLLM, OpenRouter, Portkey, and Vercel AI Gateway target text (and embedding) model routing. Conversational video (Tavus, HeyGen, D-ID, …) and voice (ElevenLabs, …) products are presentation or modality layers - integrate them next to Astrocyte in your application, not as drop-in LLMProvider implementations unless they expose a compatible chat/embedding HTTP API you configure explicitly. See presentation-layer-and-multimodal-services.md.


6. Relationship to storage backends (Vector DBs, Graph DBs, warehouse / lake serving)

Storage backends are pluggable infrastructure underneath the Astrocyte pipeline (when Tier 1 + built-in pipeline are active), not a separate integration concern for callers. That includes dedicated vector and graph databases and serving-layer SQL or search APIs over warehouse or lakehouse tables—still behind the same Retrieval SPI (provider-spi.md §1, §2 Tier 1 table).

When a caller does brain.recall("What do we know about Calvin?"), they don’t know or care whether the answer came from a pgvector similarity search, a Neo4j graph traversal, a warehouse vector query, or several strategies fused together. That’s retrieval strategy, and it belongs inside the pipeline (either Astrocyte’s built-in pipeline or the memory engine provider’s).

Tier 1 (Retrieval providers): The user configures which vector store and optional graph / document stores to use (dedicated DBs or warehouse/lake serving surfaces via adapters). Astrocyte’s built-in pipeline manages them.

# astrocyte.yaml - Tier 1 example
# provider_tier: storage - legacy keyword for Tier 1 (Retrieval SPI + built-in pipeline), not blob storage
provider_tier: storage
vector_store: pgvector
vector_store_config:
  connection_url: postgresql://localhost/memories
graph_store: neo4j  # optional
graph_store_config:
  uri: bolt://localhost:7687

Tier 2 (Memory Engine Providers): The memory engine manages its own storage internally. Users configure database choices through the memory engine’s own config, not through Astrocyte.

# astrocyte.yaml - Tier 2 example
provider_tier: engine
provider: mystique
provider_config:
  endpoint: https://mystique.company.com
  api_key: ${MYSTIQUE_API_KEY}
# Mystique configures its own pgvector, entity graph, etc. internally

The public API (retain(), recall(), reflect()) is identical regardless of tier or storage backend. Callers code against one surface. The framework and providers handle the rest.


A framework that is just a protocol definition + entry points will be skipped. The Astrocyte core provides standalone value at two levels:

7.1 Intelligence value (built-in pipeline)

Users get a fully functional memory system with just astrocyte + astrocyte-pgvector:

| Capability | Built-in pipeline (free) |
| --- | --- |
| Embedding generation | sentence-transformers (local) or API-based |
| Entity extraction | spaCy NER or LLM-based |
| Semantic retrieval | Vector similarity via any Tier 1 store |
| Graph retrieval | Entity-link traversal (if graph store configured) |
| Keyword retrieval | BM25 full-text search (if document store configured) |
| Fusion | Reciprocal rank fusion |
| Reranking | Basic flashrank or cross-encoder |
| Reflect | recall + LLM synthesis |

This is good enough to build real products.
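For readers unfamiliar with the fusion step, reciprocal rank fusion can be sketched in a few lines. This is the general technique, not Astrocyte's code: each strategy contributes 1 / (k + rank) per result, and the constant k = 60 is a commonly used default that this document does not confirm for the built-in pipeline.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked id lists; each list contributes 1 / (k + rank) per id."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Usage: hypothetical hit ids from vector, graph, and lexical retrieval.
fused = reciprocal_rank_fusion([
    ["m1", "m2", "m3"],   # vector similarity
    ["m3", "m1"],         # graph traversal
    ["m2", "m3"],         # BM25 full-text
])
```

Here `m3` ranks first because it appears in all three lists, even though no single strategy ranked it at the top. RRF uses only ranks, so the strategies' raw scores never need to be on a common scale.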

Governance value (policy layer) applies to both tiers:

| Policy | Value to every user regardless of backend |
| --- | --- |
| PII barrier | Catches sensitive data before it reaches any provider |
| Token budgets | Prevents runaway costs regardless of backend pricing |
| Unified OTel traces | Switch providers without rebuilding dashboards |
| Signal quality scoring | Prevent noisy, low-value data from polluting memory |
| Use-case profiles | Production-ready configs out of the box |
| Circuit breakers | Graceful degradation when backends are unavailable |
| Rate limiting | Prevent runaway agent loops from exhausting resources |

Together, intelligence + governance make the framework worth using at any scale.
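As one example of a governance policy, a circuit breaker can be sketched as follows. This illustrates the general pattern under assumed parameters (`failure_threshold`, `reset_after`, and an injectable `clock`); it is not the framework's implementation.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Open after repeated failures; serve a fallback until a cool-down passes."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = failure_threshold
        self._reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], list], fallback: Callable[[], list]) -> list:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                return fallback()                  # open: skip the backend entirely
            self._opened_at = None                 # half-open: allow one trial call
            self._failures = self._threshold - 1   # a single failure reopens
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()
            return fallback()
        self._failures = 0
        return result
```

Wrapping a recall as `breaker.call(lambda: store.search(q), lambda: [])` (with `store.search` standing in for any backend call) returns an empty result set while the backend is down instead of stalling every agent request.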

Beyond intelligence and governance, the framework provides capabilities that no individual memory provider offers:

| Capability | Value | Documentation |
|---|---|---|
| Multi-bank orchestration | Query across personal + team + org banks with cascade/parallel strategies | multi-bank-orchestration.md |
| Memory portability | Export/import memories between providers; break vendor lock-in | memory-portability.md |
| MCP server | Any MCP-capable agent gets memory without code integration | mcp-server.md |
| Agent framework middleware | One integration per framework, works with every provider (N+M, not NxM) | agent-framework-middleware.md |
| Memory lifecycle | TTL policies, compliance purge (GDPR/PDPA), legal hold, archival, audit trail | memory-lifecycle.md |
| AuthZ (access control) | Per-bank read/write/forget/admin for principals; enforced in core | access-control.md |
| Event hooks | Webhooks and alerts for retain, PII detection, circuit breaker, lifecycle events | event-hooks.md |
| Bank health & utilization | In-process bank health scores, noisy agent detection, utilization reports, quality trends | memory-analytics.md |
| Evaluation | Benchmark suites, provider comparison, regression detection | evaluation.md |
| Data governance | Classification, PII taxonomy, residency, encryption, DLP, compliance profiles (GDPR/HIPAA/PDPA) | data-governance.md |
| Outbound transport | Optional plugins for credential gateways and enterprise HTTP/TLS; env-only path without plugins | outbound-transport.md |
| AuthN wiring + external AuthZ | Map IdP claims to principals; optional PDP/Casbin adapters beyond config grants | identity-and-external-policy.md |
| Presentation / multimodal (non-LLM API) | How Tavus-class video, voice (e.g. ElevenLabs), and related APIs compose beside the LLM SPI | presentation-layer-and-multimodal-services.md |
| Multimodal LLM (vision/audio in chat) | ContentPart, Message extensions, LLMCapabilities, adapter mapping for multi-provider gateways (LiteLLM / OpenRouter–class and similar) | multimodal-llm-spi.md |
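The AuthZ row above describes per-bank read/write/forget/admin permissions for principals. A minimal sketch of such a grant check might look like the following; the `GrantTable` class, the principal and bank names, and the rule that `admin` implies every other permission are assumptions for illustration, and access-control.md remains the authoritative description.

```python
from dataclasses import dataclass, field

PERMISSIONS = {"read", "write", "forget", "admin"}


@dataclass
class GrantTable:
    """Hypothetical in-memory grants: (principal, bank) -> set of permissions."""

    grants: dict = field(default_factory=dict)

    def grant(self, principal, bank, *permissions):
        unknown = set(permissions) - PERMISSIONS
        if unknown:
            raise ValueError(f"unknown permissions: {sorted(unknown)}")
        self.grants.setdefault((principal, bank), set()).update(permissions)

    def is_allowed(self, principal, bank, permission):
        if permission not in PERMISSIONS:
            raise ValueError(f"unknown permission: {permission}")
        # Default deny: no grant entry means no access.
        held = self.grants.get((principal, bank), set())
        # Assumption for this sketch: admin implies every other permission.
        return permission in held or "admin" in held


table = GrantTable()
table.grant("user:alice", "personal-alice", "read", "write")
table.grant("user:bob", "team-support", "admin")

print(table.is_allowed("user:alice", "personal-alice", "read"))    # True
print(table.is_allowed("user:alice", "personal-alice", "forget"))  # False
print(table.is_allowed("user:bob", "team-support", "forget"))      # True
print(table.is_allowed("user:bob", "personal-alice", "read"))      # False
```

The principal is an opaque string here, matching the division of labour in which the application authenticates the caller and the framework only decides what that principal may do on each bank.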

These capabilities are inspired by ByteRover (agent-native curation, progressive retrieval) and Hindsight (mental models, utility scoring). All are framework-level and provider-agnostic.

| Innovation | Status | Description | Documentation |
|---|---|---|---|
| Recall cache | Implemented | LRU cache by query embedding similarity; 5-10x latency reduction | innovations.md §1.1 |
| Memory hierarchy | Implemented | Facts → observations → models with layer-weighted fusion | innovations.md §1.2 |
| Utility scoring | Implemented | Per-memory recency × frequency × relevance × freshness composite | innovations.md §1.3 |
| Adaptive tiered retrieval | Implemented | 5-tier escalation: cache → fuzzy → BM25 → multi-strategy → agentic | innovations.md §2.1 |
| LLM-curated retain | Implemented | LLM decides ADD/UPDATE/MERGE/SKIP/DELETE + classifies layer | innovations.md §2.2 |
| Curated recall | Implemented | Post-retrieval re-scoring by freshness, reliability, salience | innovations.md §2.3 |
| Progressive retrieval | Implemented | detail_level: "titles" for 10x token savings | innovations.md §2.4 |
| Cross-source fusion | Implemented | external_context for RAG/graph blending | innovations.md §2.5 |
| Cross-engine routing | Implemented | Adaptive per-query weights in HybridEngineProvider | innovations.md §2.6 |
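The adaptive tiered retrieval row describes escalation from cheap to expensive strategies. One way to picture that control flow is the sketch below. The tier callables, the `min_results` and `min_score` thresholds, and the stopping rule are assumptions for illustration; innovations.md §2.1 describes the actual escalation criteria.

```python
# Tier order taken from the table above; everything else is hypothetical.
TIER_ORDER = ["cache", "fuzzy", "bm25", "multi_strategy", "agentic"]


def tiered_recall(query, tiers, min_results=1, min_score=0.5):
    """Try tiers from cheapest to most expensive; stop at the first sufficient answer.

    `tiers` maps a tier name to a callable returning [(memory_id, score), ...].
    Returns (tier_name, hits); tier_name is None if no tier was sufficient.
    """
    for name in TIER_ORDER:
        retrieve = tiers.get(name)
        if retrieve is None:
            continue  # tier not configured in this deployment
        hits = [(m, s) for m, s in retrieve(query) if s >= min_score]
        if len(hits) >= min_results:
            return name, hits
    return None, []


# Stand-in retrievers; real tiers would call the cache, stores, or an agent loop.
tiers = {
    "cache": lambda q: [],
    "fuzzy": lambda q: [("m7", 0.31)],
    "bm25": lambda q: [("m7", 0.82), ("m9", 0.55)],
    "agentic": lambda q: [("m1", 0.99)],
}

tier_used, hits = tiered_recall("deploy checklist", tiers)
print(tier_used, hits)  # bm25 [('m7', 0.82), ('m9', 0.55)]
```

Because the loop returns at the first tier that clears the threshold, the expensive agentic tier is never invoked for this query.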

Open-core principle: Every innovation listed above is in the open-source framework. Mystique’s advantage is execution quality (better algorithms for the same operations), not withheld capabilities. See innovations.md for the full split rationale.

These capabilities exist at the framework layer — they apply regardless of which memory provider is active. They are a major reason to use Astrocyte rather than calling a provider directly.


| Component | Package | License |
|---|---|---|
| Public API, DTOs, policy layer | astrocyte | Apache 2.0 |
| Built-in intelligence pipeline | astrocyte | Apache 2.0 |
| Design docs and principles | astrocyte (this repo) | Apache 2.0 |
| Retrieval SPI + Memory Engine SPI + LLM SPI + Outbound Transport SPI + optional AccessPolicy SPI | astrocyte | Apache 2.0 |
| Use-case profiles | astrocyte | Apache 2.0 |
| OTel instrumentation | astrocyte | Apache 2.0 |
| Retrieval providers (Tier 1) | | |
| pgvector adapter | astrocyte-pgvector | Apache 2.0 |
| Pinecone adapter | astrocyte-pinecone | Apache 2.0 |
| Qdrant adapter | astrocyte-qdrant | Apache 2.0 |
| Weaviate adapter | astrocyte-weaviate | Apache 2.0 |
| Neo4j graph adapter | astrocyte-neo4j | Apache 2.0 |
| Memgraph graph adapter | astrocyte-memgraph | Apache 2.0 |
| Memory engine providers (Tier 2) | | |
| Mystique memory engine provider | astrocyte-mystique | Proprietary |
| Mem0 memory engine provider | astrocyte-mem0 | Apache 2.0 |
| Zep memory engine provider | astrocyte-zep | Apache 2.0 |
| Letta memory engine provider | astrocyte-letta | Apache 2.0 |
| Cognee memory engine provider | astrocyte-cognee | Apache 2.0 |
| LLM providers | | |
| LiteLLM adapter | astrocyte-litellm | Apache 2.0 |
| OpenAI direct adapter | astrocyte-openai | Apache 2.0 |
| Anthropic direct adapter | astrocyte-anthropic | Apache 2.0 |
| Outbound transport | | |
| Example: gateway-specific transport adapter | astrocyte-transport-{name} | Apache 2.0 |
| Memory export sink (warehouse / lake / open tables) | | |
| Example: Iceberg / warehouse sink adapter | astrocyte-sink-{target} | Apache 2.0 |
| Access policy (external PDP) | | |
| Example: OPA / Cerbos adapters | astrocyte-access-policy-{name} | Apache 2.0 |
| Identity helpers (optional) | | |
| Example: web framework → principal wiring | astrocyte-identity-{framework} | Apache 2.0 |

Community memory and LLM providers follow the naming convention astrocyte-{provider}. Outbound transport plugins use astrocyte-transport-{name} and the astrocyte.outbound_transports entry point group (see ecosystem-and-packaging.md and outbound-transport.md). Memory export sink packages use astrocyte-sink-{target} and astrocyte.memory_export_sinks (see memory-export-sink.md and ecosystem-and-packaging.md §2.6 / §3.5). External access policy plugins use astrocyte-access-policy-{name} and astrocyte.access_policies (see identity-and-external-policy.md).


The two-tier architecture creates a natural upgrade path:

| Stage | Stack | Cost |
|---|---|---|
| Getting started | astrocyte + astrocyte-pgvector | Free |
| Add graph | astrocyte + astrocyte-pgvector + astrocyte-neo4j | Free |
| Want better retrieval | astrocyte + astrocyte-mystique | Paid |

What makes Mystique worth paying for (beyond the free built-in pipeline):

| Capability | Astrocyte built-in (free) | Mystique (premium) |
|---|---|---|
| Semantic retrieval | Basic vector similarity | HNSW-tuned with partial indexes per fact type |
| Graph retrieval | Basic entity-link traversal | Spreading activation with decay |
| Fusion | Standard RRF | Tuned RRF + cross-encoder reranking |
| Reflect | recall + generic LLM synthesis | Agentic multi-turn with tool use |
| Dispositions | Not supported | Native personality modulation (skepticism, literalism, empathy) |
| Consolidation | Basic dedup + archive | Quality-based loss functions, observation formation |
| Temporal retrieval | Date range filtering | Temporal proximity weighting, temporal link expansion |
| Entity resolution | Basic NER + exact dedup | Canonical resolution with co-occurrence tracking |
| Scale | Single-node | Multi-tenant, distributed, production-grade |

The free tier is good enough to build real products. The premium tier is materially better in ways that matter at scale.


Each framework layer maps to specific neuroscience principles from design-principles.md:

| Framework Layer | Principles Applied |
|---|---|
| Public API (stable, mediating) | P2: Tripartite synapse |
| Built-in pipeline (intelligence layer) | P1: Fast signaling (the pipeline) vs. slow regulation (the policies) |
| Policy: homeostasis | P3: Keep the milieu within bounds |
| Policy: barriers | P6: BBB / boundary maintenance |
| Policy: pruning / signal quality | P7: Structured forgetting |
| Policy: escalation / circuit breakers | P8: Inflammation with de-escalation |
| Policy: observability | P9: Observable state |
| Capability negotiation (tier selection) | P5: Metabolic coupling (adapt to supply) |
| Use-case profiles | P4: Heterogeneity (specialized subtypes) |
| Retrieval SPI (pluggable backends) | P6: Barrier maintenance (what crosses boundaries) |
| Outbound Transport SPI (optional proxy / CA path) | P6: Selective control of what crosses the network boundary |
| Multi-bank orchestration | P4: Heterogeneity (specialized subtypes per region) |
| Memory lifecycle (TTL, archival, pruning) | P7: Structured forgetting / phagocytosis |
| AuthZ (access control) | P6: Barrier maintenance (identity boundaries) |
| Optional external PDP (AccessPolicyProvider) | P6: Same barrier - delegated decision, framework-enforced audit |
| Bank health & utilization (memory-analytics.md) | P9: Observable state (system-level health) |
| Event hooks / escalation alerts | P8: Inflammation with controlled channels |
| Data governance (classification, DLP, residency) | P6: BBB - selective, actively maintained boundary |

The neuroscience principles are not metaphors in this framework. They are enforcement points with code behind them.