Memory export sink (warehouse, lakehouse, open table formats)
This document specifies the durable export / analytical persistence plane for Astrocyte: optional, orthogonal to Tier 1 retrieval (VectorStore / GraphStore / DocumentStore) and to Tier 2 memory engines.
Purpose: stream or batch governed memory events and snapshots into data warehouses, lakehouses, and open table formats (for example Apache Iceberg, Delta Lake, Apache Hudi, Apache Paimon; Apache Parquet as the on-disk columnar layout; SQL-accessible tables in Snowflake, BigQuery, Redshift, Fabric, Doris, …) for BI, compliance, model training, and historical analysis.
Non-goals: This plane does not replace brain.recall() online path. It does not define similarity search over lake files by itself. Offline or batch SQL over exported tables is normal; agent-time hybrid recall still needs a Tier 1 (or Tier 2) path—often implemented against a serving or query API above the same lake/warehouse (storage-and-data-planes.md §1). Downstream jobs (SQL, Spark, Flink, Trino) consume sink data for analytics; they are not a substitute for the export SPI unless you explicitly design a different product.
Related: storage-and-data-planes.md (hub), architecture-framework.md §2 (operational vs analytical persistence), provider-spi.md §5 (Memory Export Sink SPI), event-hooks.md (wiring from hooks), memory-portability.md (AMA export vs sink events), data-governance.md (classification follows content into sinks only when explicitly allowed).
1. Design principles
Section titled “1. Design principles”- Orthogonal to memory tiers — Sinks run after policy succeeds on the operational path (or on explicit export), like webhooks in
event-hooks.md. They are not a third memory tier: they do not participate inprovider_tier: storagevsenginenegotiation. - Event-oriented, not retrieval-shaped — The protocol is
emit(event)/ optionalflush(), notsearch_similar(). Vector search in Snowflake/BigQuery belongs in a Tier 1 adapter if you want the built-in pipeline to call it; this sink is for durable analytics rows/files. - At-least-once by default — Implementations should accept idempotent keys (
event_id,(bank_id, sequence), pattern) so warehouses and Iceberg merges dedupe cleanly. - Governance-aware — Payloads should respect redaction and classification: sinks must not widen exposure beyond what
data-governance.mdand policy allow (embeddings and raw text often off by default). - Portable contract — Same SPI shape in Python and Rust implementations of the framework (
implementation-language-strategy.md).
2. When to use which integration
Section titled “2. When to use which integration”| Goal | Recommended approach |
|---|---|
| Agent-time hybrid recall with warehouse-native vector SQL, lake query engine, or OLAP/BFF façade | Tier 1 VectorStore / GraphStore / DocumentStore adapter against that query/search API—not the sink. |
| Long-term tables, BI, training exports, compliance timelines | Memory export sink (this document) + optional ETL from the sink’s staging layout. |
| Both analytics export and online recall against data that ultimately lives in the lake/warehouse | Memory export sink and a separate Tier 1 integration; shared storage is optional—shared contracts for governance are not. |
| One vendor owns the full pipeline and storage | Tier 2 EngineProvider; the engine may use a lakehouse internally. |
| One-off migration between providers | AMA export/import (memory-portability.md), not necessarily a running sink. |
3. Event taxonomy (normative sketch)
Section titled “3. Event taxonomy (normative sketch)”Sinks consume a versioned discriminated union of events. The exact names belong in shared DTOs (astrocyte.types); this list is the design contract.
| Event kind (illustrative) | When emitted | Typical payload fields |
|---|---|---|
RetainCommitted | After retain succeeds | event_id, bank_id, principal, memory_id, occurred_at, fact_type, tags, content hash or redacted snippet length, classification labels |
ForgetApplied | After forget succeeds | event_id, bank_id, memory_ids, reason |
RecallObserved (optional) | After recall (sampled or config-gated) | aggregates only: result_count, latency_ms, top_score — avoid full queries |
ReflectObserved (optional) | After reflect | similar aggregate discipline |
BankLifecycle | Bank created/deleted | bank_id, action |
ExportCompleted | After AMA or snapshot export | path URI, memory_count, checksum |
GovernanceSignal (optional) | PII action, rate limit, hold | reference to policy codes — no raw secrets |
Implementations may filter event kinds via sink capabilities() or config.
4. Physical targets
Section titled “4. Physical targets”Packages implement the same SPI against:
- Object storage + table catalog — Iceberg/Delta/Hudi/Paimon tables; Parquet as file format.
- Warehouse load — Snowpipe-style ingest, BigQuery
LOAD, RedshiftCOPY, JDBC batch insert into curated tables. - Streaming bus — Kafka/Pulsar topic consumed by Flink/Spark into the lake; the Astrocyte package may only publish to the bus.
The framework does not mandate one catalog or SQL dialect; adapters own connection config (Hive metastore, Glue, Unity, Polaris, …).
5. Configuration (illustrative YAML)
Section titled “5. Configuration (illustrative YAML)”memory_export_sinks: - provider: iceberg config: catalog_uri: thrift://metastore:9083 database: astrocyte table: memory_events # embeddings: false # default: do not replicate vectors to lakeMultiple sinks are allowed (for example staging + compliance archive).
6. Implementation status
Section titled “6. Implementation status”The Memory Export Sink SPI is a documented framework extension. Core astrocyte-py / astrocyte-rs wiring (discovery, hook integration, conformance tests) may land incrementally; until then, deployers can use event-hooks.md webhooks toward custom ingestors that honor this event shape.
7. Naming and packaging
Section titled “7. Naming and packaging”Community packages SHOULD use the prefix astrocyte-sink- (for example astrocyte-sink-iceberg, astrocyte-sink-kafka) — see ecosystem-and-packaging.md.
Register implementations under the entry point group astrocyte.memory_export_sinks (specified alongside other groups in that document).