Configuration reference

Complete reference for astrocyte.yaml — every key, type, default, and valid value.

Astrocyte loads configuration with this merge order (last wins):

  1. Compliance profile (compliance_profile: pdpa) — sets barriers, lifecycle, access control, DLP
  2. Profile (profile: personal) — sets defaults, homeostasis, signal quality
  3. Your astrocyte.yaml — overrides everything above
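
The merge order above can be sketched in Python. This is a minimal illustration, not Astrocyte's actual loader: the deep_merge helper and the contents of the three layers are hypothetical, and the sketch assumes nested sections merge key by key rather than being replaced wholesale.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which `override` wins; nested dicts merge key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical layer contents, for illustration only.
compliance_layer = {"barriers": {"pii": {"mode": "rules_then_llm", "action": "redact"}}}
profile_layer = {"homeostasis": {"recall_max_tokens": 4096}}
user_layer = {"barriers": {"pii": {"action": "reject"}}}  # from astrocyte.yaml

effective = {}
for layer in (compliance_layer, profile_layer, user_layer):  # last wins
    effective = deep_merge(effective, layer)
```

Under these assumptions the user's action: reject replaces the compliance default, while mode: rules_then_llm survives because the user file never mentions it.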

All string values support ${ENV_VAR} substitution — unresolved vars are left as-is.


| Key | Type | Default | Description |
| --- | --- | --- | --- |
| provider_tier | "storage" \| "engine" | "engine" | Tier 1 (storage) uses Astrocyte’s built-in pipeline with your own backends. Tier 2 (engine) delegates to a full memory engine (Mystique, Mem0, Zep, etc.) |
| profile | string \| null | null | Built-in profile name (minimal, personal, research, coding, support) or path to custom profile file (./my-profile.yaml) |
| compliance_profile | string \| null | null | Pre-built compliance preset: gdpr, hipaa, or pdpa. Sets barriers, lifecycle, access control, and DLP automatically |
| fallback_strategy | "error" \| "local_llm" \| "degrade" | "error" | How to handle provider failures |

Used when provider_tier: storage.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| vector_store | string \| null | null | Vector store provider: in_memory, pgvector, qdrant, etc. |
| vector_store_config | dict \| null | null | Provider-specific settings (connection URL, dimensions, etc.) |
| graph_store | string \| null | null | Graph store provider: neo4j, etc. |
| graph_store_config | dict \| null | null | Provider-specific settings |
| document_store | string \| null | null | Document store provider: elasticsearch, etc. |
| document_store_config | dict \| null | null | Provider-specific settings |

    # Example: pgvector + Neo4j hybrid
    provider_tier: storage
    vector_store: pgvector
    vector_store_config:
      connection_url: ${DATABASE_URL}
      embedding_dimensions: 1536
      bootstrap_schema: true
    graph_store: neo4j
    graph_store_config:
      uri: bolt://localhost:7687
      auth_user: neo4j
      auth_password: ${NEO4J_PASSWORD}

Used when provider_tier: engine.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| provider | string \| null | null | Engine provider name: mystique, mem0, zep, etc. |
| provider_config | dict \| null | null | Engine-specific settings (endpoint, API key, etc.) |

    provider_tier: engine
    provider: mystique
    provider_config:
      endpoint: ${MYSTIQUE_ENDPOINT}
      api_key: ${MYSTIQUE_API_KEY}

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| llm_provider | string \| null | null | LLM provider for reflect and extraction: openai, anthropic, litellm, mock, etc. |
| llm_provider_config | dict \| null | null | API key, model name, endpoint, etc. |
| embedding_provider | string \| null | null | Separate embedding provider (if different from LLM) |
| embedding_provider_config | dict \| null | null | Embedding-specific settings (dimensions, model, etc.) |

    llm_provider: openai
    llm_provider_config:
      api_key: ${OPENAI_API_KEY}
      model: gpt-4o-mini
    embedding_provider: openai
    embedding_provider_config:
      model: text-embedding-3-small
      dimensions: 1536

Rate limits, quotas, and token budgets.

    homeostasis:
      recall_max_tokens: 4096
      reflect_max_tokens: 8192
      retain_max_content_bytes: 51200
      rate_limits:
        retain_per_minute: 60
        recall_per_minute: 120
        reflect_per_minute: 30
        global_per_minute: null # optional global cap
      quotas:
        retain_per_day: null # optional daily cap
        reflect_per_day: null

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| recall_max_tokens | int \| null | null | Max tokens per recall operation |
| reflect_max_tokens | int \| null | null | Max tokens per reflect operation |
| retain_max_content_bytes | int \| null | null | Max bytes per retain operation |
| rate_limits.retain_per_minute | int \| null | null | Max retain operations per minute |
| rate_limits.recall_per_minute | int \| null | null | Max recall operations per minute |
| rate_limits.reflect_per_minute | int \| null | null | Max reflect operations per minute |
| rate_limits.global_per_minute | int \| null | null | Global rate limit across all operations |
| quotas.retain_per_day | int \| null | null | Max retains per 24 hours |
| quotas.reflect_per_day | int \| null | null | Max reflects per 24 hours |

Safety controls applied on every retain operation.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| mode | string | "regex" | Detection mode: regex, ner, llm, rules_then_llm, disabled |
| action | string | "redact" | What to do when PII is found: redact, reject, warn |
| countries | list[string] \| null | null | Country-specific patterns: SG (NRIC), IN (Aadhaar), GB (NINO), US (SSN), AU (TFN), CA, JP, CN, DE, FR, IT, ES |
| patterns | list[dict] \| null | null | Custom regex patterns: [{type: "custom_id", pattern: "\\d{8}"}] |
| type_overrides | dict \| null | null | Override action per PII type: {credit_card: {action: reject}} |

    barriers:
      pii:
        mode: rules_then_llm
        action: redact
        countries: [SG, IN, GB, US]
        type_overrides:
          credit_card: { action: reject }
          name: { action: warn }
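
To make the interplay of action and type_overrides concrete, here is a rough Python sketch of a regex-only barrier. It is an assumption-laden illustration, not Astrocyte's implementation: apply_pii_barrier is a hypothetical name, the [type] placeholder format is invented, and only custom patterns are handled.

```python
import re


def apply_pii_barrier(text, patterns, action="redact", type_overrides=None):
    """Return (text, warnings); raise ValueError when a matching type is set to reject.

    `patterns` uses the shape shown for the patterns key: [{type, pattern}].
    """
    warnings = []
    for p in patterns:
        # A per-type override, if present, takes precedence over the global action.
        effective = (type_overrides or {}).get(p["type"], {}).get("action", action)
        if not re.search(p["pattern"], text):
            continue
        if effective == "reject":
            raise ValueError(f"PII of type {p['type']} found")
        if effective == "warn":
            warnings.append(p["type"])
        else:  # redact; the placeholder format is an assumption
            text = re.sub(p["pattern"], f"[{p['type']}]", text)
    return text, warnings


patterns = [{"type": "custom_id", "pattern": r"\d{8}"}]
redacted, _ = apply_pii_barrier("id 12345678 ok", patterns)
```

With the default action the eight-digit ID is replaced in place; an override of {custom_id: {action: warn}} would leave the text untouched and report the type instead.
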

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| max_content_length | int | 50000 | Reject content over this many bytes |
| reject_empty_content | bool | true | Reject empty or whitespace-only content |
| reject_binary_content | bool | true | Reject non-text content |
| allowed_content_types | list[string] \| null | null | Optional content type whitelist |

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| blocked_keys | list[string] | ["api_key", "password", "token", "secret"] | Keys to scrub from metadata |
| max_metadata_size_bytes | int | 4096 | Max metadata size in bytes |

Deduplication and noise detection.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | true | Enable duplicate detection |
| similarity_threshold | float | 0.95 | Cosine similarity threshold for duplicates (0–1) |
| action | string | "skip" | What to do with duplicates: skip, warn, update |

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | true | Enable noisy bank detection |
| retain_spike_multiplier | float | 5.0 | Retain rate spike threshold |
| min_avg_content_length | int | 20 | Minimum average chunk length (chars) |
| max_dedup_rate | float | 0.8 | Max dedup rate before flagging |
| action | string | "warn" | Action on noisy bank: warn, throttle, reject |
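
As an illustration of what the dedup similarity_threshold means, the following Python sketch compares a candidate embedding against stored ones using cosine similarity. The function names are hypothetical, and treating the threshold as inclusive (>=) is an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_duplicate(candidate, stored_vectors, similarity_threshold=0.95):
    """True when any stored vector is at least `similarity_threshold` similar."""
    return any(
        cosine_similarity(candidate, v) >= similarity_threshold for v in stored_vectors
    )


near_copy = is_duplicate([1.0, 0.0], [[0.99, 0.05]])   # almost the same direction
unrelated = is_duplicate([1.0, 0.0], [[0.0, 1.0]])     # orthogonal vectors
```

Lowering the threshold (the per-bank example later in this page uses 0.90) makes the check more aggressive, so more near-paraphrases are treated as duplicates.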

Circuit breaker and degraded mode.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| degraded_mode | string | "empty_recall" | Fallback on failure: empty_recall, error, cache |
| circuit_breaker.failure_threshold | int | 5 | Failures before circuit opens |
| circuit_breaker.recovery_timeout_seconds | float | 30.0 | Seconds before half-open attempt |
| circuit_breaker.half_open_max_calls | int | 2 | Max calls in half-open state |
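
The three circuit_breaker settings map onto the usual closed, open, half-open state machine. The Python class below is a generic sketch of that pattern under the documented defaults; it is not Astrocyte's implementation, and the class name, method names, and injectable clock are assumptions made for illustration.

```python
import time


class CircuitBreaker:
    """Generic closed -> open -> half_open breaker driven by the three settings above."""

    def __init__(self, failure_threshold=5, recovery_timeout_seconds=30.0,
                 half_open_max_calls=2, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout_seconds = recovery_timeout_seconds
        self.half_open_max_calls = half_open_max_calls
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0
        self.half_open_calls = 0

    def allow(self) -> bool:
        """Decide whether the next provider call may proceed."""
        if self.state == "open":
            if self.clock() - self.opened_at < self.recovery_timeout_seconds:
                return False
            self.state = "half_open"  # timeout elapsed: let a few probes through
            self.half_open_calls = 0
        if self.state == "half_open":
            if self.half_open_calls >= self.half_open_max_calls:
                return False
            self.half_open_calls += 1
        return True

    def record_success(self) -> None:
        self.state = "closed"
        self.failures = 0

    def record_failure(self) -> None:
        self.failures += 1
        # A failed probe, or too many failures while closed, opens the circuit.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

While allow() returns False, a caller would fall back to whatever degraded_mode selects, for example returning an empty recall result.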

Similarity-based caching for repeated queries.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | false | Enable recall cache |
| similarity_threshold | float | 0.95 | Cache hit threshold |
| max_entries | int | 256 | Max cached results |
| ttl_seconds | float | 300.0 | Cache entry lifetime (seconds) |

Progressive retrieval strategy — tries cheaper/faster tiers first.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | false | Enable tiered retrieval |
| min_results | int | 3 | Min results per tier before advancing |
| min_score | float | 0.3 | Min relevance score threshold |
| max_tier | int | 3 | Maximum tier (0–4) |
| full_recall | "pipeline" \| "hybrid" | "pipeline" | Recall path for tier 3+. hybrid requires a hybrid engine provider |
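
One plausible reading of these settings is an escalation loop: query the cheapest tier, keep hits scoring at least min_score, and stop as soon as a tier yields min_results of them. The Python sketch below follows that reading. It is an interpretation rather than documented behavior, and tiered_recall and the fetcher callables are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Hit = Tuple[str, float]  # (text, relevance score)


def tiered_recall(
    query: str,
    tiers: Sequence[Callable[[str], List[Hit]]],
    min_results: int = 3,
    min_score: float = 0.3,
    max_tier: int = 3,
) -> Tuple[Optional[int], List[Hit]]:
    """Try tiers 0..max_tier in order; return (tier_used, hits).

    tier_used is None when no tier produced enough hits; in that case the
    largest partial result seen so far is returned as a best effort.
    """
    best: List[Hit] = []
    for tier_index, fetch in enumerate(tiers):
        if tier_index > max_tier:
            break
        hits = [h for h in fetch(query) if h[1] >= min_score]
        if len(hits) >= min_results:
            return tier_index, hits
        if len(hits) > len(best):
            best = hits
    return None, best
```

Each entry in tiers stands in for one retrieval strategy, ordered from cheapest to most expensive; what Astrocyte actually runs at each tier is not specified here.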

Structured truth precedence — labels fused hits for synthesis.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | false | Enable recall authority |
| rules_inline | string \| null | null | Inline authority rules |
| rules_path | string \| null | null | Path to authority rules file |
| tiers | list | [] | Precedence tiers: [{id: "primary", priority: 1, label: "Verified"}] |
| tier_by_bank | dict | {} | Map bank IDs to tier IDs: {bank-1: "primary"} |
| apply_to_reflect | bool | true | Inject authority context into reflect prompts |

    recall_authority:
      enabled: true
      tiers:
        - id: primary
          priority: 1
          label: "Verified sources"
        - id: secondary
          priority: 2
          label: "Inferred knowledge"
      tier_by_bank:
        verified-bank: primary
        inferred-bank: secondary
      apply_to_reflect: true

LLM-scored selective retention — only stores content above an importance threshold.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | false | Enable curated retain |
| model | string \| null | null | LLM model for importance scoring |
| context_recall_limit | int | 5 | Max context items for scoring |

Multi-factor ranking — blends recency, reliability, salience, and similarity.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | false | Enable curated recall |
| freshness_weight | float | 0.3 | Recency bonus weight |
| reliability_weight | float | 0.2 | Authority/source weight |
| salience_weight | float | 0.2 | Relevance/importance weight |
| original_score_weight | float | 0.3 | Vector similarity weight |
| freshness_half_life_days | float | 30.0 | Decay curve for recency |
| min_score | float \| null | null | Min final score threshold |
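
The four default weights sum to 1.0, which suggests a weighted average of the four signals. The Python sketch below assumes exactly that, plus an exponential half-life decay for freshness; both the formula and the blended_score name are assumptions, not the documented scoring function.

```python
def blended_score(
    original_score: float,
    age_days: float,
    reliability: float,
    salience: float,
    freshness_weight: float = 0.3,
    reliability_weight: float = 0.2,
    salience_weight: float = 0.2,
    original_score_weight: float = 0.3,
    freshness_half_life_days: float = 30.0,
) -> float:
    """Weighted blend of four signals, each expected in the range 0..1."""
    # Assumed decay: freshness halves every `freshness_half_life_days`.
    freshness = 0.5 ** (age_days / freshness_half_life_days)
    return (
        freshness_weight * freshness
        + reliability_weight * reliability
        + salience_weight * salience
        + original_score_weight * original_score
    )


# A 30-day-old memory with similarity 0.8 and middling reliability and salience.
score = blended_score(original_score=0.8, age_days=30.0, reliability=0.5, salience=0.5)
```

Under these assumptions the example works out to 0.3 × 0.5 + 0.2 × 0.5 + 0.2 × 0.5 + 0.3 × 0.8 = 0.59, and a result below min_score (when set) would be dropped.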

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | false | Enable ACL enforcement |
| default_policy | string | "owner_only" | Default when no grants match: owner_only, open, deny |

Identity-driven bank resolution.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| auto_resolve_banks | bool | false | Auto-create banks from principal |
| user_bank_prefix | string | "user-" | Prefix for user-scoped banks |
| agent_bank_prefix | string | "agent-" | Prefix for agent-scoped banks |
| service_bank_prefix | string | "service-" | Prefix for service-scoped banks |
| resolver | "convention" \| "config" \| "custom" \| null | null | Bank resolution strategy |
| obo_enabled | bool | false | Enable on-behalf-of permission intersection |
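
The prefixes above hint at how a principal could map to a bank ID. The Python sketch below assumes the simplest convention, prefix plus principal name, so that user:alice resolves to user-alice. This mapping is a guess consistent with the examples on this page, not confirmed resolver behavior, and resolve_bank is a hypothetical helper.

```python
DEFAULT_PREFIXES = {
    "user": "user-",        # user_bank_prefix
    "agent": "agent-",      # agent_bank_prefix
    "service": "service-",  # service_bank_prefix
}


def resolve_bank(principal: str, prefixes: dict = DEFAULT_PREFIXES) -> str:
    """Map 'kind:name' to a bank ID by prepending the prefix configured for that kind."""
    kind, sep, name = principal.partition(":")
    if not sep or not name or kind not in prefixes:
        raise ValueError(f"cannot resolve a bank for principal {principal!r}")
    return prefixes[kind] + name


alice_bank = resolve_bank("user:alice")
bot_bank = resolve_bank("agent:support-bot")
```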

Top-level access grants (merged with per-bank banks.*.access).

    access_grants:
      - bank_id: "shared-*"
        principal: "agent:support-bot"
        permissions: [read, write]
      - bank_id: "user-alice"
        principal: "user:alice"
        permissions: [read, write, forget, admin]

| Field | Type | Description |
| --- | --- | --- |
| bank_id | string | Bank ID or glob pattern (* for all) |
| principal | string | Principal: user:X, agent:X, service:X, or * |
| permissions | list[string] | Permissions: read, write, forget, admin, * |
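
Grant evaluation can be pictured as a scan over the grant list with glob matching on bank_id. The Python sketch below uses the standard library's fnmatch for the glob step; whether Astrocyte uses the same glob dialect is an assumption, and is_allowed is a hypothetical helper that ignores default_policy and per-bank grants.

```python
from fnmatch import fnmatchcase

# The same two grants as in the access_grants example.
grants = [
    {"bank_id": "shared-*", "principal": "agent:support-bot",
     "permissions": ["read", "write"]},
    {"bank_id": "user-alice", "principal": "user:alice",
     "permissions": ["read", "write", "forget", "admin"]},
]


def is_allowed(grants, bank_id: str, principal: str, permission: str) -> bool:
    """True if any grant matches the bank (glob), the principal, and the permission."""
    for grant in grants:
        bank_matches = fnmatchcase(bank_id, grant["bank_id"])
        principal_matches = grant["principal"] in ("*", principal)
        permission_matches = (
            "*" in grant["permissions"] or permission in grant["permissions"]
        )
        if bank_matches and principal_matches and permission_matches:
            return True
    return False


bot_can_write_shared = is_allowed(grants, "shared-faq", "agent:support-bot", "write")
bot_can_read_alice = is_allowed(grants, "user-alice", "agent:support-bot", "read")
```

When no grant matches, the decision would fall through to default_policy, which this sketch leaves out.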

Data Loss Prevention — output scanning for PII in recall and reflect results.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| scan_recall_output | bool | false | Scan recall results for PII |
| scan_reflect_output | bool | false | Scan reflect output for PII |
| output_pii_action | string | "warn" | Action on detected PII: redact, reject, warn |

Automatic memory archival and deletion based on age and activity.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | false | Enable lifecycle management |
| ttl.archive_after_days | int | 90 | Archive if not recalled in N days |
| ttl.delete_after_days | int | 365 | Delete if older than N days |
| ttl.exempt_tags | list[string] \| null | null | Tags that skip TTL (e.g. pinned, compliance) |
| ttl.fact_type_overrides | dict \| null | null | Override archive_after_days by fact type: {world: 180, experience: null} |

    lifecycle:
      enabled: true
      ttl:
        archive_after_days: 90
        delete_after_days: 365
        exempt_tags: [pinned, compliance]
        fact_type_overrides:
          world: 180
          experience: null # never auto-archive
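
The TTL rules can be read as a small decision function. The Python sketch below encodes one such reading using the values from the example above. The precedence (exempt tags first, then deletion, then archival) and the strict "greater than" comparisons are assumptions, and lifecycle_action is a hypothetical name.

```python
ttl = {
    "archive_after_days": 90,
    "delete_after_days": 365,
    "exempt_tags": ["pinned", "compliance"],
    "fact_type_overrides": {"world": 180, "experience": None},
}


def lifecycle_action(age_days, days_since_recall, tags, fact_type, ttl):
    """Return 'keep', 'archive', or 'delete' for a single memory."""
    if set(tags) & set(ttl.get("exempt_tags") or []):
        return "keep"  # exempt tags skip TTL entirely
    if age_days > ttl["delete_after_days"]:
        return "delete"
    archive_after = ttl["archive_after_days"]
    overrides = ttl.get("fact_type_overrides") or {}
    if fact_type in overrides:
        archive_after = overrides[fact_type]  # None means never auto-archive
    if archive_after is not None and days_since_recall > archive_after:
        return "archive"
    return "keep"


pinned = lifecycle_action(400, 400, ["pinned"], "world", ttl)
stale_observation = lifecycle_action(200, 120, [], "observation", ttl)
```

In this reading a world fact last recalled 120 days ago is kept because its override is 180 days, while an observation with the same history is archived under the 90-day default.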

Per-profile reasoning defaults — affect reflect synthesis behavior.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| skepticism | int | 3 | How critically sources are treated (1–5) |
| literalism | int | 3 | How literal vs. interpretive (1–5) |
| empathy | int | 3 | How empathetic in synthesis (1–5) |
| preferred_fact_types | list[string] \| null | null | Preference order: [experience, world, observation] |
| tags | list[string] \| null | null | Default tags for all retained content |

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| otel_enabled | bool | false | Enable OpenTelemetry spans |
| prometheus_enabled | bool | false | Enable Prometheus metrics |
| log_level | string | "info" | Log level: debug, info, warn, error |

MCP (Model Context Protocol) server settings — used by astrocyte.integrations.mcp.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| default_bank_id | string \| null | null | Default bank for MCP calls |
| expose_reflect | bool | true | Allow reflect via MCP |
| expose_forget | bool | false | Allow forget via MCP |
| max_results_limit | int | 50 | Max items returned per request |
| principal | string \| null | null | Principal for MCP operations |

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| mip_config_path | string \| null | null | Path to MIP routing rules file (./mip.yaml) |

See Memory Intent Protocol for the full MIP DSL — match operators, actions, override hierarchy, and intent policy.


Per-bank overrides. Each key is a bank ID. Any top-level section can be overridden per bank.

    banks:
      sensitive-bank:
        profile: research
        barriers:
          pii:
            mode: rules_then_llm
            action: reject
        homeostasis:
          recall_max_tokens: 2048
          rate_limits:
            recall_per_minute: 60
        signal_quality:
          dedup:
            similarity_threshold: 0.90
        access:
          - bank_id: sensitive-bank
            principal: "agent:analyst"
            permissions: [read, write]

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| profile | string \| null | null | Override profile for this bank |
| access | list[dict] \| null | null | Bank-specific access grants |
| homeostasis | HomeostasisConfig \| null | null | Override homeostasis settings |
| barriers | BarrierConfig \| null | null | Override barrier settings |
| signal_quality | SignalQualityConfig \| null | null | Override signal quality settings |

External data source definitions for ingestion. See poll ingest guide for webhook, stream, and poll setup.

    sources:
      my-webhook:
        type: webhook
        extraction_profile: builtin_text
        target_bank: webhook-data
        auth:
          type: hmac_sha256
          secret: ${WEBHOOK_SECRET}
      github-issues:
        type: poll
        driver: github
        path: owner/repo
        interval_seconds: 300
        target_bank: github-issues
        auth:
          token: ${GITHUB_PAT}
          type: bearer
      redis-events:
        type: stream
        driver: redis
        url: redis://localhost:6379
        topic: my-stream
        consumer_group: astrocyte-group
        target_bank: events
      external-api:
        type: proxy
        url: "https://api.example.com/search?q={query}"
        target_bank: external

| Key | Type | Description |
| --- | --- | --- |
| type | string | Source type: webhook, stream, poll / api_poll, proxy |
| driver | string \| null | Driver name: github, redis, kafka |
| extraction_profile | string \| null | Extraction profile name for ingested content |
| target_bank | string \| null | Destination bank ID |
| target_bank_template | string \| null | Template: "bank-{source_id}" |
| auth | dict \| null | Auth config (type-specific: hmac_sha256, bearer, etc.) |
| path | string \| null | Source-specific path (e.g. owner/repo for GitHub) |
| url | string \| null | Source URL (Redis URL, Kafka bootstrap servers, proxy endpoint) |
| topic | string \| null | Stream topic or Redis stream key |
| consumer_group | string \| null | Consumer group name |
| interval_seconds | int \| null | Poll interval (min 60 for GitHub) |
| recall_method | string \| null | Proxy only: GET (default) or POST |
| recall_body | dict \| null | Proxy POST only: request body template |

Registered agents with bank access and rate hints.

    agents:
      support-bot:
        principal: "agent:support-bot"
        default_bank: shared-support
        banks: [shared-support, team-*]
        permissions: [read, write]
        max_retain_per_minute: 60
        max_recall_per_minute: 120

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| principal | string \| null | null | Agent principal (e.g. agent:my-bot) |
| banks | list[string] \| null | null | Allowed bank IDs (glob patterns supported) |
| allowed_banks | list[string] \| null | null | Alias for banks |
| default_bank | string \| null | null | Default bank when not specified |
| permissions | list[string] \| null | null | Declared permissions (documentation/validation) |
| max_retain_per_minute | int \| null | null | Per-agent retain rate hint |
| max_recall_per_minute | int \| null | null | Per-agent recall rate hint |

Reusable extraction configurations for ingestion sources.

    extraction_profiles:
      conversation:
        chunking_strategy: dialogue
        entity_extraction: llm
        content_type: text/plain
        chunk_size: 512
        fact_type: experience
        metadata_mapping:
          speaker: "$.participant_name"
        tag_rules:
          - match: { source: slack }
            tags: [slack, chat]

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| content_type | string \| null | null | Expected content type |
| chunking_strategy | string \| null | null | Strategy: sentence, paragraph, fixed, dialogue |
| chunk_size | int \| null | null | Max characters per chunk |
| entity_extraction | bool \| string \| null | null | Extract entities: true, false, ner, llm |
| fact_type | string \| null | null | Default fact type: world, experience, observation |
| authority_tier | string \| null | null | Recall authority tier ID (overrides recall_authority.tier_by_bank) |
| metadata_mapping | dict \| null | null | Map source fields to metadata keys |
| tag_rules | list[dict] \| null | null | Generate tags from metadata patterns |

Built-in profiles: builtin_text and builtin_conversation.


Standalone gateway settings — ignored in library mode.

    deployment:
      mode: standalone
      host: "0.0.0.0"
      port: 8000
      workers: 4
      cors_origins:
        - "https://example.com"
        - "http://localhost:3000"
      tls:
        cert_path: /path/to/cert.pem
        key_path: /path/to/key.pem

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| mode | "library" \| "standalone" \| "plugin" | "library" | Deployment mode |
| host | string \| null | null | Bind address (standalone only) |
| port | int \| null | null | Port (standalone only) |
| workers | int \| null | null | Worker processes (standalone only) |
| cors_origins | list[string] \| null | null | CORS allowed origins |
| tls.cert_path | string \| null | null | TLS certificate path |
| tls.key_path | string \| null | null | TLS private key path |

Pre-built compliance presets that configure barriers, lifecycle, access control, and DLP. Set compliance_profile at the top level — your explicit config overrides any values the profile sets.

| Profile | PII mode | PII action | Lifecycle | Access default | DLP |
| --- | --- | --- | --- | --- | --- |
| pdpa | rules_then_llm | redact | 5-year retention | owner_only | Reflect output scanned |
| gdpr | rules_then_llm | redact | 2-year retention | deny | Reflect output scanned |
| hipaa | rules_then_llm | reject | 7-year retention | deny | Recall + reflect scanned |

Any string value in astrocyte.yaml can reference environment variables with ${VAR_NAME}:

    vector_store_config:
      connection_url: ${DATABASE_URL}
    llm_provider_config:
      api_key: ${OPENAI_API_KEY}
    sources:
      my-webhook:
        auth:
          secret: ${WEBHOOK_SECRET}

Unresolved variables (not set in the environment) are left as the literal string ${VAR_NAME}.
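
Because an unset variable leaves a literal ${VAR_NAME} in place rather than raising an error, a pre-flight check can be worthwhile. The Python sketch below mimics the substitution rule described above and then walks a loaded config looking for leftovers. Both helpers are hypothetical, and the accepted variable-name syntax is an assumption.

```python
import re

# Assumed placeholder syntax: ${NAME} with a shell-style identifier.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute(value: str, env: dict) -> str:
    """Replace ${NAME} with env[NAME]; names missing from env are left untouched."""
    return _VAR.sub(lambda m: env.get(m.group(1), m.group(0)), value)


def find_unresolved(node, path=""):
    """Yield (path, placeholder) for every ${...} still present in a config tree."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from find_unresolved(child, f"{path}.{key}" if path else str(key))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from find_unresolved(child, f"{path}[{index}]")
    elif isinstance(node, str):
        for match in _VAR.finditer(node):
            yield path, match.group(0)


env = {"DATABASE_URL": "postgresql://localhost/astrocyte"}
config = {
    "vector_store_config": {"connection_url": substitute("${DATABASE_URL}", env)},
    "llm_provider_config": {"api_key": substitute("${OPENAI_API_KEY}", env)},
}
leftovers = list(find_unresolved(config))
```

In practice env would be os.environ. Here leftovers flags llm_provider_config.api_key, which would otherwise be sent to the provider as the literal placeholder string.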


    provider_tier: storage
    vector_store: in_memory
    llm_provider: mock
    barriers:
      pii:
        mode: disabled

Production (pgvector + OpenAI + PDPA compliance)

    profile: personal
    provider_tier: storage
    vector_store: pgvector
    vector_store_config:
      connection_url: ${DATABASE_URL}
    llm_provider: openai
    llm_provider_config:
      api_key: ${OPENAI_API_KEY}
      model: gpt-4o-mini
    compliance_profile: pdpa
    mip_config_path: ./mip.yaml
    lifecycle:
      enabled: true
      ttl:
        archive_after_days: 90
        delete_after_days: 365
    access_control:
      enabled: true
      default_policy: owner_only