Complete reference for astrocyte.yaml — every key, type, default, and valid value.
Astrocyte loads configuration with this merge order (last wins):
1. Compliance profile (compliance_profile: pdpa): sets barriers, lifecycle, access control, DLP
2. Profile (profile: personal): sets defaults, homeostasis, signal quality
3. Your astrocyte.yaml: overrides everything above
All string values support ${ENV_VAR} substitution — unresolved vars are left as-is.
| Key | Type | Default | Description |
|---|---|---|---|
| `provider_tier` | `"storage" \| "engine"` | `"engine"` | `storage` uses Astrocyte’s built-in pipeline with your own storage adapters (vector store, wiki store, etc.). `engine` delegates the entire pipeline to a full memory engine (Mystique, Mem0, Zep, etc.) |
| `profile` | `string \| null` | `null` | Built-in profile name (minimal, personal, research, coding, support) or path to custom profile file (./my-profile.yaml) |
| `compliance_profile` | `string \| null` | `null` | Pre-built compliance preset: gdpr, hipaa, or pdpa. Sets barriers, lifecycle, access control, and DLP automatically |
| `fallback_strategy` | `"error" \| "local_llm" \| "degrade"` | `"error"` | How to handle provider failures |
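As a rough illustration of how the top-level keys in this table combine, a minimal file might look like the sketch below. The values are only examples drawn from the options listed in the table, not recommended settings.

```yaml
# astrocyte.yaml: top-level keys only (illustrative values)
provider_tier: storage        # use the built-in pipeline with your own storage adapters
profile: personal             # built-in profile; a path such as ./my-profile.yaml also works
compliance_profile: pdpa      # applied first; keys set explicitly in this file win
fallback_strategy: degrade    # one of: error, local_llm, degrade
```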
Used when provider_tier: storage.
| Key | Type | Default | Description |
|---|---|---|---|
| `vector_store` | `string \| null` | `null` | Vector store provider: in_memory, postgres, qdrant, etc. |
| `vector_store_config` | `dict \| null` | `null` | Provider-specific settings (connection URL, dimensions, etc.) |
| `graph_store` | `string \| null` | `null` | Graph store provider: neo4j, or in_memory for testing. Section-level graph traversal (entity bridging, causal links) runs over built-in flat SQL tables without a graph store adapter. |
| `graph_store_config` | `dict \| null` | `null` | Provider-specific settings |
| `document_store` | `string \| null` | `null` | Document store provider: postgres (pg_tsvector + GIN + BM25, same adapter as vector_store: postgres), elasticsearch, etc. |
| `document_store_config` | `dict \| null` | `null` | Provider-specific settings |

# Example: pgvector + Neo4j hybrid
embedding_dimensions: 1536
uri: bolt://localhost:7687
password: ${NEO4J_PASSWORD}
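The fragments above come from a pgvector + Neo4j setup. A fuller sketch, using only keys from the table and those fragments, might look like this. The nesting of `embedding_dimensions`, `uri`, and `password` under the two `*_config` keys is an assumption, and other connection settings (such as a database URL) are left out because their key names are not given here.

```yaml
provider_tier: storage
vector_store: postgres
vector_store_config:
  embedding_dimensions: 1536     # assumed to live under vector_store_config
graph_store: neo4j
graph_store_config:
  uri: bolt://localhost:7687     # assumed to live under graph_store_config
  password: ${NEO4J_PASSWORD}    # substituted from the environment
```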
Used when provider_tier: engine.
| Key | Type | Default | Description |
|---|---|---|---|
| `provider` | `string \| null` | `null` | Engine provider name: mystique, mem0, zep, etc. |
| `provider_config` | `dict \| null` | `null` | Engine-specific settings (endpoint, API key, etc.) |

endpoint: ${MYSTIQUE_ENDPOINT}
api_key: ${MYSTIQUE_API_KEY}

| Key | Type | Default | Description |
|---|---|---|---|
| `llm_provider` | `string \| null` | `null` | LLM provider for reflect and extraction: openai, anthropic, litellm, mock, etc. |
| `llm_provider_config` | `dict \| null` | `null` | API key, model name, endpoint, etc. |
| `embedding_provider` | `string \| null` | `null` | Separate embedding provider (if different from LLM) |
| `embedding_provider_config` | `dict \| null` | `null` | Embedding-specific settings (dimensions, model, etc.) |

api_key: ${OPENAI_API_KEY}
# The OpenAI provider handles both chat completions and embeddings.
# Use astrocyte-llm-litellm for Anthropic, Bedrock, Vertex, Ollama, and other providers.
Rate limits, quotas, and token budgets.
retain_max_content_bytes: 51200
global_per_minute: null # optional global cap
retain_per_day: null # optional daily cap

| Key | Type | Default | Description |
|---|---|---|---|
| `recall_max_tokens` | `int \| null` | `null` | Max tokens per recall operation |
| `reflect_max_tokens` | `int \| null` | `null` | Max tokens per reflect operation |
| `retain_max_content_bytes` | `int \| null` | `null` | Max bytes per retain operation |
| `rate_limits.retain_per_minute` | `int \| null` | `null` | Max retain operations per minute |
| `rate_limits.recall_per_minute` | `int \| null` | `null` | Max recall operations per minute |
| `rate_limits.reflect_per_minute` | `int \| null` | `null` | Max reflect operations per minute |
| `rate_limits.global_per_minute` | `int \| null` | `null` | Global rate limit across all operations |
| `quotas.retain_per_day` | `int \| null` | `null` | Max retains per 24 hours |
| `quotas.reflect_per_day` | `int \| null` | `null` | Max reflects per 24 hours |
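A sketch of how these limits could be written, assuming they live under the top-level `homeostasis` section. That section name is inferred from the per-bank `homeostasis` override and is not stated next to this table. The dotted keys are shown as nested maps, and the numeric values are illustrative except `retain_max_content_bytes: 51200`, which appears in the fragment above.

```yaml
homeostasis:                       # assumed section name
  recall_max_tokens: 4096          # illustrative
  reflect_max_tokens: 8192         # illustrative
  retain_max_content_bytes: 51200
  rate_limits:
    retain_per_minute: 60          # illustrative
    recall_per_minute: 120         # illustrative
    reflect_per_minute: 30         # illustrative
    global_per_minute: null        # optional global cap
  quotas:
    retain_per_day: null           # optional daily cap
    reflect_per_day: 500           # illustrative
```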
Safety controls applied on every retain operation.
| Key | Type | Default | Description |
|---|---|---|---|
| `mode` | `string` | `"regex"` | Detection mode: regex, ner, llm, rules_then_llm, disabled |
| `action` | `string` | `"redact"` | What to do when PII is found: redact, reject, warn |
| `countries` | `list[string] \| null` | `null` | Country-specific patterns: SG (NRIC), IN (Aadhaar), GB (NINO), US (SSN), AU (TFN), CA, JP, CN, DE, FR, IT, ES |
| `patterns` | `list[dict] \| null` | `null` | Custom regex patterns: `[{type: "custom_id", pattern: "\\d{8}"}]` |
| `type_overrides` | `dict \| null` | `null` | Override action per PII type: `{credit_card: {action: reject}}` |

countries: [SG, IN, GB, US]
credit_card: {action: reject}

| Key | Type | Default | Description |
|---|---|---|---|
| `max_content_length` | `int` | `50000` | Reject content over this many bytes |
| `reject_empty_content` | `bool` | `true` | Reject empty or whitespace-only content |
| `reject_binary_content` | `bool` | `true` | Reject non-text content |
| `allowed_content_types` | `list[string] \| null` | `null` | Optional content type whitelist |

| Key | Type | Default | Description |
|---|---|---|---|
| `blocked_keys` | `list[string]` | `["api_key", "password", "token", "secret"]` | Keys to scrub from metadata |
| `max_metadata_size_bytes` | `int` | `4096` | Max metadata size in bytes |
Deduplication and noise detection.
| Key | Type | Default | Description |
|---|---|---|---|
| `enabled` | `bool` | `true` | Enable duplicate detection |
| `similarity_threshold` | `float` | `0.95` | Cosine similarity threshold for duplicates (0–1) |
| `action` | `string` | `"skip"` | What to do with duplicates: skip, warn, update |

| Key | Type | Default | Description |
|---|---|---|---|
| `enabled` | `bool` | `true` | Enable noisy bank detection |
| `retain_spike_multiplier` | `float` | `5.0` | Retain rate spike threshold |
| `min_avg_content_length` | `int` | `20` | Minimum average chunk length (chars) |
| `max_dedup_rate` | `float` | `0.8` | Max dedup rate before flagging |
| `action` | `string` | `"warn"` | Action on noisy bank: warn, throttle, reject |
Circuit breaker and degraded mode.
| Key | Type | Default | Description |
|---|---|---|---|
| `degraded_mode` | `string` | `"empty_recall"` | Fallback on failure: empty_recall, error, cache |
| `circuit_breaker.failure_threshold` | `int` | `5` | Failures before circuit opens |
| `circuit_breaker.recovery_timeout_seconds` | `float` | `30.0` | Seconds before half-open attempt |
| `circuit_breaker.half_open_max_calls` | `int` | `2` | Max calls in half-open state |
Similarity-based caching for repeated queries.
| Key | Type | Default | Description |
|---|---|---|---|
| `enabled` | `bool` | `false` | Enable recall cache |
| `similarity_threshold` | `float` | `0.95` | Cache hit threshold |
| `max_entries` | `int` | `256` | Max cached results |
| `ttl_seconds` | `float` | `300.0` | Cache entry lifetime (seconds) |
Progressive retrieval strategy — tries cheaper/faster tiers first.
| Key | Type | Default | Description |
|---|---|---|---|
| `enabled` | `bool` | `false` | Enable tiered retrieval |
| `min_results` | `int` | `3` | Min results per tier before advancing |
| `min_score` | `float` | `0.3` | Min relevance score threshold |
| `max_tier` | `int` | `3` | Maximum tier (0–4) |
| `full_recall` | `"pipeline" \| "hybrid"` | `"pipeline"` | Recall path for tier 3+. `hybrid` requires a hybrid engine provider |
Defenses against adversarial / false-premise / prompt-injection-shaped queries. All knobs default to off so the framework stays accuracy-first; opt in selectively per preset. See benchmark-presets.md for measured impact per knob and the post-mortem on which combinations help vs hurt.
| Key | Type | Default | Description |
|---|---|---|---|
| `abstention_enabled` | `bool` | `false` | When true, recall short-circuits to a stable “insufficient evidence” reply if the top semantic score falls below `abstention_floor`. |
| `abstention_floor` | `float` | `0.2` | Score threshold below which abstention fires. Higher values trigger abstention more often. |
| `abstention_floor_intent_only` | `bool` | `false` | When true, the abstention floor fires only for queries the intent classifier returns as EXPLORATORY or UNKNOWN (that is, not a confidently well-formed FACTUAL / TEMPORAL / RELATIONAL / COMPARATIVE / PROCEDURAL query). Empirically, the flat floor cratered single-hop (-10pp) and temporal (-10pp) on config-hindsight-balanced because legitimate factual and temporal queries occasionally have top scores below 0.2. Intent-gating recovers those points while keeping the adversarial floor for the query shapes where false premises actually hide. See pipeline/query_intent.py for the classifier. |
| `premise_verification_enabled` | `bool` | `false` | When true, decompose multi-claim queries into atomic premises and LLM-verify each before retrieval. Adds one LLM call per query; can interfere with rate limits at high concurrency. |
| `adversarial_prompt_enabled` | `bool` | `false` | When true, prepend an adversarial-defense rule to the synthesis system prompt (“if the question’s premise is false, say so explicitly”). Cheap defense-in-depth with no extra LLM calls. |

abstention_floor_intent_only: true # v0.11.0+
premise_verification_enabled: false
adversarial_prompt_enabled: true
Structured truth precedence — labels fused hits for synthesis.
| Key | Type | Default | Description |
|---|---|---|---|
| `enabled` | `bool` | `false` | Enable recall authority |
| `rules_inline` | `string \| null` | `null` | Inline authority rules |
| `rules_path` | `string \| null` | `null` | Path to authority rules file |
| `tiers` | `list` | `[]` | Precedence tiers: `[{id: "primary", priority: 1, label: "Verified"}]` |
| `tier_by_bank` | `dict` | `{}` | Map bank IDs to tier IDs: `{bank-1: "primary"}` |
| `apply_to_reflect` | `bool` | `true` | Inject authority context into reflect prompts |

label: "Verified sources"
label: "Inferred knowledge"
LLM-scored selective retention — only stores content above an importance threshold.
| Key | Type | Default | Description |
|---|---|---|---|
| `enabled` | `bool` | `false` | Enable curated retain |
| `model` | `string \| null` | `null` | LLM model for importance scoring |
| `context_recall_limit` | `int` | `5` | Max context items for scoring |
Multi-factor ranking — blends recency, reliability, salience, and similarity.
| Key | Type | Default | Description |
|---|---|---|---|
| `enabled` | `bool` | `false` | Enable curated recall |
| `freshness_weight` | `float` | `0.3` | Recency bonus weight |
| `reliability_weight` | `float` | `0.2` | Authority/source weight |
| `salience_weight` | `float` | `0.2` | Relevance/importance weight |
| `original_score_weight` | `float` | `0.3` | Vector similarity weight |
| `freshness_half_life_days` | `float` | `30.0` | Decay curve for recency |
| `min_score` | `float \| null` | `null` | Min final score threshold |
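The four weight defaults sum to 1.0 (0.3 + 0.2 + 0.2 + 0.3). The sketch below shifts emphasis toward recency while keeping that total. The section name `curated_recall` is a guess based on the key descriptions, and this reference does not say whether the weights are required to sum to 1.0.

```yaml
curated_recall:                    # hypothetical section name
  enabled: true
  freshness_weight: 0.4            # default 0.3
  reliability_weight: 0.2          # default 0.2
  salience_weight: 0.2             # default 0.2
  original_score_weight: 0.2       # default 0.3
  freshness_half_life_days: 14.0   # default 30.0; a shorter half-life decays faster
  min_score: 0.25                  # illustrative; default is null
```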
| Key | Type | Default | Description |
|---|---|---|---|
| `enabled` | `bool` | `false` | Enable ACL enforcement |
| `default_policy` | `string` | `"owner_only"` | Default when no grants match: owner_only, open, deny |
Identity-driven bank resolution.
| Key | Type | Default | Description |
|---|---|---|---|
| `auto_resolve_banks` | `bool` | `false` | Auto-create banks from principal |
| `user_bank_prefix` | `string` | `"user-"` | Prefix for user-scoped banks |
| `agent_bank_prefix` | `string` | `"agent-"` | Prefix for agent-scoped banks |
| `service_bank_prefix` | `string` | `"service-"` | Prefix for service-scoped banks |
| `resolver` | `"convention" \| "config" \| "custom" \| null` | `null` | Bank resolution strategy |
| `obo_enabled` | `bool` | `false` | Enable on-behalf-of permission intersection |
Top-level access grants (merged with per-bank banks.*.access).
principal: "agent:support-bot"
permissions: [read, write]
permissions: [read, write, forget, admin]

| Field | Type | Description |
|---|---|---|
| `bank_id` | `string` | Bank ID or glob pattern (`*` for all) |
| `principal` | `string` | Principal: `user:X`, `agent:X`, `service:X`, or `*` |
| `permissions` | `list[string]` | Permissions: read, write, forget, admin, `*` |
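Putting the three fields together, grant entries might look like the sketch below. The bank names and `user:alice` are made up for illustration, and the parent key that holds this list is not shown in this section, so it is left out rather than guessed.

```yaml
# Entries for the top-level grants list (parent key omitted; not named in this section)
- bank_id: "support-*"              # glob pattern: banks whose ID starts with "support-"
  principal: "agent:support-bot"
  permissions: [read, write]
- bank_id: "*"                      # all banks
  principal: "user:alice"           # hypothetical principal
  permissions: [read, write, forget, admin]
```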
Data Loss Prevention — output scanning for PII in recall and reflect results.
| Key | Type | Default | Description |
|---|---|---|---|
| `scan_recall_output` | `bool` | `false` | Scan recall results for PII |
| `scan_reflect_output` | `bool` | `false` | Scan reflect output for PII |
| `output_pii_action` | `string` | `"warn"` | Action on detected PII: redact, reject, warn |
Automatic memory archival and deletion based on age and activity.
| Key | Type | Default | Description |
|---|---|---|---|
| `enabled` | `bool` | `false` | Enable lifecycle management |
| `ttl.archive_after_days` | `int` | `90` | Archive if not recalled in N days |
| `ttl.delete_after_days` | `int` | `365` | Delete if older than N days |
| `ttl.exempt_tags` | `list[string] \| null` | `null` | Tags that skip TTL (e.g. pinned, compliance) |
| `ttl.fact_type_overrides` | `dict \| null` | `null` | Override `archive_after_days` by fact type: `{world: 180, experience: null}` |

exempt_tags: [pinned, compliance]
experience: null # never auto-archive
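A sketch of a lifecycle block assembled from the table's keys, assuming the section is named `lifecycle` and that the dotted `ttl.*` keys nest as a map. The two day counts are the documented defaults.

```yaml
lifecycle:                       # assumed section name
  enabled: true
  ttl:
    archive_after_days: 90       # default
    delete_after_days: 365       # default
    exempt_tags: [pinned, compliance]
    fact_type_overrides:
      world: 180
      experience: null           # never auto-archive
```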
Per-profile reasoning defaults — affect reflect synthesis behavior.
| Key | Type | Default | Description |
|---|---|---|---|
| `skepticism` | `int` | `3` | How critical to source (1–5) |
| `literalism` | `int` | `3` | How literal vs. interpretive (1–5) |
| `empathy` | `int` | `3` | How empathetic in synthesis (1–5) |
| `preferred_fact_types` | `list[string] \| null` | `null` | Preference order: `[experience, world, observation]` |
| `tags` | `list[string] \| null` | `null` | Default tags for all retained content |

| Key | Type | Default | Description |
|---|---|---|---|
| `otel_enabled` | `bool` | `false` | Enable OpenTelemetry spans |
| `prometheus_enabled` | `bool` | `false` | Enable Prometheus metrics |
| `log_level` | `string` | `"info"` | Log level: debug, info, warn, error |
MCP (Model Context Protocol) server settings — used by astrocyte.mcp and the astrocyte-mcp CLI.
| Key | Type | Default | Description |
|---|---|---|---|
| `default_bank_id` | `string \| null` | `null` | Default bank for MCP calls |
| `expose_reflect` | `bool` | `true` | Allow reflect via MCP |
| `expose_forget` | `bool` | `false` | Allow forget via MCP |
| `expose_admin` | `bool` | `false` | Allow lifecycle, bank health, and legal-hold admin tools via MCP |
| `max_results_limit` | `int` | `50` | Max items returned per request |
| `principal` | `string \| null` | `null` | Principal for MCP operations |

| Key | Type | Default | Description |
|---|---|---|---|
| `mip_config_path` | `string \| null` | `null` | Path to MIP routing rules file (./mip.yaml) |
See Memory Intent Protocol for the full MIP DSL — match operators, actions, override hierarchy, and intent policy.
Per-bank overrides. Each key is a bank ID. Any top-level section can be overridden per bank.
similarity_threshold: 0.90
- bank_id: sensitive-bank
principal: "agent:analyst"
permissions: [read, write]

| Key | Type | Default | Description |
|---|---|---|---|
| `profile` | `string \| null` | `null` | Override profile for this bank |
| `access` | `list[dict] \| null` | `null` | Bank-specific access grants |
| `homeostasis` | `HomeostasisConfig \| null` | `null` | Override homeostasis settings |
| `barriers` | `BarrierConfig \| null` | `null` | Override barrier settings |
| `signal_quality` | `SignalQualityConfig \| null` | `null` | Override signal quality settings |
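A sketch of a per-bank override built from the table and the fragments above. The `banks` key is taken from the `banks.*.access` reference earlier in this page. Placing `similarity_threshold` under a `dedup` block inside `signal_quality` is an assumption about nesting that this section does not confirm.

```yaml
banks:
  sensitive-bank:                    # bank ID
    profile: research                # override the profile for this bank only
    signal_quality:
      dedup:                         # hypothetical sub-section name
        similarity_threshold: 0.90
    access:
      - bank_id: sensitive-bank
        principal: "agent:analyst"
        permissions: [read, write]
```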
External data source definitions for ingestion. See poll ingest guide for webhook, stream, and poll setup.
extraction_profile: builtin_text
target_bank: webhook-data
secret: ${WEBHOOK_SECRET}
target_bank: github-issues
url: redis://localhost:6379
consumer_group: astrocyte-group
url: "https://api.example.com/search?q={query}"

| Key | Type | Description |
|---|---|---|
| `type` | `string` | Source type: webhook, stream, poll / api_poll, proxy |
| `driver` | `string \| null` | Driver name: github, redis, kafka |
| `extraction_profile` | `string \| null` | Extraction profile name for ingested content |
| `target_bank` | `string \| null` | Destination bank ID |
| `target_bank_template` | `string \| null` | Template: `"bank-{source_id}"` |
| `auth` | `dict \| null` | Auth config (type-specific: hmac_sha256, bearer, etc.) |
| `path` | `string \| null` | Source-specific path (e.g. owner/repo for GitHub) |
| `url` | `string \| null` | Source URL (Redis URL, Kafka bootstrap servers, proxy endpoint) |
| `topic` | `string \| null` | Stream topic or Redis stream key |
| `consumer_group` | `string \| null` | Consumer group name |
| `interval_seconds` | `int \| null` | Poll interval (min 60 for GitHub) |
| `recall_method` | `string \| null` | Proxy only: GET (default) or POST |
| `recall_body` | `dict \| null` | Proxy POST only: request body template |
Registered agents with bank access and rate hints.
principal: "agent:support-bot"
default_bank: shared-support
banks: [shared-support, team-*]
permissions: [read, write]
max_retain_per_minute: 60
max_recall_per_minute: 120

| Key | Type | Default | Description |
|---|---|---|---|
| `principal` | `string \| null` | `null` | Agent principal (e.g. agent:my-bot) |
| `banks` | `list[string] \| null` | `null` | Allowed bank IDs (glob patterns supported) |
| `allowed_banks` | `list[string] \| null` | `null` | Alias for `banks` |
| `default_bank` | `string \| null` | `null` | Default bank when not specified |
| `permissions` | `list[string] \| null` | `null` | Declared permissions (documentation/validation) |
| `max_retain_per_minute` | `int \| null` | `null` | Per-agent retain rate hint |
| `max_recall_per_minute` | `int \| null` | `null` | Per-agent recall rate hint |
Reusable extraction configurations for ingestion sources.
chunking_strategy: dialogue
speaker: "$.participant_name"
- match: {source: slack}

| Key | Type | Default | Description |
|---|---|---|---|
| `content_type` | `string \| null` | `null` | Expected content type |
| `chunking_strategy` | `string \| null` | `null` | Strategy: sentence, paragraph, fixed, dialogue |
| `chunk_size` | `int \| null` | `null` | Max characters per chunk |
| `entity_extraction` | `bool \| string \| null` | `null` | Extract entities: true, false, ner, llm |
| `fact_type` | `string \| null` | `null` | Default fact type: world, experience, observation |
| `authority_tier` | `string \| null` | `null` | Recall authority tier ID (overrides `recall_authority.tier_by_bank`) |
| `metadata_mapping` | `dict \| null` | `null` | Map source fields to metadata keys |
| `tag_rules` | `list[dict] \| null` | `null` | Generate tags from metadata patterns |
Built-in profiles: builtin_text and builtin_conversation.
Standalone gateway settings — ignored in library mode.
- "http://localhost:3000"
cert_path: /path/to/cert.pem
key_path: /path/to/key.pem

| Key | Type | Default | Description |
|---|---|---|---|
| `mode` | `"library" \| "standalone" \| "plugin"` | `"library"` | Deployment mode |
| `host` | `string \| null` | `null` | Bind address (standalone only) |
| `port` | `int \| null` | `null` | Port (standalone only) |
| `workers` | `int \| null` | `null` | Worker processes (standalone only) |
| `cors_origins` | `list[string] \| null` | `null` | CORS allowed origins |
| `tls.cert_path` | `string \| null` | `null` | TLS certificate path |
| `tls.key_path` | `string \| null` | `null` | TLS private key path |
Pre-built compliance presets that configure barriers, lifecycle, access control, and DLP. Set compliance_profile at the top level — your explicit config overrides any values the profile sets.
| Profile | PII mode | PII action | Lifecycle | Access default | DLP |
|---|---|---|---|---|---|
| `pdpa` | `rules_then_llm` | `redact` | 5-year retention | `owner_only` | Reflect output scanned |
| `gdpr` | `rules_then_llm` | `redact` | 2-year retention | `deny` | Reflect output scanned |
| `hipaa` | `rules_then_llm` | `reject` | 7-year retention | `deny` | Recall + reflect scanned |
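Because explicit config overrides the preset, you can start from a profile and tighten a single setting. The sketch below assumes the output-scanning keys from the DLP table live under a section named `dlp`; that section name is not confirmed by this reference.

```yaml
compliance_profile: gdpr     # preset: redact PII, 2-year retention, deny by default
dlp:                         # hypothetical section name
  scan_recall_output: true   # the preset scans reflect output; this also scans recall results
  output_pii_action: redact
```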
Any string value in astrocyte.yaml can reference environment variables with ${VAR_NAME}:
api_key: ${OPENAI_API_KEY}
secret: ${WEBHOOK_SECRET}
Unresolved variables (not set in the environment) are left as the literal string ${VAR_NAME}.
api_key: ${OPENAI_API_KEY}
mip_config_path: ./mip.yaml
default_policy: owner_only
Memory API reference — retain/recall/reflect/forget signatures and examples
Authentication setup — auth modes, OIDC providers, JWT claim mapping
Storage backend setup — pgvector, Qdrant, Neo4j, Elasticsearch install and config
Monitoring & observability — health endpoints, logging, tracing, metrics
Access control setup — grants, OBO, common patterns
Bank management — bank creation, multi-bank queries, lifecycle recipes
MIP developer guide — writing and testing MIP routing rules
Production-grade HTTP service — full production checklist