
Production-grade HTTP service

This document describes how to operate Astrocyte behind an HTTP (or similar) API in production: security, durability, observability, and compliance. It applies whether you embed Astrocyte in your own service or use the optional astrocyte-rest reference HTTP service in this repository.

For the optional inbound edge and where a gateway sits relative to the core, see architecture-framework.md §3. For principals and banks, see access-control.md. For identity and external policy engines, see identity-and-external-policy.md. For outbound HTTP to LLMs and proxies, see outbound-transport.md.

Checklists: Items below use - [ ] so they render as checkboxes in GitHub, GitLab, VS Code, and many Markdown previews. Copy the doc or track items in your issue tracker.


Implementing your own production service

Use the implementation checklist (§3) as the primary guide. You own the process (FastAPI, gRPC, Lambda, etc.), TLS termination, identity, and scaling. The Astrocyte core remains a library; your service maps authenticated callers to AstrocyteContext and invokes retain / recall / reflect / forget.

Using or hardening the reference REST service (astrocyte-rest)

The repository includes astrocyte-services-py/astrocyte-rest/ as an optional REST process built on astrocyte-py. §4 documents its purpose, layout, configuration, local Compose / Makefile / troubleshooting (§4.5), and how it maps to this checklist. §5 lists repository-specific follow-ups for that package.


An HTTP API is not part of the Astrocyte core contract. Typical production shape:

  • Edge: API gateway or ingress (TLS, coarse rate limits, JWT or API-key validation).
  • Application: Your service (or the reference astrocyte-rest package) constructs Astrocyte, attaches Tier 1 or Tier 2 providers, and passes an opaque principal on each call after AuthN.

The framework enforces AuthZ on memory banks when access control is configured. Do not treat unauthenticated client headers as proof of identity.


Track completion in your issue tracker or PRs as needed.

  • Product decision: Confirm target environments (Kubernetes, VM, single-tenant SaaS, air-gapped) and SLOs (availability, p95 latency, error budget).
  • Support boundary: Document what is supported (API versions, auth modes, backends) versus experimental.
  • Threat model: Document trust zones (internet, VPC, mesh), who may call the API, and what data may appear in logs.

3.1 Memory backends and durability (not in-memory)

  • Tier 1: Use production VectorStore adapters (and optional GraphStore / DocumentStore) per provider-spi.md. Avoid in-process-only stores for durable memory. This repository includes an optional astrocyte-pgvector adapter (PostgreSQL + pgvector; docker-compose.yml under astrocyte-services-py/ runs API + Postgres); wire it via config entry point pgvector and the same astrocyte_rest/wiring.py resolution path as other Tier 1 stores. Compose networking: the API container must use a DSN with the Postgres service hostname (postgres), not a host-only URL such as 127.0.0.1:5433. The repo’s Compose file sets DATABASE_URL from ASTROCYTES_REST_DATABASE_URL or builds ...@postgres:5432/... from POSTGRES_*; use MIGRATE_DATABASE_URL (or a one-off DATABASE_URL only for the shell) for host-side migrate.sh, not for the in-cluster API—see astrocyte-services-py/.env.example.
  • Tier 2 (optional): If using a memory engine provider, wire EngineProvider and validate capability negotiation (reflect, forget, etc.).
  • LLM / embeddings: Use real LLMProvider (and embedding path) appropriate to latency and cost.
  • Configuration: Load provider entry points from config (YAML/env) with validation; fail fast on missing required settings in prod.
  • Data durability: Define backup, restore, and RPO/RTO for each store; test restore drills.
  • Migrations: If stores require schema migrations, own a migration process (job or init container) and versioning. For astrocyte-pgvector, use the shipped SQL files and psql runner (migrate.sh); set bootstrap_schema: false in vector_store_config after applying migrations.
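
The Compose networking rule in the Tier 1 item (the API container must reach Postgres by service hostname, not by a loopback address) can be checked at startup. The following is a minimal sketch, not part of the repository: the function name `check_api_dsn` and the idea of running it before the app starts are assumptions for illustration.

```python
from urllib.parse import urlsplit

# Hosts that resolve to the container itself when used inside Compose.
LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1"}


def check_api_dsn(dsn: str) -> list[str]:
    """Return a list of problems found in a DSN intended for the API container.

    Hypothetical helper: an empty list means no problem was detected.
    """
    problems: list[str] = []
    parts = urlsplit(dsn)
    if parts.scheme not in {"postgresql", "postgres"}:
        problems.append(f"unexpected scheme: {parts.scheme!r}")
    host = parts.hostname
    if host is None:
        problems.append("DSN has no host")
    elif host in LOOPBACK_HOSTS:
        # Inside a container, loopback points at the API container, not Postgres.
        problems.append(f"host {host!r} is loopback; use the Postgres service hostname")
    return problems


if __name__ == "__main__":
    # In-cluster style DSN: passes.
    print(check_api_dsn("postgresql://astrocyte:secret@postgres:5432/astrocyte"))
    # Host-side DSN (suitable for migrate.sh, not for the API container): flagged.
    print(check_api_dsn("postgresql://astrocyte:secret@127.0.0.1:5433/astrocyte"))
```

A check like this fits the "fail fast on missing required settings" item: a service could refuse to start when the list is non-empty and it knows it is running in a container.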

3.2 Authentication (AuthN) - do not trust client-supplied principals

  • Remove or gate dev behavior: A header such as X-Astrocyte-Principal must not be the only identity in production unless behind verified mTLS or a trusted gateway that strips spoofed headers.
  • Choose and implement one primary mode:
    • JWT (OIDC): validate issuer, audience, signature, exp; map claims to AstrocyteContext.principal.
    • API keys: hashed keys in store, rotation, per-key scopes.
    • mTLS: client certificates mapped to principals.
  • Service-to-service: Use workload identity (SPIFFE, IAM) where applicable.
  • Anonymous / public: Explicitly forbid or allow only for specific routes; document risk.
  • Enable framework access control: access_control on with explicit policy (not open by default in prod).
  • Grants: Load AccessGrant (or equivalent) from config, database, or external PDP (OPA, Cerbos, Casbin) per access-control.md.
  • Bank isolation: Enforce bank_id scoping; prevent cross-tenant IDOR (validate bank belongs to tenant/principal).
  • Admin paths: Separate privileged operations (if any) with stricter checks.
  • Optional external PDP: Integrate AccessPolicyProvider when enterprise policy engines are required (identity-and-external-policy.md).
  • Memory API exfiltration: Treat Astrocyte like any sensitive backend: bind AuthN at the Backend for Frontend (BFF), so that sandboxed clients cannot freely choose a production principal or bank, and combine this with network egress policy for agent workloads so recall results cannot be relayed to the public internet. See sandbox-awareness-and-exfiltration.md and the article "Let’s discuss sandbox isolation".
  • TLS: Terminate TLS at ingress/gateway or in-process; HSTS and modern cipher policy where browsers are involved.
  • Private networking: Prefer private subnets / mesh; no public exposure without WAF/rate limits as needed.
  • Request limits: Max body size, header size, URL length; timeouts on client and server.
  • CORS: If browser clients exist, restrict origins; avoid * in production.
  • Rate limiting: Edge (API gateway) + align with Astrocyte homeostasis / quotas in config (policy-layer.md).
  • Versioning: Keep /v1 (or header versioning) with a compatibility policy; deprecate with lead time.
  • OpenAPI: Publish stable OpenAPI; generate clients if useful; CI diff on API changes.
  • Pagination / limits: Cap max_results, max_tokens, and content size consistently; return truncation flags where applicable.
  • Idempotency: For retain, define idempotency keys or dedup strategy for retries (align with framework behavior).
  • Error shape: Stable JSON error body (code, message, request_id); no stack traces to clients in prod.
  • Health endpoints:
    • Liveness: process up without hitting backends (reference astrocyte-rest: GET /live or GET /health/live).
    • Readiness: dependencies reachable (reference: GET /health runs Astrocyte.health() → vector store check; with pgvector that implies PostgreSQL connectivity). Use a bounded server-side timeout for deep checks so load balancers do not hang; expect 503 if dependencies fail or exceed the timeout.
  • Structured logs: JSON logs with request_id, principal (hashed or opaque id if sensitive), bank_id, route, latency, status; PII redaction policy.
  • Metrics: RED (rate, errors, duration) per route; saturation (CPU, memory, queue depth); dependency health.
  • Tracing: OpenTelemetry traces across HTTP and outbound calls (LLM, stores); optional astrocyte OTel extras (astrocyte-py optional dependencies).
  • Dashboards and alerts: SLO-based alerts (burn rate) on error rate and latency.
  • Graceful shutdown: Honor SIGTERM; drain in-flight requests within a bounded time (uvicorn/gunicorn settings).
  • Timeouts: Per-route and per-outbound-call timeouts; avoid unbounded await chains.
  • Retries: Safe retries for read paths only unless idempotent writes are guaranteed.
  • Circuit breaking: Align gateway and framework escalation / circuit breaker settings with dependency behavior.
  • Overload: Backpressure (503 + Retry-After) when saturated; consider queue or shed load policy.
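
The "Error shape" and "Overload" items above can be combined in one small mechanism: cap the number of in-flight requests and answer the excess with 503, a Retry-After header, and the stable JSON error body. The sketch below is framework-agnostic and uses only the standard library; `LoadShedder`, the response tuple layout, and the `"overloaded"` error code are hypothetical names chosen for illustration, not Astrocyte or astrocyte-rest APIs.

```python
import asyncio
import json
import uuid


def error_body(code: str, message: str, request_id: str) -> str:
    """Stable JSON error shape: code, message, request_id (no stack traces)."""
    return json.dumps({"code": code, "message": message, "request_id": request_id})


class LoadShedder:
    """Caps concurrent requests; excess requests get an immediate 503."""

    def __init__(self, max_in_flight: int, retry_after_s: int = 1) -> None:
        self._max = max_in_flight
        self._retry_after_s = retry_after_s
        self._in_flight = 0

    async def run(self, handler):
        """Run handler() if capacity allows; return (status, headers, body)."""
        # No await between the check and the increment, so this is safe
        # on a single asyncio event loop.
        if self._in_flight >= self._max:
            body = error_body("overloaded", "Server is saturated; retry later.",
                              str(uuid.uuid4()))
            return 503, {"Retry-After": str(self._retry_after_s)}, body
        self._in_flight += 1
        try:
            return 200, {}, await handler()
        finally:
            self._in_flight -= 1


async def _demo() -> None:
    shedder = LoadShedder(max_in_flight=1)
    gate = asyncio.Event()

    async def slow_handler() -> str:
        await gate.wait()
        return "ok"

    first = asyncio.create_task(shedder.run(slow_handler))
    await asyncio.sleep(0)                      # let the first request start
    print(await shedder.run(slow_handler))      # shed: 503 with Retry-After
    gate.set()
    print(await first)                          # completes normally


if __name__ == "__main__":
    asyncio.run(_demo())
```

Shedding at the application complements, rather than replaces, edge rate limits: the gateway protects against abusive callers, while an in-process cap protects the service when legitimate load exceeds what its dependencies can absorb.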

3.8 Security hardening (application and supply chain)

  • Dependencies: Pin versions; SBOM; automated CVE scanning on images and lockfiles.
  • Secrets: No secrets committed in repos; use secret manager / K8s secrets / IAM roles.
  • Process: Run as non-root in containers; read-only root filesystem where possible; capability drop.
  • Headers: Strip or validate forwarded headers; prevent header injection in logs.
  • Input validation: Reject unexpected JSON fields if strict mode is desired; validate bank_id format and length.
  • Base image: Minimal, pinned digest; regular rebuilds for CVE patches.
  • Multi-stage build: Optional, to reduce attack surface and size.
  • Image signing: Cosign / policy in cluster to verify signatures.
  • Resource limits: CPU/memory requests and limits; avoid OOM under load.
  • Probes: Kubernetes liveness and readiness using the endpoints from §3.5.
  • Classification: Tag banks or payloads per data class if required (data-governance.md).
  • Retention and legal hold: Align with memory-lifecycle.md; block deletes when under hold.
  • Audit log: Immutable audit stream for retain/forget/admin actions (who, when, bank, outcome).
  • Residency: Ensure stores and LLM regions meet data residency requirements.
  • PII: Configure barriers / PII policy deliberately; document false positive handling (policy-layer.md).
  • Load testing: Establish p95/p99 under target QPS and payload sizes; test cold start and warm pools.
  • Connection pools: HTTP clients to LLM and DBs use bounded pools and sane keep-alive.
  • Horizontal scaling: Stateless replicas; sticky sessions only if you introduce local-only state (avoid).
  • Cost controls: Token and spend caps aligned with homeostasis and product limits.
  • Contract tests: HTTP API vs OpenAPI; backward compatibility tests on /v1.
  • Integration tests: Real (or containerized) dependencies in CI for critical paths.
  • Security tests: AuthZ tests (tenant A cannot read tenant B), fuzzing on inputs, OWASP API checks.
  • Chaos / failure injection: Dependency failures (DB down, LLM timeout) produce expected errors and no data corruption.
  • Runbooks: Incident response, key rotation, scaling, failover, and rollback of releases.
  • Configuration rollout: Safe rollout of config changes (feature flags or staged deploys).
  • Backup and DR: Documented restore tests; RPO/RTO validated annually or on change.
  • On-call: Alert routing and escalation paths tied to SLOs.
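
The "Security tests" item (tenant A cannot read tenant B) and the bank_id validation item lend themselves to a small, testable guard. The sketch below is an illustration under stated assumptions: the `grants` mapping is a stand-in for whatever grant source you use (it is not Astrocyte's AccessGrant type), and the bank_id pattern and the names `authorize_bank` and `Forbidden` are hypothetical.

```python
import re

# Assumed bank_id format for illustration: lowercase alphanumerics, "_" and "-",
# at most 64 characters. Substitute your own rule.
BANK_ID_RE = re.compile(r"[a-z0-9][a-z0-9_-]{0,63}")


class Forbidden(Exception):
    """Raised when a principal has no grant on the requested bank."""


def authorize_bank(principal: str, bank_id: str,
                   grants: dict[str, frozenset[str]]) -> None:
    """Reject malformed bank ids, then enforce that the bank is granted."""
    if not BANK_ID_RE.fullmatch(bank_id):
        raise ValueError("invalid bank_id")
    if bank_id not in grants.get(principal, frozenset()):
        # Same error whether the bank exists or not, to avoid leaking existence.
        raise Forbidden("principal may not access this bank")


def test_tenant_isolation() -> None:
    grants = {
        "tenant-a": frozenset({"a-notes"}),
        "tenant-b": frozenset({"b-notes"}),
    }
    authorize_bank("tenant-a", "a-notes", grants)          # allowed
    try:
        authorize_bank("tenant-a", "b-notes", grants)      # cross-tenant read
    except Forbidden:
        pass
    else:
        raise AssertionError("tenant A reached tenant B's bank")


if __name__ == "__main__":
    test_tenant_isolation()
    print("isolation test passed")
```

In a real service the same assertion would run against the HTTP API with two authenticated principals, so that the test exercises AuthN mapping and the framework's access control together.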

4. Reference REST service (astrocyte-rest)


This repository ships an optional astrocyte-rest HTTP service that embeds astrocyte-py. It is a convenience for development and a starting point for a custom deployment; it is not production-ready by default.

  • Demonstrate mapping HTTP JSON bodies to Astrocyte calls (retain, recall, reflect, forget) and optional X-Astrocyte-Principal.
  • Provide a Dockerfile so the stack can be run locally or in a sandbox without writing a host app.

4.2 Current behavior (non-production defaults)

  • Tier 1 pipeline resolved from config: defaults are in_memory vector store and mock LLM (entry points on astrocyte-py). Optional YAML / env can select other registered providers or module:Class paths (for example pgvector after installing astrocyte-pgvector). Data is not durable when using the built-in in-memory stack.
  • Access control defaults to off when no config file is loaded; when you enable access_control in YAML, access_grants and banks.*.access are loaded from config and applied via set_access_grants (see §5). You still need a deliberate prod policy—do not rely on defaults.
  • Identity: ASTROCYTES_AUTH_MODE selects dev (trusts X-Astrocyte-Principal only—use only behind a trusted gateway), api_key, or jwt_hs256 (Bearer JWT, sub → principal). This is a starting point for §3.2, not a full IdP integration (no OIDC discovery, JWKS, or per-key store yet).

Treat this as a reference implementation of the HTTP mapping only, not as a hardened product.

| Item | Location |
| --- | --- |
| Package | astrocyte-services-py/astrocyte-rest/astrocyte_rest/ |
| FastAPI app factory | app.py - create_app() |
| Brain wiring | brain.py - build_reference_astrocyte() |
| CLI | astrocyte-rest (see pyproject.toml) |
| Container | Dockerfile (build from repository root; see README) |
| Compose (API + Postgres) | docker-compose.yml |
| Runbook overlay (migrations + bootstrap_schema: false) | docker-compose.runbook.yml, config.runbook.example.yaml |
| One-shot local deploy | scripts/runbook-up.sh |
| Common commands | Makefile (make runbook, make up, make health, …) |

| Variable | Role |
| --- | --- |
| ASTROCYTES_HOST / ASTROCYTES_PORT | Bind address and port |
| ASTROCYTES_CONFIG_PATH | Optional YAML loaded via load_config for policy/homeostasis; in-memory Tier 1 pipeline is still attached in the reference implementation |
| DATABASE_URL / ASTROCYTES_PG_DSN | pgvector: PostgreSQL URI. In Docker Compose, the service sets DATABASE_URL for the API from ASTROCYTES_REST_DATABASE_URL or a built-in ...@postgres:5432/... DSN (see docker-compose.yml). Do not point the API container at a host-only URL (127.0.0.1:published_port) via a generic DATABASE_URL in .env. |
| MIGRATE_DATABASE_URL | Optional host-side DSN for migrate.sh / runbook-up.sh (typically 127.0.0.1 + POSTGRES_PUBLISH_PORT). |

See astrocyte-services-py/astrocyte-rest/README.md for run instructions, HTTP route summary (including GET /live, GET /health/live, GET /health), and Docker commands.

4.5 Local Compose, Makefile, and troubleshooting

  • Runbook (recommended for durable schema): From astrocyte-services-py/, ./scripts/runbook-up.sh or make runbook starts Postgres, runs SQL migrations on the host port, then docker compose with the runbook overlay. Details: astrocyte-services-py/README.md (Runbook, Verify, Debugging).
  • Quick stack: make up or docker compose up --build uses in-container bootstrap DDL unless you add the runbook file; for HNSW / migration-owned DDL, use the runbook path.
  • Health checks: GET /live (or GET /health/live) confirms the process only; GET /health checks Astrocyte.health() (with pgvector, a real DB round-trip). If /live succeeds and /health fails or times out, inspect docker compose logs astrocyte-rest, printenv DATABASE_URL inside the API container, and the Debugging section of astrocyte-services-py/README.md.
  • Implementation note: The astrocyte-pgvector adapter uses psycopg 3 async pools with register_vector_async and explicit transaction boundaries in pool configure callbacks (provider-spi.md §7.1).

Align astrocyte-rest with §3: real backends, verified AuthN, AuthZ with grants, health/readiness, observability, hardened image, and the items in §5. Until then, run it only in trusted dev or demo environments.


5. Repository-specific follow-ups (astrocyte-rest)

  • Brain wiring: build_astrocyte() in astrocyte_rest/brain.py loads AstrocyteConfig and calls build_tier1_pipeline() in astrocyte_rest/wiring.py, which resolves vector_store, llm_provider, and optional graph/document stores via astrocyte._discovery.resolve_provider (entry points or package.module:Class). Built-in names ship in astrocyte-py pyproject.toml (in_memory, mock). A PostgreSQL pgvector adapter lives in astrocyte-pgvector (optional pgvector extra on astrocyte-rest).
  • Health routes: GET /live and GET /health/live (liveness); GET /health (readiness / dependency check with bounded timeout). Documented in astrocyte-rest/README.md.
  • pgvector + psycopg async: Pool configure uses register_vector_async, commits after configure, and registers vector types only when the vector extension is present (see provider-spi.md §7.1).
  • Grants from config: When access_control.enabled, grants are loaded from YAML (access_grants and banks.*.access) via access_grants_for_astrocyte() in astrocyte-py and set_access_grants() in astrocyte_rest/brain.py — see identity-and-external-policy.md §8. Not done here: grants from a database or external PDP (still your integration or future work).
  • Profiles: Keep PII disabled and access_control.enabled = False only for explicit dev profile; document prod profile requirements.
  • MCP / CLI: If astrocyte-mcp is shipped, align its security model with the HTTP service (same AuthN story). See mcp-server.md.
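
The "Profiles" item can be enforced at startup rather than by convention. The sketch below shows one way to fail fast when a production profile is combined with development settings. The profile name `"prod"`, the function `prod_profile_violations`, and the flat dict shape of the config are assumptions for illustration; only the setting names access_control.enabled, ASTROCYTES_AUTH_MODE, and the built-in in_memory store come from this document.

```python
def prod_profile_violations(profile: str, config: dict, auth_mode: str) -> list[str]:
    """Return reasons the given settings are unsafe for a production profile.

    Hypothetical helper: `config` stands in for the parsed YAML as a plain dict.
    """
    if profile != "prod":
        return []  # dev and demo profiles are allowed to use relaxed defaults
    violations: list[str] = []
    if not config.get("access_control", {}).get("enabled", False):
        violations.append("access_control.enabled must be true in prod")
    if auth_mode == "dev":
        violations.append("ASTROCYTES_AUTH_MODE=dev trusts X-Astrocyte-Principal")
    if config.get("vector_store", "in_memory") == "in_memory":
        violations.append("in_memory vector store is not durable")
    return violations


if __name__ == "__main__":
    problems = prod_profile_violations(
        "prod",
        {"access_control": {"enabled": False}, "vector_store": "in_memory"},
        auth_mode="dev",
    )
    for problem in problems:
        print("refusing to start:", problem)
```

A service would typically call this once before constructing Astrocyte and exit with a non-zero status when the list is non-empty.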

| Area | Owner | Date | Notes |
| --- | --- | --- | --- |
| Security review | | | |
| SLO agreed | | | |
| Launch readiness | | | |