# Production-grade HTTP service
This document describes how to operate Astrocyte behind an HTTP (or similar) API in production: security, durability, observability, and compliance. It applies whether you embed Astrocyte in your own service or use the optional `astrocyte-rest` reference HTTP service in this repository.
For the optional inbound edge and where a gateway sits relative to the core, see `architecture-framework.md` §3. For principals and banks, see `access-control.md`. For identity and external policy engines, see `identity-and-external-policy.md`. For outbound HTTP to LLMs and proxies, see `outbound-transport.md`.
Checklists: Items below use `- [ ]` so they render as checkboxes in GitHub, GitLab, VS Code, and many Markdown previews. Copy the doc or track items in your issue tracker.
## 1. Audience

### Implementing your own production service

Use the implementation checklist (§3) as the primary guide. You own the process (FastAPI, gRPC, Lambda, etc.), TLS termination, identity, and scaling. The Astrocyte core remains a library; your service maps authenticated callers to `AstrocyteContext` and invokes `retain` / `recall` / `reflect` / `forget`.

### Using or hardening the reference REST service (`astrocyte-rest`)

The repository includes `astrocyte-services-py/astrocyte-rest/` as an optional REST process built on `astrocyte-py`. §4 documents its purpose, layout, configuration, local Compose / Makefile / troubleshooting (§4.5), and how it maps to this checklist. §5 lists repository-specific follow-ups for that package.
## 2. Architecture reminder

An HTTP API is not part of the Astrocyte core contract. Typical production shape:

- Edge: API gateway or ingress (TLS, coarse rate limits, JWT or API-key validation).
- Application: Your service (or the reference `astrocyte-rest` package) constructs `Astrocyte`, attaches Tier 1 or Tier 2 providers, and passes an opaque `principal` on each call after AuthN.

The framework enforces AuthZ on memory banks when access control is configured. Do not treat unauthenticated client headers as proof of identity.
## 3. Implementation checklist

Track completion in your issue tracker or PRs as needed.

### 3.0 Scope and definition of done

- [ ] Product decision: Confirm target environments (Kubernetes, VM, single-tenant SaaS, air-gapped) and SLOs (availability, p95 latency, error budget).
- [ ] Support boundary: Document what is supported (API versions, auth modes, backends) versus experimental.
- [ ] Threat model: Document trust zones (internet, VPC, mesh), who may call the API, and what data may appear in logs.
### 3.1 Memory backends and durability (not in-memory)

- [ ] Tier 1: Use production VectorStore adapters (and optional GraphStore / DocumentStore) per `provider-spi.md`. Avoid in-process-only stores for durable memory. This repository includes an optional `astrocyte-pgvector` adapter (PostgreSQL + pgvector; `docker-compose.yml` under `astrocyte-services-py/` runs API + Postgres); wire it via the config entry point `pgvector` and the same `astrocyte_rest/wiring.py` resolution path as other Tier 1 stores. Compose networking: the API container must use a DSN with the Postgres service hostname (`postgres`), not a host-only URL such as `127.0.0.1:5433`. The repo's Compose file sets `DATABASE_URL` from `ASTROCYTES_REST_DATABASE_URL` or builds `...@postgres:5432/...` from `POSTGRES_*`; use `MIGRATE_DATABASE_URL` (or a one-off `DATABASE_URL` only for the shell) for host-side `migrate.sh`, not for the in-cluster API; see `astrocyte-services-py/.env.example`.
- [ ] Tier 2 (optional): If using a memory engine provider, wire EngineProvider and validate capability negotiation (`reflect`, `forget`, etc.).
- [ ] LLM / embeddings: Use a real LLMProvider (and embedding path) appropriate to latency and cost.
- [ ] Configuration: Load provider entry points from config (YAML/env) with validation; fail fast on missing required settings in prod.
- [ ] Data durability: Define backup, restore, and RPO/RTO for each store; test restore drills.
- [ ] Migrations: If stores require schema migrations, own a migration process (job or init container) and versioning. For `astrocyte-pgvector`, use the shipped SQL files and `psql` runner (`migrate.sh`); set `bootstrap_schema: false` in `vector_store_config` after applying migrations.
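The Tier 1 wiring above can be expressed in the YAML loaded via `ASTROCYTES_CONFIG_PATH`. The fragment below is an illustrative sketch only; key names such as `dsn` and `llm_provider` are assumptions here, so verify every key against your actual `load_config` schema before relying on it. The `pgvector` entry point, `vector_store_config`, and `bootstrap_schema: false` are the names this document uses.

```yaml
# Hypothetical config fragment for a durable Tier 1 pipeline.
vector_store: pgvector            # entry point from the astrocyte-pgvector package
vector_store_config:
  # In Compose, the DSN must use the Postgres service hostname, not 127.0.0.1.
  dsn: ${DATABASE_URL}
  bootstrap_schema: false         # set false once migrate.sh has applied the SQL files
llm_provider: mock                # replace with a real provider in prod
```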
### 3.2 Authentication (AuthN) - do not trust client-supplied principals

- [ ] Remove or gate dev behavior: A header such as `X-Astrocyte-Principal` must not be the only identity in production unless behind verified mTLS or a trusted gateway that strips spoofed headers.
- [ ] Choose and implement one primary mode:
  - JWT (OIDC): validate issuer, audience, signature, `exp`; map claims to `AstrocyteContext.principal`.
  - API keys: hashed keys in store, rotation, per-key scopes.
  - mTLS: client certificates mapped to principals.
- [ ] Service-to-service: Use workload identity (SPIFFE, IAM) where applicable.
- [ ] Anonymous / public: Explicitly forbid, or allow only for specific routes; document the risk.
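The JWT checks in the list above (signature, `exp`, issuer, audience, then `sub` → principal) can be sketched with the standard library alone. This is an illustrative HS256 verifier, not the reference service's code; in production prefer a maintained library (for example PyJWT) and, for OIDC, JWKS-based asymmetric keys.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url_decode(seg: str) -> bytes:
    # JWT segments are base64url without padding; restore padding before decoding.
    return base64.urlsafe_b64decode(seg + "=" * (-len(seg) % 4))


def verify_hs256(token: str, secret: bytes, issuer: str, audience: str) -> str:
    """Validate signature, iss, aud, and exp; return the principal (sub) or raise."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise PermissionError("bad signature")
    claims = json.loads(b64url_decode(payload_b64))
    aud = claims.get("aud")
    auds = [aud] if isinstance(aud, str) else (aud or [])
    if claims.get("iss") != issuer or audience not in auds:
        raise PermissionError("bad iss/aud")
    if claims.get("exp", 0) < time.time():
        raise PermissionError("expired")
    return claims["sub"]
```

Reject on any failed check; never fall back to a client-supplied header when the token is invalid.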
### 3.3 Authorization (AuthZ) and tenancy

- [ ] Enable framework access control: `access_control` on with an explicit policy (not `open` by default in prod).
- [ ] Grants: Load AccessGrant (or equivalent) from config, database, or an external PDP (OPA, Cerbos, Casbin) per `access-control.md`.
- [ ] Bank isolation: Enforce `bank_id` scoping; prevent cross-tenant IDOR (validate that the bank belongs to the tenant/principal).
- [ ] Admin paths: Separate privileged operations (if any) with stricter checks.
- [ ] Optional external PDP: Integrate AccessPolicyProvider when enterprise policy engines are required (`identity-and-external-policy.md`).
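The bank-isolation item above amounts to a fail-closed check before any store call. A minimal sketch, assuming a hypothetical in-memory grant table (in practice the grants come from config, a database, or a PDP as the checklist says):

```python
# Hypothetical grant table: principal -> set of bank_ids it may access.
GRANTS: dict[str, set[str]] = {
    "tenant-a:svc": {"bank-a1", "bank-a2"},
    "tenant-b:svc": {"bank-b1"},
}


def authorize_bank(principal: str, bank_id: str) -> None:
    """Reject cross-tenant access (IDOR) before touching any store; fail closed."""
    if bank_id not in GRANTS.get(principal, set()):
        raise PermissionError(f"principal {principal!r} has no grant for bank {bank_id!r}")
```

Validating the grant on every request, not only at session start, keeps revocations effective.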
### 3.4 Network, TLS, and edge

- [ ] Memory API exfiltration: Treat Astrocyte like any sensitive backend: bind AuthN at the Backend for Frontend (BFF) (do not let sandboxed clients freely choose a production principal or bank); combine with a network egress policy for agent workloads so recall results cannot be relayed to the public internet; see `sandbox-awareness-and-exfiltration.md` and “Let’s discuss sandbox isolation”.
- [ ] TLS: Terminate TLS at the ingress/gateway or in-process; HSTS and a modern cipher policy where browsers are involved.
- [ ] Private networking: Prefer private subnets / mesh; no public exposure without WAF / rate limits as needed.
- [ ] Request limits: Max body size, header size, URL length; timeouts on client and server.
- [ ] CORS: If browser clients exist, restrict origins; avoid `*` in production.
- [ ] Rate limiting: Edge (API gateway) + align with Astrocyte homeostasis / quotas in config (`policy-layer.md`).
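Edge rate limiting is usually the gateway's job, but an in-process fallback is cheap insurance. A minimal per-principal token bucket sketch (illustrative only; the constants are assumptions to size from your own load tests):

```python
import time


class TokenBucket:
    """Minimal token bucket: `rate` tokens/second refill up to a `burst` cap."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate, self.burst = rate, burst
        self.tokens, self.last = float(burst), time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller should answer 429 with a Retry-After hint
```

Keep one bucket per principal (or per API key) in a dict, and still enforce the coarse limit at the gateway.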
### 3.5 API design and contract

- [ ] Versioning: Keep `/v1` (or header versioning) with a compatibility policy; deprecate with lead time.
- [ ] OpenAPI: Publish a stable OpenAPI spec; generate clients if useful; CI diff on API changes.
- [ ] Pagination / limits: Cap `max_results`, `max_tokens`, and content size consistently; return truncation flags where applicable.
- [ ] Idempotency: For `retain`, define idempotency keys or a dedup strategy for retries (align with framework behavior).
- [ ] Error shape: Stable JSON error body (`code`, `message`, `request_id`); no stack traces to clients in prod.
- [ ] Health endpoints:
  - Liveness: process up without hitting backends (reference `astrocyte-rest`: `GET /live` or `GET /health/live`).
  - Readiness: dependencies reachable (reference: `GET /health` runs `Astrocyte.health()` → vector store check; with pgvector that implies PostgreSQL connectivity). Use a bounded server-side timeout for deep checks so load balancers do not hang; expect 503 if dependencies fail or exceed the timeout.
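The bounded readiness check described above can be sketched framework-agnostically with `asyncio.wait_for`; the function names here are illustrative, not the reference service's actual handlers:

```python
import asyncio


async def readiness(check_dependencies, timeout_s: float = 2.0) -> tuple[int, dict]:
    """Run the deep dependency check under a hard timeout.

    Returns (status_code, body) so the HTTP layer can answer 200 or 503;
    the bound guarantees the load balancer never hangs on a stuck backend.
    """
    try:
        await asyncio.wait_for(check_dependencies(), timeout=timeout_s)
    except Exception as exc:  # timeout or dependency failure -> not ready
        return 503, {"code": "not_ready", "message": type(exc).__name__}
    return 200, {"code": "ok"}
```

Keep the server-side timeout below the probe's own `timeoutSeconds` so the probe sees a clean 503 instead of a connection timeout.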
### 3.6 Observability

- [ ] Structured logs: JSON logs with request_id, principal (hashed or opaque id if sensitive), bank_id, route, latency, status; PII redaction policy.
- [ ] Metrics: RED (rate, errors, duration) per route; saturation (CPU, memory, queue depth); dependency health.
- [ ] Tracing: OpenTelemetry traces across HTTP and outbound calls (LLM, stores); optional `astrocyte` OTel extras (`astrocyte-py` optional dependencies).
- [ ] Dashboards and alerts: SLO-based alerts (burn rate) on error rate and latency.
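The structured-log item can be met with a plain stdlib `logging.Formatter`; this sketch (field names match the checklist, but the class is illustrative, not shipped code) emits one JSON object per line and whitelists the contextual fields passed via `extra=`:

```python
import json
import logging

# Contextual fields we allow through; hash or opaque the principal if sensitive.
CONTEXT_FIELDS = ("request_id", "principal", "bank_id", "route", "status", "latency_ms")


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON line with whitelisted context fields."""

    def format(self, record: logging.LogRecord) -> str:
        doc = {"level": record.levelname, "msg": record.getMessage()}
        # `extra={...}` kwargs land as attributes on the record.
        doc.update({k: v for k, v in record.__dict__.items() if k in CONTEXT_FIELDS})
        return json.dumps(doc)
```

Usage: attach it to a stream handler and log with `logger.info("recall ok", extra={"request_id": rid, "bank_id": bank})`; redact PII before the record is created, not in the formatter.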
### 3.7 Reliability and resilience

- [ ] Graceful shutdown: Honor SIGTERM; drain in-flight requests within a bounded time (uvicorn/gunicorn settings).
- [ ] Timeouts: Per-route and per-outbound-call timeouts; avoid unbounded `await` chains.
- [ ] Retries: Safe retries for read paths only, unless idempotent writes are guaranteed.
- [ ] Circuit breaking: Align gateway and framework escalation / circuit-breaker settings with dependency behavior.
- [ ] Overload: Backpressure (503 + `Retry-After`) when saturated; consider a queue or load-shedding policy.
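The overload item (shed load instead of queueing unboundedly) can be sketched with a semaphore that caps in-flight requests; the cap here is an illustrative constant you would size from load tests, and the handler shape is hypothetical:

```python
import asyncio

MAX_IN_FLIGHT = 2  # illustrative; derive from load testing (§3.11)
_slots = asyncio.Semaphore(MAX_IN_FLIGHT)


async def handle(request_work) -> tuple[int, dict]:
    """Answer 503 + Retry-After immediately when saturated, instead of queueing."""
    if _slots.locked():  # no free slot: shed this request now
        return 503, {"Retry-After": "1"}
    async with _slots:
        await request_work()
        return 200, {}
```

The immediate rejection keeps latency bounded under overload and lets the gateway or client back off; pair it with the edge rate limits from §3.4.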
### 3.8 Security hardening (application and supply chain)

- [ ] Dependencies: Pin versions; SBOM; automated CVE scanning on images and lockfiles.
- [ ] Secrets: No secrets committed to repos; use a secret manager / K8s secrets / IAM roles.
- [ ] Process: Run as non-root in containers; read-only root filesystem where possible; drop capabilities.
- [ ] Headers: Strip or validate forwarded headers; prevent header injection in logs.
- [ ] Input validation: Reject unexpected JSON fields if strict mode is desired; validate `bank_id` format and length.
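The input-validation item can be sketched with the standard library (frameworks like Pydantic give you the same in strict mode). The allowed field set and the `bank_id` pattern below are assumptions for illustration; use your real API schema and ID format:

```python
import json
import re

# Illustrative request schema; replace with your actual retain body fields.
ALLOWED_FIELDS = {"bank_id", "content", "max_results"}
# Assumed bank_id format: lowercase alphanumeric plus - and _, max 64 chars.
BANK_ID_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{0,63}$")


def parse_retain_body(raw: bytes) -> dict:
    """Strictly parse a retain request: reject unknown fields and bad bank ids."""
    body = json.loads(raw)
    unknown = set(body) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    if not BANK_ID_RE.match(body.get("bank_id", "")):
        raise ValueError("invalid bank_id")
    return body
```

Strict rejection of unknown fields also catches client/server contract drift early, complementing the contract tests in §3.12.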
### 3.9 Container image and runtime

- [ ] Base image: Minimal, pinned digest; regular rebuilds for CVE patches.
- [ ] Multi-stage build: Optional, to reduce attack surface and size.
- [ ] Image signing: Cosign / cluster policy to verify signatures.
- [ ] Resource limits: CPU/memory requests and limits; avoid OOM under load.
- [ ] Probes: Kubernetes liveness and readiness using the endpoints from §3.5.
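The probes item maps directly onto a container spec. This fragment is a sketch: the paths `/live` and `/health` come from §3.5, but the port and timing values are assumptions to adapt to your deployment.

```yaml
# Probe sketch for the container spec of a Deployment; port and timings assumed.
livenessProbe:
  httpGet:
    path: /live          # process-only check, no backend calls
    port: 8000
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health        # deep check: Astrocyte.health() behind a bounded timeout
    port: 8000
  timeoutSeconds: 3      # keep this >= the server-side deep-check timeout
  failureThreshold: 3
```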
### 3.10 Data governance and compliance

- [ ] Classification: Tag banks or payloads per data class if required (`data-governance.md`).
- [ ] Retention and legal hold: Align with `memory-lifecycle.md`; block deletes when under hold.
- [ ] Audit log: Immutable audit stream for retain/forget/admin actions (who, when, bank, outcome).
- [ ] Residency: Ensure stores and LLM regions meet data residency requirements.
- [ ] PII: Configure barriers / PII policy deliberately; document false-positive handling (`policy-layer.md`).
### 3.11 Performance and capacity

- [ ] Load testing: Establish p95/p99 under target QPS and payload sizes; test cold start and warm pools.
- [ ] Connection pools: HTTP clients to LLMs and DBs use bounded pools and sane keep-alive settings.
- [ ] Horizontal scaling: Stateless replicas; sticky sessions only if you introduce local-only state (avoid).
- [ ] Cost controls: Token and spend caps aligned with homeostasis and product limits.
### 3.12 Testing and quality gates

- [ ] Contract tests: HTTP API vs OpenAPI; backward-compatibility tests on `/v1`.
- [ ] Integration tests: Real (or containerized) dependencies in CI for critical paths.
- [ ] Security tests: AuthZ tests (tenant A cannot read tenant B), fuzzing on inputs, OWASP API checks.
- [ ] Chaos / failure injection: Dependency failures (DB down, LLM timeout) produce expected errors and no data corruption.
### 3.13 Operations and lifecycle

- [ ] Runbooks: Incident response, key rotation, scaling, failover, and release rollback.
- [ ] Configuration rollout: Safe rollout of config changes (feature flags or staged deploys).
- [ ] Backup and DR: Documented restore tests; RPO/RTO validated annually or on change.
- [ ] On-call: Alert routing and escalation paths tied to SLOs.
## 4. Reference REST service (astrocyte-rest)

This repository ships an optional `astrocyte-rest` HTTP service that embeds `astrocyte-py`. It is a convenience for development and a starting point for a custom deployment; it is not production-ready by default.
### 4.1 Purpose

- Demonstrate mapping HTTP JSON bodies to `Astrocyte` calls (`retain`, `recall`, `reflect`, `forget`) and the optional `X-Astrocyte-Principal` header.
- Provide a Dockerfile so the stack can be run locally or in a sandbox without writing a host app.
### 4.2 Current behavior (non-production defaults)

- Tier 1 pipeline resolved from config: defaults are the `in_memory` vector store and `mock` LLM (entry points on `astrocyte-py`). Optional YAML / env can select other registered providers or `module:Class` paths (for example `pgvector` after installing `astrocyte-pgvector`). Data is not durable when using the built-in in-memory stack.
- Access control defaults to off when no config file is loaded; when you enable `access_control` in YAML, `access_grants` and `banks.*.access` are loaded from config and applied via `set_access_grants` (see §5). You still need a deliberate prod policy; do not rely on defaults.
- Identity: `ASTROCYTES_AUTH_MODE` selects `dev` (trusts `X-Astrocyte-Principal` only; use only behind a trusted gateway), `api_key`, or `jwt_hs256` (Bearer JWT, `sub` → principal). This is a starting point for §3.2, not a full IdP integration (no OIDC discovery, JWKS, or per-key store yet).

Treat this as a reference implementation of the HTTP mapping only, not as a hardened product.
### 4.3 Layout and entrypoints

| Item | Location |
|---|---|
| Package | `astrocyte-services-py/astrocyte-rest/astrocyte_rest/` |
| FastAPI app factory | `app.py` - `create_app()` |
| Brain wiring | `brain.py` - `build_reference_astrocyte()` |
| CLI | `astrocyte-rest` (see `pyproject.toml`) |
| Container | `Dockerfile` (build from repository root; see README) |
| Compose (API + Postgres) | `docker-compose.yml` |
| Runbook overlay (migrations + `bootstrap_schema: false`) | `docker-compose.runbook.yml`, `config.runbook.example.yaml` |
| One-shot local deploy | `scripts/runbook-up.sh` |
| Common commands | `Makefile` (`make runbook`, `make up`, `make health`, …) |
### 4.4 Configuration surface (today)

| Variable | Role |
|---|---|
| `ASTROCYTES_HOST` / `ASTROCYTES_PORT` | Bind address and port |
| `ASTROCYTES_CONFIG_PATH` | Optional YAML loaded via `load_config` for policy/homeostasis; the in-memory Tier 1 pipeline is still attached in the reference implementation |
| `DATABASE_URL` / `ASTROCYTES_PG_DSN` | pgvector: PostgreSQL URI. In Docker Compose, the service sets `DATABASE_URL` for the API from `ASTROCYTES_REST_DATABASE_URL` or a built-in `...@postgres:5432/...` DSN (see `docker-compose.yml`). Do not point the API container at a host-only URL (`127.0.0.1:published_port`) via a generic `DATABASE_URL` in `.env`. |
| `MIGRATE_DATABASE_URL` | Optional host-side DSN for `migrate.sh` / `runbook-up.sh` (typically `127.0.0.1` + `POSTGRES_PUBLISH_PORT`) |

See `astrocyte-services-py/astrocyte-rest/README.md` for run instructions, the HTTP route summary (including `GET /live`, `GET /health/live`, `GET /health`), and Docker commands.
### 4.5 Local Compose, Makefile, and troubleshooting

- Runbook (recommended for durable schema): From `astrocyte-services-py/`, `./scripts/runbook-up.sh` or `make runbook` starts Postgres, runs SQL migrations on the host port, then runs `docker compose` with the runbook overlay. Details: `astrocyte-services-py/README.md` (Runbook, Verify, Debugging).
- Quick stack: `make up` or `docker compose up --build` uses in-container bootstrap DDL unless you add the runbook file; for HNSW / migration-owned DDL, use the runbook path.
- Health checks: `GET /live` (or `GET /health/live`) confirms the process only; `GET /health` checks `Astrocyte.health()` (with pgvector, a real DB round-trip). If `/live` succeeds but `/health` fails or times out, inspect `docker compose logs astrocyte-rest`, `printenv DATABASE_URL` inside the API container, and the Debugging section of `astrocyte-services-py/README.md`.
- Implementation note: The `astrocyte-pgvector` adapter uses psycopg 3 async pools with `register_vector_async` and explicit transaction boundaries in pool `configure` callbacks (`provider-spi.md` §7.1).
### 4.6 Path to production for astrocyte-rest

Align `astrocyte-rest` with §3: real backends, verified AuthN, AuthZ with grants, health/readiness, observability, a hardened image, and the items in §5. Until then, run it only in trusted dev or demo environments.
## 5. Repository-specific follow-ups (astrocyte-rest)

- Brain wiring: `build_astrocyte()` in `astrocyte_rest/brain.py` loads `AstrocyteConfig` and calls `build_tier1_pipeline()` in `astrocyte_rest/wiring.py`, which resolves `vector_store`, `llm_provider`, and optional graph/document stores via `astrocyte._discovery.resolve_provider` (entry points or `package.module:Class`). Built-in names ship in the `astrocyte-py` `pyproject.toml` (`in_memory`, `mock`). A PostgreSQL `pgvector` adapter lives in `astrocyte-pgvector` (optional `pgvector` extra on `astrocyte-rest`).
- Health routes: `GET /live` and `GET /health/live` (liveness); `GET /health` (readiness / dependency check with a bounded timeout). Documented in `astrocyte-rest/README.md`.
- pgvector + psycopg async: The pool `configure` callback uses `register_vector_async`, commits after `configure`, and registers vector types only when the `vector` extension is present (see `provider-spi.md` §7.1).
- Grants from config: When `access_control.enabled`, grants are loaded from YAML (`access_grants` and `banks.*.access`) via `access_grants_for_astrocyte()` in `astrocyte-py` and `set_access_grants()` in `astrocyte_rest/brain.py`; see `identity-and-external-policy.md` §8. Not done here: grants from a database or external PDP (still your integration or future work).
- Profiles: Keep PII `disabled` and `access_control.enabled = False` only for an explicit dev profile; document prod profile requirements.
- MCP / CLI: If `astrocyte-mcp` is shipped, align its security model with the HTTP service (same AuthN story). See `mcp-server.md`.
## 6. Sign-off (optional)

| Area | Owner | Date | Notes |
|---|---|---|---|
| Security review | | | |
| SLO agreed | | | |
| Launch readiness | | | |