Presentation layer: multimodal, voice, and video services
This document clarifies how Astrocyte relates to commercial AI products that are not text LLM backends - conversational video avatars, synthetic voice, real-time video APIs - and how to integrate them alongside the LLM Provider SPI. For multimodal LLM inputs (image/audio in chat completions with GPT-4o, Claude, Gemini, etc.), see multimodal-llm-spi.md. For the base SPI, see provider-spi.md section 4.
1. What the LLM Provider SPI covers
Section titled “1. What the LLM Provider SPI covers”Astrocyte needs complete() (chat-style messages → text) and embed() (texts → vectors) for the built-in pipeline and policies. Adapters such as astrocyte-llm-litellm (and the built-in openai provider with a custom base_url) route those calls to direct vendor SDKs, multi-provider gateways (LiteLLM, Portkey, OpenRouter, cloud routers, …), or any OpenAI-compatible HTTP surface your deployment standardizes on.
That scope does not include:
- Real-time video conversation sessions
- Text-to-speech or voice cloning as primary products
- Avatar rendering or lip-sync pipelines
- Video generation from scripts alone (non-chat)
Those are different APIs and products. They are not drop-in replacements for LLMProvider, but they compose with Astrocyte in full applications.
2. Landscape: Tavus and similar products
Section titled “2. Landscape: Tavus and similar products”2.1 Conversational video / digital human (closest to Tavus)
Section titled “2.1 Conversational video / digital human (closest to Tavus)”These platforms emphasize real-time or scripted video with AI personas, developer APIs, and often REST or WebRTC flows. They overlap on “AI face + voice + dialogue” but differ on pricing, enterprise features, and API shape.
| Category | Examples (non-exhaustive) | Notes |
|---|---|---|
| Conversational video / CVI | Tavus, HeyGen, D-ID, Hour One, Synthesia (often enterprise), ReachOut.AI, various digital human APIs | APIs are usually conversation / video / avatar oriented, not “replace OpenAI for all complete() calls.” |
| Video-first marketing / personalization | Tools in the same space as above; feature emphasis varies (marketing vs support vs L&D). |
Tavus positions on real-time conversational video with replicas and personas; many “alternatives” compete on avatar quality, API access, or batch video rather than identical CVI semantics. Treat names as research starting points, not guarantees of feature parity.
2.2 Voice and audio (related but not the same category as Tavus)
Section titled “2.2 Voice and audio (related but not the same category as Tavus)”| Category | Examples | Relation to Tavus |
|---|---|---|
| Voice / TTS / conversational voice | ElevenLabs, PlayHT, Cartesia, Amazon Polly, Google Cloud TTS | ElevenLabs is a strong competitor on AI voice (TTS, agents, dubbing), not a full video replica CVI in the same sense as Tavus. Pairing ElevenLabs (voice) + separate avatar/video is a common architecture. |
| STT / telephony | Deepgram, AssemblyAI, Twilio Voice, etc. | Input/output paths for voice apps; orthogonal to memory. |
2.3 Where LLM gateways still help (LiteLLM, OpenRouter, Portkey, …)
Section titled “2.3 Where LLM gateways still help (LiteLLM, OpenRouter, Portkey, …)”Unified gateways and aggregators (LiteLLM, OpenRouter, Portkey, Vercel AI Gateway, cloud model routers, …) excel at text models (OpenAI, Anthropic, Gemini, Mistral, open models, …). If a video platform lets you bring your own model via an OpenAI-compatible base URL (some document “custom LLM” onboarding), that same LLM can often be configured as llm_provider: litellm or openai + base_url for Astrocyte - provided the endpoint implements chat completions (and optionally embeddings) the adapter expects. That is “shared brain” between Astrocyte and the video stack, not “Tavus as the LLMProvider” unless Tavus exposes such an endpoint to your backend for arbitrary calls.
3. How to integrate: architectural patterns
Section titled “3. How to integrate: architectural patterns”3.1 Recommended split: memory (Astrocyte) vs presentation (vendor)
Section titled “3.1 Recommended split: memory (Astrocyte) vs presentation (vendor)”User / client │ ├─► Application orchestrator │ │ │ ├─► Astrocyte brain.retain / recall / reflect │ │ └─► LLMProvider → gateway (LiteLLM, OpenRouter, …) or direct OpenAI / Anthropic / … │ │ (text completion + embeddings for pipeline) │ │ │ └─► Presentation service API (HTTP / WebRTC / SDK) │ └─► Tavus / HeyGen / D-ID / ElevenLabs / … │ (video session, avatar, TTS audio stream) │ └─► Optional: same “persona” config in both layers (your product contract)- Astrocyte holds durable memory and runs retrieval + governance using the LLM SPI.
- Tavus-class systems handle how the agent is seen and heard in real time or in rendered video.
Integration is orchestration in your app, not a second LLMProvider for the video API.
3.2 When a vendor exposes OpenAI-compatible text APIs
Section titled “3.2 When a vendor exposes OpenAI-compatible text APIs”If the commercial product documents a chat completions (and optionally embeddings) endpoint compatible with your adapter:
# Example: custom OpenAI-compatible endpoint used by Astrocyte onlyllm_provider: openaillm_provider_config: base_url: https://your-vendor-or-proxy.example/v1 api_key: ${VENDOR_API_KEY} model: vendor-text-modelOr via a gateway adapter (e.g. litellm) when the model is listed or proxied there — the same pattern applies to other aggregators you expose through an OpenAI-compatible surface your adapter supports:
llm_provider: litellmllm_provider_config: model: openai/your-model # or provider-specific slug per your gateway’s docs base_url: https://...Use this only when the API truly matches message-in, text-out (and embed if needed). Do not point base_url at a video-only REST root expecting it to behave like /v1/chat/completions.
3.3 Feeding memory into presentation (composer pattern)
Section titled “3.3 Feeding memory into presentation (composer pattern)”Treat your application orchestrator (often a Backend for Frontend or API service) as the composer: it assembles what the presentation vendor sees—Astrocyte supplies durable semantic memory (retain / recall / reflect / forget); the vendor SDK or REST API supplies the conversation surface (room, avatar, realtime audio/video). Astrocyte does not fetch your CRM, billing DB, or identity store unless you retain derived text there; canonical business facts normally live in your system of record and are read at compose time under your authorization rules.
Typical end-to-end pattern:
- Composer loads allowed facts from your SoR (profile, entitlements, tickets, …) and retrieves from Astrocyte via
recall()orreflect()(or both—see latency trade-offs in your product). - Composer renders a single briefing string (or vendor-specific fields) with explicit ordering, truncation, and precedence (e.g. SoR wins on conflicting subscription tier).
- Composer starts or updates the vendor session using their API (session URL,
conversational_context, script payload, custom LLM URL—per vendor docs). - New information from the session is
retain()’d through explicit tools, async summarization, or webhooks, always through policy andAstrocyteContext(principal + bank binding at the server—seesandbox-awareness-and-exfiltration.md).
SoR vs Astrocyte
Section titled “SoR vs Astrocyte”| Source | Role in the composed briefing |
|---|---|
| System of record (your databases, CRM, support tools) | Authoritative operational data; read through the composer with field-level allowlists and TTL / freshness rules you own. |
| Astrocyte | Learned / recalled agent memory (user-stated preferences, past conversation distilled facts, RAG you chose to retain) with governance (PII, quotas, access grants, lifecycle). |
Do not assume the video platform merges these sources; composition is your contract.
Fusing SoR snippets with memory in one recall
Section titled “Fusing SoR snippets with memory in one recall”When you want vector memory and external snippets fused under one token budget, use RecallRequest.external_context: pass curated SoR (or RAG) hits as MemoryHit rows alongside the query. The pipeline / provider recall path blends them with stored memory—see innovations.md (cross-source fusion) and built-in-pipeline.md (external_context and detail_level). Your composer still fetches SoR data and maps it to MemoryHit; Astrocyte remains the memory engine, not the CRM client. If your integration uses the high-level brain.recall(query, bank_id=…) helper only, merge SoR snippets with recall hits in the composer before rendering the briefing until your stack wires external_context through that entry point.
Example: Tavus Conversational Video Interface (CVI)
Section titled “Example: Tavus Conversational Video Interface (CVI)”Tavus is representative of CVI products: server-side session creation, client-side realtime UI.
Server (composer), holding the Tavus API key only here:
- Build
conversational_contextfrom briefing blocks (SoR summaries + Astrocyterecall/reflectoutput), capped for latency and model context. POST https://tavusapi.com/v2/conversationswithpersona_id,replica_id, and optionalmemory_stores(vendor-native cross-session memory, orthogonal to Astrocyte unless you design sync).- Return
conversation_url(andmeeting_tokenif using private rooms) to the browser; never expose Tavus or Astrocyte secrets to the client.
Client (CVI / Daily): Tavus does not execute LLM tool calls on your behalf. Listen for conversation.tool_call on the realtime channel, call your HTTPS endpoint, run retain / recall / forget on Astrocyte, then inject compact results with conversation.append_llm_context or conversation.echo per Tavus tool-calling guidance.
# Illustrative: composer builds a briefing, then opens a CVI session (names differ per vendor).# Merge Astrocyte hits with SoR snippets in the composer (or use RecallRequest.external_context# on a pipeline recall(RecallRequest) path—see built-in-pipeline.md).recall_result = await brain.recall( "What should we remember for this support call?", bank_id=bank_id, max_results=5, context=ctx,)sor_blocks = [f"Plan: {user_row['plan_tier']}"] # fetch + allowlist in your SoR layerbriefing = render_briefing(sor_blocks=sor_blocks, astrocyte_hits=recall_result.hits)
# Pseudocode: Tavus create conversation — see Tavus API reference for full schema.# await tavus_client.create_conversation(# persona_id="...",# replica_id="...",# conversational_context=briefing,# memory_stores=[f"{user_id}_{persona_id}"],# )
# On tool execution or async pipeline:await brain.retain( f"User said: {user_utterance}", metadata={"source": "tavus_cvi", "session_id": sid}, bank_id=bank_id, context=ctx,)3.4 ElevenLabs (voice) alongside Astrocyte
Section titled “3.4 ElevenLabs (voice) alongside Astrocyte”Use Astrocyte + text LLM for reasoning and memory; use ElevenLabs for TTS or voice agent audio for the same turn:
LLM (via Astrocyte pipeline) → text reply → ElevenLabs TTS → audio to userMemory writes still go through retain() based on text you choose to persist (user message, assistant message, or summaries), not the raw audio.
4. Summary
Section titled “4. Summary”| Question | Answer |
|---|---|
| Does LLM-gateway integration (LiteLLM-style routing) “include” Tavus / HeyGen / ElevenLabs automatically? | No - those are not generic chat/embedding backends for the SPI unless they expose compatible text APIs you configure explicitly. |
| Are they competitors? | Partially. Many video avatar platforms compete with Tavus on CVI or marketing video; ElevenLabs competes on voice, often as a component next to video or text agents. |
| How does Astrocyte “allow” integration? | Composition: memory and text LLM via LLMProvider; presentation via vendor SDKs/REST in your app. Optional shared OpenAI-compatible LLM URL when the vendor supports it. |
Who merges CRM / DB facts with recall for a CVI session? | Your application composer (BFF / API). Astrocyte holds learned memory; your system of record stays authoritative for operational data unless you retain derived text into Astrocyte. Use RecallRequest.external_context on pipeline recall where supported, or merge SoR snippets with brain.recall() hits before rendering the vendor briefing. |
For vendor-specific steps, always follow their API reference (sessions, webhooks, custom LLM URLs). Astrocyte stays stable on retain / recall / reflect and text + embeddings for intelligence and governance.