Personalise a voice agent

Give your voice agent persistent memory of each caller (names, preferences, past calls) with retrieval in roughly 30ms or less on a warm cache.

Warning
Latency is everything in voice. A 200ms delay before TTS is audible. The patterns below keep Anansi off your critical path.

Architecture

1. Call connects → pre-warm memory
   Immediately fetch GET /v1/context?userId=callerId (no query). This primes the Redis cache so the first-turn retrieval is served from the warm path.

2. User speaks → STT + memory in parallel
   Start transcription and, as soon as transcript text is available, fetch /v1/context?userId=callerId&q=transcript concurrently with any remaining post-STT work, so the two finish at around the same time.

3. Build system prompt → LLM → TTS
   Inject the memory profile into the system prompt. Generate the response with Claude. Feed it to ElevenLabs or OpenAI TTS.

4. Ingest the turn (background)
   Fire-and-forget POST /v1/ingest after TTS starts, so ingestion is never on the critical path.
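
The ordering of the four steps can be sketched as one function with injected dependencies. This is a minimal illustration, not Anansi's SDK: the `TurnDeps` interface and all of its member names are hypothetical stand-ins for your own memory, LLM, TTS and ingest calls.

```typescript
// Hypothetical dependency bundle; every name here is an assumption for illustration.
interface TurnDeps {
  fetchMemory: (callerId: string, query: string) => Promise<string[]>;
  generateReply: (memory: string[], transcript: string) => Promise<string>;
  // Assumed to resolve once audio has started streaming to the caller.
  startTts: (reply: string) => Promise<void>;
  ingest: (callerId: string, transcript: string, reply: string) => Promise<void>;
}

export async function runTurn(
  deps: TurnDeps,
  callerId: string,
  transcript: string
): Promise<string> {
  // Step 2: memory lookup, using the transcript as the query.
  const memory = await deps.fetchMemory(callerId, transcript);

  // Step 3: build the prompt, call the LLM, start TTS.
  const reply = await deps.generateReply(memory, transcript);
  await deps.startTts(reply);

  // Step 4: fire-and-forget. The promise is not awaited and errors are
  // swallowed, so a slow or failing ingest never delays the caller.
  void deps.ingest(callerId, transcript, reply).catch(() => {});

  return reply;
}
```

Because the ingest call is started only after `startTts` resolves and is never awaited, a failure in the memory write path cannot surface as dead air on the call.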

Pre-warm at session start

When the call connects, before the user says anything, kick off a context fetch with no query. This loads the synthesized profile into Redis cache (TTL 60s) so every turn in the call hits the fast path.

```typescript
// voice-session.ts
const ANANSI_URL = "https://anansimemory.com";
const ANANSI_KEY = process.env.ANANSI_API_KEY!;

// Call this the moment the phone call connects
export async function onCallStart(callerId: string) {
  // Pre-warm: fetches profile into cache, ~80ms on first call, ~15ms after
  const profile = await fetch(
    `${ANANSI_URL}/v1/context?userId=${encodeURIComponent(callerId)}`,
    { headers: { Authorization: `Bearer ${ANANSI_KEY}` } }
  ).then((r) => r.json());

  const isReturning = profile.static.length > 0;
  return { profile, isReturning };
}
```

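If call setup should not wait on the pre-warm request, one option is to start the fetch and keep the in-flight promise for the first turn to reuse. The sketch below assumes the response has the `static` and `dynamic` string arrays used elsewhere on this page; `preWarm`, `getPreWarmed`, `endCall` and the injected `ProfileFetcher` are hypothetical names, not part of the Anansi API.

```typescript
// Response shape inferred from the examples on this page (an assumption).
type Profile = { static: string[]; dynamic: string[] };
// Hypothetical wrapper around GET /v1/context, injected so it can be stubbed.
type ProfileFetcher = (callerId: string) => Promise<Profile>;

const EMPTY_PROFILE: Profile = { static: [], dynamic: [] };
const inFlight = new Map<string, Promise<Profile>>();

// Start the pre-warm without awaiting it, so call setup is not blocked.
export function preWarm(callerId: string, fetchProfile: ProfileFetcher): void {
  // A failed pre-warm degrades to an empty profile instead of rejecting.
  const pending = fetchProfile(callerId).catch(() => EMPTY_PROFILE);
  inFlight.set(callerId, pending);
}

// The first turn can await the same promise instead of issuing a new request.
export function getPreWarmed(callerId: string): Promise<Profile> {
  return inFlight.get(callerId) ?? Promise.resolve(EMPTY_PROFILE);
}

// Drop the entry when the call ends so the map does not grow without bound.
export function endCall(callerId: string): void {
  inFlight.delete(callerId);
}
```
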
Per-turn implementation

```typescript
// voice-turn.ts
import Anthropic from "@anthropic-ai/sdk";

// Same values as in voice-session.ts; defined here so this file runs on its own.
const ANANSI_URL = "https://anansimemory.com";
const ANANSI_KEY = process.env.ANANSI_API_KEY!;

const anthropic = new Anthropic();

export async function handleVoiceTurn(
  callerId: string,
  transcript: string, // from STT
  history: { role: "user" | "assistant"; content: string }[],
  sessionId: string
) {
  // Fetch memory with the transcript as query. Run any other post-STT work
  // concurrently with this request rather than before it.
  const memory = await fetch(
    `${ANANSI_URL}/v1/context?userId=${encodeURIComponent(callerId)}&q=${encodeURIComponent(transcript)}`,
    { headers: { Authorization: `Bearer ${ANANSI_KEY}` } }
  ).then((r) => r.json());

  const isReturning = memory.static.length > 0;

  const system = [
    isReturning
      ? `You remember this caller. Their name may appear in the facts below.`
      : `You are a friendly voice assistant. This is a new caller.`,
    "Keep every response under 2 sentences. Natural, conversational tone.",
    ...memory.static.map((f: string) => `Fact: ${f}`),
    ...memory.dynamic.map((d: string) => `Recent: ${d}`),
  ].join("\n");

  const response = await anthropic.messages.create({
    model: "claude-haiku-4-5-20251001",
    system,
    messages: [...history, { role: "user", content: transcript }],
    max_tokens: 100, // short replies = faster TTS
  });

  const reply = response.content[0].type === "text" ? response.content[0].text : "";

  // Fire-and-forget ingest: not awaited, so the reply is never delayed.
  // Errors are deliberately ignored.
  fetch(`${ANANSI_URL}/v1/ingest`, {
    method: "POST",
    headers: { Authorization: `Bearer ${ANANSI_KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      userId: callerId,
      content: `Caller: ${transcript}\nAgent: ${reply}`,
      sourceType: "voice",
      sessionId,
    }),
  }).catch(() => {});

  return reply;
}
```

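The per-turn memory fetch is awaited before the LLM call, so a slow response would sit on the critical path. One way to bound it is a timeout with a fallback value. The helper below is a sketch under that assumption; `withTimeout` is a hypothetical name and the 100ms figure in the usage note is illustrative, not a documented limit.

```typescript
// Resolve with `fallback` if `work` takes longer than `ms` or rejects.
export function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  return new Promise<T>((resolve) => {
    const timer = setTimeout(() => resolve(fallback), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      () => {
        // Treat a failed fetch the same as a slow one.
        clearTimeout(timer);
        resolve(fallback);
      }
    );
  });
}
```

For example, wrapping the context request as `withTimeout(fetchPromise, 100, { static: [], dynamic: [] })` would let the turn proceed without memory rather than stall the caller.
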
Latency targets

| Operation | Warm cache | Cold cache |
| --- | --- | --- |
| GET /v1/context (no query) | ~15ms | ~80ms |
| GET /v1/context (with query) | ~30ms | ~150ms |
| Claude Haiku (100 tokens) | ~300ms | ~300ms |
| ElevenLabs streaming TTS | ~200ms first audio | ~200ms first audio |

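Pre-warming at call start means every per-turn fetch hits the warm path. Assuming the stages run back to back, time-to-first-audio is roughly 30 + 300 + 200 = 530ms, which stays under 600ms. A cold query fetch would push it to about 650ms.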
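
To check the budget against your own measurements, the per-stage targets can be summed in a few lines. This sketch assumes the stages run sequentially and uses the approximate figures from this section; `Stage`, `timeToFirstAudio` and `withinBudget` are hypothetical names.

```typescript
type Stage = { name: string; warmMs: number; coldMs: number };

// Approximate targets from this section; replace with measured values.
export const STAGES: Stage[] = [
  { name: "GET /v1/context (with query)", warmMs: 30, coldMs: 150 },
  { name: "Claude Haiku (100 tokens)", warmMs: 300, coldMs: 300 },
  { name: "ElevenLabs streaming TTS, first audio", warmMs: 200, coldMs: 200 },
];

// Sum the stages for the chosen cache path, assuming they run back to back.
export function timeToFirstAudio(stages: Stage[], path: "warm" | "cold"): number {
  return stages.reduce(
    (total, s) => total + (path === "warm" ? s.warmMs : s.coldMs),
    0
  );
}

export function withinBudget(stages: Stage[], path: "warm" | "cold", budgetMs: number): boolean {
  return timeToFirstAudio(stages, path) <= budgetMs;
}
```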

Caller identification strategies

  • Phone number (+14155551234) — always available via Twilio or Vonage. Best for anonymous callers who call back.
  • Account ID — if callers enter a PIN to authenticate before the agent starts, use their account ID as userId.
  • Email / username — for authenticated web-based voice interfaces (browser microphone).
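
Whichever strategy you use, the same caller should always map to the same userId. The sketch below shows one possible way to pick and normalise an identifier. The priority order (account ID, then email, then phone number) and the `CallerSignals` and `pickUserId` names are assumptions for illustration, not Anansi requirements.

```typescript
// Hypothetical bundle of whatever identifiers your telephony or web layer provides.
interface CallerSignals {
  accountId?: string;
  email?: string;
  phone?: string;
}

// Returns a stable userId, or null if nothing usable is available.
export function pickUserId(signals: CallerSignals): string | null {
  // Prefer the authenticated account ID when present.
  const accountId = signals.accountId?.trim();
  if (accountId) return accountId;

  // Lowercase emails so "Ada@Example.com" and "ada@example.com" share memory.
  const email = signals.email?.trim().toLowerCase();
  if (email) return email;

  // Keep only digits and "+" so formatting differences do not split a caller.
  const phone = signals.phone?.replace(/[^\d+]/g, "");
  if (phone) return phone;

  return null;
}
```

With this sketch, `pickUserId({ phone: "+1 (415) 555-1234" })` yields `"+14155551234"`, the same form as the phone number example above.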