Personalise a voice agent
Give your voice agent persistent memory of each caller — names, preferences, past calls — with sub-30ms retrieval on warm cache.
Warning
Latency is everything in voice. A 200ms delay before TTS is audible. The patterns below keep Anansi off your critical path.
Architecture
1. Call connects → pre-warm memory
   Immediately fetch GET /v1/context?userId=callerId (no query). This primes the Redis cache so the first-turn retrieval is fast.
2. User speaks → STT + memory in parallel
   Overlap transcription with the /v1/context?userId=callerId&q=transcript fetch. The query needs transcript text, so issue the fetch as soon as STT produces it (a partial transcript, if your STT streams them), alongside any other post-STT work.
3. Build system prompt → LLM → TTS
   Inject the memory profile into the system prompt, generate the response with Claude, and feed it to ElevenLabs or OpenAI TTS.
4. Ingest the turn (background)
   Fire-and-forget POST /v1/ingest after TTS starts, never on the critical path.
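Steps 1 and 2 call the same endpoint and differ only in whether a query is attached. A minimal sketch of a shared URL builder follows; the endpoint and the userId and q parameters come from the steps above, while the helper name buildContextUrl and its signature are illustrative assumptions.

```typescript
// Hypothetical helper (not part of any SDK): builds the /v1/context URL for
// both the pre-warm call (no query) and the per-turn call (transcript as query).
function buildContextUrl(baseUrl: string, userId: string, query?: string): string {
  let url = `${baseUrl}/v1/context?userId=${encodeURIComponent(userId)}`;
  // Only attach q when there is real transcript text to search with.
  if (query !== undefined && query.trim() !== "") {
    url += `&q=${encodeURIComponent(query)}`;
  }
  return url;
}

// Step 1: pre-warm, no query
const prewarmUrl = buildContextUrl("https://anansimemory.com", "+14155551234");
// → https://anansimemory.com/v1/context?userId=%2B14155551234

// Step 2: per-turn fetch with the transcript as query
const turnUrl = buildContextUrl("https://anansimemory.com", "+14155551234", "book a table");
// → https://anansimemory.com/v1/context?userId=%2B14155551234&q=book%20a%20table
```

Encoding the caller ID matters for phone numbers: an unencoded "+" in a query string is commonly decoded as a space.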
Pre-warm at session start
When the call connects, before the user says anything, kick off a context fetch with no query. This loads the synthesized profile into the Redis cache (TTL 60s), so turns that arrive while the entry is still cached hit the fast path.
```typescript
// voice-session.ts
const ANANSI_URL = "https://anansimemory.com";
const ANANSI_KEY = process.env.ANANSI_API_KEY!;

// Call this the moment the phone call connects
export async function onCallStart(callerId: string) {
  // Pre-warm: fetches profile into cache, ~80ms on first call, ~15ms after
  const profile = await fetch(
    `${ANANSI_URL}/v1/context?userId=${encodeURIComponent(callerId)}`,
    { headers: { Authorization: `Bearer ${ANANSI_KEY}` } }
  ).then((r) => r.json());

  const isReturning = profile.static.length > 0;
  return { profile, isReturning };
}
```
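With a 60s TTL, a long pause mid-call could let the cached profile expire. The text does not say whether reads refresh the TTL, so the sketch below assumes they do not and tracks, per caller, when a context fetch last happened. WarmTracker, its methods, and the 5s safety margin are all hypothetical names and values chosen for illustration.

```typescript
const CACHE_TTL_MS = 60000; // TTL quoted in the text above

// Hypothetical tracker: records the time of the last context fetch per caller
// and reports when a fresh no-query pre-warm would be worthwhile.
class WarmTracker {
  private lastFetch = new Map<string, number>();

  // Call after every successful /v1/context fetch.
  markFetched(userId: string, nowMs: number): void {
    this.lastFetch.set(userId, nowMs);
  }

  // True when no fetch has been recorded, or when the last fetch is old
  // enough that the cache entry is within marginMs of expiring (or expired).
  needsRewarm(userId: string, nowMs: number, marginMs = 5000): boolean {
    const last = this.lastFetch.get(userId);
    return last === undefined || nowMs - last >= CACHE_TTL_MS - marginMs;
  }
}

const tracker = new WarmTracker();
tracker.markFetched("+14155551234", 0);
const warmAt10s = tracker.needsRewarm("+14155551234", 10000); // false
const staleAt58s = tracker.needsRewarm("+14155551234", 58000); // true
```

A call loop could check needsRewarm during silences and re-issue the no-query fetch in the background, keeping the next turn on the warm path.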
Per-turn implementation
```typescript
// voice-turn.ts
import Anthropic from "@anthropic-ai/sdk";

// Same values as in voice-session.ts, repeated so this file stands alone
const ANANSI_URL = "https://anansimemory.com";
const ANANSI_KEY = process.env.ANANSI_API_KEY!;

const anthropic = new Anthropic();

export async function handleVoiceTurn(
  callerId: string,
  transcript: string, // from STT
  history: { role: "user" | "assistant"; content: string }[],
  sessionId: string
) {
  // Fetch memory with the transcript as query, concurrent with any post-STT work
  const memory = await fetch(
    `${ANANSI_URL}/v1/context?userId=${encodeURIComponent(callerId)}&q=${encodeURIComponent(transcript)}`,
    { headers: { Authorization: `Bearer ${ANANSI_KEY}` } }
  ).then((r) => r.json());

  const isReturning = memory.static.length > 0;

  // Build the system prompt from the memory profile
  const system = [
    isReturning
      ? `You remember this caller. Their name may appear in the facts below.`
      : `You are a friendly voice assistant. This is a new caller.`,
    "Keep every response under 2 sentences. Natural, conversational tone.",
    ...memory.static.map((f: string) => `Fact: ${f}`),
    ...memory.dynamic.map((d: string) => `Recent: ${d}`),
  ].join("\n");

  const response = await anthropic.messages.create({
    model: "claude-haiku-4-5-20251001",
    system,
    messages: [...history, { role: "user", content: transcript }],
    max_tokens: 100, // short replies = faster TTS
  });

  const reply = response.content[0].type === "text" ? response.content[0].text : "";

  // Fire-and-forget ingest: not awaited, so it adds no latency to the reply.
  // Errors are deliberately ignored.
  fetch(`${ANANSI_URL}/v1/ingest`, {
    method: "POST",
    headers: { Authorization: `Bearer ${ANANSI_KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      userId: callerId,
      content: `Caller: ${transcript}\nAgent: ${reply}`,
      sourceType: "voice",
      sessionId,
    }),
  }).catch(() => {});

  return reply;
}
```
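The per-turn code awaits the memory fetch, so a slow or failed retrieval would delay the reply. One way to keep it off the critical path is to cap the wait and fall back to an empty profile. This is a sketch under assumptions: memoryWithinBudget, CallerMemory, and the budget value are hypothetical, and the static/dynamic shape is inferred from the code above.

```typescript
// Shape inferred from the per-turn code (memory.static, memory.dynamic).
type CallerMemory = { static: string[]; dynamic: string[] };

const EMPTY_MEMORY: CallerMemory = { static: [], dynamic: [] };

// Hypothetical helper: resolves with the fetched memory, or with EMPTY_MEMORY
// if the fetch rejects or takes longer than budgetMs. The underlying request
// is not cancelled; its late result is simply ignored.
async function memoryWithinBudget(
  fetchMemory: () => Promise<CallerMemory>,
  budgetMs: number
): Promise<CallerMemory> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<CallerMemory>((resolve) => {
    timer = setTimeout(() => resolve(EMPTY_MEMORY), budgetMs);
  });
  try {
    // Whichever settles first wins; a rejected fetch becomes EMPTY_MEMORY.
    return await Promise.race([fetchMemory().catch(() => EMPTY_MEMORY), timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

In handleVoiceTurn, the fetch(...).then((r) => r.json()) call could be wrapped in a function and passed in as fetchMemory. With an empty profile the prompt falls back to the new-caller wording, which trades personalisation for predictable latency on that turn.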
Latency targets
| Operation | Warm cache | Cold cache |
|---|---|---|
| GET /v1/context (no query) | ~15ms | ~80ms |
| GET /v1/context (with query) | ~30ms | ~150ms |
| Claude Haiku (100 tokens) | ~300ms | ~300ms |
| ElevenLabs streaming TTS | ~200ms first audio | ~200ms first audio |
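Pre-warming at call start means per-turn fetches hit the warm path. On that path the stages above sum to roughly 30 + 300 + 200 = 530ms, so time-to-first-audio, measured from the moment the transcript is available, stays under 600ms.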
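The budget can be checked with a few lines of arithmetic. The sketch below uses the approximate figures from the table and assumes the three stages run strictly one after another, with STT time excluded; the Stage type and totalLatency function are illustrative names.

```typescript
type Stage = { name: string; warmMs: number; coldMs: number };

// Approximate figures copied from the latency table above.
const stages: Stage[] = [
  { name: "GET /v1/context (with query)", warmMs: 30, coldMs: 150 },
  { name: "Claude Haiku (100 tokens)", warmMs: 300, coldMs: 300 },
  { name: "ElevenLabs streaming TTS (first audio)", warmMs: 200, coldMs: 200 },
];

// Sums one column of the table, assuming the stages run sequentially.
function totalLatency(list: Stage[], path: "warmMs" | "coldMs"): number {
  return list.reduce((sum, stage) => sum + stage[path], 0);
}

const warmTotal = totalLatency(stages, "warmMs"); // 530
const coldTotal = totalLatency(stages, "coldMs"); // 650
```

On these numbers the cold path overshoots a 600ms target by about 50ms, which is the gap the pre-warm is there to close.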
Caller identification strategies
- Phone number (+14155551234): always available via Twilio or Vonage. Best for anonymous callers who call back.
- Account ID: if callers enter a PIN to authenticate before the agent starts, use their account ID as userId.
- Email / username: for authenticated web-based voice interfaces (browser microphone).
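The strategies above can be combined into one resolver. The sketch below is illustrative only: CallerInfo and resolveUserId are hypothetical names, and preferring an authenticated identifier over a phone number is an assumption, since the list does not rank the options.

```typescript
// Hypothetical input: whatever identifiers the telephony or web layer provides.
type CallerInfo = { accountId?: string; email?: string; phoneNumber?: string };

// Picks a stable userId. Assumed priority: account ID, then email, then phone.
function resolveUserId(info: CallerInfo): string | null {
  if (info.accountId) return info.accountId;
  if (info.email) return info.email.trim().toLowerCase();
  if (info.phoneNumber) {
    // Keep only digits and "+", so differently formatted versions of the
    // same number map to one userId.
    const cleaned = info.phoneNumber.replace(/[^\d+]/g, "");
    return cleaned !== "" ? cleaned : null;
  }
  return null;
}

const fromPhone = resolveUserId({ phoneNumber: "+1 (415) 555-1234" });
// → "+14155551234"
const fromAccount = resolveUserId({ accountId: "acct_42", phoneNumber: "+14155551234" });
// → "acct_42"
```

Whichever identifier is chosen, it needs to be the same on every call; otherwise a returning caller's memories land under a different userId.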