Anansi Docs — Memory API for AI apps

Personalise a voice agent

Give your voice agent persistent memory of each caller — names, preferences, past calls — served from a warm cache so retrieval stays off your critical path.

Warning

Latency is everything in voice. A 200ms delay before TTS is audible. The patterns below keep Anansi off your critical path.

Architecture

Call connects → pre-warm memory

Immediately call GET /v1/context?userId=callerId (no query). This primes the Redis cache so the first-turn retrieval is served from cache.

User speaks → STT + memory in parallel

Start transcription and fetch /v1/context?userId=callerId&q=transcript concurrently. Both finish around the same time.

Build system prompt → LLM → TTS

Inject the memory profile. Generate response with Claude. Feed to ElevenLabs or OpenAI TTS.

Ingest the turn (background)

Fire-and-forget POST /v1/ingest after TTS starts — never on the critical path.

Pre-warm at session start

When the call connects, before the user says anything, kick off a context fetch with no query. This loads the synthesized profile into Redis cache (TTL 60s) so every turn in the call hits the fast path.

voice-session.ts (TypeScript)
const ANANSI_URL = "https://anansimemory.com";
const ANANSI_KEY = process.env.ANANSI_API_KEY!;

// Call this the moment the phone call connects
export async function onCallStart(callerId: string) {
  // Pre-warm: fetches the profile into the Redis cache so per-turn fetches hit the warm path
  const profile = await fetch(
    `${ANANSI_URL}/v1/context?userId=${encodeURIComponent(callerId)}`,
    { headers: { Authorization: `Bearer ${ANANSI_KEY}` } }
  ).then((r) => r.json());

  const isReturning = profile.static.length > 0;
  return { profile, isReturning };
}
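
The pre-warm code above reads profile.static directly, which would throw if the response body were an error payload or lacked that field. Below is a minimal sketch of a defensive parse, assuming only the response shape used on this page (static and dynamic arrays of strings). The names MemoryProfile, stringList and toProfile are hypothetical helpers, not part of the Anansi API.

```typescript
// Shape assumed from the examples on this page: two arrays of strings.
interface MemoryProfile {
  static: string[];
  dynamic: string[];
}

// Keep only string entries; a missing or malformed field becomes [].
function stringList(value: unknown): string[] {
  return Array.isArray(value)
    ? value.filter((v): v is string => typeof v === "string")
    : [];
}

// Hypothetical helper: coerce an arbitrary JSON body into a MemoryProfile,
// so a failed or malformed response is treated like a new caller.
export function toProfile(body: unknown): MemoryProfile {
  if (typeof body !== "object" || body === null) {
    return { static: [], dynamic: [] };
  }
  const record = body as Record<string, unknown>;
  return {
    static: stringList(record.static),
    dynamic: stringList(record.dynamic),
  };
}
```

With this in place, the pre-warm could end in `.then((r) => r.json()).then(toProfile)`, and isReturning stays a plain length check.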

Per-turn implementation

voice-turn.ts (TypeScript)
import Anthropic from "@anthropic-ai/sdk";

// Same constants as in voice-session.ts, repeated so this file stands alone
const ANANSI_URL = "https://anansimemory.com";
const ANANSI_KEY = process.env.ANANSI_API_KEY!;

const anthropic = new Anthropic();

export async function handleVoiceTurn(
  callerId: string,
  transcript: string,                              // from STT
  history: { role: "user" | "assistant"; content: string }[],
  sessionId: string
) {
  // Fetch memory with the transcript as the query; start this alongside any other post-STT work
  const memory = await fetch(
    `${ANANSI_URL}/v1/context?userId=${encodeURIComponent(callerId)}&q=${encodeURIComponent(transcript)}`,
    { headers: { Authorization: `Bearer ${ANANSI_KEY}` } }
  ).then((r) => r.json());

  // Build the system prompt from the caller's stored facts and recent context
  const isReturning = memory.static.length > 0;
  const system = [
    isReturning
      ? `You remember this caller. Their name may appear in the facts below.`
      : `You are a friendly voice assistant. This is a new caller.`,
    "Keep every response under 2 sentences. Natural, conversational tone.",
    ...memory.static.map((f: string) => `Fact: ${f}`),
    ...memory.dynamic.map((d: string) => `Recent: ${d}`),
  ].join("\n");

  const response = await anthropic.messages.create({
    model: "claude-haiku-4-5-20251001",
    system,
    messages: [...history, { role: "user", content: transcript }],
    max_tokens: 100,   // short replies = faster TTS
  });

  // Use the first content block if it is text; otherwise fall back to an empty reply
  const first = response.content[0];
  const reply = first?.type === "text" ? first.text : "";

  // Fire-and-forget ingest: not awaited, so it does not delay the reply.
  // To ingest strictly after TTS starts, move this call to where you hand the reply to TTS.
  fetch(`${ANANSI_URL}/v1/ingest`, {
    method: "POST",
    headers: { Authorization: `Bearer ${ANANSI_KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      userId: callerId,
      content: `Caller: ${transcript}\nAgent: ${reply}`,
      sourceType: "voice",
      sessionId,
    }),
  }).catch(() => {}); // ingest failures are deliberately ignored here

  return reply;
}
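
The prompt assembly in the handler above can be pulled into a pure function, which makes it testable without network access. The sketch below assumes the same static/dynamic response shape. The cap on lines per kind (MAX_LINES_PER_KIND) is a hypothetical addition to bound prompt size, not something Anansi requires.

```typescript
// Shape assumed from the per-turn example: two arrays of strings.
interface MemoryContext {
  static: string[];
  dynamic: string[];
}

// Hypothetical cap on injected lines; tune for your own prompt budget.
const MAX_LINES_PER_KIND = 10;

// Build the system prompt from memory. Pure: no network, no clock.
export function buildSystemPrompt(
  memory: MemoryContext,
  maxPerKind: number = MAX_LINES_PER_KIND
): string {
  const isReturning = memory.static.length > 0;
  return [
    isReturning
      ? "You remember this caller. Their name may appear in the facts below."
      : "You are a friendly voice assistant. This is a new caller.",
    "Keep every response under 2 sentences. Natural, conversational tone.",
    ...memory.static.slice(0, maxPerKind).map((f) => `Fact: ${f}`),
    ...memory.dynamic.slice(0, maxPerKind).map((d) => `Recent: ${d}`),
  ].join("\n");
}
```

A shorter system prompt also means fewer input tokens for the LLM to process on each turn.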

Keeping Anansi off the critical path

Latency depends on your deployment, your LLM, and your TTS provider, so we don't publish universal numbers. What you control is where Anansi sits in the turn:

Pre-warm at call start. A no-query GET /v1/context primes the Redis cache (60s TTL), so subsequent in-call fetches are served from cache rather than re-running retrieval.
Fetch memory in parallel with STT. Kick off the context fetch the moment transcription starts — both finish before you build the prompt, so retrieval adds no serial time.
Ingest off the response path. Fire-and-forget POST /v1/ingest after TTS starts. Ingest returns 202 immediately and embedding runs on a worker, so it never blocks the turn.

Net effect: on the per-turn hot path your agent hits a warm cache read, and the LLM + TTS providers — not Anansi — dominate time-to-first-audio.
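
The parallel-fetch pattern above can be sketched as a small helper that starts both tasks before waiting on either. This is an illustration under stated assumptions: startTurn, transcribe and fetchMemory are hypothetical names, and how you get query text before the final transcript exists (for example an interim transcript from a streaming STT provider) depends on your stack. The fallback to an empty profile on a failed memory fetch is also an assumption, chosen so that a memory outage degrades to a new-caller experience instead of failing the turn.

```typescript
// Shape assumed from the examples on this page.
interface MemoryContext {
  static: string[];
  dynamic: string[];
}

// Start STT and the memory fetch together, then wait for both.
// Total wait is roughly the slower of the two, not their sum.
export function startTurn(
  transcribe: () => Promise<string>,
  fetchMemory: () => Promise<MemoryContext>,
  fallback: MemoryContext
): Promise<[string, MemoryContext]> {
  // Both calls are issued here, before anything is awaited.
  const transcriptPromise = transcribe();
  // A failed memory fetch resolves to the fallback instead of rejecting.
  const memoryPromise = fetchMemory().catch(() => fallback);
  return Promise.all([transcriptPromise, memoryPromise]);
}
```

A transcription failure still rejects the returned promise, since there is no useful turn without a transcript.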

Caller identification strategies

Phone number (+14155551234) — always available via Twilio or Vonage. Best for anonymous callers who call back.
Account ID — if callers enter a PIN to authenticate before the agent starts, use their account ID as userId.
Email / username — for authenticated web-based voice interfaces (browser microphone).
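
Whichever strategy you pick, the same caller should always map to the same userId string, or their memory will be split across several profiles. Below is a minimal sketch of a normaliser, assuming phone numbers arrive with a leading "+" and country code as in the example above. CallerIdentity and toUserId are hypothetical names, and lower-casing emails is a common convention here, not an Anansi requirement.

```typescript
// Hypothetical union covering the three strategies listed above.
type CallerIdentity =
  | { kind: "phone"; number: string }
  | { kind: "account"; accountId: string }
  | { kind: "email"; email: string };

// Map an identity to a stable userId string.
export function toUserId(identity: CallerIdentity): string {
  switch (identity.kind) {
    case "phone": {
      const raw = identity.number.trim();
      // Without a country code we cannot safely build a canonical number.
      if (!raw.startsWith("+")) {
        throw new Error("phone number must include a leading + and country code");
      }
      // Drop spaces, dashes, dots and parentheses; keep digits only.
      return "+" + raw.replace(/\D/g, "");
    }
    case "account":
      return identity.accountId.trim();
    case "email":
      // Assumption: treat emails as case-insensitive.
      return identity.email.trim().toLowerCase();
  }
}
```

For example, `toUserId({ kind: "phone", number: "+1 (415) 555-1234" })` yields "+14155551234", which can be passed as callerId to the functions above.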