Agent Memory with Redis Search - Upstash Documentation

Large language models are stateless: once a request returns, the model forgets everything. To build an agent that remembers who a user is and what happened in past conversations, you need to store that context yourself and feed it back into the prompt. In this tutorial we build a small but complete agent memory layer on Upstash Redis, with two tiers:

Working memory: the running conversation for the current session, stored in a single Redis key with a TTL so it expires on its own.
Long-term memory: durable facts about the user (preferences, events, decisions) stored as JSON documents and recalled with Redis Search full-text queries.

On every turn the agent recalls relevant long-term memories, answers using those plus the recent conversation, then remembers any new facts worth keeping.

This tutorial uses OpenAI for the chat and fact-extraction calls, but the memory layer itself is model-agnostic, so swap in any LLM you like.

Prerequisites

An Upstash Redis database (the REST URL and token).
An OpenAI API key.

Install the dependencies:

TypeScript:

npm install @upstash/redis openai

Python:

pip install upstash-redis openai

Set your environment variables:

UPSTASH_REDIS_REST_URL="https://..."
UPSTASH_REDIS_REST_TOKEN="..."
OPENAI_API_KEY="sk-..."
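
A missing variable usually surfaces later as a confusing client error, so it can help to fail fast at startup. The sketch below assumes only the three variable names listed above; `missing_env` is a hypothetical helper, not part of either SDK.

```python
import os

REQUIRED_VARS = (
    "UPSTASH_REDIS_REST_URL",
    "UPSTASH_REDIS_REST_TOKEN",
    "OPENAI_API_KEY",
)


def missing_env(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Check a candidate environment without touching the real one.
print(missing_env({"OPENAI_API_KEY": "sk-test"}))
```

In a real entry point you might call `missing_env()` with no arguments and exit with a clear message if the returned list is non-empty.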

Step 1: Create the long-term memory index

Long-term memories are JSON documents stored under the memory: prefix. We index the text field for full-text recall, and keep userId and kind as exact-match keywords so we can scope a search to a single user. createdAt is a sortable number we can use to favor recent memories. Create the index once (e.g. in a setup script), not on every request.

TypeScript first, then Python:

// setup.ts
import { Redis, s } from "@upstash/redis";

const redis = Redis.fromEnv();

try {
  await redis.search.createIndex({
    name: "memories",
    dataType: "json",
    prefix: "memory:",
    schema: s.object({
      text: s.string(),        // full-text searchable fact
      userId: s.keyword(),     // exact-match owner
      kind: s.keyword(),       // "preference" | "event" | "fact" ...
      createdAt: s.number(),   // epoch ms, sortable
    }),
  });
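} catch (err) {
  // Most likely the index already exists, which is safe to ignore when re-running setup.
  // Log anyway so credential or network errors are not silently swallowed.
  console.warn("createIndex skipped:", err);
}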

# setup.py
from upstash_redis import Redis

redis = Redis.from_env()

redis.search.create_index(
    name="memories",
    data_type="json",
    prefixes="memory:",
    exists_ok=True, # idempotent: don't error if the index already exists
    schema={
        "text": "TEXT",        # full-text searchable fact
        "userId": "KEYWORD",   # exact-match owner
        "kind": "KEYWORD",     # "preference" | "event" | "fact" ...
        "createdAt": "F64",    # epoch ms, sortable
    },
)
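
To make the schema concrete, here is a sketch of one document the index would cover. The `memory:{userId}:{id}` key layout and the four field names come from this tutorial; `build_memory` itself is a hypothetical helper, and the sample values are made up.

```python
import time
import uuid


def build_memory(user_id, text, kind="fact", now_ms=None):
    """Return (key, document) for one long-term memory."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # epoch milliseconds, matching createdAt
    key = f"memory:{user_id}:{uuid.uuid4().hex}"
    doc = {"text": text, "userId": user_id, "kind": kind, "createdAt": now_ms}
    return key, doc


key, doc = build_memory(
    "user-1", "Is vegetarian", kind="preference", now_ms=1_700_000_000_000
)
print(key)  # memory:user-1:<random hex id>
print(doc)
```

Any key that starts with `memory:` and holds a JSON value of this shape falls under the index's prefix, which is why the write path in Step 4 needs no explicit indexing call.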

Step 2: Working (short-term) memory

Working memory is just the recent message history for a session. We store it as a single JSON value with a one-hour TTL and cap it at the last 20 messages so the prompt stays small. Every save rewrites the key with a fresh TTL, so the hour is measured from the most recent message; once the session goes quiet, Redis expires the key for us.

TypeScript first, then Python:

// memory.ts
import { Redis } from "@upstash/redis";

const redis = Redis.fromEnv();

export type Message = { role: "user" | "assistant"; content: string };

const SESSION_TTL = 60 * 60; // 1 hour
const MAX_MESSAGES = 20;

export async function loadHistory(sessionId: string): Promise<Message[]> {
  return (await redis.get<Message[]>(`chat:${sessionId}`)) ?? [];
}

export async function saveHistory(sessionId: string, messages: Message[]) {
  const trimmed = messages.slice(-MAX_MESSAGES);
  await redis.set(`chat:${sessionId}`, trimmed, { ex: SESSION_TTL });
}

# memory.py
import json
from upstash_redis import Redis

redis = Redis.from_env()

SESSION_TTL = 60 * 60  # 1 hour
MAX_MESSAGES = 20


def load_history(session_id: str) -> list[dict]:
    raw = redis.get(f"chat:{session_id}")
    return json.loads(raw) if raw else []


def save_history(session_id: str, messages: list[dict]) -> None:
    trimmed = messages[-MAX_MESSAGES:]
    redis.set(f"chat:{session_id}", json.dumps(trimmed), ex=SESSION_TTL)
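
The trim and the sliding expiry are easier to see without a network call. The sketch below swaps the real client for `FakeRedis`, a hypothetical in-memory stand-in that mimics only `get` and `set(..., ex=...)` with a manually advanced clock; the two history functions mirror the ones above but take the store as a parameter.

```python
import json

SESSION_TTL = 60 * 60  # 1 hour, as in the tutorial
MAX_MESSAGES = 20


class FakeRedis:
    """In-memory stand-in for GET and SET with `ex` (illustration only)."""

    def __init__(self):
        self.now = 0.0   # fake clock, in seconds
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, ex):
        self._data[key] = (value, self.now + ex)

    def get(self, key):
        value, expires_at = self._data.get(key, (None, 0.0))
        return value if self.now < expires_at else None


def save_history(store, session_id, messages):
    trimmed = messages[-MAX_MESSAGES:]
    store.set(f"chat:{session_id}", json.dumps(trimmed), ex=SESSION_TTL)


def load_history(store, session_id):
    raw = store.get(f"chat:{session_id}")
    return json.loads(raw) if raw else []


store = FakeRedis()
turns = [{"role": "user", "content": f"message {i}"} for i in range(25)]
save_history(store, "s1", turns)
kept = load_history(store, "s1")     # only the newest 20 survive the trim

store.now = 50 * 60                  # 50 minutes later the key is still alive
save_history(store, "s1", kept)      # saving again restarts the one-hour TTL
store.now = 100 * 60                 # 50 minutes after the second save
alive = load_history(store, "s1")    # still present
store.now = 111 * 60                 # 61 minutes after the second save
expired = load_history(store, "s1")  # gone, so we get an empty history
```

The practical consequence is that an active session never expires mid-conversation, while an abandoned one disappears an hour after its last turn.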

Step 3: Recall relevant memories

To answer well, the agent needs the long-term facts that relate to the current message. We run a full-text query against the memories index, scoped to the user with the userId keyword. Redis Search ranks matches by relevance, so we take the top few.

TypeScript first, then Python:

const memories = redis.search.index({ name: "memories" });

export async function recall(
  userId: string,
  query: string,
  limit = 5,
): Promise<string[]> {
  const results = await memories.query({
    filter: { text: query, userId },
    limit,
  });

  // No memories yet → the index may not exist → results is null
  return (results ?? []).map((r) => r.data.text as string);
}

memories = redis.search.index(name="memories")


def recall(user_id: str, query: str, limit: int = 5) -> list[str]:
    results = memories.query(filter={"text": query, "userId": user_id}, limit=limit)

    # No memories yet → the index may not exist → results is None
    return [r.data["text"] for r in (results or [])]

To bias recall toward recent memories, you can boost the score with the createdAt field using a score function, or sort with orderBy / order_by. We keep plain relevance ranking here for simplicity.
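
If you would rather not rely on a server-side score function, one alternative is to fetch a few extra hits and re-rank them on the client. The sketch below is not the Search score-function API: `Hit` is a hypothetical stand-in for a result that exposes a relevance score and a `createdAt` timestamp, and the 30-day half-life is an arbitrary choice.

```python
import math
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000


@dataclass
class Hit:
    """Hypothetical stand-in for one search result."""
    text: str
    score: float
    created_at: int  # epoch ms


def rerank_by_recency(hits, now_ms, half_life_days=30.0, limit=5):
    """Scale each relevance score by an exponential decay on age, then sort."""

    def boosted(hit):
        age_days = max(0, now_ms - hit.created_at) / DAY_MS
        return hit.score * math.pow(0.5, age_days / half_life_days)

    return sorted(hits, key=boosted, reverse=True)[:limit]


now = 1_700_000_000_000
hits = [
    Hit("Liked sushi", score=6.0, created_at=now - 90 * DAY_MS),   # older, more relevant
    Hit("Is vegetarian", score=4.0, created_at=now - 1 * DAY_MS),  # newer, less relevant
]
top = rerank_by_recency(hits, now)
print([h.text for h in top])  # ['Is vegetarian', 'Liked sushi']
```

With a 30-day half-life the 90-day-old hit keeps one eighth of its score, so the fresher memory wins despite a lower raw relevance. A longer half-life moves the ranking back toward pure relevance.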

Step 4: Remember new facts

After each exchange we ask the model to extract durable facts: things worth remembering across sessions, not small talk. Each fact becomes a JSON document under the memory: prefix, so the index picks it up automatically. Because full-text search gives us a cheap similarity check, we deduplicate before writing: if a very similar memory already exists for this user, we skip it. That matters here because remember receives the whole working memory on every turn, so the same facts tend to be extracted more than once.

TypeScript first, then Python:

import OpenAI from "openai";

const openai = new OpenAI();

// Heuristic: full-text scores are unbounded, so this threshold is tuned by feel.
const DEDUPE_SCORE = 8;

async function alreadyKnown(userId: string, text: string): Promise<boolean> {
  const hits = await memories.query({ filter: { text, userId }, limit: 1 });
  return !!hits?.length && hits[0].score > DEDUPE_SCORE;
}

export async function remember(userId: string, conversation: Message[]) {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          "Extract durable facts about the user worth remembering across " +
          "sessions (preferences, decisions, personal details). Ignore " +
          'small talk. Respond as JSON: {"facts": ["..."]}. Empty if none.',
      },
      { role: "user", content: JSON.stringify(conversation) },
    ],
  });

  // Default to an empty list so a reply without a "facts" key doesn't break the loop.
  const { facts = [] } = JSON.parse(completion.choices[0].message.content ?? '{"facts":[]}');

  for (const text of facts as string[]) {
    if (await alreadyKnown(userId, text)) continue;
    const id = crypto.randomUUID();
    await redis.json.set(`memory:${userId}:${id}`, "$", {
      text,
      userId,
      kind: "fact",
      createdAt: Date.now(),
    });
  }
}

import json
import uuid
import time
from openai import OpenAI

openai = OpenAI()

# Heuristic: full-text scores are unbounded, so this threshold is tuned by feel.
DEDUPE_SCORE = 8


def already_known(user_id: str, text: str) -> bool:
    hits = memories.query(filter={"text": text, "userId": user_id}, limit=1)
    return bool(hits) and hits[0].score > DEDUPE_SCORE


def remember(user_id: str, conversation: list[dict]) -> None:
    completion = openai.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "Extract durable facts about the user worth remembering "
                    "across sessions (preferences, decisions, personal details). "
                    "Ignore small talk. Respond as JSON: {\"facts\": [\"...\"]}. "
                    "Empty if none."
                ),
            },
            {"role": "user", "content": json.dumps(conversation)},
        ],
    )

    # Default to an empty list so a reply without a "facts" key doesn't raise KeyError.
    facts = json.loads(completion.choices[0].message.content or '{"facts":[]}').get("facts", [])

    for text in facts:
        if already_known(user_id, text):
            continue
        memory_id = uuid.uuid4().hex
        redis.json.set(
            f"memory:{user_id}:{memory_id}",
            "$",
            {
                "text": text,
                "userId": user_id,
                "kind": "fact",
                "createdAt": int(time.time() * 1000),
            },
        )
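
JSON mode constrains the reply to valid JSON, not to a particular shape, so the extraction step can still return something unexpected. A more defensive parser, sketched below, keeps the write loop from failing on a missing key or a non-string entry. `parse_facts` is a hypothetical helper, and dropping case-insensitive duplicates within one batch is an added assumption that goes beyond the search-based dedupe above.

```python
import json


def parse_facts(content):
    """Return a clean list of fact strings from the model's JSON reply."""
    try:
        payload = json.loads(content or "{}")
    except json.JSONDecodeError:
        return []
    facts = payload.get("facts") if isinstance(payload, dict) else None
    if not isinstance(facts, list):
        return []

    seen = set()
    cleaned = []
    for fact in facts:
        if not isinstance(fact, str):
            continue
        text = " ".join(fact.split())  # collapse runs of whitespace
        key = text.casefold()
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return cleaned


reply = '{"facts": ["Is vegetarian", " is  vegetarian ", 3, "", "Loves spicy food"]}'
print(parse_facts(reply))  # ['Is vegetarian', 'Loves spicy food']
```

In `remember`, the result of `parse_facts(completion.choices[0].message.content)` could replace the direct `json.loads` call, with the rest of the loop unchanged.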

Step 5: The chat loop

Now we wire it together. Each turn: recall relevant memories, build a prompt from those plus the working memory, call the model, persist the updated history, and remember new facts.

TypeScript first, then Python:

export async function chat(userId: string, sessionId: string, input: string) {
  const [history, recalled] = await Promise.all([
    loadHistory(sessionId),
    recall(userId, input),
  ]);

  const system =
    "You are a helpful assistant. Use the following remembered facts about " +
    `the user when relevant:\n${recalled.map((m) => `- ${m}`).join("\n") || "(none yet)"}`;

  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: system },
      ...history,
      { role: "user", content: input },
    ],
  });

  const reply = completion.choices[0].message.content ?? "";

  const updated: Message[] = [
    ...history,
    { role: "user", content: input },
    { role: "assistant", content: reply },
  ];

  await saveHistory(sessionId, updated);
  await remember(userId, updated); // fire-and-forget in production

  return reply;
}

def chat(user_id: str, session_id: str, user_input: str) -> str:
    history = load_history(session_id)
    recalled = recall(user_id, user_input)

    facts = "\n".join(f"- {m}" for m in recalled) or "(none yet)"
    system = (
        "You are a helpful assistant. Use the following remembered facts "
        f"about the user when relevant:\n{facts}"
    )

    completion = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system},
            *history,
            {"role": "user", "content": user_input},
        ],
    )

    reply = completion.choices[0].message.content or ""

    updated = history + [
        {"role": "user", "content": user_input},
        {"role": "assistant", "content": reply},
    ]

    save_history(session_id, updated)
    remember(user_id, updated)  # run in the background in production

    return reply
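
The comments on the `remember` calls suggest moving extraction off the request path in production. One way to do that in Python is a small thread pool, sketched below under the assumption that losing an occasional extraction is acceptable. `remember_in_background` is a hypothetical wrapper, and the stub `remember` stands in for the real function from Step 4.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

log = logging.getLogger("agent-memory")
_executor = ThreadPoolExecutor(max_workers=2, thread_name_prefix="remember")

stored = []


def remember(user_id, conversation):
    """Stub standing in for the real remember() from Step 4."""
    stored.append((user_id, len(conversation)))


def _log_failure(future):
    # Surface exceptions that would otherwise vanish with the worker thread.
    error = future.exception()
    if error is not None:
        log.warning("remember failed: %s", error)


def remember_in_background(user_id, conversation):
    """Schedule remember() without blocking the reply; returns the Future."""
    # Copy the list so later mutations by the caller don't affect the worker.
    future = _executor.submit(remember, user_id, list(conversation))
    future.add_done_callback(_log_failure)
    return future


future = remember_in_background("user-1", [{"role": "user", "content": "hi"}])
future.result(timeout=5)  # only for the demo; a request handler would not wait
```

This works for a long-running server process. On platforms that freeze or stop the process once the response is sent, a background thread may never finish, so a queue or the platform's own background-task mechanism is likely a better fit there.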

Try it

Run two sessions for the same user. Even after the first session’s working memory expires, the facts learned there are recalled in the second:

TypeScript first, then Python:

await chat("user-1", "session-a", "I'm vegetarian and I love spicy food.");

// Redis Search indexes writes asynchronously, wait so the demo is deterministic.
await memories.waitIndexing();

// ...a brand new session...
const reply = await chat("user-1", "session-b", "Suggest a dinner for me.");
console.log(reply); // recalls "vegetarian" + "spicy" from long-term memory

chat("user-1", "session-a", "I'm vegetarian and I love spicy food.")

# Redis Search indexes writes asynchronously, wait so the demo is deterministic.
memories.wait_indexing()

# ...a brand new session...
reply = chat("user-1", "session-b", "Suggest a dinner for me.")
print(reply)  # recalls "vegetarian" + "spicy" from long-term memory

Redis Search indexes writes asynchronously: a JSON.SET returns before the document is searchable. For a deterministic demo or test, call waitIndexing() / wait_indexing() to block until pending updates are applied. In a real app the next user turn normally arrives later than the indexing window, so an explicit wait isn’t needed.
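
If a test should wait on an observable condition, such as recall returning at least one hit, rather than on the index itself, a generic polling helper is enough. `wait_until` below is hypothetical and not part of the SDK; `fake_recall` simulates a document that becomes searchable on the third attempt.

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.1):
    """Poll predicate() until it is truthy or the timeout elapses.

    Returns the last value predicate() produced, so callers can tell
    a successful wait from a timeout by checking truthiness.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result or time.monotonic() >= deadline:
            return result
        time.sleep(interval)


calls = {"n": 0}


def fake_recall():
    # Pretend the memory only becomes searchable on the third query.
    calls["n"] += 1
    return ["Is vegetarian"] if calls["n"] >= 3 else []


found = wait_until(fake_recall, timeout=2.0, interval=0.01)
print(found, calls["n"])  # ['Is vegetarian'] 3
```

With the real client, the predicate would be something like `lambda: recall("user-1", "vegetarian")`. The built-in `wait_indexing()` remains the simpler choice when you only need pending writes applied.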

How it fits together

Working memory lives under chat:{sessionId} with a TTL: fast to read, self-expiring, scoped to one conversation.
Long-term memory lives under memory:{userId}:{id} and is searchable across sessions through the memories index.
Recall uses full-text relevance to surface the facts that matter for the current message; remember extracts and deduplicates new ones.

Next steps

Have the extractor label each memory's kind (for example "preference" vs. "event") instead of always writing "fact", and filter recall by it.
Boost recent memories with a score function.
Summarize older working-memory messages instead of dropping them.
Stream the reply to a chat UI and animate it smoothly. See Smooth Text Streaming in AI SDK v5.
Learn more about what Redis Search can do in the Search docs.
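
As a starting point for the summarization idea, the sketch below folds everything except the most recent messages into a single summary message once the cap is exceeded. `compact_history` and its `summarize` callback are hypothetical; in practice `summarize` would be an LLM call, and storing the summary under a "system" role is an assumption that would require widening the Message type used in this tutorial.

```python
def compact_history(messages, summarize, max_messages=20, keep_recent=10):
    """Replace the oldest messages with one summary once the cap is exceeded."""
    if len(messages) <= max_messages:
        return list(messages)
    split = max(0, len(messages) - keep_recent)
    older, recent = messages[:split], messages[split:]
    summary = {
        "role": "system",
        "content": "Summary of earlier conversation: " + summarize(older),
    }
    return [summary, *recent]


def count_summary(older):
    # Stand-in for a model call that would write a real summary.
    return f"{len(older)} earlier messages"


history = [{"role": "user", "content": f"message {i}"} for i in range(25)]
compacted = compact_history(history, count_summary)
print(len(compacted))           # 11: one summary plus the 10 newest messages
print(compacted[0]["content"])  # Summary of earlier conversation: 15 earlier messages
```

On later turns the existing summary ends up among the older messages and is folded into the next summary, so the history stays bounded without discarding context outright.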
