Memory Router · Drop-in memory proxy for any LLM

Add persistent memory to any LLM with a single URL change

Memory Router is a transparent proxy that sits between your app and the model. Point your existing SDK at MemoryLake and every conversation gains long-term memory and an optimized context window. Run it two ways: bring your own provider key (BYOK), or use MemoryLake-hosted models with just one MemoryLake key.

Get your Router key · Read the docs →

Works with the OpenAI, Anthropic & Google SDKs you already use

The problem

Stateless APIs make you rebuild memory every time

Every LLM call is stateless. To fake continuity you re-send the entire history on every turn — which is slow, expensive, and eventually overflows the context window. Bolting on a vector DB and retrieval pipeline solves it, but it is weeks of plumbing you have to build and maintain.

Without a memory layer

Full chat history re-sent on every call — token cost climbs with conversation length.
Long sessions hit the context-window ceiling and start truncating mid-task.
Memory lives in one app — switch models or sessions and the context is gone.

Building it yourself

Stand up a vector DB, embeddings pipeline, chunking, and retrieval logic.
Write extraction, dedup, and relevance ranking — then keep it tuned.
Maintain it across every provider and every model you support.

Memory Router collapses all of that into one base-URL change. The memory layer is the proxy.

How it works

A transparent proxy in four steps

[Diagram: Your app (OpenAI / Anthropic / Google SDK) → request → Memory Router (transparent proxy: trim redundant history, inject relevant memory) → enhanced request → Model (your provider via BYOK, or MemoryLake-hosted). The Router reads from and writes to the memory store asynchronously.]

If MemoryLake is ever unavailable, requests pass straight through to your provider — zero downtime.

Intercept

Your app sends the request to Memory Router instead of the provider — same payload, same SDK, same response shape.

Optimize context

The Router trims redundant history, searches prior memories, and injects only the relevant context into the prompt.

Forward

The enhanced request goes to the model — your own provider (BYOK) or a MemoryLake-hosted model. Tokens in are lower than a raw replay.

Remember

New memories are extracted and stored asynchronously in the background — the response is never delayed.
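The four steps above can be sketched roughly as follows. This is a conceptual illustration only, not MemoryLake's implementation: the `MemoryStore` interface, the `handleRequest` and `trimHistory` functions, and the "keep the last four turns, inject up to three memories" policy are all assumptions made for the example.

```typescript
// Conceptual sketch: every name here is hypothetical, not a MemoryLake API.
type Message = { role: "system" | "user" | "assistant"; content: string };

interface MemoryStore {
  search(query: string, limit: number): Promise<string[]>;
  save(messages: Message[]): Promise<void>;
}

// Stands in for the call to the model (BYOK provider or hosted).
type Forward = (messages: Message[]) => Promise<Message>;

// Keep system prompts plus the most recent turns; older context is
// expected to come back through memory search instead of raw replay.
function trimHistory(messages: Message[], keepLast: number): Message[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  return [...system, ...rest.slice(-keepLast)];
}

async function handleRequest(
  messages: Message[], // Step 1: the intercepted request
  store: MemoryStore,
  forward: Forward,
): Promise<Message> {
  let enhanced: Message[] = messages;
  try {
    // Step 2: optimize context (search memory, trim, inject).
    const lastUser = [...messages].reverse().find((m) => m.role === "user");
    const memories = lastUser ? await store.search(lastUser.content, 3) : [];
    const trimmed = trimHistory(messages, 4);
    const memoryNote: Message = {
      role: "system",
      content: `Relevant memory:\n${memories.join("\n")}`,
    };
    enhanced = memories.length > 0 ? [memoryNote, ...trimmed] : trimmed;
  } catch {
    // Fail open: if the memory layer is down, forward the original request.
    enhanced = messages;
  }

  // Step 3: forward the enhanced request to the model.
  const reply = await forward(enhanced);

  // Step 4: remember in the background; the response is not delayed
  // and a storage failure never reaches the caller.
  void store.save([...messages, reply]).catch(() => {});
  return reply;
}
```

Wrapping only the memory work in the `try` block is the design choice that gives the pass-through behavior: a failure in search or trimming falls back to the untouched request, while an error from the model itself still surfaces to the caller as it would on a direct call.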

Two ways to connect

BYOK or MemoryLake-hosted — your call

Unlike proxies that force you to bring your own key, Memory Router works both ways. Either way, the change comes down to the base URL and which key you send (plus one header in BYOK mode); everything else, including prompts, streaming, and tool calls, stays the same.

BYOK

Bring your own key

Use your own provider account. Your provider key is encrypted in transit, forwarded to the provider per call, and never stored on our servers.

Keep your existing OpenAI / Anthropic / Google account.
Your key, your billing, your rate limits.
Key is encrypted + passthrough only — never persisted or logged.

Keys: your provider key + MemoryLake key

No provider key needed

MemoryLake-hosted

No provider account required. MemoryLake runs the major models for you, so a single MemoryLake API key is all you need to start.

One key for everything — nothing else to sign up for.
Mainstream models built in and ready to call.
The simplest possible way to ship with memory.

Keys: MemoryLake key only

Key safety, by design: in BYOK mode your provider key is encrypted in transit and passed straight through to the provider on every call. MemoryLake never stores, logs, or reuses it.

BYOK — your provider key

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://router.memorylake.ai/v1/openai",
  // your provider key: encrypted in transit, passthrough only, never stored
  apiKey: process.env.OPENAI_API_KEY,
  defaultHeaders: {
    // your MemoryLake key, sent as a header
    "x-memorylake-api-key": process.env.MEMORYLAKE_API_KEY,
  },
});

MemoryLake-hosted — one key

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://router.memorylake.ai/v1",
  apiKey: process.env.MEMORYLAKE_API_KEY,    // one key — that's it
});

// Pick any built-in model, e.g. "claude-opus-4-8" or "gpt-5".
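If an app needs to support both modes, the two configurations above can be produced by one small helper. The sketch below is an illustration, not part of any MemoryLake SDK: `RouterMode`, `ClientOptions`, and `routerClientOptions` are hypothetical names, while the base URLs and the `x-memorylake-api-key` header are taken from the snippets above.

```typescript
// Hypothetical helper: builds the options object for either connection mode.
type RouterMode =
  | { kind: "byok"; providerKey: string; memoryLakeKey: string }
  | { kind: "hosted"; memoryLakeKey: string };

interface ClientOptions {
  baseURL: string;
  apiKey: string;
  defaultHeaders?: Record<string, string>;
}

function routerClientOptions(mode: RouterMode): ClientOptions {
  if (mode.kind === "byok") {
    return {
      // BYOK: provider key as the API key, MemoryLake key as a header.
      baseURL: "https://router.memorylake.ai/v1/openai",
      apiKey: mode.providerKey,
      defaultHeaders: { "x-memorylake-api-key": mode.memoryLakeKey },
    };
  }
  return {
    // Hosted: the MemoryLake key is the only key.
    baseURL: "https://router.memorylake.ai/v1",
    apiKey: mode.memoryLakeKey,
  };
}

// Usage with the OpenAI SDK (requires `import OpenAI from "openai"`):
//   const client = new OpenAI(
//     routerClientOptions({ kind: "hosted", memoryLakeKey: "..." }),
//   );
```

Modeling the mode as a discriminated union means a BYOK configuration cannot be built without both keys, which catches a missing key at compile time rather than on the first request.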

What you get

Memory infrastructure without the build

One-line integration

Change the base URL. Keep your SDK and your code exactly as they are.

BYOK or hosted

Bring your own provider key (encrypted, never stored) or use MemoryLake-hosted models with a single key.

Automatic context optimization

Redundant history is removed and only relevant memory is injected, shrinking tokens per call.

Shared memory pool

The Router and the MemoryLake API read and write the same memories — one source of truth.

Graceful fallback

If MemoryLake is ever unavailable, the request passes straight through to your provider. Zero downtime.

Full observability

Response headers report conversation IDs, context changes, token counts, and memories created or retrieved.

Compatibility

Works with the providers you already use

In BYOK mode, keep your provider account and key. In hosted mode, MemoryLake runs these models for you — same memory layer either way.

Provider	Status
OpenAI / GPT	Fully supported
Anthropic / Claude	Fully supported
Google Gemini	Fully supported
Groq, DeepSeek, OpenRouter	Fully supported
Any OpenAI-compatible endpoint	Supported
OpenAI Assistants API	Not yet supported

Transparency

Every response tells you what happened

Memory Router returns diagnostic headers so you can see exactly how each request was handled — no black box.

Conversation ID

The thread the request was attributed to, so you can group and inspect turns.

Context modified

Whether memory was injected or history was trimmed for this call.

Token counts

How many tokens were sent after optimization versus a raw replay.

Memories touched

How many memory chunks were retrieved and how many were created.
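Reading these diagnostics in code might look like the sketch below. The header names are placeholders invented for this example, since the page does not list the actual names; check the Memory Router docs for the real ones. `readDiagnostics` and `RouterDiagnostics` are likewise hypothetical.

```typescript
// ASSUMPTION: these header names are placeholders, not documented names.
const HEADER = {
  conversationId: "x-memorylake-conversation-id",
  contextModified: "x-memorylake-context-modified",
  tokensSent: "x-memorylake-tokens-sent",
  tokensOriginal: "x-memorylake-tokens-original",
  memoriesRetrieved: "x-memorylake-memories-retrieved",
  memoriesCreated: "x-memorylake-memories-created",
} as const;

// Minimal shape shared by the Fetch API `Headers` class and test doubles.
type HeaderReader = { get(name: string): string | null };

interface RouterDiagnostics {
  conversationId: string | null;
  contextModified: boolean;
  tokensSent: number | null;
  tokensOriginal: number | null;
  memoriesRetrieved: number | null;
  memoriesCreated: number | null;
}

// Parse an integer header value; missing or malformed values become null.
function toInt(value: string | null): number | null {
  if (value === null) return null;
  const n = Number.parseInt(value, 10);
  return Number.isNaN(n) ? null : n;
}

function readDiagnostics(headers: HeaderReader): RouterDiagnostics {
  return {
    conversationId: headers.get(HEADER.conversationId),
    contextModified: headers.get(HEADER.contextModified) === "true",
    tokensSent: toInt(headers.get(HEADER.tokensSent)),
    tokensOriginal: toInt(headers.get(HEADER.tokensOriginal)),
    memoriesRetrieved: toInt(headers.get(HEADER.memoriesRetrieved)),
    memoriesCreated: toInt(headers.get(HEADER.memoriesCreated)),
  };
}
```

With the OpenAI Node SDK, the raw response is available by calling `.withResponse()` on the request promise, for example `const { data, response } = await client.chat.completions.create({ ... }).withResponse();`, after which `readDiagnostics(response.headers)` would return the parsed values.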

Setup

Live in three steps

1. Get a MemoryLake key
Sign up for MemoryLake and create an API key. The Free tier includes memory storage to start.
2. Pick a mode + swap the base URL
Point your SDK at the Router endpoint. For BYOK, keep your provider key and add the MemoryLake key as a header. For hosted, just use your MemoryLake key.
3. Call as normal
Send requests exactly as you do today. Memory is recalled and stored automatically; read the response headers to confirm.

The difference

Direct API call vs. Memory Router

Tokens sent per call, as a conversation grows

≈ 90% fewer tokens

Direct call: full history, every turn

Memory Router: only relevant memory

	Direct provider call	With Memory Router
Long-term memory	You build and host it	Built in, automatic
Context window	Re-send everything, then truncate	Optimized — only what matters
Keys & accounts	A provider account is required	BYOK or use just a MemoryLake key
Code changes	New SDK + retrieval pipeline	One base-URL change
Across sessions & models	Memory is siloed per app	Shared memory pool
Memory layer outage	Your problem to handle	Graceful passthrough
Visibility	None by default	Diagnostic response headers

FAQ

Do I need to change my code?

Only the base URL and your key setup: in BYOK mode you add one header, and in hosted mode you swap in your MemoryLake key. Your prompts, streaming, tool calls, and response handling stay identical, because Memory Router speaks the same API as your provider.

Which providers are supported?

OpenAI, Anthropic, Google Gemini, Groq, DeepSeek, OpenRouter, and any OpenAI-compatible endpoint. The OpenAI Assistants API is not yet supported.

What is the difference between BYOK and MemoryLake-hosted?

BYOK (Bring Your Own Key) means you supply your own provider key plus a MemoryLake key — billing and rate limits stay with your provider account. MemoryLake-hosted needs only a MemoryLake key: we run the major models for you, so you skip provider sign-up entirely.

In BYOK mode, is my provider key safe?

Yes. Your provider key is encrypted in transit and forwarded to the provider on each call. MemoryLake never stores, logs, or reuses it — it is passthrough only.

What happens if MemoryLake is down?

In BYOK mode Memory Router fails open: the request passes straight through to your provider so your application keeps working with zero downtime.

How does it reduce tokens?

Instead of replaying the entire history each turn, the Router removes redundant context and injects only the relevant memories — fewer tokens per call as conversations grow.
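To see why the gap widens with conversation length, here is a back-of-the-envelope model. All numbers are illustrative assumptions (200 tokens per turn, the last 4 turns kept verbatim, a 200-token memory budget), not MemoryLake measurements, and the function names are made up for the example.

```typescript
// Illustrative model only: the parameters below are assumptions.
const TOKENS_PER_TURN = 200; // average size of one turn
const RECENT_TURNS = 4; // turns kept verbatim after trimming
const MEMORY_BUDGET = 200; // tokens of injected memory

// Direct call: the whole history is re-sent, so cost grows linearly per call.
function directTokens(turn: number): number {
  return turn * TOKENS_PER_TURN;
}

// With trimming and injection: recent turns plus a fixed memory budget,
// so the cost per call stays bounded once the conversation is long enough.
function routerTokens(turn: number): number {
  const recent = Math.min(turn, RECENT_TURNS) * TOKENS_PER_TURN;
  const memory = turn > RECENT_TURNS ? MEMORY_BUDGET : 0;
  return recent + memory;
}

// Percentage of tokens saved on a single call at the given turn.
function savingsPercent(turn: number): number {
  return Math.round((1 - routerTokens(turn) / directTokens(turn)) * 100);
}

for (const turn of [4, 10, 50]) {
  console.log(
    `turn ${turn}: direct=${directTokens(turn)} router=${routerTokens(turn)} saved=${savingsPercent(turn)}%`,
  );
}
// turn 4: direct=800 router=800 saved=0%
// turn 10: direct=2000 router=1000 saved=50%
// turn 50: direct=10000 router=1000 saved=90%
```

Under these assumptions the saving is zero for short conversations and approaches 90% only around turn 50, which is why the benefit is described as growing with conversation length rather than as a flat discount.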

Is the memory shared with the MemoryLake API?

Yes. The Router and the MemoryLake API operate on the same memory pool, so anything stored one way is retrievable the other.

Is there a free plan?

Yes. Memory Router is available on the Free tier so you can integrate and test before scaling up.

Give every LLM a memory — change one URL.

Stop re-sending context and rebuilding retrieval. Point your SDK at Memory Router and ship memory today.
