Add persistent memory to any LLM with a single URL change
Memory Router is a transparent proxy that sits between your app and the model. Point your existing SDK at MemoryLake and every conversation gains long-term memory and an optimized context window. Run it two ways: bring your own provider key (BYOK), or use MemoryLake-hosted models with just one MemoryLake key.
Stateless APIs make you rebuild memory every time
Every LLM call is stateless. To fake continuity you re-send the entire history on every turn — which is slow, expensive, and eventually overflows the context window. Bolting on a vector DB and retrieval pipeline solves it, but it is weeks of plumbing you have to build and maintain.
Without a memory layer
- Full chat history re-sent on every call — token cost climbs with conversation length.
- Long sessions hit the context-window ceiling and start truncating mid-task.
- Memory lives in one app — switch models or sessions and the context is gone.
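A rough back-of-the-envelope sketch shows why replay cost climbs. It assumes every turn adds a fixed number of tokens and that each call re-sends all earlier turns; the 200-token figure is illustrative, not a measurement.

```typescript
// Illustrative arithmetic only: assumes each turn adds a fixed number of tokens
// and that every call replays the full history so far.
function tokensSentOnTurn(turn: number, tokensPerTurn: number): number {
  // On turn k, the request carries all k turns accumulated so far.
  return turn * tokensPerTurn;
}

function totalTokensSent(turns: number, tokensPerTurn: number): number {
  // tokensPerTurn * (1 + 2 + ... + turns) = tokensPerTurn * turns * (turns + 1) / 2
  return (tokensPerTurn * turns * (turns + 1)) / 2;
}

// With an assumed 200 tokens per turn:
console.log(tokensSentOnTurn(50, 200)); // 10000 tokens on the 50th call alone
console.log(totalTokensSent(50, 200)); // 255000 tokens across all 50 calls
```

Per-call cost grows linearly with conversation length, so the cumulative cost grows quadratically under these assumptions.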
Building it yourself
- Stand up a vector DB, embeddings pipeline, chunking, and retrieval logic.
- Write extraction, dedup, and relevance ranking — then keep it tuned.
- Maintain it across every provider and every model you support.
Memory Router collapses all of that into one base-URL change. The memory layer is the proxy.
A transparent proxy in four steps
If MemoryLake is ever unavailable, BYOK requests pass straight through to your provider, so your app sees zero downtime.
Intercept
Your app sends the request to Memory Router instead of the provider — same payload, same SDK, same response shape.
Optimize context
The Router trims redundant history, searches prior memories, and injects only the relevant context into the prompt.
Forward
The enhanced request goes to the model — your own provider (BYOK) or a MemoryLake-hosted model. Tokens in are lower than a raw replay.
Remember
New memories are extracted and stored asynchronously in the background — the response is never delayed.
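The "optimize context" step can be pictured with a toy sketch. This is not MemoryLake's actual algorithm: it assumes a naive keyword match in place of real memory search, and every name in it (`Message`, `optimizeContext`, `keepLast`) is hypothetical.

```typescript
// A toy illustration of trimming history and injecting relevant memory.
// NOT MemoryLake's implementation; all names here are hypothetical.
type Message = { role: "system" | "user" | "assistant"; content: string };

// Split text into lowercase alphanumeric words.
function words(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

function optimizeContext(history: Message[], memories: string[], keepLast: number): Message[] {
  // Trim: keep only the most recent turns instead of replaying everything.
  const recent = keepLast > 0 ? history.slice(-keepLast) : [];
  // Search: naive keyword overlap between the latest message and stored memories.
  const latest = history.length > 0 ? history[history.length - 1].content : "";
  const queryWords = words(latest);
  const relevant = memories.filter((memory) =>
    words(memory).some((w) => w.length > 3 && queryWords.indexOf(w) !== -1),
  );
  // Inject: prepend matching memories as a single system message.
  if (relevant.length === 0) return recent;
  return [{ role: "system", content: "Relevant memory:\n" + relevant.join("\n") }, ...recent];
}

const exampleHistory: Message[] = [
  { role: "user", content: "My deploy target is Kubernetes on GCP." },
  { role: "assistant", content: "Noted." },
  { role: "user", content: "What is a good name for a cat?" },
  { role: "assistant", content: "Maybe Miso." },
  { role: "user", content: "How should I configure the deploy pipeline?" },
];
const exampleMemories = ["The deploy target is Kubernetes on GCP.", "The cat is named Miso."];

// Sends 3 messages (one memory note plus the last two turns) instead of all 5.
console.log(optimizeContext(exampleHistory, exampleMemories, 2));
```

A production system would rank memories by semantic relevance rather than shared words; the point here is only the shape of the request that gets forwarded.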
BYOK or MemoryLake-hosted — your call
Unlike proxies that force you to bring your own key, Memory Router works both ways. In either mode the main change is the base URL; everything else (prompts, streaming, tool calls) stays the same.
Bring your own key
Use your own provider account. Your provider key is encrypted in transit, forwarded to the provider per call, and never stored on our servers.
- Keep your existing OpenAI / Anthropic / Google account.
- Your key, your billing, your rate limits.
- The key is encrypted in transit and passed through only; it is never persisted or logged.
MemoryLake-hosted
No provider account required. MemoryLake runs the major models for you, so a single MemoryLake API key is all you need to start.
- One key for everything — nothing else to sign up for.
- Mainstream models built in and ready to call.
- The simplest possible way to ship with memory.
Key safety, by design: in BYOK mode your provider key is encrypted in transit and passed straight through to the provider on every call. MemoryLake never stores, logs, or reuses it.
```typescript
// Bring your own key (BYOK)
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://router.memorylake.ai/v1/openai",
  apiKey: process.env.OPENAI_API_KEY, // your provider key
  defaultHeaders: {
    // encrypted in transit · passthrough only · never stored
    "x-memorylake-api-key": process.env.MEMORYLAKE_API_KEY,
  },
});
```
```typescript
// MemoryLake-hosted
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://router.memorylake.ai/v1",
  apiKey: process.env.MEMORYLAKE_API_KEY, // one key, that's it
});
// Pick any built-in model, e.g. "claude-opus-4-8" or "gpt-5".
```
Memory infrastructure without the build
One-line integration
Change the base URL. Keep your SDK and your code exactly as they are.
BYOK or hosted
Bring your own provider key (encrypted, never stored) or use MemoryLake-hosted models with a single key.
Automatic context optimization
Redundant history is removed and only relevant memory is injected, shrinking tokens per call.
Shared memory pool
The Router and the MemoryLake API read and write the same memories — one source of truth.
Graceful fallback
If MemoryLake is ever unavailable, a BYOK request passes straight through to your provider. Zero downtime.
Full observability
Response headers report conversation IDs, context changes, token counts, and memories created or retrieved.
Works with the providers you already use
In BYOK mode, keep your provider account and key. In hosted mode, MemoryLake runs these models for you — same memory layer either way.
| Provider | Status |
|---|---|
| OpenAI / GPT | Fully supported |
| Anthropic / Claude | Fully supported |
| Google Gemini | Fully supported |
| Groq, DeepSeek, OpenRouter | Fully supported |
| Any OpenAI-compatible endpoint | Supported |
| OpenAI Assistants API | Not yet supported |
Every response tells you what happened
Memory Router returns diagnostic headers so you can see exactly how each request was handled — no black box.
Conversation ID
The thread the request was attributed to, so you can group and inspect turns.
Context modified
Whether memory was injected or history was trimmed for this call.
Token counts
How many tokens were sent after optimization versus a raw replay.
Memories touched
How many memory chunks were retrieved and how many were created.
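Reading these diagnostics in code might look like the sketch below. The page does not list the actual header names, so the ones used here are placeholders, and the `"true"`/`"false"` encoding of the context flag is likewise an assumption; substitute the names and formats from the Memory Router documentation.

```typescript
// Placeholder header names: hypothetical, NOT documented MemoryLake names.
const PLACEHOLDER_HEADERS = {
  conversationId: "x-example-conversation-id",
  contextModified: "x-example-context-modified",
  tokensSent: "x-example-tokens-sent",
  memoriesRetrieved: "x-example-memories-retrieved",
  memoriesCreated: "x-example-memories-created",
};

type Diagnostics = {
  conversationId: string | null;
  contextModified: boolean | null;
  tokensSent: number | null;
  memoriesRetrieved: number | null;
  memoriesCreated: number | null;
};

// Parse a numeric header, returning null when it is absent or not a number.
function toCount(value: string | null): number | null {
  if (value === null || value.trim() === "") return null;
  const n = Number(value);
  return isNaN(n) ? null : n;
}

// `getHeader` is any lookup function, e.g. (name) => response.headers.get(name).
function readDiagnostics(
  getHeader: (name: string) => string | null,
  names = PLACEHOLDER_HEADERS,
): Diagnostics {
  const modified = getHeader(names.contextModified);
  return {
    conversationId: getHeader(names.conversationId),
    // Assumes the flag is sent as the string "true" or "false".
    contextModified: modified === null ? null : modified.toLowerCase() === "true",
    tokensSent: toCount(getHeader(names.tokensSent)),
    memoriesRetrieved: toCount(getHeader(names.memoriesRetrieved)),
    memoriesCreated: toCount(getHeader(names.memoriesCreated)),
  };
}
```

Taking a lookup function instead of a concrete response type keeps the helper usable with `fetch`, an SDK's raw response, or a test double.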
Live in three steps
1. Get a MemoryLake key
   Sign up for MemoryLake and create an API key. The Free tier includes memory storage to start.
2. Pick a mode + swap the base URL
   Point your SDK at the Router endpoint. For BYOK, keep your provider key and add the MemoryLake key as a header. For hosted, just use your MemoryLake key.
3. Call as normal
   Send requests exactly as you do today. Memory is recalled and stored automatically; read the response headers to confirm.
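The mode choice in step 2 can be captured in one small helper. The base URLs and the `x-memorylake-api-key` header come from the examples on this page; the function and type names (`routerClientOptions`, `RouterMode`, `ClientOptions`) are hypothetical, and the sketch assumes the OpenAI SDK's `baseURL`, `apiKey`, and `defaultHeaders` constructor options.

```typescript
// Builds OpenAI SDK constructor options for either Memory Router mode.
// Helper and type names are hypothetical; URLs and header name are from this page.
type RouterMode = "byok" | "hosted";

type ClientOptions = {
  baseURL: string;
  apiKey: string;
  defaultHeaders?: Record<string, string>;
};

function routerClientOptions(
  mode: RouterMode,
  keys: { memoryLake: string; provider?: string },
): ClientOptions {
  if (mode === "hosted") {
    // Hosted: the MemoryLake key is the only credential.
    return { baseURL: "https://router.memorylake.ai/v1", apiKey: keys.memoryLake };
  }
  if (!keys.provider) {
    throw new Error("BYOK mode needs a provider key");
  }
  // BYOK: the provider key stays in apiKey; the MemoryLake key rides in a header.
  return {
    baseURL: "https://router.memorylake.ai/v1/openai",
    apiKey: keys.provider,
    defaultHeaders: { "x-memorylake-api-key": keys.memoryLake },
  };
}
```

The result can be passed straight to the SDK, for example `new OpenAI(routerClientOptions("hosted", { memoryLake: process.env.MEMORYLAKE_API_KEY! }))`, so switching modes becomes a one-argument change.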
Direct API call vs. Memory Router
Tokens sent per call, as a conversation grows: ≈ 90% fewer tokens

| | Direct provider call | With Memory Router |
|---|---|---|
| Long-term memory | You build and host it | Built in, automatic |
| Context window | Re-send everything, then truncate | Optimized — only what matters |
| Keys & accounts | A provider account is required | BYOK or use just a MemoryLake key |
| Code changes | New SDK + retrieval pipeline | One base-URL change |
| Across sessions & models | Memory is siloed per app | Shared memory pool |
| Memory-layer outage | Your problem to handle | Graceful passthrough |
| Visibility | None by default | Diagnostic response headers |
FAQ
Do I need to change my code?
Only the base URL and your key setup: BYOK adds one header, and hosted mode swaps in your MemoryLake key. Your prompts, streaming, tool calls, and response handling stay identical, because Memory Router speaks the same API as your provider.
Which providers are supported?
OpenAI, Anthropic, Google Gemini, Groq, DeepSeek, OpenRouter, and any OpenAI-compatible endpoint. The OpenAI Assistants API is not yet supported.
What is the difference between BYOK and MemoryLake-hosted?
BYOK (Bring Your Own Key) means you supply your own provider key plus a MemoryLake key — billing and rate limits stay with your provider account. MemoryLake-hosted needs only a MemoryLake key: we run the major models for you, so you skip provider sign-up entirely.
In BYOK mode, is my provider key safe?
Yes. Your provider key is encrypted in transit and forwarded to the provider on each call. MemoryLake never stores, logs, or reuses it — it is passthrough only.
What happens if MemoryLake is down?
In BYOK mode Memory Router fails open: the request passes straight through to your provider so your application keeps working with zero downtime.
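Fail-open is a general pattern: try the enhanced path, and on failure fall back to the plain one rather than surfacing an error. The sketch below illustrates that pattern only. It is not Memory Router's internal code, and `failOpen` and its parameter names are hypothetical.

```typescript
// Generic fail-open wrapper: try the primary path, fall back to the direct path
// if it throws or rejects. Illustrative only; names are hypothetical.
async function failOpen<T>(
  viaMemoryLayer: () => Promise<T>,
  direct: () => Promise<T>,
): Promise<T> {
  try {
    return await viaMemoryLayer();
  } catch {
    // The memory layer failed, so serve the request without it.
    return direct();
  }
}

// Simulated outage: the primary path rejects, so the direct result is returned.
failOpen(
  () => Promise.reject(new Error("memory layer unavailable")),
  () => Promise.resolve("answer without memory"),
).then((result) => console.log(result)); // answer without memory
```

The trade-off of failing open is that a request served during an outage gets no recalled context, in exchange for the application staying up.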
How does it reduce tokens?
Instead of replaying the entire history each turn, the Router removes redundant context and injects only the relevant memories — fewer tokens per call as conversations grow.
Is the memory shared with the MemoryLake API?
Yes. The Router and the MemoryLake API operate on the same memory pool, so anything stored one way is retrievable the other.
Is there a free plan?
Yes. Memory Router is available on the Free tier so you can integrate and test before scaling up.
Give every LLM a memory — change one URL.
Stop re-sending context and rebuilding retrieval. Point your SDK at Memory Router and ship memory today.