MemoryLake
Memory Router · Drop-in memory proxy for any LLM

Add persistent memory to any LLM with a single URL change

Memory Router is a transparent proxy that sits between your app and the model. Point your existing SDK at MemoryLake and every conversation gains long-term memory and an optimized context window. Run it two ways: bring your own provider key (BYOK), or use MemoryLake-hosted models with just one MemoryLake key.

Works with the OpenAI, Anthropic & Google SDKs you already use

The problem

Stateless APIs make you rebuild memory every time

Every LLM call is stateless. To fake continuity you re-send the entire history on every turn — which is slow, expensive, and eventually overflows the context window. Bolting on a vector DB and retrieval pipeline solves it, but it is weeks of plumbing you have to build and maintain.

Without a memory layer

  • Full chat history re-sent on every call — token cost climbs with conversation length.
  • Long sessions hit the context-window ceiling and start truncating mid-task.
  • Memory lives in one app — switch models or sessions and the context is gone.

Building it yourself

  • Stand up a vector DB, embeddings pipeline, chunking, and retrieval logic.
  • Write extraction, dedup, and relevance ranking — then keep it tuned.
  • Maintain it across every provider and every model you support.

Memory Router collapses all of that into one base-URL change. The memory layer is the proxy.

How it works

A transparent proxy in four steps

[Diagram: Your app (OpenAI / Anthropic / Google SDK) sends a request to Memory Router, a transparent proxy that trims redundant history and injects relevant memory. The enhanced request goes to the model, either your provider (BYOK) or MemoryLake-hosted. The Router reads from and writes to the memory store asynchronously.]

If MemoryLake is ever unavailable, BYOK requests pass straight through to your provider, so your app keeps running.

1. Intercept

Your app sends the request to Memory Router instead of the provider: same payload, same SDK, same response shape.

2. Optimize context

The Router trims redundant history, searches prior memories, and injects only the relevant context into the prompt.

3. Forward

The enhanced request goes to the model, either your own provider (BYOK) or a MemoryLake-hosted model. Fewer input tokens are sent than with a raw replay of the full history.

4. Remember

New memories are extracted and stored asynchronously in the background, so the response is never delayed.
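
To make step 2 concrete, here is a minimal sketch of what "trim, retrieve, inject" could look like. It is a toy model, not MemoryLake's implementation: the type and function names (`ChatMessage`, `Memory`, `optimizeContext`), the relevance scores, and the default limits are all assumptions made for illustration.

```typescript
// Illustrative only: a toy version of the "optimize context" step.
// Every name and default below is hypothetical.

type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

interface Memory {
  text: string;
  score: number; // relevance to the latest user message, assumed precomputed (0..1)
}

function optimizeContext(
  history: ChatMessage[],
  memories: Memory[],
  keepLastTurns = 4,
  maxMemories = 3,
  minScore = 0.5,
): ChatMessage[] {
  // Preserve system prompts from the original request.
  const system = history.filter((m) => m.role === "system");
  const dialogue = history.filter((m) => m.role !== "system");

  // 1. Trim: keep only the most recent dialogue messages.
  const recent = keepLastTurns > 0 ? dialogue.slice(-keepLastTurns) : [];

  // 2. Retrieve: keep the highest-scoring memories above a threshold.
  const relevant = memories
    .filter((m) => m.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, maxMemories);

  // 3. Inject: surface the selected memories as one extra system message.
  const injected: ChatMessage[] =
    relevant.length > 0
      ? [
          {
            role: "system",
            content:
              "Relevant memory:\n" + relevant.map((m) => "- " + m.text).join("\n"),
          },
        ]
      : [];

  return [...system, ...injected, ...recent];
}
```

The request that reaches the model then contains the system prompt, one short memory message, and the last few turns, instead of the whole transcript.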

Two ways to connect

BYOK or MemoryLake-hosted — your call

Unlike proxies that force you to bring your own key, Memory Router works both ways. Either change is just the base URL; everything else — prompts, streaming, tool calls — stays the same.

BYOK

Bring your own key

Use your own provider account. Your provider key is encrypted in transit, forwarded to the provider per call, and never stored on our servers.

  • Keep your existing OpenAI / Anthropic / Google account.
  • Your key, your billing, your rate limits.
  • Key is encrypted + passthrough only — never persisted or logged.
Keys: your provider key + MemoryLake key

No provider key needed

MemoryLake-hosted

No provider account required. MemoryLake runs the major models for you, so a single MemoryLake API key is all you need to start.

  • One key for everything — nothing else to sign up for.
  • Mainstream models built in and ready to call.
  • The simplest possible way to ship with memory.
Keys: MemoryLake key only

Key safety, by design: in BYOK mode your provider key is encrypted in transit and passed straight through to the provider on every call. MemoryLake never stores, logs, or reuses it.
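
As an illustration of what "passthrough only" can mean inside a proxy, the sketch below separates the MemoryLake key from the headers that are forwarded to the provider. Only the header name `x-memorylake-api-key` comes from this page; the function and type names are hypothetical, and this is not a description of MemoryLake's actual code.

```typescript
// Hypothetical sketch of passthrough header handling inside a proxy.
// Only the header name below appears on this page; the rest is assumed.

type HeaderMap = Record<string, string>;

const ROUTER_KEY_HEADER = "x-memorylake-api-key";

interface SplitHeaders {
  routerKey: string | undefined; // authenticates the caller to the proxy
  upstream: HeaderMap; // forwarded to the provider as received
}

function splitHeaders(incoming: HeaderMap): SplitHeaders {
  let routerKey: string | undefined;
  const upstream: HeaderMap = {};

  for (const name of Object.keys(incoming)) {
    if (name.toLowerCase() === ROUTER_KEY_HEADER) {
      // Consumed by the proxy; never sent to the provider.
      routerKey = incoming[name];
    } else {
      // Everything else, including the provider's Authorization header,
      // is copied into the outgoing request and held only in memory.
      upstream[name] = incoming[name];
    }
  }

  return { routerKey, upstream };
}
```

In this sketch the provider key lives only in the per-request `upstream` object, which is the property the "never stored" promise depends on.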

BYOK: your provider key

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://router.memorylake.ai/v1/openai",
  // your provider key: encrypted in transit · passthrough only · never stored
  apiKey: process.env.OPENAI_API_KEY,
  defaultHeaders: {
    // your MemoryLake key
    "x-memorylake-api-key": process.env.MEMORYLAKE_API_KEY,
  },
});

MemoryLake-hosted: one key

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://router.memorylake.ai/v1",
  apiKey: process.env.MEMORYLAKE_API_KEY,    // one key, that's it
});

// Pick any built-in model, e.g. "claude-opus-4-8" or "gpt-5".

What you get

Memory infrastructure without the build

One-line integration

Change the base URL. Keep your SDK and your code exactly as they are.

BYOK or hosted

Bring your own provider key (encrypted, never stored) or use MemoryLake-hosted models with a single key.

Automatic context optimization

Redundant history is removed and only relevant memory is injected, shrinking tokens per call.

Shared memory pool

The Router and the MemoryLake API read and write the same memories — one source of truth.

Graceful fallback

If MemoryLake is ever unavailable, BYOK requests pass straight through to your provider, so your app keeps running.

Full observability

Response headers report conversation IDs, context changes, token counts, and memories created or retrieved.

Compatibility

Works with the providers you already use

In BYOK mode, keep your provider account and key. In hosted mode, MemoryLake runs these models for you — same memory layer either way.

Provider                          Status
OpenAI / GPT                      Fully supported
Anthropic / Claude                Fully supported
Google Gemini                     Fully supported
Groq, DeepSeek, OpenRouter        Fully supported
Any OpenAI-compatible endpoint    Supported
OpenAI Assistants API             Not yet supported

Transparency

Every response tells you what happened

Memory Router returns diagnostic headers so you can see exactly how each request was handled — no black box.

Conversation ID

The thread the request was attributed to, so you can group and inspect turns.

Context modified

Whether memory was injected or history was trimmed for this call.

Token counts

How many tokens were sent after optimization versus a raw replay.

Memories touched

How many memory chunks were retrieved and how many were created.
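
A small helper can turn these headers into a typed object for logging or dashboards. This page lists what is reported but not the exact header names, so every name in `HEADER_NAMES` below is a placeholder assumption to be replaced with the names from the docs; the helper itself is a sketch, not an official client.

```typescript
// Placeholder header names: the page describes WHAT is reported, not the
// exact names, so each value below is an assumption to be replaced.
const HEADER_NAMES = {
  conversationId: "x-memorylake-conversation-id",
  contextModified: "x-memorylake-context-modified",
  tokensSent: "x-memorylake-tokens-sent",
  tokensRaw: "x-memorylake-tokens-raw",
  memoriesRetrieved: "x-memorylake-memories-retrieved",
  memoriesCreated: "x-memorylake-memories-created",
};

// Structural type: satisfied by the fetch API's Headers object.
interface HeaderReader {
  get(name: string): string | null;
}

interface RouterDiagnostics {
  conversationId: string | null;
  contextModified: boolean | null;
  tokensSent: number | null;
  tokensRaw: number | null;
  memoriesRetrieved: number | null;
  memoriesCreated: number | null;
}

// Parse a numeric header; missing or malformed values become null.
function toNumber(value: string | null): number | null {
  if (value === null || value.trim() === "") return null;
  const n = Number(value);
  return isFinite(n) ? n : null;
}

// Parse a boolean header; anything unrecognized becomes null.
function toBoolean(value: string | null): boolean | null {
  if (value === null) return null;
  const v = value.trim().toLowerCase();
  if (v === "true" || v === "1") return true;
  if (v === "false" || v === "0") return false;
  return null;
}

function readDiagnostics(headers: HeaderReader): RouterDiagnostics {
  return {
    conversationId: headers.get(HEADER_NAMES.conversationId),
    contextModified: toBoolean(headers.get(HEADER_NAMES.contextModified)),
    tokensSent: toNumber(headers.get(HEADER_NAMES.tokensSent)),
    tokensRaw: toNumber(headers.get(HEADER_NAMES.tokensRaw)),
    memoriesRetrieved: toNumber(headers.get(HEADER_NAMES.memoriesRetrieved)),
    memoriesCreated: toNumber(headers.get(HEADER_NAMES.memoriesCreated)),
  };
}
```

With the OpenAI Node SDK, the raw HTTP response (and therefore its headers) is available by calling `.withResponse()` on a request, so `readDiagnostics(response.headers)` would work on the result. Missing headers come back as `null` rather than throwing, which keeps the helper safe on passthrough responses.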

Setup

Live in three steps

  1. Get a MemoryLake key

    Sign up for MemoryLake and create an API key. The Free tier includes memory storage to start.

  2. Pick a mode + swap the base URL

    Point your SDK at the Router endpoint. For BYOK, keep your provider key and add the MemoryLake key as a header. For hosted, just use your MemoryLake key.

  3. Call as normal

    Send requests exactly as you do today. Memory is recalled and stored automatically; read the response headers to confirm.
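
If you switch between modes (for example BYOK in production and hosted in a prototype), step 2 can be captured in one small helper. The sketch below reuses the base URLs and the `x-memorylake-api-key` header from the examples on this page; the function and type names are hypothetical, and the BYOK URL shown is the OpenAI one only.

```typescript
// Sketch: build OpenAI SDK constructor options for either mode.
// Base URLs and the header name are copied from this page's examples;
// RouterMode, RouterKeys and routerClientOptions are hypothetical names.

type RouterMode = "byok" | "hosted";

interface RouterKeys {
  memoryLakeKey: string;
  providerKey?: string; // required in BYOK mode only
}

interface RouterClientOptions {
  baseURL: string;
  apiKey: string;
  defaultHeaders?: Record<string, string>;
}

function routerClientOptions(mode: RouterMode, keys: RouterKeys): RouterClientOptions {
  if (mode === "hosted") {
    // Hosted: the MemoryLake key is the only credential.
    return {
      baseURL: "https://router.memorylake.ai/v1",
      apiKey: keys.memoryLakeKey,
    };
  }

  if (!keys.providerKey) {
    throw new Error("BYOK mode needs a provider key");
  }

  // BYOK: the provider key goes in the usual place, and the MemoryLake key
  // travels as an extra header.
  return {
    baseURL: "https://router.memorylake.ai/v1/openai",
    apiKey: keys.providerKey,
    defaultHeaders: { "x-memorylake-api-key": keys.memoryLakeKey },
  };
}
```

The result can be passed straight to the SDK constructor, for example `new OpenAI(routerClientOptions("hosted", { memoryLakeKey }))`, after which requests are sent exactly as before.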

The difference

Direct API call vs. Memory Router

[Chart: tokens sent per call as a conversation grows. Direct call: full history, every turn. Memory Router: only relevant memory, ≈ 90% fewer tokens.]

                            Direct provider call                 With Memory Router
Long-term memory            You build and host it                Built in, automatic
Context window              Re-send everything, then truncate    Optimized: only what matters
Keys & accounts             A provider account is required       BYOK or use just a MemoryLake key
Code changes                New SDK + retrieval pipeline         One base-URL change
Across sessions & models    Memory is siloed per app             Shared memory pool
Memory-layer outage         Your problem to handle               Graceful passthrough
Visibility                  None by default                      Diagnostic response headers

FAQ

Do I need to change my code?

Only the base URL, plus one header in BYOK mode or a key swap in hosted mode. Your prompts, streaming, tool calls, and response handling stay identical, because Memory Router speaks the same API as your provider.

Which providers are supported?

OpenAI, Anthropic, Google Gemini, Groq, DeepSeek, OpenRouter, and any OpenAI-compatible endpoint. The OpenAI Assistants API is not yet supported.

What is the difference between BYOK and MemoryLake-hosted?

BYOK (Bring Your Own Key) means you supply your own provider key plus a MemoryLake key — billing and rate limits stay with your provider account. MemoryLake-hosted needs only a MemoryLake key: we run the major models for you, so you skip provider sign-up entirely.

In BYOK mode, is my provider key safe?

Yes. Your provider key is encrypted in transit and forwarded to the provider on each call. MemoryLake never stores, logs, or reuses it — it is passthrough only.

What happens if MemoryLake is down?

In BYOK mode Memory Router fails open: the request passes straight through to your provider so your application keeps working with zero downtime.
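
"Fail open" is a general proxy pattern: if the optional step fails or is too slow, skip it and forward the original request. The sketch below shows the pattern in isolation. The function name, the callback shapes, and the 200 ms default timeout are assumptions for illustration, not MemoryLake's implementation or settings.

```typescript
// Generic fail-open wrapper: try the memory-enhanced path, and on any error
// or timeout forward the original request untouched.
// All names and the default timeout are hypothetical.

async function failOpen<Req, Res>(
  request: Req,
  enhance: (req: Req) => Promise<Req>, // e.g. trim history + inject memory
  forward: (req: Req) => Promise<Res>, // e.g. call the provider
  timeoutMs = 200,
): Promise<Res> {
  let outgoing = request;
  let timer: ReturnType<typeof setTimeout> | undefined;

  try {
    // Reject if the memory layer takes longer than the time budget.
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new Error("memory layer timed out")), timeoutMs);
    });
    outgoing = await Promise.race([enhance(request), timeout]);
  } catch {
    // Memory layer unavailable or too slow: fall back to the original request.
    outgoing = request;
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }

  // Errors from the provider itself are not masked; they propagate to the caller.
  return forward(outgoing);
}
```

Only failures of the enhancement step are swallowed. A provider error still surfaces to the application, which is usually what you want from a transparent proxy.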

How does it reduce tokens?

Instead of replaying the entire history each turn, the Router removes redundant context and injects only the relevant memories — fewer tokens per call as conversations grow.
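
A toy cost model shows why the saving grows with conversation length. All numbers here are invented for illustration (a flat 200 tokens per turn, two recent turns kept, a 200-token memory budget); they are not measurements of Memory Router and do not explain how any figure on this page was obtained.

```typescript
// Toy cost model with made-up parameters; not a measurement of Memory Router.

// Direct call: turn n re-sends all n turns so far.
function replayTokens(turn: number, tokensPerTurn: number): number {
  return turn * tokensPerTurn;
}

// Optimized call: a few recent turns plus a fixed budget of injected memory.
function optimizedTokens(
  turn: number,
  tokensPerTurn: number,
  recentTurns: number,
  memoryBudget: number,
): number {
  return Math.min(turn, recentTurns) * tokensPerTurn + memoryBudget;
}

// Fraction of input tokens saved on a given turn (negative means overhead).
function savingsAtTurn(
  turn: number,
  tokensPerTurn: number,
  recentTurns: number,
  memoryBudget: number,
): number {
  const direct = replayTokens(turn, tokensPerTurn);
  const optimized = optimizedTokens(turn, tokensPerTurn, recentTurns, memoryBudget);
  return 1 - optimized / direct;
}

// Hypothetical parameters for the illustration.
const TOKENS_PER_TURN = 200;
const RECENT_TURNS = 2;
const MEMORY_BUDGET = 200;

for (const turn of [1, 5, 30]) {
  const saved = savingsAtTurn(turn, TOKENS_PER_TURN, RECENT_TURNS, MEMORY_BUDGET);
  console.log("turn " + turn + ": " + Math.round(saved * 100) + "% fewer input tokens");
}
```

Under these assumptions the direct call grows linearly with the turn number while the optimized call stays flat at 600 tokens once the conversation is longer than two turns. On the very first turn the injected memory is pure overhead; by turn 30 the optimized request is about a tenth of the replayed one.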

Is the memory shared with the MemoryLake API?

Yes. The Router and the MemoryLake API operate on the same memory pool, so anything stored one way is retrievable the other.

Is there a free plan?

Yes. Memory Router is available on the Free tier so you can integrate and test before scaling up.

Give every LLM a memory — change one URL.

Stop re-sending context and rebuilding retrieval. Point your SDK at Memory Router and ship memory today.