Hot Project · February 12, 2026 · 16 min read

Inside OpenClaw's Memory System: Source Code Deep-Dive

We read every line of OpenClaw's memory implementation. Here is what we found — the architecture, the design decisions, the strengths, and the gaps that matter for production use.

memory-store.ts — OpenClaw Source

```typescript
import { Database } from "better-sqlite3";

interface MemoryEntry {
  timestamp: string;
  category: string; // no typed enum
  confidence: number;
  content: string;
  tags: string[];
} // no temporal index

export class MemoryStore {
  private db: Database;
  private memoryDir: string;
  // ...3,200 lines of TypeScript
```

Why Source Code Matters

Marketing copy tells you what a product wants to be. Source code tells you what it actually is. With OpenClaw surpassing 100,000 GitHub stars by February 2026, we decided to do what any responsible technical analyst should do: read the code. Not the README, not the blog posts, not the Twitter threads — the actual implementation.

What follows is a line-by-line analysis of OpenClaw's memory subsystem. We examine how memories are stored, how they are extracted, how they are retrieved, and what the code reveals about the system's design philosophy and limitations. Every claim in this article is verifiable against the public repository.

This is not a critique. OpenClaw's memory system is well-engineered for its intended purpose. But understanding what the code does — and does not do — is essential for teams deciding whether OpenClaw's built-in memory is sufficient for their use case or whether they need supplementary memory infrastructure.

Repository Overview

OpenClaw's memory system lives in a dedicated module within the main repository. The memory-related code comprises approximately 3,200 lines of TypeScript, organized into four main files: memory-store.ts (persistence layer), memory-extractor.ts (extraction pipeline), memory-retriever.ts (retrieval logic), and memory-index.ts (SQLite indexing). Supporting utilities add another 800 lines.

The codebase is remarkably clean. Functions are well-named, side effects are isolated, and the data flow is easy to follow. Steinberger's reputation for meticulous engineering is evident — this is not a prototype hastily open-sourced. It is a production-quality implementation within its design scope.

The memory module has minimal dependencies: better-sqlite3 for the local database, a tokenizer for text processing, and the LLM client for extraction. There are no external memory services, no vector databases, and no cloud APIs. Everything runs locally.

Memory Directory Structure

OpenClaw creates a .openclaw/memory directory in each project root. Inside this directory, memories are organized into three subdirectories: facts/ (extracted factual memories), daily/ (session-level daily notes), and meta/ (system metadata including the SQLite index). This separation provides a clean file-system organization that maps intuitively to how developers think about project context.

Each memory file in facts/ follows a consistent naming pattern: {timestamp}-{slugified-summary}.md. For example: 2026-02-10-user-prefers-rust-for-performance.md. This naming convention makes memories browsable by time (sort by name) and searchable by topic (grep the directory).
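The convention is simple enough to sketch. Here is a minimal slug generator that reproduces the example filename above (`slugify` and `memoryFilename` are our own illustrative names, not functions from the OpenClaw source):

```typescript
// Illustrative sketch of the {timestamp}-{slugified-summary}.md convention.
// Names are our own, not OpenClaw's.
function slugify(summary: string): string {
  return summary
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics into hyphens
    .replace(/^-+|-+$/g, "");    // trim leading/trailing hyphens
}

function memoryFilename(date: string, summary: string): string {
  return `${date}-${slugify(summary)}.md`;
}

console.log(memoryFilename("2026-02-10", "User prefers Rust for performance"));
// → 2026-02-10-user-prefers-rust-for-performance.md
```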

The daily/ directory contains one file per active session, named by date: 2026-02-10.md. These files capture session-level context — what was worked on, what decisions were made, what questions were asked. They serve as a running log of the development process.

The MEMORY.md Format

Each memory file uses YAML frontmatter followed by markdown content. The frontmatter includes: timestamp (ISO 8601), category (one of: preference, fact, decision, context, biographical), confidence (0.0-1.0, set by the extraction pipeline), source_turn (reference to the conversation turn that generated this memory), and tags (array of keyword strings).

The body is free-form markdown, typically 1-5 sentences. The extraction pipeline generates the content, aiming for concise, self-contained statements. A typical memory might read: "The user strongly prefers functional programming patterns over object-oriented approaches, citing better testability and composability. This preference was expressed while discussing the architecture of a new API service."
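Putting the frontmatter fields and body together, a complete memory file would look something like this (our own reconstruction from the fields described above, not a file copied from the repository; the `source_turn` value is a made-up identifier):

```markdown
---
timestamp: 2026-02-10T14:32:00Z
category: preference
confidence: 0.9
source_turn: turn-0042
tags: [functional-programming, architecture, api]
---

The user strongly prefers functional programming patterns over
object-oriented approaches, citing better testability and composability.
This preference was expressed while discussing the architecture of a
new API service.
```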

Notably absent from the format are: memory type (all memories share the same format regardless of whether they represent facts, events, or preferences), temporal relationships (no links to previous or subsequent versions of the same information), conflict markers (no flag indicating that this memory contradicts an existing one), and provenance chain (no record of modifications after initial creation).

Daily Notes System

The daily notes system is one of OpenClaw's most elegant features. At the start of each session, the system creates (or appends to) a daily note file. Throughout the session, it records high-level summaries of what was discussed, what code was written, and what decisions were made.

Daily notes serve a different purpose than factual memories. Where factual memories capture persistent, reusable knowledge ("user prefers Rust"), daily notes capture ephemeral, contextual information ("today we debugged the authentication middleware and decided to switch from JWT to session tokens"). Daily notes provide continuity between sessions without cluttering the permanent memory store.

The implementation is straightforward: at the end of each conversation turn, the extraction pipeline appends a one-line summary to the current daily note. At the end of the session, it generates a summary paragraph. This two-level approach (per-turn + per-session) captures both granular detail and high-level narrative.

Memory Extraction Pipeline

The extraction pipeline is the core of OpenClaw's memory system. It runs after each conversation turn, analyzing the exchange for information worth persisting. The pipeline consists of three stages: classification (is this turn memory-worthy?), extraction (what specific facts should be stored?), and deduplication (does this duplicate an existing memory?).

The classification stage uses a lightweight prompt to the LLM: given the conversation turn, determine whether it contains persistent information (preferences, decisions, facts, biographical details) or is purely ephemeral (greetings, acknowledgments, code output). This binary classification prevents the memory store from being flooded with noise.

The extraction stage takes classified-positive turns and generates structured memory entries. The LLM extracts specific facts, assigns a confidence score, generates tags, and writes the markdown content. The prompt instructs the LLM to prefer concise, self-contained statements over verbose narratives.
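A minimal sketch of the three-stage flow. All names are our own, and the LLM calls are replaced with stand-ins: a regex filter in place of the classification prompt, and a token-overlap score in place of embedding similarity. OpenClaw's actual implementation differs in all three stages.

```typescript
interface Turn { user: string; assistant: string }
interface Memory { content: string; confidence: number; tags: string[] }

// Stage 1: classification. Stand-in for the lightweight LLM prompt
// that filters purely ephemeral turns (greetings, acknowledgments).
function isMemoryWorthy(turn: Turn): boolean {
  return !/^(hi|hello|thanks|ok|got it)[.!]?$/i.test(turn.user.trim());
}

// Stage 2: extraction. Stand-in for the structured LLM extraction call.
function extract(turn: Turn): Memory {
  return { content: turn.user, confidence: 0.8, tags: [] };
}

// Stage 3: deduplication. Crude token-overlap proxy for embedding cosine.
function overlap(a: string, b: string): number {
  const ta = new Set(a.toLowerCase().split(/\s+/));
  const tb = new Set(b.toLowerCase().split(/\s+/));
  const shared = Array.from(ta).filter((t) => tb.has(t)).length;
  return shared / Math.max(ta.size, tb.size);
}

function processTurn(turn: Turn, store: Memory[]): Memory[] {
  if (!isMemoryWorthy(turn)) return store;
  const candidate = extract(turn);
  const dup = store.some((m) => overlap(m.content, candidate.content) > 0.85);
  return dup ? store : [...store, candidate];
}
```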

Entity Recognition

OpenClaw's extraction pipeline includes basic entity recognition — identifying people, technologies, projects, and organizations mentioned in conversations. Recognized entities are added to the memory tags and indexed in the SQLite layer, enabling entity-based retrieval: "What do I know about Project Atlas?" or "What technologies has the user discussed?"

The entity recognition is prompt-based rather than using a dedicated NER model. The extraction prompt instructs the LLM to identify named entities and technical terms, which are then normalized (lowercased, deduplicated) and stored as tags. This approach is simpler and more flexible than model-based NER but less consistent — the same entity might be tagged differently across memories.

One limitation of the prompt-based approach is the lack of entity resolution. "React," "ReactJS," and "React.js" are treated as separate entities rather than being resolved to a single canonical form. Over time, this creates fragmentation in the entity index.
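The fragmentation is easy to demonstrate. A lowercase-and-dedupe normalizer (our own sketch of the approach described above, not OpenClaw's code) merges case variants but leaves surface variants apart:

```typescript
// Lowercase + trim + dedupe, as described in the article.
function normalizeTags(raw: string[]): string[] {
  return Array.from(new Set(raw.map((t) => t.toLowerCase().trim())));
}

// Lowercasing merges "React" and "react", but without entity resolution
// the other surface forms survive as separate entries in the index.
const tags = normalizeTags(["React", "react", "ReactJS", "React.js"]);
console.log(tags); // → [ 'react', 'reactjs', 'react.js' ]: three tags, one library
```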

SQLite Index Layer

The SQLite database serves as a performance layer on top of the markdown file store. It indexes memory metadata (timestamp, category, confidence, tags) and provides full-text search over memory content using SQLite's FTS5 extension. The database also stores pre-computed embedding vectors for semantic search.

The schema is straightforward: a memories table with columns for id, filepath, timestamp, category, confidence, content_text, and embedding_blob. A tags table provides many-to-many relationships between memories and tag strings. An FTS5 virtual table indexes memory content for keyword search.
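That description translates into roughly the following DDL. This is our reconstruction from the column names listed above; the actual statements in memory-index.ts may differ in naming, constraints, and FTS5 configuration.

```typescript
// Reconstructed index schema. Not copied from OpenClaw's memory-index.ts.
const schema = `
  CREATE TABLE IF NOT EXISTS memories (
    id             INTEGER PRIMARY KEY,
    filepath       TEXT NOT NULL UNIQUE,  -- path to the markdown source of truth
    timestamp      TEXT NOT NULL,         -- ISO 8601
    category       TEXT NOT NULL,         -- preference | fact | decision | context | biographical
    confidence     REAL NOT NULL,
    content_text   TEXT NOT NULL,
    embedding_blob BLOB                   -- pre-computed embedding vector
  );

  CREATE TABLE IF NOT EXISTS tags (
    memory_id INTEGER NOT NULL REFERENCES memories(id),
    tag       TEXT NOT NULL
  );

  -- FTS5 virtual table for keyword search over memory content.
  CREATE VIRTUAL TABLE IF NOT EXISTS memories_fts
    USING fts5(content_text, content='memories', content_rowid='id');
`;
```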

The SQLite approach is well-suited to the local-first architecture. The database file lives alongside the markdown files in .openclaw/memory/meta/index.db. It requires no server, no configuration, and no separate process. If the index becomes corrupted, it can be rebuilt from the markdown source files — the files are the source of truth, not the database.

Retrieval Architecture

When a new conversation begins, the retrieval system assembles relevant context from the memory store. The retrieval happens in three passes: recency (most recent daily notes and recently created memories), relevance (semantically similar memories based on the conversation topic), and frequency (memories that have been retrieved most often, indicating high importance).

The three-pass approach produces a ranked list of candidate memories. The system then selects the top N memories (configurable, default 30) and formats them as context for the LLM prompt. The formatting is designed to be unobtrusive — memories appear as a structured "Known Context" block at the beginning of the system prompt.
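The competition for a single context budget can be sketched as a flat ranking: each pass contributes a score, the scores are combined, and the top N survive. The unweighted sum here is our own simplification; OpenClaw's actual combination may differ.

```typescript
interface Candidate { id: string; recency: number; relevance: number; frequency: number }

// All candidates compete on one scale, regardless of memory type.
function rank(candidates: Candidate[], topN = 30): Candidate[] {
  return [...candidates]
    .sort((a, b) =>
      (b.recency + b.relevance + b.frequency) -
      (a.recency + a.relevance + a.frequency))
    .slice(0, topN);
}

// A recent daily note and an older but highly relevant preference
// fight for the same slot:
const picked = rank([
  { id: "daily-note",     recency: 0.9, relevance: 0.2, frequency: 0.1 },
  { id: "old-preference", recency: 0.1, relevance: 0.9, frequency: 0.6 },
], 1);
console.log(picked[0].id); // → old-preference (1.6 beats 1.2)
```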

One notable design choice: the retrieval system does not differentiate between memory types. Preferences, facts, decisions, and biographical details are all ranked on the same scale and compete for the same context budget. This means a highly relevant but less recent preference competes directly with a less relevant but more recent daily note.

Hybrid Search Strategy

The relevance pass uses a hybrid search strategy combining keyword search (FTS5) and semantic search (cosine similarity over embeddings). The scores from both approaches are normalized and combined with configurable weights (default: 0.4 keyword, 0.6 semantic). This hybrid approach provides better recall than either method alone.
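A sketch of the score combination, using the default 0.4/0.6 weights mentioned above. The min-max normalization is our assumption; the article says only that scores are normalized before blending.

```typescript
// Rescale a score list to [0, 1] so keyword and semantic scores are comparable.
function minMaxNormalize(scores: number[]): number[] {
  const lo = Math.min(...scores);
  const hi = Math.max(...scores);
  return hi === lo ? scores.map(() => 0) : scores.map((s) => (s - lo) / (hi - lo));
}

// Blend normalized keyword (FTS5) and semantic (cosine) scores per document.
function hybridScores(keyword: number[], semantic: number[], wK = 0.4, wS = 0.6): number[] {
  const k = minMaxNormalize(keyword);
  const s = minMaxNormalize(semantic);
  return k.map((kv, i) => wK * kv + wS * s[i]);
}

// Doc 0 wins on keywords, doc 1 on semantics; doc 1 wins overall
// because the semantic weight (0.6) dominates:
console.log(hybridScores([10, 2], [0.1, 0.9])); // → [ 0.4, 0.6 ]
```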

The embedding model used for semantic search is configurable, with the default being a lightweight local model that runs without network access. This preserves the local-first architecture while providing genuine semantic understanding. Users can optionally configure a cloud-based embedding model for higher quality at the cost of a network dependency.

The search implementation handles the cold-start problem gracefully. When the memory store is empty or very small, the system falls back to injecting a simple summary of the project (derived from package.json, README, and directory structure) rather than returning no context at all.

Memory Lifecycle

OpenClaw implements a basic memory lifecycle: creation, retrieval, and deletion. Memories are created by the extraction pipeline, retrieved by the retrieval system, and can be deleted manually by the user (either by deleting the markdown file or through the UI). There is no automatic update, expiration, or consolidation.

The lack of automatic update is significant. If a user's preference changes ("I now prefer spaces over tabs"), the extraction pipeline creates a new memory without modifying or superseding the old one. Both memories coexist in the store, and it falls to the LLM at retrieval time to determine which is current. This is a reasonable approach for small memory stores but becomes problematic as stores grow.

There is no memory consolidation — the process of periodically reviewing memories, merging related ones, resolving conflicts, and synthesizing higher-level patterns. Production memory systems like MemoryLake implement consolidation as a background process that keeps the memory store coherent and efficient over time.

Strengths: What the Code Gets Right

The code reveals several genuine strengths. First, the separation between source (markdown files) and index (SQLite) means the system is resilient — the index can always be rebuilt from source. Second, the extraction pipeline's classification stage effectively prevents memory bloat. Third, the hybrid search strategy provides surprisingly good retrieval quality for a local-only system.

The daily notes system deserves special mention. It solves the "session continuity" problem elegantly without overloading the permanent memory store. A developer can close their laptop on Friday and pick up Monday with full context of where they left off — not because the system stored every line of dialogue, but because the daily note captured the essential narrative.

The code also demonstrates excellent engineering discipline. Error handling is thorough, the SQLite operations use proper transactions, and the extraction pipeline includes retry logic for LLM failures. This is production-quality code that developers can trust with their project data.

Limitations: What the Code Reveals

The code also reveals specific limitations. The lack of memory types means the retrieval system cannot optimize queries — it treats "What did I decide last week?" the same way it treats "What are my preferences?", even though these queries should hit different memory subsets. The absence of temporal indexing means time-based queries degrade to scan-and-filter operations.

The deduplication logic in the extraction pipeline is similarity-based: if a new memory is more than 85% similar (by embedding cosine) to an existing memory, it is discarded. This works for exact duplicates but misses semantic updates — "budget is $10K" and "budget is $15K" are only 60% similar and would both be stored without any conflict flag.
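The failure mode can be shown with a toy cosine check. The 0.85 threshold comes from the text above; the embedding vectors are made up for illustration.

```typescript
// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((sum, v, i) => sum + v * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

const THRESHOLD = 0.85;

// Hypothetical embeddings for "budget is $10K" vs "budget is $15K":
// same topic, different amounts, similarity around 0.6.
const oldMemory = [0.9, 0.3, 0.1];
const newMemory = [0.3, 0.9, 0.2];

const isDuplicate = cosine(oldMemory, newMemory) > THRESHOLD;
console.log(isDuplicate); // → false: both memories stored, contradiction unflagged
```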

The lack of provenance tracking means there is no way to trace a memory back to its source conversation programmatically. The source_turn field in the frontmatter references a conversation turn ID, but conversation logs are not persisted by default — so the reference is often a dead link.

The Missing Pillars

Reading the code crystallizes what is missing for production memory use cases. Three architectural pillars are absent: computation (reasoning over memories to detect conflicts, infer patterns, and synthesize models), external enrichment (ingesting data from sources beyond conversations — documents, APIs, databases), and multi-agent coordination (sharing memories across agents with access control and conflict resolution).

These are not features that can be added incrementally. They require architectural changes to the memory format (adding type information, provenance chains, and conflict markers), the storage layer (adding temporal indexing and relationship graphs), and the retrieval layer (adding type-aware, time-aware, and conflict-aware query strategies). They represent a different category of memory system.

This is not a criticism of OpenClaw — it is a description of the boundary between developer tooling and memory infrastructure. OpenClaw is excellent developer tooling. Production memory infrastructure requires the additional pillars that MemoryLake provides.

Where MemoryLake Complements

The most natural integration between OpenClaw and MemoryLake is through MCP. OpenClaw's memory system handles the developer-facing experience — extraction, display, editing. MemoryLake handles the infrastructure — typing, temporal indexing, conflict detection, versioning, and cross-platform sync.

In this architecture, OpenClaw continues to store local markdown files for transparency and editability. But behind the scenes, each memory is also stored in MemoryLake with full typing, provenance, and temporal metadata. The retrieval system queries both stores and merges results, getting the best of both worlds: local speed and transparency plus infrastructure-grade memory capabilities.

This complementary architecture is what we recommend for teams that love OpenClaw's developer experience but need production-grade memory for their applications. You do not have to choose one or the other — you can have both.

Conclusion

Reading OpenClaw's source code confirms what its popularity suggests: this is a well-engineered system that genuinely solves the problem of developer-facing AI memory. The markdown format, SQLite indexing, hybrid search, and daily notes system are all thoughtfully designed and competently implemented.

The code also confirms the boundaries of OpenClaw's design. Typed memories, temporal indexing, conflict detection, and provenance tracking are architecturally absent — not as bugs but as design scope decisions. For individual developer workflows, these omissions rarely matter. For production systems, enterprise deployments, and multi-agent architectures, they are critical gaps.

OpenClaw represents the best developer-facing AI memory available today. MemoryLake represents the deepest memory infrastructure. Together, they provide a complete solution — great developer experience backed by production-grade memory capabilities. The source code tells a story of thoughtful engineering within deliberate constraints; the opportunity is to extend those constraints when your use case demands it.
