Cursor Chat: Architecture, Data Flow & Storage
#

This document explains how Cursor chat works end-to-end: what happens when you type a message, where data is saved, whether embeddings are used, and how the codebase index fits in.


1. What Happens When You Type in the Chat Window
#

When you send a message in Cursor’s chat (Ask, Agent, Composer, Cmd+K, etc.):

  1. Local UI
    The Electron app captures your message, any @-mentions (files, codebase, docs, MCP), rules, and current context (open file, cursor position, etc.).

  2. Context assembly
    Cursor gathers:

    • System prompt (role, formatting rules, citation format, etc.)
    • User message and optional <user_query>
    • Attached context: <current_file>, <attached_files>, <manually_added_selection>, rules, MCP data
    • For codebase-aware requests: semantic search results (see §4)
  3. Request to Cursor servers (not directly to LLM)
    The request is sent to Cursor’s own backend first. Your API credentials are forwarded there. Cursor does extra processing (auth, logging, prompt assembly, sometimes codebase retrieval) before calling the LLM.

  4. LLM call
    Cursor forwards the composed request to the configured provider (OpenAI, Anthropic, etc.). The model sees a single prompt (system + user + context). Tool use (file read, search, etc.) can add more context during the turn.

  5. Streaming response
    The LLM streams tokens back. Cursor displays them in the UI and may run an “apply model” for edits.

  6. Local persistence
    Once the turn is done, Cursor writes the conversation (messages, metadata) into local SQLite (state.vscdb). Chat content is not stored on Cursor’s servers; it’s local only (see §2).

So: You type → Cursor IDE → Cursor servers → LLM provider → response streamed back → local save to SQLite.
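The context-assembly step above can be sketched as a small function. This is an illustrative sketch only: the real payload shape is internal to Cursor, and the helper name `assemble_prompt` is an assumption; the tag names (`<user_query>`, `<current_file>`, `<attached_files>`) follow the section above.

```python
# Illustrative sketch only: the actual payload format is internal to Cursor.
def assemble_prompt(system_prompt: str, user_message: str,
                    current_file: str, attachments: dict[str, str]) -> str:
    """Compose a single prompt string: system + user + attached context."""
    parts = [system_prompt, f"<user_query>{user_message}</user_query>"]
    parts.append(f"<current_file>{current_file}</current_file>")
    for path, snippet in attachments.items():
        parts.append(f'<attached_files path="{path}">{snippet}</attached_files>')
    return "\n".join(parts)

prompt = assemble_prompt(
    "You are a coding assistant.",
    "Why does this loop never exit?",
    "src/main.py",
    {"src/util.py": "def helper(): ..."},
)
print(prompt)
```

The model then sees this single composed prompt; tool calls during the turn append further context in the same way.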


2. Where Chats and Metadata Are Saved
#

2.1 Storage locations
#

Base path by OS:

  • Windows — %APPDATA%\Cursor\User\
  • macOS — ~/Library/Application Support/Cursor/User/
  • Linux — ~/.config/Cursor/User/
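A small sketch that resolves the base path from the table above (the function name is illustrative; Windows relies on the APPDATA environment variable being set):

```python
# Hedged sketch: compute Cursor's user-data base path per OS,
# using the paths from the table above.
import os
import platform
from pathlib import Path

def cursor_user_dir() -> Path:
    system = platform.system()
    if system == "Windows":
        return Path(os.environ["APPDATA"]) / "Cursor" / "User"
    if system == "Darwin":  # macOS
        return Path.home() / "Library" / "Application Support" / "Cursor" / "User"
    return Path.home() / ".config" / "Cursor" / "User"  # Linux and others

print(cursor_user_dir())
```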

Two important subdirs:

  • globalStorage/ — app-wide state (settings, UI state, composer metadata like pane list, “last opened” IDs).
    • Main DB: globalStorage/state.vscdb
  • workspaceStorage/<hash>/ — per-workspace state. Each workspace has a hashed folder with its own state.vscdb.
    • Conversation content typically lives here (e.g. composer.composerData, sometimes legacy workbench.panel.aichat.view.aichat.chatdata).

So: chat content → usually workspaceStorage; sidebar list / UI metadata → globalStorage (and sometimes workspace).

2.2 Database format: state.vscdb
#

state.vscdb is a SQLite database. VS Code/Cursor use a generic key–value table:

CREATE TABLE ItemTable (
    key TEXT PRIMARY KEY,
    value TEXT   -- often JSON stored as text
);
  • key: string identifier (e.g. workbench.panel.aichat.view.aichat.chatdata, composer.composerData, workbench.backgroundComposer.persistentData).
  • value: JSON blob (or other serialized data) for that key.

Settings, workspace state, and chat data all go into ItemTable as key/value pairs.
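A minimal sketch of this layout, using an in-memory SQLite database so it runs anywhere; a real state.vscdb would be queried the same way. The `allComposers` field in the sample value is hypothetical, for illustration only.

```python
# Minimal sketch of the ItemTable key/value layout. The sample JSON
# structure ("allComposers") is a hypothetical placeholder.
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ItemTable (key TEXT PRIMARY KEY, value TEXT)")
con.execute(
    "INSERT INTO ItemTable VALUES (?, ?)",
    ("composer.composerData", json.dumps({"allComposers": []})),
)

row = con.execute(
    "SELECT value FROM ItemTable WHERE key = ?", ("composer.composerData",)
).fetchone()
data = json.loads(row[0])  # values are JSON stored as text
print(data)
```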

2.3 Important keys (chat / composer)
#

  • workbench.panel.aichat.view.aichat.chatdata (global or workspace) — Legacy chat data: tabs[], bubbles[] (user/assistant messages).
  • composer.composerData (workspace) — Current composer/chat content per workspace.
  • workbench.backgroundComposer.persistentData (global) — Composer UI metadata (dataVersion, lastOpenedBcIds, etc.). Often small; no conversation content.
  • workbench.panel.composerChatViewPane.<id> (global or workspace) — Per-chat/tab UI state (e.g. .hidden), mapping to composer data.
  • workbench.panel.composerChatViewPane.<id>.hidden (global) — Visibility for specific panes.

Cursor has moved from aichat to composer. Newer versions use composer keys; old data may still live under aichat keys. The sidebar chat list is built from metadata (e.g. global); when you open a chat, Cursor resolves a “composer data handle” to the workspace-specific composer.composerData (or legacy chatdata). If that resolution fails → “No composer data handle found” and the chat doesn’t load.

2.4 Example: exporting chat-related data
#

SELECT key, value
FROM ItemTable
WHERE key IN (
  'workbench.panel.aichat.view.aichat.chatdata',
  'workbench.backgroundComposer.persistentData'
)
   OR key LIKE 'workbench.panel.composer%'
   OR key = 'composer.composerData';

Run this against the relevant state.vscdb (global or workspace). Use read-only access if Cursor is open (?mode=ro).
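The same export can be scripted in Python. This is a hedged sketch: the `export_chat_keys` helper is illustrative, and the read-only URI (`?mode=ro`) matches the advice above so the query is safe while Cursor has the database open.

```python
# Hedged sketch: run the export query above against a state.vscdb,
# opened read-only via a SQLite URI (?mode=ro).
import sqlite3
from pathlib import Path

def export_chat_keys(db_path: Path) -> list[tuple[str, str]]:
    uri = f"file:{db_path}?mode=ro"  # read-only: safe while Cursor is running
    with sqlite3.connect(uri, uri=True) as con:
        return con.execute(
            """
            SELECT key, value FROM ItemTable
            WHERE key IN ('workbench.panel.aichat.view.aichat.chatdata',
                          'workbench.backgroundComposer.persistentData')
               OR key LIKE 'workbench.panel.composer%'
               OR key = 'composer.composerData'
            """
        ).fetchall()
```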

2.5 Metadata
#

  • Stored in the same DB as chat data: ItemTable.
  • Metadata includes: tab IDs, open/closed state, “last opened” composer IDs, workspace–composer mappings, etc.
  • No separate “metadata store”; it’s key/value blobs in SQLite.

3. Is There an Embedding Database for Chats?
#

Cursor does not ship an embedding DB for your chat history.

  • Chat history is stored as JSON in SQLite (state.vscdb), not in a vector DB.
  • Cursor does not embed your past chats to answer new ones via semantic search over history.

Community projects exist to vectorize Cursor chat exports (e.g. with LanceDB) for your own RAG over past conversations. That’s outside Cursor’s built-in design.
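As a toy illustration of such a do-it-yourself setup: bag-of-words vectors and cosine similarity below stand in for a real embedding model and vector store (e.g. LanceDB); nothing here reflects how those projects actually work internally.

```python
# Toy sketch of RAG over exported chats. Bag-of-words + cosine similarity
# are stand-ins for a real embedding model and vector DB.
from collections import Counter
from math import sqrt

def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

exported_chats = [
    "how do I configure the sqlite connection",
    "why is my merkle sync slow",
]
index = [(msg, vectorize(msg)) for msg in exported_chats]

query = vectorize("sqlite connection settings")
best = max(index, key=lambda item: cosine(query, item[1]))
print(best[0])
```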


4. Embedding Database: Codebase Index (RAG)
#

Cursor does use an embedding-based RAG pipeline for codebase indexing, not for chat history. It powers “Index & Docs”, @codebase, and code-aware answers.

4.1 High-level flow
#

  1. Chunking
    Code is split into semantic chunks (functions, classes, logical blocks) using tree-sitter (AST). Chunks respect code structure, not arbitrary character limits.

  2. Embeddings + metadata
    A custom embedding model produces a vector per chunk. Metadata (e.g. masked file path, line range) is stored with each vector. Paths are obfuscated on the client before upload.

  3. Vector store
    Vectors + metadata are stored in Turbopuffer (vector + full-text search, cloud-backed). Chunk hashes can be cached (e.g. in AWS) to speed up re-indexing.

  4. Semantic search at query time
    Your query is embedded with the same model. Cursor searches Turbopuffer, gets metadata (masked path + line ranges) only—not the raw code. The local client de-obfuscates paths and reads the actual code from your machine, then injects those snippets into the LLM context.

So: only embeddings + metadata live in the cloud; source code stays local. Code is sent to the LLM only temporarily, for the specific chunks used in that request.
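The privacy split in step 4 can be sketched as follows. This is conceptual only: the masking scheme, the `path_map` name, and the helper functions are assumptions, not Cursor's actual implementation.

```python
# Conceptual sketch of the privacy split: the client keeps a local map from
# masked path tokens to real paths; the server only ever sees masked paths
# and returns masked paths + line ranges. Names and scheme are assumptions.
import hashlib

path_map: dict[str, str] = {}

def mask(path: str) -> str:
    token = hashlib.sha256(path.encode()).hexdigest()[:12]
    path_map[token] = path  # kept locally, never uploaded
    return token

def resolve(token: str, start: int, end: int, read_file) -> str:
    real = path_map[token]                  # de-obfuscate on the client
    lines = read_file(real).splitlines()
    return "\n".join(lines[start - 1:end])  # snippet injected into LLM context

token = mask("src/auth/login.py")
snippet = resolve(token, 1, 2, lambda p: "def login():\n    pass\nx = 1")
print(snippet)
```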

4.2 Sync and Merkle tree
#

  • Index sync runs periodically (e.g. every few minutes).
  • A Merkle tree of file hashes is used to detect changes and update only affected files.
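A simplified sketch of the idea (a flat hash-of-hashes rather than a full Merkle tree, which Cursor presumably builds hierarchically): equal roots mean nothing to re-index, and differing leaf hashes identify exactly which files changed.

```python
# Simplified change detection: hash each file, then hash the sorted
# (path, hash) pairs into a root. A real Merkle tree is hierarchical;
# this flat version shows only the detection idea.
import hashlib

def leaf_hashes(files: dict[str, str]) -> dict[str, str]:
    return {p: hashlib.sha256(c.encode()).hexdigest() for p, c in files.items()}

def root_hash(leaves: dict[str, str]) -> str:
    h = hashlib.sha256()
    for path in sorted(leaves):
        h.update(path.encode())
        h.update(leaves[path].encode())
    return h.hexdigest()

old = leaf_hashes({"a.py": "x = 1", "b.py": "y = 2"})
new = leaf_hashes({"a.py": "x = 1", "b.py": "y = 3"})
if root_hash(old) != root_hash(new):
    changed = [p for p in new if old.get(p) != new[p]]
    print(changed)  # only these files are re-chunked and re-embedded
```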

5. End-to-end architecture (Mermaid)
#

5.1 When you send a chat message
#

sequenceDiagram
    participant User
    participant IDE as Cursor IDE Electron
    participant Cursor as Cursor Servers
    participant Emb as Turbopuffer embeddings
    participant LLM as LLM OpenAI Anthropic

    User->>+IDE: Send chat message
    IDE->>IDE: Assemble context (files, rules, MCP)
    IDE->>+Cursor: Request + workspace fingerprint
    Cursor->>+Emb: Semantic search
    Emb-->>-Cursor: Metadata only (paths, line ranges)
    Cursor-->>-IDE: Metadata only, no raw code
    IDE->>IDE: Read actual code from local files
    IDE->>+Cursor: Full prompt (system + user + context)
    Cursor->>+LLM: Composed request
    LLM-->>-Cursor: Streamed tokens
    Cursor-->>-IDE: Streamed response
    IDE->>IDE: Render and save to local SQLite state.vscdb
    IDE-->>-User: Display response

5.2 Where data lives
#

graph TB
    subgraph Local[Your machine]
        IDE[Cursor IDE]
        FS[Project files]
        GLOBAL[globalStorage state.vscdb]
        WS[workspaceStorage hash state.vscdb]
    end

    subgraph CursorCloud[Cursor infrastructure]
        API[Cursor API Auth]
        TURBO[Turbopuffer]
    end

    subgraph LLMProviders[LLM providers]
        OAI[OpenAI]
        ANT[Anthropic]
    end

    IDE -->|Read/write chat, metadata| GLOBAL
    IDE -->|Read/write chat, metadata| WS
    IDE -->|Read code for context| FS
    IDE <-->|Auth, prompts, proxy to LLM| API
    API <-->|Vector search metadata only| TURBO
    API -->|Send prompts, receive stream| OAI
    API -->|Send prompts, receive stream| ANT

    style GLOBAL fill:#e1f5fe,stroke-width:4px
    style WS fill:#e1f5fe
    style TURBO fill:#fff3e0
  • Blue: Local SQLite (state.vscdb) — chat history + metadata (paths: globalStorage, workspaceStorage/<hash>/).
  • Orange: Turbopuffer — codebase embeddings + metadata only; no source code.

5.3 Chat storage layout (simplified)
#

graph LR
    subgraph Global[globalStorage]
        PD[persistentData composer UI meta]
        Pane[composerChatViewPane pane list]
    end

    subgraph Workspace[workspaceStorage hash]
        CD[composer.composerData conversation content]
        AICHAT[aichat.chatdata legacy tabs bubbles]
    end

    Sidebar[Sidebar chat list] --> Pane
    Sidebar --> PD
    Open[Open chat] --> Handle[Composer data handle]
    Handle --> CD
    Handle -.->|legacy| AICHAT

    style PD fill:#e1f5fe,stroke:#333,stroke-width:2px,color:#000
    style Pane fill:#e1f5fe,stroke:#333,stroke-width:2px,color:#000
    style CD fill:#e1f5fe,stroke:#333,stroke-width:2px,color:#000
    style AICHAT fill:#fff3e0,stroke:#333,stroke-width:2px,color:#000

6. Summary
#

  • What happens when I type in chat? — IDE → Cursor servers → LLM. Context (rules, @mentions, codebase search) is assembled; Cursor proxies the request; response is streamed back and saved locally.
  • Where are chats saved?Local SQLite state.vscdb in globalStorage and workspaceStorage (composer.composerData, legacy aichat.chatdata).
  • Where is metadata saved? — Same ItemTable in state.vscdb (e.g. workbench.backgroundComposer.persistentData, composerChatViewPane.*).
  • Is there an embedding DB for chats?No. Chats are JSON in SQLite. You can add your own (e.g. LanceDB) over exported history.
  • Embedding DB for code?Yes. Codebase index uses Turbopuffer. Embeddings + metadata only; source code stays local.
