An AI agent without memory is stuck reintroducing itself every conversation. The fix is a vector database: it stores embeddings of your documents, notes or past chats so the agent can retrieve the relevant bits on demand (that's the "R" in RAG, for retrieval). You can rent one as a managed service, or run it yourself on a VPS and keep your data, which is often private, on a box you control. Here's how, and what it actually costs in RAM.
Two good options
You don't need anything exotic. Two paths cover almost everyone:
pgvector — a Postgres extension. If you're already running Postgres (or happy to), this adds vector search to the database you already have. One service, one backup, SQL you know. The least-effort start by a wide margin.
Qdrant — a purpose-built vector engine. Reach for it when you have a lot of vectors (low millions+) or want fast metadata filtering and a dedicated API. It's a separate service to run, but it's built for exactly this job.
For most agents finding their feet, pgvector is the right first answer. Move to Qdrant when you've outgrown it, not before.
The RAM reality (this is the part people underestimate)
Vector search is fast because the index lives in memory, so RAM, not disk, is your real constraint. A rough guide (exact figures scale with your embedding dimensions and index settings):
- A few hundred thousand embeddings — comfortable in 2 GB.
- Low millions — plan for 4 GB+ and tune the index.
- Disk is the easy part: a million embeddings is only a few GB, so 25–45 GB covers a serious store.
So size for the index, not the file. (Same logic as the general sizing guide — the heavy thing isn't obvious until you measure.)
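To make "size for the index" concrete, here is a back-of-the-envelope sketch. It assumes float32 vectors (4 bytes per value); the `overhead` multiplier for graph links and bookkeeping is a made-up illustrative factor, not a measured constant, so treat the output as a starting point and check it against your real index.

```python
def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Bytes needed just to hold the vectors (float32 by default)."""
    return num_vectors * dims * bytes_per_value


def estimate_index_gib(num_vectors: int, dims: int, overhead: float = 1.5) -> float:
    """Rough in-memory index size in GiB.

    `overhead` is an assumed fudge factor for index structure on top of the
    raw vectors; replace it with a ratio measured on your own data.
    """
    return raw_vector_bytes(num_vectors, dims) * overhead / 2**30


if __name__ == "__main__":
    # Same vector counts, different embedding sizes: dimensions move the answer a lot.
    for n, dims in [(300_000, 384), (300_000, 1536), (2_000_000, 384)]:
        print(f"{n:>9,} vectors x {dims:>4} dims ~ {estimate_index_gib(n, dims):.2f} GiB")
```

The same number of embeddings can land on either side of a plan's RAM limit depending on the model's dimensions, which is why measuring beats guessing.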
Quick start: pgvector
On a box with Postgres (Docker is easiest — same pattern as self-hosting n8n):
```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE memory (
    id bigserial PRIMARY KEY,
    content text,
    embedding vector(1536)  -- match your embedding model's dimensions
);

-- after you have data, build an index for fast search:
CREATE INDEX ON memory USING hnsw (embedding vector_cosine_ops);
```
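Your agent inserts content plus its embedding, then queries with ORDER BY embedding <=> $1 LIMIT 5, where $1 is the query embedding and <=> is pgvector's cosine distance operator (the one the vector_cosine_ops index serves), to pull the five closest memories. That's the whole loop.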
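On the agent side, that loop might look like the sketch below. It assumes a Python DB-API cursor that uses `%s` placeholders (psycopg's does) and passes the embedding as pgvector's text form with a `::vector` cast. The helper names `remember` and `recall` are hypothetical, and producing the embedding itself is left to whatever model you use.

```python
from typing import Any, Sequence

INSERT_SQL = "INSERT INTO memory (content, embedding) VALUES (%s, %s::vector)"
RECALL_SQL = (
    "SELECT content FROM memory "
    "ORDER BY embedding <=> %s::vector LIMIT %s"
)


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format floats as pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def remember(cur: Any, content: str, embedding: Sequence[float]) -> None:
    """Store one memory: the text plus its embedding."""
    cur.execute(INSERT_SQL, (content, to_vector_literal(embedding)))


def recall(cur: Any, query_embedding: Sequence[float], k: int = 5) -> list[str]:
    """Return the k stored texts closest to the query by cosine distance."""
    cur.execute(RECALL_SQL, (to_vector_literal(query_embedding), k))
    return [row[0] for row in cur.fetchall()]
```

Keeping the SQL parameterized means the content and the vector never get spliced into the query string by hand.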
Prefer Qdrant? It's a single Docker container exposing an HTTP/gRPC API; you create a collection with your vector size and upsert points. Same idea, dedicated engine.
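For a feel of the Qdrant side, here is a minimal sketch against its HTTP API using only the standard library. It assumes Qdrant is listening on its default port 6333 on the same box; the collection name `memory` and the `content` payload field are illustrative choices, not anything Qdrant requires.

```python
import json
import urllib.request
from typing import Any, Sequence

QDRANT_URL = "http://localhost:6333"  # assumption: default HTTP port, same host


def collection_body(size: int, distance: str = "Cosine") -> dict[str, Any]:
    """Request body for creating a collection with one vector per point."""
    return {"vectors": {"size": size, "distance": distance}}


def upsert_body(point_id: int, vector: Sequence[float], content: str) -> dict[str, Any]:
    """Request body for upserting a single point with its text as payload."""
    return {
        "points": [
            {"id": point_id, "vector": list(vector), "payload": {"content": content}}
        ]
    }


def send(method: str, path: str, body: dict[str, Any]) -> dict[str, Any]:
    """Send a JSON request to Qdrant and return the decoded JSON response."""
    req = urllib.request.Request(
        QDRANT_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With a container running, `send("PUT", "/collections/memory", collection_body(1536))` would create the collection and `send("PUT", "/collections/memory/points", upsert_body(1, vector, "first note"))` would store a point. Qdrant also publishes client libraries if you'd rather not build requests by hand.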
Can it share a box with the agent?
Yes — for small-to-medium memory stores, run the vector DB on the same VPS as the agent. It's simpler and the latency is basically zero. Split them onto separate servers only when one starts crowding the other out of RAM. That's a later, good-problem-to-have decision, not a day-one one.
Honest caveats
- RAM is the wall, and it's quiet. Search stays fast until the index no longer fits in memory; then queries start reading from disk and latency climbs. Watch memory and resize before it bites, rather than waiting for slow queries to tell you.
- Embeddings cost tokens to create. With a hosted embedding model, every document you embed is a billed API call. The store is cheap to host; generating the vectors is the recurring cost, and it counts toward what running an agent actually costs.
- Back it up. Your agent's memory is data like any other. If it matters, snapshot it.
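For the first caveat, "watch memory" can start as something this small. The sketch assumes a Linux VPS, where `/proc/meminfo` reports `MemTotal` and `MemAvailable` in kibibytes; the 20% threshold is an arbitrary illustrative default, so pick your own.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: first numeric value}."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def memory_is_tight(meminfo: dict[str, int], min_available_fraction: float = 0.2) -> bool:
    """True when MemAvailable drops below the chosen fraction of MemTotal."""
    return meminfo["MemAvailable"] < meminfo["MemTotal"] * min_available_fraction


def check_host() -> bool:
    """Read the live values on a Linux host and apply the threshold."""
    with open("/proc/meminfo") as f:
        return memory_is_tight(parse_meminfo(f.read()))
```

Run `check_host()` from cron or your agent's health check and alert when it returns True; that gives you a nudge to resize before queries slow down.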
Within those limits, a self-hosted vector store is a clean way to give an agent durable memory without handing your private data to a third party. Start with pgvector on a 2 GB box, keep an eye on RAM, and grow into Qdrant or a bigger plan only when the numbers tell you to.