pgvector or Qdrant — which should I self-host?

If you already run Postgres (or want one database for both your app data and embeddings), pgvector is the least-effort choice — it's just an extension. If you have millions of vectors or want a purpose-built engine with fast filtering, Qdrant is worth the separate service. For most agents starting out, pgvector on Postgres is plenty.

How much RAM does a vector database need?

More than you'd guess, because good search wants the index in memory. A rough rule, which shifts with your model's vector dimensions: a few hundred thousand embeddings sit comfortably in 2 GB; once you reach low millions, plan for 4 GB+ and tune the index. Start at 2 GB, watch memory, and resize before the index outgrows it rather than after search slows.

Why self-host a vector store instead of a managed one?

Two reasons people actually do it. First, privacy: your embeddings often contain your private data (notes, docs, customer content), and a self-hosted store keeps that on a server you control. Second, flat cost: a managed vector service typically bills by vectors and queries, while a VPS is one monthly number with no per-query meter.

Can a vector DB and my agent run on the same VPS?

Yes, for small to medium workloads — co-locating the agent and a pgvector/Qdrant instance on one box is simple and cuts latency to near-zero. Split them onto separate servers only when one starts starving the other for RAM, which is a nice problem to have later, not a day-one concern.

Do embeddings take a lot of disk?

A single embedding is a few kilobytes (a 1,536-dimension vector of 4-byte floats is about 6 KB), so a million of them is a few gigabytes plus index overhead: meaningful but not huge. 25–45 GB of disk covers a substantial memory store for most agents. RAM, not disk, is usually the first limit you hit.

Host a vector database for AI agent memory on a VPS

An AI agent without memory is stuck reintroducing itself every conversation. The fix is a vector database — it stores embeddings of your documents, notes or past chats so the agent can recall the relevant bits on demand (that's the "R" in RAG). You can rent that as a managed service, or you can run it yourself on a VPS and keep your data — which is often your private data — on a box you control. Here's how, and what it actually costs in RAM.

Two good options

You don't need anything exotic. Two paths cover almost everyone:

pgvector — a Postgres extension. If you're already running Postgres (or happy to), this adds vector search to the database you already have. One service, one backup, SQL you know. The least-effort start by a wide margin.

Qdrant — a purpose-built vector engine. Reach for it when you have a lot of vectors (low millions+) or want fast metadata filtering and a dedicated API. It's a separate service to run, but it's built for exactly this job.

For most agents finding their feet, pgvector is the right first answer. Move to Qdrant when you've outgrown it, not before.

The RAM reality (this is the part people underestimate)

Vector search is fast because the index lives in memory — so RAM, not disk, is your real constraint. A rough guide:

A few hundred thousand embeddings — comfortable in 2 GB.
Low millions — plan for 4 GB+ and tune the index.
Disk is the easy part: a million embeddings is only a few GB, so 25–45 GB covers a serious store.
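To see why vector dimensions matter as much as row count, here is a back-of-envelope calculator. It is a sketch that assumes float32 components (pgvector's `vector` type stores 4 bytes per dimension) and uses a hypothetical `overhead_factor` to stand in for the index graph and row overhead; replace that with a figure measured on your own data.

```python
# Back-of-envelope size of the raw vectors in a memory store.
# Assumption: 4-byte float32 components, as stored by pgvector's `vector`
# type; small per-row headers are ignored.
# `overhead_factor` is a hypothetical placeholder for HNSW graph links and
# row/page overhead: measure your real index and replace it.

BYTES_PER_FLOAT32 = 4
GIB = 1024 ** 3


def raw_vector_bytes(count: int, dimensions: int) -> int:
    """Bytes taken by the vector components alone."""
    return count * dimensions * BYTES_PER_FLOAT32


def estimated_gib(count: int, dimensions: int, overhead_factor: float) -> float:
    """Raw size scaled by an assumed overhead factor, converted to GiB."""
    return raw_vector_bytes(count, dimensions) * overhead_factor / GIB


if __name__ == "__main__":
    ASSUMED_OVERHEAD = 1.5  # hypothetical; not a measured pgvector or Qdrant figure
    for count in (300_000, 1_000_000, 3_000_000):
        for dims in (384, 1536):
            raw = raw_vector_bytes(count, dims) / GIB
            est = estimated_gib(count, dims, ASSUMED_OVERHEAD)
            print(f"{count:>9,} vectors x {dims:>4} dims: "
                  f"raw {raw:5.2f} GiB, ~{est:5.2f} GiB with overhead")
```

At the same row count, a 1,536-dimension model takes four times the space of a 384-dimension one, so the embedding model you pick moves the RAM budget as much as the number of memories does.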

So size for the index, not the file. (Same logic as the general sizing guide — the heavy thing isn't obvious until you measure.)

Quick start: pgvector

On a box with Postgres (Docker is easiest — same pattern as self-hosting n8n):

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE memory (
  id bigserial PRIMARY KEY,
  content text,
  embedding vector(1536)        -- match your embedding model's dimensions
);

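-- HNSW index for fast approximate search (pgvector 0.5.0+). It can be created
-- on an empty table, but building it after the initial bulk load is faster: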
CREATE INDEX ON memory USING hnsw (embedding vector_cosine_ops);
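To fill the `memory` table from agent code, each embedding has to reach Postgres in a form the `vector` type accepts. A minimal sketch, assuming a DB-API driver with `%s` placeholders (psycopg uses that style) and sending embeddings as pgvector's text format `'[x,y,z]'` with a `::vector` cast, so no adapter package is needed. The SQL constants and the cursor `cur` in the comments are illustrative, not a fixed API.

```python
# Sketch: turning agent-side embeddings into parameters for the `memory` table.
# Assumptions: a DB-API driver with %s placeholders (psycopg uses this style);
# embeddings are sent as pgvector's text format '[x,y,z]' and cast with
# ::vector. The SQL strings are illustrative.
import math
from typing import Sequence

EMBEDDING_DIMS = 1536  # must match vector(1536) in the table definition

INSERT_SQL = "INSERT INTO memory (content, embedding) VALUES (%s, %s::vector)"
RECALL_SQL = (
    "SELECT id, content FROM memory "
    "ORDER BY embedding <=> %s::vector LIMIT %s"
)


def to_vector_literal(embedding: Sequence[float], dims: int = EMBEDDING_DIMS) -> str:
    """Validate an embedding and format it as a pgvector text literal."""
    if len(embedding) != dims:
        raise ValueError(f"expected {dims} dimensions, got {len(embedding)}")
    values = [float(x) for x in embedding]
    if not all(math.isfinite(v) for v in values):
        raise ValueError("pgvector rejects NaN and infinite components")
    return "[" + ",".join(repr(v) for v in values) + "]"


# Hypothetical usage with an open psycopg cursor `cur`:
#   cur.execute(INSERT_SQL, (text, to_vector_literal(embedding)))
#   cur.execute(RECALL_SQL, (to_vector_literal(query_embedding), 5))
#   memories = cur.fetchall()
```

Checking the dimension count on the agent side turns a model mismatch into a clear Python error instead of a failed insert.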

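Your agent inserts content plus its embedding, then queries with ORDER BY embedding <=> $1 LIMIT 5, where $1 is the embedding of the current question, to pull the closest memories. <=> is pgvector's cosine-distance operator, which is what the vector_cosine_ops index above accelerates; order by a different operator and the index won't be used. That's the whole loop.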
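To make the ranking concrete, here is what that query computes, done by brute force in plain Python. It is an illustration only: pgvector's `<=>` returns cosine distance (1 minus cosine similarity), and the HNSW index returns approximately the same nearest rows without scanning every vector. The `recall` name and the toy 2-dimensional vectors are hypothetical.

```python
# Sketch of what `ORDER BY embedding <=> $1 LIMIT k` computes, as an exact
# brute-force scan. pgvector's <=> is cosine distance (1 - cosine similarity);
# an HNSW index approximates this result without visiting every row.
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cos(a, b): 0 means same direction, 1 orthogonal, 2 opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def recall(query: Sequence[float],
           memories: List[Tuple[str, Sequence[float]]],
           k: int = 5) -> List[Tuple[float, str]]:
    """Return the k memories closest to `query`, nearest first."""
    scored = [(cosine_distance(query, emb), text) for text, emb in memories]
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings"; real ones come from your embedding model.
    memories = [
        ("User likes tea", [1.0, 0.0]),
        ("User likes coffee", [0.9, 0.1]),
        ("User lives in Oslo", [0.0, 1.0]),
    ]
    for distance, text in recall([1.0, 0.0], memories, k=2):
        print(f"{distance:.4f}  {text}")
```

An exact scan like this is fine for a few thousand memories; the index is what keeps the same query fast at hundreds of thousands.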

Prefer Qdrant? It's a single Docker container exposing an HTTP/gRPC API; you create a collection with your vector size and upsert points. Same idea, dedicated engine.
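The two Qdrant calls in that paragraph map onto its REST API roughly as below. This sketch only builds the requests with the standard library and does not send them; it assumes Qdrant on its default HTTP port 6333 on the same box, and the collection name `memory` and the payload key `content` are hypothetical choices.

```python
# Builds (but does not send) the two Qdrant REST calls described above:
# PUT /collections/{name} to create a collection, and
# PUT /collections/{name}/points to upsert points.
# Assumptions: Qdrant listening on its default HTTP port 6333 locally;
# the collection name and the `content` payload key are hypothetical.
import json
import urllib.request
from typing import Any, Dict, List, Sequence

QDRANT_URL = "http://localhost:6333"
COLLECTION = "memory"


def _json_request(method: str, path: str, body: Dict[str, Any]) -> urllib.request.Request:
    return urllib.request.Request(
        QDRANT_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def create_collection_request(size: int = 1536) -> urllib.request.Request:
    # `size` must equal your embedding model's dimensions. "Cosine" ranks results
    # like pgvector's <=>, but Qdrant reports a similarity score (higher is closer).
    body = {"vectors": {"size": size, "distance": "Cosine"}}
    return _json_request("PUT", f"/collections/{COLLECTION}", body)


def upsert_request(points: List[Dict[str, Any]]) -> urllib.request.Request:
    # Qdrant point IDs must be unsigned integers or UUID strings.
    return _json_request("PUT", f"/collections/{COLLECTION}/points", {"points": points})


def make_point(point_id: int, embedding: Sequence[float], content: str) -> Dict[str, Any]:
    return {"id": point_id, "vector": list(embedding), "payload": {"content": content}}


if __name__ == "__main__":
    req = upsert_request([make_point(1, [0.1] * 1536, "User prefers metric units.")])
    print(req.get_method(), req.full_url)
    # With Qdrant actually running, send it with: urllib.request.urlopen(req)
```

Keeping the original text in the payload means a search hit can go straight into the agent's prompt without a second lookup.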

Can it share a box with the agent?

Yes. For small-to-medium memory stores, run the vector DB on the same VPS as the agent: it's simpler and the latency is basically zero. Split them onto separate servers only when one starts crowding the other out of RAM. That's a later, good-problem-to-have decision, not a day-one one.

Honest caveats

RAM is the wall, and it's quiet. Search stays fast until the index no longer fits in memory, then it degrades. Watch memory and resize before it bites — don't wait for slow queries to tell you.
Embeddings cost tokens to create. If you use a hosted embedding model, every document you embed is a billed API call. The store is cheap to host; generating the vectors is the recurring cost, which is relevant to what running an agent actually costs.
Back it up. Your agent's memory is data like any other. If it matters, snapshot it.
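One way to act on the RAM caveat before it bites is to compare the index size against the machine's memory on a schedule. In Postgres you can read the index size with `SELECT pg_relation_size('memory_embedding_idx');` (the name Postgres generates by default for the index created earlier; check yours with `\di`). The sketch below parses `MemTotal` from Linux's `/proc/meminfo` and flags when the index passes a share of total RAM; the 50% threshold is an arbitrary assumption, not a pgvector or Qdrant recommendation.

```python
# Sketch: warn before the vector index outgrows RAM.
# Assumptions: Linux (/proc/meminfo reports MemTotal in kB, i.e. KiB);
# `index_bytes` comes from something like pg_relation_size() on your index;
# the 0.5 share of RAM is an arbitrary, hypothetical threshold.
from typing import Optional


def mem_total_bytes(meminfo_text: str) -> Optional[int]:
    """Extract MemTotal from /proc/meminfo contents, in bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    return None


def index_fits(index_bytes: int, total_ram_bytes: int, max_share: float = 0.5) -> bool:
    """True while the index stays under `max_share` of total RAM, leaving
    the rest for Postgres itself, the agent and the OS."""
    return index_bytes <= total_ram_bytes * max_share


if __name__ == "__main__":
    index_bytes = 700 * 1024 ** 2  # hypothetical: replace with your measured size
    try:
        with open("/proc/meminfo") as f:
            total = mem_total_bytes(f.read())
    except OSError:
        total = None  # not Linux, or /proc unavailable
    if total is None:
        print("could not read MemTotal")
    elif index_fits(index_bytes, total):
        print("index fits comfortably")
    else:
        print("index is crowding RAM: resize the VPS or tune the index")
```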

Within that, a self-hosted vector store is a clean way to give an agent durable memory without handing your private data to a third party. Start with pgvector on a 2 GB box, keep an eye on RAM, and grow into Qdrant or a bigger plan only when the numbers tell you to.
