Self-Hosting
LongMem runs comfortably on a single small server — the reference deployment is a 4 GB box serving the public instance.
The stack
| Component | Role |
|---|---|
| PostgreSQL + pgvector | memories, collections, embeddings, full-text — one database for everything; tenant isolation via row-level security |
| Redis | queues (async ingest + extraction), result cache, one-time tokens |
| FastAPI / uvicorn | the API (app.main) |
| Ingest worker | python -m app.ingest_worker — file processing + graph extraction; never loads the embedding model (embeds via the API) |
| nginx | TLS, static landing/docs, API proxy; serve HTML with Cache-Control: no-cache |
Configuration (.env)
DATABASE_URL=postgresql+asyncpg://longmem:…@localhost:5432/longmem
REDIS_URL=redis://localhost:6379/0
EMBEDDING_PROVIDER=local # on-box ONNX embeddings — no text leaves the box
OPENAI_API_KEY=sk-… # only used for extraction / Whisper / vision captions
STORAGE_SECRET_KEY=… # Fernet key for encrypting BYO credentials at rest
# file storage
S3_ENDPOINT_URL=…
S3_BUCKET=…
S3_ACCESS_KEY=…
S3_SECRET_KEY=…
RESEND_API_KEY=… # transactional email (optional)
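
A Fernet key is 32 random bytes encoded as URL-safe base64, so a value for STORAGE_SECRET_KEY can be produced with the standard library alone. The sketch below is one way to do that; the function name `generate_fernet_key` is illustrative and not part of LongMem. If the `cryptography` package is installed, `Fernet.generate_key()` yields a key of the same format.

```python
import base64
import os


def generate_fernet_key() -> str:
    """Return a new Fernet-compatible key: 32 random bytes, URL-safe base64."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


if __name__ == "__main__":
    # Paste the printed value into .env as STORAGE_SECRET_KEY=<value>
    print(generate_fernet_key())
```

Generate the key once and keep it backed up: credentials encrypted with one key cannot be decrypted with another.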
With EMBEDDING_PROVIDER=local and a BYO/local extraction endpoint, the
only third-party touchpoints left are Whisper/vision for media — skip media and nothing
leaves your infrastructure.
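
Before starting the services, a small preflight over the .env file can catch missing settings and flag a non-local embedding provider. This is a minimal sketch, not LongMem's own loader: the `REQUIRED` list is an assumption about which settings are mandatory, the parser handles only plain `KEY=value` lines with optional trailing ` # comment`, and the names `parse_env` and `preflight` are hypothetical.

```python
# Assumption: these are the settings the app cannot start without.
REQUIRED = ("DATABASE_URL", "REDIS_URL", "EMBEDDING_PROVIDER", "STORAGE_SECRET_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines; skip blanks and comment lines, drop trailing ' # ...'."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # A '#' only starts a comment when preceded by a space, so URLs stay intact.
        env[key.strip()] = value.split(" #", 1)[0].strip()
    return env


def preflight(env: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the check passed."""
    problems = [f"missing {name}" for name in REQUIRED if not env.get(name)]
    provider = env.get("EMBEDDING_PROVIDER")
    if provider and provider != "local":
        problems.append(
            f"EMBEDDING_PROVIDER={provider}: text may leave the box for embedding"
        )
    return problems
```

A typical call would be `preflight(parse_env(open(".env").read()))`, printing each returned problem and exiting non-zero if the list is not empty.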
Database setup
CREATE EXTENSION vector; CREATE EXTENSION pg_trgm;
# apply versioned migrations as a DDL-capable role (the app role stays unprivileged):
sudo -u postgres .venv/bin/python3 -m app.migrate apply --dsn postgresql:///longmem
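The app's database role must NOT be a superuser. PostgreSQL superusers (and roles with BYPASSRLS) skip row-level security policies entirely, so tenant isolation depends on the app connecting as an unprivileged role.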
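
One way to verify this is to query `pg_roles` while connected as the app role and confirm that both `rolsuper` and `rolbypassrls` are false. The sketch below keeps the database driver out of the picture: it provides the query text and a pure function for interpreting the returned row. The names `ROLE_CHECK_SQL` and `role_respects_rls` are illustrative, not part of LongMem.

```python
# Run this while connected AS THE APP ROLE (not as postgres).
ROLE_CHECK_SQL = (
    "SELECT rolsuper, rolbypassrls FROM pg_roles WHERE rolname = current_user"
)


def role_respects_rls(rolsuper: bool, rolbypassrls: bool) -> bool:
    """True only if row-level security policies apply to this role."""
    return not (rolsuper or rolbypassrls)
```

Execute `ROLE_CHECK_SQL` with whichever driver the deployment already uses and pass the two returned columns to `role_respects_rls`. Note that a table's owner also bypasses its policies unless the table is set to FORCE ROW LEVEL SECURITY, so it is worth checking table ownership separately.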
Services
Systemd units ship in deploy/: the API, the ingest worker, an SMTP
receiver (email drop), and an uptime checker on a 1-minute timer that emails on
consecutive health-check failures. The reference deploy model is autopull: a timer
fetches the git branch, resets, reinstalls, rsyncs the landing page, restarts, and
health-checks, so a merge is live in ~60–120s.
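
Alerting on consecutive failures, rather than on every failed check, avoids emailing for a single blip such as the restart during an autopull. The sketch below illustrates that logic only; it is not the checker shipped in deploy/. The class name `UptimeState` and the default threshold of 3 are assumptions.

```python
from dataclasses import dataclass


@dataclass
class UptimeState:
    threshold: int = 3  # hypothetical: failures in a row before alerting
    consecutive_failures: int = 0
    alerted: bool = False

    def record(self, healthy: bool) -> bool:
        """Record one check result; return True when an alert should be sent now."""
        if healthy:
            # Any success resets the streak and re-arms the alert.
            self.consecutive_failures = 0
            self.alerted = False
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold and not self.alerted:
            self.alerted = True  # alert once per outage, not once per check
            return True
        return False
```

Because a timer-driven checker starts as a fresh process on each run, the counter would need to be persisted between runs (for example in a small state file or in Redis) for this to work across invocations.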
Sizing
- 4 GB RAM / 2 vCPU runs Postgres + Redis + API (with the local embedding model) + worker with headroom.
- The worker embeds through the API's internal endpoint so only one copy of the embedding model is ever loaded.
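
The single-model arrangement can be pictured as the worker making an HTTP call to the API process instead of importing the embedding model itself. The sketch below is illustrative only: the endpoint path `/internal/embed`, the port, and the request and response shapes are all hypothetical, since this page does not document the internal endpoint.

```python
import json
import urllib.request

# Hypothetical values: the real internal endpoint and payload are not documented here.
API_BASE = "http://127.0.0.1:8000"
EMBED_PATH = "/internal/embed"


def build_embed_request(
    texts: list[str], base_url: str = API_BASE
) -> urllib.request.Request:
    """Build a POST request asking the API process to embed a batch of texts."""
    body = json.dumps({"texts": texts}).encode("utf-8")
    return urllib.request.Request(
        base_url + EMBED_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed_via_api(texts: list[str], timeout: float = 30.0) -> list[list[float]]:
    """Send the batch to the API; assumes a response like {"embeddings": [[...], ...]}."""
    with urllib.request.urlopen(build_embed_request(texts), timeout=timeout) as resp:
        return json.load(resp)["embeddings"]
```

The point of the design is memory: the worker process stays small because the model weights live only in the API process, which matters on a 4 GB box.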