Vault: host configuration

Source of truth for how the Vault container is run on O1 (vault.448.global).

Files in this directory

| File | Purpose |
| --- | --- |
| `docker-compose.yml` | The canonical Compose file. Reconstructed 2026-05-06 after the original Portainer-managed compose was lost. |

Rebuild from this repo

If the Vault container is gone (Portainer reset, host re-imaged, OCI maintenance, etc.) and the data on disk at /home/ubuntu/docker/vault/ is intact:

ssh ubuntu@<o1>
cd /tmp
git clone --depth=1 https://<creds>@git.projecteidos.com/internal/engineering.git
sudo mkdir -p /home/ubuntu/docker/vault
sudo cp engineering/infra/vault/docker-compose.yml /home/ubuntu/docker/vault/
cd /home/ubuntu/docker/vault
sudo docker compose up -d
sudo docker logs --tail=20 vault   # expect "Vault server started"
curl -sk https://vault.448.global/v1/sys/seal-status | jq   # expect "sealed": true

Then unseal per RB-003 Path A.
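The seal-status check in the rebuild steps can also be scripted so it returns an exit code instead of relying on reading the `jq` output by eye. The following is a minimal sketch: `is_sealed` is a hypothetical helper name, and it uses a loose text match rather than a real JSON parser.

```sh
#!/bin/sh
# Sketch only: is_sealed is a hypothetical helper, not part of Vault or this repo.
# It reads a seal-status JSON document on stdin and exits 0 only if it
# contains "sealed": true. The match is textual, not a full JSON parse.
is_sealed() {
    grep -Eq '"sealed"[[:space:]]*:[[:space:]]*true'
}

# Usage on O1 (shown as a comment, not executed here):
#   if curl -sk https://vault.448.global/v1/sys/seal-status | is_sealed; then
#       echo "Vault is up and sealed; continue with the unseal runbook"
#   else
#       echo "Vault is unreachable or already unsealed; investigate first"
#   fi
```

A non-zero exit covers both an unreachable Vault and an unexpectedly unsealed one, so treat either as a reason to stop and look at the container logs.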

Why these specific config choices

| Choice | Reason |
| --- | --- |
| Image pinned to `hashicorp/vault:1.18.5` | Originally `:latest`; Watchtower auto-pulled a newer image whose entrypoint requires `SETFCAP`, which broke the container (incident 2026-05-01). |
| `cap_add: [IPC_LOCK, SETFCAP]` | `IPC_LOCK` is for memory locking. `SETFCAP` is added to the bounding set defensively, in case the entrypoint logic changes; with `SKIP_SETCAP=true` it is not actually used. |
| `SKIP_SETCAP=true` env | The container runs as the non-root `vault` user (Dockerfile default). A non-root user cannot call `setcap` on the binary even with `CAP_SETFCAP` granted, because Docker doesn't grant ambient caps to non-root users by default. `SKIP_SETCAP=true` skips the entrypoint's `setcap` step entirely. |
| `com.centurylinklabs.watchtower.enable=false` label | Opts out of Watchtower auto-updates. Vault upgrades must be deliberate. |
| `expose: 8200` instead of host `ports:` mapping | Caddy reaches Vault over the `caddy_default` Docker network using the hostname `vault`. The original setup had `8200/tcp -> "0"`, which exposed a random host port; switching to `expose` removes that leak. |
| `networks: caddy_default (external: true)` | The Caddy stack creates this network. Vault joins it so the reverse proxy can reach it. |
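To see how the choices in the table fit together, the sketch below writes a hypothetical Compose file that reflects them. It is not the canonical `docker-compose.yml` from this directory. The `command`, `container_name`, and container-side volume paths are assumptions, and `write_compose_sketch` and the `docker-compose.sketch.yml` file name are made up for illustration.

```sh
#!/bin/sh
# Sketch only: writes a hypothetical Compose file into the given directory.
# The canonical file lives in this repo; compare against it, do not replace it.
write_compose_sketch() {
    out_dir="$1"
    cat > "$out_dir/docker-compose.sketch.yml" <<'EOF'
services:
  vault:
    image: hashicorp/vault:1.18.5      # pinned, never :latest
    container_name: vault              # assumption, inferred from "docker logs vault"
    command: server                    # assumption
    cap_add:
      - IPC_LOCK
      - SETFCAP
    environment:
      - SKIP_SETCAP=true
    labels:
      - "com.centurylinklabs.watchtower.enable=false"
    expose:
      - "8200"                         # reachable on the Docker network only
    volumes:                           # container-side paths are assumptions
      - /home/ubuntu/docker/vault/config:/vault/config
      - /home/ubuntu/docker/vault/file:/vault/file
      - /home/ubuntu/docker/vault/logs:/vault/logs
      - /home/ubuntu/docker/vault/data:/vault/data
    networks:
      - caddy_default
networks:
  caddy_default:
    external: true
EOF
}
```

Running `write_compose_sketch /tmp` and then `docker compose -f /tmp/docker-compose.sketch.yml config` should at least confirm that the sketch parses; any difference from the canonical file is a prompt to read the canonical file, not to change it.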

Trade-off recorded for `SKIP_SETCAP=true`

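With `SKIP_SETCAP=true` the entrypoint never grants the Vault binary the file capability it would need for mlockall(), so Vault runs without locked memory. Without mlockall(), the Vault process's memory pages can be swapped to disk under memory pressure, which risks leaving secret material on disk. On a Free Tier Ampere A1 instance with no history of memory pressure this is very unlikely to matter, but it should be reviewed if O1 is upgraded to a paid shape with different swap behaviour.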
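When this trade-off is reviewed, a first question is whether the host has any swap configured at all. The following sketch reads `SwapTotal` from `/proc/meminfo`; the function names `swap_total_kb` and `swap_risk` are hypothetical, and the verdict text is illustrative only.

```sh
#!/bin/sh
# Sketch only: both helpers are hypothetical names for illustration.

# Print SwapTotal in kB from a meminfo-style file (default: /proc/meminfo).
swap_total_kb() {
    meminfo="${1:-/proc/meminfo}"
    awk '$1 == "SwapTotal:" { print $2 }' "$meminfo"
}

# Print a one-line verdict about swap exposure for unlocked memory.
swap_risk() {
    kb="$(swap_total_kb "$1")"
    if [ "${kb:-0}" -gt 0 ]; then
        echo "swap enabled (${kb} kB): unlocked Vault memory could be swapped to disk"
    else
        echo "no swap configured: anonymous memory cannot be swapped to disk"
    fi
}
```

On O1, `swap_risk` with no argument reads the live `/proc/meminfo`. A non-zero swap total does not mean Vault pages have been swapped, only that the mlock trade-off deserves a closer look.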

Operational notes

Bind-mount paths on the host (/home/ubuntu/docker/vault/{config,file,logs,data}) are the canonical data location. data/ contains the Raft cluster state. Treat /home/ubuntu/docker/vault/ as Tier-0 and back it up off-host (no off-host backup exists yet; tracked under RM-013).
The Vault config file lives at /home/ubuntu/docker/vault/config/vault.hcl (or whatever HCL files are in that directory). It is NOT in this repo today; it should be committed under infra/vault/config/ once any sensitive endpoints have been redacted.
Unseal-key custody: see apps/15-vault.md for the holder list and recovery procedure. The threshold is typically 3 of 5 (Vault's default split); confirm the actual values in apps/15-vault.md before relying on them.
Image upgrade procedure — when bumping hashicorp/vault:<version>:
Take a Raft snapshot first.
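Read the new version's release notes (capability changes, breaking changes).
Update the `image:` tag in docker-compose.yml, commit, push.
On O1: copy the updated docker-compose.yml into /home/ubuntu/docker/vault/ (as in the rebuild steps above), then cd /home/ubuntu/docker/vault && sudo docker compose pull && sudo docker compose up -d.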
Verify Vault is up and sealed; unseal; confirm a secret read works.
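The host-side part of the upgrade procedure above can be sketched as a script that prints its commands by default and only executes them when `DRY_RUN=0` is set. The helper names `run` and `upgrade_vault` and the snapshot path are hypothetical. The snapshot step assumes the `vault` CLI is available where the script runs, with `VAULT_ADDR` and a valid token already set.

```sh
#!/bin/sh
# Sketch only: run and upgrade_vault are hypothetical helpers.
# By default every command is printed, not executed. Set DRY_RUN=0 to execute.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

upgrade_vault() {
    compose_file="/home/ubuntu/docker/vault/docker-compose.yml"
    snapshot="${1:-/tmp/vault-pre-upgrade.snap}"   # hypothetical path

    # Take a Raft snapshot before touching the container.
    # Assumes VAULT_ADDR and a token are set and Vault is currently unsealed.
    run vault operator raft snapshot save "$snapshot"

    # Reviewing release notes and bumping the image tag happen in the repo,
    # so they are not scripted here.

    # Pull the new image and recreate the container.
    run sudo docker compose -f "$compose_file" pull
    run sudo docker compose -f "$compose_file" up -d

    # After the restart Vault should answer and report "sealed": true.
    run curl -sk https://vault.448.global/v1/sys/seal-status
}
```

Calling `upgrade_vault` with no arguments prints the four commands in order, which makes it easy to review the plan before running it for real. Unsealing and the final secret-read check stay manual, per the runbook.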

apps/15-vault.md — Vault application doc
runbooks/RB-003-vault-sealed.md — recovery runbook (Paths A, B, C, D)
incidents/2026-05-01-vault-container-down.md — the incident that produced this file
KI-015 — config-on-host-only pattern (Vault was the first one fixed)
KI-033 — the incident as a tracked KI (now resolved)