IT Estate — Executive Summary & Continuity Brief
Single-page briefings for each section; expand the sub-sections for detail. Last reviewed 2026-05-07.
What you can do with this document
| Need | Section |
|---|---|
| Understand what we run, at a glance | §1 The estate at a glance |
| Brief a new colleague or external stakeholder | §2 What it does for the business |
| Discuss IT spend in the next finance review | §3 What it costs + §4 Why we self-host |
| Understand current operational risks | §5 Where the risk concentrates |
| See the modernization plan | §6 The improvement plan |
| Operate the estate without Vishnu present | §7 Continuity & access |
| Approve specific decisions | §8 Decisions needed from leadership |
1. The estate at a glance
We operate 4 internet domains, supported by 3 servers and 5 managed Oracle databases, all hosted in Oracle Cloud (UK London). Domain registration and corporate email run through GoDaddy. Microsoft Azure (Entra) is also used — not for compute, but for app registrations that connect our applications to the Microsoft 365 ecosystem. These registrations span two different M365 tenants — one we administer, one we don't:
- In our own Project Eidos M365 tenant (we administer it):
  - Teams Bot registered via the Bot Framework so Microsoft Teams can deliver chat messages to our bot.projecteidos.com webhook.
  - Eidos TnE Connect SSO + calendar integration — App Registrations that federate Microsoft sign-in into the workforce app for our own staff, plus Microsoft Graph delegated permissions used by the leave-application feature to read / write Outlook calendar events.
- In Fourway's own M365 tenant (administered by Fourway, not by us):
  - Fourway TnE Connect SSO + calendar integration — equivalent App Registrations live in Fourway's tenant. Their tenant administrator manages consent and the registration itself; we hold only the client ID + secret APEX needs to complete the sign-in handshake. If Fourway revokes consent, the Fourway tenant of TnE Connect cannot sign users in until they restore it.
There are no other clouds, no other VPS providers, no on-premises hardware.
graph TB
Net((Public Internet))
subgraph EIDOS["OCI tenancy: EIDOSDev1 (uk-london-1)"]
E1["E1 — Caddy reverse proxy<br/>1 vCPU / 6 GB · Free tier"]
E2["E2 — Dokploy host<br/>3 vCPU / 18 GB · Free tier<br/>9 PE-side apps incl. WordPress"]
E5[("E5 — Paid Oracle DB<br/>Parallax for the UR client")]
E3[("E3 — Free DB<br/>Eidos workforce tenant")]
E4[("E4 — Free DB<br/>Fourway workforce tenant")]
end
subgraph G448["OCI tenancy: ORA448Global (uk-london-1)"]
O1["O1 — All-in-one host<br/>~15 internal-tools apps incl.<br/>identity (Authentik), secrets (Vault),<br/>storage (MinIO), monitoring"]
O2[("O2 — Free DB · internal dev")]
O3[("O3 — Free DB · internal dev")]
end
subgraph EXT["External providers"]
GD["GoDaddy<br/>4 domain registrations<br/>+ Microsoft 365 mail"]
AZ["Microsoft Azure / Entra<br/>App registrations:<br/>· Teams Bot (Bot Framework)<br/>· TnE Connect SSO (per tenant)<br/>· TnE Connect calendar via MS Graph"]
end
Net --> E1
Net --> E2
Net --> O1
E2 --> E3
E2 --> E4
E2 --> E5
O1 --> O2
O1 --> O3
GD -. DNS / mail .- Net
AZ -. SSO + Graph .- E3
AZ -. SSO + Graph .- E4
AZ -. Bot Framework .- E2
38 distinct services are documented across these hosts (32 operational apps + 6 internal R&D / future products), organised into eight functional categories:
| Function | Examples | Count |
|---|---|---|
| Customer-facing products | Parallax (UR), TnE Connect tenants × 2, three CRM instances | 5 |
| Customer-facing brand sites | The three corporate WordPress sites | 3 |
| Engineering & development platform | GitLab, Dokploy, Coder, Teams Bot | 4 |
| Identity, secrets & networking | Authentik (SSO), Vault (secrets), Wireguard (VPN), Microsoft 365 | 4 |
| Storage & data | MinIO, PE Tube (video), 5 Oracle databases | 3 + 5 |
| Monitoring & operations | Portainer, Beszel, Gotify, Watchtower | 4 |
| Internal productivity | n8n, Open WebUI, Draw.io, IT Tools, internal dev DBs | 5 |
| Internal R&D / future products | Content Connect (paused AI marketing app), Dot Connect (active visual-PM build), Pitch Connect / Risk Connect (idea only), Supabase (abandoned trial), 448G OCI CI Prod (aspirational SSH-automation) | 6 |
The full inventory and per-service detail are in apps/ in this repo.
2. What it does for the business
A short business-language description of each layer.
Customer-facing products (the revenue surface)
| Product | What it does for the customer | Today |
|---|---|---|
| Parallax at parallax.projecteidos.com | Hospitality dynamic-pricing platform built for Untapped Revenue Solutions (UR). UR's hotel clients use it to set room prices based on demand, events, and competitor pricing. | ~40 users across 23 properties. Strategic relationship — possible 5-year support engagement + UR plans to resell as SaaS. |
| TnE Connect — Fourway tenant at fourway.tneconnect.app | Workforce-management SaaS (timesheets, employee records, scheduling). The Fourway client is our paying customer who helped us prove the product. | ~150 users. £5,000/year (friends-and-family pricing). |
| TnE Connect — Eidos tenant at eidos-global.tneconnect.app | Same product, used internally for our own staff. Doubles as the dogfood reference for prospects. | ~30 users. Internal use. |
| Three CRM instances at crm.eidos-global.com, in.crm.eidos-global.com, crm.tneconnect.app | Sales-pipeline + customer-account tracking. Twenty CRM (open-source). | ~6 users total. Strategic; usage scaling up. |
Brand & marketing
| Site | Purpose |
|---|---|
| eidos-global.com | Eidos Global corporate brand front door. WordPress. |
| tneconnect.app | TnE Connect product marketing site (recently relaunched with RocketSaas). |
| projecteidos.com (apex) | Project Eidos brand front; currently 301-redirects to eidos-global.com. |
Engineering & developer platform
| System | Role |
|---|---|
| GitLab at git.projecteidos.com | Source-of-truth for our in-house code, CI/CD pipelines, container registry. |
| Dokploy at platform.projecteidos.com | Self-hosted "platform-as-a-service" — automates deploying applications onto our servers. |
| Coder at coder.448.global | Browser-based developer environments where engineers work without installing anything locally. |
| Teams Bot at bot.projecteidos.com | A Microsoft Teams chatbot integration. |
Identity, secrets & networking (the trust layer)
| System | Role |
|---|---|
| Authentik at auth.448.global | The company's "single sign-on centre". Federates with Microsoft 365 and grants access to 15 downstream applications. |
| Vault at vault.448.global | The vault for every password, API key, and signing certificate the business depends on. |
| Wireguard at wg.448.global | The company's VPN for accessing internal-only systems. |
Storage, monitoring & internal productivity
The remainder are smaller internal tools — file storage (MinIO), video hosting (PE Tube), monitoring dashboards (Beszel, Portainer), automation (n8n), AI chat (Open WebUI), diagramming (Draw.io), and similar. Useful but not load-bearing for revenue.
Internal R&D and future products
Six additional applications are registered with Authentik for SSO but are not part of the operational 32-app inventory because they are at various stages between idea and active build:
| App | Stage | Notes |
|---|---|---|
| Content Connect | Paused — built with Lovable, set aside under other priorities | AI-powered marketing tool — image generation against company brand guidelines, role-based career-news / social-media drafting, scheduled posting to LinkedIn and X. Strategic intent is to fold these capabilities into the Teams Bot as an "AI employee" that can also drive actions against our APEX production apps. |
| Dot Connect (Dev + Prod) | Active build — paused last year, now reviving in gaps between higher-priority work | A visual project-management product designed to replace tools like JIRA and MS Project. Hierarchical "dots" with colour-coded status, automated time / cost / budget calculations, and visible linkages between work items. |
| Pitch Connect | Idea only — never started | Placeholder OIDC client; concept yet to be defined. |
| Risk Connect | Idea only — never started | Placeholder OIDC client; concept yet to be defined. |
| Supabase | Abandoned trial | Was evaluated as a backend for Content Connect; not adopted. The OIDC client should be removed from Authentik. |
| 448G OCI CI Prod | Aspirational | Authentik RAC provider set up to explore secured / scalable SSH-via-Vault automation. Not in production today; the underlying direction is still worth pursuing. |
Per-app detail is in apps/14-authentik.md.
3. What it costs
Cloud infrastructure (Oracle Cloud)
pie title OCI estate by tier
"Free Tier (no cost)" : 8
"Paid (Parallax ADB only today)" : 1
| Resource | Tier | Monthly |
|---|---|---|
| EIDOSDev1 — E1 Caddy VPS (Ampere A1, 1/6) | Always Free | £0 |
| EIDOSDev1 — E2 Dokploy VPS (Ampere A1, 3/18) | Always Free | £0 |
| EIDOSDev1 — E3 TnE Connect (Eidos) ADB | Always Free | £0 |
| EIDOSDev1 — E4 Fourway tenant ADB | Always Free | £0 |
| EIDOSDev1 — E5 Parallax paid ADB (2 OCPU, 20 GB, autoscale to 3×) | Paid | ~£40–£50/month (recent: Apr £45.21, Mar £45.29, Feb £40.06; January was £23.36) |
| EIDOSDev1 — Object storage (PECommon bucket + others) | Pay-per-GB | ~£0/month today (a couple of GB used; Oracle lists ~£20/TB-month, so 1 TB would be ~£20/month if we ever scale up) |
| Egress / outbound bandwidth | Pay-per-GB | ~£0/month |
| ORA448Global — O1 all-in-one VPS (Ampere A1) | Always Free | £0 |
| ORA448Global — O1 backup snapshots (weekly + monthly + yearly) | Pay-per-GB | ~£15/month |
| ORA448Global — O2 + O3 internal dev ADBs | Always Free | £0 |
Total OCI today: ~£55–£65/month (Parallax paid ADB + O1 backups; everything else zero).
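The total is the sum of the two non-zero rows in the table. A minimal Python sketch of that arithmetic, using the monthly ranges quoted above; the dictionary keys are illustrative labels rather than real resource identifiers, and the near-zero pay-per-GB rows are entered as zero:

```python
# Monthly OCI cost ranges in GBP as (low, high), taken from the table above.
# Always Free resources and the ~£0 pay-per-GB rows are entered as zero.
OCI_MONTHLY_GBP = {
    "E5 Parallax paid ADB": (40, 50),
    "O1 backup snapshots": (15, 15),
    "Always Free compute and databases": (0, 0),
    "Object storage and egress": (0, 0),
}


def monthly_range(items):
    """Sum the low ends and the high ends of each (low, high) range separately."""
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    return low, high


low, high = monthly_range(OCI_MONTHLY_GBP)
print(f"Total OCI today: ~£{low}-£{high}/month")
```

Keeping the range as a pair makes it easy to refresh the figure at each quarterly review by editing only the rows that changed.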
A noteworthy structural saving: running our own Caddy reverse-proxy on E1 (a Free Tier VPS) lets us avoid the higher-tier paid ADB plan that Oracle would otherwise charge ~£500/month for vanity-URL support across our APEX databases. The £45/month paid ADB tier is the minimum needed for Parallax's autoscaling capacity; the vanity URLs themselves cost us nothing extra because we proxy them ourselves.
Both tenancies are billed separately today: EIDOSDev1 on the Project Eidos corporate card; ORA448Global on a 448 Global card (sister company; merger in flight). Renewals and budget oversight should look at both.
Domains & email (GoDaddy)
GoDaddy handles 4 domain registrations + Microsoft 365 mail across (at least) 3 separate tenants. There are no practical alternatives without significant migration effort — GoDaddy stays as-is. Cost detail is held by leadership; not actionable from the engineering side.
SaaS subscriptions
| Provider | Used for | Current monthly |
|---|---|---|
| OpenAI API | LLM API calls from Open WebUI + Coder workspaces | ~£20–£30/month (prepaid top-ups, light usage today) |
| BFL Labs (Black Forest Labs / Flux) | Image generation API | ~£0/month ($10 added, never used) |
| Claude.ai | Anthropic subscription on a single shared seat across Coder workspaces | ~£75/month |
| Bitbucket | TnE Connect product source repo | £0/month (Free tier — capped at 5 users; near limit) |
| JIRA + Confluence (Atlassian) | Ticket tracking + wiki + n8n integration | £0/month (Free tier — capped at 10 users; near limit) |
| RocketSaas | TnE Connect marketing partner | [£X] — held by leadership |
| Brevo | Set up at some point on projecteidos.com; current use unknown (KI-040) | unknown — to be investigated |
Total SaaS today: ~£100/month, dominated by Claude.ai.
Engineering time
The largest "hidden" cost. Vishnu spends a meaningful portion of his time on infrastructure operations alongside his architecture role. Phase 2 of the plan estimates ~30% of one engineer for 6 months to bring the estate to a professional-grade standard.
Total approximate run-rate today
| Bucket | Monthly |
|---|---|
| OCI (compute + storage + backup) | ~£55–£65 |
| SaaS subscriptions | ~£100 |
| Domains + Microsoft 365 (at GoDaddy) | held by leadership |
| RocketSaas marketing partner | held by leadership |
| Engineering time | ~30% of 1 FTE on infra |
The directly-engineering-controlled cloud + SaaS run-rate is on the order of £150–£200/month — modest by the standard of an estate this size. The bulk of the cost-control story is structural: we self-host most of what would otherwise be paid SaaS.
4. Why we self-host — the cost-savings story
graph LR
SH["Self-hosted (today)"] -->|Saves| Diff[Annual delta]
SaaS["Equivalent commercial SaaS"] -->|Costs| Diff
The 32 documented apps include nine systems that would otherwise be paid SaaS subscriptions. Below is a conservative list-price comparison for each at our current usage (the savings would scale further as the team grows).
| Self-hosted system | SaaS equivalent (list price) | Approx. annual cost if outsourced | Self-hosted cost | Annual saving |
|---|---|---|---|---|
| GitLab CE | GitLab.com Premium @ $29/user/month | $29 × ~10 users × 12 = ~$3,500 | included in OCI free tier | ~£2,800 |
| Vault (HashiCorp Cloud Platform) | HCP Vault @ ~$0.20/secret-hour minimum tier | minimum ~$1,500/month = ~$18,000 | free tier OCI compute | ~£14,000 |
| Authentik | Auth0 @ $11–$25/user/month | mid-estimate $15 × ~30 users × 12 = ~$5,400 | free tier OCI compute | ~£4,300 |
| Twenty CRM × 3 instances | HubSpot Sales Hub Pro @ $90/user/month | $90 × ~6 users × 12 = ~$6,500 | included in E2 footprint | ~£5,200 |
| WordPress × 3 sites | WordPress.com Business @ £25/site/month | £25 × 3 × 12 = ~£900 | included in E2 footprint | ~£900 |
| n8n | Zapier Professional @ $73.5/month for 2k tasks | mid-estimate ~$1,800/year (scales with workflow volume) | included in O1 footprint | ~£1,400 |
| Coder dev environments | GitHub Codespaces @ $0.18/hour × ~hours used | usage-dependent ~$1,000–$3,000/year at small team scale | included in O1 footprint | ~£1,500 |
| MinIO / object storage | AWS S3 + bandwidth | usage-dependent ~£200–£500/year at current volumes | included in O1 footprint | ~£300 |
| PE Tube / video | Wistia Pro @ $99/month or Vimeo Business @ $50/month | ~$600–$1,200/year | included in O1 footprint | ~£700 |
Rough total annual saving from self-hosting: £25,000–£35,000.
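As a cross-check on the range, the point estimates in the "Annual saving" column can be summed directly. The sketch below is illustrative only: the figures are copied from the table, and the short dictionary keys are labels chosen for this example.

```python
# Point estimates from the "Annual saving" column above, in GBP.
ANNUAL_SAVING_GBP = {
    "GitLab CE": 2800,
    "Vault": 14000,
    "Authentik": 4300,
    "Twenty CRM x 3": 5200,
    "WordPress x 3": 900,
    "n8n": 1400,
    "Coder": 1500,
    "MinIO": 300,
    "PE Tube": 700,
}

total = sum(ANNUAL_SAVING_GBP.values())
largest = max(ANNUAL_SAVING_GBP, key=ANNUAL_SAVING_GBP.get)
share = ANNUAL_SAVING_GBP[largest] / total

print(f"Sum of point estimates: £{total:,}")
print(f"Largest single row: {largest} ({share:.0%} of the total)")
```

The sum of the point estimates falls inside the quoted £25,000–£35,000 range. The Vault row is by far the largest contributor, so the headline figure is most sensitive to that one list-price assumption.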
Specific comparison: self-hosted GitLab CE vs. Atlassian (Bitbucket + JIRA + Confluence)
A worked example for the planning question we're facing now — moving Bitbucket repos onto our own GitLab CE while also moving JIRA + Confluence to Atlassian Standard for 15 users.
| Scenario | Bitbucket | JIRA | Confluence | Monthly | Annual |
|---|---|---|---|---|---|
| Today (Free tiers, capped at 5 / 10 / 10 users — near limit) | £0 | £0 | £0 | £0 | £0 |
| All Atlassian Standard at 15 users (hypothetical) | ~£40 (Bitbucket Standard ~$3.30/user) | ~£105 (~$8.60/user) | ~£75 (~$6.40/user) | ~£220 | ~£2,640 |
| Planned: GitLab CE self-hosted + JIRA + Confluence Standard for 15 users | £0 (on our infra) | ~£105 | ~£75 | ~£180 | ~£2,160 |
Compared to the all-Atlassian-paid path: the planned GitLab CE move saves ~£40/month, ~£500/year, at 15 users. That is modest at this team size and grows linearly as the team scales (each additional engineer would otherwise cost ~$3.30/month on Bitbucket Standard).
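The scenario table can be reproduced from the per-user list prices it quotes. This is a rough sketch, not a pricing tool: the 0.8 USD-to-GBP rate is an assumption inferred from the conversions used elsewhere in this section (for example ~$3,500 shown as ~£2,800), and the function name is hypothetical.

```python
# Assumption: exchange rate inferred from this section's own conversions.
USD_TO_GBP = 0.8

# Per-user monthly list prices in USD, as quoted in the scenario table.
LIST_PRICE_USD_PER_USER = {"Bitbucket": 3.30, "JIRA": 8.60, "Confluence": 6.40}


def monthly_gbp(products, users, fx=USD_TO_GBP):
    """Monthly cost in GBP for the given products at a given user count."""
    return sum(LIST_PRICE_USD_PER_USER[p] * users * fx for p in products)


all_atlassian = monthly_gbp(["Bitbucket", "JIRA", "Confluence"], users=15)
planned = monthly_gbp(["JIRA", "Confluence"], users=15)
saving = all_atlassian - planned

print(f"All Atlassian Standard: ~£{all_atlassian:.0f}/month")
print(f"GitLab CE + JIRA + Confluence: ~£{planned:.0f}/month")
print(f"Saving: ~£{saving:.0f}/month, ~£{saving * 12:.0f}/year")
```

Changing `users` shows how the saving moves with headcount, which is the main variable in the planning question.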
The bigger Atlassian-side question is whether to keep JIRA + Confluence (the team prefers JIRA's project management today) or to consolidate into a different stack later — see also Dot Connect, our internal project-management product in active build.
Why move Bitbucket to GitLab CE specifically: beyond the small recurring saving, it consolidates source-control on infrastructure we own and lets us close KI-034 (the TnE Connect source-on-Bitbucket dependency).
This is the structural argument for our self-hosting choice. It does come with three trade-offs:
- Engineering time — someone has to keep the systems running. Today that is roughly 30% of one FTE on infrastructure (see §3); this is the cost we pay in lieu of subscriptions.
- Risk concentration — we own the failure modes. Phase 2 of the plan addresses the biggest of these.
- Procurement vs build — every new SaaS feature that the rest of the market is paying for, we have to either find in an open-source alternative or build ourselves.
The strategic question on self-hosting isn't "is it cheaper?" — it clearly is. It's "is the engineering effort to maintain it being well-spent vs. its alternatives?" Today the answer is yes; this should be re-tested annually.
5. Where the risk concentrates
quadrantChart
title Estate risk — criticality vs current maturity
x-axis "Low maturity" --> "High maturity"
y-axis "Low criticality" --> "Critical"
quadrant-1 "OK"
quadrant-2 "Invest first"
quadrant-3 "Fine to defer"
quadrant-4 "Polish"
"Vault": [0.2, 0.95]
"Authentik": [0.3, 0.95]
"GitLab": [0.3, 0.85]
"Parallax (UR)": [0.4, 0.95]
"Fourway tenant": [0.25, 0.85]
"WordPress sites": [0.4, 0.5]
"Twenty CRMs": [0.45, 0.4]
"Caddy proxy (E1)": [0.2, 0.7]
"Wireguard": [0.5, 0.55]
"Internal tools": [0.55, 0.25]
Six items sit in the upper-left quadrant (high criticality, low maturity): Vault, Authentik, GitLab, Parallax, the Fourway tenant, and the Caddy proxy. This is where Phase 2 invests first.
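The quadrant placement can be checked mechanically from the chart coordinates. The sketch below assumes the quadrant boundary sits at 0.5 on both axes and that points exactly on a boundary are not counted as "Invest first"; both are assumptions made for this illustration.

```python
# (maturity, criticality) pairs copied from the chart above.
POINTS = {
    "Vault": (0.2, 0.95),
    "Authentik": (0.3, 0.95),
    "GitLab": (0.3, 0.85),
    "Parallax (UR)": (0.4, 0.95),
    "Fourway tenant": (0.25, 0.85),
    "WordPress sites": (0.4, 0.5),
    "Twenty CRMs": (0.45, 0.4),
    "Caddy proxy (E1)": (0.2, 0.7),
    "Wireguard": (0.5, 0.55),
    "Internal tools": (0.55, 0.25),
}


def quadrant(maturity, criticality, mid=0.5):
    """Map a point to the chart's quadrant label (boundary at `mid` is an assumption)."""
    if criticality > mid:
        return "Invest first" if maturity < mid else "OK"
    return "Fine to defer" if maturity < mid else "Polish"


invest_first = sorted(
    name for name, (m, c) in POINTS.items() if quadrant(m, c) == "Invest first"
)
print(invest_first)
```

Re-running this after each quarterly re-score gives a quick list of what still sits in the "Invest first" quadrant.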
The five risks leadership most needs to understand
| # | Risk | Plain-language meaning | Cost if it bites | Cost to fix |
|---|---|---|---|---|
| 1 | Vault on :latest + Watchtower (KI-037) — same risk applies to Authentik right now | A surprise overnight image update breaks our identity provider. Already happened to Vault on 2026-05-01 (5 days down). Authentik is the next target. | One-day outage of every internal-app login the next time it happens. | 5 minutes of engineering time to pin the image and add the exclusion label. |
| 2 | Fourway tenant on Free Tier ADB without restorable backup (KI-019, KI-035) | Our paying customer's database has no SLA from Oracle and no rollback. A corruption event = data loss for 150 users. | Customer relationship damaged or terminated; reputational impact on the SaaS go-to-market. | ~£200–£500/month for a paid Oracle DB tier. Single largest line item in Phase 2. |
| 3 | Oracle 19c → 26ai migration required without restorable backup (KI-036) | Oracle is asking us to migrate the TnE Connect databases to a new major version. With Free Tier we cannot roll back if the migration fails. | Unrecoverable data loss in worst case. | Sequenced after (2) — the upgrade to paid tier enables a safe migration. |
| 4 | Configuration files only on the host (KI-001) | If a server needs to be rebuilt, we cannot rebuild the configuration cleanly because it isn't in source control. Already caused a real outage in April 2026. Caddyfiles, n8n workflows, custom container images all in this category. | One-time hours-to-days of hand-rebuilding under stress; ongoing exposure each time a server is touched. | Engineering time. Vault has already been fixed (this set the pattern); Authentik, Caddy, n8n still to do. |
| 5 | Single-region (UK London) for everything | A region-level Oracle Cloud incident affects every product simultaneously. | Multi-hour outage of every customer system. | Long-term project — Wave 3 of the plan. Phase 2 starts with off-region backup as the precursor. |
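Risk 1 in the table comes down to two safeguards per container: a pinned image tag and a label telling Watchtower to leave the container alone. The sketch below flags services missing either one. It assumes service definitions are available as a dictionary shaped like a parsed Compose file with labels in mapping form, and that the exclusion label meant is Watchtower's `com.centurylinklabs.watchtower.enable`. The sample services and the version tag are hypothetical.

```python
# Assumption: this is the "exclusion label" referred to in the risk table.
WATCHTOWER_LABEL = "com.centurylinklabs.watchtower.enable"


def image_tag(image):
    """Return the tag (or digest) of an image reference; 'latest' when none is given."""
    name = image.rsplit("/", 1)[-1]  # drop registry/namespace so a registry port is not read as a tag
    if "@" in name:  # digest-pinned reference
        return name.split("@", 1)[1]
    return name.split(":", 1)[1] if ":" in name else "latest"


def exposed_services(services):
    """Names of services that float on 'latest' or are not excluded from Watchtower."""
    exposed = []
    for name, spec in services.items():
        floating = image_tag(spec["image"]) == "latest"
        label = spec.get("labels", {}).get(WATCHTOWER_LABEL, "true")
        excluded = str(label).lower() == "false"
        if floating or not excluded:
            exposed.append(name)
    return sorted(exposed)


# Hypothetical sample: one service already fixed, one still exposed.
sample = {
    "vault": {"image": "hashicorp/vault:1.2.3", "labels": {WATCHTOWER_LABEL: "false"}},
    "authentik": {"image": "ghcr.io/goauthentik/server:latest"},
}
print(exposed_services(sample))
```

A check like this could run in CI once the Compose files are in Git, so that a floating tag is caught before it reaches a host.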
Other significant risks (briefer)
- No backups of identity / secrets / source-control yet (KI-017, KI-018) — the first off-host Vault backup landed on 2026-05-06. Authentik and GitLab still have none.
- No tested restore drills ever performed (KI-016) — every recovery scenario today is uncalibrated.
- Operational admin path depends on a personal-account VPN (KI-010).
- Operating-system patches not applied on the three servers (KI-023) — six months of accumulating CVEs.
- No external uptime monitoring (KI-021) — outages are detected by user complaint.
- Three separate Microsoft 365 tenants (KI-030) — admin / billing / policy fragmented; consolidation candidate.
The full operational risk register (41 entries as of 2026-05-07) is at infra/known-issues.md.
6. The improvement plan
gantt
title Phase-2 modernization waves (May–Oct 2026)
dateFormat YYYY-MM-DD
section Wave 1 — Stop the bleeding
Vault & Authentik backup :w1a, 2026-05-08, 14d
Caddyfile & critical configs in Git :w1b, 2026-05-08, 14d
Fourway ADB upgrade to paid tier :w1c, 2026-05-08, 7d
Tailscale to company account :w1d, 2026-05-15, 14d
Cert-expiry & SPF/DMARC fixes :w1e, 2026-05-15, 7d
section Wave 2 — Build the floor
Parallax pre-prod isolation :w2a, 2026-06-01, 30d
OS-patch monthly cadence :w2b, 2026-06-01, 30d
Restore-test drills :w2c, 2026-06-15, 30d
External uptime + alert delivery :w2d, 2026-06-15, 30d
section Wave 3 — Modernize
OCI Security Lists in Terraform :w3a, 2026-08-01, 60d
OCI ↔ Authentik federation :w3b, 2026-08-01, 60d
Off-OCI backup destination :w3c, 2026-08-01, 30d
M365 tenant rationalization :w3d, 2026-09-01, 60d
What changes for the business by wave
| Wave | Months | What the business gets |
|---|---|---|
| Wave 1 — Stop the bleeding | May | The most embarrassing gaps closed. Customer-facing outages of the type we had in April become much harder to repeat. Paying customer (Fourway) on infrastructure with restorable backups. |
| Wave 2 — Build the floor | June–July | Routine operations become routine: monthly maintenance windows, tested restore drills, external uptime monitoring, separated production / pre-production for our paying client. The team operates on a known cadence. |
| Wave 3 — Modernize | August–October | Multi-region resilience foundations, infrastructure-as-code, identity consolidation, off-OCI backups, Microsoft 365 tenant rationalization. These are the "professional grade" hardening moves. |
Estimated additional monthly operating cost from Phase 2
| Item | Estimate |
|---|---|
| Paid Oracle DB tier for the Fourway tenant | £200–£500/month ← single largest decision |
| Centralized VPN service (or self-hosted equivalent) | <£50/month |
| External uptime monitoring | £0 (free tier) |
| Off-region backup storage (Backblaze B2 or similar) | <£20/month |
| Engineering time | ~30% of one engineer for 6 months |
The full detailed action register (40 items, each with target outcomes and verification steps) is at overview/phase-2-roadmap.md. It is designed so the engineering team and (in time) automated tooling can pick up items, execute them, and verify completion against unambiguous criteria.
7. Continuity & access
The "Vishnu is on holiday" section. Where to find access, who knows what, and what to do in different categories of incident.
Who runs the estate
| Person | Role |
|---|---|
| Stacy Carpenter | Company owner — holds Vault unseal keys; manages the GoDaddy account |
| Adam Pitt-Stanley | Company owner — holds Vault unseal keys; manages the GoDaddy account |
| Tracey Weetman (traceyweetman@projecteidos.com) | Oracle Lead — primary Oracle relationship; admin of the EIDOSDev1 OCI tenancy; commercial escalation for anything Oracle-related |
| Bradley Leggett (BradleyLeggett@projecteidos.com) | Database Administrator — operates the Oracle databases day-to-day; holds Vault unseal keys |
| Vishnu Kant (vishnukant@projecteidos.com) | Solutions Architect — additional admin on the ORA448Global OCI tenancy (Adam is the tenancy owner); day-to-day operator of all 32 documented applications; holds Vault unseal keys |
| Sergiu Pop | IT assets + Oracle APEX consultant — looks after laptops, office networking, and the UK / India office leased-line setup; also takes on Oracle APEX development work alongside Vishnu and Bradley |
Out-of-hours / vacation cover today is informal. Phase 2 includes formalising the on-call rota.
Where credentials live
The single source of truth is vault.448.global. Every operational password, API key, and signing certificate that has been migrated lives there. The Vault is organised into four key-value mounts:
| Vault mount | What it holds |
|---|---|
| 448G_KV/ | Secrets for *.448.global apps — including the Authentik admin credentials |
| kv_pe/ | Project Eidos / shared infra credentials — Oracle ADB admin passwords, Oracle Email Delivery SMTP, GitLab root password |
| ur/ | Untapped Revenue Solutions / Parallax-specific credentials |
| fourway_kv/ | Fourway TnE Connect tenant-specific credentials |
To log into Vault, an authorised engineer browses to vault.448.global and signs in using their Microsoft 365 corporate account (the federation chain is M365 → Authentik → Vault).
Break-glass: if Microsoft 365 federation is broken, an akadmin Authentik account exists for direct sign-in; its credential is also stored in Vault.
Vault unseal-key custody. The Vault was initialised with a 5-share / 3-threshold Shamir split. Four custodians hold keys today:
| Holder | |
|---|---|
| Vishnu Kant | day-to-day Vault operator |
| Stacy Carpenter | company owner |
| Adam Pitt-Stanley | company owner |
| Bradley Leggett | DBA |
Each of the four currently holds all 5 shares. This is operationally robust (any one custodian can unseal alone if Vault has to be brought back up) but security-wise weaker than the intended threshold of 3 (KI-039). The Phase-2 recommendation is to add a fifth custodian and re-issue with one share per person.
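The gap between the current custody arrangement and the intended threshold can be made concrete by counting how many people are needed to pool enough distinct shares. This is an illustrative sketch, not Vault's own logic: share numbers 1 to 5 are arbitrary labels, and "Fifth custodian" is a placeholder for a person not yet named.

```python
from itertools import combinations


def min_custodians_to_unseal(custody, threshold):
    """Smallest number of custodians whose pooled distinct shares reach the
    threshold, or None if all of them together still fall short."""
    names = list(custody)
    for size in range(1, len(names) + 1):
        for group in combinations(names, size):
            pooled = set().union(*(custody[n] for n in group))
            if len(pooled) >= threshold:
                return size
    return None


ALL_SHARES = {1, 2, 3, 4, 5}

# Today: each of the four custodians holds every share.
today = {name: set(ALL_SHARES) for name in ("Vishnu", "Stacy", "Adam", "Bradley")}

# Phase-2 recommendation: five custodians, one share each.
recommended = {
    "Vishnu": {1},
    "Stacy": {2},
    "Adam": {3},
    "Bradley": {4},
    "Fifth custodian": {5},  # placeholder
}

print("Custodians needed today:", min_custodians_to_unseal(today, threshold=3))
print("Custodians needed after re-issue:", min_custodians_to_unseal(recommended, threshold=3))
```

The same function can be used to test any interim arrangement, for example two shares per person, before the re-issue is carried out.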
| Credential type | Vault path | Held also by |
|---|---|---|
| Authentik akadmin + secret key | 448G_KV/auth.448.global | Vishnu (offline copy) |
| GitLab root account | kv_pe/git.projecteidos.com/ | Vishnu, Bradley |
| Parallax / UR credentials (APEX workspace, ADB ADMIN) | ur/ | Vishnu, Bradley |
| Fourway tenant credentials (Azure SSO, OTP) | fourway_kv/ | Vishnu, Bradley |
| Oracle Email Delivery SMTP (shared across estate) | kv_pe/OCI-SMTP | Vishnu, Bradley |
| OCI ADB credentials (per database) | kv_pe/<ADB-NAME>-ATP-* | Vishnu, Bradley |
| Eidos tenant credentials | currently in Bradley's personal Bitwarden, not Vault (KI-007) | Bradley only — Wave 1 migration item |
Important: the OCI tenancy console for EIDOSDev1 is administered by Tracey (with Bradley as a full admin); ORA448Global is owned by Adam Pitt-Stanley with Vishnu as an additional admin. The GoDaddy account (domains + Microsoft 365 mail) credential is held by company leadership (Stacy + Adam) — not in vault.448.global.
What to do if a system is down — runbook map
| Symptom | Runbook | Likely first responders |
|---|---|---|
| Customer URLs unreachable / TLS errors | RB-001 — Caddy down | Vishnu / Bradley |
| Vault sealed / unreachable / container not starting | RB-003 — Vault recovery (Path D specifically for the container-won't-start case from May 2026) | Vishnu (with 3-of-5 unseal-key holders for full unseal) |
| Entire *.448.global estate offline | RB-002 — O1 disaster recovery | Vishnu + leadership comms |
| Customer-facing app showing data issues | Per-app doc Section 11 — currently no formal RB-004 for Parallax / TnE Connect (Wave 2) | Vishnu + Bradley |
| Anything Oracle-side | Tracey is the Oracle relationship lead; Bradley operates the databases | Bradley first, Tracey for commercial escalation |
Emergency access path (if Vishnu is unavailable)
A trusted second engineer or leadership representative needs three things to operate the estate:
- OCI console access to both EIDOSDev1 (via Tracey or Bradley) and ORA448Global (Adam Pitt-Stanley as owner, plus Vishnu as additional admin).
- Vault session — log in via vault.448.global using a corporate Microsoft 365 account that has been added to the appropriate Authentik group. Today this list is small (Vishnu, Bradley); expanding it is part of Phase 2.
- GoDaddy account for domain renewals and DNS changes. The credential is held by company leadership (Stacy + Adam); it is not in vault.448.global and not held by the engineering team. For DNS changes that need engineering input, the request goes through leadership.
Until Wave 1 closes the gaps above, the practical recommendation if Vishnu becomes unavailable:
- Bradley can operate the Oracle layer and most of EIDOSDev1.
- Tracey can engage Oracle commercially and authorise tenancy-level changes.
- An external engineer brought in on contract would need a 30-minute handover to learn the layout — this document plus the runbooks under infra/runbooks/ are designed to be that handover.
8. Decisions needed from leadership
Specific approvals or directional calls leadership can take in the next finance / planning review.
Wave 1 — small, high-leverage approvals
| # | Decision | Cost / impact | Timing |
|---|---|---|---|
| D1 | Approve paid Oracle DB tier upgrade for the Fourway tenant (KI-019, KI-035) | +£200–£500/month | Wave 1 — single largest line item |
| D2 | Approve company-paid Tailscale subscription (or self-hosted Headscale equivalent) to remove the personal-account dependency | <£50/month | Wave 1 |
| D3 | Approve a small Backblaze B2 (or equivalent) account for off-region backup storage | <£20/month | Wave 1 |
| D4 | Approve engineering capacity of ~30% of one engineer for 6 months on Phase 2 | engineering time | Wave 1 sets the pattern |
Wave 3 — strategic decisions
| # | Decision | What it changes |
|---|---|---|
| D5 | Microsoft 365 tenant rationalization (KI-030) — keep three or merge | Long-term licensing efficiency + simplified off-boarding |
| D6 | Migrate TnE Connect source from Bitbucket to GitLab (KI-034) | Consolidates source-control onto our own infra; closes a third-party dependency |
| D7 | Multi-region disaster recovery for paying-customer systems (Parallax + Fourway) | Cost of a second-region paid ADB; cross-region replication setup |
| D8 | Adopt or formalise an SLA for Fourway and other future paying customers | Drives the resilience target for everything underneath |
Strategic re-test (annual)
| # | Question |
|---|---|
| D9 | Is continuing to self-host the right decision next year? Re-test the £25–35K/year saving against the engineering effort to maintain it. |
| D10 | What is the right on-call structure as the team and customer base scale? The current informal model works at three people; it doesn't at ten. |
9. How to use this document going forward
- Quarterly leadership review — re-read §3 cost, §5 risks, §6 plan progress. Pull current numbers into the placeholders.
- Annual strategic review — re-read §4 self-hosting argument and the D9 row in §8 (re-test the decision).
- Incident retrospective — when something goes wrong, a dated entry under infra/incidents/ (sibling files in that folder) updates this document's risk section.
- New-leader briefing — print this document; pair with a 30-minute Vishnu-led walk-through of the OCI consoles and Vault. That should be enough to operate the estate without ongoing technical handholding.
The full source-of-truth for everything in this document is the rest of this repository:
- overview/landscape.md — engineering-readable one-page view of the estate
- overview/phase-2-roadmap.md — the engineering execution plan with verification steps (RM-001 to RM-043)
- overview/domains.md — DNS / registrar / email-authentication detail
- overview/external-saas.md — third-party services we depend on
- overview/risk-heatmap.md — criticality × maturity grid across the 32 apps
- infra/known-issues.md — the 41-entry operational-risk register
- infra/README.md — infrastructure inventory: cloud accounts, servers, network, proxies, TLS, backups, runbooks, incidents
- apps/01-parallax.md — first per-application doc; the rest of the 32 are sibling files (02-*.md … 32-*.md)
Everything in this document is derived from those sources and stays current as they are updated.