IT Estate — Executive Summary & Continuity Brief

Single-page briefings for each section; expand the sub-sections for detail. Last reviewed 2026-05-07.

What you can do with this document

| Need | Section |
|---|---|
| Understand what we run, at a glance | §1 The estate at a glance |
| Brief a new colleague or external stakeholder | §2 What it does for the business |
| Discuss IT spend in the next finance review | §3 What it costs + §4 Why we self-host |
| Understand current operational risks | §5 Where the risk concentrates |
| See the modernization plan | §6 The improvement plan |
| Operate the estate without Vishnu present | §7 Continuity & access |
| Approve specific decisions | §8 Decisions needed from leadership |

1. The estate at a glance

We operate 4 internet domains, supported by 3 servers and 5 managed Oracle databases, all hosted in Oracle Cloud (UK London). Domain registration and corporate email run through GoDaddy. Microsoft Azure (Entra) is also used — not for compute, but for app registrations that connect our applications to the Microsoft 365 ecosystem. These registrations span two different M365 tenants — one we administer, one we don't:

  1. In our own Project Eidos M365 tenant (we administer it):
       • Teams Bot registered via the Bot Framework so Microsoft Teams can deliver chat messages to our bot.projecteidos.com webhook.
       • Eidos TnE Connect SSO + calendar integration — App Registrations that federate Microsoft sign-in into the workforce app for our own staff, plus Microsoft Graph delegated permissions used by the leave-application feature to read / write Outlook calendar events.
  2. In Fourway's own M365 tenant (administered by Fourway, not by us):
       • Fourway TnE Connect SSO + calendar integration — equivalent App Registrations live in Fourway's tenant. Their tenant administrator manages consent and the registration itself; we hold only the client ID + secret APEX needs to complete the sign-in handshake. If Fourway revokes consent, the Fourway tenant of TnE Connect cannot sign users in until they restore it.

There are no other clouds, no other VPS providers, no on-premises hardware.

graph TB
    Net((Public Internet))
    subgraph EIDOS["OCI tenancy: EIDOSDev1 (uk-london-1)"]
        E1["E1 — Caddy reverse proxy<br/>1 vCPU / 6 GB · Free tier"]
        E2["E2 — Dokploy host<br/>3 vCPU / 18 GB · Free tier<br/>9 PE-side apps incl. WordPress"]
        E5[("E5 — Paid Oracle DB<br/>Parallax for the UR client")]
        E3[("E3 — Free DB<br/>Eidos workforce tenant")]
        E4[("E4 — Free DB<br/>Fourway workforce tenant")]
    end
    subgraph G448["OCI tenancy: ORA448Global (uk-london-1)"]
        O1["O1 — All-in-one host<br/>~15 internal-tools apps incl.<br/>identity (Authentik), secrets (Vault),<br/>storage (MinIO), monitoring"]
        O2[("O2 — Free DB · internal dev")]
        O3[("O3 — Free DB · internal dev")]
    end
    subgraph EXT["External providers"]
        GD["GoDaddy<br/>4 domain registrations<br/>+ Microsoft 365 mail"]
        AZ["Microsoft Azure / Entra<br/>App registrations:<br/>· Teams Bot (Bot Framework)<br/>· TnE Connect SSO (per tenant)<br/>· TnE Connect calendar via MS Graph"]
    end
    Net --> E1
    Net --> E2
    Net --> O1
    E2 --> E3
    E2 --> E4
    E2 --> E5
    O1 --> O2
    O1 --> O3
    GD -. DNS / mail .- Net
    AZ -. SSO + Graph .- E3
    AZ -. SSO + Graph .- E4
    AZ -. Bot Framework .- E2

38 distinct services are documented across these hosts (32 operational apps + 6 internal R&D / future products), organised into eight functional categories:

| Function | Examples | Count |
|---|---|---|
| Customer-facing products | Parallax (UR), TnE Connect tenants × 2, three CRM instances | 5 |
| Customer-facing brand sites | The three corporate WordPress sites | 3 |
| Engineering & development platform | GitLab, Dokploy, Coder, Teams Bot | 4 |
| Identity, secrets & networking | Authentik (SSO), Vault (secrets), Wireguard (VPN), Microsoft 365 | 4 |
| Storage & data | MinIO, PE Tube (video), 5 Oracle databases | 3 + 5 |
| Monitoring & operations | Portainer, Beszel, Gotify, Watchtower | 4 |
| Internal productivity | n8n, Open WebUI, Draw.io, IT Tools, internal dev DBs | 5 |
| Internal R&D / future products | Content Connect (paused AI marketing app), Dot Connect (active visual-PM build), Pitch Connect / Risk Connect (idea only), Supabase (abandoned trial), 448G OCI CI Prod (aspirational SSH-automation) | 6 |

The full inventory and per-service detail is at apps/ in this repo.


2. What it does for the business

A short business-language description of each layer.

Customer-facing products (the revenue surface)

| Product | What it does for the customer | Today |
|---|---|---|
| Parallax at parallax.projecteidos.com | Hospitality dynamic-pricing platform built for Untapped Revenue Solutions (UR). UR's hotel clients use it to set room prices based on demand, events, and competitor pricing. | ~40 users across 23 properties. Strategic relationship — possible 5-year support engagement + UR plans to resell as SaaS. |
| TnE Connect — Fourway tenant at fourway.tneconnect.app | Workforce-management SaaS (timesheets, employee records, scheduling). The Fourway client is our paying customer who helped us prove the product. | ~150 users. £5,000/year (friends-and-family pricing). |
| TnE Connect — Eidos tenant at eidos-global.tneconnect.app | Same product, used internally for our own staff. Doubles as the dogfood reference for prospects. | ~30 users. Internal use. |
| Three CRM instances at crm.eidos-global.com, in.crm.eidos-global.com, crm.tneconnect.app | Sales-pipeline + customer-account tracking. Twenty CRM (open-source). | ~6 users total. Strategic; usage scaling up. |

Brand & marketing

| Site | Purpose |
|---|---|
| eidos-global.com | Eidos Global corporate brand front door. WordPress. |
| tneconnect.app | TnE Connect product marketing site (recently relaunched with RocketSaas). |
| projecteidos.com (apex) | Project Eidos brand front; currently 301-redirects to eidos-global.com. |

Engineering & developer platform

| System | Role |
|---|---|
| GitLab at git.projecteidos.com | Source-of-truth for our in-house code, CI/CD pipelines, container registry. |
| Dokploy at platform.projecteidos.com | Self-hosted "platform-as-a-service" — automates deploying applications onto our servers. |
| Coder at coder.448.global | Browser-based developer environments where engineers work without installing anything locally. |
| Teams Bot at bot.projecteidos.com | A Microsoft Teams chatbot integration. |

Identity, secrets & networking (the trust layer)

| System | Role |
|---|---|
| Authentik at auth.448.global | The company's "single sign-on centre". Federates with Microsoft 365 and grants access to 15 downstream applications. |
| Vault at vault.448.global | The vault for every password, API key, and signing certificate the business depends on. |
| Wireguard at wg.448.global | The company's VPN for accessing internal-only systems. |

Storage, monitoring & internal productivity

The remainder are smaller internal tools — file storage (MinIO), video hosting (PE Tube), monitoring dashboards (Beszel, Portainer), automation (n8n), AI chat (Open WebUI), diagramming (Draw.io), and similar. Useful but not load-bearing for revenue.

Internal R&D and future products

Six additional applications are registered with Authentik for SSO but are not part of the operational 32-app inventory because they are at various stages between idea and active build:

| App | Stage | Notes |
|---|---|---|
| Content Connect | Paused — built with Lovable, set aside under other priorities | AI-powered marketing tool — image generation against company brand guidelines, role-based career-news / social-media drafting, scheduled posting to LinkedIn and X. Strategic intent is to fold these capabilities into the Teams Bot as an "AI employee" that can also drive actions against our APEX production apps. |
| Dot Connect (Dev + Prod) | Active build — paused last year, now reviving in gaps between higher-priority work | A visual project-management product designed to replace tools like JIRA and MS Project. Hierarchical "dots" with colour-coded status, automated time / cost / budget calculations, and visible linkages between work items. |
| Pitch Connect | Idea only — never started | Placeholder OIDC client; concept yet to be defined. |
| Risk Connect | Idea only — never started | Placeholder OIDC client; concept yet to be defined. |
| Supabase | Abandoned trial | Was evaluated as a backend for Content Connect; not adopted. The OIDC client should be removed from Authentik. |
| 448G OCI CI Prod | Aspirational | Authentik RAC provider set up to explore secured / scalable SSH-via-Vault automation. Not in production today; the underlying direction is still worth pursuing. |

Per-app detail is in apps/14-authentik.md.


3. What it costs

Cloud infrastructure (Oracle Cloud)

pie title OCI estate by tier
    "Free Tier (no cost)" : 7
    "Paid (Parallax ADB only today)" : 1

| Resource | Tier | Monthly |
|---|---|---|
| EIDOSDev1 — E1 Caddy VPS (Ampere A1, 1/6) | Always Free | £0 |
| EIDOSDev1 — E2 Dokploy VPS (Ampere A1, 3/18) | Always Free | £0 |
| EIDOSDev1 — E3 TnE Connect (Eidos) ADB | Always Free | £0 |
| EIDOSDev1 — E4 Fourway tenant ADB | Always Free | £0 |
| EIDOSDev1 — E5 Parallax paid ADB (2 OCPU, 20 GB, autoscale to 3×) | Paid | ~£40–£50/month (recent: Apr £45.21, Mar £45.29, Feb £40.06; January was £23.36) |
| EIDOSDev1 — Object storage (PECommon bucket + others) | Pay-per-GB | ~£0/month today (a couple of GB used; Oracle lists ~£20/TB-month, so 1 TB would be ~£20/month if we ever scale up) |
| Egress / outbound bandwidth | Pay-per-GB | ~£0/month |
| ORA448Global — O1 all-in-one VPS (Ampere A1) | Always Free | £0 |
| ORA448Global — O1 backup snapshots (weekly + monthly + yearly) | Pay-per-GB | ~£15/month |
| ORA448Global — O2 + O3 internal dev ADBs | Always Free | £0 |

Total OCI today: ~£55–£65/month (Parallax paid ADB + O1 backups; everything else zero).

A noteworthy structural saving: running our own Caddy reverse-proxy on E1 (a Free Tier VPS) lets us avoid the higher-tier paid ADB plan that Oracle would otherwise charge ~£500/month for vanity-URL support across our APEX databases. The £45/month paid ADB tier is the minimum needed for Parallax's autoscaling capacity; the vanity URLs themselves cost us nothing extra because we proxy them ourselves.

Both tenancies are billed separately today: EIDOSDev1 on the Project Eidos corporate card; ORA448Global on a 448 Global card (sister company; merger in flight). Renewals and budget oversight should look at both.

Domains & email (GoDaddy)

GoDaddy handles 4 domain registrations + Microsoft 365 mail across (at least) 3 separate tenants. There are no practical alternatives without significant migration effort — GoDaddy stays as-is. Cost detail is held by leadership; not actionable from the engineering side.

SaaS subscriptions

| Provider | Used for | Current monthly |
|---|---|---|
| OpenAI API | LLM API calls from Open WebUI + Coder workspaces | ~£20–£30/month (prepaid top-ups, light usage today) |
| BFL Labs (Black Forest Labs / Flux) | Image generation API | ~£0/month ($10 added, never used) |
| Claude.ai | Anthropic subscription on a single shared seat across Coder workspaces | ~£75/month |
| Bitbucket | TnE Connect product source repo | £0/month (Free tier — capped at 5 users; near limit) |
| JIRA + Confluence (Atlassian) | Ticket tracking + wiki + n8n integration | £0/month (Free tier — capped at 10 users; near limit) |
| RocketSaas | TnE Connect marketing partner | [£X] — held by leadership |
| Brevo | Set up at some point on projecteidos.com; current use unknown (KI-040) | unknown — to be investigated |

Total SaaS today: ~£100/month, dominated by Claude.ai.

Engineering time

The largest "hidden" cost. Vishnu spends a meaningful portion of his time on infrastructure operations alongside his architecture role. Phase 2 of the plan estimates ~30% of one engineer for 6 months to bring the estate to professional-grade.

Total approximate run-rate today

| Bucket | Monthly |
|---|---|
| OCI (compute + storage + backup) | ~£55–£65 |
| SaaS subscriptions | ~£100 |
| Domains + Microsoft 365 (at GoDaddy) | held by leadership |
| RocketSaas marketing partner | held by leadership |
| Engineering time | ~30% of 1 FTE on infra |

The directly-engineering-controlled cloud + SaaS run-rate is on the order of £150–£200/month — modest by the standard of an estate this size. The bulk of the cost-control story is structural: we self-host most of what would otherwise be paid SaaS.
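The run-rate figure can be reproduced from the rows that carry a stated amount. The following is a minimal sketch, assuming the low and high bounds quoted in the OCI and SaaS tables above; the dictionary and function names are illustrative only, and items whose cost is held by leadership or unknown are left out.

```python
# Monthly cost ranges in GBP as (low, high), copied from the tables in this section.
# GoDaddy, RocketSaas and Brevo are omitted because no figure is stated for them.
oci = {
    "E5 Parallax paid ADB": (40, 50),
    "O1 backup snapshots": (15, 15),
}
saas = {
    "OpenAI API": (20, 30),
    "Claude.ai": (75, 75),
}


def total(items):
    """Sum a mapping of (low, high) ranges into a single (low, high) pair."""
    lows, highs = zip(*items.values())
    return sum(lows), sum(highs)


oci_total = total(oci)
saas_total = total(saas)
run_rate = (oci_total[0] + saas_total[0], oci_total[1] + saas_total[1])

print("OCI:", oci_total)
print("SaaS:", saas_total)
print("Engineering-controlled run-rate:", run_rate)
```

With only the stated figures the sum lands at the lower end of the quoted band, which leaves headroom for the unknown items.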


4. Why we self-host — the cost-savings story

graph LR
    SH["Self-hosted (today)"] -->|Saves| Diff[Annual delta]
    SaaS["Equivalent commercial SaaS"] -->|Costs| Diff

The 32 documented apps include nine systems that would otherwise be paid SaaS subscriptions. Below is a conservative list-price comparison for each at our current usage (the savings would scale further as the team grows).

| Self-hosted system | SaaS equivalent (list price) | Approx. annual cost if outsourced | Self-hosted cost | Annual saving |
|---|---|---|---|---|
| GitLab CE | GitLab.com Premium @ $29/user/month | $29 × ~10 users × 12 = ~$3,500 | included in OCI free tier | ~£2,800 |
| Vault | HCP Vault (HashiCorp Cloud Platform) @ ~$0.20/secret-hour, minimum tier | minimum ~$1,500/month = ~$18,000 | free tier OCI compute | ~£14,000 |
| Authentik | Auth0 @ $11–$25/user/month | mid-estimate $15 × ~30 users × 12 = ~$5,400 | free tier OCI compute | ~£4,300 |
| Twenty CRM × 3 instances | HubSpot Sales Hub Pro @ $90/user/month | $90 × ~6 users × 12 = ~$6,500 | included in E2 footprint | ~£5,200 |
| WordPress × 3 sites | WordPress.com Business @ £25/site/month | £25 × 3 × 12 = ~£900 | included in E2 footprint | ~£900 |
| n8n | Zapier Professional @ $73.50/month for 2k tasks | mid-estimate ~$1,800/year (scales with workflow volume) | included in O1 footprint | ~£1,400 |
| Coder dev environments | GitHub Codespaces @ $0.18/hour × ~hours used | usage-dependent ~$1,000–$3,000/year at small team scale | included in O1 footprint | ~£1,500 |
| MinIO / object storage | AWS S3 + bandwidth | usage-dependent ~£200–£500/year at current volumes | included in O1 footprint | ~£300 |
| PE Tube / video | Wistia Pro @ $99/month or Vimeo Business @ $50/month | ~$600–$1,200/year | included in O1 footprint | ~£700 |

Rough total annual saving from self-hosting: £25,000–£35,000.
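The quoted range can be checked by adding up the "Annual saving" column. This is a small sketch that uses only the point estimates from the table above; the variable names are illustrative.

```python
# Annual saving per self-hosted system in GBP, copied from the table above.
annual_saving_gbp = {
    "GitLab CE": 2800,
    "Vault": 14000,
    "Authentik": 4300,
    "Twenty CRM x 3": 5200,
    "WordPress x 3": 900,
    "n8n": 1400,
    "Coder": 1500,
    "MinIO": 300,
    "PE Tube": 700,
}

total_saving = sum(annual_saving_gbp.values())
largest = max(annual_saving_gbp, key=annual_saving_gbp.get)
largest_share = annual_saving_gbp[largest] / total_saving

print(f"Total annual saving: £{total_saving:,}")
print(f"Largest single line: {largest} ({largest_share:.0%} of the total)")
```

The point estimates sum to a figure inside the £25,000–£35,000 band, and one line accounts for a large share of it, so the overall number is sensitive to how that comparison is priced.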

Specific comparison: self-hosted GitLab CE vs. Atlassian (Bitbucket + JIRA + Confluence)

A worked example for the planning question we're facing now — moving Bitbucket repos onto our own GitLab CE while also moving JIRA + Confluence to Atlassian Standard for 15 users.

| Scenario | Bitbucket | JIRA | Confluence | Monthly | Annual |
|---|---|---|---|---|---|
| Today (Free tiers, capped at 5 / 10 / 10 users — near limit) | £0 | £0 | £0 | £0 | £0 |
| All Atlassian Standard at 15 users (hypothetical) | ~£40 (Bitbucket Standard ~$3.30/user) | ~£105 (~$8.60/user) | ~£75 (~$6.40/user) | ~£220 | ~£2,640 |
| Planned: GitLab CE self-hosted + JIRA + Confluence Standard for 15 users | £0 (on our infra) | ~£105 | ~£75 | ~£180 | ~£2,160 |

Compared to the all-Atlassian-paid path, the planned GitLab CE move saves ~£40/month (~£500/year) at 15 users. That is modest at this team size and grows with headcount: every new engineer would otherwise add ~$3.30/month (roughly £2.70) on Bitbucket Standard.
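The scenario arithmetic can be laid out explicitly. A minimal sketch, assuming the rounded monthly GBP figures from the scenario table; the scenario keys and the `annual` helper are illustrative names.

```python
# Monthly cost per product in GBP at 15 users, copied from the scenario table.
scenarios = {
    "today_free_tiers": {"Bitbucket": 0, "JIRA": 0, "Confluence": 0},
    "all_atlassian_standard": {"Bitbucket": 40, "JIRA": 105, "Confluence": 75},
    "planned_gitlab_ce": {"Bitbucket": 0, "JIRA": 105, "Confluence": 75},
}


def annual(name):
    """Annual cost of a scenario: the monthly total times twelve."""
    return 12 * sum(scenarios[name].values())


saving = annual("all_atlassian_standard") - annual("planned_gitlab_ce")

for name in scenarios:
    print(f"{name}: £{annual(name):,}/year")
print(f"Saving from self-hosting source control: £{saving}/year")
```

The whole difference between the two paid scenarios comes from the Bitbucket column, which is why the saving scales with the number of engineers rather than with JIRA or Confluence usage.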

The bigger Atlassian-side question is whether to keep JIRA + Confluence (the team prefers JIRA's project management today) or to consolidate into a different stack later — see also Dot Connect, our internal project-management product in active build.

Why move Bitbucket to GitLab CE specifically: beyond the small recurring saving, it consolidates source-control on infrastructure we own and lets us close KI-034 (the TnE Connect source-on-Bitbucket dependency).

This is the structural argument for our self-hosting choice. It does come with three trade-offs:

  1. Engineering time — someone has to keep the systems running. Today that is a fraction of one FTE (§3 puts infrastructure work at ~30% of 1 FTE); this is the cost we pay in lieu of subscriptions.
  2. Risk concentration — we own the failure modes. Phase 2 of the plan addresses the biggest of these.
  3. Procurement vs build — every new SaaS feature that the rest of the market is paying for, we have to either find in an open-source alternative or build ourselves.

The strategic question on self-hosting isn't "is it cheaper?" — it clearly is. It's "is the engineering effort to maintain it being well-spent vs. its alternatives?" Today the answer is yes; this should be re-tested annually.


5. Where the risk concentrates

quadrantChart
    title Estate risk — criticality vs current maturity
    x-axis "Low maturity" --> "High maturity"
    y-axis "Low criticality" --> "Critical"
    quadrant-1 "OK"
    quadrant-2 "Invest first"
    quadrant-3 "Fine to defer"
    quadrant-4 "Polish"
    "Vault": [0.2, 0.95]
    "Authentik": [0.3, 0.95]
    "GitLab": [0.3, 0.85]
    "Parallax (UR)": [0.4, 0.95]
    "Fourway tenant": [0.25, 0.85]
    "WordPress sites": [0.4, 0.5]
    "Twenty CRMs": [0.45, 0.4]
    "Caddy proxy (E1)": [0.2, 0.7]
    "Wireguard": [0.5, 0.55]
    "Internal tools": [0.55, 0.25]

The six items in the upper-left quadrant (high criticality, low maturity) are where Phase 2 invests first: Vault, Authentik, GitLab, Parallax, the Fourway tenant and the Caddy proxy.
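Quadrant membership can be read off the chart coordinates mechanically. The sketch below assumes the 0.5 midlines are the quadrant boundaries and that a point sitting exactly on a midline does not count as low maturity or high criticality; the `quadrant` helper is an illustrative name.

```python
# (maturity, criticality) coordinates copied from the chart above.
points = {
    "Vault": (0.2, 0.95),
    "Authentik": (0.3, 0.95),
    "GitLab": (0.3, 0.85),
    "Parallax (UR)": (0.4, 0.95),
    "Fourway tenant": (0.25, 0.85),
    "WordPress sites": (0.4, 0.5),
    "Twenty CRMs": (0.45, 0.4),
    "Caddy proxy (E1)": (0.2, 0.7),
    "Wireguard": (0.5, 0.55),
    "Internal tools": (0.55, 0.25),
}


def quadrant(maturity, criticality, mid=0.5):
    """Return the chart's label for a point.

    Assumption: values exactly on a midline are treated as neither
    low maturity nor high criticality.
    """
    low_maturity = maturity < mid
    critical = criticality > mid
    if critical:
        return "Invest first" if low_maturity else "OK"
    return "Fine to defer" if low_maturity else "Polish"


# Items to invest in first, least mature first.
invest_first = sorted(
    (name for name, (m, c) in points.items() if quadrant(m, c) == "Invest first"),
    key=lambda name: points[name][0],
)
print(invest_first)
```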

The five risks leadership most needs to understand

| # | Risk | Plain-language meaning | Cost if it bites | Cost to fix |
|---|---|---|---|---|
| 1 | Vault on :latest + Watchtower (KI-037) — same risk applies to Authentik right now | A surprise overnight image update breaks our identity provider. Already happened to Vault on 2026-05-01 (5 days down). Authentik is the next target. | One-day outage of every internal-app login the next time it happens. | 5 minutes of engineering time to pin the image and add the exclusion label. |
| 2 | Fourway tenant on Free Tier ADB without restorable backup (KI-019, KI-035) | Our paying customer's database has no SLA from Oracle and no rollback. A corruption event = data loss for 150 users. | Customer relationship damaged or terminated; reputational impact on the SaaS go-to-market. | ~£200–£500/month for a paid Oracle DB tier. Single largest line item in Phase 2. |
| 3 | Oracle 19c → 26ai migration required without restorable backup (KI-036) | Oracle is asking us to migrate the TnE Connect databases to a new major version. With Free Tier we cannot roll back if the migration fails. | Unrecoverable data loss in worst case. | Sequenced after (2) — the upgrade to paid tier enables a safe migration. |
| 4 | Configuration files only on the host (KI-001) | If a server needs to be rebuilt, we cannot rebuild the configuration cleanly because it isn't in source control. Already caused a real outage in April 2026. Caddyfiles, n8n workflows, custom container images all in this category. | One-time hours-to-days of hand-rebuilding under stress; ongoing exposure each time a server is touched. | Engineering time. Vault has already been fixed (this set the pattern); Authentik, Caddy, n8n still to do. |
| 5 | Single-region (UK London) for everything | A region-level Oracle Cloud incident affects every product simultaneously. | Multi-hour outage of every customer system. | Long-term project — Wave 3 of the plan. Phase 2 starts with off-region backup as the precursor. |
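The fix for risk 1 (pin the image, add the exclusion label) lends itself to an automated check. This is a sketch under stated assumptions: the `services` data, the image tags and the helper names are hypothetical, and services are modelled as plain dictionaries rather than read from a real compose file. `com.centurylinklabs.watchtower.enable` is the label Watchtower reads to include or exclude a container.

```python
WATCHTOWER_LABEL = "com.centurylinklabs.watchtower.enable"


def is_pinned(image: str) -> bool:
    """True if the image reference has a digest or an explicit tag other than 'latest'."""
    if "@sha256:" in image:
        return True
    # Look only at the last path component so a registry port is not mistaken for a tag.
    name = image.rsplit("/", 1)[-1]
    if ":" not in name:
        return False  # no tag means an implicit 'latest'
    return name.rsplit(":", 1)[1] != "latest"


def audit(services: dict) -> list:
    """List findings for services that are unpinned or not excluded from Watchtower."""
    findings = []
    for name, spec in services.items():
        if not is_pinned(spec["image"]):
            findings.append(f"{name}: image not pinned")
        if spec.get("labels", {}).get(WATCHTOWER_LABEL) != "false":
            findings.append(f"{name}: watchtower exclusion label missing")
    return findings


# Hypothetical service definitions for illustration; the version tag is made up.
services = {
    "vault": {
        "image": "hashicorp/vault:1.17.2",
        "labels": {WATCHTOWER_LABEL: "false"},
    },
    "authentik": {
        "image": "ghcr.io/goauthentik/server:latest",
        "labels": {},
    },
}

for finding in audit(services):
    print(finding)
```

A check of this kind could run in CI against the committed compose files once they are in source control, so that an unpinned critical service is caught before Watchtower acts on it.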

Other significant risks (briefer)

  • Backups of identity / secrets / source-control are incomplete (KI-017, KI-018): the first off-host Vault backup landed on 2026-05-06, but Authentik and GitLab still have none.
  • No tested restore drills ever performed (KI-016) — every recovery scenario today is uncalibrated.
  • Operational admin path depends on a personal-account VPN (KI-010).
  • Operating-system patches not applied on the three servers (KI-023) — six months of accumulating CVEs.
  • No external uptime monitoring (KI-021) — outages are detected by user complaint.
  • Three separate Microsoft 365 tenants (KI-030) — admin / billing / policy fragmented; consolidation candidate.

The full operational risk register (41 entries as of 2026-05-07) is at infra/known-issues.md.


6. The improvement plan

gantt
    title Phase-2 modernization waves (May–Oct 2026)
    dateFormat  YYYY-MM-DD
    section Wave 1 — Stop the bleeding
    Vault & Authentik backup           :w1a, 2026-05-08, 14d
    Caddyfile & critical configs in Git :w1b, 2026-05-08, 14d
    Fourway ADB upgrade to paid tier   :w1c, 2026-05-08, 7d
    Tailscale to company account       :w1d, 2026-05-15, 14d
    Cert-expiry & SPF/DMARC fixes      :w1e, 2026-05-15, 7d
    section Wave 2 — Build the floor
    Parallax pre-prod isolation        :w2a, 2026-06-01, 30d
    OS-patch monthly cadence           :w2b, 2026-06-01, 30d
    Restore-test drills                :w2c, 2026-06-15, 30d
    External uptime + alert delivery   :w2d, 2026-06-15, 30d
    section Wave 3 — Modernize
    OCI Security Lists in Terraform    :w3a, 2026-08-01, 60d
    OCI ↔ Authentik federation         :w3b, 2026-08-01, 60d
    Off-OCI backup destination         :w3c, 2026-08-01, 30d
    M365 tenant rationalization        :w3d, 2026-09-01, 60d

What changes for the business by wave

| Wave | Months | What the business gets |
|---|---|---|
| Wave 1 — Stop the bleeding | May | The most embarrassing gaps closed. Customer-facing outages of the type we had in April become much harder to repeat. Paying customer (Fourway) on infrastructure with restorable backups. |
| Wave 2 — Build the floor | June–July | Routine operations become routine: monthly maintenance windows, tested restore drills, external uptime monitoring, separated production / pre-production for our paying client. The team operates on a known cadence. |
| Wave 3 — Modernize | August–October | Multi-region resilience foundations, infrastructure-as-code, identity consolidation, off-OCI backups, Microsoft 365 tenant rationalization. These are the "professional grade" hardening moves. |
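The months in the table follow from the start dates and durations in the Gantt chart. A minimal sketch, assuming each task ends at its start date plus its duration in calendar days (no weekends or holidays excluded); the `wave_span` helper is an illustrative name.

```python
from datetime import date, timedelta

# (start date, duration in days) for each task, copied from the Gantt chart.
waves = {
    "Wave 1": [("2026-05-08", 14), ("2026-05-08", 14), ("2026-05-08", 7),
               ("2026-05-15", 14), ("2026-05-15", 7)],
    "Wave 2": [("2026-06-01", 30), ("2026-06-01", 30),
               ("2026-06-15", 30), ("2026-06-15", 30)],
    "Wave 3": [("2026-08-01", 60), ("2026-08-01", 60),
               ("2026-08-01", 30), ("2026-09-01", 60)],
}


def wave_span(tasks):
    """Return (earliest start, latest end) across a wave's tasks."""
    starts = [date.fromisoformat(start) for start, _ in tasks]
    ends = [date.fromisoformat(start) + timedelta(days=days) for start, days in tasks]
    return min(starts), max(ends)


for name, tasks in waves.items():
    first, last = wave_span(tasks)
    print(f"{name}: {first.isoformat()} to {last.isoformat()}")
```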

Estimated additional monthly operating cost from Phase 2

| Item | Estimate |
|---|---|
| Paid Oracle DB tier for the Fourway tenant | £200–£500/month ← single largest decision |
| Centralized VPN service (or self-hosted equivalent) | <£50/month |
| External uptime monitoring | £0 (free tier) |
| Off-region backup storage (Backblaze B2 or similar) | <£20/month |
| Engineering time | ~30% of one engineer for 6 months |

The full detailed action register (40 items, each with target outcomes and verification steps) is at overview/phase-2-roadmap.md. It is designed so the engineering team and (in time) automated tooling can pick up items, execute them, and verify completion against unambiguous criteria.


7. Continuity & access

The "Vishnu is on holiday" section. Where to find access, who knows what, and what to do in different categories of incident.

Who runs the estate

| Person | Role |
|---|---|
| Stacy Carpenter | Company owner — holds Vault unseal keys; manages the GoDaddy account |
| Adam Pitt-Stanley | Company owner — holds Vault unseal keys; manages the GoDaddy account |
| Tracey Weetman (traceyweetman@projecteidos.com) | Oracle Lead — primary Oracle relationship; admin of the EIDOSDev1 OCI tenancy; commercial escalation for anything Oracle-related |
| Bradley Leggett (BradleyLeggett@projecteidos.com) | Database Administrator — operates the Oracle databases day-to-day; holds Vault unseal keys |
| Vishnu Kant (vishnukant@projecteidos.com) | Solutions Architect — additional admin on the ORA448Global OCI tenancy (Adam is the tenancy owner); day-to-day operator of all 32 documented applications; holds Vault unseal keys |
| Sergiu Pop | IT assets + Oracle APEX consultant — looks after laptops, office networking, and the UK / India office leased-line setup; also takes on Oracle APEX development work alongside Vishnu and Bradley |

Out-of-hours / vacation cover today is informal. Phase 2 includes formalising the on-call rota.

Where credentials live

The single source of truth is vault.448.global. Every operational password, API key, and signing certificate that has been migrated lives there. The Vault is organised into four key-value mounts:

| Vault mount | What it holds |
|---|---|
| 448G_KV/ | Secrets for *.448.global apps — including the Authentik admin credentials |
| kv_pe/ | Project Eidos / shared infra credentials — Oracle ADB admin passwords, Oracle Email Delivery SMTP, GitLab root password |
| ur/ | Untapped Revenue Solutions / Parallax-specific credentials |
| fourway_kv/ | Fourway TnE Connect tenant-specific credentials |

To log into Vault, an authorised engineer browses to vault.448.global and signs in using their Microsoft 365 corporate account (the federation chain is M365 → Authentik → Vault).

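Break-glass: if Microsoft 365 federation is broken, an akadmin Authentik account exists for direct sign-in. Its credential is stored in Vault (448G_KV/auth.448.global), and Vishnu also holds an offline copy, which matters because Vault sign-in itself depends on the M365 → Authentik chain.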

Vault unseal-key custody. The Vault was initialised with a 5-share / 3-threshold Shamir split. Four custodians hold keys today:

| Holder | Role |
|---|---|
| Vishnu Kant | day-to-day Vault operator |
| Stacy Carpenter | company owner |
| Adam Pitt-Stanley | company owner |
| Bradley Leggett | DBA |

Each of the four currently holds all 5 shares. This is operationally robust (any one custodian can unseal alone if Vault has to be brought back up) but security-wise weaker than the intended threshold of 3 (KI-039). The Phase-2 recommendation is to add a fifth custodian and re-issue with one share per person.
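To make the 5-share / 3-threshold idea concrete, here is a toy Shamir split over a prime field. It is an illustration of the scheme only, not Vault's implementation or key format, and all names and the demo secret are hypothetical. It shows that any three shares recover the secret while two do not, which is why one person holding all five shares removes the threshold protection.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime, large enough for a demo secret


def split(secret, shares=5, threshold=3):
    """Split `secret` into `shares` points on a random polynomial of degree threshold-1."""
    # Non-zero random coefficients keep the polynomial at exactly degree threshold-1.
    coeffs = [secret] + [secrets.randbelow(PRIME - 1) + 1 for _ in range(threshold - 1)]

    def evaluate(x):
        acc = 0
        for coeff in reversed(coeffs):  # Horner's rule
            acc = (acc * x + coeff) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def combine(points):
    """Recover the polynomial's value at x = 0 by Lagrange interpolation."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total


demo_secret = 123456789  # stand-in for a master key
all_shares = split(demo_secret)

print("any 3 shares recover it:", combine(all_shares[:3]) == demo_secret)
print("a different 3 also work:", combine(all_shares[2:]) == demo_secret)
print("2 shares recover it:", combine(all_shares[:2]) == demo_secret)
```

With one share per custodian, an unseal would need three people to cooperate; with every custodian holding all five, any single custodian satisfies the threshold alone.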

| Credential type | Vault path | Held also by |
|---|---|---|
| Authentik akadmin + secret key | 448G_KV/auth.448.global | Vishnu (offline copy) |
| GitLab root account | kv_pe/git.projecteidos.com/ | Vishnu, Bradley |
| Parallax / UR credentials (APEX workspace, ADB ADMIN) | ur/ | Vishnu, Bradley |
| Fourway tenant credentials (Azure SSO, OTP) | fourway_kv/ | Vishnu, Bradley |
| Oracle Email Delivery SMTP (shared across estate) | kv_pe/OCI-SMTP | Vishnu, Bradley |
| OCI ADB credentials (per database) | kv_pe/<ADB-NAME>-ATP-* | Vishnu, Bradley |
| Eidos tenant credentials | currently in Bradley's personal Bitwarden, not Vault (KI-007) | Bradley only — Wave 1 migration item |

Important: the OCI tenancy console for EIDOSDev1 is administered by Tracey (with Bradley as a full admin); ORA448Global is owned by Adam Pitt-Stanley with Vishnu as an additional admin. The GoDaddy account (domains + Microsoft 365 mail) credential is held by company leadership (Stacy + Adam) — not in vault.448.global.

What to do if a system is down — runbook map

| Symptom | Runbook | Likely first responders |
|---|---|---|
| Customer URLs unreachable / TLS errors | RB-001 — Caddy down | Vishnu / Bradley |
| Vault sealed / unreachable / container not starting | RB-003 — Vault recovery (Path D specifically for the container-won't-start case from May 2026) | Vishnu (a full unseal needs 3 of the 5 key shares; today each custodian holds all five) |
| Entire *.448.global estate offline | RB-002 — O1 disaster recovery | Vishnu + leadership comms |
| Customer-facing app showing data issues | Per-app doc Section 11 — currently no formal RB-004 for Parallax / TnE Connect (Wave 2) | Vishnu + Bradley |
| Anything Oracle-side | Tracey is the Oracle relationship lead; Bradley operates the databases | Bradley first, Tracey for commercial escalation |

Emergency access path (if Vishnu is unavailable)

A trusted second engineer or leadership representative needs three things to operate the estate:

  1. OCI console access to both EIDOSDev1 (via Tracey or Bradley) and ORA448Global (Adam Pitt-Stanley as owner, plus Vishnu as additional admin).
  2. Vault session — log in via vault.448.global using a corporate Microsoft 365 account that has been added to the appropriate Authentik group. Today this list is small (Vishnu, Bradley); expanding it is part of Phase 2.
  3. GoDaddy account for domain renewals and DNS changes. The credential is held by company leadership (Stacy + Adam); it is not in vault.448.global and not held by the engineering team. For DNS changes that need engineering input, the request goes through leadership.

Until Wave 1 closes the gaps above, the practical recommendation if Vishnu becomes unavailable:

  • Bradley can operate the Oracle layer and most of EIDOSDev1.
  • Tracey can engage Oracle commercially and authorise tenancy-level changes.
  • An external engineer brought in on contract would need a 30-minute handover to learn the layout; this document plus the runbooks under infra/runbooks/ is designed to be that handover.

8. Decisions needed from leadership

Specific approvals or directional calls leadership can take in the next finance / planning review.

Wave 1 — small, high-leverage approvals

| # | Decision | Cost / impact | Timing |
|---|---|---|---|
| D1 | Approve paid Oracle DB tier upgrade for the Fourway tenant (KI-019, KI-035) | +£200–£500/month | Wave 1 — single largest line item |
| D2 | Approve company-paid Tailscale subscription (or self-hosted Headscale equivalent) to remove the personal-account dependency | <£50/month | Wave 1 |
| D3 | Approve a small Backblaze B2 (or equivalent) account for off-region backup storage | <£20/month | Wave 1 |
| D4 | Approve engineering capacity of ~30% of one engineer for 6 months on Phase 2 | engineering time | Wave 1 sets the pattern |

Wave 3 — strategic decisions

| # | Decision | What it changes |
|---|---|---|
| D5 | Microsoft 365 tenant rationalization (KI-030) — keep three or merge | Long-term licensing efficiency + simplified off-boarding |
| D6 | Migrate TnE Connect source from Bitbucket to GitLab (KI-034) | Consolidates source-control onto our own infra; closes a third-party dependency |
| D7 | Multi-region disaster recovery for paying-customer systems (Parallax + Fourway) | Cost of a second-region paid ADB; cross-region replication setup |
| D8 | Adopt or formalise an SLA for Fourway and other future paying customers | Drives the resilience target for everything underneath |

Strategic re-test (annual)

| # | Question |
|---|---|
| D9 | Is continuing to self-host the right decision next year? Re-test the £25–35K/year saving against the engineering effort to maintain it. |
| D10 | What is the right on-call structure as the team and customer base scale? The current informal model works at three people; it doesn't at ten. |

9. How to use this document going forward

  1. Quarterly leadership review — re-read §3 cost, §5 risks, §6 plan progress. Pull current numbers into the placeholders.
  2. Annual strategic review — re-read §4 self-hosting argument and the D9 row in §8 (re-test the decision).
  3. Incident retrospective — when something goes wrong, add a dated entry under infra/incidents/ (alongside the existing files in that folder) and update this document's risk section to match.
  4. New-leader briefing — print this document; pair with a 30-minute Vishnu-led walk-through of the OCI consoles and Vault. That should be enough to operate the estate without ongoing technical handholding.

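The full source of truth for everything in this document is the rest of this repository, notably apps/, infra/known-issues.md, infra/runbooks/, infra/incidents/ and overview/phase-2-roadmap.md.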

Everything in this document is derived from those sources and stays current as they are updated.