# Security Architecture Review Triage v1

Status: current-state triage and production-readiness backlog source.

Date: 2026-06-03

Owner: `SEC-ARCH-REVIEW-TRIAGE-001`

## Purpose

Classify the security feedback in `~/Downloads/sec-review` against the current
GPUaaS architecture instead of treating the older `Core42_GPUaaS_Cloud.pdf`
whitepaper as the source of truth.

The review feedback is useful, but the reviewed PDF is stale. It predates the
Platform Shared Services Model, Fairway queue model, IAM department hierarchy,
current route namespace cleanup, Pomerium/managed-ingress direction, and several
runtime trust changes. This triage separates:

1. criticism that is correct against the stale PDF;
2. gaps that remain correct against the current repo;
3. stale assumptions that should not create implementation tasks;
4. action items needed before production or regulated-customer claims.

## Source Artifacts

Reviewed files:

1. `~/Downloads/sec-review/Core42_GPUaaS_Cloud.pdf`
2. `~/Downloads/sec-review/GPUaaS Cloud Security Architecture Review.docx`
3. `~/Downloads/sec-review/GPUaaS_Architecture_Review_Findings.docx`

Current repo anchors:

1. `doc/architecture/platform-foundation/Platform_Shared_Services_Model_v2.md`
2. `doc/architecture/platform-foundation/Platform_Shared_Services_Completion_Roadmap_v1.md`
3. `doc/architecture/IAM_MFA_Policy_and_Keycloak_Enforcement_v1.md`
4. `doc/architecture/Platform_IAM_Model_v1.md`
5. `doc/architecture/Node_Agent_Spec.md`
6. `doc/architecture/PKI_Spec.md`
7. `doc/architecture/MAAS_Bare_Metal_Lifecycle_v1.md`
8. `doc/architecture/Allocation_Capacity_Shapes_and_GPU_Slices_v1.md`
9. `doc/operations/Enterprise_Readiness_Gap_Work_Plan_v1.md`
10. `doc/architecture/Production_Deployment_Readiness_v1.md`

## Executive Classification

The security review is directionally correct, but its evidence target is stale.
The PDF makes claims the current platform should not repeat as current state:

1. FedRAMP-ready posture;
2. PCI-DSS alignment without CDE scoping;
3. HIPAA readiness without Security Rule risk analysis;
4. cryptographic audit immutability;
5. PostgreSQL RLS as the primary tenant isolation boundary;
6. software Vault and crypto posture sufficient for regulated/FIPS use;
7. optional MFA as adequate for privileged access.

Current GPUaaS should replace those broad claims with a current-state security
architecture package and a tracked production-readiness backlog.

## Findings Disposition

| Review finding area | Correct against PDF? | Current repo disposition | Action |
|---|---:|---|---|
| FedRAMP-ready claim unsupported | Yes | Do not claim FedRAMP-ready as current state. No 3PAO RAR, SSP, FIPS boundary, or FedRAMP authorization package exists. | Remove or qualify stale claims; create compliance posture matrix. |
| PCI-DSS claim unsupported | Yes | Current platform uses Stripe/provider payments; PCI scope must be explicitly declared in or out. No CDE package exists. | Create PCI scope decision and payment data-flow evidence. |
| HIPAA claim unsupported | Yes | No HIPAA Security Rule risk analysis, ePHI data classification, BAA package, or breach-notification matrix exists. | Treat HIPAA as future regulated-customer readiness, not current claim. |
| MFA optional | Yes | Architecture decision now requires MFA for the platform admin/ops first slice, but implementation is pending. | Continue `IAM-MFA-*` tasks; do not add a duplicate MFA epic. |
| Audit immutability overclaimed | Yes | Current audit/ledger append-only posture is useful but not cryptographic/WORM immutability. | Add tamper-evident audit and WORM/Object Lock design task. |
| Tenant isolation relies only on RLS | Yes for PDF | Current architecture is not accurately described by the PDF; repo emphasizes IAM/project scope, domain boundaries, policy, and product/platform contracts. Isolation evidence is still incomplete. | Create tenant-isolation evidence package spanning API authz, DB constraints, Redis/NATS/Temporal/cache, and negative tests. |
| GPU workload isolation unspecified | Yes | Current docs now distinguish bare metal, VM slices, PCI passthrough, MAAS full reimage, and user-revoke/default isolation. Evidence still needs consolidation. | Create workload isolation model/evidence package, including fabric/RDMA and slice constraints. |
| Node bootstrap weakly bound to MAC | Directionally | Current node model has step-ca/mTLS, enrollment tokens, optional TPM env, and MAAS onboarding flows. Hardware-rooted attestation remains future hardening. | Create node trust hardening roadmap for TPM/secure boot/attestation and enrollment approval. |
| FIPS crypto modules missing | Yes | Current platform should not claim FIPS/FedRAMP crypto posture. | Create regulated-crypto decision package; do not block normal production baseline on FIPS unless a regulated profile is selected. |
| Vault/HSM gap | Yes for regulated profile | Current Vault/secrets baseline is production-readiness work, but HSM-backed KEKs are not current baseline. | Track under regulated-crypto/key-custody task. |
| Session timeout too permissive | Yes for PDF | Current Keycloak/session policy needs explicit risk-tier settings for privileged/in-scope users. | Create session policy task or attach to IAM-MFA implementation. |
| Retention/erasure conflicts | Yes | Current docs contain scattered retention references. | Create data classification, retention, legal hold, and erasure matrix. |
| Incident reporting/SOC model incomplete | Mostly | Repo has runbooks and ops surfaces, but no single outbound notification/SOC operating matrix. | Create incident notification and SOC operating model task. |
| Supply-chain controls incomplete | Yes | Repo already notes that package/SBOM/provenance support is scaffold-level. | Track SBOM, provenance, signing, runner hardening, and evidence gate. |
| Architecture package insufficient | Yes | Docs are rich but fragmented; doc portal cleanup is underway. | Create current security architecture package and retire stale PDF. |

## Current-State Principles

1. Do not use `Core42_GPUaaS_Cloud.pdf` as an active source of truth.
2. Do not claim compliance certifications or readiness labels unless evidence
   exists in the repo and an owner accepts the claim.
3. Separate production baseline from regulated-profile hardening. FIPS, HSM,
   FedRAMP, PCI CDE, and HIPAA ePHI readiness are profile-specific programs.
4. Preserve the Platform Shared Services Model (PSSM) boundary: IAM, billing,
   audit/evidence, status, policy, registry, and credential custody are
   platform services, not per-product security inventions.
5. Make evidence API-first where possible. Direct SQL is acceptable only while
   the owning read model is missing.
6. Treat docs, code, tests, runbooks, and Fairway evidence as one security
   architecture package.

## Required Work Packages

### 1. Stale Security Whitepaper Retirement

Replace the PDF as an active architecture artifact with a current-state security
architecture map. The new map must identify which old claims are superseded,
which are future regulated-profile requirements, and which are current controls.

### 2. Compliance Claims And Scope Matrix

Create a short posture matrix:

1. current production baseline;
2. future enterprise baseline;
3. future regulated profile;
4. explicit non-claims.

At minimum, cover FedRAMP, PCI, HIPAA, SOC 2, ISO 27001, GDPR/UAE PDPL, and
customer responsibility boundaries.
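
One way to keep the matrix honest is to treat it as data and check it for empty cells before publishing. The following is a minimal sketch, not an existing repo tool: the `missing_cells` helper and the nested-dict layout are assumptions made for illustration, and the cell text in the usage lines is placeholder content rather than an accepted posture statement.

```python
# Hypothetical completeness check for the posture matrix.
# Column and row names come from this section; the dict layout is an assumption.
COLUMNS = (
    "current production baseline",
    "future enterprise baseline",
    "future regulated profile",
    "explicit non-claims",
)

ROWS = (
    "FedRAMP",
    "PCI",
    "HIPAA",
    "SOC 2",
    "ISO 27001",
    "GDPR/UAE PDPL",
    "customer responsibility boundaries",
)


def missing_cells(matrix: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """Return every (row, column) pair that is absent or blank."""
    return [
        (row, column)
        for row in ROWS
        for column in COLUMNS
        if not matrix.get(row, {}).get(column, "").strip()
    ]


if __name__ == "__main__":
    # Placeholder text only; real cells need owner-accepted wording.
    draft = {"FedRAMP": {column: "TBD by owner" for column in COLUMNS}}
    for row, column in missing_cells(draft):
        print(f"missing: {row} / {column}")
```

A check like this could run in CI so that a framework cannot silently drop out of the matrix.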

### 3. Tenant And Workload Isolation Evidence

Produce a testable evidence package for:

1. IAM org/department/project/resource scope;
2. API negative authorization tests;
3. database constraints and query boundaries;
4. Redis keyspace and session isolation;
5. NATS subject and consumer boundaries;
6. Temporal workflow namespace/task isolation;
7. terminal/proxy/app runtime tenant boundaries;
8. GPU bare-metal, VM slice, and fabric/RDMA isolation assumptions.
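
The negative authorization tests in item 2 can be framed as a cross-tenant matrix: every caller scope is tried against every other scope, and any allowed pair is a finding. The sketch below is illustrative only. `Scope`, `can_access`, and `cross_tenant_violations` are hypothetical names, the scope is reduced to org and project for brevity, and `can_access` is a stand-in for whatever decision point the real API exposes.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, Iterable


@dataclass(frozen=True)
class Scope:
    """Simplified scope; the real IAM hierarchy also has department and resource."""
    org: str
    project: str


def can_access(caller: Scope, resource: Scope) -> bool:
    # Stand-in policy (assumption): access only within the identical scope.
    return caller == resource


def cross_tenant_violations(
    scopes: Iterable[Scope],
    check: Callable[[Scope, Scope], bool],
) -> list[tuple[Scope, Scope]]:
    """Return every (caller, resource) pair of distinct scopes that was allowed."""
    return [
        (caller, resource)
        for caller, resource in permutations(list(scopes), 2)
        if check(caller, resource)
    ]


if __name__ == "__main__":
    scopes = [Scope("org-a", "p1"), Scope("org-a", "p2"), Scope("org-b", "p1")]
    # An empty list is the passing result for a negative test run.
    print(cross_tenant_violations(scopes, can_access))
```

The same pairwise pattern could be reused for Redis key prefixes, NATS subjects, or Temporal namespaces by swapping the `check` callable.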

### 4. Audit Tamper-Evidence

Define the next maturity step beyond append-only audit rows:

1. hash-chained audit batches;
2. signing key custody;
3. external replication;
4. WORM/Object Lock retention profile;
5. DBA/operator separation of duties;
6. alerting on audit pipeline control changes.
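
To make item 1 concrete, here is a minimal sketch of a hash-chained batch log. It is an illustration under stated assumptions, not the platform's audit design: the record shape, the all-zero genesis value, and the function names are hypothetical, and signing, external replication, and WORM storage (items 2 to 4) are out of scope for the sketch.

```python
import hashlib
import json

# Assumed sentinel used as the "previous hash" of the first batch.
GENESIS_HASH = "0" * 64


def batch_hash(prev_hash: str, records: list[dict]) -> str:
    """SHA-256 over the previous batch hash plus a canonical JSON encoding."""
    payload = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_batch(chain: list[dict], records: list[dict]) -> dict:
    """Append a batch whose hash commits to all earlier batches."""
    prev = chain[-1]["hash"] if chain else GENESIS_HASH
    entry = {"prev_hash": prev, "records": records, "hash": batch_hash(prev, records)}
    chain.append(entry)
    return entry


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every link; any edited, reordered, or dropped batch fails."""
    prev = GENESIS_HASH
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if entry["hash"] != batch_hash(prev, entry["records"]):
            return False
        prev = entry["hash"]
    return True
```

A chain like this only detects tampering by someone who cannot rewrite the whole chain, which is why the list above pairs it with signing key custody, external replication, and separation of duties.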

### 5. Node Trust Hardening

Document the production and regulated-profile path for:

1. enrollment token issuance and approval;
2. TPM private-key storage;
3. TPM quote/attestation or explicit deferral;
4. secure/measured boot;
5. firmware/BMC trust;
6. node quarantine/re-enrollment;
7. MAAS site profile evidence.

### 6. Regulated Crypto And Key Custody

Decide which requirements apply to baseline versus regulated profiles:

1. FIPS-validated modules;
2. HSM-backed KEKs;
3. Vault Enterprise/FIPS or managed KMS/HSM;
4. WireGuard versus FIPS IPsec for regulated node traffic;
5. CMVP certificate evidence expectations.

### 7. Data Retention And Erasure Matrix

Create one matrix covering:

1. audit logs;
2. usage/rating/ledger lines;
3. payment records;
4. support/incident data;
5. runtime/app logs;
6. backups;
7. object/storage data;
8. legal hold;
9. deletion and pseudonymization behavior.
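
Each matrix row ultimately has to answer one question per record: retain, delete, or pseudonymize. The sketch below shows one possible decision rule in which legal hold overrides retention expiry. `RetentionRule`, `disposition`, and the 30-day figure are hypothetical; actual retention periods and erasure behavior are exactly what the matrix is meant to decide.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionRule:
    """One matrix row (assumed shape): how long to keep data and how to erase it."""
    retention_days: int
    erasure: str  # "delete" or "pseudonymize"


def disposition(rule: RetentionRule, created: date, today: date, legal_hold: bool) -> str:
    """Decide what happens to a record today under the given rule."""
    if legal_hold:
        # Legal hold wins over any retention expiry.
        return "retain (legal hold)"
    if today - created < timedelta(days=rule.retention_days):
        return "retain"
    return rule.erasure


if __name__ == "__main__":
    # Illustrative numbers only, not a proposed policy.
    rule = RetentionRule(retention_days=30, erasure="pseudonymize")
    print(disposition(rule, date(2026, 1, 1), date(2026, 3, 1), legal_hold=False))
```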

### 8. Incident/SOC Operating Model

Tie current runbooks into:

1. severity definitions;
2. SOC/on-call coverage assumptions;
3. MTTA/MTTD targets;
4. customer/regulator notification matrix;
5. evidence custody;
6. post-incident review requirements.

### 9. Supply Chain Evidence Gate

Finish the existing scaffolded supply-chain posture:

1. SBOM generation;
2. SBOM signing;
3. image/artifact signing;
4. SLSA/in-toto provenance;
5. CI runner hardening;
6. exception handling;
7. release evidence attachment.
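
The gate itself can be as simple as refusing a release that lacks any required evidence item unless an exception is recorded. This is a hedged sketch: the evidence keys, the `gate_release` name, and the way exceptions are passed in are assumptions for illustration and do not describe an existing pipeline step.

```python
# Hypothetical evidence keys mirroring items 1 to 4 above.
REQUIRED_EVIDENCE = (
    "sbom",
    "sbom_signature",
    "artifact_signature",
    "provenance",
)


def gate_release(
    evidence: dict[str, str],
    exceptions: frozenset[str] = frozenset(),
) -> list[str]:
    """Return the required evidence items that are missing and not excepted.

    An empty list means the release may proceed.
    """
    return [
        item
        for item in REQUIRED_EVIDENCE
        if item not in exceptions and not evidence.get(item)
    ]


if __name__ == "__main__":
    # Paths are placeholders; only the SBOM has been attached here.
    blocking = gate_release({"sbom": "sbom.json"})
    if blocking:
        print("release blocked, missing:", ", ".join(blocking))
```

Keeping exceptions as an explicit input makes item 6 auditable: every bypass has to be named rather than implied by a skipped step.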

## Fairway Package

This triage creates the `SEC-ARCH-REVIEW-EPIC` package in the active Fairway
queue. `SEC-ARCH-REVIEW-TRIAGE-001` owns this document and may be closed once
the queue package is imported.

Order the remaining tasks as follows:

1. `SEC-ARCH-REVIEW-CURRENT-STATE-DOC-001`
2. `SEC-ARCH-COMPLIANCE-SCOPE-MATRIX-001`
3. `SEC-ARCH-TENANT-WORKLOAD-ISOLATION-EVIDENCE-001`
4. `SEC-ARCH-AUDIT-TAMPER-EVIDENCE-001`
5. `SEC-ARCH-NODE-TRUST-HARDENING-001`
6. `SEC-ARCH-RETENTION-ERASURE-MATRIX-001`
7. `SEC-ARCH-INCIDENT-SOC-MODEL-001`
8. `SEC-ARCH-SUPPLY-CHAIN-EVIDENCE-GATE-001`
9. `SEC-ARCH-REGULATED-CRYPTO-KEY-CUSTODY-001`

MFA is intentionally not duplicated here. It remains under `IAM-MFA-EPIC`.

## Definition Of Done

The security review triage is complete when:

1. stale PDF assumptions are explicitly superseded;
2. all valid current gaps have Fairway tasks;
3. current compliance claims are separated from future regulated-profile work;
4. MFA is linked to existing `IAM-MFA-*` tasks, not duplicated;
5. the active Fairway queue imports successfully.
