# Secrets, PKI, And Runtime Trust Model v1

Status: active — credential custody model defined (PSSM-PROD-C2-CREDENTIAL-CUSTODY-001); Phase 6 rotation-evidence and extraction-packet tasks remain open
Owner: Security / Infra / Platform Architecture
Last updated: 2026-06-03

## Purpose

Define the Phase 6 runtime-trust contract for secret purpose, credential
delivery, certificate lifecycle evidence, and extraction decisions.

Secrets / PKI is a coordination layer over Vault, step-ca, cert-manager,
Kubernetes Secrets, and `packages/shared/pki`. It owns purpose metadata,
delivery contracts, audit posture, and Status/Ops evidence. It does not replace
the underlying secret or certificate custody tools.

This model defines the baseline production credential-custody contract. It does
not by itself claim FIPS validation, HSM-backed custody, CMVP evidence,
FedRAMP-ready crypto, PCI CDE readiness, or HIPAA/ePHI readiness. Regulated
profile requirements are defined separately in
`Regulated_Crypto_And_Key_Custody_Decision_Package_v1.md`.
The CMVP/module evidence fields for those claims are defined in
`Regulated_Crypto_CMVP_Evidence_Model_v1.md`.
The regulated key custody read-model fields are defined in
`Regulated_Key_Custody_Read_Model_v1.md`.

## Scope

In scope:

- secret purpose contract;
- credential delivery contract;
- certificate lifecycle evidence;
- secret rotation evidence;
- grace-window and emergency-disable posture;
- extraction readiness packet requirements.

Out of scope:

- implementing a new vault;
- replacing step-ca or cert-manager;
- exposing secret material in read models;
- customer-managed private key custody;
- physical service extraction before service-auth and smoke evidence exist.

## Operating Principles

1. Raw secret material appears only at the approved one-time delivery boundary.
2. Read models, audit rows, evidence items, and portal pages must never include
   private keys, provider secrets, wrapped-token bytes, access tokens, refresh
   tokens, or raw Vault responses.
3. Every secret or certificate has a purpose, owner, custody tool, delivery
   mode, rotation schedule, and evidence component ID.
4. New issuance and renewal fail closed outside documented grace windows.
5. Existing valid short-lived credentials may continue until expiry when the
   issuer is unavailable, but the stale/expiring posture must be visible.
6. Extraction decisions require evidence packets, not deployment diagrams.

## Secret Purpose Contract

| Field | Meaning |
|---|---|
| `purpose_id` | stable ID such as `node_agent_client_cert` or `registry_pull_credential` |
| `owner_product_id` | product or platform service that owns the purpose |
| `material_kind` | `certificate`, `service_token`, `provider_credential`, `runtime_secret`, `signing_key` |
| `custody_tool` | `vault`, `step_ca`, `cert_manager`, or `kubernetes_secret` |
| `delivery_mode` | `vault_wrapped`, `mounted_secret`, `runtime_injection`, or `certificate_renewal` |
| `rotation_period` | expected maximum age before rotation |
| `grace_period` | allowed expiry/rotation exception window |
| `audit_action` | audit action emitted for privileged issuance, renewal, disable, or rotation |
| `evidence_component_id` | Status/Ops component row that proves health |
| `lifecycle` | `draft`, `active`, `deprecated`, `retired` |

Purpose registry — all active platform credential types:

| Purpose ID | Category | Custody | Delivery | Storage tier | Rotation period | Rotation owner | One-time reveal | Evidence |
|---|---|---|---|---|---|---|---|---|
| `node_agent_client_cert` | certificate | step-ca | certificate renewal | ephemeral | 24 h | platform_automated | no | `runtime-cert-rotation` |
| `worker_client_cert` | certificate | step-ca | certificate renewal | ephemeral | 24 h | platform_automated | no | `runtime-cert-rotation` |
| `ingress_wildcard_cert` | certificate | cert-manager | mounted secret | kubernetes_secret | 60 d | platform_automated | no | `runtime-cert-rotation` |
| `registry_pull_credential` | runtime secret | Vault KV | mounted secret | vault_kv | 90 d | platform_ops | no | `secret-rotation` |
| `app_runtime_provider_credential` | runtime secret | Vault KV | vault wrapped | vault_kv | 90 d | platform_ops | no | `secret-rotation` |
| `platform_service_account_token` | service-account key | Vault KV | runtime injection | vault_kv | 90 d | iam_facade | **yes** | `secret-rotation` |
| `api_client_key` | API key | Vault KV | runtime injection | vault_kv | 90 d | iam_facade | **yes** | `secret-rotation` |
| `platform_recovery_token` | recovery token | Vault KV | runtime injection | vault_kv | 30 d | platform_ops | **yes** | `secret-rotation` |
| `oidc_client_secret` | OIDC secret | Vault KV | mounted secret | vault_kv | 90 d | keycloak_admin | no | `secret-rotation` |
| `terminal_gateway_session_key` | gateway credential | Vault KV | mounted secret | vault_kv | 90 d | platform_ops | no | `secret-rotation` |
| `jwks_signing_key` | signing key | Vault KV | mounted secret | vault_kv | 90 d | keycloak_admin | no | `secret-rotation` |
| `provisioning_control_key` | signing key | Vault KV | mounted secret | vault_kv | 60 d | platform_ops | no | `secret-rotation` |
| `node_task_signing_key` | signing key | Vault transit | mounted secret | vault_transit | 60 d | platform_ops | no | `task-signer-version-custody` |

New purpose IDs require a registry entry in `packages/platform/secrets/registry.go` and
a row in this table before any code can issue or consume that credential type.
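The registry gate can be sketched as follows. This is a minimal illustration, not the contents of `packages/platform/secrets/registry.go`: the field names, the `Registry` map type, the `Lookup` method, and both error values are assumptions, and the lifecycle check is an assumed reading of "active" rather than a stated rule.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// SecretPurpose mirrors the contract fields listed in the table above.
// Field names and Go types are assumptions made for this sketch.
type SecretPurpose struct {
	PurposeID           string
	OwnerProductID      string
	MaterialKind        string
	CustodyTool         string
	DeliveryMode        string
	RotationPeriod      time.Duration
	GracePeriod         time.Duration
	AuditAction         string
	EvidenceComponentID string
	Lifecycle           string
}

var (
	ErrUnknownPurpose  = errors.New("purpose is not registered")
	ErrInactivePurpose = errors.New("purpose is not active")
)

// Registry is a hypothetical in-memory index keyed by purpose ID.
type Registry map[string]SecretPurpose

// Lookup fails closed: unregistered purposes and purposes that are not
// in the "active" lifecycle state are both rejected.
func (r Registry) Lookup(purposeID string) (SecretPurpose, error) {
	p, ok := r[purposeID]
	if !ok {
		return SecretPurpose{}, fmt.Errorf("%w: %q", ErrUnknownPurpose, purposeID)
	}
	if p.Lifecycle != "active" {
		return SecretPurpose{}, fmt.Errorf("%w: %q is %q", ErrInactivePurpose, purposeID, p.Lifecycle)
	}
	return p, nil
}

func main() {
	reg := Registry{
		"node_agent_client_cert": {
			PurposeID:           "node_agent_client_cert",
			MaterialKind:        "certificate",
			CustodyTool:         "step_ca",
			DeliveryMode:        "certificate_renewal",
			RotationPeriod:      24 * time.Hour,
			EvidenceComponentID: "runtime-cert-rotation",
			Lifecycle:           "active",
			// OwnerProductID, GracePeriod, and AuditAction stay at zero values
			// because this document does not state them for this purpose.
		},
	}
	if _, err := reg.Lookup("unregistered_purpose"); err != nil {
		fmt.Println("rejected:", err)
	}
	p, err := reg.Lookup("node_agent_client_cert")
	fmt.Println(p.DeliveryMode, err)
}
```

Issuing and consuming code would call a lookup of this kind first, so an unregistered purpose fails before any custody tool is contacted.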

## Credential Delivery Contract

Credential delivery is metadata plus a one-time delivery boundary.

| Field | Meaning |
|---|---|
| `purpose_id` | secret purpose being delivered |
| `caller_product_id` | product requesting the credential |
| `environment` | environment/profile such as `kind`, `demo`, or `platform-control` |
| `subject` | workload, node, service account, route, or app instance |
| `scopes` | platform scopes authorized for this delivery |
| `credential_source` | Vault path, cert-manager Secret ref, step-ca profile, or IAM issuer ref |
| `delivery_mode` | one of the purpose-approved delivery modes |
| `audience` | service or API that will accept the credential |
| `expires_in` | credential lifetime |
| `correlation_id` | trace and audit correlation |

Rules:

- delivery mode must match the registered purpose;
- `expires_in` must not exceed `rotation_period + grace_period`;
- credential delivery writes audit for privileged issuance or rotation;
- read models may store session IDs, purpose IDs, expiry, rotation due time,
  and evidence hrefs, but not credential material;
- emergency disable revokes future issuance and records residual risk for any
  already-issued short-lived credential that cannot be revoked immediately.
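The first two rules above can be checked mechanically before any credential is issued. The sketch below assumes hypothetical `Purpose` and `DeliveryRequest` types and error names; the positive-lifetime check and the 7-day grace period in `main` are illustrative additions, not values from this document.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Purpose holds the subset of registered purpose fields needed for delivery checks.
type Purpose struct {
	PurposeID      string
	DeliveryMode   string
	RotationPeriod time.Duration
	GracePeriod    time.Duration
}

// DeliveryRequest holds the subset of delivery contract fields checked here.
type DeliveryRequest struct {
	PurposeID    string
	DeliveryMode string
	ExpiresIn    time.Duration
}

var (
	ErrPurposeMismatch      = errors.New("request purpose_id does not match registered purpose")
	ErrDeliveryModeMismatch = errors.New("delivery_mode does not match registered purpose")
	ErrInvalidLifetime      = errors.New("expires_in must be positive")
	ErrLifetimeTooLong      = errors.New("expires_in exceeds rotation_period + grace_period")
)

// ValidateDelivery rejects a request whose delivery mode differs from the
// registered purpose or whose lifetime exceeds rotation_period + grace_period.
func ValidateDelivery(p Purpose, req DeliveryRequest) error {
	switch {
	case req.PurposeID != p.PurposeID:
		return ErrPurposeMismatch
	case req.DeliveryMode != p.DeliveryMode:
		return ErrDeliveryModeMismatch
	case req.ExpiresIn <= 0:
		return ErrInvalidLifetime
	case req.ExpiresIn > p.RotationPeriod+p.GracePeriod:
		return ErrLifetimeTooLong
	}
	return nil
}

func main() {
	const day = 24 * time.Hour
	p := Purpose{
		PurposeID:      "registry_pull_credential",
		DeliveryMode:   "mounted_secret",
		RotationPeriod: 90 * day,
		GracePeriod:    7 * day, // assumed value; this document does not state a grace period
	}
	req := DeliveryRequest{PurposeID: p.PurposeID, DeliveryMode: "vault_wrapped", ExpiresIn: day}
	fmt.Println(ValidateDelivery(p, req))
}
```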

## Rotation Evidence

Status/Ops receives two runtime-trust component rows from
`scripts/ci/platform_status_snapshot.sh`:

| Component | Type | Input examples |
|---|---|---|
| `runtime-cert-rotation` | `runtime_trust` | `PLATFORM_STATUS_CERT_MIN_REMAINING_DAYS`, `PLATFORM_STATUS_CERT_RENEWAL_FAILURES`, `PLATFORM_STATUS_CERT_GRACE_EXCEPTIONS` |
| `secret-rotation` | `runtime_trust` | `PLATFORM_STATUS_SECRET_MAX_AGE_DAYS`, `PLATFORM_STATUS_SECRET_ROTATION_FAILURES`, `PLATFORM_STATUS_SECRET_GRACE_EXCEPTIONS` |

Default posture:

- cert remaining days `<= 3` is unhealthy;
- cert remaining days `<= 14` is degraded;
- secret max age `>= 60` days is unhealthy;
- secret max age `>= 30` days is degraded;
- any renewal/rotation failure or grace exception is degraded by default;
- absent metrics produce `unknown` with `details.missing_artifact`.
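The default posture can be expressed as a small classifier. This sketch is written in Go for illustration and does not describe how `scripts/ci/platform_status_snapshot.sh` is implemented. The `healthy` label, the `Metrics` type, the function names, the precedence (absent metric first, then unhealthy, then degraded), and the use of the input variable name as the `missing_artifact` value are all assumptions.

```go
package main

import "fmt"

type Status string

const (
	Healthy   Status = "healthy" // assumed label for "no rule triggered"
	Degraded  Status = "degraded"
	Unhealthy Status = "unhealthy"
	Unknown   Status = "unknown"
)

// Metrics carries one component's inputs. Value is remaining days for
// certificates or maximum age in days for secrets; nil means the metric is absent.
type Metrics struct {
	Value           *int
	Failures        int
	GraceExceptions int
}

// ClassifyCert applies the certificate thresholds: <= 3 unhealthy, <= 14 degraded.
func ClassifyCert(m Metrics) (Status, map[string]string) {
	if m.Value == nil {
		return Unknown, map[string]string{"missing_artifact": "PLATFORM_STATUS_CERT_MIN_REMAINING_DAYS"}
	}
	switch {
	case *m.Value <= 3:
		return Unhealthy, nil
	case *m.Value <= 14, m.Failures > 0, m.GraceExceptions > 0:
		return Degraded, nil
	}
	return Healthy, nil
}

// ClassifySecret applies the secret thresholds: >= 60 unhealthy, >= 30 degraded.
func ClassifySecret(m Metrics) (Status, map[string]string) {
	if m.Value == nil {
		return Unknown, map[string]string{"missing_artifact": "PLATFORM_STATUS_SECRET_MAX_AGE_DAYS"}
	}
	switch {
	case *m.Value >= 60:
		return Unhealthy, nil
	case *m.Value >= 30, m.Failures > 0, m.GraceExceptions > 0:
		return Degraded, nil
	}
	return Healthy, nil
}

func main() {
	days := 10
	s, _ := ClassifyCert(Metrics{Value: &days})
	fmt.Println("cert:", s)
	s, details := ClassifySecret(Metrics{})
	fmt.Println("secret:", s, details)
}
```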

## Storage Tier Model

Platform credential material must reside in exactly one of these tiers. Services must not
introduce tiers outside this set without a security architecture review and a doc update.

| Tier | Identifier | Constraint |
|---|---|---|
| Vault transit | `vault_transit` | Signing and encryption only; raw key material never leaves Vault. Target for all signing keys. |
| Vault KV | `vault_kv` | Encrypted at rest in Vault; accessed via Kubernetes auth or workload identity. Default for all secrets. |
| Kubernetes Secret | `kubernetes_secret` | Transitional cache only; source of truth must be Vault-backed and documented as migrating. No net-new use. |
| Ephemeral | `ephemeral` | In-memory, not persisted; for certificates delivered by step-ca renewal or derived per-request material. |

Storage tier names describe custody and delivery posture. They are not
regulated-profile claims. A regulated profile must additionally prove HSM/KMS
backing, FIPS/module status, CMVP evidence, key lifecycle evidence, and
separation-of-duties controls for every in-scope key purpose.

Rotation owner values:

| Owner | Identifier | Meaning |
|---|---|---|
| Platform automated | `platform_automated` | cert-manager or step-ca drives renewal; no human trigger required. |
| Platform ops | `platform_ops` | Ops team executes rotation via documented runbook. |
| IAM facade | `iam_facade` | IAM service API drives rotation lifecycle end-to-end. |
| Keycloak admin | `keycloak_admin` | Keycloak admin API rotates OIDC/JWKS material; ops triggers. |
| Stripe platform | `stripe_platform` | Stripe provider owns key rotation; ops triggers via Stripe dashboard/API. |

## One-Time Reveal Boundary

Credentials marked `one_time_reveal: true` in the registry follow this boundary:

- Raw credential material is returned exactly once to the authorized caller at issuance.
- No platform service, read model, portal page, or audit record may re-expose the raw
  material after delivery.
- Subsequent API calls for the same credential must return only metadata:
  `purpose_id`, `expires_at`, `rotation_due_at`, `status`, `evidence_href`.
- The stored form is a hash or Vault reference, never the raw material.
- If the caller loses the credential, the only recovery path is rotation (issue new,
  revoke old). No "show again" path exists.

Credentials with `one_time_reveal: false` (certificates, OIDC secrets, gateway keys)
may be re-fetched from Vault or the issuing CA at any time by the authorized workload,
but not by product services or user-facing APIs.
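A minimal sketch of the one-time reveal boundary, assuming the stored form is a SHA-256 digest of randomly generated material. The `Store` type, its methods, and the `active` status value are hypothetical, and the in-memory map stands in for whatever persistence the platform actually uses.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
	"time"
)

// Metadata is the only view returned after issuance.
type Metadata struct {
	PurposeID     string
	ExpiresAt     time.Time
	RotationDueAt time.Time
	Status        string
	EvidenceHref  string
}

type record struct {
	meta   Metadata
	digest [sha256.Size]byte // SHA-256 of the raw material; the raw value is not kept
}

// Store is a hypothetical in-memory stand-in for the credential index.
type Store struct {
	records map[string]record
}

func NewStore() *Store { return &Store{records: make(map[string]record)} }

// Issue generates random material, stores only its digest, and returns the
// raw value to the caller. This return value is the single reveal.
func (s *Store) Issue(id string, meta Metadata) (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	raw := hex.EncodeToString(buf)
	s.records[id] = record{meta: meta, digest: sha256.Sum256([]byte(raw))}
	return raw, nil
}

// Describe returns metadata only; no method on Store returns raw material.
func (s *Store) Describe(id string) (Metadata, bool) {
	r, ok := s.records[id]
	return r.meta, ok
}

// Verify compares a presented credential against the stored digest in constant time.
func (s *Store) Verify(id, presented string) bool {
	r, ok := s.records[id]
	if !ok {
		return false
	}
	sum := sha256.Sum256([]byte(presented))
	return subtle.ConstantTimeCompare(sum[:], r.digest[:]) == 1
}

func main() {
	s := NewStore()
	due := time.Now().Add(90 * 24 * time.Hour) // 90 d rotation; expiry equal to it is assumed
	raw, err := s.Issue("key-1", Metadata{
		PurposeID:     "api_client_key",
		ExpiresAt:     due,
		RotationDueAt: due,
		Status:        "active", // assumed status value
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("revealed once, length:", len(raw))
	meta, _ := s.Describe("key-1")
	fmt.Println(meta.PurposeID, s.Verify("key-1", raw))
}
```

Because the type exposes no accessor for the raw value, a lost credential can only be replaced through rotation, which matches the "no show again" rule.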

## Revocation, Emergency Disable, and Compromise Response

### Planned Revocation

For each credential type, the registry defines a `RevocationPath`. Revocation always
writes a `RevocationRecord` to `audit_logs` before the credential is invalidated.
Required audit fields: `purpose_id`, `subject`, `correlation_id`, `revoked_at`,
`revoked_by`, `revoked_by_role`, `reason`, `residual_risk`.

`residual_risk` must be populated when any already-issued short-lived credentials
(certificates, tokens, wrapped secrets) cannot be immediately revoked and will remain
valid until natural expiry. Acceptable entries: `"none"` or a plain-language description
of the window and scope.
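A sketch of a completeness check for the required audit fields. The struct layout, the `MissingFields` method, and the sample values in `exampleRecord` are hypothetical. The check treats an empty `residual_risk` as missing in every case, which is stricter than the rule above and is an assumption chosen so that `"none"` is always an explicit statement.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// RevocationRecord carries the required audit fields for a revocation.
// The Go field layout is an assumption for this sketch.
type RevocationRecord struct {
	PurposeID     string
	Subject       string
	CorrelationID string
	RevokedAt     time.Time
	RevokedBy     string
	RevokedByRole string
	Reason        string
	ResidualRisk  string
}

// MissingFields lists required fields that are empty, using the audit field names.
func (r RevocationRecord) MissingFields() []string {
	var missing []string
	fields := []struct{ name, value string }{
		{"purpose_id", r.PurposeID},
		{"subject", r.Subject},
		{"correlation_id", r.CorrelationID},
		{"revoked_by", r.RevokedBy},
		{"revoked_by_role", r.RevokedByRole},
		{"reason", r.Reason},
		{"residual_risk", r.ResidualRisk},
	}
	for _, f := range fields {
		if strings.TrimSpace(f.value) == "" {
			missing = append(missing, f.name)
		}
	}
	if r.RevokedAt.IsZero() {
		missing = append(missing, "revoked_at")
	}
	return missing
}

// exampleRecord returns a fully populated record with illustrative values.
func exampleRecord() RevocationRecord {
	return RevocationRecord{
		PurposeID:     "worker_client_cert",
		Subject:       "worker-example",
		CorrelationID: "corr-example",
		RevokedAt:     time.Now(),
		RevokedBy:     "operator-example",
		RevokedByRole: "platform_ops",
		Reason:        "planned_rotation",
		ResidualRisk:  "issued certificates remain valid for up to 24 h",
	}
}

func main() {
	r := exampleRecord()
	fmt.Println("missing:", r.MissingFields())
	r.ResidualRisk = ""
	fmt.Println("missing:", r.MissingFields())
}
```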

### Emergency Disable

Emergency disable stops all future issuance for a purpose immediately. It does not
retroactively invalidate already-issued short-lived credentials. The `EmergencyDisableAction`
field in each registry entry names the action to execute.

Emergency disable procedure (all credential types):

1. Execute the `EmergencyDisableAction` for the affected purpose.
2. Write a `RevocationRecord` with `reason: "emergency_disable"` and a populated
   `residual_risk` field for any in-flight credentials.
3. Confirm the issuer (Vault policy, step-ca provisioner, Keycloak client) rejects
   new issuance requests.
4. Continue monitoring until all residual credentials expire or are individually revoked.
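Steps 1 through 3 of the procedure can be sketched as one function. The `Hooks` type and its function fields are hypothetical integration points; a real implementation would call the custody tool and the audit writer. Step 4 (monitoring until residual credentials expire) is ongoing work and is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// Hooks are hypothetical integration points for this sketch.
type Hooks struct {
	Disable     func(purposeID string) error                       // runs the EmergencyDisableAction
	WriteRecord func(purposeID, reason, residualRisk string) error // writes the RevocationRecord
	ProbeIssue  func(purposeID string) error                       // must return an error once issuance is disabled
}

var (
	ErrResidualRiskEmpty    = errors.New("residual_risk must be populated")
	ErrIssuerStillAccepting = errors.New("issuer still accepts new issuance")
)

// EmergencyDisable runs steps 1-3 in order and stops at the first failure.
func EmergencyDisable(h Hooks, purposeID, residualRisk string) error {
	if residualRisk == "" {
		return ErrResidualRiskEmpty
	}
	if err := h.Disable(purposeID); err != nil {
		return fmt.Errorf("step 1, disable action: %w", err)
	}
	if err := h.WriteRecord(purposeID, "emergency_disable", residualRisk); err != nil {
		return fmt.Errorf("step 2, revocation record: %w", err)
	}
	// Step 3: a successful probe means the issuer did not actually close.
	if err := h.ProbeIssue(purposeID); err == nil {
		return ErrIssuerStillAccepting
	}
	return nil
}

// fakeHooks builds an in-memory issuer for demonstration only.
func fakeHooks(disableWorks bool) (Hooks, *[]string) {
	disabled := false
	var records []string
	h := Hooks{
		Disable: func(string) error {
			disabled = disableWorks
			return nil
		},
		WriteRecord: func(p, reason, risk string) error {
			records = append(records, p+"|"+reason+"|"+risk)
			return nil
		},
		ProbeIssue: func(string) error {
			if disabled {
				return errors.New("issuance disabled")
			}
			return nil
		},
	}
	return h, &records
}

func main() {
	h, records := fakeHooks(true)
	err := EmergencyDisable(h, "worker_client_cert", "issued certificates remain valid for up to 24 h")
	fmt.Println(err, *records)
}
```

Probing the issuer after the disable action makes a misconfigured policy or provisioner visible as an error instead of a silent gap.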

### Compromise Response Classification

| Trigger | Severity | Immediate actions |
|---|---|---|
| Private key or signing key material exfiltrated or suspected exfiltrated | SEV-1 | Emergency disable → rotate → force-refresh JWKS (`POST /internal/auth/jwks/refresh`) → revoke all active sessions for affected purpose |
| Long-lived API key or service-account token leaked (confirmed) | SEV-1 | Revoke token immediately → rotate → audit all calls made with compromised credential |
| OIDC client secret exposed | SEV-1 | Rotate via Keycloak admin API → update Vault KV → redeploy affected services |
| Recovery token exposed | SEV-1 | Invalidate all recovery tokens (`platform.auth.recovery_token.invalidate_all`) → issue new tokens only to verified operators |
| Gateway session key exposed | SEV-1 | Rotate → redeploy terminal-gateway → invalidate all active terminal sessions |
| Certificate private key exposed | SEV-1 | Revoke via step-ca/cert-manager → re-issue new cert → update all nodes/workers |
| Rotation failure (not compromise) | SEV-3 | Investigate failure root cause → retry rotation → confirm Status/Ops evidence clears |
| Grace window exceeded | SEV-3 | Escalate to rotation owner → execute runbook → record evidence |

All SEV-1 compromise responses require a post-incident review entry in
`doc/operations/evidence/secrets_key_ops.md` within 48 hours.

## Audit and Evidence Rules

Every privileged secret operation must write an `audit_logs` row. Minimum required fields:

| Field | Value |
|---|---|
| `actor_user_id` | authenticated operator or service account ID |
| `actor_role` | role at time of operation |
| `action` | the `audit_action` from the credential's `SecretPurpose` |
| `target_type` | `"credential"` |
| `target_id` | `purpose_id:subject` composite |
| `result` | `"success"` or `"failure"` |
| `correlation_id` | request correlation ID |

Privileged operations that require an audit row:

- credential issuance (new or renewal);
- credential rotation;
- credential revocation;
- emergency disable;
- grace-window exception granted.
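An audit row with the minimum fields could be assembled as sketched below. The `AuditRow` type, the constructor name, and the sample action string are hypothetical; only the field names, the `"credential"` target type, the `purpose_id:subject` composite, and the two result values come from the table above.

```go
package main

import (
	"errors"
	"fmt"
)

// AuditRow holds the minimum required audit_logs fields.
type AuditRow struct {
	ActorUserID   string
	ActorRole     string
	Action        string
	TargetType    string
	TargetID      string
	Result        string
	CorrelationID string
}

var ErrIncompleteAuditRow = errors.New("audit row is missing a required field")

// NewCredentialAuditRow fills target_type and target_id from the purpose and
// subject, and refuses to build a row with any empty required input.
func NewCredentialAuditRow(actorUserID, actorRole, action, purposeID, subject, correlationID string, succeeded bool) (AuditRow, error) {
	for _, v := range []string{actorUserID, actorRole, action, purposeID, subject, correlationID} {
		if v == "" {
			return AuditRow{}, ErrIncompleteAuditRow
		}
	}
	result := "failure"
	if succeeded {
		result = "success"
	}
	return AuditRow{
		ActorUserID:   actorUserID,
		ActorRole:     actorRole,
		Action:        action,
		TargetType:    "credential",
		TargetID:      purposeID + ":" + subject,
		Result:        result,
		CorrelationID: correlationID,
	}, nil
}

func main() {
	row, err := NewCredentialAuditRow(
		"operator-example", "platform_ops",
		"example.credential.rotate", // hypothetical audit_action value
		"api_client_key", "svc-example", "corr-example", true,
	)
	fmt.Printf("%+v %v\n", row, err)
}
```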

Status/Ops evidence components (`runtime-cert-rotation`, `secret-rotation`) must be
populated by runtime jobs for every active credential type. Absent metrics produce
`unknown` status and block release promotion when required evidence is missing.

## Product Custody Prohibition

Product services — GPUaaS, Token Factory, JupyterLab operator, or any future product —
must not store, issue, rotate, or cache raw credential material for any purpose registered
in `WellKnownCustodySpecs`.

The enforcement mechanism is `AssertPlatformCustody(spec)` in
`packages/platform/secrets/custody.go`. Any call path in product code that would take
custody of a registered credential must call this function first. If the function returns
`ErrProductCustodyDenied`, the operation must be delegated to the platform IAM facade,
Vault, or the relevant custodian tool.

Exceptions (when `ProductCustodyAllowed` may be set to `true`) require:

1. written approval from security architecture in a linked ADR or security review;
2. a corresponding registry update naming the product, scope, and expiry of the exception;
3. an audit action and rotation owner assigned to the product for the duration.

No exception may be self-granted by the product team.
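The call-site pattern might look like the sketch below. The real `AssertPlatformCustody` signature is not shown in this document, so `assertPlatformCustody`, `CustodySpec`, `storeCredential`, and the returned outcome strings are local stand-ins that only assume what the text states: a denied spec yields `ErrProductCustodyDenied` and the operation is then delegated.

```go
package main

import (
	"errors"
	"fmt"
)

// CustodySpec is a stand-in for the registry's custody spec type.
type CustodySpec struct {
	PurposeID             string
	ProductCustodyAllowed bool
}

var ErrProductCustodyDenied = errors.New("product custody denied")

// assertPlatformCustody is a local stand-in for the real AssertPlatformCustody.
func assertPlatformCustody(spec CustodySpec) error {
	if !spec.ProductCustodyAllowed {
		return fmt.Errorf("%w: %s", ErrProductCustodyDenied, spec.PurposeID)
	}
	return nil
}

// storeCredential shows the pattern for product code: check custody first and
// delegate to the platform custodian when custody is denied.
func storeCredential(spec CustodySpec, delegate func(purposeID string) error) (string, error) {
	err := assertPlatformCustody(spec)
	if errors.Is(err, ErrProductCustodyDenied) {
		return "delegated", delegate(spec.PurposeID)
	}
	if err != nil {
		return "", err
	}
	// Reached only when an approved exception sets ProductCustodyAllowed.
	return "product_custody_exception", nil
}

func main() {
	spec := CustodySpec{PurposeID: "api_client_key"}
	outcome, err := storeCredential(spec, func(purposeID string) error {
		fmt.Println("delegating", purposeID, "to the platform custodian")
		return nil
	})
	fmt.Println(outcome, err)
}
```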

## Extraction Decision Rule

Secrets/PKI must stay as a coordination contract until these are true:

- service-auth packet exists for each product-to-platform caller;
- purpose registry or schema-backed contract exists;
- Status/Ops cert and secret rotation rows are populated from runtime jobs;
- Vault/step-ca/cert-manager health evidence is available;
- split smoke proves issuance, renewal, disable, rollback, and stale-status
  behavior;
- emergency disable and residual-risk handling are tested.

The current recommendation is `keep_in_process` for the coordination layer and
`split_worker` only for bounded rotation/evidence collectors when runtime jobs
prove they need independent cadence or credentials.
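The extraction gate is a conjunction of the six conditions listed above, which the sketch below makes explicit. The `ExtractionEvidence` type, its field names, and the `Missing` and `Eligible` methods are hypothetical. Passing the gate only makes extraction eligible for a decision; it does not change the current recommendation by itself.

```go
package main

import "fmt"

// ExtractionEvidence records whether each listed condition has evidence.
type ExtractionEvidence struct {
	ServiceAuthPackets     bool
	PurposeRegistry        bool
	RotationRowsPopulated  bool
	CustodyHealthEvidence  bool
	SplitSmokePassed       bool
	EmergencyDisableTested bool
}

// Missing returns the conditions that still lack evidence, in document order.
func (e ExtractionEvidence) Missing() []string {
	checks := []struct {
		name string
		ok   bool
	}{
		{"service-auth packet for each product-to-platform caller", e.ServiceAuthPackets},
		{"purpose registry or schema-backed contract", e.PurposeRegistry},
		{"Status/Ops rotation rows populated from runtime jobs", e.RotationRowsPopulated},
		{"Vault/step-ca/cert-manager health evidence", e.CustodyHealthEvidence},
		{"split smoke for issuance, renewal, disable, rollback, stale status", e.SplitSmokePassed},
		{"emergency disable and residual-risk handling tested", e.EmergencyDisableTested},
	}
	var missing []string
	for _, c := range checks {
		if !c.ok {
			missing = append(missing, c.name)
		}
	}
	return missing
}

// Eligible reports whether every condition has evidence.
func (e ExtractionEvidence) Eligible() bool { return len(e.Missing()) == 0 }

func main() {
	e := ExtractionEvidence{PurposeRegistry: true, CustodyHealthEvidence: true}
	fmt.Println("eligible:", e.Eligible())
	for _, m := range e.Missing() {
		fmt.Println("missing:", m)
	}
}
```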

## Related Artifacts

- `packages/platform/secrets` — purpose, custody spec, and revocation types; `WellKnownCustodySpecs` registry
- `packages/shared/pki`
- `doc/architecture/PKI_Spec.md`
- `doc/architecture/Node_Agent_Host_Certificate_Lifecycle_v1.md`
- `doc/architecture/Platform_Vault_Secrets_Baseline_v1.md` — secret classes and Vault path model
- `doc/architecture/platform-foundation/Regulated_Crypto_And_Key_Custody_Decision_Package_v1.md` — baseline versus regulated crypto and key-custody boundary
- `doc/architecture/platform-foundation/Regulated_Key_Custody_Read_Model_v1.md` — regulated key custody read-model contract
- `doc/operations/runbooks/Key_Rotation_and_Compromise_Response_Runbook.md` — JWKS/terminal/provisioning runbook
- `.fairway/artifacts/platform-shared-services-extraction-packets.yaml`
