# Platform Control CI/CD Multi-Environment Model

As of: 2026-05-24

## Purpose

The platform-control release path now targets more than one environment:
local-kind, standalone dev-control, demo, and later staging/production profiles.
The previous model let deploy scripts combine profile-owned values with generic
GitLab CI globals. That caused late failures such as runtime images published to
`aicloud-dev-registry.core42.dev` while Kubernetes image pull credentials still
targeted a retired `retired IP-derived DNS` registry.

This document defines the environment/profile split and the gates required
before any deploy mutates a target.

## Layer Model

### Environment

An environment is a durable target such as:

- `local-kind`
- `dev-control`
- `demo`
- `staging`
- `prod-public-ingress`
- `prod-private-ingress`
- `airgapped-private-ca`

The environment owns durable facts:

- cluster endpoint and access method
- DNS and edge implementation
- public and private hostnames
- namespace names
- registry host and credentials source
- storage roots and persistent volume policy
- provider reachability constraints
- observability endpoints
- allowed bootstrap modes

### Release Profile

A release profile defines how a release is applied to an environment.

Examples:

- `dev-control-rke2`
- `demo-rke2`
- `web-fast`
- `api-fast`
- `runtime-fast`
- `node-agent-fast`
- `validation-only`

The profile owns operational behavior:

- deploy mode
- preflight mode
- validation suite
- rollout timeout policy
- allowed fast-lane shortcuts
- whether registry bootstrap is required
- whether app namespaces are expected before deploy
- whether public endpoint readiness is checked before or after deploy

### Release Artifact

The release artifact is immutable and environment-neutral:

- source commit SHA
- runtime image digests
- release artifact refs
- node bootstrap package refs
- schema/seed identifiers
- generated SDK refs
- release manifest schema version

This is the unit promoted between environments.

### Gate Policy

Gate policy maps an environment/profile pair to required checks.

Examples:

- `dev-control-rke2`: contract checks, resolved-profile validation, bootstrap
  preflight, deploy, smoke validation.
- `demo-rke2`: full release validation, workflow smoke, terminal smoke, app
  route smoke.
- `api-fast`, `web-fast`, `runtime-fast`, and `node-agent-fast` with
  `PLATFORM_CONTROL_RELEASE_ENV_PROFILE=dev-control-rke2`: dev-control-scoped
  fast lanes that publish/deploy only the selected runtime set and require
  dev-control-specific SSH credentials.
- `prod-public-ingress`: approval, canary, rollback plan, status notice,
  full validation, incident escalation path.
- `validation-only`: no mutation, read-only health and drift checks.
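
The mapping above can be sketched as a simple lookup table. This is a minimal illustration, not the project's implementation: the gate identifiers (`contract-checks`, `smoke-validation`, and so on) and the `MUTATING_GATES` set are hypothetical names derived from the examples in this section.

```python
from typing import Dict, List

# Hypothetical gate identifiers, derived from the examples in this section.
GATE_POLICY: Dict[str, List[str]] = {
    "dev-control-rke2": [
        "contract-checks",
        "resolved-profile-validation",
        "bootstrap-preflight",
        "deploy",
        "smoke-validation",
    ],
    "validation-only": ["health-check", "drift-check"],
}

# Gates that change the target (assumption: only "deploy" mutates).
MUTATING_GATES = {"deploy"}


def required_gates(profile: str) -> List[str]:
    """Return the ordered gates for a profile, failing closed on unknown names."""
    if profile not in GATE_POLICY:
        raise ValueError(f"no gate policy defined for profile {profile!r}")
    return list(GATE_POLICY[profile])


def mutates_target(profile: str) -> bool:
    """True if any gate required by the profile changes the target."""
    return any(gate in MUTATING_GATES for gate in required_gates(profile))
```

An unknown profile raises instead of returning an empty gate list, so a typo in a profile name cannot silently skip every check.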

## Single Source Of Truth

Every deploy target must resolve exactly one environment/profile contract before
package, preflight, deploy, or validation jobs run.

The resolved contract is the only source scripts may consume for:

- runtime image registry host
- release artifact registry host
- pull secret name
- pull/publish credential source
- Kubernetes overlay
- namespace names
- public URLs
- terminal and notification websocket URLs
- node API URL
- remote kubectl command
- preflight mode

Generic CI variables such as `CI_REGISTRY`, `CI_REGISTRY_IMAGE`, and
`CI_REGISTRY_PASSWORD` are build-system inputs. They are not environment
contracts and must not override a named release profile.

Allowed precedence:

1. explicit release profile values
2. environment descriptor values rendered into the release profile
3. legacy `CI_REGISTRY*` fallback only for the legacy/standard profile

Named profiles must fail closed if a required value is missing.
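
The precedence and fail-closed rule could look roughly like the sketch below. The function name `resolve_value`, the exception `MissingProfileValue`, and the profile name `legacy` are assumptions made for illustration; the actual resolver is the shell script named under Implementation Direction.

```python
from typing import Mapping, Optional

# Hypothetical name for the legacy/standard profile.
LEGACY_PROFILE = "legacy"


class MissingProfileValue(Exception):
    """Raised when a required value cannot be resolved for a profile."""


def resolve_value(
    key: str,
    profile_name: str,
    profile_values: Mapping[str, str],
    environment_values: Mapping[str, str],
    ci_fallback: Optional[str] = None,
) -> str:
    # 1. explicit release profile values, 2. environment descriptor values
    for source in (profile_values, environment_values):
        value = source.get(key)
        if value:
            return value
    # 3. CI_REGISTRY* style fallback, only for the legacy/standard profile
    if profile_name == LEGACY_PROFILE and ci_fallback:
        return ci_fallback
    # Fail closed: never substitute a generic CI value for a named profile.
    raise MissingProfileValue(f"{profile_name}: required value {key!r} is missing")
```

With this shape, a named profile such as `dev-control-rke2` that lacks `registry.host` raises even when a CI fallback value is available.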

## Resolved Profile Artifact

CI/CD must generate a resolved profile artifact before any deploy job proceeds:

`dist/platform-control-profile-resolved.json`

Required fields:

- `profile`
- `environment`
- `cluster.service`
- `cluster.kubectl`
- `cluster.ssh_host`
- `k8s.overlay`
- `namespaces.core`
- `namespaces.infra`
- `namespaces.observability`
- `registry.host`
- `registry.runtime_image_repo_prefix`
- `registry.pull_secret`
- `registry.username_present`
- `registry.password_present`
- `public_urls.app`
- `public_urls.api`
- `public_urls.auth`
- `public_urls.terminal`
- `public_urls.registry`
- `preflight.mode`
- `preflight.required_namespaces`
- `preflight.workload_baseline`
- `preflight.public_readiness`

Secrets must be represented only as presence booleans or secret references.
Passwords, tokens, kubeconfigs, and private keys must never be written to the
artifact.

## Required Invariants

The resolved profile validation must fail before deploy if any invariant is
violated.

### Registry Invariants

- Runtime image digest hosts must match the resolved runtime registry host.
- Kubernetes image pull secret host must match the runtime image digest host.
- Publish credentials and pull credentials must be resolved from the selected
  environment/profile.
- Named Cloudflare profiles must not resolve active hosts or repo prefixes to
  retired `retired IP-derived DNS` domains.
- Registry bootstrap must complete before any package fanout that pushes to the
  target registry.
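
The first two registry invariants can be illustrated with a small host comparison. This is a sketch under assumptions: image refs are fully qualified (`host/path@sha256:...`), and the function names are hypothetical.

```python
from typing import Iterable, List


def image_host(image_ref: str) -> str:
    """Return the registry host of a fully qualified image ref.

    Assumes the ref always starts with a registry host; shorthand refs
    without a host are not handled by this sketch.
    """
    return image_ref.split("/", 1)[0]


def registry_violations(
    image_refs: Iterable[str],
    registry_host: str,
    pull_secret_host: str,
) -> List[str]:
    """List invariant violations; an empty list means the check passes."""
    violations = []
    for ref in image_refs:
        host = image_host(ref)
        if host != registry_host:
            violations.append(
                f"{ref}: image host {host} != resolved registry host {registry_host}"
            )
        if host != pull_secret_host:
            violations.append(
                f"{ref}: image host {host} != pull secret host {pull_secret_host}"
            )
    return violations
```

This is the failure class described under Purpose: images published to one registry while the pull secret still targets another would produce a violation here, before any rollout starts.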

### Endpoint Invariants

- Browser-facing URLs must match the selected environment's edge namespace.
- WebSocket URLs must use the profile terminal/notification hosts.
- Internal node URLs must reflect provider reachability, not browser URL
  convenience.
- Public readiness checks must run only after the relevant routes are expected
  to exist.

### Bootstrap Invariants

- Bootstrap preflight checks only that the target can accept a deploy:
  host access, Kubernetes readiness, registry bootstrap, and explicitly
  pre-existing namespaces.
- Bootstrap preflight must not require app namespaces or app public endpoints
  before first app overlay apply.
- Steady-state preflight may require app namespaces, core config, app secrets,
  current service health, and public readiness.
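
One way to keep the two modes apart is to derive the check list from the mode name, as in the sketch below. The check identifiers are hypothetical labels for the items listed above, and the steady-state extras are modeled as always required here even though the text says they *may* be required.

```python
from typing import List

# Hypothetical identifiers for the checks named in the bullets above.
BOOTSTRAP_CHECKS = [
    "host-access",
    "kubernetes-readiness",
    "registry-bootstrap",
    "pre-existing-namespaces",
]

# Checks that only make sense once the app overlay has been applied.
STEADY_STATE_EXTRA_CHECKS = [
    "app-namespaces",
    "core-config",
    "app-secrets",
    "service-health",
    "public-readiness",
]


def preflight_checks(mode: str) -> List[str]:
    """Return the checks for a preflight mode, rejecting unknown modes."""
    if mode == "bootstrap":
        return list(BOOTSTRAP_CHECKS)
    if mode == "steady-state":
        return BOOTSTRAP_CHECKS + STEADY_STATE_EXTRA_CHECKS
    raise ValueError(f"unknown preflight mode {mode!r}")
```

Because bootstrap mode never includes the app-level checks, a first deploy cannot fail on namespaces or public endpoints that the deploy itself is supposed to create.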

### Manifest Invariants

- Every runtime image in the release manifest must be pinned by digest.
- The release manifest source commit must match the promoted release candidate.
- Deploy must consume the manifest; it must not rebuild artifacts.
- Post-deploy validation must compare live Kubernetes image refs against the
  manifest and report drift.
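
Digest pinning and drift detection can be sketched as two small checks. The manifest is modeled here as a plain mapping from component name to image ref, which is an assumption; the real release manifest schema is not defined in this document.

```python
import re
from typing import Dict, List, Optional, Tuple

# An image ref counts as pinned when it ends in a sha256 digest.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_images(manifest_images: Dict[str, str]) -> List[str]:
    """Return the component names whose image ref is not pinned by digest."""
    return sorted(
        name for name, ref in manifest_images.items()
        if not DIGEST_RE.search(ref)
    )


def image_drift(
    manifest_images: Dict[str, str],
    live_images: Dict[str, str],
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each drifted component to (expected ref, live ref or None)."""
    drift = {}
    for name, expected in manifest_images.items():
        actual = live_images.get(name)
        if actual != expected:
            drift[name] = (expected, actual)
    return drift
```

A component missing from the live cluster is reported with `None` as its live ref, so absent workloads show up as drift rather than being skipped.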

## Pipeline Phases

### 1. Resolve Profile

Input: selected release profile.

Output: resolved profile artifact.

Gate: fail on invalid profile values, stale domains, registry mismatch, missing
required credentials, or missing cluster access fields.

### 2. Code CI

Input: commit.

Output: code correctness signal.

No shared environment mutation.

### 3. Build And Publish Artifacts

Input: commit and resolved profile when target registry publication is required.

Output: immutable artifact refs and runtime image digests.

Build registry and runtime/deploy registry are separate concepts. If they are
the same for a profile, the resolved profile should say so explicitly.

### 4. Assemble Release Manifest

Input: immutable refs.

Output: release manifest.

No environment mutation.

### 5. Environment Preflight

Input: resolved profile plus release manifest.

Output: readiness report.

Bootstrap and steady-state modes are different. The selected profile must name
which mode is active.

### 6. Deploy

Input: resolved profile plus release manifest.

Actions:

- apply overlay
- apply migrations by policy
- create/update pull secret from resolved profile
- validate pull secret host against manifest image hosts
- patch deployments to manifest digests
- wait for rollouts
- write deploy evidence

No compilation or image building.

### 7. Remote Validation

Input: resolved profile plus release manifest plus deploy evidence.

Validation must be split into rerunnable phases:

- health
- runtime config
- observability
- authz
- artifact lifecycle
- app runtime trace
- terminal/node path when selected
- cleanup
- release drift

### 8. Status And Version Evidence

The deployed environment should expose intended-vs-actual component state:

- expected release manifest
- live deployment image digest
- pod readiness
- public route status
- registry status
- provider adapter status
- validation evidence
- planned maintenance notices
- unplanned incident notices

This is the product/operator surface for release drift and SLA evidence.

## Environment/Profile Matrix

| Environment | Profile | Edge | Registry | Preflight | Validation |
|---|---|---|---|---|---|
| `local-kind` | `local-kind` | local/Cloudflare-compatible | local/kind registry | steady-state after bootstrap | local parity |
| `dev-control` | `dev-control-rke2` | Cloudflare dev hosts | `aicloud-dev-registry.core42.dev` | bootstrap on first deploy, steady-state after | smoke + drift |
| `demo` | `demo-rke2` | Cloudflare demo hosts | `aicloud-demo-registry.core42.dev` | steady-state | full demo workflow |
| `staging` | planned | production-like | staging registry | steady-state | full release |
| `prod-public-ingress` | planned | public ingress | production registry | steady-state + approval | canary + full |
| `prod-private-ingress` | planned | private ingress | production/private registry | steady-state + approval | canary + full |
| `airgapped-private-ca` | planned | site-local private CA | site-local registry | offline bootstrap | offline validation |

## Implementation Direction

Immediate work:

1. keep `scripts/ci/platform_control_resolve_release_profile_contract.sh` as
   the profile resolver/validator;
2. run it before package, preflight, deploy, and validation jobs;
3. make deploy consume only resolved registry/profile values;
4. add manifest-vs-pull-secret host validation before rollout;
5. update docs/runbooks to distinguish bootstrap and steady-state preflight.

Follow-up work:

1. move profile definitions from shell-only exports toward structured profile
   descriptors;
2. generate shell env from the structured descriptor;
3. add live drift read models for the status/version page;
4. retire legacy `dev-control` overlay paths that still encode `retired IP-derived DNS` once
   all active users are on `dev-control-rke2`.
