# Platform Proxy Provider-Neutral Edge Model v1

Status: draft
Date: 2026-05-15
Related:
- `doc/architecture/Platform_Proxy_OSS_Data_Plane_ADR_v1.md`
- `doc/architecture/Pomerium_Edge_Migration_Next_Steps_v1.md`
- `doc/operations/proxy/pomerium_spike_report.md`

## Purpose

Cloudflare Tunnel is an implementation detail for local and demo access. It is
not part of the GPUaaS platform proxy product contract.

The durable edge model is:

```text
DNS/TLS/edge provider -> Pomerium -> platform/app upstreams
```

GPUaaS route intent stays provider-neutral. Pomerium is the first route
renderer. Cloudflare, local DNS, public ingress, private ingress, and
air-gapped private PKI are deployment profiles around that renderer.

## Non-Negotiable Design Rules

1. Product/API/manifest contracts must not mention Cloudflare, Tunnel IDs,
   `core42.dev`, `aicloud-kind-*`, cert-manager issuer names, or DNS provider
   resources.
2. Proxy route intent uses GPUaaS-owned fields: owner, endpoint, host, auth
   pattern, target, policy, and lifecycle state.
3. Edge-provider configuration is environment automation, not app metadata.
4. Every edge profile must have smoke tests and operations evidence that can
   separate:
   - edge-provider failure,
   - Pomerium failure,
   - IdP/OIDC failure,
   - upstream service failure,
   - GPUaaS route-intent/reconcile failure.
5. Debuggability is part of done. A route is not production-ready until logs,
   metrics, traces, and runbook pivots exist for its failure modes.

## Supported Edge Profiles

| Profile | Primary Use | DNS | TLS / CA | Edge Provider | IdP Callback |
|---|---|---|---|---|---|
| `kind_cloudflare` | Local/demo public sharing | Cloudflare DNS / Tunnel hostnames | Cloudflare edge TLS; local proxy to kind | Cloudflare Tunnel + local Docker nginx proxies | Public `https://<authn-host>/oauth2/callback` |
| `kind_local_dns` | Offline workstation parity | `/etc/hosts`, local resolver, or `*.localhost` | mkcert or local dev CA | kind ingress NodePort / local reverse proxy | Local `https://<authn-host>/oauth2/callback` |
| `prod_public_ingress` | Normal internet-facing SaaS | Public DNS provider | Public ACME, usually cert-manager DNS-01 | LoadBalancer/Ingress/F5/HAProxy/NGINX before Pomerium | Public callback URL |
| `prod_private_ingress` | Private enterprise network | Corporate/private DNS | Enterprise CA or private ACME | Internal LoadBalancer/Ingress/F5/HAProxy/NGINX | Private callback URL reachable from users and IdP |
| `airgapped_private_ca` | Disconnected customer site | Private DNS only | Site private CA, offline-distributed trust bundle | Internal ingress or appliance only | Local IdP/OIDC callback, no public dependency |

Cloudflare is only the `kind_cloudflare` realization unless a customer
explicitly chooses it for production.

## Portable Host Namespace Contract

Every edge profile should expose the same logical host namespaces. The edge
implementation may differ, but app manifests, route intent, V3 UX, and smoke
tests must not.

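Canonical namespace shape, shown with the current base domain `aicloud.core42.dev` (per rule 1, the base domain is environment configuration, not part of the product contract):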

```text
Core product/API       <service>.<env>.aicloud.core42.dev
Platform tools         <tool>.platform.<env>.aicloud.core42.dev
Runtime app endpoints  <route>.apps.<env>.aicloud.core42.dev
```

Examples:

```text
app.kind.aicloud.core42.dev
api.kind.aicloud.core42.dev
auth.kind.aicloud.core42.dev
authn.kind.aicloud.core42.dev
term.kind.aicloud.core42.dev

grafana.platform.kind.aicloud.core42.dev
headlamp.platform.kind.aicloud.core42.dev

code.apps.kind.aicloud.core42.dev
openclaw.apps.kind.aicloud.core42.dev
web-1236f62f.apps.kind.aicloud.core42.dev
openai-1236f62f.apps.kind.aicloud.core42.dev
```

Some edge implementations cannot physically serve the canonical deep hostnames
without extra certificate products. In those cases, the environment may use a
profile-specific physical hostname shape, but it must still preserve the same
logical categories (`core`, `platform`, `apps`) in route intent, V3 UX, smoke
tests, and operator docs.

Runtime app endpoint hosts stay flat under `*.apps.<env>.aicloud.core42.dev`.
Do not use multi-label app hosts such as
`web.1236f62f.apps.<env>.aicloud.core42.dev` unless the environment explicitly
provisions deeper wildcard certificates. A single-label wildcard is the default
portable contract.
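
A minimal sketch of enforcing the flat shape when building a host from a route name, assuming route names arrive as bare labels such as `web-1236f62f`. The helper `app_endpoint_host` is hypothetical, not an existing GPUaaS script. The environment and base domain are passed in because they are environment configuration.

```bash
# Hypothetical helper: build a runtime app endpoint host from a route name.
# The route must be exactly one DNS label (lowercase letters, digits, hyphens;
# 1-63 chars; no leading or trailing hyphen). A dot would create a multi-label
# host that the single-label *.apps.<env>.<base> wildcard cannot cover.
app_endpoint_host() {
  local route="$1" env="$2" base="$3"
  local re='^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$'
  if [[ ! "$route" =~ $re ]]; then
    echo "invalid app route label: $route" >&2
    return 1
  fi
  printf '%s.apps.%s.%s\n' "$route" "$env" "$base"
}
```

For example, `app_endpoint_host web-1236f62f kind aicloud.core42.dev` yields `web-1236f62f.apps.kind.aicloud.core42.dev`, while `web.1236f62f` is rejected before any route intent is written.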

Each edge profile must prove TLS coverage for the canonical host namespace it
serves. For Cloudflare Tunnel, DNS wildcard records are not enough by
themselves: the Cloudflare edge certificate must cover
`*.platform.<env>.aicloud.core42.dev` and `*.apps.<env>.aicloud.core42.dev`.
If the account only has the zone-level `*.core42.dev` wildcard certificate, which
covers exactly one label below `core42.dev`, the TLS handshake for deeper hosts
fails at Cloudflare before the request reaches `cloudflared`.
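
The underlying rule is that browsers match a wildcard certificate name against exactly one left-most DNS label. A minimal sketch of that check, which an edge-profile preflight could run against each certificate SAN; `cert_name_covers` is a hypothetical helper and assumes lowercase hostnames.

```bash
# Hypothetical helper: does certificate name $1 (exact, or "*.suffix")
# cover host $2? "*" matches exactly one left-most label: never zero
# labels and never several.
cert_name_covers() {
  local cert="$1" host="$2"
  if [[ "$cert" != \*.* ]]; then
    [[ "$cert" == "$host" ]]   # exact SAN: literal comparison
    return
  fi
  local suffix="${cert#\*.}"
  [[ "$host" == *.* ]] || return 1
  local first="${host%%.*}" rest="${host#*.}"
  # Host must be "<one non-empty label>.<suffix>".
  [[ -n "$first" && "$rest" == "$suffix" ]]
}
```

With only `*.core42.dev`, `aicloud-kind-apps-code.core42.dev` is covered but `code.apps.kind.aicloud.core42.dev` is not; covering the canonical app namespace needs a `*.apps.kind.aicloud.core42.dev` name.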

For `kind_cloudflare` with only the current Universal SSL pack
(`core42.dev`, `*.core42.dev`), use classed single-label hostnames:

```text
Core product/API       aicloud-kind-<service>.core42.dev
Platform tools         aicloud-kind-platform-<tool>.core42.dev
Runtime app endpoints  aicloud-kind-apps-<route>.core42.dev
```

Examples:

```text
aicloud-kind-app.core42.dev
aicloud-kind-api.core42.dev
aicloud-kind-auth.core42.dev
aicloud-kind-authn.core42.dev
aicloud-kind-term.core42.dev

aicloud-kind-platform-grafana.core42.dev
aicloud-kind-platform-swagger.core42.dev

aicloud-kind-apps-code.core42.dev
aicloud-kind-apps-openclaw.core42.dev
aicloud-kind-apps-jupyter.core42.dev
aicloud-kind-apps-openai.core42.dev
```
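
Because only the physical shape changes, a renderer can keep the logical category in route intent and choose the hostname shape per environment. A minimal sketch, assuming the `aicloud-<env>-` prefix generalizes beyond `kind`; `physical_host`, `BASE_DOMAIN`, and `FLAT_ZONE` are hypothetical names, not existing settings.

```bash
# Hypothetical helper: render the physical hostname for a logical host.
#   shape:    canonical | single_label (Universal SSL fallback)
#   category: core | platform | apps
# BASE_DOMAIN and FLAT_ZONE are assumed environment settings.
physical_host() {
  local shape="$1" category="$2" name="$3" env="$4"
  local base="${BASE_DOMAIN:-aicloud.core42.dev}" zone="${FLAT_ZONE:-core42.dev}"
  case "$shape:$category" in
    canonical:core)        printf '%s.%s.%s\n' "$name" "$env" "$base" ;;
    canonical:platform)    printf '%s.platform.%s.%s\n' "$name" "$env" "$base" ;;
    canonical:apps)        printf '%s.apps.%s.%s\n' "$name" "$env" "$base" ;;
    single_label:core)     printf 'aicloud-%s-%s.%s\n' "$env" "$name" "$zone" ;;
    single_label:platform) printf 'aicloud-%s-platform-%s.%s\n' "$env" "$name" "$zone" ;;
    single_label:apps)     printf 'aicloud-%s-apps-%s.%s\n' "$env" "$name" "$zone" ;;
    *) echo "unknown shape/category: $shape/$category" >&2; return 1 ;;
  esac
}
```

One consequence of the single-label shape: a core service named `apps-<x>` or `platform-<x>` would collide with an app or tool host, so core service names should avoid those prefixes.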

This is a physical Cloudflare edge constraint, not the product contract. It
also means `kind_cloudflare` cannot fully satisfy "new app route without any
Cloudflare edit" until the Cloudflare account has edge certificates and DNS
records covering the canonical wildcard namespaces.

Path-prefix app routing under `app.<env>.aicloud.core42.dev/<app>` is not the
default. It is allowed only when the app manifest declares that the app supports
base-path hosting and the route renderer can prove rewrites, cookies,
websockets, redirects, and static assets work under that base path.

## Edge Implementation Compatibility Matrix

| Edge profile | Host namespace implementation | What changes when adding an app endpoint | Required proof |
|---|---|---|---|
| `kind_cloudflare` | Preferred: Cloudflare DNS wildcard/CNAME or Tunnel public hostname wildcard for `*.apps.<env>` and `*.platform.<env>` plus matching Cloudflare edge certificate coverage. Current Universal SSL fallback: classed single-label hosts under `*.core42.dev`, such as `aicloud-kind-apps-code.core42.dev`. | Preferred: GPUaaS creates route intent and Pomerium renders route with no per-app Cloudflare edit after wildcard bootstrap. Current fallback: new app hostnames must be listed by environment automation until deeper wildcard certs exist. | TLS handshake succeeds at Cloudflare; verifier proves host reaches Pomerium; unauthenticated app host returns Pomerium auth redirect/deny. |
| `kind_local_dns` | `/etc/hosts`, local resolver, or `*.localhost` plus local reverse proxy/port-forward. Wildcards can be simulated with Host headers in smoke tests. | GPUaaS creates route intent; smoke supplies Host header. No public DNS change. | Provider-neutral smoke passes with explicit host map and no Cloudflare dependency. |
| `prod_public_ingress` | Public DNS wildcard points to public LoadBalancer/Ingress/F5/HAProxy/NGINX before Pomerium; TLS from public ACME/cert-manager DNS-01 or enterprise-managed certs. | GPUaaS creates route intent; ingress/Pomerium route reconciles. No per-app DNS/cert ticket. | Public HTTPS smoke verifies wildcard TLS, auth redirect, route publish, and upstream reachability. |
| `prod_private_ingress` | Corporate/private DNS wildcard points to internal ingress before Pomerium; TLS from enterprise CA or private ACME. | GPUaaS creates route intent; no public internet DNS required. | Private-network smoke verifies trust chain from managed clients, IdP callback reachability, and route publish. |
| `airgapped_private_ca` | Site-local DNS wildcard points to internal ingress/appliance before Pomerium; TLS from site private CA and offline-distributed trust bundle. | GPUaaS creates route intent; no Cloudflare/public ACME/public DNS dependency. | Offline smoke verifies local DNS, private CA trust, local IdP callback, route publish, and local observability pivots. |

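Operational requirement: adding a new runtime app endpoint after environment
bootstrap must be a GPUaaS/Pomerium route lifecycle event. It must not require
manual DNS, Cloudflare, certificate, or ingress edits. The current
`kind_cloudflare` Universal SSL fallback is the one documented gap: it relies on
environment automation to list new app hostnames until deeper wildcard
certificates exist.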

## Profile Requirements

Each profile must declare:

- public/user-facing hostnames,
- Pomerium authenticate host,
- route host wildcard or explicit host list,
- upstream reachability from Pomerium pods,
- DNS owner and provisioning method,
- TLS issuer, trust bundle, and certificate renewal path,
- IdP issuer URL and callback URL,
- whether public internet egress is required,
- smoke-test host map,
- rollback path,
- log/trace/dashboard location.
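
One way to make this checklist enforceable is to keep each profile declaration as a `KEY=value` file and reject incomplete ones in CI. A minimal sketch; the file format and every key name below are hypothetical, chosen to map one-to-one onto the list above.

```bash
# Hypothetical required keys, one per item in the list above
# (DNS, TLS, and IdP items are split into their parts).
PROFILE_REQUIRED_KEYS=(
  USER_HOSTS POMERIUM_AUTHN_HOST ROUTE_HOSTS UPSTREAM_REACHABILITY
  DNS_OWNER DNS_PROVISIONING
  TLS_ISSUER TLS_TRUST_BUNDLE TLS_RENEWAL
  IDP_ISSUER_URL IDP_CALLBACK_URL
  PUBLIC_EGRESS_REQUIRED SMOKE_HOST_MAP ROLLBACK_PATH OBSERVABILITY_LOCATION
)

# check_profile_declaration FILE: print each required key that is missing
# or empty; return 0 only when the declaration is complete.
check_profile_declaration() {
  local file="$1" key missing=0
  for key in "${PROFILE_REQUIRED_KEYS[@]}"; do
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "missing: $key"
      missing=1
    fi
  done
  return "$missing"
}
```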

## Air-Gapped Constraints

The `airgapped_private_ca` profile must not depend on:

- Cloudflare APIs,
- public ACME,
- public DNS,
- public OIDC callbacks,
- public image/chart registries at runtime,
- public telemetry collectors.

It must provide:

- local IdP/OIDC,
- private DNS for app/tool/authn hosts,
- site CA trust distribution to browsers, Pomerium, Keycloak, and upstreams,
- offline image/chart/package import,
- local OTEL/Loki/Tempo/Grafana stack,
- documented renewal and trust-rotation procedure.
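
A cheap guard for the "must not depend on" list is to scan rendered manifests for public endpoints before they are imported to the site. A minimal sketch; the deny list is illustrative and incomplete, and `airgap_scan` is a hypothetical helper, not part of the current tooling.

```bash
# Illustrative deny list, not exhaustive: public endpoints that a rendered
# airgapped_private_ca configuration should not reference. Extend per site.
AIRGAP_DENY_PATTERNS=(
  'api\.cloudflare\.com'
  'acme-v02\.api\.letsencrypt\.org'
  'docker\.io'
  'ghcr\.io'
  'quay\.io'
  'registry\.k8s\.io'
)

# airgap_scan PATH...: print matching lines (with line numbers) for every
# reference to a denied endpoint under the given files or directories;
# return 1 if anything matched.
airgap_scan() {
  local pattern found=0
  for pattern in "${AIRGAP_DENY_PATTERNS[@]}"; do
    if grep -rEn -- "$pattern" "$@"; then
      found=1
    fi
  done
  return "$found"
}
```

Treat hits as review items rather than automatic failures: a private mirror path that embeds an upstream name, such as `mirror.site.internal/docker.io/library/nginx`, also matches.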

## Observability Contract

Every profile must produce the same operational pivots, even if the edge
provider differs:

- request ID generated or preserved at the first edge hop,
- W3C `traceparent` preserved where possible,
- edge access log with host, path, method, status, upstream status, duration,
  and request ID,
- Pomerium access/authorize/authenticate logs with route host, policy decision,
  subject, upstream, status, and request ID,
- IdP logs keyed by realm/client/session where available,
- upstream logs with request ID and trace context,
- route reconcile evidence in GPUaaS with route ID and target host,
- dashboards for latency, 4xx/5xx, denied decisions, authenticate redirects,
  upstream errors, route reconcile failures, and active WebSockets.
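
Smoke probes can seed these pivots themselves by sending a known request ID and a W3C `traceparent`, so one probe can be searched for across edge, Pomerium, IdP, and upstream logs. A minimal sketch; the helper names are hypothetical, and the layout follows the version-00 `traceparent` format (`00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>`).

```bash
# random_hex N: print 2*N lowercase hex characters read from /dev/urandom.
random_hex() {
  od -An -N"$1" -tx1 /dev/urandom | tr -d ' \n'
}

# new_traceparent: version 00, random 16-byte trace-id, random 8-byte
# parent-id, trace-flags 01 (sampled).
new_traceparent() {
  printf '00-%s-%s-01\n' "$(random_hex 16)" "$(random_hex 8)"
}

# is_valid_traceparent: check the version-00 layout and reject the all-zero
# trace-id and parent-id values, which the W3C spec treats as invalid.
is_valid_traceparent() {
  local re='^00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$' zeros='^0+$'
  [[ "$1" =~ $re ]] || return 1
  [[ ! "${1:3:32}" =~ $zeros && ! "${1:36:16}" =~ $zeros ]]
}
```

A probe might then send `-H "X-Request-Id: $(random_hex 16)" -H "traceparent: $(new_traceparent)"`. Whether each hop keeps a client-supplied `X-Request-Id` or replaces it depends on that hop's configuration, which is why the edge access log must record the ID it actually used.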

No operational runbook may require direct database inspection as the first
diagnostic step. A runbook may include direct database inspection only as an
explicit fallback while the owning read model is missing.

## Smoke Test Contract

Provider-neutral smokes take a host map, not hardcoded Cloudflare names:

```text
APP_HOST
API_HOST
AUTH_HOST
POMERIUM_AUTHN_HOST
GRAFANA_HOST
SWAGGER_HOST
```
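
A smoke run can fail fast on an incomplete or mismatched host map before it sends any traffic. A minimal sketch of such a preflight; `check_host_map` is hypothetical, and treating the `aicloud-kind-*` prefix as the marker of Cloudflare-only hostnames is an assumption based on the fallback shape above.

```bash
# Host-map variables the provider-neutral smoke expects.
SMOKE_HOST_VARS=(APP_HOST API_HOST AUTH_HOST POMERIUM_AUTHN_HOST GRAFANA_HOST SWAGGER_HOST)

# check_host_map: fail when a host is unset, and refuse Cloudflare fallback
# hostnames when EDGE_PROFILE is not kind_cloudflare.
check_host_map() {
  local profile="${EDGE_PROFILE:-}" var host rc=0
  for var in "${SMOKE_HOST_VARS[@]}"; do
    host="${!var:-}"   # indirect lookup of the variable named by $var
    if [[ -z "$host" ]]; then
      echo "unset: $var" >&2; rc=1
    elif [[ "$profile" != kind_cloudflare && "$host" == aicloud-kind-* ]]; then
      echo "cloudflare-only hostname in $profile: $var=$host" >&2; rc=1
    fi
  done
  return "$rc"
}
```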

The reusable smoke entrypoint is:

```bash
EDGE_PROFILE=kind_local_dns \
APP_HOST=app.kind.gpuaas.localhost \
API_HOST=api.kind.gpuaas.localhost \
AUTH_HOST=auth.kind.gpuaas.localhost \
POMERIUM_AUTHN_HOST=authn.kind.gpuaas.localhost \
SWAGGER_HOST=swagger.kind.gpuaas.localhost \
GRAFANA_HOST=grafana.kind.gpuaas.localhost \
scripts/ops/pomerium_edge_profile_smoke.sh
```

For `kind_local_dns`, the script port-forwards the Pomerium proxy service and
uses Host headers, so it does not require Cloudflare or public DNS. For
`kind_cloudflare`, it delegates to the Cloudflare-specific kind verifier. For
production and air-gapped profiles, it performs direct HTTPS checks against the
supplied host map.
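
The profile dispatch described above can be outlined as follows. This is a hypothetical sketch, not the contents of `scripts/ops/pomerium_edge_profile_smoke.sh`; the port-forward namespace and service name are placeholders. The local probe uses `curl --connect-to`, which keeps `https://<host>/` as the URL so that both TLS SNI and the Host header carry the route host while the TCP connection goes to the forwarded port.

```bash
# Map EDGE_PROFILE to a probe strategy.
smoke_mode_for_profile() {
  case "$1" in
    kind_local_dns)  echo port_forward ;;              # Host-routed probes, no public DNS
    kind_cloudflare) echo cloudflare_kind_verifier ;;  # delegate to the Cloudflare verifier
    prod_public_ingress|prod_private_ingress|airgapped_private_ca)
                     echo direct_https ;;              # real DNS and TLS for the host map
    *) echo "unknown EDGE_PROFILE: $1" >&2; return 1 ;;
  esac
}

# port_forward probe, assuming a forward such as
#   kubectl -n <namespace> port-forward svc/<pomerium-proxy-service> 8443:443
# is already running. Prints the HTTP status code for https://<host>/.
local_probe() {
  local host="$1" port="${2:-8443}"
  curl -sS -o /dev/null -w '%{http_code}\n' \
    --connect-to "${host}:443:127.0.0.1:${port}" "https://${host}/"
}
```

If curl does not already trust the local dev CA, pass it with `--cacert` rather than disabling verification, so the probe still exercises the profile's trust chain.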

The standardized evidence format is defined in
`doc/operations/Edge_Route_Smoke_Evidence_v1.md`. Smoke output must include the
edge profile, route, hostname, last HTTP status, Pomerium/upstream request ID
when present, Cloudflare Ray/request ID when present, and an operator-safe next
action.
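
Most of these fields can be recovered from a `curl -sS -D <file>` header dump. A minimal sketch, with hypothetical helper names; the `key=value` line is illustrative only and is not the format defined in `Edge_Route_Smoke_Evidence_v1.md`. `CF-Ray` is Cloudflare's request identifier header. Treating `x-request-id` as the Pomerium/upstream request ID header is an assumption.

```bash
# last_status FILE: status code of the final response in a curl header dump.
last_status() {
  awk '{ sub(/\r$/, "") } /^HTTP\// { s = $2 } END { print s }' "$1"
}

# header_value NAME FILE: value of the last NAME header, case-insensitive.
header_value() {
  local name
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  awk -v name="$name" '
    { sub(/\r$/, "") }                                  # dumps use CRLF
    (i = index($0, ":")) > 0 && tolower(substr($0, 1, i - 1)) == name {
      v = substr($0, i + 1); sub(/^[ \t]+/, "", v)
    }
    END { print v }' "$2"
}

# smoke_evidence PROFILE ROUTE HOST FILE NEXT_ACTION: one key=value line;
# "-" marks a field that was not present in the dump.
smoke_evidence() {
  local status rid ray
  status=$(last_status "$4")
  rid=$(header_value x-request-id "$4")
  ray=$(header_value cf-ray "$4")
  printf 'profile=%s route=%s host=%s status=%s request_id=%s cf_ray=%s next_action="%s"\n' \
    "$1" "$2" "$3" "${status:--}" "${rid:--}" "${ray:--}" "$5"
}
```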

For authenticated Pomerium routes, a complete smoke must verify:

1. route host returns unauthenticated redirect/deny, not upstream content,
2. redirect points at the configured authenticate host,
3. authenticate redirects to the configured IdP issuer host,
4. IdP login page or expected IdP response is reachable,
5. invalid/non-admin user is denied where policy requires it,
6. admin user can reach the upstream where automated credentials are available,
7. failure output prints the edge host, last HTTP status, request ID or
   Cloudflare Ray ID if present, and log lookup commands.
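
Steps 1 and 2 reduce to a decision on the status code and `Location` header of one unauthenticated request. A minimal sketch; `check_unauth_response` is hypothetical and treats 401/403 as an acceptable deny, matching "redirect/deny" in step 1.

```bash
# check_unauth_response STATUS LOCATION AUTHN_HOST
# Pass when a protected route host redirects to the configured authenticate
# host or denies outright; fail when it serves content or redirects elsewhere.
check_unauth_response() {
  local status="$1" location="$2" authn_host="$3" loc_host
  case "$status" in
    301|302|303|307|308)
      loc_host="${location#*://}"       # drop scheme
      loc_host="${loc_host%%[/?#]*}"    # drop path, query, fragment
      loc_host="${loc_host%%:*}"        # drop port
      if [[ "$loc_host" == "$authn_host" ]]; then
        echo "pass: redirect to authenticate host"
      else
        echo "fail: redirect to unexpected host '${loc_host}'"; return 1
      fi ;;
    401|403)
      echo "pass: denied without upstream content" ;;
    2??)
      echo "fail: upstream content served to unauthenticated client"; return 1 ;;
    *)
      echo "fail: unexpected status $status"; return 1 ;;
  esac
}
```

A `2xx` is always a failure here because it means the route served upstream content before authentication, which is the most important regression this smoke must catch.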

## Implementation Guidance

- Keep Cloudflare-specific scripts in `scripts/ops/*cloudflare*`.
- Keep provider-neutral smoke logic in reusable scripts that receive host/env
  inputs.
- Do not add route-renderer branches for Cloudflare. Render GPUaaS route intent
  into Pomerium routes, then let the selected edge profile point DNS/TLS at
  Pomerium.
- Add a runbook section whenever a new edge profile is introduced.
- Update dashboards before declaring a profile production-ready.
