# Managed Ingress Tenant Isolation and Scaling v1

Status: draft
Date: 2026-05-15
Related:
- `doc/architecture/Platform_Proxy_Target_Architecture_v1.md`
- `doc/architecture/Platform_Proxy_OpenAI_M2M_Auth_Model_v1.md`
- `doc/architecture/Platform_Proxy_OSS_Data_Plane_ADR_v1.md`
- `doc/architecture/Platform_Proxy_Provider_Neutral_Edge_Model_v1.md`

## Purpose

Managed ingress cannot rely on "Pomerium authenticated the caller" as the
tenant isolation model. GPUaaS must remain the authority for tenant, project,
app, allocation, route, billing, and audit state.

This document defines the isolation and scaling constraints that must be true
before implementing `api_bearer` route authz or retiring legacy app proxy paths.

## Isolation Boundary

Every managed-ingress route is owned by GPUaaS route intent:

```text
org_id
project_id
app_instance_id
endpoint_name
route_id
proxy_pool_id
client_auth_mode
route_family
target binding
```

Pomerium is a renderer and edge runtime. It must not be the source of truth for
tenant/project ownership, app lifecycle, allocation lifecycle, billing, or
business audit.
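
As a minimal sketch of the boundary above, route intent can be modeled as an immutable record from which the edge route is derived. The `RouteIntent` class, the `render_edge_route` helper, and the output keys are hypothetical illustrations. They are not Pomerium's configuration schema or the actual GPUaaS data model, and `target_binding` is assumed to be a plain upstream URL.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RouteIntent:
    """Hypothetical GPUaaS-owned route intent record (the source of truth)."""
    org_id: str
    project_id: str
    app_instance_id: str
    endpoint_name: str
    route_id: str
    proxy_pool_id: str
    client_auth_mode: str  # e.g. "browser_oidc" or "api_bearer"
    route_family: str      # e.g. "api_app"
    target_binding: str    # assumed to be an upstream URL in this sketch


def render_edge_route(intent: RouteIntent, host: str) -> Dict[str, object]:
    """Derive an edge route from intent; the edge never invents ownership."""
    return {
        "host": host,
        "upstream": intent.target_binding,
        "pool": intent.proxy_pool_id,
        # Ownership labels are copied from intent, never edited at the edge.
        "labels": {
            "org_id": intent.org_id,
            "project_id": intent.project_id,
            "app_instance_id": intent.app_instance_id,
            "route_id": intent.route_id,
            "route_family": intent.route_family,
            "client_auth_mode": intent.client_auth_mode,
        },
    }
```

Because the record is frozen, a change to ownership or target has to go through a new intent version rather than an in-place edit of rendered state.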

## Per-Request Authorization

For `api_bearer` routes, every request must validate:

1. bearer token signature, expiry, issuer, and revocation state,
2. actor type (`user` or `service_account`),
3. actor tenant/project scope,
4. route tenant/project scope,
5. app instance and allocation lifecycle,
6. route status and route version,
7. endpoint `client_auth_mode = api_bearer`,
8. rate-limit and quota policy for the actor/project/route.
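
The checks above can be sketched as one ordered decision function that returns a reason code on the first failure. This is an illustrative sketch only: the `TokenClaims`, `RouteState`, and `Decision` types, the field names, and the reason-code strings are assumptions, and token verification and rate limiting are reduced to precomputed booleans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenClaims:
    signature_valid: bool
    issuer_trusted: bool
    expired: bool
    revoked: bool
    actor_type: str
    org_id: str
    project_id: str


@dataclass(frozen=True)
class RouteState:
    org_id: str
    project_id: str
    app_running: bool
    allocation_active: bool
    route_active: bool
    route_version: int
    client_auth_mode: str


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str  # hypothetical reason code for logs and audit


def authorize_api_bearer(token: TokenClaims, route: RouteState,
                         rendered_route_version: int,
                         within_limits: bool) -> Decision:
    # 1. Token signature, expiry, issuer, and revocation state.
    if (not token.signature_valid or not token.issuer_trusted
            or token.expired or token.revoked):
        return Decision(False, "token_invalid")
    # 2. Actor type.
    if token.actor_type not in ("user", "service_account"):
        return Decision(False, "actor_type_unsupported")
    # 3 and 4. Actor scope must match route scope.
    if (token.org_id, token.project_id) != (route.org_id, route.project_id):
        return Decision(False, "scope_mismatch")
    # 5. App instance and allocation lifecycle.
    if not (route.app_running and route.allocation_active):
        return Decision(False, "lifecycle_inactive")
    # 6. Route status and version (a stale rendered route is denied).
    if not route.route_active or route.route_version != rendered_route_version:
        return Decision(False, "route_stale")
    # 7. Endpoint auth mode.
    if route.client_auth_mode != "api_bearer":
        return Decision(False, "auth_mode_mismatch")
    # 8. Rate-limit and quota policy.
    if not within_limits:
        return Decision(False, "rate_limited")
    return Decision(True, "ok")
```

Returning a distinct reason per check keeps denials attributable to token validation, scope, lifecycle, or policy without inspecting the database.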

For `browser_oidc` routes, Pomerium may own the browser login redirect, but
GPUaaS still needs an authorization check for route ownership and stale route
denial before sensitive route families become production routes.

## Header Rules

The edge path must strip caller-controlled identity material before forwarding
the request upstream:

- `Authorization`
- `Cookie` when not needed by the upstream app
- `X-GPUaaS-*`
- `X-Pomerium-*` unless produced by the trusted Pomerium hop
- `X-Forwarded-*` values that are not from the trusted edge chain

After authz, the trusted hop may inject:

- `X-GPUaaS-Org-ID`
- `X-GPUaaS-Project-ID`
- `X-GPUaaS-Actor-Type`
- `X-GPUaaS-Actor-ID`
- `X-GPUaaS-App-Instance-ID`
- `X-GPUaaS-Route-ID`
- `X-GPUaaS-Proxy-Pool-ID`
- `X-Request-ID`
- W3C `traceparent`

App-local upstream credentials, such as Jupyter tokens, come only from
workload-access material and must be route/app scoped.
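
A minimal sketch of the strip-then-inject order described above. The function names are hypothetical. As a simplifying assumption, this sketch drops every inbound `X-Pomerium-*` and `X-Forwarded-*` header and expects the trusted hop to re-add its own values, rather than trying to tell trusted values from untrusted ones.

```python
from typing import Dict, Mapping, Optional

# Header name prefixes that a caller must never be able to set.
STRIPPED_PREFIXES = ("x-gpuaas-", "x-pomerium-", "x-forwarded-")


def strip_caller_identity(headers: Mapping[str, str],
                          upstream_needs_cookie: bool = False) -> Dict[str, str]:
    """Remove caller-controlled identity headers (names are case-insensitive)."""
    cleaned: Dict[str, str] = {}
    for name, value in headers.items():
        lowered = name.lower()
        if lowered == "authorization":
            continue
        if lowered == "cookie" and not upstream_needs_cookie:
            continue
        if lowered.startswith(STRIPPED_PREFIXES):
            continue
        cleaned[name] = value
    return cleaned


def inject_trusted_identity(headers: Mapping[str, str], *, org_id: str,
                            project_id: str, actor_type: str, actor_id: str,
                            app_instance_id: str, route_id: str,
                            proxy_pool_id: str, request_id: str,
                            traceparent: Optional[str] = None) -> Dict[str, str]:
    """Add identity headers after authz; call only on already-stripped input."""
    out = dict(headers)
    out.update({
        "X-GPUaaS-Org-ID": org_id,
        "X-GPUaaS-Project-ID": project_id,
        "X-GPUaaS-Actor-Type": actor_type,
        "X-GPUaaS-Actor-ID": actor_id,
        "X-GPUaaS-App-Instance-ID": app_instance_id,
        "X-GPUaaS-Route-ID": route_id,
        "X-GPUaaS-Proxy-Pool-ID": proxy_pool_id,
        "X-Request-ID": request_id,
    })
    if traceparent is not None:
        out["traceparent"] = traceparent
    return out
```

Stripping before injecting means a spoofed `X-GPUaaS-Org-ID` from the caller can never survive alongside, or instead of, the trusted value.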

## Network Isolation

Public ingress must terminate at the selected edge profile and Pomerium pool.
Tenant workloads must not expose public node ports around Pomerium.

Allowed upstream shapes:

1. in-cluster Service,
2. private node-agent/API relay,
3. per-node internal HTTPS endpoint with mTLS, after PKI and revocation are
   defined.

Current bare-metal and VM app routes use private LAN targets for kind/demo. That
is acceptable only as a demo/dev edge profile. Production must add network
policy/firewall controls so the public internet cannot reach upstream workload
ports directly.

Network isolation remains a separate production work package. Before the first
dedicated-pool customer, define the implementation for:

- Kubernetes `NetworkPolicy` between Pomerium pools and in-cluster workloads,
- host firewall rules for bare-metal or VM targets,
- tenant/project network boundaries for dedicated pools,
- egress policy for workload -> external service traffic,
- and validation commands operators can run without direct DB inspection.

## Proxy Pool Scopes

`proxy_pool_id` must be treated as a real isolation and scaling key.

Supported target scopes:

| Scope | Use | Isolation | Scaling notes |
|---|---|---|---|
| `shared` | Default SaaS/demo | Multiple tenants/projects share one pool | Requires per-request authz, quotas, noisy-neighbor controls |
| `tenant` | Larger tenant or stricter isolation | One tenant per pool | Tenant-level HPA, logs, certs, and policy |
| `project` | Regulated/high-value project | One project per pool | Stronger blast-radius reduction, higher cost |
| `app_instance` | Very high traffic or high-risk app | One app route group per pool | Highest cost, simplest traffic attribution |

MVP may use one shared pool, but route records and logs must carry
`proxy_pool_id` from day one.

Dedicated pools are not just a flag. A non-shared pool implies extra Pomerium
proxy/authorize/authenticate capacity, DNS and certificate scope, dashboards,
alerts, reconciler placement logic, and pool-migration tooling. Product and
pricing must treat `tenant`, `project`, and `app_instance` pools as paid
isolation/capacity features, not as a free enterprise toggle.
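
To illustrate `proxy_pool_id` as an isolation key, the following sketch checks whether a route may be placed in a pool of a given scope. The `ProxyPool` type and `may_place` function are hypothetical, and the sketch assumes that "tenant" maps to `org_id`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProxyPool:
    proxy_pool_id: str
    scope: str  # "shared", "tenant", "project", or "app_instance"
    org_id: Optional[str] = None
    project_id: Optional[str] = None
    app_instance_id: Optional[str] = None


def may_place(pool: ProxyPool, org_id: str, project_id: str,
              app_instance_id: str) -> bool:
    """Return True if a route with this ownership may live in the pool."""
    if pool.scope == "shared":
        # Shared pools accept any tenant; isolation relies on per-request authz.
        return True
    if pool.scope == "tenant":
        return pool.org_id == org_id
    if pool.scope == "project":
        return (pool.org_id, pool.project_id) == (org_id, project_id)
    if pool.scope == "app_instance":
        return (pool.org_id, pool.project_id, pool.app_instance_id) == (
            org_id, project_id, app_instance_id)
    raise ValueError(f"unknown pool scope: {pool.scope!r}")
```

A reconciler could run a check of this shape before rendering a route into a dedicated pool, so that a misassigned `proxy_pool_id` fails closed.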

## Route Families

`route_family` is a first-class closed enum on route intent, not a derived
label:

| Family | Use | Typical auth mode |
|---|---|---|
| `platform_admin` | GPUaaS-owned platform tools such as Grafana, Temporal, Swagger, Redoc, Netdata | `browser_oidc` |
| `browser_app` | App-owned interactive browser endpoints such as JupyterLab | `browser_oidc` |
| `api_app` | App-owned API endpoints such as OpenAI-compatible vLLM | `api_bearer` |
| `terminal_ws` | Terminal/session WebSocket routes | token-bound WebSocket/session auth |

Rate-limit policy, timeout policy, dashboards, drain behavior, and pool
placement should key on `route_family`. Implementations must not infer family
from host naming, app slug, or endpoint name once route intent carries it.
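
A closed enum can be sketched as follows. Parsing rejects unknown values instead of guessing a family from other fields. The `RouteFamily` class and `parse_route_family` helper are hypothetical names; the auth-mode strings are copied from the table above.

```python
from enum import Enum


class RouteFamily(str, Enum):
    PLATFORM_ADMIN = "platform_admin"
    BROWSER_APP = "browser_app"
    API_APP = "api_app"
    TERMINAL_WS = "terminal_ws"


# Typical auth mode per family, as listed in the table above.
TYPICAL_AUTH_MODE = {
    RouteFamily.PLATFORM_ADMIN: "browser_oidc",
    RouteFamily.BROWSER_APP: "browser_oidc",
    RouteFamily.API_APP: "api_bearer",
    RouteFamily.TERMINAL_WS: "token-bound WebSocket/session auth",
}


def parse_route_family(raw: str) -> RouteFamily:
    """Parse the stored value; never fall back to host or slug heuristics."""
    try:
        return RouteFamily(raw)
    except ValueError:
        raise ValueError(f"unknown route_family: {raw!r}") from None
```

Keying policy tables on the enum member means adding a new family forces an explicit decision for each policy rather than an implicit default.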

## Scaling Requirements

Shared pools must support:

- horizontal Pomerium replicas,
- autoscaling on CPU, RPS, active connections, and WebSocket count where
  metrics are available,
- per-route and per-project connection/request limits,
- idle and hard session TTLs by route family,
- separate dashboards for platform tools, browser apps, API apps, and terminal
  routes,
- route reconcile rate limits so one tenant cannot starve route updates for
  others.

`api_bearer` adds a request-rate pressure point. The first implementation may
serve route authz from `cmd/api`, but high-RPS API routes require one of:

- short-TTL edge/authz response caching keyed by token id, route id, and route
  version,
- token validation cache with explicit revocation propagation,
- or a horizontally scaled dedicated `cmd/proxy-authz` service extracted from
  `cmd/api`.
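
The first option, a short-TTL decision cache keyed by token id, route id, and route version, might look like the sketch below. The `AuthzDecisionCache` class is hypothetical and in-process only; a real deployment would also need revocation propagation across replicas, which this sketch reduces to a local `revoke_token` call.

```python
import time
from typing import Callable, Dict, Optional, Tuple

CacheKey = Tuple[str, str, int]  # (token_id, route_id, route_version)


class AuthzDecisionCache:
    """In-memory short-TTL cache of allow/deny decisions (illustrative only)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[CacheKey, Tuple[float, bool]] = {}

    def put(self, token_id: str, route_id: str, route_version: int,
            allowed: bool) -> None:
        expires_at = self._clock() + self._ttl
        self._entries[(token_id, route_id, route_version)] = (expires_at, allowed)

    def get(self, token_id: str, route_id: str,
            route_version: int) -> Optional[bool]:
        """Return the cached decision, or None on a miss or expired entry."""
        key = (token_id, route_id, route_version)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, allowed = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return None
        return allowed

    def revoke_token(self, token_id: str) -> None:
        """Drop every cached decision for a revoked token."""
        for key in [k for k in self._entries if k[0] == token_id]:
            del self._entries[key]
```

Including the route version in the key means a route update misses the cache automatically, so stale-route denial does not wait for TTL expiry.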

Default target: p99 authz decision latency below 30 ms under expected shared
pool load. Extraction to `cmd/proxy-authz` is mandatory if proxy authz traffic
materially competes with control-plane API latency or database connection pool
headroom.

The concrete caching, revocation, extraction, latency, and load-smoke policy is
defined in `Platform_Proxy_Authz_Caching_and_Extraction_v1.md`.

Scaling signals must be tagged with:

- `org_id`
- `project_id`
- `route_id`
- `app_instance_id`
- `proxy_pool_id`
- `client_auth_mode`
- `route_family` (`platform_admin`, `browser_app`, `api_app`, `terminal_ws`)

High-cardinality labels must be controlled in Prometheus; full-cardinality
route and tenant details can live in logs/traces/read models.

## Noisy Neighbor Controls

Before shared production pools host multiple tenants, GPUaaS must define:

- per-project request rate limits,
- per-project concurrent connection limits,
- per-route upstream timeout budgets,
- max body size for API routes,
- WebSocket/session hard TTLs,
- backpressure behavior (`429` vs `503`),
- operator override and emergency deny controls.
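
As one possible shape for these controls, the sketch below combines a per-project token bucket with a pool-level connection ceiling. The mapping of `429` to a project exceeding its own limit and `503` to shared-pool saturation is an assumption for illustration, not a decided policy. `TokenBucket` and `admit` are hypothetical names.

```python
import time
from typing import Callable


class TokenBucket:
    """Per-project request rate limiter (illustrative, not thread-safe)."""

    def __init__(self, rate_per_sec: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate_per_sec
        self._capacity = float(burst)
        self._tokens = float(burst)
        self._clock = clock
        self._last = clock()

    def try_acquire(self) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        refill = (now - self._last) * self._rate
        self._tokens = min(self._capacity, self._tokens + refill)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


def admit(project_bucket: TokenBucket, pool_active_connections: int,
          pool_max_connections: int) -> int:
    """Return the HTTP status the edge would answer with before forwarding."""
    if pool_active_connections >= pool_max_connections:
        return 503  # assumed: pool-wide saturation, not the caller's fault
    if not project_bucket.try_acquire():
        return 429  # assumed: this project exceeded its own rate limit
    return 200
```

Checking pool saturation first avoids charging a project's bucket for a request the pool could not have served anyway.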

## Observability And Audit

Operators must be able to answer:

1. which tenant/project owns this host,
2. which route/app/allocation was targeted,
3. which proxy pool handled it,
4. whether denial came from token validation, project authz, route lifecycle,
   pool policy, Pomerium, or upstream,
5. whether the request contributed to billable managed-ingress usage.

Direct DB inspection is not an acceptable primary diagnostic path. If an
operator needs route ownership or pool state repeatedly, add a read model/API.

High-rate API routes must not write one immutable audit row per successful
request by default. Baseline policy:

- deny decisions are audit-grade and are never sampled: all authz denials,
  stale-route denials, route-policy denials, and upstream policy denials must
  write audit evidence regardless of route family or success sample rate,
- admin/platform tool opens are audit-grade,
- successful `api_app` requests are access-log/trace/metric events by default,
- successful `api_app` audit sampling is enabled by default at 1/1000
  successful requests, decided after route authz and before upstream
  forwarding,
- per-route success audit sampling can be disabled or overridden only by
  explicit route policy/config rendered from GPUaaS route intent, never by a
  hand-edited Pomerium route.

### Successful api_app Audit Sampling

The default sample rate for successful high-volume `api_app` managed-ingress
requests is `1/1000`. The sample applies only to successful request decisions
for app-owned API routes, such as OpenAI-compatible vLLM endpoints, where a
per-request immutable audit row would create excessive write volume. It does
not apply to:

- denial decisions,
- privileged/admin route access,
- billing or metering records,
- app-runtime lifecycle events,
- terminal/session events.

Per-route override mechanics:

- Route intent may carry an audit sampling policy for success decisions with
  three allowed modes: `inherit_default`, `disabled`, or `explicit_rate`.
- `inherit_default` uses the platform default `1/1000` successful `api_app`
  sample rate.
- `disabled` is allowed only for routes whose product/compliance posture accepts
  access logs, traces, and metrics as the success-path observability record.
- `explicit_rate` must be represented as a rational rate such as `1/100`,
  `1/1000`, or `1/10000`; implementations should store numerator and
  denominator rather than floating point values.
- Overrides are route-scoped and versioned with route intent. Changing the
  sample rate is a control-plane policy change and should be visible in route
  configuration history or audit/control-plane evidence.
- Pomerium configuration is a rendered artifact. Operators must not change
  success audit sampling by editing Pomerium policy directly.
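
The three modes can be sketched as a small policy record resolved to an exact rational rate. The `SuccessAuditSampling` type and `effective_rate` function are hypothetical; Python's `fractions.Fraction` stands in for stored numerator/denominator pairs.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional

DEFAULT_SUCCESS_RATE = Fraction(1, 1000)  # platform default for api_app


@dataclass(frozen=True)
class SuccessAuditSampling:
    mode: str  # "inherit_default", "disabled", or "explicit_rate"
    numerator: Optional[int] = None
    denominator: Optional[int] = None


def effective_rate(policy: SuccessAuditSampling) -> Fraction:
    """Resolve a route's success-audit sample rate as an exact fraction."""
    if policy.mode == "inherit_default":
        return DEFAULT_SUCCESS_RATE
    if policy.mode == "disabled":
        return Fraction(0)
    if policy.mode == "explicit_rate":
        if policy.numerator is None or policy.denominator is None:
            raise ValueError("explicit_rate requires numerator and denominator")
        if policy.denominator <= 0:
            raise ValueError("denominator must be positive")
        rate = Fraction(policy.numerator, policy.denominator)
        if not 0 < rate <= 1:
            raise ValueError("explicit rate must be in (0, 1]")
        return rate
    raise ValueError(f"unknown sampling mode: {policy.mode!r}")
```

Storing integers avoids the case where `0.001` and `1/1000` compare unequal after a round trip through floating point.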

Sampling decisions must be deterministic so that diagnostics can be replayed:
the same request must produce the same sampling outcome on any replica.
The preferred decision key is a stable hash over `route_id`, route version,
request id or trace id, and a sampling salt owned by GPUaaS. Implementations
must avoid sampling based on local process counters because that produces
replica-dependent evidence gaps during scale-out or restart.
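
A minimal sketch of such a hash-based decision, assuming SHA-256 as the stable hash and a unit-separator byte between fields. Both choices, and the `should_sample` name, are illustrative rather than specified.

```python
import hashlib


def should_sample(route_id: str, route_version: int, request_id: str,
                  salt: str, numerator: int, denominator: int) -> bool:
    """Deterministically decide whether a successful request is audit-sampled.

    Any replica given the same inputs reaches the same answer, so the
    decision can be replayed later from logged identifiers.
    """
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    if numerator <= 0:
        return False
    # Join with a separator that is assumed not to occur in the identifiers,
    # so ("ab", "c") and ("a", "bc") hash differently.
    material = "\x1f".join(
        [route_id, str(route_version), request_id, salt]).encode("utf-8")
    digest = hashlib.sha256(material).digest()
    # Map the first 8 bytes to a bucket in [0, denominator).
    bucket = int.from_bytes(digest[:8], "big") % denominator
    return bucket < numerator
```

Because the route version is part of the key, a policy change that bumps the version also reshuffles which requests are sampled, which avoids systematically sampling the same request ids across versions.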

Retention expectations:

- Unsampled successful `api_app` requests remain available through normal edge
  access logs, traces, metrics, and any route read models built from them.
- Sampled successful audit rows follow the standard immutable audit retention
  class for platform audit evidence.
- Deny audit rows follow the same audit retention class as other security
  decisions and must not be downgraded to short-lived access-log retention.
- Raw high-volume access logs may have a shorter operational retention window
  than immutable audit evidence, but the exact retention period is an
  environment/product policy decision and must be documented in the deployment
  retention profile before production launch.

Billing and metering evidence is separate from sampled audit. Managed-ingress
usage that contributes to billable app runtime or route usage must be recorded
through the metering pipeline with the required aggregation keys and retention
for billing reconciliation. A missing sampled success audit row is expected for
most successful `api_app` requests and must never be interpreted as proof that
the request was not billable or did not occur.

Managed-ingress metering dimensions:

- `building_block=managed_ingress`
- `usage_source=app_runtime`
- `org_id`, `project_id`, `app_instance_id`
- `route_id`, `endpoint_name`, `route_family`, `client_auth_mode`
- `proxy_pool_id`
- request count, response bytes, and connection duration when available

Ledger debits remain separate immutable billing entries. The request/byte/
duration evidence may be aggregated before billing reconciliation; it must not
be inferred from sampled success-audit rows.
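
To illustrate aggregation by the dimensions above, the sketch below groups per-request usage events into per-key totals. The `IngressUsageEvent` type, the `aggregate` function, and the output field names are hypothetical; the metering pipeline's actual record format is not defined here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

DimensionKey = Tuple[str, ...]


@dataclass(frozen=True)
class IngressUsageEvent:
    org_id: str
    project_id: str
    app_instance_id: str
    route_id: str
    endpoint_name: str
    route_family: str
    client_auth_mode: str
    proxy_pool_id: str
    response_bytes: int
    connection_seconds: float


def dimension_key(event: IngressUsageEvent) -> DimensionKey:
    """Build the aggregation key from the metering dimensions."""
    return ("managed_ingress", "app_runtime", event.org_id, event.project_id,
            event.app_instance_id, event.route_id, event.endpoint_name,
            event.route_family, event.client_auth_mode, event.proxy_pool_id)


def aggregate(events: Iterable[IngressUsageEvent]
              ) -> Dict[DimensionKey, Dict[str, float]]:
    """Sum every event (not only audit-sampled ones) per dimension key."""
    totals: Dict[DimensionKey, Dict[str, float]] = defaultdict(
        lambda: {"request_count": 0, "response_bytes": 0,
                 "connection_seconds": 0.0})
    for event in events:
        bucket = totals[dimension_key(event)]
        bucket["request_count"] += 1
        bucket["response_bytes"] += event.response_bytes
        bucket["connection_seconds"] += event.connection_seconds
    return dict(totals)
```

The input is the full event stream, which is why the totals stay correct regardless of the success-audit sample rate.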

## Implementation Gate For api_bearer

`A-PLATFORM-PROXY-API-BEARER-ROUTE-AUTHZ-001` must not be considered complete
unless it enforces the isolation rules above for shared-pool routes:

- host -> route lookup is GPUaaS-owned,
- token project scope must match route project scope,
- stale/inactive app routes are denied,
- caller identity headers are stripped,
- trusted GPUaaS identity headers are injected only after authz,
- denial logs include tenant/project/route/pool reason codes.

Current backend status:

- `/api/v1/platform-proxy/route-authz` provides the GPUaaS-owned decision point
  for `api_bearer` routes.
- `/api/v1/platform-proxy/route-forward/...` provides the current GPUaaS-owned
  forwarding boundary for `api_bearer` routes. Pomerium performs host routing
  and deliberately avoids browser OIDC redirects for this route family; GPUaaS
  performs bearer auth, route/project checks, caller-header stripping, trusted
  header injection, and upstream forwarding.
- The inventory decision denies inactive routes, non-running app instances,
  non-`api_bearer` endpoints, non-service-account actors, and org/project scope
  mismatches.
- Live Cloudflare/kind positive and negative smokes passed on 2026-05-15 for
  the vLLM OpenAI-compatible route: no-token `401`, invalid-token `401`,
  wrong-project bearer `403`, valid project service-account `GET /v1/models`
  `200`, and valid project service-account `POST /v1/chat/completions` `200`.