# Platform Proxy Target Architecture v1

Status: historical migration record
Last updated: 2026-05-18
Audience: platform engineering, app platform engineering, security, product

> 2026-05-18 status: this document is superseded by host-based Pomerium
> managed ingress for active implementation work. It remains useful as the
> historical migration record for why path-prefix proxying failed. Treat
> browser-facing `/p/*`, `/backend/p/*`, `/w/*`, `/backend/w/*`, and
> `/proxy-launch` examples below as retired public compatibility shapes unless a
> section explicitly says it is describing an upstream/runtime prefix-rewrite
> contract behind a host-based route.

## Purpose

Define the long-term platform proxy model for exposing workload and app UIs
through GPUaaS.

This document exists because the first JupyterLab platform-proxy slice exposed
the wrong abstraction. The problem is not "make Jupyter work"; it is to build a
managed app data-plane gateway that can safely expose many classes of
tenant-owned, project-owned, and app-owned HTTP applications.

## Decision Summary

GPUaaS should treat platform proxy as a managed platform runtime with proxy
pools and app routes.

The target model is:

```text
browser
  -> app data-plane hostname / LB / ingress / funnel
  -> platform-proxy pool
  -> app proxy route resolver
  -> workload/app endpoint
```

The control-plane API remains responsible for app lifecycle, entitlement,
policy, route intent, and read models. The platform proxy owns dynamic browser
traffic: HTTP, WebSocket, SSE, redirects, cookies, connection limits, and
upstream health.

The MVP should implement one shared proxy pool, but the data model must allow
future tenant-dedicated, project-dedicated, and app-dedicated pools without a
rewrite.

## Problem Statement

Interactive app UIs are not stable API routes. They are browser applications
that bring their own assumptions about:

1. path prefixes,
2. root-relative links,
3. redirects,
4. cookies,
5. CSRF/XSRF,
6. WebSockets,
7. SSE/streaming,
8. large uploads/downloads,
9. app-native auth,
10. per-app base URL support.

Routing this traffic through the Next.js `/backend` API rewrite path is not
robust. The API route namespace also collides with app paths such as `/api`,
`/lab`, `/static`, and `/terminals`.

## Terminology

- **API gateway**: control-plane route for GPUaaS APIs such as
  `/api/v1/allocations`.
- **Platform proxy**: app data-plane route for user/workload/application HTTP
  traffic.
- **Proxy pool**: one deployment/runtime of the platform proxy.
- **Proxy route**: binding from an app endpoint to a proxy pool plus public URL.
- **Proxy scope**: isolation boundary for a pool: `shared`, `tenant`, `project`,
  or `app_instance`.
- **Base-url-aware app**: app can run safely under a path prefix such as
  `/w/{app_instance_id}/web/`.
- **Link-out**: platform opens a separate managed URL instead of embedding or
  rewriting inside the shell.

## API Gateway vs Platform Proxy

The API gateway:

1. serves stable control-plane APIs,
2. follows OpenAPI request/response contracts,
3. handles JSON and platform resources,
4. authenticates API clients,
5. talks to platform services.

The platform proxy:

1. serves dynamic app instance traffic,
2. handles arbitrary HTTP, WebSocket, and streaming behavior,
3. preserves browser semantics,
4. enforces GPUaaS tenant/project/app access before proxying,
5. talks to per-allocation/per-app endpoints that appear and disappear,
6. applies connection/session/body/timeout limits.

These should have separate browser-facing hostnames and operational scaling
controls.

## Target Public Route Shape

Current active public route shape:

```text
https://{route-host}.{env-domain}/...
```

Examples:

```text
https://aicloud-kind-jupyter.core42.dev/lab
https://aicloud-kind-openai.core42.dev/v1/models
https://grafana.platform.<env-domain>/
```

The app/runtime may still run with an upstream base path such as
`/w/{app_instance_id}/{endpoint_name}/` when that is needed for app correctness
or Pomerium prefix rewriting. That upstream path is not a public launch
contract.

Historical path-prefix mode:

```text
https://apps.<env-domain>/w/{app_instance_id}/{endpoint_name}/...
```

Examples:

```text
https://apps.kind.gpuaas.example/w/a7f9.../web/
https://apps.kind.gpuaas.example/w/a7f9.../web/lab
https://apps.kind.gpuaas.example/w/model-1/chat/
```

Historical optional subdomain mode, now the primary managed-ingress direction:

```text
https://{route_slug}.apps.<env-domain>/...
```

Use subdomain mode for apps that cannot safely run under a path prefix.

Do not use the control-plane API path as the long-term browser namespace for
interactive apps.

## Current MVP Slice

The first implementation slice keeps proxy execution inside `cmd/api` but moves
the app browser namespace to the target data-plane shape:

```text
/w/{app_instance_id}/{endpoint_name}/...
```

This is a transitional implementation detail. The route shape, app manifest
contract, browser-session cookie scope, redirect handling, and WebSocket support
should remain stable when the handler moves into a dedicated `cmd/platform-proxy`
service.

Current behavior:

1. app-runtime reserves `/w/{app_instance_id}/{endpoint_name}` for new
   `platform_proxy` routes,
2. base-url-aware apps receive that path as their upstream base path,
3. JupyterLab launches with `--ServerApp.base_url=/w/.../web/`,
4. the browser-session endpoint mints a scoped HttpOnly cookie and returns the
   `/w/...` URL,
5. the API root mounts `/w/` through the same authenticated middleware chain as
   `/api/v1/`,
6. redirects that already include the reserved prefix are preserved, avoiding
   doubled paths such as `/web/w/.../web/lab`,
7. the legacy `/api/v1/projects/{project_id}/workloads/{app_instance_id}/ui/...`
   proxy route remains for compatibility during migration.
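
The redirect rule in item 6 can be sketched as a small helper. This is a
minimal sketch, not the handler in `cmd/api`: the function name
`RewriteLocation` is hypothetical, and leaving absolute or protocol-relative
URLs untouched is an assumption of this example.

```go
package main

import "strings"

// RewriteLocation maps an upstream Location header onto the reserved public
// prefix, for example "/w/{app_instance_id}/web". Hypothetical helper.
func RewriteLocation(prefix, location string) string {
	prefix = strings.TrimSuffix(prefix, "/")
	// Assumption: only root-relative redirects are rewritten; absolute and
	// protocol-relative URLs pass through unchanged in this sketch.
	if !strings.HasPrefix(location, "/") || strings.HasPrefix(location, "//") {
		return location
	}
	// Redirects that already carry the reserved prefix are preserved, which
	// avoids doubled paths.
	if location == prefix || strings.HasPrefix(location, prefix+"/") {
		return location
	}
	// Otherwise the upstream emitted an unprefixed path; add the prefix once.
	return prefix + location
}
```

A base-url-aware app such as JupyterLab is expected to hit the "already
prefixed" branch, while an app that is not base-url-aware would hit the last
branch.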

Additional behavior validated on `platform-control` on 2026-04-24:

8. app-proxy browser-session minting for `base_url_strategy=platform_path`
   workloads may issue cookies for both:
   - the browser-visible path, such as `/backend/w/{app_instance_id}/web`
   - the upstream/runtime-emitted path, such as `/w/{app_instance_id}/web`
9. the shared launcher performs one bounded bootstrap recovery attempt:
   - mint browser-session
   - fetch the proxied HTML
   - fetch one referenced JS asset
   - if bootstrap fails, mint once more and retry before redirecting the user
10. the origin page treats popup-open success as sufficient and does not keep a
    stale error banner alive after the child tab has already opened correctly.

Validated in kind on 2026-04-23 with a fresh JupyterLab app instance:

```text
route_path=/w/c6a63cf0-270c-458c-8468-c7c9f9e88c83/web
open_url=https://gpuaas-kind-api.tailfe39f5.ts.net/w/c6a63cf0-270c-458c-8468-c7c9f9e88c83/web
result=Jupyter redirect/login page loaded without path-prefix corruption
```

The remaining Jupyter-specific gap is app-local token/session bridging. The
proxy path is now stable, but Jupyter still redirects unauthenticated users to
its token login page unless the user supplies the app-local token or the
platform injects an app-session bridge.

## Proxy Pool Model

Proxy pools are managed platform runtimes.

Supported target scopes:

1. `shared`: one pool serves many tenants/projects,
2. `tenant`: one pool serves one tenant,
3. `project`: one pool serves one project,
4. `app_instance`: one pool serves one large or high-risk app instance.

The MVP implements only `shared`.

The route model must still include `proxy_pool_id` from day one.

## Admin Workflow

Platform admin defines what each tenant may use:

```text
tenant.allowed_proxy_scopes = shared, tenant, project
tenant.default_proxy_scope = shared
tenant.max_tenant_proxy_pools = 1
tenant.max_project_proxy_pools = 3
tenant.custom_proxy_domain_allowed = false
```

Tenant or project admin selects within those bounds:

```text
project dev: proxy_scope = shared
project test: proxy_scope = shared
project prod: proxy_scope = project
```

If an admin requests a scope not allowed by platform policy, the API denies the
change. App developers may express a preference, but platform policy decides
the effective pool.
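
The policy check described above could look roughly like the following. This
is a sketch under stated assumptions: the type `TenantProxyPolicy` and the
function `ResolveProxyScope` are hypothetical names, and an empty request is
assumed to fall back to the tenant default.

```go
package main

import "fmt"

// TenantProxyPolicy mirrors tenant.allowed_proxy_scopes and
// tenant.default_proxy_scope. Hypothetical type for illustration.
type TenantProxyPolicy struct {
	AllowedScopes []string
	DefaultScope  string
}

// ResolveProxyScope returns the effective scope for a requested scope, or an
// error when platform policy does not allow the request.
func ResolveProxyScope(p TenantProxyPolicy, requested string) (string, error) {
	if requested == "" {
		// Assumption: no explicit request means "use the tenant default".
		return p.DefaultScope, nil
	}
	for _, allowed := range p.AllowedScopes {
		if allowed == requested {
			return requested, nil
		}
	}
	return "", fmt.Errorf("proxy scope %q is not allowed by platform policy", requested)
}
```

With the example policy above, a request for `project` succeeds and a request
for `app_instance` is denied.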

## App Developer Workflow

App developers do not write Go for ordinary web apps. They declare endpoint
requirements in the app artifact manifest.

## Proxy Adapter Model

The platform proxy runtime is shared, but each proxied browser surface must be
resolved through a small adapter model instead of ad hoc route-specific code.

The split is:

1. shared proxy runtime:
   - browser-session minting,
   - scoped HttpOnly cookie auth,
   - reverse proxying,
   - redirect and cookie path rewriting,
   - WebSocket forwarding,
   - tab-open browser UX.
2. per-surface adapter behavior:
   - entrypoint strategy,
   - upstream auth bridge strategy,
   - upstream base-path strategy,
   - HTML rewrite safety,
   - transport flags.

Target adapter shape:

```json
{
  "scope_strategy": "project_required | org_only",
  "entrypoint_strategy": "root | default_open_path | stable_deep_link | preserve_requested_path",
  "default_open_path": "/lab",
  "stable_deep_link_template": "/spaces/{hostname}/rooms/local/overview",
  "auth_strategy": "none | bearer_passthrough | upstream_token_bridge",
  "base_url_strategy": "none | platform_path | upstream_subpath_native",
  "rewrite_strategy": "headers_only | headers_and_html",
  "response_rewrite_strategy": "none | swagger_initializer | grafana_html | temporal_html | redoc_html",
  "cookie_scope_strategy": "public_prefix | dual_public_and_runtime_prefix",
  "verify_strategy": "none | html_only | html_plus_asset",
  "websocket_required": true,
  "root_requires_trailing_slash": false
}
```

Additional adapter rules validated during kind and platform-control rollout:

1. `scope_strategy`
   - `project_required`: allocation-scoped and project-scoped tools such as Netdata
   - `org_only`: admin-global platform tools such as Grafana, Temporal UI, Swagger, Redoc
2. `response_rewrite_strategy`
   - `swagger_initializer`: inject request interceptor and normalize OpenAPI URL
   - `grafana_html`: preserve app-visible subpath and native Grafana base URL
   - `temporal_html`: rewrite asset/import/base paths and recompute CSP hash for rewritten inline bootstrap script
   - `redoc_html`: rewrite static root-relative assets
3. `root_requires_trailing_slash`
   - canonical root redirects such as `/backend/p/grafana -> /backend/p/grafana/` must be adapter-driven instead of service-name-specific
4. `verify_strategy`
   - `none`: browser-session mint and redirect only
   - `html_only`: require successful HTML bootstrap fetch
   - `html_plus_asset`: require HTML plus one discovered bootstrap asset:
     - `script[src]`
     - `link rel="modulepreload"`
     - `link rel="preload" as="script"`
     - inline `import("...")` fallback for modern SPA shells
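
The `html_plus_asset` discovery order can be illustrated with a regex-based
sketch. Everything here is an assumption for illustration: the function name
`FindBootstrapAsset` is hypothetical, the priority order simply follows the
list above, and a production verifier would more likely use a real HTML parser
than regular expressions.

```go
package main

import "regexp"

var (
	scriptSrcRe     = regexp.MustCompile(`(?i)<script\b[^>]*\bsrc=["']([^"']+)["']`)
	linkTagRe       = regexp.MustCompile(`(?i)<link\b[^>]*>`)
	hrefRe          = regexp.MustCompile(`(?i)\bhref=["']([^"']+)["']`)
	modulePreloadRe = regexp.MustCompile(`(?i)\brel=["']modulepreload["']`)
	preloadRe       = regexp.MustCompile(`(?i)\brel=["']preload["']`)
	asScriptRe      = regexp.MustCompile(`(?i)\bas=["']script["']`)
	dynamicImportRe = regexp.MustCompile(`\bimport\(\s*["']([^"']+)["']\s*\)`)
)

// FindBootstrapAsset returns the first bootstrap asset URL referenced by the
// HTML document, or false when none is found.
func FindBootstrapAsset(doc string) (string, bool) {
	// 1. script[src]
	if m := scriptSrcRe.FindStringSubmatch(doc); m != nil {
		return m[1], true
	}
	// 2. link rel="modulepreload", or link rel="preload" as="script"
	for _, tag := range linkTagRe.FindAllString(doc, -1) {
		href := hrefRe.FindStringSubmatch(tag)
		if href == nil {
			continue
		}
		isPreloadScript := preloadRe.MatchString(tag) && asScriptRe.MatchString(tag)
		if modulePreloadRe.MatchString(tag) || isPreloadScript {
			return href[1], true
		}
	}
	// 3. inline import("...") fallback for modern SPA shells
	if m := dynamicImportRe.FindStringSubmatch(doc); m != nil {
		return m[1], true
	}
	return "", false
}
```

The returned value is whatever the page references, so a caller would still
need to resolve it against the proxied base URL before fetching it.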

This is the required design direction:

1. shared runtime handles generic proxy mechanics
2. adapter config owns scope, entrypoint, base-path, cookie, and response-bootstrap behavior
3. new proxied UIs must add adapter data plus smoke coverage, not ad hoc handler branches

Current validated examples:

1. JupyterLab:
   - `entrypoint_strategy=default_open_path`
   - `default_open_path=/lab`
   - `auth_strategy=upstream_token_bridge`
   - `base_url_strategy=platform_path`
   - `cookie_scope_strategy=dual_public_and_runtime_prefix`
   - `rewrite_strategy=headers_only`
2. Netdata:
   - `entrypoint_strategy=stable_deep_link`
   - `stable_deep_link_template=/v3/` for modern Netdata dashboard generation
   - legacy deep-link fallback remains only for truly old agents
   - `auth_strategy=none`
   - `base_url_strategy=none`
   - `rewrite_strategy=headers_only`
3. Swagger UI:
   - `entrypoint_strategy=root`
   - `auth_strategy=none` at the upstream UI plus a service-specific initializer
     bridge so "Try it out" carries the user bearer token
   - `base_url_strategy=none`
   - `rewrite_strategy=headers_only`
4. Redoc:
   - `entrypoint_strategy=root`
   - `auth_strategy=none`
   - `base_url_strategy=none`
   - `rewrite_strategy=headers_and_html`
5. Grafana:
   - `entrypoint_strategy=root`
   - `auth_strategy=none`
   - `base_url_strategy=upstream_subpath_native`
   - `cookie_scope_strategy=dual_public_and_runtime_prefix`
   - `rewrite_strategy=headers_only`
   - `websocket_required=true`
6. Temporal UI:
   - `entrypoint_strategy=preserve_requested_path`
   - `auth_strategy=none`
   - `base_url_strategy=none`
   - `cookie_scope_strategy=public_prefix`
   - `rewrite_strategy=headers_and_html`
   - `response_rewrite_strategy=temporal_html`
   - `verify_strategy=html_plus_asset`

The key lesson is that "open the proxied root" is not a safe universal rule.
Some apps, like Netdata, have a stable deep-link entrypoint that avoids path
canonicalization conflicts. Others, like JupyterLab, can safely start at a
declared open path as long as the upstream base URL and auth bridge are set.

For Netdata specifically, the proxy should treat "dashboard generation" as a
capability decision instead of assuming the URL path matches the product semver:

1. older agents may still use older dashboard entry patterns,
2. modern agents may report product version `2.x` but still require the V3
   dashboard entrypoint `/v3/`,
3. the adapter should therefore choose the open path from probe/capability
   evidence, not from a naive `major == 3` rule.

Example for JupyterLab:

```json
{
  "endpoints": [
    {
      "name": "web",
      "protocol": "http",
      "container_port": 8888,
      "exposure_modes": ["private", "platform_proxy"],
      "proxy": {
        "mode": "path_prefix",
        "base_url_aware": true,
        "base_url_arg": "--ServerApp.base_url={{ .ProxyBasePath }}/",
        "default_open_path": "/lab",
        "websocket_required": true,
        "cookie_scope_strategy": "dual_public_and_runtime_prefix",
        "auth_bridge": {
          "mode": "authorization_header",
          "source": "workload_access.token",
          "header": "Authorization",
          "scheme": "token"
        },
        "rewrite": "none",
        "idle_timeout_seconds": 900
      }
    }
  ]
}
```

Example for a vLLM chat UI:

```json
{
  "endpoints": [
    {
      "name": "chat",
      "protocol": "http",
      "container_port": 3000,
      "exposure_modes": ["platform_proxy"],
      "proxy": {
        "mode": "path_prefix",
        "base_url_aware": true,
        "default_open_path": "/",
        "websocket_required": true
      }
    }
  ]
}
```

Example for an app that is not path-prefix-safe:

```json
{
  "proxy": {
    "mode": "subdomain",
    "base_url_aware": false,
    "cookie_isolation": "host",
    "websocket_required": true
  }
}
```

Go changes are required only for new platform capabilities, not for each new
app.

### App Auth Bridge

Some curated apps have app-local bootstrap auth in addition to platform auth.
JupyterLab is the first example: app-runtime generates a short app-local token
and launches the container with that token.

The platform proxy must not put this app-local token in a URL. Instead, an app
manifest can declare a server-side bridge:

```json
{
  "auth_bridge": {
    "mode": "authorization_header",
    "source": "workload_access.token",
    "header": "Authorization",
    "scheme": "token"
  }
}
```

At request time the proxy:

1. authenticates the browser with the GPUaaS session or proxy session cookie,
2. resolves the route and workload access material,
3. strips the browser's GPUaaS `Authorization` header,
4. injects the configured upstream app auth header only for that route,
5. never exposes the app-local token in browser-visible URLs.
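
Steps 3 and 4 can be sketched as a header transformation on the outbound
upstream request. This is an illustrative sketch only: `AuthBridge` and
`ApplyAuthBridge` are hypothetical names that mirror the manifest fields
above, and route resolution and token lookup are assumed to have happened
already.

```go
package main

import "net/http"

// AuthBridge mirrors the manifest's auth_bridge block. Hypothetical type.
type AuthBridge struct {
	Mode   string // for example "authorization_header"
	Header string // for example "Authorization"
	Scheme string // for example "token"
}

// ApplyAuthBridge rewrites the headers of the outbound upstream request.
// appToken is the app-local token resolved server-side for this route.
func ApplyAuthBridge(h http.Header, bridge *AuthBridge, appToken string) {
	// Step 3: the browser's GPUaaS credential is never forwarded upstream.
	h.Del("Authorization")
	if bridge == nil || bridge.Mode != "authorization_header" || appToken == "" {
		return
	}
	// Step 4: inject the app-local credential server-side, so it never
	// appears in a browser-visible URL.
	h.Set(bridge.Header, bridge.Scheme+" "+appToken)
}
```

In a Go reverse proxy this would typically be called from the
`httputil.ReverseProxy` `Director` hook with the outbound request's `Header`.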

For base-path-aware apps that emit runtime asset paths without the public
browser prefix, the browser-session layer may also mint a second HttpOnly cookie
scoped to the runtime path. JupyterLab is the first validated example:

1. browser opens `/backend/w/{app_instance_id}/web/lab`,
2. Jupyter emits assets under `/w/{app_instance_id}/web/static/...`,
3. the proxy session cookie must therefore be valid on both prefixes to avoid
   a blank page caused by `401` asset fetches.
4. when both public app-host and public api-host bases are configured, the
   browser-session mint must prefer `APP_BASE_URL + /backend` whenever the
   request clearly originated from the app host. Otherwise the launcher may
   redirect the user onto the API host and fail even though the upstream app is
   healthy.
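
The dual-prefix cookie rule can be sketched as follows. The function name
`SessionCookies`, the cookie name used in the usage note, and the `Secure` and
`SameSite` settings are assumptions for illustration; the source only requires
that the cookie is HttpOnly and scoped to the relevant path or paths.

```go
package main

import (
	"net/http"
	"strings"
)

// SessionCookies returns one cookie per path prefix that must carry the proxy
// session. Hypothetical helper.
func SessionCookies(name, value, publicPrefix, runtimePrefix, strategy string, maxAgeSeconds int) []*http.Cookie {
	paths := []string{publicPrefix}
	if strategy == "dual_public_and_runtime_prefix" && runtimePrefix != "" && runtimePrefix != publicPrefix {
		// The upstream emits asset paths without the public browser prefix,
		// so the session must also be valid on the runtime prefix.
		paths = append(paths, runtimePrefix)
	}
	cookies := make([]*http.Cookie, 0, len(paths))
	for _, p := range paths {
		cookies = append(cookies, &http.Cookie{
			Name:     name,
			Value:    value,
			Path:     strings.TrimSuffix(p, "/"),
			MaxAge:   maxAgeSeconds,
			HttpOnly: true,
			Secure:   true,                 // assumption: HTTPS-only deployments
			SameSite: http.SameSiteLaxMode, // assumption: not stated in the source
		})
	}
	return cookies
}
```

For the JupyterLab case, calling this with `/backend/w/{app_instance_id}/web`
and `/w/{app_instance_id}/web` yields two cookies, each of which the handler
would send with `http.SetCookie`.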

This is intentionally config-driven. New auth bridge modes require platform
implementation and security review; ordinary apps should only select existing
bridge modes.

### Native Subpath Upstreams

Some upstreams do not behave well with proxy-only HTML patching. They expect to
be configured with their public subpath and then generate redirects, asset
paths, SPA routes, and WebSocket URLs from that configuration themselves.

Grafana is the first validated example of this class.

For these upstreams, the adapter should declare:

```json
{
  "base_url_strategy": "upstream_subpath_native",
  "cookie_scope_strategy": "dual_public_and_runtime_prefix",
  "websocket_required": true
}
```

Meaning:

1. the upstream deployment is configured with the public proxied subpath,
2. the proxy forwards the canonical public prefix via headers and preserves it
   in redirects,
3. the proxy may mint session cookies for both:
   - the browser-visible prefixed route, such as `/backend/p/grafana`
   - the runtime route the upstream may emit, such as `/p/grafana`
4. proxy-side HTML rewriting is treated as a compatibility shim, not the
   primary source of truth for base-path behavior.

The same cookie-scope rule also applies to base-path-aware app proxies such as
JupyterLab when the public path is `/backend/w/...` but the upstream emits
runtime asset paths under `/w/...`.

Temporal UI established an additional rule for modern SPA shells protected by a
strict Content Security Policy:

1. HTML rewriting may change the inline bootstrap script bytes,
2. if the upstream CSP uses script hashes, the proxy must recompute the hash
   from the exact rewritten bytes,
3. trimming or normalizing the inline script before hashing is incorrect and
   will still produce a blank page in real browsers.
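
Recomputing a CSP script hash from the rewritten bytes is small enough to show
directly. This sketch assumes the upstream policy uses `sha256` hash sources;
the function name `CSPScriptHash` is hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
)

// CSPScriptHash returns the CSP hash source for an inline script. The input
// must be the exact bytes between <script> and </script> after rewriting,
// with no trimming or normalization.
func CSPScriptHash(inlineScript []byte) string {
	sum := sha256.Sum256(inlineScript)
	return "'sha256-" + base64.StdEncoding.EncodeToString(sum[:]) + "'"
}
```

Because the digest covers every byte, a script with surrounding whitespace
hashes differently from the same script trimmed, which is why normalizing
before hashing produces a value the browser rejects.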

### Launcher Recovery

The shared launcher is now part of the platform-proxy contract, not just a UI
detail.

Required behavior:

1. the origin tab opens a same-origin launcher page in a new tab,
2. the launcher page mints a browser-session and resolves the final `open_url`,
3. the launcher preflights the proxied HTML and one referenced JS asset,
4. on bootstrap failure, it remints the browser-session once and retries,
5. only after bootstrap succeeds does it redirect to the final URL,
6. the origin tab does not surface a failure after popup-open succeeded unless
   popup creation itself failed.

This keeps stale browser/session state recoverable without requiring users to
open private windows or clear cookies manually.
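
The bounded recovery flow in steps 2 through 5 can be sketched as a small
control loop. The real launcher runs in the browser; this Go sketch only
illustrates the control flow, and `Launcher`, `Resolve`, and
`ErrBootstrapFailed` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrBootstrapFailed is a hypothetical sentinel for a failed launch.
var ErrBootstrapFailed = errors.New("proxied app bootstrap failed")

// maxMintAttempts is the initial mint plus exactly one recovery remint.
const maxMintAttempts = 2

// Launcher holds the two operations the launcher page performs.
type Launcher struct {
	// Mint creates a browser-session and returns the final open_url.
	Mint func() (string, error)
	// Preflight fetches the proxied HTML and one referenced JS asset.
	Preflight func(openURL string) error
}

// Resolve returns the URL to redirect to, or an error once the single
// recovery attempt has also failed.
func (l Launcher) Resolve() (string, error) {
	var lastErr error
	for attempt := 1; attempt <= maxMintAttempts; attempt++ {
		openURL, err := l.Mint()
		if err == nil {
			err = l.Preflight(openURL)
		}
		if err == nil {
			return openURL, nil // redirect only after bootstrap succeeds
		}
		lastErr = err
	}
	return "", fmt.Errorf("%w after %d attempts: %v", ErrBootstrapFailed, maxMintAttempts, lastErr)
}
```

Keeping the attempt count as a constant makes the "one bounded recovery
attempt" rule explicit instead of leaving room for an open-ended retry loop.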

For Grafana, the native-subpath deployment contract should be:

```ini
[server]
root_url = https://<public-host>/backend/p/grafana/
serve_from_sub_path = true
```

and the proxy must support:

1. `/backend/p/grafana/...` browser entry,
2. `/p/grafana/...` runtime follow-up requests when Grafana emits them,
3. `/api/live/` WebSocket forwarding under the same subpath contract.

This is a reusable adapter pattern for other self-aware browser apps that want
native subpath configuration rather than generic proxy rewriting.

## Platform Service Routes

The same proxy runtime should serve selected platform-owned browser tools such
as Netdata, Swagger UI, Redoc, Grafana, and internal dashboards.

These should not be forced into the app-instance namespace. Target route shapes
should be separate, for example:

```text
https://apps.<env-domain>/p/{service_slug}/...
https://apps.<env-domain>/p/netdata/{node_id}/...
```

The shared model is:

```text
platform_proxy_routes (
  id,
  owner_type text, -- app_instance | platform_service
  owner_id text,
  endpoint_name text,
  proxy_pool_id uuid,
  route_mode text,
  public_path text,
  upstream_target jsonb,
  policy jsonb,
  status text,
  created_at timestamptz,
  updated_at timestamptz,
  deleted_at timestamptz
)
```

The MVP app-instance table can remain as the first concrete implementation.
Before adding Netdata or docs routes, introduce the common route owner model or
a parallel `platform_service_proxy_routes` table so app lifecycle and platform
service lifecycle remain cleanly separated.

## Runtime Scaling vs Proxy Scaling

App runtime scaling and proxy protection are separate.

An app may declare runtime scaling:

```json
{
  "runtime": {
    "kind": "kubernetes",
    "replicas": {
      "mode": "horizontal",
      "min": 1,
      "max": 8,
      "scale_metric": "concurrent_sessions",
      "target": 20
    }
  },
  "proxy": {
    "load_balancing": "least_connections"
  }
}
```

Bare-metal allocation-local apps such as a single-user Jupyter server usually
remain at a fixed single replica (`fixed: 1`).

The platform proxy has its own scaling:

1. stateless proxy replicas,
2. HPA on active connections, CPU, memory, and request rate,
3. Redis-backed distributed counters for global limits,
4. in-process semaphores for per-pod protection,
5. circuit breakers for unhealthy upstreams.
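
The in-process semaphore in item 4 can be sketched with a buffered channel.
This is a minimal sketch: `ConnSemaphore` is a hypothetical name, and
answering with `503` when the pod is full is an assumption of this example.

```go
package main

import "net/http"

// ConnSemaphore bounds the number of in-flight requests on one proxy pod.
type ConnSemaphore struct {
	slots chan struct{}
}

// NewConnSemaphore creates a semaphore with the given number of slots.
func NewConnSemaphore(limit int) *ConnSemaphore {
	return &ConnSemaphore{slots: make(chan struct{}, limit)}
}

// TryAcquire takes a slot without blocking and reports whether it succeeded.
func (s *ConnSemaphore) TryAcquire() bool {
	select {
	case s.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// Release frees one slot; it is a no-op when no slot is held.
func (s *ConnSemaphore) Release() {
	select {
	case <-s.slots:
	default:
	}
}

// Wrap rejects new requests while the pod is at capacity. The slot is held
// until the wrapped handler returns, which for a long-lived stream means
// until that connection ends.
func (s *ConnSemaphore) Wrap(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !s.TryAcquire() {
			http.Error(w, "proxy pod at capacity", http.StatusServiceUnavailable)
			return
		}
		defer s.Release()
		next.ServeHTTP(w, r)
	})
}
```

This only protects a single pod; tenant-wide or project-wide limits would
still rely on the Redis-backed distributed counters in item 3.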

## Protection Model

The proxy enforces limits before traffic reaches the app.

Limit dimensions:

1. platform-wide,
2. tenant,
3. project,
4. app instance,
5. user,
6. proxy pod.

Examples:

```text
tenant: max 500 active app-proxy connections
project: max 100 active app-proxy connections
app instance: max 20 active sessions
user: max 4 active sessions per app
proxy pod: max 2,000 active connections
```

Manifest-requested limits are capped by policy:

```text
effective_limit = min(app_manifest_limit, project_policy, tenant_policy, platform_hard_limit)
```
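
The capping rule above can be sketched as a plain minimum. Treating a
non-positive value as "not set at this level" is an assumption of this
example, as is the function name `EffectiveLimit`.

```go
package main

// EffectiveLimit returns the smallest configured limit. platformHard is
// assumed to always be set; other levels are skipped when they are <= 0.
func EffectiveLimit(appManifest, projectPolicy, tenantPolicy, platformHard int) int {
	limit := platformHard
	for _, candidate := range []int{tenantPolicy, projectPolicy, appManifest} {
		if candidate > 0 && candidate < limit {
			limit = candidate
		}
	}
	return limit
}
```

For example, a manifest that asks for 1000 connections under a project policy
of 100 is capped at 100, regardless of the higher tenant and platform limits.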

When a limit is exceeded, new traffic receives `429` or `503`. Existing
connections are not killed unless they exceed their idle timeout or hard TTL.

## Required Data Model

Directionally:

```sql
-- Directional sketch only. Column types that this document does not state
-- (the id and foreign-key columns) are assumptions.
CREATE TABLE proxy_pools (
  id uuid PRIMARY KEY,  -- type assumed
  org_id uuid,          -- nullable; type assumed
  project_id uuid,      -- nullable; type assumed
  scope text,           -- shared | tenant | project | app_instance
  status text,          -- requested | provisioning | active | failed | deleting | deleted
  hostname text,
  ingress_class text,
  route_modes jsonb,
  replicas_min int,
  replicas_max int,
  limits_json jsonb,
  runtime_state jsonb,
  created_at timestamptz,
  updated_at timestamptz,
  deleted_at timestamptz
);

CREATE TABLE app_proxy_routes (
  id uuid PRIMARY KEY,   -- type assumed
  app_instance_id uuid,  -- type assumed
  endpoint_name text,
  proxy_pool_id uuid REFERENCES proxy_pools (id),
  route_mode text,       -- path_prefix | subdomain
  public_url text,
  public_path text,
  upstream_host text,
  upstream_port int,
  target_metadata jsonb,
  status text,
  failure_reason text,
  created_at timestamptz,
  updated_at timestamptz,
  deleted_at timestamptz
);
```

Policy keys:

```text
proxy.allowed_scopes
proxy.default_scope
proxy.max_tenant_pools
proxy.max_project_pools
proxy.max_connections_per_pool
proxy.custom_domain_allowed
proxy.default_idle_timeout_seconds
proxy.default_hard_ttl_seconds
```

## Route Selection Algorithm

When an app endpoint requests `platform_proxy`:

```text
1. read project proxy config
2. read tenant/platform proxy policy
3. compute desired proxy scope
4. verify desired scope is allowed
5. find or request proxy pool for that scope
6. create app_proxy_route bound to proxy_pool_id
7. render public URL from pool hostname and route mode
8. reconcile route until active or failed
```

MVP may short-circuit step 5 to the one shared pool.
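
Step 7 can be illustrated with the two historical route shapes from this
document. This is a sketch only: `RouteRequest` and `RenderPublicURL` are
hypothetical names, and the pool hostname is assumed to be of the form
`apps.<env-domain>`.

```go
package main

import (
	"fmt"
	"net/url"
)

// RouteRequest carries the inputs needed to render a public URL.
type RouteRequest struct {
	RouteMode     string // path_prefix | subdomain
	AppInstanceID string
	EndpointName  string
	RouteSlug     string // used only in subdomain mode
}

// RenderPublicURL builds the public URL from the pool hostname and route mode.
func RenderPublicURL(poolHostname string, r RouteRequest) (string, error) {
	switch r.RouteMode {
	case "path_prefix":
		// https://apps.<env-domain>/w/{app_instance_id}/{endpoint_name}/
		return fmt.Sprintf("https://%s/w/%s/%s/", poolHostname,
			url.PathEscape(r.AppInstanceID), url.PathEscape(r.EndpointName)), nil
	case "subdomain":
		// https://{route_slug}.apps.<env-domain>/
		if r.RouteSlug == "" {
			return "", fmt.Errorf("subdomain route requires a route slug")
		}
		return fmt.Sprintf("https://%s.%s/", r.RouteSlug, poolHostname), nil
	default:
		return "", fmt.Errorf("unsupported route mode %q", r.RouteMode)
	}
}
```

Deriving the URL from the pool hostname rather than a global constant is what
lets a route move to a tenant-dedicated or project-dedicated pool later
without changing the route model.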

## Controller Responsibilities

The proxy controller reconciles `proxy_pools` and `app_proxy_routes`.

Pool reconciliation:

1. create/update Deployment for platform-proxy,
2. create/update Service,
3. create/update Ingress/Funnel/Gateway route,
4. manage hostname/cert status,
5. write pool status.

Route reconciliation:

1. validate upstream target exists,
2. program route into proxy runtime or route cache,
3. mark route active/failed,
4. remove route on app stop/decommission/delete,
5. periodically reconcile stale routes.

## Implementation Options From Open Source

No evaluated open-source component fully replaces the GPUaaS platform proxy
control plane, because the hard part is route/session/policy integration with
GPUaaS tenant, project, allocation, and app runtime state.

However, OSS components may replace or support the data-plane proxy runtime.

### Pomerium

Pomerium is an open-source identity-aware proxy with route policies, identity
headers, and support for long-lived streaming/websocket-style traffic. It also
integrates with OIDC providers such as Keycloak.

Good fit:

1. identity-aware app access,
2. per-route policy,
3. route portal/user-facing access patterns,
4. possible alternative to writing auth/session pieces ourselves.

Risks:

1. route/session model may not map cleanly to per-allocation dynamic app routes,
2. GPUaaS-specific quotas and route lifecycle still need our controller,
3. embedded app base-path/cookie behavior still must be validated.

### oauth2-proxy

oauth2-proxy is a common open-source auth reverse proxy and supports WebSocket
proxying.

Good fit:

1. simple OIDC-authenticated reverse proxy,
2. external auth sidecar pattern.

Risks:

1. less suitable as the dynamic route control plane,
2. per-app route lifecycle and quota model remain custom,
3. mostly solves login/session, not app runtime orchestration.

### JupyterHub configurable-http-proxy

JupyterHub uses `configurable-http-proxy`, based on `node-http-proxy`, as its
dynamic route proxy for notebook servers.

Good fit:

1. proven dynamic routing pattern for Jupyter-like per-user servers,
2. route management API,
3. strong reference model for path-prefix notebook access.

Risks:

1. Jupyter-specific ecosystem,
2. not a full tenant/project policy engine,
3. not enough for generalized GPUaaS app exposure alone.

### Envoy / Envoy Gateway

Envoy is a strong L7 proxy runtime with xDS dynamic configuration and filters.

Good fit:

1. high-performance HTTP/WS/gRPC proxying,
2. dynamic route configuration,
3. ext_authz integration,
4. rate-limit service integration,
5. production-grade observability.

Risks:

1. more operational complexity,
2. we still need GPUaaS route/session/policy controller,
3. custom app session bridge likely remains ours.

### Traefik

Traefik is an open-source dynamic reverse proxy/load balancer with Kubernetes
CRD support and middleware patterns such as ForwardAuth.

GPUaaS already uses Traefik in parts of the environment. That makes it a strong
candidate for Kubernetes-backed app ingress and for the implementation layer
behind a proxy pool when the upstream target is a Kubernetes Service.

Good fit:

1. easy dynamic routing in Kubernetes,
2. middleware ecosystem,
3. WebSocket/gRPC/HTTP2 support.

Risks:

1. policy/session model still external,
2. route creation is Kubernetes-object-centric,
3. fine-grained GPUaaS quota and app lifecycle coupling remains custom,
4. not a complete fit for bare-metal allocation-local endpoints that are not
   represented as Kubernetes Services.

Conclusion: Traefik should remain an implementation option for proxy pools,
especially Kubernetes-backed pools, but GPUaaS should not expose Traefik as the
product abstraction. The product abstraction remains `proxy_pool` and
`app_proxy_route`.

### Ory Oathkeeper

Ory Oathkeeper is an open-source identity and access proxy / access-decision
component.

Good fit:

1. authorization and request mutation,
2. pairing with an existing gateway,
3. identity-aware reverse proxy cases.

Risks:

1. not a dynamic app runtime router by itself,
2. route lifecycle and GPUaaS-specific isolation remain ours.

### Kubernetes Gateway API

Gateway API provides a role-oriented model for infrastructure providers,
cluster operators, and app developers, with Gateway and HTTPRoute resources.

Good fit:

1. good conceptual model for our admin/app-developer separation,
2. possible implementation layer when apps run in Kubernetes,
3. portable HTTP routing primitives.

Risks:

1. spec/implementation split means behavior depends on controller,
2. not enough for non-Kubernetes bare-metal allocation-local endpoints,
3. GPUaaS app sessions/quotas still need custom logic.

## Buy/Build Recommendation

Do not build a raw L7 proxy engine from scratch if Envoy, Traefik, or Pomerium
can serve as the data-plane runtime.

Do build the GPUaaS proxy control plane:

1. proxy pool model,
2. app route model,
3. tenant/project policy,
4. app manifest integration,
5. route lifecycle controller,
6. app session bridge,
7. UI/read-model surfaces,
8. metrics/billing attribution.

Recommended next step before implementation:

1. prototype Pomerium for identity-aware app access,
2. prototype Envoy or Traefik for dynamic route/data-plane behavior,
3. test Netdata or another node-metrics dashboard because current internal-IP
   links are not usable from the public app shell,
4. keep the GPUaaS route/pool schema independent of the chosen runtime,
5. choose runtime after a Jupyter + vLLM chat + simple dashboard smoke.

## MVP Plan

Phase 0: analysis and spike

1. test JupyterLab path-prefix with Pomerium,
2. test JupyterLab path-prefix with Envoy or Traefik plus GPUaaS auth shim,
3. test vLLM/chat-style app with WebSocket/SSE,
4. record compatibility findings.

Phase 1: control-plane model

1. add `proxy_pools` with one shared default pool,
2. add `proxy_pool_id` to app proxy routes,
3. add policy keys but enable only `shared`,
4. expose route/pool status in read models.

Phase 2: dedicated proxy runtime

1. create `cmd/platform-proxy` or select an OSS runtime plus adapter,
2. move app browser traffic off `/backend` and API route namespace,
3. serve `/w/{app_instance_id}/{endpoint_name}/...`,
4. support HTTP, WebSocket, and SSE,
5. enforce app/session limits.

Phase 3: app integration

1. update Jupyter manifest to use the new proxy base path,
2. add vLLM chat UI manifest,
3. add one simple dashboard manifest, preferably Netdata or a Netdata-style
   metrics surface,
4. make private/direct fallback explicit.

Phase 4: isolation expansion

1. enable tenant-dedicated pools,
2. enable project-dedicated pools,
3. add admin UI controls and policy checks,
4. add custom hostname/cert workflow.

## Open Questions

1. Do we require path-prefix support for all first-party curated apps, or allow
   subdomain mode in MVP?
2. Should the app proxy session be backed by Redis opaque sessions or signed
   JWT cookies?
3. Should the first data-plane runtime be Go custom, Envoy, Traefik, or
   Pomerium?
4. How much route programming must work for bare-metal allocation-local
   endpoints before Kubernetes-backed apps?
5. Should proxy usage be billable or only metered for ops/capacity?

## References

1. Pomerium routing and long-lived streaming route settings:
   <https://www.pomerium.com/docs/capabilities/routing>
2. Pomerium OIDC/Keycloak integration:
   <https://docs.pomerium.com/docs/integrations/user-identity/oidc>
3. oauth2-proxy WebSocket proxy option:
   <https://github.com/oauth2-proxy/oauth2-proxy>
4. JupyterHub proxy API:
   <https://jupyterhub.readthedocs.io/en/latest/reference/api/proxy.html>
5. JupyterHub custom proxy guidance:
   <https://jupyterhub.readthedocs.io/en/5.3.0/howto/proxy.html>
6. Ory Oathkeeper identity and access proxy:
   <https://www.ory.sh/oathkeeper>
7. Kubernetes Gateway API role-oriented model and HTTPRoute:
   <https://kubernetes.io/docs/concepts/services-networking/gateway/>
8. Kubernetes Gateway API WebSocket discussion:
   <https://kubernetes.io/blog/2024/11/21/gateway-api-v1-2/>
