Workload Access and Runtime Surfaces (status: designed)

This page explains the runtime-facing surfaces a workload exposes once it is running. It is the architecture view behind the user-facing workload detail screens and the operator-facing route, terminal, and telemetry checks.

Surface Families

Surface	Primary use	Owning path	External exposure model
Terminal	Shell access to an allocation	API -> terminal gateway -> node agent	Short-lived session binding, not a public node port
Browser app	JupyterLab, Headlamp, dashboards, other interactive tools	Managed ingress / platform proxy	Browser login plus route ownership checks
API app	OpenAI-compatible or other machine-consumed endpoints	Managed ingress / platform proxy	API bearer auth plus route/project ownership
Metrics and status	Runtime health, logs, traces, dashboards, alerts	Observability stack and read models	Read-only ops surfaces, not workload-owned auth
Platform admin tools	Grafana, Temporal, Swagger, Redoc, ops consoles	Platform proxy route family	Platform-owned route intent and policy
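The table can also be read as data. The sketch below encodes each surface family with its owning path and exposure model, then checks the invariant the table implies: no surface reaches a workload through a raw node port. The hop names (`terminal-gateway`, `node-port`, and so on) and the `violations` helper are hypothetical labels chosen for illustration, not real service identifiers.

```python
from dataclasses import dataclass
from enum import Enum


class Surface(Enum):
    TERMINAL = "terminal"
    BROWSER_APP = "browser_app"
    API_APP = "api_app"
    METRICS_STATUS = "metrics_status"
    PLATFORM_ADMIN = "platform_admin"


@dataclass(frozen=True)
class SurfaceSpec:
    path: tuple[str, ...]  # hypothetical hop names, caller side first
    exposure: str          # exposure model from the table


# Hypothetical encoding of the surface-family table. Browser and API apps
# may go through managed ingress or the platform proxy; one hop is shown.
SURFACES: dict[Surface, SurfaceSpec] = {
    Surface.TERMINAL: SurfaceSpec(("api", "terminal-gateway", "node-agent"), "short-lived session binding"),
    Surface.BROWSER_APP: SurfaceSpec(("managed-ingress",), "browser login + route ownership"),
    Surface.API_APP: SurfaceSpec(("managed-ingress",), "API bearer + route/project ownership"),
    Surface.METRICS_STATUS: SurfaceSpec(("observability-stack", "read-models"), "read-only ops access"),
    Surface.PLATFORM_ADMIN: SurfaceSpec(("platform-proxy",), "platform route intent and policy"),
}

# Hops that would expose a workload outside the approved entry points.
FORBIDDEN_HOPS = {"node-port"}


def violations(specs: dict[Surface, SurfaceSpec]) -> list[Surface]:
    """Return the surfaces whose path bypasses the managed entry points."""
    return [s for s, spec in specs.items() if FORBIDDEN_HOPS & set(spec.path)]
```

Keeping the table in a form like this lets a review or CI step flag a new surface that quietly adds a direct node-port path.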

Runtime Access Model

Surface Sequence View

Terminal

Terminal access is a controlled runtime surface, not an infrastructure backdoor.

The browser receives a short-lived terminal binding, not a reusable secret.
The terminal gateway is the public WebSocket boundary.
The API remains the authority for allocation ownership and session binding.
The node agent exposes the least-privilege shell path on the target node.
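A minimal sketch of the binding flow above, assuming the API and the terminal gateway share an HMAC key and the gateway keeps a replay guard. `issue_binding`, `redeem_binding`, `BINDING_KEY`, and the 60-second TTL are all hypothetical; the page does not specify the token format. The API signs a binding for one user and one allocation only after its ownership check; the gateway accepts it once, for that allocation, before it expires.

```python
import hashlib
import hmac
import json
import secrets
import time
from base64 import urlsafe_b64decode, urlsafe_b64encode
from typing import Optional

# Hypothetical shared key; a real deployment would load and rotate it from a secret store.
BINDING_KEY = b"hypothetical-shared-key"
BINDING_TTL_SECONDS = 60  # assumed lifetime: long enough to open one session
_consumed_nonces: set[str] = set()  # gateway-side replay guard (in memory for the sketch)


def _b64(data: bytes) -> str:
    return urlsafe_b64encode(data).decode().rstrip("=")


def _unb64(text: str) -> bytes:
    return urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(payload: str) -> str:
    return _b64(hmac.new(BINDING_KEY, payload.encode(), hashlib.sha256).digest())


def issue_binding(user_id: str, allocation_id: str, now: Optional[float] = None) -> str:
    """API side: call only after the API has confirmed the user owns the allocation."""
    now = time.time() if now is None else now
    claims = {
        "sub": user_id,
        "alloc": allocation_id,
        "exp": int(now) + BINDING_TTL_SECONDS,
        "nonce": secrets.token_hex(8),
    }
    payload = _b64(json.dumps(claims, sort_keys=True).encode())
    return f"{payload}.{_sign(payload)}"


def redeem_binding(token: str, allocation_id: str, now: Optional[float] = None) -> Optional[str]:
    """Gateway side: return the user id exactly once, or None if the binding is invalid."""
    now = time.time() if now is None else now
    payload, _, sig = token.partition(".")
    if not sig or not hmac.compare_digest(sig.encode(), _sign(payload).encode()):
        return None  # forged or malformed
    claims = json.loads(_unb64(payload))
    if claims["alloc"] != allocation_id or claims["exp"] < now:
        return None  # wrong allocation or expired
    if claims["nonce"] in _consumed_nonces:
        return None  # already used: the binding is not a reusable secret
    _consumed_nonces.add(claims["nonce"])
    return claims["sub"]
```

A signed token keeps the gateway stateless apart from the replay set; a server-side session table held by the API would satisfy the same rules, at the cost of a lookup per connection.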

Use terminal when the user needs shell access to a running allocation. Do not use it as the primary app-open path for notebook or API products.

Browser App and API App Routes

Interactive tools and app endpoints go through managed ingress. GPUaaS owns the route intent; the edge runtime renders and enforces it.

Important boundary rules:

GPUaaS remains the source of truth for tenant, project, route, app instance, lifecycle, and audit state.
The edge runtime is not the ownership authority.
Browser routes and API routes are distinct route families with different auth and scaling behavior.
Public exposure must terminate through the approved edge profile, not direct workload node ports.
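The boundary rules can be sketched as an edge-side authorization check that only reads GPUaaS route records and never decides ownership on its own. `RouteRecord`, `Caller`, `ROUTES`, and `authorize` are hypothetical names, and the "session" versus "bearer" credential labels stand in for browser login and API bearer auth respectively.

```python
from dataclasses import dataclass
from enum import Enum


class RouteFamily(Enum):
    BROWSER = "browser"  # interactive tools behind browser login
    API = "api"          # machine clients presenting bearer tokens


@dataclass(frozen=True)
class RouteRecord:
    """Route intent as recorded by GPUaaS (hypothetical shape)."""
    host: str
    family: RouteFamily
    tenant: str
    project: str
    app_instance: str


@dataclass(frozen=True)
class Caller:
    """Identity the edge has already authenticated (hypothetical shape)."""
    tenant: str
    projects: frozenset[str]
    credential: str  # "session" for browser login, "bearer" for API tokens


# Stand-in for the GPUaaS route read model. The edge only reads it.
ROUTES: dict[str, RouteRecord] = {}

REQUIRED_CREDENTIAL = {RouteFamily.BROWSER: "session", RouteFamily.API: "bearer"}


def authorize(host: str, caller: Caller) -> bool:
    """Edge-side decision that defers ownership to GPUaaS route records."""
    route = ROUTES.get(host)
    if route is None:
        return False  # no route intent in GPUaaS, so the edge must not serve it
    if caller.credential != REQUIRED_CREDENTIAL[route.family]:
        return False  # browser and API families authenticate differently
    return caller.tenant == route.tenant and route.project in caller.projects
```

Because an unknown host is denied outright, a route that exists only in edge configuration, with no matching GPUaaS record, cannot serve traffic.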

Why This Surface Model Matters

The model holds together only if a reader can immediately tell these surfaces apart:

terminal is a controlled interactive session;
browser-app routing is a managed-ingress route family, enforced at the edge but owned by GPUaaS;
API-app routing is a machine-consumed route family;
observability is an operator/runtime surface, not a hidden backend detail;
platform-admin tools are platform routes, not product exceptions.

Metrics, Logs, and Correlation

Metrics are a runtime surface too. The product is incomplete if a workload can be opened but not observed.

Operators should be able to answer:

who owns the workload;
which route or allocation was used;
whether the failure sits in the terminal path, the proxy, the runtime, or upstream;
which dashboard, alert, or runbook owns the symptom.
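One way to answer the failure-location question is to check the access path from the outside in. The sketch below assumes the status and route read models can report whether a request reached the workload, whether the runtime is healthy, and whether a dependency the workload calls failed; `FailureSignals` and `failure_domain` are hypothetical, and reading "upstream" as "a dependency behind the workload" is an interpretation, not something this page defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureSignals:
    """Hypothetical facts an operator can read off status and route read models."""
    surface: str             # "terminal", "browser", or "api"
    reached_workload: bool   # did the request get past the gateway or edge?
    workload_healthy: bool   # runtime health from the status read model
    dependency_failed: bool  # did something the workload calls fail?


def failure_domain(s: FailureSignals) -> str:
    """Classify as terminal / proxy / runtime / upstream, checking outside-in."""
    if not s.reached_workload:
        # The request died before the workload: blame the access path in front of it.
        return "terminal" if s.surface == "terminal" else "proxy"
    if not s.workload_healthy:
        return "runtime"
    if s.dependency_failed:
        return "upstream"
    return "unclassified"
```

Checking outside-in avoids blaming the runtime for a request that never arrived.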

The preferred path is:

workload detail
  -> status and route read models
  -> correlation id
  -> logs / traces / metrics
  -> owning runbook
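The preferred path can be sketched as a single lookup: start from what the workload detail and read models expose, filter logs, traces, and metrics by correlation id, and resolve the owning runbook from the failure domain. `WorkloadDetail`, `Signal`, `investigate`, and the runbook paths in `RUNBOOKS` are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadDetail:
    """What the workload detail and status/route read models expose (hypothetical)."""
    workload_id: str
    owner: str
    entry: str           # route host or allocation id that was used
    correlation_id: str
    failure_domain: str  # "terminal", "proxy", "runtime", or "upstream"


@dataclass(frozen=True)
class Signal:
    kind: str            # "log", "trace", or "metric"
    correlation_id: str
    detail: str


# Hypothetical runbook index keyed by failure domain.
RUNBOOKS = {
    "terminal": "runbooks/terminal-sessions",
    "proxy": "runbooks/managed-ingress",
    "runtime": "runbooks/workload-runtime",
    "upstream": "runbooks/upstream-dependencies",
}


def investigate(detail: WorkloadDetail, signals: list[Signal]) -> dict:
    """Follow workload detail -> correlation id -> signals -> owning runbook."""
    by_kind: dict[str, list[str]] = {}
    for sig in signals:
        if sig.correlation_id == detail.correlation_id:
            by_kind.setdefault(sig.kind, []).append(sig.detail)
    return {
        "owner": detail.owner,
        "entry": detail.entry,
        "signals": by_kind,
        "runbook": RUNBOOKS.get(detail.failure_domain, "runbooks/triage"),
    }
```

The result answers the four operator questions in order: who owns the workload, which route or allocation was used, what the correlated signals show, and which runbook owns the symptom.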

What Product and Ops Should Verify

Question	Expected portal answer
How does a user open a runtime?	Through terminal, browser route, or API route from the workload/app surface
How is access controlled?	Allocation binding for terminal; managed route ownership for browser/API routes
How do we know what is live?	Status, route, and runtime read models plus observability signals
Where does scaling or noisy-neighbor control live?	Proxy pool, route family, and policy-driven runtime controls
Where does a platform tool fit?	Platform-owned route family, not an app-specific special case

Canonical sources
