Workload Access and Runtime Surfaces (status: designed)

This page explains the runtime-facing surfaces a workload exposes once it is running. It is the architecture view behind the user-facing workload detail screens and the operator-facing route, terminal, and telemetry checks.

Surface Families

| Surface | Primary use | Owning path | External exposure model |
|---|---|---|---|
| Terminal | Shell access to an allocation | API -> terminal gateway -> node agent | Short-lived session binding, not a public node port |
| Browser app | JupyterLab, Headlamp, dashboards, other interactive tools | Managed ingress / platform proxy | Browser login plus route ownership checks |
| API app | OpenAI-compatible or other machine-consumed endpoints | Managed ingress / platform proxy | API bearer auth plus route/project ownership |
| Metrics and status | Runtime health, logs, traces, dashboards, alerts | Observability stack and read models | Read-only ops surfaces, not workload-owned auth |
| Platform admin tools | Grafana, Temporal, Swagger, Redoc, ops consoles | Platform proxy route family | Platform-owned route intent and policy |

Runtime Access Model

Surface Sequence View

Terminal

Terminal access is a controlled runtime surface, not an infrastructure backdoor.

  • The browser receives a short-lived terminal binding, not a reusable secret.
  • The terminal gateway is the public WebSocket boundary.
  • The API remains the authority for allocation ownership and session binding.
  • The node agent exposes the least-privilege shell path on the target node.
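
As a rough illustration of the binding rules above, the following sketch shows one way a short-lived, allocation-scoped binding could be issued by the API and checked by the gateway. It is not the platform's implementation: the HMAC scheme, `SIGNING_KEY`, `BINDING_TTL_SECONDS`, and the function names `issue_binding` and `verify_binding` are all assumptions made for this example.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical values: the source only says the binding is "short-lived".
SIGNING_KEY = b"example-only-key"
BINDING_TTL_SECONDS = 60


def issue_binding(user_id: str, allocation_id: str, owner_id: str,
                  now: Optional[float] = None) -> str:
    """API side: confirm ownership, then mint a binding for one allocation."""
    if user_id != owner_id:
        raise PermissionError("user does not own this allocation")
    now = time.time() if now is None else now
    payload = json.dumps(
        {"sub": user_id, "alloc": allocation_id, "exp": now + BINDING_TTL_SECONDS},
        sort_keys=True,
    ).encode()
    sig = hmac.new(SIGNING_KEY, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(sig).decode())


def verify_binding(token: str, allocation_id: str,
                   now: Optional[float] = None) -> dict:
    """Gateway side: check signature, expiry, and allocation before upgrading."""
    now = time.time() if now is None else now
    payload_b64, sig_b64 = token.split(".")
    payload = base64.urlsafe_b64decode(payload_b64)
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, base64.urlsafe_b64decode(sig_b64)):
        raise PermissionError("bad signature")
    claims = json.loads(payload)
    if claims["exp"] < now:
        raise PermissionError("binding expired")
    if claims["alloc"] != allocation_id:
        raise PermissionError("binding is for a different allocation")
    return claims
```

The point of the sketch is the shape of the contract: the browser holds something that expires quickly and names exactly one allocation, so a leaked binding cannot be reused later or against another allocation.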

Use terminal when the user needs shell access to a running allocation. Do not use it as the primary app-open path for notebook or API products.

See also: Terminal Session Security

Browser App and API App Routes

Interactive tools and app endpoints go through managed ingress. GPUaaS owns the route intent; the edge runtime renders and enforces it.

Important boundary rules:

  • GPUaaS remains the source of truth for tenant, project, route, app instance, lifecycle, and audit state.
  • The edge runtime is not the ownership authority.
  • Browser routes and API routes are distinct route families with different auth and scaling behavior.
  • Public exposure must terminate through the approved edge profile, not direct workload node ports.
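
The boundary rules above can be sketched as a small data model. This is a hypothetical illustration, not the GPUaaS schema: `RouteIntent`, `RouteFamily`, `Caller`, and `authorize` are names invented here, and the auth checks are reduced to booleans to show only that the two route families apply different rules to the same ownership record.

```python
from dataclasses import dataclass
from enum import Enum


class RouteFamily(Enum):
    BROWSER = "browser"  # interactive tools opened in a browser
    API = "api"          # machine-consumed endpoints


@dataclass(frozen=True)
class RouteIntent:
    """Ownership record held by the control plane (assumed fields)."""
    route_id: str
    family: RouteFamily
    tenant_id: str
    project_id: str
    app_instance_id: str


@dataclass(frozen=True)
class Caller:
    """What the edge knows about a request after authentication (assumed)."""
    project_id: str
    has_browser_session: bool = False
    has_bearer_token: bool = False


def authorize(intent: RouteIntent, caller: Caller) -> bool:
    """Edge-side enforcement of an intent it received; it never edits the intent."""
    if caller.project_id != intent.project_id:
        return False
    if intent.family is RouteFamily.BROWSER:
        return caller.has_browser_session
    return caller.has_bearer_token
```

In this sketch the edge only reads `RouteIntent`; creating or changing one would happen in the control plane, which matches the rule that the edge runtime is not the ownership authority.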

Why This Surface Model Matters

The platform is easier to operate and reason about when a reader can tell these surfaces apart immediately:

  • terminal is a controlled interactive session;
  • browser-app routing is a browser-facing route family, enforced at the edge but owned by GPUaaS;
  • API-app routing is a machine-consumed route family;
  • observability is an operator/runtime surface, not a hidden backend detail;
  • platform-admin tools are platform routes, not product exceptions.

Metrics, Logs, and Correlation

Metrics are a runtime surface too. The product is incomplete if a workload can be opened but not observed.

Operators should be able to answer:

  1. who owns the workload;
  2. which route or allocation was used;
  3. whether the failure is terminal, proxy, runtime, or upstream;
  4. which dashboard, alert, or runbook owns the symptom.

The preferred path is:

workload detail
-> status and route read models
-> correlation id
-> logs / traces / metrics
-> owning runbook
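
A minimal sketch of the last steps of that path follows, assuming each log or trace record already carries a correlation id and a layer label. The `Event` fields, the `FAILURE_LAYERS` tuple, and the `triage` function are hypothetical names for this example; the source does not define a record format.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# The four failure classes named in the operator questions above.
FAILURE_LAYERS = ("terminal", "proxy", "runtime", "upstream")


@dataclass(frozen=True)
class Event:
    """One correlated record from logs, traces, or metrics (assumed shape)."""
    correlation_id: str
    layer: str
    owner: str
    route_or_allocation: str
    ok: bool
    runbook: str


def triage(events: Iterable[Event], correlation_id: str) -> Optional[dict]:
    """Return the first failing event for a correlation id as four answers."""
    for event in events:
        if event.correlation_id != correlation_id or event.ok:
            continue
        if event.layer not in FAILURE_LAYERS:
            raise ValueError(f"unknown failure layer: {event.layer}")
        return {
            "owner": event.owner,
            "route_or_allocation": event.route_or_allocation,
            "failure_layer": event.layer,
            "runbook": event.runbook,
        }
    return None  # no failure recorded for this correlation id
```

Each key in the returned dictionary corresponds to one of the four operator questions, which is the property the correlation id is meant to guarantee.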

See also: Observability

What Product and Ops Should Verify

| Question | Expected portal answer |
|---|---|
| How does a user open a runtime? | Through terminal, browser route, or API route from the workload/app surface |
| How is access controlled? | Allocation binding for terminal; managed route ownership for browser/API routes |
| How do we know what is live? | Status, route, and runtime read models plus observability signals |
| Where does scaling or noisy-neighbor control live? | Proxy pool, route family, and policy-driven runtime controls |
| Where does a platform tool fit? | Platform-owned route family, not an app-specific special case |