Workload Access and Runtime Surfaces
Status: designed
This page explains the runtime-facing surfaces a workload exposes once it is running. It is the architecture view behind the user-facing workload detail screens and the operator-facing route, terminal, and telemetry checks.
Surface Families
| Surface | Primary use | Owning path | External exposure model |
|---|---|---|---|
| Terminal | Shell access to an allocation | API -> terminal gateway -> node agent | Short-lived session binding, not a public node port |
| Browser app | JupyterLab, Headlamp, dashboards, other interactive tools | Managed ingress / platform proxy | Browser login plus route ownership checks |
| API app | OpenAI-compatible or other machine-consumed endpoints | Managed ingress / platform proxy | API bearer auth plus route/project ownership |
| Metrics and status | Runtime health, logs, traces, dashboards, alerts | Observability stack and read models | Read-only ops surfaces, not workload-owned auth |
| Platform admin tools | Grafana, Temporal, Swagger, Redoc, ops consoles | Platform proxy route family | Platform-owned route intent and policy |
Runtime Access Model
Surface Sequence View
Terminal
Terminal access is a controlled runtime surface, not an infrastructure backdoor.
- The browser receives a short-lived terminal binding, not a reusable secret.
- The terminal gateway is the public WebSocket boundary.
- The API remains the authority for allocation ownership and session binding.
- The node agent exposes the least-privilege shell path on the target node.
Use terminal when the user needs shell access to a running allocation. Do not use it as the primary app-open path for notebook or API products.
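To make the idea of a short-lived binding concrete, here is a minimal sketch that assumes an HMAC-signed token. The function names (`issue_binding`, `verify_binding`), the claim keys, and the 60-second default are hypothetical and do not come from the platform. The sketch only shows expiry and allocation scoping; single-use enforcement and the real gateway handshake are out of scope.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def issue_binding(secret: bytes, user_id: str, allocation_id: str,
                  ttl_seconds: int = 60, now: Optional[float] = None) -> str:
    """Return a signed, expiring binding for one user and one allocation."""
    issued_at = time.time() if now is None else now
    # Hypothetical claim names; the binding is scoped to a single allocation.
    claims = {"sub": user_id, "alloc": allocation_id, "exp": issued_at + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def verify_binding(secret: bytes, token: str, allocation_id: str,
                   now: Optional[float] = None) -> bool:
    """Accept the binding only if it is authentic, unexpired, and for this allocation."""
    payload, sep, signature = token.rpartition(".")
    if not sep:
        return False
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    claims = json.loads(base64.urlsafe_b64decode(payload.encode()))
    current = time.time() if now is None else now
    return claims["alloc"] == allocation_id and current < claims["exp"]
```

In this sketch the API would call `issue_binding` after confirming allocation ownership, and the terminal gateway would call `verify_binding` before opening the WebSocket session.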
See also: Terminal Session Security
Browser App and API App Routes
Interactive tools and app endpoints go through managed ingress. GPUaaS owns the route intent; the edge runtime renders and enforces it.
Important boundary rules:
- GPUaaS remains the source of truth for tenant, project, route, app instance, lifecycle, and audit state.
- The edge runtime is not the ownership authority.
- Browser routes and API routes are distinct route families with different auth and scaling behavior.
- Public exposure must terminate through the approved edge profile, not direct workload node ports.
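The boundary rules above can be sketched as a single authorization check. This is an illustration under stated assumptions, not the platform's policy engine: the `Route` and `Caller` types, the family labels `"browser"` and `"api"`, and the deny messages are all hypothetical. It shows the two route families requiring different credentials while sharing one project-ownership check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    route_id: str
    family: str       # "browser" or "api" (hypothetical labels)
    project_id: str   # project that owns the route


@dataclass(frozen=True)
class Caller:
    project_ids: frozenset            # projects the caller may act in
    has_browser_session: bool = False
    has_api_bearer: bool = False


def authorize(route: Route, caller: Caller) -> str:
    """Return "allow" or a deny reason for one request against one route."""
    # Each route family accepts a different credential type.
    if route.family == "browser":
        if not caller.has_browser_session:
            return "deny: browser login required"
    elif route.family == "api":
        if not caller.has_api_bearer:
            return "deny: API bearer token required"
    else:
        return "deny: unknown route family"
    # Ownership is checked the same way for both families.
    if route.project_id not in caller.project_ids:
        return "deny: caller does not own this route's project"
    return "allow"
```

Keeping the credential check per family and the ownership check shared mirrors the split described above: the families differ in auth, but the ownership record comes from one source of truth.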
Why This Surface Model Matters
The platform is easier to operate and explain when a reader can tell these surfaces apart immediately:
- terminal is a controlled interactive session;
- browser-app routing is an edge-enforced route family whose intent GPUaaS owns;
- API-app routing is a machine-consumed route family;
- observability is an operator/runtime surface, not a hidden backend detail;
- platform-admin tools are platform routes, not product exceptions.
Metrics, Logs, and Correlation
Metrics are a runtime surface too. The product is incomplete if a workload can be opened but not observed.
Operators should be able to answer:
- who owns the workload;
- which route or allocation was used;
- whether the failure is terminal, proxy, runtime, or upstream;
- which dashboard, alert, or runbook owns the symptom.
The preferred path is:
workload detail
-> status and route read models
-> correlation id
-> logs / traces / metrics
-> owning runbook
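The correlation step in the path above can be sketched as a filter over log records. The record shape (`correlation_id`, `layer`, `level` keys) and the layer names are assumptions made for illustration; a real observability stack will have its own schema and query interface.

```python
# Hypothetical layer labels, matching the failure categories operators ask about.
LAYERS = ["terminal", "proxy", "runtime", "upstream"]


def failing_layers(records, correlation_id):
    """Return the layers that logged an error for this correlation id, in LAYERS order."""
    failed = {
        r["layer"]
        for r in records
        if r["correlation_id"] == correlation_id and r["level"] == "error"
    }
    return [layer for layer in LAYERS if layer in failed]


# Example records, as they might look after collection from several services.
records = [
    {"correlation_id": "c-1", "layer": "proxy", "level": "error"},
    {"correlation_id": "c-1", "layer": "upstream", "level": "error"},
    {"correlation_id": "c-1", "layer": "runtime", "level": "info"},
    {"correlation_id": "c-2", "layer": "terminal", "level": "error"},
]
```

With these sample records, `failing_layers(records, "c-1")` returns `["proxy", "upstream"]`, which tells the operator the symptom spans the proxy and the upstream rather than the terminal path.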
See also: Observability
What Product and Ops Should Verify
| Question | Expected portal answer |
|---|---|
| How does a user open a runtime? | Through terminal, browser route, or API route from the workload/app surface |
| How is access controlled? | Allocation binding for terminal; managed route ownership for browser/API routes |
| How do we know what is live? | Status, route, and runtime read models plus observability signals |
| Where does scaling or noisy-neighbor control live? | Proxy pool, route family, and policy-driven runtime controls |
| Where does a platform tool fit? | Platform-owned route family, not an app-specific special case |