Launch, Allocation, And Runtime Flow (status: designed)
The product flow is compute-first and extensible. A user should be able to get GPU capacity quickly, then optionally attach durable storage, apply a managed runtime, launch a workload profile, grant project-member access, restart, and release.
Object Model
| Object | What it means |
|---|---|
| Allocation | One active compute environment, machine binding, terminal/SSH endpoint, and billing/runtime record |
| Attached storage | Durable project-scoped data attached into an allocation |
| Managed runtime bundle | Optional supported environment layered onto the allocation, such as PyTorch or Jupyter |
| SSH access grant | Project/user access to the allocation using user-owned public keys |
| Launchable workload profile | Runnable app/workload profile on top of an allocation, usually OCI-backed |
These objects should not be collapsed into one overloaded launch configuration.
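As an illustration of keeping these objects separate, the sketch below models each one as its own type and links them from the allocation. All class names, field names, and the `attach_storage` helper are hypothetical and chosen for this example only; they are not the platform's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AttachedStorage:
    volume_id: str      # hypothetical identifier for a durable volume
    project_id: str     # storage is project-scoped
    mount_path: str


@dataclass
class ManagedRuntimeBundle:
    name: str           # for example "pytorch" or "jupyter"


@dataclass
class SSHAccessGrant:
    user_id: str
    public_key: str     # user-owned public key; no private key is held here


@dataclass
class WorkloadProfile:
    name: str
    oci_image: Optional[str] = None   # usually OCI-backed, so optional here


@dataclass
class Allocation:
    allocation_id: str
    project_id: str
    machine_binding: str
    ssh_endpoint: str
    # Related objects are referenced, not flattened into one launch config.
    storage: List[AttachedStorage] = field(default_factory=list)
    runtime: Optional[ManagedRuntimeBundle] = None
    access_grants: List[SSHAccessGrant] = field(default_factory=list)
    workload_profiles: List[WorkloadProfile] = field(default_factory=list)


def attach_storage(allocation: Allocation, storage: AttachedStorage) -> None:
    """Attach storage only when it belongs to the allocation's project."""
    if storage.project_id != allocation.project_id:
        raise ValueError("storage must belong to the allocation's project")
    allocation.storage.append(storage)
```

Because each object has its own type, storage, runtime, and access can change independently of the allocation that hosts them.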
Create Flow
The minimum fast path remains:
- Choose compute.
- Choose access.
- Allocate.
Optional enrichments should be additional wizard steps, not required complexity. The full step sequence is:
- Compute selection.
- Access setup.
- Optional persistent storage.
- Optional managed runtime.
- Review and create.
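The split between the fast path and the optional steps can be sketched as a validation rule: only compute and access block allocation, while storage and runtime may be left empty. The `LaunchRequest` fields and the `WIZARD_STEPS` table below are assumptions made for this example, not a documented request shape.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LaunchRequest:
    compute: Optional[str] = None            # hypothetical compute offering name
    access_public_key: Optional[str] = None  # user-owned SSH public key
    storage_volume_id: Optional[str] = None  # optional enrichment
    runtime_bundle: Optional[str] = None     # optional enrichment


# (step label, request attribute, required?) in wizard order.
# "Review and create" is the final confirmation step and carries no field.
WIZARD_STEPS = [
    ("Compute selection", "compute", True),
    ("Access setup", "access_public_key", True),
    ("Persistent storage", "storage_volume_id", False),
    ("Managed runtime", "runtime_bundle", False),
]


def missing_required(request: LaunchRequest) -> List[str]:
    """Return the labels of required steps that are still unfilled."""
    return [
        label
        for label, attr, required in WIZARD_STEPS
        if required and not getattr(request, attr)
    ]


def can_allocate(request: LaunchRequest) -> bool:
    """The fast path succeeds as soon as compute and access are chosen."""
    return not missing_required(request)
```

A request with only `compute` and `access_public_key` set passes `can_allocate`, which mirrors the three-step minimum path.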
Post-Create Flow
Once the allocation is active, the allocation detail view should let the user manage:
- SSH access,
- attached storage,
- managed runtime bundle,
- launchable workload profiles,
- restart,
- metrics and health,
- activity and evidence.
Capacity And Placement Model
Users launch a compute outcome, not a raw host implementation. Underneath that, GPUaaS can realize the request as:
- a whole-node `baremetal` allocation, or
- a `gpu_slice` allocation backed by approved host-local slot bundles.
For product and operations, the important distinction is that the control plane chooses the placement outcome and the node/runtime layer executes it. The platform does not infer slice safety from a GPU count alone.
See GPU Slicing And Scheduler Layers for the control-plane versus node-plane model.
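A minimal sketch of the placement decision described above follows. It assumes hypothetical `Host` and `SlotBundle` records and an explicit `approved` flag. The preference order (slice first, then whole node) is an assumption made for this example, not the scheduler's documented policy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SlotBundle:
    host_id: str
    gpu_count: int
    approved: bool   # set explicitly by operators; never derived from gpu_count


@dataclass(frozen=True)
class Host:
    host_id: str
    total_gpus: int
    free: bool       # True when the whole node is unallocated


@dataclass(frozen=True)
class Placement:
    kind: str        # "baremetal" or "gpu_slice"
    host_id: str


def choose_placement(
    requested_gpus: int, hosts: List[Host], bundles: List[SlotBundle]
) -> Optional[Placement]:
    """Control-plane decision; the node/runtime layer would execute the result."""
    # A matching GPU count is not enough: the bundle must also be approved.
    for bundle in bundles:
        if bundle.approved and bundle.gpu_count == requested_gpus:
            return Placement("gpu_slice", bundle.host_id)
    # Otherwise fall back to a free whole node that is large enough.
    for host in hosts:
        if host.free and host.total_gpus >= requested_gpus:
            return Placement("baremetal", host.host_id)
    return None
```

In this sketch an unapproved bundle is skipped even when its GPU count matches the request, which is the behavior the "does not infer slice safety from a GPU count alone" rule calls for.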
Runtime Journey
User Experience Rules
- Launch is a full-page wizard, not a modal.
- Compute comes first because it is the primary purchase/intent.
- Access comes early because users need to know how they will get in.
- Storage is optional but durable and should not be hidden under a single workload.
- Runtime is an overlay, not the base machine identity.
- Restart preserves storage intent, SSH access state, and managed runtime association.
- Collaboration uses each user's own public keys; the platform does not share private keys.
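The restart rule above can be read as an invariant: restart changes lifecycle state but leaves storage intent, SSH access state, and the runtime association untouched. The sketch below illustrates that invariant with a hypothetical immutable `AllocationState` record; the field names and the `"restarting"` status value are assumptions for this example.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class AllocationState:
    allocation_id: str
    status: str
    storage_volume_ids: Tuple[str, ...] = ()   # storage intent
    ssh_grant_user_ids: Tuple[str, ...] = ()   # SSH access state
    runtime_bundle: Optional[str] = None       # managed runtime association
    restart_count: int = 0


def restart(state: AllocationState) -> AllocationState:
    """Return a new state for a restart; only lifecycle fields change."""
    return replace(
        state,
        status="restarting",
        restart_count=state.restart_count + 1,
    )
```

Using `dataclasses.replace` on a frozen record copies every field that is not named, so the preserved fields cannot be dropped by accident.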
Implementation Implications
| Product requirement | Implementation implication |
|---|---|
| One launch submit | Frontend should not loop over per-node calls; use one idempotent submit path |
| Visible lifecycle | Task and allocation states must remain discoverable after submit |
| Durable data | Storage pages and allocation detail must cross-link attachments |
| Runtime overlays | Managed runtime state should be changeable after allocation creation where supported |
| Access grants | Project-member SSH access needs explicit grants and auditability |
| Release/restart safety | Destructive or disruptive actions require confirmation and visible recovery state |
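The "one launch submit" and "visible lifecycle" rows can be illustrated together with a small in-memory sketch. A single submit call carries the node count and a client-supplied idempotency key, and the resulting task remains queryable afterwards. `LaunchService`, `LaunchTask`, and the key format are hypothetical; a real service would persist this state rather than hold it in dictionaries.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LaunchTask:
    task_id: str
    node_count: int
    state: str = "queued"


class LaunchService:
    """In-memory stand-in for an idempotent launch submit path."""

    def __init__(self) -> None:
        self._by_key: Dict[str, LaunchTask] = {}
        self._by_id: Dict[str, LaunchTask] = {}

    def submit(self, idempotency_key: str, node_count: int) -> LaunchTask:
        # A retry with the same key returns the original task, not a new one.
        existing = self._by_key.get(idempotency_key)
        if existing is not None:
            return existing
        task = LaunchTask(task_id=f"task-{len(self._by_id) + 1}", node_count=node_count)
        self._by_key[idempotency_key] = task
        self._by_id[task.task_id] = task
        return task

    def get_task(self, task_id: str) -> Optional[LaunchTask]:
        # Tasks stay discoverable after submit so the lifecycle remains visible.
        return self._by_id.get(task_id)
```

The frontend sends one request for the whole launch, so a network retry or double click does not create duplicate allocations in this sketch.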