
Launch, Allocation, And Runtime Flow (status: designed)

The product flow is compute-first and extensible. A user should be able to get GPU capacity quickly, then optionally attach durable storage, apply a managed runtime, launch a workload profile, grant project-member access, restart, and release.

Object Model

| Object | What it means |
| --- | --- |
| Allocation | One active compute environment, machine binding, terminal/SSH endpoint, and billing/runtime record |
| Attached storage | Durable project-scoped data attached into an allocation |
| Managed runtime bundle | Optional supported environment layered onto the allocation, such as PyTorch or Jupyter |
| SSH access grant | Project/user access to the allocation using user-owned public keys |
| Launchable workload profile | Runnable app/workload profile on top of an allocation, usually OCI-backed |

These objects should not be collapsed into one overloaded launch configuration.
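
One way to picture this separation is as distinct record types that an allocation references, rather than one configuration blob. The sketch below is illustrative only: every class and field name is a hypothetical stand-in, not the GPUaaS schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AttachedStorage:
    # Durable, project-scoped data attached into an allocation.
    storage_id: str
    project_id: str
    mount_path: str


@dataclass
class ManagedRuntimeBundle:
    # Optional supported environment layered onto the allocation.
    name: str  # e.g. "pytorch" or "jupyter"


@dataclass
class SshAccessGrant:
    # Access for one project member, using that member's own public key.
    user_id: str
    public_key: str


@dataclass
class WorkloadProfile:
    # Runnable app/workload on top of an allocation, usually OCI-backed.
    name: str
    oci_image: str


@dataclass
class Allocation:
    # One active compute environment. Related objects are referenced
    # as separate records instead of being folded into launch config.
    allocation_id: str
    project_id: str
    machine_binding: str
    ssh_endpoint: str
    storage: List[AttachedStorage] = field(default_factory=list)
    runtime: Optional[ManagedRuntimeBundle] = None
    access_grants: List[SshAccessGrant] = field(default_factory=list)
    workloads: List[WorkloadProfile] = field(default_factory=list)
```

Because each object is its own record, storage, runtime, grants, and workloads can be added or changed after the allocation exists without touching its compute identity.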

Create Flow

The minimum fast path remains:

  1. Choose compute.
  2. Choose access.
  3. Allocate.

Optional enrichments should appear as additional wizard steps, not as required complexity. The full sequence is:

  1. Compute selection.
  2. Access setup.
  3. Optional persistent storage.
  4. Optional managed runtime.
  5. Review and create.
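
The two flows above can be sketched as a single request type in which only compute and access are required. This is a minimal illustration under assumed names (`CreateRequest`, `review_summary`, and all fields are hypothetical), not the actual create API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CreateRequest:
    compute: str                      # required: the chosen compute option
    access_public_keys: List[str]     # required: how the user will get in
    storage_id: Optional[str] = None  # optional persistent storage
    runtime: Optional[str] = None     # optional managed runtime


def review_summary(req: CreateRequest) -> List[str]:
    """Build the lines shown on the review step, skipping unused optional steps."""
    if not req.compute:
        raise ValueError("compute selection is required")
    if not req.access_public_keys:
        raise ValueError("access setup is required")
    lines = [
        f"compute: {req.compute}",
        f"access: {len(req.access_public_keys)} key(s)",
    ]
    if req.storage_id is not None:
        lines.append(f"storage: {req.storage_id}")
    if req.runtime is not None:
        lines.append(f"runtime: {req.runtime}")
    return lines
```

A fast-path request that sets only `compute` and `access_public_keys` passes validation unchanged; the optional fields add review lines but never block creation.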

Post-Create Flow

Once the allocation is active, the allocation detail view should let the user manage or inspect:

  • SSH access,
  • attached storage,
  • managed runtime bundle,
  • launchable workload profiles,
  • restart,
  • metrics and health,
  • activity and evidence.

Capacity And Placement Model

Users launch a compute outcome, not a raw host implementation. Underneath that, GPUaaS can realize the request as:

  • a whole-node baremetal allocation, or
  • a gpu_slice allocation backed by approved host-local slot bundles.

For product and operations, the important distinction is that the control plane chooses the placement outcome and the node/runtime layer executes it. The platform does not infer slice safety from a GPU count alone.

See GPU Slicing And Scheduler Layers for the control-plane versus node-plane model.
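
The placement rule can be illustrated with a small decision function. This is a sketch under stated assumptions: the types, the `approved` flag, and the rule that a request for exactly one node's worth of GPUs maps to baremetal are all hypothetical simplifications, not the real scheduler.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SlotBundle:
    host_id: str
    gpu_count: int
    approved: bool  # set by an approval process, never derived from gpu_count


@dataclass(frozen=True)
class Placement:
    kind: str  # "baremetal" or "gpu_slice"
    host_id: str


def choose_placement(
    requested_gpus: int,
    gpus_per_node: int,
    free_nodes: List[str],
    bundles: List[SlotBundle],
) -> Optional[Placement]:
    """Control-plane decision; the node/runtime layer would execute the result."""
    # Whole-node request: bind an entire free host.
    if requested_gpus == gpus_per_node:
        return Placement("baremetal", free_nodes[0]) if free_nodes else None
    # Slice request: a matching GPU count is not enough on its own;
    # the bundle must also be explicitly approved.
    for bundle in bundles:
        if bundle.approved and bundle.gpu_count == requested_gpus:
            return Placement("gpu_slice", bundle.host_id)
    return None
```

Note that an unapproved bundle with the right GPU count is rejected, which mirrors the rule that slice safety is not inferred from a GPU count alone.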

Runtime Journey

User Experience Rules

  • Launch is a full-page wizard, not a modal.
  • Compute comes first because it is the primary purchase/intent.
  • Access comes early because users need to know how they will get in.
  • Storage is optional but durable and should not be hidden under a single workload.
  • Runtime is an overlay, not the base machine identity.
  • Restart preserves storage intent, SSH access state, and managed runtime association.
  • Collaboration uses each user's own public keys; the platform does not share private keys.
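
The restart rule above can be sketched as a state transition that changes only the lifecycle status. The `AllocationState` type and the `"restarting"` status value are hypothetical names used for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class AllocationState:
    status: str
    storage_ids: Tuple[str, ...]         # storage intent
    ssh_grant_user_ids: Tuple[str, ...]  # SSH access state
    runtime: Optional[str]               # managed runtime association


def restart(state: AllocationState) -> AllocationState:
    # Only the lifecycle status changes; storage intent, SSH grants,
    # and the runtime association carry over untouched.
    return replace(state, status="restarting")
```

Modeling the state as immutable makes it easy to check that a restart did not silently drop an attachment or a grant.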

Implementation Implications

| Product requirement | Implementation implication |
| --- | --- |
| One launch submit | Frontend should not loop over per-node calls; use one idempotent submit path |
| Visible lifecycle | Task and allocation states must remain discoverable after submit |
| Durable data | Storage pages and allocation detail must cross-link attachments |
| Runtime overlays | Managed runtime state should be changeable after allocation creation where supported |
| Access grants | Project-member SSH access needs explicit grants and auditability |
| Release/restart safety | Destructive or disruptive actions require confirmation and visible recovery state |
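
The "one launch submit" and "visible lifecycle" requirements can be illustrated with an idempotency key. The following is a minimal in-memory sketch; `LaunchService`, its methods, and the task fields are hypothetical stand-ins for a real control-plane endpoint.

```python
import uuid
from typing import Dict, List


class LaunchService:
    """In-memory stand-in for a control-plane submit endpoint (hypothetical)."""

    def __init__(self) -> None:
        self._tasks_by_key: Dict[str, dict] = {}

    def submit(self, idempotency_key: str, compute: str, node_count: int) -> dict:
        # A retried submit with the same key returns the original task
        # instead of creating a second launch.
        existing = self._tasks_by_key.get(idempotency_key)
        if existing is not None:
            return existing
        task = {
            "task_id": str(uuid.uuid4()),
            "state": "pending",
            "compute": compute,
            # One request covers all nodes; the client makes no per-node calls.
            "node_count": node_count,
        }
        self._tasks_by_key[idempotency_key] = task
        return task

    def list_tasks(self) -> List[dict]:
        # Tasks stay discoverable after submit.
        return list(self._tasks_by_key.values())
```

With this shape, a frontend can safely retry after a timeout: the same key yields the same task, and the task remains listable so the user can follow its state.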