GPU Slicing And Scheduler Layers

GPUaaS does not treat “GPU count” as enough scheduling truth. The platform separates three different concerns:

capacity shape
control-plane placement
node-local runtime execution

That separation is what lets the platform support whole-node leasing today and host-local GPU slice products without collapsing scheduling into ad hoc node scripts.

The Two Capacity Shapes

Shape	Meaning	Scheduling unit
`baremetal`	one allocation owns an entire enrolled GPU host	physical node
`gpu_slice`	one allocation owns one or more approved host-local resource bundles	resource slot set on one node

The first slice model is not “fractional GPU by inference”. It is an approved slot inventory per host. A slot is a schedulable bundle that can include:

GPU device identity
NVMe or volume identity
network or fabric device identity
NUMA locality
CPU and memory reservation metadata
VM/runtime metadata needed to launch safely

This is the key product and architecture rule: the platform schedules approved slots, not raw PCI devices and not just a GPU count.
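To make the slot idea concrete, here is a minimal sketch of what an approved slot record could look like. The field names, the `approved` flag, and the example PCI addresses are illustrative assumptions, not the platform's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Slot:
    """One schedulable resource bundle on a single host (hypothetical shape)."""
    slot_id: str
    node_id: str
    gpu_device: str                      # GPU device identity (assumed: a PCI address)
    nvme_device: Optional[str] = None    # NVMe or volume identity
    fabric_device: Optional[str] = None  # network or fabric device identity
    numa_node: Optional[int] = None      # NUMA locality
    cpu_cores: int = 0                   # CPU reservation metadata
    memory_gib: int = 0                  # memory reservation metadata
    approved: bool = False               # assumed flag: only approved slots are schedulable


def schedulable_slots(inventory: list[Slot]) -> list[Slot]:
    """Return only the slots that were approved for scheduling."""
    return [s for s in inventory if s.approved]


inventory = [
    Slot("slot-0", "node-a", "0000:17:00.0", numa_node=0,
         cpu_cores=16, memory_gib=128, approved=True),
    # Discovered on the host but not approved, so it is never offered for placement.
    Slot("slot-1", "node-a", "0000:65:00.0", numa_node=1,
         cpu_cores=16, memory_gib=128, approved=False),
]

print([s.slot_id for s in schedulable_slots(inventory)])
```

The point of the sketch is the filter: a device that exists on the host but has no approved slot record is invisible to scheduling.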

Control Plane Vs Node Plane

What The Control Plane Owns

The control plane is the authority for intent and placement. It owns:

the public SKU and capacity-shape model;
tenant, project, policy, and entitlement boundaries;
allocation intent and lifecycle;
placement candidate resolution;
durable resource claims;
read models shown to users, admins, and operators;
reconciliation when provider or node truth drifts.

Important consequence: GPUaaS should not let node-local runtime code invent its own product model. The control plane decides:

whether a request is `baremetal` or `gpu_slice`,
which node family is eligible,
which slot set is compatible,
whether a multi-slot request must stay on one node,
when a slot or node is blocked, draining, cleanup-blocked, or reusable.

What The Node Plane Owns

The node plane is the authority for host-local execution truth. It owns:

discovery of candidate slot topology on the host;
verification that the host is actually slice-ready;
realization of an approved allocation into a VM or runtime instance;
host-local cleanup and proof on release.

For slice mode, node-agent tasks already expose this boundary:

`slice.topology_discover`
`slice.vm_provision`
`slice.vm_release`

Those tasks are bounded host primitives. They are not a generic remote shell.
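One way to picture "bounded host primitives" is an allowlist dispatcher: only named tasks have handlers, and anything else is rejected. The sketch below is an assumption about shape only. The three task names come from the list above; the handler bodies, `dispatch`, and `UnknownTaskError` are hypothetical.

```python
from typing import Callable


class UnknownTaskError(Exception):
    """Raised when a task name is not in the allowlist."""


def topology_discover(params: dict) -> dict:
    # Placeholder: a real handler would inspect host topology.
    return {"task": "slice.topology_discover", "status": "ok"}


def vm_provision(params: dict) -> dict:
    # Placeholder: a real handler would launch a VM for an approved allocation.
    return {"task": "slice.vm_provision", "status": "ok"}


def vm_release(params: dict) -> dict:
    # Placeholder: a real handler would tear down the VM and emit cleanup proof.
    return {"task": "slice.vm_release", "status": "ok"}


# The allowlist is the whole execution surface; there is no fallback to a shell.
ALLOWED_TASKS: dict[str, Callable[[dict], dict]] = {
    "slice.topology_discover": topology_discover,
    "slice.vm_provision": vm_provision,
    "slice.vm_release": vm_release,
}


def dispatch(task_name: str, params: dict) -> dict:
    """Run a named task, or fail if the name is not allowlisted."""
    handler = ALLOWED_TASKS.get(task_name)
    if handler is None:
        raise UnknownTaskError(task_name)
    return handler(params)


print(dispatch("slice.vm_release", {"allocation_id": "example"})["status"])
```

A request for an unlisted name, such as a hypothetical `shell.exec`, raises `UnknownTaskError` instead of being executed.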

Scheduler Layers

There are two scheduler layers, and they solve different problems.

1. Control-plane scheduler

This is the platform scheduler layer. It decides:

which node family or node can satisfy the request;
whether the request is whole-node or slice-backed;
whether a compatible same-node slot set exists;
whether the request violates policy, health, drain, or occupancy rules;
which placement candidate wins.

This layer works with product constructs such as:

SKUs
capacity shapes
node occupancy
resource claims
policy overlays
placement candidates

2. Node or app runtime scheduler

This is the runtime execution layer on top of placed capacity. It decides how work runs inside already allocated capacity. Examples:

a VM runtime using a selected slice bundle;
Slurm or another scheduler app dispatching jobs inside project-owned capacity;
future app operators managing queues, workers, and runtime topology.

This layer must not replace the control-plane scheduler. App schedulers consume platform placement outcomes; they do not own node inventory truth.

Why This Separation Matters

Without the split, three failure modes show up quickly:

product SKUs drift from what hosts can actually run;
node-local scripts become the real scheduler, outside tenancy and audit;
runtime schedulers such as Slurm start leaking scheduler-specific behavior into core allocation APIs.

GPUaaS explicitly avoids that. The platform keeps scheduler-agnostic control primitives in core and pushes runtime-specific logic behind app/operator boundaries.

Placement And Occupancy Rules

The architecture docs already define occupancy as a derived aggregate, not a single “node busy” flag.

Node posture	Meaning
fully available	no blocking claims and node is schedulable
partially allocated	some approved slots are claimed
fully allocated by slices	no additional compatible slots remain
exclusively allocated by baremetal	whole-node claim blocks all slot use
draining	existing allocations continue, new placement stops
cleanup blocked	host or slot cleanup proof is not sufficient for reuse
unavailable	health or admin posture blocks placement
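
As a rough illustration of "derived aggregate", the posture in the table can be computed from several independent facts rather than stored as one flag. The input fields and the precedence order below are assumptions made for the sketch; the real rules live in the architecture docs. "No additional compatible slots remain" is simplified here to "every approved slot is claimed".

```python
from dataclasses import dataclass
from enum import Enum


class Posture(Enum):
    FULLY_AVAILABLE = "fully available"
    PARTIALLY_ALLOCATED = "partially allocated"
    FULLY_ALLOCATED_BY_SLICES = "fully allocated by slices"
    EXCLUSIVE_BAREMETAL = "exclusively allocated by baremetal"
    DRAINING = "draining"
    CLEANUP_BLOCKED = "cleanup blocked"
    UNAVAILABLE = "unavailable"


@dataclass
class NodeFacts:
    """Hypothetical inputs to the occupancy derivation."""
    healthy: bool = True
    admin_enabled: bool = True
    draining: bool = False
    cleanup_proven: bool = True
    baremetal_claimed: bool = False
    approved_slots: int = 0
    claimed_slots: int = 0


def derive_posture(n: NodeFacts) -> Posture:
    # Assumed precedence: blocking conditions win over occupancy counts.
    if not n.healthy or not n.admin_enabled:
        return Posture.UNAVAILABLE
    if not n.cleanup_proven:
        return Posture.CLEANUP_BLOCKED
    if n.draining:
        return Posture.DRAINING
    if n.baremetal_claimed:
        return Posture.EXCLUSIVE_BAREMETAL
    if n.claimed_slots == 0:
        return Posture.FULLY_AVAILABLE
    if n.claimed_slots < n.approved_slots:
        return Posture.PARTIALLY_ALLOCATED
    return Posture.FULLY_ALLOCATED_BY_SLICES


print(derive_posture(NodeFacts(approved_slots=4, claimed_slots=1)).value)
print(derive_posture(NodeFacts(approved_slots=4, draining=True)).value)
```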

For the first slice implementation, `gpu_slice` placement must fit on a single physical node. The control plane should fail closed with `sku_unavailable` when no same-node compatible slot set exists.
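
The same-node rule and the fail-closed behavior can be sketched as follows. Only the `sku_unavailable` code comes from the text; the function name, the input shape, and the tightest-fit tie-break are assumptions for illustration.

```python
class SkuUnavailable(Exception):
    """Placement failed closed; carries the `sku_unavailable` code."""
    code = "sku_unavailable"


def place_gpu_slice(free_slots_by_node: dict[str, list[str]],
                    slots_needed: int) -> tuple[str, list[str]]:
    """Pick one node that can hold the whole request, or fail closed.

    Slots are never combined across nodes, even if the fleet-wide
    total would be enough.
    """
    candidates = [
        (node, slots)
        for node, slots in sorted(free_slots_by_node.items())
        if len(slots) >= slots_needed
    ]
    if not candidates:
        raise SkuUnavailable(f"no single node has {slots_needed} free compatible slots")
    # Assumed policy: prefer the tightest fit to limit fragmentation.
    node, slots = min(candidates, key=lambda c: len(c[1]))
    return node, slots[:slots_needed]


free = {
    "node-a": ["a0"],
    "node-b": ["b0", "b1", "b2"],
    "node-c": ["c0", "c1"],
}

print(place_gpu_slice(free, 2))

try:
    # Six slots are free across the fleet, but no single node has four.
    place_gpu_slice(free, 4)
except SkuUnavailable as exc:
    print(exc.code)
```

The second call is the important case: capacity exists in aggregate, yet the request is refused rather than split across hosts.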

Intent, Observed State, And Repair

GPUaaS uses a hybrid control model:

desired state: what the platform intends
observed state: what provider or node evidence says is true
execution state: what the current workflow is doing
projected state: what the UI currently shows

That matters for slicing because a node may be:

intended for slice use,
observed as missing a required host prerequisite,
executing a cleanup or release workflow,
still projected in the UI as available or blocked depending on reconciliation.

The portal should make this easy to understand: placement is not the same as execution, and neither is the same as what the UI last projected.
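
A small sketch of keeping the four states apart, so that drift and in-flight work can be reasoned about separately from what the UI shows. The state values such as `slice_ready` and `idle` are hypothetical labels, not platform enums.

```python
from dataclasses import dataclass


@dataclass
class NodeSliceState:
    desired: str    # what the platform intends (assumed label, e.g. "slice_ready")
    observed: str   # what provider or node evidence reports
    execution: str  # what the current workflow is doing (assumed "idle" when none)
    projected: str  # what the UI currently shows; may lag reconciliation


def summarize(state: NodeSliceState) -> dict:
    """Report drift and in-flight work without consulting the UI projection."""
    return {
        "drifted": state.desired != state.observed,
        "workflow_in_flight": state.execution != "idle",
    }


state = NodeSliceState(
    desired="slice_ready",
    observed="missing_prerequisite",
    execution="cleanup_running",
    projected="available",
)

print(summarize(state))
```

Note that `projected` is deliberately not an input to `summarize`: in this sketch, decisions depend on intent and evidence, while the projection is only an output that reconciliation refreshes.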

What Product And Ops Should Read From This

Product view

GPU slice is a first-class product shape, not a hidden runtime detail.
A user buys a capacity shape and SKU outcome, not a raw host implementation.
Future MIG, vGPU, or shared-GPU products should land as explicit shapes or child-slot models, not by overloading the first slice contract.

Ops and infra view

slice capability is a promotion decision with readiness evidence, not a hardware guess;
approved slot inventory is the scheduling source of truth;
cleanup proof is part of safe reuse, not an afterthought;
node-agent remains the bounded execution surface for host-local operations.

Architecture and developer view

core allocation and placement logic stays scheduler-agnostic;
runtime schedulers such as Slurm belong behind the app/operator boundary;
placement correctness depends on claims and slots, not UI read models.
