
Platform Proof Points (status: implemented)

This page exists to correct a recurring portal problem: the codebase proves more than the portal currently claims. These are not roadmap aspirations. They are shipped proof points already present in the repo and runtime model.

The Main Correction

GPUaaS should not be described only as:

  • a readiness packet,
  • a GPU allocation console,
  • or an App SDK direction.

It already proves a reusable platform through working controllers, enforced architecture boundaries, a deep node runtime, and compliance-grade custody paths.

Proof Map

If you need the reviewer-first answer to "what is implemented, what is production-shaped, and what is still partial", start with Platform Capability Summary.

1. App SDK Proof: Two Real Controllers Exist

The strongest proof that the App SDK and app-platform composition model are real is not the manifest docs. It is the existence of two non-trivial reference controllers built on the same platform:

| Reference controller | Current proof in repo | What it proves |
| --- | --- | --- |
| Slurm reference controller | cmd/slurm-reference-controller/main.go with about 2.9K non-test Go lines | scheduler lifecycle, cluster/member reconciliation, runtime state reporting, token and artifact mediation |
| RKE2 self-managed controller | cmd/rke2-self-managed-controller/main.go with about 1.7K non-test Go lines | cluster bootstrap, server/agent reconcile, join/drain/member lifecycle, app-runtime-style orchestration |

That means the right claim today is:

  • the App SDK path is not hypothetical;
  • two working distributed-systems controllers already prove the composition model;
  • broader self-service productization can still be incomplete without erasing that shipped proof.
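For readers unfamiliar with what "member reconciliation" and "join/drain lifecycle" involve, the core idea can be sketched as a diff between desired and observed membership. This is an illustrative sketch only, not code from either reference controller: the `Action` type, the `Plan` function, and the node names are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Action is one step a controller would take to converge membership.
// Hypothetical type for illustration; not taken from the repo.
type Action struct {
	Kind string // "join" or "drain"
	Node string
}

// Plan compares desired and observed cluster members and returns the
// joins and drains needed to make observed match desired.
func Plan(desired, observed []string) []Action {
	want := make(map[string]bool, len(desired))
	for _, n := range desired {
		want[n] = true
	}
	have := make(map[string]bool, len(observed))
	for _, n := range observed {
		have[n] = true
	}

	var actions []Action
	for n := range want {
		if !have[n] {
			actions = append(actions, Action{Kind: "join", Node: n})
		}
	}
	for n := range have {
		if !want[n] {
			actions = append(actions, Action{Kind: "drain", Node: n})
		}
	}

	// Sort by kind, then node, so the plan is deterministic across runs.
	sort.Slice(actions, func(i, j int) bool {
		if actions[i].Kind != actions[j].Kind {
			return actions[i].Kind < actions[j].Kind
		}
		return actions[i].Node < actions[j].Node
	})
	return actions
}

func main() {
	desired := []string{"node-a", "node-b", "node-c"}
	observed := []string{"node-a", "node-d"}
	for _, a := range Plan(desired, observed) {
		fmt.Println(a.Kind, a.Node)
	}
}
```

A real controller would run a plan like this on every reconcile pass and apply each action idempotently, so a crash between steps is repaired on the next pass.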

Open App SDK Proof for the builder-oriented explanation of this same claim.

2. Architecture Is Enforced, Not Just Documented

GPUaaS already has an executable architecture-enforcement system, not only architecture diagrams.

The boundary guard at scripts/ci/platform_foundation_boundary_report.sh checks:

  • import boundaries
  • schema ownership
  • event ownership
  • route placement
  • frontend boundary drift
  • guarded debt with explicit owner and expiry handling

This is what makes the platform/product split credible. It means:

  • shared services are not just a naming convention;
  • later products do not get to silently fork platform authority;
  • architecture violations can fail CI instead of becoming cleanup work months later.

3. The Node Agent Is A Real Runtime Subsystem

The node agent is one of the deepest pieces of engineering in the repo.

The implementation already includes:

  • pull-based typed-task execution
  • mTLS enrollment and cert lifecycle
  • terminal stream bridging
  • allocation user lifecycle
  • storage/runtime cleanup tasks
  • GPU slice topology discovery
  • slice VM provision and release
  • bounded host runtime operations

The current implementation spread is substantial:

  • cmd/node-agent/**: about 11K non-test Go lines
  • core execution surfaces include agent.go, catalog.go, slice_vm.go, slice_topology.go, oci_workload.go, and terminal_stream.go

This is not “SSH plus some scripts.” It is a platform-controlled runtime executor with bounded host authority.

Open Node Agent Runtime Depth for the deeper runtime and authority view.

4. Compliance And Custody Are Already First-Class

The current portal can over-focus on readiness gaps and hide the compliance substrate that is already shipped.

What already exists:

| Area | Current proof |
| --- | --- |
| Audit integrity | chained integrity and replication logic in packages/platform/audit/** |
| Evidence-first posture | release, UAT, rollback, and approval artifacts are first-class operating surfaces |
| Immutable money path | billing types/service model in packages/platform/billing/** |
| Scoped shared authority | IAM, policy, audit, billing, and registry boundaries are separated from product code |

Combined, these are the beginnings of a defensible sovereign and regulated platform substrate, even where final external claims, deployment profiles, or certification language are still appropriately blocked.
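As background on what "chained integrity" usually means for an audit log, here is a minimal hash-chain sketch. It assumes SHA-256 and a simple field layout chosen for this example; the actual scheme in packages/platform/audit/** is not shown on this page and may differ. `Entry`, `Append`, and `Verify` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Entry is one audit record linked to its predecessor by hash.
type Entry struct {
	Seq      int
	Actor    string
	Action   string
	PrevHash string
	Hash     string
}

// digest hashes the entry's content together with the previous hash.
// %q quotes the strings so field boundaries cannot be forged.
func digest(seq int, actor, action, prev string) string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%d|%q|%q|%s", seq, actor, action, prev)))
	return hex.EncodeToString(sum[:])
}

// Append adds a new entry whose hash covers the previous entry's hash.
func Append(log []Entry, actor, action string) []Entry {
	prev := ""
	if n := len(log); n > 0 {
		prev = log[n-1].Hash
	}
	e := Entry{Seq: len(log), Actor: actor, Action: action, PrevHash: prev}
	e.Hash = digest(e.Seq, e.Actor, e.Action, e.PrevHash)
	return append(log, e)
}

// Verify walks the chain and returns the index of the first entry that
// fails validation, or -1 if the whole chain is intact.
func Verify(log []Entry) int {
	prev := ""
	for i, e := range log {
		if e.Seq != i || e.PrevHash != prev ||
			e.Hash != digest(e.Seq, e.Actor, e.Action, e.PrevHash) {
			return i
		}
		prev = e.Hash
	}
	return -1
}

func main() {
	var log []Entry
	log = Append(log, "alice", "allocation.create")
	log = Append(log, "bob", "invoice.issue")
	fmt.Println("intact:", Verify(log) == -1) // prints "intact: true"

	// Tampering with an earlier entry is detected at that entry.
	log[0].Action = "allocation.delete"
	fmt.Println("first broken entry:", Verify(log)) // prints "first broken entry: 0"
}
```

Because each hash covers its predecessor, rewriting any past entry forces every later hash to be recomputed. Replicating the log, or just its latest hash, to an independent location is what makes such a rewrite detectable.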

5. What This Means For Portal Calibration

The portal should keep caution where caution belongs:

  • do not claim sovereign deployment just because custody primitives exist;
  • do not claim full platform-managed app maturity just because reference controllers exist;
  • do not claim certification or production posture without evidence.

But the portal should also stop labeling shipped proof as if it were only a design concept.

Best Current Claims

| Claim | Current truth |
| --- | --- |
| shared-platform model | implemented and enforced |
| App SDK proof | implemented through shipped Slurm and RKE2 reference controllers |
| node runtime depth | implemented through the node-agent task/runtime model |
| audit and ledger custody | implemented as platform primitives |
| full production readiness everywhere | still evidence- and environment-dependent |

Where To Go Next

| If you want to inspect... | Open this |
| --- | --- |
| platform/product split | Platform Shared Services |
| strongest existing platform story | What GPUaaS Already Proves as a Platform |
| App SDK path | Build on AI Cloud |
| node runtime model | Node Agent Runtime Depth, Node Lifecycle, and GPU Slicing And Scheduler Layers |
| production/readiness caution | Security & Production Readiness |