Platform Proof Points implemented
This page exists to correct a recurring portal problem: the codebase proves more than the portal currently claims. These are not roadmap aspirations. They are shipped proof points already present in the repo and runtime model.
The Main Correction
GPUaaS should not be described only as:
- a readiness packet,
- a GPU allocation console,
- or an App SDK direction.
It already proves a reusable platform through working controllers, enforced architecture boundaries, a deep node runtime, and compliance-grade custody paths.
Proof Map
If you need the reviewer-first answer to "what is implemented, what is production-shaped, and what is still partial", start with Platform Capability Summary.
1. App SDK Proof: Two Real Controllers Exist
The strongest proof that the App SDK and app-platform composition model are real is not the manifest docs. It is the existence of two non-trivial reference controllers built on the same platform:
| Reference controller | Current proof in repo | What it proves |
|---|---|---|
| Slurm reference controller | cmd/slurm-reference-controller/main.go with about 2.9K non-test Go lines | scheduler lifecycle, cluster/member reconciliation, runtime state reporting, token and artifact mediation |
| RKE2 self-managed controller | cmd/rke2-self-managed-controller/main.go with about 1.7K non-test Go lines | cluster bootstrap, server/agent reconcile, join/drain/member lifecycle, app-runtime-style orchestration |
That means the right claim today is:
- the App SDK path is not hypothetical;
- two working distributed-systems controllers already prove the composition model;
- broader self-service productization can still be incomplete without erasing that shipped proof.
Open App SDK Proof for the builder-oriented explanation of this same claim.
2. Architecture Is Enforced, Not Just Documented
GPUaaS already has an executable architecture-enforcement system, not only architecture diagrams.
The boundary guard at scripts/ci/platform_foundation_boundary_report.sh checks:
- import boundaries
- schema ownership
- event ownership
- route placement
- frontend boundary drift
- guarded debt with explicit owner and expiry handling
This is what makes the platform/product split credible. It means:
- shared services are not just a naming convention;
- later products do not get to silently fork platform authority;
- architecture violations can fail CI instead of becoming cleanup work months later.
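As an illustration of what an import-boundary check involves, here is a minimal Go sketch. It is not the logic of the shell guard named above, whose internals are not shown on this page. The module path `example.com/gpuaas/...`, the `forbiddenPrefix` rule, and the `boundaryViolations` helper are all assumptions made for the example. It parses only the import block of a source file and reports any import that crosses a forbidden prefix.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// forbiddenPrefix is a hypothetical rule: platform packages must not import
// product code. The module path is invented for this sketch.
const forbiddenPrefix = "example.com/gpuaas/packages/products/"

// boundaryViolations parses only the import declarations of src and returns
// every import path that starts with the forbidden prefix.
func boundaryViolations(filename, src, forbidden string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var violations []string
	for _, imp := range file.Imports {
		// Import paths are stored as quoted string literals in the AST.
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		if strings.HasPrefix(path, forbidden) {
			violations = append(violations, path)
		}
	}
	return violations, nil
}

func main() {
	// A hypothetical platform file that reaches into product code.
	src := `package audit

import (
	"fmt"
	"example.com/gpuaas/packages/products/console"
)
`
	v, err := boundaryViolations("audit.go", src, forbiddenPrefix)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(len(v), "violation(s):", v)
}
```

A CI job built on a check like this would exit non-zero when the violation list is non-empty, which is what turns a documented boundary into an enforced one.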
3. The Node Agent Is A Real Runtime Subsystem
The node agent is one of the deepest pieces of engineering in the repo.
The implementation already includes:
- pull-based typed-task execution
- mTLS enrollment and cert lifecycle
- terminal stream bridging
- allocation user lifecycle
- storage/runtime cleanup tasks
- GPU slice topology discovery
- slice VM provision and release
- bounded host runtime operations
The current implementation spread is substantial:
- cmd/node-agent/**: about 11K non-test Go lines
- core execution surfaces include agent.go, catalog.go, slice_vm.go, slice_topology.go, oci_workload.go, and terminal_stream.go
This is not “SSH plus some scripts.” It is a platform-controlled runtime executor with bounded host authority.
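To make "pull-based typed-task execution" and "bounded host authority" concrete, here is a minimal Go sketch of the general pattern. It is not the code in cmd/node-agent. The `Task`, `Agent`, and `queuePuller` names, and the task type strings, are hypothetical. The agent pulls tasks, dispatches them through a fixed catalog of handlers, and refuses any task type the catalog does not contain.

```go
package main

import (
	"errors"
	"fmt"
)

// Task is a hypothetical typed task as an agent might receive it.
type Task struct {
	ID      string
	Type    string
	Payload map[string]string
}

// Handler executes exactly one task type.
type Handler func(Task) error

// ErrUnknownTaskType is returned for task types with no registered handler.
var ErrUnknownTaskType = errors.New("unknown task type")

// Agent holds the fixed catalog of task types it is allowed to run.
type Agent struct {
	catalog map[string]Handler
}

func NewAgent() *Agent { return &Agent{catalog: make(map[string]Handler)} }

// Register adds a handler for one task type to the catalog.
func (a *Agent) Register(taskType string, h Handler) { a.catalog[taskType] = h }

// Execute dispatches a task to its handler and rejects unknown types, so the
// agent's host authority is limited to what the catalog contains.
func (a *Agent) Execute(t Task) error {
	h, ok := a.catalog[t.Type]
	if !ok {
		return fmt.Errorf("%w: %q", ErrUnknownTaskType, t.Type)
	}
	return h(t)
}

// PullAndRun keeps pulling until the source is empty. It returns how many
// tasks completed and how many were rejected or returned an error.
func (a *Agent) PullAndRun(pull func() (Task, bool)) (done, failed int) {
	for {
		t, ok := pull()
		if !ok {
			return done, failed
		}
		if err := a.Execute(t); err != nil {
			failed++
			continue
		}
		done++
	}
}

// queuePuller is a stand-in for polling a control plane: it serves tasks
// from a fixed slice, one per call.
func queuePuller(tasks []Task) func() (Task, bool) {
	i := 0
	return func() (Task, bool) {
		if i >= len(tasks) {
			return Task{}, false
		}
		t := tasks[i]
		i++
		return t, true
	}
}

func main() {
	agent := NewAgent()
	agent.Register("slice_vm.release", func(t Task) error {
		fmt.Println("releasing slice VM", t.Payload["vm_id"])
		return nil
	})
	pull := queuePuller([]Task{
		{ID: "t1", Type: "slice_vm.release", Payload: map[string]string{"vm_id": "vm-42"}},
		{ID: "t2", Type: "shell.exec", Payload: map[string]string{"cmd": "uptime"}},
	})
	done, failed := agent.PullAndRun(pull)
	fmt.Printf("done=%d failed=%d\n", done, failed)
}
```

The design point is that the agent initiates every exchange and only runs named, typed operations. A request for an arbitrary shell command has no handler and is refused, which is the difference from an SSH-style model.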
Open Node Agent Runtime Depth for the deeper runtime and authority view.
4. Compliance And Custody Are Already First-Class
The current portal can over-focus on readiness gaps and hide the compliance substrate that is already shipped.
What already exists:
| Area | Current proof |
|---|---|
| Audit integrity | chained integrity and replication logic in packages/platform/audit/** |
| Evidence-first posture | release, UAT, rollback, and approval artifacts are first-class operating surfaces |
| Immutable money path | billing types/service model in packages/platform/billing/** |
| Scoped shared authority | IAM, policy, audit, billing, and registry boundaries are separated from product code |
Combined, these are the beginnings of a defensible sovereign and regulated platform substrate, even where final external claims, deployment profiles, or certification language are still appropriately blocked.
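The "chained integrity" idea in the audit row can be illustrated with a generic hash chain. The sketch below is an assumption-laden example, not the implementation in packages/platform/audit. The `Entry` fields, the `Append` and `Verify` helpers, and the choice of SHA-256 are all hypothetical. Each entry commits to the hash of the previous one, so editing any earlier record breaks verification from that point on.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Entry is a hypothetical audit record that commits to its predecessor.
type Entry struct {
	Actor    string
	Action   string
	PrevHash string
	Hash     string
}

// entryHash hashes the previous hash together with the entry's fields.
// A NUL separator keeps field boundaries unambiguous.
func entryHash(prevHash, actor, action string) string {
	sum := sha256.Sum256([]byte(prevHash + "\x00" + actor + "\x00" + action))
	return hex.EncodeToString(sum[:])
}

// Append links a new entry to the current tail of the log.
func Append(log []Entry, actor, action string) []Entry {
	prev := ""
	if len(log) > 0 {
		prev = log[len(log)-1].Hash
	}
	return append(log, Entry{
		Actor:    actor,
		Action:   action,
		PrevHash: prev,
		Hash:     entryHash(prev, actor, action),
	})
}

// Verify recomputes the chain and returns the index of the first entry that
// does not match, or -1 if the whole log is intact.
func Verify(log []Entry) int {
	prev := ""
	for i, e := range log {
		if e.PrevHash != prev || e.Hash != entryHash(e.PrevHash, e.Actor, e.Action) {
			return i
		}
		prev = e.Hash
	}
	return -1
}

func main() {
	var log []Entry
	log = Append(log, "alice", "allocation.create")
	log = Append(log, "bob", "allocation.release")
	fmt.Println("intact:", Verify(log) == -1)

	// Tamper with the first record without recomputing its hash.
	log[0].Action = "allocation.delete"
	fmt.Println("first broken entry after tampering:", Verify(log))
}
```

A chain like this only detects tampering. Resisting an attacker who rewrites the whole log would additionally need the tail hash to be anchored somewhere independent, which is plausibly where replication comes in, though this page does not describe that mechanism.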
5. What This Means For Portal Calibration
The portal should keep caution where caution belongs:
- do not claim sovereign deployment just because custody primitives exist;
- do not claim full platform-managed app maturity just because reference controllers exist;
- do not claim certification or production posture without evidence.
But the portal should also stop labeling shipped proof as if it were only a design concept.
Best Current Claims
| Claim | Current truth |
|---|---|
| shared-platform model | implemented and enforced |
| App SDK proof | implemented through shipped Slurm and RKE2 reference controllers |
| node runtime depth | implemented through the node-agent task/runtime model |
| audit and ledger custody | implemented as platform primitives |
| full production readiness everywhere | still evidence- and environment-dependent |
Where To Go Next
| If you want to inspect... | Open this |
|---|---|
| platform/product split | Platform Shared Services |
| strongest existing platform story | What GPUaaS Already Proves as a Platform |
| App SDK path | Build on AI Cloud |
| node runtime model | Node Agent Runtime Depth, Node Lifecycle, and GPU Slicing And Scheduler Layers |
| production/readiness caution | Security & Production Readiness |