
Platform Overview (implemented)

GPUaaS is a GPU capacity and app platform. Users provision GPU-backed runtime capacity, access it through SSH or a browser terminal, attach storage, and pay for usage. Operators manage the control plane, worker fleet, node fleet, release evidence, and production readiness. Developers build apps on top of the platform through contracts, manifests, and the App SDK path.

What Is Easy To Miss

The product is also the first concrete consumer of a broader shared-platform model. The repo already contains reusable platform contracts for:

  • IAM and scoped authorization;
  • billing, metering, and immutable ledger custody;
  • audit and evidence capture;
  • policy, quota, and registry contracts;
  • runtime access surfaces beyond simple SSH.

That matters because future products such as Token Factory are intended to build on those contracts rather than fork the platform.
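As an illustration of what "scoped authorization" over the org, department, and project hierarchy can mean, here is a minimal sketch. It assumes that a grant made at one scope applies to every scope beneath it. The names `Grant`, `Scope`, `is_allowed`, and the action string `instance:launch` are hypothetical and are not the platform's IAM contract.

```python
from dataclasses import dataclass

# Hypothetical scope model: a path from org down to a resource,
# e.g. ("acme", "research", "llm-train", "node-42").
Scope = tuple[str, ...]


@dataclass(frozen=True)
class Grant:
    principal: str
    action: str
    scope: Scope  # assumed to cover this scope and everything beneath it


def is_allowed(grants: list[Grant], principal: str, action: str, target: Scope) -> bool:
    """Allow if any matching grant's scope is a prefix of the target scope."""
    return any(
        g.principal == principal
        and g.action == action
        and target[: len(g.scope)] == g.scope
        for g in grants
    )


grants = [Grant("alice", "instance:launch", ("acme", "research"))]
print(is_allowed(grants, "alice", "instance:launch", ("acme", "research", "llm-train")))  # True
print(is_allowed(grants, "alice", "instance:launch", ("acme", "finance", "ledger")))      # False
```

Prefix matching keeps the check independent of hierarchy depth. A department-level grant covers every project and resource below it without being copied to each of them.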

Overview Decision Route

If the reader needs to know... | Open this first | Then go here
--- | --- | ---
what the product actually does today | Use GPUaaS | Product Team Handoff
how the platform is built and operated | Architecture | Operators
how developers integrate with it | Build on GPUaaS | Developer APIs
whether it is ready for internal or external review | Security & Production Readiness | Portal Roadmap

What Has Been Built

Capability | What exists | Portal path
--- | --- | ---
User and tenant workflows | Launch, access, billing, storage, troubleshooting, and tenant-admin journeys are mapped to the v3 product information architecture (IA) | Use GPUaaS
App developer surface | App SDK docs plus two working reference controllers, for Slurm and RKE2, that demonstrate the composition model | Build on GPUaaS
API and event contracts | REST and AsyncAPI artifacts, auth/access guidance, error model, idempotency, and contract sync | Developer APIs
Platform foundation | Shared services model, domain ownership, code/deployment architecture, enforced architecture guards, and gap portfolio | Architecture
Security and readiness | Current controls, production-readiness gaps, release evidence, and guard graduation model | Security & Production Readiness
Operations | Release operations, runbook index, day-2 management model, and observability and evidence expectations | Operators
Documentation governance | Source-of-truth rules, publication tracks, visual standards, review guide, and Fairway evidence | Portal Roadmap
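The "idempotency" item in the API contracts row usually means that a client can safely retry a mutating request. The sketch below shows one common way a server honours a client-supplied idempotency key. It assumes that keys are scoped per tenant and that reusing a key with a different payload is an error. `IdempotencyStore`, `execute`, and the `launch` handler are hypothetical and are not the GPUaaS API. The in-memory dict stands in for durable storage with a TTL and is not safe under concurrent requests.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class IdempotencyStore:
    """Hypothetical in-memory store; a real service would persist entries with a TTL."""

    _entries: dict[tuple[str, str], tuple[str, Any]] = field(default_factory=dict)

    def execute(self, tenant: str, key: str, body: dict, handler: Callable[[dict], Any]) -> Any:
        # Fingerprint the canonicalised body so key reuse with a different payload is caught.
        fingerprint = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        slot = (tenant, key)
        if slot in self._entries:
            stored_fp, stored_response = self._entries[slot]
            if stored_fp != fingerprint:
                raise ValueError("idempotency key reused with a different request body")
            return stored_response  # replay: return the original result without re-running
        response = handler(body)
        self._entries[slot] = (fingerprint, response)
        return response


calls: list[dict] = []


def launch(body: dict) -> dict:
    # Hypothetical side-effecting handler, e.g. creating a GPU instance.
    calls.append(body)
    return {"instance_id": f"i-{len(calls)}"}


store = IdempotencyStore()
first = store.execute("acme", "req-001", {"gpu": "H100", "count": 1}, launch)
retry = store.execute("acme", "req-001", {"count": 1, "gpu": "H100"}, launch)
print(first == retry, len(calls))  # True 1
```

Because the body is canonicalised with `sort_keys=True` before hashing, a retry whose fields arrive in a different order still counts as the same request.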

Why The Platform Is More Than A Demo Stack

Differentiator | Why it matters
--- | ---
Shared-platform service model | Makes GPUaaS a reusable platform foundation, not a one-off product
Shipped proof points | Slurm, RKE2, the node-agent runtime, and boundary guards demonstrate working platform behavior, not only design intent
Hierarchical IAM and billing attribution | Keeps org, department, project, and resource ownership coherent
Multiple runtime surface families | Terminal, browser app, API app, metrics, and platform-admin tools are modeled separately
Evidence-first operator path | Release, UAT, security, and rollback proofs are first-class, not afterthoughts
Config-driven promotion model | kind, dev, demo, staging, and, later, prod are intended to be driven by one automation family
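Hierarchical billing attribution can be pictured as follows: each metered usage record carries its full ownership path, and totals roll up to every ancestor scope. This is a minimal sketch. The four-level `(org, department, project, resource)` path, the `gpu_hours` unit, and the names `UsageRecord` and `rollup` are assumptions made for illustration. Records are frozen to echo the append-only ledger idea, but this sketch is not the platform's metering or ledger contract.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    # Hypothetical attribution path: (org, department, project, resource).
    path: tuple[str, str, str, str]
    gpu_hours: float


def rollup(records: list[UsageRecord]) -> dict[tuple[str, ...], float]:
    """Attribute each record to its resource and to every ancestor scope."""
    totals: dict[tuple[str, ...], float] = defaultdict(float)
    for rec in records:
        for depth in range(1, len(rec.path) + 1):
            totals[rec.path[:depth]] += rec.gpu_hours
    return dict(totals)


records = [
    UsageRecord(("acme", "research", "llm", "node-1"), 2.0),
    UsageRecord(("acme", "research", "vision", "node-2"), 1.5),
    UsageRecord(("acme", "ops", "infra", "node-3"), 0.5),
]
totals = rollup(records)
print(totals[("acme",)], totals[("acme", "research")])  # 4.0 3.5
```

Attributing each record to its full path at write time means that org, department, and project totals always reconcile with the resource-level records. No separate roll-up tables are needed.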

Product Shape

The platform has four main user-facing surfaces:

Surface | Primary audience | Purpose
--- | --- | ---
GPUaaS console | Users and tenant admins | Find capacity, launch workloads, manage access, storage, and billing
Operator/Admin surfaces | Platform operators and support | Manage inventory, releases, incidents, audits, and readiness evidence
App platform | Internal, partner, and future external developers | Package and publish apps that run on GPUaaS without owning the full platform
Developer APIs and CLI | API consumers and automation | Integrate through stable REST/event contracts and generated artifacts

What This Page Should Settle Fast

  • Is this primarily a user product question, an operator question, a developer integration question, or a review/readiness question?
  • Which top-level surface owns the next step?
  • Is the reader looking for implemented behavior or roadmap/readiness context?

Capability Map

How To Read The Portal

Start with this overview when you need the product shape. Use System Overview when you need the runtime and control-plane flow. Use Day-2 Operations when you need to understand how operators manage the platform after deployment.

Authority Rule

This page is a front door, not the source of truth. It should route a reader to the correct owned section quickly, then get out of the way.