Platform Overview (status: implemented)
GPUaaS is a GPU capacity and app platform. Users provision GPU-backed runtime capacity, access it through SSH or browser terminal, attach storage, and pay for usage. Operators manage the control plane, worker fleet, node fleet, release evidence, and production readiness. Developers build apps on top of the platform through contracts, manifests, and the App SDK path.
What Is Easy To Miss
The product is also the first concrete consumer of a broader shared-platform model. The repo already contains reusable platform contracts for:
- IAM and scoped authorization;
- billing, metering, and immutable ledger custody;
- audit and evidence capture;
- policy, quota, and registry contracts;
- runtime access surfaces beyond simple SSH.
That matters because future products such as Token Factory are intended to build on those contracts rather than fork the platform.
Overview Decision Route
| If the reader needs to know... | Open this first | Then go here |
|---|---|---|
| what the product actually does today | Use GPUaaS | Product Team Handoff |
| how the platform is built and operated | Architecture | Operators |
| how developers integrate with it | Build on GPUaaS | Developer APIs |
| whether it is ready for internal or external review | Security & Production Readiness | Portal Roadmap |
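The decision route above is effectively a lookup table, and a minimal sketch can make that explicit. The short keys (`product`, `operations`, `integration`, `readiness`) and the `route` helper are hypothetical names chosen for illustration; only the section titles come from the table.

```python
# Route table transcribed from the decision route above.
# The dictionary keys are illustrative labels, not portal identifiers.
ROUTES = {
    "product": ("Use GPUaaS", "Product Team Handoff"),
    "operations": ("Architecture", "Operators"),
    "integration": ("Build on GPUaaS", "Developer APIs"),
    "readiness": ("Security & Production Readiness", "Portal Roadmap"),
}


def route(question_kind: str) -> tuple[str, str]:
    """Return (open_first, then_go_here) for a kind of reader question."""
    try:
        return ROUTES[question_kind]
    except KeyError:
        known = ", ".join(sorted(ROUTES))
        raise ValueError(
            f"unknown question kind {question_kind!r}; expected one of: {known}"
        ) from None


first, then = route("integration")
print(f"Open {first} first, then {then}.")
```

Keeping the route as data rather than prose means a new top-level section only needs one new entry.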
What Has Been Built
| Capability | What exists | Portal path |
|---|---|---|
| User and tenant workflows | Launch, access, billing, storage, troubleshooting, and tenant-admin journeys are mapped to the v3 product information architecture (IA) | Use GPUaaS |
| App developer surface | App SDK docs plus two working reference controllers that prove the composition model with Slurm and RKE2 | Build on GPUaaS |
| API and event contracts | REST and AsyncAPI artifacts, auth/access guidance, error model, idempotency, and contract sync | Developer APIs |
| Platform foundation | Shared services model, domain ownership, code/deployment architecture, enforced architecture guards, and gap portfolio | Architecture |
| Security and readiness | Current controls, production-readiness gaps, release evidence, and guard graduation model | Security & Production Readiness |
| Operations | Release operations, runbook index, day-2 management model, observability and evidence expectations | Operators |
| Documentation governance | Source-of-truth rules, publication tracks, visual standards, review guide, and Fairway evidence | Portal Roadmap |
Why The Platform Is More Than A Demo Stack
| Differentiator | Why it matters |
|---|---|
| Shared-platform service model | Makes GPUaaS a reusable platform foundation, not a one-off product |
| Shipped proof points | Slurm, RKE2, node-agent runtime, and boundary guards show platform behavior, not only design intent |
| Hierarchical IAM and billing attribution | Keeps org, department, project, and resource ownership coherent |
| Multiple runtime surface families | Terminal, browser app, API app, metrics, and platform-admin tools are modeled separately |
| Evidence-first operator path | Release, UAT, security, and rollback proofs are first-class, not afterthoughts |
| Config-driven promotion model | The kind, dev, demo, staging, and later prod environments are meant to share one automation family |
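To illustrate the hierarchical attribution row above, here is a minimal sketch of rolling usage up an org, department, project, and resource path so that every level sees a consistent total. The `UsageRecord` type, its field names, the `roll_up` helper, and the sample values are all assumptions for illustration; they are not taken from the platform's billing or metering contracts.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One metered usage line. All field names are illustrative."""
    org: str
    department: str
    project: str
    resource: str
    gpu_hours: float


def roll_up(records):
    """Sum GPU hours at every level of the ownership path."""
    totals = defaultdict(float)
    for r in records:
        path = (r.org, r.department, r.project, r.resource)
        # Credit each prefix of the path: org, org/department, and so on.
        for depth in range(1, len(path) + 1):
            totals["/".join(path[:depth])] += r.gpu_hours
    return dict(totals)


# Hypothetical sample data.
records = [
    UsageRecord("acme", "research", "vision", "gpu-node-1", 2.0),
    UsageRecord("acme", "research", "nlp", "gpu-node-2", 3.0),
    UsageRecord("acme", "platform", "ci", "gpu-node-3", 1.5),
]
totals = roll_up(records)
print(totals["acme"])           # 6.5
print(totals["acme/research"])  # 5.0
```

Because every record carries its full ownership path, the totals at each level are derived from the same source lines and cannot drift apart.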
Product Shape
The platform has four main user-facing surfaces:
| Surface | Primary audience | Purpose |
|---|---|---|
| GPUaaS console | Users and tenant admins | Find capacity, launch workloads, manage access, storage, and billing |
| Operator/Admin surfaces | Platform operators and support | Manage inventory, releases, incidents, audits, and readiness evidence |
| App platform | Internal, partner, and future external developers | Package and publish apps that run on GPUaaS without owning the full platform |
| Developer APIs and CLI | API consumers and automation | Integrate through stable REST/event contracts and generated artifacts |
What This Page Should Settle Fast
- Is this primarily a user product question, an operator question, a developer integration question, or a review/readiness question?
- Which top-level surface owns the next step?
- Is the reader looking for implemented behavior or roadmap/readiness context?
Capability Map
How To Read The Portal
Start with this overview when you need the product shape. Use System Overview when you need the runtime and control-plane flow. Use Day-2 Operations when you need to understand how operators manage the platform after deployment.
Authority Rule
This page is a front door, not the source of truth. It should route a reader to the correct owned section quickly, then get out of the way.