Architecture and Design Principles
This page states the main design principle in plain language.
Main Principle
GPUaaS is a contract-first control plane on top of shared platform services.
That means:
- product behavior starts from explicit contracts, not accidental runtime behavior;
- shared concerns such as IAM, billing, audit, evidence, policy, and status should not be reimplemented per product;
- product domains own customer value and runtime-specific logic on top of those shared services;
- environments and providers are profiles, not the product model itself.
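To make the last two points concrete, here is a minimal sketch of a product-level contract sitting in front of interchangeable adapters. Every name in it (`AllocationRequest`, `NodeProvider`, `FakeMaasAdapter`, `FakeK8sAdapter`, `PROFILES`, `allocate`) is hypothetical and chosen for illustration; none of it is taken from the GPUaaS codebase.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class AllocationRequest:
    """Product-level contract: no provider names or provider fields appear here."""
    project_id: str
    gpu_count: int


class NodeProvider(Protocol):
    """Hypothetical adapter boundary that every execution provider implements."""

    def provision(self, request: AllocationRequest) -> str: ...


class FakeMaasAdapter:
    """Stand-in adapter; a real one would call the provider's API."""

    def provision(self, request: AllocationRequest) -> str:
        return f"maas-node-for-{request.project_id}-x{request.gpu_count}"


class FakeK8sAdapter:
    """Second stand-in adapter, used to show that the caller does not change."""

    def provision(self, request: AllocationRequest) -> str:
        return f"k8s-pod-for-{request.project_id}-x{request.gpu_count}"


# An environment profile selects the adapter; the product model stays the same.
PROFILES: dict[str, NodeProvider] = {
    "kind": FakeK8sAdapter(),
    "staging": FakeMaasAdapter(),
}


def allocate(profile: str, request: AllocationRequest) -> str:
    """Product code depends only on the contract and the adapter interface."""
    return PROFILES[profile].provision(request)
```

In this sketch, swapping the provider behind a profile changes one entry in `PROFILES` and nothing in `AllocationRequest` or `allocate`, which is the property the principle asks for.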
What That Means In Practice
| Principle | Practical meaning |
|---|---|
| Contract first | API and event changes start in contracts and canonical docs before implementation |
| Control plane, not scripts | GPUaaS should express state, lifecycle, authz, and evidence as durable product surfaces |
| Shared services first | IAM, billing, audit, evidence, policy, registry, secrets, and status should stay reusable across products |
| Product domains own runtime logic | GPU inventory, allocations, node lifecycle, terminal, app runtime specifics, and future token-factory behavior stay in product-owned domains |
| Provider-neutral boundary | Keycloak, Cloudflare, Pomerium, MAAS, k8s, and node providers are execution adapters, not the product source of truth |
| Config-driven environments | kind, dev, demo, staging, and production should differ by profile/config and automation inputs, not by one-off hand edits |
| Evidence before claims | Release, security, UAT, and ops claims require durable evidence and readback, not chat memory or manual intuition |
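
The "Evidence before claims" row can be sketched as a simple gate: a claim counts only when a durable record for it exists and has been read back. The record shape below (`EvidenceRecord`, `artifact_uri`, `verified_by_readback`) and the function `claim_is_supported` are assumptions made for illustration, not the actual evidence model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceRecord:
    """Hypothetical durable evidence entry attached to a claim."""
    claim_id: str
    artifact_uri: str            # where the stored artifact lives (assumed field)
    verified_by_readback: bool   # True once the artifact was fetched and checked


def claim_is_supported(claim_id: str, records: list[EvidenceRecord]) -> bool:
    """A claim holds only if at least one matching record passed readback."""
    return any(
        r.claim_id == claim_id and r.verified_by_readback
        for r in records
    )
```

Under this sketch, a record that exists but was never read back does not support the claim, and neither does a claim with no record at all.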
One-Sentence Product Explanation
If a product owner asked what we are building, the answer should be:
GPUaaS is a multi-tenant control plane that lets products and users consume GPU-backed runtimes through consistent identity, billing, access, policy, audit, and evidence models.
Can Users Understand It Today?
Partially. Before this portal pass, users could not understand it well enough.
What a normal user should understand:
- they sign in once and work inside a project/tenant boundary;
- they launch workloads or apps through one shell;
- billing, access, and recovery feel product-native;
- they do not need to understand Keycloak, Pomerium, Cloudflare, or internal workers.
What security, ops, and architecture should understand:
- the product model stays stable even when the underlying provider, edge, or environment changes;
- deployment, security, and runtime posture are separate from user-facing workflows;
- production-readiness is an evidence and environment problem, not just a UI completion problem.
That is why the portal needs both user-facing workflow pages and internal architecture/deployment pages.
Design Test
Use these questions to decide whether a change follows the model:
- Is this product behavior expressed in a contract or durable model?
- Does it reuse shared IAM/billing/audit/policy/evidence instead of inventing a local variant?
- Does it keep provider or environment specifics behind an adapter/profile boundary?
- Can product, security, and ops explain it without reading implementation code?
- Can staging and production prove it with the same evidence model?
If the answer to several of these is no, the design is drifting.
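The design test can also be run as a small checklist in review tooling. The sketch below is illustrative only: the question keys and function names are hypothetical, and the default threshold of two "no" answers is an assumed reading of "several", not a rule stated on this page.

```python
from typing import Mapping

# Short keys for the five design-test questions, in the order listed above.
DESIGN_TEST_QUESTIONS = (
    "contract_or_durable_model",
    "reuses_shared_services",
    "adapter_or_profile_boundary",
    "explainable_without_code",
    "same_evidence_model_in_staging_and_production",
)


def count_no_answers(answers: Mapping[str, bool]) -> int:
    """Count the questions answered 'no'; every question must be answered."""
    missing = [q for q in DESIGN_TEST_QUESTIONS if q not in answers]
    if missing:
        raise ValueError(f"unanswered questions: {missing}")
    return sum(1 for q in DESIGN_TEST_QUESTIONS if not answers[q])


def is_drifting(answers: Mapping[str, bool], threshold: int = 2) -> bool:
    """Flag drift once the number of 'no' answers reaches the assumed threshold."""
    return count_no_answers(answers) >= threshold
```

Requiring an answer to every question is a deliberate choice in this sketch: a skipped question is treated as an error, not as a silent "yes".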