Architecture and Design Principles

This page states the main design principle in plain language.

Main Principle

GPUaaS is a contract-first control plane on top of shared platform services.

That means:

  1. product behavior starts from explicit contracts, not accidental runtime behavior;
  2. shared concerns such as IAM, billing, audit, evidence, policy, and status should not be reimplemented per product;
  3. product domains own customer value and runtime-specific logic on top of those shared services;
  4. environments and providers are profiles, not the product model itself.
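Points 3 and 4 can be illustrated with a minimal Python sketch. Everything in it is hypothetical: `NodeProvider`, `LocalNodeProvider`, `MaasNodeProvider`, `EnvironmentProfile`, `Allocation`, and `allocate` are illustrative names, not GPUaaS APIs, and the adapters return placeholder strings instead of calling kind or MAAS. The sketch shows the product-owned `Allocation` model staying identical while a profile only selects which adapter executes the request.

```python
from dataclasses import dataclass
from typing import Protocol


class NodeProvider(Protocol):
    """Hypothetical adapter boundary: product code calls this, never a provider SDK directly."""

    def provision(self, gpu_count: int) -> str: ...


class LocalNodeProvider:
    """Hypothetical adapter for a local kind cluster (returns a placeholder reference)."""

    def provision(self, gpu_count: int) -> str:
        return f"local-node-{gpu_count}gpu"


class MaasNodeProvider:
    """Hypothetical bare-metal adapter, e.g. backed by MAAS (returns a placeholder reference)."""

    def provision(self, gpu_count: int) -> str:
        return f"maas-node-{gpu_count}gpu"


# Adapters are registered by name; profiles refer to them as plain config values.
ADAPTERS: dict[str, NodeProvider] = {
    "local": LocalNodeProvider(),
    "maas": MaasNodeProvider(),
}


@dataclass(frozen=True)
class EnvironmentProfile:
    """An environment is data (a profile), not a fork of the product code."""
    name: str
    node_provider: str


@dataclass
class Allocation:
    """Product-owned model: the same shape in every environment."""
    project: str
    gpu_count: int
    node_ref: str


def allocate(profile: EnvironmentProfile, project: str, gpu_count: int) -> Allocation:
    # The profile picks the adapter; the product logic below never branches on environment.
    provider = ADAPTERS[profile.node_provider]
    return Allocation(project=project, gpu_count=gpu_count, node_ref=provider.provision(gpu_count))


dev = EnvironmentProfile(name="kind", node_provider="local")
prod = EnvironmentProfile(name="production", node_provider="maas")
print(allocate(dev, "team-a", 2))
print(allocate(prod, "team-a", 2))
```

Under this sketch, adding an environment means adding a profile (and, if needed, an adapter), never editing `Allocation` or `allocate`.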

What That Means In Practice

Principle | Practical meaning
--- | ---
Contract first | API and event changes start in contracts and canonical docs before implementation.
Control plane, not scripts | GPUaaS should express state, lifecycle, authz, and evidence as durable product surfaces.
Shared services first | IAM, billing, audit, evidence, policy, registry, secrets, and status should stay reusable across products.
Product domains own runtime logic | GPU inventory, allocations, node lifecycle, terminal, app runtime specifics, and future token-factory behavior stay in product-owned domains.
Provider-neutral boundary | Keycloak, Cloudflare, Pomerium, MAAS, k8s, and node providers are execution adapters, not the product source of truth.
Config-driven environments | The kind, dev, demo, staging, and production environments should differ by profile/config and automation inputs, not by one-off hand edits.
Evidence before claims | Release, security, UAT, and ops claims require durable evidence and readback, not chat memory or manual intuition.
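The "Evidence before claims" principle can be sketched as a check that accepts a claim only when durable records can be read back for it. This is a minimal Python illustration under stated assumptions: `Evidence`, `EvidenceStore`, `claim_holds`, and the check names are hypothetical, and the in-memory list stands in for whatever durable store the platform actually uses. Because the same function evaluates staging and production records, it also reflects the idea of one evidence model across environments.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Evidence:
    """Hypothetical evidence record: which claim, where, which check, and the stored result."""
    claim: str          # e.g. "release-ready"
    environment: str    # e.g. "staging"
    check: str          # e.g. "uat-suite"
    passed: bool
    recorded_at: datetime


class EvidenceStore:
    """In-memory stand-in for a durable store (assumption: a real one would persist records)."""

    def __init__(self) -> None:
        self._records: list[Evidence] = []

    def record(self, ev: Evidence) -> None:
        self._records.append(ev)

    def read_back(self, claim: str, environment: str) -> list[Evidence]:
        return [e for e in self._records if e.claim == claim and e.environment == environment]


def claim_holds(store: EvidenceStore, claim: str, environment: str, required_checks: set[str]) -> bool:
    """A claim holds only if every required check has a passing record that can be read back."""
    passed = {e.check for e in store.read_back(claim, environment) if e.passed}
    return required_checks <= passed


store = EvidenceStore()
now = datetime.now(timezone.utc)
store.record(Evidence("release-ready", "staging", "uat-suite", True, now))
print(claim_holds(store, "release-ready", "staging", {"uat-suite", "security-scan"}))  # False
store.record(Evidence("release-ready", "staging", "security-scan", True, now))
print(claim_holds(store, "release-ready", "staging", {"uat-suite", "security-scan"}))  # True
```

The claim is false until the missing security evidence is recorded; recollection that a scan "was run" does not count.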

One-Sentence Product Explanation

If a product owner asked what we are building, the answer should be:

GPUaaS is a multi-tenant control plane that lets products and users consume GPU-backed runtimes through consistent identity, billing, access, policy, audit, and evidence models.

Can Users Understand It Today?

Partially. Before this portal pass, the model was not explained well enough for users to grasp it.

What a normal user should understand:

  • they sign in once and work inside a project/tenant boundary;
  • they launch workloads or apps through one shell;
  • billing, access, and recovery feel product-native;
  • they do not need to understand Keycloak, Pomerium, Cloudflare, or internal workers.

What security, ops, and architecture should understand:

  • the product model stays stable even when the underlying provider, edge, or environment changes;
  • deployment, security, and runtime posture are separate from user-facing workflows;
  • production-readiness is an evidence and environment problem, not just a UI completion problem.

That is why the portal needs both user-facing workflow pages and internal architecture/deployment pages.

Design Test

Use these questions to decide whether a change follows the model:

  1. Is this product behavior expressed in a contract or durable model?
  2. Does it reuse shared IAM/billing/audit/policy/evidence instead of inventing a local variant?
  3. Does it keep provider or environment specifics behind an adapter/profile boundary?
  4. Can product, security, and ops explain it without reading implementation code?
  5. Can staging and production prove it with the same evidence model?

If several of these answers are no, the design is drifting away from the model.