App SDK Overview

Status: implemented

The App SDK is the developer-facing product surface for building apps on GPUaaS. Its purpose is to let app teams declare runtime needs, publish trusted artifacts, operate app instances, and integrate with GPUaaS identity, billing, audit, and lifecycle controls without patching platform-core code.

Shipped Reference Proof

The SDK path already has real implementation proof, not only documentation:

| Proof point | What it shows |
| --- | --- |
| cmd/slurm-reference-controller/main.go | scheduler-grade runtime lifecycle and reconcile behavior on platform contracts |
| cmd/rke2-self-managed-controller/main.go | cluster bootstrap and member/runtime operations on the same app-platform path |

This does not mean every future app flow is productized equally well yet. It does mean the platform docs should stop describing the whole App SDK surface as if it were still conceptual.

Composition Model

The SDK is not “one helper library.” It is the boundary that lets app teams reuse platform authority without patching product-core behavior.

What GPUaaS Owns

  • Identity, IAM, tenant and project hierarchy.
  • App catalog, entitlements, app instances, and shared runtime resources.
  • Allocation lifecycle, placement primitives, billing attribution, and audit.
  • Credential custody and delivery through supported platform paths.
  • Common UX shell, API contracts, and evidence/correlation surfaces.

What The App Team Owns

  • Runtime-specific controller logic.
  • Runtime bootstrap, reconcile, health, recovery, and teardown behavior.
  • App-specific operational knowledge and failure handling.
  • Manifest metadata, version metadata, and artifact package discipline.

Developer Mental Model

Catalog and entitlement
-> app is visible and allowed for a project

App instance or shared runtime
-> durable control-plane resource owned by a project or tenant

App-owned worker/operator
-> runtime-specific reconcile loop using public APIs

Runtime/data plane
-> Slurm, Ray, MLflow, model gateway, notebook, or another app runtime
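The third layer, the app-owned worker, can be sketched as a single reconcile pass. This is a minimal illustration, not the reference controllers' code: `PlatformAPI`, `Runtime`, `AppInstance`, and the phase names are hypothetical stand-ins for whatever the public API and the runtime actually expose, and the in-memory fakes exist only so the sketch runs on its own.

```go
package main

import (
	"errors"
	"fmt"
)

// Phase is a hypothetical lifecycle phase reported for an app instance.
type Phase string

const (
	PhasePending Phase = "pending"
	PhaseReady   Phase = "ready"
)

// AppInstance stands in for the durable control-plane resource.
type AppInstance struct {
	ID      string
	Project string
	Phase   Phase
}

// PlatformAPI is an assumed view of the public API the worker calls.
// The worker never touches the database or internal packages.
type PlatformAPI interface {
	ListInstances(appName string) ([]AppInstance, error)
	ReportPhase(instanceID string, phase Phase) error
}

// Runtime is the app-owned side: Slurm, Ray, a notebook, and so on.
type Runtime interface {
	EnsureRunning(inst AppInstance) error
}

// ReconcileOnce drives every instance of one app toward a running state
// and reports successes back through the public API. Instances whose
// runtime is not ready are skipped and retried on the next pass.
func ReconcileOnce(api PlatformAPI, rt Runtime, appName string) (int, error) {
	instances, err := api.ListInstances(appName)
	if err != nil {
		return 0, fmt.Errorf("list instances: %w", err)
	}
	ready := 0
	for _, inst := range instances {
		if err := rt.EnsureRunning(inst); err != nil {
			continue // runtime-specific failure handling would go here
		}
		if err := api.ReportPhase(inst.ID, PhaseReady); err != nil {
			return ready, fmt.Errorf("report phase for %s: %w", inst.ID, err)
		}
		ready++
	}
	return ready, nil
}

// fakeAPI and fakeRuntime are in-memory stand-ins for the sketch.
type fakeAPI struct{ phases map[string]Phase }

func (f *fakeAPI) ListInstances(string) ([]AppInstance, error) {
	return []AppInstance{
		{ID: "a", Project: "p1", Phase: PhasePending},
		{ID: "b", Project: "p1", Phase: PhasePending},
	}, nil
}

func (f *fakeAPI) ReportPhase(id string, p Phase) error {
	f.phases[id] = p
	return nil
}

type fakeRuntime struct{}

func (fakeRuntime) EnsureRunning(inst AppInstance) error {
	if inst.ID == "b" {
		return errors.New("node not ready")
	}
	return nil
}

func main() {
	api := &fakeAPI{phases: map[string]Phase{}}
	ready, err := ReconcileOnce(api, fakeRuntime{}, "slurm")
	fmt.Println("ready instances:", ready, "error:", err)
}
```

The design point is the dependency direction: the worker depends on two narrow interfaces, so runtime intelligence stays with the app team while state of record stays with the platform.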

Contract Rules

  • Build against public APIs and committed contracts, not internal Go packages.
  • Do not assume database access or undocumented routes.
  • Express behavior through declared capabilities, endpoint types, auth pattern, lifecycle hooks, and manifest fields.
  • Treat closed enums as platform commitments; add new values deliberately.
  • Keep app-specific runtime intelligence outside platform core unless it becomes a reusable primitive.
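The closed-enum rule can be illustrated with a small validator. This is a sketch under assumptions: the `Manifest` fields and the allowed values (`http`, `ssh`, `platform-sso`, `service-account`) are hypothetical and are not taken from the actual manifest contract.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Hypothetical closed enums. In a real contract these sets would be
// platform commitments, extended only through a deliberate change.
var (
	endpointTypes = map[string]bool{"http": true, "ssh": true}
	authPatterns  = map[string]bool{"platform-sso": true, "service-account": true}
)

// Manifest holds only the fields this sketch validates.
type Manifest struct {
	Name         string
	EndpointType string
	AuthPattern  string
}

// allowed renders a closed set as a sorted, comma-separated string.
func allowed(set map[string]bool) string {
	keys := make([]string, 0, len(set))
	for k := range set {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return strings.Join(keys, ", ")
}

// Validate returns one message per problem. Unknown enum values are
// rejected rather than passed through to the backend.
func Validate(m Manifest) []string {
	var problems []string
	if m.Name == "" {
		problems = append(problems, "name is required")
	}
	if !endpointTypes[m.EndpointType] {
		problems = append(problems, fmt.Sprintf(
			"endpoint type %q is not one of: %s", m.EndpointType, allowed(endpointTypes)))
	}
	if !authPatterns[m.AuthPattern] {
		problems = append(problems, fmt.Sprintf(
			"auth pattern %q is not one of: %s", m.AuthPattern, allowed(authPatterns)))
	}
	return problems
}

func main() {
	m := Manifest{Name: "jupyter", EndpointType: "grpc", AuthPattern: "platform-sso"}
	for _, p := range Validate(m) {
		fmt.Println(p)
	}
}
```

Rejecting unknown values at validation time keeps new enum members a visible contract decision instead of something that slips in through a manifest.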

Change Classification

Every app-facing change should name its class before implementation.

| Class | Belongs in | Examples |
| --- | --- | --- |
| Runtime fix | runtime/controller/backend owning layer | app route reconciliation, artifact selection bugs, node-task reconciliation, runtime readiness |
| Catalog or manifest change | App SDK and manifest contract | ports, health paths, route intent, auth mode, connect actions, launch defaults |
| SDK/developer contract change | SDK examples, validators, portal, smoke tests | service-account expectations, artifact promotion, launch/connect/decommission flows, developer-visible failure behavior |

The SDK should become the app developer contract plus its validation harness. The backend runtime implements that contract; seed data should not be the only place where app behavior lives.

Use This With The Practical Onboarding Path

This page is the mental model. The operational sequence for getting a real app into GPUaaS lives in the "Add a New App" page. Use both together:

  • this page to understand the boundary and structure;
  • the onboarding page to execute manifest -> artifact -> service account -> catalog -> entitlement -> launch/connect/decommission.
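The onboarding sequence is ordered, which can be made explicit in a small sketch. The step names come from the sequence above; the `NextStep` helper and the idea of tracking completion in a map are assumptions made for illustration, not part of the SDK.

```go
package main

import "fmt"

// onboardingSteps lists the sequence from the onboarding page, in order.
var onboardingSteps = []string{
	"manifest",
	"artifact",
	"service account",
	"catalog",
	"entitlement",
	"launch/connect/decommission",
}

// NextStep returns the earliest step that is not marked done.
// The boolean is false once every step is complete. A later step
// marked done does not hide an earlier step that is still missing.
func NextStep(done map[string]bool) (string, bool) {
	for _, step := range onboardingSteps {
		if !done[step] {
			return step, true
		}
	}
	return "", false
}

func main() {
	done := map[string]bool{"manifest": true, "artifact": true}
	if step, ok := NextStep(done); ok {
		fmt.Println("next step:", step)
	}
}
```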

The first readiness artifact is a manifest/launch/connect matrix for supported apps such as vLLM, Headlamp, OpenClaw, Jupyter, and Slurm. It should show what the SDK can express today, what is still a backend compatibility bridge, and which examples need launch/connect/decommission smoke coverage.
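One possible shape for such a matrix is sketched here. The `Row` type, the `Backing` values, and the app names `app-a` and `app-b` are hypothetical; the rows are placeholders and say nothing about the actual readiness of any supported app.

```go
package main

import "fmt"

// Backing says how an app's behavior is currently supported.
type Backing string

const (
	ContractBacked Backing = "contract-backed"
	ExampleBacked  Backing = "example-backed"
	BackendBridge  Backing = "backend-compatibility-bridge"
)

// Row is one line of a hypothetical readiness matrix. The three
// booleans record whether smoke coverage exists for each flow.
type Row struct {
	App          string
	Backing      Backing
	Launch       bool
	Connect      bool
	Decommission bool
}

// MissingSmoke lists the flows that still lack smoke coverage.
func MissingSmoke(r Row) []string {
	var missing []string
	if !r.Launch {
		missing = append(missing, "launch")
	}
	if !r.Connect {
		missing = append(missing, "connect")
	}
	if !r.Decommission {
		missing = append(missing, "decommission")
	}
	return missing
}

func main() {
	// Placeholder rows, not real readiness data.
	matrix := []Row{
		{App: "app-a", Backing: ContractBacked, Launch: true, Connect: true, Decommission: true},
		{App: "app-b", Backing: BackendBridge, Launch: true},
	}
	for _, r := range matrix {
		fmt.Printf("%s (%s): missing smoke coverage for %v\n", r.App, r.Backing, MissingSmoke(r))
	}
}
```

Keeping the matrix as data rather than prose would let the same rows drive both the published table and a check that fails when a listed example lacks smoke coverage.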

Current Readiness Path

The current platform-foundation docs treat the App SDK as both a product surface and an internal developer platform capability. The readiness path is:

  1. use the App SDK readiness matrix to decide which app behaviors are contract-backed, example-backed, or still backend-compatibility bridges;
  2. use the executable product onboarding packet for product/app registration, ownership, evidence, release, and support expectations;
  3. use the registry and artifact trust docs for artifact type, trust state, signing/provenance, and promotion evidence;
  4. keep SDK examples tied to launch, connect, decommission, and runtime smoke evidence rather than seed data or backend-only assumptions.

Start with the quickstart, then the manifest guide, then the artifact trust and promotion model. Scheduler or clustered apps should also read the external app team integration and reference workflow docs.