# App SDK Readiness Matrix v1

Status: draft for architecture/product/app-developer review (PF-APP-SDK-READINESS-MATRIX-001)
Owner: Platform Architecture / App Platform
Last updated: 2026-06-01

## Purpose

Define the readiness matrix for moving app-facing behavior from backend seed
assumptions and runtime one-offs into an App SDK/developer contract.

The goal is not to route every app change through the SDK immediately. It is
to classify app changes correctly, so that runtime bugs are fixed in the owning
runtime layer while manifest defaults, launch behavior, connect behavior,
artifact promotion, UAT coverage, and developer-visible failures move toward
SDK-visible manifests, validators, examples, smoke tests, and portal pages.

Use this with:

- `doc/architecture/App_SDK_Design_Principles_v1.md`;
- `doc/architecture/App_Manifest_Registration_Guide_v1.md`;
- `doc/architecture/Launchable_OCI_Workload_Profile_Contract_v1.md`;
- `doc/architecture/App_Artifact_Trust_and_Promotion_v1.md`;
- `doc/architecture/platform-foundation/Platform_Registry_Contract_v1.md`;
- `doc/architecture/platform-foundation/Platform_Evidence_Status_Slice_v1.md`;
- `packages/python-sdk/README.md`;
- `scripts/ops/demo_sdk_smoke.sh`;
- `packages/docs/docs/build-on-gpuaas/`.

## Current Assessment

GPUaaS already has useful App SDK building blocks:

- app/version manifests stored in `app_versions.manifest`;
- launchable OCI workload manifest examples in `scripts/seed.sql`;
- repo-owned launchable OCI manifest contract fixtures under
  `packages/products/appplatform/sdk/testdata/manifests/launchable-oci/`;
- an executable App SDK manifest contract validator in
  `packages/products/appplatform/sdk`;
- app catalog, entitlement, app instance, artifact, shared runtime, and worker
  APIs;
- a Python SDK for control-plane flows;
- a demo SDK smoke script that exercises auth, compute launch precheck,
  allocation connect, app catalog, app instances, app routes, app launch
  precheck, service accounts, billing, and usage;
- developer portal pages for Build on GPUaaS, manifest model, quickstart,
  artifact trust, and example app workflow.

The gap is that these are not yet one cohesive App SDK product surface. Some
app behavior still lives only in seed data, backend route logic, runtime
adapters, or UAT scripts. The next platform-foundation step should make the SDK
the declared contract and validation harness for app-facing behavior, while the
backend runtime implements that contract.

## Change Classification Rule

Every app-facing change must declare one class before implementation.

| Class | Owning layer | Examples | Required follow-through |
|---|---|---|---|
| Runtime fix | runtime/controller/backend owner | route reconciliation bug, OCI artifact selection bug, node-task reconciliation, runtime readiness, proxy target cleanup | fix owning runtime layer, add regression evidence, update SDK only if public behavior changes |
| Catalog or manifest change | App SDK and manifest contract | ports, health paths, route intent, auth mode, connect action, launch default, resource/storage defaults, required dependencies | update manifest schema/examples/validators, seed or registration path, portal docs, and smoke coverage |
| SDK/developer contract change | SDK/developer platform | service-account flow, publish/promotion path, launch/connect/decommission workflow, product-owned failure rendering, example app behavior | update SDK/API wrapper, example, contract test, portal page, and UAT/evidence mapping |

Backend-only edits are acceptable for runtime fixes. For catalog/manifest or
SDK/developer contract changes, a backend edit is not enough on its own: the
SDK and manifest path must be updated in the same work package, or a follow-up
gap must be recorded.
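
The rule above can be sketched as a small check. This is a minimal illustration, not an existing tool: `ChangeClass`, `follow_through_satisfied`, and both flag names are hypothetical and only encode the three classes and the backend-only rule described in this section.

```python
from enum import Enum


class ChangeClass(Enum):
    """The three change classes from the classification table."""

    RUNTIME_FIX = "runtime fix"
    CATALOG_OR_MANIFEST = "catalog or manifest change"
    SDK_DEVELOPER_CONTRACT = "SDK/developer contract change"


def follow_through_satisfied(
    change_class: ChangeClass,
    sdk_and_manifest_updated: bool = False,
    follow_up_gap_recorded: bool = False,
) -> bool:
    """Return True when a change meets the backend-only rule.

    Hypothetical helper: runtime fixes may be backend-only; the other two
    classes need an SDK/manifest update in the same work package or a
    recorded follow-up gap.
    """
    if change_class is ChangeClass.RUNTIME_FIX:
        return True
    return sdk_and_manifest_updated or follow_up_gap_recorded
```

For example, `follow_through_satisfied(ChangeClass.CATALOG_OR_MANIFEST)` is `False` until one of the two flags is set, which matches the intent that a port or health-path change cannot land as a seed-only edit.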

## Contract Families

| Contract family | SDK should own or validate | Current source | Target state |
|---|---|---|---|
| Manifest defaults | ports, health paths, route intent, auth mode, connect actions, resource defaults | `scripts/seed.sql` manifests, app architecture docs | manifest package plus validator and generated examples |
| Runtime contract | env vars, mounted credentials, service accounts, storage, network posture, runtime evidence hooks | app manifests, runtime adapters, app runtime service | SDK-visible manifest schema and adapter conformance tests |
| Publish contract | artifact selection, digest requirement, trust state, promotion, versioning | artifact APIs, promotion docs, seed artifacts | SDK publish/promote workflow plus provenance/trust validation |
| Launch contract | required inputs, optional dependencies, default dependency creation, validation errors | V3 launch precheck routes, seed manifest UI hints, frontend launch pages | SDK launch helper, precheck contract tests, manifest-driven forms |
| Connect contract | Open app, Try endpoint, Open cluster, terminal, API-key/token flow, kubeconfig/headlamp routes | workload detail UI, proxy routes, runtime state | typed connect actions generated from manifest and read model |
| UAT contract | launch, connect, decommission, failure smoke coverage per supported app | `scripts/ops/demo_sdk_smoke.sh`, UAT automation | SDK example smokes mapped to product invariant IDs |
| Failure contract | app-auth failure, upstream 503, missing token, bad route, unavailable artifact, node-task timeout | edge proxy handlers, app runtime errors, frontend states | product-owned error taxonomy and SDK-visible failure fixtures |

## Supported App Matrix

Readiness states:

- `ready`: usable through an SDK-visible or manifest-visible contract today;
- `partial`: implemented, but split across seed/runtime/UI/smoke code;
- `gap`: not yet represented as an SDK/developer contract;
- `n/a`: not applicable to this app type.

| App | App type | Manifest defaults | Launch contract | Connect contract | Publish contract | UAT contract | Failure contract | Notes |
|---|---|---|---|---|---|---|---|---|
| `jupyterlab` | launchable OCI workload | partial | partial | partial | partial | partial | gap | Manifest declares workspace, exposure, resources, port `8888`, endpoint, readiness. Need SDK example that launches, opens notebook route, validates auth/proxy behavior, and decommissions. |
| `vllm-openai` | launchable OCI workload | partial | partial | partial | partial | partial | gap | Manifest declares model, dtype, resources, port `8000`, OpenAI endpoint, readiness path `/v1/models`. Need SDK `Try endpoint` example and service-account/API bearer flow coverage. |
| `code-server` | launchable OCI workload | partial | partial | partial | partial | partial | gap | Manifest declares browser route, port `8080`, resource defaults, workspace mount. Need SDK smoke for browser route readiness and auth failure behavior. |
| `openclaw` | launchable OCI workload | partial | partial | partial | partial | partial | gap | Manifest declares model backend mode, memory mode, browser route, port `18789`. Needs composed-app dependency story for project vLLM endpoint and SDK validation. |
| `rke2-self-managed` | software runtime / scheduler-style app | partial | partial | partial | n/a | partial | gap | Manifest declares cluster config, Headlamp route, auth bridge, root-disk requirement, readiness commands. Need SDK `Open cluster`/kubeconfig/Headlamp contract and failure fixtures. |
| `slurm-reference` | software runtime / scheduler app | partial | partial | partial | n/a | partial | gap | Manifest declares controller/worker topology, bootstrap fields, readiness commands. Need SDK example for shared runtime, worker attach, job submission readiness, and decommission. |

Overall posture: the app model is usable for internal platform/reference apps,
but it is not yet ready for telling internal or partner developers that the SDK
is the complete source of truth. The first executable validator now covers the
manifest-family contract for the simple launchable OCI apps. The next
high-value step is an example-smoke harness that proves launch, connect,
decommission, and failure behavior through the same SDK/developer contract.
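
To make the matrix easier to query during review, it can be held as plain data. The sketch below copies the readiness states from the table above; the names `COLUMNS`, `MATRIX`, `state_counts`, and `apps_in_state` are hypothetical and do not exist in the repository.

```python
from collections import Counter

# Contract columns in the same order as the table above.
COLUMNS = ("manifest_defaults", "launch", "connect", "publish", "uat", "failure")

_OCI_ROW = ("partial", "partial", "partial", "partial", "partial", "gap")
_SCHEDULER_ROW = ("partial", "partial", "partial", "n/a", "partial", "gap")

# Readiness states per app, copied from the Supported App Matrix.
MATRIX = {
    "jupyterlab": _OCI_ROW,
    "vllm-openai": _OCI_ROW,
    "code-server": _OCI_ROW,
    "openclaw": _OCI_ROW,
    "rke2-self-managed": _SCHEDULER_ROW,
    "slurm-reference": _SCHEDULER_ROW,
}


def state_counts(matrix):
    """Count how many cells are in each readiness state."""
    return Counter(state for row in matrix.values() for state in row)


def apps_in_state(matrix, column, state):
    """List apps (sorted) whose given contract column has the given state."""
    index = COLUMNS.index(column)
    return sorted(app for app, row in matrix.items() if row[index] == state)
```

With the current table, `state_counts(MATRIX)` reports 28 `partial` cells, 6 `gap` cells, 2 `n/a` cells, and no `ready` cells, and `apps_in_state(MATRIX, "failure", "gap")` returns all six apps.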

## Current SDK Coverage

### Python SDK

`packages/python-sdk` currently provides a control-plane convenience layer for:

- catalog reads;
- allocation create/list/release/get/poll;
- terminal token minting;
- billing balance and usage;
- tenant-shared app runtimes;
- shared runtime attachments;
- shared runtime workers and worker operations.

Current limits documented by the SDK:

- it does not replace app manifest registration;
- it does not define external app-worker packaging;
- it does not own runtime-specific orchestration logic.

For platform-foundation, that is acceptable as the current state. The gap is
that the App SDK product promise is broader than the current Python client.

### Demo SDK Smoke

`scripts/ops/demo_sdk_smoke.sh` is valuable because it already checks several
developer-facing invariants:

- auth and whoami;
- project visibility;
- compute catalog and compute launch precheck;
- active allocation metrics and terminal connect;
- workload readiness;
- app catalog presence;
- app instance and proxy route presence;
- app launch precheck for JupyterLab;
- service accounts;
- billing balance and usage.

This script should become evidence input for `APP-CONTRACT-001`,
`APP-LAUNCH-001`, and `APP-FAILURE-001` in the platform evidence/status slice.
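
One possible shape for that evidence input is a JSON line per smoke check. This is a sketch only: the record fields, `EvidenceRecord`, and `to_json_line` are assumptions, and which smoke check maps to which invariant ID has not been decided in this document.

```python
import json
from dataclasses import asdict, dataclass

# Invariant IDs named in this section; the record schema below is assumed.
KNOWN_INVARIANTS = frozenset(
    {"APP-CONTRACT-001", "APP-LAUNCH-001", "APP-FAILURE-001"}
)


@dataclass(frozen=True)
class EvidenceRecord:
    """One smoke-check result tagged with a product invariant ID."""

    invariant_id: str
    check: str
    passed: bool
    detail: str = ""


def to_json_line(record: EvidenceRecord) -> str:
    """Serialize a record as one JSON line, rejecting unknown invariant IDs."""
    if record.invariant_id not in KNOWN_INVARIANTS:
        raise ValueError(f"unknown invariant ID: {record.invariant_id}")
    return json.dumps(asdict(record), sort_keys=True)
```

For instance, a smoke step could emit `to_json_line(EvidenceRecord("APP-LAUNCH-001", "app launch precheck for JupyterLab", True))`; the pairing of that check with `APP-LAUNCH-001` is illustrative, not an agreed mapping.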

## Readiness Gates

| Order | Gate | Pass condition | Current state | First output |
|---:|---|---|---|---|
| 1 | API contract | app catalog, app version, app instance, artifact, shared runtime, launch/precheck, connect, and failure contracts are documented, and the generated SDK smoke passes | partial | list missing OpenAPI/SDK model coverage |
| 2 | Manifest contract | supported apps express defaults, resources, storage, network, auth, launch UI, outputs, and readiness in manifest shape | partial, first executable validator exists | `go test ./packages/products/appplatform/sdk` validates repo-owned launchable OCI manifest fixtures |
| 3 | Example app | at least one app can be launched, connected, and decommissioned entirely through public SDK/API path | partial | JupyterLab or vLLM SDK example with smoke |
| 4 | Credential flow | service account, scoped token/API key, rotation/revocation/audit path is documented and SDK-smoked | partial | vLLM bearer-token flow example |
| 5 | Portal entry | app developer can follow docs without reading Go packages or seed SQL | partial | portal page linking SDK examples, manifest schema, and smoke commands |
| 6 | Runtime evidence | launch/connect/decommission produces evidence, audit, billing attribution, and status hooks | partial | evidence mapping for each SDK smoke result |
| 7 | Artifact trust | artifact registration, verification, promotion, and digest selection are SDK-visible | partial | publish/promote SDK workflow and fixture |
| 8 | Failure behavior | common app failures render product-owned errors and SDK-visible exceptions | gap | failure fixture suite and product error catalog mapping |
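
For gate 8, a failure fixture suite could start from the failure cases already listed under the failure contract. The sketch below is an assumption-heavy illustration: `AppFailure`, `AppSDKError`, the code strings, and `FAILURE_FIXTURES` are all hypothetical, and the real product error catalog may use different names and codes.

```python
from enum import Enum


class AppFailure(Enum):
    """Failure cases named in the failure contract (code strings assumed)."""

    APP_AUTH_FAILURE = "app_auth_failure"
    UPSTREAM_503 = "upstream_503"
    MISSING_TOKEN = "missing_token"
    BAD_ROUTE = "bad_route"
    UNAVAILABLE_ARTIFACT = "unavailable_artifact"
    NODE_TASK_TIMEOUT = "node_task_timeout"


class AppSDKError(Exception):
    """Hypothetical SDK-visible exception carrying a product-owned failure."""

    def __init__(self, failure: AppFailure, message: str):
        super().__init__(f"{failure.value}: {message}")
        self.failure = failure


# One placeholder fixture per failure case, so a test can check full coverage.
FAILURE_FIXTURES = [
    {"failure": failure, "message": f"fixture for {failure.value}"}
    for failure in AppFailure
]


def raise_fixture(fixture: dict) -> None:
    """Raise the SDK-visible exception described by a fixture."""
    raise AppSDKError(fixture["failure"], fixture["message"])
```

A contract test could then iterate `FAILURE_FIXTURES`, call `raise_fixture`, and assert that the caught `AppSDKError.failure` matches the fixture, which gives each listed failure at least one SDK-visible check.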

## Recommended First Implementation

1. Keep the repo-owned manifest fixture directory for supported apps aligned
   with the current seed/app-version manifests. Current launchable OCI fixtures
   live under `packages/products/appplatform/sdk/testdata/manifests/launchable-oci/`.
2. Keep the App SDK manifest validator green. It checks required contract
   families: `profile`, `artifacts`, `parameters`, `resources`, `storage`,
   `network`, `execution`, `outputs`, and `validation`.
3. Extend the SDK smoke to emit structured evidence with invariant IDs:
   `APP-CONTRACT-001`, `APP-LAUNCH-001`, `APP-FAILURE-001`, and
   `PLATFORM-AUDIT-001`.
4. Pick one launchable OCI app as the first end-to-end SDK example. JupyterLab
   is the best first target because the connect path is browser-oriented and
   easy for product, architecture, and operations reviewers to understand.
5. Add vLLM second because it proves the API-app path and service-account bearer
   token flow.
6. Add RKE2/Headlamp and Slurm after the simple launchable OCI apps because
   they are scheduler-style runtimes with higher trust and topology complexity.
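
The family check in step 2 can be illustrated in a few lines. The executable validator lives in the Go package `packages/products/appplatform/sdk`; the Python below is not that validator. It is a sketch that assumes each required family appears as a top-level key of the manifest, which this document does not confirm, and `REQUIRED_FAMILIES` and `missing_families` are hypothetical names.

```python
# Required families as listed in step 2 above.
REQUIRED_FAMILIES = (
    "profile",
    "artifacts",
    "parameters",
    "resources",
    "storage",
    "network",
    "execution",
    "outputs",
    "validation",
)


def missing_families(manifest: dict) -> list:
    """Return required families absent from the manifest, in declared order.

    Assumption: families are top-level manifest keys.
    """
    return [family for family in REQUIRED_FAMILIES if family not in manifest]
```

A fixture that declares every family except `outputs` would yield `["outputs"]`, which is the kind of message a developer should see before a manifest reaches seed data or registration.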

## Developer Portal Impact

The Docusaurus portal should continue to present "Build on GPUaaS" as the
developer entry point, but pages must be explicit about current versus target
state:

- App SDK Overview: product promise and change classification;
- Quickstart: current API/SDK path and manual/admin bridges;
- Manifest Model: manifest families and validator expectations;
- Artifact Trust and Promotion: digest, trust, promotion, and provenance;
- Example App Workflow: runnable examples and smoke output;
- API Reference: generated OpenAPI/SDK reference;
- Playground: a later API playground, preferably backed by a mock or sandbox
  environment rather than production.

Portal pages should link to SDK examples and smoke artifacts once they exist,
not only architecture documents.

## Open Review Questions

1. Should the first published App SDK surface be the existing Python SDK, a Go
   module, a TypeScript package, or a documented multi-language contract with
   Python as the first implementation?
2. Should manifest fixtures be generated from `scripts/seed.sql` initially, or
   should seed data be generated from canonical manifest fixture files?
3. Should JupyterLab or vLLM be the first complete SDK example?
4. Which failure cases must be blocking before internal developers can use the
   SDK path without platform-team handholding?
5. Should SDK smoke output write directly to the platform evidence/status API
   once that API exists, or remain CI artifact input first?

## Acceptance For This Task

This matrix is complete enough for the next review when:

- app changes are classified as runtime fix, catalog/manifest change, or
  SDK/developer contract change;
- supported apps are mapped across manifest, launch, connect, failure, publish,
  and UAT readiness;
- the current Python SDK and demo SDK smoke are represented honestly;
- first implementation outputs are named without forcing backend rewrites;
- open questions identify the decisions needed before developer-facing SDK
  work is productized for internal or public use.
