# Launchable OCI Workload Images v1

Purpose:
- Define a product model for OCI-based launchable workloads such as JupyterLab and vLLM.
- Keep these images separate from raw compute allocations and from node-local managed runtime bundles.
- Establish OCI as the first canonical packaging format for this app class.

Inputs:
- `doc/product/Managed_Runtime_Bundles_v1.md`
- `doc/product/Allocation_And_Runtime_Flows_v1.md`
- `doc/architecture/App_Platform_OCI_Registry_Baseline_v1.md`
- `doc/architecture/App_Runtime_Operating_Modes_v1.md`
- `doc/architecture/Launchable_OCI_Workload_Profile_Contract_v1.md`

Related:
- `doc/product/Allocation_Storage_Model_v1.md`
- `doc/product/Navigation_Redesign_App_Platform_v1.md`

---

## 1. Executive Summary

Some user experiences are not best modeled as either:
- raw compute allocations
- node-local managed runtime bundles

Examples:
- JupyterLab
- vLLM inference server
- notebook workspace image
- model-serving image
- composed AI service stacks, such as model workers plus gateways and optional observability

These should be treated as:
- **launchable OCI workload images**

The first version should:
- use OCI as the canonical format
- use the platform registry as the canonical source
- allow the platform to pre-seed supported images
- let users choose workspace/storage attachment and launch settings intentionally

Apptainer or other packaging/runtime adapters can come later if needed.

---

## 2. Problem Statement

The platform needs a way to offer higher-level user experiences such as JupyterLab or vLLM without:
- baking every framework into base node images
- asking every user to build and maintain those environments manually over SSH
- pretending those experiences are the same thing as node-local managed runtime bundles

These image-backed workloads are different because they are:
- runnable units
- often network-exposed services
- sometimes multiple cooperating services rather than one process
- often backed by mounted workspace/storage paths
- often closer to “apps” than to “activation scripts”

---

## 3. Product Boundary

The platform should distinguish three layers:

### 3.1 Raw compute allocation

Provides:
- machine
- SSH
- terminal
- restart
- attached storage

### 3.2 Managed runtime bundle

Provides:
- platform-owned user-space environment on the machine
- installed into platform-owned paths such as `/opt/gpuaas/runtimes/...`

Examples:
- PyTorch base environment
- Jupyter kernel tooling

### 3.3 Launchable OCI workload image

Provides:
- a packaged runnable environment or service
- pulled from registry
- launched with explicit workspace/storage and exposure settings

Examples:
- JupyterLab server
- vLLM service
- training workspace image
- multi-lane model-serving stack with proxy and optional observability sidecars

These should not be collapsed into the same product type.

---

## 4. First-Version Decision

The first packaging format should be:
- **OCI**

The first registry source should be:
- **GPUaaS-controlled registry**

The platform should:
- pre-seed supported images
- version them
- expose them as product choices

The platform should not require Apptainer as the first canonical format.

Reasons for choosing OCI first:
- broader ecosystem support
- better registry compatibility
- cleaner artifact promotion model
- simpler first implementation path

Apptainer support can be added later as:
- a runtime adapter
- a conversion path
- an HPC-specific execution mode

---

## 5. First Candidate Workload Images

Good initial candidates:
- JupyterLab
- vLLM

Likely later candidates:
- notebook workspace variants
- model-serving stacks
- training worker images
- inference gateway images
- composed AI service profiles with optional gateways, Grafana, metrics, and log collection sidecars

These should be curated and supportable. The first slice should not be an open-ended feature for launching arbitrary images.

---

## 6. Registry Strategy

The platform should pre-seed the registry with supported images.

Why this matters:
- faster first launch
- predictable version support
- cleaner trust/promotion story
- easier documentation and troubleshooting
- clearer tenant/project app catalog entries

The product should present:
- friendly product label
- version
- image family/runtime notes
- support level

Internally, this maps to OCI artifacts in the platform registry.
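
As a rough illustration of that mapping, the sketch below models a catalog entry that carries the user-facing fields and resolves to an OCI reference. The class name, field names, support-level values, and the registry host `registry.gpuaas.example` are hypothetical and only serve the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog record: product-facing fields plus registry coordinates."""

    label: str                    # friendly product label shown to users
    version: str                  # supported version, reused here as the image tag
    runtime_notes: str            # image family/runtime notes
    support_level: str            # assumed values such as "supported" or "preview"
    registry: str                 # platform registry host (assumed name)
    repository: str               # repository path inside the registry
    digest: Optional[str] = None  # optional "sha256:..." pin

    def oci_reference(self) -> str:
        """Return a digest-pinned reference when a digest is set, else a tag reference."""
        base = f"{self.registry}/{self.repository}"
        if self.digest:
            return f"{base}@{self.digest}"
        return f"{base}:{self.version}"


entry = CatalogEntry(
    label="JupyterLab",
    version="1.0.0",
    runtime_notes="CUDA-enabled notebook image",
    support_level="supported",
    registry="registry.gpuaas.example",
    repository="apps/jupyterlab",
)
print(entry.oci_reference())
```

Pinning by digest where one is available would keep a catalog version tied to one immutable artifact, which fits the trust/promotion goal above.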

---

## 7. Workspace and Storage Model

Launchable OCI workloads need an explicit workspace/storage story.

The user should be able to choose:
- where the workspace lives
- whether it is backed by attached persistent storage
- whether the workload has scratch-only local storage

Recommended first rule:
- OCI workload images can mount project persistent storage into a known workspace path

Examples:
- `/workspace`
- `/data`

The mount path should be defined by the image profile. The first slice should not accept arbitrary per-launch path text.
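
A minimal sketch of that rule, assuming a hypothetical `ImageProfile` type that lists the workspace paths a profile offers. The names and the two backing modes are illustrative, not an existing API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class WorkspaceBacking(Enum):
    PERSISTENT = "persistent"  # attached project persistent storage
    SCRATCH = "scratch"        # scratch-only local storage


@dataclass(frozen=True)
class ImageProfile:
    name: str
    allowed_mount_paths: Tuple[str, ...]  # paths the profile author has tested
    default_mount_path: str


def resolve_workspace_mount(
    profile: ImageProfile,
    backing: WorkspaceBacking,
    requested_path: Optional[str] = None,
) -> dict:
    """Pick a mount path from the profile; reject anything the profile does not offer."""
    path = requested_path if requested_path is not None else profile.default_mount_path
    if path not in profile.allowed_mount_paths:
        raise ValueError(
            f"{path!r} is not a workspace path offered by profile {profile.name!r}"
        )
    return {"mount_path": path, "backing": backing.value}


jupyter = ImageProfile(
    name="jupyterlab",
    allowed_mount_paths=("/workspace", "/data"),
    default_mount_path="/workspace",
)
print(resolve_workspace_mount(jupyter, WorkspaceBacking.PERSISTENT))
```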

---

## 8. Composition Profiles

Some realistic app scenarios are not a single container.

Examples:
- single-model inference server
- multi-lane model-serving stack with separate GPU placement per lane
- notebook workspace with a browser endpoint
- API gateway plus worker containers
- optional metrics exporter, Grafana, Alertmanager, Loki, or log collection sidecars

This can look like Docker Compose from the user's point of view, but the product should model it as a curated **composition profile**, not arbitrary user-supplied compose YAML in the first slice.

The profile should define:
- required containers
- optional add-ons
- allowed ports and exposure model
- workspace/storage mounts
- GPU placement and tensor-parallel intent where relevant
- resource expectations
- health checks
- validation/smoke test commands
- default dashboards or log views

For example:
- `model-server-basic`: one model-serving container on one or more GPUs
- `model-server-proxied`: model worker plus an internal or platform-proxied gateway
- `mixed-model-observable`: multiple model-serving lanes, gateway routing, metrics, dashboards, and log capture

The user-facing UX can be:
- choose workload image or profile
- choose workspace/model path
- choose GPU placement or accept the profile default
- choose optional add-ons
- launch

Optional add-ons must be explicit because they add resource cost and operational surface area.

For the first implementation, these profiles should still use OCI artifacts controlled by the platform registry. The profile can be implemented with Docker Compose, containerd, Kubernetes, or another runtime later, but the product contract should describe the workload profile rather than exposing the low-level runtime as the product.
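
To make the idea of a curated composition profile more concrete, here is a small sketch in which required containers are fixed by the profile and add-ons must be selected explicitly. The types, field names, and image references are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Component:
    name: str
    image: str     # OCI reference in the platform registry (assumed host)
    gpus: int = 0  # GPU count this component expects


@dataclass(frozen=True)
class CompositionProfile:
    name: str
    required: Tuple[Component, ...]
    optional_addons: Dict[str, Component] = field(default_factory=dict)


def plan_components(
    profile: CompositionProfile, selected_addons: Sequence[str]
) -> List[Component]:
    """Return required components plus only the add-ons the user explicitly chose."""
    unknown = set(selected_addons) - set(profile.optional_addons)
    if unknown:
        raise ValueError(f"add-ons not offered by {profile.name!r}: {sorted(unknown)}")
    return list(profile.required) + [profile.optional_addons[n] for n in selected_addons]


REG = "registry.gpuaas.example"  # hypothetical platform registry host
proxied = CompositionProfile(
    name="model-server-proxied",
    required=(
        Component("worker", f"{REG}/apps/vllm:1.0.0", gpus=1),
        Component("gateway", f"{REG}/apps/gateway:1.0.0"),
    ),
    optional_addons={"grafana": Component("grafana", f"{REG}/addons/grafana:1.0.0")},
)
print([c.name for c in plan_components(proxied, ["grafana"])])
```

Because the add-on list is closed per profile, the platform can show a cost or resource warning for each option before launch.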

---

## 9. Profile Manifest And UI Mapping

The platform should avoid building a custom UI and deployment path for every workload package.

The preferred model is:
- a typed workload profile manifest is the source of truth
- the UI form is generated from the manifest schema
- validated user input becomes a values object
- the platform renders one or more deployable artifacts from that values object

Execution engines and deployable artifacts may be:
- Docker Compose YAML for an allocation-local runtime
- Kubernetes YAML for a Kubernetes-backed runtime
- Helm values for charts where Helm is the right packaging layer
- dstack run configuration for ML-native dev environment, task, service, or fleet execution
- platform-native node-agent tasks for allocation-local actions

This gives us a bridge from product intent to implementation without exposing the low-level runtime as the product API.
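
The "UI form is generated from the manifest schema" step could look roughly like the sketch below. It reads a JSON Schema style `properties` block and emits neutral form-field descriptors; the widget names and the example parameters are assumptions, and a real form generator would support far more of JSON Schema.

```python
# Assumed mapping from JSON Schema primitive types to generic widget names.
WIDGETS = {"string": "text", "integer": "number", "number": "number", "boolean": "checkbox"}


def form_fields(schema: dict) -> list:
    """Derive an ordered list of form-field descriptors from a parameter schema."""
    required = set(schema.get("required", []))
    fields = []
    for name, spec in schema.get("properties", {}).items():
        # An enum becomes a select box regardless of its primitive type.
        widget = "select" if "enum" in spec else WIDGETS.get(spec.get("type"), "text")
        fields.append(
            {
                "name": name,
                "label": spec.get("title", name),
                "widget": widget,
                "default": spec.get("default"),
                "required": name in required,
            }
        )
    return fields


# Hypothetical parameter schema for a model-serving profile.
PARAMETER_SCHEMA = {
    "type": "object",
    "required": ["model_path"],
    "properties": {
        "model_path": {"type": "string", "title": "Model path"},
        "gpu_count": {"type": "integer", "title": "GPU count", "default": 1},
        "exposure": {
            "type": "string",
            "enum": ["private", "platform-proxied"],
            "default": "platform-proxied",
        },
        "enable_grafana": {"type": "boolean", "title": "Enable Grafana add-on", "default": False},
    },
}

for f in form_fields(PARAMETER_SCHEMA):
    print(f["name"], f["widget"], f["required"])
```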

### 9.1 Candidate standards to adapt

There are several existing patterns worth borrowing from:
- Helm chart `values.schema.json`: practical bridge from values to generated UI and validation
- React JSON Schema Form style UI schemas: separates data shape from form layout
- CUE: stronger typed configuration and Kubernetes YAML generation/validation
- OAM/KubeVela-style definitions: application definitions with CUE templates and parameter schemas
- Score: workload intent specification that can target different backends

Recommended first slice:
- define a GPUaaS workload profile manifest
- use JSON Schema for user-facing parameter schema and UI form generation
- support Helm-style values and `values.schema.json` as the first packaging/rendering adapter where the workload is Kubernetes-shaped
- allow later implementation adapters to render Docker Compose, raw Kubernetes YAML, dstack run configs, or node-agent tasks

The first concrete contract for this manifest is:
- `doc/architecture/Launchable_OCI_Workload_Profile_Contract_v1.md`

Recommended long-term direction:
- evaluate CUE specifically for Kubernetes-heavy profiles because it can express constraints and generate validated YAML more safely than string templating
- keep JSON Schema export for UI form generation and API validation even if CUE becomes an internal authoring/rendering layer
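
As an illustration of using JSON Schema for parameter validation, the sketch below turns user input into a values object by applying defaults and checking a small subset of keywords (`required`, `type`, `enum`, `minimum`, `maximum`). It is hand-rolled to stay dependency-free; a real implementation would more likely use a full JSON Schema validator. The schema contents are hypothetical.

```python
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
}


def build_values(schema: dict, user_input: dict) -> dict:
    """Validate user input against the schema subset and fill in defaults."""
    props = schema.get("properties", {})
    required = set(schema.get("required", []))
    unknown = set(user_input) - set(props)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    values = {}
    for name, spec in props.items():
        if name in user_input:
            value = user_input[name]
        elif "default" in spec:
            value = spec["default"]
        elif name in required:
            raise ValueError(f"missing required parameter: {name}")
        else:
            continue  # optional parameter with no default stays unset

        check = TYPE_CHECKS.get(spec.get("type"))
        if check is not None and not check(value):
            raise ValueError(f"{name}: expected {spec['type']}")
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"{name}: must be one of {spec['enum']}")
        if "minimum" in spec and value < spec["minimum"]:
            raise ValueError(f"{name}: below minimum {spec['minimum']}")
        if "maximum" in spec and value > spec["maximum"]:
            raise ValueError(f"{name}: above maximum {spec['maximum']}")
        values[name] = value
    return values


SCHEMA = {
    "required": ["model_path"],
    "properties": {
        "model_path": {"type": "string"},
        "gpu_count": {"type": "integer", "minimum": 1, "maximum": 8, "default": 1},
    },
}
print(build_values(SCHEMA, {"model_path": "/workspace/models/demo"}))
```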

### 9.2 Manifest shape

A profile manifest should describe:
- identity: name, version, publisher, support level
- parameters: user-facing schema and defaults
- UI hints: order, widgets, grouping, descriptions, warnings
- components: containers, images, commands, ports, health checks
- resources: GPU count, GPU placement intent, CPU, memory, shared memory, storage mounts
- add-ons: optional gateway, metrics, dashboards, logs, alerts
- secrets: required secret names and injection paths, not raw secret values
- outputs: endpoints, dashboard links, generated commands
- validation: smoke tests, health checks, readiness gates
- execution engines: supported backends such as Docker Compose, Kubernetes, Helm, dstack, or node-agent

The important boundary is that the manifest defines product intent and constraints. The execution engine adapter owns backend-specific YAML, values, or run configuration.
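
For orientation only, the following sketch shows one possible shape for such a manifest as a plain Python dictionary, with a check that every section listed above is present and that secrets carry names and injection paths rather than values. All key names, the image reference, the port, and the health path are illustrative; the authoritative field definitions belong to the profile contract document.

```python
REQUIRED_SECTIONS = (
    "identity", "parameters", "ui_hints", "components", "resources",
    "addons", "secrets", "outputs", "validation", "execution_engines",
)


def check_manifest(manifest: dict) -> list:
    """Return a list of problems; an empty list means the basic shape looks right."""
    problems = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in manifest]
    for secret in manifest.get("secrets", []):
        if "value" in secret:
            problems.append(f"secret {secret.get('name')!r} must not embed a raw value")
    return problems


# Hypothetical manifest for a single-container model server.
MANIFEST = {
    "identity": {"name": "model-server-basic", "version": "1.0.0",
                 "publisher": "platform", "support_level": "supported"},
    "parameters": {"required": ["model_path"],
                   "properties": {"model_path": {"type": "string"}}},
    "ui_hints": {"order": ["model_path"]},
    "components": [{"name": "server",
                    "image": "registry.gpuaas.example/apps/vllm:1.0.0",
                    "ports": [8000],
                    "health_check": {"http_path": "/health"}}],
    "resources": {"gpu_count": 1, "shared_memory": "8Gi",
                  "mounts": [{"path": "/workspace"}]},
    "addons": {},
    "secrets": [{"name": "model-hub-token", "inject_as": "/run/secrets/model-hub-token"}],
    "outputs": {"endpoints": ["server:8000"]},
    "validation": {"smoke_tests": []},
    "execution_engines": ["docker-compose"],
}
print(check_manifest(MANIFEST))
```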

### 9.3 Helm-first Kubernetes adapter

Helm is a good first adapter for Kubernetes-shaped packages because:
- it already has a chart packaging model
- it already has a values file model
- `values.schema.json` gives us a practical validation and UI-form bridge
- many existing apps can be adapted without inventing a full renderer immediately

The first Kubernetes-backed profile path can therefore be:
1. profile manifest references a chart or chart bundle
2. profile manifest exposes a curated parameter subset
3. UI renders from JSON Schema and UI hints
4. user input becomes a values object
5. platform validates entitlements, storage, secrets, and resource constraints
6. platform renders/applies Helm values through the runtime adapter

This does not mean users upload arbitrary Helm charts in the first slice. The chart/profile remains curated and supportable.
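
Steps 2 and 4 through 6 above might be sketched as follows. The function names, the quota check, the parameter names, and the chart reference are assumptions for illustration. The sketch only builds a `helm upgrade --install` argument list and does not execute it.

```python
import json
from typing import Iterable, List


def curated_values(user_input: dict, exposed: Iterable[str]) -> dict:
    """Accept only the parameters the profile chose to expose (step 2 and step 4)."""
    extra = set(user_input) - set(exposed)
    if extra:
        raise ValueError(f"parameters not exposed by this profile: {sorted(extra)}")
    return dict(user_input)


def check_constraints(values: dict, gpu_quota: int) -> None:
    """Stand-in for the entitlement/resource checks in step 5 (GPU quota only)."""
    if values.get("gpuCount", 0) > gpu_quota:
        raise PermissionError("requested GPU count exceeds project quota")


def helm_command(
    release: str, chart_ref: str, chart_version: str, namespace: str, values_file: str
) -> List[str]:
    """Build the argument list the runtime adapter could run in step 6."""
    return [
        "helm", "upgrade", "--install", release, chart_ref,
        "--version", chart_version,
        "--namespace", namespace,
        "--values", values_file,
    ]


values = curated_values(
    {"gpuCount": 2, "modelPath": "/workspace/models/demo"},
    exposed={"gpuCount", "modelPath"},
)
check_constraints(values, gpu_quota=4)
rendered_values = json.dumps(values, indent=2, sort_keys=True)  # content for the values file
command = helm_command(
    "vllm-demo",
    "oci://registry.gpuaas.example/charts/vllm",  # hypothetical curated chart
    "0.1.0",
    "project-a",
    "values.json",
)
print(" ".join(command))
```

Keeping validation ahead of rendering means a rejected launch never produces a Helm artifact at all.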

### 9.4 Execution engine adapters

The platform should treat Docker Compose, Kubernetes, Helm, dstack, and node-agent tasks as execution engines behind the same workload profile model.

Each adapter is responsible for:
- translating the curated profile values object into its native artifact
- submitting or applying the workload
- managing the native run/release/update operation
- returning status, logs, metrics, endpoints, and events to GPUaaS
- mapping native lifecycle into GPUaaS workload/app-instance lifecycle
- surfacing adapter-specific errors into the shared workload activity/error model

Examples:
- Docker Compose adapter: renders compose YAML for allocation-local container execution
- Kubernetes adapter: renders Kubernetes objects directly
- Helm adapter: renders curated values for a chart bundle
- dstack adapter: renders `.dstack.yml` or uses the dstack API for ML-native tasks, services, dev environments, and fleets
- node-agent adapter: dispatches platform-native node tasks for local allocation actions

This keeps the product model stable while allowing different workload classes to use the execution engine that fits best.
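
One way to picture a shared adapter contract is an abstract base class with a small set of operations, as in the sketch below. The method names are hypothetical, and the in-memory Compose-style adapter only imitates rendering and submission; it serializes the Compose-shaped document as JSON to avoid a YAML dependency.

```python
import json
from abc import ABC, abstractmethod


class ExecutionEngineAdapter(ABC):
    """Hypothetical contract every execution engine adapter would implement."""

    @abstractmethod
    def render(self, values: dict) -> str:
        """Translate the curated values object into the engine's native artifact."""

    @abstractmethod
    def submit(self, artifact: str) -> str:
        """Apply the artifact and return a native run identifier."""

    @abstractmethod
    def status(self, run_id: str) -> str:
        """Report the native state for a run."""


class InMemoryComposeAdapter(ExecutionEngineAdapter):
    """Toy adapter that renders a Compose-shaped document and tracks runs in memory."""

    def __init__(self) -> None:
        self._runs = {}

    def render(self, values: dict) -> str:
        port = values["port"]
        doc = {"services": {values["name"]: {"image": values["image"],
                                             "ports": [f"{port}:{port}"]}}}
        return json.dumps(doc, indent=2, sort_keys=True)

    def submit(self, artifact: str) -> str:
        run_id = f"run-{len(self._runs) + 1}"
        self._runs[run_id] = artifact
        return run_id

    def status(self, run_id: str) -> str:
        return "running" if run_id in self._runs else "unknown"


def launch(adapter: ExecutionEngineAdapter, values: dict) -> dict:
    """Engine-agnostic launch path: the caller never touches native artifacts."""
    run_id = adapter.submit(adapter.render(values))
    return {"run_id": run_id, "state": adapter.status(run_id)}


print(launch(
    InMemoryComposeAdapter(),
    {"name": "vllm", "image": "registry.gpuaas.example/apps/vllm:1.0.0", "port": 8000},
))
```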

### 9.5 dstack adapter evaluation

dstack is a strong candidate to evaluate as one execution engine adapter for ML-native workloads, not as the replacement for the GPUaaS app platform control plane.

It maps well to workloads such as:
- developer environments
- training or batch tasks
- inference services
- fleet-backed GPU execution across Kubernetes, cloud GPUs, or SSH/on-prem nodes

The GPUaaS ownership boundary should remain:
- catalog and app entitlement
- tenant/project IAM and service accounts
- billing and quota policy
- storage and SSH access policy
- profile approval, provenance, and support level
- UI and API contract
- audit and lifecycle records

dstack, if adopted, should implement the same execution-engine adapter contract as the other engines. It should not introduce a separate product lifecycle outside the shared GPUaaS workload/app-instance model.

This gives us a useful path for ML-native orchestration without coupling our product model to dstack-specific concepts too early.
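
Mapping a native lifecycle into the shared app-instance lifecycle could be as simple as a lookup table owned by the adapter. In the sketch below, both the GPUaaS state names and the native state strings are placeholders; they are not taken from the dstack API and would need to be checked against whichever engine is adopted.

```python
from enum import Enum


class InstanceState(Enum):
    """Hypothetical shared app-instance lifecycle states."""

    PENDING = "pending"
    STARTING = "starting"
    RUNNING = "running"
    STOPPING = "stopping"
    STOPPED = "stopped"
    FAILED = "failed"
    UNKNOWN = "unknown"


# Placeholder native state names for an engine adapter (not verified against any engine).
NATIVE_TO_INSTANCE = {
    "submitted": InstanceState.PENDING,
    "provisioning": InstanceState.STARTING,
    "running": InstanceState.RUNNING,
    "terminating": InstanceState.STOPPING,
    "done": InstanceState.STOPPED,
    "failed": InstanceState.FAILED,
}


def to_instance_state(native_state: str) -> InstanceState:
    """Translate a native state; unrecognized values surface as UNKNOWN, not an error."""
    return NATIVE_TO_INSTANCE.get(native_state.strip().lower(), InstanceState.UNKNOWN)


print(to_instance_state("Provisioning"))
```

Returning `UNKNOWN` for unmapped states keeps an engine upgrade that adds a new state from breaking the shared activity model.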

Open integration questions:
- Should GPUaaS run a shared dstack server per control plane, per tenant, or per environment?
- How do dstack projects map to GPUaaS tenants/projects?
- How do dstack fleets map to GPUaaS nodes, allocations, RKE2 clusters, and future MAAS pools?
- Can we reconcile dstack events/logs/metrics into the same app-instance activity model?
- Which workloads should prefer Helm/Kubernetes directly versus dstack as an adapter?

### 9.6 Why not raw YAML as the user interface

Raw YAML should remain available for advanced operators and debugging, but it should not be the primary product experience.

Reasons:
- users need guided fields, defaults, warnings, and validation
- different backends need different YAML for the same product intent
- we need entitlement, billing, storage, and security checks before rendering
- supportability requires knowing which profile and parameter set produced the running workload

The UI should therefore map to a typed values object, not directly to arbitrary YAML text.
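
The supportability point above suggests recording which profile and parameter set produced each running workload. A minimal sketch, assuming a hypothetical `launch_fingerprint` helper, hashes a canonical JSON form of the profile identity and values object so that two launches with the same inputs produce the same identifier.

```python
import hashlib
import json


def launch_fingerprint(profile_name: str, profile_version: str, values: dict) -> str:
    """Return a stable SHA-256 hex digest for a (profile, version, values) triple."""
    canonical = json.dumps(
        {"profile": profile_name, "version": profile_version, "values": values},
        sort_keys=True,          # key order must not change the digest
        separators=(",", ":"),   # fixed separators avoid whitespace differences
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = launch_fingerprint("model-server-basic", "1.0.0", {"gpu_count": 1, "model_path": "/workspace/m"})
b = launch_fingerprint("model-server-basic", "1.0.0", {"model_path": "/workspace/m", "gpu_count": 1})
print(a == b)
```

Secrets should stay out of the values object, as the manifest only references secret names, so the fingerprint can be stored in audit records without exposing sensitive data.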

---

## 10. Launch Modes

Two future-compatible modes are likely:

### 10.1 Launch on existing allocation

User already has compute.
The workload image runs on that allocation.

Best for first slice:
- simpler
- avoids tying compute provisioning and workload scheduling together immediately

### 10.2 Provision-and-launch

User picks the workload image and the platform acquires the required compute automatically.

Better long-term UX:
- stronger app-platform experience
- fewer steps for users

But it is a larger orchestration and billing problem.

Recommended decision:
- start with **launch on existing allocation**

---

## 11. UX Direction

For the first slice, the user flow should be:
1. get or select an active allocation
2. choose a supported OCI workload image
3. choose workspace/storage attachment
4. choose exposure/access mode
5. launch

For composition profiles, add:
- optional add-on selection, such as Grafana or Loki
- a clear cost/resource warning for each add-on
- pre-baked dashboards where observability add-ons are selected
- GPU placement and model-lane summary where the profile spans more than one worker

The workload detail should show:
- image name and version
- current state
- workspace/storage mount info
- exposed endpoint if any
- recent activity/logs
- enabled add-ons and their endpoints
- bundled dashboards or log views where applicable
- model lanes, GPU bindings, and health status where applicable

---

## 12. Access Model

These workloads often expose HTTP or notebook endpoints.

The product needs a clear access model for:
- private endpoint only
- platform-proxied endpoint
- auth/session handling

This should align with the broader embedded UI contract work, but the workload-image model itself should assume:
- some images are service-like and browser-facing
- some are only internal helper workloads

---

## 13. Relationship to Managed Runtime Bundles

Managed runtime bundles and launchable OCI images are complementary.

Managed runtime bundle:
- installs a supported environment onto the node
- user still works directly on the machine

Launchable OCI workload image:
- runs a packaged environment or service
- may expose a browser endpoint or service endpoint

Example distinction:
- `PyTorch` as a managed runtime bundle
- `JupyterLab` as a launchable OCI workload image built on that class of environment
- `mixed model-serving stack` as a launchable composition profile

---

## 14. Out of Scope for First Slice

Not in the first workload-image slice:
- arbitrary user-provided image launches from any registry
- Apptainer as the canonical first format
- arbitrary user-supplied Docker Compose YAML
- arbitrary user-supplied Kubernetes manifests
- unbounded multi-container scheduling
- full Kubernetes dependency for the first version
- arbitrary workspace path selection

Those can come later once the product and trust boundary are clearer.
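
The first exclusion in this list could be enforced with a simple reference check before any pull. The sketch below assumes a hypothetical registry host and only compares the host portion of the image reference; a real policy would probably also verify the repository path and digest.

```python
PLATFORM_REGISTRY = "registry.gpuaas.example"  # hypothetical platform registry host


def is_platform_image(reference: str, registry: str = PLATFORM_REGISTRY) -> bool:
    """True only when the reference's host part is exactly the platform registry."""
    host, sep, rest = reference.partition("/")
    # Exact host match prevents look-alike hosts such as "<registry>.evil.example".
    return bool(sep) and host == registry and bool(rest)


for ref in (
    "registry.gpuaas.example/apps/vllm:1.0.0",
    "docker.io/library/ubuntu:24.04",
    "registry.gpuaas.example.evil.example/apps/vllm:1.0.0",
):
    print(ref, is_platform_image(ref))
```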

---

## 15. Open Questions

- Should the first JupyterLab or vLLM images be project-only apps, tenant-scoped apps, or allocation-scoped launchables?
- How much parameterization should be allowed at launch time?
- Should the first exposure model be platform-proxied only, or allow direct port binding as well?
- How should these launches be metered relative to the underlying allocation?
- Should composition profiles be stored as app manifests, OCI annotations, or a separate profile catalog?
- Which observability add-ons should be free/default versus explicit billable add-ons?
- How much GPU placement should users control directly versus inheriting from a tested profile?
- Should profile validation scripts be first-class artifacts so the platform can expose health and readiness uniformly?
- Should we store the canonical manifest in JSON Schema plus renderer templates, CUE with JSON Schema export, or both?
- Which renderer should be first: Docker Compose on allocation-local Docker, node-agent container tasks, or Kubernetes YAML?
- Should dstack be introduced as an optional adapter for ML-native dev environments/tasks/services before we build equivalent orchestration ourselves?

---

## 16. Decision Summary

The platform should introduce a distinct app class:
- launchable OCI workload images

The first version should:
- use OCI as canonical format
- use the platform registry as canonical source
- pre-seed supported images such as JupyterLab and vLLM
- start with launch-on-existing-allocation
- require an explicit workspace/storage model

This creates a clean path for browser- and service-oriented ML experiences without overloading raw allocations or node-local managed runtime bundles.