# Scheduler as Platform App v1

## Goal
Define a sustainable baseline for delivering scheduler capabilities (starting with Slurm) as platform apps, without coupling core allocation APIs to one scheduler implementation.

This document is the implementation baseline for:
1. Internal reference apps (for example Slurm).
2. Future first-party apps.
3. Third-party/team-owned apps built on the same control plane contracts.

Companion documents (the first describes the concrete first adapter):
- `doc/architecture/Slurm_App_Runtime_Adapter_v1.md`
- `doc/architecture/Clustered_App_Model_v1.md`
- `doc/architecture/App_Platform_Primitive_Boundary_v1.md`

## Decision Summary
1. Schedulers are modeled as apps in the App Catalog and instantiated per project/tenant policy.
2. Core platform keeps scheduler-agnostic primitives only (identity, policy, audit, events, tenancy boundaries).
3. Scheduler-specific logic (Slurm/K8s/Ray internals) lives in app operator runtime, not in core allocation handlers.
4. Authorization remains permission-key based; role labels may evolve without handler rewrites.
5. Initial production operating mode is `tenant_dedicated`; shared managed scheduler offerings are explicitly a later mode.

For product language, the scheduler family should expose:
1. `project-scoped mode`
2. `tenant-owned shared mode`
3. later `platform-managed shared mode`

Current mapping:
1. `project-scoped mode` -> `tenant_dedicated + project`
2. `tenant-owned shared mode` -> target `tenant_dedicated + tenant`
3. `platform-managed shared mode` -> `platform_managed + platform`

Important limitation:
- The current app-instance contract is still project-owned.
- `tenant-owned shared mode` is therefore a product target that still needs an explicit attachment/ownership model.
- See `doc/architecture/App_Tenant_Shared_Attachment_Model_v1.md`.
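
The mapping above can be read as a small lookup table. The sketch below is illustrative only: the dictionary keys, the `ModeMapping` type, the `available_now` flag, and `resolve_mode` are hypothetical names, and marking the two shared modes as unavailable is one reading of "target" and "later" in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeMapping:
    operating_mode: str       # value for the instance's operating_mode
    control_plane_scope: str  # value for the instance's control_plane_scope
    available_now: bool       # assumption: False for target/later modes


# Keys are hypothetical identifiers for the product-language mode names.
PRODUCT_MODES = {
    "project_scoped": ModeMapping("tenant_dedicated", "project", True),
    "tenant_owned_shared": ModeMapping("tenant_dedicated", "tenant", False),
    "platform_managed_shared": ModeMapping("platform_managed", "platform", False),
}


def resolve_mode(product_mode: str) -> ModeMapping:
    """Translate a product-language mode into instance metadata values."""
    try:
        mapping = PRODUCT_MODES[product_mode]
    except KeyError:
        raise ValueError(f"unknown product mode: {product_mode!r}") from None
    if not mapping.available_now:
        # Tenant-owned shared mode still needs an attachment/ownership model;
        # platform-managed shared mode is explicitly a later mode.
        raise NotImplementedError(f"{product_mode} is not available yet")
    return mapping
```

Keeping the translation in one table means product naming can change without touching the stored `operating_mode` and `control_plane_scope` values.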

## Scope
In scope:
1. Responsibility boundary.
2. IAM/policy requirements.
3. Artifact/source model (platform-shared + tenant-scoped).
4. Events and observability requirements.
5. Slurm pilot acceptance criteria and gap capture.
6. Operating-mode expectations for scheduler backends.

Out of scope:
1. Slurm internals (controller tuning, partition strategy).
2. MaaS/hardware provisioning implementation details.
3. UI implementation details.

## Core vs App Responsibilities

| Area | Core platform (must provide) | Scheduler app/operator (must provide) |
|---|---|---|
| Identity | User auth, service account auth, project context enforcement | None (consume core identities only) |
| IAM/Authz | Permission evaluation, role bindings, policy overlays, audit logs | Declare required actions and call only allowed endpoints |
| API contracts | Stable control-plane endpoints, canonical error envelope | Adapter/operator APIs behind app runtime boundary |
| Lifecycle | App instance lifecycle (`requested` -> `running`/`failed`) | Scheduler deployment/upgrade/rollback mechanics |
| Events | Typed domain events + correlation propagation | Consume/emit app lifecycle events and runtime status |
| Tenancy | Project/tenant ownership and boundary checks | Never bypass project boundary; include context in all operations |
| Billing hooks | Usage attribution primitives by tenant/project | Scheduler usage metrics mapping (jobs/queues -> billable units) |

Rule: if functionality requires scheduler-specific branching inside core handlers, treat it as a platform defect and move it behind the app/operator boundary.

## Required IAM Model
Use action keys, not role-name checks, in handlers.

### Baseline action families
1. `scheduler.catalog.read`
2. `scheduler.instance.read`
3. `scheduler.instance.create`
4. `scheduler.instance.update`
5. `scheduler.instance.delete`
6. `scheduler.queue.submit`
7. `scheduler.queue.read`
8. `scheduler.queue.cancel`
9. `scheduler.node.read`
10. `scheduler.node.operate` (drain/cordon/uncordon/label)

### Scope rules
1. Tenant/project resources must enforce project ownership on every mutation.
2. Service accounts can act only within their own project.
3. Platform break-glass is allowed only on explicit admin endpoints and always audited.
4. Role display labels (`project_member`, `project_admin`, etc.) are UI concerns; permission keys are the enforcement contract.
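
A minimal sketch of how a handler might enforce these rules with action keys rather than role names. The `Principal` shape, the per-project `grants` mapping, the `Forbidden` exception, and `authorize` are all assumptions made for illustration; they are not the platform's actual authorization API.

```python
from dataclasses import dataclass
from typing import FrozenSet, Mapping, Optional


class Forbidden(Exception):
    """Raised when a principal may not perform an action."""


@dataclass(frozen=True)
class Principal:
    subject: str
    # Hypothetical shape: project_id -> action keys granted in that project.
    grants: Mapping[str, FrozenSet[str]]
    # Set only for service accounts: the single project they belong to.
    service_account_project: Optional[str] = None


def authorize(principal: Principal, action: str, resource_project_id: str) -> None:
    """Check an action key against the project that owns the resource."""
    # Service accounts are same-project only, regardless of grants.
    sa_project = principal.service_account_project
    if sa_project is not None and sa_project != resource_project_id:
        raise Forbidden(f"{principal.subject}: cross-project access denied")
    # Enforcement is on the permission key; role labels never appear here.
    if action not in principal.grants.get(resource_project_id, frozenset()):
        raise Forbidden(f"{principal.subject}: missing {action}")


operator = Principal(
    subject="sa:slurm-operator",
    grants={"proj-a": frozenset({"scheduler.instance.update"})},
    service_account_project="proj-a",
)
authorize(operator, "scheduler.instance.update", "proj-a")  # passes silently
```

Because the handler only sees action keys, renaming or regrouping roles changes the `grants` data, not the handler code.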

## Artifact and Registry Model
Both source tiers are first-class:
1. Platform-shared registries/artifact sources (blessed global sources).
2. Tenant-scoped allowlisted sources (private enterprise registries/buckets).

Policy behavior:
1. Global hard-deny is non-overridable.
2. Tenant/project overlays can narrow what is allowed but can never broaden it past a global deny.
3. Scheduler app deployment must resolve artifact sources through policy evaluation, not hardcoded host lists.
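
The policy behavior above can be sketched as a single evaluation function. This is an illustration under stated assumptions: sources are represented as plain strings, overlays are modeled as allow-sets that a source must appear in, and `is_source_allowed` and its parameter names are hypothetical.

```python
from typing import AbstractSet, Iterable


def is_source_allowed(
    source: str,
    *,
    global_deny: AbstractSet[str],
    platform_shared: AbstractSet[str],
    tenant_allowlist: AbstractSet[str],
    overlays: Iterable[AbstractSet[str]] = (),
) -> bool:
    """Evaluate one artifact source against the layered policy."""
    # 1. Global hard-deny wins over everything, including tenant allowlists.
    if source in global_deny:
        return False
    # 2. The source must come from one of the two first-class tiers.
    if source not in platform_shared and source not in tenant_allowlist:
        return False
    # 3. Each overlay may only narrow: the source must survive all of them.
    return all(source in overlay for overlay in overlays)


allowed = is_source_allowed(
    "registry.tenant.example",
    global_deny={"registry.blocked.example"},
    platform_shared={"registry.platform.example"},
    tenant_allowlist={"registry.tenant.example"},
    overlays=[{"registry.tenant.example"}],
)
```

Checking the global deny first makes it structurally impossible for a later tier or overlay to re-allow a denied source.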

Direction:
1. Keep API neutral for source type (OCI and non-OCI blob/object sources).
2. Credential delivery remains short-lived and task-scoped.

## API Contract Direction
Scheduler app integration should use existing app-control-plane contracts:
1. Catalog/version publication for scheduler app entries.
2. Project entitlement enable/disable with policy overlays.
3. App instance create/read/delete for scheduler control-plane instances.

Required effective instance metadata:
1. `operating_mode`
2. `control_plane_scope`
3. `runtime_backend`

Allocation API remains scheduler-agnostic:
1. `allocations.scheduler_type` selects adapter path.
2. Scheduler references/metadata are stored as integration metadata.
3. Core allocation handlers do not embed Slurm-specific request semantics.
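
One way to keep the allocation handler free of scheduler-specific branches is an adapter registry keyed by `scheduler_type`. The sketch below assumes a hypothetical `SchedulerAdapter` protocol, an in-process registry, and dictionary-shaped allocations; `FakeSlurmAdapter` is a stand-in for illustration and does not talk to Slurm.

```python
from typing import Any, Dict, Protocol


class SchedulerAdapter(Protocol):
    def submit(self, allocation: Dict[str, Any]) -> Dict[str, Any]:
        """Return scheduler references to store as integration metadata."""
        ...


_ADAPTERS: Dict[str, SchedulerAdapter] = {}


def register_adapter(scheduler_type: str, adapter: SchedulerAdapter) -> None:
    _ADAPTERS[scheduler_type] = adapter


def handle_allocation(allocation: Dict[str, Any]) -> Dict[str, Any]:
    """Core handler: selects an adapter, never inspects scheduler details."""
    scheduler_type = allocation["scheduler_type"]
    adapter = _ADAPTERS.get(scheduler_type)
    if adapter is None:
        raise LookupError(f"no adapter registered for {scheduler_type!r}")
    refs = adapter.submit(allocation)
    # Scheduler references are opaque to core and stored as metadata only.
    return {**allocation, "integration_metadata": refs}


class FakeSlurmAdapter:
    """Illustrative stand-in; a real adapter lives in the app operator."""

    def submit(self, allocation: Dict[str, Any]) -> Dict[str, Any]:
        return {"scheduler_ref": f"slurm:{allocation['id']}"}


register_adapter("slurm", FakeSlurmAdapter())
result = handle_allocation({"id": "alloc-1", "scheduler_type": "slurm"})
```

Adding a second scheduler is then a new `register_adapter` call in the app runtime, with no change to `handle_allocation`.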

Clustered scheduler/operator apps must also follow the generic clustered-app model:
1. Topology is app-level and tenant/project-admin controlled.
2. Physical node selection remains platform-owned.
3. Logical roles and mutable member lifecycle must not leak internal host-role assumptions into the public API.

## Event and Observability Contract
Minimum required event flow:
1. `apps.instance.requested`
2. `apps.instance.running`
3. `apps.instance.failed`
4. `apps.instance.deleting`
5. `apps.instance.deleted`

Every scheduler app operation must include:
1. `correlation_id`
2. `org_id`
3. `project_id`
4. `app_slug`
5. `app_instance_id` (where applicable)
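
A minimal sketch of an event builder that refuses to emit without the required context. The event names and field names come from the lists above; the dictionary envelope shape, the `type` key, and `build_event` itself are assumptions for illustration.

```python
from typing import Dict, Mapping, Optional

EVENT_TYPES = frozenset({
    "apps.instance.requested",
    "apps.instance.running",
    "apps.instance.failed",
    "apps.instance.deleting",
    "apps.instance.deleted",
})

REQUIRED_CONTEXT = ("correlation_id", "org_id", "project_id", "app_slug")


def build_event(
    event_type: str,
    context: Mapping[str, str],
    app_instance_id: Optional[str] = None,
) -> Dict[str, str]:
    """Build an event envelope, rejecting missing or empty context fields."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type!r}")
    missing = [key for key in REQUIRED_CONTEXT if not context.get(key)]
    if missing:
        raise ValueError(f"missing context fields: {', '.join(missing)}")
    event = {"type": event_type}
    event.update({key: context[key] for key in REQUIRED_CONTEXT})
    # app_instance_id is included only where applicable.
    if app_instance_id is not None:
        event["app_instance_id"] = app_instance_id
    return event


event = build_event(
    "apps.instance.requested",
    {
        "correlation_id": "corr-123",
        "org_id": "org-1",
        "project_id": "proj-a",
        "app_slug": "slurm",
    },
)
```

Failing at build time, rather than at triage time, keeps events without a `correlation_id` out of the timeline entirely.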

Triage path:
1. API/UI error envelope -> `correlation_id`
2. Loki lookup by `correlation_id`
3. Tempo trace lookup by `trace_id`
4. Event timeline reconstruction from `apps.instance.*`

## Slurm Pilot (Reference App)
Use Slurm as the first reference app to validate baseline completeness.

Initial operating-mode target:
1. `tenant_dedicated`
2. `control_plane_scope = project | tenant` depending on org policy and environment boundaries
3. Project-owned app instances may attach either to a project-scoped control plane (for `dev/test/stage/prod` isolation) or to a tenant-scoped control plane (for shared tenant schedulers).

### Pilot phases
1. Register Slurm in app catalog + publish version.
2. Enable entitlement for test project(s).
3. Create Slurm app instance via app instance API.
4. Validate scheduler queue operations through permissioned endpoints.
5. Run upgrade/rollback and delete flows.

Lab baseline:
1. Reference control-stack assets are deployed on `dev-lab-1`.
2. Worker-side join materials are deployed on `dev-gpu-1`.
3. See `doc/operations/Slurm_Reference_Lab_Stack.md`.

### Required acceptance criteria
1. No Slurm-specific branches in core allocation handlers.
2. All privileged actions produce audit logs with `correlation_id`.
3. A service-account operator can manage only scheduler instances in its own project.
4. Policy overlays correctly restrict regions/SKUs/artifact sources.
5. Full incident path is traceable across logs, traces, and events.

### Gap log template (capture during pilot)
1. Missing primitive in core.
2. Leaky scheduler-specific coupling in core.
3. Missing policy key or permission action.
4. Missing event for operational triage.
5. Missing billing attribution hook.

## Baseline for Any Future App Team
An app is ready for onboarding only if all of the following are true:
1. Uses app catalog + entitlement + app instance contracts (no hidden DB coupling).
2. Uses service accounts for operator automation.
3. Passes tenant/project boundary checks under negative tests.
4. Emits required lifecycle events with correlation context.
5. Supports policy-governed artifact sources.
6. Defines upgrade, rollback, and delete behavior.
7. Provides a support runbook with correlation-first triage steps.

## Non-Negotiable Invariants
1. Internal and external apps use the same contracts.
2. No authz bypass for internal reference apps.
3. No hard runtime dependency on a single scheduler vendor in control-plane API semantics.
4. No direct DB writes by app operators outside public contracts/events.

## Related Docs
1. `doc/architecture/App_Control_Plane_v1.md`
2. `doc/architecture/Clustered_App_Model_v1.md`
3. `doc/architecture/Service_Account_Model.md`
4. `doc/architecture/Role_and_Policy_Lifecycle_Model.md`
5. `doc/architecture/Allocation_Node_Placement_v1.md`
6. `doc/product/GPUaaS_vs_Armada_Bridge_Gap_Matrix.md`
7. `doc/architecture/App_Runtime_Operating_Modes_v1.md`
8. `doc/architecture/App_Tenant_Shared_Attachment_Model_v1.md`