# Allocation Storage Model v1

Purpose:
- Define the first storage model for GPU allocations.
- Clarify what persists after allocation release.
- Separate raw allocation-local disk from attached persistent storage.
- Establish the first product contract for shared storage before UI or API mutation surfaces grow around it.

Inputs:
- `doc/product/Allocation_Experience_Gaps_v1.md`
- `doc/product/ux-mocks/user-storage.md`
- `doc/api/openapi.draft.yaml`
- `packages/services/storage/`
- `packages/web/app/storage/`

Related:
- `doc/product/Managed_Runtime_Bundles_v1.md`
- `doc/architecture/Allocation_Node_Placement_v1.md`
- `doc/architecture/RKE2_External_Storage_Model_v1.md`

---

## 1. Executive Summary

The platform currently has:
- project-scoped object/file storage surfaces
- allocation-local machine disks

What it does not yet have is a clear user-facing attachment model that answers:
- what data survives allocation release
- what storage can be attached to a live allocation
- what can be shared between many allocations

The first version should be conservative:

1. **Allocation-local disk is ephemeral.**
   Users must not treat the node-local filesystem as durable after release.

2. **Persistent storage is an explicit attachable object.**
   If users want persistence, they must attach storage deliberately.

3. **The first shared storage model is project-scoped filesystem storage.**
   It may be attached to multiple allocations in the same project.

4. **The first write model is shared read/write within a project, but only for storage backends that support concurrent attachment semantics.**
   Backend capability must be explicit, not assumed.

5. **Storage attach should be allowed both during allocation creation and after allocation creation.**
   Restricting it to create-time only would be too limiting for real workflows.

---

## 2. Problem Statement

GPU users need to know:
- where datasets live
- where outputs persist
- how teammates can see the same files
- whether releasing compute destroys their work

Today the storage story is ambiguous because:
- allocation-local disk exists and is convenient
- project storage surfaces exist
- but there is no durable attachment story connecting them

That ambiguity leads to misuse and confusion:
- users keep important data only on node-local disk
- users do not know whether shared filesystems can be used across many allocations
- the product cannot clearly explain what survives release

---

## 3. Storage Object Types

The product should distinguish three storage types.

### 3.1 Allocation-local disk

Definition:
- disk that is part of the compute node or ephemeral allocation image

Characteristics:
- automatically available on the allocation
- fast and simple
- **not durable across release**
- may be wiped, reimaged, or reused by the platform

User contract:
- suitable for scratch space and temporary work
- not suitable as the only copy of important data

### 3.2 Project persistent storage attachment

Definition:
- a project-owned storage object that can be attached into one or more allocations

Characteristics:
- explicit lifecycle independent of the allocation
- persists after allocation release
- attachable/detachable through the platform
- access governed by project membership and storage attachment rules

User contract:
- use this for datasets, checkpoints, model outputs, and durable project files

### 3.3 Object/file storage namespace

Definition:
- project-scoped storage namespace already exposed through the storage UI/API

Characteristics:
- persistent
- suitable for upload/download and artifact staging
- may or may not be the same underlying substrate as attachable filesystem storage

Product note:
- the first attachment-capable storage offering may be implemented on top of the same backend or a different one, but the user contract should make any difference explicit

---

## 4. First-Version Product Decision

### 4.1 What persists

Persist after allocation release:
- project persistent storage attachments
- project object/file storage

Do not persist after allocation release:
- allocation-local node disk
- arbitrary local filesystem state on the allocation unless copied to persistent storage
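To make this boundary concrete, here is a hypothetical helper that decides whether a path on an allocation survives release. It is a sketch under two assumptions: mounts follow the platform-owned `/mnt/gpuaas/...` pattern proposed in section 7, and the caller knows the allocation's currently attached mount paths. It normalizes `..` segments but does not resolve symlinks.

```python
import posixpath
from pathlib import PurePosixPath


def is_durable(path: str, attached_mount_paths: list[str]) -> bool:
    """Return True only if `path` lives inside a currently attached persistent mount."""
    # Collapse "." and ".." so "/mnt/gpuaas/x/../../tmp" is not mistaken for durable.
    p = PurePosixPath(posixpath.normpath(path))
    for mount in attached_mount_paths:
        m = PurePosixPath(posixpath.normpath(mount))
        if p == m or m in p.parents:
            return True
    # Anything outside an attached mount is allocation-local scratch.
    return False
```

A UI or CLI could use a check like this to warn before a user writes checkpoints to scratch space.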

### 4.2 First attachment type

The first attachment type should be:
- **project-scoped shared filesystem storage**

Not the first attachment type:
- raw block volume presented directly to the user

Reason:
- shared filesystem maps better to common AI/ML workflows
- easier collaboration story
- less user-side filesystem setup burden

### 4.3 First attachment scope

The first attachment scope should be:
- attachable to many allocations in the same project

Reason:
- single-attach-only would be too limiting for the collaboration and multi-node cases users will expect
- datasets and shared outputs are commonly consumed by multiple active allocations

### 4.4 Backend capability must be explicit

The platform must not assume every backend can safely support:
- multi-attach
- concurrent write access
- POSIX-like semantics
- good performance under many active mounts

So the product model should include backend capability flags such as:
- `supports_multi_attach`
- `supports_shared_rw`
- `supports_concurrent_mounts`
- `max_recommended_active_attachments`

This is where Weka or any future backend must be validated rather than assumed.
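A minimal sketch of how these flags could be recorded per backend. The flag names come from the list above; the `BackendCapabilities` type, its conservative defaults, and the consistency rules in `capability_problems` are assumptions for illustration. Defaulting every flag to the safe value means an unvalidated backend never gets multi-attach or shared read/write by accident.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendCapabilities:
    """Hypothetical per-backend capability record; defaults are deliberately conservative."""

    supports_multi_attach: bool = False
    supports_shared_rw: bool = False
    supports_concurrent_mounts: bool = False
    max_recommended_active_attachments: int = 1


def capability_problems(caps: BackendCapabilities) -> list[str]:
    """Return consistency problems in a declared record (hypothetical rules)."""
    problems = []
    if caps.max_recommended_active_attachments < 1:
        problems.append("max_recommended_active_attachments must be at least 1")
    # Shared read/write only makes sense if more than one allocation can attach.
    if caps.supports_shared_rw and not caps.supports_multi_attach:
        problems.append("supports_shared_rw requires supports_multi_attach")
    # A multi-attach backend that recommends a single attachment is contradictory.
    if caps.supports_multi_attach and caps.max_recommended_active_attachments == 1:
        problems.append("multi-attach backends should recommend more than one attachment")
    return problems
```

Validation of a candidate backend (Weka or otherwise) would then end in a declared record that passes these checks, rather than in assumed behavior.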

---

## 5. Attach Timing

### 5.1 Allocation create

Users should be able to attach persistent storage during allocation creation.

This supports:
- dataset-ready startup
- notebooks/workspaces that need storage immediately
- fewer post-create manual steps

### 5.2 After allocation create

Users should also be able to attach or detach storage after allocation creation.

This supports:
- late-bound datasets
- switching project workspaces
- attaching shared storage to a long-lived allocation later

### 5.3 Why not create-time only

Create-time-only attach would be a product mistake because:
- GPU workflows evolve after the machine is already running
- collaborators may join later
- data may only become available after the allocation is already active

So the first design should support both.

---

## 6. Access and Collaboration Model

The first access boundary should be:
- project-scoped

That means:
- if an allocation and a storage object are in the same project, authorized project members can use the storage according to role and attachment policy

Open question for later implementation:
- whether allocation owner alone may attach storage
- or whether project admins and other roles may also attach/detach project storage

Recommended first rule:
- allocation owner and project admins may attach/detach project storage
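The recommended first rule can be expressed as a small authorization check. This is a hypothetical sketch: the `ProjectRole` values, the `Allocation` and `StorageObject` shapes, and treating a missing role as "not a project member" are assumptions, not a defined API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ProjectRole(Enum):
    # Hypothetical role names; this document does not define the role set.
    ADMIN = "admin"
    MEMBER = "member"


@dataclass(frozen=True)
class Allocation:
    id: str
    project_id: str
    owner_id: str


@dataclass(frozen=True)
class StorageObject:
    id: str
    project_id: str


def may_attach_or_detach(
    user_id: str,
    role_in_project: Optional[ProjectRole],
    allocation: Allocation,
    storage: StorageObject,
) -> bool:
    """Recommended first rule: allocation owner or project admin, same project only."""
    # The project is the access boundary; cross-project attachment is out of scope.
    if allocation.project_id != storage.project_id:
        return False
    # None means the caller is not a member of the allocation's project.
    if role_in_project is None:
        return False
    return user_id == allocation.owner_id or role_in_project is ProjectRole.ADMIN
```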

---

## 7. Mount Model

The platform should provide a predictable mount location pattern.

Recommended pattern:
- `/mnt/gpuaas/<storage_name>`

or, if names can change after creation:
- `/mnt/gpuaas/<storage_id>`

The user-facing UI should still show friendly names.

The mount path should be platform-owned rather than chosen ad hoc by each user, because:
- supportability is better
- automation is safer
- restart/reconcile behavior is easier to reason about
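A sketch of platform-owned mount path selection. The safe-name rule (`_SAFE_NAME`) and the `names_are_stable` switch are hypothetical; the point is that the platform, not the user, decides the path, and falls back to the immutable ID whenever the name is unsuitable so a rename never moves an existing mount.

```python
import re

MOUNT_ROOT = "/mnt/gpuaas"

# Hypothetical rule for names that are safe to use as a single path segment.
_SAFE_NAME = re.compile(r"[a-z0-9][a-z0-9_-]{0,62}")


def mount_path(storage_id: str, storage_name: str, names_are_stable: bool) -> str:
    """Return the platform-owned mount path for an attached storage object."""
    if names_are_stable and _SAFE_NAME.fullmatch(storage_name):
        segment = storage_name
    else:
        # Fall back to the platform-generated ID, which never changes.
        segment = storage_id
    return f"{MOUNT_ROOT}/{segment}"
```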

---

## 8. Lifecycle Semantics

### 8.1 Allocation restart

Attached persistent storage should remain attached across restart.

### 8.2 Allocation release

Allocation release should:
- remove the live mount from the released allocation
- keep the storage object itself intact

### 8.3 Storage delete

Deleting persistent storage should be a separate explicit action.

It should not be implied by:
- allocation release
- allocation restart
- allocation recreation
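The rules in 8.1 through 8.3 can be sketched as event handlers over an in-memory view of storage objects and attachments. Everything here is hypothetical, including one added policy choice: refusing to delete storage that is still attached. The document does not specify that case; a forced detach-then-delete would be an equally valid choice.

```python
from dataclasses import dataclass, field


@dataclass
class StorageState:
    """Hypothetical in-memory view of storage objects and live attachments."""

    storage_ids: set[str] = field(default_factory=set)
    # (allocation_id, storage_id) pairs that are currently attached.
    attachments: set[tuple[str, str]] = field(default_factory=set)

    def on_allocation_restart(self, allocation_id: str) -> None:
        # 8.1: restart does not touch attachment records, so the same storage
        # stays attached and can be re-mounted once the allocation is back.
        return None

    def on_allocation_release(self, allocation_id: str) -> None:
        # 8.2: drop this allocation's attachments; storage objects stay intact.
        self.attachments = {a for a in self.attachments if a[0] != allocation_id}

    def delete_storage(self, storage_id: str) -> None:
        # 8.3: deletion is its own explicit action, never a side effect.
        # Assumed policy: refuse while any allocation still has it attached.
        if any(sid == storage_id for _, sid in self.attachments):
            raise ValueError(f"{storage_id} is still attached; detach it first")
        self.storage_ids.discard(storage_id)
```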

### 8.4 Multi-allocation use

If a storage backend is marked multi-attach capable:
- many active allocations in the same project may mount it at once

If not:
- the platform must enforce the restriction explicitly in API/UI
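A sketch of that enforcement as a pre-attach check the API could run before creating an attachment. The `Capabilities` subset, the `AttachRejected` error, and treating `max_recommended_active_attachments` as a hard cap are assumptions; a softer warning above the recommended limit would be an equally reasonable choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    # Subset of the section 4.4 flags needed for this check.
    supports_multi_attach: bool
    max_recommended_active_attachments: int


class AttachRejected(Exception):
    """Raised so the API/UI can show a clear reason instead of failing silently."""


def check_attach(caps: Capabilities, active_allocation_ids: set[str], allocation_id: str) -> None:
    """Raise AttachRejected if attaching would violate the backend's capabilities."""
    if allocation_id in active_allocation_ids:
        return  # already attached: treat a repeat request as a no-op
    if not caps.supports_multi_attach and active_allocation_ids:
        raise AttachRejected("backend is single-attach and the storage is already in use")
    if len(active_allocation_ids) >= caps.max_recommended_active_attachments:
        raise AttachRejected("attachment would exceed the backend's recommended limit")
```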

---

## 9. UX Direction

Allocation create flow should show:
- ephemeral local disk by default
- optional persistent storage attachments
- clear persistence note

Allocation detail should show:
- attached storage objects
- mount paths
- attachment state
- attach/detach actions

Storage list/detail should show:
- whether the object is persistent
- whether it supports multi-attach
- which allocations currently use it

The UI must make the persistence boundary obvious:
- local disk is scratch
- attached project storage persists

---

## 10. Recommended First-Slice API Shape

The first contract likely needs explicit storage attachment objects rather than implicit fields on the allocation alone.

Example conceptual objects:
- `storage_attachment`
- `allocation_storage_attachment`

First required behaviors:
- list project storage objects
- attach a storage object to an allocation
- detach a storage object from an allocation
- list current attachments on allocation detail

This should be modeled separately from the existing generic storage upload/download endpoints.
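A hypothetical in-memory sketch of the four first-slice behaviors. Here `ProjectStorage` stands in for the attachable project storage object and `AllocationStorageAttachment` for `allocation_storage_attachment`; the field names, the `"attaching"` state, and the ID-based mount path are assumptions. Create-time attachment could call the same `attach` path, so both timings in section 5 share one code path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectStorage:
    id: str
    project_id: str
    name: str


@dataclass(frozen=True)
class AllocationStorageAttachment:
    allocation_id: str
    storage_id: str
    mount_path: str
    state: str  # hypothetical states, e.g. "attaching", "attached", "detaching"


class StorageAttachmentService:
    """Hypothetical sketch: list storage, attach, detach, list attachments."""

    def __init__(self) -> None:
        self._storage: dict[str, ProjectStorage] = {}
        self._attachments: dict[tuple[str, str], AllocationStorageAttachment] = {}

    def add_storage(self, storage: ProjectStorage) -> None:
        self._storage[storage.id] = storage

    def list_project_storage(self, project_id: str) -> list[ProjectStorage]:
        return [s for s in self._storage.values() if s.project_id == project_id]

    def attach(self, allocation_id: str, allocation_project_id: str,
               storage_id: str) -> AllocationStorageAttachment:
        storage = self._storage[storage_id]  # KeyError for unknown storage
        if storage.project_id != allocation_project_id:
            raise PermissionError("storage and allocation must be in the same project")
        key = (allocation_id, storage_id)
        # Idempotent: attaching twice returns the existing attachment object.
        if key not in self._attachments:
            self._attachments[key] = AllocationStorageAttachment(
                allocation_id, storage_id, f"/mnt/gpuaas/{storage_id}", "attaching"
            )
        return self._attachments[key]

    def detach(self, allocation_id: str, storage_id: str) -> None:
        # Removes only the attachment record; the storage object is untouched.
        self._attachments.pop((allocation_id, storage_id), None)

    def list_attachments(self, allocation_id: str) -> list[AllocationStorageAttachment]:
        return [a for a in self._attachments.values() if a.allocation_id == allocation_id]
```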

---

## 11. Out of Scope for First Slice

Not in the first storage attachment slice:
- snapshotting and cloning semantics
- cross-project sharing
- fine-grained ACLs inside the filesystem
- storage class tiering UX
- generic block-device expert mode
- guaranteeing Weka-specific semantics before backend validation is complete

---

## 12. Open Questions

- Can the first backend safely support many concurrent mounts from multiple active allocations?
- Should the first attachment mode be read/write for all attached allocations, or should read-only mounts also be supported immediately?
- What quota model should exist for project persistent storage?
- How should storage usage be metered and shown alongside allocation cost?
- Should detached storage remain warm and fast to reattach, or can it have a slower cold attach path?
- For self-managed RKE2 workloads, what is the first supported CSI-backed `StorageClass` story?
- If Weka is selected, what RWX/multi-attach guarantees are supported for many allocations and many Kubernetes app instances in the same project?
- Should Kubernetes storage be exposed only inside the RKE2 cluster as PVCs, or should GPUaaS also expose a direct external access path outside the cluster?
- What host/storage network should carry high-throughput storage traffic so storage does not accidentally depend on the default RKE2 pod overlay?
- Which storage configuration is platform-owned versus tenant-owned: CSI driver installation, `StorageClass` creation, PVC templates, Secrets, and mount paths?

### 12.1 RKE2-specific open item

The current self-managed RKE2 validation slice uses the default RKE2 network stack:
- CNI: `rke2-canal`
- pod network: `10.42.0.0/16`
- overlay backend: VXLAN
- Kubernetes `StorageClass`: none installed by GPUaaS today

Open item:
- do not expose Kubernetes storage attach UX until infra confirms the first backend, CSI driver support, multi-attach behavior, and external exposure boundary.

Expected direction:
- GPUaaS should model storage as a platform attachment object first, then translate it into RKE2 CSI configuration and PVCs for Kubernetes workloads.
- RKE2 app bundles should consume storage through PVCs and platform-defined mount paths, not raw host-path assumptions.
- If the backend is Weka or another high-performance shared filesystem, the data path should use the intended host/storage network rather than relying on the pod VXLAN overlay.
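As a sketch of the "platform object first, then PVC" direction, the function below turns a platform attachment into a Kubernetes `PersistentVolumeClaim` manifest. The `StorageClass` name, namespace, label key, and naming scheme are hypothetical (no `StorageClass` is installed today). It also assumes dynamic provisioning; binding an already-existing shared filesystem would more likely need a pre-created `PersistentVolume`, which belongs to the linked architecture doc.

```python
def pvc_for_attachment(storage_id: str, size_gib: int, storage_class: str,
                       supports_multi_attach: bool, namespace: str) -> dict:
    """Translate a platform storage attachment into a PVC manifest (hypothetical mapping)."""
    # Multi-attach capable backends map to RWX; everything else stays RWO.
    access_mode = "ReadWriteMany" if supports_multi_attach else "ReadWriteOnce"
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {
            "name": f"gpuaas-{storage_id}",
            "namespace": namespace,
            # Hypothetical label linking the PVC back to the platform object.
            "labels": {"gpuaas.example/storage-id": storage_id},
        },
        "spec": {
            "accessModes": [access_mode],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": f"{size_gib}Gi"}},
        },
    }
```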

Detailed RKE2 external storage, `StorageClass`, CSI ownership, PVC mapping, and
exposure boundaries are defined in
`doc/architecture/RKE2_External_Storage_Model_v1.md`.

---

## 13. Decision Summary

The first storage model should be:
- explicit
- project-scoped
- persistent
- attachable at create time and later
- multi-allocation capable only when backend capability says it is safe

And the product must clearly teach users:
- allocation-local disk is scratch
- attached project storage is durable

That is the minimum clean model before deeper storage backend work or Weka-specific validation begins.