# Storage Provider Capability Model v1

## Purpose

Storage starts with WEKA, but the product/API model must stay provider-neutral so VAST, DDN, NVMe pools, S3-compatible object stores, or future
backends can be added without redesigning the v3 Storage UI.

The control plane exposes capability metadata, not provider credentials or raw backend internals.

Storage ownership, sharing, and IAM are defined in
`doc/architecture/Storage_Sharing_and_IAM_Model_v1.md`. In short: storage is
project-owned by default, cross-project sharing is explicit grant state, and
provider credentials are derived from GPUaaS IAM rather than making WEKA the
primary user directory.

Provider integration lessons from the prior Scality IAM project are documented
in `doc/architecture/Storage_IAM_External_Reference_Lessons_v1.md`.

The current WEKA capability assessment is documented in
`doc/architecture/Storage_WEKA_Capability_Assessment_v1.md`.

RKE2-specific CSI, `StorageClass`, PVC, data-path, and exposure boundaries are
defined in `doc/architecture/RKE2_External_Storage_Model_v1.md`.

## Capability Shape

Every storage-backed bucket/read model may include a `provider` object:

| Field | Meaning |
|---|---|
| `backend_type` | Stable provider class: `weka`, `vast`, `ddn`, `nvme_pool`, `s3_compatible`, `local_dev`, or `unknown`. |
| `display_name` | User-safe provider label. Examples: `WEKA`, `VAST`, `Local dev`. |
| `performance_tier` | Product tier: `standard`, `performance`, `capacity`, `archive`, or `unknown`. |
| `access_protocols` | Supported access surfaces such as `posix`, `wekafs`, `nfs`, `s3`, `smb`, `csi`. |
| `mount_modes` | Supported mount modes: `read_only`, `read_write`, `multi_writer`. |
| `multi_attach` | Whether multiple workloads may attach concurrently. |
| `encryption` | Whether the backend supports encryption at rest for this class. |
| `kms_managed` | Whether customer/project KMS integration is supported. |
| `snapshots` | Whether snapshots are supported. |
| `versioning` | Whether object/file versioning is supported. |
| `retention` | Whether retention policy enforcement is supported. |
| `quotas` | Whether quota accounting/enforcement is supported. |
| `region_constraints` | User-safe region hints. No IPs, cluster IDs, or backend hostnames. |
| `fabric_constraints` | User-safe fabric hints such as `ethernet`, `roce`, `infiniband`, or `same_rack_preferred`. |
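
As a rough illustration, the table above could map onto a typed read-model object like the one below. This is a minimal sketch, not the actual service type: the class name `ProviderCapabilities`, the choice of plain booleans for the "Whether ..." fields, and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

BACKEND_TYPES = frozenset(
    {"weka", "vast", "ddn", "nvme_pool", "s3_compatible", "local_dev", "unknown"}
)
PERFORMANCE_TIERS = frozenset(
    {"standard", "performance", "capacity", "archive", "unknown"}
)
MOUNT_MODES = frozenset({"read_only", "read_write", "multi_writer"})


@dataclass(frozen=True)
class ProviderCapabilities:
    """User-safe capability hints; field names follow the capability table."""

    backend_type: str = "unknown"
    display_name: str = "Unknown"
    performance_tier: str = "unknown"
    access_protocols: Tuple[str, ...] = ()
    mount_modes: Tuple[str, ...] = ()
    # Assumption: the "Whether ..." fields are modelled as plain booleans.
    multi_attach: bool = False
    encryption: bool = False
    kms_managed: bool = False
    snapshots: bool = False
    versioning: bool = False
    retention: bool = False
    quotas: bool = False
    region_constraints: Tuple[str, ...] = ()
    fabric_constraints: Tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Reject values outside the stable enumerations listed in the table.
        if self.backend_type not in BACKEND_TYPES:
            raise ValueError(f"unknown backend_type: {self.backend_type!r}")
        if self.performance_tier not in PERFORMANCE_TIERS:
            raise ValueError(f"unknown performance_tier: {self.performance_tier!r}")
        bad_modes = set(self.mount_modes) - MOUNT_MODES
        if bad_modes:
            raise ValueError(f"unknown mount_modes: {sorted(bad_modes)}")


# Illustrative values only; fields not set here simply keep their defaults.
example = ProviderCapabilities(
    backend_type="weka",
    display_name="WEKA",
    performance_tier="performance",
    access_protocols=("posix", "wekafs", "csi"),
    mount_modes=("read_only", "read_write", "multi_writer"),
    multi_attach=True,
    snapshots=True,
    quotas=True,
    fabric_constraints=("same_rack_preferred",),
)
```

Validating the enumerated fields at construction time keeps unexpected vendor strings out of the read model; unknown backends should be reported as `unknown` rather than passed through.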

## Initial Provider Profiles

| Backend | Initial posture |
|---|---|
| WEKA | Performance dual-protocol backend: WEKAFS/POSIX over WEKA's client data path for training and app mounts, plus S3 when bucket/object workflows are enabled. Multi-attach, quota, and snapshot capable; KMS support depends on deployment integration. |
| VAST | Capacity/performance backend. NFS/S3/SMB capable, multi-attach capable, and quota/snapshot/retention capable; KMS support depends on deployment integration. |
| Local dev | Filesystem-backed development adapter. Not representative of production performance, KMS, retention, or multi-attach semantics. |

## API Rules

- Do not expose provider endpoints, cluster names, IPs, credentials, access keys, mount secrets, or internal volume IDs.
- Treat capability fields as hints for UI and policy decisions, not as proof that runtime attachment has completed.
- Runtime mount state belongs to workload/storage mount read models.
- Provider-specific implementation detail belongs behind service interfaces in `packages/services/storage`, not in UI copy.
- Provider IAM policy material should be generated from GPUaaS storage grants
  and service-account/user intent. UI/read models receive only the safe summary.
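
One way to enforce the first rule is an allowlist: copy only the documented capability fields into the read model and drop everything else. The sketch below assumes a hypothetical internal provider record represented as a dict; the function name and the internal keys in the example (`endpoint`, `access_key`, `volume_id`) are illustrative, not real field names from the service.

```python
from typing import Any, Dict, Mapping

# The only fields a UI/read model may receive, taken from the capability table.
SAFE_PROVIDER_FIELDS = (
    "backend_type", "display_name", "performance_tier", "access_protocols",
    "mount_modes", "multi_attach", "encryption", "kms_managed", "snapshots",
    "versioning", "retention", "quotas", "region_constraints",
    "fabric_constraints",
)


def to_safe_provider_summary(internal: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy containing only allowlisted capability fields."""
    return {name: internal[name] for name in SAFE_PROVIDER_FIELDS if name in internal}


# Hypothetical internal record; the non-capability keys are made up for the example.
internal_record = {
    "backend_type": "weka",
    "display_name": "WEKA",
    "performance_tier": "performance",
    "endpoint": "storage.example.invalid:14000",
    "access_key": "not-a-real-key",
    "volume_id": "vol-0000",
}

summary = to_safe_provider_summary(internal_record)
```

An allowlist fails safe: a newly added internal field stays hidden until someone deliberately adds it to `SAFE_PROVIDER_FIELDS`, whereas a denylist would leak it by default.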

## Launch UX Rules

- Inline bucket creation should ask for intent: purpose, capacity, encryption, lifecycle, and access.
- Provider selection can remain implicit until multiple provider classes are production-ready.
- When provider selection becomes visible, show product-level tiers and capabilities, not raw vendor internals.

## Provider Placement Model

Provider backend is a placement decision, not a global platform constant.

A single region may eventually contain more than one storage provider or more
than one provider instance, for example WEKA for performance POSIX workloads,
VAST for capacity/object workflows, and NVMe-local pools for node-affine
scratch. The product model must support this without changing the v3 Storage
surface.

Long-term placement inputs:

- region and availability/fabric zone
- storage class or product tier
- requested protocol: `wekafs`, `posix`, `csi`, `s3`, `nfs`, `smb`
- purpose: workspace, dataset, checkpoint, artifact, generic
- requested quota and current provider capacity
- access mode and write policy
- node/app placement constraints
- tenant/project policy and entitlements
- provider health and drift status

The backend selected for a storage object must be persisted on the storage
record. Attachments inherit that provider unless an explicit migration workflow
moves the storage object.

Environment variables such as `GPUAAS_STORAGE_PROVIDER_BACKEND` are only
development or single-provider fallback defaults. They are acceptable for kind
and early platform-control validation, but they are not the production
selection model.

Future implementation should introduce an explicit storage placement service or
table, similar in spirit to compute placement, that returns a provider
assignment and user-safe capability summary for each create request.
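
A placement call of this kind might look roughly like the following. This is a sketch under stated assumptions: the type names, the inventory shape, and the tie-break (prefer the healthy candidate with the most free capacity) are all hypothetical, and it covers only a subset of the placement inputs listed above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class ProviderInstance:
    instance_id: str  # operator-safe handle; never returned to users
    backend_type: str
    region: str
    performance_tier: str
    access_protocols: FrozenSet[str]
    free_bytes: int
    healthy: bool


@dataclass(frozen=True)
class CreateRequest:
    region: str
    performance_tier: str
    protocol: str
    quota_bytes: int


def place(
    request: CreateRequest, inventory: Sequence[ProviderInstance]
) -> Optional[ProviderInstance]:
    """Pick a provider instance for a create request, or None if nothing fits."""
    candidates = [
        p
        for p in inventory
        if p.healthy
        and p.region == request.region
        and p.performance_tier == request.performance_tier
        and request.protocol in p.access_protocols
        and p.free_bytes >= request.quota_bytes
    ]
    # Assumption: among eligible instances, prefer the one with the most free space.
    return max(candidates, key=lambda p: p.free_bytes, default=None)
```

The selected instance would then be persisted on the storage record, so later attachments never need to re-run placement.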

### Current Implementation Boundary

The current v3 bucket create path already treats provider placement as a
per-bucket assignment:

- `platform_storage_buckets.provider_backend` stores the selected backend class.
- `platform_storage_buckets.provider_filesystem` stores the selected filesystem or
  namespace when the backend needs one, for example WEKA `gpuaas-kind-fs` or
  `gpuaas-fs`.
- `platform_storage_buckets.provider_instance_id` stores an operator-safe provider
  instance handle used by control-plane reconciliation. It is not returned to
  users.
- Storage attachments inherit provider assignment from the bucket record, not
  from a global runtime default.
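
The inheritance rule in the last bullet can be sketched as follows. The three `provider_*` fields mirror the column names above; the `BucketRecord` and `Attachment` types, the `bucket_id` and `workload_id` fields, and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BucketRecord:
    bucket_id: str
    provider_backend: str
    provider_filesystem: Optional[str]
    provider_instance_id: str


@dataclass(frozen=True)
class Attachment:
    bucket_id: str
    workload_id: str
    provider_backend: str
    provider_filesystem: Optional[str]
    provider_instance_id: str


def new_attachment(bucket: BucketRecord, workload_id: str) -> Attachment:
    """Build an attachment that copies the provider assignment from its bucket.

    The function deliberately does not read GPUAAS_STORAGE_PROVIDER_BACKEND or
    any other runtime default: the bucket record is the source of truth.
    """
    return Attachment(
        bucket_id=bucket.bucket_id,
        workload_id=workload_id,
        provider_backend=bucket.provider_backend,
        provider_filesystem=bucket.provider_filesystem,
        provider_instance_id=bucket.provider_instance_id,
    )
```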

In local/kind and early single-provider environments, the assignment resolver
may still use environment defaults:

```bash
GPUAAS_STORAGE_PROVIDER_BACKEND=weka
GPUAAS_STORAGE_WEKA_FILESYSTEM=gpuaas-kind-fs
GPUAAS_STORAGE_PROVIDER_INSTANCE_ID=weka-kind
```

For multi-provider development, use ordered assignment rules instead of a
single backend default:

```bash
GPUAAS_STORAGE_PROVIDER_ASSIGNMENTS='purpose=dataset,protocol=wekafs,backend=weka,filesystem=gpuaas-kind-fs,instance=weka-kind;purpose=generic,backend=vast,instance=vast-capacity'
```

These env rules are still a bootstrap mechanism. Production should move the
same assignment shape into database-backed provider inventory and capacity
placement when multiple regions/providers are operational.
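
For reference, the `GPUAAS_STORAGE_PROVIDER_ASSIGNMENTS` format shown above could be parsed and evaluated along these lines. This is a sketch, not the actual resolver: it assumes rules are separated by `;`, pairs by `,`, that `purpose` and `protocol` are the match keys, that a rule omitting a match key matches any value, and that the first matching rule wins.

```python
from typing import Dict, List, Optional

MATCH_KEYS = ("purpose", "protocol")


def parse_assignments(raw: str) -> List[Dict[str, str]]:
    """Parse 'k=v,k=v;k=v,...' into an ordered list of rule dicts."""
    rules: List[Dict[str, str]] = []
    for chunk in raw.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing ';'
        rule: Dict[str, str] = {}
        for pair in chunk.split(","):
            key, sep, value = pair.partition("=")
            if not sep or not key.strip() or not value.strip():
                raise ValueError(f"malformed pair {pair!r} in rule {chunk!r}")
            rule[key.strip()] = value.strip()
        if "backend" not in rule:
            raise ValueError(f"rule {chunk!r} does not name a backend")
        rules.append(rule)
    return rules


def resolve(
    rules: List[Dict[str, str]], purpose: str, protocol: str
) -> Optional[Dict[str, str]]:
    """Return the assignment of the first rule matching the request, else None."""
    wanted = {"purpose": purpose, "protocol": protocol}
    for rule in rules:
        # A match key missing from the rule is treated as a wildcard (assumption).
        if all(rule.get(key) in (None, wanted[key]) for key in MATCH_KEYS):
            return {k: v for k, v in rule.items() if k not in MATCH_KEYS}
    return None


RAW = (
    "purpose=dataset,protocol=wekafs,backend=weka,"
    "filesystem=gpuaas-kind-fs,instance=weka-kind;"
    "purpose=generic,backend=vast,instance=vast-capacity"
)
rules = parse_assignments(RAW)
```

Because evaluation is ordered, more specific rules should come first and any catch-all rule last. A request that matches no rule returns `None` here, leaving the caller to decide whether to fall back to the single-backend default or reject the create.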

## WEKA Dual-Protocol Posture

The first WEKA production integration should support two product intents:

| Intent | Primary protocol | Product surface |
|---|---|---|
| Training, notebooks, app runtimes, POSIX-heavy workloads | WEKAFS/POSIX | Storage mounts, workload/app launch, Kubernetes PV/PVC where applicable |
| Bucket/object workflows, direct external clients, SDK access | S3 | Buckets, direct credentials, object-style app integrations |

WEKAFS/POSIX is the primary high-performance workload data path and the first
implementation target. S3 stays capability-gated for WEKA until S3 protocol
hosts/containers are available; it should be enabled and validated once
bucket/object semantics, external S3 clients, or app integrations that expect
S3 are required.

Operationally this means:

- GPUaaS storage objects compile into project/workload mount intent.
- Runtime delivery is a WEKAFS mount through the WEKA client/CSI path or a
  host-prepared mount exposed to the workload, depending on the final infra
  design.
- The first WEKA production provider profile should advertise S3 as unavailable
  until provider health reports `active=true` and at least one S3 host.
- WEKA's DPDK-backed data path is an infrastructure concern. UI/read models
  should show provider-neutral capability hints such as the `performance` tier,
  the `multi_writer` mount mode, and the `same_rack_preferred` fabric hint, not
  raw DPDK configuration.
- S3/IAM/STS is part of the WEKA integration for bucket/object workflows, but
  it is independent from POSIX-only workload mounts.
- Access enforcement for WEKAFS mounts should be owned by GPUaaS placement,
  mount generation, project grants, service-account/workload identity, and
  filesystem/prefix layout. Provider S3 policies are not the primary enforcement
  boundary for POSIX mounts.
- Access enforcement for S3 buckets should use GPUaaS grants compiled into WEKA
  S3 IAM/session/bucket policy, with short-lived credentials for humans and
  scoped service accounts for workloads/automation.
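
The S3 gating rule in the list above (advertise S3 only once provider health reports `active=true` and at least one S3 host) can be expressed as a small filter over the advertised protocols. The function and parameter names are hypothetical, and how the health report is actually obtained is out of scope for this sketch.

```python
from typing import Iterable, List


def advertised_protocols(
    supported: Iterable[str], s3_active: bool, s3_host_count: int
) -> List[str]:
    """Drop 's3' from the advertised protocols unless the S3 gate is open."""
    s3_ready = s3_active and s3_host_count >= 1
    return [p for p in supported if p != "s3" or s3_ready]


weka_supported = ["wekafs", "posix", "csi", "s3"]

# S3 is enabled in config but no S3 protocol host exists yet: still hidden.
gated = advertised_protocols(weka_supported, s3_active=True, s3_host_count=0)

# Health reports active=true and two S3 hosts: S3 is advertised.
ungated = advertised_protocols(weka_supported, s3_active=True, s3_host_count=2)
```

Only `s3` is affected by the gate, so the WEKAFS/POSIX entries stay advertised whatever the S3 health state is, in line with S3/IAM/STS being independent of POSIX-only workload mounts.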

## Ownership

Storage/Network owns this model because correctness depends on fabric topology, mount semantics, KMS, quota enforcement, and provider operations.
Frontend can render these fields generically; backend/storage infra owns their interpretation.