# UX Implementation Spec v0.2 (Coding Baseline)

Purpose:
- Translate product UX journeys into implementation-ready UI specifications.
- Enforce API-first UX where every interaction maps to OpenAPI/AsyncAPI contracts.

Inputs:
- `doc/product/PRD.md`
- `doc/product/UX_Journeys.md`
- `doc/product/UX_Redesign_Implementation_Plan.md`
- `doc/product/Admin_Ops_and_User_IA_v1.md`
- `doc/product/Brand_Guidelines.md`
- `doc/product/ux-mocks/*`
- `doc/api/openapi.draft.yaml`
- `doc/api/asyncapi.draft.yaml`
- `doc/governance/UX_Contract_Gate.md`

Normative UX references for the current redesign:
- `doc/product/ux-mocks/product-redesign-v3/index.html`
- `doc/product/ux-mocks/product-redesign-v3/README.md`
- `doc/product/UX_Redesign_Implementation_Plan.md`
- `doc/product/V3_Admin_Workbench_Model_v1.md`

Interpretation rule:
- The route model, page-family taxonomy, shared component vocabulary, and
  interaction standards in the redesign plan are normative for new pages and
  major page refactors.
- The mock HTML is a layout and visual reference, not the behavioral source of
  truth.
- Where this spec and the redesign plan disagree, the redesign plan wins.
- For `/platform/*` family landings, admin workbenches, focused
  details, operation drawers, and acknowledgement/suppression behavior,
  `doc/product/V3_Admin_Workbench_Model_v1.md` is the governing page-family
  contract.

Mock-to-production copy boundary:
- Descriptions in production pages serve the user's task, not the reviewer's
  understanding of the mock.
- Remove permanent copy that restates the title or explains design rationale.
- Put non-obvious context behind an info/help affordance.
- Put first-time guidance in empty states, decision prose in confirmations, and
  longer procedures in runbooks or docs.
- Permanent section subtitles are a warning sign that the page is still talking
  to design reviewers instead of acting like an operator tool.

Page-growth rule:
- For any major new page, major page refactor, new top-level/local navigation
  area, or multi-step workflow, do not start broad implementation until the
  following are explicit:
  - role or persona using the surface;
  - scope of control (`self`, `project`, `tenant`, `node`, `fleet`,
    `platform`, or `public`);
  - primary user intent;
  - page family;
  - shell/local-navigation placement;
  - supporting mock/spec updates where the current design is not already
    settled.
- This rule exists to prevent repeated redesign loops as the product grows. If a
  surface does not fit cleanly into the current IA, update the IA and mock
  package first instead of implementing another one-off page.

Journey-readiness rule:
- Before implementing or declaring a user-facing workflow ready for UAT, define
  the persona, canonical entry point, prerequisite create/select path, success
  signal, cleanup path, blocked-state copy, and owning layer for each failure
  class.
- Missing prerequisites must be recoverable from the workflow or from an
  explicit linked product surface with return context preserved. A seeded
  happy-path object is not a substitute for a user-visible create/select path.
- If UAT discovers that users need a hidden script, SQL query, API-only call, or
  copied token to continue, treat that as a UX readiness failure and update this
  spec, the journey, or the page mock before broadening the implementation.

## 1. Route and Screen Inventory

### User-facing
- `/auth/login` (tabbed sign-in entry: Work account + Personal account)
- `/` (Control Room)
- `/workloads` (operational workload list)
- `/workloads/:workload_id` (workload detail + actions)
- `/infra/catalog` (SKU/catalog browse)
- `/infra/allocations` (raw allocations workbench)
- `/infra/allocations/:allocation_id` (allocation detail / redirect surface)
- `/launch` (full-page launch wizard)
- `/tasks/:task_id` (async task/provisioning detail)
- `/billing` (balance, usage table, CSV export)
- `/infra/storage` (list/upload/download/mkdir/rename/delete)
- `/access/projects` (project directory and lifecycle)
- `/access/projects/:project_id` (project detail and membership/access context)
- `/access/team`
- `/access/ssh-keys`
- `/access/api-keys`
- `/access/policies`
- `/developer/docs` (developer docs center)
- `/developer/downloads` (downloads and release artifacts)
- `/developer/api-docs/swagger` (embedded API reference)
- `/developer/api-docs/redoc` (embedded API reference)

### Admin-facing
- `/admin/overview`
- `/ops/attention`
- `/ops/telemetry`
- `/admin/users`
- `/admin/users/:user_id`
- `/admin/nodes`
- `/admin/skus`
- `/admin/os-images`
- `/admin/quotas`
- `/admin/maas`
- `/admin/allocations`
- `/admin/audit-logs`
- `/admin/payments/sessions`

Admin IA follow-up:
- `doc/product/Admin_Ops_and_User_IA_v1.md`

## 2. Contract Traceability (Critical Actions)

### Auth
- Work authorize start: `GET /api/v1/auth/oidc/authorize` (optional advisory `tenant_hint`/`identity_hint`)
- Work exchange: `POST /api/v1/auth/oidc/exchange`
- Personal sign-in: `POST /api/v1/auth/personal/login`
- Personal sign-up: `POST /api/v1/auth/personal/signup`
- Session renew: `POST /api/v1/auth/token/refresh`
- Logout: `POST /api/v1/auth/logout`

### User SSH Keys
- List keys: `GET /api/v1/ssh-keys`
- Add key: `POST /api/v1/ssh-keys`
- Remove key: `DELETE /api/v1/ssh-keys/{key_id}`

### Allocations
- Create: `POST /api/v1/allocations`
- List: `GET /api/v1/allocations`
- Detail: `GET /api/v1/allocations/{allocation_id}`
- Release: `POST /api/v1/allocations/{allocation_id}/release`
- Terminal token: `POST /api/v1/allocations/{allocation_id}/terminal-token`
- Terminal realtime: `WS /ws/terminal/{allocation_id}`

### Tasks / Async surfaces
- Task detail: contract surface required for `/tasks/:task_id`
- Allocation/event timeline pagination must use backend cursor-based pagination
  where available; UI must not synthesize fake client-only pages for server-side
  event streams
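
A minimal sketch of how a timeline could accumulate server-provided pages without inventing client-only pages. The `items`/`next_cursor` response shape and the helper names are assumptions for illustration; the real field names come from the OpenAPI contract.

```typescript
// Hypothetical page shape; `next_cursor: null` is assumed to mean "no more pages".
interface CursorPage<T> {
  items: T[];
  next_cursor: string | null;
}

interface TimelineState<T> {
  items: T[];
  nextCursor: string | null;
  hasMore: boolean;
}

function emptyTimeline<T>(): TimelineState<T> {
  return { items: [], nextCursor: null, hasMore: true };
}

// Appends exactly what the server returned and keeps the server's cursor.
// The UI never slices or re-pages the accumulated list on its own.
function appendPage<T>(state: TimelineState<T>, page: CursorPage<T>): TimelineState<T> {
  return {
    items: [...state.items, ...page.items],
    nextCursor: page.next_cursor,
    hasMore: page.next_cursor !== null,
  };
}
```

A "Load more" control would be shown only while `hasMore` is true and would pass `nextCursor` back to the backend unchanged.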

### Projects
- List: `GET /api/v1/projects`
- Create: `POST /api/v1/projects`
- Update: `PATCH /api/v1/projects/{project_id}`
- Delete: `DELETE /api/v1/projects/{project_id}`
- Set default: `POST /api/v1/projects/{project_id}/set-default`

Department UX gate:
- Department attribution is mandatory in the backend model, but the default
  department must remain invisible for small tenants unless there are multiple
  departments or an explicit department capability is enabled.
- Project creation must not require a department selection for the default
  one-department tenant path. Omitting `department_id` delegates assignment to
  the API default-department rule.
- Department selectors and filters appear only when multiple departments exist
  or department context capabilities are enabled.
- Department admin, budget, policy-inheritance, and approval controls remain
  hidden unless their specific capability ships. A visible department selector
  must not imply those capabilities are available.
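
The gate above could be expressed roughly as follows. This is a sketch under assumptions: `DepartmentContext`, its field names, and the `name` body field are hypothetical; only `department_id` and its omission rule come from this spec.

```typescript
// Hypothetical view of tenant department state available to the UI.
interface DepartmentContext {
  departmentCount: number;
  departmentCapabilityEnabled: boolean;
}

interface CreateProjectBody {
  name: string; // assumed field name
  department_id?: string;
}

// Selectors and filters appear only for multi-department tenants
// or when a department context capability is enabled.
function showDepartmentSelector(ctx: DepartmentContext): boolean {
  return ctx.departmentCount > 1 || ctx.departmentCapabilityEnabled;
}

// On the default one-department path, `department_id` is omitted so the
// API default-department rule assigns it.
function buildCreateProjectBody(
  name: string,
  ctx: DepartmentContext,
  selectedDepartmentId?: string,
): CreateProjectBody {
  if (showDepartmentSelector(ctx) && selectedDepartmentId !== undefined) {
    return { name, department_id: selectedDepartmentId };
  }
  return { name };
}
```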

### Billing + Payments
- Balance: `GET /api/v1/billing/balance`
- Usage: `GET /api/v1/billing/usage` with optional `allocation_id`, `from`, `to`, `sort`
- Usage CSV: `GET /api/v1/billing/usage/csv` with optional `allocation_id`, `from`, `to`, `sort`
- Checkout session: `POST /api/v1/payments/checkout-session`
- Customer portal: `POST /api/v1/payments/customer-portal-session`

### Storage
- `GET /api/v1/storage/list`
- `GET /api/v1/storage/download`
- `PUT /api/v1/storage/upload`
- `POST /api/v1/storage/mkdir`
- `POST /api/v1/storage/rename`
- `DELETE /api/v1/storage/delete`

### Admin
- Overview: `GET /api/v1/admin/overview`
- Ops (pre-coding contract required): `GET /api/v1/admin/ops/overview`
- Runbook catalog (pre-coding contract required): `GET /api/v1/admin/runbooks`, `GET /api/v1/admin/runbooks/{runbook_id}`
- Users list/detail/create/balance/refunds
- Nodes list/create/probe/delete
- Allocations list/force-release
- Audit logs list/export
- Payments sessions: `GET /api/v1/admin/payments/sessions`

Admin role boundary:
- `/admin/*` in current MVP is platform-admin scope (`users.role=admin`).
- Tenant-admin features must use tenant membership roles (`tenant_memberships.role`) and tenant-scoped routes/surfaces.
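
One way to keep the two role sources separate in route guards, as a sketch only. The `SessionUser` shape is an assumption that mirrors `users.role` and `tenant_memberships.role`; the helper names are hypothetical.

```typescript
// Assumed session shape; field names mirror the backend columns named above.
interface TenantMembership {
  tenant_id: string;
  role: string;
}

interface SessionUser {
  role: string; // users.role
  tenant_memberships: TenantMembership[];
}

// `/admin/*` is platform-admin scope in the current MVP.
function canAccessPlatformAdmin(user: SessionUser): boolean {
  return user.role === "admin";
}

// Tenant-admin surfaces read the membership role for the tenant in context,
// never the platform-level role.
function tenantRole(user: SessionUser, tenantId: string): string | null {
  const membership = user.tenant_memberships.find((m) => m.tenant_id === tenantId);
  return membership ? membership.role : null;
}
```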

## 3. State Matrix (Required for Every Screen)
- `loading`: skeleton/spinner with timeout fallback message.
- `empty`: explicit empty state copy + next action CTA.
- `error`: structured error display using `ErrorResponse.code/message`; include correlation id if present.
- `success`: action-confirmation toast/banner.
- `restricted`: role/tenant denial state for `403`.
- `rate_limited`: `429` with `Retry-After` countdown and retry action.
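
A sketch of how failed responses might be routed to the `restricted`, `rate_limited`, and `error` states. `ErrorResponse.code` and `ErrorResponse.message` come from this spec; the `correlation_id` field name and the `failureStateFor` helper are assumptions.

```typescript
// `correlation_id` is a hypothetical field name for the correlation id.
interface ErrorResponse {
  code: string;
  message: string;
  correlation_id?: string;
}

type FailureState =
  | { kind: "restricted" }
  | { kind: "rate_limited" }
  | { kind: "error"; code: string; message: string; correlationId: string | null };

// Maps an HTTP failure to exactly one state from the matrix.
function failureStateFor(status: number, body: ErrorResponse): FailureState {
  if (status === 403) return { kind: "restricted" };
  if (status === 429) return { kind: "rate_limited" };
  return {
    kind: "error",
    code: body.code,
    message: body.message,
    correlationId: body.correlation_id ?? null,
  };
}
```

Modeling the states as a discriminated union lets shared system components switch on `kind` and makes a missing state a compile-time error rather than a blank screen.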

## 4. Shared Interaction Contract

This section is normative for all list, detail, task, and navigation surfaces.
Do not introduce page-specific behavior where a shared rule exists here.

### 4.1 List and Workbench pages

Applies to:
- `/workloads`
- `/infra/allocations`
- `/admin/allocations`
- `/admin/users`
- `/admin/nodes`
- `/admin/audit-logs`
- billing/workload cost tables and any future evidence/config tables

Rules:
- Every list page uses a shared `WorkbenchPage` pattern:
  - `PageHeader`
  - summary cards when relevant
  - search/filter toolbar
  - segmented status filter
  - data table or mobile card fallback
- Workbench controls remain visible even when the current result set is empty.
  An empty result is rendered inside the content region, not as a full-page
  replacement that hides filters or view controls.
- Sort belongs on column headers, not in ad hoc page-specific menus.
- Filter controls belong in the toolbar above the table, not embedded in
  headers.
- Primary row action is visible in-row as a button, usually `Open`.
- Secondary row actions live in a shared `MoreActions` menu.
- Dangerous actions in `MoreActions` are visually separated and require
  confirmation.
- Pagination is server-driven where the backend supports cursors. The UI must
  preserve sort/filter/search state when paging.
- Empty, loading, restricted, and error states use shared system components.

State-result rules:
- `no active items` must not prevent the user from switching to `All`,
  `Provisioning`, `Released`, `Failed`, or other relevant statuses.
- Workbench pages that group resources by status (for example workloads) must
  still provide a stable way to inspect non-active items and history.
- Mutations that change a resource status must move the row predictably to the
  destination status grouping/filter. They must not appear to vanish without the
  UI explaining which status/view now contains the row.
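
As an illustration of the last rule, a workbench could compute a notice whenever a mutation moves a row out of the active filter. The helper name, the lowercase `all` filter value, and the notice copy are placeholders, not settled product copy.

```typescript
// Returns null when the row is still visible under the active status filter;
// otherwise returns a notice naming the status view that now contains the row.
function movedRowNotice(
  activeFilter: string,
  newStatus: string,
  resourceLabel: string,
): string | null {
  if (activeFilter === "all" || activeFilter === newStatus) {
    return null;
  }
  return `${resourceLabel} moved to "${newStatus}". Switch the status filter to view it.`;
}
```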

### 4.2 Data table behavior

Rules:
- Header labels are uppercase, compact, and muted.
- Sortable columns show one active sort at a time with direction indicator.
- Default sort is newest-first for time-based operational lists unless a screen
  explicitly documents otherwise.
- Status filters use a segmented control with stable labels. Do not rename the
  same status on different pages.
- Search input text and filter selections persist in the URL where practical so
  deep links and back-navigation restore state.
- Dropdown menus must render in a portal and must not be clipped by table
  scroll containers.
- Mobile layout must collapse to cards while preserving the same actions and
  statuses.
- Long identifiers may be visually truncated in the table, but every truncated
  value must have at least one full-detail path: copy action, tooltip, expanded
  cell, or detail drilldown.
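
A minimal sketch of persisting list state in the URL so deep links and back-navigation restore it. The query parameter names (`q`, `status`, `sort`, `cursor`) and the `ListViewState` shape are assumptions for illustration.

```typescript
interface ListViewState {
  search: string;
  status: string;
  sort: string;
  cursor: string | null;
}

// Serializes only non-empty values so default views keep clean URLs.
function toQueryString(state: ListViewState): string {
  const pairs: Array<[string, string]> = [];
  if (state.search) pairs.push(["q", state.search]);
  if (state.status) pairs.push(["status", state.status]);
  if (state.sort) pairs.push(["sort", state.sort]);
  if (state.cursor !== null) pairs.push(["cursor", state.cursor]);
  return pairs.map(([key, value]) => `${key}=${encodeURIComponent(value)}`).join("&");
}

// Restores state from a query string, falling back to the supplied defaults.
function fromQueryString(query: string, defaults: ListViewState): ListViewState {
  const state: ListViewState = { ...defaults };
  for (const part of query.replace(/^\?/, "").split("&")) {
    if (!part) continue;
    const idx = part.indexOf("=");
    const key = idx === -1 ? part : part.slice(0, idx);
    const value = idx === -1 ? "" : decodeURIComponent(part.slice(idx + 1));
    if (key === "q") state.search = value;
    else if (key === "status") state.status = value;
    else if (key === "sort") state.sort = value;
    else if (key === "cursor") state.cursor = value;
  }
  return state;
}
```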

Evidence/investigation rules:
- Audit and evidence tables are scan-oriented, but scanning alone is not
  sufficient. Each row must support a detail view via row click, `Open`, side
  panel, drawer, or detail page.
- The evidence detail view must surface full actor, target, correlation ID,
  timestamps, result, and raw metadata payload where allowed.
- Investigation pages should separate compact table columns from full-fidelity
  event detail instead of trying to fit all data into the row itself.

### 4.3 Detail pages

Rules:
- All resource detail pages use the shared `DetailPage` pattern:
  - `PageHeader`
  - facts row
  - tabs
  - two-column body where appropriate
- Resource ID and human-readable name must be copyable.
- Tabs must be ordered consistently for the same resource class.
- The "back" action returns to the previous list state when navigated from a
  list, including filters and pagination state.

### 4.4 Task and async pages

Rules:
- Long-running operations use `/tasks/:task_id` as the first-class status view.
- Transitional resources may link to their active task, but task progress
  should not be hidden inside detail pages.
- Timelines are newest-first for event history and use server-backed pagination
  for long event streams.
- Progress copy must be explicit about safe close/reopen behavior.

### 4.5 Navigation and state restoration

Rules:
- List-to-detail-to-list navigation must restore the prior list view state.
- Primary navigation labels, ordering, and grouping follow the page-family model
  in `UX_Redesign_Implementation_Plan.md`.
- Launch is a full-page flow without the sidebar and should resume at the last
  valid step when re-opened in the same session if draft state exists.

Project context rules:
- Workspace/project selection in the top context bar is not sufficient as the
  only project UX.
- Users need a first-class project management surface to create, rename, review,
  set default, and delete projects where policy allows.
- Project-switcher actions should deep-link into `/access/projects` rather than
  becoming an overloaded management UI inside the dropdown.

Developer/docs rules:
- Developer docs must be a first-class routed product surface, not a placeholder
  landing page.
- API reference pages should render inside the product shell or a shell-aligned
  embedded frame by default, not as a separate browser-window flow.
- External doc destinations may still exist, but they are secondary and should
  be clearly marked as leaving the product experience.
- Downloads, guides, API reference, examples, and app-developer onboarding
  should feel like one connected developer area.

### 4.6 Admin experience

Rules:
- Admin pages must follow workflow-family grouping, not a flat entity dump.
- Admin landing pages start with summary and action-required context before raw
  tables.
- Lifecycle, IAM, config, evidence, and finance pages may each have different
  density, but they must still follow the shared table/detail/task contracts.
- The admin-specific workflow model in `Admin_Ops_and_User_IA_v1.md` is
  normative for admin route refactors.

### 4.7 App shell extensibility

Rules:
- App-specific UI extends the shared product shell rather than replacing it.
- App flows inherit the same page header, toolbar, workbench, detail, task, and
  navigation contracts as core product pages unless a documented extension point
  allows a controlled variance.
- App-related developer surfaces should connect cleanly to docs, downloads, API
  references, service accounts, access credentials, and project context.
- Placeholder developer/app routes are product gaps and should be tracked as
  missing product surfaces, not left as acceptable steady-state behavior.
- App workload pages must follow the same state/filter/result behavior as core
  allocation pages. A failed or stopped app should move into a predictable
  status bucket, not disappear because the current page only renders active
  groups with no status-switch affordance.

## 5. Async UX Rules

### Provisioning lifecycle UI
- After an allocation is created, the UI immediately shows a row/card in the `requested` state.
- Poll `GET /api/v1/allocations/{allocation_id}` until terminal state or user leaves page.
- Show explicit transitions: `requested -> provisioning -> active`.
- If the backend exposes a task identifier, route the user to `/tasks/:task_id`
  while also surfacing the in-flight workload/allocation in workbench views.
- On `failed`, render `failure_reason` prominently with retry path.
- Card state spec:
  - `requested`: spinner + text "Request accepted, waiting for worker".
  - `provisioning`: spinner + text "Configuring node and SSH access".
  - `active`: green status chip + full action row enabled.
  - `failed`: red status chip + `failure_reason` + retry CTA.
- Long-running hint: if an allocation remains `requested` or `provisioning` for more than 90 seconds, show a non-blocking "Taking longer than expected" notice with a support hint.
- Navigation behavior: provisioning cards must persist across route changes and page reload via server state polling.

### Release lifecycle UI
- Release action returns accepted async state.
- UI updates to `releasing` immediately and continues polling.
- Final states: `released` or failed (`release_failed`, with reason and admin/support hint).
- `release_failed` UX is mandatory:
  - Card status chip `release_failed` with clear message from `release_failed_reason`.
  - Show explicit billing note: "Billing has stopped for this allocation."
  - Show retry action (`POST /api/v1/allocations/{allocation_id}/release`) and admin escalation guidance.

### Terminal UX
- The UI requests a terminal token immediately before opening the WebSocket.
- Token is single-use and short-lived; failed connect requires remint.
- UI handles `TerminalControl` frames (`session_ready`, `session_error`, `session_closed`) in addition to terminal stream output.
- Disconnect states: retry (remint token) and close.
- Allocation card action row layout is fixed: `Metrics` / `Console` / `Key` / `Release` icon buttons in one row.
- Active allocation views must show copy-ready SSH command:
  - `ssh <username_on_node>@<host> -p <port> -i <downloaded_key_path>`
- Key action should route to user key management (public keys) and SSH command help.
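
A sketch of two small pieces implied above: building the copy-ready SSH command and reacting to `TerminalControl` frames. The `SshTarget` shape, the UI state names, and the mapping of `session_error` to a disconnected state are assumptions; the frame names and command template come from this section.

```typescript
interface SshTarget {
  usernameOnNode: string;
  host: string;
  port: number;
  keyPath: string;
}

// Follows the template: ssh <username_on_node>@<host> -p <port> -i <downloaded_key_path>
function buildSshCommand(target: SshTarget): string {
  return `ssh ${target.usernameOnNode}@${target.host} -p ${target.port} -i ${target.keyPath}`;
}

type TerminalControlType = "session_ready" | "session_error" | "session_closed";
type TerminalUiState = "connected" | "disconnected" | "closed";

// `disconnected` is where the UI offers retry (which remints a token) and close.
function nextTerminalState(control: TerminalControlType): TerminalUiState {
  switch (control) {
    case "session_ready":
      return "connected";
    case "session_error":
      return "disconnected";
    case "session_closed":
      return "closed";
  }
}
```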

### Notification UX
- In-app notification payload requires: `id`, `occurred_at`, `severity`, `type`, `title`, `message`.
- Optional `action_url` maps to deep links (billing page, allocation detail, admin view).
- Notification shell requirements:
  - top-nav bell icon with unread badge.
  - panel lists latest notifications with severity color and timestamp.
  - clicking an item follows `action_url` when present.
- Notifications persist in the panel until dismissed or expired by retention policy.
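
The payload requirements could be enforced at the client boundary with a type guard. This is a sketch: all fields are treated as strings, which is an assumption, and the allowed `severity` and `type` values are not enumerated here because the AsyncAPI contract defines them.

```typescript
interface NotificationPayload {
  id: string;
  occurred_at: string;
  severity: string;
  type: string;
  title: string;
  message: string;
  action_url?: string;
}

const REQUIRED_FIELDS = ["id", "occurred_at", "severity", "type", "title", "message"] as const;

// Accepts a payload only when every required field is a string and
// `action_url`, if present, is also a string.
function isNotificationPayload(value: unknown): value is NotificationPayload {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const hasRequired = REQUIRED_FIELDS.every((field) => typeof record[field] === "string");
  const actionUrl = record["action_url"];
  return hasRequired && (actionUrl === undefined || typeof actionUrl === "string");
}
```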

### Billing + Payments UX
- Low balance must be persistent, not toast-only:
  - top-level banner on all authenticated pages when below threshold.
  - billing page balance card changes state (`ok`/`warning`/`depleted`).
- Stripe return handling must be explicit:
  - billing route (or `/billing/return`) handles return query params and renders success/cancel/failure status.
  - on return, refresh balance and usage queries.
- Provision modal must show cost estimate before confirmation:
  - `<node_count> x <gpus_per_node> x <price_per_gpu_per_hour> = <estimated_hourly_total>`.
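
A worked sketch of the estimate line. The function names are hypothetical and no currency symbol is assumed; production code would likely compute money in minor units rather than floating point.

```typescript
// node_count x gpus_per_node x price_per_gpu_per_hour
function estimateHourlyTotal(
  nodeCount: number,
  gpusPerNode: number,
  pricePerGpuPerHour: number,
): number {
  return nodeCount * gpusPerNode * pricePerGpuPerHour;
}

// Renders the formula exactly as the modal shows it.
function formatEstimate(
  nodeCount: number,
  gpusPerNode: number,
  pricePerGpuPerHour: number,
): string {
  const total = estimateHourlyTotal(nodeCount, gpusPerNode, pricePerGpuPerHour);
  return `${nodeCount} x ${gpusPerNode} x ${pricePerGpuPerHour.toFixed(2)} = ${total.toFixed(2)}`;
}
```

For example, 2 nodes with 8 GPUs each at 2.50 per GPU-hour renders as `2 x 8 x 2.50 = 40.00`.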

### Admin Dashboard UX
- Admin overview auto-refreshes every 5 seconds by default.
- Show a "last updated" relative time and expose a pause/resume auto-refresh toggle.
- Admin ops must expose contextual runbook links for degraded states using stable `runbook_id` mappings.
- Runbook panel design and metadata flow follow `doc/operations/Ops_Runbook_Architecture.md`.

## 6. Accessibility Baseline
- Keyboard-only navigation across all interactive controls.
- Modal focus trap and escape behavior.
- `aria-label` for icon-only actions.
- Error messages announced via accessible live region.
- Contrast meets WCAG AA for body text and actionable controls.

## 7. Responsive Baseline
- Breakpoints: mobile, tablet, desktop.
- Data tables provide mobile-safe condensed/card fallback.
- Critical actions remain reachable without hover interactions.
- Theme behavior follows the redesign token system and product direction; avoid
  page-specific theme overrides that break cross-page consistency.

## 8. UX Completion Checklist (Pre-coding Gate)
- [ ] Screen inventory finalized with ownership.
- [ ] Low-fidelity screen mocks completed under `doc/product/ux-mocks/`.
- [ ] Contract traceability complete for all critical actions.
- [ ] State matrix implemented in UI component spec.
- [ ] Async lifecycle UX documented and reviewed.
- [ ] Accessibility baseline accepted.
- [ ] Responsive baseline accepted.

## 9. Shared UX Foundation Packages (Build Before Feature Screens)

### API and Session Layer
- `packages/web/src/lib/api/client.ts`: typed contract client wrapper, auth header handling, correlation-id forwarding.
- `packages/web/src/lib/api/errors.ts`: map backend `ErrorResponse` -> UX-safe messages/actions.
- `packages/web/src/lib/session/*`: user/session/role state and protected route guard helpers.
- `packages/web/src/lib/api/rateLimit.ts`: shared `429` handling that reads `Retry-After`, exposes countdown state, and standard retry helpers.
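
A sketch of what the `Retry-After` parsing in `rateLimit.ts` might look like. Per the HTTP specification the header carries either delay-seconds or an HTTP-date; the function names and the zero fallback for unparseable values are assumptions.

```typescript
// Returns the wait in whole seconds; 0 when the header is missing or unparseable.
function parseRetryAfterSeconds(header: string | null, nowMs: number): number {
  if (header === null) return 0;
  const trimmed = header.trim();
  // Form 1: delay-seconds, a non-negative integer.
  if (/^\d+$/.test(trimmed)) return parseInt(trimmed, 10);
  // Form 2: HTTP-date, converted to a delay relative to `nowMs`.
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return 0;
  return Math.max(0, Math.ceil((dateMs - nowMs) / 1000));
}

// Countdown value for the UI, derived from when the 429 was received.
function remainingSeconds(retryAfterSeconds: number, receivedAtMs: number, nowMs: number): number {
  const elapsed = Math.floor((nowMs - receivedAtMs) / 1000);
  return Math.max(0, retryAfterSeconds - elapsed);
}
```

Deriving the countdown from timestamps rather than decrementing a counter keeps it accurate when the tab is backgrounded and timers are throttled.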

### Query and Data Fetching Layer
- `packages/web/src/lib/query/keys.ts`: stable query key conventions.
- `packages/web/src/lib/query/defaults.ts`: retry/stale/caching defaults by data class.
- `packages/web/src/lib/query/pagination.ts`: cursor/page_size helpers.

### Shared UI System Layer
- `packages/web/src/components/system/LoadingState.tsx`
- `packages/web/src/components/system/EmptyState.tsx`
- `packages/web/src/components/system/ErrorState.tsx`
- `packages/web/src/components/system/RestrictedState.tsx`
- `packages/web/src/components/system/RateLimitedState.tsx`
- `packages/web/src/components/system/ConfirmActionModal.tsx`
- `packages/web/src/components/system/AlertModal.tsx` (severity variants: `info`/`success`/`warning`/`error`)
- `packages/web/src/components/system/DeletionModal.tsx` (destructive confirm with optional reason input)
- `packages/web/src/components/system/CursorTable.tsx`
- `packages/web/src/components/system/NotificationBell.tsx`
- `packages/web/src/components/system/NotificationPanel.tsx`
- `packages/web/src/components/product/PageHeader.tsx`
- `packages/web/src/components/product/WorkbenchPage.tsx`
- `packages/web/src/components/product/DetailPage.tsx`
- `packages/web/src/components/product/DataTable.tsx`
- `packages/web/src/components/product/MoreActions.tsx`
- `packages/web/src/components/product/StatusChip.tsx`
- `packages/web/src/components/product/Timeline.tsx`

### Accessibility Utilities
- `packages/web/src/components/a11y/FocusTrap.tsx`
- `packages/web/src/components/a11y/LiveRegion.tsx`
- `packages/web/src/components/a11y/useKeyboardShortcut.ts`

### Styling and Tokens
- `packages/web/src/styles/tokens.css` (or equivalent token source)
- `packages/web/src/styles/theme.css`

## 10. Vertical Slice UX Delivery Order (With API)

1. Auth + Profile slice
- UX: tabbed sign-in (`Work account` vs `Personal account`), redirect/exchange/logout, protected layout, expired-session UX.
- API: auth endpoints + `GET /users/me`.
  - Include first-run onboarding redirect when balance is zero and no allocations exist.

2. Control Room + Catalog + Workloads read slice
- UX: dashboard, SKU discovery, operational workload listing/detail.
- API: `GET /skus`, `GET /nodes`, allocation read endpoints.
  - Include explicit sold-out SKU card state (`Unavailable`, disabled provision CTA).

3. Launch + Tasks + Provision/Release + Terminal slice
- UX: stepped launch wizard, provisioning task progression, release progression,
  terminal connect/reconnect states.
- API: allocation create/release + terminal token + terminal WS.

4. Billing + Payments slice
- UX: balance/usage, CSV export, checkout/portal redirect outcomes.
- API: billing + payments endpoints.

5. Admin slice
- UX: users/nodes/allocations/audit admin pages with filters/actions.
- API: admin endpoints.

6. Storage slice
- UX: file explorer CRUD and safety/confirmation patterns.
- API: storage endpoints.

Delivery rule:
- For multi-service screens, implement panel-level degraded states and progressive enablement instead of blocking the whole page.
