# Admin IA v2 (Workflow-Oriented Admin Product Model)

Purpose:
- Redesign the admin-facing surfaces around operator workflows, not raw entity
  dumps.
- Make admin usable as a product for triage, lifecycle management,
  configuration, evidence, finance, and IAM.
- Reduce time-to-triage, time-to-safe-action, and time-to-root-cause across the
  admin experience.

Inputs:
- `packages/web/app/admin/**`
- `packages/web/app/ops/**`
- `doc/product/UX_Implementation_Spec.md`
- `doc/product/UX_Redesign_Implementation_Plan.md`
- `doc/product/IAM_UX_Information_Architecture_v1.md`
- `doc/product/Product_Surface_IA_and_Role_Model_v1.md`
- `doc/operations/Ops_Runbook_Architecture.md`
- `doc/operations/Observability_Baseline.md`
- `doc/api/openapi.draft.yaml`

## 1. Problem Statement

Current admin is still largely an entity dump:
- pages are organized as tables per backend entity, not by operator goal
- pages expose a lot of data but prioritize it weakly
- mutation controls are often detached from the decision context they affect
- navigation assumes the operator already knows the data model
- cross-entity workflows require too much clicking and too much prior system
  knowledge

This was acceptable for bootstrapping internal capability. It is not acceptable
for a usable platform product.

## 2. Design Principles

1. Summary first, action second, debug third.
2. Every admin screen must answer one primary question before exposing low-level detail.
3. Mutations must sit next to the object they change.
4. Canonical role and membership semantics remain unchanged; only presentation and workflow improve.
5. Observability pivots must preserve `correlation_id` and trace-based debugging.
6. Navigation groups must reflect operator mental models: triage, lifecycle,
   config, evidence, finance, IAM.
7. Tables are a tool inside workflows, not the workflow itself.
8. Admin landing pages must prioritize "what needs action now" over full
   inventory visibility.

## 3. Platform Product Model

Canonical boundary:
- per `Product_Surface_IA_and_Role_Model_v1.md`, `Platform` is the
  authenticated top-level shell group.
- inside `Platform`, `Ops`, `Lifecycle`, `Config`, `Evidence`, `Finance`,
  and `IAM` are peer local-navigation families.
- this document focuses on the privileged operator/platform side of that model.

### 3.1 Platform Families

Platform is split into workflow families:

1. Operate / Triage
   - Goal: identify what needs intervention now
   - Routes: `/ops/attention`, `/ops/telemetry`, `/platform/overview`

2. Lifecycle
   - Goal: manage node, allocation, and decommission/onboarding workflows safely
   - Routes: `/admin/nodes`, `/admin/nodes/onboardings/:id`,
     `/admin/nodes/decommissions/:id`, `/admin/allocations`

3. Config
   - Goal: manage durable platform configuration
   - Routes: `/admin/skus`, `/admin/os-images`, `/admin/quotas`, `/admin/maas`

4. Evidence
   - Goal: investigate what happened and why
   - Routes: `/admin/audit-logs`

5. Finance
   - Goal: inspect money movement and intervene safely
   - Routes: `/admin/payments/sessions`, finance views inside `/admin/users` and
     `/admin/allocations`

6. IAM
   - Goal: understand who a user is, what access they have, and what can be
     changed safely
   - Routes: `/admin/users`, `/admin/users/:user_id`

Each family should have its own page grammar, summary blocks, and action model.

## 4. Platform Local Navigation Model

### Primary admin groups

```
OPERATE
  Dashboard
  Attention
  Telemetry

LIFECYCLE
  Nodes
  Allocations

CONFIG
  SKUs
  OS Images
  Quotas
  MAAS

EVIDENCE
  Audit Logs

FINANCE
  Payment Sessions

IAM
  Users
```

Rules:
- Platform local nav must not appear as a flat list of entities.
- When inside a platform family, the local page copy and actions should reinforce
  the family goal.
- Cross-links between families should be explicit. Example: a node detail can
  link to evidence, allocation admin, or user detail where relevant.
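
One way to keep the local nav grouped rather than flat is to drive it from a single family-keyed config. The sketch below is illustrative only: the type names, `PLATFORM_NAV`, and `familyForPath` are hypothetical, and mapping "Dashboard" to `/platform/overview` is an assumption. The labels and routes themselves come from sections 3.1 and 4.

```typescript
// Hypothetical nav config; type and constant names are illustrative only.
type FamilyId = "operate" | "lifecycle" | "config" | "evidence" | "finance" | "iam";

interface NavItem {
  label: string;
  href: string;
}

interface NavFamily {
  id: FamilyId;
  heading: string;
  items: NavItem[];
}

const PLATFORM_NAV: NavFamily[] = [
  {
    id: "operate",
    heading: "OPERATE",
    items: [
      // Assumption: "Dashboard" maps to /platform/overview.
      { label: "Dashboard", href: "/platform/overview" },
      { label: "Attention", href: "/ops/attention" },
      { label: "Telemetry", href: "/ops/telemetry" },
    ],
  },
  {
    id: "lifecycle",
    heading: "LIFECYCLE",
    items: [
      { label: "Nodes", href: "/admin/nodes" },
      { label: "Allocations", href: "/admin/allocations" },
    ],
  },
  {
    id: "config",
    heading: "CONFIG",
    items: [
      { label: "SKUs", href: "/admin/skus" },
      { label: "OS Images", href: "/admin/os-images" },
      { label: "Quotas", href: "/admin/quotas" },
      { label: "MAAS", href: "/admin/maas" },
    ],
  },
  { id: "evidence", heading: "EVIDENCE", items: [{ label: "Audit Logs", href: "/admin/audit-logs" }] },
  { id: "finance", heading: "FINANCE", items: [{ label: "Payment Sessions", href: "/admin/payments/sessions" }] },
  { id: "iam", heading: "IAM", items: [{ label: "Users", href: "/admin/users" }] },
];

// Resolve the family that owns a path, so nested routes such as
// /admin/nodes/onboardings/:id highlight the right group.
function familyForPath(path: string): NavFamily | undefined {
  for (const family of PLATFORM_NAV) {
    for (const item of family.items) {
      const isExact = path === item.href;
      const isNested = path.indexOf(item.href + "/") === 0;
      if (isExact || isNested) {
        return family;
      }
    }
  }
  return undefined;
}
```

With this shape, a detail route resolves to its family without a separate lookup table, which also gives page copy a single place to read the family from.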

## 5. Operate / Triage IA

### 5.1 Primary operator questions
1. Is the platform healthy right now?
2. Is there anything requiring intervention now?
3. Where do I click next to investigate?

### 5.2 Target page structure

### Section A: Decision Header
- freshness badge
- pause/resume refresh
- active incident count
- highest-severity signal summary

### Section B: Action Required
Only signals that currently need operator attention:
- DLQ backlog present
- auth failure spike
- worker failure spike
- release failure spike
- outbox relay degraded
- node metrics degraded

Each card must include:
- short diagnosis
- impact statement
- primary action link
- related runbook link
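
The card requirements above can be expressed as a small contract. This is a minimal sketch, not the actual data model: the field names, the `active` flag, and the numeric `severity` (higher means more severe) are assumptions. Only the signal kinds and the four required card fields come from this section.

```typescript
// Signal kinds mirror the Action Required list; field names are hypothetical.
type SignalKind =
  | "dlq_backlog"
  | "auth_failure_spike"
  | "worker_failure_spike"
  | "release_failure_spike"
  | "outbox_relay_degraded"
  | "node_metrics_degraded";

interface AttentionCard {
  kind: SignalKind;
  diagnosis: string;
  impact: string;
  primaryActionHref: string;
  runbookHref: string;
}

interface Signal extends AttentionCard {
  active: boolean; // true only while operator attention is needed
  severity: number; // assumption: higher number means more severe
}

// Keep only signals that currently need attention, most severe first.
function actionRequiredCards(signals: Signal[]): AttentionCard[] {
  return signals
    .filter((s) => s.active)
    .sort((a, b) => b.severity - a.severity)
    .map(({ kind, diagnosis, impact, primaryActionHref, runbookHref }) => ({
      kind,
      diagnosis,
      impact,
      primaryActionHref,
      runbookHref,
    }));
}

// Report which required card fields are blank, so incomplete cards can be
// caught before they render.
function missingCardFields(card: AttentionCard): string[] {
  const required: Array<keyof AttentionCard> = [
    "diagnosis",
    "impact",
    "primaryActionHref",
    "runbookHref",
  ];
  return required.filter((field) => card[field].trim() === "");
}
```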

### Section C: Health Summary
Compact tiles only:
- queue health
- API health
- worker health
- auth health
- node fleet health

These tiles are not the place for raw query text or long explanatory copy.

### Section D: Investigation Tools
- correlation / trace search
- observability shortcuts
- incident query pack
- recent log pivots

This section should be explicitly lower on the page than active incident signals.

### Section E: Fleet and Sample Detail
- node metrics sample
- control-plane counters
- supporting diagnostics

These remain useful, but they are secondary and should not dominate initial scan.

### 5.3 New required summary blocks

### Auth Health
Add a first-class auth block to `/admin/ops`:
- auth exchange failures (5m)
- token refresh failures (5m)
- 401/403 rate by critical auth routes
- top auth rejection code in current window

This is required because login failures often surface as `WARN` + `401`, not `ERROR` + `5xx`.
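
To illustrate why the block must key on status codes rather than log level, here is a sketch of a 5-minute rejection summary. Everything in it is hypothetical: the `RequestRecord` shape, the function name, and the example route paths. Per section 12, this logic would more naturally live in a backend summary than in the UI.

```typescript
// Hypothetical request record; the real log/metric schema is not defined here.
interface RequestRecord {
  route: string;
  status: number;
  level: "INFO" | "WARN" | "ERROR";
  timestampMs: number;
  rejectionCode?: string;
}

interface AuthRejectionSummary {
  rejectionsByRoute: Record<string, number>;
  topRejectionCode: string | null;
}

const WINDOW_MS = 5 * 60 * 1000;

// Count 401/403 responses on critical auth routes inside the window.
// The log level is deliberately ignored: a WARN + 401 counts the same as
// any other rejection.
function summarizeAuthRejections(
  records: RequestRecord[],
  nowMs: number,
  criticalRoutes: string[],
): AuthRejectionSummary {
  const rejectionsByRoute: Record<string, number> = {};
  const codeCounts: Record<string, number> = {};

  for (const r of records) {
    const inWindow = r.timestampMs > nowMs - WINDOW_MS && r.timestampMs <= nowMs;
    const rejected = r.status === 401 || r.status === 403;
    const critical = criticalRoutes.indexOf(r.route) !== -1;
    if (!inWindow || !rejected || !critical) continue;

    rejectionsByRoute[r.route] = (rejectionsByRoute[r.route] ?? 0) + 1;
    if (r.rejectionCode) {
      codeCounts[r.rejectionCode] = (codeCounts[r.rejectionCode] ?? 0) + 1;
    }
  }

  // Pick the most frequent rejection code in the window, if any.
  let topRejectionCode: string | null = null;
  let topCount = 0;
  for (const code of Object.keys(codeCounts)) {
    if (codeCounts[code] > topCount) {
      topRejectionCode = code;
      topCount = codeCounts[code];
    }
  }

  return { rejectionsByRoute, topRejectionCode };
}
```

A filter on `level === "ERROR"` or `status >= 500` would return nothing for the same input, which is exactly the blind spot this block is meant to close.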

### Release Health
Add a dedicated release/reclaim block:
- release failures pending
- release retries in progress
- nodes stranded by failed release

### Billing Worker Health
Keep concise:
- worker ok/degraded
- failed accruals/min
- low-balance / depletion event rate

## 6. Lifecycle IA

### 6.1 Primary operator questions
1. What resources are currently moving through lifecycle transitions?
2. What is blocked, failed, or taking too long?
3. What action is safe to take next?

### 6.2 Nodes (`/admin/nodes`)

Target structure:

### Section A: Lifecycle summary
- nodes onboarding
- nodes active
- nodes draining
- nodes decommissioning
- nodes blocked / needing intervention

### Section B: Action-required queue
- failed onboarding
- stuck decommission
- unhealthy enrolled nodes
- capacity mismatch or agent offline cases

### Section C: Inventory table
The node table remains, but as a supporting workbench:
- search
- state filter
- site / pool / region filters
- primary action: Open
- secondary actions: Probe, Drain, Decommission, Copy ID, etc.

### Section D: Create / onboard
- onboarding entry point is secondary to the lifecycle queue
- creation form should live in a drawer or dedicated flow, not dominate the landing view

### 6.3 Allocations (`/admin/allocations`)

Target structure:

### Section A: Operational summary
- active allocations
- provisioning allocations
- releasing allocations
- failed / release_failed counts
- stranded-resource count

### Section B: Needs attention
- stuck provisioning
- release failures
- low-balance auto-release pending cases
- forced action candidates

### Section C: Allocation workbench
The table remains, but the page goal is intervention:
- newest-first default
- user / node / status / SKU filters
- provisioning filter includes derived in-flight slice provisioning state
- primary action: Open
- secondary actions: Force-release, Copy IDs, jump to user, jump to node

### Section D: Detail and task linkage
- transitional allocations should deep-link to their task/timeline
- admins should not have to infer task state from raw rows alone

### 6.4 Lifecycle details
- onboarding, decommission, and other long-running lifecycle operations should
  use the same task/timeline standard as user-facing async views
- actions on detail pages must be grouped into:
  - common lifecycle actions
  - recovery actions
  - dangerous actions
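
The three-way grouping can be sketched as a fixed-order partition of a detail page's actions. The type and function names below are hypothetical, and which group a given action belongs to is a product decision this document does not fix.

```typescript
// Hypothetical action model; group membership is decided per action elsewhere.
type ActionGroup = "common" | "recovery" | "dangerous";

interface LifecycleAction {
  id: string;
  label: string;
  group: ActionGroup;
}

interface ActionSection {
  group: ActionGroup;
  actions: LifecycleAction[];
}

// Render order is fixed so dangerous actions always come last.
const GROUP_ORDER: ActionGroup[] = ["common", "recovery", "dangerous"];

// Partition actions into ordered sections, omitting empty groups.
function groupActions(actions: LifecycleAction[]): ActionSection[] {
  return GROUP_ORDER
    .map((group) => ({ group, actions: actions.filter((a) => a.group === group) }))
    .filter((section) => section.actions.length > 0);
}
```

Keeping the order in one constant means every detail page separates dangerous actions the same way, regardless of the order in which the backend lists them.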

## 7. Config IA

Config pages are not operational dashboards. They are controlled setup surfaces.

### 7.1 Primary admin questions
1. What is the current effective configuration?
2. What can I safely change?
3. What rollout or validation state applies to that change?

### 7.2 Target pattern
- summary of effective state at top
- clear draft/published/effective distinctions where relevant
- validation and rollout safety before mutation
- table + form composition, not endless raw tables
- explicit linkage from config objects to where they affect operator workflows

Applies to:
- `/admin/skus`
- `/admin/os-images`
- `/admin/quotas`
- `/admin/maas`

## 8. Evidence IA

### 8.1 Primary admin questions
1. What happened?
2. Who did it?
3. What correlation or object should I inspect next?

### 8.2 Audit logs (`/admin/audit-logs`)

Audit is intentionally high density, but still needs workflow:
- newest-first
- strong filters up front
- export is secondary, not primary
- row detail should preserve context and allow pivots by actor, target,
  correlation, and action
- evidence pages should optimize for investigation, not CRUD
- truncated table cells are acceptable only if row drilldown exposes the full
  values and metadata
- investigation detail should be available as a side panel, drawer, or detail
  page without losing the current filtered table context
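
A row-level pivot that keeps the current filter context could look like the sketch below. The `AuditRow` fields other than `correlation_id`, the query parameter names, and the `pivotHref` helper are assumptions for illustration. `URLSearchParams` is the standard URL API.

```typescript
// Hypothetical audit row shape; only correlation_id is named in this document.
interface AuditRow {
  actor: string;
  target: string;
  action: string;
  correlation_id: string;
}

type PivotKey = "actor" | "target" | "action" | "correlation_id";

// Build a pivot link that keeps the filters already applied and adds
// (or replaces) the pivot dimension taken from the selected row.
function pivotHref(currentQuery: string, key: PivotKey, row: AuditRow): string {
  const params = new URLSearchParams(currentQuery);
  params.set(key, row[key]);
  return "/admin/audit-logs?" + params.toString();
}
```

Because the pivot starts from the current query string, an operator who narrowed the table by action and then pivots on `correlation_id` keeps the action filter.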

## 9. IAM IA

### 9.1 Primary admin questions
1. Who is this user?
2. What access do they currently have?
3. What can I change safely from here?
4. What recent actions/audit context exist for this user?

### 9.2 Directory page (`/admin/users`)

Target structure:

### Section A: Directory Header
- page title
- user count
- search
- role/platform-role filter

### Section B: Directory List
Each row/card should show:
- username
- platform role / base role
- tenant
- default project
- balance summary
- direct link to detail

### Section C: Create User
Move create-user into a secondary panel or drawer.
- do not make it the first dominant block on the page
- platform admin creation is a mutation tool, not the primary purpose of the directory

### 9.3 User detail page (`/admin/users/:user_id`)

Target structure:

### Section A: Identity
- username
- user id
- auth type / OIDC anchor presence
- platform role summary

### Section B: Access Summary
- tenant
- project
- tenant membership role
- project membership role
- platform roles

### Section C: Actions
Grouped by domain:
- access changes
- balance/refund actions
- platform role changes

### Section D: Related Activity
Links or embedded summaries for:
- allocations
- payment sessions
- audit logs
- app instances (when relevant)

### Section E: Dangerous / uncommon actions
Anything high-impact or rare should be visually separated from common admin actions.

### 9.4 Membership model presentation
The UI must make these distinctions explicit:
- platform role
- tenant membership
- project membership

Do not collapse them into a single generic “role” section.
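
As a sketch of what "explicit" could mean at the data level, the view model below keeps the three concepts in separate fields and always renders three sections. The type names, field names, and the "none" placeholder are hypothetical. Canonical role semantics stay with the IAM documents listed under Inputs.

```typescript
// Hypothetical view model: one field per concept, never a merged "role".
interface UserAccess {
  platformRoles: string[];
  tenantMembership: { tenant: string; role: string } | null;
  projectMembership: { project: string; role: string } | null;
}

interface AccessSection {
  title: "Platform role" | "Tenant membership" | "Project membership";
  lines: string[];
}

// Always emit all three sections, in a fixed order, so an absent
// membership is shown as absent instead of silently disappearing.
function accessSections(access: UserAccess): AccessSection[] {
  const tenant = access.tenantMembership;
  const project = access.projectMembership;
  return [
    {
      title: "Platform role",
      lines: access.platformRoles.length > 0 ? access.platformRoles : ["none"],
    },
    {
      title: "Tenant membership",
      lines: tenant ? [`${tenant.tenant}: ${tenant.role}`] : ["none"],
    },
    {
      title: "Project membership",
      lines: project ? [`${project.project}: ${project.role}`] : ["none"],
    },
  ];
}
```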

## 10. Finance IA

### 10.1 Primary admin questions
1. What payment/balance issue needs intervention?
2. What user or allocation is affected?
3. What policy or audit trail applies?

### 10.2 Target pattern
- finance surfaces should not just dump sessions or balances
- they should expose policy context, recent outcome, and next safe action
- links between payment sessions, user detail, refunds, billing state, and
  allocations should be direct

## 11. Shared Admin Behavior Rules

1. Every admin landing page must start with summary and attention, not the raw
   table.
2. Every admin workbench must use the shared table contract:
   newest-first when recency matters, sortable headers, filter toolbar, primary
   row action, portal-based overflow menu, URL-restored filter state.
   Empty results must not hide the workbench controls or status filters.
3. Detail pages must preserve back-navigation to the prior filtered list state.
4. Mutations must be grouped by workflow domain and danger level.
5. Timelines and long-running operations use the shared task/async pattern, not
   custom ad hoc status blocks.
6. Debug pivots must preserve `correlation_id` and object identifiers.
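
The "URL-restored filter state" part of the table contract (rule 2), which also supports back-navigation (rule 3), can be sketched as a round trip between a filter object and a query string. The `WorkbenchFilters` shape, the parameter names, and the `created_at` default sort key are assumptions. `URLSearchParams` is the standard URL API.

```typescript
// Hypothetical filter state for a workbench table.
interface WorkbenchFilters {
  search: string;
  status: string[];
  sort: string;
  order: "asc" | "desc";
}

// Assumption: newest-first on a created_at column is the default.
const DEFAULT_FILTERS: WorkbenchFilters = {
  search: "",
  status: [],
  sort: "created_at",
  order: "desc",
};

// Serialize filters into a query string, omitting defaults to keep URLs short.
function filtersToQuery(filters: WorkbenchFilters): string {
  const params = new URLSearchParams();
  if (filters.search) params.set("q", filters.search);
  for (const status of filters.status) params.append("status", status);
  if (filters.sort !== DEFAULT_FILTERS.sort) params.set("sort", filters.sort);
  if (filters.order !== DEFAULT_FILTERS.order) params.set("order", filters.order);
  return params.toString();
}

// Restore filters from a query string, falling back to defaults for
// anything missing or unrecognized.
function filtersFromQuery(query: string): WorkbenchFilters {
  const params = new URLSearchParams(query);
  const rawOrder = params.get("order");
  const order: "asc" | "desc" =
    rawOrder === "asc" ? "asc" : rawOrder === "desc" ? "desc" : DEFAULT_FILTERS.order;
  return {
    search: params.get("q") ?? DEFAULT_FILTERS.search,
    status: params.getAll("status"),
    sort: params.get("sort") ?? DEFAULT_FILTERS.sort,
    order,
  };
}
```

If the list URL always carries this query string, a detail page can return to the prior filtered state by linking back to that URL rather than holding state in memory.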

## 12. Required Backend/Contract Support

No major IAM model rewrite is needed.

Potential additive backend improvements:
1. cleaner `/admin/ops/overview` auth-failure summary block
2. cleaner user detail response for effective access summary
3. optional recent-activity rollups on admin user detail
4. derived lifecycle/provisioning status in admin allocation listings so
   in-flight work is visible without requiring operators to infer from raw
   placement fields
5. summary endpoints for family landing pages where UI would otherwise need to
   over-aggregate raw entity lists

These should be additive and contract-first if needed; do not move interpretation complexity into the UI if the backend can summarize safely.

## 13. Delivery Sequence

1. Restructure admin navigation into workflow families
2. Rebuild admin landing pages around summary + attention + workbench
3. Rework lifecycle surfaces: nodes and allocations
4. Rework IAM surfaces: user directory and user detail
5. Rework config surfaces: SKUs, images, quotas, MAAS
6. Rework evidence and finance surfaces
7. Add backend summary/read-model improvements only where needed to keep UI simple

## 14. Acceptance Criteria

### Operate / Triage
- operator can identify active incident classes without scrolling into raw diagnostics
- auth failures are visible without querying Loki manually
- every degraded signal has a direct runbook/debug path

### Lifecycle
- operators can identify what is provisioning, stuck, releasing, or failed
  without reading raw entity dumps
- transitional resources link directly to their task/timeline and next actions

### IAM
- user detail page clearly separates identity, access, actions, and activity
- membership edits are understandable without prior IAM architecture context
- create-user flow no longer dominates the directory landing page

### Config / Evidence / Finance
- config pages emphasize safe change over raw table maintenance
- evidence pages optimize investigation flow over CRUD density
- finance pages connect money state to user/allocation impact and safe actions

## 15. Out of Scope

1. full custom-role authoring UX
2. tenant federation admin console polish
3. backend domain-model rewrites that are not needed to present a workflow-first admin UI
4. scheduler product UX beyond placeholder/baseline routing