# GPUaaS Program Execution Memory - 2026-06-16

## Goal

Finish the remaining high-value GPUaaS closeout work as one program, not as
isolated tasks:

1. Complete MFA as a product feature: user/admin/operator flows documented and
   validated through UAT, CI passing, and dev deployment evidence recorded.
2. Stand up the documentation portal as the product-facing source for product,
   architecture, security, developer, SDK/app, IAM, infra, operations, and
   user-guide audiences.
3. Make staging a repeatable two-node production-like environment driven by
   scripts and config.
4. Create a fresh demo environment with supported apps and supporting docs/UAT.

## Working Decision

Portal iteration one will be a static Docusaurus site published through the
existing Cloudflare path. No auth in the first iteration. Persona-based
publication tracks remain in the content model so internal/customer/public
separation can be added later without redoing the IA.
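
A minimal sketch of how a track field in the content model could later drive internal/customer/public filtering without changing the IA. The track names come from the decision above; the type names, the `persona` field, the visibility rule, and the track assigned to each sample path are assumptions, not the checked-in model.

```typescript
// Hypothetical content-model sketch. Only the three track names are taken
// from the working decision; everything else is illustrative.
type PublicationTrack = "internal" | "customer" | "public";

interface PortalPage {
  path: string;
  persona: string;
  track: PublicationTrack;
}

// Assumed rule: a target track sees its own pages plus less restricted ones.
const VISIBLE_TRACKS: Record<PublicationTrack, PublicationTrack[]> = {
  internal: ["internal", "customer", "public"],
  customer: ["customer", "public"],
  public: ["public"],
};

function pagesForTrack(pages: PortalPage[], target: PublicationTrack): PortalPage[] {
  const allowed = VISIBLE_TRACKS[target];
  return pages.filter((page) => allowed.indexOf(page.track) !== -1);
}

// Sample data; the track assignments are made up for the example.
const samplePages: PortalPage[] = [
  { path: "/internal-teams/iam-identity", persona: "iam", track: "internal" },
  { path: "/use-gpuaas/account-access", persona: "end-user", track: "customer" },
  { path: "/product", persona: "product", track: "public" },
];
```

In iteration one every page ships in the single internal build, so a filter like this would only matter once a second publication target exists.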

Product naming direction:

- Product-facing name: `AI Cloud`
- Internal repo/code/history label: `GPUaaS` until a controlled sweep updates
  docs, portal copy, contracts, and operating-model references

Rule:

- Do not introduce new product-facing `GPUaaS` branding in UX, portal landing
  pages, release communication, or external handoff docs.
- Keep `GPUaaS` only where it is still the literal repo/module/domain term or
  where historical evidence would become misleading if rewritten casually.
- Treat the rename as a deliberate migration batch, not ad hoc copy edits.

Selected hostname:

- `docs.aicloud.core42.dev`

Rationale:

- cleaner than `aicloud-docs.core42.dev`;
- still shallow enough to avoid the old multi-label wildcard/TLS failure mode;
- does not require renaming the current runtime host fleet.

## Durable Sources

- Fairway queue:
  `/Users/subash/dev/GPUasService/.fairway/platform-foundation-implementation-track.yaml`
- Fairway config:
  `/Users/subash/dev/GPUasService/.fairway/platform-foundation-config.toml`
- Current execution plan:
  `/Users/subash/dev/GPUasService/tmp-ux/gpuaas-tonight-execution-plan-2026-06-15.md`
- This program file:
  `/Users/subash/dev/GPUasService/tmp-ux/gpuaas-program-execution-memory-2026-06-16.md`

## Execution Model

- Desktop thread stays the control/review surface.
- Trusted implementation lane is `tmux` session `gpuaas-git`.
- Use that lane for git operations, tests, codegen, deploys, CI watch, and
  long-running validation to avoid desktop sandbox/browser issues.
- Use desktop/browser only for manual product review, screenshots, and
  acceptance checks after deploy.

## Current Verified State

- MFA kind/dev closeout evidence task exists and is marked `done`:
  `IAM-MFA-KIND-DEV-UAT-DEPLOY-CLOSEOUT-001`
- Kind MFA product UAT:
  `/Users/subash/dev/GPUasService/dist/uat/mfa-product/20260616T030959Z-kind-mfa-product-uat/summary.md`
- Dev MFA product UAT:
  `/Users/subash/dev/GPUasService/dist/uat/mfa-product/20260616T031006Z-dev-mfa-product-uat/summary.md`
- Dev deploy pipeline `2677` passed for commit `7c1b15443262f3aec0412e6ed627cadc8af081f1`.
- Fairway now supports `product-quality` as a real role, so MFA product/UAT
  tasks import and route correctly.

### 2026-06-17 edge/governance closeout continuation

- Imported edge backlog tasks were closed from current accepted evidence, not by
  reopening runtime work:
  - `OPS-EDGE-POMERIUM-CONSOLIDATION-SWEEP-001`
  - `OPS-EDGE-NOTIFICATION-WS-AUTH-MODE-001`
  - `OPS-EDGE-TERMINAL-WS-HOST-PARITY-001`
- Evidence artifacts:
  - `/Users/subash/dev/GPUasService/.fairway/artifacts/ops-edge-pomerium-consolidation-sweep-20260617/closeout.md`
  - `/Users/subash/dev/GPUasService/.fairway/artifacts/ops-edge-notification-ws-auth-mode-20260617/closeout.md`
  - `/Users/subash/dev/GPUasService/.fairway/artifacts/ops-edge-terminal-ws-host-parity-20260617/closeout.md`
- `DEMO-API-CANONICAL-ROUTE-READINESS-001` was investigated and explicitly
  blocked on fresh demo deploy/read-only rerun readiness, not product-code
  rollback to `/api/v1/v3/*`.
- Demo blocker artifact:
  - `/Users/subash/dev/GPUasService/.fairway/artifacts/demo-api-canonical-route-readiness-20260617/blocker.md`
- The ready governance helper batch was also closed from existing checked-in
  docs/scripts:
  - `GOV-FAIRWAY-STRUCTURED-QUEUE-STORE-001`
  - `GOV-FAIRWAY-WATCHER-PREFLIGHT-HELPER-001`
  - `GOV-FAIRWAY-WATCHER-LANE-MODEL-001`
  - `GOV-FAIRWAY-CONTEXT-PACKET-HELPER-001`
- Grouped evidence:
  - `/Users/subash/dev/GPUasService/.fairway/artifacts/gov-fairway-helper-closeout-20260617/summary.md`
- End state from this thread after those closeouts:
  - `fairway ready`: empty
  - no active session remains from this thread
  - active reconcile noise belongs only to
    `DOCS-PORTAL-AUDIENCE-CLOSEOUT-EPIC` / session
    `architecture-control-docs-portal-audience-20260617`, owned by the docs
    portal control thread
  - next action should be an Architecture Control decision on either:
    1. fresh demo deploy/read-only rerun ownership/authorization, or
    2. another newly readied GPUaaS batch

## Workstreams And Task Map

### 1. MFA product completion

Primary parent:
- `IAM-MFA-PRODUCT-COMPLETE-READINESS-001`

Open product-gap tasks:
- `PRODUCT-GAP-IAM-MFA-SENSITIVE-STEPUP-RUNTIME-001`
- `PRODUCT-GAP-IAM-MFA-SUPERADMIN-PHISHING-RESISTANT-ENV-001`
- `PRODUCT-GAP-IAM-MFA-PROVIDER-FACTOR-READBACK-001`
- `PRODUCT-GAP-IAM-MFA-FACTOR-MANAGE-FLOW-001`
- `UAT-BUG-IAM-MFA-PROVIDER-RETURN-FLOW-001`
- `PRODUCT-GAP-IAM-MFA-FACTOR-REMOVE-DISABLE-FLOW-001`
- `PRODUCT-GAP-IAM-MFA-FACTOR-RECOVERY-FLOW-001`
- `PRODUCT-GAP-IAM-MFA-ADMIN-BREAKGLASS-POLICY-FLOW-001`
- `PRODUCT-GAP-IAM-MFA-USER-FACING-BRANDING-SCAN-001`

Execution order:
1. Finish user-visible lifecycle correctness:
   provider readback, manage flow, remove/disable flow, recovery flow,
   provider return states, branding.
2. Re-run focused e2e/UAT for user/admin/operator journeys.
3. Close policy/product distinctions:
   daily admin versus break-glass, sensitive-op step-up runtime, superadmin
   phishing-resistant env posture.
4. Re-deploy to kind/dev only when code changed.

### 2. Docs portal

Original baseline epic:
- `DOCS-PORTAL-COMPLETION-EPIC` - already complete as the first Docusaurus
  portal baseline.

Current follow-up parent:
- `DOCS-PORTAL-AUDIENCE-CLOSEOUT-EPIC`

Current follow-up slices:
- `DOCS-PORTAL-PRODUCT-HANDBACK-001`
- `DOCS-PORTAL-AUDIENCE-HANDBOOKS-001`
- `DOCS-PORTAL-USER-GUIDE-FLOWS-001`
- `DOCS-PORTAL-PUBLISH-PATH-001`

Portal outcome for this program:
- product-facing IA is coherent;
- persona landing paths are usable;
- user-guide and screenshots can be layered in without reworking structure;
- static publication path is implemented;
- portal/product copy consistently presents the platform as `AI Cloud` instead
  of surfacing the older `GPUaaS` product label except where repo/history/code
  context requires it;
- resource model, hierarchy, identity, and naming are explicit enough that
  product, architecture, IAM, infra, and app developers can reason about what
  a resource is, who owns it, how it is named, and how it is addressed.

Current interpretation:
- the original portal build-out is done;
- the missing work is audience completeness and handoff quality for the teams
  explicitly named later in program steering: product, architecture,
  CISO/security, developers, app/SDK developers, token-factory/shared-platform
  builders, IAM, infra, operations, and end users.

### 2026-06-17 docs portal depth backlog added

The next high-value portal batch is no longer just audience routing. It now
includes explicit engineering-system and delivery-model documentation so the
portal shows how the platform was built, not only what the product does.

New durable Fairway tasks added under `DOCS-PORTAL-AUDIENCE-CLOSEOUT-EPIC`:

- `DOCS-PORTAL-ENGINEERING-SYSTEM-001`
  - explain Fairway, orchestrator/control roles, provider sessions, tmux lanes,
    durable evidence, and controlled coding-agent execution
- `DOCS-PORTAL-CONTRACT-FIRST-SDLC-001`
  - explain UX/flow-first, contract-first, codegen, implementation, e2e/UAT,
    and design-before-code lifecycle
- `DOCS-PORTAL-CI-CD-DELIVERY-SYSTEM-001`
  - explain the actual CI/CD pipeline, scripts, gate stack, deploy/readback
    model, and environment progression
- `DOCS-PORTAL-EVIDENCE-READINESS-MODEL-001`
  - explain evidence collection/exclusion, readiness claims, and non-code
    engineering outputs as first-class work
- `DOCS-PORTAL-RESOURCE-MODEL-NAMING-001`
  - add a dedicated portal-native explanation of:
    - organization -> department -> project -> principal -> product/resource
      hierarchy
    - canonical resource identifier / URN shape
    - display name vs slug vs id vs provider resource id
    - route/path naming versus canonical identity
    - product-facing examples for compute, storage, app instances, service
      accounts, and runtime members
- `DOCS-PORTAL-SHARED-SERVICE-DEEP-PAGES-001`
  - split `/architecture/shared-services` into:
    - one L1 shared-services map/index page
    - separate deep engineering pages for the main platform services
  - initial service page set should cover at least:
    - IAM / Access
    - Billing / Metering / Payments
    - Audit / Evidence
    - Policy / Entitlements
    - Registry / Artifacts
    - Secrets / PKI
    - Status / Ops / Observability
    - Notifications
  - each page should explain:
    - responsibility boundary
    - repo/code ownership
    - API/event/read-model contracts
    - storage/data model
    - workers/control loops
    - failure and recovery posture
    - evidence / audit / security implications
  - treat these as pre-coding design packets, not lightweight overview pages
  - standard page sections should include, where applicable:
    - service purpose and scope
    - context/C4 view
    - component/module decomposition
    - ER/data model
    - lifecycle/state model
    - key flows and sequence diagrams
    - API, event, and read-model contract surfaces
    - reconciliation/control-loop behavior
    - failure, rollback, and recovery model
    - security/audit/evidence model
    - implementation boundaries and deferred decisions

Why this batch exists:

- The portal currently has breadth, but it still undersells the amount of
  architecture, governance, CI/CD, UAT, operating-model, and delivery-system
  engineering behind GPUaaS.
- Developers should be able to start from the portal and understand where work
  goes and why the system is designed this way.
- Security/CISO and architecture reviewers should be able to review claims and
  controls without reconstructing the process from raw repo history.

### 2026-06-17 engineering-system / SDLC / CI-CD / evidence docs progress

Completed the first deepening pass for the new portal-delivery slices:

- added engineering-system page:
  `/Users/subash/dev/GPUasService/packages/docs/docs/governance-agents/engineering-system/index.mdx`
- added contract-first SDLC page:
  `/Users/subash/dev/GPUasService/packages/docs/docs/governance-agents/contract-first-sdlc/index.mdx`
- added CI/CD delivery-system page:
  `/Users/subash/dev/GPUasService/packages/docs/docs/operators/ci-cd-delivery-system/index.mdx`
- added evidence/readiness model page:
  `/Users/subash/dev/GPUasService/packages/docs/docs/security-readiness/evidence-readiness-model/index.mdx`

Portal routing was updated so these pages are discoverable from:

- governance landing
- operators landing
- security readiness landing
- developer handoff
- security assurance
- architecture review pack
- sidebar navigation

Validation:

- `git diff --check` passed
- `make docs-portal-check` passed

Purpose of this pass:

- show how GPUaaS is actually built with Fairway, coding agents, trusted tmux
  execution lanes, deterministic utilities, and evidence-first closeout;
- make the contract-first, design-before-code lifecycle reviewable by product,
  architecture, security, and developers;
- make CI/CD and environment promotion look like a real delivery system rather
  than scattered scripts;
- make evidence/readiness and non-code engineering output visible as first-class
  program work.

### 2026-06-17 remaining docs depth backlog made durable

The next docs pass is now explicitly tracked in Fairway, not left as chat-only
intent. Added tasks under `DOCS-PORTAL-AUDIENCE-CLOSEOUT-EPIC`:

- `DOCS-PORTAL-APP-SDK-PROOF-001`
  - standalone App SDK proof page with Slurm and RKE2 as first-class examples
- `DOCS-PORTAL-ARCH-GUARD-CI-ENFORCEMENT-001`
  - architecture guard and CI enforcement page explaining what each gate proves
- `DOCS-PORTAL-PLATFORM-CAPABILITY-SUMMARY-001`
  - reviewer-first capability summary separating implemented, partial, deferred,
    and queue-backed claims
- `DOCS-PORTAL-VISUAL-DEPTH-001`
  - missing diagrams for scheduler/control split, app composition, environment
    progression, and platform capability shape

These represent the remaining “make the depth visible” work so the portal shows
why GPUaaS is not a weekend project and so developers/reviewers can use it as a
real starting point.

### 2026-06-17 proof / guard / capability / visuals docs progress

Completed the next high-value portal batch:

- added App SDK proof page:
  `/Users/subash/dev/GPUasService/packages/docs/docs/build-on-gpuaas/app-sdk-proof/index.mdx`
- added architecture guard / CI enforcement page:
  `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/architecture-guard-ci-enforcement/index.mdx`
- added reviewer-first platform capability summary:
  `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/platform-capability-summary/index.mdx`
- expanded production deployment model with more visual depth:
  `/Users/subash/dev/GPUasService/packages/docs/docs/operators/production-deployment-model/index.mdx`

Also updated routing and reviewer entry points:

- `packages/docs/docs/build-on-gpuaas/index.mdx`
- `packages/docs/docs/architecture/index.mdx`
- `packages/docs/docs/architecture/platform-proof-points/index.mdx`
- `packages/docs/sidebars.ts`

Validation:

- `git diff --check` passed
- `make docs-portal-check` passed

Practical result:

- App SDK no longer reads as only a conceptual surface; Slurm and RKE2 are now
  first-class examples on a standalone proof page.
- Reviewers have a faster implemented-versus-partial summary page.
- Developers and architects now have an explicit architecture-guard/CI page
  instead of relying only on AGENTS and scattered ops docs.
- The portal now better represents the delivery-system and platform-proof depth
  already present in the repo.

### 2026-06-17 docs portal follow-up progress

- Added new Fairway follow-up epic:
  `DOCS-PORTAL-AUDIENCE-CLOSEOUT-EPIC`
- Added child slices:
  - `DOCS-PORTAL-PRODUCT-HANDBACK-001`
  - `DOCS-PORTAL-AUDIENCE-HANDBOOKS-001`
  - `DOCS-PORTAL-USER-GUIDE-FLOWS-001`
  - `DOCS-PORTAL-PUBLISH-PATH-001`
- Claimed the new epic and registered active provider session:
  `architecture-control-docs-portal-audience-20260617`
- Validated the first two audience-expansion slices under the epic:
  - new product PM path:
    `/Users/subash/dev/GPUasService/packages/docs/docs/product/team-handoff/index.mdx`
  - new internal shared-platform builder path:
    `/Users/subash/dev/GPUasService/packages/docs/docs/internal-teams/shared-platform-builders/index.mdx`
  - updated landing/routing pages:
    `packages/docs/docs/product/index.mdx`
    `packages/docs/docs/start-here/index.mdx`
    `packages/docs/docs/internal-teams/index.mdx`
    `packages/docs/sidebars.ts`
- Validation passed:
  - `git diff --check`
  - `make docs-portal-check`
- Fairway evidence:
  `/Users/subash/dev/GPUasService/.fairway/artifacts/docs-portal-audience-closeout-20260617/validation-summary.md`
- Epic remains `in_progress`; next intended docs slices are:
  - user-guide flow pack for end-user/admin guidance
  - explicit current publication path decision/doc for the first iteration

### 2026-06-17 docs portal user-guide/publication progress

- Completed the next two intended docs slices under
  `DOCS-PORTAL-AUDIENCE-CLOSEOUT-EPIC`:
  - expanded user-guide flow coverage
  - made the first-iteration publication decision explicit
- Added stable user/admin audience pages:
  - `/Users/subash/dev/GPUasService/packages/docs/docs/use-gpuaas/account-access/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/docs/use-gpuaas/launch-operate/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/docs/use-gpuaas/tenant-admin/index.mdx`
- Added explicit internal handoff pages for the remaining target audiences:
  - `/Users/subash/dev/GPUasService/packages/docs/docs/product/current-state-roadmap/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/review-pack/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/docs/internal-teams/developer-handoff/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/docs/internal-teams/iam-identity/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/docs/internal-teams/infra-environments/index.mdx`
- Expanded existing flow pages:
  - `packages/docs/docs/use-gpuaas/index.mdx`
  - `packages/docs/docs/use-gpuaas/journeys/index.mdx`
  - `packages/docs/docs/use-gpuaas/troubleshooting/index.mdx`
- Publication-path docs now explicitly state the current operating choice:
  internal static Cloudflare Pages first, with later filtered external tracks.
- Portal config is now easier to maintain across environments:
  `packages/docs/docusaurus.config.ts` uses `DOCS_PORTAL_PUBLIC_URL`
  instead of a hardcoded local docs hostname.
- The selected publication path is now executable in-repo:
  - `make docs-portal-publish`
  - `scripts/ops/docs_portal_publish_cloudflare_pages.sh`
  - `doc/operations/Docs_Portal_Static_Cloudflare_Deployment_v1.md`
- Validation passed again:
  - `bash -n scripts/ops/docs_portal_publish_cloudflare_pages.sh scripts/ci/docs_portal_static_deploy_preflight.sh`
  - `git diff --check`
  - `make docs-portal-check`
- Updated Fairway evidence:
  `/Users/subash/dev/GPUasService/.fairway/artifacts/docs-portal-audience-closeout-20260617/validation-summary.md`
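
A minimal sketch of the `DOCS_PORTAL_PUBLIC_URL` pattern noted above, assuming the config resolves the site URL from the environment with a local fallback. The function name, the fallback value, and the validation rule are assumptions; the actual logic in `packages/docs/docusaurus.config.ts` is not reproduced here.

```typescript
// Sketch only: DOCS_PORTAL_PUBLIC_URL is the variable named in the notes.
// The fallback and the validation are assumed, not taken from the repo.
type Env = Record<string, string | undefined>;

const LOCAL_FALLBACK_URL = "http://localhost:3000"; // assumed local default

function resolvePortalUrl(env: Env): string {
  const raw = (env["DOCS_PORTAL_PUBLIC_URL"] ?? "").trim();
  if (raw === "") {
    return LOCAL_FALLBACK_URL;
  }
  if (!/^https?:\/\//.test(raw)) {
    throw new Error(`DOCS_PORTAL_PUBLIC_URL must be an absolute http(s) URL, got "${raw}"`);
  }
  // Drop trailing slashes so the value composes cleanly with a base path.
  return raw.replace(/\/+$/, "");
}
```

Taking the environment as a parameter keeps the function testable; a config file would pass `process.env`.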

### 2026-06-17 docs portal screenshot-backed MFA guide progress

- Added a screenshot-backed MFA walkthrough page:
  `/Users/subash/dev/GPUasService/packages/docs/docs/use-gpuaas/mfa-guide/index.mdx`
- Wired the new page into the user-guide navigation and account-access path:
  - `packages/docs/docs/use-gpuaas/index.mdx`
  - `packages/docs/docs/use-gpuaas/account-access/index.mdx`
  - `packages/docs/sidebars.ts`
- Copied validated local screenshots into checked-in portal assets:
  - `packages/docs/static/img/portal/mfa/account-security-mfa-empty.png`
  - `packages/docs/static/img/portal/mfa/provider-managed-setup-screen.png`
  - `packages/docs/static/img/portal/mfa/account-security-mfa-active.png`
- Tightened portal maintenance expectations so screenshot-backed guides must
  ship with behavior changes:
  - `packages/docs/docs/portal-roadmap/maintenance/index.mdx`
- `make docs-portal-check` passed after correcting the page metadata to remove
  an internal-only memory-file reference from the customer-facing MFA guide.
- The portal now has a repeatable pattern for converting validated user flows
  into durable, publishable screenshots plus walkthrough content.

### 2026-06-17 docs portal audience completion progress

- Added explicit internal audience handoff pages for the two remaining named
  readers whose entry points were still too implicit:
  - `/Users/subash/dev/GPUasService/packages/docs/docs/internal-teams/security-assurance/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/docs/internal-teams/operations-handoff/index.mdx`
- Updated the internal-team landing page and sidebar routing so security/CISO
  and operations readers have direct portal entry points rather than relying on
  generic section discovery.
- `make docs-portal-check` passed again after the audience-route expansion.

### 2026-06-17 docs portal architecture-app-iam depth progress

- Added the missing builder implementation bridge page:
  `/Users/subash/dev/GPUasService/packages/docs/docs/build-on-gpuaas/platform-contracts-for-builders/index.mdx`
- Wired it into the second-product builder routes:
  - `packages/docs/docs/build-on-gpuaas/shared-platform-consumer/index.mdx`
  - `packages/docs/docs/build-on-gpuaas/index.mdx`
  - `packages/docs/sidebars.ts`
- This closes a specific review gap: the portal explained the shared platform
  conceptually but did not tell a second-product builder which package anchors
  and method shapes to code against.
- The new page now gives concrete call-shape guidance for:
  - platform IAM scope decisions
  - metered usage into shared billing
  - privileged mutation audit append
  - release/UAT/security evidence bundle recording
- Validation passed again:
  - `git diff --check`
  - `make docs-portal-check`

### 2026-06-17 docs portal under-sold platform strengths batch

- Promoted the strongest under-sold platform themes into portal-native pages
  and route framing instead of leaving them implicit in source docs:
  - new architecture page:
    `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/platform-strengths/index.mdx`
  - new security/readiness page:
    `/Users/subash/dev/GPUasService/packages/docs/docs/security-readiness/evidence-custody/index.mdx`
- Expanded existing portal pages so the platform reads as a reusable AI
  foundation rather than only a GPU product:
  - `packages/docs/docs/architecture/index.mdx`
  - `packages/docs/docs/architecture/shared-services/index.mdx`
  - `packages/docs/docs/architecture/workload-access-runtime-surfaces/index.mdx`
  - `packages/docs/docs/product/tenant-project-hierarchy/index.mdx`
  - `packages/docs/docs/security-readiness/index.mdx`
  - `packages/docs/docs/security-readiness/release-evidence/index.mdx`
  - `packages/docs/sidebars.ts`
- What this batch makes more explicit:
  - GPUaaS as the first shipped product on a shared platform
  - the hierarchy as a real IAM/billing/resource backbone
  - runtime access as distinct terminal/browser/API/operator surface families
  - audit/evidence/billing/release as one custody model
- Validation passed:
  - `git diff --check`
  - `make docs-portal-check`

- Added a first-class workload/runtime access architecture page:
  `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/workload-access-runtime-surfaces/index.mdx`
  - explains terminal, browser-app routes, API-app routes, metrics, and
    platform-proxy surfaces as one runtime model
  - linked from the architecture front door
- Added a practical app-onboarding page for app/SDK developers:
  `/Users/subash/dev/GPUasService/packages/docs/docs/build-on-gpuaas/new-app-onboarding/index.mdx`
  - explains SDK structure as contract layers, not a vague platform concept
  - gives the concrete sequence: manifest -> artifact -> service account ->
    catalog -> entitlement -> launch/connect/decommission -> evidence
  - linked from the Build on GPUaaS front door and App SDK overview
- Added a concrete IAM capabilities page:
  `/Users/subash/dev/GPUasService/packages/docs/docs/internal-teams/iam-capabilities/index.mdx`
  - explains provider-vs-product IAM boundary
  - explains service-account lifecycle role
  - explains MFA ownership split and exclusions
  - linked from the IAM team guide
- Validation passed:
  - `git diff --check`
  - `make docs-portal-check`

### 2026-06-17 docs portal principle and production-model progress

- Added a portal-native architecture principle page:
  `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/design-principles/index.mdx`
  - states the main rule as a contract-first control plane on shared platform
    services
  - explains product vs shared-service vs provider/profile boundaries
  - gives a plain-language “can users understand it?” framing
- Added a portal-native production deployment model page:
  `/Users/subash/dev/GPUasService/packages/docs/docs/operators/production-deployment-model/index.mdx`
  - explains the environment path: kind -> dev -> demo -> UAT/security ->
    staging -> production
  - explains production shape, promotion model, rings/reserve posture, and
    current remaining gaps
- Updated the architecture and operator front doors:
  - `packages/docs/docs/architecture/index.mdx`
  - `packages/docs/docs/architecture/review-pack/index.mdx`
  - `packages/docs/docs/operators/index.mdx`
- Validation passed:
  - `git diff --check`
  - `make docs-portal-check`
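
The environment path listed above can be sketched as an ordered list with a promotion check. The stage order is taken from these notes; the identifier spellings and the rule that promotion moves exactly one stage at a time are assumptions that the notes do not confirm.

```typescript
// Stage order from the notes above; "uat-security" is an assumed spelling
// for the combined UAT/security stage.
const ENVIRONMENT_PATH = ["kind", "dev", "demo", "uat-security", "staging", "production"] as const;
type Environment = (typeof ENVIRONMENT_PATH)[number];

// Returns the stage that follows `current`, or null at the end of the path.
function nextEnvironment(current: Environment): Environment | null {
  const index = ENVIRONMENT_PATH.indexOf(current);
  const next: Environment | undefined = ENVIRONMENT_PATH[index + 1];
  return next === undefined ? null : next;
}

// Assumed policy: a build may only move to the immediately following stage.
function canPromote(from: Environment, to: Environment): boolean {
  return nextEnvironment(from) === to;
}
```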

### 2026-06-17 docs portal publication automation progress

- Added repo-native GitLab jobs for the portal path in `.gitlab-ci.yml`:
  - `docs_portal_quality_gate`
  - `docs_portal_publish_internal`
- Added checked-in local operator env template:
  `/Users/subash/dev/GPUasService/doc/operations/local-dev/docs-portal-cloudflare.env.example`
- Updated portal deployment docs and portal README so the publication path is
  now documented consistently across local operator use and CI.
- Validation passed:
  - `ruby -e 'require "yaml"; YAML.load_file(".gitlab-ci.yml", aliases: true)'`
  - `make docs-portal-check`
- Remaining publication blocker is external configuration only:
  `CLOUDFLARE_ACCOUNT_ID`, `CLOUDFLARE_PAGES_PROJECT`,
  `CLOUDFLARE_API_TOKEN`, and `DOCS_PORTAL_HOSTNAME` are still unset in the
  current execution surface.
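
A small sketch of a fail-closed check for the four unset values named above. It is not the checked-in preflight script; the function names and error wording are illustrative.

```typescript
type Env = Record<string, string | undefined>;

// The four names listed in the blocker note above.
const REQUIRED_PUBLISH_VARS = [
  "CLOUDFLARE_ACCOUNT_ID",
  "CLOUDFLARE_PAGES_PROJECT",
  "CLOUDFLARE_API_TOKEN",
  "DOCS_PORTAL_HOSTNAME",
];

// Returns the names that are unset or blank, in declaration order.
function missingPublishVars(env: Env): string[] {
  return REQUIRED_PUBLISH_VARS.filter((name) => (env[name] ?? "").trim() === "");
}

// Throws before any publish step runs if configuration is incomplete.
function assertPublishReady(env: Env): void {
  const missing = missingPublishVars(env);
  if (missing.length > 0) {
    throw new Error(`docs portal publish blocked; unset: ${missing.join(", ")}`);
  }
}
```

Reporting only variable names, never values, keeps the API token out of logs.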

### 2026-06-17 portal version metadata progress

- Added a user-visible portal build/version page:
  `/Users/subash/dev/GPUasService/packages/docs/docs/reference/portal-build/index.mdx`
- Added reusable metadata component:
  `/Users/subash/dev/GPUasService/packages/docs/src/components/PortalBuildInfo.tsx`
- `packages/docs/docusaurus.config.ts` now injects:
  - git SHA
  - short SHA
  - build timestamp
  - published timestamp when provided by publish wrapper/CI
  - publication track
  - portal URL
- Footer now exposes concise build metadata directly in the rendered site.
- `scripts/ops/docs_portal_publish_cloudflare_pages.sh` now exports
  `DOCS_PORTAL_BUILD_SHA` and `DOCS_PORTAL_PUBLISHED_AT` before rebuild/publish
  so live publishes can carry commit and publish timing into the site itself.
- Validation passed again with `make docs-portal-check`.
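
A hedged sketch of how the injected metadata fields could be assembled from the environment. `DOCS_PORTAL_BUILD_SHA`, `DOCS_PORTAL_PUBLISHED_AT`, and `DOCS_PORTAL_PUBLIC_URL` are named in these notes; `DOCS_PORTAL_TRACK`, the 8-character short SHA, the `unknown` and `internal` defaults, and the type and function names are assumptions.

```typescript
type Env = Record<string, string | undefined>;

interface BuildMetadata {
  sha: string;
  shortSha: string;
  builtAt: string;
  publishedAt: string | null; // null for local builds with no publish wrapper
  track: string;
  portalUrl: string;
}

function collectBuildMetadata(env: Env, now: Date): BuildMetadata {
  const sha = env["DOCS_PORTAL_BUILD_SHA"] ?? "unknown";
  return {
    sha,
    // Assumed short form: first 8 characters of the full SHA.
    shortSha: sha === "unknown" ? sha : sha.slice(0, 8),
    builtAt: now.toISOString(),
    publishedAt: env["DOCS_PORTAL_PUBLISHED_AT"] ?? null,
    // DOCS_PORTAL_TRACK is a hypothetical variable name.
    track: env["DOCS_PORTAL_TRACK"] ?? "internal",
    portalUrl: env["DOCS_PORTAL_PUBLIC_URL"] ?? "",
  };
}
```

Passing `now` in explicitly keeps the build timestamp deterministic in tests.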

### 2026-06-17 portal governance and second-product track progress

- Added practical governance/SDLC coverage for how GPUaaS actually uses
  Fairway, Desktop control threads, tmux execution lanes, and lighter-weight
  review scaling:
  - `packages/docs/docs/governance-agents/fairway-in-practice/index.mdx`
  - `packages/docs/docs/governance-agents/lightweight-review-model/index.mdx`
  - expanded `packages/docs/docs/governance-agents/agent-sdlc/index.mdx`
  - updated `packages/docs/docs/governance-agents/index.mdx`
- Added the missing second-product builder route so Token Factory or later
  product teams are no longer forced to infer their path from App SDK pages:
  - `packages/docs/docs/build-on-gpuaas/shared-platform-consumer/index.mdx`
- Added a faster product/program readback page for shipped foundations,
  active closeout work, and next queued blocks:
  - `packages/docs/docs/product/platform-status/index.mdx`
- Added JAD/ARB depth routing:
  - `packages/docs/docs/architecture/detailed-design-index/index.mdx`
  - expanded `packages/docs/docs/architecture/review-pack/index.mdx` with core
    workflow sequences and direct routing to canonical design families
- Updated:
  - `packages/docs/docs/build-on-gpuaas/index.mdx`
  - `packages/docs/docs/internal-teams/shared-platform-builders/index.mdx`
  - `packages/docs/docs/product/current-state-roadmap/index.mdx`
  - `packages/docs/docs/portal-roadmap/ownership-freshness/index.mdx`
  - `packages/docs/sidebars.ts`
- Added stricter source-recency validation support in:
  - `packages/docs/scripts/check-source-docs.mjs`
  with explicit opt-in enforcement via `DOCS_PORTAL_ENFORCE_SOURCE_RECENCY=1`
- Fixed the summary-to-source bridge across the whole portal:
  - added `packages/docs/scripts/sync-source-docs.mjs`
  - wired `pnpm check` / `make docs-portal-check` to mirror all referenced
    `source_docs` into `packages/docs/static/portal/source-docs`
  - updated `packages/docs/src/components/StatusBadge.tsx` so `SourceList`
    entries are clickable in the published portal instead of plain text
- Validation passed:
  - `git diff --check`
  - `make docs-portal-check`
- Important operating note:
  strict source-recency enforcement is useful, but the existing portal has too
  many stale `last_reviewed` dates to make it default-blocking tonight. Keep it
  opt-in until the older architecture/security/product pages are re-reviewed in
  a dedicated freshness batch.
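
A sketch of the opt-in behavior described above: stale pages are always reported, but they only fail the check when `DOCS_PORTAL_ENFORCE_SOURCE_RECENCY=1`. The actual rule in `check-source-docs.mjs` is not shown in these notes, so the age threshold, the type names, and the sample paths in the usage below are assumptions.

```typescript
type Env = Record<string, string | undefined>;

interface PageReview {
  path: string;
  lastReviewed: string; // ISO date taken from the page's `last_reviewed` field
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Assumed staleness rule: reviewed more than `maxAgeDays` before `today`.
function stalePages(pages: PageReview[], today: Date, maxAgeDays: number): PageReview[] {
  return pages.filter((page) => {
    const reviewed = Date.parse(page.lastReviewed);
    // Unparseable dates count as stale so they surface in the report.
    if (isNaN(reviewed)) return true;
    return (today.getTime() - reviewed) / MS_PER_DAY > maxAgeDays;
  });
}

function recencyCheck(
  pages: PageReview[],
  today: Date,
  maxAgeDays: number,
  env: Env,
): { stale: string[]; failed: boolean } {
  const stale = stalePages(pages, today, maxAgeDays).map((page) => page.path);
  const enforce = env["DOCS_PORTAL_ENFORCE_SOURCE_RECENCY"] === "1";
  return { stale, failed: enforce && stale.length > 0 };
}
```

With enforcement off the result still lists stale paths, which is what a later freshness batch would work from.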

### 2026-06-17 portal depth-review conclusion

- External review correctly identified a structural ceiling:
  the portal had good summaries, but the canonical design/source layer was not
  reachable because `SourceList` rendered plain text only.
- That structural issue is now fixed globally.
- Remaining depth work is content-specific, not routing-specific:
  - product pages need more live program/state readback and fuller user-guide
    walkthroughs;
  - security/ops pages need more concrete decision-packet and operating-proof
    coverage;
  - developer/shared-platform-consumer pages need integration guides with
    callable contracts/examples, not only checklists and reading order.

### 3. Staging

- `OPS-STAGING-TWO-NODE-REPEATABLE-SETUP-001`

Outcome:
- repeatable, config-driven, two-node environment using existing scripts as the
  production-like baseline.

### 4. Demo

- `OPS-DEMO-FRESH-ENV-SUPPORTED-APPS-UAT-001`

Outcome:
- fresh demo env, supported apps installed/validated, usable handoff docs/UAT.

### 5. Platform version visibility

- `OPS-PLATFORM-SERVICE-VERSION-SURFACE-001`

Outcome:
- operators/admins can see what version/commit/image each service is running.

## Stop Conditions

Do not claim MFA complete just because setup works. Product-complete means the
user/admin/operator journey is coherent, tested, and honest about boundaries.

Do not let docs-portal work drift into raw content dumping. Persona navigation
and publication filtering come first.

Do not run production-impacting or destructive environment actions without
explicit confirmation and readback evidence.

## Immediate Next Slice

Audit the current MFA account-security implementation against the remaining
open product-gap tasks, then implement the next highest-leverage user-visible
flow gap directly in code and UAT.

## 2026-06-16 Progress

- Implemented durable MFA recovery-request state in the account-security read
  model. Recovery is no longer browser-local-only after refresh.
- Extended the v3 account MFA recovery contract with persistent
  `request_id` and `submitted_at` fields.
- Updated the account-security MFA UI to respect persisted recovery state and
  keep removal/recovery actions disabled once a request is already open.
- Tightened the MFA user surface so it no longer claims a fake manage flow:
  existing-factor posture now renders as `MFA active`,
  `Registered authenticators`, and `Recover or reset access`, with add-backup
  and recovery actions instead of misleading `manage/remove` wording.
- Added a backend integration regression for
  `POST /api/v1/account/security/mfa/recovery-requests` to prove the audited
  recovery request path works with real project scope and audit-log writes.
- Validation passed:
  - `make codegen`
  - `GOCACHE=/tmp/gpuaas-go-build go test ./cmd/api -run 'TestV3AccountMFA'`
  - `GOCACHE=/tmp/gpuaas-go-build go test -tags integration ./cmd/api -run TestV3AccountMFARecoveryRequestIntegration -count=1`
  - `pnpm --dir packages/web test -- v3-account-security-sessions.test.tsx`
  - `pnpm --dir packages/web typecheck`
- Focused browser e2e is currently blocked by the local Playwright/macOS
  launch issue:
  `mach_port_rendezvous ... Permission denied (1100)`.
  Treat this as a harness/environment blocker, not as an MFA product failure.
- The remaining MFA recovery issue seen from kind appears environment-specific:
  the local handler and integration path are green, so the next step is to
  inspect kind runtime/config state rather than rework the basic recovery API
  contract again.

## 2026-06-17 MFA product-closeout update

- Support-assisted reset/removal request intake is now a committed product
  contract:
  - task: `IAM-MFA-FACTOR-RESET-MUTATION-CONTRACT-001`
  - commit: `fae27322` (`feat: add MFA factor reset request contract`)
  - route: `POST /api/v1/platform/iam/mfa/factor-resets`
  - request must include audited custody anchors (`ticket_id` or
    `evidence_ref`)
  - response is fail-closed `execution_state=packet_required`
  - this does not mutate Keycloak/provider state; it only closes the operator
    request-intake gap.
- Account Security MFA primary copy is now action-first:
  - task: `PRODUCT-GAP-IAM-MFA-WORKFLOW-FIRST-HELP-AFFORDANCE-001`
  - commit: `8cc625f5` (`feat: make MFA account help workflow-first`)
  - setup/manage/recovery help now stays in secondary affordances instead of
    primary paragraph copy
  - focused evidence is recorded at:
    `/Users/subash/dev/GPUasService/.fairway/artifacts/mfa-workflow-first-help-affordance-20260617/validation-summary.md`
- Remaining MFA product gap is no longer user-surface ambiguity. It is
  operator-side fulfillment proof:
  - exact authorization decision artifact:
    `/Users/subash/dev/GPUasService/.fairway/artifacts/iam-mfa-factor-fulfillment-packet-20260617/nonlive_factor_fulfillment_proof_authorization_decision_82bdc2f1.md`
  - blocking task: `PRODUCT-GAP-IAM-MFA-FACTOR-FULFILLMENT-FLOW-001`
  - packet-only next boundary:
    `/Users/subash/dev/GPUasService/.fairway/artifacts/iam-mfa-factor-fulfillment-packet-20260617/factor-fulfillment-execution-packet.md`
  - required next decision is explicit authorization for exactly one bounded
    non-live fulfillment proof run with before/after factor readback, audit,
    notification, privileged-human/last-factor guard proof, and redaction
    custody gates
  - no safe further MFA implementation slice is ready until that execution
    boundary is approved or re-scoped
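
For orientation, the request-intake rule above (custody anchor required, fail-closed `packet_required` response, no provider mutation) can be sketched as follows. This is a minimal illustration, not the committed handler: `target_user`, the `custody_anchor_required` error code, and the function names are hypothetical; only `ticket_id`, `evidence_ref`, and `execution_state=packet_required` come from the notes.

```typescript
// Hypothetical request shape. Only ticket_id, evidence_ref, and
// execution_state=packet_required are taken from the notes above.
interface FactorResetRequest {
  target_user: string; // hypothetical field name
  ticket_id?: string;
  evidence_ref?: string;
}

type FactorResetResult =
  | { ok: true; execution_state: "packet_required" }
  | { ok: false; error: "custody_anchor_required" }; // hypothetical error code

function hasText(value: string | undefined): boolean {
  return typeof value === "string" && value.trim().length > 0;
}

// Accepts the request only when at least one custody anchor is present.
// Even an accepted request stays at packet_required: intake records the
// request and never mutates provider state.
function intakeFactorReset(req: FactorResetRequest): FactorResetResult {
  if (!hasText(req.ticket_id) && !hasText(req.evidence_ref)) {
    return { ok: false, error: "custody_anchor_required" };
  }
  return { ok: true, execution_state: "packet_required" };
}
```

The point of the shape is that there is no "executed" variant in the result type, so a caller cannot mistake intake for fulfillment.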

### Current MFA account-flow slice

- Working branch HEAD before the current uncommitted slice:
  `0a8b16098b218712a9c18d5895fb52e27deab9be`
  (`iam: tighten MFA account lifecycle flows`), already deployed to kind.
- New uncommitted slice narrows the remaining product gap:
  - account MFA recovery request is now account-scoped instead of
    project-scoped;
  - provider-backed accounts with an existing factor now surface
    `Add backup` instead of misleading `Add authenticator`;
  - existing-factor rows expose explicit recovery-based removal actions;
  - the account page auto-refreshes once when returning from
    `?mfa=setup_complete`.
- Files changed in this slice:
  - `/Users/subash/dev/GPUasService/cmd/api/routes_v3_account_mfa.go`
  - `/Users/subash/dev/GPUasService/cmd/api/routes_v3_readmodels.go`
  - `/Users/subash/dev/GPUasService/cmd/api/routes_v3_readmodels_test.go`
  - `/Users/subash/dev/GPUasService/cmd/api/routes_integration_test.go`
  - `/Users/subash/dev/GPUasService/packages/web/src/components/v3/v3-account-subpages.tsx`
  - `/Users/subash/dev/GPUasService/packages/web/src/components/v3/v3-account-security-sessions.test.tsx`
  - `/Users/subash/dev/GPUasService/packages/web/e2e/v3-storage-access-account.spec.ts`
- Focused validation already passed:
  - `pnpm --dir /Users/subash/dev/GPUasService/packages/web test -- v3-account-security-sessions.test.tsx`
  - `pnpm --dir /Users/subash/dev/GPUasService/packages/web typecheck`
  - `GOCACHE=/tmp/gpuaas-go-build go test ./cmd/api -run 'TestV3AccountMFA|TestV3AccountMFARecoveryRequestIntegrationWithoutProjectContext' -count=1`
  - `GOCACHE=/tmp/gpuaas-go-build go test -tags integration ./cmd/api -run 'TestV3AccountMFARecoveryRequestIntegration|TestV3AccountMFARecoveryRequestIntegrationWithoutProjectContext' -count=1`
  - `git diff --check`
- tmux execution lane for browser validation:
  - session `gpuaas-git`
  - focused e2e rerun window `mfa-e2e-rerun-0616`
- Important current finding:
  - the first focused e2e failure was a stale spec expectation
    (`No authenticator change was saved`) after the UI copy had already been
    tightened to `No change was saved`.
  - The spec has been corrected and is being rerun in tmux under `CI=1`.

### Immediate next action

- Wait for `gpuaas-git:mfa-e2e-rerun-0616` to finish.
- If green:
  - commit the current MFA account-flow slice;
  - redeploy to kind;
  - verify the live kind user flow again.
- If still red:
  - treat it as a real product/UAT bug and fix before commit/deploy.

- Account-flow slice committed as `148ef67974582fa0314510f813c2d241853f4202` (`iam: align MFA recovery with account-managed flows`).
- Kind fast deploy completed for `gpuaas-api` and `gpuaas-web`; both deployments now advertise git-sha `148ef67974582fa0314510f813c2d241853f4202`.

### 2026-06-16 late MFA user-flow follow-up

- Re-read the live MFA harness and live product spec to correct the earlier
  diagnosis: an MFA-enrolled identity cannot be validated through the
  no-factor password-grant path alone. Live product UAT must either:
  - use a no-factor persona such as `dev-platform-admin` for the setup/readback
    posture slice, or
  - use a real browser login plus `E2E_V3_LIVE_MFA_TOTP_SECRET` for the
    enrolled-factor slice.
- Verified current backend/user-flow slice still targets the product gap, not
  a governance-only path:
  - account-scoped MFA recovery request handler,
  - provider factor readback-backed `Set up MFA` vs `Add backup` action href,
  - support-assisted removal state in the Account Security page.
- Focused validation rerun passed:
  - `GOCACHE=/tmp/gpuaas-go-build go test ./cmd/api -run 'TestV3AccountMFA|TestV3AccountMFARecoveryRequestIntegrationWithoutProjectContext' -count=1`
  - `pnpm --dir packages/web typecheck`
  - `CI=true E2E_SPEC='v3-account-mfa.spec.ts' bash scripts/ci/frontend_e2e.sh full`

## 2026-06-17 Progress

- Resumed from this memory file after context compaction.
- Closed `PRODUCT-GAP-IAM-MFA-USER-FACING-BRANDING-SCAN-001` with focused
  auth-callback copy cleanup, web tests, typecheck, scoped scan, and Fairway
  evidence.
- Claimed and implemented `PRODUCT-GAP-IAM-MFA-SENSITIVE-STEPUP-RUNTIME-001`.
  Platform role bind/revoke API mutations now fail closed with
  `step_up_required` before any database mutation when no fresh MFA step-up
  grant exists.
- Added canonical `step_up_required` error-code contract and regenerated
  OpenAPI Go/TypeScript artifacts.
- Focused validation passed:
  - `GOCACHE=/tmp/gpuaas-go-build go test ./cmd/api -run 'TestAdminPlatformRoleMutationsRequireMFAStepUp|TestAdminRoutesRejectNonAdminClaims' -count=1`
  - `GOCACHE=/tmp/gpuaas-go-build go test ./cmd/api -run 'TestErrorResponseConformance|TestAdminPlatformRoleMutationsRequireMFAStepUp' -count=1`
  - `bash scripts/ci/sdk_codegen_smoke.sh`
  - scoped `git diff --check`
- `CODEGEN_ENFORCE_CLEAN=1 bash scripts/ci/sdk_codegen_smoke.sh` regenerated
  successfully but failed the clean check because the shared worktree has
  intentional generated diffs from the current contract/account-flow slices.
  Treat this as commit-boundary/worktree cleanliness, not a codegen failure.
- Closed the remaining MFA product-readiness task group and committed the
  scoped slice as `473eebdb72ee906a29f59280b5d97b0f9791ac70`
  (`iam: close MFA product readiness gaps`).
- Promoted `473eebdb72ee906a29f59280b5d97b0f9791ac70` to
  `release/platform-control` from a clean detached worktree at
  `/tmp/gpuaas-ci-473eebdb`.
- The branch push did not auto-create a GitLab pipeline for this SHA, so an
  explicit validation pipeline was triggered:
  - pipeline: `2688`
  - ref: `release/platform-control`
  - sha: `473eebdb72ee906a29f59280b5d97b0f9791ac70`
  - mode/profile: `PLATFORM_CONTROL_RELEASE_MODE=fast`,
    `PLATFORM_CONTROL_RELEASE_PROFILE=standard`
  - monitor artifact root:
    `/Users/subash/dev/GPUasService/.fairway/artifacts/local-ci-monitor-2688-mfa-product-readiness-473eebdb`
- Do not deploy dev until pipeline `2688` passes for this exact SHA.
- Pipeline `2688` exposed a sequencing miss: the release branch was promoted
  before `473eebdb72ee906a29f59280b5d97b0f9791ac70` was reachable from
  `origin/master`, so `platform_control_release_branch_guard` failed. Remote
  `master` was then pushed to `origin`; the local tracking-ref update failed
  only due to the known Desktop sandbox `.git` write limitation.
- Remote readback now confirms both `origin/master` and
  `origin/release/platform-control` point at
  `473eebdb72ee906a29f59280b5d97b0f9791ac70`.
- Fresh validation pipeline `2690` was triggered for the same SHA and the
  release-branch guard passed.
- Continue with pipeline `2690`, not `2688`. Monitor artifact root:
  `/Users/subash/dev/GPUasService/.fairway/artifacts/local-ci-monitor-2690-mfa-product-readiness-473eebdb`.
- Pipeline `2690` passed for `473eebdb72ee906a29f59280b5d97b0f9791ac70`
  with `failed_jobs=0`.
- Dev deploy pipeline `2691` was triggered for the same SHA using
  `PLATFORM_CONTROL_RELEASE_MODE=deploy` and
  `PLATFORM_CONTROL_RELEASE_PROFILE=dev-control-rke2`.
- Monitor artifact root:
  `/Users/subash/dev/GPUasService/.fairway/artifacts/local-ci-monitor-2691-dev-deploy-473eebdb`.
- Pipeline `2691` failed in
  `platform_control_publish_release_artifacts` after resolving
  `dev-control-rke2`; the blocker was SSH auth to
  `hpcadmin@100.90.157.34`, not product code. This indicates the GitLab
  dev-control SSH credential is missing/stale for this profile.
- Retry pipeline `2692` was triggered for the same SHA and same deploy profile,
  explicitly passing the documented local
  `DEV_CONTROL_RKE2_SSH_PRIVATE_KEY_B64` value from
  `/Users/subash/.ssh/gpuaas-dev-control-rke2-cd`.
- Monitor artifact root:
  `/Users/subash/dev/GPUasService/.fairway/artifacts/local-ci-monitor-2692-dev-deploy-473eebdb`.
- Do not claim dev deployment complete until pipeline `2692` passes and dev
  endpoint/readback/UAT evidence confirms the deployed version is this same
  SHA.
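
The sequencing lesson from pipelines `2688` through `2692` reduces to three checks against one SHA. The sketch below is illustrative only and assumes the facts have already been read back from git and GitLab; `ReleaseFacts` and `deployBlockers` are hypothetical names, not part of the release tooling.

```typescript
// Hypothetical snapshot of readback facts for one candidate SHA.
interface ReleaseFacts {
  candidateSha: string;
  reachableFromMaster: boolean; // candidate is reachable from origin/master
  releaseBranchSha: string;     // origin/release/platform-control readback
  greenPipelineShas: string[];  // SHAs with a passing validation pipeline
}

// Returns every reason the candidate must not be deployed yet.
// An empty list means the same SHA cleared all three gates.
function deployBlockers(facts: ReleaseFacts): string[] {
  const blockers: string[] = [];
  if (!facts.reachableFromMaster) {
    blockers.push("SHA is not reachable from origin/master");
  }
  if (facts.releaseBranchSha !== facts.candidateSha) {
    blockers.push("release/platform-control does not point at the SHA");
  }
  if (!facts.greenPipelineShas.includes(facts.candidateSha)) {
    blockers.push("no green validation pipeline for this exact SHA");
  }
  return blockers;
}
```

Collecting all blockers instead of returning on the first one keeps the readback evidence complete when more than one gate is open.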

### Immediate next action after sensitive-stepup closure

- Fairway evidence has been recorded and
  `PRODUCT-GAP-IAM-MFA-SENSITIVE-STEPUP-RUNTIME-001` is closed.
- Continue MFA before portal/staging/demo:
  1. `PRODUCT-GAP-IAM-MFA-SUPERADMIN-PHISHING-RESISTANT-ENV-001`
  2. `PRODUCT-GAP-IAM-MFA-ADMIN-BREAKGLASS-POLICY-FLOW-001`
  3. focused MFA UAT/CI/deploy closeout for the current commit set.
- `PRODUCT-GAP-IAM-MFA-FACTOR-REMOVE-DISABLE-FLOW-001`:
  - active checkpoint recorded: focused gates green, deploy next;
  - evidence recorded against
    `/Users/subash/dev/GPUasService/.fairway/artifacts/mfa-factor-remove-disable-flow-20260616/validation-summary.md`.
- Next action from this point:
  - deploy matching `api+web` current slice to kind;
  - verify the live no-factor and existing-factor MFA account journeys against
    the deployed build;
  - if live still diverges, treat it as deploy/runtime/config drift before
    expanding implementation scope again.
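
As a reference for the closed sensitive-stepup task, the fail-closed ordering (check for a fresh step-up grant first, mutate only afterwards) can be sketched like this. It is an assumption-laden illustration: the grant shape, the five-minute freshness window, and `bindPlatformRole` are hypothetical; only the `step_up_required` code and the "before any database mutation" ordering come from the notes.

```typescript
// Hypothetical grant shape; the real grant contract is not described here.
interface StepUpGrant {
  grantedAtMs: number;
}

// Assumed freshness window for illustration only.
const STEP_UP_MAX_AGE_MS = 5 * 60 * 1000;

type MutationResult =
  | { status: "applied" }
  | { status: "rejected"; code: "step_up_required" };

// Runs the mutation only when a fresh grant exists. A missing, stale, or
// future-dated grant is rejected before the mutation callback is invoked.
function bindPlatformRole(
  grant: StepUpGrant | undefined,
  nowMs: number,
  mutate: () => void,
): MutationResult {
  const fresh =
    grant !== undefined &&
    nowMs >= grant.grantedAtMs &&
    nowMs - grant.grantedAtMs <= STEP_UP_MAX_AGE_MS;
  if (!fresh) {
    return { status: "rejected", code: "step_up_required" };
  }
  mutate();
  return { status: "applied" };
}
```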

### Deferred follow-up captured

- Added deferred queue item:
  `PRODUCT-GAP-IAM-MFA-WORKFLOW-FIRST-HELP-AFFORDANCE-001`
- Purpose:
  convert documentation-like MFA page prose into workflow-first help affordances
  (`?`, tooltip, drawer, collapsed technical details) so the primary user
  surface stays action-first.
- Scope:
  start with Account Security MFA, then reuse the same pattern for other
  user/admin surfaces later instead of leaving it as an MFA-only cleanup note.

### Working lane reminder

- Use tmux session `gpuaas-git` for git, deploy, CI, and other sandbox-sensitive
  command work by default. Do not let desktop sandbox `index.lock` failures
  stall reviewed work that can safely execute in the trusted tmux lane.

### Current continuation slice: recovery confirmation and stale-state refresh

- Current active Fairway task remains:
  `PRODUCT-GAP-IAM-MFA-FACTOR-RECOVERY-FLOW-001`
- New uncommitted continuation narrows the signed-in account MFA gap further:
  - stale `provider_unqueried` / `provider_pending` MFA cards auto-refresh once
    after load and after `?mfa=setup_complete`;
  - recovery and removal no longer submit immediately on first click;
  - account-security now requires an explicit confirmation click before
    creating an MFA recovery request;
  - this makes recovery/removal behavior honest for the product-owner view and
    avoids silent mutation on exploratory clicks.
- Files changed in this continuation:
  - `/Users/subash/dev/GPUasService/packages/web/src/components/v3/v3-account-subpages.tsx`
  - `/Users/subash/dev/GPUasService/packages/web/src/components/v3/v3-account-security-sessions.test.tsx`
- Focused validation passed:
  - `GOCACHE=/tmp/gpuaas-go-build go test ./cmd/api -run 'TestV3AccountMFA|TestV3AccountMFARecoveryRequestIntegrationWithoutProjectContext' -count=1`
  - `git diff --check -- packages/web/src/components/v3/v3-account-subpages.tsx packages/web/src/components/v3/v3-account-security-sessions.test.tsx`
- Focused MFA web tests are green, but the shared account-security test file is
  still red from an unrelated SSH-key expectation:
  - `queries active SSH keys by default and exposes revoked keys through filters`
  - treat that as separate account-security debt unless the next live rerun
    proves the two surfaces now interfere.
- Provider/runtime truth confirmed directly from kind Keycloak:
  - user `dev-admin` currently has a TOTP factor registered;
  - credential inventory includes an OTP credential labeled `subash iphone`;
  - remaining user-facing issue is therefore readback/return/manage semantics,
    not absence of a factor at the provider.
- Next action:
  - record this continuation in Fairway;
  - redeploy current account MFA web slice to kind;
  - manually verify signed-in flows: existing-factor state, refresh behavior,
    recovery submit boundary, and the remaining manage/remove gap.
- The signed-in MFA recovery request bug on kind was traced to the owning API
  layer, not the browser:
  `POST /api/v1/account/security/mfa/recovery-requests` still tried to resolve
  project scope whenever the browser sent `X-Project-ID`, even though recovery
  is an account-scoped action.
- The recovery handler now resolves the internal account subject directly and
  ignores ambient project context.
- Focused validation passed:
  - `GOCACHE=/tmp/gpuaas-go-build go test ./cmd/api -run 'TestV3AccountMFAPosture|TestV3AccountSecurityCacheDoesNotReplayMFASessionPosture' -count=1`
  - `GOCACHE=/tmp/gpuaas-go-build go test -tags integration ./cmd/api -run 'TestV3AccountMFARecoveryRequestIntegration|TestV3AccountMFARecoveryRequestIntegrationResolvesOidcSubjectThroughProjectScope|TestV3AccountMFARecoveryRequestIntegrationWithoutProjectContext' -count=1`
  - `git diff --check -- cmd/api/routes_v3_account_mfa.go cmd/api/routes_integration_test.go`
- Kind API was redeployed again with:
  - `BUILDX_CONFIG=/tmp/docker-buildx DOCKER_CONFIG=/tmp/docker-config bash scripts/ops/kind_fast_deploy.sh api`
  - rollout result: success

### Immediate next action

- Reconcile the UAT/product-flow coverage with the actual MFA factor lifecycle:
  refresh the live/mock MFA UAT expectations so they catch signed-in recovery,
  existing-factor posture, support-assisted removal, and provider-return states.
- Then continue the next user-visible gap from the same account-security slice
  instead of broadening into unrelated docs/deploy work.

## 2026-06-16 MFA account-flow rerun

- Focused MFA validation rerun passed again on the current worktree:
  - `GOCACHE=/tmp/gpuaas-go-build go test ./cmd/api -run 'TestV3AccountMFA|TestV3AccountMFARecoveryRequestIntegrationWithoutProjectContext' -count=1`
  - `GOCACHE=/tmp/gpuaas-go-build go test -tags integration ./cmd/api -run 'TestV3AccountMFARecoveryRequestIntegration|TestV3AccountMFARecoveryRequestIntegrationWithoutProjectContext|TestV3AccountMFARecoveryRequestIntegrationResolvesOidcSubjectThroughProjectScope' -count=1`
  - `pnpm --dir packages/web exec vitest run src/components/v3/v3-account-security-sessions.test.tsx src/shared/auth/components/AuthCallbackClient.test.tsx src/shared/auth/session/session.test.ts`
  - `pnpm --dir packages/web typecheck`
  - `bash scripts/ci/v3_namespace_retirement_guard.sh`
  - `bash scripts/ci/route_structure_guard.sh`
- Dirty-worktree kind redeploy completed again for `api` and `web`.
- `kubectl -n gpuaas-core get deploy gpuaas-api gpuaas-web -o wide` shows both
  available at `1/1`.
- Important caveat: kind deploy stamping still shows commit
  `148ef67974582fa0314510f813c2d241853f4202`, but the running images include
  newer uncommitted local MFA account-flow changes. Do not treat this as a
  durable release SHA until the slice is committed.
- Current next step: live kind browser re-verification of the account security
  user flow, especially recovery submission and provider return behavior, before
  cutting the next durable commit.
- Focused browser validation passed in tmux using `CI=1 E2E_SPEC=packages/web/e2e/v3-storage-access-account.spec.ts bash scripts/ci/frontend_e2e.sh`.
- Dev release pipelines succeeded for the same SHA:
  - api-fast pipeline `2680`
  - web-fast pipeline `2681`
- Current MFA account-flow state after this slice: setup, setup-return, add-backup, recovery-request persistence, and recovery-based remove entrypoints are aligned between backend read model and account UI.
- Later dev deploy attempt for `4dc64e41dffd3fc53b99d35ca113c2ca14464794`
  triggered GitLab pipeline `2682`, which failed in `build_test`.
  Deterministic monitor evidence:
  `.fairway/artifacts/local-ci-monitor-2682-mfa-dev/summary.md`.
- Root cause for `2682`: the deployed/pushed SHA was a partial slice. It
  included MFA pages that import OIDC flow/session helpers, but not the session
  helper exports; backend failures were route registration 404s fixed by local
  uncommitted route wiring.
- Correct operating rule from `doc/governance/Multi_Agent_Lane_Worktrees_v1.md`
  and the dev runbook: commit to `master`, promote `release/platform-control`
  to exactly that committed `master` SHA, let CI run for that exact SHA, and
  deploy only the same SHA after CI passes. Do not deploy from a SHA that has
  no green pipeline evidence.

## 2026-06-16 live kind proof: normal-user setup and recovery

- Durable live proof now exists for a normal-user MFA journey on kind using
  the real browser path and provider flow.
- Verified end-to-end setup:
  - signed in through the real browser path;
  - completed provider TOTP setup for `dev-user`;
  - returned to `/account/security?mfa=setup_complete`;
  - account page showed protected state and a registered authenticator row.
- Verified signed-in post-setup account flow:
  - `/account/security` now shows `Add backup` instead of the old generic
    setup/manage wording for a protected normal user;
  - top action targets the backup flow:
    `/auth/mfa/setup?intent=backup&next=/account/security`;
  - that link lands on the provider setup screen directly, not the previously
    hanging provider account-console path.
- Verified signed-in recovery request flow:
  - `Start recovery` submission returned `202 Accepted`;
  - response carried `status=submitted`, `support_required=true`,
    `approval_required=false`;
  - recovery panel rendered submitted state for the user.
- Durable artifacts:
  - `/Users/subash/dev/GPUasService/.fairway/artifacts/mfa-factor-recovery-flow-20260616/live-kind-dev-user-setup-recovery-summary.md`
  - `/Users/subash/dev/GPUasService/.fairway/artifacts/mfa-factor-recovery-flow-20260616/kind-dev-user-setup-proof.tmux.txt`
  - `/Users/subash/dev/GPUasService/.fairway/artifacts/mfa-factor-recovery-flow-20260616/kind-dev-user-recovery-proof.tmux.txt`
  - `/Users/subash/dev/GPUasService/.fairway/artifacts/mfa-factor-recovery-flow-20260616/kind-dev-user-post-setup-proof.json`
  - `/Users/subash/dev/GPUasService/.fairway/artifacts/mfa-factor-recovery-flow-20260616/kind-dev-user-setup-proof.png`
  - `/Users/subash/dev/GPUasService/.fairway/artifacts/mfa-factor-recovery-flow-20260616/kind-dev-user-post-setup-proof.png`
- Remaining narrowed product gaps after this live proof:
  - `CURRENT SESSION` still reads unknown; the session-assurance contract is
    not complete from a product point of view;
  - removal still needs a clean-factor UAT proof after recovery state is
    already submitted for `dev-user`;
  - privileged/admin lifecycle proof still needs its own product-owned UAT.

## 2026-06-16 live MFA UAT harness correction

- The focused live MFA Playwright spec is now aligned with the actual product
  contract:
  - normal-user MFA verification uses real OIDC + TOTP instead of password
    grant;
  - platform IAM readiness is checked separately with platform-admin creds
    instead of assuming the normal user can access admin-only surfaces.
- Changed file:
  - `/Users/subash/dev/GPUasService/packages/web/e2e/v3-live-mfa-product.spec.ts`
- Durable summary:
  - `/Users/subash/dev/GPUasService/.fairway/artifacts/mfa-factor-recovery-flow-20260616/live-kind-mfa-uat-harness-summary.md`
- Focused live validation passed:
  - `CI=1 E2E_V3_LIVE=1 E2E_BASE_URL=https://aicloud-kind-app.core42.dev E2E_API_BASE_URL=https://aicloud-kind-api.core42.dev E2E_KEYCLOAK_BASE_URL=https://aicloud-kind-auth.core42.dev E2E_V3_LIVE_USERNAME=dev-user E2E_V3_LIVE_PASSWORD=dev123 E2E_V3_LIVE_MFA_TOTP_SECRET=<secret> E2E_V3_PLATFORM_LIVE_USERNAME=dev-platform-admin E2E_V3_PLATFORM_LIVE_PASSWORD=platform123 pnpm --dir packages/web exec playwright test e2e/v3-live-mfa-product.spec.ts`
  - result: `2 passed`
- Immediate remaining product gap after the harness fix:
  - `CURRENT SESSION` still remains unknown after successful MFA login, so the
    product session-assurance contract is still unfinished even though setup
    and recovery now have live proof.

## 2026-06-16 V3 API Sweep Closeout

- Scoped active-path sweep completed for `/api/v1/v3/*` retirement in:
  - `/Users/subash/dev/GPUasService/cmd/api`
  - `/Users/subash/dev/GPUasService/packages/web/src/lib/v3/api.ts`
  - `/Users/subash/dev/GPUasService/packages/web/e2e`
  - `/Users/subash/dev/GPUasService/scripts/ops`
  - `/Users/subash/dev/GPUasService/scripts/ci`
  - `/Users/subash/dev/GPUasService/doc/api/openapi/domains/v3-read-models.yaml`
- Canonical active routes are now `/api/v1/*`. Remaining `/api/v1/v3/*` references are intentionally localized to compatibility registration, compatibility detection, test expectations, e2e mock translation, and CI guard logic.
- Important contract caveat discovered and fixed:
  - `doc/api/openapi/manifest.yaml` currently uses `doc/api/openapi.draft.yaml` as both `root_file` and `canonical_file`.
  - After moving read-model fragments to canonical `/api/v1/*`, the bundled artifact still retained legacy `/api/v1/v3/*` paths from the root document.
  - This created duplicate OpenAPI operation IDs and broke both Go and TypeScript codegen consumers.
  - Fix applied for now: all legacy compatibility paths in `doc/api/openapi.draft.yaml` use unique `Compat` operation IDs, and the three retained legacy fragment aliases use matching `Compat` operation IDs.
- Validation passed after the sweep:
  - `make codegen`
  - `bash scripts/ci/route_structure_guard.sh`
  - `bash scripts/ci/v3_namespace_retirement_guard.sh`
  - `GOCACHE=/tmp/gpuaas-go-build go test ./cmd/api -run TestCanonicalV3RouteAlias -count=1`
  - `pnpm --dir packages/web typecheck`
  - `git diff --check`
- Intentional remaining `/api/v1/v3/*` residue after the sweep is limited to:
  - `/Users/subash/dev/GPUasService/cmd/api/routes_v3_readmodels.go`
  - `/Users/subash/dev/GPUasService/cmd/api/routes_v3_readmodels_domains.go`
  - `/Users/subash/dev/GPUasService/cmd/api/routes_v3_readmodels_test.go`
  - `/Users/subash/dev/GPUasService/packages/web/e2e/v3/mock-backend.ts`
  - `/Users/subash/dev/GPUasService/scripts/ci/route_structure_guard.sh`
  - `/Users/subash/dev/GPUasService/scripts/ci/v3_namespace_retirement_guard.sh`
  - `/Users/subash/dev/GPUasService/doc/api/openapi/domains/v3-read-models.yaml`
- Follow-up backlog that should not block MFA flow completion:
  - replace the self-referential OpenAPI `root_file` model with a clean base-plus-fragments bundle model;
  - sweep non-path namespace residue later (`V3*` symbol names, `lib/v3`, `components/v3`, legacy file names) without conflating that with active API path retirement;
  - keep the four core-owned allocation compatibility routes (`access-grants`, `grant revoke`, `allocation ssh-keys`) localized until the core route/contract ownership cleanup is done.
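
The duplicate operation ID failure described above is cheap to detect before codegen runs. The following is a minimal sketch of such a check over an already-parsed OpenAPI `paths` object; it is not one of the repo's guard scripts, and the function and type names are hypothetical.

```typescript
// Minimal view of an OpenAPI document: path -> HTTP method -> operation.
type OperationMap = Record<string, { operationId?: string }>;
type Paths = Record<string, OperationMap>;

// Maps each operationId that appears more than once to the
// "METHOD path" locations that declare it.
function findDuplicateOperationIds(paths: Paths): Map<string, string[]> {
  const seen = new Map<string, string[]>();
  for (const [path, operations] of Object.entries(paths)) {
    for (const [method, operation] of Object.entries(operations)) {
      if (!operation.operationId) continue;
      const locations = seen.get(operation.operationId) ?? [];
      locations.push(`${method.toUpperCase()} ${path}`);
      seen.set(operation.operationId, locations);
    }
  }
  const duplicates = new Map<string, string[]>();
  seen.forEach((locations, id) => {
    if (locations.length > 1) duplicates.set(id, locations);
  });
  return duplicates;
}
```

With a legacy `/api/v1/v3/*` path and its canonical `/api/v1/*` twin sharing one ID, the check reports both locations; giving the legacy path a distinct `Compat` ID, as in the fix above, clears it.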

## 2026-06-16 MFA Existing-Factor Flow Follow-On

- Active Fairway task:
  - `PRODUCT-GAP-IAM-MFA-FACTOR-MANAGE-FLOW-001`
- Current local slice updates:
  - account MFA UI now treats existing-factor state as `Add backup` plus
    recovery/remove, not generic `Manage MFA`;
  - `Manage MFA` label is normalized back to product-owned `Add backup`;
  - the registered-authenticator panel now exposes explicit `Add backup`
    and `Remove via recovery` actions for existing-factor state;
  - lifecycle wording was tightened from `Recover or reset access` to
    `Recover access`;
  - the flow-coverage doc now records the product decision as explicit
    no-generic-manage support unless a real manage surface exists.
- Files changed in this follow-on slice:
  - `/Users/subash/dev/GPUasService/packages/web/src/components/v3/v3-account-subpages.tsx`
  - `/Users/subash/dev/GPUasService/packages/web/src/components/v3/v3-account-security-sessions.test.tsx`
  - `/Users/subash/dev/GPUasService/packages/web/e2e/v3-storage-access-account.spec.ts`
  - `/Users/subash/dev/GPUasService/doc/operations/MFA_User_Factor_Setup_Manage_Flow_Coverage_v1.md`
- Validation readback for this follow-on slice:
  - `git diff --check -- <four files>`: pass
  - `pnpm --dir /Users/subash/dev/GPUasService/packages/web typecheck`: pass
  - `pnpm --dir /Users/subash/dev/GPUasService/packages/web vitest run src/components/v3/v3-account-security-sessions.test.tsx`:
    blocked by local optional dependency issue
    (`Cannot find module @rollup/rollup-darwin-arm64`)
  - `pnpm --dir /Users/subash/dev/GPUasService/packages/web playwright test e2e/v3-storage-access-account.spec.ts --project=chromium`:
    blocked by local Playwright/Chromium launch failure
    (`mach_port_rendezvous ... Permission denied (1100)`)
- Live kind runtime checks completed from CLI:
  - kind context is `kind-gpuaas-local-parity`;
  - deployed API pod has `KEYCLOAK_ISSUER_URL`, `KEYCLOAK_PUBLIC_ISSUER_URL`,
    `KEYCLOAK_ADMIN_USERNAME`, and `KEYCLOAK_ADMIN_PASSWORD` set;
  - Keycloak admin readback proves `dev-admin` currently has an OTP
    credential labeled `subash iphone`;
  - direct API check for `dev-user` with project header proves provider
    factor readback works in kind (`factor_evidence_source=provider`,
    `posture_source=provider`);
  - direct API POST for `dev-user` proves MFA recovery request creation works
    in kind when project context is supplied.
- Current narrowed gap after these checks:
  - the remaining user-observed mismatch is now the deployed admin/session
    UAT path, not the basic backend contract for provider readback or
    recovery-request creation.

## 2026-06-16 MFA Remove Disable Recovery UI Slice

- Active Fairway task:
  - `PRODUCT-GAP-IAM-MFA-FACTOR-REMOVE-DISABLE-FLOW-001`
- Product/UI changes in this slice:
  - account MFA now separates `Add backup`, per-factor `Request removal`,
    and `Start recovery` instead of treating them as one generic manage flow;
  - per-factor remove actions now use recovery reason
    `remove_registered_factor`;
  - lost-phone/app-upgrade recovery keeps using
    `lost_or_changed_authenticator`;
  - recovery/remove failures now render an inline panel with correlation-safe
    error text instead of relying only on toast state;
  - provider-backed remove posture now renders as explicit
    `Support-assisted removal` / `Recovery required` status, not generic
    manage wording.
- Files changed in this slice:
  - `/Users/subash/dev/GPUasService/packages/web/src/components/v3/v3-account-subpages.tsx`
  - `/Users/subash/dev/GPUasService/packages/web/src/components/v3/v3-account-security-sessions.test.tsx`
  - `/Users/subash/dev/GPUasService/packages/web/e2e/v3-storage-access-account.spec.ts`
  - `/Users/subash/dev/GPUasService/doc/operations/MFA_User_Factor_Setup_Manage_Flow_Coverage_v1.md`
  - `/Users/subash/dev/GPUasService/doc/operations/MFA_Factor_Lifecycle_UAT_Coverage_v1.md`
- Validation readback for this slice:
  - `git diff --check -- <mfa ui files>`: pass
  - `pnpm --dir /Users/subash/dev/GPUasService/packages/web typecheck`: pass
  - kind web deploy re-run was started for this slice after typecheck using
    `BUILDX_CONFIG=/tmp/docker-buildx DOCKER_CONFIG=/tmp/docker-config`
    to avoid Desktop sandbox writes under `~/.docker/buildx`.
- Remaining gap after this slice:
  - support/admin queue fulfillment, notification proof, and actual
    self-service delete/disable are still incomplete, so this closes the
    honest product surface for request submission but not the full remove
    lifecycle.
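
The split between per-factor removal and lost-device recovery comes down to which recovery reason the UI submits. A minimal sketch, assuming a discriminated action type on the client; `AccountMfaAction`, `factorId`, and `recoveryReasonFor` are hypothetical names, while the two reason strings are the ones recorded above.

```typescript
type RecoveryReason =
  | "remove_registered_factor"
  | "lost_or_changed_authenticator";

// Hypothetical client-side action model for the two request entry points.
type AccountMfaAction =
  | { kind: "request_removal"; factorId: string }
  | { kind: "start_recovery" };

// Per-factor removal and lost-device recovery share the recovery request
// path but must stay distinguishable by reason.
function recoveryReasonFor(action: AccountMfaAction): RecoveryReason {
  switch (action.kind) {
    case "request_removal":
      return "remove_registered_factor";
    case "start_recovery":
      return "lost_or_changed_authenticator";
  }
}
```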

## 2026-06-16 MFA Recovery Entry And Backup-Setup Slice

- Active Fairway task:
  - `PRODUCT-GAP-IAM-MFA-FACTOR-RECOVERY-FLOW-001`
- Evidence recorded:
  - `/Users/subash/dev/GPUasService/.fairway/artifacts/mfa-factor-recovery-flow-20260616/validation-summary.md`
- Product/backend changes in this slice:
  - provider-backed MFA action hrefs now distinguish first-factor setup versus
    backup-factor enrollment:
    - `/auth/mfa/setup?intent=setup&next=/account/security`
    - `/auth/mfa/setup?intent=backup&next=/account/security`
  - sign-in now exposes a product-owned locked-out entry:
    `/auth/mfa/recovery`
  - `/auth/mfa/recovery` now exists as an AI Cloud-owned recovery entry point
    for signed-in and signed-out users without dropping them directly into the
    provider account console;
  - setup-page loading state now reflects backup enrollment versus first-time
    setup instead of using one generic label.
- Files changed in this slice:
  - `/Users/subash/dev/GPUasService/cmd/api/routes_v3_readmodels.go`
  - `/Users/subash/dev/GPUasService/cmd/api/routes_v3_readmodels_test.go`
  - `/Users/subash/dev/GPUasService/packages/web/app/auth/login/page.tsx`
  - `/Users/subash/dev/GPUasService/packages/web/app/auth/login/page.test.tsx`
  - `/Users/subash/dev/GPUasService/packages/web/app/auth/mfa/recovery/page.tsx`
  - `/Users/subash/dev/GPUasService/packages/web/app/auth/mfa/setup/page.tsx`
  - `/Users/subash/dev/GPUasService/packages/web/src/components/v3/v3-account-security-sessions.test.tsx`
  - `/Users/subash/dev/GPUasService/doc/operations/MFA_User_Factor_Setup_Manage_Flow_Coverage_v1.md`
  - `/Users/subash/dev/GPUasService/doc/operations/MFA_Factor_Lifecycle_UAT_Coverage_v1.md`
- Validation readback for this slice:
  - `git diff --check`: pass
  - `GOCACHE=/tmp/gpuaas-go-build go test ./cmd/api -run 'TestV3AccountMFA' -count=1`: pass
  - `GOCACHE=/tmp/gpuaas-go-build go test ./cmd/api -run 'TestV3AccountMFAPostureLinksProviderAccountWhenIssuerConfigured|TestV3AccountMFAPostureUsesValidatedTokenClaims|TestV3AccountMFAProviderReadbackObservesKeycloakFactors' -count=1`: pass
  - `pnpm --dir /Users/subash/dev/GPUasService/packages/web typecheck`: pass
  - `BUILDX_CONFIG=/tmp/docker-buildx DOCKER_CONFIG=/tmp/docker-config bash scripts/ops/kind_fast_deploy.sh api`: pass
  - `BUILDX_CONFIG=/tmp/docker-buildx DOCKER_CONFIG=/tmp/docker-config bash scripts/ops/kind_fast_deploy.sh web`: pass
  - focused Vitest remains locally blocked by the existing optional Rollup
    dependency issue: `Cannot find module @rollup/rollup-darwin-arm64`
- Current narrowed gap after this slice:
  - the product now has an honest recovery entry and honest setup/backup
    wording, but the broader product-owner complaint remains unresolved until
    Account Security reliably reflects existing enrolled factors after provider
    return and until the recovery/remove queue fulfillment path is proven.


## 2026-06-16 MFA Provider Return Flow Slice
- Fairway task: `UAT-BUG-IAM-MFA-PROVIDER-RETURN-FLOW-001`
- Evidence: `/Users/subash/dev/GPUasService/.fairway/artifacts/mfa-provider-return-flow-20260616/validation-summary.md`
- Result: MFA OIDC callback now returns product-owned states to `/account/security` instead of falling through the generic sign-in path.
- Success path: `setup_complete`
- Provider cancel path: `setup_cancelled`
- Provider error path: `setup_error`
- Validation:
  - focused vitest: pass
  - web typecheck: pass
  - kind web deploy: pass
- Remaining gaps after this slice:
  - provider factor readback/refresh proof
  - recovery/remove server-error path and end-to-end UAT proof


## 2026-06-16 MFA Recovery Server Fix Slice
- Evidence: `/Users/subash/dev/GPUasService/.fairway/artifacts/mfa-factor-recovery-flow-20260616/server-fix/validation-summary.md`
- Root cause: the account MFA recovery request handler resolved only the raw internal user id, but the real kind flow carries project context and OIDC subject resolution.
- Fix: when `X-Project-ID` is present, recovery handler now resolves project scope first and uses `scope.UserID`; no-project fallback remains unchanged.
- Validation:
  - focused API unit test: pass
  - integration-tag recovery tests: pass
  - kind api deploy: pass
- Remaining gap: browser UAT still needs confirmation that recovery/remove no longer throws the server-side correlation error.
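
The resolution order described in this slice can be sketched in Go as follows. This is a minimal illustration, not the handler in `cmd/api`: `Scope`, `ScopeResolver`, `recoveryUserID`, and `staticResolver` are hypothetical names, and only the `X-Project-ID` header and the use of `scope.UserID` come from the notes above.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Scope is a hypothetical stand-in for the resolved project scope.
type Scope struct {
	ProjectID string
	UserID    string
}

// ScopeResolver is a hypothetical interface; the real resolver is not shown in these notes.
type ScopeResolver interface {
	ResolveProjectScope(projectID, callerID string) (Scope, error)
}

// recoveryUserID returns the user id a recovery request should be recorded against.
// projectID is the value of the X-Project-ID header ("" when the header is absent).
func recoveryUserID(projectID, callerID string, r ScopeResolver) (string, error) {
	if projectID == "" {
		// No-project fallback: behavior unchanged, use the raw internal id.
		if callerID == "" {
			return "", errors.New("missing user id")
		}
		return callerID, nil
	}
	// Project context present: resolve scope first, then use scope.UserID.
	scope, err := r.ResolveProjectScope(projectID, callerID)
	if err != nil {
		return "", fmt.Errorf("resolve project scope: %w", err)
	}
	return scope.UserID, nil
}

// staticResolver is a test double that maps every caller to one internal id.
type staticResolver struct{ userID string }

func (s staticResolver) ResolveProjectScope(projectID, _ string) (Scope, error) {
	return Scope{ProjectID: projectID, UserID: s.userID}, nil
}

func main() {
	h := http.Header{}
	h.Set("X-Project-ID", "project-a")
	id, err := recoveryUserID(h.Get("X-Project-ID"), "oidc-subject", staticResolver{userID: "internal-42"})
	fmt.Println(id, err)
}
```

Keeping the no-project branch first makes it easy to see that the fallback path is untouched by the fix.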

## 2026-06-16 MFA Remove / Recovery Contract Gate Split

- Active Fairway task:
  - `PRODUCT-GAP-IAM-MFA-FACTOR-REMOVE-DISABLE-FLOW-001`
- Evidence updated:
  - `/Users/subash/dev/GPUasService/.fairway/artifacts/mfa-factor-remove-disable-flow-20260616/validation-summary.md`
- Product/UAT change:
  - extracted MFA account-security contract coverage into a dedicated Playwright
    spec:
    - `/Users/subash/dev/GPUasService/packages/web/e2e/v3-account-mfa.spec.ts`
  - removed those MFA tests from the mixed storage/access/account suite to stop
    storage regressions from masking MFA product-flow proof.
  - registered the dedicated spec in:
    - `/Users/subash/dev/GPUasService/packages/web/playwright.config.ts`
- Validation:
  - `CI=true E2E_SPEC='v3-account-mfa.spec.ts' bash scripts/ci/frontend_e2e.sh full`: pass
  - managed harness phase summary:
    - base stack: pass
    - db reset: pass
    - app stack: pass
    - readiness: pass
    - v3 shell health: pass
    - playwright: pass
  - artifacts:
    - `/Users/subash/dev/GPUasService/.ci-artifacts/frontend-e2e-playwright.log`
    - `/Users/subash/dev/GPUasService/.ci-artifacts/playwright-report/index.html`
    - `/Users/subash/dev/GPUasService/dist/ci-timing/frontend_e2e.tsv`
- Current next action:
  - deploy the current web slice to kind and re-check the live dev-admin MFA
    journey against the dedicated product-flow expectations:
    - existing factor is visible
    - primary CTA is `Add backup`
    - removal is `Request removal`
    - recovery request submits without server error
    - no raw provider-branding or internal-implementation language leaks into
      the user flow

## 2026-06-16 late-night MFA durability update

- Active Fairway lane for the current continuation:
  `PRODUCT-GAP-IAM-MFA-FACTOR-REMOVE-DISABLE-FLOW-001`.
  The recovery-flow lane is already `done`; do not record new live proof
  there unless the scope genuinely reopens recovery-only acceptance.
- Real product bug fixed in the deployed API:
  - recovery request creation on kind returned `202`, but Account Security
    refresh did not persist `submitted` recovery state;
  - root cause was MFA audit rows being excluded from the latest-audit readback
    that drives account-security recovery snapshot state;
  - fix landed in:
    - `/Users/subash/dev/GPUasService/cmd/api/routes_v3_readmodels.go`
    - `/Users/subash/dev/GPUasService/cmd/api/routes_integration_test.go`
  - focused unit + integration tests passed;
  - kind `api` redeploy passed;
  - live kind readback now shows:
    - `recovery.state=submitted`
    - request id `1ab4920e-3257-4a90-a2b2-07014ecf311c`
- Live kind no-factor MFA product UAT now has durable PASS evidence through the
  tmux execution lane, not the Desktop surface:
  - tmux session/window: `gpuaas-git:41`
  - explicit persona:
    - `MFA_PRODUCT_UAT_USERNAME=dev-platform-admin`
    - `MFA_PRODUCT_UAT_PASSWORD=platform123`
  - command:
    - `bash scripts/ops/mfa_product_env_uat.sh --env kind --output-dir dist/uat/mfa-product/tmux-run/current-kind-live-explicit`
  - result: `TMUX_EXIT:0`
  - artifacts:
    - `/Users/subash/dev/GPUasService/dist/uat/mfa-product/tmux-run/current-kind-live-explicit/summary.md`
    - `/Users/subash/dev/GPUasService/dist/uat/mfa-product/tmux-run/current-kind-live-explicit/result.json`
    - `/Users/subash/dev/GPUasService/dist/uat/mfa-product/tmux-run/current-kind-live-explicit/playwright.log`
- Important interpretation to retain:
  - no-factor product proof should use `dev-platform-admin/platform123`;
  - factor-present proof for `dev-admin` needs either:
    - manual browser UAT from the enrolled account, or
    - live automation with a real `E2E_V3_LIVE_MFA_TOTP_SECRET`;
  - password-grant failure against an enrolled-factor account is expected and
    is not itself a product bug.
- The next product gap is now narrower:
  - the factor-present lifecycle still needs proof that existing enrolled users
    see the right actions (`Add backup`, recovery/removal path);
  - the flow must not regress into the raw provider setup page as if no factor
    exists;
  - removal/reset fulfillment and operator-facing proof still remain after the
    user-facing factor-present path is made coherent.

## 2026-06-16 factor-present manage-flow continuation

- Fairway execution lane moved from the closed recovery slice back to the real
  active gap:
  - `PRODUCT-GAP-IAM-MFA-FACTOR-MANAGE-FLOW-001`
- Product-owner correction implemented:
  - existing enrolled-factor CTA must land on a product-owned AI Cloud manage
    surface, not directly in raw provider setup;
  - added:
    - `/Users/subash/dev/GPUasService/packages/web/app/auth/mfa/manage/page.tsx`
  - account-security now splits actions:
    - top CTA for factor-present users: `Manage MFA`
    - backup add CTA inside lifecycle card: `Add backup`
    - remove/recovery deep links route through product-owned manage flow
- Local code changes in this continuation:
  - `/Users/subash/dev/GPUasService/cmd/api/routes_v3_readmodels.go`
  - `/Users/subash/dev/GPUasService/cmd/api/routes_v3_readmodels_test.go`
  - `/Users/subash/dev/GPUasService/packages/web/app/auth/mfa/manage/page.tsx`
  - `/Users/subash/dev/GPUasService/packages/web/src/components/v3/v3-account-subpages.tsx`
  - `/Users/subash/dev/GPUasService/packages/web/src/components/v3/v3-account-security-sessions.test.tsx`
  - `/Users/subash/dev/GPUasService/packages/web/e2e/v3-account-mfa.spec.ts`
  - `/Users/subash/dev/GPUasService/packages/web/e2e/v3-live-mfa-product.spec.ts`
- Real defect caught during focused proof:
  - manage-page remove/recovery links originally appended `mfa_panel` to the
    login URL instead of preserving it inside the encoded `next` target;
  - fixed so unauthenticated manage-page links now preserve:
    - `/auth/login?next=%2Faccount%2Fsecurity%3Fmfa_panel%3Dremove`
    - `/auth/login?next=%2Faccount%2Fsecurity%3Fmfa_panel%3Drecovery`
- Validation now green for this slice:
  - `pnpm --dir /Users/subash/dev/GPUasService/packages/web typecheck`: pass
  - `GOCACHE=/tmp/gpuaas-go-build go test ./cmd/api -run 'TestV3AccountMFAPostureLinksProviderAccountWhenIssuerConfigured|TestV3AccountMFAPostureUsesValidatedTokenClaims|TestV3AccountMFAProviderReadbackObservesKeycloakFactors' -count=1`: pass
  - `CI=true E2E_SPEC='v3-account-mfa.spec.ts' bash scripts/ci/frontend_e2e.sh full`: pass
- Current recommended next action:
  - deploy `api,web` to kind,
  - then verify the enrolled `dev-admin` live flow in kind:
    - existing factor is shown,
    - main CTA is `Manage MFA`,
    - backup CTA is `Add backup`,
    - removal opens request/recovery path instead of raw TOTP setup,
    - no internal provider branding leaks into the user journey.
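
The `next`-target defect noted in this slice comes down to where the `mfa_panel` parameter is encoded. The real fix lives in the TypeScript web package; the Go sketch below only illustrates the encoding rule. `loginURLFor`, `brokenLoginURLFor`, and `nextTarget` are hypothetical helpers, while the paths and parameter names are taken from the notes.

```go
package main

import (
	"fmt"
	"net/url"
)

// loginURLFor builds the sign-in link for an unauthenticated manage-page action.
// The panel stays inside the encoded next target so it survives the login redirect.
func loginURLFor(panel string) string {
	q := url.Values{}
	q.Set("mfa_panel", panel)
	target := "/account/security?" + q.Encode()
	return "/auth/login?next=" + url.QueryEscape(target)
}

// brokenLoginURLFor reproduces the original defect: mfa_panel is appended to
// the login URL itself, so it is no longer part of the next target.
func brokenLoginURLFor(panel string) string {
	return "/auth/login?next=" + url.QueryEscape("/account/security") +
		"&mfa_panel=" + url.QueryEscape(panel)
}

// nextTarget returns the decoded post-login destination carried by a login URL.
func nextTarget(loginURL string) string {
	u, err := url.Parse(loginURL)
	if err != nil {
		return ""
	}
	return u.Query().Get("next")
}

func main() {
	fmt.Println(loginURLFor("remove"))
	fmt.Println(nextTarget(loginURLFor("recovery")))
	fmt.Println(nextTarget(brokenLoginURLFor("remove")))
}
```

With the broken form, the decoded `next` value is just `/account/security`, which is why the panel was lost after sign-in.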

## 2026-06-16 post-deploy doc alignment

- The factor-present manage-flow slice is now deployed on kind for both `api`
  and `web` at commit `148ef67974582fa0314510f813c2d241853f4202`.
- Canonical MFA flow/UAT docs were refreshed so they no longer overstate
  completion:
  - `/Users/subash/dev/GPUasService/doc/operations/MFA_User_Factor_Setup_Manage_Flow_Coverage_v1.md`
  - `/Users/subash/dev/GPUasService/doc/operations/MFA_Factor_Lifecycle_UAT_Coverage_v1.md`
  - `/Users/subash/dev/GPUasService/doc/operations/Product_Quality_Flow_Coverage_Matrix_v1.md`
- Current hard boundary remains the same:
  - live enrolled-factor kind proof for `dev-admin`
  - prove factor-present UI shows `Manage MFA`
  - prove backup enrollment stays `Add backup`
  - prove removal/recovery do not bounce into raw provider setup as if no
    factor exists

## 2026-06-16 live harness execution-path fix

- Found and fixed a real UAT harness gap:
  - `/Users/subash/dev/GPUasService/scripts/ops/mfa_product_env_uat.sh`
    previously only supported the no-factor posture slice;
  - it could not drive the enrolled-factor browser journey even though
    `packages/web/e2e/v3-live-mfa-product.spec.ts` already supports real OIDC
    + TOTP when a secret is provided.
- The harness now supports two explicit modes:
  - default no-factor posture mode;
  - enrolled-factor OIDC mode when `MFA_PRODUCT_UAT_TOTP_SECRET` is set.
- It also now passes separate platform-IAM personas through:
  - `MFA_PRODUCT_UAT_PLATFORM_USERNAME`
  - `MFA_PRODUCT_UAT_PLATFORM_PASSWORD`
- Exact next live proof command for kind once the enrolled-factor secret is
  available:
  - `MFA_PRODUCT_UAT_USERNAME=dev-admin MFA_PRODUCT_UAT_PASSWORD=admin123 MFA_PRODUCT_UAT_TOTP_SECRET=<secret> MFA_PRODUCT_UAT_PLATFORM_USERNAME=dev-platform-admin MFA_PRODUCT_UAT_PLATFORM_PASSWORD=platform123 bash scripts/ops/mfa_product_env_uat.sh --env kind`
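
The two harness modes can be summarized as a single decision on the TOTP secret. The harness itself is a bash script; this Go sketch only mirrors the selection rule. `selectMode` and the `enrolled-factor-oidc` label are hypothetical, while `no-factor-posture` and the environment variable name come from these notes.

```go
package main

import (
	"fmt"
	"os"
)

const (
	modeNoFactor = "no-factor-posture"
	// modeEnrolled is a hypothetical label for the enrolled-factor OIDC mode.
	modeEnrolled = "enrolled-factor-oidc"
)

// selectMode picks the UAT mode: the enrolled-factor OIDC journey runs only
// when a TOTP secret is supplied; otherwise the default no-factor slice runs.
func selectMode(getenv func(string) string) string {
	if getenv("MFA_PRODUCT_UAT_TOTP_SECRET") != "" {
		return modeEnrolled
	}
	return modeNoFactor
}

func main() {
	fmt.Println("mode:", selectMode(os.Getenv))
}
```

Passing `getenv` as a function keeps the rule testable without mutating the process environment.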

## 2026-06-16 late deploy and proof continuation

- Deferred for later by explicit task:
  - `PRODUCT-GAP-IAM-MFA-WORKFLOW-FIRST-HELP-AFFORDANCE-001`
  - purpose: move verbose MFA instructional prose into cleaner help affordances
    (`?` / contextual help) instead of keeping explanation text inline on the
    primary user workflow surface.
- Real live kind factor-lifecycle proof is now durable for the current MFA
  product slice:
  - committed product SHA: `4dc64e41dffd3fc53b99d35ca113c2ca14464794`
    (`iam: finish MFA account lifecycle flows`)
  - tmux/browser evidence:
    `/Users/subash/dev/GPUasService/dist/uat/mfa-product/live-bootstrap-dev-user-rerun4/summary.md`
  - result: `PASS`
  - proven path:
    - clean no-factor user bootstrap through provider setup
    - post-enrollment return to Account Security
    - factor-present product-managed `Manage MFA`
    - backup/recovery/remove affordance visibility
    - Platform IAM readiness surface still renders
- Important provider/account truth at this point:
  - `dev-user` is the controlled live MFA bootstrap + factor-present proof
    persona.
  - `dev-platform-admin` remains the no-factor/platform-readiness persona.
  - `dev-admin` no longer serves as the reliable factor-present proof persona;
    current provider truth there is no-factor plus submitted recovery history.
- Local focused web unit test lane is currently blocked by host dependency
  drift, not product logic:
  - `pnpm exec vitest ...` fails locally because
    `@rollup/rollup-darwin-arm64` is missing from the current `packages/web`
    install on this machine.
  - Treat this as a local harness/install defect unless reproduced in CI.
- Dev deploy boundary moved to a clean detached worktree because the shared
  repo stays intentionally dirty during parallel work:
  - clean worktree path:
    `/tmp/gpuaas-dev-deploy-4dc64e41`
  - release branch push succeeded:
    `release/platform-control -> 4dc64e41dffd3fc53b99d35ca113c2ca14464794`
  - first deploy attempt failed only because the temp worktree did not have
    the local GitLab env file; rerun with
    `GITLAB_ENV_FILE=/Users/subash/dev/GPUasService/.env.gitlab.local`
    succeeded in triggering CI.
- Pipeline `2682` for `4dc64e41` failed in `build_test`; deterministic
  monitor evidence is in:
  `/Users/subash/dev/GPUasService/.fairway/artifacts/local-ci-monitor-2682-mfa-dev/summary.md`.
- Fix commit:
  - `038ecd1c458c8d41196ec67bf13cfbcffed10016`
  - message: `ci: close MFA build-test partial slice`
  - local validation before trigger:
    - `GOCACHE=/tmp/gpuaas-go-build go test ./cmd/api -count=1`
    - `pnpm --dir packages/web exec next build`
    - `git diff --check` for committed slice
- Pipeline `2684` for `038ecd1c458c8d41196ec67bf13cfbcffed10016`
  failed in `backend_build_and_tests` on
  `scripts/ci/v3_namespace_retirement_guard.sh`.
  - failing trace showed backend code still contained a retired
    `/api/v1/v3/` namespace literal in `cmd/api/routes_v3_readmodels.go`;
    this matched the already identified v3 retirement cleanup gap.
  - deterministic monitor artifact root:
    `/Users/subash/dev/GPUasService/.fairway/artifacts/local-ci-monitor-2684-mfa-ci-fix`
- Fix commit:
  - `d44788cb4014d34f4bbf7e310daf353b01635f83`
  - message: `ci: avoid retired v3 namespace literal`
  - exact fix: keep compatibility alias behavior but derive the legacy alias
    string from `canonicalPrefix + "v3/"` so canonical backend code no longer
    emits the retired namespace literal.
  - local validation:
    - `bash scripts/ci/v3_namespace_retirement_guard.sh`
    - `GOCACHE=/tmp/gpuaas-go-build go test ./cmd/api -count=1`
    - `GOCACHE=/tmp/gpuaas-go-build bash scripts/ci/ci_script_smoke.sh`
- Active CI-first pipeline:
  - pipeline id: `2686`
  - ref: `release/platform-control`
  - SHA: `d44788cb4014d34f4bbf7e310daf353b01635f83`
  - monitor artifact root:
    `/Users/subash/dev/GPUasService/.fairway/artifacts/local-ci-monitor-2686-v3-namespace-fix`
- tmux execution lanes now active for this continuation:
  - deploy window: `gpuaas-git:dev-mfa-deploy2`
  - pipeline watch window: `gpuaas-git:dev-mfa-watch`
- Immediate next action from this state:
  - completed: pipeline `2686` passed for
    `d44788cb4014d34f4bbf7e310daf353b01635f83`.
  - completed: deploy pipeline `2687` passed for the same SHA against
    `release/platform-control` / dev-control RKE2.
    - monitor artifact:
      `/Users/subash/dev/GPUasService/.fairway/artifacts/local-ci-monitor-2687-dev-deploy-d447/summary.md`
  - completed: dev MFA product UAT no-factor posture passed from the tmux
    execution lane after the Desktop/browser surface reproduced the known
    macOS `mach_port_rendezvous` launch permission failure.
    - passing UAT summary:
      `/Users/subash/dev/GPUasService/dist/uat/mfa-product/dev-d44788cb-no-factor-tmux/summary.md`
    - passing UAT log:
      `/Users/subash/dev/GPUasService/dist/uat/mfa-product/dev-d44788cb-no-factor-tmux/playwright.log`
    - result: `2 passed`
  - remaining MFA product readiness work is not "deploy failed"; it is the
    product-flow closure work still tracked in Fairway:
    - enrolled-factor proof for dev requires a dev TOTP secret/persona;
    - recovery/remove queue fulfillment and admin/operator execution flow need
      product/UAT closure;
    - provider unavailable/readback edge cases and service-version visibility
      remain queued product/ops tasks.
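
For the `d44788cb` namespace fix recorded above, the idea of deriving the legacy alias instead of spelling it out can be sketched as below. This is an assumption-labeled illustration: the value of `canonicalPrefix` is inferred from the retired literal, and `legacyAliasPrefix`, `canonicalPath`, and the `account/mfa` suffix are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalPrefix is assumed to be the current API prefix.
const canonicalPrefix = "/api/v1/"

// legacyAliasPrefix derives the compatibility alias from the canonical prefix,
// so the retired namespace never appears as one string literal in the source.
func legacyAliasPrefix() string {
	return canonicalPrefix + "v3/"
}

// canonicalPath maps a path under the legacy alias onto the canonical prefix
// and leaves every other path unchanged.
func canonicalPath(path string) string {
	alias := legacyAliasPrefix()
	if strings.HasPrefix(path, alias) {
		return canonicalPrefix + strings.TrimPrefix(path, alias)
	}
	return path
}

func main() {
	fmt.Println(canonicalPath(legacyAliasPrefix() + "account/mfa"))
	fmt.Println(canonicalPath(canonicalPrefix + "account/mfa"))
}
```

Alias behavior is preserved at runtime, while a source scan for the retired literal (which is presumably what the guard script performs) no longer matches canonical backend code.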

### 2026-06-16 CI-first deploy correction

- Promotion rule clarified in `doc/operations/Environment_Promotion_Policy.md`:
  deploy only an exact source SHA with green CI evidence; any later fix commit
  restarts the CI-first sequence before deploy.
- Current exact SHA evidence:
  - source SHA: `d44788cb4014d34f4bbf7e310daf353b01635f83`
  - CI pipeline `2686`: passed on `release/platform-control`
  - dev deploy pipeline `2687`: passed for the same SHA
  - dev MFA product no-factor UAT: passed from tmux at
    `dist/uat/mfa-product/dev-d44788cb-no-factor-tmux/summary.md`
- Remaining MFA product gaps are not CI/deploy gaps: enrolled-factor manage,
  remove/disable, and recovery fulfillment still need product/UAT proof.
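
The promotion rule above can be expressed as a small gate: a SHA is deployable only if that exact SHA has a passing CI run. The sketch below is illustrative; `PipelineRun` and `deployable` are hypothetical names, and the pipeline ids and SHAs are the ones recorded in this log.

```go
package main

import "fmt"

// PipelineRun is a hypothetical record of one CI run.
type PipelineRun struct {
	ID     int
	SHA    string
	Passed bool
}

// deployable reports whether the exact source SHA has green CI evidence.
// It matches on the full SHA only; a fix commit is a new SHA and needs its own run.
func deployable(sha string, ciRuns []PipelineRun) bool {
	for _, r := range ciRuns {
		if r.SHA == sha && r.Passed {
			return true
		}
	}
	return false
}

func main() {
	ciRuns := []PipelineRun{
		{ID: 2684, SHA: "038ecd1c458c8d41196ec67bf13cfbcffed10016", Passed: false},
		{ID: 2686, SHA: "d44788cb4014d34f4bbf7e310daf353b01635f83", Passed: true},
	}
	fmt.Println(deployable("d44788cb4014d34f4bbf7e310daf353b01635f83", ciRuns))
	fmt.Println(deployable("038ecd1c458c8d41196ec67bf13cfbcffed10016", ciRuns))
}
```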

### 2026-06-17 MFA product readiness CI/deploy continuation

- Current scoped MFA product readiness commit:
  - `473eebdb72ee906a29f59280b5d97b0f9791ac70`
  - message: `iam: close MFA product readiness gaps`
- CI-first sequencing correction:
  - pipeline `2688` was triggered before the SHA was reachable from
    `origin/master`; branch guard correctly failed.
  - pushed `master` to origin; remote `master` and
    `release/platform-control` both resolved to `473eebdb72ee906a29f59280b5d97b0f9791ac70`.
  - validation pipeline `2690` passed for `473eebdb`.
  - monitor artifact root:
    `/Users/subash/dev/GPUasService/.fairway/artifacts/local-ci-monitor-2690-mfa-product-readiness-473eebdb`
- Dev deploy retry:
  - pipeline `2691` failed in
    `platform_control_publish_release_artifacts` because the remote CI job
    could not authenticate to `hpcadmin@100.90.157.34`.
  - reran as pipeline `2692` for the same source SHA with
    `DEV_CONTROL_RKE2_SSH_PRIVATE_KEY_B64` from local
    `/Users/subash/.ssh/gpuaas-dev-control-rke2-cd`.
  - `2692` passed publish/deploy far enough to patch remote dev RKE2 runtime
    images and start remote validation.
  - `2692` then failed in
    `platform_control_remote_validation_authz.sh` because
    `scripts/ops/role_authz_smoke.sh` still expected
    `adminBindUserPlatformRole` to return `200`.
  - deployed product behavior was correct for the current MFA sensitive-operation
    gate: the API returned `403 step_up_required` and did not mutate the role set.
- Harness fix in progress:
  - file: `scripts/ops/role_authz_smoke.sh`
  - change: accept either successful bind/revoke/audit when no step-up gate is
    active, or canonical `step_up_required` with unchanged role list when the
    sensitive-operation gate blocks mutation.
  - local validations passed:
    - `bash -n scripts/ops/role_authz_smoke.sh`
    - `git diff --check -- scripts/ops/role_authz_smoke.sh`
    - `API_BASE_URL=https://aicloud-dev-api.core42.dev KEYCLOAK_BASE_URL=https://aicloud-dev-auth.core42.dev bash scripts/ops/role_authz_smoke.sh`
  - dev live smoke result after fix:
    `OK: bind denied by MFA step-up gate without mutating roles`.
- Next required sequence:
  - completed: committed only `scripts/ops/role_authz_smoke.sh` as
    `540085ff982e9939295045803383effe6c9bb248`
    (`ci: accept MFA step-up role authz smoke`).
  - completed: pushed `master` and promoted `release/platform-control` to
    `540085ff982e9939295045803383effe6c9bb248`.
  - completed: validation pipeline `2694` passed for `540085ff`.
    - monitor artifact:
      `/Users/subash/dev/GPUasService/.fairway/artifacts/local-ci-monitor-2694-mfa-stepup-smoke-540085ff/summary.md`
    - Fairway task `CD-FIX-MFA-STEPUP-ROLE-AUTHZ-SMOKE-001` marked `done`.
  - completed: dev deploy pipeline `2695` passed for the same green SHA
    `540085ff982e9939295045803383effe6c9bb248`.
    - monitor artifact:
      `/Users/subash/dev/GPUasService/.fairway/artifacts/local-ci-monitor-2695-dev-deploy-540085ff/summary.md`
    - remote validation included `platform_control_remote_validation_authz.sh`
      passing with the expected MFA `step_up_required` sensitive-operation gate
      behavior.
    - Fairway evidence recorded on
      `IAM-MFA-KIND-DEV-UAT-DEPLOY-CLOSEOUT-001`.
  - next: run dev MFA UAT/readback on that exact deployed SHA and record
    evidence before moving to docs portal/staging/demo work.
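
The acceptance rule added to `scripts/ops/role_authz_smoke.sh` can be restated as a two-outcome check. The real harness is bash; this Go sketch only captures the decision logic under stated assumptions. `BindResult`, `sameRoles`, `checkBind`, and the success-path message are hypothetical; the `step_up_required` code and the denial message come from the notes above.

```go
package main

import (
	"fmt"
	"sort"
)

// BindResult is a hypothetical summary of one bind attempt and its readback.
type BindResult struct {
	Status      int
	ErrorCode   string
	RolesBefore []string
	RolesAfter  []string
}

// sameRoles compares two role lists ignoring order.
func sameRoles(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	x := append([]string(nil), a...)
	y := append([]string(nil), b...)
	sort.Strings(x)
	sort.Strings(y)
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

// checkBind accepts either a successful bind (no step-up gate active) or a
// canonical step_up_required denial that left the role list unchanged.
func checkBind(r BindResult) (string, error) {
	switch {
	case r.Status == 200:
		return "OK: bind succeeded with no step-up gate active", nil
	case r.Status == 403 && r.ErrorCode == "step_up_required":
		if !sameRoles(r.RolesBefore, r.RolesAfter) {
			return "", fmt.Errorf("step-up gate denied bind but roles changed")
		}
		return "OK: bind denied by MFA step-up gate without mutating roles", nil
	default:
		return "", fmt.Errorf("unexpected bind response: status=%d code=%q", r.Status, r.ErrorCode)
	}
}

func main() {
	msg, err := checkBind(BindResult{
		Status:      403,
		ErrorCode:   "step_up_required",
		RolesBefore: []string{"viewer"},
		RolesAfter:  []string{"viewer"},
	})
	fmt.Println(msg, err)
}
```

In the real script the success branch continues with revoke and audit checks; those steps are omitted here.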

### 2026-06-17 dev deploy and UAT closeout for 540085ff

- Dev deploy pipeline `2695` passed for exact source SHA
  `540085ff982e9939295045803383effe6c9bb248`.
- Dev API runtime metadata readback proves the running API reports:
  - commit: `540085ff982e9939295045803383effe6c9bb248`
  - pipeline: `2695`
  - deployed_at: `2026-06-17T14:59:11Z`
  - artifact:
    `/Users/subash/dev/GPUasService/.fairway/artifacts/iam-mfa-dev-uat-540085ff/dev-runtime-metadata.sanitized.json`
- Dev MFA product UAT passed in tmux against the deployed dev environment:
  - mode: `no-factor-posture`
  - user/persona: `dev-platform-admin`
  - command:
    `MFA_PRODUCT_UAT_USERNAME=dev-platform-admin MFA_PRODUCT_UAT_PASSWORD=platform123 MFA_PRODUCT_UAT_PLATFORM_USERNAME=dev-platform-admin MFA_PRODUCT_UAT_PLATFORM_PASSWORD=platform123 bash scripts/ops/mfa_product_env_uat.sh --env dev --output-dir dist/uat/mfa-product/dev-540085ff-no-factor-tmux`
  - result: `2 passed`
  - summary:
    `/Users/subash/dev/GPUasService/dist/uat/mfa-product/dev-540085ff-no-factor-tmux/summary.md`
  - log:
    `/Users/subash/dev/GPUasService/dist/uat/mfa-product/dev-540085ff-no-factor-tmux/playwright.log`
- Fairway evidence recorded on
  `IAM-MFA-KIND-DEV-UAT-DEPLOY-CLOSEOUT-001`.
- Remaining MFA product proof gap:
  - dev enrolled-factor proof still requires a real TOTP secret/persona;
  - no-factor product posture and Platform IAM readiness are green on dev for
    the deployed SHA.

### 2026-06-17 service-version/readback surface closeout

- Completed `OPS-PLATFORM-SERVICE-VERSION-SURFACE-001` implementation slice.
- Root finding: `/api/v1/platform/ops/registry/environment-artifacts` and the
  Platform UI page already existed, but the platform evidence API/store dropped
  structured `details` JSON. Because the registry read model projects service
  version rows from `platform_evidence_items.details`, the page stayed empty
  after deploy.
- Implemented:
  - evidence item API/store now preserves sanitized `details` JSON;
  - OpenAPI/Go/TypeScript generated contracts include evidence item `details`;
  - `PLATFORM_EVIDENCE_EXTRA_ITEMS_JSON` lets CI include structured evidence
    rows in generated platform evidence payloads;
  - new `scripts/ci/platform_environment_artifacts_from_release_env.sh`
    converts platform-control `GPUAAS_*_IMAGE` release env metadata into
    `environment_artifact_inventory` evidence rows for API/web/workers where
    available;
  - post-deploy smoke runbook now requires service-version readback before an
    environment is declared updated;
  - `platform_foundation_degradation_harness.sh` now defaults `GOCACHE` to
    `/tmp/gpuaas-go-build` to avoid Desktop/macOS Go cache permission failures.
- Validation passed:
  - `make codegen`
  - `GOCACHE=/tmp/gpuaas-go-build go test ./packages/platform/evidence ./packages/platform/registry ./cmd/api -run 'Test.*PlatformEvidence|Test.*RegistryEnvironment|TestRuntimeMetadata|TestRecordItemPreservesStructuredDetails' -count=1`
  - `GOCACHE=/tmp/gpuaas-go-build bash scripts/ci/platform_foundation_degradation_harness.sh /tmp/gpuaas-degradation-harness-check`
  - `GOCACHE=/tmp/gpuaas-go-build bash scripts/ci/ci_script_smoke.sh`
  - scoped `git diff --check`
- Fairway evidence recorded:
  `/Users/subash/dev/GPUasService/.fairway/artifacts/ops-platform-service-version-surface-20260617/validation-summary.md`.
- Fairway task status: `OPS-PLATFORM-SERVICE-VERSION-SURFACE-001` is `done`.
  Review domains still show as missing because the queue entry requires
  backend/frontend/ops review; do not stop on this unless merge/deploy gating
  requires it.
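
The conversion from release env metadata to evidence rows can be sketched as follows. The checked-in converter is a shell script, so this Go version is only an illustration of the mapping. The `EvidenceItem` and `artifactDetails` shapes, their JSON field names, and the sample image references are assumptions; the `GPUAAS_*_IMAGE` pattern and the `environment_artifact_inventory` kind come from the notes above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// EvidenceItem is a hypothetical row shape; Details carries structured JSON
// that the evidence store must preserve for the registry read model.
type EvidenceItem struct {
	Kind    string          `json:"kind"`
	Details json.RawMessage `json:"details"`
}

// artifactDetails is a hypothetical details payload for one service image.
type artifactDetails struct {
	Service string `json:"service"`
	Image   string `json:"image"`
}

// itemsFromReleaseEnv turns GPUAAS_<SERVICE>_IMAGE entries into evidence rows,
// sorted by variable name so the output is deterministic.
func itemsFromReleaseEnv(env map[string]string) ([]EvidenceItem, error) {
	const prefix, suffix = "GPUAAS_", "_IMAGE"
	keys := make([]string, 0, len(env))
	for k, v := range env {
		if len(k) > len(prefix)+len(suffix) &&
			strings.HasPrefix(k, prefix) && strings.HasSuffix(k, suffix) && v != "" {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	items := make([]EvidenceItem, 0, len(keys))
	for _, k := range keys {
		svc := strings.ToLower(strings.TrimSuffix(strings.TrimPrefix(k, prefix), suffix))
		raw, err := json.Marshal(artifactDetails{Service: svc, Image: env[k]})
		if err != nil {
			return nil, err
		}
		items = append(items, EvidenceItem{Kind: "environment_artifact_inventory", Details: raw})
	}
	return items, nil
}

func main() {
	items, err := itemsFromReleaseEnv(map[string]string{
		"GPUAAS_API_IMAGE": "registry.example/api:abc", // made-up image reference
		"GPUAAS_WEB_IMAGE": "registry.example/web:abc", // made-up image reference
	})
	if err != nil {
		panic(err)
	}
	out, _ := json.MarshalIndent(items, "", "  ")
	fmt.Println(string(out))
}
```

If the store drops `Details`, the rows still exist but carry no service or image data, which matches the empty-page symptom described in the root finding.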

### Next ready program items

Fairway ready queue after the service-version slice:

1. `OPS-STAGING-TWO-NODE-REPEATABLE-SETUP-001` - build repeatable two-node
   staging setup.
2. `OPS-DEMO-FRESH-ENV-SUPPORTED-APPS-UAT-001` - create fresh demo environment
   with supported apps and UAT.
3. Legacy queue migration audit/import tasks - important to avoid losing older
   edge/provider/product-quality work, but secondary to staging/demo unless an
   environment task needs that context.

### 2026-06-17 MFA fulfillment scope reset

- Closed the stale broad blocker around existing-factor manage flow:
  - `PRODUCT-GAP-IAM-MFA-FACTOR-MANAGE-FLOW-001` is now `done`
  - closeout artifact:
    `/Users/subash/dev/GPUasService/.fairway/artifacts/mfa-factor-manage-flow-20260617/live-kind-closeout.md`
- Re-read current Fairway/task state and confirmed the remaining MFA product
  gap is no longer frontend UX. The open blocker is support/admin fulfillment
  for submitted factor-removal and lost-factor reset requests.
- Created a narrower Fairway task:
  - `PRODUCT-GAP-IAM-MFA-FACTOR-FULFILLMENT-FLOW-001`
  - scope artifact:
    `/Users/subash/dev/GPUasService/.fairway/artifacts/mfa-factor-fulfillment-flow-20260617/scope.md`
- Key dependency discovered from the current Platform IAM readiness contract:
  - `reset_factor` is explicitly disabled until a separate mutation contract is
    approved (`cmd/api/routes_v3_readmodels_platform.go`).
- Created follow-up dependency task:
  - `IAM-MFA-FACTOR-RESET-MUTATION-CONTRACT-001`
- Product-owner interpretation to preserve:
  - user-facing setup/manage/recovery-request/provider-return/branding slices
    are complete for the current kind/dev product surface;
  - remaining remove/disable blockage is operator fulfillment and before/after
    reset proof, not more account-security UI work;
  - do not reopen live MFA or disposable preflight for this gap.

### 2026-06-17 MFA fulfillment sanitized-input-prep

- Added checked-in offline input-prep helper for fulfillment proof:
  - `scripts/ops/keycloak_mfa_factor_fulfillment_input_prep.sh`
  - `scripts/ops/keycloak_mfa_factor_fulfillment_input_prep_test.sh`
- Grouped smoke updated to include the new helper/test:
  - `scripts/ci/ci_script_smoke.sh`
- Validation artifact:
  `/Users/subash/dev/GPUasService/.fairway/artifacts/harness-fix-mfa-factor-fulfillment-sanitized-input-prep-20260617/validation-summary.md`
- Important boundary:
  - this slice prepares or validates sanitized fixture/sample inputs only;
  - it does **not** produce accepted fulfillment proof;
  - remaining blocker is still explicit authorization for one bounded non-live
    fulfillment proof run using the existing runner packet and decision
    artifact.

### 2026-06-17 MFA fulfillment authorization refresh

- Refreshed the control-ready decision artifact so the next authorization does
  not point at stale pre-input-prep tooling:
  - `/Users/subash/dev/GPUasService/.fairway/artifacts/iam-mfa-factor-fulfillment-packet-20260617/nonlive_factor_fulfillment_proof_authorization_decision_9725f0e5.md`
- New exact authorization basis:
  - commit/HEAD: `9725f0e5236dc0f3f14d9a5c89d15ecd5a64dc1b`
  - input-prep validator:
    `scripts/ops/keycloak_mfa_factor_fulfillment_input_prep.sh`
  - runner:
    `scripts/ops/keycloak_mfa_factor_fulfillment_proof_runner.sh`
- Shared remaining blocker for:
  - `PRODUCT-GAP-IAM-MFA-FACTOR-FULFILLMENT-FLOW-001`
  - `PRODUCT-GAP-IAM-MFA-FACTOR-REMOVE-DISABLE-FLOW-001`
  - `PRODUCT-GAP-IAM-MFA-FACTOR-RECOVERY-FLOW-001`
- No execution occurred. This is still packet-only until Architecture
  Control/user explicitly authorizes one bounded non-live fulfillment proof
  run.

### 2026-06-17 docs portal role-route deepening

- Continued the docs-portal audience closeout work with a role-usability pass
  instead of only adding more inventory pages.
- Core structural fix from earlier in the day remains in place:
  source docs listed in portal `SourceList` blocks are mirrored into
  `packages/docs/static/portal/source-docs/**` and rendered as clickable links,
  so summary pages no longer dead-end at raw path text.
- Added a new developer-native architecture route:
  - portal page:
    `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/developer-implementation-map/index.mdx`
  - purpose:
    give engineers one entry for contracts, domain/route ownership,
    runtime/C4 shape, state-machine constraints, sequence authority,
    ER/schema ownership, and definition-of-done.
- Important content choice captured:
  - the portal now explicitly treats `doc/architecture/Sequence_Flows.md` as
    prototype/reference-only and routes current implementation readers to the
    review-pack/system-overview/detailed-design path first.
- Strengthened role entry pages so they act as decision routes:
  - developer:
    `/Users/subash/dev/GPUasService/packages/docs/docs/internal-teams/developer-handoff/index.mdx`
  - operations:
    `/Users/subash/dev/GPUasService/packages/docs/docs/internal-teams/operations-handoff/index.mdx`
  - security/CISO:
    `/Users/subash/dev/GPUasService/packages/docs/docs/internal-teams/security-assurance/index.mdx`
  - IAM/identity:
    `/Users/subash/dev/GPUasService/packages/docs/docs/internal-teams/iam-identity/index.mdx`
  - product:
    `/Users/subash/dev/GPUasService/packages/docs/docs/product/team-handoff/index.mdx`
- Architecture landing page and sidebar now surface the developer route:
  - `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/sidebars.ts`
- Validation rerun after the role-route deepening:
  - `git diff --check` on the changed docs files: pass
  - `make -C /Users/subash/dev/GPUasService docs-portal-check`: pass

### 2026-06-17 governance import batch and stale demo cleanup

- Completed legacy governance/import batch for:
  - `GOV-LEGACY-QUEUE-IMPORT-EDGE-PROXY-001`
  - `GOV-LEGACY-QUEUE-IMPORT-COORDINATION-TOOLING-001`
- Mapping artifacts:
  - `/Users/subash/dev/GPUasService/.fairway/artifacts/legacy-queue-import-edge-proxy-20260617/mapping.md`
  - `/Users/subash/dev/GPUasService/.fairway/artifacts/legacy-queue-import-coordination-tooling-20260617/mapping.md`
- Imported current Fairway tasks from legacy queue:
  - `OPS-EDGE-POMERIUM-CONSOLIDATION-SWEEP-001`
  - `OPS-EDGE-NOTIFICATION-WS-AUTH-MODE-001`
  - `OPS-EDGE-TERMINAL-WS-HOST-PARITY-001`
  - `GOV-FAIRWAY-STRUCTURED-QUEUE-STORE-001`
  - `GOV-FAIRWAY-WATCHER-LANE-MODEL-001`
  - `GOV-FAIRWAY-WATCHER-PREFLIGHT-HELPER-001`
  - `GOV-FAIRWAY-CONTEXT-PACKET-HELPER-001`
- Demo stale-state cleanup:
  - `DEMO-API-CANONICAL-ROUTE-READINESS-001` was reset from stale
    `in_progress` to `todo`
  - stale session ended:
    `architecture-control-demo-route-readiness-20260617`
- Intent to preserve:
  - MFA remains externally blocked on the bounded non-live fulfillment proof
    authorization decision and is not the current ready queue;
  - docs-portal ownership/status hygiene stays with Architecture Control, not
    this governance import batch.
- Follow-up role-route deepening also covered:
  - infra/environment route:
    `/Users/subash/dev/GPUasService/packages/docs/docs/internal-teams/infra-environments/index.mdx`
  - shared-platform builders route:
    `/Users/subash/dev/GPUasService/packages/docs/docs/internal-teams/shared-platform-builders/index.mdx`
  - internal-teams landing page now points internal developers to the new
    developer implementation map
  - external developer and external architecture pages now include explicit
    decision routes rather than generic start lists
- Portal front-door tightening after that:
  - platform overview:
    `/Users/subash/dev/GPUasService/packages/docs/docs/platform-overview/index.mdx`
  - product landing:
    `/Users/subash/dev/GPUasService/packages/docs/docs/product/index.mdx`
  - user landing:
    `/Users/subash/dev/GPUasService/packages/docs/docs/use-gpuaas/index.mdx`
  - these now route by reader decision/task instead of acting as section
    summaries only
- Publication/access model captured in-repo for the current internal portal:
  - Cloudflare Access with one-time email passcode
  - allowed domains `core42.ai`, `g42.ai`
  - named exception `subahsram@gmail.com`
  - documented in:
    - `/Users/subash/dev/GPUasService/doc/operations/Docs_Portal_Static_Cloudflare_Deployment_v1.md`
    - `/Users/subash/dev/GPUasService/packages/docs/docs/portal-roadmap/publication-tracks/index.mdx`
- New portal-native hierarchy and code-layer coverage added on 2026-06-17:
  - product hierarchy page:
    `/Users/subash/dev/GPUasService/packages/docs/docs/product/tenant-project-hierarchy/index.mdx`
  - architecture layer page:
    `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/code-structure-layers/index.mdx`
  - intent:
    remove two recurring documentation gaps:
    1. clean explanation of organization/tenant -> department -> project ->
       resource/principal ownership and billing attribution
    2. clean explanation of repo layers, package boundaries, route ownership,
       and what belongs in `shared` vs `platform` vs `products`
  - cross-links added from product and architecture landing pages so these are
    first-read pages, not hidden deep links
  - validation after this batch:
    - `git diff --check` on the changed docs files: pass
    - `make -C /Users/subash/dev/GPUasService docs-portal-check`: pass

### 2026-06-17 GPU slicing and scheduler-layer portal coverage

- Added new architecture page:
  `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/gpu-slicing-scheduling/index.mdx`
- The portal now explains:
  - `baremetal` vs `gpu_slice` capacity shapes
  - approved slot inventory vs naive GPU-count scheduling
  - control-plane ownership of SKU intent, placement, claims, and
    reconciliation
  - node-plane ownership of topology discovery, prerequisite checks, runtime
    execution, and cleanup proof
  - control-plane scheduler vs node/app runtime scheduler as separate layers
- Cross-links added from:
  - `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/docs/product/launch-allocation-runtime/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/docs/operators/node-lifecycle/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/sidebars.ts`
- Validation:
  - `git diff --check`: pass
  - `make docs-portal-check`: pass
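
The difference between approved slot inventory and naive GPU-count scheduling, as covered by the new architecture page, can be illustrated with a minimal sketch. Only the `baremetal` and `gpu_slice` shape names come from this log; the `Slot` model, `naive_fits`, `slot_fits`, and the node names are hypothetical and are not taken from the actual scheduler code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """One approved placement slot on a node (hypothetical model)."""
    node: str
    shape: str          # "baremetal" or "gpu_slice"
    gpus: int           # GPUs this slot provides
    claimed: bool = False


def naive_fits(slots, gpus_wanted):
    """Naive check: sum unclaimed GPUs, ignoring shape and slot boundaries."""
    return sum(s.gpus for s in slots if not s.claimed) >= gpus_wanted


def slot_fits(slots, shape, gpus_wanted):
    """Inventory check: return an unclaimed slot of matching shape and size, or None."""
    for s in slots:
        if not s.claimed and s.shape == shape and s.gpus == gpus_wanted:
            return s
    return None


# Illustrative inventory: two free single-GPU slices and one claimed bare-metal node.
inventory = [
    Slot("node-a", "gpu_slice", 1),
    Slot("node-a", "gpu_slice", 1),
    Slot("node-b", "baremetal", 8, claimed=True),
]
```

In this toy inventory a request for a 2-GPU slice passes the naive count because two GPUs are free, but `slot_fits` returns `None` because no approved 2-GPU slot exists.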

### 2026-06-17 portal proof-point and status recalibration pass

- Added new proof-surfacing page:
  `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/platform-proof-points/index.mdx`
- This closes a recurring portal calibration issue:
  - the portal was reading like a readiness self-audit even where the repo
    already proves shipped platform capability
  - strongest proof now surfaced explicitly:
    - Slurm reference controller
    - RKE2 self-managed controller
    - platform boundary guard
    - node-agent runtime depth
    - audit/ledger custody primitives
- Re-leveled clearly shipped portal pages from `designed` to `implemented`:
  - `/Users/subash/dev/GPUasService/packages/docs/docs/platform-overview/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/shared-services/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/platform-strengths/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/docs/build-on-gpuaas/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/docs/build-on-gpuaas/app-sdk-overview/index.mdx`
- Corrected the competitive framing so the portal no longer claims app-instance
  lifecycle is simply absent; it now distinguishes shipped reference-controller
  proof from broader self-service app-platform maturity:
  - `/Users/subash/dev/GPUasService/packages/docs/docs/product/competitive-context/index.mdx`
- Cross-links added from:
  - `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/sidebars.ts`
- Validation:
  - `git diff --check`: pass
  - `make docs-portal-check`: pass

### 2026-06-17 visual and runtime-depth docs pass

- Added a deeper runtime page:
  `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/node-agent-runtime-depth/index.mdx`
- This page gives the missing high-signal explanation of:
  - node-agent authority boundary
  - lifecycle/task execution vs terminal execution domains
  - current host-runtime responsibilities
  - why slice support makes node-agent a serious platform subsystem
- Added stronger visuals where “one picture is worth pages”:
  - proof-map diagram on:
    `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/platform-proof-points/index.mdx`
  - composition model diagram on:
    `/Users/subash/dev/GPUasService/packages/docs/docs/build-on-gpuaas/app-sdk-overview/index.mdx`
- Cross-links added from:
  - `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/docs/operators/node-lifecycle/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/sidebars.ts`
- Intent:
  - reduce prose-only reading burden for architecture/security/ops/product
  - make runtime depth and platform proof legible at a glance
  - help external readers see this as real platform work, not a weekend
    project
- Validation:
  - `git diff --check`: pass
  - `make docs-portal-check`: pass

### 2026-06-17 docs backlog recalibration pass

- Recalibrated the portal so it no longer presents the internal docs portal as
  an unfinished concept when the implementation, quality gates, and static
  publication path already exist.
- Updated landing and roadmap state:
  - `/Users/subash/dev/GPUasService/packages/docs/docs/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/docs/start-here/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/docs/portal-roadmap/publication-tracks/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/docs/portal-roadmap/maintenance/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/docs/portal-roadmap/execution-roadmap/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/docs/portal-roadmap/epics-backlog/index.mdx`
- Behavioral change in the documentation itself:
  - home page and start page are now marked `implemented`, not `designed`
  - portal build/version metadata is surfaced directly on the landing path via
    the existing `PortalBuildInfo` component
  - roadmap pages now distinguish:
    - implemented internal portal foundation
    - implemented publication/deploy/gate baseline
    - remaining future work as external-track filtering and deeper visuals,
      not first-time portal creation
  - backlog page now points at the active platform-foundation Fairway queue,
    not the old docusaurus-only backlog framing
- Validation:
  - `git diff --check -- packages/docs/docs/index.mdx packages/docs/docs/start-here/index.mdx packages/docs/docs/portal-roadmap/publication-tracks/index.mdx packages/docs/docs/portal-roadmap/maintenance/index.mdx packages/docs/docs/portal-roadmap/execution-roadmap/index.mdx packages/docs/docs/portal-roadmap/epics-backlog/index.mdx`: pass
  - `make -C /Users/subash/dev/GPUasService docs-portal-check`: pass

### 2026-06-17 docs follow-on pass: publication, visuals, audience polish

- Finished the next three portal backlog themes in one sequence:
  1. external publication filtering and deploy-posture clarification,
  2. deeper runtime/ops/developer/security decision-flow visuals,
  3. product/security/developer audience polish.
- Publication/external-track docs updated to reflect real current state:
  - `/Users/subash/dev/GPUasService/packages/docs/docs/portal-roadmap/publication-filtering/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/docs/portal-roadmap/external-readiness/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/docs/external-viewers/index.mdx`
- Key behavioral corrections:
  - these pages now say the internal publication baseline is implemented,
    not still merely designed;
  - they describe the real posture: metadata and publication checks exist now,
    while separate customer/partner/public filtered builds are the next stage;
  - publication filtering page now surfaces portal build metadata directly.
- Deeper visual/readability pass landed on:
  - `/Users/subash/dev/GPUasService/packages/docs/docs/internal-teams/developer-handoff/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/docs/internal-teams/security-assurance/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/docs/operators/production-deployment-model/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/docs/security-readiness/index.mdx`
- Added practical decision-flow diagrams for:
  - developer navigation/model selection,
  - security review routing,
  - deployment responsibility/promotion flow,
  - control-to-release decision flow.
- Product-facing polish landed on:
  - `/Users/subash/dev/GPUasService/packages/docs/docs/product/index.mdx`
  - `/Users/subash/dev/GPUasService/packages/docs/docs/product/team-handoff/index.mdx`
- Product pages now work more as decision surfaces:
  - product index marked `implemented`,
  - product reader map added,
  - product handoff now includes a triage flow showing whether a problem is
    user-flow, UX/IA, live-state mismatch, readiness, or competitive pressure.
- Validation:
  - `git diff --check` on touched files: pass
  - `make -C /Users/subash/dev/GPUasService docs-portal-check`: pass

### 2026-06-17 protected internal publish boundary + MFA screenshot guide

- Orchestrator was checked before continuing docs work.
- Current orchestrator state is intentionally `idle`, not stuck:
  - remaining MFA work there is blocked on the explicit non-live fulfillment
    authorization packet:
    `/Users/subash/dev/GPUasService/.fairway/artifacts/iam-mfa-factor-fulfillment-packet-20260617/nonlive_factor_fulfillment_proof_authorization_decision_9725f0e5.md`

Docs portal first protected internal publication path:

- repo/CI/runbook path is implemented and remains the canonical first publish
  path:
  - hostname: `docs.aicloud.core42.dev`
  - publication track: `internal`
  - protection model: Cloudflare Access one-time email passcodes
  - current coarse allowlist:
    - `@core42.ai`
    - `@g42.ai`
    - `subashram@gmail.com`
- durable sources:
  - `/Users/subash/dev/GPUasService/scripts/ops/docs_portal_publish_cloudflare_pages.sh`
  - `/Users/subash/dev/GPUasService/scripts/ci/docs_portal_static_deploy_preflight.sh`
  - `/Users/subash/dev/GPUasService/doc/operations/Docs_Portal_Static_Cloudflare_Deployment_v1.md`
  - `.gitlab-ci.yml` job `docs_portal_publish_internal`

Important execution boundary:

- this control surface can prove the repo-side publish path, but it cannot
  perform a real Cloudflare upload without external runtime prerequisites;
- current shell readback showed no publish env present;
- local publish still requires:
  - `wrangler` in `PATH`
  - `CLOUDFLARE_ACCOUNT_ID`
  - `CLOUDFLARE_PAGES_PROJECT`
  - `CLOUDFLARE_API_TOKEN`
  - `DOCS_PORTAL_HOSTNAME`
  - `DOCS_PORTAL_PUBLICATION_TRACK`
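
The prerequisite list above could be checked locally with a sketch like the following. It is not the real `docs_portal_static_deploy_preflight.sh`; the function name `missing_publish_prereqs` and its injectable `env`/`which` parameters are hypothetical, and it assumes only the variable names listed above.

```python
import os
import shutil

# Environment variables the log lists as required for a local publish.
REQUIRED_ENV = (
    "CLOUDFLARE_ACCOUNT_ID",
    "CLOUDFLARE_PAGES_PROJECT",
    "CLOUDFLARE_API_TOKEN",
    "DOCS_PORTAL_HOSTNAME",
    "DOCS_PORTAL_PUBLICATION_TRACK",
)


def missing_publish_prereqs(env=None, which=shutil.which):
    """Return a list of missing prerequisites; an empty list means ready."""
    env = os.environ if env is None else env
    # Unset and empty values are both treated as missing.
    missing = [name for name in REQUIRED_ENV if not env.get(name)]
    if which("wrangler") is None:
        missing.append("wrangler (not in PATH)")
    return missing


if __name__ == "__main__":
    for item in missing_publish_prereqs():
        print(f"missing: {item}")
```

Passing `env` and `which` explicitly keeps the check testable without touching the real shell environment.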

MFA screenshot-backed portal pass completed:

- updated guide:
  `/Users/subash/dev/GPUasService/packages/docs/docs/use-gpuaas/mfa-guide/index.mdx`
- updated publication/build metadata page:
  `/Users/subash/dev/GPUasService/packages/docs/docs/reference/portal-build/index.mdx`
- added stable current-kind screenshots:
  - `/Users/subash/dev/GPUasService/packages/docs/static/img/portal/mfa/account-security-mfa-managed-current-kind.png`
  - `/Users/subash/dev/GPUasService/packages/docs/static/img/portal/mfa/account-security-mfa-status-pending-current-kind.png`
  - `/Users/subash/dev/GPUasService/packages/docs/static/img/portal/mfa/provider-managed-setup-after-manage-click-current-kind.png`

The revised MFA guide now covers:

- account-security entry point
- first-factor setup
- protected post-enrollment state
- pending refresh state
- manage-existing-factor handoff
- recovery entry

Validation:

- `git diff --check -- packages/docs/docs/use-gpuaas/mfa-guide/index.mdx packages/docs/docs/reference/portal-build/index.mdx`: pass
- `make docs-portal-check`: pass

### 2026-06-17 first protected internal docs publication is live

First real publication path is now working end to end.

What was created/configured:

- dedicated Pages project: `aicloud-docs`
- production branch on Pages project: `internal`
- custom domain attached from Pages side: `docs.aicloud.core42.dev`
- `core42.dev` DNS updated to:
  - proxied CNAME
  - `docs.aicloud.core42.dev -> aicloud-docs.pages.dev`
- Cloudflare Access app created on the `core42.dev` zone side:
  - `AI Cloud Docs Portal`
- initial Access allow policy created:
  - `@core42.ai`
  - `@g42.ai`
  - `subashram@gmail.com`
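
The intent of the initial allow policy can be expressed as a small sketch. Cloudflare Access performs the real evaluation; the function below only illustrates the rule as recorded here, assuming exact-domain matching plus one named address. `is_allowed` and the constant names are hypothetical.

```python
# Values taken from the allow policy listed above.
ALLOWED_DOMAINS = {"core42.ai", "g42.ai"}
ALLOWED_EMAILS = {"subashram@gmail.com"}


def is_allowed(email):
    """Return True if the address matches the named exception or an allowed domain."""
    email = email.strip().lower()
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        return False  # not a usable address
    # Exact domain match only: subdomains and look-alike suffixes do not pass.
    return email in ALLOWED_EMAILS or domain in ALLOWED_DOMAINS
```

Matching on the exact domain after the last `@` avoids accepting look-alike domains that merely end in an allowed string.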

Important topology finding:

- current working Cloudflare model is split:
  - GPUaaS local `core42.dev` env file has zone/DNS/Access capability but not
    Pages project listing (`403` on Pages project list)
  - Fairway env file has Pages capability and was used for the actual Pages
    project create/deploy path
- this split is now documented in:
  `/Users/subash/dev/GPUasService/doc/operations/Docs_Portal_Static_Cloudflare_Deployment_v1.md`

Publish wrapper hardening completed:

- `scripts/ops/docs_portal_publish_cloudflare_pages.sh`
  now supports:
  - `DOCS_PORTAL_CLOUDFLARE_CREDS_FILE`
  - legacy env names `AccountID` / `APIToken`
  - `npx wrangler` fallback when `wrangler` is not installed globally
- `scripts/ci/docs_portal_static_deploy_preflight.sh`
  now supports the same env-file and legacy-name bridge
- env example updated:
  `/Users/subash/dev/GPUasService/doc/operations/local-dev/docs-portal-cloudflare.env.example`
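
The env-file and legacy-name bridge described above might look roughly like this. It is an illustrative sketch, not the wrapper's actual logic: the mapping of `AccountID`/`APIToken` onto `CLOUDFLARE_ACCOUNT_ID`/`CLOUDFLARE_API_TOKEN`, the simple `KEY=VALUE` file format, and the rule that process environment values win over file values are all assumptions, and the function names are hypothetical.

```python
import shutil

# Assumed mapping from canonical variable names to the legacy names.
LEGACY_NAMES = {
    "CLOUDFLARE_ACCOUNT_ID": "AccountID",
    "CLOUDFLARE_API_TOKEN": "APIToken",
}


def parse_env_file(text):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_credentials(env, creds_text=""):
    """Resolve canonical credentials, falling back to legacy names."""
    merged = {**parse_env_file(creds_text), **env}  # assumption: env wins over file
    resolved = {}
    for canonical, legacy in LEGACY_NAMES.items():
        value = merged.get(canonical) or merged.get(legacy)
        if value:
            resolved[canonical] = value
    return resolved


def wrangler_command(which=shutil.which):
    """Use a global wrangler if present, otherwise fall back to npx."""
    return ["wrangler"] if which("wrangler") else ["npx", "wrangler"]
```

The text read from the file named by `DOCS_PORTAL_CLOUDFLARE_CREDS_FILE` would be passed as `creds_text`, so older env files keep working without being renamed.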

Live publish result:

- deploy evidence:
  `/Users/subash/dev/GPUasService/dist/docs-portal-cloudflare-publish/summary.json`
- result: `pass`
- publication_track: `internal`
- hostname: `docs.aicloud.core42.dev`
- pages project: `aicloud-docs`
- git SHA:
  `9725f0e5236dc0f3f14d9a5c89d15ecd5a64dc1b`
- deployment URL:
  `https://f767fe16.aicloud-docs.pages.dev`
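
A publish summary like the one above can be sanity-checked with a short sketch. The JSON key names used here (`result`, `publication_track`, `hostname`, `git_sha`) are assumptions based on the labels in this entry; the real schema of `summary.json` is not shown in this log, and `check_publish_summary` is a hypothetical helper.

```python
import re


def check_publish_summary(summary, expected_host, expected_track):
    """Return a list of problems found in a publish summary dict."""
    problems = []
    if summary.get("result") != "pass":
        problems.append("result is not pass")
    if summary.get("publication_track") != expected_track:
        problems.append("unexpected publication track")
    if summary.get("hostname") != expected_host:
        problems.append("unexpected hostname")
    # A full git SHA-1 is 40 lowercase hex characters.
    if not re.fullmatch(r"[0-9a-f]{40}", summary.get("git_sha", "")):
        problems.append("git_sha is not a full 40-character SHA")
    return problems


# Values copied from the live publish result recorded above.
recorded = {
    "result": "pass",
    "publication_track": "internal",
    "hostname": "docs.aicloud.core42.dev",
    "git_sha": "9725f0e5236dc0f3f14d9a5c89d15ecd5a64dc1b",
}
print(check_publish_summary(recorded, "docs.aicloud.core42.dev", "internal"))
```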

Verification:

- Pages custom-domain status:
  `active`
- unauthenticated external readback:
  `curl -I https://docs.aicloud.core42.dev/`
  returned Cloudflare Access `302` redirect to the Access login flow
- this is the correct first protected-internal posture
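
The readback above can be turned into a repeatable check. The sketch below classifies a response from its status code and `Location` header; it assumes the Access login flow redirects to a `*.cloudflareaccess.com` host or a `/cdn-cgi/access/` path, which this log does not record explicitly. `looks_access_protected` is a hypothetical helper and performs no network calls itself.

```python
from urllib.parse import urlsplit


def looks_access_protected(status, location):
    """Return True if a response looks like a Cloudflare Access login redirect."""
    if status != 302 or not location:
        return False
    parts = urlsplit(location)
    host = parts.hostname or ""
    # Assumption: Access login lives on a cloudflareaccess.com team domain
    # or under the /cdn-cgi/access/ path.
    return host.endswith(".cloudflareaccess.com") or "/cdn-cgi/access/" in parts.path
```

Feeding it the status and `Location` value from `curl -I` output keeps the check independent of any particular HTTP client.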

### 2026-06-17 shared-service deep-page expansion

Continued the architecture-depth pass so the portal no longer stops at one
shared-services overview plus only IAM/Billing detail.

New portal-native shared-service engineering pages added:

- `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/shared-services/audit-evidence-status-ops.mdx`
- `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/shared-services/policy-quota-entitlements.mdx`
- `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/shared-services/registry-artifacts-trust.mdx`
- `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/shared-services/secrets-pki-runtime-trust.mdx`
- `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/shared-services/notification-portal-surfaces.mdx`

Supporting portal wiring updated:

- `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/shared-services/index.mdx`
- `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/detailed-design-index/index.mdx`
- `/Users/subash/dev/GPUasService/packages/docs/sidebars.ts`
- `/Users/subash/dev/GPUasService/packages/docs/docs/index.mdx`

What this pass changed:

- shared-services now reads like a real platform-service catalog rather than a
  thin conceptual summary;
- the detailed-design index now explicitly points reviewers from portal summary
  pages into the deep shared-service packet family;
- user-facing portal labels moved further toward `AI Cloud` naming:
  - portal landing page title now says `AI Cloud Documentation`;
  - sidebar labels now say `Use AI Cloud` and `Build on AI Cloud`.

Validation:

- `git diff --check`: pass
- `pnpm build` in `/Users/subash/dev/GPUasService/packages/docs`: pass

Remaining high-value portal gaps after this pass:

- product/user-side guides still need to feel less prose-heavy and more
  workflow/task-first;
- service-deep-page coverage still needs a separate node-agent / runtime-app /
  provider-lifecycle linking pass so shared services and execution surfaces read
  as one coherent system;
- outward/product-facing naming still needs a controlled sweep beyond the
  landing page and sidebar labels;
- persona navigation likely needs another pass so product/security/architecture
  readers can jump to “what this proves” views faster than they can today.

### 2026-06-17 canonical-source rendering and architecture ladder

The portal no longer has to choose between polished pages and exact-source
traceability.

What changed:

- canonical-source reader now renders Markdown inside the portal instead of
  only showing raw `<pre>` output;
- raw file access remains available for exact-source verification, including
  frontmatter and unsupported syntax;
- added a missing architecture front door:
  `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/overall-platform-architecture/index.mdx`
- rewired the architecture section so the intended reading ladder is explicit:
  principles -> overall platform architecture -> system overview -> code/layer
  model -> shared-service deep pages.
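
One way to picture the split between the rendered view and raw access is a frontmatter/body separation step before rendering. This is only a sketch of the idea in Python; the actual reader is `source-doc.tsx` with `react-markdown`, and whether it separates frontmatter this way is an assumption. `split_frontmatter` is a hypothetical helper.

```python
def split_frontmatter(text):
    """Split a leading '---' delimited frontmatter block from the Markdown body.

    Returns (frontmatter, body). If no complete block is found, the whole
    text is returned as the body so nothing is lost.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    return "", text  # unterminated block: treat everything as body
```

The body would go to the Markdown renderer, while the raw view keeps the original text unchanged for exact-source verification.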

Files changed in this slice:

- `/Users/subash/dev/GPUasService/packages/docs/src/pages/reference/source-doc.tsx`
- `/Users/subash/dev/GPUasService/packages/docs/src/css/custom.css`
- `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/overall-platform-architecture/index.mdx`
- `/Users/subash/dev/GPUasService/packages/docs/docs/architecture/index.mdx`
- `/Users/subash/dev/GPUasService/packages/docs/sidebars.ts`
- `/Users/subash/dev/GPUasService/packages/docs/package.json`

Docs workspace/runtime note:

- docs workspace was normalized onto a local writable pnpm store:
  `/Users/subash/dev/.pnpm-store`
- renderer dependencies added:
  - `react-markdown`
  - `remark-gfm`

Validation:

- `git diff --check`: pass
- `CI=true pnpm build` in `/Users/subash/dev/GPUasService/packages/docs`: pass

Next documentation backlog after republish:

- stronger deep engineering packets with ER/sequence/state detail per service;
- screenshot-backed guides for broader user/admin/operator flows, not just MFA;
- cleaner product pages that move explanatory prose behind better navigation and
  help affordances;
- possible rendered treatment for more canonical artifact types beyond Markdown.

### 2026-06-17 portal backlog closeout pass

Continued the documentation closeout instead of stopping at publish.

Completed in this pass:

- added portal quality gate page:
  `/Users/subash/dev/GPUasService/packages/docs/docs/portal-roadmap/completeness-review/index.mdx`
- added operator runtime/version readback model:
  `/Users/subash/dev/GPUasService/packages/docs/docs/operators/service-version-readback/index.mdx`
- expanded the end-user guide into a task-coverage matrix with screenshot/UAT
  expectations:
  `/Users/subash/dev/GPUasService/packages/docs/docs/use-gpuaas/end-user-guide/index.mdx`
- added explicit glossary distinction:
  - `AI Cloud` = product/platform name
  - `GPUaaS` = GPU capacity/runtime product domain where technically precise
- improved findability of tenant/department/project/resource hierarchy from the
  platform overview;
- swept high-visibility pages so product-facing links now say `Use AI Cloud`
  and `Build on AI Cloud` while preserving canonical/source references and
  GPU-specific resource-family names.

Still missing after this pass:

- screenshot-backed guides for launch/connect/storage/billing/tenant-admin and
  troubleshooting;
- service-level ER/sequence/state diagrams for IAM, billing, app runtime, node
  agent, terminal, and provider lifecycle;
- concrete implementation of the service-version/readback API/UX;
- dedicated Token Factory builder path;
- demo/staging setup walkthrough with current environment screenshots.
