# Gap Analysis: Prototype -> Production-Ready Platform

## Status
- Historical comparison artifact retained for context.
- Many gaps listed here have been closed by the current implementation.
- Use this file as a prototype-to-target delta reference, not as the live implementation status source.
- For current state, use `Execution_Progress.md`, `Implementation_Roadmap.md`, and `governance/Agent_Work_Queue.yaml`.

## A. Architecture and Boundaries
- Current: a single large backend module mixes auth, billing, provisioning, storage, Stripe integration, and WebSocket handling.
- Gap: define explicit domain boundaries and split the module into separate services/modules per domain.

## B. Data Durability and Consistency
- Current: JSON file persistence with backup snapshots.
- Gap: migrate to relational DB with transactions, constraints, indexes, and migrations.

## C. Security
- Current:
  - localStorage token storage
  - token passed as a URL query parameter for SSH key download
  - plaintext credentials in env files
- Gap:
  - secure session strategy
  - token scope and TTL tightening
  - KMS/secret manager and key rotation
  - audited admin/provisioning actions
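
A minimal sketch of what "token scope and TTL tightening" could look like for a single-purpose token such as the SSH key download. The claim names (`sub`, `scopes`, `iat`, `exp`), the scope string `ssh-key:download`, and both function names are assumptions for illustration. Signature verification is assumed to happen before these checks and is not shown.

```javascript
// Hypothetical claims helpers; names and claim layout are illustrative only.
// Assumes the token signature has already been verified elsewhere.

// Build claims that expire ttlSec seconds after nowSec (Unix seconds).
function issueClaims(userId, scopes, ttlSec, nowSec) {
  return { sub: userId, scopes: [...scopes], iat: nowSec, exp: nowSec + ttlSec };
}

// Accept the claims only if they are unexpired and carry the required scope.
function checkClaims(claims, requiredScope, nowSec) {
  if (typeof claims.exp !== 'number' || nowSec >= claims.exp) {
    return { ok: false, reason: 'expired' };
  }
  if (!Array.isArray(claims.scopes) || !claims.scopes.includes(requiredScope)) {
    return { ok: false, reason: 'missing scope' };
  }
  return { ok: true };
}

// A 60-second token that can only download an SSH key.
const downloadClaims = issueClaims('u1', ['ssh-key:download'], 60, 1000);
console.log(checkClaims(downloadClaims, 'ssh-key:download', 1030)); // { ok: true }
console.log(checkClaims(downloadClaims, 'billing:write', 1030).reason); // missing scope
console.log(checkClaims(downloadClaims, 'ssh-key:download', 1060).reason); // expired
```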

## D. Billing and Payments Integrity
- Current:
  - periodic cost deduction mutates user balance directly
  - refund route is internal credit, not true Stripe refund
- Gap:
  - immutable ledger-based accounting
  - reconciliation jobs
  - strict idempotency and compensation paths
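
The ledger and idempotency items above can be sketched as an append-only list of entries with the balance derived by summation, instead of mutating a stored balance. This is an in-memory illustration under assumed names (`Ledger`, `append`, `balanceOf`, `idempotencyKey`); a production version would live in the relational store with a unique constraint on the key.

```javascript
// Hypothetical append-only ledger; all names are illustrative, not from the prototype.
class Ledger {
  constructor() {
    this.entries = [];         // append-only: entries are never edited or removed
    this.seenKeys = new Set(); // idempotency keys that have already been applied
  }

  // Record a credit (positive) or debit (negative) in integer minor units (e.g. cents).
  // Returns false when the idempotency key was already applied.
  append({ userId, amountCents, reason, idempotencyKey }) {
    if (!Number.isInteger(amountCents) || amountCents === 0) {
      throw new Error('amountCents must be a non-zero integer');
    }
    if (this.seenKeys.has(idempotencyKey)) {
      return false; // duplicate delivery (e.g. a retried job): ignore it
    }
    this.seenKeys.add(idempotencyKey);
    this.entries.push(Object.freeze({ userId, amountCents, reason, idempotencyKey }));
    return true;
  }

  // The balance is always derived from the entries, never stored separately.
  balanceOf(userId) {
    return this.entries
      .filter((entry) => entry.userId === userId)
      .reduce((sum, entry) => sum + entry.amountCents, 0);
  }
}

const ledger = new Ledger();
ledger.append({ userId: 'u1', amountCents: 1000, reason: 'top-up', idempotencyKey: 'topup-1' });
ledger.append({ userId: 'u1', amountCents: -250, reason: 'usage', idempotencyKey: 'usage-h1' });
ledger.append({ userId: 'u1', amountCents: -250, reason: 'usage', idempotencyKey: 'usage-h1' }); // retry, ignored
console.log(ledger.balanceOf('u1')); // 750
```

Because a correction is just another entry (for example a compensating credit), reconciliation jobs can compare the summed entries against provider records without guessing at past balance mutations.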

## E. Realtime and WS Contract
- Current:
  - one WS endpoint used for terminal
  - the frontend also tries to use the same endpoint for global notifications, without an `allocId`
- Gap:
  - separate channels/protocols for terminal and notifications
  - explicit schemas and heartbeat/reconnect strategy
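
One way to make the channel split and schemas explicit is a small message envelope validated on receipt, plus a capped exponential backoff for reconnects. The envelope shape (`channel`, `type`, `allocId`, `payload`), the channel and type names, and the backoff defaults are all assumptions for illustration. Jitter is omitted to keep the sketch deterministic.

```javascript
// Hypothetical message envelope: { channel, type, allocId?, payload? }.
// Channel and type names are illustrative, not taken from the prototype.
const CHANNELS = new Map([
  ['terminal', { requiresAllocId: true, types: new Set(['input', 'output', 'resize', 'ping', 'pong']) }],
  ['notifications', { requiresAllocId: false, types: new Set(['event', 'ping', 'pong']) }],
]);

// Parse and validate a raw frame; returns { ok, message } or { ok: false, error }.
function parseMessage(raw) {
  let msg;
  try {
    msg = JSON.parse(raw);
  } catch {
    return { ok: false, error: 'invalid JSON' };
  }
  if (msg === null || typeof msg !== 'object' || Array.isArray(msg)) {
    return { ok: false, error: 'message must be an object' };
  }
  const spec = CHANNELS.get(msg.channel);
  if (!spec) return { ok: false, error: 'unknown channel' };
  if (!spec.types.has(msg.type)) return { ok: false, error: 'unknown type for channel' };
  const hasAllocId = typeof msg.allocId === 'string' && msg.allocId.length > 0;
  if (spec.requiresAllocId && !hasAllocId) return { ok: false, error: 'allocId required' };
  if (!spec.requiresAllocId && 'allocId' in msg) return { ok: false, error: 'allocId not allowed' };
  return { ok: true, message: msg };
}

// Capped exponential backoff for reconnect attempts (attempt starts at 0).
function reconnectDelayMs(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

console.log(parseMessage('{"channel":"terminal","type":"input"}').error); // allocId required
console.log(parseMessage('{"channel":"notifications","type":"ping"}').ok); // true
console.log(reconnectDelayMs(3)); // 4000
```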

## F. Provisioning Reliability
- Current:
  - DB updates and SSH actions can diverge on partial failure
- Gap:
  - state machine + job queue + retries + compensation rules
  - idempotent provisioning commands
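
The state machine, retry, and compensation items above can be sketched as a transition table with a bounded retry count. State names, event names, `MAX_ATTEMPTS`, and the job shape are assumptions for illustration. The job queue and the actual SSH side effects are out of scope here.

```javascript
// Hypothetical provisioning state machine; states and events are illustrative.
// Keys are "<state>:<event>", values are the resulting state.
const TRANSITIONS = new Map([
  ['requested:start', 'provisioning'],
  ['provisioning:succeed', 'active'],
  ['provisioning:fail', 'failed'],
  ['failed:retry', 'provisioning'],
  ['failed:compensate', 'rolled_back'],
]);

const MAX_ATTEMPTS = 3; // assumed limit on entries into "provisioning"

// Return a new job object after applying the event; throw on illegal moves.
function nextState(job, event) {
  const to = TRANSITIONS.get(`${job.state}:${event}`);
  if (!to) {
    throw new Error(`illegal transition: ${job.state} on ${event}`);
  }
  if (event === 'retry' && job.attempts >= MAX_ATTEMPTS) {
    throw new Error('retry limit reached; compensate instead');
  }
  const attempts = to === 'provisioning' ? job.attempts + 1 : job.attempts;
  return { ...job, state: to, attempts };
}

let job = { id: 'alloc-1', state: 'requested', attempts: 0 };
for (const event of ['start', 'fail', 'retry', 'succeed']) {
  job = nextState(job, event);
}
console.log(job.state, job.attempts); // active 2
```

Persisting the state and the side effect's outcome in one transaction per transition is what keeps stored state and node state from diverging on partial failure. The table itself only rules out illegal moves.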

## G. Observability and Operations
- Current: console logging and `/healthz` only.
- Gap:
  - structured logs, metrics, tracing
  - alerting and SLOs
  - audit event streams
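
As a small illustration of the structured-logs item, the sketch below emits one JSON object per line in place of free-form console output. The field names (`ts`, `level`, `event`) and the event name are assumptions. A real deployment would more likely adopt an established logging library than this helper.

```javascript
// Hypothetical structured log formatter; field names are illustrative.
// Reserved keys (ts, level, event) are written last so callers cannot override them.
function formatLog(level, event, fields = {}, now = new Date()) {
  return JSON.stringify({ ...fields, ts: now.toISOString(), level, event });
}

console.log(formatLog('info', 'allocation.provisioned', { allocId: 'a1', durationMs: 1234 }));
```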

## H. UX and Product Completeness
- Current:
  - scheduler and storage purchase flows are placeholders
  - a single long UI component (`App.jsx`) with mixed concerns
- Gap:
  - finalized UX specs and reusable UI architecture
  - end-to-end states and copy standards

## I. Code-Verified Prototype Defects (Do Not Replicate)
- Duplicate function definition:
  - `releaseLinuxUserAccess` is defined twice in `backend/src/server.js` (shadowing risk, migration noise).
- Terminal privilege model coupling:
  - terminal access depends on shared node-admin SSH credentials and `sudo -iu <user>` behavior on nodes.
  - this creates operational fragility around credential rotation and sudo policy drift.
- Synchronous file I/O on the data access path:
  - prototype `getDb()` reads the JSON file synchronously on request paths.
  - any equivalent full-state-per-request pattern is non-viable in production.
- API payload contamination:
  - SKU API responses currently leak UI styling fields (`color`, `bg`), forcing weak schema contracts.
- Naming/behavior mismatch:
  - prototype admin `refund` route performs internal balance credit, not provider-side Stripe refund.
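
For the API payload contamination defect, one possible guard is an explicit allow-list mapping from the internal SKU record to the API response, so presentation fields such as `color` and `bg` cannot leak by default. Only `color` and `bg` come from the prototype; the other field names and `toSkuDto` are hypothetical.

```javascript
// Hypothetical allow-list of fields exposed by the SKU API.
// "id", "name", and "pricePerHourCents" are illustrative field names.
const SKU_PUBLIC_FIELDS = ['id', 'name', 'pricePerHourCents'];

// Copy only allow-listed own properties; anything else (e.g. color, bg) is dropped.
function toSkuDto(sku) {
  const dto = {};
  for (const field of SKU_PUBLIC_FIELDS) {
    if (Object.prototype.hasOwnProperty.call(sku, field)) {
      dto[field] = sku[field];
    }
  }
  return dto;
}

const internalSku = { id: 'gpu-small', name: 'GPU Small', pricePerHourCents: 120, color: '#0af', bg: '#001' };
console.log(toSkuDto(internalSku)); // { id: 'gpu-small', name: 'GPU Small', pricePerHourCents: 120 }
```

An allow-list fails closed: a styling field added to the internal record later stays out of the response until someone deliberately adds it to the contract.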

## Recommended Hardening Order
1. Data model + Postgres migration + ledger model.
2. Provisioning/billing state machines and background workers.
3. Auth/session/security hardening.
4. WS split and realtime contract cleanup.
5. Observability + audit logging.
6. UX cleanup and production-ready interaction patterns.
