# Platform Code and Deployment Architecture v1

Status: active target architecture
Owner: Platform Architecture
Last updated: 2026-06-01

## Purpose

Define how the codebase and deployment topology should evolve as GPUaaS becomes
the first product on a broader AI Factory platform.

This document is the bridge between:

- `Platform_Shared_Services_Model_v2.md`
- `AI_Factory_Production_Readiness_Gap_Portfolio_v1.md`
- `Monorepo_Structure.md`
- `API_Domain_Authoring_Model_v1.md`
- `Frontend_Surface_Architecture_Work_Plan_v1.md`

The goal is to make implementation sequencing explicit before production-readiness
work creates permanent GPUaaS-specific versions of platform services.

## Core Principle

Separate ownership before separating deployment.

The near-term target is a modular monolith with strong package, route, contract,
schema, and frontend boundaries. Physical microservice extraction happens only
after the contract is stable and the operational reason is real.

## Target Code Ownership Model

The repo should evolve toward three explicit code layers:

```text
packages/
  platform/                 # reusable platform services
  products/                 # product-owned domains
  shared/                   # technical primitives only
```

### Platform services

Platform services own reusable authority, evidence, trust, money, policy, and
operating posture.

Target packages:

```text
packages/platform/
  iam/                      # identity, memberships, scopes, service accounts
  billing/                  # usage units, rating, ledger, balances
  payments/                 # checkout, webhook, refund/reconcile
  audit/                    # audit action registry and immutable audit writes
  evidence/                 # release/UAT/security evidence bundles
  statusops/                # health, incidents, maintenance, release posture
  notification/             # templates, preferences, dispatch intent
  registry/                 # product/scope/usage/audit/template registries
  artifacts/                # artifact trust, promotion, SBOM/provenance refs
  secrets/                  # secret purpose, credential custody, rotation hooks
  policy/                   # entitlements, quotas, feature flags, reserve policy
```

Platform services may initially share the same process and database. Their
ownership boundary is still real:

- each platform package owns its schema tables or read models;
- products call platform interfaces or APIs, not platform tables;
- platform events use owned subject families;
- platform route files are domain-owned;
- platform UI surfaces live under the platform frontend modules.

### Product services

Product services own customer-facing domain semantics.

Target packages:

```text
packages/products/
  gpuaas/
    inventory/              # nodes, SKUs, capacity, reserve/ring metadata
    provisioning/           # allocations, node lifecycle, MAAS/reimage
    terminal/               # allocation terminal/session runtime
    storage/                # product storage surface and grants
    nodeagent/              # node-agent task catalog contracts

  appplatform/
    catalog/                # app catalog, manifests, versions
    runtime/                # app instances, shared runtimes, operations
    artifacts/              # app artifact publish/promote adapter
    sdk/                    # developer-facing examples and SDK validation

  tokenfactory/
    gateway/                # product-specific inference gateway
    routing/                # model routing and policy
    analytics/              # token/request analytics projections
```

Products compose platform services for IAM, billing, audit, evidence, status,
notification, secrets, artifact trust, and policy.

`packages/products/appplatform/sdk` is the source and validation home for the
App SDK. Published developer artifacts are separate release outputs, such as a
future TypeScript package or Go module, versioned by an SDK publish flow
(`scripts/sdk-publish` or equivalent; TODO). Developer-facing docs for the
published SDK live in the Docusaurus portal, not only beside the internal source
package.

### Shared technical primitives

`packages/shared` should remain boring and technical:

```text
packages/shared/
  errors/
  middleware/
  events/
  outbox/
  db/
  pki/
  readcache/
  storagepath/
  gen/
```

Rules:

- `shared` must not own business policy.
- `shared` must not own product semantics.
- `shared` must not own registries.
- `shared` can define interfaces and transport helpers used by platform and
  product packages.

## Current-To-Target Package Mapping

L3 implementation status: the former `packages/services/*` tree has been
retired, and the active code anchors are now `packages/platform/*` and
`packages/products/*`. The rows below preserve the current-to-target decision
history. Transitional implementation internals live under `legacyimpl`
subpackages only while an owning package still carries moved legacy behavior
behind its owner-facing facade.

| Source package | Target package | Notes |
|---|---|---|
| former `packages/services/auth` | `packages/platform/auth`, `packages/platform/iam` | Auth/session bootstrap lives in platform auth; IAM owns scope/authorization facade. |
| former `packages/services/billing` | `packages/platform/billing` | Keep ledger invariants; add usage-unit registry. |
| former `packages/services/payments` | `packages/platform/payments` | Preserve Stripe raw-body/webhook boundary. |
| former `packages/services/notification` | `packages/platform/notification` | Add templates/preferences and offline delivery. |
| former `packages/services/releases` | `packages/platform/releases`, with future statusops/evidence split if needed | Preserve release manifest behavior; split only with a concrete status/evidence contract. |
| audit helpers in API/services | `packages/platform/audit` | Centralize audit action registry and query/read model. |
| release/UAT/security evidence scripts | `packages/platform/evidence` plus scripts/CI adapters | Evidence package owns schema/contract; scripts produce evidence inputs. |
| `packages/shared/policy` | `packages/platform/policy` facade with `shared` client interface | Keep client interface stable while moving policy authority to platform. |
| former `packages/services/inventory` | `packages/products/gpuaas/inventory` | GPUaaS-owned capacity and SKU semantics. |
| former `packages/services/provisioning` | `packages/products/gpuaas/provisioning` | Allocation/node lifecycle remains product-owned. |
| former `packages/services/terminal` | `packages/products/gpuaas/terminal` | Allocation terminal is product-owned; evidence/audit hooks are platform-owned. |
| former `packages/services/storage` | `packages/platform/storage` | Current L3 anchor is platform storage; revisit only when product-neutral versus GPUaaS-specific semantics are settled. |
| former `packages/services/appruntime` | `packages/products/appplatform/runtime` | App Platform is a product domain that composes shared services. |

## Facade Pattern

A facade is a new owner-facing package with the target domain name and target
contract. It may delegate to the existing package while migration is in
progress, but callers depend on the facade contract rather than the old package
shape. The facade should prefer new target-domain request/response types and
adapter logic that maps to the old implementation; it should not merely
re-export old types under a new directory.

Example IAM facade shape:

```text
packages/platform/iam/
  service.go          # target IAM interface and request/response types
  adapter_auth.go     # adapter over packages/platform/auth where needed
  errors.go           # IAM-owned sentinel errors
```

`cmd/api/routes_platform_iam.go` calls `packages/platform/iam`. During migration
the adapter can delegate to `packages/platform/auth`; once callers and tests are
stable, auth internals can be reshaped without changing route or product callers.

## Route Architecture

`cmd/api` remains the public BFF while ownership is split by route module.
Route handler bodies stay in `cmd/api` initially. Business logic moves into
platform/product packages; over time `cmd/api` becomes wiring plus thin handler
shells over those packages rather than the owner of domain behavior.

Target route files:

```text
cmd/api/
  routes_platform_iam.go
  routes_platform_billing.go
  routes_platform_payments.go
  routes_platform_audit.go
  routes_platform_evidence.go
  routes_platform_statusops.go
  routes_platform_registry.go
  routes_platform_artifacts.go
  routes_platform_policy.go

  routes_gpuaas_inventory.go
  routes_gpuaas_allocations.go
  routes_gpuaas_nodes.go
  routes_gpuaas_maas.go
  routes_gpuaas_terminal.go
  routes_gpuaas_storage.go

  routes_appplatform_catalog.go
  routes_appplatform_runtime.go
  routes_appplatform_developer.go

  routes_tokenfactory_gateway.go
  routes_tokenfactory_analytics.go
```

Rules:

1. New platform shared-service routes use `routes_platform_*`.
2. New GPUaaS product routes use `routes_gpuaas_*`.
3. App Platform routes use `routes_appplatform_*`.
4. Temporary `/api/v1/v3/*` read models stay isolated until they graduate to a
   durable domain route.
5. No new production route goes into a catch-all registrar unless it is a
   migration bridge with an explicit removal condition.
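The naming rules above are mechanical enough to check in CI. The sketch below derives an owner from a route file name and lists `routes_*.go` files that match no owner prefix, which would surface catch-all registrars for review. The prefix table mirrors the target route file list; the function names and how the file list is gathered are assumptions.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// routeOwners mirrors the target route file prefixes (illustrative mapping).
var routeOwners = []struct{ prefix, owner string }{
	{"routes_platform_", "platform"},
	{"routes_gpuaas_", "gpuaas"},
	{"routes_appplatform_", "appplatform"},
	{"routes_tokenfactory_", "tokenfactory"},
}

// RouteOwner returns the owner of a cmd/api route file, or "" if the name does
// not follow the domain-owned naming rules.
func RouteOwner(path string) string {
	name := filepath.Base(path)
	if !strings.HasSuffix(name, ".go") {
		return ""
	}
	for _, ro := range routeOwners {
		// Require a non-empty domain suffix after the prefix, e.g. "iam".
		if strings.HasPrefix(name, ro.prefix) && len(name) > len(ro.prefix)+len(".go") {
			return ro.owner
		}
	}
	return ""
}

// UnownedRouteFiles reports routes_*.go files that match no owner prefix.
// Non-route files such as main.go are ignored.
func UnownedRouteFiles(paths []string) []string {
	var out []string
	for _, p := range paths {
		if strings.HasPrefix(filepath.Base(p), "routes_") && RouteOwner(p) == "" {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	files := []string{"cmd/api/routes_gpuaas_terminal.go", "cmd/api/routes_misc.go"}
	for _, f := range files {
		fmt.Printf("%s -> %q\n", f, RouteOwner(f))
	}
	fmt.Println("unowned:", UnownedRouteFiles(files))
}
```

A migration bridge that is allowed to stay unowned would need an explicit allowlist entry carrying its removal condition; that allowlist is not modeled here.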

## Contract Architecture

The public API remains one canonical OpenAPI artifact and one canonical
AsyncAPI artifact.

Domain ownership moves into fragments:

```text
doc/api/openapi/domains/
  platform-iam.yaml
  platform-billing.yaml
  platform-evidence.yaml
  platform-statusops.yaml
  gpuaas-provisioning.yaml
  gpuaas-inventory.yaml
  appplatform-runtime.yaml
  tokenfactory-gateway.yaml

doc/api/asyncapi/domains/
  platform-evidence.yaml
  platform-billing.yaml
  platform-notification.yaml
  gpuaas-provisioning.yaml
  appplatform-runtime.yaml
```

Contract rules:

- one domain owns each path prefix or event subject family;
- shared schemas are reviewed by platform maintainers;
- generated artifacts remain committed;
- product services cannot add platform-owned fields without updating the
  platform contract/registry first.
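"One domain owns each path prefix" can be checked before fragments are bundled into the canonical OpenAPI artifact. This sketch assumes each fragment declares its owned prefixes (how that declaration is expressed in the YAML is not specified here) and takes a strict reading: two different domains conflict if their prefixes are equal or one is a path-segment parent of the other. The example paths are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Claim is one path prefix declared by a domain fragment, e.g. platform-iam.yaml.
type Claim struct {
	Domain string
	Prefix string
}

// overlaps reports whether prefixes are equal or one is a segment parent of the other.
// "/api/v1/node" does not overlap "/api/v1/nodes" because segments are compared.
func overlaps(a, b string) bool {
	a, b = strings.TrimSuffix(a, "/"), strings.TrimSuffix(b, "/")
	if a == b {
		return true
	}
	return strings.HasPrefix(b, a+"/") || strings.HasPrefix(a, b+"/")
}

// Conflicts lists overlapping prefixes claimed by different domains, sorted.
func Conflicts(claims []Claim) []string {
	var out []string
	for i := 0; i < len(claims); i++ {
		for j := i + 1; j < len(claims); j++ {
			ci, cj := claims[i], claims[j]
			if ci.Domain != cj.Domain && overlaps(ci.Prefix, cj.Prefix) {
				out = append(out, fmt.Sprintf("%s (%s) vs %s (%s)", ci.Prefix, ci.Domain, cj.Prefix, cj.Domain))
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	claims := []Claim{
		{"platform-iam", "/api/v1/iam"},
		{"gpuaas-inventory", "/api/v1/nodes"},
		{"gpuaas-provisioning", "/api/v1/nodes/allocations"},
	}
	for _, c := range Conflicts(claims) {
		fmt.Println("conflict:", c)
	}
}
```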

## Schema Ownership

The database can remain physically shared during the modular-monolith phase, but
schema ownership must be explicit.

Suggested naming:

```text
platform_iam_*
platform_billing_*
platform_audit_*
platform_evidence_*
platform_status_*
platform_registry_*
platform_artifact_*

gpuaas_inventory_*
gpuaas_allocation_*
gpuaas_node_*
gpuaas_terminal_*

appplatform_catalog_*
appplatform_runtime_*
```

Rules:

1. A package queries only its owned tables.
2. Cross-domain reads use APIs, explicit read models, or events.
3. Shared read models are owned by the domain that publishes them.
4. Redis/read-cache keys are performance artifacts, not ownership boundaries.
5. New production tables should include owner/domain naming even before physical
   database extraction.

Existing tables can be migrated gradually. Do not rename stable tables purely
for aesthetics during production-readiness work.
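Rule 1 (a package queries only its owned tables) can be checked against the suggested naming. In this sketch each package declares the prefixes it owns; any owner-prefixed table it queries outside that set is a violation, while unprefixed legacy tables are skipped because existing tables migrate gradually. The declaration format, the assignment of `gpuaas_allocation_*` and `gpuaas_node_*` to provisioning, and the table names are assumptions; extracting the queried tables from SQL is out of scope.

```go
package main

import (
	"fmt"
	"strings"
)

// ownerPrefixes follows the suggested naming; tables matching none of them are
// treated as unprefixed legacy tables.
var ownerPrefixes = []string{
	"platform_iam_", "platform_billing_", "platform_audit_", "platform_evidence_",
	"platform_status_", "platform_registry_", "platform_artifact_",
	"gpuaas_inventory_", "gpuaas_allocation_", "gpuaas_node_", "gpuaas_terminal_",
	"appplatform_catalog_", "appplatform_runtime_",
}

// Ownership declares which table prefixes a package owns (hypothetical format,
// e.g. generated from the schema ownership map).
type Ownership struct {
	Package  string
	Prefixes []string
}

func hasPrefix(table string, prefixes []string) bool {
	for _, p := range prefixes {
		if strings.HasPrefix(table, p) {
			return true
		}
	}
	return false
}

// Violations returns owner-prefixed tables the package queries but does not own.
// Those reads should go through APIs, published read models, or events instead.
func Violations(o Ownership, queried []string) []string {
	var out []string
	for _, t := range queried {
		if hasPrefix(t, ownerPrefixes) && !hasPrefix(t, o.Prefixes) {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	prov := Ownership{
		Package:  "packages/products/gpuaas/provisioning",
		Prefixes: []string{"gpuaas_allocation_", "gpuaas_node_"},
	}
	queried := []string{"gpuaas_allocation_leases", "platform_billing_ledger_entries", "allocations"}
	fmt.Println(Violations(prov, queried))
}
```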

## Event Ownership

Events should follow the same domain split:

```text
platform.iam.*
platform.billing.*
platform.audit.*
platform.evidence.*
platform.status.*
platform.notification.*
platform.artifact.*

gpuaas.provisioning.*
gpuaas.inventory.*
gpuaas.terminal.*

appplatform.runtime.*
appplatform.catalog.*
tokenfactory.gateway.*
```

Rules:

- HTTP handlers write outbox rows, never publish directly.
- Event payloads are versioned.
- Events carry `correlation_id`.
- Events that affect billing, authorization, audit, or customer-visible status
  snapshot the relevant registry version.

## Worker And Binary Topology

### Phase 0 - Current modular monolith

```text
cmd/api
cmd/billing-worker
cmd/provisioning-worker
cmd/webhook-worker
cmd/notification-relay
cmd/outbox-relay
cmd/terminal-gateway
cmd/node-agent
```

Use this phase to establish package, route, schema, and contract boundaries.

### Phase 1 - Platform worker clarification

Add or rename workers only where the ownership is already clear:

```text
cmd/platform-evidence-worker      # evidence bundle assembly/import
cmd/platform-status-worker        # health/release posture snapshots
cmd/platform-notification-worker  # durable dispatch, email/offline channels
cmd/platform-artifact-worker      # SBOM/signature/provenance verification
```

Existing workers may remain while code moves behind platform packages.

### Phase 2 - Extracted service binaries

Only after contracts and SLOs justify extraction:

```text
cmd/platform-iam
cmd/platform-billing
cmd/platform-evidence
cmd/platform-statusops
cmd/platform-notification

cmd/gpuaas-provisioning
cmd/gpuaas-inventory
cmd/appplatform-runtime
cmd/tokenfactory-gateway
```

Extraction gates:

1. stable OpenAPI/AsyncAPI ownership;
2. no direct table access from consumers;
3. service auth pattern exists;
4. degradation behavior is documented;
5. SLO or compliance reason exists;
6. migration/backfill plan exists;
7. local/dev/test topology remains easy enough for active development.
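The gates are conjunctive: a candidate is ready only when every gate is met. A small sketch of that check, assuming readiness is tracked per candidate as a set of met gates (the representation and the `MissingGates` name are hypothetical), keeps the report in the order listed above.

```go
package main

import "fmt"

// gates mirrors the extraction gates above, in document order.
var gates = []string{
	"stable OpenAPI/AsyncAPI ownership",
	"no direct table access from consumers",
	"service auth pattern exists",
	"degradation behavior documented",
	"SLO or compliance reason exists",
	"migration/backfill plan exists",
	"local/dev/test topology stays workable",
}

// MissingGates returns unmet gates in document order; extraction waits until empty.
func MissingGates(met map[string]bool) []string {
	var missing []string
	for _, g := range gates {
		if !met[g] {
			missing = append(missing, g)
		}
	}
	return missing
}

func main() {
	met := map[string]bool{}
	for _, g := range gates[:3] {
		met[g] = true
	}
	missing := MissingGates(met)
	fmt.Printf("ready=%v missing=%d\n", len(missing) == 0, len(missing))
	for _, g := range missing {
		fmt.Println("  -", g)
	}
}
```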

## Deployment Architecture

### Near-term deployment

```text
Edge/WAF/API gateway
  -> cmd/api
       -> Postgres
       -> Redis
       -> NATS JetStream
       -> Temporal
       -> Keycloak

Workers:
  billing-worker
  provisioning-worker
  webhook-worker
  notification-relay
  outbox-relay
  terminal-gateway

Node plane:
  node-agent on GPU hosts
```

Near-term deployment keeps operational surface area low while production
readiness focuses on release evidence, environment separation, gate
enforcement, idempotency, DLQ, terminal compliance, and capacity/rings.

### Target modular deployment

```text
Edge/WAF/API gateway
  -> BFF/API facade
       -> platform services
       -> product services

Platform services:
  IAM / Access
  Billing / Payments
  Audit / Evidence
  Status / Ops
  Notification
  Registry / Artifacts
  Policy / Entitlements
  Secrets / PKI integration

Product services:
  GPUaaS inventory/provisioning/terminal/storage
  App Platform catalog/runtime/developer
  Token Factory gateway/routing/analytics

Shared infrastructure:
  Postgres or service-owned databases
  Redis/read cache
  NATS JetStream
  Temporal
  object storage
  observability stack
```

Do not require all platform services to become separate deployables at once.
Different services extract on different triggers.

## Frontend Architecture

Frontend code should mirror ownership without exposing backend package names.

Target shape:

```text
packages/web/src/
  shared/
    ui/
    data/
    auth/
    realtime/
    telemetry/
    i18n/

  platform/
    evidence/
    statusops/
    iam/
    finance/
    config/
    artifacts/

  products/
    gpuaas/
      workloads/
      compute/
      storage/
      terminal/
    appplatform/
      catalog/
      launch/
      developer/
    tokenfactory/
      endpoints/
      usage/

  shell/
    navigation/
    layouts/
```

Rules:

1. `/platform/*` surfaces use platform frontend modules.
2. GPUaaS user surfaces use `products/gpuaas/*`.
3. App SDK/developer surfaces use `products/appplatform/*` unless the concern is
   product-neutral developer portal infrastructure.
4. Shared frontend code must be generic UI/data/auth/telemetry/i18n plumbing,
   not business policy.
5. New production-readiness surfaces use page contracts, managed Playwright
   evidence, and the V3 workbench/page-family model.

## Import Rules

Initial rules to enforce with ReviewGuard or a CI script:

1. `packages/products/**` may import `packages/platform/**` only through public
   interfaces or client packages.
2. `packages/platform/**` must not import `packages/products/**`.
3. `packages/shared/**` must not import `packages/platform/**` or
   `packages/products/**`.
4. `cmd/api` may wire dependencies, but business logic stays in packages.
5. Product packages must not import another product package without an explicit
   integration interface.
6. Frontend `src/shared/**` must not import `src/platform/**` or
   `src/products/**`.
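A report-only guard for rules 2, 3, and 5 can be built on the standard `go/parser` package, which can parse just a file's import block. The sketch below assumes a hypothetical module path `example.com/gpuaas/` and reads rule 5 as "different product" (so `gpuaas/terminal` may import `gpuaas/inventory`). Rules 1, 4, and 6 need richer analysis or a frontend tool and are not modeled. Findings are returned rather than failing the build, matching the report-only mode planned for Phase A.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// modulePrefix is hypothetical; substitute the module path from go.mod.
const modulePrefix = "example.com/gpuaas/"

// layer classifies a repo-relative package path into the layers the rules name.
func layer(p string) string {
	switch {
	case strings.HasPrefix(p, "packages/platform/"):
		return "platform"
	case strings.HasPrefix(p, "packages/products/"):
		return "products"
	case strings.HasPrefix(p, "packages/shared/"):
		return "shared"
	}
	return ""
}

// product returns the product segment, e.g. "gpuaas" for packages/products/gpuaas/inventory.
func product(p string) string {
	parts := strings.Split(p, "/")
	if len(parts) >= 3 && parts[0] == "packages" && parts[1] == "products" {
		return parts[2]
	}
	return ""
}

// violation applies rules 2, 3, and 5 to one import edge; "" means allowed.
func violation(from, to string) string {
	lf, lt := layer(from), layer(to)
	switch {
	case lf == "platform" && lt == "products":
		return "platform must not import products"
	case lf == "shared" && (lt == "platform" || lt == "products"):
		return "shared must not import platform or products"
	case lf == "products" && lt == "products" && product(from) != product(to):
		return "cross-product import needs an explicit integration interface"
	}
	return ""
}

// Report parses one file's imports and returns findings without failing.
func Report(pkgPath, filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var findings []string
	for _, imp := range f.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil || !strings.HasPrefix(path, modulePrefix) {
			continue // stdlib or third-party import
		}
		to := strings.TrimPrefix(path, modulePrefix)
		if msg := violation(pkgPath, to); msg != "" {
			findings = append(findings, fmt.Sprintf("%s: imports %s: %s", filename, to, msg))
		}
	}
	return findings, nil
}

func main() {
	src := "package billing\n\nimport (\n\t\"context\"\n\t\"example.com/gpuaas/packages/products/gpuaas/inventory\"\n)\n"
	findings, err := Report("packages/platform/billing", "billing.go", src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, msg := range findings {
		fmt.Println("REPORT:", msg)
	}
}
```

Switching from report-only to enforcing would then be a matter of exiting non-zero when findings are present.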

## Implementation Sequence

Implementation follows the platform-foundation invariant: maps first, guard
visibility second, facade implementation third. Phase A must produce ownership
maps and report-only guard output before Phase B/C packages or facades are
treated as approved implementation work.

Each phase must produce an enforceable artifact, not only alignment text. Valid
outputs are ownership maps, guard reports, package facades, route/read-model
contracts, CI/report artifacts, or evidence packets with reviewable source and
target paths.

### Phase A - Architecture foundation

1. Approve this code/deployment architecture.
2. Add package ownership map.
3. Add route ownership map.
4. Add schema ownership map.
5. Add import-boundary guard in report-only mode.

### Phase B - Evidence/status first slice

1. Create `packages/platform/evidence`.
2. Create `packages/platform/statusops`.
3. Add `routes_platform_evidence.go` and `routes_platform_statusops.go`.
4. Define evidence bundle and status read-model contracts.
5. Add V3 platform evidence/status page contracts before UI work.
6. Define UAT/release product invariants that evidence/status must prove.

### Phase C - IAM and registry foundation

1. Create `packages/platform/iam` facade over current auth.
2. Create `packages/platform/registry`.
3. Define product, scope, usage-unit, audit-action, notification-template, and
   evidence-type registry contracts.
4. Seed GPUaaS and App Platform registry entries.

### Phase D - Product package alignment

1. Move GPUaaS-owned inventory/provisioning/terminal code behind
   `packages/products/gpuaas/*` facades.
2. Move app runtime/catalog code behind `packages/products/appplatform/*`
   facades.
3. Keep old package paths temporarily as adapters only if needed.
4. Remove adapters after route/API callers migrate.

### Phase E - Deployment extraction readiness

1. Add service-auth pattern for product-to-platform calls.
2. Add per-service degradation docs.
3. Add worker split where useful.
4. Extract only the service with the strongest operational reason first.

## Non-Goals

1. Do not split into microservices before package and contract boundaries are
   stable.
2. Do not move files only to satisfy a directory diagram.
3. Do not create shared-service abstractions without a first production-readiness
   or product consumer.
4. Do not make `packages/shared` the home for business logic.
5. Do not redesign the whole frontend before the page contracts and first V3
   slices are validated.

## Open Decisions

1. Resolved locally by [Facade Pattern](#facade-pattern): target code paths are
   introduced under `packages/platform/*` immediately. L3 retired
   `packages/services/*`; remaining compatibility logic lives under owning
   platform/product packages, usually as `legacyimpl` subpackages.
2. Whether existing table names are renamed gradually or only new tables use the
   owner-prefixed naming.
3. Resolved locally by
   `Platform_Deployment_Extraction_Readiness_v1.md`: evidence/status is the
   first extraction candidate to prepare, but physical extraction still waits
   for service-auth, degradation, data/event, operations, SLO, and backout
   gates.
4. Whether platform registry data starts as schema-backed tables or seed-backed
   config with a migration target. Tracked centrally as `OD-001` in
   `../Platform_Architecture_Open_Decisions_v1.md`.
5. Whether frontend `src/platform/*` and `src/products/*` are introduced before
   or after the next V3 workbench slice.

## Related Docs

- `doc/architecture/platform-foundation/Platform_Shared_Services_Model_v2.md`
- `doc/architecture/platform-foundation/AI_Factory_Production_Readiness_Gap_Portfolio_v1.md`
- `doc/architecture/Monorepo_Structure.md`
- `doc/architecture/API_Domain_Authoring_Model_v1.md`
- `doc/architecture/API_Route_Modularization_and_V1_Freeze_v1.md`
- `doc/architecture/Frontend_Surface_Architecture_Work_Plan_v1.md`
- `doc/product/V3_Admin_Workbench_Consistency_Plan_v1.md`
- `doc/architecture/platform-foundation/Platform_Foundation_Orchestrator_Work_Plan_v1.md`
