# Token Factory Readiness Decision Packet v1

Status: draft decision packet
Owner: Platform Architecture / Token Factory
Last updated: 2026-06-02

## Purpose

Frame the future Token Factory decisions that shared platform services should
prepare for before Token Factory becomes an active product build.

This packet covers:

- `OD-004`: per-key budget/rate-limit enforcement split;
- `OD-005`: Token Factory gateway candidate direction;
- `OD-006`: authoritative token counting source and reconciliation path.

It is not the final gateway ADR and it is not today's Token Factory build plan.
It is a readiness packet for shared-service contracts, gaps, and future spike
criteria after the App SDK reference baseline.

## Current Platform Baseline

PSSM is proven for GPUaaS plus the App SDK reference consumer:

- IAM, Billing, Audit/Evidence, Status/Ops, Registry, Artifacts, Policy, and
  Secrets/PKI have platform ownership boundaries;
- `cmd/api` and worker binaries no longer import `packages/services/*`;
- boundary guards report zero import, semantic-facade, route, schema, frontend,
  and worker findings, with only the classified DLQ replay exception remaining;
- App SDK Jupyter/vLLM smoke proves a lightweight second consumer.

Token Factory is a useful future full-product proof because it exercises request-path
rate limiting, API-key behavior, token metering, product analytics, and billing
at higher event volume than the App SDK reference path.

## Decision Summary

| OD | Recommendation | State |
|---|---|---|
| `OD-004` | Split enforcement by authority: IAM owns keys/scopes, Token Factory gateway owns request-path limits, Billing/Policy owns budget state and rating, Evidence/Status owns operating proof. | Recommended |
| `OD-005` | Start with an Envoy-direct gateway spike backed by a thin Token Factory control plane; keep Kong/Tyk as fallback if developer portal/API-management features become first-release requirements. | Recommended for spike |
| `OD-006` | Use gateway-side counting as the v1 accepted-usage source with tokenizer version evidence; ingest backend/model-router reported usage when available and reconcile asynchronously. | Recommended for v1 |

## OD-004 Enforcement Split

Token Factory should not centralize all policy in the gateway. The gateway is a
request-path enforcement point; authority remains in shared platform services.

| Concern | Owning layer | Runtime behavior |
|---|---|---|
| API-key issuance and revocation | Platform IAM / token issuer | Issues or validates project/service-account credentials; exposes scope and revocation state to gateway cache. |
| Product scopes | Platform IAM + registry | Defines `tokenfactory:*` scopes and validates actor/project/product compatibility. |
| Route identity and tenant isolation | Pomerium / managed ingress | Terminates edge traffic, applies host/path policy, preserves trusted identity context, and forwards only to Token Factory gateway. |
| Per-key / per-model / per-tier rate limit | Token Factory gateway | Enforced synchronously from cached policy and live counters; a policy cache miss on a protected invocation fails closed. |
| Budget state and spend policy | Platform Billing / Policy | Owns account budget, rate cards, soft/hard threshold policy, and financial restriction state. |
| Token/request accepted usage | Token Factory gateway durable event path | Gateway records accepted usage before returning success; rating and ledger mutation stay asynchronous. |
| Rating and ledger mutation | Platform Billing | Converts accepted usage to rated usage/ledger entries; gateway request handlers never mutate the ledger directly. |
| Audit and evidence | Platform Audit / Evidence / Status/Ops | Captures key issuance, denied requests, usage ingestion health, reconciliation drift, and release-readiness evidence. |
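
The rate-limit row can be illustrated with a minimal sketch of a token-bucket limiter that fails closed when no cached policy exists. This is an illustration under stated assumptions, not the spike design: `LimitPolicy`, `GatewayRateLimiter`, the `(key_id, model_id)` cache key, and the reason strings are all hypothetical names, and a real gateway would keep counters in a shared store rather than in process memory.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class LimitPolicy:
    """Hypothetical cached policy for one (key, model) pair."""
    capacity: int          # maximum burst size in requests
    refill_per_sec: float  # requests restored per second


@dataclass
class Bucket:
    tokens: float
    updated_at: float


class GatewayRateLimiter:
    """In-memory token bucket; counters are local only for illustration."""

    def __init__(self, policy_cache: Dict[Tuple[str, str], LimitPolicy]):
        self.policy_cache = policy_cache
        self.buckets: Dict[Tuple[str, str], Bucket] = {}

    def allow(self, key_id: str, model_id: str,
              now: Optional[float] = None) -> Tuple[bool, str]:
        now = time.monotonic() if now is None else now
        policy = self.policy_cache.get((key_id, model_id))
        if policy is None:
            # Fail closed: no cached policy means the request is denied.
            return False, "policy_unavailable"
        bucket = self.buckets.setdefault(
            (key_id, model_id), Bucket(float(policy.capacity), now))
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - bucket.updated_at)
        bucket.tokens = min(float(policy.capacity),
                            bucket.tokens + elapsed * policy.refill_per_sec)
        bucket.updated_at = now
        if bucket.tokens < 1.0:
            return False, "rate_limited"
        bucket.tokens -= 1.0
        return True, "ok"
```

Passing `now` explicitly keeps the sketch deterministic in tests; production code would rely on the monotonic clock default.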

### Request-Path Rule

The gateway may reject a request based on:

- missing/invalid/revoked key;
- missing Token Factory scope;
- project/org/model entitlement mismatch;
- rate-limit exhaustion;
- hard budget or financial restriction state;
- model unavailable or disabled;
- inability to durably record accepted usage for a successful request.

The gateway should not synchronously mutate billing ledger rows. Successful
requests become billable only through durable usage ingestion and asynchronous
rating.
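
As a rough sketch of the rule above, the rejection reasons can be modeled as an ordered list of checks followed by a durable usage write. The check order, the `RequestContext` fields, and the reason strings are assumptions made for illustration; the packet does not prescribe them, and the upstream model call is elided.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class RequestContext:
    """Hypothetical pre-resolved facts about one request."""
    key_valid: bool                # present, well-formed, not revoked
    has_token_factory_scope: bool
    entitled: bool                 # project/org/model entitlement matches
    within_rate_limit: bool
    budget_restricted: bool        # hard budget or financial restriction
    model_enabled: bool


def admission_denial(ctx: RequestContext) -> Optional[str]:
    """Return the first denial reason, or None if the request may proceed."""
    checks = [
        (ctx.key_valid, "invalid_key"),
        (ctx.has_token_factory_scope, "missing_scope"),
        (ctx.entitled, "entitlement_mismatch"),
        (ctx.within_rate_limit, "rate_limited"),
        (not ctx.budget_restricted, "budget_restricted"),
        (ctx.model_enabled, "model_unavailable"),
    ]
    for passed, reason in checks:
        if not passed:
            return reason
    return None


def finish_request(ctx: RequestContext,
                   record_accepted_usage: Callable[[], bool]) -> str:
    """Admit, serve, then require a durable usage record before success."""
    denial = admission_denial(ctx)
    if denial is not None:
        return denial
    # The upstream model call would happen here.
    if not record_accepted_usage():
        # No durable usage record means no successful response.
        return "usage_not_recorded"
    # Rating and ledger mutation happen later, asynchronously, elsewhere.
    return "ok"
```

Note that nothing in this path touches ledger rows: the only write is the accepted-usage record, which Billing consumes asynchronously.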

## OD-005 Gateway Candidate Direction

The existing platform decision remains valid: GPUaaS does not need a generic API
gateway in front of `cmd/api`. Token Factory does need a product gateway below
Pomerium and above model routing.

### Recommended Spike Default

Use Envoy-direct plus a thin Token Factory control plane for the first bounded
spike.

Rationale:

- Pomerium already owns platform edge identity-aware routing; Token Factory
  needs a product data-plane, not a second platform edge.
- Envoy fits private/sovereign deployment and keeps the gateway deployable
  without a large API-management control plane.
- Rate-limit, authz, and metering behavior can be implemented through a small
  Token Factory service boundary that calls platform IAM/Billing/Registry
  contracts.
- The spike can test latency, streaming, header stripping, durable usage
  acceptance, and failure behavior without committing to a commercial or
  portal-heavy gateway.

### Fallback Criteria For Kong Or Tyk

Escalate from Envoy-direct to Kong or Tyk if first-release requirements include:

- built-in developer portal for external API-key self-service;
- mature API-product packaging and subscription tiers out of the box;
- plugin marketplace requirements that would otherwise become custom code;
- enterprise API-management governance stronger than the current Pomerium plus
  platform portal model.

### Spike Acceptance

The gateway spike must prove:

1. `Authorization` token validation without browser redirects.
2. Caller `Authorization` stripping before upstream.
3. Per-key/per-model rate-limit denial with canonical error shape.
4. Streaming response metering without losing final usage evidence.
5. Durable accepted-usage write before successful response completion.
6. Async billing/rating ingestion from accepted-usage events.
7. Trace/correlation continuity across Pomerium, gateway, model router/backend,
   usage event, and Status/Ops evidence.
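
Criterion 2 could be checked in a spike test harness with a helper along these lines. This is a sketch only: `headers_for_upstream` and `STRIPPED_HEADERS` are hypothetical names, and the assumption that `Authorization` is the only header to strip comes from the criterion as written, not from a reviewed deny-list. In an Envoy-based spike the stripping itself would live in gateway configuration; the helper only models the expected result.

```python
from typing import Dict, Mapping

# Assumption: only the caller's Authorization header must be removed.
# A production deny-list would likely be longer and security-reviewed.
STRIPPED_HEADERS = frozenset({"authorization"})


def headers_for_upstream(caller_headers: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of the caller headers without caller credentials.

    Header names are compared case-insensitively; all other headers,
    including any trace or correlation headers, pass through unchanged.
    """
    return {
        name: value
        for name, value in caller_headers.items()
        if name.lower() not in STRIPPED_HEADERS
    }
```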

## OD-006 Token Counting Source

Token Factory needs two token-counting concepts:

- **accepted-usage source**: what creates the billable usage event;
- **reconciliation source**: what detects drift after the request completes.

### Recommended V1 Rule

Gateway-side counting is the accepted-usage source for v1. The gateway records:

- model ID and tokenizer family/version;
- input token count;
- output token count;
- cached-token count when available;
- request count;
- streaming completion state;
- backend-reported usage if the backend returns it;
- correlation ID and upstream request ID.

Backend or model-router reported usage is reconciliation input. If backend
usage is absent or delayed, gateway tokenizer counts remain authoritative for
v1 billing, with drift evidence emitted to Status/Ops.
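
One possible shape for the accepted-usage record is sketched below. The field names mirror the list above, but the class name, the JSON serialization, and the validation rules are assumptions for illustration rather than an agreed event schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AcceptedUsageEvent:
    """Hypothetical accepted-usage event written by the gateway."""
    correlation_id: str
    upstream_request_id: str
    model_id: str
    tokenizer_family: str
    tokenizer_version: str
    input_tokens: int
    output_tokens: int
    streaming_completion_state: str
    request_count: int = 1
    cached_tokens: Optional[int] = None            # only when available
    backend_reported_usage: Optional[dict] = None  # reconciliation input only

    def __post_init__(self) -> None:
        if self.input_tokens < 0 or self.output_tokens < 0:
            raise ValueError("token counts must be non-negative")
        if not self.tokenizer_version:
            # Tokenizer version evidence is part of the v1 rule.
            raise ValueError("tokenizer version evidence is required")


def serialize(event: AcceptedUsageEvent) -> str:
    """Render the event as stable JSON for a durable event path."""
    return json.dumps(asdict(event), sort_keys=True)
```

Keeping `backend_reported_usage` as an optional, separate field reflects the rule that gateway counts stay authoritative for v1 while backend counts feed reconciliation.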

### Streaming Rule

For streaming responses, the gateway should record usage only once the stream
reaches one of these final accounting boundaries:

- normal stream completion;
- upstream failure after partial output;
- client disconnect after accepted input;
- timeout / cancellation.

For each case, the gateway must classify whether output tokens are billable and
must preserve enough evidence for customer support and billing reconciliation.
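
The four boundaries can be sketched as an enumeration with the billable classification supplied as explicit policy. The packet does not decide which boundaries bill output tokens, so the sketch takes that mapping as a required argument instead of hard-coding it; all names here are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Mapping


class StreamBoundary(Enum):
    COMPLETED = "normal_stream_completion"
    UPSTREAM_FAILURE = "upstream_failure_after_partial_output"
    CLIENT_DISCONNECT = "client_disconnect_after_accepted_input"
    TIMEOUT_OR_CANCELLATION = "timeout_or_cancellation"


@dataclass(frozen=True)
class StreamAccounting:
    boundary: StreamBoundary
    input_tokens: int
    output_tokens_observed: int  # kept as evidence even when not billed
    output_tokens_billable: int


def close_stream(boundary: StreamBoundary,
                 input_tokens: int,
                 output_tokens_observed: int,
                 output_billable: Mapping[StreamBoundary, bool]) -> StreamAccounting:
    """Produce the final accounting record for one stream.

    `output_billable` is the product policy, one entry per boundary.
    A missing entry is an error so that no case is classified by accident.
    """
    if boundary not in output_billable:
        raise KeyError(f"no billing classification for {boundary.name}")
    billable = output_tokens_observed if output_billable[boundary] else 0
    return StreamAccounting(boundary, input_tokens,
                            output_tokens_observed, billable)
```

Recording both observed and billable output counts preserves the support and reconciliation evidence regardless of how the policy is eventually set.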

### Reconciliation Rule

Reconciliation should compare:

- gateway token counts;
- backend/model-router reported counts;
- rated usage lines;
- ledger entries.

Drift should create evidence and operational status, not direct ledger mutation.
Corrections use normal immutable-ledger adjustment entries.
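
A minimal sketch of the first comparison, gateway counts against backend-reported counts, is shown below. It emits an evidence record and never touches a ledger. The `DriftEvidence` type, the status strings, and the `tolerance` parameter are assumptions; comparing rated usage lines and ledger entries would follow the same pattern but is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DriftEvidence:
    """Hypothetical evidence record emitted to Status/Ops."""
    correlation_id: str
    status: str                    # "match", "drift", or "backend_missing"
    gateway_tokens: int
    backend_tokens: Optional[int]
    delta: Optional[int]           # backend minus gateway, when both exist


def reconcile(correlation_id: str,
              gateway_tokens: int,
              backend_tokens: Optional[int],
              tolerance: int = 0) -> DriftEvidence:
    """Compare counts and return evidence; the ledger is never mutated here."""
    if backend_tokens is None:
        # Backend usage absent or delayed: gateway counts stay authoritative.
        return DriftEvidence(correlation_id, "backend_missing",
                             gateway_tokens, None, None)
    delta = backend_tokens - gateway_tokens
    status = "match" if abs(delta) <= tolerance else "drift"
    return DriftEvidence(correlation_id, status,
                         gateway_tokens, backend_tokens, delta)
```

Any correction that follows from a `drift` record would be a separate adjustment entry in the immutable ledger, created through the normal Billing path.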

## First Product-Scope Backlog

The next backlog should decompose into these lanes:

| Lane | First output |
|---|---|
| Product contract | OpenAI-compatible API contract, model catalog contract, API-key scope model |
| Gateway spike | Envoy-direct spike with authz, rate-limit, streaming, and metering proof |
| IAM | Token Factory scopes, API-key/service-account read model, revocation cache |
| Billing | Token usage units, accepted-usage ingestion, rating, reconciliation evidence |
| Audit/Evidence | Deny/audit model, usage ingestion evidence, release gate coverage |
| Status/Ops | Gateway health, model backend health, usage ingestion lag, drift posture |
| Analytics | OLTP/OLAP boundary for request/token dashboards |
| Portal | Developer docs, key-management UX, API reference/playground direction |

## Open Items After This Packet

| Item | Owner | Required before implementation? |
|---|---|---|
| Final gateway ADR after Envoy/Kong/Tyk spike | Token Factory / Platform Architecture | Yes |
| API playground and external developer portal choice | Product / Docs Portal | No for spike; yes before external launch |
| Token Factory model router contract | Token Factory | Yes for production; stub acceptable for spike |
| Customer-facing quota UX | Product / IAM / Billing | No for spike; yes before broad launch |
| OLTP/OLAP analytics boundary | Data Platform / Token Factory | No for spike; yes before high-volume traffic |
| Credential storage tiers for API keys and gateway secrets | Security / IAM | Yes before production |

## Related Docs

- `../Token_Factory_Gateway_Product_Model_v1.md`
- `../API_Gateway_Evaluation_v1.md`
- `../Platform_Proxy_OpenAI_M2M_Auth_Model_v1.md`
- `../IAM_Token_Issuer_v1.md`
- `../Platform_Architecture_Open_Decisions_v1.md`
- `../platform-foundation/Platform_Architecture_Gap_Register_v1.md`
- `../platform-foundation/Product_Onboarding_Checklist_v1.md`
