# Token Factory Gateway Product Model v1

Status: draft for review
Owner: Platform Architecture / Token Factory
Last updated: 2026-05-24

## Purpose

Define where the Token Factory gateway belongs, what it owns, and what remains
in shared platform services.

## Decision

The API gateway is a Token Factory product component, not a GPUaaS platform
component.

Pomerium remains the platform edge for identity-aware routing, tenant
isolation, route policy, and trusted identity propagation. Token Factory may
place a gateway below Pomerium and above inference runtimes.

`API_Gateway_Evaluation_v1.md` is the platform decision record explaining why
GPUaaS does not need a generic API gateway. This document is the product model
for the Token Factory gateway and should own candidate selection.

## Responsibilities

The Token Factory gateway owns:

- API-key validation against IAM contracts or cached IAM read models, never
  independent API-key issuance;
- Token Factory scope enforcement;
- per-key, per-model, per-project, and per-tier rate limiting;
- token metering from request/response payloads;
- mapping token usage into billing usage events;
- product request guards such as max input tokens and model allow-lists;
- product analytics records for Token Factory dashboards.

The gateway does not own:

- browser login or user sessions;
- tenant/project membership;
- org billing root;
- ledger mutation semantics;
- payments;
- platform API route authz;
- Pomerium route rendering;
- certificate issuance;
- model runtime lifecycle.

## Runtime Layer

The gateway does not decide where models run. It routes to Token
Factory-managed model endpoints supplied by a model router/control plane. Those
backends may later run on GPUaaS allocations, slice VMs, bare-metal GPU nodes,
or external bring-your-own endpoints, but that runtime placement is a separate
Token Factory model-runtime contract.

For v1 gateway evaluation, use a stub backend or one explicitly provisioned
model endpoint. Do not make GPUaaS allocation lifecycle part of the gateway
proof.

## Request Flow

```text
client
  -> Pomerium edge
  -> Token Factory gateway
  -> IAM key/scope validation or cached read model
  -> Token Factory model router
  -> inference backend
  -> gateway token/request metering
  -> durable usage/audit event
  -> response
```

The response path must not wait for ledger mutation. The usage/audit event path
must be durable enough that billing can backfill from accepted requests. The
gateway blocks the response on durable acceptance of the usage/audit event
through its outbox or equivalent durable event path; rating and ledger posting
remain asynchronous.

## Platform Contracts Consumed

| Need | Platform owner |
|---|---|
| Organization / department / project hierarchy | IAM |
| API keys | IAM service-account model |
| Product scopes | IAM scope registry |
| Token usage units | Billing usage-unit registry |
| Usage attribution | Billing ledger |
| Audit | Audit shared service |
| Public endpoint | managed_ingress / Pomerium |
| TLS | cert-manager / edge profile |

Token Factory registers product-specific scopes, usage units, audit actions,
and notification templates. It does not patch platform service code to add
them.

The gateway may cache IAM and billing registry read models for request-path
latency, but cache misses and refresh failures must fail closed for protected
model invocation. Usage events are emitted through the product outbox or a
durable gateway event path before they become billable ledger entries.

## Candidate Components

The product ADR should compare:

- Kong;
- Tyk;
- Envoy-direct with a thin Token Factory control plane.

Avoid Apigee and cloud-provider API gateways for v1 because they conflict with
the platform's private/sovereign deployment posture.

## Initial Task Split

Platform-owned:

1. Shared-services foundation docs.
2. IAM department schema and product scope registry.
3. Billing token usage units and ingestion contract.

Token Factory-owned:

4. Token Factory gateway ADR.
5. Token Factory OpenAI-compatible API contract.
6. Gateway smoke in kind with a stub model backend.
7. Billing/audit smoke proving one request produces token usage, product audit,
   and correlation evidence without mutating ledger rows directly.

## Open Decisions

1. Kong vs Tyk vs Envoy-direct.
2. Unified platform API-key page versus Token Factory-specific developer
   portal.
3. Authoritative token counting source: gateway, model router, or backend
   runtime metrics.
4. Whether `cached_token` is billable in v1 or only registered for future use.
5. External API versioning and deprecation policy.
6. Model router/control-plane contract, including model registration, backend
   health, routing policy, and runtime placement evidence.
7. Bring-your-own endpoint authentication model for customer-owned inference
   backends.

Token counting trade-off: gateway counting is low-latency and close to request
policy, but can drift from backend accounting; backend counting is more
authoritative, but arrives later and makes synchronous limits harder. The ADR
must decide the source of truth and the reconciliation path.

## Related Docs

- `token-factory/Token_Factory_Readiness_Decision_Packet_v1.md`
- `token-factory/Token_Factory_Readiness_Backlog_v1.md`
- `API_Gateway_Evaluation_v1.md`
- `platform-foundation/Platform_Shared_Services_Model_v2.md`
- `Unified_IAM_Billing_Across_Products_v1.md`
- `platform-foundation/Platform_Architecture_Gap_Register_v1.md`
- `Platform_Architecture_Open_Decisions_v1.md`
