# IAM Role Assignment and Membership Incident Runbook

## Trigger
1. Spike in IAM mutation failures (bind/revoke membership, role assignment).
2. Reports of unexpected `403 insufficient_permissions`.
3. Reports of confusion about role-ceiling denials (for example, an owner grant being denied).

## Required Context
1. `correlation_id` from API/UI error envelope.
2. Actor identity (`actor_user_id`) and target identity (`target_user_id`) where applicable.
3. Scope context:
- `tenant_id`
- `project_id`
- attempted `tenant_role` / `project_role` / platform role.
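
One way to make sure this context is complete before starting diagnosis is to collect it in a small record. The sketch below is only illustrative: the field names follow the identifiers listed above, while the `IncidentContext` class, the `attempted_role` field name, and the `missing_fields` helper are hypothetical and not part of any existing tooling.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class IncidentContext:
    """Context gathered before diagnosis (hypothetical helper)."""

    correlation_id: str
    actor_user_id: str
    target_user_id: Optional[str] = None   # only where applicable
    tenant_id: Optional[str] = None
    project_id: Optional[str] = None
    attempted_role: Optional[str] = None   # tenant_role, project_role, or platform role

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are still empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```

An empty result from `missing_fields()` means every item above was captured. A non-empty result is not necessarily a blocker, since `target_user_id` and `project_id` do not apply to every mutation path.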

## Immediate Actions
1. Classify scope:
- platform-role mutation path vs tenant/project membership path.
2. Confirm whether denial is expected policy behavior or system degradation.
3. If unexpected denials are widespread:
- pause bulk role/membership operations until root cause is identified.

## Diagnosis (Correlation-First)
1. Query API logs by `correlation_id`.
2. Confirm canonical error code/message:
- expected authorization ceiling denial: `insufficient_permissions`
- malformed request: `invalid_request`
- backend dependency issue: `service_unavailable` / `internal_error`
3. Verify role assignment rules:
- a tenant admin must not be able to grant the tenant owner role.
- a tenant admin must not be able to grant the project owner role.
- platform-role mutations require platform-admin authorization.
4. Verify membership state in DB:
- active tenant membership exists in expected tenant.
- project belongs to expected tenant.
- no cross-tenant membership conflict in strict mode.
5. Verify audit coverage:
- a successful privileged mutation should produce a `platform_audit_logs` row with a matching `correlation_id`.
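
Steps 2 and 3 can be sketched as a small lookup plus a rule check. This is a minimal illustration, not the service's actual policy engine: the error codes come from the list above, but the role name strings (`tenant_admin`, `platform_admin`, and so on), the diagnosis labels, and both function names are assumptions made for the example, and only the three rules listed in step 3 are encoded.

```python
# Canonical error codes from step 2, mapped to a diagnosis label.
# The label strings are hypothetical.
ERROR_CLASSES = {
    "insufficient_permissions": "expected_authz_ceiling",
    "invalid_request": "malformed_request",
    "service_unavailable": "backend_dependency_issue",
    "internal_error": "backend_dependency_issue",
}

# (actor_role, requested_role) pairs that step 3 says must be denied.
# Role name strings are assumed for illustration.
FORBIDDEN_GRANTS = {
    ("tenant_admin", "tenant_owner"),
    ("tenant_admin", "project_owner"),
}


def classify_error(code: str) -> str:
    """Map an error code from the envelope or logs to a diagnosis label."""
    return ERROR_CLASSES.get(code, "unknown")


def denial_is_expected(actor_role: str, requested_role: str,
                       is_platform_role: bool) -> bool:
    """Return True if the listed rules say this grant should be denied."""
    if is_platform_role:
        # Platform-role mutations require platform-admin authorization.
        return actor_role != "platform_admin"
    return (actor_role, requested_role) in FORBIDDEN_GRANTS
```

If `classify_error` returns `expected_authz_ceiling` and `denial_is_expected` is true, the denial is policy behavior. An `insufficient_permissions` response where `denial_is_expected` is false is worth treating as a possible defect, keeping in mind that this sketch does not cover rules beyond those listed in step 3.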

## Common Failure Classes
1. Expected role ceiling denial:
- the actor's grant ceiling does not cover the requested target role.
2. Scope mismatch:
- the target project is outside the actor's tenant boundary.
3. Cross-tenant binding conflict:
- the user already has an active membership in another tenant (strict mode).
4. Platform-role binding unavailable:
- role-binding store/dependency unavailable.

## Mitigation
1. Expected denial:
- communicate the correct grant ceiling to the requester and retry with a permitted role.
2. Scope mismatch:
- correct the tenant/project selection in the UI/CLI and retry.
3. Cross-tenant user move:
- use the approved rehome flow, where it is allowed and audited.
4. Dependency degradation:
- restore the backing store/service before retrying IAM mutations.
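
The pairing between the failure classes and the mitigations above can be written down as a lookup table, which helps keep responses consistent across responders. The sketch below is an assumption-laden illustration: the `FailureClass` enum, its member names, and `next_step` are hypothetical, and the mitigation strings simply restate this section.

```python
from enum import Enum


class FailureClass(Enum):
    """Failure classes from this runbook (enum and member names are hypothetical)."""

    EXPECTED_CEILING_DENIAL = "expected_role_ceiling_denial"
    SCOPE_MISMATCH = "scope_mismatch"
    CROSS_TENANT_CONFLICT = "cross_tenant_binding_conflict"
    DEPENDENCY_UNAVAILABLE = "platform_role_binding_unavailable"


MITIGATIONS = {
    FailureClass.EXPECTED_CEILING_DENIAL:
        "Communicate the correct grant ceiling and retry with a permitted role.",
    FailureClass.SCOPE_MISMATCH:
        "Correct the tenant/project selection in the UI/CLI and retry.",
    FailureClass.CROSS_TENANT_CONFLICT:
        "Use the approved rehome flow where allowed and audited.",
    FailureClass.DEPENDENCY_UNAVAILABLE:
        "Restore the backing store/service before retrying IAM mutations.",
}

# Only dependency failures require waiting for a restore before any retry.
RETRY_BLOCKED_UNTIL_RESTORE = {FailureClass.DEPENDENCY_UNAVAILABLE}


def next_step(failure: FailureClass) -> tuple[str, bool]:
    """Return (mitigation text, whether a retry must wait for a restore)."""
    return MITIGATIONS[failure], failure in RETRY_BLOCKED_UNTIL_RESTORE
```

For example, `next_step(FailureClass.DEPENDENCY_UNAVAILABLE)` returns the restore instruction together with `True`, signalling that IAM mutations should not be retried yet.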

## Recovery Criteria
1. IAM mutation paths return expected deterministic outcomes.
2. Audit logs are present for successful privileged mutations.
3. No incidents involving ambiguous scope or ceiling behavior remain unresolved.
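
The audit criterion can be spot-checked by looking up a known successful mutation's `correlation_id` in `platform_audit_logs`. The sketch below uses an in-memory SQLite database purely so that it runs on its own; the production database engine, the table's full schema, and the `action` column are assumptions, and `has_audit_row` is a hypothetical helper.

```python
import sqlite3


def has_audit_row(conn: sqlite3.Connection, correlation_id: str) -> bool:
    """Return True if at least one audit row carries this correlation_id."""
    row = conn.execute(
        "SELECT 1 FROM platform_audit_logs WHERE correlation_id = ? LIMIT 1",
        (correlation_id,),
    ).fetchone()
    return row is not None


# Stand-in table for demonstration only; the real schema is not shown here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE platform_audit_logs (correlation_id TEXT, action TEXT)"
)
conn.execute(
    "INSERT INTO platform_audit_logs VALUES (?, ?)",
    ("corr-123", "role_assignment"),
)

print(has_audit_row(conn, "corr-123"))  # row exists for this mutation
print(has_audit_row(conn, "corr-999"))  # no row: audit coverage gap
```

A missing row for a mutation that the API reported as successful suggests an audit coverage gap and is worth recording as evidence.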

## Evidence to Capture
1. `correlation_id`, `trace_id`, actor/target IDs, scope IDs.
2. Error envelopes and log excerpts showing decision path.
3. Audit log rows for successful privileged mutations.
4. Runbook decision and final remediation action.
