Troubleshooting (status: designed)

Troubleshooting should start from the state the user sees, then point to the owning product or operator surface.

Common States

provisioning: placement or node bootstrap is still running.
active: workload is ready for terminal, SSH, metrics, or app access.
releasing: teardown is in progress.
release_failed: billing has stopped, but cleanup needs retry or operator attention.
failed: provisioning did not complete; check the machine-readable failure reason.
insufficient_balance: add funds or contact the tenant/customer admin.
sku_unavailable: select a different SKU or wait for capacity.
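
The states above can be encoded as a small lookup so that client code or a support script gives the same guidance for each one. The following is a minimal sketch in Python, not a product API: `STATE_GUIDANCE`, `TRANSITIONAL_STATES`, `guidance_for`, and `is_transitional` are hypothetical names, the guidance strings restate the list above, and treating `provisioning` and `releasing` as the transitional states is an assumption drawn from their descriptions.

```python
# Hypothetical lookup: allocation state -> guidance, restating the list above.
STATE_GUIDANCE = {
    "provisioning": "Placement or node bootstrap is still running.",
    "active": "Workload is ready for terminal, SSH, metrics, or app access.",
    "releasing": "Teardown is in progress.",
    "release_failed": "Billing has stopped, but cleanup needs retry or operator attention.",
    "failed": "Provisioning did not complete; check the machine-readable failure reason.",
    "insufficient_balance": "Add funds or contact the tenant/customer admin.",
    "sku_unavailable": "Select a different SKU or wait for capacity.",
}

# Assumption for this sketch: these states describe work still in progress.
TRANSITIONAL_STATES = {"provisioning", "releasing"}


def guidance_for(state: str) -> str:
    """Return guidance for a state, or a fallback for unrecognised states."""
    fallback = f"Unknown state {state!r}; escalate with the correlation ID."
    return STATE_GUIDANCE.get(state, fallback)


def is_transitional(state: str) -> bool:
    """True if the state describes work that is still in progress."""
    return state in TRANSITIONAL_STATES
```

Keeping the mapping in one table means a newly introduced state falls through to the escalation fallback instead of being ignored without notice.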

Troubleshooting By Symptom

Symptom	First check	Next safe step
Launch stays in provisioning	current allocation state and correlation ID	wait out the normal provisioning window, then escalate with the correlation ID
Browser or SSH access missing	allocation is actually `active` and the expected access path is enabled	retry the correct access path, then escalate with state proof
Release does not complete	state is `releasing` or `release_failed`	retry release if user-safe, otherwise contact support/operator path
MFA status looks stale	account security page and refresh path	refresh status before assuming the factor was lost
Recovery path fails	capture the correlation ID and the action attempted	move to the support-assisted recovery path
Billing warning appears	current balance, recent usage, and tenant context	add funds or contact the tenant admin
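
The "Launch stays in provisioning" row can be sketched as a bounded polling loop: wait while the state is `provisioning`, and once the window expires, raise an error that carries the correlation ID for escalation. This is an illustrative sketch only. `wait_for_active` and its `get_state` callback are hypothetical, and because the source does not define the normal provisioning window, its length is a required parameter here.

```python
import time
from typing import Callable


def wait_for_active(
    get_state: Callable[[], str],
    correlation_id: str,
    window_seconds: float,
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the allocation leaves `provisioning` or the window expires.

    Returns the first non-provisioning state. Raises TimeoutError with the
    correlation ID in its message so the caller can escalate with it.
    """
    deadline = clock() + window_seconds
    while True:
        state = get_state()
        if state != "provisioning":
            return state
        if clock() >= deadline:
            raise TimeoutError(
                f"still provisioning after {window_seconds}s; "
                f"escalate with correlation ID {correlation_id}"
            )
        sleep(poll_seconds)
```

The caller supplies `get_state` from whatever product or API surface reports allocation state. The `sleep` and `clock` parameters exist only so the loop can be tested without real delays.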

User-Safe Rule

Prefer product/API surfaces and correlation IDs over direct infrastructure inspection. When operators find themselves repeating the same operator-only check, that check should become a product read model.
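
One way to apply this rule is to make every escalation carry the correlation ID and the user-visible state, so the receiving team does not have to inspect infrastructure to reconstruct them. The sketch below assumes a hypothetical `EscalationReport` record and `build_report` helper. The field names are illustrative and do not describe an existing support API.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EscalationReport:
    """Hypothetical escalation record built only from user-visible facts."""
    correlation_id: str
    observed_state: str
    symptom: str
    action_attempted: str


def build_report(
    correlation_id: str,
    observed_state: str,
    symptom: str,
    action_attempted: str,
) -> str:
    """Serialise an escalation report as JSON; reject a missing correlation ID."""
    if not correlation_id.strip():
        raise ValueError("correlation_id is required before escalating")
    report = EscalationReport(
        correlation_id=correlation_id,
        observed_state=observed_state,
        symptom=symptom,
        action_attempted=action_attempted,
    )
    return json.dumps(asdict(report), sort_keys=True)
```

Rejecting an empty correlation ID up front keeps incomplete reports from reaching support, where the missing context would otherwise have to be recovered by direct inspection.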
