Troubleshooting designed

Troubleshooting should start from the state the user sees, then point to the owning product or operator surface.

Common States

  • provisioning: placement or node bootstrap is still running.
  • active: workload is ready for terminal, SSH, metrics, or app access.
  • releasing: teardown is in progress.
  • release_failed: billing has stopped, but cleanup needs retry or operator attention.
  • failed: provisioning did not complete; check the machine-readable failure reason.
  • insufficient_balance: add funds or contact the tenant/customer admin.
  • sku_unavailable: select a different SKU or wait for capacity.
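As a rough illustration, the state list above can be expressed as a lookup from state to user-facing guidance. This is a minimal sketch, not the product's API: the names AllocationState, GUIDANCE, and guidance_for are hypothetical, and the fallback for an unrecognized state is an assumption.

```python
from enum import Enum


class AllocationState(str, Enum):
    """Hypothetical enum mirroring the states listed above."""
    PROVISIONING = "provisioning"
    ACTIVE = "active"
    RELEASING = "releasing"
    RELEASE_FAILED = "release_failed"
    FAILED = "failed"
    INSUFFICIENT_BALANCE = "insufficient_balance"
    SKU_UNAVAILABLE = "sku_unavailable"


# Guidance text taken from the state descriptions in this section.
GUIDANCE = {
    AllocationState.PROVISIONING: "Placement or node bootstrap is still running.",
    AllocationState.ACTIVE: "Workload is ready for terminal, SSH, metrics, or app access.",
    AllocationState.RELEASING: "Teardown is in progress.",
    AllocationState.RELEASE_FAILED: "Billing has stopped, but cleanup needs retry or operator attention.",
    AllocationState.FAILED: "Provisioning did not complete; check the machine-readable failure reason.",
    AllocationState.INSUFFICIENT_BALANCE: "Add funds or contact the tenant/customer admin.",
    AllocationState.SKU_UNAVAILABLE: "Select a different SKU or wait for capacity.",
}


def guidance_for(raw_state: str) -> str:
    """Return guidance for a state string as the user would see it."""
    try:
        state = AllocationState(raw_state)
    except ValueError:
        # Assumption: an unrecognized state is escalated rather than guessed at.
        return "Unknown state; escalate with the correlation id."
    return GUIDANCE[state]
```

Keeping the mapping in one place means a newly added state shows up as a missing key instead of silently falling through to generic advice.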

Troubleshooting By Symptom

Symptom | First check | Next safe step
Launch stays in provisioning | current allocation state and correlation id | wait within normal provisioning window, then escalate with correlation id
Browser or SSH access missing | allocation is actually active and the expected access path is enabled | retry the correct access path, then escalate with state proof
Release does not complete | state is releasing or release_failed | retry release if user-safe, otherwise contact support/operator path
MFA status looks stale | account security page and refresh path | refresh status before assuming the factor was lost
Recovery path fails | capture correlation id and the action attempted | move to support-assisted recovery path
Billing warning appears | current balance, recent usage, and tenant context | add funds or contact the tenant admin
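The first row of the table (wait within the normal window, then escalate with the correlation id) could be sketched as follows. Everything here is illustrative: Escalation and provisioning_next_step are hypothetical names, and the length of the provisioning window is not given in this page, so the caller supplies it.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Escalation:
    """Hypothetical record of what to hand to support or an operator."""
    symptom: str
    state: str
    correlation_id: str

    def summary(self) -> str:
        return f"{self.symptom} (state={self.state}, correlation_id={self.correlation_id})"


def provisioning_next_step(
    elapsed_s: float, window_s: float, correlation_id: str
) -> Union[str, Escalation]:
    """Return "wait" inside the window, otherwise an Escalation.

    window_s is an assumed, caller-supplied value; this page does not
    state what the normal provisioning window is.
    """
    if elapsed_s <= window_s:
        return "wait"
    return Escalation(
        symptom="Launch stays in provisioning",
        state="provisioning",
        correlation_id=correlation_id,
    )
```

For example, with an assumed 600-second window, provisioning_next_step(900, 600, "abc-123") returns an Escalation whose summary carries both the state and the correlation id, so the escalation does not depend on anyone inspecting infrastructure directly.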

User-Safe Rule

Prefer product/API surfaces and correlation IDs over direct infrastructure inspection. Repeated operator-only checks should become product read models.