# Agent Execution Quality and Context Model v1

Status: draft
Date: 2026-05-16
Related:
- `doc/governance/Agent_Orchestrator_v2_Coordinated_Execution.md`
- `doc/governance/Agent_Parallel_Track_Control_v1.md`
- `doc/governance/Agent_Watcher_Lanes_v1.md`
- `doc/governance/Frontend_E2E_Validation_Model_v1.md`
- `doc/governance/Testing_Standards.md`

## Purpose

Define how GPUaaS should run autonomous and parallel agent work without losing
the original goal, drifting into unbounded debugging, or accepting bug fixes
that are not protected by workflow-level regression checks.

This model is written primarily for this repository. As a secondary design
constraint, it should stay reusable enough to become a standalone orchestration
tool later.

## Problem Statement

Recent work exposed four recurring failures.

1. Coordinator context decays during long autonomous runs.
   - The main agent starts with broad project context.
   - After CI, deploy, and bug-fix loops, the active context narrows to the
     latest failure.
   - Original goals, non-goals, and environment boundaries become easy to
     forget.

2. Worker agents do not receive enough context.
   - Side agents get a task slice but not the full operational and architectural
     context.
   - They can make local progress, but they are more likely to miss drift,
     production-shape constraints, or must-not-touch boundaries.

3. Bug-fix side tracks expand too easily.
   - A test or deploy fails.
   - The agent starts fixing the symptom.
   - The fix crosses into CI, auth, deploy, UI, routing, or environment changes.
   - The original task loses closure.

4. Workflow regression coverage is not strong enough.
   - Unit tests are comparatively solid.
   - Browser/runtime/deploy workflows are under-covered or too fragile.
   - A demo login callback failure was missed because the deployed auth/app
     redirect path was not a release-blocking workflow check.

## Operating Model

Use three execution loops.

### 1. Implementation Loop

Worker agents implement bounded tasks.

Required inputs:

- context packet,
- owned files/systems,
- non-goals,
- known risks,
- acceptance checks,
- stop conditions.

Output:

- exact files changed,
- commands run,
- pass/fail/skipped results,
- residual risks,
- queue evidence and handoff.

### 2. Watch Loop

Watcher agents monitor CI, CD, deploy, smoke, and long-running environment work.

Required inputs:

- watcher packet,
- command or external job to watch,
- expected duration,
- success condition,
- failure condition,
- allowed fixes,
- escalation rules.

Output:

- final status,
- failure classification,
- log/artifact link,
- duration,
- queue evidence.
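
To make the watcher inputs and outputs concrete, here is a minimal sketch of a polling watcher. It is an illustration under stated assumptions, not an existing helper: the name `watch_job`, the `overrun_factor` default, the state strings returned by `poll`, and the failure classification labels are all hypothetical. It does not model allowed fixes or escalation rules.

```python
import time
from typing import Callable


def watch_job(
    poll: Callable[[], str],
    expected_duration_s: float,
    *,
    overrun_factor: float = 2.0,  # assumption: give up at 2x expected duration
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a job until it succeeds, fails, or overruns its expected duration.

    `poll` is assumed to return "success", "failure", or any other string
    while the job is still running.
    """
    start = clock()
    deadline = start + expected_duration_s * overrun_factor
    while True:
        state = poll()
        now = clock()
        if state == "success":
            return {"final_status": "pass", "failure_classification": None,
                    "duration_s": now - start}
        if state == "failure":
            return {"final_status": "fail",
                    "failure_classification": "failure-condition-met",
                    "duration_s": now - start}
        if now >= deadline:
            # The job never reached a terminal state in the allowed time.
            return {"final_status": "fail",
                    "failure_classification": "timeout",
                    "duration_s": now - start}
        sleep(interval_s)
```

Injecting `clock` and `sleep` keeps the loop testable without waiting in real time. The log/artifact link and queue evidence would be attached by the caller.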

### 3. Review Loop

Review agents or human reviewers inspect bug fixes and high-risk changes for
root cause, regression coverage, and scope drift.

Required questions:

1. What failed?
2. What layer owns the defect?
3. Why did existing tests miss it?
4. What regression now covers it?
5. Did the fix add special-case logic for a specific route, environment, CI
   path, or auth flow?
6. Did the work stay within the original task packet?

## Context Packet

Every non-trivial implementation task needs this packet:

```yaml
task_id:
track_id:
owner_lane:
goal:
non_goals:
current_state:
architecture_context:
environment:
owned_files:
must_not_touch:
commands_to_run:
acceptance:
workflow_regression_pack:
known_risks:
stop_conditions:
deviation_rules:
handoff_required:
```

The packet travels with the task. The agent must not rely on chat history alone.

Repo-local helper:

```bash
TASK_ID=C-DEPLOYED-AUTH-LOGIN-RELEASE-GATE-001 \
OWNER_LANE=C-ops \
GOAL="Add deployed auth/login release gate for demo" \
ACCEPTANCE="OIDC callback smoke blocks failure;browser login plan recorded" \
WORKFLOW_REGRESSION_PACK=auth-session \
make agent-context-packet
```
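
One way to make the packet checkable is to reject packets with missing or empty fields before a worker starts. The sketch below is a minimal illustration and not the behavior of `make agent-context-packet`. The function name `missing_packet_fields` and the rule that empty values count as missing are assumptions.

```python
# Field names copied from the context packet template above.
REQUIRED_FIELDS = (
    "task_id", "track_id", "owner_lane", "goal", "non_goals",
    "current_state", "architecture_context", "environment",
    "owned_files", "must_not_touch", "commands_to_run", "acceptance",
    "workflow_regression_pack", "known_risks", "stop_conditions",
    "deviation_rules", "handoff_required",
)


def missing_packet_fields(packet: dict) -> list[str]:
    """Return required fields that are absent or empty (hypothetical helper)."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = packet.get(field)
        # Assumption: None, "", and [] all mean "not provided".
        # False is accepted, so `handoff_required: false` is a valid answer.
        if value is None or value == "" or value == []:
            missing.append(field)
    return missing
```

A dispatcher could refuse to hand a task to a side agent while this list is non-empty, which would turn the packet from a documented convention into an enforced one.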

## Drift Rules

When unexpected work appears:

1. If it is inside the same owning layer and required for acceptance, fix it.
2. If it blocks the task but crosses ownership boundaries, stop and create a
   blocker or escalation packet.
3. If it is useful but not blocking, create a follow-up task.
4. If it is unrelated, leave it alone.

Do not silently expand an implementation task into environment recovery,
release redesign, broad UI cleanup, or unrelated proxy/auth work.
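
The four drift rules can be read as a small decision function. The sketch below is one possible encoding, not an existing tool. The names `DriftAction` and `classify_drift` are hypothetical, and it assumes that "required for acceptance" (rule 1) and "blocks the task" (rule 2) can be treated as the same `blocking` flag.

```python
from enum import Enum


class DriftAction(Enum):
    FIX_IN_TASK = "fix it inside the current task"
    ESCALATE = "stop and create a blocker or escalation packet"
    FOLLOW_UP = "create a follow-up task"
    LEAVE_ALONE = "leave it alone"


def classify_drift(*, blocking: bool, same_owning_layer: bool,
                   useful: bool) -> DriftAction:
    """Map unexpected work to one of the four drift rules."""
    if blocking and same_owning_layer:
        return DriftAction.FIX_IN_TASK   # rule 1
    if blocking:
        return DriftAction.ESCALATE      # rule 2: crosses ownership
    if useful:
        return DriftAction.FOLLOW_UP     # rule 3: useful, not blocking
    return DriftAction.LEAVE_ALONE       # rule 4: unrelated
```

For example, a deploy failure rooted in auth configuration while working a UI task would be `classify_drift(blocking=True, same_owning_layer=False, useful=True)`, which yields `DriftAction.ESCALATE` rather than an in-place fix.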

## Bug-Fix Definition of Done

A bug fix is not done until it records:

- root cause,
- owning layer,
- user-visible or operational impact,
- proof command that reproduced or directly validates the fix,
- regression coverage location,
- why the regression is at the right layer,
- residual risk or skipped coverage reason.

If no regression exists at the right layer, the fix must create one or create a
queue task that blocks release until the missing coverage is accepted.
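
A sketch of how this definition of done could be checked mechanically follows. It is illustrative only: `BugFixRecord`, `dod_gaps`, and the field names are hypothetical, and the check only verifies that each item is recorded, not that it is correct.

```python
from dataclasses import dataclass


@dataclass
class BugFixRecord:
    root_cause: str = ""
    owning_layer: str = ""
    impact: str = ""                      # user-visible or operational
    proof_command: str = ""
    regression_location: str = ""
    regression_layer_rationale: str = ""
    residual_risk: str = ""               # or skipped coverage reason
    blocking_coverage_task: str = ""      # queue task id, if coverage is deferred


def dod_gaps(record: BugFixRecord) -> list[str]:
    """Return the definition-of-done items that are still unrecorded."""
    always_required = ("root_cause", "owning_layer", "impact",
                       "proof_command", "residual_risk")
    gaps = [name for name in always_required
            if not getattr(record, name).strip()]
    has_regression = bool(record.regression_location.strip()
                          and record.regression_layer_rationale.strip())
    # Either the regression exists at the right layer, or a queue task
    # blocks release until the missing coverage is accepted.
    if not has_regression and not record.blocking_coverage_task.strip():
        gaps.append("regression coverage or release-blocking queue task")
    return gaps
```

An empty result means the record is complete enough to review; the Review Loop questions still decide whether the recorded answers hold up.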

## Workflow Regression Packs

Workflow packs are end-to-end product/runtime journeys. A regression in one of
them matters more than a regression in any individual component.

Initial required packs:

| Pack | Required proof |
|---|---|
| Auth and session | deployed app can start OIDC auth, Keycloak accepts the callback redirect URI, user can land in shell |
| V3 shell | authenticated user can navigate major mode and project/tenant context |
| App launch | user can select app, submit launch, and see runtime/workload state |
| Notebook/proxy | active notebook route opens through the intended proxy path |
| Terminal | active allocation terminal opens through browser websocket route |
| Node inventory/lifecycle | operator can see nodes, readiness, blockers, and task evidence |
| Platform proxy | host/path route resolves through intended renderer and logs trace/correlation |
| Demo deploy | deployed app/API/auth hosts pass remote smoke and browser login/auth redirect checks |

Each pack should have:

- owner,
- target environment,
- required seed data,
- lowest reliable test layer,
- blocking vs advisory status,
- artifact requirements.
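
Since the pack format is not yet defined, the following is only a sketch of what a first-class pack record might look like. `WorkflowPack`, `release_blockers`, the result strings, and the rule that a missing result counts as a failure are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowPack:
    name: str                      # e.g. "auth-session"
    required_proof: str
    owner: str
    target_environment: str
    seed_data: tuple[str, ...]
    lowest_reliable_layer: str
    blocking: bool                 # True = blocking, False = advisory
    artifacts: tuple[str, ...]


def release_blockers(packs: list[WorkflowPack],
                     results: dict[str, str]) -> list[str]:
    """Return names of blocking packs without a recorded "pass".

    Assumption: a pack with no result is treated the same as a failure,
    so skipped coverage cannot pass silently.
    """
    return [pack.name for pack in packs
            if pack.blocking and results.get(pack.name) != "pass"]
```

Advisory packs never appear in the returned list, so the blocking vs advisory decision stays an explicit property of each pack rather than of the pipeline.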

## Test Layer Rules

Use the lowest layer that catches the regression, but do not confuse layer
fitness with release fitness: a check that passes at a low layer does not prove
that the deployed release works.

- Unit/component: logic and rendering states.
- API/integration: contracts, authz, DB and policy behavior.
- Playwright local: browser behavior with local stack.
- Kind/deployment parity: ingress, runtime env, config injection, rollout.
- Remote validation: deployed URLs, DNS/TLS/tunnel, OIDC callback registration,
  live service smoke.
- Manual/recorded smoke: workflows that still need human judgment until they are
  automated.

The demo login callback failure belongs to remote validation first. A local
Playwright test can prove login UI behavior, but it cannot prove Keycloak has
the deployed `https://aicloud-demo-app.core42.dev/auth/callback` redirect URI
unless it runs against that deployed auth target.
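
The layer rule can be sketched as picking the minimum from the set of layers able to observe a regression. This is an illustration, assuming the order of the list above is the intended low-to-high ordering; `Layer` and `lowest_catching_layer` are hypothetical names.

```python
from enum import IntEnum


class Layer(IntEnum):
    """Test layers, numbered in the order the list above gives them."""
    UNIT_COMPONENT = 1
    API_INTEGRATION = 2
    PLAYWRIGHT_LOCAL = 3
    KIND_DEPLOYMENT_PARITY = 4
    REMOTE_VALIDATION = 5
    MANUAL_SMOKE = 6


def lowest_catching_layer(layers_that_catch: set[Layer]) -> Layer:
    """Return the lowest layer able to catch a given regression."""
    if not layers_that_catch:
        raise ValueError("no layer catches this regression; coverage is missing")
    return min(layers_that_catch)


# A redirect URI missing from the deployed Keycloak is invisible to every
# layer that does not talk to the deployed auth target.
callback_regression = {Layer.REMOTE_VALIDATION, Layer.MANUAL_SMOKE}
print(lowest_catching_layer(callback_regression).name)  # REMOTE_VALIDATION
```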

## Coordinator Checkpoints

For any autonomous run longer than one hour, record:

```text
Original goal:
Current track:
What changed:
Drift discovered:
Blocked on:
Next 3 actions:
Continue | park | split | escalate:
```

This checkpoint can be stored as a track checkpoint, handoff, or dashboard note.
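
A small renderer can keep checkpoints uniform wherever they are stored. The sketch below is hypothetical (`render_checkpoint` is not an existing helper) and assumes every field must be filled in explicitly, so "Drift discovered" would be recorded as "none" rather than left blank.

```python
CHECKPOINT_FIELDS = (
    "Original goal",
    "Current track",
    "What changed",
    "Drift discovered",
    "Blocked on",
    "Next 3 actions",
)
DECISIONS = ("continue", "park", "split", "escalate")


def render_checkpoint(values: dict[str, str], decision: str) -> str:
    """Render a checkpoint in the text format shown above."""
    if decision not in DECISIONS:
        raise ValueError(f"decision must be one of {DECISIONS}, got {decision!r}")
    # Assumption: an empty or absent field is an error, not an implicit "none".
    missing = [name for name in CHECKPOINT_FIELDS if not values.get(name)]
    if missing:
        raise ValueError(f"checkpoint is missing: {', '.join(missing)}")
    lines = [f"{name}: {values[name]}" for name in CHECKPOINT_FIELDS]
    lines.append(f"Continue | park | split | escalate: {decision}")
    return "\n".join(lines)
```

Forcing a single decision on the last line makes each checkpoint end with an explicit choice instead of an open-ended status note.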

## Product Boundary

Generic orchestration core:

- task graph,
- context packets,
- watcher packets,
- state machine,
- queue evidence,
- session registry,
- review routing,
- workflow regression packs,
- dashboard/API,
- Git/worktree adapters,
- provider adapters.

GPUaaS-specific configuration:

- role names,
- repo paths,
- review rules,
- CI/CD commands,
- environment descriptors,
- release promotion policy,
- workflow pack definitions,
- platform-control and demo runbooks.

## Missing Pieces

Known gaps to address:

1. Context packets are documented but not enforced.
2. Watcher lanes are documented, but dispatch and closeout are still manual.
3. Workflow regression packs are not yet a first-class artifact.
4. Bug-fix DoD does not yet block release when no regression is added.
5. Remote deployed login is smoke-checked at the authorize endpoint, but full
   browser login should become a release gate for demo/staging/prod.
6. Dashboard/API is not built; current state is command-line only.
7. The queue health check reports historical evidence debt; it should
   distinguish legacy debt from debt on new tasks.
8. Worker agents need better context packets and should not receive vague
   thread summaries.

## Near-Term Sequence

1. Define workflow regression pack format and seed initial packs.
2. Add bug-fix review checklist to Definition of Done.
3. Add deployed auth/login remote validation as a blocking demo release check.
4. Add context packet helper and require it for side agents.
5. Add a watcher assignment helper that records closeout evidence.
6. Build a read-only local dashboard over the queue store.