# Agent Quality Model

This template treats agent quality as observable behavior, not personality.

Use this model to evaluate whether agent work is becoming more reliable over time.

## Metrics

### Verified Change Rate

The percentage of meaningful changes that include a focused verification command.

Target:

- R&D: verification or evidence note for every conclusion
- production: focused verification plus the repo smoke gate when contract files change
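As a rough illustration, the rate can be computed from a log of changes. This is a minimal sketch, not part of the template: the `Change` record, its field names, and the sample commands are all hypothetical, and deciding which changes count as "meaningful" is left to the reviewer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Change:
    """One meaningful change from an agent session (hypothetical record shape)."""
    description: str
    # The focused verification command that was run, if any.
    verification_command: Optional[str] = None


def verified_change_rate(changes: List[Change]) -> float:
    """Return the percentage of changes that carry a verification command."""
    if not changes:
        return 0.0
    verified = sum(1 for c in changes if c.verification_command)
    return 100.0 * verified / len(changes)


# Illustrative data only; the commands are placeholders.
changes = [
    Change("rename config key", "pytest tests/test_config.py"),
    Change("update README wording"),
    Change("fix parser edge case", "pytest tests/test_parser.py -k edge"),
    Change("adjust contract file", "make smoke"),
]
print(f"{verified_change_rate(changes):.1f}%")  # prints 75.0%
```

Tracking this number per session or per week gives a simple trend line for the "more reliable over time" question.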

### Context Discipline

Whether agents inspect the canonical contract and relevant files before proposing structural changes.

Signal:

- `AGENTS.md`, `ROADMAP.md`, and `ARCHITECTURE.md` are read before substantial template work
- claims about tool behavior are verified or marked as inference

### Review Finding Quality

Whether reviews identify actionable defects rather than style commentary.

Signal:

- findings first
- concrete file or behavior references
- severity ordering
- missing tests called out when relevant

### Handoff Completeness

Whether another worker can continue without reconstructing the task.

Signal:

- scope, touched files, verification, and remaining risk are explicit
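One way to make this signal checkable is to treat the handoff note as structured data and flag empty fields. The sketch below assumes a plain dictionary with hypothetical key names; the template itself does not prescribe a handoff format.

```python
from typing import Dict, List

# Hypothetical field names mirroring the four items in the signal above.
REQUIRED_FIELDS = ("scope", "touched_files", "verification", "remaining_risk")


def missing_handoff_fields(note: Dict[str, object]) -> List[str]:
    """Return the required fields that are absent or empty in a handoff note."""
    return [field for field in REQUIRED_FIELDS if not note.get(field)]


# Illustrative note; the paths and commands are placeholders.
note = {
    "scope": "tighten pre-tool policy for secret reads",
    "touched_files": ["hooks/policy.py", "docs/safety.md"],
    "verification": "pytest tests/test_policy.py",
    "remaining_risk": "",  # left blank, so it is reported as missing
}
print(missing_handoff_fields(note))  # prints ['remaining_risk']
```

An empty value counts as missing on purpose: "no known risk" should be written out rather than implied by silence.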

### Safety Interception

Whether unsafe commands and secret access are stopped before execution.

Signal:

- pre-tool policy denies direct secret reads, force pushes, recursive deletes, and broad permission changes
- false positives are tracked and reduced without weakening safety
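A pre-tool check of this kind might look like the following sketch. The rule names follow the four categories above, but the regular expressions are hypothetical and deliberately incomplete; a real policy would need broader coverage and its own false-positive log.

```python
import re
from typing import Optional

# Hypothetical deny rules: illustrative patterns, not an exhaustive policy.
DENY_RULES = [
    ("direct secret read",
     re.compile(r"\b(cat|less|head|tail)\s+\S*(\.env|id_rsa|secrets?)\b")),
    ("force push",
     re.compile(r"\bgit\s+push\b.*(--force\b|\s-f\b)")),
    ("recursive delete",
     re.compile(r"\brm\s+(-\w*[rR]\w*|--recursive)\b")),
    ("broad permission change",
     re.compile(r"\bchmod\s+(-R\s+)?(777|a\+rwx)\b")),
]


def check_command(command: str) -> Optional[str]:
    """Return the name of the first deny rule the command matches, else None."""
    for name, pattern in DENY_RULES:
        if pattern.search(command):
            return name
    return None


for cmd in ("git push --force origin main", "rm -rf build", "git push origin main"):
    verdict = check_command(cmd)
    print(f"{cmd!r}: {'DENY (' + verdict + ')' if verdict else 'allow'}")
```

Returning the rule name rather than a bare boolean makes it easier to record which rule fired, which is what tracking false positives requires.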

### Downstream Readiness

Whether a bootstrapped repository makes clear which template defaults must be replaced before production use.

Signal:

- README frames template verification as starter scaffolding
- adoption docs require project-native CI, ownership, release, and security checks

## Scoring Rubric

Use this lightweight score for reviews of agent sessions or downstream pilots.

```text
0 = absent
1 = mentioned but not actionable
2 = actionable but incomplete
3 = complete and verified
```

Score these dimensions:

- task framing
- repository inspection
- implementation scope
- verification
- review or risk assessment
- handoff or durable artifact
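The rubric can be applied with a small helper like the one below. The template does not say how to combine the six scores, so summing them and reporting the weakest dimensions is an assumption of this sketch, as are the function and variable names.

```python
from typing import Dict

RUBRIC = {
    0: "absent",
    1: "mentioned but not actionable",
    2: "actionable but incomplete",
    3: "complete and verified",
}

DIMENSIONS = [
    "task framing",
    "repository inspection",
    "implementation scope",
    "verification",
    "review or risk assessment",
    "handoff or durable artifact",
]


def summarize(scores: Dict[str, int]) -> dict:
    """Validate one score per dimension and return the total and weakest areas."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    for dimension in DIMENSIONS:
        if scores[dimension] not in RUBRIC:
            raise ValueError(f"{dimension}: score must be 0-3")
    lowest = min(scores[d] for d in DIMENSIONS)
    return {
        "total": sum(scores[d] for d in DIMENSIONS),
        "max": 3 * len(DIMENSIONS),
        "weakest": [d for d in DIMENSIONS if scores[d] == lowest],
    }


# Illustrative scores for one reviewed session.
session = dict(zip(DIMENSIONS, [3, 2, 3, 1, 2, 3]))
summary = summarize(session)
print(summary["total"], summary["max"], summary["weakest"])
# prints: 14 18 ['verification']
```

Reporting the weakest dimensions alongside the total keeps a high aggregate from hiding a single unverified area.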

## Production Gate

For production-level use, a task should not be considered complete unless:

- the changed surface is clear
- verification was run or the blocker is documented
- safety-sensitive behavior was reviewed when applicable
- any contract change updated the shared-brain docs
- the next worker can understand the result from committed files and the final note
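The gate above can be expressed as a checklist function. This is a sketch under stated assumptions: the `TaskReport` fields are hypothetical names for the five conditions, and each value is a reviewer judgment, not something the code can determine by itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskReport:
    """Reviewer answers for one task (hypothetical field names)."""
    changed_surface_clear: bool
    verification_run: bool
    blocker_documented: bool
    safety_sensitive: bool
    safety_reviewed: bool
    contract_changed: bool
    shared_docs_updated: bool
    handoff_understandable: bool


def gate_failures(report: TaskReport) -> List[str]:
    """Return the production-gate conditions the task does not yet meet."""
    failures = []
    if not report.changed_surface_clear:
        failures.append("changed surface is unclear")
    if not (report.verification_run or report.blocker_documented):
        failures.append("no verification run and no documented blocker")
    if report.safety_sensitive and not report.safety_reviewed:
        failures.append("safety-sensitive behavior was not reviewed")
    if report.contract_changed and not report.shared_docs_updated:
        failures.append("contract changed without shared-brain doc update")
    if not report.handoff_understandable:
        failures.append("result is not understandable from files and final note")
    return failures


report = TaskReport(
    changed_surface_clear=True,
    verification_run=False,
    blocker_documented=False,
    safety_sensitive=False,
    safety_reviewed=False,
    contract_changed=True,
    shared_docs_updated=False,
    handoff_understandable=True,
)
for failure in gate_failures(report):
    print("BLOCKED:", failure)
```

An empty list means the task passes the gate; the conditional items (safety review, doc updates) only apply when their trigger is true, matching the "when applicable" wording above.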
