Writing good checks
A check is one natural-language sentence describing something that must be true about a PR. The reviewer agent reads the description on every PR, looks at the diff (and the surrounding code when it needs to), and answers pass, fail, or needs_review with file:line evidence.
The quality of the answer depends almost entirely on the quality of the description.
The shape of a good description
Write the description the way you'd write a comment on a teammate's PR:
Production code under
src/must not containprint()statements. Use the logger fromsrc/observability/logger.pyinstead.
That sentence does four useful things:
- States the rule ("must not contain
print()"). - Scopes it ("under
src/" — not tests, not scripts). - Names the standard ("Use the logger from ...") so the agent can cite a concrete alternative in its evidence.
- Stays specific. "Don't log sensitive data" is too broad; "don't pass
user.emailoruser.ssnintologger.info" is checkable.
Status meanings
| Status | What it means |
|---|---|
pass | The agent verified the rule holds for this PR's changes. |
fail | The agent found a concrete violation. The evidence will reference file:line. |
needs_review | The agent isn't sure — the rule is ambiguous, the evidence is partial, or the code the rule covers wasn't touched in a way the agent could resolve. A human should look. |
needs_review is a real outcome, not a synonym for "I didn't try." When you see it, click into the run and read the agent's reasoning — it'll usually tell you what evidence it was missing.
Patterns that work
Anchor to paths, function names, or extensions. The agent has read access to the whole repo and uses paths to scope its search.
Every new file under
migrations/includes adown_revisionand a non-trivialdowngrade()body.
Say what to do instead. Rules with a positive alternative get cleaner fail evidence.
External HTTP calls use the
http_clientwrapper fromlib/http.tsso retries and timeouts are applied. Don't usefetchoraxiosdirectly.
Be explicit about what counts as "new". "New tables" is ambiguous — added in this PR? Added in the last release?
SQL files added or modified under
models/staging/have a matching test file undertests/staging/with the same basename.
Lean on file conventions you already enforce in review. If your team already says "all React components must be functional, not class-based," that's a great check — you already know how to spot violations, so the agent can too.
Patterns that don't
Vague quality bars. The agent can't tell you whether something is "clean," "readable," or "well-designed." It can tell you whether a specific structural property holds.
"Code is well-tested." → too vague. "Every public function in
src/api/handlers/has at least one assertion in a sibling*_test.pyfile." → checkable.
Whole-codebase claims. A check evaluates the changes in a PR plus the code they touch. It is not a static analyzer over the whole repo.
"There are no circular imports anywhere." → not a check; use a linter. "New imports added in this PR don't create a cycle with
core/." → checkable.
Things that need to run the code. The agent can read code and run shell commands in a sandbox (no network), but it can't reach your databases, your staging API, or your production logs.
"The new endpoint returns the same JSON shape as the old one in prod." → not a check. "The new endpoint's response model in
schemas.pyhas the same fields asOldEndpointResponse." → checkable.
Compound rules. Two rules that share a sentence make it hard for the agent to give you a meaningful fail with line refs.
"All new functions are typed AND have docstrings AND are tested." → split into three.
Severity and tiering
Two optional fields tune how a check is evaluated.
severity (low | medium | high, default medium) controls how the failure renders in the PR comment and in any policy-pack scoring you build on top — it doesn't change the model's behavior.
tier_hint (auto | triage_only | deep, default auto) controls which tier of the evaluator runs the check. Most checks should stay on auto, which lets a cheap triage pass handle obvious pass / not applicable cases and escalates to a deeper agent only when needed. Use:
deepfor checks that require following code across multiple files or interpreting complex logic. Costs more credits but produces stronger reasoning.triage_onlyfor checks that are obviously a pass-through if the diff doesn't touch the relevant code. Cheap; no escalation.
See Credits and billing for the credit cost of each tier.
Iterating on a check
When a check gives results you don't like:
- Open the run on CrossCheck and read the agent's evidence. The reasoning is usually telling.
- Tighten the scope. Add a path constraint or call out a specific file pattern.
- Name the alternative. Replace "don't do X" with "do Y instead of X."
- Re-run the PR with the Re-review button.
If the description is right but the agent keeps mis-evaluating, set tier_hint: deep and try again.
Where checks live
- In the dashboard. Open a check group under
/<your-org>/check-groups, click Add check. - In
cross-check.yamlat your repo root. YAML checks run alongside dashboard checks; they never replace them. Seecross-check.yaml.
Either way the format is the same: a name and a description.