The Playbook
Hooks Beat Instructions
The shared workspace from the previous chapter holds for a quarter without enforcement. Pre-commit hooks, an integrity lint that runs nightly, and tiered process artifacts turn it into substrate that survives at month twelve. The file is the state machine. Drift is a P1 incident, not a backlog item.
Three months into running a shared workspace, two patterns appear that nobody planned for. A senior leader asks the agent for the current state of a deal and gets a confident answer drawn from a six-week-old document marked Status: In Progress, even though the deal closed in March. A founder runs five Claude Code sessions in parallel through the day, and three of them re-decide the same architectural question because no session can see what the others have already concluded. Neither failure is the model's fault. Both are the predictable result of running an AI-native workspace on instructions and goodwill rather than on artifacts and enforcement.
Three forms of volatility produce these failures. The thread itself is volatile because the model's recall degrades as the token count grows, and compaction passes silently drop entire stretches of work — Anthropic's engineering writeup on context engineering documents the mechanism plainly. A CLAUDE.md rule is volatile in a different way: the next session reads the file, agrees with the rule, and then routes around it on the third tool call because the rule was a sentence in prose, not a gate the runtime checks. Process documents that nobody enforces are the slowest to decay. They look governed in the docs and behave chaotically in practice, and the chaos is invisible until somebody senior asks the agent a question and gets a confidently wrong answer.
The durable answer is small and unglamorous. The file becomes the state machine, and hooks block what shouldn't happen. The rule that "a task in done/ must have a verification artifact" stops being a sentence in a doc and becomes a pre-commit hook that rejects the commit when the artifact is missing. The rule that "every tool call should inject relevant recent decisions into the next prompt" stops being something a founder hopes for and becomes a PostToolUse handler registered in settings.json before the session starts. The integrity check that "no node in the workspace should sit unreferenced for more than a quarter" runs nightly as a cron-driven script and writes a morning report a named human triages before the day's first prompt.
This is the enforcement layer. Without this layer, the chapters around this one describe systems that work for a quarter and rot over a year. The enforcement layer is what changes that arc. With it, the workspace at month twelve is still a single source of truth, and a founder can run five parallel sessions on a Wednesday morning without producing five contradictory views of yesterday's decisions.
The file is the state machine
The principle is simple and the application is non-obvious. The directory a file sits in represents its stage in a lifecycle. The Status field in the file's frontmatter represents the same lifecycle from the agent's point of view. A pre-commit hook reads both before allowing any commit and rejects the commit if they disagree. The team does not have to remember to keep the two in sync because the commit fails when they drift.
The concrete pattern that holds up across founders, mid-team leads, and enterprise rollouts is a directory layout that mirrors a six-stage lifecycle and a frontmatter field that mirrors the same six stages.
    work/
      product/
        1-discovery/
        2-ready/
        3-doing/
        4-done/
        5-archive/
        6-deprioritized/
      tooling/
        [same six stages]
      gtm/
        [same six stages]
A task file sits in exactly one stage at any time. The file path encodes the stage as the third path segment. The file's frontmatter carries Status: Discovery | Ready | In Progress | Verified | Complete | Deprioritized (the verbs change but the structural mapping is one-to-one with the directory). When a task moves from In Progress to Complete, two things have to happen at the same commit: the file is git mv'd from 3-doing/ to 4-done/, and the frontmatter Status changes accordingly. A pre-commit hook reads both and rejects any commit where one moved without the other.
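A minimal sketch of that hook, in Python, is below. The stage-to-status mapping, the bare `Status:` frontmatter line, and the assumption that lifecycle-managed files live under `work/<domain>/<stage>/` are illustrative; a real workspace would adapt them to its own layout and register the script through the pre-commit framework.

```python
#!/usr/bin/env python3
"""Pre-commit check: the directory stage and the frontmatter Status must agree."""
import re
import subprocess
import sys
from pathlib import Path

# Hypothetical stage-to-status mapping; Verified is allowed transiently in both
# 3-doing/ (before the move) and 4-done/ (before the final flip to Complete).
STAGE_TO_STATUS = {
    "1-discovery": {"Discovery"},
    "2-ready": {"Ready"},
    "3-doing": {"In Progress", "Verified"},
    "4-done": {"Verified", "Complete"},
    "5-archive": {"Complete"},
    "6-deprioritized": {"Deprioritized"},
}

def staged_markdown_files() -> list[Path]:
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACMR"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [Path(p) for p in out.splitlines() if p.endswith(".md")]

def frontmatter_status(path: Path) -> str | None:
    # Treat the first "Status:" line in the file as the frontmatter field.
    match = re.search(r"^Status:\s*(.+)$", path.read_text(encoding="utf-8"), re.MULTILINE)
    return match.group(1).strip() if match else None

def main() -> int:
    errors = []
    for path in staged_markdown_files():
        parts = path.parts
        if len(parts) < 4 or parts[0] != "work":
            continue  # not a lifecycle-managed task file
        stage = parts[2]  # e.g. work/product/3-doing/task-.../task.md
        allowed = STAGE_TO_STATUS.get(stage)
        if allowed is None:
            continue
        status = frontmatter_status(path)
        if status not in allowed:
            errors.append(f"{path}: directory says {stage!r} but frontmatter Status is {status!r}")
    for err in errors:
        print(f"BLOCKED: {err}", file=sys.stderr)
    return 1 if errors else 0

if __name__ == "__main__":
    sys.exit(main())
```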
The single most common failure this hook prevents is the silent inconsistency that produces the wrong-state read. A founder edits the frontmatter to Status: Complete from a session at the airport, intends to git mv the directory when they're back at a real keyboard, forgets, and the inconsistency persists. A week later, an agent in a different session reads the file from 3-doing/, sees Status: Complete, and either confidently asserts the work is finished (because the frontmatter is the canonical surface most skills read first) or asks the user to clarify and gets the wrong answer (because the user has forgotten the half-completed move). The hook catches the contradiction at commit time, well before any customer-facing wrong answer.
The same pattern generalizes beyond tasks. Decision documents have a lifecycle (proposed → approved → implemented → revisited). Specs have a lifecycle (drafting → locked → executed → archived). Master skills have a lifecycle (drafting → in-rotation → deprecated). Anywhere a state can drift from a location, a hook is what keeps the two aligned. The directory encodes the stage, the frontmatter encodes the state, and the hook is the enforcement that keeps the two from drifting apart.
There is a subtlety worth naming. The hook does not enforce that a particular state-and-stage combination is the right state from the team's point of view. It enforces only that whatever combination the file claims, the directory claims the same thing. The judgment about whether a task is actually done belongs upstream in the verification artifact. The hook's job is narrower and harder to argue with: ensure the artifact and the location agree, every time, without asking anyone to remember.
Hooks beat instructions
Hooks earn their place because instructions are forgotten across sessions and hooks are not. A team that relies on the next session reading CLAUDE.md and following the rules loses the rules the day the senior engineer goes on vacation, the day a model release shifts the failure surface, the day someone joins the team without the institutional memory of which rules are load-bearing. A team that wires the same rules into hooks the runtime executes loses the rules only if the hooks are deleted, and the hooks are in settings.json under version control where deletion is a visible diff.
Anthropic's Claude Code hooks reference documents the four hook classes a small AI-native workspace draws on, plus a wider taxonomy that becomes useful at scale. The four that carry weight at founder and small-team scale are these:
- `PreToolUse` runs before a tool call executes and can block it. The hook the chapter's directory-vs-state example uses is implemented at this layer through the standard pre-commit pattern: a script reads the staged files, compares directory to frontmatter, and refuses the commit when they disagree. The pre-commit framework is the public reference implementation for this pattern across languages.
- `PostToolUse` runs after a tool call succeeds and cannot block, but can write to disk, fire notifications, or inject context into the next prompt. The cross-session sync the chapter develops below lives at this layer.
- `SessionStart` runs when a session begins or resumes, before the user's first prompt. This is the hook that surfaces the morning's lint-triage report, recently-touched decisions relevant to the project at the current working directory, and any nudges other sessions left for this one in a shared inbox.
- `PreCompact` runs before context compaction, the only opportunity to extract the session's important state into durable storage before the model's working memory drops it. A small `PreCompact` handler that writes a one-paragraph summary plus the list of decisions made in the session to a file in the workspace is one of the highest-value hooks a founder can install in their first week.
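As a concrete illustration of the last of these, a minimal PreCompact handler might look like the sketch below. It assumes the hook receives a JSON payload on stdin that includes a session identifier (the hooks reference documents the exact fields) and it writes to a hypothetical `vault/compaction-notes/` directory.

```python
#!/usr/bin/env python3
"""PreCompact handler: persist something durable before compaction drops context."""
import json
import sys
from datetime import datetime, timezone
from pathlib import Path

NOTES_DIR = Path("vault/compaction-notes")  # hypothetical location in the workspace

def main() -> int:
    # Claude Code hooks pass a JSON payload on stdin; only a session identifier
    # is assumed here -- check the hooks reference for the exact field names.
    payload = json.load(sys.stdin)
    session_id = str(payload.get("session_id", "unknown-session"))
    NOTES_DIR.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")
    note = NOTES_DIR / f"{stamp}-{session_id}.md"
    # The one-paragraph summary and decision list ideally come from a scratch
    # file the agent maintains during the session; failing that, the raw
    # payload (which points at the transcript) is still better than nothing.
    note.write_text(
        f"# Pre-compaction note\n\nSession: {session_id}\nSaved: {stamp}\n\n"
        f"Payload:\n\n{json.dumps(payload, indent=2)}\n",
        encoding="utf-8",
    )
    return 0

if __name__ == "__main__":
    sys.exit(main())
```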
The argument for hooks is the same one software engineering teams settled a generation ago about CI and automated tests. Asking every engineer to remember to run the tests before pushing loses test discipline within a quarter, because what one engineer remembers does not survive vacation, late nights, or new hires. Running the tests on every push regardless of what anyone remembered preserves the rule across every personnel change, every late-night fix, and every "just this once" exception. The cost of automatic testing is one configuration file and a few seconds per push. The benefit is institutional rather than individual: the discipline does not depend on memory.
Hooks for the AI-native workspace work the same way. The rule that the next session must inject yesterday's decisions into its context, that destructive commands must be approved, that tracker keys must exist in frontmatter before a task can move to done/, that compaction must save state before discarding it — every one of these rules is a sentence in prose if it lives in CLAUDE.md and an enforced contract if it lives in settings.json. The bar to clear is not "did we write the rule down?" but "does the runtime block the action when the rule is violated, without relying on anyone remembering?" Hooks clear the bar. Instructions, by themselves, do not.
Drift detection runs nightly; humans triage in the morning
A workspace that is actively maintained still drifts. Files get renamed and the inbound links to them break. A node gets added to support an experiment, the experiment ends, and the node sits orphaned with no inbound links. A decision marked revisit_by: 2026-Q1 is now in 2026-Q2 and nobody noticed. Two near-identical canonical answers to the same question accumulate across two folders because two different team members wrote them on different days. Two decision pages contradict each other because the second author wasn't aware of the first.
The integrity layer is the nightly cron-driven lint that catches all of these and writes a report a named human triages in the morning. Six checks earn their place across every implementation worth running:
- Broken [[wikilinks]] — a reference points to a node that has been renamed, deleted, or never existed. The lint walks every reference and flags any that fail to resolve.
- Orphan nodes — a node has no inbound links from anywhere in the workspace. Usually this surfaces a misnamed tag or an abandoned topic that should be archived rather than left in the active vault.
- Frontmatter violations — a `status` field outside the allowed enum, a missing `owner`, a malformed date, a required field absent. The lint validates against a small JSON Schema and flags every divergence.
- Stale decisions — a node carries a `revisit_by` date and the date has passed without an update. The decision-node author committed to a review and the review didn't happen.
- Near-duplicates — two nodes have cosine similarity above some threshold and almost certainly represent the team forking the canonical answer. The lint flags both for a human to merge.
- Contradictions between decision-nodes — two nodes claim incompatible answers to the same question. The lint can detect a subset of these heuristically (overlapping titles, conflicting frontmatter values on the same key) and flag the pair for human resolution.
The load-bearing rule is the one that sounds disappointingly cautious until it bites: the lint reports, humans triage, the lint never auto-fixes. Auto-fixing the lint findings is the wrong design in three specific ways: auto-merging duplicates without human review deletes context the team needs, auto-resolving contradictions in favor of the more recent file silently overwrites a decision the older file may have been right about, and auto-renaming the destination of a broken link guesses at intent. The right design is for the lint to write a single report file per night to a known path, and for the same person who owns the master skill and the nightly sync to triage the report before opening the day's first session.
The triage cost lands at ten to thirty minutes a morning at small scale and an hour or two at large scale, and that cost is the price of the workspace not rotting. Yesterday's report closes before today's first prompt, the same way a build break closes before today's commit. A team that lets the report queue accumulate for a week is running an agent fleet against substrate that has been quietly decaying. The output quality drops in correlated ways that look from the outside like "the model is getting worse" but are really "the context is getting older." Treating the lint queue as a P1 incident is what keeps the substrate trustworthy.
The cron entry that runs the lint and the path it writes the report to are part of the same hook layer the previous section described. The cron job is a scheduled equivalent of SessionStart and PostToolUse, running on a schedule the team never thinks about, producing artifacts in known locations that the next session's SessionStart hook can surface as the first thing the morning's agent reads.
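A sketch of what the nightly lint can look like is below, covering three of the six checks (broken wikilinks, orphan nodes, stale `revisit_by` dates). The vault path and the report filename follow this chapter's examples; the sketch also assumes `revisit_by` values are ISO dates, so the quarter-style labels used earlier would need their own parser.

```python
#!/usr/bin/env python3
"""Nightly integrity lint: report-only, never auto-fix."""
import re
from datetime import date
from pathlib import Path

VAULT = Path("vault")
LINK = re.compile(r"\[\[([^\]|#]+)")  # matches [[target]], [[target|alias]], [[target#heading]]
REVISIT = re.compile(r"^revisit_by:\s*(\d{4}-\d{2}-\d{2})\s*$", re.MULTILINE)

def run_lint() -> list[str]:
    notes = {p.stem: p.read_text(encoding="utf-8") for p in VAULT.rglob("*.md")}
    findings, linked = [], set()
    for name, text in notes.items():
        for target in (t.strip() for t in LINK.findall(text)):
            linked.add(target)
            if target not in notes:
                findings.append(f"broken-link: [[{target}]] referenced from {name}")
        m = REVISIT.search(text)
        if m and date.fromisoformat(m.group(1)) < date.today():
            findings.append(f"stale-decision: {name} revisit_by {m.group(1)} has passed")
    for name in notes:
        if name not in linked:
            findings.append(f"orphan: {name} has no inbound links")
    return findings

if __name__ == "__main__":
    VAULT.mkdir(parents=True, exist_ok=True)
    findings = run_lint()
    report = VAULT / f"lint-report-{date.today().isoformat()}.md"
    body = "\n".join(f"- {f}" for f in findings) or "- no findings"
    report.write_text(f"# Nightly lint report ({date.today().isoformat()})\n\n{body}\n", encoding="utf-8")
    print(f"{len(findings)} findings -> {report}")
```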
Single source of truth per fact; resolve at read time
Two-way sync between every reference and its target breaks. The first time a team tries to keep "tracker ticket KAN-1234" in three documents auto-updated to its current title and status, the team spends a Saturday debugging race conditions, phantom edits, and last-writer-wins overwrites of valid concurrent work. The pattern that holds up is the inverse: every fact lives in exactly one canonical place, and everywhere else stores only the bare ID. The resolution from ID to current title-and-status happens at read time, either through a render middleware or through the agent fetching the canonical node when the reference is read.
A reference in a document is [[node_supplier_vetting_20260412]] or KAN-1234. The reference is stable. The title of the node may change, the status may flip from In Progress to Complete, the deadline may move three times, and the reference does not have to be touched. When a render pass produces the final document or an agent reads the document at task time, the reference resolves through a small middleware that looks up the current title and status of the canonical node. The output is "Supplier vetting decision (Complete, 2026-04-12)" or "Migration to new logging stack (In Progress, deadline 2026-05-15)" — generated fresh on every read, never stored in the calling document.
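A minimal sketch of that read-time resolution follows. It assumes canonical nodes live as `vault/<node_id>.md` files carrying title, status, and date in their frontmatter, and that tracker keys resolve against the nightly snapshot described next; the reference patterns and file layout are illustrative.

```python
#!/usr/bin/env python3
"""Read-time reference resolver: callers store bare IDs, rendering fetches current state."""
import re
from datetime import date
from pathlib import Path

VAULT = Path("vault")
WIKILINK = re.compile(r"\[\[(node_[a-z0-9_]+)\]\]")     # e.g. [[node_supplier_vetting_20260412]]
TRACKER_KEY = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")   # e.g. KAN-1234

def field(text: str, key: str) -> str:
    m = re.search(rf"^{key}:\s*(.+)$", text, re.MULTILINE)
    return m.group(1).strip() if m else "unknown"

def resolve_node(node_id: str) -> str:
    path = VAULT / f"{node_id}.md"
    if not path.exists():
        return f"[[{node_id}]] (unresolved)"
    text = path.read_text(encoding="utf-8")
    return f"{field(text, 'title')} ({field(text, 'status')}, {field(text, 'date')})"

def resolve_tracker(key: str) -> str:
    snapshot = VAULT / f"tracker-snapshot-{date.today().isoformat()}.md"
    if snapshot.exists():
        for line in snapshot.read_text(encoding="utf-8").splitlines():
            if key in line:
                return line.strip()  # the snapshot line already carries title and status
    return key  # fall back to the bare ID if no snapshot is available

def render(document: str) -> str:
    document = WIKILINK.sub(lambda m: resolve_node(m.group(1)), document)
    return TRACKER_KEY.sub(lambda m: resolve_tracker(m.group(1)), document)

if __name__ == "__main__":
    print(render("Blocked on [[node_supplier_vetting_20260412]] and KAN-1234."))
```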
The tracker integration is the canonical example of the pattern at small-team scale. Tasks with deadlines and SLA accountability belong in a real tracker like Linear or Jira, not in the markdown vault, because the tracker has the workflow primitives the team needs (assignment, SLA timers, audit trails for compliance, notifications). The vault has different primitives the team needs (grep-ability, AI-readability, free-form association with notes and decisions, version control). The integration that gives both is a nightly snapshot. A cron job exports the current state of the tracker to vault/tracker-snapshot-YYYY-MM-DD.md (or a per-project version of the same), the agents grep the snapshot when answering status questions, and the canonical state of any task lives in the tracker. The snapshot is treated as cache and is never edited by hand. If the snapshot disagrees with the tracker, the snapshot is wrong.
The link from filename to tracker key threads through the same enforcement layer the previous sections described. A task file's directory might be named task-2026-04-26-KAN-1234-data-pipeline-fix/ and the file's frontmatter carries tracker_key: KAN-1234. A pre-commit hook can require the tracker key to exist in frontmatter before the file is allowed to move to 4-done/, which closes one common failure mode where a workspace task ships without a corresponding ticket and silently disappears from the tracker's view.
The anti-pattern, which feels natural and produces drift within a week, is embedding the title and status of a referenced task in the calling document. The calling document says "Supplier vetting (Complete)" and the canonical node moved to status Verified two days later because a defect surfaced. The agent reads the calling document, takes the embedded "Complete" at face value, and gives a wrong answer. The fix is structural: store IDs only, resolve at read time.
This pattern also handles the founder-scale case where the workspace contains a hundred references to a small set of canonical entities (the team's customers, the team's vendors, the team's investors, the team's products). Each entity gets exactly one canonical node, every reference uses the bare ID, and a render middleware or the agent at read time produces the up-to-date title-status-date triple. The cost of the middleware is small (a few hundred lines of code, often produced in a single agent session). The cost of not having it is that within a quarter the workspace contains three slightly different titles for every long-running entity, and any agent answering a question that touches that entity has to choose between three versions and will choose wrong.
Tier the process; opt-in overhead
Process overhead has to be opt-in or it gets routed around. The over-tiered case shows up first: a team that pays the twelve-phase pipeline cost for every typo learns to skip the pipeline within a month, and the discipline goes with it. The under-tiered case shows up later, more painfully: a system-design task that needed deep research ships under the lighter Tier 2 process, and three months on, the system breaks under load. Both failures share a root cause — treating every task as if it carries the same complexity. The fix is to tier tasks by complexity and let each tier carry only the overhead it needs.
Four tiers earn their place across the workspaces that hold up at scale:
- Tier 1 — single-file change, typo, config tweak, doc fix. Just do it. No task file, no spec, no pre-mortem. The overhead is zero because the work is deterministic and small enough that getting it wrong is recoverable in seconds.
- Tier 2 — add a function, refactor a known shape, write a test for a known feature. A task file plus a brief pre-mortem covering the obvious failure modes. No spec, no design phase, no formal verification. The overhead is small because the change has known shape.
- Tier 3 — new feature, multi-file change, ambiguous requirements, customer-facing surface. The full pipeline: spec drafted and reviewed and locked, design produced and approved, pre-mortem covering the non-obvious failures, test plan written before implementation, verification artifact produced before the task moves to
done/. - Tier 4 — system design, cross-service work, irreversible action, anything where the cost of being wrong scales with the size of the customer base. The full pipeline plus mandatory deep research and a more demanding evidence bar at every gate.
The hooks enforce that Tier 3 and Tier 4 tasks carry the artifacts their tier requires before they're allowed to advance. The same hook that blocks a directory-state mismatch can read the task's tier from frontmatter and require, at the moment a task moves from 2-ready/ to 3-doing/, that the spec is Status: Approved. Tier 1 and Tier 2 tasks bypass these gates because the tier's frontmatter says they're below the gate's threshold. The tiering is not a cultural rule the team is asked to remember. It's a frontmatter field the runtime reads.
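A sketch of that gate is below. It assumes the task's own frontmatter carries a `tier` field and mirrors the spec's approval in a `spec_status` field; a real hook might instead open the linked spec file and read its Status directly.

```python
#!/usr/bin/env python3
"""Pre-commit gate: Tier 3/4 tasks need an approved spec before sitting in 3-doing/."""
import re
import subprocess
import sys
from pathlib import Path

GATED_TIERS = {"3", "4"}

def staged_markdown_files() -> list[Path]:
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACMR"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [Path(p) for p in out.splitlines() if p.endswith(".md")]

def frontmatter(path: Path, key: str) -> str:
    m = re.search(rf"^{key}:\s*(.+)$", path.read_text(encoding="utf-8"), re.MULTILINE)
    return m.group(1).strip() if m else ""

def main() -> int:
    errors = []
    for path in staged_markdown_files():
        if len(path.parts) < 4 or path.parts[2] != "3-doing":
            continue  # the gate only looks at staged task files under 3-doing/
        tier = frontmatter(path, "tier")
        if tier in GATED_TIERS and frontmatter(path, "spec_status") != "Approved":
            errors.append(f"{path}: Tier {tier} task in 3-doing/ without an approved spec")
    for err in errors:
        print(f"BLOCKED: {err}", file=sys.stderr)
    return 1 if errors else 0

if __name__ == "__main__":
    sys.exit(main())
```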
The failure mode this prevents is the silent skip. Without tiering, a team facing a Tier 3 task feels the friction of the full pipeline, decides "this once" to do it as if it were Tier 2, ships, and books the wrong cost on the company's downstream surfaces. With tiering, the question "is this Tier 3?" is asked at the moment the task is created, not at the moment shipping it would be inconvenient, and the answer is recorded as a field the hook reads.
The other failure mode is the over-tiered task. The team that classifies every task as Tier 3 because Tier 3 sounds more rigorous spends three weeks on a Tier 1 problem and learns nothing useful. A small piece of frontmatter discipline at task creation — describe the change in two sentences, choose the tier, justify in one line — pays for itself within the first month.
Phase state as artifact, not thread
The hardest part of a Tier 3 or Tier 4 task is the part most likely to drift without enforcement: keeping the team's view of which phases are complete, which are in progress, and which are blocked, accurate at every moment. The pattern that works is to give every Tier 3 and Tier 4 task a Phase State Table at the top of its file, treat the table as the canonical source of truth, and update the table before any prose anywhere else refers to a phase as done.
The table looks like this and lives near the top of every Tier 3-4 task file:
## Phase State
| # | Phase | Status | Artifact | Approved |
|---|-------------|--------------|--------------------------------|------------|
| 1 | TRIAGE | COMPLETE | (in-thread) | -- |
| 2 | CHALLENGE | SKIPPED (research already done) | (in-thread) | -- |
| 3 | CLARIFY | APPROVED | ## SPECIFICATION | 2026-04-26 |
| 4 | RESEARCH | COMPLETE | ## RESEARCH FINDINGS | -- |
| 5 | LOCK | APPROVED | SPECIFICATION Status: Locked | 2026-04-26 |
| 6 | DESIGN | APPROVED | ## DESIGN | 2026-04-26 |
| 7 | PRE-MORTEM | SKIPPED (docs-only change, no production code) | ## PRE-MORTEM | -- |
| 8 | TEST PLAN | APPROVED | ## TEST PLAN | 2026-04-26 |
| 9 | IMPLEMENT | COMPLETE | ## IMPLEMENTATION LOG | -- |
| 10| STAGING E2E | SKIPPED (tooling, no live environment) | ## STAGING E2E | -- |
| 11| VERIFY | COMPLETE | ## VERIFICATION | 2026-04-26 |
Two rules carry the discipline. The first is that skipping a phase requires a written reason in parentheses. A phase marked SKIPPED without a reason fails the pre-commit hook. The reason is short and concrete, and the team can read it back in three months when a similar task is on the table and they want to understand the precedent. Silent skips are how processes erode. Marking a skip with a written reason makes the erosion visible the moment it happens and makes the reasoning available later when a new team member asks why this task didn't have a pre-mortem.
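The check itself is small. A sketch follows, assuming Phase State Table rows are ordinary markdown table lines and that a reason always appears as a parenthetical immediately after SKIPPED.

```python
#!/usr/bin/env python3
"""Pre-commit check: every SKIPPED phase in a Phase State Table carries a reason."""
import re
import subprocess
import sys
from pathlib import Path

def staged_markdown_files() -> list[Path]:
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACMR"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [Path(p) for p in out.splitlines() if p.endswith(".md")]

def is_bare_skip(line: str) -> bool:
    # Only table rows count; a reason must follow SKIPPED as "(...)".
    return line.lstrip().startswith("|") and re.search(r"SKIPPED(?!\s*\()", line) is not None

def main() -> int:
    errors = []
    for path in staged_markdown_files():
        for lineno, line in enumerate(path.read_text(encoding="utf-8").splitlines(), 1):
            if is_bare_skip(line):
                errors.append(f"{path}:{lineno}: SKIPPED without a written reason")
    for err in errors:
        print(f"BLOCKED: {err}", file=sys.stderr)
    return 1 if errors else 0

if __name__ == "__main__":
    sys.exit(main())
```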
The second rule is that the Phase State Table updates before the user sees any "done" message. A common failure pattern is that the agent finishes the implementation, writes a paragraph saying the task is complete, and only then updates the table. By the time the table is updated, a different session may have already read the prose, taken the prose at face value, and proceeded as if the task were truly done. The fix is mechanical: every phase transition writes the table first, every "done" message references the table state second. The table is the source of truth. Thread prose is never authoritative on phase state.
The done-flow itself benefits from one more piece of structural enforcement: an intermediate Verified status that exists between In Progress and Complete, specifically to block the path where a task ships without a verification artifact.
1. Write the verification section in the task file with PASS/PARTIAL/DEVIATED + evidence per criterion
2. Flip Status: In Progress → Verified (file stays in 3-doing/)
3. git mv work/{domain}/3-doing/task-*/ work/{domain}/4-done/
4. Flip Status: Verified → Complete
The intermediate Verified state exists so the team can't skip step 1. A task can move from 3-doing/ to 4-done/ only after the file's frontmatter says Verified, and the frontmatter says Verified only after the verification section is written. A pre-commit hook that runs at step 3 reads both conditions and rejects the move when either is missing. Without the Verified state, the team can flip In Progress directly to Complete and git mv to 4-done/ in a single commit, leaving no place where the verification artifact is structurally required. With the Verified state, the artifact is structurally required because the path through the state machine forces it.
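A sketch of that gate follows, assuming the verification artifact is a `## VERIFICATION` section in the task file and that the status names match the flow above; both are conventions a workspace sets once and then enforces.

```python
#!/usr/bin/env python3
"""Pre-commit gate on the done-flow: no move to 4-done/ without verification."""
import re
import subprocess
import sys
from pathlib import Path

def staged_done_files() -> list[Path]:
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACMR"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [Path(p) for p in out.splitlines() if p.endswith(".md") and "/4-done/" in p]

def main() -> int:
    errors = []
    for path in staged_done_files():
        text = path.read_text(encoding="utf-8")
        m = re.search(r"^Status:\s*(.+)$", text, re.MULTILINE)
        status = m.group(1).strip() if m else None
        # The file may only sit in 4-done/ once it has passed through Verified.
        if status not in {"Verified", "Complete"}:
            errors.append(f"{path}: in 4-done/ but Status is {status!r}")
        if not re.search(r"^##\s*VERIFICATION\b", text, re.MULTILINE):
            errors.append(f"{path}: in 4-done/ without a ## VERIFICATION section")
    for err in errors:
        print(f"BLOCKED: {err}", file=sys.stderr)
    return 1 if errors else 0

if __name__ == "__main__":
    sys.exit(main())
```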
Cross-session sync for parallel agents
A founder running five Claude Code sessions in parallel through the day — one drafting a hire spec, one debugging a pipeline, one reviewing a contract, one summarizing yesterday's customer call, one writing this week's investor update — has a problem the previous chapter's substrate doesn't fully solve. Each session reads the workspace at its start. Each session may write to the workspace as it runs. But between starts, each session is independently making decisions, and three of the five sessions are about to re-decide the same architectural question because none of them can see the others.
The pattern that closes this gap is a lightweight watcher process that reads each session's timeline every sixty seconds, computes overlap on the nodes each session is touching, and writes a one-line nudge into a per-session inbox file when overlap exceeds a threshold. The next time a session's PostToolUse hook fires, the hook reads the session's inbox file and injects any unread nudges into the next prompt. The session sees a one-line message — "session 3 just decided X about Y" — alongside its own context, and the founder gets coordination across sessions without having to switch windows.
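A sketch of the watcher half of the pattern is below, with hypothetical locations for the per-session timeline and inbox files and a deliberately naive overlap measure (the count of shared node references). Deduplication of repeated nudges and the PostToolUse script that injects them are left out.

```python
#!/usr/bin/env python3
"""Cross-session watcher: detect overlapping work across sessions, leave nudges."""
import re
import time
from pathlib import Path

TIMELINES = Path("vault/sessions")   # hypothetical: one <session>.timeline.md per session
INBOXES = Path("vault/inboxes")      # hypothetical: one <session>.inbox.md per session
NODE_REF = re.compile(r"\[\[([a-z0-9_]+)\]\]")
THRESHOLD = 3                        # shared node references before a nudge fires

def touched_nodes() -> dict[str, set[str]]:
    if not TIMELINES.is_dir():
        return {}
    return {
        p.name.removesuffix(".timeline.md"): set(NODE_REF.findall(p.read_text(encoding="utf-8")))
        for p in TIMELINES.glob("*.timeline.md")
    }

def nudge(target: str, message: str) -> None:
    INBOXES.mkdir(parents=True, exist_ok=True)
    with (INBOXES / f"{target}.inbox.md").open("a", encoding="utf-8") as f:
        f.write(f"- {message}\n")

def scan_once() -> None:
    sessions = touched_nodes()
    for a, nodes_a in sessions.items():
        for b, nodes_b in sessions.items():
            if a >= b:
                continue
            shared = nodes_a & nodes_b
            if len(shared) >= THRESHOLD:
                nudge(a, f"session {b} is also touching: {', '.join(sorted(shared))}")
                nudge(b, f"session {a} is also touching: {', '.join(sorted(shared))}")

if __name__ == "__main__":
    while True:
        scan_once()
        time.sleep(60)   # the sixty-second cadence described in the text
```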
The hook ordering constraint is the part that decides whether this works. The watcher writes nudges into inbox files. The PostToolUse hook reads the inbox files and injects. For the hook to fire, the hook has to be registered in settings.json ahead of the long-running session, not partway through. A hook installed mid-session is invisible to that session. This is the difference between "works on day one" and "works on the sessions you opened last Tuesday and never closed." The practical consequence is that the watcher and the PostToolUse hook are part of the workspace's bootstrap, not features the founder remembers to enable later. The bootstrap script that creates a new workspace registers all four hook classes the chapter has named — PreToolUse for pre-commit, PostToolUse for cross-session sync, SessionStart for the morning lint-triage report, PreCompact for state-saving before context loss — at the moment the workspace exists and before any agent runs against it.
A worked example clarifies the value. A founder on a Wednesday morning has five sessions open. Without the cross-session sync layer, three of them re-derive the same answer to the question "should we treat the new customer's data residency requirement as in-scope for Q2?" The first arrives at "yes" based on a contract review. The second arrives at "no" based on a cost estimate. The third arrives at "yes, but only for the EU region" based on a partial reading of the legal opinion. The founder has now spent an afternoon receiving three contradictory answers to a question they only meant to ask once. With the cross-session sync layer, the second session's PostToolUse hook receives a nudge from session 1 the moment session 1 commits its decision. The nudge says "session 1 just concluded data-residency = in-scope for Q2 based on contract review of Acme acquisition." The second session reads the nudge, asks the founder whether to take session 1's answer or override, and the founder spends thirty seconds on coordination instead of an afternoon on contradiction.
The cross-session sync layer is the most niche piece of the chapter and the one most likely to be skipped at solo-founder scale. Below the threshold of three concurrent sessions or two team members on the same workspace, the layer is overhead the founder doesn't yet need. Once the threshold is crossed, the layer pays for its setup cost in the first week.
How enforcement layers fail
Four failure modes recur across teams that ship the workspace from the previous chapter and do not ship the enforcement layer this chapter describes. Each one looks like a process problem and is actually a structural problem. The fix in every case is to convert a sentence in a doc into a hook the runtime executes.
Process lives in instructions, not hooks. The team relies on remembering. The rule that "every commit to 4-done/ must have a verification artifact" lives in CLAUDE.md and is enforced only by the model and the team's good intentions. Memory fails by week six. The fix is to convert every "must do X before Y" rule into a hook that blocks the commit or refuses the action. A useful exercise is to grep CLAUDE.md for must-do rules and ask, for each rule, "what runtime gate enforces this?" Every rule without an answer is a rule that will erode within a quarter.
Two-way sync between references and targets. The team starts maintaining the title and status of every referenced task in every document that mentions it. The first week of two-way sync feels organized. By the second week, race conditions appear; by the third, phantom edits and last-writer-wins overwrites are eating valid work. The fix is to store IDs only in callers and resolve at read time. The cost of the resolution layer is a few hundred lines of middleware. The cost of two-way sync is the workspace.
Silent skips. A phase is skipped without a written reason. The skip is invisible the day it happens, and three months later the team's process has eroded in ways nobody can fully reconstruct. The fix is a hook on the Phase State Table that requires SKIPPED to be followed by a parenthetical reason. A skip without a reason fails the commit. The team learns to write the reason at the moment of skipping, which has the side effect of making the team think briefly about whether the skip is justified.
No integrity layer running over the workspace. The nightly sync runs, content flows in, and nothing notices when a hundred broken [[wikilinks]] accumulate, three competing canonical answers exist for the same question, or a decision marked revisit_by: 2026-Q1 is six months stale. The diagnostic is brutal and fast: pick a random decision node from the workspace, check its revisit_by field, check its inbound links, check whether a near-duplicate exists. If two of three are wrong, the workspace has been rotting for a quarter. The fix is the nightly lint plus a triager who closes the previous day's report before opening the day's first session.
Run this week — six tasks to lay the enforcement layer
A six-item time-boxed checklist for the founder or team lead who wants the enforcement layer running by the end of the week.
- Wire the directory-vs-state pre-commit hook (2-3 hours). Install the `pre-commit` framework if it isn't already on the workspace. Add a hook that reads the staged files, parses the directory's third path segment as the stage, parses the file's frontmatter `Status` field, and rejects the commit when the two disagree. Test by deliberately editing a frontmatter field without doing the matching `git mv` and confirm the commit is rejected. Output: hook installed in `.pre-commit-config.yaml` and committed; one deliberately-broken commit rejected as proof.
- Tier the last 30 things the team treated as a task (1 hour). List the last 30 task files. Classify each Tier 1 / Tier 2 / Tier 3 / Tier 4. Write a one-page rule for what counts as Tier 3 and Tier 4 in the team's specific domain — what shape of customer impact, what kind of cross-service surface, what magnitude of cost-of-being-wrong. The rule will read disappointingly mechanical and that is the point. Output: 30-row sheet with tier per row, plus a one-page rule for the team's tier definitions.
- Add the Phase State Table to one in-flight Tier 3-4 task (1 hour). Pick a Tier 3 or Tier 4 task currently in progress. Add the Phase State Table to the top of its file. Fill in every phase's current state honestly, marking skipped phases with a parenthetical reason. The table will surface at least one phase that was skipped without anyone realizing it. Output: one Tier 3-4 task running with a Phase State Table at the top, and a one-paragraph note on the skip-reasons that surfaced during the exercise.
- Wire the integrity lint and assign a triager (90 minutes). A nightly cron-driven script that runs the six checks (broken [[wikilinks]], orphan nodes, frontmatter violations, stale `revisit_by` decisions, near-duplicates, contradictions between decision-nodes) and writes a morning report to `vault/lint-report-YYYY-MM-DD.md`. The same person who owns the master skill from the previous chapter owns the triage. Output: cron entry committed, named triager, Day-1 morning report committed showing the workspace's current debt.
- Snapshot the tracker into the vault (1 hour). A nightly cron job that exports the current state of the team's tracker (Linear, Jira, or whatever the team uses) to `vault/tracker-snapshot-YYYY-MM-DD.md`. Update the team's `CLAUDE.md` to tell agents to grep the snapshot when answering status questions, never to query the tracker API directly. Output: cron entry committed, snapshot committed in the repo, agent prompt updated.
- Write the skip-with-reason rule into the pre-commit hook (30 minutes). Extend the pre-commit hook from task 1 with a check on Phase State Tables: any phase status `SKIPPED` without a parenthetical reason fails the commit. Test by deliberately writing `SKIPPED` without a reason and confirm the commit is rejected. Output: extended hook committed, one deliberately-blank skip rejected as proof.
For solo founders and small teams (1-5 people)
Skip the cross-session watcher and the four-tier system on day one. Wire two hooks immediately — directory-vs-state agreement and skip-with-reason — and the nightly lint. The compounding starts at the smallest scale because the cost of installing all four hook classes at solo-founder size is two to three hours total, and the cost of not installing them is the same workspace decay that hits larger teams later. The pattern to watch for is the moment the founder starts running more than three concurrent sessions or hires the first full-time team member; both events trigger the cross-session watcher and the tier-the-process work, and at neither moment is the founder going to want to do the work on top of the new operational load.
For team leads introducing AI to a 5-50 person team (chapter's center of gravity)
All four hook classes earn their place. The Phase State Table discipline starts at this scale because senior team members otherwise re-do work that already shipped in someone else's window the previous week. The cross-session watcher matters because parallel sessions per founder commonly hit five or more. The tier-the-process work at this scale needs an explicit launch — a thirty-minute meeting where the team commits to the four-tier definition for their domain, with concrete examples of what counts as Tier 3 and Tier 4 in the team's specific work. Without the explicit commitment, the tiers exist on paper and the team treats every task as Tier 2 because Tier 2 is cheaper to start. The hook that reads the tier field and gates phase transitions is what makes the explicit commitment stick.
For enterprise IT at scale (500+ people)
The same hook classes apply, scaled to a department-by-department rollout rather than a company-wide push on Day 1. Pre-commit hooks are an existing engineering pattern at this scale, so the inside-IT argument is "this is the same CI discipline applied to the knowledge artifacts the agents read, with the same rationale that solved test discipline a generation ago." File data-access approvals at the start of each department's pilot, not when the workspace is ready to launch, because approvals at enterprise scale run on weeks and a workspace that cannot read the team's actual systems is a demo that ages out before it gets used. The integrity lint runs per-department with a per-department triager; the company-wide aggregation is a quarterly review, not a daily one. The cross-session watcher is the highest-value layer for senior operators running parallel agents across multiple high-stakes workflows, and the rollout starts there rather than at the leaf-team level.
The discipline this chapter describes is small per-day cost and large compounding return. Ten to thirty minutes a morning to triage the lint queue. A few hours of one-time setup to register the four hook classes. A five-line YAML file to enforce skip-with-reason. The compounding return is that the workspace at month twelve is still a single source of truth rather than three forks of competing canonical answers, the founder's parallel sessions don't produce contradictions on the questions they share, and the team's processes erode visibly in lint reports rather than invisibly in slow drift. Chapter 4.4 picks up there: the Process Audit is the one-to-two-day discovery sprint that turns this hardened workspace into a scored map of what to automate first, with explicit Build / Buy / Configure / Skip decisions for the candidates the audit surfaces.