The Playbook
The Process Audit
Organizations automate the wrong processes because they audit what is documented rather than what is real. A 1-2 day discovery sprint surfaces actual workflows, shadow processes, and decision bottlenecks through structured interviews, then converges on a scored initiative map that becomes the input to the first pipeline.
The documented procedures capture the novice version of the work. The expert version lives in people's heads and informal channels: undocumented workarounds filling tool gaps, data that exists somewhere but is inaccessible to the people who need it, and work that queues behind one person's judgment. A team that scores its automation candidates from the documented procedures will pick processes that are tidy on the page and irrelevant in practice. The 1-to-2-day discovery sprint described in this chapter surfaces the three categories of real work, scores them on impact and feasibility, and converges on a single deliverable: an initiative map that names the first pipeline target and routes it into Chapter 4.5.
The discovery sprint runs in one to two days, not one to two months
The sprint produces three artifacts: a CEO synthesis page, a set of stakeholder synthesis pages, and a working list of shadow processes / workarounds / dead zones that the audit will score. The pacing is tight on purpose. Teams that stretch the discovery sprint into a multi-week project end up scoring their initial hypothesis instead of what the interviews actually surfaced.
The CEO interview. Sixty to ninety minutes, recorded with consent, transcribed afterward. Five prompts reliably pull out priorities, readiness signals, and champion candidates:
- "What would your company do differently if every knowledge-work step cost five cents?" — strategic framing. The CEO answers from the position of unlimited execution capacity, which surfaces the strategic ambitions that current operating cost suppresses.
- "Which three functions have the highest information-processing load?" — targeting. The answer narrows the audit to the functions where automation will have visible effect on the next quarter's reporting.
- "Where do you personally waste time every week?" — the CEO's own pain list, which is often the highest-leverage start because the CEO has unique authority to remove the obstacles automation surfaces.
- "Who adopted a new tool without being told to?" — identifies the AI Champion. The right person to lead the first pipeline is rarely the most senior person in the function. It is the person who has already started experimenting on their own.
- "Which manual process, if fixed today, would change your quarter?" — pulls out the unambiguous Quick Win. The CEO's answer is usually the candidate to put at the top of the initiative map regardless of where the scoring lands.
Stakeholder interviews. Forty-five minutes each, four to five people across the functions the CEO named, plus one outsider for calibration. Four prompts surface the three categories of real work the documented procedures hide:
- "Walk me through what actually happens when [trigger event] occurs." — surfaces the shadow process. The "actually" matters. Stakeholders default to describing the documented version; the prompt forces them to describe what they did this morning.
- "What do you do when the standard process doesn't work?" — surfaces the workaround. Most workarounds have been running so long that the people who built them no longer think of them as workarounds; they have to be prompted to remember.
- "Where do you go to get information that isn't in the official system?" — surfaces the dead zone. Stakeholders typically name a specific shared drive, a senior colleague's inbox, or a personal spreadsheet that holds knowledge nobody else can access.
- "What would you cut from your job if you could?" — surfaces low-hanging fruit the stakeholder themselves considers wasteful. The candidates from this answer have a built-in advocate and tend to clear the political hurdles fastest.
Pattern convergence after roughly five interviews signals the process map is complete enough to score. The sixth and seventh interviews repeat themes the first five already surfaced; the marginal signal drops sharply. Teams that book ten interviews on the assumption that more is more end up with the same map and a longer schedule.
Hygiene gaps surface first. The audit nearly always finds measurable hygiene gaps before it finds anything strategic: a share of warm inbound leads that never get a response, invoices coded late and missing the discount window, support tickets queued behind one person's review for days. These are the reliable Quick Wins because they are high-frequency, low-risk, easily measured, and visible on the next ROI report. Practitioners tire of these candidates quickly because they feel unglamorous, and skipping them is the most common mistake in the second wave of the audit. The hygiene gaps are usually the right place to start because they prove the audit produces results before the more ambitious initiatives need political capital.
The process map captures real workflow, not the org chart
Every candidate process the audit surfaces gets entered into a standardized worksheet so the scoring step has comparable inputs.
The columns: internal name (what the team calls it, not what the documentation calls it), department, trigger event (what makes the process start), inputs (sources and formats), actual steps (not the documented ones), decision points (where humans currently apply judgment), outputs, tools involved, handoffs between people and systems, owner (who is accountable today), frequency (per week or per month), and time per instance. The worksheet is a single shared spreadsheet with one row per process. The audit team fills it during and immediately after each interview while the conversation is fresh.
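The worksheet shape is easiest to keep consistent if it is pinned down as a schema. A minimal sketch, assuming the shared spreadsheet is a CSV file named process_map.csv; the field names and the example row are illustrative, not prescribed.

```python
from dataclasses import dataclass, asdict
import csv
import os

@dataclass
class ProcessCandidate:
    internal_name: str          # what the team calls it, not what the documentation calls it
    department: str
    trigger_event: str          # what makes the process start
    inputs: str                 # sources and formats
    actual_steps: str           # the real steps, not the documented ones
    decision_points: str        # where humans currently apply judgment
    outputs: str
    tools: str
    handoffs: str               # between people and systems
    owner: str                  # who is accountable today
    frequency_per_week: float
    minutes_per_instance: float

row = ProcessCandidate(
    internal_name="the invoice chase",
    department="Finance",
    trigger_event="supplier invoice lands in the AP inbox",
    inputs="PDF invoice, PO record, goods-receipt note",
    actual_steps="match PO, chase missing receipt, code to GL, queue for approval",
    decision_points="mismatch above tolerance escalates to the controller",
    outputs="coded, approved invoice in the ERP",
    tools="email, ERP, shared spreadsheet",
    handoffs="AP clerk -> controller -> ERP",
    owner="AP clerk",
    frequency_per_week=40,
    minutes_per_instance=30,
)

# Append the row to the single shared worksheet, writing the header only once.
write_header = not os.path.exists("process_map.csv")
with open("process_map.csv", "a", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(asdict(row).keys()))
    if write_header:
        writer.writeheader()
    writer.writerow(asdict(row))
```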
Data inventory as method. Alongside the process map, the audit team runs a half-day data inventory. The team walks each function with its owner, marks every system the function touches (chat, mail, CRM, ticketing, code, data warehouse, internal tools), classifies each system as API-accessible, manual-export, or locked, and flags any system that will require IT data-access approval. The enterprise critical-path note matters: data-access approvals at most large organizations take two to four weeks. File them on Day 1 of the audit, not Day 1 of the pipeline build, because the pipeline cannot start without the data and the approval queue is single-threaded.
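A minimal sketch of the inventory table, assuming the three access classifications above and a single flag for systems that need IT data-access approval; the system names and example rows are illustrative.

```python
from dataclasses import dataclass

@dataclass
class SystemRecord:
    name: str
    function: str
    access: str              # "api", "manual_export", or "locked"
    needs_it_approval: bool  # file these requests on Day 1 of the audit

inventory = [
    SystemRecord("CRM", "Sales", "api", False),
    SystemRecord("Ticketing", "Support", "api", False),
    SystemRecord("ERP", "Finance", "manual_export", True),
    SystemRecord("Legacy contracts drive", "Legal", "locked", True),
]

# The approval queue is the critical path: list what to file today.
to_file_today = [s.name for s in inventory if s.needs_it_approval]
print("File IT data-access requests for:", ", ".join(to_file_today))
```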
Four exclusion criteria filter non-candidates before scoring. Audit credibility comes from what it excludes as much as from what it includes. A scored candidate that should have been excluded earlier wastes scoring effort and signals to the team that the audit is loose.
- Physical. The process requires hands or physical presence (loading a delivery truck, conducting a clinical exam, signing a paper document). Out of scope for software automation; flag for robotics or human-augmentation tracks instead.
- Low-frequency. The process runs less than once a week (year-end planning, occasional regulatory filings). The cost of building reliable automation rarely beats the cost of doing the work manually a few times a year. Flag for revisit at the next audit cycle.
- Novel-judgment. Every instance is unique with no recurring pattern (hiring a VP, negotiating a strategic partnership, drafting a one-of-a-kind contract). The agent has nothing to learn from history because there is no history to learn from. Out of scope until the underlying decision becomes more pattern-shaped.
- Regulatory-restricted. Law or contract requires human execution (a court appearance, a notarized signature, a regulator-required individual sign-off). Out of scope regardless of feasibility.
A short worked example sharpens the criteria. Included on the first pass: invoice three-way match (frequent, structured, low-risk), inbound-lead triage (clear inputs, measurable error rate), support-ticket routing (high-volume, pattern-based). Excluded: hiring a VP (novel-judgment plus relational), court appearance (regulatory-restricted), once-a-year board prep (low-frequency plus high-stakes; usually borderline and left out of the first pass).
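The four criteria reduce to a mechanical pre-filter. A minimal sketch, assuming each candidate carries flags set during the interviews; the field names, frequency threshold, and example candidates are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    physical: bool               # requires hands or physical presence
    runs_per_week: float         # low-frequency if below roughly once a week
    novel_judgment: bool         # every instance unique, no recurring pattern
    regulatory_restricted: bool  # law or contract requires human execution

def exclusion_reason(c: Candidate) -> str | None:
    """Return why a candidate is excluded from scoring, or None if it survives."""
    if c.physical:
        return "physical"
    if c.runs_per_week < 1:
        return "low-frequency"
    if c.novel_judgment:
        return "novel-judgment"
    if c.regulatory_restricted:
        return "regulatory-restricted"
    return None

candidates = [
    Candidate("invoice three-way match", False, 40, False, False),
    Candidate("hiring a VP", False, 0.05, True, False),
    Candidate("court appearance", False, 0.1, False, True),
]

for c in candidates:
    reason = exclusion_reason(c)
    print(c.name, "->", "score it" if reason is None else f"excluded ({reason})")
```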
The scoring framework converts process candidates into a ranked map
Each surviving candidate gets two scores: a raw Impact Score and a feasibility-weighted Final Score. The math is intentionally simple so the audit team can apply it consistently across a dozen candidates in an afternoon.
Impact Score = Frequency × Time per instance × Error rate × Scalability. Each factor is rated 1 to 5, so the raw range runs from 1 to 625. A worked example for invoice three-way match: Frequency 4 (daily), Time 3 (30 minutes average per instance), Error 3 (15% manual error rate), Scalability 5 (volume grows with revenue). Impact = 4 × 3 × 3 × 5 = 180. The Impact Score answers a single question: how much pain does this process create per unit time, weighted by how that pain scales as the business grows.
Three feasibility weights refine raw impact. Each weight runs 0.0 to 1.0 and gets applied multiplicatively.
- Data Readiness. Is the data the agent needs programmatically accessible today, or behind manual exports, or behind locked UIs? A process scoring 1.0 has all required data in API-accessible systems with stable schemas. A process scoring 0.3 needs a human to manually export a CSV before each run.
- Risk Tolerance. What is the blast radius of a wrong answer? A process scoring 1.0 generates reversible drafts a human signs off on. A process scoring 0.2 makes irreversible commitments (sending money, publishing public statements, changing customer records).
- Measurability. Can success be defined with a number? A process scoring 1.0 has a clear ground truth (correct invoice match versus incorrect). A process scoring 0.4 produces outputs that require subjective human judgment to evaluate.
Final Score = Impact × Data Readiness × Risk Tolerance × Measurability. Continuing the invoice example: 180 × 0.8 × 0.5 × 1.0 = 72. The factor that pulled the Final Score down was Risk Tolerance, because a wrong invoice match has real financial consequences. The team can either accept the lower score (and rank invoice match accordingly) or invest in the safety harness that lifts Risk Tolerance to 0.9, which would push the Final Score to roughly 130 and reorder the priority list.
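The arithmetic is simple enough to run as a sanity check on the scoring sheet. A minimal sketch reproducing the invoice example above; the function names are illustrative.

```python
def impact_score(frequency: int, time_per_instance: int, error_rate: int, scalability: int) -> int:
    """Raw impact: each factor rated 1-5, so the product runs 1-625."""
    return frequency * time_per_instance * error_rate * scalability

def final_score(impact: float, data_readiness: float, risk_tolerance: float, measurability: float) -> float:
    """Feasibility-weighted score: each weight runs 0.0-1.0, applied multiplicatively."""
    return impact * data_readiness * risk_tolerance * measurability

# Invoice three-way match, as worked above.
impact = impact_score(frequency=4, time_per_instance=3, error_rate=3, scalability=5)
print(impact)  # 180

print(round(final_score(impact, data_readiness=0.8, risk_tolerance=0.5, measurability=1.0), 1))  # 72.0

# With a safety harness lifting Risk Tolerance to 0.9:
print(round(final_score(impact, data_readiness=0.8, risk_tolerance=0.9, measurability=1.0), 1))  # 129.6
```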
Quantify everything in money. The two categories that survive the scoring step are cost reduction (this process becomes cheaper) and capability opening (this enables revenue or scale that was previously impossible). Both are measurable in dollars. The selection rule is to choose by maximum absolute impact, not percentage improvement. A 100% effect on three people is smaller than a 20% effect on fifty people, and the audit's job is to surface the larger absolute number even when the percentage looks less dramatic. Practitioners default to the impressive percentage because it photographs better in the deck; the absolute number is what shows up in the P&L next quarter.
Decision velocity is itself a readiness signal. Track time to action, override rate, governance coverage, and ROI per decision stream for each candidate process. The Bezos two-way-doors heuristic from the 2016 Amazon shareholder letter is the usable shorthand: irreversible decisions deserve full deliberation, and reversible decisions should be made with "somewhere around 70% of the information you wish you had" rather than 90%. Recent evidence on AI-assisted decisions sharpens the claim. A controlled study published in the ACM CHI 2025 Proceedings (April 2025) found that designs reducing cognitive friction between agent output and human decision can increase accuracy and reduce interaction rounds simultaneously. Speed and quality come apart only when the curation layer between agent and human is missing or sloppy; a well-designed curation layer keeps both rising together. Field contexts surface a caveat worth pricing into the audit: speed gains can coexist with domain-specific quality trade-offs across modalities, so the scoring weights have to reflect the quality metrics that actually matter for the workflow, not a generic "accuracy" proxy. High process velocity signals readiness for agents. Low velocity signals a decision process that needs redesign before automation, because automating slow ungoverned decisions produces fast ungoverned bad decisions.
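A minimal sketch of the four velocity signals computed per decision stream, assuming each logged decision carries timestamps, an override flag, a governance flag, and a dollar value; the log structure and example entries are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Decision:
    raised_at: datetime
    acted_at: datetime
    overridden: bool   # a human reversed the agent's recommendation
    governed: bool     # covered by an explicit policy or approval rule
    value_usd: float

def velocity_signals(log: list[Decision]) -> dict[str, float]:
    """Compute the four readiness signals for one decision stream."""
    n = len(log)
    return {
        "avg_hours_to_action": sum((d.acted_at - d.raised_at).total_seconds() for d in log) / n / 3600,
        "override_rate": sum(d.overridden for d in log) / n,
        "governance_coverage": sum(d.governed for d in log) / n,
        "roi_per_decision_usd": sum(d.value_usd for d in log) / n,
    }

log = [
    Decision(datetime(2025, 3, 3, 9), datetime(2025, 3, 3, 11), False, True, 420.0),
    Decision(datetime(2025, 3, 4, 9), datetime(2025, 3, 5, 16), True, True, 180.0),
]
print(velocity_signals(log))
```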
Build, Buy, Configure, or Skip
Four classifications cover every process on the initiative map. The choice is driven by the three feasibility weights plus strategic value, not by whether the team has the engineering capacity to write the code.
- Build. Proprietary logic, no vendor has it, encoded judgment becomes a durable moat. The team invests Stage 1 skill-design effort because the resulting skill is the asset that compounds. Genuine Build candidates are rare on a first pass; the paragraph after this list covers the failure mode of over-classifying them.
- Buy. Commodity workflow, mature vendor, low lock-in risk, outcome-aligned pricing. The team wires the vendor's product into the shared workspace and treats the integration as the engineering work; the workflow itself is not where the team spends its scarce skill-design budget.
- Configure. Existing SaaS with good API or MCP access. The team keeps the platform and customizes via skill files on the agentic-first stack. Most brownfield candidates land here, and Configure is typically the audit's most frequent classification, because the team already has the systems of record and the work is to write the skills that read and write to them.
- Skip. The process will be absorbed by a planned platform change within twelve months, or the blast radius of a wrong answer makes full automation irrational at the team's current error-cost. The classification is "monitor and revisit at the next audit cycle," not "give up forever."
The "everything must be Build" classification is the failure mode of an internal team mis-identifying commodity work as strategic. A Stage 1 team of ten typically has at most one or two Build items in its first initiative map. Most rows are Configure or Skip. Audits that produce a list of fifteen Build candidates are usually scoring the team's enthusiasm rather than the work's strategic value, and the resulting backlog rarely ships on schedule.
The initiative map is the chapter's deliverable
The audit converges on a single spreadsheet: the initiative map. Eleven columns, one row per candidate process, sorted by Final Score. The columns: Process Name, Department, Current Cost (annualized), AI Approach (Build / Buy / Configure / Skip), Impact Score, Final Score, Complexity (low / medium / high), Data Readiness (the 0.0-1.0 weight from scoring), Owner (who carries the work), Timeline (target ship date for the pilot), Classification (Quick Win or Strategic Initiative).
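A minimal sketch of the map as data, assuming a simple rule that splits Quick Wins from Strategic Initiatives on complexity and timeline; the rows, scores, and four-week threshold are illustrative.

```python
# Three candidate rows with a subset of the eleven columns.
rows = [
    {"process": "Inbound-lead triage", "department": "Sales", "final_score": 96,
     "approach": "Configure", "complexity": "low", "timeline_weeks": 3},
    {"process": "Invoice three-way match", "department": "Finance", "final_score": 72,
     "approach": "Build", "complexity": "medium", "timeline_weeks": 16},
    {"process": "Support-ticket routing", "department": "Support", "final_score": 60,
     "approach": "Configure", "complexity": "low", "timeline_weeks": 2},
]

# Classify each row, then sort the map by Final Score.
for row in rows:
    row["classification"] = (
        "Quick Win" if row["complexity"] == "low" and row["timeline_weeks"] <= 4
        else "Strategic Initiative"
    )

initiative_map = sorted(rows, key=lambda r: r["final_score"], reverse=True)
for row in initiative_map:
    print(f'{row["final_score"]:>3}  {row["classification"]:<20} {row["process"]}')
```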
A typical mid-market audit produces eight to fifteen rows on the first pass. An enterprise-wide first audit at a large organization typically produces thirty or more, often spanning multiple departments where the audit team has run the discovery sprint in parallel.
Two tiers, sequenced deliberately. Quick Wins ship in one to four weeks against minimal infrastructure and produce proof-of-concept momentum the team needs to fund the next round. Strategic Initiatives ship in three to six months and require substrate investment and cross-department coordination. Sequence matters because the political capital that funds Strategic Initiatives comes from the trust the Quick Wins built. Teams that lead with the Strategic Initiative usually run out of credibility before the longer build can demonstrate results.
First-pipeline selection. Three criteria converge to the right starting target: clear inputs, deterministic outputs, and a measurable error rate. Exception resolution in finance or operations outperforms open-ended creative tasks as a first target because the success signal is a single number that moves visibly. The champion identified during the CEO interview demonstrates to peers by leading with a working artifact rather than running a workshop, which means the audit team picks the candidate where the demo is fastest to assemble. The top-ranked Build candidate from the initiative map becomes the first pipeline. Chapter 4.5 picks up the build cadence from there: how to construct the stack small enough to hold in the team's head, how to design the eval before writing the code, and how to deploy the pipeline in shadow mode against the human baseline before it earns the right to run at full volume.
Run this week — six tasks to deliver an initiative map
A task-by-task protocol for the team that wants the audit completed and a first-pipeline target named by Friday.
- Schedule and run the CEO interview (90 minutes plus prep). Send the five prompts ahead of time so the CEO can pre-think; book a single 60-90 minute block; record (with consent) and transcribe immediately. Output: a written CEO synthesis page naming top three function targets, highest personal-time-waste candidates, and the named champion.
- Schedule four-to-five stakeholder interviews (half day to schedule, half day to run). Pick interviewees across functions named by the CEO plus one outsider for calibration. Send the four prompts ahead. Stop scheduling beyond five unless pattern convergence has not arrived. Output: one synthesis page per stakeholder; running list of shadow processes, workarounds, and dead zones.
- Walk three functions for the data inventory (half day). With each function's owner, mark every system the function touches; classify API-accessible, manual-export, or locked; flag IT-approval critical-path items. Output: a data-inventory table with one row per system plus owner plus access classification.
- Score the top eight processes (half day). Pull from the synthesis pages. Apply Impact Score (Frequency × Time × Error rate × Scalability) and the three feasibility weights (Data Readiness × Risk Tolerance × Measurability). Rank by Final Score. Output: an eight-row scoring sheet with raw Impact, all three weights, and Final Score.
- Classify Build / Buy / Configure / Skip and assign owners (2 hours). For each scored row, decide the classification driven by feasibility weights plus strategic value. Name an owner per row, the person who will actually carry the work. Most rows land in Configure or Skip; Build is rare. Output: the classified scoring sheet with named owners.
- Deliver the initiative map to the CEO (1 hour plus political work). Single spreadsheet with the eleven-column shape (Process Name, Department, Current Cost, AI Approach, Impact Score, Final Score, Complexity, Data Readiness, Owner, Timeline, Classification). Top-ranked Build candidate flagged as the first-pipeline target for Chapter 4.5. CEO confirms ownership and timeline; champion gets the first-pipeline assignment. Output: signed-off initiative map; champion has a start date for the first pipeline.
For startups (5-50 people)
Run the audit as a half day rather than two full days. The founder interviews themselves and three to four early hires (engineering, sales or growth, operations or finance) instead of a four-to-five-person stakeholder set. Eight to twelve candidate processes scored in one sitting, which is enough at this scale because the founder already knows where the gaps are. The biggest time trap at startup scale is trying to score every process on the org chart instead of stopping at pattern convergence after the top five to eight. Stop early; the audit's purpose is to pick the first pipeline, not to map the whole company.
For enterprise (500+ people)
Plan two to three weeks per department. File IT data-access requests on Day 1 of the audit, not when the first pipeline is ready to build. Approvals are the critical path on Month 2 of the scaling arc, and the audit team that waits until the engineering work is ready to start usually loses three to four weeks waiting for the access tickets to clear. Run the first department's audit with a central Pirate-plus-Architect team that has done one before; hand the audit template, scoring rubric, and Build / Buy / Configure / Skip classification guide to the next cohort so the second department can run its own audit faster than the first. Approval-queue parallelization is the optimization that compounds: file access requests for departments two and three during department one's audit, not after.