A practical tutorial for producing Product Requirements Documents (PRDs) for large-scale web projects when AI coding agents (Claude Code, Cursor, Copilot, Codex) are part of the team.
Table of Contents
- Why PRDs Are Different Now
- Seven Best Practices
- The Sections of a Good PRD
- AI-Specific Patterns
- Lifecycle: Writing, Reviewing, Maintaining
- Pre-Approval Checklist
- Templates
- Common Mistakes
1. Why PRDs Are Different Now
A traditional PRD is a human-to-human document. It assumes shared context — the reader knows the codebase, the team, the history. Gaps are filled by hallway conversations.
An AI-era PRD is read by two audiences simultaneously:
- Humans need why: motivation, tradeoffs, business context, history.
- AI agents need what: precise, unambiguous, machine-parseable specifications they can ground their code generation in.
The same paragraph that reads as “obvious” to a senior engineer can produce three subtly wrong implementations from an AI agent. AI tools fabricate when context is missing — they pattern-match from training data instead of project specifics. A vague PRD produces vague code.
This shifts what “good” looks like:
| Traditional PRD | AI-era PRD |
|---|---|
| Prose-heavy | Mix of prose + structured artifacts |
| “Should be fast” | “p95 ≤ 250ms, verified by pnpm test:perf” |
| Revision history | Decision log with rationale |
| Stored in Confluence | Stored in repo alongside code |
| Updated when remembered | Updated as part of every PR |
| Goals, non-goals | Goals, non-goals, anti-requirements |
| Implicit conventions | Explicit “AI operating manual” |
2. Seven Best Practices
BP1. Write for two readers simultaneously
Humans need narrative; AI needs structure. Mix both.
- Front-load what as machine-parseable artifacts (tables, code blocks, type definitions, ID’d requirements). Back-load why as prose.
- Use RFC 2119 keywords consistently: MUST, SHOULD, MAY, MUST NOT. AI tools weight these heavily.
- Distinguish normative content (binding) from informative content (illustrative). Mark them.
- No pronouns across sections. “It must validate the token” is fine inside one bullet but breaks when the AI loads only that section. Repeat the noun.
Add a legend at the top of every PRD:
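One possible legend (wording is illustrative):

```markdown
> **Legend.** MUST / SHOULD / MAY / MUST NOT are used per RFC 2119.
> Sections marked **Normative** are binding; **Informative** sections
> provide context only.
```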
BP2. Make every requirement executable
A non-executable requirement is a wish. An executable requirement has a verification command or test.
Bad:
The system should be fast.
Better:
P95 latency of `/api/orders` ≤ 600ms.
Good:
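A sketch of the full form (the ID, command, and rationale are illustrative):

```markdown
REQ-PERF-001 (MUST): P95 latency of `GET /api/orders` ≤ 600ms at sustained 100 RPS.
Verify: `pnpm test:perf --filter orders` green in CI.
Why: the order list blocks first paint of the account page; latency above
600ms measurably increases drop-off.
```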
Three things in one block: what, how to verify, why. AI agents now have a complete grounding triple.
BP3. Layered specificity (zoom levels)
Long PRDs fail because readers lose the thread. Structure so any reader can extract the right layer:
| Layer | Length | Purpose | Audience |
|---|---|---|---|
| L0 — TL;DR | 1 paragraph | What and why, in one breath | Execs, AI system prompt |
| L1 — Goals/non-goals | 1 page | Scope boundary | All readers |
| L2 — User journeys | 2–5 pages | Behavior | PM, QA, designers |
| L3 — Contracts | 5–20 pages | Schemas, endpoints, state machines | Engineers, AI agents |
| L4 — Operations | 2–5 pages | Rollout, rollback, monitoring | SRE, on-call |
| L5 — Appendix | unbounded | Decision log, glossary, references | Future readers |
The L0 TL;DR is the single most valuable section. AI tools with limited context budgets pin it as the snippet they remember. Write it last, after the rest of the doc has clarified your thinking.
BP4. Co-locate with code; make AI find it
The PRD is worthless if the AI agent doesn’t load it. Three mechanisms:
Path conventions AI tools index by default:
- `/docs/prd/*.md`
- `/specs/*.md`
- `/.cursor/rules/*.mdc` (Cursor)
- `CLAUDE.md` references (Claude Code)
Bidirectional linking. PRDs reference CLAUDE.md; CLAUDE.md references active PRDs. Example CLAUDE.md snippet:
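A hypothetical snippet (the PRD paths and names are placeholders):

```markdown
## Active PRDs
- docs/prd/checkout-v2.md (checkout rewrite, In-Flight)
- docs/prd/auth-session.md (session redesign, Approved)

Before implementing any REQ-* ID, read the PRD that defines it.
```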
Stable section anchors. When the AI says “see PRD §7.4”, that should still resolve in 6 months. Use ID-stable headings, not narrative ones (`## 7.4 Auth & session`, not `## Auth (after the recent rewrite)`).
BP5. Decision log over revision history
Every “we chose X over Y” must be capturable, searchable, and dated. Without this, AI agents will happily undo your decisions because the prose framing was lost.
Replace “v1.2 — updated by Alice” with ADRs (Architecture Decision Records). One file per decision, in `docs/adr/`:
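An illustrative ADR (the decision itself is invented for the example):

```markdown
# ADR-0012: Signed cookies instead of localStorage for session tokens

Date: 2025-01-15
Status: Accepted

## Context
Tokens in localStorage are readable by any XSS payload (risk R4).

## Decision
Store session tokens in HttpOnly, Secure, SameSite=Lax cookies.

## Consequences
CSRF protection becomes mandatory; SSR can read the session directly.
```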
BP6. Define non-goals AND anti-requirements
Non-goal = “we will not do X (for now)”. Defers scope. Anti-requirement = “we will not do X, ever, even if it seems obvious to add”. Preempts AI over-engineering.
Anti-requirements are gold for AI agents because they catch the most common over-delivery patterns:
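For example (the specifics will differ per project):

```markdown
- AR1: MUST NOT introduce an ORM; use the existing query layer.
- AR2: MUST NOT add a caching layer in this project, even where it looks easy.
- AR3: MUST NOT add new npm dependencies without an ADR.
- AR4: MUST NOT refactor unrelated code touched in passing.
```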
BP7. Slice vertically, not horizontally
A horizontal slice (“build all the API routes, then all the UI”) leaves nothing demonstrable until the end. A vertical slice (“login screen + login route handler + middleware decode”) works end-to-end immediately.
Each milestone should:
- Ship a thin, end-to-end working feature
- Be independently demoable in 2 minutes
- Be independently rollback-able
- Be implementable by an AI agent in one or two focused sessions
Add a demo script per milestone — it doubles as an AI verification prompt:
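A sketch (commands and credentials are placeholders):

```markdown
## M1 demo (target: under 2 minutes)
1. `pnpm dev`, app starts on :3000 with seeded data
2. Visit `/login`; sign in as `demo@example.com` with the seeded password
3. Expect a redirect to `/dashboard` within 2s and a `sid` session cookie
4. Refresh the page: still logged in
5. `pnpm test:e2e --grep M1` is green
```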
3. The Sections of a Good PRD
Below is a complete table of contents with templates. Not every project needs every section, but the order matters: top-down readability is a feature.
§0. TL;DR
One paragraph. What and why, in one breath. Write this last.
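One possible fill-in-the-blank shape:

```markdown
**TL;DR.** We are replacing <current thing> with <new thing> because <driver>.
Success means <measurable outcome> by <date>. The biggest risk is <risk>,
mitigated by <mitigation>. Scope ends at <boundary>.
```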
§1. Status & ownership
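For example (values illustrative):

| Field | Value |
|---|---|
| Status | Draft → Approved → In-Flight → Shipped → Superseded |
| Owner | @alice |
| Reviewers | @frontend-lead, @backend-lead, @sre, @security, @pm |
| last_updated | 2025-01-15 |
| Target | M3, 2025-Q2 |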
§2. Context
Three paragraphs, no more:
- Current state — factual, present tense.
- What changed externally — why now? (regulation, customer ask, scale event, technical debt cliff)
- What we’ve already tried or considered — sets up the decision log.
§3. Problem statement
One sentence problem + one sentence solution direction. Test: if you remove this section, do readers still know what the project is?
§4. Goals
Goals get an ID, a sentence, and a measure — never a goal without a measure.
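For example:

| ID | Goal | Measure |
|---|---|---|
| G1 | Faster checkout | p95 of `/api/checkout` ≤ 250ms, verified by `pnpm test:perf` |
| G2 | Higher signup conversion | +2pp funnel conversion within 30 days of launch |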
Anti-pattern: “G3: Improve security.” Unmeasurable, unverifiable, useless.
§5. Non-goals & anti-requirements
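An illustrative split, with anti-requirements as their own §5.1 subsection (the pre-approval checklist expects §5.1 to exist):

```markdown
Non-goals:
- NG1: No native mobile apps in this project (revisit after M4)
- NG2: No multi-region active-active; one region for v1

### 5.1 Anti-requirements
- AR1: MUST NOT introduce an ORM, even where it seems convenient
- AR2: MUST NOT add a message queue; v1 stays synchronous end-to-end
```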
§6. Users & scenarios
Three sub-sections:
6.1 Personas — who, with permission scopes:
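For example (personas and scopes illustrative):

| Persona | Who | Permission scopes |
|---|---|---|
| Shopper | Signed-in customer | read:catalog, write:own-cart, write:own-orders |
| Support agent | Internal staff | read:any-order, write:refund (≤ $500) |
| Admin | Ops team | all scopes plus feature flags |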
6.2 Routes / surfaces — exhaustive table of public surfaces:
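For example:

| Route | Surface | Auth | Owner |
|---|---|---|---|
| /login | page | anonymous | frontend |
| /dashboard | page | session | frontend |
| GET /api/orders | API | session | backend |
| GET /api/orders/:id | API | session + ownership | backend |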
6.3 User journeys with risk — capture happy path AND unhappy paths:
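For example:

| ID | Journey | Happy path | Unhappy paths | Risks |
|---|---|---|---|---|
| J1 | Checkout | cart → pay → confirmation | card declined; duplicate payment notify | R2, R5 |
| J2 | Login | credentials → dashboard | 5 wrong passwords → lockout; session expires mid-form | R1 |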
§7. Architecture
- Topology diagrams before/after (ASCII art is fine and AI-readable)
- What’s inherited / what isn’t for migrations
- Component-level deep dives for load-bearing pieces
For each load-bearing component, include:
- Inputs / outputs (interface)
- State (what does it own?)
- Failure modes (what breaks, and how do we know?)
- Concurrency model (single-flight? lock? lease?)
§8. API contracts
For each endpoint, the contract block:
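A sketch of one contract block (shapes and error codes illustrative):

```markdown
### GET /api/orders/:id
- Auth: session cookie; caller MUST own the order (otherwise 404, not 403)
- Request: path param `id` (ULID)
- Response 200: `Order` (schema in §9)
- Errors: 401 UNAUTHENTICATED, 404 NOT_FOUND, 429 RATE_LIMITED
- Idempotency: safe; cacheable for 5s
- Verify: `pnpm test:api --grep "orders/:id"`
```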
A handler with this contract is implementable by an AI agent in one shot.
§9. Data model
- ER diagram or table
- Schema definitions as code (TypeScript / Prisma / SQL) — not prose
- Data lifecycle: creation, update, soft-delete, retention
- PII classification & compliance fields
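A sketch in TypeScript (the field set is illustrative; note the lifecycle and PII annotations):

```typescript
// Order: the single authoritative shape; §8 contracts reference this type.
export interface OrderItem {
  sku: string;
  quantity: number;
  unitPriceCents: number; // integer cents, never floats
}

export interface Order {
  id: string;                // ULID, immutable after creation
  userId: string;            // FK → User.id; PII-adjacent, MUST NOT be logged
  status: "pending" | "paid" | "shipped" | "cancelled";
  items: OrderItem[];
  totalCents: number;        // integer cents
  createdAt: string;         // ISO 8601 UTC
  deletedAt: string | null;  // soft delete; hard-purge after 90-day retention
}
```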
§10. UI/UX specifications
- Design system reference (tokens, components used)
- Wireframes or Figma links per screen
- Responsive breakpoints
- Accessibility (WCAG level, keyboard, ARIA)
- i18n / l10n scope
For each screen, document all states: empty, loading, partial, error, success.
§11. Non-functional requirements
Categories to always cover:
- Latency — P50, P95, P99 (never just “fast”)
- Throughput — RPS sustained, peak, duration
- Availability — SLO % + tolerance for AZ/region failure
- Capacity — concurrent users, DB connections, memory ceiling
- Security — auth, secrets, threat model coverage
- Privacy — PII handling, retention, redaction
- Compatibility — browsers, OS, screen sizes
- Accessibility — WCAG level, keyboard, screen reader, contrast
- i18n / l10n
- Build / CI — build time, bundle budgets
- Cost — per-RPS infra cost ceiling (often forgotten)
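For example (numbers illustrative):

| ID | Category | Requirement | Verify |
|---|---|---|---|
| NFR-1 | Latency | p95 ≤ 250ms, p99 ≤ 600ms on `/api/*` | `pnpm test:perf` |
| NFR-2 | Availability | 99.9% monthly; survives loss of one AZ | quarterly chaos drill |
| NFR-3 | Cost | ≤ $0.002 infra cost per request at 100 RPS | monthly cost dashboard |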
§12. Observability
The pattern: every requirement says what to log/measure, what threshold triggers an alert, who gets paged.
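One requirement in that shape (field names, metric name, and thresholds are illustrative):

```markdown
REQ-OBS-001 (MUST): Every `/api/*` request logs `{requestId, route, status,
durationMs}`; no PII fields. Metric: `http_request_duration_ms{route,status}`.
Alert: 5xx rate > 1% over 5 minutes pages on-call at SEV-2.
```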
Cover:
- Structured log fields (with redaction rules)
- Metric names and labels
- Dashboards (which boards exist, what panels)
- Alerts (threshold, window, severity, recipient)
§13. Risks
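For example:

| ID | Risk | Likelihood | Impact | Mitigation | Owner | Status |
|---|---|---|---|---|---|---|
| R1 | Session migration logs out every user | Med | High | dual-read old and new sessions for 30 days | @alice | open |
| R2 | Duplicate payment notifies | High | Med | idempotency keys (REQ-PAY-004) | @bob | mitigated |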
Run a pre-mortem before approval: a 30-minute “imagine it failed — why?” session. Add the top 3 unspoken risks.
§14. Test plan
Pyramid + per-milestone gates. A milestone is not “done” until its gate is green.
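Example gates (commands illustrative):

| Milestone | Gate | Command |
|---|---|---|
| M1 | Unit + contract tests green | `pnpm test && pnpm test:api` |
| M2 | E2E happy paths green | `pnpm test:e2e` |
| M3 | Load test at 2× peak meets NFR-1 | `pnpm test:perf -- --rps 200` |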
Cover: unit, integration, E2E, load, security, cutover smoke.
§15. Migration / rollout plan
Each milestone: scope, acceptance, rollback, demo script.
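For example:

| Milestone | Scope | Acceptance | Rollback | Demo |
|---|---|---|---|---|
| M1 | Login vertical slice | J2 green in E2E | disable `new-auth` flag | `demo-m1.md` |
| M2 | Checkout read path | J1 steps 1–2 pass | revert deploy | `demo-m2.md` |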
§16. Open questions
Every open question owned and timed:
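For example:

| ID | Question | Owner | Decide by | Answer / ADR |
|---|---|---|---|---|
| Q1 | Soft delete or hard delete for orders? | @alice | 2025-02-01 | open |
| Q2 | Which payment provider for EU? | @bob | 2025-02-15 | open |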
When decided, update the row with the answer and link to the ADR.
§17. Glossary
For domain terms, define what it is, what it isn’t, what it’s often confused with:
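For example:

| Term | Is | Is not | Confused with |
|---|---|---|---|
| Session | Server-side record keyed by the `sid` cookie | The access token itself | Login (the act of creating a session) |
| Order | An immutable snapshot of a paid cart | The live, mutable cart | Cart |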
§18. Appendix
- 18.1 File-level pointers — paths to authoritative files in the current codebase. Critical for AI grounding.
- 18.2 Decision log seed — list of decisions to promote to ADRs.
- 18.3 Out-of-band assets — DNS names, certs, third-party config required before kickoff.
- 18.4 Fixtures — captured payloads (OAuth callbacks, payment notifies, SSE chunks) for replay tests.
§19. AI agent operating manual
Dedicated section telling AI agents how to do the work, not just what:
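A sketch (commands, paths, and project facts are placeholders):

```markdown
## AI agent operating manual
- Read §0–§5 of this PRD before any change; load §8 before touching an API.
- Run `pnpm lint && pnpm test` before proposing a diff; never leave CI red.
- Follow existing patterns in `src/server/api/`; new dependencies require an
  ADR (see AR3).
- Reference the REQ-* IDs you implement in every commit message.
- Hallucination guards: sessions use signed cookies, not localStorage JWTs;
  money is integer cents, never floats.
```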
4. AI-Specific Patterns
Pattern 1: Traceability matrix
A table linking REQ → test → code:
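For example (paths illustrative):

| REQ | Test | Code |
|---|---|---|
| REQ-AUTH-001 | `tests/api/auth-login.test.ts` | `src/server/api/auth/login.ts` |
| REQ-AUTH-003 | `tests/e2e/session-refresh.spec.ts` | `src/server/middleware/session.ts` |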
The #1 thing AI agents lack is knowing which test verifies which requirement. Hand-maintain this matrix; it pays back tenfold.
Pattern 2: Captured-fixture appendix
For payment notifies, OAuth callbacks, SSE chunks — capture real payloads:
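A hypothetical captured payment notify, stored for example at `fixtures/payments/payment.succeeded.json` (the payload shape is invented for illustration):

```json
{
  "_meta": "captured 2025-01-10 from the provider sandbox; secrets redacted",
  "event": "payment.succeeded",
  "data": { "paymentId": "pay_xxx", "amountCents": 4200, "currency": "EUR" }
}
```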
These fixtures are the input to integration tests. AI agents can implement against fixtures without needing live services.
Pattern 3: Hallucination guards
A short list of facts the AI tends to get wrong about your project, restated explicitly. See §19 above. This catches AI agents that pattern-match from training-data prevalence rather than your specifics.
Pattern 4: Demo script as AI prompt
A demo script doubles as a verification prompt: “run the M1 demo script using Playwright; report any step that fails.” If the script is precise enough for a human QA, it’s precise enough for an AI agent.
Pattern 5: Stable IDs for everything
Every requirement, goal, risk, journey, route, and milestone gets a stable ID:
- Goals: `G1`, `G2`, …
- Non-goals: `NG1`, `NG2`, …
- Anti-requirements: `AR1`, `AR2`, …
- Requirements: `REQ-AUTH-001`, `REQ-EDGE-001`, …
- Journeys: `J1`, `J2`, …
- Risks: `R1`, `R2`, …
- Open questions: `Q1`, `Q2`, …
- Non-functional: `NFR-1`, `NFR-2`, …
- Milestones: `M1`, `M2`, …
Stable IDs survive rewrites and let AI agents reference exactly. PR descriptions can say “implements REQ-AUTH-003 and REQ-AUTH-005, mitigates R5” — instantly searchable, instantly verifiable.
5. Lifecycle: Writing, Reviewing, Maintaining
Writing
- Outline first, in 30 minutes. Just headings, no content. Get sign-off on shape before drafting.
- Draft top-down. TL;DR → goals → architecture → contracts → ops. Don’t start with the API table.
- Draft with the AI in the loop. Ask the AI to identify ambiguities — it’s an excellent ambiguity detector. Paste a section and ask “what could be interpreted multiple ways here?”
- Examples first, prose second. When stuck, write the example payload, then the prose explaining it.
- TL;DR last. You don’t know what the doc says until you’ve finished it.
Reviewing
Use the Pre-Approval Checklist below. Reject PRDs that don’t meet it; reviewing a half-baked PRD wastes everyone’s time.
Specific reviewer assignments:
- Frontend lead — UI/UX, journeys, routes
- Backend lead — API contracts, data model, architecture
- Platform/SRE — observability, NFRs, rollout, rollback
- Security — auth, threat model, NFRs related to security
- Product — goals, non-goals, success metrics
Maintaining
A PRD that doesn’t get updated becomes a lying document — the worst kind. Worse than no PRD, because AI agents will trust the lies.
Rules:
- PR template includes “PRD updated? Y/N + REQ-* IDs touched”
- Status field ratchets Draft → Approved → In-Flight → Shipped → Superseded; never goes backward without a comment
- After Shipped, freeze and archive. Link to its successor.
- Open questions resolved within milestone get promoted to ADRs and removed from §16
- `last_updated` is real, updated on every meaningful edit
- Quarterly audit: walk every PRD; archive shipped ones; flag stale ones
When to split a PRD
A PRD over ~500 lines is a smell. Split when:
- Two sections could ship independently → two PRDs with shared context doc
- Ownership splits across teams → each team owns its PRD
- A section is reused across PRDs → extract to a standalone doc, reference from multiple PRDs
Common extraction candidates:
- Auth/session design → `docs/auth/session-design.md`
- Streaming architecture → `docs/architecture/sse-streaming.md`
- Glossary → `docs/glossary.md`
- ADRs → `docs/adr/ADR-XXXX.md`
6. Pre-Approval Checklist
Use this before moving status from Draft to Approved:
- §0 TL;DR is exactly one paragraph
- Every goal has a measure
- Every non-goal is concrete and non-aspirational
- §5.1 anti-requirements section exists
- Every REQ-* is atomic, ID’d, and testable
- Every endpoint in §8 has request/response/error schemas
- Every NFR has a number, not an adjective
- Every risk has owner + mitigation + status
- Every open question has owner + decide-by date
- §19 AI operating manual section exists
- Glossary distinguishes commonly-confused terms
- §18.1 file-level pointers list authoritative current-state files
- Demo script per milestone in §15
- Verification command per REQ (or batched in §14)
- Pre-mortem held; top risks captured
- All required reviewers listed in §1 and have approved
- PRD is committed in repo at a stable path
- `CLAUDE.md` (or equivalent) references the PRD
7. Templates
Minimal PRD skeleton
Copy this when starting a new PRD:
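A minimal skeleton mirroring §3 of this guide:

```markdown
# <Project> PRD
## 0. TL;DR
## 1. Status & ownership
## 2. Context
## 3. Problem statement
## 4. Goals
## 5. Non-goals & anti-requirements
## 6. Users & scenarios
## 7. Architecture
## 8. API contracts
## 9. Data model
## 10. UI/UX specifications
## 11. Non-functional requirements
## 12. Observability
## 13. Risks
## 14. Test plan
## 15. Migration / rollout plan
## 16. Open questions
## 17. Glossary
## 18. Appendix
## 19. AI agent operating manual
```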
Requirement template
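One possible shape:

```markdown
REQ-<AREA>-<NNN> (MUST | SHOULD | MAY):
  Statement: <one atomic, testable sentence>
  Verify: <command or test file>
  Why: <one sentence of rationale>
  Links: <journeys, risks, ADRs>
```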
ADR template
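One possible shape:

```markdown
# ADR-<NNNN>: <decision title>
Date: <YYYY-MM-DD>
Status: Proposed | Accepted | Superseded by ADR-<NNNN>

## Context
## Decision
## Consequences
```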
Endpoint contract template
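One possible shape:

```markdown
### <METHOD> <path>
- Auth: <mechanism + required scope>
- Request: <params / body schema>
- Response <code>: <schema reference>
- Errors: <status → error code>
- Idempotency: <safe | idempotent | not idempotent>
- Rate limit: <threshold>
- Verify: <command>
```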
8. Common Mistakes
“We’ll figure it out as we go”
PRDs without contracts force every implementer (human or AI) to invent. AI agents will invent confidently and inconsistently. Specify the contract before writing the code, not after.
Goals without measures
“Improve performance.” Improve from what to what, measured how? An AI agent reading this writes plausibly-fast code that doesn’t necessarily move the metric. Always: number, threshold, verification command.
Hidden assumptions
If a senior engineer would “just know” something, write it down. AI agents don’t have hallway conversations. The phrase “obviously we’d …” is a signal you have a hidden assumption — surface it.
Prose-only specifications
A paragraph describing a JSON shape is harder to ground than the JSON shape itself. When in doubt, show the artifact.
One PRD to rule them all
A 2000-line PRD is unreviewable, unmaintained, and lies within months. Split aggressively. Reuse shared design docs. PRDs are scoped to projects, not products.
Stale PRDs left in the repo
If a PRD is shipped or abandoned, archive it. Move to `docs/prd/archive/` with status updated. Otherwise AI agents will keep loading it as if it’s current.
“We’ll write the tests later”
If the test isn’t specified, the requirement isn’t real. AI agents will write code that compiles and looks plausible — but without tests as anchors, you can’t know if it actually meets the requirement. Specify verification with the requirement.
No anti-requirements
You forgot to say “do NOT add an ORM” — so the AI added an ORM. The default behavior of AI agents is over-delivery. Anti-requirements are the brakes.
Skipping the AI operating manual
You assumed the AI would “just figure out” your conventions. Three different agents make three different choices, and your codebase fragments. The AI operating manual is cheap insurance.
No demo script
You said “M3 done when checkout works” — but works how? Nobody can verify. A demo script is a 2-minute scripted walkthrough; if you can’t write one, the milestone isn’t well-defined.
Further reading
- RFC 2119 — Key words for use in RFCs to Indicate Requirement Levels
- ADR GitHub organization — examples and tooling
- Google’s “Design Docs at Google” essay
- Anthropic’s Claude Code documentation on `CLAUDE.md`
This is a living document. Improvements welcome — open a PR.