The Anatomy of a Claude Prompt
A Technical Deep Dive into Precision Prompting
Chapter I
Task
Directive · Outcome · Intent Encoding
"I want to [TASK] so that [SUCCESS CRITERIA]." — The atomic unit of every prompt.

The Task section is the intent encoding layer of your prompt. It is where you compress a potentially complex goal into two tightly coupled clauses: the directive (what operation should be performed) and the outcome (what state should exist when the operation is complete). Claude's language model architecture means it does not reason about goals the way a human engineer would — it performs next-token prediction conditioned on everything in its context window. The Task section's job is to make the intended goal maximally unambiguous inside that context so that the most probable token sequence it generates is also the most useful one.

Why outcome matters more than directive

A directive alone — "write a HubSpot connector" — is underspecified. It describes an action but not its success condition. Claude will fill the gap with its own prior over what "a HubSpot connector" looks like, drawn from training data. That prior may not match your codebase's conventions, your quality bar, or your integration requirements. Adding an outcome — "so that all contacts are normalised into the canonical table with no data loss, the mapper passes all existing tests, and the PR checklist is satisfied" — shifts the distribution of probable outputs dramatically. Claude now has a verifiable end-state to optimise toward.

The cognitive load transfer problem

Every ambiguity in your Task section is cognitive load that gets transferred to Claude — and Claude resolves ambiguity using priors, not your intent. The more precisely you specify what done looks like, the less Claude has to infer, and the less likely it is to infer incorrectly. Think of it as reducing the entropy of the output distribution: a vague task has high entropy (many plausible outputs); a precise task with outcome conditioning has low entropy (a narrow range of correct outputs).

Structuring the directive

The directive should identify: (1) the operation (write, refactor, debug, explain, design, test), (2) the object (which class, file, system, or concept), and (3) the scope (new from scratch, edit existing, extend interface). Missing any of these forces Claude to assume defaults that may be wrong.

❌ Underspecified
"Write a connector for Zendesk."
// Operation ✓, object ✓, scope ✗, outcome ✗
✓ Well-formed
"Write a new ZendeskTicketConnector from scratch so that tickets appear in the canonical table, deduplicated on external_id, with mvn test passing."
Outcome as a verifiability contract

The outcome clause should be falsifiable. "High-quality output" is not falsifiable — no test can determine pass/fail. "mvn test passes with zero new failures, all canonical fields populated, no hardcoded credentials" is falsifiable. You or a CI system can run it and get a binary result. When the outcome is falsifiable, Claude can self-evaluate its own output against it before presenting it to you — which measurably reduces iteration cycles.

DataHarness example (complete): "I want to implement a new ZendeskTicketConnector and its corresponding ZendeskTicketFieldMapper from scratch so that: (1) all open Zendesk tickets are fetched via OAuth2 pagination and normalised into CanonicalRecord using the canonical fields in references/schema.md, (2) ZendeskTicketFieldMapperTest passes with coverage of happy path, null fields, and missing id, and (3) the PR checklist in references/adding-a-source.md is fully satisfied."
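As a hypothetical illustration (not part of any DataHarness tooling), the directive's three components plus the outcome can be expressed as a small checklist that flags which parts of a task statement are still missing. The class and field names are invented for this sketch:

```python
from dataclasses import dataclass, fields

# Hypothetical sketch: a Task statement decomposed into the four parts
# discussed above. Names are illustrative, not part of any real tooling.
@dataclass
class TaskSpec:
    operation: str  # write, refactor, debug, explain, design, test
    obj: str        # which class, file, system, or concept
    scope: str      # new from scratch, edit existing, extend interface
    outcome: str    # the falsifiable end-state

    def missing_parts(self) -> list[str]:
        """Return the names of any unfilled parts of the task."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

underspecified = TaskSpec("write", "Zendesk connector", "", "")
print(underspecified.missing_parts())  # scope and outcome are absent
```

If the checklist comes back non-empty, Claude will fill those slots from its prior rather than from your intent.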
Chapter II
Context Files
Context Window · Knowledge Loading · Attention Steering
"First, read these files completely before responding." — A mandatory pre-pass, not a suggestion.

Claude operates with a finite context window. Everything inside that window has equal "recency" from the model's perspective — there is no persistent memory between turns unless you explicitly carry it forward. Context Files is the mechanism by which you pre-load domain-specific knowledge into the window before any generation occurs, ensuring that Claude's outputs are conditioned on your project's actual state rather than statistical priors from training.

The ordering constraint is load-bearing

The phrase "completely before responding" is not stylistic — it is an explicit sequential processing constraint. Without it, Claude may begin generating in a streaming fashion before fully processing all referenced files, anchoring on early content and progressively discounting later material. This is especially dangerous when a later file (e.g., schema.md) contradicts an assumption formed from an earlier one. The ordering constraint forces a complete read-then-respond pattern.

Anatomy of a well-annotated file list

Each entry in your Context Files list should follow the pattern: filename, followed by a one-sentence description of what it constrains. The description serves two functions: it tells Claude when to consult the file during reasoning (not just that it exists), and it exposes gaps — if you can't write the one-sentence description, the file probably doesn't have a clear purpose.

Annotated context list — DataHarness connector task
SKILL.md — architecture, all core interfaces, style rules, routing table to references
references/adding-a-source.md — 5-step checklist; governs file naming, class structure, PR requirements
references/schema.md — canonical field vocabulary; consult before naming any mapped field
references/testing.md — CanonicalAssert API, RawRecordFixture patterns, test tagging conventions
Context selection strategy

Over-inclusion is as harmful as under-inclusion. Every token of context you add compresses the "attention budget" available for task-relevant reasoning. Include only files that constrain or inform the specific output you need. For a new connector: architecture + process + schema + testing. For a debugging task: architecture + debugging guide only. The selection itself communicates task scope to Claude — a long file list signals a complex, multi-constraint task; a short one signals a focused, bounded task.

File ordering within the list

Order matters. Place files that establish global constraints (SKILL.md, PROMPT.md) first — they set the frame. Place files that establish task-specific constraints (adding-a-source.md, schema.md) second. Place files that provide reference material (testing.md, debugging.md) last. Claude's attention is slightly recency-biased, so task-specific constraints placed later in the context are weighted more heavily during generation — a feature, not a bug.
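The ordering rule above can be sketched as a sort over category ranks. This is a hypothetical illustration; the category labels and function name are invented for this sketch, and only the filenames come from the annotated list above:

```python
# Hypothetical sketch of the ordering rule: global constraints first,
# task-specific constraints second, reference material last.
CATEGORY_RANK = {"global": 0, "task": 1, "reference": 2}

def order_context_files(files: list[tuple[str, str]]) -> list[str]:
    """Sort (filename, category) pairs into the load order described above.

    Python's sort is stable, so files within a category keep their
    original relative order.
    """
    return [name for name, cat in sorted(files, key=lambda f: CATEGORY_RANK[f[1]])]

print(order_context_files([
    ("references/testing.md", "reference"),
    ("SKILL.md", "global"),
    ("references/schema.md", "task"),
]))  # SKILL.md first, schema.md second, testing.md last
```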

Anti-pattern — the kitchen sink list: Including every file in the project because "more context is better" is a common mistake. It dilutes the signal, increases token cost, and causes Claude to spend attention budget on irrelevant constraints. If removing a file from the list would not change the output, remove it.
Chapter III
Reference
Few-Shot Priming · Pattern Extraction · Constraint Surfacing
"Here is a reference. Here's what makes it work." — Few-shot learning, explicitly engineered.

References are the most powerful calibration tool in prompt engineering. They operate through the same mechanism as few-shot in-context learning — by providing concrete examples of desired output, you shift Claude's output distribution toward that example's style, structure, and quality level far more reliably than descriptive instructions alone. The reason is fundamental: language models are trained to complete patterns; an example is a direct pattern signal, whereas an adjective like "clean" or "idiomatic" requires the model to resolve the adjective against its training prior, which may not match your standard.

Why one reference beats ten adjectives

Consider the instruction "write clean, idiomatic, well-documented Java." Each adjective — clean, idiomatic, well-documented — has an enormous range of valid interpretations. "Clean" to one engineer means no Lombok; to another it means short methods; to another it means no raw types. When you provide the existing SalesforceLeadsConnector as a reference, Claude doesn't need to resolve any of those ambiguities — it reads the actual code and extracts the conventions directly. The example encodes intent with zero loss compared to natural language description.

The reverse-engineering step is mandatory

A reference alone is incomplete. Without explicit rule extraction, Claude may identify the wrong properties as salient. If your reference uses AbstractFieldMapper, Claude might pattern-match on the class name rather than the null-handling contract. The reverse-engineering step — your list of "Always" and "Never" rules derived from the example — tells Claude which features of the reference are load-bearing versus incidental.

  • Read the reference and identify every structural decision (class hierarchy, method signatures, return types).
  • For each decision, ask: "Would the system break if I changed this?" If yes, it's a rule.
  • Express each rule as an unconditional "Always" or "Never" — no hedging, no "usually".
  • Group rules by category: architecture, naming, error handling, testing, config.
  • Include at least one rule from each category — they constrain different phases of generation.

Negative examples are equally powerful

Don't only show what you want — show what you explicitly don't want. If a previous connector used System.out.println for logging, show that connector and mark it as the anti-pattern. Negative examples constrain the output distribution from below, complementing how positive examples constrain it from above. Together they define a tight target band.

❌ Reference without rules
"Here is the Salesforce connector. Write something like it."
✓ Reference with extracted rules
"Here is the Salesforce connector. Always: extend AbstractFieldMapper. Never: hardcode base_url. Always: sourceId() returns kebab-case literal."
Upload vs. inline: For short references (<50 lines), paste inline — Claude processes it with full attention. For longer references, upload as a file and include it in Context Files with a description. Never describe a reference without providing it — description is lossy; the source is lossless.
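Extracted "Always"/"Never" rules are checkable. A minimal sketch, assuming plain substring matching stands in for real static analysis (the "https://" fragment is only a rough proxy for a hardcoded base URL, and the fragments themselves are taken from the example above):

```python
# Minimal sketch: apply extracted "Always"/"Never" rules as textual checks
# against a generated snippet. Real checks would use static analysis;
# substring matching here is only illustrative.
NEVER = ["System.out.println", "https://"]   # forbidden fragments
ALWAYS = ["extends AbstractFieldMapper"]     # required fragments

def rule_violations(code: str) -> list[str]:
    violations = [f"Never rule hit: {frag}" for frag in NEVER if frag in code]
    violations += [f"Always rule missed: {frag}" for frag in ALWAYS if frag not in code]
    return violations

good = "class ZendeskTicketFieldMapper extends AbstractFieldMapper { }"
bad = 'class M { void log() { System.out.println("x"); } }'
print(rule_violations(good))  # no violations
print(rule_violations(bad))   # both a Never hit and a missed Always
```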
Chapter IV
Success Brief
Output Specification · Evaluation Criteria · Negative Space
Four fields that define "done" with enough precision to be falsifiable by a third party.

The Success Brief is an output specification contract. Its purpose is to close the gap between what Claude infers "good output" to mean (based on its training distribution) and what you actually need. Without it, Claude optimises for the most probable interpretation of the task — which is, statistically, a generic implementation that satisfies the literal directive but may miss your quality bar, scope constraints, and audience expectations entirely.

Field 1 — Type of output + length

Specify the artifact type and size budget explicitly. Type: Java class, test file, YAML config snippet, inline explanation, architectural decision record. Length: approximate line count or file count. These constraints prevent two failure modes: scope creep (Claude produces a full framework when you needed a single class) and under-delivery (Claude produces a stub with TODOs when you needed a complete, runnable implementation). If you can't state the type and length, you haven't fully scoped the task.

Field 2 — Recipient's reaction (the action test)

This is the most technically powerful field and the most commonly omitted. Instead of describing output properties, describe what the recipient should be able to do immediately after reading. This is the "action test" — it forces outcome-orientation and eliminates vague quality descriptors. Compare:

❌ Property-based
"The code should be clean, well-tested, and production-ready."
✓ Action-based
"Engineer can paste the class into the connector package, run mvn test, and merge the PR within 10 minutes."
Field 3 — Does NOT sound like (negative space)

Negative constraints are among the highest-ROI additions to any prompt. They eliminate entire failure modes before generation begins. Common anti-patterns to call out for DataHarness: generic boilerplate without project-specific conventions, missing null checks on optional fields, skipping pagination in the fetch loop, using System.out.println instead of ctx.debug(), hardcoding the base URL, returning null instead of List.of() for empty fetch results. Each one you name removes a class of bad output from consideration.

Field 4 — Success means (the falsifiability anchor)

This field should be a concrete, programmatically verifiable statement. Think of it as the acceptance criteria in a user story — not "high quality" but a precise list of conditions that can be checked. The more your success criteria map to things a CI pipeline could verify, the better. Examples: mvn test -Dtest=ZendeskTicketFieldMapperTest exits 0; no calls to System.out.println (checked by grep); sourceId() returns "zendesk-tickets"; all fields in references/schema.md §Identity are mapped or explicitly null.
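A "Success means" field of this kind can be expressed directly as code. A minimal sketch, assuming the checks run over generated source text as a string (the helper name is hypothetical, and the regex is only a rough stand-in for a real parse):

```python
import re

# Illustrative sketch: two of the success criteria above expressed as
# programmatic checks over generated source text.
def check_success(source: str) -> dict[str, bool]:
    return {
        "no System.out.println": "System.out.println" not in source,
        'sourceId() returns "zendesk-tickets"':
            bool(re.search(r'sourceId\(\)\s*\{\s*return\s+"zendesk-tickets";', source)),
    }

snippet = '''
class ZendeskTicketConnector {
    String sourceId() { return "zendesk-tickets"; }
}
'''
print(check_success(snippet))  # both checks pass for this snippet
```

The same dictionary of named, binary checks is what Claude can walk through when asked to self-evaluate before presenting output.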

Self-evaluation trigger: When your Success Brief has a falsifiable "Success means" field, you can ask Claude to self-evaluate its own output against it before presenting: "Before you show me the code, check it against each success criterion and report any gaps." This catches issues in Claude's own generation pass, before your review.
Chapter V
Rules
Hard Constraints · Conflict Resolution · Rule Taxonomy
"If you're about to break one of my rules, stop and tell me." — Surfacing conflicts before they become wasted work.

Rules are hard constraints that are invariant across all tasks in a given project context. Unlike instructions (which are task-scoped) and references (which are output-scoped), rules are project-scoped: they apply regardless of what the task is, and they cannot be overridden by task-specific instructions without explicit acknowledgment. They encode the non-negotiables that your project has accumulated — architectural decisions, security requirements, team conventions, legal constraints — in a form Claude can check against during generation.

The rule taxonomy

Effective rule sets have coverage across at least four categories:

  • Architectural rules — structural invariants: "Never add setters to CanonicalRecord." "One connector class per source." "All config via FetchContext, never System.getenv() directly."
  • Security rules — safety invariants: "Never hardcode credentials." "All secret references via *_env keys in sources.yml." "No logging of raw API responses at INFO level or above."
  • Style rules — consistency invariants: "No Lombok." "Source IDs are kebab-case string literals." "No raw types in generic collections."
  • Process rules — workflow invariants: "Every mapper must have a corresponding test class before merge." "ConnectorException for recoverable, FatalConnectorException for unrecoverable."

Rule quality criteria

A rule is well-formed if it satisfies all four properties: unconditional (no "usually" or "where possible"), unambiguous (a third party could apply it consistently), verifiable (can be checked by static analysis, grep, or code review), and scoped (it applies to a specific domain of output, not "everything").

✓ Well-formed: unconditional · unambiguous · verifiable · scoped
❌ Ill-formed: "Keep it clean" · "Usually avoid X"
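The "unconditional" property is the easiest to check mechanically. A minimal sketch, assuming a simple word-list scan (the hedge list is illustrative, not exhaustive):

```python
# Sketch of the "unconditional" property: a rule containing hedging
# language is not well-formed. The word list is illustrative only.
HEDGES = ("usually", "where possible", "try to", "generally", "prefer")

def is_unconditional(rule: str) -> bool:
    lowered = rule.lower()
    return not any(h in lowered for h in HEDGES)

print(is_unconditional("Never hardcode credentials."))          # a keeper
print(is_unconditional("Usually avoid raw types where possible."))  # rewrite it
```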
    The "stop and tell me" directive — why it matters

    Without this directive, Claude has two default behaviors when a rule conflicts with a task: silent violation (comply with the task, ignore the rule) or silent refusal (apply the rule, produce incomplete output, explain vaguely). Neither is acceptable. The directive creates a third behavior: explicit conflict report. Claude names the rule, names the task requirement that violates it, and blocks on your decision. This turns rule violations from silent failures into collaborative decision points — you may grant a one-time exception, reframe the task, or discover the rule needs updating.

    Rule maintenance: Rules accumulate technical debt just like code. An outdated rule (e.g., "never use Java records" written before Java 17 adoption) is worse than no rule — it consumes attention and produces incorrect conflicts. Audit your rules whenever the project's architecture changes.
Chapter VI
Conversation
Ambiguity Resolution · Branching Questions · Dialogue Design
"DO NOT start executing yet." — A deliberate circuit-breaker against premature generation.

The Conversation section is a generation circuit-breaker. It prevents the most expensive failure mode in LLM-assisted development: Claude generating a complete, internally consistent, but fundamentally wrong solution because it resolved ambiguity silently in the wrong direction. The cost of this failure scales with output length — a 200-line class built on a wrong assumption requires more rework than a 2-line stub. Catching that assumption before generation costs one exchange; catching it after costs many.

What constitutes genuine ambiguity

Not all ambiguity is worth a clarifying exchange. The test is: would different reasonable interpretations produce materially different outputs? "Should I use OAuth2 or API key auth for this connector?" — yes, the entire auth implementation diverges. "Should I add a blank line after the constructor?" — no, the output is functionally identical. Only branch the solution space when the branches lead to structurally different code, significantly different scope, or divergent architectural choices.

Designing good clarifying questions

Claude should ask questions that partition the solution space — each answer eliminates a class of possible outputs. Structurally, a good clarifying question has exactly two to four discrete answers, each leading to a meaningfully different implementation path. Questions with open-ended answers are better placed in the Plan section as explicit assumptions Claude states before you confirm them.

❌ Non-branching
"Should I follow the project's coding standards?"
"Do you want me to write tests?"
✓ Solution-space-branching
"Should the connector support incremental sync (fetch only records updated since last run) or always full-refresh?"
"Does the Zendesk API require cursor-based or page-based pagination?"
Step by step — the serialisation requirement

The phrase "step by step" is a serialisation constraint on the clarification dialogue. Without it, Claude may batch all its uncertainties into a single wall of questions — which overwhelms the user and prevents early answers from informing later questions. Serialised dialogue — one or two questions, answers, then next questions informed by those answers — produces better-targeted questions and faster convergence on a shared understanding. It mirrors how skilled engineers conduct requirement interviews.

Diagnostic signal: If Claude is asking too many low-value questions, your Task (Chapter I) and Success Brief (Chapter IV) are under-specified. Clarifying questions are a real-time signal of prompt quality — they reveal exactly where your specification has gaps. Use them to iterate your prompt template, not just to unblock the current task.

For DataHarness specifically, high-value questions include: auth type (OAuth2 vs API key vs basic), pagination strategy (cursor vs offset), rate limit handling (retry with backoff vs fail-fast), whether the source supports incremental sync, and whether the canonical mapping requires any new schema fields not yet in references/schema.md.

Chapter VII
Plan
Reasoning Externalisation · Checkpoint · Assumption Surfacing
"List the 3 rules that matter most. Then your execution plan." — Making Claude's reasoning legible before it becomes code.

The Plan section forces reasoning externalisation — it asks Claude to make its interpretation of the task and its intended execution strategy legible to you before any code is written. This is technically significant: Claude's generation process is largely opaque. The plan step creates a visible checkpoint at which you can verify alignment between Claude's interpretation and your intent, catch errors in reasoning before they propagate through hundreds of lines of output, and redirect cheaply.

Why "3 rules" forces prioritisation

Asking Claude to list all applicable rules produces a laundry list — it transfers the prioritisation problem back to you. Asking for exactly three forces a ranking. Claude must decide which rules are architecturally load-bearing for this specific task. That decision is itself informative: if Claude's top three don't match yours, you have a misalignment to correct before execution. The rules Claude selects reveal how it has parsed your task.

Reading the plan as a diagnostic

Each element of the plan tells you something about Claude's understanding:

  • Step 1 of the plan reveals what Claude treats as the primary constraint — if it starts with "read context files" rather than the actual implementation, the task may be under-specified.
  • The scope of each step reveals whether Claude has correctly bounded the task — steps that are too broad ("implement everything") indicate ambiguity in the Task section.
  • The absence of a testing step is a red flag — it means testing was not inferred as required, despite your project rules mandating it.
  • Dependencies between steps reveal Claude's understanding of the architecture — if step 2 depends on an output from step 3, the plan has an ordering error that would manifest as a bug.

Plan vs. chain-of-thought

A plan is distinct from chain-of-thought reasoning. CoT is internal deliberation embedded in the output stream. A plan is a structured, reviewable contract that precedes output. Plans should be numbered, action-oriented, and bounded: "Step 1: Implement ZendeskTicketConnector.fetch() with cursor pagination. Step 2: Implement ZendeskTicketFieldMapper mapping the seven canonical identity fields. Step 3: Write ZendeskTicketFieldMapperTest covering happy path, null email, and missing id." This is legible, reviewable, and correctable in under 30 seconds.
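These diagnostics can themselves be sketched as checks over a structured plan. A hypothetical illustration, assuming a plan is a list of step dictionaries with an invented `depends_on` field holding zero-based indices of earlier steps:

```python
# Illustrative validator for a plan as a reviewable contract: at most
# five steps (the complexity budget discussed in Chapter VIII), a testing
# step present, and no step depending on a later one. The plan structure
# is hypothetical.
def plan_problems(steps: list[dict]) -> list[str]:
    problems = []
    if len(steps) > 5:
        problems.append("over complexity budget: decompose the task")
    if not any("test" in s["action"].lower() for s in steps):
        problems.append("no testing step")
    for i, s in enumerate(steps):
        if any(dep > i for dep in s.get("depends_on", [])):
            problems.append(f"step {i + 1} depends on a later step")
    return problems

plan = [
    {"action": "Implement ZendeskTicketConnector.fetch() with cursor pagination"},
    {"action": "Implement ZendeskTicketFieldMapper", "depends_on": [0]},
    {"action": "Write ZendeskTicketFieldMapperTest", "depends_on": [1]},
]
print(plan_problems(plan))  # empty list: the plan is well-formed
```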

Extended plan directives: For high-stakes tasks, augment the plan request: "Before the plan, also state: (a) any assumptions you are making about API behaviour, (b) any canonical fields from schema.md you will need to add or modify, (c) any ambiguity in the task that you are resolving by assumption rather than by confirmed requirement." This surfaces the invisible reasoning that most commonly leads to wrong outputs.
Chapter VIII
Alignment
Execution Gate · Shared Mental Model · Compounding Returns
"Only begin work once we've aligned." — The gate between specification and execution.

Alignment is the execution gate — the formal handoff between the specification phase (Chapters I–VII) and the execution phase. It is not a rubber stamp or a courtesy check. It is the moment at which you and Claude establish a shared mental model of the task, its constraints, its success criteria, and its execution strategy. Everything before it prepares for this moment; everything after it builds on it. Skipping alignment is betting that every prior section was interpreted exactly as intended — and that bet compounds incorrectly with task complexity.

The 5-step complexity budget

Requiring a maximum 5-step execution plan is a complexity budget, not an arbitrary limit. If a task genuinely requires more than five high-level steps, it should be decomposed into subtasks, each with its own prompt. Accepting a 12-step plan is accepting a task that is too broad to execute reliably in a single context window without accumulating compounding errors. When Claude can't compress the plan to five steps, that is signal to decompose the task, not to accept a longer plan.

What alignment actually verifies

Alignment is a bidirectional verification protocol. From your side: does Claude's plan match your intent? Does the step ordering match the logical dependency graph of the task? Are the three surfaced rules actually the most critical ones? From Claude's side: has it correctly parsed the Task, Success Brief, and Rules? Does it have all the context it needs, or are there gaps? A 30-second review of a well-formed plan catches the majority of misalignments before a single line of code is generated.

Compounding returns of the full anatomy

The eight sections of the prompt anatomy are not independent — they form a mutually reinforcing constraint system. Each section reduces the entropy of the output distribution further. Task narrows the operation space. Context Files loads domain knowledge. Reference calibrates the style distribution. Success Brief defines the acceptance boundary. Rules hard-constrains the solution space. Conversation resolves ambiguity forks. Plan externalises the reasoning path. Alignment confirms the shared model before execution.

  • Task — sets the operation and outcome. Reduces output entropy from "anything" to "a connector-shaped thing."
  • Context Files — loads project knowledge. Reduces output from "generic connector" to "DataHarness-shaped connector."
  • Reference — calibrates style and structure. Reduces from "DataHarness-shaped" to "looks like the Salesforce connector."
  • Success Brief — defines the acceptance boundary. Reduces from "looks right" to "passes these verifiable criteria."
  • Rules — hard-constrains the solution space. Eliminates all outputs that violate project invariants.
  • Conversation — resolves ambiguity forks. Eliminates structurally wrong solution branches.
  • Plan — externalises reasoning. Catches interpretation errors before propagation.
  • Alignment — confirms shared model. The gate that only opens when all prior reductions have converged.
The anatomy in one sentence

A well-formed Claude prompt is a mutually constraining specification system in which each section progressively narrows the output distribution until the only remaining high-probability output is the one you actually need — before execution begins.