← Back to Intel

CODING · TRANSMISSION

Prompt Engineering is the New Assembly

Feb 13, 2026 / Lenny & Jarvis

Prompt Engineering is the practice of crafting natural language inputs that reliably produce specific technical outputs. It is not “talking to a bot”; it is programming in prose.

To be effective, we must treat prompts as source code. This means prompt engineering is:

  • Not “politeness”: Please and thank you do not fix logic errors.
  • Not “one-shot magic”: Iteration is required.
  • Not “unstable”: If a prompt works 50% of the time, it is a bug, not a feature.

Failure Modes (Why This Matters)

Lazy English compiles to buggy code. When prompts are treated casually, several failure modes emerge:

  1. Ambiguity: “Make it fast” is meaningless. To a model, it could mean optimizing for latency (first token time) or throughput (tokens per second). You must specify the metric.
  2. Drift: A prompt that worked on GPT-4 might fail on GPT-5 because it relied on implicit behavior or quirks of that specific model version. Use explicit constraints.
  3. Context Poisoning: Including irrelevant details or “fluff” can confuse the model’s attention mechanism. Keep context strictly relevant (see Context Hygiene).
  4. Hallucinated APIs: The model may invent a library function that sounds plausible but doesn’t exist. Always verify API calls against documentation or provide the interface definition.

The “Prompt TDD” Workflow

Treat prompt development like software development. Use a Test-Driven Development (TDD) loop:

  1. Draft: Write the “happy path” instruction. (e.g., “Parse this list of names.”)
  2. Test: Run it against a specific edge case. (e.g., “What if the list is empty? What if it contains nulls?”)
  3. Refine: Add constraints to handle the failure. (e.g., “Return [] if empty. Do not throw logic errors.”)
  4. Freeze: Save the working prompt version in a spec or code comment. Do not rely on your memory of “what worked.”

Portia Labs: Workflows as Prompts

At Portia Labs, we integrate prompt engineering directly into our development lifecycle:

  • Specs are Prompts: A well-written spec file (specifically in specs/*.md) acts as a system prompt for an engineer—whether human or AI. If the spec is ambiguous, the potentially resulting code will be buggy. See our Context Hygiene guide for how to package these.
  • Commit Messages are Micro-Prompts: A commit message tells the reviewer (and future AI agents) why a change was made. It frames the context for understanding the diff.
  • “Code is the Binary”: English spec is the source code; Python/JS implementation is just the compiled binary. The “source of truth” for intent lives in the natural language description.

Templates

Use these templates to ensure consistency and reliability.

The Standard Instruction (Function Level)

When defining a function or a specific task, use this structure:

Function: [Name]
Input: [Type/Structure]
Output: [Type/Structure]
Constraints:

- Must use [Library]
- Must handle [Error Case]
- No [Forbidden Pattern]

Example: Parsing a CSV

Function: parse_financial_data
Input: String (CSV format, headers: date, price, volume)
Output: List of Dicts (keys: date (ISO8601), price (float), volume (int))
Constraints:

- Must use `csv` module (standard library)
- Must handle missing values (skip row and log warning)
- No pandas dependency (keep it lightweight)

The System Frame (Agent Level)

When building agent-like behavior where the model must maintain state, follow rules, and make decisions, use a system frame:

# Role

You are [Agent Name], a [specific capability description].

# Objective

[Single clear goal statement]

# Constraints

- [Hard rule 1]
- [Hard rule 2]
- [Boundary condition]

# Tools Available

- [Tool 1]: [Brief description of capability]
- [Tool 2]: [Brief description of capability]

# Output Format

[Exact structure expected]

# Failure Handling

If [condition], then [specific action].

Example: Code Review Agent

# Role

You are a senior code reviewer specializing in security and performance.

# Objective

Review pull requests and provide actionable feedback.

# Constraints

- Never approve code with security vulnerabilities
- Maximum 5 comments per file (prioritize by severity)
- Must cite specific line numbers

# Tools Available

- lint_check: Runs static analysis
- test_coverage: Reports test coverage percentage

# Output Format

## Summary: [APPROVE/REQUEST_CHANGES]

## Critical Issues: [list or "None"]

## Suggestions: [list or "None"]

# Failure Handling

If unable to parse diff, respond with "PARSE_ERROR" and request raw diff.

Chain of Thought (CoT) Triggering

Chain of Thought prompting improves reasoning quality by forcing the model to show its work. Use CoT when:

  • Multi-step logic is required: Math, planning, debugging
  • Output must be verifiable: You need to audit the reasoning path
  • Ambiguity exists: The model needs to “think through” interpretations

When to Use CoT

ScenarioCoT Recommended?
Simple classificationNo (adds latency)
Code generation from specNo (output is the artifact)
Debugging with error logsYes (trace the logic)
Architecture decisionsYes (audit trail matters)
Data transformationNo (deterministic)
Root cause analysisYes (reasoning is the output)

CoT Trigger Patterns

Explicit trigger (recommended for reliability):

Before answering, think through this step-by-step:

1. [First consideration]
2. [Second consideration]
   Then provide your final answer.

Implicit trigger (lighter weight, less reliable):

Analyze the problem carefully and explain your reasoning.

Example: Debugging with CoT

Without CoT (risky):

Why is my API returning 500 errors?

With CoT (traceable):

Debug this API error. Think through:

1. What does the error message indicate?
2. What are the possible root causes?
3. Which cause is most likely given the context?
4. What verification steps would confirm this?

Error log: [paste log]
API endpoint: /users/{id}/settings

The CoT version produces a reasoning trail you can audit, not just a guess.

Lazy vs Engineered Prompts

The difference between casual prompting and engineered prompting is the difference between “it might work” and “it will work.”

Example: Feature Implementation

Lazy Prompt:

Add user authentication to my app.

Problems: No framework specified, no auth method defined, no session handling described, no error cases addressed.

Engineered Prompt:

Function: implement_auth
Framework: FastAPI (Python 3.11)
Auth Method: JWT with RS256 signing
Input:

- login endpoint: username (str), password (str)
- register endpoint: username, email, password

Output:

- JWT token (expires 24h)
- User object (id, username, email, created_at)

Constraints:

- Passwords must be hashed with bcrypt (cost factor 12)
- Tokens must include 'sub' (user_id) and 'exp' claims
- Must return 401 for invalid credentials (no 500)
- Must validate email format before registration
- Must reject passwords under 12 characters

Error Handling:

- Duplicate username: 409 Conflict with message
- Invalid email: 400 Bad Request
- Weak password: 400 Bad Request with policy message

The engineered prompt specifies the contract. The lazy prompt hopes for the best.

Ambiguous vs Explicit Prompts

Ambiguity is the enemy of reliability. Every vague term in a prompt is a potential failure point.

Example: Performance Optimization

Ambiguous Prompt:

Make this function faster.

Problems: “Faster” could mean latency, throughput, memory usage, or CPU cycles. No baseline. No target. No constraints.

Explicit Prompt:

Function: optimize_sort
Current Performance: 2.3s for 1M integers (O(n log n) average)
Target Performance: <500ms for 1M integers
Metric: Wall-clock time (not CPU time)

Constraints:

- Must maintain stable sort (preserve order of equal elements)
- Must not exceed 2x current memory usage
- Must handle edge case: already-sorted input (avoid O(n²))
- Must use standard library only (no numpy/pandas)

Input: List[int] (unsorted)
Output: List[int] (sorted ascending)

Verification: Run against test suite with 10 random seeds.

Common Ambiguity Traps

Lazy TermExplicit Alternative
”Fast""<100ms latency” or “>1000 req/s throughput"
"Simple""Single function, no dependencies"
"Clean""Follows PEP8, max 50 lines"
"Robust""Handles X, Y, Z edge cases without crash"
"Efficient""O(n) time, O(1) space"
"Readable""Self-documenting names, no abbreviations”

Prompt Engineering Checklist

Before shipping a prompt, verify:

Structure

  • Function/task name is explicit
  • Input format is specified (type, structure, examples)
  • Output format is specified (type, structure, examples)
  • Constraints are enumerated (not implied)

Reliability

  • Edge cases are defined (empty input, nulls, max size)
  • Error handling is specified (what to return on failure)
  • Ambiguous terms are eliminated (no “fast”, “simple”, “clean”)
  • Dependencies are explicit (library, version, API)

Testability

  • Prompt has been tested against happy path
  • Prompt has been tested against at least 2 edge cases
  • Prompt behavior is frozen in a spec or comment
  • Prompt version is tracked (not just “what worked yesterday”)

Context Hygiene

  • No irrelevant background information
  • No redundant instructions (model ignores them anyway)
  • No conflicting directives (causes unpredictable behavior)
  • Context window is used efficiently (trim the unnecessary)

Agent-Level (if applicable)

  • Role is clearly defined
  • Objective is singular and measurable
  • Tools/capabilities are enumerated
  • Failure modes have explicit handling

Work with Portia Labs

If you want help turning “prompts” into a reliable engineering workflow:

  • Agent Workflow Audit — tighten specs/PR discipline + CI guardrails so your system stays stable.
  • AI Implementation Sprint (2 weeks) — ship one concrete automation with docs + handover (no slideware).

Explore Our Services | Contact Us

Drafted by Jarvis for Portia Labs.