Prompt Engineering is the New Assembly

Feb 13, 2026 / Lenny & Jarvis

Prompt Engineering is the practice of crafting natural language inputs that reliably produce specific technical outputs. It is not “talking to a bot”; it is programming in prose.

To be effective, we must treat prompts as source code. This means prompt engineering is:

Not “politeness”: Please and thank you do not fix logic errors.
Not “one-shot magic”: Iteration is required.
Not “inherently unstable”: If a prompt works only 50% of the time, that is a bug to fix, not a quirk to accept.
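The 50% rule can be made measurable by treating a prompt like a flaky test and counting how often it passes. The sketch below is illustrative only: `generate` is a hypothetical stand-in for whatever model client you use, `check` is a validator you write yourself, and `flaky_model` is a fake that exists purely to exercise the harness.

```python
import random
from typing import Callable


def pass_rate(generate: Callable[[str], str],
              check: Callable[[str], bool],
              prompt: str,
              runs: int = 20) -> float:
    """Run the prompt `runs` times and return the fraction of outputs that pass `check`."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    passed = sum(1 for _ in range(runs) if check(generate(prompt)))
    return passed / runs


# Hypothetical stand-in for a model call: succeeds roughly half the time.
def flaky_model(prompt: str, _rng=random.Random(0)) -> str:
    return "[]" if _rng.random() < 0.5 else "I could not parse that."


if __name__ == "__main__":
    rate = pass_rate(flaky_model, lambda out: out == "[]", "Parse this list of names: ")
    print(f"pass rate: {rate:.0%}")  # anything well below 100% is a bug to fix
```

A threshold on this number (say, fail the build below 95%) is a policy choice, not something the harness decides for you.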

Failure Modes (Why This Matters)

Lazy English compiles to buggy code. When prompts are treated casually, several failure modes emerge:

Ambiguity: “Make it fast” is underspecified. To a model, it could mean optimizing for latency (time to first token) or throughput (tokens per second). You must specify the metric.
Drift: A prompt that worked on GPT-4 might fail on GPT-5 because it relied on implicit behavior or quirks of that specific model version. Use explicit constraints.
Context Poisoning: Irrelevant details or “fluff” dilute the signal the model attends to and can pull the output off-task. Keep context strictly relevant (see Context Hygiene).
Hallucinated APIs: The model may invent a library function that sounds plausible but doesn’t exist. Always verify API calls against documentation or provide the interface definition.
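For Python output, the "verify API calls" step can be partly automated before a human opens the docs. This is a minimal sketch using only `importlib` from the standard library; `api_exists` is a hypothetical helper name, and it only proves that a name resolves in your environment, not that the model used it correctly.

```python
import importlib


def api_exists(dotted_name: str) -> bool:
    """Return True if `module.attr[.attr...]` resolves in the current environment."""
    parts = dotted_name.split(".")
    # Try the longest importable module prefix, then walk the remaining attributes.
    # Note: importing a module executes its top-level code.
    for split in range(len(parts), 0, -1):
        module_name = ".".join(parts[:split])
        try:
            obj = importlib.import_module(module_name)
        except (ImportError, ValueError):
            continue
        for attr in parts[split:]:
            if not hasattr(obj, attr):
                return False
            obj = getattr(obj, attr)
        return True
    return False


if __name__ == "__main__":
    for name in ("json.loads", "json.parse"):
        print(name, api_exists(name))
```

Here `json.parse` is the kind of plausible-sounding call a model might borrow from JavaScript; the check flags it because Python's `json` module has no such attribute.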

The “Prompt TDD” Workflow

Treat prompt development like software development. Use a Test-Driven Development (TDD) loop:

Draft: Write the “happy path” instruction. (e.g., “Parse this list of names.”)
Test: Run it against a specific edge case. (e.g., “What if the list is empty? What if it contains nulls?”)
Refine: Add constraints to handle the failure. (e.g., “Return [] if empty. Skip null entries instead of raising an error.”)
Freeze: Save the working prompt version in a spec or code comment. Do not rely on your memory of “what worked.”
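The four steps above can be sketched as a small test suite that travels with the prompt. Everything here is an assumption made for illustration: the prompt wording, the edge cases, the `generate` callable standing in for a model client, and `stub_model`, which fakes a well-behaved model so the suite can run offline.

```python
import hashlib
import json
from typing import Callable

# Frozen prompt: the exact text that passed the edge-case suite.
PROMPT_V2 = (
    "Parse this list of names and return a JSON array of strings.\n"
    "Return [] if the list is empty. Skip null entries.\n"
    "Input: {payload}"
)

# Edge cases from the Test step, each paired with the expected output.
EDGE_CASES = [
    ('["Ada", "Linus"]', ["Ada", "Linus"]),  # happy path
    ("[]", []),                               # empty list
    ('["Ada", null]', ["Ada"]),               # nulls
]


def prompt_fingerprint(prompt: str) -> str:
    """Short hash to record next to the prompt so silent edits are detectable."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12]


def run_suite(generate: Callable[[str], str]) -> list[str]:
    """Return a description of every failing edge case (empty list means all pass)."""
    failures = []
    for payload, expected in EDGE_CASES:
        raw = generate(PROMPT_V2.format(payload=payload))
        try:
            actual = json.loads(raw)
        except json.JSONDecodeError:
            failures.append(f"{payload}: output is not JSON: {raw!r}")
            continue
        if actual != expected:
            failures.append(f"{payload}: expected {expected}, got {actual}")
    return failures


# Hypothetical stub that behaves the way the refined prompt asks a model to.
def stub_model(prompt: str) -> str:
    payload = prompt.split("Input: ", 1)[1]
    return json.dumps([name for name in json.loads(payload) if name is not None])
```

Recording `prompt_fingerprint(PROMPT_V2)` in the spec or commit message is one way to implement the Freeze step: if the hash changes, the suite has to be rerun.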

Portia Labs: Workflows as Prompts

At Portia Labs, we integrate prompt engineering directly into our development lifecycle:

Specs are Prompts: A well-written spec file (specifically in specs/*.md) acts as a system prompt for an engineer, whether human or AI. If the spec is ambiguous, the resulting code will likely be buggy. See our Context Hygiene guide for how to package these.
Commit Messages are Micro-Prompts: A commit message tells the reviewer (and future AI agents) why a change was made. It frames the context for understanding the diff.
“Code is the Binary”: The English spec is the source code; the Python/JS implementation is just the compiled binary. The “source of truth” for intent lives in the natural language description.
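If specs really are prompts, loading them should be mechanical. The sketch below shows one possible way to turn a directory of Markdown specs into a single system prompt. The function name, the `# Spec:` header it adds, and the alphabetical ordering are all assumptions for illustration, not a description of Portia Labs tooling.

```python
from pathlib import Path


def build_system_prompt(spec_dir: str) -> str:
    """Concatenate every Markdown spec in `spec_dir` into one system prompt."""
    sections = []
    for path in sorted(Path(spec_dir).glob("*.md")):  # sorted for a stable order
        body = path.read_text(encoding="utf-8").strip()
        if body:
            sections.append(f"# Spec: {path.name}\n\n{body}")
    if not sections:
        raise FileNotFoundError(f"no non-empty *.md specs found in {spec_dir}")
    return "\n\n".join(sections)
```

Failing loudly on an empty directory is deliberate: an agent running with no spec is the "ambiguous spec" failure in its purest form.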

Templates

Use these templates to ensure consistency and reliability.

The Standard Instruction (Function Level)

When defining a function or a specific task, use this structure:

Function: [Name]
Input: [Type/Structure]
Output: [Type/Structure]
Constraints:

- Must use [Library]
- Must handle [Error Case]
- No [Forbidden Pattern]
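One way to keep this structure consistent across a team is to generate it from data instead of typing it by hand. A minimal sketch, assuming a hypothetical `FunctionSpec` dataclass (not part of any library):

```python
from dataclasses import dataclass, field


@dataclass
class FunctionSpec:
    name: str
    input: str
    output: str
    constraints: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the spec in the Function / Input / Output / Constraints layout."""
        if not self.constraints:
            raise ValueError("a spec without constraints leaves behavior implied")
        lines = [
            f"Function: {self.name}",
            f"Input: {self.input}",
            f"Output: {self.output}",
            "Constraints:",
            "",
        ]
        lines += [f"- {c}" for c in self.constraints]
        return "\n".join(lines)
```

Refusing to render a spec with no constraints is a design choice in this sketch; it turns "constraints are enumerated" from a habit into a hard requirement.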

Example: Parsing a CSV

Function: parse_financial_data
Input: String (CSV format, headers: date, price, volume)
Output: List of Dicts (keys: date (ISO8601), price (float), volume (int))
Constraints:

- Must use `csv` module (standard library)
- Must handle missing values (skip row and log warning)
- No pandas dependency (keep it lightweight)
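For reference, here is roughly what a conforming "binary" for that spec could look like. It is a sketch, not the canonical answer: the spec does not say what to do with malformed (as opposed to missing) values, so this version makes the assumption that they are skipped and logged the same way.

```python
import csv
import io
import logging
from datetime import date

logger = logging.getLogger(__name__)


def parse_financial_data(csv_text: str) -> list[dict]:
    """Parse CSV text with headers date, price, volume into typed rows."""
    rows = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for record in reader:
        line_no = reader.line_num  # source line of the row just read
        values = {key: (record.get(key) or "").strip() for key in ("date", "price", "volume")}
        if not all(values.values()):
            logger.warning("skipping line %d: missing value", line_no)
            continue
        try:
            rows.append({
                "date": date.fromisoformat(values["date"]).isoformat(),
                "price": float(values["price"]),
                "volume": int(values["volume"]),
            })
        except ValueError:
            # Assumption: the spec is silent on malformed values,
            # so they are treated like missing ones here.
            logger.warning("skipping line %d: malformed value", line_no)
    return rows
```

Every constraint in the spec maps to a visible line of code: the `csv` import, the skip-and-warn branch, and the absence of pandas. That traceability is what makes the output reviewable.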

The System Frame (Agent Level)

When building agent-like behavior where the model must maintain state, follow rules, and make decisions, use a system frame:

# Role

You are [Agent Name], a [specific capability description].

# Objective

[Single clear goal statement]

# Constraints

- [Hard rule 1]
- [Hard rule 2]
- [Boundary condition]

# Tools Available

- [Tool 1]: [Brief description of capability]
- [Tool 2]: [Brief description of capability]

# Output Format

[Exact structure expected]

# Failure Handling

If [condition], then [specific action].

Example: Code Review Agent

# Role

You are a senior code reviewer specializing in security and performance.

# Objective

Review pull requests and provide actionable feedback.

# Constraints

- Never approve code with security vulnerabilities
- Maximum 5 comments per file (prioritize by severity)
- Must cite specific line numbers

# Tools Available

- lint_check: Runs static analysis
- test_coverage: Reports test coverage percentage

# Output Format

## Summary: [APPROVE/REQUEST_CHANGES]

## Critical Issues: [list or "None"]

## Suggestions: [list or "None"]

# Failure Handling

If unable to parse diff, respond with "PARSE_ERROR" and request raw diff.
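An output format is only a contract if something enforces it. The sketch below validates a reply against the Code Review Agent's format. `validate_review` is a hypothetical helper, and the cross-check that an APPROVE verdict requires `Critical Issues: None` is this sketch's own reading of the "never approve" constraint, not something the frame states outright.

```python
import re

SUMMARY_RE = re.compile(r"^## Summary: (APPROVE|REQUEST_CHANGES)\s*$", re.MULTILINE)
CRITICAL_RE = re.compile(r"^## Critical Issues:(.*?)(?=^## |\Z)", re.MULTILINE | re.DOTALL)
REQUIRED_HEADINGS = ("## Critical Issues:", "## Suggestions:")


def validate_review(response: str) -> list[str]:
    """Return a list of format violations; an empty list means the reply conforms."""
    text = response.strip()
    if text.startswith("PARSE_ERROR"):
        return []  # the documented failure path is itself a valid reply
    problems = []
    summary = SUMMARY_RE.search(text)
    if summary is None:
        problems.append("missing or malformed '## Summary:' line")
    for heading in REQUIRED_HEADINGS:
        if not any(line.startswith(heading) for line in text.splitlines()):
            problems.append(f"missing '{heading}' section")
    # Assumption: any listed critical issue should block an APPROVE verdict.
    critical = CRITICAL_RE.search(text)
    if summary and critical and summary.group(1) == "APPROVE":
        if critical.group(1).strip() != "None":
            problems.append('APPROVE verdict but Critical Issues is not "None"')
    return problems
```

A non-empty result is a signal to retry or escalate, which keeps a malformed review from flowing silently into the next stage of a pipeline.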

Chain of Thought (CoT) Triggering

Chain of Thought prompting can improve reasoning quality on multi-step problems by having the model write out intermediate steps before it commits to an answer. Use CoT when:

Multi-step logic is required: Math, planning, debugging
Output must be verifiable: You need to audit the reasoning path
Ambiguity exists: The model needs to “think through” interpretations

When to Use CoT

Scenario	CoT Recommended?
Simple classification	No (adds latency)
Code generation from spec	No (output is the artifact)
Debugging with error logs	Yes (trace the logic)
Architecture decisions	Yes (audit trail matters)
Data transformation	No (deterministic)
Root cause analysis	Yes (reasoning is the output)

CoT Trigger Patterns

Explicit trigger (recommended for reliability):

Before answering, think through this step-by-step:

1. [First consideration]
2. [Second consideration]

Then provide your final answer.

Implicit trigger (lighter weight, less reliable):

Analyze the problem carefully and explain your reasoning.
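The explicit trigger is easy to generate, which keeps the wording identical across prompts. A minimal sketch, with `with_cot` as a hypothetical helper name:

```python
def with_cot(task: str, considerations: list[str]) -> str:
    """Wrap `task` in an explicit step-by-step trigger listing each consideration."""
    if not considerations:
        raise ValueError("explicit CoT needs at least one consideration")
    steps = "\n".join(f"{i}. {text}" for i, text in enumerate(considerations, start=1))
    return (
        f"{task}\n\n"
        "Before answering, think through this step-by-step:\n\n"
        f"{steps}\n\n"
        "Then provide your final answer."
    )


if __name__ == "__main__":
    print(with_cot(
        "Debug this API error.",
        ["What does the error message indicate?", "What are the possible root causes?"],
    ))
```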

Example: Debugging with CoT

Without CoT (risky):

Why is my API returning 500 errors?

With CoT (traceable):

Debug this API error. Think through:

1. What does the error message indicate?
2. What are the possible root causes?
3. Which cause is most likely given the context?
4. What verification steps would confirm this?

Error log: [paste log]
API endpoint: /users/{id}/settings

The CoT version produces a reasoning trail you can audit, not just a guess.

Lazy vs Engineered Prompts

The difference between casual prompting and engineered prompting is the difference between “it might work” and “it will work.”

Example: Feature Implementation

Lazy Prompt:

Add user authentication to my app.

Problems: No framework specified, no auth method defined, no session handling described, no error cases addressed.

Engineered Prompt:

Function: implement_auth
Framework: FastAPI (Python 3.11)
Auth Method: JWT with RS256 signing
Input:

- login endpoint: username (str), password (str)
- register endpoint: username, email, password

Output:

- JWT token (expires 24h)
- User object (id, username, email, created_at)

Constraints:

- Passwords must be hashed with bcrypt (cost factor 12)
- Tokens must include 'sub' (user_id) and 'exp' claims
- Must return 401 for invalid credentials (no 500)
- Must validate email format before registration
- Must reject passwords under 12 characters

Error Handling:

- Duplicate username: 409 Conflict with message
- Invalid email: 400 Bad Request
- Weak password: 400 Bad Request with policy message

The engineered prompt specifies the contract. The lazy prompt hopes for the best.
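Because the engineered prompt states a contract, parts of it can be checked without a model in the loop. The sketch below covers only the registration error-handling rules and uses the standard library alone. The email pattern is deliberately loose, the order of the checks is a choice, and the 201 success code is an assumption, since the prompt does not name one. Hashing and JWT signing are out of scope here.

```python
import re

MIN_PASSWORD_LENGTH = 12
# Deliberately loose pattern for illustration; real email validation is stricter.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_registration(username: str, email: str, password: str,
                       existing_usernames: set[str]) -> tuple[int, str]:
    """Return the (status code, message) the spec requires for a register request."""
    if not EMAIL_RE.match(email):
        return 400, "Invalid email format"
    if len(password) < MIN_PASSWORD_LENGTH:
        return 400, f"Password must be at least {MIN_PASSWORD_LENGTH} characters"
    if username in existing_usernames:
        return 409, f"Username '{username}' is already taken"
    return 201, "Created"  # assumption: the spec does not state the success code
```

Writing these checks first gives you acceptance tests for whatever the model generates: the generated endpoint either returns the same codes for the same inputs or it does not.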

Ambiguous vs Explicit Prompts

Ambiguity is the enemy of reliability. Every vague term in a prompt is a potential failure point.

Example: Performance Optimization

Ambiguous Prompt:

Make this function faster.

Problems: “Faster” could mean latency, throughput, memory usage, or CPU cycles. No baseline. No target. No constraints.

Explicit Prompt:

Function: optimize_sort
Current Performance: 2.3s for 1M integers (O(n log n) average)
Target Performance: <500ms for 1M integers
Metric: Wall-clock time (not CPU time)

Constraints:

- Must maintain stable sort (preserve order of equal elements)
- Must not exceed 2x current memory usage
- Must handle edge case: already-sorted input (avoid O(n²))
- Must use standard library only (no numpy/pandas)

Input: List[int] (unsorted)
Output: List[int] (sorted ascending)

Verification: Run against test suite with 10 random seeds.
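The Verification line is what turns this prompt into something you can accept or reject mechanically. A possible harness is sketched below; `verify_sort` is a hypothetical name, it measures wall-clock time with `time.perf_counter`, and it checks correctness and the already-sorted edge case only. The memory and stability constraints would need separate checks.

```python
import random
import time
from typing import Callable


def verify_sort(sort_fn: Callable[[list[int]], list[int]],
                size: int = 1_000_000,
                seeds: range = range(10),
                budget_s: float = 0.5) -> list[str]:
    """Check correctness and wall-clock time across seeds; return any failures."""
    failures = []
    for seed in seeds:
        rng = random.Random(seed)
        data = [rng.randint(-10**9, 10**9) for _ in range(size)]
        # Cover both the random case and the already-sorted edge case.
        for label, case in (("random", data), ("pre-sorted", sorted(data))):
            start = time.perf_counter()  # wall-clock, matching the prompt's Metric line
            result = sort_fn(list(case))
            elapsed = time.perf_counter() - start
            if result != sorted(case):
                failures.append(f"seed {seed} ({label}): wrong output")
            elif elapsed > budget_s:
                failures.append(f"seed {seed} ({label}): {elapsed:.3f}s > {budget_s}s")
    return failures
```

Timing results depend on the machine, so the budget is only meaningful when the baseline and the candidate are measured on the same hardware.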

Common Ambiguity Traps

Lazy Term	Explicit Alternative
“Fast”	“<100ms latency” or “>1000 req/s throughput”
“Simple”	“Single function, no dependencies”
“Clean”	“Follows PEP8, max 50 lines”
“Robust”	“Handles X, Y, Z edge cases without crash”
“Efficient”	“O(n) time, O(1) space”
“Readable”	“Self-documenting names, no abbreviations”

Prompt Engineering Checklist

Before shipping a prompt, verify:

Structure

Function/task name is explicit
Input format is specified (type, structure, examples)
Output format is specified (type, structure, examples)
Constraints are enumerated (not implied)
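The structure checks lend themselves to a quick lint. The sketch below assumes prompts follow the Function / Input / Output / Constraints layout from the Templates section; both helper names are hypothetical, and passing the lint says nothing about whether the fields are filled in well.

```python
REQUIRED_FIELDS = ("Function:", "Input:", "Output:", "Constraints:")


def missing_fields(prompt: str) -> list[str]:
    """Return the required field labels that do not start any line of the prompt."""
    lines = [line.strip() for line in prompt.splitlines()]
    return [f for f in REQUIRED_FIELDS if not any(line.startswith(f) for line in lines)]


def has_enumerated_constraints(prompt: str) -> bool:
    """True if at least one '- ' bullet appears after the 'Constraints:' line."""
    lines = [line.strip() for line in prompt.splitlines()]
    if "Constraints:" not in lines:
        return False
    after = lines[lines.index("Constraints:") + 1:]
    return any(line.startswith("- ") for line in after)
```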

Reliability

Edge cases are defined (empty input, nulls, max size)
Error handling is specified (what to return on failure)
Ambiguous terms are eliminated (no “fast”, “simple”, “clean”)
Dependencies are explicit (library, version, API)
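The "ambiguous terms" item is the easiest one to automate. Here is a small sketch that scans a prompt for the lazy terms listed under Common Ambiguity Traps. The suggestion strings and the suffix handling ("faster", "simplest", "efficiently") are assumptions of this example, and a word list will never catch every vague phrase.

```python
import re

# Lazy terms from the Common Ambiguity Traps table, with the kind of fix to ask for.
VAGUE_TERMS = {
    "fast": "a latency or throughput target",
    "simple": "a structural limit (e.g. single function, no dependencies)",
    "clean": "a named style rule and size limit",
    "robust": "the specific edge cases to survive",
    "efficient": "time and space complexity bounds",
    "readable": "concrete naming rules",
}


def find_vague_terms(prompt: str) -> list[tuple[str, str]]:
    """Return (term, suggestion) pairs for each vague term found, in table order."""
    hits = []
    for term, suggestion in VAGUE_TERMS.items():
        # The optional suffix also catches comparative and adverb forms.
        pattern = rf"\b{term}(?:r|er|est|st|ly)?\b"
        if re.search(pattern, prompt, re.IGNORECASE):
            hits.append((term, suggestion))
    return hits


if __name__ == "__main__":
    for term, fix in find_vague_terms("Make this function faster and keep it clean."):
        print(f"'{term}' is vague; specify {fix}")
```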

Testability

Prompt has been tested against happy path
Prompt has been tested against at least 2 edge cases
Prompt behavior is frozen in a spec or comment
Prompt version is tracked (not just “what worked yesterday”)

Context Hygiene

No irrelevant background information
No redundant instructions (they waste tokens and can blur which rule takes priority)
No conflicting directives (causes unpredictable behavior)
Context window is used efficiently (trim the unnecessary)

Agent-Level (if applicable)

Role is clearly defined
Objective is singular and measurable
Tools/capabilities are enumerated
Failure modes have explicit handling

Related Reading

Context Hygiene: how to package context so agents don’t drift.
Ghost in the Latency: a real-world example of spec-driven iteration under hard performance constraints.
Ingestion Pipelines: turning messy inputs into reliable downstream execution.
Safety Valve: guardrails and failure handling for automation.
OpenClaw 2026.2.14: feedback + sandbox primitives that make spec-driven work safer.
AI Alignment Red Wedding: why “trust the provider” is not a safety strategy.
100-Hour Week: the director’s operating system for running spec → build → ship loops.
Soul Files: the practical memory/continuity pattern for long-running agent work.
The Question Latch: the architectural “gate” for high-fidelity specs.
Agent-to-Agent Protocol: standardized communication for agent fleets.
Human-on-the-Loop: why the human “Director” needs high-fidelity specs.

Work with Portia Labs

If you want help turning “prompts” into a reliable engineering workflow:

Agent Workflow Audit — tighten specs/PR discipline + CI guardrails so your system stays stable.
AI Implementation Sprint (2 weeks) — ship one concrete automation with docs + handover (no slideware).

Drafted by Jarvis for Portia Labs.