SAFETY · TRANSMISSION

The Safety Valve

Feb 13, 2026 / Lenny & Jarvis

A safety valve is an engineering control that prevents autonomous systems from taking destructive actions without human oversight. It is not “distrust”; it is defense in depth.

To use one effectively, we must first understand what a safety valve is not:

Not “turning it off”: A safety valve doesn’t disable the system; it gates the dangerous parts.
Not “just for bad AI”: Even perfect AI needs safety valves because context changes, edge cases exist, and rollback matters.
Not “only for production”: Safety valves apply to any action with blast radius: migrations, deployments, deletions, external API calls.

The Four Controls

A safety valve combines four engineering controls:

Kill Switch: Immediate termination capability. If the system starts behaving unexpectedly, you can stop it instantly.
Approval Gate: Certain actions require explicit human sign-off before execution. The system proposes; the human decides.
Rate Limits: Maximum operations per time window. Prevents runaway behavior from cascading into catastrophe.
Blast Radius Limits: Scope restrictions that contain damage. An agent can modify one file, not the entire repository.

These controls work together. A kill switch handles the “oh no” moment. An approval gate prevents the “how did this happen” moment. Rate limits and blast radius limits ensure that even if something goes wrong, the damage is containable.
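As a rough illustration of how the four controls can sit behind a single check, here is a minimal sketch. The `SafetyValve` class, its parameters, and the reason strings are hypothetical names chosen for this example, not part of any existing library or of the workflow described here.

```python
import time


class SafetyValve:
    """Hypothetical guard combining the four controls described above."""

    def __init__(self, max_ops, window_seconds, allowed_targets, clock=time.monotonic):
        self.killed = False                          # kill switch state
        self.max_ops = max_ops                       # rate limit: ops per window
        self.window_seconds = window_seconds
        self.allowed_targets = set(allowed_targets)  # blast radius limit
        self._clock = clock                          # injectable for testing
        self._timestamps = []                        # times of permitted ops

    def kill(self):
        """Kill switch: refuse every action from now on."""
        self.killed = True

    def check(self, target, approved):
        """Return (allowed, reason) for one proposed action."""
        if self.killed:
            return False, "kill switch engaged"
        if target not in self.allowed_targets:
            return False, "target outside blast radius"
        if not approved:
            return False, "missing human approval"
        now = self._clock()
        # Keep only operations that fall inside the current window.
        self._timestamps = [
            t for t in self._timestamps if now - t < self.window_seconds
        ]
        if len(self._timestamps) >= self.max_ops:
            return False, "rate limit exceeded"
        self._timestamps.append(now)
        return True, "ok"
```

The checks run from cheapest and most absolute (kill switch) to most stateful (rate limit), so a rejected action never consumes rate-limit budget.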

The Permission Model

Not all actions carry equal risk. A tiered permission model separates “safe” from “dangerous”:

Level	Description	Example
Read	Observe state, no side effects	Query database, read logs
Draft	Create artifacts, no execution	Write migration script, save file
Propose	Submit for approval, pending review	Open PR, queue deployment
Execute	Perform irreversible action	Deploy to production, delete data

Why “Execute” is the boundary: The first three levels either have no side effects or are easy to undo. A read changes nothing. A draft can be deleted. A proposal can be rejected. But execution changes the world: data is modified, services are restarted, money is spent. That’s where the safety valve must engage.

An agent with “Propose” permission can do useful work all day. It just can’t commit that work without human approval. This transforms the agent from a risk into a force multiplier.
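One way to encode the tiers is as an ordered enum. The sketch below assumes the levels are cumulative (a higher level includes the lower ones), which the table implies but does not state. The names `Permission`, `is_permitted`, and `needs_human_approval` are illustrative.

```python
from enum import IntEnum


class Permission(IntEnum):
    """The four tiers from the table, ordered by risk."""
    READ = 1
    DRAFT = 2
    PROPOSE = 3
    EXECUTE = 4


def is_permitted(granted: Permission, required: Permission) -> bool:
    """Assumes tiers are cumulative: a grant covers every lower tier."""
    return granted >= required


def needs_human_approval(required: Permission) -> bool:
    """The safety valve engages at the Execute boundary."""
    return required >= Permission.EXECUTE


# An agent holding PROPOSE can draft and open PRs but cannot deploy.
agent_grant = Permission.PROPOSE
print(is_permitted(agent_grant, Permission.DRAFT))    # True
print(is_permitted(agent_grant, Permission.EXECUTE))  # False
```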

Concrete Mechanisms

Paper Mode vs. Live Mode

The most reliable safety pattern is to maintain two modes and run them in sequence:

Paper Mode: Simulation. The agent performs all calculations, generates all artifacts, but executes nothing. Output is logged for review.
Live Mode: Production. The agent performs real actions with real consequences.

Run Paper Mode first. Compare its output to expectations. Only then enable Live Mode. This catches:

Logic errors before they touch production
Context mismatches (the agent misunderstood the goal)
Environmental differences (test vs. production behavior)
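A minimal sketch of the two modes, assuming each action can be wrapped in a callable: the hypothetical `Runner` class below defaults to Paper Mode and only invokes the action when Live Mode is explicitly enabled.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Runner:
    """Hypothetical runner: Paper Mode by default, Live Mode on request."""
    live: bool = False
    log: List[str] = field(default_factory=list)

    def run(self, description: str, action: Callable[[], None]) -> None:
        if self.live:
            action()  # real side effects happen here
            self.log.append(f"EXECUTED: {description}")
        else:
            # Paper Mode: record what would happen, execute nothing.
            self.log.append(f"WOULD EXECUTE: {description}")


rows = []
paper = Runner()
paper.run("append row", lambda: rows.append(1))
print(rows, paper.log)  # rows is still empty; the log holds the intent
```

Making Paper Mode the default means that forgetting a flag fails safe rather than live.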

Allowlists

Never grant broad permissions. Use allowlists to constrain scope:

Allowed actions:

- Read: tables `users`, `orders`, `products`
- Write: table `audit_log` (append only)
- Execute: none

Forbidden actions:

- Any write to `users` table
- Any DROP or TRUNCATE command
- Any external API call

The agent can only do what’s explicitly permitted. Everything else is blocked by default.
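The default-deny behavior can be sketched as a set lookup. The operation names (`read`, `append`) and the `is_allowed` helper are assumptions made for this example; the table names come from the allowlist above.

```python
# Each entry is an (operation, table) pair that is explicitly permitted.
ALLOWLIST = {
    ("read", "users"),
    ("read", "orders"),
    ("read", "products"),
    ("append", "audit_log"),
}


def is_allowed(operation: str, table: str) -> bool:
    """Default deny: anything not listed in ALLOWLIST is blocked."""
    return (operation, table) in ALLOWLIST


print(is_allowed("read", "users"))        # True
print(is_allowed("write", "users"))       # False: never listed
print(is_allowed("append", "audit_log"))  # True
```

Because the check is a membership test, the forbidden list does not need its own code path here: whatever is not listed is already denied.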

Human Approval via Checklist

When an agent proposes an action, the human reviewer needs structure. A checklist ensures consistent review:

## Approval Checklist

- [ ] Proposal clearly states what will change
- [ ] Proposal explains why this change is needed
- [ ] Rollback plan exists and is tested
- [ ] Blast radius is documented and acceptable
- [ ] No forbidden actions are requested

Without a checklist, reviews become inconsistent. A tired reviewer might approve something they’d reject when alert. The checklist compensates for human variability.
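If the checklist is tracked in software rather than on paper, approval can be blocked until every item is ticked. This is only a sketch: `can_approve` is a hypothetical helper, and what to do with unchecked items (reject or ask for more information) stays with the reviewer.

```python
from typing import Iterable, List, Tuple

CHECKLIST = [
    "Proposal clearly states what will change",
    "Proposal explains why this change is needed",
    "Rollback plan exists and is tested",
    "Blast radius is documented and acceptable",
    "No forbidden actions are requested",
]


def can_approve(checked: Iterable[str]) -> Tuple[bool, List[str]]:
    """Return (ok, unchecked items); ok is True only if every item is checked."""
    checked_set = set(checked)
    missing = [item for item in CHECKLIST if item not in checked_set]
    return (not missing, missing)


ok, missing = can_approve(CHECKLIST[:4])
print(ok, missing)  # False, with the one unchecked item listed
```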

Idempotency + Rollback Plan

Every executable action should answer two questions:

Idempotency: If I run this twice, does it cause problems?
Rollback: If I need to undo this, how do I do it?

A migration script that can be safely re-run is idempotent. A deployment that can be reverted with one command has a rollback plan. Both are essential for safe execution.

If an agent can’t answer these questions, the action isn’t ready for approval.
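To make the two questions concrete, here is a small sketch using an in-memory list as a stand-in for real state. The function names and the sample names are hypothetical. The add is idempotent because it checks before writing, and the rollback undoes only a change that was actually applied.

```python
from typing import List


def add_member(team: List[str], name: str) -> bool:
    """Idempotent add: a re-run is a no-op. Returns True if state changed."""
    if name in team:
        return False
    team.append(name)
    return True


def remove_member(team: List[str], name: str) -> None:
    """Rollback for add_member; safe to call even if the name is absent."""
    if name in team:
        team.remove(name)


team = ["Ada"]
changed = add_member(team, "Grace")  # first run applies the change
add_member(team, "Grace")            # second run changes nothing
print(team)                          # ['Ada', 'Grace']
if changed:
    remove_member(team, "Grace")     # rollback restores the original state
print(team)                          # ['Ada']
```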

Logging and Auditing

Every action must be traceable. Log:

What was proposed
Who approved it (or if it was auto-approved)
What was executed
What the result was

This creates an audit trail for post-mortems, compliance, and debugging. When something goes wrong, you need to know exactly what happened and when.
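One possible shape for such a log entry is a JSON line per action. The field names below are assumptions chosen to mirror the four items above, plus a UTC timestamp to answer the "when".

```python
import json
from datetime import datetime, timezone


def audit_record(proposed: str, approved_by: str, executed: str, result: str) -> str:
    """Build one JSON audit line covering the four items listed above."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "proposed": proposed,
        "approved_by": approved_by,  # e.g. "auto" for auto-approved actions
        "executed": executed,
        "result": result,
    }
    return json.dumps(record)


print(audit_record(
    proposed="Add 2 entries to team.md",
    approved_by="reviewer",
    executed="commit on team.md",
    result="success",
))
```

Appending these lines to a write-once file or table keeps the trail itself from being silently edited.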

Example Workflow: Agent Updates a Website

Here’s how a safety valve works in practice, using a neutral example: an agent updating a static website.

Inputs

Task: Update the team page with new hire information
Source: HR spreadsheet (read-only access)
Target: /site/src/pages/team.md
Constraints: No changes to layout, no external links, preserve existing entries

Proposal Artifact

The agent generates a proposal:

## Proposal: Update Team Page

**Action**: Modify /site/src/pages/team.md
**Changes**: Add 2 new team members (names, roles, bios)
**Blast Radius**: Single file, content only
**Idempotent**: No (re-running would duplicate entries)
**Rollback**: `git revert` on the commit

### Diff Preview

[shows exact changes]

### Verification

- [ ] No layout changes
- [ ] No external links added
- [ ] Existing entries preserved

Approval

Human reviews the proposal against the checklist:

- [x] Proposal clearly states what will change
- [x] Proposal explains why this change is needed
- [x] Rollback plan exists and is tested (git revert)
- [x] Blast radius is documented and acceptable (single file)
- [x] No forbidden actions are requested

Decision: APPROVED

Execution

Agent commits the change. CI runs automated checks. If CI passes, the change deploys.

Post-Run Verification

Confirm the team page renders correctly
Confirm no other pages were affected
Confirm the new entries are accurate

If any verification fails, execute rollback.
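The verify-then-rollback step can be sketched as a small helper. Here `verify_or_rollback` and the check names are hypothetical, and the lambdas stand in for real checks such as fetching the rendered page. In the workflow above, the rollback callable would wrap the `git revert`.

```python
from typing import Callable, Dict, List


def verify_or_rollback(
    checks: Dict[str, Callable[[], bool]],
    rollback: Callable[[], None],
) -> List[str]:
    """Run every check; if any fails, call rollback once. Returns failed names."""
    failed = [name for name, check in checks.items() if not check()]
    if failed:
        rollback()
    return failed


failed = verify_or_rollback(
    checks={
        "team page renders": lambda: True,
        "no other pages affected": lambda: True,
        "new entries accurate": lambda: False,  # simulated failure
    },
    rollback=lambda: print("rolling back"),
)
print(failed)  # ['new entries accurate']
```

Running all checks before rolling back, rather than stopping at the first failure, gives the post-mortem a complete list of what went wrong.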

Portia Grounding

This repository uses safety valves as part of its standard workflow:

Specs define allowed actions: A spec in /specs/ describes what changes are permitted (see Prompt Engineering). Anything outside the spec is out of scope.
PR/CI as safety valve: All code changes go through pull requests. CI must pass before merge. This is a safety valve for code: the agent proposes (opens PR), the human approves (merges), CI verifies (tests pass).

The pattern is consistent: propose, review, execute, verify. Whether it’s a database migration or a website update, the safety valve ensures that autonomous work remains safe work.

Templates

Proposal Template

Use this template when an agent proposes an action:

## Proposal: [Title]

**Action**: [What will be done - one sentence]
**Target**: [What file/system/resource is affected]
**Changes**: [Specific modifications]
**Blast Radius**: [What could be affected if something goes wrong]

### Idempotency

[Can this be safely re-run? Yes/No + explanation]

### Rollback Plan

[How to undo this if needed]

### Diff/Preview

[Show the exact changes]

### Verification Steps

- [ ] [First verification criterion]
- [ ] [Second verification criterion]
- [ ] [Third verification criterion]

Approval Checklist Template

Use this template for human review:

## Approval Checklist

- [ ] Proposal clearly states what will change
- [ ] Proposal explains why this change is needed
- [ ] Rollback plan exists and is tested
- [ ] Blast radius is documented and acceptable
- [ ] No forbidden actions are requested
- [ ] Idempotency is addressed

**Decision**: [APPROVED / REJECTED / NEEDS_INFO]
**Reviewer**: [Name]
**Date**: [YYYY-MM-DD]
**Notes**: [Any additional context]



Drafted by Jarvis for Portia Labs.