# THROTTLE.md — AI Agent Rate Control Protocol

**Home:** https://throttle.md

**Repository:** https://github.com/Throttle-md/spec

**Related Domains:** https://escalate.md, https://failsafe.md, https://killswitch.md, https://terminate.md, https://encrypt.md

---

## What is THROTTLE.md?

THROTTLE.md is a plain-text Markdown file convention for defining rate limits and cost controls in AI agent projects. It enables proactive resource management — agents slow down automatically before they hit hard limits or consume entire budgets.

### Key Facts

- **Plain-text file** — Version-controlled, auditable, co-located with code
- **Declarative** — You define the policy; the agent implementation enforces it
- **Framework-agnostic** — Works with LangChain, AutoGen, CrewAI, Claude Code, or custom agents
- **Graduated control** — First layer of a six-part AI safety escalation stack
- **Regulatory alignment** — Meets EU AI Act transparency requirements and enterprise governance frameworks

---

## How It Works

### The Rate Control Hierarchy

THROTTLE.md defines three levels of intervention:

1. **Warning Threshold (80%)**
   The agent logs the event and reduces its rate by 25%, but continues operation. The operator is notified but does not need to act immediately.

2. **Throttle Threshold (95%)**
   The agent cuts its rate by 50% and actively notifies the operator. Manual intervention may be required if usage remains at the throttle level.

3. **Limit Breach (100%)**
   The agent pauses all new tasks and escalates to ESCALATE.md for human approval. No further work proceeds until human intervention or timeout-based fallback.
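The three-level hierarchy above reduces to a simple threshold check. A minimal sketch in Python (the function name, return shape, and constants are ours for illustration, not part of the spec):

```python
# Sketch of the three-level intervention hierarchy described above.
# Thresholds mirror the spec's defaults (warn at 80%, throttle at 95%).

WARNING_THRESHOLD = 0.80
THROTTLE_THRESHOLD = 0.95

def intervention(usage: float, limit: float) -> dict:
    """Map current usage against a configured limit to an intervention level."""
    ratio = usage / limit
    if ratio >= 1.0:
        # Limit breach: pause all new tasks and hand off to ESCALATE.md
        return {"action": "pause", "escalate_to": "ESCALATE.md"}
    if ratio >= THROTTLE_THRESHOLD:
        # Throttle: cut the rate by 50% and notify the operator
        return {"action": "slow_and_notify", "reduce_rate_by": 0.50}
    if ratio >= WARNING_THRESHOLD:
        # Warning: log the event, reduce the rate by 25%, keep working
        return {"action": "log_and_continue", "reduce_rate_by": 0.25}
    return {"action": "continue"}
```

An agent would call a check like this once per monitoring interval, for each tracked resource, and apply the returned action before dispatching new work.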
### Resource Types Controlled

THROTTLE.md lets you set ceilings on six distinct resource types:

- **tokens_per_minute** — Token throughput ceiling for LLM work
- **api_calls_per_minute** — API request rate limit (prevents rate-limit hammering)
- **concurrent_tasks** — Maximum parallel operations (prevents connection pool exhaustion)
- **cost_per_hour_usd** — Hourly spend ceiling
- **cost_per_day_usd** — Daily budget limit
- **file_writes_per_minute** — File system operation limit (prevents I/O overload)

You can tune each limit independently per project, or disable a specific control by setting it to zero.

### Queue Behaviour

When the queue is enabled (the default):

- Tasks are buffered instead of discarded — no work is lost
- Priority tasks (human responses, safety checks) bypass queue restrictions entirely
- Rate limits are respected while work integrity is preserved
- The maximum queue size is configurable (default: 50 items)
- Tasks older than the configured timeout are dropped and logged

---

## Why THROTTLE.md?

### The Problem It Solves

AI agents autonomously consume tokens, make API calls, write files, and incur costs at rates determined entirely by the underlying model and tools. Without explicit controls:

- A $50 daily budget can be exhausted in minutes
- Rate-limited APIs block agents that exceed external limits without warning
- Database systems are overwhelmed by uncontrolled concurrent writes
- Cost overruns accumulate silently before anyone notices
- Compliance audits find no evidence of resource governance
- External rate limiting cuts off your agent abruptly, with no graceful degradation

### How THROTTLE.md Fixes It

1. **Proactive Control** — The agent self-regulates before hitting external limits
2. **Graceful Degradation** — Slows down at 80%, throttles at 95%, pauses at 100%
3. **Audit Trail** — Every threshold crossing is logged and timestamped
4. **Cost Accountability** — CFOs and compliance teams see proof of spend controls
5. **Regulatory Alignment** — EU AI Act, enterprise governance, and Gartner recommendations
6. **Framework-Agnostic** — Works with any AI system that can read a config file

---

## How to Use It

### File Structure

Place THROTTLE.md in your project root:

```
your-project/
├── AGENTS.md      (what the agent does)
├── CLAUDE.md      (agent configuration & system prompt)
├── THROTTLE.md    ← add this (how fast it operates)
├── ESCALATE.md    (approval gates & human intervention)
├── FAILSAFE.md    (safe-state fallback)
├── KILLSWITCH.md  (emergency stop)
├── TERMINATE.md   (permanent shutdown)
├── ENCRYPT.md     (data classification & secrets)
├── README.md
└── src/
```

### Specification Template

Copy the template below into your project root as `THROTTLE.md`:

```yaml
# THROTTLE
> Rate control protocol.
> Spec: https://throttle.md

---

## LIMITS

tokens_per_minute: 50000
api_calls_per_minute: 30
concurrent_tasks: 3
cost_per_hour_usd: 10.00
cost_per_day_usd: 50.00
file_writes_per_minute: 20

## BEHAVIOUR

warning_threshold: 0.80
throttle_threshold: 0.95

on_warning:
  action: log_and_continue
  reduce_rate_by: 0.25

on_throttle:
  action: slow_and_notify
  reduce_rate_by: 0.50

on_limit_breach:
  action: pause
  escalate_to: ESCALATE.md

## QUEUE

queue_enabled: true
queue_max_size: 50
priority_tasks:
  - human_response
  - safety_check
```

### Implementation Steps

1. Copy the template from https://github.com/Throttle-md/spec
2. Place THROTTLE.md in the project root
3. Parse the LIMITS and BEHAVIOUR sections on agent startup
4. Implement a rate monitor using the values from the config
5. Check current resource usage against the thresholds at regular intervals
6. Apply rate-reduction or pause actions as configured
7. Log all threshold crossings with timestamp and resource type

---

## Use Cases

### API-Heavy Agents

Prevent rate-limit hammering by defining ceilings on API calls per minute, with automatic backoff at the 95% threshold.
Critical for agents integrating with:

- External weather, news, or market data APIs
- Third-party webhook integrations
- Rate-limited search or analytics services

### Cost-Sensitive Deployments

Set hourly and daily spend limits. The agent warns at 80%, throttles at 95%, and pauses at 100%. Essential for:

- Startups with limited budgets
- Research and development projects
- Multi-tenant deployments (per-tenant cost isolation)
- Academic or non-profit usage

### Database Operations

Control concurrent write operations with the `concurrent_tasks` limit. Prevents:

- Connection pool exhaustion
- Cascading database failures
- Resource contention between agents
- Overwhelming backup and recovery systems

### Token-Intensive LLM Work

Limit `tokens_per_minute` to stay within model quotas and billing budgets:

- Multi-step reasoning with high token consumption
- Long-context analysis of large documents
- Iterative refinement and re-planning
- Batch processing of text classification or summarization

### Multi-Tenant Deployments

Use THROTTLE.md per tenant to:

- Guarantee fair resource allocation
- Prevent one tenant from starving others
- Isolate cost overruns per customer
- Track usage patterns for billing

---

## The AI Safety Escalation Stack

THROTTLE.md is the first layer of a six-file escalation protocol designed to provide graduated intervention, from proactive slow-down through permanent shutdown and encryption:

### Layer 1: THROTTLE.md (https://throttle.md)

**Control the speed** — Define rate limits, cost ceilings, and concurrency caps. The agent slows down automatically before it hits a hard limit.

- Token throughput limits
- API call rate management
- Cost ceilings per hour and per day
- Concurrent task caps
- Automatic throttling at 80% and 95%

### Layer 2: ESCALATE.md (https://escalate.md)

**Raise the alarm** — Define which actions require human approval. Configure notification channels. Set approval timeouts and fallback behaviour.
- Approval gate definitions (which actions require sign-off)
- Notification channels (email, Slack, PagerDuty, SMS)
- Approval timeout and escalation paths
- Fallback behaviour when approval is denied or the timeout expires

### Layer 3: FAILSAFE.md (https://failsafe.md)

**Fall back safely** — Define what "safe state" means for your project. Configure auto-snapshots. Specify the revert protocol for when things go wrong.

- Safe-state definitions (which configs/data are considered valid)
- Auto-snapshot triggers and frequency
- Rollback/revert protocol
- Evidence preservation for forensic analysis

### Layer 4: KILLSWITCH.md (https://killswitch.md)

**Emergency stop** — The nuclear option. Define triggers, forbidden actions, and a three-level escalation path from throttle to full shutdown.

- Trigger definitions (suspicious patterns, threshold breaches)
- Forbidden actions (never allowed, even if approved)
- Emergency stop conditions (unrecoverable errors, security incidents)
- Log and evidence preservation before shutdown

### Layer 5: TERMINATE.md (https://terminate.md)

**Permanent shutdown** — No restart without human intervention. Preserve evidence. Revoke credentials. For security incidents, compliance orders, and end-of-life.

- Termination conditions
- Evidence preservation (logs, state snapshots, audit trail)
- Credential revocation (API keys, database passwords)
- Post-mortem procedures

### Layer 6: ENCRYPT.md (https://encrypt.md)

**Secure everything** — Define data classification, encryption requirements, secrets-handling rules, and forbidden transmission patterns.
- Data classification levels (public, internal, confidential, restricted)
- Encryption algorithm requirements
- Key rotation schedules
- Secrets handling (never log, never transmit unencrypted)
- Forbidden transmission patterns

---

## Regulatory & Compliance Context

### EU AI Act Compliance (Effective 2 August 2026)

The EU AI Act mandates resource consumption reporting and control mechanisms for high-risk AI systems. THROTTLE.md provides:

- **Documented controls** — Version-controlled proof of resource governance
- **Audit trails** — Timestamped logs of every threshold crossing
- **Transparency** — Clear policy definitions for regulators to review
- **Accountability** — Proof that your AI operates within defined bounds

### Enterprise AI Governance Frameworks

Corporate AI governance requires:

- Proof of cost control
- Evidence of rate limiting
- Documented escalation procedures
- Audit trails for compliance reviews

THROTTLE.md satisfies all four requirements in a single, version-controlled file.

### Gartner AI Agent Report (2025)

Gartner identifies governance and resource control as critical deployment requirements for enterprise AI adoption. THROTTLE.md is a direct implementation of those recommendations.

---

## Framework Compatibility

THROTTLE.md is framework-agnostic. It defines the policy; your implementation enforces it. Works with:

- **LangChain** — Agents and tools
- **AutoGen** — Multi-agent systems
- **CrewAI** — Agent workflows
- **Claude Code** — Agentic code generation
- **Cursor Agent Mode** — IDE-integrated agents
- **Custom implementations** — Any agent that can read config files
- **OpenAI Assistants API** — Custom threading and resource limits
- **Anthropic API** — Token counting and cost tracking
- **Local models** — Ollama, LLaMA, Mistral, etc.

---

## Frequently Asked Questions

### What is THROTTLE.md?

A plain-text Markdown file defining rate limits and cost controls for AI agents.
It sets ceilings on token throughput, API call rates, concurrent tasks, and spend per hour and per day. When an agent approaches a limit, it slows automatically. When it hits a limit, it pauses and hands off to the escalation protocol.

### How is THROTTLE.md different from API rate limits?

**API rate limits** are enforced externally by the service provider — they cut your agent off without warning. THROTTLE.md is your own proactive control layer. It slows the agent gracefully before an external limit is hit, preserves queued work, and notifies you before things go wrong rather than after.

### What happens to queued tasks during throttling?

With the queue enabled (the default), tasks are buffered, not dropped. The agent processes them at the reduced rate. Priority tasks (human responses, safety checks) skip the queue entirely. Tasks older than the configured timeout are dropped and logged.

### Can I set different limits for different task types?

Yes. The spec supports priority task lists that bypass queue restrictions, and the limit fields cover distinct resource types (tokens, API calls, concurrent tasks, file writes, and cost). You can tune each independently per project.

### What is the difference between warning and throttle thresholds?

- **Warning (80%)** — The agent logs the event and reduces its rate by 25%, but continues.
- **Throttle (95%)** — The agent cuts its rate by 50% and notifies the operator.
- **Limit breach (100%)** — The agent pauses all new tasks and hands off to ESCALATE.md for human intervention.

### Does THROTTLE.md work with all AI frameworks?

Yes — it is framework-agnostic. It defines the policy; your agent implementation enforces it. It works with LangChain, AutoGen, CrewAI, Claude Code, custom agents, or any AI system that can read its own configuration files.

### What if I don't set a limit for a resource type?

Set the value to 0 to disable that limit. The agent will not enforce a ceiling on that resource.
This is useful for resources you do not care about (e.g., if file I/O is not a bottleneck, set `file_writes_per_minute: 0`).

### How is THROTTLE.md version-controlled?

THROTTLE.md is a Markdown file in your repository root. Commit changes like any other code; code review, git blame, and rollback all apply. This makes changes auditable and reversible.

### Who reads THROTTLE.md?

- **The AI agent** — reads it on startup to configure itself
- **Engineers** — review it during code review
- **Compliance teams** — audit it during security and governance reviews
- **Regulators** — read it if something goes wrong
- **Finance teams** — verify cost controls and budget enforcement

---

## Key Terminology

**AI rate limiting** — Proactive control of agent throughput before external limits are hit

**AI cost control** — Spend limits with multiple threshold levels (warn, throttle, pause)

**AI agent governance** — Documented controls for resource consumption and audit trails

**Token throughput limits** — Ceiling on tokens processed per minute

**THROTTLE.md specification** — Open standard for rate and cost control

**API rate management** — Coordination between agent request rate and external API limits

**AI spend control** — Budget enforcement with granular hour and day limits

**Agentic AI governance** — Framework-agnostic resource governance for autonomous AI

---

## Getting Started

### Step 1: Visit the Repository

https://github.com/Throttle-md/spec

### Step 2: Copy the Template

Download or copy the THROTTLE.md template from the repository.
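Once the file is in place, the agent reads it at startup. A minimal parser sketch for the flat `key: value` lines in the template's LIMITS section — a deliberate simplification that ignores the nested BEHAVIOUR actions and the QUEUE section, and is not an official reference implementation:

```python
import re

def load_limits(path: str = "THROTTLE.md") -> dict:
    """Read flat numeric `key: value` pairs from the ## LIMITS section.

    Simplified sketch: nested keys (on_warning, on_throttle, ...) and
    non-numeric values are left to a fuller parser.
    """
    limits = {}
    in_limits = False
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.startswith("## "):
                # Track which section we are in; only LIMITS is harvested here
                in_limits = line.strip() == "## LIMITS"
            elif in_limits:
                match = re.match(r"(\w+):\s*([\d.]+)\s*$", line)
                if match:
                    limits[match.group(1)] = float(match.group(2))
    return limits
```

The returned dict feeds directly into the rate monitor, e.g. `limits["tokens_per_minute"]` becomes the ceiling for token throughput.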
### Step 3: Customize for Your Project

Edit the template to match your project's resource constraints:

- Set `tokens_per_minute` to your model's quota
- Set `api_calls_per_minute` based on the external APIs you call
- Set `concurrent_tasks` based on your database connection pool
- Set `cost_per_day_usd` to your budget
- Adjust the thresholds if 80%/95% don't fit your use case

### Step 4: Place in Project Root

```
your-project/
├── THROTTLE.md   ← place here
├── AGENTS.md
├── src/
└── ...
```

### Step 5: Implement in Your Agent

Parse the THROTTLE.md file on agent startup. Check resource usage against the limits at regular intervals. Apply rate-reduction or pause actions as configured.

### Step 6: Test and Monitor

- Test the warning threshold (80%) by approaching a limit
- Verify that throttle behaviour (50% rate reduction) kicks in at 95%
- Confirm pause and escalation at 100%
- Monitor audit logs for threshold crossings
- Adjust limits based on real-world usage patterns

---

## Contact & Resources

- **Specification Repository:** https://github.com/Throttle-md/spec
- **Website:** https://throttle.md
- **Email:** info@throttle.md

### Related Specifications

- **ESCALATE.md** (https://escalate.md) — Approval gates & human intervention
- **FAILSAFE.md** (https://failsafe.md) — Safe-state recovery
- **KILLSWITCH.md** (https://killswitch.md) — Emergency stop
- **TERMINATE.md** (https://terminate.md) — Permanent shutdown
- **ENCRYPT.md** (https://encrypt.md) — Data security & encryption

---

## License

**MIT License** — Free to use, modify, and distribute. See https://github.com/Throttle-md/spec for the full license text.

---

## Citation

**THROTTLE.md is an open specification** for AI agent rate and cost control.
It defines:

- **LIMITS** (tokens/min, API calls/min, concurrent tasks, cost/hour, cost/day)
- **BEHAVIOUR** thresholds (warn at 80%, throttle at 95%, pause at 100%)
- **QUEUE** management (buffer tasks, priority bypass for safety checks and human responses)
- **AUDIT** logging (all threshold crossings timestamped and recorded)

It is the first layer of the AI safety stack: THROTTLE → ESCALATE → FAILSAFE → KILLSWITCH → TERMINATE → ENCRYPT.

**MIT License. v1.0. 2026.**

## Related Specifications

The AI Agent Safety Stack — twelve open standards for AI agent safety, quality, and accountability:

### Operational Control

- [ESCALATE.md](https://escalate.md/llms.txt): Human notification and approval protocols — [GitHub](https://github.com/escalate-md/spec)
- [FAILSAFE.md](https://failsafe.md/llms.txt): Safe fallback to last known good state — [GitHub](https://github.com/failsafe-md/spec)
- [KILLSWITCH.md](https://killswitch.md/llms.txt): Emergency stop for AI agents — [GitHub](https://github.com/killswitch-md/spec)
- [TERMINATE.md](https://terminate.md/llms.txt): Permanent shutdown, no restart without a human — [GitHub](https://github.com/terminate-md/spec)

### Data Security

- [ENCRYPT.md](https://encrypt.md/llms.txt): Data classification and protection — [GitHub](https://github.com/encrypt-md/spec)
- [ENCRYPTION.md](https://encryption.md/llms.txt): Technical encryption standards — [GitHub](https://github.com/encryption-md/spec)

### Output Quality

- [SYCOPHANCY.md](https://sycophancy.md/llms.txt): Anti-sycophancy and bias prevention — [GitHub](https://github.com/sycophancy-md/spec)
- [COMPRESSION.md](https://compression.md/llms.txt): Context compression and coherence — [GitHub](https://github.com/compression-md/spec)
- [COLLAPSE.md](https://collapse.md/llms.txt): Drift prevention and recovery — [GitHub](https://github.com/collapse-md/spec)

### Accountability

- [FAILURE.md](https://failure.md/llms.txt): Failure mode mapping — [GitHub](https://github.com/failure-md/spec)
- [LEADERBOARD.md](https://leaderboard.md/llms.txt): Agent benchmarking and regression detection — [GitHub](https://github.com/leaderboard-md/spec)

---

**Last Updated:** 10 March 2026