Rate Limiting Without State: The MCP Paradox
Part 4 of the Journey: Advanced Topics & Deep Dives Previous: Context Window Management | Next: The MCP Inspector Deep Dive
How we implemented API rate limiting in a completely stateless protocol
Date: August 18, 2025 Author: Myron Koch & Claude Code Category: Architecture Challenges
The Impossible Problem
MCP servers are designed to be stateless: every request stands alone, and nothing is guaranteed to survive between calls, across restarts, or across server instances.
But blockchain APIs have rate limits:
Infura: 100,000 requests/day
Alchemy: 300 requests/second
Binance: 1200 requests/minute
CoinGecko: 10 requests/minute (free tier)
How do you track request counts when you can't remember anything?
Why Traditional Solutions Don't Work
Can't Use In-Memory Counters
// This doesn't work in MCP
let requestCount = 0; // Resets every request!
async function handleRequest() {
requestCount++; // Always 1
if (requestCount > 100) { // Never triggers
throw new Error('Rate limited');
}
}
Can't Use Redis/Database
// MCP servers should be zero-dependency
const redis = require('redis'); // ❌ External dependency
await redis.incr('api:requests'); // ❌ Stateful storage
Can't Use Global State
// Each tool call is isolated
global.requestTimestamps = []; // ❌ Doesn't persist
process.env.REQUEST_COUNT++; // ❌ Env values are strings, and don't outlive the process
The Breakthrough: Time-Based Bucketing
We can't durably count requests, but we CAN use time. If calls are spaced at least 1000 / N milliseconds apart, we never exceed N requests per second:
export class StatelessRateLimiter {
private readonly requestsPerSecond: number;
private readonly minInterval: number;
private lastCallTime = 0;
private mutex: Promise<void> = Promise.resolve();
constructor(requestsPerSecond: number) {
this.requestsPerSecond = requestsPerSecond;
this.minInterval = 1000 / requestsPerSecond;
}
async throttle<T>(fn: () => Promise<T>): Promise<T> {
// Swap in a fresh mutex synchronously, THEN wait on the old one.
// Awaiting first would let two concurrent callers grab the same resolved promise.
const previous = this.mutex;
let releaseLock!: () => void;
this.mutex = new Promise<void>(resolve => { releaseLock = resolve; });
await previous;
try {
const now = Date.now();
const timeSinceLastCall = now - this.lastCallTime;
// Enforce minimum interval between calls
if (timeSinceLastCall < this.minInterval) {
await this.sleep(this.minInterval - timeSinceLastCall);
}
this.lastCallTime = Date.now();
return await fn();
} finally {
releaseLock!();
}
}
private sleep(ms: number): Promise<void> {
return new Promise(resolve => setTimeout(resolve, ms));
}
}
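Usage is then a one-liner per API. A minimal sketch, where fetchBalance is a hypothetical helper and 300 requests/second matches Alchemy's limit from the list above:
const alchemyLimiter = new StatelessRateLimiter(300);
async function getBalance(address: string): Promise<string> {
// Every caller shares the same limiter instance, so pacing holds process-wide
return alchemyLimiter.throttle(() => fetchBalance(address));
}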
The File-Based Counter Pattern
For APIs with daily limits, we use the filesystem:
// The filesystem IS our state!
import fs from 'fs';
import path from 'path';
class FileBasedRateLimiter {
private readonly limitDir = '/tmp/mcp-rate-limits';
constructor() {
// Ensure directory exists on initialization
if (!fs.existsSync(this.limitDir)) {
fs.mkdirSync(this.limitDir, { recursive: true });
}
}
async checkLimit(api: string, limit: number): Promise<boolean> {
const today = new Date().toISOString().split('T')[0];
const countFile = path.join(this.limitDir, `${api}-${today}.count`);
const lockFile = `${countFile}.lock`;
// Acquire the file lock atomically: the 'wx' flag fails if the file
// already exists, closing the check-then-create race window
while (true) {
try {
fs.writeFileSync(lockFile, Date.now().toString(), { flag: 'wx' });
break;
} catch {
await new Promise(resolve => setTimeout(resolve, 10));
}
}
try {
// Read current count
let count = 0;
try {
count = parseInt(fs.readFileSync(countFile, 'utf8'), 10);
} catch {
// File doesn't exist, first request today
}
if (count >= limit) {
throw new Error(`Daily limit reached for ${api}: ${count}/${limit}`);
}
// Increment the count (safe while we hold the lock)
fs.writeFileSync(countFile, String(count + 1));
return true;
} finally {
// Release lock
fs.unlinkSync(lockFile);
}
}
async cleanup(): Promise<void> {
// Remove old count files
const files = fs.readdirSync(this.limitDir);
const today = new Date().toISOString().split('T')[0];
files.forEach(file => {
if (!file.includes(today)) {
fs.unlinkSync(path.join(this.limitDir, file));
}
});
}
}
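Calling it from a tool looks like this; fetchLatestBlock is a hypothetical upstream call:
const daily = new FileBasedRateLimiter();
// Throws with a clear message once today's Infura budget is spent
await daily.checkLimit('infura', 100000);
const block = await fetchLatestBlock();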
The Circuit Breaker Pattern
When APIs fail, we back off WITHOUT remembering failures. Strictly speaking this is a stateless retry with exponential backoff rather than a classic circuit breaker (which needs persistent open/closed state), but it fills the same role:
export class StatelessCircuitBreaker {
async execute<T>(
fn: () => Promise<T>,
options: { maxRetries: number } = { maxRetries: 3 }
): Promise<T> {
let lastError: Error | undefined;
for (let attempt = 0; attempt < options.maxRetries; attempt++) {
try {
// Exponential backoff based on attempt number
if (attempt > 0) {
const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
await this.sleep(delay);
}
return await fn();
} catch (error: any) {
lastError = error;
// Check if error is rate limit
if (this.isRateLimitError(error)) {
// Extract retry-after header if available
const retryAfter = this.getRetryAfter(error);
if (retryAfter) {
await this.sleep(retryAfter * 1000);
continue;
}
}
// If not rate limit or last attempt, throw
if (attempt === options.maxRetries - 1) {
throw error;
}
}
}
throw lastError || new Error('All attempts failed');
}
private sleep(ms: number): Promise<void> {
return new Promise(resolve => setTimeout(resolve, ms));
}
private isRateLimitError(error: any): boolean {
return error.code === 429 ||
error.status === 429 ||
error.message?.toLowerCase().includes('rate limit') ||
error.message?.toLowerCase().includes('too many requests');
}
private getRetryAfter(error: any): number | null {
// Check the various places APIs put retry-after; header values arrive as strings
const value = error.retryAfter ??
error.headers?.['retry-after'] ??
error.response?.headers?.['retry-after'];
const seconds = Number(value);
return Number.isFinite(seconds) ? seconds : null;
}
}
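Wrapped around any flaky call, it reads like this (fetchTokenPrice is the same helper our price tool uses later in this post):
const breaker = new StatelessCircuitBreaker();
const price = await breaker.execute(() => fetchTokenPrice('ETH'), { maxRetries: 5 });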
The Time-Window Queue
For complex APIs with "X requests per Y minutes":
// Assumes the same fs/path imports as the file-based limiter above
class TimeWindowQueue {
// Encode a timestamp into a fixed window bucket
private encodeWindow(timestamp: number, windowSize: number): string {
const window = Math.floor(timestamp / windowSize);
return `window-${window}`; // '-' instead of ':' keeps the filename portable
}
async canMakeRequest(
api: string,
limit: number,
windowMs: number
): Promise<boolean> {
const now = Date.now();
const currentWindow = this.encodeWindow(now, windowMs);
// Use the filesystem to track requests in the current window
const windowDir = '/tmp/mcp-rl';
fs.mkdirSync(windowDir, { recursive: true }); // first write fails without this
const windowFile = path.join(windowDir, `${api}-${currentWindow}.json`);
let timestamps: number[] = [];
try {
timestamps = JSON.parse(fs.readFileSync(windowFile, 'utf8'));
} catch {
// No file means new window
}
// Remove timestamps outside current window
const windowStart = now - windowMs;
timestamps = timestamps.filter(ts => ts > windowStart);
if (timestamps.length >= limit) {
// Calculate when next request can be made
const oldestInWindow = Math.min(...timestamps);
const nextAvailable = oldestInWindow + windowMs;
const waitTime = nextAvailable - now;
throw new Error(
`Rate limit exceeded. Retry in ${Math.ceil(waitTime / 1000)}s`
);
}
// Add current request
timestamps.push(now);
fs.writeFileSync(windowFile, JSON.stringify(timestamps));
return true;
}
}
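CoinGecko's free tier maps onto this directly; a sketch:
const queue = new TimeWindowQueue();
// Throws with a retry hint once 10 requests land inside the current 60-second window
await queue.canMakeRequest('coingecko', 10, 60000);
const price = await fetchTokenPrice('ETH');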
The Distributed Rate Limiting Pattern
Multiple MCP server instances? Use lock files:
class DistributedRateLimiter {
private async acquireLock(resource: string): Promise<() => void> {
const lockDir = '/tmp/mcp-locks';
// Without the directory, the 'wx' write throws ENOENT forever and the spin loop never exits
fs.mkdirSync(lockDir, { recursive: true });
const lockFile = path.join(lockDir, `${resource}.lock`);
const lockId = Math.random().toString(36);
// Spin until we get the lock
while (true) {
try {
// Atomic file creation
fs.writeFileSync(lockFile, lockId, { flag: 'wx' });
// Return unlock function
return () => {
try {
const current = fs.readFileSync(lockFile, 'utf8');
if (current === lockId) {
fs.unlinkSync(lockFile);
}
} catch {
// Lock already released
}
};
} catch {
// Lock exists, wait and retry
await this.sleep(10 + Math.random() * 40);
}
}
}
private sleep(ms: number): Promise<void> {
return new Promise(resolve => setTimeout(resolve, ms));
}
async executeWithLock<T>(
resource: string,
fn: () => Promise<T>
): Promise<T> {
const unlock = await this.acquireLock(resource);
try {
return await fn();
} finally {
unlock();
}
}
}
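Two server processes contending for the same API then serialize cleanly. A sketch:
const distributed = new DistributedRateLimiter();
const price = await distributed.executeWithLock('coingecko', async () => {
// Only one MCP process on this host runs this section at a time
return fetchTokenPrice('ETH');
});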
Real-World Implementation
Here's how we use it in our servers:
// src/utils/rateLimiter.ts
export class APIRateLimiter {
private limiters = new Map<string, any>();
constructor() {
// Configure limits for each API
this.limiters.set('coingecko', {
type: 'timeWindow',
requests: 10,
window: 60000 // 1 minute
});
this.limiters.set('infura', {
type: 'daily',
requests: 100000
});
this.limiters.set('alchemy', {
type: 'perSecond',
rps: 300
});
}
async execute<T>(
api: string,
fn: () => Promise<T>
): Promise<T> {
const limiter = this.limiters.get(api);
if (!limiter) return fn();
switch (limiter.type) {
case 'perSecond':
return this.executeWithRPS(fn, limiter.rps);
case 'timeWindow':
await this.checkTimeWindow(api, limiter.requests, limiter.window);
return fn();
case 'daily':
await this.checkDaily(api, limiter.requests);
return fn();
default:
return fn();
}
}
// executeWithRPS, checkTimeWindow, and checkDaily delegate to the
// StatelessRateLimiter, TimeWindowQueue, and FileBasedRateLimiter shown above
}
// Usage in tools. Create the limiter once at module scope, not inside the
// handler, or the in-memory per-second state resets on every call
const rateLimiter = new APIRateLimiter();
export async function handleGetTokenPrice(args: any, client: any) {
const price = await rateLimiter.execute('coingecko', async () => {
return await fetchTokenPrice(args.token);
});
return {
content: [{
type: 'text',
text: JSON.stringify({ price }, null, 2)
}]
};
}
The Clever Hacks
1. Request Coalescing
Multiple tools requesting the same data? Serve repeats from a short-lived cache instead of spending another API call:
class RequestCoalescer {
async coalesce<T>(
key: string,
fn: () => Promise<T>,
ttl: number = 1000
): Promise<T> {
const cacheDir = '/tmp/mcp-cache';
fs.mkdirSync(cacheDir, { recursive: true });
const cacheFile = path.join(cacheDir, `${key}.json`);
try {
const cached = JSON.parse(fs.readFileSync(cacheFile, 'utf8'));
if (Date.now() - cached.timestamp < ttl) {
return cached.data;
}
} catch {
// No cache
}
const result = await fn();
fs.writeFileSync(cacheFile, JSON.stringify({
timestamp: Date.now(),
data: result
}));
return result;
}
}
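In practice, two tools asking for the same price within a second share one upstream call's result:
const coalescer = new RequestCoalescer();
const price = await coalescer.coalesce('eth-price', () => fetchTokenPrice('ETH'));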
2. Adaptive Delays
Slow down when approaching limits:
function adaptiveDelay(used: number, limit: number): number {
const usage = used / limit;
if (usage < 0.5) return 0; // No delay
if (usage < 0.7) return 100; // Slight delay
if (usage < 0.9) return 500; // Moderate delay
return 2000; // Heavy delay
}
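Paired with the daily counter above, the delay ramps up as the budget shrinks. For example:
const used = 87000; // e.g. read back from today's count file
const wait = adaptiveDelay(used, 100000); // 87% of budget used, so 500ms
await new Promise(resolve => setTimeout(resolve, wait));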
3. Request Priority
Some requests are more important:
async function prioritizedExecute<T>(
priority: 'high' | 'low',
fn: () => Promise<T>
): Promise<T> {
if (priority === 'low') {
// Low-priority requests absorb a jittered delay, leaving headroom for urgent ones
const delay = 500 + Math.random() * 1500;
await new Promise(resolve => setTimeout(resolve, delay));
}
return fn();
}
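So a user-facing price lookup goes straight through, while background cache warming yields:
const price = await prioritizedExecute('high', () => fetchTokenPrice('ETH'));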
The Gotchas We Hit
1. Clock Drift and Time Zones
Servers that disagree on the date break daily buckets:
// Derive bucket keys from UTC, never from local time
const today = new Date().toISOString().split('T')[0]; // UTC date, same on every host
// Date.now() already returns UTC epoch milliseconds, so interval math is safe;
// the trap is formatting dates in the server's local time zone
2. Filesystem Permissions
// Always make sure the temp directory exists before the first write
const tempDir = process.env.MCP_TEMP_DIR || '/tmp/mcp';
if (!fs.existsSync(tempDir)) {
fs.mkdirSync(tempDir, { recursive: true });
}
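A more portable variant, assuming nothing about the host beyond Node's built-in os module:
import os from 'os';
import path from 'path';
import fs from 'fs';
const tempDir = process.env.MCP_TEMP_DIR || path.join(os.tmpdir(), 'mcp');
fs.mkdirSync(tempDir, { recursive: true }); // no-op if it already exists
fs.accessSync(tempDir, fs.constants.W_OK); // fail fast if we can't write here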
3. Cleanup
Temp files accumulate:
// Add cleanup on startup
function cleanupOldFiles() {
const dir = '/tmp/mcp-rate-limits';
if (!fs.existsSync(dir)) return; // nothing to clean on a fresh host
const cutoff = Date.now() - 86400000; // anything untouched for 24 hours is stale
for (const file of fs.readdirSync(dir)) {
const filePath = path.join(dir, file);
if (fs.statSync(filePath).mtimeMs < cutoff) {
fs.unlinkSync(filePath);
}
}
}
The Philosophy
Stateless doesn't mean helpless.
We use:
Time as state
Filesystem as memory
Math instead of counters
Delays instead of queues
It's not perfect, but it works. And it keeps MCP servers simple.
The Checklist
Implementing rate limiting in your MCP server:
Identify API limits (requests/second, daily, etc.)
Choose appropriate pattern (time-based, file-based, etc.)
Implement retry logic with backoff
Add request coalescing for identical calls
Use filesystem for persistent counters
Clean up old tracking files
Test with concurrent requests
Document limits in error messages (see the sketch after this list)
Provide retry-after information
Monitor actual API usage
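For the error-message items above, here's a sketch of a rate-limit failure returned in the same shape as our other tool results (the JSON field names are illustrative; isError marks the result as a failure to the client):
return {
content: [{
type: 'text',
text: JSON.stringify({
error: 'rate_limited',
api: 'coingecko',
limit: '10 requests/minute',
retryAfterSeconds: 42
}, null, 2)
}],
isError: true
};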
References
Rate limiter implementations: /src/utils/rateLimiter.ts
Circuit breaker pattern: /src/utils/circuitBreaker.ts
Request coalescing: /src/utils/requestCache.ts
Test scenarios: /tests/rate-limiting/
This is part of our ongoing series documenting architectural patterns and insights from building the Blockchain MCP Server Ecosystem. Sometimes constraints force creativity.
Related Reading
Prerequisites
Context Window Management: Building AI-Friendly Code - Understanding the constraints of the AI environment is key to understanding why statelessness is a design goal.
Next Steps
The MCP Inspector Deep Dive: Your Only Debugging Friend - Learn how to debug issues that arise from rate limiting.
Deep Dives
Error Handling in MCP: Where Do Errors Actually Go? - See how to properly structure and return rate limit errors to the AI client.

