Rate Limiting Without State: The MCP Paradox
Part 4 of the Journey: Advanced Topics & Deep Dives Previous: Context Window Management | Next: The MCP Inspector Deep Dive
How we implemented API rate limiting in a completely stateless protocol
Date: August 18, 2025 Author: Myron Koch & Claude Code Category: Architecture Challenges
The Impossible Problem
MCP servers are designed to be stateless: every request stands alone, and nothing is guaranteed to survive between calls, across restarts, or across server instances.
But blockchain APIs have rate limits:
Infura: 100,000 requests/day
Alchemy: 300 requests/second
Binance: 1200 requests/minute
CoinGecko: 10 requests/minute (free tier)
How do you track request counts when you can't remember anything?
Why Traditional Solutions Don't Work
Can't Use In-Memory Counters
// This doesn't work in MCP
let requestCount = 0; // Resets every request!
async function handleRequest() {
requestCount++; // Always 1
if (requestCount > 100) { // Never triggers
throw new Error('Rate limited');
}
}
Can't Use Redis/Database
// MCP servers should be zero-dependency
const redis = require('redis'); // ❌ External dependency
await redis.incr('api:requests'); // ❌ Stateful storage
Can't Use Global State
// Each tool call is isolated
global.requestTimestamps = []; // ❌ Doesn't persist
process.env.REQUEST_COUNT++; // ❌ Env values are strings, and don't outlive the process
The Breakthrough: Time-Based Bucketing
We can't durably count requests, but we CAN use time. If calls are spaced at least 1000 / N milliseconds apart, we never exceed N requests per second:
export class StatelessRateLimiter {
private readonly requestsPerSecond: number;
private readonly minInterval: number;
private lastCallTime = 0;
private mutex: Promise<void> = Promise.resolve();
constructor(requestsPerSecond: number) {
this.requestsPerSecond = requestsPerSecond;
this.minInterval = 1000 / requestsPerSecond;
}
async throttle<T>(fn: () => Promise<T>): Promise<T> {
// Swap in a fresh mutex synchronously, THEN wait on the old one.
// Awaiting first would let two concurrent callers grab the same resolved promise.
const previous = this.mutex;
let releaseLock!: () => void;
this.mutex = new Promise<void>(resolve => { releaseLock = resolve; });
await previous;
try {
const now = Date.now();
const timeSinceLastCall = now - this.lastCallTime;
// Enforce minimum interval between calls
if (timeSinceLastCall < this.minInterval) {
await this.sleep(this.minInterval - timeSinceLastCall);
}
this.lastCallTime = Date.now();
return await fn();
} finally {
releaseLock!();
}
}
private sleep(ms: number): Promise<void> {
return new Promise(resolve => setTimeout(resolve, ms));
}
}
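Usage is then a one-liner per API. A minimal sketch, where fetchBalance is a hypothetical helper and 300 requests/second matches Alchemy's limit from the list above:
const alchemyLimiter = new StatelessRateLimiter(300);
async function getBalance(address: string): Promise<string> {
// Every caller shares the same limiter instance, so pacing holds process-wide
return alchemyLimiter.throttle(() => fetchBalance(address));
}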
The File-Based Counter Pattern
For APIs with daily limits, we use the filesystem:
// The filesystem IS our state!
import fs from 'fs';
import path from 'path';
class FileBasedRateLimiter {
private readonly limitDir = '/tmp/mcp-rate-limits';
constructor() {
// Ensure directory exists on initialization
if (!fs.existsSync(this.limitDir)) {
fs.mkdirSync(this.limitDir, { recursive: true });
}
}
async checkLimit(api: string, limit: number): Promise<boolean> {
const today = new Date().toISOString().split('T')[0];
const countFile = path.join(this.limitDir, `${api}-${today}.count`);
const lockFile = `${countFile}.lock`;
// Acquire the file lock atomically: the 'wx' flag fails if the file
// already exists, closing the check-then-create race window
while (true) {
try {
fs.writeFileSync(lockFile, Date.now().toString(), { flag: 'wx' });
break;
} catch {
await new Promise(resolve => setTimeout(resolve, 10));
}
}
try {
// Read current count
let count = 0;
try {
count = parseInt(fs.readFileSync(countFile, 'utf8'), 10);
} catch {
// File doesn't exist, first request today
}
if (count >= limit) {
throw new Error(`Daily limit reached for ${api}: ${count}/${limit}`);
}
// Increment the count (safe while we hold the lock)
fs.writeFileSync(countFile, String(count + 1));
return true;
} finally {
// Release lock
fs.unlinkSync(lockFile);
}
}
async cleanup(): Promise<void> {
// Remove old count files
const files = fs.readdirSync(this.limitDir);
const today = new Date().toISOString().split('T')[0];
files.forEach(file => {
if (!file.includes(today)) {
fs.unlinkSync(path.join(this.limitDir, file));
}
});
}
}
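Calling it from a tool looks like this; fetchLatestBlock is a hypothetical upstream call:
const daily = new FileBasedRateLimiter();
// Throws with a clear message once today's Infura budget is spent
await daily.checkLimit('infura', 100000);
const block = await fetchLatestBlock();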
The Circuit Breaker Pattern
When APIs fail, we back off WITHOUT remembering failures. Strictly speaking this is a stateless retry with exponential backoff rather than a classic circuit breaker (which needs persistent open/closed state), but it fills the same role:
export class StatelessCircuitBreaker {
async execute<T>(
fn: () => Promise<T>,
options: { maxRetries: number } = { maxRetries: 3 }
): Promise<T> {
let lastError: Error | undefined;
for (let attempt = 0; attempt < options.maxRetries; attempt++) {
try {
// Exponential backoff based on attempt number
if (attempt > 0) {
const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
await this.sleep(delay);
}
return await fn();
} catch (error: any) {
lastError = error;
// Check if error is rate limit
if (this.isRateLimitError(error)) {
// Extract retry-after header if available
const retryAfter = this.getRetryAfter(error);
if (retryAfter) {
await this.sleep(retryAfter * 1000);
continue;
}
}
// If not rate limit or last attempt, throw
if (attempt === options.maxRetries - 1) {
throw error;
}
}
}
throw lastError || new Error('All attempts failed');
}
private sleep(ms: number): Promise<void> {
return new Promise(resolve => setTimeout(resolve, ms));
}
private isRateLimitError(error: any): boolean {
return error.code === 429 ||
error.status === 429 ||
error.message?.toLowerCase().includes('rate limit') ||
error.message?.toLowerCase().includes('too many requests');
}
private getRetryAfter(error: any): number | null {
// Check the various places APIs put retry-after; header values arrive as strings
const value = error.retryAfter ??
error.headers?.['retry-after'] ??
error.response?.headers?.['retry-after'];
const seconds = Number(value);
return Number.isFinite(seconds) ? seconds : null;
}
}
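Wrapped around any flaky call, it reads like this (fetchTokenPrice is the same helper our price tool uses later in this post):
const breaker = new StatelessCircuitBreaker();
const price = await breaker.execute(() => fetchTokenPrice('ETH'), { maxRetries: 5 });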
The Time-Window Queue
For complex APIs with "X requests per Y minutes":
// Assumes the same fs/path imports as the file-based limiter above
class TimeWindowQueue {
// Encode a timestamp into a fixed window bucket
private encodeWindow(timestamp: number, windowSize: number): string {
const window = Math.floor(timestamp / windowSize);
return `window-${window}`; // '-' instead of ':' keeps the filename portable
}
async canMakeRequest(
api: string,
limit: number,
windowMs: number
): Promise<boolean> {
const now = Date.now();
const currentWindow = this.encodeWindow(now, windowMs);
// Use the filesystem to track requests in the current window
const windowDir = '/tmp/mcp-rl';
fs.mkdirSync(windowDir, { recursive: true }); // first write fails without this
const windowFile = path.join(windowDir, `${api}-${currentWindow}.json`);
let timestamps: number[] = [];
try {
timestamps = JSON.parse(fs.readFileSync(windowFile, 'utf8'));
} catch {
// No file means new window
}
// Remove timestamps outside current window
const windowStart = now - windowMs;
timestamps = timestamps.filter(ts => ts > windowStart);
if (timestamps.length >= limit) {
// Calculate when next request can be made
const oldestInWindow = Math.min(...timestamps);
const nextAvailable = oldestInWindow + windowMs;
const waitTime = nextAvailable - now;
throw new Error(
`Rate limit exceeded. Retry in ${Math.ceil(waitTime / 1000)}s`
);
}
// Add current request
timestamps.push(now);
fs.writeFileSync(windowFile, JSON.stringify(timestamps));
return true;
}
}
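CoinGecko's free tier maps onto this directly; a sketch:
const queue = new TimeWindowQueue();
// Throws with a retry hint once 10 requests land inside the current 60-second window
await queue.canMakeRequest('coingecko', 10, 60000);
const price = await fetchTokenPrice('ETH');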
The Distributed Rate Limiting Pattern
Multiple MCP server instances? Use lock files:
class DistributedRateLimiter {
private async acquireLock(resource: string): Promise<() => void> {
const lockDir = '/tmp/mcp-locks';
// Without the directory, the 'wx' write throws ENOENT forever and the spin loop never exits
fs.mkdirSync(lockDir, { recursive: true });
const lockFile = path.join(lockDir, `${resource}.lock`);
const lockId = Math.random().toString(36);
// Spin until we get the lock
while (true) {
try {
// Atomic file creation
fs.writeFileSync(lockFile, lockId, { flag: 'wx' });
// Return unlock function
return () => {
try {
const current = fs.readFileSync(lockFile, 'utf8');
if (current === lockId) {
fs.unlinkSync(lockFile);
}
} catch {
// Lock already released
}
};
} catch {
// Lock exists, wait and retry
await this.sleep(10 + Math.random() * 40);
}
}
}
private sleep(ms: number): Promise<void> {
return new Promise(resolve => setTimeout(resolve, ms));
}
async executeWithLock<T>(
resource: string,
fn: () => Promise<T>
): Promise<T> {
const unlock = await this.acquireLock(resource);
try {
return await fn();
} finally {
unlock();
}
}
}
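Two server processes contending for the same API then serialize cleanly. A sketch:
const distributed = new DistributedRateLimiter();
const price = await distributed.executeWithLock('coingecko', async () => {
// Only one MCP process on this host runs this section at a time
return fetchTokenPrice('ETH');
});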
Real-World Implementation
Here's how we use it in our servers:
// src/utils/rateLimiter.ts
export class APIRateLimiter {
private limiters = new Map<string, any>();
constructor() {
// Configure limits for each API
this.limiters.set('coingecko', {
type: 'timeWindow',
requests: 10,
window: 60000 // 1 minute
});
this.limiters.set('infura', {
type: 'daily',
requests: 100000
});
this.limiters.set('alchemy', {
type: 'perSecond',
rps: 300
});
}
async execute<T>(
api: string,
fn: () => Promise<T>
): Promise<T> {
const limiter = this.limiters.get(api);
if (!limiter) return fn();
switch (limiter.type) {
case 'perSecond':
return this.executeWithRPS(fn, limiter.rps);
case 'timeWindow':
await this.checkTimeWindow(api, limiter.requests, limiter.window);
return fn();
case 'daily':
await this.checkDaily(api, limiter.requests);
return fn();
default:
return fn();
}
}
// executeWithRPS, checkTimeWindow, and checkDaily delegate to the
// StatelessRateLimiter, TimeWindowQueue, and FileBasedRateLimiter shown above
}
// Usage in tools. Create the limiter once at module scope, not inside the
// handler, or the in-memory per-second state resets on every call
const rateLimiter = new APIRateLimiter();
export async function handleGetTokenPrice(args: any, client: any) {
const price = await rateLimiter.execute('coingecko', async () => {
return await fetchTokenPrice(args.token);
});
return {
content: [{
type: 'text',
text: JSON.stringify({ price }, null, 2)
}]
};
}
The Clever Hacks
1. Request Coalescing
Multiple tools requesting the same data? Serve repeats from a short-lived cache instead of spending another API call:
class RequestCoalescer {
async coalesce<T>(
key: string,
fn: () => Promise<T>,
ttl: number = 1000
): Promise<T> {
const cacheDir = '/tmp/mcp-cache';
fs.mkdirSync(cacheDir, { recursive: true });
const cacheFile = path.join(cacheDir, `${key}.json`);
try {
const cached = JSON.parse(fs.readFileSync(cacheFile, 'utf8'));
if (Date.now() - cached.timestamp < ttl) {
return cached.data;
}
} catch {
// No cache
}
const result = await fn();
fs.writeFileSync(cacheFile, JSON.stringify({
timestamp: Date.now(),
data: result
}));
return result;
}
}
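In practice, two tools asking for the same price within a second share one upstream call's result:
const coalescer = new RequestCoalescer();
const price = await coalescer.coalesce('eth-price', () => fetchTokenPrice('ETH'));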
2. Adaptive Delays
Slow down when approaching limits:
function adaptiveDelay(used: number, limit: number): number {
const usage = used / limit;
if (usage < 0.5) return 0; // No delay
if (usage < 0.7) return 100; // Slight delay
if (usage < 0.9) return 500; // Moderate delay
return 2000; // Heavy delay
}
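Paired with the daily counter above, the delay ramps up as the budget shrinks. For example:
const used = 87000; // e.g. read back from today's count file
const wait = adaptiveDelay(used, 100000); // 87% of budget used, so 500ms
await new Promise(resolve => setTimeout(resolve, wait));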
3. Request Priority
Some requests are more important:
async function prioritizedExecute<T>(
priority: 'high' | 'low',
fn: () => Promise<T>
): Promise<T> {
if (priority === 'low') {
// Low-priority requests absorb a jittered delay, leaving headroom for urgent ones
const delay = 500 + Math.random() * 1500;
await new Promise(resolve => setTimeout(resolve, delay));
}
return fn();
}
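So a user-facing price lookup goes straight through, while background cache warming yields:
const price = await prioritizedExecute('high', () => fetchTokenPrice('ETH'));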
The Gotchas We Hit
1. Clock Drift and Time Zones
Servers that disagree on the date break daily buckets:
// Derive bucket keys from UTC, never from local time
const today = new Date().toISOString().split('T')[0]; // UTC date, same on every host
// Date.now() already returns UTC epoch milliseconds, so interval math is safe;
// the trap is formatting dates in the server's local time zone
2. Filesystem Permissions
// Always make sure the temp directory exists before the first write
const tempDir = process.env.MCP_TEMP_DIR || '/tmp/mcp';
if (!fs.existsSync(tempDir)) {
fs.mkdirSync(tempDir, { recursive: true });
}
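A more portable variant, assuming nothing about the host beyond Node's built-in os module:
import os from 'os';
import path from 'path';
import fs from 'fs';
const tempDir = process.env.MCP_TEMP_DIR || path.join(os.tmpdir(), 'mcp');
fs.mkdirSync(tempDir, { recursive: true }); // no-op if it already exists
fs.accessSync(tempDir, fs.constants.W_OK); // fail fast if we can't write here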
3. Cleanup
Temp files accumulate:
// Add cleanup on startup
function cleanupOldFiles() {
const dir = '/tmp/mcp-rate-limits';
if (!fs.existsSync(dir)) return; // nothing to clean on a fresh host
const cutoff = Date.now() - 86400000; // anything untouched for 24 hours is stale
for (const file of fs.readdirSync(dir)) {
const filePath = path.join(dir, file);
if (fs.statSync(filePath).mtimeMs < cutoff) {
fs.unlinkSync(filePath);
}
}
}
The Philosophy
Stateless doesn't mean helpless.
We use:
Time as state
Filesystem as memory
Math instead of counters
Delays instead of queues
It's not perfect, but it works. And it keeps MCP servers simple.
The Checklist
Implementing rate limiting in your MCP server:
Identify API limits (requests/second, daily, etc.)
Choose appropriate pattern (time-based, file-based, etc.)
Implement retry logic with backoff
Add request coalescing for identical calls
Use filesystem for persistent counters
Clean up old tracking files
Test with concurrent requests
Document limits in error messages (see the sketch after this list)
Provide retry-after information
Monitor actual API usage
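For the error-message items above, here's a sketch of a rate-limit failure returned in the same shape as our other tool results (the JSON field names are illustrative; isError marks the result as a failure to the client):
return {
content: [{
type: 'text',
text: JSON.stringify({
error: 'rate_limited',
api: 'coingecko',
limit: '10 requests/minute',
retryAfterSeconds: 42
}, null, 2)
}],
isError: true
};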
References
Rate limiter implementations: /src/utils/rateLimiter.ts
Circuit breaker pattern: /src/utils/circuitBreaker.ts
Request coalescing: /src/utils/requestCache.ts
Test scenarios: /tests/rate-limiting/
This is part of our ongoing series documenting architectural patterns and insights from building the Blockchain MCP Server Ecosystem. Sometimes constraints force creativity.
Related Reading
Prerequisites
Context Window Management: Building AI-Friendly Code - Understanding the constraints of the AI environment is key to understanding why statelessness is a design goal.
Next Steps
The MCP Inspector Deep Dive: Your Only Debugging Friend - Learn how to debug issues that arise from rate limiting.
Deep Dives
Error Handling in MCP: Where Do Errors Actually Go? - See how to properly structure and return rate limit errors to the AI client.

