Guard provides rate limiting, concurrency control, execution timeout, and IP filtering for your MCP server. It protects against abuse, ensures fair resource allocation, and prevents runaway requests.
Guard is powered by the @frontmcp/guard library and integrates directly into tool and agent flows. All guard checks run automatically before execution, with cleanup handled in finalize stages.

Why Guard?

| Threat | Without Guard | With Guard |
|---|---|---|
| Client flooding requests | Server overwhelmed | Rate-limited per user/IP |
| Tool running forever | Hangs, resource leak | Timeout protection |
| Unbounded parallelism | Resource exhaustion | Controlled concurrency |
| Malicious IPs | Open access | IP allow/deny filtering |

Quick Start

Add rate limiting and a timeout to any tool with decorator options:
import { Tool, ToolContext } from '@frontmcp/sdk';
import { z } from 'zod';

@Tool({
  name: 'search',
  description: 'Search documents',
  inputSchema: { query: z.string() },
  rateLimit: { maxRequests: 60, windowMs: 60_000, partitionBy: 'userId' },
  timeout: { executeMs: 10_000 },
})
class SearchTool extends ToolContext<typeof SearchTool> {
  async execute({ query }: { query: string }) {
    return { results: await this.get(SearchService).search(query) };
  }
}

How Guard Integrates with Flows

Guard checks are implemented as flow stages that run automatically in the tool and agent execution pipelines:
Pre stages:      ... → acquireQuota → acquireSemaphore → ...
Execute stages:  validateInput → execute (wrapped with timeout) → validateOutput
Finalize stages: releaseSemaphore → releaseQuota → ...
  1. acquireQuota — Checks global and per-entity rate limits. Throws RateLimitError if exceeded.
  2. acquireSemaphore — Acquires a concurrency slot. Throws ConcurrencyLimitError if no slot available.
  3. execute — Wrapped with withTimeout if a timeout is configured. Throws ExecutionTimeoutError if exceeded.
  4. releaseSemaphore — Releases the concurrency slot back to the pool.
  5. releaseQuota — Cleans up rate limit state.

Rate Limiting

FrontMCP uses a sliding window algorithm for rate limiting. It provides smooth, accurate throttling with O(1) storage per key.
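The idea behind the sliding window counter can be sketched in a few lines. This is an illustration only, not the @frontmcp/guard implementation: it keeps two counters per key (previous and current fixed window) and weights the previous count by how much of it still overlaps the sliding window, which is what keeps storage at O(1) per key.

```typescript
// Illustrative sliding-window counter (not the library's implementation).
type Bucket = { windowStart: number; current: number; previous: number };

class SlidingWindowLimiter {
  private buckets = new Map<string, Bucket>();
  constructor(private maxRequests: number, private windowMs: number) {}

  allow(key: string, now: number = Date.now()): boolean {
    const start = Math.floor(now / this.windowMs) * this.windowMs;
    let b = this.buckets.get(key);
    if (!b || b.windowStart !== start) {
      // Roll the window: the old "current" becomes "previous" if adjacent.
      const previous =
        b && b.windowStart === start - this.windowMs ? b.current : 0;
      b = { windowStart: start, current: 0, previous };
      this.buckets.set(key, b);
    }
    // Fraction of the previous window still inside the sliding window.
    const overlap = 1 - (now - start) / this.windowMs;
    const estimate = b.previous * overlap + b.current;
    if (estimate >= this.maxRequests) return false;
    b.current += 1;
    return true;
  }
}
```

The weighted estimate smooths out the burst-at-window-boundary problem that plain fixed windows have, at the cost of a slight approximation.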

Per-Tool Rate Limiting

@Tool({
  name: 'api:call',
  inputSchema: { endpoint: z.string() },
  rateLimit: {
    maxRequests: 100,       // 100 requests
    windowMs: 60_000,       // per 60 seconds
    partitionBy: 'userId',  // per user
  },
})
class ApiCallTool extends ToolContext<typeof ApiCallTool> {
  async execute({ endpoint }: { endpoint: string }) {
    return await this.get(ApiService).call(endpoint);
  }
}

Global Rate Limiting

Set a server-wide rate limit in your app configuration:
@FrontMcp({
  name: 'my-server',
  throttle: {
    enabled: true,
    global: {
      maxRequests: 1000,
      windowMs: 60_000,
      partitionBy: 'ip',
    },
  },
  tools: [ApiCallTool, SearchTool],
})
class MyApp {}
The global rate limit is checked before per-entity limits. Both must pass for a request to proceed.

Partition Strategies

Partition keys determine how rate limits are bucketed:
| Strategy | Description | Use Case |
|---|---|---|
| 'global' | Single shared bucket | Server-wide limits |
| 'ip' | Per client IP address | Prevent IP-based abuse |
| 'session' | Per MCP session ID | Per-connection limits |
| 'userId' | Per authenticated user | Per-user quotas |
| Custom function | (ctx) => string | Tenant, org, or custom grouping |
Custom partition key example:
@Tool({
  name: 'tenant:query',
  inputSchema: { query: z.string() },
  rateLimit: {
    maxRequests: 500,
    windowMs: 60_000,
    partitionBy: (ctx) => ctx.userId?.split(':')[0] ?? 'anonymous',
  },
})
class TenantQueryTool extends ToolContext<typeof TenantQueryTool> { /* ... */ }

Concurrency Control

Concurrency control uses a distributed semaphore to limit how many instances of a tool or agent can execute simultaneously.

Per-Tool Concurrency

@Tool({
  name: 'report:generate',
  inputSchema: { reportId: z.string() },
  concurrency: {
    maxConcurrent: 3,        // At most 3 simultaneous executions
    queueTimeoutMs: 10_000,  // Wait up to 10s for a slot
    partitionBy: 'global',   // Shared across all users
  },
})
class GenerateReportTool extends ToolContext<typeof GenerateReportTool> {
  async execute({ reportId }: { reportId: string }) {
    return await this.get(ReportService).generate(reportId);
  }
}

Mutex Pattern

Set maxConcurrent: 1 to ensure only one execution at a time:
@Tool({
  name: 'db:migrate',
  inputSchema: { version: z.string() },
  concurrency: { maxConcurrent: 1 },
})
class MigrateTool extends ToolContext<typeof MigrateTool> { /* ... */ }

Queue Behavior

When queueTimeoutMs is set, requests that cannot acquire a slot immediately will wait in a queue:
  • queueTimeoutMs: 0 (default) — Immediately reject if no slot available. Throws ConcurrencyLimitError.
  • queueTimeoutMs: 5000 — Wait up to 5 seconds for a slot. Throws QueueTimeoutError if the wait expires.
The semaphore uses pub/sub notifications when available (Redis) for efficient slot release detection, falling back to polling with exponential backoff.
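The queue behavior can be sketched with a single-process semaphore. This is an illustration of the acquire/queue/release contract only, not the distributed pub/sub-backed semaphore the library uses:

```typescript
// Illustrative in-process semaphore with a queue timeout.
// acquire() resolves true when a slot is granted, or false if the
// queue wait expires (the real guard would throw QueueTimeoutError).
class Semaphore {
  private inUse = 0;
  private waiters: Array<(granted: boolean) => void> = [];
  constructor(private maxConcurrent: number) {}

  async acquire(queueTimeoutMs = 0): Promise<boolean> {
    if (this.inUse < this.maxConcurrent) {
      this.inUse += 1;
      return true;
    }
    if (queueTimeoutMs <= 0) return false; // reject immediately, no queue
    return new Promise((resolve) => {
      const timer = setTimeout(() => {
        const i = this.waiters.indexOf(waiter);
        if (i >= 0) this.waiters.splice(i, 1);
        resolve(false); // queue wait expired
      }, queueTimeoutMs);
      const waiter = (granted: boolean) => {
        clearTimeout(timer);
        resolve(granted);
      };
      this.waiters.push(waiter);
    });
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next(true); // hand the slot directly to the next waiter
    else this.inUse -= 1;
  }
}
```

Handing the slot directly to the next waiter on release is the in-process analogue of the pub/sub notification path; the polling fallback exists because a remote store cannot always push that signal.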

Execution Timeout

Timeout wraps the execute stage with a deadline. If execution exceeds the configured duration, it throws ExecutionTimeoutError.
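The wrapping can be pictured as a race between the work and a deadline. The sketch below is illustrative (the real withTimeout helper lives in @frontmcp/guard; this local ExecutionTimeoutError class stands in for the library's):

```typescript
// Illustrative deadline wrapper: races the work against a timer and
// rejects with a timeout error if the timer wins.
class ExecutionTimeoutError extends Error {}

async function withTimeout<T>(work: Promise<T>, executeMs: number): Promise<T> {
  let timer!: ReturnType<typeof setTimeout>;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new ExecutionTimeoutError(`execution exceeded ${executeMs}ms`)),
      executeMs,
    );
  });
  try {
    return await Promise.race([work, deadline]);
  } finally {
    clearTimeout(timer); // always clear the timer, win or lose
  }
}
```

Note that a race like this abandons the underlying work rather than cancelling it; long-running tools should still check for cancellation internally if they hold resources.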

Per-Tool Timeout

@Tool({
  name: 'llm:summarize',
  inputSchema: { text: z.string() },
  timeout: { executeMs: 30_000 },  // 30-second deadline
})
class SummarizeTool extends ToolContext<typeof SummarizeTool> {
  async execute({ text }: { text: string }) {
    return await this.get(LlmService).summarize(text);
  }
}

Default Timeout

Set a default timeout for all tools and agents at the app level:
@FrontMcp({
  name: 'my-server',
  throttle: {
    enabled: true,
    defaultTimeout: { executeMs: 15_000 },
  },
  tools: [SummarizeTool, SearchTool],
})
class MyApp {}
Per-entity timeout takes precedence over the app default.

IP Filtering

IP filtering allows or blocks requests based on client IP address, supporting IPv4, IPv6, and CIDR ranges.
@FrontMcp({
  name: 'my-server',
  throttle: {
    enabled: true,
    ipFilter: {
      allowList: ['10.0.0.0/8', '172.16.0.0/12'],
      denyList: ['192.0.2.1', '198.51.100.0/24'],
      defaultAction: 'deny',
      trustProxy: true,
      trustedProxyDepth: 1,
    },
  },
  tools: [MyTool],
})
class MyApp {}

Filter Precedence

  1. Deny list is checked first. If matched, the request is blocked with IpBlockedError (403).
  2. Allow list is checked next. If matched, the request proceeds.
  3. Default action applies if neither list matches:
    • 'allow' (default) — Request proceeds.
    • 'deny' — Request is blocked with IpNotAllowedError (403).
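The precedence order above can be sketched as a small decision function. This is simplified to exact string matching for illustration; the real filter also matches CIDR ranges and IPv6 forms:

```typescript
// Illustrative precedence check (exact matches only, no CIDR).
type Action = 'allow' | 'deny';

interface IpFilterSketch {
  allowList: string[];
  denyList: string[];
  defaultAction: Action;
}

function checkIp(ip: string, cfg: IpFilterSketch): Action {
  if (cfg.denyList.includes(ip)) return 'deny';   // 1. deny list first
  if (cfg.allowList.includes(ip)) return 'allow'; // 2. then allow list
  return cfg.defaultAction;                       // 3. otherwise fall back
}
```

Because the deny list is checked first, an IP that appears on both lists is always blocked.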

Supported IP Formats

| Format | Example |
|---|---|
| IPv4 address | 192.168.1.1 |
| IPv4 CIDR | 10.0.0.0/8 |
| IPv6 address | 2001:db8::1 |
| IPv6 CIDR | 2001:db8::/32 |
| IPv4-mapped IPv6 | ::ffff:192.168.1.1 |

Proxy Configuration

When your server is behind a reverse proxy (Nginx, CloudFront, etc.), enable trustProxy to read the client IP from the X-Forwarded-For header:
ipFilter: {
  trustProxy: true,
  trustedProxyDepth: 2,  // Trust up to 2 proxy hops
  // ...
}
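How a proxy depth selects the client IP can be sketched from the X-Forwarded-For format ("client, proxy1, proxy2", appended left to right). This is an illustration of the general technique, not the library's exact parsing:

```typescript
// Illustrative: pick the client IP from X-Forwarded-For, trusting the
// last `trustedProxyDepth` entries as proxies we appended ourselves.
function clientIpFromXff(header: string, trustedProxyDepth: number): string {
  const hops = header.split(',').map((s) => s.trim());
  // The entry just left of the trusted proxies is the client as seen
  // by the outermost trusted proxy; clamp for short chains.
  const index = Math.max(0, hops.length - 1 - trustedProxyDepth);
  return hops[index];
}
```

Trusting too many hops lets a client spoof its IP by pre-filling the header, which is why the depth should match your actual proxy chain.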

App-Level Configuration

The throttle field in @FrontMcp configures all guard features at the app level:
@FrontMcp({
  name: 'production-server',
  throttle: {
    enabled: true,

    // Storage backend (defaults to in-memory)
    storage: { provider: 'redis', host: 'localhost', port: 6379 },
    keyPrefix: 'mcp:guard:',

    // Global limits (checked before per-entity)
    global: { maxRequests: 1000, windowMs: 60_000, partitionBy: 'ip' },
    globalConcurrency: { maxConcurrent: 50 },

    // Defaults for entities without explicit config
    defaultRateLimit: { maxRequests: 60, windowMs: 60_000, partitionBy: 'session' },
    defaultConcurrency: { maxConcurrent: 10 },
    defaultTimeout: { executeMs: 30_000 },

    // IP filtering
    ipFilter: {
      allowList: ['203.0.113.0/24'],
      denyList: ['192.0.2.1'],
      defaultAction: 'allow',
      trustProxy: true,
    },
  },
  tools: [SearchTool, ReportTool],
})
class ProductionApp {}

Configuration Precedence

| Guard Type | Per-Entity Config | App Default | Fallback |
|---|---|---|---|
| Rate limit | @Tool({ rateLimit }) | throttle.defaultRateLimit | No limit |
| Concurrency | @Tool({ concurrency }) | throttle.defaultConcurrency | No limit |
| Timeout | @Tool({ timeout }) | throttle.defaultTimeout | No timeout |
| IP filter | N/A (app-level only) | throttle.ipFilter | No filter |
| Global rate limit | N/A (app-level only) | throttle.global | No limit |
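The resolution rule is simply "per-entity wins, then app default, else no guard". Sketched with an illustrative config shape (not the library's exact types):

```typescript
// Illustrative precedence resolution for one guard type.
interface TimeoutConfigSketch { executeMs: number }

function resolveTimeout(
  perEntity: TimeoutConfigSketch | undefined,
  appDefault: TimeoutConfigSketch | undefined,
): TimeoutConfigSketch | undefined {
  return perEntity ?? appDefault; // undefined means no timeout is applied
}
```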

Storage Backends

Guard supports multiple storage backends for distributed deployments.

Memory (Development)

The default backend. Suitable for single-instance development. No configuration needed.
throttle: {
  enabled: true,
  // storage not set = in-memory
}
In-memory storage does not persist across restarts and does not work with multiple server instances. Use Redis for production.

Redis (Production)

For distributed rate limiting across multiple server instances:
throttle: {
  enabled: true,
  storage: {
    provider: 'redis',
    host: 'redis.example.com',
    port: 6379,
    password: process.env.REDIS_PASSWORD,
    tls: true,
  },
}
Redis enables pub/sub-based semaphore notifications for more efficient concurrency slot release detection.

Vercel KV / Upstash

For serverless environments:
throttle: {
  enabled: true,
  storage: {
    provider: 'vercel-kv',
    url: process.env.KV_REST_API_URL,
    token: process.env.KV_REST_API_TOKEN,
  },
}

Error Handling

Guard throws specific error classes when limits are exceeded:
| Error Class | Code | HTTP Status | When Thrown |
|---|---|---|---|
| RateLimitError | RATE_LIMIT_EXCEEDED | 429 | Request exceeds rate limit |
| ConcurrencyLimitError | CONCURRENCY_LIMIT | 429 | No concurrency slot available |
| QueueTimeoutError | QUEUE_TIMEOUT | 429 | Queue wait time exceeded |
| ExecutionTimeoutError | EXECUTION_TIMEOUT | 408 | Execution exceeded deadline |
| IpBlockedError | IP_BLOCKED | 403 | Client IP is on deny list |
| IpNotAllowedError | IP_NOT_ALLOWED | 403 | Client IP not on allow list |
These errors are automatically serialized to appropriate MCP error responses by the transport layer.
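If you surface these errors outside the transport layer (for example in a custom HTTP gateway), the code-to-status mapping from the table can be reproduced directly. The function below is a sketch based on that table, not a library export:

```typescript
// Illustrative mapping from guard error codes to HTTP status codes,
// mirroring the table above.
function statusForGuardCode(code: string): number {
  switch (code) {
    case 'RATE_LIMIT_EXCEEDED':
    case 'CONCURRENCY_LIMIT':
    case 'QUEUE_TIMEOUT':
      return 429; // too many requests
    case 'EXECUTION_TIMEOUT':
      return 408; // request timeout
    case 'IP_BLOCKED':
    case 'IP_NOT_ALLOWED':
      return 403; // forbidden
    default:
      return 500; // unknown guard error
  }
}
```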

Agent Guard

Agents support the same guard options as tools:
@Agent({
  name: 'research-agent',
  description: 'Research assistant',
  rateLimit: { maxRequests: 10, windowMs: 60_000, partitionBy: 'userId' },
  concurrency: { maxConcurrent: 2 },
  timeout: { executeMs: 120_000 },
})
class ResearchAgent extends AgentContext<typeof ResearchAgent> {
  async execute(input: unknown) {
    // Agent execution with guard protection
  }
}
The agent flow follows the same stage ordering: acquireQuota → acquireSemaphore → execute (with timeout) → releaseSemaphore → releaseQuota.

Configuration Reference

RateLimitConfig

| Field | Type | Default | Description |
|---|---|---|---|
| maxRequests | number | required | Maximum requests allowed in the window |
| windowMs | number | 60000 | Time window in milliseconds |
| partitionBy | PartitionKey | 'global' | Partition strategy for bucketing |

ConcurrencyConfig

| Field | Type | Default | Description |
|---|---|---|---|
| maxConcurrent | number | required | Maximum simultaneous executions |
| queueTimeoutMs | number | 0 | Max wait time for a slot (0 = no wait) |
| partitionBy | PartitionKey | 'global' | Partition strategy for bucketing |

TimeoutConfig

| Field | Type | Default | Description |
|---|---|---|---|
| executeMs | number | required | Maximum execution time in milliseconds |

IpFilterConfig

| Field | Type | Default | Description |
|---|---|---|---|
| allowList | string[] | [] | IPs or CIDR ranges to always allow |
| denyList | string[] | [] | IPs or CIDR ranges to always block |
| defaultAction | 'allow' \| 'deny' | 'allow' | Action when IP matches neither list |
| trustProxy | boolean | false | Trust X-Forwarded-For header |
| trustedProxyDepth | number | 1 | Max proxy hops to trust |

GuardConfig (App-Level)

| Field | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | required | Enable or disable all guard features |
| storage | StorageConfig | in-memory | Storage backend configuration |
| keyPrefix | string | 'mcp:guard:' | Prefix for all storage keys |
| global | RateLimitConfig | (none) | Global rate limit for all requests |
| globalConcurrency | ConcurrencyConfig | (none) | Global concurrency limit |
| defaultRateLimit | RateLimitConfig | (none) | Default per-entity rate limit |
| defaultConcurrency | ConcurrencyConfig | (none) | Default per-entity concurrency |
| defaultTimeout | TimeoutConfig | (none) | Default per-entity timeout |
| ipFilter | IpFilterConfig | (none) | IP filtering configuration |