Production & Scaling

This guide covers everything you need to deploy CodeCall in production: performance optimization, monitoring, multi-instance deployment, and operational best practices.

Production Checklist

Before deploying CodeCall to production, complete these steps:

Choose the right VM preset

Use secure for most production workloads and locked_down for sensitive data. Never use experimental in production.

Configure tool access control

Set up includeTools filtering and per-tool codecall metadata to limit which tools are accessible.

Enable audit logging

Configure audit sinks to track script execution, tool calls, and security events.

Set up rate limiting

Add rate limits on codecall:execute to prevent abuse.

Configure monitoring and alerting

Track execution latency, error rates, and security blocks with your observability stack.

Test security boundaries

Run the attack vector tests from AST Guard’s security audit suite.

Plan gradual rollout

Start with metadata_driven mode, move to codecall_opt_in, then migrate to codecall_only once validated.

Performance Characteristics

Latency Breakdown

Stage	Typical Time	Notes
AST Parsing	1-5ms	Scales with code size
AST Validation	2-10ms	Depends on rule count
Code Transformation	1-3ms	One-time per script
VM Execution	Variable	Depends on script complexity
Tool Calls	Variable	Network/database bound
Output Sanitization	1-5ms	Scales with output size

Total overhead (excluding tool calls): ~8-25ms for typical scripts.

Throughput

Configuration	Requests/sec	Notes
Single instance, TF-IDF	~500	Bottleneck: VM isolation
Single instance, ML	~200	Bottleneck: Model inference
Multi-instance (4 pods)	~1,500+	Near-linear scaling

Throughput depends heavily on script complexity and tool call latency. These numbers assume simple scripts with 1-3 tool calls.

Performance Optimization

1. Use TF-IDF for Most Cases

Unless you have 100+ tools with similar descriptions, TF-IDF provides excellent relevance with minimal overhead:

CodeCallPlugin.init({
  embedding: {
    strategy: 'tfidf',  // 10x faster than ml
  },
});

2. Enable ML for Large Toolsets

For 100+ tools with similar descriptions, the ML strategy provides better semantic matching:

CodeCallPlugin.init({
  embedding: {
    strategy: 'ml',
    useHNSW: true,  // For 1000+ tools
  },
});

3. Use Direct Invoke for Simple Calls

Bypass VM overhead for single-tool operations:

// Instead of
{
  "tool": "codecall:execute",
  "input": {
    "script": "return await callTool('users:getById', { id: '123' });"
  }
}

// Use
{
  "tool": "codecall:invoke",
  "input": {
    "tool": "users:getById",
    "input": { "id": "123" }
  }
}

Savings: ~15-20ms per call.

4. Cache Describe Results

Tool schemas rarely change. CodeCall internally caches describe and search results to reduce overhead on repeated calls.

Multi-Instance Deployment

CodeCall is stateless and scales horizontally.

Architecture

Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: mcp-server
          image: your-mcp-server:latest
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30

Resource Recommendations

Workload	CPU	Memory	Instances
Light (<100 req/min)	0.5 core	512MB	1-2
Medium (100-500 req/min)	1 core	1GB	2-4
Heavy (500+ req/min)	2 cores	2GB	4+

The ml embedding strategy requires additional memory (~200MB) for the transformer model. Account for this in resource limits.
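The table above can be encoded as a small lookup for capacity planning. This is a sketch only: the function name, the return shape, and the choice to treat exactly 500 req/min as "heavy" are assumptions made for illustration, not part of CodeCall.

```typescript
// Sizing tiers taken from the Resource Recommendations table.
interface Recommendation {
  tier: "light" | "medium" | "heavy";
  cpuCores: number;
  memoryMb: number;
  instances: string;
}

// Approximate extra memory for the transformer model (ml strategy).
const ML_MODEL_OVERHEAD_MB = 200;

// Hypothetical helper: maps a request rate to a row of the table.
function recommendResources(reqPerMin: number, usesMlEmbeddings: boolean): Recommendation {
  let rec: Recommendation;
  if (reqPerMin < 100) {
    rec = { tier: "light", cpuCores: 0.5, memoryMb: 512, instances: "1-2" };
  } else if (reqPerMin < 500) {
    rec = { tier: "medium", cpuCores: 1, memoryMb: 1024, instances: "2-4" };
  } else {
    // Assumption: exactly 500 req/min is treated as heavy.
    rec = { tier: "heavy", cpuCores: 2, memoryMb: 2048, instances: "4+" };
  }
  if (usesMlEmbeddings) {
    rec = { ...rec, memoryMb: rec.memoryMb + ML_MODEL_OVERHEAD_MB };
  }
  return rec;
}
```

Treat the result as a starting point and adjust it against measured memory and CPU usage.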

Monitoring

Metrics to Track

Execution Latency

Track p50, p95, p99 of codecall:execute duration

Error Rate

Monitor validation errors, timeouts, and tool failures

Tool Call Count

Average tool calls per script execution

Search Latency

Track search response times for index health
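If your observability stack does not compute percentiles for you, the p50/p95/p99 values can be derived from a window of raw duration samples. The sketch below uses the nearest-rank method; the function names are illustrative, and most metrics backends offer histogram-based equivalents that scale better.

```typescript
// Nearest-rank percentile over a window of latency samples (milliseconds).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  // 1-based rank; multiply before dividing to stay in exact integer math
  // for whole-number percentiles.
  const rank = Math.ceil((p * sorted.length) / 100);
  const index = Math.min(sorted.length, Math.max(1, rank)) - 1;
  return sorted[index];
}

// Summarize codecall:execute durations into the three tracked percentiles.
function summarizeLatency(samples: number[]): { p50: number; p95: number; p99: number } {
  return {
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
  };
}
```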

Logging

CodeCall’s internal AuditLoggerService emits structured log events that your logging infrastructure can consume. Example log events:

// Script execution start
{ "event": "codecall:execute:start", "executionId": "abc123", "scriptSize": 245 }

// Tool call
{ "event": "codecall:tool:call", "executionId": "abc123", "tool": "users:list", "duration": 45 }

// Script execution complete
{ "event": "codecall:execute:complete", "executionId": "abc123", "status": "ok", "duration": 234, "toolCalls": 3 }

// Security event
{ "event": "codecall:security:blocked", "reason": "self_reference", "tool": "codecall:execute" }
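One way to turn these events into metrics is to fold them into counters. The sketch below assumes the events arrive as one JSON object per line, that duration is in milliseconds, and that any status other than "ok" counts as a failure; none of this is confirmed by the event examples above, and the type and function names are illustrative.

```typescript
// Event shapes mirror the fields shown in the example log events.
type CodeCallEvent =
  | { event: "codecall:execute:start"; executionId: string; scriptSize: number }
  | { event: "codecall:tool:call"; executionId: string; tool: string; duration: number }
  | { event: "codecall:execute:complete"; executionId: string; status: string; duration: number; toolCalls: number }
  | { event: "codecall:security:blocked"; reason: string; tool: string };

interface Counters {
  executions: number;
  failures: number;
  toolCalls: number;
  securityBlocks: number;
  totalDurationMs: number;
}

// Fold a batch of JSON log lines into aggregate counters.
function aggregateEvents(lines: string[]): Counters {
  const c: Counters = { executions: 0, failures: 0, toolCalls: 0, securityBlocks: 0, totalDurationMs: 0 };
  for (const line of lines) {
    const e = JSON.parse(line) as CodeCallEvent;
    switch (e.event) {
      case "codecall:execute:complete":
        c.executions += 1;
        c.toolCalls += e.toolCalls;
        c.totalDurationMs += e.duration;
        if (e.status !== "ok") c.failures += 1; // assumption: non-"ok" means failure
        break;
      case "codecall:security:blocked":
        c.securityBlocks += 1;
        break;
      default:
        break; // start and per-tool events are not counted in this sketch
    }
  }
  return c;
}
```

From these counters, the error rate is failures / executions and the average tool calls per script is toolCalls / executions.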

Alerting Recommendations

Metric	Warning	Critical
Execute p99 latency	> 2s	> 5s
Error rate	> 5%	> 15%
Timeout rate	> 1%	> 5%
Security blocks	Any	High volume
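The thresholds above translate directly into a severity check. In this sketch the snapshot shape and function names are hypothetical, rates are expressed as fractions (0.05 for 5%), and because the table does not define "high volume" for security blocks, the caller supplies that threshold.

```typescript
type Severity = "ok" | "warning" | "critical";

// Generic two-level threshold check: strictly greater than, as in the table.
function classify(value: number, warnAbove: number, critAbove: number): Severity {
  if (value > critAbove) return "critical";
  if (value > warnAbove) return "warning";
  return "ok";
}

// Hypothetical snapshot of the metrics from the alerting table.
interface MetricsSnapshot {
  executeP99Ms: number;
  errorRate: number;   // fraction of executions that failed
  timeoutRate: number; // fraction of executions that timed out
  securityBlocks: number;
}

function evaluateAlerts(s: MetricsSnapshot, highVolumeBlocks: number): Record<keyof MetricsSnapshot, Severity> {
  return {
    executeP99Ms: classify(s.executeP99Ms, 2000, 5000),
    errorRate: classify(s.errorRate, 0.05, 0.15),
    timeoutRate: classify(s.timeoutRate, 0.01, 0.05),
    // Any security block is a warning; "high volume" is caller-defined.
    securityBlocks: classify(s.securityBlocks, 0, highVolumeBlocks),
  };
}
```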

Cost Optimization

Token Savings

CodeCall dramatically reduces token usage:

Scenario	Without CodeCall	With CodeCall	Savings
100 tools in context	~25,000 tokens	~3,000 tokens	88%
Multi-tool workflow (5 calls)	~50,000 tokens	~5,000 tokens	90%
Complex filtering	~100,000 tokens	~8,000 tokens	92%
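The savings column is the relative reduction in tokens, (without - with) / without. A short check of the table's figures, with an illustrative function name:

```typescript
// Percentage of tokens saved, rounded to the nearest whole percent.
function savingsPercent(withoutCodeCall: number, withCodeCall: number): number {
  return Math.round(((withoutCodeCall - withCodeCall) / withoutCodeCall) * 100);
}

// Figures from the Token Savings table.
const toolsInContext = savingsPercent(25_000, 3_000);    // 88
const multiToolWorkflow = savingsPercent(50_000, 5_000); // 90
const complexFiltering = savingsPercent(100_000, 8_000); // 92
```

The same function can be applied to token counts measured in your own deployment to compare against these estimates.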

Compute Costs

Factor	Impact	Optimization
VM isolation	~10ms overhead	Use `codecall:invoke` for simple calls
Embedding inference	~50ms/query	Use TF-IDF for fewer than 100 tools
Tool calls	Dominant cost	Optimize underlying tools

Cost vs. Performance Tradeoffs

Minimize Latency

Use TF-IDF search
Enable caching for describe/search
Use direct invoke for simple calls
Increase VM timeout for complex scripts

Minimize Compute

Use locked_down preset (shorter timeouts)
Limit maxToolCalls aggressively
Cache aggressively
Use fewer instances with more resources

Minimize Tokens

Use codecall_only mode
Hide all tools from list_tools
Return minimal data from tools
Let scripts filter server-side

Security in Production

Checklist

Use secure or locked_down preset

Never use experimental in production.

Enable audit logging

Log all script executions and security events.

Configure rate limiting

Prevent abuse via aggressive rate limits.

Monitor security events

Alert on validation failures and self-reference attempts.

Regular security reviews

Review tool allowlists and filter rules quarterly.

Rate Limiting

Rate limiting should be handled at the infrastructure level (reverse proxy, API gateway) or with middleware. Configure limits on codecall:execute to prevent abuse.
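If you implement the limit in middleware rather than at a gateway, a token bucket per caller is a common approach. The following is a minimal in-memory sketch, not a CodeCall API: the class, the allowExecute function, and the burst and refill values are assumptions, and a multi-instance deployment would need a shared store instead of a local Map.

```typescript
// Minimal token bucket: allows bursts up to `capacity`, refilled at a steady rate.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;
    this.lastRefillMs = nowMs;
  }

  // Returns true if the request may proceed, false if it should be rejected.
  tryConsume(nowMs: number = Date.now()): boolean {
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

const buckets = new Map<string, TokenBucket>();

// Hypothetical gate to call before forwarding a codecall:execute request.
// callerId could be a tenant ID or API key.
function allowExecute(callerId: string, nowMs: number = Date.now()): boolean {
  let bucket = buckets.get(callerId);
  if (!bucket) {
    bucket = new TokenBucket(5, 1, nowMs); // illustrative: burst of 5, then 1 request/second
    buckets.set(callerId, bucket);
  }
  return bucket.tryConsume(nowMs);
}
```

Passing the clock as a parameter keeps the limiter deterministic in tests; production code can rely on the Date.now() default.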

Multi-Tenancy Patterns

CodeCall supports multiple isolation strategies for multi-tenant deployments.

Tenant Context

Pass tenant information via codecallContext:

{
  "tool": "codecall:execute",
  "input": {
    "script": "...",
    "context": {
      "tenantId": "acme-corp",
      "userId": "user-123",
      "permissions": ["read", "write"]
    }
  }
}

Per-Tenant Tool Filtering

Restrict tools based on tenant using the includeTools filter:

CodeCallPlugin.init({
  includeTools: (tool) => {
    // Block admin tools from CodeCall
    if (tool.name.startsWith('admin:')) return false;

    // Filter by app ownership
    if (tool.metadata?.codecall?.appId) {
      return ['user-service', 'billing'].includes(tool.metadata.codecall.appId);
    }

    return true;
  },
});

Isolation Strategies

Strategy	Isolation Level	Cost	Use Case
Shared instance	Low	$	Dev/staging
Tenant-specific limits	Medium	$$	SaaS standard
Dedicated instances	Maximum	$$$$	Compliance-heavy

Troubleshooting

Common Issues

Scripts timing out

Symptoms: Frequent TIMEOUT errors

Causes:

Script too complex
Tool calls too slow
Timeout too aggressive

Solutions:

Profile tool call latency
Increase vm.timeoutMs if tools are slow
Break complex scripts into smaller pieces
Use Promise.all() for independent tool calls
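The last solution is worth illustrating: when tool calls do not depend on each other, awaiting them together means the script waits for the slowest call rather than the sum of all calls. In the sketch below, callTool is a local stand-in for the function available inside CodeCall scripts, and orders:listByUser is a hypothetical tool name.

```typescript
type ToolInput = Record<string, unknown>;

// Stand-in for the sandbox-provided callTool; it simply echoes its arguments.
async function callTool(name: string, input: ToolInput): Promise<{ tool: string; input: ToolInput }> {
  return { tool: name, input };
}

// Sequential: total latency is roughly the sum of both calls.
async function loadSequential(userId: string) {
  const user = await callTool("users:getById", { id: userId });
  const orders = await callTool("orders:listByUser", { userId });
  return { user, orders };
}

// Parallel: both calls start immediately; total latency is roughly the slower one.
async function loadParallel(userId: string) {
  const [user, orders] = await Promise.all([
    callTool("users:getById", { id: userId }),
    callTool("orders:listByUser", { userId }),
  ]);
  return { user, orders };
}
```

Keep calls sequential when a later call needs the result of an earlier one.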

Search returning irrelevant results

Symptoms: Low relevance scores, wrong tools returned

Causes:

Poor tool descriptions
Threshold too low
TF-IDF limitations

Solutions:

Improve tool descriptions
Switch to ml strategy for semantic matching
Add more specific keywords to descriptions

High memory usage

Symptoms: OOM errors, pod restarts

Causes:

Embedding model loaded
Large tool index
Scripts returning large data

Solutions:

Use TF-IDF instead of embeddings
Increase memory limits
Configure output sanitization limits
Enable HNSW for large indexes

Validation errors for valid code

Symptoms: Scripts rejected that should work

Causes:

Using blocked constructs
Reserved prefix collision
Unicode issues

Solutions:

Check for eval, Function, etc.
Avoid __ag_ and __safe_ prefixes
Use ASCII identifiers
Review AST Guard rules
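A quick text-level pre-check can catch the most common causes before a script is submitted. This is a heuristic sketch based only on the items listed above, with an illustrative function name; it uses regular expressions rather than a parser, so it can flag harmless text (for example non-ASCII characters inside string literals), and AST Guard remains the authority on what is accepted.

```typescript
// Returns a list of likely validation problems; empty means none were detected.
function preflightScript(script: string): string[] {
  const problems: string[] = [];

  // Blocked constructs: eval(...) and Function(...) / new Function(...).
  if (/\b(?:eval|Function)\s*\(/.test(script)) {
    problems.push("blocked construct: eval or Function");
  }

  // Identifiers starting with the reserved prefixes.
  if (/\b__(?:ag|safe)_/.test(script)) {
    problems.push("reserved prefix: __ag_ or __safe_");
  }

  // Any character outside the ASCII range.
  if (/[^\x00-\x7F]/.test(script)) {
    problems.push("non-ASCII characters present");
  }

  return problems;
}
```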

Migration & Rollback

Gradual Rollout

Phase 1: Deploy with mode: 'metadata_driven'
- All tools visible normally
- Mark select tools for CodeCall
- Monitor for issues
Phase 2: Switch to mode: 'codecall_opt_in'
- Tools opt into CodeCall
- Both access methods work
- Measure token savings
Phase 3: Move to mode: 'codecall_only'
- Hide tools from list_tools
- Full CodeCall experience
- Maximum token savings

Rollback Plan

// Emergency rollback: disable CodeCall
@App({
  plugins: process.env.CODECALL_ENABLED === 'false'
    ? []
    : [CodeCallPlugin.init({ ... })],
})

Put CodeCall behind a feature flag, such as the CODECALL_ENABLED environment variable above, so you can roll back quickly without redeploying.

Related

Configuration

All configuration options including VM presets and embedding strategies

Security Model

Defense-in-depth security architecture and settings

API Reference

Meta-tool schemas, error codes, and debugging guide

Deployment Guide

General FrontMCP production deployment
