This guide covers everything you need to deploy CodeCall in production: performance optimization, monitoring, multi-instance deployment, and operational best practices.

Production Checklist

Before deploying CodeCall to production, complete these steps:
  1. Choose the right VM preset
     Use secure for most production workloads and locked_down for sensitive data. Never use experimental in production.
  2. Configure tool access control
     Set up includeTools filtering and per-tool codecall metadata to limit which tools are accessible.
  3. Enable audit logging
     Configure audit sinks to track script execution, tool calls, and security events.
  4. Set up rate limiting
     Add rate limits on codecall:execute to prevent abuse.
  5. Configure monitoring and alerting
     Track execution latency, error rates, and security blocks with your observability stack.
  6. Test security boundaries
     Run the attack vector tests from AST Guard’s security audit suite.
  7. Plan gradual rollout
     Start with metadata_driven mode, then migrate to codecall_only once validated.

Performance Characteristics

Latency Breakdown

| Stage | Typical Time | Notes |
| --- | --- | --- |
| AST Parsing | 1-5ms | Scales with code size |
| AST Validation | 2-10ms | Depends on rule count |
| Code Transformation | 1-3ms | One-time per script |
| VM Execution | Variable | Depends on script complexity |
| Tool Calls | Variable | Network/database bound |
| Output Sanitization | 1-5ms | Scales with output size |

Total overhead (excluding tool calls): ~8-25ms for typical scripts.

Throughput

| Configuration | Requests/sec | Notes |
| --- | --- | --- |
| Single instance, TF-IDF | ~500 | Bottleneck: VM isolation |
| Single instance, ML | ~200 | Bottleneck: Model inference |
| Multi-instance (4 pods) | ~1,500+ | Near-linear scaling |

Throughput depends heavily on script complexity and tool call latency. These numbers assume simple scripts with 1-3 tool calls.
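
As a rough sizing aid, the single-instance figures above can be turned into an instance estimate. The sketch below is not part of CodeCall: `estimateInstances` and its `headroom` parameter are hypothetical names, and it assumes the near-linear scaling that the table only suggests. Treat the result as a starting point for load testing, not a guarantee.

```typescript
// Rough sizing sketch. Assumes near-linear horizontal scaling; validate
// against your own load tests before relying on the result.
type Strategy = 'tfidf' | 'ml';

// Approximate single-instance throughput from the table above (requests/sec).
const PER_INSTANCE_RPS: Record<Strategy, number> = { tfidf: 500, ml: 200 };

// headroom is a hypothetical safety factor: 0.6 means each instance is
// planned to run at 60% of its measured capacity.
function estimateInstances(targetRps: number, strategy: Strategy, headroom: number = 0.6): number {
  if (targetRps <= 0) return 1;
  const usableRps = PER_INSTANCE_RPS[strategy] * headroom;
  return Math.max(1, Math.ceil(targetRps / usableRps));
}

console.log(estimateInstances(1000, 'tfidf'));
console.log(estimateInstances(100, 'ml'));
```

Scripts with many or slow tool calls will land well below these per-instance figures, so measure with a representative workload.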

Performance Optimization

1. Use TF-IDF for Most Cases

Unless you have 100+ tools with similar descriptions, TF-IDF provides excellent relevance with minimal overhead:
CodeCallPlugin.init({
  embedding: {
    strategy: 'tfidf',  // 10x faster than ml
  },
});

2. Enable ML for Large Toolsets

For 100+ tools with similar descriptions, the ML strategy provides better semantic matching:
CodeCallPlugin.init({
  embedding: {
    strategy: 'ml',
    useHNSW: true,  // For 1000+ tools
  },
});

3. Use Direct Invoke for Simple Calls

Bypass VM overhead for single-tool operations:
// Instead of
{
  "tool": "codecall:execute",
  "input": {
    "script": "return await callTool('users:getById', { id: '123' });"
  }
}

// Use
{
  "tool": "codecall:invoke",
  "input": {
    "tool": "users:getById",
    "input": { "id": "123" }
  }
}
Savings: ~15-20ms per call.

4. Cache Describe Results

Tool schemas rarely change. CodeCall internally caches describe and search results to reduce overhead on repeated calls.

Multi-Instance Deployment

CodeCall is stateless and scales horizontally.

Architecture

Kubernetes Deployment

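apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server
spec:
  replicas: 3
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      app: mcp-server
  template:
    metadata:
      labels:
        app: mcp-server
    spec:
      containers:
        - name: mcp-server
          image: your-mcp-server:latest
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
          # Remove the pod from service until /health responds
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 10
          # Restart the container if /health stops responding
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30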

Resource Recommendations

| Workload | CPU | Memory | Instances |
| --- | --- | --- | --- |
| Light (<100 req/min) | 0.5 core | 512MB | 1-2 |
| Medium (100-500 req/min) | 1 core | 1GB | 2-4 |
| Heavy (500+ req/min) | 2 cores | 2GB | 4+ |

The ML embedding strategy requires additional memory (~200MB) for the transformer model. Account for this in resource limits.

Monitoring

Metrics to Track

  • Execution Latency: Track p50, p95, and p99 of codecall:execute duration
  • Error Rate: Monitor validation errors, timeouts, and tool failures
  • Tool Call Count: Average tool calls per script execution
  • Search Latency: Track search response times for index health
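
If your observability stack does not compute percentiles for you, they can be derived from a window of recorded durations. The following is a minimal sketch using the nearest-rank method; `percentile` and `summarize` are illustrative helpers, not CodeCall APIs, and a production setup would more likely rely on histogram metrics in the monitoring backend.

```typescript
// Nearest-rank percentile over durations (ms) that are already sorted ascending.
function percentile(sortedMs: number[], p: number): number {
  if (sortedMs.length === 0) throw new Error('no samples');
  // Multiply before dividing so whole-number ranks stay exact.
  const rank = Math.ceil((p * sortedMs.length) / 100);
  const index = Math.min(sortedMs.length - 1, Math.max(0, rank - 1));
  return sortedMs[index];
}

// Summarize a window of codecall:execute durations.
function summarize(durationsMs: number[]): { p50: number; p95: number; p99: number } {
  const sorted = [...durationsMs].sort((a, b) => a - b);
  return {
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
  };
}

// Example: durations of 1ms through 100ms.
const sample = Array.from({ length: 100 }, (_, i) => i + 1);
console.log(summarize(sample));
```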

Logging

CodeCall’s internal AuditLoggerService emits structured log events that your logging infrastructure can consume. Example log events:
// Script execution start
{ "event": "codecall:execute:start", "executionId": "abc123", "scriptSize": 245 }

// Tool call
{ "event": "codecall:tool:call", "executionId": "abc123", "tool": "users:list", "duration": 45 }

// Script execution complete
{ "event": "codecall:execute:complete", "executionId": "abc123", "status": "ok", "duration": 234, "toolCalls": 3 }

// Security event
{ "event": "codecall:security:blocked", "reason": "self_reference", "tool": "codecall:execute" }
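
To show how these events might feed the metrics above, here is a minimal sketch that aggregates JSON log lines into counters. The event shapes are inferred from the samples shown here and real events may carry more fields; `summarizeEvents` is a hypothetical helper, and treating any status other than "ok" as an error is an assumption.

```typescript
// Event shapes inferred from the sample log lines; not an official schema.
type CodeCallEvent =
  | { event: 'codecall:execute:start'; executionId: string; scriptSize: number }
  | { event: 'codecall:tool:call'; executionId: string; tool: string; duration: number }
  | { event: 'codecall:execute:complete'; executionId: string; status: string; duration: number; toolCalls: number }
  | { event: 'codecall:security:blocked'; reason: string; tool: string };

interface Summary {
  executions: number;
  errors: number;
  securityBlocks: number;
  toolTimeMs: Map<string, number>; // total time spent per tool
}

function summarizeEvents(lines: string[]): Summary {
  const summary: Summary = { executions: 0, errors: 0, securityBlocks: 0, toolTimeMs: new Map() };
  for (const line of lines) {
    const e = JSON.parse(line) as CodeCallEvent;
    switch (e.event) {
      case 'codecall:execute:complete':
        summary.executions += 1;
        // Assumption: any status other than "ok" counts as an error.
        if (e.status !== 'ok') summary.errors += 1;
        break;
      case 'codecall:tool:call':
        summary.toolTimeMs.set(e.tool, (summary.toolTimeMs.get(e.tool) ?? 0) + e.duration);
        break;
      case 'codecall:security:blocked':
        summary.securityBlocks += 1;
        break;
    }
  }
  return summary;
}
```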

Alerting Recommendations

| Metric | Warning | Critical |
| --- | --- | --- |
| Execute p99 latency | > 2s | > 5s |
| Error rate | > 5% | > 15% |
| Timeout rate | > 1% | > 5% |
| Security blocks | Any | High volume |
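
The numeric thresholds above can be encoded as a small classifier, sketched below under the assumption that latency is measured in milliseconds and rates are expressed as fractions. The names `THRESHOLDS` and `classify` are illustrative. Security blocks are left out because "high volume" has no numeric definition here.

```typescript
type Severity = 'ok' | 'warning' | 'critical';
type MetricName = 'executeP99Ms' | 'errorRate' | 'timeoutRate';

interface Threshold {
  warning: number;
  critical: number;
}

// Values copied from the alerting table: latency in ms, rates as fractions.
const THRESHOLDS: Record<MetricName, Threshold> = {
  executeP99Ms: { warning: 2000, critical: 5000 },
  errorRate: { warning: 0.05, critical: 0.15 },
  timeoutRate: { warning: 0.01, critical: 0.05 },
};

// A value must strictly exceed a threshold to trigger it, matching the ">" in the table.
function classify(metric: MetricName, value: number): Severity {
  const t = THRESHOLDS[metric];
  if (value > t.critical) return 'critical';
  if (value > t.warning) return 'warning';
  return 'ok';
}

console.log(classify('executeP99Ms', 2500));
console.log(classify('errorRate', 0.2));
```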

Cost Optimization

Token Savings

CodeCall dramatically reduces token usage:
| Scenario | Without CodeCall | With CodeCall | Savings |
| --- | --- | --- | --- |
| 100 tools in context | ~25,000 tokens | ~3,000 tokens | 88% |
| Multi-tool workflow (5 calls) | ~50,000 tokens | ~5,000 tokens | 90% |
| Complex filtering | ~100,000 tokens | ~8,000 tokens | 92% |
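
The Savings column is the relative reduction in tokens. As a worked check, this short sketch recomputes it from the approximate token counts in the table; `savingsPercent` is an illustrative helper, not a CodeCall API, and you can substitute token counts measured from your own workloads.

```typescript
// Relative token reduction, rounded to a whole percent.
function savingsPercent(withoutTokens: number, withTokens: number): number {
  if (withoutTokens <= 0) throw new Error('withoutTokens must be positive');
  return Math.round((1 - withTokens / withoutTokens) * 100);
}

// Approximate figures from the table above.
console.log(savingsPercent(25_000, 3_000));
console.log(savingsPercent(50_000, 5_000));
console.log(savingsPercent(100_000, 8_000));
```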

Compute Costs

| Factor | Impact | Optimization |
| --- | --- | --- |
| VM isolation | ~10ms overhead | Use codecall:invoke for simple calls |
| Embedding inference | ~50ms/query | Use TF-IDF for fewer than 100 tools |
| Tool calls | Dominant cost | Optimize underlying tools |

Cost vs. Performance Tradeoffs

To optimize for speed:
  • Use TF-IDF search
  • Enable caching for describe/search
  • Use direct invoke for simple calls
  • Increase VM timeout for complex scripts

To optimize for cost:
  • Use locked_down preset (shorter timeouts)
  • Limit maxToolCalls aggressively
  • Cache aggressively
  • Use fewer instances with more resources

To optimize for token usage:
  • Use codecall_only mode
  • Hide all tools from list_tools
  • Return minimal data from tools
  • Let scripts filter server-side

Security in Production

Checklist

  1. Use secure or locked_down preset
     Never use experimental in production.
  2. Enable audit logging
     Log all script executions and security events.
  3. Configure rate limiting
     Prevent abuse via aggressive rate limits.
  4. Monitor security events
     Alert on validation failures and self-reference attempts.
  5. Regular security reviews
     Review tool allowlists and filter rules quarterly.

Rate Limiting

Rate limiting should be handled at the infrastructure level (reverse proxy, API gateway) or with middleware. Configure limits on codecall:execute to prevent abuse.
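
If you opt for middleware rather than a gateway, a token bucket is one common approach. The sketch below is a minimal in-memory version, not something CodeCall provides: `TokenBucket`, `allowExecute`, the caller id key, and the capacity and refill values are all hypothetical. Because the state lives in one process, a multi-instance deployment would need a shared store or gateway-level limits instead.

```typescript
// Minimal in-memory token bucket: up to `capacity` burst requests, refilled
// continuously at `refillPerSec` tokens per second.
class TokenBucket {
  private tokens: number;
  private last: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number, now: number = Date.now()) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity;
    this.last = now;
  }

  // Returns true and spends a token if the request is allowed.
  tryConsume(now: number = Date.now()): boolean {
    const elapsedSec = Math.max(0, now - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.last = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// One bucket per caller. The limits (burst of 10, 5 per second) are placeholders.
const buckets = new Map<string, TokenBucket>();

function allowExecute(callerId: string, now: number = Date.now()): boolean {
  let bucket = buckets.get(callerId);
  if (!bucket) {
    bucket = new TokenBucket(10, 5, now);
    buckets.set(callerId, bucket);
  }
  return bucket.tryConsume(now);
}
```

A request handler would call `allowExecute` before forwarding a codecall:execute call and reject the request when it returns false.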

Multi-Tenancy Patterns

CodeCall supports multiple isolation strategies for multi-tenant deployments.

Tenant Context

Pass tenant information via codecallContext:
{
  "tool": "codecall:execute",
  "input": {
    "script": "...",
    "context": {
      "tenantId": "acme-corp",
      "userId": "user-123",
      "permissions": ["read", "write"]
    }
  }
}

Per-Tenant Tool Filtering

Restrict tools based on tenant using the includeTools filter:
CodeCallPlugin.init({
  includeTools: (tool) => {
    // Block admin tools from CodeCall
    if (tool.name.startsWith('admin:')) return false;

    // Filter by app ownership
    if (tool.metadata?.codecall?.appId) {
      return ['user-service', 'billing'].includes(tool.metadata.codecall.appId);
    }

    return true;
  },
});
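
The filter above hard-codes one list of apps. One way to vary it per tenant is to build the predicate from a policy table, as in this sketch. `ToolLike`, `TENANT_APPS`, and `makeTenantFilter` are hypothetical names, the tool shape is reduced to the fields the filter reads, and how a per-tenant predicate is wired into CodeCallPlugin.init (for example, one plugin instance per tenant) is left open here.

```typescript
// Minimal tool shape used by the filter; the real CodeCall type may differ.
interface ToolLike {
  name: string;
  metadata?: { codecall?: { appId?: string } };
}

// Hypothetical policy table; in practice this would come from your own config store.
const TENANT_APPS: Record<string, string[]> = {
  'acme-corp': ['user-service', 'billing'],
  'globex': ['user-service'],
};

// Builds an includeTools-style predicate for a single tenant.
function makeTenantFilter(tenantId: string): (tool: ToolLike) => boolean {
  const allowedApps = new Set(TENANT_APPS[tenantId] ?? []);
  return (tool) => {
    // Admin tools are never exposed through CodeCall.
    if (tool.name.startsWith('admin:')) return false;
    const appId = tool.metadata?.codecall?.appId;
    // Tools without an appId stay visible, matching the filter above.
    return !appId || allowedApps.has(appId);
  };
}

const globexFilter = makeTenantFilter('globex');
console.log(globexFilter({ name: 'billing:charge', metadata: { codecall: { appId: 'billing' } } }));
```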

Isolation Strategies

| Strategy | Isolation Level | Cost | Use Case |
| --- | --- | --- | --- |
| Shared instance | Low | $ | Dev/staging |
| Tenant-specific limits | Medium | $$ | SaaS standard |
| Dedicated instances | Maximum | $$$$ | Compliance-heavy |

Troubleshooting

Common Issues

Symptoms: Frequent TIMEOUT errors
Causes:
  • Script too complex
  • Tool calls too slow
  • Timeout too aggressive
Solutions:
  1. Profile tool call latency
  2. Increase vm.timeoutMs if tools are slow
  3. Break complex scripts into smaller pieces
  4. Use Promise.all() for independent tool calls
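
The last solution above can be illustrated with a short comparison. In this sketch, `callTool` is a stand-in that simulates a 50ms tool so the example runs outside CodeCall, and `orders:list` is a hypothetical tool name. The two calls do not depend on each other, so they can run concurrently.

```typescript
type ToolResult = { name: string; input: Record<string, unknown> };

// Stand-in for the callTool function available inside CodeCall scripts;
// it simulates a tool that takes about 50ms to respond.
async function callTool(name: string, input: Record<string, unknown>): Promise<ToolResult> {
  await new Promise<void>((resolve) => setTimeout(resolve, 50));
  return { name, input };
}

// Sequential: total time is roughly the sum of the call latencies.
async function sequential(): Promise<{ user: ToolResult; orders: ToolResult }> {
  const user = await callTool('users:getById', { id: '123' });
  const orders = await callTool('orders:list', { userId: '123' });
  return { user, orders };
}

// Parallel: both calls start immediately, so total time is roughly the slowest call.
async function parallel(): Promise<{ user: ToolResult; orders: ToolResult }> {
  const [user, orders] = await Promise.all([
    callTool('users:getById', { id: '123' }),
    callTool('orders:list', { userId: '123' }),
  ]);
  return { user, orders };
}
```

Promise.all rejects as soon as any call fails, so this pattern fits calls that should succeed or fail together.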

Symptoms: Low relevance scores, wrong tools returned
Causes:
  • Poor tool descriptions
  • Threshold too low
  • TF-IDF limitations
Solutions:
  1. Improve tool descriptions
  2. Switch to ml strategy for semantic matching
  3. Add more specific keywords to descriptions

Symptoms: OOM errors, pod restarts
Causes:
  • Embedding model loaded
  • Large tool index
  • Scripts returning large data
Solutions:
  1. Use TF-IDF instead of embeddings
  2. Increase memory limits
  3. Configure output sanitization limits
  4. Enable HNSW for large indexes

Symptoms: Scripts rejected that should work
Causes:
  • Using blocked constructs
  • Reserved prefix collision
  • Unicode issues
Solutions:
  1. Check for eval, Function, etc.
  2. Avoid __ag_ and __safe_ prefixes
  3. Use ASCII identifiers
  4. Review AST Guard rules

Migration & Rollback

Gradual Rollout

  1. Phase 1: Deploy with mode: 'metadata_driven'
    • All tools visible normally
    • Mark select tools for CodeCall
    • Monitor for issues
  2. Phase 2: Switch to mode: 'codecall_opt_in'
    • Tools opt into CodeCall
    • Both access methods work
    • Measure token savings
  3. Phase 3: Move to mode: 'codecall_only'
    • Hide tools from list_tools
    • Full CodeCall experience
    • Maximum token savings

Rollback Plan

// Emergency rollback: disable CodeCall
@App({
  plugins: process.env.CODECALL_ENABLED === 'false'
    ? []
    : [CodeCallPlugin.init({ ... })],
})
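Put CodeCall behind a feature flag so you can disable it without shipping a new build. The flag in this example is read from the environment at startup, so a change takes effect after the process restarts.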

Related pages:
  • Configuration: All configuration options including VM presets and embedding strategies
  • Security Model: Defense-in-depth security architecture and settings
  • API Reference: Meta-tool schemas, error codes, and debugging guide
  • Deployment Guide: General FrontMCP production deployment