Tool Discovery

Learn how to build an always-on semantic tool discovery system using VectoriaDB.

By the end of this guide you’ll have a searchable tool index that stays in sync with your registry and survives restarts.

What You’ll Build

A typed document shape for every tool, app, or resource you want to search
An indexing routine that stays in sync with your tool registry
Semantic queries with metadata filters, score thresholds, and pagination controls
Persistent caches (file or Redis) so restarts do not require re-embedding everything
Tunable HNSW search for large inventories

Prerequisites

Node.js 22 or later
Ability to install npm packages
Optional: writable disk or Redis for persistence

Install & Initialize VectoriaDB

npm install vectoriadb

Initialize a singleton database during server startup:

src/tool-index.ts

import { VectoriaDB, type DocumentMetadata } from 'vectoriadb';

// Exported so other modules (collectors, warm-up code) can reuse the shape.
export interface ToolDocument extends DocumentMetadata {
  toolName: string;
  owner: string;
  tags: string[];
  risk: 'safe' | 'destructive';
}

export const toolIndex = new VectoriaDB<ToolDocument>({
  cacheDir: './.cache/transformers',
  defaultSimilarityThreshold: 0.4,
});

await toolIndex.initialize(); // downloads and warms the embedding model once

initialize() must complete before you call add, search, or update. Calling it more than once is safe: VectoriaDB short-circuits when the database is already initialized.
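Because every other call depends on initialize() having finished, it can help to route access through a small gate that shares one startup promise among concurrent callers. The following is a minimal sketch, not part of VectoriaDB: `Initializable` and `createReadyGate` are hypothetical names, and the helper works with any object that exposes an async initialize() method.

```typescript
// Hypothetical helper type: anything with an async initialize() qualifies.
interface Initializable {
  initialize(): Promise<void>;
}

// Returns a function that starts initialize() at most once and hands every
// caller the same promise, so concurrent requests wait on a single warm-up.
function createReadyGate<T extends Initializable>(target: T): () => Promise<T> {
  let ready: Promise<T> | undefined;
  return () => {
    if (!ready) {
      const attempt = target.initialize().then(() => target);
      ready = attempt;
      // If startup fails, clear the cached promise so a later call can retry.
      attempt.catch(() => {
        if (ready === attempt) ready = undefined;
      });
    }
    return ready;
  };
}
```

With this in place, request handlers could call `const index = await getToolIndex();` where `getToolIndex = createReadyGate(toolIndex)`, instead of assuming startup has already completed.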

Index Your Tools

Collect metadata from your tool registry and write it into the database. Each document needs a unique id, the natural-language text you want to vectorize, and metadata that extends DocumentMetadata.

src/collect-tools.ts

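import { toolIndex, type ToolDocument } from './tool-index'; // assumes both are exported from src/tool-index.ts

export interface ToolEntry {
  name: string;
  owner: { id: string };
  metadata: {
    description?: string;
    inputSchema?: Record<string, unknown>;
    tags?: string[];
    annotations?: { destructiveHint?: boolean };
  };
}

// Exported so warm-up code can reuse the same document shape.
export function collectToolDocuments(tools: ToolEntry[]): Array<{ id: string; text: string; metadata: ToolDocument }> {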
  return tools.map((tool) => {
    const docId = `${tool.owner.id}:${tool.name}`;
    return {
      id: docId,
      text: [
        tool.metadata.description ?? tool.name,
        `Inputs: ${Object.keys(tool.metadata.inputSchema ?? {}).join(', ') || 'none'}`,
        `Tags: ${(tool.metadata.tags ?? []).join(', ') || 'none'}`,
      ].join('\n'),
      metadata: {
        id: docId,
        toolName: tool.name,
        owner: tool.owner.id,
        tags: tool.metadata.tags ?? [],
        risk: tool.metadata.annotations?.destructiveHint ? 'destructive' : 'safe',
      },
    };
  });
}

export async function indexTools(tools: ToolEntry[]) {
  await toolIndex.addMany(collectToolDocuments(tools));
}

addMany validates every document, enforces maxBatchSize, and prevents duplicates.
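If your registry can grow past maxBatchSize or may contain repeated ids, you can normalize the input before it reaches addMany. The sketch below is illustrative only: `dedupeById` and `chunk` are hypothetical helpers, and how addMany reports a violation is not covered by this guide.

```typescript
interface HasId {
  id: string;
}

// Keeps the last document seen for each id, at the position where the id
// first appeared (Map preserves first-insertion order on overwrite).
function dedupeById<T extends HasId>(docs: T[]): T[] {
  const byId = new Map<string, T>();
  for (const doc of docs) byId.set(doc.id, doc);
  return Array.from(byId.values());
}

// Splits a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

A caller could then loop over `chunk(dedupeById(documents), 2_000)` and await `toolIndex.addMany(batch)` for each batch, keeping the batch size at or below the configured maxBatchSize.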

Run Semantic Search

Query the index anywhere you can run async code:

src/search-tools.ts

import { toolIndex } from './tool-index';

const matches = await toolIndex.search('reset a billing password', {
  topK: 5,
  threshold: 0.45,
  filter: (metadata) => metadata.owner === 'billing' && !metadata.tags.includes('deprecated'),
});

for (const match of matches) {
  console.log(`${match.metadata.toolName} (${match.score.toFixed(2)})`);
}

search returns the best matches sorted by cosine similarity. Use filter to enforce authorization, includeVector to inspect raw vectors, and threshold to drop low-confidence hits.
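To make score, threshold, and topK concrete, here is a small sketch of cosine-similarity ranking over toy in-memory vectors. It shows the general technique, not VectoriaDB's internals; `cosineSimilarity` and `rank` are hypothetical names, and treating the threshold as inclusive is an assumption.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / Math.sqrt(normA * normB);
}

interface ScoredId {
  id: string;
  score: number;
}

// Scores every document, drops those below the threshold (inclusive here,
// which is an assumption), sorts best-first, and keeps the top K.
function rank(
  query: number[],
  docs: Array<{ id: string; vector: number[] }>,
  topK: number,
  threshold: number,
): ScoredId[] {
  return docs
    .map((doc) => ({ id: doc.id, score: cosineSimilarity(query, doc.vector) }))
    .filter((match) => match.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

Raising the threshold shrinks the result set before topK is applied, which is why a strict threshold can return fewer than topK matches.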

Keep the index current with updateMetadata, update, or updateMany. Metadata-only updates never trigger re-embedding, while text changes re-embed only the affected documents.
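Choosing between a metadata-only update and a re-embedding update comes down to comparing the new registry snapshot with what is already indexed. The sketch below only computes that plan; `planSync`, `IndexedDoc`, and `SyncPlan` are hypothetical names, and the actual calls are left out because the exact signatures of update and updateMetadata are not shown in this guide.

```typescript
interface IndexedDoc<M> {
  id: string;
  text: string;
  metadata: M;
}

interface SyncPlan<M> {
  toAdd: IndexedDoc<M>[];           // new ids: need embedding
  toReembed: IndexedDoc<M>[];       // text changed: need re-embedding
  toPatchMetadata: IndexedDoc<M>[]; // only metadata changed: no re-embedding
  toRemove: string[];               // ids no longer in the registry
}

function planSync<M>(current: IndexedDoc<M>[], next: IndexedDoc<M>[]): SyncPlan<M> {
  const currentById = new Map<string, IndexedDoc<M>>();
  for (const doc of current) currentById.set(doc.id, doc);
  const nextIds = new Set<string>(next.map((doc) => doc.id));

  const plan: SyncPlan<M> = { toAdd: [], toReembed: [], toPatchMetadata: [], toRemove: [] };
  for (const doc of next) {
    const existing = currentById.get(doc.id);
    if (!existing) {
      plan.toAdd.push(doc);
    } else if (existing.text !== doc.text) {
      plan.toReembed.push(doc);
    } else if (JSON.stringify(existing.metadata) !== JSON.stringify(doc.metadata)) {
      // JSON comparison is key-order sensitive; fine for objects built by the
      // same collector function, too strict for arbitrary input.
      plan.toPatchMetadata.push(doc);
    }
  }
  for (const doc of current) {
    if (!nextIds.has(doc.id)) plan.toRemove.push(doc.id);
  }
  return plan;
}
```

Documents in toPatchMetadata are the cheap case described above, while toAdd and toReembed are the only entries that cost an embedding pass.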

Persist Embeddings

Avoid re-indexing on every boot by using storage adapters with a deterministic tools hash:

src/warmup.ts

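import { VectoriaDB, FileStorageAdapter, SerializationUtils } from 'vectoriadb';
import { collectToolDocuments, type ToolEntry } from './collect-tools'; // assumes these are exported from src/collect-tools.ts
import type { ToolDocument } from './tool-index'; // assumes ToolDocument is exported from src/tool-index.ts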

export async function warmToolIndex(tools: ToolEntry[]) {
  const documents = collectToolDocuments(tools);

  const toolIndex = new VectoriaDB<ToolDocument>({
    storageAdapter: new FileStorageAdapter({
      cacheDir: './.cache/vectoriadb',
      namespace: 'tool-index',
    }),
    toolsHash: SerializationUtils.createToolsHash(documents),
    version: process.env.npm_package_version, // set by npm scripts; undefined when launched directly with node
  });

  await toolIndex.initialize();

  if (toolIndex.size() === 0) {
    await toolIndex.addMany(documents);
    await toolIndex.saveToStorage(); // persist embeddings to disk
  }

  return toolIndex;
}

toolsHash automatically invalidates the cache when your tool list or descriptions change. Call saveToStorage() after indexing; initialize() transparently loads the cache on the next boot.
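The idea behind a tools hash is to serialize the documents in a stable order and digest the result, so the same tool list always yields the same value and any edit yields a different one. The sketch below illustrates that idea with Node's built-in crypto module. It is not the algorithm used by SerializationUtils.createToolsHash, which this guide does not describe, and `stableToolsHash` is a hypothetical name.

```typescript
import { createHash } from 'node:crypto';

interface HashableDoc {
  id: string;
  text: string;
}

// Produces the same SHA-256 hex digest regardless of input order, and a
// different digest when any id or text changes.
function stableToolsHash(docs: HashableDoc[]): string {
  // Sort a copy by id so registry iteration order does not affect the hash.
  const sorted = docs.slice().sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  const hash = createHash('sha256');
  for (const doc of sorted) {
    // JSON encoding escapes newlines, so '\n' is an unambiguous separator.
    hash.update(JSON.stringify([doc.id, doc.text]));
    hash.update('\n');
  }
  return hash.digest('hex');
}
```

Sorting first matters: without it, two boots that enumerate the same tools in a different order would each see a cache miss.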

Need a shared cache across pods? Swap in RedisStorageAdapter with your preferred Redis client and namespace.

Scale & Tune

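Enable useHNSW for datasets above roughly ten thousand documents. HNSW is an approximate nearest-neighbor index that provides sub-millisecond queries with more than 95% recall; raising efSearch improves recall at the cost of latency.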
Adjust threshold and topK per query to trade recall for precision.
Guard resource usage with maxDocuments, maxDocumentSize, and maxBatchSize.
Set a custom cacheDir if your runtime has strict filesystem policies.

src/scaled-config.ts

const toolIndex = new VectoriaDB<ToolDocument>({
  useHNSW: true,
  hnsw: { M: 16, efConstruction: 200, efSearch: 64 },
  maxDocuments: 150_000,
  maxBatchSize: 2_000,
});
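If the size of your inventory varies between deployments, the rough ten-thousand-document guidance above can be turned into a small helper that decides whether to switch HNSW on. This is a sketch under assumptions: `searchConfigFor` and the two interfaces are hypothetical, the option names mirror the configuration shown above, and the parameter comments describe HNSW in general rather than VectoriaDB's specific implementation.

```typescript
interface HnswOptions {
  M: number;              // max graph links per node: higher means better recall, more memory
  efConstruction: number; // candidate list size while building: higher means better graph, slower indexing
  efSearch: number;       // candidate list size per query: higher means better recall, slower queries
}

interface SearchConfig {
  useHNSW: boolean;
  hnsw?: HnswOptions;
}

// Rough cut-over point taken from the guidance above; tune for your workload.
const HNSW_DOCUMENT_THRESHOLD = 10_000;

function searchConfigFor(documentCount: number): SearchConfig {
  if (documentCount <= HNSW_DOCUMENT_THRESHOLD) {
    // Small inventories: skip the graph index entirely.
    return { useHNSW: false };
  }
  // Values copied from the scaled configuration example above.
  return { useHNSW: true, hnsw: { M: 16, efConstruction: 200, efSearch: 64 } };
}
```

The result can be spread into the constructor options alongside limits such as maxDocuments and maxBatchSize.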

Complete Example

src/complete-example.ts

import { VectoriaDB, FileStorageAdapter, SerializationUtils, type DocumentMetadata } from 'vectoriadb';

interface ToolDocument extends DocumentMetadata {
  toolName: string;
  owner: string;
  tags: string[];
  risk: 'safe' | 'destructive';
}

// Sample tool documents to index
const documents = [
  {
    id: 'users:list',
    text: 'List all users with pagination and filtering',
    metadata: { id: 'users:list', toolName: 'list', owner: 'users', tags: ['read'], risk: 'safe' as const },
  },
  {
    id: 'users:create',
    text: 'Create a new user account with email and password',
    metadata: { id: 'users:create', toolName: 'create', owner: 'users', tags: ['write'], risk: 'safe' as const },
  },
  {
    id: 'billing:charge',
    text: 'Charge a customer payment method',
    metadata: { id: 'billing:charge', toolName: 'charge', owner: 'billing', tags: ['write'], risk: 'destructive' as const },
  },
];

const db = new VectoriaDB<ToolDocument>({
  storageAdapter: new FileStorageAdapter({
    cacheDir: './.cache/vectoriadb',
    namespace: 'tools',
  }),
  toolsHash: SerializationUtils.createToolsHash(documents),
});

await db.initialize();

if (db.size() === 0) {
  await db.addMany(documents);
  await db.saveToStorage();
}

// Search for tools
const results = await db.search('create new account', {
  topK: 3,
  threshold: 0.4,
  filter: (m) => m.risk === 'safe',
});

console.log('Matching tools:');
for (const result of results) {
  console.log(`  ${result.metadata.toolName} (${result.score.toFixed(2)})`);
}
