Index your tools with VectoriaDB for semantic search and intelligent tool recommendations
Learn how to build an always-on semantic tool discovery system using VectoriaDB.
In this guide you’ll build a typed document shape for tools, an indexing routine that stays in sync, semantic queries with filters, persistent caches, and tunable HNSW search.
1
Initialize the Database
Initialize a singleton database during server startup:
src/tool-index.ts
```typescript
import { VectoriaDB, DocumentMetadata } from 'vectoriadb';

interface ToolDocument extends DocumentMetadata {
  toolName: string;
  owner: string;
  tags: string[];
  risk: 'safe' | 'destructive';
}

export const toolIndex = new VectoriaDB<ToolDocument>({
  cacheDir: './.cache/transformers',
  defaultSimilarityThreshold: 0.4,
});

await toolIndex.initialize(); // downloads and warms the embedding model once
```
initialize() must run before add, search, or update. Calling it twice is safe because VectoriaDB short-circuits if it is already ready.
2
Index Your Tools
Collect metadata from your tool registry and write it into the database. Each document needs a unique id, the natural-language text you want to vectorize, and metadata that extends DocumentMetadata.
addMany validates every document, enforces maxBatchSize, and prevents duplicates.
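As a standalone sketch of this step, the mapping below turns registry entries into documents with a deterministic id, the text to vectorize, and the remaining fields as metadata. The `ToolEntry` shape is hypothetical (adapt it to your registry), and the interfaces are redeclared locally so the sketch runs on its own; in your project, reuse the `ToolDocument` interface defined during initialization.

```typescript
// Hypothetical registry entry; adapt to your tool registry's shape.
interface ToolEntry {
  name: string;
  description: string;
  owner: string;
  tags: string[];
  destructive: boolean;
}

// Redeclared locally so this sketch is self-contained; in your project,
// extend DocumentMetadata from 'vectoriadb' as shown above.
interface ToolDocument {
  id: string;
  text: string;
  toolName: string;
  owner: string;
  tags: string[];
  risk: 'safe' | 'destructive';
}

export function collectToolDocuments(tools: ToolEntry[]): ToolDocument[] {
  return tools.map((tool) => ({
    id: `tool:${tool.name}`,                   // unique, deterministic id
    text: `${tool.name}: ${tool.description}`, // the natural-language text that gets vectorized
    toolName: tool.name,
    owner: tool.owner,
    tags: tool.tags,
    risk: tool.destructive ? 'destructive' : 'safe',
  }));
}

// Then write the documents in one batch:
// await toolIndex.addMany(collectToolDocuments(tools));
```

Keeping the id deterministic (derived from the tool name rather than a random UUID) makes later updates and cache hashing stable across restarts.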
3
Run Semantic Search
Query the index anywhere you can run async code:
src/search-tools.ts
```typescript
const matches = await toolIndex.search('reset a billing password', {
  topK: 5,
  threshold: 0.45,
  filter: (metadata) =>
    metadata.owner === 'billing' && !metadata.tags.includes('deprecated'),
});

for (const match of matches) {
  console.log(`${match.metadata.toolName} (${match.score.toFixed(2)})`);
}
```
search returns the best matches sorted by cosine similarity. Use filter to enforce authorization, includeVector to inspect raw vectors, and threshold to drop low-confidence hits.
Keep the index current with updateMetadata, update, or updateMany. Metadata-only updates never trigger re-embedding, while text changes re-embed only the affected documents.
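To make the re-embedding rule concrete, here is a small sketch. The `DocumentPatch` shape and the helper are illustrative (not part of the VectoriaDB API); the commented calls use the method names from this guide.

```typescript
// Hypothetical patch shape, for illustration only.
interface DocumentPatch {
  text?: string;                      // changing this re-embeds the document
  metadata?: Record<string, unknown>; // changing only this never re-embeds
}

// Mirrors the rule above: only a text change requires re-embedding.
export function needsReembedding(patch: DocumentPatch): boolean {
  return patch.text !== undefined;
}

// Usage against the index (method names from this guide):
// await toolIndex.updateMetadata('tool:reset-password', { tags: ['billing', 'deprecated'] });
// await toolIndex.update('tool:reset-password', { text: 'Rotate a customer billing password' });
```

Prefer `updateMetadata` for tag or ownership changes so you never pay embedding cost for edits that don't affect search relevance.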
4
Persist Embeddings
Avoid re-indexing on every boot by using storage adapters with a deterministic tools hash:
src/warmup.ts
```typescript
import { VectoriaDB, FileStorageAdapter, SerializationUtils } from 'vectoriadb';

// ToolEntry, ToolDocument, and collectToolDocuments come from your indexing step.
export async function warmToolIndex(tools: ToolEntry[]) {
  const documents = collectToolDocuments(tools);

  const toolIndex = new VectoriaDB<ToolDocument>({
    storageAdapter: new FileStorageAdapter({
      cacheDir: './.cache/vectoriadb',
      namespace: 'tool-index',
    }),
    toolsHash: SerializationUtils.createToolsHash(documents),
    version: process.env.npm_package_version,
  });

  await toolIndex.initialize();

  if (toolIndex.size() === 0) {
    await toolIndex.addMany(documents);
    await toolIndex.saveToStorage(); // persist embeddings to disk
  }

  return toolIndex;
}
```
toolsHash automatically invalidates the cache when your tool list or descriptions change. Call saveToStorage() after indexing; initialize() transparently loads the cache on the next boot.
Need a shared cache across pods? Swap in RedisStorageAdapter with your preferred Redis client and namespace.
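As a sketch of that swap, the wiring below replaces the file adapter with `RedisStorageAdapter`. The adapter name comes from the tip above, but its exact constructor options (`client`, `namespace`) are assumptions; check your version's API before copying.

```typescript
import { VectoriaDB, RedisStorageAdapter } from 'vectoriadb';
import { createClient } from 'redis';

// Assumed wiring: the adapter's option names may differ in your version.
const toolIndex = new VectoriaDB<ToolDocument>({
  storageAdapter: new RedisStorageAdapter({
    client: createClient({ url: process.env.REDIS_URL }),
    namespace: 'tool-index', // keep the same namespace across pods to share the cache
  }),
});
```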
5
Scale & Tune
Enable useHNSW for datasets above roughly ten thousand documents. HNSW provides sub-millisecond queries with more than 95% recall.
Adjust threshold and topK per query to trade recall for precision.
Guard resource usage with maxDocuments, maxDocumentSize, and maxBatchSize.
Set a custom cacheDir if your runtime has strict filesystem policies.
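Pulling these knobs together, a tuned configuration might look like the sketch below. The option names are taken from this guide; the values are illustrative, not recommendations.

```typescript
const toolIndex = new VectoriaDB<ToolDocument>({
  useHNSW: true,           // approximate search for indexes beyond ~10k documents
  maxDocuments: 50_000,    // hard cap on index size
  maxDocumentSize: 8_192,  // reject oversized document text
  maxBatchSize: 500,       // upper bound per addMany call
  cacheDir: '/var/cache/transformers', // model cache for locked-down filesystems
});
```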