Learn the fundamentals of document indexing in VectoriaDB.
VectoriaDB uses embedding vectors to enable semantic search. Each document’s text is converted to a vector representation that captures its meaning.

How Indexing Works

  1. Text Input: You provide a document with text and metadata
  2. Embedding Generation: VectoriaDB generates a vector embedding from the text
  3. Storage: The embedding is stored in memory (and optionally persisted)
  4. Searchable: The document becomes searchable via semantic queries
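
The four steps above can be illustrated with a toy in-memory index. This is a minimal sketch, not VectoriaDB's implementation: ToyIndex and toyEmbed are hypothetical names, and the letter-count "embedding" is only a stand-in for a real embedding model so that the example runs without dependencies.

```typescript
// Toy illustration of the indexing steps; NOT VectoriaDB's implementation.
type ToyMeta = { id: string };

interface ToyEntry<M extends ToyMeta> {
  id: string;
  vector: number[];
  metadata: M;
}

// Hypothetical stand-in for an embedding model: counts the letters a-z
// and scales the counts to unit length.
function toyEmbed(text: string): number[] {
  const counts: number[] = [];
  for (let k = 0; k < 26; k++) counts.push(0);
  const lower = text.toLowerCase();
  for (let k = 0; k < lower.length; k++) {
    const i = lower.charCodeAt(k) - 97;
    if (i >= 0 && i < 26) counts[i] += 1;
  }
  const norm = Math.sqrt(counts.reduce((s, x) => s + x * x, 0)) || 1;
  return counts.map((x) => x / norm);
}

class ToyIndex<M extends ToyMeta> {
  private entries = new Map<string, ToyEntry<M>>();

  // Steps 1-3: accept text and metadata, embed the text, store in memory.
  add(id: string, text: string, metadata: M): void {
    this.entries.set(id, { id, vector: toyEmbed(text), metadata });
  }

  // Step 4: rank stored documents by cosine similarity to the query.
  // Vectors are unit length, so the dot product is the cosine similarity.
  search(query: string, topK = 3): { id: string; score: number }[] {
    const q = toyEmbed(query);
    return Array.from(this.entries.values())
      .map((e) => ({
        id: e.id,
        score: e.vector.reduce((s, x, i) => s + x * q[i], 0),
      }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }
}

const toy = new ToyIndex<ToyMeta>();
toy.add('a', 'create user account', { id: 'a' });
toy.add('b', 'zzz qqq xxx', { id: 'b' });
const toyResults = toy.search('create account');
```

Here document 'a' ranks first because it shares letters with the query while 'b' shares none. A real embedding model captures meaning rather than letter frequency, but the add-then-rank flow has the same shape.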

Document Structure

Each document requires three parts: a unique ID, the text to embed, and a metadata object:
src/document-structure.ts
await db.add(
  'document-id',          // Unique identifier
  'Document text here',   // Text to embed
  {                       // Type-safe metadata
    id: 'document-id',
    // ... your custom fields
  }
);

ID Requirements

  • Must be unique within the database
  • Used to retrieve, update, or remove documents
  • Must match metadata.id (see Metadata below)

Text Guidelines

  • Descriptive, natural language text works best
  • Include relevant keywords and context
  • Maximum size, in characters, is controlled by the maxDocumentSize config option

Metadata

  • Must extend the DocumentMetadata interface
  • The id field is required and must match the document ID
  • Add any custom fields for filtering and display
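
One way to keep the document ID and metadata.id from drifting apart is to build the metadata from the ID itself. The sketch below is an illustration only: withId is a hypothetical helper, not part of VectoriaDB, and MetadataLike is a local stand-in that assumes DocumentMetadata carries at least an id string.

```typescript
// Local stand-in for the library's DocumentMetadata; assumed to carry at least `id`.
interface MetadataLike {
  id: string;
}

// Hypothetical helper: merges custom fields with the document ID so that
// metadata.id always equals the ID passed to add(). If `fields` already
// contains an `id`, it is overwritten by the `id` argument.
function withId<F extends object>(id: string, fields: F): F & MetadataLike {
  return { ...fields, id };
}

const docId = 'users:create';
const metadata = withId(docId, { owner: 'system', tags: ['users'] });

// Hypothetical usage with the call shape shown in Document Structure:
// await db.add(docId, 'Create a new user account', metadata);
```

Because the ID is written once and reused, a renamed document cannot end up with stale metadata.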

Type-Safe Metadata

Define your metadata interface for compile-time safety:
src/types.ts
import { VectoriaDB, DocumentMetadata } from 'vectoriadb';

interface ToolDocument extends DocumentMetadata {
  toolName: string;
  owner: string;
  tags: string[];
  risk: 'safe' | 'destructive';
  deprecated?: boolean;
}

const db = new VectoriaDB<ToolDocument>();

// TypeScript checks that the metadata matches ToolDocument
await db.add('id', 'text', {
  id: 'id',
  toolName: 'test',
  owner: 'system',
  tags: [],
  risk: 'safe',
  // Unknown fields or wrong value types are compile-time errors
});

Embedding Generation

Embeddings are generated automatically when you add or update documents. The process:
  1. Text is tokenized using the configured model
  2. Embeddings are generated at roughly 100-200 documents per second
  3. Embeddings are stored in memory (and optionally persisted)
For large imports, use addMany with an appropriate maxBatchSize to avoid memory spikes.
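
A large import can be split into batches before it is handed to addMany. The following is a sketch under stated assumptions: toBatches and estimateEmbeddingSeconds are hypothetical helpers, and the exact argument shape of addMany is not shown in this page, so the call appears only as a comment.

```typescript
// Split an array into consecutive batches of at most `size` items.
function toBatches<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Rough embedding time for a given throughput (documents per second).
function estimateEmbeddingSeconds(docCount: number, docsPerSecond: number): number {
  return docCount / docsPerSecond;
}

// Hypothetical usage; the argument shape of addMany is an assumption:
// for (const batch of toBatches(allDocs, 1000)) {
//   await db.addMany(batch);
// }
```

At the quoted throughput of roughly 100-200 documents per second, importing 100,000 documents would take on the order of 500 to 1,000 seconds, so it can be worth reporting progress per batch.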

Document Limits

VectoriaDB enforces configurable limits to guard against resource exhaustion, such as denial-of-service (DoS) attacks:
src/config-limits.ts
const db = new VectoriaDB({
  maxDocuments: 100_000,      // Maximum documents in index
  maxDocumentSize: 1_000_000, // Maximum text size in characters
  maxBatchSize: 1_000,        // Maximum documents per batch operation
});
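
This page does not describe how VectoriaDB reports a limit violation, so it can help to check a batch on the caller's side before submitting it. The sketch below is an assumption-laden illustration: checkBatch, Limits, and PendingDoc are hypothetical names, and text length is measured with String.length (UTF-16 code units), which may differ from how the library counts characters.

```typescript
// Mirrors the three config options shown above.
interface Limits {
  maxDocuments: number;
  maxDocumentSize: number;
  maxBatchSize: number;
}

// Hypothetical shape for a document waiting to be added.
interface PendingDoc {
  id: string;
  text: string;
}

// Returns a list of human-readable problems; an empty list means the batch
// fits within all three limits.
function checkBatch(
  batch: readonly PendingDoc[],
  currentCount: number,
  limits: Limits
): string[] {
  const problems: string[] = [];
  if (batch.length > limits.maxBatchSize) {
    problems.push(`batch has ${batch.length} documents, limit is ${limits.maxBatchSize}`);
  }
  if (currentCount + batch.length > limits.maxDocuments) {
    problems.push(`index would exceed ${limits.maxDocuments} documents`);
  }
  for (const doc of batch) {
    if (doc.text.length > limits.maxDocumentSize) {
      problems.push(`document "${doc.id}" exceeds ${limits.maxDocumentSize} characters`);
    }
  }
  return problems;
}

const demoLimits: Limits = { maxDocuments: 3, maxDocumentSize: 5, maxBatchSize: 2 };
const okProblems = checkBatch([{ id: 'a', text: 'short' }], 0, demoLimits);
const badProblems = checkBatch(
  [
    { id: 'a', text: 'short' },
    { id: 'b', text: 'too long' },
    { id: 'c', text: 'ok' },
  ],
  1,
  demoLimits
);
```

In this example okProblems is empty, while badProblems lists three issues: the batch is too large, the index would overflow, and document 'b' is too long.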

Adding Documents

Add single and batch documents

Updating Documents

Update metadata and text

Search

Query the index