Learn how to configure similarity thresholds for optimal search results.
Understanding Thresholds
The threshold sets the minimum similarity score a document must reach to be included in the results:
// Strict matching - only highly relevant results
const strict = await toolIndex.search(query, { threshold: 0.7 });
// Moderate matching - good balance
const moderate = await toolIndex.search(query, { threshold: 0.5 });
// Loose matching - include tangentially related
const loose = await toolIndex.search(query, { threshold: 0.3 });
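To make the effect of a threshold concrete, here is a minimal sketch of score-based filtering. It assumes scores are cosine similarities between embedding vectors and that the comparison is inclusive (`>=`); neither detail is confirmed by this page, and `ScoredItem`, `cosineSimilarity`, and `filterByThreshold` are hypothetical names used only for illustration.

```typescript
// Hypothetical result shape, for illustration only.
interface ScoredItem {
  id: string;
  score: number;
}

// Cosine similarity of two equal-length vectors (assumed scoring function).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated.
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep items scoring at or above the threshold, best match first.
function filterByThreshold(items: ScoredItem[], threshold: number): ScoredItem[] {
  return items
    .filter((item) => item.score >= threshold)
    .sort((x, y) => y.score - x.score);
}
```

Raising the threshold can only shrink the set returned by `filterByThreshold`, which is why stricter values trade recall for precision.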
Threshold Guidelines
| Threshold | Use Case | Results |
| --- | --- | --- |
| 0.7+ | High precision, exact matches | Few, highly relevant |
| 0.5-0.7 | Balanced precision/recall | Moderate, mostly relevant |
| 0.3-0.5 | High recall, broader matches | Many, some tangential |
| < 0.3 | Exploratory, catch-all | Most documents |
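The guideline ranges can be encoded as a small helper, which is handy for logging or for validating configuration. This is only a sketch: `ThresholdBand` and `thresholdBand` are hypothetical names, and assigning the shared boundary values 0.5 and 0.7 to the stricter band is an assumption, since the ranges above overlap at their edges.

```typescript
// Hypothetical labels for the four guideline ranges.
type ThresholdBand = 'high-precision' | 'balanced' | 'high-recall' | 'exploratory';

// Map a threshold to its guideline band.
// Assumption: boundary values (0.5, 0.7) belong to the stricter band.
function thresholdBand(threshold: number): ThresholdBand {
  if (threshold >= 0.7) return 'high-precision';
  if (threshold >= 0.5) return 'balanced';
  if (threshold >= 0.3) return 'high-recall';
  return 'exploratory';
}
```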
Default Threshold
Set a default threshold in configuration:
const db = new VectoriaDB({
  defaultSimilarityThreshold: 0.4,
});
// Uses default threshold (0.4)
const results = await db.search(query);
// Override for specific query
const strict = await db.search(query, { threshold: 0.7 });
Choosing the Right Threshold
For Tool Discovery
src/tool-discovery-threshold.ts
// Start conservative - only show high-confidence matches
let results = await db.search(userQuery, {
  threshold: 0.5,
  topK: 5,
});
if (results.length === 0) {
  // Fallback: broaden the search and use those results instead
  results = await db.search(userQuery, {
    threshold: 0.3,
    topK: 10,
  });
}
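The same fallback idea generalizes to any number of steps. The sketch below takes the search call as a parameter so that it does not depend on a particular client; `SearchOptions`, `SearchFn`, and `searchWithFallback` are hypothetical names, and only the `threshold` and `topK` options shown above are assumed.

```typescript
// Hypothetical option and function shapes, mirroring the options used above.
interface SearchOptions {
  threshold: number;
  topK: number;
}

type SearchFn<T> = (query: string, options: SearchOptions) => Promise<T[]>;

// Try each step in order (strictest first) and return the first non-empty result set.
async function searchWithFallback<T>(
  search: SearchFn<T>,
  query: string,
  steps: SearchOptions[],
): Promise<T[]> {
  for (const step of steps) {
    const results = await search(query, step);
    if (results.length > 0) {
      return results;
    }
  }
  // Every step came back empty.
  return [];
}
```

A call might look like `searchWithFallback((q, o) => db.search(q, o), userQuery, [{ threshold: 0.5, topK: 5 }, { threshold: 0.3, topK: 10 }])`. Ordering the steps from strict to loose keeps high-confidence matches from being diluted when they exist.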
For Document Search
src/document-search-threshold.ts
// Lower threshold for document search - users expect more results
const results = await db.search(searchQuery, {
  threshold: 0.3,
  topK: 20,
});
For Recommendations
src/recommendations-threshold.ts
// Get similar documents
const similar = await db.search(currentDocument.text, {
  threshold: 0.6, // Higher - we want truly similar items
  topK: 5,
  filter: (m) => m.id !== currentDocument.id, // Exclude current
});
Adaptive Thresholds
Adjust thresholds based on query characteristics:
src/adaptive-threshold.ts
function getThreshold(query: string): number {
  // Count words, ignoring leading, trailing, and repeated whitespace
  const wordCount = query.trim().split(/\s+/).length;
  // Short queries often need lower thresholds
  if (wordCount <= 2) {
    return 0.3;
  }
  // Longer, more specific queries can use higher thresholds
  if (wordCount >= 5) {
    return 0.5;
  }
  return 0.4; // Default
}
const results = await db.search(query, {
  threshold: getThreshold(query),
});
Debugging Threshold Issues
Too Few Results
// Check what scores documents actually have
const debug = await db.search(query, {
  threshold: 0.0, // Show all
  topK: 20,
});
console.log('Score distribution:');
for (const r of debug) {
  console.log(`${r.id}: ${r.score.toFixed(3)}`);
}
// Adjust threshold based on observed scores
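One way to turn the observed scores into a concrete number is to pick the highest threshold that still lets a target number of results through. The helper below is a sketch, not part of the library: `thresholdForMinResults` is a hypothetical name, and it assumes the threshold comparison is inclusive (`score >= threshold`).

```typescript
// Return the highest threshold that still admits at least `minResults` of the
// observed scores (or all of them, if fewer were observed).
// Assumption: results pass when score >= threshold.
function thresholdForMinResults(scores: number[], minResults: number): number {
  if (scores.length === 0 || minResults <= 0) {
    return 0;
  }
  // Sort a copy in descending order so the input array is left untouched.
  const sorted = [...scores].sort((x, y) => y - x);
  const index = Math.min(minResults, sorted.length) - 1;
  return sorted[index];
}
```

For example, passing `debug.map((r) => r.score)` with `minResults` set to 3 returns the score of the third-best match. Treat that as an upper bound and set the real threshold slightly below it to leave headroom for similar queries.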
Too Many Irrelevant Results
src/debug-many-results.ts
// Gradually increase threshold
for (const threshold of [0.3, 0.4, 0.5, 0.6, 0.7]) {
  const results = await db.search(query, { threshold, topK: 10 });
  console.log(`Threshold ${threshold}: ${results.length} results`);
  if (results.length > 0) {
    console.log(`  Top score: ${results[0].score.toFixed(3)}`);
    console.log(`  Low score: ${results[results.length - 1].score.toFixed(3)}`);
  }
}
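If repeated queries are expensive, a similar sweep can be run offline over the scores from a single low-threshold search. The sketch below assumes an inclusive comparison (`score >= threshold`), and `countsByThreshold` and `ThresholdCount` are hypothetical names. Unlike the loop above, it applies no `topK` cap, so the counts cover every score passed in.

```typescript
// Hypothetical row type for the sweep summary.
interface ThresholdCount {
  threshold: number;
  count: number;
}

// Count how many of the given scores would pass each candidate threshold.
function countsByThreshold(scores: number[], thresholds: number[]): ThresholdCount[] {
  return thresholds.map((threshold) => ({
    threshold,
    count: scores.filter((score) => score >= threshold).length,
  }));
}
```

A sharp drop in `count` between two adjacent thresholds suggests a natural cut-off between relevant and tangential matches.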
Start with a lower threshold (0.3-0.4) and raise it if you get too many irrelevant results. It is easier to filter out extra results than to recover relevant documents that a strict threshold excluded.