Overview
Semantic Router's caching system matches queries by semantic meaning rather than exact text, enabling cache hits for semantically similar requests and reducing LLM inference costs.
Core Concepts
Semantic Similarity
Uses embeddings and cosine similarity to match queries by meaning rather than exact text.
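As a rough illustration, the sketch below computes cosine similarity between embedding vectors with NumPy. The toy 4-dimensional vectors and the queries in the comments are purely illustrative; a real embedding model produces vectors with hundreds of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Dot product normalized by vector magnitudes; 1.0 means identical direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" standing in for the output of the configured embedding model.
query_a = np.array([0.12, 0.80, 0.33, 0.45])   # "How do I reset my password?"
query_b = np.array([0.10, 0.78, 0.35, 0.44])   # "What's the way to change my password?"
query_c = np.array([0.90, 0.05, 0.10, 0.02])   # "What's the weather in Paris?"

print(cosine_similarity(query_a, query_b))  # close to 1.0 -> semantically similar
print(cosine_similarity(query_a, query_c))  # much lower   -> unrelated query
```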
Configurable Thresholds
Adjustable similarity thresholds balance cache hit rates with response quality.
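The threshold itself is a single comparison over the similarity score. A minimal sketch, where the 0.85 value is illustrative rather than a recommended default:

```python
SIMILARITY_THRESHOLD = 0.85  # illustrative value, not the router's default

def is_cache_hit(similarity: float, threshold: float = SIMILARITY_THRESHOLD) -> bool:
    # Higher thresholds serve cached answers only for near-duplicates (fewer hits,
    # safer responses); lower thresholds hit more often but risk reusing an answer
    # for a subtly different question.
    return similarity >= threshold
```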
Multiple Backends
Support for in-memory, Redis, and Milvus backends to suit different scale requirements.
How It Works
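The flow is: embed the incoming query, compare it against the embeddings of previously cached queries, and return the stored response when the best similarity clears the threshold; otherwise call the LLM and cache the new query/response pair. Below is a minimal in-memory sketch of that loop; the embedding callable and the threshold value are stand-ins, not the router's actual implementation.

```python
from typing import Callable, Optional
import numpy as np

class SemanticCache:
    """Minimal in-memory semantic cache: stores (embedding, response) pairs
    and answers lookups by nearest-neighbor cosine similarity."""

    def __init__(self, embed: Callable[[str], np.ndarray], threshold: float = 0.85):
        self.embed = embed          # stand-in for the configured embedding model
        self.threshold = threshold  # illustrative default
        self.entries: list[tuple[np.ndarray, str]] = []

    def lookup(self, query: str) -> Optional[str]:
        if not self.entries:
            return None
        q = self.embed(query)
        q = q / np.linalg.norm(q)
        # Brute-force scan; a vector database replaces this step at scale.
        best_score, best_response = -1.0, None
        for emb, response in self.entries:
            score = float(np.dot(q, emb))
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def store(self, query: str, response: str) -> None:
        emb = self.embed(query)
        self.entries.append((emb / np.linalg.norm(emb), response))

def answer(cache: SemanticCache, query: str, call_llm: Callable[[str], str]) -> str:
    cached = cache.lookup(query)
    if cached is not None:
        return cached            # cache hit: no LLM call
    response = call_llm(query)   # cache miss: pay for inference once
    cache.store(query, response)
    return response
```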
Backend Options
In-Memory Cache
Fast, local caching for development and single-instance deployments.
Milvus Cache
Persistent, distributed caching using a vector database for production environments.
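A rough sketch of the lookup side against Milvus using the pymilvus client. The collection name, field names, metric, and index parameters here are assumptions for illustration, not the router's actual schema.

```python
from pymilvus import connections, Collection

# Connection details and schema below are illustrative assumptions.
connections.connect(alias="default", host="localhost", port="19530")
cache = Collection("semantic_cache")  # assumed fields: "embedding", "response"
cache.load()

def lookup(query_embedding, threshold: float = 0.85):
    results = cache.search(
        data=[query_embedding],
        anns_field="embedding",
        param={"metric_type": "IP", "params": {"nprobe": 10}},
        limit=1,
        output_fields=["response"],
    )
    hits = results[0]
    if not hits:
        return None
    best = hits[0]
    # With normalized embeddings and the IP metric, distance equals cosine similarity.
    if best.distance >= threshold:
        return best.entity.get("response")
    return None
```

Because Milvus persists entries and runs as a shared service, cached responses survive restarts and can be reused across router replicas.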
Key Benefits
- Cost Reduction: Avoid redundant LLM API calls for similar queries
- Improved Latency: Cache hits return responses in milliseconds
- Better Throughput: Handle more concurrent requests efficiently
- Semantic Understanding: Match queries by meaning, not just text