
Overview

Semantic Router's intelligent caching system understands the semantic meaning of queries, enabling cache hits for semantically similar requests and reducing LLM inference costs.

Core Concepts

Semantic Similarity

Uses embeddings and cosine similarity to match queries by meaning rather than exact text.

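For example (a minimal sketch assuming the sentence-transformers package and the all-MiniLM-L6-v2 model purely for illustration; the router's own embedding model may differ), two differently worded queries with the same intent produce embeddings with high cosine similarity, while an unrelated query scores much lower:

```python
from sentence_transformers import SentenceTransformer, util

# Example embedding model for illustration; the configured model may differ.
model = SentenceTransformer("all-MiniLM-L6-v2")

queries = [
    "How do I reset my password?",                 # previously cached query
    "What are the steps to change my password?",   # new, semantically similar query
]
emb = model.encode(queries, normalize_embeddings=True)

# Cosine similarity is close to 1.0 for paraphrases and much lower
# for unrelated queries, so matching works on meaning, not exact text.
score = util.cos_sim(emb[0], emb[1]).item()
print(f"semantic similarity: {score:.2f}")
```
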
Configurable Thresholds

Adjustable similarity thresholds balance cache hit rates with response quality.

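The trade-off can be seen with a few illustrative similarity scores (made-up numbers, not measurements): raising the threshold turns borderline matches from hits into misses.

```python
# Illustrative similarity scores (not measurements) for three incoming queries
# compared against their closest cached entry.
candidates = {
    "close paraphrase of a cached query": 0.93,
    "related but different question": 0.82,
    "unrelated query": 0.35,
}

for threshold in (0.80, 0.90):
    print(f"threshold = {threshold:.2f}")
    for query, similarity in candidates.items():
        decision = "hit" if similarity >= threshold else "miss"
        print(f"  {similarity:.2f}  {decision:<4}  {query}")
```

Lower thresholds increase the hit rate and the cost savings, at the risk of serving a cached answer to a question that is only superficially similar; higher thresholds favor response quality over hit rate.
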
Multiple Backends

Support for in-memory, Redis, and Milvus backends to suit different scale requirements.

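One way to picture the backend abstraction (an illustrative interface sketch, not the project's actual code): every backend answers the same two requests, find the most similar cached entry and store a new one, so the surrounding lookup logic stays the same regardless of where the vectors live.

```python
from typing import Optional, Protocol, Sequence


class CacheBackend(Protocol):
    """Illustrative interface an in-memory, Redis, or Milvus backend could satisfy."""

    def search(self, embedding: Sequence[float]) -> Optional[tuple[str, float]]:
        """Return (cached_response, similarity) for the closest entry, or None if empty."""
        ...

    def store(self, embedding: Sequence[float], response: str) -> None:
        """Persist a new (embedding, response) pair."""
        ...
```

An in-memory backend keeps these entries in process memory, while the Milvus backend persists them in a vector database that multiple router instances can share.
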
How It Works

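On each request the router embeds the incoming query, searches the cache for the most similar previously seen query, and compares that similarity against the configured threshold: at or above the threshold the cached response is returned immediately; below it the request is forwarded to the LLM and the new response is stored for future hits. The sketch below shows this loop with a plain in-process list as the store and hypothetical embed_fn / call_llm helpers standing in for the real embedding model and LLM backend.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.85                 # illustrative value
cache: list[tuple[np.ndarray, str]] = []    # (query embedding, cached response)


def handle_query(query: str, embed_fn, call_llm) -> str:
    """Serve from cache when a semantically similar query was answered before."""
    query_vec = np.asarray(embed_fn(query), dtype=float)
    query_vec /= np.linalg.norm(query_vec)

    # Find the most similar cached query (cosine similarity on normalized vectors).
    if cache:
        scores = [float(vec @ query_vec) for vec, _ in cache]
        best = int(np.argmax(scores))
        if scores[best] >= SIMILARITY_THRESHOLD:
            return cache[best][1]           # cache hit: skip the LLM call

    # Cache miss: ask the LLM and remember the answer for similar future queries.
    response = call_llm(query)
    cache.append((query_vec, response))
    return response
```
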
Backend Options

In-Memory Cache

Fast, local caching for development and single-instance deployments.

Milvus Cache

Persistent, distributed caching using a vector database, suited to production environments.

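For the Milvus path, a cache lookup might look roughly like this (a sketch assuming a locally reachable Milvus instance and the pymilvus client; the collection name, dimension, and response field are illustrative, not the router's actual schema):

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")   # assumes a local Milvus instance

# Quick-setup collection: a "vector" field plus a dynamic "response" field.
if not client.has_collection("semantic_cache"):
    client.create_collection(
        collection_name="semantic_cache",
        dimension=384,              # must match the embedding model's output size
        metric_type="COSINE",
    )


def lookup(query_embedding: list[float], threshold: float = 0.85):
    """Return a cached response if the closest stored query is similar enough."""
    hits = client.search(
        collection_name="semantic_cache",
        data=[query_embedding],
        limit=1,
        output_fields=["response"],
    )[0]
    if hits and hits[0]["distance"] >= threshold:   # COSINE: higher is more similar
        return hits[0]["entity"]["response"]
    return None
```

Storing a new entry is the matching client.insert call with the query embedding and the generated response; because entries live in Milvus rather than in process memory, they survive restarts and can be shared across router instances.
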
Key Benefits

  • Cost Reduction: Avoid redundant LLM API calls for similar queries
  • Improved Latency: Cache hits return responses in milliseconds
  • Better Throughput: Handle more concurrent requests efficiently
  • Semantic Understanding: Match queries by meaning, not just text