Skip to content

Reranker

Rerankers re-order a candidate set of documents by relevance to a query. They sit between an initial retriever (vector search, BM25, hybrid) and the final consumer (LLM, RAG pipeline) and dramatically improve precision at low recall depths.

Operators manage rerankers under Data → Reranker. Each reranker is project-scoped, has a unique key, and persists usage stats (totalRuns, avgLatencyMs, lastUsedAt) for monitoring.

Reranker list

The landing page tracks total / active reranker counts and total runs, with a row per reranker showing its strategy, backing model, run count, and average latency. Create reranker opens the form where you pick a strategy (below) and bind the backing model or judge.

Strategies

The reranker abstraction supports five strategies. Pick one based on latency budget, quality target, and whether you already have a dedicated rerank model.

StrategyHow it worksWhen to use
dedicated-modelCalls a hosted cross-encoder (Cohere Rerank, Jina, Voyage, etc.)Best quality; needs a provider integration
llm-judgeLLM scores each (query, document) pair one-by-oneHigh quality without a dedicated model, costs scale with N
llm-listwiseLLM ranks the whole list in a single callCheaper than llm-judge for medium N; lower latency
heuristicScore-aware rules (BM25 blending, length penalty, freshness boost, etc.)Zero model dependency; tune for known corpora
fusionReciprocal Rank Fusion across multiple input score listsCombine vector + lexical scores into one ranking

The strategy plus config payload is stored on the reranker — see src/lib/services/reranker/strategies/ for the per-strategy config shape.

Creating a reranker

bash
curl -X POST https://console.example.com/api/reranker \
  -H "Content-Type: application/json" \
  -d '{
    "name": "support-knowledge-rerank",
    "strategy": "dedicated-model",
    "status": "active",
    "config": {
      "providerKey": "cohere",
      "model": "rerank-multilingual-v3.0",
      "topN": 10
    }
  }'

The response contains the generated key you use for subsequent calls.

Running a reranker

POST /api/reranker/:key/run accepts a query and a list of documents (strings or { id, content, score?, metadata? }) and returns them re-ordered by relevance:

bash
curl -X POST https://console.example.com/api/reranker/support-knowledge-rerank/run \
  -d '{
    "query": "How do I reset my password?",
    "documents": [
      "Forgot password? Click here to reset.",
      "Our cookie policy explains tracking choices.",
      "Two-factor authentication setup guide."
    ],
    "topN": 3
  }'

Each run is persisted to a run-log table; GET /api/reranker/:key/runs?from=&to=&limit= returns the recent history for debugging and quality tracking.

RAG integration

The RAG module accepts an optional rerankerKey per module. When set, the retrieval pipeline becomes:

Query → Vector store top-K → Reranker (top-N) → LLM context

Set the reranker on a RAG module to bolt better ranking onto an existing pipeline without changing the embedding model or vector store.

Dashboard

The reranker page lists every reranker with strategy, status, run counts, and average latency. Each detail page exposes a playground that proxies to POST /api/reranker/:key/run so you can validate the behavior with real documents before wiring it into production retrieval.

See the Reranker API reference for the full request/response schema.

Community edition is AGPL-3.0. Commercial licensing and support are available separately.