RAG Pipeline

The RAG (Retrieval-Augmented Generation) service provides end-to-end document ingestion, chunking, embedding, and semantic retrieval. It surfaces in the console as Data → Knowledge Engine.

Knowledge Engine

A RAG module is the configuration unit that ties a chunking strategy, an embedding model, and a vector index together. The Knowledge Engine screen lists every module configured for the active project, with counters for modules, indexed documents, and chunks.


When the project is empty, the screen shows the onboarding CTA. Clicking Create module opens a multi-step form where you choose the source datasource, pick chunking parameters (size, overlap, splitter), select an embedding model from Model Hub, point the output at a vector index (see Vector Stores), and optionally attach a Reranker to re-order results before they reach the LLM.

When a reranker is attached, retrieval becomes a two-stage pipeline: the vector store returns the top-K candidates, and the reranker rescores them and keeps the final top-N. This is the recommended way to add Cohere Rerank, Jina, Voyage, or any cross-encoder to an existing RAG module without changing the embedding model or vector store.
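The two-stage flow can be illustrated with a small in-memory sketch. It is not the service's implementation: `two_stage_retrieve` and `keyword_overlap` are hypothetical names, and the keyword scorer is only a toy stand-in for a real cross-encoder reranker.

```python
from typing import Callable


def two_stage_retrieve(
    query: str,
    candidates: list[tuple[str, float]],
    rerank_score: Callable[[str, str], float],
    top_k: int,
    top_n: int,
) -> list[tuple[str, float]]:
    """Each candidate is (chunk_text, vector_similarity_score)."""
    # Stage 1: keep the top_k candidates by vector similarity.
    stage_one = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_k]
    # Stage 2: rescore the survivors with the reranker and keep the top_n.
    rescored = [(text, rerank_score(query, text)) for text, _ in stage_one]
    return sorted(rescored, key=lambda c: c[1], reverse=True)[:top_n]


def keyword_overlap(query: str, text: str) -> float:
    """Toy reranker (assumption): fraction of query words present in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


candidates = [
    ("reset your password in settings", 0.70),
    ("billing and invoices", 0.90),
    ("password rules", 0.80),
    ("shipping times", 0.10),
]
results = two_stage_retrieve("reset password", candidates, keyword_overlap, top_k=3, top_n=2)
```

In this example the chunk with the highest vector score ("billing and invoices") is dropped by the reranker, which is the behaviour a second stage is meant to provide.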

Architecture

Document → Chunk → Embed → Vector Store
                              ↓
Query   → Embed → Vector Search → (Reranker?) → Return Matches

Concepts

Concept	Description
RAG Module	Configuration container linking chunking strategy, embedding model, and vector index
Document	A text or file ingested into a module
Chunk	A segment of a document after splitting
Query Log	Audit record of retrieval queries

Service Functions

Function	Description
`createRagModule()`	Create module with chunk/embed config
`updateRagModule()`	Update module settings
`deleteRagModule()`	Delete module and associated data
`getRagModule()`	Get by key
`listRagModules()`	List modules for tenant
`ingestDocument()`	Text ingestion: chunk → embed → store
`ingestFile()`	File ingestion: convert → chunk → embed → store
`queryRag()`	Semantic retrieval
`deleteRagDocument()`	Delete document and its vectors
`reingestDocument()`	Re-chunk and re-embed existing document
`listRagDocuments()`	List documents in a module
`listRagQueryLogs()`	Query audit log

Module Configuration

json

{
  "name": "Support Knowledge Base",
  "key": "support-kb",
  "embeddingModelKey": "text-embedding-ada-002",
  "vectorProviderKey": "vectors-prod",
  "vectorIndexKey": "support-index",
  "rerankerKey": "support-rerank",
  "chunkConfig": {
    "strategy": "recursive",
    "chunkSize": 1000,
    "chunkOverlap": 200
  }
}

rerankerKey is optional. When present, the matching reranker (see Reranker) runs after vector search and before the response is returned.
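To show how `chunkSize` and `chunkOverlap` interact, here is a minimal fixed-size sliding-window chunker. It is a simplification rather than the `recursive` strategy itself, and it assumes sizes are measured in characters, which the configuration above does not state. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# With the values from the sample config, each window advances by 800 characters.
sample_chunks = chunk_text("x" * 2600, chunk_size=1000, chunk_overlap=200)
```

A larger overlap preserves more context across chunk boundaries at the cost of more chunks, and therefore more embeddings and vectors per document.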

Text Ingestion

POST /api/client/v1/rag/modules/:key/ingest
Authorization: Bearer <token>

json

{
  "fileName": "product-faq.txt",
  "content": "Long document text here...",
  "contentType": "text/plain",
  "metadata": { "source": "docs", "version": "2.0" }
}

Pipeline:

1. Split text into chunks using the configured strategy
2. Generate embeddings for each chunk
3. Upsert vectors to the vector index
4. Store document and chunk metadata
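A client call to the ingest endpoint might be assembled as follows. This sketch only builds the request with the standard library; the base URL is a placeholder, the `Content-Type: application/json` header is an assumption based on the JSON body shown above, and `build_ingest_request` is a hypothetical helper name.

```python
import json
import urllib.parse
import urllib.request


def build_ingest_request(base_url, module_key, token, file_name, content, metadata=None):
    """Build (but do not send) a POST request for the text ingestion endpoint."""
    # Quote the module key so it is safe to place in the URL path.
    key = urllib.parse.quote(module_key, safe="")
    url = f"{base_url.rstrip('/')}/api/client/v1/rag/modules/{key}/ingest"
    body = {
        "fileName": file_name,
        "content": content,
        "contentType": "text/plain",
        "metadata": metadata or {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumption: JSON request body
        },
        method="POST",
    )


# Placeholder host and token; replace with real values before sending.
request = build_ingest_request(
    "https://gateway.example.com",
    "support-kb",
    "TOKEN",
    "product-faq.txt",
    "Long document text here...",
    {"source": "docs", "version": "2.0"},
)
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen`; the response shape is not documented here, so it is left out.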

File Ingestion

Submit a file for automatic processing:

1. File is converted to Markdown (using @cognipeer/to-markdown)
2. Markdown is chunked according to module config
3. Chunks are embedded and stored

Querying

POST /api/client/v1/rag/modules/:key/query
Authorization: Bearer <token>

json

{
  "query": "How do I reset my password?",
  "topK": 5,
  "filter": { "source": "docs" }
}

Pipeline:

1. Embed the query text
2. Perform vector similarity search
3. If the module has a rerankerKey, rerank the candidates
4. Return matching chunks with scores and metadata
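The search step can be pictured with a small in-memory example. This is a sketch under assumptions, not the vector service's behaviour: it takes an already-embedded query vector, uses cosine similarity as the metric, and treats `filter` as exact-match on metadata fields. The names `cosine` and `search` are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vector, index, top_k, metadata_filter=None):
    """Return the top_k entries that pass the filter, best score first."""
    metadata_filter = metadata_filter or {}
    matches = []
    for entry in index:
        meta = entry["metadata"]
        # Assumption: every filter field must match the entry's metadata exactly.
        if any(meta.get(k) != v for k, v in metadata_filter.items()):
            continue
        matches.append({
            "content": entry["content"],
            "score": cosine(query_vector, entry["vector"]),
            "metadata": meta,
        })
    matches.sort(key=lambda m: m["score"], reverse=True)
    return matches[:top_k]


index = [
    {"content": "A", "vector": [1.0, 0.0], "metadata": {"source": "docs"}},
    {"content": "B", "vector": [0.0, 1.0], "metadata": {"source": "docs"}},
    {"content": "C", "vector": [1.0, 0.0], "metadata": {"source": "blog"}},
]
hits = search([1.0, 0.0], index, top_k=5, metadata_filter={"source": "docs"})
```

Entry "C" is as similar to the query as "A" but is excluded by the filter before scoring, so it never competes for a top-K slot.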

Response:

json

{
  "matches": [
    {
      "content": "To reset your password, navigate to...",
      "score": 0.92,
      "metadata": { "source": "docs", "documentTitle": "Product FAQ" }
    }
  ]
}
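A caller will often keep only the stronger matches before passing them to an LLM. The sketch below parses a response of the shape shown above; the helper name `select_context` and the score threshold are illustrative assumptions, not part of the API.

```python
import json


def select_context(response_body: str, min_score: float) -> list[str]:
    """Return the content of every match whose score is at least min_score."""
    payload = json.loads(response_body)
    return [
        match["content"]
        for match in payload.get("matches", [])
        if match.get("score", 0.0) >= min_score
    ]


sample_response = json.dumps({
    "matches": [
        {"content": "To reset your password, navigate to...", "score": 0.92,
         "metadata": {"source": "docs", "documentTitle": "Product FAQ"}},
        {"content": "Unrelated chunk", "score": 0.31,
         "metadata": {"source": "docs"}},
    ]
})
context = select_context(sample_response, min_score=0.5)
```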

Re-ingestion

When a document is updated or the chunking config changes, use reingestDocument() to:

1. Remove existing vectors for the document
2. Re-chunk with current settings
3. Re-embed and store new vectors
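The order of these steps matters: if the new chunking produces fewer chunks than before, any vectors that are not removed first would linger as stale results. A minimal in-memory sketch of that sequence follows. The dict-based store, the `reingest` function, and the toy chunker and embedder are all hypothetical stand-ins, not the behaviour of `reingestDocument()` itself.

```python
def reingest(document_id, text, vectors, chunker, embedder):
    """vectors maps (document_id, chunk_index) -> (chunk_text, embedding)."""
    # 1. Remove every existing vector that belongs to this document.
    for key in [k for k in vectors if k[0] == document_id]:
        del vectors[key]
    # 2. Re-chunk the text with the current settings.
    chunks = chunker(text)
    # 3. Re-embed each chunk and store the new vectors.
    for index, chunk in enumerate(chunks):
        vectors[(document_id, index)] = (chunk, embedder(chunk))
    return len(chunks)


# Toy stand-ins (assumptions): split on sentences, "embed" as text length.
split_sentences = lambda text: text.split(". ")
length_embedding = lambda chunk: [float(len(chunk))]

store = {
    ("doc-1", 0): ("old a", [0.0]),
    ("doc-1", 1): ("old b", [0.0]),
    ("doc-1", 2): ("old c", [0.0]),
    ("doc-2", 0): ("other", [1.0]),
}
new_count = reingest("doc-1", "new one. new two", store, split_sentences, length_embedding)
```

After the call, the third chunk of `doc-1` is gone rather than left behind, and `doc-2` is untouched.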

Dependencies

The RAG pipeline integrates several gateway services:

Inference Service — For generating embeddings
Vector Service — For storing and querying vectors
File Service — For file conversion (optional)
