
# Guardrails

Guardrails provide real-time content moderation and safety checks for LLM inputs and outputs. They can detect PII, evaluate content against custom policies, and block or flag problematic content.

## Guardrail Types

| Type | Description |
| --- | --- |
| PII Detection | Regex-based detection of 15 PII categories |
| Content Moderation | Evaluates content against moderation categories |
| Prompt Shield | Detects prompt injection attacks |
| Custom Prompt | LLM-based evaluation with custom rules |
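
The four types share common settings (name, action, target) and differ in their type-specific configuration. One way to model this — a sketch only; these type names and fields are assumptions, not the product's actual schema — is a discriminated union that an evaluator can switch on:

```typescript
// Hypothetical config shapes for the four guardrail types,
// expressed as a discriminated union keyed on `type`.
type GuardrailAction = 'block' | 'flag' | 'redact';

interface BaseGuardrail {
  name: string;
  action: GuardrailAction;
  target: 'input' | 'output';
}

type Guardrail =
  | (BaseGuardrail & { type: 'pii'; categories: string[] })
  | (BaseGuardrail & { type: 'moderation'; categories: string[] })
  | (BaseGuardrail & { type: 'promptShield' })
  | (BaseGuardrail & { type: 'customPrompt'; rules: string });

// A PII guardrail that flags emails and phone numbers on inputs.
const piiChecker: Guardrail = {
  name: 'PII Checker',
  type: 'pii',
  action: 'flag',
  target: 'input',
  categories: ['email', 'phone'],
};
```

The discriminant lets TypeScript narrow the config automatically: inside a `case 'pii':` branch, `categories` is known to exist.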

## PII Categories

The PII detector supports 15 categories:

| Category | Examples |
| --- | --- |
| `email` | user@example.com |
| `phone` | +1-234-567-8900 |
| `creditCard` | 4111 1111 1111 1111 |
| `iban` | GB29 NWBK 6016 1331 9268 19 |
| `swift` | NWBKGB2L |
| `nationalId` | SSN, national ID numbers |
| `passport` | Passport numbers |
| `birthDate` | 1990-01-15 |
| `address` | Street addresses |
| `ipAddress` | 192.168.1.1, IPv6 |
| `url` | https://example.com |
| `socialHandle` | @username |
| `apiKey` | sk-..., AKIA... |
| `cryptoWallet` | Wallet addresses |
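
Regex-based detection of these categories can be sketched as follows. The patterns below are illustrative stand-ins for two categories, not the product's actual regexes, and the `Finding` shape mirrors the API response shown later in this page:

```typescript
// Minimal sketch of regex-based PII detection for two categories.
interface Finding {
  category: string;
  value: string;
  position: [number, number]; // [start, end) offsets into the content
}

// Illustrative patterns only — real detectors use stricter regexes
// plus validation (e.g. Luhn checks for credit cards).
const patterns: Record<string, RegExp> = {
  email: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  ipAddress: /\b(?:\d{1,3}\.){3}\d{1,3}\b/g,
};

function detectPii(content: string): Finding[] {
  const findings: Finding[] = [];
  for (const [category, pattern] of Object.entries(patterns)) {
    for (const match of content.matchAll(pattern)) {
      findings.push({
        category,
        value: match[0],
        position: [match.index!, match.index! + match[0].length],
      });
    }
  }
  return findings;
}
```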

## Actions

When a guardrail triggers, it can take one of these actions:

| Action | Behavior |
| --- | --- |
| `block` | Reject the request with an error |
| `flag` | Allow the request but include findings in the response |
| `redact` | Replace detected content with placeholder text |
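
Illustratively, applying an action to a list of findings might look like the sketch below. The return shape and the `[CATEGORY]` placeholder format are assumptions, not the product's actual behavior:

```typescript
interface Finding {
  category: string;
  value: string;
  position: [number, number]; // [start, end) offsets into the content
}

function applyAction(
  action: 'block' | 'flag' | 'redact',
  content: string,
  findings: Finding[],
): { content: string; blocked: boolean } {
  if (findings.length === 0) return { content, blocked: false };
  switch (action) {
    case 'block':
      // Caller rejects the request with an error.
      return { content: '', blocked: true };
    case 'flag':
      // Content passes through; findings travel in the response.
      return { content, blocked: false };
    case 'redact': {
      // Replace right-to-left so earlier offsets stay valid.
      let redacted = content;
      const ordered = [...findings].sort((a, b) => b.position[0] - a.position[0]);
      for (const f of ordered) {
        redacted =
          redacted.slice(0, f.position[0]) +
          `[${f.category.toUpperCase()}]` +
          redacted.slice(f.position[1]);
      }
      return { content: redacted, blocked: false };
    }
  }
}
```

Redacting right-to-left is the key detail: replacing spans left-to-right would shift the offsets of every later finding.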

## API

### Evaluate Guardrail

```
POST /api/client/v1/guardrails/evaluate
Authorization: Bearer <token>
```

```json
{
  "guardrailKey": "pii-checker",
  "content": "My email is john@example.com and my phone is 555-0100",
  "target": "input"
}
```

Response:

```json
{
  "triggered": true,
  "action": "flag",
  "findings": [
    { "category": "email", "value": "john@example.com", "position": [12, 28] },
    { "category": "phone", "value": "555-0100", "position": [47, 55] }
  ]
}
```
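
A typed client wrapper for this endpoint might look like the sketch below; the base URL and token handling are placeholders, and `GuardrailResponse` simply mirrors the example response above:

```typescript
// Response shape mirroring the example above.
interface GuardrailResponse {
  triggered: boolean;
  action: 'block' | 'flag' | 'redact';
  findings: { category: string; value: string; position: [number, number] }[];
}

// Hypothetical client wrapper around the evaluate endpoint.
async function evaluate(
  baseUrl: string,
  token: string,
  guardrailKey: string,
  content: string,
): Promise<GuardrailResponse> {
  const res = await fetch(`${baseUrl}/api/client/v1/guardrails/evaluate`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ guardrailKey, content, target: 'input' }),
  });
  if (!res.ok) throw new Error(`evaluate failed: ${res.status}`);
  return (await res.json()) as GuardrailResponse;
}

// Small helper: which categories fired?
function triggeredCategories(r: GuardrailResponse): string[] {
  return r.findings.map((f) => f.category);
}
```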

## Service Layer

```typescript
import {
  createGuardrail,
  evaluateGuardrail,
  listGuardrails,
} from '@/lib/services/guardrail';

// Create a guardrail
await createGuardrail(tenantDbName, tenantId, projectId, {
  name: 'PII Checker',
  type: 'pii',
  action: 'flag',
  target: 'input',
  policy: buildDefaultPresetPolicy(),
});

// Evaluate content
const result = await evaluateGuardrail(tenantDbName, guardrailKey, {
  content: userMessage,
  target: 'input',
});
```

## Inference Integration

Guardrails can be attached to models for automatic evaluation:

```
Request → Guardrail Check → Provider Call → Response
                ↓ (if blocked)
          Return 422 with findings
```

When a guardrail blocks a request during inference, a `GuardrailBlockError` is thrown with the guardrail key, action, and findings.
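
The block path can be sketched as follows. The error's exact fields and the 422 body shape are assumptions inferred from the description above, not the actual implementation:

```typescript
// Assumed shape of the error described above.
class GuardrailBlockError extends Error {
  constructor(
    public guardrailKey: string,
    public action: string,
    public findings: unknown[],
  ) {
    super(`Blocked by guardrail ${guardrailKey}`);
  }
}

// Hypothetical mapping from a caught error to the HTTP response.
function toHttpResponse(err: unknown): { status: number; body: unknown } {
  if (err instanceof GuardrailBlockError) {
    return {
      status: 422,
      body: {
        error: 'guardrail_blocked',
        guardrailKey: err.guardrailKey,
        action: err.action,
        findings: err.findings,
      },
    };
  }
  return { status: 500, body: { error: 'internal_error' } };
}
```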

## Management Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | `/api/guardrails` | List guardrails |
| POST | `/api/guardrails` | Create guardrail |
| GET | `/api/guardrails/:id` | Get guardrail |
| PATCH | `/api/guardrails/:id` | Update guardrail |
| DELETE | `/api/guardrails/:id` | Delete guardrail |

Community edition is AGPL-3.0. Commercial licensing and support are available separately.