
Guardrails

Guardrails provide real-time content moderation and safety checks for LLM inputs and outputs. They can detect PII, evaluate content against custom policies, and block or flag problematic content. Operators manage them under Operate → Guardrail.

Operator view

Each guardrail is a named policy attached to one or more models. The list view summarises four counts that matter operationally: the total number of policies, how many are enabled, how many are disabled, and how many are configured to block requests (rather than warn or flag).

[Figure: Guardrails list]

Filters at the top of the table narrow by type (preset / custom), by action (block / warn / flag), and by status. The Create guardrail flow walks you through picking a type, declaring which categories or rules to enforce, and pinning the policy to specific models or to the whole project.
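The summary counts and table filters can be modelled in a few lines. This is a minimal sketch, not the product's implementation: the `GuardrailRow` shape, its field names, and the sample rows are assumptions made for illustration.

```typescript
// Hypothetical row shape; the field names are assumptions, not the product's schema.
type GuardrailRow = {
  name: string;
  type: 'preset' | 'custom';
  action: 'block' | 'warn' | 'flag';
  enabled: boolean;
};

type RowFilter = Partial<Pick<GuardrailRow, 'type' | 'action' | 'enabled'>>;

// The four counts shown in the list view.
function summariseRows(rows: GuardrailRow[]) {
  return {
    total: rows.length,
    enabled: rows.filter((r) => r.enabled).length,
    disabled: rows.filter((r) => !r.enabled).length,
    // "Configured to block" counts the action regardless of enabled status.
    blocking: rows.filter((r) => r.action === 'block').length,
  };
}

// Filters combine with AND; an omitted filter matches every row.
function filterRows(rows: GuardrailRow[], filter: RowFilter): GuardrailRow[] {
  return rows.filter(
    (r) =>
      (filter.type === undefined || r.type === filter.type) &&
      (filter.action === undefined || r.action === filter.action) &&
      (filter.enabled === undefined || r.enabled === filter.enabled),
  );
}

// Made-up sample data.
const sampleRows: GuardrailRow[] = [
  { name: 'PII Checker', type: 'preset', action: 'flag', enabled: true },
  { name: 'Tone policy', type: 'custom', action: 'warn', enabled: false },
  { name: 'Prompt shield', type: 'preset', action: 'block', enabled: true },
];

console.log(summariseRows(sampleRows));
console.log(filterRows(sampleRows, { type: 'preset', action: 'block' }).map((r) => r.name));
```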

When a model has guardrails attached, the runtime applies them at two points: before forwarding the request upstream (input guardrails) and before responding to the client (output guardrails). A blocked request short-circuits with a structured guardrail_violation error that identifies which policy fired.
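The two checkpoints can be sketched as a small wrapper around a provider call. Everything here is illustrative: `runGuarded`, the `Verdict` and `GuardedResult` shapes, and the `stage` field are assumptions, and only the `guardrail_violation` code and the reporting of the policy that fired come from the description above.

```typescript
type Verdict = { blocked: boolean; policy?: string };
type Check = (text: string) => Promise<Verdict>;

// Hypothetical result shape, not the runtime's actual error format.
type GuardedResult =
  | { ok: true; output: string }
  | { ok: false; error: { code: 'guardrail_violation'; policy: string; stage: 'input' | 'output' } };

function violation(v: Verdict, stage: 'input' | 'output'): GuardedResult {
  return { ok: false, error: { code: 'guardrail_violation', policy: v.policy ?? 'unknown', stage } };
}

async function runGuarded(
  prompt: string,
  checkInput: Check,
  callProvider: (prompt: string) => Promise<string>,
  checkOutput: Check,
): Promise<GuardedResult> {
  // Point 1: input guardrails run before the request is forwarded upstream.
  const inputVerdict = await checkInput(prompt);
  if (inputVerdict.blocked) return violation(inputVerdict, 'input'); // provider is never called

  const output = await callProvider(prompt);

  // Point 2: output guardrails run before the response reaches the client.
  const outputVerdict = await checkOutput(output);
  if (outputVerdict.blocked) return violation(outputVerdict, 'output');

  return { ok: true, output };
}
```

Returning a tagged result instead of throwing keeps the sketch easy to test; a real runtime may well signal the block with an exception instead.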

Guardrail Types

The GuardrailType enum has just two values:

| Type | Description |
| --- | --- |
| preset | A bundled policy that combines one or more detection families: PII, content moderation, prompt-shield, and similar checks. |
| custom | An LLM-based evaluation driven by your own prompt and rules. |

Detection families (PII, moderation, prompt-shield, etc.) are not themselves types — they are the checks a preset policy enables. PII detection delegates to the PII service, which stays license-free even though guardrails themselves are gated. Configure the detector once as a PII policy, then reference it from a preset guardrail.
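One way to picture the distinction between types and detection families is a discriminated union. This is a sketch under stated assumptions: the field names (`families`, `piiPolicyKey`, `prompt`, `rules`) are hypothetical, and the family list includes only the three named above, although the text implies there are more.

```typescript
// Families named in the text; the real list may be longer ("and similar checks").
type DetectionFamily = 'pii' | 'moderation' | 'prompt-shield';

// Hypothetical shapes: the type is 'preset' or 'custom', and families live inside a preset.
type GuardrailSketch =
  | { type: 'preset'; name: string; families: DetectionFamily[]; piiPolicyKey?: string }
  | { type: 'custom'; name: string; prompt: string; rules: string[] };

function describeGuardrail(g: GuardrailSketch): string {
  switch (g.type) {
    case 'preset':
      return `${g.name}: preset enabling ${g.families.join(', ')}`;
    case 'custom':
      return `${g.name}: custom LLM evaluation with ${g.rules.length} rule(s)`;
  }
}

console.log(
  describeGuardrail({ type: 'preset', name: 'PII Checker', families: ['pii'], piiPolicyKey: 'default-pii' }),
);
```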

PII Categories

The PII service supports 18 categories (the guardrail UI surfaces a subset). A representative sample:

| Category | Examples |
| --- | --- |
| email | user@example.com |
| phone | +1-234-567-8900 |
| creditCard | 4111 1111 1111 1111 |
| iban | GB29 NWBK 6016 1331 9268 19 |
| swift | NWBKGB2L |
| nationalId | SSN, national ID numbers |
| passport | Passport numbers |
| birthDate | 1990-01-15 |
| address | Street addresses |
| ipAddress | 192.168.1.1, IPv6 |
| url | https://example.com |
| socialHandle | @username |
| apiKey | sk-..., AKIA... |
| cryptoWallet | Wallet addresses |
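Pattern matching alone tends to over-match numeric categories such as creditCard, so detectors commonly add a checksum step. The sketch below shows the standard Luhn check as an illustration of that idea; it is not a description of how the PII service itself works, and `passesLuhn` is a hypothetical helper name.

```typescript
// Luhn checksum: true when the candidate is a plausible payment card number.
function passesLuhn(candidate: string): boolean {
  const digits = candidate.replace(/[\s-]/g, '');
  if (!/^\d{13,19}$/.test(digits)) return false;

  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    // Walk from the rightmost digit and double every second one.
    let d = digits.charCodeAt(digits.length - 1 - i) - 48;
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

console.log(passesLuhn('4111 1111 1111 1111')); // the sample value from the table
console.log(passesLuhn('4111 1111 1111 1112')); // same length, last digit changed
```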

Actions

When a guardrail triggers, it can take one of these actions:

| Action | Behavior |
| --- | --- |
| block | Reject the request with an error |
| warn | Allow the request but surface a warning |
| flag | Allow the request but include findings in the response |
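The three behaviours in the table can be written as a single dispatch function. This is a minimal sketch: `applyAction` and the `Outcome` shape are hypothetical names chosen for illustration, and the message strings are invented.

```typescript
type Action = 'block' | 'warn' | 'flag';
type Finding = { category: string; value: string };

// Hypothetical outcome shape, not the runtime's actual return type.
type Outcome =
  | { allowed: false; error: string }
  | { allowed: true; warning?: string; findings?: Finding[] };

function applyAction(action: Action, findings: Finding[]): Outcome {
  if (findings.length === 0) return { allowed: true }; // the guardrail did not trigger

  switch (action) {
    case 'block':
      return { allowed: false, error: `blocked: ${findings.length} finding(s)` };
    case 'warn':
      return { allowed: true, warning: `${findings.length} finding(s)` };
    case 'flag':
      return { allowed: true, findings };
  }
}

const sampleFindings: Finding[] = [{ category: 'email', value: 'john@example.com' }];
console.log(applyAction('block', sampleFindings));
console.log(applyAction('flag', sampleFindings));
```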

API

Evaluate Guardrail

POST /api/client/v1/guardrails/evaluate
Authorization: Bearer <token>
json
{
  "guardrail_key": "pii-checker",
  "text": "My email is john@example.com and my phone is 555-0100"
}

Response:

json
{
  "passed": false,
  "action": "flag",
  "findings": [
    { "category": "email", "value": "john@example.com", "position": [12, 28] },
    { "category": "phone", "value": "555-0100", "position": [45, 53] }
  ],
  "guardrail_key": "pii-checker",
  "guardrail_name": "PII Checker",
  "message": "PII detected in input"
}
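A client might build the request and then use the returned findings to mask the original text. The sketch below makes several assumptions: `buildEvaluateRequest` and `redact` are hypothetical helpers, the `Content-Type` header is assumed because the body is JSON, the base URL is supplied by the caller, and `position` is treated as a zero-based `[start, end)` pair, which is what the email finding in the sample response suggests.

```typescript
type Finding = { category: string; value: string; position: [number, number] };

// Builds the arguments for fetch(url, init); no network call happens here.
function buildEvaluateRequest(baseUrl: string, token: string, guardrailKey: string, text: string) {
  return {
    url: `${baseUrl}/api/client/v1/guardrails/evaluate`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${token}`,
        'Content-Type': 'application/json', // assumption: not shown in the request above
      },
      body: JSON.stringify({ guardrail_key: guardrailKey, text }),
    },
  };
}

// Replaces each finding with a "[category]" marker.
function redact(text: string, findings: Finding[]): string {
  // Work from the end of the string so earlier offsets stay valid.
  const sorted = [...findings].sort((a, b) => b.position[0] - a.position[0]);
  let out = text;
  for (const f of sorted) {
    const [start, end] = f.position;
    out = out.slice(0, start) + `[${f.category}]` + out.slice(end);
  }
  return out;
}

const sampleText = 'My email is john@example.com and my phone is 555-0100';
console.log(redact(sampleText, [{ category: 'email', value: 'john@example.com', position: [12, 28] }]));
```

To send the request, pass the two parts to `fetch(url, init)` and read `findings` from the parsed JSON response.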

Service Layer

typescript
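import {
  buildDefaultPresetPolicy, // assumed to be exported from the same module
  createGuardrail,
  evaluateGuardrail,
  listGuardrails,
} from '@/lib/services/guardrail';

// Create a preset guardrail that flags findings on input
await createGuardrail(tenantDbName, tenantId, projectId, {
  name: 'PII Checker',
  type: 'preset', // GuardrailType is 'preset' or 'custom'; PII is a detection family, not a type
  action: 'flag',
  target: 'input',
  policy: buildDefaultPresetPolicy(),
});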

// Evaluate content
const result = await evaluateGuardrail(tenantDbName, guardrailKey, {
  content: userMessage,
  target: 'input',
});

Inference Integration

Guardrails can be attached to models for automatic evaluation:

Request → Guardrail Check → Provider Call → Response
                ↓ (if blocked)
          Return 400 with findings

When a guardrail blocks a request during inference, a GuardrailBlockError is thrown with the guardrail key, action, and findings.
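A route handler could translate that error into the 400 response shown in the diagram. This is a sketch, not the library's definition: the class below is a stand-in with assumed field names, and `toHttpResponse` and the response body keys are hypothetical.

```typescript
type Finding = { category: string; value: string };

// Stand-in definition; the source only says the error carries key, action, and findings.
class GuardrailBlockError extends Error {
  constructor(
    readonly guardrailKey: string,
    readonly action: 'block' | 'warn' | 'flag',
    readonly findings: Finding[],
  ) {
    super(`Guardrail "${guardrailKey}" blocked the request`);
    this.name = 'GuardrailBlockError';
  }
}

// Maps a thrown error to an HTTP-style result ("Return 400 with findings").
function toHttpResponse(err: unknown): { status: number; body: Record<string, unknown> } {
  if (err instanceof GuardrailBlockError) {
    return {
      status: 400,
      body: {
        error: 'guardrail_violation',
        guardrail_key: err.guardrailKey,
        action: err.action,
        findings: err.findings,
      },
    };
  }
  return { status: 500, body: { error: 'internal_error' } };
}

try {
  throw new GuardrailBlockError('pii-checker', 'block', [{ category: 'email', value: 'john@example.com' }]);
} catch (err) {
  console.log(toHttpResponse(err));
}
```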

Management Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/guardrails | List guardrails |
| POST | /api/guardrails | Create guardrail |
| GET | /api/guardrails/:id | Get guardrail |
| PATCH | /api/guardrails/:id | Update guardrail |
| DELETE | /api/guardrails/:id | Delete guardrail |
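The routing table above maps onto a small helper that returns the method and path for each operation. This is a sketch under stated assumptions: `managementRequest` and the operation names are hypothetical, and authentication and request bodies are omitted because the table does not describe them.

```typescript
type ManagementOp = 'list' | 'create' | 'get' | 'update' | 'delete';

function managementRequest(op: ManagementOp, id?: string): { method: string; path: string } {
  const base = '/api/guardrails';
  switch (op) {
    case 'list':
      return { method: 'GET', path: base };
    case 'create':
      return { method: 'POST', path: base };
    default: {
      // get, update, and delete all address a single guardrail by id.
      if (!id) throw new Error(`"${op}" needs a guardrail id`);
      const method = op === 'get' ? 'GET' : op === 'update' ? 'PATCH' : 'DELETE';
      return { method, path: `${base}/${encodeURIComponent(id)}` };
    }
  }
}

console.log(managementRequest('list'));
console.log(managementRequest('update', 'abc123')); // 'abc123' is a made-up id
```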

Community edition is AGPL-3.0. Commercial licensing and support are available separately.