Moderations API
OpenAI-compatible moderation endpoint for classifying text against a console guardrail.
The model field selects which console guardrail to evaluate against (any enabled guardrail key). When omitted, the tenant's first enabled preset guardrail with an active moderation policy is used, so an OpenAI client pointed at the console works without code changes once such a guardrail exists.
Endpoint
POST /api/client/v1/moderationsRequest
json
{
"input": "Text to classify",
"model": "default-moderation"
}Parameters
| Field | Type | Required | Description |
|---|---|---|---|
input | string | string[] | Yes | Text to classify. May be a single string, an array of strings, or an array of content-part objects with a text field. Image inputs are not supported. |
model | string | No | Guardrail key to evaluate against. When omitted, falls back to the first enabled preset guardrail with the moderation policy active. |
Response
Each input produces one entry in results, indexed by input position. model echoes the resolved guardrail key.
json
{
"id": "modr_2f1c8e7a-9b3d-4a21-8c0e-7d5f6a1b2c3d",
"model": "default-moderation",
"results": [
{
"flagged": true,
"categories": {
"harassment": true,
"hate": false,
"violence": false
},
"category_scores": {
"harassment": 0.9,
"hate": 0,
"violence": 0
},
"findings": [
{
"type": "moderation",
"category": "harassment",
"severity": "high",
"message": "Detected harassing content",
"action": "block",
"block": true
}
]
}
]
}Result fields
| Field | Type | Description |
|---|---|---|
flagged | boolean | true when any finding was produced (including PII or prompt-shield findings when those policies are enabled). |
categories | object | Map of every moderation category to a boolean indicating whether it was triggered. |
category_scores | object | Map of every moderation category to a score derived from finding severity: low → 0.3, medium → 0.6, high → 0.9. Untriggered categories are 0. |
findings | array | Console extension: the raw guardrail findings, including PII and prompt-shield findings when those policies are enabled (these are not folded into the fixed category map). |
Each findings entry has the shape:
| Field | Type | Description |
|---|---|---|
type | string | One of pii, moderation, prompt_shield, custom. |
category | string | Category identifier within the finding type. |
severity | string | low, medium, or high. |
message | string | Human-readable description of the finding. |
action | string | Guardrail action applied. |
block | boolean | Whether the finding blocks the content. |
value | string | Optional matched value (e.g. the detected PII substring). |
Moderation categories
The fixed categories / category_scores maps always include every moderation category below:
| Category | Label |
|---|---|
harassment | Harassment |
harassment/threatening | Harassment (Threatening) |
hate | Hate speech |
hate/threatening | Hate (Threatening) |
illicit | Illicit Activity |
illicit/violent | Illicit (Violent) |
self-harm | Self Harm |
self-harm/intent | Self Harm (Intent) |
self-harm/instructions | Self Harm (Instructions) |
sexual | Sexual Content |
sexual/minors | Sexual Content (Minors) |
violence | Violence |
violence/graphic | Graphic Violence |
terrorism | Terrorism & Extremism |
weapons | Weapons & Weapon Crafting |
fraud | Fraud & Scams |
drugs | Illegal Drugs |
cybercrime | Cybercrime & Malware |
child_safety | Child Safety & Grooming |
misinformation | Medical/Health Misinformation |
privacy_violation | Privacy Violations & Doxxing |
impersonation | Identity Impersonation |
manipulation | Psychological Manipulation |
radicalization | Radicalization Content |
financial_advice | Unauthorized Financial Advice |
animal_cruelty | Animal Abuse & Cruelty |
Errors
| Status | Description |
|---|---|
| 400 | Missing input; model is not a string; input is empty or an unsupported type (e.g. image inputs); or the requested guardrail key was not found / no moderation guardrail is configured |
| 401 | Invalid API token |
| 500 | Internal moderation error |
Example
bash
curl -X POST https://gateway.example.com/api/client/v1/moderations \
-H "Authorization: Bearer cpeer_your_token" \
-H "Content-Type: application/json" \
-d '{
"input": "Text to classify",
"model": "default-moderation"
}'