Skip to content

Moderations API

OpenAI-compatible moderation endpoint for classifying text against a console guardrail.

The model field selects which console guardrail to evaluate against (any enabled guardrail key). When omitted, the tenant's first enabled preset guardrail with an active moderation policy is used, so an OpenAI client pointed at the console works without code changes once such a guardrail exists.

Endpoint

POST /api/client/v1/moderations

Request

json
{
  "input": "Text to classify",
  "model": "default-moderation"
}

Parameters

FieldTypeRequiredDescription
inputstring | string[]YesText to classify. May be a single string, an array of strings, or an array of content-part objects with a text field. Image inputs are not supported.
modelstringNoGuardrail key to evaluate against. When omitted, falls back to the first enabled preset guardrail with the moderation policy active.

Response

Each input produces one entry in results, indexed by input position. model echoes the resolved guardrail key.

json
{
  "id": "modr_2f1c8e7a-9b3d-4a21-8c0e-7d5f6a1b2c3d",
  "model": "default-moderation",
  "results": [
    {
      "flagged": true,
      "categories": {
        "harassment": true,
        "hate": false,
        "violence": false
      },
      "category_scores": {
        "harassment": 0.9,
        "hate": 0,
        "violence": 0
      },
      "findings": [
        {
          "type": "moderation",
          "category": "harassment",
          "severity": "high",
          "message": "Detected harassing content",
          "action": "block",
          "block": true
        }
      ]
    }
  ]
}

Result fields

FieldTypeDescription
flaggedbooleantrue when any finding was produced (including PII or prompt-shield findings when those policies are enabled).
categoriesobjectMap of every moderation category to a boolean indicating whether it was triggered.
category_scoresobjectMap of every moderation category to a score derived from finding severity: low0.3, medium0.6, high0.9. Untriggered categories are 0.
findingsarrayConsole extension: the raw guardrail findings, including PII and prompt-shield findings when those policies are enabled (these are not folded into the fixed category map).

Each findings entry has the shape:

FieldTypeDescription
typestringOne of pii, moderation, prompt_shield, custom.
categorystringCategory identifier within the finding type.
severitystringlow, medium, or high.
messagestringHuman-readable description of the finding.
actionstringGuardrail action applied.
blockbooleanWhether the finding blocks the content.
valuestringOptional matched value (e.g. the detected PII substring).

Moderation categories

The fixed categories / category_scores maps always include every moderation category below:

CategoryLabel
harassmentHarassment
harassment/threateningHarassment (Threatening)
hateHate speech
hate/threateningHate (Threatening)
illicitIllicit Activity
illicit/violentIllicit (Violent)
self-harmSelf Harm
self-harm/intentSelf Harm (Intent)
self-harm/instructionsSelf Harm (Instructions)
sexualSexual Content
sexual/minorsSexual Content (Minors)
violenceViolence
violence/graphicGraphic Violence
terrorismTerrorism & Extremism
weaponsWeapons & Weapon Crafting
fraudFraud & Scams
drugsIllegal Drugs
cybercrimeCybercrime & Malware
child_safetyChild Safety & Grooming
misinformationMedical/Health Misinformation
privacy_violationPrivacy Violations & Doxxing
impersonationIdentity Impersonation
manipulationPsychological Manipulation
radicalizationRadicalization Content
financial_adviceUnauthorized Financial Advice
animal_crueltyAnimal Abuse & Cruelty

Errors

StatusDescription
400Missing input; model is not a string; input is empty or an unsupported type (e.g. image inputs); or the requested guardrail key was not found / no moderation guardrail is configured
401Invalid API token
500Internal moderation error

Example

bash
curl -X POST https://gateway.example.com/api/client/v1/moderations \
  -H "Authorization: Bearer cpeer_your_token" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Text to classify",
    "model": "default-moderation"
  }'

Community edition is AGPL-3.0. Commercial licensing and support are available separately.