Conversation guardrails
Guardrails let you intercept both user inputs and assistant outputs before they reach the model or the caller. Each guardrail bundles one or more checks that return a disposition (`allow`, `warn`, `block`).
```typescript
import {
  createAgent,
  createRegexGuardrail,
  createCodeGuardrail,
  GuardrailPhase,
} from "@cognipeer/agent-sdk";

const guardrails = [
  // Request phase: check user input for secrets before it reaches the model.
  createRegexGuardrail(/password|secret/i, {
    guardrailId: "password-filter",
    guardrailTitle: "Sensitive secret filter",
    phases: [GuardrailPhase.Request],
    rule: {
      failureMessage: "Outbound request blocked: secret detected.",
    },
  }),
  // Response phase: block assistant turns that contain raw code.
  createCodeGuardrail({
    guardrailId: "code-ban",
    guardrailTitle: "No raw code",
    phases: [GuardrailPhase.Response],
    rule: { disposition: "block" },
  }),
];

// `model` is assumed to be configured elsewhere in your application.
const agent = createAgent({
  model,
  guardrails,
});
```
When a guardrail blocks a request, the agent adds an assistant message summarising the violation and stops before sending content to the model. For responses, the offending assistant turn is replaced with a guardrail notice.
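The blocking behaviour described above can be illustrated with a small, self-contained simulation. This is only a sketch of the control flow, not the SDK's implementation: the `Check`, `Message`, and `Model` types, the `runTurn` function, and the notice wording are all hypothetical names introduced here for illustration.

```typescript
// Hypothetical local types for illustration; NOT the SDK's own definitions.
type Disposition = "allow" | "warn" | "block";
type Message = { role: "user" | "assistant"; content: string };
type Check = (text: string) => { disposition: Disposition; reason?: string };
// Stand-in for a model call so the sketch runs without network access.
type Model = (messages: Message[]) => string;

function runTurn(
  messages: Message[],
  requestCheck: Check,
  responseCheck: Check,
  model: Model,
): { messages: Message[]; modelCalled: boolean } {
  const lastText = messages[messages.length - 1]?.content ?? "";

  // Request phase: on a block, stop before the model sees any content
  // and append an assistant message summarising the violation.
  const requestVerdict = requestCheck(lastText);
  if (requestVerdict.disposition === "block") {
    const notice = `Guardrail notice: ${requestVerdict.reason ?? "request blocked"}`;
    return {
      messages: [...messages, { role: "assistant", content: notice }],
      modelCalled: false,
    };
  }

  // Response phase: on a block, replace the offending assistant turn.
  const reply = model(messages);
  const responseVerdict = responseCheck(reply);
  const content =
    responseVerdict.disposition === "block"
      ? `Guardrail notice: ${responseVerdict.reason ?? "response blocked"}`
      : reply;
  return {
    messages: [...messages, { role: "assistant", content }],
    modelCalled: true,
  };
}

// Three backticks, built at runtime so this snippet stays fence-safe.
const FENCE = "`".repeat(3);

const noSecrets: Check = (text) =>
  /password|secret/i.test(text)
    ? { disposition: "block", reason: "secret detected" }
    : { disposition: "allow" };

const noCode: Check = (text) =>
  text.includes(FENCE)
    ? { disposition: "block", reason: "raw code in response" }
    : { disposition: "allow" };

const blockedTurn = runTurn(
  [{ role: "user", content: "my password is hunter2" }],
  noSecrets,
  noCode,
  () => "Sure.",
);
console.log(blockedTurn.modelCalled); // false
```

In this sketch a `warn` disposition simply lets the turn through; how the SDK treats warnings beyond recording them is not specified here.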
Built-in checks
Check | Factory | Description |
---|---|---|
Regex pattern | `regexRule` | Blocks (or warns) when a regular expression matches the selected message text. |
JSON schema | `jsonSchemaRule` | Parses the message as JSON and validates it with either a Zod schema or JSON Schema. |
Code detection | `codePresenceRule` | Flags messages containing fenced code blocks or common programming keywords. |
Agent verdict | `agentVerdictRule` | Delegates the decision to another agent that returns structured output. |
Custom callback | `customCallbackRule` | Runs arbitrary user logic for full control. |
Each helper returns a `GuardrailRule` that you can feed into `createGuardrail({ checks: [...] })`. For convenience, preset builders (`createRegexGuardrail`, `createJsonGuardrail`, `createCodeGuardrail`) wrap a single rule into a ready-to-use guardrail.
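To make the relationship between rules and guardrails concrete, here is a minimal, self-contained model of the idea. It does not use the SDK: `sketchRegexRule`, `sketchCodePresenceRule`, and `sketchGuardrail` are hypothetical stand-ins, and the "most severe disposition wins" aggregation is an assumption of this sketch rather than documented behaviour.

```typescript
// Hypothetical local types; the SDK's GuardrailRule may look different.
type Disposition = "allow" | "warn" | "block";
interface RuleResult {
  disposition: Disposition;
  reason?: string;
}
type Rule = (text: string) => RuleResult;

// Assumed ordering used to pick the most severe result.
const severity: Record<Disposition, number> = { allow: 0, warn: 1, block: 2 };

// Blocks (or warns) when the pattern matches the message text.
function sketchRegexRule(pattern: RegExp, onMatch: "warn" | "block" = "block"): Rule {
  return (text) => {
    pattern.lastIndex = 0; // keep global/sticky patterns stateless between calls
    return pattern.test(text)
      ? { disposition: onMatch, reason: `matched ${pattern}` }
      : { disposition: "allow" };
  };
}

// Flags fenced code blocks or a few programming keywords.
// Deliberately crude: words like "return" also occur in ordinary prose.
function sketchCodePresenceRule(onMatch: "warn" | "block" = "warn"): Rule {
  const fence = "`".repeat(3);
  const keywords = /\b(function|class|import|def|return)\b/;
  return (text) =>
    text.includes(fence) || keywords.test(text)
      ? { disposition: onMatch, reason: "code-like content detected" }
      : { disposition: "allow" };
}

// Bundles several rules; the most severe result is reported (assumption).
function sketchGuardrail(rules: Rule[]): Rule {
  return (text) => {
    let worst: RuleResult = { disposition: "allow" };
    for (const rule of rules) {
      const result = rule(text);
      if (severity[result.disposition] > severity[worst.disposition]) {
        worst = result;
      }
    }
    return worst;
  };
}

const guard = sketchGuardrail([
  sketchRegexRule(/password|secret/i),
  sketchCodePresenceRule(),
]);
console.log(guard("def main(): pass").disposition); // "warn"
```

Modelling a rule as a plain function keeps each check independently testable, which is presumably the motivation for separating rule helpers from the preset guardrail builders.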
Programmatic evaluation
Every invocation stores the aggregated outcome in `result.state?.guardrailResult` and emits `guardrail` events through `onEvent` hooks:
```typescript
const result = await agent.invoke({ messages: [...] }, {
  onEvent(event) {
    if (event.type === "guardrail") {
      console.log(`[${event.phase}] ${event.disposition} -> ${event.reason}`);
    }
  },
});

console.log(result.state?.guardrailResult);
```
`GuardrailOutcome` lists all incidents (warnings and blocks) so you can audit or surface them in your product.
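As a rough illustration of auditing those incidents, the sketch below summarises an outcome object. The exact shape of `GuardrailOutcome` is not shown in this section, so `OutcomeSketch` and `IncidentSketch` (with their `guardrailId`, `phase`, `disposition`, and `reason` fields) are assumed shapes, and `summarizeIncidents` is a hypothetical helper; check the SDK's type definitions before relying on any field name.

```typescript
// Assumed shapes for illustration only; the real GuardrailOutcome may differ.
interface IncidentSketch {
  guardrailId: string;
  phase: string;
  disposition: "warn" | "block";
  reason: string;
}
interface OutcomeSketch {
  incidents: IncidentSketch[];
}

interface IncidentSummary {
  blocked: boolean;
  warnings: number;
  lines: string[];
}

// Accepts undefined because the outcome is read through optional chaining
// (result.state?.guardrailResult) and may be absent.
function summarizeIncidents(outcome: OutcomeSketch | undefined): IncidentSummary {
  const incidents = outcome?.incidents ?? [];
  return {
    blocked: incidents.some((i) => i.disposition === "block"),
    warnings: incidents.filter((i) => i.disposition === "warn").length,
    lines: incidents.map(
      (i) => `[${i.phase}] ${i.guardrailId}: ${i.disposition} -> ${i.reason}`,
    ),
  };
}

// Hand-written sample data, not real SDK output.
const sample: OutcomeSketch = {
  incidents: [
    { guardrailId: "password-filter", phase: "request", disposition: "block", reason: "secret detected" },
    { guardrailId: "code-ban", phase: "response", disposition: "warn", reason: "code-like content" },
  ],
};

const summary = summarizeIncidents(sample);
console.log(summary.blocked, summary.warnings); // true 1
```

A summary like this could feed an audit log or a user-facing notice without coupling the rest of the product to the outcome's full structure.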