Guardrails

Guardrails are pluggable validation hooks that run before and after agent execution, giving you control over what enters and exits each task. They let you enforce content policies, safety constraints, length limits, or any custom validation rule without modifying agent prompts or task logic.


// Block tasks whose description contains certain keywords
InputGuardrail noSensitiveDataGuardrail = input -> {
    if (input.taskDescription().contains("SSN") || input.taskDescription().contains("password")) {
        return GuardrailResult.failure("Task description contains sensitive data");
    }
    return GuardrailResult.success();
};

// Enforce a maximum response length
OutputGuardrail lengthGuardrail = output -> {
    if (output.rawResponse().length() > 5000) {
        return GuardrailResult.failure("Response exceeds maximum length of 5000 characters");
    }
    return GuardrailResult.success();
};

var task = Task.builder()
    .description("Write an executive summary")
    .expectedOutput("A concise summary")
    .agent(writer)
    .inputGuardrails(List.of(noSensitiveDataGuardrail))
    .outputGuardrails(List.of(lengthGuardrail))
    .build();

Input guardrails run before the LLM call is made. If any guardrail returns a failure, execution stops immediately and GuardrailViolationException is thrown — the agent’s LLM is never contacted.

Implement InputGuardrail as a functional interface:

@FunctionalInterface
public interface InputGuardrail {
    GuardrailResult validate(GuardrailInput input);
}

The GuardrailInput record carries everything needed to make a decision:

| Field | Type | Description |
|---|---|---|
| taskDescription() | String | The task description |
| expectedOutput() | String | The expected output specification |
| contextOutputs() | List<TaskOutput> | Outputs from prior context tasks (immutable) |
| agentRole() | String | The role of the agent about to execute |

InputGuardrail piiGuardrail = input -> {
    String desc = input.taskDescription().toLowerCase();
    if (desc.contains("ssn") || desc.contains("credit card") || desc.contains("passport")) {
        return GuardrailResult.failure(
            "Task description may contain personally identifiable information");
    }
    return GuardrailResult.success();
};

InputGuardrail roleGuardrail = input -> {
    if ("Untrusted Agent".equals(input.agentRole())) {
        return GuardrailResult.failure("Untrusted agents are not permitted on this task");
    }
    return GuardrailResult.success();
};
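
Keyword checks like the ones above catch the words "ssn" or "credit card", but not the data itself. The following is a minimal sketch of a pattern-based check that uses only the JDK. The class name PiiPatterns and both regular expressions are illustrative assumptions, not part of the guardrail API, and the patterns are far from exhaustive.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of the guardrail API): pattern-based PII checks.
final class PiiPatterns {
    // US Social Security number written as 123-45-6789.
    private static final Pattern SSN = Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");
    // 13 to 16 digits, optionally separated by single spaces or hyphens (card-like numbers).
    private static final Pattern CARD = Pattern.compile("\\b\\d(?:[ -]?\\d){12,15}\\b");

    private PiiPatterns() {}

    // Returns true if the text contains something shaped like an SSN or a card number.
    static boolean containsPii(String text) {
        if (text == null) {
            return false;
        }
        return SSN.matcher(text).find() || CARD.matcher(text).find();
    }
}
```

An input guardrail could then call PiiPatterns.containsPii(input.taskDescription()) and return GuardrailResult.failure(...) when it reports a match.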

Output guardrails run after the agent produces a response. When task.outputType is set, output guardrails run after structured output parsing completes — the parsed Java object is available via parsedOutput().

Implement OutputGuardrail as a functional interface:

@FunctionalInterface
public interface OutputGuardrail {
    GuardrailResult validate(GuardrailOutput output);
}

The GuardrailOutput record carries the response for inspection:

| Field | Type | Description |
|---|---|---|
| rawResponse() | String | The full text produced by the agent |
| parsedOutput() | Object | The parsed Java object (null if no outputType set) |
| taskDescription() | String | The task description |
| agentRole() | String | The role of the agent that produced the output |

OutputGuardrail lengthGuardrail = output -> {
    int maxChars = 3000;
    if (output.rawResponse().length() > maxChars) {
        return GuardrailResult.failure(
            "Response is " + output.rawResponse().length() +
            " chars, exceeds limit of " + maxChars);
    }
    return GuardrailResult.success();
};

OutputGuardrail conclusionGuardrail = output -> {
    if (!output.rawResponse().toLowerCase().contains("conclusion")) {
        return GuardrailResult.failure(
            "Response must include a conclusion section");
    }
    return GuardrailResult.success();
};

record ResearchReport(String title, List<String> findings, String conclusion) {}

OutputGuardrail findingsGuardrail = output -> {
    if (output.parsedOutput() instanceof ResearchReport report) {
        if (report.findings() == null || report.findings().isEmpty()) {
            return GuardrailResult.failure("Report must include at least one finding");
        }
    }
    return GuardrailResult.success();
};
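
One detail worth knowing about the length checks above: String.length() counts UTF-16 code units, so a character outside the Basic Multilingual Plane (most emoji, for example) counts as two. If a limit is meant to be closer to what a reader sees, counting code points is one option. The sketch below assumes a hypothetical helper class named ResponseLength; it still does not account for grapheme clusters such as combined emoji.

```java
// Hypothetical helper (not part of the guardrail API): code-point-based length checks.
final class ResponseLength {
    private ResponseLength() {}

    // Number of Unicode code points; a supplementary character counts once, not twice.
    static int codePoints(String text) {
        return text.codePointCount(0, text.length());
    }

    // True when the text is longer than the given limit, measured in code points.
    static boolean exceeds(String text, int maxCodePoints) {
        return codePoints(text) > maxCodePoints;
    }
}
```

Inside an output guardrail, ResponseLength.exceeds(output.rawResponse(), 3000) could replace the length() comparison.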

Guardrails communicate pass/fail via GuardrailResult:

// Pass
return GuardrailResult.success();
// Fail with a descriptive reason
return GuardrailResult.failure("Reason: response contains prohibited content");

The failure reason is included verbatim in the GuardrailViolationException message.


You can configure multiple guardrails per task. They are evaluated in order — the first failure stops evaluation and throws immediately. Subsequent guardrails in the list are not called.

var task = Task.builder()
    .description("Write an article")
    .expectedOutput("An article")
    .agent(writer)
    .inputGuardrails(List.of(piiGuardrail, roleGuardrail, domainGuardrail))
    .outputGuardrails(List.of(lengthGuardrail, conclusionGuardrail, toxicityGuardrail))
    .build();

To collect all failures rather than stop at the first, compose them into a single guardrail that aggregates results:

InputGuardrail compositeGuardrail = input -> {
    List<String> failures = new ArrayList<>();
    for (InputGuardrail g : List.of(piiGuardrail, roleGuardrail)) {
        GuardrailResult r = g.validate(input);
        if (!r.isSuccess()) {
            failures.add(r.getMessage());
        }
    }
    return failures.isEmpty()
        ? GuardrailResult.success()
        : GuardrailResult.failure(String.join("; ", failures));
};
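
The difference between the default fail-fast evaluation and the collect-all composition can be modeled without the library. In the sketch below, Check is a hypothetical stand-in for a guardrail (an empty Optional means pass, a value is the failure reason), and Evaluation is an illustrative class name. This only models the behavior described above; it is not the framework's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Library-independent model of the two evaluation strategies.
final class Evaluation {
    // Hypothetical stand-in for a guardrail: empty = pass, value = failure reason.
    @FunctionalInterface
    interface Check<T> {
        Optional<String> run(T input);
    }

    private Evaluation() {}

    // Fail-fast: return the first failure; later checks are never invoked.
    static <T> Optional<String> firstFailure(List<Check<T>> checks, T input) {
        for (Check<T> check : checks) {
            Optional<String> result = check.run(input);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    // Collect-all: run every check and join the failure reasons with "; ".
    static <T> Optional<String> allFailures(List<Check<T>> checks, T input) {
        List<String> failures = new ArrayList<>();
        for (Check<T> check : checks) {
            check.run(input).ifPresent(failures::add);
        }
        return failures.isEmpty()
            ? Optional.empty()
            : Optional.of(String.join("; ", failures));
    }
}
```

Fail-fast is cheaper when an early check is decisive. Collect-all gives the caller a complete list of problems in a single violation message, at the cost of running every check.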

When a guardrail fails, GuardrailViolationException is thrown. It propagates through the workflow executor and is wrapped in TaskExecutionException (the same pattern as other task failures).

try {
    ensemble.run();
} catch (TaskExecutionException ex) {
    if (ex.getCause() instanceof GuardrailViolationException gve) {
        System.out.println("Guardrail type: " + gve.getGuardrailType()); // INPUT or OUTPUT
        System.out.println("Violation: " + gve.getViolationMessage());
        System.out.println("Task: " + gve.getTaskDescription());
        System.out.println("Agent: " + gve.getAgentRole());
    }
}
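
The check above inspects only the direct cause. If your own code wraps exceptions again before they reach the handler, a small helper that walks the cause chain can be more robust. This is a sketch using only the JDK; the class name Causes and the depth limit of 32 are arbitrary assumptions.

```java
import java.util.Optional;

// Hypothetical helper: find the first exception of a given type in a cause chain.
final class Causes {
    private static final int MAX_DEPTH = 32; // guards against cyclic cause chains

    private Causes() {}

    static <T extends Throwable> Optional<T> find(Throwable error, Class<T> type) {
        Throwable current = error;
        for (int depth = 0; current != null && depth < MAX_DEPTH; depth++) {
            if (type.isInstance(current)) {
                return Optional.of(type.cast(current));
            }
            current = current.getCause();
        }
        return Optional.empty();
    }
}
```

With the exception types described above, the handler body could become Causes.find(ex, GuardrailViolationException.class).ifPresent(gve -> ...).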

GuardrailViolationException fields:

| Field | Type | Description |
|---|---|---|
| getGuardrailType() | GuardrailType | INPUT or OUTPUT |
| getViolationMessage() | String | The failure reason from GuardrailResult.failure(reason) |
| getTaskDescription() | String | The task that was blocked |
| getAgentRole() | String | The agent assigned to the task |

When a guardrail blocks a task, the TaskFailedEvent callback fires before the exception propagates. The cause field of TaskFailedEvent will be the GuardrailViolationException.

Ensemble.builder()
    .agent(writer)
    .task(guardedTask)
    .onTaskFailed(event -> {
        if (event.cause() instanceof GuardrailViolationException gve) {
            metrics.incrementCounter("guardrail.violation." + gve.getGuardrailType());
        }
    })
    .build()
    .run();

When a task uses outputType, the execution order is:

  1. Input guardrails run (before LLM)
  2. LLM executes and produces raw text
  3. Structured output parsing (JSON extraction + deserialization)
  4. Output guardrails run (with both rawResponse() and parsedOutput() available)

This means output guardrails can inspect the typed object directly:

// ResearchReport is the record defined in the output guardrail examples above
OutputGuardrail typedGuardrail = output -> {
    if (output.parsedOutput() instanceof ResearchReport r && r.title() == null) {
        return GuardrailResult.failure("Report title must not be null");
    }
    return GuardrailResult.success();
};

InputGuardrail and OutputGuardrail implementations must be thread-safe when used with Workflow.PARALLEL, because multiple tasks may run concurrently and invoke the same guardrail on separate threads. Stateless guardrails (lambdas with no shared mutable state) are inherently thread-safe.
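
When a guardrail does need state, such as a count of rejected responses, that state should live in a concurrency-safe type. The sketch below uses java.util.function.Predicate<String> as a stand-in for OutputGuardrail so that it compiles without the library. The class name CountingLengthCheck is hypothetical, and true means the response passes.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Stand-in for a stateful output guardrail: Predicate<String> replaces OutputGuardrail here.
final class CountingLengthCheck implements Predicate<String> {
    private final int maxChars;
    // AtomicInteger makes increments safe across threads; a plain int field could lose updates.
    private final AtomicInteger rejected = new AtomicInteger();

    CountingLengthCheck(int maxChars) {
        this.maxChars = maxChars;
    }

    // Returns true when the response is within the limit; counts every rejection.
    @Override
    public boolean test(String response) {
        boolean withinLimit = response.length() <= maxChars;
        if (!withinLimit) {
            rejected.incrementAndGet();
        }
        return withinLimit;
    }

    int rejectedCount() {
        return rejected.get();
    }
}
```

The same idea applies to any shared cache or collection a guardrail touches: prefer immutable data or the types in java.util.concurrent over unsynchronized fields.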