
Metrics and Observability

AgentEnsemble provides two layers of observability: execution metrics (token counts, timing, costs) available on every run result, and tool metrics (pluggable per-tool counters and timers via the ToolMetrics interface).


Every EnsembleOutput carries an ExecutionMetrics object and every TaskOutput carries a TaskMetrics object. These are populated automatically — no configuration required.

EnsembleOutput output = ensemble.run();

// Per-run totals
ExecutionMetrics metrics = output.getMetrics();
System.out.println("Total tokens: " + metrics.getTotalTokens());
System.out.println("Input tokens: " + metrics.getTotalInputTokens());
System.out.println("Output tokens: " + metrics.getTotalOutputTokens());
System.out.println("LLM latency: " + metrics.getTotalLlmLatency());
System.out.println("Tool exec time: " + metrics.getTotalToolExecutionTime());
System.out.println("LLM calls: " + metrics.getTotalLlmCallCount());

// Per-task breakdown
for (TaskOutput task : output.getTaskOutputs()) {
    TaskMetrics tm = task.getMetrics();
    System.out.printf("[%s] tokens=%d (in=%d out=%d) llm=%s tools=%s%n",
            task.getAgentRole(),
            tm.getTotalTokens(),
            tm.getInputTokens(),
            tm.getOutputTokens(),
            tm.getLlmLatency(),
            tm.getToolExecutionTime());
}

Token counts are sourced from ChatResponse.tokenUsage(). When the LLM provider does not return usage metadata, token fields are -1 (unknown) rather than 0. A value of 0 means zero tokens were used, not that the count is unavailable.

long inputTokens = task.getMetrics().getInputTokens();
if (inputTokens < 0) {
    System.out.println("Token usage not available for this provider");
} else {
    System.out.println("Input tokens: " + inputTokens);
}

When any task in the run has unknown token counts, the aggregate ExecutionMetrics.getTotalTokens() is also -1.
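If you roll up token counts yourself (for example, across multiple runs), the same sentinel convention has to be preserved. Below is a minimal sketch of such an aggregation; the `TokenMath` class and its `addTokenCounts` method are hypothetical helpers, not part of the API.

```java
public class TokenMath {
    // Hypothetical helper: sum two token counts where -1 means "unknown".
    // Any unknown operand makes the whole sum unknown, matching how the
    // aggregate ExecutionMetrics.getTotalTokens() is documented to behave.
    static long addTokenCounts(long a, long b) {
        if (a < 0 || b < 0) {
            return -1;
        }
        return a + b;
    }
}
```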

TaskMetrics tracks four distinct timings:

  • llmLatency: cumulative time waiting for LLM responses across all ReAct iterations
  • toolExecutionTime: cumulative time executing tools (excluding wait for LLM)
  • promptBuildTime: time building system + user prompts before the first LLM call
  • memoryRetrievalTime: time querying long-term and entity memory stores

All durations use java.time.Duration. Use .toMillis(), .toSeconds(), or .toString() to format them.
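For example, any of these durations can be rendered as fractional seconds with a one-liner; the `humanize` helper below is a hypothetical formatting utility written against the standard `java.time.Duration` API, not part of the framework.

```java
import java.time.Duration;

public class DurationFormat {
    // Hypothetical helper: render a Duration as fractional seconds, e.g. "12.345s".
    // Works for any of the TaskMetrics durations (llmLatency, toolExecutionTime, ...).
    static String humanize(Duration d) {
        return (d.toMillis() / 1000.0) + "s";
    }
}
```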


To enable cost estimation, provide per-token rates; the framework multiplies them by the actual token counts from each run.

Ensemble ensemble = Ensemble.builder()
    .agent(researcher)
    .task(researchTask)
    .costConfiguration(CostConfiguration.builder()
        .inputTokenRate(new BigDecimal("0.0000025"))   // $2.50 / 1M input tokens
        .outputTokenRate(new BigDecimal("0.0000100"))  // $10.00 / 1M output tokens
        .currency("USD")
        .build())
    .build();

EnsembleOutput output = ensemble.run();

// Per-run cost
CostEstimate total = output.getMetrics().getTotalCostEstimate();
if (total != null) {
    System.out.printf("Run cost: $%.6f (in=%.6f out=%.6f)%n",
            total.getTotalCost(),
            total.getInputCost(),
            total.getOutputCost());
}

// Per-task cost
for (TaskOutput task : output.getTaskOutputs()) {
    CostEstimate cost = task.getMetrics().getCostEstimate();
    if (cost != null) {
        System.out.printf("[%s] $%.6f%n", task.getAgentRole(), cost.getTotalCost());
    }
}

Cost estimation requires that the LLM provider returns token usage. When token counts are -1, getCostEstimate() returns null rather than an incorrect zero.
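The underlying arithmetic is a straight multiply-and-add over the configured rates. A sketch of the computation, assuming the documented rate model and the null-on-unknown behavior; `CostMath` is a hypothetical helper, not part of the API.

```java
import java.math.BigDecimal;

public class CostMath {
    // Hypothetical helper mirroring the documented cost model:
    // inputTokens * inputRate + outputTokens * outputRate.
    static BigDecimal estimate(long inputTokens, long outputTokens,
                               BigDecimal inputRate, BigDecimal outputRate) {
        if (inputTokens < 0 || outputTokens < 0) {
            return null; // unknown token counts: no estimate, as the docs describe
        }
        return inputRate.multiply(BigDecimal.valueOf(inputTokens))
                .add(outputRate.multiply(BigDecimal.valueOf(outputTokens)));
    }
}
```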


Every run produces a complete ExecutionTrace — a hierarchical record of every LLM interaction, every tool call with its input and output, all prompts sent, and delegation chains. This is the primary resource for post-mortem debugging and analysis.

EnsembleOutput output = ensemble.run();
ExecutionTrace trace = output.getTrace();

System.out.println("Run ID: " + trace.getEnsembleId());
System.out.println("Workflow: " + trace.getWorkflow());
System.out.println("Duration: " + trace.getTotalDuration());

// Inspect each task's LLM interactions
for (TaskTrace task : trace.getTaskTraces()) {
    System.out.printf("Task [%s]: %d LLM call(s)%n",
            task.getAgentRole(), task.getLlmInteractions().size());
    for (LlmInteraction interaction : task.getLlmInteractions()) {
        System.out.printf("  Iteration %d: %s, %dms, %d tool call(s)%n",
                interaction.getIterationIndex(),
                interaction.getResponseType(),
                interaction.getLatency().toMillis(),
                interaction.getToolCalls().size());
    }
}

The trace serializes to pretty-printed JSON with a single method call. All Instant fields are ISO-8601 strings and Duration fields are ISO-8601 duration strings (e.g. PT12.345S).

// Get as JSON string
String json = output.getTrace().toJson();
// Write to a file
output.getTrace().toJson(Path.of("run-trace.json"));

Register a traceExporter on the ensemble to automatically export after every run:

Ensemble ensemble = Ensemble.builder()
    .agent(researcher)
    .task(researchTask)
    // Write each run to traces/<ensembleId>.json
    .traceExporter(new JsonTraceExporter(Path.of("traces/")))
    .build();

JsonTraceExporter supports two modes:

  • Directory mode (default): each run writes {ensembleId}.json inside the directory
  • File mode: always overwrites the same file — useful for single-run pipelines

// Directory mode (each run = new file)
new JsonTraceExporter(Path.of("traces/"));

// File mode (always overwrites)
new JsonTraceExporter(Path.of("run-trace.json"), false);

Implement ExecutionTraceExporter to send traces to any destination:

Ensemble.builder()
    .traceExporter(trace -> {
        myObservabilityApi.ingest(trace.toJson());
    })
    .build();

The trace is organized as a hierarchy:

ExecutionTrace
  schemaVersion, ensembleId, workflow
  startedAt, completedAt, totalDuration
  inputs (template variables)
  agents[] (role, goal, toolNames, allowDelegation)
  taskTraces[]
    agentRole, taskDescription, duration
    prompts (systemPrompt, userPrompt)
    llmInteractions[]
      iterationIndex, latency, inputTokens, outputTokens
      responseType (TOOL_CALLS or FINAL_ANSWER)
      responseText (on FINAL_ANSWER)
      toolCalls[]
        toolName, arguments, result, duration, outcome
    delegations[] (for peer delegation)
    finalOutput, parsedOutput
    metrics (TaskMetrics)
  metrics (ExecutionMetrics)
    totalCostEstimate
  errors[]
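A common use of this hierarchy is finding hotspots, such as the slowest tool call in a run. The sketch below shows the reduction step over a flat list of calls gathered by walking the nested taskTraces, llmInteractions, and toolCalls; `Call` is a simplified stand-in for ToolCallTrace carrying only the two fields this sketch needs.

```java
import java.time.Duration;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class TraceMining {
    // Simplified stand-in for ToolCallTrace: only the fields this sketch uses.
    record Call(String toolName, Duration duration) {}

    // Return the slowest call, if any calls were made.
    static Optional<Call> slowest(List<Call> calls) {
        return calls.stream().max(Comparator.comparing(Call::duration));
    }
}
```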

The exact prompts sent to the LLM are captured on each TaskTrace:

for (TaskTrace task : trace.getTaskTraces()) {
    TaskPrompts prompts = task.getPrompts();
    System.out.println("=== System prompt ===");
    System.out.println(prompts.getSystemPrompt());
    System.out.println("=== User prompt ===");
    System.out.println(prompts.getUserPrompt());
}

Every tool invocation is recorded with its arguments, result, timing, and outcome:

for (TaskTrace task : trace.getTaskTraces()) {
    for (LlmInteraction iter : task.getLlmInteractions()) {
        for (ToolCallTrace tool : iter.getToolCalls()) {
            System.out.printf("[%s] %s(%s) -> %s [%dms, %s]%n",
                    task.getAgentRole(),
                    tool.getToolName(),
                    tool.getArguments(),
                    tool.getResult(),
                    tool.getDuration().toMillis(),
                    tool.getOutcome());
        }
    }
}

Tool call outcomes:

  • SUCCESS — tool returned a successful ToolResult
  • FAILURE — tool returned a failed ToolResult (error message begins with "Error: ")
  • ERROR — tool threw an uncaught exception
  • SKIPPED_MAX_ITERATIONS — tool was not executed because the iteration limit was reached
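When auditing a run, it can be useful to tally these outcomes across the whole trace. A minimal sketch of the aggregation step, using plain strings as stand-ins for the outcome values collected from each ToolCallTrace:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class OutcomeTally {
    // Count how many tool calls ended in each outcome (SUCCESS, FAILURE, ...).
    static Map<String, Long> tally(List<String> outcomes) {
        return outcomes.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }
}
```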

In addition to execution metrics, individual tool executions can be instrumented with the pluggable ToolMetrics interface. Every tool that extends AbstractAgentTool is automatically instrumented.

When a tool is executed, AbstractAgentTool.execute() automatically records:

  • Success counter — incremented when doExecute() returns a successful ToolResult
  • Failure counter — incremented when doExecute() returns a failed ToolResult
  • Error counter — incremented when doExecute() throws an uncaught exception
  • Duration timer — recorded on every execution regardless of outcome

All measurements are tagged with the tool name and the agent role that invoked the tool.

Tools can also record custom measurements using the metrics() accessor:

public class InventoryTool extends AbstractAgentTool {
    @Override
    protected ToolResult doExecute(String input) {
        metrics().incrementCounter("inventory.cache.hit", agentRole());
        // ... execute tool
        return ToolResult.success(result);
    }
}

Use the agentensemble-metrics-micrometer module to export tool metrics to any Micrometer-compatible registry (Prometheus, Datadog, CloudWatch, etc.):

MeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);

Ensemble ensemble = Ensemble.builder()
    .toolMetrics(new MicrometerToolMetrics(registry))
    .build();

Implement ToolMetrics directly for custom backends:

public class MyToolMetrics implements ToolMetrics {
    @Override
    public void incrementSuccess(String toolName, String agentRole) {
        // record success
    }

    @Override
    public void recordDuration(String toolName, String agentRole, Duration duration) {
        // record duration
    }

    // ... other methods
}

Ensemble.builder()
    .toolMetrics(new MyToolMetrics())
    .build();