Metrics and Observability
AgentEnsemble provides two layers of observability: execution metrics (token counts,
timing, costs) available on every run result, and tool metrics (pluggable per-tool
counters and timers via the ToolMetrics interface).
Execution Metrics
Every EnsembleOutput carries an ExecutionMetrics object and every TaskOutput carries
a TaskMetrics object. These are populated automatically — no configuration required.
```java
EnsembleOutput output = ensemble.run();

// Per-run totals
ExecutionMetrics metrics = output.getMetrics();
System.out.println("Total tokens: " + metrics.getTotalTokens());
System.out.println("Input tokens: " + metrics.getTotalInputTokens());
System.out.println("Output tokens: " + metrics.getTotalOutputTokens());
System.out.println("LLM latency: " + metrics.getTotalLlmLatency());
System.out.println("Tool exec time: " + metrics.getTotalToolExecutionTime());
System.out.println("LLM calls: " + metrics.getTotalLlmCallCount());

// Per-task breakdown
for (TaskOutput task : output.getTaskOutputs()) {
    TaskMetrics tm = task.getMetrics();
    System.out.printf("[%s] tokens=%d (in=%d out=%d) llm=%s tools=%s%n",
        task.getAgentRole(), tm.getTotalTokens(), tm.getInputTokens(),
        tm.getOutputTokens(), tm.getLlmLatency(), tm.getToolExecutionTime());
}
```
Token counts
Token counts are sourced from ChatResponse.tokenUsage(). When the LLM provider
does not return usage metadata, token fields are -1 (unknown) rather than 0.
A value of 0 means zero tokens were used, not that the count is unavailable.
```java
long inputTokens = task.getMetrics().getInputTokens();
if (inputTokens < 0) {
    System.out.println("Token usage not available for this provider");
} else {
    System.out.println("Input tokens: " + inputTokens);
}
```
When any task in the run has unknown token counts, the aggregate
ExecutionMetrics.getTotalTokens() is also -1.
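The same guard works at the run level. A minimal sketch, using only the getters shown above:

```java
ExecutionMetrics metrics = output.getMetrics();
if (metrics.getTotalTokens() < 0) {
    // At least one task reported unknown usage, so the aggregate is unknown too
    System.out.println("Run-level token totals unavailable");
} else {
    System.out.println("Run used " + metrics.getTotalTokens() + " tokens");
}
```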
Timing breakdown
TaskMetrics tracks four distinct timings:
| Field | Description |
|---|---|
| llmLatency | Cumulative time waiting for LLM responses across all ReAct iterations |
| toolExecutionTime | Cumulative time executing tools (excluding time spent waiting for the LLM) |
| promptBuildTime | Time building the system and user prompts before the first LLM call |
| memoryRetrievalTime | Time querying long-term and entity memory stores |
All durations use java.time.Duration. Use .toMillis(), .toSeconds(), or
.toString() to format them.
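For example, a compact way to log all four timings in milliseconds. This is a sketch: getPromptBuildTime() and getMemoryRetrievalTime() are getter names we assume from the field names above, while getLlmLatency() and getToolExecutionTime() appear in the earlier examples.

```java
TaskMetrics tm = task.getMetrics();
System.out.printf("llm=%dms tools=%dms prompt=%dms memory=%dms%n",
    tm.getLlmLatency().toMillis(),
    tm.getToolExecutionTime().toMillis(),
    tm.getPromptBuildTime().toMillis(),       // assumed getter for promptBuildTime
    tm.getMemoryRetrievalTime().toMillis());  // assumed getter for memoryRetrievalTime
```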
Cost Estimation
Provide per-token rates and the framework multiplies them by the actual token counts.
```java
Ensemble ensemble = Ensemble.builder()
    .agent(researcher)
    .task(researchTask)
    .costConfiguration(CostConfiguration.builder()
        .inputTokenRate(new BigDecimal("0.0000025"))   // $2.50 / 1M input tokens
        .outputTokenRate(new BigDecimal("0.0000100"))  // $10.00 / 1M output tokens
        .currency("USD")
        .build())
    .build();

EnsembleOutput output = ensemble.run();

// Per-run cost
CostEstimate total = output.getMetrics().getTotalCostEstimate();
if (total != null) {
    System.out.printf("Run cost: $%.6f (in=%.6f out=%.6f)%n",
        total.getTotalCost(), total.getInputCost(), total.getOutputCost());
}

// Per-task cost
for (TaskOutput task : output.getTaskOutputs()) {
    CostEstimate cost = task.getMetrics().getCostEstimate();
    if (cost != null) {
        System.out.printf("[%s] $%.6f%n", task.getAgentRole(), cost.getTotalCost());
    }
}
```
Cost estimation requires that the LLM provider returns token usage. When token counts
are -1, getCostEstimate() returns null rather than an incorrect zero.
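As a sanity check on the rates above: a run that used 10,000 input tokens and 2,000 output tokens would be estimated at 10,000 × 0.0000025 + 2,000 × 0.0000100 = $0.045.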
Execution Trace
Every run produces a complete ExecutionTrace — a hierarchical record of every LLM
interaction, every tool call with its input and output, all prompts sent, and delegation
chains. This is the primary resource for post-mortem debugging and analysis.
```java
EnsembleOutput output = ensemble.run();

ExecutionTrace trace = output.getTrace();
System.out.println("Run ID: " + trace.getEnsembleId());
System.out.println("Workflow: " + trace.getWorkflow());
System.out.println("Duration: " + trace.getTotalDuration());

// Inspect each task's LLM interactions
for (TaskTrace task : trace.getTaskTraces()) {
    System.out.printf("Task [%s]: %d LLM call(s)%n",
        task.getAgentRole(), task.getLlmInteractions().size());
    for (LlmInteraction interaction : task.getLlmInteractions()) {
        System.out.printf("  Iteration %d: %s, %dms, %d tool call(s)%n",
            interaction.getIterationIndex(), interaction.getResponseType(),
            interaction.getLatency().toMillis(), interaction.getToolCalls().size());
    }
}
```
Export to JSON
The trace serializes to pretty-printed JSON with a single method call. Instant
fields are serialized as ISO-8601 strings and Duration fields as ISO-8601 duration
strings (e.g. PT12.345S).
```java
// Get as JSON string
String json = output.getTrace().toJson();

// Write to a file
output.getTrace().toJson(Path.of("run-trace.json"));
```
Automatic export
Register a traceExporter on the ensemble to automatically export after every run:
```java
Ensemble ensemble = Ensemble.builder()
    .agent(researcher)
    .task(researchTask)
    // Write each run to traces/<ensembleId>.json
    .traceExporter(new JsonTraceExporter(Path.of("traces/")))
    .build();
```
JsonTraceExporter supports two modes:
- Directory mode (default): each run writes {ensembleId}.json inside the directory
- File mode: always overwrites the same file — useful for single-run pipelines
```java
// Directory mode (each run = new file)
new JsonTraceExporter(Path.of("traces/"))

// File mode (always overwrites)
new JsonTraceExporter(Path.of("run-trace.json"), false)
```
Implement ExecutionTraceExporter to send traces to any destination:
```java
Ensemble.builder()
    .traceExporter(trace -> {
        myObservabilityApi.ingest(trace.toJson());
    })
    .build();
```
Trace structure
The trace is organized as a hierarchy:
```text
ExecutionTrace
  schemaVersion, ensembleId, workflow
  startedAt, completedAt, totalDuration
  inputs (template variables)
  agents[] (role, goal, toolNames, allowDelegation)
  taskTraces[]
    agentRole, taskDescription, duration
    prompts (systemPrompt, userPrompt)
    llmInteractions[]
      iterationIndex, latency, inputTokens, outputTokens
      responseType (TOOL_CALLS or FINAL_ANSWER)
      responseText (on FINAL_ANSWER)
      toolCalls[]
        toolName, arguments, result, duration, outcome
    delegations[] (for peer delegation)
    finalOutput, parsedOutput
    metrics (TaskMetrics)
  metrics (ExecutionMetrics)
  totalCostEstimate
  errors[]
```
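The same field names appear in the exported JSON, so traces can be post-processed with any JSON tooling. A minimal sketch using Jackson (Jackson is our assumption here, not something the framework requires):

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.nio.file.Files;
import java.nio.file.Path;

// Read an exported trace and summarize it per task
JsonNode root = new ObjectMapper()
    .readTree(Files.readString(Path.of("run-trace.json")));
System.out.println("Ensemble: " + root.path("ensembleId").asText());
for (JsonNode task : root.path("taskTraces")) {
    System.out.printf("%s: %d LLM interaction(s)%n",
        task.path("agentRole").asText(),
        task.path("llmInteractions").size());
}
```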
Accessing prompt content
The exact prompts sent to the LLM are captured on each TaskTrace:
```java
for (TaskTrace task : trace.getTaskTraces()) {
    TaskPrompts prompts = task.getPrompts();
    System.out.println("=== System prompt ===");
    System.out.println(prompts.getSystemPrompt());
    System.out.println("=== User prompt ===");
    System.out.println(prompts.getUserPrompt());
}
```
Inspecting tool calls
Every tool invocation is recorded with its arguments, result, timing, and outcome:
```java
for (TaskTrace task : trace.getTaskTraces()) {
    for (LlmInteraction iter : task.getLlmInteractions()) {
        for (ToolCallTrace tool : iter.getToolCalls()) {
            System.out.printf("[%s] %s(%s) -> %s [%dms, %s]%n",
                task.getAgentRole(), tool.getToolName(), tool.getArguments(),
                tool.getResult(), tool.getDuration().toMillis(), tool.getOutcome());
        }
    }
}
```
Tool call outcomes:
- SUCCESS — tool returned a successful ToolResult
- FAILURE — tool returned a failed ToolResult (error message begins with "Error: ")
- ERROR — tool threw an uncaught exception
- SKIPPED_MAX_ITERATIONS — tool was not executed because the iteration limit was reached
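For post-mortem triage it is often useful to surface only the calls that went wrong. A minimal sketch, assuming getOutcome() prints as the names listed above:

```java
// Flag every tool call that did not succeed
for (TaskTrace task : trace.getTaskTraces()) {
    for (LlmInteraction iter : task.getLlmInteractions()) {
        for (ToolCallTrace tool : iter.getToolCalls()) {
            String outcome = String.valueOf(tool.getOutcome());
            if (!"SUCCESS".equals(outcome)) {
                System.out.printf("Problem call [%s]: %s -> %s (%s)%n",
                    task.getAgentRole(), tool.getToolName(),
                    tool.getResult(), outcome);
            }
        }
    }
}
```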
Tool Metrics
In addition to execution metrics, individual tool executions can be instrumented with the
pluggable ToolMetrics interface. Every tool that extends AbstractAgentTool is
automatically instrumented.
How tool metrics work
When a tool is executed, AbstractAgentTool.execute() automatically records:
- Success counter — incremented when doExecute() returns a successful ToolResult
- Failure counter — incremented when doExecute() returns a failed ToolResult
- Error counter — incremented when doExecute() throws an uncaught exception
- Duration timer — recorded on every execution regardless of outcome
All measurements are tagged with the tool name and the agent role that invoked the tool.
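Conceptually, the wrapper behaves like the sketch below. This is a simplification, not the framework's actual source: incrementFailure() and incrementError() are assumed counterparts of the incrementSuccess() method shown in the custom implementation section, and isSuccess() and getName() are assumed accessors.

```java
// Simplified sketch of the instrumentation in AbstractAgentTool.execute()
public ToolResult execute(String input) {
    long start = System.nanoTime();
    try {
        ToolResult result = doExecute(input);
        if (result.isSuccess()) {                               // assumed accessor
            metrics().incrementSuccess(getName(), agentRole());
        } else {
            metrics().incrementFailure(getName(), agentRole()); // assumed method
        }
        return result;
    } catch (RuntimeException e) {
        metrics().incrementError(getName(), agentRole());       // assumed method
        throw e;
    } finally {
        // Duration timer fires on every execution regardless of outcome
        metrics().recordDuration(getName(), agentRole(),
            Duration.ofNanos(System.nanoTime() - start));
    }
}
```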
Tools can also record custom measurements using the metrics() accessor:
```java
public class InventoryTool extends AbstractAgentTool {

    @Override
    protected ToolResult doExecute(String input) {
        metrics().incrementCounter("inventory.cache.hit", agentRole());
        // ... execute tool
        return ToolResult.success(result);
    }
}
```
Micrometer integration
Use the agentensemble-metrics-micrometer module to export tool metrics to any
Micrometer-compatible registry (Prometheus, Datadog, CloudWatch, etc.):
```java
MeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);

Ensemble ensemble = Ensemble.builder()
    .toolMetrics(new MicrometerToolMetrics(registry))
    .build();
```
Custom tool metrics implementation
Implement ToolMetrics directly for custom backends:
```java
public class MyToolMetrics implements ToolMetrics {

    @Override
    public void incrementSuccess(String toolName, String agentRole) {
        // record success
    }

    @Override
    public void recordDuration(String toolName, String agentRole, Duration duration) {
        // record duration
    }

    // ... other methods
}

Ensemble.builder()
    .toolMetrics(new MyToolMetrics())
    .build();
```