
13 - Future Roadmap

This document outlines features planned for Phase 2 and beyond. These are NOT part of the Phase 1 implementation but are designed into the architecture so they can be added without breaking changes.

Phase 2: Hierarchical Workflow (v0.2.0)

Goal: Enable a manager agent that automatically delegates tasks to worker agents based on their roles and goals.

  1. User defines agents and tasks as usual
  2. User sets workflow(Workflow.HIERARCHICAL)
  3. Optionally provides a managerLlm (defaults to the first agent’s LLM)
  4. At run() time:
    • A “Manager” agent is automatically created with a meta-prompt
    • The manager receives the full list of tasks and available worker agents
    • The manager decides which agent should handle each task
    • The manager can re-order tasks, combine outputs, and synthesize a final result
Design notes:

  • HierarchicalWorkflowExecutor implements WorkflowExecutor
  • Manager prompt includes agent roles/goals/backgrounds so it can make informed delegation decisions
  • Manager uses tool calls to delegate: delegate_task(agent_role, task_description)
  • Manager produces the final synthesized output
  • Error handling: if a delegated task fails, the manager is informed and can reassign or skip
```java
var ensemble = Ensemble.builder()
    .agents(List.of(researcher, writer, editor))
    .tasks(List.of(researchTask, writeTask, editTask))
    .workflow(Workflow.HIERARCHICAL)
    .managerLlm(gpt4Model) // new optional field
    .build();
```

Phase 3: Memory System (v0.3.0)

Goal: Enable agents to maintain context across tasks and across ensemble runs.

Short-term memory:

  • Conversation context within a single task execution
  • Already partially supported via the chat memory window in AgentExecutor
  • Enhancement: make the window size configurable per agent

Long-term memory:

  • Persists across ensemble runs
  • Backed by a vector store (via LangChain4j’s EmbeddingStore)
  • Before each task, relevant memories are retrieved and injected into the prompt
  • After each task, key information is extracted and stored

Entity memory:

  • Tracks information about specific entities mentioned across tasks
  • Uses a structured store (key-value or graph)
  • Agents can query entity memory for known facts about a person, company, concept, etc.
```java
var ensemble = Ensemble.builder()
    .agents(List.of(researcher, writer))
    .tasks(List.of(researchTask, writeTask))
    .workflow(Workflow.SEQUENTIAL)
    .memory(EnsembleMemory.builder()
        .shortTerm(true)
        .longTerm(embeddingStore)
        .entityMemory(entityStore)
        .build())
    .build();
```

Phase 4: Agent Delegation (v0.4.0)

Goal: Allow agents to delegate subtasks to other agents within the same ensemble during task execution.

Flow:

  • Agent A is executing a task and decides it needs help
  • Agent A calls a delegate tool: delegate(agent_role="Data Analyst", task="Analyze this dataset...")
  • The framework pauses Agent A, executes the delegated subtask with the target agent
  • The subtask output is returned to Agent A as the tool result
  • Agent A incorporates the result and continues

Constraints:

  • The agent must have allowDelegation = true
  • The target agent must be in the ensemble’s agent list
  • Delegation depth limit to prevent infinite recursion (configurable, default: 3)
  • Delegated subtasks are logged separately with a clear parent-child relationship
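The depth guard can be sketched with a self-contained example. DelegationContext and descend() mirror names used elsewhere in this document, but the code below is an illustration of the mechanism, not the framework's actual implementation:

```java
public class DelegationDepthDemo {
    // Immutable context carrying the current depth and the configured limit.
    record DelegationContext(int depth, int maxDepth) {
        DelegationContext descend() {
            if (depth + 1 > maxDepth) {
                throw new IllegalStateException(
                    "Delegation depth limit exceeded: " + maxDepth);
            }
            return new DelegationContext(depth + 1, maxDepth);
        }
    }

    public static void main(String[] args) {
        var ctx = new DelegationContext(0, 3); // default depth limit: 3
        ctx = ctx.descend(); // depth 1
        ctx = ctx.descend(); // depth 2
        ctx = ctx.descend(); // depth 3
        try {
            ctx.descend();   // depth 4 — rejected by the guard
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Each delegation descends into a fresh context, so sibling delegations at the same level do not consume each other's budget.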

Phase 5: Parallel Workflow (COMPLETE — v0.5.0)


Implemented: Workflow.PARALLEL, TaskDependencyGraph, ParallelWorkflowExecutor, ParallelErrorStrategy, ParallelExecutionException.

  • TaskDependencyGraph builds a DAG from each task’s context list using identity-based maps.
  • Tasks with no unmet dependencies start immediately on Java 21 virtual threads.
  • As each task completes, its dependents are evaluated: those whose dependencies are all satisfied are submitted, and those with a failed dependency are skipped.
  • Uses Executors.newVirtualThreadPerTaskExecutor() (stable Java 21 API, no preview flags).
  • MDC is propagated from the calling thread into each virtual thread.
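The scheduling idea — independent tasks start immediately on virtual threads, and a dependent task runs only once all of its dependencies complete — can be sketched with plain JDK 21 APIs (the task bodies here are placeholders):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;

public class ParallelSketch {
    // t1 and t2 have no dependencies and run concurrently on virtual threads;
    // t3 is only submitted once both complete, mirroring the DAG evaluation.
    static String runPipeline() {
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            CompletableFuture<String> t1 =
                CompletableFuture.supplyAsync(() -> "t1 done", executor);
            CompletableFuture<String> t2 =
                CompletableFuture.supplyAsync(() -> "t2 done", executor);
            // t3 depends on both t1 and t2
            return t1.thenCombineAsync(t2,
                (a, b) -> a + ", " + b + ", t3 done", executor).join();
        }
    }

    public static void main(String[] args) {
        System.out.println(runPipeline());
    }
}
```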
```java
var ensemble = Ensemble.builder()
    .agents(List.of(a1, a2, a3))
    .tasks(List.of(t1, t2, t3)) // t1 and t2 are independent, t3 depends on both
    .workflow(Workflow.PARALLEL)
    .parallelErrorStrategy(ParallelErrorStrategy.FAIL_FAST) // or CONTINUE_ON_ERROR
    .build();
// t1 and t2 run concurrently; t3 runs after both complete
```
Error strategies:

  • FAIL_FAST (default): cancel unstarted tasks on the first failure and throw TaskExecutionException.
  • CONTINUE_ON_ERROR: independent tasks run to completion; tasks with failed dependencies are skipped; a ParallelExecutionException carrying partial results is thrown at the end.

See docs/design/10-concurrency.md for the full concurrency design.


Phase 6: Structured Output (COMPLETE — v0.6.0)


Implemented: Task.outputType, Task.maxOutputRetries, TaskOutput.parsedOutput, TaskOutput.getParsedOutput(Class), JsonSchemaGenerator, StructuredOutputParser, ParseResult, OutputParsingException.

  • Task.outputType(Class<?>) specifies the target Java class (records, POJOs, common JDK types).
  • AgentPromptBuilder injects an ## Output Format section into the user prompt containing the JSON schema derived from the class, plus explicit JSON-only instructions.
  • AgentExecutor runs a retry loop after the main execution:
    1. StructuredOutputParser.extractJson(raw) — extracts JSON from the response, handling plain JSON, markdown fences, and prose-embedded JSON.
    2. StructuredOutputParser.parse(json, type) — deserializes via Jackson (FAIL_ON_UNKNOWN_PROPERTIES = false).
    3. On failure: sends a correction prompt to the LLM showing the error and schema; retries up to Task.maxOutputRetries times (default: 3).
    4. On exhaustion: throws OutputParsingException with raw output, parse errors, and attempt count.
  • Parsed output is stored in TaskOutput.parsedOutput; access via getParsedOutput(Class<T>).
```java
record ResearchReport(String title, List<String> findings, String conclusion) {}

var task = Task.builder()
    .description("Research AI trends")
    .expectedOutput("A structured research report")
    .agent(researcher)
    .outputType(ResearchReport.class) // JSON schema derived from this class is injected into the prompt
    .maxOutputRetries(3) // default; use 0 to disable retries
    .build();

// After execution:
ResearchReport report = taskOutput.getParsedOutput(ResearchReport.class);
```
Known limitations:

  • Generic collection types as a top-level output (e.g. List<MyRecord>) are not supported, because Class<?> cannot carry generic type information. Workaround: wrap the collection in a record — record Results(List<MyRecord> items) {}.
  • Native JSON mode (setting ResponseFormat.JSON on the ChatRequest) is not used on models that support it, since that would require detecting model capability at runtime. The prompt-based instruction works universally.
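To illustrate the extraction step: the real StructuredOutputParser is more robust, but a minimal sketch can simply take the outermost brace span, which incidentally covers plain JSON, markdown-fenced JSON, and prose-embedded JSON alike:

```java
public class JsonExtractSketch {
    // Naive stand-in for StructuredOutputParser.extractJson: grab everything
    // from the first '{' to the last '}'. Assumes a single top-level object.
    static String extractJson(String raw) {
        int start = raw.indexOf('{');
        int end = raw.lastIndexOf('}');
        if (start >= 0 && end > start) {
            return raw.substring(start, end + 1);
        }
        return raw; // nothing brace-delimited; let parsing fail and trigger a retry
    }

    public static void main(String[] args) {
        System.out.println(extractJson("Sure! Here is the report: {\"title\":\"AI\"} Hope this helps."));
    }
}
```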

Phase 7+: Delegation Policy Hooks and Lifecycle Events (COMPLETE — v1.0.x)


Implemented (issues #78 and #79):

  • DelegationPolicy (@FunctionalInterface): pluggable pre-delegation interceptor
  • DelegationPolicyResult (sealed interface): allow(), reject(reason), modify(request)
  • DelegationPolicyContext (record): caller role, depth, max depth, available worker roles
  • DelegationStartedEvent, DelegationCompletedEvent, DelegationFailedEvent (records)
  • Three new EnsembleListener default methods: onDelegationStarted, onDelegationCompleted, onDelegationFailed
  • Three new ExecutionContext fire methods: fireDelegationStarted, fireDelegationCompleted, fireDelegationFailed
  • DelegationContext extended with List<DelegationPolicy> policies() (propagated through descend())
  • Ensemble.builder() gains: .delegationPolicy(DelegationPolicy), .onDelegationStarted(Consumer), .onDelegationCompleted(Consumer), .onDelegationFailed(Consumer)
  • Policy evaluation wired into both AgentDelegationTool (peer delegation) and DelegateTaskTool (hierarchical delegation)

Policies run after built-in guards and before worker invocation:

  1. REJECT — short-circuit; worker not invoked; DelegationFailedEvent fired; FAILURE response returned
  2. MODIFY — replace the working request; continue evaluating the remaining policies
  3. ALLOW — continue to next policy; if all allow, worker executes normally

DelegationStartedEvent.delegationId() matches DelegationCompletedEvent.delegationId() or DelegationFailedEvent.delegationId() for the same delegation attempt. Guard and policy rejections fire only DelegationFailedEvent (no start event).
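The chain semantics above can be sketched with local stand-in types (the library's DelegationPolicy and sealed DelegationPolicyResult have richer signatures; this only illustrates the REJECT / MODIFY / ALLOW control flow):

```java
import java.util.List;

public class PolicyChainSketch {
    sealed interface Result permits Allow, Reject, Modify {}
    record Allow() implements Result {}
    record Reject(String reason) implements Result {}
    record Modify(String newRequest) implements Result {}

    interface Policy { Result evaluate(String request); }

    // REJECT short-circuits; MODIFY swaps the working request and continues;
    // ALLOW falls through to the next policy. If all allow, the worker runs.
    static String evaluate(String request, List<Policy> policies) {
        for (Policy p : policies) {
            switch (p.evaluate(request)) {
                case Reject r -> { return "REJECTED: " + r.reason(); }
                case Modify m -> request = m.newRequest();
                case Allow a -> { /* continue to next policy */ }
            }
        }
        return "EXECUTE: " + request;
    }

    public static void main(String[] args) {
        List<Policy> policies = List.of(
            req -> req.contains("secret") ? new Reject("contains secret data") : new Allow(),
            req -> new Modify(req + " [sanitized]"));
        System.out.println(evaluate("analyze dataset", policies));
        System.out.println(evaluate("leak secret keys", policies));
    }
}
```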


Phase 7: Callbacks and Observability (COMPLETE — v0.7.0)


Implemented: ExecutionContext, EnsembleListener, TaskStartEvent, TaskCompleteEvent, TaskFailedEvent, ToolCallEvent, ToolResolver, and Ensemble builder convenience methods.

  • ExecutionContext is an immutable value object bundling MemoryContext, verbose flag, and List<EnsembleListener>. It is created once per Ensemble.run() and threaded through the entire execution stack, replacing the previously separate verbose and MemoryContext parameters.
  • EnsembleListener is an interface with default no-op implementations of all four event methods, so implementors only override the events they care about.
  • Events are fired by workflow executors (SequentialWorkflowExecutor, ParallelWorkflowExecutor, HierarchicalWorkflowExecutor) and AgentExecutor. Exceptions from listeners are caught and logged without aborting execution or blocking subsequent listeners.
  • ToolResolver was extracted from AgentExecutor as a package-private helper class, reducing AgentExecutor’s complexity and making tool resolution independently testable.
  • AgentExecutor overloads were consolidated from 3 to 2: execute(task, contextOutputs, ExecutionContext) and execute(task, contextOutputs, ExecutionContext, DelegationContext).
  • DelegationContext was refactored to hold ExecutionContext instead of separate memoryContext + verbose fields.
```java
// Full interface implementation
Ensemble.builder()
    .agent(researcher)
    .task(researchTask)
    .listener(new MyMetricsListener())
    .build()
    .run();

// Lambda convenience methods
Ensemble.builder()
    .agent(researcher)
    .task(researchTask)
    .onTaskStart(event -> log.info("Starting: {}", event.agentRole()))
    .onTaskComplete(event -> metrics.record(event.duration()))
    .onTaskFailed(event -> alerts.notify(event.cause()))
    .onToolCall(event -> metrics.increment("tool." + event.toolName()))
    .build()
    .run();
```
| Event | When Fired | Key Fields |
| --- | --- | --- |
| TaskStartEvent | Before task execution begins | taskDescription, agentRole, taskIndex, totalTasks |
| TaskCompleteEvent | After successful task execution | taskOutput, duration, taskIndex, totalTasks |
| TaskFailedEvent | After task failure (before the exception propagates) | cause, duration, taskIndex, totalTasks |
| ToolCallEvent | After each tool execution in the ReAct loop | toolName, toolArguments, toolResult, agentRole, duration |

ExecutionContext is immutable. Fire methods may be called concurrently from parallel workflow virtual threads. Listener implementations must be thread-safe when registered with a Workflow.PARALLEL ensemble.


Phase 8: Guardrails (COMPLETE — v0.8.0)

Implemented: InputGuardrail, OutputGuardrail, GuardrailInput, GuardrailOutput, GuardrailResult, GuardrailViolationException, and integration in AgentExecutor and SequentialWorkflowExecutor.

  • InputGuardrail and OutputGuardrail are @FunctionalInterface types on Task.
  • AgentExecutor runs input guardrails before building prompts (before any LLM call). The first failure throws GuardrailViolationException(GuardrailType.INPUT, ...).
  • AgentExecutor runs output guardrails after the final response (and after structured output parsing when outputType is set). The first failure throws GuardrailViolationException(GuardrailType.OUTPUT, ...).
  • SequentialWorkflowExecutor catches GuardrailViolationException alongside the existing AgentExecutionException | MaxIterationsExceededException, fires TaskFailedEvent, and wraps in TaskExecutionException.
  • GuardrailResult.success() / .failure(String reason) are the result factory methods.
  • Guardrails are evaluated in order; the first failure stops evaluation.
```java
var task = Task.builder()
    .description("Summarize the article")
    .expectedOutput("A concise summary")
    .agent(writer)
    .inputGuardrails(List.of(noPersonalInfoGuardrail))
    .outputGuardrails(List.of(lengthLimitGuardrail, toxicityGuardrail))
    .build();
```
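To show what a guardrail like lengthLimitGuardrail might look like, here is a self-contained sketch with local stand-ins for GuardrailResult and OutputGuardrail (the real types live in the library; the 500-character limit is an arbitrary example):

```java
public class GuardrailSketch {
    // Local stand-in mirroring GuardrailResult.success() / .failure(reason).
    record GuardrailResult(boolean passed, String reason) {
        static GuardrailResult success() { return new GuardrailResult(true, null); }
        static GuardrailResult failure(String reason) { return new GuardrailResult(false, reason); }
    }

    @FunctionalInterface
    interface OutputGuardrail { GuardrailResult validate(String output); }

    public static void main(String[] args) {
        OutputGuardrail lengthLimit = out -> out.length() <= 500
            ? GuardrailResult.success()
            : GuardrailResult.failure("Output exceeds 500 characters");

        System.out.println(lengthLimit.validate("short summary").passed());
        System.out.println(lengthLimit.validate("x".repeat(600)).reason());
    }
}
```

Because the types are functional interfaces, guardrails can be supplied as lambdas directly in the task builder.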

Execution Metrics and Observability (COMPLETE — issue #42)


Implemented: TaskMetrics, ExecutionMetrics, CostConfiguration, CostEstimate, MemoryOperationCounts, ExecutionTrace, TaskTrace, LlmInteraction, ToolCallTrace, DelegationTrace, TaskPrompts, AgentSummary, ErrorTrace, ExecutionTraceExporter, JsonTraceExporter, TaskTraceAccumulator (internal).

Every task execution automatically captures metrics and a complete call trace. No configuration is required to get the base data.

Metrics (TaskMetrics / ExecutionMetrics):

  • Token counts (input, output, total) with -1 propagation for unknown values
  • LLM latency, tool execution time, memory retrieval time, prompt build time
  • LLM call count, tool call count, delegation count, memory operation counts
  • Optional cost estimation via CostConfiguration (per-token input/output rates)

Trace (ExecutionTrace / TaskTrace):

  • Hierarchical: ExecutionTrace contains TaskTrace list; each TaskTrace contains LlmInteraction list; each LlmInteraction contains ToolCallTrace list
  • Prompts: exact system and user prompts sent to the LLM
  • Each LlmInteraction records the iteration index, timing, token counts, response type, and all tool calls requested in that turn
  • Agent configuration snapshots captured in AgentSummary
  • Peer delegations captured as DelegationTrace with nested worker task trace

Export:

  • ExecutionTrace.toJson() / .toJson(Path) for direct serialization
  • ExecutionTraceExporter strategy interface for custom destinations
  • JsonTraceExporter for file-based JSON export (directory or fixed-file mode)
  • Ensemble.builder().traceExporter(exporter) to register

ExecutionTrace.schemaVersion identifies the trace format. Current version: 1.1 (added captureMode field in issue #89).
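Cost estimation is simple per-token arithmetic over the captured token counts. A sketch with made-up rates (the actual CostConfiguration field names may differ):

```java
public class CostSketch {
    // cost = input tokens × input rate + output tokens × output rate
    static double estimate(long inputTokens, long outputTokens,
                           double inputRatePerToken, double outputRatePerToken) {
        return inputTokens * inputRatePerToken + outputTokens * outputRatePerToken;
    }

    public static void main(String[] args) {
        // e.g. $2.50 per 1M input tokens and $10.00 per 1M output tokens
        double cost = estimate(12_000, 3_000, 2.50e-6, 10e-6);
        System.out.printf("$%.4f%n", cost);
    }
}
```

Note that unknown token counts propagate as -1 in the metrics, so a real estimator must skip tasks whose counts were not reported.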


CaptureMode — Transparent Debug Capture (COMPLETE — issue #89)


Implemented: CaptureMode (enum), CapturedMessage (value object), MemoryOperationListener (interface). Extended: LlmInteraction, ToolCallTrace, ExecutionTrace, ExecutionContext, TaskTraceAccumulator, MemoryContext, AgentExecutor, Ensemble.

CaptureMode is an opt-in toggle that layers deeper data collection on top of the base trace without requiring any changes to agents, tasks, or tools.

The effective mode is resolved at run() time:

  1. Explicit .captureMode(CaptureMode.STANDARD) on the builder
  2. -Dagentensemble.captureMode=STANDARD JVM system property
  3. AGENTENSEMBLE_CAPTURE_MODE=STANDARD environment variable
  4. Default: OFF
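The precedence can be sketched as follows (CaptureMode here is a local stand-in for the library enum; the property and environment variable names are taken from the list above):

```java
public class CaptureModeResolver {
    enum CaptureMode { OFF, STANDARD, FULL }

    static CaptureMode resolve(CaptureMode builderValue) {
        if (builderValue != null) return builderValue;          // 1. explicit builder setting
        String prop = System.getProperty("agentensemble.captureMode");
        if (prop != null) return CaptureMode.valueOf(prop);     // 2. JVM system property
        String env = System.getenv("AGENTENSEMBLE_CAPTURE_MODE");
        if (env != null) return CaptureMode.valueOf(env);       // 3. environment variable
        return CaptureMode.OFF;                                 // 4. default
    }

    public static void main(String[] args) {
        System.setProperty("agentensemble.captureMode", "STANDARD");
        System.out.println(resolve(null));               // property applies when builder is unset
        System.out.println(resolve(CaptureMode.FULL));   // explicit builder value wins
    }
}
```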

What each level adds:

  • STANDARD: full LLM message history per ReAct iteration (LlmInteraction.messages); memory operation counts (MemoryOperationCounts) wired from MemoryContext via MemoryOperationListener
  • FULL: everything in STANDARD; auto-export to ./traces/ when no traceExporter is configured; enriched tool I/O (ToolCallTrace.parsedInput as structured Map<String,Object>)

At CaptureMode.OFF, no listeners are registered, no message lists are built, and no JSON parsing is done. The only overhead is the same object allocation that the base trace (#42) already imposes.


Built-In Tool Library (COMPLETE — v1.0.0)


Implemented: agentensemble-tools module with 7 built-in AgentTool implementations, published as a separate artifact net.agentensemble:agentensemble-tools.

| Tool | Class | Description |
| --- | --- | --- |
| Calculator | CalculatorTool | Arithmetic expression evaluation via recursive-descent parser |
| Date/Time | DateTimeTool | Current time, timezone conversion, date arithmetic using java.time |
| File Read | FileReadTool | Sandboxed file reading; path traversal rejected |
| File Write | FileWriteTool | Sandboxed file writing; parent dirs auto-created |
| Web Search | WebSearchTool | HTTP web search via WebSearchProvider (Tavily, SerpAPI, or custom) |
| Web Scraper | WebScraperTool | HTTP GET + Jsoup HTML-to-text extraction with configurable length limit |
| JSON Parser | JsonParserTool | Dot-notation path extraction from JSON (supports array indexing) |

See Built-in Tools guide for usage.

Hierarchical Constraints (COMPLETE — v1.0.x)


Implemented (issue #81):

  • HierarchicalConstraints configuration on Ensemble.builder().hierarchicalConstraints(...) for constraining manager-to-worker delegation in hierarchical workflows
  • Constraint types: requiredWorkers (Set<String> — roles that must be called), allowedWorkers (Set<String> — allowlist; empty means all allowed), maxCallsPerWorker (Map<String,Integer> — per-worker cap), globalMaxDelegations (int — total cap; 0=unlimited), requiredStages (List<List<String>> — ordered stage groups)
  • Pre-delegation violations (disallowed worker, cap exceeded, stage ordering) are returned as error messages to the Manager LLM via the DelegationPolicy mechanism
  • Post-execution validation: if required workers were not called, ConstraintViolationException is thrown with violations list and partial task outputs
  • Constraint enforcer is prepended as first DelegationPolicy; user policies still apply after constraint checks
  • All constraint roles validated against registered agents at Ensemble.run() time (ValidationException if invalid)
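The pre-delegation checks (allowlist, per-worker cap, global cap) can be illustrated with a self-contained sketch; the parameter names mirror the constraint fields above, but this is not the library's actual enforcer:

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class ConstraintSketch {
    // Returns a violation message for the Manager LLM, or empty if allowed.
    static Optional<String> check(String worker,
                                  Set<String> allowedWorkers,
                                  Map<String, Integer> maxCallsPerWorker,
                                  Map<String, Integer> callCounts,
                                  int globalMaxDelegations,
                                  int totalCalls) {
        if (!allowedWorkers.isEmpty() && !allowedWorkers.contains(worker))
            return Optional.of("Worker '" + worker + "' is not in the allowlist");
        int cap = maxCallsPerWorker.getOrDefault(worker, Integer.MAX_VALUE);
        if (callCounts.getOrDefault(worker, 0) >= cap)
            return Optional.of("Per-worker cap reached for '" + worker + "'");
        if (globalMaxDelegations > 0 && totalCalls >= globalMaxDelegations)
            return Optional.of("Global delegation cap reached");
        return Optional.empty(); // allowed
    }

    public static void main(String[] args) {
        var allowed = Set.of("Researcher", "Writer");
        var caps = Map.of("Writer", 1);
        System.out.println(check("Editor", allowed, caps, Map.of(), 0, 0));
        System.out.println(check("Writer", allowed, caps, Map.of("Writer", 1), 0, 1));
    }
}
```

Returning a message rather than throwing matches the design above: violations are fed back to the Manager LLM so it can pick a different worker.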
Streaming (planned):

  • Stream agent responses token-by-token using LangChain4j’s StreamingChatLanguageModel
  • Useful for real-time UIs showing agent progress

Rate limiting (planned):

  • Per-agent or per-LLM rate limiting (requests per minute)
  • Useful when multiple agents share the same API key
  • Configurable via the agent builder or the ensemble builder

| Phase | Target | Key Features |
| --- | --- | --- |
| Phase 1 | v0.1.0 | Core framework: Agent, Task, Ensemble, sequential workflow, tools |
| Phase 2 | v0.2.0 | Hierarchical workflow |
| Phase 3 | v0.3.0 | Memory system (short-term, long-term, entity) |
| Phase 4 | v0.4.0 | Agent delegation |
| Phase 5 | v0.5.0 | Parallel workflow |
| Phase 6 | v0.6.0 | Structured output |
| Phase 7 | v0.7.0 | Callbacks and observability (COMPLETE) |
| Phase 8 | v0.8.0 | Guardrails: pre/post execution validation (COMPLETE) |
| Phase 9 | v1.0.0 | Streaming, built-in tools |
| Phase 10 | v2.0.0 | MapReduceEnsemble (static + adaptive + short-circuit) |
| Phase 11 | v2.1.0 | Live Execution Dashboard (COMPLETE) |

Phase 10: MapReduceEnsemble (v2.0.0)

Goal: Provide a first-class builder for the fan-out / tree-reduce pattern, with both a static (pre-built DAG) and an adaptive (token-budget-driven) execution strategy.

The core problem this solves: when N agents each produce substantial output, a single aggregation agent’s context can overflow the model’s context window. MapReduceEnsemble automates multi-level tree reduction to keep every reduce step within a configurable token budget.

Issue A: Static MapReduceEnsemble (chunkSize)

Delivers the MapReduceEnsemble builder with a fixed chunkSize strategy. The entire DAG is pre-built before execution. The inner Ensemble with Workflow.PARALLEL handles concurrent execution. Includes toEnsemble() for DAG inspection and devtools/viz updates.

Issue B: Adaptive MapReduceEnsemble (targetTokenBudget)

Extends Issue A with token-budget-aware adaptive execution. After each level completes, actual output token counts drive bin-packing to determine the next level’s shape. Includes contextWindowSize/budgetRatio convenience, token estimation strategies, maxReduceLevels safety valve, and trace/metrics aggregation across multiple Ensemble.run() calls.
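The bin-packing step in Issue B can be illustrated with a greedy first-fit sketch over measured output token counts (the framework's actual algorithm may differ; this shows only the idea of shaping the next level from a budget):

```java
import java.util.ArrayList;
import java.util.List;

public class TokenBinPacking {
    // Groups consecutive outputs so each group's token total stays within
    // the budget; each group becomes one reduce task at the next level.
    static List<List<Integer>> pack(List<Integer> tokenCounts, int budget) {
        List<List<Integer>> groups = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int used = 0;
        for (int tokens : tokenCounts) {
            if (!current.isEmpty() && used + tokens > budget) {
                groups.add(current);
                current = new ArrayList<>();
                used = 0;
            }
            current.add(tokens);
            used += tokens;
        }
        if (!current.isEmpty()) groups.add(current);
        return groups;
    }

    public static void main(String[] args) {
        // Five map outputs, 8k-token budget per reduce task
        System.out.println(pack(List.of(3000, 3000, 4000, 2000, 5000), 8000));
    }
}
```

An output larger than the budget still gets its own group, which is one reason the design includes a maxReduceLevels safety valve.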

Issue C: Short-Circuit Optimization

Extends Issue B with an optional pre-execution size check. When total input fits within the token budget, the map-reduce pipeline is bypassed in favour of a single direct task. Requires directAgent and directTask factory configuration.

See docs/design/14-map-reduce.md for the complete specification including algorithm pseudocode, API reference, visualization layer changes, edge case table, and implementation class structure.

Each phase should be backward-compatible with previous phases. The API is designed with future phases in mind — builder methods can be added without breaking existing code.


Phase 11: Live Execution Dashboard (COMPLETE — v2.1.0)


Goal: Stream ensemble execution events to a browser in real-time and use the browser as the interactive GUI for human-in-the-loop review approval.

The current visualization toolchain is post-hoc: agentensemble-devtools writes a .trace.json file after execution and agentensemble-viz renders it statically. Phase 11 turns the browser into a live execution dashboard where:

  1. Every EnsembleListener event is pushed to the browser as it fires, giving the developer a live timeline and flow graph that update as tasks start, complete, and fail.
  2. WebReviewHandler (currently a stub throwing UnsupportedOperationException) is fully implemented. Review gates in agentensemble-review block the JVM thread while the browser displays an approval UI. The developer clicks Approve, Edit, or Exit Early from the browser.

See docs/design/16-live-dashboard.md for the full specification.

agentensemble-web (net.agentensemble:agentensemble-web):

  • Embedded Javalin WebSocket server
  • WebDashboard public API: WebDashboard.builder().port(7329).build()
  • ConnectionManager for multi-client session tracking and late-join snapshot delivery
  • WebSocketStreamingListener (implements EnsembleListener) bridges all 7 callback events to WebSocket messages
  • WebReviewHandler (implements ReviewHandler) blocks on CompletableFuture until browser sends decision
```java
WebDashboard dashboard = WebDashboard.builder()
    .port(7329)
    .reviewTimeout(Duration.ofMinutes(5))
    .onTimeout(OnTimeoutAction.CONTINUE)
    .build();

EnsembleOutput output = Ensemble.builder()
    .chatLanguageModel(model)
    .task(Task.builder()
        .description("Research AI trends")
        .expectedOutput("Research report")
        .review(Review.required()) // pauses at this task for browser approval
        .build())
    .webDashboard(dashboard) // wires the streaming listener + web review handler
    .build()
    .run();
```

Opening http://localhost:7329 shows the live dashboard. The timeline and flow graph update as events stream in. When a review gate fires, the browser shows an approval panel with a countdown timer.
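The blocking pattern described for WebReviewHandler — the JVM thread parks on a CompletableFuture that the WebSocket handler completes when the browser sends its decision, with the timeout applying the configured fallback — can be sketched with local stand-in types:

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ReviewBlockSketch {
    enum Decision { APPROVE, EDIT, EXIT_EARLY }

    // Blocks until the browser-side future completes, or falls back to the
    // configured action on timeout (e.g. OnTimeoutAction.CONTINUE).
    static Decision awaitDecision(CompletableFuture<Decision> fromBrowser,
                                  Duration timeout, Decision onTimeout) {
        try {
            return fromBrowser.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return onTimeout;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        var future = new CompletableFuture<Decision>();
        future.complete(Decision.APPROVE); // simulate a browser click
        System.out.println(awaitDecision(future, Duration.ofSeconds(1), Decision.EXIT_EARLY));
    }
}
```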

| Issue | Title | Dependencies |
| --- | --- | --- |
| G1 | agentensemble-web module: WebSocket server + protocol | None |
| G2 | WebSocketStreamingListener: bridge callbacks to WebSocket | G1 |
| H1 | WebReviewHandler: real implementation (replaces stub) | G1 |
| H2 | Viz: review approval UI | H1, I1 |
| I1 | Viz: live mode + WebSocket client + incremental state | G2 |
| I2 | Viz: live timeline and flow view updates | I1 |