Skip to main content
Java

Structured Concurrency in Production: What Changes in Your Code, Not the Slides

Ravinder··8 min read
JavaStructured ConcurrencyVirtual ThreadsConcurrencyProject Loom
Share:
Structured Concurrency in Production: What Changes in Your Code, Not the Slides

The Claim and the Reality

The claim for structured concurrency is compelling: concurrent operations are scoped to a lexical block, they fail together, they cancel together, and you get error propagation that actually works. The slides are persuasive. The JEP is well-written.

The production reality is more nuanced. Structured concurrency (finalized in Java 21 after preview cycles in Java 19 and 20) genuinely improves correctness. But it requires you to rethink how you structure concurrent operations, how you handle partial failures, how you observe running tasks, and how you debug things when they go wrong. The mental model shift is real, and the code you write looks different from both traditional ExecutorService code and reactive code.

This article is about what that difference looks like in practice — the patterns that work, the patterns that look right but do not, and the observability gaps you need to plan for.


What Structured Concurrency Actually Is

Before the practical patterns, the model needs to be precise.

In unstructured concurrency (standard ExecutorService), task lifetimes are decoupled from the code that creates them. A task submitted to a thread pool outlives the method that submitted it. Errors propagate back only if you explicitly retrieve the Future. Cancellation requires manual coordination.

Structured concurrency inverts this: tasks are children of a scope, the scope is a try-with-resources block, and the scope closes only when all children have completed or been cancelled. This creates a tree of tasks with well-defined lifetimes.

flowchart TD subgraph scope["StructuredTaskScope (try block)"] direction TB P[Parent Thread] --> F1[fork: fetchUser] P --> F2[fork: fetchInventory] P --> F3[fork: fetchPricing] F1 --> J[join / joinUntil] F2 --> J F3 --> J end J --> R[Results available or\nexception propagated]

The key invariant: when the try block exits, all forked tasks have either completed or been interrupted. There is no way for a forked task to outlive its scope. This is the guarantee that makes structured concurrency useful for correctness.


The Two Built-In Scopes

Java 21 ships two concrete StructuredTaskScope implementations:

ShutdownOnFailure: if any task fails, cancel all others and surface the first failure.

try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
    Subtask<User> userTask = scope.fork(() -> userService.findById(userId));
    Subtask<Inventory> inventoryTask = scope.fork(() -> inventoryService.forUser(userId));
    Subtask<List<Price>> pricingTask = scope.fork(() -> pricingService.forUser(userId));
 
    scope.join()           // wait for all to complete or one to fail
         .throwIfFailed(); // propagate failure if any task failed
 
    // all three succeeded
    return buildResponse(userTask.get(), inventoryTask.get(), pricingTask.get());
}

ShutdownOnSuccess: if any task succeeds, cancel all others and surface the first success.

try (var scope = new StructuredTaskScope.ShutdownOnSuccess<SearchResult>()) {
    scope.fork(() -> searchProvider1.search(query));
    scope.fork(() -> searchProvider2.search(query));
    scope.fork(() -> searchProvider3.search(query));
 
    scope.join();
    return scope.result(); // returns whichever completed first
}

ShutdownOnSuccess is the hedge pattern — you fan out to multiple providers and accept the fastest response. This replaces complex CompletableFuture chains that attempted the same thing.


Error Semantics: Where People Get It Wrong

The most common mistake with ShutdownOnFailure is conflating "task threw an exception" with "the operation failed."

When any forked task throws, ShutdownOnFailure cancels the other tasks and marks the scope as failed. The throwIfFailed() call re-throws the exception. This is correct when all tasks are required. It is wrong when some tasks are optional.

// Wrong: treats optional enrichment failure as fatal
try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
    Subtask<Order> orderTask = scope.fork(() -> orderService.findById(orderId));
    Subtask<UserProfile> profileTask = scope.fork(() -> profileService.findById(userId)); // optional
 
    scope.join().throwIfFailed();
    return buildResponse(orderTask.get(), profileTask.get());
}

If profileService is down and profile enrichment is optional, this will fail the entire request when it should degrade gracefully.

// Correct: separate required and optional tasks
try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
    Subtask<Order> orderTask = scope.fork(() -> orderService.findById(orderId));
 
    scope.join().throwIfFailed();  // fail fast on required task
 
    // optional task in its own scope with fallback
    UserProfile profile = fetchProfileWithFallback(userId);
    return buildResponse(orderTask.get(), profile);
}
 
private UserProfile fetchProfileWithFallback(String userId) {
    try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
        Subtask<UserProfile> task = scope.fork(() -> profileService.findById(userId));
        scope.join();
        if (task.state() == Subtask.State.SUCCESS) {
            return task.get();
        }
        log.warn("Profile service unavailable for user {}", userId);
        return UserProfile.anonymous();
    }
}

The nested scope approach is idiomatic. Required operations and optional operations belong in separate scopes with separate error policies.


Cancellation and Timeouts

Deadlines are the most important feature of StructuredTaskScope that the introductory examples rarely show.

try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
    Subtask<User> userTask = scope.fork(() -> userService.findById(userId));
    Subtask<List<Order>> ordersTask = scope.fork(() -> orderService.findByUser(userId));
 
    // deadline: fail the scope if not complete in 500ms
    scope.joinUntil(Instant.now().plusMillis(500))
         .throwIfFailed();
 
    if (userTask.state() != Subtask.State.SUCCESS ||
        ordersTask.state() != Subtask.State.SUCCESS) {
        throw new ServiceTimeoutException("User/order lookup timed out");
    }
 
    return buildResponse(userTask.get(), ordersTask.get());
}

joinUntil(Instant) sets an absolute deadline. If the deadline passes before all tasks complete, the scope initiates shutdown — remaining tasks are interrupted. The interrupt propagates to blocking I/O calls via InterruptedException, which is why virtual threads and structured concurrency work well together. Virtual threads respond to interruption correctly; platform threads in blocking I/O sometimes do not.

Timeout handling pattern for HTTP clients:

// HttpClient in Java 21 works well with structured concurrency and interruption
private User fetchUserWithTimeout(String userId) throws Exception {
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(userServiceUrl + "/" + userId))
        .timeout(Duration.ofMillis(400))  // per-request timeout as backstop
        .build();
 
    HttpResponse<String> response = httpClient.send(
        request,
        HttpResponse.BodyHandlers.ofString()
    );
 
    return objectMapper.readValue(response.body(), User.class);
}

The per-request timeout on HttpClient is a backstop. The joinUntil deadline on the scope is the primary control. They work together: if the scope deadline fires first, the forked virtual thread is interrupted and the HTTP request is cancelled.


Custom Scopes for Real-World Policies

The built-in scopes cover the common cases. For more nuanced policies, extend StructuredTaskScope directly.

// collect all results, accumulate all errors, report both
public class CollectAllScope<T> extends StructuredTaskScope<T> {
 
    private final List<T> results = new CopyOnWriteArrayList<>();
    private final List<Throwable> errors = new CopyOnWriteArrayList<>();
 
    @Override
    protected void handleComplete(Subtask<? extends T> task) {
        if (task.state() == Subtask.State.SUCCESS) {
            results.add(task.get());
        } else if (task.state() == Subtask.State.FAILED) {
            errors.add(task.exception());
        }
    }
 
    public List<T> results() {
        ensureOwnerAndJoined();
        return Collections.unmodifiableList(results);
    }
 
    public List<Throwable> errors() {
        ensureOwnerAndJoined();
        return Collections.unmodifiableList(errors);
    }
}

Usage:

try (var scope = new CollectAllScope<NotificationResult>()) {
    for (NotificationChannel channel : channels) {
        scope.fork(() -> send(notification, channel));
    }
    scope.join();
 
    log.info("Sent to {} channels, {} failures",
        scope.results().size(), scope.errors().size());
 
    scope.errors().forEach(e ->
        log.warn("Notification delivery failed", e));
 
    return DeliveryReport.of(scope.results(), scope.errors());
}

This pattern is useful for any fan-out operation where partial success is acceptable and you want to record both outcomes.


Observability: The Gap to Plan For

Structured concurrency creates a task tree, but the standard observability tools — thread names, MDC, spans — do not automatically propagate through fork(). This is the observability gap that will surprise you in production.

Problem: MDC does not cross scope boundaries

// MDC set on parent thread
MDC.put("requestId", requestId);
 
try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
    scope.fork(() -> {
        // MDC.get("requestId") returns null here — different thread
        log.info("Fetching user"); // no requestId in log
        return userService.findById(userId);
    });
    scope.join().throwIfFailed();
}

Fix: capture context at fork time

String requestId = MDC.get("requestId");
Map<String, String> mdcContext = MDC.getCopyOfContextMap();
 
try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
    scope.fork(() -> {
        MDC.setContextMap(mdcContext);  // restore on forked thread
        try {
            log.info("Fetching user");  // requestId is present
            return userService.findById(userId);
        } finally {
            MDC.clear();
        }
    });
    scope.join().throwIfFailed();
}

A utility wrapper makes this less painful:

public static <T> Callable<T> withMdc(Callable<T> callable) {
    Map<String, String> ctx = MDC.getCopyOfContextMap();
    return () -> {
        Map<String, String> previous = MDC.getCopyOfContextMap();
        MDC.setContextMap(ctx);
        try {
            return callable.call();
        } finally {
            if (previous != null) MDC.setContextMap(previous);
            else MDC.clear();
        }
    };
}
 
// usage
scope.fork(withMdc(() -> userService.findById(userId)));

OpenTelemetry span propagation requires the same pattern — capture the current context at fork time and set it on the child thread.


Debugging Structured Concurrent Programs

The debugging story for structured concurrency is better than for reactive but requires knowing where to look.

Thread dumps are readable again. Virtual threads appear in jstack output and in JFR thread events. A forked virtual thread shows a stack trace rooted at the forked callable. The parent thread shows it is blocked in scope.join().

"virtual-21" virtual #21 daemon
  com.acme.service.UserService.findById(UserService.java:42)
  com.acme.api.UserController.lambda$handleRequest$0(UserController.java:67)
  java.util.concurrent.StructuredTaskScope$SubtaskImpl.run(StructuredTaskScope.java:...)

This is a dramatic improvement over reactive stack traces. The business logic is visible.

JFR jdk.VirtualThreadPinned event fires when a virtual thread is pinned to a carrier thread — typically because it called synchronized while blocking. Pinning reduces the effectiveness of virtual threads. JFR makes this visible.

jcmd <pid> JFR.start duration=30s filename=/tmp/pinning.jfr settings=profile
# look for jdk.VirtualThreadPinned events

Key Takeaways

  • Structured concurrency's core guarantee — forked tasks cannot outlive their scope — is the correctness improvement that matters; everything else follows from it.
  • Separate required tasks from optional tasks into different scopes with different error policies; using ShutdownOnFailure for optional operations is the most common mistake.
  • joinUntil(Instant) is the deadline mechanism; combine it with per-request timeouts on HTTP clients as a backstop, not a replacement.
  • MDC and OpenTelemetry context do not propagate through fork() automatically — capture context at fork time and restore it on the child thread.
  • Virtual threads respond correctly to interruption from scope cancellation; platform threads in blocking I/O may not — prefer virtual threads when using structured concurrency.
  • Thread dumps are readable with structured concurrency on virtual threads — invest in knowing how to read them rather than adding excessive logging as a substitute.