Mapping Resonate's durable async/await onto Rust
Resonate is a durable execution engine. We covered what durable execution is in an earlier post, so here it is enough to say that you write your business logic as a normal function and the engine takes care of recovering and resuming it after a crash. The Rust SDK exposes this as ordinary async Rust.
use resonate_sdk::prelude::*;

#[resonate_sdk::function]
async fn greet(ctx: &Context, name: String) -> Result<String> {
    // Durable step: runs once; on replay the recorded result comes back.
    let upper = ctx.run(uppercase, name.clone()).await?;
    Ok(format!("Hello, {}!", upper))
}
The interesting part is everything that has to happen behind ctx.run(...).await? to make that durable. This post walks through how we map Resonate’s durable model onto Rust, and why the language fits the mapping well.
The model: durable execution is replay
“Durable execution” does not mean we pause a function and write its stack to disk. It means replay.
When a durable function runs, every step that matters (calling a sub-function, sleeping, making an RPC) is recorded as a durable promise on the server, keyed by a deterministic ID. If the process crashes, the engine re-runs the function from the top. On that second run, each recorded step is already settled, so instead of doing the work again we return the stored result immediately. Execution fast-forwards through everything that already completed and continues from where it stopped.
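To make the fast-forward concrete, here is a minimal single-process sketch. PromiseStore, Attempt, and step are hypothetical names invented for illustration, not the Resonate SDK's API; an in-memory HashMap stands in for the server, and every result is a String.

```rust
use std::collections::HashMap;

/// Stand-in for the server: promise ID -> settled result (hypothetical).
#[derive(Default)]
pub struct PromiseStore {
    settled: HashMap<String, String>,
}

/// One attempt at running a durable function against the store.
pub struct Attempt<'a> {
    root: &'a str,
    seq: u32,
    store: &'a mut PromiseStore,
    /// Steps that actually did their work during this attempt.
    pub executed: u32,
}

impl<'a> Attempt<'a> {
    pub fn new(root: &'a str, store: &'a mut PromiseStore) -> Self {
        Attempt { root, seq: 0, store, executed: 0 }
    }

    /// Run `work` unless this step's promise is already settled.
    pub fn step(&mut self, work: impl FnOnce() -> String) -> String {
        self.seq += 1;
        let id = format!("{}.{}", self.root, self.seq);
        if let Some(done) = self.store.settled.get(&id) {
            return done.clone(); // replay: hand back the recorded result
        }
        self.executed += 1;
        let out = work();
        self.store.settled.insert(id, out.clone());
        out
    }
}

/// The "durable" function body: every attempt re-runs it from the top.
pub fn greet(at: &mut Attempt<'_>, name: &str) -> String {
    let upper = at.step(|| name.to_uppercase());
    at.step(|| format!("Hello, {}!", upper))
}
```

If an attempt crashes after its first step, the next attempt reads that step's result from the store and only the second step does real work; a further attempt executes nothing at all.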
Three properties make replay correct:
Deterministic IDs. The Nth durable call inside a function always gets the same ID, derived without randomness or the wall clock, so a replay maps each call back to the promise it created last time.
Durable promises on the server. The server records every durable step and its outcome, so the function body carries no durable state of its own. It is replayed mechanically and the stored promises feed it the results, which means a step that already completed is never run a second time. This also lets the SDK cache heavily to reduce network round trips.
Suspending at the right moment. A workflow can spawn local sub-tasks, which run inside this process, and remote ones, which are dispatched to other workers. We only suspend once all of the local work has finished and settled its promises, and the only thing left to wait on is remote. Suspending while a local task is still in flight would be a bug: a replay would re-run that work or lose its result, and the durable graph would no longer match what actually happened.
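The first and third properties can be sketched in a few lines. The parent.N ID format, IdGen, Pending, and may_suspend are assumptions made for illustration; the post does not spell out Resonate's actual ID scheme or suspension bookkeeping.

```rust
/// Hypothetical ID scheme: the Nth durable call under `parent` is `parent.N`.
/// Only a call counter feeds it, never randomness or the wall clock.
pub struct IdGen {
    parent: String,
    next: u32,
}

impl IdGen {
    pub fn new(parent: &str) -> Self {
        IdGen { parent: parent.to_string(), next: 0 }
    }

    pub fn next_id(&mut self) -> String {
        self.next += 1;
        format!("{}.{}", self.parent, self.next)
    }
}

/// What a workflow is still waiting on at a given moment.
pub struct Pending {
    /// Sub-tasks running in this process that have not settled yet.
    pub local_in_flight: usize,
    /// Promises owned by other workers that have not settled yet.
    pub remote_unsettled: usize,
}

/// Suspend only when all local work is done and something remote remains.
pub fn may_suspend(p: &Pending) -> bool {
    p.local_in_flight == 0 && p.remote_unsettled > 0
}
```

Two runs that make the same calls in the same order produce the same IDs, which is exactly what lets a replay find last run's promises.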
The cost of doing this in TypeScript
To make replay work, the runtime has to sit between the user’s function and each durable step, pausing the function so it can run that step, dispatch it, or hand back a value it already computed. How hard that is depends on the language.
In JavaScript it is hard. A native async function runs itself: once it is called, it schedules its own continuations on the microtask queue and the JavaScript engine drives it to completion. There is no seam between two await points where the runtime can step in to substitute a memoized value or skip work that already happened. So our TypeScript SDK cannot use async/await for the user-facing model. It uses generators instead:
function* greet(ctx: Context, name: string): Generator<Yieldable, string, any> {
  const upper = yield* ctx.run(uppercase, name);
  return `Hello, ${upper}!`;
}
A generator does not run itself. It advances only when a driver calls .next(value), stopping at every yield to hand back a command object that describes the intent. Our runtime performs the durable operation, resumes the generator with the result, and on replay feeds the stored value straight back with no network call. It works, but the model leaks into the syntax: every workflow is a function*, every durable call is a yield*, and the user must spell out a Generator<...> return type.
Rust’s lazy futures let us implement the full durable execution model while keeping the surface as close to ordinary async Rust as possible.
A macro turns an async fn into a durable trait impl
The entry point is #[resonate_sdk::function]. It does not rewrite the function’s control flow; it inspects the type of the first parameter and generates a trait implementation around the body.
A &Context first parameter marks a workflow, a function allowed to orchestrate sub-tasks. Anything else marks a leaf. Usually, a leaf wraps a side effect such as a database write or an HTTP call, so the engine checkpoints its result and runs it once. On replay the recorded value comes back instead of the effect firing again, which is what keeps replay deterministic. The role is inferred from the parameter types, with nothing extra to declare.
The generated code implements the Durable trait:
pub trait Durable<Args, T>: Send + Sync + 'static {
    const NAME: &'static str;
    const KIND: DurableKind;

    fn execute(&self, env: ExecutionEnv<'_>, args: Args)
        -> impl Future<Output = Result<T>> + Send;
}
For a leaf, the ExecutionEnv is a read-only Info; a workflow receives a full Context. A leaf cannot reach for Context orchestration methods because it never gets a Context, so the type system itself encodes which functions may suspend and which might perform non-deterministic effects.
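The role split can be approximated without the macro. This sketch simplifies the single Durable trait and its ExecutionEnv into two hypothetical traits, Leaf and Workflow, so the difference shows up in plain signatures; Info, Context, and run are minimal stand-ins, not the SDK's types.

```rust
/// Read-only metadata a leaf may inspect (hypothetical stand-in).
pub struct Info {
    pub id: String,
}

/// Orchestration handle that only workflows receive (hypothetical stand-in).
pub struct Context {
    id: String,
    calls: u32,
}

/// A leaf's body sees only `&Info`, which has no orchestration methods.
pub trait Leaf<Args, T> {
    fn call(&self, info: &Info, args: Args) -> T;
}

/// A workflow's body receives the full `Context`.
pub trait Workflow<Args, T> {
    fn call(&self, ctx: &mut Context, args: Args) -> T;
}

impl Context {
    pub fn new(id: &str) -> Self {
        Context { id: id.to_string(), calls: 0 }
    }

    /// Orchestration lives only here: give the child the next ID and
    /// pass it nothing but an `Info`.
    pub fn run<A, T, L: Leaf<A, T>>(&mut self, leaf: &L, args: A) -> T {
        self.calls += 1;
        let info = Info { id: format!("{}.{}", self.id, self.calls) };
        leaf.call(&info, args)
    }
}

// Roughly what a macro could generate for a leaf and a workflow (illustrative).
pub struct Uppercase;

impl Leaf<String, String> for Uppercase {
    fn call(&self, _info: &Info, name: String) -> String {
        name.to_uppercase()
    }
}

pub struct Greet;

impl Workflow<String, String> for Greet {
    fn call(&self, ctx: &mut Context, name: String) -> String {
        let upper = ctx.run(&Uppercase, name);
        format!("Hello, {}!", upper)
    }
}
```

Inside Uppercase::call there is no Context in scope, so there is nothing to call run on. In a real crate, Context's constructor would also be kept away from user code.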
Awaiting is the interception point
Rust gives us that same interception, the thing TypeScript needs generators and a custom runtime for, through .await itself. This is where the language fits our model.
A Rust Future is lazy: it does nothing until something polls it, and in async code that polling starts at an .await we can point to in the source. The SDK is built on tokio, and tokio drives our futures the same way it drives any other future in the program. What we lean on is the laziness: it gives us a deterministic point where we know a durable step is being awaited. Building on tokio instead of replacing it is a real strength of the SDK: it composes with the rest of the ecosystem, so tokio::spawn, tokio::select!, tracing, and the rest all work as expected.
Another feature that fits our problem nicely is that .await works on anything implementing IntoFuture, not only on futures. So ctx.run(...) does not return a value or a plain future; it returns a builder, a RunTask, with a custom IntoFuture implementation (the same goes for ctx.rpc(...), ctx.sleep(...), and ctx.promise(...), each of which has its own). Awaiting it drives a future whose behavior we wrote.
Inside that into_future, the logic is the replay model in miniature:
let record = consume_promise_record(cell, req, &ctx.effects).await?;
if let Some(result) = record.as_result::<T>() {
    return result; // already settled: hand back the memoized value
}
// still pending: register the dependency and suspend
ctx.spawned_remote.lock().await.push(child_id);
Err(Error::Suspended)
If the promise is already resolved, we return the stored value and the user’s code never knows it was a replay; if it is pending, we suspend. Either way, the user just sees .await?.
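Stripped of the async plumbing, that branch is an ordinary match that returns either a value or an error variant. Record, Error, await_step, and the in-memory store below are hypothetical stand-ins; the point is that user code only ever writes ?, whether the step replayed or suspended.

```rust
use std::collections::HashMap;

/// Hypothetical error enum: suspension is just another variant.
#[derive(Debug, PartialEq)]
pub enum Error {
    Suspended,
    Failed(String),
}

pub type Result<T> = std::result::Result<T, Error>;

/// A promise as last seen on the server (stand-in).
pub enum Record {
    Resolved(String),
    Rejected(String),
    Pending,
}

/// Settled: return the stored outcome. Pending: note the remote dependency and suspend.
pub fn await_step(
    store: &HashMap<&str, Record>,
    id: &str,
    remote: &mut Vec<String>,
) -> Result<String> {
    match store.get(id) {
        Some(Record::Resolved(v)) => Ok(v.clone()),
        Some(Record::Rejected(e)) => Err(Error::Failed(e.clone())),
        // In this sketch a missing record is treated like a pending one.
        Some(Record::Pending) | None => {
            remote.push(id.to_string());
            Err(Error::Suspended)
        }
    }
}

/// User code: `?` carries Suspended out exactly like any other error.
pub fn greet(store: &HashMap<&str, Record>, remote: &mut Vec<String>) -> Result<String> {
    let upper = await_step(store, "greet.1", remote)?;
    Ok(format!("Hello, {}!", upper))
}
```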
The builder shape also buys ergonomics: because ctx.run(f, args) does no work until awaited, the call site can chain .spawn() or .timeout() to override execution options before the .await.
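The builder pattern can be sketched with the standard library alone. RunTask here is a hypothetical stand-in, not the SDK's type: it stores a closure, its argument, and a timeout option, and a hand-rolled block_on replaces tokio so the example runs on its own. The log records the moment work actually happens.

```rust
use std::cell::RefCell;
use std::future::{Future, IntoFuture};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context as TaskContext, Poll, Wake, Waker};

/// Hypothetical builder in the spirit of `RunTask`: constructing it does no work.
pub struct RunTask<'a, F, A> {
    f: F,
    args: A,
    timeout_ms: u64,
    log: &'a RefCell<Vec<String>>,
}

impl<'a, F, A> RunTask<'a, F, A> {
    pub fn new(f: F, args: A, log: &'a RefCell<Vec<String>>) -> Self {
        RunTask { f, args, timeout_ms: 60_000, log }
    }

    /// Options chain onto the builder before anything runs.
    pub fn timeout(mut self, ms: u64) -> Self {
        self.timeout_ms = ms;
        self
    }
}

/// `.await` calls `into_future`, so awaiting drives a future we wrote.
impl<'a, F, A, T> IntoFuture for RunTask<'a, F, A>
where
    F: FnOnce(A) -> T + 'a,
    A: 'a,
    T: 'a,
{
    type Output = T;
    type IntoFuture = Pin<Box<dyn Future<Output = T> + 'a>>;

    fn into_future(self) -> Self::IntoFuture {
        Box::pin(async move {
            // A real SDK would consult the promise store here first.
            self.log
                .borrow_mut()
                .push(format!("run (timeout {} ms)", self.timeout_ms));
            (self.f)(self.args)
        })
    }
}

/// No-op waker: the futures in this sketch never return `Pending`.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Minimal stand-in for an executor such as tokio, enough for this sketch.
pub fn block_on<Fut: Future>(fut: Fut) -> Fut::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = TaskContext::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}
```

Building the task and calling .timeout(500) leave the log empty; only the .await inside block_on runs the closure, and the chained option is visible at that point.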
What we had to hand-build for the TypeScript SDK, with generators and an execution loop, Rust gives us through the Future and IntoFuture traits, so the user writes plain async code.
Suspension is control flow
When a durable step has to wait on another worker, the function must suspend. In Rust we express that by returning Err(Error::Suspended), an ordinary variant of our error enum that the ? operator propagates straight up and out of the user’s function. The error channel doubles as control flow.
This composes nicely with ?, but it leaks. Err(Suspended) flows through the same Result the user sees, so the user can intercept it. Call .unwrap() on a durable step instead of using ? and it unwraps a Suspended error, which panics the task. We try to mitigate this edge case by wrapping every user function in a panic guard. In Python and Go we implement the same feature with try/except and panic/recover respectively, which is less error-prone while staying close to each language's semantics.
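The leak and the guard can be reproduced with the standard library alone. The Error enum and the guarded function are hypothetical, std::panic::catch_unwind stands in for whatever guard the SDK actually uses, and since the post does not say what happens after a panic is caught, this sketch simply reports a failure.

```rust
use std::panic;

#[derive(Debug, PartialEq)]
pub enum Error {
    Suspended,
    Failed(String),
}

/// Stand-in for a durable step whose promise is still pending remotely.
pub fn pending_step() -> Result<String, Error> {
    Err(Error::Suspended)
}

/// Correct use: `?` hands Err(Suspended) back to the runtime.
pub fn with_question_mark() -> Result<String, Error> {
    let v = pending_step()?;
    Ok(v)
}

/// The leak: `.unwrap()` turns the suspension into a panic.
pub fn with_unwrap() -> Result<String, Error> {
    let v = pending_step().unwrap();
    Ok(v)
}

/// Hypothetical guard around every user function: a panic becomes a
/// failure instead of tearing down the worker.
pub fn guarded(f: fn() -> Result<String, Error>) -> Result<String, Error> {
    match panic::catch_unwind(f) {
        Ok(result) => result,
        Err(_) => Err(Error::Failed("user function panicked".to_string())),
    }
}
```

With ?, the suspension reaches the guard intact; with .unwrap(), all the guard sees is a panic, so the original intent to suspend is lost.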
Standing on the shoulders of Tokio
For concurrent execution, RunTask has .spawn(), which eagerly creates the durable promise and runs the function on a fresh tokio::spawn task:
// .spawn() eagerly creates each child's promise and starts it on its own tokio task.
let h1 = ctx.run(work, a).spawn()?;
let h2 = ctx.run(work, b).spawn()?;
// Collect the results later; each await yields that child's output.
let r1 = h1.await?;
let r2 = h2.await?;
Each .spawn() returns a DurableFuture backed by a oneshot channel, so the parent continues immediately and collects results later. Because these are real tokio tasks rather than entries in a single event loop, on a multi-threaded runtime they can execute in parallel across worker threads, not just interleave on one thread.
Structured concurrency falls out of the suspension model, and this is where suspend timing matters for correctness. When a workflow returns or is about to suspend, the Context first waits for every in-flight spawned task to finish, which guarantees we only ever suspend when there is no more local work to do. If a spawned child is itself suspended on a remote dependency, that dependency bubbles up into the parent’s remote dependencies and the parent suspends too. Nothing local is abandoned mid-flight, so the next replay sees a consistent state that matches reality.
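The flush-before-suspend rule can be sketched with std::thread standing in for tokio::spawn, and JoinHandle::join standing in for awaiting a DurableFuture. ChildOutcome and flush are hypothetical names; what matters is the order: join every local child first, then suspend if any of them is waiting on something remote.

```rust
use std::thread::{self, JoinHandle};

/// Outcome of a local child task (hypothetical).
#[derive(Debug, PartialEq)]
pub enum ChildOutcome {
    Done(u32),
    /// The child is itself blocked on the remote promise with this ID.
    WaitingOn(String),
}

/// Before the parent may return or suspend, drain every in-flight local task.
/// Ok: all children finished. Err: the remote IDs the parent must suspend on.
pub fn flush(children: Vec<JoinHandle<ChildOutcome>>) -> Result<Vec<u32>, Vec<String>> {
    let mut done = Vec::new();
    let mut remote = Vec::new();
    for child in children {
        // join() blocks until the task finishes, so no local work is abandoned.
        match child.join().expect("child task panicked") {
            ChildOutcome::Done(v) => done.push(v),
            // The child's remote dependency bubbles up into the parent's.
            ChildOutcome::WaitingOn(id) => remote.push(id),
        }
    }
    if remote.is_empty() {
        Ok(done)
    } else {
        Err(remote)
    }
}
```

Children that did finish have already settled their own promises under the model above, so on the next replay their results come back from the store rather than being recomputed.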
Beyond Rust async/await
A few other Rust features make the SDK pleasant to use.
Serde bounds as a durability boundary. Arguments are Serialize, results are DeserializeOwned, and those bounds sit directly on the durable call. The compiler statically rejects a workflow step whose input or output cannot be serialized. In a dynamically typed SDK, serializability is a runtime hope; in Rust, its absence is a compile error.
Typed arguments through the Durable trait. Because ctx.run(my_func, args) is generic over the Durable impl the macro generated, the compiler knows the exact argument and return types of the function passed to it. A wrong argument is a compile error at the call site, and the editor offers autocomplete and inline type hints for a durable call as if it were a plain function call. This is possible thanks to Rust's procedural macros and its trait system.
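Both points can be shown without serde or the macro. In this sketch the hypothetical Encode and Decode traits stand in for serde's Serialize and DeserializeOwned, and the Durable impls are written by hand in place of what the macro would generate. Because run is generic over F: Durable<Args, T>, the argument and return types are fixed by the function passed in.

```rust
/// Stand-ins for serde's `Serialize` / `DeserializeOwned` (hypothetical).
pub trait Encode {
    fn encode(&self) -> String;
}

pub trait Decode: Sized {
    fn decode(s: &str) -> Option<Self>;
}

impl Encode for String {
    fn encode(&self) -> String {
        self.clone()
    }
}

impl Decode for String {
    fn decode(s: &str) -> Option<Self> {
        Some(s.to_string())
    }
}

impl Encode for u32 {
    fn encode(&self) -> String {
        self.to_string()
    }
}

impl Decode for u32 {
    fn decode(s: &str) -> Option<Self> {
        s.parse().ok()
    }
}

/// Simplified trait carrying the function's real argument and return types.
pub trait Durable<Args, T> {
    const NAME: &'static str;
    fn call(&self, args: Args) -> T;
}

/// `F` fixes `Args` and `T`, and both must cross the durability boundary.
pub fn run<F, Args, T>(f: F, args: Args) -> T
where
    F: Durable<Args, T>,
    Args: Encode,
    T: Encode + Decode,
{
    // A real SDK would persist the encoded request and result as a promise.
    let _request = format!("{}({})", F::NAME, args.encode());
    let out = f.call(args);
    // Round-trip the result the way a replay would read it back.
    T::decode(&out.encode()).expect("result must round-trip")
}

// Hand-written equivalents of macro output for two leaf functions.
pub struct Uppercase;

impl Durable<String, String> for Uppercase {
    const NAME: &'static str = "uppercase";
    fn call(&self, name: String) -> String {
        name.to_uppercase()
    }
}

pub struct Len;

impl Durable<String, u32> for Len {
    const NAME: &'static str = "len";
    fn call(&self, name: String) -> u32 {
        name.len() as u32
    }
}
```

A call such as run(Uppercase, 42_u32) does not compile, because Uppercase implements Durable<String, String> only; a function whose result type lacks an Encode impl is rejected at the call site in the same way.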
The takeaway
The hard parts of durable execution are deterministic replay, the promise store, and the suspend-replay semantics, and none of them care whether the frontend is a generator or a future. What changes between languages is how much work we need to make our model fit the language. Rust gives us lazy futures driven by tokio, .await on custom IntoFutures, and bounds that match what replay already needs. The result is an SDK where durable, distributed, crash-recoverable code reads like ordinary async Rust.


