Distributed Async Await | Introduction
Solve complex problems with simple code. Enjoy peace of mind.
This blog post introduces Distributed Async Await, a dead simple programming model for concurrent and distributed applications, designed to provide a delightful developer experience.
This is the first in a series of three blog posts:
Introduction
The Protocol Stack
The Programming Model
The Challenge: Concurrency and Distribution
Today’s applications operate in distributed environments, handling large numbers of concurrent executions across multiple processes. In short, today’s applications are characterized by concurrency and distribution.
Concurrency and distribution present two challenges:
Concurrency
Concurrency introduces non-deterministic partial order, where the sequence of operations is unpredictable across components. In other words, we do not know what happens next.
Distribution
Distribution introduces non-deterministic partial failure, where the occurrence of failure is unpredictable across components. In other words, we do not know what fails next.
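Both properties are easy to observe even inside a single process. The minimal sketch below assumes three hypothetical components with random latency and a 30% chance of failing; the names, latencies, and failure rate are illustrative. Which component finishes first, and which ones fail, change from run to run: partial order and partial failure in miniature.

```typescript
// Single-process stand-in for three concurrent components (hypothetical names).
// Latency and failure are random, so both the completion order (partial order)
// and the set of failed components (partial failure) can differ between runs.
type Outcome = { name: string; ok: boolean };

function component(name: string, failureRate: number): Promise<string> {
  const latencyMs = Math.random() * 20; // unpredictable latency
  return new Promise<string>((resolve, reject) => {
    setTimeout(() => {
      if (Math.random() < failureRate) {
        reject(new Error(`${name} failed`)); // unpredictable failure
      } else {
        resolve(name);
      }
    }, latencyMs);
  });
}

async function runOnce(): Promise<{ order: string[]; outcomes: Outcome[] }> {
  const order: string[] = []; // names in the order they settled
  const names = ["billing", "inventory", "shipping"];
  const outcomes = await Promise.all(
    names.map((name) =>
      component(name, 0.3).then(
        (): Outcome => {
          order.push(name);
          return { name, ok: true };
        },
        (): Outcome => {
          order.push(name);
          return { name, ok: false };
        }
      )
    )
  );
  return { order, outcomes };
}

runOnce().then(({ order, outcomes }) => {
  console.log("completion order:", order.join(" -> "));
  console.log("failed:", outcomes.filter((o) => !o.ok).map((o) => o.name));
});
```

Within one process, async await at least gives us a single place to observe both effects. Across processes, no such place exists by default.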
To mitigate the effects of concurrency, we need distributed coordination; to mitigate the effects of distribution, we need distributed recovery:
Distributed Coordination
Distributed Coordination is the ability of a system to mitigate partial order by synchronizing between executions, potentially running on different processes.
Distributed Recovery
Distributed Recovery is the ability of a system to mitigate partial failure by resuming an execution in case of failure, potentially on a different process.
Concurrent programming models such as async await have simplified the challenges of concurrency within a single process. However, no comparable programming model has emerged for applications that are both concurrent and distributed.
Developers have to address the complexity manually, cobbling together snapshotting, checkpointing, polling, and retrying. The resulting code is hard to create, hard to update, brittle, and buggy, and it obscures its very reason to exist: its business logic.
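To give a sense of that toil, here is a minimal sketch of manual checkpointing for a countdown. It assumes a hypothetical `CheckpointStore` (an in-memory `Map` stands in for the database a real deployment would need) and a simulated crash. The loop has to persist its own progress after every step and look that progress up again when a new process takes over. Even this toy leaves the hard questions open: a crash between sending a message and writing the checkpoint means that message is sent again on resume.

```typescript
// Hypothetical checkpoint store: a real system would use a database shared by
// all processes. An in-memory Map stands in so the sketch runs on its own.
class CheckpointStore {
  private data = new Map<string, number>();
  get(key: string): number | undefined {
    return this.data.get(key);
  }
  set(key: string, value: number): void {
    this.data.set(key, value);
  }
}

// Thrown to simulate the process being terminated mid-execution.
const crash = new Error("process terminated (simulated)");

// Counts down from `count`, resuming from the last checkpoint if one exists.
// `crashAfter` is an illustrative knob that aborts after that many sends.
function countdownWithCheckpoints(
  id: string,
  count: number,
  store: CheckpointStore,
  sent: number[],
  crashAfter = Infinity
): void {
  let i = store.get(id) ?? count; // resume from checkpoint, or start fresh
  let steps = 0;
  while (i > 0) {
    sent.push(i);         // side effect: "send" the message
    store.set(id, i - 1); // checkpoint; a crash before this line resends i
    i -= 1;
    steps += 1;
    if (steps >= crashAfter) throw crash;
  }
}

const store = new CheckpointStore();
const sent: number[] = [];
try {
  countdownWithCheckpoints("countdown-1", 5, store, sent, 2); // crashes after two sends
} catch (e) {
  if (e !== crash) throw e;
}
countdownWithCheckpoints("countdown-1", 5, store, sent); // a new "process" resumes
console.log(sent); // [ 5, 4, 3, 2, 1 ]
```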
Example: The Countdown
Consider a simple use case: a countdown timer. We initiate a countdown with a phone number, a count, and a frequency. The app sends a text message to the provided phone number at the provided frequency, decreasing the count by one each time, until it reaches zero.
To make the challenge more obvious, we execute the countdown on a Function-as-a-Service platform such as AWS Lambda. That introduces two problems:
Time Limit
A Lambda function can run for a maximum of 15 minutes before it is forcibly terminated.
Cost
Even if your countdown fits within the time limit, you are billed for the entire runtime, even though the function is mostly idle.
For example, a countdown starting at 10 with a frequency of 15 minutes takes 135 minutes (nine 15-minute intervals between ten messages), or 2 hours and 15 minutes, to complete. As a back-of-the-envelope calculation, assuming sending a message takes 500 ms, the execution is active for 5 seconds and asleep for 2 hours, 14 minutes, and 55 seconds. First, we exceed the 15-minute time limit; second, we pay for 2 hours and 15 minutes instead of 5 seconds.
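The arithmetic can be spelled out as a small function. This sketch follows the back-of-the-envelope convention above: wall-clock time is `count - 1` intervals (ten messages, nine gaps), and idle time is that total minus the time spent sending. The function name and parameters are illustrative; the 15-minute limit is AWS Lambda's maximum function timeout.

```typescript
// Back-of-the-envelope model: `count` messages, one every `intervalMin` minutes,
// each send taking `sendMs` milliseconds. Total time is (count - 1) intervals.
function countdownCost(count: number, intervalMin: number, sendMs: number) {
  const totalSec = (count - 1) * intervalMin * 60; // 9 * 15 min = 8100 s
  const activeSec = (count * sendMs) / 1000;       // 10 * 0.5 s = 5 s
  const idleSec = totalSec - activeSec;            // 8095 s = 2 h 14 min 55 s
  const lambdaLimitSec = 15 * 60;                  // AWS Lambda maximum timeout
  return {
    totalSec,
    activeSec,
    idleSec,
    exceedsLimit: totalSec > lambdaLimitSec,
    billedToActiveRatio: totalSec / activeSec,     // billed time vs. useful time
  };
}

console.log(countdownCost(10, 15, 500));
```

With these inputs the billed time is 1620 times the time the function spends doing useful work.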
So instead of simply suspending and later resuming the execution, the time limit and cost force us to suspend the execution, terminate the process, later initialize another process, and resume the execution. Until today, we had to do all of this manually.

Solution #1: Event Driven Architecture
One possible solution is an event-driven architecture: instead of trying to maintain a single active execution of the countdown, we split it into multiple (recursive) callbacks, each triggered after the delay:
// Bound to http://example.com/countdown
// faas, cron, and send stand in for a FaaS platform, a scheduler, and an SMS API
faas.handler('countdown', async (phone, count, delay) => {
  // send the current count
  await send(phone, count);
  // schedule another invocation for the future (stop once we reach zero)
  if (count > 0) {
    await cron.when({
      post: 'http://example.com/countdown',
      data: [phone, count - 1, delay]
    }, delay);
  }
});
This approach comes at a cost: we need to manage the state of the execution ourselves, that is, the instruction pointer, the stack, and the heap. This consequence of event-driven architectures is known as stack ripping or callback hell: the code looks and feels fragmented across space and time.
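Stack ripping is easier to see when the moving parts are simulated. The sketch below assumes an in-memory queue in place of the HTTP endpoint and `cron`, and a `send` that only records messages; these stand-ins are hypothetical and ignore the delay. The loop variable no longer lives on a stack: it has to be carried inside every event, and the loop itself exists only as a chain of handlers, each scheduling its successor.

```typescript
// Everything the next step needs must travel inside the event:
// this payload is the ripped-apart stack of the countdown "loop".
type CountdownEvent = { phone: string; count: number; delay: number };

// Hypothetical stand-in for send(); records messages instead of texting.
const outbox: string[] = [];
function send(phone: string, count: number): void {
  outbox.push(`${phone}: ${count}`);
}

// The handler performs one step, then returns the event to schedule next (or null).
function handleCountdown(event: CountdownEvent): CountdownEvent | null {
  send(event.phone, event.count);
  if (event.count > 0) {
    return { ...event, count: event.count - 1 };
  }
  return null;
}

// Hypothetical stand-in for cron + HTTP: a queue of pending events.
// A real scheduler would wait `delay` ms before each delivery; omitted here.
function runScheduler(initial: CountdownEvent): void {
  const queue: CountdownEvent[] = [initial];
  while (queue.length > 0) {
    const next = handleCountdown(queue.shift()!);
    if (next !== null) queue.push(next);
  }
}

runScheduler({ phone: "+1-555-0100", count: 3, delay: 1000 });
console.log(outbox); // messages for 3, 2, 1, and 0
```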
The obfuscated control flow is not our only issue, though. Like most ad hoc solutions, this one faces its most difficult problems in the case of failure.
countdown lacks any notion of principled failure-handling semantics:
What happens if the invocation of send fails? Will the invocation be retried? Who is responsible for the retry? Will the message be sent again?
What happens if the invocation of cron fails? Will the invocation be retried? Who is responsible for the retry? Will the next countdown be scheduled twice?
What happens if cron itself fails? Will cron retry the HTTP call? Will the next countdown be invoked twice?
Maybe basing a long-running execution such as countdown on ephemeral HTTP requests is not the best call. Maybe we should switch to durable queues?
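At least one of these questions has an uncomfortable default answer. The minimal sketch below assumes a hypothetical scheduler with at-least-once delivery, that is, one that retries whenever it does not receive an acknowledgement. If the handler's send succeeds but the acknowledgement is lost, for example because the process dies right after sending, the retry runs the send again and the same message goes out twice.

```typescript
const delivered: number[] = []; // messages that actually reached the phone

// Hypothetical handler: sends first, then acknowledges by returning normally.
// `loseAck` simulates the process dying (or the network failing) after the
// send succeeded but before the scheduler received the acknowledgement.
function handler(count: number, loseAck: boolean): void {
  delivered.push(count);
  if (loseAck) throw new Error("acknowledgement lost");
}

// Hypothetical at-least-once scheduler: retries until an attempt acknowledges.
// `failures` is how many leading attempts lose their acknowledgement.
function deliverAtLeastOnce(count: number, failures: number): number {
  let attempts = 0;
  for (;;) {
    attempts += 1;
    try {
      handler(count, attempts <= failures);
      return attempts; // acknowledged
    } catch {
      // no acknowledgement: the scheduler cannot tell whether the send happened
    }
  }
}

const attempts = deliverAtLeastOnce(7, 1);
console.log(attempts, delivered); // 2 attempts, message 7 delivered twice
```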
To make matters worse, countdown lacks any notion of debuggability and observability:
How do we learn that a countdown has failed mid execution?
How do we learn what countdowns are currently executing?
How do we diagnose a problem? Where do we look?
How do we cancel an ongoing countdown?
Maybe having no representation of a long-running execution other than a job in cron or a message in a queue is not the best call either. Maybe we should add a database?
The result? Dread!
We face an endless list of questions that we must answer one by one, analyzing the behavior of the individual components and how they interact, only to start from the very beginning after we make the slightest change.
The result? The code is neither easy to write nor easy to read, and it reflects our business logic only indirectly, in a roundabout, obfuscated way. Reasoning about failure? Confusion. Adding new functionality? Frustration.
Even if we acknowledge that most of us can handle the complexity, I would argue none of us wants to. This is not a fun challenge; this is joyless toil, a dreadful developer experience.
Solution #2: Distributed Async Await
Could we write code as if we suspended and subsequently resumed?! On this or a different process, with or without failure?! Could we enjoy a delightful developer experience and achieve the illusion of continuity across space and time?!
async countdown (context, phone, count, delay) {
  for (let i = count; i > 0; i--) {
    // 1. Locally invoke and await a function send
    await async send(phone, i);
    // 2. Remotely invoke and await a timer
    await async wait(delay);
  }
}
Distributed Async Await is available as SDKs for programming languages such as TypeScript, JavaScript, and Python. The example abstractly showcases the core primitives, async and await.
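For comparison, the same control flow can be written with ordinary, in-process async await. In this sketch `send` and `wait` are hypothetical stand-ins (a recorder and a `setTimeout`-based sleep), and the abstract `context` parameter is dropped. The shape is the same, but nothing is durable: if the process dies mid-loop, the countdown is lost, and `wait` keeps the process alive for the entire delay, which is precisely the time-limit and cost problem from the Lambda example.

```typescript
// Hypothetical stand-ins for the two steps of the countdown.
const messages: string[] = [];

async function send(phone: string, count: number): Promise<void> {
  messages.push(`${phone}: ${count}`); // a real app would call an SMS API here
}

function wait(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// The same loop with plain in-process async await: nothing is persisted,
// so a crash loses the countdown, and `wait` keeps the process alive.
async function countdown(phone: string, count: number, delay: number): Promise<void> {
  for (let i = count; i > 0; i--) {
    await send(phone, i); // 1. invoke and await send
    await wait(delay);    // 2. invoke and await a timer
  }
}

countdown("+1-555-0100", 3, 10).then(() => console.log(messages)); // logs the messages for 3, 2, and 1
```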
This code is easy to write, easy to read, and directly reflects our business logic. No cron jobs, no message queues, no databases.
Just a simple function.
The result? Delight!
Resonate’s Distributed Async Await is a programming model with language-integrated distributed coordination and distributed recovery, enabling developers to express complex logic with simple code.
Distributed Async Await is built on two abstractions, durable functions and durable promises:
Durable Functions
Functions that run to completion, even in the presence of interruptions.
Durable Promises
Promises that coordinate between executions on the same or different processes.
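To build some intuition for the second abstraction, here is a deliberately tiny, in-memory toy model of a durable promise. It assumes a few properties for illustration: a promise has a stable id, lives outside any single execution, can be created idempotently, and is settled at most once, so any later execution or retry observes the same outcome. The `PromiseStore` class and its methods are hypothetical; this is not Resonate's API or implementation, and a real store would be durable and shared across processes rather than a `Map`.

```typescript
type PromiseState =
  | { status: "pending" }
  | { status: "resolved"; value: string }
  | { status: "rejected"; error: string };

// Hypothetical toy store. A real one would be durable storage shared by
// many processes; a Map keeps the sketch self-contained.
class PromiseStore {
  private promises = new Map<string, PromiseState>();

  // Idempotent: creating an existing id returns its current state.
  create(id: string): PromiseState {
    const existing = this.promises.get(id);
    if (existing) return existing;
    const created: PromiseState = { status: "pending" };
    this.promises.set(id, created);
    return created;
  }

  // Settles at most once; later attempts observe the first outcome.
  resolve(id: string, value: string): PromiseState {
    const current = this.create(id);
    if (current.status !== "pending") return current;
    const resolved: PromiseState = { status: "resolved", value };
    this.promises.set(id, resolved);
    return resolved;
  }

  get(id: string): PromiseState | undefined {
    return this.promises.get(id);
  }
}

const store = new PromiseStore();
store.create("countdown-1/step-3");                              // execution A registers the step
store.resolve("countdown-1/step-3", "sent");                     // execution B completes it
const retry = store.resolve("countdown-1/step-3", "sent again"); // a retry is a no-op
console.log(retry); // { status: 'resolved', value: 'sent' }
```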
Distributed Async Await is a dead simple programming model that enables developers to build applications that are
simple to understand
simple to develop
simple to operate
The next blog posts will explore the mechanics of Distributed Async Await to build an accurate and concise mental model.
Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away
Antoine de Saint-Exupéry