Transparent batching with the Resonate Python SDK
The Resonate Python SDK gives you a handy set of APIs to manage the practical complexities of batching operations.
The Resonate Python SDK is shaping up to provide developers with a delightful experience for building distributed applications.
This post highlights one of the SDK’s features that might make you jump for joy.
Let's talk about batching.
Problem space
To ensure we’re all on the same page, "batching" refers to the practice of grouping multiple operations into a single unit of work.
Why do this?
Because performing operations in batches can be much more efficient than processing them one by one.
When we talk about efficiency we could be talking about two different things:
Speed
Resource utilization
With speed, the benefit is clear - things happen faster and you are removing bottlenecks.
With resource utilization, the benefit will often come down to cost savings. In todays cloud environments, you are often charged for resources you use. If a resource is remote and accessed over a network, you may be charged every time you access it.
This level of efficiency might not be significant at smaller scales, but at larger scales, efficiency becomes crucial.
For example, processing 100,000 operations sequentially can be far more costly and time-consuming than batching those 100,000 operations into a single task. Of course, the target system must support batching.
So let's focus on a concrete example.
Consider an application that creates a row in a database for each new user.
If the application gets less than 1 new user per minute, or even every 30 seconds, then it can probably handle sequential inserts. In other words, the application can run and commit one SQL query per new user and there may not ever be a problem.
But what if the applications suddenly becomes really popular? What if, hypothetically, it received 1000 new users per second? 🚀 The database would save to durable storage on each query and the application probably wouldn't be able to keep up with all the demand. Each second there would be a longer and longer delay for users to get added to the database. We can imagine that, if these user creation requests were coming over a network, that many of them would timeout while waiting for their turn to get a row in the database.
In this very simplified example, we might now consider batching SQL queries so that more user rows are created per commit.
If we can batch thousands of queries into a single commit, then likely the application would be able to keep up with the demand.
In production
In production, ensure that inserts are idempotent to account for the possibility of retries.
In theory that sounds great. In practice, now you have to manage the complexity of coordinating otherwise concurrent executions to collect a batch.
Sounds like a trade off of on complexity vs developer experience right?
Not if you use Resonate. 😉
Resonate's solution
The Resonate Python SDK gives you a handy set of APIs to manage the practical complexities of batching operations.
This post is working with Resonate Python SDK v0.1.33
If we assume that you are willing to embrace Resonate's programming mode, then at a high level, Resonate just requires that you define a data structure and a handler.
Let's look at how you would implement the use case above with Resonate.
First, create a data structure that inherits what Resonate calls a Command interface. The data structure must include the data to be inserted into the database. The Command data structure stands in for a function execution invocation so that you still get a promise and await on result of the commit.
Then, create a handler that can process a batch of SQL queries. This should look similar to the code that batched the SQL queries above.
Next, register the data structure and the handler with the Resonate Scheduler.
Finally, create a function that can be invoked over and over again and passes the data to Resonate to manage. Register it with the Resonate Scheduler, and then call that function with Resonate's run()
method.
Coroutines in action
Resonate promotes the use of coroutines anytime there is a need to await on the result of another execution. You will see coroutines generically referred to as functions, but know that you are actually using coroutines whenever a value is yielded into the execution.
From top to bottom, taking into account database setup, a working application would look something like this:
The example above shows that batching happens transparently in the background. The SDK handles the coordination of otherwise concurrent executions on a platform level and you still get to write concurrent, non-coordinated code.
Configure batch size
If you want to ensure a maximum batch size, you need only supply that when registering the handler:
But is this actually more efficient?
Benchmark it
To demonstrate the efficiency we will do the following things.
First, we will adjust our application to support the option to do sequential writes.
Then we will update our application to expose a simple CLI for us to choose whether to process batch writes or sequential writes. We will also capture the start time and the end time of the operation.
Let's run this with 10,000 sequential user inserts.
First, we will see a log for each and every insert.
And we will see it has taken roughly 3.5 seconds to complete.
Now let's run the same number of values using batching.
We should notice that we see only about a dozen inserts logged.
And it took less than a second to complete.
In this example we can see that batching improves efficiency, both for resource utilization (less inserts to the database). And it becomes more and more impactful at higher volumes. Try it out 50,000 or 100,000 inserts to see for yourself using the batching-benchmark example in the examples-py repository.
Conclusion
The Resonate SDK provides a way for developers to batch operations transparently. The developer need only define a data structure and a handler function that processes the batch. Resonate will then automatically handle the batching and execution of the handler function.
The batching operations that Resonate provides is more efficient in speed and resource usage in comparison to non-batched operations, which could reduce costs.
Want to learn more about the Python SDK? Join the waitlist or RSVP for an upcoming webinar!