Agent tools part 1 | The current status
Tool calling has evolved to address the needs of autonomy, but not the needs of distribution.
“Evolution does not always optimize survival”
— someone smart
You can apply that quote to our species’ obsession with the advancement of AI in general. However, here I am applying it to the evolution of LLM tool calling capabilities.
Mad tool skills, bruh
LLMs are now trained to provide structured data responses when they “want” to make use of a tool.
For example, if I prompt a model that has been trained to make tool calls to gather weather data for Parkland County, Alberta (that’s where I live) for July 2023, the model will know to format its response to my question with something like this:
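(The exact format varies by provider; the shape below is illustrative, written as a Python literal with field names loosely modeled on OpenAI-style tool calls, not an exact spec.)

```python
# Illustrative structured tool-call response; the field names are assumptions.
response = {
    "role": "assistant",
    "content": None,                     # no prose answer, just a request to call a tool
    "tool_calls": [
        {
            "name": "get_weather_data",
            "arguments": {"location": "Parkland County, Alberta", "period": "2023-07"},
        }
    ],
}
```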
The chat client, or application that interacts with the LLM, is responsible for looking at the output, detecting the structured response, parsing it, and then invoking the “get_weather_data” function with the arguments.
The expectation is that the result of the tool function is then passed back to the LLM as part of the conversation. That is, the LLM, as part of its training, expects the next input to be a result from the tool call.
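In sketch form, and assuming nothing about any particular SDK (call_llm, the message shape, and the tools registry below are stand-ins), that client-side loop looks roughly like this:

```python
# Rough sketch of the client-side loop: send the whole context, execute any
# tool calls synchronously, append the results, repeat. Not any specific SDK.
def run_conversation(call_llm, tools, conversation):
    while True:
        response = call_llm(conversation)            # the entire history goes in on every turn
        conversation.append(response)                # the model's turn becomes part of the context
        tool_calls = response.get("tool_calls") or []
        if not tool_calls:
            return response["content"]               # a plain answer; this turn is done
        for call in tool_calls:
            result = tools[call["name"]](**call["arguments"])   # blocks until the tool returns
            conversation.append({"role": "tool", "name": call["name"], "content": result})
```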
Remember, remember
Remember that an LLM by itself does not have memory. It relies on the entire conversation thread (also called the context) being provided on each and every interaction, with each user input, LLM response, or tool result appended to the context/conversation thread. In other words, the interaction is turn-based, and on each turn the entire history is fed back into the LLM.
Note: the big players like OpenAI and Anthropic have added RAG-like memory to their chat clients so that there is an appearance of memory across different context windows (they persist the conversation threads and search them for information, passing relevant pieces to the LLM as part of other conversation threads). But fundamentally they are working around the memory constraint described above.
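In sketch form, that workaround amounts to something like this (purely illustrative; build_context and search are hypothetical stand-ins, not anyone’s actual implementation):

```python
# Hypothetical sketch of RAG-like memory: persist old threads, search them for
# anything relevant to the new input, and inject the matches into the context.
def build_context(user_input, current_thread, archived_threads, search):
    snippets = search(archived_threads, query=user_input)    # e.g. keyword or embedding search
    memory = [{"role": "system", "content": f"From an earlier conversation: {s}"}
              for s in snippets]
    return memory + current_thread + [{"role": "user", "content": user_input}]
```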
Sync, sync, baby
Currently, interactions with tools are synchronous, and it is hard to imagine them being anything other than that.
For the conversation to make sense to the LLM, the result of a tool call needs to appear in the thread right after the call that requested it.
If gathering weather data takes 8 hours, perhaps for a very large area over a longer time period, then that context thread needs to wait 8 hours for the tool result before the user can add additional input and continue interacting with the LLM within that thread.
As a workaround, OpenAI has managed to train some of its models to invoke a tool multiple times from a single model response. That is, if I were to ask their model to “gather weather data for July 2022 and July 2023 for Parkland County, Alberta” it would know how to return a response like this:
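(Again illustrative, with the same assumed field names as before.)

```python
# One model response requesting two tool calls, one per time period.
response = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "name": "get_weather_data",
            "arguments": {"location": "Parkland County, Alberta", "period": "2022-07"},
        },
        {
            "name": "get_weather_data",
            "arguments": {"location": "Parkland County, Alberta", "period": "2023-07"},
        },
    ],
}
```

The client is then free to run both calls concurrently and feed both results back in a single turn.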
I think you might call this the “fork / join technique”.
So, if data gathering for each time period takes 8 hours, instead of taking 16 hours for back-to-back tool calls with a prompt between them, it would only take 8 hours before you could move forward with the thread.
But, again, the interaction is synchronous and you can’t move forward with that specific context/conversation thread until there is a result for each of the tool calls. Otherwise the thread would contain a tool call without a result, which the model (or the provider’s API) will reject.
But what about MCP?
MCP (Model Context Protocol) standardizes the approach. That is, MCP creates a standard by which all LLMs can be trained to structure their output to connect to and invoke tools that exist outside of a chat client or application. Which is cool.
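For a sense of what that looks like on the tool side, here is a minimal sketch using the MCP Python SDK’s FastMCP helper (the weather-gathering logic itself is a placeholder):

```python
# Minimal sketch of an MCP server exposing the weather tool via the MCP Python
# SDK's FastMCP helper. The actual data gathering is stubbed out.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather")

@mcp.tool()
def get_weather_data(location: str, period: str) -> str:
    """Gather weather data for a location and time period."""
    return f"weather data for {location}, {period}"   # placeholder result

if __name__ == "__main__":
    mcp.run()   # serves the tool over stdio by default; the client holds the connection open
```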
But MCP doubles down on this synchronous behaviour. It has no built-in mechanisms for handling the distribution that it facilitates. It relies on long-lived connections that need to be maintained throughout the entire tool interaction. And if the tool errors, the MCP server crashes, or the connection between the chat client / application and the tool fails, there is no built-in means by which to mitigate that failure or those errors.
Even though MCP standardizes a tool calling convention, it ignores all the other issues that arise from building a distributed system, and as the developer you are now burdened with figuring out:
Supervision, such as timeouts, to detect issues
Retry logic for application level errors
Deduplication and/or idempotency guarantees
Recovery for process crashes
Congrats, you are now engineering a reliable distributed system.
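To make that burden concrete, here is a rough sketch of the plumbing around a single tool call (a hypothetical helper, not part of MCP or any SDK): a timeout for supervision, retries with backoff for application-level errors, and an idempotency key so a retry does not re-execute a call that already completed.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def call_tool_reliably(tool, arguments, *, key, completed, timeout_s=30.0, max_retries=3):
    """Invoke tool(**arguments) with supervision, retries, and deduplication.

    "completed" stands in for a durable result store keyed by idempotency key;
    unless it really is durable, none of this survives a process crash.
    """
    if key in completed:                                     # dedup: this logical call already finished
        return completed[key]
    last_error = None
    for attempt in range(1, max_retries + 1):
        pool = ThreadPoolExecutor(max_workers=1)
        try:
            result = pool.submit(tool, **arguments).result(timeout=timeout_s)  # supervision via timeout
            completed[key] = result                          # record the result so a retry won't redo it
            return result
        except Exception as exc:                             # timeout or application-level error
            last_error = exc
            time.sleep(min(2 ** attempt, 60))                # backoff before the next attempt
        finally:
            pool.shutdown(wait=False)                        # abandon a hung call; the thread may linger
    raise RuntimeError(f"tool call failed after {max_retries} attempts") from last_error
```

And that sketch only covers the first three items on the list. Recovery is still missing: unless the “completed” store really is durable and an in-flight call can be resumed after a restart, a process crash loses everything.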
Strongest of hype cycle
This situation continues to exacerbate a problem where application developers, and in this case autonomous application developers, are forced to be distributed systems engineers.
Basically, tool calling has evolved sub-optimally, solely addressing the needs of autonomy and not distribution.
So, while AI speeds things up in theory, there will still be a delay in pushing reliable autonomous applications to production, because developers still need to wrestle with the same issues they have been wrestling with since applications started spanning more than one process almost three decades ago.
Unless…
Unless Resonate has something up its sleeve that could take all that pain away?
Stay tuned to find out!