We hear a lot about ChatGPT, Bard, Mistral and the other LLMs. It's constant chatter, mostly with sweeping statements about how they're going to change the world. But when you get down into the details, it's typically about how they're going to improve customer service of all things, or content generation or some other relatively unimportant application.
People are noticing this and starting to be turned off by the hype. I saw it on Twitter/X recently in the reactions to my reply to someone who was "blown away" by how efficient AI tools are making people.
It's understandable; the hype does seem overblown. But we're missing the most painfully obvious application of LLMs in tech.
LLMs for Operating Software Through Natural Language
All software consists of a (sometimes long) list of functions that can be called by the user through the UI (or through an API). The user has to learn how to call these functions and what the parameters are (e.g. "If I want to rename this, I need to have the new name ready..."). We've tried to standardize via design patterns, which is good, but we still keep our software exceedingly simple so that even the most novice user can understand it.
But what if we could just talk to the software? What if we could just say "rename this to that" and the software would do it? What if we could say "take all my tweets from last month and put them in a Google Sheet and send a link to the Google Sheet to my Slack" and it would just happen?
LLMs can do this.
Transforming Natural Language Into Code
We focus too much on how LLMs can translate natural language into more natural language (e.g. "rewrite my email to be more formal" or "summarize this article") and not enough on how they can translate natural language into code. Turns out they are very good at this, and will only get better!
Here's the roadmap for building an LLM system that can control lots of software through natural language:
- Reduce the software into a set of functions and parameters. This is probably the hardest part, but it's quite doable for any software that already has an API. For example, you could reduce Twitter into a set of functions like "getTweets", "postTweet", "deleteTweet", etc. and then define the parameters for each function. This is a one-time cost.
- When users make requests, translate them into function calls. This is where the LLM comes in. A user asks for something, like "take all my tweets from last month and put them in a Google Sheet", and the LLM translates that into a series of function calls across different software systems. Given that you already have the functions, parameters, and return values defined, this is a relatively simple task for the LLM. The limiting factor is the context window, because every function definition has to fit in the prompt alongside the request, which is why function names and parameters should be kept as simple as possible.
- Validate the function calls. LLMs make mistakes. Validate the proposed output by typechecking all function calls. Show the user what is about to be done for them and get confirmation, particularly for destructive actions.
- Execute the function calls. This is the easy part. Just call the functions.
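To make the four steps concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the functions (get_tweets, create_sheet, delete_tweet) are hypothetical stand-ins rather than real Twitter or Google Sheets calls, the "$0" notation for "use the result of step 0" is invented here, and the LLM's answer is hard-coded as a JSON string so the example runs without any model or API key.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    func: Callable[..., Any]
    params: dict[str, type]      # parameter name -> expected Python type
    destructive: bool = False    # destructive calls need user confirmation


# Step 1: reduce the software to functions. These are hypothetical stubs.
def get_tweets(month: str) -> list:
    return [f"tweet from {month} #1", f"tweet from {month} #2"]

def create_sheet(title: str, rows: list) -> str:
    return f"https://sheets.example/{title}"

def delete_tweet(tweet_id: int) -> bool:
    return True

REGISTRY = {
    "get_tweets": Tool(get_tweets, {"month": str}),
    "create_sheet": Tool(create_sheet, {"title": str, "rows": list}),
    "delete_tweet": Tool(delete_tweet, {"tweet_id": int}, destructive=True),
}


def describe_tools() -> str:
    """Step 2 (input side): compact signatures to place in the LLM prompt."""
    lines = []
    for name, tool in REGISTRY.items():
        params = ", ".join(f"{p}: {t.__name__}" for p, t in tool.params.items())
        lines.append(f"{name}({params})")
    return "\n".join(lines)


def is_ref(value: Any) -> bool:
    """True for values like "$0", meaning "the result of step 0"."""
    return isinstance(value, str) and value.startswith("$") and value[1:].isdigit()


def validate(plan: list) -> list:
    """Step 3: check names, parameter names, and types before running anything."""
    errors = []
    for i, call in enumerate(plan):
        tool = REGISTRY.get(call.get("name"))
        if tool is None:
            errors.append(f"step {i}: unknown function {call.get('name')!r}")
            continue
        args = call.get("args", {})
        if set(args) != set(tool.params):
            errors.append(f"step {i}: expected parameters {sorted(tool.params)}")
            continue
        for name, expected in tool.params.items():
            value = args[name]
            if is_ref(value):
                if int(value[1:]) >= i:
                    errors.append(f"step {i}: {value} does not refer to an earlier step")
            elif not isinstance(value, expected):
                errors.append(f"step {i}: {name} should be {expected.__name__}")
    return errors


def execute(plan: list, confirm: Callable[[dict], bool] = lambda call: False) -> list:
    """Step 4: run the validated plan, asking before destructive calls."""
    errors = validate(plan)
    if errors:
        raise ValueError("; ".join(errors))
    results = []
    for call in plan:
        tool = REGISTRY[call["name"]]
        if tool.destructive and not confirm(call):
            raise PermissionError(f"user declined {call['name']}")
        args = {k: results[int(v[1:])] if is_ref(v) else v
                for k, v in call.get("args", {}).items()}
        results.append(tool.func(**args))
    return results


# Step 2 (output side): what an LLM might return for
# "take all my tweets from last month and put them in a sheet" (hard-coded here).
llm_output = """[
  {"name": "get_tweets", "args": {"month": "2024-01"}},
  {"name": "create_sheet", "args": {"title": "tweets", "rows": "$0"}}
]"""
plan = json.loads(llm_output)
print(execute(plan)[-1])  # prints https://sheets.example/tweets
```

The important design choice is that the LLM only ever produces data (a list of named calls), never code that gets run directly. That keeps validation cheap, and a plan that names an unknown function or passes the wrong type is rejected before anything touches a real account.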
Zapier, You Fools, You Should Have Built This Already
The time of siloed software is coming to an end. Tools like Zapier and IFTTT that require you to make yet another account, connect various APIs and set up automations are going to be a thing of the past. We will be able to just talk to our computers and have them do things across all the software we use, through natural language.
Zapier has the entire infrastructure in place to build this. They have the APIs, the users, the integrations, and the money. It's either coming, or they're sleeping on this entire concept. I suppose we'll see.
This is the most painfully obvious application of LLMs in tech, and nobody is talking about it.