Inside AutoGen: Microsoft Research's New Autonomous Agent Framework
A new open source framework that streamlines reasoning and communication with agents.
Autonomous agents are rapidly becoming one of the hottest trends in generative AI. Although they are still far from being a solved problem or a mainstream technology, autonomous agents are widely acknowledged as one of the new frontiers in the foundation model landscape. Frameworks and research in this space are popping up everywhere. One of the most interesting recent projects comes from Microsoft Research: AutoGen.
In essence, AutoGen is a platform that simplifies the creation of conversable agents capable of solving tasks through inter-agent conversations. With AutoGen, developers can construct various forms and patterns of multi-agent conversations involving large language models (LLMs), humans, and tools.
Building a complex multi-agent conversation system with AutoGen comes down to two key steps:
I. Defining Conversable Agents: Developers begin by defining a set of conversable agents, each endowed with specialized capabilities and roles. These agents serve as the participants in the conversations.
II. Defining Interaction Behaviors: The next step involves defining how these conversable agents should interact with one another. This includes specifying how an agent should respond when receiving messages from another agent, thus determining the flow of the conversation.
Conversable Agents in AutoGen
One distinctive feature of agents in AutoGen is their conversability, which enables them to collectively solve tasks through inter-agent conversations. These conversable agents are entities with specific roles, capable of both sending and receiving messages to and from other agents to initiate or continue a conversation. They maintain their internal states based on the messages they send and receive and can be configured with various capabilities, such as language understanding, generation, and reasoning, making them versatile and adaptable.
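To make the idea of conversability concrete, here is a minimal sketch in plain Python. It does not use AutoGen itself: the `SketchAgent` class, its fields, and the two example agents are hypothetical names invented for illustration. It only shows the two properties described above, namely that agents send and receive messages and that each one keeps internal state based on that traffic.

```python
from dataclasses import dataclass, field


@dataclass
class SketchAgent:
    """Toy stand-in for a conversable agent (hypothetical, not AutoGen's class)."""

    name: str
    role: str
    # Internal state: message history keyed by the other agent's name.
    history: dict = field(default_factory=dict)

    def send(self, content: str, recipient: "SketchAgent") -> None:
        # Record the outgoing message, then deliver it to the recipient.
        self.history.setdefault(recipient.name, []).append(
            {"from": self.name, "content": content}
        )
        recipient.receive(content, sender=self)

    def receive(self, content: str, sender: "SketchAgent") -> None:
        # Record the incoming message so later replies could build on it.
        self.history.setdefault(sender.name, []).append(
            {"from": sender.name, "content": content}
        )


planner = SketchAgent("planner", role="breaks a task into steps")
coder = SketchAgent("coder", role="writes code for each step")

planner.send("Please write a function that adds two numbers.", coder)
coder.send("def add(a, b): return a + b", planner)
```

After these two calls, both agents hold the same two-message transcript, each filed under the other agent's name. A real agent would use that history as context when generating its next reply.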
Agent capabilities in AutoGen are powered by a combination of resources:
I. LLMs: AutoGen primarily leverages LLMs, positioning them as critical components in the backend of agents. Different agents can be supported by various LLM configurations, some of which may utilize LLMs tuned on private data. Furthermore, LLMs can take on different roles, each associated with distinct system messages.
II. Humans: Recognizing the importance of human feedback and involvement, AutoGen enables the integration of human users into agent conversations. This is accomplished by configuring a proxy agent, allowing humans to interact with other agents seamlessly. AutoGen offers flexibility in defining the extent of human involvement, including specifying the frequency and conditions for requesting human input, granting humans the option to skip providing input when necessary.
III. Tools: AutoGen acknowledges that tools are essential for overcoming limitations associated with LLMs. The platform natively supports the use of tools through code generation and execution. For instance, when using a default assistant agent from AutoGen, the system message can prompt the LLM to suggest Python code or shell scripts to solve problems. This capability is particularly useful in scenarios requiring information collection or multi-step problem-solving. Additionally, agents in AutoGen can execute LLM-suggested function calls, making use of pre-defined toolsets, and enhancing problem-solving capabilities.
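The function-call path for tools can be sketched in a few lines. This is an illustration under stated assumptions rather than AutoGen's implementation: the `FUNCTION_MAP` registry, the `execute_function_call` helper, and the shape of the suggested call (a name plus JSON-encoded arguments) are all hypothetical, and no model is called.

```python
import json


def celsius_to_fahrenheit(celsius: float) -> float:
    """Example tool the agent is allowed to run."""
    return celsius * 9 / 5 + 32


# Hypothetical registry of pre-defined tools, keyed by function name.
FUNCTION_MAP = {"celsius_to_fahrenheit": celsius_to_fahrenheit}


def execute_function_call(call: dict) -> dict:
    """Run an LLM-suggested call only if the function is registered."""
    name = call.get("name")
    func = FUNCTION_MAP.get(name)
    if func is None:
        return {"name": name, "ok": False, "content": f"Unknown function: {name}"}
    try:
        # Arguments are assumed to arrive as a JSON string.
        arguments = json.loads(call.get("arguments", "{}"))
        result = func(**arguments)
    except (json.JSONDecodeError, TypeError) as exc:
        return {"name": name, "ok": False, "content": f"Error: {exc}"}
    return {"name": name, "ok": True, "content": str(result)}


# Stand-in for what an LLM might suggest.
suggested = {"name": "celsius_to_fahrenheit", "arguments": '{"celsius": 100}'}
result = execute_function_call(suggested)
unknown = execute_function_call({"name": "delete_everything", "arguments": "{}"})
```

Restricting execution to a registry means an unexpected function name produces an error message for the conversation instead of running arbitrary code. The result dictionary can then be sent back to the suggesting agent as an ordinary message.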
By offering this straightforward approach to building conversable agents with diverse capabilities, AutoGen empowers developers to create advanced multi-agent conversation systems that can tackle a wide range of tasks effectively.
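The human-involvement settings described above (how often to ask for input, under which conditions, and the option to skip) can be pictured with a small proxy object. This is a hedged sketch only: `HumanProxySketch`, the three mode names, and the fallback message are hypothetical and chosen for illustration, not taken from AutoGen.

```python
from typing import Callable, Optional

# Hypothetical mode names controlling when a person is asked for input.
VALID_MODES = {"always", "on_termination", "never"}


class HumanProxySketch:
    """Toy proxy that decides when to ask a person for a reply."""

    def __init__(self, ask_human: Callable[[str], str],
                 input_mode: str = "on_termination") -> None:
        if input_mode not in VALID_MODES:
            raise ValueError(f"input_mode must be one of {sorted(VALID_MODES)}")
        self.ask_human = ask_human
        self.input_mode = input_mode

    def reply_to(self, message: str, is_termination: bool = False) -> Optional[str]:
        should_ask = self.input_mode == "always" or (
            self.input_mode == "on_termination" and is_termination
        )
        if should_ask:
            prompt = f"Reply to {message!r} (press Enter to skip): "
            answer = self.ask_human(prompt).strip()
            if answer:
                return answer
        # Human skipped or was not asked: fall back to automatic behavior.
        return None if is_termination else "Acknowledged, please continue."


# Scripted answers stand in for a real person in this example.
reviewer = HumanProxySketch(lambda prompt: "Use a smaller step size.", "always")
silent = HumanProxySketch(lambda prompt: "", "always")
hands_off = HumanProxySketch(lambda prompt: "never shown", "never")

human_reply = reviewer.reply_to("Proposed fix: halve the learning rate.")
skipped_reply = silent.reply_to("Proposed fix: halve the learning rate.")
auto_reply = hands_off.reply_to("Proposed fix: halve the learning rate.")
```

In an interactive session, the built-in `input` function could be passed as `ask_human`. Injecting the callable keeps the policy (when to ask) separate from the channel (how to ask), which also makes the proxy easy to test.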
AutoGen offers a practical solution for tackling tasks through inter-agent conversations. To support next-generation applications, its authors recognize the need for a straightforward way to manage complex workflows. To address this, AutoGen introduces the following features:
· Unified Conversation Interfaces: AutoGen equips its agents with unified conversation interfaces. These interfaces provide the means for agents to send and receive messages and generate replies based on received messages. This design places conversations at the center of workflow representation, allowing developers to define workflows as sequences of inter-agent message exchanges and programmed agent actions using the “generate reply” feature. Once the logic for message exchange and agent actions is set, the workflow is effectively defined.
· Automated Agent Chat with Auto-Reply: AutoGen aims to simplify the development of multi-agent conversations by reducing the burden on developers. It does this by letting developers focus solely on defining the behavior of each agent. In practice, once the agents are configured appropriately, a developer only has to trigger a conversation among them. The conversation then proceeds automatically, with no need to write a separate control plane. To enable this automation, AutoGen provides an agent auto-reply mechanism by default. When an agent receives a message from another agent, it automatically invokes the “generate reply” function and sends the reply back to the sender, unless the reply is empty (for instance, when a termination condition is met).
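The auto-reply mechanism can be sketched as a short loop in plain Python. The code below is an illustration, not AutoGen's source: `AutoReplyAgent`, `count_up`, and the `max_auto_replies` cap are hypothetical. It follows the behavior described above, where receiving a message triggers a generated reply that is sent back to the sender, and an empty reply ends the exchange.

```python
from typing import Callable, List, Optional

ReplyFunc = Callable[[List[dict]], Optional[str]]


class AutoReplyAgent:
    """Toy agent that answers every incoming message automatically."""

    def __init__(self, name: str, reply_func: ReplyFunc,
                 max_auto_replies: int = 5) -> None:
        self.name = name
        self.reply_func = reply_func
        self.max_auto_replies = max_auto_replies  # safety cap (an assumption)
        self.messages: List[dict] = []  # transcript as seen by this agent
        self.auto_replies = 0

    def generate_reply(self) -> Optional[str]:
        # Stop replying once the cap is reached.
        if self.auto_replies >= self.max_auto_replies:
            return None
        return self.reply_func(self.messages)

    def send(self, content: str, recipient: "AutoReplyAgent") -> None:
        self.messages.append({"sender": self.name, "content": content})
        recipient.receive(content, self)

    def receive(self, content: str, sender: "AutoReplyAgent") -> None:
        self.messages.append({"sender": sender.name, "content": content})
        reply = self.generate_reply()
        if not reply:
            return  # an empty reply ends the conversation
        self.auto_replies += 1
        self.send(reply, sender)


def count_up(messages: List[dict]) -> Optional[str]:
    """Reply with the next integer, or with nothing once 3 is reached."""
    last = int(messages[-1]["content"])
    return None if last >= 3 else str(last + 1)


alice = AutoReplyAgent("alice", count_up)
bob = AutoReplyAgent("bob", count_up)

# One call starts the chat; everything after it runs without intervention.
alice.send("0", bob)
transcript = [m["content"] for m in alice.messages]
```

The only explicit step is the first `send`. Each agent's behavior lives in its reply function, so replacing `count_up` with a function that calls an LLM, a tool, or a human proxy would change the conversation without changing the loop.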
By offering these user-friendly features, AutoGen streamlines the creation of multi-agent conversations, making it accessible for developers to orchestrate complex workflows efficiently.
AutoGen in Action
Microsoft Research has outlined various use cases to demonstrate the versatility of AutoGen:
1. Math Problem Solving: AutoGen proves its prowess in solving mathematical problems across three distinct scenarios.
2. Multi-Agent Coding: AutoGen’s capabilities extend to solving complex supply chain optimization problems by employing three interconnected agents.
3. Online Decision Making: AutoGen showcases its ability to tackle web interaction tasks within the MiniWob++ benchmark, harnessing the power of agents for online decision-making.
4. Retrieval-Augmented Chat: AutoGen introduces retrieval-augmented agents adept at solving challenges in code generation and question-answering.
5. Dynamic Group Chat: AutoGen’s adaptability shines through in the creation of dynamic group chats, illustrating its capacity to build versatile group communication systems.
6. Conversational Chess: Microsoft Research’s AutoGen brings the world of chess into the realm of conversational AI, allowing players to engage in an interactive and creative chess game through conversation.
These use cases highlight the wide-ranging applicability of AutoGen in solving diverse problems and scenarios, making it a valuable tool for developers across various domains. Let’s look at this diagram that illustrates AutoGen’s capabilities in the context of conversational chess.
The space of autonomous agents is moving extremely fast, and AutoGen is one of the most interesting architectures to come out of it. It is definitely worth tracking.