Agents#
AutoGen AgentChat provides a set of preset agents, each varying in how it responds to messages. All agents share the following attributes and methods:

- name: The unique name of the agent.
- description: A text description of the agent.
- run: Runs the agent given a task, either a string or a list of messages, and returns a TaskResult. **Agents are expected to be stateful, and this method should be called with new messages, not the complete history.**
- run_stream: Same as run(), but returns an iterator of messages that subclass BaseAgentEvent or BaseChatMessage, followed by a TaskResult as the last item.

See autogen_agentchat.messages for more information on AgentChat message types.
Assistant Agent#
AssistantAgent is a built-in agent that uses a language model and has the ability to use tools.

Warning
AssistantAgent is a "kitchen sink" agent for prototyping and educational purposes: it is very general. Make sure you read the documentation and implementation to understand its design choices. Once you fully understand the design, you may want to implement your own agent. See Custom Agents.
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import StructuredMessage
from autogen_agentchat.ui import Console
from autogen_ext.models.openai import OpenAIChatCompletionClient
# Define a tool that searches the web for information.
# For simplicity, we will use a mock function here that returns a static string.
async def web_search(query: str) -> str:
"""Find information on the web"""
return "AutoGen is a programming framework for building multi-agent applications."
# Create an agent that uses the OpenAI GPT-4o model.
model_client = OpenAIChatCompletionClient(
model="gpt-4.1-nano",
# api_key="YOUR_API_KEY",
)
agent = AssistantAgent(
name="assistant",
model_client=model_client,
tools=[web_search],
system_message="Use tools to solve tasks.",
)
Getting Results#
We can use the run() method to run the agent on a given task and get the result.
# Use asyncio.run(agent.run(...)) when running in a script.
result = await agent.run(task="Find information on AutoGen")
print(result.messages)
[TextMessage(source='user', models_usage=None, metadata={}, content='Find information on AutoGen', type='TextMessage'), ToolCallRequestEvent(source='assistant', models_usage=RequestUsage(prompt_tokens=61, completion_tokens=16), metadata={}, content=[FunctionCall(id='call_703i17OLXfztkuioUbkESnea', arguments='{"query":"AutoGen"}', name='web_search')], type='ToolCallRequestEvent'), ToolCallExecutionEvent(source='assistant', models_usage=None, metadata={}, content=[FunctionExecutionResult(content='AutoGen is a programming framework for building multi-agent applications.', name='web_search', call_id='call_703i17OLXfztkuioUbkESnea', is_error=False)], type='ToolCallExecutionEvent'), ToolCallSummaryMessage(source='assistant', models_usage=None, metadata={}, content='AutoGen is a programming framework for building multi-agent applications.', type='ToolCallSummaryMessage')]
The call to the run() method returns a TaskResult with the list of messages in the messages attribute, which stores the agent's "thought process" as well as the final response.
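Because the agent is stateful, a follow-up call to run() only needs the new message, not the conversation so far. A minimal sketch (the follow-up task is illustrative):
# The agent keeps its model context between calls, so pass only the new message.
result = await agent.run(task="What did the tool return?")
print(result.messages[-1].content)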
Note
Unlike in v0.2 AgentChat, tools are executed by the same agent directly within the same call to run(). By default, the agent returns the result of the tool call as the final response.
Multi-Modal Input#
The AssistantAgent can handle multi-modal input by providing the input as a MultiModalMessage.
from io import BytesIO
import PIL
import requests
from autogen_agentchat.messages import MultiModalMessage
from autogen_core import Image
# Create a multi-modal message with random image and text.
pil_image = PIL.Image.open(BytesIO(requests.get("https://picsum.photos/300/200").content))
img = Image(pil_image)
multi_modal_message = MultiModalMessage(content=["Can you describe the content of this image?", img], source="user")
img
# Use asyncio.run(...) when running in a script.
result = await agent.run(task=multi_modal_message)
print(result.messages[-1].content) # type: ignore
The image depicts a scenic mountain landscape under a clear blue sky. There are several rugged mountain peaks in the background, with some clouds scattered across the sky. In the valley below, there is a body of water, possibly a lake or river, surrounded by greenery. The overall scene conveys a sense of natural beauty and tranquility.
Streaming Messages#
We can also stream each message as it is generated by the agent using the run_stream() method, and use Console to print the messages to the console as they appear.
async def assistant_run_stream() -> None:
# Option 1: read each message from the stream (as shown in the previous example).
# async for message in agent.run_stream(task="Find information on AutoGen"):
# print(message)
# Option 2: use Console to print all messages as they appear.
await Console(
agent.run_stream(task="Find information on AutoGen"),
output_stats=True, # Enable stats printing.
)
# Use asyncio.run(assistant_run_stream()) when running in a script.
await assistant_run_stream()
---------- TextMessage (user) ----------
Find information on AutoGen
---------- ToolCallRequestEvent (assistant) ----------
[FunctionCall(id='call_HOTRhOzXCBm0zSqZCFbHD7YP', arguments='{"query":"AutoGen"}', name='web_search')]
[Prompt tokens: 61, Completion tokens: 16]
---------- ToolCallExecutionEvent (assistant) ----------
[FunctionExecutionResult(content='AutoGen is a programming framework for building multi-agent applications.', name='web_search', call_id='call_HOTRhOzXCBm0zSqZCFbHD7YP', is_error=False)]
---------- ToolCallSummaryMessage (assistant) ----------
AutoGen is a programming framework for building multi-agent applications.
---------- Summary ----------
Number of messages: 4
Finish reason: None
Total prompt tokens: 61
Total completion tokens: 16
Duration: 0.52 seconds
The run_stream() method returns an asynchronous generator that yields each message generated by the agent, followed by the TaskResult as the last item.
From the messages, you can observe that the assistant agent used the web_search tool to gather information and responded based on the search results.
Using Tools and Workbench#
Large language models (LLMs) are typically limited to generating text or code responses. However, many complex tasks benefit from the ability to use external tools that perform specific actions, such as fetching data from APIs or databases.

To address this limitation, modern LLMs can accept a list of available tool schemas (descriptions of tools and their arguments) and generate a tool call message. This capability is known as **Tool Calling** or **Function Calling**, and is becoming a popular pattern for building intelligent agent-based applications. Refer to the documentation from OpenAI and Anthropic for more information about tool calling in LLMs.

In AgentChat, the AssistantAgent can use tools to perform specific actions. The web_search tool is one such tool, allowing the assistant agent to search the web for information. An individual custom tool can be a Python function or a subclass of BaseTool.

A Workbench, on the other hand, is a collection of tools that share state and resources.

By default, when AssistantAgent executes a tool, it returns the tool's output as a string in a ToolCallSummaryMessage. If your tool does not return a well-formed string in natural language, you can add a reflection step to have the model summarize the tool's output, by setting the reflect_on_tool_use=True parameter in the AssistantAgent constructor.
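For example, a minimal sketch of enabling the reflection step, reusing the web_search tool and model_client defined earlier (the agent name here is illustrative):
# With reflect_on_tool_use=True, the agent makes a second model call to
# summarize the tool results and returns a TextMessage as its final response.
reflective_agent = AssistantAgent(
    name="assistant_with_reflection",
    model_client=model_client,
    tools=[web_search],
    system_message="Use tools to solve tasks.",
    reflect_on_tool_use=True,
)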
Built-in Tools and Workbench#
AutoGen Extension provides a set of built-in tools that can be used with the assistant agent. Head over to the API documentation for all the available tools under the autogen_ext.tools namespace. For example, you can find tools for making HTTP requests, adapting LangChain tools, and connecting to MCP servers.
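As one hedged example, HttpTool from autogen_ext.tools.http wraps an HTTP endpoint as a tool; the endpoint and schema below are illustrative, so check the API documentation for the exact constructor arguments.
from autogen_ext.tools.http import HttpTool

# JSON schema describing the tool's input (illustrative).
base64_schema = {
    "type": "object",
    "properties": {"value": {"type": "string", "description": "The base64 value to decode"}},
    "required": ["value"],
}
# Wrap the httpbin.org base64-decoding endpoint as a tool the agent can call.
base64_tool = HttpTool(
    name="base64_decode",
    description="Base64-decode a value",
    scheme="https",
    host="httpbin.org",
    port=443,
    path="/base64/{value}",
    method="GET",
    json_schema=base64_schema,
)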
Function Tool#
The AssistantAgent automatically converts a Python function into a FunctionTool, which can be used as a tool by the agent, and automatically generates the tool schema from the function signature and docstring.

The web_search_func tool is an example of a function tool. Its schema is automatically generated.
from autogen_core.tools import FunctionTool
# Define a tool using a Python function.
async def web_search_func(query: str) -> str:
"""Find information on the web"""
return "AutoGen is a programming framework for building multi-agent applications."
# This step is automatically performed inside the AssistantAgent if the tool is a Python function.
web_search_function_tool = FunctionTool(web_search_func, description="Find information on the web")
# The schema is provided to the model during AssistantAgent's on_messages call.
web_search_function_tool.schema
{'name': 'web_search_func',
'description': 'Find information on the web',
'parameters': {'type': 'object',
'properties': {'query': {'description': 'query',
'title': 'Query',
'type': 'string'}},
'required': ['query'],
'additionalProperties': False},
'strict': False}
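The resulting FunctionTool can also be passed to the agent explicitly, which is equivalent to passing the plain Python function; a minimal sketch:
# Equivalent to tools=[web_search_func]: the agent wraps plain functions itself.
agent_with_tool = AssistantAgent(
    name="assistant",
    model_client=model_client,
    tools=[web_search_function_tool],
    system_message="Use tools to solve tasks.",
)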
Model Context Protocol (MCP) Workbench#
The AssistantAgent can also use tools served by a Model Context Protocol (MCP) server, via McpWorkbench.
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import TextMessage
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.tools.mcp import McpWorkbench, StdioServerParams
# Get the fetch tool from mcp-server-fetch.
fetch_mcp_server = StdioServerParams(command="uvx", args=["mcp-server-fetch"])
# Create an MCP workbench which provides a session to the mcp server.
async with McpWorkbench(fetch_mcp_server) as workbench: # type: ignore
# Create an agent that can use the fetch tool.
model_client = OpenAIChatCompletionClient(model="gpt-4.1-nano")
fetch_agent = AssistantAgent(
name="fetcher", model_client=model_client, workbench=workbench, reflect_on_tool_use=True
)
# Let the agent fetch the content of a URL and summarize it.
result = await fetch_agent.run(task="Summarize the content of https://en.wikipedia.org/wiki/Seattle")
assert isinstance(result.messages[-1], TextMessage)
print(result.messages[-1].content)
# Close the connection to the model client.
await model_client.close()
Seattle is a major city located in the state of Washington, United States. It was founded on November 13, 1851, and incorporated as a town on January 14, 1865, and later as a city on December 2, 1869. The city is named after Chief Seattle. It covers an area of approximately 142 square miles, with a population of around 737,000 as of the 2020 Census, and an estimated 755,078 residents in 2023. Seattle is known by nicknames such as The Emerald City, Jet City, and Rain City, and has mottos including The City of Flowers and The City of Goodwill. The city operates under a mayor–council government system, with Bruce Harrell serving as mayor. Key landmarks include the Space Needle, Pike Place Market, Amazon Spheres, and the Seattle Great Wheel. It is situated on the U.S. West Coast, with a diverse urban and metropolitan area that extends to a population of over 4 million in the greater metropolitan region.
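McpWorkbench is not limited to local stdio servers. A hedged sketch of connecting to a remote MCP server over SSE, assuming an SseServerParams class in autogen_ext.tools.mcp and using a placeholder URL:
from autogen_ext.tools.mcp import McpWorkbench, SseServerParams

# Connect to a remote MCP server over server-sent events (placeholder URL).
remote_mcp_server = SseServerParams(url="http://localhost:8000/sse")
# async with McpWorkbench(remote_mcp_server) as workbench:
#     ...  # Use the workbench as in the stdio example above.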
Parallel Tool Calls#
Some models support parallel tool calls, which can be useful for tasks that require calling multiple tools simultaneously. By default, if the model client produces multiple tool calls, the AssistantAgent calls the tools in parallel.

You may want to disable parallel tool calls when tools have side effects that may interfere with one another, or when agent behavior needs to be consistent across models. This should be done at the model-client level.

For OpenAIChatCompletionClient and AzureOpenAIChatCompletionClient, set parallel_tool_calls=False to disable parallel tool calls.
model_client_no_parallel_tool_call = OpenAIChatCompletionClient(
model="gpt-4o",
parallel_tool_calls=False, # type: ignore
)
agent_no_parallel_tool_call = AssistantAgent(
name="assistant",
model_client=model_client_no_parallel_tool_call,
tools=[web_search],
system_message="Use tools to solve tasks.",
)
Running an Agent in a Loop#
The AssistantAgent executes one step at a time: one model call, followed by one tool call (or parallel tool calls), and then an optional reflection.
To run it in a loop, for example, until it stops producing tool calls, see Single-Agent Team; a minimal sketch follows.
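A hedged sketch of that pattern, using a single-agent team with a termination condition that stops the loop once the agent responds with a plain text message (i.e., no further tool calls):
from autogen_agentchat.conditions import TextMessageTermination
from autogen_agentchat.teams import RoundRobinGroupChat

# Loop the single agent until it produces a TextMessage (no further tool calls).
team = RoundRobinGroupChat(
    [agent],
    termination_condition=TextMessageTermination(source="assistant"),
)
# result = await team.run(task="...")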
Structured Output#
Structured output allows models to return structured JSON text with a pre-defined schema provided by the application. Unlike JSON mode, the schema can be provided as a Pydantic BaseModel class, which can also be used to validate the output.

Once you specify the base model class in the output_content_type parameter of the AssistantAgent constructor, the agent will respond with a StructuredMessage whose content's type is the base model class.

This way, you can integrate the agent's response directly into your application and use the model's output as a structured object.

Note
When output_content_type is set, the agent is by default required to reflect on tool use and return a structured output message based on the tool call results. You can disable this behavior by explicitly setting reflect_on_tool_use=False.

Structured output is also useful for incorporating chain-of-thought reasoning in the agent's responses. See the example below for how to use structured output with the assistant agent.
from typing import Literal
from pydantic import BaseModel
# The response format for the agent as a Pydantic base model.
class AgentResponse(BaseModel):
thoughts: str
response: Literal["happy", "sad", "neutral"]
# Create an agent that uses the OpenAI GPT-4o model.
model_client = OpenAIChatCompletionClient(model="gpt-4o")
agent = AssistantAgent(
"assistant",
model_client=model_client,
system_message="Categorize the input as happy, sad, or neutral following the JSON format.",
# Define the output content type of the agent.
output_content_type=AgentResponse,
)
result = await Console(agent.run_stream(task="I am happy."))
# Check the last message in the result, validate its type, and print the thoughts and response.
assert isinstance(result.messages[-1], StructuredMessage)
assert isinstance(result.messages[-1].content, AgentResponse)
print("Thought: ", result.messages[-1].content.thoughts)
print("Response: ", result.messages[-1].content.response)
await model_client.close()
---------- user ----------
I am happy.
---------- assistant ----------
{
"thoughts": "The user explicitly states they are happy.",
"response": "happy"
}
Thought: The user explicitly states they are happy.
Response: happy
Streaming Tokens#
You can stream the tokens generated by the model client by setting model_client_stream=True. This will cause the agent to yield ModelClientStreamingChunkEvent messages in run_stream().

The underlying model API must support token streaming for this to work. Please check with your model provider to see whether this is supported.
model_client = OpenAIChatCompletionClient(model="gpt-4o")
streaming_assistant = AssistantAgent(
name="assistant",
model_client=model_client,
system_message="You are a helpful assistant.",
model_client_stream=True, # Enable streaming tokens.
)
# Use an async function and asyncio.run() in a script.
async for message in streaming_assistant.run_stream(task="Name two cities in South America"): # type: ignore
print(message)
source='user' models_usage=None metadata={} content='Name two cities in South America' type='TextMessage'
source='assistant' models_usage=None metadata={} content='Two' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None metadata={} content=' cities' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None metadata={} content=' in' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None metadata={} content=' South' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None metadata={} content=' America' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None metadata={} content=' are' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None metadata={} content=' Buenos' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None metadata={} content=' Aires' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None metadata={} content=' in' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None metadata={} content=' Argentina' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None metadata={} content=' and' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None metadata={} content=' São' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None metadata={} content=' Paulo' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None metadata={} content=' in' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None metadata={} content=' Brazil' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None metadata={} content='.' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=RequestUsage(prompt_tokens=0, completion_tokens=0) metadata={} content='Two cities in South America are Buenos Aires in Argentina and São Paulo in Brazil.' type='TextMessage'
messages=[TextMessage(source='user', models_usage=None, metadata={}, content='Name two cities in South America', type='TextMessage'), TextMessage(source='assistant', models_usage=RequestUsage(prompt_tokens=0, completion_tokens=0), metadata={}, content='Two cities in South America are Buenos Aires in Argentina and São Paulo in Brazil.', type='TextMessage')] stop_reason=None
You can see the streaming chunks in the output above. The chunks are generated by the model client and yielded by the agent as they are received. The final response, the concatenation of all the chunks, is yielded right after the last chunk.
Using Model Context#
The AssistantAgent has a model_context parameter that can be used to pass in a ChatCompletionContext object. This allows the agent to use different model contexts, such as BufferedChatCompletionContext, to limit the context sent to the model.

By default, the AssistantAgent uses UnboundedChatCompletionContext, which sends the full conversation history to the model. To limit the context to the most recent n messages, use BufferedChatCompletionContext. To limit the context by token count, use TokenLimitedChatCompletionContext; a sketch follows the example below.
from autogen_core.model_context import BufferedChatCompletionContext
# Create an agent that uses only the last 5 messages in the context to generate responses.
agent = AssistantAgent(
name="assistant",
model_client=model_client,
tools=[web_search],
system_message="Use tools to solve tasks.",
model_context=BufferedChatCompletionContext(buffer_size=5), # Only use the last 5 messages in the context.
)
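A hedged sketch of limiting the context by token count instead; the token_limit value is illustrative, and the exact constructor parameters should be checked against the API documentation.
from autogen_core.model_context import TokenLimitedChatCompletionContext

# Create an agent whose context is truncated to fit within a token budget.
token_limited_agent = AssistantAgent(
    name="assistant",
    model_client=model_client,
    tools=[web_search],
    system_message="Use tools to solve tasks.",
    model_context=TokenLimitedChatCompletionContext(model_client=model_client, token_limit=1024),
)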
Other Preset Agents#
The following preset agents are available:

- UserProxyAgent: An agent that takes user input and returns it as its response.
- CodeExecutorAgent: An agent that can execute code.
- OpenAIAssistantAgent: An agent backed by an OpenAI Assistant, with the ability to use custom tools.
- MultimodalWebSurfer: A multi-modal agent that can search the web and visit web pages for information.
- FileSurfer: An agent that can search and browse local files for information.
- VideoSurfer: An agent that can watch videos for information.
Next Step#
Having explored the usage of the AssistantAgent, we can now proceed to the next section to learn about the teams feature in AgentChat.