HostAgent 🤖

HostAgent 承担三项主要职责

任务分解。给定用户的自然语言输入，HostAgent 识别底层任务目标并将其分解为按依赖关系排序的子任务图。
应用程序生命周期管理。对于每个子任务，HostAgent 检查系统进程元数据（通过 UIA API）以确定目标应用程序是否正在运行。如果未运行，它会启动程序并将其注册到运行时。
AppAgent 实例化。 HostAgent 为每个活动应用程序生成相应的 AppAgent，为其提供任务上下文、内存引用和相关工具链（例如，API、文档）。
任务调度和控制。全局执行计划被序列化为有限状态机 (FSM)，允许 HostAgent 强制执行顺序、检测故障并解决代理之间的依赖关系。
共享状态通信。 HostAgent 读取和写入全局黑板，实现代理间通信和系统级可观察性，以便进行调试和重放。

下图说明了 HostAgent 架构及其与其他组件的交互

HostAgent 激活其 Processor 以处理用户请求并将其分解为子任务。然后将每个子任务分配给一个 AppAgent 执行。HostAgent 监控 AppAgent 的进度并确保用户请求的成功完成。

HostAgent 输入

HostAgent 接收以下输入

输入	描述	类型
用户请求	用户的自然语言请求。	字符串
应用程序信息	有关现有活动应用程序的信息。	字符串列表
桌面截图	桌面截图，为 `HostAgent` 提供上下文。	图片 (Image)
上一个子任务	上一个子任务及其完成状态。	字符串列表
上一个计划	以下子任务的上一个计划。	字符串列表
黑板	用于存储和共享代理之间信息的共享内存空间。	字典

通过处理这些输入，HostAgent 确定适当的应用程序来满足用户的请求，并协调 AppAgent 执行必要的动作。

HostAgent 输出

根据提供的输入，HostAgent 生成以下输出

输出	描述	类型
观察	当前桌面截图的观察。	字符串
思考	`HostAgent` 的逻辑推理过程。	字符串
当前子任务	将由 `AppAgent` 执行的当前子任务。	字符串
消息	将发送给 `AppAgent` 以完成子任务的消息。	字符串
ControlLabel	用于执行子任务的选定应用程序的索引。	字符串
ControlText	用于执行子任务的选定应用程序的名称。	字符串
计划	当前子任务之后以下子任务的计划。	字符串列表
状态	代理的状态，映射到 `AgentState`。	字符串
注释	提供给用户的附加评论或信息。	字符串
问题	要向用户询问额外信息的问题。	字符串列表
Bash	将由 `HostAgent` 执行的 bash 命令。它可用于打开应用程序或执行系统命令。	字符串

下面是 HostAgent 输出的一个示例

{
    "Observation": "Desktop screenshot",
    "Thought": "Logical reasoning process",
    "Current Sub-Task": "Sub-task description",
    "Message": "Message to AppAgent",
    "ControlLabel": "Application index",
    "ControlText": "Application name",
    "Plan": ["Sub-task 1", "Sub-task 2"],
    "Status": "AgentState",
    "Comment": "Additional comments",
    "Questions": ["Question 1", "Question 2"],
    "Bash": "Bash command"
}

信息

HostAgent 输出由 LLM 格式化为 JSON 对象，并可以通过 Python 中的 json.loads 方法进行解析。

HostAgent 状态

HostAgent 经历不同的状态，如 ufo/agents/states/host_agent_states.py 模块中定义。这些状态包括

状态	描述
`继续`	用于行动规划和执行的默认状态。
`待处理`	用于安全关键行动（例如，破坏性操作）；需要用户确认。
`完成`	任务完成；执行结束。
`失败`	检测到不可恢复的故障（例如，应用程序崩溃、权限错误）。

HostAgent 的状态机图如下所示

HostAgent 根据用户的请求、应用程序信息以及 AppAgent 执行子任务的进度在这些状态之间转换。

任务分解

收到用户请求后，HostAgent 将其分解为子任务并将每个子任务分配给一个 AppAgent 执行。HostAgent 根据应用程序信息和用户请求确定满足用户请求的适当应用程序。然后，它协调 AppAgent 执行必要的动作以完成子任务。我们在下图中展示了任务分解过程

创建和注册 AppAgent

当 HostAgent 确定需要一个新的 AppAgent 来完成子任务时，它会创建一个 AppAgent 实例并通过调用 create_subagent 方法将其注册到 HostAgent

def create_subagent(
        self,
        agent_type: str,
        agent_name: str,
        process_name: str,
        app_root_name: str,
        is_visual: bool,
        main_prompt: str,
        example_prompt: str,
        api_prompt: str,
        *args,
        **kwargs,
    ) -> BasicAgent:
        """
        Create an SubAgent hosted by the HostAgent.
        :param agent_type: The type of the agent to create.
        :param agent_name: The name of the SubAgent.
        :param process_name: The process name of the app.
        :param app_root_name: The root name of the app.
        :param is_visual: The flag indicating whether the agent is visual or not.
        :param main_prompt: The main prompt file path.
        :param example_prompt: The example prompt file path.
        :param api_prompt: The API prompt file path.
        :return: The created SubAgent.
        """
        app_agent = self.agent_factory.create_agent(
            agent_type,
            agent_name,
            process_name,
            app_root_name,
            is_visual,
            main_prompt,
            example_prompt,
            api_prompt,
            *args,
            **kwargs,
        )
        self.appagent_dict[agent_name] = app_agent
        app_agent.host = self
        self._active_appagent = app_agent

        return app_agent

然后，HostAgent 将子任务分配给 AppAgent 执行并监控其进度。

参考

基类：BasicAgent

HostAgent 类是 AppAgent 的管理器。

初始化 HostAgent。:name: 代理的名称。

参数	`is_visual` (`bool`) – 指示智能体是否可视的标志。 `main_prompt` (`str`) – 主提示文件路径。 `example_prompt` (`str`) – 示例提示文件路径。 `api_prompt` (`str`) – API 提示文件路径。

源代码在 agents/agent/host_agent.py

def __init__(
    self,
    name: str,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str,
) -> None:
    """
    Initialize the HostAgent.
    :name: The name of the agent.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt file path.
    :param example_prompt: The example prompt file path.
    :param api_prompt: The API prompt file path.
    """
    super().__init__(name=name)
    self.prompter = self.get_prompter(
        is_visual, main_prompt, example_prompt, api_prompt
    )
    self.offline_doc_retriever = None
    self.online_doc_retriever = None
    self.experience_retriever = None
    self.human_demonstration_retriever = None
    self.agent_factory = AgentFactory()
    self.appagent_dict = {}
    self._active_appagent = None
    self._blackboard = Blackboard()
    self.set_state(self.default_state)
    self.Puppeteer = self.create_puppeteer_interface()

`blackboard` `属性`

获取黑板。

`default_state` `property`

获取默认状态。

`status_manager` `property`

获取状态管理器。

`sub_agent_amount` `属性`

获取子代理的数量。

返回	`int` – 子代理的数量。

`create_app_agent(application_window_name, application_root_name, request, mode)`

为 HostAgent 创建应用程序代理。

参数	`application_window_name` (`str`) – 应用程序窗口的名称。 `application_root_name` (`str`) – 应用程序根的名称。 `request` (`str`) – 用户请求。 `mode` (`str`) – 会话模式。

返回	`AppAgent` – 应用程序代理。

源代码在 agents/agent/host_agent.py

def create_app_agent(
    self,
    application_window_name: str,
    application_root_name: str,
    request: str,
    mode: str,
) -> AppAgent:
    """
    Create the app agent for the host agent.
    :param application_window_name: The name of the application window.
    :param application_root_name: The name of the application root.
    :param request: The user request.
    :param mode: The mode of the session.
    :return: The app agent.
    """

    if configs.get("ACTION_SEQUENCE", False):
        example_prompt = configs["APPAGENT_EXAMPLE_PROMPT_AS"]
    else:
        example_prompt = configs["APPAGENT_EXAMPLE_PROMPT"]

    if mode in ["normal", "batch_normal", "follower"]:

        agent_name = (
            "AppAgent/{root}/{process}".format(
                root=application_root_name, process=application_window_name
            )
            if mode == "normal"
            else "BatchAgent/{root}/{process}".format(
                root=application_root_name, process=application_window_name
            )
        )

        app_agent: AppAgent = self.create_subagent(
            agent_type="app",
            agent_name=agent_name,
            process_name=application_window_name,
            app_root_name=application_root_name,
            is_visual=configs["APP_AGENT"]["VISUAL_MODE"],
            main_prompt=configs["APPAGENT_PROMPT"],
            example_prompt=example_prompt,
            api_prompt=configs["API_PROMPT"],
            mode=mode,
        )

    elif mode in ["normal_operator", "batch_normal_operator"]:

        agent_name = (
            "OpenAIOperator/{root}/{process}".format(
                root=application_root_name, process=application_window_name
            )
            if mode == "normal_operator"
            else "BatchOpenAIOperator/{root}/{process}".format(
                root=application_root_name, process=application_window_name
            )
        )

        app_agent: OpenAIOperatorAgent = self.create_subagent(
            "operator",
            agent_name=agent_name,
            process_name=application_window_name,
            app_root_name=application_root_name,
        )

    else:
        raise ValueError(f"The {mode} mode is not supported.")

    # Create the COM receiver for the app agent.
    if configs.get("USE_APIS", False):
        app_agent.Puppeteer.receiver_manager.create_api_receiver(
            application_root_name, application_window_name
        )

    # Provision the context for the app agent, including the all retrievers.
    app_agent.context_provision(request)

    return app_agent

`create_puppeteer_interface()`

创建 Puppeteer 接口以自动化应用程序。

返回	`AppPuppeteer` – Puppeteer 接口。

源代码在 agents/agent/host_agent.py

def create_puppeteer_interface(self) -> puppeteer.AppPuppeteer:
    """
    Create the Puppeteer interface to automate the app.
    :return: The Puppeteer interface.
    """
    return puppeteer.AppPuppeteer("", "")

`create_subagent(agent_type, agent_name, process_name, app_root_name, *args, **kwargs)`

创建由 HostAgent 托管的 SubAgent。

参数	`agent_type` (`str`) – 要创建的代理类型。 `agent_name` (`str`) – SubAgent 的名称。 `process_name` (`str`) – 应用程序的进程名称。 `app_root_name` (`str`) – 应用程序的根名称。

返回	`BasicAgent` – 创建的 SubAgent。

源代码在 agents/agent/host_agent.py

def create_subagent(
    self,
    agent_type: str,
    agent_name: str,
    process_name: str,
    app_root_name: str,
    *args,
    **kwargs,
) -> BasicAgent:
    """
    Create an SubAgent hosted by the HostAgent.
    :param agent_type: The type of the agent to create.
    :param agent_name: The name of the SubAgent.
    :param process_name: The process name of the app.
    :param app_root_name: The root name of the app.
    :return: The created SubAgent.
    """
    app_agent = self.agent_factory.create_agent(
        agent_type,
        agent_name,
        process_name,
        app_root_name,
        # is_visual,
        # main_prompt,
        # example_prompt,
        # api_prompt,
        *args,
        **kwargs,
    )
    self.appagent_dict[agent_name] = app_agent
    app_agent.host = self
    self._active_appagent = app_agent

    return app_agent

`get_active_appagent()`

获取活动应用程序代理。

返回	`AppAgent` – 活动应用程序代理。

源代码在 agents/agent/host_agent.py

def get_active_appagent(self) -> AppAgent:
    """
    Get the active app agent.
    :return: The active app agent.
    """
    return self._active_appagent

`get_prompter(is_visual, main_prompt, example_prompt, api_prompt)`

获取代理的提示。

参数	`is_visual` (`bool`) – 指示智能体是否可视的标志。 `main_prompt` (`str`) – 主提示文件路径。 `example_prompt` (`str`) – 示例提示文件路径。 `api_prompt` (`str`) – API 提示文件路径。

返回	`HostAgentPrompter` – 提示器实例。

源代码在 agents/agent/host_agent.py

def get_prompter(
    self,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str,
) -> HostAgentPrompter:
    """
    Get the prompt for the agent.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt file path.
    :param example_prompt: The example prompt file path.
    :param api_prompt: The API prompt file path.
    :return: The prompter instance.
    """
    return HostAgentPrompter(is_visual, main_prompt, example_prompt, api_prompt)

`message_constructor(image_list, os_info, plan, prev_subtask, request, blackboard_prompt)`

构建消息。

参数	`image_list` (`List[str]`) – 屏幕截图图像列表。 `os_info` (`str`) – 操作系统信息。 `prev_subtask` (`List[Dict[str, str]]`) – 上一个子任务。 `plan` (`List[str]`) – 计划。 `request` (`str`) – 请求。

返回	`List[Dict[str, Union[str, List[Dict[str, str]]]]]` – 消息。

源代码在 agents/agent/host_agent.py

def message_constructor(
    self,
    image_list: List[str],
    os_info: str,
    plan: List[str],
    prev_subtask: List[Dict[str, str]],
    request: str,
    blackboard_prompt: List[Dict[str, str]],
) -> List[Dict[str, Union[str, List[Dict[str, str]]]]]:
    """
    Construct the message.
    :param image_list: The list of screenshot images.
    :param os_info: The OS information.
    :param prev_subtask: The previous subtask.
    :param plan: The plan.
    :param request: The request.
    :return: The message.
    """
    hostagent_prompt_system_message = self.prompter.system_prompt_construction()
    hostagent_prompt_user_message = self.prompter.user_content_construction(
        image_list=image_list,
        control_item=os_info,
        prev_subtask=prev_subtask,
        prev_plan=plan,
        user_request=request,
    )

    if blackboard_prompt:
        hostagent_prompt_user_message = (
            blackboard_prompt + hostagent_prompt_user_message
        )

    hostagent_prompt_message = self.prompter.prompt_construction(
        hostagent_prompt_system_message, hostagent_prompt_user_message
    )

    return hostagent_prompt_message

`print_response(response_dict)`

打印响应。

参数	`response_dict` (`Dict`) – 要打印的响应字典。

源代码在 agents/agent/host_agent.py

def print_response(self, response_dict: Dict) -> None:
    """
    Print the response.
    :param response_dict: The response dictionary to print.
    """

    application = response_dict.get("ControlText")
    if not application:
        application = "[The required application needs to be opened.]"
    observation = response_dict.get("Observation")
    thought = response_dict.get("Thought")
    bash_command = response_dict.get("Bash", None)
    subtask = response_dict.get("CurrentSubtask")

    # Convert the message from a list to a string.
    message = list(response_dict.get("Message", ""))
    message = "\n".join(message)

    # Concatenate the subtask with the plan and convert the plan from a list to a string.
    plan = list(response_dict.get("Plan"))
    plan = [subtask] + plan
    plan = "\n".join([f"({i+1}) " + str(item) for i, item in enumerate(plan)])

    status = response_dict.get("Status")
    comment = response_dict.get("Comment")

    utils.print_with_color(
        "Observations👀: {observation}".format(observation=observation), "cyan"
    )
    utils.print_with_color("Thoughts💡: {thought}".format(thought=thought), "green")
    if bash_command:
        utils.print_with_color(
            "Running Bash Command🔧: {bash}".format(bash=bash_command), "yellow"
        )
    utils.print_with_color(
        "Plans📚: {plan}".format(plan=plan),
        "cyan",
    )
    utils.print_with_color(
        "Next Selected application📲: {application}".format(
            application=application
        ),
        "yellow",
    )
    utils.print_with_color(
        "Messages to AppAgent📩: {message}".format(message=message), "cyan"
    )
    utils.print_with_color("Status📊: {status}".format(status=status), "blue")

    utils.print_with_color("Comment💬: {comment}".format(comment=comment), "green")

`process(context)`

处理代理。

参数	`context` (`Context`) – 上下文。

源代码在 agents/agent/host_agent.py

def process(self, context: Context) -> None:
    """
    Process the agent.
    :param context: The context.
    """
    self.processor = HostAgentProcessor(agent=self, context=context)
    self.processor.process()

    # Sync the status with the processor.
    self.status = self.processor.status

`process_comfirmation()`

待办：处理确认。

源代码在 agents/agent/host_agent.py

def process_comfirmation(self) -> None:
    """
    TODO: Process the confirmation.
    """
    pass

HostAgent 🤖

HostAgent 输入

HostAgent 输出

HostAgent 状态

任务分解

创建和注册 AppAgent

参考

blackboard 属性

default_state property

status_manager property

sub_agent_amount 属性

create_app_agent(application_window_name, application_root_name, request, mode)

create_puppeteer_interface()

create_subagent(agent_type, agent_name, process_name, app_root_name, *args, **kwargs)

get_active_appagent()

get_prompter(is_visual, main_prompt, example_prompt, api_prompt)

message_constructor(image_list, os_info, plan, prev_subtask, request, blackboard_prompt)

print_response(response_dict)

process(context)

process_comfirmation()

`blackboard` `属性`

`default_state` `property`

`status_manager` `property`

`sub_agent_amount` `属性`

`create_app_agent(application_window_name, application_root_name, request, mode)`

`create_puppeteer_interface()`

`create_subagent(agent_type, agent_name, process_name, app_root_name, *args, **kwargs)`

`get_active_appagent()`

`get_prompter(is_visual, main_prompt, example_prompt, api_prompt)`

`message_constructor(image_list, os_info, plan, prev_subtask, request, blackboard_prompt)`

`print_response(response_dict)`

`process(context)`

`process_comfirmation()`