AppAgent 👾

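An AppAgent iteratively executes actions on the selected application until the task in that application is completed. The HostAgent creates an AppAgent to carry out a subtask within a Round, and the AppAgent performs the actions in the application that are needed to fulfill the user's request. The AppAgent has the following features:

ReAct with the application - The AppAgent interacts with the application iteratively in an observation -> thought -> action workflow, using the multimodal capabilities of visual language models (VLMs) to understand the application UI and fulfill the user's request.
Comprehension enhancement - The AppAgent is enhanced by retrieval-augmented generation (RAG) over heterogeneous sources, including external knowledge bases and demonstration libraries, which makes the agent an application "expert".
Versatile skill set - The AppAgent is equipped with a diverse skill set to support comprehensive automation, such as mouse, keyboard, native APIs, and "Copilot".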

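Tip

The Reinforcing AppAgent documentation explains how to enhance the AppAgent with external knowledge bases and demonstration libraries.

The figure below shows the framework of the AppAgent.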

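AppAgent Input

To interact with the application, the AppAgent receives the following inputs:

Input	Description	Type
User Request	The user's request in natural language.	String
Sub-Task	The description of the subtask that the `HostAgent` assigns to the `AppAgent`.	String
Current Application	The name of the application to interact with.	String
Control Information	The index, name, and control type of the controls available in the application.	List of dictionaries
Application Screenshots	Screenshots of the application: a clean screenshot, an annotated screenshot with labeled controls, and (optionally) a screenshot with a rectangle around the control selected in the previous step.	List of strings
Previous Sub-Tasks	The previous subtasks and their completion status.	List of strings
Previous Plan	The previous plan for the following steps.	List of strings
HostAgent Message	The message from the `HostAgent` for completing the subtask.	String
Retrieved Information	The information retrieved from external knowledge bases or demonstration libraries.	String
Blackboard	The shared memory space for storing and sharing information among agents.	Dictionary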

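Below is an example of an annotated application screenshot with labeled controls. It follows the Set-of-Mark paradigm.

By processing these inputs, the AppAgent determines the actions required to fulfill the user's request in the application.

Tip

Whether to concatenate the clean screenshot and the annotated screenshot is configured by the CONCAT_SCREENSHOT field in the config_dev.yaml file.

Tip

Whether to include the screenshot with a rectangle around the control selected in the previous step is configured by the INCLUDE_LAST_SCREENSHOT field in the config_dev.yaml file.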

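AppAgent Output

Based on the provided inputs, the AppAgent generates the following outputs:

Output	Description	Type
Observation	The observation of the current application screenshot.	String
Thought	The logical reasoning process of the `AppAgent`.	String
ControlLabel	The index of the control selected for interaction.	String
ControlText	The name of the control selected for interaction.	String
Function	The function to execute on the selected control.	String
Args	The arguments required to execute the function.	List of strings
Status	The status of the agent, mapped to `AgentState`.	String
Plan	The plan for the steps that follow the current action.	List of strings
Comment	Additional comments or information for the user.	String
SaveScreenshot	The flag for saving the application screenshot to the `blackboard` for future reference.	Boolean

Below is an example of the AppAgent output: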

{
    "Observation": "Application screenshot",
    "Thought": "Logical reasoning process",
    "ControlLabel": "Control index",
    "ControlText": "Control name",
    "Function": "Function name",
    "Args": ["arg1", "arg2"],
    "Status": "AgentState",
    "Plan": ["Step 1", "Step 2"],
    "Comment": "Additional comments",
    "SaveScreenshot": true
}

Info

The LLM formats the AppAgent output as a JSON object, which can be parsed with the json.loads method in Python.
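
As a minimal sketch of that parsing step, the snippet below loads a response with `json.loads` and checks that the keys from the output table are present. The helper name `parse_app_agent_output` and the sample values are assumptions made for illustration; they are not part of the UFO code base.

```python
import json
from typing import Any, Dict

# Keys taken from the AppAgent output table.
EXPECTED_KEYS = {
    "Observation", "Thought", "ControlLabel", "ControlText", "Function",
    "Args", "Status", "Plan", "Comment", "SaveScreenshot",
}


def parse_app_agent_output(raw: str) -> Dict[str, Any]:
    """Parse the raw LLM text and verify that no expected key is missing."""
    response = json.loads(raw)
    missing = EXPECTED_KEYS.difference(response)
    if missing:
        raise ValueError(f"Missing keys in AppAgent output: {sorted(missing)}")
    return response


# Illustrative response text; the values are made up for this example.
raw_output = """
{
    "Observation": "A blank document is open.",
    "Thought": "I should type the title into the document body.",
    "ControlLabel": "12",
    "ControlText": "Page 1 content",
    "Function": "set_edit_text",
    "Args": ["Quarterly Report"],
    "Status": "CONTINUE",
    "Plan": ["Save the document"],
    "Comment": "Typing the title.",
    "SaveScreenshot": false
}
"""

parsed = parse_app_agent_output(raw_output)
print(parsed["Function"], parsed["Args"])
```

Validating the keys right after parsing surfaces a malformed LLM response early, before any action is dispatched to the application.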

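AppAgent State

The AppAgent state is managed by a state machine that determines the next action to execute based on the current state, as defined in the ufo/agents/states/app_agent_states.py module. The states include:

State	Description
`CONTINUE`	Main execution loop; evaluates which subtasks are ready to launch or resume.
`ASSIGN`	Selects an available application process and spawns the corresponding `AppAgent`.
`PENDING`	Waits for user input to resolve ambiguity or gather additional task parameters.
`FINISH`	All subtasks are complete; cleans up agent instances and finalizes the session state.
`FAIL`	Enters recovery or abort mode when an unrecoverable failure occurs.

The state machine diagram for the AppAgent is shown below.

The AppAgent moves through these states to execute the necessary actions in the application and complete the subtask assigned by the HostAgent.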
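
The idea of mapping the `Status` string in each response to a state and looping until a terminal state is reached can be sketched as follows. This is a toy model, not the classes in ufo/agents/states/app_agent_states.py: the enum `ToyStatus`, the function `run_until_terminal`, and the choice of `FINISH` and `FAIL` as terminal states are assumptions for illustration.

```python
from enum import Enum
from typing import Iterable, List


class ToyStatus(Enum):
    """Hypothetical stand-in for the states listed in the table."""
    CONTINUE = "CONTINUE"
    ASSIGN = "ASSIGN"
    PENDING = "PENDING"
    FINISH = "FINISH"
    FAIL = "FAIL"


# Assumption: FINISH and FAIL end the loop.
TERMINAL = {ToyStatus.FINISH, ToyStatus.FAIL}


def run_until_terminal(status_strings: Iterable[str]) -> List[ToyStatus]:
    """Follow the Status value reported at each step until a terminal state."""
    state = ToyStatus.CONTINUE
    trace = [state]
    for status in status_strings:
        if state in TERMINAL:
            break
        state = ToyStatus(status)  # raises ValueError on an unknown status
        trace.append(state)
    return trace


trace = run_until_terminal(["CONTINUE", "PENDING", "FINISH", "CONTINUE"])
print([s.name for s in trace])
```

The trailing `"CONTINUE"` is ignored because the loop stops once `FINISH` has been reached.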

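Knowledge Enhancement

The AppAgent is enhanced by retrieval-augmented generation (RAG) over heterogeneous sources, including external knowledge bases and demonstration libraries. It uses this knowledge to better understand the application and learns from demonstrations to improve its performance.

Learning from Help Documents

Users can provide help documents to the AppAgent in the config.yaml file to improve its understanding of the application and its performance.

Tip

See the documentation for the detailed configuration.

Tip

You can also refer to the documentation on how to provide help documents to the AppAgent.

In the AppAgent, build_offline_docs_retriever builds the help document retriever, and retrived_documents_prompt_helper constructs the prompt for the AppAgent.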

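Learning from Bing Search

Because help documents may not cover everything or may be outdated, the AppAgent can also use Bing search to retrieve up-to-date information. You can activate Bing search and configure the search engine in the config.yaml file.

Tip

See the documentation for the detailed configuration.

Tip

You can also refer to the documentation on the implementation of Bing search in the AppAgent.

In the AppAgent, build_online_search_retriever builds the Bing search retriever, and retrived_documents_prompt_helper constructs the prompt for the AppAgent.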

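Learning from Self-Demonstrations

You can save successful action trajectories from the AppAgent so that it learns from its own demonstrations and improves its performance. After a session completes, the AppAgent asks the user whether to save the action trajectory for future reference. The use of self-demonstrations is configured in the config.yaml file.

Tip

See the documentation for the detailed configuration.

Tip

You can also refer to the documentation on the implementation of self-demonstrations in the AppAgent.

In the AppAgent, build_experience_retriever builds the self-demonstration retriever, and rag_experience_retrieve retrieves the demonstrations for the AppAgent.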

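Learning from Human Demonstrations

In addition to self-demonstrations, you can provide human demonstrations to the AppAgent by recording them with the Step Recorder tool built into Windows. The AppAgent learns from these demonstrations to improve its performance and achieve better personalization. The use of human demonstrations is configured in the config.yaml file.

Tip

See the documentation for the detailed configuration.

Tip

You can also refer to the documentation on the implementation of human demonstrations in the AppAgent.

In the AppAgent, build_human_demonstration_retriever builds the human demonstration retriever, and rag_demonstration_retrieve retrieves the demonstrations for the AppAgent.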

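Skill Set for Automation

The AppAgent is equipped with a versatile skill set, created by calling the create_puppeteer_interface method, to support comprehensive automation in the application. The skills include:

Skill	Description
UI Automation	Mimics user interactions with the application's UI controls through the `UI Automation` and `Win32` APIs.
Native API	Accesses the application's native API to execute specific functions and operations.
In-App Agent	Leverages the in-app agent to interact with the application's internal functions and features.

With these skills, the AppAgent can interact with the application efficiently and fulfill the user's request. More details are available in the Automator documentation and in the code of the ufo/automator module.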

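Reference

Bases: BasicAgent

The AppAgent class that manages the interaction with the application.

Initialize the AppAgent.

Parameters

name (str) – The name of the agent.
process_name (str) – The process name of the app.
app_root_name (str) – The root name of the app.
is_visual (bool) – The flag indicating whether the agent is visual or not.
main_prompt (str) – The main prompt file path.
example_prompt (str) – The example prompt file path.
api_prompt (str) – The API prompt file path.
skip_prompter (bool, default: False) – The flag indicating whether to skip the prompter initialization.
mode (str, default: 'normal') – The mode of the agent.

Source code in agents/agent/app_agent.py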

def __init__(
    self,
    name: str,
    process_name: str,
    app_root_name: str,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str,
    skip_prompter: bool = False,
    mode: str = "normal",
) -> None:
    """
    Initialize the AppAgent.
    :param name: The name of the agent.
    :param process_name: The process name of the app.
    :param app_root_name: The root name of the app.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt file path.
    :param example_prompt: The example prompt file path.
    :param api_prompt: The API prompt file path.
    :param skip_prompter: The flag indicating whether to skip the prompter initialization.
    :param mode: The mode of the agent.
    """
    super().__init__(name=name)
    if not skip_prompter:
        self.prompter = self.get_prompter(
            is_visual, main_prompt, example_prompt, api_prompt, app_root_name
        )
    self._process_name = process_name
    self._app_root_name = app_root_name
    self.offline_doc_retriever = None
    self.online_doc_retriever = None
    self.experience_retriever = None
    self.human_demonstration_retriever = None

    self.Puppeteer = self.create_puppeteer_interface()
    self._mode = mode

    control_detection_backend = configs.get("CONTROL_BACKEND", ["uia"])

    if "omniparser" in control_detection_backend:
        omniparser_endpoint = configs.get("OMNIPARSER", {}).get("ENDPOINT", "")
        omniparser_service = OmniParser(endpoint=omniparser_endpoint)
        self.grounding_service: Optional[BasicGrounding] = OmniparserGrounding(
            service=omniparser_service
        )
    else:
        self.grounding_service: Optional[BasicGrounding] = None

    self.set_state(self.default_state)

`default_state` `property`

Get the default state.

`mode` `property`

Get the mode of the session.

`status_manager` `property`

Get the status manager.

`build_experience_retriever(db_path)`

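Build the experience retriever.

Parameters	`db_path` (`str`) – The path to the experience database.

Returns	`None` – The retriever is stored in `self.experience_retriever`.

Source code in agents/agent/app_agent.py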

def build_experience_retriever(self, db_path: str) -> None:
    """
    Build the experience retriever.
    :param db_path: The path to the experience database.
    :return: The experience retriever.
    """
    self.experience_retriever = self.retriever_factory.create_retriever(
        "experience", db_path
    )

`build_human_demonstration_retriever(db_path)`

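Build the human demonstration retriever.

Parameters	`db_path` (`str`) – The path to the human demonstration database.

Returns	`None` – The retriever is stored in `self.human_demonstration_retriever`.

Source code in agents/agent/app_agent.py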

def build_human_demonstration_retriever(self, db_path: str) -> None:
    """
    Build the human demonstration retriever.
    :param db_path: The path to the human demonstration database.
    :return: The human demonstration retriever.
    """
    self.human_demonstration_retriever = self.retriever_factory.create_retriever(
        "demonstration", db_path
    )

`build_offline_docs_retriever()`

Build the offline docs retriever.

Source code in agents/agent/app_agent.py

def build_offline_docs_retriever(self) -> None:
    """
    Build the offline docs retriever.
    """
    self.offline_doc_retriever = self.retriever_factory.create_retriever(
        "offline", self._app_root_name
    )

`build_online_search_retriever(request, top_k)`

Build the online search retriever.

Parameters	`request` (`str`) – The request for online Bing search. `top_k` (`int`) – The number of documents to retrieve.

Source code in agents/agent/app_agent.py

def build_online_search_retriever(self, request: str, top_k: int) -> None:
    """
    Build the online search retriever.
    :param request: The request for online Bing search.
    :param top_k: The number of documents to retrieve.
    """
    self.online_doc_retriever = self.retriever_factory.create_retriever(
        "online", request, top_k
    )

`context_provision(request='')`

Provision the context for the app agent.

Parameters	`request` (`str`, default: `''`) – The request sent to the Bing search retriever.

Source code in agents/agent/app_agent.py

def context_provision(self, request: str = "") -> None:
    """
    Provision the context for the app agent.
    :param request: The request sent to the Bing search retriever.
    """

    # Load the offline document indexer for the app agent if available.
    if configs["RAG_OFFLINE_DOCS"]:
        utils.print_with_color(
            "Loading offline help document indexer for {app}...".format(
                app=self._process_name
            ),
            "magenta",
        )
        self.build_offline_docs_retriever()

    # Load the online search indexer for the app agent if available.

    if configs["RAG_ONLINE_SEARCH"] and request:
        utils.print_with_color("Creating a Bing search indexer...", "magenta")
        self.build_online_search_retriever(
            request, configs["RAG_ONLINE_SEARCH_TOPK"]
        )

    # Load the experience indexer for the app agent if available.
    if configs["RAG_EXPERIENCE"]:
        utils.print_with_color("Creating an experience indexer...", "magenta")
        experience_path = configs["EXPERIENCE_SAVED_PATH"]
        db_path = os.path.join(experience_path, "experience_db")
        self.build_experience_retriever(db_path)

    # Load the demonstration indexer for the app agent if available.
    if configs["RAG_DEMONSTRATION"]:
        utils.print_with_color("Creating a demonstration indexer...", "magenta")
        demonstration_path = configs["DEMONSTRATION_SAVED_PATH"]
        db_path = os.path.join(demonstration_path, "demonstration_db")
        self.build_human_demonstration_retriever(db_path)
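
The gating logic of `context_provision` can be illustrated in isolation with a small sketch. The function `retrievers_to_build` is hypothetical and only reports which retrievers the configuration would enable. Unlike the method above, it reads the flags with `dict.get`, so a missing key counts as disabled instead of raising `KeyError`.

```python
from typing import Any, Dict, List


def retrievers_to_build(configs: Dict[str, Any], request: str = "") -> List[str]:
    """Return the retriever kinds that the given configuration enables."""
    selected = []
    if configs.get("RAG_OFFLINE_DOCS"):
        selected.append("offline")
    # Online search needs both the flag and a non-empty request.
    if configs.get("RAG_ONLINE_SEARCH") and request:
        selected.append("online")
    if configs.get("RAG_EXPERIENCE"):
        selected.append("experience")
    if configs.get("RAG_DEMONSTRATION"):
        selected.append("demonstration")
    return selected


# Illustrative configuration values, not defaults from the project.
example_configs = {
    "RAG_OFFLINE_DOCS": True,
    "RAG_ONLINE_SEARCH": True,
    "RAG_EXPERIENCE": False,
    "RAG_DEMONSTRATION": True,
}

print(retrievers_to_build(example_configs))
print(retrievers_to_build(example_configs, request="insert a table in Word"))
```

With an empty request, the online retriever is skipped even though `RAG_ONLINE_SEARCH` is enabled, which mirrors the `and request` condition in the method.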

`create_puppeteer_interface()`

Create the Puppeteer interface to automate the app.

Returns	`AppPuppeteer` – The Puppeteer interface.

Source code in agents/agent/app_agent.py

def create_puppeteer_interface(self) -> puppeteer.AppPuppeteer:
    """
    Create the Puppeteer interface to automate the app.
    :return: The Puppeteer interface.
    """
    return puppeteer.AppPuppeteer(self._process_name, self._app_root_name)

`demonstration_prompt_helper(request)`

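Get the examples and tips for the AppAgent using the experience and demonstration retrievers.

Parameters	`request` (`str`) – The request for the AppAgent.

Returns	`Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]` – The experience results and the demonstration results for the AppAgent.

Source code in agents/agent/app_agent.py

def demonstration_prompt_helper(
    self, request: str
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]: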
    """
    Get the examples and tips for the AppAgent using the demonstration retriever.
    :param request: The request for the AppAgent.
    :return: The examples and tips for the AppAgent.
    """

    # Get the examples and tips for the AppAgent using the experience and demonstration retrievers.
    if configs["RAG_EXPERIENCE"]:
        experience_results = self.rag_experience_retrieve(
            request, configs["RAG_EXPERIENCE_RETRIEVED_TOPK"]
        )
    else:
        experience_results = []

    if configs["RAG_DEMONSTRATION"]:
        demonstration_results = self.rag_demonstration_retrieve(
            request, configs["RAG_DEMONSTRATION_RETRIEVED_TOPK"]
        )
    else:
        demonstration_results = []

    return experience_results, demonstration_results

`external_knowledge_prompt_helper(request, offline_top_k, online_top_k)`

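Retrieve the external knowledge and construct the prompt.

Parameters	`request` (`str`) – The request. `offline_top_k` (`int`) – The number of offline documents to retrieve. `online_top_k` (`int`) – The number of online documents to retrieve.

Returns	`Tuple[str, str]` – The prompt messages for the offline and online external knowledge.

Source code in agents/agent/app_agent.py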

def external_knowledge_prompt_helper(
    self, request: str, offline_top_k: int, online_top_k: int
) -> Tuple[str, str]:
    """
    Retrieve the external knowledge and construct the prompt.
    :param request: The request.
    :param offline_top_k: The number of offline documents to retrieve.
    :param online_top_k: The number of online documents to retrieve.
    :return: The prompt message for the external_knowledge.
    """

    # Retrieve offline documents and construct the prompt
    if self.offline_doc_retriever:

        offline_docs = self.offline_doc_retriever.retrieve(
            request,
            offline_top_k,
            filter=None,
        )

        format_string = "[Similar Requests]: {question}\nStep: {answer}\n"

        offline_docs_prompt = self.prompter.retrived_documents_prompt_helper(
            "[Help Documents]",
            "",
            [
                format_string.format(
                    question=doc.metadata.get("title", ""),
                    answer=doc.metadata.get("text", ""),
                )
                for doc in offline_docs
            ],
        )
    else:
        offline_docs_prompt = ""

    # Retrieve online documents and construct the prompt
    if self.online_doc_retriever:
        online_search_docs = self.online_doc_retriever.retrieve(
            request, online_top_k, filter=None
        )
        online_docs_prompt = self.prompter.retrived_documents_prompt_helper(
            "Online Search Results",
            "Search Result",
            [doc.page_content for doc in online_search_docs],
        )
    else:
        online_docs_prompt = ""

    return offline_docs_prompt, online_docs_prompt
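
To show what the help-document entries look like before they are handed to the prompter, here is a small sketch that applies the same format string to stand-in documents. `StubDoc` and `format_help_docs` are hypothetical names; the only assumption about real retrieved documents is that they expose a `metadata` dict, as the method above does.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StubDoc:
    """Hypothetical stand-in for a retrieved document with a metadata dict."""
    metadata: Dict[str, str] = field(default_factory=dict)


# Same format string as in external_knowledge_prompt_helper.
FORMAT_STRING = "[Similar Requests]: {question}\nStep: {answer}\n"


def format_help_docs(docs: List[StubDoc]) -> List[str]:
    """Render each document's title and text; missing fields become empty strings."""
    return [
        FORMAT_STRING.format(
            question=doc.metadata.get("title", ""),
            answer=doc.metadata.get("text", ""),
        )
        for doc in docs
    ]


docs = [
    StubDoc({"title": "Insert a table", "text": "Click Insert, then Table."}),
    StubDoc(),  # a document without metadata
]
for entry in format_help_docs(docs):
    print(entry)
```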

`get_prompter(is_visual, main_prompt, example_prompt, api_prompt, app_root_name)`

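Get the prompter for the agent.

Parameters	`is_visual` (`bool`) – The flag indicating whether the agent is visual or not. `main_prompt` (`str`) – The main prompt file path. `example_prompt` (`str`) – The example prompt file path. `api_prompt` (`str`) – The API prompt file path. `app_root_name` (`str`) – The root name of the app.

Returns	`AppAgentPrompter` – The prompter instance.

Source code in agents/agent/app_agent.py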

def get_prompter(
    self,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str,
    app_root_name: str,
) -> AppAgentPrompter:
    """
    Get the prompt for the agent.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt file path.
    :param example_prompt: The example prompt file path.
    :param api_prompt: The API prompt file path.
    :param app_root_name: The root name of the app.
    :return: The prompter instance.
    """
    return AppAgentPrompter(
        is_visual, main_prompt, example_prompt, api_prompt, app_root_name
    )

`message_constructor(dynamic_examples, dynamic_knowledge, image_list, control_info, prev_subtask, plan, request, subtask, current_application, host_message, blackboard_prompt, last_success_actions, include_last_screenshot)`

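Construct the prompt message for the AppAgent.

Parameters

dynamic_examples (str) – The dynamic examples retrieved from the self-demonstrations and human demonstrations.
dynamic_knowledge (str) – The dynamic knowledge retrieved from the external knowledge base.
image_list (List) – The list of screenshot images.
control_info (str) – The control information.
prev_subtask (List[Dict[str, str]]) – The previous subtasks and their completion status.
plan (List[str]) – The plan list.
request (str) – The overall user request.
subtask (str) – The subtask for the current AppAgent to process.
current_application (str) – The current application name.
host_message (List[str]) – The message from the HostAgent.
blackboard_prompt (List[Dict[str, str]]) – The prompt message from the blackboard.
last_success_actions (List[Dict[str, Any]]) – The list of successful actions in the last step.
include_last_screenshot (bool) – The flag indicating whether to include the last screenshot.

Returns	`List[Dict[str, Union[str, List[Dict[str, str]]]]]` – The prompt message.

Source code in agents/agent/app_agent.py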

def message_constructor(
    self,
    dynamic_examples: str,
    dynamic_knowledge: str,
    image_list: List,
    control_info: str,
    prev_subtask: List[Dict[str, str]],
    plan: List[str],
    request: str,
    subtask: str,
    current_application: str,
    host_message: List[str],
    blackboard_prompt: List[Dict[str, str]],
    last_success_actions: List[Dict[str, Any]],
    include_last_screenshot: bool,
) -> List[Dict[str, Union[str, List[Dict[str, str]]]]]:
    """
    Construct the prompt message for the AppAgent.
    :param dynamic_examples: The dynamic examples retrieved from the self-demonstration and human demonstration.
    :param dynamic_knowledge: The dynamic knowledge retrieved from the external knowledge base.
    :param image_list: The list of screenshot images.
    :param control_info: The control information.
    :param plan: The plan list.
    :param request: The overall user request.
    :param subtask: The subtask for the current AppAgent to process.
    :param current_application: The current application name.
    :param host_message: The message from the HostAgent.
    :param blackboard_prompt: The prompt message from the blackboard.
    :param last_success_actions: The list of successful actions in the last step.
    :param include_last_screenshot: The flag indicating whether to include the last screenshot.
    :return: The prompt message.
    """
    appagent_prompt_system_message = self.prompter.system_prompt_construction(
        dynamic_examples
    )

    appagent_prompt_user_message = self.prompter.user_content_construction(
        image_list=image_list,
        control_item=control_info,
        prev_subtask=prev_subtask,
        prev_plan=plan,
        user_request=request,
        subtask=subtask,
        current_application=current_application,
        host_message=host_message,
        retrieved_docs=dynamic_knowledge,
        last_success_actions=last_success_actions,
        include_last_screenshot=include_last_screenshot,
    )

    if blackboard_prompt:
        appagent_prompt_user_message = (
            blackboard_prompt + appagent_prompt_user_message
        )

    appagent_prompt_message = self.prompter.prompt_construction(
        appagent_prompt_system_message, appagent_prompt_user_message
    )

    return appagent_prompt_message

`print_response(response_dict, print_action=True)`

Print the response.

Parameters	`response_dict` (`Dict[str, Any]`) – The response dictionary to print. `print_action` (`bool`, default: `True`) – The flag indicating whether to print the action.

Source code in agents/agent/app_agent.py

def print_response(
    self, response_dict: Dict[str, Any], print_action: bool = True
) -> None:
    """
    Print the response.
    :param response_dict: The response dictionary to print.
    :param print_action: The flag indicating whether to print the action.
    """

    control_text = response_dict.get("ControlText")
    control_label = response_dict.get("ControlLabel")
    if not control_text and not control_label:
        control_text = "[No control selected.]"
        control_label = "[No control label selected.]"
    observation = response_dict.get("Observation")
    thought = response_dict.get("Thought")
    plan = response_dict.get("Plan")
    status = response_dict.get("Status")
    comment = response_dict.get("Comment")
    function_call = response_dict.get("Function")
    args = utils.revise_line_breaks(response_dict.get("Args"))

    # Generate the function call string
    action = self.Puppeteer.get_command_string(function_call, args)

    utils.print_with_color(
        "Observations👀: {observation}".format(observation=observation), "cyan"
    )
    utils.print_with_color("Thoughts💡: {thought}".format(thought=thought), "green")
    if print_action:
        utils.print_with_color(
            "Selected item🕹️: {control_text}, Label: {label}".format(
                control_text=control_text, label=control_label
            ),
            "yellow",
        )
        utils.print_with_color(
            "Action applied⚒️: {action}".format(action=action), "blue"
        )
        utils.print_with_color("Status📊: {status}".format(status=status), "blue")
    utils.print_with_color(
        "Next Plan📚: {plan}".format(plan="\n".join(plan)), "cyan"
    )
    utils.print_with_color("Comment💬: {comment}".format(comment=comment), "green")

    screenshot_saving = response_dict.get("SaveScreenshot", {})

    if screenshot_saving.get("save", False):
        utils.print_with_color(
            "Notice: The current screenshot📸 is saved to the blackboard.",
            "yellow",
        )
        utils.print_with_color(
            "Saving reason: {reason}".format(
                reason=screenshot_saving.get("reason")
            ),
            "yellow",
        )
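
Note that `print_response` reads `SaveScreenshot` as a dictionary with `save` and `reason` keys, whereas the output table describes it as a Boolean flag. A defensive reader that tolerates both shapes could look like the sketch below; the helper `read_save_screenshot` is hypothetical and not part of the UFO code base.

```python
from typing import Any, Dict, Optional, Tuple


def read_save_screenshot(response_dict: Dict[str, Any]) -> Tuple[bool, Optional[str]]:
    """Return (save, reason) whether SaveScreenshot is a dict or a plain flag."""
    value = response_dict.get("SaveScreenshot", {})
    if isinstance(value, dict):
        # Dictionary shape, as read by print_response: {"save": ..., "reason": ...}
        return bool(value.get("save", False)), value.get("reason")
    # Plain Boolean shape, as described in the output table.
    return bool(value), None


print(read_save_screenshot({"SaveScreenshot": {"save": True, "reason": "Needed later"}}))
print(read_save_screenshot({"SaveScreenshot": True}))
print(read_save_screenshot({}))
```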

`process(context)`

Process the agent.

Parameters	`context` (`Context`) – The context.

Source code in agents/agent/app_agent.py

def process(self, context: Context) -> None:
    """
    Process the agent.
    :param context: The context.
    """
    if configs.get("ACTION_SEQUENCE", False):
        self.processor = AppAgentActionSequenceProcessor(
            agent=self, context=context, ground_service=self.grounding_service
        )
    else:
        self.processor = AppAgentProcessor(
            agent=self, context=context, ground_service=self.grounding_service
        )
    self.processor.process()
    self.status = self.processor.status

`process_comfirmation()`

Process the user confirmation.

Returns	`bool` – The user's decision.

Source code in agents/agent/app_agent.py

def process_comfirmation(self) -> bool:
    """
    Process the user confirmation.
    :return: The decision.
    """
    action = self.processor.actions
    control_text = self.processor.control_text

    decision = interactor.sensitive_step_asker(action, control_text)

    if not decision:
        utils.print_with_color("The user has canceled the action.", "red")

    return decision

`rag_demonstration_retrieve(request, demonstration_top_k)`

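Retrieve demonstration examples for the user request.

Parameters	`request` (`str`) – The user request. `demonstration_top_k` (`int`) – The number of documents to retrieve.

Returns	`List[Dict[str, Any]]` – The retrieved examples and tips.

Source code in agents/agent/app_agent.py

def rag_demonstration_retrieve(
    self, request: str, demonstration_top_k: int
) -> List[Dict[str, Any]]: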
    """
    Retrieving demonstration examples for the user request.
    :param request: The user request.
    :param demonstration_top_k: The number of documents to retrieve.
    :return: The retrieved examples and tips string.
    """

    retrieved_docs = []

    # Retrieve demonstration examples.
    demonstration_docs = self.human_demonstration_retriever.retrieve(
        request, demonstration_top_k
    )

    if demonstration_docs:
        for doc in demonstration_docs:
            example_request = doc.metadata.get("request", "")
            response = doc.metadata.get("example", {})
            subtask = doc.metadata.get("Sub-task", "")
            tips = doc.metadata.get("Tips", "")
            retrieved_docs.append(
                {
                    "Request": example_request,
                    "Response": response,
                    "Sub-task": subtask,
                    "Tips": tips,
                }
            )

        return retrieved_docs
    else:
        return []

`rag_experience_retrieve(request, experience_top_k)`

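Retrieve experience examples for the user request.

Parameters	`request` (`str`) – The user request. `experience_top_k` (`int`) – The number of documents to retrieve.

Returns	`List[Dict[str, Any]]` – The retrieved examples and tips.

Source code in agents/agent/app_agent.py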

def rag_experience_retrieve(
    self, request: str, experience_top_k: int
) -> List[Dict[str, Any]]:
    """
    Retrieving experience examples for the user request.
    :param request: The user request.
    :param experience_top_k: The number of documents to retrieve.
    :return: The retrieved examples and tips dictionary.
    """

    retrieved_docs = []

    # Retrieve experience examples. Only retrieve the examples that are related to the current application.
    experience_docs = self.experience_retriever.retrieve(
        request,
        experience_top_k,
        filter=lambda x: self._app_root_name.lower()
        in [app.lower() for app in x["app_list"]],
    )

    if experience_docs:
        for doc in experience_docs:
            example_request = doc.metadata.get("request", "")
            response = doc.metadata.get("example", {})
            tips = doc.metadata.get("Tips", "")
            subtask = doc.metadata.get("Sub-task", "")
            retrieved_docs.append(
                {
                    "Request": example_request,
                    "Response": response,
                    "Sub-task": subtask,
                    "Tips": tips,
                }
            )

    return retrieved_docs
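
The `filter` lambda in `rag_experience_retrieve` keeps only the experience records whose `app_list` contains the current application, compared case-insensitively. The sketch below reproduces that predicate on plain dictionaries; `make_app_filter` and the sample records are illustrative assumptions, and how the real retriever applies the filter is not shown here.

```python
from typing import Any, Callable, Dict, List


def make_app_filter(app_root_name: str) -> Callable[[Dict[str, Any]], bool]:
    """Build a predicate matching records whose app_list contains the app."""
    target = app_root_name.lower()
    return lambda metadata: target in [app.lower() for app in metadata["app_list"]]


# Made-up experience metadata for illustration.
records: List[Dict[str, Any]] = [
    {"request": "Bold the title", "app_list": ["WINWORD.EXE"]},
    {"request": "Sum column B", "app_list": ["EXCEL.EXE"]},
    {"request": "Paste chart into report", "app_list": ["excel.exe", "winword.exe"]},
]

app_filter = make_app_filter("winword.exe")
matches = [record["request"] for record in records if app_filter(record)]
print(matches)
```

Lowercasing both sides means that a process name recorded as `WINWORD.EXE` still matches an agent whose root name is `winword.exe`.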
