Execution

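Each instantiated plan is run by an execute task. In this phase, given the task-action data, the execution process matches each step to a real control in the Word environment and carries out the plan step by step. Once execution finishes, the evaluation agent assesses the quality of the entire run.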

ExecuteFlow

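The ExecuteFlow class handles the execution and evaluation of tasks in a Windows application environment. It interacts with the application's UI, executes predefined tasks, captures screenshots, and evaluates the results of the execution. The class also handles logging and error management for the tasks.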

Task Execution

Task execution in the ExecuteFlow class follows a structured sequence so that each task is carried out accurately and remains traceable.

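Initialization:
Load the configuration settings and log paths.
Find the application window that matches the task.
Retrieve or create an ExecuteAgent to execute the task.
Plan Execution:
Loop through each step in instantiated_plan.
Parse the step to extract information such as the subtask, the control text, and the required operation.
Action Execution:
Find the control in the application window that matches the specified control text.
If no matching control is found, raise an error.
Perform the specified action (e.g., click, input text) using the agent's Puppeteer framework.
Capture screenshots of the application window and the selected control for logging and debugging.
Result Logging:
Log the details of the step execution, including control information, the action performed, and the results.
Finalization:
Save the final state of the application window.
Quit the application client gracefully.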
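
The sequence above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual ExecuteFlow code: `FakeWindow`, `run_plan`, and the step keys `subtask` and `control_text` are assumptions made for the example.

```python
from typing import Any, Callable, Dict, List, Optional


class FakeWindow:
    """Hypothetical stand-in for an application window holding named controls."""

    def __init__(self, controls: Dict[str, Callable[[], str]]) -> None:
        self._controls = controls

    def find_control(self, control_text: str) -> Optional[Callable[[], str]]:
        # Return the action bound to the control, or None when nothing matches.
        return self._controls.get(control_text)


def run_plan(window: FakeWindow, plan: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Walk the plan step by step, mirroring the phases listed above."""
    log: List[Dict[str, Any]] = []
    for number, step in enumerate(plan, start=1):
        # Plan execution: parse the step (the keys are assumptions for this sketch).
        subtask = step["subtask"]
        control_text = step["control_text"]

        # Action execution: match the control, or fail loudly.
        control = window.find_control(control_text)
        if control is None:
            raise RuntimeError(f"Control with text '{control_text}' not found.")
        result = control()

        # Result logging: keep what was done and what came back.
        log.append({"step": number, "subtask": subtask, "result": result})
    return log


window = FakeWindow({"Bold": lambda: "bold applied", "Save": lambda: "saved"})
plan = [
    {"subtask": "Make the title bold", "control_text": "Bold"},
    {"subtask": "Save the document", "control_text": "Save"},
]
execution_log = run_plan(window, plan)
```

In the real flow, screenshots and the final save/quit step surround this loop; they are omitted here to keep the sketch short.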

Evaluation

The evaluation process in the ExecuteFlow class assesses the performance of the executed task against predefined prompts.

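Start Evaluation:
Evaluation begins immediately after task execution.
It uses the ExecuteEvalAgent initialized during class construction.
Perform Evaluation:
The ExecuteEvalAgent evaluates the task using a combination of input prompts (e.g., the main prompt and the API prompt) and the logs generated during task execution.
The evaluation outputs a result summary (e.g., quality flag, comments, and task type).
Log and Output Results:
Display the evaluation results in the console.
Return the evaluation summary together with the executed plan for further analysis or reporting.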
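
As a rough illustration of the flow above, the sketch below combines prompts with the execution log and returns a summary alongside the executed plan. Everything in it is hypothetical: `build_eval_input`, `fake_judge`, and the summary keys (`judge`, `comment`, `type`) stand in for the real ExecuteEvalAgent and its output, whose exact fields are not shown in this section.

```python
from typing import Any, Dict, List, Tuple


def build_eval_input(main_prompt: str, api_prompt: str, logs: List[Dict[str, Any]]) -> str:
    """Combine the prompts and the per-step execution logs into one input."""
    log_lines = [f"Step {entry['step']}: {entry['result']}" for entry in logs]
    return "\n".join([main_prompt, api_prompt, *log_lines])


def fake_judge(eval_input: str) -> Dict[str, str]:
    """Hypothetical judge; a real agent would send eval_input to a model."""
    passed = "error" not in eval_input.lower()
    return {
        "judge": "yes" if passed else "no",
        "comment": "No errors were logged." if passed else "An error was logged.",
        "type": "formatting",
    }


def evaluate(
    plan: List[Dict[str, Any]], logs: List[Dict[str, Any]]
) -> Tuple[List[Dict[str, Any]], Dict[str, str]]:
    """Evaluate the run and return the executed plan with the summary."""
    eval_input = build_eval_input(
        "Judge whether the task was completed.",
        "Available operations: click, type.",
        logs,
    )
    summary = fake_judge(eval_input)
    print(f"Result: {summary}")  # displayed in the console
    return plan, summary


executed_plan = [{"subtask": "Make the title bold", "Success": True}]
logs = [{"step": 1, "result": "bold applied"}]
returned_plan, summary = evaluate(executed_plan, logs)
```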

Reference

ExecuteFlow

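Bases: AppAgentProcessor

ExecuteFlow class for executing the task and saving the result.

Initialize the execute flow for a task.

Parameters:
`task_file_name` (`str`) – Name of the task file being processed.
`context` (`Context`) – Context object for the current session.
`environment` (`WindowsAppEnv`) – Environment object for the application being processed.

Source code in execution/workflow/execute_flow.py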

def __init__(
    self, task_file_name: str, context: Context, environment: WindowsAppEnv
) -> None:
    """
    Initialize the execute flow for a task.
    :param task_file_name: Name of the task file being processed.
    :param context: Context object for the current session.
    :param environment: Environment object for the application being processed.
    """

    super().__init__(agent=ExecuteAgent, context=context)

    self.execution_time = None
    self.eval_time = None
    self._app_env = environment
    self._task_file_name = task_file_name
    self._app_name = self._app_env.app_name

    log_path = _configs["EXECUTE_LOG_PATH"].format(task=task_file_name)
    self._initialize_logs(log_path)

    self.application_window = self._app_env.find_matching_window(task_file_name)
    self.app_agent = self._get_or_create_execute_agent()
    self.eval_agent = self._get_or_create_evaluation_agent()

    self._matched_control = None  # Matched control for the current step.
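
The log path is built by filling a `{task}` placeholder in the `EXECUTE_LOG_PATH` setting. Below is a small sketch of that formatting step, assuming a hypothetical template value (the real value lives in the project's configuration and is not shown here). Because `init_and_final_capture_screenshot` appends the file name directly to `log_path`, the sketch also assumes the template ends with a path separator.

```python
# Hypothetical configuration; only the "{task}" placeholder is taken from the code above.
_configs = {"EXECUTE_LOG_PATH": "logs/{task}/execute/"}

task_file_name = "bulleted_list"
log_path = _configs["EXECUTE_LOG_PATH"].format(task=task_file_name)

# Screenshot paths are formed by plain string concatenation onto log_path.
session_step = 0
screenshot_save_path = log_path + f"action_step{session_step}.png"
print(screenshot_save_path)
```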

`execute(request, instantiated_plan)`

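Execute the execute flow: execute the task and save the result.

Parameters:
`request` (`str`) – Original request to be executed.
`instantiated_plan` (`List[Dict[str, Any]]`) – Instantiated plan containing the steps to execute.

Returns:
`Tuple[List[Dict[str, Any]], Dict[str, str]]` – The executed plan and the evaluation result (task quality flag, comment, and task type).

Source code in execution/workflow/execute_flow.py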

def execute(
    self, request: str, instantiated_plan: List[Dict[str, Any]]
) -> Tuple[List[Dict[str, Any]], Dict[str, str]]:
    """
    Execute the execute flow: Execute the task and save the result.
    :param request: Original request to be executed.
    :param instantiated_plan: Instantiated plan containing steps to execute.
    :return: Tuple of the executed plan and the evaluation result (task quality flag, comment, and task type).
    """

    start_time = time.time()
    try:
        executed_plan = self.execute_plan(instantiated_plan)
    except Exception as error:
        raise RuntimeError(f"Execution failed. {error}")
    finally:
        self.execution_time = round(time.time() - start_time, 3)

    start_time = time.time()
    try:
        result, _ = self.eval_agent.evaluate(
            request=request, log_path=self.log_path
        )
        utils.print_with_color(f"Result: {result}", "green")
    except Exception as error:
        raise RuntimeError(f"Evaluation failed. {error}")
    finally:
        self.eval_time = round(time.time() - start_time, 3)

    return executed_plan, result
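
The try/except/finally pattern in `execute` records the elapsed time whether or not the phase succeeds. The following sketch isolates that pattern; `timed_phase` and `broken` are hypothetical names used only here, and the `from error` chaining is an addition that the code above does not use.

```python
import time
from typing import Callable, Dict


def timed_phase(label: str, action: Callable[[], object], timings: Dict[str, float]) -> object:
    """Run action, wrap any failure in RuntimeError, and always record the duration."""
    start_time = time.time()
    try:
        return action()
    except Exception as error:
        # Chaining keeps the original traceback attached to the new error.
        raise RuntimeError(f"{label} failed. {error}") from error
    finally:
        # Runs on success and on failure, so the timing is never lost.
        timings[label] = round(time.time() - start_time, 3)


def broken() -> None:
    raise ValueError("no window")


timings: Dict[str, float] = {}
value = timed_phase("Execution", lambda: "done", timings)

message = ""
try:
    timed_phase("Evaluation", broken, timings)
except RuntimeError as error:
    message = str(error)
```

After this runs, `timings` holds an entry for both phases even though the second one raised.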

`execute_action()`

Execute the action.

Source code in execution/workflow/execute_flow.py

def execute_action(self) -> None:
    """
    Execute the action.
    """

    control_selected = None
    # Find the matching window and control.
    self.application_window = self._app_env.find_matching_window(
        self._task_file_name
    )
    if self.control_text == "":
        control_selected = self.application_window
    else:
        self._control_label, control_selected = (
            self._app_env.find_matching_controller(
                self.filtered_annotation_dict, self.control_text
            )
        )
        if control_selected:
            self._matched_control = control_selected.window_text()

    if not control_selected:
        # If the control is not found, raise an error.
        raise RuntimeError(f"Control with text '{self.control_text}' not found.")

    try:
        # Get the selected control item from the annotation dictionary and LLM response.
        # The LLM response is a number index corresponding to the key in the annotation dictionary.
        if control_selected:

            if _ufo_configs.get("SHOW_VISUAL_OUTLINE_ON_SCREEN", True):
                control_selected.draw_outline(colour="red", thickness=3)
                time.sleep(_ufo_configs.get("RECTANGLE_TIME", 0))

            control_coordinates = PhotographerDecorator.coordinate_adjusted(
                self.application_window.rectangle(), control_selected.rectangle()
            )

            self._control_log = {
                "control_class": control_selected.element_info.class_name,
                "control_type": control_selected.element_info.control_type,
                "control_automation_id": control_selected.element_info.automation_id,
                "control_friendly_class_name": control_selected.friendly_class_name(),
                "control_coordinates": {
                    "left": control_coordinates[0],
                    "top": control_coordinates[1],
                    "right": control_coordinates[2],
                    "bottom": control_coordinates[3],
                },
            }

            self.app_agent.Puppeteer.receiver_manager.create_ui_control_receiver(
                control_selected, self.application_window
            )

            # Save the screenshot of the tagged selected control.
            self.capture_control_screenshot(control_selected)

            self._results = self.app_agent.Puppeteer.execute_command(
                self._operation, self._args
            )
            self.control_reannotate = None
            if not utils.is_json_serializable(self._results):
                self._results = ""

                return

    except Exception:
        self.general_error_handler()
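
`execute_action` replaces results that cannot be serialized to JSON with an empty string, presumably so that later logging does not fail. The implementation of `utils.is_json_serializable` is not shown in this section; one plausible way to write such a check, offered purely as an assumption, is:

```python
import json
from typing import Any


def is_json_serializable(value: Any) -> bool:
    """Return True when json.dumps accepts the value (assumed behaviour)."""
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        # TypeError: unsupported type; ValueError: e.g. circular reference.
        return False


results: Any = object()  # stands in for a non-serializable return value
if not is_json_serializable(results):
    results = ""
```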

`execute_plan(instantiated_plan)`

Get the executed result from the execute agent.

Parameters:
`instantiated_plan` (`List[Dict[str, Any]]`) – Plan containing the steps to execute.

Returns:
`List[Dict[str, Any]]` – List of executed steps.

Source code in execution/workflow/execute_flow.py

def execute_plan(
    self, instantiated_plan: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """
    Get the executed result from the execute agent.
    :param instantiated_plan: Plan containing steps to execute.
    :return: List of executed steps.
    """

    # Initialize the step counter and capture the initial screenshot.
    self.session_step = 0
    try:
        time.sleep(1)
        # Initialize the API receiver
        self.app_agent.Puppeteer.receiver_manager.create_api_receiver(
            self.app_agent._app_root_name, self.app_agent._process_name
        )
        # Initialize the control receiver
        current_receiver = self.app_agent.Puppeteer.receiver_manager.receiver_list[
            -1
        ]

        if current_receiver is not None:
            self.application_window = self._app_env.find_matching_window(
                self._task_file_name
            )
            current_receiver.com_object = (
                current_receiver.get_object_from_process_name()
            )

        self.init_and_final_capture_screenshot()
    except Exception as error:
        raise RuntimeError(f"Execution initialization failed. {error}")

    # Initialize the success flag for each step.
    for index, step_plan in enumerate(instantiated_plan):
        instantiated_plan[index]["Success"] = None
        instantiated_plan[index]["MatchedControlText"] = None

    for index, step_plan in enumerate(instantiated_plan):
        try:
            self.session_step += 1

            # Check if the maximum steps have been exceeded.
            if self.session_step > _configs["MAX_STEPS"]:
                raise RuntimeError("Maximum steps exceeded.")

            self._parse_step_plan(step_plan)

            try:
                self.process()
                instantiated_plan[index]["Success"] = True
                instantiated_plan[index]["ControlLabel"] = self._control_label
                instantiated_plan[index][
                    "MatchedControlText"
                ] = self._matched_control
            except Exception:
                # Mark the step as failed (e.g., no matching control) and re-raise.
                instantiated_plan[index]["Success"] = False
                raise

        except Exception as error:
            err_info = RuntimeError(
                f"Step {self.session_step} execution failed. {error}"
            )
            raise err_info
    # capture the final screenshot
    self.session_step += 1
    time.sleep(1)
    self.init_and_final_capture_screenshot()
    # save the final state of the app

    win_com_receiver = None
    for receiver in reversed(
        self.app_agent.Puppeteer.receiver_manager.receiver_list
    ):
        if isinstance(receiver, WinCOMReceiverBasic):
            if receiver.client is not None:
                win_com_receiver = receiver
                break

    if win_com_receiver is not None:
        win_com_receiver.save()
        time.sleep(1)
        win_com_receiver.client.Quit()

    print("Execution complete.")

    return instantiated_plan
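
Because `execute_plan` annotates the plan in place, the caller can still inspect the flags after a failure: completed steps are `True`, the failing step is `False`, and steps that were never reached stay `None`. The sketch below reproduces only that bookkeeping; `run_with_flags` and the `ok` key are hypothetical.

```python
from typing import Any, Dict, List


def run_with_flags(plan: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Mark each step's outcome in place, stopping at the first failure."""
    # Every step starts as "not attempted".
    for step in plan:
        step["Success"] = None

    for number, step in enumerate(plan, start=1):
        try:
            if not step["ok"]:  # hypothetical stand-in for a failing action
                raise RuntimeError("control not found")
            step["Success"] = True
        except Exception as error:
            step["Success"] = False
            raise RuntimeError(f"Step {number} execution failed. {error}") from error
    return plan


plan = [{"ok": True}, {"ok": False}, {"ok": True}]
failure = ""
try:
    run_with_flags(plan)
except RuntimeError as error:
    failure = str(error)

flags = [step["Success"] for step in plan]
print(flags)  # [True, False, None]
```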

`general_error_handler()`

Handle general errors.

Source code in execution/workflow/execute_flow.py

def general_error_handler(self) -> None:
    """
    Handle general errors.
    """

    pass

`init_and_final_capture_screenshot()`

Capture the screenshot.

Source code in execution/workflow/execute_flow.py

def init_and_final_capture_screenshot(self) -> None:
    """
    Capture the screenshot.
    """

    # Define the paths for the screenshots saved.
    screenshot_save_path = self.log_path + f"action_step{self.session_step}.png"

    self._memory_data.add_values_from_dict(
        {
            "CleanScreenshot": screenshot_save_path,
        }
    )

    self.photographer.capture_app_window_screenshot(
        self.application_window, save_path=screenshot_save_path
    )
    # Capture the control screenshot.
    control_selected = self._app_env.app_window
    self.capture_control_screenshot(control_selected)

`log_save()`

Log the constructed prompt message for the PrefillAgent.

Source code in execution/workflow/execute_flow.py

def log_save(self) -> None:
    """
    Log the constructed prompt message for the PrefillAgent.
    """

    step_memory = {
        "Step": self.session_step,
        "Subtask": self.subtask,
        "ControlLabel": self._control_label,
        "ControlText": self.control_text,
        "Action": self.action,
        "ActionType": self.app_agent.Puppeteer.get_command_types(self._operation),
        "Results": self._results,
        "Application": self.app_agent._app_root_name,
        "TimeCost": self.time_cost,
    }
    self._memory_data.add_values_from_dict(step_memory)
    self.log(self._memory_data.to_dict())

`print_step_info()`

Print the step information.

Source code in execution/workflow/execute_flow.py

def print_step_info(self) -> None:
    """
    Print the step information.
    """

    utils.print_with_color(
        "Step {step}: {subtask}".format(
            step=self.session_step,
            subtask=self.subtask,
        ),
        "magenta",
    )

`process()`

Process the current step.

Source code in execution/workflow/execute_flow.py

def process(self) -> None:
    """
    Process the current step.
    """

    step_start_time = time.time()
    self.print_step_info()
    self.capture_screenshot()
    self.execute_action()
    self.time_cost = round(time.time() - step_start_time, 3)
    self.log_save()

ExecuteAgent

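Bases: AppAgent

The agent for task execution.

Initialize the ExecuteAgent.

Parameters:
`name` (`str`) – The name of the agent.
`process_name` (`str`) – The name of the process.
`app_root_name` (`str`) – The name of the app root.

Source code in execution/agent/execute_agent.py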

def __init__(
    self,
    name: str,
    process_name: str,
    app_root_name: str,
):
    """
    Initialize the ExecuteAgent.
    :param name: The name of the agent.
    :param process_name: The name of the process.
    :param app_root_name: The name of the app root.
    """

    self._step = 0
    self._complete = False
    self._name = name
    self._status = None
    self._process_name = process_name
    self._app_root_name = app_root_name
    self.Puppeteer = self.create_puppeteer_interface()

ExecuteEvalAgent

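Bases: EvaluationAgent

The agent for task execution evaluation.

Initialize the ExecuteEvalAgent.

Parameters:
`name` (`str`) – The name of the agent.
`app_root_name` (`str`) – The name of the app root.
`is_visual` (`bool`) – Flag indicating whether the agent is visual.
`main_prompt` (`str`) – The main prompt.
`example_prompt` (`str`) – The example prompt.
`api_prompt` (`str`) – The API prompt.

Source code in execution/agent/execute_eval_agent.py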

def __init__(
    self,
    name: str,
    app_root_name: str,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str,
):
    """
    Initialize the ExecuteEvalAgent.
    :param name: The name of the agent.
    :param app_root_name: The name of the app root.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt.
    :param example_prompt: The example prompt.
    :param api_prompt: The API prompt.
    """

    super().__init__(
        name=name,
        app_root_name=app_root_name,
        is_visual=is_visual,
        main_prompt=main_prompt,
        example_prompt=example_prompt,
        api_prompt=api_prompt,
    )

`get_prompter(is_visual, prompt_template, example_prompt_template, api_prompt_template, root_name=None)`

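Get the prompter for the agent.

Parameters:
`is_visual` (`bool`) – Flag indicating whether the agent is visual.
`prompt_template` (`str`) – The prompt template.
`example_prompt_template` (`str`) – The example prompt template.
`api_prompt_template` (`str`) – The API prompt template.
`root_name` (`Optional[str]`, default: `None`) – The name of the root.

Returns:
`ExecuteEvalAgentPrompter` – The prompter.

Source code in execution/agent/execute_eval_agent.py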

def get_prompter(
    self,
    is_visual: bool,
    prompt_template: str,
    example_prompt_template: str,
    api_prompt_template: str,
    root_name: Optional[str] = None,
) -> ExecuteEvalAgentPrompter:
    """
    Get the prompter for the agent.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param prompt_template: The prompt template.
    :param example_prompt_template: The example prompt template.
    :param api_prompt_template: The API prompt template.
    :param root_name: The name of the root.
    :return: The prompter.
    """

    return ExecuteEvalAgentPrompter(
        is_visual=is_visual,
        prompt_template=prompt_template,
        example_prompt_template=example_prompt_template,
        api_prompt_template=api_prompt_template,
        root_name=root_name,
    )
