实例化

实例化过程包含三个关键步骤

根据指定的应用程序和指令，选择一个模板文件。
使用当前屏幕截图预填充任务。
筛选已建立的任务。

给定初始任务，数据流首先选择一个模板（阶段 1），然后根据 Word 环境预填充初始任务以获取任务-动作数据（阶段 2）。最后，它将筛选已建立的任务以评估任务-动作数据的质量。

1. 选择模板文件

您的应用程序的模板必须在 dataflow/templates/app 中定义和描述。例如，如果您想为 Word 应用程序实例化任务，请将相关的 .docx 文件放在 dataflow /templates/word 中，并附带一个 description.json 文件。将根据其描述与指令的匹配程度来选择合适的模板。

ChooseTemplateFlow 使用语义匹配，其中任务描述与模板描述通过嵌入和 FAISS 进行比较，以实现高效的最近邻搜索。如果语义匹配失败，则从可用文件中随机选择一个模板。

2. 预填充任务

预填充流程

PrefillFlow 类通过利用 PrefillAgent 进行任务规划和动作生成，协调任务计划和 UI 交互的完善。它自动化 UI 控件更新，捕获屏幕截图，并在执行期间管理消息和响应的日志。

预填充代理

PrefillAgent 类通过使用 PrefillPrompter 构建定制的提示消息，促进任务实例化和动作序列生成。它集成系统、用户和动态上下文，为下游工作流生成可操作的输入。

3. 筛选任务

筛选流程

FilterFlow 类旨在通过利用 FilterAgent 来处理和完善任务计划。FilterFlow 类充当任务实例化和过滤过程执行之间的桥梁，旨在根据预定义的过滤条件完善任务步骤并预填充任务相关文件。

筛选代理

FilterAgent 类是一个专门的代理，用于评估实例化任务是否正确。它继承自 BasicAgent 类，并包含多种方法和属性来处理其功能。

参考

选择模板流程

根据给定的任务上下文选择并复制最相关模板文件的类。

使用给定的任务上下文初始化流程。

参数	`app_name` (`str`) – 应用程序的名称。 `file_extension` (`str`) – 模板的文件扩展名。 `task_file_name` (`str`) – 任务文件的名称。

源代码位于 instantiation/workflow/choose_template_flow.py

def __init__(self, app_name: str, task_file_name: str, file_extension: str):
    """
    Initialize the flow with the given task context.
    :param app_name: The name of the application.
    :param file_extension: The file extension of the template.
    :param task_file_name: The name of the task file.
    """

    self._app_name = app_name
    self._file_extension = file_extension
    self._task_file_name = task_file_name
    self.execution_time = None
    self._embedding_model = self._load_embedding_model(
        model_name=_configs["CONTROL_FILTER_MODEL_SEMANTIC_NAME"]
    )

`execute()`

执行流程并返回复制的模板路径。

返回	`字符串` – 复制的模板文件的路径。

源代码位于 instantiation/workflow/choose_template_flow.py

def execute(self) -> str:
    """
    Execute the flow and return the copied template path.
    :return: The path to the copied template file.
    """

    start_time = time.time()
    try:
        template_copied_path = self._choose_template_and_copy()
    except Exception as e:
        raise e
    finally:
        self.execution_time = round(time.time() - start_time, 3)
    return template_copied_path

预填充流程

基类：AppAgentProcessor

通过完善规划步骤和自动化 UI 交互来管理预填充过程的类

使用应用程序上下文初始化预填充流程。

参数	`app_name` (`str`) – 应用程序的名称。 `task_file_name` (`str`) – 用于日志记录和跟踪的任务文件名称。 `environment` (`WindowsAppEnv`) – 应用程序的环境。

源代码位于 instantiation/workflow/prefill_flow.py

def __init__(
    self,
    app_name: str,
    task_file_name: str,
    environment: WindowsAppEnv,
) -> None:
    """
    Initialize the prefill flow with the application context.
    :param app_name: The name of the application.
    :param task_file_name: The name of the task file for logging and tracking.
    :param environment: The environment of the app.
    """

    self.execution_time = None
    self._app_name = app_name
    self._task_file_name = task_file_name
    self._app_env = environment
    # Create or reuse a PrefillAgent for the app
    if self._app_name not in PrefillFlow._app_prefill_agent_dict:
        PrefillFlow._app_prefill_agent_dict[self._app_name] = PrefillAgent(
            "prefill",
            self._app_name,
            is_visual=True,
            main_prompt=_configs["PREFILL_PROMPT"],
            example_prompt=_configs["PREFILL_EXAMPLE_PROMPT"],
            api_prompt=_configs["API_PROMPT"],
        )
    self._prefill_agent = PrefillFlow._app_prefill_agent_dict[self._app_name]

    # Initialize execution step and UI control tools
    self._execute_step = 0
    self._control_inspector = ControlInspectorFacade(_BACKEND)
    self._photographer = PhotographerFacade()

    # Set default states
    self._status = ""

    # Initialize loggers for messages and responses
    self._log_path_configs = _configs["PREFILL_LOG_PATH"].format(
        task=self._task_file_name
    )
    os.makedirs(self._log_path_configs, exist_ok=True)

    # Set up loggers
    self._message_logger = BaseSession.initialize_logger(
        self._log_path_configs, "prefill_messages.json", "w", _configs
    )
    self._response_logger = BaseSession.initialize_logger(
        self._log_path_configs, "prefill_responses.json", "w", _configs
    )

`execute(template_copied_path, original_task, refined_steps)`

通过检索实例化结果开始执行。

参数	`template_copied_path` (`str`) – 要使用的已复制模板的路径。 `original_task` (`str`) – 要完善的原始任务。 `refined_steps` (`List[str]`) – 指导完善过程的步骤。

返回	`Dict[str, Any]` – 完善后的任务和相应的行动计划。

源代码位于 instantiation/workflow/prefill_flow.py

def execute(
    self, template_copied_path: str, original_task: str, refined_steps: List[str]
) -> Dict[str, Any]:
    """
    Start the execution by retrieving the instantiated result.
    :param template_copied_path: The path of the copied template to use.
    :param original_task: The original task to refine.
    :param refined_steps: The steps to guide the refinement process.
    :return: The refined task and corresponding action plans.
    """

    start_time = time.time()
    try:
        instantiated_request, instantiated_plan = self._instantiate_task(
            template_copied_path, original_task, refined_steps
        )
    except Exception as e:
        raise e
    finally:
        self.execution_time = round(time.time() - start_time, 3)

    return {
        "instantiated_request": instantiated_request,
        "instantiated_plan": instantiated_plan,
    }

预填充代理

基类：BasicAgent

用于任务实例化和行动序列生成的代理。

初始化 PrefillAgent。

参数	`name` (`str`) – 智能体的名称。 `process_name` (`str`) – 进程的名称。 `is_visual` (`bool`) – 指示智能体是否可视的标志。 `main_prompt` (`str`) – 主提示。 `example_prompt` (`str`) – 示例提示。 `api_prompt` (`str`) – API 提示。

源代码位于 instantiation/agent/prefill_agent.py

def __init__(
    self,
    name: str,
    process_name: str,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str,
):
    """
    Initialize the PrefillAgent.
    :param name: The name of the agent.
    :param process_name: The name of the process.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt.
    :param example_prompt: The example prompt.
    :param api_prompt: The API prompt.
    """

    self._step = 0
    self._complete = False
    self._name = name
    self._status = None
    self.prompter: PrefillPrompter = self.get_prompter(
        is_visual, main_prompt, example_prompt, api_prompt
    )
    self._process_name = process_name

`get_prompter(is_visual, main_prompt, example_prompt, api_prompt)`

获取代理的提示。这是 BasicAgent 中需要实现的抽象方法。

参数	`is_visual` (`bool`) – 指示智能体是否可视的标志。 `main_prompt` (`str`) – 主提示。 `example_prompt` (`str`) – 示例提示。 `api_prompt` (`str`) – API 提示。

返回	`字符串` – 提示字符串。

源代码位于 instantiation/agent/prefill_agent.py

def get_prompter(self, is_visual: bool, main_prompt: str, example_prompt: str, api_prompt: str) -> str:
    """
    Get the prompt for the agent.
    This is the abstract method from BasicAgent that needs to be implemented.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt.
    :param example_prompt: The example prompt.
    :param api_prompt: The API prompt.
    :return: The prompt string.
    """

    return PrefillPrompter(is_visual, main_prompt, example_prompt, api_prompt)

`message_constructor(dynamic_examples, given_task, reference_steps, log_path)`

为 PrefillAgent 构建提示消息。

参数	`dynamic_examples` (`str`) – 从自我演示和人工演示中检索到的动态示例。 `given_task` (`str`) – 给定的任务。 `reference_steps` (`List[str]`) – 参考步骤。 `log_path` (`str`) – 日志的路径。

返回	`List[str]` – 提示消息。

源代码位于 instantiation/agent/prefill_agent.py

def message_constructor(
    self,
    dynamic_examples: str,
    given_task: str,
    reference_steps: List[str],
    log_path: str,
) -> List[str]:
    """
    Construct the prompt message for the PrefillAgent.
    :param dynamic_examples: The dynamic examples retrieved from the self-demonstration and human demonstration.
    :param given_task: The given task.
    :param reference_steps: The reference steps.
    :param log_path: The path of the log.
    :return: The prompt message.
    """

    prefill_agent_prompt_system_message = self.prompter.system_prompt_construction(
        dynamic_examples
    )
    prefill_agent_prompt_user_message = self.prompter.user_content_construction(
        given_task, reference_steps, log_path
    )
    appagent_prompt_message = self.prompter.prompt_construction(
        prefill_agent_prompt_system_message,
        prefill_agent_prompt_user_message,
    )

    return appagent_prompt_message

`process_comfirmation()`

确认进程。这是 BasicAgent 中需要实现的抽象方法。

源代码位于 instantiation/agent/prefill_agent.py

def process_comfirmation(self) -> None:
    """
    Confirm the process.
    This is the abstract method from BasicAgent that needs to be implemented.
    """

    pass

筛选流程

根据过滤条件完善计划步骤并预填充文件的类。

初始化任务的过滤流程。

参数	`app_name` (`str`) – 正在处理的应用程序的名称。 `task_file_name` (`str`) – 正在处理的任务文件的名称。

源代码位于 instantiation/workflow/filter_flow.py

def __init__(self, app_name: str, task_file_name: str) -> None:
    """
    Initialize the filter flow for a task.
    :param app_name: Name of the application being processed.
    :param task_file_name: Name of the task file being processed.
    """

    self.execution_time = None
    self._app_name = app_name
    self._log_path_configs = _configs["FILTER_LOG_PATH"].format(task=task_file_name)
    self._filter_agent = self._get_or_create_filter_agent()
    self._initialize_logs()

`execute(instantiated_request)`

执行过滤流程：过滤任务并保存结果。

参数	`instantiated_request` (`str`) – 要过滤的请求对象。

返回	`Dict[str, Any]` – 包含任务质量标志、注释和任务类型的元组。

源代码位于 instantiation/workflow/filter_flow.py

def execute(self, instantiated_request: str) -> Dict[str, Any]:
    """
    Execute the filter flow: Filter the task and save the result.
    :param instantiated_request: Request object to be filtered.
    :return: Tuple containing task quality flag, comment, and task type.
    """

    start_time = time.time()
    try:
        judge, thought, request_type = self._get_filtered_result(
            instantiated_request
        )
    except Exception as e:
        raise e
    finally:
        self.execution_time = round(time.time() - start_time, 3)
    return {
        "judge": judge,
        "thought": thought,
        "request_type": request_type,
    }

筛选代理

基类：BasicAgent

评估实例化任务是否正确的代理。

初始化 FilterAgent。

参数	`name` (`str`) – 智能体的名称。 `process_name` (`str`) – 进程的名称。 `is_visual` (`bool`) – 指示智能体是否可视的标志。 `main_prompt` (`str`) – 主提示。 `example_prompt` (`str`) – 示例提示。 `api_prompt` (`str`) – API 提示。

源代码位于 instantiation/agent/filter_agent.py

def __init__(
    self,
    name: str,
    process_name: str,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str,
):
    """
    Initialize the FilterAgent.
    :param name: The name of the agent.
    :param process_name: The name of the process.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt.
    :param example_prompt: The example prompt.
    :param api_prompt: The API prompt.
    """

    self._step = 0
    self._complete = False
    self._name = name
    self._status = None
    self.prompter: FilterPrompter = self.get_prompter(
        is_visual, main_prompt, example_prompt, api_prompt
    )
    self._process_name = process_name

`get_prompter(is_visual, main_prompt, example_prompt, api_prompt)`

获取代理的提示。

参数	`is_visual` (`bool`) – 指示智能体是否可视的标志。 `main_prompt` (`str`) – 主提示。 `example_prompt` (`str`) – 示例提示。 `api_prompt` (`str`) – API 提示。

返回	`FilterPrompter` – 提示字符串。

源代码位于 instantiation/agent/filter_agent.py

def get_prompter(
    self, is_visual: bool, main_prompt: str, example_prompt: str, api_prompt: str
) -> FilterPrompter:
    """
    Get the prompt for the agent.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt.
    :param example_prompt: The example prompt.
    :param api_prompt: The API prompt.
    :return: The prompt string.
    """

    return FilterPrompter(is_visual, main_prompt, example_prompt, api_prompt)

`message_constructor(request, app)`

为 FilterAgent 构建提示消息。

参数	`request` (`str`) – 请求语句。 `app` (`str`) – 操作的应用程序名称。

返回	`List[str]` – 提示消息。

源代码位于 instantiation/agent/filter_agent.py

def message_constructor(self, request: str, app: str) -> List[str]:
    """
    Construct the prompt message for the FilterAgent.
    :param request: The request sentence.
    :param app: The name of the operated app.
    :return: The prompt message.
    """

    filter_agent_prompt_system_message = self.prompter.system_prompt_construction(
        app=app
    )
    filter_agent_prompt_user_message = self.prompter.user_content_construction(
        request
    )
    filter_agent_prompt_message = self.prompter.prompt_construction(
        filter_agent_prompt_system_message, filter_agent_prompt_user_message
    )

    return filter_agent_prompt_message

`process_comfirmation()`

确认进程。这是 BasicAgent 中需要实现的抽象方法。

源代码位于 instantiation/agent/filter_agent.py

def process_comfirmation(self) -> None:
    """
    Confirm the process.
    This is the abstract method from BasicAgent that needs to be implemented.
    """

    pass