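Speculative Multi-Action Execution

UFO² introduces a new feature called Speculative Multi-Action Execution. It lets the agent bundle several predicted steps into a single LLM call and then validate them against the live UI state. Compared with inferring each step in a separate call, this can reduce the number of LLM queries by up to 51%. The agent first predicts a batch of likely actions and then validates them in a single pass against the live UIA state. The figure below illustrates speculative multi-action execution: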

[Figure: Speculative Multi-Action Execution]
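
The predict-then-validate loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the UFO² implementation: `PredictedAction`, `run_speculative_batch`, and the callables passed to it are hypothetical names, and the live UI state is modeled as a plain dictionary that maps a control label to whether the control is currently usable. The sketch re-checks the state before each step; how UFO² actually schedules validation is not shown in this page.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PredictedAction:
    """Hypothetical stand-in for one predicted step (not a UFO² class)."""

    function: str
    control_label: str


def run_speculative_batch(
    actions: List[PredictedAction],
    get_live_controls: Callable[[], Dict[str, bool]],
    execute: Callable[[PredictedAction], None],
) -> int:
    """Execute predicted actions in order and stop at the first action whose
    target control is missing or unusable in the live state.

    Returns the number of actions that were executed.
    """
    executed = 0
    for action in actions:
        live = get_live_controls()  # re-read the live state before each step
        if not live.get(action.control_label, False):
            break  # the prediction no longer matches the UI
        execute(action)
        executed += 1
    return executed


# Illustrative data: control "3" is not usable, so the third step is skipped.
ui_state = {"1": True, "2": True, "3": False}
executed_functions: List[str] = []
batch = [
    PredictedAction("click_input", "1"),
    PredictedAction("set_edit_text", "2"),
    PredictedAction("click_input", "3"),
]
executed_count = run_speculative_batch(
    batch, lambda: ui_state, lambda a: executed_functions.append(a.function)
)
```

In this sketch, one batch stands for the output of a single LLM call. A caller would request a new prediction only when `executed_count` is smaller than the batch size.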

Configuration

To enable speculative multi-action execution, set ACTION_SEQUENCE to True in the config_dev.yaml file:

ACTION_SEQUENCE: True

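Reference

The implementation of speculative multi-action execution is located in the ufo/agents/processors/actions.py file. The following class is used for speculative multi-action execution: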

Source code in agents/processors/actions.py
def __init__(
    self,
    function: str = "",
    args: Dict[str, Any] = {},
    control_label: str = "",
    control_text: str = "",
    after_status: str = "",
    results: Optional[ActionExecutionLog] = None,
    configs=Config.get_instance().config_data,
):
    self._function = function
    self._args = args
    self._control_label = control_label
    self._control_text = control_text
    self._after_status = after_status
    self._results = ActionExecutionLog() if results is None else results
    self._configs = configs
    self._control_log = BaseControlLog()

after_status property

Get the status.

Returns
  • str

    The status.

args property

Get the arguments.

Returns
  • Dict[str, Any]

    The arguments.

command_string property

Generate the function-call string.

Returns
  • str

    The function-call string.

control_label property

Get the control label.

Returns
  • str

    The control label.

control_log property writable

Get the control log.

Returns
  • BaseControlLog

    The control log.

control_text property

Get the control text.

Returns
  • str

    The control text.

function property

Get the function name.

Returns
  • str

    The function name.

results property writable

Get the results.

Returns
  • ActionExecutionLog

    The results.

action_flow(puppeteer, control_dict, application_window)

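Execute the action flow.

Parameters
  • puppeteer (AppPuppeteer) –

    The puppeteer that controls the application.

  • control_dict (Dict[str, UIAWrapper]) –

    The control dictionary.

  • application_window (UIAWrapper) –

    The application window where the control is located.

Returns
  • Tuple[ActionExecutionLog, BaseControlLog]

    The action execution log.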

Source code in agents/processors/actions.py
def action_flow(
    self,
    puppeteer: AppPuppeteer,
    control_dict: Dict[str, UIAWrapper],
    application_window: UIAWrapper,
) -> Tuple[ActionExecutionLog, BaseControlLog]:
    """
    Execute the action flow.
    :param puppeteer: The puppeteer that controls the application.
    :param control_dict: The control dictionary.
    :param application_window: The application window where the control is located.
    :return: The action execution log.
    """
    control_selected: UIAWrapper = control_dict.get(self.control_label, None)

    # If the control is selected, but not available, return an error.
    if control_selected is not None and not self._control_validation(
        control_selected
    ):
        self.results = ActionExecutionLog(
            status="error",
            traceback="Control is not available.",
            error="Control is not available.",
        )
        self._control_log = BaseControlLog()

        return self.results

    # Create the control receiver.
    puppeteer.receiver_manager.create_ui_control_receiver(
        control_selected, application_window
    )

    if self.function:

        if self._configs.get("SHOW_VISUAL_OUTLINE_ON_SCREEN", True):
            if control_selected:
                control_selected.draw_outline(colour="red", thickness=3)
                time.sleep(self._configs.get("RECTANGLE_TIME", 0))

        self._control_log = self._get_control_log(
            control_selected=control_selected, application_window=application_window
        )

        try:
            return_value = self.execute(puppeteer=puppeteer)
            if not utils.is_json_serializable(return_value):
                return_value = ""

            self.results = ActionExecutionLog(
                status="success",
                return_value=return_value,
            )

        except Exception as e:

            import traceback

            self.results = ActionExecutionLog(
                status="error",
                traceback=traceback.format_exc(),
                error=str(e),
            )
        return self.results
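
The error-handling pattern in the listing above (run the command, blank out return values that cannot be serialized to JSON, and record exceptions instead of raising them) can be tried in isolation with the following sketch. It is a simplified illustration, not the library's code: `ExecutionLog`, `is_json_serializable`, and `run_and_log` are hypothetical stand-ins written for this example, and the field `traceback_text` is an assumed name.

```python
import json
import traceback
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ExecutionLog:
    """Hypothetical, simplified stand-in for ActionExecutionLog."""

    status: str = ""
    return_value: Any = None
    error: str = ""
    traceback_text: str = ""


def is_json_serializable(value: Any) -> bool:
    """Return True if json.dumps accepts the value (illustrative helper)."""
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        return False


def run_and_log(command: Callable[[], Any]) -> ExecutionLog:
    """Run a command and fold its outcome into a log instead of raising."""
    try:
        value = command()
        if not is_json_serializable(value):
            value = ""  # drop values that cannot be written to a JSON log
        return ExecutionLog(status="success", return_value=value)
    except Exception as exc:
        return ExecutionLog(
            status="error",
            error=str(exc),
            traceback_text=traceback.format_exc(),
        )


ok = run_and_log(lambda: {"clicked": True})  # serializable value is kept
opaque = run_and_log(lambda: object())       # non-serializable value becomes ""
failed = run_and_log(lambda: 1 / 0)          # exception is captured in the log
```

Because failures are returned as data, a caller that runs several actions in sequence can inspect `status` after each one and decide whether to continue.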

count_repeat_times(previous_actions)

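Count how many times the same action was repeated consecutively at the end of the previous actions, starting from the most recent one.

Parameters
  • previous_actions (List[Dict[str, Any]]) –

    The previous actions.

Returns
  • int

    The number of consecutive repeats of the same action in the previous actions.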

Source code in agents/processors/actions.py
def count_repeat_times(self, previous_actions: List[Dict[str, Any]]) -> int:
    """
    Get the times of the same action in the previous actions.
    :param previous_actions: The previous actions.
    :return: The times of the same action in the previous actions.
    """

    count = 0
    for action in previous_actions[::-1]:
        if self.is_same_action(action):
            count += 1
        else:
            break
    return count
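
Note that the loop above walks the history backwards and stops at the first mismatch, so it counts only the trailing run of repeats, not every occurrence. The standalone sketch below illustrates that behavior. `count_trailing_matches` and the `is_click` predicate are hypothetical names, and the history entries are illustrative data.

```python
from typing import Any, Callable, Dict, List


def count_trailing_matches(
    history: List[Dict[str, Any]],
    is_same: Callable[[Dict[str, Any]], bool],
) -> int:
    """Count consecutive matching entries at the end of the history."""
    count = 0
    for item in reversed(history):
        if not is_same(item):
            break  # an earlier, non-adjacent match is not counted
        count += 1
    return count


history = [
    {"Function": "click_input"},
    {"Function": "set_edit_text"},
    {"Function": "click_input"},
    {"Function": "click_input"},
]


def is_click(action: Dict[str, Any]) -> bool:
    return action.get("Function") == "click_input"


# The first entry also matches, but it is separated from the trailing run.
trailing = count_trailing_matches(history, is_click)
```

Here `trailing` is 2 even though three entries match the predicate, because the run is interrupted by the `set_edit_text` entry.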

execute(puppeteer)

Execute the action.

Parameters
  • puppeteer (AppPuppeteer) –

    The puppeteer that controls the application.

Source code in agents/processors/actions.py
def execute(self, puppeteer: AppPuppeteer) -> Any:
    """
    Execute the action.
    :param puppeteer: The puppeteer that controls the application.
    """
    return puppeteer.execute_command(self.function, self.args)

get_operation_point_list()

Get the operation points of the action.

Returns
  • List[Tuple[int]]

    The operation points of the action.

Source code in agents/processors/actions.py
def get_operation_point_list(self) -> List[Tuple[int]]:
    """
    Get the operation points of the action.
    :return: The operation points of the action.
    """

    if "path" in self.args:
        return [(point["x"], point["y"]) for point in self.args["path"]]
    elif "x" in self.args and "y" in self.args:
        return [(self.args["x"], self.args["y"])]
    else:
        return []
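
To make the three branches concrete, the following sketch applies the same logic to plain argument dictionaries. `operation_points` is a hypothetical free function written for this example, and the coordinate values are illustrative.

```python
from typing import Any, Dict, List, Tuple


def operation_points(args: Dict[str, Any]) -> List[Tuple[int, int]]:
    """Extract (x, y) points from an action's arguments."""
    if "path" in args:
        # A path takes precedence over a single point.
        return [(point["x"], point["y"]) for point in args["path"]]
    if "x" in args and "y" in args:
        return [(args["x"], args["y"])]
    return []


drag_points = operation_points({"path": [{"x": 10, "y": 20}, {"x": 30, "y": 40}]})
click_points = operation_points({"x": 5, "y": 6})
no_points = operation_points({"text": "hello"})
```

An action with only one of `x` or `y` yields an empty list, since both keys are required for the single-point branch.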

is_same_action(action_to_compare)

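Check whether the two actions are the same.

Parameters
  • action_to_compare (Dict[str, Any]) –

    The action to compare with the current action.

Returns
  • bool

    Whether the two actions are the same.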

Source code in agents/processors/actions.py
def is_same_action(self, action_to_compare: Dict[str, Any]) -> bool:
    """
    Check whether the two actions are the same.
    :param action_to_compare: The action to compare with the current action.
    :return: Whether the two actions are the same.
    """

    return (
        self.function == action_to_compare.get("Function")
        and self.args == action_to_compare.get("Args")
        and self.control_text == action_to_compare.get("ControlText")
    )
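
The comparison above uses only the "Function", "Args", and "ControlText" keys, so two actions that differ only in their control label are treated as the same. The sketch below reproduces that comparison as a free function to show the consequence. `same_action` is a hypothetical helper, and the function name and argument values are illustrative.

```python
from typing import Any, Dict


def same_action(
    function: str,
    args: Dict[str, Any],
    control_text: str,
    other: Dict[str, Any],
) -> bool:
    """Compare an action with a dictionary on three fields only."""
    return (
        function == other.get("Function")
        and args == other.get("Args")
        and control_text == other.get("ControlText")
    )


recorded = {
    "Function": "click_input",
    "Args": {"button": "left"},
    "ControlText": "Save",
    "ControlLabel": "12",  # not part of the comparison
}

# Same function, arguments, and control text: considered the same action.
matches = same_action("click_input", {"button": "left"}, "Save", recorded)

# Different arguments: considered a different action.
differs = same_action("click_input", {"button": "right"}, "Save", recorded)
```

Keys are looked up with `dict.get`, so a dictionary that lacks one of the three keys compares that field against `None` rather than raising a `KeyError`.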

print_result()

Print the action execution result.

Source code in agents/processors/actions.py
def print_result(self) -> None:
    """
    Print the action execution result.
    """

    utils.print_with_color(
        "Selected item🕹️: {control_text}, Label: {label}".format(
            control_text=self.control_text, label=self.control_label
        ),
        "yellow",
    )
    utils.print_with_color(
        "Action applied⚒️: {action}".format(action=self.command_string), "blue"
    )

    result_color = "red" if self.results.status != "success" else "green"

    utils.print_with_color(
        "Execution result📜: {result}".format(result=asdict(self.results)),
        result_color,
    )

to_dict(previous_actions)

Convert the action to a dictionary.

Parameters
  • previous_actions (Optional[List[Dict[str, Any]]]) –

    The previous actions.

Returns
  • Dict[str, Any]

    The dictionary of the action.

Source code in agents/processors/actions.py
def to_dict(
    self, previous_actions: Optional[List[Dict[str, Any]]]
) -> Dict[str, Any]:
    """
    Convert the action to a dictionary.
    :param previous_actions: The previous actions.
    :return: The dictionary of the action.
    """

    action_dict = {
        "Function": self.function,
        "Args": self.args,
        "ControlLabel": self.control_label,
        "ControlText": self.control_text,
        "Status": self.after_status,
        "Results": asdict(self.results),
    }

    # Add the repetitive times of the same action in the previous actions if the previous actions are provided.
    if previous_actions:
        action_dict["RepeatTimes"] = self.count_repeat_times(previous_actions)

    return action_dict

to_string(previous_actions)

Convert the action to a string.

Parameters
  • previous_actions (Optional[List[OneStepAction]]) –

    The previous actions.

Returns
  • str

    The string of the action.

Source code in agents/processors/actions.py
def to_string(self, previous_actions: Optional[List["OneStepAction"]]) -> str:
    """
    Convert the action to a string.
    :param previous_actions: The previous actions.
    :return: The string of the action.
    """
    return json.dumps(self.to_dict(previous_actions), ensure_ascii=False)
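
The `ensure_ascii=False` argument matters when control text contains non-ASCII characters: with it, `json.dumps` writes the characters as-is instead of `\uXXXX` escapes, which keeps the serialized action readable. The short sketch below shows the difference. The action dictionary is illustrative data, not output from UFO².

```python
import json

# Illustrative action with non-ASCII control text.
action = {"Function": "click_input", "ControlText": "保存"}

readable = json.dumps(action, ensure_ascii=False)  # keeps the characters as-is
escaped = json.dumps(action)                       # default: \uXXXX escapes

# Both strings decode back to the same dictionary.
round_trip_equal = json.loads(readable) == json.loads(escaped) == action
```

Both forms are valid JSON and carry the same data, so the choice affects only how the string looks in logs and prompts.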