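Session

A Session is a conversation instance between the user and UFO. It is a continuous interaction that starts when the user initiates a request and ends when the request is completed. UFO supports multiple requests within the same session. Each request is processed sequentially by one Round of interaction until the user's request is fulfilled. The figure below shows the relationship between Session and Round.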

Figure: relationship between Session and Round.

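Session Lifecycle

The lifecycle of a Session is as follows:

1. Session Initialization

A Session is initialized when the user starts a conversation with UFO. The Session object is created and the first Round of interaction begins. At this stage, the HostAgent processes the user's request to determine the appropriate application for fulfilling it. A Context object is also created to store the conversation state shared across all Rounds in the session.

2. Session Processing

Once the Session is initialized, a Round of interaction begins. It completes a single user request by orchestrating the HostAgent and AppAgent.

3. Next Round

After the first Round completes, the Session asks the user for the next request in order to start the next Round of interaction. This process continues until the user has no more requests. The core logic of a Session is shown below: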

def run(self) -> None:
    """
    Run the session.
    """

    while not self.is_finished():

        round = self.create_new_round()
        if round is None:
            break
        round.run()

    if self.application_window is not None:
        self.capture_last_snapshot()

    if self._should_evaluate and not self.is_error():
        self.evaluation()

    self.print_cost()

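4. Session Termination

If the user has no more requests or decides to end the conversation, the Session is terminated and the conversation ends. If an EvaluationAgent is configured, it evaluates the completeness of the Session.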
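
The four lifecycle stages above can be sketched with a toy model. This is only an illustration under stated assumptions, not UFO's implementation: ToySession, ToyRound, and the request list are hypothetical stand-ins, and a plain dictionary plays the role of the shared Context.

```python
from typing import Dict, List, Optional


class ToyRound:
    """Hypothetical stand-in for a Round: handles exactly one request."""

    def __init__(self, request: str, context: Dict[str, List[str]]) -> None:
        self.request = request
        self.context = context  # shared with every other round in the session

    def run(self) -> None:
        # A real round would coordinate HostAgent and AppAgent here.
        self.context["history"].append(self.request)


class ToySession:
    """Hypothetical stand-in for a Session: runs rounds until requests run out."""

    def __init__(self, requests: List[str]) -> None:
        # 1. Initialization: create the shared context and the round store.
        self._pending = list(requests)
        self._rounds: Dict[int, ToyRound] = {}
        self.context: Dict[str, List[str]] = {"history": []}
        self._finish = False

    def is_finished(self) -> bool:
        return self._finish

    def create_new_round(self) -> Optional[ToyRound]:
        if not self._pending:
            self._finish = True  # 4. Termination: no more requests
            return None
        new_round = ToyRound(self._pending.pop(0), self.context)
        self._rounds[len(self._rounds)] = new_round
        return new_round

    def run(self) -> None:
        while not self.is_finished():
            new_round = self.create_new_round()  # 3. Next round
            if new_round is None:
                break
            new_round.run()  # 2. Processing


session = ToySession(["open Notepad", "type hello"])
session.run()
print(session.context["history"])  # ['open Notepad', 'type hello']
```

Because both rounds write into the same dictionary, the second round can see what the first one recorded, which is the purpose of sharing one Context across a session.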

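Reference

Bases: ABC

A basic session in UFO. A session consists of multiple rounds of interactions and conversations.

Initialize a session.

Parameters
  • task (str) –

    The name of the current task.

  • should_evaluate (bool) –

    Whether to evaluate the session.

  • id (int) –

    The id of the session.

Source code in module/basic.py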
def __init__(self, task: str, should_evaluate: bool, id: int) -> None:
    """
    Initialize a session.
    :param task: The name of current task.
    :param should_evaluate: Whether to evaluate the session.
    :param id: The id of the session.
    """

    self._should_evaluate = should_evaluate
    self._id = id

    # Logging-related properties
    self.log_path = f"logs/{task}/"
    utils.create_folder(self.log_path)

    self._rounds: Dict[int, BaseRound] = {}

    self._context = Context()
    self._init_context()
    self._finish = False
    self._results = {}

    self._host_agent: HostAgent = AgentFactory.create_agent(
        "host",
        "HostAgent",
        configs["HOST_AGENT"]["VISUAL_MODE"],
        configs["HOSTAGENT_PROMPT"],
        configs["HOSTAGENT_EXAMPLE_PROMPT"],
        configs["API_PROMPT"],
    )

application_window property writable

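Get the application of the session. Returns: The application of the session.

context property

Get the context of the session. Returns: The context of the session.

cost property writable

Get the cost of the session. Returns: The cost of the session.

current_round property

Get the current round of the session. Returns: The current round of the session.

evaluation_logger property

Get the logger for evaluation. Returns: The logger for evaluation.

id property

Get the id of the session. Returns: The id of the session.

results property writable

Get the evaluation results of the session. Returns: The evaluation results of the session.

rounds property

Get the rounds of the session. Returns: The rounds of the session.

session_type property

Get the class name of the session. Returns: The class name of the session.

step property

Get the step of the session. Returns: The step of the session.

total_rounds property

Get the total number of rounds in the session. Returns: The total number of rounds in the session.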

add_round(id, round)

Add a round to the session.

Parameters
  • id (int) –

    The id of the round.

  • round (BaseRound) –

    The round to be added.

Source code in module/basic.py
def add_round(self, id: int, round: BaseRound) -> None:
    """
    Add a round to the session.
    :param id: The id of the round.
    :param round: The round to be added.
    """
    self._rounds[id] = round
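
Since add_round stores rounds in a dictionary keyed by id, adding a round under an existing id replaces the earlier entry instead of appending to it. The sketch below illustrates this with a hypothetical RoundRegistry class. Strings stand in for BaseRound objects, and treating total_rounds as the dictionary length is an assumption, not something the listing above confirms.

```python
from typing import Dict


class RoundRegistry:
    """Hypothetical holder mirroring the _rounds dictionary and add_round()."""

    def __init__(self) -> None:
        self._rounds: Dict[int, str] = {}  # strings stand in for BaseRound objects

    def add_round(self, id: int, round: str) -> None:
        self._rounds[id] = round

    @property
    def total_rounds(self) -> int:
        # Assumption: the total is simply the number of stored rounds.
        return len(self._rounds)


registry = RoundRegistry()
registry.add_round(0, "first round")
registry.add_round(1, "second round")
registry.add_round(1, "replacement")  # same id: overwrites, does not append
print(registry.total_rounds)  # 2
```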

capture_last_snapshot()

Capture the last snapshot of the application, including the screenshot and the XML file if configured.

Source code in module/basic.py
def capture_last_snapshot(self) -> None:
    """
    Capture the last snapshot of the application, including the screenshot and the XML file if configured.
    """

    # Capture the final screenshot
    screenshot_save_path = self.log_path + f"action_step_final.png"

    if self.application_window is not None:

        try:
            PhotographerFacade().capture_app_window_screenshot(
                self.application_window, save_path=screenshot_save_path
            )

        except Exception as e:
            utils.print_with_color(
                f"Warning: The last snapshot capture failed, due to the error: {e}",
                "yellow",
            )

        if configs.get("SAVE_UI_TREE", False):
            step_ui_tree = ui_tree.UITree(self.application_window)

            ui_tree_path = os.path.join(self.log_path, "ui_trees")

            ui_tree_file_name = "ui_tree_final.json"

            step_ui_tree.save_ui_tree_to_json(
                os.path.join(
                    ui_tree_path,
                    ui_tree_file_name,
                )
            )

        if configs.get("SAVE_FULL_SCREEN", False):

            desktop_save_path = self.log_path + f"desktop_final.png"

            # Capture the desktop screenshot for all screens.
            PhotographerFacade().capture_desktop_screen_screenshot(
                all_screens=True, save_path=desktop_save_path
            )

        # Save the final XML file
        if configs["LOG_XML"]:
            log_abs_path = os.path.abspath(self.log_path)
            xml_save_path = os.path.join(log_abs_path, f"xml/action_step_final.xml")

            app_agent = self._host_agent.get_active_appagent()
            if app_agent is not None:
                app_agent.Puppeteer.save_to_xml(xml_save_path)
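
The method above builds its output paths in two ways: the screenshots use plain string concatenation, while the UI tree and XML files use a path join. The sketch below shows the resulting layout. The helper name final_snapshot_paths is hypothetical, posixpath is used only to keep the separators predictable, and the os.path.abspath step applied to the XML path is omitted.

```python
import posixpath
from typing import Dict


def final_snapshot_paths(log_path: str) -> Dict[str, str]:
    """Return the artifact paths used for the final snapshot (sketch)."""
    return {
        # Plain concatenation: correct only if log_path ends with a separator.
        "screenshot": log_path + "action_step_final.png",
        "desktop": log_path + "desktop_final.png",
        # Joined paths: the separator is inserted when needed.
        "ui_tree": posixpath.join(log_path, "ui_trees", "ui_tree_final.json"),
        "xml": posixpath.join(log_path, "xml/action_step_final.xml"),
    }


paths = final_snapshot_paths("logs/demo/")
for name, path in paths.items():
    print(name, path)
```

The concatenated paths work here because the constructor sets log_path to f"logs/{task}/", which ends with a slash. With a log path that lacks the trailing separator, the screenshot name would be glued directly onto the directory name.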

create_following_round()

Create a following round. Returns: The following round.

Source code in module/basic.py
def create_following_round(self) -> BaseRound:
    """
    Create a following round.
    return: The following round.
    """
    pass

create_new_round() abstractmethod

Create a new round.

Source code in module/basic.py
@abstractmethod
def create_new_round(self) -> Optional[BaseRound]:
    """
    Create a new round.
    """
    pass
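
Because create_new_round is abstract, the base class cannot be instantiated directly, and every concrete session has to supply its own implementation. The following sketch shows the pattern with hypothetical classes SketchSession and ListSession. It models only two of the abstract hooks and uses strings in place of BaseRound objects.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class SketchSession(ABC):
    """Hypothetical base class with two of the abstract hooks from the reference."""

    @abstractmethod
    def create_new_round(self) -> Optional[str]:
        ...

    @abstractmethod
    def next_request(self) -> str:
        ...


class ListSession(SketchSession):
    """Hypothetical subclass that serves requests from a fixed list."""

    def __init__(self, requests: List[str]) -> None:
        self._requests = list(requests)

    def next_request(self) -> str:
        return self._requests.pop(0) if self._requests else ""

    def create_new_round(self) -> Optional[str]:
        request = self.next_request()
        # Returning None tells the run loop that there is nothing left to do.
        return f"round for: {request}" if request else None


try:
    SketchSession()  # abstract methods are missing, so this raises TypeError
    instantiated = True
except TypeError:
    instantiated = False

session = ListSession(["open Excel"])
first = session.create_new_round()
second = session.create_new_round()
print(instantiated, first, second)
```

Returning None from create_new_round matches the check in run(), which breaks out of its loop when no round is produced.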

evaluation()

Evaluate the session.

Source code in module/basic.py
def evaluation(self) -> None:
    """
    Evaluate the session.
    """
    utils.print_with_color("Evaluating the session...", "yellow")

    is_visual = configs.get("EVALUATION_AGENT", {}).get("VISUAL_MODE", True)

    evaluator = EvaluationAgent(
        name="eva_agent",
        app_root_name=self.context.get(ContextNames.APPLICATION_ROOT_NAME),
        is_visual=is_visual,
        main_prompt=configs["EVALUATION_PROMPT"],
        example_prompt="",
        api_prompt=configs["API_PROMPT"],
    )

    requests = self.request_to_evaluate()

    # Evaluate the session, first use the default setting, if failed, then disable the screenshot evaluation.
    try:
        result, cost = evaluator.evaluate(
            request=requests,
            log_path=self.log_path,
            eva_all_screenshots=configs.get("EVA_ALL_SCREENSHOTS", True),
        )
    except Exception as e:
        result, cost = evaluator.evaluate(
            request=requests,
            log_path=self.log_path,
            eva_all_screenshots=False,
        )

    # Add additional information to the evaluation result.
    additional_info = {"level": "session", "request": requests, "id": 0}
    result.update(additional_info)

    self.results = result

    self.cost += cost

    evaluator.print_response(result)

    self.evaluation_logger.info(json.dumps(result))
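
The retry logic in the method above follows a simple pattern: attempt the evaluation with all screenshots first, and on any exception repeat it with eva_all_screenshots=False. The sketch below isolates that pattern. FlakyEvaluator, its failure condition, the "complete" key, and the cost value are all hypothetical and exist only to trigger the fallback path.

```python
from typing import Dict, Tuple


class FlakyEvaluator:
    """Hypothetical evaluator that fails whenever all screenshots are requested."""

    def evaluate(
        self, request: str, log_path: str, eva_all_screenshots: bool
    ) -> Tuple[Dict[str, object], float]:
        if eva_all_screenshots:
            raise RuntimeError("too many screenshots for the model context")
        return {"complete": "yes"}, 0.01


def evaluate_with_fallback(
    evaluator: FlakyEvaluator, request: str, log_path: str
) -> Tuple[Dict[str, object], float]:
    try:
        # First attempt: evaluate with every screenshot.
        result, cost = evaluator.evaluate(
            request=request, log_path=log_path, eva_all_screenshots=True
        )
    except Exception:
        # Fallback: retry without the full screenshot set.
        result, cost = evaluator.evaluate(
            request=request, log_path=log_path, eva_all_screenshots=False
        )
    # Attach the same kind of session-level metadata the method above adds.
    result.update({"level": "session", "request": request, "id": 0})
    return result, cost


result, cost = evaluate_with_fallback(FlakyEvaluator(), "open Notepad", "logs/demo/")
print(result, cost)
```

Note that the fallback call is not itself guarded, so a failure in the second attempt still propagates to the caller, as in the listing above.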

experience_saver()

Save the current trajectory as agent experience.

Source code in module/basic.py
def experience_saver(self) -> None:
    """
    Save the current trajectory as agent experience.
    """
    utils.print_with_color(
        "Summarizing and saving the execution flow as experience...", "yellow"
    )

    summarizer = ExperienceSummarizer(
        configs["APP_AGENT"]["VISUAL_MODE"],
        configs["EXPERIENCE_PROMPT"],
        configs["APPAGENT_EXAMPLE_PROMPT"],
        configs["API_PROMPT"],
    )
    experience = summarizer.read_logs(self.log_path)
    summaries, cost = summarizer.get_summary_list(experience)

    experience_path = configs["EXPERIENCE_SAVED_PATH"]
    utils.create_folder(experience_path)
    summarizer.create_or_update_yaml(
        summaries, os.path.join(experience_path, "experience.yaml")
    )
    summarizer.create_or_update_vector_db(
        summaries, os.path.join(experience_path, "experience_db")
    )

    self.cost += cost
    utils.print_with_color("The experience has been saved.", "magenta")

initialize_logger(log_path, log_filename, mode='a', configs=configs) staticmethod

Initialize logging. log_path: The path of the log file. log_filename: The name of the log file. Returns: The logger.

Source code in module/basic.py
@staticmethod
def initialize_logger(
    log_path: str, log_filename: str, mode="a", configs=configs
) -> logging.Logger:
    """
    Initialize logging.
    log_path: The path of the log file.
    log_filename: The name of the log file.
    return: The logger.
    """
    # Code for initializing logging
    logger = logging.Logger(log_filename)

    if not configs["PRINT_LOG"]:
        # Remove existing handlers if PRINT_LOG is False
        logger.handlers = []

    log_file_path = os.path.join(log_path, log_filename)
    file_handler = logging.FileHandler(log_file_path, mode=mode, encoding="utf-8")
    formatter = logging.Formatter("%(message)s")
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)
    logger.setLevel(configs["LOG_LEVEL"])

    return logger
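
To try the logging setup outside UFO, the sketch below re-creates it with the standard library only. It is a simplified variant, not the method itself: the configs lookup is replaced by an assumed level parameter, the PRINT_LOG branch is dropped, and the file name evaluation.log is a hypothetical choice.

```python
import logging
import os
import tempfile


def initialize_logger(
    log_path: str, log_filename: str, mode: str = "a", level: str = "INFO"
) -> logging.Logger:
    # Standalone logger that is not registered with the logging manager.
    logger = logging.Logger(log_filename)
    file_handler = logging.FileHandler(
        os.path.join(log_path, log_filename), mode=mode, encoding="utf-8"
    )
    # Write the bare message only, with no timestamp or level prefix.
    file_handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(file_handler)
    logger.setLevel(level)
    return logger


log_dir = tempfile.mkdtemp()
logger = initialize_logger(log_dir, "evaluation.log")
logger.info('{"complete": "yes"}')
logger.debug("dropped: below the INFO threshold")

for handler in logger.handlers:
    handler.close()  # flush and release the file before reading it back

with open(os.path.join(log_dir, "evaluation.log"), encoding="utf-8") as f:
    content = f.read()
print(content)
```

The "%(message)s" formatter means each log line is exactly the logged string, which is convenient when the messages are JSON documents such as the evaluation results.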

is_error()

Check if the session is in an error state. Returns: True if the session is in an error state, otherwise False.

Source code in module/basic.py
def is_error(self):
    """
    Check if the session is in error state.
    return: True if the session is in error state, otherwise False.
    """
    if self.current_round is not None:
        return self.current_round.state.name() == AgentStatus.ERROR.value
    return False

is_finished()

Check if the session has ended. Returns: True if the session has ended, otherwise False.

Source code in module/basic.py
def is_finished(self) -> bool:
    """
    Check if the session is ended.
    return: True if the session is ended, otherwise False.
    """
    if (
        self._finish
        or self.step >= configs["MAX_STEP"]
        or self.total_rounds >= configs["MAX_ROUND"]
    ):
        return True

    if self.is_error():
        return True

    return False
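
The termination check above combines four independent conditions, and any one of them ends the session. The sketch below restates them as a pure function over a hypothetical SessionState snapshot. The limits of 100 steps and 10 rounds are assumed values chosen for illustration, since the real ones come from the configs dictionary.

```python
from dataclasses import dataclass

# Assumed limits; in UFO these come from the configs dictionary.
CONFIGS = {"MAX_STEP": 100, "MAX_ROUND": 10}


@dataclass
class SessionState:
    """Hypothetical snapshot of the values is_finished() inspects."""

    finish: bool = False
    step: int = 0
    total_rounds: int = 0
    in_error: bool = False


def is_finished(state: SessionState) -> bool:
    # Any single condition is enough to stop the session.
    return (
        state.finish
        or state.step >= CONFIGS["MAX_STEP"]
        or state.total_rounds >= CONFIGS["MAX_ROUND"]
        or state.in_error
    )


print(is_finished(SessionState(step=99, total_rounds=9)))  # False
print(is_finished(SessionState(step=100)))                 # True
print(is_finished(SessionState(total_rounds=10)))          # True
print(is_finished(SessionState(in_error=True)))            # True
```

Both limits use >=, so a session stops as soon as it reaches the configured maximum rather than after exceeding it.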

next_request() abstractmethod

Get the next request of the session. Returns: The request of the session.

Source code in module/basic.py
@abstractmethod
def next_request(self) -> str:
    """
    Get the next request of the session.
    return: The request of the session.
    """
    pass

print_cost()

Print the total cost of the session.

Source code in module/basic.py
def print_cost(self) -> None:
    """
    Print the total cost of the session.
    """

    if isinstance(self.cost, float) and self.cost > 0:
        formatted_cost = "${:.2f}".format(self.cost)
        utils.print_with_color(
            f"Total request cost of the session: {formatted_cost}$", "yellow"
        )
    else:
        utils.print_with_color(
            "Cost is not available for the model {host_model} or {app_model}.".format(
                host_model=configs["HOST_AGENT"]["API_MODEL"],
                app_model=configs["APP_AGENT"]["API_MODEL"],
            ),
            "yellow",
        )
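
The branch condition and the format string in the method above have two consequences that are easy to miss. The sketch below reproduces them in a hypothetical describe_cost helper that returns the message instead of printing it. The fallback text is shortened here and does not name the models as the original does.

```python
from typing import Any


def describe_cost(cost: Any) -> str:
    """Build the cost message the same way as the method above (sketch)."""
    if isinstance(cost, float) and cost > 0:
        formatted_cost = "${:.2f}".format(cost)
        return f"Total request cost of the session: {formatted_cost}$"
    return "Cost is not available."


print(describe_cost(0.1234))  # Total request cost of the session: $0.12$
print(describe_cost(0.0))     # Cost is not available.
print(describe_cost(3))       # Cost is not available.
```

As written, the message carries a dollar sign on both sides of the amount, and a cost stored as an integer takes the fallback branch because the check requires a float.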

request_to_evaluate() abstractmethod

Get the request to evaluate. Returns: The request(s) to evaluate.

Source code in module/basic.py
@abstractmethod
def request_to_evaluate(self) -> str:
    """
    Get the request to evaluate.
    return: The request(s) to evaluate.
    """
    pass

run()

Run the session.

Source code in module/basic.py
def run(self) -> None:
    """
    Run the session.
    """

    while not self.is_finished():

        round = self.create_new_round()
        if round is None:
            break
        round.run()

    if self.application_window is not None:
        self.capture_last_snapshot()

    if self._should_evaluate and not self.is_error():
        self.evaluation()

    if configs.get("LOG_TO_MARKDOWN", True):

        file_path = self.log_path
        trajectory = Trajectory(file_path)
        trajectory.to_markdown(file_path + "/output.md")

    self.print_cost()