Local LLMs with LiteLLM & Ollama
In this notebook we will create two agents, Joe and Cathy, who like to tell each other jokes. The agents will use a locally running LLM.
Follow the guide at https://msdocs.cn/autogen/docs/topics/non-openai-models/local-litellm-ollama/ to learn how to install LiteLLM and Ollama.
We recommend reading that guide, but if you are in a hurry and on Linux, run the following commands:
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.2:1b
pip install 'litellm[proxy]'
litellm --model ollama/llama3.2:1b
This starts the proxy server, which will be available at 'http://0.0.0.0:4000/'.
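As an optional sanity check before wiring up any agents, you can send a single request to the proxy. This is a minimal sketch, assuming the proxy is running on the default port and exposes the standard OpenAI-compatible '/chat/completions' route; the prompt text is arbitrary.

import json
import urllib.request

# Smoke test (illustrative only): send one chat completion request to the LiteLLM proxy.
# Adjust the base URL and model name if you started the proxy differently.
request = urllib.request.Request(
    "http://0.0.0.0:4000/chat/completions",
    data=json.dumps(
        {
            "model": "llama3.2:1b",
            "messages": [{"role": "user", "content": "Say hello in one word."}],
        }
    ).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    reply = json.load(response)
print(reply["choices"][0]["message"]["content"])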
First, let's import some classes.
from dataclasses import dataclass
from autogen_core import (
    AgentId,
    DefaultTopicId,
    MessageContext,
    RoutedAgent,
    SingleThreadedAgentRuntime,
    default_subscription,
    message_handler,
)
from autogen_core.model_context import BufferedChatCompletionContext
from autogen_core.models import (
    AssistantMessage,
    ChatCompletionClient,
    SystemMessage,
    UserMessage,
)
from autogen_ext.models.openai import OpenAIChatCompletionClient
Set up our local LLM model client.
def get_model_client() -> OpenAIChatCompletionClient:  # type: ignore
    "Mimic OpenAI API using Local LLM Server."
    return OpenAIChatCompletionClient(
        model="llama3.2:1b",
        api_key="NotRequiredSinceWeAreLocal",
        base_url="http://0.0.0.0:4000",
        model_capabilities={
            "json_output": False,
            "vision": False,
            "function_calling": True,
        },
    )
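Before handing the client to the agents, you can make a single direct call to confirm the proxy is reachable. A minimal sketch; the prompt is arbitrary and only serves as a connectivity check.

# One-off connectivity check (illustrative only).
client = get_model_client()
response = await client.create([UserMessage(content="Tell me a short joke.", source="user")])
print(response.content)
await client.close()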
Define a simple message class.
@dataclass
class Message:
    content: str
Now for the agents. We define each agent's persona using a SystemMessage and set up a condition for termination.
@default_subscription
class Assistant(RoutedAgent):
    def __init__(self, name: str, model_client: ChatCompletionClient) -> None:
        super().__init__("An assistant agent.")
        self._model_client = model_client
        self.name = name
        self.count = 0
        self._system_messages = [
            SystemMessage(
                content=f"Your name is {name} and you are a part of a duo of comedians. "
                "You laugh when you find the joke funny, else reply 'I need to go now'.",
            )
        ]
        self._model_context = BufferedChatCompletionContext(buffer_size=5)

    @message_handler
    async def handle_message(self, message: Message, ctx: MessageContext) -> None:
        self.count += 1
        await self._model_context.add_message(UserMessage(content=message.content, source="user"))
        result = await self._model_client.create(self._system_messages + await self._model_context.get_messages())

        print(f"\n{self.name}: {message.content}")

        if "I need to go".lower() in message.content.lower() or self.count > 2:
            return

        await self._model_context.add_message(AssistantMessage(content=result.content, source="assistant"))  # type: ignore
        await self.publish_message(Message(content=result.content), DefaultTopicId())  # type: ignore
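The BufferedChatCompletionContext(buffer_size=5) keeps only the most recent messages, so the prompt sent to the small local model stays short. A minimal sketch of that behavior, standalone and separate from the agents above:

# Illustrative only: the buffered context returns at most `buffer_size` recent messages.
context = BufferedChatCompletionContext(buffer_size=5)
for i in range(8):
    await context.add_message(UserMessage(content=f"message {i}", source="user"))
recent = await context.get_messages()
print(len(recent))  # expected: 5 (the five most recent messages)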
Set up the agents.
runtime = SingleThreadedAgentRuntime()
model_client = get_model_client()

cathy = await Assistant.register(
    runtime,
    "cathy",
    lambda: Assistant(name="Cathy", model_client=model_client),
)

joe = await Assistant.register(
    runtime,
    "joe",
    lambda: Assistant(name="Joe", model_client=model_client),
)
Let's run everything!
runtime.start()
await runtime.send_message(
    Message("Joe, tell me a joke."),
    recipient=AgentId(joe, "default"),
    sender=AgentId(cathy, "default"),
)
await runtime.stop_when_idle()

# Close the connections to the model clients.
await model_client.close()
/tmp/ipykernel_1417357/2124203426.py:22: UserWarning: Resolved model mismatch: gpt-4o-2024-05-13 != ollama/llama3.1:8b. Model mapping may be incorrect.
result = await self._model_client.create(self._system_messages + await self._model_context.get_messages())
Joe: Joe, tell me a joke.
Cathy: Here's one:
Why couldn't the bicycle stand up by itself?
(waiting for your reaction...)
Joe: *laughs* It's because it was two-tired! Ahahaha! That's a good one! I love it!
Cathy: *roars with laughter* HAHAHAHA! Oh man, that's a classic! I'm glad you liked it! The setup is perfect and the punchline is just... *chuckles* Two-tired! I mean, come on! That's genius! We should definitely add that one to our act!
Joe: I need to go now.