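As AI technology advances, GPT-4 delivers excellent results on many language-processing tasks, including machine translation, text classification, and text generation. High-quality open-source large language models have also emerged, such as Llama, ChatGLM, and Qwen. These open-source models can help teams quickly build a strong LLM application.
However, these model frameworks do not offer a consistent observability experience, which makes it harder for developers to compare frameworks and migrate between them. How can you keep development costs down while using the OpenAI interface everywhere? How can you monitor an LLM application's runtime behavior continuously and efficiently without adding development complexity?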

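GreptimeDB is a time-series database written in Rust. It is distributed, open source, cloud native, and highly compatible, and it helps enterprises read, write, process, and analyze time-series data in real time while lowering the cost of long-term storage.
GreptimeAI is built on top of the open-source time-series database GreptimeDB. It is an observability solution for large language model (LLM) applications and currently supports the LangChain and OpenAI ecosystems. GreptimeAI gives you a comprehensive, real-time view of cost, performance, traffic, and security, helping teams improve the reliability of their LLM applications.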
import uuid
from typing import AsyncGenerator

import bentoml
from annotated_types import Ge, Le
from typing_extensions import Annotated

# Import utility for creating OpenAI-compatible endpoints. See https://github.com/bentoml/BentoVLLM.
from bentovllm_openai.utils import openai_endpoints

MAX_TOKENS = 1024

# Define a prompt template to guide the model's behavior and response style
PROMPT_TEMPLATE = """<s>[INST]
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
{user_prompt} [/INST] """

MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"


# Decorators to mark the class as a BentoML service with OpenAI-compatible endpoints
@openai_endpoints(served_model=MODEL_ID)
@bentoml.service(
    name="mistral-7b-instruct-service",
    traffic={
        "timeout": 300,
    },
    resources={
        "gpu": 1,
        "gpu_type": "nvidia-l4",
    },
)
class VLLM:
    def __init__(self) -> None:
        from vllm import AsyncEngineArgs, AsyncLLMEngine

        ENGINE_ARGS = AsyncEngineArgs(
            model=MODEL_ID,
            max_model_len=MAX_TOKENS
        )
        self.engine = AsyncLLMEngine.from_engine_args(ENGINE_ARGS)

    @bentoml.api
    async def generate(
        self,
        prompt: str = "Explain superconductors like I'm five years old",
        max_tokens: Annotated[int, Ge(128), Le(MAX_TOKENS)] = MAX_TOKENS,
    ) -> AsyncGenerator[str, None]:
        from vllm import SamplingParams

        SAMPLING_PARAM = SamplingParams(max_tokens=max_tokens)
        prompt = PROMPT_TEMPLATE.format(user_prompt=prompt)
        stream = await self.engine.add_request(uuid.uuid4().hex, prompt, SAMPLING_PARAM)

        # Track how much text has been sent and yield only the part beyond it
        cursor = 0
        async for request_output in stream:
            text = request_output.outputs[0].text
            yield text[cursor:]
            cursor = len(text)

For a more detailed look at BentoML Services and how to deploy one to BentoCloud, see this tutorial [5].
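The `generate` endpoint above streams only the newly produced text on each step by remembering how much it has already sent. The sketch below isolates that cursor technique in plain Python, assuming (as the service code implies) that each step delivers the full text generated so far. The helper name `stream_deltas` and the sample snapshots are hypothetical and exist only for illustration.

```python
from typing import Iterable, Iterator


def stream_deltas(snapshots: Iterable[str]) -> Iterator[str]:
    """Yield only the new suffix of each cumulative snapshot.

    Hypothetical helper: it mirrors the `cursor` logic in the service's
    `generate` method without depending on vLLM or BentoML.
    """
    cursor = 0
    for text in snapshots:
        yield text[cursor:]
        cursor = len(text)


# Made-up cumulative outputs standing in for successive engine results.
snapshots = ["Super", "Supercon", "Superconductors"]
deltas = list(stream_deltas(snapshots))
```

Joining the yielded pieces reproduces the last snapshot, which is why a client can simply print each piece as it arrives.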

(Interacting with the service in the BentoCloud console)
pip install openai greptimeai
3.2 Set up GreptimeAI credentials

See the Setup Guide to learn how to set up the credentials.
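One way to catch a missing credential early is to check the environment before patching the client. The sketch below assumes the credentials are exported as `GREPTIMEAI_HOST`, `GREPTIMEAI_DATABASE`, and `GREPTIMEAI_TOKEN`; those variable names are an assumption here and should be confirmed against the Setup Guide. The helper `missing_credentials` is hypothetical and is not part of the `greptimeai` package.

```python
import os
from typing import List, Mapping, Optional

# Assumed variable names; verify them against the GreptimeAI Setup Guide.
REQUIRED_VARS = ("GREPTIMEAI_HOST", "GREPTIMEAI_DATABASE", "GREPTIMEAI_TOKEN")


def missing_credentials(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing GreptimeAI credentials:", ", ".join(missing))
    else:
        print("All GreptimeAI credentials are set.")
```

Passing a mapping explicitly, rather than always reading `os.environ`, keeps the check easy to exercise without touching the real environment.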

3.3 Patch the OpenAI client
from openai import OpenAI
from greptimeai import openai_patcher

client = OpenAI(base_url='<your_bentocloud_deployment_url>/v1', api_key='na')

# you can pass <host>, <database>, <token> into this setup method
# if you do not want to export the credentials as environment variables.
openai_patcher.setup(client=client)

chat_completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    messages=[
        {
            "role": "user",
            "content": "Explain superconductors like I'm five years old"
        }
    ],
    stream=True,
)

# Print each streamed piece as it arrives; content may be None on some chunks
for chunk in chat_completion:
    print(chunk.choices[0].delta.content or "", end="")

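The `or ""` in the loop above guards against chunks whose `delta.content` is `None`. If the full reply is needed as one string rather than printed piece by piece, the chunks can be joined instead. This is a minimal sketch that uses `SimpleNamespace` stand-ins shaped like the streamed chunks in that loop, so it runs without a deployment; `collect_stream` and `fake_chunk` are hypothetical names introduced for illustration.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def collect_stream(chunks: Iterable) -> str:
    """Concatenate the text of streamed chunks, treating None as empty."""
    parts = []
    for chunk in chunks:
        parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts)


def fake_chunk(content: Optional[str]) -> SimpleNamespace:
    """Build a stand-in object exposing chunk.choices[0].delta.content."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Made-up chunks; the last one carries no content, like a closing chunk.
reply = collect_stream(
    [fake_chunk("Super"), fake_chunk("conductors"), fake_chunk(None)]
)
```

With a real deployment, the same function could be given the iterator returned by `client.chat.completions.create(..., stream=True)`.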
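3.4 GreptimeAI dashboard
Metrics and trace data from the LLM application's calls to the OpenAI interface are collected automatically by the GreptimeAI service. You can see overall usage in the GreptimeAI Overview and find the data that matters to you under the corresponding feature tabs.

(GreptimeAI dashboard demonstration)
You are welcome to try the GreptimeAI + BentoCloud solution and to share your experience and insights from using it.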
[1] https://github.com/bentoml/BentoML
[2] https://www.bentoml.com/
[3] https://docs.bentoml.org/en/latest/guides/services.html
[4] https://github.com/vllm-project/vllm
[5] https://docs.bentoml.org/en/latest/use-cases/large-language-models/vllm.html
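About Greptime
GreptimeDB is a time-series database written in Rust. It is distributed, open source, cloud native, and highly compatible, and it helps enterprises read, write, process, and analyze time-series data in real time while lowering the cost of long-term storage.
GreptimeCloud is a fully managed database-as-a-service (DBaaS) solution built on the open-source time-series database GreptimeDB. It efficiently supports applications in observability, IoT, finance, and other fields. With the built-in observability solution GreptimeAI, users can get a full picture of an LLM application's cost, performance, traffic, and security.
The vehicle-cloud integrated solution is a time-series database solution built around automakers' real business scenarios. It addresses the practical pain points that appear once a company's vehicle data grows geometrically.
GreptimeCloud and GreptimeAI are now in public beta; follow the official account or the website for the latest updates. A GreptimeDB Enterprise edition is also available; if you need it, contact the assistant (search for greptime on WeChat).

GreptimeDB is an open-source project, and anyone interested in time-series databases, Rust, and related topics is welcome to contribute and join the discussion. If this is your first contribution, issues labeled Good First Issue are a good place to start. We look forward to meeting you in the open-source community!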
Star us on GitHub Now:
https://github.com/GreptimeTeam/greptimedb
Website: https://greptime.cn/
Docs: https://docs.greptime.cn/
Twitter: https://twitter.com/Greptime
Slack: https://greptime.com/slack
LinkedIn: https://www.linkedin.com/company/greptime/