Introducing GreptimeDB Slow Query Logging: Pinpointing Slow SQL/PromQL Queries

GreptimeDB 2025-08-29

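In day-to-day database operations, two situations come up again and again. A user reports that a query was slow, but by the time we investigate, the query has finished and its execution details are gone, so the bottleneck cannot be pinned down. Or monitoring shows abnormal database load during some time window, but without detailed query logs it is hard to trace which SQL/PromQL statements caused it.

Starting with v0.15 (GreptimeDB v0.16.0 had already been released at the time of writing, and we recommend testing with the latest version), GreptimeDB provides a slow query system table, greptime_private.slow_queries. Once configured, it automatically records queries whose execution time exceeds a threshold, giving DBAs and developers a convenient tool for diagnosis and analysis. Whether for after-the-fact troubleshooting or routine performance monitoring, the historical slow query records help locate the root cause quickly.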

Enabling slow query logging

The following configuration is all that is needed to enable slow query logging:

[slow_query]
enable = true
record_type = "system_table"
threshold = "10s"
sample_ratio = 1.0
ttl = "30d"

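Where:

enable: set to true to enable slow query logging;
record_type: how slow queries are recorded:

log: only write them to the log.
system_table: record them in the system table greptime_private.slow_queries. This is the recommended option.

threshold: the duration threshold for a slow query. Queries that take longer than this are recorded. It is a duration string, for example 1h for 1 hour or 1m for 1 minute; here it is set to 10 seconds;
sample_ratio: the sampling ratio. If there are many slow queries, a ratio below 1.0 limits the extra overhead of recording them; here it is set to 1.0, that is, every slow query is recorded;
ttl: when record_type is system_table, this sets the retention period of the data in the slow_queries table. It is set to 30 days above, which is the default.

Note: in standalone mode, enabling the configuration above is sufficient. In cluster mode, it must be enabled on the Frontend nodes.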
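
To make the interaction between threshold and sample_ratio concrete, here is a minimal shell sketch of the decision described above: a query is a candidate when its elapsed time exceeds the threshold, and a candidate is kept with probability sample_ratio. The function names duration_to_ms and should_record are hypothetical helpers written for this illustration. They are not part of GreptimeDB and do not reflect its actual implementation, and the duration parser only handles single-unit strings.

```shell
# Sketch only: mirrors the documented meaning of `threshold` and
# `sample_ratio`; this is NOT GreptimeDB's actual implementation.

# Convert a single-unit duration string ("500ms", "10s", "1m", "1h", "30d")
# to milliseconds. Compound values such as "1h30m" are not handled here.
duration_to_ms() {
  dur_value=${1%%[a-z]*}      # leading digits, e.g. "10"
  dur_unit=${1#"$dur_value"}  # trailing unit, e.g. "s"
  case "$dur_value" in
    ''|*[!0-9]*) echo "unsupported duration: $1" >&2; return 1 ;;
  esac
  case "$dur_unit" in
    ms) echo "$dur_value" ;;
    s)  echo $((dur_value * 1000)) ;;
    m)  echo $((dur_value * 60000)) ;;
    h)  echo $((dur_value * 3600000)) ;;
    d)  echo $((dur_value * 86400000)) ;;
    *)  echo "unsupported duration: $1" >&2; return 1 ;;
  esac
}

# Succeed (exit status 0) when a query that took $1 milliseconds should be
# recorded, given a threshold string $2 and a sample ratio $3 in (0, 1].
should_record() {
  rec_threshold_ms=$(duration_to_ms "$2") || return 2
  # Only queries slower than the threshold are candidates.
  [ "$1" -gt "$rec_threshold_ms" ] || return 1
  # Keep a candidate with probability $3; awk's rand() is in [0, 1).
  awk -v ratio="$3" 'BEGIN { srand(); exit !(rand() < ratio + 0) }'
}
```

For example, `duration_to_ms 10s` prints 10000, and `should_record 60394 10s 1.0` succeeds because 60394 ms exceeds the threshold and a ratio of 1.0 keeps every candidate. With a ratio such as 0.1, roughly one in ten slow queries would be kept.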

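Enabling slow query logging in standalone mode

First, read the standalone installation guide [1]. GreptimeDB can be deployed as a binary on bare metal or started with Docker.

Next, download the default standalone example configuration and save it locally as standalone.toml: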

curl -o standalone.toml \
https://raw.githubusercontent.com/GreptimeTeam/greptimedb/refs/tags/v0.16.0/config/standalone.example.toml

Edit the file, find the [slow_query] section, uncomment the options, and set enable to true:

## The slow query log options.
[slow_query]
## Whether to enable slow query log.
enable = true

## The record type of slow queries. It can be `system_table` or `log`.
## @toml2docs:none-default
record_type = "system_table"

## The threshold of slow query.
## @toml2docs:none-default
threshold = "10s"

## The sampling ratio of slow query log. The value should be in the range of (0, 1].
## @toml2docs:none-default
sample_ratio = 1.0

Then save and exit.

For a binary deployment, start the server with -c pointing to the configuration file:

./greptime standalone start -c standalone.toml

With Docker, mount the configuration file when starting the container:

docker run -p 127.0.0.1:4000-4003:4000-4003 \
  -v "$(pwd)/greptimedb_data:/greptimedb_data" \
  -v "$(pwd)/standalone.toml:/standalone.toml" \
  --name greptime --rm \
  greptime/greptimedb:v0.16.0 standalone start \
  --http-addr 0.0.0.0:4000 \
  --rpc-bind-addr 0.0.0.0:4001 \
  --mysql-addr 0.0.0.0:4002 \
  --postgres-addr 0.0.0.0:4003 \
  -c /standalone.toml

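Editor's note: the command line explicitly sets the listen addresses for HTTP, MySQL, and the other protocols because the configuration file listens on the local address 127.0.0.1 by default, which is unreachable from outside the container. The command-line flags override it with 0.0.0.0.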

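Enabling slow query logging in a cluster

For cluster deployments, we recommend Helm Charts; please read the installation guide [2] first. The steps below assume that the GreptimeDB Operator and an etcd cluster are already installed in the environment.

Create a values.yaml file with the following content: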

slowQuery:
  enable: true
  recordType: "system_table"
  threshold: "10s"
  sampleRatio: "1.0"
  ttl: "30d"

Create and start a GreptimeDB cluster:

helm install mycluster \
  greptime/greptimedb-cluster \
  -f values.yaml \
  -n default

Confirm that all Pods have started:

kubectl -n default get pods

Expected output:

NAME                                 READY   STATUS    RESTARTS   AGE
mycluster-datanode-0                 1/1     Running   0          70s
mycluster-flownode-0                 1/1     Running   0          33s
mycluster-frontend-f8989595d-bm2pf   1/1     Running   0          39s
mycluster-meta-6964f7b654-mdp6q      1/1     Running   0          106s

Next, forward the Frontend service ports to access the GreptimeDB cluster:

kubectl -n default port-forward svc/mycluster-frontend 4000:4000 4001:4001 4002:4002 4003:4003

We can check the /config API to confirm that the configuration has taken effect:

curl -sS http://localhost:4000/config | grep -A5 slow_query

If everything is working, the output should be:

[slow_query]
enable = true
record_type = "system_table"
threshold = "10s"
sample_ratio = 1.0
ttl = "30d"
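
For scripted checks, for example in a deployment pipeline, the same verification can be expressed as a small shell function that reads the /config output on standard input and succeeds only when enable = true appears under the [slow_query] table. The function name slow_query_enabled is a hypothetical helper written for this illustration, and it assumes the endpoint returns TOML laid out like the output above.

```shell
# Succeed (exit status 0) when the TOML text on stdin contains
# `enable = true` inside the [slow_query] table.
slow_query_enabled() {
  awk '
    # Every table header switches the "current section" flag.
    /^\[/ { in_section = ($0 == "[slow_query]") }
    # Only an enable line inside [slow_query] counts.
    in_section && /^[ \t]*enable[ \t]*=[ \t]*true/ { found = 1 }
    END { exit !found }
  '
}
```

It can be combined with the request shown earlier: `curl -sS http://localhost:4000/config | slow_query_enabled && echo "slow query logging is enabled"`.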

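Verifying slow query logging

With slow query logging configured for standalone or cluster mode as described above, we can simulate a slow query. Connect to GreptimeDB on port 4002 with a MySQL client: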

mysql -h 127.0.0.1 -P 4002

Run a slow query, for example:

WITH RECURSIVE slow_cte AS (
    SELECT 1 as n, md5(random()) as hash
    UNION ALL
    SELECT n + 1, md5(concat(hash, n))
    FROM slow_cte
    WHERE n < 1000000
)
SELECT COUNT(*) FROM slow_cte;

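This query uses a recursive CTE to iterate from 1 to 1,000,000, computing a new MD5 at each step from the previous step's hash, and finally just counts the generated rows.

On the author's machine, this query takes just over one minute: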

+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (1 min 0.42 sec)

Next, query the greptime_private.slow_queries table to see the recorded slow query:

SELECT * FROM greptime_private.slow_queries\G

The output is:

*************************** 1. row ***************************
        cost: 60394
   threshold: 10000
       query: WITH RECURSIVE slow_cte AS (SELECT 1 AS n, md5(random()) AS hash UNION ALL SELECT n + 1, md5(concat(hash, n)) FROM slow_cte WHERE n < 1000000) SELECT COUNT(*) FROM slow_cte
   is_promql: 0
   timestamp: 2025-08-20 07:07:08.300677
promql_range: 0
 promql_step: 0
promql_start: 1970-01-01 00:00:00
  promql_end: 1970-01-01 00:00:00
1 row in set (0.03 sec)

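As shown, GreptimeDB automatically recorded the slow query, with the following information:

cost: the elapsed time in milliseconds; the whole query took just over one minute;
query: the original SQL or PromQL string. With this SQL in hand, you can analyze it further with EXPLAIN and EXPLAIN ANALYZE (see the documentation [3]);
is_promql, promql_range, promql_step, promql_start, and promql_end: information about slow PromQL queries, such as the query's start, end, and step parameters;
timestamp: the time at which the slow query occurred (UTC in this example).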
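
Once records accumulate, a common next step is to list the slowest statements and read their cost in a friendlier unit. The sketch below uses only the cost, threshold, and query columns shown in the output above. The function names format_cost_ms and slowest_queries_sql are hypothetical helpers written for this illustration; they are not GreptimeDB commands.

```shell
# Format a cost in milliseconds (the unit of the `cost` column)
# as minutes and seconds, e.g. 60394 -> 1m0.394s.
format_cost_ms() {
  printf '%dm%d.%03ds\n' $(($1 / 60000)) $(($1 % 60000 / 1000)) $(($1 % 1000))
}

# Print a SQL statement that lists the N slowest recorded queries
# (N defaults to 10 when no argument is given).
slowest_queries_sql() {
  printf 'SELECT cost, threshold, query FROM greptime_private.slow_queries ORDER BY cost DESC LIMIT %d;\n' "${1:-10}"
}
```

The generated statement can be passed to the MySQL client used earlier, for example `mysql -h 127.0.0.1 -P 4002 -e "$(slowest_queries_sql 5)"`, and `format_cost_ms 60394` prints 1m0.394s for the record shown above.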

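Summary

This article introduced GreptimeDB's automatic slow query logging. Starting with v0.15, slow SQL and PromQL queries can be recorded automatically through configuration and a system table, which makes it easier to diagnose, troubleshoot, and monitor performance issues. In addition, GreptimeDB Enterprise integrates a visual interface for automatic query performance analysis and improvement suggestions.

To learn more, feel free to contact us (add our assistant on WeChat: greptime).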

References

[1] https://docs.greptime.cn/getting-started/installation/greptimedb-standalone/

[2] https://docs.greptime.cn/user-guide/deployments-administration/deploy-on-kubernetes/deploy-greptimedb-cluster/

[3] https://docs.greptime.com/reference/sql/explain/

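About Greptime

Greptime (格睿科技) focuses on building a next-generation observability database for developers and enterprise users, covering needs that range from edge devices to enterprise-grade cloud deployments.

GreptimeDB Open Source: open source and cloud native, with unified processing of metrics, logs, and traces; suited to small and medium-scale IoT, personal projects, and observability scenarios;
GreptimeDB Enterprise: built for business-critical workloads, with higher performance, stronger security, high availability, and intelligent operations services;
GreptimeCloud: a fully managed cloud service offering an enterprise-grade observability database with zero operational overhead, elastic scaling, and pay-as-you-go pricing.

You are welcome to join the open source community to contribute and discuss! A good place to start is an issue labeled good first issue. Let's build the future of observability together.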

⭐ Star us on GitHub: https://github.com/GreptimeTeam/greptimedb

📚 Website: https://greptime.cn/

📖 Docs: https://docs.greptime.cn/

🌍 Twitter: https://twitter.com/Greptime

💬 Slack: https://greptime.com/slack

💼 LinkedIn: https://www.linkedin.com/company/greptime/
