暂无图片
暂无图片
5
暂无图片
暂无图片
暂无图片

MySQL迁移OpenGauss性能优化实战:一条SQL从570秒到5毫秒的蜕变

原创 szrsu 2025-12-30
520

一、问题背景

在从MySQL迁移至OpenGauss数据库的过程中,遇到了一个严重的性能问题。一条在MySQL中运行正常的关联查询语句,在OpenGauss中执行时间达到了570秒,严重影响了系统性能。本文将详细记录优化过程,分享从问题诊断到最终解决的过程。

二、环境准备

2.1 表结构定义

-- 创建 t1 表(文件信息表) CREATE TABLE t1 ( object_id character varying(255) NOT NULL, file_name character varying(255), md5 character varying(255), size bigint, valid smallint DEFAULT 1, PRIMARY KEY (object_id) ); -- 创建索引 CREATE INDEX t1_idx_md5 ON t1 USING btree (md5); CREATE INDEX t1_idx_valid ON t1 USING btree (valid); -- 创建 t2 表(版本关联表) CREATE TABLE t2 ( id bigserial PRIMARY KEY, version_id bigint NOT NULL, valid smallint DEFAULT 1, asset_version_format_id integer DEFAULT 0, object_id varbinary(255) );

2.2 测试数据生成

-- 插入 t1 数据(约10万行有效数据) INSERT INTO t1 (object_id, file_name, md5, size, valid) SELECT 'file_' || i, 'file_' || i || '.jpg', md5(random()::text), (random() * 100000)::bigint, 1 FROM generate_series(1, 100000) i; -- 插入100条无效数据 INSERT INTO t1 (object_id, file_name, md5, size, valid) SELECT 'invalid_' || i, 'invalid_' || i || '.jpg', md5(random()::text), (random() * 100000)::bigint, 0 FROM generate_series(1, 100) i; -- 插入 t2 数据(100条特定版本数据) INSERT INTO t2 (version_id, valid, asset_version_format_id, object_id) SELECT 48725, 1, 0, ('file_' || (i % 1000 + 1)) FROM generate_series(1, 100) i; -- 插入其他版本数据(10万条) INSERT INTO t2 (version_id, valid, asset_version_format_id, object_id) SELECT (random() * 100000)::bigint, 1, floor(random() * 5)::integer, ('file_' || (random() * 1000)::integer)::bytea FROM generate_series(1, 100000) i;

三、性能问题初现

3.1 原始查询语句

SELECT t1.object_id, t2.object_id, t1.size, t1.md5, t1.file_name FROM t1 t1 JOIN t2 t2 ON t2.object_id = t1.object_id WHERE t2.version_id = 48725 AND t2.valid = 1 AND t2.asset_version_format_id = 0 AND t1.valid = 1;

执行结果:耗时约570秒

3.2 数据分布分析

-- 查看t2表符合条件的数据量 select count(*) from t2 WHERE t2.version_id = 48725 AND t2.valid = 1 AND t2.asset_version_format_id = 0; -- 结果:100行(仅占总数据量的0.1%) -- t2表总数据量 select count(*) from t2; -- 结果:100100行

四、执行计划深度分析

4.1 初始执行计划

EXPLAIN SELECT t1.object_id, t2.object_id, t1.size, t1.md5, t1.file_name FROM t1 t1 JOIN t2 t2 ON t2.object_id = t1.object_id WHERE t2.version_id = 48725 AND t2.valid = 1 AND t2.asset_version_format_id = 0 AND t1.valid = 1;

执行计划输出

                                           QUERY PLAN                                           
------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..2393.45 rows=1 width=1588)
   Join Filter: (t2.object_id = (t1.object_id)::text)
   ->  Index Scan using t1_idx_valid on t1  (cost=0.00..144.84 rows=34 width=1556)
         Index Cond: (valid = 1)
   ->  Materialize  (cost=0.00..2239.68 rows=1 width=32)
         ->  Seq Scan on t2  (cost=0.00..2239.68 rows=1 width=32)
               Filter: ((version_id = 48725) AND (valid = 1) AND (asset_version_format_id = 0))
(7 rows)

4.2 问题诊断

  1. 缺失关键索引:t2表缺少针对version_id的过滤索引
  2. 连接顺序不当:执行计划选择t1作为驱动表,但实际t2过滤后数据量更小

五、优化步骤实施

5.1 第一步:创建缺失索引

-- 为t1表的连接字段创建索引 create index idx_t1_objectid on t1 (object_id); -- 为t2表创建复合索引(覆盖version_id和object_id) create index idx_t2_version_objectid on t2 (version_id, object_id);

5.2 第二步:更新统计信息

-- 收集最新的统计信息 analyze table t1; analyze table t2;

5.3 第三步:分析优化后执行计划

添加索引后的执行计划:

                                             QUERY PLAN                                             
----------------------------------------------------------------------------------------------------
 Nested Loop  (cost=5.18..659088.83 rows=40 width=73)
   Join Filter: (t2.object_id = (t1.object_id)::text)
   ->  Seq Scan on t1  (cost=0.00..2585.25 rows=99987 width=65)
         Filter: (valid = 1)
   ->  Materialize  (cost=5.18..338.96 rows=25 width=8)
         ->  Bitmap Heap Scan on t2  (cost=5.18..338.83 rows=25 width=8)
               Recheck Cond: (version_id = 48725)
               Filter: ((valid = 1) AND (asset_version_format_id = 0))
               ->  Bitmap Index Scan on idx_t2_version_objectid  (cost=0.00..5.17 rows=123 width=0)
                     Index Cond: (version_id = 48725)

t2过滤后数据较少,由它当做驱动表才是正确的。

使用hint提示优化器调整连接顺序:

EXPLAIN SELECT /*+ leading((t2 t1)) */ t1.object_id, t2.object_id, t1.size, t1.md5, t1.file_name FROM t1 JOIN t2 ON t2.object_id = t1.object_id WHERE t2.version_id = 48725 AND t2.valid = 1 AND t2.asset_version_format_id = 0 AND t1.valid = 1;

调整后执行计划如下:

                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..710741.89 rows=40 width=73)
   Join Filter: (t2.object_id = (t1.object_id)::text)
   ->  Index Scan using idx_t2_version_objectid on t2  (cost=0.00..466.99 rows=25 width=8)
         Index Cond: (version_id = 48725)
         Filter: ((valid = 1) AND (asset_version_format_id = 0))
   ->  Materialize  (cost=0.00..5136.18 rows=99987 width=65)
         ->  Seq Scan on t1  (cost=0.00..2585.25 rows=99987 width=65)
               Filter: (valid = 1)

执行计划有所改善,但仍不是最优。

问题依旧存在:虽然t2使用了索引,但连接条件中的隐式类型转换问题,t2.object_id为varbinary类型,t1.object_id为varchar类型,连接时存在隐式类型转换,导致t1还是使用全表扫描。

六、终极优化方案

6.1 根本问题解决:消除类型转换

消除类型转换有多种方式,可以将数据类型都调整成一样(t2.object_id可以由varbinary改成varchar类型),也可以改写sql,显示处理类型转换:

-- 改写SQL,显式处理类型转换 EXPLAIN SELECT t1.object_id, t2.object_id, t1.size, t1.md5, t1.file_name FROM t2 JOIN t1 ON convert_from(t2.object_id, 'UTF8') = t1.object_id WHERE t2.version_id = 48725 AND t2.valid = 1 AND t2.asset_version_format_id = 0 AND t1.valid = 1;

优化后的执行计划

                                          QUERY PLAN                                          
----------------------------------------------------------------------------------------------
 Nested Loop  (cost=5.18..546.02 rows=40 width=73)
   ->  Bitmap Heap Scan on t2  (cost=5.18..338.83 rows=25 width=8)
         Recheck Cond: (version_id = 48725)
         Filter: ((valid = 1) AND (asset_version_format_id = 0))
         ->  Bitmap Index Scan on idx_t2_version_objectid  (cost=0.00..5.17 rows=123 width=0)
               Index Cond: (version_id = 48725)
   ->  Index Scan using idx_t1_objectid on t1  (cost=0.00..8.28 rows=1 width=65)
         Index Cond: ((object_id)::text = convert_from((t2.object_id)::bytea, 'UTF8'::name))
         Filter: (valid = 1)

6.2 最终执行效果

mydb[15400]=#explain analyze SELECT t1.object_id, t2.object_id, t1.size, t1.md5, t1.file_name mydb-# FROM t2 mydb-# JOIN t1 ON convert_from(t2.object_id, 'UTF8') = t1.object_id mydb-# WHERE t2.version_id = 48725 mydb-# AND t2.valid = 1 mydb-# AND t2.asset_version_format_id = 0 mydb-# AND t1.valid = 1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ Nested Loop (cost=5.18..546.02 rows=40 width=73) (actual time=0.350..4.377 rows=100 loops=1) -> Bitmap Heap Scan on t2 (cost=5.18..338.83 rows=25 width=8) (actual time=0.183..0.367 rows=100 loops=1) Recheck Cond: (version_id = 48725) Filter: ((valid = 1) AND (asset_version_format_id = 0)) Rows Removed by Filter: 2 Heap Blocks: exact=3 -> Bitmap Index Scan on idx_t2_version_objectid (cost=0.00..5.17 rows=123 width=0) (actual time=0.097..0.097 rows=102 loops=1) Index Cond: (version_id = 48725) -> Index Scan using idx_t1_objectid on t1 (cost=0.00..8.28 rows=1 width=65) (actual time=3.175..3.326 rows=100 loops=100) Index Cond: ((object_id)::text = convert_from((t2.object_id)::bytea, 'UTF8'::name)) Filter: (valid = 1) Total runtime: 4.823 ms (12 rows)

最终执行时间:5 ms左右

七、总结

本次优化案例展示了数据库迁移过程中可能遇到的性能陷阱,通过深入分析执行计划,识别出隐式类型转换这一根本问题,并采用针对性的优化策略,最终实现了从570秒到5毫秒的性能飞跃。

优化阶段 执行时间 性能提升 关键改进点
原始SQL 570,000 ms 基准 -
仅添加索引 未显著改善 不明显 添加缺失索引
调整连接顺序 有所改善 一般 使用hint调整连接顺序
最终优化SQL 5 ms 114000倍 消除隐式类型转换

核心优化要点:

  1. 数据类型一致性:确保连接条件两边的数据类型完全一致
  2. 显式类型转换:使用convert_from()等函数显式处理数据
  3. 复合索引设计:针对高频查询条件设计覆盖索引
  4. 统计信息更新:数据变化后及时更新统计信息
  5. 执行计划分析:定期分析关键SQL的执行计划
最后修改时间:2025-12-30 10:00:51
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论