简单易懂:IN 查询语句简单直观,易于理解和编写。 效率高:在处理一组离散的数值时,IN 查询比多次使用 OR 查询效率更高
优化器计算 Cost 时, 大量参数会带来额外的 Cost 计算开销 IN 的动态参数列表会产生大量不同的 SQL 模板, 很容易占满执行计划缓存 大量 IN 参数在分布式场景需要通过预计算裁剪掉一部分, 从而避免对 DN 产生不必要的计算开销
IN 查询与执行计划管理
A prepared statement can be executed with either a generic plan or a custom plan. A generic plan is the same across all executions, while a custom plan is generated for a specific execution using the parameter values given in that call. Use of a generic plan avoids planning overhead, but in some situations a custom plan will be much more efficient to execute because the planner can make use of knowledge of the parameter values. (Of course, if the prepared statement has no parameters, then this is moot and a generic plan is always used.)
prepared statement 可以通过 generic plan 或 custom plan 来执行。generic plan 对所有执行都相同,而 custom plan 则是针对特定执行使用该调用中给出的参数值生成的。使用 generic plan 可以避免规划开销,但在某些情况下,由于规划器可以利用对参数值的了解,custom plan 的执行效率会高得多。(当然,如果 prepared statement 没有参数,那么这一点就无关紧要了,总是会使用 generic plan 。)
-- 翻译自阿里大模型, 引自 PostgreSQL社区技术文章: https://www.postgresql.org/docs/current/sql-prepare.html
SELECT order_info FROM orders WHERE item_id IN(?) AND seller_id IN(?) ORDERY BY gmt;

功能展示
// 创建好 DB 与 表CREATE DATABASE test MODE=AUTO;use testCREATE TABLE tb_h(id bigint not null auto_increment,bid int,name varchar(30),birthday datetime not null,primary key(id))PARTITION BY HASH(id)PARTITIONS 8;
select * from tb_h where id in(1,2,3,4,5);
mysql> select * from information_schema.plan_cache where `sql` like '%tb_h%' limit 10\G*************************** 1. row ***************************COMPUTE_NODE: 10.2.100.161:3021 SCHEMA_NAME: test TABLE_NAMES: tb_h ID: 1d4daf02 HIT_COUNT: 0 SQL: SELECT *FROM tb_hWHERE id IN (?) TYPE_DIGEST: 1926227933676693664 PLAN:Gather LogicalView(tableNames=[[tb_h]]) PARAMETER: [[1,2,3,4,5]]1 row in set (0.01 sec)
mysql> explain select * from tb_h where id in(4566);+----------------------------------------------------------------------------------------------------------------------------------------------+| LOGICAL EXECUTIONPLAN |+----------------------------------------------------------------------------------------------------------------------------------------------+| Gather(concurrent=true) || LogicalView(tables="tb_h[p8]", sql="SELECT `id`, `bid`, `name`, `birthday` FROM `tb_h` AS `tb_h` FORCE INDEX(PRIMARY) WHERE (`id` IN(?))") || HitCache:true || Source:PLAN_CACHE || TemplateId: 1d4daf02 |+----------------------------------------------------------------------------------------------------------------------------------------------+5 rows in set (0.00 sec)mysql> explain select * from tb_h where id in(45, 66);+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| LOGICAL EXECUTIONPLAN |+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| Gather(concurrent=true) || LogicalView(tables="tb_h[p2,p7]", shardCount=2, sql="SELECT `id`, `bid`, `name`, `birthday` FROM `tb_h` AS `tb_h` FORCE INDEX(PRIMARY) WHERE (`id` IN(?))", pruningInfo="all size:2*2(part), pruning size:2") || HitCache:true || Source:PLAN_CACHE || TemplateId: 1d4daf02 |+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+5 rows in set (0.00 sec)
IN 查询与预裁剪
(E AND Col = P1) OR
(E AND Col = P2) OR
(E AND Col = P3) OR
...
(E AND Col = Pn) OR

设定最大计算次数, 从最长的 IN 表达式依次纳入笛卡尔积计算, 中途如果超出了最大计算次数, 则将剩余的 IN 表达式看做一个整体, 不再对其裁剪. 如果单次计算会覆盖所有分片, 则认为该表达式不具备分片裁剪能力, 直接退出裁剪
功能展示
//建表CREATE TABLE `hash_tbl_todays_AutoPruning` (`id` bigint(20) NOT NULL AUTO_INCREMENT,`bid` int(11) DEFAULT NULL,`name` varchar(30) DEFAULT NULL,`birthday` datetime NOT NULL,PRIMARY KEY (`id`),KEY `auto_shard_key_birthday` USING BTREE (`birthday`)) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4PARTITION BY HASH(TO_DAYS(`birthday`))PARTITIONS 8 ;// 将 EXPLAIN_PRUNING_DETAIL 开关打开便于观察 IN 裁剪的效果SET GLOBAL EXPLAIN_PRUNING_DETAIL=TRUE;
裁剪信息说明
mysql> explain select * from hash_tbl_todays_AutoPruning where birthday in ('2024-09-08', '2024-09-10', '2024-10-08', '2023-12-30') and bid>100;+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| LOGICAL EXECUTIONPLAN |+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| Gather(concurrent=true) || LogicalView(tables="hash_tbl_todays_AutoPruning[p1,p4,p7]", shardCount=3, sql="SELECT `id`, `bid`, `name`, `birthday` FROM `hash_tbl_todays_AutoPruning` AS `hash_tbl_todays_AutoPruning` WHERE ((`birthday` IN(?)) AND (`bid` > ?))", pruningInfo="all size:4*3(part), pruning size:8, pruning time:0ms, pruning detail:(TEST_P00000_GROUP, [hash_tbl_todays_AutoPruning_dn1M_00006])->(PruneRaw('2024-09-10'));(TEST_P00000_GROUP, [hash_tbl_todays_AutoPruning_dn1M_00000])->(PruneRaw('2023-12-30'));(TEST_P00001_GROUP, [hash_tbl_todays_AutoPruning_dn1M_00003])->(PruneRaw('2024-09-08','2024-10-08'))") || HitCache:true || Source:PLAN_CACHE || TemplateId: f36182f0 |+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+5 rows in set (0.00 sec)
all size:4*3(part), pruning size:8, pruning time:0ms, pruning detail:(TEST_P00000_GROUP,[hash_tbl_todays_AutoPruning_dn1M_00006])->(PruneRaw('2024-09-10'));(TEST_P00000_GROUP,[hash_tbl_todays_AutoPruning_dn1M_00000])->(PruneRaw('2023-12-30'));(TEST_P00001_GROUP,[hash_tbl_todays_AutoPruning_dn1M_00003])->(PruneRaw('2024-09-08','2024-10-08')
多列 IN 裁剪
mysql> explain select * from hash_tbl_todays_AutoPruning where (name, birthday) in (('xiaoli', '2024-09-08'), ('ming', '2024-09-10'), ('xiaoT', '2024-10-08'), ('yi', '2023-12-30')) and bid>100;+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| LOGICAL EXECUTIONPLAN |+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| Gather(concurrent=true) || LogicalView(tables="hash_tbl_todays_AutoPruning[p1,p4,p7]", shardCount=3, sql="SELECT `id`, `bid`, `name`, `birthday` FROM `hash_tbl_todays_AutoPruning` AS `hash_tbl_todays_AutoPruning` WHERE ((((`name`, `birthday`)) IN(?)) AND (`bid` > ?))", pruningInfo="all size:4*3(part), pruning size:8, pruning time:0ms, pruning detail:(TEST_P00000_GROUP, [hash_tbl_todays_AutoPruning_dn1M_00006])->(PruneRaw(('ming','2024-09-10')));(TEST_P00000_GROUP, [hash_tbl_todays_AutoPruning_dn1M_00000])->(PruneRaw(('yi','2023-12-30')));(TEST_P00001_GROUP, [hash_tbl_todays_AutoPruning_dn1M_00003])->(PruneRaw(('xiaoli','2024-09-08'),('xiaoT','2024-10-08')))") || HitCache:true || Source:PLAN_CACHE || TemplateId: ce4cba38 |+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+5 rows in set (0.00 sec)
多个 IN 表达式裁剪

CREATE TABLE tb_k_k_tp( id bigint not null auto_increment, bid int, name varchar(30), birthday datetime not null, primary key(id)) PARTITION BY KEY(bid) PARTITIONS 2SUBPARTITION BY KEY(id) SUBPARTITIONS 4;mysql> explain select * from tb_k_k_tp where bid in (1,2,3,4,5,6,7,8,9,10) and id in (11,12,13,14,15,16);+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| LOGICAL EXECUTIONPLAN |+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| Gather(concurrent=true) || LogicalView(tables="tb_k_k_tp[p1sp1,p1sp2,p1sp3,p1sp4,p2sp1,p2sp2,p2sp3,p2sp4]", shardCount=8, sql="SELECT `id`, `bid`, `name`, `birthday` FROM `tb_k_k_tp` AS `tb_k_k_tp` FORCE INDEX(PRIMARY) WHERE((`id` IN(?)) AND (`bid` IN(?)))", pruningInfo="all size:16*8(part), pruning size:76, pruning time:1ms, pruning detail:(TEST_P00000_GROUP, [tb_k_k_tp_UyWm_00004])->(PruneRaw(1,4,7,8,9,10),PruneRaw(11,12));(TEST_P00001_GROUP, [tb_k_k_tp_UyWm_00005])->(PruneRaw(1,4,7,8,9,10),PruneRaw(13,16));(TEST_P00000_GROUP, [tb_k_k_tp_UyWm_00006])->(PruneRaw(1,4,7,8,9,10),PruneRaw(14));(TEST_P00001_GROUP, [tb_k_k_tp_UyWm_00007])->(PruneRaw(1,4,7,8,9,10),PruneRaw(15));(TEST_P00000_GROUP, [tb_k_k_tp_UyWm_00000])->(PruneRaw(2,3,5,6),PruneRaw(11,12));(TEST_P00000_GROUP, [tb_k_k_tp_UyWm_00002])->(PruneRaw(2,3,5,6),PruneRaw(14));(TEST_P00001_GROUP, [tb_k_k_tp_UyWm_00003])->(PruneRaw(2,3,5,6),PruneRaw(15));(TEST_P00001_GROUP, [tb_k_k_tp_UyWm_00001])->(PruneRaw(2,3,5,6),PruneRaw(13,16))") || HitCache:true || Source:PLAN_CACHE || TemplateId: f70bd95a |+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+5 rows in set (0.00 sec)
性能测试
测试环境:
PolarDB-X数据库规格:8C32G
测试方法:sysbench IN value 测试
SELECT c FROM sbtest1 WHERE id in (?)
IN 数量较小时(5个左右), 由于 CN CPU 最先成为瓶颈, 导致目前不做裁剪 QPS 比裁剪略高.但同时对 dn 的 cpu 和 io 压力会远远高于不做裁剪 IN 数量增加时, 不做裁剪会给 DN 大量的 cpu 和 IO 压力, 从而导致QPS远远不及裁剪
正确性证明
总结
优化 Cost 计算机制:针对含有大量参数的 IN 查询,调整成本评估算法,旨在更准确地预测资源消耗,从而选择更加合理的执行路径。 智能管理执行计划缓存:通过对 parser, 优化器, 执行器的全方位改造, 使得查询引擎具备控制 IN 参数产生的冗余 SQL 模板数量,确保不会因计划缓存溢出而影响整体性能。 增强分布式处理能力:利用高效的预处理技术识别和移除不必要的 IN 参数项,减轻各 DN 数据节点的工作负荷;加快整个查询过程的速度及数据库的整体性能表现。




