适用范围
在V3.2.4上进行测试验证
存储层估行介绍
我们知道统计信息收集是发生在计划生成之前的,因此如果在收集完统计信息之后,又发生了大量的增删改操作,但是又没有及时的重新收集统计信息,这个时候就会可能导致计划生成因为使用了过期的统计信息而不能准确的估算行数,从而不能选择到最优的执行计划。因此,为了解决这类问题,OceanBase 数据库优化器支持了实时的存储层估行机制,以获得更加实时准确的统计信息。
计划生成的时候,会基于表的索引生成多条基表的扫描路径,简称“基表路径”;由于 OceanBase 数据库存储层针对索引都是以树形结构存在,因此在计划生成的时候会针对每一个基表路径都会基于表的谓词条件生成多个查询范围区域(简称“QUERY RANGE”),以便快速扫描获取指定的数据区域,避免无效的数据扫描。因此,基于这个特点,OceanBase 数据库优化器会在计划生成的时候,把基表路径上的相关 QUERY RANGE 提前发给存储层,存储层快速的估算 QUERY RANGE 的总行数返回给优化器。从而优化器就能得到一个实时准确的行数,这一过程就是存储层估行。
但是并不是所有的基表路径都能够进行存储层估行的,由于存储层估行其实也是比较耗时的操作,为了计划生成的整体时间耗时考虑,对于存储层估行做了以下两个限制:
- 基表路径上的 QUERY RANGE 个数不能太多,当前限制不能超过 10 个,否则不进行存储层估行。
- 分区表基表路径上不能分区裁剪出只有一个分区,则不能进行存储层估行,非分区表无此限制。(经验证可以存储估行。裁剪后多分区取最大分区估行后*分区数)
需要注意的是,存储层估行仅仅应用于基表路径上,并不适用于其他算子,比如 JOIN/GROUP BY 等等。
JOIN列不能进行存储层估行
通过存储层估行的描述可以看到,其实现是基于拿到实际谓词实际值,生成QUERY RANGE,基于索引结构, 去实际实时进行估行的(基于B+TREE 分支),JOIN列只能在实际执行后才能拿到外表传入的JOIN列值,生成执行计划阶段没有值,所以不能进行存储层估行,只能基于统计信息的NDK等值进行估行。在数据倾斜,数据变化量大时,可能选择不好的执行路径。当然这不能算是OB的问题,因为机制上在PLAN生成阶段没办法实现。
测试验证
-
准备测试数据
drop table taba; create table taba(id number,id2 number,c1 varchar2(100)); insert into taba select rownum,rownum,'test'||rownum from dual connect by rownum<=100000; commit; update taba set id2=0 where id2<=50000; commit; create index idx_taba_id on taba(id); create index idx_taba_id2 on taba(id2); alter system major freeze; -
简单查询,id2=0,估行为29410,相对准确,所以选择的全表扫
obclient [SYS]> explain extended select * from taba where id2=0\G *************************** 1. row *************************** Query Plan: ==================================== |ID|OPERATOR |NAME|EST. ROWS|COST | ------------------------------------ |0 |TABLE SCAN|TABA|29410 |40790| ==================================== Outputs & filters: ------------------------------------- 0 - output([TABA.ID(0x7f714661c690)], [TABA.ID2(0x7f714661c290)], [TABA.C1(0x7f714661c990)]), filter([TABA.ID2(0x7f714661c290) = 0(0x7f714661bb70)]), access([TABA.ID2(0x7f714661c290)], [TABA.ID(0x7f714661c690)], [TABA.C1(0x7f714661c990)]), partitions(p0), is_index_back=false, filter_before_indexback[false], range_key([TABA.__pk_increment(0x7f7146667ab0)]), range(MIN ; MAX)always true Used Hint: ------------------------------------- /*+ */ Outline Data: ------------------------------------- /*+ BEGIN_OUTLINE_DATA FULL(@"SEL$1" "SYS.TABA"@"SEL$1") END_OUTLINE_DATA */ Plan Type: ------------------------------------- LOCAL Optimization Info: ------------------------------------- TABA:table_rows:100000, physical_range_rows:100000, logical_range_rows:100000, index_back_rows:0, output_rows:2, est_method:local_storage, optimization_method=cost_based, avaiable_index_name[IDX_TABA_ID2,TABA], pruned_index_name[IDX_TABA_ID], estimation info[table_id:1100611139454067, (table_type:1, version:0-1717060578799299-1717060578799299, logical_rc:100000, physical_rc:100000), (table_type:7, version:1717060568710874-1717060568710874-1717060592120189, logical_rc:0, physical_rc:0), (table_type:5, version:1717060568710874-1717060568710874-1717060592120189, logical_rc:0, physical_rc:0), (table_type:0, version:1717060592120189-1717060592120189-9223372036854775807, logical_rc:0, physical_rc:0)] Parameters: ------------------------------------- 1 row in set (0.006 sec)physical_range_rows:100000 这里估出的是走主键索引需要扫描的行数。这里是全表,扫描的键值如下
range_key([TABA.__pk_increment(0x7f7146667ab0)]), range(MIN ; MAX)always true
估行是基于索引IDX_TABA_ID2的,我们可以通过HINT指定该索引查看执行计划physical_range_rows:29410,与EST. ROWS一致进行确认。
obclient [SYS]> explain extended select /*+index(taba IDX_TABA_ID2) */ * from taba where id2=0\G *************************** 1. row *************************** Query Plan: =================================================== |ID|OPERATOR |NAME |EST. ROWS|COST | --------------------------------------------------- |0 |TABLE SCAN|TABA(IDX_TABA_ID2)|29410 |113263| =================================================== Outputs & filters: ------------------------------------- 0 - output([TABA.ID(0x7f9f3125cb10)], [TABA.ID2(0x7f9f3125c710)], [TABA.C1(0x7f9f3125ce10)]), filter(nil), access([TABA.ID2(0x7f9f3125c710)], [TABA.ID(0x7f9f3125cb10)], [TABA.C1(0x7f9f3125ce10)]), partitions(p0), is_index_back=true, range_key([TABA.ID2(0x7f9f3125c710)], [TABA.__pk_increment(0x7f9f312a7f20)]), range(0,MIN ; 0,MAX), range_cond([TABA.ID2(0x7f9f3125c710) = 0(0x7f9f3125bff0)]) Used Hint: ------------------------------------- /*+ INDEX(@"SEL$1" "SYS.TABA"@"SEL$1" "IDX_TABA_ID2") */ Outline Data: ------------------------------------- /*+ BEGIN_OUTLINE_DATA INDEX(@"SEL$1" "SYS.TABA"@"SEL$1" "IDX_TABA_ID2") END_OUTLINE_DATA */ Plan Type: ------------------------------------- LOCAL Optimization Info: ------------------------------------- TABA:table_rows:100000, physical_range_rows:29410, logical_range_rows:29410, index_back_rows:29410, output_rows:29410, est_method:local_storage, optimization_method=cost_based, avaiable_index_name[IDX_TABA_ID2], pruned_index_name[IDX_TABA_ID,TABA], estimation info[table_id:1100611139454069, (table_type:1, version:0-1717060578799299-1717060578799299, logical_rc:29410, physical_rc:29410), (table_type:7, version:1717060568710874-1717060568710874-1717060592120189, logical_rc:0, physical_rc:0), (table_type:5, version:1717060568710874-1717060568710874-1717060592120189, logical_rc:0, physical_rc:0), (table_type:0, version:1717060592120189-1717060592120189-9223372036854775807, logical_rc:0, physical_rc:0)] Parameters: ------------------------------------- 1 row in set (0.003 sec) -
最终测试SQL,JOIN列数据存在倾斜
explain extended select * from taba a,taba b where a.id=1 and a.id2=b.id2 obclient [SYS]> select * from taba where id=1; +------+------+-------+ | ID | ID2 | C1 | +------+------+-------+ | 1 | 0 | test1 | +------+------+-------+ 1 row in set (0.005 sec) id2 数据倾斜 obclient [SYS]> select count(*) from taba where id2=0; +----------+ | COUNT(*) | +----------+ | 50000 | +----------+ 1 row in set (0.033 sec) -
通过JOIN a.id2=b.id2,则估行为3,估行与实际行数严重不符,说明JOIN时是基于统计信息进行估行,在数据倾斜时估行不准确导致生成性能差的执行计划。
obclient [SYS]> explain extended select * from taba a,taba b where a.id=1 and a.id2=b.id2\G *************************** 1. row *************************** Query Plan: ==================================================== |ID|OPERATOR |NAME |EST. ROWS|COST| ---------------------------------------------------- |0 |NESTED-LOOP JOIN| |3 |137 | |1 | TABLE SCAN |A(IDX_TABA_ID) |1 |92 | |2 | TABLE SCAN |B(IDX_TABA_ID2)|3 |44 |<<<该SQL JOIN时使用的id2=0,实际行数为5000,这时HASH JOIN更好 ==================================================== Outputs & filters: ------------------------------------- 0 - output([A.ID(0x7f714661d090)], [A.ID2(0x7f714661e210)], [A.C1(0x7f714661e910)], [B.ID(0x7f714661ec10)], [B.ID2(0x7f714661e510)], [B.C1(0x7f714661ef10)]), filter(nil), conds(nil), nl_params_([A.ID2(0x7f714661e210)]), batch_join=true 1 - output([A.ID(0x7f714661d090)], [A.ID2(0x7f714661e210)], [A.C1(0x7f714661e910)]), filter(nil), access([A.ID(0x7f714661d090)], [A.ID2(0x7f714661e210)], [A.C1(0x7f714661e910)]), partitions(p0), is_index_back=true, range_key([A.ID(0x7f714661d090)], [A.__pk_increment(0x7f7146670c60)]), range(1,MIN ; 1,MAX), range_cond([A.ID(0x7f714661d090) = 1(0x7f714661c970)]) 2 - output([B.ID2(0x7f714661e510)], [B.ID(0x7f714661ec10)], [B.C1(0x7f714661ef10)]), filter(nil), access([B.ID2(0x7f714661e510)], [B.ID(0x7f714661ec10)], [B.C1(0x7f714661ef10)]), partitions(p0), is_index_back=true, range_key([B.ID2(0x7f714661e510)], [B.__pk_increment(0x7f714667ab50)]), range(MIN ; MAX), range_cond([? = B.ID2(0x7f714661e510)(0x7f714668bf50)]) Used Hint: ------------------------------------- /*+ */ Outline Data: ------------------------------------- /*+ BEGIN_OUTLINE_DATA LEADING(@"SEL$1" ("SYS.A"@"SEL$1" "SYS.B"@"SEL$1" )) USE_NL(@"SEL$1" ("SYS.B"@"SEL$1" )) PQ_DISTRIBUTE(@"SEL$1" ("SYS.B"@"SEL$1" ) LOCAL LOCAL) NO_USE_NL_MATERIALIZATION(@"SEL$1" ("SYS.B"@"SEL$1" )) INDEX(@"SEL$1" "SYS.A"@"SEL$1" "IDX_TABA_ID") INDEX(@"SEL$1" "SYS.B"@"SEL$1" "IDX_TABA_ID2") END_OUTLINE_DATA */ Plan Type: ------------------------------------- LOCAL Optimization Info: ------------------------------------- A:table_rows:100000, physical_range_rows:1, logical_range_rows:1, index_back_rows:1, output_rows:1, est_method:local_storage, optimization_method=cost_based, avaiable_index_name[IDX_TABA_ID,IDX_TABA_ID2], unstable_index_name[TABA], estimation info[table_id:1100611139454068, (table_type:1, version:0-1717060578799299-1717060578799299, logical_rc:1, physical_rc:1), (table_type:7, version:1717060568710874-1717060568710874-1717060592120189, logical_rc:0, physical_rc:0), (table_type:5, version:1717060568710874-1717060568710874-1717060592120189, logical_rc:0, physical_rc:0), (table_type:0, version:1717060592120189-1717060592120189-9223372036854775807, logical_rc:0, physical_rc:0)] B:table_rows:100000, physical_range_rows:2, logical_range_rows:2, index_back_rows:2, output_rows:2, est_method:local_storage, optimization_method=cost_based, avaiable_index_name[IDX_TABA_ID2], pruned_index_name[IDX_TABA_ID], unstable_index_name[TABA], estimation info[table_id:1100611139454069, (table_type:1, version:0-1717060578799299-1717060578799299, logical_rc:100000, physical_rc:100000), (table_type:7, version:1717060568710874-1717060568710874-1717060592120189, logical_rc:0, physical_rc:0), (table_type:5, version:1717060568710874-1717060568710874-1717060592120189, logical_rc:0, physical_rc:0), (table_type:0, version:1717060592120189-1717060592120189-9223372036854775807, logical_rc:0, physical_rc:0)] Parameters: ------------------------------------- 1 row in set (0.006 sec)




