7.5 如何诊断和调优 OceanBase SQL 执行计划
如何使用 EXPLAIN 查看理论执行计划
查看执行计划需要一个好的客户端,这样能将注意力集中在执行计划上。推荐使用以下客户端:
命令行 OBClient 命令。OBClient 的好处是格式化显示结果,当一行结果太长的时候格式会不美观,这个时候可以让 SQL 以
\G替换;结尾,结果集会列传行展示,提高可读性。OceanBase 官方客户端 ODC。直接选中 SQL,点击右上角的执行计划,会以表格形式展示执行计划。
开源的 DBeaver 客户端工具。由于 DBeaver 和 OceanBase 还没有紧密适配,所以只能以 EXPLAIN SQL 的方式查看执行计划,目前这种方式体验最好。
EXPLAIN 说明
EXPLAIN 命令完整的语法是:
{EXPLAIN | DESCRIBE | DESC}
[BASIC | OUTLINE | EXTENDED | EXTENDED_NOADDR | PARTITIONS | FORMAT = {TRADITIONAL| JSON}]
{SELECT statement | DELETE statement | INSERT statement | REPLACE statement| UPDATE statement}学习遵从由简入繁原则,先从默认的 BASIC 格式入手,不用指定。FORMAT 有 TRADITIONAL 和 JSON 两种格式,默认是 TRADITIONAL 格式,可读性更好,JSON 格式对程序解析比较友好。
先看一个简单的 SQL 执行计划格式:
EXPLAIN SELECT count(*) FROM BMSQL_ITEM;
obclient> EXPLAIN
-> SELECT count(*) FROM BMSQL_ITEM \G
*************************** 1. row ***************************
Query Plan: ===============================================
|ID|OPERATOR |NAME |EST. ROWS|COST |
-----------------------------------------------
|0 |SCALAR GROUP BY| |1 |78754|
|1 | TABLE SCAN |BMSQL_ITEM|99995 |59653|
===============================================
Outputs & filters:
-------------------------------------
0 - output([T_FUN_COUNT(*)]), filter(nil),
group(nil), agg_func([T_FUN_COUNT(*)])
1 - output([1]), filter(nil),
access([BMSQL_ITEM.I_ID]), partitions(p0)
1 row in set (0.01 sec)整个 EXPLAIN 结果是一行。
在 OceanBase 内部,这个结果是以 JSON 格式存储。示例如下:
{
"ID": 1,
"OPERATOR": "GROUP BY",
"NAME": "GROUP BY",
"EST.ROWS": 1,
"COST": 4352454,
"output": [
"T_FUN_COUNT(*)"
],
"CHILD_1": {
"ID": 0,
"OPERATOR": "TABLE SCAN",
"NAME": "TABLE SCAN",
"EST.ROWS": 10000000,
"COST": 2442330,
"output": [
"1"
]
}
}这个 JSON 内容描述的是一个树,对普通用户可读性不好。
传统格式的展示分为两部分:
第一部分是用表格形式展示执行计划这棵树。每行是一个独立的操作,操作符是
OPERATOR,操作有 ID。操作展示可能会缩进。缩进表示是内部操作,可以嵌套。执行顺序遵循由内到外,由上到下。操作符支持的内容也是 SQL 引擎成熟度的一个体现。OPERATOR:表示操作算子的名称,TABLE SCAN是常用操作算子,表示扫描。NAME:表示算子操作的对象。可以是表名、索引名、内部临时视图名。需要注意的是,如果扫描主键,依然展示表名。因为 OceanBase 里的表和索引的本质都是索引组织表,表数据跟主键索引是一个概念。EST. ROWS:执行当前算子输出的行数,跟统计信息有关。OceanBase 里表的统计信息目前只有在集群合并的时候才更新。COST:执行当前算子预估的成本。COST 计算比较复杂,暂时先不深入。
第二部分的内容跟第一部分有关,主要是描述第一部分算子的具体信息。
output:表示当前算子输出的表达式(包含列)。filter:表示当前算子的过滤表达式,nil表示无。如果当前算子是访问存储层,这个过滤表达式可以下推(push)。
常见执行计划算子
TABLE GET
表示主键直接等值访问,后面接表名。OceanBase 里主键就是表数据。
EXPLAIN
SELECT i_id, i_name, I_PRICE FROM BMSQL_ITEM WHERE i_id=10
\G
*************************** 1. row ***************************
Query Plan: ========================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
----------------------------------------
|0 |TABLE GET|BMSQL_ITEM|1 |53 |
========================================
Outputs & filters:
-------------------------------------
0 - output([BMSQL_ITEM.I_ID], [BMSQL_ITEM.I_NAME], [BMSQL_ITEM.I_PRICE]), filter(nil),
access([BMSQL_ITEM.I_ID], [BMSQL_ITEM.I_NAME], [BMSQL_ITEM.I_PRICE]), partitions(p0)
1 row in set (0.00 sec)TABLE SCAN
表示全表扫描、主键扫描或索引扫描。具体看后面的操作对象名是表还是索引。
注意
扫描主键也是表名。
CREATE UNIQUE INDEX idx_item_uk ON bmsql_item(i_name);
EXPLAIN extended_noaddr
SELECT i_id, i_name, I_PRICE FROM BMSQL_ITEM WHERE i_name = 'w2uw7BJj5tG5BTlSdfT'
\G
*************************** 1. row ***************************
Query Plan: ======================================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
------------------------------------------------------
|0 |TABLE SCAN|BMSQL_ITEM(IDX_ITEM_UK)|1 |88 |
======================================================
Outputs & filters:
-------------------------------------
0 - output([BMSQL_ITEM.I_ID], [BMSQL_ITEM.I_NAME], [BMSQL_ITEM.I_PRICE]), filter(nil),
access([BMSQL_ITEM.I_NAME], [BMSQL_ITEM.I_ID], [BMSQL_ITEM.I_PRICE]), partitions(p0),
is_index_back=true,
range_key([BMSQL_ITEM.I_NAME], [BMSQL_ITEM.shadow_pk_0]), range(w2uw7BJj5tG5BTlSdfT,MIN ; w2uw7BJj5tG5BTlSdfT,MAX),
range_cond([BMSQL_ITEM.I_NAME = 'w2uw7BJj5tG5BTlSdfT'])
1 row in set (0.00 sec)说明:
range_cond:条件中能用于索引的列。range_key:索引中能用于过滤的列及其值域。
TOP-N SORT 和 LIMIT
TOP-N SORT:常用的场景排序后可能只返回最大或最小的前 N 条记录。
LIMIT:限制返回的行数。
MySQL [tpccdb]> explain extended_noaddr SELECT i_id, i_name FROM BMSQL_ITEM WHERE i_name LIKE 'w2%' ORDER BY I_PRICE DESC limit 10 \G
输出:
*************************** 1. row ***************************
Query Plan: ========================================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
--------------------------------------------------------
|0 |LIMIT | |10 |479 |
|1 | TOP-N SORT | |10 |478 |
|2 | TABLE SCAN|bmsql_item(idx_item_uk)|61 |406 |
========================================================
Outputs & filters:
-------------------------------------
0 - output([bmsql_item.i_id], [bmsql_item.i_name]), filter(nil), limit(10), offset(nil)
1 - output([bmsql_item.i_id], [bmsql_item.i_name]), filter(nil), sort_keys([bmsql_item.i_price, DESC]), topn(10)
2 - output([bmsql_item.i_name], [bmsql_item.i_id], [bmsql_item.i_price]), filter(nil),
access([bmsql_item.i_name], [bmsql_item.i_id], [bmsql_item.i_price]), partitions(p0),
is_index_back=true,
range_key([bmsql_item.i_name], [bmsql_item.shadow_pk_0]), range(w2,MIN ; w2������������������������������� ,MAX),
range_cond([(T_OP_LIKE, bmsql_item.i_name, 'w2%', '\\')])
1 row in set (0.002 sec)NESTED-LOOP JOIN
EXPLAIN extended_noaddr SELECT w.W_ID , w.W_NAME ,w.W_CITY ,d.D_ID ,d.D_NAME ,d.D_STATE FROM BMSQL_WAREHOUSE w JOIN BMSQL_DISTRICT d ON (w.w_id = d.D_W_ID) WHERE w.W_ID =1 ORDER BY D.D_name ;
输出:
====================================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
----------------------------------------------------
|0 |SORT | |10 |154 |
|1 | NESTED-LOOP JOIN CARTESIAN| |10 |97 |
|2 | TABLE GET |W |1 |53 |
|3 | TABLE SCAN |D |10 |39 |
====================================================
Outputs & filters:
-------------------------------------
0 - output([W.W_ID], [W.W_NAME], [W.W_CITY], [D.D_ID], [D.D_NAME], [D.D_STATE]), filter(nil), sort_keys([D.D_NAME, ASC])
1 - output([W.W_ID], [W.W_NAME], [W.W_CITY], [D.D_ID], [D.D_NAME], [D.D_STATE]), filter(nil),
conds(nil), nl_params_(nil), inner_get=false, self_join=false, batch_join=false
2 - output([W.W_ID], [W.W_NAME], [W.W_CITY]), filter(nil),
access([W.W_ID], [W.W_NAME], [W.W_CITY]), partitions(p0),
is_index_back=false,
range_key([W.W_ID]), range[1 ; 1],
range_cond([W.W_ID = 1])
3 - output([D.D_ID], [D.D_NAME], [D.D_STATE]), filter(nil),
access([D.D_ID], [D.D_NAME], [D.D_STATE]), partitions(p0),
is_index_back=false,
range_key([D.D_W_ID], [D.D_ID]), range(1,MIN ; 1,MAX),
range_cond([D.D_W_ID = 1])说明:
这个算法的整体性能取决于外部表返回的记录数(循环的次数)和内部表的查询性能。
经验是小表作为外部表,大表作为内部表。不过实际并不是按照表的大小区分,而是由过滤条件应用后的结果集大小来定。可以对比下面 SQL 的执行计划。
两个表的访问都走了主键索引,主键就是数据,所以执行计划中每个算子后面的名称(
NAME)都是表名。注意 2 和 3 里的is_index_back=false。通过 2 和 3 的
range_cond可知,这里针对内部表和外部表扫描时都传入了条件W_ID=1。所以这里的JOIN条件被消除,外层连接就变为笛卡儿积(CARTESIAN)。这也是常用的一个优化手段。
MERGE JOIN
MERGE JOIN 主要用于两个不是很小或很大的结果集的连接,它们没有有效的过滤条件或者这个条件上没有合适的索引。
MERGE JOIN 算法基本分两大阶段:
排序,将两个结果集分别按连接字段排序。
合并,分别从两个结果集里读取记录,进行比较、遍历等。
如果结果集本来就是有序的,那么第一阶段可以优化。MERGE JOIN 可以用于等值运算,也可以用于不等值运算(小于、大于、小于等于、大于等于)。
MERGE JOIN 主要利用数据主键或者索引的有序,此时它的性能有可能会更好。数据量非常大的时候,MERGE JOIN 性能并不是很好,要设法规避。
EXPLAIN extended_noaddr SELECT w.W_ID , w.W_NAME ,w.W_CITY ,d.D_ID ,d.D_NAME ,d.D_STATE FROM BMSQL_WAREHOUSE w JOIN BMSQL_DISTRICT d ON (w.w_id = d.D_W_ID) ORDER BY d.D_NAME ;
输出:
======================================
|ID|OPERATOR |NAME|EST. ROWS|COST |
--------------------------------------
|0 |SORT | |1000 |10093|
|1 | MERGE JOIN | |1000 |1312 |
|2 | TABLE SCAN|W |100 |59 |
|3 | TABLE SCAN|D |1000 |426 |
======================================
Outputs & filters:
-------------------------------------
0 - output([W.W_ID], [W.W_NAME], [W.W_CITY], [D.D_ID], [D.D_NAME], [D.D_STATE]), filter(nil), sort_keys([D.D_NAME, ASC])
1 - output([W.W_ID], [W.W_NAME], [W.W_CITY], [D.D_ID], [D.D_NAME], [D.D_STATE]), filter(nil),
equal_conds([W.W_ID = D.D_W_ID]), other_conds(nil)
2 - output([W.W_ID], [W.W_NAME], [W.W_CITY]), filter(nil),
access([W.W_ID], [W.W_NAME], [W.W_CITY]), partitions(p0),
is_index_back=false,
range_key([W.W_ID]), range(MIN ; MAX)always true
3 - output([D.D_W_ID], [D.D_ID], [D.D_NAME], [D.D_STATE]), filter(nil),
access([D.D_W_ID], [D.D_ID], [D.D_NAME], [D.D_STATE]), partitions(p0),
is_index_back=false,
range_key([D.D_W_ID], [D.D_ID]), range(MIN,MIN ; MAX,MAX)always true说明:
两个表在连接条件列上都有主键索引(即数据),所以返回的数据都是有序的,此处使用了
MERGE JOIN。在外层再根据排序条件执行排序(
SORT)。MERGE JOIN可以临时排序的大小受集群隐含参数(_sort_area_size)限制。
HASH JOIN
HASH JOIN 用于两个比较大的结果集之间的连接,通常没有比较好的过滤条件或者过滤条件上没有索引。
EXPLAIN extended_noaddr SELECT c.C_W_ID , c.C_D_ID , c.C_FIRST || ',' || c.C_LAST , c.C_PAYMENT_CNT , c.C_YTD_PAYMENT , o.O_W_ID , o.O_D_ID ,o.O_OL_CNT FROM BMSQL_CUSTOMER c JOIN BMSQL_OORDER o ON (c.C_ID = o.O_C_ID) WHERE c.C_W_ID = 1 AND c.C_D_ID = 5 AND (c.C_W_ID <> o.O_W_ID OR c.C_D_ID <> o.O_D_ID ) ORDER BY c.C_W_ID , c.C_D_ID , c.C_LAST , o.O_W_ID , o.O_D_ID ;
输出:
=========================================
|ID|OPERATOR |NAME|EST. ROWS|COST |
-----------------------------------------
|0 |SORT | |2238162 |45593377|
|1 | HASH JOIN | |2238162 |3764643 |
|2 | TABLE SCAN|C |2973 |3515 |
|3 | TABLE SCAN|O |3019348 |785737 |
=========================================
Outputs & filters:
-------------------------------------
0 - output([C.C_W_ID], [C.C_D_ID], [(T_OP_CNN, (T_OP_CNN, C.C_FIRST, ?), C.C_LAST)], [C.C_PAYMENT_CNT], [C.C_YTD_PAYMENT], [O.O_W_ID], [O.O_D_ID], [O.O_OL_CNT]), filter(nil), sort_keys([C.C_LAST, ASC], [O.O_W_ID, ASC], [O.O_D_ID, ASC])
1 - output([C.C_W_ID], [C.C_D_ID], [C.C_FIRST], [C.C_LAST], [C.C_PAYMENT_CNT], [C.C_YTD_PAYMENT], [O.O_W_ID], [O.O_D_ID], [O.O_OL_CNT]), filter(nil),
equal_conds([C.C_ID = O.O_C_ID]), other_conds([C.C_W_ID != O.O_W_ID OR C.C_D_ID != O.O_D_ID])
2 - output([C.C_ID], [C.C_W_ID], [C.C_D_ID], [C.C_FIRST], [C.C_LAST], [C.C_PAYMENT_CNT], [C.C_YTD_PAYMENT]), filter(nil),
access([C.C_ID], [C.C_W_ID], [C.C_D_ID], [C.C_FIRST], [C.C_LAST], [C.C_PAYMENT_CNT], [C.C_YTD_PAYMENT]), partitions(p0),
is_index_back=false,
range_key([C.C_W_ID], [C.C_D_ID], [C.C_ID]), range(1,5,MIN ; 1,5,MAX),
range_cond([C.C_W_ID = 1], [C.C_D_ID = 5])
3 - output([O.O_C_ID], [O.O_W_ID], [O.O_D_ID], [O.O_OL_CNT]), filter(nil),
access([O.O_C_ID], [O.O_W_ID], [O.O_D_ID], [O.O_OL_CNT]), partitions(p0),
is_index_back=false,
range_key([O.O_W_ID], [O.O_D_ID], [O.O_ID]), range(MIN,MIN,MIN ; MAX,MAX,MAX)always true说明:
尽管条件
(c.C_W_ID <> o.O_W_ID OR c.C_D_ID <> o.O_D_ID )不在 JOIN 的 ON 条件里,优化器还是能识别出来并自动应用,并且将条件分类为equal_conds和other_conds。HASH JOIN也分外部表和内部表。内部表是probe table,外部表是hash table。通常数据库会挑选结果集相对小的表作为外部表,并在连接条件上用哈希函数构建
hash table。然后循环遍历probe table,对连接条件列用哈希函数,探测是否在hash table中存在。如果存在,则返回匹配的记录。这个跟
NESTED-LOOP JOIN很类似,不同之处是HASH JOIN会在连接条件列上用哈希函数,并在内存中构建hash table。OceanBase 优化器一次能构建的最大
hash table受内部参数(_hash_table_size)限制。如果外部表的结果集比这个大,就需要分多次构建hash table,这个也叫multiple pass,会涉及到一些内存和文件数据交换,以及多次哈希探测,性能相对会下降一些。HASH JOIN的细节比较复杂,此处不详细讨论。目前只要能识别出HASH JOIN,以及掌握产生后如何规避HASH JOIN算法。
SUBPLAN SCAN 和 COUNT
算子 SUBPLAN SCAN 跟 TABLE SCAN 类似,不同的是:
SUBPLAN SCAN是从视图(包括内部临时生成的)里读取数据。TABLE SCAN是从基表(或者索引)里扫描数据。
还是前面列举的分页查询场景例子。由于在 SELECT 后面增加了一列排序序号(计算出来的),执行计划就多了两步。
EXPLAIN extended_noaddr SELECT i_id, i_name, i_price, i_data , rn FROM ( SELECT rownum rn, i_id, i_name, I_PRICE, I_DATA FROM ( SELECT i_id, i_name, i_price , i_data FROM BMSQL_ITEM WHERE i_name LIKE 'wu%' ORDER BY i_price DESC ) t WHERE rownum <= 15 ) WHERE rn > 10 ;
输出:
============================================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
------------------------------------------------------------
|0 |COUNT | |5 |332 |
|1 | SUBPLAN SCAN |T |5 |331 |
|2 | LIMIT | |5 |330 |
|3 | TOP-N SORT | |15 |328 |
|4 | TABLE SCAN|BMSQL_ITEM(BMSQL_ITEM_UK)|30 |245 |
============================================================
Outputs & filters:
-------------------------------------
0 - output([T.I_ID], [T.I_NAME], [T.I_PRICE], [T.I_DATA], [rownum() + ?]), filter(nil)
1 - output([T.I_ID], [T.I_NAME], [T.I_PRICE], [T.I_DATA]), filter(nil),
access([T.I_ID], [T.I_NAME], [T.I_PRICE], [T.I_DATA])
2 - output([BMSQL_ITEM.I_ID], [BMSQL_ITEM.I_NAME], [BMSQL_ITEM.I_PRICE], [BMSQL_ITEM.I_DATA]), filter(nil), limit(?), offset(?)
3 - output([BMSQL_ITEM.I_ID], [BMSQL_ITEM.I_NAME], [BMSQL_ITEM.I_PRICE], [BMSQL_ITEM.I_DATA]), filter(nil), sort_keys([BMSQL_ITEM.I_PRICE, DESC]), topn(? + ?)
4 - output([BMSQL_ITEM.I_NAME], [BMSQL_ITEM.I_ID], [BMSQL_ITEM.I_PRICE], [BMSQL_ITEM.I_DATA]), filter(nil),
access([BMSQL_ITEM.I_NAME], [BMSQL_ITEM.I_ID], [BMSQL_ITEM.I_PRICE], [BMSQL_ITEM.I_DATA]), partitions(p0),
is_index_back=true,
range_key([BMSQL_ITEM.I_NAME], [BMSQL_ITEM.shadow_pk_0]), range(wu,MIN ; wu������������������������������� ,MAX),
range_cond([(T_OP_LIKE, BMSQL_ITEM.I_NAME, ?, '\')])上面 COUNT 算子是为了兼容 Oracle 的 rownum 功能,作用在于为 rownum 表达式进行自增操作。
在一般场景下,当 SQL 查询含有 rownum 时,SQL 优化器就会在生成计划的时候分配一个 COUNT 算子。如果 SELECT 里不涉及到 rownum 值的展示时,COUNT 算子也可能会被优化为 LIMIT 算子。
EXPLAIN extended_noaddr SELECT * FROM ( SELECT i_id, i_name FROM BMSQL_ITEM WHERE i_name LIKE 'w2u%' ORDER BY I_PRICE DESC ) WHERE rownum < 5 ;
输出:
==========================================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
----------------------------------------------------------
|0 |LIMIT | |1 |91 |
|1 | TOP-N SORT | |1 |90 |
|2 | TABLE SCAN|BMSQL_ITEM(BMSQL_ITEM_UK)|1 |88 |
==========================================================
Outputs & filters:
-------------------------------------
0 - output([BMSQL_ITEM.I_ID], [BMSQL_ITEM.I_NAME]), filter(nil), limit(?), offset(nil)
1 - output([BMSQL_ITEM.I_ID], [BMSQL_ITEM.I_NAME]), filter(nil), sort_keys([BMSQL_ITEM.I_PRICE, DESC]), topn(?)
2 - output([BMSQL_ITEM.I_NAME], [BMSQL_ITEM.I_ID], [BMSQL_ITEM.I_PRICE]), filter(nil),
access([BMSQL_ITEM.I_NAME], [BMSQL_ITEM.I_ID], [BMSQL_ITEM.I_PRICE]), partitions(p0),
is_index_back=true,
range_key([BMSQL_ITEM.I_NAME], [BMSQL_ITEM.shadow_pk_0]), range(w2u,MIN ; w2u�������������������������������,MAX),
range_cond([(T_OP_LIKE, BMSQL_ITEM.I_NAME, ?, '\')])
EXCHANGE IN|OUT REMOTE
首先看要访问表的主副本节点,然后直连另外一个节点。人为构造一个远程执行计划。
主键扫描示例
explain extended_Noaddr select /*+ test20210405 */ * from bmsql_item where i_id=100\G obclient: [Warning] Using a password on the command line interface can be insecure. *************************** 1. row *************************** Query Plan: =================================================== |ID|OPERATOR |NAME |EST. ROWS|COST| --------------------------------------------------- |0 |EXCHANGE IN REMOTE | |1 |54 | |1 | EXCHANGE OUT REMOTE| |1 |53 | |2 | TABLE GET |BMSQL_ITEM|1 |53 | =================================================== Outputs & filters: ------------------------------------- 0 - output([BMSQL_ITEM.I_ID], [BMSQL_ITEM.I_NAME], [BMSQL_ITEM.I_PRICE], [BMSQL_ITEM.I_DATA], [BMSQL_ITEM.I_IM_ID]), filter(nil) 1 - output([BMSQL_ITEM.I_ID], [BMSQL_ITEM.I_NAME], [BMSQL_ITEM.I_PRICE], [BMSQL_ITEM.I_DATA], [BMSQL_ITEM.I_IM_ID]), filter(nil) 2 - output([BMSQL_ITEM.I_ID], [BMSQL_ITEM.I_NAME], [BMSQL_ITEM.I_PRICE], [BMSQL_ITEM.I_DATA], [BMSQL_ITEM.I_IM_ID]), filter(nil), access([BMSQL_ITEM.I_ID], [BMSQL_ITEM.I_NAME], [BMSQL_ITEM.I_PRICE], [BMSQL_ITEM.I_DATA], [BMSQL_ITEM.I_IM_ID]), partitions(p0), is_index_back=false, range_key([BMSQL_ITEM.I_ID]), range[100 ; 100], range_cond([BMSQL_ITEM.I_ID = 100])
Exchange算子是分布式场景下,用于线程间进行数据交互的算子。它一般成对出现,数据源端有一个out算子,目的端会有一个in算子。算子 2 是实际取数据的执行计划,
TABLE GET是主键访问数据的算子。算子 1 的
EXCHANGE OUT REMOTE在远端机器上负责读取并传输数据。算子 0 的
EXCHANGE IN REMOTE在本地节点接收数据。
索引扫描示例
explain extended_Noaddr select /*+ test20210405 */ * from bmsql_item where i_name LIKE 'Ax%'\G obclient: [Warning] Using a password on the command line interface can be insecure. *************************** 1. row *************************** Query Plan: ==================================================================== |ID|OPERATOR |NAME |EST. ROWS|COST| -------------------------------------------------------------------- |0 |EXCHANGE IN REMOTE | |33 |294 | |1 | EXCHANGE OUT REMOTE| |33 |264 | |2 | TABLE SCAN |BMSQL_ITEM(BMSQL_ITEM_IDX1)|33 |264 | ==================================================================== Outputs & filters: ------------------------------------- 0 - output([BMSQL_ITEM.I_ID], [BMSQL_ITEM.I_NAME], [BMSQL_ITEM.I_PRICE], [BMSQL_ITEM.I_DATA], [BMSQL_ITEM.I_IM_ID]), filter(nil) 1 - output([BMSQL_ITEM.I_NAME], [BMSQL_ITEM.I_ID], [BMSQL_ITEM.I_PRICE], [BMSQL_ITEM.I_DATA], [BMSQL_ITEM.I_IM_ID]), filter(nil) 2 - output([BMSQL_ITEM.I_NAME], [BMSQL_ITEM.I_ID], [BMSQL_ITEM.I_PRICE], [BMSQL_ITEM.I_DATA], [BMSQL_ITEM.I_IM_ID]), filter(nil), access([BMSQL_ITEM.I_NAME], [BMSQL_ITEM.I_ID], [BMSQL_ITEM.I_PRICE], [BMSQL_ITEM.I_DATA], [BMSQL_ITEM.I_IM_ID]), partitions(p0), is_index_back=true, range_key([BMSQL_ITEM.I_NAME], [BMSQL_ITEM.I_ID]), range(Ax,MIN ; Ax������������������������������� ,MAX), range_cond([(T_OP_LIKE, BMSQL_ITEM.I_NAME, ?, '\')])
实际上业务都是通过 ODP 连接,能正确路由到 OBServer 节点上,很大程度规避了远程执行计划,不过并不能从根本上避免。后面还会举例说明。
主备切换如何影响执行计划
当 SQL 包含多表连接时,如果多表的主副本都在同一个 OBServer 节点上或是当前连接的节点,这就是个本地 SQL 执行计划(
plan_type= 1)。当多表的主副本是分散在不同的 OBServer 节点上时,这就是分布式 SQL 执行计划(
plan_type=3)。当多表的主副本都在同一个 OBServer 节点上但不是当前连接的节点,这就是一个远程 SQL 执行计划(
plan_type=2)。
通常我们不直接调整分区主副本的位置,而是通过调整表或表分组、数据库或者租户的 PRIMARY_ZONE 来间接改变表主副本的位置。理论上改变了主副本的位置后,执行计划也会变化。
实际情况下,由于分区主副本位置的变动信息 ODP 并不会立即获得,所以有时候观察 EXPLAIN 执行计划也不会变,这点需要留意。
查看表的主副本位置。
SELECT a.tenant_name, d.database_name, tg.tablegroup_name ,t.table_name, t.part_num , t2.partition_id, t2.ZONE, t2.svr_ip , t2.svr_port, round(t2.data_size/1024/1024) data_size_mb, t2.row_count
FROM oceanbase.__all_tenant AS a
JOIN oceanbase.__all_virtual_database AS d ON ( a.tenant_id = d.tenant_id )
JOIN oceanbase.__all_virtual_table AS t ON (t.tenant_id = d.tenant_id AND t.database_id = d.database_id)
JOIN oceanbase.__all_virtual_meta_table t2 ON (t.tenant_id = t2.tenant_id AND (t.table_id=t2.table_id OR t.tablegroup_id=t2.table_id) AND t2.ROLE IN (1) )
LEFT JOIN oceanbase.__all_virtual_tablegroup AS tg ON (t.tenant_id = tg.tenant_id and t.tablegroup_id = tg.tablegroup_id)
WHERE a.tenant_id IN (1002 ) AND t.table_type IN (3)
AND d.database_name = 'tpccdb' and table_name in ('bmsql_item','bmsql_oorder')
ORDER BY t.tenant_id, tg.tablegroup_name, d.database_name, t.table_name, t2.partition_id
;输出:
+-------------+---------------+-----------------+--------------+----------+--------------+-------+---------------+----------+--------------+-----------+ | tenant_name | database_name | tablegroup_name | table_name | part_num | partition_id | ZONE | svr_ip | svr_port | data_size_mb | row_count | +-------------+---------------+-----------------+--------------+----------+--------------+-------+---------------+----------+--------------+-----------+ | obmysql | tpccdb | NULL | bmsql_item | 1 | 0 | zone3 | 172.20.249.51 | 2882 | 7 | 100000 | | obmysql | tpccdb | tpcc_group | bmsql_oorder | 3 | 0 | zone3 | 172.20.249.51 | 2882 | 1 | 139802 | | obmysql | tpccdb | tpcc_group | bmsql_oorder | 3 | 1 | zone2 | 172.20.249.49 | 2882 | 2 | 203152 | | obmysql | tpccdb | tpcc_group | bmsql_oorder | 3 | 2 | zone1 | 172.20.249.52 | 2882 | 2 | 150858 | +-------------+---------------+-----------------+--------------+----------+--------------+-------+---------------+----------+--------------+-----------+ 4 rows in set (0.053 sec)
从结果可以看出,表 bmsql_item 是个单分区的普通表,主副本在节点 172.20.249.51 上。 表 bmsql_oorder 有 3 个分区的表,主副本分别在三个节点上。
查看以下 SQL 的执行计划。
MySQL [tpccdb]> explain extended_noaddr select o.o_w_id , o.o_d_id ,o.o_id , i.i_name ,i.i_price ,o.o_c_id from bmsql_oorder o , bmsql_item i where o.o_id = i.i_id and o.o_w_id = 3 limit 10 \G
输出:
*************************** 1. row ***************************
Query Plan: ==========================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
------------------------------------------
|0 |LIMIT | |10 |407 |
|1 | NESTED-LOOP JOIN| |10 |406 |
|2 | TABLE SCAN |o |10 |39 |
|3 | TABLE GET |i |1 |36 |
==========================================
Outputs & filters:
-------------------------------------
0 - output([o.o_w_id], [o.o_d_id], [o.o_id], [i.i_name], [i.i_price], [o.o_c_id]), filter(nil), limit(10), offset(nil)
1 - output([o.o_w_id], [o.o_d_id], [o.o_id], [i.i_name], [i.i_price], [o.o_c_id]), filter(nil),
conds(nil), nl_params_([o.o_id]), batch_join=true
2 - output([o.o_w_id], [o.o_id], [o.o_d_id], [o.o_c_id]), filter(nil),
access([o.o_w_id], [o.o_id], [o.o_d_id], [o.o_c_id]), partitions(p0),
is_index_back=false,
range_key([o.o_w_id], [o.o_d_id], [o.o_id]), range(3,MIN,MIN ; 3,MAX,MAX),
range_cond([o.o_w_id = 3])
3 - output([i.i_name], [i.i_price]), filter(nil),
access([i.i_name], [i.i_price]), partitions(p0),
is_index_back=false,
range_key([i.i_id]), range(MIN ; MAX),
range_cond([? = i.i_id])
1 row in set (0.008 sec)从上面可以看出 TABLE SCAN 访问表 bmsql_oorder 的时候,访问的是分区 partitions(p0)。0 号分区的主副本在 zone3 的节点 172.20.249.51 上,跟表 bmsql_item 在同一个节点,所以这是个本地 JOIN。
如果调整表 bmsql_item 的 PRIMARY_ZONE ,换一个节点,再看看这个 SQL。
MySQL [tpccdb]> alter table bmsql_item primary_zone='zone1'; Query OK, 0 rows affected (0.043 sec)
查看主副本实际位置。
+-------------+---------------+-----------------+--------------+----------+--------------+-------+---------------+----------+--------------+-----------+ | tenant_name | database_name | tablegroup_name | table_name | part_num | partition_id | ZONE | svr_ip | svr_port | data_size_mb | row_count | +-------------+---------------+-----------------+--------------+----------+--------------+-------+---------------+----------+--------------+-----------+ | obmysql | tpccdb | NULL | bmsql_item | 1 | 0 | zone1 | 172.20.249.52 | 2882 | 7 | 100000 | | obmysql | tpccdb | tpcc_group | bmsql_oorder | 3 | 0 | zone3 | 172.20.249.51 | 2882 | 1 | 139802 | | obmysql | tpccdb | tpcc_group | bmsql_oorder | 3 | 1 | zone2 | 172.20.249.49 | 2882 | 2 | 203152 | | obmysql | tpccdb | tpcc_group | bmsql_oorder | 3 | 2 | zone1 | 172.20.249.52 | 2882 | 2 | 150858 | +-------------+---------------+-----------------+--------------+----------+--------------+-------+---------------+----------+--------------+-----------+ 4 rows in set (0.039 sec)
进 sys 租户查看 bmsql_item 主副本在 zone1 的 节点 172.20.249.52 上了。 再进业务租户查看 SQL 执行计划。
MySQL [tpccdb]> explain extended_noaddr select o.o_w_id , o.o_d_id ,o.o_id , i.i_name ,i.i_price ,o.o_c_id from bmsql_oorder o , bmsql_item i where o.o_id = i.i_id and o.o_w_id = 3 limit 10 \G
输出:
*************************** 1. row ***************************
Query Plan: ==========================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
------------------------------------------
|0 |LIMIT | |10 |407 |
|1 | NESTED-LOOP JOIN| |10 |406 |
|2 | TABLE SCAN |o |10 |39 |
|3 | TABLE GET |i |1 |36 |
==========================================
Outputs & filters:
-------------------------------------
0 - output([o.o_w_id], [o.o_d_id], [o.o_id], [i.i_name], [i.i_price], [o.o_c_id]), filter(nil), limit(10), offset(nil)
1 - output([o.o_w_id], [o.o_d_id], [o.o_id], [i.i_name], [i.i_price], [o.o_c_id]), filter(nil),
conds(nil), nl_params_([o.o_id]), batch_join=true
2 - output([o.o_w_id], [o.o_id], [o.o_d_id], [o.o_c_id]), filter(nil),
access([o.o_w_id], [o.o_id], [o.o_d_id], [o.o_c_id]), partitions(p0),
is_index_back=false,
range_key([o.o_w_id], [o.o_d_id], [o.o_id]), range(3,MIN,MIN ; 3,MAX,MAX),
range_cond([o.o_w_id = 3])
3 - output([i.i_name], [i.i_price]), filter(nil),
access([i.i_name], [i.i_price]), partitions(p0),
is_index_back=false,
range_key([i.i_id]), range(MIN ; MAX),
range_cond([? = i.i_id])
1 row in set (0.004 sec)跟前面执行计划比,没有发生变化。这个不符合理论。
其原因就是当分区主备副本发生切换后,OceanBase 数据库和 ODP 内部的分区 LOCATION CACHE 并不是立即更新,而是在 SQL 第一次执行的时候才触发更新。EXPLAIN 查看执行计划的时候,并不会执行 SQL,所以看到的不是实际的执行计划,而是理论的。
我们可以简单访问一下表 bmsql_item 触发 OceanBase 数据库内部 LOCATION CACHE 更新。然后再查看上面 SQL 的执行计划。
MySQL [tpccdb]> explain extended_noaddr select o.o_w_id , o.o_d_id ,o.o_id , i.i_name ,i.i_price ,o.o_c_id from bmsql_oorder o , bmsql_item i where o.o_id = i.i_id and o.o_w_id = 3 limit 10 \G
输出:
*************************** 1. row ***************************
Query Plan: ==================================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
--------------------------------------------------
|0 |LIMIT | |10 |409 |
|1 | NESTED-LOOP JOIN | |10 |408 |
|2 | TABLE SCAN |o |10 |39 |
|3 | PX COORDINATOR | |1 |37 |
|4 | EXCHANGE OUT DISTR|:EX10000|1 |36 |
|5 | TABLE GET |i |1 |36 |
==================================================
Outputs & filters:
-------------------------------------
0 - output([o.o_w_id], [o.o_d_id], [o.o_id], [i.i_name], [i.i_price], [o.o_c_id]), filter(nil), limit(10), offset(nil)
1 - output([o.o_w_id], [o.o_d_id], [o.o_id], [i.i_name], [i.i_price], [o.o_c_id]), filter(nil),
conds(nil), nl_params_([o.o_id]), batch_join=false
2 - output([o.o_w_id], [o.o_id], [o.o_d_id], [o.o_c_id]), filter(nil),
access([o.o_w_id], [o.o_id], [o.o_d_id], [o.o_c_id]), partitions(p0),
is_index_back=false,
range_key([o.o_w_id], [o.o_d_id], [o.o_id]), range(3,MIN,MIN ; 3,MAX,MAX),
range_cond([o.o_w_id = 3])
3 - output([i.i_name], [i.i_price]), filter(nil)
4 - output([i.i_name], [i.i_price]), filter(nil), is_single, dop=1
5 - output([i.i_name], [i.i_price]), filter(nil),
access([i.i_name], [i.i_price]), partitions(p0),
is_index_back=false,
range_key([i.i_id]), range(MIN ; MAX),
range_cond([? = i.i_id])
1 row in set (0.004 sec)从上面执行计划可以看出,对表 bmsql_item 的访问变成远程访问了 EXCHANGE OUT DISTR。并且由于 bmsql_oorder 的条件过滤后会有很多记录,所以这里远程访问 bmsql_item 的数据还用了并行(PX COORDINATOR)。
如上更新的是表 bmsql_item 而不是 bmsql_oorder 也是有原因的,因为这个连接 SQL 被 ODP 路由到哪个 OBServer 节点是由 from 语句后的表决定的,即 bmsql_oorder 的 o.o_w_Id=3 对应的分区的主副本。这个演示是先保证 ODP 路由的节点是固定的,方便对比。
下面我们调整这个 SQL 里的表连接顺序再试一次。先回滚 bmsql_item 的 PRIMARY_ZONE 变更。
MySQL [tpccdb]> explain extended_noaddr select o.o_w_id , o.o_d_id ,o.o_id , i.i_name ,i.i_price ,o.o_c_id from bmsql_item i , bmsql_oorder o where o.o_id = i.i_id and o.o_w_id = 3 limit 10\G
输出:
*************************** 1. row ***************************
Query Plan: ==========================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
------------------------------------------
|0 |LIMIT | |10 |407 |
|1 | NESTED-LOOP JOIN| |10 |406 |
|2 | TABLE SCAN |o |10 |39 |
|3 | TABLE GET |i |1 |36 |
==========================================
Outputs & filters:
-------------------------------------
0 - output([o.o_w_id], [o.o_d_id], [o.o_id], [i.i_name], [i.i_price], [o.o_c_id]), filter(nil), limit(10), offset(nil)
1 - output([o.o_w_id], [o.o_d_id], [o.o_id], [i.i_name], [i.i_price], [o.o_c_id]), filter(nil),
conds(nil), nl_params_([o.o_id]), batch_join=true
2 - output([o.o_w_id], [o.o_id], [o.o_d_id], [o.o_c_id]), filter(nil),
access([o.o_w_id], [o.o_id], [o.o_d_id], [o.o_c_id]), partitions(p0),
is_index_back=false,
range_key([o.o_w_id], [o.o_d_id], [o.o_id]), range(3,MIN,MIN ; 3,MAX,MAX),
range_cond([o.o_w_id = 3])
3 - output([i.i_name], [i.i_price]), filter(nil),
access([i.i_name], [i.i_price]), partitions(p0),
is_index_back=false,
range_key([i.i_id]), range(MIN ; MAX),
range_cond([? = i.i_id])
1 row in set (0.003 sec)调整表 bmsql_item 的 PRIMARY_ZONE 到 zone1 后,查看上面 SQL 的 EXPLAIN 结果,跟前面一样,没有发生变化。
那么再次使用前面的技巧,对 bmsql_item 访问一次。
MySQL [tpccdb]> select count(*) from bmsql_item;
+----------+
| count(*) |
+----------+
| 100000 |
+----------+
1 row in set (0.083 sec)
MySQL [tpccdb]> explain extended_noaddr select o.o_w_id , o.o_d_id ,o.o_id , i.i_name ,i.i_price ,o.o_c_id from bmsql_item i , bmsql_oorder o where o.o_id = i.i_id and o.o_w_id = 3 limit 10\G
*************************** 1. row ***************************
Query Plan: ==================================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
--------------------------------------------------
|0 |LIMIT | |10 |409 |
|1 | NESTED-LOOP JOIN | |10 |408 |
|2 | TABLE SCAN |o |10 |39 |
|3 | PX COORDINATOR | |1 |37 |
|4 | EXCHANGE OUT DISTR|:EX10000|1 |36 |
|5 | TABLE GET |i |1 |36 |
==================================================
Outputs & filters:
-------------------------------------
0 - output([o.o_w_id], [o.o_d_id], [o.o_id], [i.i_name], [i.i_price], [o.o_c_id]), filter(nil), limit(10), offset(nil)
1 - output([o.o_w_id], [o.o_d_id], [o.o_id], [i.i_name], [i.i_price], [o.o_c_id]), filter(nil),
conds(nil), nl_params_([o.o_id]), batch_join=false
2 - output([o.o_w_id], [o.o_id], [o.o_d_id], [o.o_c_id]), filter(nil),
access([o.o_w_id], [o.o_id], [o.o_d_id], [o.o_c_id]), partitions(p0),
is_index_back=false,
range_key([o.o_w_id], [o.o_d_id], [o.o_id]), range(3,MIN,MIN ; 3,MAX,MAX),
range_cond([o.o_w_id = 3])
3 - output([i.i_name], [i.i_price]), filter(nil)
4 - output([i.i_name], [i.i_price]), filter(nil), is_single, dop=1
5 - output([i.i_name], [i.i_price]), filter(nil),
access([i.i_name], [i.i_price]), partitions(p0),
is_index_back=false,
range_key([i.i_id]), range(MIN ; MAX),
range_cond([? = i.i_id])
1 row in set (0.003 sec)从上面结果可见,执行计划再次发生变化,不过跟前面那个 SQL 的变化是一样的,对表 bmsql_item 的访问依然是远程访问。
看到这里您可能就会奇怪,既然 SQL 被路由到表 bmsql_item 的主副本所在的节点了,为什么对表 bmsql_item 的访问还是远程访问?
这个问题就出在 EXPLAIN 被路由到的节点上。ODP 对 EXPLAIN 语句的路由第一次是随机的,后面每次都跟随前面的 EXPLAIN 语句的路由。虽然上面改变了表 bmsql_item 的位置,但是 EXPLAIN 语句依然发往同一个节点(即 bmsql_oorder 所在的节点),在那个节点上生产的执行计划就永远是上面这种。
以上 SQL 被路由到哪个节点信息,可以通过 SQL审计视图查询。
SELECT /*+ read_consistency(weak) ob_querytimeout(100000000) */ substr(usec_to_time(request_time),1,19) request_time_, s.svr_ip, s.plan_id, s.client_Ip, s.sid,s.tenant_id, s.tenant_name, s.user_name, s.db_name, s.query_sql, s.affected_rows, s.return_rows, s.ret_code, s.event, s.elapsed_time, s.queue_time, s.execute_time, round(s.request_memory_used/1024/1024/1024,2) req_mem_mb, plan_type, is_executor_rpc, is_inner_sql, TRANSACTION_HASH ,trace_id FROM gv$sql_audit s WHERE 1=1 and s.tenant_id = 1002 and user_name='u_tpcc' and query_sql like '%bmsql_item%' and request_time >= time_to_usec(date_sub(CURRENT_TIMESTAMP, interval 30 minute )) ORDER BY request_time DESC LIMIT 100;
如下可以换一个 ODP,或者把之前的 ODP KILL 掉重新拉起。再看看上面 SQL 的执行计划,就变成我们期望的那种执行计划了。
MySQL [tpccdb]> explain extended_noaddr select o.o_w_id , o.o_d_id ,o.o_id , i.i_name ,i.i_price ,o.o_c_id from bmsql_item i , bmsql_oorder o where o.o_id = i.i_id and o.o_w_id = 3 limit 10\G
输出:
*************************** 1. row ***************************
Query Plan: ==================================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
--------------------------------------------------
|0 |LIMIT | |10 |409 |
|1 | NESTED-LOOP JOIN | |10 |408 |
|2 | PX COORDINATOR | |10 |40 |
|3 | EXCHANGE OUT DISTR|:EX10000|10 |39 |
|4 | TABLE SCAN |o |10 |39 |
|5 | TABLE GET |i |1 |36 |
==================================================
Outputs & filters:
-------------------------------------
0 - output([o.o_w_id], [o.o_d_id], [o.o_id], [i.i_name], [i.i_price], [o.o_c_id]), filter(nil), limit(10), offset(nil)
1 - output([o.o_w_id], [o.o_d_id], [o.o_id], [i.i_name], [i.i_price], [o.o_c_id]), filter(nil),
conds(nil), nl_params_([o.o_id]), batch_join=true
2 - output([o.o_w_id], [o.o_id], [o.o_d_id], [o.o_c_id]), filter(nil)
3 - output([o.o_w_id], [o.o_id], [o.o_d_id], [o.o_c_id]), filter(nil), is_single, dop=1
4 - output([o.o_w_id], [o.o_id], [o.o_d_id], [o.o_c_id]), filter(nil),
access([o.o_w_id], [o.o_id], [o.o_d_id], [o.o_c_id]), partitions(p0),
is_index_back=false,
range_key([o.o_w_id], [o.o_d_id], [o.o_id]), range(3,MIN,MIN ; 3,MAX,MAX),
range_cond([o.o_w_id = 3])
5 - output([i.i_name], [i.i_price]), filter(nil),
access([i.i_name], [i.i_price]), partitions(p0),
is_index_back=false,
range_key([i.i_id]), range(MIN ; MAX),
range_cond([? = i.i_id])
1 row in set (0.030 sec)结论:
EXPLAIN 看执行计划还是很方便的,执行计划的关键是表连接算法和顺序、索引等等。这些都能通过 EXPLAIN 看出问题。
EXPLAIN 的执行计划可能跟期望的执行计划有些差别,往往是 LOCATION CACHE 和 SQL 路由导致的,请牢记这两个影响因素。
在生产环境中,表被频繁访问的时候,即使分区主副本切换了,OceanBase 数据库也会很快更新路由。虽然是被动更新,代价就是主备副本切换后业务第一次访问的 SQL 会变慢(早期 OceanBase 版本是报错,后期版本改为 OceanBase 内部重试,所以会略微变慢。重试后,OceanBase 数据库的 LOCATION CACHE 就更新了)。
如何查看 SQL 实际执行计划
上面内容介绍说 EXPLAIN 执行计划跟期望的执行计划有差别,这个不好说,但是我们可以查看 SQL 实际执行计划来验证是不是有差异。查看 SQL 的实际执行计划要求 SQL 执行过。
运行下面两个 SQL,查看 SQL 审计视图,获取执行节点 和 PLAN_ID 信息。
select o.o_w_id , o.o_d_id ,o.o_id , i.i_name ,i.i_price ,o.o_c_id from bmsql_oorder o , bmsql_item i where o.o_id = i.i_id and o.o_w_id = 3 limit 10 ;
select o.o_w_id , o.o_d_id ,o.o_id , i.i_name ,i.i_price ,o.o_c_id from bmsql_item i , bmsql_oorder o where o.o_id = i.i_id and o.o_w_id = 3 limit 10 ;
SELECT /*+ read_consistency(weak) ob_querytimeout(100000000) */ substr(usec_to_time(request_time),1,19) request_time_, s.svr_ip, s.client_Ip, s.sid,s.tenant_id, s.tenant_name, s.user_name, s.db_name, s.query_sql, s.plan_id, s.plan_type, s.affected_rows, s.return_rows, s.ret_code, s.event, s.elapsed_time, s.queue_time, s.execute_time
FROM oceanbase.gv$sql_audit s
WHERE 1=1 and s.tenant_id = 1002
and user_name='u_tpcc' and query_sql like 'select o.o_w_id%'
and request_time >= time_to_usec(date_sub(CURRENT_TIMESTAMP, interval 5 minute ))
ORDER BY request_time DESC
LIMIT 10 \G
# 输出:
*************************** 1. row ***************************
request_time_: 2021-10-05 11:24:50
svr_ip: 172.20.249.52
client_Ip: 172.20.249.51
sid: 3221668666
tenant_id: 1002
tenant_name: obmysql
user_name: u_tpcc
db_name: tpccdb
query_sql: select o.o_w_id , o.o_d_id ,o.o_id , i.i_name ,i.i_price ,o.o_c_id from bmsql_item i , bmsql_oorder o where o.o_id = i.i_id and o.o_w_id = 3 limit 10
plan_id: 3305
plan_type: 3
affected_rows: 0
return_rows: 10
ret_code: 0
event: default condition wait
elapsed_time: 20058
queue_time: 73
execute_time: 19726
*************************** 2. row ***************************
request_time_: 2021-10-05 11:24:46
svr_ip: 172.20.249.51
client_Ip: 172.20.249.51
sid: 3222238517
tenant_id: 1002
tenant_name: obmysql
user_name: u_tpcc
db_name: tpccdb
query_sql: select o.o_w_id , o.o_d_id ,o.o_id , i.i_name ,i.i_price ,o.o_c_id from bmsql_oorder o , bmsql_item i where o.o_id = i.i_id and o.o_w_id = 3 limit 10
plan_id: 273
plan_type: 3
affected_rows: 0
return_rows: 10
ret_code: 0
event: system internal wait
elapsed_time: 141562
queue_time: 48
execute_time: 139714
2 rows in set (0.119 sec)其中 tenant_id、svr_ip、svr_port 和 plan_id 列信息很重要。查看视图 gv$plan_cache_plan_explain 需要这些字段信息。
MySQL [tpccdb]> SELECT ip, plan_depth, plan_line_id,operator,name,rows,cost,property from oceanbase.`gv$plan_cache_plan_explain` WHERE tenant_id=1002 AND ip = '172.20.249.52' AND port=2882 AND plan_id=3305; +---------------+------------+--------------+---------------------------+------+------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ip | plan_depth | plan_line_id | operator | name | rows | cost | property | +---------------+------------+--------------+---------------------------+------+------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | 172.20.249.52 | 0 | 0 | PHY_LIMIT | NULL | 10 | 408 | NULL | | 172.20.249.52 | 1 | 1 | PHY_NESTED_LOOP_JOIN | NULL | 10 | 407 | NULL | | 172.20.249.52 | 2 | 2 | PHY_PX_FIFO_COORD | NULL | 10 | 39 | NULL | | 172.20.249.52 | 3 | 3 | PHY_PX_REDUCE_TRANSMIT | NULL | 10 | 38 | NULL | | 172.20.249.52 | 4 | 4 | PHY_TABLE_SCAN | o | 10 | 38 | table_rows:139802, physical_range_rows:47417, logical_range_rows:47407, index_back_rows:0, output_rows:47407, est_method:local_storage, avaiable_index_name[bmsql_oorder], estimation info[table_id:1101710651081625, (table_type:1, version:0-1633335774562189-1633335774562189, logical_rc:47407, physical_rc:47407), (table_type:7, version:1633335566720232-1633335774562189-1633335792045894, logical_rc:0, physical_rc:0), (table_type:5, version:1633335566720232-1633335774562189-1633335792045894, logical_rc:0, physical_rc:0), (table_type:0, version:1633335792045894-1633335792045894-9223372036854775807, logical_rc:0, physical_rc:10)] | | 172.20.249.52 | 2 | 5 | PHY_TABLE_SCAN | i | 1 | 35 | table_rows:100000, physical_range_rows:1, logical_range_rows:1, index_back_rows:0, output_rows:1, est_method:local_storage, estimation info[table_id:1101710651081627, (table_type:1, version:0-1633335774562189-1633335774562189, logical_rc:100000, physical_rc:100000), (table_type:7, version:1633335566970454-1633335774562189-1633335794237719, logical_rc:0, physical_rc:0), (table_type:5, version:1633335566970454-1633335774562189-1633335794237719, logical_rc:0, physical_rc:0), (table_type:0, version:1633335794237719-1633335794237719-9223372036854775807, logical_rc:0, physical_rc:1)] | +---------------+------------+--------------+---------------------------+------+------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 6 rows in set (0.030 sec) MySQL [tpccdb]> SELECT ip, plan_depth, plan_line_id,operator,name,rows,cost,property from oceanbase.`gv$plan_cache_plan_explain` where tenant_id=1002 AND ip = '172.20.249.51' AND port=2882 AND plan_id=273; +---------------+------------+--------------+---------------------------+------+------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ip | plan_depth | plan_line_id | operator | name | rows | cost | property | +---------------+------------+--------------+---------------------------+------+------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | 172.20.249.51 | 0 | 0 | PHY_LIMIT | NULL | 10 | 408 | NULL | | 172.20.249.51 | 1 | 1 | PHY_NESTED_LOOP_JOIN | NULL | 10 | 407 | NULL | | 172.20.249.51 | 2 | 2 | PHY_TABLE_SCAN | o | 10 | 38 | table_rows:139802, physical_range_rows:47417, logical_range_rows:47407, index_back_rows:0, output_rows:47407, est_method:local_storage, avaiable_index_name[bmsql_oorder], estimation info[table_id:1101710651081625, (table_type:1, version:0-1633335774562189-1633335774562189, logical_rc:47407, physical_rc:47407), (table_type:7, version:1633335563707246-1633335774562189-1633335792045894, logical_rc:0, physical_rc:0), (table_type:5, version:1633335563707246-1633335774562189-1633335792045894, logical_rc:0, physical_rc:0), (table_type:0, version:1633335792045894-1633335792045894-9223372036854775807, logical_rc:0, physical_rc:10)] | | 172.20.249.51 | 2 | 3 | PHY_PX_FIFO_COORD | NULL | 1 | 36 | NULL | | 172.20.249.51 | 3 | 4 | PHY_PX_REDUCE_TRANSMIT | NULL | 1 | 35 | NULL | | 172.20.249.51 | 4 | 5 | PHY_TABLE_SCAN | i | 1 | 35 | table_rows:100000, physical_range_rows:1, logical_range_rows:1, index_back_rows:0, output_rows:1, est_method:local_storage, estimation info[table_id:1101710651081627, (table_type:1, version:0-1633335774562189-1633335774562189, logical_rc:100000, physical_rc:100000), (table_type:7, version:1633335563894016-1633335774562189-1633335794237719, logical_rc:0, physical_rc:0), (table_type:5, version:1633335563894016-1633335774562189-1633335794237719, logical_rc:0, physical_rc:0), (table_type:0, version:1633335794237719-1633335794237719-9223372036854775807, logical_rc:0, physical_rc:1)] | +---------------+------------+--------------+---------------------------+------+------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 6 rows in set (0.065 sec)
如果是在网页上,且以上输出结果格式化正确,对比 2 个 SQL 的实际执行计划可以看出分别是对那个表进行远程访问。
除了通过 SQL 审计视图定位具体的 SQL 及其执行计划外,还可以通过查看缓存的执行计划汇总视图 gv$plan_cache_plan_stat。
SELECT s.tenant_id, svr_ip,plan_Id,sql_id,TYPE, query_sql, first_load_time, avg_exe_usec, slow_count,executions, slowest_exe_usec,s.outline_id FROM oceanbase.`gv$plan_cache_plan_stat` s WHERE s.tenant_id = 1002 -- 改成具体的 tenant_id ORDER BY avg_exe_usec desc limit 10 ; +-----------+---------------+---------+----------------------------------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+--------------+------------+------------+------------------+------------+ | tenant_id | svr_ip | plan_Id | sql_id | TYPE | query_sql | first_load_time | avg_exe_usec | slow_count | executions | slowest_exe_usec | outline_id | +-----------+---------------+---------+----------------------------------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+--------------+------------+------------+------------------+------------+ | 1002 | 172.20.249.49 | 251 | FC3FED8CCB2946DE54F1C5BA3656023C | 1 | SELECT d_tax, d_next_o_id FROM bmsql_district WHERE d_w_id = 4 AND d_id = 3 FOR UPDATE | 2021-10-04 20:52:36.677973 | 25668895 | 5 | 39 | 999820024 | -1 | | 1002 | 172.20.249.49 | 350 | 54B5A5861DAF78F52D9ADFBEE83D35B5 | 2 | INSERT INTO bmsql_new_order ( no_o_id, no_d_id, no_w_id) VALUES (4860, 9, 5) | 2021-10-04 20:52:38.032855 | 1231951 | 3 | 3 | 1237821 | -1 | | 1002 | 172.20.249.49 | 322 | B447DE16B3F42D2409B2A2BE50328E63 | 2 | UPDATE bmsql_warehouse SET w_ytd = w_ytd + 3684.0 WHERE w_id = 8 | 2021-10-04 20:52:36.762523 | 486163 | 2 | 2 | 703346 | -1 | | 1002 | 172.20.249.52 | 2596 | F4585305C4CB9B091C750826A7DEDD13 | 1 | UPDATE bmsql_district SET d_ytd = d_ytd + 134.32 WHERE d_w_id = 2 AND d_id = 9 | 2021-10-04 20:52:36.689589 | 237542 | 5 | 16877 | 999895631 | -1 | | 1002 | 172.20.249.52 | 2602 | FC3FED8CCB2946DE54F1C5BA3656023C | 1 | SELECT d_tax, d_next_o_id FROM bmsql_district WHERE d_w_id = 2 AND d_id = 6 FOR UPDATE | 2021-10-04 20:52:36.727204 | 237286 | 4 | 16889 | 999746020 | -1 | | 1002 | 172.20.249.51 | 272 | 9D276020142C5B8259B85C3E8966C579 | 3 | select o.o_w_id , o.o_d_id ,o.o_id , i.i_name ,i.i_price ,o.o_c_id from bmsql_oorder o , bmsql_item i where o.o_id = i.i_id and o.o_w_id = 3 limit 10 | 2021-10-05 11:02:46.942620 | 159385 | 1 | 1 | 159385 | -1 | | 1002 | 172.20.249.49 | 309 | B447DE16B3F42D2409B2A2BE50328E63 | 2 | UPDATE bmsql_warehouse SET w_ytd = w_ytd + 4878.65 WHERE w_id = 8 | 2021-10-04 20:52:36.741234 | 159196 | 4 | 8 | 452859 | -1 | | 1002 | 172.20.249.51 | 273 | 274FB280E08E876819555632D6951C5A | 3 | select o.o_w_id , o.o_d_id ,o.o_id , i.i_name ,i.i_price ,o.o_c_id from bmsql_oorder o , bmsql_item i where o.o_id = i.i_id and o.o_w_id = 3 limit 10 | 2021-10-05 11:24:46.098640 | 141562 | 1 | 1 | 141562 | -1 | | 1002 | 172.20.249.52 | 3367 | CDA629CFD7B13E58328149264FA50055 | 3 | SELECT /*+ read_consistency(weak) ob_querytimeout(100000000) */ substr(usec_to_time(request_time),1,19) request_time_, s.svr_ip, s.client_Ip, s.sid,s.tenant_id, s.tenant_name, s.user_name, s.db_name, s.query_sql, s.plan_id, s.plan_type, s.affected_rows, s.return_rows, s.ret_code, s.event, s.elapsed_time, s.queue_time, s.execute_time FROM oceanbase.gv$sql_audit s WHERE 1=1 and s.tenant_id = 1002 and user_name='u_tpcc' and query_sql like 'select o.o_w_id%' and request_time >= time_to_usec(date_sub(CURRENT_TIMESTAMP, interval 5 minute )) ORDER BY request_time DESC LIMIT 10 | 2021-10-05 11:25:07.245044 | 115650 | 1 | 1 | 115650 | -1 | | 1002 | 172.20.249.52 | 3362 | 55E820294D4EEFB396B1BAD3CE27CD47 | 3 | SELECT /*+ read_consistency(weak) ob_querytimeout(100000000) */ substr(usec_to_time(request_time),1,19) request_time_, s.svr_ip, s.client_Ip, s.sid,s.tenant_id, s.tenant_name, s.user_name, s.db_name, s.query_sql, s.plan_id, s.plan_type, s.affected_rows, s.return_rows, s.ret_code, s.event, s.elapsed_time, s.queue_time, s.execute_time FROM oceanbase.gv$sql_audit s WHERE 1=1 and s.tenant_id = 1002 and user_name='u_tpcc' and query_sql like 'select o.o_w_id%' and request_time >= time_to_usec(date_sub(CURRENT_TIMESTAMP, interval 60 minute )) ORDER BY request_time DESC LIMIT 100 | 2021-10-05 11:24:04.440565 | 111119 | 1 | 2 | 150067 | -1 | +-----------+---------------+---------+----------------------------------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+--------------+------------+------------+------------------+------------+ 10 rows in set (0.014 sec)
从这个视图里可以看到全局的 SQL 执行汇总。适合找 TOP N 慢 SQL。根据里面的节点信息、SQLID 和 PLANID 信息,既可以到 SQL 审计视图里定位具体的 SQL 信息,也可以查看实际运行的执行计划信息。
执行计划可以清空,命令如下:
ALTER SYSTEM flush plan cache GLOBAL;
仅用于测试环境研究,生产环境的 SQL 执行计划缓存通常不可随便清空。清空执行计划会导致所有 SQL 要重新进行一次硬解析。
如何使用 Outline 改变执行计划
OceanBase 在 SQL 执行和性能诊断方面的逻辑参考了 Oracle 的设计思路。OceanBase 也支持 SQL Outline 功能,能够修改在线运行的 SQL 执行计划,同时也支持 SQL 限流功能。
Outline 的用法也是通过 SQL Hint 固定 SQL 的执行计划,可以调整表连接算法、使用的索引等等。
创建 Outline 的语法如下:
CREATE [OR REPLACE] OUTLINE outline_name ON stmt [ TO target_stmt ]; # 或 CREATE [OR REPLACE] OUTLINE outline_name ON 'SQLID' using 'HINTS';
其中 stmt 为一个带有 HINT 的 DML 语句。限流或固定计划,通过 stmt 中的 HINT 来区分。
如果期望对含有 HINT 的语句进行限流和固定计划,则需要 TO target_stmt 来指明相应的 SQL。
create outline outline_name on stmt1 to stmt2;对stmt2创建 outline,让stmt2使用stmt1中的hint。指定
OR REPLACE后,即可对已经存在执行计划或限流规则进行替换(注:限流规则和执行计划间可以彼此替换)。在使用
target_stmt时,严格要求stmt与target_stmt在去掉hint后完全匹配(实现中为去掉hint的signature相同)。若是在创建限流时使用target_stmt,则同时要求fix_param完全匹配。
通过语句文本或者 SQLID 都可以创建 Outline。不过由于文本中空格和换行如果有差异就会导致匹配不上。所以,不建议通过文本创建 Outline(除非文本非常简单)。这里通过 SQLID 来创建 Outline,只要 SQL 文本不变,SQLID 就不会变。
Outline 相关视图
存储 outline 的 schema 信息的系统表
oceanbase.__all_outline oceanbase.__all_outline_history
固定计划相关虚拟表和视图
展示的均是当前租户的信息:
oceanbase.__tenant_virtual_outline # 用于 outline 迁移使用的虚拟表,同时显示固定计划的信息。 oceanbase.gv$outline # 在 __tenant_virutal_outline 基础上创建的视图。 information_schema.dba_outlines # 在 __all_table 上创建的视图。
限流相关虚拟表和试图
下表展示的均是当前租户的信息:
oceanbase.__tenant_virtual_concurrent_limit_sql # 展示限流信息 oceanbase.gv$concurrent_limit_sql # 在 __tenant_virtual_concurrent_limit_sql 上创建的视图。
Outline 调优执行计划示例
下面示例还是针对前面测试 SQL,尝试调整一下执行计划里表连接顺序。
执行原 SQL,通过视图
gv$sql_audit或gv$plan_cache_plan_stat找到该 SQL 的 SQLID。根据 SQLID 创建 Outline,指定 HINTS。
create outline otl_test_1 on "9D276020142C5B8259B85C3E8966C579" using hint /*+ leading(i) */ ; MySQL [tpccdb]> select * from oceanbase.__all_outline \G *************************** 1. row *************************** gmt_create: 2021-10-05 12:18:26.032586 gmt_modified: 2021-10-05 12:18:26.032586 tenant_id: 0 outline_id: 1001 database_id: 1052 schema_version: 1633407506030264 name: otl_test_1 signature: outline_content: /*+leading(i) */ sql_text: owner: root used: 0 version: 3-b20901e8c84d3ea774beeaca963c67d7802e4b4e compatible: 1 enabled: 1 format: 0 outline_params: ���� outline_target: sql_id: 9D276020142C5B8259B85C3E8966C579 owner_id: NULL 1 row in set (0.006 sec)再次执行 SQL
select o.o_w_id , o.o_d_id ,o.o_id , i.i_name ,i.i_price ,o.o_c_id from bmsql_oorder o , bmsql_item i where o.o_id = i.i_id and o.o_w_id = 3 limit 10 ;
查看实际执行计划
SELECT s.tenant_id, svr_ip,plan_Id,sql_id,TYPE, query_sql, first_load_time, avg_exe_usec, slow_count,executions, slowest_exe_usec,s.outline_id FROM oceanbase.`gv$plan_cache_plan_stat` s WHERE s.tenant_id = 1002 and sql_id='9D276020142C5B8259B85C3E8966C579' ORDER BY avg_exe_usec desc limit 10 ;
输出:
+-----------+---------------+---------+----------------------------------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+--------------+------------+------------+------------------+------------------+ | tenant_id | svr_ip | plan_Id | sql_id | TYPE | query_sql | first_load_time | avg_exe_usec | slow_count | executions | slowest_exe_usec | outline_id | +-----------+---------------+---------+----------------------------------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+--------------+------------+------------+------------------+------------------+ | 1002 | 172.20.249.51 | 282 | 9D276020142C5B8259B85C3E8966C579 | 3 | select o.o_w_id , o.o_d_id ,o.o_id , i.i_name ,i.i_price ,o.o_c_id from bmsql_oorder o , bmsql_item i where o.o_id = i.i_id and o.o_w_id = 3 limit 10 | 2021-10-05 12:19:15.443750 | 141419 | 1 | 1 | 141419 | 1101710651032553 | | 1002 | 172.20.249.52 | 3488 | 9D276020142C5B8259B85C3E8966C579 | 3 | select o.o_w_id , o.o_d_id ,o.o_id , i.i_name ,i.i_price ,o.o_c_id from bmsql_oorder o , bmsql_item i where o.o_id = i.i_id and o.o_w_id = 3 limit 10 | 2021-10-05 12:18:36.746965 | 88434 | 0 | 1 | 88434 | 1101710651032553 | +-----------+---------------+---------+----------------------------------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+--------------+------------+------------+------------------+------------------+ 2 rows in set (0.092 sec) SELECT ip, plan_depth, plan_line_id,operator,name,rows,cost,property from oceanbase.`gv$plan_cache_plan_explain` where tenant_id=1002 AND ip = '172.20.249.51' AND port=2882 AND plan_id=282 ; +---------------+------------+--------------+---------------------------+------+-------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ip | plan_depth | plan_line_id | operator | name | rows | cost | property | +---------------+------------+--------------+---------------------------+------+-------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | 172.20.249.51 | 0 | 0 | PHY_LIMIT | NULL | 10 | 126991 | NULL | | 172.20.249.51 | 1 | 1 | PHY_MERGE_JOIN | NULL | 10 | 126990 | NULL | | 172.20.249.51 | 2 | 2 | PHY_PX_FIFO_COORD | NULL | 1453 | 856 | NULL | | 172.20.249.51 | 3 | 3 | PHY_PX_REDUCE_TRANSMIT | NULL | 1453 | 546 | NULL | | 172.20.249.51 | 4 | 4 | PHY_TABLE_SCAN | i | 1453 | 546 | table_rows:100000, physical_range_rows:100001, logical_range_rows:100000, index_back_rows:0, output_rows:100000, est_method:local_storage, avaiable_index_name[bmsql_item], pruned_index_name[idx_item_uk], estimation info[table_id:1101710651081627, (table_type:1, version:0-1633335774562189-1633335774562189, logical_rc:100000, physical_rc:100000), (table_type:7, version:1633335563894016-1633335774562189-1633335794237719, logical_rc:0, physical_rc:0), (table_type:5, version:1633335563894016-1633335774562189-1633335794237719, logical_rc:0, physical_rc:0), (table_type:0, version:1633335794237719-1633335794237719-9223372036854775807, logical_rc:0, physical_rc:1)] | | 172.20.249.51 | 2 | 5 | PHY_SORT | NULL | 689 | 125832 | NULL | | 172.20.249.51 | 3 | 6 | PHY_TABLE_SCAN | o | 47407 | 11907 | table_rows:139802, physical_range_rows:47417, logical_range_rows:47407, index_back_rows:0, output_rows:47407, est_method:local_storage, avaiable_index_name[bmsql_oorder], estimation info[table_id:1101710651081625, (table_type:1, version:0-1633335774562189-1633335774562189, logical_rc:47407, physical_rc:47407), (table_type:7, version:1633335563707246-1633335774562189-1633335792045894, logical_rc:0, physical_rc:0), (table_type:5, version:1633335563707246-1633335774562189-1633335792045894, logical_rc:0, physical_rc:0), (table_type:0, version:1633335792045894-1633335792045894-9223372036854775807, logical_rc:0, physical_rc:10)] | +---------------+------------+--------------+---------------------------+------+-------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 7 rows in set (0.026 sec) SELECT ip, plan_depth, plan_line_id,operator,name,rows,cost,property from oceanbase.`gv$plan_cache_plan_explain` where tenant_id=1002 AND ip = '172.20.249.52' AND port=2882 AND plan_id=3488 ; +---------------+------------+--------------+---------------------------+------+-------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ip | plan_depth | plan_line_id | operator | name | rows | cost | property | +---------------+------------+--------------+---------------------------+------+-------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | 172.20.249.52 | 0 | 0 | PHY_LIMIT | NULL | 10 | 135657 | NULL | | 172.20.249.52 | 1 | 1 | PHY_MERGE_JOIN | NULL | 10 | 135656 | NULL | | 172.20.249.52 | 2 | 2 | PHY_TABLE_SCAN | i | 1453 | 546 | table_rows:100000, physical_range_rows:100001, logical_range_rows:100000, index_back_rows:0, output_rows:100000, est_method:local_storage, avaiable_index_name[bmsql_item], pruned_index_name[idx_item_uk], estimation info[table_id:1101710651081627, (table_type:1, version:0-1633335774562189-1633335774562189, logical_rc:100000, physical_rc:100000), (table_type:7, version:1633335566970454-1633335774562189-1633335794237719, logical_rc:0, physical_rc:0), (table_type:5, version:1633335566970454-1633335774562189-1633335794237719, logical_rc:0, physical_rc:0), (table_type:0, version:1633335794237719-1633335794237719-9223372036854775807, logical_rc:0, physical_rc:1)] | | 172.20.249.52 | 2 | 3 | PHY_PX_FIFO_COORD | NULL | 47407 | 134808 | NULL | | 172.20.249.52 | 3 | 4 | PHY_PX_REDUCE_TRANSMIT | NULL | 47407 | 125832 | NULL | | 172.20.249.52 | 4 | 5 | PHY_SORT | NULL | 47407 | 125832 | NULL | | 172.20.249.52 | 5 | 6 | PHY_TABLE_SCAN | o | 47407 | 11907 | table_rows:139802, physical_range_rows:47417, logical_range_rows:47407, index_back_rows:0, output_rows:47407, est_method:local_storage, avaiable_index_name[bmsql_oorder], estimation info[table_id:1101710651081625, (table_type:1, version:0-1633335774562189-1633335774562189, logical_rc:47407, physical_rc:47407), (table_type:7, version:1633335566720232-1633335774562189-1633335792045894, logical_rc:0, physical_rc:0), (table_type:5, version:1633335566720232-1633335774562189-1633335792045894, logical_rc:0, physical_rc:0), (table_type:0, version:1633335792045894-1633335792045894-9223372036854775807, logical_rc:0, physical_rc:10)] | +---------------+------------+--------------+---------------------------+------+-------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 7 rows in set (0.008 sec)
从上面结果看出,这个 SQL 执行了两次,两次被路由到的节点不同,导致远程访问的表会不一样。但是两个 SQL 都是用同一个 Outline 执行成功的。
实际执行计划里,表
bmsql_item确实是驱动表,表连接算法也从NESTED-LOOP JOIN变成了MERGE JOIN。 当然,这个只是示例,这个 HINTS 并不一定会让 SQL 性能更好。删除 Outline
drop outline otl_test_1 ;
删除 Outline 后,该 SQL ID 的执行计划重新生成,又回归到原始的执行计划了。
如何更新统计信息
OceanBase 当前版本(3.1)还没有手动更新表统计信息能力,表的统计信息更新是在合并(MAJOR FREEZE)的时候更新的。
默认情况下,OceanBase 收集统计信息会扫描所有数据块,所以统计信息里的分区大小和行数非常准确。如果表非常大,为了节省这个时间,也可以通过参数 merge_stat_sampling_ratio 设置一个采样比例(默认是 100)。
在 sys 租户运行:
alter system set merge_stat_sampling_ratio = 50 ;
查看统计信息
SELECT t.tenant_id, a.tenant_name, t.table_name, d.database_name, tg.tablegroup_name , t.part_num , t2.partition_id, t2.ZONE, t2.svr_ip , round(t2.data_size/1024/1024) data_size_mb, t2.row_count FROM oceanbase.__all_tenant AS a JOIN oceanbase.__all_virtual_database AS d ON ( a.tenant_id = d.tenant_id ) JOIN oceanbase.__all_virtual_table AS t ON (t.tenant_id = d.tenant_id AND t.database_id = d.database_id) JOIN oceanbase.__all_virtual_meta_table t2 ON (t.tenant_id = t2.tenant_id AND (t.table_id=t2.table_id OR t.tablegroup_id=t2.table_id) AND t2.ROLE IN (1) ) LEFT JOIN oceanbase.__all_virtual_tablegroup AS tg ON (t.tenant_id = tg.tenant_id and t.tablegroup_id = tg.tablegroup_id) WHERE a.tenant_id IN (1002 ) AND t.table_type IN (3) AND d.database_name = 'tpccdb' ORDER BY t.tenant_id, tg.tablegroup_name, d.database_name, t.table_name, t2.partition_id ;
输出:
+-----------+-------------+------------------+---------------+-----------------+----------+--------------+-------+---------------+--------------+-----------+ | tenant_id | tenant_name | table_name | database_name | tablegroup_name | part_num | partition_id | ZONE | svr_ip | data_size_mb | row_count | +-----------+-------------+------------------+---------------+-----------------+----------+--------------+-------+---------------+--------------+-----------+ | 1002 | obmysql | bmsql_config | tpccdb | NULL | 1 | 0 | zone2 | 172.20.249.49 | 0 | 4 | | 1002 | obmysql | bmsql_item | tpccdb | NULL | 1 | 0 | zone1 | 172.20.249.52 | 7 | 100000 | | 1002 | obmysql | bmsql_customer | tpccdb | tpcc_group | 3 | 0 | zone3 | 172.20.249.51 | 40 | 90000 | | 1002 | obmysql | bmsql_customer | tpccdb | tpcc_group | 3 | 1 | zone2 | 172.20.249.49 | 54 | 120000 | | 1002 | obmysql | bmsql_customer | tpccdb | tpcc_group | 3 | 2 | zone1 | 172.20.249.52 | 41 | 90000 | | 1002 | obmysql | bmsql_district | tpccdb | tpcc_group | 3 | 0 | zone3 | 172.20.249.51 | 0 | 30 | | 1002 | obmysql | bmsql_district | tpccdb | tpcc_group | 3 | 1 | zone2 | 172.20.249.49 | 0 | 40 | | 1002 | obmysql | bmsql_district | tpccdb | tpcc_group | 3 | 2 | zone1 | 172.20.249.52 | 0 | 30 | | 1002 | obmysql | bmsql_history | tpccdb | tpcc_group | 3 | 0 | zone3 | 172.20.249.51 | 3 | 137475 | | 1002 | obmysql | bmsql_history | tpccdb | tpcc_group | 3 | 1 | zone2 | 172.20.249.49 | 5 | 200065 | | 1002 | obmysql | bmsql_history | tpccdb | tpcc_group | 3 | 2 | zone1 | 172.20.249.52 | 4 | 149261 | | 1002 | obmysql | bmsql_new_order | tpccdb | tpcc_group | 3 | 0 | zone3 | 172.20.249.51 | 0 | 31962 | | 1002 | obmysql | bmsql_new_order | tpccdb | tpcc_group | 3 | 1 | zone2 | 172.20.249.49 | 0 | 45222 | | 1002 | obmysql | bmsql_new_order | tpccdb | tpcc_group | 3 | 2 | zone1 | 172.20.249.52 | 0 | 33058 | | 1002 | obmysql | bmsql_oorder | tpccdb | tpcc_group | 3 | 0 | zone3 | 172.20.249.51 | 1 | 139802 | | 1002 | obmysql | bmsql_oorder | tpccdb | tpcc_group | 3 | 1 | zone2 | 172.20.249.49 | 2 | 203152 | | 1002 | obmysql | bmsql_oorder | tpccdb | tpcc_group | 3 | 2 | zone1 | 172.20.249.52 | 2 | 150858 | | 1002 | obmysql | bmsql_order_line | tpccdb | tpcc_group | 3 | 0 | zone3 | 172.20.249.51 | 49 | 1397517 | | 1002 | obmysql | bmsql_order_line | tpccdb | tpcc_group | 3 | 1 | zone2 | 172.20.249.49 | 72 | 2029431 | | 1002 | obmysql | bmsql_order_line | tpccdb | tpcc_group | 3 | 2 | zone1 | 172.20.249.52 | 53 | 1508395 | | 1002 | obmysql | bmsql_stock | tpccdb | tpcc_group | 3 | 0 | zone3 | 172.20.249.51 | 75 | 300000 | | 1002 | obmysql | bmsql_stock | tpccdb | tpcc_group | 3 | 1 | zone2 | 172.20.249.49 | 99 | 400000 | | 1002 | obmysql | bmsql_stock | tpccdb | tpcc_group | 3 | 2 | zone1 | 172.20.249.52 | 75 | 300000 | | 1002 | obmysql | bmsql_warehouse | tpccdb | tpcc_group | 3 | 0 | zone3 | 172.20.249.51 | 0 | 3 | | 1002 | obmysql | bmsql_warehouse | tpccdb | tpcc_group | 3 | 1 | zone2 | 172.20.249.49 | 0 | 4 | | 1002 | obmysql | bmsql_warehouse | tpccdb | tpcc_group | 3 | 2 | zone1 | 172.20.249.52 | 0 | 3 | +-----------+-------------+------------------+---------------+-----------------+----------+--------------+-------+---------------+--------------+-----------+ 26 rows in set (0.014 sec)
抽查一个表数据,可以看出行数统计还是很精准的。
MySQL [tpccdb]> select 'p0', count(*) from bmsql_oorder partition (p0) union select 'p1', count(*) from bmsql_oorder partition (p1) union select 'p2', count(*) from bmsql_oorder partition (p2) ; +------+----------+ | p0 | count(*) | +------+----------+ | p0 | 139802 | | p1 | 203152 | | p2 | 150876 | +------+----------+ 3 rows in set (0.623 sec)




