GBase 8a MPP在大规模集群上应用时,比如200节点左右的集群环境,如何判断是否存在单节点性能瓶颈导致出现木桶效应,或者数据倾斜负载不均导致部分节点负载较重呢?
可以通过设置调高集群的日志级别,通过观察每个任务在各个节点的执行时间,就能判断是否存在数据倾斜以及单点性能瓶颈。
通过以下命令设置集群日志级别:
设置全局参数,需重新登录session生效:set global
gcluster_log_level=7;
设置session参数,仅在当前session生效:set
gcluster_log_level=7;
当执行SQL后,日志会记录每次下发给node节点的任务以及node执行的时间。日志位置一般在集群安装目录下,比如:/opt/gcluster/log/express.log
以下是一个query的例子,我们从计划执行阶段开始看:
GBase 8a MPP在大规模集群上应用时,比如200节点左右的集群环境,如何判断是否存在单节点性能瓶颈导致出现木桶效应,或者数据倾斜负载不均导致部分节点负载较重呢?
可以通过设置调高集群的日志级别,通过观察每个任务在各个节点的执行时间,就能判断是否存在数据倾斜以及单点性能瓶颈。
通过以下命令设置集群日志级别:
设置全局参数,需重新登录session生效:set global
gcluster_log_level=7;
设置session参数,仅在当前session生效:set
gcluster_log_level=7;
当执行SQL后,日志会记录每次下发给node节点的任务以及node执行的时间。日志位置一般在集群安装目录下,比如:/opt/gcluster/log/express.log
以下是一个query的例子,我们从计划执行阶段开始看:
2022-04-13
15:00:22.244 [EXEC_P][INFO
][S:151][Q:438]<ExecutePlan|100>:##############################
2022-04-13
15:00:22.244 [EXEC_P][INFO ][S:151][Q:438]<ExecutePlan|101>:#### Main
Query Begin ####
2022-04-13
15:00:22.244 [EXEC_P][INFO ][S:151][Q:438]<ExecutePlan|102>:##############################
2022-04-13
15:00:22.244 [EXEC_P][INFO ][S:151][Q:438]<ExecutePlan|106>:---Top plan
is [0x398aea0]---
2022-04-13
15:00:22.267 [EXEC_P][INFO ][S:151][Q:438]<ExecuteAllStep|193>:## STEP: 0
...
2022-04-13
15:00:22.282 [SQLDISP][INFO
][S:151][Q:438]<ExecSQL|2528>:Target:HOST(10.10.55.81:5050->359) ,
SQL:CREATE TABLE `gctmpdb`._tmp_1362561546_151_t5_1_1649828623_s AS SELECT
/*10.10.55.81_151_1_2022-04-13_15:00:21*/ /*+ TID('1114287') */
`vcname000001.tpch.part`.`p_partkey` AS `p_partkey` FROM `tpch`.`part_n1`
`vcname000001.tpch.part` WHERE (`vcname000001.tpch.part`.`p_name` LIKE
'%sandy%') LIMIT 0 ;.
2022-04-13
15:00:22.282 [SQLDISP][INFO ][S:151][Q:438]<ExecSQL|2528>:Target:HOST(10.10.55.80:5050->351)
, SQL:CREATE TABLE `gctmpdb`._tmp_1362561546_151_t5_1_1649828623_s AS SELECT
/*10.10.55.81_151_1_2022-04-13_15:00:21*/ /*+ TID('1114287') */
`vcname000001.tpch.part`.`p_partkey` AS `p_partkey` FROM `tpch`.`part_n2` `vcname000001.tpch.part`
WHERE (`vcname000001.tpch.part`.`p_name` LIKE '%sandy%') LIMIT 0 ;.
2022-04-13
15:00:22.282 [SQLDISP][INFO
][S:151][Q:438]<ExecSQL|2528>:Target:HOST(10.10.55.82:5050->353) ,
SQL:CREATE TABLE `gctmpdb`._tmp_1362561546_151_t5_1_1649828623_s AS SELECT
/*10.10.55.81_151_1_2022-04-13_15:00:21*/ /*+ TID('1114287') */
`vcname000001.tpch.part`.`p_partkey` AS `p_partkey` FROM `tpch`.`part_n3`
`vcname000001.tpch.part` WHERE (`vcname000001.tpch.part`.`p_name` LIKE
'%sandy%') LIMIT 0 ;.
2022-04-13
15:00:22.282 [SQLDISP][INFO
][S:151][Q:438]<ExecSQL|2528>:Target:HOST(10.10.55.83:5050->357) ,
SQL:CREATE TABLE `gctmpdb`._tmp_1362561546_151_t5_1_1649828623_s AS SELECT
/*10.10.55.81_151_1_2022-04-13_15:00:21*/ /*+ TID('1114287') */ `vcname000001.tpch.part`.`p_partkey`
AS `p_partkey` FROM `tpch`.`part_n4` `vcname000001.tpch.part` WHERE
(`vcname000001.tpch.part`.`p_name` LIKE '%sandy%') LIMIT 0 ;.
2022-04-13
15:00:22.303 [EXECTR][INFO ][S:151][Q:438]<PrintNodePerformance|158>:
-----------------+----Node
Performance----+---------------
+-Type-+-------Node------+-Port-+---Elapsed---+--Status--+
| T |
10.10.55.83 | 5050 | 0 s | Finish |
| T |
10.10.55.81 | 5050 | 0 s | Finish |
| T |
10.10.55.82 | 5050 | 0 s | Finish |
| T |
10.10.55.80 | 5050 | 0 s | Finish |
+------+-----------------+------+-------------+----------+
第一步集群下发建临时表任务到4个node节点
2022-04-13
15:00:22.303 [SQLDISP][INFO
][S:151][Q:438]:Target:HOST(10.10.55.81:5050->359) ,
SQL:SELECT /*10.10.55.81_151_1_2022-04-13_15:00:21*/ /*+ TID('1114287') */
`vcname000001.tpch.part`.`p_partkey` AS `p_partkey` FROM `tpch`.`part_n1`
`vcname000001.tpch.part` WHERE (`vcname000001.tpch.part`.`p_name` LIKE
'%sandy%') target into server (HOST '10.10.55.81,10.10.55.80,10.10.55.82,10.10.55.83',
PORT 5050, USER 'root', PASSWORD '', DATABASE 'gctmpdb', TABLE
'_tmp_1362561546_151_t5_1_1649828623_s', COMMENT 'scn 0' ).
2022-04-13
15:00:22.303 [SQLDISP][INFO
][S:151][Q:438]:Target:HOST(10.10.55.80:5050->351) ,
SQL:SELECT /*10.10.55.81_151_1_2022-04-13_15:00:21*/ /*+ TID('1114287') */
`vcname000001.tpch.part`.`p_partkey` AS `p_partkey` FROM `tpch`.`part_n2`
`vcname000001.tpch.part` WHERE (`vcname000001.tpch.part`.`p_name` LIKE
'%sandy%') target into server (HOST
'10.10.55.81,10.10.55.80,10.10.55.82,10.10.55.83', PORT 5050, USER 'root',
PASSWORD '', DATABASE 'gctmpdb', TABLE '_tmp_1362561546_151_t5_1_1649828623_s',
COMMENT 'scn 0' ).
2022-04-13
15:00:22.303 [SQLDISP][INFO ][S:151][Q:438]:Target:HOST(10.10.55.82:5050->353)
, SQL:SELECT /*10.10.55.81_151_1_2022-04-13_15:00:21*/ /*+ TID('1114287') */
`vcname000001.tpch.part`.`p_partkey` AS `p_partkey` FROM `tpch`.`part_n3`
`vcname000001.tpch.part` WHERE (`vcname000001.tpch.part`.`p_name` LIKE '%sandy%')
target into server (HOST '10.10.55.81,10.10.55.80,10.10.55.82,10.10.55.83',
PORT 5050, USER 'root', PASSWORD '', DATABASE 'gctmpdb', TABLE
'_tmp_1362561546_151_t5_1_1649828623_s', COMMENT 'scn 0' ).
2022-04-13
15:00:22.303 [SQLDISP][INFO ][S:151][Q:438]:Target:HOST(10.10.55.83:5050->357)
, SQL:SELECT /*10.10.55.81_151_1_2022-04-13_15:00:21*/ /*+ TID('1114287') */
`vcname000001.tpch.part`.`p_partkey` AS `p_partkey` FROM `tpch`.`part_n4`
`vcname000001.tpch.part` WHERE (`vcname000001.tpch.part`.`p_name` LIKE
'%sandy%') target into server (HOST
'10.10.55.81,10.10.55.80,10.10.55.82,10.10.55.83', PORT 5050, USER 'root',
PASSWORD '', DATABASE 'gctmpdb', TABLE '_tmp_1362561546_151_t5_1_1649828623_s',
COMMENT 'scn 0' ).
2022-04-13
15:00:22.954 [EXECTR][INFO ][S:151][Q:438]:
-----------------+----Node
Performance----+---------------
+-Type-+-------Node------+-Port-+---Elapsed---+--Status--+
| T |
10.10.55.83 | 5050 | 0 s | Finish |
| T |
10.10.55.81 | 5050 | 0 s | Finish |
| T |
10.10.55.80 | 5050 | 0 s | Finish |
| T |
10.10.55.82 | 5050 | 0 s | Finish |
+------+-----------------+------+-------------+----------+
第二步,将part表拉成复制表
2022-04-13
15:00:22.954 [EXECTR][INFO ][S:151][Q:438]:Passed
tasks to async API, good luck!
2022-04-13
15:00:22.955 [SQLDISP][INFO
][S:151][Q:438]:Target:HOST(10.10.55.81:5050->359) ,
SQL:flush temporary commit `gctmpdb`._tmp_1362561546_151_t5_1_1649828623_s
scn_number 0.
2022-04-13
15:00:22.955 [SQLDISP][INFO ][S:151][Q:438]:Target:HOST(10.10.55.80:5050->351)
, SQL:flush temporary commit `gctmpdb`._tmp_1362561546_151_t5_1_1649828623_s
scn_number 0.
2022-04-13
15:00:22.955 [SQLDISP][INFO
][S:151][Q:438]:Target:HOST(10.10.55.82:5050->353) ,
SQL:flush temporary commit `gctmpdb`._tmp_1362561546_151_t5_1_1649828623_s
scn_number 0.
2022-04-13
15:00:22.955 [SQLDISP][INFO
][S:151][Q:438]:Target:HOST(10.10.55.83:5050->357) ,
SQL:flush temporary commit `gctmpdb`._tmp_1362561546_151_t5_1_1649828623_s scn_number
0.
2022-04-13
15:00:22.957 [EXECTR][INFO ][S:151][Q:438]:
-----------------+----Node
Performance----+---------------
+-Type-+-------Node------+-Port-+---Elapsed---+--Status--+
| T |
10.10.55.82 | 5050 | 0 s | Finish |
| T |
10.10.55.83 | 5050 | 0 s | Finish |
| T |
10.10.55.80 | 5050 | 0 s | Finish |
| T |
10.10.55.81 | 5050 | 0 s | Finish |
+------+-----------------+------+-------------+----------+
第三步,part表复制表commit。
2022-04-13
15:00:22.958 [SQLDISP][INFO
][S:151][Q:438]:Target:HOST(10.10.55.81:5050->359) ,
SQL:CREATE TABLE `gctmpdb`._tmp_rht_1362561546_151_t5_2_1649828623_s_n1 AS
SELECT /*10.10.55.81_151_1_2022-04-13_15:00:21*/ /*+ TID('1114287') */ `vcname000001.tpch.lineitem`.`l_suppkey`
AS `l_suppkey`, `vcname000001.tpch.orders`.`o_orderdate` AS `o_orderdate`,
`vcname000001.tpch.lineitem`.`l_extendedprice` AS `l_extendedprice`,
`vcname000001.tpch.lineitem`.`l_discount` AS `l_discount`,
`vcname000001.tpch.lineitem`.`l_quantity` AS `l_quantity`,
`vcname000001.tpch.lineitem`.`l_partkey` AS `l_partkey` FROM
`gctmpdb`._tmp_1362561546_151_t5_1_1649828623_s `p` INNER JOIN ( `tpch`.`lineitem_n1`
`vcname000001.tpch.lineitem` INNER JOIN `tpch`.`orders_n1` `vcname000001.tpch.orders`
ON (`vcname000001.tpch.orders`.`o_orderkey` =
`vcname000001.tpch.lineitem`.`l_orderkey`) )
ON (`p`.`p_partkey` = `vcname000001.tpch.lineitem`.`l_partkey`) LIMIT 0
;.
2022-04-13
15:00:22.958 [SQLDISP][INFO ][S:151][Q:438]:Target:HOST(10.10.55.80:5050->351)
, SQL:CREATE TABLE `gctmpdb`._tmp_rht_1362561546_151_t5_2_1649828623_s_n2 AS
SELECT /*10.10.55.81_151_1_2022-04-13_15:00:21*/ /*+ TID('1114287') */
`vcname000001.tpch.lineitem`.`l_suppkey` AS `l_suppkey`, `vcname000001.tpch.orders`.`o_orderdate`
AS `o_orderdate`, `vcname000001.tpch.lineitem`.`l_extendedprice` AS
`l_extendedprice`, `vcname000001.tpch.lineitem`.`l_discount` AS `l_discount`,
`vcname000001.tpch.lineitem`.`l_quantity` AS `l_quantity`,
`vcname000001.tpch.lineitem`.`l_partkey` AS `l_partkey` FROM
`gctmpdb`._tmp_1362561546_151_t5_1_1649828623_s `p` INNER JOIN ( `tpch`.`lineitem_n2`
`vcname000001.tpch.lineitem` INNER JOIN `tpch`.`orders_n2`
`vcname000001.tpch.orders` ON (`vcname000001.tpch.orders`.`o_orderkey` =
`vcname000001.tpch.lineitem`.`l_orderkey`) )
ON (`p`.`p_partkey` = `vcname000001.tpch.lineitem`.`l_partkey`) LIMIT 0
;.
2022-04-13
15:00:22.958 [SQLDISP][INFO
][S:151][Q:438]:Target:HOST(10.10.55.82:5050->353) ,
SQL:CREATE TABLE `gctmpdb`._tmp_rht_1362561546_151_t5_2_1649828623_s_n3 AS
SELECT /*10.10.55.81_151_1_2022-04-13_15:00:21*/ /*+ TID('1114287') */
`vcname000001.tpch.lineitem`.`l_suppkey` AS `l_suppkey`,
`vcname000001.tpch.orders`.`o_orderdate` AS `o_orderdate`,
`vcname000001.tpch.lineitem`.`l_extendedprice` AS `l_extendedprice`,
`vcname000001.tpch.lineitem`.`l_discount` AS `l_discount`,
`vcname000001.tpch.lineitem`.`l_quantity` AS `l_quantity`,
`vcname000001.tpch.lineitem`.`l_partkey` AS `l_partkey` FROM
`gctmpdb`._tmp_1362561546_151_t5_1_1649828623_s `p` INNER JOIN ( `tpch`.`lineitem_n3`
`vcname000001.tpch.lineitem` INNER JOIN `tpch`.`orders_n3`
`vcname000001.tpch.orders` ON (`vcname000001.tpch.orders`.`o_orderkey` =
`vcname000001.tpch.lineitem`.`l_orderkey`) )
ON (`p`.`p_partkey` = `vcname000001.tpch.lineitem`.`l_partkey`) LIMIT 0
;.
2022-04-13
15:00:22.958 [SQLDISP][INFO
][S:151][Q:438]:Target:HOST(10.10.55.83:5050->357) ,
SQL:CREATE TABLE `gctmpdb`._tmp_rht_1362561546_151_t5_2_1649828623_s_n4 AS
SELECT /*10.10.55.81_151_1_2022-04-13_15:00:21*/ /*+ TID('1114287') */
`vcname000001.tpch.lineitem`.`l_suppkey` AS `l_suppkey`,
`vcname000001.tpch.orders`.`o_orderdate` AS `o_orderdate`,
`vcname000001.tpch.lineitem`.`l_extendedprice` AS `l_extendedprice`,
`vcname000001.tpch.lineitem`.`l_discount` AS `l_discount`,
`vcname000001.tpch.lineitem`.`l_quantity` AS `l_quantity`,
`vcname000001.tpch.lineitem`.`l_partkey` AS `l_partkey` FROM
`gctmpdb`._tmp_1362561546_151_t5_1_1649828623_s `p` INNER JOIN ( `tpch`.`lineitem_n4` `vcname000001.tpch.lineitem`
INNER JOIN `tpch`.`orders_n4` `vcname000001.tpch.orders` ON
(`vcname000001.tpch.orders`.`o_orderkey` =
`vcname000001.tpch.lineitem`.`l_orderkey`) )
ON (`p`.`p_partkey` = `vcname000001.tpch.lineitem`.`l_partkey`) LIMIT 0
;.
2022-04-13
15:00:22.965 [EXECTR][INFO ][S:151][Q:438]:
-----------------+----Node
Performance----+---------------
+-Type-+-------Node------+-Port-+---Elapsed---+--Status--+
| T |
10.10.55.82 | 5050 | 0 s | Finish |
| T |
10.10.55.83 | 5050 | 0 s | Finish |
| T |
10.10.55.80 | 5050 | 0 s | Finish |
| T |
10.10.55.81 | 5050 | 0 s | Finish |
+------+-----------------+------+-------------+----------+
第四步,建临时表,将存储lineitem和orders表join结果按l_suppkey列重分布结果
2022-04-13
15:00:22.967 [SQLDISP][INFO ][S:151][Q:438]:Target:HOST(10.10.55.81:5050->359)
, SQL:SELECT /*10.10.55.81_151_1_2022-04-13_15:00:21*/ /*+ TID('1114287') */
`vcname000001.tpch.lineitem`.`l_suppkey` AS `l_suppkey`,
`vcname000001.tpch.orders`.`o_orderdate` AS `o_orderdate`, `vcname000001.tpch.lineitem`.`l_extendedprice`
AS `l_extendedprice`, `vcname000001.tpch.lineitem`.`l_discount` AS
`l_discount`, `vcname000001.tpch.lineitem`.`l_quantity` AS `l_quantity`,
`vcname000001.tpch.lineitem`.`l_partkey` AS `l_partkey` FROM `gctmpdb`._tmp_1362561546_151_t5_1_1649828623_s
`p` INNER JOIN ( `tpch`.`lineitem_n1`
`vcname000001.tpch.lineitem` INNER JOIN `tpch`.`orders_n1`
`vcname000001.tpch.orders` ON (`vcname000001.tpch.orders`.`o_orderkey` = `vcname000001.tpch.lineitem`.`l_orderkey`)
) ON (`p`.`p_partkey` =
`vcname000001.tpch.lineitem`.`l_partkey`) target into server (HOST
'10.10.55.81,10.10.55.80,10.10.55.82,10.10.55.83', PORT 5050, USER 'root',
PASSWORD '', DATABASE 'gctmpdb', TABLE
'_tmp_rht_1362561546_151_t5_2_1649828623_s_n1,_tmp_rht_1362561546_151_t5_2_1649
2022-04-13
15:00:22.967 [SQLDISP][INFO
][S:151][Q:438]:Target:HOST(10.10.55.80:5050->351) ,
SQL:SELECT /*10.10.55.81_151_1_2022-04-13_15:00:21*/ /*+ TID('1114287') */
`vcname000001.tpch.lineitem`.`l_suppkey` AS `l_suppkey`,
`vcname000001.tpch.orders`.`o_orderdate` AS `o_orderdate`,
`vcname000001.tpch.lineitem`.`l_extendedprice` AS `l_extendedprice`,
`vcname000001.tpch.lineitem`.`l_discount` AS `l_discount`,
`vcname000001.tpch.lineitem`.`l_quantity` AS `l_quantity`, `vcname000001.tpch.lineitem`.`l_partkey`
AS `l_partkey` FROM `gctmpdb`._tmp_1362561546_151_t5_1_1649828623_s `p` INNER
JOIN ( `tpch`.`lineitem_n2`
`vcname000001.tpch.lineitem` INNER JOIN `tpch`.`orders_n2` `vcname000001.tpch.orders`
ON (`vcname000001.tpch.orders`.`o_orderkey` =
`vcname000001.tpch.lineitem`.`l_orderkey`) )
ON (`p`.`p_partkey` = `vcname000001.tpch.lineitem`.`l_partkey`) target
into server (HOST '10.10.55.81,10.10.55.80,10.10.55.82,10.10.55.83', PORT 5050,
USER 'root', PASSWORD '', DATABASE 'gctmpdb', TABLE
'_tmp_rht_1362561546_151_t5_2_1649828623_s_n1,_tmp_rht_1362561546_151_t5_2_1649
2022-04-13
15:00:22.968 [SQLDISP][INFO
][S:151][Q:438]:Target:HOST(10.10.55.82:5050->353) ,
SQL:SELECT /*10.10.55.81_151_1_2022-04-13_15:00:21*/ /*+ TID('1114287') */
`vcname000001.tpch.lineitem`.`l_suppkey` AS `l_suppkey`,
`vcname000001.tpch.orders`.`o_orderdate` AS `o_orderdate`,
`vcname000001.tpch.lineitem`.`l_extendedprice` AS `l_extendedprice`,
`vcname000001.tpch.lineitem`.`l_discount` AS `l_discount`,
`vcname000001.tpch.lineitem`.`l_quantity` AS `l_quantity`,
`vcname000001.tpch.lineitem`.`l_partkey` AS `l_partkey` FROM
`gctmpdb`._tmp_1362561546_151_t5_1_1649828623_s `p` INNER JOIN ( `tpch`.`lineitem_n3` `vcname000001.tpch.lineitem`
INNER JOIN `tpch`.`orders_n3` `vcname000001.tpch.orders` ON
(`vcname000001.tpch.orders`.`o_orderkey` =
`vcname000001.tpch.lineitem`.`l_orderkey`) )
ON (`p`.`p_partkey` = `vcname000001.tpch.lineitem`.`l_partkey`) target
into server (HOST '10.10.55.81,10.10.55.80,10.10.55.82,10.10.55.83', PORT 5050,
USER 'root', PASSWORD '', DATABASE 'gctmpdb', TABLE
'_tmp_rht_1362561546_151_t5_2_1649828623_s_n1,_tmp_rht_1362561546_151_t5_2_1649
2022-04-13
15:00:22.969 [SQLDISP][INFO
][S:151][Q:438]:Target:HOST(10.10.55.83:5050->357) ,
SQL:SELECT /*10.10.55.81_151_1_2022-04-13_15:00:21*/ /*+ TID('1114287') */
`vcname000001.tpch.lineitem`.`l_suppkey` AS `l_suppkey`,
`vcname000001.tpch.orders`.`o_orderdate` AS `o_orderdate`,
`vcname000001.tpch.lineitem`.`l_extendedprice` AS `l_extendedprice`,
`vcname000001.tpch.lineitem`.`l_discount` AS `l_discount`,
`vcname000001.tpch.lineitem`.`l_quantity` AS `l_quantity`,
`vcname000001.tpch.lineitem`.`l_partkey` AS `l_partkey` FROM
`gctmpdb`._tmp_1362561546_151_t5_1_1649828623_s `p` INNER JOIN ( `tpch`.`lineitem_n4`
`vcname000001.tpch.lineitem` INNER JOIN `tpch`.`orders_n4`
`vcname000001.tpch.orders` ON (`vcname000001.tpch.orders`.`o_orderkey` =
`vcname000001.tpch.lineitem`.`l_orderkey`) )
ON (`p`.`p_partkey` = `vcname000001.tpch.lineitem`.`l_partkey`) target
into server (HOST '10.10.55.81,10.10.55.80,10.10.55.82,10.10.55.83', PORT 5050,
USER 'root', PASSWORD '', DATABASE 'gctmpdb', TABLE
'_tmp_rht_1362561546_151_t5_2_1649828623_s_n1,_tmp_rht_1362561546_151_t5_2_1649
2022-04-13
15:01:08.739 [EXECTR][INFO ][S:151][Q:438]:
-----------------+----Node
Performance----+---------------
+-Type-+-------Node------+-Port-+---Elapsed---+--Status--+
| T |
10.10.55.80 | 5050 | 46 s | Finish |
| T |
10.10.55.83 | 5050 | 46 s | Finish |
| T |
10.10.55.81 | 5050 | 36 s | Finish |
| T |
10.10.55.82 | 5050 | 34 s | Finish |
+------+-----------------+------+-------------+----------+
第五步,将存储lineitem和orders表join结果按l_suppkey列重分布,可以看出在55.80和55.83节点上,性能比55.82节点慢10s左右。对此,可以进步分析node的日志,确认是否存在数据倾斜,以及单节点性能瓶颈。在此案例中,最终我们发现80和83节点均存在一块磁盘故障导致RAID重构,影响了IO性能。