第一种场景:两表关联,关联键是分区键,但是没有过滤条件。
举个简单例子:表t1 为哈希分区表,有1000个分区,记录数50W行。
localhost:ytt>show create table t1\G
*************************** 1. row ***************************
Table: t1
Create Table: CREATE TABLE `t1` (
`id` int DEFAULT NULL,
`r1` int DEFAULT NULL,
`r2` int DEFAULT NULL,
`log_date` date DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
/*!50100 PARTITION BY HASH (`id`)
PARTITIONS 1000 */
1 row in set (0.00 sec)
表 t1_no_pt 为普通表,为表t1的克隆,但是移除掉表分区,记录数也同样为50W条。
localhost:ytt>show create table t1_no_pt\G
*************************** 1. row ***************************
Table: t1_no_pt
Create Table: CREATE TABLE `t1_no_pt` (
`id` int DEFAULT NULL,
`r1` int DEFAULT NULL,
`r2` int DEFAULT NULL,
`log_date` date DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)
这两张表在这种场景下的查询性能对比:分区表和普通表关联查询,执行时间为6.76秒。
localhost:ytt>select count(*) from t1_no_pt a inner join t1 b using(id);
+----------+
| count(*) |
+----------+
| 1014068 |
+----------+
1 row in set (6.76 sec)
两张分区表关联查询,执行时间为4.32秒。
localhost:ytt>select count(*) from t1 a inner join t1 b using(id);
+----------+
| count(*) |
+----------+
| 1014068 |
+----------+
1 row in set (4.32 sec)
两张普通表关联查询,执行时间只用了0.87秒。
localhost:ytt>select count(*) from t1_no_pt a inner join t1_no_pt b using(id);
+----------+
| count(*) |
+----------+
| 1014068 |
+----------+
1 row in set (0.87 sec)
第二种场景:两表关联,关联键是分区键,但是有过滤条件。
这里又细分为两种子场景:
1. 过滤条件为分区键
类似这样的查询:select * from t1 inner join t2 using(id) where t1.id = xxx;
两分区表关联并且过滤条件为分区键,执行时间为0.01秒。
localhost:ytt>select count(*) from t1 a inner join t1 b using(id) where a.id = 19172;
+----------+
| count(*) |
+----------+
| 81 |
+----------+
1 row in set (0.01 sec)
两普通表关联,同样的条件,执行时间为0.55秒,比两分区表关联慢很多倍。
localhost:ytt>select count(*) from t1_no_pt a inner join t1_no_pt b using(id) where a.id = 19172;
+----------+
| count(*) |
+----------+
| 81 |
+----------+
1 row in set (0.55 sec)
用分区表和普通表关联,执行时间0.32秒,介于前两者之间。
localhost:ytt>select count(*) from t1 a inner join t1_no_pt b using(id) where a.id = 19172;
+----------+
| count(*) |
+----------+
| 81 |
+----------+
1 row in set (0.32 sec)
补一个两分区表关联和两普通表关联的执行计划对比,会表现的更加明显:分区表关联成本381.9,扫描行数为280;普通表关联成本249264389.78,扫描行数249125777。此时分区表关联性能提升非常明显!
localhost:ytt>explain format=tree select count(*) from t1 a inner join t1 b using(id) where a.id = 19172\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)
-> Inner hash join (no condition) (cost=381.90 rows=280)
-> Filter: (b.id = 19172) (cost=1.02 rows=53)
-> Table scan on b (cost=1.02 rows=529)
-> Hash
-> Filter: (a.id = 19172) (cost=53.65 rows=53)
-> Table scan on a (cost=53.65 rows=529)
1 row in set (0.00 sec)
localhost:ytt>explain format=tree select count(*) from t1_no_pt a inner join t1_no_pt b using(id) where a.id = 19172\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)
-> Inner hash join (no condition) (cost=249264389.78 rows=249125777)
-> Filter: (b.id = 19172) (cost=1.87 rows=49913)
-> Table scan on b (cost=1.87 rows=499125)
-> Hash
-> Filter: (a.id = 19172) (cost=50257.25 rows=49913)
-> Table scan on a (cost=50257.25 rows=499125)
1 row in set (0.00 sec)
2. 过滤条件非分区键
依然用表t1和表t1_no_pt来举例:两分区表之间关联, 执行时间为6.16秒。
localhost:ytt>select count(*) from t1 a inner join t1 b using(id) where a.r1 = 10;
+----------+
| count(*) |
+----------+
| 50552 |
+----------+
1 row in set (6.16 sec)
两普通表关联,执行时间为0.7秒,反而比分区表快很多。
localhost:ytt>select count(*) from t1_no_pt a inner join t1_no_pt b using(id) where a.r1 = 10;
+----------+
| count(*) |
+----------+
| 50552 |
+----------+
1 row in set (0.70 sec)
第三种场景:两表关联,关联键非分区键,但是过滤条件是分区键。
对于这样的场景,分区表同样不能带来性能提升!
两分区表关联性能很差,执行时间为6.05秒。
localhost:ytt>select count(*) from t1 a inner join t1 b using(r1) where a.id = 19172;
+----------+
| count(*) |
+----------+
| 225868 |
+----------+
1 row in set (6.05 sec)
两普通表关联性能好很多,执行时间0.54秒。
localhost:ytt>select count(*) from t1_no_pt a inner join t1_no_pt b using(r1) where a.id = 19172;
+----------+
| count(*) |
+----------+
| 225868 |
+----------+
1 row in set (0.54 sec)
改下之前的SQL,用过滤好的分区表数据和普通表关联,这样性能比两普通表关联要好些:执行时间为0.39秒。
localhost:ytt>select count(*) from (select * from t1 a where a.id = 19172) t inner join t1_no_pt b using(r1);
+----------+
| count(*) |
+----------+
| 225868 |
+----------+
1 row in set (0.39 sec)
第四种场景:分区表关联,关联键也是分区键,但是两张分区表分区算法、或者分区数目有差异。
表t2和表t1结构相同,记录数也相同,但是分区数目不一样,表t1有1000个分区,表t2只有50个分区:
localhost:ytt>show create table t2\G
*************************** 1. row ***************************
Table: t2
Create Table: CREATE TABLE `t2` (
`id` int DEFAULT NULL,
`r1` int DEFAULT NULL,
`r2` int DEFAULT NULL,
`log_date` date DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
/*!50100 PARTITION BY HASH (`id`)
PARTITIONS 50 */
1 row in set (0.01 sec)
localhost:ytt>select count(*) from t1 a inner join t2 b using(id);
+----------+
| count(*) |
+----------+
| 1014068 |
+----------+
1 row in set (6.43 sec)
同样,关联两张普通表:执行时间1.98秒。执行时间比分区表要快。
localhost:ytt>select count(*) from t1_no_pt a inner join t2_no_pt b using(id);
+----------+
| count(*) |
+----------+
| 1014068 |
+----------+
1 row in set (1.98 sec)
那基于表关联是否该用分区表做个总结:
分区键为关联条件。 如果分区键为非关联条件,那过滤条件必须得是分区键。 两分区表的分区方法,分区数目必须一致。
================================
转文至此。
以下是个人微信公众号,欢迎关注:





