数据准备
下载测试数据
wget https://github.com/datacharmer/test_db/archive/refs/heads/master.zip
解压:
unzip master.zip
cd test_db-master/
建库导入数据(employees,sakila)
mysql -uroot -p123456 <employees.sql
cd sakila
mysql -uroot -p123456 testdb <sakila-mv-schema.sql
mysql -uroot -p123456 testdb <sakila-mv-data.sql
SQL案例
当前MYSQL版本:8.0.28


Antijoin
MySQL8.0 支持anti-join优化,NOT EXISTS转变成anti join
SQL语句:
查询employees,salaries,在2000-01-01前没领过薪水的员工
select * from employees
where
not exists
(
select * from salaries
where salaries.from_date<='2000-01-01'
and salaries.emp_no=employees.emp_no
);
8.0默认执行计划优化antijoin
explain format=tree select * from employees
where
not exists
(
select * from salaries
where salaries.from_date<='2000-01-01'
and salaries.emp_no=employees.emp_no
);


关闭semijoin模拟mysql 5.7
/*+ SET_VAR(optimizer_switch = ‘semijoin=off’) */
可以看到执行计划是依赖子查询:subquery in condition; dependent
explain format=tree select /*+ SET_VAR(optimizer_switch = 'semijoin=off') */ * from employees
where
not exists
(
select * from salaries
where salaries.from_date<='2000-01-01'
and salaries.emp_no=employees.emp_no
);


IN优化exists
SQL语句:
#in
SELECT * FROM sakila.film
WHERE film_id IN(
SELECT film_id FROM sakila.film_actor WHERE actor_id = 1);
#exists
SELECT * FROM sakila.film
WHERE EXISTS (
SELECT * FROM sakila.film_actor WHERE actor_id = 1
AND film_actor.film_id = film.film_id);
分别以默认/‘semijoin=off’/‘subquery_materialization_cost_based=off’ 方式比较in 与exists执行计划
“IN”
默认情况下:IN被改写成 join
explain
SELECT * FROM sakila.film
WHERE film_id IN(
SELECT film_id FROM sakila.film_actor WHERE actor_id = 1);
show warnings ;




关闭semijoin情况
explain format=tree
SELECT /*+ SET_VAR(optimizer_switch = 'semijoin=off') */ * FROM sakila.film
WHERE film_id IN(
SELECT film_id FROM sakila.film_actor WHERE actor_id = 1);
show warnings ;
子查询materialized_subquery,先执行子查询


关闭subquery_materialization_cost_based情况
未产生物化视图
explain format=tree
SELECT /*+ SET_VAR(optimizer_switch = 'subquery_materialization_cost_based=off') */ * FROM sakila.film
WHERE film_id IN(
SELECT film_id FROM sakila.film_actor WHERE actor_id = 1);
show warnings ;


“EXISTS”
默认情况:被semijoin 改写成join
explain
SELECT * FROM sakila.film
WHERE EXISTS (
SELECT * FROM sakila.film_actor WHERE actor_id = 1
AND film_actor.film_id = film.film_id);
show warnings;




关闭semijoin情况
未改写成join, DEPENDENT SUBQUERY关联子查询,film驱动film_actor
explain format=tree
SELECT /*+ SET_VAR(optimizer_switch = 'semijoin=off') */ * FROM sakila.film
WHERE EXISTS (
SELECT * FROM sakila.film_actor WHERE actor_id = 1
AND film_actor.film_id = film.film_id);
show warnings;


关闭subquery_materialization_cost_based情况
改写semi join,film_actor驱动film
explain format=tree
SELECT /*+ SET_VAR(optimizer_switch = 'subquery_materialization_cost_based=off') */ * FROM sakila.film
WHERE EXISTS (
SELECT * FROM sakila.film_actor WHERE actor_id = 1
AND film_actor.film_id = film.film_id);
show warnings;


结论:
不管是in还是exists 在开启semijoin=on 时,SQL就改写成 join
in 在semijoin=off 时,子查询使用materialization
MySQL 8.0.16 开始 EXISTS 子查询与 in 子查询 等价
MySQL 8.0.17 开始 in/EXISTS 转换成antijoin
半连接返回左表中与右表至少匹配一次的数据行,通常体现为 EXISTS 或者 IN 子查询
反连接返回左表中与右表不匹配的数据行,通常体现为 NOT EXISTS 或者 NOT IN 子查询
Hash join
有索引
原SQL
select * from employees.employees e
join employees.salaries s on e.emp_no=s.emp_no;';
无hint
explain format=tree
select * from employees.employees e
join employees.salaries s on e.emp_no=s.emp_no;


hint提示
explain format=tree
select * from employees.employees e IGNORE INDEX (Primary)
join employees.salaries s IGNORE INDEX (Primary) on e.emp_no=s.emp_no;


无索引或提示忽略索引倾向hash join
无索引
表准备
CREATE TABLE t1 (c1 INT, c2 INT);
CREATE TABLE t2 (c1 INT, c2 INT);
CREATE TABLE t3 (c1 INT, c2 INT);
执行计划
explain format=tree SELECT * FROM t1
JOIN t2 ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2)
JOIN t3 ON (t2.c1 = t3.c1);


给t1,t2,t3添加索引
create index idx_t1_c1c2 on t1(c1,c2) ;
create index idx_t2_c1c2 on t1(c1,c2) ;
create index idx_t3_c1 on t1(c1) ;
再看执行计划
explain format=tree SELECT * FROM t1
JOIN t2 ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2)
JOIN t3 ON (t2.c1 = t3.c1);


外层是一个nested loop join,t2,t3 是hash join;
CBO受索引影响
在MySQL 8.0.20及更高版本中,联接不再需要包含至少一个等联接条件才能使用哈希联接(这点超过传统hash join实现,传统hash join 只能等值连接)




