暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

mysql antijoin与hash join

原创 zhou 2024-04-05
183

数据准备

下载测试数据

wget https://github.com/datacharmer/test_db/archive/refs/heads/master.zip

解压:

unzip master.zip
cd test_db-master/

建库导入数据(employees,sakila)

mysql -uroot -p123456   <employees.sql
cd sakila
mysql -uroot -p123456 testdb <sakila-mv-schema.sql
mysql -uroot -p123456 testdb <sakila-mv-data.sql

SQL案例

当前MYSQL版本:8.0.28


20240404163303image.png

Antijoin

MySQL8.0 支持anti-join优化,NOT EXISTS转变成anti join

SQL语句:

查询employees,salaries,在2000-01-01前没领过薪水的员工

select * from employees
where
not exists
(
  select * from salaries
  where salaries.from_date<='2000-01-01'
  and salaries.emp_no=employees.emp_no
);

8.0默认执行计划优化antijoin

explain format=tree  select * from employees
where
not exists
(
  select * from salaries
  where salaries.from_date<='2000-01-01'
  and salaries.emp_no=employees.emp_no
);


20240404162512image.png
关闭semijoin模拟mysql 5.7

/*+ SET_VAR(optimizer_switch = ‘semijoin=off’) */

可以看到执行计划是依赖子查询:subquery in condition; dependent

explain format=tree select /*+ SET_VAR(optimizer_switch = 'semijoin=off') */   * from employees 
where
not exists
(
  select    * from salaries
  where salaries.from_date<='2000-01-01'
  and salaries.emp_no=employees.emp_no  
);


20240404162710image.png

IN优化exists

SQL语句:

#in 
SELECT * FROM sakila.film
          WHERE film_id IN(
         SELECT film_id FROM sakila.film_actor WHERE actor_id = 1);
#exists
 SELECT * FROM sakila.film
 WHERE EXISTS (
       SELECT * FROM sakila.film_actor WHERE actor_id = 1
       AND film_actor.film_id = film.film_id);

分别以默认/‘semijoin=off’/‘subquery_materialization_cost_based=off’ 方式比较in 与exists执行计划

“IN”

默认情况下:IN被改写成 join

explain  
SELECT   * FROM sakila.film
          WHERE film_id IN(
         SELECT film_id FROM sakila.film_actor WHERE actor_id = 1);
      show warnings        ; 


20240404222402image.png

20240404222435image.png
关闭semijoin情况

explain format=tree
SELECT /*+ SET_VAR(optimizer_switch = 'semijoin=off') */ * FROM sakila.film
          WHERE film_id IN(
         SELECT film_id FROM sakila.film_actor WHERE actor_id = 1);
 show warnings   ;     

子查询materialized_subquery,先执行子查询


20240404224245image.png
关闭subquery_materialization_cost_based情况

未产生物化视图

explain format=tree
SELECT /*+ SET_VAR(optimizer_switch = 'subquery_materialization_cost_based=off') */ * FROM sakila.film
          WHERE film_id IN(
         SELECT film_id FROM sakila.film_actor WHERE actor_id = 1);
 show warnings   ;  


20240404224941image.png

“EXISTS”

默认情况:被semijoin 改写成join

explain         
SELECT * FROM sakila.film
 WHERE EXISTS (
       SELECT * FROM sakila.film_actor WHERE actor_id = 1
       AND film_actor.film_id = film.film_id);
   show warnings;   


20240404225142image.png

20240404225159image.png
关闭semijoin情况

未改写成join, DEPENDENT SUBQUERY关联子查询,film驱动film_actor

explain  format=tree       
SELECT /*+ SET_VAR(optimizer_switch = 'semijoin=off') */ * FROM sakila.film
 WHERE EXISTS (
       SELECT * FROM sakila.film_actor WHERE actor_id = 1
       AND film_actor.film_id = film.film_id);
show warnings;


20240404225312image.png
关闭subquery_materialization_cost_based情况

改写semi join,film_actor驱动film

explain  format=tree       
SELECT /*+ SET_VAR(optimizer_switch = 'subquery_materialization_cost_based=off') */ * FROM sakila.film
 WHERE EXISTS (
       SELECT * FROM sakila.film_actor WHERE actor_id = 1
       AND film_actor.film_id = film.film_id);
show warnings;

20240404225554image.png

结论:

不管是in还是exists 在开启semijoin=on 时,SQL就改写成 join
in 在semijoin=off 时,子查询使用materialization
MySQL 8.0.16 开始 EXISTS 子查询与 in 子查询 等价
MySQL 8.0.17 开始 in/EXISTS 转换成antijoin
半连接返回左表中与右表至少匹配一次的数据行,通常体现为 EXISTS 或者 IN 子查询
反连接返回左表中与右表不匹配的数据行,通常体现为 NOT EXISTS 或者 NOT IN 子查询

Hash join
有索引

原SQL

select  * from employees.employees e  
join employees.salaries  s  on e.emp_no=s.emp_no;';   

无hint

explain format=tree 
select  * from employees.employees e  
join employees.salaries  s  on e.emp_no=s.emp_no;

20240405001158image.png

hint提示

explain format=tree 
select  * from employees.employees e IGNORE INDEX (Primary)  
join employees.salaries  s IGNORE INDEX (Primary) on e.emp_no=s.emp_no;


20240405001240image.png
无索引或提示忽略索引倾向hash join

无索引

表准备

CREATE TABLE t1 (c1 INT, c2 INT);
CREATE TABLE t2 (c1 INT, c2 INT);
CREATE TABLE t3 (c1 INT, c2 INT);

执行计划

explain format=tree SELECT * FROM t1
    JOIN t2 ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2)
    JOIN t3 ON (t2.c1 = t3.c1);

20240405002307image.png

给t1,t2,t3添加索引

    
create index idx_t1_c1c2 on t1(c1,c2) ;     
create index idx_t2_c1c2 on t1(c1,c2) ;
create index idx_t3_c1 on t1(c1) ;

再看执行计划

explain format=tree SELECT * FROM t1
    JOIN t2 ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2)
    JOIN t3 ON (t2.c1 = t3.c1);

20240405002420image.png

外层是一个nested loop join,t2,t3 是hash join;

CBO受索引影响

在MySQL 8.0.20及更高版本中,联接不再需要包含至少一个等联接条件才能使用哈希联接(这点超过传统hash join实现,传统hash join 只能等值连接)

最后修改时间:2024-04-05 19:30:10
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论