暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

MySQL 联接索引

原创 Kyle Hailey 2020-04-16
593

MySQL比Oracle示例更简单, 因为MySQL仅具有嵌套循环,而没有哈希联接。

这是查询,就像上一篇文章中的Oracle示例一样

select max(t1.data) 
from  t1, t2  
where t1.id = t2.id 
and t1.clus  = 1

因此,下面只有4个示例全部由NL加入

  1. 在T1和T2上进行全表扫描
    7.83秒
  2. T1谓词过滤器列上的索引
    7.39秒
  3. T2连接列上的索引
    0.49秒
  4. T2连接列和T1谓词过滤器列上的索引
    0.06秒

在MySQL 8.0.18之前没有在MySQL上进行“解释分析”的想法,而我在8.0.17上进行了测试,因此解释计划的成本是估计值而不是实际值,但是查询的耗时是实际值。

drop table seed;
CREATE TABLE seed ( id INT AUTO_INCREMENT PRIMARY KEY, val INT);
 insert into seed(val) values (1);
insert into seed(val) select val from seed;  /*  2 */
insert into seed(val) select val from seed;  /*  4 */
insert into seed(val) select val from seed;  /*  8 */
insert into seed(val) select val from seed;  /*  16 */
insert into seed(val) select val from seed;  /*  32 */
insert into seed(val) select val from seed;  /*  64 */
insert into seed(val) select val from seed;  /*  128 */
insert into seed(val) select val from seed;  /*  256 */
insert into seed(val) select val from seed;  /*  512 */
insert into seed(val) select val from seed;  /*  1024 */
insert into seed(val) select val from seed;  /*  2048 */
insert into seed(val) select val from seed;  /*  4096 */
insert into seed(val) select val from seed;  /*  8192 */
insert into seed(val) select val from seed;  /*  16384 */
insert into seed(val) select val from seed;  /*  32768 */
insert into seed(val) select val from seed;  /*  131072 */
insert into seed(val) select val from seed;  /*  262144 */
insert into seed(val) select val from seed;  /*  524288 */
insert into seed(val) select val from seed;  /*  1048576 */
 select count(*) from seed;

 drop table dummy;
 
  create table dummy ( 
    id  int,
    clus int,
    val int,
    data VARCHAR(40)
 );
 
 insert into dummy (
    id, clus , val, data
)
 select 
 id,
 truncate(id/169,0),
 mod(id,10000), 
 CONV(FLOOR(RAND() * 99999999999999), 10, 36) 
from seed
;

select count(*) from dummy where clus = 1;
+----------+
| count(*) |
+----------+
|      106 |
+----------+
select count(*) from dummy where val =1;
+----------+
| count(*) |
+----------+
|      106 |
+----------+

drop table t1;
drop table t2;

create table t1 as select * from dummy;
create table t2 as select * from dummy; 

奇怪的是,种子表中的序列号中有间隔,因此截断中的奇异值得到106个值,即truncate(id / 169,0)。

首先,我们在没有索引的情况下运行查询(我已经创建了索引,因此在示例中使用“ USE INDEX()”将其强制关闭)

两个节点都具有访问类型

"access_type": "ALL"

因此他们正在进行全表扫描。

explain format=json select max(t1.data) 
from 
t1 USE INDEX ()
, t2 USE INDEX ()
where t1.id = t2.id 
and t1.clus = 1
;

1 row in set (7.83 sec)

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "18102538.86"
    },
    "nested_loop": [
      {
        "table": {
          "table_name": "t1",
          "access_type": "ALL",
          "rows_examined_per_scan": 1045682,
          "rows_produced_per_join": 172,
          "filtered": "0.02",
          "cost_info": {
            "read_cost": "105376.24",
            "eval_cost": "17.21",
            "prefix_cost": "105393.45",
            "data_read_per_join": "29K"
          },
          "used_columns": [
            "id",
            "clus",
            "data"
          ],
          "attached_condition": "(`kyle`.`t1`.`clus` = 1)"
        }
      },
      {
        "table": {
          "table_name": "t2",
          "access_type": "ALL",
          "rows_examined_per_scan": 1045682,
          "rows_produced_per_join": 172,
          "filtered": "0.00",
          "using_join_buffer": "Block Nested Loop",
          "cost_info": {
            "read_cost": "917.35",
            "eval_cost": "17.21",
            "prefix_cost": "18102538.86",
            "data_read_per_join": "29K"
          },
          "used_columns": [
            "id"
          ],
          "attached_condition": "(`kyle`.`t2`.`id` = `kyle`.`t1`.`id`)"
        }
      }
    ]
  }
}

我们可以看到T1上的过滤器应该过滤大部分表(从1045682中抽出172行),所以现在在t1.clus上添加索引过滤器(我在t1上删除了“ USE INDEX()”,所以T1上的索引是现在用于解释)。执行时间稍快一些,但与7.39和7.39相比并没有那么多。我们看到T1现在可以通过索引访问

explain format=json select max(t1.data) 
from  t1, t2  
USE INDEX ()
where t1.id = t2.id 
and t1.clus    = 1
;
 row in set (7.39 sec)

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "11085148.28"
    },
    "nested_loop": [
      {
        "table": {
          "table_name": "t1",
          "access_type": "ref",
          "possible_keys": [
            "t1_clus"
          ],
          "key": "t1_clus",
          "used_key_parts": [
            "clus"
          ],
          "key_length": "5",
          "ref": [
            "const"
          ],
          "rows_examined_per_scan": 106,
          "rows_produced_per_join": 106,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "26.50",
            "eval_cost": "10.60",
            "prefix_cost": "37.10",
            "data_read_per_join": "18K"
          },
          "used_columns": [
            "id",
            "clus",
            "data"
          ]
        }
      },
      {
        "table": {
          "table_name": "t2",
          "access_type": "ALL",
          "rows_examined_per_scan": 1045682,
          "rows_produced_per_join": 105,
          "filtered": "0.00",
          "using_join_buffer": "Block Nested Loop",
          "cost_info": {
            "read_cost": "881.98",
            "eval_cost": "10.60",
            "prefix_cost": "11085148.28",
            "data_read_per_join": "18K"
          },
          "used_columns": [
            "id"
          ],
          "attached_condition": "(`kyle`.`t2`.`id` = `kyle`.`t1`.`id`)"
        }
      }
    ]
  }
}
 

在T2上,我们查看了1045682后访问了105行,因此这是建立索引的好机会。现在添加索引或连接列,我们可以大大缩短执行时间。我们从7.39秒缩短至0.49或7.39 / 0.49,或提高了15倍

explain format=json select max(t1.data) 
from  t1 USE INDEX () , t2  
where t1.id = t2.id 
and t1.clus   = 1
;
1 row in set (0.49 sec)

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "105582.64"
    },
    "nested_loop": [
      {
        "table": {
          "table_name": "t1",
          "access_type": "ALL",
          "rows_examined_per_scan": 1045682,
          "rows_produced_per_join": 172,
          "filtered": "0.02",
          "cost_info": {
            "read_cost": "105376.24",
            "eval_cost": "17.21",
            "prefix_cost": "105393.45",
            "data_read_per_join": "29K"
          },
          "used_columns": [
            "id",
            "clus",
            "data"
          ],
          "attached_condition": "((`kyle`.`t1`.`clus` = 1) and (`kyle`.`t1`.`id` is not null))"
        }
      },
      {
        "table": {
          "table_name": "t2",
          "access_type": "ref",
          "possible_keys": [
            "t2_id"
          ],
          "key": "t2_id",
          "used_key_parts": [
            "id"
          ],
          "key_length": "5",
          "ref": [
            "kyle.t1.id"
          ],
          "rows_examined_per_scan": 1,
          "rows_produced_per_join": 172,
          "filtered": "100.00",
          "using_index": true,
          "cost_info": {
            "read_cost": "171.98",
            "eval_cost": "17.21",
            "prefix_cost": "105582.64",
            "data_read_per_join": "29K"
          },
          "used_columns": [
            "id"
          ]
        }
      }
    ]
  }
}

现在使用这两个索引,这使我们的执行时间减少了0.43倍,但是与之前的0.49 / 0.06相比,即8倍的改进,仍然相当不错

explain format=json select max(t1.data) 
from  t1, t2  
where t1.id = t2.id 
and t1.clus  = 1
;
1 row in set (0.06 sec)

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "153.62"
    },
    "nested_loop": [
      {
        "table": {
          "table_name": "t1",
          "access_type": "ref",
          "possible_keys": [
            "t1_clus"
          ],
          "key": "t1_clus",
          "used_key_parts": [
            "clus"
          ],
          "key_length": "5",
          "ref": [
            "const"
          ],
          "rows_examined_per_scan": 106,
          "rows_produced_per_join": 106,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "26.50",
            "eval_cost": "10.60",
            "prefix_cost": "37.10",
            "data_read_per_join": "18K"
          },
          "used_columns": [
            "id",
            "clus",
            "data"
          ],
          "attached_condition": "(`kyle`.`t1`.`id` is not null)"
        }
      },
      {
        "table": {
          "table_name": "t2",
          "access_type": "ref",
          "possible_keys": [
            "t2_id"
          ],
          "key": "t2_id",
          "used_key_parts": [
            "id"
          ],
          "key_length": "5",
          "ref": [
            "kyle.t1.id"
          ],
          "rows_examined_per_scan": 1,
          "rows_produced_per_join": 106,
          "filtered": "100.00",
          "using_index": true,
          "cost_info": {
            "read_cost": "105.93",
            "eval_cost": "10.60",
            "prefix_cost": "153.63",
            "data_read_per_join": "18K"
          },
          "used_columns": [
            "id"
          ]
        }
      }
    ]
  }

原文链接:http://www.oaktable.net/content/indexes-joins-mysql

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论