概述
在对OceanBase的SQL进行优化的时候,发现一则有趣的现象:IN和EXISTS语句。
介绍
针对以下SQL:
SELECT a.acctg_trans_id FROM Acctg_Trans a WHERE STATUS = '1' AND a.posted_date = '2020-03-21 00:00:00.000' AND EXISTS (SELECT * FROM Acctg_Trans_Entry b WHERE b.acctg_trans_id = a.acctg_trans_id AND b.export_flag = '1' AND b.extend_field = '0') ORDER BY a.acctg_trans_id LIMIT 1;
在进行优化时,改写为in语句:
SELECT count(*) FROM Acctg_Trans a WHERE STATUS = '1' AND a.posted_date = '2020-03-21 00:00:00.000' AND a.acctg_trans_id IN (SELECT b.acctg_trans_id FROM Acctg_Trans_Entry b WHERE b.export_flag = '1' AND b.extend_field = '0') ORDER BY a.acctg_trans_id LIMIT 1;
发现执行计划有非常大的变化。
以下是EXISTS语句的执行计划:
mysql> explain extended SELECT a.acctg_trans_id FROM Acctg_Trans a WHERE STATUS = '1' AND a.posted_date = '2020-03-21 00:00:00.000' AND EXISTS (SELECT * FROM Acctg_Trans_Entry b WHERE b.acctg_trans_id = a.acctg_trans_id AND b.export_flag = '1' AND b.extend_field = '0') ORDER BY a.acctg_trans_id LIMIT 1;
+----------------------------------------------------------------------------------+
| Query Plan |
+----------------------------------------------------------------------------------+
| ====================================================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
--------------------------------------------------------------------
|0 |LIMIT | |1 |133 |
|1 | NESTED-LOOP SEMI JOIN| |1 |133 |
|2 | TABLE SCAN |a |1 |45 |
|3 | TABLE SCAN |b(idx_acctg_trans_entry6)|141 |68 |
====================================================================
Outputs & filters:
-------------------------------------
0 - output([a.acctg_trans_id(0x7f9307bb34b0)]), filter(nil), limit(1), offset(nil)
1 - output([a.acctg_trans_id(0x7f9307bb34b0)]), filter(nil),
conds(nil), nl_params_([a.acctg_trans_id(0x7f9307bb34b0)]), inner_get=false, self_join=false, batch_join=false
2 - output([a.acctg_trans_id(0x7f9307bb34b0)]), filter([a.posted_date(0x7f9307ba1c50) = ?(0x7f9307ba1610)], [a.status(0x7f9307ba0c40) = '1'(0x7f9307ba0600)]),
access([a.status(0x7f9307ba0c40)], [a.posted_date(0x7f9307ba1c50)], [a.acctg_trans_id(0x7f9307bb34b0)]), partitions(p0),
is_index_back=false, filter_before_indexback[false,false],
range_key([a.acctg_trans_id(0x7f9307bb34b0)]), range(MIN ; MAX)always true
3 - output([b.acctg_trans_id(0x7f9307d0a6d0)]), filter(nil),
access([b.acctg_trans_id(0x7f9307d0a6d0)]), partitions(p0),
is_index_back=false,
range_key([b.acctg_trans_id(0x7f9307bb3150)], [b.export_flag(0x7f9307bb44c0)], [b.extend_field(0x7f9307bb54d0)], [b.acctg_trans_entry_seq_id(0x7f9307c85430)]), range(MIN ; MAX),
range_cond([b.export_flag(0x7f9307bb44c0) = '1'(0x7f9307bb3e80)], [b.extend_field(0x7f9307bb54d0) = '0'(0x7f9307bb4e90)], [b.acctg_trans_id(0x7f9307bb3150) = ?(0x7f9307cb15e0)])
Used Hint:
-------------------------------------
/*+
*/
Outline Data:
-------------------------------------
/*+
BEGIN_OUTLINE_DATA
LEADING(@"SEL$1" ("fdm.a"@"SEL$1" "fdm.b"@"SEL$2" ))
USE_NL(@"SEL$1" ("fdm.b"@"SEL$2" ))
NO_USE_NL_MATERIALIZATION(@"SEL$1" ("fdm.b"@"SEL$2" ))
FULL(@"SEL$1" "fdm.a"@"SEL$1")
INDEX(@"SEL$2" "fdm.b"@"SEL$2" "idx_acctg_trans_entry6")
END_OUTLINE_DATA
*/
Plan Type:
-------------------------------------
LOCAL
Optimization Info:
-------------------------------------
a:table_rows:3317, physical_range_rows:3317, logical_range_rows:3317, index_back_rows:0, output_rows:276, est_method:local_storage, optimization_method=cost_based, avaiable_index_name[acctg_trans], estimation info[table_id:1203965232465921, (table_type:1, version:0-1598551210469200-1598551210469200, logical_rc:3317, physical_rc:3317), (table_type:0, version:1597687200554831-1597687200554831-9223372036854775807, logical_rc:0, physical_rc:0)]
b:table_rows:3308983, physical_range_rows:140, logical_range_rows:140, index_back_rows:0, output_rows:140, est_method:local_storage, optimization_method=cost_based, avaiable_index_name[acctg_trans_entry,idx_acctg_trans_entry5,idx_acctg_trans_entry6], pruned_index_name[idx_acctg_trans_entry3,idx_acctg_trans_entry4,idx_acctg_trans_entry2,idx_acctg_trans_entry1], estimation info[table_id:1203965232465953, (table_type:1, version:0-1598551210469200-1598551210469200, logical_rc:3308983, physical_rc:3308983), (table_type:0, version:1598348822178330-1598348822178330-9223372036854775807, logical_rc:0, physical_rc:0)]
level 0:
***********
paths(@18446744073709551614(ordering([a.acctg_trans_id]), cost=2758.833322))
paths(@18446744073709551613(ordering([b.acctg_trans_id], [b.acctg_trans_entry_seq_id]), cost=4599806.061772), @18446744073709551613(ordering([b.acctg_trans_id], [b.acctg_trans_entry_seq_id]), cost=4599806.061772), @18446744073709551613(ordering([b.acctg_trans_id], [b.acctg_trans_entry_seq_id]), cost=4599806.061772))
level 1:
***********
paths(<@18446744073709551614 -M-> @18446744073709551613>(ordering([a.acctg_trans_id]), cost=16617648.081853), <@18446744073709551614 -M-> @18446744073709551613>(ordering([a.acctg_trans_id]), cost=16617648.081853), <@18446744073709551614 -M-> @18446744073709551613>(ordering([a.acctg_trans_id]), cost=16617648.081853), <@18446744073709551614 -M-> @18446744073709551613>(ordering([a.acctg_trans_id]), cost=16617648.081853))
Parameters
-------------------------------------
{obj:{"DATETIME":"2020-03-21 00:00:00.000"}, accuracy:{length:-1, precision:-1, scale:-1}, flag:0} |
+----------------------------------------------------------------------------------+
1 row in set (0.01 sec)
mysql>
以下是改写为in语句后的执行计划:
mysql> explain extended SELECT count(*) FROM Acctg_Trans a WHERE STATUS = '1' AND a.posted_date = '2020-03-21 00:00:00.000' AND a.acctg_trans_id IN (SELECT b.acctg_trans_id FROM Acctg_Trans_Entry b WHERE b.export_flag = '1' AND b.extend_field = '0') ORDER BY a.acctg_trans_id LIMIT 1;
+----------------------------------------------------------------------------------+
| Query Plan |
+----------------------------------------------------------------------------------+
| ===================================================================
|ID|OPERATOR |NAME |EST. ROWS|COST |
-------------------------------------------------------------------
|0 |LIMIT | |1 |4073159|
|1 | SCALAR GROUP BY | |1 |4073159|
|2 | MERGE JOIN | |432 |4073077|
|3 | TABLE SCAN |a |277 |2759 |
|4 | SUBPLAN SCAN |VIEW1 |5878 |4069046|
|5 | MERGE DISTINCT| |5878 |4068235|
|6 | TABLE SCAN |b(idx_acctg_trans_entry6)|696896 |3913701|
===================================================================
Outputs & filters:
-------------------------------------
0 - output([T_FUN_COUNT(*)(0x7f9307bb8990)]), filter(nil), limit(1), offset(nil)
1 - output([T_FUN_COUNT(*)(0x7f9307bb8990)]), filter(nil),
group(nil), agg_func([T_FUN_COUNT(*)(0x7f9307bb8990)])
2 - output([1]), filter(nil),
equal_conds([a.acctg_trans_id(0x7f9307bb8630) = VIEW1.acctg_trans_id(0x7f9307bcc170)(0x7f9307bcc4d0)]), other_conds(nil),
merge_directions([ASC])
3 - output([a.acctg_trans_id(0x7f9307bb8630)]), filter([a.posted_date(0x7f9307ba16f0) = ?(0x7f9307ba10b0)], [a.status(0x7f9307ba06e0) = '1'(0x7f9307ba00a0)]),
access([a.status(0x7f9307ba06e0)], [a.posted_date(0x7f9307ba16f0)], [a.acctg_trans_id(0x7f9307bb8630)]), partitions(p0),
is_index_back=false, filter_before_indexback[false,false],
range_key([a.acctg_trans_id(0x7f9307bb8630)]), range(MIN ; MAX)always true
4 - output([VIEW1.acctg_trans_id(0x7f9307bcc170)]), filter(nil),
access([VIEW1.acctg_trans_id(0x7f9307bcc170)])
5 - output([b.acctg_trans_id(0x7f9307bb4c10)]), filter(nil),
distinct([b.acctg_trans_id(0x7f9307bb4c10)])
6 - output([b.acctg_trans_id(0x7f9307bb4c10)]), filter([b.export_flag(0x7f9307bb3540) = '1'(0x7f9307bb2f00)], [b.extend_field(0x7f9307bb4550) = '0'(0x7f9307bb3f10)]),
access([b.export_flag(0x7f9307bb3540)], [b.extend_field(0x7f9307bb4550)], [b.acctg_trans_id(0x7f9307bb4c10)]), partitions(p0),
is_index_back=false, filter_before_indexback[false,false],
range_key([b.acctg_trans_id(0x7f9307bb4c10)], [b.export_flag(0x7f9307bb3540)], [b.extend_field(0x7f9307bb4550)], [b.acctg_trans_entry_seq_id(0x7f9307c29d20)]), range(MIN,MIN,MIN,MIN ; MAX,MAX,MAX,MAX)always true
Used Hint:
-------------------------------------
/*+
*/
Outline Data:
-------------------------------------
/*+
BEGIN_OUTLINE_DATA
LEADING(@"SEL$1" ("fdm.a"@"SEL$1" "VIEW1"@"SEL$1" ))
USE_MERGE(@"SEL$1" ("VIEW1"@"SEL$1" ))
FULL(@"SEL$1" "fdm.a"@"SEL$1")
INDEX(@"SEL$2" "fdm.b"@"SEL$2" "idx_acctg_trans_entry6")
END_OUTLINE_DATA
*/
Plan Type:
-------------------------------------
LOCAL
Optimization Info:
-------------------------------------
a:table_rows:3317, physical_range_rows:3317, logical_range_rows:3317, index_back_rows:0, output_rows:276, est_method:local_storage, optimization_method=cost_based, avaiable_index_name[acctg_trans], estimation info[table_id:1203965232465921, (table_type:1, version:0-1598551210469200-1598551210469200, logical_rc:3317, physical_rc:3317), (table_type:0, version:1597687200554831-1597687200554831-9223372036854775807, logical_rc:0, physical_rc:0)]
b:table_rows:3308983, physical_range_rows:3308983, logical_range_rows:3308983, index_back_rows:0, output_rows:827245, est_method:local_storage, optimization_method=cost_based, avaiable_index_name[acctg_trans_entry,idx_acctg_trans_entry5,idx_acctg_trans_entry6], pruned_index_name[idx_acctg_trans_entry3,idx_acctg_trans_entry4,idx_acctg_trans_entry2,idx_acctg_trans_entry1], estimation info[table_id:1203965232465953, (table_type:1, version:0-1598551210469200-1598551210469200, logical_rc:3308983, physical_rc:3308983), (table_type:0, version:1598348822178330-1598348822178330-9223372036854775807, logical_rc:0, physical_rc:0)]
level 0:
***********
paths(@18446744073709551614(ordering([a.acctg_trans_id]), cost=2758.833322))
paths(@sub_18446744073709551612(ordering([VIEW1.acctg_trans_id]), cost=4069045.599699), @sub_18446744073709551612(ordering([VIEW1.acctg_trans_id]), cost=4069045.599699))
level 1:
***********
paths(<@18446744073709551614 -M-> @sub_18446744073709551612>(ordering([a.acctg_trans_id]), cost=4073076.106811), <@18446744073709551614 -M-> @sub_18446744073709551612>(ordering([a.acctg_trans_id]), cost=4073076.106811), <@18446744073709551614 -M-> @sub_18446744073709551612>(ordering([a.acctg_trans_id]), cost=4073076.106811), <@18446744073709551614 -M-> @sub_18446744073709551612>(ordering([a.acctg_trans_id]), cost=4073076.106811), <@18446744073709551614 -M-> @sub_18446744073709551612>(ordering([a.acctg_trans_id]), cost=4073076.106811))
Parameters
-------------------------------------
{obj:{"DATETIME":"2020-03-21 00:00:00.000"}, accuracy:{length:-1, precision:-1, scale:-1}, flag:0} |
+----------------------------------------------------------------------------------+
1 row in set (0.01 sec)
OceanBase的版本为:
mysql> select @@version;
+-----------+
| @@version |
+-----------+
| 2.2.20 |
+-----------+
1 row in set (0.01 sec)
所有变量均为默认:
mysql> show variables;
+--------------------------------------+---------------------------------------------------------------------------------------------------+
| Variable_name | Value |
+--------------------------------------+---------------------------------------------------------------------------------------------------+
| autocommit | ON |
| auto_increment_cache_size | 10000 |
| auto_increment_increment | 1 |
| auto_increment_offset | 1 |
| binlog_row_image | FULL |
| block_encryption_mode | aes-128-ecb |
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8mb4 |
| collation_connection | utf8mb4_general_ci |
| collation_database | utf8mb4_general_ci |
| collation_server | utf8mb4_general_ci |
| connect_timeout | 10 |
| datadir | /usr/local/mysql/data/ |
| div_precision_increment | 4 |
| error_on_overlap_time | OFF |
| explicit_defaults_for_timestamp | ON |
| foreign_key_checks | ON |
| group_concat_max_len | 1024 |
| identity | 0 |
| init_connect | |
| interactive_timeout | 28800 |
| is_result_accurate | ON |
| last_insert_id | 0 |
| license | |
| lower_case_table_names | 1 |
| max_allowed_packet | 4194304 |
| max_sp_recursion_depth | 0 |
| max_user_connections | 0 |
| net_buffer_length | 16384 |
| net_read_timeout | 30 |
| net_write_timeout | 60 |
| ob_bnl_join_cache_size | 10485760 |
| ob_check_sys_variable | ON |
| ob_compatibility_mode | MYSQL |
| ob_create_table_strict_mode | ON |
| ob_default_replica_num | 1 |
| ob_early_lock_release | OFF |
| ob_enable_aggregation_pushdown | ON |
| ob_enable_blk_nestedloop_join | OFF |
| ob_enable_hash_group_by | ON |
| ob_enable_index_direct_select | OFF |
| ob_enable_jit | OFF |
| ob_enable_plan_cache | ON |
| ob_enable_sql_audit | ON |
| ob_enable_trace_log | OFF |
| ob_enable_transformation | ON |
| ob_enable_transmission_checksum | ON |
| ob_enable_truncate_flashback | OFF |
| ob_interm_result_mem_limit | 2147483648 |
| ob_last_schema_version | 0 |
| ob_log_level | disabled |
| ob_max_parallel_degree | 32 |
| ob_org_cluster_id | 0 |
| ob_plan_cache_evict_high_percentage | 90 |
| ob_plan_cache_evict_low_percentage | 50 |
| ob_plan_cache_percentage | 5 |
| ob_query_timeout | 10000000 |
| ob_read_consistency | STRONG |
| ob_reserved_meta_memory_percentage | 10 |
| ob_route_policy | READONLY_ZONE_FIRST |
| ob_sql_audit_percentage | 10 |
| ob_sql_work_area_percentage | 10 |
| ob_stmt_parallel_degree | 1 |
| ob_tcp_invited_nodes | % |
| ob_temp_tablespace_size_percentage | 0 |
| ob_timestamp_service | GTS |
| ob_trx_idle_timeout | 120000000 |
| ob_trx_timeout | 100000000 |
| optimizer_capture_sql_plan_baselines | OFF |
| optimizer_use_sql_plan_baselines | OFF |
| parallel_max_servers | 0 |
| parallel_servers_target | 0 |
| plugin_dir | ./plugin_dir/ |
| query_cache_size | 1048576 |
| query_cache_type | OFF |
| read_only | OFF |
| recyclebin | OFF |
| sql_auto_is_null | OFF |
| sql_mode | STRICT_ALL_TABLES |
| sql_quote_show_create | ON |
| sql_select_limit | 9223372036854775807 |
| sql_throttle_cpu | -1 |
| sql_throttle_current_priority | 100 |
| sql_throttle_io | -1 |
| sql_throttle_logical_reads | -1 |
| sql_throttle_network | -1 |
| sql_throttle_priority | -1 |
| sql_throttle_rt | -1 |
| sql_warnings | OFF |
| system_time_zone | +08:00 |
| timestamp | 0 |
| time_zone | +8:00 |
| tracefile_identifier | |
| tx_isolation | READ-COMMITTED |
| tx_read_only | OFF |
| undo_retention | 0 |
| version | 2.2.20 |
| version_comment | OceanBase 2.2.20 (r1855201-68b678db2d1be49388ae12502678e5e3016d1c42) (Built Aug 10 2019 11:56:23) |
| wait_timeout | 28800 |
| _primary_zone_entity_count | 0 |
+--------------------------------------+---------------------------------------------------------------------------------------------------+
105 rows in set (0.01 sec)
mysql>
从执行计划上来看,改写为in语句之后,产生了VIEW1名称的一个临时表,COST值直线偏高,执行计划也发生了非常大的变化。
思考
在MySQL中,in语句的算法有五种。其中一种就是会把in中的子查询做成一个临时固化表,如果经常使用的话,会为其加hash索引,使其更高效。
这里似乎只是用到了一种,其他在MySQL中的算法是否会用到,还有待观察与验证。
最后修改时间:2020-08-28 19:27:08
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




