暂无图片
暂无图片
1
暂无图片
暂无图片
暂无图片

OceanBase:In与EXISTS有趣的现象

原创 清酒和歌 2020-08-28
1773

概述

在对OceanBase的SQL进行优化的时候,发现一则有趣的现象:IN和EXISTS语句。

介绍

针对以下SQL:

SELECT a.acctg_trans_id FROM Acctg_Trans a WHERE STATUS = '1' AND a.posted_date = '2020-03-21 00:00:00.000' AND EXISTS (SELECT * FROM Acctg_Trans_Entry b WHERE b.acctg_trans_id = a.acctg_trans_id AND b.export_flag = '1' AND b.extend_field = '0') ORDER BY a.acctg_trans_id LIMIT 1;

在进行优化时,改写为in语句:

SELECT count(*) FROM Acctg_Trans a WHERE STATUS = '1' AND a.posted_date = '2020-03-21 00:00:00.000' AND a.acctg_trans_id IN (SELECT b.acctg_trans_id FROM Acctg_Trans_Entry b WHERE b.export_flag = '1' AND b.extend_field = '0') ORDER BY a.acctg_trans_id LIMIT 1;

发现执行计划有非常大的变化。
以下是EXISTS语句的执行计划:

mysql> explain extended SELECT a.acctg_trans_id FROM Acctg_Trans a WHERE STATUS = '1' AND a.posted_date = '2020-03-21 00:00:00.000' AND EXISTS (SELECT * FROM Acctg_Trans_Entry b WHERE b.acctg_trans_id = a.acctg_trans_id AND b.export_flag = '1' AND b.extend_field = '0') ORDER BY a.acctg_trans_id LIMIT 1; +----------------------------------------------------------------------------------+ | Query Plan | +----------------------------------------------------------------------------------+ | ==================================================================== |ID|OPERATOR |NAME |EST. ROWS|COST| -------------------------------------------------------------------- |0 |LIMIT | |1 |133 | |1 | NESTED-LOOP SEMI JOIN| |1 |133 | |2 | TABLE SCAN |a |1 |45 | |3 | TABLE SCAN |b(idx_acctg_trans_entry6)|141 |68 | ==================================================================== Outputs & filters: ------------------------------------- 0 - output([a.acctg_trans_id(0x7f9307bb34b0)]), filter(nil), limit(1), offset(nil) 1 - output([a.acctg_trans_id(0x7f9307bb34b0)]), filter(nil), conds(nil), nl_params_([a.acctg_trans_id(0x7f9307bb34b0)]), inner_get=false, self_join=false, batch_join=false 2 - output([a.acctg_trans_id(0x7f9307bb34b0)]), filter([a.posted_date(0x7f9307ba1c50) = ?(0x7f9307ba1610)], [a.status(0x7f9307ba0c40) = '1'(0x7f9307ba0600)]), access([a.status(0x7f9307ba0c40)], [a.posted_date(0x7f9307ba1c50)], [a.acctg_trans_id(0x7f9307bb34b0)]), partitions(p0), is_index_back=false, filter_before_indexback[false,false], range_key([a.acctg_trans_id(0x7f9307bb34b0)]), range(MIN ; MAX)always true 3 - output([b.acctg_trans_id(0x7f9307d0a6d0)]), filter(nil), access([b.acctg_trans_id(0x7f9307d0a6d0)]), partitions(p0), is_index_back=false, range_key([b.acctg_trans_id(0x7f9307bb3150)], [b.export_flag(0x7f9307bb44c0)], [b.extend_field(0x7f9307bb54d0)], [b.acctg_trans_entry_seq_id(0x7f9307c85430)]), range(MIN ; MAX), range_cond([b.export_flag(0x7f9307bb44c0) = '1'(0x7f9307bb3e80)], [b.extend_field(0x7f9307bb54d0) = '0'(0x7f9307bb4e90)], [b.acctg_trans_id(0x7f9307bb3150) = ?(0x7f9307cb15e0)]) Used Hint: ------------------------------------- /*+ */ Outline Data: ------------------------------------- /*+ BEGIN_OUTLINE_DATA LEADING(@"SEL$1" ("fdm.a"@"SEL$1" "fdm.b"@"SEL$2" )) USE_NL(@"SEL$1" ("fdm.b"@"SEL$2" )) NO_USE_NL_MATERIALIZATION(@"SEL$1" ("fdm.b"@"SEL$2" )) FULL(@"SEL$1" "fdm.a"@"SEL$1") INDEX(@"SEL$2" "fdm.b"@"SEL$2" "idx_acctg_trans_entry6") END_OUTLINE_DATA */ Plan Type: ------------------------------------- LOCAL Optimization Info: ------------------------------------- a:table_rows:3317, physical_range_rows:3317, logical_range_rows:3317, index_back_rows:0, output_rows:276, est_method:local_storage, optimization_method=cost_based, avaiable_index_name[acctg_trans], estimation info[table_id:1203965232465921, (table_type:1, version:0-1598551210469200-1598551210469200, logical_rc:3317, physical_rc:3317), (table_type:0, version:1597687200554831-1597687200554831-9223372036854775807, logical_rc:0, physical_rc:0)] b:table_rows:3308983, physical_range_rows:140, logical_range_rows:140, index_back_rows:0, output_rows:140, est_method:local_storage, optimization_method=cost_based, avaiable_index_name[acctg_trans_entry,idx_acctg_trans_entry5,idx_acctg_trans_entry6], pruned_index_name[idx_acctg_trans_entry3,idx_acctg_trans_entry4,idx_acctg_trans_entry2,idx_acctg_trans_entry1], estimation info[table_id:1203965232465953, (table_type:1, version:0-1598551210469200-1598551210469200, logical_rc:3308983, physical_rc:3308983), (table_type:0, version:1598348822178330-1598348822178330-9223372036854775807, logical_rc:0, physical_rc:0)] level 0: *********** paths(@18446744073709551614(ordering([a.acctg_trans_id]), cost=2758.833322)) paths(@18446744073709551613(ordering([b.acctg_trans_id], [b.acctg_trans_entry_seq_id]), cost=4599806.061772), @18446744073709551613(ordering([b.acctg_trans_id], [b.acctg_trans_entry_seq_id]), cost=4599806.061772), @18446744073709551613(ordering([b.acctg_trans_id], [b.acctg_trans_entry_seq_id]), cost=4599806.061772)) level 1: *********** paths(<@18446744073709551614 -M-> @18446744073709551613>(ordering([a.acctg_trans_id]), cost=16617648.081853), <@18446744073709551614 -M-> @18446744073709551613>(ordering([a.acctg_trans_id]), cost=16617648.081853), <@18446744073709551614 -M-> @18446744073709551613>(ordering([a.acctg_trans_id]), cost=16617648.081853), <@18446744073709551614 -M-> @18446744073709551613>(ordering([a.acctg_trans_id]), cost=16617648.081853)) Parameters ------------------------------------- {obj:{"DATETIME":"2020-03-21 00:00:00.000"}, accuracy:{length:-1, precision:-1, scale:-1}, flag:0} | +----------------------------------------------------------------------------------+ 1 row in set (0.01 sec) mysql>

以下是改写为in语句后的执行计划:

mysql> explain extended SELECT count(*) FROM Acctg_Trans a WHERE STATUS = '1' AND a.posted_date = '2020-03-21 00:00:00.000' AND a.acctg_trans_id IN (SELECT b.acctg_trans_id FROM Acctg_Trans_Entry b WHERE b.export_flag = '1' AND b.extend_field = '0') ORDER BY a.acctg_trans_id LIMIT 1; +----------------------------------------------------------------------------------+ | Query Plan | +----------------------------------------------------------------------------------+ | =================================================================== |ID|OPERATOR |NAME |EST. ROWS|COST | ------------------------------------------------------------------- |0 |LIMIT | |1 |4073159| |1 | SCALAR GROUP BY | |1 |4073159| |2 | MERGE JOIN | |432 |4073077| |3 | TABLE SCAN |a |277 |2759 | |4 | SUBPLAN SCAN |VIEW1 |5878 |4069046| |5 | MERGE DISTINCT| |5878 |4068235| |6 | TABLE SCAN |b(idx_acctg_trans_entry6)|696896 |3913701| =================================================================== Outputs & filters: ------------------------------------- 0 - output([T_FUN_COUNT(*)(0x7f9307bb8990)]), filter(nil), limit(1), offset(nil) 1 - output([T_FUN_COUNT(*)(0x7f9307bb8990)]), filter(nil), group(nil), agg_func([T_FUN_COUNT(*)(0x7f9307bb8990)]) 2 - output([1]), filter(nil), equal_conds([a.acctg_trans_id(0x7f9307bb8630) = VIEW1.acctg_trans_id(0x7f9307bcc170)(0x7f9307bcc4d0)]), other_conds(nil), merge_directions([ASC]) 3 - output([a.acctg_trans_id(0x7f9307bb8630)]), filter([a.posted_date(0x7f9307ba16f0) = ?(0x7f9307ba10b0)], [a.status(0x7f9307ba06e0) = '1'(0x7f9307ba00a0)]), access([a.status(0x7f9307ba06e0)], [a.posted_date(0x7f9307ba16f0)], [a.acctg_trans_id(0x7f9307bb8630)]), partitions(p0), is_index_back=false, filter_before_indexback[false,false], range_key([a.acctg_trans_id(0x7f9307bb8630)]), range(MIN ; MAX)always true 4 - output([VIEW1.acctg_trans_id(0x7f9307bcc170)]), filter(nil), access([VIEW1.acctg_trans_id(0x7f9307bcc170)]) 5 - output([b.acctg_trans_id(0x7f9307bb4c10)]), filter(nil), distinct([b.acctg_trans_id(0x7f9307bb4c10)]) 6 - output([b.acctg_trans_id(0x7f9307bb4c10)]), filter([b.export_flag(0x7f9307bb3540) = '1'(0x7f9307bb2f00)], [b.extend_field(0x7f9307bb4550) = '0'(0x7f9307bb3f10)]), access([b.export_flag(0x7f9307bb3540)], [b.extend_field(0x7f9307bb4550)], [b.acctg_trans_id(0x7f9307bb4c10)]), partitions(p0), is_index_back=false, filter_before_indexback[false,false], range_key([b.acctg_trans_id(0x7f9307bb4c10)], [b.export_flag(0x7f9307bb3540)], [b.extend_field(0x7f9307bb4550)], [b.acctg_trans_entry_seq_id(0x7f9307c29d20)]), range(MIN,MIN,MIN,MIN ; MAX,MAX,MAX,MAX)always true Used Hint: ------------------------------------- /*+ */ Outline Data: ------------------------------------- /*+ BEGIN_OUTLINE_DATA LEADING(@"SEL$1" ("fdm.a"@"SEL$1" "VIEW1"@"SEL$1" )) USE_MERGE(@"SEL$1" ("VIEW1"@"SEL$1" )) FULL(@"SEL$1" "fdm.a"@"SEL$1") INDEX(@"SEL$2" "fdm.b"@"SEL$2" "idx_acctg_trans_entry6") END_OUTLINE_DATA */ Plan Type: ------------------------------------- LOCAL Optimization Info: ------------------------------------- a:table_rows:3317, physical_range_rows:3317, logical_range_rows:3317, index_back_rows:0, output_rows:276, est_method:local_storage, optimization_method=cost_based, avaiable_index_name[acctg_trans], estimation info[table_id:1203965232465921, (table_type:1, version:0-1598551210469200-1598551210469200, logical_rc:3317, physical_rc:3317), (table_type:0, version:1597687200554831-1597687200554831-9223372036854775807, logical_rc:0, physical_rc:0)] b:table_rows:3308983, physical_range_rows:3308983, logical_range_rows:3308983, index_back_rows:0, output_rows:827245, est_method:local_storage, optimization_method=cost_based, avaiable_index_name[acctg_trans_entry,idx_acctg_trans_entry5,idx_acctg_trans_entry6], pruned_index_name[idx_acctg_trans_entry3,idx_acctg_trans_entry4,idx_acctg_trans_entry2,idx_acctg_trans_entry1], estimation info[table_id:1203965232465953, (table_type:1, version:0-1598551210469200-1598551210469200, logical_rc:3308983, physical_rc:3308983), (table_type:0, version:1598348822178330-1598348822178330-9223372036854775807, logical_rc:0, physical_rc:0)] level 0: *********** paths(@18446744073709551614(ordering([a.acctg_trans_id]), cost=2758.833322)) paths(@sub_18446744073709551612(ordering([VIEW1.acctg_trans_id]), cost=4069045.599699), @sub_18446744073709551612(ordering([VIEW1.acctg_trans_id]), cost=4069045.599699)) level 1: *********** paths(<@18446744073709551614 -M-> @sub_18446744073709551612>(ordering([a.acctg_trans_id]), cost=4073076.106811), <@18446744073709551614 -M-> @sub_18446744073709551612>(ordering([a.acctg_trans_id]), cost=4073076.106811), <@18446744073709551614 -M-> @sub_18446744073709551612>(ordering([a.acctg_trans_id]), cost=4073076.106811), <@18446744073709551614 -M-> @sub_18446744073709551612>(ordering([a.acctg_trans_id]), cost=4073076.106811), <@18446744073709551614 -M-> @sub_18446744073709551612>(ordering([a.acctg_trans_id]), cost=4073076.106811)) Parameters ------------------------------------- {obj:{"DATETIME":"2020-03-21 00:00:00.000"}, accuracy:{length:-1, precision:-1, scale:-1}, flag:0} | +----------------------------------------------------------------------------------+ 1 row in set (0.01 sec)

OceanBase的版本为:

mysql> select @@version; +-----------+ | @@version | +-----------+ | 2.2.20 | +-----------+ 1 row in set (0.01 sec)

所有变量均为默认:

mysql> show variables; +--------------------------------------+---------------------------------------------------------------------------------------------------+ | Variable_name | Value | +--------------------------------------+---------------------------------------------------------------------------------------------------+ | autocommit | ON | | auto_increment_cache_size | 10000 | | auto_increment_increment | 1 | | auto_increment_offset | 1 | | binlog_row_image | FULL | | block_encryption_mode | aes-128-ecb | | character_set_client | utf8mb4 | | character_set_connection | utf8mb4 | | character_set_database | utf8mb4 | | character_set_filesystem | binary | | character_set_results | utf8mb4 | | character_set_server | utf8mb4 | | character_set_system | utf8mb4 | | collation_connection | utf8mb4_general_ci | | collation_database | utf8mb4_general_ci | | collation_server | utf8mb4_general_ci | | connect_timeout | 10 | | datadir | /usr/local/mysql/data/ | | div_precision_increment | 4 | | error_on_overlap_time | OFF | | explicit_defaults_for_timestamp | ON | | foreign_key_checks | ON | | group_concat_max_len | 1024 | | identity | 0 | | init_connect | | | interactive_timeout | 28800 | | is_result_accurate | ON | | last_insert_id | 0 | | license | | | lower_case_table_names | 1 | | max_allowed_packet | 4194304 | | max_sp_recursion_depth | 0 | | max_user_connections | 0 | | net_buffer_length | 16384 | | net_read_timeout | 30 | | net_write_timeout | 60 | | ob_bnl_join_cache_size | 10485760 | | ob_check_sys_variable | ON | | ob_compatibility_mode | MYSQL | | ob_create_table_strict_mode | ON | | ob_default_replica_num | 1 | | ob_early_lock_release | OFF | | ob_enable_aggregation_pushdown | ON | | ob_enable_blk_nestedloop_join | OFF | | ob_enable_hash_group_by | ON | | ob_enable_index_direct_select | OFF | | ob_enable_jit | OFF | | ob_enable_plan_cache | ON | | ob_enable_sql_audit | ON | | ob_enable_trace_log | OFF | | ob_enable_transformation | ON | | ob_enable_transmission_checksum | ON | | ob_enable_truncate_flashback | OFF | | ob_interm_result_mem_limit | 2147483648 | | ob_last_schema_version | 0 | | ob_log_level | disabled | | ob_max_parallel_degree | 32 | | ob_org_cluster_id | 0 | | ob_plan_cache_evict_high_percentage | 90 | | ob_plan_cache_evict_low_percentage | 50 | | ob_plan_cache_percentage | 5 | | ob_query_timeout | 10000000 | | ob_read_consistency | STRONG | | ob_reserved_meta_memory_percentage | 10 | | ob_route_policy | READONLY_ZONE_FIRST | | ob_sql_audit_percentage | 10 | | ob_sql_work_area_percentage | 10 | | ob_stmt_parallel_degree | 1 | | ob_tcp_invited_nodes | % | | ob_temp_tablespace_size_percentage | 0 | | ob_timestamp_service | GTS | | ob_trx_idle_timeout | 120000000 | | ob_trx_timeout | 100000000 | | optimizer_capture_sql_plan_baselines | OFF | | optimizer_use_sql_plan_baselines | OFF | | parallel_max_servers | 0 | | parallel_servers_target | 0 | | plugin_dir | ./plugin_dir/ | | query_cache_size | 1048576 | | query_cache_type | OFF | | read_only | OFF | | recyclebin | OFF | | sql_auto_is_null | OFF | | sql_mode | STRICT_ALL_TABLES | | sql_quote_show_create | ON | | sql_select_limit | 9223372036854775807 | | sql_throttle_cpu | -1 | | sql_throttle_current_priority | 100 | | sql_throttle_io | -1 | | sql_throttle_logical_reads | -1 | | sql_throttle_network | -1 | | sql_throttle_priority | -1 | | sql_throttle_rt | -1 | | sql_warnings | OFF | | system_time_zone | +08:00 | | timestamp | 0 | | time_zone | +8:00 | | tracefile_identifier | | | tx_isolation | READ-COMMITTED | | tx_read_only | OFF | | undo_retention | 0 | | version | 2.2.20 | | version_comment | OceanBase 2.2.20 (r1855201-68b678db2d1be49388ae12502678e5e3016d1c42) (Built Aug 10 2019 11:56:23) | | wait_timeout | 28800 | | _primary_zone_entity_count | 0 | +--------------------------------------+---------------------------------------------------------------------------------------------------+ 105 rows in set (0.01 sec) mysql>

从执行计划上来看,改写为in语句之后,产生了VIEW1名称的一个临时表,COST值直线偏高,执行计划也发生了非常大的变化。

思考

在MySQL中,in语句的算法有五种。其中一种就是会把in中的子查询做成一个临时固化表,如果经常使用的话,会为其加hash索引,使其更高效。
这里似乎只是用到了一种,其他在MySQL中的算法是否会用到,还有待观察与验证。

最后修改时间:2020-08-28 19:27:08
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论