
问题说明:
数据库迁移到新服务器后,某一张视图查询缓慢,原库查询耗时3分钟,新库查询5小时以上。
环境说明:
新库:
DB:Oracle 11.2.0.4.0 RAC
OS:Redhat 7.6
原库:
DB:Oracle 11.2.0.4.0 单机
OS:AIX 5.1
问题分析:
处理此类问题,常见思路如下:1 分别对比原库和新库该SQL对应的执行计划。2 对比两个执行计划,找到差异部分或最耗时的部分。3 分析执行计划差异原因,根据原因尝试解决问题。

本次案例视图引用的SQL使用了28个union all组成,关联了30多张表。
在执行如下SQL时,新库会卡住很久,超过5小时。
select count(*) from v_cjc_01;
视图内SQL类似如下:
create or replace view v_cjc_01 asselect col1,col2,col3,...col10from t1where xxx=yyyunion allselect col1,col2,col3,...col10from t2where zzz=yyyunion all......union allselect col1,col2,col3,...col10from t3where xxx=yyy and col2 not in (select col2 from t3 a where col1=aaa col3=bbb)union allselect col1,col2,col3,...col10from t5where ttt=yyy;
查看执行计划,发现主要有两处不同
一:
原执行计划
|* 40 | HASH JOIN | | 72408 | 2616K| 2264K| 28026 (1)| 00:05:37 || 41 | VIEW | VW_JF_SET$5EF83EA7 | 72408 | 1414K| | 3028 (1)| 00:00:37 || 42 | UNION-ALL | | | | | | ||* 43 | TABLE ACCESS FULL | CJC_T1_00000000000000000 | 72407 | 2050K| | 1514 (1)| 00:00:19 ||* 44 | TABLE ACCESS FULL | CJC_T1_00000000000000000 | 1 | 29 | | 1514 (1)| 00:00:19 || 45 | INDEX FAST FULL SCAN | IDX_MB_ACCINF_ACCNO | 9179K| 148M| | 12212 (1)| 00:02:27 |----------------------------------------------------------------------------------------------------------------------
新执行计划
| 13 | NESTED LOOPS | | 236 | 9676 | 2003 (1)| 00:00:25 ||* 14 | TABLE ACCESS FULL | CJC_T1_00000000000000000 | 235 | 5640 | 1533 (1)| 00:00:19 ||* 15 | INDEX RANGE SCAN | IDX_MB_ACCINF_ACCNO | 1 | 17 | 2 (0)| 00:00:01 || 16 | NESTED LOOPS | | 1 | 41 | 1535 (1)| 00:00:19 ||* 17 | TABLE ACCESS FULL | CJC_T1_00000000000000000 | 1 | 24 | 1533 (1)| 00:00:19 ||* 18 | INDEX RANGE SCAN | IDX_MB_ACCINF_ACCNO | 1 | 17 | 2 (0)| 00:00:01 |
二:
原执行计划
|* 16 | HASH JOIN RIGHT ANTI NA | | 6647 | 1174K| 12M| 26593 (1)| 00:05:20 ||* 17 | TABLE ACCESS FULL | CJC_T2_00000000000000000 | 130K| 10M| | 11309 (1)| 00:02:16 ||* 18 | TABLE ACCESS FULL | CJC_T2_00000000000000000 | 664K| 60M| | 11309 (1)| 00:02:16 |
新执行计划
|* 21 | FILTER | | | | | ||* 22 | TABLE ACCESS FULL | CJC_T2_00000000000000000 | 4928 | 202K| 11126 (1)| 00:02:14 ||* 23 | TABLE ACCESS FULL | CJC_T2_00000000000000000 | 1 | 27 | 11124 (1)| 00:02:14 |
根据第一处不同,可以看到:原执行计划,出现了VW_JF_SET$5EF83EA7,说明原库使用到了Join Factorization特性,在该特性下,union all连接各个分支查询SQL时,会将各个分支部分中公共部分单独提出作为一个单独的结果集,然后在和原union all剩下的部分做关联,从而减少了公共部分表重复扫描的问题,性能会有提升。

为什么新库没有使用到Join Factorization特性呢?
对比两个库的_OPTIMIZER_JOIN_FACTORIZATION参数,都是使用的默认值true。说明优化器认为新库不适合使用Join Factorization特性。但是从执行计划看两种表数据量都比较大,结果集也很大,新库使用的NESTED LOOPS明显比原库HASH JOIN性能差。
可以通过10053事件对比查看原库和新库对同一SQL是如何进行改写,如何计算cost的?
原库和新库分别执行:
alter session set tracefile_identifier='10053';ALTER SESSION SET EVENTS='10053 trace name context forever, level 1';select count(*) from v_cjc_01;ALTER SESSION SET EVENTS '10053 trace name context off';select value from v$diag_info where name='Default Trace File';
查看生成的trace文件:
搜索关键字Final query after transformations
新库SQL改写部分:
UNION ALL(SELECT '0' "KERNAL_INTER4"FROM "CHEN"."CJC_T1_00000000000000000" "A",CHENPMB."MB_ACCINF" "B"WHERE "B"."AIF_ACCNO" = "A"."CJC_T3_000000"AND "A"."MPB_BUSINESSTYPE" = '10'AND "A"."MPB_TRANTIME" IS NOT NULLAND LENGTH("A"."MPB_TRANTIME") = 14AND "A"."MPB_TRANTIME" > '20200501000000') UNION ALL(SELECT '0' "KERNAL_INTER4"FROM "CHEN"."CJC_T1_00000000000000000" "A",CHENPMB."MB_ACCINF" "B"WHERE "B"."AIF_ACCNO" = "A"."CJC_T3_000000"AND "A"."MPB_BUSINESSTYPE" = '11'AND "A"."MPB_TRANTIME" IS NOT NULLAND LENGTH("A"."MPB_TRANTIME") = 14AND "A"."MPB_TRANTIME" > '20200501000000')
原库SQL改写部分:
UNION ALL(SELECT '0' "KERNAL_INTER4"FROM CHENPMB."EA_EACCT_INFO_S" "EA_EACCT_INFO_S"WHERE "EA_EACCT_INFO_S"."EA_ACCOUNT_OPEN_RESULT" = '1') UNION ALL(SELECT "VW_JF_SET$5EF83EA7"."ITEM_2" "KERNAL_INTER4"FROM ((SELECT "A"."CJC_T3_000000" "ITEM_1", '0' "ITEM_2"FROM "CHEN"."CJC_T1_00000000000000000" "A"WHERE "A"."MPB_TRANTIME" > '20200501000000'AND LENGTH("A"."MPB_TRANTIME") = 14AND "A"."MPB_TRANTIME" IS NOT NULLAND "A"."MPB_BUSINESSTYPE" = '10') UNION ALL(SELECT "A"."CJC_T3_000000" "ITEM_1", '0' "ITEM_2"FROM "CHEN"."CJC_T1_00000000000000000" "A"WHERE "A"."MPB_TRANTIME" > '20200501000000'AND LENGTH("A"."MPB_TRANTIME") = 14AND "A"."MPB_TRANTIME" IS NOT NULLAND "A"."MPB_BUSINESSTYPE" = '11')) "VW_JF_SET$5EF83EA7",CHENPMB."MB_ACCINF" "B"WHERE "B"."AIF_ACCNO" = "VW_JF_SET$5EF83EA7"."ITEM_1")) "MB_YINGXIAO_COUNT1"
尝试按照原库10053转换的SQL改写新库SQL后,执行速度恢复正常,但是SQL改动量较大。

根据第二处不同,可以看到:
改写前原SQL部分:
select CUSTOMERID as CSTNO,TRANTIME as TRANTIME,'10' as TRANTYPE,to_date(TRANTIME, 'yyyymmddhh24miss') as TRANTIME1,createtime,1 as cnt,0 as amt,'0' as kernal_inter1,'0' as kernal_inter2,'0' as kernal_inter3,'0' as kernal_inter4from CJC_T2_00000000000000000 cwhere c.tranname = 'XX绑定'and c.retcode = '0000'and c.TRANTIME is not nulland length(c.TRANTIME) = 14and c.TRANTIME > '20200501000000'and c.medicalcardno not in(select a.medicalcardnofrom CJC_T2_00000000000000000 awhere a.tranname = 'XX解绑'and a.retcode = '0000')
新库SQL改写部分:
SELECT '0' "KERNAL_INTER4"FROM CHENPMB."CJC_T2_00000000000000000" "A",CHENPMB."CJC_T2_00000000000000000" "C"WHERE "C"."TRANNAME" = 'XX绑定'AND "C"."RETCODE" = '0000'AND "C"."TRANTIME" IS NOT NULLAND LENGTH("C"."TRANTIME") = 14AND "C"."TRANTIME" > '20200501000000'AND "C"."MEDICALCARDNO" = "A"."MEDICALCARDNO"AND "A"."TRANNAME" = 'XX解绑'AND "A"."RETCODE" = '0000'
原库SQL改写部分:
(SELECT '0' "KERNAL_INTER4"FROM CHENPMB."CJC_T2_00000000000000000" "C"WHERE "C"."TRANNAME" = 'XX绑定'AND "C"."RETCODE" = '0000'AND "C"."TRANTIME" IS NOT NULLAND LENGTH("C"."TRANTIME") = 14AND "C"."TRANTIME" > '20200501000000'AND NOT EXISTS(SELECT 0FROM CHENPMB."CJC_T2_00000000000000000" "A"WHERE LNNVL("A"."MEDICALCARDNO" <> "C"."MEDICALCARDNO")AND "A"."TRANNAME" = 'XX解绑'AND "A"."RETCODE" = '0000'))
根据执行计划可知,
原库将not in改写成NOT EXISTS,表关联走的 HASH JOIN RIGHT ANTI NA,效率较高。尝试将新库SQL中not in改写成NOT EXISTS后,效率有明显提升,不到5分钟可以执行完成。

SQL是否还有优化空间呢?
可以看看Oracle给出的建议sql tuning advisor:
SQL>DECLAREmy_task_name VARCHAR2(30);BEGINmy_task_name := DBMS_SQLTUNE.CREATE_TUNING_TASK(sql_id => 'gzjy5nh7g0sfs',scope => 'COMPREHENSIVE',time_limit => 3600,task_name => 'cjc_sql_tuing_task',description => 'SQL TUNE TEST');DBMS_SQLTUNE.EXECUTE_TUNING_TASK(task_name => 'cjc_sql_tuing_task');END;/
SELECT status FROM DBA_ADVISOR_TASKS WHERE task_name = 'cjc_sql_tuing_task';SET LONG 10000SET LONGCHUNKSIZE 1000SET LINESIZE 300SELECT DBMS_SQLTUNE.REPORT_TUNING_TASK('cjc_sql_tuing_task') FROM DUAL;
建议如下:
1- SQL Profile Finding (see explain plans section below)
--------------------------------------------------------为此语句找到了性能更好的执行计划 2。选择以下 SQL 概要文件之一进行实施。Recommendation (estimated benefit: 95.14%)------------------------------------------- 考虑接受推荐的 SQL 概要文件。execute dbms_sqltune.accept_sql_profile(task_name =>'cjc_sql_tuing_task', task_owner => 'CHENPMB', replace => TRUE);Recommendation (estimated benefit: 99.95%)------------------------------------------- 考虑接受建议的 SQL 概要文件, 以便对此语句使用并行执行。execute dbms_sqltune.accept_sql_profile(task_name =>'cjc_sql_tuing_task', task_owner => 'CHENPMB', replace => TRUE,profile_type => DBMS_SQLTUNE.PX_PROFILE);与 DOP 128 并行执行此查询会使 SQL 概要文件计划上的响应时间缩短 99.13%。但是, 启用并行执行时要付出一些代价。它将增加语句的资源消耗(预计为 11.07%), 这会导致系统吞吐量降低。此外, 由于在非常短的持续时间内消耗了这些资源, 因此如果没有足够可用的硬件容量,并发语句的响应时间将受到负面影响。
2- Index Finding (see explain plans section below)
--------------------------------------------------通过创建一个或多个索引可以改进此语句的执行计划。Recommendation (estimated benefit: 99.7%)------------------------------------------ 考虑运行可以改进物理方案设计的访问指导或者创建推荐的索引。create index CHENPMB.IDX$$_04000001 onCHENPMB.MB_PAY_LOG("MPL_TRANCODE","MPL_STT",LENGTH("MPL_TIME"),"MPL_TIME");- 考虑运行可以改进物理方案设计的访问指导或者创建推荐的索引。create index CHENPMB.IDX$$_04000002 onCHENPMB.MB_FINANCE_TRANS_INCREMENT("MFT_TRAN_STATUS",LENGTH("MFT_TRAN_TIME"),"MFT_TRAN_CODE","MFT_TRAN_TIME");- 考虑运行可以改进物理方案设计的访问指导或者创建推荐的索引。create index CHEN.IDX$$_04000003 on CHEN.CJC_T1_00000000000000000("MPB_BUSINESSTYPE",LENGTH("MPB_TRANTIME"),"MPB_TRANTIME","CJC_T3_000000");- 考虑运行可以改进物理方案设计的访问指导或者创建推荐的索引。create index CHENPMB.IDX$$_04000004 onCHENPMB.MB_LOANBYFINANCE_RECORD("MLR_TRANCODE","MLR_LOANTYPEFLAG","MLR_STATUS",LENGTH("MLR_TRANTIME"),"MLR_TRANTIME");- 考虑运行可以改进物理方案设计的访问指导或者创建推荐的索引。create index CHENPMB.IDX$$_04000006 onCHENPMB.MB_TRANFLOW_INCREMENT("TRF_BSNCODE","TRF_STT",LENGTH("TRF_SUBTIME"),"TRF_SUBTIME","TRF_FLOWNO");Rationale---------创建推荐的索引可以显著地改进此语句的执行计划。但是, 使用典型的 SQL 工作量运行 "访问指导"可能比单个语句更可取。通过这种方法可以获得全面的索引建议案, 包括计算索引维护的开销和附加的空间消耗。

最终建议:
1 将SQL中not in部分改写成not exists。2 创建组合索引。
优化效果:
SQL由原来5小时查不出结果,优化到12秒完成查询。
###2021-03-31 22:00 chenjuchao###

文章转载自IT小Chen,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




