问题描述
今天开发的同事找我说是一个报表系统的SQL执行第一次很快,再执行就慢的几乎查询无响应,找他要来了SQL,也是常见的表关连LIFT JOIN 、GROUP BY的统计报表, 出于篇幅原因不再贴SQL, 把SQL放到TOAD 中问题依然可以再现, 在同一窗口连续查询两次,第一次瞬间返回,第二次等待了有4分多钟 ,对会话设置了10046 SQL TRACE, 下面有Trace 的结果
sys@anbob>select * from v$version; BANNER -------------------------------------------------------------------------------- Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production PL/SQL Release 11.2.0.3.0 - Production CORE 11.2.0.3.0 Production TNS for Linux: Version 11.2.0.3.0 - Production NLSRTL Version 11.2.0.3.0 - Production sys@ORCL>show parameter cursor_shar NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ cursor_sharing string EXACT SQL> select 2 plan_hash_value, 3 BUFFER_GETS,ROWS_PROCESSED,CHILD_NUMBER,ELAPSED_TIME,cpu_TIME, 4 is_bind_sensitive, 5 is_bind_aware,LAST_ACTIVE_TIME 6 from v$sql WHERE SQL_ID='6vd7zxpj6qd8h'; PLAN_HASH_VALUE BUFFER_GETS ROWS_PROCESSED CHILD_NUMBER ELAPSED_TIME CPU_TIME I I LAST_ACTIVE_TIME --------------- ----------- -------------- ------------ ------------ ---------- - - ------------------- 841592968 233739 6 0 672757 670899 N N 2013-12-05 15:28:54 3830877904 4336911 6 1 172400111 172367796 N N 2013-12-05 15:31:47
专家解答
Note:
From above output you can see the SQL parsing the two cursors and generated different execution plan, and ELAPSED_TIME, CPU_TIME also reflects the time required to perform before and after the gap between the two
sys@anbob>oradebug short_stack; ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1876<-sspuser()+112<-__sighandler()<-kxhrHash()+386<-qerhjSplitProbe()+184<-qerhjInnerProbeHashTable()+74<-qerixFetchFastFullScan()+2476<-rwsfcd()+103<-qerhjFetch()+593<-qerflFetch()+135<-rwsfcd()+103<-qeruaFetch()+574<-qervwFetch()+139<-qerjotRowProc()+353<-qertbFetchByRowID()+3236<-qerjotRowProc()+353<-qertbFetchByRowID()+3236<-qerjotRowProc()+353<-qerixtFetch()+800<-qerjotFetch()+1155<-qerjotFetch()+1961<-qerjotFetch()+1961<-qerjotFetch()+1961<-qerflFetch()+135<-qergsFetch()+775<-qervwFetch()+139<-qergsFetch()+775<-qervwFetch()+139<-qerjoFetch()+390<-rwsfcd()+103<-qerhjFetch()+3187<-qercoFetch()+205<-opifch2()+2995<-kpoal8()+2934<-opiodr()+916<-ttcpip()+2242<-opitsk()+1673<-opiino()+966<-opiodr()+916<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+252<-main()+201<-__libc_start_main()+244<-_start()+36
Note:
When I dump the call stack of slowly oracle process , the display is also doing HASH related operations
trace file of formated by tkprof utility
--note execution plan had truncated PLAN_HASH_VALUE:841592968 Rows Operation --------- --------------------- 38,045 NESTED LOOPS (cr=134876 pr=0 pw=0 time=286598 us cost=6 size=45 card=1) 38,045 TABLE ACCESS BY INDEX ROWID ICME_PROJECT_SCORE (cr=53372 pr=0 pw=0 time=160677 us cost=5 size=31 card=1) 38,045 INDEX RANGE SCAN IDX_ICME_PROJECT_SCORE_M (cr=23985 pr=0 pw=0 time=66046 us cost=4 size=0 card=1) (object id 17468) 38,045 INDEX UNIQUE SCAN PK_ICME_SUBJECT (cr=81504 pr=0 pw=0 time=93357 us cost=1 size=14 card=1) (object id 17554) call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.14 0.15 0 0 0 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 1 0.74 0.74 0 233737 0 6 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 3 0.89 0.89 0 233737 0 6 Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ SQL*Net message to client 2 0.00 0.00 SQL*Net message from client 2 0.08 0.09 ******************************************************************************** PLAN_HASH_VALUE:3830877904 Rows Operation --------- --------------------- 38,045 HASH JOIN (cr=4220506 pr=0 pw=0 time=346507995 us cost=216 size=1727775 card=38395) 38,045 TABLE ACCESS BY INDEX ROWID ICME_PROJECT_SCORE (cr=53338 pr=0 pw=0 time=262176 us cost=5 size=1179395 card=38045) 38,531 INDEX RANGE SCAN IDX_ICME_PROJECT_SCORE_IDS (cr=23496 pr=0 pw=0 time=102193 us cost=4 size=0 card=1) (object id 17470) 1,771,236,310 INDEX FAST FULL SCAN PK_ICME_SUBJECT (cr=4167168 pr=0 pw=0 time=231040457 us cost=209 size=4569180 card=326370) (object id 17554) call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.15 0.15 0 0 0 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 1 371.90 371.97 0 4336909 0 6 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 3 372.06 372.13 0 4336909 0 6 Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ SQL*Net message to client 2 0.00 0.00 SQL*Net message from client 2 0.05 0.05
Note:
从上面的trace显示sql两次的执行计划发生了改变有NL join 转为 Hash join,解决可以使用 Sql hint 的use_nl 强置指定执行计划, 可是为什么同一sql 在几乎样的环境(资源空闲)里执行
会产生两个执行计划, 猜测应该与11G 优化器的游标生成有关, 尝试禁用自适应游标共享(Adaptive Cursor Sharing), reference to disable 11gr2 problem feature list by Maclean 's and to disable the feature recommended.
And in this book there is a suggestion , the following excerpt from <>
Adaptive Cursor Sharing is enabled by default. In OnLine Transaction Processing (OLTP) databases, there is often only one optimal
execution plan, regardless of the actual value of binding variables. In this kind of database, we probably want to disable the
Adaptive Cursor Sharing feature to eliminate the related overhead. To disable it, you need to set the CURSOR_SHARING
initialization parameter to EXACT and the hidden _OPTIMIZER_EXTENDED_CURSOR_SHARING_REL initialization parameter to NONE, as follows:
ALTER SYSTEM SET CURSOR_SHARING = EXACT SCOPE = BOTH;
ALTER SYSTEM SET _OPTIMIZER_EXTENDED_CURSOR_SHARING_REL = NONE SCOPE = BOTH;
Then I try to modify a hidden parameter of ECS,
alter system set "_OPTIMIZER_EXTENDED_CURSOR_SHARING_REL" = NONE
Solve the above problem of slow second execute slowly , and next execute used the "NL join" execution plan , usually , We to use another disabled ACS only(
"_optimizer_adaptive_cursor_sharing" =FALSE ), Burleson Consulting have introduced about The relationship between ECS and ACS .
The extended cursor_sharing (ECS) working with adaptive cursor_sharing to allow real-time bind peeking to see if the bind variable value influences the execution plan for the SQL statement.
According to the Oracle optimizer group. "Bind Aware Cursor Sharing" (BASC) is a complex technology that consists of Adaptive Cursor Sharing (ACS) and Extended Cursor Sharing (ECS) technologies:
Adaptive cursor_sharing monitors queries where extended cursor_sharing is "applicable" (e.g. binds were peeked, predicate shape).
Adaptive cursor_sharing identify queries with variable data volume.
Adaptive cursor_sharing requires you to enable extended. cursor_sharing (bind_aware hint).
According to Oracle, if any of the following checks fail, extended cursor_sharing will be disabled:
other hints are used in the query
Stored outlines (plan stability) is in-use
There are more than 14 binds in the SQL statement
Parameters such as bind_peeking=false is set
The release of Oracle does not support extended cursor sharing