暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

cursor_sharing+outline的问题

原创 Roger 2011-09-19
575

昨天同事遇到一个有些怪异的问题,使用了outlines,但是SQLID执行仍然出现执行计划错误的情况,
导致大量的read by other session等待,严重影响业务,最后通过revoke和grant来解决了。
下面的问题就是如何去找到当时的问题所在?换句话说,当时的那个sql语句为什么不走index scan而
选择低效的full table scan呢?

错误的执行计划如下:


Plan hash value: 1414349908

-----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost ({39ecd679003247f2ed728ad9c7ed019a369dd84d0731b449c26bf628d3c1a20b}CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | 100K(100)| | | |
| 1 | SORT ORDER BY | | 157K| 16M| 35M| 100K (2)| 00:20:03 | | |
| 2 | HASH UNIQUE | | 157K| 16M| 35M| 96389 (2)| 00:19:17 | | |
| 3 | HASH JOIN | | 157K| 16M| | 92578 (2)| 00:18:31 | | |
| 4 | TABLE ACCESS FULL | ACCT_ITEM_TYPE | 2817 | 73242 | | 15 (0)| 00:00:01 | | |
| 5 | HASH JOIN | | 157K| 12M| | 92561 (2)| 00:18:31 | | |
| 6 | TABLE ACCESS FULL | BILLING_CYCLE | 156 | 3900 | | 15 (0)| 00:00:01 | | |
| 7 | HASH JOIN | | 158K| 8647K| | 92544 (2)| 00:18:31 | | |
| 8 | TABLE ACCESS BY INDEX ROWID| A_PAY_OWE_STATE | 3 | 21 | | 2 (0)| 00:00:01 | | |
| 9 | INDEX RANGE SCAN | INDX_PAYMENT_METHOD_01 | 3 | | | 1 (0)| 00:00:01 | | |
| 10 | PARTITION HASH SINGLE | | 368K| 17M| | 92538 (2)| 00:18:31 | KEY | KEY |
| 11 | TABLE ACCESS FULL | ACCT_ITEM_OWE | 368K| 17M| | 92538 (2)| 00:18:31 | KEY | KEY |
-----------------------------------------------------------------------------------------------------------------------------------


下面是正确的执行计划:


Plan hash value: 2474252999

-----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost ({39ecd679003247f2ed728ad9c7ed019a369dd84d0731b449c26bf628d3c1a20b}CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 198 (100)| | | |
| 1 | SORT ORDER BY | | 52 | 5304 | 198 (3)| 00:00:03 | | |
| 2 | HASH UNIQUE | | 52 | 5304 | 197 (2)| 00:00:03 | | |
| 3 | HASH JOIN | | 52 | 5304 | 196 (2)| 00:00:03 | | |
| 4 | HASH JOIN | | 52 | 4004 | 181 (2)| 00:00:03 | | |
| 5 | TABLE ACCESS BY INDEX ROWID | A_PAY_OWE_STATE | 3 | 21 | 2 (0)| 00:00:01 | | |
| 6 | INDEX RANGE SCAN | INDX_PAYMENT_METHOD_01 | 3 | | 1 (0)| 00:00:01 | | |
| 7 | MERGE JOIN | | 121 | 8470 | 178 (1)| 00:00:03 | | |
| 8 | TABLE ACCESS BY INDEX ROWID | BILLING_CYCLE | 70 | 1470 | 4 (0)| 00:00:01 | | |
| 9 | INDEX FULL SCAN | PK_BILLING_CYCLE | 157 | | 1 (0)| 00:00:01 | | |
| 10 | SORT JOIN | | 132 | 6468 | 174 (1)| 00:00:03 | | |
| 11 | TABLE ACCESS BY GLOBAL INDEX ROWID| ACCT_ITEM_OWE | 132 | 6468 | 173 (0)| 00:00:03 | ROW L | ROW L |
| 12 | INDEX RANGE SCAN | IDX_ACCT_ITEM_ACCT_ID | 132 | | 9 (0)| 00:00:01 | | |
| 13 | TABLE ACCESS FULL | ACCT_ITEM_TYPE | 2845 | 71125 | 15 (0)| 00:00:01 | | |
-----------------------------------------------------------------------------------------------------------------------------------


我们可以发现,区别就在是否使用了index IDX_ACCT_ITEM_ACCT_ID,进而导致整个执行计划差别非常之大,
还可以看到正常的情况下使用了merge join。

处理问题的思路是这样的:


1. 首先查看outline是否正常使用

SQL> select owner,name,used,version,ENABLED from DBA_OUTLINES
2 where owner='ACCT';

OWNER NAME USED VERSION ENABLED
--------- ------------------------------ ------------------- --------
ACCT SYS_OUTLINE_11091615175537003 USED 10.2.0.4.0 ENABLED

-- outline正常。


2. 检查outline定义是否包含index hint IDX_ACCT_ITEM_ACCT_ID.

SQL> select ol_name, hint#, hint_text
2 from OUTLN.ol$hints
3 order by ol_name, hint#;

OL_NAME HINT# HINT_TEXT
------------------------------ ----- ------------------------------------------------------------------------------------------
SYS_OUTLINE_11091615175537003 1 USE_HASH(@"SEL$5DA710D3" "D"@"SEL$1")
SYS_OUTLINE_11091615175537003 2 USE_HASH(@"SEL$5DA710D3" "A_PAY_OWE_STATE"@"SEL$2")
SYS_OUTLINE_11091615175537003 3 USE_MERGE(@"SEL$5DA710D3" "A"@"SEL$1")
SYS_OUTLINE_11091615175537003 4 LEADING(@"SEL$5DA710D3" "H"@"SEL$1" "A"@"SEL$1" "A_PAY_OWE_STATE"@"SEL$2" "D"@"SEL$1")
SYS_OUTLINE_11091615175537003 5 FULL(@"SEL$5DA710D3" "D"@"SEL$1")
SYS_OUTLINE_11091615175537003 6 INDEX_RS_ASC(@"SEL$5DA710D3" "A_PAY_OWE_STATE"@"SEL$2" ("A_PAY_OWE_STATE"."PAYMENT_METHOD"))
SYS_OUTLINE_11091615175537003 7 INDEX_RS_ASC(@"SEL$5DA710D3" "A"@"SEL$1" ("ACCT_ITEM_OWE"."ACCT_ID"))
SYS_OUTLINE_11091615175537003 8 INDEX(@"SEL$5DA710D3" "H"@"SEL$1" ("BILLING_CYCLE"."BILLING_CYCLE_ID"))
SYS_OUTLINE_11091615175537003 9 OUTLINE(@"SEL$2")
SYS_OUTLINE_11091615175537003 10 OUTLINE(@"SEL$1")
SYS_OUTLINE_11091615175537003 11 UNNEST(@"SEL$2")
SYS_OUTLINE_11091615175537003 12 OUTLINE_LEAF(@"SEL$5DA710D3")
SYS_OUTLINE_11091615175537003 13 ALL_ROWS
SYS_OUTLINE_11091615175537003 14 OPTIMIZER_FEATURES_ENABLE('10.2.0.4')
SYS_OUTLINE_11091615175537003 15 IGNORE_OPTIM_EMBEDDED_HINTS

15 rows selected.

从INDEX_RS_ASC(@"SEL$5DA710D3" "A"@"SEL$1" ("ACCT_ITEM_OWE"."ACCT_ID"))看,outline是有的。


3. 检查相关的对象在故障期间之前是否做过相应的ddl操作

SQL> select OWNER,OBJECT_NAME,SUBOBJECT_NAME,LAST_DDL_TIME,STATUS
2 from dba_objects
3 where OBJECT_NAME='ACCT_ITEM_OWE';

OWNER OBJECT_NAME SUBOBJECT_NAME LAST_DDL_TIM STATUS
-------- ---------------- ---------------------- ------------ -------
COMM ACCT_ITEM_OWE 02-MAR-09 VALID
ACCT ACCT_ITEM_OWE P_ACCT_ITEM_OWE_0 28-FEB-09 VALID
ACCT ACCT_ITEM_OWE P_ACCT_ITEM_OWE_1 28-FEB-09 VALID
ACCT ACCT_ITEM_OWE P_ACCT_ITEM_OWE_2 28-FEB-09 VALID
ACCT ACCT_ITEM_OWE P_ACCT_ITEM_OWE_3 28-FEB-09 VALID
ACCT ACCT_ITEM_OWE P_ACCT_ITEM_OWE_4 28-FEB-09 VALID
ACCT ACCT_ITEM_OWE P_ACCT_ITEM_OWE_5 28-FEB-09 VALID
ACCT ACCT_ITEM_OWE P_ACCT_ITEM_OWE_6 28-FEB-09 VALID
ACCT ACCT_ITEM_OWE P_ACCT_ITEM_OWE_7 28-FEB-09 VALID
ACCT ACCT_ITEM_OWE P_ACCT_ITEM_OWE_8 28-FEB-09 VALID
ACCT ACCT_ITEM_OWE P_ACCT_ITEM_OWE_9 28-FEB-09 VALID
ACCT ACCT_ITEM_OWE 19-SEP-11 VALID

12 rows selected.

SQL> select OWNER,OBJECT_NAME,SUBOBJECT_NAME,LAST_DDL_TIME,STATUS from
2 dba_objects where OBJECT_NAME='IDX_ACCT_ITEM_ACCT_ID';

OWNER OBJECT_NAME SUBOBJECT_NAME LAST_DDL_TIM STATUS
----------- ----------------------- ----------------- ------------ -------
ACCT IDX_ACCT_ITEM_ACCT_ID 28-FEB-09 VALID

检查发现, 9月18号并无DDL操作。


4. 查询该SQLID是否存在多个child number或存在多个执行计划
昨天下午查询v$sql已经找不到那个SQLID了,通过如下查询发现:

SQL> select sql_id,CHILD_NUMBER,SQL_TYPE_MISMATCH,OPTIMIZER_MISMATCH,OUTLINE_MISMATCH,BIND_MISMATCH,STATS_ROW_MISMATCH
2 from V$SQL_SHARED_CURSOR
3 where SQL_ID='&id';

Enter value for id: 4u9y643v2k8fw
old 2: from V$SQL_SHARED_CURSOR where SQL_ID='&id'
new 2: from V$SQL_SHARED_CURSOR where SQL_ID='4u9y643v2k8fw'

SQL_ID CHILD_NUMBER S O O B S
------------- ------------ - - - - -
4u9y643v2k8fw 0 N N N N N
4u9y643v2k8fw 1 N N Y N N
4u9y643v2k8fw 2 N N N N N
4u9y643v2k8fw 3 N N Y N N
4u9y643v2k8fw 4 N N Y N N
4u9y643v2k8fw 5 N N Y N N
4u9y643v2k8fw 6 N N N N N
4u9y643v2k8fw 7 N N N N N
4u9y643v2k8fw 8 N N N N N
4u9y643v2k8fw 9 N N N N N
4u9y643v2k8fw 10 N N N N N
4u9y643v2k8fw 11 N N N N N
4u9y643v2k8fw 12 N N N N N
4u9y643v2k8fw 13 N N N N N
4u9y643v2k8fw 14 N N N N N
4u9y643v2k8fw 15 N N N N N
4u9y643v2k8fw 16 N N N N N
4u9y643v2k8fw 17 N N N N N
4u9y643v2k8fw 18 N N N N N
4u9y643v2k8fw 19 N N N N N
4u9y643v2k8fw 20 N N N N N

21 rows selected.

备注:这里的第2个O是指OUTLINE_MISMATCH。

可以发现该sqlid出现了多个child cursor,另外检查发现参数cursor_sharing设置为simlair了。
该sqlid sql version肯定也较高。

这里有一点让我不解的是,通过DBA_HIST_*相关视图查询,发现当时出问题的时候的sqlid的child number
跟V$SQL_SHARED_CURSOR并不相同,当然这也不能完全说明问题,很可能sql已经被flush out多次了。

通过查hist视图,可以发现当时故障期间出现大量的read by other session等待:

EVENT COUNT(*) SESSION_ID SESSION SQL_ID SQL_CHILD_NUMBER SQL_PLAN_HASH_VALUE FORCE_MATCHING_SIGNATURE TIME_WAITED
----------------------- --------- ---------- ------- ------------- ---------------- ------------------- ------------------------ -----------
db file scattered read 1 1382 WAITING 4u9y643v2k8fw 18 1414349908 0 4502
read by other session 1 1398 WAITING 4u9y643v2k8fw 18 1414349908 0 8099
read by other session 1 1404 WAITING 4u9y643v2k8fw 18 1414349908 0 89
db file scattered read 1 1424 WAITING 4u9y643v2k8fw 18 1414349908 0 8483
read by other session 1 1633 WAITING 4u9y643v2k8fw 18 1414349908 0 10470
read by other session 1 2081 WAITING 4u9y643v2k8fw 18 1414349908 4.3893E+18 2047
read by other session 1 2239 WAITING 4u9y643v2k8fw 18 1414349908 0 5403
read by other session 1 685 WAITING 4u9y643v2k8fw 18 1414349908 0 11577
read by other session 1 1094 WAITING 4u9y643v2k8fw 18 1414349908 0 6340
read by other session 1 1168 WAITING 4u9y643v2k8fw 18 1414349908 4.3893E+18 1427
read by other session 1 1304 WAITING 4u9y643v2k8fw 18 1414349908 0 12656


5. 查看sqlid是否使用了outline

SQL> select SQL_ID,CHILD_NUMBER
2 from v$sql_plan
3 where sql_id='4u9y643v2k8fw';

SQL_ID CHILD_NUMBER
------------- ------------
4u9y643v2k8fw 12
4u9y643v2k8fw 12
.....
4u9y643v2k8fw 12
4u9y643v2k8fw 2
4u9y643v2k8fw 2
4u9y643v2k8fw 2
......
4u9y643v2k8fw 2
4u9y643v2k8fw 2

28 rows selected.

下午查询时只有number为2 12

SQL> SELECT info outline_name
2 FROM (SELECT ROWNUM r, EXTRACTVALUE (VALUE (d), '/info') info
3 FROM v$sql_plan p,
4 TABLE (XMLSEQUENCE (EXTRACT (XMLTYPE (p.other_xml),
5 '/other_xml/info'
6 )
7 )
8 ) d
9 WHERE other_xml IS NOT NULL
10 AND ID = 1
11 AND sql_id = '4u9y643v2k8fw'
12 AND child_number = 3)
13 WHERE r = 4;

no rows selected

SQL> SELECT info outline_name
2 FROM (SELECT ROWNUM r, EXTRACTVALUE (VALUE (d), '/info') info
3 FROM v$sql_plan p,
4 TABLE (XMLSEQUENCE (EXTRACT (XMLTYPE (p.other_xml),
5 '/other_xml/info'
6 )
7 )
8 ) d
9 WHERE other_xml IS NOT NULL
10 AND ID = 1
11 AND sql_id = '4u9y643v2k8fw'
12 AND child_number = 4)
13 WHERE r = 4;

no rows selected

SQL> SELECT info outline_name
2 FROM (SELECT ROWNUM r, EXTRACTVALUE (VALUE (d), '/info') info
3 FROM v$sql_plan p,
4 TABLE (XMLSEQUENCE (EXTRACT (XMLTYPE (p.other_xml),
5 '/other_xml/info'
6 )
7 )
8 ) d
9 WHERE other_xml IS NOT NULL
10 AND ID = 1
11 AND sql_id = '4u9y643v2k8fw'
12 AND child_number = 5)
13 WHERE r = 4;

no rows selected

SQL> SELECT info outline_name
2 FROM (SELECT ROWNUM r, EXTRACTVALUE (VALUE (d), '/info') info
3 FROM v$sql_plan p,
4 TABLE (XMLSEQUENCE (EXTRACT (XMLTYPE (p.other_xml),
5 '/other_xml/info'
6 )
7 )
8 ) d
9 WHERE other_xml IS NOT NULL
10 AND ID = 1
11 AND sql_id = '4u9y643v2k8fw'
12 AND child_number = 2)
13 WHERE r = 4;

OUTLINE_NAME
--------------------------------------------------------------------------------
"SYS_OUTLINE_11091615175537003"

SQL> SELECT info outline_name
2 FROM (SELECT ROWNUM r, EXTRACTVALUE (VALUE (d), '/info') info
3 FROM v$sql_plan p,
4 TABLE (XMLSEQUENCE (EXTRACT (XMLTYPE (p.other_xml),
5 '/other_xml/info'
6 )
7 )
8 ) d
9 WHERE other_xml IS NOT NULL
10 AND ID = 1
11 AND sql_id = '4u9y643v2k8fw'
12 AND child_number = 12)
13 WHERE r = 4;

OUTLINE_NAME
--------------------------------------------------------------------------------
"SYS_OUTLINE_11091615175537003"

可以看到,该sql的当前的2个child cursor都是使用了outline的。
但是这说明不了问题,既然该sql存在多个child cursor,那么必然存在多个plan hash value值。
这里我们需要去查询那个出问题的sqlid对应的plan_hash_value所对应的执行计划。
虽然说是有awr快照,但是通过这个方式已经查不到信息了。

select * from table(dbms_xplan.display_awr('&sql_id','&child_number',null,'advanced'));

我这里使用的如下的查询方式:

SQL> set long 1000000
SQL> set linesize 200

SQL> SELECT plan_hash_value, XMLTYPE (other_xml).EXTRACT ('/*')
2 FROM dba_hist_sql_plan
3 WHERE sql_id = '4u9y643v2k8fw'
4 AND plan_hash_value IN (1414349908)
5 AND DBMS_LOB.SUBSTR (other_xml, 1000) LIKE '{39ecd679003247f2ed728ad9c7ed019a369dd84d0731b449c26bf628d3c1a20b} 6 /

PLAN_HASH_VALUE
---------------
XMLTYPE(OTHER_XML).EXTRACT('/*')
--------------------------------------------------------------------------------------------------------------------------------
1414349908

10.2.0.4

1414349908

c10c
c103
313042

PLAN_HASH_VALUE
---------------
XMLTYPE(OTHER_XML).EXTRACT('/*')
--------------------------------------------------------------------------------------------------------------------------------
313044
c6481504081c43








PLAN_HASH_VALUE
---------------
XMLTYPE(OTHER_XML).EXTRACT('/*')
--------------------------------------------------------------------------------------------------------------------------------










PLAN_HASH_VALUE
---------------
XMLTYPE(OTHER_XML).EXTRACT('/*')
--------------------------------------------------------------------------------------------------------------------------------








从上面的查询来看,当时是使用了outline,但是问题出在这里:








也就是说正常情况下,这3条当中至少有1条是ACCT_ID的index rang scan才对。

今天上午,和同事沟通,说又出问题了,同事将参数cursor_sharing调为exact了,即默认值。
虽然这样会导致很多sql语句的硬解析,但是听他说上午调整以后到11点时,还没出现问题。
跟itpub版主棉花糖ONE交流了一下,他认为很可能是outline和cursor_sharing的bug。

但是我仍然觉得cursor_sharing可能不是真正的原因,因为ACCT_ID列是应用绑定变量列,
并不是通过参数cursor_sharing的设置来实现绑定变量的。

最后总结一下:

9i以后都不推荐再用outline了,感觉不太稳定,推荐使用sql profile来固定执行计划。
至于最后的结论到底是outline的bug还是cursor_sharing=similar的bug,现在还确定不了,只是可能性非常大。

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论