JL Computer Consultancy
Bug hunting with 10053
|
Jan 2007 |
If Oracle seems to be ignoring a hint,
it’s either because you’ve made a mistake in the syntax, or in the
use of the hint, or because you’ve hit an Oracle bug. And sometimes the only
way to work out what’s going on is to look at the 10053 trace
file.
In a recent posting on the comp.databases.oracle.server
newsgroup, someone came up with an example of Oracle 10gR2 behaving a little
strangely with the index_ss() hint, which should make Oracle do a
skip scan on a specified index – if it’s legal. Their
code – slightly stripped for brevity and anonymity – looked like
this (prsn_id is the second column of the primary key).
Select
{list of columns}
from (
select
--+ index_ss(tf tf_pk)
{list
of columns}
from
tf,
tt,
ot
where
tf.trtp_id
= tt.trtp_id
and ot.transaction_type_dscr =
tt.transaction_type_dscr
)
where
tf.prsn_id = 5485186
;
The execution plans, taken from
the 10053 trace file, were as follows:
====================
Plan Table (Hinted)
====================
-----------------------------------------------------+-----------------------------------+---------------+
| Id
| Operation
| Name | Rows | Bytes | Cost | Time
| Pstart| Pstop |
---------------------------------------------------------------------+-------------------+---------------+
| 0
| SELECT STATEMENT
| | |
| 2322K | | |
|
| 1
| TABLE ACCESS BY GLOBAL INDEX
ROWID | TF | 1 |
69 | 46K | 00:11:53 | ROW L | ROW L |
| 2
| NESTED LOOPS | |
57 | 8037 | 2322K | 09:15:44 | |
|
| 3
| MERGE JOIN | |
51 | 3672 | 7 |
00:00:01 | | |
| 4
| TABLE ACCESS BY INDEX
ROWID | OT |
51 | 2346 | 2 |
00:00:01 | | |
| 5
| INDEX FULL SCAN | OT_O1 | 51
| | 1 |
00:00:01 | | |
| 6
| SORT JOIN | |
51 | 1326 | 5 |
00:00:01 | | |
| 7
| TABLE ACCESS FULL | TT |
51 | 1326 | 4 |
00:00:01 | | |
| 8
| INDEX FULL SCAN | TF_PK | 6
| | 46K |
00:11:53 | | |
-----------------------------------------------------+-----------------------------------+---------------+
======================
Plan Table (Unhinted)
======================
-----------------------------------------------+-----------------------------------+---------------+
| Id
| Operation |
Name | Rows | Bytes | Cost | Time
| Pstart| Pstop |
-----------------------------------------------+-----------------------------------+---------------+
| 0
| SELECT STATEMENT
| | |
| 28K | | |
|
| 1
| HASH JOIN | |
57 | 8037 | 28K |
00:07:48 | | |
| 2
| MERGE JOIN | |
51 | 3672 | 7 |
00:00:01 | | |
| 3
| TABLE ACCESS BY INDEX ROWID |
OT |
51 | 2346 | 2 |
00:00:01 | | |
| 4
| INDEX FULL SCAN | OT_01 | 51 |
| 1 | 00:00:01 | |
|
| 5
| SORT JOIN | |
51 | 1326 | 5 |
00:00:01 | | |
| 6
| TABLE ACCESS FULL | TT |
51 | 1326 | 4 |
00:00:01 | | |
| 7
|
| 8
| TABLE ACCESS FULL | TF |
57 | 3933 | 28K |
00:07:48 | 1 | 37 |
-----------------------------------------------+-----------------------------------+---------------+
Clearly, the presence of the hint
has affected the execution plan; but you see in line eight of the hinted plan
that the specified index has been used for a full scan, not for a
skip scan. So has Oracle not ignored the hint, but it
hasn’t done the right thing. To me, this clearly says: “bug”
– but how do you confirm that? You’ve got to look at the 10053
trace file.
To demonstrate my point,
I’ve picked out the “single table access”
sections to show that Oracle is obeying the hint; then I’ve
picked up the parts of a “Now Joining” section to
show that Oracle is (still) obeying the hint; then I’ve picked out an
example of a “Now Joining” section that shows Oracle
going wrong. Finally I’ve extracted all the join “Join
Order” lines to show you when Oracle did the right thing and when
it did the wrong thing.
Single table access paths:
From the unhinted
trace file – note that the presence of the calculations for tablescan,
index skip scan, and index full scan, with the skip
scan rejected and the tablescan accepted.
SINGLE TABLE ACCESS PATH
Column (#2): PRSN_ID(NUMBER)
AvgLen: 5.00 NDV: 229384 Nulls: 0 Density: 4.3595e-06 Min: -1 Max:
5766028
Table: SDWH_TCKT_TRNSCTN_FCT
Alias: TF
Card: Original: 12965145 Rounded:
57 Computed: 56.52 Non Adjusted: 56.52
Access Path: TableScan
Cost: 29085.71 Resp: 29085.71 Degree: 0
Cost_io: 28450.00 Cost_cpu:
5147272783
Resp_io: 28450.00 Resp_cpu:
5147272783
kkofmx: index
filter:"TF"."PRSN_ID"=5485186 AND
"TF"."TRTP_ID"="TT"."TRTP_ID" AND
"OT"."TRANSACTION_TYPE_DSCR"="TT"."TRANSACTION_TYPE_DSCR"
Access Path: index (skip-scan)
SS sel: 4.3595e-06 ANDV (#skips):
4388009
SS io: 4388009.00 vs. table scan
io: 28450.00
Skip Scan rejected
Access Path: index (FullScan)
Index: SDWH_TCTR_FCT_PK
resc_io: 46226.00 resc_cpu:
2935717121
ix_sel: 1 ix_sel_with_filters:
4.3595e-06
Cost: 46668.64 Resp: 46668.64 Degree: 1
Best:: AccessPath: TableScan
Cost: 29085.71 Degree: 1
Resp: 29085.71 Card: 56.52 Bytes: 0
Now compare this with the section
from the hinted trace file. Note that the only access path
considered is the skip scan, the hint is obeyed. It
strikes me as a little odd that the calculations are done twice, and
don’t quite match – this may (I guess) be part of the bugginess
that we are about to expose.
SINGLE TABLE ACCESS PATH
Column (#2): PRSN_ID(NUMBER)
AvgLen: 5.00 NDV: 229384 Nulls: 0 Density: 4.3595e-06 Min: -1 Max:
5766028
Table: TF Alias: TF
Card: Original: 12965145 Rounded:
57 Computed: 56.52 Non Adjusted: 56.52
kkofmx: index
filter:"TF"."PRSN_ID"=5485186 AND
"TF"."TRTP_ID"="TT"."TRTP_ID" AND
"OT"."TRANSACTION_TYPE_DSCR"="TT"."TRANSACTION_TYPE_DSCR"
Access Path: index (skip-scan)
SS sel: 4.3595e-06 ANDV (#skips):
4388009
SS io: 4388009.00 vs. table scan io: 28450.00
Skip Scan chosen
Access Path: index (SkipScan)
Index: SDWH_TCTR_FCT_PK
resc_io: 4388065.00 resc_cpu:
42752426477
ix_sel: 4.3595e-06
ix_sel_with_filters: 4.3595e-06
Cost: 4393345.11 Resp:
4393345.11 Degree: 1
Best:: AccessPath: IndexRange Index: SDWH_TCTR_FCT_PK
Cost: 4393345.11 Degree: 1
Resp: 4393345.11 Card: 56.52 Bytes: 0
So now we look at a “Now
Joining” section when Oracle did the right thing – this is from
the join order {OT, TF, TT}, which happens to be the fourth join
order examined. We start with the unhinted example and, as you can see, Oracle
evaluates the costs of using a tablescan, or index full
scan, to access TF as the inner table. (Strangely, no sign of a skip
scan calculation - could this possibly
mean that skip scans are actually not allowed for joins ?).
***********************
Join order[4]: OT[OT]#1
TF[TF]#2 TT[TT]#0
***************
Now joining: TF [TF]#2
***************
NL Join
Outer table: Card: 51.00 Cost:
4.01 Resp: 4.01 Degree: 1
Bytes: 46
Inner table: TF Alias: TF
Access Path: TableScan
NL Join: Cost: 1483288.21 Resp: 1483288.21 Degree: 0
Cost_io: 1450867.00 Cost_cpu:
262510981535
Resp_io: 1450867.00 Resp_cpu:
262510981535
kkofmx: index
filter:"TF"."PRSN_ID"=5485186 AND
"TF"."TRTP_ID"="TT"."TRTP_ID" AND
"OT"."TRANSACTION_TYPE_DSCR"="TT"."TRANSACTION_TYPE_DSCR"
Access Path: index (FullScan)
Index: TF_PK
resc_io: 46226.00 resc_cpu:
2935717121
ix_sel: 1 ix_sel_with_filters:
4.3595e-06
NL Join: Cost: 2380104.44 Resp:
2380104.44 Degree: 1
Cost_io: 2357530.00 Cost_cpu:
182782762539
Resp_io: 2357530.00 Resp_cpu:
182782762539
Best NL cost: 1483288.21
resc: 1483288.21 resc_io: 1450867.00
resc_cpu: 262510981535
resp: 1483288.21 resp_io: 1450867.00
resp_cpu: 262510981535
Join Card: 2882.60 = outer (51.00) * inner (56.52) * sel
(1)
Join Card - Rounded: 2883 Computed:
2882.60
Join order aborted: cost > best
plan cost
When we switch to the hinted
trace, this is what we see in the same part of the trace file. Oracle evaluates
only the option for a skip scan (twice, again). So it has obeyed the hint
– it did a calculation that hasn’t appeared by default, and it
didn’t do the calculations that it would have done by default.
This is an important point about
hints, by the way: some of them mean “do
calculation X”, but that isn’t the same as an
order to “perform operation X”. For those hints, the
optimizer has to do the calculation that it wouldn’t otherwise do, but it
can still say “that’s not the cheapest option”.
***********************
Join order[4]: OT[OT]#1
TF[TF]#2 TT[TT]#0
***************
Now joining: TF[TF]#2
***************
NL Join
Outer table: Card: 51.00 Cost:
4.01 Resp: 4.01 Degree: 1
Bytes: 46
kkofmx: index
filter:"TF"."PRSN_ID"=5485186 AND
"TF"."TRTP_ID"="TT"."TRTP_ID" AND
"OT"."TRANSACTION_TYPE_DSCR"="TT"."TRANSACTION_TYPE_DSCR"
Access Path: index (skip-scan)
SS sel: 4.3595e-06 ANDV (#skips):
4388009
SS io: 4388009.00 vs. table scan io: 28450.00
Skip Scan chosen
Access Path: index (SkipScan)
Index: SDWH_TCTR_FCT_PK
resc_io: 4388065.00 resc_cpu:
42752426477
ix_sel: 4.3595e-06
ix_sel_with_filters: 4.3595e-06
NL Join: Cost: 224060604.37 Resp:
224060604.37 Degree: 1
Cost_io: 223791319.00 Cost_cpu:
2180373819962
Resp_io: 223791319.00 Resp_cpu:
2180373819962
Best NL cost: 224060604.37
resc: 224060604.37 resc_io:
223791319.00 resc_cpu: 2180373819962
resp: 224060604.37 resp_io:
223791319.00 resp_cpu: 2180373819962
Join Card: 2882.60 = outer (51.00) * inner (56.52) * sel
(1)
Join Card - Rounded: 2883 Computed:
2882.60
Join order aborted: cost > best
plan cost
Now, let’s look at a piece
of the trace file where something goes wrong. This is the first join order:
{TT, OT, TF}. First the unhinted version, and you’ll notice that the
optimizer tries four options, the two we saw in the earlier example plus a full
scan on a different index (which is possible in this case because
columns from both TT and OT are available at this
point in this join order) and an option for using bitmap conversion
(again because of the availability of columns from both TT and OT):
***************
Now joining:
SDWH_TCKT_TRNSCTN_FCT[TF]#2
***************
NL Join
Outer table: Card: 51.00 Cost:
8.52 Resp: 8.52 Degree: 1
Bytes: 72
Inner table: TF Alias: TF
Access Path: TableScan
NL Join: Cost: 1483292.72 Resp: 1483292.72 Degree: 0
Cost_io: 1450871.00 Cost_cpu:
262515101134
Resp_io: 1450871.00 Resp_cpu:
262515101134
kkofmx: index
filter:"TF"."TRTP_ID"="TT"."TRTP_ID"
AND
"OT"."TRANSACTION_TYPE_DSCR"="TT"."TRANSACTION_TYPE_DSCR"
kkofmx: index
filter:"TF"."PRSN_ID"=5485186 AND
"TF"."TRTP_ID"="TT"."TRTP_ID" AND
"OT"."TRANSACTION_TYPE_DSCR"="TT"."TRANSACTION_TYPE_DSCR"
kkofmx: index
filter:"TF"."TRTP_ID"="TT"."TRTP_ID"
AND
"OT"."TRANSACTION_TYPE_DSCR"="TT"."TRANSACTION_TYPE_DSCR"
Access Path: index (FullScan)
Index: TF_I2
resc_io: 134431.00 resc_cpu:
4595772062
ix_sel: 1 ix_sel_with_filters:
0.11111
NL Join: Cost: 2100968.38 Resp:
2100968.38 Degree: 1
Cost_io: 2081471.00 Cost_cpu:
157868142107
Resp_io: 2081471.00 Resp_cpu:
157868142107
Access Path: index (FullScan)
Index: TF_PK
resc_io: 46178.00 resc_cpu:
2935227863
ix_sel: 1 ix_sel_with_filters:
4.8439e-07
NL Join: Cost: 2377405.55 Resp:
2377405.55 Degree: 1
Cost_io: 2354834.00 Cost_cpu:
182759342462
Resp_io: 2354834.00 Resp_cpu:
182759342462
****** trying bitmap/domain indexes ******
Best NL cost: 1483292.72
resc: 1483292.72 resc_io: 1450871.00
resc_cpu: 262515101134
resp: 1483292.72 resp_io: 1450871.00
resp_cpu: 262515101134
Join Card: 56.52 = outer (51.00) * inner (56.52) * sel
(0.019608)
Join Card - Rounded: 57 Computed:
56.52
Compare this with what happens in
the hinted version. Oracle evaluates just one option – ignoring the bitmap
options, the tablescan, and the index full scan on
the “unspecified” index. All it tries to use is a mechanism
involving the index specified in the hint. But it’s the wrong method. I
think that this means there is a bug – somewhere a pointer has slipped or
a branch has been taken incorrectly, and we’ve followed the wrong code
path whilst “trying” to follow the right one.
***************
Now joining: TF [TF]#2
***************
NL Join
Outer table: Card: 51.00 Cost:
8.52 Resp: 8.52 Degree: 1
Bytes: 72
kkofmx: index
filter:"TF"."PRSN_ID"=5485186 AND
"TF"."TRTP_ID"="TT"."TRTP_ID" AND
"OT"."TRANSACTION_TYPE_DSCR"="TT"."TRANSACTION_TYPE_DSCR"
kkofmx: index filter:"TF"."TRTP_ID"="TT"."TRTP_ID"
AND
"OT"."TRANSACTION_TYPE_DSCR"="TT"."TRANSACTION_TYPE_DSCR"
Access Path: index (FullScan)
Index: TF_PK
resc_io: 46178.00 resc_cpu:
2935227863
ix_sel: 1 ix_sel_with_filters:
4.8439e-07
NL Join: Cost: 2377405.55 Resp:
2377405.55 Degree: 1
Cost_io: 2354834.00 Cost_cpu:
182759342462
Resp_io: 2354834.00 Resp_cpu:
182759342462
Best NL cost: 2377405.55
resc: 2377405.55 resc_io: 2354834.00
resc_cpu: 182759342462
resp: 2377405.55 resp_io: 2354834.00
resp_cpu: 182759342462
Join Card: 56.52 = outer (51.00) * inner (56.52) * sel
(0.019608)
Join Card - Rounded: 57 Computed:
56.52
Having got this far, we have to
ask why we saw this happening on one join order, but the correct actions on the
other join orders. All over the place in the trace file, the optimizer followed
the hint correctly – it’s just one special set of circumstance
where it goes wrong. Here’s a summary of the join orders, showing what
happened on each nested loop join order:
Join order[1]: [TT]#0
[OT]#1 [TF]#2 Full Scan
Join order[2]: [TT]#0
[TF]#2 [OT]#1 Full Scan
Join order[3]: [OT]#1
[TT]#0 [TF]#2 Full Scan
Join order[4]: [OT]#1
[TF]#2 [TT]#0 Skip Scan
Join order[5]: [TF]#2
[TT]#0 [OT]#1 N/A (driving table)
but Skip Scan
Join order[6]: [TF]#2
[OT]#1 [TT]#0 N/A (driving table)
but Skip Scan
There is a pattern here, though
not immediately obvious. If TT appears in the join order before TF,
you get the full scan in the nested loop
calculation (I didn’t show you any of the merge or hash
join calculations, but they always used the skip scan).
Look again at the trace file contents for the nested loop calculations and note
the kkofmx lines:
kkofmx: index
filter:"TF"."PRSN_ID"=5485186 AND
"TF"."TRTP_ID"="TT"."TRTP_ID" AND
"OT"."TRANSACTION_TYPE_DSCR"="TT"."TRANSACTION_TYPE_DSCR"
kkofmx: index
filter:"TF"."TRTP_ID"="TT"."TRTP_ID"
AND
"OT"."TRANSACTION_TYPE_DSCR"="TT"."TRANSACTION_TYPE_DSCR"
When we have a join order where TT appears before TF, the optimizer can “do something” with the predicate "TF"."TRTP_ID"="TT"."TRTP_ID", and for some reason, this results in it calculating the full scan strategy rather than the skip scan strategy. Now, it just happens that trtp_id is another column (the eighth out of nine) in the index referenced in the hint – and maybe this is what is confusing the optimizer at this point.
I advised the original poster to raise an SR with Oracle about this, and also suggested that (just out of curiosity) he might like to change the suspect predicate to tf.trtp_id+0 =tt.trtp_id so that the column could not appear as an access predicate for the index, only as a filter predicate. With this change in place the optimizer re-discovered the skip scan – with an important limitation. Oracle used the skip scan to visit table TF once, and then used a hash join to join the rows to the previous intermediate result set, with the following execution plan:
===========================================
Plan Table (Hinted, and using trtp_id + 0)
===========================================
-----------------------------------------------------+-----------------------------------+---------------+
| Id
| Operation
| Name | Rows | Bytes | Cost | Time
| Pstart| Pstop |
-----------------------------------------------------+-----------------------------------+---------------+
| 0
| SELECT STATEMENT
| | |
| 4286K | | |
|
| 1
| HASH JOIN | |
57 | 7980 | 4286K | 17:04:05 | |
|
| 2
| MERGE JOIN | |
51 | 3621 | 7 |
00:00:01 | | |
| 3
| TABLE ACCESS BY INDEX
ROWID | OT |
51 | 2295 | 2 |
00:00:01 | | |
| 4
| INDEX FULL SCAN | OT_01 | 51 |
| 1 | 00:00:01 | |
|
| 5
| SORT JOIN | |
51 | 1326 | 5 |
00:00:01 | | |
| 6
| TABLE ACCESS FULL | TT |
51 | 1326 | 4 |
00:00:01 | | |
| 7
| TABLE ACCESS BY GLOBAL INDEX
ROWID | TF | 57 |
3933 | 4286K | 17:04:05 | ROW L |
ROW L |
| 8
| INDEX SKIP SCAN | TF_PK | 57 |
| 4286K | 17:04:04 | |
|
-----------------------------------------------------+-----------------------------------+---------------+
Conclusion:
I can’t claim to have proved
that there is a bug here, but I think the evidence is fairly convincing that
Oracle is trying to obey the hint but something has gone wrong at a very
specific point in the optimizer code path. Of course, it is possible that a skip
scan (for reasons we don’t know) is not considered to be a legal
access path for the second table in a nested loop join – in
which case we have a documentation bug.
I still believe that Oracle
doesn’t ignore hints.




