嵌套循环连接时，对索引块访问的处理

原创赵勇 2020-07-11

973

Filed under: Infrastructure,Nested Loop,Oracle — Jonathan Lewis @ 7:04 pm GMT Jan 21,2011
测验九（1月21日发布）：
这是一个从10.2.0.3下，启用“gather_plan_statistics”后，从v$sql_plan取出的执行计划：

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Starts | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------
|   1 |  NESTED LOOPS                |        |      1 |      0 |00:00:00.01 |     608 |
|   2 |   TABLE ACCESS BY INDEX ROWID| PARENT |      1 |    200 |00:00:00.01 |       6 |
|   3 |    INDEX RANGE SCAN          | PAR_I1 |      1 |    200 |00:00:00.01 |       2 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CHILD  |    200 |      0 |00:00:00.01 |     602 |
|   5 |    INDEX UNIQUE SCAN         | CHI_I1 |    200 |    200 |00:00:00.01 |     402 |
----------------------------------------------------------------------------------------

两张表的定义代码是相同的：

rem
rem     Script:         nl_gets.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2011
rem
rem     Last tested 
rem             11.1.0.6
rem
 
create table &m_table_name
as
select
        rownum                                  id,
        trunc(dbms_random.value(1,&m_rowcount)) n1,
        lpad(rownum,10,'0')                     small_vc,
        rpad('x',100)                           padding
from
        big_row_source
where
        rownum <= &m_rowcount --> comment to avoid wordpress format issue
;

索引PAR_I1和CHI_I1都是在ID列上的索引，并且，如你所见，CHI_I1是一个唯一索引。和表不同，索引的定义代码不一定是相同的。

请陈述理由，你认为CHI_I1索引的BLEVEL是多少？
（注：为了避免歧义，请记住，索引高度=BLEVEL+1；为了防止有人认为这有什么不同，表和索引都是在同一个表空间中创建的，这是我通常设置的1MB统一区、本地管理、8K块大小和自由列表管理的表空间）。

顺便说一句，这个测验并没有什么复杂和精密的设计。这一切都是关于观察和理解一个相当普通的情况。另一方面，我肯定会有一些有趣的变化和奇怪的副作用。如果你真的想变聪明的话，

1月23日的答案：
我没有把我的答案写进一个新的帖子里，但是为了给新来的访问者一个在看到我的答案之前，找出答案的机会，我在答案之前留下了一个很大的空白。

读了这些评论，我想所有相关的地方都被涵盖了。我们有过一些评论，包括嵌套循环连接中根块的特殊处理、“buffer is pinned count”统计、11g中可能发生的更改、我为子索引设置了一个大pctfree的猜测，以及计划中一行中的“a-rows”与另一行中的“start”之间的联系等。

做为参考，我的查询看起来是这样的：

select
        par.small_vc,
        chi.small_vc
from
        parent  par,
        child   chi
where
        par.id between 301 and 500
and     chi.id = par.n1
and     chi.n1 = 0
;

执行计划的处理机制是：
ID为3的行：对PAR_I1的索引范围扫描（最终找到200个符合要求的ROWID）
ID为2的行：对来自于ID为3这一行的每一个ROWID，检查其匹配的行。（最终返回200行）。
然后两百个调用到：
ID为5的行：CHI_I1的索引唯一扫描，总是找到单行，因而总共返回200个ROWID。
ID为4的行：检查表中匹配的行，但总是不满足，因此总共返回零行

允许您回答我的问题的关键计数，是第5行和第4行中的402和602个逻辑读。暂时忽略奇怪的数2，这些数字显然是什么的200倍，这是相当好的，因为我们知道了我们在做一些事情的200倍。当然，最简单的解决方案是，第4行得到的（602–402=）200代表对200个表块的访问，剩下的400个（加上一个奇怪的数2）要考虑。
考虑其可能性：
Blevel = 0 – 我们仅访问一个索引块（应该是根块，但看起来像叶子块），然后访问表。
Blevel = 1 – 我们访问根，下一层的叶子块，然后是表。
Blevel = 2 – 我们访问根，分支层，叶子块，然后是表。

因此，乍一看，我们可能会认为，从索引中获取400个缓冲区访问的唯一方法是让Blevel=1，访问根和叶。但这不是它的工作方式（除了部分版本中的bug）。

运行嵌套循环联接时，Oracle会“PIN”内部（第二个表）索引的根块，这意味着它将一个结构附加到缓冲区头中的“用户列表”中，该结构将会话的状态对象直接链接到缓冲区头。因此，在我的嵌套循环连接中，Oracle获取一次根块并保持其PIN，然后获得分支块和叶块200次。子索引的blevel是2。（父索引的blevel只有1–我将子索引上的pctfree设置为了90，以使其比需要的空间大得多。）

不过，这个“根块pin”并不是查询中唯一的被优化而PIN住的。具体来说，步骤如下-如果启用了event 10200（“一致读取”跟踪方法之一），您可以在跟踪文件中看到这些内容。

1、获取父索引的根块并PIN住
2、获取父索引中第一个相关叶子块并PIN住
3、获取父表中的第一个相关表块并PIN住
4、获取子索引的根块并PIN住
5、获取子索引的相关分支块（不PIN）
6、获取子索引的相关叶子块（不PIN）
7、获取子表的表块（不PIN）–行最终不匹配谓词被弃用
8、通过pin重新访问父索引叶块 (“buffer is pinned count”加1)
9、通过pin重新访问父表块 ( “buffer is pinned count”加1)
10、通过pin重新访问子索引根块 (“buffer is pinned count”加1)
11、得到子索引的一个新分支
12、得到子索引的一个新叶子
13、得到表的一个新块
14、从8开始重复执行，总共循环200次

当然，这个周期有一些偏差。我需要的来自父对象的行，分布在表中的4个连续的块中，因此上面父行（9）中大约每50行将变成“释放当前PIN的父块，获取新的父块并PIN它”。另一个我无法解释的小小的奇怪现象是，Oracle确实像第一次访问子索引时，在第二次访问子索引时“获取”了子根块，然后从那一刻起将其PIN住。所以计数是：
在父索引获取2次（根和叶）
在父表获取4次（每个表块访问一次）
在子索引获取402次（2次在根块，200次在分支块，200次在叶子块）
在子表上获取200次（访问行/块）

不可避免地，情况会发生变化——Oracle在“一致获取-检查”、“buffer is pinned count”以及11g中对缓冲块的“快速路径”访问等方面变得越来越聪明。这是Charles Hooper在11.2.0.2中记录的gets差异的一个线索，并解释了为什么我选择使用10.2.0.3和一个唯一的索引作为示例。如果您想调查其他变化，它只需要v$mystat的快照、“alter system flush buffer_cache”调用和event 10200；treedump也可以非常有助于识别访问的块。

更新（2020年4月）
Update (April 2020)
我在寻找与执行计划完全不同的东西时看到了这个贴子，所以我想我应该在19c上运行测试，看看是否出现了戏剧性的变化。除了出现了一个新的NLJ计划选项，get和pins的策略没有改变：

-------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name   | Starts | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |        |      1 |      0 |00:00:00.01 |     608 |
|   1 |  NESTED LOOPS                         |        |      1 |      0 |00:00:00.01 |     608 |
|   2 |   NESTED LOOPS                        |        |      1 |    200 |00:00:00.01 |     408 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| PARENT |      1 |    200 |00:00:00.01 |       6 |
|   4 |     INDEX RANGE SCAN                  | PAR_I1 |      1 |    200 |00:00:00.01 |       2 |
|   5 |    INDEX UNIQUE SCAN                  | CHI_I1 |    200 |    200 |00:00:00.01 |     402 |
|   6 |   TABLE ACCESS BY INDEX ROWID         | CHILD  |    200 |      0 |00:00:00.01 |     200 |
-------------------------------------------------------------------------------------------------

原文链接：https://jonathanlewis.wordpress.com/2011/01/21/quiz-night-12/
原文内容如下：
Filed under: Infrastructure,Nested Loop,Oracle — Jonathan Lewis @ 7:04 pm GMT Jan 21,2011
Here’s an execution plan pulled from v$sql_plan in 10.2.0.3 with the “gather_plan_statistics” option enabled:

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Starts | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------
|   1 |  NESTED LOOPS                |        |      1 |      0 |00:00:00.01 |     608 |
|   2 |   TABLE ACCESS BY INDEX ROWID| PARENT |      1 |    200 |00:00:00.01 |       6 |
|   3 |    INDEX RANGE SCAN          | PAR_I1 |      1 |    200 |00:00:00.01 |       2 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CHILD  |    200 |      0 |00:00:00.01 |     602 |
|   5 |    INDEX UNIQUE SCAN         | CHI_I1 |    200 |    200 |00:00:00.01 |     402 |
----------------------------------------------------------------------------------------
The code defining the two tables is the same:

rem
rem     Script:         nl_gets.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2011
rem
rem     Last tested 
rem             11.1.0.6
rem
 
create table &m_table_name
as
select
        rownum                                  id,
        trunc(dbms_random.value(1,&m_rowcount)) n1,
        lpad(rownum,10,'0')                     small_vc,
        rpad('x',100)                           padding
from
        big_row_source
where
        rownum <= &m_rowcount --> comment to avoid wordpress format issue
;

The indexes par_i1 and chi_i1 are on the id column and, as you can see for chi_i1, have been created as unique indexes. Unlike the table, though, the code defining them isn’t necessarily identical.

Please state, with justification, what you think the blevel is for index chi_i1 ?

(Note – to avoid any risk of confusion, remember that height = blevel + 1; and just in case anyone thinks it makes any difference, the tables and indexes were all created in the same tablespace which is my usual setup of 1MB uniform extents, locally managed, 8K block size and freelist management.)

The quiz isn’t intended to be cunning, devious and tricksy, by the way. It’s all about observing and understanding a fairly ordinary situation. On the other hand I’m sure there are interesting variations with strange side effects that could be introduced if you really want to get clever.

Answer 23rd Jan:
Rather than writing my answer into a new post I’ve added it below – but to give new visitors a chance to work out the answer before seeing mine I’ve left a big gap before the answer.

Reading the comments I think all the interesting points have been covered. We have comments covering special treatment of root blocks in nested loop joins, the “buffer is pinned count” statistic, possible changes in 11g, the hypothesis that I had set a large pctfree for the child index,and the link between “A-rows” in one line of the plan and “Starts” in another line.

For reference, my query looked like this:

select
        par.small_vc,
        chi.small_vc
from
        parent  par,
        child   chi
where
        par.id between 301 and 500
and     chi.id = par.n1
and     chi.n1 = 0
;

The mechanics of the plan are:

Line 3 – index range scan of par_i1 (which finds ultimately 200 matching rowids).
Line 2 – for each rowid from line 3 check the matching table row (which ultimately returns 200 rows)
Then two hundred calls to:

Line 5 – index unique scan of chi_i1, which always finds it single row – hence returning 200 rowids in total
Line 4- check the matching row in the table – which always fails – hence returning zero rows in total
The critical counts that allow you to answer my question are the 402 and 602 buffer gets in lines 5 and 4. Ignoring, temporarily, the odd 2 these numbers are clearly 200 times something – which is rather nice given that we know that we are doing something 200 times. The simplest solution, of course is that the (602 – 402 =) 200 gets due to line 4 represents the visits to the 200 table blocks, leaving us with 400 (plus an odd bit) to account for.

Now consider the possibilities:

Blevel = 0 – we would visit the only index block (which would be the root, but look like a leaf) then visit the table.
Blevel = 1 – we would visit the root, the leaf which is the next layer down, then the table
Blevel = 2 – we would visit the root, the branch level, the leaf, then the table.
So, at first sight we might decide the only way to get 400 buffer visits from the index is to have Blevel = 1, visit the root and the leaf. But that’s not the way it works (apart from a couple of versions which had a bug).

When running a nested loop join, Oracle “pins” the root block of the inner (second table) index, which means it attaches a structure to the “users’ list” in the buffer header that links the session’s state object directly to the buffer header. So, in my nested loop join, Oracle gets the root block once and keeps it pinnned, then gets branch and leaf blocks 200 times. The blevel on the child index is 2. (The blevel of the parent index is only 1 – I had set the pctfree on the child index to 90 to make it much larger than it needed to be.)

This “root block pin” isn’t the only pinning optimisation in the query, though. In detail, the steps are as follows – and this is something you would be table to see in the trace file if you enabled event 10200 (one of the ‘consistent reads’ traces).

1、Get the root block of parent index and pin it
2、Get the first relevant leaf block of the parent index and pin it
3、Get the first relevant table block of the parent table and pin it.
4、Get the root block of the child index and pin it
5、Get the relevant branch block of the child index (no pin)
6、Get the relevant leaf block of the child index (no pin)
7、Get the table block of the child table (no pin) — row does not match final predicate and is rejected
8、Revisit the parent index leaf block through the pin (add one to “buffer is pinned count”)
9、Revisit the parent table block through the pin (add one to “buffer is pinned count”)
10、Revisit the child index root block through the pin (add one to “buffer is pinned count”)
11、Get a new child branch
12、Get a new child leaf
13、Get a new table block
14、Repeat for a total of 200 cycles from step 8
There a couple of deviations from this cycle, of course. The rows I needed from parent were spread across 4 consecutive blocks in the table so roughly every 50 rows from parent line (9) above would become “release current pinned parent block, get new parent block and pin it”. The other little oddity that I can’t explain is that Oracle does “get” the child root block on the second visit to the index as well as the first visit – and then pins it from that moment onwards. So the counts are:

Gets on the parent index 2 (root and leaf)
Gets on the parent table 4 (one for each table block visited)
Gets on the child index 402 (two on the root block, 200 on branch blocks, 200 on leaf blocks)
Gets on the child table (200 for each row/block accessed)
Inevitably, things change – Oracle keeps getting smarter about things like ‘consistent gets – examination’, ‘buffer is pinned count’ and, in 11g, “fastpath” access to buffered blocks. This is a clue to the difference in gets that Charles Hooper recorded in 11.2.0.2 – and explains why I chose to use 10.2.0.3 with a unique index for my example. If you want to investigate other variations all it takes is snapshots of v$mystat, calls to “alter system flush buffer_cache”, and event 10200; the treedump can also be very helpful for identifying block addresses.

Update (April 2020)
I came across this note while searching for something completely different to do with execution plans, so I thought I’d run the test on 19c just to see if dramatic changes had appeared. Apart from the appearance of one of the newer NLJ plan options the strategy of gets and pins hadn’t changed:

-------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name   | Starts | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |        |      1 |      0 |00:00:00.01 |     608 |
|   1 |  NESTED LOOPS                         |        |      1 |      0 |00:00:00.01 |     608 |
|   2 |   NESTED LOOPS                        |        |      1 |    200 |00:00:00.01 |     408 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| PARENT |      1 |    200 |00:00:00.01 |       6 |
|   4 |     INDEX RANGE SCAN                  | PAR_I1 |      1 |    200 |00:00:00.01 |       2 |
|   5 |    INDEX UNIQUE SCAN                  | CHI_I1 |    200 |    200 |00:00:00.01 |     402 |
|   6 |   TABLE ACCESS BY INDEX ROWID         | CHILD  |    200 |      0 |00:00:00.01 |     200 |
-------------------------------------------------------------------------------------------------

nested loop join index_root

最后修改时间：2020-07-11 19:52:44

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者