treedump

原创 Jonathan lewis 2020-05-02

1155

如果你认为索引内部可能发生了一些奇怪的事情，那么这里有一种机制来详细检查索引。这是我第一次在实际的Oracle 8i中描述的一个特性，尽管书中没有提到它的问题。但我后来在使用一个大索引前，总是小心的先在一个小索引上测试它。下面的演示是从10.2.0.3上的SQL*Plus会话中剪切得到的：

SQL> select object_id
  2  from user_objects
  3  where object_name = 'ORD_PLACED'
  4  ;
 OBJECT_ID
----------
    100466
1 row selected.
SQL> alter session set events 'immediate trace name treedump level 100466';
Session altered.

treedump转储一个包含了在level后指定了其对象id的索引（或索引分区）摘要的跟踪文件。一般情况下，在跟踪文件中，索引中的每个块都会占用一行，但在Oracle的某些版本中，每个叶块都会有一个完整的符号转储，因此，在对大型索引进行尝试之前，需要先在一个小索引做检查。
DUMP结果看起来类似如下（这是索引sys.i_source1的开始部分）：

----- begin tree dump
branch: 0x40035a 4195162 (0: nrow: 3, level: 2)
   branch: 0x40e2c0 4252352 (-1: nrow: 312, level: 1)
      leaf: 0x40035b 4195163 (-1: nrow: 240 rrow: 240)
      leaf: 0x40215f 4202847 (0: nrow: 218 rrow: 218)
      leaf: 0x40035c 4195164 (1: nrow: 444 rrow: 444)
      leaf: 0x40035d 4195165 (2: nrow: 444 rrow: 444)
       ... etc ...

为了生成转储，Oracle按一次一个块的顺序遍历树结构。从每一行开始的十六进制和十进制数字是用两种不同方式表示的块地址。
转储的第一行显示根块。在本例中，索引的blevel=2（这意味着高度为3），因此沿着索引向下走时，将访问根、一个级别1的分支，然后访问命中的叶块。根块当前有三个指针（nrow=3）指向级别1的分支块。

注1：根块在格式上通常与分支块没有区别，但是当索引只包含根块时，根块被格式化为叶块而不是分支块。

注2：无论发生了什么样的块分裂，根块始终保持在同一位置—— 段头块和freelist group块之后的块。

缩进一步（三个字符）的第2行显示下一级别的第一个分支块-在本例中是级别1分支块的第一个。在括号中，您可以看到这是索引条目-1，有312个指向下一级的块的指针，并且是level 1分支块。级别1的主要意味着其条目将指向叶块。

Oracle从“-1”分支块开始可能有点奇怪：计算机程序通常从0或1开始计数，但是Oracle有理由从分支块的-1开始计数。分支块中的大多数条目都是成对的格式（部分的键值、块地址），这些条目列在块的行目录中，但有一个特殊条目（称为“leftmost child-kdxbrlmc”）省略了部分的键值，并存储在行目录外的单独位置。Oracle省略了键值以节省一点空间，并从块的父级（在上面的分支级别）推断出值。由于行目录从第0行开始计数，最左边的子项必须标记为“-1”。
我没有展示全，但从跟踪文件向下312行，你会看到分支块0，而且另外两个分支块的间隔距离近似。

跟踪文件的第三行是第一个叶块（我们也是从-1开始计数）。注意，叶块中没有“level”。相反，您将得到两个行条目：块中的当前行数（rrow）和块的行目录（请参见词汇表）的当前大小（nrow）。行目录可能保存了对已删除、并提交的，但尚未从结构中清除的行的引用，这就是转储显示两条信息的原因。

示例中显示的索引部分是完全干净整洁的（每个块中的row=nrow），但是在索引开始时发生了一些奇怪的事情，因为我们可以看到有证据表明，由于索引初始填充后的插入，50/50的叶块分裂在底层发生–请注意叶块地址几乎是按顺序排列的（035b，215f，035c，035d）但是在第一个块和第三个块之间出现了完全不同的块号，以便在第一个块分割时保存溢出的内容（译者注：即原在第一个块中，被分裂出来的索引条目）。
为了便于比较，下面是在我的“索引激增”代码运行期间生成的索引DUMP部分：

branch: 0x140118a 20976010 (0: nrow: 551, level: 1)
   leaf: 0x140118b 20976011 (-1: nrow: 409 rrow: 409)
   leaf: 0x140118c 20976012 (0: nrow: 214 rrow: 214)
   leaf: 0x140118d 20976013 (1: nrow: 214 rrow: 214)
   leaf: 0x140118e 20976014 (2: nrow: 213 rrow: 213)
   leaf: 0x1401190 20976016 (3: nrow: 206 rrow: 206)
   leaf: 0x140118f 20976015 (4: nrow: 327 rrow: 327)
   leaf: 0x1401191 20976017 (5: nrow: 169 rrow: 169)
   leaf: 0x1401192 20976018 (6: nrow: 169 rrow: 169)
   leaf: 0x1401194 20976020 (7: nrow: 159 rrow: 159)
   leaf: 0x1401193 20976019 (8: nrow: 169 rrow: 169)
   leaf: 0x1401196 20976022 (9: nrow: 160 rrow: 160)
   leaf: 0x1401195 20976021 (10: nrow: 327 rrow: 327)
    ...
   leaf: 0x140152d 20976941 (544: nrow: 141 rrow: 141)
   leaf: 0x140152c 20976940 (545: nrow: 146 rrow: 146)
   leaf: 0x140152f 20976943 (546: nrow: 140 rrow: 140)
   leaf: 0x140152e 20976942 (547: nrow: 282 rrow: 282)
   leaf: 0x1401530 20976944 (548: nrow: 282 rrow: 282)
   leaf: 0x1401531 20976945 (549: nrow: 138 rrow: 138)

（如上所示）我们有一个BLEVEL=1的索引，并且有551个叶块（从-1到549）。

第一个叶块是被100%填满后经历了“90/10"分裂。

接下来的4个叶子块显示了50/50分裂的明显证据（每个块包含第一个块中最大409行的大约一半）。接下来的一个叶子块表明，该块遭受了一点困难，它的ITL急剧增长——这个块是100%的“满”（我通过块转储确定了这一事实），但它只能容纳327个条目，因为它的ITL已经增长到大约70个条目。

ITL的大小在接下来的几个块上并没有发生变化，但我们可以看到一些50/50的分裂，然后是90/10的分裂，如此往复。当我们到达列表的末尾时，我们又遇到了一个小问题，一个“full”块现在只能容纳282行，而显示50/50拆分的块大约可以容纳140行。在这次测试中，ITL增长到了112个条目的最坏情况——这不是一场彻底的灾难，但无疑是一个坏消息。

treedump还不够详细，无法告诉你索引中发生的所有事情，但它相对较短（每个块一行），并为您提供了一些有关正在发生的事情的良好线索。如果您想进一步研究，块地址的十进制形式的存在使得转储单个块变得容易，例如：

SQL> select
  2     dbms_utility.data_block_address_file(20976011) file#,
  3     dbms_utility.data_block_address_block(20976011) block#
  4  from
  5     dual
  6  /
 
     FILE#     BLOCK#
---------- ----------
         5       4491
 
1 row selected.
 
SQL> alter system dump datafile 5 block 4491;
 
System altered.

注：
在Oracle的某些版本中，每个叶块都有一个完整的块转储，这使得执行treedump非常昂贵。
在Oracle 9.2.0.6之前的版本中，存在一个BUG，这意味着如果尝试转储自动创建的支持唯一或主键约束的索引，则会话将崩溃，并出现“numeric overflow”错误。这就使得很难转储索引组织表（iot）。这是因为在ind$.property中设置了12位（4096），代码不需要这么大的数字。（从技术上讲，可以直接更新property列；它可以工作，但我不会在生产系统上执行此操作–但在你要在放弃的备份上可以尝试执行此操作。）

2010年12月更新：（参见相关贴子中的评论）：treedump的每一行大约需要75字节，因此您需要能够生成大约（75*leaf_blocks/1048576）MB的tracefile。请确保在开始之前检查参数max_dump_file_size。

原文链接地址及内容如下：
https://jonathanlewis.wordpress.com/2009/08/17/treedump/

treedump
Filed under: Indexing,Infrastructure,Oracle,trace files,Troubleshooting — Jonathan Lewis @ 5:31 pm BST Aug 17,2009
Here’s a mechanism for examining indexes in some detail if you think that something odd may be going on inside them. It’s a feature that I first decribed in Practical Oracle 8i, although the book doesn’t mention a problem with it that I subsequently discovered that means I always test it carefully on a small index before I use it on a large one. Here’s a demonstration – cut from an SQL*Plus session on 10.2.0.3:

SQL> select object_id
  2  from user_objects
  3  where object_name = 'ORD_PLACED'
  4  ;
 OBJECT_ID
----------
    100466
1 row selected.
SQL> alter session set events 'immediate trace name treedump level 100466';
Session altered.

The treedump dumps a trace file with a summary of the index (or index partition) whose object_id you have specified in the level. In general you get one line in the trace file for each block in the index – but there are some versions of Oracle where you get a full symbolic dump for every leaf block, hence the need to check with a small index before trying it on a large one.

The dump will look something like the following (which is the start of a dump of the index sys.i_source1):

----- begin tree dump
branch: 0x40035a 4195162 (0: nrow: 3, level: 2)
   branch: 0x40e2c0 4252352 (-1: nrow: 312, level: 1)
      leaf: 0x40035b 4195163 (-1: nrow: 240 rrow: 240)
      leaf: 0x40215f 4202847 (0: nrow: 218 rrow: 218)
      leaf: 0x40035c 4195164 (1: nrow: 444 rrow: 444)
      leaf: 0x40035d 4195165 (2: nrow: 444 rrow: 444)
       ... etc ...

To generate the dump, Oracle walks the tree structure in order one block at a time. The hexadecimal and decimal numbers starting each line are the block address expressed in two different ways.

The first line of the dump shows the root block. In this case the index has blevel=2 (which means a height of 3), so a walk down the index will visit the root, one level 1 branch, and then hit a leaf block. The root block currently has three pointers (nrow = 3) to branch blocks at level 1.

Note 1 – The root block is normally no different in format from a branch block, but when an index consists of nothing but the root block the root block is formatted as a leaf block not a branch block.
Note 2 – whatever block splits happen, the root block is always kept at the same location – the block after the segment header block and any freelist group blocks.

Indented one step (three characters) line 2 shows you the first branch block of the level below – the first of the level 1 branch blocks in this case. In brackets you see that this is entry -1, has 312 pointers to blocks at the next level and is a level 1 branch block. The significance of level 1 is that the entries will be pointing to leaf blocks.

It may seem a little odd that Oracle starts at the “-1th” branch block: computer programs usually start counting at zero or one, but Oracle has a rationale for starting at -1 in branch blocks. Most of the entries in a branch block are pairs of the form (partial key, block address) – and these are listed in the block’s row directory – but there is one special entry (called the “leftmost child – kdxbrlmc”) which omits the partial key and gets stored in a separate place outside the row directory. Oracle omits the key to save a little space and infers the value from the block’s parent (in the branch level above). Since the row directory starts counting at row 0, the leftmost child has to be labelled “-1”.

I haven’t shown it, but 312 lines further down the trace file, you’d find branch block zero, and the other two branch blocks would be a similar sort of distance further down again.

The third line of the trace file is the first leaf block (and again we start counting from -1). Note that there is no “level” in the leaf block. Instead you get two entries for rows: the number of current rows (rrow) in the block and the current size (nrow) of the block’srow directory (see glossary). The row directory may hold references to rows that have been deleted, committed, but not yet cleaned out of the structure, which is why the dump shows two pieces of information.

The part of the index shown in the example is completely clean and tidy (rrow = nrow in every block) but something a little odd has happened at the start of the index as we can see evidence of a 50/50 leaf block split at the low end due to inserts after the index was intially filled – note how the leaf block addresses are nearly in sequence (035b, 215f, 035c, 035d) but a completely different block number appears between the first and third blocks to hold the overflow as the first block split.

For comparison purposes, here’s a section of the index produced during one run of my “index explosion” code:

branch: 0x140118a 20976010 (0: nrow: 551, level: 1)
   leaf: 0x140118b 20976011 (-1: nrow: 409 rrow: 409)
   leaf: 0x140118c 20976012 (0: nrow: 214 rrow: 214)
   leaf: 0x140118d 20976013 (1: nrow: 214 rrow: 214)
   leaf: 0x140118e 20976014 (2: nrow: 213 rrow: 213)
   leaf: 0x1401190 20976016 (3: nrow: 206 rrow: 206)
   leaf: 0x140118f 20976015 (4: nrow: 327 rrow: 327)
   leaf: 0x1401191 20976017 (5: nrow: 169 rrow: 169)
   leaf: 0x1401192 20976018 (6: nrow: 169 rrow: 169)
   leaf: 0x1401194 20976020 (7: nrow: 159 rrow: 159)
   leaf: 0x1401193 20976019 (8: nrow: 169 rrow: 169)
   leaf: 0x1401196 20976022 (9: nrow: 160 rrow: 160)
   leaf: 0x1401195 20976021 (10: nrow: 327 rrow: 327)
    ...
   leaf: 0x140152d 20976941 (544: nrow: 141 rrow: 141)
   leaf: 0x140152c 20976940 (545: nrow: 146 rrow: 146)
   leaf: 0x140152f 20976943 (546: nrow: 140 rrow: 140)
   leaf: 0x140152e 20976942 (547: nrow: 282 rrow: 282)
   leaf: 0x1401530 20976944 (548: nrow: 282 rrow: 282)
   leaf: 0x1401531 20976945 (549: nrow: 138 rrow: 138)

We have an index with blevel = 1, and 551 leaf blocks (numbered -1 to 549).

The first leaf block is 100% packed and underwent a “90/10” split.

The next 4 leaf blocks show clear evidence of 50/50 splits (each holds roughly half of the maximum 409 rows in the first block). The next leaf block suggests that a bit of a catastrophe hit the block and its ITL grew sharply – this block is 100% “full” (a fact that I ascertained by doing a block dump), but can only fit 327 entries because its ITL has grown to about 70 entries.

The size of the ITL doesn’t change over the next few blocks, but we can see a few 50/50 splits followed by a 90/10 split – and so it goes on. By the time we get to the end of the list, we’ve had another glitch, and a “full” block now holds only 282 rows, and blocks showing 50/50 splits hold about 140 rows. In this test run, the ITLs grew to a worst case of about 112 entries – not a complete disaster but certainly bad news.

The treedump isn’t detailed enough to tell you everything that’s happened to your index – but it is relatively short (one line per block) and gives you some good clues about what’s been going on. And the presence of the decimal form of the block address makes it easy to dump individual blocks if you want to investigate further, for example:

SQL> select
  2     dbms_utility.data_block_address_file(20976011) file#,
  3     dbms_utility.data_block_address_block(20976011) block#
  4  from
  5     dual
  6  /
 
     FILE#     BLOCK#
---------- ----------
         5       4491
 
1 row selected.
 
SQL> alter system dump datafile 5 block 4491;
 
System altered.

Footnote:
In some versions of Oracle you get a full block dump of every leaf block – which makes it very expensive to do a treedump.
In versions of Oracle prior to 9.2.0.6 there is a bug which means your session will crash with a “numeric overflow” error if you try to dump an index that has been created automatically to support a unique or primary key constraint. This makes it hard to dump index organized tables(IOTs). This is because bit 12 (4096) been set in ind$.property and the code isn’t expecting such a large number. (Technically it is possible to update the property column directly; it works, but I wouldn’t do it on the production system – but maybe on a backup you’re about to discard.)

Update Dec 2010: (See this comment on a related post): Each line of a treedump takes about 75 bytes, so you need to be able to generate a tracefile of around (75 * leaf_blocks / 1048576)MB. Make sure you check parameter max_dump_file_size before you start.

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者