Here's a quirky little set of results from trace files. First, a 10391 trace at level 64 – this shows the way in which an object was broken into granules for a parallel tablescan. I've listed just the first few granules:
Distributors associated to that generator:
dist:0 nbgra:52 nid:65535 nbslv:4 size:5131
filedist:0 filno:0 nbgra:0
filedist:1 filno:6 nbgra:52
gnum:0 sz:99 pnum:1048576 rid:(file:6 blk:12..130)
gnum:1 sz:99 pnum:1048576 rid:(file:6 blk:131..235)
gnum:2 sz:99 pnum:1048576 rid:(file:6 blk:236..341)
gnum:3 sz:99 pnum:1048576 rid:(file:6 blk:342..446)
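For reference, here is a minimal sketch of how both this 10391 dump and the 10046 trace that follows are typically captured. The event syntax is standard, but the table name T and the degree of 4 are placeholders, and whether the PX slaves pick up a session-level 10046 automatically depends on the version – you may need oradebug or dbms_monitor against the individual slave sessions instead.

alter session set events '10391 trace name context forever, level 64';
alter session set events '10046 trace name context forever, level 8';

-- run the parallel tablescan from this (query coordinator) session;
-- table T and the degree of 4 are hypothetical
select /*+ full(t) parallel(t, 4) */ count(*) from t;

alter session set events '10046 trace name context off';
alter session set events '10391 trace name context off';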
Now, with just a little cosmetic enhancement, here is the 10046 trace at level 8 from the parallel server process that was given the first granule to process. You can tell from the parameter names that this is a 10g trace file. Note especially the order in which the blocks are visited:
WAIT #1: nam='PX Deq: Execution Msg' ela= 24415 sleeptime/senderid=268566527 passes=1 p3=0 obj#=-1 tim=1730298979
WAIT #1: nam='direct path read' ela= 18913 file number=6 first dba= 12 block cnt= 5 obj#=59196 tim=1730318924
WAIT #1: nam='direct path read' ela= 5431 file number=6 first dba= 37 block cnt= 4 obj#=59196 tim=1730324753
WAIT #1: nam='direct path read' ela= 5017 file number=6 first dba= 49 block cnt= 8 obj#=59196 tim=1730330528
WAIT #1: nam='direct path read' ela= 6651 file number=6 first dba= 82 block cnt= 7 obj#=59196 tim=1730338312
WAIT #1: nam='direct path read' ela= 5033 file number=6 first dba= 97 block cnt= 8 obj#=59196 tim=1730343998
WAIT #1: nam='direct path read' ela= 2711 file number=6 first dba=114 block cnt= 7 obj#=59196 tim=1730348009
WAIT #1: nam='direct path read' ela= 8746 file number=6 first dba=129 block cnt= 2 obj#=59196 tim=1730357846
WAIT #1: nam='direct path read' ela= 66938 file number=6 first dba= 69 block cnt=12 obj#=59196 tim=1730425271
WAIT #1: nam='direct path read' ela= 80266 file number=6 first dba= 90 block cnt= 7 obj#=59196 tim=1730506969
WAIT #1: nam='direct path read' ela= 4849 file number=6 first dba=105 block cnt= 8 obj#=59196 tim=1730512609
WAIT #1: nam='direct path read' ela= 2831 file number=6 first dba=122 block cnt= 7 obj#=59196 tim=1730516537
WAIT #1: nam='direct path read' ela= 150077 file number=6 first dba= 29 block cnt= 4 obj#=59196 tim=1730667253
WAIT #1: nam='direct path read' ela= 47274 file number=6 first dba= 41 block cnt= 8 obj#=59196 tim=1730715604
WAIT #1: nam='direct path read' ela= 6 file number=6 first dba= 21 block cnt= 4 obj#=59196 tim=1730716822
WAIT #1: nam='direct path read' ela= 42396 file number=6 first dba= 57 block cnt= 8 obj#=59196 tim=1730759816
WAIT #1: nam='PX Deq: Execution Msg' ela= 103 sleeptime/senderid=268566527 passes=1 p3=0 obj#=59196 tim=1730760814
As you can see, this stream of waits starts with the PX slave waiting to be told to process the first granule (PX Deq: Execution Msg) and ends with the slave waiting to be told which granule to process next.
But why are the direct path reads so small, and in such a funny order? After all, the slave was given a stream of 99 consecutive blocks – and even that is a little odd, because counting from block 12 to block 130 covers 119 blocks, rather more than the 99 that the block counts of the reads above add up to. Moreover, there seem to be some blocks that didn't get read at all.
The answer lies in the title of this piece. I created the table in a tablespace with system-managed extents (the autoallocate option for locally managed tablespaces). In a "clean" datafile the effect of this is that if you create a table and start inserting data into it, the first 16 extents will be allocated at 64KB, the next 63 at 1MB, followed by 120 at 8MB, before Oracle finally switches to 64MB extents. (This is observed behaviour, not something documented by Oracle.)
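As a quick check of this progression – a sketch only, using a hypothetical table T1 loaded by a single serial session – you can group the object's extents by size:

select bytes/1024 as extent_kb, count(*) as num_extents
from   dba_extents
where  owner = user
and    segment_name = 'T1'        -- hypothetical table name
group  by bytes
order  by bytes;

-- on a clean autoallocate tablespace this should show roughly
-- 16 x 64KB extents, then 1MB, then 8MB, and finally 64MB extents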
However, if you run a parallel "insert /*+ append */", every parallel execution slave thinks it is populating its own private data segment, so each slave will generate 16 extents of 64KB, then 63 extents of 1MB, and so on. Moreover, if you run 3 separate "insert /*+ append */" statements to load the table, each statement starts the cycle all over again.
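A parallel direct-path load of the kind described might look like the following sketch – t1 and t1_source are hypothetical names, and the degree of 4 is arbitrary:

alter session enable parallel dml;

insert /*+ append parallel(t1, 4) */ into t1
select * from t1_source;

commit;

-- each slave populates what it treats as its own private data segment,
-- so each one starts the 64KB / 1MB / 8MB extent cycle from scratch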
On top of that, when you use autoallocate, the last step of the (parallel) insert is extent trimming, which releases the trailing free space from the last extent allocated by each slave.
Oracle tries very hard to avoid problems with this extent trimming – it trims to the nearest "unit size", i.e. to the next 64KB, 1MB, or 8MB boundary above the actual limit of the data. This avoids leaving randomly sized holes all over the tablespace, but it still leaves several holes of varying sizes.
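One way to look at the holes that trimming leaves behind is to list the free-space fragments in the tablespace – again just a sketch, with a hypothetical tablespace name:

select file_id, block_id, blocks, bytes/1024 as free_kb
from   dba_free_space
where  tablespace_name = 'TEST_AUTO'   -- hypothetical tablespace name
order  by file_id, block_id;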
Now assume you have a very large tablespace, are running with 32 CPUs and default parallelism, and frequently drop, re-create, and populate objects – perhaps even dropping and re-creating several objects concurrently. You will probably find that you gradually accumulate lots of little holes of all sorts of sizes scattered across the tablespace.
Oracle does try to use up these holes as quickly as possible – but only by allocating extents of the legal unit sizes, starting from the beginning of the file and working along it in order. So if you have lots of little holes scattered through the tablespace, and lots of parallel slaves busily soaking them up, you end up with situations like the one above: 108 consecutive blocks in one file made up of 14 chunks of 8 blocks allocated by 14 different parallel execution slaves, and read in order of extent_id when a parallel tablescan comes along.
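You can see the end result on the object itself by listing its extents in extent_id order – a sketch with a hypothetical table name; adjacent extent_ids map to block_ids that are scattered through the file rather than ascending neatly:

select extent_id, file_id, block_id, blocks
from   dba_extents
where  owner = user
and    segment_name = 'T1'        -- hypothetical table name
order  by extent_id;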
Mixing very large autoallocate tablespaces with repeated parallel create/insert at high degrees of parallelism can result in objects acquiring a very large number of small extents – which can have a surprising impact on subsequent parallel queries.
Update August 2009:
Christian Antognini reports an enhancement arriving (silently) in 11.1.0.7 that addresses this issue.
Original post:
https://jonathanlewis.wordpress.com/2007/05/29/autoallocate-and-px/