Interval Partition(s)
Original author: Jonathan Lewis
Original post: https://jonathanlewis.wordpress.com/2020/02/18/interval-partitions-2/
A quirky little feature of interval partitioning showed up on Twitter today: a parallel insert that would only use a single PX slave to do the inserting. With 1.1 billion rows and the option of running parallel 32, this made the loading process rather slower than it ought to have been.
Fortunately it's quite easy to model (and work around) the oddity. So here's a small data set and an empty partitioned table to work with:
rem
rem Script: pt_int_load_anomaly.sql
rem Author: Jonathan Lewis
rem Dated: Jan 2020
rem
create table t1
nologging
as
select
ao.*
from
all_Objects ao,
(select rownum id from dual connect by level <= 20)
;
create table pt1
partition by range (object_id) interval (1000000) (
partition p1 values less than (1)
)
as
select *
from all_Objects
where rownum = 0
/
I've created a table by copying all_objects 20 times which, for my little sandbox, gives me a total of about 1.2M rows. Then I've created an empty interval-partitioned clone of all_objects, with the first partition defined to hold all rows where the object_id is less than 1 (and there's no object in the database that could match that criterion). I've defined the interval to be 1,000,000 and, since the highest object_id in my database is about 90,000, the first partition that gets added to this table will be able to hold all the data from t1.
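As a quick check (not part of the original script): once the load shown below has been run and committed, a look at user_tab_partitions confirms that the interval clause adds just one system-generated partition, since every object_id falls well below 1,000,000:
rem
rem Not part of the original script: run after the load below has been
rem committed to see the single system-generated interval partition.
rem
select
        partition_name, high_value, partition_position
from
        user_tab_partitions
where
        table_name = 'PT1'
order by
        partition_position
/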
So now we try a parallel insert from t1 into pt1, and check the execution plan and the parallel execution statistics:
set serveroutput off
insert /*+ append enable_parallel_dml parallel(6) */ into pt1 select * from t1;
select * from table(dbms_xplan.display_cursor);
start pq_tqstat
Note how I've used the hint /*+ enable_parallel_dml */ (possibly a 12c hint back-ported to 11.2.0.4) rather than an "alter session" command; it's just a little convenience to be able to embed the directive in the SQL. The pq_tqstat script is one I published some time ago to report the contents of the session-specific dynamic performance view v$pq_tqstat immediately after running a parallel statement.
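The pq_tqstat script itself isn't reproduced in this post, so here is a rough stand-in (my own sketch, not the published script). It simply reports the same view, adding percentage columns along the lines of the ROW_SHARE and DATA_SHARE figures in the output below, and it has to run in the same session, immediately after the parallel statement:
rem
rem A stand-in for pq_tqstat (not the published script): report the
rem session-specific v$pq_tqstat immediately after the parallel statement.
rem
break on dfo_number on tq_id on server_type

select
        dfo_number, tq_id, server_type, instance, process,
        num_rows, bytes,
        round(ratio_to_report(num_rows) over (
                partition by dfo_number, tq_id, server_type) * 100, 2) row_share,
        round(ratio_to_report(bytes) over (
                partition by dfo_number, tq_id, server_type) * 100, 2) data_share,
        waits, timeouts, avg_latency
from
        v$pq_tqstat
order by
        dfo_number, tq_id, server_type desc, process
/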
Here's the plan:
SQL_ID 25hub68pf1z1s, child number 0
-------------------------------------
insert /*+ append enable_parallel_dml parallel(6) */ into pt1 select *
from t1
Plan hash value: 2888707464
-------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | | | 631 (100)| | | | |
| 1 | PX COORDINATOR | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10001 | 1235K| 159M| 631 (10)| 00:00:01 | Q1,01 | P->S | QC (RAND) |
| 3 | LOAD AS SELECT (HIGH WATER MARK BROKERED)| PT1 | | | | | Q1,01 | PCWP | |
| 4 | OPTIMIZER STATISTICS GATHERING | | 1235K| 159M| 631 (10)| 00:00:01 | Q1,01 | PCWP | |
| 5 | PX RECEIVE | | 1235K| 159M| 631 (10)| 00:00:01 | Q1,01 | PCWP | |
| 6 | PX SEND RANDOM LOCAL | :TQ10000 | 1235K| 159M| 631 (10)| 00:00:01 | Q1,00 | P->P | RANDOM LOCA|
| 7 | PX BLOCK ITERATOR | | 1235K| 159M| 631 (10)| 00:00:01 | Q1,00 | PCWC | |
|* 8 | TABLE ACCESS FULL | T1 | 1235K| 159M| 631 (10)| 00:00:01 | Q1,00 | PCWP | |
-------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
8 - access(:Z>=:Z AND :Z<=:Z)
Note
-----
- Degree of Parallelism is 6 because of hint
The most important detail of this plan is that the PX slaves do the load as select (operation 3), then send a message to the query coordinator (PX SEND QC, operation 2) to tell it about the data load. They do not send their data to the QC for the QC to do the load.
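As an aside (my addition, not part of the original post): a quick way to confirm that the insert really did run as direct-path parallel DML is that the loading transaction cannot re-read the table it has just loaded until it commits:
rem
rem My addition, not in the original post: a direct-path / parallel DML
rem insert leaves the freshly loaded segment unreadable within the same
rem transaction, so the first query should fail with ORA-12838 ("cannot
rem read/modify an object after modifying it in parallel") and the second
rem should succeed.
rem
select count(*) from pt1;
commit;
select count(*) from pt1;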
So the plan says we will be doing parallel DML, but here's what v$pq_tqstat tells us:
DFO_NUMBER TQ_ID SERVER_TYPE INSTANCE PROCESS NUM_ROWS BYTES ROW_SHARE DATA_SHARE WAITS TIMEOUTS AVG_LATENCY
---------- ---------- --------------- -------- --------------- ---------- ---------- --------- ---------- ---------- ---------- -----------
1 0 Producer 1 P006 215880 34785363 17.47 16.86 16 0 0
1 P007 202561 34436325 16.39 16.69 17 0 0
1 P008 207519 34564496 16.79 16.75 17 0 0
1 P009 208408 34594770 16.86 16.77 17 0 0
1 P00A 198915 33529627 16.10 16.25 16 0 0
1 P00B 202537 34430603 16.39 16.69 16 0 0
Consumer 1 P000 0 144 0.00 0.00 51 47 0
1 P001 0 144 0.00 0.00 51 47 0
1 P002 1235820 206340464 100.00 100.00 75 47 0
1 P003 0 144 0.00 0.00 51 47 0
1 P004 0 144 0.00 0.00 1138 1134 0
1 P005 0 144 0.00 0.00 1137 1133 0
1 Producer 1 P000 0 24 0.00 5.91 51 42 0
1 P001 0 24 0.00 5.91 50 41 0
1 P002 2 286 100.00 70.44 58 14 0
1 P003 0 24 0.00 5.91 51 43 0
1 P004 0 24 0.00 5.91 51 42 0
1 P005 0 24 0.00 5.91 51 43 0
Consumer 1 QC 2 406 100.00 100.00 311 179 0
19 rows selected.
The query did run parallel 6 as hinted, and 6 PX slaves scanned the t1 table; but they all sent all their data to one PX slave in the second slave set, and that one PX slave did all the inserts. The plan was parallel, but the execution was effectively serial. (You'll notice there is something a little odd about the waits and timeouts for P004 and P005 when they are acting as consumers. I may worry about that later, but it could be a host-based side effect of running parallel 6 on a VM with 4 CPUs.)
This serialization leads to two questions:
(1) What went wrong?
(2) How do we work around it and make the insert "truly" parallel?
My answer to (1) is "I don't know, but I'll look at it if necessary", combined with a guess: it's something to do with the table having only one partition at the outset, which has an unexpected side effect on the randomising function for the PX distribution.
My answer to (2) is "if I'm right about (1), why not try pre-defining two partitions, and I'll even let both of them stay empty".
So here's my new definition for pt1:
create table pt1
partition by range (object_id) interval (1000000) (
partition p0 values less than (0),
partition p1 values less than (1)
)
as
select *
from all_Objects
where rownum = 0
/
Re-running the test with the completely redundant, and permanently empty, p0 partition, the plan doesn't change but the results from v$pq_tqstat change dramatically:
DFO_NUMBER TQ_ID SERVER_TYPE INSTANCE PROCESS NUM_ROWS BYTES ROW_SHARE DATA_SHARE WAITS TIMEOUTS AVG_LATENCY
---------- ---------- --------------- -------- --------------- ---------- ---------- --------- ---------- ---------- ---------- -----------
1 0 Producer 1 P006 207897 34581153 16.82 16.76 23 4 0
1 P007 215669 34786429 17.45 16.86 30 5 0
1 P008 221474 36749626 17.92 17.81 28 5 0
1 P009 204959 34497164 16.58 16.72 22 2 0
1 P00A 177755 30141002 14.38 14.61 21 0 0
1 P00B 208066 35585810 16.84 17.25 25 2 0
Consumer 1 P000 213129 35612973 17.25 17.26 82 57 0
1 P001 200516 33570586 16.23 16.27 84 55 0
1 P002 203395 33950449 16.46 16.45 83 56 0
1 P003 205458 34235575 16.63 16.59 82 54 0
1 P004 204111 33999932 16.52 16.48 581 555 0
1 P005 209211 34971669 16.93 16.95 580 553 0
1 Producer 1 P000 2 286 16.67 16.67 422 149 0
1 P001 2 286 16.67 16.67 398 130 0
1 P002 2 286 16.67 16.67 405 128 0
1 P003 2 286 16.67 16.67 437 161 0
1 P004 2 286 16.67 16.67 406 116 0
1 P005 2 286 16.67 16.67 440 148 0
Consumer 1 QC 12 1716 100.00 100.00 242 111 0
19 rows selected.
Every consumer receives and inserts roughly 200,000 rows; it's a totally fair parallel DML. Timings are pretty irrelevant for such a small data set, but the execution time did drop from 7 seconds to 4 seconds when the parallelism was working "properly".
I've tested this script on 12.2.0.1 and 19.3.0.0; the same anomaly appears in both versions, though it may be worth noting that the strange skew in the waits and timeouts doesn't appear in 19.3.0.0.