使用增量统计信息高效维护分区表的统计信息--2

原创赵勇 2022-08-04

672

介绍

本文涵盖有关当你使用分区交换加载（Partition Exchange Load PEL)时，如何高效地管理优化器统计信息的内容。该技巧用于有大量数据需要加载，而性能又是至关重要时。其多用于决策支持系统和有大量操作数据的存储。

确保你已经看过第一部分, 或者至少你熟悉增量统计信息的概念，以便你了解在分区表的语境下，何为synopsis.

分区交换加载

大多数人会比解熟悉分区交换加载，但我会简要汇总，介绍一下我将在本文中使用的术语。
下图代表了处理过程。首先，LOAD表填充有新数据，它会和“实时”应用表（SALES)中的一个分区做交换。SALES表中有第一季度和第二季度的分区（Q1和Q2),而LOAD与空的Q2分区做交换。交换的效果是通过将LOAD的“标识”与Q2交换，将LOAD中的所有数据合并到SALES中.交换是一个逻辑操作：改变发生在数据库字典，没有数据被移动。LOAD中的数据“在交换的瞬间”被发布到SALES中。

图片.png

典型的，交换的步骤看起来是这样的：

alter table SALES exchange partition Q2 with table LOAD
including indexes without validation;

操作上，这个方法比直接插入数据到SALES中更复杂一些，但是他有一些优势。比如，在该表上创建任何索引前，新的数据可以被插入到LOAD中。如果数据量很大，在加载过程结束后创建索引是非常高效的，从而避免了加载过程中需要面临的较高的索引维护成本。如果以非常高的并行速率加载数据，则性能收益将尤其明显。

你需要为分区交换加载所做的具体操作步骤，取决于你所使用的分区类型，而不管表上是本地的还是全局的索引，以及正在使用的是什么约束。在这篇博文中，我将重点介绍如何管理统计信息，但你可以从Database VLDB and Partitioning Guide中找到，如何在带有索引和约束时进行处理的详细说明。

当新的数据被加载到表中，优化器统计信息必须被更新，以便将这些新数据考虑在内。在上例中，SALES表的全局统计信息必须要更新，以便反映LOAD与Q2交换时合并到表中的数据。为了使这个步骤尽可能高效，SALES表必须使用增量统计信息维护。我期待你已经从本文的标题中猜到了，我的假设将从现在开始，我还会假设SALES表的统计信息在分区交换加载前是最新的。

Oracle Database 11g

在LOAD与Q2交换后的时刻，是没有synopsis在Q2中的；它还没有被创建。增量统计信息需要synopses来高效地更新SALES的全局统计信息，所以，Q2中的synopsis将在收集SALES表的统计信息时被自动创建。例如：

EXEC dbms_stats.gather_table_stats(ownname=>null,tabname=>'SALES')

Q2上的统计信息将被收集，synopsis也将被创建，以便SALES表上的全局统计信息可以被更新。一旦交换已经发生，Q2就需要刷新统计信息、synopsis以及还可能会有的扩展统计信息和直方图（如果SALES上有的话）。比如，如果SALES上有一个列组"（COL1,COL2)",那么Q2也需要这些统计信息。数据库会自动的处理它，因为他们会在SALES表的统计信息被收集时为你创建，因此，你不需要在交换前就在LOAD表上创建直方图和扩展统计信息。

然而有一种场景，你可能会希望在交换前就收集LOAD表上的统计信息。比如，在SALES表被收集统计信息之前，Q2就可能被查询，那么你可能希望交换一完成，就确保Q2上是有可用的统计信息。由于任何在LOAD表上收集的统计信息都将与交换后的Q2关联起来，所以，这也是容易做到的。然而，请牢记，这意味着最终新数据的统计信息会被收集两次：一次在交换前（在LOAD表上），一次在交换后（SALES表上的统计信息被重新收集时对Q2分区的采集）。Oracle Database 12c给了你一个交互选项，我会在下面介绍它。
如果你想进一步了解扩展统计信息和列的使用，请查阅这篇博文。它介绍了你要如何识别，能从使用扩展统计中受益的种子列的使用情况。注意，为了帮助识别是可以从直方图中受益的列，有些列的使用情况信息是一直被捕获的。即便你不选择种子列，也会发生。

Oracle Database 12c

Oracle Database 12c包含一个允许你在交换前，在LOAD表上创建synopsis的增强。这意味着，只要交换发生，是不需要在交换后，在Q2上收集统计信息，其synopsis就已就绪了。其结果就是SALES表上的全局统计信息，在Oracle Database 12c上的刷新比在Oracle Database 11g上更快。以下是在交换前，如何在LOAD表上做准备：

begin
   dbms_stats.set_table_prefs (null,'load','INCREMENTAL','TRUE');
   dbms_stats.set_table_prefs (null,'load','INCREMENTAL_LEVEL','TABLE');
   dbms_stats.gather_table_stats (null,'load');
end;
/

只要交换完成就，Q2就将会有一个全新的统计信息和synopsis。故事还没有结束，除非你在交换前，已经在LOAD表上创建了适当的直方图和扩展统计信息，否则Q2上的统计信息会在交换后再次收集（当收集SALES上的统计信息时）。如果你想看一下表上都有什么，GitHub上的list_s.sql脚本可以显示特定表上的扩展统计信息和直方图。如果你使用METHOD_OPT，来指定在SALES表上具体创建什么样的直方图，那么你可以使用同样的METHOD_OPT来收集LOAD表上的统计信息。比如：

设置表上的偏好参数

dbms_stats.set_table_prefs(
    ownname=>null,
    tabname=>'SALES',
    method_opt=>'for all columns size 1 for columns sales_area size 254');

然后

   dbms_stats.set_table_prefs (null,'load','INCREMENTAL','TRUE');
   dbms_stats.set_table_prefs (null,'load','INCREMENTAL_LEVEL','TABLE');

   select dbms_stats.create_extended_stats(null,'load','(col1,col2)') from dual;
   dbms_stats.gather_table_stats(
         ownname=>null,
         tabname=>'LOAD',
         method_opt=>'for all columns size 1 for columns sales_area size 254');

或者，如果你使用的缺省值’FOR ALL COLUMNS SIZE AUTO’在SALES表上收集统计信息，那么通常最好保持自动并且交换时不在LOAD表上创建直方图。这会允许SALES表上的统计信息收集识别出交换后的Q2分区需要什么样的直方图。如果SALES表上的列使用信息，显示出在Q2上没有直方图的列，可能会从直方图中受益，则交换后的Q2上的统计信息会被收集。而且，如上所述，扩展统计信息也将是自动维护的。

步骤摘要

如果你正在使用Oracle Database 12c，并且在带有适当的直方图和扩展统计信息的LOAD表上创建了synopsis,那么你就可以最小化SALES(交换后）表的统计信息收集时间。对于Oracle Database 11g，统计信息总是会在Q2上完成交换后被收集。以下是操作步骤（请注意，我贴出的统计信息维护步骤，未包含维护索引和约束等）：

1、创建LOAD表并插入新数据 (或者使用 CREATE TABLE load AS SELECT…)
2、为SALES表创建一个新的空分区（Q2)
3、对LOAD表填充数据
4、可选 (Oracle Database 12c) - 如果你想Q2在交换后就立即拥有有效的统计信息，按下面的步骤操作：
  4.1 LOAD表设置INCREMENTAL 为 'TRUE' ， INCREMENTAL_LEVEL 为 'TABLE' 
  4.2 参照SALES表，在LOAD表上创建扩展统计信息
  4.3 参照SALES表上的METHOD_OPT参数收集LOAD表上的直方图
5、可选 (Oracle Database 11g) - 如果你想Q2在交换后就立即拥有有效的统计信息，按下面的步骤操作：
  5.1 参照SALES表在LOAD表上创建扩展统计信息
  5.2 参照SALES表上的METHOD_OPT参数收集LOAD表上的直方图
6、交换LOAD表和Q2分区(这会交换Oracle Database 12c中的synopses,基本的列统计信息和直方图）
7、为SALES表收集统计信息。如果你实施了上面的步骤4，Oracle Database 12c会完成得更快。

如果过去你曾经使用过分区交换并使用特定的方式收集统计信息，那么你可能需要比较表级直方图与分区和子分区上的直方图，是符合你的预期的。我在Github上包含了一个脚本来帮助你做这个比较。

复合分区表

如果你正在使用一个复合分区表，使用上面描述的同样方式进行分区交换。如果你想用一个完整的例子来体验，我创建了一个名为example.sql的脚本.

原文标题：Efficient Statistics Maintenance for Partitioned Tables Using Incremental Statistics – Part 2
原文链接：https://blogs.oracle.com/optimizer/post/efficient-statistics-maintenance-for-partitioned-tables-using-incremental-statistics-part-2
原文作者：Nigel Bayliss

原文内容：

Introduction

This post covers how you can manage optimizer statistics efficiently when you use partition exchange load (PEL). This technique is used when large volumes of data must be loaded and maximum performance is paramount. It’s common to see it used in decision support systems and large operational data stores.

Make sure you’ve taken a look at Part 1, or you are at least familiar with the concept of incremental statistics so that you know what a synopsis is in the context of a partitioned table.

Partition Exchange Load

Most of you will be familiar with partition exchange load, but I’ll summarize it briefly to introduce you to the terminology I’ll be using here.

The graphic below represents the process. Firstly, the LOAD table is filled with new data and then exchanged with a partition in the “live” application table (SALES). SALES has partitions for quarter 1 and quarter 2 (Q1 and Q2) and LOAD is exchanged with the empty Q2 partition. The effect of the exchange is to incorporate all of the data in LOAD into SALES by swapping the “identity” of LOAD with Q2. The exchange is a logical operation: a change is made in the Oracle data dictionary and no data is moved. The data in LOAD is published to SALES “at the flick of a switch”.

图片.png

Typically, the exchange step looks like this:

alter table SALES exchange partition Q2 with table LOAD
including indexes without validation;

alter table SALES exchange partition Q2 with table LOAD
including indexes without validation;

Operationally, this approach is more complex than inserting data directly into SALES but it offers some advantages. For example, new data can be inserted into LOAD before any indexes have been created on this table. If the volume of data is large, creating indexes at the end of the load is very efficient and avoids the need to bear the higher cost of index maintenance during the load. The performance benefit is especially impessive if data is loaded at very high rates in parallel.

The exact steps you need to execute for a partition exchange load will vary depending on the type of partitioning you use, whether there are local or global indexes on the table and what constraints are being used. For the purposes of this blog post I’m going to stick to how you manage statistics, but you can find details on how to deal with indexes and constraints in the Database VLDB and Partitioning Guide.

When new data is loaded into a table, optimizer statistics must be updated to take this new data into account. In the example above, the global-level statistics for SALES must be refreshed to reflect the data incorporated into the table when LOAD is exchanged with Q2. To make this step as efficient as possible SALES must use incremental statistics maintenance. I expect you’ll have guessed from the title of this post that I’m going to assume that from now on! I’m also going to assume that the statistics on SALES are up-to-date prior to the partition exchange load.

Oracle Database 11g

The moment after LOAD has been exchanged with Q2 there will be no synopsis on Q2; it won’t have been created yet. Incremental statistics requires synopses to update the global-level statistics for SALES efficiently so a synopsis for Q2 will be created automatically when statistics on SALES are gathered. For example:

EXEC dbms_stats.gather_table_stats(ownname=>null,tabname=>'SALES')

EXEC dbms_stats.gather_table_stats(ownname=>null,tabname=>‘SALES’)

Statistics will be gathered on Q2 and the synopsis will be created so that the global-level statistics for SALES will be updated. Once the exchange has taken place, Q2 will need fresh statistics and a synopsis and it might also need extended statistics and histograms (if SALES has them). For example, if SALES has a column group, “(COL1, COL2)” then Q2 will need these statistics too. The database takes care of this automatically, so there’s no requirement to create histograms and extended column statistics on LOAD prior to the exchange because they are created for you when statistics are gathered on SALES.

There is nevertheless a scenario where you might want to gather statistics on LOAD prior to the exchange. For example, if it’s likely that Q2 will be queried before statistics have been gathered on SALES then you might want to be sure that statistics are available on Q2 as soon as the exchange completes. This is easy to do because any statistics gathered on LOAD will be associated with Q2 after the exchange. However, bear in mind that this will ultimately mean that statistics for the new data will be gathered twice: once before the exchange (on the LOAD table) and once again after the exchange (for the Q2 partition when SALES statistics are re-gathered). Oracle Database 12c gives you an alternative option, so I’ll cover that below.

If you want to know more about extended statistics and column usage then check out this post. It covers how you can seed column usage to identify where there’s a benefit in using extended statistics. Note that some column usage information is always captured to help identify columns that can benefit from histograms. This happens even if you don’t choose to seed column usage.

Oracle Database 12c

Oracle Database 12c includes an enhancement that allows you to create a synopsis on LOAD prior to the exchange. This means that a synopsis will be ready to be used as soon as the exchange has taken place without requiring statistics to be gathered on Q2 post-exchange. The result of this is that the global-level statistics for SALES can be refreshed faster in Oracle Database 12c than they can be in Oracle Database 11g. This is how to prepare the LOAD table before the exchange:

begin
   dbms_stats.set_table_prefs (null,'load','INCREMENTAL','TRUE');
   dbms_stats.set_table_prefs (null,'load','INCREMENTAL_LEVEL','TABLE');
   dbms_stats.gather_table_stats (null,'load');
end;
/

Q2 will have fresh statistics and a synopsis as soon as the exchange completes. This isn’t quite the end of the story though. Statistics on Q2 will be gathered again after the exchange (when statistics are gathered on SALES) unless you have created appropriate histograms and extended statistics on LOAD before the exchange. The list_s.sql script in GitHub displays extended statistics and histograms for a particular table if you want to take a look at what you have. If you are using METHOD_OPT to specify exactly what histograms to create on SALES then you can use the same METHOD_OPT for gathering statisitcs on LOAD. For example:

Table preference…

dbms_stats.set_table_prefs(
    ownname=>null,
    tabname=>'SALES',
    method_opt=>'for all columns size 1 for columns sales_area size 254');

Then…

   dbms_stats.set_table_prefs (null,'load','INCREMENTAL','TRUE');
   dbms_stats.set_table_prefs (null,'load','INCREMENTAL_LEVEL','TABLE');

   select dbms_stats.create_extended_stats(null,'load','(col1,col2)') from dual;
   dbms_stats.gather_table_stats(
         ownname=>null,
         tabname=>'LOAD',
         method_opt=>'for all columns size 1 for columns sales_area size 254');

Alternatively, if you are using the default ‘FOR ALL COLUMNS SIZE AUTO’ to gather statistics on SALES, then it’s usually best to preserve automation and exchange without creating histograms on LOAD. This allows stats gathering on SALES to figure out what histograms are needed for Q2 post-exchange. Statistics on Q2 will be gathered post-exchange if SALES has column usage information indicating that there are columns in Q2 that don’t have a histogram but might benefit from having one. Also, as mentioned above, extended statistics will be maintained automatically too.

Summary of Steps

If you are using Oracle Database 12c then you can minimize the statistics gathering time for SALES (post-exchange) if you create a synopsis on LOAD along with appropriate histograms and extended statistics. For Oracle Database 11g, statistics will always be gathered on Q2 once the exchange has completed. Here are the steps (bearing in mind I’m sticking to statistics maintenance and not including steps to manage indexes and constraints etc):
1、Create LOAD table and insert new data (or CREATE TABLE load AS SELECT…)
2、Create a new (empty) partition for SALES (Q2)
3、Populate LOAD with data
4、Optionally (Oracle Database 12c) - follow these steps if you want Q2 to have valid statistics immediately after the exchange:
4.1 Set INCREMENTAL to ‘TRUE’ and INCREMENTAL_LEVEL to ‘TABLE’ for LOAD table
4.2 Create extended statistics on LOAD to match SALES
4.3 Gather statistics on LOAD using METHOD_OPT parameter to match histograms with SALES
5、Optionally (Oracle Database 11g) - follow these steps if you want Q2 to have valid statistics immediately after the exchange:
5.1 Create extended statistics on LOAD to match SALES
5.2 Gather statistics on LOAD using METHOD_OPT parameter to match histograms with SALES
6、Exchange LOAD with Q2 (this will exchange synopses in Oracle Database 12c, basic column statistics and histograms)
7、Gather statistics for SALES. Oracle Database 12c will complete this step more quickly if you implemented “4”, above.

If, in the past, you have used partition exchange load and gathered statistics in an ad-hoc manner then you should probably check that the histograms you have match your expectations when comparing table-level histograms with histograms on partitions and sub-partitions. I’ve included a script in GitHub to help you do that.

Composite Partitioned Tables

If you are using a composite partitioned table, partition exchange load works in the same way as described above. If you would like to experiment with a complete example, I’ve created a script called example.sql here.

分区表分区交换统计信息墨力翻译

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

使用增量统计信息高效维护分区表的统计信息--2

介绍

分区交换加载

Oracle Database 11g

Oracle Database 12c

步骤摘要

复合分区表

Introduction

Partition Exchange Load

Oracle Database 11g

Oracle Database 12c

Summary of Steps

Composite Partitioned Tables

评论