暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

Oracle分区表上的统计信息(三)

在我提交上一篇时,我就知道这不是我所希望的,而且在人们读完之前,我说得太多了。我知道这可能会增加人们对这一主题的困惑(有时,这也是我想避免的)。好在那时我准备了大量的例子,找寻并演示正确的观点。以正确的顺序,达到正确的深度。回想起来,这可能更像一个白皮书的主题,但我还是要继续下去。

总结一下到目前为止的情况,并强调几个关键点。。。
1)如我在第一篇贴子中所说,所有的示例均运行在Oracle 10.2.0.4上,但我认为你们会在我们现在还不能接触到的早期版本上,看到类似的行为。默认参数会不一样,但聚合的过程(译者注:从子分区向上级的分区和表级别汇总的过程)可以追溯到很久以前。这并不涉及任何11g的功能,至少目前还没有,以后也许会。(我仍然希望最好将10g放到贴子的题目中,就象我在Adaptive Thresholds的贴子中那样)

2)分区对象有两类不同的统计信息–描述整个表或分区的,以及其下子结构的全局统计信息和描述了单独分区和子分区的分区统计信息。此外,Oracle有能力通过聚合分区统计信息生成聚合的全局统计信息。但如在上一个贴子的演示所示,聚合的统计信息中,有部分的信息是可靠的,但也有部分是不可靠的,特别是唯一值的数量(NDV)。

3)因此,当你查看优化器的统计信息时,查看诸如GLOBAL_STATS、HIGH_VALUE、LOW_VALUE和NUM_DISTINCT列是最基本的。否则你可能想当然地认为你的统计信息比实际情况要好。我怀疑这正在我所管理的数据库上发生。说实话,除了那些可能读到本文的专家,你们中有多少人可以看一眼NUM_ROWS和LAST_ANALYZED列,就保证统计信息是OK的?在这方面,你们要小心一些,而格雷的贴子可以帮助你检查

4)在前两篇贴子中一个更令人困惑的方面,是其展示了两个完全不同的收集表上统计信息的策略。在第一篇贴子中涵盖了10.2中Oracle建议的默认行为,即将全局和分区统计数据收集到分区级别。第二篇贴子子中展示了完全不同的策略,即在表和分区级别上不收集统计信息,而是收集子分区的统计信息,并向上聚合到更高的级别,以便试图减少统计信息收集的活动性。而在下一下贴子中,我们会展示故意使用错误的方式混合使用这两种策略,带来了怎样的麻烦。

5)如果我每次都展示所有的示例,这将会是一个很长的贴子,所以,如果你想查看表定义等诸如此类的内容,请参考之前的贴子。毕竟这是一个系列贴;-) 最后,我也许会整理这些并将其发表在一个脚本中,并展示各种示例。如果会这样做的话,也将是在Hotsos研讨会(译者注:每年召开一届的,有关安全科学的研讨会)之后。所以,不要期望太高。

希望这个小小的总结,可以帮助我们聚焦于我们工作中所面对的特定问题,以及我们正在寻找的一些选择(是的,伙计们,我读过其它的贴子,并且也知道这些选择,但是,我想用我的方法来研究,所以,给我点儿时间 😉)。也许会需要用两个贴子的篇幅。

我们的系统发生了什么问题?要知道,我们传统上只在子分区级别上收集,让Oracle来聚合它们,来生成表和分区的统计信息(例如第二篇贴子中展示的)。老实说,这不是我此前看到的,专门用于整个系统的一种策略。我足够幸运,至少发现了一些收集分区统计信息的方法。首先,我会向你展示和模拟在新的一天,我们开始加载数据时会发生什么。,我们会先添加一个新的子分区(因为我们有一个子分区模板,所以,子分区也会创建),然后我会插入一些行到不同的,新的子分区中。(注:实际上,取决于表,我们会使用插入和分区交换两种方法,但我稍后使用的是分区交换)


SQL> ALTER TABLE TEST_TAB1
  2  ADD  PARTITION P_20100208 VALUES LESS THAN (20100209);

Table altered.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'GROT', 1000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'GROT', 30000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'GROT', 2000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'GROT', 10000, 'Z');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'GROT', 2400, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'HALO', 500, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'HALO', 700, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'JUNE', 1200, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'WINE', 400, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'WINE', 600, 'P');

1 row created.

SQL> COMMIT;

Commit complete.

在这个阶段,新的分区或子分区上是没有统计信息的。并且在这之前的统计信息是相同的。(现在,我避免列上的统计信息,以使事情更加简洁)

SQL> select  table_name, global_stats, last_analyzed, num_rows
  2  from dba_tables
  3  where table_name='TEST_TAB1'
  4  and owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1                      NO  23-FEB-2010 06:09:55         27

SQL>
SQL> select  table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_partitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131                     NO  23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100201                     NO  23-FEB-2010 06:09:55          8
TEST_TAB1                      P_20100202                     NO  23-FEB-2010 06:09:55          4
TEST_TAB1                      P_20100203                     NO  23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100204                     NO  23-FEB-2010 06:09:55          4
TEST_TAB1                      P_20100205                     NO  23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100206                     NO  23-FEB-2010 06:09:55          7
TEST_TAB1                      P_20100207                     NO  23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100208                     NO

9 rows selected.

SQL> select  table_name, subpartition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_subpartitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     SUBPARTITION_NAME              GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131_GROT                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100131_HALO                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100131_JUNE                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100131_OTHERS              YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100201_GROT                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100201_HALO                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100201_JUNE                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100201_OTHERS              YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100202_GROT                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100202_HALO                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100202_JUNE                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100202_OTHERS              YES 23-FEB-2010 06:09:55          0

<<截短输出>>

TEST_TAB1                      P_20100207_GROT                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100207_HALO                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100207_JUNE                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100207_OTHERS              YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100208_GROT                NO
TEST_TAB1                      P_20100208_HALO                NO
TEST_TAB1                      P_20100208_JUNE                NO
TEST_TAB1                      P_20100208_OTHERS              NO

36 rows selected.

现在,我将模拟当其中一个源数据加载完成,并且我们对加载数据的子分区上收集统计信息时所要发生的。你期望中的可能是哪一个?

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION', 
                                        partname => 'P_20100208_GROT');

PL/SQL procedure successfully completed.

SQL> 
SQL> select  table_name, global_stats, last_analyzed, num_rows
  2  from dba_tables
  3  where table_name='TEST_TAB1'
  4  and owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1                      NO

SQL> 
SQL> select  table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_partitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131                     NO  23-FEB-2010 06:11:39          0
TEST_TAB1                      P_20100201                     NO  23-FEB-2010 06:11:39          8
TEST_TAB1                      P_20100202                     NO  23-FEB-2010 06:11:39          4
TEST_TAB1                      P_20100203                     NO  23-FEB-2010 06:11:39          2
TEST_TAB1                      P_20100204                     NO  23-FEB-2010 06:11:39          4
TEST_TAB1                      P_20100205                     NO  23-FEB-2010 06:11:39          2
TEST_TAB1                      P_20100206                     NO  23-FEB-2010 06:11:39          7
TEST_TAB1                      P_20100207                     NO  23-FEB-2010 06:11:39          0
TEST_TAB1                      P_20100208                     NO

9 rows selected.

哇噢,在表上聚合全局统计信息时发生了什么?这看起来像是从来没有收集统计信息。嗯,为何没有在新的分区上聚合统计信息?我可是刚刚在其中一个子分区上收集了统计信息。问题是Oracle只会在所有可以聚合的单元有统计信息时才会聚合统计信息。这里的问题正是如此,此时,P_20100208_GROT是P_20100208分区中唯一具有统计信息的,而其它的子分区还没有被收集。

SQL> select  table_name, subpartition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_subpartitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     SUBPARTITION_NAME              GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131_GROT                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100131_HALO                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100131_JUNE                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100131_OTHERS              YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100201_GROT                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100201_HALO                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100201_JUNE                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100201_OTHERS              YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100202_GROT                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100202_HALO                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100202_JUNE                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100202_OTHERS              YES 23-FEB-2010 06:09:55          0

<<output snipped>>

TEST_TAB1                      P_20100207_GROT                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100207_HALO                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100207_JUNE                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100207_OTHERS              YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100208_GROT                YES 23-FEB-2010 06:11:39          5
TEST_TAB1                      P_20100208_HALO                NO
TEST_TAB1                      P_20100208_JUNE                NO
TEST_TAB1                      P_20100208_OTHERS              NO

36 rows selected.

当然,这一切都完全符合设计,Oracle一直强调人们应该使用默认的“自动”粒度进行聚集,因此任何这样做的人都是自找麻烦,但现实是人们正在尝试这样做。看看Peter Scott的评论,你会发现以前有人遇到过这个问题!

一旦其他数据加载完成了,并且我们收集了其余子分区的统计信息后,再让我们看看统计数据。。。。

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION', 
                                        partname => 'P_20100208_JUNE');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION', 
                                        partname => 'P_20100208_HALO');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION', 
                                        partname => 'P_20100208_OTHERS');

PL/SQL procedure successfully completed.

SQL>
SQL> select  table_name, global_stats, last_analyzed, num_rows
  2  from dba_tables
  3  where table_name='TEST_TAB1'
  4  and owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1                      NO  23-FEB-2010 06:31:24         37

SQL>
SQL> select  table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_partitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131                     NO  23-FEB-2010 06:31:24          0
TEST_TAB1                      P_20100201                     NO  23-FEB-2010 06:31:24          8
TEST_TAB1                      P_20100202                     NO  23-FEB-2010 06:31:24          4
TEST_TAB1                      P_20100203                     NO  23-FEB-2010 06:31:24          2
TEST_TAB1                      P_20100204                     NO  23-FEB-2010 06:31:24          4
TEST_TAB1                      P_20100205                     NO  23-FEB-2010 06:31:24          2
TEST_TAB1                      P_20100206                     NO  23-FEB-2010 06:31:24          7
TEST_TAB1                      P_20100207                     NO  23-FEB-2010 06:31:24          0
TEST_TAB1                      P_20100208                     NO  23-FEB-2010 06:31:24         10

9 rows selected.

看起来好多了。因此,如果你打算使用这种方法(我希望本系列的贴子可以帮助你确定这是有问题的),你应该只在所有子分区产生了数据的情况下收集子分区的统计数据,并同时收集它们。事实上,在这种情况下,为什么不在分区级别进行收集,并在分区上获得适当的全局统计信息呢?

相信我,还有更多可怕的事情将要发生。。。(点此浏览《Oracle分区表上的统计信息(四)》)

原文链接:http://oracledoug.com/serendipity/index.php?/archives/1565-Statistics-on-Partitioned-Tables-Part-3.html

原文内容:
As soon as I’d committed my last post, I knew it wasn’t what I’d hoped for and said as much to a couple of people before they’d read it. I knew it would probably just add to any confusion people already had about this subject (something I’m particularly keen to avoid) but I am awash with examples at the moment and trying to pick out the right points to illustrate, in the right order, to the right depth. This is probably more of a White Paper subject, in retrospect, but I’ll press on anyway.

To summarise where we are so far, though, and highlight a couple of key points …

  1. As I said in the first post, all of the examples to date are on Oracle 10.2.0.4 but I think you would see similar behaviour on earlier releases that I don’t have to hand right now. The default parameters would be different, but the aggregation process goes back a long way. i.e. This is not about any 11g features, at least not yet. That will come later. (I still wish I’d put 10g in the post titles, though, like I did with the Adaptive Thresholds posts.)

  2. Partitioned objects have two different kinds of stats - Global Statistics describe a Table or Partition as a whole, including all of it’s child structures and Partition Statistics describe individual partitions and subpartitions. In addition, Oracle has the capability to take Partition stats and aggregate them up to generate Aggregated Global Statistics. Hopefully the last post illustrated that some elements of aggregated stats seem reliable but some not, particularly Number of Distinct Values (NDV).

  3. So when looking at optimiser stats, it’s essential that you look at columns like GLOBAL_STATS and at the HIGH_VALUE, LOW_VALUE and NUM_DISTINCT columns or you might kid yourself into thinking that your stats are better than they really are. I suspect that’s what had happened at my current site. Be honest with yourself. Excluding those experts who might read this, how many of you have taken a quick glance at NUM_ROWS and LAST_ANALYZED columns to reassure yourself your stats are ok? You need to be careful with this stuff and Greg’s post can help you check.

  4. One of the more confusing aspects of the first two posts is that they showed completely different strategies to collecting stats on our tables. The first post covered the Oracle-recommended 10.2 default behaviour of gathering GLOBAL AND PARTITION stats down to the Partition level. The second post showed a completely different strategy we use on many tables of gathering no stats at all at the Table and Partition level, but gathering Subpartition statistics and having Oracle aggregate them up to the higher levels in an attempt to reduce stats gathering activity. That was deliberate, as in the next post I’m going to show you how these two strategies combined in the wrong way can cause trouble.

  5. If I posted all of the examples each time, it would become a pretty long post, so I’m going to ask you to refer back to earlier posts if you want to check table definitions and the like. This is a series after all 😉 At the end, I might try to tidy everything up and post it all in one script, showing the various examples. Should that ever happen, it will be after the Hotsos Symposium. i.e. Don’t hold your breath.

Hopefully that little summary will help us move on to the specific problem that we faced at work and some of the options we’re looking at (because, yes folks, I have read other posts and do know some of the options but I’m trying to work my way through them here. Give it time 😉). It might take two posts though.

What went wrong on our current system? Remember that we have traditionally gathered purely at the SUBPARTITION level and allowed Oracle to aggregate those to generate the TABLE and PARTITION stats (i.e. the approach shown in post 2). I’ll be honest that it wasn’t a strategy I’d seen used almost exclusively across a system before. I’d been lucky enough to find some way of gathering PARTITION stats at the very least. First I’ll show you a simulation of what would happen when we start loading data for a new day. First we add a new subpartition (and because we have a subpartition template, the subpartitions are created too) then I’ll insert some rows into the different new subpartitions. (Note that we actually use both inserts and partition exchange, depending on the table, but I’ll deal with partition exchange later.)


SQL> ALTER TABLE TEST_TAB1
  2  ADD  PARTITION P_20100208 VALUES LESS THAN (20100209);

Table altered.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'GROT', 1000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'GROT', 30000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'GROT', 2000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'GROT', 10000, 'Z');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'GROT', 2400, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'HALO', 500, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'HALO', 700, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'JUNE', 1200, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'WINE', 400, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100208, 'WINE', 600, 'P');

1 row created.

SQL> COMMIT;

Commit complete.

At this stage there are no stats on the new partition or subpartitions and all of the previous stats look the same. (I’ll make things more succinct by avoiding the column stats for now.)

SQL> select  table_name, global_stats, last_analyzed, num_rows
  2  from dba_tables
  3  where table_name='TEST_TAB1'
  4  and owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1                      NO  23-FEB-2010 06:09:55         27

SQL>
SQL> select  table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_partitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131                     NO  23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100201                     NO  23-FEB-2010 06:09:55          8
TEST_TAB1                      P_20100202                     NO  23-FEB-2010 06:09:55          4
TEST_TAB1                      P_20100203                     NO  23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100204                     NO  23-FEB-2010 06:09:55          4
TEST_TAB1                      P_20100205                     NO  23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100206                     NO  23-FEB-2010 06:09:55          7
TEST_TAB1                      P_20100207                     NO  23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100208                     NO

9 rows selected.

SQL> select  table_name, subpartition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_subpartitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     SUBPARTITION_NAME              GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131_GROT                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100131_HALO                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100131_JUNE                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100131_OTHERS              YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100201_GROT                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100201_HALO                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100201_JUNE                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100201_OTHERS              YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100202_GROT                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100202_HALO                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100202_JUNE                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100202_OTHERS              YES 23-FEB-2010 06:09:55          0

<<output snipped>>

TEST_TAB1                      P_20100207_GROT                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100207_HALO                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100207_JUNE                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100207_OTHERS              YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100208_GROT                NO
TEST_TAB1                      P_20100208_HALO                NO
TEST_TAB1                      P_20100208_JUNE                NO
TEST_TAB1                      P_20100208_OTHERS              NO

36 rows selected.

Which is probably what you expected. Now I’m going to simulate what would happen when one of the source data feeds complete and we’ll gather stats on that subpartition now that the data is loaded.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION', 
                                        partname => 'P_20100208_GROT');

PL/SQL procedure successfully completed.

SQL> 
SQL> select  table_name, global_stats, last_analyzed, num_rows
  2  from dba_tables
  3  where table_name='TEST_TAB1'
  4  and owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1                      NO

SQL> 
SQL> select  table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_partitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131                     NO  23-FEB-2010 06:11:39          0
TEST_TAB1                      P_20100201                     NO  23-FEB-2010 06:11:39          8
TEST_TAB1                      P_20100202                     NO  23-FEB-2010 06:11:39          4
TEST_TAB1                      P_20100203                     NO  23-FEB-2010 06:11:39          2
TEST_TAB1                      P_20100204                     NO  23-FEB-2010 06:11:39          4
TEST_TAB1                      P_20100205                     NO  23-FEB-2010 06:11:39          2
TEST_TAB1                      P_20100206                     NO  23-FEB-2010 06:11:39          7
TEST_TAB1                      P_20100207                     NO  23-FEB-2010 06:11:39          0
TEST_TAB1                      P_20100208                     NO

9 rows selected.

Woah! What happened to our Aggregated Global Stats on the TABLE? It looks like it’s never had statistics at all! Oh, and why are there no Aggregated Stats on the new partition either, given that I just gathered stats for one of it’s subpartitions? Well the problem is that Oracle will only aggregate statistics when all of the components have stats that can be aggregated. The problem here is that, at this stage, P_20100208_GROT is the only subpartition of P_20100208 that has stats. The others haven’t been gathered yet.

SQL> select  table_name, subpartition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_subpartitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     SUBPARTITION_NAME              GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131_GROT                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100131_HALO                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100131_JUNE                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100131_OTHERS              YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100201_GROT                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100201_HALO                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100201_JUNE                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100201_OTHERS              YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100202_GROT                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100202_HALO                YES 23-FEB-2010 06:09:55          2
TEST_TAB1                      P_20100202_JUNE                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100202_OTHERS              YES 23-FEB-2010 06:09:55          0

<<output snipped>>

TEST_TAB1                      P_20100207_GROT                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100207_HALO                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100207_JUNE                YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100207_OTHERS              YES 23-FEB-2010 06:09:55          0
TEST_TAB1                      P_20100208_GROT                YES 23-FEB-2010 06:11:39          5
TEST_TAB1                      P_20100208_HALO                NO
TEST_TAB1                      P_20100208_JUNE                NO
TEST_TAB1                      P_20100208_OTHERS              NO

36 rows selected.

Of course, this is all behaving exactly as designed, Oracle keep emphasising that people should gather using the default granularity of ‘AUTO’ and so anyone who does this is asking for trouble, but the reality is that people are trying this. Look no further than Peter Scott’s comment to see that someone else has come across this before now!

Let’s look at the stats once the other data feeds complete and we gather the rest of the subpartition stats …

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION', 
                                        partname => 'P_20100208_JUNE');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION', 
                                        partname => 'P_20100208_HALO');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION', 
                                        partname => 'P_20100208_OTHERS');

PL/SQL procedure successfully completed.

SQL>
SQL> select  table_name, global_stats, last_analyzed, num_rows
  2  from dba_tables
  3  where table_name='TEST_TAB1'
  4  and owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1                      NO  23-FEB-2010 06:31:24         37

SQL>
SQL> select  table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from dba_tab_partitions
  3  where table_name='TEST_TAB1'
  4  and table_owner='TESTUSER'
  5  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131                     NO  23-FEB-2010 06:31:24          0
TEST_TAB1                      P_20100201                     NO  23-FEB-2010 06:31:24          8
TEST_TAB1                      P_20100202                     NO  23-FEB-2010 06:31:24          4
TEST_TAB1                      P_20100203                     NO  23-FEB-2010 06:31:24          2
TEST_TAB1                      P_20100204                     NO  23-FEB-2010 06:31:24          4
TEST_TAB1                      P_20100205                     NO  23-FEB-2010 06:31:24          2
TEST_TAB1                      P_20100206                     NO  23-FEB-2010 06:31:24          7
TEST_TAB1                      P_20100207                     NO  23-FEB-2010 06:31:24          0
TEST_TAB1                      P_20100208                     NO  23-FEB-2010 06:31:24         10

9 rows selected.

That looks much better. So, if you are going to use this approach (and I hope this series of blogs helps you decide it’s questionable) you should only gather stats on the subpartitions when you have all of the subpartitions populated and gather them all at the same time. In fact, in that case, why not just gather at the PARTITION level and get proper Global Statistics on your partitions?

Believe me, there are more horrors to come …

最后修改时间:2021-12-18 15:56:32
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论