暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

oracle直方图——FREQUENCY&HEIGHT BALANCED

原创 Leo 2023-07-16
353

文档课题:oracle直方图——FREQUENCY&HEIGHT BALANCED.

1、理论知识

当执行SQL语句时CBO如何选择执行计划?此时就不得不提及数据库统计信息,其帮助CBO做出正确判断,而统计信息包括多方面内容,接下来重点讨论直方图,因为通过直方图CBO可知一个列中不同值所占总数的比例.

Oracle的直方图有两种:

频率直方图(FREQUENCY HISTOGRAM):当列中Distinct_keys较少(小于254),若不手工指定直方图桶数(BUCKET),Oracle会自动创建频率直方图,并且桶数(BUCKET)等于Distinct_Keys.

高度平衡直方图(HEIGHT BALANCED):当列中Distinct_keys大于254,若不手工指定直方图桶数(BUCKET),Oracle就会自动创建高度平衡直方图.

2、实验测试

测试数据库:oracle 11.2.0.4

2.1、建测试表及收集统计信息

--建测试表并收集所有列直方图的统计信息.

LEO@orcl> create table test as select * from dba_objects;

 

Table created.

 

SYS@orcl> begin

  dbms_stats.gather_table_stats('LEO',

                                'TEST',

                                degree           => 1,

                                estimate_percent => 100,

                                method_opt       => 'for all columns size skewonly',

                                no_invalidate    => false,

                                cascade          => true);

  9  end;

 10  /

 

PL/SQL procedure successfully completed.

 

--查统计信息搜集情况

select a.column_name,

       b.num_rows,

       a.num_distinct Cardinality,

       round(a.num_distinct / b.num_rows * 100, 2) selectivity,

       a.histogram,

       a.num_buckets

  from dba_tab_col_statistics a, dba_tables b

 where a.owner = b.owner

   and a.table_name = b.table_name

   and a.owner = 'LEO'

   and a.table_name = 'TEST';

 

COLUMN_NAME       NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM       NUM_BUCKETS

--------------- ---------- ----------- ----------- --------------- -----------

OWNER                91175          24         .03 FREQUENCY                24

OBJECT_NAME          91175       54521        59.8 HEIGHT BALANCED         254

SUBOBJECT_NAME       91175         105         .12 FREQUENCY               105

OBJECT_ID            91175       91175         100 NONE                      1

DATA_OBJECT_ID       91175        8622        9.46 HEIGHT BALANCED         254

OBJECT_TYPE          91175          44         .05 FREQUENCY                44

CREATED              91175        1061        1.16 HEIGHT BALANCED         254

LAST_DDL_TIME        91175        1251        1.37 HEIGHT BALANCED         254

TIMESTAMP            91175        1224        1.34 HEIGHT BALANCED         254

STATUS               91175           1           0 FREQUENCY                 1

TEMPORARY            91175           2           0 FREQUENCY                 2

 

COLUMN_NAME       NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM       NUM_BUCKETS

--------------- ---------- ----------- ----------- --------------- -----------

GENERATED            91175           2           0 FREQUENCY                 2

SECONDARY            91175           2           0 FREQUENCY                 2

NAMESPACE            91175          20         .02 FREQUENCY                20

EDITION_NAME         91175           0           0 NONE                      0

 

15 rows selected.

 

说明:如上查询可知owner行的distinct_keys为24,,小于254则默认的桶数为24,属于频率直方图.而object_name行的distinct_keys为54521,大于254则默认的桶数为254,属于高度平衡直方图.

 

2.2、相关测试

2.2.1、有直方图的情况

2.2.1.1、列中重复值高的情况

--测试直方图作用,先在owner列创建索引

LEO@orcl> create index idx_test_owner on test(owner);

 

Index created.

 

--查看SQL语句执行计划

LEO@orcl> set autotrace traceonly

LEO@orcl> select * from test where owner='SYS';

 

40208 rows selected.

 

 

Execution Plan

----------------------------------------------------------

Plan hash value: 1357081020

 

--------------------------------------------------------------------------

| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |      | 40208 |  3848K|   364   (1)| 00:00:05 |

|*  1 |  TABLE ACCESS FULL| TEST | 40208 |  3848K|   364   (1)| 00:00:05 |

--------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   1 - filter("OWNER"='SYS')

 

 

Statistics

----------------------------------------------------------

          1  recursive calls

          0  db block gets

       3956  consistent gets

          0  physical reads

          0  redo size

    2044704  bytes sent via SQL*Net to client

      29999  bytes received via SQL*Net from client

       2682  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

      40208  rows processed

 

注意:owner列虽然存在索引,但此处为什么不走索引呢?因为表test共有91175条数据,而owner='SYS'有40208条数据,若此时走索引首先要在索引上找到对应的rowid,然后再通过rowid回表,因此消耗的资源反而更大.此处全表扫描的consistent gets为3956,再看执行计划rows列id=1 CBO预估返回40208行,而最终结果确实也是返回40208行.

 

2.2.1.2、列中重复值低的情况

--以下测试列中重复值低时存在索引的情况

LEO@orcl> select * from test where owner='SCOTT';

 

8 rows selected.

 

 

Execution Plan

----------------------------------------------------------

Plan hash value: 510444911

 

----------------------------------------------------------------------------------------------

| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |

----------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |                |     8 |   784 |     2   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| TEST           |     8 |   784 |     2   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN          | IDX_TEST_OWNER |     8 |       |     1   (0)| 00:00:01 |

----------------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   2 - access("OWNER"='SCOTT')

 

 

Statistics

----------------------------------------------------------

          1  recursive calls

          0  db block gets

          6  consistent gets

          1  physical reads

          0  redo size

       2273  bytes sent via SQL*Net to client

        519  bytes received via SQL*Net from client

          2  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

          8  rows processed

 

注意:执行select * from test where owner= 'SCOTT'语句时CBO选择走索引,因为owner='SCOTT'只有8行,此时走索引只需拿到8行rowid然后回表,比起全表扫描大大节省资源,提高查询速度.

 

2.2.2、无直方图的情况

2.2.2.1、删除直方图

--为进一步说明直方图作用,现将其删除.

begin

  dbms_stats.gather_table_stats('LEO',

                                'TEST',

                                estimate_percent => 100,

                                method_opt       => 'for all columns size 1',

                                no_invalidate    => false,

                                degree           => 1,

                                cascade          => true);

end;

/

--查看直方图信息.

select a.column_name,

       b.num_rows,

       a.num_distinct Cardinality,

       round(a.num_distinct / b.num_rows * 100, 2) selectivity,

       a.histogram,

       a.num_buckets

  from dba_tab_col_statistics a, dba_tables b

 where a.owner = b.owner

   and a.table_name = b.table_name

   and a.owner = 'LEO'

   and a.table_name = 'TEST';

 

COLUMN_NAME                      NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM       NUM_BUCKETS

------------------------------ ---------- ----------- ----------- --------------- -----------

OWNER                               91175          24         .03 NONE                      1

OBJECT_NAME                         91175       54521        59.8 NONE                      1

SUBOBJECT_NAME                      91175         105         .12 NONE                      1

OBJECT_ID                           91175       91175         100 NONE                      1

DATA_OBJECT_ID                      91175        8622        9.46 NONE                      1

OBJECT_TYPE                         91175          44         .05 NONE                      1

CREATED                             91175        1061        1.16 NONE                      1

LAST_DDL_TIME                       91175        1251        1.37 NONE                      1

TIMESTAMP                           91175        1224        1.34 NONE                      1

STATUS                              91175           1           0 NONE                      1

TEMPORARY                           91175           2           0 NONE                      1

 

COLUMN_NAME                      NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM       NUM_BUCKETS

------------------------------ ---------- ----------- ----------- --------------- -----------

GENERATED                           91175           2           0 NONE                      1

SECONDARY                           91175           2           0 NONE                      1

NAMESPACE                           91175          20         .02 NONE                      1

EDITION_NAME                        91175           0           0 NONE                      0

 

15 rows selected.

注意:HISTOGRAM为'NONE',直方图均被删除.

2.2.2.2、列中重复值高的情况

LEO@orcl> select * from test where owner='SYS';

 

40208 rows selected.

 

 

Execution Plan

----------------------------------------------------------

Plan hash value: 510444911

 

----------------------------------------------------------------------------------------------

| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |

----------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |                |  3799 |   363K|   111   (0)| 00:00:02 |

|   1 |  TABLE ACCESS BY INDEX ROWID| TEST           |  3799 |   363K|   111   (0)| 00:00:02 |

|*  2 |   INDEX RANGE SCAN          | IDX_TEST_OWNER |  3799 |       |     9   (0)| 00:00:01 |

----------------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   2 - access("OWNER"='SYS')

 

 

Statistics

----------------------------------------------------------

          1  recursive calls

          0  db block gets

       6436  consistent gets

          0  physical reads

          0  redo size

    4606034  bytes sent via SQL*Net to client

      29999  bytes received via SQL*Net from client

       2682  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

      40208  rows processed

 

注意:删除直方图后再来执行select * from test where owner='SYS',发现此时CBO选择走索引,原本走全表扫描更优.为什么此时选择走索引呢?执行计划显示rows列是3799,CBO认为只需返回3799行数据,从91175中选择3799行数据,CBO必然认为走索引优于走全表扫描.可以看到现在的consistent gets为6436,大于之前走全表扫描的3956.

 

2.2.2.3、列中重复值低的情况

LEO@orcl> select * from test where owner='SCOTT';

 

8 rows selected.

 

 

Execution Plan

----------------------------------------------------------

Plan hash value: 510444911

 

----------------------------------------------------------------------------------------------

| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |

----------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |                |  3799 |   363K|   111   (0)| 00:00:02 |

|   1 |  TABLE ACCESS BY INDEX ROWID| TEST           |  3799 |   363K|   111   (0)| 00:00:02 |

|*  2 |   INDEX RANGE SCAN          | IDX_TEST_OWNER |  3799 |       |     9   (0)| 00:00:01 |

----------------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   2 - access("OWNER"='SCOTT')

 

 

Statistics

----------------------------------------------------------

          1  recursive calls

          0  db block gets

          6  consistent gets

          0  physical reads

          0  redo size

       2273  bytes sent via SQL*Net to client

        519  bytes received via SQL*Net from client

          2  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

          8  rows processed

 

说明:删除直方图后select * from test where owner='SCOTT'依然走索引,rows列依然是3799行,最终真正返回的是8行,原来在oracle数据库中,CBO会默认目标列的数据在最小值LOW_VALUE和最大值HIGH_VALUE之间是均匀分布的,并且会按照均匀分布原则来计算对目标列施加查询条件后的可选择率以及结果集的cardinality,进而来计算成本值并选择执行计划.

 

3、绑定变量窥探

收集直方图可能会导致绑定变量窥探(该问题在9i、10g几乎没什么好办法解决,从11g引进自适应游标,解决绑定变量窥探)

说明:如下测试数据库版本为ORACLE 10.2.0.1.0,主要进行绑定变量的相关测试.

3.1、建测试表及搜集统计信息

LEO@orcl> create table test as select * from dba_objects;

Table created.

 

SYS@orcl> exec dbms_stats.gather_table_stats(ownname => 'LEO',tabname => 'TEST',method_opt => 'for all columns size skewonly');

 

PL/SQL procedure successfully completed.

 

--查询统计信息及直方图

select a.column_name,

       b.num_rows,

       a.num_distinct Cardinality,

       round(a.num_distinct / b.num_rows * 100, 2) selectivity,

       a.histogram,

       a.num_buckets

  from dba_tab_col_statistics a, dba_tables b

 where a.owner = b.owner

   and a.table_name = b.table_name

   and a.owner = 'LEO'

   and a.table_name = 'TEST';

 

COLUMN_NAME                      NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM       NUM_BUCKETS

------------------------------ ---------- ----------- ----------- --------------- -----------

OWNER                               49745          15         .03 FREQUENCY                15

OBJECT_NAME                         49745       29770       59.85 HEIGHT BALANCED         254

SUBOBJECT_NAME                      49745          46         .09 FREQUENCY                46

OBJECT_ID                           49745       49745         100 NONE                      1

DATA_OBJECT_ID                      49745        3620        7.28 HEIGHT BALANCED         254

OBJECT_TYPE                         49745          29         .06 FREQUENCY                29

CREATED                             49745        1022        2.05 HEIGHT BALANCED         254

LAST_DDL_TIME                       49745        1048        2.11 HEIGHT BALANCED         254

TIMESTAMP                           49745        1033        2.08 HEIGHT BALANCED         254

STATUS                              49745           1           0 FREQUENCY                 1

TEMPORARY                           49745           2           0 FREQUENCY                 2

GENERATED                           49745           2           0 FREQUENCY                 2

SECONDARY                           49745           1           0 FREQUENCY                 1

 

13 rows selected.

--建索引

LEO@orcl> create index idx_test_owner on test(owner);

 

Index created.

 

3.2、无绑定变量情况

LEO@orcl> set autotrace traceonly

LEO@orcl> select * from test where owner='SYS';

 

22902 rows selected.

 

 

Execution Plan

----------------------------------------------------------

Plan hash value: 1357081020

 

--------------------------------------------------------------------------

| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |      | 23093 |  2097K|   157   (2)| 00:00:02 |

|*  1 |  TABLE ACCESS FULL| TEST | 23093 |  2097K|   157   (2)| 00:00:02 |

--------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   1 - filter("OWNER"='SYS')

 

 

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

       2194  consistent gets

          0  physical reads

          0  redo size

    1148025  bytes sent via SQL*Net to client

      17255  bytes received via SQL*Net from client

       1528  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

      22902  rows processed

说明:此时执行计划为全表扫描.

3.3、有绑定变量情况

--如下测试使用绑定变量的场景

LEO@orcl> variable v_name varchar2(10);

LEO@orcl> exec :v_name :='SCOTT';

 

PL/SQL procedure successfully completed.

 

LEO@orcl> select * from test where owner=:v_name;

 

6 rows selected.

 

 

Execution Plan

----------------------------------------------------------

Plan hash value: 510444911

 

----------------------------------------------------------------------------------------------

| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |

----------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |                |  3316 |   301K|    95   (0)| 00:00:02 |

|   1 |  TABLE ACCESS BY INDEX ROWID| TEST           |  3316 |   301K|    95   (0)| 00:00:02 |

|*  2 |   INDEX RANGE SCAN          | IDX_TEST_OWNER |  3316 |       |     8   (0)| 00:00:01 |

----------------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   2 - access("OWNER"=:V_NAME)

 

 

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

          5  consistent gets

          0  physical reads

          0  redo size

       1870  bytes sent via SQL*Net to client

        469  bytes received via SQL*Net from client

          2  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

          6  rows processed

--现在将绑定变量改为SYS

LEO@orcl> exec :v_name:='SYS';

 

PL/SQL procedure successfully completed.

 

LEO@orcl> select * from test where owner=:v_name;

 

22902 rows selected.

 

 

Execution Plan

----------------------------------------------------------

Plan hash value: 510444911

 

----------------------------------------------------------------------------------------------

| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |

----------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |                |  3316 |   301K|    95   (0)| 00:00:02 |

|   1 |  TABLE ACCESS BY INDEX ROWID| TEST           |  3316 |   301K|    95   (0)| 00:00:02 |

|*  2 |   INDEX RANGE SCAN          | IDX_TEST_OWNER |  3316 |       |     8   (0)| 00:00:01 |

----------------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   2 - access("OWNER"=:V_NAME)

 

 

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

       3622  consistent gets

         45  physical reads

          0  redo size

    2525237  bytes sent via SQL*Net to client

      17255  bytes received via SQL*Net from client

       1528  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

      22902  rows processed

 

说明:如上所示使用绑定变量后当选择SYS用户时,即使有收集直方图,CBO也没有选择走全表扫描而是走索引,执行计划明显不合理.

4、直方图搜集准则

收集直方图并非一定会带来好处,注意以下两点:

A、如果目标列的数据是均匀分布的,比如主键列,有唯一索引的列,则不需要对此类列收集直方图的统计信息;

B、对于从来没有在where条件中出现的列,不管其数据分布是否均匀,都无须对其收集直方图统计信息.

 

参考文档:https://blog.csdn.net/cuiwangxie1183/article/details/100483109?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522168932555516800180625844%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fblog.%2522%257D&request_id=168932555516800180625844&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~rank_v31_ecpm-2-100483109-null-null.268^v1^koosearch&utm_term=%E7%9B%B4%E6%96%B9%E5%9B%BE&spm=1018.2226.3001.4450

最后修改时间:2023-07-16 15:40:29
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论