Oracle 混合直方图

askTom 2016-08-18

309

问题描述

团队，

正在从文档中阅读关于混合直方图的信息

http://docs.oracle.com/database/121/TGSQL/tgsql_histo.htm#TGSQL372

demo@ORA12C> create table t as select * from all_objects;

Table created.

demo@ORA12C> column owner format a20
demo@ORA12C> select owner,count(*)
  2  from t
  3  group by owner
  4  order by 2 desc;

OWNER                  COUNT(*)
-------------------- ----------
SYS                       37781
PUBLIC                    37076
ORDSYS                     3146
MDSYS                      1437
APEX_040200                1189
XDB                         378
CTXSYS                      275
WMSYS                       173
LBACSYS                     150
DVSYS                        96
GSMADMIN_INTERNAL            48
OE                           45
SYSTEM                       31
DEMO                         30
OLAPSYS                      20
DVF                          19
DBSNMP                       15
RAJESH                       15
ORDPLUGINS                   10
OJVMSYS                       8
OUTLN                         8
ORACLE_OCM                    6
ORDDATA                       5
HR                            4
SCOTT                         4
PM                            3
IX                            1
FLOWS_FILES                   1

28 rows selected.

demo@ORA12C> select histogram,sample_size
  2  from user_tab_col_statistics
  3  where table_name ='T'
  4  and column_name ='OWNER';

HISTOGRAM       SAMPLE_SIZE
--------------- -----------
NONE                  81974

1 row selected.

demo@ORA12C> begin
  2     dbms_stats.gather_table_stats(
  3             ownname=>user,
  4             tabname=>'T',
  5             method_opt=>'for all columns size auto for columns owner size 3');
  6  end;
  7  /

PL/SQL procedure successfully completed.

demo@ORA12C> select histogram,sample_size
  2  from user_tab_col_statistics
  3  where table_name ='T'
  4  and column_name ='OWNER';

HISTOGRAM       SAMPLE_SIZE
--------------- -----------
HYBRID                 5655

1 row selected.

demo@ORA12C> column endpoint_actual_value format a15
demo@ORA12C> column endpoint_number format 9999999999
demo@ORA12C> column endpoint_repeat_count format 9999999999
demo@ORA12C> select endpoint_actual_value,endpoint_number,endpoint_repeat_count
  2  from user_tab_histograms
  3  where table_name ='T'
  4  and column_name ='OWNER' ;

ENDPOINT_ACTUAL ENDPOINT_NUMBER ENDPOINT_REPEAT_COUNT
--------------- --------------- ---------------------
APEX_040200                  84                    84
PUBLIC                     2996                  2524
XDB                        5655                    29

3 rows selected.

demo@ORA12C> set autotrace on explain
demo@ORA12C> select count(*) from t where owner ='APEX_040200';

  COUNT(*)
----------
      1189

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     6 |   439  (13)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |  1681 | 10086 |   439  (13)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OWNER"='APEX_040200')

demo@ORA12C> select count(*) from t where owner ='PUBLIC';

  COUNT(*)
----------
     37076

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     6 |   439  (13)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T    | 36588 |   214K|   439  (13)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OWNER"='PUBLIC')

demo@ORA12C> select count(*) from t where owner ='XDB';

  COUNT(*)
----------
       378

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     6 |   439  (13)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |  1681 | 10086 |   439  (13)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OWNER"='XDB')

demo@ORA12C> set autotrace off
demo@ORA12C>
demo@ORA12C>
demo@ORA12C>
demo@ORA12C> @printtbl 'select * from user_tab_col_statistics where table_name ="T" and column_name ="OWNER"'
TABLE_NAME                    : "T"
COLUMN_NAME                   : "OWNER"
NUM_DISTINCT                  : "28"
LOW_VALUE                     : "415045585F303430323030"
HIGH_VALUE                    : "584442"
DENSITY                       : ".035714"
NUM_NULLS                     : "0"
NUM_BUCKETS                   : "3"
LAST_ANALYZED                 : "18-aug-2016 18:03:00"
SAMPLE_SIZE                   : "5655"
GLOBAL_STATS                  : "YES"
USER_STATS                    : "NO"
NOTES                         : ""
AVG_COL_LEN                   : "6"
HISTOGRAM                     : "HYBRID"
SCOPE                         : "SHARED"
-----------------

PL/SQL procedure successfully completed.

demo@ORA12C> select num_rows from user_tables where table_name ='T' ;

  NUM_ROWS
----------
     81974

1 row selected.

demo@ORA12C>
demo@ORA12C> set autotrace on explain
demo@ORA12C> select count(*) from t where owner ='SYS';

  COUNT(*)
----------
     37781

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     6 |   439  (13)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |  1681 | 10086 |   439  (13)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OWNER"='SYS')

demo@ORA12C> set autotrace off
demo@ORA12C>

问题:
1) 为什么优化器只选择这三个值 (APEX_040200，PUBLIC，XDB) 来创建直方图？这背后的逻辑是什么？(我当时很想像它的工作原理就像TOP-N频率直方图一样-仅考虑创建直方图的Top-N值-但事实并非如此)
2) 为什么创建直方图后sample_size减小了？
3) 文档有这个
http://docs.oracle.com/database/121/TGSQL/tgsql_histo.htm#TGSQL95039

....
The endpoint repeat count shows the number of times the highest value in the bucket is repeated. By using the endpoint number and repeat count for these values, the optimizer can estimate cardinality. For example, bucket 36 contains instances of values 2033, 2034, 2035, and 2036. The endpoint value 2036 has an endpoint repeat count of 4, so the optimizer knows that 4 instances of this value exist. For values such as 2033, which are not endpoints, the optimizer estimates cardinality using density.
....

3.a) 对于流行的值 (在本例中为APEX_040200，PUBLIC，XDB) -ENDPOINT_REPEAT_COUNT-与计划中的估计基数不匹配。
3.b) 对于非流行值 (在本例中为SYS) -也称为不是终点-为什么计划中的估计基数与密度 * Num_of_rows不匹配 (.035714*81974 = 2927.619436-但计划中的报告基数是1681的)

专家解答

1.混合直方图开始像高度平衡直方图。首先你对数据进行排序。然后你在可用的桶之间分割值。

与高度平衡直方图不同，它会移动值，这样它们就不会跨越多个桶。

这完全独立于前N个直方图!

2.除非您更改了某些内容，否则优化器将使用AUTO_SAMPLE_SIZE。所以它不是检查每一行。

不过，cta的在线统计信息收集会在行进入时对行进行计数。

3. a.记住统计数据是基于样本的。所以它不太可能被发现!但是它们很接近-肯定在同一数量级内。

b.user_tab_col_statistics中显示的密度只是1/NDV。当有直方图时，基数估计中不会使用这一点。这是基于内部算法。

Oracle asktom

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

Oracle 混合直方图

问题描述

专家解答

评论