暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

Oracle直方图COST探究,了解一下?

dba悠然 2024-03-31
57


1.前言部分

1.1首先对喜欢认可《Oracle 高性能自动化运维》小伙伴说声诚挚的抱歉,前几年忙于工作上事情,确实疏忽了关注邮箱信息。最近翻阅以前文档记录,准备将部分源码提供下载,同时作为补偿会分享一些可用于自动化运维/监控的脚本(某监控项目),希望没有影响别人的利益。

1.2作为后续,希望大家可以多交流,技术工作上的都可以。在目前复杂的去O、开源、信创国产化环境,抱团取暖才是正途。

1.3附录下载(第9章),需要的请加V(tompson666)。



2.说明内容

说实话,本人不怎么会运营公众号,至于为什么要启用,回答是受到朋友启发。

初衷是:公众号主要用于学习交流,说得比较直白的话就是,笔者当年也是从小白过来的,通过不断"白嫖"前辈同行经验升级入行,也希望能够被别人"白嫖",用马斯洛的模型来说,是精神上的满足。

3.关于DBA未来?

对于这个绕不开的话题,可能大家有时候都会迷茫。通过个人的经验就是:多元化技能储备,坦然面对。

4.正题部分

4.1Histograms and Selectivity(公式限制性使用)

Histogram选择率Selectivity计算

pop/nonpop值针对高度均衡直方图宽度均衡每个ndv都在单个buckets不跨界

1.Sel of popvalue

S(col = popvalue) = (popvalue buckets/total buckets)

2.Sel of nonpopvalue

--高度均衡

S(col = nonpopvalue) = density

Density =(1/# NP values)* (NP buckets#/Total buckets)

3.Sel of buckets with endpoints <= value

S(col <= value) = (buckets with endpoints <= value)/(total # buckets)

4.2基础数据准备

    create table t_histogram_height(id,id_skew )
    as
    with set1 as(
    select rownum as id ,mod(rownum,7as id_skew
    from dba_objects where rownum<=1000)
    select * from set1
    -------
    create table t_histogram_width(id,id_skew )
    as
    with set1 as(
    select rownum as id ,mod(rownum,7as id_skew
    from dba_objects where rownum<=1000)
    select * from set1




    --查询分布
    select id_skew as id_skew,
    count(*) as cardinality,
    sum(count(*)) over
    (order by id_skew
    range unbounded preceding) as sum_cardinality
    from t_histogram_width
    group by id_skew;
    SQL> 2 3 4 5 6 7
    ID_SKEW CARDINALITY SUM_CARDINALITY
    ---------- ----------- ---------------
    0 142 142
    1 143 285
    2 143 428
    3 143 571
    4 143 714
    5 143 857
    6 143 1000
    --7个ndv
    --创建宽度均衡衡直方图(buckets>=num_ndv):


    begin
    dbms_stats.gather_table_stats(
    ownname => 'CBO',
    tabname => 't_histogram_height',
    estimate_percent => 50,
    method_opt => 'FOR COLUMNS SIZE 7 id_skew', --在ID列上创建7个直方图,如果创建buckets大于ndv,也是创建宽度均衡直方图
    degree => 4,
    cascade => true);
    end;
    --查询直方图信息
    SQL> select * from dba_histograms h where h.TABLE_NAME=upper('t_histogram_width');
    OWNER TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE
    ---------- ------------------------------ -------------------- --------------- -------------- ------------------------------
    CBO T_HISTOGRAM_WIDTH ID 0 1
    CBO T_HISTOGRAM_WIDTH ID 1 143
    CBO T_HISTOGRAM_WIDTH ID 2 286
    CBO T_HISTOGRAM_WIDTH ID 3 429
    CBO T_HISTOGRAM_WIDTH ID 4 572
    CBO T_HISTOGRAM_WIDTH ID 5 715
    CBO T_HISTOGRAM_WIDTH ID 6 858
    CBO T_HISTOGRAM_WIDTH ID 7 1000


    SQL> select t.owner,t.table_name,t.column_name,t.num_distinct,t.num_nulls,t.density,t.HISTOGRAM from dba_tab_col_statistics t where t.table_name='T_HISTOGRAM_WIDTH';
    OWNER TABLE_NAME COLUMN_NAME NUM_DISTINCT NUM_NULLS DENSITY HISTOGRAM
    ---------- ------------------------------ -------------------- ------------ ---------- ---------- ---------------
    CBO T_HISTOGRAM_WIDTH ID_SKEW 7 0 .0005 FREQUENCY -->宽度/频度均衡,buckets=ndv(7)
    CBO T_HISTOGRAM_WIDTH ID 1000 0 .001 HEIGHT BALANCED -->高度均衡,bucket(7)<ndv(1000






    -------eg:高度均衡
    begin
    dbms_stats.gather_table_stats(
    ownname => 'CBO',
    tabname => 't_histogram_height',
    estimate_percent => 50,
    method_opt => 'FOR ALL COLUMNS SIZE 5', --在ID列上创建5个bucktes<ndv(7),高度均衡
    degree => 4,
    cascade => true);
    end;


    SQL> select * from dba_histograms h where h.TABLE_NAME=upper('t_histogram_height');
    OWNER TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE
    ---------- ------------------------------ -------------------- --------------- -------------- ------------------------------
    CBO T_HISTOGRAM_HEIGHT ID 0 1
    CBO T_HISTOGRAM_HEIGHT ID 1 200
    CBO T_HISTOGRAM_HEIGHT ID 2 400
    CBO T_HISTOGRAM_HEIGHT ID 3 600
    CBO T_HISTOGRAM_HEIGHT ID 4 800
    CBO T_HISTOGRAM_HEIGHT ID 5 1000
    CBO T_HISTOGRAM_HEIGHT ID_SKEW 0 0
    CBO T_HISTOGRAM_HEIGHT ID_SKEW 1 1
    CBO T_HISTOGRAM_HEIGHT ID_SKEW 2 2
    CBO T_HISTOGRAM_HEIGHT ID_SKEW 3 4
    CBO T_HISTOGRAM_HEIGHT ID_SKEW 4 5
    CBO T_HISTOGRAM_HEIGHT ID_SKEW 5 6 -->5buckets
    --
    SQL> select t.owner,t.table_name,t.column_name,t.num_distinct,t.num_nulls,t.density,t.HISTOGRAM from dba_tab_col_statistics t where t.table_name='T_HISTOGRAM_HEIGHT';


    OWNER TABLE_NAME COLUMN_NAME NUM_DISTINCT NUM_NULLS DENSITY HISTOGRAM
    ---------- ------------------------------ -------------------- ------------ ---------- ---------- ---------------
    CBO T_HISTOGRAM_HEIGHT ID_SKEW 7 0 .142858 HEIGHT BALANCED -->高度均衡
    CBO T_HISTOGRAM_HEIGHT ID 1000 0 .001 HEIGHT BALANCED






    4.3Histogram信息

      1)popvalue:值出现在多个跨界buckets bounds
      2)nonpopvalue:出现在单个buckets不跨界




      SQL> select t.owner,t.table_name,t.column_name,t.num_distinct,t.num_nulls,t.density,t.HISTOGRAM from dba_tab_col_statistics t where t.table_name='T_HISTOGRAM_WIDTH';
      OWNER TABLE_NAME COLUMN_NAME NUM_DISTINCT NUM_NULLS DENSITY HISTOGRAM
      ---------- ------------------------------ -------------------- ------------ ---------- ---------- ---------------
      CBO T_HISTOGRAM_WIDTH ID_SKEW 7 0 .0005 FREQUENCY -->宽度/频度均衡,buckets=ndv(7)
      CBO T_HISTOGRAM_WIDTH ID 1000 0 .001 HEIGHT BALANCED -->高度均衡,bucket(7)<ndv(1000




      --density密度:


      宽度/频度均衡Sel=1/(2*num_rows)


      ===========================================================
      SQL> select * from dba_histograms h where h.TABLE_NAME=upper('t_histogram_width');
      OWNER TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE
      ---------- ------------------------------ -------------------- --------------- -------------- ------------------------------
      CBO T_HISTOGRAM_WIDTH ID 0 1
      CBO T_HISTOGRAM_WIDTH ID 1 143
      CBO T_HISTOGRAM_WIDTH ID 2 286
      CBO T_HISTOGRAM_WIDTH ID 3 429
      CBO T_HISTOGRAM_WIDTH ID 4 572
      CBO T_HISTOGRAM_WIDTH ID 5 715
      CBO T_HISTOGRAM_WIDTH ID 6 858
      CBO T_HISTOGRAM_WIDTH ID 7 1000
      --7ndv,每个ndv重复出现143次=1000/7
      SQL> select t.owner,t.table_name,t.column_name,t.num_distinct,t.num_nulls,t.density,t.HISTOGRAM from dba_tab_col_statistics t where t.table_name='T_HISTOGRAM_WIDTH';
      OWNER TABLE_NAME COLUMN_NAME NUM_DISTINCT NUM_NULLS DENSITY HISTOGRAM
      ---------- ------------------------------ -------------------- ------------ ---------- ---------- ---------------
      CBO T_HISTOGRAM_WIDTH ID_SKEW 7 0 .0005 FREQUENCY -->宽度/频度均衡,buckets=ndv(7)
      CBO T_HISTOGRAM_WIDTH ID 1000 0 .001 HEIGHT BALANCED -->高度均衡,bucket(7)<ndv(1000



      4.4宽度均衡

        --宽度/频度均衡(buckets>=ndv)
        1)Sel of popvalue
        select * from t_histogram_width where id=0
        选择率Sel= (popvalue buckets/total buckets) =1/7
        Cards=1000*Sel=143


        SQL> SQL> 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
        Query Plan
        ----------------------------------------------------------------------------------------------------------------------------------------
        0 SELECT STATEMENT (Cards=142 IO_COST=3 Cost=3 MB=0 Elaps_time=1s) OPT=ALL_ROWS (PA=) (PO=)
        1 0 TABLE ACCESS BY INDEX ROWID T_HISTOGRAM_WIDTH (Cards=142 IO_COST=3 Cost=3 MB=0 Elaps_time=1s) OPT=ANALYZED (PA=) (PO=)
        2 1 INDEX RANGE SCAN IDX_HIST_WIDTH (Cards=142 IO_COST=1 Cost=1 MB= Elaps_time=1s) OPT=ANALYZED (PA=) (PO=)


        2)Sel of buckets with endpoints <= value
        select * from t_histogram_width where id<=3 -->0,1,2,3(4buckets)
        选择率Sel= (popvalue buckets/total buckets) =4/7
        Cards=1000*Sel=572
        Query Plan
        ----------------------------------------------------------------------------------------------------------------------------------------
        0 SELECT STATEMENT (Cards=572 IO_COST=3 Cost=3 MB=0 Elaps_time=1s) OPT=ALL_ROWS (PA=) (PO=)
        1 0 TABLE ACCESS FULL T_HISTOGRAM_WIDTH (Cards=572 IO_COST=3 Cost=3 MB=0 Elaps_time=1s) OPT=ANALYZED (PA=) (PO=)


        3)Sel of nonpopvalue=1/num_rows
        ID CARDINALITY SUM_CARDINALITY
        ---------- ----------- ---------------
        0 990 990
        1 10 1000
        2 1 1001
        3 1 1002
        4 1 1003
        5 1 1004
        6 1 1005
        7 1 1006
        8 1 1007
        9 1 1008
        select * from t_skew1 where id=9 -->9thbucket
        选择率Sel= 1/1008
        Cards=1008*Sel=1
        Query Plan
        ----------------------------------------------------------------------------------------------------------------------------------------
        0 SELECT STATEMENT (Cards=1 IO_COST=3 Cost=3 MB=0 Elaps_time=1s) OPT=ALL_ROWS (PA=) (PO=)
        1 0 TABLE ACCESS FULL T_SKEW1 (Cards=1 IO_COST=3 Cost=3 MB=0 Elaps_time=1s) OPT=ANALYZED (PA=) (PO=)





        4.5高度均衡

          --buckets<ndv


          OWNER TABLE_NAME COLUMN_NAME NUM_DISTINCT NUM_NULLS DENSITY HISTOGRAM
          ---------- ------------------------------ -------------------- ------------ ---------- ---------- ---------------
          CBO T_SKEW1 ID 10 0 .005952381 HEIGHT BALANCED


          ID CARDINALITY SUM_CARDINALITY
          ---------- ----------- ---------------
          0 990 990
          1 10 1000
          2 1 1001
          3 1 1002
          4 1 1003
          5 1 1004
          6 1 1005
          7 1 1006
          8 1 1007
          9 1 1008
          begin
          dbms_stats.gather_table_stats(
          ownname => 'CBO',
          tabname => 't_skew1',
          estimate_percent => 50,
          method_opt => 'FOR ALL COLUMNS SIZE 5', --创建5个buckets
          degree => 4,
          cascade => true);
          end;
          OWNER TABLE_NAME COLUMN_NAME NUM_DISTINCT NUM_BUCKETS DENSITY HISTOGRAM
          ---------- ------------------------------ -------------------- ------------ ----------- ---------- ---------------
          CBO T_SKEW1 ID 10 5 .005952381 HEIGHT BALANCED


          OWNER TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU
          ---------- ------------------------------ -------------------- --------------- -------------- --------------------
          CBO T_SKEW1 ID 5 9
          CBO T_SKEW1 ID 4 0 -->bucket1/2/3重复不被显示,保存值为0


          pop value通过ENDPOINT_NUMBER gap识别


          5个buckets,1008rows,每buckets201rows
          0-990rows
          1-10rows
          2~9:1rows


          数据分布情况:

            其中:
            0跨4个buckets
            其余值(1~9)分布在第5个buckets


            1)Sel of nonpupvalue
            id=0 Sel=Density
            Density =(1/# NP values)* (NP buckets#/Total buckets)
            =1/9*






            SQL> SQL> 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
            Query Plan
            ----------------------------------------------------------------------------------------------------------------------------------------
            0 SELECT STATEMENT (Cards=990 IO_COST=2 Cost=2 MB=0 Elaps_time=1s) OPT=ALL_ROWS (PA=) (PO=)
            1 0 INDEX FAST FULL SCAN ID_T_SKEW1 (Cards=990 IO_COST=2 Cost=2 MB=0 Elaps_time=1s) OPT=ANALYZED (PA=) (PO=)






            5.总结

            最近忙于一些事情,准备有些仓促,请见谅。后续会陆续分享一些未来及的发布在书籍中的内容,请保持关注,--可加VX互动—-。

            文章转载自dba悠然,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

            评论