暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

timestamp类型字段谓词越界估算

原创 冯刚 2020-09-02
1355

timestamp/date类型字段,谓词越界后,估算值有一定的容忍度

在目标列上没有直方图时,用目标列做范围查询的可选择率计算公式
1)VAL 大于 HIGH_VALUE,并且 VAL 小于 (2HIGH_VALUE-LOW_VALUE):
Selectivity = (1/NUM_DISTINCT)NULL_Adjust(1-((VAL–HIGH_VALUE)/(HIGH_VALUE–LOW_VALUE)))
2)VAL 小于 LOW_VALUE,并且 VAL 大于 (2
LOW_VALUE-HIGH_VALUE):
Selectivity = (1/NUM_DISTINCT)NULL_Adjust(1-((LOW_VALUE–VAL)/(HIGH_VALUE–LOW_VALUE)))

–查看表情况
SQL> set linesize 300 pagesize 1000

SQL> select count(*),count(distinct s.createdate) from BS_SYSTEMLOG s;

3.png

select count(*) from BS_SYSTEMLOG s where s.createdate is null;

4.png

SQL> COLUMN low_value FORMAT A19
SQL> COLUMN high_value FORMAT A19
SQL> alter session set NLS_DATE_FORMAT=‘YYYY-MM-DD HH24:MI:SS’;
SQL> select low_value,high_value,num_distinct,num_nulls,LAST_ANALYZED from dba_tab_col_statistics where table_name=‘BS_SYSTEMLOG’ and column_name=‘CREATEDATE’;

5.png

–查询日期最大/最小转换值

set serveroutput on
DECLARE
rv RAW(32) := ‘7878081E131F10’;
dt DATE := NULL;
BEGIN
dbms_stats.convert_raw_value(rv, dt);
dbms_output.put_line('HIGH_DATE: '||TO_CHAR(dt, ‘YYYY-MM-DD hh24:mi:ss’));
END;
/

HIGH_DATE: 2020-08-30 18:30:15

set serveroutput on
DECLARE
rv RAW(32) := ‘78740113161720’;
dt DATE := NULL;
BEGIN
dbms_stats.convert_raw_value(rv, dt);
dbms_output.put_line('HIGH_DATE: '||TO_CHAR(dt, ‘YYYY-MM-DD hh24:mi:ss’));
END;
/

HIGH_DATE: 2016-01-19 21:22:31

SQL> alter session set NLS_DATE_FORMAT=‘YYYY-MM-DD HH24:MI:SS’;
SQL> select t.TABLE_NAME,t.NUM_ROWS,t.BLOCKS,t.LAST_ANALYZED from user_tables t where table_name=‘BS_SYSTEMLOG’;

6.png

–综上所述
NUM_ROWS:229754 #以user_tables查到为准
NUM_NULLS:0
NUM_DISTINCT:71200
HIGH_VALUE:1598783415
LOW_VALUE:1453209751

HIGH_VALUE 和 LOW_VALUE 都是根据上面匿名块查到的日期转换成秒的值:
https://tool.lu/timestamp/

7340fb6b8159b5f2bfbce831fe4077ea.png

容忍度范围为:
(2HIGH_VALUE-LOW_VALUE)=(21598783415-1453209751)=1744357079=2025-04-11 15:37:59 2020-08-30 18:30:15<VAL<2025-04-11 15:37:59

(2LOW_VALUE-HIGH_VALUE)=(21453209751-1598783415)=1307636087=2011-06-10 00:14:47 2011-06-10 00:14:47<VAL<2016-01-19 21:22:31

SQL> set autotrace traceonly
SQL> select count(*) from BS_SYSTEMLOG where CREATEDATE>to_date(‘2020-10-01’,‘yyyy-mm-dd’);

1.png

Rows = NUM_ROWS * Selectivity
= NUM_ROWS * (1 / NUM_DISTINCT) * NULL_Adjust * (1 - ((VAL – HIGH_VALUE) / (HIGH_VALUE – LOW_VALUE)))
= NUM_ROWS * (1 / NUM_DISTINCT) * ((NUM_ROWS – NUM_NULLS) / NUM_ROWS) * (1 - ((VAL – HIGH_VALUE) / (HIGH_VALUE – LOW_VALUE)))
= round(229754*(1/71200)1(1-(1601481600-1598783415)/(1598783415-1453209751)))
= 3

SQL> select count(*) from BS_SYSTEMLOG where CREATEDATE<to_date(‘2013-04-12’,‘yyyy-mm-dd’);

2.png

Rows = NUM_ROWS * Selectivity
= NUM_ROWS * (1 / NUM_DISTINCT) * NULL_Adjust * (1 - ((VAL – HIGH_VALUE) / (HIGH_VALUE – LOW_VALUE)))
= NUM_ROWS * (1 / NUM_DISTINCT) * ((NUM_ROWS – NUM_NULLS) / NUM_ROWS) * (1 - ((LOW_VALUE – VAL) / (HIGH_VALUE – LOW_VALUE)))
= round(229754*(1/71200)1(1-(1453209751-1365696000)/(1598783415-1453209751)))
= 1

上面是把date/timestamp字段,统计信息里的最大值和最小值转换成秒进行的计算。如果有更便捷,或不当之处,请指正。

最后修改时间:2020-09-02 10:30:41
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论