点击上方“IT那活儿”公众号,关注后了解更多内容,不管IT什么活儿,干就完了!!!


本文主要讲述实现对指定的空行,按照前面非空或后面非空数据进行填充。原来这种实现数据填充的方法,主要是用LAST_VALUE+IGNORE NULLS实现,在11G中LAG分析函数也支持IGNORE NULLS,但是,在性能上,他们是有区别的。
本文讨论2点内容:
使用分析函数LAST_VALUE和11G LAG实现缺失数据填充
dingjun123@ORADB> SELECT * FROM t;
ID VAL CATE
---------- ---------- ----------
1 VAL1 CATE0
2 CATE0
3 CATE0
4 CATE0
5 CATE0
6 VAL6 CATE1
7 CATE1
8 CATE1
9 CATE1
9 rows selected.
在10g中有LAST_VALUE+IGNORE NULLS很好解决,如下:
dingjun123@ORADB> SELECT ID,
2 last_value(val IGNORE NULLS) over(ORDER BY ID) val,
3 cate
4 FROM t;
ID VAL CATE
---------- ---------- ----------
1 VAL1 CATE0
2 VAL1 CATE0
3 VAL1 CATE0
4 VAL1 CATE0
5 VAL1 CATE0
6 VAL6 CATE1
7 VAL6 CATE1
8 VAL6 CATE1
9 VAL6 CATE1
9 rows selected.
上面的SQL含义是ID排序直到当前行(默认是RANGE窗口),忽略VAL为空的值,因为是LAST_VALUE,所以找最近的不为空的VAL值来填充到当前行。在11G中,LAG分析函数也带IGNORE NULLS,所以也能实现上面的功能,因为LAG是找当前行前面1行的值,所以需要加个NVL,LAST_VALUE不需要,它是直接找到当前行,否则有值的可能为空,如下:
dingjun123@ORADB> SELECT ID,
2 nvl(val,lag(val IGNORE NULLS) over(ORDER BY ID)) val,
3 cate
4 FROM t;
ID VAL CATE
---------- ---------- ----------
1 VAL1 CATE0
2 VAL1 CATE0
3 VAL1 CATE0
4 VAL1 CATE0
5 VAL1 CATE0
6 VAL6 CATE1
7 VAL6 CATE1
8 VAL6 CATE1
9 VAL6 CATE1
9 rows selected.
dingjun123@ORADB> select id,val,cate from t;
ID VAL CATE
---------- ---------- ----------
1 CATE0
2 CATE0
3 VAL3 CATE0
4 CATE0
5 CATE0
6 VAL6 CATE1
7 CATE1
8 CATE1
9 CATE1
9 rows selected.
对于ID=1和ID=2的行,因为前面找不到VAL的值,所以用ID=3的来填充。很显然,这里需要用到2次LAST_VALUE分析函数,一次是正常用当前行前面的VAL来填充,如果填充不了,就用按ID倒叙排列的最近一行来填充。如下:
dingjun123@ORADB> SELECT ID,
2 nvl(last_value(val IGNORE NULLS) over(ORDER BY ID),
3 last_value(val IGNORE NULLS) over(ORDER BY ID DESC)) val,
4 cate
5 FROM t
6 ORDER BY ID;
ID VAL CATE
---------- ---------- ----------
1 VAL3 CATE0
2 VAL3 CATE0
3 VAL3 CATE0
4 VAL3 CATE0
5 VAL3 CATE0
6 VAL6 CATE1
7 VAL6 CATE1
8 VAL6 CATE1
9 VAL6 CATE1
9 rows selected.
黄色区域的数据还是按向上查找的填充方式,红色部分按照向下查找填充的方式。当然,也可以使用LAG或LEAD来实现。
如下:
dingjun123@ORADB> SELECT ID,
2 nvl(val,nvl(lag(val IGNORE NULLS) over(ORDER BY ID),lag(val IGNORE NULLS) over(ORDER BY ID DESC))) val,
3 cate
4 FROM t
5 ORDER BY ID;
ID VAL CATE
---------- ---------- ----------
1 VAL3 CATE0
2 VAL3 CATE0
3 VAL3 CATE0
4 VAL3 CATE0
5 VAL3 CATE0
6 VAL6 CATE1
7 VAL6 CATE1
8 VAL6 CATE1
9 VAL6 CATE1
9 rows selected.
dingjun123@ORADB> SELECT ID,
2 nvl(val,nvl(lag(val IGNORE NULLS) over(ORDER BY ID),lead(val IGNORE NULLS) over(ORDER BY ID))) val,
3 cate
4 FROM t
5 ORDER BY ID;
ID VAL CATE
---------- ---------- ----------
1 VAL3 CATE0
2 VAL3 CATE0
3 VAL3 CATE0
4 VAL3 CATE0
5 VAL3 CATE0
6 VAL6 CATE1
7 VAL6 CATE1
8 VAL6 CATE1
9 VAL6 CATE1
9 rows selected.
LAG/LEAD

LAST_VALUE/FIRST_VALUE

--不加WINDOW窗口,不正确dingjun123@ORADB> SELECT ID,
2 nvl(last_value(val IGNORE NULLS) over(ORDER BY ID),
3 first_value(val IGNORE NULLS) over(ORDER BY ID)) val,
4 cate
5 FROM t
6 ORDER BY ID;
ID VAL CATE
---------- ---------- ----------
1 CATE0
2 CATE0
3 VAL3 CATE0
4 VAL3 CATE0
5 VAL3 CATE0
6 VAL6 CATE1
7 VAL6 CATE1
8 VAL6 CATE1
9 VAL6 CATE1
9 rows selected.
--加WINDOW窗口的FIRST_VALUE,正确
dingjun123@ORADB> SELECT ID,
2 nvl(last_value(val IGNORE NULLS) over(ORDER BY ID),
3 first_value(val IGNORE NULLS) over(ORDER BY ID ROWS BETWEEN unbounded preceding AND unbounded following)) val,
4 cate
5 FROM t
6 ORDER BY ID;
ID VAL CATE
---------- ---------- ----------
1 VAL3 CATE0
2 VAL3 CATE0
3 VAL3 CATE0
4 VAL3 CATE0
5 VAL3 CATE0
6 VAL6 CATE1
7 VAL6 CATE1
8 VAL6 CATE1
9 VAL6 CATE1
9 rows selected.
dingjun123@ORADB> select id,val,cate from t;
ID VAL CATE
---------- ---------- ----------
1 CATE0
2 CATE0
3 VAL3 CATE0
4 CATE0
5 CATE0
6 CATE1
7 VAL7 CATE1
8 CATE1
9 CATE1
9 rows selected.
dingjun123@ORADB> SELECT ID,
2 nvl(last_value(val IGNORE NULLS) over(PARTITION BY cate ORDER BY ID),
3 last_value(val IGNORE NULLS) over( PARTITION BY cate ORDER BY ID DESC)) val,
4 cate
5 FROM t
6 ORDER BY ID;
ID VAL CATE
---------- ---------- ----------
1 VAL3 CATE0
2 VAL3 CATE0
3 VAL3 CATE0
4 VAL3 CATE0
5 VAL3 CATE0
6 VAL7 CATE1
7 VAL7 CATE1
8 VAL7 CATE1
9 VAL7 CATE1
9 rows selected.
SELECT ID,
nvl(val,nvl(lag(val IGNORE NULLS) over(PARTITION BY cate ORDER BY ID),
lead(val IGNORE NULLS) over(PARTITION BY cate ORDER BY ID))) val,
cate
FROM t
ORDER BY ID;
--结果一样,省略
LAST_VALUE和LAG在实现缺失数据填充上的区别
dingjun123@ORADB> SELECT ID,val,
2 nvl(val,last_value(val IGNORE NULLS) over(ORDER BY ID ROWS BETWEEN unbounded preceding AND 2 following)) new_val,
3 cate
4 FROM t;
ID VAL NEW_VAL CATE
---------- ---------- ------------------------------------------- ----------
1 VAL1 VAL1 CATE0
2 VAL1 CATE0
3 VAL1 CATE0
4 VAL6 CATE0
5 VAL6 CATE0
6 VAL6 VAL6 CATE1
7 VAL6 CATE1
8 VAL6 CATE1
9 VAL6 CATE1
9 rows selected.
如果上面的需求使用LAG分析函数来实现,那就比较复杂了。
dingjun123@ORADB> DROP TABLE t;
Table dropped.
dingjun123@ORADB> CREATE TABLE t AS SELECT LEVEL ID,decode(MOD(LEVEL,5),1,'VAL'||LEVEL) val,
2 'CATE'||(trunc((LEVEL-1)/5)) cate FROM dual CONNECT BY LEVEL<10000;
Table created.
dingjun123@ORADB> select count(*) cnt,count(val) cnt_val from t;
CNT CNT_VAL
---------- ----------
9999 2000
1 row selected.
dingjun123@ORADB> SELECT ID,
2 nvl(val,last_value(val IGNORE NULLS) over(ORDER BY ID)) val,
3 cate
4 FROM t;
9999 rows selected.
Elapsed: 00:00:00.13
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
31 consistent gets
0 physical reads
0 redo size
207607 bytes sent via SQL*Net to client
7741 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
9999 rows processed
dingjun123@ORADB> SELECT ID,
2 nvl(val,lag(val IGNORE NULLS) over(ORDER BY ID)) val,
3 cate
4 FROM t;
9999 rows selected.
Elapsed: 00:00:22.49
Statistics
--------------------------------------------------------
0 recursive calls
0 db block gets
31 consistent gets
0 physical reads
0 redo size
207607 bytes sent via SQL*Net to client
7741 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
9999 rows processed
SELECT ID,
nvl(val,lag(val IGNORE NULLS) over(ORDER BY ID)) val,
cate
FROM t
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.02 0 1 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 668 21.98 22.08 0 31 0 9999
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 670 21.98 22.11 0 32 0 9999

本文作者:丁 俊(上海新炬王翦团队)
本文来源:“IT那活儿”公众号





