暂无图片
暂无图片
1
暂无图片
暂无图片
暂无图片

MySQL8.0窗口函数实验

MySQL8.0窗口函数实验

窗口函数是一种强大的数据分析工具,允许在数据集的特定"窗口"(数据子集)内执行计算,同时保留原始行数据。以下是其核心使用方法和场景总结:

一、窗口函数分类及常用函数

序号函数:

ROW_NUMBER():生成连续唯一序号(无并列),如1,2,3。
SELECT name, score, ROW_NUMBER() OVER (ORDER BY score DESC) AS rank FROM students;

RANK():允许并列但序号跳跃(如1,1,3)。

DENSE_RANK():允许并列且序号连续(如1,1,2)。
分布函数:

PERCENT_RANK():计算百分比排名(范围[0,1])。
CUME_DIST():计算累积分布值(如小于等于当前值的行占比)。
前后函数:

LAG(expr, N):获取当前行前第 N 行的值。
LEAD(expr, N):获取当前行后第 N 行的值。
SELECT order_date, LAG(amount, 1) OVER (ORDER BY date) AS prev_amount FROM orders;

二、基本语法结构

函数名([参数]) OVER (
[PARTITION BY 分组字段]
[ORDER BY 排序字段]
[ROWS/RANGE 窗口范围]
)
PARTITION BY:按字段分组,窗口函数在组内独立计算(类似GROUP BY,但保留所有行)。
ORDER BY:定义窗口内的排序规则。
窗口范围:
ROWS:基于物理行数(如ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING包含前一行、当前行、后一行)。
RANGE:基于逻辑值范围(如按时间差值动态扩展窗口)。

滑动窗口的范围指定方式
基于行的范围指定
通常使用 BETWEEN frame_start AND frame_end 语法表示行范围,frame_startframe_end 支持以下关键字定义动态行记录边界:

CURRENT_ROW
边界是当前行,一般与其他范围关键字配合使用。
UNBOUNDED_PRECEDING
边界是分区中的第一行。
UNBOUNDED_FOLLOWING
边界是分区中的最后一行。
expr_PRECEDING
边界是当前行减去 expr 的值(expr 为数值表达式)。
expr_FOLLOWING
边界是当前行加上 expr 的值(expr 为数值表达式)。

举例:

  1. ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    窗口范围包含当前行、前一行、后一行,共 3 行记录。
  2. ROWS UNBOUNDED FOLLOWING
    窗口范围从当前行到分区中的最后一行。
  3. ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    窗口范围为当前分区所有行,等同于省略范围指定。

三、实验部分

环境初始化参考文章【Mysql8.0窗口函数用法】

1.UNBOUNDED PRECEDING表示边界永远为第一行:基于第一行进行累加

SELECT tmp.emp_no,tmp.dept_no, tmp.salary, ROW_NUMBER() OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary DESC) AS row_num, DENSE_RANK() OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary DESC) AS dept_rank, SUM(tmp.salary) OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary ROWS UNBOUNDED PRECEDING) AS salary_total FROM (select de.emp_no,de.dept_no,sa.salary from dept_emp de, salaries sa where de.emp_no=sa.emp_no and dept_no='d005' limit 10) tmp; root@db 02:28: [employees]> SELECT tmp.emp_no,tmp.dept_no, tmp.salary, -> ROW_NUMBER() OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary DESC) AS row_num, -> DENSE_RANK() OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary DESC) AS dept_rank, -> SUM(tmp.salary) OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary ROWS UNBOUNDED PRECEDING) AS salary_total -> FROM (select de.emp_no,de.dept_no,sa.salary from dept_emp de, salaries sa where de.emp_no=sa.emp_no and dept_no='d005' limit 10) tmp; +--------+---------+--------+---------+-----------+--------------+ | emp_no | dept_no | salary | row_num | dept_rank | salary_total | +--------+---------+--------+---------+-----------+--------------+ | 10001 | d005 | 60117 | 10 | 10 | 60117 | | 10001 | d005 | 62102 | 9 | 9 | 122219 | | 10001 | d005 | 66074 | 8 | 8 | 188293 | | 10001 | d005 | 66596 | 7 | 7 | 254889 | | 10001 | d005 | 66961 | 6 | 6 | 321850 | | 10001 | d005 | 71046 | 5 | 5 | 392896 | | 10001 | d005 | 74333 | 4 | 4 | 467229 | | 10001 | d005 | 75286 | 3 | 3 | 542515 | | 10001 | d005 | 75994 | 2 | 2 | 618509 | | 10001 | d005 | 76884 | 1 | 1 | 695393 | +--------+---------+--------+---------+-----------+--------------+ 10 rows in set (0.00 sec)

2.CURRENT ROW表示获取当前行记录,也就是边界是当前行,等值关系

SELECT tmp.emp_no,tmp.dept_no, tmp.salary, ROW_NUMBER() OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary DESC) AS row_num, DENSE_RANK() OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary DESC) AS dept_rank, SUM(tmp.salary) OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary range current row) AS salary_total FROM (select de.emp_no,de.dept_no,sa.salary from dept_emp de, salaries sa where de.emp_no=sa.emp_no and dept_no='d005' limit 10) tmp; root@db 02:28: [employees]> SELECT tmp.emp_no,tmp.dept_no, tmp.salary, -> ROW_NUMBER() OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary DESC) AS row_num, -> DENSE_RANK() OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary DESC) AS dept_rank, -> SUM(tmp.salary) OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary range current row) AS salary_total -> FROM (select de.emp_no,de.dept_no,sa.salary from dept_emp de, salaries sa where de.emp_no=sa.emp_no and dept_no='d005' limit 10) tmp; +--------+---------+--------+---------+-----------+--------------+ | emp_no | dept_no | salary | row_num | dept_rank | salary_total | +--------+---------+--------+---------+-----------+--------------+ | 10001 | d005 | 60117 | 10 | 10 | 60117 | | 10001 | d005 | 62102 | 9 | 9 | 62102 | | 10001 | d005 | 66074 | 8 | 8 | 66074 | | 10001 | d005 | 66596 | 7 | 7 | 66596 | | 10001 | d005 | 66961 | 6 | 6 | 66961 | | 10001 | d005 | 71046 | 5 | 5 | 71046 | | 10001 | d005 | 74333 | 4 | 4 | 74333 | | 10001 | d005 | 75286 | 3 | 3 | 75286 | | 10001 | d005 | 75994 | 2 | 2 | 75994 | | 10001 | d005 | 76884 | 1 | 1 | 76884 | +--------+---------+--------+---------+-----------+--------------+ 10 rows in set (0.01 sec)

3.UNBOUNDED FOLLOWING 表示边界永远为最后一行

SELECT tmp.emp_no,tmp.dept_no, tmp.salary, ROW_NUMBER() OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary DESC) AS row_num, DENSE_RANK() OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary DESC) AS dept_rank, SUM(tmp.salary) OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary ROWS BETWEEN UNBOUNDED preceding and UNBOUNDED FOLLOWING) AS salary_total FROM (select de.emp_no,de.dept_no,sa.salary from dept_emp de, salaries sa where de.emp_no=sa.emp_no and dept_no='d005' limit 10) tmp; root@db 02:39: [employees]> SELECT tmp.emp_no,tmp.dept_no, tmp.salary, -> ROW_NUMBER() OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary DESC) AS row_num, -> DENSE_RANK() OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary DESC) AS dept_rank, -> SUM(tmp.salary) OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary ROWS BETWEEN UNBOUNDED preceding and UNBOUNDED FOLLOWING) AS salary_total -> FROM (select de.emp_no,de.dept_no,sa.salary from dept_emp de, salaries sa where de.emp_no=sa.emp_no and dept_no='d005' limit 10) tmp; +--------+---------+--------+---------+-----------+--------------+ | emp_no | dept_no | salary | row_num | dept_rank | salary_total | +--------+---------+--------+---------+-----------+--------------+ | 10001 | d005 | 60117 | 10 | 10 | 695393 | | 10001 | d005 | 62102 | 9 | 9 | 695393 | | 10001 | d005 | 66074 | 8 | 8 | 695393 | | 10001 | d005 | 66596 | 7 | 7 | 695393 | | 10001 | d005 | 66961 | 6 | 6 | 695393 | | 10001 | d005 | 71046 | 5 | 5 | 695393 | | 10001 | d005 | 74333 | 4 | 4 | 695393 | | 10001 | d005 | 75286 | 3 | 3 | 695393 | | 10001 | d005 | 75994 | 2 | 2 | 695393 | | 10001 | d005 | 76884 | 1 | 1 | 695393 | +--------+---------+--------+---------+-----------+--------------+ 10 rows in set (0.00 sec)

4.EXPR PRECEDING/FOLLOWING 表示带表达式的边界。

SELECT tmp.emp_no,tmp.dept_no, tmp.salary, SUM(tmp.salary) OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary ROWS 1 PRECEDING) AS salary_total FROM (select de.emp_no,de.dept_no,sa.salary from dept_emp de, salaries sa where de.emp_no=sa.emp_no and dept_no='d005' limit 10) tmp; root@db 01:55: [employees]> select de.emp_no,de.dept_no,sa.salary from dept_emp de, salaries sa where de.emp_no=sa.emp_no and dept_no='d005' limit 10 -> ; +--------+---------+--------+ | emp_no | dept_no | salary | +--------+---------+--------+ | 10001 | d005 | 60117 | | 10001 | d005 | 62102 | | 10001 | d005 | 66074 | | 10001 | d005 | 66596 | | 10001 | d005 | 66961 | | 10001 | d005 | 71046 | | 10001 | d005 | 74333 | | 10001 | d005 | 75286 | | 10001 | d005 | 75994 | | 10001 | d005 | 76884 | +--------+---------+--------+ 10 rows in set (0.01 sec) root@db 01:55: [employees]> SELECT tmp.emp_no,tmp.dept_no, tmp.salary, -> SUM(tmp.salary) OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary ROWS 1 PRECEDING) AS salary_total -> FROM (select de.emp_no,de.dept_no,sa.salary from dept_emp de, salaries sa where de.emp_no=sa.emp_no and dept_no='d005' limit 10) tmp; +--------+---------+--------+--------------+ | emp_no | dept_no | salary | salary_total | +--------+---------+--------+--------------+ | 10001 | d005 | 60117 | 60117 | | 10001 | d005 | 62102 | 122219 | | 10001 | d005 | 66074 | 128176 | | 10001 | d005 | 66596 | 132670 | | 10001 | d005 | 66961 | 133557 | | 10001 | d005 | 71046 | 138007 | | 10001 | d005 | 74333 | 145379 | | 10001 | d005 | 75286 | 149619 | | 10001 | d005 | 75994 | 151280 | | 10001 | d005 | 76884 | 152878 | +--------+---------+--------+--------------+ 10 rows in set (0.36 sec)

上面的查询表达的意思是基于分组内每行记录和它上一条记录求和,不累加。

//求每行和它上面两行的和:

SELECT tmp.emp_no,tmp.dept_no, tmp.salary, SUM(tmp.salary) OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary ROWS 2 PRECEDING) AS salary_total FROM (select de.emp_no,de.dept_no,sa.salary from dept_emp de, salaries sa where de.emp_no=sa.emp_no and dept_no='d005' limit 10) tmp; root@db 01:57: [employees]> SELECT tmp.emp_no,tmp.dept_no, tmp.salary, -> SUM(tmp.salary) OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary ROWS 2 PRECEDING) AS salary_total -> FROM (select de.emp_no,de.dept_no,sa.salary from dept_emp de, salaries sa where de.emp_no=sa.emp_no and dept_no='d005' limit 10) tmp; +--------+---------+--------+--------------+ | emp_no | dept_no | salary | salary_total | +--------+---------+--------+--------------+ | 10001 | d005 | 60117 | 60117 | | 10001 | d005 | 62102 | 122219 | | 10001 | d005 | 66074 | 188293 | | 10001 | d005 | 66596 | 194772 | | 10001 | d005 | 66961 | 199631 | | 10001 | d005 | 71046 | 204603 | | 10001 | d005 | 74333 | 212340 | | 10001 | d005 | 75286 | 220665 | | 10001 | d005 | 75994 | 225613 | | 10001 | d005 | 76884 | 228164 | +--------+---------+--------+--------------+ 10 rows in set (0.00 sec)

//求每行的前两行和后面四行相加的结果

SELECT tmp.emp_no,tmp.dept_no, tmp.salary, SUM(tmp.salary) OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary ROWS between 2 PRECEDING and 4 FOLLOWING) AS salary_total FROM (select de.emp_no,de.dept_no,sa.salary from dept_emp de, salaries sa where de.emp_no=sa.emp_no and dept_no='d005' limit 10) tmp; root@db 02:00: [employees]> SELECT tmp.emp_no,tmp.dept_no, tmp.salary, -> SUM(tmp.salary) OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary ROWS between 2 PRECEDING and 4 FOLLOWING) AS salary_total -> FROM (select de.emp_no,de.dept_no,sa.salary from dept_emp de, salaries sa where de.emp_no=sa.emp_no and dept_no='d005' limit 10) tmp; +--------+---------+--------+--------------+ | emp_no | dept_no | salary | salary_total | +--------+---------+--------+--------------+ | 10001 | d005 | 60117 | 321850 | | 10001 | d005 | 62102 | 392896 | | 10001 | d005 | 66074 | 467229 | | 10001 | d005 | 66596 | 482398 | | 10001 | d005 | 66961 | 496290 | | 10001 | d005 | 71046 | 507100 | | 10001 | d005 | 74333 | 440504 | | 10001 | d005 | 75286 | 373543 | | 10001 | d005 | 75994 | 302497 | | 10001 | d005 | 76884 | 228164 | +--------+---------+--------+--------------+ 10 rows in set (0.00 sec)

5.RANGE PRECEDING/FOLLOWING
表示当前行值范围内的分组记录。ROWS对应的是行号,RANGE对应的行值。

计算逻辑是:计算当前行salary减去1的值,如果计算的值在结果集中不存在,此时salary_total的值为salary当前值,如果计算的值在结果集中存在,此时salary_total的值为salary当前值加上【salary的值减去1】。

SELECT tmp.emp_no,tmp.dept_no, tmp.salary, SUM(tmp.salary) OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary RANGE 1 PRECEDING) AS salary_total FROM (select de.emp_no,de.dept_no,sa.salary from dept_emp de, salaries sa where de.emp_no=sa.emp_no and dept_no='d005' limit 10) tmp; root@db 02:13: [employees]> SELECT tmp.emp_no,tmp.dept_no, tmp.salary, SUM(tmp.salary) OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary RANGE 1 PRECEDING) AS salary_total FROM (select de.emp_no,de.dept_no,sa.salary from dept_emp de, salaries sa where de.emp_no=sa.emp_no and dept_no='d005' limit 10) tmp; +--------+---------+--------+--------------+ | emp_no | dept_no | salary | salary_total | +--------+---------+--------+--------------+ | 10001 | d005 | 60117 | 60117 | | 10001 | d005 | 62102 | 62102 | | 10001 | d005 | 66074 | 66074 | | 10001 | d005 | 66596 | 66596 | | 10001 | d005 | 66961 | 66961 | | 10001 | d005 | 71046 | 71046 | | 10001 | d005 | 74333 | 74333 | | 10001 | d005 | 75286 | 75286 | | 10001 | d005 | 75994 | 75994 | | 10001 | d005 | 76884 | 76884 | +--------+---------+--------+--------------+ 10 rows in set (0.00 sec)

//RANGE between 1 PRECEDING and 1 FOLLOWING:对每行值-1和+1后对应的值进行求和
计算逻辑:计算当前行值-1和+1后对应的值,如果在结果集中都不存在,则salary_total的值取salary的当前值;
如果在结果集中存在,则salary_total的值取salary的当前值+对应的值。

SELECT tmp.emp_no,tmp.dept_no, tmp.salary, SUM(tmp.salary) OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary RANGE between 1 PRECEDING and 1 FOLLOWING ) AS salary_total FROM (select de.emp_no,de.dept_no,sa.salary from dept_emp de, salaries sa where de.emp_no=sa.emp_no and dept_no='d005' limit 10) tmp; root@db 02:13: [employees]> SELECT tmp.emp_no,tmp.dept_no, tmp.salary, -> SUM(tmp.salary) OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary RANGE between 1 PRECEDING and 1 FOLLOWING ) AS salary_total -> FROM (select de.emp_no,de.dept_no,sa.salary from dept_emp de, salaries sa where de.emp_no=sa.emp_no and dept_no='d005' limit 10) tmp; +--------+---------+--------+--------------+ | emp_no | dept_no | salary | salary_total | +--------+---------+--------+--------------+ | 10001 | d005 | 60117 | 60117 | | 10001 | d005 | 62102 | 62102 | | 10001 | d005 | 66074 | 66074 | | 10001 | d005 | 66596 | 66596 | | 10001 | d005 | 66961 | 66961 | | 10001 | d005 | 71046 | 71046 | | 10001 | d005 | 74333 | 74333 | | 10001 | d005 | 75286 | 75286 | | 10001 | d005 | 75994 | 75994 | | 10001 | d005 | 76884 | 76884 | +--------+---------+--------+--------------+ 10 rows in set (0.00 sec)

6.row_number()
##查询员工部门表dept_emp中,最晚进入公司的人员名单

select * from ( select row_number() over (partition by dept_no order by from_date asc) as row_num, emp_no, dept_no, from_date, to_date from dept_emp) t where row_num=1; 或者写成如下 select * from ( select row_number() over w as row_num, emp_no, dept_no, from_date, to_date from dept_emp WINDOW w AS (partition by dept_no order by from_date asc) ) t where row_num=1; mysql> select * from ( -> select row_number() over (partition by dept_no order by from_date desc) as row_num, -> emp_no, -> dept_no, -> from_date, -> to_date -> from dept_emp) t where row_num=1; +---------+--------+---------+------------+------------+ | row_num | emp_no | dept_no | from_date | to_date | +---------+--------+---------+------------+------------+ | 1 | 483985 | d001 | 2002-07-30 | 9999-01-01 | | 1 | 445457 | d002 | 2002-07-29 | 9999-01-01 | | 1 | 109702 | d003 | 2002-07-28 | 9999-01-01 | | 1 | 413250 | d004 | 2002-08-01 | 9999-01-01 | | 1 | 83118 | d005 | 2002-08-01 | 9999-01-01 | | 1 | 227384 | d006 | 2002-08-01 | 9999-01-01 | | 1 | 71987 | d007 | 2002-07-31 | 9999-01-01 | | 1 | 282455 | d008 | 2002-08-01 | 9999-01-01 | | 1 | 408784 | d009 | 2002-08-01 | 9999-01-01 | +---------+--------+---------+------------+------------+ 9 rows in set (0.52 sec)

##查询员工部门表dept_emp中,最早进入公司的人员名单

mysql> select * from ( -> select row_number() over (partition by dept_no order by from_date asc) as row_num, -> emp_no, -> dept_no, -> from_date, -> to_date -> from dept_emp) t where row_num=1; +---------+--------+---------+------------+------------+ | row_num | emp_no | dept_no | from_date | to_date | +---------+--------+---------+------------+------------+ | 1 | 110022 | d001 | 1985-01-01 | 9999-01-01 | | 1 | 110085 | d002 | 1985-01-01 | 9999-01-01 | | 1 | 110183 | d003 | 1985-01-01 | 9999-01-01 | | 1 | 110303 | d004 | 1985-01-01 | 9999-01-01 | | 1 | 110511 | d005 | 1985-01-01 | 9999-01-01 | | 1 | 110725 | d006 | 1985-01-01 | 9999-01-01 | | 1 | 111035 | d007 | 1985-01-01 | 9999-01-01 | | 1 | 111400 | d008 | 1985-01-01 | 9999-01-01 | | 1 | 111692 | d009 | 1985-01-01 | 9999-01-01 | +---------+--------+---------+------------+------------+ 9 rows in set (0.42 sec)

7.如果窗口比较多,可以给窗口设置别名

select * from ( select row_number() over w as row_num, emp_no, dept_no, from_date, to_date from dept_emp WINDOW w AS (partition by dept_no order by from_date DESC) ) t where row_num=1; select * from ( select row_number() over w as row_num, emp_no, dept_no, from_date, to_date from dept_emp WINDOW w AS (partition by dept_no order by from_date DESC ROWS between 1 PRECEDING and 1 FOLLOWING) ) t;

8.ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
窗口范围包含当前行、前一行、后一行,共 3 行记录

写法一: SELECT tmp.emp_no,tmp.dept_no, tmp.salary, SUM(tmp.salary) OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary ROWS between 1 PRECEDING and 1 FOLLOWING ) AS salary_total, avg(tmp.salary) OVER (PARTITION BY tmp.dept_no ORDER BY tmp.salary ROWS between 1 PRECEDING and 1 FOLLOWING ) AS salary_avg FROM (select de.emp_no,de.dept_no,sa.salary from dept_emp de, salaries sa where de.emp_no=sa.emp_no and dept_no='d005' limit 10) tmp; 写法二: SELECT tmp.emp_no,tmp.dept_no, tmp.salary, SUM(tmp.salary) OVER s AS salary_total, AVG(tmp.salary) OVER s AS salary_avg FROM (select de.emp_no,de.dept_no,sa.salary from dept_emp de, salaries sa where de.emp_no=sa.emp_no and dept_no='d005' limit 10) tmp WINDOW s AS (PARTITION BY tmp.dept_no ORDER BY tmp.salary ROWS between 1 PRECEDING and 1 FOLLOWING ); mysql> SELECT tmp.emp_no,tmp.dept_no, tmp.salary, -> SUM(tmp.salary) OVER s AS salary_total, -> AVG(tmp.salary) OVER s AS salary_avg -> FROM (select de.emp_no,de.dept_no,sa.salary from dept_emp de, salaries sa where de.emp_no=sa.emp_no and dept_no='d005' limit 10) tmp WINDOW s AS (PARTITION BY tmp.dept_no ORDER BY tmp.salary ROWS between 1 PRECEDING and 1 FOLLOWING ); +--------+---------+--------+--------------+------------+ | emp_no | dept_no | salary | salary_total | salary_avg | +--------+---------+--------+--------------+------------+ | 10001 | d005 | 60117 | 122219 | 61109.5000 | | 10001 | d005 | 62102 | 188293 | 62764.3333 | | 10001 | d005 | 66074 | 194772 | 64924.0000 | | 10001 | d005 | 66596 | 199631 | 66543.6667 | | 10001 | d005 | 66961 | 204603 | 68201.0000 | | 10001 | d005 | 71046 | 212340 | 70780.0000 | | 10001 | d005 | 74333 | 220665 | 73555.0000 | | 10001 | d005 | 75286 | 225613 | 75204.3333 | | 10001 | d005 | 75994 | 228164 | 76054.6667 | | 10001 | d005 | 76884 | 152878 | 76439.0000 | +--------+---------+--------+--------------+------------+ 10 rows in set (0.01 sec)

9.LAG(expr,N)/LEAD(expr,N)

LAG(expr,N)和LEAD(expr,N)这两个函数的功能是获取当前数据行按照某种排序规则的上N行(LAG)/下N行(LEAD)数据的某个字段。比如,统计工资表中两条记录的时间间隔,那么就可以用LAG函数来实现,SQL代码如下:

mysql> desc salaries; +-----------+---------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-----------+---------+------+-----+---------+-------+ | emp_no | int(11) | NO | PRI | NULL | | | salary | int(11) | NO | | NULL | | | from_date | date | NO | PRI | NULL | | | to_date | date | NO | | NULL | | +-----------+---------+------+-----+---------+-------+ 4 rows in set (0.01 sec) select emp_no,from_date,last_date,datediff(from_date,last_date) as diff from ( select emp_no, from_date, lag(from_date,1) over w as last_date, to_date from salaries WINDOW w AS (partition by emp_no order by from_date asc) ) t limit 10; +--------+------------+------------+------+ | emp_no | from_date | last_date | diff | +--------+------------+------------+------+ | 10001 | 1986-06-26 | NULL | NULL | | 10001 | 1987-06-26 | 1986-06-26 | 365 | | 10001 | 1988-06-25 | 1987-06-26 | 365 | | 10001 | 1989-06-25 | 1988-06-25 | 365 | | 10001 | 1990-06-25 | 1989-06-25 | 365 | | 10001 | 1991-06-25 | 1990-06-25 | 365 | | 10001 | 1992-06-24 | 1991-06-25 | 365 | | 10001 | 1993-06-24 | 1992-06-24 | 365 | | 10001 | 1994-06-24 | 1993-06-24 | 365 | | 10001 | 1995-06-24 | 1994-06-24 | 365 | +--------+------------+------------+------+ 10 rows in set (6.49 sec) select emp_no,from_date,last_date,datediff(from_date,last_date) as diff from ( select emp_no, from_date, lead(from_date,1) over w as last_date, to_date from salaries WINDOW w AS (partition by emp_no order by from_date asc) ) t limit 10; +--------+------------+------------+------+ | emp_no | from_date | last_date | diff | +--------+------------+------------+------+ | 10001 | 1986-06-26 | 1987-06-26 | -365 | | 10001 | 1987-06-26 | 1988-06-25 | -365 | | 10001 | 1988-06-25 | 1989-06-25 | -365 | | 10001 | 1989-06-25 | 1990-06-25 | -365 | | 10001 | 1990-06-25 | 1991-06-25 | -365 | | 10001 | 1991-06-25 | 1992-06-24 | -365 | | 10001 | 1992-06-24 | 1993-06-24 | -365 | | 10001 | 1993-06-24 | 1994-06-24 | -365 | | 10001 | 1994-06-24 | 1995-06-24 | -365 | | 10001 | 1995-06-24 | 1996-06-23 | -365 | +--------+------------+------------+------+ 10 rows in set (7.20 sec)

10.FIRST_VALUE (expr) /LAST_VALUE (expr)

FIRST_VALUE (expr) 函数和 LAST_VALUE (expr) 函数的功能分别是获得滑动窗口范围内的参数字段中第一个 (FIRST_VALUE) 和最后一个 (LAST_VALUE) 的值。SQL 语句如下:

select * from ( select emp_no, salary, from_date, first_value(salary) over w as first_salary, last_value(salary) over w as last_salary, to_date from salaries WINDOW w AS (partition by emp_no order by from_date asc) ) t limit 10; +--------+--------+------------+--------------+-------------+------------+ | emp_no | salary | from_date | first_salary | last_salary | to_date | +--------+--------+------------+--------------+-------------+------------+ | 10001 | 60117 | 1986-06-26 | 60117 | 60117 | 1987-06-26 | | 10001 | 62102 | 1987-06-26 | 60117 | 62102 | 1988-06-25 | | 10001 | 66074 | 1988-06-25 | 60117 | 66074 | 1989-06-25 | | 10001 | 66596 | 1989-06-25 | 60117 | 66596 | 1990-06-25 | | 10001 | 66961 | 1990-06-25 | 60117 | 66961 | 1991-06-25 | | 10001 | 71046 | 1991-06-25 | 60117 | 71046 | 1992-06-24 | | 10001 | 74333 | 1992-06-24 | 60117 | 74333 | 1993-06-24 | | 10001 | 75286 | 1993-06-24 | 60117 | 75286 | 1994-06-24 | | 10001 | 75994 | 1994-06-24 | 60117 | 75994 | 1995-06-24 | | 10001 | 76884 | 1995-06-24 | 60117 | 76884 | 1996-06-23 | +--------+--------+------------+--------------+-------------+------------+ 10 rows in set (5.40 sec)

11.聚合函数作为窗口函数

除了前面介绍的各类窗口函数外,我们经常使用的各种聚合函数(SUM/AVG/MAX/MIN/COUNT)也可以作为窗口函数来使用。如下统计员工年收入总和,平均年收入,最大年收入,最小年收入,可以用聚合函数作为窗口函数实现如下:

select * from ( select emp_no, salary, sum(salary) over w as sum1, avg(salary) over w as avg1, max(salary) over w as max1, min(salary) over w as min1, count(salary) over w as count1, from_date, to_date from salaries WINDOW w AS (partition by emp_no order by from_date asc) ) t limit 10; +--------+--------+--------+------------+-------+-------+--------+------------+------------+ | emp_no | salary | sum1 | avg1 | max1 | min1 | count1 | from_date | to_date | +--------+--------+--------+------------+-------+-------+--------+------------+------------+ | 10001 | 60117 | 60117 | 60117.0000 | 60117 | 60117 | 1 | 1986-06-26 | 1987-06-26 | | 10001 | 62102 | 122219 | 61109.5000 | 62102 | 60117 | 2 | 1987-06-26 | 1988-06-25 | | 10001 | 66074 | 188293 | 62764.3333 | 66074 | 60117 | 3 | 1988-06-25 | 1989-06-25 | | 10001 | 66596 | 254889 | 63722.2500 | 66596 | 60117 | 4 | 1989-06-25 | 1990-06-25 | | 10001 | 66961 | 321850 | 64370.0000 | 66961 | 60117 | 5 | 1990-06-25 | 1991-06-25 | | 10001 | 71046 | 392896 | 65482.6667 | 71046 | 60117 | 6 | 1991-06-25 | 1992-06-24 | | 10001 | 74333 | 467229 | 66747.0000 | 74333 | 60117 | 7 | 1992-06-24 | 1993-06-24 | | 10001 | 75286 | 542515 | 67814.3750 | 75286 | 60117 | 8 | 1993-06-24 | 1994-06-24 | | 10001 | 75994 | 618509 | 68723.2222 | 75994 | 60117 | 9 | 1994-06-24 | 1995-06-24 | | 10001 | 76884 | 695393 | 69539.3000 | 76884 | 60117 | 10 | 1995-06-24 | 1996-06-23 | +--------+--------+--------+------------+-------+-------+--------+------------+------------+ 10 rows in set (24.32 sec)

12.NFILE(N)

NFILE()函数的功能是对一个数据分区中的有序结果集进行划分,将其分为N个组,并为每个小组分配一个唯一的组编号。对员工工资进行分组,NFILE()函数记录每组组编号,SQL 代码如下:

select * from ( select ntile(3) over w as nf, emp_no, salary, from_date, to_date from salaries WINDOW w AS (partition by emp_no order by from_date asc) ) t limit 20; +------+--------+--------+------------+------------+ | nf | emp_no | salary | from_date | to_date | +------+--------+--------+------------+------------+ | 1 | 10001 | 60117 | 1986-06-26 | 1987-06-26 | | 1 | 10001 | 62102 | 1987-06-26 | 1988-06-25 | | 1 | 10001 | 66074 | 1988-06-25 | 1989-06-25 | | 1 | 10001 | 66596 | 1989-06-25 | 1990-06-25 | | 1 | 10001 | 66961 | 1990-06-25 | 1991-06-25 | | 1 | 10001 | 71046 | 1991-06-25 | 1992-06-24 | | 2 | 10001 | 74333 | 1992-06-24 | 1993-06-24 | | 2 | 10001 | 75286 | 1993-06-24 | 1994-06-24 | | 2 | 10001 | 75994 | 1994-06-24 | 1995-06-24 | | 2 | 10001 | 76884 | 1995-06-24 | 1996-06-23 | | 2 | 10001 | 80013 | 1996-06-23 | 1997-06-23 | | 2 | 10001 | 81025 | 1997-06-23 | 1998-06-23 | | 3 | 10001 | 81097 | 1998-06-23 | 1999-06-23 | | 3 | 10001 | 84917 | 1999-06-23 | 2000-06-22 | | 3 | 10001 | 85112 | 2000-06-22 | 2001-06-22 | | 3 | 10001 | 85097 | 2001-06-22 | 2002-06-22 | | 3 | 10001 | 88958 | 2002-06-22 | 9999-01-01 | | 1 | 10002 | 65828 | 1996-08-03 | 1997-08-03 | | 1 | 10002 | 65909 | 1997-08-03 | 1998-08-03 | | 2 | 10002 | 67534 | 1998-08-03 | 1999-08-03 | +------+--------+--------+------------+------------+ 20 rows in set (6.06 sec)

13.PERCENT_RANK()/CUME_DIST()

PERCENT_RANK() 和 CUME_DIST() 这两个函数都是计算数据分布的函数。PERCENT_RANK()和之前的 RANK()函数相关,每行按照以下公式进行计算:
(rank - 1) / (rows - 1) 其中,rank 为 RANK()函数产生的序号,rows 为当前窗口的记录总行数。上面的例子修改如下:

select * from ( select rank() over w as row_num, percent_rank() over w as percent, emp_no, salary, from_date, to_date from salaries WINDOW w AS (partition by emp_no order by from_date asc) ) t limit 20; +---------+---------+--------+--------+------------+------------+ | row_num | percent | emp_no | salary | from_date | to_date | +---------+---------+--------+--------+------------+------------+ | 1 | 0 | 10001 | 60117 | 1986-06-26 | 1987-06-26 | | 2 | 0.0625 | 10001 | 62102 | 1987-06-26 | 1988-06-25 | | 3 | 0.125 | 10001 | 66074 | 1988-06-25 | 1989-06-25 | | 4 | 0.1875 | 10001 | 66596 | 1989-06-25 | 1990-06-25 | | 5 | 0.25 | 10001 | 66961 | 1990-06-25 | 1991-06-25 | | 6 | 0.3125 | 10001 | 71046 | 1991-06-25 | 1992-06-24 | | 7 | 0.375 | 10001 | 74333 | 1992-06-24 | 1993-06-24 | | 8 | 0.4375 | 10001 | 75286 | 1993-06-24 | 1994-06-24 | | 9 | 0.5 | 10001 | 75994 | 1994-06-24 | 1995-06-24 | | 10 | 0.5625 | 10001 | 76884 | 1995-06-24 | 1996-06-23 | | 11 | 0.625 | 10001 | 80013 | 1996-06-23 | 1997-06-23 | | 12 | 0.6875 | 10001 | 81025 | 1997-06-23 | 1998-06-23 | | 13 | 0.75 | 10001 | 81097 | 1998-06-23 | 1999-06-23 | | 14 | 0.8125 | 10001 | 84917 | 1999-06-23 | 2000-06-22 | | 15 | 0.875 | 10001 | 85112 | 2000-06-22 | 2001-06-22 | | 16 | 0.9375 | 10001 | 85097 | 2001-06-22 | 2002-06-22 | | 17 | 1 | 10001 | 88958 | 2002-06-22 | 9999-01-01 | | 1 | 0 | 10002 | 65828 | 1996-08-03 | 1997-08-03 | | 2 | 0.2 | 10002 | 65909 | 1997-08-03 | 1998-08-03 | | 3 | 0.4 | 10002 | 67534 | 1998-08-03 | 1999-08-03 | +---------+---------+--------+--------+------------+------------+ 20 rows in set (5.40 sec) rank的值取row_num列的值,rows的值为(emp_no的值为1000117行,emp_no的值为100023行),percent列的值按(rank - 1) / (rows - 1)计算得出。

14.CUME_DIST函数的作用是分组内小于等于当前rank值的行数/分组内总行数

select * from ( select rank() over w as row_num, cume_dist() over w as cume, emp_no, salary, from_date, to_date from salaries WINDOW w AS (partition by emp_no order by from_date asc) ) t limit 25; +---------+----------------------+--------+--------+------------+------------+ | row_num | cume | emp_no | salary | from_date | to_date | +---------+----------------------+--------+--------+------------+------------+ | 1 | 0.058823529411764705 | 10001 | 60117 | 1986-06-26 | 1987-06-26 | | 2 | 0.11764705882352941 | 10001 | 62102 | 1987-06-26 | 1988-06-25 | | 3 | 0.17647058823529413 | 10001 | 66074 | 1988-06-25 | 1989-06-25 | | 4 | 0.23529411764705882 | 10001 | 66596 | 1989-06-25 | 1990-06-25 | | 5 | 0.29411764705882354 | 10001 | 66961 | 1990-06-25 | 1991-06-25 | | 6 | 0.35294117647058826 | 10001 | 71046 | 1991-06-25 | 1992-06-24 | | 7 | 0.4117647058823529 | 10001 | 74333 | 1992-06-24 | 1993-06-24 | | 8 | 0.47058823529411764 | 10001 | 75286 | 1993-06-24 | 1994-06-24 | | 9 | 0.5294117647058824 | 10001 | 75994 | 1994-06-24 | 1995-06-24 | | 10 | 0.5882352941176471 | 10001 | 76884 | 1995-06-24 | 1996-06-23 | | 11 | 0.6470588235294118 | 10001 | 80013 | 1996-06-23 | 1997-06-23 | | 12 | 0.7058823529411765 | 10001 | 81025 | 1997-06-23 | 1998-06-23 | | 13 | 0.7647058823529411 | 10001 | 81097 | 1998-06-23 | 1999-06-23 | | 14 | 0.8235294117647058 | 10001 | 84917 | 1999-06-23 | 2000-06-22 | | 15 | 0.8823529411764706 | 10001 | 85112 | 2000-06-22 | 2001-06-22 | | 16 | 0.9411764705882353 | 10001 | 85097 | 2001-06-22 | 2002-06-22 | | 17 | 1 | 10001 | 88958 | 2002-06-22 | 9999-01-01 | | 1 | 0.16666666666666666 | 10002 | 65828 | 1996-08-03 | 1997-08-03 | | 2 | 0.3333333333333333 | 10002 | 65909 | 1997-08-03 | 1998-08-03 | | 3 | 0.5 | 10002 | 67534 | 1998-08-03 | 1999-08-03 | | 4 | 0.6666666666666666 | 10002 | 69366 | 1999-08-03 | 2000-08-02 | | 5 | 0.8333333333333334 | 10002 | 71963 | 2000-08-02 | 2001-08-02 | | 6 | 1 | 10002 | 72527 | 2001-08-02 | 9999-01-01 | | 1 | 0.14285714285714285 | 10003 | 40006 | 1995-12-03 | 1996-12-02 | | 2 | 0.2857142857142857 | 10003 | 43616 | 1996-12-02 | 1997-12-02 | +---------+----------------------+--------+--------+------------+------------+ 25 rows in set (5.26 sec) 20 rows in set (5.96 sec) cume的值取row_num列的值,rows的值为(emp_no的值为1000117行,emp_no的值为100026行),cume列的值按(row_num) / (rows)计算得出。 也就是 1/172/17.....
最后修改时间:2025-04-09 09:51:18
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论