暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

学习MySQL(五):窗口函数

Linux技术宅 2024-04-27
238

点击蓝字,关注我

Every Day

芙蓉楼送辛渐

[作者] 王昌龄 [朝代] 唐

寒雨连江夜入吴,平明送客楚山孤。

洛阳亲友如相问,一片冰心在玉壶。

窗口函数介绍

    窗口函数的引入是为了解决想要既显示聚集前的数据,又要显示聚集后的数据;窗口数对一组值进行操作,不需要使用GROUP BY子句对数据进行分组,能够在同一行中同时返回基础行的列和聚合列。

    强调:使用MySQL 8.0版本方可实现

基本语法

函数名(列) over(选项) 选项为partition by 列 order by 列

解释:

  • over(partition by xxx) 按xxx分组的所有行进行分组

  • over(partition by xxx order by aaa) 按列xxx分组,按列aaa排序

  • over(order by aaa) 按aaa列排序

  • over括号中的partition by和order by的使用根据具体情况选择

    -- 需求:计算每个学生的及格科目数
    -- 使用聚合函数,类似数据透视表,原有表结构已发生变化
    SELECT student_id,count( sid ) FROM score WHERE num >= 60 GROUP BY student_id;


    -- 使用窗口函数,不会更改原表结构
    SELECT student_id,count( sid ) over ( PARTITION BY student_id ORDER BY student_id ) AS 及格数
    FROM score WHERE num >= 60;

    聚合窗口函数

    语法:聚合函数(列) over(partition by 列 order by 列)

    常见的聚合函数:sum() count() avg() max() min()

    排序窗口函数

    • row_number():仅仅根据行号进行排序,相同结果则排序按照顺序依次排

    • rank():排名,与row_number函数不同的是,rank函数考虑到了over子句中排序字段值相同的情况,over子句中排序字段值相同的排序结果是一样的,后面字段值不相同的序号将跳过相同的排名号排下一个。如:11335

    • dense_rank():密集排序,用法跟rank类似,唯一不同是当排序结果相同时,排序不跳跃,而是紧跟排下一个。如:11223

    • ntile():桶排序,首先,ntile会先根据你的分组依据,然后把每个组的总记录数进行按照你给的ntile(n)里的数字n进行均分,这个数字就是桶数,例如一个组内总共12条记录,若n=6,则等划分成6桶,然后按照num的排序等级划分,12/6=2则每个桶两条记录,也就是112233445566的排序结果,常用于提取前百分之多少的应用场景。

    都是排名函数,不同之处在对于名次相同的数据处理方式

      -- 对每门课程进行排序
      SELECT
      s.sid,
      s1.sname,
      s1.gender,
      c.cname,
      s.num,
      row_number() over ( PARTITION BY c.cname ORDER BY num DESC ) AS row_number排名,
      rank() over ( PARTITION BY c.cname ORDER BY num DESC ) AS rank排名,
      dense_rank() over ( PARTITION BY c.cname ORDER BY num DESC ) AS dense_rank排名,
      ntile( 6 ) over ( PARTITION BY c.cname ORDER BY num DESC ) AS ntile排名
      FROM
      score AS s
      JOIN student AS s1 ON s.student_id = s1.sid
      LEFT JOIN course AS c ON s.course_id = c.cid


      -- 计算每门课程前三,考虑排名相同的情况
      SELECT * FROM (
      SELECT
      s.sid,
      s1.sname,
      s1.gender,
      c.cname,
      s.num,
      dense_rank() over ( PARTITION BY c.cname ORDER BY num DESC ) AS dense_rank排名
      FROM
      score AS s
      JOIN student AS s1 ON s.student_id = s1.sid
      LEFT JOIN course AS c ON s.course_id = c.cid
      ) AS a
      WHERE
      dense_rank排名 <=3

      位置移动窗口函数

      • lag(col,n):col列名,n行数,用于统计窗口内往上第n行值

      • lead(col,n):col列名,n行数,用于统计窗口内往下第n行值

      这两个函数可以用于同列中相邻行的数据计算

      应用场景:

      • 计算作弊次数

        -- 需求:对于下面的数据,对于同一用户(uid)如果在2分钟之内重新登陆,则判断为作弊,统计哪些用户有作弊行为,并计算作弊次数
        -- 数据代码
        CREATE TABLE lead_table (
        id INT PRIMARY KEY,
        uid INT NOT NULL,
        login_time datetime NOT NULL );
        INSERT INTO lead_table
        VALUES
        ( 1, 1, "2020-8-26 12:59:00" ),
        ( 2, 1, "2020-8-26 13:02:23" ),
        ( 3, 1, "2020-8-26 13:03:34" ),
        ( 4, 1, "2020-8-26 13:09:00" ),
        ( 5, 2, "2020-8-26 13:57:00" ),
        ( 6, 2, "2020-8-26 13:59:00" ),
        ( 7, 2, "2020-8-26 13:59:45" );

            思路:根据题目要求,如果能把相邻两列的下面一列与上面那一列变成同一行,不久能实现相减了么,因此我们可以多生成一列,例如:把uid都为1的第二行记录生成到第一行,以此类推,这就可以用到lead往下移动的操作了

          -- 第一步
          SELECT id,uid,login_time,
          LEAD( login_time, 1 ) OVER ( PARTITION BY uid ORDER BY login_time ) lead_time
          FROM lead_table;


          --第二步
          SELECT id,uid,login_time,
          LEAD( login_time, 1 ) OVER ( PARTITION BY uid ORDER BY login_time ) lead_time,
          TIMESTAMPDIFF(
          SECOND,login_time,
          LEAD( login_time, 1 ) OVER ( PARTITION BY uid ORDER BY login_time )) AS 相差秒数
          FROM lead_table;

          -- 最终代码
          SELECT uid,COUNT( 1 ) AS 作弊次数
          FROM
          (
          SELECT id,uid,login_time,
          LEAD( login_time, 1 ) OVER ( PARTITION BY uid ORDER BY login_time ) lead_time,
          TIMESTAMPDIFF(
          SECOND,login_time,
          lead( login_time, 1 ) OVER ( PARTITION BY uid ORDER BY login_time )) AS 相差秒数
          FROM lead_table ) AS e
          WHERE 相差秒数 <= 120
          GROUP BY uid;
          • 计算次日留存率

          其他窗口函数

          • first_value(column):取分组排序后第一个值

            SELECT s.sid,s1.sname,s1.gender,c.cname,s.num,
            FIRST_VALUE(num) OVER(PARTITION by c.cname ORDER BY num DESC) AS first_value用法
            FROM score AS s
            JOIN student AS s1 ON s.student_id = s1.sid
            LEFT JOIN course AS c ON s.course_id = c.cid
            • last_value(column):取分组排序后最后一个值

              SELECT s.sid,s1.sname,s1.gender,c.cname,s.num,
              LAST_VALUE(num) OVER(PARTITION by c.cname ORDER BY num DESC) AS last_value用法
              FROM score AS s
              JOIN student AS s1 ON s.student_id = s1.sid
              LEFT JOIN course AS c ON s.course_id = c.cid

                  为什么和想要的结果不一样呢?

                  实际上,窗口函数默认统计范围是rows between unbounded preceding and current row,也就是取当前行数据与当前行之前的数据的比较。

                  修改SQL,在order by条件的后面加上语句:rows between unbounded preceding and unbounded following,可以理解为:当前分组数据中的所有数据进行比较,取最后一条记录。

                SELECT s.sid,s1.sname,s1.gender,c.cname,s.num,
                LAST_VALUE(num) OVER(PARTITION by c.cname ORDER BY num DESC rows between unbounded preceding and unbounded following) AS last_value用法
                FROM score AS s
                JOIN student AS s1 ON s.student_id = s1.sid
                LEFT JOIN course AS c ON s.course_id = c.cid

                详细介绍:

                  • rows between XXX and XXX

                  • unbounded 无限制的

                  • preceding 分区的当前记录的向前偏移量

                  • current 当前

                  • following 分区的当前记录的向后偏移量

                示例:累计计算每个月的销售额

                  -- 示例数据
                  CREATE TABLE sale (
                  id INT PRIMARY KEY auto_increment,
                  年份 INT,
                  月份 INT,
                  money FLOAT ( 10, 2 ));
                  INSERT INTO sale ( 年份, 月份, money )
                  VALUES
                  ( 2020, 1, 5840 ),
                  ( 2020, 2, 5780 ),
                  ( 2020, 3, 4300 ),
                  ( 2020, 4, 4760 ),
                  ( 2020, 5, 3630 ),
                  ( 2020, 6, 4130 ),
                  ( 2020, 7, 4350 );

                  -- 语句
                  SELECT *,
                  sum( money ) over ( ORDER BY 月份
                  rows between unbounded preceding and current row) AS 累计销售额
                  FROM sale;

                  本章示例数据

                    CREATE DATABASE school;
                    USE school;
                    CREATE TABLE class (
                    cid INT ( 11 ) NOT NULL auto_increment,
                    caption VARCHAR ( 32 ) NOT NULL,
                    PRIMARY KEY ( cid )
                    ) ENGINE = INNODB charset = utf8;


                    INSERT INTO class
                    VALUES
                    ( 1, '三年二班' ),
                    ( 2, '三年三班' ),
                    ( 3, '二年二班' ),
                    ( 4, '一年二班' ),
                    ( 5, '二年五班' );

                    CREATE TABLE teacher (
                    tid INT ( 11 ) NOT NULL auto_increment,
                    tname VARCHAR ( 32 ) NOT NULL,
                    PRIMARY KEY ( tid )
                    ) ENGINE = INNODB DEFAULT charset = utf8;


                    INSERT INTO teacher
                    VALUES
                    ( 1, '张磊老师' ),
                    ( 2, '李平老师' ),
                    ( 3, '刘兰老师' ),
                    ( 4, '朱朱老师' ),
                    ( 5, '李杰老师' );

                    CREATE TABLE course (
                    cid INT ( 11 ) NOT NULL auto_increment,
                    cname VARCHAR ( 32 ) NOT NULL,
                    teacher_id INT ( 11 ) NOT NULL,
                    PRIMARY KEY ( cid ),
                    KEY fk_couurse_teacher ( teacher_id ),
                    CONSTRAINT fk_course_teacher FOREIGN KEY ( teacher_id ) REFERENCES teacher ( tid )
                    ) ENGINE = INNODB DEFAULT charset = utf8;
                    INSERT INTO course
                    VALUES
                    ( 1, '生物', 1 ),
                    ( 2, '物理', 2 ),
                    ( 3, '体育', 3 ),
                    ( 4, '美术', 2 );


                    CREATE TABLE student (
                    sid INT ( 11 ) NOT NULL auto_increment,
                    gender CHAR ( 1 ) NOT NULL,
                    class_id INT ( 11 ) NOT NULL,
                    sname VARCHAR ( 32 ) NOT NULL,
                    PRIMARY KEY ( sid ),
                    KEY fk_class ( class_id ),
                    CONSTRAINT fk_class FOREIGN KEY ( class_id ) REFERENCES class ( cid )
                    ) ENGINE = INNODB DEFAULT charset = utf8;


                    INSERT INTO student
                    VALUES
                    ( 1, '男', 1, '理解' ),
                    ( 2, '女', 1, '钢蛋' ),
                    ( 3, '男', 1, '张三' ),
                    ( 4, '男', 1, '张思' ),
                    ( 5, '女', 1, '网易' ),
                    ( 6, '男', 1, '王二' ),
                    ( 7, '男', 2, '铁道' ),
                    ( 8, '男', 2, '李武' ),
                    ( 9, '男', 2, '刘三' ),
                    ( 10, '女', 2, '刘一' ),
                    ( 11, '男', 2, '刘思' ),
                    ( 12, '男', 3, '王三' ),
                    ( 13, '男', 3, '小五' ),
                    ( 14, '男', 3, '小七' ),
                    ( 15, '女', 3, '如花' ),
                    ( 16, '男', 3, '张四' );

                    CREATE TABLE score (
                    sid INT ( 11 ) NOT NULL auto_increment,
                    student_id INT ( 11 ) NOT NULL,
                    course_id INT ( 11 ) NOT NULL,
                    num INT ( 11 ) NOT NULL,
                    PRIMARY KEY ( sid ),
                    KEY fk_score_student ( student_id ),
                    KEY fk_score_course ( course_id ),
                    CONSTRAINT fk_score_course FOREIGN KEY ( course_id ) REFERENCES course ( cid ),
                    CONSTRAINT fk_score_student FOREIGN KEY ( student_id ) REFERENCES student ( sid )
                    ) ENGINE = INNODB DEFAULT charset = utf8;


                    INSERT INTO score
                    VALUES
                    ( 1, 1, 1, 10 ),
                    ( 2, 1, 2, 9 ),
                    ( 5, 1, 4, 66 ),
                    ( 6, 2, 1, 8 ),
                    ( 8, 2, 3, 68 ),
                    ( 9, 2, 4, 99 ),
                    ( 10, 3, 1, 77 ),
                    ( 11, 3, 2, 66 ),
                    ( 12, 3, 3, 87 ),
                    ( 13, 3, 4, 99 ),
                    ( 14, 4, 1, 79 ),
                    ( 15, 4, 2, 11 ),
                    ( 16, 4, 3, 67 ),
                    ( 17, 4, 4, 100 ),
                    ( 18, 5, 1, 79 ),
                    ( 19, 5, 2, 11 ),
                    ( 20, 5, 3, 67 ),
                    ( 21, 5, 4, 100 );



                    分享、在看与点赞

                    只要你点,我们就是胖友

                    文章转载自Linux技术宅,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

                    评论