点击关注公众号,干货第一时间送达

这个 "连续xx天"的SQL问题,
实在是太经典了,
完全就是考查诸位对"开窗排序"的掌握,
经常看小编文章的小伙伴都知道,
这货经常出没!
前面的文章:
绝对是面试必考。
如果看过,本篇可以跳过,
直接看看文内的广告
帮小编挣一包辣条的钱
言归正传,咱们开始做题。
【温馨提示:建表语句及数据导入脚本,已经放在文末】
一、需求场景
题目:
查询订单信息表order_info中,至少连续3天下单的用户id。
订单信息表order_info:

正确结果:

二、解决方案
演示工具:Hive-3.1.3 + DataGrip2022
"至少连续xx天"的求解,
一般都会采用"开窗排序"的方式处理,
用到的函数无外乎:
rank()、dense_rank()、row_number()
这几个函数的详细介绍,
可以翻看前面的文章:
(1)按照user_id、create_date,进行分组去重
select user_id,
create_date
from order_info
group by user_id, create_date
结果集:

因为可能存在一个用户user_id,一天create_date下单多笔,
所以先用 group by 去重。
(2)根据分组,进行create_date 排序select user_id,
create_date,
row_number() over (partition by user_id order by create_date) as rk
from (
select user_id,
create_date
from order_info
group by user_id, create_date
)t1
select user_id,
create_date,
row_number() over (partition by user_id order by create_date) as rk
from (
select user_id,
create_date
from order_info
group by user_id, create_date
)t1
结果集:

(3)根据分组,计算 create_date和rk的差值
select user_id,
date_sub(create_date,rk) as `差值`
from (
select user_id,
create_date,
row_number() over (partition by user_id order by create_date) as rk
from (
select user_id,
create_date
from order_info
group by user_id, create_date
)t1
)t2
结果集:

因为如果是 "连续的" 的日期,
减去对应的 "序号",得出来的日期也是相同的。
(4)根据user_id,'差值'做分组,筛选计数>=3的
select user_id
from (
select user_id,
date_sub(create_date,rk) as `差值`
from (
select user_id,
create_date,
row_number() over (partition by user_id order by create_date) as rk
from (
select user_id,
create_date
from order_info
group by user_id, create_date
)t1
)t2
)t3
group by user_id,`差值`
having count(`差值`)>=3
结果集:

因为出现同一用户user_id,有多次连续3天以上下单,
所以会出现这样的情况,
因此还需要再排重一次。
(5)排重user_id
select distinct user_id
from (
select user_id
from (
select user_id,
date_sub(create_date,rk) as `差值`
from (
select user_id,
create_date,
row_number() over (partition by user_id order by create_date) as rk
from (
select user_id,
create_date
from order_info
group by user_id, create_date
)t1
)t2
)t3
group by user_id,`差值`
having count(`差值`)>=3
)t4
结果集:

三、源数据
订单信息表order_info:
create table order_info
(
order_id string,
user_id string,
create_date string,
total_amount decimal(16, 2)
)
row format serde 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
with serdeproperties ('field.delim' = '\t') stored as
inputformat 'org.apache.hadoop.mapred.TextInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location 'hdfs://mycluster/user/hive/warehouse/hql_test2.db/order_info'
tblproperties ('bucketing_version' = '2');
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('1', '101', '2021-09-27', 29000.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('2', '101', '2021-09-28', 70500.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('3', '101', '2021-09-29', 43300.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('4', '101', '2021-09-30', 860.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('5', '102', '2021-10-01', 46180.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('6', '102', '2021-10-01', 50000.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('7', '102', '2021-10-01', 75500.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('8', '102', '2021-10-02', 6170.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('9', '103', '2021-10-02', 18580.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('10', '103', '2021-10-02', 28000.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('11', '103', '2021-10-02', 23400.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('12', '103', '2021-10-03', 5910.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('13', '104', '2021-10-03', 13000.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('14', '104', '2021-10-03', 69500.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('15', '104', '2021-10-03', 2000.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('16', '104', '2021-10-03', 5380.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('17', '105', '2021-10-04', 6210.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('18', '105', '2021-10-04', 68000.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('19', '105', '2021-10-04', 43100.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('20', '105', '2021-10-04', 2790.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('21', '106', '2021-10-04', 9390.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('22', '106', '2021-10-05', 58000.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('23', '106', '2021-10-05', 46600.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('24', '106', '2021-10-05', 5160.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('25', '107', '2021-10-05', 55350.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('26', '107', '2021-10-05', 14500.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('27', '107', '2021-10-06', 47400.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('28', '107', '2021-10-06', 6900.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('29', '108', '2021-10-06', 56570.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('30', '108', '2021-10-06', 44500.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('31', '108', '2021-10-07', 50800.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('32', '108', '2021-10-07', 3900.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('33', '109', '2021-10-07', 41480.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('34', '109', '2021-10-07', 88000.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('35', '109', '2020-10-08', 15000.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('36', '109', '2020-10-08', 9020.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('37', '1010', '2020-10-08', 9260.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('38', '1010', '2020-10-08', 12000.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('39', '1010', '2020-10-08', 23900.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('40', '1010', '2020-10-08', 6790.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('41', '101', '2020-10-08', 300.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('42', '101', '2021-01-01', 260.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('43', '101', '2021-01-02', 280.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('44', '101', '2021-01-03', 420.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('45', '101', '2021-01-04', 240.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('46', '1011', '2021-09-26', 240.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('47', '1011', '2021-10-24', 240.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('48', '1011', '2022-09-24', 240.00);
INSERT INTO hql_test2.order_info (order_id, user_id, create_date, total_amount) VALUES ('49', '1012', '2022-09-24', 2010.00);

文章转载自皮皮克克,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




