点击关注公众号,干货第一时间送达

本题要求筛选的是"销量最高",
和上篇文章:
有异曲同工之妙。
不过,还需要根据 "品类" 进行二次分组统计。
【温馨提示:建表语句及数据导入脚本,已经放在了文章末尾】
一、需求场景
题目:
查询各品类销售商品的种类数、及销量最高的商品。
看下结果:

不难,就是会有点绕。
二、解决方案
演示工具:Hive-3.1.3 + DataGrip2022
还是遵守:小 -> 大 筛选原则,
先统计每件商品的累积销量,再去关联商品类别。
(1)分组计算每个商品的累积销量
select sku_id, sum(sku_num) as cnt
from order_detail
group by sku_id
结果集:

(2)关联商品类别(category_info)和商品详情(sku_info)
select t1.sku_id, sk.name, t1.cnt,
ct.category_id, ct.category_name
from (
select sku_id, sum(sku_num) as cnt
from order_detail
group by sku_id
)t1
join sku_info sk
on t1.sku_id = sk.sku_id
join category_info ct
on sk.category_id = ct.category_id
结果集:

(3)根据商品类别(category_id)进行销量(cnt)的分组排序
select t2.category_id,
t2.category_name,
t2.sku_id,
t2.name,
t2.cnt,
count(*) over(partition by t2.category_id) as sku_cnt,
rank() over(partition by t2.category_id order by t2.cnt desc) as rk
from (
select t1.sku_id, sk.name, t1.cnt,
ct.category_id, ct.category_name
from (
select sku_id, sum(sku_num) as cnt
from order_detail
group by sku_id
)t1
join sku_info sk
on t1.sku_id = sk.sku_id
join category_info ct
on sk.category_id = ct.category_id
)t2
结果集:

(4)筛选各个类别(category_id)销量(cnt)最高的(rk=1)的商品
select category_id,category_name,sku_id,name,cnt,sku_cnt
from (
select t2.category_id,
t2.category_name,
t2.sku_id,
t2.name,
t2.cnt,
count(*) over(partition by t2.category_id) as sku_cnt,
rank() over(partition by t2.category_id order by t2.cnt desc) as rk
from (
select t1.sku_id, sk.name, t1.cnt,
ct.category_id, ct.category_name
from (
select sku_id, sum(sku_num) as cnt
from order_detail
group by sku_id
)t1
join sku_info sk
on t1.sku_id = sk.sku_id
join category_info ct
on sk.category_id = ct.category_id
)t2
)t3
where rk=1;
结果集:

三、源数据
category_info:
create table category_info
(
category_id string,
category_name string
)
row format serde 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
with serdeproperties ('field.delim' = '\t') stored as
inputformat 'org.apache.hadoop.mapred.TextInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location 'hdfs://mycluster/user/hive/warehouse/db/category_info'
tblproperties ('bucketing_version' = '2');
INSERT INTO category_info (category_id, category_name) VALUES ('1', '数码');
INSERT INTO category_info (category_id, category_name) VALUES ('2', '厨卫');
INSERT INTO category_info (category_id, category_name) VALUES ('3', '户外');
order_detail 和 sku_info 都在前面的文章:

文章转载自皮皮克克,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




