关注+星标,干货第一时间送达

千思万想,
千言万语,
千方百计,
小编我,终于T... 的回深圳了。
虽然这边比较潮湿、闷热,
但那也比北京舒服多了,
简而言之,那边真汁儿待不习惯!
话不多说,咱们来看看今天的题目。
其实,这是一道比较简单的题目。
"每日新增" 算是大数据中,日活类的数据指标之一。
对业务有点参考价值,
面试也可能会考到(小编知道诸位在想什么
)。
走起!
【温馨提示:建表语句及数据导入脚本,已经放在文章末尾】
一、需求场景
题目:
从用户登录明细表user_login_detail中,查询每天的新增用户数。
若一个用户在某天登录了,且在这一天之前没有登录过,则认为该用户为这一天的新增用户。
"每日新增" 算是大数据中,日活类的数据指标之一。
对业务有点参考价值,
用户登录明细表user_login_detail:

正确结果:

二、解决方案
演示工具:Hive-3.1.3 + DataGrip2022
很显然,每日新增用户的统计,
首先,需要计算出,每个用户首次登录的日期,
然后,统计每个日期,一共有多少是用户首次登录。
能理解吧?
(1)按照用户id分组,统计每个用户最早登录日期
select user_id,
min(date(login_ts)) as first_lg
from user_login_detail
group by user_id
结果集:

代码里面有个日期函数:date 函数
作用是将时间/日期字符串等,转换为日期格式
例如:
select date('2024-08-04 00:00:00.000');
结果就是:

当然,诸位也可以使用:date_format() 函数
select date_format('2024-08-04 00:00:00.000', 'yyyy-MM-dd');
select date_format('2024-08-04 00:00:00.000', 'yyyy-MM-dd HH');
select date_format('2024-08-04 00:00:00.000', 'yyyy-MM-dd HH:mm');
转换成想要的格式字符串:

(2)按照日期分组,统计每个日期有多少用户
select first_lg,
count(*) as cnt
from (
select user_id,
min(date(login_ts)) as first_lg
from user_login_detail
group by user_id
)t1
group by first_lg
结果集:

三、源数据
用户登录明细表user_login_detail:
create table user_login_detail
(
user_id string,
ip_address string,
login_ts string,
logout_ts string
)
row format serde 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
with serdeproperties ('field.delim' = '\t') stored as
inputformat 'org.apache.hadoop.mapred.TextInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location 'hdfs://mycluster/user/hive/warehouse/hql_test2.db/user_login_detail'
tblproperties ('bucketing_version' = '2');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('101', '180.149.130.161', '2021-09-21 08:00:00', '2021-09-27 08:30:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('101', '180.149.130.161', '2021-09-27 08:00:00', '2021-09-27 08:30:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('101', '180.149.130.161', '2021-09-28 09:00:00', '2021-09-28 09:10:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('101', '180.149.130.161', '2021-09-29 13:30:00', '2021-09-29 13:50:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('101', '180.149.130.161', '2021-09-30 20:00:00', '2021-09-30 20:10:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('102', '120.245.11.2', '2021-09-22 09:00:00', '2021-09-27 09:30:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('102', '120.245.11.2', '2021-10-01 08:00:00', '2021-10-01 08:30:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('102', '180.149.130.174', '2021-10-01 07:50:00', '2021-10-01 08:20:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('102', '120.245.11.2', '2021-10-02 08:00:00', '2021-10-02 08:30:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('103', '27.184.97.3', '2021-09-23 10:00:00', '2021-09-27 10:30:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('103', '27.184.97.3', '2021-10-03 07:50:00', '2021-10-03 09:20:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('104', '27.184.97.34', '2021-09-24 11:00:00', '2021-09-27 11:30:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('104', '27.184.97.34', '2021-10-03 07:50:00', '2021-10-03 08:20:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('104', '27.184.97.34', '2021-10-03 08:50:00', '2021-10-03 10:20:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('104', '120.245.11.89', '2021-10-03 08:40:00', '2021-10-03 10:30:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('105', '119.180.192.212', '2021-10-04 09:10:00', '2021-10-04 09:30:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('106', '119.180.192.66', '2021-10-04 08:40:00', '2021-10-04 10:30:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('106', '119.180.192.66', '2021-10-05 21:50:00', '2021-10-05 22:40:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('107', '219.134.104.7', '2021-09-25 12:00:00', '2021-09-27 12:30:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('107', '219.134.104.7', '2021-10-05 22:00:00', '2021-10-05 23:00:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('107', '219.134.104.7', '2021-10-06 09:10:00', '2021-10-06 10:20:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('107', '27.184.97.46', '2021-10-06 09:00:00', '2021-10-06 10:00:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('108', '101.227.131.22', '2021-10-06 09:00:00', '2021-10-06 10:00:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('108', '101.227.131.22', '2021-10-06 22:00:00', '2021-10-06 23:00:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('109', '101.227.131.29', '2021-09-26 13:00:00', '2021-09-27 13:30:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('109', '101.227.131.29', '2021-10-06 08:50:00', '2021-10-06 10:20:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('109', '101.227.131.29', '2021-10-08 09:00:00', '2021-10-08 09:10:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('1010', '119.180.192.10', '2021-09-27 14:00:00', '2021-09-27 14:30:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('1010', '119.180.192.10', '2021-10-09 08:50:00', '2021-10-09 10:20:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('1011', '180.149.130.161', '2021-09-21 08:00:00', '2021-09-27 08:30:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('1011', '180.149.130.161', '2021-10-24 08:00:00', '2021-10-29 08:30:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('1011', '180.149.130.161', '2022-09-21 08:00:00', '2022-09-27 08:30:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('105', '119.180.192.212', '2021-10-06 09:10:00', '2021-10-04 09:30:00');
INSERT INTO hql_test2.user_login_detail (user_id, ip_address, login_ts, logout_ts) VALUES ('106', '119.180.192.66', '2021-10-05 08:50:00', '2021-10-05 12:40:00');
OK,这就是本期的内容了,下期再见!
文章转载自皮皮克克,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




