
我们经常会需要把数据库的数据按日期汇总,然后可视化。
然而,汇总后日期并不一定是连续的,比如下面这样:
day | users
------------------------+-------
2020-06-18 00:00:00+08 | 4
2020-06-19 00:00:00+08 | 5
2020-06-20 00:00:00+08 | 3
2020-06-21 00:00:00+08 | 1
2020-06-22 00:00:00+08 | 3
2020-06-23 00:00:00+08 | 1
2020-06-24 00:00:00+08 | 2
2020-06-28 00:00:00+08 | 1
2020-06-29 00:00:00+08 | 2
2020-06-30 00:00:00+08 | 4
直接可视化的话users为0的日期就不能显示,看起来好像每天的users都大于0,显然很不合理。
在PostgreSQL中,我们可以借助generate_series
函数来生成连续的日期:
generate_series(start, stop, step interval)
:
start
开始时间
stop
结束时间
step interval
间隔时间
比如:
SELECT * FROM generate_series('2020-06-18'::DATE, '2020-06-30'::DATE, INTERVAL'1 day') as date;
date
------------------------
2020-06-18 00:00:00+08
2020-06-19 00:00:00+08
2020-06-20 00:00:00+08
2020-06-21 00:00:00+08
2020-06-22 00:00:00+08
2020-06-23 00:00:00+08
2020-06-24 00:00:00+08
2020-06-25 00:00:00+08
2020-06-26 00:00:00+08
2020-06-27 00:00:00+08
2020-06-28 00:00:00+08
2020-06-29 00:00:00+08
2020-06-30 00:00:00+08
这样我们就可以将其与按日期汇总后的表daily_users进行连接,得到我们想要的效果:
SELECT gs.date as date,
COALESCE(d.users, 0) as users
FROM
generate_series('2020-06-18'::DATE, '2020-06-30'::DATE, INTERVAL '1 day') gs(date)
LEFT JOIN daily_users d
ON d.day = gs.date;
date | users
------------------------+-------
2020-06-18 00:00:00+08 | 4
2020-06-19 00:00:00+08 | 5
2020-06-20 00:00:00+08 | 3
2020-06-21 00:00:00+08 | 1
2020-06-22 00:00:00+08 | 3
2020-06-23 00:00:00+08 | 1
2020-06-24 00:00:00+08 | 2
2020-06-25 00:00:00+08 | 0
2020-06-26 00:00:00+08 | 0
2020-06-27 00:00:00+08 | 0
2020-06-28 00:00:00+08 | 1
2020-06-29 00:00:00+08 | 2
2020-06-30 00:00:00+08 | 4
然而,上面generate_series的起止日期都是写死的,如果想要直接根据daily_users里的起止日期生成序列可以这么写:
SELECT d.date,
COALESCE(u.users,0) as users
FROM
(SELECT generate_series(MIN(day)::DATE,
MAX(day),
INTERVAL '1 day') AS date
FROM daily_users) d
LEFT JOIN
daily_users u ON d.date = u.day
;
结果是一样的:
date | users
------------------------+-------
2020-06-18 00:00:00+08 | 4
2020-06-19 00:00:00+08 | 5
2020-06-20 00:00:00+08 | 3
2020-06-21 00:00:00+08 | 1
2020-06-22 00:00:00+08 | 3
2020-06-23 00:00:00+08 | 1
2020-06-24 00:00:00+08 | 2
2020-06-25 00:00:00+08 | 0
2020-06-26 00:00:00+08 | 0
2020-06-27 00:00:00+08 | 0
2020-06-28 00:00:00+08 | 1
2020-06-29 00:00:00+08 | 2
2020-06-30 00:00:00+08 | 4
总之,这类问题的思路就是构造连续的日期辅助列,然后利用LEFT JOIN
达到想要的效果,这样可视化的时候日期就不会断掉了。
下面附创建daily_users
和插入数据的语句:
CREATE TABLE daily_users (day timestamptz, users INTEGER);
INSERT INTO daily_users ("day", "users") VALUES
('2020-06-18 00:00:00+08', '4'),
('2020-06-19 00:00:00+08', '5'),
('2020-06-20 00:00:00+08', '3'),
('2020-06-21 00:00:00+08', '1'),
('2020-06-22 00:00:00+08', '3'),
('2020-06-23 00:00:00+08', '1'),
('2020-06-24 00:00:00+08', '2'),
('2020-06-28 00:00:00+08', '1'),
('2020-06-29 00:00:00+08', '2'),
('2020-06-30 00:00:00+08', '4');




