碎碎念:上次说写个用SQL查询留存分析的。其实也是很多年前用过的。重新写好后发现有好多人写过。不做比较,仅作记录。附送一个漏斗的逻辑,嘻嘻嘻~~~本文中涉及的计算逻辑与各数据产品的计算逻辑无关,仅代表自己的查询逻辑,供日常分析参考。

已知有表名为events,记录着用户的行为日志,如下表所示,需做留存分析,漏斗分析。

想要查询每日活跃用户留存指标。想要输出如下图数据格式(这里演示的输出指标比较多,根据需求适当展示即可,因为表格较长,所以用导出后用excel缩短了列宽截的图)

输出数据为一张大宽表,横轴表示日期,纵轴分别每日当日初始事件人数,及这些用户后续事件后续各日指标数据。
包括:每日登录人数;次日付费人数,次日付费留存率,次日付费次数,次日付费金额,次日人均付费金额,次日累计人均付费金额;3日相关数据,3日付费人数,3日累计人均付费金额;其他各日指标等等。
这里仅看后续单个事件。围绕该事件的人数,次数,总和,均值,累计等等指标。可以是登录,付费,可以是任意的事件。
主要计算逻辑:
1.统计发生初始事件的用户和时间。这个可能来源于用户表,或事件表2.初始事件和后续事件用户进行关联,计算后续事件和初始事件的时间差3.计算用户数时对用户进行去重统计。其他指标时进行对应的统计计算4.做透视。
查询代码如下:
select dates."登录日期a",count(distinct dates.user_id) as "当日活跃用户数",count(distinct case when dates."付费间隔天数"=0 then dates.user_id else null end) as "当日付费人数",count(distinct case when dates."付费间隔天数"=1 then dates.user_id else null end) as "次日付费人数",count(distinct case when dates."付费间隔天数"<=1 then dates.user_id else null end) as "次日累计付费人数",count(distinct case when dates."付费间隔天数"<=2 then dates.user_id else null end) as "3日累计付费人数",count(distinct case when dates."付费间隔天数"<=2 and dates."付费间隔天数">0 then dates.user_id else null end) as "3日累计(不包括首日)付费人数",concat(CAST(round(count(distinct case when dates."付费间隔天数"=1 then dates.user_id else null end)/count(distinct dates.user_id)*100,2)AS STRING),'%') as "次日付费留存率",concat(CAST(round(count(distinct case when dates."付费间隔天数"=2 then dates.user_id else null end)/count(distinct dates.user_id )*100,2) AS STRING),'%') as "3日付费留存率",sum(case when dates."付费间隔天数"=1 then dates."付费次数" else null end) as "次日付费次数" ,sum(case when dates."付费间隔天数"=2 then dates."付费次数" else null end) as "3日付费次数" ,sum(case when dates."付费间隔天数"<=1 then dates."付费次数" else null end) as "次日累计付费次数" ,sum(case when dates."付费间隔天数"<=2 then dates."付费次数" else null end) as "3日累计付费次数" ,sum(case when dates."付费间隔天数"=1 then dates."付费金额" else null end) as "次日付费金额" ,sum(case when dates."付费间隔天数"=2 then dates."付费金额" else null end) as "3日付费金额" ,sum(case when dates."付费间隔天数"=2 then dates."付费金额" else null end) as "次日累计付费金额" ,sum(case when dates."付费间隔天数"=1 then dates."付费金额" else null end)/count(distinct case when dates."付费间隔天数"=1 then dates.user_id else null end)as "次日人均付费金额",sum(case when dates."付费间隔天数"<=1 then dates."付费金额" else null end)/count(distinct dates.user_id) as "次日累计人均付金额",sum(case when dates."付费间隔天数"<=2 then dates."付费金额" else null end)/count(distinct dates.user_id) as "3日累计人均付费金额"from(select a.user_id,a."登录日期a",datediff (c."付费日期c",a."登录日期a") as "付费间隔天数","付费次数",c."付费金额"from(select distinct user_id,date as "登录日期a" from events where event='LoginResult')a left join(select user_id,date as "付费日期c",count(*) as "付费次数",sum(order_amount) as "付费金额" from events where event='PayOrder' group by user_id,date)cOn a.user_id=c.user_id and a."登录日期a"<=c."付费日期c")datesgroup by dates."登录日期a"order by dates."登录日期a"

2.想查看登录用户后续的付费留存数据。包括付费人数,付费次数,付费金额,累计付费金额。输出的数据格式如下图所示

可以看到这个输出了所有间隔天数的指标,而不用一个个的判断。这个比1中查询就是少了一步透视的工作,少了一步把不同间隔天数的同一指标(比如付费人数,付费率)作为列的转置判断。是按照间隔天数直接输出。
主要计算逻辑:
1.统计发生初始事件的用户和时间。这个可能来源于用户表,或事件表2.初始事件和后续事件用户进行关联,计算后续事件和初始事件的时间差3.计算用户数时对用户进行去重统计。其他指标时进行对应的统计计算。4.按照首日期,间隔天数,展示后续行为指标。5.这里为了把首日登录数据展示出来,需加首日用户数的关联
可用如下代码查询得到:
select c."登录日期",c."首日登录用户数",c."付费间隔天数",c. "付费人数",c."付费次数",c."付费金额",sum(c."付费次数") over(partition by c."登录日期" order by c."付费间隔天数" asc) as "累计付费次数",sum(c."付费金额") over(partition by c."登录日期" order by c."付费间隔天数" asc) as "累计付费金额"from(select a."登录日期",a."首日登录用户数",b."付费间隔天数",b. "付费人数",b."付费次数",b."付费金额" from(select date as "登录日期",count(distinct user_id) as "首日登录用户数" from events where event='LoginResult' group by date)aleft join(select dates."登录日期a",dates."付费间隔天数",count(distinct user_id) as "付费人数",sum(dates."付费次数")as "付费次数",sum(dates."付费金额") as "付费金额"from(select a.user_id,a."登录日期a",datediff (c."付费日期c",a."登录日期a") as "付费间隔天数",c."付费次数",c."付费金额"from(select distinct user_id,date as "登录日期a" from events where event='LoginResult')a left join(select user_id,date as "付费日期c",count(*) as "付费次数",sum(order_amount) as "付费金额" from events where event='PayOrder' group by user_id,date)cOn a.user_id=c.user_id and a."登录日期a"<=c."付费日期c")datesGROUP BY dates."登录日期a",dates."付费间隔天数" order by dates."登录日期a",dates."付费间隔天数" asc)bon a."登录日期"=b."登录日期a" order by "登录日期","付费间隔天数" asc)c order by "登录日期","付费间隔天数" asc

3.登录/活跃用户后续次日,3日留存率,付费率。数据输出格式如下图:

这个是后续事件为多个事件,是1,2的延申。这里的指标都是对用户留存率的计算,是对用户去重后的计算。
可用如下代码查询得到
select dates."登录日期a",count(distinct dates.user_id) as "当日活跃用户数",count(distinct case when dates."间隔天数"=1 then dates.user_id else null end) as "次日留存数",count(distinct case when dates."付费间隔天数"=1 then dates.user_id else null end) as "次日付费数",concat(CAST(round(count(distinct case when dates."间隔天数"=1 then dates.user_id else null end)/count(distinct dates.user_id)*100,2)AS STRING),'%') as "次日留存率",concat(CAST(round(count(distinct case when dates."间隔天数"=2 then dates.user_id else null end)/count(distinct dates.user_id )*100,2) AS STRING),'%') as "3日留存率",concat(CAST(round(count(distinct case when dates."付费间隔天数"=1 then dates.user_id else null end)/count(distinct dates.user_id)*100,2)AS STRING),'%') as "次日付费留存率",concat(CAST(round(count(distinct case when dates."付费间隔天数"=2 then dates.user_id else null end)/count(distinct dates.user_id )*100,2) AS STRING),'%') as "3日付费留存率"from(select a.user_id,a."登录日期a",b."登录日期b",datediff (b."登录日期b",a."登录日期a") as "间隔天数",datediff (c."付费日期c",a."登录日期a") as "付费间隔天数"from(select distinct user_id,date as "登录日期a" from events where event='LoginResult')a left join(select distinct user_id,date as "登录日期b" from events where event='LoginResult')bOn a.user_id=b.user_id and a."登录日期a"< b."登录日期b"left join(select distinct user_id,date as "付费日期c" from events where event='PayOrder')cOn a.user_id=c.user_id and a."登录日期a"< c."付费日期c")datesgroup by dates."登录日期a"order by dates."登录日期a"
主要计算逻辑,同1的逻辑。Tips:此代码不能用于类似,后续多个行为的付费金额,付费次数,累计付费金额,付费次数等的计算。因为此代码中在连续关联多个事件时没有做过多的条件关联,付费次数,付费金额会加倍。
4.各品类,后续各日复购人数,次数,复购金额。

与2中不一样的是,这里增加了按照品类。计算逻辑是,在关联的时候,增加品类的关联。
select dates."付费日期a",dates.first_commodity,dates."付费间隔天数",count(distinct user_id) as "付费人数",sum(dates."付费次数")as "付费次数",sum(dates."付费金额") as "付费金额"from(select a.user_id,a."付费日期a",a.first_commodity,datediff (c."付费日期c",a."付费日期a") as "付费间隔天数",c."付费次数",c."付费金额"from(select distinct user_id,first_commodity,date as "付费日期a" from events where event='PayOrder')a left join(select user_id,first_commodity,date as "付费日期c",count(*) as "付费次数",sum(order_amount) as "付费金额" from events where event='PayOrder' group by user_id,date,first_commodity)cOn a.user_id=c.user_id and a.first_commodity=c.first_commodity and a."付费日期a"<=c."付费日期c")dateswhere dates."付费日期a" is not null and dates."付费间隔天数" is not nullGROUP BY dates."付费日期a",dates.first_commodity,dates."付费间隔天数"order by dates."付费日期a",dates.first_commodity,dates."付费间隔天数" asc
5.漏斗分析。以下为一个漏斗用户转化查询(漏斗:浏览商品-加购-购买)。窗口期为24小时的。
本文不涉及一般数据产品的计算逻辑和实现的技术方案的探讨,仅对日常分析做小范围内的尝试。

计算逻辑比较简单,假设漏斗是A-B-C事件,窗口期为n,计算逻辑为:
1.找用户的每一个A事件及其时间,关联去找每个用户A窗口期内的B事件,即 t(b)>t(a) and t(b)<t(a)+n2.关联去找满足条件的C事件,条件为在B事件之后并且在A窗口期n范围内。即t(c)>t(b) and t(c)<t(a)+n3.找到这样的漏斗,按照日期对满足条件的各环节用户进行去重计算。即为用户漏斗转化。
查询代码如下:
SELECT pv.date as "日期",COUNT(DISTINCT pv.user_id) AS "浏览用户数",COUNT(DISTINCT cart.user_id) AS "浏览后加购用户数",COUNT(DISTINCT buy.user_id) AS "浏览后加购后购买用户数"FROM(SELECT DISTINCT user_id, time,dateFROM eventsWHERE event = 'CommodityDetail' and date between '2020-09-17' and '2020-09-30') AS pvLEFT JOIN(SELECT DISTINCT user_id, timeFROM eventsWHERE EVENT = 'SubmitOrder') AS cartON pv.user_id = cart.user_idAND pv.time < cart.time and cart.time<pv.time + INTERVAL 24 hourLEFT JOIN(SELECT DISTINCT user_id, timeFROM eventsWHERE EVENT = 'PayOrder') AS buyON cart.user_id = buy.user_idAND cart.time < buy.time and buy.time<pv.time + INTERVAL 24 hourgroup by pv.date
6.接着5中的用户漏斗。再说个次数漏斗的逻辑,假设漏斗是A-B-C,窗口期为n.
这里的次数漏斗逻辑是在用户的任意2个A事件之间且在A之后窗口期内,如果有按顺序依次完成漏斗内的后续事件,则认为该次漏斗发生转化。
计算逻辑为:
1.找每一个A及其发生时间为t(a)=t_start(a),并取该事件后续一个A事件的时间为t_end(a)2.关联找到每个满足条件的后续事件。比如,关联找到B事件。则B事件发生时间条件为A事件之后,且在窗口期内,且在t_start和t_end之间。即t(b)>t(a) andt>t_end(a) and t(b)<t(a)+n3.关联去找满足条件的C事件,条件为在B事件之后并且在A窗口期n范围内即t(c)>t(b) and andt>t_end(a) and t(b)<t(a)+n4.找到这样的漏斗,按照日期对满足条件的各环节user_id统计计数(注意这里不去重)。即为次数漏斗转化。
次数漏斗的具体的查询代码这里就不写了~有兴趣的可以自己查询玩

本文中所涉及的事件,均可根据自己查询时所涉及的事件进行修改。有没写清楚的地方,欢迎留言私聊!
这个国庆节感觉很多人的心情都跟过年似的
。祝大家国庆节快乐~祝伟大的祖国国泰民安,繁荣昌盛~~




