暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

留存,漏斗分析简单的查询逻辑

填充空白 2022-10-05
405

碎碎念:上次说写个用SQL查询留存分析的。其实也是很多年前用过的。重新写好后发现有好多人写过。不做比较,仅作记录。附送一个漏斗的逻辑,嘻嘻嘻~~~本文中涉及的计算逻辑与各数据产品的计算逻辑无关,仅代表自己的查询逻辑,供日常分析参考。


已知有表名为events,记录着用户的行为日志,如下表所示,需做留存分析,漏斗分析。


  1. 想要查询每日活跃用户留存指标。想要输出如下图数据格式这里演示的输出指标比较多,根据需求适当展示即可,因为表格较长,所以用导出后用excel缩短了列宽截的图)

输出数据为一张大宽表,横轴表示日期,纵轴分别每日当日初始事件人数,及这些用户后续事件后续各日指标数据。

包括:每日登录人数;次日付费人数,次日付费留存率,次日付费次数,次日付费金额,次日人均付费金额,次日累计人均付费金额;3日相关数据,3日付费人数,3日累计人均付费金额;其他各日指标等等。

这里仅看后续单个事件。围绕该事件的人数,次数,总和,均值,累计等等指标。可以是登录,付费,可以是任意的事件。

主要计算逻辑:

    1.统计发生初始事件的用户和时间。这个可能来源于用户表,或事件表
    2.初始事件和后续事件用户进行关联,计算后续事件和初始事件的时间差
    3.计算用户数时对用户进行去重统计。其他指标时进行对应的统计计算
    4.做透视。

    查询代码如下:

      select dates."登录日期a",count(distinct dates.user_id)  as "当日活跃用户数",
      count(distinct case when dates."付费间隔天数"=0 then dates.user_id else null end) as "当日付费人数",
      count(distinct case when dates."付费间隔天数"=1 then dates.user_id else null end) as "次日付费人数",
      count(distinct case when dates."付费间隔天数"<=1 then dates.user_id else null end) as "次日累计付费人数",
      count(distinct case when dates."付费间隔天数"<=2 then dates.user_id else null end) as "3日累计付费人数",
      count(distinct case when dates."付费间隔天数"<=2 and dates."付费间隔天数">0 then dates.user_id else null end) as "3日累计(不包括首日)付费人数",


      concat(CAST(round(count(distinct case when dates."付费间隔天数"=1 then dates.user_id else null end)/count(distinct dates.user_id)*100,2)AS STRING),'%') as "次日付费留存率",
      concat(CAST(round(count(distinct case when dates."付费间隔天数"=2 then dates.user_id else null end)/count(distinct dates.user_id )*100,2) AS STRING),'%') as "3日付费留存率",




      sum(case when dates."付费间隔天数"=1 then dates."付费次数" else null end) as "次日付费次数" ,
      sum(case when dates."付费间隔天数"=2 then dates."付费次数" else null end) as "3日付费次数" ,
      sum(case when dates."付费间隔天数"<=1 then dates."付费次数" else null end) as "次日累计付费次数" ,
      sum(case when dates."付费间隔天数"<=2 then dates."付费次数" else null end) as "3日累计付费次数" ,
      sum(case when dates."付费间隔天数"=1 then dates."付费金额" else null end) as "次日付费金额" ,
      sum(case when dates."付费间隔天数"=2 then dates."付费金额" else null end) as "3日付费金额" ,
      sum(case when dates."付费间隔天数"=2 then dates."付费金额" else null end) as "次日累计付费金额" ,




      sum(case when dates."付费间隔天数"=1 then dates."付费金额" else null end)/count(distinct case when dates."付费间隔天数"=1 then dates.user_id else null end)
      as "次日人均付费金额",
      sum(case when dates."付费间隔天数"<=1 then dates."付费金额" else null end)/count(distinct dates.user_id) as "次日累计人均付金额",
      sum(case when dates."付费间隔天数"<=2 then dates."付费金额" else null end)/count(distinct dates.user_id) as "3日累计人均付费金额"




      from
      (select a.user_id,a."登录日期a",
      datediff (c."付费日期c",a."登录日期a") as "付费间隔天数","付费次数",c."付费金额"
      from
      (select distinct user_id,date as "登录日期a" from events where event='LoginResult')a left join
      (select user_id,date as "付费日期c",count(*) as "付费次数",sum(order_amount) as "付费金额" from events where event='PayOrder' group by user_id,date)c
      On a.user_id=c.user_id and a."登录日期a"<=c."付费日期c"
      )dates
      group by dates."登录日期a"
      order by dates."登录日期a"


      2.想查看登录用户后续的付费留存数据。包括付费人数,付费次数,付费金额,累计付费金额。输出的数据格式如下图所示

      可以看到这个输出了所有间隔天数的指标,而不用一个个的判断。这个比1中查询就是少了一步透视的工作,少了一步把不同间隔天数的同一指标(比如付费人数,付费率)作为列的转置判断。是按照间隔天数直接输出。

      主要计算逻辑:

        1.统计发生初始事件的用户和时间。这个可能来源于用户表,或事件表
        2.初始事件和后续事件用户进行关联,计算后续事件和初始事件的时间差
        3.计算用户数时对用户进行去重统计。其他指标时进行对应的统计计算。
        4.按照首日期,间隔天数,展示后续行为指标。
        5.这里为了把首日登录数据展示出来,需加首日用户数的关联
        • 可用如下代码查询得到

          select  c."登录日期",c."首日登录用户数",c."付费间隔天数",c. "付费人数",c."付费次数",c."付费金额",
          sum(c."付费次数") over(partition by c."登录日期" order by c."付费间隔天数" asc) as "累计付费次数",
          sum(c."付费金额") over(partition by c."登录日期" order by c."付费间隔天数" asc) as "累计付费金额"
          from




          (select a."登录日期",a."首日登录用户数",b."付费间隔天数",b. "付费人数",b."付费次数",b."付费金额" from
          (select date as "登录日期",count(distinct user_id) as "首日登录用户数" from events where event='LoginResult' group by date)a
          left join
          (select dates."登录日期a",dates."付费间隔天数",
          count(distinct user_id) as "付费人数",
          sum(dates."付费次数")as "付费次数",
          sum(dates."付费金额") as "付费金额"


          from
          (select a.user_id,a."登录日期a",
          datediff (c."付费日期c",a."登录日期a") as "付费间隔天数",c."付费次数",c."付费金额"
          from
          (select distinct user_id,date as "登录日期a" from events where event='LoginResult')a left join
          (select user_id,date as "付费日期c",count(*) as "付费次数",sum(order_amount) as "付费金额" from events where event='PayOrder' group by user_id,date)c
          On a.user_id=c.user_id and a."登录日期a"<=c."付费日期c"
          )dates
          GROUP BY dates."登录日期a",dates."付费间隔天数" order by dates."登录日期a",dates."付费间隔天数" asc)b


          on a."登录日期"=b."登录日期a" order by "登录日期","付费间隔天数" asc)c order by "登录日期","付费间隔天数" asc


          3.登录/活跃用户后续次日,3日留存率,付费率。数据输出格式如下图:

          这个是后续事件为多个事件,是1,2的延申。这里的指标都是对用户留存率的计算,是对用户去重后的计算。

          可用如下代码查询得到

            select dates."登录日期a",count(distinct dates.user_id)  as "当日活跃用户数",
            count(distinct case when dates."间隔天数"=1 then dates.user_id else null end) as "次日留存数",
            count(distinct case when dates."付费间隔天数"=1 then dates.user_id else null end) as "次日付费数",


            concat(CAST(round(count(distinct case when dates."间隔天数"=1 then dates.user_id else null end)/count(distinct dates.user_id)*100,2)AS STRING),'%') as "次日留存率",
            concat(CAST(round(count(distinct case when dates."间隔天数"=2 then dates.user_id else null end)/count(distinct dates.user_id )*100,2) AS STRING),'%') as "3日留存率",


            concat(CAST(round(count(distinct case when dates."付费间隔天数"=1 then dates.user_id else null end)/count(distinct dates.user_id)*100,2)AS STRING),'%') as "次日付费留存率",
            concat(CAST(round(count(distinct case when dates."付费间隔天数"=2 then dates.user_id else null end)/count(distinct dates.user_id )*100,2) AS STRING),'%') as "3日付费留存率"




            from
            (select a.user_id,a."登录日期a",b."登录日期b",
            datediff (b."登录日期b",a."登录日期a") as "间隔天数",
            datediff (c."付费日期c",a."登录日期a") as "付费间隔天数"
            from
            (select distinct user_id,date as "登录日期a" from events where event='LoginResult')a left join
            (select distinct user_id,date as "登录日期b" from events where event='LoginResult')b
            On a.user_id=b.user_id and a."登录日期a"< b."登录日期b"
            left join
            (select distinct user_id,date as "付费日期c" from events where event='PayOrder')c
            On a.user_id=c.user_id and a."登录日期a"< c."付费日期c"
            )dates
            group by dates."登录日期a"
            order by dates."登录日期a"

            主要计算逻辑同1的逻辑。Tips:此代码不能用于类似,后续多个行为的付费金额,付费次数,累计付费金额,付费次数等的计算。因为此代码中在连续关联多个事件时没有做过多的条件关联,付费次数,付费金额会加倍。

            4.各品类,后续各日复购人数,次数,复购金额。


            与2中不一样的是,这里增加了按照品类。计算逻辑是,在关联的时候,增加品类的关联。

              select dates."付费日期a",dates.first_commodity,dates."付费间隔天数",
              count(distinct user_id) as "付费人数",
              sum(dates."付费次数")as "付费次数",
              sum(dates."付费金额"as "付费金额"


              from
              (select a.user_id,a."付费日期a",a.first_commodity,
              datediff (c."付费日期c",a."付费日期a") as "付费间隔天数",c."付费次数",c."付费金额"
              from
              (select distinct user_id,first_commodity,date as "付费日期a" from events where event='PayOrder')a left join
              (select user_id,first_commodity,date as "付费日期c",count(*) as "付费次数",sum(order_amount) as "付费金额" from events where event='PayOrder' group by user_id,date,first_commodity)c
              On a.user_id=c.user_id and a.first_commodity=c.first_commodity and a."付费日期a"<=c."付费日期c"
              )dates
              where dates."付费日期a" is not null and dates."付费间隔天数" is not null
              GROUP BY dates."付费日期a",dates.first_commodity,dates."付费间隔天数"
              order by dates."付费日期a",dates.first_commodity,dates."付费间隔天数" asc





              5.漏斗分析。以下为一个漏斗用户转化查询(漏斗:浏览商品-加购-购买)。窗口期为24小时的。

              本文不涉及一般数据产品的计算逻辑和实现的技术方案的探讨,仅对日常分析做小范围内的尝试。

              计算逻辑比较简单,假设漏斗是A-B-C事件,窗口期为n,计算逻辑为:

                1.找用户的每一个A事件及其时间,关联去找每个用户A窗口期内的B事件,即 t(b)>t(a) and t(b)<t(a)+n
                2.关联去找满足条件的C事件,条件为在B事件之后并且在A窗口期n范围内。即t(c)>t(b) and t(c)<t(a)+n
                3.找到这样的漏斗,按照日期对满足条件的各环节用户进行去重计算。即为用户漏斗转化。


                查询代码如下:

                  SELECT pv.date as "日期",
                  COUNT(DISTINCT pv.user_id) AS "浏览用户数",
                  COUNT(DISTINCT cart.user_id) AS "浏览后加购用户数",
                  COUNT(DISTINCT buy.user_id) AS "浏览后加购后购买用户数"
                  FROM
                  (
                  SELECT DISTINCT user_id, time,date
                  FROM events
                  WHERE event = 'CommodityDetail' and date between '2020-09-17' and '2020-09-30'
                  ) AS pv


                  LEFT JOIN
                  (
                  SELECT DISTINCT user_id, time
                  FROM events
                  WHERE EVENT = 'SubmitOrder'
                  ) AS cart
                  ON pv.user_id = cart.user_id
                  AND pv.time < cart.time and cart.time<pv.time  + INTERVAL 24 hour


                  LEFT JOIN
                  (
                  SELECT DISTINCT user_id, time
                  FROM events
                  WHERE EVENT = 'PayOrder'
                  ) AS buy
                  ON cart.user_id = buy.user_id
                  AND cart.time < buy.time and buy.time<pv.time + INTERVAL 24 hour
                  group by pv.date



                  6.接着5中的用户漏斗。再说个次数漏斗的逻辑,假设漏斗是A-B-C,窗口期为n.

                  这里的次数漏斗逻辑是在用户的任意2个A事件之间且在A之后窗口期内,如果有按顺序依次完成漏斗内的后续事件,则认为该次漏斗发生转化。

                  计算逻辑为:

                    1.找每一个A及其发生时间为t(a)=t_start(a),并取该事件后续一个A事件的时间为t_end(a)
                    2.关联找到每个满足条件的后续事件。比如,关联找到B事件。则B事件发生时间条件为A事件之后,且在窗口期内,
                    且在t_start和t_end之间。即t(b)>t(a) andt>t_end(a) and t(b)<t(a)+n
                    3.关联去找满足条件的C事件,条件为在B事件之后并且在A窗口期n范围内
                    即t(c)>t(b) and andt>t_end(a) and t(b)<t(a)+n
                    4.找到这样的漏斗,按照日期对满足条件的各环节user_id统计计数(注意这里不去重)。即为次数漏斗转化。

                    次数漏斗的具体的查询代码这里就不写了~有兴趣的可以自己查询玩



                    本文中所涉及的事件,均可根据自己查询时所涉及的事件进行修改。有没写清楚的地方,欢迎留言私聊!

                    这个国庆节感觉很多人的心情都跟过年似的。祝大家国庆节快乐~祝伟大的祖国国泰民安,繁荣昌盛~~


                    文章转载自填充空白,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

                    评论