碎碎念:
这周全国各地大大小小的朋友们都加入了羊群,小主我也未能幸免,虽然前2天高烧到40多度,烧到自己从来没想过这辈子还能烧这么高,烧到自己都开始怀疑,是不是要烧坏了咋还没降温,好在第3天慢慢恢复,目前仍有咳喘。期间精神正常,游离在各种群家人朋友同学同事间,相互关心相互帮助的状态中。
坊间流传的新冠10大酷刑,你目前经历了几个
懂得都懂~

头脑清醒,那就不要停下。还是把想到的东西记录下来。
今天聊聊归因分析模型。以及怎么用SQL实现。
1.归因分析模型
在网上搜到一篇,写的很详细,也很好理解。包括什么是归因分析,几种常见的归因模型,以及如何选择归因模型,不明白的可以仔细看看。--不知道为什么粘贴不了链接和超链接。可自行搜索以下内容即可。
(7条消息) 归因分析(Attribution Analysis)模型解析_只有光头才能强的博客-CSDN博客_归因算法)
这里不再仔细介绍,以下会简要说几句什么是归因分析,几种常见的归因模型。
(7条消息) 归因分析(Attribution Analysis)模型解析_只有光头才能强的博客-CSDN博客_归因算法)或其他文献。
首次触点归因:首次互动的渠道获得100%的功劳。 末次触点归因:末次互动的渠道获得100%的功劳。 线性归因:对于链路上所有的渠道,平等地分配他们的贡献权重。 位置归因:基于位置的归因模型,也叫U型归因模型,它其实是混合使用了首次互动归因和末次互动归因的结果。即首末次贡献该40%的功劳,链路上其他触点平均分配剩下的20%的功劳。 时间递减归因:链路上离目标转化事件最近的贡献最大,时间越远贡献越小。
一般窗口期定义多久呢。比如7天、30天、90天、180天等,主要是为了规避掉一些时间上的干扰因素,具体我们还是要根据业务场景和分析需求而定,用户大致会在多长时间内做决策。可以分析通过分析用户的需求间隔等等。
2.归因分析的SQL实现
1.首先找到时间范围内的目标事件。然后找每个目标事件往前推窗口期内的待归因事件。2.对每个目标事件窗口期内的待归因事件按照时间进行排序。3.对目标事件窗口期内的待归因事件,按照不同的分析模型规则,设置权重。
##首次归因select c.e2,count(distinct c.u1) as renshu,count(c.u1) as gongxian_cishu,sum(mm) as gongxian_moneyfrom(select a.user_id as u1,a.time as t1,a.event as e1,a.mm,b.event as e2,b.time as t2,rank() over (partition by a.user_id,a.time order by a.user_id,b.time asc) as rkfrom(select user_id,event,date,time,total_price_of_commodity as mmfrom eventswhere event='PayOrderDetail' and date>='2020-09-24' and date<='2020-09-30')aleft join(select user_id,event,timefrom eventswhere (event='BannerClick' or event='SearchResultClick') and date>='2020-09-22'and date<='2020-09-30')bon a.user_id=b.user_id and b.time>date_sub(a.time, interval 1 day) and b.time<=a.time)cwhere c.rk=1group by c.e2
##末次归因select c.e2,count(distinct c.u1) as renshu,count(c.u1) as gongxian_cishu,sum(mm) as gongxian_moneyfrom(select a.user_id as u1,a.time as t1,a.event as e1,a.mm,b.event as e2,b.time as t2,rank() over (partition by a.user_id,a.time order by a.user_id,b.time desc) as rkfrom(select user_id,event,date,time,total_price_of_commodity as mmfrom eventswhere event='PayOrderDetail' and date>='2020-09-24' and date<='2020-09-30')aleft join(select user_id,event,timefrom eventswhere (event='BannerClick' or event='SearchResultClick') and date>='2020-09-22'and date<='2020-09-30')bon a.user_id=b.user_id and b.time>date_sub(a.time, interval 1 day) and b.time<=a.time)cwhere c.rk=1group by c.e2
##线性归因select d.e2,count(distinct d.u1) as renshu,sum(d.quanzhong) as gongxian_cishu,sum(d.mm*quanzhong) as gongxian_moneyfrom(select c.u1,c.t1,c.e1,c.mm,c.e2,c.t2,1/cn as quanzhongfrom(select a.user_id as u1,a.time as t1,a.event as e1,a.mm,b.event as e2,b.time as t2,count() over (partition by a.user_id,a.time) as cn,rank() over (partition by a.user_id,a.time order by a.user_id,b.time asc) as rk1,rank() over (partition by a.user_id,a.time order by a.user_id,b.time desc) as rk2from(select user_id,event,date,time,total_price_of_commodity as mmfrom eventswhere event='PayOrderDetail' and date>='2020-09-24' and date<='2020-09-30')aleft join(select user_id,event,timefrom eventswhere (event='BannerClick' or event='SearchResultClick') and date>='2020-09-22'and date<='2020-09-30')bon a.user_id=b.user_id and b.time>date_sub(a.time, interval 1 day) and b.time<=a.time)c)dgroup by d.e2
##位置归因select d.e2,count(distinct d.u1) as renshu,sum(d.quanzhong) as cishu,sum(d.mm*d.quanzhong) as gongxian_moneyfrom(selectc.u1,c.t1,c.mm,c.e1,c.e2,c.t2,(case when cn=1 then 1When cn=2 then 0.5when cn>2 and (rk1=1 or rk2=1) then 0.4when cn>2 and (rk1>1 and rk2>1) then 0.2/(cn-2) else 0 end )as quanzhongfrom(select a.user_id as u1,a.time as t1,a.event as e1,a.mm,b.event as e2,b.time as t2,count() over (partition by a.user_id,a.time) as cn,rank() over (partition by a.user_id,a.time order by a.user_id,b.time asc) as rk1,rank() over (partition by a.user_id,a.time order by a.user_id,b.time asc) as rk2from(select user_id,event,date,time,total_price_of_commodity as mmfrom eventswhere event='PayOrderDetail' and date>='2020-09-24' and date<='2020-09-30')aleft join(select user_id,event,timefrom eventswhere (event='BannerClick' or event='SearchResultClick') and date>='2020-09-22'and date<='2020-09-30')bon a.user_id=b.user_id and b.time>date_sub(a.time, interval 1 day) and b.time<=a.time)c)dgroup by d.e2
2.5 时间递减归因
看了上述模型的计算。应该也看到了,各类归因分析模型的区别主要是如何分配链路上待归因事件的比重。假设时间递减归因中,待归因事件比重根据距离归因事件的时长的倒数,则需再对倒数做归一化, 【 1/(t(i)-t0)】除以所有【1/(t(i)-t0)】求和。其他亦可类比。
仔细看看,没那么难吧。这次分享的主要就是这些。欢迎留言私聊:
最后碎碎的:
疫情即将迎来胜利。希望大家坚持到最后,进入决赛圈的继续坚持,注意防护。加入羊圈的不恐惧,注意好好休养,早日康复。杨过的注意防护以防2次感染。虽然致死率不高,但是生命只有一次,大家都好好照顾自己。
希望大家能早日摘下口罩,尽情的呼吸自由的空气!




