
[Big Data Development] Oozie Case Development (Part 9)

数据信息化 2020-07-22

Case Development: Static Data Retrieval

The main files involved in this case:


1. The job.properties file, contents as follows (note that oozie.coord.application.path is commented out; to drive the workflow from the coordinator in step 3, uncomment it and comment out oozie.wf.application.path, since Oozie accepts only one application path per submission):

nameNode=hdfs://bigdata-pro-m01.kfk.com:9000
jobTracker=bigdata-pro-m01.kfk.com:8032
queueName=default
oozieAppRoot=user/kfk/oozie-apps
oozieDataRoot=user/kfk/oozie/datas

script=load_track_log.sh
EXEC=export_visit.txt

SQL=hive-visi.sql
shellHive=daily_hour_visit.sh

#oozie.coord.application.path=${nameNode}/${oozieAppRoot}/project
start=2018-10-27T15:00+0800
end=2018-10-27T16:20+0800

oozie.use.system.libpath=true

workflowAppUri=${nameNode}/${oozieAppRoot}/project
oozie.wf.application.path=${nameNode}/${oozieAppRoot}/project


2. The workflow.xml file

Based on the requirements, we implemented two approaches for the workflow:


Approach 1 (static data retrieval). The three actions below are chained together in workflow.xml; a sketch of a possible file follows after step (3).

(1) Data loading (load data)  ->  shell action

The load_track_log.sh script, contents as follows:

#!/bin/sh
. /etc/profile

## local directory holding the track log files
LOD_DIR=/opt/track/

## hive home
HIVE_HOME=/opt/modules/apache-hive-2.3.6-bin

## yesterday's date, e.g. 20181026
yesterday=`date -d "1 day ago" +"%Y%m%d"`

cd $LOD_DIR
## each file under $yesterday is named by date and hour
for line in `ls $yesterday`;
do
    ## day = first 8 characters, hour = characters 9-10 of the file name
    date=${line:0:8}
    hour=${line:8:2}

    ## load the hourly file into the matching partition of track.track_log
    $HIVE_HOME/bin/hive -e "load data local inpath '$LOD_DIR/$yesterday/$line' overwrite into table track.track_log partition(p_day='$date',p_hour='$hour')"
done
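
After a load, the hourly partitions can be checked from the Hive CLI; a quick sanity check might look like this:

/opt/modules/apache-hive-2.3.6-bin/bin/hive -e "show partitions track.track_log;"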


(2) Hive data analysis   ->  hive action

The hive-visi.sql script, contents as follows (the ${YESTERDAY} variable is expected to be supplied by the workflow, for example via the hive action's <param> element, as in the sketch after step (3)):

use track;
truncate table daily_hour_visit;


insert into daily_hour_visit
select date,hour,count(url) pv,count(distinct guid) uv
from track_log where date='${YESTERDAY}' group by date,hour;


(3) Sqoop data export    ->  sqoop action

The export_visit.txt file (one sqoop argument per line), contents as follows:

export
--connect
jdbc:mysql://bigdata-pro-m01.kfk.com:3306/track
--username
root
--password
12345678
--table
daily_hour_visit
--num-mappers
1
--export-dir
/user/hive/warehouse/track.db/daily_hour_visit
--fields-terminated-by
'\t'
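
The original post does not reproduce workflow.xml itself. Below is a minimal sketch of how the three actions above could be chained; the node names, schema versions, and the hardcoded YESTERDAY value are assumptions, not the author's exact file:

<workflow-app xmlns="uri:oozie:workflow:0.5" name="static-visit-wf">
    <start to="load-node"/>

    <!-- (1) shell action: run load_track_log.sh on a worker node -->
    <action name="load-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>${script}</exec>
            <!-- ship the script from the app directory to the task's working dir -->
            <file>${nameNode}/${oozieAppRoot}/project/${script}#${script}</file>
        </shell>
        <ok to="hive-node"/>
        <error to="fail"/>
    </action>

    <!-- (2) hive action: run hive-visi.sql; YESTERDAY is passed in statically -->
    <!-- a <job-xml> pointing at hive-site.xml may be needed for metastore access; omitted here -->
    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.5">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>${SQL}</script>
            <!-- hypothetical hardcoded date, the "static" aspect of this approach -->
            <param>YESTERDAY=20181026</param>
        </hive>
        <ok to="sqoop-node"/>
        <error to="fail"/>
    </action>

    <!-- (3) sqoop action: arguments mirror export_visit.txt, one <arg> per line -->
    <action name="sqoop-node">
        <sqoop xmlns="uri:oozie:sqoop-action:0.3">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <arg>export</arg>
            <arg>--connect</arg>
            <arg>jdbc:mysql://bigdata-pro-m01.kfk.com:3306/track</arg>
            <arg>--username</arg>
            <arg>root</arg>
            <arg>--password</arg>
            <arg>12345678</arg>
            <arg>--table</arg>
            <arg>daily_hour_visit</arg>
            <arg>--num-mappers</arg>
            <arg>1</arg>
            <arg>--export-dir</arg>
            <arg>/user/hive/warehouse/track.db/daily_hour_visit</arg>
            <arg>--fields-terminated-by</arg>
            <arg>\t</arg>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Workflow failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>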


3. The coordinator.xml file (triggers the workflow every 10 minutes between ${start} and ${end}):

<coordinator-app name="cron-coord" frequency="${coord:minutes(10)}" start="${start}" end="${end}" timezone="GMT+0800"
                 xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <app-path>${workflowAppUri}</app-path>
            <configuration>
                <property>
                    <name>jobTracker</name>
                    <value>${jobTracker}</value>
                </property>
                <property>
                    <name>nameNode</name>
                    <value>${nameNode}</value>
                </property>
                <property>
                    <name>queueName</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>


4. The lib directory (contains the MySQL JDBC driver jar, needed by the sqoop action)


5. Note:

When a shell action depends on local services or data, those services or data must be present on the DataNode machines, because the shell action is launched on an arbitrary worker node of the cluster.


6. Command to submit and run the job:

bin/oozie job -oozie http://bigdata-pro-m01.kfk.com:11000/oozie -config oozie-apps/project/job.properties -run
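
Once submitted, Oozie prints a job ID, which can then be used to track progress with the -info subcommand (the job ID below is a placeholder):

bin/oozie job -oozie http://bigdata-pro-m01.kfk.com:11000/oozie -info <job-id>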



Case Development: Dynamic Data Retrieval

The main files involved in this case:


1. The job.properties file (identical to the static case), contents as follows:

nameNode=hdfs://bigdata-pro-m01.kfk.com:9000
jobTracker=bigdata-pro-m01.kfk.com:8032
queueName=default
oozieAppRoot=user/kfk/oozie-apps
oozieDataRoot=user/kfk/oozie/datas

script=load_track_log.sh
EXEC=export_visit.txt

SQL=hive-visi.sql
shellHive=daily_hour_visit.sh

#oozie.coord.application.path=${nameNode}/${oozieAppRoot}/project
start=2018-10-27T15:00+0800
end=2018-10-27T16:20+0800

oozie.use.system.libpath=true

workflowAppUri=${nameNode}/${oozieAppRoot}/project
oozie.wf.application.path=${nameNode}/${oozieAppRoot}/project


2. The workflow.xml file

Based on the requirements, we implemented two approaches for the workflow:


Approach 2 (dynamic data retrieval: the previous day's date is computed at run time instead of being passed in statically). A sketch of the shell action node for step (2) follows after the daily_hour_visit.sh script.

(1) Data loading (load data)  ->  shell action

The load_track_log.sh script (identical to the static case):

#!/bin/sh
. /etc/profile

## local directory holding the track log files
LOD_DIR=/opt/track/

## hive home
HIVE_HOME=/opt/modules/apache-hive-2.3.6-bin

## yesterday's date, e.g. 20181026
yesterday=`date -d "1 day ago" +"%Y%m%d"`

cd $LOD_DIR
## each file under $yesterday is named by date and hour
for line in `ls $yesterday`;
do
    ## day = first 8 characters, hour = characters 9-10 of the file name
    date=${line:0:8}
    hour=${line:8:2}

    ## load the hourly file into the matching partition of track.track_log
    $HIVE_HOME/bin/hive -e "load data local inpath '$LOD_DIR/$yesterday/$line' overwrite into table track.track_log partition(p_day='$date',p_hour='$hour')"
done


(2) Hive data analysis   ->  shell action

This step combines daily_hour_visit.sh with hive-visi.sql:


The daily_hour_visit.sh script (the original post truncates the last line at "$"; the $1 below, which expects the SQL file path as the first argument, is an assumption):

#!/bin/sh

## compute yesterday's date at run time, e.g. 20181026
yesterday=`date -d "1 day ago" +"%Y%m%d"`

## run the SQL file, passing the date in as a hiveconf variable;
## "$1" (the SQL file path, e.g. hive-visi.sql) is an assumption,
## since the original line is truncated after "$"
/opt/modules/apache-hive-2.3.6-bin/bin/hive --hiveconf yesterday=${yesterday} -f $1
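
A minimal sketch of the shell action node for this step is shown below. It assumes both daily_hour_visit.sh and hive-visi.sql are shipped to the task's working directory via <file> elements, with the SQL file name passed as the first argument (matching the $1 in the script above); node names and schema versions are assumptions:

<action name="hive-shell-node">
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
        </configuration>
        <exec>${shellHive}</exec>
        <!-- the SQL file name becomes $1 inside daily_hour_visit.sh -->
        <argument>${SQL}</argument>
        <!-- ship both the driver script and the SQL file to the working dir -->
        <file>${nameNode}/${oozieAppRoot}/project/${shellHive}#${shellHive}</file>
        <file>${nameNode}/${oozieAppRoot}/project/${SQL}#${SQL}</file>
    </shell>
    <ok to="sqoop-node"/>
    <error to="fail"/>
</action>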


The hive-visi.sql script, contents as follows:

use track;
truncate table daily_hour_visit;


insert into daily_hour_visit
select date,hour,count(url) pv,count(distinct guid) uv
from track_log where date='${hiveconf:yesterday}' group by date,hour;
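
The --hiveconf mechanism can be tested outside Oozie; Hive substitutes the value wherever ${hiveconf:yesterday} appears in the script (the date below is a placeholder):

## 20181026 is a placeholder date
/opt/modules/apache-hive-2.3.6-bin/bin/hive --hiveconf yesterday=20181026 -f hive-visi.sql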


(3) Sqoop data export    ->  sqoop action

The export_visit.txt file (one sqoop argument per line), contents as follows:

export
--connect
jdbc:mysql://bigdata-pro-m01.kfk.com:3306/track
--username
root
--password
12345678
--table
daily_hour_visit
--num-mappers
1
--export-dir
/user/hive/warehouse/track.db/daily_hour_visit
--fields-terminated-by
'\t'


3. The coordinator.xml file (identical to the static case):

<coordinator-app name="cron-coord" frequency="${coord:minutes(10)}" start="${start}" end="${end}" timezone="GMT+0800"
                 xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <app-path>${workflowAppUri}</app-path>
            <configuration>
                <property>
                    <name>jobTracker</name>
                    <value>${jobTracker}</value>
                </property>
                <property>
                    <name>nameNode</name>
                    <value>${nameNode}</value>
                </property>
                <property>
                    <name>queueName</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>


4. The lib directory (contains the MySQL JDBC driver jar, needed by the sqoop action)


5. Note:

When a shell action depends on local services or data, those services or data must be present on the DataNode machines, because the shell action is launched on an arbitrary worker node of the cluster.


6. Command to submit and run the job:

bin/oozie job -oozie http://bigdata-pro-m01.kfk.com:11000/oozie -config oozie-apps/project/job.properties -run

