http://spark.apache.org/
1. Download and upload the package
spark-2.2.0-bin-hadoop2.7.tgz
Unpack: tar -zxvf spark-2.2.0-bin-hadoop2.7.tgz
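If the package is not on the machine yet, a minimal download-and-unpack sketch (the Apache archive URL and the /root/training directory are assumptions based on this guide's layout):
cd /root/training
# download from the Apache archive; any mirror carrying 2.2.0 works
wget https://archive.apache.org/dist/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz
tar -zxvf spark-2.2.0-bin-hadoop2.7.tgz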
2. Prepare four machines
bigdata01,bigdata02,bigdata03,bigdata04
Master:bigdata01,bigdata02
Worker:bigdata01,bigdata02,bigdata03,bigdata04
3. Edit the configuration files
/root/training/spark-2.2.0-bin-hadoop2.7/conf
3.1 Edit spark-env.sh (basic settings)
mv spark-env.sh.template spark-env.sh
vim spark-env.sh
Choose standalone mode: under the template section "# Options for the daemons used in the standalone deploy mode", add:
export JAVA_HOME=/root/training/jdk1.8.0_144 (in vim, the path can be pulled in with :r!which java)
export SPARK_MASTER_HOST=bigdata01
export SPARK_MASTER_PORT=7077
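Optionally, the resources each Worker offers can be capped in the same file; the values below are illustrative assumptions, not required for this setup:
# optional: limit cores and memory per Worker (example values)
export SPARK_WORKER_CORES=2
export SPARK_WORKER_MEMORY=2g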
3.2 Edit slaves, the file listing the nodes that actually run tasks
mv slaves.template slaves
vim slaves
bigdata01
bigdata02
bigdata03
bigdata04
3.3 Copy to the other machines
for i in {2..4}; do
  scp -r /root/training/spark-2.2.0-bin-hadoop2.7/ bigdata0$i:$PWD
done
4. Start the cluster. Ideally the daemons are started with the individual scripts (start-master.sh and start-slave.sh); since this is only a simple setup, start-all.sh is used directly.
If passwordless SSH login is not configured yet, set it up first (a minimal sketch follows); otherwise a password has to be entered for every node that is started.
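A minimal sketch of that passwordless setup, run on bigdata01 (assuming the root account is used on all four nodes, as elsewhere in this guide):
# generate a key pair once (press Enter through the prompts)
ssh-keygen -t rsa
# push the public key to every node, including bigdata01 itself
for i in {1..4}; do ssh-copy-id root@bigdata0$i; done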
cd /root/training/spark-2.2.0-bin-hadoop2.7
sbin/start-all.sh
jps
Only bigdata01 runs both a Master and a Worker; the other machines each run only a Worker.
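To check all four nodes in one go, a quick loop over ssh can be used; the full jps path is given because a non-interactive ssh shell may not have the JDK on its PATH (the JDK path is the one from spark-env.sh above):
for i in {1..4}; do echo "== bigdata0$i =="; ssh bigdata0$i /root/training/jdk1.8.0_144/bin/jps; done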
5. View the Spark cluster in a browser
http://bigdata01:8080/ (netty)
URL: spark://bigdata01:7077
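To confirm the cluster actually accepts jobs, a simple smoke test is to submit the bundled SparkPi example (the examples jar name assumes the Scala 2.11 build shipped in this tarball):
cd /root/training/spark-2.2.0-bin-hadoop2.7
# run SparkPi on the standalone master; the trailing 100 is the number of tasks
bin/spark-submit --master spark://bigdata01:7077 --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.11-2.2.0.jar 100
While it runs, the job appears on http://bigdata01:8080/, and the driver output should end with an approximation of Pi.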




