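It has been a while since my last update.
I recently joined a project to build a big data platform, so I have been studying related material. This post covers setting up a CDH environment. For privacy, the real IP addresses and similar details have been changed. Even for a test setup, I recommend using at least three machines.
Below is a rough summary of the setup process and the pitfalls I ran into, for reference.
PS: I have actually built this several times. The first time was a fully manual installation, done by hand on every node; with only three nodes that was manageable. Then I was asked what would happen with 100 nodes, where installing that way would be exhausting, so I rebuilt 6.3.2 following the official tutorial. After that I was told production runs 6.3.3, so I did it once more, following a company's setup document.
Overview: CDH is essentially a distribution (branch) of Hadoop and is widely used commercially.
The Cloudera distribution (Cloudera's Distribution Including Apache Hadoop, "CDH" for short) provides a web-based user interface and supports most Hadoop components, including HDFS, MapReduce, Hive, Pig, HBase, ZooKeeper, and Sqoop, which makes a big data platform easier to install and use.
Official documentation: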
https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/installation.html
Prepare three machines:
192.168.247.142 ORACLE-TEST2
192.168.247.143 ORACLE-TEST3
192.168.247.144 ORACLE-TEST4
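Operating system: Red Hat 7.4
Installation packages (at the moment the official site only offers CM 6.3.1 and the 6.3.2 parcel; lower versions can be installed, and I recommend using the same version for CM and the parcel):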
jdk-8u231-linux-x64.tar.gz
mysql-5.7.28-1.el7.x86_64.rpm-bundle.tar
cm6.3.3-redhat7.tar.gz
CDH-6.3.3-1.cdh6.3.3.p0.1796617-el7.parcel
manifest.json
1. Disable the firewall and SELinux
systemctl stop firewalld.service
systemctl disable firewalld.service
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
setenforce 0
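To confirm the SELinux setting on each node before continuing, a small check like the following may help. It is only a sketch: the function name is made up for illustration, and it assumes the setting lives in /etc/selinux/config.

```shell
#!/bin/sh
# Print the SELINUX= value found in an SELinux config file.
# selinux_mode_from_file is an illustrative helper name, not a system tool.
selinux_mode_from_file() {
    conf="${1:-/etc/selinux/config}"
    # Match only uncommented SELINUX= lines (SELINUXTYPE= does not match);
    # if there is more than one, print the last.
    sed -n 's/^SELINUX=\(.*\)$/\1/p' "$conf" | tail -n 1
}

# Usage (on each node):
#   selinux_mode_from_file            # expected to print "disabled"
```

The value in the file only takes effect after a reboot; `getenforce` shows the mode currently in force.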
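2. Configure NTP
For NTP synchronization, set up an NTP server inside the cluster: the management node acts as the NTP server, and the other nodes synchronize their clocks with it.
First install the NTP package on all nodes:
yum -y install ntp
Start the NTP service, enable it at boot, and check its status:
systemctl restart ntpd.service
systemctl enable ntpd.service
systemctl status ntpd.service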
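The commands above do not show the /etc/ntp.conf changes themselves. As a rough sketch, the helper below prints the lines one might add on each kind of node. It assumes 192.168.247.142 is the management node and that the server falls back to its local clock; neither is spelled out in the original setup, and the function name is hypothetical.

```shell
#!/bin/sh
# Print candidate ntp.conf lines for a given role.
# ntp_conf_lines is an illustrative helper; review the output before
# appending it to /etc/ntp.conf.
ntp_conf_lines() {
    role="$1"
    server_ip="$2"
    case "$role" in
        server)
            # Local clock as a fallback source (assumption: no upstream NTP).
            printf 'server 127.127.1.0\n'
            printf 'fudge 127.127.1.0 stratum 10\n'
            ;;
        client)
            # Other nodes sync against the management node.
            printf 'server %s iburst\n' "$server_ip"
            ;;
        *)
            echo "usage: ntp_conf_lines server|client [server_ip]" >&2
            return 1
            ;;
    esac
}

# Usage on a non-management node:
#   ntp_conf_lines client 192.168.247.142 >> /etc/ntp.conf
#   systemctl restart ntpd.service
```

After restarting ntpd, `ntpq -p` on a client should list the management node as a peer.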
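3. Configure passwordless SSH (mutual trust)
ssh-keygen -N ""
ssh-copy-id 192.168.247.143
ssh-copy-id 192.168.247.144
Repeat this on each of the other machines so that every node trusts the other two.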
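With more than a handful of nodes, typing these commands by hand gets tedious. A minimal sketch of a helper that prints the ssh-copy-id commands for every host except the local one follows; the function name is made up for illustration, and it only prints the commands so they can be reviewed before running.

```shell
#!/bin/sh
# Print one ssh-copy-id command per remote host.
# First argument: this node's own address; remaining arguments: all nodes.
ssh_copy_targets() {
    self="$1"
    shift
    for host in "$@"; do
        # Skip the local node; it does not need to copy a key to itself.
        [ "$host" = "$self" ] && continue
        printf 'ssh-copy-id %s\n' "$host"
    done
}

# Usage on 192.168.247.142 (pipe to sh once the output looks right):
#   ssh_copy_targets 192.168.247.142 \
#       192.168.247.142 192.168.247.143 192.168.247.144
```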
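If the root user cannot log in remotely, that also needs to be configured.
On the other machines, run the following:
[root@ORACLE-TEST3 ~]# who
admin pts/0 2020-03-24 13:42 (10.141.24.13)
vi /etc/securetty
Add the following lines:
pts/1
pts/2
pts/3
4. Install MySQL on 192.168.247.142 and 192.168.247.143
Installing from the RPM packages is recommended. The installation may require the Perl JSON package:
yum install -y perl-JSON
Create the databases and users, and grant privileges: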
create database metastore default character set utf8;
CREATE USER 'hive'@'%' IDENTIFIED BY 'hive';
GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'%';
create database rman default character set utf8;
create user 'rman'@'%' identified by 'rman';
grant all privileges on rman.* to 'rman'@'%';
create database sentry default character set utf8;
create user 'sentry'@'%' identified by 'sentry';
grant all privileges on sentry.* to 'sentry'@'%';
create database nav default character set utf8;
create user 'nav'@'%' identified by 'nav';
grant all privileges on nav.* to 'nav'@'%';
create database navms default character set utf8;
create user 'navms'@'%' identified by 'navms';
grant all privileges on navms.* to 'navms'@'%';
create database cm default character set utf8;
create user 'cm'@'%' identified by 'cm';
grant all privileges on cm.* to 'cm'@'%';
create database scm default character set utf8;
create database oozie default character set utf8;
create user 'oozie'@'%' identified by 'oozie';
grant all privileges on oozie.* to 'oozie'@'%';
create database hue default character set utf8;
create user 'hue'@'%' identified by 'hue';
grant all privileges on hue.* to 'hue'@'%';
FLUSH PRIVILEGES;
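The statements above follow one pattern per service, so they can also be generated. The sketch below takes `database:user` pairs and, as in this post, uses the user name as the password, which is only acceptable for a test cluster. The function name is hypothetical.

```shell
#!/bin/sh
# Emit CREATE DATABASE / CREATE USER / GRANT statements for each db:user pair.
# The password is set equal to the user name, mirroring the test setup above.
gen_db_sql() {
    for pair in "$@"; do
        db="${pair%%:*}"     # text before the first colon
        user="${pair#*:}"    # text after the first colon
        printf "CREATE DATABASE %s DEFAULT CHARACTER SET utf8;\n" "$db"
        printf "CREATE USER '%s'@'%%' IDENTIFIED BY '%s';\n" "$user" "$user"
        printf "GRANT ALL PRIVILEGES ON %s.* TO '%s'@'%%';\n" "$db" "$user"
    done
    printf 'FLUSH PRIVILEGES;\n'
}

# Usage:
#   gen_db_sql metastore:hive rman:rman sentry:sentry | mysql -uroot -p
```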
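4. Configure the MySQL Connector
Install the JDK and set up its environment variables, then install the connector:
yum install mysql-connector-java
5. httpd
yum install httpd
systemctl start httpd
systemctl enable httpd.service
6. Place the extracted contents of cm6.3.3-redhat7.tar.gz under the first path below, and the parcel file together with manifest.json under the second:
/var/www/html
/var/www/html/cdh6/parcels/6.3.3/
cd /var/www/html/cdh6/parcels/6.3.3/
Compute the hash: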
sha1sum CDH-6.3.3-1.cdh6.3.3.p0.1796617-el7.parcel
204dc478dbadd93413ff5dc2cd17ae9969ca4b5c CDH-6.3.3-1.cdh6.3.3.p0.1796617-el7.parcel
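This is the expected value. If the hash is not 204dc478dbadd93413ff5dc2cd17ae9969ca4b5c, the file may be corrupted, which will break the later installation steps.
vi CDH-6.3.3-1.cdh6.3.3.p0.1796617-el7.parcel.sha
Write the hash into the file:
204dc478dbadd93413ff5dc2cd17ae9969ca4b5c
Note: the hash differs for every version. The file CDH-6.3.3-1.cdh6.3.3.p0.1796617-el7.parcel.sha contains the hash of CDH-6.3.3-1.cdh6.3.3.p0.1796617-el7.parcel,
and this hash must match the one listed for CDH-6.3.3-1.cdh6.3.3.p0.1796617-el7.parcel in manifest.json.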
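The manual steps above (compute the hash, write the .sha file, compare with manifest.json) could be scripted roughly as follows. This is a sketch with made-up function names; the manifest check is a plain text search for the hash string and makes no assumption about the JSON layout.

```shell
#!/bin/sh
# Compute the SHA-1 of a parcel, write it to <parcel>.sha, and print it.
write_parcel_sha() {
    parcel="$1"
    hash="$(sha1sum "$parcel" | awk '{print $1}')"
    printf '%s\n' "$hash" > "$parcel.sha"
    printf '%s\n' "$hash"
}

# Succeed if the given hash string appears anywhere in the manifest file.
manifest_has_hash() {
    manifest="$1"
    hash="$2"
    grep -qF "$hash" "$manifest"
}

# Usage (inside /var/www/html/cdh6/parcels/6.3.3/):
#   h="$(write_parcel_sha CDH-6.3.3-1.cdh6.3.3.p0.1796617-el7.parcel)"
#   manifest_has_hash manifest.json "$h" || echo "hash not found in manifest"
```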
7. Configure the yum repository
cd /etc/yum.repos.d/
vi cdh.repo
Write the following content:
[cdh]
name=cdh
baseurl=http://192.168.247.142/cm6.3.3
enabled=1
gpgcheck=0
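Since the same repo file is needed on every node, it may be easier to generate it than to edit it by hand. A minimal sketch, with a hypothetical function name, follows; note that the yum option is spelled `enabled`.

```shell
#!/bin/sh
# Print a yum repo definition for the local Cloudera Manager mirror.
# The base URL is passed in so the same helper works if the mirror moves.
gen_cm_repo() {
    baseurl="$1"
    cat <<EOF
[cdh]
name=cdh
baseurl=$baseurl
enabled=1
gpgcheck=0
EOF
}

# Usage on each node:
#   gen_cm_repo http://192.168.247.142/cm6.3.3 > /etc/yum.repos.d/cdh.repo
#   yum clean all && yum repolist
```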
8. Install the services
yum install cloudera-manager-daemons cloudera-manager-server
Initialize the database:
/opt/cloudera/cm/schema/scm_prepare_database.sh mysql -h 192.168.247.142 --scm-host 192.168.247.142 scm scm scm
Start the service:
service cloudera-scm-server start
View the log:
vi /var/log/cloudera-scm-server/cloudera-scm-server.log
Check the running status:
systemctl status cloudera-scm-server.service -l
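The server can take a few minutes to come up after the service starts. One way to tell when the web UI is ready is to look in the log for the web server's startup message. The sketch below assumes that message contains the text "Started Jetty server", which is what I have seen in CM 6 logs; treat the exact string and the function name as assumptions.

```shell
#!/bin/sh
# Succeed once the Cloudera Manager server log reports the web server is up.
# The search string is an assumption based on observed CM 6 log output.
cm_server_started() {
    log="${1:-/var/log/cloudera-scm-server/cloudera-scm-server.log}"
    grep -q 'Started Jetty server' "$log"
}

# Usage: poll every 10 seconds until the UI should be reachable.
#   until cm_server_started; do sleep 10; done; echo "CM web UI is up"
```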
9. Open http://192.168.247.142:7180 in a browser
Follow the prompts to install.
The default username and password are both admin.
Errors you may encounter:
Error 1:
JAVA_HOME=/usr/lib/jvm/jre-openjdk
Verifying that we can write to /etc/cloudera-scm-server
Creating SCM configuration file in /etc/cloudera-scm-server
Executing: /usr/lib/jvm/jre-openjdk/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/java/postgresql-connector-java.jar:/opt/cloudera/cm/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.
[ main] DbCommandExecutor INFO Successfully connected to database.
[ main] DbCommandExecutor ERROR Unable to create/drop a table.
java.sql.SQLException: Statement violates GTID consistency: CREATE TABLE ... SELECT.
Solution: turn off GTID mode, then restart MySQL for the change to take effect
vi /etc/my.cnf
gtid_mode=OFF
enforce_gtid_consistency=OFF
Error 2:
JAVA_HOME=/usr/lib/jvm/jre-openjdk
Verifying that we can write to /etc/cloudera-scm-server
[ main] DbProvisioner ERROR Exception when creating/dropping database with user 'scm' and jdbc url 'jdbc:mysql://192.168.247.141/?useUnicode=true&characterEncoding=UTF-8'
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Access denied for user 'scm'@'192.168.247.141' to database 'scm'
Solution:
drop user scm@'%';
drop user scm@'192.168.247.141';
update mysql.user set host = '%' where user = 'root';
flush privileges;
Then run the initialization as the root user:
/opt/cloudera/cm/schema/scm_prepare_database.sh mysql -h 192.168.247.141 -uroot -proot --scm-host 192.168.247.141 scm scm scm
Error 3:
ParcelUpdateService:com.cloudera.parcel.components.ParcelDownloaderImpl: Unable to retrieve remote parcel repository manifest
Cause:
manifest.json was not created, or the hash value is wrong.
Error 4:
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to retrieve schema tables from Hive Metastore DB,Not supported
The cause is a missing JDBC driver. Download one: mysql-connector-java-5.1.25.zip
Extract it into /tmp, then copy the jar:
cp /tmp/mysql-connector-java-5.1.25/mysql-connector-java-5.1.25-bin.jar /usr/share/java
cp /usr/share/java/mysql-connector-java-5.1.25-bin.jar /opt/cloudera/parcels/CDH/lib/hive/lib
Other issues:
1. After the setup was complete, /tmp on node 143 kept filling up with many files like this:
mgmt_mgmt-NAVIGATOR-c77e3c9ed97bae1760d495254cb7a0c1_pid2864.hprof
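In the web console, I changed "Java Heap Size of Navigator Metadata Server in Bytes" to 1-2 GB.
That did not resolve it, so I wrote a scheduled job to clean the files up periodically.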
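The cleanup job itself is not shown in this post. A rough sketch of what such a job could look like follows; the function name, the 60-minute default age, and the script path in the cron line are all assumptions, and it relies on GNU find as shipped with Red Hat 7.

```shell
#!/bin/sh
# Delete Navigator heap dump files older than a given number of minutes.
# Only files directly inside the directory that match the dump name pattern
# are removed.
clean_hprof() {
    dir="${1:-/tmp}"
    age_min="${2:-60}"
    find "$dir" -maxdepth 1 -type f \
        -name 'mgmt_mgmt-NAVIGATOR-*.hprof' \
        -mmin +"$age_min" -delete
}

# Example crontab entry (hypothetical script path), running hourly:
#   0 * * * * /usr/local/bin/clean_hprof.sh
```

Heap dumps are useful when diagnosing the underlying memory problem, so it may be worth keeping the most recent one before deleting the rest.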
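2. Service Monitor out-of-memory errors
In the web console, change the following parameters to the suggested values shown in the prompt, typically 1 GB (you can find them directly via the search box):
Java Heap Size of Host Monitor in Bytes
Java Heap Size of Service Monitor in Bytes
The end.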




