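Services in a CDP test environment suddenly became abnormal and unusable. This article analyzes the cause, restores the service, and then reproduces the problem.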

CentOS 7.6;
cloudera-manager-agent-7.4.4 (CM 7.4.4).

In a test environment, the Cloudera services suddenly became abnormal, and the Cloudera Management Service (CMS) could no longer start.


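Careful investigation showed that cloudera-manager-agent had disappeared from node 2 (nn02). Because CMS runs on node 2, the missing agent caused the service failure. The service is restored first; the problem is then reproduced to find the root cause.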
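A finding like this can be cross-checked by querying the RPM database on every node. The following is only a sketch of such a check, not a step from the original investigation. The function name check_agent_on_hosts and every host name other than nn02 are hypothetical, and it assumes key-based SSH access to each node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether cloudera-manager-agent is installed on each host.
# Assumes non-interactive (key-based) SSH access; an unreachable host is reported
# the same way as a host without the package.
check_agent_on_hosts() {
    local host missing=0
    for host in "$@"; do
        if ssh -o BatchMode=yes "$host" rpm -q cloudera-manager-agent >/dev/null 2>&1; then
            echo "$host: installed"
        else
            echo "$host: MISSING or unreachable"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (nn01 and nn03 are made-up host names):
# check_agent_on_hosts nn01 nn02 nn03
```

The function returns a non-zero status when at least one host fails the check, so it could also be used in a periodic health check.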

3.1 On the node that lost its agent, install the same agent version as on the other nodes
[root@nn02 ~]# rpm -ivh cloudera-manager-agent-7.4.4-15850731.el7.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:cloudera-manager-agent-7.4.4-1585################################# [100%]
Created symlink from /etc/systemd/system/multi-user.target.wants/cloudera-scm-agent.service to /usr/lib/systemd/system/cloudera-scm-agent.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/cloudera-scm-supervisord.service to /usr/lib/systemd/system/cloudera-scm-supervisord.service.
3.2 Point the configuration file at the CM Server
Change the CM Server address on line 15 to 192.168.80.122.
[root@nn02 ~]# vim /etc/cloudera-scm-agent/config.ini    # as on the other nodes, server_host must be changed to the CM address
13 [General]
14 # Hostname of the CM server.
15 server_host=192.168.80.122
16
17 # Port that the CM server is listening on.
18 server_port=7182
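The same change can be made without an interactive editor, which is convenient when several nodes need it. This is a minimal sketch assuming GNU sed, as shipped with CentOS 7. The function name set_server_host is hypothetical, and the address is the one used in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set server_host in an agent config file to the given CM Server address.
set_server_host() {
    local config_file="$1" cm_host="$2"
    # Keep a backup next to the original before editing in place.
    cp -p "$config_file" "${config_file}.bak"
    # Replace the whole server_host line; all other settings are left untouched.
    sed -i "s/^server_host=.*/server_host=${cm_host}/" "$config_file"
}

# Example usage on the affected node:
# set_server_host /etc/cloudera-scm-agent/config.ini 192.168.80.122
# grep '^server_host=' /etc/cloudera-scm-agent/config.ini
```

Anchoring the pattern at the start of the line means the commented description line above the setting is not modified.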
3.3 Start the agent and confirm that it is running normally
[root@nn02 ~]# systemctl start cloudera-scm-agent
[root@nn02 ~]# systemctl status cloudera-scm-agent
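When the start is scripted rather than checked by eye, the status check can be turned into a wait loop. The sketch below is one way to do that. The function name wait_for_active and the 60-second default timeout are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll until a systemd unit reports "active" or a timeout (seconds) expires.
wait_for_active() {
    local unit="$1" timeout="${2:-60}" waited=0
    # "systemctl is-active --quiet" exits 0 only when the unit is active.
    until systemctl is-active --quiet "$unit"; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "$unit is not active after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "$unit is active"
}

# Example usage after "systemctl start cloudera-scm-agent":
# wait_for_active cloudera-scm-agent 30
```

An active unit only shows that the process is running; whether the agent is actually sending heartbeats to CM Server still has to be confirmed in the CM UI or in the agent log.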
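3.4 Restart CMS in the CM UI
Restart the CMS service.

Other Runtime services that show as abnormal recover after a short wait.

If Runtime components are still abnormal, restarting them is recommended.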


4.1 Confirm that the agent is installed on the node
[root@nn02 ~]# rpm -qa | grep cloudera
openjdk8-8.0+232_9-cloudera.x86_64
cloudera-manager-daemons-7.4.4-15850731.el7.x86_64
cloudera-manager-agent-7.4.4-15850731.el7.x86_64
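4.2 Uninstall the NFS-related packages
Uninstalling the NFS packages also removes cloudera-manager-agent, which depends on them (the yum output lists a portmap dependency for the agent). This side effect was unexpected.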

[root@nn02 ~]# yum remove -y nfs-utils rpcbind
Loaded plugins: fastestmirror, langpacks
Resolving Dependencies
There are unfinished transactions remaining. You might consider running yum-complete-transaction, or "yum-complete-transaction --cleanup-only" and "yum history redo last", first to finish them. If those don't work you'll have to try removing/installing packages by hand (maybe package-cleanup can help).
--> Running transaction check
---> Package nfs-utils.x86_64 1:1.3.0-0.68.el7.2 will be erased
---> Package rpcbind.x86_64 0:0.2.0-49.el7 will be erased
--> Processing Dependency: portmap for package: cloudera-manager-agent-7.4.4-15850731.el7.x86_64
--> Processing Dependency: rpcbind for package: 1:quota-4.01-19.el7.x86_64
--> Running transaction check
---> Package cloudera-manager-agent.x86_64 0:7.4.4-15850731.el7 will be erased
---> Package quota.x86_64 1:4.01-19.el7 will be erased
--> Finished Dependency Resolution

Dependencies Resolved

==========================================================================================
 Package                    Arch      Version                 Repository            Size
==========================================================================================
Removing:
 nfs-utils                  x86_64    1:1.3.0-0.68.el7.2      @updates             1.1 M
 rpcbind                    x86_64    0.2.0-49.el7            @base                101 k
Removing for dependencies:
 cloudera-manager-agent     x86_64    7.4.4-15850731.el7      installed            154 M
 quota                      x86_64    1:4.01-19.el7           @base                887 k

Transaction Summary
==========================================================================================
Remove  2 Packages (+2 Dependent packages)

Installed size: 156 M
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Warning: RPMDB altered outside of yum.
  Erasing    : cloudera-manager-agent-7.4.4-15850731.el7.x86_64        1/4
warning: /etc/cloudera-scm-agent/config.ini saved as /etc/cloudera-scm-agent/config.ini.rpmsave
  Erasing    : 1:nfs-utils-1.3.0-0.68.el7.2.x86_64                     2/4
warning: file /var/lib/nfs/v4recovery: remove failed: No such file or directory
warning: file /var/lib/nfs/statd/sm.bak: remove failed: No such file or directory
warning: file /var/lib/nfs/statd/sm: remove failed: No such file or directory
warning: file /var/lib/nfs/statd: remove failed: No such file or directory
warning: directory /var/lib/nfs/rpc_pipefs: remove failed: Device or resource busy
  Erasing    : 1:quota-4.01-19.el7.x86_64                              3/4
  Erasing    : rpcbind-0.2.0-49.el7.x86_64                             4/4
warning: file /var/lib/rpcbind: remove failed: No such file or directory
  Verifying  : cloudera-manager-agent-7.4.4-15850731.el7.x86_64        1/4
  Verifying  : 1:quota-4.01-19.el7.x86_64                              2/4
  Verifying  : 1:nfs-utils-1.3.0-0.68.el7.2.x86_64                     3/4
  Verifying  : rpcbind-0.2.0-49.el7.x86_64                             4/4

Removed:
  nfs-utils.x86_64 1:1.3.0-0.68.el7.2          rpcbind.x86_64 0:0.2.0-49.el7

Dependency Removed:
  cloudera-manager-agent.x86_64 0:7.4.4-15850731.el7          quota.x86_64 1:4.01-19.el7

Complete!
4.3 Confirm that the agent has been uninstalled from the node
[root@nn02 ~]# rpm -qa | grep cloudera
openjdk8-8.0+232_9-cloudera.x86_64
cloudera-manager-daemons-7.4.4-15850731.el7.x86_64
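The yum output in 4.2 shows that the agent's config.ini was preserved as config.ini.rpmsave when the package was erased. After reinstalling the agent, that saved copy could be used instead of re-editing the file by hand. The sketch below illustrates the idea. The function name restore_rpmsave is hypothetical, and it assumes the saved copy is still the configuration that is wanted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restore a config file from the ".rpmsave" copy that rpm left behind.
restore_rpmsave() {
    local config_file="$1" saved="${1}.rpmsave"
    if [ ! -f "$saved" ]; then
        echo "no saved copy at $saved" >&2
        return 1
    fi
    # Show what will change before overwriting; diff exits non-zero when the
    # files differ (or the target is missing), which is expected here.
    diff -u "$config_file" "$saved" || true
    cp -p "$saved" "$config_file"
}

# Example usage on the affected node, followed by an agent restart:
# restore_rpmsave /etc/cloudera-scm-agent/config.ini
# systemctl restart cloudera-scm-agent
```

Reviewing the diff first guards against restoring a stale file, for example one left over from an older agent version.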

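cloudera-manager-agent depends on an NFS-related component, so uninstalling NFS also removed the agent, which was not anticipated. Avoid running yum remove -y in production unless it is necessary. If removal is really needed, drop the -y option: yum then lists the dependent packages it is about to remove and asks for confirmation, so the removal can go ahead only after confirming that it is harmless.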
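The advice above can also be automated as a pre-check that refuses to continue when a dry run shows a Cloudera Manager package in the removal plan. The following is a sketch under stated assumptions: the function name check_removal and the package pattern are hypothetical, and it relies on yum's --assumeno option, which answers "no" to the confirmation prompt so that nothing is actually removed.

```shell
#!/usr/bin/env bash
# Hypothetical guard: fail when a yum removal plan would also erase a Cloudera Manager package.
CRITICAL_PATTERN='cloudera-manager-(agent|daemons|server)'

check_removal() {
    local plan
    # Dry run only: --assumeno declines the transaction. The "|| true" keeps the
    # script going whatever exit status yum returns after declining.
    plan="$(yum remove --assumeno "$@" 2>&1 || true)"
    # The dependency resolver prints lines such as
    # "---> Package cloudera-manager-agent.x86_64 0:7.4.4-15850731.el7 will be erased".
    if printf '%s\n' "$plan" | grep -Eq "^-+> Package ${CRITICAL_PATTERN}.* will be erased"; then
        echo "REFUSED: removing $* would also erase a Cloudera Manager package" >&2
        return 1
    fi
    echo "OK: no Cloudera Manager package in the removal plan for $*"
}

# Example usage:
# check_removal nfs-utils rpcbind && yum remove nfs-utils rpcbind
```

Two other read-only checks that may help before a removal are rpm -q --whatrequires rpcbind, which lists installed packages that require it by name, and rpm -e --test nfs-utils rpcbind, which goes through the erase checks without uninstalling anything.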
This article is reposted from rundba.




