
Setting Up an LSF Master Server


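       LSF clusters are widely used in chip R&D environments. LSF is distributed cluster management software that manages compute resources and schedules batch jobs.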

       This post builds the master host of an LSF Community Edition cluster from scratch.

1. Environment preparation

1.1 One NFS server

IP: 192.168.128.149

Shared directory: /share

Setting up the NFS server was covered in an earlier post:

Creating an NFS Server

1.2 One LSF master server

OS: CentOS 7.4

IP: 192.168.128.150

Mount the NFS server's shared directory at /share:

# mkdir /share

# mount -t nfs 192.168.128.149:/share /share
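The mount above only lasts until the next reboot. Since step 2.8 later registers LSF to start at boot from /share, a persistent mount is worth adding. A minimal sketch, assuming the standard /etc/fstab layout and default NFS mount options; add_fstab_nfs is a hypothetical helper, not part of LSF or CentOS:

```shell
# add_fstab_nfs FSTAB SOURCE MOUNTPOINT
# Append "SOURCE MOUNTPOINT nfs defaults 0 0" to FSTAB,
# unless a non-comment entry for MOUNTPOINT already exists.
add_fstab_nfs() {
    af_file=$1 af_src=$2 af_mnt=$3
    # Column 2 of an fstab entry is the mount point; awk exits 0 if it is already listed
    if awk -v m="$af_mnt" '$1 !~ /^#/ && $2 == m { found = 1 } END { exit !found }' "$af_file"; then
        return 0
    fi
    printf '%s\t%s\tnfs\tdefaults\t0 0\n' "$af_src" "$af_mnt" >> "$af_file"
}
```

As root on master1: `add_fstab_nfs /etc/fstab 192.168.128.149:/share /share`, then `mount -a` to confirm the new entry parses and mounts.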


1.3 Configure /etc/hosts
On both servers, edit /etc/hosts so that each server can reach the other by hostname.
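A minimal sketch of step 1.3, assuming the NFS server's hostname is nfs01 (taken from the shell prompt in step 1.4) and the master's is master1. add_host_entry is a hypothetical helper that appends a mapping only when the hostname is not already listed, so running it twice is harmless:

```shell
# add_host_entry FILE IP HOSTNAME
# Append "IP<TAB>HOSTNAME" to FILE unless a non-comment line already names HOSTNAME.
add_host_entry() {
    ah_file=$1 ah_ip=$2 ah_name=$3
    # grep -w matches whole words only, so "master1" does not match "master10"
    if grep -v '^[[:space:]]*#' "$ah_file" | grep -qw -- "$ah_name"; then
        return 0
    fi
    printf '%s\t%s\n' "$ah_ip" "$ah_name" >> "$ah_file"
}
```

Run on both servers as root: `add_host_entry /etc/hosts 192.168.128.149 nfs01` and `add_host_entry /etc/hosts 192.168.128.150 master1`.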


1.4 Create the LSF administrator account lsfadmin
[root@nfs01 package]# useradd lsfadmin
[root@nfs01 package]# id lsfadmin
uid=1000(lsfadmin) gid=1000(lsfadmin) groups=1000(lsfadmin)
[root@nfs01 package]#
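Files on the NFS share are owned by numeric UID, so lsfadmin should have the same UID on every host; the prompt above shows it being created on nfs01, and master1 needs a matching account. A sketch that creates the account with a fixed UID, or checks an existing one, assuming UID 1000 as in the `id` output above (ensure_user is a hypothetical helper):

```shell
# ensure_user NAME UID
# Create NAME with the given UID if it does not exist yet; if it does exist,
# return non-zero when its UID differs, so a mismatch is noticed early.
ensure_user() {
    eu_name=$1 eu_uid=$2
    if id -u "$eu_name" >/dev/null 2>&1; then
        [ "$(id -u "$eu_name")" = "$eu_uid" ]
    else
        useradd -u "$eu_uid" "$eu_name"
    fi
}
```

On each host: `ensure_user lsfadmin 1000 || echo "check lsfadmin on $(hostname)"`.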

2. Deploy LSF on master1
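2.1 Download the LSF installation package, upload it to the shared directory /share, and extract it
Package: lsfsce10.2.0.12-x86_64.tar.gz, the Community Edition, available for download from the IBM website.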
[root@master1 package]# pwd
/share/package
[root@master1 package]# ls
lsfsce10.2.0.12-x86_64 lsfsce10.2.0.12-x86_64.tar.gz
[root@master1 package]#


2.2 Change to lsfsce10.2.0.12-x86_64/lsf and extract lsf10.1_lsfinstall_linux_x86_64.tar.Z
[root@master1 package]# ls
lsfsce10.2.0.12-x86_64 lsfsce10.2.0.12-x86_64.tar.gz
[root@master1 package]# cd lsfsce10.2.0.12-x86_64
[root@master1 lsfsce10.2.0.12-x86_64]# ls
lsf pac
[root@master1 lsfsce10.2.0.12-x86_64]# cd lsf
[root@master1 lsf]# ls
lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z lsf10.1_lsfinstall_linux_x86_64.tar.Z
[root@master1 lsf]# tar -zxf lsf10.1_lsfinstall_linux_x86_64.tar.Z
[root@master1 lsf]# ls
lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z lsf10.1_lsfinstall lsf10.1_lsfinstall_linux_x86_64.tar.Z
[root@master1 lsf]#


2.3 Enter the extracted lsf10.1_lsfinstall directory and back up install.config.
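The backup in step 2.3 can be a single cp. The sketch below (backup_once is a hypothetical helper, and the .orig suffix is an arbitrary choice) also refuses to overwrite an earlier backup, so re-running it never replaces the pristine copy:

```shell
# backup_once FILE
# Copy FILE to FILE.orig, preserving mode and timestamps (cp -p),
# but only if FILE.orig does not exist yet.
backup_once() {
    [ -e "$1.orig" ] || cp -p "$1" "$1.orig"
}
```

Inside lsf10.1_lsfinstall: `backup_once install.config`.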


2.4 Copy the lsf10.1_lsfinstall directory to /share/lsf

# mkdir -p /share/lsf
# cp -r lsf10.1_lsfinstall /share/lsf/


2.5 Edit /share/lsf/lsf10.1_lsfinstall/install.config
[root@master1 lsf10.1_lsfinstall]# cat install.config|grep -v "#"

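LSF_TOP="/share/lsf/10.1"
LSF_ADMINS="lsfadmin"
LSF_CLUSTER_NAME="cluster1"
LSF_MASTER_LIST="master1"
LSF_TARDIR="/share/package/lsfsce10.2.0.12-x86_64/lsf"
CONFIGURATION_TEMPLATE="HIGH_THROUGHPUT"

What each parameter means:
LSF_TOP: installation directory.
LSF_ADMINS: LSF administrator account (the lsfadmin account created above).
LSF_CLUSTER_NAME: cluster name.
LSF_MASTER_LIST: list of master hosts. With more than one machine, configure at least two masters for redundancy.
LSF_TARDIR: directory containing the LSF distribution package (lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z).
CONFIGURATION_TEMPLATE: configuration template. For IC (chip design) workloads, HIGH_THROUGHPUT (high-throughput mode) is recommended.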
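Instead of editing install.config by hand, the settings above can be applied with a small helper. This is a sketch, not part of lsfinstall: set_param is hypothetical. It removes any active line starting with `KEY=` (commented template lines are left alone) and appends `KEY="VALUE"` at the end of the file:

```shell
# set_param FILE KEY VALUE
# Leave exactly one active KEY="VALUE" line in FILE; commented lines are kept.
set_param() {
    sp_file=$1 sp_key=$2 sp_val=$3
    sp_tmp=$(mktemp) || return 1
    # Drop any existing uncommented assignment of this key
    grep -v "^${sp_key}=" "$sp_file" > "$sp_tmp"
    printf '%s="%s"\n' "$sp_key" "$sp_val" >> "$sp_tmp"
    # Write back through cat so the original file keeps its permissions
    cat "$sp_tmp" > "$sp_file"
    rm -f "$sp_tmp"
}
```

For example, from /share/lsf/lsf10.1_lsfinstall run `set_param install.config LSF_TOP /share/lsf/10.1`, repeat it for the other five keys with the values listed above, and check the result with the same `grep -v "#"` command.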

2.6 Run lsfinstall -f install.config
[root@master1 lsf10.1_lsfinstall]# ./lsfinstall -f install.config

Logging installation sequence in /share/lsf/lsf10.1_lsfinstall/Install.log

International License Agreement for Non-Warranted Programs

Part 1 - General Terms

BY DOWNLOADING, INSTALLING, COPYING, ACCESSING, CLICKING ON
AN "ACCEPT" BUTTON, OR OTHERWISE USING THE PROGRAM,
LICENSEE AGREES TO THE TERMS OF THIS AGREEMENT. IF YOU ARE
ACCEPTING THESE TERMS ON BEHALF OF LICENSEE, YOU REPRESENT
AND WARRANT THAT YOU HAVE FULL AUTHORITY TO BIND LICENSEE
TO THESE TERMS. IF YOU DO NOT AGREE TO THESE TERMS,

* DO NOT DOWNLOAD, INSTALL, COPY, ACCESS, CLICK ON AN
"ACCEPT" BUTTON, OR USE THE PROGRAM; AND

* PROMPTLY RETURN THE UNUSED MEDIA AND DOCUMENTATION TO THE

Press Enter to continue viewing the license agreement, or
enter "1" to accept the agreement, "2" to decline it, "3"
to print it, "4" to read non-IBM terms, or "99" to go back
to the previous screen.
1 (type 1 and press Enter here)
LSF pre-installation check ...

[Sun Jun 26 06:01:30 EDT 2022:chk_shell:ERROR_1001]
Cannot find UNIX command " ed".


lsfinstall found unrecoverable errors during pre-installation check.
Check /share/lsf/lsf10.1_lsfinstall/Install.err
to correct these errors and run "lsfinstall -f install.config" again.
If this error appears, run yum install -y ed and then rerun lsfinstall -f install.config.
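To catch this before starting lsfinstall, the required commands can be checked up front. A sketch using the POSIX `command -v` builtin; only ed is known from the error above, so any other names you pass are your own choice (missing_cmds is a hypothetical helper):

```shell
# missing_cmds CMD...
# Print every CMD that is not found on PATH; return non-zero if any is missing.
missing_cmds() {
    mc_rc=0
    for mc_cmd in "$@"; do
        if ! command -v "$mc_cmd" >/dev/null 2>&1; then
            echo "$mc_cmd"
            mc_rc=1
        fi
    done
    return $mc_rc
}
```

Before installing: `missing_cmds ed || yum install -y ed`.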
Once ed is installed, the installation completes successfully.


2.7 Edit /share/lsf/10.1/conf/lsf.conf
Add the following line:
LSF_RSH=ssh
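Appending the line by hand works once. The sketch below (add_lsf_rsh is a hypothetical helper) adds LSF_RSH=ssh only when lsf.conf has no active LSF_RSH line yet, and otherwise reports the existing value instead of adding a conflicting second one:

```shell
# add_lsf_rsh LSF_CONF
# Append LSF_RSH=ssh unless an uncommented LSF_RSH line already exists.
add_lsf_rsh() {
    ar_conf=$1
    if grep -q '^[[:space:]]*LSF_RSH=' "$ar_conf"; then
        # Leave the existing setting alone, but show it on stderr
        echo "LSF_RSH already set in $ar_conf: $(grep '^[[:space:]]*LSF_RSH=' "$ar_conf")" >&2
        return 0
    fi
    echo 'LSF_RSH=ssh' >> "$ar_conf"
}
```

Usage: `add_lsf_rsh /share/lsf/10.1/conf/lsf.conf`. The ssh password prompts during lsfstartup in step 2.9 come from this setting.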

2.8 On every node, run /share/lsf/lsf10.1_lsfinstall/hostsetup --top="/share/lsf/10.1" --boot="y"

[root@master1 lsf10.1_lsfinstall]# /share/lsf/lsf10.1_lsfinstall/hostsetup --top="/share/lsf/10.1" --boot="y"
Logging installation sequence in /share/lsf/10.1/log/Install.log


------------------------------------------------------------

    L S F H O S T S E T U P U T I L I T Y

------------------------------------------------------------

This script sets up local host (LSF server, client or dynamic host) environment.


Setting up LSF server host "master1" ...
Checking LSF installation for host "master1" ... Done
Created symlink from /etc/systemd/system/multi-user.target.wants/lsfd.service to /usr/lib/systemd/system/lsfd.service.
Installing LSF service scripts on host "master1" ... Done

LSF service ports are defined in /share/lsf/10.1/conf/lsf.conf.
Checking LSF service ports definition on host "master1" ... Done
You are installing IBM Spectrum LSF - Community Edition.

... Setting up LSF server host "master1" is done
... LSF host setup is done.
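Step 2.8 has to be repeated as root on every node. With more than one host, a loop over ssh saves typing. A sketch assuming root ssh access to each host and the paths used in this post; hostsetup_all and DRY_RUN are hypothetical names:

```shell
# Paths from steps 2.5 and 2.8 of this post
LSF_TOP_DIR=/share/lsf/10.1
HOSTSETUP=/share/lsf/lsf10.1_lsfinstall/hostsetup

# hostsetup_all HOST...
# Run hostsetup as root on each host over ssh; stop at the first failure.
# With DRY_RUN=1, print the ssh commands instead of executing them.
hostsetup_all() {
    for hs_host in "$@"; do
        hs_cmd="$HOSTSETUP --top=\"$LSF_TOP_DIR\" --boot=\"y\""
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf 'ssh root@%s %s\n' "$hs_host" "$hs_cmd"
        else
            ssh "root@$hs_host" "$hs_cmd" || return 1
        fi
    done
}
```

Preview first with `DRY_RUN=1 hostsetup_all master1`, then run it without DRY_RUN, listing every host name.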


2.9 Start the LSF master
As root, switch to csh, load the LSF environment, and run lsfstartup:
[root@master1 lsf10.1_lsfinstall]# csh
[root@master1 lsf10.1_lsfinstall]# source /share/lsf/10.1/conf/cshrc.lsf
[root@master1 lsf10.1_lsfinstall]# lsfstartup
Starting up all LIMs ...
Do you really want to start up LIM on all hosts ? [y/n]y
Start up LIM on ...... The authenticity of host 'master1 (fe80::a3de:7879:d139:bb2%ens33)' can't be established.
ECDSA key fingerprint is SHA256:3FDXmVqph+2JsOJiAD0h0bHsJOKIjfXjhLPjoWB2WJw.
ECDSA key fingerprint is MD5:dd:45:bf:b3:17:47:f8:17:0e:c8:93:83:a6:94:a3:c9.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'master1,fe80::a3de:7879:d139:bb2%ens33' (ECDSA) to the list of known hosts.
root@master1's password:
done

Waiting for Master LIM to start up ... Master LIM is ok
Starting up all RESes ...
Do you really want to start up RES on all hosts ? [y/n]y
Start up RES on ...... root@master1's password:
done

Starting all server daemons on LSBATCH hosts ...
Do you really want to start up server batch daemon on all hosts ? [y/n] y
Start up server batch daemon on ...... root@master1's password:
done

Done starting up LSF daemons on the local LSF cluster ...
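The password prompts above come from lsfstartup connecting to master1 over ssh (LSF_RSH=ssh). Key-based ssh login for root removes them. A sketch assuming OpenSSH's ssh-keygen and ssh-copy-id are installed; ensure_ssh_key is a hypothetical helper that generates a key only when none exists yet:

```shell
# ensure_ssh_key KEYFILE
# Create a passphrase-less RSA key pair at KEYFILE unless KEYFILE already exists.
ensure_ssh_key() {
    if [ -f "$1" ]; then
        return 0
    fi
    # -m 700 only applies if the directory has to be created
    mkdir -p -m 700 "$(dirname "$1")" || return 1
    ssh-keygen -q -t rsa -b 4096 -N "" -f "$1"
}
```

As root: `ensure_ssh_key /root/.ssh/id_rsa && ssh-copy-id root@master1`, then repeat ssh-copy-id for every other host in the cluster.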
Check the cluster with an LSF command:
[root@master1 lsf10.1_lsfinstall]# lshosts
HOST_NAME type model cpuf ncpus maxmem maxswp server RESOURCES
master1 X86_64 Intel_Pl 15.0 1 975M 1.9G Yes (mg)
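lshosts lists the hosts known to LIM. To confirm that the batch system also accepts work, bhosts should report each host as ok. Below is a small filter for that, assuming bhosts' default output layout with STATUS in the second column (hosts_not_ok is a hypothetical helper). From bash instead of csh, load the environment with `. /share/lsf/10.1/conf/profile.lsf`, the sh counterpart of cshrc.lsf:

```shell
# hosts_not_ok
# Read `bhosts` output on stdin, print each host whose STATUS is not "ok",
# and exit non-zero if at least one such host was found.
hosts_not_ok() {
    awk 'NR > 1 && $2 != "ok" { print $1; bad = 1 } END { exit bad }'
}
```

Usage: `bhosts | hosts_not_ok && echo "all batch hosts ok"`.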


The LSF master is now deployed and running!



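This article is reprinted from 小左的运维之路 (Xiaozuo's Ops Road). If you believe it infringes your rights, email contact@modb.pro with supporting evidence; once verified, 墨天轮 (modb.pro) will remove the content immediately.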
