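RAC cluster time synchronization can rely either on the operating system's NTP service or on Oracle's own Cluster Time Synchronization Service (CTSS). If NTP is not in use, Oracle automatically activates its own ctssd process.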
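Starting with Oracle 11gR2 RAC, the Cluster Time Synchronization Service (CTSS) synchronizes time across the cluster nodes. CTSS is installed as part of Oracle Clusterware. If it detects a time synchronization service or a time synchronization configuration (NTP) on the system, CTSS starts and runs in Observer mode and performs no synchronization. The CTSS daemon is always installed and keeps running, but it only takes effect when the system meets the required conditions. If NTP is not present on any server in the cluster, CTSS is activated and takes over cluster time management. It then starts and runs in Active mode, uses one of the cluster servers as the reference server, and synchronizes the time of the other servers to it.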
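In RAC, the cluster clocks must be kept synchronized. Otherwise many problems can occur. For example, time-dependent applications can produce incorrect data, and log entries can end up out of order, which hampers problem diagnosis. In serious cases the cluster can go down, or a node can fail to rejoin the cluster when the cluster is restarted.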
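NTP and CTSS can coexist, and NTP takes precedence over CTSS. If both NTP and CTSS are present, cluster time is synchronized by NTP and CTSS stays in Observer mode. CTSS switches to Active mode only after NTP has been shut down on every node in the cluster. As long as NTP is active on even one node, CTSS on all nodes remains in Observer mode.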
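The precedence rule above can be expressed as a small decision function. This is only an illustrative sketch. `expected_ctss_mode` is a hypothetical helper, and it assumes you have already determined for each node whether NTP is running or configured there, passing the result as `yes` or `no`.

```shell
#!/bin/sh
# Hypothetical helper that encodes the rule that NTP on any single node
# keeps CTSS in Observer mode on every node.
# Arguments: one word per node, "yes" if NTP is running or configured
# on that node, "no" otherwise.
expected_ctss_mode() {
    for ntp_on_node in "$@"; do
        if [ "$ntp_on_node" = "yes" ]; then
            echo "Observer"
            return 0
        fi
    done
    # No node has NTP: CTSS takes over cluster time management.
    echo "Active"
}
```

For the two-node cluster in this article, `expected_ctss_mode no no` prints `Active` and `expected_ctss_mode no yes` prints `Observer`.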
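Note that stopping the ntpd service (/sbin/service ntpd stop) is not enough to put CTSS into Active mode. You must also delete /etc/ntp.conf, or move it aside with mv /etc/ntp.conf /etc/ntp.conf.bak. Otherwise CTSS will not become active.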
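You could run a per-node pre-check for the two conditions above, and move the config file aside, with something like the following sketch. It rests on several assumptions. The helper names are hypothetical. The default paths are the RHEL/CentOS 6 locations used in this article. Treating the ntpd pid file as a detection signal is inferred from the octssd.log messages shown later ("NTP default pid file not found", "NTP default config file found") and does not come from Oracle documentation. You still stop and disable the daemon itself with `service ntpd stop` and `chkconfig ntpd off`.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if CTSS would presumably still
# see NTP on this node, i.e. the config file or the pid file exists.
# Paths are parameters so the check can also be run against test files.
ntp_visible_to_ctss() {
    conf=${1:-/etc/ntp.conf}
    pidfile=${2:-/var/run/ntpd.pid}
    [ -e "$conf" ] || [ -e "$pidfile" ]
}

# Hypothetical helper: move the NTP config aside (conf -> conf.bak),
# as described above. An existing .bak file is overwritten by mv.
park_ntp_conf() {
    conf=${1:-/etc/ntp.conf}
    if [ -e "$conf" ]; then
        mv "$conf" "$conf.bak"
    fi
}
```

Run it on every node after stopping ntpd. Once `ntp_visible_to_ctss` fails on all nodes, `crsctl check ctss` should report Active mode.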
1. CTSS Synchronization Mode
[root@rac1centorder ~]# service ntpd status
ntpd is stopped
[root@rac1centorder ~]# ll /etc/ntp.*
-rw-r--r--. 1 root root 1778 Dec 18 2017 /etc/ntp.conf.bak
[root@rac1centorder ~]# chkconfig --list ntpd
ntpd            0:off   1:off   2:off   3:off   4:off   5:off   6:off
Check the ctssd process:
[root@rac1centorder ~]# ps -ef | grep ctss
root     129931      1  0  2019 ?        23:33:10 /u01/app/11.2.0/grid_1/bin/octssd.bin reboot
root     217171 155615  0 11:24 pts/0    00:00:00 grep ctss
Check the CTSS status on node 1:
[root@rac1centorder ~]# su - grid
[grid@rac1centorder ~]$ crsctl check ctss
CRS-4701: The Cluster Time Synchronization Service is in Active mode.
CRS-4702: Offset (in msec): 0
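To use the `crsctl check ctss` result in a monitoring script, you can extract the mode and the offset from the CRS-4701/CRS-4700 and CRS-4702 lines. This is a minimal sketch. The function names are hypothetical, and the patterns assume the exact message wording shown in this article.

```shell
#!/bin/sh
# Hypothetical helpers that parse "crsctl check ctss" output from stdin.

# Print the CTSS mode word: "Active" (CRS-4701) or "Observer" (CRS-4700).
ctss_mode() {
    sed -n 's/.*Time Synchronization Service is in \([A-Za-z]*\) mode\..*/\1/p'
}

# Print the offset in msec from the CRS-4702 line. Prints nothing if the
# line is absent, as in the Observer-mode output later in this article.
ctss_offset_ms() {
    sed -n 's/^CRS-4702: Offset (in msec): *\([-0-9.]*\).*/\1/p'
}
```

Typical use as the grid user: `crsctl check ctss | ctss_mode`.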
octssd log on node 1:
[grid@rac1centorder ~]$ tail -30 /u01/app/11.2.0/grid_1/log/rac1centorder/ctssd/octssd.log
2020-10-26 11:31:19.472: [ CTSS][2587350784]ctssslave_swm: The system time difference is too small [753] usec. Not adjusting time.
2020-10-26 11:31:19.472: [ CTSS][2587350784]ctssslave_swm17: LT [1603683079sec 472234usec], MT [1603683079sec 140694539155355usec], Delta [1920usec]
2020-10-26 11:31:19.472: [ CTSS][2587350784]ctssslave_swm19: The offset is [-753 usec] and sync interval set to [1]
2020-10-26 11:31:19.472: [ CTSS][2587350784]ctssslave_swm: Received from master (mode [0xcc] nodenum [2] hostname [rac2centorder] )
2020-10-26 11:31:19.472: [ CTSS][2587350784]ctsselect_msm: Sync interval returned in [1]
2020-10-26 11:31:19.472: [ CTSS][2591553280]ctssslave_msg_handler4_3: slave_sync_with_master finished sync process. Exiting clsctssslave_msg_handler
2020-10-26 11:31:27.472: [ CTSS][2587350784]ctsselect_msm: CTSS mode is [0xc4]
2020-10-26 11:31:27.472: [ CTSS][2587350784]ctssslave_swm1_2: Ready to initiate new time sync process.
2020-10-26 11:31:27.473: [ CTSS][2587350784]ctssslave_swm2_1: Waiting for time sync message from master. sync_state[2].
2020-10-26 11:31:27.474: [ CTSS][2591553280]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [2].
2020-10-26 11:31:27.474: [ CTSS][2591553280]ctssslave_msg_handler4_1: Waiting for slave_sync_with_master to finish sync process. sync_state[3].
2020-10-26 11:31:27.474: [ CTSS][2587350784]ctssslave_swm2_3: Received time sync message from master.
Check the CTSS status on node 2:
[grid@rac2centorder ~]$ crsctl check ctss
CRS-4701: The Cluster Time Synchronization Service is in Active mode.
CRS-4702: Offset (in msec): 0
octssd log on node 2:
[grid@rac2centorder ~]$ tail -30 /u01/app/11.2.0/grid_1/log/rac2centorder/ctssd/octssd.log
2020-10-26 11:35:03.532: [ CTSS][2221074176]ctsscomm_msg_hndlr: Received from slave ( mode [0xc4] nodenum [1] hostname [rac1centorder] )
2020-10-26 11:35:10.688: [ CTSS][2236086016]ctss_checkcb: clsdm requested check alive. checkcb_data{mode[0xcc], offset[0 ms]}, length=[8].
2020-10-26 11:35:11.533: [ CTSS][2221074176]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [1].
2020-10-26 11:35:11.533: [ CTSS][2221074176]ctsscomm_msg_hndlr: Received sync msg
2020-10-26 11:35:11.534: [ CTSS][2221074176]ctsscomm_msg_hndlr: Received from slave ( mode [0xc4] nodenum [1] hostname [rac1centorder] )
2020-10-26 11:35:19.536: [ CTSS][2221074176]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [1].
2020-10-26 11:35:19.536: [ CTSS][2221074176]ctsscomm_msg_hndlr: Received sync msg
2020-10-26 11:35:19.536: [ CTSS][2221074176]ctsscomm_msg_hndlr: Received from slave ( mode [0xc4] nodenum [1] hostname [rac1centorder] )
2020-10-26 11:35:26.720: [ CTSS][2216871680]sclsctss_gvss2: NTP default pid file not found
2020-10-26 11:35:26.720: [ CTSS][2216871680]sclsctss_gvss8: Return [0] and NTP status [1].
2020-10-26 11:35:26.720: [ CTSS][2216871680]ctss_check_vendor_sw: Vendor time sync software is not detected. status [1].
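The logs show that no NTP service was detected and that CTSS is running in Active mode. The time-sync master is node 2 (rac2centorder). Node 1's log records "Received from master (... nodenum [2] hostname [rac2centorder] )", and node 2's log records sync messages "Received from slave ( ... nodenum [1] hostname [rac1centorder] )". Node 1's log also reports a time difference between the nodes (753 usec). Because this difference is too small, CTSS does not adjust the clock ("Not adjusting time").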
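To find the CTSS master and the reported offsets without reading the log line by line, you could filter octssd.log for the message formats visible in the excerpts above. This is a sketch. The helper names are hypothetical, and the patterns assume only the log wording shown in this article, not a documented log format.

```shell
#!/bin/sh
# Hypothetical helpers that read an octssd.log from stdin.

# Print each distinct "<role> <hostname>" pair found in
# "Received from master (...)" / "Received from slave (...)" lines.
# On a slave node this names the master; on the master it names the slaves.
ctss_peers_from_log() {
    awk '/Received from (master|slave)/ {
        role = ($0 ~ /Received from master/) ? "master" : "slave"
        if (match($0, /hostname \[[A-Za-z0-9._-]*]/)) {
            # Strip the leading "hostname [" (10 chars) and the trailing "]".
            host = substr($0, RSTART + 10, RLENGTH - 11)
            if (!seen[role " " host]++) print role, host
        }
    }'
}

# Print the signed offsets (in usec) from "The offset is [... usec]" lines.
ctss_offsets_usec() {
    sed -n 's/.*The offset is \[\([-0-9]*\) usec].*/\1/p'
}
```

Example: `ctss_peers_from_log < /u01/app/11.2.0/grid_1/log/rac1centorder/ctssd/octssd.log` on node 1.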
[grid@rac1centorder ~]$ cluvfy comp clocksync -n all -verbose
Verifying Clock Synchronization across the cluster nodes
Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed
Checking if CTSS Resource is running on all nodes...
Check: CTSS Resource running on all nodes
  Node Name                             Status
  ------------------------------------  ------------------------
  rac2centorder                         passed
  rac1centorder                         passed
Result: CTSS resource check passed
Querying CTSS for time offset on all nodes...
Result: Query of CTSS for time offset passed
Check CTSS state started...
Check: CTSS state
  Node Name                             State
  ------------------------------------  ------------------------
  rac2centorder                         Active
  rac1centorder                         Active
CTSS is in Active state. Proceeding with check of clock time offsets on all nodes...
Reference Time Offset Limit: 1000.0 msecs
Check: Reference Time Offset
  Node Name     Time Offset               Status
  ------------  ------------------------  ------------------------
  rac2centorder  0.0                      passed
  rac1centorder  0.0                      passed
Time offset is within the specified limits on the following set of nodes:
"[rac2centorder, rac1centorder]"
Result: Check of clock time offsets passed
Oracle Cluster Time Synchronization Services check passed
Verification of Clock Synchronization across the cluster nodes was successful.
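The node clocks are not exactly identical (node 1's log reports a 753 usec difference), yet the check still passes. The offsets are far below the 1000.0 msec reference limit, and CTSS automatically brings deviations this small back in line.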
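To turn the "Reference Time Offset" table into a pass/fail signal for a script, you could compare each node's offset against the limit. This is a sketch under assumptions. `offsets_over_limit` is a hypothetical helper. It assumes the node name and the offset in msec are the first two whitespace-separated columns, as in the cluvfy output above. The 1000.0 default is taken from the "Reference Time Offset Limit" line.

```shell
#!/bin/sh
# Hypothetical helper: read "node offset_msec ..." rows from stdin and
# print every node whose absolute offset exceeds the limit (msec).
# Rows whose second field is not a number are ignored, so the whole
# cluvfy output can be piped in unchanged.
offsets_over_limit() {
    awk -v limit="${1:-1000.0}" '
    BEGIN { lim = limit + 0 }
    NF >= 2 && $2 ~ /^-?[0-9]+(\.[0-9]+)?$/ {
        off = $2 + 0
        if (off < 0) off = -off
        if (off > lim) print $1
    }'
}
```

Example: `cluvfy comp clocksync -n all -verbose | offsets_over_limit 1000.0` prints nothing when every node is within the limit.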
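Notes:
(1) CTSS never moves the system clock backward (to an earlier time). Oracle 10.2 RAC had a bug in which setting the clock back caused node reboots.
(2) CTSS keeps the nodes synchronized with each other, but it does not guarantee agreement with an external standard clock (for example, Beijing time).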
2. Linux NTP Synchronization Mode
This method keeps the nodes synchronized with each other and also keeps their clocks in line with standard time.
Configure the NTP service:
Edit /etc/ntp.conf on every node. 192.168.7.2 is the company's internal time server, which is itself synchronized with a standard reference clock.
[root@rac1centorder ~]# vi /etc/ntp.conf
server 192.168.7.2
driftfile /var/lib/ntp/drift
broadcastdelay 0.008
disable monitor
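Note: disable monitor mitigates DDoS attacks that abuse the NTP service. It turns off the monitoring facility that answers monlist queries.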
[root@rac1centorder ~]# vi /etc/sysconfig/ntpd
# Drop root to id 'ntp:ntp' by default.
OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid -g"
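Note: -x makes ntpd slew the clock (small, gradual adjustments) instead of stepping it. This avoids large clock jumps that could trigger a cluster reconfiguration. Moving the clock far ahead in one step can make Clusterware believe it has missed check-ins, which can lead to node eviction.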
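To confirm that every node actually has -x in its ntpd options, you could run a quick per-node check like the following sketch. `ntpd_uses_slew` is a hypothetical helper. It only recognizes -x as a separate option on an uncommented `OPTIONS=` line and deliberately ignores combined short flags such as `-gx`.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the OPTIONS= line of an ntpd sysconfig
# file passes -x as a separate flag. Combined flags like "-gx" are not
# recognized by this simple pattern, and commented-out lines are skipped.
ntpd_uses_slew() {
    file=${1:-/etc/sysconfig/ntpd}
    grep -Eq '^[[:space:]]*OPTIONS=(.*[[:space:]"])?-x([[:space:]"]|$)' "$file"
}
```

Example: `ntpd_uses_slew || echo "add -x to OPTIONS in /etc/sysconfig/ntpd"`. Restart ntpd after editing the file.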
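Aside:
Sometimes the hardware clock and the system clock drift apart. The commands to synchronize them are:
clock                show the hardware clock
clock --hctosys      write the hardware clock to the system time
clock --systohc      write the system time to the hardware clock
hwclock -r           show the hardware clock
hwclock -s           write the hardware clock to the system time
hwclock -w           write the system time to the hardware clock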
Enable ntpd at boot:
[root@rac1centorder ~]# chkconfig --list ntpd
ntpd            0:off   1:off   2:off   3:off   4:off   5:off   6:off
[root@rac1centorder ~]# chkconfig ntpd on
[root@rac1centorder ~]# chkconfig --list ntpd
ntpd            0:off   1:off   2:on    3:on    4:on    5:on    6:off
Start the NTP service:
[root@rac1centorder ~]# service ntpd status
ntpd is stopped
[root@rac1centorder ~]# service ntpd restart
Shutting down ntpd:                                        [FAILED]
Starting ntpd:                                             [  OK  ]
[root@rac1centorder ~]# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.168.7.2     120.25.115.20    3 u   27   64  377    0.187    5.667   3.343
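In `ntpq -p` output, the `*` in front of a row marks the peer that ntpd has selected as its system peer, and the offset column is in milliseconds. A script can use that row to confirm that ntpd has actually locked on to 192.168.7.2. This is a sketch. `ntp_sys_peer` is a hypothetical helper, and it assumes the standard ten-column layout shown above, with no spaces inside the remote or refid fields.

```shell
#!/bin/sh
# Hypothetical helper: from "ntpq -p" output on stdin, print the current
# system peer (the row marked with '*') and its offset column in msec.
# Exits non-zero when no peer is marked, e.g. shortly after ntpd starts.
ntp_sys_peer() {
    awk '/^\*/ { print substr($1, 2), $9; found = 1 }
         END   { exit !found }'
}
```

Example: `ntpq -p | ntp_sys_peer || echo "ntpd has not selected a peer yet"`.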
Check the CTSS status after NTP has been started:
[grid@rac1centorder ~]$ crsctl check ctss
CRS-4700: The Cluster Time Synchronization Service is in Observer mode.
[grid@rac2centorder ~]$ crsctl check ctss
CRS-4700: The Cluster Time Synchronization Service is in Observer mode.
[grid@rac1centorder ~]$ tail -30 /u01/app/11.2.0/grid_1/log/rac1centorder/ctssd/octssd.log
2020-10-26 13:54:25.782: [ CTSS][2587350784]sclsctss_gvss1: NTP default config file found
2020-10-26 13:54:25.782: [ CTSS][2587350784]sclsctss_gvss8: Return [0] and NTP status [2].
2020-10-26 13:54:25.782: [ CTSS][2587350784]ctss_check_vendor_sw: Vendor time sync software is detected. status [2].
2020-10-26 13:54:25.782: [ CTSS][2587350784]ctss_check_vendor_sw: Ctssd is switching to observer role
2020-10-26 13:54:25.782: [ CTSS][2587350784]clsctsselect_update_mbrdata: Updating pridata: { version[1] node[1] swversion[186647552] mode[0xe6] }.
2020-10-26 13:54:25.783: [ CTSS][2587350784]ctsselect_msm: CTSS mode is [0xe6]
[grid@rac2centorder ~]$ tail -30 /u01/app/11.2.0/grid_1/log/rac2centorder/ctssd/octssd.log
2020-10-26 14:28:56.783: [ CTSS][2216871680]sclsctss_gvss1: NTP default config file found
2020-10-26 14:28:56.783: [ CTSS][2216871680]sclsctss_gvss8: Return [0] and NTP status [2].
2020-10-26 14:28:56.783: [ CTSS][2216871680]ctss_check_vendor_sw: Vendor time sync software is detected. status [2].
2020-10-26 14:28:56.783: [ CTSS][2216871680]clsctsselect_update_mbrdata: Updating pridata: { version[1] node[2] swversion[186647552] mode[0xee] }.
2020-10-26 14:28:57.034: [ CRSCCL][2013263616]clsCclGetPriMemberData: Detected pridata change for node[2]. Retrieving it to the cache.
2020-10-26 14:28:58.337: [ CTSS][2221074176]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [1].
2020-10-26 14:28:58.337: [ CTSS][2221074176]ctsscomm_msg_hndlr: Received sync msg
2020-10-26 14:28:58.337: [ CTSS][2221074176]ctsscomm_msg_hndlr: Received from slave ( mode [0xe6] nodenum [1] hostname [rac1centorder] )
2020-10-26 14:29:06.339: [ CTSS][2221074176]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [1].
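Node 2's octssd.log likewise records that NTP was detected and that CTSS is in Observer mode. Node 2 still receives sync messages from slave node 1 (rac1centorder), so node 2 remains the CTSS master node.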
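To check from a node's own log whether CTSS currently sees NTP, you could look at the most recent `ctss_check_vendor_sw` message. Here "is detected. status [2]" corresponds to Observer mode and "is not detected. status [1]" to Active mode in the excerpts above. This is a sketch. `ctss_vendor_sw_state` is a hypothetical helper, and the mapping is inferred from these log excerpts only.

```shell
#!/bin/sh
# Hypothetical helper: report what the most recent ctss_check_vendor_sw
# line in an octssd.log (stdin) says about NTP detection. Prints
# "detected" (CTSS expected in Observer mode) or "not-detected"
# (CTSS may run in Active mode). Exits 1 if no such line is found.
ctss_vendor_sw_state() {
    awk '/ctss_check_vendor_sw: Vendor time sync software is not detected/ { s = "not-detected" }
         /ctss_check_vendor_sw: Vendor time sync software is detected/     { s = "detected" }
         END { if (s == "") exit 1; print s }'
}
```

Example: `ctss_vendor_sw_state < /u01/app/11.2.0/grid_1/log/rac2centorder/ctssd/octssd.log`.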




