Oracle RAC Host Time Synchronization and CTSSD

Original article by ByteHouse, 2024-12-05

1. Error description

2024-12-05 00:06:34.745: 
[ctssd(55910)]CRS-2409:The clock on host sljj01 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode.

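Starting with Oracle 11g R2, Oracle Clusterware ships its own cluster time synchronization service. If NTP is not configured on the operating system, Oracle's built-in Cluster Time Synchronization Service (CTSS) is enabled. The official documentation describes it as follows: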

Cluster Time Synchronization Service
Cluster node times should be synchronized. With this release, Oracle Clusterware
provides Cluster Time Synchronization Service (CTSS), which ensures that there is a
synchronization service in the cluster. If Network Time Protocol (NTP) is not found during cluster configuration, then CTSS is configured to ensure time synchronization.

2. MOS (My Oracle Support) solution

SYMPTOMS

CTSSD runs in observer mode even though no time sync software is running.
or
Cluvfy comp clocksync fails with:

$ ./cluvfy comp clocksync
Verifying Clock Synchronization across the cluster nodes
Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed
Checking if CTSS Resource is running on all nodes...
CTSS resource check passed
Querying CTSS for time offset on all nodes...
Query of CTSS for time offset passed
Check CTSS state started...
CTSS is in Observer state. Switching over to clock synchronization checks using NTP
Starting Clock synchronization checks using Network Time Protocol(NTP)...

NTP Configuration file check started...
NTP Configuration file check passed

Checking daemon liveness...
Liveness check failed for "ntpd"
Check failed on nodes:
racbde1
PRVF-5415 : Check to see if NTP daemon is running failed
Clock synchronization check using Network Time Protocol(NTP) failed
PRVF-9652 : Cluster Time Synchronization Services check failed
Verification of Clock Synchronization across the cluster nodes was unsuccessful on all the specified nodes.

or
The clusterware alert log reports errors like:

2009-12-23 20:06:53.974
[ctssd(13443)]CRS-2409:The clock on host racbde2 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode.

Note you could also see the CRS-2409 error if you have a vendor time synchronization service running but the times are not sync'd yet.

CAUSE

NTP or some other time sync service is configured but not running. The install, CVU, and CTSSD check for the following :

There is one and only one time synchronization service running at a time.
The time sync service can be a vendor service (NTP, Windows Time Service, etc…) or CTSS.
If vendor time service is configured, it needs to be correctly configured AND active.
For CTSS to be in active mode, a vendor time service must not be running, and must not be configured (correctly or otherwise). This is important. CTSS is conservative and will switch to OBSERVER mode the moment it discovers that a vendor service is running or configured on even one node in the cluster. This is to prevent multiple active time sync services running on the cluster, potentially changing clocks.
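
The rule above can be summarized as a small decision function. The following Python sketch is illustrative only: `NodeTimeState`, `expected_ctss_mode`, and `misconfigured_nodes` are hypothetical names, not part of any Oracle tool, and the real detection logic is internal to Clusterware.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NodeTimeState:
    """Hypothetical per-node summary of the vendor time service."""
    name: str
    vendor_configured: bool  # e.g. /etc/ntp.conf exists on the node
    vendor_running: bool     # e.g. the ntpd process is alive on the node


def expected_ctss_mode(nodes: List[NodeTimeState]) -> str:
    """Return the CTSS mode the rule above predicts for the cluster.

    A vendor service that is configured OR running on even one node
    is enough to put CTSS into observer mode.
    """
    for node in nodes:
        if node.vendor_configured or node.vendor_running:
            return "OBSERVER"
    return "ACTIVE"


def misconfigured_nodes(nodes: List[NodeTimeState]) -> List[str]:
    """Nodes where a vendor service is configured but not running.

    This is the situation described under CAUSE: CTSS stays in observer
    mode, yet nothing is actually synchronizing the clock.
    """
    return [n.name for n in nodes if n.vendor_configured and not n.vendor_running]
```

For example, a two-node cluster where only one node still has a leftover `/etc/ntp.conf` and no `ntpd` running would be reported as `OBSERVER`, with that node listed by `misconfigured_nodes`.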

SOLUTION

If you choose to use a vendor time sync service (like ntp), make sure it is configured AND running.

If you choose to let CTSSD handle time synchronization, de-configure the vendor time sync service. For example, for NTP you may need to move or remove /etc/ntp.conf or /etc/xntp.conf.

After choosing option 2 (and removing /etc/ntp.conf in my case), I now see a good time status reported by cluvfy:

$ ./cluvfy comp clocksync

Verifying Clock Synchronization across the cluster nodes

Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed

Checking if CTSS Resource is running on all nodes...
CTSS resource check passed


Querying CTSS for time offset on all nodes...
Query of CTSS for time offset passed

Check CTSS state started...
CTSS is in Active state. Proceeding with check of clock time offsets on all nodes...
Check of clock time offsets passed


Oracle Cluster Time Synchronization Services check passed

Verification of Clock Synchronization across the cluster nodes was successful.

3. About time synchronization

Linux has two common time synchronization services: ntpd and chrony. CentOS 6.x uses ntpd by default, while CentOS 7.x uses chrony.

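ntpd has two clock correction strategies, slew and step:

slew adjusts the clock smoothly and gradually (it corrects the clock in many small increments, i.e. fine-tuning).
step corrects the clock by jumping it to the target time (a discontinuous adjustment).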

chrony uses gradual (slew) adjustment by default.

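Step adjustments can disrupt workloads running on the server, such as database transactions. The rest of this section covers configuring ntpd to run in slew mode.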

Configuring ntpd slew mode:

To run ntpd in slew mode, add the -x option to its startup options:

Edit /etc/sysconfig/ntpd:

OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid -g"

Save the file, then restart the ntpd service:

service ntpd restart

How can we confirm that the NTP server has synchronized its own time?

[root@localhost ~]# ntpstat 
synchronised to NTP server (192.168.6.202) at stratum 2 
   time correct to within 41 ms
   polling server every 1024 s

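This command shows whether the NTP server is synchronized with its upstream server. In the output above, the time is correct to within 41 ms and the upstream server is polled every 1024 seconds. The polling interval changes automatically depending on the offset between the local clock and the time server's clock, with a minimum of 64 seconds and a maximum of 1024 seconds.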
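
For monitoring scripts it can help to turn this output into structured values. The sketch below parses text in the format shown above; `parse_ntpstat` is a hypothetical helper, and the regular expressions assume exactly this wording, which may differ between ntpstat versions.

```python
import re
from typing import Any, Dict


def parse_ntpstat(output: str) -> Dict[str, Any]:
    """Extract sync state, server, stratum, accuracy and poll interval.

    Assumes the ntpstat wording shown in this article; fields that are
    not present in the output are left as None.
    """
    result: Dict[str, Any] = {
        "synchronised": False,
        "server": None,
        "stratum": None,
        "accuracy_ms": None,
        "poll_s": None,
    }
    # Anchor at line start so "unsynchronised" does not match.
    m = re.search(
        r"^synchronised to NTP server \(([^)]+)\) at stratum (\d+)",
        output,
        re.MULTILINE,
    )
    if m:
        result["synchronised"] = True
        result["server"] = m.group(1)
        result["stratum"] = int(m.group(2))
    m = re.search(r"time correct to within (\d+) ms", output)
    if m:
        result["accuracy_ms"] = int(m.group(1))
    m = re.search(r"polling server every (\d+) s", output)
    if m:
        result["poll_s"] = int(m.group(1))
    return result
```

Feeding it the sample output above yields the server `192.168.6.202`, stratum 2, an accuracy of 41 ms, and a 1024-second polling interval.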

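If the output shows unsynchronised, check the configuration first. If the configuration is correct, simply wait: after ntpd starts, it needs some time before it reports as synchronized.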
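Without the -x option, ntpd slews only offsets below its default step threshold of 128 ms and steps (jumps) anything larger. With -x, the step threshold is raised to 600 s: offsets within 600 s are corrected smoothly by slewing at up to 0.5 ms/s, so the whole correction takes at most about 14 days. If the offset exceeds 600 s (10 minutes), ntpd still falls back to stepping.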
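
The 14-day figure follows directly from the 0.5 ms/s slew rate. The sketch below works through that arithmetic and models which method applies to a given offset. The function names are hypothetical, and the 128 ms default step threshold comes from the ntpd documentation rather than from the text above.

```python
SLEW_RATE_US_PER_S = 500            # 0.5 ms of correction per second (500 ppm)
DEFAULT_STEP_THRESHOLD_S = 0.128    # ntpd default, without -x
SLEW_MODE_STEP_THRESHOLD_S = 600.0  # step threshold when -x is given
SECONDS_PER_DAY = 86_400


def slew_duration_days(offset_s: float) -> float:
    """Days needed to remove offset_s when slewing at the maximum rate."""
    offset_us = abs(offset_s) * 1_000_000
    return offset_us / SLEW_RATE_US_PER_S / SECONDS_PER_DAY


def adjustment_method(offset_s: float, slew_mode: bool) -> str:
    """Return "slew" or "step" for an offset, with or without -x."""
    threshold = SLEW_MODE_STEP_THRESHOLD_S if slew_mode else DEFAULT_STEP_THRESHOLD_S
    return "step" if abs(offset_s) > threshold else "slew"


if __name__ == "__main__":
    # Worst case under -x: a 600 s offset takes 1,200,000 s to slew away.
    print(f"{slew_duration_days(600):.2f} days")
    print(adjustment_method(5, slew_mode=False), adjustment_method(5, slew_mode=True))
```

A 600 s offset at 0.5 ms/s needs 1,200,000 seconds, which is a little under 14 days. A 5-second offset is stepped without -x but slewed with it, taking roughly 10,000 seconds (under three hours).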
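Regardless of the adjustment method, if the system time differs too much from the server time (more than 1000 seconds by default), ntpd exits and logs a message. With the -g option this check is skipped and ntpd keeps running, but only once: if the client's offset exceeds 1000 s again later, ntpd exits and logs a message. The -g option is usually combined with -x.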
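
The one-time nature of -g can be pictured as a single-use allowance. The following sketch is a simplified model of the behavior as described in this article, not ntpd's actual code; `PanicGate` is a hypothetical name, and the real daemon may consume the allowance under somewhat different conditions.

```python
PANIC_THRESHOLD_S = 1000.0  # default limit beyond which ntpd gives up


class PanicGate:
    """Simplified model of the 1000 s panic check and the -g option."""

    def __init__(self, allow_first_large_offset: bool) -> None:
        # True models starting ntpd with -g.
        self._allowance_left = allow_first_large_offset

    def check(self, offset_s: float) -> str:
        """Return "adjust" if ntpd would correct the clock, "exit" if it would quit."""
        if abs(offset_s) <= PANIC_THRESHOLD_S:
            return "adjust"
        if self._allowance_left:
            # -g lets one oversized offset through, then the limit applies again.
            self._allowance_left = False
            return "adjust"
        return "exit"
```

With `PanicGate(True)`, a first offset of 5000 s is accepted and a second one causes an exit. With `PanicGate(False)`, the first oversized offset already causes an exit.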
