Resolving OGG High-Availability Configuration Issues When Using NFS

Original article by 张鹏, 2022-02-12


 

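When OGG runs on NFS, creating an Extract process fails with an error saying that a file cannot be locked. The workarounds are described below, but the best solution is still to use ACFS.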

 

OGG File Locking Issue with XAG on Node Switch Over when using NFS Mounted File System (Doc ID 1673231.1)

 

 

APPLIES TO:

Oracle GoldenGate - Version 12.1.2.0.0 and later
Information in this document applies to any platform.

GOAL

PROBLEM:

Provide additional information to the user on OGG File Locking issues with XAG on Node Switch Over when using a NFS mounted File System.

SOLUTION

SYMPTOMS:

When XAG switches over to another node after a node failure and the OGG files are on an NFS-mounted file system, the OGG processes do not restart and report the following errors:

ERROR   OGG-00446  Unable to lock file "<ogg_home>/dirchk/<extract_name>.cpe" (error 11, Resource temporarily unavailable)

ERROR   OGG-00446  Unable to lock file "<ogg_home>/dirchk/<pump_name>.cpe" (error 11, Resource temporarily unavailable)

ERROR   OGG-00446  Unable to lock file "<ogg_home>/dirchk/<replicat_name>.cpe" (error 11, Resource temporarily unavailable)
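When several processes fail together after a switchover, it helps to list exactly which checkpoint files are reported as locked. The minimal sketch below scans error-log text for OGG-00446 messages in the format shown above and returns each file path with its error number. It assumes the message text matches these examples verbatim; the function name is hypothetical, and where the log lives is left to the caller.

```python
import re
from typing import List, Tuple

# Matches lines such as:
# ERROR   OGG-00446  Unable to lock file "<path>" (error 11, Resource temporarily unavailable)
LOCK_ERROR = re.compile(r'OGG-00446\s+Unable to lock file "([^"]+)" \(error (\d+),')


def locked_files(log_text: str) -> List[Tuple[str, int]]:
    """Return (file path, error number) for every OGG-00446 message in log_text."""
    return [(m.group(1), int(m.group(2))) for m in LOCK_ERROR.finditer(log_text)]
```

Feeding it the three messages above yields the extract, pump, and replicat .cpe paths, each with error 11.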

CAUSE:

The NFS server does not release locks that were established before the node failed. Because these locks are still held and never time out, the OGG processes fail during startup. This is inherent NFS behavior prior to NFS v4.
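To make "error 11" concrete, here is a minimal sketch of the lock conflict itself. It assumes, which the note does not confirm, that OGG guards its .cpe checkpoint files with POSIX fcntl-style record locks. One process holds an exclusive lock, and a second process's non-blocking attempt is refused with EAGAIN or EACCES. On Linux, errno 11 is EAGAIN, "Resource temporarily unavailable". The demo uses a local temporary file, so it shows only the conflict, not the NFS server keeping a dead node's locks. The file name EXT1.cpe and the helper names are hypothetical.

```python
import errno
import fcntl
import os
import tempfile
from typing import Optional


def try_lock(path: str) -> Optional[int]:
    """Try to take an exclusive, non-blocking POSIX record lock on `path`.

    Returns the open file descriptor on success (the lock lasts while it stays
    open), or None if another process already holds a conflicting lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd
    except OSError as exc:
        os.close(fd)
        if exc.errno in (errno.EAGAIN, errno.EACCES):
            return None  # e.g. errno 11 (EAGAIN), "Resource temporarily unavailable"
        raise


def second_process_is_refused(path: str) -> bool:
    """Hold the lock in this process and let a forked child try to take it."""
    holder = try_lock(path)
    if holder is None:
        raise RuntimeError(f"{path} is already locked")
    pid = os.fork()
    if pid == 0:
        # Child: a different process, like an OGG process restarted after switchover.
        os._exit(0 if try_lock(path) is None else 1)
    _, status = os.waitpid(pid, 0)
    os.close(holder)  # closing the descriptor releases this process's lock
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(second_process_is_refused(os.path.join(d, "EXT1.cpe")))  # True
```

On a healthy local file system, the lock disappears as soon as the holder closes the file or exits. The failure described in this note is that, on NFS before v4, the server keeps the lock of a node that died.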

SOLUTION:

There are three available solutions depending on the availability and implementation of NFS on the NFS server and NFS client machines.

1. If NFS v4 is available on both the NFS server and the clients, you can mount the file system using NFS v4 and use its built-in lease-based file locking. The syntax is as follows:

mount -t nfs4 oggnfs-svr:/home/oracle/ogg /mnt/oggnfs-svr/oracle/ogg -o rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,noac,timeo=600

2. If NFS v4 is not a viable solution, you can disable file locking within NFS; however, this is a less secure solution. The "nolock" option poses a risk of data corruption, because processes on any other NFS client machine can access and modify the files. The syntax is as follows:

 

mount -t nfs oggnfs-svr:/home/oracle/ogg /mnt/oggnfs-svr/oracle/ogg -o nolock,rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,noac,nfsvers=3,timeo=600

3. Upgrade XAG to version 3.1 and set the NFS_UNLOCK option to 1. This XAG enhancement cleans up the locks that NFS leaves on the OGG checkpoint and trail files. The syntax is as follows:

$ agctl modify goldengate gg1 --environment_vars "NFS_UNLOCK=1"
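Solutions 1 and 2 differ in how the client mounts the file system. To help audit existing mounts, the hypothetical helper below sorts a mount's file system type and option string into three cases: NFS v4 lease-based locking, locking disabled with nolock, or classic NFS v3 locking (NLM). Only the NLM case can leave stale locks after a node failure. The helper inspects options only, so it cannot tell whether solution 3 (XAG 3.1 with NFS_UNLOCK=1) is in use.

```python
from typing import Dict, Optional


def parse_options(opts: str) -> Dict[str, Optional[str]]:
    """Split a comma-separated mount option string into {name: value or None}."""
    parsed: Dict[str, Optional[str]] = {}
    for item in opts.split(","):
        item = item.strip()
        if item:
            key, sep, value = item.partition("=")
            parsed[key] = value if sep else None
    return parsed


def locking_mode(fstype: str, opts: str) -> str:
    """Classify a mount into the locking cases discussed in Doc ID 1673231.1."""
    o = parse_options(opts)
    version = o.get("vers") or o.get("nfsvers") or ""
    if fstype == "nfs4" or version.startswith("4"):
        return "nfs4-lease"  # solution 1: locks lapse when a dead client stops renewing its lease
    if "nolock" in o:
        return "nolock"      # solution 2: no NFS locking at all (data corruption risk)
    return "nlm"             # NFS v3 locking: stale locks possible after a node failure
```

Applied to the two mount commands above, the helper returns "nfs4-lease" for the first and "nolock" for the second.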

 

 

 

Oracle GoldenGate Best Practice: NFS Mount options for use with GoldenGate (Doc ID 1232303.1)


APPLIES TO:

Oracle GoldenGate - Version 11.1.1.0.0 and later
Oracle Database - Enterprise Edition - Version 12.2.0.1 to 12.2.0.1 [Release 12.2]
Information in this document applies to any platform.

PURPOSE

The purpose of this bulletin is to document the file system mount options to use when configuring GoldenGate to run with NFS mounted file system.

NFS mounts should not be used for any Oracle GoldenGate processes unless IO buffering is turned OFF. The danger is that one process registers the end of a trail file or transaction log and moves on to the next file in the sequence, and only after that does the data in the NFS IO buffer get flushed to disk. The net result is skipped data, which cannot be compensated for with the GoldenGate parameter EOFDELAY.
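The skipped-data hazard can be reproduced with a toy model. The sketch below is not OGG's reader logic and rests on three assumptions. First, a record becomes visible to the reader only flush_lag ticks after it was written, which models buffered IO. Second, a newly created file is visible immediately. Third, a reader that hits EOF in file k while file k+1 already exists abandons file k. The function name and parameters are hypothetical.

```python
from typing import List


def simulate(records_per_file: int, files: int, flush_lag: int) -> List[int]:
    """Return the record numbers a reader consumes under buffered (lagged) visibility."""
    total = records_per_file * files
    consumed: List[int] = []
    cur_file, pos = 0, 0
    for tick in range(total + flush_lag + 1):
        # The writer writes record `tick` at time `tick`; a file exists (its metadata
        # is visible) as soon as its first record has been written.
        newest_file = min(tick, total - 1) // records_per_file
        while True:
            rec = cur_file * records_per_file + pos
            visible = pos < records_per_file and rec + flush_lag <= tick
            if visible:
                consumed.append(rec)
                pos += 1
            elif cur_file < newest_file:
                # EOF in the current file and a newer file exists: move on for good.
                cur_file, pos = cur_file + 1, 0
            else:
                break
    return consumed
```

With no lag, simulate(3, 2, 0) returns every record, 0 through 5. With simulate(3, 2, 2), the result is [0, 1, 3, 4, 5]. Record 2 was still buffered when the second file appeared, so the reader never returned for it. In this model, waiting longer at EOF in the new file cannot recover it.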

When using NFS mounted file system with Oracle GoldenGate files, the setting for file system caching or buffered IO must be disabled on both NFS client and server.

 

SCOPE

This document is relevant to all environments using Oracle GoldenGate.

The important factor to consider when configuring Oracle GoldenGate processes to run on NFS mounted file system is to make sure that buffered IO (data and attribute caching) is always set to OFF on both NFS client and server.

DETAILS

NOTES:

1. Oracle does not support running OGG binaries on shared storage. OGG binaries should be installed on local storage.

2. NFS v4 is supported.

 

Mount Options for Oracle GoldenGate Datafiles

NFS client mount options for Oracle GoldenGate datafiles, by operating system:

Sun Solaris *: rw,bg,hard,nointr,rsize=32768,wsize=32768,proto=tcp,noac,forcedirectio,vers=3,suid

AIX (5L) *: cio,rw,bg,hard,nointr,rsize=32768,wsize=32768,proto=tcp,noac,vers=3,timeo=600

HPUX 11i *: rw,bg,vers=3,proto=tcp,noac,forcedirectio,hard,nointr,timeo=600,rsize=32768,wsize=32768,suid

Linux (x86-32/x86-64/Itanium) *: rw,bg,hard,rsize=32768,wsize=32768,tcp,actimeo=0,noac,vers=3,timeo=600

 

 

* Even when data caching (buffered IO) is set to OFF on the NFS client, it may not take effect on some specialized file systems, such as Veritas File System (VxFS), or on NAS devices/servers with additional caching features, such as FlexCache on NetApp, unless you also explicitly disable caching on the server side. For VxFS, review Note Golden Gate Config Parameters For HP Journal File Systems "direct" Option Is Performing Very Slow (Doc ID 1607386.1), because not all of the options above are supported (for example, hard). You will need to set MINCACHE to UNBUFFERED or DIRECT. For NetApp, FlexCache must not be used at all with Oracle GoldenGate processes.

On Sun Solaris, if Extract hangs, consider adding the mount options "timeo=600,llock" on top of the ones required for Oracle GoldenGate datafiles.
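The Linux row of the client-side table can be turned into a quick check. The hypothetical helper below compares a mount option string with those recommended Linux options and lists any that are missing or set to a different value. It treats nfsvers=3 as vers=3 and proto=tcp as tcp, since Linux mount accepts both spellings. Extra options such as nointr are ignored.

```python
from typing import Dict, List, Optional

# Linux client options recommended in the table above (Doc ID 1232303.1).
LINUX_RECOMMENDED = "rw,bg,hard,rsize=32768,wsize=32768,tcp,actimeo=0,noac,vers=3,timeo=600"


def normalize(opts: str) -> Dict[str, Optional[str]]:
    """Parse an option string, folding equivalent spellings into one form."""
    out: Dict[str, Optional[str]] = {}
    for item in opts.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if key == "nfsvers":
            key = "vers"
        if key == "proto" and value == "tcp":
            key, sep, value = "tcp", "", ""
        out[key] = value if sep else None
    return out


def missing_options(actual: str, recommended: str = LINUX_RECOMMENDED) -> List[str]:
    """Return recommended options that are absent or have a different value."""
    have = normalize(actual)
    missing: List[str] = []
    for key, value in normalize(recommended).items():
        if key not in have or have[key] != value:
            missing.append(key if value is None else f"{key}={value}")
    return missing
```

For example, the options "rw,bg,hard,nointr,tcp,vers=3,timeo=600,rsize=32768,wsize=32768,actimeo=0" report only noac as missing.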

 

Additional mount option on the NFS server's local disk **, by NFS server operating system:

Sun Solaris: forcedirectio

AIX (5L) *: cio

HPUX 11i: no_fs_async

Linux (x86-32/x86-64/Itanium): sync

NetApp (Data ONTAP): the optional FlexCache system must be disabled

 

** This option is in addition to the regular mount options for the local disk that the NFS server exports to the clients and that holds the Oracle GoldenGate datafiles. It forces the IO behavior of the file system to be synchronous ("sync"). Asynchronous IO is not recommended for Oracle GoldenGate datafiles and must be turned off at all times.

Note: For ZFS, turning on noac, sync, and actimeo=0 on the client side is sufficient.

 

For a GFS2 (Global File System 2) file system, the "localflocks" mount option may also be needed.

To check the mount settings, run: cat /etc/fstab

 

When trail files are NFS-mounted on a server where any GoldenGate process will read them (including Extract pump, Replicat, and Distribution Service), the process that writes these files (Extract, Receiver Service, or Collector) must run on the same server. For example, suppose you send remote trail files to a target cluster with two nodes, NODE_A and NODE_B, and run a Replicat process on NODE_B. The process that writes the trail file that Replicat reads must then also run on NODE_B. If you use an Extract pump, its RMTHOST must connect to NODE_B, the node where the Replicat reading that trail runs. If you use a Distribution Service to send the trail files, it must connect to the Receiver Service running on the same node as the Replicat process that reads them.

Storing trail files on NFS-mounted devices is not supported when the process writing a trail file runs on a different node or server from the process reading it.
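The co-location rule can be checked mechanically from an inventory of where each process runs. The sketch below uses a hypothetical TrailProcess record holding the process name, its trail, its node, and whether it writes or reads that trail. It reports every reader whose trail is written on a different node. The process names COLL1 and REP1 and the trail path are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TrailProcess:
    name: str     # e.g. a Collector/Receiver Service writer or a Replicat reader
    trail: str    # trail the process writes or reads, e.g. "./dirdat/rt"
    node: str     # cluster node the process runs on
    writes: bool  # True for the trail writer, False for a trail reader


def colocation_violations(procs: List[TrailProcess]) -> List[Tuple[str, str, str]]:
    """Return (reader name, reader node, writer node) for every reader that runs
    on a different node from the process writing its trail."""
    writer_node: Dict[str, str] = {p.trail: p.node for p in procs if p.writes}
    return [
        (p.name, p.node, writer_node[p.trail])
        for p in procs
        if not p.writes and p.trail in writer_node and writer_node[p.trail] != p.node
    ]
```

In the NODE_A/NODE_B example, a writer on NODE_A feeding a Replicat on NODE_B is reported as ("REP1", "NODE_B", "NODE_A"). Moving the writer to NODE_B yields an empty list.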

 

Additional Testing Programs

Two small demo programs written in Perl are attached to this KM note to help customers identify any cache-related issues on the NFS-mounted file system:


fwriter.pl writes a 500M file, sleeping 1 sec every 1M.
freader.pl reads a file and sleeps 1 sec every 2M or until it reaches EOF. It copies the file to another location. It will also detect '\0' in the file.

To use them, first start fwriter on the machine where the primary Extract runs.

Type the command:
perl ./fwriter.pl file1
where file1 is the new file to be created on the NFS drive.

Then start freader on the machine where the pump or Replicat runs.

Type the command:
perl ./freader.pl file1 file2
where file1 is the file that fwriter is writing, and file2 can be any local file, kept for later comparison.

Any of the following results indicates a file caching issue on the NFS drive:

1. freader prints something like "found NULL in file".
This means the trail file record contained blocks of zeros when the reader tried to read the file.

2. freader prints "incomplete file".
This means NFS failed to deliver newly written blocks to the reader within 20 seconds (which is how long OGG waits).

3. Even if no error or warning message is printed, the contents of file1 and file2 differ at the end of the run.
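The Perl scripts are attached to the original KM note and are not reproduced here. As a hypothetical stand-in, the Python sketch below follows the description above. write_file plays the role of fwriter.pl: by default it writes a 500 MiB file, pausing 1 second per 1 MiB chunk. read_file plays the role of freader.pl. It reads 2 MiB at a time with 1-second pauses, reports "incomplete file" after 20 seconds without new data, checks for zero bytes, and finally compares file1 and file2 byte by byte. The function names, the expected-size argument, and the message strings are assumptions, not the scripts' actual interface.

```python
import filecmp
import time
from typing import List


def write_file(path: str, total: int = 500 * 2**20, chunk: int = 2**20,
               pause: float = 1.0) -> None:
    """fwriter-like: write `total` bytes of non-zero data, sleeping `pause` s per chunk."""
    block = b"x" * chunk  # contains no zero bytes, so any zero the reader sees is suspect
    written = 0
    with open(path, "wb") as f:
        while written < total:
            n = min(chunk, total - written)
            f.write(block[:n])
            f.flush()  # hand the data to the OS (and the NFS client), no fsync
            written += n
            time.sleep(pause)


def read_file(src: str, dst: str, expected: int, chunk: int = 2 * 2**20,
              pause: float = 1.0, timeout: float = 20.0) -> List[str]:
    """freader-like: copy src to dst while it grows; return the problems found."""
    problems: List[str] = []
    got = 0
    # buffering=0: raw reads, so every call asks the OS for newly appended data
    with open(src, "rb", buffering=0) as f, open(dst, "wb") as out:
        last_progress = time.monotonic()
        while got < expected:
            data = f.read(chunk)
            if data:
                if b"\0" in data:
                    problems.append(f"found NULL in file near offset {got}")
                out.write(data)
                got += len(data)
                last_progress = time.monotonic()
            elif time.monotonic() - last_progress > timeout:
                problems.append("incomplete file")  # no new data within `timeout` s
                break
            time.sleep(pause)
    if not problems and not filecmp.cmp(src, dst, shallow=False):
        problems.append("file1 and file2 differ")
    return problems
```

For example, start write_file("/mnt/oggnfs/file1") on the Extract host. At the same time, run read_file("/mnt/oggnfs/file1", "/tmp/file2", expected=500 * 2**20) on the pump or Replicat host. Both paths are placeholders. An empty list means none of the three symptoms was seen.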


 

 

 


Sample output from a Linux host:

cat /etc/fstab
/dev/VolGroup00/LogVol00 / ext3 defaults 1 1
LABEL=/boot /boot ext3 defaults 1 2
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/dev/VolGroup00/LogVol01 swap swap defaults 0 0
#sydneyrac2-priv:/SAN /SAN nfs rw,bg,hard,nointr,tcp,vers=3,timeo=600,rsize=32768,wsize=32768,actimeo=0 0 0
sydneystg1-priv:/disk /u02/oradata nfs rw,bg,hard,nointr,tcp,vers=3,timeo=600,rsize=32768,wsize=32768,actimeo=0 0 0
carlton:/mnt/vg01/shared/software /na_filer/software nfs ro,bg,hard,nointr,rsize=32768,wsize=32768,tcp,vers=3,timeo=600 0 0
carlton:/mnt/vg01/shared/scripts /na_filer/scripts nfs ro,bg,hard,nointr,rsize=32768,wsize=32768,tcp,vers=3,timeo=600 0 0
#nassperanza:/vol/HA_TEAM_lnxclu3_oradata /rac_shared nfs rw,bg,nointr,hard,timeo=600,rsize=32768,wsize=32768,tcp,nfsvers=3,actimeo=0 0 0
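To review NFS entries like those in the sample /etc/fstab above, the sketch below parses fstab-formatted text. It keeps only active (uncommented) nfs and nfs4 lines and reports which of the two client-caching options from the Linux recommendation, actimeo=0 and noac, each entry lacks. The names are hypothetical. /proc/mounts uses the same field layout and shows the options currently in effect. The kernel may spell some options differently there, though, so treat a gap reported from /proc/mounts as a prompt to look, not a verdict.

```python
from typing import List, NamedTuple


class MountEntry(NamedTuple):
    device: str
    mountpoint: str
    fstype: str
    options: List[str]


def nfs_entries(fstab_text: str) -> List[MountEntry]:
    """Return the active (uncommented) nfs/nfs4 entries in fstab-formatted text."""
    entries: List[MountEntry] = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        entries.append(MountEntry(fields[0], fields[1], fields[2], fields[3].split(",")))
    return entries


def caching_gaps(entry: MountEntry) -> List[str]:
    """Client-caching options from the Linux recommendation that the entry lacks."""
    return [opt for opt in ("actimeo=0", "noac") if opt not in entry.options]
```

Run against the sample above, the commented-out /SAN and /rac_shared lines are skipped. The read-write /u02/oradata mount has actimeo=0 but no noac. The two read-only /na_filer mounts lack both options.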

 

 

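[Copyright notice] This article is original content by a 墨天轮 (modb.pro) user. When reposting, you must credit the source (墨天轮), the article link, the author, and other basic information; otherwise the author and 墨天轮 reserve the right to pursue liability. If you find content on 墨天轮 suspected of plagiarism or infringement, please report it by email to contact@modb.pro with supporting evidence; once verified, 墨天轮 will remove the content immediately.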
