
AIX + Oracle: NFS Mount Recommendations and Case Analysis

Original article by 王君慧, 2023-06-27

NFS Mount Configuration Recommendations

NFS Mount Options

ORA-27054: NFS file system where the file is created or resides is not mounted with correct options

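To avoid errors like the one above when NFS is mounted as a backup destination or as the directory for Data Pump dump files, mount it with the options below that match your platform and version (Doc ID 370513.1):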

Column notes: the "Oracle Datafiles" options also apply to NFS storage for an ASM disk group (not quorum disks) that does not hold the OCR and voting disk. The "CRS Voting Disk and OCR" options apply to 12.1 and lower; in 12.2, both OCR and voting disks must reside in ASM (refer to Document 2201844.1). NFS storage for an ASM disk group (not quorum disks) that holds the OCR and voting disk must also use the "CRS Voting Disk and OCR" options.

| Operating System | Mount options for Binaries | Mount options for Oracle Datafiles | Mount options for CRS Voting Disk and OCR (12.1 and lower) |
|---|---|---|---|
| Sun Solaris * | rw,bg,hard,nointr,rsize=32768,wsize=32768,proto=tcp,noac,vers=3,suid | rw,bg,hard,nointr,rsize=32768,wsize=32768,proto=tcp,noac,forcedirectio,vers=3 | rw,bg,hard,nointr,rsize=32768,wsize=32768,proto=tcp,vers=3,noac,forcedirectio |
| AIX (5L) ** | rw,bg,hard,nointr,rsize=32768,wsize=32768,proto=tcp,vers=3,timeo=600 | cio,rw,bg,hard,nointr,rsize=32768,wsize=32768,proto=tcp,noac,vers=3,timeo=600 | cio,rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,noac,vers=3,timeo=600 |
| HPUX 11.23 *** – | rw,bg,vers=3,proto=tcp,noac,hard,nointr,timeo=600,rsize=32768,wsize=32768,suid | rw,bg,vers=3,proto=tcp,noac,forcedirectio,hard,nointr,timeo=600,rsize=32768,wsize=32768 | rw,bg,vers=3,proto=tcp,noac,forcedirectio,hard,nointr,timeo=600,rsize=32768,wsize=32768 |
| Windows (Use dNFS if needed. Refer to Document 1468114.1) | Not Supported | Not Supported | Not Supported |
| Linux x86 # **** | rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,vers=3,timeo=600,actimeo=0 | rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,vers=3,timeo=600 | rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,noac,actimeo=0,vers=3,timeo=600 |
| Linux x86-64 # **** | rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,vers=3,timeo=600,actimeo=0 | rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,vers=3,timeo=600 | rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,noac,vers=3,timeo=600,actimeo=0 |
| Linux - Itanium | rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,vers=3,timeo=600,actimeo=0 | rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,vers=3,timeo=600 | rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,noac,vers=3,timeo=600,actimeo=0 |

Mount with the options listed above for the matching platform and file type.
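Because ORA-27054 complains about the mount options, it can help to compare the options actually used against the required list before running a backup or Data Pump job. The following is a minimal sketch in POSIX shell, not an Oracle or IBM tool: the function name missing_nfs_opts and the variable AIX_DATAFILE_OPTS are hypothetical, and the required list is copied from the AIX row for Oracle datafiles above.

```shell
# Required options, copied from the AIX "Oracle Datafiles" row (assumption:
# adjust to the row and column that match your platform and file type).
AIX_DATAFILE_OPTS="cio,rw,bg,hard,nointr,rsize=32768,wsize=32768,proto=tcp,noac,vers=3,timeo=600"

# Print every option from the required list ($1) that is absent from the
# actual comma-separated option string ($2), one option per line.
missing_nfs_opts() {
    required=$1
    # Drop stray blanks such as "rsize=32768, wsize=32768"
    actual=$(printf '%s' "$2" | tr -d ' ')
    old_ifs=$IFS
    IFS=,
    for opt in $required; do
        case ",$actual," in
            *",$opt,"*) ;;                 # option present, nothing to report
            *) printf '%s\n' "$opt" ;;     # option missing
        esac
    done
    IFS=$old_ifs
}
```

An empty result means every required option is present. The helper only compares strings; it does not read the live mount table, so the actual option string has to be supplied by the caller.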

If ORA-27054 still occurs with the options above (this is known to have happened on 10g), consider adding the NFS mount to /etc/filesystems:

/dirnfs/point:
        dev             = /dev/nfs
        vfs             = nfs
        nodename        = <nfs address>
        mount           = <true or false>
        options         = rw,bg,hard,nointr,rsize=32768,wsize=32768,proto=tcp,vers=3,timeo=600
        account         = false

Here /dirnfs/point is the local NFS mount point, dev is the device name on the NFS server side, and mount = true mounts the file system automatically.

Alternatively, try setting the following event:

ALTER SYSTEM SET EVENTS '10298 trace name context forever, level 32';

Choosing the NFS Mount Point

When choosing a mount point, avoid using a directory directly under the root directory (/).

A correct example:

mkdir -p /dirnfs/point_a
mount -o rw,bg,hard,nointr,rsize=32768,wsize=32768,proto=tcp,vers=3,timeo=600 xxx.xxx.xxx.xxx:/nfsdir /dirnfs/point_a

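This matters most for Oracle 10g. When an NFS directory is mounted on a mount point directly under root (such as /nfspoint), an NFS failure causes the database instance on that node to hang, whether or not Oracle uses that NFS. Both clustered and single-instance databases are affected; see the case below for details.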

Case Analysis: Oracle RAC Node Hang Caused by an NFS Failure

Environment

The environment in this case:
AIX 6.1 + ORACLE 10.2.0.5 RAC
The database instance suddenly hung with no errors in the alert log, and the operating system df -g command reported:

NFS server *.*.*.* not responding still trying

At the same time, sqlplus hung and could not log in to the database. Once NFS recovered, the database instance returned to normal.

Problem Analysis

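Judging by the symptoms, the problem closely matches the NFS-induced instance hang described in two MOS notes (Doc ID 1316251.1 and Doc ID 1445600.1, listed under References).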

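According to the MOS notes, the root cause is that on UNIX systems Oracle processes directory entries in order through the call chain getcwd -> getwd -> stat, and then issues a statx call on each entry. While a process runs, Oracle uses this method to scan every directory under the root directory. When an NFS file system mounted under root becomes unavailable, the statx scan blocks on that entry, and Oracle hangs as a result.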

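Because the outage was short, truss was not used to analyze it at the time. To verify the cause, and with the customer's consent, the failure was reproduced in two environments: a DG standby of the same version (10.2.0.5), and an Oracle 11g + AIX 7.2 test environment.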

Reproducing the Problem

Mounting the NFS File System

An NFS server at *.148 was set up to provide the test NFS file system.

# Check the mount result -- 10g
# df -g
Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4           9.00      7.83   14%     7706     1% /
/dev/hd2          10.00      8.10   19%    36984     2% /usr
/dev/hd9var       10.00      9.65    4%     7116     1% /var
/dev/hd3           4.00      3.88    3%      467     1% /tmp
/dev/hd1           4.00      1.12   72%     2694     2% /home
/proc                 -         -    -         -     -  /proc
/dev/hd10opt       0.50      0.10   81%     9047    29% /opt
/dev/lv00          0.25      0.24    4%       18     1% /var/adm/csd
/dev/fslv00       50.00     38.63   23%    23698     1% /u01
/dev/lv74      25580.00   5056.27   81%    44564     1% /u02
*.*.*.148:/nfstest     99.95     82.09   18%    75105     1% /nfstest

# Check the mount result -- 11g
# df -g
Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4         150.00     40.09   74%   158519     2% /
/dev/hd2          20.00     13.97   31%    88107     3% /usr
/dev/hd9var       15.00     14.86    1%     1644     1% /var
/dev/hd3          15.00     12.10   20%    11836     1% /tmp
/dev/hd1          20.00      3.52   83%     8905     2% /home
/dev/hd11admin      5.00      5.00    1%        5     1% /admin
/proc                 -         -    -        -      - /proc
/dev/hd10opt      20.00     14.65   27%    48250     2% /opt
/dev/livedump      1.00      1.00    1%        4     1% /var/adm/ras/livedump
/ahafs                -         -    -       36     1% /aha
/dev/fslv01     6000.00   2541.76   58%    28918     1% /backup
/dev/loop1         6.38      0.00  100%  3346878   100% /cdrom
*.*.*.106:/oraclebackup   30714.86   3370.36   90%   564081     8% /oraclebackup
*.*.*.10:/*bak   40960.00  37487.03    9% 93162521     2% /orabak/*bak
*.*.*.148:/nfstest      99.95     82.09   18%    75105     1% /nfstest



Simulating the NFS Failure

Stop the NFS service on node 148:

  # systemctl stop nfs

Test and Analysis on 10g

# Check the NFS status with df
$df -g
Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4           9.00      7.83   14%     7706     1% /
/dev/hd2          10.00      8.10   19%    36984     2% /usr
/dev/hd9var       10.00      9.65    4%     7116     1% /var
/dev/hd3           4.00      3.88    3%      467     1% /tmp
/dev/hd1           4.00      1.12   72%     2694     2% /home
/proc                 -         -    -         -     -  /proc
/dev/hd10opt       0.50      0.10   81%     9047    29% /opt
/dev/lv00          0.25      0.24    4%       18     1% /var/adm/csd
/dev/fslv00       50.00     38.63   23%    23698     1% /u01
/dev/lv74      25580.00   5055.41   81%    44566     1% /u02
NFS server *.*.*.148 not responding still trying

# The df command hangs

# sqlplus command

$
$sqlplus / as sysdba

SQL*Plus: Release 10.2.0.5.0 - Production on Thu Apr 13 14:16:27 2023

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.



# The sqlplus command hangs

Analyze the process state with truss:

  truss  -aefd sqlplus / as sysdba

truss log on 10g:

221464: 0.3678:        statx("./../../../../../esa", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
221464: 0.3680:        statx("./../../../../../etc", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
221464: 0.3683:        statx("./../../../../../fxcdb_200110_1624.nmon", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
221464: 0.3687:        statx("./../../../../../fxcdb_200515_0832.nmon", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
221464: 0.3689:        statx("./../../../../../fxcdb_200518_0752.nmon", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
221464: 0.3692:        statx("./../../../../../home", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
221464: 0.3695:        statx("./../../../../../image.data", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
221464: 0.3697:        statx("./../../../../../installp.log", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
221464: 0.3700:        statx("./../../../../../lib", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
221464: 0.3703:        statx("./../../../../../lost+found", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
221464: 0.3705:        statx("./../../../../../lpp", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
221464: 0.3708:        statx("./../../../../../mnt", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
221464: 0.3711:        statx("./../../../../../nas_142", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
221464: 0.3713:        statx("./../../../../../nas_vol15", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
213256: 2.1795:        kread(10, " i x\0\0\0\0\0\0\010".., 64) (sleeping...)
221464: 2.3718:        statx("./../../../../../nfstest", 0x0FFFFFFFFFFF5AC0, 176, 021) (sleeping...)

This matches Doc ID 1316251.1 and confirms that version 10.2.0.5 does hit this problem when NFS fails.
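In a long truss log the blocked call is easy to miss among hundreds of successful ones. As a small sketch (the function name blocked_statx_paths is made up for illustration, and it assumes the truss line format shown in the 10g log), the following filter reads a truss log on standard input and prints the path of every statx call that truss marks as sleeping:

```shell
# Print the path argument of every statx call reported as "(sleeping...)".
# Other sleeping calls (for example kread) and completed statx calls are ignored.
blocked_statx_paths() {
    sed -n 's/.*statx("\([^"]*\)".*(sleeping\.\.\.).*/\1/p'
}
```

It could be run against a saved log, for example blocked_statx_paths < sqlplus_truss.log, where sqlplus_truss.log is a hypothetical file name for captured truss output.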

Test and Analysis on 11g

# df -g
Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4         150.00     40.09   74%   158520     2% /
/dev/hd2          20.00     13.97   31%    88107     3% /usr
/dev/hd9var       15.00     14.86    1%     1644     1% /var
/dev/hd3          15.00     12.10   20%    11836     1% /tmp
/dev/hd1          20.00      3.52   83%     8905     2% /home
/dev/hd11admin      5.00      5.00    1%        5     1% /admin
/proc                 -         -    -        -      - /proc
/dev/hd10opt      20.00     14.65   27%    48250     2% /opt
/dev/livedump      1.00      1.00    1%        4     1% /var/adm/ras/livedump
/ahafs                -         -    -       36     1% /aha
/dev/fslv01     6000.00   2541.76   58%    28918     1% /backup
/dev/loop1         6.38      0.00  100%  3346878   100% /cdrom
*.*.*.106:/oraclebackup   30714.86   3370.36   90%   564081     8% /oraclebackup
*.*.*.10:/*bak   40960.00  37487.03    9% 93162521     2% /orabak/*bak
NFS server *.*.*.148 not responding still trying

# The df command hangs
# sqlplus command
$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Thu Apr 13 14:21:55 2023

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

SQL> 

# sqlplus logs in to the database successfully

truss log:

23462304: 76874021: 0.0788:        statx(".", 0x0FFFFFFFFFFFD780, 176, 010) = 0
23462304: 76874021: 0.0805:        statx("/", 0x0FFFFFFFFFFFD990, 176, 020) = 0
23462304: 76874021: 0.0807:        statx("./", 0x0FFFFFFFFFFFD990, 176, 020) = 0
23462304: 76874021: 0.0809:        statx("./../", 0x0FFFFFFFFFFFD780, 176, 010) = 0
23462304: 76874021: 0.0824:        fstatx(6, 0x0FFFFFFFFFFFD990, 176, 020) = 0
23462304: 76874021: 0.0831:        statx("./../../", 0x0FFFFFFFFFFFD780, 176, 010) = 0
23462304: 76874021: 0.0846:        fstatx(6, 0x0FFFFFFFFFFFD990, 176, 020) = 0
23462304: 76874021: 0.0870:        statx("/u01/app/oracle", 0x0FFFFFFFFFFFB540, 176, 0) = 0
23462304: 76874021: 0.0873:        statx("/u01/app/oracle/diag", 0x0FFFFFFFFFFFB540, 176, 0) = 0
23462304: 76874021: 0.0875:        statx("/u01/app/oracle/diag/clients", 0x0FFFFFFFFFFFB540, 176, 0) = 0
23462304: 76874021: 0.0895:        statx("/usr/lib/security/methods.cfg", 0x0FFFFFFFFFFFB970, 176, 0) = 0
23462304: 76874021: 0.1021:        statx("/etc/passwd", 0x0FFFFFFFFFFFB380, 176, 0) = 0
23462304: 76874021: 0.1179:        statx("/etc/secvars.cfg", 0x0FFFFFFFFFFFAAB0, 176, 0) = 0
23462304: 76874021: 0.1545:        statx("/usr/share/lib/zoneinfo/Asia/Shanghai", 0x0FFFFFFFFFFFB660, 176, 0) = 0
23462304: 76874021: 0.1568:        statx("/u01/app/oracle", 0x0FFFFFFFFFFFBB90, 176, 0) = 0
23462304: 76874021: 0.1643:        statx("/u01/app/oracle", 0x0FFFFFFFFFFFB910, 176, 0) = 0
23462304: 76874021: 0.1646:        statx("/u01/app/oracle/diag", 0x0FFFFFFFFFFFB910, 176, 0) = 0
23462304: 76874021: 0.1648:        statx("/u01/app/oracle/diag/clients", 0x0FFFFFFFFFFFB910, 176, 0) = 0
23462304: 76874021: 0.1939:        statx("/u01/app/oracle/product/11.2/db_1/oracore/zoneinfo", 0x0FFFFFFFFFFFE820, 176, 0) = 0
23462304: 76874021: 0.2323:        fstatx(7, 0x0FFFFFFFFFFFEA20, 176, 010) = 0
23462304: 76874021: 0.2443:        statx(".", 0x0FFFFFFFFFFF7540, 176, 010) = 0
23462304: 76874021: 0.2460:        statx("/", 0x0FFFFFFFFFFF7750, 176, 020) = 0
23462304: 76874021: 0.2462:        statx("./", 0x0FFFFFFFFFFF7750, 176, 020) = 0
23462304: 76874021: 0.2465:        statx("./../", 0x0FFFFFFFFFFF7540, 176, 010) = 0
23462304: 76874021: 0.2479:        fstatx(8, 0x0FFFFFFFFFFF7750, 176, 020) = 0
23462304: 76874021: 0.2486:        statx("./../../", 0x0FFFFFFFFFFF7540, 176, 010) = 0
23462304: 76874021: 0.2500:        fstatx(8, 0x0FFFFFFFFFFF7750, 176, 020) = 0
23462304: 76874021: 0.3244:        statx(".", 0x0FFFFFFFFFFFD210, 176, 010) = 0
23462304: 76874021: 0.3261:        statx("/", 0x0FFFFFFFFFFFD420, 176, 020) = 0
23462304: 76874021: 0.3263:        statx("./", 0x0FFFFFFFFFFFD420, 176, 020) = 0
23462304: 76874021: 0.3266:        statx("./../", 0x0FFFFFFFFFFFD210, 176, 010) = 0
23462304: 76874021: 0.3280:        fstatx(8, 0x0FFFFFFFFFFFD420, 176, 020) = 0
23462304: 76874021: 0.3287:        statx("./../../", 0x0FFFFFFFFFFFD210, 176, 010) = 0
23462304: 76874021: 0.3302:        fstatx(8, 0x0FFFFFFFFFFFD420, 176, 020) = 0
23462304: 76874021: 0.3376:        statx("/dev/pts/1", 0x0FFFFFFFFFFF8748, 176, 0) = 0
23462304: 76874021: 0.3466:        statx("/dev/netcd", 0x0FFFFFFFFFFF66C8, 176, 010) Err#2  ENOENT
23462304: 76874021: 0.3885:        statx("/u01/app/oracle/product/11.2/db_1/sqlplus/admin/glogin.sql", 0x0FFFFFFFFFFFDAF0, 176, 010) = 0
23462304: 76874021: 0.3910:        fstatx(8, 0x0FFFFFFFFFFFD6D0, 176, 010) = 0

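The listing above contains all statx calls extracted from the 11g truss log. statx no longer scans every path under the root directory, so on 11g this problem has most likely been optimized away.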
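One way to make the 10g versus 11g comparison less dependent on reading the logs by eye is to count the statx calls that target a named entry reached through a chain of ../ components, which is the pattern of the root directory scan in the 10g log. This is only a sketch under that assumption: count_root_entry_scans is a hypothetical helper name, and the regular expression is tuned to the truss output shown in this article.

```shell
# Count statx calls whose path is "./" followed by one or more "../" and then
# an entry name, e.g. statx("./../../../../../home", ...). Reads stdin.
count_root_entry_scans() {
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that status
    grep -Ec 'statx\("\./(\.\./)+[^/"]+"' || true
}
```

On the 11g excerpt this yields 0, because paths such as "./../" name the parent directories themselves rather than entries under them, while the 10g log yields one count per scanned entry.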

Testing Whether a Non-Root Mount Point Helps on 10g

Test on 10.2.0.5 whether this workaround is effective:

Mount NFS on a directory that is not directly under root:

# umount /nfstest
# mkdir -p /test/nfstest 
# mount *.*.*.148:/nfstest /test/nfstest

Stop the NFS service:

# systemctl stop nfs

Check the sqlplus command:

$sqlplus / as sysdba

SQL*Plus: Release 10.2.0.5.0 - Production on Thu Apr 13 14:52:35 2023

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>

# The sqlplus command is unaffected

truss log:

398036: 0.3735:        statx("./../../../../../image.data", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
398036: 0.3738:        statx("./../../../../../installp.log", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
398036: 0.3740:        statx("./../../../../../lib", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
398036: 0.3743:        statx("./../../../../../lost+found", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
398036: 0.3745:        statx("./../../../../../lpp", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
398036: 0.3748:        statx("./../../../../../mnt", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
398036: 0.3750:        statx("./../../../../../nas_142", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
398036: 0.3753:        statx("./../../../../../nas_vol15", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
398036: 0.3756:        statx("./../../../../../nfstest", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
398036: 0.3759:        statx("./../../../../../opt", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
398036: 0.3761:        statx("./../../../../../orabackup", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
398036: 0.3764:        statx("./../../../../../oradatalv", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
398036: 0.3767:        statx("./../../../../../orapworcl1", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
398036: 0.3770:        statx("./../../../../../proc", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
398036: 0.3772:        statx("./../../../../../root", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
398036: 0.3775:        statx("./../../../../../sbin", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
398036: 0.3778:        statx("./../../../../../slk", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
398036: 0.3780:        statx("./../../../../../smit.log", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
398036: 0.3783:        statx("./../../../../../smit.script", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
398036: 0.3786:        statx("./../../../../../smit.transaction", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
398036: 0.3788:        statx("./../../../../../test", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
398036: 0.3791:        statx("./../../../../../tftpboot", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
398036: 0.3793:        statx("./../../../../../tivoli", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
398036: 0.3796:        statx("./../../../../../tmp", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
398036: 0.3798:        statx("./../../../../../u", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0
398036: 0.3801:        statx("./../../../../../u01", 0x0FFFFFFFFFFF5AC0, 176, 021) = 0

The scan passes over the parent directory of the NFS mount point (/test) normally, with no errors.

Solution

Handling the Problem

Based on the cause described in the MOS notes, there are two ways to handle the problem when it occurs:

1. Repair the failed NFS.
2. Reboot the operating system and do not mount that directory again.

Prevention

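Because the problem arises when Oracle scans the root directory during system calls, it can be avoided by moving the NFS mount point into a second-level directory: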

# unmount /faraway_files
# mkdir /my_mounts
# mv /faraway_files /my_mounts
# mount remhost01:/documents /my_mounts/faraway_files

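The tests also suggest that this problem has most likely been fixed in 11g (the 11.2.0.4 test database carries a patch from October 2020). However, the MOS note lists 10.2.0.1 and later as affected, and this test covered only login and basic queries. To be safe, for Oracle environments deployed on AIX it is still recommended to place NFS mount points in a second-level directory rather than directly under root.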
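To find existing mounts that violate this recommendation, the df output can be filtered for NFS file systems whose mount point sits directly under /. The sketch below is an illustration rather than a vetted audit script: root_level_nfs_mounts is a hypothetical name, and it assumes that NFS entries show a host:/path value in the first column and the mount point in the last column, as in the df -g listings in this article.

```shell
# Read df output on stdin and print NFS mount points located directly
# under / (for example /nfstest, but not /orabak/x and not / itself).
root_level_nfs_mounts() {
    awk '$1 ~ /:/ {
        # "/nfstest" splits into ("", "nfstest"); deeper paths give more parts
        n = split($NF, parts, "/")
        if (n == 2 && parts[2] != "") print $NF
    }'
}
```

A typical call would be df -g | root_level_nfs_mounts. It should be run while all NFS servers are reachable, since df itself hangs when a server is down, as seen during the reproduction.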

References

When NFS Server Is Down, Oracle Server Freezes With No Errors In Alert Log File (Doc ID 1316251.1)
Disconnected NFS Mount Point Causes Instance to Hang on AIX (Doc ID 1445600.1)
Last modified: 2023-06-28 09:54:46