问题描述
客户安装了一个Oracle 11.2.0.2 for Linux X64环境,使用了ASM方式,重启后发现数据库没有启动。
检查操作系统进程,发现ASM已经启动,CLUSTER环境也启动完成,但是数据库并未启动,尝试手工启动数据库,报错找不到对应的SPFILE。
登录ASM实例,查询发现找不到任何的ASM磁盘组信息:
专家解答
登录ASM实例,查询发现找不到任何的ASM磁盘组信息:
[grid@rptdb trace]$ sqlplus / AS sysasm SQL*Plus: Release 11.2.0.2.0 Production ON Tue DEC 20 18:41:41 2011 Copyright (c) 1982, 2010, Oracle. ALL rights reserved. Connected TO: Oracle DATABASE 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production WITH the Automatic Storage Management OPTION SQL> SELECT * FROM v$asm_diskgroup; no ROWS selected
查询V$ASM_DISK可以看到磁盘信息,但是磁盘头的头状态是UNKNOWN:
SQL> SELECT GROUP_NUMBER, DISK_NUMBER, MOUNT_STATUS, HEADER_STATUS, PATH FROM V$ASM_DISK; GROUP_NUMBER DISK_NUMBER MOUNT_S HEADER_STATU PATH ------------ ----------- ------- ------------ ---------------------------------- 0 1 CLOSED UNKNOWN ORCL:FRAVOL 0 0 CLOSED UNKNOWN ORCL:DATAVOL
显然Oracle并没有去加载磁盘组,利用当前ASM的SPFILE生成PFILE后发现,只有最基本的参数:
*.asm_power_limit=1 *.diagnostic_dest='/u01/app/grid' *.instance_type='asm' *.large_pool_size=12M *.remote_login_passwordfile='EXCLUSIVE'
连ASM_DISKGROUPS参数都没有,难怪查询V$ASM_DISKGROUP找不到任何数据,添加下面的参数:
asm_diskgroups='DATA','FRA' asm_diskstring='ORCL:*VOL'
重启ASM实例,在加载磁盘组时报错:
[grid@rptdb ~]$ sqlplus / AS sysasm SQL*Plus: Release 11.2.0.2.0 Production ON Tue DEC 20 18:25:03 2011 Copyright (c) 1982, 2010, Oracle. ALL rights reserved. Connected TO: Oracle DATABASE 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production WITH the Automatic Storage Management OPTION SQL> shutdown immediate ORA-15100: invalid OR missing diskgroup name ASM instance shutdown SQL> startup pfile=/home/grid/init+ASM.ora ASM instance started Total System Global Area 283930624 bytes Fixed SIZE 2225792 bytes Variable SIZE 256539008 bytes ASM Cache 25165824 bytes ORA-15032: NOT ALL alterations performed ORA-15017: diskgroup "FRA" cannot be mounted ORA-15063: ASM discovered an insufficient NUMBER OF disks FOR diskgroup "FRA" ORA-15017: diskgroup "DATA" cannot be mounted ORA-15063: ASM discovered an insufficient NUMBER OF disks FOR diskgroup "DATA"
导致这个错误的原因有很多,比如权限,多路径设置,存储设置等等,检查告警日志寻找进一步的信息:
Tue DEC 20 18:25:47 2011 * instance_number obtained FROM CSS = 1, checking FOR the existence OF node 0... * node 0 does NOT exist. instance_number = 1 Starting ORACLE instance (normal) ****************** Huge Pages Information ***************** Huge Pages memory pool detected (total: 42000 free: 42000) DFLT Huge Pages allocation successful (allocated: 0) *********************************************************** LICENSE_MAX_SESSION = 0 LICENSE_SESSIONS_WARNING = 0 Picked latch-free SCN scheme 3 USING LOG_ARCHIVE_DEST_1 parameter DEFAULT VALUE AS /u01/app/grid/product/11.2.0/grid/dbs/arch Autotune OF undo retention IS turned ON. IMODE=BR ILAT =0 LICENSE_MAX_USERS = 0 SYS auditing IS disabled Starting up: Oracle DATABASE 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production WITH the Automatic Storage Management OPTION. USING parameter settings IN client-side pfile /home/grid/init+ASM.ora ON machine rptdb System parameters WITH non-DEFAULT VALUES: large_pool_size = 12M instance_type = "asm" remote_login_passwordfile= "EXCLUSIVE" asm_diskstring = "ORCL:*VOL" asm_diskgroups = "DATA" asm_diskgroups = "FRA" asm_power_limit = 1 diagnostic_dest = "/u01/app/grid" Tue DEC 20 18:25:48 2011 PMON started WITH pid=2, OS id=14718 Tue DEC 20 18:25:48 2011 PSP0 started WITH pid=3, OS id=14722 Tue DEC 20 18:25:49 2011 VKTM started WITH pid=4, OS id=14726 at elevated priority VKTM running at (1)millisec PRECISION WITH DBRM quantum (100)ms Tue DEC 20 18:25:49 2011 GEN0 started WITH pid=5, OS id=14732 Tue DEC 20 18:25:49 2011 DIAG started WITH pid=6, OS id=14736 Tue DEC 20 18:25:49 2011 DIA0 started WITH pid=7, OS id=14740 Tue DEC 20 18:25:49 2011 MMAN started WITH pid=8, OS id=14744 Tue DEC 20 18:25:49 2011 DBW0 started WITH pid=9, OS id=14748 Tue DEC 20 18:25:49 2011 LGWR started WITH pid=10, OS id=14752 Tue DEC 20 18:25:49 2011 CKPT started WITH pid=11, OS id=14756 Tue DEC 20 18:25:49 2011 SMON started WITH pid=12, OS id=14760 Tue DEC 20 18:25:49 2011 RBAL started WITH pid=13, OS id=14764 Tue DEC 20 18:25:49 2011 GMON started WITH pid=14, OS id=14768 Tue DEC 20 18:25:49 2011 MMON started WITH pid=15, OS id=14772 Tue DEC 20 18:25:49 2011 MMNL started WITH pid=16, OS id=14776 ORACLE_BASE NOT SET IN environment. It IS recommended that ORACLE_BASE be SET IN the environment Tue DEC 20 18:25:49 2011 SQL> ALTER DISKGROUP ALL MOUNT NOTE: Diskgroups listed IN ASM_DISKGROUPS are DATA FRA NOTE: cache registered GROUP DATA NUMBER=1 incarn=0x157c40a1 NOTE: cache began mount (FIRST) OF GROUP DATA NUMBER=1 incarn=0x157c40a1 NOTE: cache registered GROUP FRA NUMBER=2 incarn=0x158c40a2 NOTE: cache began mount (FIRST) OF GROUP FRA NUMBER=2 incarn=0x158c40a2 NOTE: Loaded library: /opt/oracle/extapi/64/asm/orcl/1/libasm.so ORA-15186: ASMLIB error FUNCTION = [asm_open(global)], error = [1], mesg = [Operation NOT permitted] ORA-15025: could NOT OPEN disk "ORCL:DATAVOL" ORA-15186: ASMLIB error FUNCTION = [asm_open(global)], error = [1], mesg = [Operation NOT permitted] ORA-15025: could NOT OPEN disk "ORCL:FRAVOL" ERROR: no PST quorum IN GROUP: required 2, found 0 NOTE: cache dismounting (clean) GROUP 1/0x157C40A1 (DATA) NOTE: dbwr NOT being msg'd to dismount NOTE: lgwr not being msg'd TO dismount NOTE: cache dismounted GROUP 1/0x157C40A1 (DATA) NOTE: cache ending mount (fail) OF GROUP DATA NUMBER=1 incarn=0x157c40a1 NOTE: cache deleting context FOR GROUP DATA 1/0x157c40a1 GMON dismounting GROUP 1 at 2 FOR pid 17, osid 14779 ERROR: diskgroup DATA was NOT mounted ERROR: no PST quorum IN GROUP: required 2, found 0 NOTE: cache dismounting (clean) GROUP 2/0x158C40A2 (FRA) NOTE: dbwr NOT being msg'd to dismount NOTE: lgwr not being msg'd TO dismount NOTE: cache dismounted GROUP 2/0x158C40A2 (FRA) NOTE: cache ending mount (fail) OF GROUP FRA NUMBER=2 incarn=0x158c40a2 NOTE: cache deleting context FOR GROUP FRA 2/0x158c40a2 GMON dismounting GROUP 2 at 4 FOR pid 17, osid 14779 ERROR: diskgroup FRA was NOT mounted ORA-15032: NOT ALL alterations performed ORA-15017: diskgroup "FRA" cannot be mounted ORA-15063: ASM discovered an insufficient NUMBER OF disks FOR diskgroup "FRA" ORA-15017: diskgroup "DATA" cannot be mounted ORA-15063: ASM discovered an insufficient NUMBER OF disks FOR diskgroup "DATA" ERROR: ALTER DISKGROUP ALL MOUNT
这里发现了ORA-15186和ORA-15025错误。而ORA-15186错误发生在ASMLIB调用上,根据这个错误更容易定位到问题的原因。
在MOS上,有专门的文章描述这个问题:Mount ASM Disk Group Fails : ORA-15186, ORA-15025, ORA-15063 [ID 1384504.1],根据问题描述,导致这个错误的原因是多路径的配置存在问题:
[grid@rptdb ~]$ /etc/init.d/oracleasm listdisks DATAVOL FRAVOL [grid@rptdb ~]$ cat /proc/partitions major minor #blocks name 8 0 285155328 sda 8 1 104391 sda1 8 2 285049327 sda2 8 16 1073741824 sdb 8 17 322119283 sdb1 8 18 429497775 sdb2 8 19 322119315 sdb3 8 32 1073741824 sdc 8 33 322119283 sdc1 8 34 429497775 sdc2 8 35 322119315 sdc3 8 48 1073741824 sdd 8 49 322119283 sdd1 8 50 429497775 sdd2 8 51 322119315 sdd3 8 64 1073741824 sde 8 65 322119283 sde1 8 66 429497775 sde2 8 67 322119315 sde3 253 0 150896640 dm-0 253 1 134119424 dm-1 253 2 1073741824 dm-2 253 3 322119283 dm-3 253 4 429497775 dm-4 253 5 322119315 dm-5 [grid@rptdb ~]$ ls -l /dev/oracleasm/disks total 0 brw-rw---- 1 grid oinstall 8, 18 Dec 20 17:42 DATAVOL brw-rw---- 1 grid oinstall 8, 19 Dec 20 17:42 FRAVOL
可以看到,DATAVOL和FRAVOL没有对应到多路径设备dm-n上,而是对应到了sdb2和sdb3上。
修改/etc/sysconfig/oracleasm文件,将ORACLEASM_SCANORDER参数和ORACLEASM_SCANEXCLUDE修改如下:
ORACLEASM_SCANORDER="mpath dm" ORACLEASM_SCANEXCLUDE="sd"
修改后重启服务器,由于ASM中的SPFILE还是存在问题的,所以先关闭,然后重新启动:
[grid@rptdb ~]$ sqlplus / AS sysasm SQL*Plus: Release 11.2.0.2.0 Production ON Tue DEC 20 19:05:33 2011 Copyright (c) 1982, 2010, Oracle. ALL rights reserved. Connected. SQL> shutdown abort ASM instance shutdown SQL> startup pfile=/home/grid/init+ASM.ora ASM instance started Total System Global Area 283930624 bytes Fixed SIZE 2225792 bytes Variable SIZE 256539008 bytes ASM Cache 25165824 bytes ASM diskgroups mounted SQL> CREATE spfile='+DATA' FROM pfile='/home/grid/init+ASM.ora'; File created.
磁盘组已经顺利启动,创建一个SPFILE文件,确保下次ASM自动启动可以加载磁盘组。切换到Oracle用户打开数据库:
[root@rptdb ~]# su - oracle [oracle@rptdb ~]$ sqlplus / AS sysdba SQL*Plus: Release 11.2.0.2.0 Production ON Tue DEC 20 19:06:27 2011 Copyright (c) 1982, 2010, Oracle. ALL rights reserved. Connected TO an idle instance. SQL> startup ORACLE instance started. Total System Global Area 6.4939E+10 bytes Fixed SIZE 2242560 bytes Variable SIZE 3.0467E+10 bytes DATABASE Buffers 3.4360E+10 bytes Redo Buffers 109195264 bytes DATABASE mounted. DATABASE opened. SQL> exit Disconnected FROM Oracle DATABASE 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production WITH the Partitioning, Automatic Storage Management, OLAP, DATA Mining AND REAL Application Testing options
数据库顺利打开,最后检查一下ASM磁盘的设置:
[oracle@rptdb ~]$ ls -l /dev/oracleasm/disks total 0 brw-rw---- 1 oracle oinstall 253, 4 Dec 20 19:00 DATAVOL brw-rw---- 1 oracle oinstall 253, 5 Dec 20 19:00 FRAVOL [oracle@rptdb ~]$ cat /proc/partitions major minor #blocks name 8 0 285155328 sda 8 1 104391 sda1 8 2 285049327 sda2 8 16 1073741824 sdb 8 17 322119283 sdb1 8 18 429497775 sdb2 8 19 322119315 sdb3 8 32 1073741824 sdc 8 33 322119283 sdc1 8 34 429497775 sdc2 8 35 322119315 sdc3 8 48 1073741824 sdd 8 49 322119283 sdd1 8 50 429497775 sdd2 8 51 322119315 sdd3 8 64 1073741824 sde 8 65 322119283 sde1 8 66 429497775 sde2 8 67 322119315 sde3 253 0 150896640 dm-0 253 1 134119424 dm-1 253 2 1073741824 dm-2 253 3 322119283 dm-3 253 4 429497775 dm-4 253 5 322119315 dm-5
调整oracleasm的配置后,多路径的配置恢复正常。