暂无图片
暂无图片
4
暂无图片
暂无图片
暂无图片

Oracle11G集群磁盘异常掉落

273

Part1问题报错

邮件收到某套Oracle11GR2的数据库集群报警

异常,ARCH使用率为24.36%; 
DATA使用率为100.0%; 
OCR使用率为1.3%; 
REDO1使用率为38.27%; 
REDO使用率为38.27%;

Part2问题定位

2.1检查磁盘使用率

select group_number,name,total_mb/1024,round(free_mb/1024,2),round((total_mb - free_mb)/total_mb,2)*100 || '%'  from v$asm_diskgroup;

GROUP_NUMBER      NAME     TOTAL_MB     free_mb     ROUND((TOTAL_MB-FREE_MB)/TOTAL_MB,2)*100|
------------   ---------  ------------  ---------   --------------------------------
       1          ARCH       1500        1134.64              4%
       2          DATA       1000           0                 100%
       3          OCR          30          29.61              %
       4          REDO1        50          30.86              8%
       5          REDO         50          30.86              8%

2.2检查表空间使用率

SQL> set pages 999 lines 999
SQL> select a.tablespace_name,
       round(a.bytes / 1024 / 1024 /  1024, 0) "sum G",
       round((a.bytes - b.bytes) / 1024 /  1024 / 1024, 0) "used G",
       round(b.bytes / 1024 / 1024 /  1024, 0) "free G",
       round(((a.bytes - b.bytes) /  a.bytes) *   2    3    4    5  100, 2) "used%"
  from (select tablespace_name,  sum(bytes) bytes
          from dba_data_files
         group by tablespace_name) a,
       (select tablespace_name,  sum(bytes) bytes, max(bytes) largest
          from dba_free_space
         group by tablespace_name) b
where a.tablespace_name =  b.tablespace_name
order by ((a.bytes - b.bytes) / a.bytes)  desc;  6    7    8    9   10   11   12   13  

TABLESPACE_NAME             sum G     used G     free G      used%
------------------------------ ---------- ---------- ---------- ----------
SYSAUX                       63      62          1      98.14
USERS                        63      52         11      81.83
OSPBASE                     837     659        178      78.78
SYSTEM                        5       1          4      29.96
UNDOTBS1                     31       1         30       1.98
UNDOTBS2                     32       0         32        .64

发现表空间和磁盘组的使用数量比对不上,

想的是要么磁盘组的状态是 normal,要么是开发自己自己扩容的操作

2.3检查磁盘问题

select a.path,a.name,a.mode_status,b.name diskgroupname,b.type from v$asm_disk a,v$asm_diskgroup b where a.group_number=b.group_number order by NAME;

PATH                          NAME                 MODE_ST DISKGROUPNAME            TYPE
--------------------------------------------- ------------------------------ ------- ------------------------------ ------
/dev/asm-arch1                      ARCH_0000              ONLINE  ARCH                EXTERN
/dev/asm-arch2                      ARCH_0001              ONLINE  ARCH                EXTERN
/dev/asm-arch3                      ARCH_0002              ONLINE  ARCH                EXTERN
/dev/asm-data5                      DATA_0000              ONLINE  DATA                NORMAL
/dev/asm-data13                   DATA_0004              ONLINE  DATA                NORMAL
/dev/asm-ocr1                      OCR_0000                 ONLINE  OCR                EXTERN
/dev/asm-ocr2                      OCR_0001                 ONLINE  OCR                EXTERN
/dev/asm-ocr3                      OCR_0002                 ONLINE  OCR                EXTERN
/dev/asm-redo2                      REDO1_0000             ONLINE  REDO1                EXTERN
/dev/asm-redo1                      REDO_0000              ONLINE  REDO                EXTERN
                                _DROPPED_0001_DATA         OFFLINE DATA                NORMAL

其中有状态异常的 _DROPPED_0001_DATA 磁盘

2.4磁盘对应

[grid@qhxsdbrac2 ~]$ ls -lrt /dev/asm-data*
brw-rw---- 1 grid asmadmin  8, 240 Nov  3 09:22 /dev/asm-data9
brw-rw---- 1 grid asmadmin 65,  48 Nov  3 09:22 /dev/asm-data6
brw-rw---- 1 grid asmadmin 65, 112 Nov  3 09:22 /dev/asm-data7
brw-rw---- 1 grid asmadmin 65, 128 Nov  3 09:22 /dev/asm-data8
brw-rw---- 1 grid asmadmin 65, 192 Nov  3 14:23 /dev/asm-data10
brw-rw---- 1 grid asmadmin 65, 224 Nov  3 14:23 /dev/asm-data12
brw-rw---- 1 grid asmadmin 66,   0 Nov  3 14:34 /dev/asm-data11
brw-rw---- 1 grid asmadmin 66,  64 Nov  3 14:34 /dev/asm-data3
brw-rw---- 1 grid asmadmin 65,  16 Nov  3 14:34 /dev/asm-data4
brw-rw---- 1 grid asmadmin 66,  48 Nov  3 14:34 /dev/asm-data2
brw-rw---- 1 grid asmadmin 65,  80 Nov  3 14:34 /dev/asm-data5
brw-rw---- 1 grid asmadmin 66,  16 Nov  3 14:34 /dev/asm-data13
brw-rw---- 1 grid asmadmin 66,  32 Nov  3 14:34 /dev/asm-data1

2.5检查集群磁盘信息

[grid@ ~]$  kfod disks=all status=TRUE dscvgroup=TRUE
--------------------------------------------------------------------------------
 Disk          Size Header    Path                                     Disk Group   User     Group   
================================================================================
   1:     512000 Mb MEMBER    /dev/asm-arch1                           ARCH         grid     asmadmin
   2:     512000 Mb MEMBER    /dev/asm-arch2                           ARCH         grid     asmadmin
   3:     512000 Mb MEMBER    /dev/asm-arch3                           ARCH         grid     asmadmin
   4:    1024000 Mb MEMBER    /dev/asm-data1                           DATA         grid     asmadmin
   5:    1024000 Mb MEMBER    /dev/asm-data10                          DATA         grid     asmadmin
   6:    1024000 Mb MEMBER    /dev/asm-data11                          DATA         grid     asmadmin
   7:    1024000 Mb MEMBER    /dev/asm-data12                          DATA         grid     asmadmin
   8:    1024000 Mb MEMBER    /dev/asm-data13                          DATA         grid     asmadmin
   9:    1024000 Mb MEMBER    /dev/asm-data2                           DATA         grid     asmadmin
  10:    1024000 Mb MEMBER    /dev/asm-data3                           DATA         grid     asmadmin
  11:    1024000 Mb MEMBER    /dev/asm-data4                           DATA         grid     asmadmin
  12:    1024000 Mb MEMBER    /dev/asm-data5                           DATA         grid     asmadmin
  13:    1024000 Mb MEMBER    /dev/asm-data6                           DATA         grid     asmadmin
  14:    1024000 Mb MEMBER    /dev/asm-data7                           DATA         grid     asmadmin
  15:    1024000 Mb MEMBER    /dev/asm-data8                           DATA         grid     asmadmin
  16:    1024000 Mb MEMBER    /dev/asm-data9                           DATA         grid     asmadmin
  17:      10240 Mb MEMBER    /dev/asm-ocr1                            OCR          grid     asmadmin
  18:      10240 Mb MEMBER    /dev/asm-ocr2                            OCR          grid     asmadmin
  19:      10240 Mb MEMBER    /dev/asm-ocr3                            OCR          grid     asmadmin
  20:      51200 Mb MEMBER    /dev/asm-redo1                           REDO         grid     asmadmin
  21:      51200 Mb MEMBER    /dev/asm-redo2                           REDO1        grid     asmadmin
--------------------------------------------------------------------------------
ORACLE_SID ORACLE_HOME                                                          
================================================================================
     +ASM2 /home/app/11.2.0/grid                                                

通过kfed查询数据库磁盘对应信息

[grid@ ~]$ kfed read /dev/asm-data1   | grep 'kfdhdb.dskname'
kfdhdb.dskname:               DATA_0002 ; 0x028: length=9

[grid@ ~]$ kfed read /dev/asm-data2   | grep 'kfdhdb.dskname'
kfdhdb.dskname:               DATA_0001 ; 0x028: length=9

[grid@ ~]$ kfed read /dev/asm-data3   | grep 'kfdhdb.dskname'
kfdhdb.dskname:               DATA_0007 ; 0x028: length=9

[grid@ ~]$ kfed read /dev/asm-data4   | grep 'kfdhdb.dskname'
kfdhdb.dskname:               DATA_0008 ; 0x028: length=9

[grid@ ~]$ kfed read /dev/asm-data5   | grep 'kfdhdb.dskname'
kfdhdb.dskname:               DATA_0000 ; 0x028: length=9

[grid@ ~]$ kfed read /dev/asm-data6   | grep 'kfdhdb.dskname'
kfdhdb.dskname:               DATA_0009 ; 0x028: length=9

[grid@ ~]$ kfed read /dev/asm-data7   | grep 'kfdhdb.dskname'
kfdhdb.dskname:               DATA_0010 ; 0x028: length=9

[grid@ ~]$ kfed read /dev/asm-data8   | grep 'kfdhdb.dskname'
kfdhdb.dskname:               DATA_0011 ; 0x028: length=9

[grid@ ~]$ kfed read /dev/asm-data9   | grep 'kfdhdb.dskname'
kfdhdb.dskname:               DATA_0012 ; 0x028: length=9

[grid@ ~]$ kfed read /dev/asm-data10  | grep 'kfdhdb.dskname'
kfdhdb.dskname:               DATA_0003 ; 0x028: length=9

[grid@ ~]$ kfed read /dev/asm-data11  | grep 'kfdhdb.dskname'
kfdhdb.dskname:               DATA_0005 ; 0x028: length=9

[grid@ ~]$ kfed read /dev/asm-data12  | grep 'kfdhdb.dskname'
kfdhdb.dskname:               DATA_0006 ; 0x028: length=9

[grid@ ~]$ kfed read /dev/asm-data13  | grep 'kfdhdb.dskname'
kfdhdb.dskname:               DATA_0004 ; 0x028: length=9

Part3问题解决

3.1添加磁盘信息报错

SQL> alter diskgroup DATA add disk '/dev/asm-data2';
alter diskgroup DATA add disk '/dev/asm-data2'

*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15033: disk '/dev/asm-data2' belongs to diskgroup "DATA"

3.2强制添加

alter diskgroup DATA add disk '/dev/asm-data11' force;

3.3设置磁盘平衡力度

SQL> select * from v$asm_operation;

GROUP_NUMBER OPERA STAT      POWER     ACTUAL       SOFAR   EST_WORK   EST_RATE EST_MINUTES ERROR_CODE
------------ ----- ---- ---------- ---------- ---------- ---------- ---------- ----------- --------------------------------------------
       2      REBAL RUN        1       1       14454    1340163     16564        80


SQL> alter diskgroup DATA rebalance power 10;


SQL> select * from v$asm_operation;

GROUP_NUMBER OPERA STAT      POWER     ACTUAL       SOFAR   EST_WORK   EST_RATE EST_MINUTES ERROR_CODE
------------ ----- ---- ---------- ---------- ---------- ---------- ---------- ----------- --------------------------------------------
       2      REBAL RUN        10       10       14454    1340163     16564        57

3.4坐等错误信息消失

select a.path,a.name,a.mode_status,b.name diskgroupname,b.type from v$asm_disk a,v$asm_diskgroup b where a.group_number=b.group_number order by NAME;

PATH                          NAME                 MODE_ST DISKGROUPNAME            TYPE
--------------------------------------------- ------------------------------ ------- ------------------------------ ------
/dev/asm-arch1                      ARCH_0000              ONLINE  ARCH                EXTERN
/dev/asm-arch2                      ARCH_0001              ONLINE  ARCH                EXTERN
/dev/asm-arch3                      ARCH_0002              ONLINE  ARCH                EXTERN
/dev/asm-data5                      DATA_0000              ONLINE  DATA                NORMAL
/dev/asm-data13                   DATA_0004              ONLINE  DATA                NORMAL
/dev/asm-ocr1                      OCR_0000                 ONLINE  OCR                EXTERN
/dev/asm-ocr2                      OCR_0001                 ONLINE  OCR                EXTERN
/dev/asm-ocr3                      OCR_0002                 ONLINE  OCR                EXTERN
/dev/asm-redo2                      REDO1_0000             ONLINE  REDO1                EXTERN
/dev/asm-redo1                      REDO_0000              ONLINE  REDO                EXTERN
                                _DROPPED_0001_DATA         OFFLINE DATA                NORMAL

添加完了_DROPPED_0001_DATA就不在了

3.5查询剩余磁盘信息


SQL> select OS_MB/1024,TOTAL_MB/1024,FREE_MB/1024,name,path from  v$asm_disk order by name;

OS_MB/1024 TOTAL_MB/1024 FREE_MB/1024 NAME                 PATH
---------- ------------- ------------ ------------------------------ ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
       500         500   378.212891 ARCH_0000              /dev/asm-arch1
       500         500    378.21582 ARCH_0001              /dev/asm-arch2
       500         500   378.213867 ARCH_0002              /dev/asm-arch3
      1000        1000   .499023438 DATA_0000              /dev/asm-data5
      1000        1000   998.854492 DATA_0002              /dev/asm-data1
      1000        1000   .499023438 DATA_0004              /dev/asm-data13
    10          10   9.84765625 OCR_0000                 /dev/asm-ocr1
    10          10   9.88085938 OCR_0001                 /dev/asm-ocr2
    10          10   9.88085938 OCR_0002                 /dev/asm-ocr3
    50          50   30.8632813 REDO1_0000             /dev/asm-redo2
    50          50   30.8632813 REDO_0000              /dev/asm-redo1
     0        1000   876.958984 _DROPPED_0001_DATA
      1000           0        0                     /dev/asm-data2
      1000           0        0                     /dev/asm-data11
      1000           0        0                     /dev/asm-data9
      1000           0        0                     /dev/asm-data3
      1000           0        0                     /dev/asm-data10
      1000           0        0                     /dev/asm-data6
      1000           0        0                     /dev/asm-data7
      1000           0        0                     /dev/asm-data12
      1000           0        0                     /dev/asm-data8
      1000           0        0                     /dev/asm-data4

3.6添加剩余磁盘

alter diskgroup DATA add disk '/dev/asm-data2' force;
alter diskgroup DATA add disk '/dev/asm-data3' force;
alter diskgroup DATA add disk '/dev/asm-data4' force;
alter diskgroup DATA add disk '/dev/asm-data6' force;
alter diskgroup DATA add disk '/dev/asm-data7' force;
alter diskgroup DATA add disk '/dev/asm-data8' force;
alter diskgroup DATA add disk '/dev/asm-data9' force;
alter diskgroup DATA add disk '/dev/asm-data10' force;
alter diskgroup DATA add disk '/dev/asm-data12' force;

Part4问题回顾

1、这套库已经异常重启多次,但是云平台没给具体答复

2、上次解决是通过节点1的报警日志查询磁盘信息来梳理和定位的,结果节点1被我重启起不来了

3、但是上次有个群友说是kfed可以查询磁盘对应信息,这次针对这个命令进行了查询

4、怎么才能把磁盘按照顺序添加一次呢?

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论