Starting with Oracle 11gR2, ACFS (ASM Cluster File System) was introduced, and with it the ASM Dynamic Volume Manager (ADVM) to support ACFS.
In 11.2, ASM is no longer used only to store database files; it can also hold unstructured data such as clusterware files, ordinary binaries, external files, and text files.
In short, starting with 11gR2 ASM became simpler and smarter, cut overhead considerably, and did away with the dependency on third-party volume-management software altogether.
In this article we will look at the ADVM volume directory introduced in 11gR2, which is what we see as ASM metadata file 7.
SQL> select number_kffxp file#, disk_kffxp disk#, count(disk_kffxp) extents
2 from x$kffxp
3 where group_kffxp=1
4 and disk_kffxp <> 65534
5 group by number_kffxp, disk_kffxp;
FILE# DISK# EXTENTS
---------- ---------- ----------
1 0 2
2 0 1
3 0 28
3 3 14
4 0 5
4 3 3
5 0 1
6 0 1
8 3 1
9 0 1
256 0 482
.........
268 3 7
269 0 1
35 rows selected.
As the output above shows, file 7 is not visible before any ADVM volume has been created, so let's create one first.
To begin, the ACFS-related services need to be started:
[root@11gR2test bin]# ./acfsroot install
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9312: Existing ADVM/ACFS installation detected.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
[root@11gR2test bin]# ./acfsload start
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9322: completed
[root@11gR2test bin]# ./acfsdriverstate version
ACFS-9325: Driver OS kernel version = 2.6.18-8.el5(i386).
ACFS-9326: Driver Oracle version = 111206.
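Before creating anything on top of the drivers, it is worth confirming that they really are loaded. A minimal check (a sketch; acfsdriverstate also accepts a loaded subcommand, and on Linux the kernel modules are typically named oracleoks, oracleadvm and oracleacfs):
[root@11gR2test bin]# ./acfsdriverstate loaded
[root@11gR2test bin]# lsmod | grep oracle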
Create a disk group:
SQL> conn /as sysasm
Connected.
SQL> create diskgroup ACFS disk '/dev/sdb','/dev/sde'
2 ATTRIBUTE 'compatible.asm' = '11.2', 'compatible.advm' = '11.2';
create diskgroup ACFS disk '/dev/sdb','/dev/sde'
*
ERROR at line 1:
ORA-15018: diskgroup cannot be created
ORA-15031: disk specification '/dev/sde' matches no disks
ORA-15014: path '/dev/sde' is not in the discovery set
SQL> show parameter asm
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups string DATA1, DATA
asm_diskstring string /dev/sdc, /dev/sdb, /dev/sdd
asm_power_limit integer 1
asm_preferred_read_failure_groups string
SQL> alter system set asm_diskstring='/dev/sdc','/dev/sdb','/dev/sdd','/dev/sde';
System altered.
SQL> create diskgroup ACFS disk '/dev/sdb','/dev/sde'
2 ATTRIBUTE 'compatible.asm' = '11.2', 'compatible.advm' = '11.2';
Diskgroup created.
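Optionally, we can double-check that the compatibility attributes were set as requested; ADVM volumes require compatible.advm of 11.2 or higher. A minimal sketch against V$ASM_ATTRIBUTE (the view is only populated for disk groups whose compatible.asm is 11.1 or above):
SQL> select name, value
  2  from v$asm_attribute
  3  where group_number = (select group_number from v$asm_diskgroup where name = 'ACFS')
  4  and name like 'compatible.%';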
SQL> SET LINESIZE 145
SQL> SET PAGESIZE 9999
SQL> SET VERIFY off
SQL>
SQL> COLUMN disk_group_name FORMAT a20 HEAD 'Disk Group Name'
SQL> COLUMN disk_file_path FORMAT a17 HEAD 'Path'
SQL> COLUMN disk_file_name FORMAT a20 HEAD 'File Name'
SQL> COLUMN disk_file_fail_group FORMAT a20 HEAD 'Fail Group'
SQL> COLUMN total_mb FORMAT 999,999,999 HEAD 'File Size (MB)'
SQL> COLUMN used_mb FORMAT 999,999,999 HEAD 'Used Size (MB)'
SQL> COLUMN pct_used FORMAT 999.99 HEAD 'Pct. Used'
SQL>
SQL> break on report on disk_group_name skip 1
SQL>
SQL> compute sum label "" of total_mb used_mb on disk_group_name
SQL> compute sum label "Grand Total: " of total_mb used_mb on report
SQL>
SQL> SELECT
2 NVL(a.name, '[CANDIDATE]') disk_group_name
3 , b.path disk_file_path
4 , b.name disk_file_name
5 , b.failgroup disk_file_fail_group
6 , b.total_mb total_mb
7 , (b.total_mb - b.free_mb) used_mb
8 , ROUND((1- (b.free_mb / b.total_mb))*100, 2) pct_used
9 FROM
10 v$asm_diskgroup a RIGHT OUTER JOIN v$asm_disk b USING (group_number)
11 ORDER BY
12 a.name
13 /
Disk Group Name Path File Name Fail Group File Size (MB) Used Size (MB) Pct. Used
-------------------- ----------------- -------------------- -------------------- -------------- -------------- ---------
ACFS /dev/sdb ACFS_0000 ACFS_0000 2,048 53 2.59
/dev/sde ACFS_0001 ACFS_0001 2,048 53 2.59
******************** -------------- --------------
4,096 106
DATA1 /dev/sdd DATA1_0000 DATA1_0000 4,096 2,730 66.65
/dev/sdc DATA1_0003 DATA1_0003 2,048 1,365 66.65
******************** -------------- --------------
6,144 4,095
-------------- --------------
Grand Total: 10,240 4,201
Create the ADVM volumes:
[ora11g@11gR2test ~]$ asmcmd volcreate -G acfs -s 1g acfs_v1
[ora11g@11gR2test ~]$ asmcmd volcreate -G acfs -s 1g acfs_v2
ORA-15032: not all alterations performed
ORA-15041: diskgroup "ACFS" space exhausted (DBD ERROR: OCIStmtExecute)
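The ORA-15041 is easy to explain: ADVM volumes are mirrored by default, so the first 1 GB volume already consumed roughly 2 GB of raw space (plus volume metadata) out of this 4 GB disk group, leaving too little room for a second mirrored 1 GB volume. The remaining raw and usable space can be confirmed with lsdg before retrying (a quick sketch; column layout varies slightly across versions):
[ora11g@11gR2test ~]$ asmcmd lsdg ACFS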
[ora11g@11gR2test ~]$ asmcmd volinfo -a
Diskgroup Name: ACFS
Volume Name: ACFS_V1
Volume Device: /dev/asm/acfs_v1-41
State: ENABLED
Size (MB): 1024
Resize Unit (MB): 256
Redundancy: MIRROR
Stripe Columns: 4
Stripe Width (K): 128
Usage:
Mountpath:
[ora11g@11gR2test ~]$ exit
exit
SQL> l
1 SELECT
2 NVL(a.name, '[CANDIDATE]') disk_group_name
3 , b.path disk_file_path
4 , b.name disk_file_name
5 , b.failgroup disk_file_fail_group
6 , b.total_mb total_mb
7 , (b.total_mb - b.free_mb) used_mb
8 , ROUND((1- (b.free_mb / b.total_mb))*100, 2) pct_used
9 FROM
10 v$asm_diskgroup a RIGHT OUTER JOIN v$asm_disk b USING (group_number)
11 ORDER BY
12* a.name
SQL> /
Disk Group Name Path File Name Fail Group File Size (MB) Used Size (MB) Pct. Used
-------------------- ----------------- -------------------- -------------------- -------------- -------------- ---------
ACFS /dev/sdb ACFS_0000 ACFS_0000 2,048 1,084 52.93
/dev/sde ACFS_0001 ACFS_0001 2,048 1,084 52.93
******************** -------------- --------------
4,096 2,168
DATA1 /dev/sdd DATA1_0000 DATA1_0000 4,096 2,730 66.65
/dev/sdc DATA1_0003 DATA1_0003 2,048 1,365 66.65
******************** -------------- --------------
6,144 4,095
-------------- --------------
Grand Total: 10,240 6,263
SQL> !
[ora11g@11gR2test ~]$ asmcmd volcreate -G acfs -s 500m acfs_v2
[ora11g@11gR2test ~]$ asmcmd volinfo -a
Diskgroup Name: ACFS
Volume Name: ACFS_V1
Volume Device: /dev/asm/acfs_v1-41
State: ENABLED
Size (MB): 1024
Resize Unit (MB): 256
Redundancy: MIRROR
Stripe Columns: 4
Stripe Width (K): 128
Usage:
Mountpath:
Volume Name: ACFS_V2
Volume Device: /dev/asm/acfs_v2-41
State: ENABLED
Size (MB): 512
Resize Unit (MB): 256
Redundancy: MIRROR
Stripe Columns: 4
Stripe Width (K): 128
Usage:
Mountpath:
With the ADVM volumes created, let's query the view again and see whether ASM file 7 now shows up:
SQL> select number_kffxp file#, disk_kffxp disk#, count(disk_kffxp) extents
2 from x$kffxp
3 where group_kffxp in(1,2)
4 and disk_kffxp <> 65534
5 group by number_kffxp, disk_kffxp
6 order by 1;
FILE# DISK# EXTENTS
---------- ---------- ----------
1 0 4
1 1 2
2 0 2
2 1 1
3 0 71
3 1 43
3 3 14
4 0 7
4 1 2
4 3 3
5 0 2
5 1 1
6 0 2
6 1 1
7 0 1 ---file #7, i.e. the ADVM volume directory
7 1 1
8 0 1
8 1 1
8 3 1
9 0 2
9 1 1
256 0 498
256 1 16
256 3 240
257 0 405
257 1 5
257 3 202
258 0 77
258 1 8
258 3 33
259 0 1257
259 1 5
.........
267 0 334
267 3 168
268 0 14
268 3 7
269 0 1
50 rows selected.
SQL>
From the output above you can see that an ADVM volume is created mirrored by default, its resize (allocation) unit is 256 MB, and its stripe width is 128 KB.
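These defaults can be overridden when the volume is created. A hedged sketch of the 11.2 asmcmd syntax (acfs_v3 is a hypothetical volume name; check asmcmd help volcreate on your release for the exact options and allowed values):
[ora11g@11gR2test ~]$ asmcmd volcreate -G acfs -s 256m --redundancy unprotected --column 1 --width 128k acfs_v3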
Create an ACFS file system:
[ora11g@11gR2test ~]$ /sbin/mkfs -t acfs /dev/asm/acfs_v1-41
mkfs.acfs: version = 11.2.0.2.0
mkfs.acfs: on-disk version = 39.0
mkfs.acfs: volume = /dev/asm/acfs_v1-41
mkfs.acfs: volume size = 1073741824
mkfs.acfs: Format complete.
[root@11gR2test bin]# mkdir /acfs_test
[root@11gR2test bin]# chown -R ora11g:oinstall /acfs_test
[root@11gR2test bin]# mount -t acfs /dev/asm/acfs_v1-41 /acfs_test
[root@11gR2test bin]# mount
/dev/sda2 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda5 on /home type ext3 (rw)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
none on /proc/fs/vmblock/mountPoint type vmblock (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/dev/asm/acfs_v1-41 on /acfs_test type acfs (rw)
[ora11g@11gR2test ~]$ asmcmd volinfo -G acfs ACFS_V1
Diskgroup Name: ACFS
Volume Name: ACFS_V1
Volume Device: /dev/asm/acfs_v1-41
State: ENABLED
Size (MB): 1024
Resize Unit (MB): 256
Redundancy: MIRROR
Stripe Columns: 4
Stripe Width (K): 128
Usage: ACFS
Mountpath: /acfs_test
[ora11g@11gR2test ~]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2             3.8G  2.9G  767M  80% /
/dev/sda5              19G   18G  419M  98% /home
/dev/sda1              46M   13M   31M  31% /boot
tmpfs                 506M  154M  352M  31% /dev/shm
/dev/asm/acfs_v1-41   1.0G   39M  986M   4% /acfs_test
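Note that a mount issued by hand does not survive a reboot. In 11.2 the file system can be added to the ACFS mount registry so it is mounted automatically at startup (a sketch; acfsutil normally lives under /sbin on Linux):
[root@11gR2test bin]# /sbin/acfsutil registry -a /dev/asm/acfs_v1-41 /acfs_test
[root@11gR2test bin]# /sbin/acfsutil registry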
Next we use kfed to read the ADVM metadata. First locate the file directory via kfdhdb.f1b1locn in the disk header, then read block 7 of the file directory, which is the directory entry for file 7:
[ora11g@11gR2test ~]$ kfed read /dev/sdb |grep f1b1
kfdhdb.f1b1locn: 2 ; 0x0d4: 0x00000002
[ora11g@11gR2test ~]$ kfed read /dev/sdb aun=2 blkn=7 | more
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 4 ; 0x002: KFBTYP_FILEDIR
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 7 ; 0x004: T=0 NUMB=0x7
kfbh.block.obj: 1 ; 0x008: TYPE=0x0 NUMB=0x1
kfbh.check: 3972293629 ; 0x00c: 0xecc463fd
kfbh.fcn.base: 263 ; 0x010: 0x00000107
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfffdb.node.incarn: 1 ; 0x000: A=1 NUMM=0x0
kfffdb.node.frlist.number: 4294967295 ; 0x004: 0xffffffff
kfffdb.node.frlist.incarn: 0 ; 0x008: A=0 NUMM=0x0
kfffdb.hibytes: 0 ; 0x00c: 0x00000000
kfffdb.lobytes: 1048576 ; 0x010: 0x00100000
kfffdb.xtntcnt: 3 ; 0x014: 0x00000003
kfffdb.xtnteof: 3 ; 0x018: 0x00000003
kfffdb.blkSize: 4096 ; 0x01c: 0x00001000
kfffdb.flags: 1 ; 0x020: O=1 S=0 S=0 D=0 C=0 I=0 R=0 A=0
kfffdb.fileType: 15 ; 0x021: 0x0f
kfffdb.dXrs: 19 ; 0x022: SCHE=0x1 NUMB=0x3
kfffdb.iXrs: 19 ; 0x023: SCHE=0x1 NUMB=0x3
kfffdb.dXsiz[0]: 4294967295 ; 0x024: 0xffffffff
kfffdb.dXsiz[1]: 0 ; 0x028: 0x00000000
kfffdb.dXsiz[2]: 0 ; 0x02c: 0x00000000
kfffdb.iXsiz[0]: 4294967295 ; 0x030: 0xffffffff
kfffdb.iXsiz[1]: 0 ; 0x034: 0x00000000
kfffdb.iXsiz[2]: 0 ; 0x038: 0x00000000
kfffdb.xtntblk: 3 ; 0x03c: 0x0003
kfffdb.break: 60 ; 0x03e: 0x003c
kfffdb.priZn: 0 ; 0x040: KFDZN_COLD
kfffdb.secZn: 0 ; 0x041: KFDZN_COLD
kfffdb.ub2spare: 0 ; 0x042: 0x0000
kfffdb.alias[0]: 4294967295 ; 0x044: 0xffffffff
kfffdb.alias[1]: 4294967295 ; 0x048: 0xffffffff
kfffdb.strpwdth: 0 ; 0x04c: 0x00
kfffdb.strpsz: 0 ; 0x04d: 0x00
kfffdb.usmsz: 0 ; 0x04e: 0x0000
kfffdb.crets.hi: 32983749 ; 0x050: HOUR=0x5 DAYS=0x16 MNTH=0x2 YEAR=0x7dd
kfffdb.crets.lo: 3373440000 ; 0x054: USEC=0x0 MSEC=0xa7 SECS=0x11 MINS=0x32
kfffdb.modts.hi: 32983749 ; 0x058: HOUR=0x5 DAYS=0x16 MNTH=0x2 YEAR=0x7dd
kfffdb.modts.lo: 3373440000 ; 0x05c: USEC=0x0 MSEC=0xa7 SECS=0x11 MINS=0x32
kfffdb.dasz[0]: 0 ; 0x060: 0x00
kfffdb.dasz[1]: 0 ; 0x061: 0x00
kfffdb.dasz[2]: 0 ; 0x062: 0x00
kfffdb.dasz[3]: 0 ; 0x063: 0x00
kfffdb.permissn: 0 ; 0x064: 0x00
kfffdb.ub1spar1: 0 ; 0x065: 0x00
kfffdb.ub2spar2: 0 ; 0x066: 0x0000
kfffdb.user.entnum: 0 ; 0x068: 0x0000
kfffdb.user.entinc: 0 ; 0x06a: 0x0000
kfffdb.group.entnum: 0 ; 0x06c: 0x0000
kfffdb.group.entinc: 0 ; 0x06e: 0x0000
kfffdb.spare[0]: 0 ; 0x070: 0x00000000
kfffdb.spare[1]: 0 ; 0x074: 0x00000000
kfffdb.spare[2]: 0 ; 0x078: 0x00000000
kfffdb.spare[3]: 0 ; 0x07c: 0x00000000
kfffdb.spare[4]: 0 ; 0x080: 0x00000000
kfffdb.spare[5]: 0 ; 0x084: 0x00000000
kfffdb.spare[6]: 0 ; 0x088: 0x00000000
kfffdb.spare[7]: 0 ; 0x08c: 0x00000000
kfffdb.spare[8]: 0 ; 0x090: 0x00000000
kfffdb.spare[9]: 0 ; 0x094: 0x00000000
kfffdb.spare[10]: 0 ; 0x098: 0x00000000
kfffdb.spare[11]: 0 ; 0x09c: 0x00000000
kfffdb.usm: ; 0x0a0: length=0
kfffde[0].xptr.au: 53 ; 0x4a0: 0x00000035
kfffde[0].xptr.disk: 1 ; 0x4a4: 0x0001
kfffde[0].xptr.flags: 0 ; 0x4a6: L=0 E=0 D=0 S=0
kfffde[0].xptr.chk: 30 ; 0x4a7: 0x1e
kfffde[1].xptr.au: 53 ; 0x4a8: 0x00000035
kfffde[1].xptr.disk: 0 ; 0x4ac: 0x0000
kfffde[1].xptr.flags: 0 ; 0x4ae: L=0 E=0 D=0 S=0
kfffde[1].xptr.chk: 31 ; 0x4af: 0x1f
kfffde[2].xptr.au: 4294967294 ; 0x4b0: 0xfffffffe
.........
The file directory entry for file 7 above shows a single extent, mirrored at AU 53 on disk 0 and disk 1 (kfffde[0]/kfffde[1]), so that is where we read the volume directory blocks.
---block 1
[ora11g@11gR2test ~]$ kfed read /dev/sdb aun=53 blkn=1 | more
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 22 ; 0x002: KFBTYP_VOLUMEDIR
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 1 ; 0x004: T=0 NUMB=0x1
kfbh.block.obj: 7 ; 0x008: TYPE=0x0 NUMB=0x7
kfbh.check: 4259734440 ; 0x00c: 0xfde663a8
kfbh.fcn.base: 864 ; 0x010: 0x00000360
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kffdnd.bnode.incarn: 1 ; 0x000: A=1 NUMM=0x0
kffdnd.bnode.frlist.number: 4294967295 ; 0x004: 0xffffffff
kffdnd.bnode.frlist.incarn: 0 ; 0x008: A=0 NUMM=0x0
kffdnd.overfl.number: 2 ; 0x00c: 0x00000002
kffdnd.overfl.incarn: 1 ; 0x010: A=1 NUMM=0x0
kffdnd.parent.number: 4294967295 ; 0x014: 0xffffffff
kffdnd.parent.incarn: 0 ; 0x018: A=0 NUMM=0x0
kffdnd.fstblk.number: 0 ; 0x01c: 0x00000000
kffdnd.fstblk.incarn: 1 ; 0x020: A=1 NUMM=0x0
kfvvde.entry.incarn: 1 ; 0x024: A=1 NUMM=0x0
kfvvde.entry.hash: 0 ; 0x028: 0x00000000
kfvvde.entry.refer.number: 4294967295 ; 0x02c: 0xffffffff
kfvvde.entry.refer.incarn: 0 ; 0x030: A=0 NUMM=0x0
kfvvde.volnm: ACFS_V1 ; 0x034: length=7
kfvvde.usage: ACFS ; 0x054: length=4
kfvvde.dgname: ; 0x074: length=0
kfvvde.clname: ; 0x094: length=0
kfvvde.mountpath: /acfs_test ; 0x0b4: length=10
kfvvde.drlinit: 1 ; 0x4b5: 0x01
kfvvde.pad1: 0 ; 0x4b6: 0x0000
kfvvde.volfnum.number: 256 ; 0x4b8: 0x00000100
kfvvde.volfnum.incarn: 808033913 ; 0x4bc: 0x30299e79
kfvvde.drlfnum.number: 257 ; 0x4c0: 0x00000101
kfvvde.drlfnum.incarn: 808033913 ; 0x4c4: 0x30299e79
kfvvde.volnum: 1 ; 0x4c8: 0x0001
kfvvde.avddgnum: 41 ; 0x4ca: 0x0029
kfvvde.extentsz: 64 ; 0x4cc: 0x00000040
kfvvde.volstate: 2 ; 0x4d0: D=0 C=1 R=0
kfvvde.pad[0]: 0 ; 0x4d1: 0x00
..........
---block 2
[ora11g@11gR2test ~]$ kfed read /dev/sdb aun=53 blkn=2| more
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 22 ; 0x002: KFBTYP_VOLUMEDIR
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 2 ; 0x004: T=0 NUMB=0x2
kfbh.block.obj: 7 ; 0x008: TYPE=0x0 NUMB=0x7
kfbh.check: 1380684830 ; 0x00c: 0x524b941e
kfbh.fcn.base: 836 ; 0x010: 0x00000344
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kffdnd.bnode.incarn: 1 ; 0x000: A=1 NUMM=0x0
kffdnd.bnode.frlist.number: 4294967295 ; 0x004: 0xffffffff
kffdnd.bnode.frlist.incarn: 0 ; 0x008: A=0 NUMM=0x0
kffdnd.overfl.number: 4294967295 ; 0x00c: 0xffffffff
kffdnd.overfl.incarn: 0 ; 0x010: A=0 NUMM=0x0
kffdnd.parent.number: 4294967295 ; 0x014: 0xffffffff
kffdnd.parent.incarn: 0 ; 0x018: A=0 NUMM=0x0
kffdnd.fstblk.number: 0 ; 0x01c: 0x00000000
kffdnd.fstblk.incarn: 1 ; 0x020: A=1 NUMM=0x0
kfvvde.entry.incarn: 1 ; 0x024: A=1 NUMM=0x0
kfvvde.entry.hash: 0 ; 0x028: 0x00000000
kfvvde.entry.refer.number: 4294967295 ; 0x02c: 0xffffffff
kfvvde.entry.refer.incarn: 0 ; 0x030: A=0 NUMM=0x0
kfvvde.volnm: ACFS_V2 ; 0x034: length=7
kfvvde.usage: ; 0x054: length=0
kfvvde.dgname: ; 0x074: length=0
kfvvde.clname: ; 0x094: length=0
kfvvde.mountpath: ; 0x0b4: length=0
kfvvde.drlinit: 0 ; 0x4b5: 0x00
kfvvde.pad1: 0 ; 0x4b6: 0x0000
kfvvde.volfnum.number: 258 ; 0x4b8: 0x00000102
kfvvde.volfnum.incarn: 808034057 ; 0x4bc: 0x30299f09
kfvvde.drlfnum.number: 259 ; 0x4c0: 0x00000103
kfvvde.drlfnum.incarn: 808034057 ; 0x4c4: 0x30299f09
kfvvde.volnum: 2 ; 0x4c8: 0x0002
kfvvde.avddgnum: 41 ; 0x4ca: 0x0029
kfvvde.extentsz: 64 ; 0x4cc: 0x00000040
kfvvde.volstate: 2 ; 0x4d0: D=0 C=1 R=0
kfvvde.pad[0]: 0 ; 0x4d1: 0x00
kfvvde.pad[1]: 0 ; 0x4d2: 0x00
kfvvde.pad[2]: 0 ; 0x4d3: 0x00
kfvvde.pad[3]: 0 ; 0x4d4: 0x00
.........
Since I have only two ADVM volumes here, reading the first two blocks with kfed is enough: block 1 corresponds to the first volume, block 2 to the second, and so on. Because there are only two volumes, reading a third block returns an empty entry, which is expected, as shown below:
[ora11g@11gR2test ~]$ kfed read /dev/sdb aun=53 blkn=3| more
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 22 ; 0x002: KFBTYP_VOLUMEDIR
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 3 ; 0x004: T=0 NUMB=0x3
kfbh.block.obj: 7 ; 0x008: TYPE=0x0 NUMB=0x7
kfbh.check: 18252298 ; 0x00c: 0x0116820a
kfbh.fcn.base: 11 ; 0x010: 0x0000000b
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kffdnd.bnode.incarn: 0 ; 0x000: A=0 NUMM=0x0
kffdnd.bnode.frlist.number: 4 ; 0x004: 0x00000004
kffdnd.bnode.frlist.incarn: 0 ; 0x008: A=0 NUMM=0x0
kffdnd.overfl.number: 0 ; 0x00c: 0x00000000
kffdnd.overfl.incarn: 0 ; 0x010: A=0 NUMM=0x0
kffdnd.parent.number: 0 ; 0x014: 0x00000000
kffdnd.parent.incarn: 0 ; 0x018: A=0 NUMM=0x0
kffdnd.fstblk.number: 0 ; 0x01c: 0x00000000
kffdnd.fstblk.incarn: 0 ; 0x020: A=0 NUMM=0x0
kfvvde.entry.incarn: 0 ; 0x024: A=0 NUMM=0x0
kfvvde.entry.hash: 0 ; 0x028: 0x00000000
kfvvde.entry.refer.number: 0 ; 0x02c: 0x00000000
kfvvde.entry.refer.incarn: 0 ; 0x030: A=0 NUMM=0x0
kfvvde.volnm: ; 0x034: length=0
kfvvde.usage: ; 0x054: length=0
kfvvde.dgname: ; 0x074: length=0
kfvvde.clname: ; 0x094: length=0
kfvvde.mountpath: ; 0x0b4: length=0
kfvvde.drlinit: 0 ; 0x4b5: 0x00
kfvvde.pad1: 0 ; 0x4b6: 0x0000
kfvvde.volfnum.number: 0 ; 0x4b8: 0x00000000
kfvvde.volfnum.incarn: 0 ; 0x4bc: 0x00000000
kfvvde.drlfnum.number: 0 ; 0x4c0: 0x00000000
kfvvde.drlfnum.incarn: 0 ; 0x4c4: 0x00000000
kfvvde.volnum: 0 ; 0x4c8: 0x0000
kfvvde.avddgnum: 0 ; 0x4ca: 0x0000
kfvvde.extentsz: 0 ; 0x4cc: 0x00000000
kfvvde.volstate: 0 ; 0x4d0: D=0 C=0 R=0
kfvvde.pad[0]: 0 ; 0x4d1: 0x00
kfvvde.pad[1]: 0 ; 0x4d2: 0x00
..........
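Since the volume directory packs one kfvvde entry per block, a small loop makes it easy to scan all entries at once (a sketch reusing the AU located above; adjust the device and aun for your own disk group):
[ora11g@11gR2test ~]$ for b in 1 2 3; do
> kfed read /dev/sdb aun=53 blkn=$b | grep -E 'kfvvde\.(volnm|usage|mountpath)'
> done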
Now let's get to the main point and parse the ADVM structure. From the kfed output above, it breaks down into just three parts:
1) kfbh, the block header. Not much to add here, as it has been described in earlier articles and is much the same:
kfbh.type -- the metadata block type
kfbh.block.obj -- the ASM file number this metadata belongs to; ADVM is file 7, hence the 7 here
kfbh.block.blk -- the block number (within the AU) where this data resides
2) kffdnd. From the output above it is not hard to guess that this part locates and describes the block's position in the directory tree.
It is the same kffdnd structure described earlier for the disk directory, so I won't repeat it here.
kffdnd.bnode.incarn: 1 ; 0x000: A=1 NUMM=0x0 ----allocation info: the block's incarnation plus the pointer to the next free-list block
kffdnd.bnode.frlist.number: 4294967295 ; 0x004: 0xffffffff
kffdnd.bnode.frlist.incarn: 0 ; 0x008: A=0 NUMM=0x0
kffdnd.overfl.number: 2 ; 0x00c: 0x00000002 ---overfl points to the next block at the same level
kffdnd.overfl.incarn: 1 ; 0x010: A=1 NUMM=0x0
kffdnd.parent.number: 4294967295 ; 0x014: 0xffffffff
kffdnd.parent.incarn: 0 ; 0x018: A=0 NUMM=0x0
kffdnd.fstblk.number: 0 ; 0x01c: 0x00000000 ---points to the block one level up
kffdnd.fstblk.incarn: 1 ; 0x020: A=1 NUMM=0x0
3) kfvvde, the ADVM volume entry definitions themselves.
---the first part is generic entry information and is not particularly interesting
kfvvde.entry.incarn: 1 ; 0x024: A=1 NUMM=0x0
kfvvde.entry.hash: 0 ; 0x028: 0x00000000
kfvvde.entry.refer.number: 4294967295 ; 0x02c: 0xffffffff
kfvvde.entry.refer.incarn: 0 ; 0x030: A=0 NUMM=0x0
---the fields below are the ones we actually care about:
kfvvde.volnm: ACFS_V1 ; 0x034: length=7 ---the ADVM volume name
kfvvde.usage: ACFS ; 0x054: length=4 ---the volume's usage type; here the volume is used for ACFS
kfvvde.dgname: ; 0x074: length=0
kfvvde.clname: ; 0x094: length=0
kfvvde.mountpath: /acfs_test ; 0x0b4: length=10 ---the path where the ACFS file system is mounted
kfvvde.drlinit: 1 ; 0x4b5: 0x01
kfvvde.pad1: 0 ; 0x4b6: 0x0000
kfvvde.volfnum.number: 256 ; 0x4b8: 0x00000100 ---the volume file number
kfvvde.volfnum.incarn: 808033913 ; 0x4bc: 0x30299e79
kfvvde.drlfnum.number: 257 ; 0x4c0: 0x00000101 ---the file number of the volume's dirty region logging (DRL) file
kfvvde.drlfnum.incarn: 808033913 ; 0x4c4: 0x30299e79
kfvvde.volnum: 1 ; 0x4c8: 0x0001 ---the volume number, starting from 1
kfvvde.avddgnum: 41 ; 0x4ca: 0x0029 ---meaning unclear; note, though, that 41 matches the '-41' suffix of the volume device names (/dev/asm/acfs_v1-41), so it presumably identifies the disk group in ADVM device naming
kfvvde.extentsz: 64 ; 0x4cc: 0x00000040 ---the ADVM extent size, conceptually similar to a database extent; with 4 stripe columns and a 256 MB resize unit, 256/4 = 64 (MB)
kfvvde.volstate: 2 ; 0x4d0: D=0 C=1 R=0 ---the volume state; 2 presumably means the volume is enabled/usable
kfvvde.pad[0]: 0 ; 0x4d1: 0x00
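The volfnum/drlfnum pairs seen above (256/257 for ACFS_V1, and 258/259 for ACFS_V2 in block 2) can be cross-checked from the instance side. A sketch against V$ASM_FILE (I would expect the TYPE column to report these as the volume and DRL file types, ASMVOL and ASMVDRL on 11.2, but treat that as an assumption to verify):
SQL> select file_number, type, round(bytes/1024/1024) mb, redundancy
  2  from v$asm_file
  3  where group_number = (select group_number from v$asm_diskgroup where name = 'ACFS')
  4  and file_number between 256 and 259;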
kfvvde.drlfnum.number above refers to the dirty region log. I could not find any ASM-specific material on it, but a Google search shows that Veritas Volume Manager has a similar mechanism:
Dirty region logging (DRL) is a fault recovery mechanism used in Veritas Volume Manager. If DRL is enabled, it speeds recovery of
mirrored volumes after a system crash. DRL keeps track of the regions that have changed due to I/O writes to a mirrored volume.
DRL uses this information to recover only those portions of the volume that need to be recovered.
If DRL is not used and a system failure occurs, all mirrors of the volumes must be restored to a consistent state. Restoration is
done by copying the full contents of the volume between its mirrors. This process can be lengthy and I/O intensive. It may also be
necessary to recover the areas of volumes that are already consistent.
Dirty Region Logs
DRL logically divides a volume into a set of consecutive regions, and maintains a log on disk where each region is represented by
a status bit. This log records regions of a volume for which writes are pending. Before data is written to a region, DRL synchronously
marks the corresponding status bit in the log as dirty. To enhance performance, the log bit remains set to dirty until the region
becomes the least recently accessed for writes. This allows writes to the same region to be written immediately to disk if the
region’s log bit is set to dirty.
On restarting a system after a crash, VxVM recovers only those regions of the volume that are marked as dirty in the dirty region log.
Judging from this description, my guess is that the DRL in Oracle ASM/ADVM is the same as, or at least very similar to, the DRL mechanism in the Veritas volume manager. A dirty region log is, as the name suggests,
a log area that tracks dirty regions; it speeds up recovery after a system crash and guards against mirror inconsistency caused by interrupted writes.
In fact, it is not only Veritas; other vendors have similar technology. I found the following comparable description in the Sun Cluster 2.2 Cluster Volume Manager Guide:
Dirty Region Logging (DRL) is an optional property of a volume that provides speedy recovery of mirrored volumes after a system failure.
Dirty Region Logging is supported in cluster-shareable disk groups. This section provides a brief overview of DRL and outlines differences
between SSVM DRL and the CVM implementation of DRL. For more information on DRL, refer to Chapter 1 in the applicable Sun StorEdge
Volume Manager 2.6 System Administrator's Guide.
DRL keeps track of the regions that have changed due to I/O writes to a mirrored volume and uses this information to recover only
the portions of the volume that need to be recovered. DRL logically divides a volume into a set of consecutive regions and maintains
a dirty region log that contains a status bit representing each region of the volume. Log subdisks store the dirty region log of a
volume that has DRL enabled. A volume with DRL has at least one log subdisk that is associated with one of the volume's plexes.
For more detail, see the documentation (Sun material now hosted by Oracle): http://docs.oracle.com/cd/E19957-01/806-2329/ch2admin-39382/index.html
Finally, a brief summary:
1) 11gR2 introduced the ACFS file system together with ADVM volume management, which works much like third-party volume managers such as Veritas or Solaris Cluster.
2) Compared with earlier releases, 11gR2 ASM is considerably more capable: it can store all kinds of files, not just datafiles but also external files, text files, and so on.
3) ACFS administration is not complicated, and the ADVM structure is fairly simple, falling into three parts: the block header, the directory-tree pointer information, and the volume entry definitions.
4) $GRID_HOME/bin contains a number of ACFS tools shipped with the Grid Infrastructure software, which can be used to manage and monitor ACFS (see the quick listing below).
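For reference, those tools can be listed directly (a trivial sketch; the exact set varies by version and platform, and acfsutil/mkfs.acfs usually live under /sbin on Linux):
[root@11gR2test bin]# ls $GRID_HOME/bin | grep -i acfs
[root@11gR2test bin]# ls /sbin | grep -iE 'acfs|advm'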