Oracle ASM Rebalance

IT界数据库架构师的漂泊人生 ("The Wandering Life of an IT Database Architect") 2020-12-14


When does ASM perform a rebalance?

In practice, a rebalance is triggered mainly when the disk membership of a diskgroup changes, for example by an add, drop, or resize disk operation.
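For illustration, each statement below changes the disks of a diskgroup and therefore triggers a rebalance. This is a sketch only: the diskgroup name DATA, the disk names DATA_0002 and DATA_0003, and the size are hypothetical, and the path is the same placeholder used later in this article.

```sql
-- Adding a disk: existing extents are spread onto the new disk
-- (DATA, DATA_0002, DATA_0003 and the size below are hypothetical)
ALTER DISKGROUP data ADD DISK '/dev/xxx/xxx';

-- Dropping a disk: its extents must first be moved to the remaining disks
ALTER DISKGROUP data DROP DISK data_0003;

-- Resizing a disk, e.g. after the LUN was grown on the storage side
ALTER DISKGROUP data RESIZE DISK data_0002 SIZE 200G;
```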

So what exactly does an ASM diskgroup rebalance do? In other words, which steps does it consist of?

The rebalance operation actually differs between Oracle versions. In 10g, an ASM rebalance consists mainly of the following two operations:

1) planning
2) extent relocation

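Oracle 11.1 then introduced the ASM fast rebalance feature: roughly speaking, you can start the ASM instance in restricted mode and complete the rebalance there. Because no database clients can use the diskgroup in that mode, ASM does not need to exchange extent lock messages with other instances, which makes the rebalance faster.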
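A minimal sketch of a fast rebalance, assuming a diskgroup named DATA (hypothetical) that no database currently needs, because a diskgroup mounted in restricted mode cannot be used by database instances. In RAC it must first be dismounted on every other ASM instance.

```sql
-- Remount the (hypothetical) DATA diskgroup in restricted mode
ALTER DISKGROUP data DISMOUNT;
ALTER DISKGROUP data MOUNT RESTRICTED;

-- Rebalance at full 10g/11.1 power and wait for it to finish
ALTER DISKGROUP data REBALANCE POWER 11 WAIT;

-- Return the diskgroup to normal use
ALTER DISKGROUP data DISMOUNT;
ALTER DISKGROUP data MOUNT;
```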

11.2 changed things again by introducing a compact operation, which is essentially a reorganization of the data. So in 11.2 a rebalance consists of the following steps:

1) planning
2) extent relocation
3) compacting

A brief description of each step:

planning: Oracle calculates which extents need to be relocated and where they should be moved; this presumably also uses a hash-based algorithm.

extent relocation: based on the result of planning, data is moved extent by extent to other disks so that it ends up evenly distributed. This step is generally very time-consuming; most of the elapsed time of a whole rebalance is spent here.

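The extent relocation phase can run in parallel, controlled by the well-known parameter asm_power_limit. In versions before 11.2.0.2 its range is 0 to 11; from 11.2.0.2 onward the range is extended to 0 to 1024 (for diskgroups whose COMPATIBLE.ASM attribute is 11.2.0.2 or higher).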

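This parameter controls the number of RBAL slave processes. In other words, the larger the value, the faster the rebalance usually runs, although that also depends on the system's hardware.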
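As a sketch, the default power can be checked and changed in the ASM instance as follows. It assumes the ASM instance uses an SPFILE (needed for SCOPE=BOTH); the value 8 is arbitrary. The last query shows each diskgroup's ASM compatibility, which, to my knowledge, decides whether the 0 to 1024 range applies.

```sql
-- Run in the ASM instance (SQL*Plus): current default power
SHOW PARAMETER asm_power_limit

-- Raise the default power used by automatic rebalances (assumes an SPFILE)
ALTER SYSTEM SET asm_power_limit = 8 SCOPE=BOTH;

-- ASM compatibility per diskgroup (11.2.0.2+ allows power up to 1024)
SELECT name, compatibility FROM v$asm_diskgroup;
```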

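Note also that the number of RBAL slave processes can be adjusted dynamically, for example:

alter diskgroup diskgroup_name rebalance power 5;

This works even after an alter diskgroup add disk command has already been issued: you can still change the rebalance power of the running operation this way.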
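A sketch of that pattern, assuming a diskgroup named DATA (hypothetical) and the article's placeholder path: start the disk addition at low power to limit the impact on production I/O, then raise the power of the same running rebalance later.

```sql
-- Start the add with a low power (DATA is a hypothetical diskgroup name)
ALTER DISKGROUP data ADD DISK '/dev/xxx/xxx' REBALANCE POWER 2;

-- Later, e.g. in a quiet period, speed up the rebalance that is already running
ALTER DISKGROUP data REBALANCE POWER 8;
```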

When the rebalance power is greater than 1, Oracle starts multiple RBAL slave processes, named ARB0, ARB1, and so on.
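To see these slaves while a rebalance runs, a query like the following can be used in the ASM instance. It assumes 11g or later, where V$PROCESS has a PNAME column.

```sql
-- ARBn rebalance slaves currently running in the ASM instance (assumes 11g+)
SELECT pname, spid, program
FROM   v$process
WHERE  pname LIKE 'ARB%'
ORDER  BY pname;
```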

The time this phase takes can be estimated from v$asm_operation.est_minutes, but the value is not always accurate, because it is affected by CPU, I/O, and other factors.
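A monitoring query for the running rebalance. SOFAR and EST_WORK are counted in allocation units, and EST_RATE is allocation units moved per minute, so EST_MINUTES roughly corresponds to (EST_WORK - SOFAR) / EST_RATE. The view returns no rows when no operation is running.

```sql
-- Progress of running rebalance operations
SELECT group_number,
       operation,
       state,
       power,        -- requested power
       actual,       -- power actually allocated
       sofar,        -- AUs moved so far
       est_work,     -- estimated AUs to move in total
       est_rate,     -- estimated AUs moved per minute
       est_minutes   -- estimated minutes remaining
FROM   v$asm_operation;
```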

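compacting: an undocumented feature introduced in 11gR2. After extent relocation completes, Oracle reorganizes the data on each disk in the diskgroup. What does reorganization mean here? It works at the disk level, no longer at the level of the whole diskgroup: the goal is to move data as far as possible toward the outer tracks of each disk. Why does that speed up access? Because it reduces disk seek time.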

11gR2 introduced one parameter to control this feature:
_disable_rebalance_compact

You can turn this feature off by changing this parameter dynamically. When a rebalance reaches this step, querying v$asm_operation.est_minutes shows 0.
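A sketch of turning the compact phase off in the ASM instance, assuming the hidden parameter is dynamic as described above. SCOPE=MEMORY keeps the change out of the SPFILE; the double quotes are required because the name starts with an underscore.

```sql
-- Disable the compact phase of rebalance (hidden 11gR2 parameter)
ALTER SYSTEM SET "_disable_rebalance_compact" = TRUE SCOPE=MEMORY;
```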

This explains what we saw last time: est_minutes was 0, yet the rebalance had not finished and kept running for another 1.5 hours.

Finally, what most people care about is how to speed up an ASM rebalance. There are roughly the following methods:

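1) Increase the asm_power_limit parameter.
2) Set the parameter _disable_rebalance_compact to true (it can be changed dynamically).
3) Set the diskgroup attribute _REBALANCE_COMPACT=false.
4) Lower the parameter _asm_imbalance_tolerance (the 11gR2 default is 3%).
5) Adjust the parameter _disable_rebalance_space_check to turn off the space usage check during compaction.
6) Increase the parameter _asm_rebalance_plan_size, which controls the maximum rebalance work unit. A larger value should reduce the number of extent relocation rounds, but the gain is also limited by the system's I/O capacity.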
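Method 3 limits the change to a single diskgroup instead of the whole ASM instance. A sketch, assuming a diskgroup named DATA (hypothetical); the verification query may return no rows on versions that do not expose hidden attributes in V$ASM_ATTRIBUTE.

```sql
-- Disable compaction for one diskgroup only (hidden attribute, DATA is hypothetical)
ALTER DISKGROUP data SET ATTRIBUTE '_rebalance_compact' = 'FALSE';

-- Check the attribute; hidden attributes may not be listed on every version
SELECT g.name AS diskgroup, a.name, a.value
FROM   v$asm_attribute a
JOIN   v$asm_diskgroup g ON g.group_number = a.group_number
WHERE  a.name LIKE '%compact%';
```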

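When a disk is added to or dropped from a diskgroup, the RBAL process creates a rebalance plan, calculates the time and work the rebalance requires, and then sends a message to the ASM Rebalance (ARBx) processes to handle the request. The number of ARBx processes is determined by the ASM_POWER_LIMIT parameter. The COD (Continuing Operations Directory) records the state of rebalances: if a rebalance fails, ASM reads the record from the COD when the instance restarts and restarts the rebalance. The ARBx processes lock, relocate, and unlock each extent in turn; you can follow the progress in the v$asm_operation view.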
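Before 11.2.0.2, ASM_POWER_LIMIT ranges from 0 to 11 (0 to 1024 from 11.2.0.2 onward). 0 means no rebalance is performed, and the larger the value, the faster the rebalance.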
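Keep the following points in mind when rebalancing:
1. All disks should be the same size. Rebalance allocates the same proportion of space on every disk, so a smaller disk may run out of space during the rebalance.
2. A rebalance runs only when the diskgroup changes; it is not run on a schedule.
3. If the disks are the same size and a rebalance still has not taken place, check asm_power_limit.
4. If the server crashes during a rebalance, the rebalance resumes automatically after the restart.
5. Many factors affect rebalance speed; the most important is the I/O subsystem.
6. If free space runs out during the rebalance and it fails, an ORA-15041 error is raised and you need to add more disks.
7. If you need to add disks frequently, each addition may cause another round of data movement; for efficiency, add them in batches.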
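Points 1, 6 and 7 can be checked or applied with statements like the following. This is a sketch: the diskgroup name DATA and the two device paths are hypothetical placeholders.

```sql
-- Points 1 and 6: compare disk sizes and free space inside each diskgroup
SELECT group_number, name, path, total_mb, free_mb
FROM   v$asm_disk
WHERE  group_number <> 0   -- 0 means the disk is not in a mounted diskgroup
ORDER  BY group_number, name;

-- Point 7: add several disks in one statement so only one rebalance runs
ALTER DISKGROUP data ADD DISK '/dev/xxx/xxx1', '/dev/xxx/xxx2' REBALANCE POWER 8;
```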
To switch and migrate disks from an old storage array to a new one, use:

alter diskgroup diskgroup_name add disk '/dev/xxx/xxx' drop disk disk_name rebalance power 8;
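The drop disk clause takes the ASM disk name, not the OS path. A sketch of looking those names up before the migration and of watching the old disks afterwards, assuming a diskgroup named DATA (hypothetical): the dropped disks stay in the DROPPING state until the rebalance has moved all their extents.

```sql
-- ASM names and paths of the disks currently in DATA (hypothetical name)
SELECT d.name, d.path
FROM   v$asm_disk d
JOIN   v$asm_diskgroup g ON g.group_number = d.group_number
WHERE  g.name = 'DATA';

-- During the migration: old disks still waiting for their extents to move
SELECT name, path, state, header_status
FROM   v$asm_disk
WHERE  state = 'DROPPING';
```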

The official documentation of the rebalance clause:

Use this clause to manually rebalance the disk group. Automatic Storage Management redistributes datafiles evenly across all drives. This clause is rarely necessary, because Automatic Storage Management allocates files evenly and automatically rebalances diskgroups when the storage configuration changes. However, it is useful if you want to use the POWER clause to control the speed of what would otherwise be an automatic rebalance operation.

POWER In the POWER clause, specify a value from 0 to 11, where 0 stops the rebalance operation and 11 permits Automatic Storage Management to execute the rebalance as fast as possible. The value you specify in the POWER clause defaults to the value of the ASM_POWER_LIMIT initialization parameter.

If you omit the POWER clause, then Automatic Storage Management executes both automatic and specified rebalance operations at the power determined by the value of the ASM_POWER_LIMIT initialization parameter.

WAIT | NOWAIT Use this clause to specify when in the course of the rebalance operation control should be returned to the user.

Specify WAIT to allow a script that adds or removes disks to wait for the disk group to be rebalanced before returning control to the user. You can explicitly terminate a rebalance operation running in WAIT mode, although doing so does not undo any completed disk add or drop operation in the same statement.

Specify NOWAIT if you want control returned to the user immediately after the statement is issued. This is the default.

You can monitor the progress of the rebalance operation by querying the V$ASM_OPERATION dynamic performance view.
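A short sketch of the POWER and WAIT clauses in practice, assuming a diskgroup named DATA (hypothetical): POWER 0 pauses a rebalance, a later REBALANCE without POWER resumes it at ASM_POWER_LIMIT, and WAIT makes a maintenance script block until the data movement is done.

```sql
-- Pause the running rebalance (POWER 0 stops it)
ALTER DISKGROUP data REBALANCE POWER 0;

-- Resume later; without POWER the value of ASM_POWER_LIMIT is used
ALTER DISKGROUP data REBALANCE;

-- In a script: return only after the rebalance triggered by the add has finished
ALTER DISKGROUP data ADD DISK '/dev/xxx/xxx' REBALANCE POWER 8 WAIT;
```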
