关于CF锁事件，你了解了吗？

原创 subverter 2024-02-21

122

Metalink对该等待事件的分析

这问题一直没有遇到过，只能求助于metalink，详细的说明如下：

1、出现问题的版本

ORACLE DATABASE - ENTERPRISE EDITION - VERSION 9.2.0.1 TO 11.2.0.3 [RELEASE 9.2 TO 11.2]（当前数据库的版本为11.2.0.3）

2、症状

在awr等待报告中的top5等待事件或出现v$session_wait的等待事件；

3、原因

任何需要读取控制文件的动作期间都会产生CF队列，CF锁用于controlfile序列操作和共享部分controlfile读和写。通常CF锁是分配给一个非常简短的时间和时使用:

发生检查点
日志文件的切换
归档online redolog
运行崩溃后的恢复
热备的开始和结束
DML通过nologging选项执行对象时

4、解决问题

找出当前持有CF锁的对象

select l.sid, p.program, p.pid, p.spid, s.username, s.terminal, s.module, s.action, s.event, s.wait_time, s.seconds_in_wait, s.statefrom v$lock l, v$session s, v$process pwhere l.sid = s.sidand s.paddr = p.addrand l.type='CF'and l.lmode >= 5;

查找等待CF锁的对象

select l.sid, p.program, p.pid, p.spid, s.username, s.terminal, s.module, s.action, s.event, s.wait_time, s.seconds_in_wait, s.statefrom v$lock l, v$session s, v$process pwhere l.sid = s.sidand s.paddr = p.addrand l.type='CF'and l.request >= 5

METALINK如下：

It is advisable to run the above queries a few times in a row...

1. If you see the holder is:

background process, typically LGWR, CKPT or ARCn

the holder is holding the enqueue for a longer period of time

Check if the redologs are sized adequately. Typically you want to drive at a log switch every 30 minutes. Also verify checkpointing parameters such as fast_start_mttr_target

2. If you see the holder is:

a user session (so no background process)

the holder is constantly changing

the wait event of the holder is 'control file parallel write'

Then it is most likely that the contention for the CF enqueue is caused by DML on a NOLOGGING object.

When performing DML operations using either NOLOGGING or UNRECOVERABLE option, then oracle records the unrecoverable SCN in the controlfiles. Typically you will see an increase in waits appearing for 'control file parallel write' as well however the session is not blocked for this wait event but rather the session performing the controlfile write will be holding the CF enqueue and the other sessions performing the unrecoverable (nologging) operation will be waiting to get a CF enqueue to update the controlfile with the unrecoverable SCN.

So if you have an object with the NOLOGGING option, it is normal to see CF enqueue contention...

The following operations can make use of no-logging mode:

direct load (SQL*Loader)

direct-load INSERT

CREATE TABLE ... AS SELECT

CREATE INDEX

ALTER TABLE ... MOVE PARTITION

ALTER TABLE ... SPLIT PARTITION

ALTER INDEX ... SPLIT PARTITION

ALTER INDEX ... REBUILD

ALTER INDEX ... REBUILD PARTITION

INSERT, UPDATE, and DELETE on LOBs in NOCACHE NOLOGGING mode stored out of line

3. Check if the archive destination (log_archive_dest_n) are accessible, you may need to involve System/Storage admins.

If you are using NFS filesystem for the archive destinations then make sure there is no issue with nfs as this can lead to log switch hanging and that leads to CF enqueue as the lock holder will be either LGWR or ARCn processes

理解如下：

当holder的对象是后台进程:LGWR、CKPT、ARCn

解决方法：redolog的大小和切换频率，建议每次日志切换的时间间隔着30分钟左右。

当holder的对象是用户session、并经常变化、等待事件"control file parallel write"

解决方法：该等待是正常的数据库等待；

其他：检查归档的路径，由于系统或存储的问题导致的该等待事件；

5、问题的总结

本案例的aw报告中显示数据库每小时产生的归档日志达22G，数据库的online redolog的大小为1G/个，计算下来每个小时需要进行20次的日志切换，平均3分钟执行次。与建议的30分钟一次相差很多。

经过与业务沟通发现当前数据库正在进行数据的抽取工作，导致该等待事件的发生。

最后的解决方法：建议在工作时间避免进行数据的抽取保证在工作期间系统能够正常运行；

可以适当增加online redolog的大小到5G，减低日志的切换频率；

DBA有时候就是有这个好处，当所有人都不知道问题的时候，问题的大小你都可以随便描述（前提是建立在事实的依据下），如果平时树立足够的威信的话，那么很容易让其他的人员配合你的工作，这个时候成就感是很强的。

墨力计划

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

关于CF锁事件，你了解了吗？

评论