
[Translation] Oracle 21c: Dealing with FastStartFailoverActionOnPreCalloutFailure

Original post by 通讯员 (Correspondent), 2022-04-24

In a previous blog post, I talked about FSFO callout scripts, a new Data Guard broker feature in Oracle 21c.

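This feature lets you run tasks before and after a fast-start failover. By default, if the pre-callout script fails, no automatic failover takes place. That may not be what we want: sometimes we want the automatic failover to go ahead even if the pre-tasks did not complete successfully.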

Oracle provides a parameter for this: FastStartFailoverActionOnPreCalloutFailure. It accepts two values:
STOP: if there is no .suc file (the pre-task failed), FSFO does not happen
CONTINUE: FSFO proceeds even if the pre-task failed
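
The decision these two values control can be modeled with a small bash function. This is only an illustrative sketch and not the observer's implementation. The function name, the one-second polling interval and the FAILOVER / NO_FAILOVER strings are assumptions. The function waits up to a timeout (the role FastStartFailoverPreCalloutTimeout plays in fsfocallout.ora) for the .suc or .err file to appear. If the .suc file is present, failover goes ahead. Otherwise it applies STOP or CONTINUE, and treats an unset value as STOP, the default.

```bash
# Illustrative model only: NOT the observer's implementation.
# Usage: fsfo_pre_callout_decision SUC_FILE ERR_FILE TIMEOUT_SECONDS [STOP|CONTINUE]
fsfo_pre_callout_decision() {
  local suc_file="$1" err_file="$2" timeout="$3" action="${4:-STOP}"
  local waited=0
  # Wait (polling once per second, an assumed interval) until either result
  # file exists or the timeout is reached.
  while [[ ! -f "$suc_file" && ! -f "$err_file" ]] && (( waited < timeout )); do
    sleep 1
    waited=$((waited + 1))
  done
  if [[ -f "$suc_file" ]]; then
    echo "FAILOVER"      # pre-task reported success
  elif [[ "$action" == "CONTINUE" ]]; then
    echo "FAILOVER"      # .err file or timeout, but CONTINUE was requested
  else
    echo "NO_FAILOVER"   # .err file or timeout with STOP (the default)
  fi
}
```

For example, fsfo_pre_callout_decision fsfo_precallout.suc fsfo_precallout.err 25 STOP mirrors the first test below: with neither file written, it prints NO_FAILOVER after 25 seconds.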

In this post, I run some tests with this parameter and show the results. Below is the configuration I used, the same one as in my previous post.


DGMGRL> show configuration
 
Configuration - db21
 
  Protection Mode: MaxPerformance
  Members:
  DB21_SITE1 - Primary database
    DB21_SITE2 - (*) Physical standby database
 
Fast-Start Failover: Enabled in Potential Data Loss Mode
 
Configuration Status:
SUCCESS   (status updated 17 seconds ago)
 
DGMGRL>


FastStartFailoverActionOnPreCalloutFailure=STOP

The first test was done with the parameter set to STOP. Below are my callout files.

The fsfocallout.ora file, with the value STOP


oracle@oraadserver3:/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/ [DB21 (CDB$ROOT)] cat fsfocallout.ora | grep -v ^#
FastStartFailoverPreCallout=fsfo_precallout
FastStartFailoverPreCalloutTimeout=25
FastStartFailoverPreCalloutSucFileName=fsfo_precallout.suc
FastStartFailoverPreCalloutErrorFileName=fsfo_precallout.err
FastStartFailoverActionOnPreCalloutFailure=STOP
FastStartFailoverPostCallout=fsfo_postcallout
oracle@oraadserver3:/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/ [DB21 (CDB$ROOT)]


The fsfo_precallout script, which contains errors


oracle@oraadserver3:/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/ [DB21 (CDB$ROOT)] cat fsfo_precallout
#! /bin/bash
if [ 1 -lt 100 ]
 then
   touch /temp/test
   echo "starting fun observer" > /temp/test
   echo "starting fun observer" > /temp/test
   touch  /u01/app/oracle/aadmin/prod20/broker_files/config_db21/callout/fsfo_precallout.suc
else
  touch /u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfo_precallout.err
fi
oracle@oraadserver3:/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/ [DB21 (CDB$ROOT)]


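As you can see, I made some mistakes in the script (/temp instead of /tmp, aadmin instead of admin), so the pre-task cannot complete successfully. The .suc file is never created. Because the condition 1 -lt 100 is always true, the else branch never runs, so the .err file is not created either.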
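
For comparison, a corrected pre-callout might look like the sketch below. It is not the script used in these tests, which was deliberately broken. It fixes both paths and, unlike the original, writes the .err file whenever the pre-tasks fail, so a failure is reported explicitly instead of only through the timeout. The helper names pre_tasks and run_precallout are hypothetical. So is the CALLOUT_DIR override. Writing the result files next to the script is also an assumption, based on the paths that appear in the observer logs.

```bash
#!/bin/bash
# Sketch of a corrected fsfo_precallout (hypothetical, not the author's script).
# Assumption: the .suc/.err files go into the directory holding this script,
# unless the hypothetical CALLOUT_DIR variable is set.
CALLOUT_DIR="${CALLOUT_DIR:-$(cd "$(dirname "${BASH_SOURCE[0]:-$0}")" && pwd)}"
SUC_FILE="$CALLOUT_DIR/fsfo_precallout.suc"
ERR_FILE="$CALLOUT_DIR/fsfo_precallout.err"

# The actual pre-failover work; here it only writes a marker file in /tmp
# (the original wrongly used /temp). Returns non-zero if any step fails.
pre_tasks() {
  touch /tmp/test &&
    echo "starting fun observer" > /tmp/test
}

# Remove result files left over from a previous run, run the tasks, then
# report the outcome through exactly one of the two result files.
run_precallout() {
  rm -f "$SUC_FILE" "$ERR_FILE"
  if pre_tasks; then
    touch "$SUC_FILE"
  else
    touch "$ERR_FILE"
  fi
}

run_precallout
```

With the paths fixed, pre_tasks succeeds and only fsfo_precallout.suc is written. If a task fails, only fsfo_precallout.err is written.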

Now let's simulate a failure of the primary to verify the expected behavior.


SQL> select db_unique_name,open_mode from v$database;
 
DB_UNIQUE_NAME                 OPEN_MODE
------------------------------ --------------------
db21_site1                     READ WRITE
 
SQL> shut abort
ORACLE instance shut down.
SQL>


After a shutdown abort of the primary, the observer log file shows that the automatic failover did not happen, because FastStartFailoverActionOnPreCalloutFailure=STOP:

[W000 2022-04-13T11:07:07.786+02:00] Fast-Start Failover is not enabled or can't be checked. Retry after 15 seconds.
[W000 2022-04-13T11:07:22.792+02:00] Standby database has changed to DB21_SITE2.
[W000 2022-04-13T11:07:22.794+02:00] Try to connect to the primary.
[W000 2022-04-13T11:07:22.794+02:00] Try to connect to the primary DB21_SITE1.
[W000 2022-04-13T11:07:24.028+02:00] Connection to the primary restored!
[W000 2022-04-13T11:07:24.034+02:00] The standby DB21_SITE2 is ready to be a FSFO target
[W000 2022-04-13T11:07:26.036+02:00] Disconnecting from database DB21_SITE1.
[W000 2022-04-13T11:24:02.493+02:00] Primary database cannot be reached.
[W000 2022-04-13T11:24:02.494+02:00] Fast-Start Failover threshold has not exceeded. Retry for the next 15 seconds
[W000 2022-04-13T11:24:03.496+02:00] Try to connect to the primary.
[W000 2022-04-13T11:24:05.797+02:00] Primary database cannot be reached.
[W000 2022-04-13T11:24:06.799+02:00] Try to connect to the primary.
[W000 2022-04-13T11:24:19.665+02:00] Primary database cannot be reached.
[W000 2022-04-13T11:24:19.665+02:00] Fast-Start Failover threshold has expired.
[W000 2022-04-13T11:24:19.666+02:00] Succeeded to parse FSFO callout config file '/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfocallout.ora'
[W000 2022-04-13T11:24:19.666+02:00] Try to connect to the standby.
[W000 2022-04-13T11:24:19.666+02:00] Check if the standby is ready for failover.
[W000 2022-04-13T11:24:19.685+02:00] Doing pre-FSFO callout.
[W000 2022-04-13T11:24:23.746+02:00] Failed to ping the primary.
[W000 2022-04-13T11:24:29.821+02:00] Failed to ping the primary.
[W000 2022-04-13T11:24:36.020+02:00] Failed to ping the primary.
[W000 2022-04-13T11:24:41.040+02:00] Failed to ping the primary.
[W000 2022-04-13T11:24:41.040+02:00] Failed to detect the pre-FSFO callout suc file '/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfo_precallout.suc', or error file '/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfo_precallout.err', after 25 seconds passed.
[W000 2022-04-13T11:24:41.040+02:00] Will not continue Fast-Start Failover since pre-FSFO callout failure action is STOP
[W000 2022-04-13T11:24:41.040+02:00] Returning to primary ping state.
[W000 2022-04-13T11:24:41.040+02:00] Try to connect to the primary.
[W000 2022-04-13T11:24:43.274+02:00] Primary database cannot be reached.
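
To check afterwards which action the observer applied, you can search its log for the message shown above. The following is a minimal sketch. The function name is hypothetical, and the observer log location is not given in this post, so the path in the usage line is a placeholder.

```bash
# Hypothetical helper: read observer log lines on stdin and print the
# pre-callout failure action (STOP or CONTINUE) from the last matching message.
precallout_action_from_log() {
  grep -oE 'pre-FSFO callout failure action is (STOP|CONTINUE)' |
    tail -n 1 |
    awk '{ print $NF }'
}
```

Usage: precallout_action_from_log < /path/to/observer.log. It prints nothing if no pre-callout failure was logged.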


FastStartFailoverActionOnPreCalloutFailure=CONTINUE

If, for whatever reason, I want FSFO to happen even when the pre-task fails, I must explicitly set the value to CONTINUE.
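
Switching the value is a one-line change in fsfocallout.ora. The post does not say how the file was edited. The helper below is a hypothetical sketch that assumes GNU sed (for sed -i) and accepts only the two documented values.

```bash
# Hypothetical helper: set FastStartFailoverActionOnPreCalloutFailure in a
# fsfocallout.ora file, replacing an uncommented entry or appending one.
# Usage: set_precallout_action FILE STOP|CONTINUE   (assumes GNU sed)
set_precallout_action() {
  local file="$1" value="$2"
  case "$value" in
    STOP|CONTINUE) ;;
    *) echo "invalid value: $value (expected STOP or CONTINUE)" >&2; return 1 ;;
  esac
  if grep -q '^FastStartFailoverActionOnPreCalloutFailure=' "$file"; then
    sed -i "s/^FastStartFailoverActionOnPreCalloutFailure=.*/FastStartFailoverActionOnPreCalloutFailure=$value/" "$file"
  else
    # Ensure the file ends with a newline before appending the new entry.
    if [[ -s "$file" && -n "$(tail -c 1 "$file")" ]]; then
      echo >> "$file"
    fi
    echo "FastStartFailoverActionOnPreCalloutFailure=$value" >> "$file"
  fi
}
```

Commented lines (starting with #) are left untouched, because the patterns are anchored at the start of the line.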

Let's run the same test, this time with the parameter set to CONTINUE.


oracle@oraadserver3:/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/ [DB21 (CDB$ROOT)] cat fsfocallout.ora | grep -v ^#
FastStartFailoverPreCallout=fsfo_precallout
FastStartFailoverPreCalloutTimeout=25
FastStartFailoverPreCalloutSucFileName=fsfo_precallout.suc
FastStartFailoverPreCalloutErrorFileName=fsfo_precallout.err
FastStartFailoverActionOnPreCalloutFailure=CONTINUE
FastStartFailoverPostCallout=fsfo_postcallout
oracle@oraadserver3:/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/ [DB21 (CDB$ROOT)]


In the observer log file, we can see that the automatic failover happened as expected, because the parameter value is CONTINUE.


[W000 2022-04-13T11:36:15.409+02:00] Primary database cannot be reached.
[W000 2022-04-13T11:36:15.410+02:00] Fast-Start Failover threshold has not exceeded. Retry for the next 15 seconds
[W000 2022-04-13T11:36:16.410+02:00] Try to connect to the primary.
[W000 2022-04-13T11:36:18.949+02:00] Primary database cannot be reached.
[W000 2022-04-13T11:36:19.950+02:00] Try to connect to the primary.
[W000 2022-04-13T11:36:30.063+02:00] Primary database cannot be reached.
[W000 2022-04-13T11:36:30.063+02:00] Fast-Start Failover threshold has expired.
[W000 2022-04-13T11:36:30.072+02:00] Succeeded to parse FSFO callout config file '/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfocallout.ora'
[W000 2022-04-13T11:36:30.072+02:00] Try to connect to the standby.
[W000 2022-04-13T11:36:30.072+02:00] Check if the standby is ready for failover.
[W000 2022-04-13T11:36:30.087+02:00] Doing pre-FSFO callout.
[W000 2022-04-13T11:36:34.095+02:00] Failed to ping the primary.
[W000 2022-04-13T11:36:40.146+02:00] Failed to ping the primary.
[W000 2022-04-13T11:36:46.255+02:00] Failed to ping the primary.
[W000 2022-04-13T11:36:52.311+02:00] Failed to ping the primary.
[W000 2022-04-13T11:36:55.352+02:00] Failed to detect the pre-FSFO callout suc file '/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfo_precallout.suc', or error file '/u01/app/oracle/admin/prod20/broker_files/config_db21/callout/fsfo_precallout.err', after 25 seconds passed.
[W000 2022-04-13T11:36:55.352+02:00] Will continue Fast-Start Failover since pre-FSFO callout failure action is CONTINUE
[S006 2022-04-13T11:36:55.352+02:00] Fast-Start Failover started...
 
2022-04-13T11:36:55.352+02:00
Initiating Fast-Start Failover to database "DB21_SITE2"...
[S006 2022-04-13T11:36:55.352+02:00] Initiating Fast-start Failover.
2022-04-13T11:36:55.362+02:00
Performing failover NOW, please wait...
 
2022-04-13T11:37:18.566+02:00
Failover succeeded, new primary is "DB21_SITE2".
 
2022-04-13T11:37:18.566+02:00
Failover processing complete, broker ready.
2022-04-13T11:37:18.566+02:00
[S006 2022-04-13T11:37:18.566+02:00] Fast-Start Failover finished...
[W000 2022-04-13T11:37:18.566+02:00] Failover succeeded. Restart pinging.
[W000 2022-04-13T11:37:18.582+02:00] Primary database has changed to DB21_SITE2.


Conclusion

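When working with FSFO callout scripts, make sure FastStartFailoverActionOnPreCalloutFailure is set to match what you want. With STOP, a pre-callout that fails or times out blocks the automatic failover. Only CONTINUE lets the failover proceed in that case.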
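
One way to do that is a small check that reports the value in effect. This is a minimal sketch with a hypothetical function name. It treats a missing entry as the default described at the start of this post (no failover when the pre-callout fails, that is, STOP). If the entry appears more than once, it takes the last one. That is an assumption, since how the observer handles duplicates is not covered here.

```bash
# Hypothetical check: print the effective FastStartFailoverActionOnPreCalloutFailure
# value from a fsfocallout.ora file; commented lines (starting with #) are ignored.
effective_precallout_action() {
  local file="$1" value
  [[ -r "$file" ]] || { echo "cannot read $file" >&2; return 1; }
  # Last uncommented entry wins in this sketch; "|| true" tolerates no match.
  value=$( { grep -E '^FastStartFailoverActionOnPreCalloutFailure=' "$file" || true; } |
    tail -n 1 | cut -d= -f2- | tr -d '[:space:]')
  case "$value" in
    "") echo "STOP" ;;   # not set: default behavior, no failover on pre-callout failure
    STOP|CONTINUE) echo "$value" ;;
    *) echo "unexpected value: $value" >&2; return 1 ;;
  esac
}
```

For example, running it against the fsfocallout.ora from the second test would print CONTINUE.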


Source: Mouhamadou Diaw

https://blog.dbi-services.com/oracle-21c-dealing-with-faststartfailoveractiononprecalloutfailure/

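[Copyright notice] This article is original content by a 墨天轮 (modb.pro) user. When reprinting it, you must credit the source (墨天轮), the article link, the author and other basic information; otherwise the author and 墨天轮 reserve the right to pursue liability. If you find content on 墨天轮 suspected of plagiarism or infringement, please report it by email to contact@modb.pro with supporting evidence; once verified, 墨天轮 will remove the content immediately.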
