

集群环境:srvctl status database -d ywzd2 -vInstance ywzd21 is running on node ywzd-db03. Instance status: Open.Instance ywzd22 is running on node ywzd-db04. Instance status: Open.单机环境:sqlplus as sysdbaSQL> select startup_time,status from v$instance;STARTUP_TIME STATUS----------------------- ------------06-DEC-2023 12:29:59 OPEN
2、检查数据库故障期间的alert日志,可以看到实例遇到ORA-07445[opiaba()+639]和ORA-00600[17147]内部错误,接着核心PMON 进程崩溃,导致了 DB 实例重启。
第一段日志出现ORA-07445[opiaba()+639]错误Wed Dec 06 12:29:21 2023Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x185806F, opiaba()+639] [flags: 0x0, count: 1]Errors in file u01/app/oracle/diag/rdbms/ywzd2/ywzd21/trace/ywzd21_ora_538783.trc (incident=418676):ORA-07445: exception encountered: core dump [opiaba()+639] [SIGSEGV] [ADDR:0x0] [PC:0x185806F] [SI_KERNEL(general_protection)] []Incident details in: u01/app/oracle/diag/rdbms/ywzd2/ywzd21/incident/incdir_418676/ywzd21_ora_538783_i418676.trc第二段日志出现ORA-00600[17147]错误Wed Dec 06 12:29:47 2023Errors in file u01/app/oracle/diag/rdbms/ywzd2/ywzd21/trace/ywzd21_pmon_512337.trc (incident=411292):ORA-00600: internal error code, arguments: [17147], [0x14DAE09578], [], [], [], [], [], [], [], [], [], []Incident details in: u01/app/oracle/diag/rdbms/ywzd2/ywzd21/incident/incdir_411292/ywzd21_pmon_512337_i411292.trcUse ADRCI or Support Workbench to package the incident.See Note 411.1 at My Oracle Support for error and packaging details.Wed Dec 06 12:29:48 2023Dumping diagnostic data in directory=[cdmp_20231206122948], requested by (instance=1, osid=512337 (PMON)), summary=[incident=411292].Errors in file u01/app/oracle/diag/rdbms/ywzd2/ywzd21/trace/ywzd21_pmon_512337.trc:ORA-00600: internal error code, arguments: [17147], [0x14DAE09578], [], [], [], [], [], [], [], [], [], []PMON (ospid: 512337): terminating the instance due to error 472System state dump requested by (instance=1, osid=512337 (PMON)), summary=[abnormal instance termination].System State dumped to trace file u01/app/oracle/diag/rdbms/ywzd2/ywzd21/trace/ywzd21_diag_512357_20231206122952.trcInstance terminated by PMON, pid = 512337第三段日志集群实例自动重启Wed Dec 06 12:29:59 2023Starting ORACLE instance (normal)批量检查命令:grep -ib1E 'opiaba|\[17147|terminating the instance due to error 472' /u01/app/oracle/diag/rdbms/ywzd2/ywzd21/trace/alert_ywzd21.log
3、进一步检查用户进程的跟踪文件,该进程正在执行一条非常长的、使用了超过 10 万个绑定变量的SQL语句。该PLSQL语句块包含大量的UPDATE语句,并且绑定变量个数超过65535个,触发ora-7445 [opiaba]错误。
========= Dump for incident 328879 (ORA 600 [17147]) ========----- Beginning of Customized Incident Dump(s) -----********** Internal heap ERROR 17147 addr=0x12b0accc78 *********...dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)----- Current SQL Statement for this session (sql_id=btng01wnhnvnw) -----beginupdate 业务表1set 字段1 = :1 ,字段2 = :2 ,字段3 = :3 ,字段4 = :4 ,字段5 = :5 ,字段6 = :6 ,字段7 = '1'where 字段8 = :7and 字段9 = :8and 字段10 = :9 ;....(sql 太长,部分省略)...update 业务表Nset 字段1 = :100381 ,字段2 = :100382 ,字段3 = :100383where 字段4 = :100384and 字段5 = :100385and 字段6 = :100386 ;end;
4、进一步检查PMON进程的跟踪文件,堆栈信息显示数据库遇到“[03]: kgherror [KGH]”错误。
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)----- SQL Statement (None) -----Current SQL information unavailable - no cursor....----- Incident Context Dump -----Address: 0x7fff22dbad78Incident ID: 327263Problem Key: ORA 600 [17147]Error: ORA-600 [17147] [0x12B0ACCC78] [] [] [] [] [] [] [] [] [] [][00]: dbgexExplicitEndInc [diag_dde][01]: dbgeEndDDEInvocationImpl [diag_dde][02]: dbgeEndDDEInvocation [diag_dde][03]: kgherror [KGH]<-- Signaling[04]: kghfrmrg [KGH][05]: kghfre [KGH][06]: kghsfx [KGH][07]: kghunfhp [KGH][08]: kghfrh_internal [KGH][09]: kghfrh [KGH][10]: kksFreeHeap [cursor][11]: kksLockDelete [cursor][12]: kksLockOperation [cursor][13]: kgllkdl [LIBCACHE][14]: kgllkds [LIBCACHE][15]: kssxdl [VOS][16]: kssdel [VOS][17]: kssdch [VOS][18]: ksuxds [ksu][19]: kssxdl [VOS][20]: kssdel [VOS][21]: kssdch [VOS][22]: ksudlp [ksu][23]: kssxdl [VOS][24]: kssdel [VOS][25]: ksuxdl [ksu][26]: ksuxda_del_proc [ksu][27]: ksucln_dpc_cleanup [ksu][28]: ksucln_dpc_dfs [ksu][29]: ksucln_dpc_main [ksu][30]: ksucln_dpc [ksu][31]: ksucln [ksu][32]: ksbrdp [background_proc][33]: opirip []
5、对这种批量更新的大量使用绑定变量的长SQL/PLSQL,DB 系统在设计时未预料到此种情形,从而触发Oracle BUG导致数据库重启。结合官方文档1466343.1找到两种解决方式,分别为应用补丁12578873或修改应用程序减少绑定变量。

select sql_id, count(*) bind_countfrom v$sql_bind_capture twhere t.CHILD_NUMBER = 0group by t.SQL_IDhaving count(*) > 1000order by count(*);
-------------堆栈调用函数-------------------kgdsdstskdstdst operating system dependent kernel dump a stack traceksedst1 kernel service (VOS) error debug dump the call stack stack trace 1 helperksedst kernel service (VOS) error debug dump the call stackdbkedDefDump debug and diagnostics RDBMS diagnostic data extractor Dumper do DDE default dumpksedmp kernel service (VOS) error debug dump process statessexhd operating system dependent system dump routine__sighandleropiabaopiprs oracle program interface parsekksParseChildCursor kernel compile shared objects (cursor) parse child cursorrpiswu2 recursive program interface switch user in recursive sqlkksLoadChild kernel compile shared objects (cursor) load childkxsGetRuntimeLock kernel execution shared cursor get runtime lockkksfbc kernel compile shared objects (cursor) find bound cursorkkspsc0 kernel compile shared objects (cursor) parse shared cursorkksParseCursor kernel compile shared objects (cursor) parse cursoropiosq0 oracle program interface prepare to parse a sql command 0kpooprx kernel programmatic interface oracle open, parse, and optionally executekpoal8 kernel programmatic interface oracle V8 bundled executionopiodr oracle program interface oracle code request driver, route the current requestttcpip two task common pipe read/writeopitsk oracle program interface two task function dispatcheropiino oracle program interface initialize opiopiodropidrv oracle program interface route current request driver, entry side into two task interfacesou2o main oracle executable entry pointopimai_real oracle program interface main real oracle start pointssthrdmain operating system dependent system main for every thread in a threaded oraclemain general c starting function

1、创建测试表create table test01 as select * from dba_tables;2、修改共享游标参数值为SIMILARalter session set CURSOR_SHARING=SIMILAR;3、第一次执行语句1select table_name from test01 where table_name='OBJ$';4、查看语句1的SQL_ID和执行次数set linesize 1000col sql_text for a80select sql_id,EXECUTIONS,LOADS,sql_text from v$sqlwhere sql_text like 'select table_name from test01 where table_name%';SQL_ID EXECUTIONS LOADS SQL_TEXT------------- ---------- ---------- ----------------------------------------------------------------------------------------------------9dhppyxtr9qck 1 1 select table_name from test01 where table_name=:"SYS_B_0"5、通过SQL_ID获取语句1的绑定变量值col value_string for a30select child_address,to_char(LAST_CAPTURED,'dd-mm-yyyy hh24:mi:ss'),VALUE_STRINGfrom v$sql_bind_capture where sql_id='9dhppyxtr9qck';CHILD_ADDRESS TO_CHAR(LAST_CAPTUR VALUE_STRING---------------- ------------------- ------------------------------0000000068BD8220 17-03-2024 20:32:52 OBJ$6、查看参数_cursor_bind_capture_interval的值为900秒,默认15分钟set linesize 1000col name for a50col value for a30select nam.ksppinm NAME,val.KSPPSTVL VALUEfrom x$ksppi nam, x$ksppsv valwhere nam.indx = val.indxand nam.ksppinm like '%_cursor_bind_capture_interval%'order by 1;7、修改动态参数为10秒捕获一次alter system set "_cursor_bind_capture_interval"=10;8、再次执行语句1,参数值不一样,传入参数须为4个字节,避免硬解析select table_name from test01 where table_name='TAB$';9、再次查询绑定变值,已从COL$变为TAB$CHILD_ADDRESS TO_CHAR(LAST_CAPTUR VALUE_STRING---------------- ------------------- ------------------------------0000000068BD8220 17-03-2024 20:35:56 TAB$

Bug 12578873 - ORA-7445 [opiaba] when using ore than 65535 bind variables (文档 ID 12578873.8)
Instance crashed after ORA-7445 [opiaba] and ORA-600 [17147] (Doc ID 1466343.1)
ORA-7445 [opiaba] (Doc ID 737378.1)
Not Every Bind Values Captured in v$sql_bind_capture (Doc ID 1370816.1)

可以加我的微信,交个朋友或讨论数据库解决方案,请备注”姓名单位“,谢谢!

文章转载自数据库运维之道,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




