Oracle 诊断案例:通过10046事件跟踪和解决12.2 多租户 ORA-00600 908 错误

eygle 2019-08-28
122
1 0
摘要:在 Oracle 数据库的世界里,通过10046事件跟踪解决未知问题,是 DBA 的重要技能之一。掌握了Oracle数据库最为重要的跟踪方法,就可以在遇到问题时,快速定位根源。而找到问题根源,距离解决问题也就不远了——不论这些问题是已知的还是未知的。

在 Oracle 数据库的世界里,通过10046事件跟踪解决未知问题,是 DBA 的重要技能之一。


掌握了Oracle数据库最为重要的跟踪方法,就可以在遇到问题时,快速定位根源。而找到问题根源,距离解决问题也就不远了——不论这些问题是已知的还是未知的。


以下一个案例来自于Oracle Database 12.2的版本,在数据库启动时遇到错误,数据库无法启动,抛出的异常是ORA-00600 908错误,关于这个错误在MOS上也没有说明(在撰文时未有任何已知资料),这属于最早发现的12.2的问题。

[oracle@enmocoredb ~]$ SQLplus / as sysdba

SQL*Plus: Release 12.2.0.0.3 Production on Mon Aug 1 09:59:54 2016

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Connected to an idle instance.


SQL> startup

ORACLE instance started.

Database mounted.

ORA-00603: ORACLE server session terminated by fatal error

ORA-01092: ORACLE instance terminated. Disconnection forced

ORA-00604: error occurred at recursive SQL level 1

ORA-00600: internal error code, arguments: [908], [], [], [], [], [], [], [], [], [], [], []

ORA-07445: exception encountered: core dump [ksupdbsesinc()+866] [SIGSEGV]

[ADDR:0x13650] [PC:0x6170C62] [Address not mapped to object] []

Process ID: 24446

Session ID: 30 Serial number: 42586


进一步的检查告警日志文件,获取后台记录的更详细信息,在这个案例中后台的异常日志和前台抛出的一致,包括跟踪日志,很难获得更明确的判断线索。

2016-08-01T10:02:56.913024+08:00

Errors in file /u01/app/oracle/diag/rdbms/eygle/eygle/trace/eygle_ora_24446.trc:

ORA-00604: error occurred at recursive SQL level 1

ORA-00600: internal error code, arguments: [908], [], [], [], [], [], [], [], [], [], [], []

ORA-07445: exception encountered: core dump [ksupdbsesinc()+866] [SIGSEGV] [ADDR:0x13650] [PC:0x6170C62] [Address not mapped to object] []

Error 604 happened during db open, shutting down database

为了进一步分析这个问题,我们在Open数据库的阶段启用跟踪,以寻找最后出现问题的步骤,这一方法在实践中非常有效。

SQL> startup mount;

ORACLE instance started.

Database mounted.

SQL> alter session set events '10046 trace name context forever,level 12';

Session altered.


SQL> alter database open;

ORA-00600: internal error code, arguments: [908], [], [], [], [], [], [], [],[], [], [], []


现在可以在后台找到这个跟踪文件,通过定位最后出现问题的部分,获得线索。以下一段输出是数据库启动报错前最后执行的一段递归SQL。

=====================

PARSING IN CURSOR #0x7f19251432d8 len=123 dep=1 uid=0 oct=3 lid=0 tim=2423484984490 hv=1601912009 ad='0x6181e130' SQLid='65m6cgpgrqg69'

select /*+ NO_PARALLEL(c) */ c.con_id#  from cdb_service$ c where lower(c.name) = lower(:1)    or lower(c.name) = lower(:2)

END OF STMT

PARSE #0x7f19251432d8:c=0,e=293,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=2423484984490

BINDS #0x7f19251432d8:


 Bind#0

  oacdty=01 mxl=32(04) mxlc=00 mal=00 scl=00 pre=00

  oacflg=18 fl2=0001 frm=01 csi=852 siz=32 off=0

  kxsbbbfp=7f19266bdf70  bln=32  avl=04  flg=05

  value="enmo"

 Bind#1

  oacdty=01 mxl=32(04) mxlc=00 mal=00 scl=00 pre=00

  oacflg=18 fl2=0001 frm=01 csi=852 siz=32 off=0

  kxsbbbfp=7f19266bdf38  bln=32  avl=04  flg=05

  value="enmo"

EXEC #0x7f19251432d8:c=1000,e=594,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=2885951592,tim=2423484985147

FETCH #0x7f19251432d8:c=0,e=24,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=4,plh=2885951592,tim=2423484985192

STAT #0x7f19251432d8 id=1 cnt=1 pid=0 pos=1 obj=434 op='TABLE ACCESS FULL CDB_SERVICE$ (cr=3 pr=0 pw=0 str=1 time=23 us cost=2 size=16 card=1)'

CLOSE #0x7f19251432d8:c=0,e=41,dep=1,type=0,tim=2423484985251

kswscrs: error service is already defined in pdb 3.

       : current pdb id=1


注意最后出现的提示:

kswscrs: error service is already defined in pdb 3,current pdb id=1。


这个提示给了我们一个重要线索,错误出现的原因是服务名已经在PDB 3中被定义和使用,因此出现了冲突。

再来看看最后执行的这个递归SQL,该SQL从cdb_service$表来取得服务名,进行验证。

select /*+ NO_PARALLEL(c) */ c.con_id#  from cdb_service$ c where lower(c.name) = lower(:1)    or lower(c.name) = lower(:2)

两个绑定变量的输入参数是:value="enmo" 。


找到了这样一个方向,进一步的检查数据库参数,发现在初始化参数中的确设置了一个服务名enmo,根据错误应该是这个服务名和PDB自动注册产生了冲突。

SQL> startup mount;

ORACLE instance started.

Database mounted.

SQL> show parameter service

NAME      TYPE VALUE

------------------------------------ ----------- ------------------------

service_names      string yhem,eygle,enmo


接下来尝试去掉这个服务名,重新启动数据库,数据库成功启动。

SQL> alter system set service_names='yhem,eygle';

System altered.

SQL> shutdown immediate;

ORA-01109: database not open

Database dismounted.

ORACLE instance shut down.

SQL> startup

ORACLE instance started.

Database mounted.

Database opened.

SQL> show parameter service

NAME      TYPE VALUE

------------------------------------ ----------- ------------------------------

service_names      string yhem,eygle


在熟悉了Oracle的跟踪方法之后,就可以据此不断深入理解Oracle数据库的工作原理,在实践中应对各种异常,并直指根源,找到解决方案。


在12c的多租户环境,修改service_names参数应该非常谨慎,对于该参数的修改会直接反映到监听器的动态注册,甚至会覆盖PDB的动态注册,影响服务。以下过程展示这一知识点,在运维工作中尤其应当引起大家的注意和关注。


测试环境的CDB中存在两个PDB,分别是ENMO和YHEM2,同时service_names参数中存在两个服务名设定,分别是 yhem 和 eygle,这样数据库存在了四个服务名。

SQL> col name for a30

SQL> select con_id,dbid,name from v$pdbs;

    CON_ID DBID NAME

---------- ---------- ------------------------------

      2  612507346 PDB$SEED

      3  965292808 ENMO

      4 2839503056 YHEM2

SQL> show parameter service_names

NAME      TYPE VALUE

------------------------------------ ----------- ------------------------------

service_names      string yhem,eygle


在监听器中的注册的服务名如下表征(去除了不必要内容)。

[oracle@enmocoredb ~]$ lsnrctl status

LSNRCTL for Linux: Version 12.2.0.0.3 - Production on 03-AUG-2016 11:08:46

Copyright (c) 1991, 2016, Oracle.  All rights reserved.


Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))

STATUS of the LISTENER

------------------------

Services Summary...

Service "enmo" has 1 instance(s).

  Instance "eygle", status READY, has 1 handler(s) for this service...

Service "eygle" has 1 instance(s).

  Instance "eygle", status READY, has 1 handler(s) for this service...

Service "yhem" has 1 instance(s).

  Instance "eygle", status READY, has 1 handler(s) for this service...

Service "yhem2" has 1 instance(s).

  Instance "eygle", status READY, has 1 handler(s) for this service...

The command completed successfully


如果通过如下方式修改了参数,去除部分服务名。

SQL> alter system set service_names='yhem';

System altered.


此时的监听状态会随之改变,包括PDB自动注册在内的服务名被一起清除。

[oracle@enmocoredb ~]$ lsnrctl status

LSNRCTL for Linux: Version 12.2.0.0.3 - Production on 03-AUG-2016 11:10:33

Copyright (c) 1991, 2016, Oracle.  All rights reserved.

Services Summary...

Service "eygle" has 1 instance(s).

  Instance "eygle", status READY, has 1 handler(s) for this service...

Service "yhem" has 1 instance(s).

  Instance "eygle", status READY, has 1 handler(s) for this service...

The command completed successfully


如果遇到这种状况,这意味着所有通过服务名访问PDB的请求都会无法获取连接。解决这个问题的方法是,将PDB的服务名重新通过service_names参数进行显示的设置,就可以临时恢复这个问题。

SQL> alter system set service_names='yhem,yhem2,enmo';

SQL> !lsnrctl status

LSNRCTL for Linux: Version 12.2.0.0.3 - Production on 03-AUG-2016 15:19:49

Copyright (c) 1991, 2016, Oracle.  All rights reserved.


Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))

Services Summary...

Service "enmo" has 1 instance(s).

  Instance "eygle", status READY, has 1 handler(s) for this service...

Service "eygle" has 1 instance(s).

  Instance "eygle", status READY, has 1 handler(s) for this service...

Service "yhem" has 1 instance(s).

  Instance "eygle", status READY, has 1 handler(s) for this service...

Service "yhem2" has 1 instance(s).

  Instance "eygle", status READY, has 1 handler(s) for this service...

The command completed successfully


但是注意,在12.2的已知问题未作为Bug修复之前,如果带着这个service_names参数设置重启数据库就会遇到前文所说的问题。


评论

Oracle
订阅
欢迎订阅Oracle频道,订阅之后可以获取最新资讯和更新通知。
墨值排行
今日本周综合
热门文章
近期活动
全部
相关课程
全部