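I recently migrated our company's test-environment Oracle database to a server running CentOS 7.2. The procedure was simple: build the Data Guard (DG) standby, then perform a switchover. After the switchover, however, the Oracle instance crashed several times, and the log showed the following: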
Errors in file /data/prod/db/11.2.0/admin/PROD_ebs-db-test/diag/rdbms/proddg/PROD/trace/PROD_smon_5957.trc:
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1
SMON (ospid: 5957): terminating the instance due to error 27157
Instance terminated by SMON, pid = 5957
Errors in file /data/prod/db/11.2.0/admin/PROD_ebs-db-test/diag/rdbms/proddg/PROD/trace/PROD_smon_5957.trc:
ORA-27300: OS system dependent operation:semctl failed with status: 22
ORA-27301: OS failure message: Invalid argument
ORA-27302: failure occurred at: sskgpwrm1
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1
The key line is obvious: SMON (ospid: 5957): terminating the instance due to error 27157. First, a quick note on what Oracle's SMON process does.
The SMON background process performs all system monitoring functions on the Oracle database. The SMON process performs a "warm start" each time that Oracle is re-started, ensuring that any in-flight transaction at the time of the last shutdown are recovered. For example, if Oracle crashed hard with a power failure, the SMON process is attached at startup time, and detects any uncompleted work, using the rollback segments to recover the transactions. In addition, SMON performs periodic cleanup of temporary segments that are no longer needed, and also perform tablespace operations, coalescing contiguous free extents into larger extents.
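Put simply, SMON is a system monitor process that handles a lot of cleanup work, instance recovery, and other important tasks in Oracle. Here, because ORA-27157 occurred, SMON itself terminated the instance.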
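To make the error stack easier to read, the ORA-27300 lines can be reduced to the OS call that failed and its status code. The following is only a minimal sketch: the function names `failed_os_calls` and `describe` and the `SAMPLE` text (copied from the log above) are illustrative and not part of any Oracle tooling.

```python
import os
import re

# Matches lines such as:
#   ORA-27300: OS system dependent operation:semop failed with status: 43
ORA_27300 = re.compile(
    r"ORA-27300: OS system dependent operation:\s*(\w+) failed with status:\s*(\d+)"
)


def failed_os_calls(log_text):
    """Return (operation, status) pairs for every ORA-27300 line, in log order."""
    return [(m.group(1), int(m.group(2))) for m in ORA_27300.finditer(log_text)]


def describe(status):
    """Translate a status code with the errno table of the local machine.

    The text only matches the ORA-27301 message when this runs on the same
    OS family as the database server (Linux in this post).
    """
    return os.strerror(status)


# Hypothetical sample: a few lines copied from the alert log shown earlier.
SAMPLE = """\
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27300: OS system dependent operation:semctl failed with status: 22
ORA-27301: OS failure message: Invalid argument
"""
```

Calling `failed_os_calls(SAMPLE)` yields the `semop` and `semctl` failures with statuses 43 and 22. On Linux these are EIDRM and EINVAL, which agrees with the "Identifier removed" and "Invalid argument" messages in the log: the semaphore set the instance was waiting on no longer existed.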
Searching MOS for the keyword ORA-27157 turned up a matching case (Doc ID 438205.1). According to MOS, the cause is as follows:
In CentOS 7.2, the systemd-logind service introduced a new feature: when a user logs out of the OS completely, all IPC objects owned by that user are removed. The feature is controlled by the RemoveIPC option in /etc/systemd/logind.conf. In CentOS 7.2, RemoveIPC defaults to yes, so when the last session of the oracle or Grid user exits, the operating system removes that user's shared memory segments and semaphores. Because Oracle ASM and database instances use shared memory segments, removing them crashes the Oracle ASM and database instances.
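Whether a host is exposed can be checked by reading the option out of the configuration file. The sketch below parses the text of logind.conf and reports the effective RemoveIPC value. The helper name `effective_remove_ipc` is hypothetical, the fallback of "yes" is an assumption taken from the description above, and drop-in files (for example under a logind.conf.d directory) are deliberately ignored.

```python
def effective_remove_ipc(conf_text, default="yes"):
    """Return the RemoveIPC value set in the [Login] section, or `default`.

    `default` is an assumption based on the article (CentOS 7.2 behaving as
    if RemoveIPC=yes when nothing is set); adjust it for other systems.
    """
    section = None
    value = default
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment, e.g. "#RemoveIPC=yes"
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        key, sep, val = line.partition("=")
        if sep and section == "Login" and key.strip() == "RemoveIPC":
            value = val.strip().lower()  # a later assignment overrides an earlier one
    return value


# Usage on a real host (the path comes from the article):
#   with open("/etc/systemd/logind.conf") as fh:
#       print(effective_remove_ipc(fh.read()))
```

A commented-out line such as `#RemoveIPC=yes` does not count as a setting, which is why the function falls back to the assumed default in that case.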
MOS also gives a fix:
Set RemoveIPC=no in /etc/systemd/logind.conf if it is not in that file
Reboot the server or restart systemd-logind
Migrate to Oracle Linux 7.2 resolves the problem
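It looks like Oracle Linux 7.2 has already resolved this issue. Following the MOS instructions, we edited /etc/systemd/logind.conf and explicitly set the RemoveIPC option to no. After watching the system for a while, the instance has indeed not crashed again.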
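We made the change by hand, but the same edit could be scripted. The following is a rough sketch only: `set_remove_ipc_no` is a hypothetical helper that takes the file's text and returns a copy with an explicit RemoveIPC=no in the [Login] section, replacing any existing or commented-out RemoveIPC line. It does not write the file or restart any service.

```python
import re

# An active or commented-out RemoveIPC assignment, e.g. "#RemoveIPC=yes".
REMOVE_IPC_LINE = re.compile(r"^\s*#?\s*RemoveIPC\s*=.*$")


def set_remove_ipc_no(conf_text):
    """Return conf_text with exactly one explicit RemoveIPC=no under [Login]."""
    out, in_login, written = [], False, False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving [Login] without having seen the option: add it now.
            if in_login and not written:
                out.append("RemoveIPC=no")
                written = True
            in_login = stripped == "[Login]"
        elif in_login and REMOVE_IPC_LINE.match(line):
            if not written:
                out.append("RemoveIPC=no")
                written = True
            continue  # drop the old active or commented line
        out.append(line)
    if not written:
        # No [Login] section at all, or [Login] was the last section.
        if not in_login:
            out.append("[Login]")
        out.append("RemoveIPC=no")
    return "\n".join(out) + "\n"
```

The function is idempotent, so running it on an already fixed file leaves the text unchanged. As the MOS note says, the new setting only takes effect after the server is rebooted or systemd-logind is restarted.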
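Next I plan to write up the Oracle work I did over the past week, including silent installation, DG setup, switchover, and so on. As a MySQL person (mysqler), I will have it as a reference if I ever need it again 😄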





