Problem symptoms
The database instance's RSS memory was exhausted, the logs contained OOM messages, and the instance went down. The OOM cause itself is not analyzed here.
Restarting the instance then failed; the logs show 4 or 5 consecutive failed startup attempts:
2026-02-12 09:15:21 CST::@:[578272]: FATAL: pre-existing shared memory block (key 2048, ID 1328250881) is still in use
2026-02-12 09:15:21 CST::@:[578272]: HINT: Terminate any old server processes associated with data directory "/data".
2026-02-12 09:15:21 CST::@:[578272]: LOG: database system is shut down
2026-02-12 09:21:03 CST::@:[658824]: FATAL: pre-existing shared memory block (key 2048, ID 1328250881) is still in use
2026-02-12 09:21:03 CST::@:[658824]: HINT: Terminate any old server processes associated with data directory "/data".
2026-02-12 09:21:03 CST::@:[658824]: LOG: database system is shut down
2026-02-12 09:31:12 CST::@:[794791]: LOG: redirecting log output to logging collector process
2026-02-12 09:31:12 CST::@:[794791]: HINT: Future log output will appear in directory "/data/pg_log".
2026-02-12 09:31:37 CST::@:[801049]: FATAL: lock file "postmaster.pid" already exists
2026-02-12 09:31:37 CST::@:[801049]: HINT: Is another postmaster (PID 794791) running in data directory "/data"?
2026-02-12 09:32:34 CST::@:[814396]: FATAL: lock file "postmaster.pid" already exists
2026-02-12 09:32:34 CST::@:[814396]: HINT: Is another postmaster (PID 794791) running in data directory "/data"?
The instance eventually started only because the DBA ran ipcrm -m xxx before starting it.
That resolved the incident quickly, but several questions remained:
- Why is this scenario fairly rare in practice?
- start.log shows two kinds of startup errors; which operations and logic does each correspond to?
- Can the shared memory still exist when the postmaster (PM) is already gone?
- How can this shared memory segment be located and cleaned up?
- PG uses several shared memory segments; which one is involved here?
- Besides ipcrm -m, are there other ways to get the instance started?
Error analysis: pre-existing shared memory block
Three kinds of shared memory
Normally, a running PG instance has three shared memory segments.
Take the default configuration, shared_memory_type='mmap' without huge pages, as an example:
# inspect the shared memory PG actually uses via its mapped virtual memory
cat /proc/`head -1 $PGDATA/postmaster.pid`/smaps | grep -E "\-s"
2b61b0563000-2b61b0564000 rw-s 00000000 00:04 116293664 /SYSV00001000 (deleted)
2b61b057f000-2b61b05b3000 rw-s 00000000 00:12 1501001168 /dev/shm/PostgreSQL.1193490778
2b61bbac2000-2b61fa67a000 rw-s 00000000 00:04 1500999610 /dev/zero (deleted)
As shown above, from top to bottom these are: the SysV segment used for the startup interlock, the segment used by parallel query, and the segment backing shared_buffers.
If shared_buffers uses huge pages, or shared_memory_type is 'sysv' instead of 'mmap', the output differs slightly.
With huge pages:
2aaaaac00000-2aba9ca00000 rw-s 00000000 00:0e 48453452 /anon_hugepage (deleted)
2b08f2eea000-2b08f2eeb000 rw-s 00000000 00:04 50692152 /SYSV00001000 (deleted)
2b08f2f05000-2b08f302d000 rw-s 00000000 00:12 48436142 /dev/shm/PostgreSQL.1345689218
shared_memory_type = 'sysv':
2b03b3ceb000-2b03b3d1f000 rw-s 00000000 00:12 1572332304 /dev/shm/PostgreSQL.2883611352
2b03bf0c2000-2b03fdc7a000 rw-s 00000000 00:04 143917075 /SYSV00001000 (deleted)
Summary:
| PG shared memory configuration | Segments in smaps | shared_buffers in smaps | sysv in smaps |
|---|---|---|---|
| shared_memory_type=mmap, no huge pages | 3 | /dev/zero | /SYSV00001000 |
| shared_memory_type=sysv, no huge pages | 2 | /SYSV00001000 | /SYSV00001000 |
| shared_memory_type=mmap, huge pages | 3 | /anon_hugepage | /SYSV00001000 |
| shared_memory_type=sysv, huge pages | not supported | not supported | not supported |
Now the question: when the error "pre-existing shared memory block" is raised, which of these segments is involved?
Source code analysis
Searching the source for the error message quickly locates the key code: src/backend/port/sysv_shmem.c.
First, what is the SysV shmem for? The following is excerpted from scattered comments in the source:
We still require a SysV shmem block to
* exist, though, because mmap'd shmem provides no way to find out how
* many processes are attached, which we need for interlocking purposes.
* As of PostgreSQL 9.3, we normally allocate only a very small amount of
* System V shared memory, and only for the purposes of providing an
* interlock to protect the data directory. The real shared memory block
* is allocated using mmap(). This works around the problem that many
* systems have very low limits on the amount of System V shared memory
* that can be allocated. Even a limit of a few megabytes will be enough
* to run many copies of PostgreSQL without needing to adjust system settings.
- SysV shmem can tell whether a segment is still attached; mmap cannot provide this
- this SysV segment protects the datadir; shared_buffers uses mmap (by default), not SysV
- this SysV segment is tiny (the virtual address range 2b61b0563000-2b61b0564000 shows a 4 KB allocation)
Now look at the shmem state enum:
typedef enum
{
SHMSTATE_ANALYSIS_FAILURE, /* unexpected failure to analyze the ID */
SHMSTATE_ATTACHED, /* pertinent to DataDir, has attached PIDs */
SHMSTATE_ENOENT, /* no segment of that ID */
SHMSTATE_FOREIGN, /* exists, but not pertinent to DataDir */
SHMSTATE_UNATTACHED /* pertinent to DataDir, no attached PIDs */
} IpcMemoryState;
The states of interest are ATTACHED, FOREIGN and UNATTACHED.
The SysV shmem protects the datadir; the typical goal is to ensure two instances never run against the same directory. Since the shmem segment exists independently, it may, for various odd reasons, belong to neither this directory nor this process: that is the FOREIGN state. If the segment maps to this DataDir but no process is attached, the state is UNATTACHED; with attached processes, it is ATTACHED.
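The state decision can be sketched as a small function (a simplified model, not PG's actual code; PG derives pertinence from the segment header and the attach count from shmctl(IPC_STAT)):

```python
def classify_shm_state(pertinent_to_datadir: bool, nattch: int) -> str:
    """Simplified model of IpcMemoryState for an existing segment."""
    if not pertinent_to_datadir:
        return "SHMSTATE_FOREIGN"     # exists, but belongs elsewhere
    if nattch > 0:
        return "SHMSTATE_ATTACHED"    # startup must refuse to proceed
    return "SHMSTATE_UNATTACHED"      # safe to remove and recreate
```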
With that, look at the error raised in the PGSharedMemoryCreate function:
PGShmemHeader *
PGSharedMemoryCreate(Size size,
PGShmemHeader **shim)
{...
for (;;) // retry forever
{..
shmid = shmget(NextShmemSegID, sizeof(PGShmemHeader), 0); // look up the existing shmem segment and get its shmid
if (shmid < 0)
{
oldhdr = NULL;
state = SHMSTATE_FOREIGN;
}
else
state = PGSharedMemoryAttach(shmid, NULL, &oldhdr); // determine this shmem segment's state
switch (state) // act on the segment's state
{
... // only two of the cases are shown: shmem attached and not attached
case SHMSTATE_ATTACHED: // still attached: raise the error seen in this incident
ereport(FATAL,
(errcode(ERRCODE_LOCK_FILE_EXISTS),
errmsg("pre-existing shared memory block (key %lu, ID %lu) is still in use",
(unsigned long) NextShmemSegID,
(unsigned long) shmid),
errhint("Terminate any old server processes associated with data directory \"%s\".",
DataDir)));
break;
...
case SHMSTATE_UNATTACHED: // nothing attached
/*
* The segment pertains to DataDir, and every process that had
* used it has died or detached. Zap it, if possible, and any
* associated dynamic shared memory segments, as well. This
* shouldn't fail, but if it does, assume the segment belongs
* to someone else after all, and try the next candidate.
* Otherwise, try again to create the segment. That may fail
* if some other process creates the same shmem key before we
* do, in which case we'll try the next key.
*/
// the segment maps to DataDir and no process still holds it
if (oldhdr->dsm_control != 0)
dsm_cleanup_using_control_segment(oldhdr->dsm_control);
if (shmctl(shmid, IPC_RMID, NULL) < 0)
NextShmemSegID++; // note: ShmemSegID increments and the loop retries
break;
}
...
}
...
}
As shown, startup raises the error when the shmem is still attached. When it is not attached, the loop tries to clean up the stale segment, and on failure increments the ShmemSegID to request a new one.
- the first case corresponds to this incident
- the second case is why a crashed instance can normally be restarted
sysv shmem
The postmaster.pid and sysv_shmem logic was overhauled in PG 10 and has barely changed since; this article only analyzes the post-10 logic.
pidfile.h:
#define LOCK_FILE_LINE_SHMEM_KEY 7
sysv_shmem.c, in InternalIpcMemoryCreate():
{
char line[64];
sprintf(line, "%9lu %9lu",
(unsigned long) memKey, (unsigned long) shmid);
AddToDataDirLockFile(LOCK_FILE_LINE_SHMEM_KEY, line);
}
As the source shows, the shmem info is stored on line 7 of the postmaster.pid file: the shmkey followed by the shmid.
> cat postmaster.pid
242712
/data
1772698474
8531
/tmp
0.0.0.0
4096 143917078 # <----here
ready
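As a sketch, line 7 can be parsed back out like this (parse_shmem_line is a hypothetical helper; the sample string mirrors the listing above):

```python
def parse_shmem_line(pidfile_text: str):
    """Return (shmkey, shmid) from line 7 (LOCK_FILE_LINE_SHMEM_KEY) of postmaster.pid content."""
    line = pidfile_text.splitlines()[6]   # line 7, 1-based
    key_str, shmid_str = line.split()
    return int(key_str), int(shmid_str)

sample = "242712\n/data\n1772698474\n8531\n/tmp\n0.0.0.0\n     4096 143917078\nready\n"
```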
What are shmkey and shmid?
The PG source calls it like this, in InternalIpcMemoryCreate():
shmid = shmget(memKey, 0, IPC_CREAT | IPC_EXCL | IPCProtection);
PG uses the shmkey/memKey as a seed and asks the kernel for a shmem segment via shmget(), which returns the segment's unique identifier, the shmid.
The shmid depends heavily on the state of the server (its memory). For PG, a quick restart may yield the same shmid or shmid+1, depending on Linux kernel internals; after a server reboot it will be entirely different.
One way to internalize this: the shmkey/memKey can stay fixed whether or not the server reboots, because it is user input (supplied by PG); whereas across a server reboot, the same shmkey is unlikely to map to the same shmid.
How PG obtains the shmkey
PGSharedMemoryCreate():
/*
* We use the data directory's ID info (inode and device numbers) to
* positively identify shmem segments associated with this data dir, and
* also as seeds for searching for a free shmem key.
*/
if (stat(DataDir, &statbuf) < 0)
ereport(FATAL,
(errcode_for_file_access(),
errmsg("could not stat data directory \"%s\": %m",
DataDir)));
...
/*
* Loop till we find a free IPC key. Trust CreateDataDirLockFile() to
* ensure no more than one postmaster per data directory can enter this
* loop simultaneously. (CreateDataDirLockFile() does not entirely ensure
* that, but prefer fixing it over coping here.)
*/
NextShmemSegID = statbuf.st_ino;
for (;;)
{
IpcMemoryId shmid;
PGShmemHeader *oldhdr;
IpcMemoryState state;
/* Try to create new segment */
memAddress = InternalIpcMemoryCreate(NextShmemSegID, sysvsize);
if (memAddress)
break; /* successful create and attach */
/* Check shared memory and possibly remove and recreate */
/*
* shmget() failure is typically EACCES, hence SHMSTATE_FOREIGN.
* ENOENT, a narrow possibility, implies SHMSTATE_ENOENT, but one can
* safely treat SHMSTATE_ENOENT like SHMSTATE_FOREIGN.
*/
shmid = shmget(NextShmemSegID, sizeof(PGShmemHeader), 0);
PG stats the datadir; the result includes the datadir's inode number, which PG uses directly as the shmkey.
So in PG the shmem key is strongly tied to the datadir's inode; normally shmem key = datadir inode.
Verification:
> ls -id $PGDATA
4096 /lzlcloud/pg8574/data
> cat postmaster.pid |head -7|tail -1
4096 143917090
As shown, datadir inode = shmkey = 4096.
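The seed derivation itself is nothing more than a stat(); a minimal sketch (using a throwaway temp directory in place of a real $PGDATA):

```python
import os
import tempfile

def shmkey_seed(datadir: str) -> int:
    """PG's seed for the shmem key search: the datadir's inode number."""
    return os.stat(datadir).st_ino

demo_dir = tempfile.mkdtemp()   # stand-in for $PGDATA
seed = shmkey_seed(demo_dir)
```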
PG's shmkey in a cloud environment
The previous section said that normally shmkey = datadir inode; in our cloud environment that is usually not the case.
In our cloud environment:
> ls -id /lzlcloud/pg8298/data
4096 /lzlcloud/pg8298/data
> ls -id /lzlcloud/pg8388/data
4096 /lzlcloud/pg8388/data
> ls -id /lzlcloud/pg8095/data
4096 /lzlcloud/pg8095/data
> cat /lzlcloud/pg8298/data/postmaster.pid|head -7|tail -1
4096 971833391
> cat /lzlcloud/pg8388/data/postmaster.pid|head -7|tail -1
4097 62128161
> cat /lzlcloud/pg8095/data/postmaster.pid|head -7|tail -1
4098 143163441
The data-disk directories all have inode 4096, while the shmkeys are 4096, 4097 and 4098.
Why?
The inode side is a filesystem matter:
- each filesystem has its own independent inode space
- filesystems reserve some low inodes that cannot be used; with our mount layout, the first usable inode on the data disks is 4096
So datadir inode = 4096 is simply the default behavior of our cloud's disk mounting. Other environments may differ (not analyzed further here), but PG datadirs mounted on the same filesystem type in the same way may well end up with identical inode numbers.
The shmkey side comes from the PG source, PGSharedMemoryCreate():
for (;;)
{
...
NextShmemSegID = statbuf.st_ino;
...
shmid = shmget(NextShmemSegID, sizeof(PGShmemHeader), 0);
...
switch (state)
{
case SHMSTATE_FOREIGN:
NextShmemSegID++;
break;
The shmkey starts out as the datadir inode, but since the segment obtained may be FOREIGN, PG increments the shmkey and tries again.
For example, the instance whose postmaster.pid shows shmkey=4097 started with shmkey=4096, found the corresponding segment in use by another instance (the other PG instance with shmkey=4096), and incremented the shmkey to obtain a different segment with its own shmid.
Likewise, the shmkey=4098 instance incremented twice before finding a free shmkey and its shmid.
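The probing can be modeled as a tiny loop (a simplification: real PG also removes pertinent-but-unattached segments rather than skipping them):

```python
def find_free_shmkey(inode_seed: int, keys_in_use: set) -> int:
    """Mimic NextShmemSegID++: start at the datadir inode, skip busy keys."""
    key = inode_seed
    while key in keys_in_use:   # FOREIGN/ATTACHED segment: try the next key
        key += 1
    return key

# three instances whose datadirs all have inode 4096:
# the first gets key 4096, the second 4097, the third 4098
```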
Where the shmid shows up
The SysV shmid can be found in the startup error log, on line 7 of postmaster.pid, and in the virtual memory map (smaps); it can be inspected with ipcs and removed with ipcrm.
Example (note shmid=143917078 throughout):
Startup error log:
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2026-03-05 16:02:19 CST::@:[262388]: FATAL: pre-existing shared memory block (key 4096, ID 143917078) is still in use
Line 7 of postmaster.pid:
> cat postmaster.pid |head -7|tail -1
4096 143917078
Virtual memory smaps:
cat /proc/`head -1 $PGDATA/postmaster.pid`/smaps | grep -E "\-s"
2ad2b5189000-2ad2b518a000 rw-s 00000000 00:04 143917078 /SYSV00001000 (deleted)
Inspect (and clean up) via the SysV shmid:
ipcs -m -i 143917078 # to clean up: ipcrm -m shmid
Shared memory Segment shmid=143917078
uid=6001 gid=6001 cuid=6001 cgid=6001
mode=0600 access_perms=0600
bytes=56 lpid=242712 cpid=242712 nattch=10
att_time=Thu Mar 5 16:14:51 2026
det_time=Thu Mar 5 16:14:49 2026
change_time=Thu Mar 5 16:14:34 2026
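For scripted health checks, nattch is the field that matters; a hedged parsing sketch over ipcs output (parse_nattch is a hypothetical helper, and ipcs formatting can vary across util-linux versions):

```python
import re

def parse_nattch(ipcs_output: str):
    """Extract nattch from `ipcs -m -i <shmid>` output; None if not found."""
    m = re.search(r"nattch=(\d+)", ipcs_output)
    return int(m.group(1)) if m else None

sample = (
    "Shared memory Segment shmid=143917078\n"
    "uid=6001 gid=6001 cuid=6001 cgid=6001\n"
    "mode=0600 access_perms=0600\n"
    "bytes=56 lpid=242712 cpid=242712 nattch=10\n"
)
```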
Tests
Reproducing the production incident
Keep one backend process from ever exiting, then kill -9 the PM:
> cat postmaster.pid
4096 143917076
> ipcs -m -i 143917076 #shmem id
Shared memory Segment shmid=143917076
uid=6001 gid=6001 cuid=6001 cgid=6001
mode=0600 access_perms=0600
bytes=56 lpid=241567 cpid=64757 nattch=23
> kill -stop 107648 # stop any one backend
> kill -9 64757 # the postmaster
> ipcs -m -i 143917076
Shared memory Segment shmid=143917076
uid=6001 gid=6001 cuid=6001 cgid=6001
mode=0600 access_perms=0600
bytes=56 lpid=252283 cpid=64757 nattch=1 #nattch != 0
> pg_ctl start -D $PGDATA
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2026-03-05 16:02:19 CST::@:[262388]: FATAL: pre-existing shared memory block (key 4096, ID 143917076) is still in use
2026-03-05 16:02:19 CST::@:[262388]: HINT: Terminate any old server processes associated with data directory "/data".
stopped waiting
pg_ctl: could not start server
With nattch=1, the instance cannot start.
A crashed instance starts normally
This is simply killing the instance and starting it again:
> cat postmaster.pid
4096 143917077
> ipcs -m -i 143917077 #shmem id
Shared memory Segment shmid=143917077
uid=6001 gid=6001 cuid=6001 cgid=6001
mode=0600 access_perms=0600
bytes=56 lpid=154800 cpid=134329 nattch=18
> kill -9 134329 # the postmaster
> cat postmaster.pid
4096 143917077
> ipcs -m -i 143917077 # shmid unchanged; the shmem still exists
Shared memory Segment shmid=143917077
uid=6001 gid=6001 cuid=6001 cgid=6001
mode=0600 access_perms=0600
bytes=56 lpid=169360 cpid=134329 nattch=0 #nattch=0
> pg_ctl start -D $PGDATA # startup succeeds
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2026-03-05 16:14:34 CST::@:[242712]: LOG: redirecting log output to logging collector process
2026-03-05 16:14:34 CST::@:[242712]: HINT: Future log output will appear in directory "/data/pg_log".
done
server started
> ipcs -m -i 143917077 # the stale shmem was cleaned up at startup
ipcs: id 143917077 not found
> ipcs -m -i 143917078 # the shmid was incremented at startup
Shared memory Segment shmid=143917078
uid=6001 gid=6001 cuid=6001 cgid=6001
mode=0600 access_perms=0600
bytes=56 lpid=273571 cpid=242712 nattch=26
> cat postmaster.pid # shmkey unchanged, shmid incremented
4096 143917078
A plain kill -9 followed by a start succeeds, and the stale shmem is cleaned up during startup. The shmkey stays the same because inode=4096 and key 4096 was free; the shmid+1 is Linux kernel behavior, which at least shows a different shmem segment is in use.
Holding a file but not the shmem
Startup keys off the datadir inode, and the inode maps to the shmem id: what startup really checks is whether the shmem is still held by another process, not whether any file fd is. So this test uses the logger process, which holds file fds but no shared memory.
$ cat /proc/77300/smaps | grep -E "\-s" # the logger process; confirm it uses no shared memory
$ kill -stop 77300 # stop the logger
$ kill -9 77076 # kill -9 the PM
$ cat postmaster.pid # the file is still there
77076
/lzlcloud/pg8531/data
1772700343
8531
/tmp
0.0.0.0
4096 143917080
ready
$ ipcs -m -i 143917080 # the shared memory is still there
Shared memory Segment shmid=143917080
uid=6001 gid=6001 cuid=6001 cgid=6001
mode=0600 access_perms=0600
bytes=56 lpid=77319 cpid=77076 nattch=0
att_time=Thu Mar 5 17:27:11 2026
det_time=Thu Mar 5 17:27:15 2026
change_time=Thu Mar 5 16:45:43 2026
$ ps -ef|grep 77300 # the process is still there
postgres 77300 1 0 16:45 ? 00:00:00 postgresql: lzldb: logger
postgres 135246 46622 0 17:27 pts/1 00:00:00 grep --color=auto 77300
$ pg_ctl start -D $PGDATA # startup succeeds
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2026-03-05 17:27:55 CST::@:[140497]: LOG: redirecting log output to logging collector process
2026-03-05 17:27:55 CST::@:[140497]: HINT: Future log output will appear in directory "/data/pg_log".
done
server started
The logger holds files under the data directory but is not attached to the shmem, so it does not block startup.
Startup still fails after deleting postmaster.pid
The flow is much the same as above: keep one backend, kill -9 the PM, delete the postmaster.pid file, then start.
Omitting the session; the startup fails with:
waiting for server to start....2026-03-06 15:29:48 CST::@:[22475]: FATAL: pre-existing shared memory block (key 4098, ID 171868173) is still in use
2026-03-06 15:29:48 CST::@:[22475]: HINT: Terminate any old server processes associated with data directory "/data".
2026-03-06 15:29:48 CST::@:[22475]: LOG: database system is shut down
As this shows, while a zombie process holds the shmem, PG can still find the shmid even after the postmaster.pid file containing it has been deleted.
Stopping another instance to start the current one
PG examines two places to decide whether a shmid belongs to the current instance:
- the shmid reached from the datadir inode used as the shmkey (with shmkey++ probing), and
- the shmid recorded in postmaster.pid
So even with postmaster.pid deleted outright, PG still knows whether the shmem is held by another process. But the datadir-inode and shmkey++ behavior can be exploited to let the instance start.
Per the earlier analysis, every datadir in our cloud has inode 4096, and the shmkeys differ only because of the shmkey++ logic in the source. So: start or stop another PG instance whose datadir inode is 4096, shifting the current instance's shmkey++ count up or down at startup so it lands on a different shmid.
$ kill -stop 165245 # keep one backend of the current instance alive
$ kill -9 164411 # kill the current instance's postmaster
$ pg_ctl stop -D /pg8531/data # stop a different instance
waiting for server to shut down.... done
server stopped
$ pg_ctl start -D /pg8574/data # start the current instance: fails, because postmaster.pid was not deleted
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2026-03-05 18:22:35 CST::@:[196209]: FATAL: pre-existing shared memory block (key 4097, ID 143917087) is still in use
2026-03-05 18:22:35 CST::@:[196209]: HINT: Terminate any old server processes associated with data directory "/pg8574/data".
stopped waiting
pg_ctl: could not start server
Examine the log output.
$ mv /lzlcloud/pg8574/data/postmaster.pid{,.bak} # move aside the current instance's postmaster.pid
$ pg_ctl start -D /lzlcloud/pg8574/data # start the current instance again: succeeds
2026-03-05 18:23:09 CST::@:[207725]: LOG: redirecting log output to logging collector process
2026-03-05 18:23:09 CST::@:[207725]: HINT: Future log output will appear in directory "/lzlcloud/pg8574/data/pg_log".
done
server started
$ ipcs -m -i 143917087 # the old SysV segment is still held by our surviving backend
Shared memory Segment shmid=143917087
uid=6001 gid=6001 cuid=6001 cgid=6001
mode=0600 access_perms=0600
bytes=56 lpid=196209 cpid=164411 nattch=1
att_time=Thu Mar 5 18:22:35 2026
det_time=Thu Mar 5 18:22:35 2026
change_time=Thu Mar 5 18:21:04 2026
It starts: the current instance allocated a different shmem segment, and the old one was never cleaned up. That is the dirty trick of stopping another instance to start the current one in our cloud environment.
One small precondition: the other instance must not only share the current instance's inode, its shmkey must also be smaller than the current instance's shmkey.
Error analysis: lock file "postmaster.pid" already exists
This one is much simpler than the shared memory error.
Startup itself checks the lock file and the pid recorded in it, in CreateLockFile():
if (other_pid != my_pid && other_pid != my_p_pid &&
other_pid != my_gp_pid)
{
if (kill(other_pid, 0) == 0 ||
(errno != ESRCH && errno != EPERM))
{
/* lockfile belongs to a live process */
ereport(FATAL,
(errcode(ERRCODE_LOCK_FILE_EXISTS),
errmsg("lock file \"%s\" already exists",
filename),
isDDLock ?
(encoded_pid < 0 ?
errhint("Is another postgres (PID %d) running in data directory \"%s\"?",
(int) other_pid, refName) :
errhint("Is another postmaster (PID %d) running in data directory \"%s\"?",
(int) other_pid, refName)) :
(encoded_pid < 0 ?
errhint("Is another postgres (PID %d) using socket file \"%s\"?",
(int) other_pid, refName) :
errhint("Is another postmaster (PID %d) using socket file \"%s\"?",
(int) other_pid, refName))));
}
}
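The kill(other_pid, 0) call sends no signal at all; it only probes whether the pid exists (ESRCH means gone, EPERM means alive but owned by someone else). A Python equivalent of that liveness check:

```python
import os

def pid_alive(pid: int) -> bool:
    """Mirror the kill(other_pid, 0) probe in CreateLockFile()."""
    try:
        os.kill(pid, 0)           # signal 0: existence check only
    except ProcessLookupError:    # ESRCH: no such process
        return False
    except PermissionError:       # EPERM: process exists, not ours
        return True
    return True
```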
The test is even simpler: start the instance again while it is already starting:
$ pg_ctl start -D /pg8531/data
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2026-03-06 15:59:05 CST::@:[89145]: FATAL: lock file "postmaster.pid" already exists
2026-03-06 15:59:05 CST::@:[89145]: HINT: Is another postmaster (PID 255500) running in data directory "/pg8531/data"?
stopped waiting
pg_ctl: could not start server
Examine the log output.
So the later errors in the incident's start.log simply mean the instance was already up and further start attempts were made.
Summary
At startup, PG first creates a SysV shmem segment (distinct from the mmap segment backing shared_buffers) to lock the datadir. It uses the datadir's inode number as the shmkey in shmget(), which returns the segment's unique identifier, the shmid. Because the requested segment may already be in use by another process, PG loops with shmkey++ until it obtains a free segment. Line 7 of postmaster.pid stores the shmkey and shmid. In cloud environments you often see incrementing shmkeys across PG instances: identically mounted data disks share the same inode number, and shmkey++ does the rest.
If a PG instance is killed unexpectedly, the shmem is not cleaned up. Normally no zombie process still holds it, so the next startup removes the stale segment and proceeds. If a zombie process does hold it, startup fails and manual intervention is needed.
Recommended fixes:
- ipcrm -m (most recommended)
- find the zombie process with lsof and kill it
- reboot the host
Not recommended, but they do get the instance started:
- mv postmaster.pid + stop another PG instance (whose shmkey < the current instance's)
- mv postmaster.pid + remount the data disk so the inode changes
Finally, back to the opening questions:
- Why is this scenario fairly rare in practice?
It requires an abnormal crash plus a zombie process that was never cleaned up. Often a crash leaves no zombies, and the instance simply restarts.
- start.log shows two kinds of startup errors; which operations and logic does each correspond to?
The shared-memory-in-use error means the instance crashed and a zombie process remains; the postmaster.pid error means startup was attempted multiple times.
- Can the shared memory still exist when the PM is already gone?
Yes: the PM can be gone while other PG processes survive, since they do not necessarily die on their own or get reaped by the OS. If all processes are gone, the segment should not remain in use.
- How can this shared memory segment be located and cleaned up?
The shmid appears in the startup log (start.log); ipcrm -m $shmid removes it.
- PG uses several shared memory segments; which one is involved here?
The SysV shmem that protects the datadir; it always exists. See the "Three kinds of shared memory" section. It is distinct from the mmap-backed shared_buffers.
- Can the shmem be found via an inode or a file?
Linux offers no userspace interface for mapping an inode or file to a shmem segment (caveat: this claim is AI-derived, cross-checked with several models). PG uses the datadir's inode as a seed shmkey when requesting the shmem; it is not a direct inode-to-shmem lookup but PG's own search mechanism, and the mapping is not absolute: shmkey++ is precisely the compromise in the startup logic.




