暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PostgreSQL 小改动,解决流复制遇到的pg_xlog已删除的问题(主库wal sender读归档目录文件发送给wal sender)

digoal 2016-03-10
1082

作者

digoal

日期

2016-03-10

标签

PostgreSQL , wal sender , wal receiver , wal sender支持归档读取


背景

PostgreSQL 通过流复制搭建STANDBY,在某些特定的场景可能会遭遇standby需要的xlog文件已经从master删除的情况。

例如

1. 备库写入XLOG的速度比主库产生XLOG的速度慢。

2. 网络传输速度比产生XLOG的速度慢。

3. 备库故障,中断时间过长。

4. 网络故障,中断时间过长。

目前的解决办法

1. 归档

主库对xlog文件进行归档,当备库需要的文件已经不在pg_xlog目录中时,备库可以通过配置restore_command,从主库的归档来获取需要的xlog文件。这种方法比较通用。

缺点

备库需要有获取归档的通道,例如能访问归档存储,或者能通过ftp, http服务获取到归档文件。如果归档是在磁带库的,要有从磁带库获取归档的路径。

2. slot

slot会记录下备库已经同步到XLOG的什么位置的信息,所以主库不会删除备库还需要的xlog文件。

缺点

因为主库不能删除未发送的XLOG,那么如果备库落后的太多,可能导致主库的空间爆满。

3. wal keep segments

设置主库最多保留的XLOG文件个数,不能彻底解决问题。

另类的解决办法,其实通过归档来解决是比较好的办法,但是有一个空缺,就是备库可能无法访问归档存储或归档服务。

这种情况下,有什么好的方法来解决呢?

首先我们要了解一个问题,主库一定是能访问归档的,或者说概率非常大。

我们以主库能访问归档为前提,当备库需要的XLOG文件已经被删除时,主库主动去归档获取归档文件,并通过流复制发送给备库。

已有的代码:

当BasicOpenFile失败时,表示$PGDATA/pg_xlog中不存在备库请求的xlog信息,返回错误信息requested WAL segment %s has already been removed"。

src/backend/replication/walsender.c

```
static void
XLogRead(char *buf, XLogRecPtr startptr, Size count)
{
....
XLogFilePath(path, curFileTimeLine, sendSegNo);

                    sendFile = BasicOpenFile(path, O_RDONLY | PG_BINARY, 0);  
                    if (sendFile < 0)  
                    {  
                            /*  
                             * If the file is not found, assume it's because the standby  
                             * asked for a too old WAL segment that has already been  
                             * removed or recycled.  
                             */  
                            if (errno == ENOENT)  
                                    ereport(ERROR,  
                                                    (errcode_for_file_access(),  
                                                     errmsg("requested WAL segment %s has already been removed",  
                                                            XLogFileNameP(curFileTimeLine, sendSegNo))));  
                            else  
                                    ereport(ERROR,  
                                                    (errcode_for_file_access(),  
                                                     errmsg("could not open file \"%s\": %m",  
                                                                    path)));  
                    }  
                    sendOff = 0;

```

我们假设归档目录是一个分布式文件系统例如lustre, moosefs或者其他。

归档命令是

archive_command = 'test ! -f /home/digoal/arch/%f && cp %p /home/digoal/arch/%f'

我们需要新增一个参数, 这个参数是主库的参数。

xlog_search_dir

通过它告诉主库归档目录在哪里。本文的归档目录是/home/digoal/arch

代码如下:

src/include/replication/walsender.h

extern PGDLLIMPORT char *xlog_search_dir;

src/backend/utils/misc/guc.c

```
static struct config_string ConfigureNamesString[] =
{
{
{"archive_command", PGC_SIGHUP, WAL_ARCHIVING,
gettext_noop("Sets the shell command that will be called to archive a WAL file."),
NULL
},
&XLogArchiveCommand,
"",
NULL, NULL, show_archive_command
},

    {  
            {"xlog_search_dir", PGC_SIGHUP, WAL_ARCHIVING,  
                    gettext_noop("Sets the dir by walsender search XLOG from archive dir, when xlog cleaned in pg_xlog file which walreceiver requesting."),  
                    NULL  
            },  
            &xlog_search_dir,  
            "",  
            NULL, NULL, NULL  
    },

```

另外还需要调整walsender.c的代码:

src/include/replication/walsender.h

extern PGDLLIMPORT char *xlog_search_dir;

src/backend/replication/walsender.c

```
char * xlog_search_dir = NULL;

static void
XLogRead(char *buf, XLogRecPtr startptr, Size count)
{
...
if (sendFile < 0 || !XLByteInSeg(recptr, sendSegNo))
{
char path[MAXPGPATH];
...
sendFile = BasicOpenFile(path, O_RDONLY | PG_BINARY, 0);
if (sendFile < 0)
{
if (xlog_search_dir != NULL && xlog_search_dir[0] != '\0')
{
snprintf(path, MAXPGPATH, "%s/%08X%08X%08X", xlog_search_dir, curFileTimeLine,
(uint32) ((sendSegNo) / XLogSegmentsPerXLogId),
(uint32) ((sendSegNo) % XLogSegmentsPerXLogId));

                                    sendFile = BasicOpenFile(path, O_RDONLY | PG_BINARY, 0);  
                                    if (sendFile <0)  
                                     {  
                                                    /*  
                                                     * If the file is not found, assume it's because the standby  
                                                    * asked for a too old WAL segment that has already been  
                                                    * removed or recycled.  
                                                    */  
                                            if (errno == ENOENT)  
                                                    ereport(ERROR,  
                                                            (errcode_for_file_access(),  
                                                             errmsg("requested WAL segment %s has already been removed",  
                                                                    XLogFileNameP(curFileTimeLine, sendSegNo))));  
                                            else  
                                                    ereport(ERROR,  
                                                                    (errcode_for_file_access(),  
                                                                     errmsg("could not open file \"%s\": %m",  
                                                                            path)));  
                                    }  
                             }  
                             else  
                             {  
                                                    /*  
                                                     * If the file is not found, assume it's because the standby  
                                                    * asked for a too old WAL segment that has already been  
                                                    * removed or recycled.  
                                                    */  
                                            if (errno == ENOENT)  
                                                    ereport(ERROR,  
                                                            (errcode_for_file_access(),  
                                                             errmsg("requested WAL segment %s has already been removed",  
                                                                    XLogFileNameP(curFileTimeLine, sendSegNo))));  
                                            else  
                                                    ereport(ERROR,  
                                                                    (errcode_for_file_access(),  
                                                                     errmsg("could not open file \"%s\": %m",  
                                                                            path)));  
                             }  
                    }  
                    sendOff = 0;  
            }

...
```

重新编译,并在postgresql.conf中配置

xlog_search_dir='/home/digoal/arch'

现在,把备库需要的已归档的XLOG从主库的pg_xlog目录删除,也不会报requested WAL segment %s has already been removed"了。

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

文章转载自digoal,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论