

yidongyun 2024-11-26

# He3DB (Haishan Database) Source Code Walkthrough: SyncRepWaitForLSN in Primary-Standby Replication

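Background

He3DB pairs its storage engine with query optimization to handle large data volumes and complex queries, and targets both OLTP (online transaction processing) and OLAP (online analytical processing) workloads. It provides backup and recovery mechanisms for restoring data after a system failure or data corruption, supports both horizontal and vertical scaling, and offers access control and data encryption.

This article walks through the source code of He3DB's primary-standby replication module.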

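Streaming replication: SyncRepWaitForLSN

SyncRepWaitForLSN is the function a backend calls under synchronous replication to wait until a given write-ahead log (WAL) position, identified by its LSN, has been acknowledged.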

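  1. Preliminary checks and setup
    - Assert that interrupts are held off (InterruptHoldoffCount > 0). The function is called during transaction commit, and holding interrupts keeps the later shared-memory queue cleanup from being disturbed by external interrupts.
    - Fast exit: if the user has not requested synchronous replication (!SyncRepRequested()) or no synchronous standby names are defined (!((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined), return immediately.
    - Choose the wait mode from the caller type: a commit uses SyncRepWaitMode as configured, while anything other than a commit is capped at SYNC_REP_WAIT_FLUSH.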
```c
void
SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
{
    char       *new_status = NULL;
    const char *old_status;
    int         mode;

    Assert(InterruptHoldoffCount > 0);

    if (!SyncRepRequested() ||
        !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
        return;

    /* Cap the level for anything other than commit to remote flush only. */
    if (commit)
        mode = SyncRepWaitMode;
    else
        mode = Min(SyncRepWaitMode, SYNC_REP_WAIT_FLUSH);

    Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));
    Assert(WalSndCtl != NULL);
```
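The mode selection above can be isolated into a small standalone sketch. The enum and the helper `demo_effective_wait_mode` are hypothetical names introduced here for illustration; the numeric ordering (write < flush < apply) is assumed to mirror the SYNC_REP_WAIT_* constants, which is what makes the `Min()` call act as a cap.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the SYNC_REP_WAIT_* levels, weakest first. */
enum { DEMO_WAIT_WRITE = 0, DEMO_WAIT_FLUSH = 1, DEMO_WAIT_APPLY = 2 };

/*
 * Hypothetical helper: choose the wait level the same way the
 * if/else in SyncRepWaitForLSN does.
 */
static int
demo_effective_wait_mode(int configured_mode, bool commit)
{
    assert(configured_mode >= DEMO_WAIT_WRITE &&
           configured_mode <= DEMO_WAIT_APPLY);

    if (commit)
        return configured_mode;

    /* Non-commit callers never wait for more than a remote flush. */
    return configured_mode < DEMO_WAIT_FLUSH ? configured_mode
                                             : DEMO_WAIT_FLUSH;
}
```

With the configured level set to apply, a commit waits for apply, while a non-commit caller waits only for flush. A configured level below flush is left unchanged in both cases.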
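  2. Acquire the synchronous replication lock and re-check
    - Acquire SyncRepLock in exclusive mode (LWLockAcquire(SyncRepLock, LW_EXCLUSIVE)).
    - Assert that the current process is not already in a waiting state.
    - Re-check whether a wait is needed, this time under the lock (the fast-exit check in step 1 read the flag without it): if WalSndCtl->sync_standbys_defined is false, or the given LSN has already been acknowledged for this mode (lsn <= WalSndCtl->lsn[mode]), release the lock and return.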
```c
    /* Acquire the synchronous replication lock */
    LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
    /* The current process must not already be waiting */
    Assert(MyProc->syncRepState == SYNC_REP_NOT_WAITING);

    if (!WalSndCtl->sync_standbys_defined ||
        lsn <= WalSndCtl->lsn[mode])
    {
        LWLockRelease(SyncRepLock);
        return;
    }
```
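  3. Set the waiting state and join the queue
    - Record the LSN to wait for (MyProc->waitLSN = lsn) and mark the process as waiting (MyProc->syncRepState = SYNC_REP_WAITING).
    - Insert the current process into the synchronous replication queue for this mode (SyncRepQueueInsert(mode)) and assert that the queue is still ordered by LSN.
    - Release the synchronous replication lock.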
```c
    MyProc->waitLSN = lsn;
    MyProc->syncRepState = SYNC_REP_WAITING;
    SyncRepQueueInsert(mode);
    Assert(SyncRepQueueIsOrderedByLSN(mode));
    LWLockRelease(SyncRepLock);
```
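The ordering invariant checked by the assertion above can be illustrated with a much simpler structure. This is a sketch on a fixed-size array, not the shared-memory linked list the real queue uses; `DemoQueue`, `demo_queue_insert`, and `demo_queue_is_ordered` are hypothetical names. Scanning from the tail is an assumption that suits this workload, since new waiters usually carry the largest LSN and land at the end after very few comparisons.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_QUEUE_CAP 8

typedef struct
{
    uint64_t lsn[DEMO_QUEUE_CAP];   /* waiters' LSNs, ascending */
    size_t   len;
} DemoQueue;

/* Insert lsn keeping ascending order; returns 0 on success, -1 if full. */
static int
demo_queue_insert(DemoQueue *q, uint64_t lsn)
{
    size_t i;

    assert(q != NULL);
    if (q->len == DEMO_QUEUE_CAP)
        return -1;

    /* Walk back from the tail, shifting entries that are >= lsn. */
    i = q->len;
    while (i > 0 && q->lsn[i - 1] >= lsn)
    {
        q->lsn[i] = q->lsn[i - 1];
        i--;
    }
    q->lsn[i] = lsn;
    q->len++;
    return 0;
}

/* Returns 1 when the entries are in ascending order, 0 otherwise. */
static int
demo_queue_is_ordered(const DemoQueue *q)
{
    size_t i;

    for (i = 1; i < q->len; i++)
        if (q->lsn[i - 1] > q->lsn[i])
            return 0;
    return 1;
}
```

Keeping the queue sorted means whoever releases waiters can start at the head and stop at the first entry whose LSN has not yet been acknowledged.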
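  4. Update the process title (optional)
    - If update_process_title is enabled, append " waiting for <LSN>" to the current process title so that the wait is visible in the process list. The buffer is then truncated back to the original title, so the same buffer can be used to restore the title once the wait ends.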
```c
    if (update_process_title)
    {
        int         len;

        old_status = get_ps_display(&len);
        new_status = (char *) palloc(len + 32 + 1);
        memcpy(new_status, old_status, len);
        sprintf(new_status + len, " waiting for %X/%X",
                LSN_FORMAT_ARGS(lsn));
        set_ps_display(new_status);
        new_status[len] = '\0'; /* truncate off " waiting ..." */
    }
```
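The append-then-truncate trick in the block above is easy to miss, so here is a standalone sketch of it using plain malloc and snprintf instead of palloc and the ps-display routines. `demo_waiting_title` is a hypothetical helper; the LSN is printed as its high and low 32-bit halves in hexadecimal, which is how the `%X/%X` format is assumed to be fed by LSN_FORMAT_ARGS.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns a malloc'd buffer holding
 * "<old> waiting for X/X" and stores strlen(old) in *old_len.
 * Writing '\0' at buf[*old_len] later restores the original text.
 * The caller must free() the buffer. Returns NULL if malloc fails.
 */
static char *
demo_waiting_title(const char *old, uint64_t lsn, size_t *old_len)
{
    size_t len;
    char  *buf;

    assert(old != NULL && old_len != NULL);

    len = strlen(old);
    buf = malloc(len + 32 + 1);     /* same headroom as the original code */
    if (buf == NULL)
        return NULL;

    memcpy(buf, old, len);
    /* The suffix is at most 30 characters, so 32 + 1 bytes always suffice. */
    snprintf(buf + len, 32 + 1, " waiting for %X/%X",
             (unsigned int) (uint32_t) (lsn >> 32),
             (unsigned int) (uint32_t) lsn);

    *old_len = len;
    return buf;
}
```

One allocation serves both purposes: it holds the decorated title while the process waits, and after truncation at the saved length it holds the original title again.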
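  5. Wait loop
    - Loop until the given LSN is acknowledged or the wait is cancelled. On each iteration:
    - Reset the latch (ResetLatch(MyLatch)). This must happen before the state is tested.
    - If the process's synchronous replication state is complete (MyProc->syncRepState == SYNC_REP_WAIT_COMPLETE), break out of the loop.
    - If a termination request is pending (ProcDiePending), emit a WARNING, shut off further output to the client, cancel the wait, and break. ProcDiePending is not reset, so the process terminates after the commit has been cleaned up.
    - If a query cancel is pending (QueryCancelPending), clear the flag, emit a WARNING, cancel the wait, and break.
    - Otherwise wait until the latch is set or the postmaster dies (WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1, WAIT_EVENT_SYNC_REP)). The timeout of -1 means no timeout: the call blocks until one of those two events occurs.
    - If the postmaster has died (rc & WL_POSTMASTER_DEATH), set ProcDiePending, shut off output, cancel the wait, and break.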
```c
    for (;;)
    {
        int         rc;

        /* Must reset the latch before testing state. */
        ResetLatch(MyLatch);

        /*
         * Acquiring the lock is not needed, the latch ensures proper
         * barriers. If it looks like we're done, we must really be done,
         * because once walsender changes the state to SYNC_REP_WAIT_COMPLETE,
         * it will never update it again, so we can't be seeing a stale value
         * in that case.
         */
        if (MyProc->syncRepState == SYNC_REP_WAIT_COMPLETE)
            break;

        /*
         * If a wait for synchronous replication is pending, we can neither
         * acknowledge the commit nor raise ERROR or FATAL. The latter would
         * lead the client to believe that the transaction aborted, which is
         * not true: it's already committed locally. The former is no good
         * either: the client has requested synchronous replication, and is
         * entitled to assume that an acknowledged commit is also replicated,
         * which might not be true. So in this case we issue a WARNING (which
         * some clients may be able to interpret) and shut off further output.
         * We do NOT reset ProcDiePending, so that the process will die after
         * the commit is cleaned up.
         */
        if (ProcDiePending)
        {
            ereport(WARNING,
                    (errcode(ERRCODE_ADMIN_SHUTDOWN),
                     errmsg("canceling the wait for synchronous replication and terminating connection due to administrator command"),
                     errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
            whereToSendOutput = DestNone;
            SyncRepCancelWait();
            break;
        }

        /*
         * It's unclear what to do if a query cancel interrupt arrives. We
         * can't actually abort at this point, but ignoring the interrupt
         * altogether is not helpful, so we just terminate the wait with a
         * suitable warning.
         */
        if (QueryCancelPending)
        {
            QueryCancelPending = false;
            ereport(WARNING,
                    (errmsg("canceling wait for synchronous replication due to user request"),
                     errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
            SyncRepCancelWait();
            break;
        }

        /*
         * Wait on latch. Any condition that should wake us up will set the
         * latch, so no need for timeout.
         */
        rc = WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1,
                       WAIT_EVENT_SYNC_REP);

        /*
         * If the postmaster dies, we'll probably never get an acknowledgment,
         * because all the wal sender processes will exit. So just bail out.
         */
        if (rc & WL_POSTMASTER_DEATH)
        {
            ProcDiePending = true;
            whereToSendOutput = DestNone;
            SyncRepCancelWait();
            break;
        }
    }
```
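The order in which the loop tests its conditions matters: completion wins over a pending termination, which wins over a pending cancel. The sketch below models one pass of those checks as a pure function, leaving out the latch and the actual blocking. `DemoFlags`, `DemoOutcome`, and `demo_check_once` are hypothetical names, not part of the source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum
{
    DEMO_KEEP_WAITING,      /* nothing pending: go block on the latch */
    DEMO_DONE,              /* standby acknowledged the LSN */
    DEMO_CANCEL_DIE,        /* termination requested: warn and stop waiting */
    DEMO_CANCEL_QUERY       /* query cancel requested: warn and stop waiting */
} DemoOutcome;

typedef struct
{
    bool wait_complete;     /* stand-in for syncRepState == SYNC_REP_WAIT_COMPLETE */
    bool die_pending;       /* stand-in for ProcDiePending */
    bool cancel_pending;    /* stand-in for QueryCancelPending */
} DemoFlags;

/* One pass over the checks, in the same order as the loop body. */
static DemoOutcome
demo_check_once(DemoFlags *f)
{
    assert(f != NULL);

    if (f->wait_complete)
        return DEMO_DONE;

    if (f->die_pending)
        return DEMO_CANCEL_DIE;     /* flag deliberately left set */

    if (f->cancel_pending)
    {
        f->cancel_pending = false;  /* the cancel request is consumed */
        return DEMO_CANCEL_QUERY;
    }

    return DEMO_KEEP_WAITING;
}
```

The asymmetry between the two cancel paths is the point of the sketch: the termination flag survives so that the process still exits later, while the query-cancel flag is cleared because the request has been handled by ending the wait.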
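  6. Clean up state
    - Once the wait has ended, the process resets its own state:
    - pg_read_barrier() issues a read memory barrier, so that the reads that follow (in particular the queue links inspected by the assertion) are not reordered ahead of the reads that ended the loop, and the process sees the queue links as they were updated when it was removed from the queue.
    - Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks))) checks that the syncRepLinks node of the current process (MyProc) is no longer linked into any queue. In assert-enabled builds a failure aborts the process, because the steps that follow assume the node has been detached.
    - Set the synchronous replication state back to not waiting (MyProc->syncRepState = SYNC_REP_NOT_WAITING) and reset the wait LSN to 0.
    - If the process title was changed (new_status is not NULL), restore the original title and free the buffer.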
```c
    pg_read_barrier();
    Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));
    MyProc->syncRepState = SYNC_REP_NOT_WAITING;
    MyProc->waitLSN = 0;

    if (new_status)
    {
        /* Reset ps display */
        set_ps_display(new_status);
        pfree(new_status);
    }
}
```


About the author

Zhou Yuhui (周雨慧), database kernel development engineer, China Mobile (Suzhou) Software Technology Co., Ltd.
