背景
今天在巡检PostgreSQL数据库日志时,发现某个节点日志信息出现了,"ERROR: could not access status of transaction 228942"的错误信息。分析了一下错误信息,定位为clog(xact)事物状态文件损坏,这种情况是没有真正的办法修复该问题的,只能尝试尽可能的抢救数据,不过还好这套系统是一套非核心系统,并且还在备节点发生该错误,直接将备节点重做了。
事后思考2个问题:
1.如何最大程度的规避该问题?
2.如何在没有备份的情况下,快速恢复现场?并抢救尽可能多的抢救数据。
概念
/** Possible transaction statuses --- note that all-zeroes is the initial* state.** A "subcommitted" transaction is a committed subtransaction whose parent* hasn't committed or aborted yet.*/typedef int XidStatus;#define TRANSACTION_STATUS_IN_PROGRESS 0x00#define TRANSACTION_STATUS_COMMITTED 0x01#define TRANSACTION_STATUS_ABORTED 0x02#define TRANSACTION_STATUS_SUB_COMMITTED 0x03
实验
模拟下clog损坏使用dd命令伪造事物状态,将数据导出后重建实例,并导回数据。
手工模拟clog损坏postgres@bogon-> cd opt/pg_root/pg_xactpostgres@bogon-> lltotal 12K-rw------- 1 postgres postgres 8.1K Jul 19 04:57 0000postgres@bogon-> cat''>0000postgres@bogon-> lltotal 0-rw------- 1 postgres postgres 0 Jul 19 04:57 0000启动DB,报错无法启动DBpostgres@bogon-> pg_ctl startwaiting for server to start....2021-07-19 05:01:51.774 EDT [56728] LOG: starting PostgreSQL 12.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit2021-07-19 05:01:51.774 EDT [56728] LOG: listening on IPv4 address "0.0.0.0", port 54322021-07-19 05:01:51.775 EDT [56728] LOG: listening on Unix socket "./.s.PGSQL.5432"...2021-07-19 05:01:54.849 EDT [56728] LOG: redirecting log output to logging collector process2021-07-19 05:01:54.849 EDT [56728] HINT: Future log output will appear in directory "pg_log".stopped waitingpg_ctl: could not start serverExamine the log output.检查下日志信息:postgres@bogon-> cat postgresql-2021-07-19_050154.log2021-07-19 05:01:54.849 EDT [56728] LOG: ending log output to stderr2021-07-19 05:01:54.849 EDT [56728] HINT: Future log output will go to log destination "csvlog".postgres@bogon-> cat postgresql-2021-07-19_050154.csv2021-07-19 05:01:54.849 EDT,,,56728,,60f53f7f.dd98,1,,2021-07-19 05:01:51 EDT,,0,LOG,00000,"ending log output to stderr",,"Future log output will go to log destination ""csvlog"".",,,,,,,""2021-07-19 05:01:54.853 EDT,,,56733,,60f53f82.dd9d,1,,2021-07-19 05:01:54 EDT,,0,LOG,00000,"database system was shut down at 2021-07-19 04:56:09 EDT",,,,,,,,,""2021-07-19 05:01:54.855 EDT,,,56733,,60f53f82.dd9d,2,,2021-07-19 05:01:54 EDT,,0,FATAL,XX000,"could not access status of transaction 1217","Could not read from file ""pg_xact/0000"" at offset 0: read too few bytes.",,,,,,,,""2021-07-19 05:01:54.855 EDT,,,56728,,60f53f7f.dd98,2,,2021-07-19 05:01:51 EDT,,0,LOG,00000,"startup process (PID 56733) exited with exit code 1",,,,,,,,,""2021-07-19 05:01:54.855 EDT,,,56728,,60f53f7f.dd98,3,,2021-07-19 05:01:51 EDT,,0,LOG,00000,"aborting startup due to startup process failure",,,,,,,,,""2021-07-19 05:01:54.879 EDT,,,56728,,60f53f7f.dd98,4,,2021-07-19 05:01:51 EDT,,0,LOG,00000,"database system is shut down",,,,,,,,,""错误信息FATAL,XX000,"could not access status of transaction 1217","Could not read from file ""pg_xact/0000"" at offset 0: read too few bytes.",,,,,,,,""无法访问事物1217的状态尝试使用dd工具修复postgres@bogon-> dd if=/dev/zero of=/opt/pg_root/pg_xact/0000 bs=256K count=11+0 records in1+0 records out262144 bytes (262 kB) copied, 0.000680516 s, 385 MB/spostgres@bogon-> lltotal 256K-rw------- 1 postgres postgres 256K Jul 19 05:12 0000启动数据库postgres@bogon-> pg_ctl startwaiting for server to start....2021-07-19 05:13:13.879 EDT [57590] LOG: starting PostgreSQL 12.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit2021-07-19 05:13:13.879 EDT [57590] LOG: listening on IPv4 address "0.0.0.0", port 54322021-07-19 05:13:13.881 EDT [57590] LOG: listening on Unix socket "./.s.PGSQL.5432"...2021-07-19 05:13:17.483 EDT [57590] LOG: redirecting log output to logging collector process2021-07-19 05:13:17.483 EDT [57590] HINT: Future log output will appear in directory "pg_log".doneserver started数据库已经启动检查数据库日志2021-07-19 05:13:17.483 EDT,,,57590,,60f54229.e0f6,1,,2021-07-19 05:13:13 EDT,,0,LOG,00000,"ending log output to stderr",,"Future log output will go to log destination ""csvlog"".",,,,,,,""2021-07-19 05:13:17.495 EDT,,,57595,,60f5422d.e0fb,1,,2021-07-19 05:13:17 EDT,,0,LOG,00000,"database system was shut down at 2021-07-19 04:56:09 EDT",,,,,,,,,""2021-07-19 05:13:17.536 EDT,,,57590,,60f54229.e0f6,2,,2021-07-19 05:13:13 EDT,,0,LOG,00000,"database system is ready to accept connections",,,,,,,,,""
clog伪造完成,需要对数据库做一个全量备份,并重新创建数据库实例。
参考
《精通PostgreSQL11》
文章转载自CP的PostgreSQL厨房,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




