Data corruption in a PG
Reference: https://ceph.com/geen-categorie/ceph-manually-repair-object/
In most cases a damaged PG can simply be repaired manually with ceph pg repair {pgid}.
When a PG is in the inconsistent state, some of its objects differ across replicas. This may be caused by a failed OSD disk or by silent data corruption on the disk.
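After a scrub has flagged inconsistencies, the affected PGs in a pool can be located with the command below (the pool name is a placeholder):
$ rados list-inconsistent-pg {pool-name}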
Below, an example of PG data corruption is constructed by hand and then repaired.
# 1. Stop the OSD service
$ systemctl stop ceph-osd@{id}
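If the OSD will be down for a while, it may be worth preventing the cluster from rebalancing in the meantime; a small sketch, assuming osd.0 on node-1 as in this example:
$ ceph osd set noout            # keep down OSDs from being marked out
$ systemctl stop ceph-osd@0
# remember to run: ceph osd unset noout  once the OSD is back up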
# 2. Use ceph-objectstore-tool to mount /var/lib/ceph/osd/ceph-0 at /mnt/ceph-osd@0 via FUSE
[root@node-1 ceph-objectstore-tool-test]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op fuse --mountpoint /mnt/ceph-osd@0/
mounting fuse at /mnt/ceph-osd@0/ ...
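In the FUSE view, each PG on this OSD shows up as a directory such as 10.0_head, with its objects listed under all/; for orientation (paths as referenced in step 3):
$ ls /mnt/ceph-osd@0/ | grep '^10\.0'
$ ls /mnt/ceph-osd@0/10.0_head/all/ | head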
# 3. Delete one of the directories under /mnt/ceph-osd@0/10.0_head/all (each directory is an object in the PG), thereby corrupting an object of PG 10.0
[root@node-1 all]# rm -rf \#10\:01ec679f\:\:\:10000011eba.00000000\:head#/
rm: cannot remove '#10:01ec679f:::10000011eba.00000000:head#/bitwise_hash': Operation not permitted
rm: cannot remove '#10:01ec679f:::10000011eba.00000000:head#/omap': Operation not permitted
rm: cannot remove '#10:01ec679f:::10000011eba.00000000:head#/attr': Operation not permitted
# 4. Unmount /mnt/ceph-osd@0, restart the OSD service, and wait for the cluster to return to normal (commands sketched below)
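The commands for this step are not shown in the original transcript; roughly, assuming the mount point and OSD from steps 1-2 (and that noout was set earlier):
$ umount /mnt/ceph-osd@0/       # detach the FUSE mount
$ systemctl start ceph-osd@0
$ ceph osd unset noout          # only if it was set before stopping the OSD
$ ceph -s                       # wait for HEALTH_OK before moving on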
# 5. Manually scrub PG 10.0 with the command ceph pg scrub 10.0 and wait for the background scrub to finish
[root@node-1 ~]# ceph pg scrub 10.0
instructing pg 10.0 on osd.2 to scrub
# 6. The cluster now reports an error: PG 10.0 is in state active+clean+inconsistent
[root@node-1 ~]# ceph health detail
HEALTH_ERR 2 scrub errors; Possible data damage: 1 pg inconsistent
OSD_SCRUB_ERRORS 2 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
pg 10.0 is active+clean+inconsistent, acting [2,1,0]
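To see exactly which objects and shards the scrub flagged before repairing, the following may help (JSON output omitted here):
$ rados list-inconsistent-obj 10.0 --format=json-pretty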
# 7. Run the repair; the PG state becomes active+clean+scrubbing+deep+inconsistent+repair
[root@node-1 ~]# ceph pg repair 10.0
instructing pg 10.0 on osd.2 to repair
[root@node-1 ~]# ceph -s
cluster:
id: 60e065f1-d992-4d1a-8f4e-f74419674f7e
health: HEALTH_ERR
2 scrub errors
Possible data damage: 1 pg inconsistent
...
data:
pools: 9 pools, 464 pgs
objects: 53.99k objects, 802 MiB
usage: 16 GiB used, 24 GiB / 40 GiB avail
pgs: 463 active+clean
1 active+clean+scrubbing+deep+inconsistent+repair
# 8. Wait for the cluster to return to health
[root@node-1 ~]# ceph health detail
HEALTH_OK
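As an optional sanity check, the object removed in step 3 can be stat-ed again; a sketch, assuming pool id 10 from the transcripts (its name has to be looked up first):
$ ceph osd pool ls detail | grep 'pool 10 '        # find the name of pool id 10
$ rados -p {pool-name} stat 10000011eba.00000000   # should succeed and report the object size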
If ceph pg repair {pgid} cannot fix the PG, the whole PG can instead be imported with ceph-objectstore-tool.
Reference: https://www.jianshu.com/p/36c2d5682d87#:~:text=%E8%B5%B7%E5%A4%AF%E4%BD%8F%E3%80%82-,3.9%20Incomplete,-Peering%E8%BF%87%E7%A8%8B%E4%B8%AD
Constructing the failure
# Construct the failure: use ceph-objectstore-tool to delete the same object from two of the three replicas.
# Note: before using ceph-objectstore-tool, the corresponding OSD service must be stopped with systemctl stop ceph-osd@{id} (see the sketch below).
# PG 10.0 is chosen; the object 1000000d4dc.00000000 is deleted on both node-2 and node-3. The cluster is 3-replica and PG 10.0 is placed on node-1, node-2 and node-3.
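A sketch of stopping the relevant OSDs first, assuming osd.1 lives on node-2 and osd.2 on node-3 as the data paths below suggest:
$ systemctl stop ceph-osd@1     # on node-2
$ systemctl stop ceph-osd@2     # on node-3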
[root@node-2 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1/ --pgid 10.0 1000000d4dc.00000000 remove
remove #10:03f57502:::1000000d4dc.00000000:head#
[root@node-3 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2/ --pgid 10.0 1000000d4dc.00000000 remove
remove #10:03f57502:::1000000d4dc.00000000:head#
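Before the health output below will appear, the two OSDs have to be brought back up and the PG scrubbed so the damage is actually detected (they will need to be stopped again for the ceph-objectstore-tool steps that follow); roughly:
$ systemctl start ceph-osd@1    # on node-2
$ systemctl start ceph-osd@2    # on node-3
$ ceph pg deep-scrub 10.0       # then wait for the scrub to finish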
[root@node-1 ~]# ceph health detail
HEALTH_ERR 2 scrub errors; Possible data damage: 1 pg inconsistent
OSD_SCRUB_ERRORS 2 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
pg 10.0 is active+clean+inconsistent, acting [2,1,0]
Repairing with ceph-objectstore-tool
# Compare the data across replicas
# 1. Export the object list of the PG from each OSD and gather all the lists in ~/export on node-1 for easy comparison
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 10.0 --op list > ~/export/pg-10.0-osd0.txt
[root@node-1 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 10.0 --op list > ~/export/pg-10.0-osd0.txt
[root@node-2 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1/ --pgid 10.0 --op list > ~/pg-10.0-osd1.txt
[root@node-3 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2/ --pgid 10.0 --op list > ~/pg-10.0-osd2.txt
[root@node-1 export]# scp root@node-2:/root/pg-10.0-osd1.txt ./
pg-10.0-osd1.txt 100% 97KB 19.5MB/s 00:00
[root@node-1 export]# ls
pg-10.0-osd0.txt pg-10.0-osd1.txt
[root@node-1 export]# scp root@node-3:/root/pg-10.0-osd2.txt ./
pg-10.0-osd2.txt 100% 97KB 35.0MB/s 00:00
[root@node-1 export]# ls
pg-10.0-osd0.txt pg-10.0-osd1.txt pg-10.0-osd2.txt
# 2. Count the objects in each copy of the PG; the copy of PG 10.0 on node-1 has the most objects, 833
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 10.0 --op list | wc -l
[root@node-1 export]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 10.0 --op list | wc -l
833
[root@node-2 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1/ --pgid 10.0 --op list | wc -l
832
[root@node-3 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2/ --pgid 10.0 --op list | wc -l
832
# 3. Compare whether the object lists of all replicas are identical. In this example node-2 and node-3 match each other; node-1 differs from them but has the most objects (a comm sketch for this comparison follows the diff output below)
# In this example the PG replica on node-1 is used and imported into the OSDs on node-2 and node-3
# - Diff the object lists of every replica (primary and all secondaries) to make sure there is no inconsistency. Use the replica with the most objects, provided the diff shows that it contains every object.
# - If, after diffing, the counts differ and the largest replica does not contain all objects, consider a non-overwriting import followed by another export, so that the final import contains the complete object set. Note: import requires removing the PG first, so it is effectively an overwriting import.
# - If, after diffing, the data is consistent, use the replica with the most objects and import it into the PGs with fewer objects, then mark complete on all replicas. Always export a backup of the PG from every replica's OSD first, so the PG can be restored if anything goes wrong.
[root@node-1 export]# diff -u ./pg-10.0-osd0.txt ./pg-10.0-osd1.txt
[root@node-1 export]# diff -u ./pg-10.0-osd0.txt ./pg-10.0-osd2.txt
[root@node-1 export]# diff -u ./pg-10.0-osd2.txt ./pg-10.0-osd1.txt
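With larger lists, the objects present in only one replica can be extracted directly from the sorted list files; a minimal sketch using the files collected above:
$ comm -23 <(sort pg-10.0-osd0.txt) <(sort pg-10.0-osd1.txt)   # objects only in the osd.0 list
$ comm -13 <(sort pg-10.0-osd0.txt) <(sort pg-10.0-osd1.txt)   # objects only in the osd.1 list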
# 4. Export the PG from node-1 (the export file name is arbitrary) and copy the file to node-2 and node-3
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 10.0 --op export --file ~/export/pg-10.0.obj
[root@node-1 export]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --pgid 10.0 --op export --file ~/export/pg-10.0.obj
Read #10:03f57502:::1000000d4dc.00000000:head#
Read #10:03f6b1a4:::100000091a0.00000000:head#
Read #10:03f6dfc2:::10000010b31.00000000:head#
Read #10:03f913b2:::10000010740.00000000:head#
Read #10:03f99080:::10000010f0f.00000000:head#
Read #10:03fc19a4:::10000011c5e.00000000:head#
Read #10:03fe3b90:::10000010166.00000000:head#
Read #10:03fe60e1:::10000011c44.00000000:head#
........
Export successful
[root@node-1 export]# ls
pg-10.0.obj pg-10.0-osd0.txt pg-10.0-osd1.txt pg-10.0-osd2.txt
[root@node-1 export]# scp pg-10.0.obj root@node-2:/root/
pg-10.0.obj 100% 4025KB 14.7MB/s 00:00
[root@node-1 export]# scp pg-10.0.obj root@node-3:/root/
pg-10.0.obj
# Note: all subsequent steps are identical on node-2 and node-3; for brevity only node-2 is shown
# 5. Import the backed-up PG on node-2 and node-3
# Before importing, it is recommended to export the PG that is about to be replaced, so it can still be restored if problems show up later
# The import writes the given PG into the current OSD; the existing PG must be removed first (export a backup before removing it), otherwise the import fails with an "already exists" error
# 5.1 Back up
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1/ --pgid 10.0 --op export --file ~/pg-10.0-node-2.obj
[root@node-2 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1/ --pgid 10.0 --op export --file ~/pg-10.0-node-2.obj
Read #10:03f6b1a4:::100000091a0.00000000:head#
Read #10:03f6dfc2:::10000010b31.00000000:head#
Read #10:03f913b2:::10000010740.00000000:head#
Read #10:03f99080:::10000010f0f.00000000:head#
Read #10:03fc19a4:::10000011c5e.00000000:head#
Read #10:03fe3b90:::10000010166.00000000:head#
Read #10:03fe60e1:::10000011c44.00000000:head#
...
Export successful
# 5.2 Remove
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1/ --pgid 10.0 --op remove --force
[root@node-2 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1/ --type bluestore --pgid 10.0 --op remove --force
marking collection for removal
setting '_remove' omap key
finish_remove_pgs 10.0_head removing 10.0
Remove successful
# 5.3 Import
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1/ --type bluestore --pgid 10.0 --op import --file ~/pg-10.0.obj
[root@node-2 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1/ --type bluestore --pgid 10.0 --op import --file ~/pg-10.0.obj
Write #10:03f6dfc2:::10000010b31.00000000:head#
snapset 1=[]:{}
Write #10:03f913b2:::10000010740.00000000:head#
snapset 1=[]:{}
Write #10:03f99080:::10000010f0f.00000000:head#
snapset 1=[]:{}
....
write_pg epoch 6727 info 10.0( v 5925'23733 (5924'20700,5925'23733] local-lis/les=6726/6727 n=814 ec=5833/5833 lis/c 6726/6726 les/c/f 6727/6727/0 6726/6726/6724)
Import successful
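The transcript omits it, but before checking the cluster state the OSDs that were stopped for ceph-objectstore-tool have to be started again; roughly:
$ systemctl start ceph-osd@1    # on node-2
$ systemctl start ceph-osd@2    # on node-3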
# 6. Check: the PG no longer shows the inconsistent state and the cluster is gradually recovering
[root@node-3 ~]# ceph -s
cluster:
id: 60e065f1-d992-4d1a-8f4e-f74419674f7e
health: HEALTH_WARN
Degraded data redundancy: 37305/161973 objects degraded (23.032%), 153 pgs degraded, 327 pgs undersized
services:
mon: 3 daemons, quorum node-1,node-2,node-3 (age 23m)
mgr: node-1(active, since 23m)
mds: cephfs:2 {0=mds1=up:active,1=mds2=up:active} 1 up:standby
osd: 4 osds: 4 up (since 6s), 4 in (since 16h)
rgw: 1 daemon active (node-2)
task status:
data:
pools: 9 pools, 464 pgs
objects: 53.99k objects, 802 MiB
usage: 16 GiB used, 24 GiB / 40 GiB avail
pgs: 37305/161973 objects degraded (23.032%)
182 active+undersized
145 active+undersized+degraded
129 active+clean
8 active+recovering+degraded
io:
recovery: 0 B/s, 2 objects/s
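Once recovery finishes, the result can be double-checked; two commands that may help:
$ ceph health detail                     # should eventually return HEALTH_OK
$ ceph pg 10.0 query | grep '"state"'    # the PG should be back to active+clean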
Original article: https://blog.csdn.net/DeamonXiao/article/details/120879236




