
CRUSH Map Errors

Original post by Oracle, 2022-12-05

Another possible reason a PG cannot reach the clean state is an error in the cluster's CRUSH Map, which prevents the PG from being mapped to the right place.
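The CRUSH map can be inspected and dry-run offline. A minimal sketch (the filenames are arbitrary; the rule id and replica count must match your pool):

```shell
# Dump the compiled CRUSH map, decompile it, and inspect the rules
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
less crushmap.txt

# Dry-run the mappings: every input should map to a full-size OSD set;
# --show-bad-mappings prints inputs whose result set is empty or too small
crushtool -i crushmap.bin --test --show-bad-mappings \
    --rule 0 --num-rep 3 --min-x 0 --max-x 100
```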

2.2 PG faults caused by OSD down
The most common PG faults are caused by one or more crashed OSD processes. Restarting the OSDs usually restores health.

Check for down OSDs with ceph -s or ceph osd stat.

[root@node-1 ~]# ceph osd stat
4 osds: 4 up (since 4h), 4 in (since 6d); epoch: e6364

Try stopping one or more OSDs (a 3-replica cluster with 4 OSDs in total) and observe the cluster state.
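The OSDs in this experiment are stopped via systemd; assuming a standard systemd deployment, the unit name is ceph-osd@&lt;id&gt;:

```shell
# Stop osd.0 on node-1 (the ids match the outputs that follow)
systemctl stop ceph-osd@0
ceph -s          # watch the health state change
```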

# Stop 1 OSD, leaving 3. The active+undersized+degraded warning appears; the cluster can still serve reads and writes
[root@node-1 ~]# ceph health detail
HEALTH_WARN 1 osds down; Degraded data redundancy: 52306/162054 objects degraded (32.277%), 197 pgs degraded
OSD_DOWN 1 osds down
    osd.0 (root=default,host=node-1) is down
PG_DEGRADED Degraded data redundancy: 52306/162054 objects degraded (32.277%), 197 pgs degraded
    pg 1.1d is active+undersized+degraded, acting [2,1]
    pg 1.60 is active+undersized+degraded, acting [1,2]
    pg 1.62 is active+undersized+degraded, acting [2,1]
    ...
    
    
# Stop 2 OSDs, leaving 2. min_size=2 is still satisfied, so the cluster can still serve reads and writes
[root@node-1 ~]# ceph health detail
HEALTH_WARN 2 osds down; 1 host (2 osds) down; Degraded data redundancy: 54018/162054 objects degraded (33.333%), 208 pgs degraded, 441 pgs undersized
OSD_DOWN 2 osds down
    osd.0 (root=default,host=node-1) is down
    osd.3 (root=default,host=node-1) is down
OSD_HOST_DOWN 1 host (2 osds) down
    host node-1 (root=default) (2 osds) is down
PG_DEGRADED Degraded data redundancy: 54018/162054 objects degraded (33.333%), 208 pgs degraded, 441 pgs undersized
    pg 1.29 is stuck undersized for 222.261023, current state active+undersized, last acting [2,1]
    pg 1.2a is stuck undersized for 222.251868, current state active+undersized, last acting [2,1]
    pg 1.2b is stuck undersized for 222.246564, current state active+undersized, last acting [2,1]
    pg 1.2c is stuck undersized for 221.679774, current state active+undersized+degraded, last acting [1,2]
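The 33.333% figure above can be checked with a line of plain shell arithmetic (no cluster needed): node-1 holds exactly one of the three replicas of every object, so one third of all object copies are degraded.

```shell
# 54018 of 162054 object copies degraded = exactly one third
degraded=54018
total=162054
pct=$(awk -v d="$degraded" -v t="$total" 'BEGIN { printf "%.3f", d / t * 100 }')
echo "$pct% objects degraded"
```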
    
    
# Stop 3 OSDs, leaving 1. min_size=2 is no longer satisfied; the cluster loses read/write capability and the undersized+degraded+peered warning appears
[root@node-2 ~]# ceph -s
  cluster:
    id:     60e065f1-d992-4d1a-8f4e-f74419674f7e
    health: HEALTH_WARN
            3 osds down
            2 hosts (3 osds) down
            Reduced data availability: 192 pgs inactive
            Degraded data redundancy: 107832/161748 objects degraded (66.667%), 208 pgs degraded
 
  services:
    mon: 3 daemons, quorum node-1,node-2,node-3 (age 5h)
    mgr: node-1(active, since 20h)
    mds: cephfs:2 {0=mds2=up:active,1=mds1=up:active} 1 up:standby
    osd: 4 osds: 1 up (since 47s), 4 in (since 6d)
    rgw: 1 daemon active (node-2)
 
  task status:
 
  data:
    pools:   9 pools, 464 pgs
    objects: 53.92k objects, 803 MiB
    usage:   16 GiB used, 24 GiB / 40 GiB avail
    pgs:     100.000% pgs not active
             107832/161748 objects degraded (66.667%)
             256 undersized+peered
             208 undersized+degraded+peered
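The read/write cutoff observed here is controlled by the pool's replication parameters, which can be inspected directly (pool-1 is the pool used in the rados example in this article; substitute your own pool name):

```shell
ceph osd pool get pool-1 size       # number of replicas (3 on this cluster)
ceph osd pool get pool-1 min_size   # minimum replicas required for I/O (2 here)
```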
  
# Stop the 4th OSD. With only one OSD left, even after its process is stopped, ceph -s still reports it as up
# but checking the process itself shows it is dead
# PG state becomes stale+undersized+peered and the cluster loses read/write capability
[root@node-1 ~]# systemctl status ceph-osd@0
 ceph-osd@0.service - Ceph object storage daemon osd.0
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: disabled)
   Active: inactive (dead) since  2021-10-14 15:36:14 CST; 1min 56s ago
  Process: 5528 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
  Process: 5524 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 5528 (code=exited, status=0/SUCCESS)
...
[root@node-1 ~]# ceph osd tree
ID CLASS WEIGHT  TYPE NAME       STATUS REWEIGHT PRI-AFF 
-1       0.03918 root default                            
-3       0.01959     host node-1                         
 0   hdd 0.00980         osd.0       up  1.00000 1.00000 
 3   hdd 0.00980         osd.3     down  0.09999 1.00000 
-5       0.00980     host node-2                         
 1   hdd 0.00980         osd.1     down  1.00000 1.00000 
-7       0.00980     host node-3                         
 2   hdd 0.00980         osd.2     down  1.00000 1.00000 
 
[root@node-1 ~]# ceph pg stat
464 pgs: 440 down, 14 stale+undersized+peered, 10 stale+undersized+degraded+peered; 801 MiB data, 12 GiB used, 24 GiB / 40 GiB avail; 3426/161460 objects degraded (2.122%)
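As a side note, the state counts in the ceph pg stat summary can be tallied with a short, hypothetical parsing helper (plain shell, using the output line above as sample input):

```shell
# Count the PGs that cannot serve I/O by summing the state groups
# in the "ceph pg stat" summary line
line='464 pgs: 440 down, 14 stale+undersized+peered, 10 stale+undersized+degraded+peered; 801 MiB data, 12 GiB used, 24 GiB / 40 GiB avail; 3426/161460 objects degraded (2.122%)'
blocked=$(echo "$line" | awk -F'[:;]' '{
    n = split($2, groups, ",")
    total = 0
    for (i = 1; i <= n; i++)
        if (groups[i] ~ /down|stale|peered|incomplete/) {
            split(groups[i], f, " ")   # f[1] is the PG count for this state
            total += f[1]
        }
    print total
}')
echo "$blocked PGs cannot serve I/O"
```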

Restart all the stopped OSDs and the cluster will gradually return to health.
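Assuming the same systemd units as before, with osd.0 and osd.3 on node-1, osd.1 on node-2 and osd.2 on node-3:

```shell
# Run on each host for its local OSDs
systemctl start ceph-osd@0 ceph-osd@3   # on node-1
# systemctl start ceph-osd@1            # on node-2
# systemctl start ceph-osd@2            # on node-3
ceph -s                                 # PGs peer, then return to active+clean
```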

# All OSDs have been restarted. PGs are still peering, but reads and writes are already possible.
[root@node-1 ~]# ceph -s
  cluster:
    id:     60e065f1-d992-4d1a-8f4e-f74419674f7e
    health: HEALTH_WARN
            Reduced data availability: 1 pg inactive, 2 pgs peering
            Degraded data redundancy: 16715/162054 objects degraded (10.314%), 65 pgs degraded
 
  services:
    mon: 3 daemons, quorum node-1,node-2,node-3 (age 5h)
    mgr: node-1(active, since 20h)
    mds: cephfs:2 {0=mds2=up:active,1=mds1=up:active} 1 up:standby
    osd: 4 osds: 4 up (since 5s), 4 in (since 5s)
    rgw: 1 daemon active (node-2)
 
  task status:
 
  data:
    pools:   9 pools, 464 pgs
    objects: 54.02k objects, 803 MiB
    usage:   11 GiB used, 19 GiB / 30 GiB avail
    pgs:     65.302% pgs not active
             16715/162054 objects degraded (10.314%)
             294 peering
             75  active+undersized
             62  active+undersized+degraded
             21  active+clean
             9   remapped+peering
             2   active+recovery_wait+degraded
             1   active+recovering+degraded
             
# After a while, check cluster health again: rebalancing is in progress, the cluster can still serve reads and writes, and PGs are reaching "active+clean"
[root@node-1 ~]# ceph -s
  cluster:
    id:     60e065f1-d992-4d1a-8f4e-f74419674f7e
    health: HEALTH_OK
 ...
  progress:
    Rebalancing after osd.0 marked in
      [==............................]
 
[root@node-1 ~]# rados -p pool-1 put file_bench_cephfs.f file_bench_cephfs.f 
[root@node-1 ~]# rados -p pool-1 ls | grep file_bench
file_bench_cephfs.f

To summarize, the PG states in which the cluster cannot serve reads and writes:

stale: all of the PG's OSDs are down
peered: fewer surviving OSDs than min_size
down: the surviving OSD's data is too old, and the other online OSDs are not enough to complete the repair

The stale and peered states were demonstrated above by stopping OSD services.

A classic scenario for down, with replicas on A (primary), B, C:

 a. First kill B
 b. Write new data to A and C
 c. Kill A and C
 d. Bring B back up

At this point the surviving B holds stale data (it is missing the new writes), and no other OSD in the cluster can help it complete data recovery, so the PG shows down. Reference: https://zhuanlan.zhihu.com/p/138778000#:~:text=3.8.3%20PG%E4%B8%BADown%E7%9A%84OSD%E4%B8%A2%E5%A4%B1%E6%88%96%E6%97%A0%E6%B3%95%E6%8B%89%E8%B5%B7
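The four steps above can be sketched as shell commands. A, B and C are placeholders for real OSD ids, and test-pool is a hypothetical throwaway pool; do not run this against data you care about:

```shell
systemctl stop ceph-osd@B                   # a. kill B
rados -p test-pool put obj-new /etc/hosts   # b. write new data, served by A and C
systemctl stop ceph-osd@A ceph-osd@C        # c. kill A and C
systemctl start ceph-osd@B                  # d. bring only B back up
ceph pg dump_stuck                          # affected PGs now report "down":
                                            # B's copy is stale and no peer
                                            # survives to repair it
```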

Original article: https://blog.csdn.net/DeamonXiao/article/details/120879236
