1. Understanding the TiKV store states
Up: the store is online and serving requests.
Disconnect: if PD loses the heartbeat from a TiKV store for more than 20 seconds, the store's state changes to Disconnect. If the outage then lasts longer than max-store-down-time, the store changes to Down.
Down: the store has been disconnected from the cluster for longer than max-store-down-time (30 minutes by default). Once this threshold is exceeded, the store is marked Down and PD starts replenishing the replicas of its Regions on the surviving TiKV stores.
Offline: after a TiKV store is scaled in, it enters the Offline state. This is only an intermediate state of decommissioning: while Offline, the store transfers its leaders and balances its Regions away. When leader_count/region_count (obtained via pd-ctl, see the sketch after this list) show that the transfer and balance have finished, the store changes from Offline to Tombstone. While a store is Offline, do not stop the TiKV service or shut down the physical server it runs on.
Tombstone: the store is fully decommissioned.
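The state of every store, together with leader_count, region_count, and the max-store-down-time setting mentioned above, can be inspected through pd-ctl. A minimal sketch, assuming the PD endpoint http://172.16.160.21:2379 of the cluster shown below and the pd-ctl shipped with the same cluster version via tiup:

# List all stores with their state_name, leader_count and region_count
tiup ctl:v5.1.2 pd -u http://172.16.160.21:2379 store

# Show the current PD configuration, including schedule.max-store-down-time
tiup ctl:v5.1.2 pd -u http://172.16.160.21:2379 config show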
2. Handling the Tombstone state
After the TiKV nodes were scaled in, checking the cluster status with tiup shows that those nodes have changed from the Up state to Tombstone.
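For reference, the scale-in that led to this state would have been issued roughly as follows. This is only a sketch: the node IDs are taken from the cluster below, and the exact command history is not part of the original output.

# Scale in the three 20162 TiKV instances; PD then takes them through Offline to Tombstone
tiup cluster scale-in gsc-test --node 172.16.160.20:20162,172.16.160.21:20162,172.16.160.22:20162

The cluster status after the stores reached Tombstone: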
[tidb@ip-172-16-160-20 ~]$ tiup cluster display gsc-test
Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.5.6/tiup-cluster display gsc-test
Cluster type: tidb
Cluster name: gsc-test
Cluster version: v5.1.2
Deploy user: tidb
SSH type: builtin
Dashboard URL: http://172.16.160.21:2379/dashboard
ID Role Host Ports OS/Arch Status Data Dir Deploy Dir
-- ---- ---- ----- ------- ------ -------- ----------
172.16.160.20:9093 alertmanager 172.16.160.20 9093/9094 linux/x86_64 Up /tidb-data1/tidb-data/alertmanager-9093 /tidb-data1/tidb-deploy/alertmanager-9093
172.16.160.20:3000 grafana 172.16.160.20 3000 linux/x86_64 Up - /tidb-data1/grafana-3000
172.16.160.20:2379 pd 172.16.160.20 2379/2380 linux/x86_64 Up /tidb-data1/pd/tidb-data/pd-2379/data.pd /tidb-data1/pd/tidb-deploy/pd-2379
172.16.160.21:2379 pd 172.16.160.21 2379/2380 linux/x86_64 Up|UI /tidb-data1/pd/tidb-data/pd-2379/data.pd /tidb-data1/pd/tidb-deploy/pd-2379
172.16.160.22:2379 pd 172.16.160.22 2379/2380 linux/x86_64 Up|L /tidb-data1/pd/tidb-data/pd-2379/data.pd /tidb-data1/pd/tidb-deploy/pd-2379
172.16.160.20:9090 prometheus 172.16.160.20 9090 linux/x86_64 Up /tidb-data1/tidb-data/prometheus-8249 /tidb-data1/tidb-deploy/prometheus-8249
172.16.160.20:4000 tidb (patched) 172.16.160.20 4000/10080 linux/x86_64 Up - /tidb-data1/tidb-deploy/tidb-4000
172.16.160.21:4000 tidb (patched) 172.16.160.21 4000/10080 linux/x86_64 Up - /tidb-data1/tidb-deploy/tidb-4000
172.16.160.22:4000 tidb (patched) 172.16.160.22 4000/10080 linux/x86_64 Up - /tidb-data1/tidb-deploy/tidb-4000
172.16.160.20:20160 tikv (patched) 172.16.160.20 20160/20180 linux/x86_64 Up /tidb-data2/tidb-data/tikv-20160 /tidb-data2/tidb-deploy/tikv-20160
172.16.160.20:20162 tikv (patched) 172.16.160.20 20162/20182 linux/x86_64 Tombstone /tidb-data1/tidb-data/tikv-20162 /tidb-data1/tidb-deploy/tikv-20162
172.16.160.21:20160 tikv (patched) 172.16.160.21 20160/20180 linux/x86_64 Up /tidb-data2/tidb-data/tikv-20160 /tidb-data2/tidb-deploy/tikv-20160
172.16.160.21:20162 tikv (patched) 172.16.160.21 20162/20182 linux/x86_64 Tombstone /tidb-data1/tidb-data/tikv-20162 /tidb-data1/tidb-deploy/tikv-20162
172.16.160.22:20160 tikv (patched) 172.16.160.22 20160/20180 linux/x86_64 Up /tidb-data2/tidb-data/tikv-20160 /tidb-data2/tidb-deploy/tikv-20160
172.16.160.22:20162 tikv (patched) 172.16.160.22 20162/20182 linux/x86_64 Tombstone /tidb-data1/tidb-data/tikv-20162 /tidb-data1/tidb-deploy/tikv-20162
Total nodes: 15
There are some nodes can be pruned:
Nodes: [172.16.160.20:20162 172.16.160.21:20162 172.16.160.22:20162]
You can destroy them with the command: `tiup cluster prune gsc-test`
Note: at the end of the output, tiup suggests running tiup cluster prune gsc-test to clean up the Tombstone nodes. This cluster is v5.1.2; on older TiDB versions that do not offer the prune command, you may need to call PD's remove-tombstone interface to clean them up safely.
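A minimal sketch of that remove-tombstone cleanup, assuming the PD endpoint http://172.16.160.21:2379 used by this cluster and a pd-ctl matching the cluster version; both forms go through PD, and the second calls its HTTP API directly:

# Via pd-ctl: remove the records of all stores already in the Tombstone state
tiup ctl:v5.1.2 pd -u http://172.16.160.21:2379 store remove-tombstone

# Or via PD's HTTP API
curl -X DELETE http://172.16.160.21:2379/pd/api/v1/stores/remove-tombstone

On this v5.1.2 cluster, the prune command suggested above is enough: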
[tidb@ip-172-16-160-20 ~]$ tiup cluster prune gsc-test
Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.5.6/tiup-cluster prune gsc-test
+ [ Serial ] - SSHKeySet: privateKey=/home/tidb/.tiup/storage/cluster/clusters/gsc-test/ssh/id_rsa, publicKey=/home/tidb/.tiup/storage/cluster/clusters/gsc-test/ssh/id_rsa.pub
+ [Parallel] - UserSSH: user=tidb, host=172.16.160.20
+ [Parallel] - UserSSH: user=tidb, host=172.16.160.20
+ [Parallel] - UserSSH: user=tidb, host=172.16.160.21
+ [Parallel] - UserSSH: user=tidb, host=172.16.160.21
+ [Parallel] - UserSSH: user=tidb, host=172.16.160.21
+ [Parallel] - UserSSH: user=tidb, host=172.16.160.20
+ [Parallel] - UserSSH: user=tidb, host=172.16.160.20
+ [Parallel] - UserSSH: user=tidb, host=172.16.160.22
+ [Parallel] - UserSSH: user=tidb, host=172.16.160.21
+ [Parallel] - UserSSH: user=tidb, host=172.16.160.20
+ [Parallel] - UserSSH: user=tidb, host=172.16.160.22
+ [Parallel] - UserSSH: user=tidb, host=172.16.160.22
+ [Parallel] - UserSSH: user=tidb, host=172.16.160.22
+ [Parallel] - UserSSH: user=tidb, host=172.16.160.20
+ [Parallel] - UserSSH: user=tidb, host=172.16.160.20
+ [ Serial ] - FindTomestoneNodes
Will destroy these nodes: [172.16.160.20:20162 172.16.160.21:20162 172.16.160.22:20162]
Do you confirm this action? [y/N]:(default=N) y
Start destroy Tombstone nodes: [172.16.160.20:20162 172.16.160.21:20162 172.16.160.22:20162] ...
+ [ Serial ] - ClusterOperate: operation=DestroyTombstoneOperation, options={Roles:[] Nodes:[] Force:false SSHTimeout:5 OptTimeout:120 APITimeout:300 IgnoreConfigCheck:false NativeSSH:false SSHType: CleanupData:false CleanupLog:false RetainDataRoles:[] RetainDataNodes:[] ShowUptime:false JSON:false Operation:StartOperation}
Stopping component tikv
Stopping instance 172.16.160.20
Stop tikv 172.16.160.20:20162 success
Destroying component tikv
Destroying instance 172.16.160.20
Destroy 172.16.160.20 success
- Destroy tikv paths: [/tidb-data1/tidb-data/tikv-20162 /tidb-data1/tidb-deploy/tikv-20162/log /tidb-data1/tidb-deploy/tikv-20162 /etc/systemd/system/tikv-20162.service]
Stopping component tikv
Stopping instance 172.16.160.21
Stop tikv 172.16.160.21:20162 success
Destroying component tikv
Destroying instance 172.16.160.21
Destroy 172.16.160.21 success
- Destroy tikv paths: [/tidb-data1/tidb-deploy/tikv-20162/tikv-20162/log /tidb-data1/tidb-deploy/tikv-20162 /etc/systemd/system/tikv-20162.service /tidb-data1/tidb-data/tikv-20162]
Stopping component tikv
Stopping instance 172.16.160.22
Stop tikv 172.16.160.22:20162 success
Destroying component tikv
Destroying instance 172.16.160.22
Destroy 172.16.160.22 success
- Destroy tikv paths: [/tidb-data1/tidb-data/tikv-20162 /tidb-data1/tidb-deploy/tikv-20162/log /tidb-data1/tidb-deploy/tikv-20162 /etc/systemd/system/tikv-20162.service]
+ [ Serial ] - UpdateMeta: cluster=gsc-test, deleted=`'172.16.160.20:20162','172.16.160.21:20162','172.16.160.22:20162'`
+ [ Serial ] - UpdateTopology: cluster=gsc-test
+ Refresh instance configs
- Regenerate config pd -> 172.16.160.20:2379 ... Done
- Regenerate config pd -> 172.16.160.21:2379 ... Done
- Regenerate config pd -> 172.16.160.22:2379 ... Done
- Regenerate config tikv -> 172.16.160.20:20160 ... Done
- Regenerate config tikv -> 172.16.160.21:20160 ... Done
- Regenerate config tikv -> 172.16.160.22:20160 ... Done
- Regenerate config tidb -> 172.16.160.20:4000 ... Done
- Regenerate config tidb -> 172.16.160.21:4000 ... Done
- Regenerate config tidb -> 172.16.160.22:4000 ... Done
- Regenerate config prometheus -> 172.16.160.20:9090 ... Done
- Regenerate config grafana -> 172.16.160.20:3000 ... Done
- Regenerate config alertmanager -> 172.16.160.20:9093 ... Done
+ [ Serial ] - SystemCtl: host=172.16.160.20 action=reload prometheus-9090.service
Destroy success
The cleanup is now complete. Running tiup cluster display again confirms that the Tombstone nodes have been removed.
[tidb@ip-172-16-160-20 ~]$ tiup cluster display gsc-test
Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.5.6/tiup-cluster display gsc-test
Cluster type: tidb
Cluster name: gsc-test
Cluster version: v5.1.2
Deploy user: tidb
SSH type: builtin
Dashboard URL: http://172.16.160.21:2379/dashboard
ID Role Host Ports OS/Arch Status Data Dir Deploy Dir
-- ---- ---- ----- ------- ------ -------- ----------
172.16.160.20:9093 alertmanager 172.16.160.20 9093/9094 linux/x86_64 Up /tidb-data1/tidb-data/alertmanager-9093 /tidb-data1/tidb-deploy/alertmanager-9093
172.16.160.20:3000 grafana 172.16.160.20 3000 linux/x86_64 Up - /tidb-data1/grafana-3000
172.16.160.20:2379 pd 172.16.160.20 2379/2380 linux/x86_64 Up /tidb-data1/pd/tidb-data/pd-2379/data.pd /tidb-data1/pd/tidb-deploy/pd-2379
172.16.160.21:2379 pd 172.16.160.21 2379/2380 linux/x86_64 Up|UI /tidb-data1/pd/tidb-data/pd-2379/data.pd /tidb-data1/pd/tidb-deploy/pd-2379
172.16.160.22:2379 pd 172.16.160.22 2379/2380 linux/x86_64 Up|L /tidb-data1/pd/tidb-data/pd-2379/data.pd /tidb-data1/pd/tidb-deploy/pd-2379
172.16.160.20:9090 prometheus 172.16.160.20 9090 linux/x86_64 Up /tidb-data1/tidb-data/prometheus-8249 /tidb-data1/tidb-deploy/prometheus-8249
172.16.160.20:4000 tidb (patched) 172.16.160.20 4000/10080 linux/x86_64 Up - /tidb-data1/tidb-deploy/tidb-4000
172.16.160.21:4000 tidb (patched) 172.16.160.21 4000/10080 linux/x86_64 Up - /tidb-data1/tidb-deploy/tidb-4000
172.16.160.22:4000 tidb (patched) 172.16.160.22 4000/10080 linux/x86_64 Up - /tidb-data1/tidb-deploy/tidb-4000
172.16.160.20:20160 tikv (patched) 172.16.160.20 20160/20180 linux/x86_64 Up /tidb-data2/tidb-data/tikv-20160 /tidb-data2/tidb-deploy/tikv-20160
172.16.160.21:20160 tikv (patched) 172.16.160.21 20160/20180 linux/x86_64 Up /tidb-data2/tidb-data/tikv-20160 /tidb-data2/tidb-deploy/tikv-20160
172.16.160.22:20160 tikv (patched) 172.16.160.22 20160/20180 linux/x86_64 Up /tidb-data2/tidb-data/tikv-20160 /tidb-data2/tidb-deploy/tikv-20160
Total nodes: 12
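To double-check at the PD level, the removed stores should no longer appear in the store list. A small sketch, again assuming the PD endpoint used earlier:

# Only the three remaining 20160 stores should be listed, all with state_name "Up"
tiup ctl:v5.1.2 pd -u http://172.16.160.21:2379 store | grep -E '"address"|"state_name"'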