TiDB Minimal Practice: Cluster111

边城元元 2022-03-21


Author: 边城元元

Original source: https://tidb.net/blog/af8080f7

TiDB Minimal Practice

A production-style trial of TiDB v5.3.0 on the minimal topology, followed by an upgrade to TiDB v5.4.0

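I. Overview

1.1 Why this article exists

While researching and trying out TiDB, I was held back by the environment requirements: my own machine could not comfortably host the officially recommended minimal topology. Later, several community members posted questions about minimal deployments. For these two reasons I decided to write down what I learned.

The minimal topology required by the official documentation: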

| Instance | Count | Physical machine configuration | IP | Configuration |
| :------------------- | :- | :-------------------------------- | :------------------------- | :---------- |
| TiDB | 3 | 16 VCore 32GB * 1 | 10.0.1.1 10.0.1.2 10.0.1.3 | Default ports; global directory configuration |
| PD | 3 | 4 VCore 8GB * 1 | 10.0.1.4 10.0.1.5 10.0.1.6 | Default ports; global directory configuration |
| TiKV | 3 | 16 VCore 32GB 2TB (nvme ssd) * 1 | 10.0.1.7 10.0.1.8 10.0.1.9 | Default ports; global directory configuration |
| Monitoring & Grafana | 1 | 4 VCore 8GB * 1 500GB (ssd) | 10.0.1.10 | Default ports; global directory configuration |

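1.2 Questions and goals

Can minimal topology 111 support a complete learning experience?
Can minimal topology 111 withstand normal reads and writes at low concurrency, enough for a small single-database application?
After scaling out, does minimal topology 111 behave the same as a standard topology deployed the normal way?
Goal: answer the questions above while learning TiDB installation, scale-out, upgrade, and benchmarking.

Note: "topology 111" means 1 tidb-server + 1 PD + 1 TiKV. Unless stated otherwise, the term has this meaning throughout the article.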

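1.3 Test environment

```shell
# 1. Host: Windows 10, 8 GB RAM, 4 cores
#
# 2. Virtual machine: CentOS 7.3, 4 GB RAM, 2 CPU cores
#    - control machine (TiUP is installed here)
#    - also the machine that hosts the cluster
#    - port forwarding from host to VM (host:VM):
#      20022:22, 4000:4000, 2379:2379, 3000:3000
```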

II. Topology 111

Since this is a minimal practice, the topology should be 111: 1 tidb-server + 1 PD + 1 TiKV.

2.1 Topology file cluster111.yml

```yaml
# cluster111.yml

global:
  user: "tidb"
  ssh_port: 22
  deploy_dir: "/tidb-deploy111"
  data_dir: "/tidb-data111"

server_configs:
  tidb:
    log.slow-threshold: 300
  tikv:
    readpool.storage.use-unified-pool: false
    readpool.coprocessor.use-unified-pool: true
  pd:
    replication.max-replicas: 1

pd_servers:
  - host: 10.0.2.15
    client_port: 2379   # note: the key name differs from tidb-server and tikv
    peer_port: 2380     # note: the key name differs from tidb-server and tikv

tidb_servers:
  - host: 10.0.2.15
    port: 4000
    status_port: 10080

tikv_servers:
  - host: 10.0.2.15
    port: 20160
    status_port: 20180
```
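Because every component in this topology shares one IP, each port in the file has to be distinct. The following is a minimal sketch of a check for that, assuming a single-host topology file. `duplicate_ports` is a hypothetical helper (not part of TiUP), and it naively treats the value of every key ending in `port` as a port number, including `ssh_port` and commented-out keys.

```shell
# Hypothetical helper: print every port number that appears more than once
# in the given topology file(s). Assumes all entries live on the same host.
duplicate_ports() {
    grep -hEo '[a-z_]*port:[[:space:]]*[0-9]+' "$@" \
        | grep -Eo '[0-9]+$' \
        | sort -n \
        | uniq -d
}

# Example (prints nothing when no port is reused):
#   duplicate_ports ./cluster111.yml
```

Passing several files at once, for example the base topology plus a scale-out file, checks them together.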

III. Deployment and benchmarking

3.1 Preparing the environment (CentOS 7.3)

Installing CentOS 7.3 itself is not covered here.

```shell
# Raise the connection limit of the sshd service:
# set MaxSessions to 30 in /etc/ssh/sshd_config.
sed -i 's/#MaxSessions.*/MaxSessions 30/g' /etc/ssh/sshd_config

# Restart the sshd service
systemctl restart sshd
```
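The `sed` command above only rewrites a commented-out `#MaxSessions` line. As a sketch (assuming GNU `sed`; `set_max_sessions` is a hypothetical helper name, not an existing tool), the following also covers a directive that is already active or missing altogether:

```shell
# Hypothetical helper: set MaxSessions in an sshd_config-style file,
# whether the directive is commented out, already active, or absent.
set_max_sessions() {
    file=$1
    value=$2
    if grep -Eq '^[[:space:]]*#?[[:space:]]*MaxSessions[[:space:]]' "$file"; then
        # Rewrite the existing (possibly commented) directive in place.
        sed -i -E "s/^[[:space:]]*#?[[:space:]]*MaxSessions[[:space:]].*/MaxSessions $value/" "$file"
    else
        # Directive not present: append it.
        printf 'MaxSessions %s\n' "$value" >> "$file"
    fi
}

# Example:
#   set_max_sessions /etc/ssh/sshd_config 30 && systemctl restart sshd
```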

3.2 Deploying minimal topology 111 (v5.3.0) on a single IP

3.2.1 Installing cluster111

```shell
# Install tiup
curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh
```

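```shell
# 1. Load the tiup environment
source /root/.bash_profile

# Install the cluster component
tiup cluster

# Update tiup itself and the cluster component
tiup update --self && tiup update cluster

# 2. View help
tiup help
tiup cluster -h   # or simply: tiup cluster

# 3. List the TiDB versions available through TiUP
tiup list tidb

# 4. Deploy cluster111
# General form: tiup cluster deploy CLUSTER_NAME VERSION ./topo.yaml --user root -p
tiup cluster check ./cluster111.yml
tiup cluster deploy cluster111 v5.3.0 ./cluster111.yml --user root -p
# You are prompted for the password, then for a y/n confirmation.
# Deployment takes roughly ten-plus minutes.
# If it is interrupted midway, rerun the same deploy command:
#   tiup cluster deploy cluster111 v5.3.0 ./cluster111.yml --user root -p
# The message
#   "Cluster `cluster111` deployed successfully, you can start it with command: `tiup cluster start cluster111 --init`"
# means the deployment succeeded.

# 5. List the clusters
tiup cluster list

# 6. Initialize and start the cluster
tiup cluster start cluster111 --init
```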

Notes:

tiup cluster start cluster111 --init generates a random password for the root user.

Without --init, no random password is generated.

For this demo the password was changed to 123456: ALTER USER 'root' IDENTIFIED BY '123456';

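3.2.2 Checking the cluster from the command line

```shell
tiup cluster list
tiup cluster display cluster111
```

The output (a screenshot in the original post) shows that cluster111 started successfully:

1. PD has a single node, which is the Leader and also hosts the Dashboard.

2. tidb-server and TiKV have one node each.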

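3.2.3 Checking the cluster through the Dashboard

1. The default Dashboard address is http://127.0.0.1:2379/dashboard/

The default account is root.

The default password is empty (remember to change it in production).

The original post shows screenshots of the Dashboard's default page and of the following pages:

2. Instance information

3. Hosts

4. SQL statement analysis

5. Slow queries, traffic visualization, and log search all work normally.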

3.2.4 Common TiDB commands

1. Cluster shell commands

    tiup cluster show-config cluster111
    tiup cluster display cluster111

2. Viewing configuration

    shell> tiup ctl:v5.3.0 pd -u http://127.0.0.1:2379 config show
    shell> tiup ctl:v5.3.0 pd -u http://127.0.0.1:2379 config show scheduler

PD is addressed at http://127.0.0.1:2379 by default, so -u http://127.0.0.1:2379 can be omitted.

To confirm the current maximum replica count:

    mysql> show config;
    mysql> show config where name='replication.max-replicas';

3. ctl shell commands

    # pd-ctl
    tiup ctl:v5.3.0 pd help
    tiup ctl:v5.3.0 pd health

    tiup ctl:v5.3.0 pd store            # region count, leader count, peer count
    tiup ctl:v5.3.0 pd scheduler
    tiup ctl:v5.3.0 pd scheduler show
    tiup ctl:v5.3.0 pd cluster
    tiup ctl:v5.3.0 pd region -h

    tiup ctl:v5.3.0 pd region check miss-peer      # Regions that are missing replicas
    tiup ctl:v5.3.0 pd region check extra-peer     # Regions that have extra replicas
    tiup ctl:v5.3.0 pd region check down-peer      # Regions with a replica in Down state
    tiup ctl:v5.3.0 pd region check pending-peer   # Regions with a replica in Pending state

    # enter pd-ctl interactive mode
    tiup ctl:v5.3.0 pd -u http://127.0.0.1:2379 -i

    # tidb-ctl
    tiup ctl:v5.3.0 tidb help
    tiup ctl:v5.3.0 tidb keyrange --database test --table ttt   # look up the key range

    # tikv-ctl
    tiup ctl:v5.3.0 tikv help
    tiup ctl:v5.3.0 tikv --pd 127.0.0.1:2379 compact-cluster

4. Installing components

```shell
# Install components
tiup ctl:v5.3.0
tiup bench
```

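3.4 Benchmarking

Benchmark TiDB with the TiUP bench component.

TPC-C evaluates a database's online transaction processing (OLTP) capability.

TPC-H evaluates a database's OLAP performance through complex SQL queries.

3.4.1 TPC-C benchmark

tiup cluster display cluster111

Background:

In TPC-C, the ITEM table always contains 100,000 items, while the number of warehouses is adjustable. If the WAREHOUSE table has W rows, then:

STOCK has W * 100,000 rows (each warehouse holds stock data for 100,000 items).

DISTRICT has W * 10 rows (each warehouse serves 10 districts).

CUSTOMER has W * 10 * 3000 rows (each district has 3000 customers).

HISTORY has W * 10 * 3000 rows (one transaction history row per customer).

ORDER has W * 10 * 3000 rows (3000 orders per district). The last 900 orders generated in each district are also added to the NEW-ORDER table, and each order randomly generates 5 to 15 ORDER-LINE rows.

As a rough estimate, 1 warehouse (1W) corresponds to data on the order of 100,000 rows.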
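The cardinality rules above can be turned into a small calculator. This is only a sketch: `tpcc_fixed_rows` and `tpcc_order_line_range` are hypothetical helper names, and the 900 NEW-ORDER rows are counted per district, which is how the rule above reads.

```shell
# Hypothetical helpers that apply the TPC-C cardinality rules listed above.

# Total rows in the tables whose size is fixed by W (everything except ORDER-LINE).
tpcc_fixed_rows() {
    w=$1
    item=100000                    # ITEM does not depend on W
    warehouse=$w
    stock=$((w * 100000))
    district=$((w * 10))
    customer=$((w * 10 * 3000))
    history=$((w * 10 * 3000))
    orders=$((w * 10 * 3000))
    new_order=$((w * 10 * 900))    # last 900 orders of each district
    echo $((item + warehouse + stock + district + customer + history + orders + new_order))
}

# Minimum and maximum ORDER-LINE rows (5 to 15 lines per order).
tpcc_order_line_range() {
    orders=$(($1 * 10 * 3000))
    echo "$((orders * 5)) $((orders * 15))"
}
```

For W = 1, `tpcc_fixed_rows 1` prints 299011 and `tpcc_order_line_range 1` prints `150000 450000`, so one warehouse lands in the several-hundred-thousand-row range once ORDER-LINE is included.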

1. 1W benchmark

tiup bench tpcc -H 127.0.0.1 -p 123456 -P 4000 -D tpcc --warehouses 1 prepare

tiup bench tpcc -H 127.0.0.1 -p 123456 -P 4000 -D tpcc --warehouses 1 run

Then verify data correctness. The run lasted more than 10 hours in total.

    [Summary] DELIVERY - Takes(s): 53449.3, Count: 14240, TPM: 16.0, Sum(ms): 8055695.8, Avg(ms): 560.7, 50th(ms): 469.8, 90th(ms): 872.4, 95th(ms): 1208.0, 99th(ms): 2080.4, 99.9th(ms): 4831.8, Max(ms): 4831.1
    [Summary] NEW_ORDER - Takes(s): 53449.1, Count: 161134, TPM: 180.9, Sum(ms): 23554267.9, Avg(ms): 144.3, 50th(ms): 113.2, 90th(ms): 234.9, 95th(ms): 318.8, 99th(ms): 637.5, 99.9th(ms): 1677.7, Max(ms): 4831.1
    [Summary] ORDER_STATUS - Takes(s): 53434.3, Count: 14315, TPM: 16.1, Sum(ms): 955242.2, Avg(ms): 66.8, 50th(ms): 50.3, 90th(ms): 121.6, 95th(ms): 167.8, 99th(ms): 335.5, 99.9th(ms): 738.2, Max(ms): 2415.9
    [Summary] PAYMENT - Takes(s): 53448.7, Count: 154134, TPM: 173.0, Sum(ms): 17936546.0, Avg(ms): 97.2, 50th(ms): 75.5, 90th(ms): 159.4, 95th(ms): 218.1, 99th(ms): 436.2, 99.9th(ms): 1275.1, Max(ms): 4831.1
    [Summary] PAYMENT_ERR - Takes(s): 53448.7, Count: 1, TPM: 0.0, Sum(ms): 282285.1, Avg(ms): 15837.7, 50th(ms): 16106.1, 90th(ms): 16106.1, 95th(ms): 16106.1, 99th(ms): 16106.1, 99.9th(ms): 16106.1, Max(ms): 4831.1
    [Summary] STOCK_LEVEL - Takes(s): 53448.9, Count: 14572, TPM: 16.4, Sum(ms): 1332893.0, Avg(ms): 87.5, 50th(ms): 71.3, 90th(ms): 142.6, 95th(ms): 184.5, 99th(ms): 369.1, 99.9th(ms): 805.3, Max(ms): 4831.1
    tpmC: 180.9, efficiency: 1406.6%

These figures are for reference only: the TPC-C load generator ran inside the virtual machine itself, and other tasks were carried out during the run, both of which affect the results.
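The TPM values in the summary are consistent with Count / Takes(s) * 60, and tpmC matches the NEW_ORDER row. A small sketch that recomputes TPM from one summary line; `tpm_from_summary` is a hypothetical helper, and the formula is inferred from the numbers above rather than taken from the tool's documentation.

```shell
# Hypothetical helper: read "[Summary] ..." lines on stdin and print
# Count / Takes(s) * 60 with one decimal place for each line.
tpm_from_summary() {
    awk '{
        takes = 0; count = 0
        for (i = 1; i < NF; i++) {
            # "+ 0" converts values such as "53449.1," to a number
            if ($i == "Takes(s):") takes = $(i + 1) + 0
            if ($i == "Count:")    count = $(i + 1) + 0
        }
        if (takes > 0) printf "%.1f\n", count / takes * 60
    }'
}

# Example:
#   tiup bench tpcc ... run | grep '^\[Summary\] NEW_ORDER ' | tpm_from_summary
```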

2. 10W benchmark (millions of rows)

tiup bench tpcc -H 127.0.0.1 -p 123456 -P 4000 -D tpcc --warehouses 10 prepare

tiup bench tpcc -H 127.0.0.1 -p 123456 -P 4000 -D tpcc --warehouses 10 run

The 10W test is not carried out in this article.

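3.4.2 TPC-H benchmark

    tiup bench tpch --db tpch -P4000 -Uroot -p123456 --sf=1 prepare
    tiup bench tpch --db tpch -P4000 -Uroot -p123456 --sf=1 --check=true run

The virtual machine ran out of memory (OOM) during this benchmark, so it could not be completed (no screenshot was taken).

3.5 Cluster111 summary

1) Under the current configuration the cluster is fairly stable.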

IV. Scale-out and upgrade

4.1 Scaling out the monitoring nodes

4.1.1 Topology for the monitoring scale-out

```yaml
# scale-out-cluster111-monitor.yml

monitoring_servers:
  - host: 10.0.2.15
    port: 9090
    # ng_port: 12020

grafana_servers:
  - host: 10.0.2.15
    port: 3000

alertmanager_servers:
  - host: 10.0.2.15
    web_port: 9093
    cluster_port: 9094
```

4.1.2 Steps

Run the scale-out command:

tiup cluster scale-out cluster111 ./cluster111-scale-out-monoitor.yml -uroot -p   # prompts for the SSH password

The message "scaled cluster 'xxx' out successfully" means the scale-out succeeded.

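4.1.3 Verifying the result

4.1.3.1 Verifying from the shell

    tiup cluster show-config cluster111
    tiup cluster display cluster111

4.1.3.2 Verifying in the Dashboard

Dashboard -> Overview

The page now shows QPS and duration data.
Other pages are unchanged.

4.1.3.3 Verifying in Grafana

http://127.0.0.1:3000

Default account: admin. Default password: admin.

1. Configure the data source: in the left menu choose Configuration -> Data Sources, and the available data sources appear on the right. Double-click the cluster111 data source to open its test page, then click Save & test.

2. Go back to General / Home, click General, and open cluster111-overview.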

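4.1.4 Retrospective

1. The scale-out command ran smoothly and finished in a little over 10 minutes (with a few hundred thousand rows in the database).

2. One detour: I misused tiup cluster check.

My intent was to check the scale-out YAML file for errors. Running tiup cluster check against the monitoring scale-out topology reported that port 9100 already existed in a cluster, and that message is what led me astray.

1) The configuration of cluster111 can be viewed with: tiup cluster show-config cluster111

2) Running tiup cluster check ./cluster111-scale-out-monoitor.yml printed the following:

    Error: Deploy port conflicts to an existing cluster (spec.deploy.port_conflict)

3) I rechecked the scale-out configuration (screenshot in the original post).

I wondered: is something odd going on? Is it a bug? Something else?

4) I then tried tiup cluster check ./cluster111-scale-out-tikv2-1.yml, which produced the same error.

5) In the end I went back to the official documentation: https://docs.pingcap.com/zh/tidb/stable/tiup-component-cluster-check

It does not say that check can validate a scale-out configuration. A likely explanation is that check treats the file as the topology of a brand-new cluster, whose default node_exporter port 9100 collides with the one cluster111 already uses on this host.

Summary:

1) This was a misuse on my part: tiup cluster check was applied to a scenario it is not meant for.

2) The scale-out I was attempting (monitoring nodes) happened to be related to the error message, which made it even more misleading.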

4.2 Scaling out one TiKV node

```yaml
# cluster111-scale-out-tikv2-1.yml

tikv_servers:
  - host: 10.0.2.15
    port: 20161
    status_port: 20181
```

tiup cluster display cluster111

tiup cluster scale-out cluster111 ./cluster111-scale-out-tikv2-1.yml -uroot -p

Note:

Start the monitoring components with tiup cluster start cluster111 -R alertmanager,grafana,prometheus, then compare the peer count and region count before and after the scale-out.

```shell
# Before the scale-out
#   "region_count": 23,
tiup ctl:v5.3.0 pd store

# After the scale-out
#   "region_count": 23,   (the new store holds no regions yet)
tiup ctl:v5.3.0 pd store
```

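```shell
# The SQL statements below are run in a MySQL client session.

# Check the current default replica count
show config where name='replication.max-replicas';
# shows 1

# Set the replica count
set config '10.0.2.15:2379' `replication.max-replicas`=2;

# Check the replica count again
show config where name='replication.max-replicas';
# shows 2

# "region_count": 23; the new store now holds regions, and the peer count is 46
# (both stores hold the same 23 regions)
tiup ctl:v5.3.0 pd store
```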

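4.3 Scaling out one more TiKV node (3 TiKV nodes in total)

Question: what is the point of keeping 3 replicas in TiKV?

They improve service availability and data reliability (through the distributed Raft protocol): if one node goes down, the service keeps running.
Three replicas on three nodes do not in themselves add horizontal scalability. To scale the database horizontally, more TiKV nodes are needed.

Scale-out YAML:

```yaml
# cluster111-scale-out-tikv2-2.yml

tikv_servers:
  - host: 10.0.2.15
    port: 20162
    status_port: 20182
```

tiup cluster scale-out cluster111 ./cluster111-scale-out-tikv2-2.yml -uroot -p

The peer count at this point is 46.

A while after the scale-out completed, the peer count had not changed, while the number of stores had become 3.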

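```shell
# The SQL statements below are run in a MySQL client session.

# Check the current default replica count
show config where name='replication.max-replicas';
# shows 2

# Set the replica count
set config '10.0.2.15:2379' `replication.max-replicas`=3;

# Check the replica count again
show config where name='replication.max-replicas';
# shows 3

# "region_count": 23; the new store now holds regions, and the peer count is 69
# (all three stores hold the same 23 regions)
tiup ctl:v5.3.0 pd store
```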

The peer count is now 69, as expected.

This shows that replication.max-replicas controls the maximum number of replicas kept for each piece of data.
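The peer counts quoted above (23, 46, 69) follow from regions times replicas. As a sketch, the total can also be read off the `pd store` output by summing the `region_count` of every store. Both helpers are hypothetical, and the first assumes the JSON keeps the `"region_count": N` form shown above.

```shell
# Hypothetical helper: sum every "region_count" value found in
# pd-ctl `store` JSON read from stdin.
total_peers() {
    grep -o '"region_count": *[0-9]*' \
        | awk -F: '{ sum += $2 } END { print sum + 0 }'
}

# Expected peer count once replication has caught up: regions * replicas.
expected_peers() {
    echo $(($1 * $2))
}

# Example:
#   tiup ctl:v5.3.0 pd store | total_peers
#   expected_peers 23 3
```

If the two numbers differ, PD is probably still adding or removing replicas, which matches the gradual drop observed when max-replicas is lowered.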

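tiup cluster display cluster111

With 3 nodes in the current environment, will the peer count drop if replication.max-replicas is changed back to 1?

Let's try it:

```
# Set the replica count
set config '10.0.2.15:2379' `replication.max-replicas`=1;
show config where name='replication.max-replicas';
tiup ctl:v5.3.0 pd store

# Result: the peer count does go down. It did not reach the expected 23 right away,
# and replicas were removed from all 3 stores. (It eventually settles at 23.)
```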

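4.4 Upgrading to the latest version (v5.4.0)

4.4.1 Upgrading the tiup and cluster components

tiup update --self && tiup update cluster

4.4.2 Editing the TiUP Cluster topology configuration

Remove any configuration items that are incompatible with the new version.

```shell
# General form: tiup cluster edit-config CLUSTER_NAME
tiup cluster edit-config cluster111

# After editing, save and quit with :wq, then enter Y to confirm the change.
```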

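4.4.3 Checking the health of the current cluster

tiup cluster check cluster111 --cluster

4.4.4 Upgrading the cluster to v5.4.0

tiup cluster upgrade cluster111 v5.4.0

After the upgrade, verify with: tiup cluster display cluster111

4.4.5 Updating pd-ctl and other peripheral tools after the upgrade

tiup install ctl:v5.4.0

Upgrading the BR tool:

https://download.pingcap.org/tidb-toolkit-v5.4.0-linux-amd64.tar.gz

Extract it to /usr/local0/webserver/tidb/tidb-toolkit-v5.4.0

Later on, /usr/local0/webserver/tidb/tidb-toolkit/ could serve as a version-independent root directory for tidb-toolkit. That makes upgrades much easier (no paths in scripts need to change), but incompatibility errors become harder to track down. For now, keep a record of every place where the path is used.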

4.5 Scaling back in to cluster111

```shell
# Set the replica count to 1 (run in a MySQL client session)
set config '10.0.2.15:2379' `replication.max-replicas`=1;
show config where name='replication.max-replicas';
```

Scale in the 2 extra TiKV nodes and the monitoring nodes.

```shell
# Check the cluster
tiup cluster display cluster111

# Scale in
tiup cluster scale-in cluster111 --node 10.0.2.15:9093,10.0.2.15:3000,10.0.2.15:9090,10.0.2.15:20161,10.0.2.15:20162

# After a while, check the cluster status again
tiup cluster display cluster111
```

V. Backup and restore

5.1 Installing the BR tool

Download the BR package that matches the cluster version:

https://download.pingcap.org/tidb-toolkit-v5.3.0-linux-amd64.tar.gz

Extract it to /usr/local0/webserver/tidb/tidb-toolkit-v5.3.0

5.2 Directory layout

Full backup directory:

/tidb-bak111/brdata/brbak/brbak{yyyy-MM-dd}/

Incremental backup directory:

/tidb-bak111/brdata/brbakincr/brbakincr{yyyy-MM-dd-HHmm}/

Log files:

/tidb-bak111/brdata/brbaklog/log-{yyyy-MM}
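As an illustration of the layout above, the three paths for a given point in time could be derived as follows. This is a sketch that assumes GNU `date`; `backup_paths` is a hypothetical helper, and the `.log` suffix follows the file name used in the examples later in the article.

```shell
# Hypothetical helper: print the full, incremental, and log paths
# for a timestamp such as "2022-03-17 03:05" (requires GNU date).
backup_paths() {
    ts=$1
    root=/tidb-bak111/brdata
    echo "full=$root/brbak/brbak$(date -d "$ts" +%Y-%m-%d)/"
    echo "incr=$root/brbakincr/brbakincr$(date -d "$ts" +%Y-%m-%d-%H%M)/"
    echo "log=$root/brbaklog/log-$(date -d "$ts" +%Y-%m).log"
}

# Example:
#   backup_paths "2022-03-17 03:05"
```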

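5.3 BR backup prerequisites

Before backing up, make sure the GC lifetime has been extended so that the backup is not interrupted because data it still needs has been garbage-collected.
-- View the GC safe point (the earliest point in time that can still be recovered)
SELECT * FROM mysql.tidb WHERE variable_name = 'tikv_gc_safe_point';
-- View the GC retention time (as a system variable it is tidb_gc_life_time)
SHOW GLOBAL VARIABLES LIKE 'tidb_gc_life_time';
SET GLOBAL tidb_gc_life_time = '60m';
Before backing up, make sure no DDL is running on the TiDB cluster.
The backup target directory must be readable and writable. To test this, su tidb and inspect the permissions with stat.
The backup directory itself must be readable and writable.
The parent of the backup directory must be readable and writable.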
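The two directory requirements above can be checked with a few lines of shell. This is a sketch: `check_backup_dir` is a hypothetical helper, and it tests permissions for whichever user runs it, so it should be run as the tidb user to be meaningful.

```shell
# Hypothetical helper: succeed only if both the backup directory and its
# parent exist and are readable and writable by the current user.
check_backup_dir() {
    dir=$1
    parent=$(dirname "$dir")
    for d in "$parent" "$dir"; do
        if [ ! -d "$d" ] || [ ! -r "$d" ] || [ ! -w "$d" ]; then
            echo "not a readable and writable directory: $d" >&2
            return 1
        fi
    done
    echo "ok: $dir"
}

# Example (as the tidb user):
#   check_backup_dir /tidb-bak111/brdata/brbak
```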

5.4 Examples

5.4.1 Example 1: full backup of the tpcc database (about 2 million rows after the 1W benchmark)

time /usr/local0/webserver/tidb/tidb-toolkit-v5.3.0/bin/br backup db --db tpcc -s local:///tidb-bak111/brdata/brbak/brbak2022-03-17/ --pd 127.0.0.1:2379 --log-file /tidb-bak111/brdata/brbaklog/log-2022-03.log --concurrency 16

```shell
Detail BR log in /tidb-bak111/brdata/brbaklog/log-2022-03.log
Database backup <----------------------------------------> 100.00%
Checksum <----------------------------------------> 100.00%
[2022/03/18 03:05:55.571 +00:00] [INFO] [collector.go:65] ["Database backup success summary"] [total-ranges=30] [ranges-succeed=30] [ranges-failed=0] [backup-checksum=7.400938637s] [backup-fast-checksum=12.817097ms] [backup-total-ranges=19] [backup-total-regions=26] [total-take=43.891232093s] [backup-data-size(after-compressed)=142.3MB] [Size=142251394] [BackupTS=431901300942438402] [total-kv=5351334] [total-kv-size=452.6MB] [average-speed=10.31MB/s]

real 0m44.707s
user 0m1.182s
sys 0m1.332s
```

5.4.2 Example 2: incremental backup of the tpcc database

Get LAST_BACKUP_TS:

/usr/local0/webserver/tidb/tidb-toolkit-v5.3.0/bin/br validate decode --field="end-version" -s local:///tidb-bak111/brdata/brbak/brbak2022-03-17/ | tail -n1

    Detail BR log in /tmp/br.log.2022-03-18T03.09.30Z
    431901300942438402

Run the incremental backup.

A user table was added after the full backup was taken.

/usr/local0/webserver/tidb/tidb-toolkit-v5.3.0/bin/br backup db --db tpcc -s local:///tidb-bak111/brdata/brbakincr/brbakincr2022-03-17/ --pd 127.0.0.1:2379 --log-file /tidb-bak111/brdata/brbaklog/log-2022-03.log --lastbackupts 431901300942438402 --concurrency 16

    [root@localhost brdata]# cd brbakincr/brbakincr2022-03-17/
    [root@localhost brbakincr2022-03-17]# ls
    1_46_49_8f0ee22537c2f0e435fff0cbe11d0770af1ef06426d6c562d98e11dab944f2be_1647573256417_write.sst  backup.lock  backupmeta
    [root@localhost brbakincr2022-03-17]# ls -lhsrt
    total 64K
    4.0K -rw-r--r--. 1 root root 78 Mar 18 03:14 backup.lock
    4.0K -rw-r--r--. 1 tidb tidb 1.6K Mar 18 03:14 1_46_49_8f0ee22537c2f0e435fff0cbe11d0770af1ef06426d6c562d98e11dab944f2be_1647573256417_write.sst
    56K -rw-r--r--. 1 root root 53K Mar 18 03:14 backupmeta

5.5 Restore demonstration

Before restoring, make sure the target TiKV cluster is a fresh cluster.

1. Drop the tpcc database.

2. Restore from the full backup:

/usr/local0/webserver/tidb/tidb-toolkit-v5.3.0/bin/br restore db --db tpcc -s local:///tidb-bak111/brdata/brbak/brbak2022-03-17/ --pd 127.0.0.1:2379 --log-file /tidb-bak111/brdata/brbaklog/restore_local20220317.log

    [root@localhost brbakincr2022-03-17]# /usr/local0/webserver/tidb/tidb-toolkit-v5.3.0/bin/br restore db --db tpcc -s local:///tidb-bak111/brdata/brbak/brbak2022-03-17/ --pd 127.0.0.1:2379 --log-file /tidb-bak111/brdata/brbaklog/restore_local20220317.log
    Detail BR log in /tidb-bak111/brdata/brbaklog/restore_local20220317.log
    Database restore <----------------------------------------> 100.00%
    [2022/03/18 03:22:30.258 +00:00] [INFO] [collector.go:65] ["Database restore success summary"] [total-ranges=39] [ranges-succeed=39] [ranges-failed=0] [split-region=3.082383573s] [restore-checksum=44.490484521s] [restore-ranges=26] [total-take=45.096075962s] [restore-data-size(after-compressed)=142.3MB] [Size=142251394] [BackupTS=431901561486049282] [total-kv=5351334] [total-kv-size=452.6MB] [average-speed=10.04MB/s]

3. Check whether the tpcc database has been restored.

It has been restored, but the user table is missing.

4. Then restore the incremental backup:

/usr/local0/webserver/tidb/tidb-toolkit-v5.3.0/bin/br restore db --db tpcc -s local:///tidb-bak111/brdata/brbakincr/brbakincr2022-03-17/ --pd 127.0.0.1:2379 --log-file /tidb-bak111/brdata/brbaklog/restore_local20220317.log

    [root@localhost brbakincr2022-03-17]# /usr/local0/webserver/tidb/tidb-toolkit-v5.3.0/bin/br restore db --db tpcc -s local:///tidb-bak111/brdata/brbakincr/brbakincr2022-03-17/ --pd 127.0.0.1:2379 --log-file /tidb-bak111/brdata/brbaklog/restore_local20220317.log
    Detail BR log in /tidb-bak111/brdata/brbaklog/restore_local20220317.log
    Database restore <----------------------------------------> 100.00%
    [2022/03/18 03:25:51.248 +00:00] [INFO] [collector.go:65] ["Database restore success summary"] [total-ranges=11] [ranges-succeed=11] [ranges-failed=0] [restore-checksum=1.133574ms] [split-region=682.193616ms] [restore-ranges=1] [total-take=2.699661713s] [restore-data-size(after-compressed)=1.558kB] [Size=1558] [BackupTS=431901625370542081] [total-kv=3] [total-kv-size=87B] [average-speed=32.23B/s]
    [root@localhost brbakincr2022-03-17]#

Checking the tables again shows that the user table is now present.

5.6 Complete backup and restore scripts

5.6.1 Automatic backup script

```bash
#!/bin/bash
# ticde-backup.sh
#
# 1) Pass "bakall" to force a full backup:
#      ticde-backup.sh bakall
# 2) On Sundays a full backup is taken:
#      ticde-backup.sh
# 3) On all other days an incremental backup is taken:
#      ticde-backup.sh
#
# Run once a day at 01:00:00 with cron:
#      0 1 * * * ticde-backup.sh

# ---- const [do not modify] ----
DateFull=$(date -d today +"%Y-%m-%d %H:%M:%S")
DateMin=$(date -d today +"%Y-%m-%d")
DateIncr=$(date -d today +"%Y-%m-%d-%H%M")
DateLog=$(date -d today +"%Y-%m")
CurWeek=$(date +%u)
BakFullFlag=0

BakRoot=/tidb-bak111/brdata
BakRootData=${BakRoot}/brbak/brbak${DateMin}
BakRootDataIncr=${BakRoot}/brbakincr/brbakincr${DateIncr}
BakRootLog=${BakRoot}/brbaklog/log-${DateLog}.log

# "bakall" marks a forced full backup
CustormArg1=$1
LAST_BACKUP_TS=0

# ---- config [may be modified] ----
DBName=test
BrBin=/usr/local0/webserver/tidb/tidb-toolkit-v5.3.0/bin
Pd=127.0.0.1:2379
Concurrency=16
Ratelimit=256

# ---- logic ----
if [ "$1" = "bakall" ]; then
    # Ad-hoc full backup: add a time suffix to avoid a directory clash
    BakRootData=${BakRootData}-$(date -d today +"%H%M%S")
    BakFullFlag=1
fi

if [ "${CurWeek}" -eq 7 ]; then
    BakFullFlag=1
fi

if [ "${BakFullFlag}" -ne 1 ]; then
    # Locate the previous full backup (last Sunday's directory)
    LAST_BACKUP_DIR=${BakRoot}/brbak/brbak$(date -d "-${CurWeek} days" +%Y-%m-%d)
    if [ ! -d "${LAST_BACKUP_DIR}" ]; then
        echo "Previous full backup (last Sunday's directory ${LAST_BACKUP_DIR}) does not exist!"
        exit 1
    fi
    LAST_BACKUP_TS=$(${BrBin}/br validate decode --field="end-version" -s local://${LAST_BACKUP_DIR} | tail -n1)
    if [ -z "${LAST_BACKUP_TS}" ] || [ "${LAST_BACKUP_TS}" -lt 100 ]; then
        echo "Failed to read LAST_BACKUP_TS from the previous full backup (${LAST_BACKUP_DIR})!"
        exit 1
    fi

    # Incremental backup
    mkdir -p ${BakRootDataIncr}/
    chmod 777 ${BakRootDataIncr}/

    echo "Incremental backup to: ${BakRootDataIncr}/"

    ${BrBin}/br backup db \
        --db ${DBName} \
        -s local://${BakRootDataIncr}/ \
        --pd ${Pd} \
        --log-file ${BakRootLog} \
        --lastbackupts ${LAST_BACKUP_TS} \
        --ratelimit ${Ratelimit} \
        --concurrency ${Concurrency}
else
    # Full backup
    mkdir -p ${BakRootData}/
    chmod 777 ${BakRootData}/

    echo "Full backup to: ${BakRootData}/"

    ${BrBin}/br backup db \
        --db ${DBName} \
        -s local://${BakRootData}/ \
        --pd ${Pd} \
        --log-file ${BakRootLog} \
        --ratelimit ${Ratelimit} \
        --concurrency ${Concurrency}
fi

echo '--- end --- '
```

```shell
[root@localhost tidb]# sh ticde-backup.sh bakall
Full backup to: /tidb-bak111/brdata/brbak/brbak2022-03-18-065642/
Detail BR log in /tidb-bak111/brdata/brbaklog/log-2022-03.log
[2022/03/18 06:56:43.559 +00:00] [WARN] [backup.go:203] ["setting --ratelimit and --concurrency at the same time, ignoring --concurrency: --ratelimit forces sequential (i.e. concurrency = 1) backup"] [ratelimit=268.4MB/s] [concurrency-specified=16]
Database backup <----------------------------------------> 100.00%
Checksum <----------------------------------------> 100.00%
[2022/03/18 06:56:47.121 +00:00] [INFO] [collector.go:65] ["Database backup success summary"] [total-ranges=7] [ranges-succeed=7] [ranges-failed=0] [backup-checksum=451.606693ms] [backup-fast-checksum=8.038543ms] [backup-total-ranges=6] [backup-total-regions=6] [total-take=3.565844245s] [BackupTS=431904942621720580] [total-kv=90023] [total-kv-size=23.28MB] [average-speed=6.529MB/s] [backup-data-size(after-compressed)=13.52MB] [Size=13522252]
--- end ---

[root@localhost tidb]# sh ticde-backup.sh
Detail BR log in /tmp/br.log.2022-03-18T06.57.02Z
Incremental backup to: /tidb-bak111/brdata/brbakincr/brbakincr2022-03-18-0657/
Detail BR log in /tidb-bak111/brdata/brbaklog/log-2022-03.log
[2022/03/18 06:57:02.968 +00:00] [WARN] [backup.go:203] ["setting --ratelimit and --concurrency at the same time, ignoring --concurrency: --ratelimit forces sequential (i.e. concurrency = 1) backup"] [ratelimit=268.4MB/s] [concurrency-specified=16]
Database backup <----------------------------------------> 100.00%
Checksum <----------------------------------------> 100.00%
[2022/03/18 06:57:03.232 +00:00] [INFO] [collector.go:65] ["Database backup success summary"] [total-ranges=0] [ranges-succeed=0] [ranges-failed=0] [backup-checksum=6.562636ms] [backup-total-ranges=6] [backup-total-regions=6] [total-take=267.386045ms] [Result="Nothing to bakcup"] [Size=0] [BackupTS=431904947706003457]
--- end ---
```
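The script picks the most recent Sunday's full backup as the base for each incremental backup. That date arithmetic can be exercised on its own. The following is a sketch assuming GNU `date`; `last_full_backup_date` is a hypothetical helper, not part of the script.

```shell
# Hypothetical helper: for a given day, print the date of the full backup that
# an incremental backup would build on. A Sunday maps to itself, since the
# script takes a full backup on Sundays. Requires GNU date.
last_full_backup_date() {
    day=$1
    dow=$(date -d "$day" +%u)   # 1 = Monday ... 7 = Sunday
    if [ "$dow" -eq 7 ]; then
        date -d "$day" +%Y-%m-%d
    else
        date -d "$day $dow days ago" +%Y-%m-%d
    fi
}

# Example:
#   last_full_backup_date 2022-03-18
```

For Friday 2022-03-18 this prints 2022-03-13, so the incremental run shown above depends on a brbak2022-03-13 directory being present.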

5.6.2 Manual restore script

```shell
#!/bin/bash
# ticde-restore.sh
#
# 1) Pass the backup directory as the argument:
#      ticde-restore.sh /tidb-bak111/brdata/brbak/brbak2022-03-17/
#
# Restore the full backup first, then the incremental backup.

# ---- const [do not modify] ----
DateFull=$(date -d today +"%Y-%m-%d %H:%M:%S")
DateMin=$(date -d today +"%Y-%m-%d")
DateLog=$(date -d today +"%Y-%m")

BakRoot=/tidb-bak111/brdata
BakRootLog=${BakRoot}/brbaklog/restore-log-${DateLog}.log

# Directory of the backup to restore
BakDir=$1

# ---- config [may be modified] ----
DBName=test
BrBin=/usr/local0/webserver/tidb/tidb-toolkit-v5.3.0/bin
Pd=127.0.0.1:2379
Concurrency=16
Ratelimit=256

# ---- logic ----
if [ -z "${BakDir}" ] || [ ! -d "${BakDir}" ]; then
    echo "Please provide the directory that contains the backup"
    exit 1
fi

echo "Restoring from backup ${BakDir}/ into database: ${DBName}"

${BrBin}/br restore db \
    --db ${DBName} \
    -s local://${BakDir}/ \
    --pd ${Pd} \
    --log-file ${BakRootLog} \
    --ratelimit ${Ratelimit} \
    --concurrency ${Concurrency}

echo '--- end --- '
```

    [root@localhost tidb]# sh ticde-restore.sh
    Please provide the directory that contains the backup
    [root@localhost tidb]# sh ticde-restore.sh /tidb-bak111/brdata/brbak/brbak2022-03-17/
    Restoring from backup /tidb-bak111/brdata/brbak/brbak2022-03-17// into database: test
    Detail BR log in /tidb-bak111/brdata/brbaklog/restore-log-2022-03.log

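VI. Reflections and summary

6.1 Reflections

1. What kind of application is cluster111 suited for?

1) Under the TPC-C 1W benchmark (data on the order of 100,000 rows), QPS held at around 120 with no anomalies, and monitoring looked normal.

2) The hardware used in this test is roughly at the level of a small company website. It is sufficient for applications that do not hit the database heavily.

2. What should you watch for when running cluster111 in production on a single IP with multiple ports (1 tidb-server, 1 PD, 1 TiKV)?

1) Upgrade the machine configuration to meet the official requirements.

2) Benchmark against the real workload.

3) Schedule regular BR backups.

3. When should you scale out to a full cluster333 (3 tidb-server, 3 PD, 3 TiKV) or cluster3332 (3 tidb-server, 3 PD, 3 TiKV, 2 TiFlash)?

1) If cluster111 is running without problems, keep using it. Moving to a full cluster is not mandatory.

2) The data volume exceeds 10 GB (the scenario where BR backup shows its strengths).

3) Read and write load grows into a bottleneck (see SQL analysis on the Dashboard) that configuration tuning cannot resolve.

4) A large analytical workload calls for introducing TiFlash.

5) Scale-out needs planning. Early on, consider deploying the TiKV nodes on their own machines.

4. Open question: is cluster111 (1 tidb-server, 1 PD, 1 TiKV, each on a separate machine) necessarily faster than cluster333 (3 tidb-server, 3 PD, 3 TiKV, the official minimal deployment)?

My feeling is yes, but this still needs to be confirmed by reading the source code or by later testing.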

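6.2 Summary

1. TiDB is very friendly to its users.

2. Using the minimal topology cluster111 in production requires attention to data safety:

Schedule regular full plus incremental backups. With those in place it can be used for non-core services (benchmark on real hardware first).
Secure the database accounts and passwords.
Secure the default accounts and passwords of the Dashboard and the monitoring components.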

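VII. Community posts about minimal deployment, collected for further reading

Some questions about the minimal topology and single-machine deployment:

1. https://asktug.com/t/topic/574670/2 (Discussion: should TiDB be used on a single machine? TiDBers are welcome to join in)
2. https://asktug.com/t/topic/68315 (Using a single-machine cluster in production)
3. https://asktug.com/t/topic/67838 (Extending the minimal topology)
4. https://asktug.com/t/topic/183459/2 (Question about node counts in the topology and the availability of a minimal deployment)
5. https://docs.pingcap.com/zh/tidb/stable/quick-start-with-tidb#在单机上模拟部署生产环境集群 (Simulating a production cluster deployment on a single machine)
6. https://asktug.com/t/topic/575030 (Can monitoring_servers, grafana_servers, and alertmanager_servers be left out of the official minimal topology?)

Thanks!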
