Part 1: Deploying with TiDB Ansible
1. Install the system dependency packages
# yum -y install epel-release git curl sshpass && yum -y install python2-pip
2. Create the tidb user
# useradd -m -d /home/tidb tidb
# passwd tidb
3. Configure passwordless sudo for the tidb user
# visudo
tidb ALL=(ALL) NOPASSWD: ALL
4. Generate an SSH key as the tidb user
# su - tidb
$ ssh-keygen -t rsa
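When scripting this step, ssh-keygen can also run without prompts. A sketch (it writes to a temporary directory purely for illustration; a real run should keep the default ~/.ssh/id_rsa path):

```shell
# Non-interactive key generation (sketch): -N '' sets an empty passphrase,
# -f chooses the output path, -q suppresses the banner.
dir=$(mktemp -d)
ssh-keygen -t rsa -N '' -f "$dir/id_rsa" -q
ls "$dir"
rm -rf "$dir"
```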
5. Download TiDB Ansible
$ git clone -b v3.0.20 https://github.com/pingcap/tidb-ansible.git
6. Install TiDB Ansible and its dependencies
$ cd /home/tidb/tidb-ansible
$ sudo pip install -r ./requirements.txt
7. Configure SSH mutual trust and sudo rules for the deployment machines
$ cd /home/tidb/tidb-ansible
$ vi hosts.ini
[servers]
192.168.40.62
[all:vars]
username = tidb
ntp_server = pool.ntp.org
8. Create the tidb user on the target machines (the -k flag prompts for the root password; here the only target machine is the local host)
$ ansible-playbook -i hosts.ini create_users.yml -u root -k
9. Install the NTP service on the target machines
$ cd /home/tidb/tidb-ansible
$ ansible-playbook -i hosts.ini deploy_ntp.yml -u tidb -b
10. Configure the CPUfreq governor on the target machines (this VM does not support it, so the setting was skipped)
$ cpupower frequency-info --governors
$ cpupower frequency-set --governor performance
$ ansible -i hosts.ini all -m shell -a "cpupower frequency-set --governor performance" -u tidb -b
11. Add the ext4 mount options for the data disk on the target machines
The disk was already mounted manually, so only a check is needed:
$ ansible -i hosts.ini all -m shell -a "mount -t ext4" -u tidb -b
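For reference, bootstrap.yml also checks that the TiKV data disk is mounted with the nodelalloc option. A typical /etc/fstab entry (the UUID placeholder and the /u01 mount point here stand in for your own disk) looks something like:

```
UUID=<data-disk-uuid> /u01 ext4 defaults,nodelalloc,noatime 0 2
```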
12. Adjust the variables in the inventory.ini file
$ vi inventory.ini
## TiDB Cluster Part
[tidb_servers]
192.168.40.62
[tikv_servers]
192.168.40.62
[pd_servers]
192.168.40.62
[spark_master]
[spark_slaves]
[lightning_server]
[importer_server]
## Monitoring Part
# prometheus and pushgateway servers
[monitoring_servers]
192.168.40.62
[grafana_servers]
192.168.40.62
# node_exporter and blackbox_exporter servers
[monitored_servers]
192.168.40.62
[alertmanager_servers]
192.168.40.62
[kafka_exporter_servers]
## Binlog Part
[pump_servers]
[drainer_servers]
## Group variables
[pd_servers:vars]
# location_labels = ["zone","rack","host"]
## Global variables
[all:vars]
deploy_dir = /u01/deploy
## Connection
# ssh via normal user
ansible_user = tidb
cluster_name = test-cluster
tidb_version = v3.0.20
13. Deploy the single-machine TiDB cluster and start it
$ ansible -i inventory.ini all -m shell -a 'whoami'      # verify SSH mutual trust
$ ansible -i inventory.ini all -m shell -a 'whoami' -b   # verify sudo privileges
$ ansible-playbook local_prepare.yml
$ ansible-playbook bootstrap.yml
$ ansible-playbook deploy.yml
$ ansible-playbook start.yml
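After start.yml succeeds, a quick way to confirm the services came up is to probe their ports (a sketch; the port numbers assume the default tidb-ansible configuration, and 127.0.0.1 should be replaced with the deployment host when run from another machine):

```shell
# Probe the default single-host cluster ports: TiDB (4000/10080),
# PD (2379), TiKV (20160), Prometheus (9090), Grafana (3000).
for p in 4000 10080 2379 20160 9090 3000; do
  if (exec 3<>"/dev/tcp/127.0.0.1/$p") 2>/dev/null; then
    echo "port $p: open"
  else
    echo "port $p: closed"
  fi
done
```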
14. Connect to TiDB for a test, using the mysql client
$ mysql -u root -h192.168.40.62 -P4000
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 663
Server version: 5.7.25-TiDB-v3.0.20 MySQL Community Server (Apache License 2.0)
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MySQL [(none)]> show databases;
+--------------------+
| Database |
+--------------------+
| INFORMATION_SCHEMA |
| PERFORMANCE_SCHEMA |
| mysql |
| test |
+--------------------+
4 rows in set (0.00 sec)
MySQL [(none)]>
15. Connect to PD to inspect the cluster
$ resources/bin/pd-ctl -u http://192.168.40.62:2379 -i
» store
{
"count": 1,
"stores": [
{
"store": {
"id": 1,
"address": "192.168.40.62:20160",
"version": "3.0.20",
"state_name": "Up"
},
"status": {
"capacity": "59.91GiB",
"available": "58.26GiB",
"leader_count": 21,
"leader_weight": 1,
"leader_score": 21,
"leader_size": 21,
"region_count": 21,
"region_weight": 1,
"region_score": 21,
"region_size": 21,
"start_ts": "2021-03-07T17:30:08+08:00",
"last_heartbeat_ts": "2021-03-07T23:33:03.610519001+08:00",
"uptime": "6h2m55.610519001s"
}
}
]
}
» member
{
"header": {
"cluster_id": 6936842076787009464
},
"members": [
{
"name": "pd_tidbser1",
"member_id": 15687822745039720990,
"peer_urls": [
"http://192.168.40.62:2380"
],
"client_urls": [
"http://192.168.40.62:2379"
]
}
],
"leader": {
"name": "pd_tidbser1",
"member_id": 15687822745039720990,
"peer_urls": [
"http://192.168.40.62:2380"
],
"client_urls": [
"http://192.168.40.62:2379"
]
},
"etcd_leader": {
"name": "pd_tidbser1",
"member_id": 15687822745039720990,
"peer_urls": [
"http://192.168.40.62:2380"
],
"client_urls": [
"http://192.168.40.62:2379"
]
}
}
16. Log in to the monitoring platform
URL: http://192.168.40.62:3000
Username/password: admin/admin

Part 2: Main errors during deployment and how they were handled
1. Errors while initializing the system environment
-- The initialization playbook failed:
$ ansible-playbook bootstrap.yml
The console output was overwritten, so the details below come from the ansible.log file under log/.
Error: logical CPU core count below 8
2021-03-06 23:36:45,716 p=38235 u=tidb | fatal: [192.168.40.62]: FAILED! => {"changed": false, "msg": "This machine does not have sufficient CPU to run TiDB, at least 8 cores."}
2021-03-06 23:36:45,717 fail [192.168.40.62]: Ansible FAILED! => playbook: bootstrap.yml; TASK: check_system_optional : Preflight check - Check TiDB server's CPU; message: {"changed": false, "msg": "This machine does not have sufficient CPU to run TiDB, at least 8 cores."}
Error: RAM below 16000 MB
2021-03-06 23:48:23,218 p=43384 u=tidb | fatal: [192.168.40.62]: FAILED! => {"changed": false, "msg": "This machine does not have sufficient RAM to run TiDB, at least 16000 MB."}
2021-03-06 23:48:23,218 fail [192.168.40.62]: Ansible FAILED! => playbook: bootstrap.yml; TASK: check_system_optional : Preflight check - Check TiDB server's RAM; message: {"changed": false, "msg": "This machine does not have sufficient RAM to run TiDB, at least 16000 MB."}
Error: random-read IOPS of the disk below 40000
2021-03-07 00:18:42,295 p=48835 u=tidb | fatal: [192.168.40.62]: FAILED! => {"changed": false, "msg": "fio: randread iops of tikv_data_dir disk is too low: 16172 < 40000, it is strongly recommended to use SSD disks for TiKV and PD, or there might be performance issues."}
2021-03-07 00:18:42,296 fail [192.168.40.62]: Ansible FAILED! => playbook: bootstrap.yml; TASK: machine_benchmark : Preflight check - Does fio randread iops of tikv_data_dir disk meet requirement; message: {"changed": false, "msg": "fio: randread iops of tikv_data_dir disk is too low: 16172 < 40000, it is strongly recommended to use SSD disks for TiKV and PD, or there might be performance issues."}
-- There are several ways to resolve this:
1. Resize the VM to >= 16000 MB of RAM and >= 8 logical CPU cores, and switch to a faster SSD.
2. Lower the limits in the check configuration files so that they do not exceed the VM's current resources.
The CPU and memory limits are set in the following file:
vi /home/tidb/tidb-ansible/roles/check_system_optional/defaults/main.yml
# CPU
tidb_min_cpu: 8
tikv_min_cpu: 8
pd_min_cpu: 4
monitor_min_cpu: 4
# Mem
tidb_min_ram: 16000
tikv_min_ram: 16000
pd_min_ram: 8000
monitor_min_ram: 8000
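To see which of these limits the VM currently fails before rerunning bootstrap.yml, the CPU and memory checks can be reproduced locally. A sketch; the thresholds mirror the tidb-ansible defaults shown above:

```shell
# Compare local resources against tidb-ansible's default preflight limits.
min_cpu=8
min_ram_mb=16000
cpu=$(nproc)
ram_mb=$(( $(awk '/MemTotal/ {print $2}' /proc/meminfo) / 1024 ))
echo "cpu cores: $cpu (minimum $min_cpu)"
echo "ram: ${ram_mb} MB (minimum ${min_ram_mb} MB)"
[ "$cpu" -ge "$min_cpu" ] || echo "would fail the CPU check"
[ "$ram_mb" -ge "$min_ram_mb" ] || echo "would fail the RAM check"
```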
The disk I/O limits are set in the following file:
vi /home/tidb/tidb-ansible/roles/machine_benchmark/defaults/main.yml
---
fio_deploy_dir: "{{ tikv_data_dir }}/fio"
# fio randread iops
min_ssd_randread_iops: 40000    # lower this below the value measured in the error, i.e. below 16172
# fio mixed randread and sequential write
min_ssd_mix_randread_iops: 10000
min_ssd_mix_write_iops: 10000
# fio mixed randread and sequential write lat
max_ssd_mix_randread_lat: 250000
max_ssd_mix_write_lat: 30000
# fio test file size
benchmark_size: 10G
3. Skip the minimum system requirement checks entirely:
$ ansible-playbook bootstrap.yml --extra-vars "dev_mode=True"
2. Error when starting the TiDB cluster
-- Run the start command:
$ ansible-playbook start.yml
Startup fails with the following error: it times out waiting for region replication to complete, which points to PD:
PLAY [pd_servers[0]] *********************************************************************************************************************************************************************
TASK [wait for region replication complete] **********************************************************************************************************************************************
FAILED - RETRYING: wait for region replication complete (20 retries left).
...(identical retry messages for attempts 19 down to 2 omitted)...
FAILED - RETRYING: wait for region replication complete (1 retries left).
fatal: [192.168.40.62]: FAILED! => {"access_control_allow_headers": "accept, content-type, authorization", "access_control_allow_methods": "POST, GET, OPTIONS, PUT, DELETE", "access_control_allow_origin": "*", "attempts": 20, "changed": false, "connection": "close", "content_length": "94", "content_type": "application/json; charset=UTF-8", "cookies": {}, "date": "Sun, 07 Mar 2021 08:49:05 GMT", "json": {"is_initialized": false, "raft_bootstrap_time": "2021-03-07T16:45:30.740147728+08:00"}, "msg": "OK (94 bytes)", "redirected": false, "status": 200, "url": "http://192.168.40.62:2379/pd/api/v1/cluster/status"}
to retry, use: --limit @/home/tidb/tidb-ansible/retry_files/start.retry
PLAY RECAP *******************************************************************************************************************************************************************************
192.168.40.62 : ok=25 changed=7 unreachable=0 failed=1
localhost : ok=7 changed=4 unreachable=0 failed=0
ERROR MESSAGE SUMMARY ********************************************************************************************************************************************************************
[192.168.40.62]: Ansible FAILED! => playbook: start.yml; TASK: wait for region replication complete; message: {"access_control_allow_headers": "accept, content-type, authorization", "access_control_allow_methods": "POST, GET, OPTIONS, PUT, DELETE", "access_control_allow_origin": "*", "attempts": 20, "changed": false, "connection": "close", "content_length": "94", "content_type": "application/json; charset=UTF-8", "cookies": {}, "date": "Sun, 07 Mar 2021 08:49:05 GMT", "json": {"is_initialized": false, "raft_bootstrap_time": "2021-03-07T16:45:30.740147728+08:00"}, "msg": "OK (94 bytes)", "redirected": false, "status": 200, "url": "http://192.168.40.62:2379/pd/api/v1/cluster/status"}
Ask for help:
Contact us: support@pingcap.com
It seems that you encounter some problems. You can send an email to the above email address, attached with the tidb-ansible/inventory.ini and tidb-ansible/log/ansible.log files and the error message, or new issue on https://github.com/pingcap/tidb-ansible/issues. We'll try our best to help you deploy a TiDB cluster. Thanks. :-)
Solution:
With only one TiKV store, PD cannot place the default 3 replicas of each region on separate stores, so region replication never completes (is_initialized stays false). Edit the conf/pd.yml file and change the default max-replicas: 3 to 1:
vi conf/pd.yml
# max-replicas: 3
max-replicas: 1
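The edit can also be applied non-interactively with sed. A sketch, demonstrated on a throwaway copy so the substitution can be verified before touching the real conf/pd.yml:

```shell
# Demonstrate the max-replicas substitution on a temporary copy of the file.
tmp=$(mktemp)
printf 'replication:\n  max-replicas: 3\n' > "$tmp"
sed -i 's/max-replicas: 3/max-replicas: 1/' "$tmp"
grep 'max-replicas' "$tmp"    # the line now reads: max-replicas: 1
rm -f "$tmp"
```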
Clean up the failed deployment, then redeploy and start:
$ ansible-playbook unsafe_cleanup.yml
$ ansible-playbook deploy.yml
$ ansible-playbook start.yml
–END–
Last modified: 2022-04-26 14:47:06
[Copyright notice] This article is original content by a Modb (墨天轮) user. Reposts must credit the source (Modb), the article link, and the author; suspected plagiarism or infringement can be reported with evidence to contact@modb.pro.




