
Online Deployment of a TiDB Cluster on a Single Virtual Machine, with Error Handling

I. Deploying with TiDB Ansible

1. Install system dependency packages

# yum -y install epel-release git curl sshpass && yum -y install python2-pip 

2. Create the tidb user

# useradd -m -d /home/tidb tidb
# passwd tidb

3. Configure passwordless sudo for the tidb user

# visudo
tidb ALL=(ALL) NOPASSWD: ALL
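
As a quick sanity check (not part of the official steps), you can switch to the tidb user and run a sudo command; with the rule above it should succeed without prompting for a password:

# su - tidb
$ sudo -n whoami        # should print "root" with no password prompt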

4. Generate an SSH key as the tidb user

# su - tidb
$ ssh-keygen -t rsa

5. Download TiDB Ansible

$ git clone -b v3.0.20 https://github.com/pingcap/tidb-ansible.git

6. Install TiDB Ansible and its dependencies

$ cd /home/tidb/tidb-ansible 
$ sudo pip install -r ./requirements.txt
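
After installing the dependencies, you can confirm that Ansible is available and check which version was pulled in (tidb-ansible pins a specific 2.x release in its requirements.txt, so the reported version should match that file):

$ ansible --version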

7. Configure SSH mutual trust and sudo rules for the deployment machines

$ cd /home/tidb/tidb-ansible
$ vi hosts.ini
[servers]
192.168.40.62

[all:vars]
username = tidb
ntp_server = pool.ntp.org

8. Create users on the target machine, entering the target machine's root password when prompted (here the target machine is just the local host)

$ ansible-playbook -i hosts.ini create_users.yml -u root -k

9. Install the NTP service on the target machine

$ cd /home/tidb/tidb-ansible 
$ ansible-playbook -i hosts.ini deploy_ntp.yml -u tidb -b
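
To confirm that NTP is actually synchronizing on the target machine, you can run ntpstat through Ansible (a quick check; ntpstat must be available on the target host):

$ ansible -i hosts.ini all -m shell -a 'ntpstat' -u tidb -b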

10. Configure the CPUfreq governor on the target machine (not supported on this virtual machine, so the setting was skipped)

$ cpupower frequency-info --governors   
$ cpupower frequency-set --governor performance
$ ansible -i hosts.ini all -m shell -a "cpupower frequency-set --governor performance" -u tidb -b

11. Add ext4 mount options for the data disk on the target machine

The disk has already been mounted manually, so only a check is needed:
$ ansible -i hosts.ini all -m shell -a "mount -t ext4" -u tidb -b
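
For reference, the TiDB documentation recommends mounting the data disk with the nodelalloc and noatime options; a typical /etc/fstab entry looks like the line below (the device /dev/vdb1 and the mount point /u01 are assumptions, adjust them to the actual environment):

/dev/vdb1  /u01  ext4  defaults,nodelalloc,noatime  0  2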

12. Adjust the variables in the inventory.ini file

$ vi inventory.ini
## TiDB Cluster Part
[tidb_servers]
192.168.40.62

[tikv_servers]
192.168.40.62

[pd_servers]
192.168.40.62

[spark_master]

[spark_slaves]

[lightning_server]

[importer_server]

## Monitoring Part
# prometheus and pushgateway servers
[monitoring_servers]
192.168.40.62

[grafana_servers]
192.168.40.62

# node_exporter and blackbox_exporter servers
[monitored_servers]
192.168.40.62

[alertmanager_servers]
192.168.40.62

[kafka_exporter_servers]

## Binlog Part
[pump_servers]

[drainer_servers]

## Group variables
[pd_servers:vars]
# location_labels = ["zone","rack","host"]

## Global variables
[all:vars]
deploy_dir = /u01/deploy
## Connection
#ssh via normal user
ansible_user = tidb

cluster_name = test-cluster

tidb_version = v3.0.20

13. Deploy and start the TiDB cluster on the single machine

$ ansible -i inventory.ini all -m shell -a 'whoami'            --verify SSH mutual trust
$ ansible -i inventory.ini all -m shell -a 'whoami' -b         --verify passwordless sudo
$ ansible-playbook local_prepare.yml
$ ansible-playbook bootstrap.yml
$ ansible-playbook deploy.yml
$ ansible-playbook start.yml
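
Once start.yml completes, a quick sanity check is to confirm that the main services are listening on their default ports on the target machine (4000 for TiDB, 2379 for PD, 20160 for TiKV, 3000 for Grafana; adjust if you changed the defaults):

$ sudo ss -lnt | grep -E '4000|2379|20160|3000'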

14. Connect to TiDB with the mysql client to test

$ mysql -u root -h192.168.40.62 -P4000
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 663
Server version: 5.7.25-TiDB-v3.0.20 MySQL Community Server (Apache License 2.0)

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| INFORMATION_SCHEMA |
| PERFORMANCE_SCHEMA |
| mysql              |
| test               |
+--------------------+
4 rows in set (0.00 sec)

MySQL [(none)]> 
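As an optional smoke test (illustrative statements, not part of the original session), you can create a small table and read it back:

MySQL [(none)]> create database demo;
MySQL [(none)]> use demo;
MySQL [demo]> create table t (id int primary key, name varchar(20));
MySQL [demo]> insert into t values (1, 'tidb');
MySQL [demo]> select * from t;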

15. Connect to PD with pd-ctl to inspect the cluster

$ resources/bin/pd-ctl -u http://192.168.40.62:2379 -i
» store
{
  "count": 1,
  "stores": [
    {
      "store": {
        "id": 1,
        "address": "192.168.40.62:20160",
        "version": "3.0.20",
        "state_name": "Up"
      },
      "status": {
        "capacity": "59.91GiB",
        "available": "58.26GiB",
        "leader_count": 21,
        "leader_weight": 1,
        "leader_score": 21,
        "leader_size": 21,
        "region_count": 21,
        "region_weight": 1,
        "region_score": 21,
        "region_size": 21,
        "start_ts": "2021-03-07T17:30:08+08:00",
        "last_heartbeat_ts": "2021-03-07T23:33:03.610519001+08:00",
        "uptime": "6h2m55.610519001s"
      }
    }
  ]
}

» member
{
  "header": {
    "cluster_id": 6936842076787009464
  },
  "members": [
    {
      "name": "pd_tidbser1",
      "member_id": 15687822745039720990,
      "peer_urls": [
        "http://192.168.40.62:2380"
      ],
      "client_urls": [
        "http://192.168.40.62:2379"
      ]
    }
  ],
  "leader": {
    "name": "pd_tidbser1",
    "member_id": 15687822745039720990,
    "peer_urls": [
      "http://192.168.40.62:2380"
    ],
    "client_urls": [
      "http://192.168.40.62:2379"
    ]
  },
  "etcd_leader": {
    "name": "pd_tidbser1",
    "member_id": 15687822745039720990,
    "peer_urls": [
      "http://192.168.40.62:2380"
    ],
    "client_urls": [
      "http://192.168.40.62:2379"
    ]
  }
}

16. Log in to the monitoring dashboard

Address: http://192.168.40.62:3000
Username / password: admin/admin


II. Main Errors During Deployment and How They Were Handled

1. Errors while initializing the system environment

-- The initialization playbook reported errors when executed:
$ ansible-playbook bootstrap.yml

The console output had already scrolled away, so the errors were taken from the ansible.log file under the log directory.

Error: the machine has fewer than 8 logical CPU cores.
2021-03-06 23:36:45,716 p=38235 u=tidb |  fatal: [192.168.40.62]: FAILED! => {"changed": false, "msg": "This machine does not have sufficient CPU to run TiDB, at least 8 cores."}
2021-03-06 23:36:45,717 fail [192.168.40.62]: Ansible FAILED! => playbook: bootstrap.yml; TASK: check_system_optional : Preflight check - Check TiDB server's CPU; message: {"changed": false, "msg": "This machine does not have sufficient CPU to run TiDB, at least 8 cores."} 

Error: the machine has less than 16000 MB of RAM.
2021-03-06 23:48:23,218 p=43384 u=tidb |  fatal: [192.168.40.62]: FAILED! => {"changed": false, "msg": "This machine does not have sufficient RAM to run TiDB, at least 16000 MB."}
2021-03-06 23:48:23,218 fail [192.168.40.62]: Ansible FAILED! => playbook: bootstrap.yml; TASK: check_system_optional : Preflight check - Check TiDB server's RAM; message: {"changed": false, "msg": "This machine does not have sufficient RAM to run TiDB, at least 16000 MB."}

Error: the disk's random read IOPS is below 40000.
2021-03-07 00:18:42,295 p=48835 u=tidb |  fatal: [192.168.40.62]: FAILED! => {"changed": false, "msg": "fio: randread iops of tikv_data_dir disk is too low: 16172 < 40000, it is strongly recommended to use SSD disks for TiKV and PD, or there might be performance issues."}
2021-03-07 00:18:42,296 fail [192.168.40.62]: Ansible FAILED! => playbook: bootstrap.yml; TASK: machine_benchmark : Preflight check - Does fio randread iops of tikv_data_dir disk meet requirement; message: {"changed": false, "msg": "fio: randread iops of tikv_data_dir disk is too low: 16172 < 40000, it is strongly recommended to use SSD disks for TiKV and PD, or there might be performance issues."}

-- There are several ways to resolve these errors:
1. Resize the virtual machine to >= 16000 MB of RAM and >= 8 logical CPU cores, and switch to a better-performing SSD.
2. Lower the thresholds in the check configuration files so that they do not exceed the virtual machine's current resources.
The CPU and RAM thresholds live in the following file (an illustrative adjustment follows the listing):
vi /home/tidb/tidb-ansible/roles/check_system_optional/defaults/main.yml
# CPU
tidb_min_cpu: 8  
tikv_min_cpu: 8   
pd_min_cpu: 4    
monitor_min_cpu: 4  

# Mem
tidb_min_ram: 16000  
tikv_min_ram: 16000  
pd_min_ram: 8000     
monitor_min_ram: 8000  
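
For example, on a smaller virtual machine the thresholds could be lowered to match its actual resources (illustrative values only, not necessarily the ones used in this deployment):

# CPU
tidb_min_cpu: 4
tikv_min_cpu: 4
pd_min_cpu: 2
monitor_min_cpu: 2

# Mem
tidb_min_ram: 6000
tikv_min_ram: 6000
pd_min_ram: 4000
monitor_min_ram: 4000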

The disk I/O thresholds live in the following file (again, an illustrative adjustment follows the listing):
vi /home/tidb/tidb-ansible/roles/machine_benchmark/defaults/main.yml 
---

fio_deploy_dir: "{{ tikv_data_dir }}/fio"

# fio randread iops
min_ssd_randread_iops: 40000   # lower this to below the value reported in the error, i.e. below 16172

# fio mixed randread and sequential write
min_ssd_mix_randread_iops: 10000
min_ssd_mix_write_iops: 10000

# fio mixed randread and sequential write lat
max_ssd_mix_randread_lat: 250000
max_ssd_mix_write_lat: 30000

# fio test file size
benchmark_size: 10G
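
Likewise, to get past the disk reported above (16172 random-read IOPS), the randread threshold can be dropped below that value, for example:

min_ssd_randread_iops: 15000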

3. Skip the minimum system requirement checks entirely:
$ ansible-playbook bootstrap.yml --extra-vars "dev_mode=True"

2. Error when starting the TiDB cluster

-- Run the start command:
$ ansible-playbook start.yml

The startup failed with the following error:
The task is waiting for region replication to complete, which points to PD. Because max-replicas defaults to 3 while this single machine runs only one TiKV store, PD can never place all three replicas, so is_initialized stays false and the check keeps retrying until it gives up.
PLAY [pd_servers[0]] *********************************************************************************************************************************************************************

TASK [wait for region replication complete] **********************************************************************************************************************************************
FAILED - RETRYING: wait for region replication complete (20 retries left).
FAILED - RETRYING: wait for region replication complete (19 retries left).
FAILED - RETRYING: wait for region replication complete (18 retries left).
FAILED - RETRYING: wait for region replication complete (17 retries left).
FAILED - RETRYING: wait for region replication complete (16 retries left).
FAILED - RETRYING: wait for region replication complete (15 retries left).
FAILED - RETRYING: wait for region replication complete (14 retries left).
FAILED - RETRYING: wait for region replication complete (13 retries left).
FAILED - RETRYING: wait for region replication complete (12 retries left).
FAILED - RETRYING: wait for region replication complete (11 retries left).
FAILED - RETRYING: wait for region replication complete (10 retries left).
FAILED - RETRYING: wait for region replication complete (9 retries left).
FAILED - RETRYING: wait for region replication complete (8 retries left).
FAILED - RETRYING: wait for region replication complete (7 retries left).
FAILED - RETRYING: wait for region replication complete (6 retries left).
FAILED - RETRYING: wait for region replication complete (5 retries left).
FAILED - RETRYING: wait for region replication complete (4 retries left).
FAILED - RETRYING: wait for region replication complete (3 retries left).
FAILED - RETRYING: wait for region replication complete (2 retries left).
FAILED - RETRYING: wait for region replication complete (1 retries left).
fatal: [192.168.40.62]: FAILED! => {"access_control_allow_headers": "accept, content-type, authorization", "access_control_allow_methods": "POST, GET, OPTIONS, PUT, DELETE", "access_control_allow_origin": "*", "attempts": 20, "changed": false, "connection": "close", "content_length": "94", "content_type": "application/json; charset=UTF-8", "cookies": {}, "date": "Sun, 07 Mar 2021 08:49:05 GMT", "json": {"is_initialized": false, "raft_bootstrap_time": "2021-03-07T16:45:30.740147728+08:00"}, "msg": "OK (94 bytes)", "redirected": false, "status": 200, "url": "http://192.168.40.62:2379/pd/api/v1/cluster/status"}
        to retry, use: --limit @/home/tidb/tidb-ansible/retry_files/start.retry

PLAY RECAP *******************************************************************************************************************************************************************************
192.168.40.62              : ok=25   changed=7    unreachable=0    failed=1   
localhost                  : ok=7    changed=4    unreachable=0    failed=0   


ERROR MESSAGE SUMMARY ********************************************************************************************************************************************************************
[192.168.40.62]: Ansible FAILED! => playbook: start.yml; TASK: wait for region replication complete; message: {"access_control_allow_headers": "accept, content-type, authorization", "access_control_allow_methods": "POST, GET, OPTIONS, PUT, DELETE", "access_control_allow_origin": "*", "attempts": 20, "changed": false, "connection": "close", "content_length": "94", "content_type": "application/json; charset=UTF-8", "cookies": {}, "date": "Sun, 07 Mar 2021 08:49:05 GMT", "json": {"is_initialized": false, "raft_bootstrap_time": "2021-03-07T16:45:30.740147728+08:00"}, "msg": "OK (94 bytes)", "redirected": false, "status": 200, "url": "http://192.168.40.62:2379/pd/api/v1/cluster/status"}
Ask for help:
Contact us: support@pingcap.com
It seems that you encounter some problems. You can send an email to the above email address, attached with the tidb-ansible/inventory.ini and tidb-ansible/log/ansible.log files and the error message, or new issue on https://github.com/pingcap/tidb-ansible/issues. We'll try our best to help you deploy a TiDB cluster. Thanks. :-)
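
Before or after applying a fix, the same PD status endpoint that the playbook polls (the URL appears in the error above) can be queried directly to see whether is_initialized has turned true:

$ curl http://192.168.40.62:2379/pd/api/v1/cluster/status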

Solution:
Edit the conf/pd.yml file and change the default max-replicas: 3 to 1:
vi conf/pd.yml
# max-replicas: 3
max-replicas: 1

Redeploy and restart the cluster:
$ ansible-playbook unsafe_cleanup.yml
$ ansible-playbook deploy.yml
$ ansible-playbook start.yml
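
After the cluster comes up, the new replication setting can be confirmed through pd-ctl, using the same interactive mode as earlier (a quick check; config show replication should report max-replicas: 1):

$ resources/bin/pd-ctl -u http://192.168.40.62:2379 -i
» config show replication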

–END–
