暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

TiDB数据导入工具Lightning

原创 键盘丐 2022-04-05
569

概述

    TiDB Lightning 是一个将全量数据高速导入到 TiDB 集群的工具。

    TiDB Lightning 有以下两个主要的使用场景:

  • 迅速导入大量新数据。

  • 恢复所有备份数据。

    目前,TiDB Lightning 支持:

        导入 Dumpling、CSV 或 Amazon Aurora Parquet 输出格式的数据源。

        从本地盘或 Amazon S3 云盘读取数据。

参数详解

        详见官网TiDB Lightning 配置参数 | PingCAP Docs

安装

        因为Lightning和BR相同,都在tidb-toolkit包中,安装请参考上一篇BR的部署。

导入Dumpling导出的MySQL数据库

  • 查看上个实验dumpling导出文件       


  • 确保当前TiDB数据库群集无spider数据库

[root@node1 spider]# tiup cluster display tidb-test
Starting component `cluster`: /root/.tiup/components/cluster/v1.6.1/tiup-cluster display tidb-test
Cluster type:       tidb
Cluster name:       tidb-test
Cluster version:    v5.2.1
Deploy user:        tidb
SSH type:           builtin
Dashboard URL:      http://192.168.59.29:2379/dashboard
ID                   Role          Host           Ports        OS/Arch       Status  Data Dir                      Deploy Dir
--                   ----          ----           -----        -------       ------  --------                      ----------
192.168.59.26:9093   alertmanager  192.168.59.26  9093/9094    linux/x86_64  Up      /tidb-data/alertmanager-9093  /tidb-deploy/alertmanager-9093
192.168.59.26:3000   grafana       192.168.59.26  3000         linux/x86_64  Up      -                             /tidb-deploy/grafana-3000
192.168.59.27:2379   pd            192.168.59.27  2379/2380    linux/x86_64  Up      /tidb-data/pd-2379            /tidb-deploy/pd-2379
192.168.59.28:2379   pd            192.168.59.28  2379/2380    linux/x86_64  Up|L    /tidb-data/pd-2379            /tidb-deploy/pd-2379
192.168.59.29:2379   pd            192.168.59.29  2379/2380    linux/x86_64  Up|UI   /tidb-data/pd-2379            /tidb-deploy/pd-2379
192.168.59.26:9090   prometheus    192.168.59.26  9090         linux/x86_64  Up      /tidb-data/prometheus-9090    /tidb-deploy/prometheus-9090
192.168.59.27:4000   tidb          192.168.59.27  4000/10080   linux/x86_64  Up      -                             /tidb-deploy/tidb-4000
192.168.59.28:4000   tidb          192.168.59.28  4000/10080   linux/x86_64  Up      -                             /tidb-deploy/tidb-4000
192.168.59.29:4000   tidb          192.168.59.29  4000/10080   linux/x86_64  Up      -                             /tidb-deploy/tidb-4000
192.168.59.27:20160  tikv          192.168.59.27  20160/20180  linux/x86_64  Up      /tidb-data/tikv-20160         /tidb-deploy/tikv-20160
192.168.59.28:20160  tikv          192.168.59.28  20160/20180  linux/x86_64  Up      /tidb-data/tikv-20160         /tidb-deploy/tikv-20160
192.168.59.29:20160  tikv          192.168.59.29  20160/20180  linux/x86_64  Up      /tidb-data/tikv-20160         /tidb-deploy/tikv-20160
Total nodes: 12
[root@node1 spider]# mysql -h 192.168.59.27 -P4000 -uroot
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 67
Server version: 5.7.25-TiDB-v5.2.1 TiDB Server (Apache License 2.0) Community Edition, MySQL 5.7 compatible

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| INFORMATION_SCHEMA |
| METRICS_SCHEMA     |
| PERFORMANCE_SCHEMA |
| chnbs              |
| mysql              |
| test               |
+--------------------+
6 rows in set (0.00 sec)
  • 编辑TiDB Lightning 工具的配置文件

[root@node1 ~]# cat tidb-lightning.toml
[lightning]
# 日志
level = "info"
file = "tidb-lightning.log"
[tikv-importer]
# 选择使用的local 后端
backend = "local"
# 设置排序的键值对的临时存放地址,目标路径需要是一个空目录
sorted-kv-dir = "/tmp"
[mydumper]
# 源数据目录。
data-source-dir = "/tmp/spider/"
[tidb]
# 目标集群的信息
host = "192.168.59.27"
port = 4000
user = "root"
# 集群pd 的地址
pd-addr = "192.168.59.28:2379"
  • 执行导入命令并检查导入情况

[root@node1 ~]# tidb-lightning -config tidb-lightning.toml
Verbose debug logs will be written to tidb-lightning.log

tidb lightning exit
[root@node1 ~]# tail -f tidb-lightning.log
[2021/11/09 12:44:18.942 +08:00] [INFO] [pd.go:427] ["resume scheduler successful"] [scheduler=balance-region-scheduler]
[2021/11/09 12:44:18.943 +08:00] [INFO] [pd.go:427] ["resume scheduler successful"] [scheduler=balance-hot-region-scheduler]
[2021/11/09 12:44:18.944 +08:00] [INFO] [pd.go:427] ["resume scheduler successful"] [scheduler=balance-leader-scheduler]
[2021/11/09 12:44:18.944 +08:00] [INFO] [pd.go:518] ["restoring config"] [config="{\"enable-cross-table-merge\":\"true\",\"enable-debug-metrics\":\"false\",\"enable-joint-consensus\":\"true\",\"enable-location-replacement\":\"true\",\"enable-make-up-replica\":\"true\",\"enable-one-way-merge\":\"false\",\"enable-remove-down-replica\":\"true\",\"enable-remove-extra-replica\":\"true\",\"enable-replace-offline-replica\":\"true\",\"high-space-ratio\":0.7,\"hot-region-cache-hits-threshold\":3,\"hot-region-schedule-limit\":4,\"leader-schedule-limit\":4,\"leader-schedule-policy\":\"count\",\"low-space-ratio\":0.8,\"max-merge-region-keys\":200000,\"max-merge-region-size\":20,\"max-pending-peer-count\":16,\"max-snapshot-count\":3,\"max-store-down-time\":\"30m0s\",\"merge-schedule-limit\":8,\"patrol-region-interval\":\"100ms\",\"region-schedule-limit\":2048,\"region-score-formula-version\":\"v2\",\"replica-schedule-limit\":64,\"scheduler-max-waiting-operator\":5,\"schedulers-payload\":null,\"schedulers-v2\":[{\"args\":null,\"args-payload\":\"\",\"disable\":false,\"type\":\"balance-region\"},{\"args\":null,\"args-payload\":\"\",\"disable\":false,\"type\":\"balance-leader\"},{\"args\":null,\"args-payload\":\"\",\"disable\":false,\"type\":\"hot-region\"}],\"split-merge-interval\":\"1h0m0s\",\"store-limit\":{\"1\":{\"add-peer\":15,\"remove-peer\":15},\"2\":{\"add-peer\":15,\"remove-peer\":15},\"3\":{\"add-peer\":15,\"remove-peer\":15}},\"store-limit-mode\":\"manual\",\"tolerant-size-ratio\":0}"]
[2021/11/09 12:44:18.951 +08:00] [INFO] [restore.go:1081] ["add back PD leader&region schedulers"]
[2021/11/09 12:44:18.951 +08:00] [INFO] [restore.go:1794] ["skip full compaction"]
[2021/11/09 12:44:18.989 +08:00] [INFO] [restore.go:1979] ["clean checkpoints start"] [keepAfterSuccess=false] [taskID=1636432966245131240]
[2021/11/09 12:44:18.989 +08:00] [INFO] [restore.go:1986] ["clean checkpoints completed"] [keepAfterSuccess=false] [taskID=1636432966245131240] [takeTime=58.257µs] []
[2021/11/09 12:44:18.989 +08:00] [INFO] [restore.go:342] ["the whole procedure completed"] [takeTime=1m32.632600527s] []
[2021/11/09 12:44:18.989 +08:00] [INFO] [main.go:94] ["tidb lightning exit"]
^C
[root@node1 ~]# mysql -h 192.168.59.27 -P4000 -uroot
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 73
Server version: 5.7.25-TiDB-v5.2.1 TiDB Server (Apache License 2.0) Community Edition, MySQL 5.7 compatible

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| INFORMATION_SCHEMA |
| METRICS_SCHEMA     |
| PERFORMANCE_SCHEMA |
| chnbs              |
| mysql              |
| spider             |
| test               |
+--------------------+
7 rows in set (0.00 sec)

mysql> use spider
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed

mysql> show tables;
+---------------------+
| Tables_in_spider    |
+---------------------+
| sp_analyze_conf     |
| sp_area_code        |
| sp_article_category |
| sp_business_code    |
| sp_classify_result  |
| sp_file_cache       |
| sp_gather_conf      |
| sp_meta_basic       |
| sp_meta_detail      |
| sp_parameter        |
| sp_path_regex       |
| sp_process_queue    |
| sp_resource_file    |
+---------------------+
13 rows in set (0.00 sec)

mysql> select count(*) from sp_area_code;
+----------+
| count(*) |
+----------+
|     3438 |
+----------+
1 row in set (0.00 sec)

断点续传实验

  • 先删除spider数据库


mysql> use mysql
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| INFORMATION_SCHEMA |
| METRICS_SCHEMA     |
| PERFORMANCE_SCHEMA |
| chnbs              |
| mysql              |
| spider             |
| test               |
+--------------------+
7 rows in set (0.00 sec)

mysql> drop database spider;
Query OK, 0 rows affected (0.31 sec)

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| INFORMATION_SCHEMA |
| METRICS_SCHEMA     |
| PERFORMANCE_SCHEMA |
| chnbs              |
| mysql              |
| test               |
+--------------------+
6 rows in set (0.00 sec)
  • 编辑配置文件,添加checkpoint区块,具体内容如下

[root@node1 ~]# cat tidb-lightning.toml
[lightning]
# 日志
level = "info"
file = "tidb-lightning.log"
[tikv-importer]
# 选择使用的local 后端
backend = "local"
# 设置排序的键值对的临时存放地址,目标路径需要是一个空目录
sorted-kv-dir = "/tmp"
[mydumper]
# 源数据目录。
data-source-dir = "/tmp/spider/"
[tidb]
# 目标集群的信息
host = "192.168.59.27"
port = 4000
user = "root"
# 集群pd 的地址
pd-addr = "192.168.59.28:2379"
[checkpoint]
# 启用断点续传。
# 导入时,TiDB Lightning 会记录当前进度。
# 若 TiDB Lightning 或其他组件异常退出,在重启时可以避免重复再导入已完成的数据。
enable = true
# 存储断点的方式
#  - file:存放在本地文件系统(要求 v2.1.1 或以上)
#  - mysql:存放在兼容 MySQL 的数据库服务器
driver = "file"
  • 执行导入命令tidb-lightning -config tidb-lightning.toml,中途Ctrl+C强制终断


[root@node1 ~]# tidb-lightning -config tidb-lightning.toml
Verbose debug logs will be written to tidb-lightning.log

^Ctidb lightning exit
[root@node1 ~]# mysql -uroot -h192.168.59.27 -P4000
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 117
Server version: 5.7.25-TiDB-v5.2.1 TiDB Server (Apache License 2.0) Community Edition, MySQL 5.7 compatible

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| INFORMATION_SCHEMA |
| METRICS_SCHEMA     |
| PERFORMANCE_SCHEMA |
| chnbs              |
| mysql              |
| spider             |
| test               |
+--------------------+
7 rows in set (0.00 sec)

mysql> use spider;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+---------------------+
| Tables_in_spider    |
+---------------------+
| sp_analyze_conf     |
| sp_article_category |
| sp_business_code    |
| sp_gather_conf      |
| sp_parameter        |
| sp_path_regex       |
| sp_resource_file    |
+---------------------+
7 rows in set (0.00 sec)

mysql> select count(*) from sp_parameter;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.01 sec)

        可以看出部分表结构已经导入,但数据还没有导入,如下是日志

[2021/11/09 13:21:01.630 +08:00] [WARN] [config.go:483] ["currently only per-task configuration can be applied, global configuration changes can only be made on startup"] ["global config changes"="[lightning.level,lightning.file]"]
[2021/11/09 13:21:01.630 +08:00] [INFO] [info.go:40] ["Welcome to TiDB-Lightning"] [release-version=v5.0.1] [git-hash=b3bd5495369244cfb1d3f2c49763a7fdb55d4ecd] [git-branch=heads/refs/tags/v5.0.1] [go-version=go1.13] [utc-build-time="2021-04-23 06:02:57"] [race-enabled=false]
[2021/11/09 13:21:01.630 +08:00] [INFO] [lightning.go:218] [cfg] [cfg="{\"id\":1636435261630917946,\"lightning\":{\"table-concurrency\":6,\"index-concurrency\":2,\"region-concurrency\":2,\"io-concurrency\":5,\"check-requirements\":true},\"tidb\":{\"host\":\"192.168.59.27\",\"port\":4000,\"user\":\"root\",\"status-port\":10080,\"pd-addr\":\"192.168.59.28:2379\",\"sql-mode\":\"ONLY_FULL_GROUP_BY,NO_AUTO_CREATE_USER\",\"tls\":\"false\",\"security\":{\"ca-path\":\"\",\"cert-path\":\"\",\"key-path\":\"\",\"redact-info-log\":false},\"max-allowed-packet\":67108864,\"distsql-scan-concurrency\":15,\"build-stats-concurrency\":20,\"index-serial-scan-concurrency\":20,\"checksum-table-concurrency\":2},\"checkpoint\":{\"schema\":\"tidb_lightning_checkpoint\",\"driver\":\"file\",\"enable\":true,\"keep-after-success\":false},\"mydumper\":{\"read-block-size\":65536,\"batch-size\":107374182400,\"batch-import-ratio\":0,\"data-source-dir\":\"file:///tmp/spider\",\"character-set\":\"auto\",\"csv\":{\"separator\":\",\",\"delimiter\":\"\\\"\",\"null\":\"\\\\N\",\"header\":true,\"trim-last-separator\":false,\"not-null\":false,\"backslash-escape\":true},\"max-region-size\":268435456,\"filter\":[\"*.*\",\"!mysql.*\",\"!sys.*\",\"!INFORMATION_SCHEMA.*\",\"!PERFORMANCE_SCHEMA.*\",\"!METRICS_SCHEMA.*\",\"!INSPECTION_SCHEMA.*\"],\"files\":null,\"no-schema\":false,\"case-sensitive\":false,\"strict-format\":false,\"default-file-rules\":true},\"tikv-importer\":{\"addr\":\"\",\"backend\":\"local\",\"on-duplicate\":\"replace\",\"max-kv-pairs\":4096,\"send-kv-pairs\":32768,\"region-split-size\":100663296,\"sorted-kv-dir\":\"/tmp\",\"disk-quota\":60684017664,\"range-concurrency\":16,\"engine-mem-cache-size\":536870912,\"local-writer-mem-cache-size\":134217728},\"post-restore\":{\"checksum\":\"required\",\"analyze\":\"optional\",\"level-1-compact\":false,\"post-process-at-last\":true,\"compact\":false},\"cron\":{\"switch-mode\":\"5m0s\",\"log-progress\":\"5m0s\",\"check-disk-quota\":\"0s\"},\"routes\":null,\"security\":{\"ca-path\":\"\",\"cert-path\":\"\",\"key-path\":\"\",\"redact-info-log\":false},\"black-white-list\":{\"do-tables\":null,\"do-dbs\":null,\"ignore-tables\":null,\"ignore-dbs\":null}}"]
[2021/11/09 13:21:01.634 +08:00] [INFO] [lightning.go:285] ["load data source start"]
[2021/11/09 13:21:01.635 +08:00] [INFO] [loader.go:299] ["[loader] file is filtered by file router"] [path=metadata]
[2021/11/09 13:21:01.636 +08:00] [INFO] [lightning.go:288] ["load data source completed"] [takeTime=1.046073ms] []
[2021/11/09 13:21:01.636 +08:00] [INFO] [checkpoints.go:954] ["open checkpoint file failed, going to create a new one"] [path=/tmp/tidb_lightning_checkpoint.pb] [error="open /tmp/tidb_lightning_checkpoint.pb: no such file or directory"]
[2021/11/09 13:21:01.644 +08:00] [INFO] [restore.go:312] ["the whole procedure start"]
[2021/11/09 13:21:01.647 +08:00] [INFO] [restore.go:601] ["restore all schema start"]
[2021/11/09 13:21:02.743 +08:00] [INFO] [main.go:46] ["got signal to exit"] [signal=interrupt]
[2021/11/09 13:21:02.743 +08:00] [INFO] [restore.go:327] ["task canceled"] [step=2] [error="context canceled"]
[2021/11/09 13:21:02.743 +08:00] [INFO] [restore.go:342] ["the whole procedure completed"] [takeTime=1.099503617s] []
[2021/11/09 13:21:02.744 +08:00] [INFO] [main.go:94] ["tidb lightning exit"]
  • 执行断点续传的控制命令

[root@node1 ~]# tidb-lightning-ctl --config tidb-lightning.toml --checkpoint-error-destroy=all
[root@node1 ~]# tidb-lightning-ctl --config tidb-lightning.toml --checkpoint-remove=all
  • 再执行导入

[root@node1 ~]# tidb-lightning -config tidb-lightning.toml
Verbose debug logs will be written to tidb-lightning.log

tidb lightning exit
[root@node1 ~]# mysql -uroot -h192.168.59.27 -P4000
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 125
Server version: 5.7.25-TiDB-v5.2.1 TiDB Server (Apache License 2.0) Community Edition, MySQL 5.7 compatible

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use spider;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+---------------------+
| Tables_in_spider    |
+---------------------+
| sp_analyze_conf     |
| sp_area_code        |
| sp_article_category |
| sp_business_code    |
| sp_classify_result  |
| sp_file_cache       |
| sp_gather_conf      |
| sp_meta_basic       |
| sp_meta_detail      |
| sp_parameter        |
| sp_path_regex       |
| sp_process_queue    |
| sp_resource_file    |
+---------------------+
13 rows in set (0.01 sec)

mysql> select count(*) from sp_meta_detail;
+----------+
| count(*) |
+----------+
|    18933 |
+----------+
1 row in set (0.02 sec)

        从以上过程看出数据已经导入,确认是否全部成功,还需要检查日志内容。


「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

文章被以下合辑收录

评论