
ClickHouse Operations Series: Notes on ClickHouse Backup and Restore


I have recently been studying ClickHouse in preparation for operating a core ClickHouse production database. This post shares some of what I have learned about ClickHouse backup and restore. The content is still fairly basic; after further study I will follow up with a more detailed write-up.

1. Backup Methods and Tools

ClickHouse provides several backup and restore mechanisms, each introduced below.

1.1 Copying the source data elsewhere

For example, using Kafka or another message middleware, the same data stream that is written into ClickHouse is delivered through a durable queue to additional subscribers, which read it and keep a copy in cold storage.
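
As a minimal sketch of the usual ClickHouse pattern for such a subscriber (the broker address, topic, consumer group, and table names below are all hypothetical), a Kafka engine table plus a materialized view persists the stream into a cheap local table:

clickhouse-client --user default --password clickhouse --multiquery <<'SQL'
CREATE DATABASE IF NOT EXISTS cold;

-- Kafka engine table reading the same topic the primary pipeline consumes
CREATE TABLE cold.events_queue (ts DateTime, payload String)
ENGINE = Kafka('kafka-broker:9092', 'events', 'cold-backup-group', 'JSONEachRow');

-- destination table on inexpensive storage
CREATE TABLE cold.events (ts DateTime, payload String)
ENGINE = MergeTree ORDER BY ts;

-- materialized view that moves rows from the queue into the destination
CREATE MATERIALIZED VIEW cold.events_mv TO cold.events AS
SELECT ts, payload FROM cold.events_queue;
SQL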

1.2 Filesystem snapshots

Some local filesystems provide snapshot functionality, but snapshots may not be the best choice for serving live queries. A workable approach is to create additional replicas on such a filesystem and exclude them from the Distributed tables used for SELECT queries, so that snapshots on those replicas stay out of reach of any data-modifying queries. The drawback is the extra disks required, which makes this approach comparatively expensive.
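
For instance, if the ClickHouse data directory sits on an LVM logical volume, a point-in-time snapshot might look like the sketch below (the volume group vg0 and volume names are hypothetical; XFS may additionally need -o nouuid when mounting):

## create, mount, copy off, and discard an LVM snapshot (hypothetical names)
lvcreate --size 10G --snapshot --name ch-data-snap /dev/vg0/ch-data
mount -o ro /dev/vg0/ch-data-snap /mnt/ch-snap
rsync -a /mnt/ch-snap/ /cold-storage/clickhouse-$(date +%Y%m%d)/
umount /mnt/ch-snap && lvremove -y /dev/vg0/ch-data-snap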

1.3 clickhouse-copier

clickhouse-copier is a versatile tool. It was originally created to re-shard petabyte-sized tables, and because it can reliably copy data between ClickHouse tables and clusters, it can also be used for backup and restore.

For smaller volumes of data, a simple INSERT INTO ... SELECT ... into a remote table also works, as sketched below.
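
A minimal sketch with the remote() table function (the target host and credentials are placeholders, and the table is assumed to already exist on the target):

## copy a small table to another server; 'backup-host' and the credentials are placeholders
clickhouse-client --user default --password clickhouse --query "
  INSERT INTO FUNCTION remote('backup-host:9000', db1.dish, 'default', 'clickhouse')
  SELECT * FROM db1.dish"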

1.4 Part manipulation

ClickHouse allows the ALTER TABLE ... FREEZE PARTITION ... query to create a local copy of a table partition. This is implemented with hardlinks into the /var/lib/clickhouse/shadow/ folder, so it usually consumes no extra disk space for old data. The file copies it creates are not managed by the ClickHouse server afterwards.

Running ALTER TABLE ... FREEZE PARTITION ... manually (freeze here means freezing the partition) has the advantage that it does not affect the running ClickHouse server; the data is backed up under the path mentioned above. To restore, use ALTER TABLE ... ATTACH PARTITION ... against the backup, as in the sketch below.
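
A hedged sketch of the round trip; the table, partition value, and increment directory 1 are placeholders, and the paths assume an Ordinary database under the default /var/lib/clickhouse layout (Atomic databases place the frozen parts under shadow/<N>/store/<uuid> instead):

## freeze one partition; hardlinked copies appear under /var/lib/clickhouse/shadow/<N>/
clickhouse-client --query "ALTER TABLE db1.menu FREEZE PARTITION '202405'"

## to restore, copy the frozen parts into the table's detached/ directory ...
cp -rl /var/lib/clickhouse/shadow/1/data/db1/menu/* \
       /var/lib/clickhouse/data/db1/menu/detached/
chown -R clickhouse:clickhouse /var/lib/clickhouse/data/db1/menu/detached

## ... then attach the partition back
clickhouse-client --query "ALTER TABLE db1.menu ATTACH PARTITION '202405'"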

1.5 clickhouse-backup

The approaches above are all manual; clickhouse-backup makes the process automatable.

2. Introduction to clickhouse-backup

clickhouse-backup is a community open-source ClickHouse backup tool that can also be used for data migration. It works by first creating a backup and then importing data from that backup, which makes it suitable for a routine off-site cold-backup scheme.

It can be installed either as a binary tarball or as an RPM package.

2.1 Limitations

  • ClickHouse version 1.1.54390 or newer
  • MergeTree-family table engines only (see the quick engine check below)
  • Backing up tiered storage or storage_policy is not supported
  • Maximum backup size on cloud storage is 5TB
  • Maximum number of parts on AWS S3 is 10,000
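
Before relying on the tool it is easy to confirm which engines your tables actually use; a quick check (the excluded system databases mirror the skip_tables defaults):

## list table engines to confirm they are in the MergeTree family
clickhouse-client --query "
  SELECT database, name, engine FROM system.tables
  WHERE database NOT IN ('system', 'INFORMATION_SCHEMA', 'information_schema')"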

2.2 Installation and Deployment

Here clickhouse-backup is deployed from the binary tarball.

1) Download clickhouse-backup
wget https://github.com/Altinity/clickhouse-backup/releases/download/v2.5.9/clickhouse-backup-linux-amd64.tar.gz

2) Extract the archive
tar xf clickhouse-backup-linux-amd64.tar.gz -C /root

3) Copy the clickhouse-backup binary into the system PATH
cp /root/clickhouse/build/linux/amd64/clickhouse-backup /usr/local/bin/

4) Create the clickhouse-backup configuration directory
mkdir -p /etc/clickhouse-backup

5) Create the configuration file (config.yml) under /etc/clickhouse-backup and adjust it
general:
  remote_storage: none
  backups_to_keep_local: 7
  backups_to_keep_remote: 31
clickhouse:
  username: default 
  password: "clickhouse" 
  host: localhost
  port: 9000
  disk_mapping: {}
  skip_tables:
    - system.*
    - INFORMATION_SCHEMA.*
    - information_schema.* 
    - _temporary_and_external_tables.*
  timeout: 5m
  freeze_by_part: false
  freeze_by_part_where: ""
  use_embedded_backup_restore: false
  embedded_backup_disk: ""
  secure: false
  skip_verify: false
  sync_replicated_tables: false
  log_sql_queries: true
  restart_command: systemctl restart clickhouse-server.service
  ignore_not_exists_error_during_freeze: true
  check_replicas_before_attach: false
  tls_key: ""
  tls_cert: ""
  tls_ca: ""
  debug: false
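
With the file in place, the effective settings (config.yml merged with any environment variables) can be printed back for verification; the print-config subcommand is present in recent releases, so treat its availability as an assumption on older builds:

## show the effective configuration after merging config.yml and environment variables
clickhouse-backup print-config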

## Check the clickhouse-backup version
(base) [root@dsmart ~]# clickhouse-backup -v
Version:         2.5.9
Git Commit:      0387d55d340085e4f9f38cee6df4648b84e972ae
Build Date:      2024-05-20

2.3 Viewing the default configuration of clickhouse-backup

The default configuration items of clickhouse-backup can be viewed as follows.

(base) [root@dsmart ~]# clickhouse-backup default-config 
general:
    remote_storage: none
    max_file_size: 0
    backups_to_keep_local: 0
    backups_to_keep_remote: 0
    log_level: info
    allow_empty_backups: false
    download_concurrency: 16
    upload_concurrency: 4
    upload_max_bytes_per_second: 0
    download_max_bytes_per_second: 0
    object_disk_server_side_copy_concurrency: 32
    use_resumable_state: true
    restore_schema_on_cluster: ""
    upload_by_part: true
    download_by_part: true
    restore_database_mapping: {}
    retries_on_failure: 3
    retries_pause: 30s
    watch_interval: 1h
    full_interval: 24h
    watch_backup_name_template: shard{shard}-{type}-{time:20060102150405}
    sharded_operation_mode: ""
    cpu_nice_priority: 15
    io_nice_priority: idle
    rbac_backup_always: true
    rbac_conflict_resolution: recreate
    retriesduration: 100ms
    watchduration: 1h0m0s
    fullduration: 24h0m0s
clickhouse:
    username: default
    password: ""
    host: localhost
    port: 9000
    disk_mapping: {}
    skip_tables:
        - system.*
        - INFORMATION_SCHEMA.*
        - information_schema.*
        - _temporary_and_external_tables.*
    skip_table_engines: []
    timeout: 30m
    freeze_by_part: false
    freeze_by_part_where: ""
    use_embedded_backup_restore: false
    embedded_backup_disk: ""
    embedded_backup_threads: 0
    embedded_restore_threads: 0
    backup_mutations: true
    restore_as_attach: false
    check_parts_columns: true
    secure: false
    skip_verify: false
    sync_replicated_tables: false
    log_sql_queries: true
    config_dir: /etc/clickhouse-server/
    restart_command: exec:systemctl restart clickhouse-server
    ignore_not_exists_error_during_freeze: true
    check_replicas_before_attach: true
    tls_key: ""
    tls_cert: ""
    tls_ca: ""
    max_connections: 16
    debug: false
s3:
    access_key: ""
    secret_key: ""
    bucket: ""
    endpoint: ""
    region: us-east-1
    acl: private
    assume_role_arn: ""
    force_path_style: false
    path: ""
    object_disk_path: ""
    disable_ssl: false
    compression_level: 1
    compression_format: tar
    sse: ""
    sse_kms_key_id: ""
    sse_customer_algorithm: ""
    sse_customer_key: ""
    sse_customer_key_md5: ""
    sse_kms_encryption_context: ""
    disable_cert_verification: false
    use_custom_storage_class: false
    storage_class: STANDARD
    custom_storage_class_map: {}
    concurrency: 17
    part_size: 0
    max_parts_count: 4000
    allow_multipart_download: false
    object_labels: {}
    request_payer: ""
    check_sum_algorithm: ""
    debug: false
gcs:
    credentials_file: ""
    credentials_json: ""
    credentials_json_encoded: ""
    embedded_access_key: ""
    embedded_secret_key: ""
    skip_credentials: false
    bucket: ""
    path: ""
    object_disk_path: ""
    compression_level: 1
    compression_format: tar
    debug: false
    force_http: false
    endpoint: ""
    storage_class: STANDARD
    object_labels: {}
    custom_storage_class_map: {}
    client_pool_size: 48
    chunk_size: 0
cos:
    url: ""
    timeout: 2m
    secret_id: ""
    secret_key: ""
    path: ""
    compression_format: tar
    compression_level: 1
    debug: false
api:
    listen: localhost:7171
    enable_metrics: true
    enable_pprof: false
    username: ""
    password: ""
    secure: false
    certificate_file: ""
    private_key_file: ""
    ca_cert_file: ""
    ca_key_file: ""
    create_integration_tables: false
    integration_tables_host: ""
    allow_parallel: false
    complete_resumable_after_restart: true
    watch_is_main_process: false
ftp:
    address: ""
    timeout: 2m
    username: ""
    password: ""
    tls: false
    skip_tls_verify: false
    path: ""
    object_disk_path: ""
    compression_format: tar
    compression_level: 1
    concurrency: 48
    debug: false
sftp:
    address: ""
    port: 22
    username: ""
    password: ""
    key: ""
    path: ""
    object_disk_path: ""
    compression_format: tar
    compression_level: 1
    concurrency: 48
    debug: false
azblob:
    endpoint_schema: https
    endpoint_suffix: core.windows.net
    account_name: ""
    account_key: ""
    sas: ""
    use_managed_identity: false
    container: ""
    path: ""
    object_disk_path: ""
    compression_level: 1
    compression_format: tar
    sse_key: ""
    buffer_size: 0
    buffer_count: 3
    max_parts_count: 256
    timeout: 4h
    debug: false
custom:
    upload_command: ""
    download_command: ""
    list_command: ""
    delete_command: ""
    command_timeout: 4h
    commandtimeoutduration: 4h0m0s

2.4 Listing the tables that can be backed up

The clickhouse-backup tables command shows which tables on the current server are eligible for backup.

## clickhouse-backup tables
(base) [root@dsmart ~]# clickhouse-backup tables  |grep -v info
ckdb.uk_price_paid     245.43MiB  default  full
default.uk_price_paid  0B         default  full

3. Backup Testing

For this test, sample datasets from the ClickHouse website are imported into a local test database for verification.

3.1 Importing test data

1) Create the table
(base) [root@dsmart clickhouse]#  clickhouse-client --user default --password clickhouse -d ckdb --multiquery <  /root/clickhouse/uk_price_paid.sql

2) Import the test data
(base) [root@dsmart clickhouse]# clickhouse-client --user default --password clickhouse -d ckdb --multiquery <  /root/clickhouse/insert_uk_price_paid.sql

3) Download and extract a second test dataset (the NYPL menus data)
wget https://s3.amazonaws.com/menusdata.nypl.org/gzips/2021_08_01_07_01_17_data.tgz

(base) [root@dsmart ~]# tar -zxvf 2021_08_01_07_01_17_data.tgz 
Dish.csv
Menu.csv
MenuItem.csv
MenuPage.csv

4) Connect to database db1 and confirm the tables exist
(base) [root@dsmart clickhouse]# clickhouse-client --user default --password clickhouse -d db1 -m
ClickHouse client version 22.3.20.29 (official build).
Connecting to database db1 at localhost:9000 as user default.
Connected to ClickHouse server version 22.3.20 revision 54455.

dsmart :) show tables;

SHOW TABLES

Query id: 5376e39b-8ebd-48f1-8f36-92bca1e104f0

┌─name──────┐
│ dish      │
│ menu      │
│ menu_item │
│ menu_page │
└───────────┘

4 rows in set. Elapsed: 0.002 sec. 

## Import the CSV data into the tables with the following commands

clickhouse-client --user default --password clickhouse --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --query "INSERT INTO db1.dish FORMAT CSVWithNames" < /root/Dish.csv
clickhouse-client --user default --password clickhouse --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --query "INSERT INTO db1.menu FORMAT CSVWithNames" < /root/Menu.csv
clickhouse-client --user default --password clickhouse --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --query "INSERT INTO db1.menu_page FORMAT CSVWithNames" < /root/MenuPage.csv
clickhouse-client --user default --password clickhouse --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --date_time_input_format best_effort --query "INSERT INTO db1.menu_item FORMAT CSVWithNames" < /root/MenuItem.csv
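
A quick sanity check after the import (row counts will vary with the dataset snapshot):

## verify row counts after the import
clickhouse-client --user default --password clickhouse --query "
  SELECT table, sum(rows) AS total_rows FROM system.parts
  WHERE database = 'db1' AND active GROUP BY table"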

## List existing local and remote backups
(base) [root@dsmart clickhouse-backup]#  clickhouse-backup --config=/etc/clickhouse-backup/config.yml list
ckdb.uk_price_paid_0523                            245.44MiB   23/05/2024 03:34:54   local       regular
default.uk_price_paid_0523                         771B        23/05/2024 06:29:03   local       regular
db1                                                403.41MiB   23/05/2024 07:19:58   local       regular
2024-05-23T07-30-16                                1.80KiB     23/05/2024 07:30:17   local       regular
2024-05-23T13-18-40                                50.22MiB    23/05/2024 13:18:40   local       regular
replica2_dwd_sal_sales_attendance_bak20230516_56   7.33MiB     16/05/2023 15:36:00   remote      tar, regular
replica2_mictable                                  2.41GiB     16/05/2023 16:01:59   remote      tar, regular
replica1_fosp_dim_fo_bak_20230518                  4.36KiB     18/05/2023 10:23:13   remote      tar, regular
replica1_fosp_kafka_bak_20230518                   2.11MiB     18/05/2023 10:40:54   remote      tar, regular
replica1_fosp_kafka_bak_20230518_2                 2.08MiB     18/05/2023 11:02:19   remote      tar, regular
replica2_local_samples2_back20230907               801.29KiB   07/09/2023 15:13:37   remote      tar, regular
20240229-10-110-3-56                               31.44GiB    29/02/2024 14:32:14   remote      tar, regular


3.2 Backing up a database

Create the backup, then push the backup files to remote storage with upload:

(base) [root@dsmart ~]# clickhouse-backup --config=/etc/clickhouse-backup/config.yml create -t db1.* db1_0524

(base) [root@dsmart ~]# clickhouse-backup --config=/etc/clickhouse-backup/config.yml upload db1_0524
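
Since the point of clickhouse-backup is automation, the two commands chain naturally into a scheduled job; a minimal cron sketch (the schedule and naming convention are assumptions, and newer releases also offer a combined create_remote command):

## /etc/cron.d/clickhouse-backup -- nightly backup at 01:30 (hypothetical schedule; % must be escaped in cron)
30 1 * * * root clickhouse-backup create "full-$(date +\%Y\%m\%d)" && clickhouse-backup upload "full-$(date +\%Y\%m\%d)"

Retention then falls to the backups_to_keep_local and backups_to_keep_remote settings shown earlier.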

3.3 Restoring a database

On the target server (here a different host), download the backup from remote storage, then restore it:

[root@ivorysqldb ~]# clickhouse-backup --config /etc/clickhouse-backup/config.yml download db1_0524

[root@ivorysqldb ~]# clickhouse-backup --config /etc/clickhouse-backup/config.yml restore db1_0524
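
By default restore brings back both schema and data; the tool also accepts flags to narrow the scope (the flags below come from the tool's help output and are worth verifying against your version):

## restore only the schema, or only the data for selected tables
clickhouse-backup --config /etc/clickhouse-backup/config.yml restore --schema db1_0524
clickhouse-backup --config /etc/clickhouse-backup/config.yml restore --data -t db1.menu db1_0524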

3.4 Deleting a local backup

Note that delete requires the location (local or remote) as its first argument; omitting it produces an error:

(base) [root@dsmart backup]# clickhouse-backup --config /etc/clickhouse-backup/config.yml delete db1_0524
2024/05/24 10:08:43.838173 error Backup name must be defined
NAME:
   clickhouse-backup delete - Delete specific backup

USAGE:
   clickhouse-backup delete <local|remote> <backup_name>

OPTIONS:
   --config value, -c value                   Config 'FILE' name. (default: "/etc/clickhouse-backup/config.yml") [$CLICKHOUSE_BACKUP_CONFIG]
   --environment-override value, --env value  override any environment variable via CLI parameter
   
(base) [root@dsmart backup]# clickhouse-backup --config /etc/clickhouse-backup/config.yml delete local db1_0524

3.5 Deleting a remote backup

(base) [root@dsmart backup]# clickhouse-backup --config /etc/clickhouse-backup/config.yml delete remote db1_0524

3.6 The config.yml configuration file

clickhouse-backup reads its settings from a config.yml file, which can be created under the /etc/clickhouse-backup directory. An example configured for remote SFTP storage follows:

general:
  remote_storage: sftp           # REMOTE_STORAGE, if `none` then the `upload` and `download` commands will fail
  max_file_size: 1099511627776   # MAX_FILE_SIZE, 1TB by default; unused when upload_by_part is true, used to split data part files into archives
  disable_progress_bar: false    # DISABLE_PROGRESS_BAR, show a progress bar during upload and download; only meaningful when `upload_concurrency` and `download_concurrency` equal 1
  backups_to_keep_local: 0       # BACKUPS_TO_KEEP_LOCAL, how many of the newest local backups to keep, 0 means keep all created backups on local disk
                                 # run `clickhouse-backup delete local <backup_name>` yourself to avoid wasting disk space
  backups_to_keep_remote: 0      # BACKUPS_TO_KEEP_REMOTE, how many of the newest backups to keep on remote storage, 0 means keep all uploaded backups
                                 # if an old backup is required by a newer incremental backup it will not be deleted; be careful with long incremental backup sequences
  log_level: error               # LOG_LEVEL
  allow_empty_backups: false     # ALLOW_EMPTY_BACKUPS
  download_concurrency: 1        # DOWNLOAD_CONCURRENCY, max 255
  upload_concurrency: 1          # UPLOAD_CONCURRENCY, max 255
  upload_by_part: true           # UPLOAD_BY_PART
  download_by_part: true         # DOWNLOAD_BY_PART
  restore_database_mapping: {}   # RESTORE_DATABASE_MAPPING, rules for restoring backup databases into differently named target databases; all atomic tables will be created with new UUIDs
  retries_on_failure: 3          # RETRIES_ON_FAILURE, how many times to retry a failed upload or download
  retries_pause: 100ms           # RETRIES_PAUSE, pause between retries after each download or upload failure
clickhouse:
  username: default              # CLICKHOUSE_USERNAME
  password: "clickhouse"         # CLICKHOUSE_PASSWORD
  host: localhost                # CLICKHOUSE_HOST
  port: 9000                     # CLICKHOUSE_PORT, don't use 8123, clickhouse-backup doesn't support the HTTP protocol
  disk_mapping: {}               # CLICKHOUSE_DISK_MAPPING, use when system.disks on the restore target differs from system.disks on the server where the backup was created
  skip_tables:                   # CLICKHOUSE_SKIP_TABLES
    - system.*
    - INFORMATION_SCHEMA.*
    - information_schema.*
  timeout: 5m                    # CLICKHOUSE_TIMEOUT
  freeze_by_part: false          # CLICKHOUSE_FREEZE_BY_PART, freeze part by part instead of freezing the whole table
  freeze_by_part_where: ""       # CLICKHOUSE_FREEZE_BY_PART_WHERE, filter parts during freeze when freeze_by_part: true
  secure: false                  # CLICKHOUSE_SECURE, use SSL encryption for the connection
  skip_verify: false             # CLICKHOUSE_SKIP_VERIFY
  sync_replicated_tables: true   # CLICKHOUSE_SYNC_REPLICATED_TABLES
  tls_key: ""                    # CLICKHOUSE_TLS_KEY, filename of the TLS key file
  tls_cert: ""                   # CLICKHOUSE_TLS_CERT, filename of the TLS certificate file
  tls_ca: ""                     # CLICKHOUSE_TLS_CA, filename of the TLS certificate authority file
  log_sql_queries: true          # CLICKHOUSE_LOG_SQL_QUERIES, log clickhouse-backup SQL queries to the `system.query_log` table inside clickhouse-server
  debug: false                   # CLICKHOUSE_DEBUG
  config_dir: "/etc/clickhouse-server"                    # CLICKHOUSE_CONFIG_DIR
  restart_command: "systemctl restart clickhouse-server"  # CLICKHOUSE_RESTART_COMMAND, used when restoring with the --rbac or --configs options
  ignore_not_exists_error_during_freeze: true  # CLICKHOUSE_IGNORE_NOT_EXISTS_ERROR_DURING_FREEZE, avoid backup failures when tables and databases are frequently created and dropped during backup creation; clickhouse-backup will ignore `code: 60` and `code: 81` errors while executing `ALTER TABLE ... FREEZE`
  check_replicas_before_attach: true           # CLICKHOUSE_CHECK_REPLICAS_BEFORE_ATTACH, avoid concurrent ATTACH PART execution when restoring ReplicatedMergeTree tables
sftp:
  address: "xxx.xxx.xxx.xxx"        # SFTP_ADDRESS
  port: 60002                       # SFTP_PORT
  username: "root"                  # SFTP_USERNAME
  password: "root"                  # SFTP_PASSWORD
  key: ""                           # SFTP_KEY
  path: "/data/clickhouse_backup/"  # SFTP_PATH
  concurrency: 1                    # SFTP_CONCURRENCY
  compression_format: tar           # SFTP_COMPRESSION_FORMAT
  compression_level: 1              # SFTP_COMPRESSION_LEVEL
  debug: false                      # SFTP_DEBUG
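
Once remote_storage points at SFTP, connectivity and credentials can be sanity-checked by listing the remote backups:

## verify the SFTP credentials and path by listing remote backups
clickhouse-backup --config /etc/clickhouse-backup/config.yml list remote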