暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

第五章:运维 OceanBase 数据库 5.3 如何对 OceanBase 数据库进行备份和恢复

1842

5.3 如何对 OceanBase 数据库进行备份和恢复

OceanBase 数据库备份/恢复概述

逻辑备份和物理备份

OceanBase 数据库的备份分为以下两类:

  • 逻辑备份

    逻辑备份就是数据导出,将数据库对象的定义和数据导出为 SQL 或 CSV 文件等。

    逻辑备份的优点是功能灵活,可以指定租户名、库名、表名和对象类型等导出,针对表的逻辑备份甚至可以指定 QUERY SQL 带条件导出数据。

    缺点则是只支持离线导出,不支持增量导出。

  • 物理备份。

    物理备份是数据块备份。目前是按集群级别备份,集群里的所有租户都会被备份。备份的内容包含全量数据、增量数据和事务日志。

    理论上当拥有全量数据备份以及其后的所有事务日志备份,就可以还原出该数据库到历史任意时间点。

备份介质

OceanBase 数据库开源版本 V3.1.0 仅支持逻辑备份与恢复,V3.1.1 开始支持物理备份与恢复。

OceanBase 数据库的逻辑备份与恢复是使用工具 OBDUMPER 和 OBLOADER 进行数据导出和导入,具体操作请参考 4.8 如何使用 OBDUMPER / OBLOADER 工具导出/导入 OceanBase 数据

OceanBase 数据库的物理备份是将基线数据(SSTable)和事务日志(clog)备份到一个共享目录里。该共享目录可以是 NFS 目录、阿里云的 OSS 存储或腾讯云的 COS 存储。每个目录都要挂载到每个 OceanBase 数据库节点的本地文件系统上,OceanBase 数据库备份会自动选择从哪个节点备份数据到该节点的备份目录。

在恢复的时候,目标 OceanBase 集群的每个节点上也需要挂载一个共享目录到本地文件系统,这个目录包含全部备份文件,这样就可以还原到任意历史时间点。

恢复时的共享目录可以和备份是同一个共享目录(也可以不是同一个)。备份时的共享目录通常用于将生产环境的备份还原到线下测试环境。

备份策略

OceanBase 数据库的备份策略跟传统数据库备份一样,支持数据全量备份、数据增量备份和事务日志备份。

在开启全量备份之前,需要先开启事务日志备份。OceanBase 数据库的事务日志备份跟传统数据库的日志备份不完全相同,它是近实时备份。平均每秒钟检查一下事务日志是否有新增,如果有就备份到备份目录里。

OceanBase 数据库的备份跟传统数据库的备份还有个不一样的地方是全量备份的触发要求。OceanBase 数据库的全量备份实际上是备份数据文件里的基线数据(SSTable)。所以要求在全量备份之前,需先对集群发起一次合并(major freeze)操作。集群合并结束后,数据文件生成新的基线版本,全量备份就直接备份新版本的数据。

由于 OceanBase 数据库合并之后生成的基线版本是全局一致性的版本,所以 OceanBase 数据库的全量备份也就很容易满足了全局一致性要求。

此后,在 OceanBase 集群下一次合并之前,数据文件里的版本基本不变,因此无需再次做全量备份。所以 OceanBase 数据库会要求每次全量备份之前都有新的基线版本生成,否则就会报错。

租户恢复

OceanBase 数据库目前的物理备份是集群级别的备份(整个集群一起备份)。虽然集群数据是三副本,但备份时只会备份一个副本的数据。OceanBase 数据库的恢复是按租户进行恢复,如果租户架构也是三副本的,那么恢复的时候会自动恢复出三副本,并且有一个主副本和两个备副本。


如何配置备份恢复目录

目前的版本物理备份恢复使用的目录只能是 NFS 共享的目录。

alter system set backup_dest='file:///xxx';

其中该共享目录 xxx 要挂载到每个 OceanBase 节点的本地文件系统上。

NFS 服务端配置

  • 安装 NFS 工具包

    sudo yum install nfs-utils

  • 设置 NFS 目录

    mkdir -p /data/backup

    将该目录设置为 NFS 共享目录。

  • 修改配置文件 /etc/exports

    vim /etc/exports
    /data/backup/ 172.xx.xx.0/16(rw,sync,all_squash)

    说明:

    • 172.xx.xx.0/16:设置的允许访问这个 NFS 目录的客户端 IP 网段。尽量只包含 OceanBase 集群相关节点服务器的 IP 的网段。

    • rw:允许客户端读写这个目录。

    • sync:同步写模式。

    • all_squash:将客户端的所有 UIDs 和 GIDs 映射到 NFSS SERVER 端的匿名用户。

  • 设置 NFS 目录权限

    # centos7
    sudo chown nfsnobody:nfsnobody -R /data/backup
    
    # centos8 去掉了 nfsnobody
    sudo chown nobody:nobody -R /data/backup

  • 重启 NFS 服务

    [admin@obce00 ~]$ sudo systemctl restart nfs-server
    
    [admin@obce00 ~]$ sudo systemctl status nfs-server
    ● nfs-server.service - NFS server and services
       Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor preset: disabled)
      Drop-In: /run/systemd/generator/nfs-server.service.d
               └─order-with-mounts.conf
       Active: active (exited) since Sat 2021-10-23 14:19:25 CST; 9min ago
      Process: 1241106 ExecStopPost=/usr/sbin/exportfs -f (code=exited, status=0/SUCCESS)
      Process: 1241104 ExecStopPost=/usr/sbin/exportfs -au (code=exited, status=0/SUCCESS)
      Process: 1241102 ExecStop=/usr/sbin/rpc.nfsd 0 (code=exited, status=0/SUCCESS)
      Process: 1241129 ExecStart=/bin/sh -c if systemctl -q is-active gssproxy; then systemctl reload gssproxy ; fi (code=exited, status=0/SUCCESS)
      Process: 1241117 ExecStart=/usr/sbin/rpc.nfsd (code=exited, status=0/SUCCESS)
      Process: 1241114 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
     Main PID: 1241129 (code=exited, status=0/SUCCESS)
        Tasks: 0 (limit: 195588)
       Memory: 0B
       CGroup: /system.slice/nfs-server.service
    
    Oct 23 14:19:25 obce00 systemd[1]: Starting NFS server and services...
    Oct 23 14:19:25 obce00 systemd[1]: Started NFS server and services.
    
    [admin@obce00 ~]$ sudo showmount -e
    Export list for obce00:
    /data/backup 172.xx.xx.0/16
    
    # centos 8
    
    [admin@obce00 ~]$ sudo exportfs -arv
    exporting 172.xx.xx.0/16:/data/backup
    
    [admin@obce00 ~]$ sudo exportfs -s
    /data/backup  172.xx.xx.0/16(sync,wdelay,hide,no_subtree_check,sec=sys,rw,secure,root_squash,all_squash)

NFS 客户端配置

所有 OceanBase 集群节点都要部署 NFS 客户端,将 NFS SERVER 的共享目录挂载到本机。客户端也需要先安装 NFS 软件。

  • 安装 NFS 工具包

    sudo yum install nfs-utils

  • 修改内核参数

    sudo vim /etc/sysctl.conf +R
    
    sunrpc.tcp_max_slot_table_entries=128

  • 挂载目录

    sudo mount -tnfs4 -o rw,timeo=30,wsize=1048576,rsize=1048576,namlen=512,sync,lookupcache=positive 172.xx.xxx.54:/data/backup /backup

    注意

    挂载前需要先运行以下命令创建挂载目录。手动挂载的目录,在机器重启后需要重新挂载,也可以在文件 /etc/fstab 目录里配置。

    sudo mkdir /backup

  • 查看结果

    [admin@obce01 ~]$ mount |grep backup
    172.xx.xxx.54:/data/backup on /backup type nfs4 (rw,relatime,sync,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=30,retrans=2,sec=sys,clientaddr=172.xx.xxx.53,lookupcache=pos,local_lock=none,addr=172.xx.xxx.54)
    [admin@obce01 ~]$


如何发起 OceanBase 集群备份

OceanBase 数据库的集群备份过程包含以下几步:

  1. (可选)配置事务日志备份

  2. 发起 MINOR FREEZE

  3. 开启事务日志备份

  4. 发起 MAJOR FREEZE

  5. 发起全量备份

(可选)配置事务日志备份

如果备份目标是:希望能恢复到历史任意时间点。则必须配置事务日志备份。 事务日志备份支持两种模式:Optional 和 Mandatory

  • Optional 模式:表示业务优先。

    在事务日志备份来不及的情况下,日志可能来不及备份就被回收,将会发生事务日志备份断流。这是默认行为。

  • Mandatory 模式:表示备份优先。

    在该模式下,如果事务日志备份速度跟不上用户事务日志的写入速度,可能会导致事务提交变慢甚至报错。

    通常情况下,如果事务日志盘目录空间满足要求(OBSERVER 节点内存的 3~4 倍),并且备份存储性能不是太差,出现这种备份事务日志赶不上的情况的概率很低。

事务日志压缩目前支持的压缩算法有:zstd_1.3.8 和 lz4_1.0,默认使用压缩算法 lz4_1.0

日志压缩需要事先开启。

ALTER SYSTEM SET backup_log_archive_option = 'optional compression= enable';

设置后,也可以修改备份模式和压缩算法。

ALTER SYSTEM SET backup_log_archive_option='mandatory compression= zstd_1.3.8';

也可以关闭相应备份模式下的事务日志压缩。默认备份模式是 optional,建议修改时都带上实际的备份模式。

ALTER SYSTEM SET backup_log_archive_option = 'optional compression= disable';
或 
ALTER SYSTEM SET backup_log_archive_option = 'mandatory compression= disable';

发起 MINOR FREEZE

事务日志开始备份时,并不是从当前的事务日志开始备份,而是从上一次 minor freeze 开始。

注意

上次合并时默认也会有一个 minor freeze。 所以,为了降低事务日志开始备份的时间,您需先发起一个 minor freeze。

alter system minor freeze ;

minor freeze 命令会瞬时返回,很快就会有转储操作。转储的时间根据当前内存中增量数据数量而定,通常在几秒到几十秒。转储操作对租户读写性能的影响很低。

开启事务日志备份

事务日志的备份是通过命令 archihvelog 触发的。archivelog 启动后,集群的事务日志定期备份到指定的备份目录。触发的时间间隔由参数 log_archive_checkpoint_interval 指定,范围为 [5s,1h] ,默认时间是 120s。

以当前版本(V3.1.1)的能力,这个参数默认值不建议调小。因为事务日志的备份是以分区或者分区组为单位进行的,当租户的分区数非常多的时候,备份并发的小 IO 会比较多,短期内的 IOPS 可能到达事务日志盘的瓶颈。

alter system archivelog;

事务日志备份开始后,可以查看备份进度确认备份任务开始。

MySQL [oceanbase]> select incarnation, tenant_id, status, min_first_time, max_next_time, now() from CDB_OB_BACKUP_ARCHIVELOG_SUMMARY;
+-------------+-----------+--------+----------------------------+----------------------------+---------------------+
| incarnation | tenant_id | status | min_first_time             | max_next_time              | now()               |
+-------------+-----------+--------+----------------------------+----------------------------+---------------------+
|           1 |         1 | DOING  | 2021-10-20 17:49:15.079198 | 2021-10-24 16:18:24.531211 | 2021-10-24 16:20:17 |
|           1 |      1001 | DOING  | 2021-10-20 17:49:15.079198 | 2021-10-24 16:18:24.531211 | 2021-10-24 16:20:17 |
+-------------+-----------+--------+----------------------------+----------------------------+---------------------+
2 rows in set (0.589 sec)

事务日志备份是按租户分开进行的。列 status 为 DOING 时,表示事务日志在进行。max_next_time 表示最近一次的备份时间。从时间可以看出基本上是 2 分钟触发一次。

发起 MAJOR FREEZE

默认情况下,OceanBase 数据库增量数据都在内存里。当触发转储时,内存中的增量数据直接以 SSTable 格式写到磁盘,但并没有跟磁盘上的基线数据进行合并。所以在上一次合并结束后,基线数据的内容是没有变化的。

OceanBase 数据库的数据备份就是备份磁盘上数据文件里的基线数据。如果这个基线数据的时间是在事务日志开始备份的时间之前,那这个基线数据即使备份成功了也没有意义。如果上次全量备份成功之后基线数据的内容还是没有变化,那再次全量备份也没有意义。

所以,在开启全量数据备份之前要发起合并操作,并且该合并操作需要在事务日志备份开启成功后发起。否则,全量备份会收到相应报错。

alter system major freeze;

合并的时间取决于增量数据的数量,通常要几分钟到几十分钟。可以通过下面 SQL 观察合并进度。

SELECT ZONE,svr_ip,major_version,ss_store_count,merged_ss_store_count,modified_ss_store_count,merge_start_time,merge_finish_time,merge_process
FROM __all_virtual_partition_sstable_image_info
order by major_version desc , zone ;

如果列 merge_finish_time 有数值存在,表示合并结束。

也可以通过下面视图确认合并是否结束。

SELECT * FROM __all_zone WHERE name='merge_status';
+----------------------------+----------------------------+-------+--------------+-------+------+
| gmt_create                 | gmt_modified               | zone  | name         | value | info |
+----------------------------+----------------------------+-------+--------------+-------+------+
| 2021-10-20 15:21:20.765982 | 2021-10-24 16:45:57.990705 |       | merge_status |     0 | IDLE |
| 2021-10-20 15:21:20.767009 | 2021-10-24 16:44:55.001369 | zone1 | merge_status |     0 | IDLE |
| 2021-10-20 15:21:20.767009 | 2021-10-24 16:45:47.005686 | zone2 | merge_status |     0 | IDLE |
| 2021-10-20 15:21:20.769822 | 2021-10-24 16:44:31.857504 | zone3 | merge_status |     0 | IDLE |
+----------------------------+----------------------------+-------+--------------+-------+------+
4 rows in set (0.734 sec)

当列 info 的值都是 IDLE 时,表示该 Zone 合并完成。

发起全量备份

发起全量备份之前,您可设置备份密码(这一步是可选的)。设置密码是会话级别设置,不设置即为空。

MySQL [oceanbase]> set encryption on identified by 'ba******6' only;
Query OK, 0 rows affected (0.225 sec)

全量备份发起命令是 alter system backup database。如果事务日志备份已经开始,那必须等事务日志备份成功之后才可发起全量备份,否则命令会报错。

alter system backup database;

MySQL [oceanbase]> alter system backup database;
ERROR 9040 (HY000): backup can not start, because log archive is not doing. log archive status : BEGINNING.

此外,重新开始全量备份之前距离事务日志备份开始之后需要进行一次合并,否则命令也会报错。

MySQL [oceanbase]> alter system backup database;
ERROR 9040 (HY000): backup can not start, because log archive start timestamp is bigger than frozen timestamp, need major freeze first. start timestamp : 1635069322951942, frozen timestamp : 1635064748441948 .

查看全量备份任务先看视图 CDB_OB_BACKUP_PROGRESS

SELECT incarnation, tenant_id, backup_type, bs_key, partition_count, start_time, completion_time, status
FROM CDB_OB_BACKUP_PROGRESS; 

输出:

+-------------+-----------+-------------+--------+-----------------+----------------------------+----------------------------+---------+
| incarnation | tenant_id | backup_type | bs_key | partition_count | start_time                 | completion_time            | status  |
+-------------+-----------+-------------+--------+-----------------+----------------------------+----------------------------+---------+
|           1 |         1 | D           |      6 |               0 | 2021-10-24 16:56:02.682700 | 2021-10-24 16:56:14.035231 | RUNNING |
|           1 |      1001 | D           |      6 |             220 | 2021-10-24 16:56:02.682700 | 2021-10-24 16:56:12.510089 | RUNNING |
+-------------+-----------+-------------+--------+-----------------+----------------------------+----------------------------+---------+
2 rows in set (0.456 sec)

如果该视图有记录,表示有全量备份任务在进行。如果视图为空,则表示没有全量备份任务或者全量备份任务已经结束。可以查看备份历史记录视图 CDB_OB_BACKUP_SET_DETAILS

select incarnation, tenant_id, bs_key , backup_type, encryption_mode, start_time, completion_time, elapsed_secondes , keep,  output_bytes_display,  output_rate_bytes_display, status  
from CDB_OB_BACKUP_SET_DETAILS 
order by start_time desc limit 10;

输出:

+-------------+-----------+--------+-------------+-----------------+----------------------------+----------------------------+------------------+------+----------------------+---------------------------+-----------+
| incarnation | tenant_id | bs_key | backup_type | encryption_mode | start_time                 | completion_time            | elapsed_secondes | keep | output_bytes_display | output_rate_bytes_display | status    |
+-------------+-----------+--------+-------------+-----------------+----------------------------+----------------------------+------------------+------+----------------------+---------------------------+-----------+
|           1 |         1 |      6 | D           | PASSWORD        | 2021-10-24 16:56:02.682700 | 2021-10-24 16:58:36.480644 |              154 | NO   | 7.17GB               | 47.71MB/S                 | COMPLETED |
|           1 |      1001 |      6 | D           | PASSWORD        | 2021-10-24 16:56:02.682700 | 2021-10-24 16:58:23.446364 |              141 | NO   | 7.17GB               | 52.13MB/S                 | COMPLETED |
|           1 |      1001 |      5 | D           | NONE            | 2021-10-21 08:46:28.618568 | 2021-10-21 08:47:53.129814 |               85 | NO   | 3.61GB               | 43.70MB/S                 | COMPLETED |
|           1 |         1 |      5 | D           | NONE            | 2021-10-21 08:46:28.618568 | 2021-10-21 08:47:56.047487 |               87 | NO   | 3.61GB               | 42.24MB/S                 | COMPLETED |
|           1 |      1001 |      4 | D           | NONE            | 2021-10-20 20:26:46.729579 | 2021-10-20 20:27:38.728365 |               52 | NO   | 1.42GB               | 27.92MB/S                 | COMPLETED |
|           1 |         1 |      4 | D           | NONE            | 2021-10-20 20:26:46.729579 | 2021-10-20 20:27:41.517514 |               55 | NO   | 1.42GB               | 26.50MB/S                 | COMPLETED |
|           1 |         1 |      3 | D           | NONE            | 2021-10-20 19:48:24.574229 | 2021-10-20 19:49:27.693007 |               63 | NO   | 1.52GB               | 24.63MB/S                 | COMPLETED |
|           1 |      1001 |      3 | D           | NONE            | 2021-10-20 19:48:24.574229 | 2021-10-20 19:49:26.604282 |               62 | NO   | 1.52GB               | 25.07MB/S                 | COMPLETED |
|           1 |         1 |      2 | D           | PASSWORD        | 2021-10-20 18:14:42.848095 | 2021-10-20 18:15:39.021754 |               56 | NO   | 1.11GB               | 20.22MB/S                 | COMPLETED |
|           1 |      1001 |      2 | D           | PASSWORD        | 2021-10-20 18:14:42.848095 | 2021-10-20 18:15:38.164658 |               55 | NO   | 1.11GB               | 20.54MB/S                 | COMPLETED |
+-------------+-----------+--------+-------------+-----------------+----------------------------+----------------------------+------------------+------+----------------------+---------------------------+-----------+
10 rows in set (0.015 sec)

(可选) 停止全量备份和事务日志备份

若您想要取消在进行中的全量备份,可以通过命令 alter system cancel backup 取消。如果全量备份已经结束或者没有全量备份任务时,使用该命令会报错。

ALTER SYSTEM CANCEL BACKUP;
ERROR 9049 (HY000): backup status is stopped, can not cancel

您可通过命令 alter system noarchivelog 停止事务日志备份任务。如果事务日志备份已经停止,使用该命令会报错。

MySQL [oceanbase]> ALTER SYSTEM NOARCHIVELOG;
Query OK, 0 rows affected (0.552 sec)

MySQL [oceanbase]> ALTER SYSTEM NOARCHIVELOG;
ERROR 9024 (HY000): log archive backup is already disabled

设置备份清理策略

默认情况下,OceanBase 数据库的备份会一直保存。您可通过参数 backup_recovery_window 设置。参数默认值是 0 表示备份永不过期,可以设置为 7d 等。

MySQL [oceanbase]> alter system set backup_recovery_window='2d';
Query OK, 0 rows affected (0.442 sec)

仅设置参数 backup_recovery_window 的情况下,OceanBase 数据库不会自动清理备份,您还需要设置参数 auto_delete_expired_backup 值为 true

MySQL [oceanbase]> alter system set auto_delete_expired_backup=true;
Query OK, 0 rows affected (0.071 sec)

自动删除备份会在参数变更后异步调用,查看备份历史可以看到很多备份集在自动删除中。

select incarnation, tenant_id, bs_key , backup_type, encryption_mode, start_time, completion_time, elapsed_secondes , keep,  output_bytes_display,  output_rate_bytes_display, status  
from CDB_OB_BACKUP_SET_DETAILS 
order by start_time desc ;

输出:

+-------------+-----------+--------+-------------+-----------------+----------------------------+----------------------------+------------------+------+----------------------+---------------------------+-----------+
| incarnation | tenant_id | bs_key | backup_type | encryption_mode | start_time                 | completion_time            | elapsed_secondes | keep | output_bytes_display | output_rate_bytes_display | status    |
+-------------+-----------+--------+-------------+-----------------+----------------------------+----------------------------+------------------+------+----------------------+---------------------------+-----------+
|           1 |         1 |      7 | D           | PASSWORD        | 2021-10-24 19:05:29.023969 | 2021-10-24 19:07:39.488182 |              130 | NO   | 7.29GB               | 57.23MB/S                 | COMPLETED |
|           1 |      1001 |      7 | D           | PASSWORD        | 2021-10-24 19:05:29.023969 | 2021-10-24 19:07:33.432748 |              124 | NO   | 7.29GB               | 60.02MB/S                 | COMPLETED |
|           1 |         1 |      6 | D           | PASSWORD        | 2021-10-24 16:56:02.682700 | 2021-10-24 16:58:36.480644 |              154 | NO   | 7.17GB               | 47.71MB/S                 | COMPLETED |
|           1 |      1001 |      6 | D           | PASSWORD        | 2021-10-24 16:56:02.682700 | 2021-10-24 16:58:23.446364 |              141 | NO   | 7.17GB               | 52.13MB/S                 | COMPLETED |
|           1 |         1 |      5 | D           | NONE            | 2021-10-21 08:46:28.618568 | 2021-10-21 08:47:56.047487 |               87 | NO   | 3.61GB               | 42.24MB/S                 | COMPLETED |
|           1 |      1001 |      5 | D           | NONE            | 2021-10-21 08:46:28.618568 | 2021-10-21 08:47:53.129814 |               85 | NO   | 3.61GB               | 43.70MB/S                 | COMPLETED |
|           1 |         1 |      4 | D           | NONE            | 2021-10-20 20:26:46.729579 | 2021-10-20 20:27:41.517514 |               55 | NO   | 1.42GB               | 26.50MB/S                 | DELETING  |
|           1 |      1001 |      4 | D           | NONE            | 2021-10-20 20:26:46.729579 | 2021-10-20 20:27:38.728365 |               52 | NO   | 1.42GB               | 27.92MB/S                 | DELETING  |
|           1 |         1 |      3 | D           | NONE            | 2021-10-20 19:48:24.574229 | 2021-10-20 19:49:27.693007 |               63 | NO   | 1.52GB               | 24.63MB/S                 | DELETING  |
|           1 |      1001 |      3 | D           | NONE            | 2021-10-20 19:48:24.574229 | 2021-10-20 19:49:26.604282 |               62 | NO   | 1.52GB               | 25.07MB/S                 | DELETING  |
|           1 |         1 |      2 | D           | PASSWORD        | 2021-10-20 18:14:42.848095 | 2021-10-20 18:15:39.021754 |               56 | NO   | 1.11GB               | 20.22MB/S                 | DELETING  |
|           1 |      1001 |      2 | D           | PASSWORD        | 2021-10-20 18:14:42.848095 | 2021-10-20 18:15:38.164658 |               55 | NO   | 1.11GB               | 20.54MB/S                 | DELETING  |
+-------------+-----------+--------+-------------+-----------------+----------------------------+----------------------------+------------------+------+----------------------+---------------------------+-----------+
12 rows in set (0.155 sec)

关于自动删除备份具体逻辑还比较复杂,详情请参考 自动清理备份的数据

如果自动删除过期备份没有起作用,您还可使用命令 ALTER SYSTEM DELETE BACKUPSET [备份集ID] 手动删除备份。

其中,备份集 ID 可从视图 CDB_OB_BACKUP_SET_DETAILS 的列 BS_KEY 中获取。重复删除同一个备份集将会提示删除中。

MySQL [oceanbase]> ALTER SYSTEM DELETE BACKUPSET 1;
Query OK, 0 rows affected (0.759 sec)

MySQL [oceanbase]> ALTER SYSTEM DELETE BACKUPSET 1;
ERROR 9044 (HY000): delete backup data is in progress

查看视图 CDB_OB_BACKUP_TASK_CLEAN_HISTORY 可以获取备份删除历史。

select incarnation, tenant_id, bs_key, backup_type, partition_count, start_time, completion_time, status
from CDB_OB_BACKUP_TASK_CLEAN_HISTORY;

输出:

+-------------+-----------+--------+-------------+-----------------+----------------------------+----------------------------+--------+
| incarnation | tenant_id | bs_key | backup_type | partition_count | start_time                 | completion_time            | status |
+-------------+-----------+--------+-------------+-----------------+----------------------------+----------------------------+--------+
|           1 |         1 |      1 | D           |             218 | 2021-10-20 17:58:42.301646 | 2021-10-20 17:59:36.557908 | FINISH |
|           1 |         1 |      2 | D           |             218 | 2021-10-20 18:14:42.848095 | 2021-10-20 18:15:39.021754 | FINISH |
|           1 |         1 |      3 | D           |             218 | 2021-10-20 19:48:24.574229 | 2021-10-20 19:49:27.693007 | FINISH |
|           1 |         1 |      4 | D           |             218 | 2021-10-20 20:26:46.729579 | 2021-10-20 20:27:41.517514 | FINISH |
|           1 |      1001 |      1 | D           |             218 | 2021-10-20 17:58:42.301646 | 2021-10-20 17:59:35.724704 | FINISH |
|           1 |      1001 |      2 | D           |             218 | 2021-10-20 18:14:42.848095 | 2021-10-20 18:15:38.164658 | FINISH |
|           1 |      1001 |      3 | D           |             218 | 2021-10-20 19:48:24.574229 | 2021-10-20 19:49:26.604282 | FINISH |
|           1 |      1001 |      4 | D           |             218 | 2021-10-20 20:26:46.729579 | 2021-10-20 20:27:38.728365 | FINISH |
+-------------+-----------+--------+-------------+-----------------+----------------------------+----------------------------+--------+
8 rows in set (0.003 sec)

当备份集很大的时候,备份的清理会需要一些时间。如果要取消正在删除中的备份,可以使用命令 ALTER SYSTEM CANCEL DELETE BACKUP; 取消正在执行的清理任务。已经删除的备份文件不会恢复。同时还要关闭自动清理备份机制。

ALTER SYSTEM SET auto_delete_expired_backup = 'False';

ALTER SYSTEM CANCEL DELETE BACKUP;


如何发起 OceanBase 数据库租户恢复

OceanBase 数据库的恢复是按租户恢复,且只能恢复到一个空租户环境。

这里模拟租户被删除故障,执行租户恢复过程。

MySQL [oceanbase]> select now();
+---------------------+
| now()               |
+---------------------+
| 2021-10-24 22:55:20 |
+---------------------+
1 row in set (0.002 sec)

MySQL [oceanbase]> drop tenant obmysql force;
Query OK, 0 rows affected (0.110 sec)

MySQL [oceanbase]> drop resource pool pool_mysql;

准备空租户的资源池

备份租户前需创建租户的资源池。

首先您需有一个规格合适的资源单元规格,之后您可以运行以下命令,使用该资源单元规格创建资源池。

create resource pool my_pool unit='my_unit_config',unit_num=2;

查看备份集

查看当前有效的备份集。

select incarnation, tenant_id, bs_key , backup_type, encryption_mode, start_time, completion_time, elapsed_secondes , keep,  output_bytes_display,  output_rate_bytes_display, status  
from CDB_OB_BACKUP_SET_DETAILS 
order by start_time desc ;

输出:

+-------------+-----------+--------+-------------+-----------------+----------------------------+----------------------------+------------------+------+----------------------+---------------------------+-----------+
| incarnation | tenant_id | bs_key | backup_type | encryption_mode | start_time                 | completion_time            | elapsed_secondes | keep | output_bytes_display | output_rate_bytes_display | status    |
+-------------+-----------+--------+-------------+-----------------+----------------------------+----------------------------+------------------+------+----------------------+---------------------------+-----------+
|           1 |         1 |      7 | D           | PASSWORD        | 2021-10-24 19:05:29.023969 | 2021-10-24 19:07:39.488182 |              130 | NO   | 7.29GB               | 57.23MB/S                 | COMPLETED |
|           1 |      1001 |      7 | D           | PASSWORD        | 2021-10-24 19:05:29.023969 | 2021-10-24 19:07:33.432748 |              124 | NO   | 7.29GB               | 60.02MB/S                 | COMPLETED |
|           1 |         1 |      6 | D           | PASSWORD        | 2021-10-24 16:56:02.682700 | 2021-10-24 16:58:36.480644 |              154 | NO   | 7.17GB               | 47.71MB/S                 | COMPLETED |
|           1 |      1001 |      6 | D           | PASSWORD        | 2021-10-24 16:56:02.682700 | 2021-10-24 16:58:23.446364 |              141 | NO   | 7.17GB               | 52.13MB/S                 | COMPLETED |
|           1 |         1 |      5 | D           | NONE            | 2021-10-21 08:46:28.618568 | 2021-10-21 08:47:56.047487 |               87 | NO   | 3.61GB               | 42.24MB/S                 | COMPLETED |
|           1 |      1001 |      5 | D           | NONE            | 2021-10-21 08:46:28.618568 | 2021-10-21 08:47:53.129814 |               85 | NO   | 3.61GB               | 43.70MB/S                 | COMPLETED |
+-------------+-----------+--------+-------------+-----------------+----------------------------+----------------------------+------------------+------+----------------------+---------------------------+-----------+
6 rows in set (0.008 sec)

这里有 3 个备份集,其中备份集 5 是没有加密的,备份集 6 和 7 是加密的。这里假设使用备份集 7 来恢复,那么需要在会话级别先设置备份集的解密密码。

MySQL [oceanbase]> SET DECRYPTION IDENTIFIED BY 'ba******6';
Query OK, 0 rows affected (0.001 sec)

打开租户恢复参数

参数 restore_concurrency 指定了恢复线程的并发数,默认是 0,不恢复。需要修改为大于 0 的值。

MySQL [oceanbase]> alter system set restore_concurrency = 8;
Query OK, 0 rows affected (0.130 sec)

通常开启恢复命令后默认还需等待一段时间才开始恢复,整个恢复期间会有三次等待。每次等待时间由内部参数 _restore_idle_time 设置,默认值是 60s


注意

隐含参数未来版本可能会发生变化。生产环境不建议调整这个参数,在测试的时候,如果追求恢复调度时间尽可能的短,可以缩小这个时间到 10s。

MySQL [oceanbase]> ALTER SYSTEM SET _restore_idle_time = '10s';
Query OK, 0 rows affected (0.017 sec)

开始恢复

恢复命令如下:

 ALTER SYSTEM RESTORE <dest_tenant_name> FROM <source_tenan_tname> at 'uri' UNTIL 'timestamp' WITH 'restore_option';

参数描述
dest_tenant_name指恢复的新租户的名称。
source_tenant_name指原集群的租户。
uri指备份时设置的 backup_dest。
timestamp指恢复的时间戳,需要大于等于最早备份的数据备份的 CDB_OB_BACKUP_SET_DETAILS 的 START_TIME,小于等于日志备份 CDB_OB_BACKUP_ARCHIVELOG_SUMMARY 的 MAX_NEXT_TIME。
restore_option支持 backup_cluster_name、backup_cluster_id、pool_list、locality、kms_encrypt:
backup_cluster_name 为必选项,填写源集群的名称。
backup_cluster_id 为必选项,填写源集群的 cluster_id。
pool_list 为必选项,填写用户的资源池。
locality 为可选项,填写租户的 Locality 信息。
kms_encrypt 为可选项,为 true 则表示在恢复时需要使用指定的 kms_encrypt_info。
MySQL [oceanbase]> ALTER SYSTEM RESTORE obmysql2 FROM obmysql at 'file:///backup' UNTIL '2021-10-24 19:07:39.488' WITH 'backup_cluster_name=obdemo&backup_cluster_id=2&pool_list=my_pool';
Query OK, 0 rows affected (0.015 sec)

MySQL [oceanbase]> select * from __all_tenant;
+----------------------------+----------------------------+-----------+-------------+-------------+-------------------+-------------------+--------+----------------+---------------+-----------+-----------------------+------------------+
| gmt_create                 | gmt_modified               | tenant_id | tenant_name | replica_num | zone_list         | primary_zone      | locked | collation_type | info          | read_only | rewrite_merge_version | locality         |
+----------------------------+----------------------------+-----------+-------------+-------------+-------------------+-------------------+--------+----------------+---------------+-----------+-----------------------+------------------+
| 2021-10-20 15:21:20.569347 | 2021-10-20 15:21:20.569347 |         1 | sys         |          -1 | zone1;zone2;zone3 | zone1;zone2,zone3 |      0 |              0 | system tenant |         0 |                     0 | FULL{1}@zone1, FU|
| 2021-10-25 00:08:10.322557 | 2021-10-25 00:08:10.322557 |      1005 | obmysql2    |          -1 | zone1;zone2;zone3 | RANDOM            |      0 |              0 |               |         0 |                     0 | FULL{1}@zone1, FU|
+----------------------------+----------------------------+-----------+-------------+-------------+-------------------+-------------------+--------+----------------+---------------+-----------+-----------------------+------------------+
2 rows in set (0.005 sec)

查看恢复进度和结果

  • 查看集群事件日志

    SELECT DATE_FORMAT(gmt_create, '%b%d %H:%i:%s') gmt_create_ , module, event, name1, value1, name2, value2, rs_svr_ip,name3,value3,name4,value4
    FROM __all_rootservice_event_history
    WHERE 1 = 1   
      AND module IN ('physical_restore','leader_coordinator')
    ORDER BY gmt_create DESC
    limit 30;

    输出:

    +----------------+--------------------+-----------------------+------------+----------------------+-------------+----------+---------------+---------------+------------------------+--------------+--------+
    | gmt_create_    | module             | event                 | name1      | value1               | name2       | value2   | rs_svr_ip     | name3         | value3                 | name4        | value4 |
    +----------------+--------------------+-----------------------+------------+----------------------+-------------+----------+---------------+---------------+------------------------+--------------+--------+
    | Oct25 00:12:23 | leader_coordinator | switch_leader         | current_rs | "172.xx.xxx.53:2882" | tenant_id   | 1005     | 172.xx.xxx.53 | tablegroup_id | 1105558941729769       | partition_id | 7      |
    | Oct25 00:12:23 | leader_coordinator | switch_leader         | current_rs | "172.xx.xxx.53:2882" | tenant_id   | 1005     | 172.xx.xxx.53 | tablegroup_id | 1105009185964893       | partition_id | 0      |
    | Oct25 00:12:23 | leader_coordinator | switch_leader         | current_rs | "172.xx.xxx.53:2882" | tenant_id   | 1005     | 172.xx.xxx.53 | tablegroup_id | 1105558941729769       | partition_id | 6      |
    | Oct25 00:12:23 | leader_coordinator | switch_leader         | current_rs | "172.xx.xxx.53:2882" | tenant_id   | 1005     | 172.xx.xxx.53 | tablegroup_id | 1105558941729769       | partition_id | 5      |
    | Oct25 00:12:23 | leader_coordinator | switch_leader         | current_rs | "172.xx.xxx.53:2882" | tenant_id   | 1005     | 172.xx.xxx.53 | tablegroup_id | 1105558941729769       | partition_id | 4      |
    | Oct25 00:12:23 | leader_coordinator | switch_leader         | current_rs | "172.xx.xxx.53:2882" | tenant_id   | 1005     | 172.xx.xxx.53 | tablegroup_id | 1105558941729769       | partition_id | 3      |
    | Oct25 00:12:23 | leader_coordinator | switch_leader         | current_rs | "172.xx.xxx.53:2882" | tenant_id   | 1005     | 172.xx.xxx.53 | tablegroup_id | 1105558941729769       | partition_id | 2      |
    | Oct25 00:12:23 | leader_coordinator | switch_leader         | current_rs | "172.xx.xxx.53:2882" | tenant_id   | 1005     | 172.xx.xxx.53 | tablegroup_id | 1105009185964881       | partition_id | 0      |
    | Oct25 00:12:23 | leader_coordinator | switch_leader         | current_rs | "172.xx.xxx.53:2882" | tenant_id   | 1005     | 172.xx.xxx.53 | tablegroup_id | 1105558941729769       | partition_id | 1      |
    | Oct25 00:12:23 | leader_coordinator | switch_leader         | current_rs | "172.xx.xxx.53:2882" | tenant_id   | 1005     | 172.xx.xxx.53 | tablegroup_id | 1105558941729769       | partition_id | 0      |
    | Oct25 00:12:16 | physical_restore   | restore_success       | tenant     | obmysql2             |             |          | 172.xx.xxx.53 |               |                        |              |        |
    | Oct25 00:12:16 | physical_restore   | change_restore_status | job_id     | 10                   | tenant      | obmysql2 | 172.xx.xxx.53 | status        | RESTORE_SUCCESS        |              |        |
    | Oct25 00:12:16 | physical_restore   | change_restore_status | job_id     | 10                   | tenant      | obmysql2 | 172.xx.xxx.53 | status        | POST_CHECK             |              |        |
    | Oct25 00:12:16 | physical_restore   | change_restore_status | job_id     | 10                   | tenant      | obmysql2 | 172.xx.xxx.53 | status        | REBUILD_INDEX          |              |        |
    | Oct25 00:09:45 | leader_coordinator | switch_leader         | current_rs | "172.xx.xxx.53:2882" | tenant_id   | 1005     | 172.xx.xxx.53 | tablegroup_id | 1105009185914881       | partition_id | 0      |
    | Oct25 00:09:34 | physical_restore   | change_restore_status | job_id     | 10                   | tenant      | obmysql2 | 172.xx.xxx.53 | status        | RESTORE_USER_REPLICA   |              |        |
    | Oct25 00:09:33 | physical_restore   | change_restore_status | job_id     | 10                   | tenant      | obmysql2 | 172.xx.xxx.53 | status        | CREATE_USER_PARTITIONS |              |        |
    | Oct25 00:09:27 | physical_restore   | change_restore_status | job_id     | 10                   | tenant      | obmysql2 | 172.xx.xxx.53 | status        | MODIFY_SCHEMA          |              |        |
    | Oct25 00:09:27 | physical_restore   | change_restore_status | job_id     | 10                   | tenant      | obmysql2 | 172.xx.xxx.53 | status        | UPGRADE_POST           |              |        |
    | Oct25 00:09:27 | physical_restore   | change_restore_status | job_id     | 10                   | tenant      | obmysql2 | 172.xx.xxx.53 | status        | UPGRADE_PRE            |              |        |
    | Oct25 00:09:12 | leader_coordinator | switch_leader         | current_rs | "172.xx.xxx.53:2882" | tenant_id   | 1005     | 172.xx.xxx.53 | tablegroup_id | 1105009185914881       | partition_id | 0      |
    | Oct25 00:08:10 | physical_restore   | change_restore_status | job_id     | 10                   | tenant      | obmysql2 | 172.xx.xxx.53 | status        | RESTORE_SYS_REPLICA    |              |        |
    | Oct25 00:08:10 | physical_restore   | change_restore_status | job_id     | 10                   | tenant_name | obmysql2 | 172.xx.xxx.53 | status        | CREATE_TENANT          |              |        |
    | Oct25 00:08:10 | physical_restore   | restore_start         | ret        | 0                    | tenant_name | obmysql2 | 172.xx.xxx.53 |               |                        |              |        |
    | Oct25 00:05:56 | physical_restore   | restore_failed        | tenant     | obmysql2             |             |          | 172.xx.xxx.53 |               |                        |              |        |
    | Oct25 00:05:56 | physical_restore   | change_restore_status | job_id     | 9                    | tenant      | obmysql2 | 172.xx.xxx.53 | status        | RESTORE_FAIL           |              |        |
    | Oct25 00:05:56 | physical_restore   | change_restore_status | job_id     | 9                    | tenant_name | obmysql2 | 172.xx.xxx.53 | status        | CREATE_TENANT          |              |        |
    | Oct25 00:05:56 | physical_restore   | restore_start         | ret        | 0                    | tenant_name | obmysql2 | 172.xx.xxx.53 |               |                        |              |        |
    | Oct25 00:05:10 | leader_coordinator | switch_leader         | current_rs | "172.xx.xxx.53:2882" | tenant_id   | 1004     | 172.xx.xxx.53 | tablegroup_id | 1104459430101993       | partition_id | 7      |
    | Oct25 00:05:10 | leader_coordinator | switch_leader         | current_rs | "172.xx.xxx.53:2882" | tenant_id   | 1004     | 172.xx.xxx.53 | tablegroup_id | 1104459430101993       | partition_id | 6      |
    +----------------+--------------------+-----------------------+------------+----------------------+-------------+----------+---------------+---------------+------------------------+--------------+--------+
    30 rows in set (0.083 sec)

  • 查看恢复进度看视图 __all_restore_info

    obclient> SELECT * FROM __all_restore_info;

  • 查看恢复历史 __all_restore_history

    MySQL [oceanbase]> SELECT * FROM __all_restore_history order by gmt_create desc limit 1;
    +----------------------------+----------------------------+--------+-----------------+-----------+-------------+-----------------+----------------------------+----------------------------+----------+-----------------+-----------------++
    | gmt_create                 | gmt_modified               | job_id | external_job_id | tenant_id | tenant_name | status          | start_time                 | completion_time            | pg_count | finish_pg_count | partition_count ||
    +----------------------------+----------------------------+--------+-----------------+-----------+-------------+-----------------+----------------------------+----------------------------+----------+-----------------+-----------------++
    | 2021-10-25 00:12:16.700153 | 2021-10-25 00:12:16.700153 |     10 |              -1 |      1005 | obmysql2    | RESTORE_SUCCESS | 2021-10-25 00:08:10.270205 | 2021-10-25 00:12:16.700153 |      100 |             100 |             100 ||
    +----------------------------+----------------------------+--------+-----------------+-----------+-------------+-----------------+----------------------------+----------------------------+----------+-----------------+-----------------++
    1 row in set (0.003 sec)

  • 查看恢复结果

    [admin@obce00 ~]$ obclient -h172.xx.xxx.54 -uroot@obmysql2#obdemo -P2883 -p***** -c -A tpccdb -e "select max(o_entry_d) from bmsql_oorder;"
    +---------------------+
    | max(o_entry_d)      |
    +---------------------+
    | 2021-10-24 19:07:39 |
    +---------------------+
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论