This article demonstrates backing up a database table to a Kerberos-enabled CDP HDFS HA cluster, using the broker bundled with StarRocks.
0. ENV
CentOS 7.8.2003
StarRocks 1.15.2 (including apache_hdfs_broker)
CDP 7.1.4
1. Install mysql-client on the Broker host
1) Download and extract MySQL
[root@sr03 soft]# wget https://dev.mysql.com/get/Downloads/MySQL-5.7/mysql-5.7.33-1.el7.x86_64.rpm-bundle.tar
[root@sr03 soft]# mkdir 5.7.33
[root@sr03 soft]# tar xvf mysql-5.7.33-1.el7.x86_64.rpm-bundle.tar -C 5.7.33
[root@sr03 soft]# yum -y install libmysqlclient.so.18 # this pulls in mariadb-libs
2) Remove the mariadb packages
Check:
[root@cdh1 soft]# rpm -qa|grep mariadb
Remove:
[root@cdh1 soft]# rpm -e --nodeps mariadb-libs-5.5.68-1.el7.i686
3) Install the MySQL client
[root@sr03 5.7.33]# rpm -ivh mysql-community-common-5.7.33-1.el7.x86_64.rpm
[root@sr03 5.7.33]# rpm -ivh mysql-community-libs-5.7.33-1.el7.x86_64.rpm
[root@sr03 5.7.33]# rpm -ivh mysql-community-client-5.7.33-1.el7.x86_64.rpm
4) Verify the client
Log in to StarRocks; the login succeeds.
[root@sr03 5.7.33]# mysql -hsr01 -P9030 -uroot -p87z_L8do
2. Install the Kerberos dependencies
1) Install the Kerberos client packages on the broker node
# yum -y install krb5-libs krb5-workstation openldap-clients
2) Configure the client
HDFS uses Kerberos authentication, so the Broker needs a krb5.conf file, which holds the Kerberos configuration, in order to back up to HDFS.
By default, krb5.conf is placed in the /etc directory.
[root@sr03 ~]# cat /etc/krb5.conf
[libdefaults]
dns_lookup_realm = false
dns_lookup_kdc = false
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
rdns = false
default_realm = RUNDBA.NET
[realms]
RUNDBA.NET = {
kdc = nn01.rundba.net:88
kdc = nn02.rundba.net:88
master_kdc = nn01.rundba.net:88
admin_server = nn01.rundba.net:749
}
[domain_realm]
rundba.net = RUNDBA.NET
.rundba.net = RUNDBA.NET
Alternatively, the KRB5_CONFIG environment variable can point to the location of krb5.conf (not used in this walkthrough).
Example KRB5_CONFIG configuration:
vi /etc/profile
### StarRocks apache_hdfs_broker CDP kerberos ###
export KRB5_CONFIG=/StarRocks/apache_hdfs_broker/conf/krb5.conf
######
Load the environment variable (not used in this walkthrough)
[root@sr03 StarRocks]# source /etc/profile
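As a quick sanity check of which file the Kerberos client will read, the sketch below prints the effective krb5.conf path (KRB5_CONFIG if set, otherwise /etc/krb5.conf) and extracts default_realm from a given file. The function names krb5_conf_path and krb5_default_realm are illustrative helpers, not part of StarRocks or Kerberos.

```shell
# Print the krb5.conf path in effect: $KRB5_CONFIG when set and non-empty,
# otherwise the default /etc/krb5.conf.
krb5_conf_path() {
    printf '%s\n' "${KRB5_CONFIG:-/etc/krb5.conf}"
}

# Print the default_realm value from the krb5.conf file given as $1.
krb5_default_realm() {
    awk -F= '/^[ \t]*default_realm[ \t]*=/ {
        gsub(/[ \t\r]/, "", $2)   # strip whitespace around the value
        print $2
        exit
    }' "$1"
}

# Usage (on the broker host):
#   krb5_default_realm "$(krb5_conf_path)"
```

With the configuration shown above, the usage line would be expected to print the realm configured under [libdefaults].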
3) Restart the broker
[root@sr03 StarRocks]# /StarRocks/apache_hdfs_broker/bin/stop_broker.sh
stop java, and remove pid file.
[root@sr03 StarRocks]# /StarRocks/apache_hdfs_broker/bin/start_broker.sh --daemon
[root@sr03 StarRocks]# jps
9952 BrokerBootstrap
9975 Jps
3. Upload the CDP user's keytab to the Broker host
1) Upload the user's keytab file to the Broker host
Upload the user's it01.keytab file to any path on the server, for example /StarRocks/apache_hdfs_broker/conf/, with owner and group 1007:1007.
[root@sr03 ~]# ls -l /StarRocks/apache_hdfs_broker/conf/it01.keytab
-rw-r--r-- 1 1007 1007 506 Jun 7 14:33 /StarRocks/apache_hdfs_broker/conf/it01.keytab
2) Verify login
[root@sr03 conf]# kinit it01
Password for it01@RUNDBA.NET: # enter the password
[root@sr03 conf]# klist # the current user is now logged in
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: it01@RUNDBA.NET
Valid starting Expires Service principal
06/16/2021 15:12:00 06/17/2021 15:10:53 krbtgt/RUNDBA.NET@RUNDBA.NET
renew until 06/23/2021 15:10:53
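The login above uses the password, whereas the repository created later authenticates with the keytab. A minimal sketch for checking the keytab itself follows, assuming the MIT Kerberos kinit -kt and klist -s options and the keytab path and principal used in this article. The run and verify_keytab helpers and the DRY_RUN switch are illustrative only.

```shell
# Run a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Obtain a ticket from a keytab ($1) for a principal ($2) without a password
# prompt, then check that the ticket cache holds a valid ticket.
verify_keytab() {
    run kinit -kt "$1" "$2" && run klist -s
}

# Usage (on the broker host):
#   verify_keytab /StarRocks/apache_hdfs_broker/conf/it01.keytab it01@RUNDBA.NET && echo "keytab OK"
```

klist -s prints nothing and reports through its exit status whether a valid ticket is cached, which makes it convenient in scripts.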
3) Create the backup directory (on a CDP cluster host)
[root@nn01 ~]# hdfs dfs -ls /user/it01
[root@nn01 ~]# hdfs dfs -mkdir /user/it01/backup
[root@nn01 ~]# hdfs dfs -ls -d /user/it01/backup
drwxr-xr-x - it01 it01 0 2021-06-16 15:11 /user/it01/backup
4. Create the REPOSITORY
1) Create the REPOSITORY
The CDP cluster has HDFS HA enabled and uses Kerberos authentication.
mysql> CREATE REPOSITORY cdp_repo
-> WITH BROKER broker3
-> ON LOCATION "hdfs://nameservice1:8020/user/it01/backup"
-> PROPERTIES
-> (
-> "dfs.nameservices" = "nameservice1",
-> "dfs.ha.namenodes.nameservice1" = "namenode573, namenode981",
-> "dfs.namenode.rpc-address.nameservice1.namenode573" = "nn01.rundba.net:8020",
-> "dfs.namenode.rpc-address.nameservice1.namenode981" = "nn02.rundba.net:8020",
-> "dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
-> "hadoop.security.authentication" = "kerberos",
-> "kerberos_principal" = "it01@RUNDBA.NET",
-> "kerberos_keytab" = "/StarRocks/apache_hdfs_broker/conf/it01.keytab"
-> );
Query OK, 0 rows affected (1.27 sec)
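The HA-related properties in the statement above follow a fixed pattern: the nameservice, the list of NameNode IDs, and one rpc-address entry per NameNode. The sketch below generates those lines from a nameservice name and id=host:port pairs. ha_properties is a hypothetical helper, and the values passed to it must match the cluster's own hdfs-site.xml.

```shell
# Print the HA property lines for a CREATE REPOSITORY statement.
# $1 = nameservice; remaining arguments = NameNode entries as id=host:port.
ha_properties() {
    ns=$1
    shift
    ids=""
    for pair in "$@"; do
        # Collect the part before "=" into a comma-separated ID list.
        ids="${ids:+$ids,}${pair%%=*}"
    done
    printf '"dfs.nameservices" = "%s",\n' "$ns"
    printf '"dfs.ha.namenodes.%s" = "%s",\n' "$ns" "$ids"
    for pair in "$@"; do
        # One rpc-address line per NameNode: key uses the ID, value the address.
        printf '"dfs.namenode.rpc-address.%s.%s" = "%s",\n' "$ns" "${pair%%=*}" "${pair#*=}"
    done
}

# Usage:
#   ha_properties nameservice1 namenode573=nn01.rundba.net:8020 namenode981=nn02.rundba.net:8020
```

Generating the lines this way mainly guards against a mismatch between the ID list and the per-NameNode keys when the statement is adapted to another cluster.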
2) View the REPOSITORY
cdp_repo had been dropped earlier and was then recreated, yet CreateTime still shows the original creation time; this appears to be a bug.
mysql> SHOW REPOSITORIES;
+--------+----------+---------------------+------------+-------------------------------------------+---------+--------+
| RepoId | RepoName | CreateTime          | IsReadOnly | Location                                  | Broker  | ErrMsg |
+--------+----------+---------------------+------------+-------------------------------------------+---------+--------+
| 20006  | cdp_repo | 2021-06-08 16:01:37 | false      | hdfs://nameservice1:8020/user/it01/backup | broker3 | NULL   |
+--------+----------+---------------------+------------+-------------------------------------------+---------+--------+
1 row in set (0.01 sec)
3) Drop the REPOSITORY
A repository can be dropped when it is no longer in use.
mysql> drop repository cdp_repo;
5. Backup
1) Full backup of table dates in database ssb to the repository cdp_repo
BACKUP SNAPSHOT ssb.snapshot_label3
TO cdp_repo
ON (dates)
PROPERTIES ("type" = "full");
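For repeated or scheduled runs, the statement can be generated with a dated label instead of being typed by hand. This is a minimal sketch: backup_sql and the snapshot_YYYYMMDD label scheme are assumptions for illustration, not something the setup in this article uses.

```shell
# Print a full-backup statement in the form used above.
# $1 = database, $2 = table, $3 = repository, $4 = snapshot label
backup_sql() {
    printf 'BACKUP SNAPSHOT %s.%s TO %s ON (%s) PROPERTIES ("type" = "full");\n' \
        "$1" "$4" "$3" "$2"
}

# Usage: build a label from today's date and pipe the statement to the client.
#   backup_sql ssb dates cdp_repo "snapshot_$(date +%Y%m%d)" | mysql -hsr01 -P9030 -uroot -p
```

Keeping the statement builder separate from the client call makes it easy to print and review the SQL before running it.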
2) Check the backup progress
mysql> show backup;
+-------+-----------------+--------+-------------+-----------------------------+---------------------+----------------------+--------------------+--------------+-----------------+----------+------------+--------+---------+
| JobId | SnapshotName    | DbName | State       | BackupObjs                  | CreateTime          | SnapshotFinishedTime | UploadFinishedTime | FinishedTime | UnfinishedTasks | Progress | TaskErrMsg | Status | Timeout |
+-------+-----------------+--------+-------------+-----------------------------+---------------------+----------------------+--------------------+--------------+-----------------+----------+------------+--------+---------+
| 20008 | snapshot_label3 | ssb    | SNAPSHOTING | [default_cluster:ssb.dates] | 2021-06-16 15:28:21 | NULL                 | NULL               | NULL         |                 |          |            | [OK]   | 86400   |
+-------+-----------------+--------+-------------+-----------------------------+---------------------+----------------------+--------------------+--------------+-----------------+----------+------------+--------+---------+
1 row in set (0.01 sec)
mysql> show backup;
+-------+-----------------+--------+-----------------+-----------------------------+---------------------+----------------------+--------------------+--------------+-----------------+----------+------------+--------+---------+
| JobId | SnapshotName    | DbName | State           | BackupObjs                  | CreateTime          | SnapshotFinishedTime | UploadFinishedTime | FinishedTime | UnfinishedTasks | Progress | TaskErrMsg | Status | Timeout |
+-------+-----------------+--------+-----------------+-----------------------------+---------------------+----------------------+--------------------+--------------+-----------------+----------+------------+--------+---------+
| 20008 | snapshot_label3 | ssb    | UPLOAD_SNAPSHOT | [default_cluster:ssb.dates] | 2021-06-16 15:28:21 | 2021-06-16 15:28:27  | NULL               | NULL         |                 |          |            | [OK]   | 86400   |
+-------+-----------------+--------+-----------------+-----------------------------+---------------------+----------------------+--------------------+--------------+-----------------+----------+------------+--------+---------+
1 row in set (0.00 sec)
mysql> show backup;
+-------+-----------------+--------+-----------+-----------------------------+---------------------+----------------------+---------------------+--------------+-----------------+----------+------------+--------+---------+
| JobId | SnapshotName    | DbName | State     | BackupObjs                  | CreateTime          | SnapshotFinishedTime | UploadFinishedTime  | FinishedTime | UnfinishedTasks | Progress | TaskErrMsg | Status | Timeout |
+-------+-----------------+--------+-----------+-----------------------------+---------------------+----------------------+---------------------+--------------+-----------------+----------+------------+--------+---------+
| 20008 | snapshot_label3 | ssb    | SAVE_META | [default_cluster:ssb.dates] | 2021-06-16 15:28:21 | 2021-06-16 15:28:27  | 2021-06-16 15:28:33 | NULL         |                 |          |            | [OK]   | 86400   |
+-------+-----------------+--------+-----------+-----------------------------+---------------------+----------------------+---------------------+--------------+-----------------+----------+------------+--------+---------+
1 row in set (0.00 sec)
mysql> show backup;
+-------+-----------------+--------+----------+-----------------------------+---------------------+----------------------+---------------------+---------------------+-----------------+----------+------------+--------+---------+
| JobId | SnapshotName    | DbName | State    | BackupObjs                  | CreateTime          | SnapshotFinishedTime | UploadFinishedTime  | FinishedTime        | UnfinishedTasks | Progress | TaskErrMsg | Status | Timeout |
+-------+-----------------+--------+----------+-----------------------------+---------------------+----------------------+---------------------+---------------------+-----------------+----------+------------+--------+---------+
| 20008 | snapshot_label3 | ssb    | FINISHED | [default_cluster:ssb.dates] | 2021-06-16 15:28:21 | 2021-06-16 15:28:27  | 2021-06-16 15:28:33 | 2021-06-16 15:28:39 |                 |          |            | [OK]   | 86400   |
+-------+-----------------+--------+----------+-----------------------------+---------------------+----------------------+---------------------+---------------------+-----------------+----------+------------+--------+---------+
1 row in set (0.00 sec)
The State column shows the stage the backup job is currently in, so progress can be checked at any time. The states are:
PENDING: initial state of the job.
SNAPSHOTING: the snapshot operation is in progress.
UPLOAD_SNAPSHOT: the snapshot is complete and ready to upload.
UPLOADING: the snapshot is being uploaded.
SAVE_META: the metadata file is being generated locally.
UPLOAD_INFO: the metadata file and the information about this backup job are being uploaded.
FINISHED: the backup is complete.
CANCELLED: the backup failed or was cancelled.
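Rather than rerunning show backup by hand, the states listed above can drive a simple polling loop. The sketch below classifies a State value and waits until the job reaches a terminal state. backup_state_kind and wait_for_backup are hypothetical helpers, and the command that fetches the state is passed in by the caller; the mysql one-liner in the usage comment is an assumption about how it could be read.

```shell
# Classify a State value from SHOW BACKUP as running, done, failed or unknown.
backup_state_kind() {
    case "$1" in
        FINISHED)  echo done ;;
        CANCELLED) echo failed ;;
        PENDING|SNAPSHOTING|UPLOAD_SNAPSHOT|UPLOADING|SAVE_META|UPLOAD_INFO)
                   echo running ;;
        *)         echo unknown ;;
    esac
}

# Poll until the job finishes.
# $1 = name of a command or function that prints the current State
# $2 = seconds between polls (default 10)
# Returns 0 on FINISHED, 1 on CANCELLED, 2 if the state cannot be interpreted.
wait_for_backup() {
    while :; do
        state=$("$1") || return 2
        case "$(backup_state_kind "$state")" in
            done)    return 0 ;;
            failed)  return 1 ;;
            running) sleep "${2:-10}" ;;
            *)       return 2 ;;
        esac
    done
}

# Usage (assumption: State is the 4th column of the tab-separated batch output):
#   current_state() { mysql -hsr01 -P9030 -uroot -p"$PASSWORD" -N -B -e 'SHOW BACKUP FROM ssb' | cut -f4; }
#   wait_for_backup current_state 10 && echo "backup finished"
```

Treating an unrecognized state as an error, rather than continuing to poll, avoids an endless loop if the client output changes.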
6. Backup speed calculation and HDFS characteristics
1) Check the size of table lineorder_flat
The table currently occupies about 370 GB.
mysql> show data from ssb.lineorder_flat;
+----------------+----------------+------------+--------------+------------+
| TableName      | IndexName      | Size       | ReplicaCount | RowCount   |
+----------------+----------------+------------+--------------+------------+
| lineorder_flat | lineorder_flat | 369.205 GB | 480          | 3643371678 |
|                | Total          | 369.205 GB | 480          |            |
+----------------+----------------+------------+--------------+------------+
2 rows in set (0.00 sec)
2) Backup
Back up table lineorder_flat in database ssb.
mysql> BACKUP SNAPSHOT ssb.snapshot_label4
-> TO cdp_repo
-> ON (lineorder_flat)
-> PROPERTIES ("type" = "full");
Query OK, 0 rows affected (0.06 sec)
3) Backup duration and rate
Creating the snapshot of 369.205 GB took 4 seconds, and uploading it to HDFS took 14 minutes. Counting the 3 replicas that HDFS writes, the average write rate is 370 GB * 3 / (14 * 60 s) = 1.32 GB/s.
mysql> show backup;
+-------+-----------------+--------+----------+--------------------------------------+---------------------+----------------------+---------------------+---------------------+-----------------+----------+------------+--------+---------+
| JobId | SnapshotName    | DbName | State    | BackupObjs                           | CreateTime          | SnapshotFinishedTime | UploadFinishedTime  | FinishedTime        | UnfinishedTasks | Progress | TaskErrMsg | Status | Timeout |
+-------+-----------------+--------+----------+--------------------------------------+---------------------+----------------------+---------------------+---------------------+-----------------+----------+------------+--------+---------+
| 20010 | snapshot_label4 | ssb    | FINISHED | [default_cluster:ssb.lineorder_flat] | 2021-06-16 15:41:35 | 2021-06-16 15:41:39  | 2021-06-16 15:55:39 | 2021-06-16 15:55:45 |                 |          |            | [OK]   | 86400   |
+-------+-----------------+--------+----------+--------------------------------------+---------------------+----------------------+---------------------+---------------------+-----------------+----------+------------+--------+---------+
1 row in set (0.00 sec)
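The arithmetic behind the rate can be reproduced from the SnapshotFinishedTime and UploadFinishedTime columns. A small sketch with awk follows; elapsed_seconds and upload_rate are illustrative names, and the time helper assumes both timestamps fall on the same day.

```shell
# Seconds between two HH:MM:SS times on the same day ($1 = start, $2 = end).
elapsed_seconds() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, ":"); split(b, y, ":")
        print (y[1] * 3600 + y[2] * 60 + y[3]) - (x[1] * 3600 + x[2] * 60 + x[3])
    }'
}

# Average HDFS write rate in GB/s.
# $1 = logical size in GB, $2 = replication factor, $3 = upload time in seconds
upload_rate() {
    awk -v s="$1" -v r="$2" -v t="$3" 'BEGIN { printf "%.2f\n", s * r / t }'
}

# Upload phase of snapshot_label4: 15:41:39 to 15:55:39.
secs=$(elapsed_seconds 15:41:39 15:55:39)   # 840
upload_rate 370 3 "$secs"                   # prints 1.32
```

Passing a replication factor of 1 instead gives the logical rate, about 0.44 GB/s for this backup, which is the figure to compare against table sizes reported by show data.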
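4) Rate comparison
The peak HDFS write IO is 1.5 GB/s, with lower IO at the start and at the end of the write. The IO rate shown on the HDFS monitoring page is close to the average of about 1.3 GB/s calculated in the previous step; the calculated value is taken as the reference.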

The calculated rate can be used to estimate how long a full-database backup would take.
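Such an estimate is a single division. The sketch below assumes the upload rate measured above stays constant and that time scales linearly with data size, which may not hold under other cluster load; estimate_minutes is a hypothetical helper and the 1000 GB database size is an invented example.

```shell
# Estimated upload time in minutes.
# $1 = logical data size in GB, $2 = HDFS replication factor,
# $3 = measured HDFS write rate in GB/s (replicas included)
estimate_minutes() {
    awk -v s="$1" -v r="$2" -v rate="$3" 'BEGIN { printf "%.1f\n", s * r / rate / 60 }'
}

# A hypothetical 1000 GB database at the measured 1.32 GB/s:
estimate_minutes 1000 3 1.32   # prints 37.9
```

Plugging in the 370 GB table from this section gives about 14 minutes, matching the observed upload time.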
5) Backup notes
Backup data is written to HDFS in 3 copies, that is, each block has 2 redundant replicas. label4 is the backup of the 370 GB lineorder_flat table and occupies 1.1 T of HDFS in total.
[root@nn01 ~]# hdfs dfs -du -h /user/it01/backup/__palo_repository_cdp_repo/
55       165      /user/it01/backup/__palo_repository_cdp_repo/__repo_info
46.9 K   140.6 K  /user/it01/backup/__palo_repository_cdp_repo/__ss_snapshot_label1
158.8 G  476.4 G  /user/it01/backup/__palo_repository_cdp_repo/__ss_snapshot_label2
46.9 K   140.6 K  /user/it01/backup/__palo_repository_cdp_repo/__ss_snapshot_label3
369.2 G  1.1 T    /user/it01/backup/__palo_repository_cdp_repo/__ss_snapshot_label4

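The backup directory names carry the Baidu Palo label.
7. Summary
This article demonstrated backing up a snapshot of a single table with the apache_hdfs_broker bundled with StarRocks.
A repository must be created before backing up. Because the HDFS provided by the CDP platform uses Kerberos authentication and has HDFS HA enabled, the Kerberos client must be configured before the repository is created, and the corresponding settings must be specified when creating it.
Other backup methods, such as partition-level backup, are not demonstrated here, and backups can also be automated with scripts. DorisManager does not yet provide a backup feature. One suggestion is to implement snapshot and export, combine them with Percona's open-source tool xtrabackup for online hot backup, and integrate all of this into DM to form a fairly complete backup system.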


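--- End ---

Corrections and suggestions are welcome.
Author: Wang Kun (王坤), WeChat official account: rundba. Reposting is welcome; please credit the source.
To repost on a WeChat official account, please contact wx: landnow.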




