0055.S Backing up StarRocks tables to a Kerberos-secured CDP HDFS HA cluster with the broker

rundba 2022-05-23

This article demonstrates how to back up database tables to a Kerberos-secured CDP HDFS cluster with HA enabled, using the broker that ships with StarRocks.



0. Environment



  • CentOS 7.8.2003

  • StarRocks 1.15.2 (including apache_hdfs_broker)

  • CDP 7.1.4




1. Install mysql-client on the broker host



1) Download and unpack MySQL

    [root@sr03 soft]# wget https://dev.mysql.com/get/Downloads/MySQL-5.7/mysql-5.7.33-1.el7.x86_64.rpm-bundle.tar
    [root@sr03 soft]# mkdir 5.7.33
    [root@sr03 soft]# tar xvf mysql-5.7.33-1.el7.x86_64.rpm-bundle.tar -C 5.7.33
    [root@sr03 soft]# yum -y install libmysqlclient.so.18      # this pulls in mariadb-libs


2) Remove the mariadb packages

List the installed packages:

    [root@cdh1 soft]# rpm -qa|grep mariadb

Remove them:

    [root@cdh1 soft]# rpm -e --nodeps mariadb-libs-5.5.68-1.el7.i686


3) Install the MySQL client

    [root@sr03 5.7.33]# rpm -ivh mysql-community-common-5.7.33-1.el7.x86_64.rpm
    [root@sr03 5.7.33]# rpm -ivh mysql-community-libs-5.7.33-1.el7.x86_64.rpm
    [root@sr03 5.7.33]# rpm -ivh mysql-community-client-5.7.33-1.el7.x86_64.rpm


4) Verify the client

Log in to StarRocks; the connection succeeds:

    [root@sr03 5.7.33]# mysql -hsr01 -P9030 -uroot -p87z_L8do




2. Install the Kerberos client packages



1) Install the Kerberos client packages on the broker node

    # yum -y install krb5-libs krb5-workstation openldap-clients


2) Configure the client

HDFS uses Kerberos authentication, so the broker needs a krb5.conf file, which holds the Kerberos configuration, before it can back up to HDFS.

By default, krb5.conf is placed in the /etc directory.

    [root@sr03 ~]# cat /etc/krb5.conf
    [libdefaults]
    dns_lookup_realm = false
    dns_lookup_kdc = false
    ticket_lifetime = 24h
    renew_lifetime = 7d
    forwardable = true
    rdns = false
    default_realm = RUNDBA.NET
    [realms]
    RUNDBA.NET = {
     kdc = nn01.rundba.net:88
     kdc = nn02.rundba.net:88
     master_kdc = nn01.rundba.net:88
     admin_server = nn01.rundba.net:749
    }
    [domain_realm]
    rundba.net = RUNDBA.NET
    .rundba.net = RUNDBA.NET

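Alternatively, the KRB5_CONFIG environment variable can point to the location of krb5.conf (not used in this walkthrough).

Example KRB5_CONFIG setting:

    vi /etc/profile
    ### StarRocks apache_hdfs_broker CDP kerberos ###
    export KRB5_CONFIG=/StarRocks/apache_hdfs_broker/conf/krb5.conf
    ######

Load the environment variable (not used in this walkthrough):

    [root@sr03 StarRocks]# source /etc/profile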


3) Restart the broker

    [root@sr03 StarRocks]# /StarRocks/apache_hdfs_broker/bin/stop_broker.sh
    stop java, and remove pid file.
    [root@sr03 StarRocks]# /StarRocks/apache_hdfs_broker/bin/start_broker.sh --daemon
    [root@sr03 StarRocks]# jps
    9952 BrokerBootstrap
    9975 Jps




3. Upload the CDP user's keytab to the broker host



1) Upload the user's keytab file to the broker host

Upload the user's it01.keytab file to any path on the server, for example /StarRocks/apache_hdfs_broker/conf/. Here the file is owned by 1007:1007.

    [root@sr03 ~]# ls -l /StarRocks/apache_hdfs_broker/conf/it01.keytab
    -rw-r--r-- 1 1007 1007 506 Jun  7 14:33 /StarRocks/apache_hdfs_broker/conf/it01.keytab


2) Verify login

    [root@sr03 conf]# kinit it01
    Password for it01@RUNDBA.NET:           # enter the password
    [root@sr03 conf]# klist                  # the user now holds a ticket
    Ticket cache: FILE:/tmp/krb5cc_0
    Default principal: it01@RUNDBA.NET
    Valid starting       Expires              Service principal
    06/16/2021 15:12:00  06/17/2021 15:10:53  krbtgt/RUNDBA.NET@RUNDBA.NET
     renew until 06/23/2021 15:10:53
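
Since the repository is configured with a keytab rather than a password, it can be useful to confirm that the keytab itself yields a ticket. The following is a minimal sketch, assuming the MIT Kerberos client tools installed in step 2; `verify_keytab` is a hypothetical helper name, and the path and principal in the usage comment are the ones used in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a keytab can obtain a ticket without a password.
verify_keytab() {
    local keytab="$1" principal="$2"
    # List the entries stored in the keytab, with timestamps.
    klist -kt "$keytab" || return 1
    # Request a ticket using the keytab instead of a password.
    kinit -kt "$keytab" "$principal" || return 1
    # Show the resulting ticket cache.
    klist
}

# Usage (run on the broker host):
#   verify_keytab /StarRocks/apache_hdfs_broker/conf/it01.keytab it01@RUNDBA.NET
```

If `kinit -kt` fails here, the broker is unlikely to authenticate with the same keytab either, so this check is worth running before creating the repository.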


3) Create the backup directory (on a CDP cluster host)

    [root@nn01 ~]# hdfs dfs -ls /user/it01
    [root@nn01 ~]# hdfs dfs -mkdir /user/it01/backup
    [root@nn01 ~]# hdfs dfs -ls -d /user/it01/backup
    drwxr-xr-x   - it01 it01          0 2021-06-16 15:11 /user/it01/backup




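4. Create the REPOSITORY



1) Create the REPOSITORY

CDP has HDFS HA enabled and uses Kerberos authentication, so the repository properties carry both the HA and the Kerberos settings.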

                              mysql> CREATE REPOSITORY cdp_repo
                                 -> WITH BROKER broker3
                                 -> ON LOCATION "hdfs://nameservice1:8020/user/it01/backup"
                                 -> PROPERTIES
                                 -> (
                                 ->     "dfs.nameservices" = "nameservice1",
                                 ->     "dfs.ha.namenodes.nameservice1" = "namenode573, namenode981",
                                 ->     "dfs.namenode.rpc-address.nameservice1.namenode573" = "nn01.rundba.net:8020",
                                 ->     "dfs.namenode.rpc-address.nameservice1.namenode981" = "nn02.rundba.net:8020",
                                 ->     "dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
                                 ->     "hadoop.security.authentication" = "kerberos",
                                 ->     "kerberos_principal" = "it01@RUNDBA.NET",
                                 ->     "kerberos_keytab" = "/StarRocks/apache_hdfs_broker/conf/it01.keytab"
                                 -> );
                              Query OK, 0 rows affected (1.27 sec)


2) View the REPOSITORY

cdp_repo had been dropped earlier and then re-created, yet CreateTime still shows the original creation time. This is presumably a bug.

                                mysql> SHOW REPOSITORIES;
                                +--------+----------+---------------------+------------+-------------------------------------------+---------+--------+
                                | RepoId | RepoName | CreateTime          | IsReadOnly | Location                                  | Broker  | ErrMsg |
                                +--------+----------+---------------------+------------+-------------------------------------------+---------+--------+
                                | 20006  | cdp_repo | 2021-06-08 16:01:37 | false      | hdfs://nameservice1:8020/user/it01/backup | broker3 | NULL   |
                                +--------+----------+---------------------+------------+-------------------------------------------+---------+--------+
                                1 row in set (0.01 sec)


3) Drop the REPOSITORY

When the repository is no longer needed, it can be dropped:

    mysql> drop repository cdp_repo;




5. Backup



1) Take a full backup of the table dates in database ssb to the repository cdp_repo

                                    BACKUP SNAPSHOT ssb.snapshot_label3
                                    TO cdp_repo
                                    ON (dates)
                                    PROPERTIES ("type" = "full");
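
For scripted backups, the statement above can be generated from parameters. This is a minimal sketch: `backup_sql` is a hypothetical helper, and the date-based default label is an assumption for illustration, not a StarRocks convention.

```shell
#!/usr/bin/env bash
# backup_sql DB REPO TABLE [LABEL]
# Print a full-backup statement; LABEL defaults to TABLE_YYYYMMDD (an assumption).
backup_sql() {
    local db="$1" repo="$2" table="$3"
    local label="${4:-${table}_$(date +%Y%m%d)}"
    cat <<EOF
BACKUP SNAPSHOT ${db}.${label}
TO ${repo}
ON (${table})
PROPERTIES ("type" = "full");
EOF
}

# Usage: print the statement, or pipe it to the mysql client used earlier:
#   backup_sql ssb cdp_repo dates snapshot_label3
#   backup_sql ssb cdp_repo dates | mysql -hsr01 -P9030 -uroot -p
```

Generating the label from the date keeps snapshot names unique across daily runs, which matters because each snapshot needs its own label within a repository.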


2) Check backup progress

                                      mysql> show backup;
                                      +-------+-----------------+--------+-------------+-----------------------------+---------------------+----------------------+--------------------+--------------+-----------------+----------+------------+--------+---------+
                                      | JobId | SnapshotName    | DbName | State       | BackupObjs                  | CreateTime          | SnapshotFinishedTime | UploadFinishedTime | FinishedTime | UnfinishedTasks | Progress | TaskErrMsg | Status | Timeout |
                                      +-------+-----------------+--------+-------------+-----------------------------+---------------------+----------------------+--------------------+--------------+-----------------+----------+------------+--------+---------+
                                      | 20008 | snapshot_label3 | ssb    | SNAPSHOTING | [default_cluster:ssb.dates] | 2021-06-16 15:28:21 | NULL                 | NULL               | NULL         |                 |          |            | [OK]   | 86400   |
                                      +-------+-----------------+--------+-------------+-----------------------------+---------------------+----------------------+--------------------+--------------+-----------------+----------+------------+--------+---------+
                                      1 row in set (0.01 sec)
                                      mysql> show backup;
                                      +-------+-----------------+--------+-----------------+-----------------------------+---------------------+----------------------+--------------------+--------------+-----------------+----------+------------+--------+---------+
                                      | JobId | SnapshotName    | DbName | State           | BackupObjs                  | CreateTime          | SnapshotFinishedTime | UploadFinishedTime | FinishedTime | UnfinishedTasks | Progress | TaskErrMsg | Status | Timeout |
                                      +-------+-----------------+--------+-----------------+-----------------------------+---------------------+----------------------+--------------------+--------------+-----------------+----------+------------+--------+---------+
                                      | 20008 | snapshot_label3 | ssb    | UPLOAD_SNAPSHOT | [default_cluster:ssb.dates] | 2021-06-16 15:28:21 | 2021-06-16 15:28:27  | NULL               | NULL         |                 |          |            | [OK]   | 86400   |
                                      +-------+-----------------+--------+-----------------+-----------------------------+---------------------+----------------------+--------------------+--------------+-----------------+----------+------------+--------+---------+
                                      1 row in set (0.00 sec)
                                      mysql> show backup;
                                      +-------+-----------------+--------+-----------+-----------------------------+---------------------+----------------------+---------------------+--------------+-----------------+----------+------------+--------+---------+
                                      | JobId | SnapshotName    | DbName | State     | BackupObjs                  | CreateTime          | SnapshotFinishedTime | UploadFinishedTime  | FinishedTime | UnfinishedTasks | Progress | TaskErrMsg | Status | Timeout |
                                      +-------+-----------------+--------+-----------+-----------------------------+---------------------+----------------------+---------------------+--------------+-----------------+----------+------------+--------+---------+
                                      | 20008 | snapshot_label3 | ssb    | SAVE_META | [default_cluster:ssb.dates] | 2021-06-16 15:28:21 | 2021-06-16 15:28:27  | 2021-06-16 15:28:33 | NULL         |                 |          |            | [OK]   | 86400   |
                                      +-------+-----------------+--------+-----------+-----------------------------+---------------------+----------------------+---------------------+--------------+-----------------+----------+------------+--------+---------+
                                      1 row in set (0.00 sec)
                                      mysql> show backup;
                                      +-------+-----------------+--------+----------+-----------------------------+---------------------+----------------------+---------------------+---------------------+-----------------+----------+------------+--------+---------+
                                      | JobId | SnapshotName    | DbName | State    | BackupObjs                  | CreateTime          | SnapshotFinishedTime | UploadFinishedTime  | FinishedTime        | UnfinishedTasks | Progress | TaskErrMsg | Status | Timeout |
                                      +-------+-----------------+--------+----------+-----------------------------+---------------------+----------------------+---------------------+---------------------+-----------------+----------+------------+--------+---------+
                                      | 20008 | snapshot_label3 | ssb    | FINISHED | [default_cluster:ssb.dates] | 2021-06-16 15:28:21 | 2021-06-16 15:28:27  | 2021-06-16 15:28:33 | 2021-06-16 15:28:39 |                 |          |            | [OK]   | 86400   |
                                      +-------+-----------------+--------+----------+-----------------------------+---------------------+----------------------+---------------------+---------------------+-----------------+----------+------------+--------+---------+
                                      1 row in set (0.00 sec)


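The State column shows the stage the backup job is currently in, so progress can be checked at any time. The states are:

•     PENDING: initial state of the job.

•     SNAPSHOTING: snapshots are being taken.

•     UPLOAD_SNAPSHOT: snapshots are complete and ready to upload.

•     UPLOADING: snapshots are being uploaded.

•     SAVE_META: metadata files are being generated locally.

•     UPLOAD_INFO: metadata files and the information about this backup job are being uploaded.

•     FINISHED: the backup is complete.

•     CANCELLED: the backup failed or was cancelled.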
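
Rather than re-running `show backup` by hand, the State column can be polled from a script. The following is a minimal sketch under stated assumptions: the host sr01 and port 9030 are the ones used earlier in this article, `SR_PASSWORD` is a hypothetical environment variable holding the password, and `backup_state` and `wait_for_backup` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Read `SHOW BACKUP ... \G` output on stdin and print the value of the State field.
backup_state() {
    awk '$1 == "State:" { print $2; exit }'
}

# wait_for_backup DB [INTERVAL_SECONDS] [MAX_TRIES]
# Returns 0 on FINISHED, 1 on CANCELLED, 2 if MAX_TRIES polls pass without either.
wait_for_backup() {
    local db="$1" interval="${2:-10}" max_tries="${3:-360}" state i
    for ((i = 0; i < max_tries; i++)); do
        # \G makes the mysql client print one "Column: value" pair per line.
        state=$(mysql -hsr01 -P9030 -uroot -p"$SR_PASSWORD" \
                      -e "SHOW BACKUP FROM ${db}\G" | backup_state)
        echo "$(date '+%F %T') state=${state:-unknown}"
        case "$state" in
            FINISHED)  return 0 ;;
            CANCELLED) return 1 ;;
        esac
        sleep "$interval"
    done
    return 2
}

# Usage:
#   SR_PASSWORD=... wait_for_backup ssb 10
```

Only the two terminal states end the loop; every intermediate state in the list above simply triggers another poll.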




6. Backup speed calculation and HDFS characteristics



1) Check the size of the lineorder_flat table

The table currently occupies about 370 GB.

                                        mysql> show data from ssb.lineorder_flat;
                                        +----------------+----------------+------------+--------------+------------+
                                        | TableName      | IndexName      | Size       | ReplicaCount | RowCount   |
                                        +----------------+----------------+------------+--------------+------------+
                                        | lineorder_flat | lineorder_flat | 369.205 GB | 480          | 3643371678 |
                                        |                | Total          | 369.205 GB | 480          |            |
                                        +----------------+----------------+------------+--------------+------------+
                                        2 rows in set (0.00 sec)


2) Backup

Back up the lineorder_flat table in database ssb:

                                          mysql> BACKUP SNAPSHOT ssb.snapshot_label4
                                             -> TO cdp_repo
                                             -> ON (lineorder_flat)
                                             -> PROPERTIES ("type" = "full");
                                          Query OK, 0 rows affected (0.06 sec)


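3) Backup duration and rate

Creating the snapshot of the 369.205 GB table took 4 seconds, and uploading it to HDFS took 14 minutes. Because HDFS stores three replicas of every block, the average HDFS write rate is 370 GB * 3 / (14 * 60 s) = 1.32 GB/s.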

                                            mysql> show backup;
                                            +-------+-----------------+--------+----------+--------------------------------------+---------------------+----------------------+---------------------+---------------------+-----------------+----------+------------+--------+---------+
                                            | JobId | SnapshotName    | DbName | State    | BackupObjs                           | CreateTime          | SnapshotFinishedTime | UploadFinishedTime  | FinishedTime        | UnfinishedTasks | Progress | TaskErrMsg | Status | Timeout |
                                            +-------+-----------------+--------+----------+--------------------------------------+---------------------+----------------------+---------------------+---------------------+-----------------+----------+------------+--------+---------+
                                            | 20010 | snapshot_label4 | ssb    | FINISHED | [default_cluster:ssb.lineorder_flat] | 2021-06-16 15:41:35 | 2021-06-16 15:41:39  | 2021-06-16 15:55:39 | 2021-06-16 15:55:45 |                 |          |            | [OK]   | 86400   |
                                            +-------+-----------------+--------+----------+--------------------------------------+---------------------+----------------------+---------------------+---------------------+-----------------+----------+------------+--------+---------+
                                            1 row in set (0.00 sec)


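4) Rate comparison

HDFS write I/O peaked at 1.5 GB/s, with lower I/O at the start and at the end of the write. The rate shown on the HDFS monitoring page is close to the 1.3 GB/s average computed in the previous step, so the computed value is used.

The computed rate can be used to estimate how long a full-database backup will take.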
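
The arithmetic from step 3 can be reproduced and reused for such estimates. This is a minimal sketch using awk; the function names are hypothetical, and the 1000 GB database size in the second call is an illustrative assumption, not a measurement from this article.

```shell
#!/usr/bin/env bash
# Average HDFS write rate in GB/s: logical size (GB) * replicas / upload time (s).
hdfs_write_rate() {
    awk -v size="$1" -v repl="$2" -v secs="$3" \
        'BEGIN { printf "%.2f\n", size * repl / secs }'
}

# Estimated upload time in minutes for a logical size (GB) at a given write rate (GB/s).
estimate_upload_minutes() {
    awk -v size="$1" -v repl="$2" -v rate="$3" \
        'BEGIN { printf "%.0f\n", size * repl / rate / 60 }'
}

hdfs_write_rate 370 3 840             # prints 1.32
estimate_upload_minutes 1000 3 1.32   # prints 38
```

The estimate assumes the upload rate stays roughly constant as the data volume grows, which may not hold if the cluster is busy with other workloads.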


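5) Backup notes

HDFS writes the backup data three times, so each block has two redundant copies. snapshot_label4, the backup of the 370 GB lineorder_flat table, therefore occupies 1.1 TB on HDFS.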

                                              [root@nn01 ~]# hdfs dfs -du -h /user/it01/backup/__palo_repository_cdp_repo/
                                              55       165      /user/it01/backup/__palo_repository_cdp_repo/__repo_info
                                              46.9 K   140.6 K  /user/it01/backup/__palo_repository_cdp_repo/__ss_snapshot_label1
                                              158.8 G  476.4 G  /user/it01/backup/__palo_repository_cdp_repo/__ss_snapshot_label2
                                              46.9 K   140.6 K  /user/it01/backup/__palo_repository_cdp_repo/__ss_snapshot_label3
                                              369.2 G  1.1 T    /user/it01/backup/__palo_repository_cdp_repo/__ss_snapshot_label4

The backup directory names carry the Baidu Palo prefix (__palo_repository_).
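
The replication overhead can be read directly from the two size columns of the listing. This is a minimal sketch, assuming `hdfs dfs -du` is run without `-h` so that both columns are plain byte counts; `replica_ratio` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Read `hdfs dfs -du` output (size, size including replicas, path) on stdin and
# print each path followed by the ratio of consumed space to logical size.
replica_ratio() {
    awk '$1 > 0 { printf "%s %.1f\n", $3, $2 / $1 }'
}

# Usage on a CDP host:
#   hdfs dfs -du /user/it01/backup/__palo_repository_cdp_repo/ | replica_ratio

# Example with the __repo_info entry from the listing (55 bytes, 165 with replicas):
printf '55 165 /user/it01/backup/__palo_repository_cdp_repo/__repo_info\n' | replica_ratio
# prints /user/it01/backup/__palo_repository_cdp_repo/__repo_info 3.0
```

A ratio of 3.0 matches the three-replica behaviour described in this step.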



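7. Summary



This article demonstrated backing up a table snapshot with the apache_hdfs_broker that ships with StarRocks.

A repository must be created before backing up. Because the HDFS provided by the CDP platform uses Kerberos authentication and has HDFS HA enabled, the Kerberos client has to be configured on the broker host first, and the matching HA and Kerberos properties have to be specified when the repository is created.

Other backup modes, such as partition-level backup, are not demonstrated here, and backups can also be automated with scripts. DorisManager does not yet provide a backup feature. A suggested direction is to build on snapshot and export, combine them with Percona's open-source tool xtrabackup for online hot backup, and integrate the result into DM to form a more complete backup system.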


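--- End ---

Corrections and suggestions are welcome.

Author: Wang Kun, WeChat official account: rundba. Reposting is welcome; please credit the source.

To repost on an official account, please contact wx: landnow.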




