暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

GBase 8a Mpp Cluster集群产品性能优化篇避免大表Update

原创 Bright 2022-04-12
609

优化场景

    目前GBase集群内Update采用全列替换方式,即更新字段全列数据重新组织DC形成文件(包括过滤条件匹配到和未匹配到的所有数据),替换原有全列文件。


随着业务数据积累,尤其是部分G表数据量已达数百亿,对G表执行Update操作的原意是加工增量数据,但由于全列替换的原因,造成耗时较长。

         在GDS层的业务脚本中,存在类似如下的业务流程:

         1. 执行INSERT,向目标表插入增量数据

Insert G_XXXX_ACCT_INCOME_DAY

     (

     S_DATE,

     CURR_CODE,

     ACCT_ORG,

     ACCT_NO,

     CUST_NO,

     BILL_DATE,

     INT_BNP,

     SVC_BNP,

     LATE_CHG_BNP,

     INTEREST,

     MERCH_INCOME,

     ANNUAL_FEE,

     SPLOAN_FEE,

     DELAY_FEE,

     CASH_FEE,

     OTHER_FEE,

     ALL_INCOME,

     FUND_COST

     )

     Select

     '20150505',

     curr.CURR_CODE as CURR_CODE,

     acct.ACCT_ORG,

     acct.ACCT_NO,

     acct.CUST_NO,

     acct.CARD_BILL_DATE,

     0,

     0,

     0,

     0,

     0,

     0,

     0,

     0,

     0,

     0,

     0,

     0

     From BD_XXXX_ACCT acct

     left join  g_XX_ccard_currcode curr on curr.ORI_ORG=acct.ACCT_ORG AND STARTDATE <='20140101' AND ENDDATE>='20140101';

        

         2. 对目标表执行UPDATE,对其增量数据进行更新(原意是仅更新增量数据,但由于全列替换的原因,导致目标表整列数据文件会被重组,造成性能问题)。

         update G_XXXX_ACCT_INCOME_DAY acct

               left join G_XXXX_ACCT_INCOME_DAY_T1 t1 on t1.S_DATE=acct.S_DATE …

       left join G_XXXX_ACCT_INCOME_DAY_T2 t2 on t2.S_DATE=acct.S_DATE …

       set

          acct.INT_BNP = case when t1.INT_BNP is null then 0 else t1.INT_BNP end  ,

          acct.SVC_BNP = case when t1.SVC_BNP is null then 0 else t1.SVC_BNP end ,

          acct.LATE_CHG_BNP = case when t1.LATE_CHG_BNP  is null then 0 else t1.LATE_CHG_BNP end ,

          acct.INTEREST = case when t2.INTEREST is null then 0 else t2.INTEREST end  ,

          acct.MERCH_INCOME = case when t2.MERCH_INCOME is null then 0 else t2.MERCH_INCOME  end  ,

          acct.ANNUAL_FEE = case when t2.ANNUAL_FEE is null then 0 else t2.ANNUAL_FEE end ,

          acct.SPLOAN_FEE = case when t2.SPLOAN_FEE is null then 0 else t2.SPLOAN_FEE end ,

          acct.DELAY_FEE = case when t2.DELAY_FEE is null then 0 else t2.DELAY_FEE  end ,

          acct.CASH_FEE = case when t2.CASH_FEE is null then 0 else t2.CASH_FEE end ,

          acct.OTHER_FEE = case when t2.OTHER_FEE is null then 0 else t2.OTHER_FEE end ,

          acct.ALL_INCOME = case when t2.ALL_INCOME is null then 0 else t2.ALL_INCOME end ,

          acct.FUND_COST = case when t2.FUND_COST is null then 0 else t2.FUND_COST end

          where acct.S_DATE='20140101';

检查方法

         检查执行慢的Update语句,确认其所属的业务是否可以进行优化。

 

优化方法

         优化建议1

         UPDATE语句的目的是对增量数据进行加工,而这种加工可以放在INSERT语句中进行,因此在上述场景中,INSERT和UPDATE可以合并为一条INSERT语句。

         改写后语句如下(合并INSERT和UPDATE):

         Insert G_XXXX_ACCT_INCOME_DAY

          (

          S_DATE,

          CURR_CODE,

          ACCT_ORG,

          ACCT_NO,

          CUST_NO,

          BILL_DATE,

          INT_BNP,

          SVC_BNP,

          LATE_CHG_BNP,

          INTEREST,

          MERCH_INCOME,

          ANNUAL_FEE,

          SPLOAN_FEE,

          DELAY_FEE,

          CASH_FEE,

          OTHER_FEE,

          ALL_INCOME,

          FUND_COST

          )

      Select

          '20140606',

          curr.CURR_CODE as CURR_CODE,

          acct.ACCT_ORG,

          acct.ACCT_NO,

          acct.CUST_NO,

          acct.CARD_BILL_DATE,

          case when t1.INT_BNP is null then 0 else t1.INT_BNP end   ,

          case when t1.SVC_BNP is null then 0 else t1.SVC_BNP end ,

          case when t1.LATE_CHG_BNP  is null then 0 else t1.LATE_CHG_BNP end ,

          case when t2.INTEREST is null then 0 else t2.INTEREST end  ,

          case when t2.MERCH_INCOME is null then 0 else t2.MERCH_INCOME  end  ,

          case when t2.ANNUAL_FEE is null then 0 else t2.ANNUAL_FEE end ,

          case when t2.SPLOAN_FEE is null then 0 else t2.SPLOAN_FEE end ,

          case when t2.DELAY_FEE is null then 0 else t2.DELAY_FEE  end ,

          case when t2.CASH_FEE is null then 0 else t2.CASH_FEE end ,

          case when t2.OTHER_FEE is null then 0 else t2.OTHER_FEE end ,

          case when t2.ALL_INCOME is null then 0 else t2.ALL_INCOME end ,

          case when t2.FUND_COST is null then 0 else t2.FUND_COST end

      From BD_XXXX_ACCT acct

               left join  g_XX_ccard_currcode curr on curr.ORI_ORG=acct.ACCT_ORG AND STARTDATE <='20140101' AND ENDDATE>='20140101'

               left join  bdpdbtest.G_XXXX_ACCT_INCOME_DAY_T1 t1 on t1.S_DATE='20140101' and t1.ACCT_ORG=acct.ACCT_ORG AND t1.CUST_NO=acct.CUST_NO AND t1.ACCT_NO=acct.ACCT_NO

               left join  bdpdbtest.G_XXXX_ACCT_INCOME_DAY_T2 t2 on t2.S_DATE='20140101' and t2.CURR_CODE=curr.CURR_CODE AND t2.CUST_NO=acct.CUST_NO AND t2.ACCT_NO=acct.ACCT_NO

              where acct.StartDt<=date('20140101') and acct.EndDt>=date('20140101') ;

 

        优化建议2

         对于必须执行Update的场景,可以先把增量数据插入一个临时表。在临时表上做完Update后,再把临时表插入目标表。这样避免了直接在数据量巨大的目标表上做Update,提升性能。

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论