优化场景
目前GBase集群内Update采用全列替换方式,即更新字段全列数据重新组织DC形成文件(包括过滤条件匹配到和未匹配到的所有数据),替换原有全列文件。
随着业务数据积累,尤其是部分G表数据量已达数百亿,对G表执行Update操作的原意是加工增量数据,但由于全列替换的原因,造成耗时较长。
在GDS层的业务脚本中,存在类似如下的业务流程:
1.
执行INSERT,向目标表插入增量数据
Insert
G_XXXX_ACCT_INCOME_DAY
(
S_DATE,
CURR_CODE,
ACCT_ORG,
ACCT_NO,
CUST_NO,
BILL_DATE,
INT_BNP,
SVC_BNP,
LATE_CHG_BNP,
INTEREST,
MERCH_INCOME,
ANNUAL_FEE,
SPLOAN_FEE,
DELAY_FEE,
CASH_FEE,
OTHER_FEE,
ALL_INCOME,
FUND_COST
)
Select
'20150505',
curr.CURR_CODE as CURR_CODE,
acct.ACCT_ORG,
acct.ACCT_NO,
acct.CUST_NO,
acct.CARD_BILL_DATE,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0
From BD_XXXX_ACCT acct
left join
g_XX_ccard_currcode curr on curr.ORI_ORG=acct.ACCT_ORG AND STARTDATE
<='20140101' AND ENDDATE>='20140101';
2.
对目标表执行UPDATE,对其增量数据进行更新(原意是仅更新增量数据,但由于全列替换的原因,导致目标表整列数据文件会被重组,造成性能问题)。
update G_XXXX_ACCT_INCOME_DAY acct
left
join G_XXXX_ACCT_INCOME_DAY_T1 t1 on t1.S_DATE=acct.S_DATE …
left join G_XXXX_ACCT_INCOME_DAY_T2 t2
on t2.S_DATE=acct.S_DATE …
set
acct.INT_BNP = case when t1.INT_BNP
is null then 0 else t1.INT_BNP end ,
acct.SVC_BNP = case when t1.SVC_BNP is null
then 0 else t1.SVC_BNP end ,
acct.LATE_CHG_BNP = case when
t1.LATE_CHG_BNP is null then 0 else
t1.LATE_CHG_BNP end ,
acct.INTEREST = case when t2.INTEREST
is null then 0 else t2.INTEREST end ,
acct.MERCH_INCOME = case when
t2.MERCH_INCOME is null then 0 else t2.MERCH_INCOME end ,
acct.ANNUAL_FEE = case when
t2.ANNUAL_FEE is null then 0 else t2.ANNUAL_FEE end ,
acct.SPLOAN_FEE = case when
t2.SPLOAN_FEE is null then 0 else t2.SPLOAN_FEE end ,
acct.DELAY_FEE = case when
t2.DELAY_FEE is null then 0 else t2.DELAY_FEE
end ,
acct.CASH_FEE = case when t2.CASH_FEE
is null then 0 else t2.CASH_FEE end ,
acct.OTHER_FEE = case when t2.OTHER_FEE
is null then 0 else t2.OTHER_FEE end ,
acct.ALL_INCOME = case when
t2.ALL_INCOME is null then 0 else t2.ALL_INCOME end ,
acct.FUND_COST = case when
t2.FUND_COST is null then 0 else t2.FUND_COST end
where acct.S_DATE='20140101';
检查方法
检查执行慢的Update语句,确认其所属的业务是否可以进行优化。
优化方法
优化建议1
UPDATE语句的目的是对增量数据进行加工,而这种加工可以放在INSERT语句中进行,因此在上述场景中,INSERT和UPDATE可以合并为一条INSERT语句。
改写后语句如下(合并INSERT和UPDATE):
Insert G_XXXX_ACCT_INCOME_DAY
(
S_DATE,
CURR_CODE,
ACCT_ORG,
ACCT_NO,
CUST_NO,
BILL_DATE,
INT_BNP,
SVC_BNP,
LATE_CHG_BNP,
INTEREST,
MERCH_INCOME,
ANNUAL_FEE,
SPLOAN_FEE,
DELAY_FEE,
CASH_FEE,
OTHER_FEE,
ALL_INCOME,
FUND_COST
)
Select
'20140606',
curr.CURR_CODE as CURR_CODE,
acct.ACCT_ORG,
acct.ACCT_NO,
acct.CUST_NO,
acct.CARD_BILL_DATE,
case when t1.INT_BNP is null then 0
else t1.INT_BNP end ,
case when t1.SVC_BNP is null then 0
else t1.SVC_BNP end ,
case when t1.LATE_CHG_BNP is null then 0 else t1.LATE_CHG_BNP end ,
case when t2.INTEREST is null then 0
else t2.INTEREST end ,
case when t2.MERCH_INCOME is null
then 0 else t2.MERCH_INCOME end ,
case when t2.ANNUAL_FEE is null then
0 else t2.ANNUAL_FEE end ,
case when t2.SPLOAN_FEE is null then
0 else t2.SPLOAN_FEE end ,
case when t2.DELAY_FEE is null then 0
else t2.DELAY_FEE end ,
case when t2.CASH_FEE is null then 0
else t2.CASH_FEE end ,
case when t2.OTHER_FEE is null then 0
else t2.OTHER_FEE end ,
case when t2.ALL_INCOME is null then
0 else t2.ALL_INCOME end ,
case when t2.FUND_COST is null then 0
else t2.FUND_COST end
From BD_XXXX_ACCT acct
left join g_XX_ccard_currcode curr on
curr.ORI_ORG=acct.ACCT_ORG AND STARTDATE <='20140101' AND
ENDDATE>='20140101'
left join bdpdbtest.G_XXXX_ACCT_INCOME_DAY_T1 t1 on
t1.S_DATE='20140101' and t1.ACCT_ORG=acct.ACCT_ORG AND t1.CUST_NO=acct.CUST_NO
AND t1.ACCT_NO=acct.ACCT_NO
left join bdpdbtest.G_XXXX_ACCT_INCOME_DAY_T2 t2 on
t2.S_DATE='20140101' and t2.CURR_CODE=curr.CURR_CODE AND
t2.CUST_NO=acct.CUST_NO AND t2.ACCT_NO=acct.ACCT_NO
where acct.StartDt<=date('20140101')
and acct.EndDt>=date('20140101') ;
优化建议2
对于必须执行Update的场景,可以先把增量数据插入一个临时表。在临时表上做完Update后,再把临时表插入目标表。这样避免了直接在数据量巨大的目标表上做Update,提升性能。




