
A Real-World Case of a Hive Statement Error and Its Optimization

Original article by Yang Yong, 2021-08-13

1. The original statement. It is quite long; the runtime environment is Hadoop 2.7.2 with Hive 2.3:
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.vectorized.execution.enabled=false;
SELECT '0' AS rtype,
       cf.MDTRT_ID,
       med,
       rx,
       vis,
       oprn,
       ABS(hash(reflect("java.util.UUID", "randomUUID"))) % 1 AS par
FROM
  (SELECT a.MDTRT_ID,
          concat_ws('δ', collect_list(a.med_content)) med,
          concat_ws('δ', collect_list(b.rx_content)) rx
   FROM ims_ads_test.tmp_mdtrt_mid_51000020210803102900107001 a
   INNER JOIN ims_ads_test.tmp_rx_mid_51000020210803102900107001 b
      ON a.MDTRT_ID = b.MDTRT_ID
     AND a.MED_MDTRT_TYPE IN ('1', '2', '3', '4')
     AND a.FIXMEDINS_OPTINS IN ('510322', '510304', '510303', '510302', '510321',
                                '510340', '510311', '510399', '510300')
     AND a.SETL_TIME BETWEEN '2021-07-01 00:00:00' AND '2021-07-31 23:59:59'
     AND a.MEDFEE_SUMAMT BETWEEN 5 AND 99
     AND a.MDTRT_ID IS NOT NULL
   GROUP BY a.MDTRT_ID) cf
LEFT JOIN
  (SELECT a.MDTRT_ID,
          concat_ws('δ', collect_list(b.dise_content)) vis
   FROM ims_ads_test.tmp_mdtrt_mid_51000020210803102900107001 a
   INNER JOIN ims_ads_test.tmp_dise_mid_51000020210803102900107001 b
      ON a.MDTRT_ID = b.MDTRT_ID
     AND a.MED_MDTRT_TYPE IN ('1', '2', '3', '4')
     AND a.FIXMEDINS_OPTINS IN ('510322', '510304', '510303', '510302', '510321',
                                '510340', '510311', '510399', '510300')
     AND a.SETL_TIME BETWEEN '2021-07-01 00:00:00' AND '2021-07-31 23:59:59'
     AND a.MEDFEE_SUMAMT BETWEEN 5 AND 99
     AND a.MDTRT_ID IS NOT NULL
   GROUP BY a.MDTRT_ID) zd ON cf.MDTRT_ID = zd.MDTRT_ID
LEFT JOIN
  (SELECT a.MDTRT_ID,
          concat_ws('δ', collect_list(b.content)) oprn
   FROM ims_ads_test.tmp_mdtrt_mid_51000020210803102900107001 a
   INNER JOIN ims_ads_test.tmp_oper_mid_51000020210803102900107001 b
      ON a.MDTRT_ID = b.MDTRT_ID
     AND a.MED_MDTRT_TYPE IN ('1', '2', '3', '4')
     AND a.FIXMEDINS_OPTINS IN ('510322', '510304', '510303', '510302', '510321',
                                '510340', '510311', '510399', '510300')
     AND a.SETL_TIME BETWEEN '2021-07-01 00:00:00' AND '2021-07-31 23:59:59'
     AND a.MEDFEE_SUMAMT BETWEEN 5 AND 99
     AND a.MDTRT_ID IS NOT NULL
   GROUP BY a.MDTRT_ID) ss ON cf.MDTRT_ID = ss.MDTRT_ID;
2. The job failed at runtime; the error log is as follows:
08-05 01:45:14-INFO : Map 1: 77(+100,-65)/207 Map 10: -/- Map 7: 2/2 Map 9: -/- Reducer 2: 0/2 Reducer 3: 0/2 Reducer 4: 0/2 Reducer 5: 0/1 Reducer 6: 0/1 Reducer 8: 1/1

08-05 01:45:19-ERROR : Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1625148180011_330900_1_02, diagnostics=[Task failed, taskId=task_1625148180011_330900_1_02_000091, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1625148180011_330900_1_02_000091_0:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1764)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:494)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:199)
… 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NegativeArraySizeException
at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1140)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:711)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:711)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:483)
… 15 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NegativeArraySizeException
at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1098)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1137)
… 19 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NegativeArraySizeException
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:425)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1061)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1081)
… 20 more
Caused by: java.lang.NegativeArraySizeException
at org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:144)
at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:123)
at org.apache.hadoop.io.BytesWritable.set(BytesWritable.java:171)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:204)
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:575)
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:404)
… 23 more

3. Analysis:
From the log it is clear that the failure happened at run time, so this is not a compilation or syntax error. Runtime failures usually come either from environment or resource problems, or from the data itself (dirty data breaking a computation). Neither applies here. The stack trace shows the failure during the collection aggregation: the data collected per group grew so large that the buffer size calculation overflowed. In other words, it is a limitation (a bug) in how the function handles oversized values.
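The root frame, BytesWritable.setCapacity, points at the mechanism. In the Hadoop source, BytesWritable.setSize grows the backing array to size * 3 / 2, evaluated in Java's signed 32-bit int arithmetic. Once a single serialized row (here, the giant strings built by concat_ws over collect_list) is large enough, size * 3 wraps to a negative int, and allocating `new byte[negative]` throws NegativeArraySizeException. A minimal Python sketch of that arithmetic (the 3/2 growth rule comes from the Hadoop source; the sample sizes are purely illustrative):

```python
def to_int32(x: int) -> int:
    """Wrap an unbounded Python int into Java's signed 32-bit int range."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

def java_div(a: int, b: int) -> int:
    """Java integer division truncates toward zero; Python's // floors."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

def grow_capacity(size: int) -> int:
    """Mimic the capacity computation in BytesWritable.setSize:
    new capacity = size * 3 / 2, evaluated in 32-bit int arithmetic."""
    return java_div(to_int32(size * 3), 2)

print(grow_capacity(100))          # 150: normal growth for small rows
# For a serialized row of roughly 800 MB, size * 3 exceeds 2^31 - 1:
print(grow_capacity(800_000_000))  # -947483648, so Java's `new byte[...]`
                                   # throws NegativeArraySizeException
```

The practical takeaway: any single row produced by collect_list plus concat_ws has to stay well under this 32-bit buffer limit, which is why restructuring the aggregation or shrinking the data per group makes the job succeed.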

4. Solution: restructure the statement and split the work into stages. Here is an optimized version, which is cleaner and more concise, and also runs more efficiently:
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.vectorized.execution.enabled=false;
SELECT '0' AS rtype,
       mast.MDTRT_ID,
       mast.med,
       mast.rx,
       concat_ws('δ', collect_list(c.dise_content)) vis,
       concat_ws('δ', collect_list(d.content)) oprn,
       ABS(hash(reflect("java.util.UUID", "randomUUID"))) % 1 AS par
FROM
  (SELECT a.MDTRT_ID,
          concat_ws('δ', collect_list(a.med_content)) med,
          concat_ws('δ', collect_list(b.rx_content)) rx
   FROM ims_ads_test.tmp_mdtrt_mid_51000020210803102900107001 a,
        ims_ads_test.tmp_rx_mid_51000020210803102900107001 b
   WHERE a.MED_MDTRT_TYPE IN ('1', '2', '3', '4')
     AND a.FIXMEDINS_OPTINS IN ('510322', '510304', '510303', '510302', '510321',
                                '510340', '510311', '510399', '510300')
     AND a.SETL_TIME BETWEEN '2021-07-01 00:00:00' AND '2021-07-31 23:59:59'
     AND a.MEDFEE_SUMAMT BETWEEN 5 AND 99
     AND a.MDTRT_ID IS NOT NULL
     AND a.MDTRT_ID = b.MDTRT_ID
   GROUP BY a.MDTRT_ID) mast
LEFT JOIN ims_ads_test.tmp_dise_mid_51000020210803102900107001 c
       ON mast.MDTRT_ID = c.MDTRT_ID
LEFT JOIN ims_ads_test.tmp_oper_mid_51000020210803102900107001 d
       ON mast.MDTRT_ID = d.MDTRT_ID
GROUP BY mast.MDTRT_ID,
         mast.med,
         mast.rx;
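The staging idea behind the rewrite is to aggregate the heavy med/rx side once, then attach the detail tables, rather than repeating the full filter-join-aggregate pipeline three times. That idea can be sketched in plain Python. All rows and values below are hypothetical stand-ins for the Hive mid tables; the sketch pre-aggregates each detail table before joining, which is the same divide-the-work principle:

```python
from collections import defaultdict

SEP = "δ"  # same separator the Hive query passes to concat_ws

# Phase 1 output: med/rx already aggregated per MDTRT_ID (hypothetical rows).
mast = {101: {"med": "m1" + SEP + "m2", "rx": "r1"}}

# Detail tables as (MDTRT_ID, content) pairs (hypothetical rows).
dise = [(101, "d1"), (101, "d2")]
oper = [(101, "o1")]

def collect_ws(rows):
    """Mimic concat_ws(SEP, collect_list(col)) grouped by the first field."""
    acc = defaultdict(list)
    for key, val in rows:
        acc[key].append(val)
    return {key: SEP.join(vals) for key, vals in acc.items()}

# Phase 2: aggregate each detail table on its own, then left-join the small
# aggregated results onto the already-aggregated mast rows.
vis, oprn = collect_ws(dise), collect_ws(oper)
result = {mid: {**row, "vis": vis.get(mid, ""), "oprn": oprn.get(mid, "")}
          for mid, row in mast.items()}

print(result[101]["vis"])   # d1δd2
print(result[101]["oprn"])  # o1
```

One design note: left-joining two one-to-many detail tables at once and grouping afterwards, as the optimized SQL does, can multiply rows (each dise row pairs with each oper row for the same visit), so the vis/oprn lists may contain repeated entries. Pre-aggregating each detail table before the join, as sketched here, is a safe variant of the same idea and is worth checking against the expected output.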
