暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

hive语句多次查询一个表导致编译错误得分析处理

原创 杨勇 2021-08-02
1152

# 1、原始语句执行错误:
SELECT reflect(“java.util.UUID”, “randomUUID”) AS rid,
reflect(“java.util.UUID”, “randomUUID”) AS insu_stt_id,
date_format(concat(‘2021’,’-’,‘07’,’-’,‘01’),‘yyyy-MM-dd’) AS stt_date,
t.poolarea_no AS admdvs,
NVL(t2.nw_insu_psncnt,0) AS nw_insu_psncnt,
NVL(t1.redc_insu_psncnt,0) AS redc_insu_psncnt,
NVL(t.crt_insu_psncnt,0) AS crt_insu_psncnt,
‘hgjc’ AS created_by,
current_date() AS created_date,
‘hgjc’ AS updated_by,
current_date() AS updated_date,
date_format(concat(‘2021’,’-’,‘07’,’-’,‘01’),‘yyyyMMdd’) AS dt
FROM
(SELECT COUNT(DISTINCT temp1.psn_no) AS crt_insu_psncnt,
temp1.poolarea_no
FROM test_fct.fct_oms_psn_insu_di temp1
WHERE temp1.psn_insu_stas = ‘1’
GROUP BY temp1.poolarea_no) t
LEFT JOIN
(SELECT temp.poolarea_no,
COUNT(DISTINCT temp.psn_no) AS redc_insu_psncnt
FROM test_fct.fct_oms_psn_insu_di temp
WHERE date_format(DATE_ADD(temp.crt_insu_date,-6),‘yyyyMMdd’) =date_format(concat(‘2021’,’-’,‘07’,’-’,‘01’),‘yyyyMMdd’)
AND temp.psn_insu_stas IN (‘2’,‘4’)
GROUP BY temp.poolarea_no) t1
ON t.poolarea_no = t1.poolarea_no
LEFT JOIN
(SELECT temp.poolarea_no,
COUNT(DISTINCT temp.psn_no) AS nw_insu_psncnt
FROM test_fct.fct_oms_psn_insu_di temp
WHERE temp.fst_insu_ym = date_format(DATE_ADD(current_date(),-1),‘yyyyMM’)
AND date_format(DATE_ADD(temp.crt_insu_date,-6),‘yyyyMMdd’) = date_format(concat(‘2021’,’-’,‘07’,’-’,‘01’),‘yyyyMMdd’)
AND temp.psn_insu_stas = ‘1’
GROUP BY temp.poolarea_no) t2
ON t2.poolarea_no = t1.poolarea_no ;

# 2、错误
#错误
2021-07-29 16:39:40,992 INFO [-Pool: Thread-689154][org.apache.hadoop.hive.ql.session.SessionState]: Resetting thread name to HiveServer2-Handler-Pool: Thread-689154
2021-07-29 16:39:40,992 INFO [-Pool: Thread-689154][org.apache.hive.service.server.ThreadWithGarbageCleanup]: Adding RawStore: org.apache.hadoop.hive.metastore.ObjectStore@3b311305, for the thread: HiveServer2-Handler-Pool: Thread-689154 to threadRawStoreMap for future cleanup.
2021-07-29 16:39:40,992 INFO [-Pool: Thread-610466][org.apache.hadoop.hive.conf.HiveConf]: Using the default value passed in for log id: 41d19f14-f1dd-4f66-b2c8-06799c5e3678
2021-07-29 16:39:40,992 INFO [-Pool: Thread-610466][org.apache.hadoop.hive.ql.session.SessionState]: Resetting thread name to HiveServer2-Handler-Pool: Thread-610466
2021-07-29 16:39:40,992 INFO [-Pool: Thread-610466][org.apache.hive.service.server.ThreadWithGarbageCleanup]: Adding RawStore: org.apache.hadoop.hive.metastore.ObjectStore@304b9dd4, for the thread: HiveServer2-Handler-Pool: Thread-610466 to threadRawStoreMap for future cleanup.
2021-07-29 16:39:40,993 INFO [-Pool: Thread-689154][org.apache.hadoop.hive.conf.HiveConf]: Using the default value passed in for log id: d7a7f4a3-508e-489e-b690-0d9a6c3eebb2
2021-07-29 16:39:40,993 INFO [-Pool: Thread-689154][org.apache.hadoop.hive.ql.session.SessionState]: Updating thread name to d7a7f4a3-508e-489e-b690-0d9a6c3eebb2 HiveServer2-Handler-Pool: Thread-689154
2021-07-29 16:39:40,993 INFO [-Pool: Thread-689154][org.apache.hadoop.hive.conf.HiveConf]: Using the default value passed in for log id: d7a7f4a3-508e-489e-b690-0d9a6c3eebb2
2021-07-29 16:39:40,993 INFO [-Pool: Thread-689154][org.apache.hadoop.hive.ql.session.SessionState]: Resetting thread name to HiveServer2-Handler-Pool: Thread-689154
2021-07-29 16:39:40,993 INFO [-Pool: Thread-689154][org.apache.hive.service.server.ThreadWithGarbageCleanup]: Adding RawStore: org.apache.hadoop.hive.metastore.ObjectStore@3b311305, for the thread: HiveServer2-Handler-Pool: Thread-689154 to threadRawStoreMap for future cleanup.
2021-07-29 16:39:40,993 INFO [-Pool: Thread-689154][org.apache.hadoop.hive.conf.HiveConf]: Using the default value passed in for log id: d7a7f4a3-508e-489e-b690-0d9a6c3eebb2
2021-07-29 16:39:40,993 INFO [-Pool: Thread-689154][org.apache.hadoop.hive.ql.session.SessionState]: Updating thread name to d7a7f4a3-508e-489e-b690-0d9a6c3eebb2 HiveServer2-Handler-Pool: Thread-689154
2021-07-29 16:39:40,997 INFO [-Pool: Thread-689154][org.apache.hadoop.hive.common.LogUtils]: Success to close the operation log appender for query id hive_20210729163845_2b180a47-0efa-4357-bea4-fc8cf81ac6e6
2021-07-29 16:39:40,997 INFO [-Pool: Thread-689154][org.apache.hadoop.hive.conf.HiveConf]: Using the default value passed in for log id: d7a7f4a3-508e-489e-b690-0d9a6c3eebb2
2021-07-29 16:39:41,012 INFO [-Pool: Thread-689154][org.apache.hadoop.hive.conf.HiveConf]: Using the default value passed in for log id: d7a7f4a3-508e-489e-b690-0d9a6c3eebb2
2021-07-29 16:39:41,012 INFO [-Pool: Thread-689154][org.apache.hadoop.hive.conf.HiveConf]: Using the default value passed in for log id: d7a7f4a3-508e-489e-b690-0d9a6c3eebb2
2021-07-29 16:39:41,012 INFO [-Pool: Thread-689154][org.apache.hadoop.hive.conf.HiveConf]: Using the default value passed in for log id: d7a7f4a3-508e-489e-b690-0d9a6c3eebb2
2021-07-29 16:39:41,012 INFO [-Pool: Thread-689154][org.apache.hadoop.hive.conf.HiveConf]: Using the default value passed in for log id: d7a7f4a3-508e-489e-b690-0d9a6c3eebb2
2021-07-29 16:39:40,993 WARN [-Pool: Thread-610466][org.apache.hive.service.cli.thrift.ThriftCLIService]: Error executing statement: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ConcurrentModificationException null
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:377) ~[hive-service-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0]
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:194) ~[hive-service-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0]
at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:277) ~[hive-service-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0]
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:311) ~[hive-service-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0]
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:500) ~[hive-service-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0]
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:487) ~[hive-service-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0]
at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_242]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_242]
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) ~[hive-service-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0]
at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) ~[hive-service-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0]
at org.apache.hive.service.cli.session.HiveSessionProxy1.run(HiveSessionProxy.java:63) [hiveservice2.2.0TBDS5.1.3.0.jar:2.2.0TBDS5.1.3.0]atjava.security.AccessController.doPrivileged(NativeMethod) [?:1.8.0242]atjavax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0242]atorg.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1764) [hadoopcommon2.7.2TBDS5.1.3.0.jar:?]atorg.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) [hiveservice2.2.0TBDS5.1.3.0.jar:2.2.0TBDS5.1.3.0]atcom.sun.proxy.1.run(HiveSessionProxy.java:63) ~[hive-service-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0] at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_242] at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_242] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1764) ~[hadoop-common-2.7.2-TBDS-5.1.3.0.jar:?] at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) ~[hive-service-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0] at com.sun.proxy.Proxy52.executeStatementAsync(Unknown Source) ~[?:?]
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:308) ~[hive-service-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0]
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:507) ~[hive-service-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0]
at org.apache.hive.service.rpc.thrift.TCLIServiceProcessorProcessorExecuteStatement.getResult(TCLIService.java:1437) ~[hive-exec-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0]
at org.apache.hive.service.rpc.thrift.TCLIServiceProcessorProcessorExecuteStatement.getResult(TCLIService.java:1422) ~[hive-exec-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0]
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[hive-exec-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0]
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[hive-exec-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0]
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) ~[hive-service-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0]
at org.apache.thrift.server.TThreadPoolServerWorkerProcess.run(TThreadPoolServer.java:286) [hiveexec2.2.0TBDS5.1.3.0.jar:2.2.0TBDS5.1.3.0]atjava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)[?:1.8.0242]atjava.util.concurrent.ThreadPoolExecutorWorkerProcess.run(TThreadPoolServer.java:286) ~[hive-exec-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_242] at java.util.concurrent.ThreadPoolExecutorWorker.run(ThreadPoolExecutor.java:624) [?:1.8.0_242]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
Caused by: java.util.ConcurrentModificationException
at java.util.HashMapHashIterator.nextNode(HashMap.java:1445) [?:1.8.0242]atjava.util.HashMapHashIterator.nextNode(HashMap.java:1445) ~[?:1.8.0_242] at java.util.HashMapKeyIterator.next(HashMap.java:1469) ~[?:1.8.0_242]
at org.apache.hadoop.hive.ql.parse.TezCompiler.removeSemijoinOptimizationFromSMBJoins(TezCompiler.java:640) ~[hive-exec-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0]
at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:117) ~[hive-exec-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0]
at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:138) ~[hive-exec-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0]
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11179) ~[hive-exec-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0]
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286) ~[hive-exec-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0]
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:257) ~[hive-exec-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0]
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:487) ~[hive-exec-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0]
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1256) ~[hive-exec-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0]
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1241) ~[hive-exec-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0]
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:192) ~[hive-service-2.2.0-TBDS-5.1.3.0.jar:2.2.0-TBDS-5.1.3.0]
… 26 more

# 3、分析错误原因:
Error executing statement: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ConcurrentModificationException null
明显是编译并行出错,可以把并行关闭:即可解决问题:
set hive.driver.parallel.compilation=flase;
但真正好得优化其实要从语句做:
hive中得表实际是文件,其中表fct_oms_psn_insu_di做了3次查询,io三倍增加,且用了临时表,做了三次匿名存储,然后大表又三次左连接,所以总体性能io增加了12次,cpu计算膨胀了3倍。
其实完全一次扫描就可以得出结果。一次表得io,一次cpu做聚合计算即可完成。
上面得错误关键信息:

4、优化后得语句:
SELECT reflect(“java.util.UUID”, “randomUUID”) AS rid,
reflect(“java.util.UUID”, “randomUUID”) AS insu_stt_id,
date_format(concat(‘2021’,’-’,‘07’,’-’,‘01’),‘yyyy-MM-dd’) AS stt_date,
poolarea_no AS admdvs,
COUNT(DISTINCT case when fst_insu_ym = date_format(DATE_ADD(current_date(),-1),‘yyyyMM’) AND date_format(DATE_ADD(crt_insu_date,-6),‘yyyyMMdd’) = date_format(concat(‘2021’,’-’,‘07’,’-’,‘01’),‘yyyyMMdd’) AND psn_insu_stas = ‘1’ then psn_no else null end ) AS nw_insu_psncnt,
COUNT(DISTINCT case when (date_format(DATE_ADD(crt_insu_date,-6),‘yyyyMMdd’) =date_format(concat(‘2021’,’-’,‘07’,’-’,‘01’),‘yyyyMMdd’) AND psn_insu_stas IN (‘2’,‘4’)) then psn_no else null end ) AS redc_insu_psncnt,
COUNT(DISTINCT case when psn_insu_stas = ‘1’ then psn_no else null end ) as crt_insu_psncnt
‘hgjc’ AS created_by,
current_date() AS created_date,
‘hgjc’ AS updated_by,
current_date() AS updated_date,
date_format(concat(‘2021’,’-’,‘07’,’-’,‘01’),‘yyyyMMdd’) AS dt
FROM test_fct.fct_oms_psn_insu_di t
where psn_insu_stas = ‘1’
or (date_format(DATE_ADD(crt_insu_date,-6),‘yyyyMMdd’) =date_format(concat(‘2021’,’-’,‘07’,’-’,‘01’),‘yyyyMMdd’)
AND psn_insu_stas IN (‘2’,‘4’))
or (fst_insu_ym = date_format(DATE_ADD(current_date(),-1),‘yyyyMM’)
AND date_format(DATE_ADD(crt_insu_date,-6),‘yyyyMMdd’) = date_format(concat(‘2021’,’-’,‘07’,’-’,‘01’),‘yyyyMMdd’)
AND psn_insu_stas = ‘1’)
group by poolarea_no;

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论