irpas技术客

【presto】presto查询报错整理_kiraraLou_presto 查询报错

irpas 3514

问题一: 报错内容:

Error running query: Corrupted statistics for column "[firstcontactsname] optional binary " in Parquet file "hdfs://nameservice1/user/hive/warehouse/riskn.db/ads_baihang_order_001_01/etl_dt=2021-09-12/000014_0": [min: 0xE4B881E4B89AE697BA, max: 0x3138373631333135363538, num_nulls: 182605] 报错原因:

原因是存储的parquet文件有部分是损坏的,而presto在0.216开始添加了一个文件完整性检测,这个默认是开启的,可以直接关掉。

解决方案: 修改hive的catalog配置文件修改session级别配置 #全局关闭 :vim catalog/hive.properties hive.parquet.fail-on-corrupted-statistics=false #在会话中关闭,在sql前添加配置,推荐此种方式 set session hive.parquet_fail_with_corrupted_statistics=false;
问题二: 问题描述

presto 查询分桶表报错, 参考:https://blog.csdn.net/Mrerlou/article/details/119614586


问题三: sql 内容 select a.mid, a.loan_date, a.channel, count(a.loan_date), sum(b.loan_amount) from inods.t04_ferryorderindia_orders a join inods.t04_ferrycoreindia_loan b on cast(a.order_no as varchar) = cast(b.order_no as varchar) where a.loan_date > '2021-12-06' and a.loan_date < cast('2021-12-12' as date) group by a.mid,a.loan_date,a.channel 报错内容 Error running query: line 10:19: '>' cannot be applied to date, varchar(10)

问题原因

Presto不支持自动类型转换。

SQL中loan_date和 '2021-12-06' 分别是date类型和varchar类型。

解决方案

比较符号左右字段的类型需要设置为一致。

如:

cast(a.loan_date as varchar) > '2021-12-06' 或者 a.loan_date < cast('2021-12-12' as date) 最终sql select a.mid, a.loan_date, a.channel, count(a.loan_date), sum(b.loan_amount) from inods.t04_ferryorderindia_orders a join inods.t04_ferrycoreindia_loan b on cast(a.order_no as varchar) = cast(b.order_no as varchar) where a.loan_date > cast('2021-12-06' as date) and a.loan_date < cast('2021-12-12' as date) group by a.mid,a.loan_date,a.channel
问题四 问题描述

删除HIVE的部分分区后,PRESTO查询数据失败,但是HIVE查询却有数据

报错内容

查询使用presto查询hive表,报错内容如下:

Error running query: Partition location does not exist: hdfs://nameservice1/user/hive/warehouse/rsc.db/bd_banker_coupon_log/etl_dt=2021-12-31

报错原因

对hive数据分区进行了物理删除,但是hive查询时可以屏蔽该错误,但是Presto查询的时候不能屏蔽该错误,而是将该错误抛出。

解决方案

手动删除hive的元数据中的分区信息,如下所示:

alter table rsc.bd_banker_coupon_log drop partition(etl_dt=2021-12-30);

或者是指定分区来查询

select * from rsc.bd_banker_coupon_log where etl_dt='20220112' limit 1
问题五 报错内容 com.facebook.presto.operator.PageTooLargeException: Remote page is too large at com.facebook.presto.operator.HttpRpcShuffleClient.rewriteException(HttpRpcShuffleClient.java:130) at com.facebook.presto.operator.PageBufferClient$1.onFailure(PageBufferClient.java:361) at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1052) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834) Caused by: com.facebook.airlift.http.client.ResponseTooLargeException: Maximum response size exceeded at com.facebook.airlift.http.client.jetty.BufferingResponseListener.onHeaders(BufferingResponseListener.java:55) at org.eclipse.jetty.client.ResponseNotifier.notifyHeaders(ResponseNotifier.java:98) at org.eclipse.jetty.client.ResponseNotifier.notifyHeaders(ResponseNotifier.java:90) at org.eclipse.jetty.client.HttpReceiver.responseHeaders(HttpReceiver.java:268) at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.headerComplete(HttpReceiverOverHTTP.java:256) at org.eclipse.jetty.http.HttpParser.parseFields(HttpParser.java:1218) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1502) at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.parse(HttpReceiverOverHTTP.java:172) at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.process(HttpReceiverOverHTTP.java:135) at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.receive(HttpReceiverOverHTTP.java:73) at org.eclipse.jetty.client.http.HttpChannelOverHTTP.receive(HttpChannelOverHTTP.java:133) at org.eclipse.jetty.client.http.HttpConnectionOverHTTP.onFillable(HttpConnectionOverHTTP.java:155) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103) at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126) at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765) at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683) ... 1 more 报错重点 Remote page is too large 报错原因

可能是有个很大的row,或者很大的数据/JSON 导致的。body.length+content.length长度不能超过102400

参考:https://github.com/prestodb/presto/issues/5350

解决方案

修改/etc/config.properties中exchange.http-client.max-content-length 参数由默认的 32M 改为 128MB

exchange.http-client.max-content-length=128MB 闲话

在0.67版本中看上去这个bug 已经很大程度避免了,但是不知道为什么新版本(0.269)又有了。

参考: https://prestodb.io/docs/current/release/release-0.76.html


1.本站遵循行业规范,任何转载的稿件都会明确标注作者和来源;2.本站的原创文章,会注明原创字样,如未注明都非原创,如有侵权请联系删除!;3.作者投稿可能会经我们编辑修改或补充;4.本站不提供任何储存功能只提供收集或者投稿人的网盘链接。

标签: #Presto #查询报错 #问题一报错内容Error #running #Query #corrupted #statistics #for