
Pitfalls hit with Spark 3.0.0 + Hadoop 3.1.3 + Hive 3.1.2 (by 丶亦尘)


A few days ago the company cluster jumped straight from Hadoop 2.7 + Spark 2.4 + Hive 2.x + Scala 2.11 to Spark 3.0.0 + Hadoop 3.1.3 + Hive 3.1.2 + Scala 2.12, so the project had to be upgraded along with it, and a pile of problems promptly showed up. This post records a few of them and how they were resolved.

1. Packaging with mvn package fails inside df.foreachPartition with: Error:(83, 19) value hasNext is not a member of Object. Resolved as follows.


Fix: the compiler claims there is no hasNext method, yet Ctrl + clicking in IDEA resolves it just fine, so the problem is at build time. After some Googling it turned out others had hit the same thing; Scala 2.12 appears to be the culprit, and the call has to be rewritten as df.rdd.foreachPartition (a sketch is below). With that change, on to the next issue.

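A minimal sketch of the failing call and the workaround, assuming an illustrative DataFrame (the object name, column name, and body are made up, not from the original project):

import org.apache.spark.sql.{Row, SparkSession}

object ForeachPartitionDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("foreachPartition-demo").getOrCreate()
    import spark.implicits._
    val df = Seq(1, 2, 3).toDF("id")

    // With Scala 2.12 the untyped closure can be matched against the Java
    // ForeachPartitionFunction overload, so its parameter is seen as Object and
    // the compiler reports "value hasNext is not a member of Object":
    // df.foreachPartition { it => while (it.hasNext) println(it.next()) }

    // Workaround from the post: go through the RDD API, where the parameter is
    // unambiguously an Iterator[Row].
    df.rdd.foreachPartition { (it: Iterator[Row]) =>
      it.foreach(row => println(row.getInt(0)))
    }

    spark.stop()
  }
}

Explicitly annotating the closure parameter type on the Dataset's foreachPartition is often reported as another way around the overload ambiguity.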

2. Spark consuming from Kafka fails with org.apache.spark.sql.kafka010.consumer.InternalKafkaConsumerPool$PoolConfig.setEvictionPolicy:

java.lang.NoSuchMethodError: org.apache.spark.sql.kafka010.consumer.InternalKafkaConsumerPool$PoolConfig.setEvictionPolicy(Lorg/apache/commons/pool2/impl/EvictionPolicy;)V
    at org.apache.spark.sql.kafka010.consumer.InternalKafkaConsumerPool$PoolConfig.init(InternalKafkaConsumerPool.scala:191)
    at org.apache.spark.sql.kafka010.consumer.InternalKafkaConsumerPool$PoolConfig.<init>(InternalKafkaConsumerPool.scala:162)
    at org.apache.spark.sql.kafka010.consumer.InternalKafkaConsumerPool.<init>(InternalKafkaConsumerPool.scala:53)
    at org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer$.<init>(KafkaDataConsumer.scala:606)
    at org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer$.<clinit>(KafkaDataConsumer.scala)
    at org.apache.spark.sql.kafka010.KafkaBatchPartitionReader.<init>(KafkaBatchPartitionReader.scala:52)
    at org.apache.spark.sql.kafka010.KafkaBatchReaderFactory$.createReader(KafkaBatchPartitionReader.scala:40)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD.compute(DataSourceRDD.scala:60)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:127)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Looking at the source, InternalKafkaConsumerPool$PoolConfig really does not declare setEvictionPolicy itself. Since this class was contributed by Spark committers and is certainly tested, the method must come from a parent class, which turns out to be org.apache.commons.pool2.impl.GenericKeyedObjectPoolConfig. A check against the Maven repository showed that Spark officially builds against commons-pool2 2.6.2, while my build was resolving commons-pool2 2.3, which lacks that method. So the pom needs to exclude the transitive commons-pool2 and pin a newer version:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql-kafka-0-10_${scala.version}</artifactId>
    <version>${spark.version}</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-pool2</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.commons/commons-pool2 -->
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-pool2</artifactId>
    <version>2.8.0</version>
</dependency>
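To confirm which commons-pool2 version actually ends up on the classpath after this change, the Maven dependency tree can be inspected, for example:

mvn dependency:tree -Dincludes=org.apache.commons:commons-pool2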

3. Submitting my own object to the cluster fails with: object not serializable. Hard to swallow, and surely the fault of the Spark 3.0 upgrade; I grudgingly added a companion class and made it implement the Serializable interface (a sketch of the pattern is below).

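A minimal sketch of the pattern described above, with purely illustrative names (EventProcessor, process); the point is that the task closure captures an instance of a companion class that extends Serializable instead of referencing the non-serializable object directly:

import org.apache.spark.sql.SparkSession

// Companion class carrying the per-record logic; it extends Serializable so
// that instances can be shipped inside task closures.
class EventProcessor extends Serializable {
  def process(value: Long): Long = value * 2
}

object EventProcessor {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("serializable-demo").getOrCreate()
    val processor = new EventProcessor // serializable instance captured by the closure

    val doubled = spark.sparkContext
      .parallelize(1L to 10L)
      .map(processor.process)
      .collect()

    println(doubled.mkString(","))
    spark.stop()
  }
}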

Finally, the reference found via Google: https://github.com/leongu-tc/myspark/issues/30


