Channel: Spark Streaming failing on YARN Cluster - Stack Overflow

Answer by Sidharth for Spark Streaming failing on YARN Cluster


I recently ran into the same issue. Here was my scenario:

A Cloudera-managed CDH 5.3.3 cluster with 7 nodes. I was submitting the job from one of the nodes, and it failed in both yarn-cluster and yarn-client modes with the same error.

If you look at the stack trace, you'll find these lines:

15/08/12 13:24:49 INFO Client: Source and destination file systems are the same. Not copying file:/home/hdfs/spark-1.4.1/external/kafka-assembly/target/spark-streaming-kafka-assembly_2.10-1.4.1.jar
15/08/12 13:24:49 INFO Client: Source and destination file systems are the same. Not copying file:/home/hdfs/spark-1.4.1/python/lib/pyspark.zip
15/08/12 13:24:49 INFO Client: Source and destination file systems are the same. Not copying file:/home/hdfs/spark-1.4.1/python/lib/py4j-0.8.2.1-src.zip
15/08/12 13:24:49 INFO Client: Source and destination file systems are the same. Not copying file:/home/hdfs/spark-1.4.1/examples/src/main/python/streaming/kyt.py

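This is why the job fails: the client decides that the local file system is also the destination, so the jar, the Python libraries and the script are never copied to the cluster, and the job cannot find them when it starts on YARN.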
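If you have saved the client output to a file, a quick way to confirm the same symptom is to scan it for that message. This is only a minimal sketch: the function name skipped_local_resources is made up for illustration, and it assumes each log entry is on its own line, as printed by spark-submit.

```python
import re

# Matches the client message quoted above and captures the URI that was skipped.
SKIP_PATTERN = re.compile(
    r"Source and destination file systems are the same\. Not copying (\S+)"
)


def skipped_local_resources(log_text):
    """Return the file: URIs that the client reported as not copied.

    Hypothetical helper, not part of Spark.
    """
    return [uri for uri in SKIP_PATTERN.findall(log_text) if uri.startswith("file:")]
```

If this returns a non-empty list of file: paths for a job submitted to YARN, the configuration on the submitting node is worth checking.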

In my case, the fix was to correct the HADOOP_CONF_DIR path. It wasn't pointing to the exact folder that contains core-site.xml, yarn-site.xml and the other configuration files. Once this was fixed, the resources were copied when the ApplicationMaster started and the job ran correctly.
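Before resubmitting, you can sanity-check the variable on the node you submit from. The script below is a rough sketch, not a Spark or Hadoop tool: the helper name missing_hadoop_conf_files is hypothetical, and it only checks for the two files named above (your cluster may need others, such as hdfs-site.xml).

```python
import os

# The two files named in this answer; extend the tuple if your setup needs more.
REQUIRED_FILES = ("core-site.xml", "yarn-site.xml")


def missing_hadoop_conf_files(conf_dir, required=REQUIRED_FILES):
    """Return the required file names that are not present directly in conf_dir.

    Hypothetical helper. An unset or non-existent directory counts as
    missing every required file.
    """
    if not conf_dir or not os.path.isdir(conf_dir):
        return list(required)
    return [
        name
        for name in required
        if not os.path.isfile(os.path.join(conf_dir, name))
    ]


if __name__ == "__main__":
    conf_dir = os.environ.get("HADOOP_CONF_DIR")
    missing = missing_hadoop_conf_files(conf_dir)
    if missing:
        print("HADOOP_CONF_DIR=%r is missing: %s" % (conf_dir, ", ".join(missing)))
    else:
        print("HADOOP_CONF_DIR=%r looks complete" % conf_dir)
```

Run it with the same user and environment that you use for spark-submit, since the variable may be set differently in other shells.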

