Tuesday, September 13, 2016

[Big Data - SPARK] Fixing environment problems when running spark-submit jobs from the client (Spark 1.6)


Hi everyone,

Today I am publishing my first post about Apache Spark.

The problem: when we run a Spark job from a client machine or an edge node, we get "jars not found" or classpath errors, even if we build an uber JAR.

In this post, I will explain how to set up the client environment so that Spark jobs can be submitted successfully, and show how to debug a Java process in detail.

Spark version: 1.6, CDH 5.7.1



Let's first talk about our environment.

In our environment, developers work on an edge node configured for the BDA clusters and develop their Spark code there. A user submits a job in client mode, and it is sent to the Hadoop cluster like in the following example:

$spark-submit --master yarn --deploy-mode client --driver-memory 2g --executor-memory 6g --executor-cores 2 --keytab /tmp/ONEKEYTAB --principal ONEPRINCIPAL /tmp/OURSPARKJAR HDFSPATH

On the edge node, we installed the Spark binaries and applied the spark-client configuration.
You can see my blog post about that here.

Before the upgrade to Spark 1.6, everything worked fine with this configuration.
After the upgrade, problems related to classpath, jars not found, etc. started occurring.

The Spark client configuration is below:

$cat spark-defaults.conf
spark.eventLog.dir=hdfs://CLUSTERNAME/user/spark/applicationHistory
spark.eventLog.enabled=true
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled=true
spark.shuffle.service.port=7337
spark.yarn.historyServer.address=http://SPARKHISTORYSERVER:18088
spark.master=yarn-client
spark.yarn.jar=local:/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/spark/assembly/lib/spark-assembly-1.6.0-cdh5.7.1-hadoop2.6.0-cdh5.7.1.jar
spark.driver.extraLibraryPath=/usr/lib/hadoop/lib/native
spark.executor.extraLibraryPath=/usr/lib/hadoop/lib/native
spark.driver.allowMultipleContexts=true


$which spark-submit
/usr/bin/spark-submit

After the upgrade to 1.6, we saw the following errors in the Spark job logs:

Job aborted due to stage failure: Task 0 in stage 2674.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2674.0 (TID 8603, 10.230.19.116): java.lang.RuntimeException: Stream '/jars/OURSPARKJAR' was not found.

Or "Kafka (and other) jars not found" errors.

So here are our findings.

1 - After the upgrade to Spark 1.6, there is a change in spark-env.sh: the jars required by Spark are now hard-coded in classpath.txt.

$cd /etc/spark/conf
$cat spark-env.sh
..
# Set distribution classpath. This is only used in CDH 5.3 and later.
export SPARK_DIST_CLASSPATH=$(paste -sd: "$SELF/classpath.txt")
..
$cat classpath.txt
...
/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/jars/xmlenc-0.52.jar
/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/jars/xz-1.0.jar
/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/jars/zkclient-0.7.jar
/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/jars/zookeeper-3.4.5-cdh5.7.1.jar
/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hadoop/LICENSE.txt
/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hadoop/NOTICE.txt

So our first action was to copy all the jars from the Spark cluster hosts to our edge node and to adjust the spark-env.sh file on the edge node, roughly as sketched below.
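A rough sketch of that first attempt (CLUSTERHOST is a placeholder, and the real classpath.txt also lists a few non-jar files such as LICENSE.txt, so this is a simplification):

# Copy the CDH parcel jars from a cluster host to the same path on the edge node
$rsync -a CLUSTERHOST:/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/jars/ /opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/jars/
# Regenerate classpath.txt so SPARK_DIST_CLASSPATH points at the local copies
$ls /opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/jars/*.jar > /etc/spark/conf/classpath.txt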

But even after that, our problems were not resolved.

2 - Our next action was to debug the Spark jobs and see what was going on.

For that, you can use Java debugging tools such as jcmd and jvisualvm.




With jvisualvm you can inspect the details of the JVM of the process.

The second alternative is to use jcmd; note that you must be the owner of the process!

With jcmd you can print the running environment of the process as follows:

$jcmd 67746 VM.system_properties

...
spark.executor.extraLibraryPath=/usr/lib/hadoop/lib/native
...
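If you do not already know the PID of the driver JVM, you can list the local JVMs first (the PID 67746 above is simply whatever jps/jcmd reports on your host):

# List running JVMs with their main class and arguments
$jps -lm
# jcmd with no arguments also lists local JVM PIDs
$jcmd
# Then dump only the spark.* properties of the chosen process
$jcmd 67746 VM.system_properties | grep spark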

You can also see this information for a specific Spark job in the Spark History Server GUI.



SOLUTION

CAUSE > Here we see that our Spark job runs on the server with the wrong environment values!

As of Spark 1.6, the Spark job keeps the classpath values from the originating host. For that reason, we hit classpath and "jar not found" errors.

After finding this, we simply created soft links matching the real paths used on the Spark server hosts, and everything was solved.
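A minimal sketch of that fix (assuming Spark on the edge node lives under /usr/lib/spark; both paths are illustrative assumptions, so adjust them to your actual layout):

# Make the cluster's parcel path resolve on the edge node by linking it
# to the local Spark installation
$sudo mkdir -p /opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib
$sudo ln -s /usr/lib/spark /opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/spark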

I also want to mention that after the upgrade to CDH 5.7, you can no longer download the spark-client configuration via Cloudera Manager, so you have to reconfigure all your clients manually.

BONUS > You should set your HADOOP_CONF_DIR variable to include HIVE_CONF_DIR and HBASE_CONF_DIR if your Spark job uses HBase:

export HADOOP_CONF_DIR=$HADOOP_CONF_DIR:$HIVE_CONF_DIR:$HBASE_CONF_DIR

PS: Thanks to ANIL CHALIL 

OK, that is all.

Thanks for reading.

Enjoy & share.

Source:
http://blog.cloudera.com/blog/category/spark/
support.oracle.com











