Sorry for posting without complete information. I am connecting to the Spark cluster with the driver program running as the backend of a web application; it is intended to listen for job progress and handle some other work. Below is how I am connecting to the cluster:
sparkConf = new SparkConf().setAppName("isolated test")
        .setMaster("spark://master:7077")
        .set("spark.executor.memory", "6g")
        .set("spark.driver.memory", "6g")
        .set("spark.driver.maxResultSize", "2g")
        .set("spark.executor.extraJavaOptions", "-Xmx8g")
        .set("spark.jars.packages", "graphframes:graphframes:0.5.0-spark2.1-s_2.11")
        .set("spark.jars", "/home/usr/jobs.jar"); // this is a shared location on the Linux machines and holds the required Java classes

The crash occurs at

    gFrame.connectedComponents().setBroadcastThreshold(2).run();

with the exception

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 5, 10.112.29.80): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2024)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:71)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

After googling around, this appears to be related to dependencies, but I don't have many dependencies apart from a few POJOs, which have been included through the context.

regards,
Imran

On Wed, Sep 20, 2017 at 9:00 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:

> Could you include the code where it fails?
> Generally the best way to use gf is to use the --packages options with
> spark-submit command
>
> ------------------------------
> *From:* Imran Rajjad <raj...@gmail.com>
> *Sent:* Wednesday, September 20, 2017 5:47:27 AM
> *To:* user @spark
> *Subject:* graphframes on cluster
>
> Trying to run graph frames on a spark cluster. Do I need to include the
> package in the spark context settings? Or is only the driver program
> supposed to have the graphframes libraries in its class path? Currently
> the job is crashing when an action function is invoked on graphframe
> classes.
>
> regards,
> Imran
>
> --
> I.R

--
I.R
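
For reference, a rough sketch of the spark-submit route Felix mentions might look like the command below. The master URL, graphframes coordinate, and jar path are taken from the config above; the main class name is only a placeholder, not something confirmed in the thread:

    # assumption: com.example.WebAppDriver stands in for the actual driver main class
    spark-submit \
      --master spark://master:7077 \
      --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11 \
      --class com.example.WebAppDriver \
      /home/usr/jobs.jar

With --packages, spark-submit resolves the graphframes artifact and its dependencies from the configured repositories and puts them on both the driver and executor classpaths, which is what the spark.jars.packages setting in the config above is meant to do programmatically.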