Sorry for posting without complete information. I am connecting to the Spark cluster with the driver program running as the backend of a web application; it is intended to listen for job progress and handle some other work. Below is how I am connecting to the cluster:
sparkConf = new SparkConf().setAppName("isolated test")
        .setMaster("spark://master:7077")
        .set("spark.executor.memory", "6g")
        .set("spark.driver.memory", "6g")
        .set("spark.driver.maxResultSize", "2g")
        .set("spark.executor.extraJavaOptions", "-Xmx8g")
        .set("spark.jars.packages", "graphframes:graphframes:0.5.0-spark2.1-s_2.11")
        .set("spark.jars", "/home/usr/jobs.jar"); // this is a shared location on the Linux machines and holds the required Java classes

The crash occurs at

    gFrame.connectedComponents().setBroadcastThreshold(2).run();

with the exception

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 5, 10.112.29.80): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2024)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:71)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

After googling around, this appears to be related to dependencies, but I don't have many dependencies apart from a few POJOs, which have been included through the context.

regards,
Imran

On Wed, Sep 20, 2017 at 9:00 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:

> Could you include the code where it fails?
> Generally the best way to use gf is to use the --packages options with
> spark-submit command
>
> ------------------------------
> *From:* Imran Rajjad <raj...@gmail.com>
> *Sent:* Wednesday, September 20, 2017 5:47:27 AM
> *To:* user @spark
> *Subject:* graphframes on cluster
>
> Trying to run graph frames on a spark cluster. Do I need to include the
> package in the spark context settings? Or is only the driver program
> supposed to have the graphframes libraries in its class path? Currently
> the job is crashing when an action function is invoked on graphframe
> classes.
>
> regards,
> Imran
>
> --
> I.R

--
I.R
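
For reference, a rough sketch of the spark-submit route Felix mentions might look like the command below. The master URL, graphframes coordinate, and jar path are taken from the config above; the main class name is only a placeholder, not something confirmed in the thread:

    # assumption: com.example.WebAppDriver stands in for the actual driver main class
    spark-submit \
      --master spark://master:7077 \
      --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11 \
      --class com.example.WebAppDriver \
      /home/usr/jobs.jar

With --packages, spark-submit resolves the graphframes artifact and its dependencies from the configured repositories and puts them on both the driver and executor classpaths, which is what the spark.jars.packages setting in the config above is meant to do programmatically.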