The following sample code works for me. Could you share your code?

    from pandas import DataFrame

    df = DataFrame([1, 2, 3])
    df_b = sc.broadcast(df)

    def f(a):
        print(df_b.value)
    sc.parallelize(range(1, 10)).foreach(f)

On Sat, May 14, 2016 at 12:59 AM, abi <analyst.tech.j...@gmail.com> wrote:
> The pandas DataFrame is broadcast successfully, but it gives errors in the
> datanode function called kernel.
>
> Code:
>
>     dataframe_broadcast = sc.broadcast(dataframe)
>
>     def kernel():
>         df_v = dataframe_broadcast.value
>
> Error:
>
> I get this error when I try to access the value member of the broadcast
> variable. Apparently it does not have a value, so it tries to load it from
> the file again.
>
>     File "C:\spark-1.6.1-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\broadcast.py",
>       line 97, in value
>       self._value = self.load(self._path)
>     File "C:\spark-1.6.1-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\broadcast.py",
>       line 88, in load
>       return pickle.load(f)
>     ImportError: No module named indexes.base
>
>     at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
>     at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
>     at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
>     at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/pandas-dataframe-broadcasted-giving-errors-in-datanode-function-called-kernel-tp26953.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
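For context, a minimal sketch (not from the thread, and runnable without Spark) of what is happening under the hood: sc.broadcast pickles the object on the driver, and .value unpickles it on each executor. The pickle stream embeds pandas-internal module paths, so the unpickling side needs a compatible pandas installation; an ImportError such as "No module named indexes.base" during pickle.load is the kind of failure you would see if the executors' pandas cannot resolve a module path written by the driver's pandas.

```python
import pickle

import pandas as pd

df = pd.DataFrame([1, 2, 3])

# Roughly what sc.broadcast(df) does on the driver: serialize the object.
payload = pickle.dumps(df)

# Roughly what dataframe_broadcast.value does on an executor: deserialize it.
# This step imports pandas-internal modules by the paths stored in `payload`,
# which is where the "No module named indexes.base" ImportError is raised
# if the worker's pandas differs from the driver's.
restored = pickle.loads(payload)

assert restored.equals(df)
```

This is only an illustration of the serialization round-trip; in the actual cluster the dumps and loads halves run in different Python processes, possibly on different machines.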
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>

--
Best Regards

Jeff Zhang