> the right way to reach JVM in python

Can you tell us more about what you want to achieve?
If you want to pass some value to the workers, you can use a broadcast
variable.

Cheers

On Mon, Sep 28, 2015 at 10:31 PM, YiZhi Liu <javeli...@gmail.com> wrote:

> Hi Ted,
>
> Thank you for the reply. The sc works at the driver, but how can I reach
> the JVM in rdd.map?
>
> 2015-09-29 11:26 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
>
>> >>> sc._jvm.java.lang.Integer.valueOf("12")
>> 12
>>
>> FYI
>>
>> On Mon, Sep 28, 2015 at 8:08 PM, YiZhi Liu <javeli...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I'm doing some data processing on pyspark, but I failed to reach the
>>> JVM from the workers. Here is what I did:
>>>
>>> $ bin/pyspark
>>> >>> data = sc.parallelize(["123", "234"])
>>> >>> numbers = data.map(lambda s:
>>> ...     SparkContext._active_spark_context._jvm.java.lang.Integer.valueOf(s.strip()))
>>> >>> numbers.collect()
>>>
>>> I got:
>>>
>>> Caused by: org.apache.spark.api.python.PythonException: Traceback
>>> (most recent call last):
>>>   File "/mnt/hgfs/lewis/Workspace/source-codes/spark/python/lib/pyspark.zip/pyspark/worker.py",
>>>     line 111, in main
>>>     process()
>>>   File "/mnt/hgfs/lewis/Workspace/source-codes/spark/python/lib/pyspark.zip/pyspark/worker.py",
>>>     line 106, in process
>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>>   File "/mnt/hgfs/lewis/Workspace/source-codes/spark/python/lib/pyspark.zip/pyspark/serializers.py",
>>>     line 263, in dump_stream
>>>     vs = list(itertools.islice(iterator, batch))
>>>   File "<stdin>", line 1, in <lambda>
>>> AttributeError: 'NoneType' object has no attribute '_jvm'
>>>
>>>   at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:138)
>>>   at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:179)
>>>   at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:97)
>>>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>>>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>>>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>>   at org.apache.spark.scheduler.Task.run(Task.scala:88)
>>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>   ... 1 more
>>>
>>> Meanwhile, _jvm at the driver end looks fine:
>>>
>>> >>> SparkContext._active_spark_context._jvm.java.lang.Integer.valueOf("123".strip())
>>> 123
>>>
>>> The program is trivial; I just wonder what the right way is to reach
>>> the JVM in Python. Any help would be appreciated.
>>>
>>> Thanks
>>>
>>> --
>>> Yizhi Liu
>>> Senior Software Engineer / Data Mining
>>> www.mvad.com, Shanghai, China
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>
>
> --
> Yizhi Liu
> Senior Software Engineer / Data Mining
> www.mvad.com, Shanghai, China
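[Editor's note] The failure reported in this thread is expected behavior: `SparkContext._active_spark_context` is `None` inside the Python worker processes, because the Py4J gateway that backs `_jvm` exists only in the driver process, so JVM objects cannot be reached from inside `rdd.map`. Below is a minimal sketch of the two workarounds discussed above: shipping a pure-Python map function to the workers, and broadcasting driver-side values. The Spark-dependent calls are shown in comments and assume a live `SparkContext` named `sc`; everything else runs as plain Python.

```python
# The function shipped to workers must be pure Python (picklable) and must
# not hold JVM handles; the built-in int() covers what
# java.lang.Integer.valueOf did in the driver-side example.
def to_int(s):
    return int(s.strip())

# Locally, this is what each worker effectively does to its partition:
print([to_int(s) for s in ["123", "234"]])  # prints [123, 234]

# With a live SparkContext `sc` (not created here), the same logic becomes:
#   numbers = sc.parallelize(["123", "234"]).map(to_int)
#   numbers.collect()  # [123, 234]
#
# If the workers need a value computed on the driver, broadcast it rather
# than reaching back into the driver's JVM (hypothetical example):
#   offset = sc.broadcast(1000)
#   shifted = sc.parallelize(["123", "234"]).map(
#       lambda s: int(s.strip()) + offset.value)
#   shifted.collect()  # [1123, 1234]
```

The design point is that Spark serializes the map function itself and sends it to the workers; as long as that closure references only Python objects (or `.value` of a broadcast), no per-worker JVM access is needed.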