Hi,
Affects Version/s: 1.6.0
Component/s: PySpark
I hit the exception below when I tried to run the samples from
http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=filter#pyspark.sql.SQLContext.jsonRDD
Exception: Python in worker has different version 2.7 than that in driver 3.5,
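This error means the Python interpreter launched for the workers (2.7) is not the same one running the driver (3.5). A common remedy is to point both at the same interpreter before the SparkContext is created. A minimal sketch, assuming a hypothetical interpreter path (substitute the one from your installation):

```python
import os

# Make workers and driver use the same Python.
# "/usr/bin/python3.5" is an assumed path; adjust to your install.
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3.5"
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/bin/python3.5"

# These must be set before the SparkContext starts, e.g.:
# from pyspark import SparkContext
# sc = SparkContext(appName="jsonRDD-sample")
```

The same variables can also be exported in the shell or in conf/spark-env.sh before launching pyspark or spark-submit.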
Have you built the Spark jars? Can you run the Spark Scala shell?
--Hossein
On Tuesday, October 6, 2015, Khandeshi, Ami <ami.khande...@fmr.com.invalid>
wrote:
> > Sys.setenv(SPARKR_SUBMIT_ARGS="--verbose sparkr-shell")
> > Sys.setenv(SPARK_PRINT_LAUNCH_COMMAND=
Why not let SparkSQL deal with parallelism? When you use SparkSQL data
sources, you can control parallelism by setting mapred.min.split.size
and mapred.max.split.size in your Hadoop configuration. You can then
repartition your data however you wish and save it as Parquet.
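As a concrete illustration of the advice above, the split-size bounds could be set on the Hadoop configuration before reading, and the result repartitioned and written as Parquet. Since no live cluster is assumed here, the sketch only builds the configuration values (the 64 MB/256 MB bounds, the partition count, and the paths are illustrative assumptions); the commented lines show where they would be applied in PySpark:

```python
# Illustrative split-size bounds; the values are assumptions, not defaults.
MB = 1024 * 1024
split_conf = {
    "mapred.min.split.size": str(64 * MB),   # lower bound per input split
    "mapred.max.split.size": str(256 * MB),  # upper bound per input split
}

# With a live SparkContext `sc` and SQLContext `sqlContext`, this would be
# applied roughly as follows (paths and partition count are hypothetical):
# for key, value in split_conf.items():
#     sc._jsc.hadoopConfiguration().set(key, value)
# df = sqlContext.read.json("hdfs:///data/input.json")
# df.repartition(16).write.parquet("hdfs:///data/output.parquet")
```

Fewer, larger splits mean fewer input partitions; repartition() then gives explicit control over the layout of the Parquet output.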
--Hossein
On Thu, May 28