Python in worker has different version 2.7 than that in driver 3.5, PySpark cannot run with different minor versions

2016-02-28 Thread Hossein Vatani
Hi, Affects Version/s: 1.6.0; Component/s: PySpark. I hit the exception below when I tried to run the samples at http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=filter#pyspark.sql.SQLContext.jsonRDD: Exception: Python in worker has different version 2.7 than that in driver 3.5,
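
A minimal sketch of the usual fix (not from the thread itself): point both the driver and the executors at the same interpreter, via environment variables, before the SparkContext starts. The /usr/bin/python3 path and app name are assumptions; the interpreter must exist at that path on every worker node.

    # Hypothetical fix sketch: force driver and executors onto one interpreter.
    import os

    os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3"         # executors
    os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/bin/python3"  # driver

    from pyspark import SparkContext

    sc = SparkContext(appName="version-check")
    print(sc.pythonVer)  # driver's Python version; must now match the workers'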

Re: SparkR Error in sparkR.init(master="local") in RStudio

2015-10-06 Thread Hossein
Have you built the Spark jars? Can you run the Spark Scala shell? --Hossein On Tuesday, October 6, 2015, Khandeshi, Ami <ami.khande...@fmr.com.invalid> wrote: > > Sys.setenv(SPARKR_SUBMIT_ARGS="--verbose sparkr-shell") > > Sys.setenv(SPARK_PRINT_LAUNCH_COMMAND=

Re: Loading CSV to DataFrame and saving it into Parquet for speedup

2015-06-05 Thread Hossein
Why not let Spark SQL handle the parallelism? When using Spark SQL data sources you can control parallelism by setting mapred.min.split.size and mapred.max.split.size in your Hadoop configuration. You can then repartition the data as you wish and save it as Parquet. --Hossein On Thu, May 28
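
A hedged PySpark sketch of the approach Hossein describes, under these assumptions: Spark 1.x with the external com.databricks:spark-csv package on the classpath, and illustrative paths, split sizes, and partition count.

    # Sketch only: read CSV through a Spark SQL data source, control read
    # parallelism via Hadoop input-split sizes, then repartition and write Parquet.
    from pyspark import SparkConf, SparkContext
    from pyspark.sql import SQLContext

    conf = SparkConf().setAppName("csv-to-parquet")
    # spark.hadoop.* keys are copied into the Hadoop Configuration, which is
    # where mapred.min/max.split.size determine the number of input splits.
    conf.set("spark.hadoop.mapred.min.split.size", str(32 * 1024 * 1024))
    conf.set("spark.hadoop.mapred.max.split.size", str(64 * 1024 * 1024))
    sc = SparkContext(conf=conf)
    sqlContext = SQLContext(sc)

    # CSV data source (spark-csv in Spark 1.x); the input path is hypothetical.
    df = (sqlContext.read.format("com.databricks.spark.csv")
          .option("header", "true")
          .option("inferSchema", "true")
          .load("hdfs:///data/input.csv"))

    # Repartition to the desired write parallelism, then save as Parquet.
    df.repartition(16).write.parquet("hdfs:///data/output.parquet")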