I want to add that there is a regression when using PySpark to read data from HDFS: performance during map tasks has dropped to roughly half (approx. 1x -> 0.5x). I tested 1.0.2 and its performance was fine, but the 1.1 release candidate has this issue. To make sure the regression was not caused by the following properties, I tested with them set explicitly:

    set("spark.io.compression.codec", "lzf").set("spark.shuffle.spill", "false")

These were set on the conf object. Let me know if you need further information.

Regards,
Gurvinder

On 09/04/2014 07:47 AM, Denny Lee wrote:
> When I start the thrift server (on Spark 1.1 RC4) via:
> ./sbin/start-thriftserver.sh --master spark://hostname:7077
> --driver-class-path $CLASSPATH
>
> It appears that the thrift server is starting off of localhost as
> opposed to hostname. I have set the spark-env.sh to use the hostname,
> modified the /etc/hosts for the hostname, and it appears to work properly.
>
> But when I start the thrift server, connectivity can only be via
> localhost:10000 as opposed to hostname:10000.
>
> Any ideas on what configurations I may be setting incorrectly here?
>
> Thanks!
> Denny
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
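
P.S. For anyone who wants to reproduce the comparison, here is a minimal sketch of how the two properties mentioned at the top of this message could be applied and a read timed. It is not the exact script I ran: the helper names (`build_conf`, `time_read`, `run`), the app name, the use of `textFile(...).count()` as the workload, and any HDFS path you pass in are assumptions for illustration only.

```python
import time

# The two properties quoted in this message. Spark expects string values.
BASELINE_PROPS = {
    "spark.io.compression.codec": "lzf",
    "spark.shuffle.spill": "false",
}


def build_conf(props, app_name="hdfs-read-check"):
    """Return a SparkConf with the given properties applied.

    pyspark is imported lazily so this module can be loaded without Spark.
    The app name is a hypothetical placeholder.
    """
    from pyspark import SparkConf

    conf = SparkConf().setAppName(app_name)
    for key, value in props.items():
        conf.set(key, value)
    return conf


def time_read(sc, path):
    """Read `path` as text, count the records, and return (count, seconds).

    count() forces the map tasks to run, so the elapsed time covers the read.
    """
    start = time.time()
    count = sc.textFile(path).count()
    return count, time.time() - start


def run(path):
    """Create a context with BASELINE_PROPS, time one read, then stop it."""
    from pyspark import SparkContext

    sc = SparkContext(conf=build_conf(BASELINE_PROPS))
    try:
        count, seconds = time_read(sc, path)
        print("read %d records in %.1f s" % (count, seconds))
    finally:
        sc.stop()
```

Calling `run(...)` with the same input path on 1.0.2 and on the 1.1 release candidate, on the same cluster, should give two timings that can be compared directly.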