Re: Spark processes not doing on killing corresponding YARN application

2014-09-09 Thread didata
parkAppPort//tcp or, better yet, use a port-deterministic strategy mentioned earlier. (Hopefully the verbosity here will help someone in their future search. Fedora aside, the original problem here can be network related, as I discovered.) Sincerely, didata

Q: About scenarios where driver execution flow may block...

2014-09-06 Thread didata
Hello friends: I have a theory question about call blocking in a Spark driver. Consider this (admittedly contrived =:)) snippet to illustrate the question... x = rdd01.reduceByKey() # or maybe some other 'shuffle-requiring action'. b = sc.broadcast(x.take(20)) # Or any statement that r

Re: Spark processes not doing on killing corresponding YARN application

2014-09-04 Thread didata
Thanks for asking this. I've had this issue with pyspark too on YARN 100% of the time: I quit out of pyspark and, while my Unix shell prompt returns, a 'yarn application -list' always shows (as does the UI) that the application is still running (or at least not totally dead). When I then log onto

This always tries to connect to HDFS: user$ export MASTER=local[NN]; pyspark --master local[NN] ...

2014-08-31 Thread didata
); export MASTER=local[NN]; pyspark --master local[NN]* Without temporarily moving the Hadoop/YARN configuration directory, how do I dynamically instruct pyspark on the CLI to not use HDFS? (i.e. without hard-coding it anywhere, such as in */etc/spark/spark-env.sh*) Thank you in advance! didata staff

Re: Using pyspark shell in local[n] (single machine) mode unnecessarily tries to connect to HDFS NameNode ...

2014-04-10 Thread DiData
ries on and not compiling from source... Is there a reason why you aren't just using the binaries? On Thu, Apr 10, 2014 at 1:30 PM, DiData wrote: Hello friends: I recently compiled and installed Spark v0.9 from the Apache distribution. Note: I have the Cloudera/CDH5 Spark RPMs co-installe

Using pyspark shell in local[n] (single machine) mode unnecessarily tries to connect to HDFS NameNode ...

2014-04-10 Thread DiData
namenode:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused* [ ... snip ... ] >>> >>> # === -- Sincerely, DiData