Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Patrick Wendell
Patch here: https://github.com/apache/spark/pull/609

On Wed, Apr 30, 2014 at 2:26 PM, Patrick Wendell wrote:
> Dean - our e-mails crossed, but thanks for the tip. Was independently
> arriving at your solution :)
>
> Okay I'll submit something.
>
> - Patrick
>
> On Wed, Apr 30, 2014 at 2:14 PM, Ma

Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Patrick Wendell
Dean - our e-mails crossed, but thanks for the tip. Was independently arriving at your solution :)

Okay I'll submit something.

- Patrick

On Wed, Apr 30, 2014 at 2:14 PM, Marcelo Vanzin wrote:
> Cool, that seems to work. Thanks!
>
> On Wed, Apr 30, 2014 at 2:09 PM, Patrick Wendell wrote:
>> Mar

Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Marcelo Vanzin
Cool, that seems to work. Thanks!

On Wed, Apr 30, 2014 at 2:09 PM, Patrick Wendell wrote:
> Marcelo - Mind trying the following diff locally? If it works I can
> send a patch:
>
> patrick@patrick-t430s:~/Documents/spark$ git diff bin/spark-submit
> diff --git a/bin/spark-submit b/bin/spark-submit

Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Patrick Wendell
Marcelo - Mind trying the following diff locally? If it works I can send a patch:

patrick@patrick-t430s:~/Documents/spark$ git diff bin/spark-submit
diff --git a/bin/spark-submit b/bin/spark-submit
index dd0d95d..49bc262 100755
--- a/bin/spark-submit
+++ b/bin/spark-submit
@@ -18,7 +18,7 @@
 # e
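[The archive preview truncates the diff body. Judging from the rest of the thread (the scalar ORIG_ARGS copy quoted further down, and Dean Wampler's array tip), the one-line change presumably switched to a quoted bash array. A hedged sketch of that kind of edit, not the actual patch:

ORIG_ARGS=("$@")    # was: ORIG_ARGS=$@ -- keep each argument as its own array element
"$SPARK_HOME"/bin/spark-class org.apache.spark.deploy.SparkSubmit "${ORIG_ARGS[@]}"
                    # was: ... $ORIG_ARGS -- an unquoted scalar re-splits on whitespace]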

Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Dean Wampler
Try this:

#!/bin/bash

for x in "$@"; do
  echo "arg: $x"
done

ARGS_COPY=("$@")   # Make ARGS_COPY an array with the array elements in $@

for x in "${ARGS_COPY[@]}"; do   # preserve array arguments.
  echo "arg_copy: $x"
done

On Wed, Apr 30, 2014 at 3:51 PM, Patrick Wendell wrot
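[For reference, and not part of the archived message: running Dean's version with the same arguments Patrick used in his reproduction below keeps the quoted argument intact in both loops:

./test.sh a b "c d e" f
arg: a
arg: b
arg: c d e
arg: f
arg_copy: a
arg_copy: b
arg_copy: c d e
arg_copy: f]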

Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Patrick Wendell
So I reproduced the problem here:

== test.sh ==
#!/bin/bash

for x in "$@"; do
  echo "arg: $x"
done

ARGS_COPY=$@

for x in "$ARGS_COPY"; do
  echo "arg_copy: $x"
done
==

./test.sh a b "c d e" f
arg: a
arg: b
arg: c d e
arg: f
arg_copy: a b c d e f

I'll dig around a bit more and see if we can fix
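[A note on why the copy flattens -- my gloss on standard bash semantics, not part of the thread. Assigning $@ to a plain variable joins all arguments into one space-separated string, after which the original word boundaries are unrecoverable:

#!/bin/bash
ARGS_COPY=$@               # joins a, b, "c d e", f into ONE string: 'a b c d e f'
echo "joined: $ARGS_COPY"  # the grouping of "c d e" is already gone here
for x in $ARGS_COPY; do    # unquoted: re-splits on every space, so "c d e"
  echo "resplit: $x"       # comes apart into three separate words
done]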

Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Marcelo Vanzin
On Wed, Apr 30, 2014 at 1:41 PM, Patrick Wendell wrote:
> Yeah I think the problem is that the spark-submit script doesn't pass
> the argument array to spark-class in the right way, so any quoted
> strings get flattened.
>
> I think we'll need to figure out how to do this correctly in the bash
> s

Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Patrick Wendell
Yeah I think the problem is that the spark-submit script doesn't pass the argument array to spark-class in the right way, so any quoted strings get flattened.

We do:

ORIG_ARGS=$@
$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit $ORIG_ARGS

This works:

// remove all the code relating

Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Marcelo Vanzin
Just pulled again just in case. Verified your fix is there.

$ ./bin/spark-submit --master yarn --deploy-mode client --driver-java-options "-Dfoo -Dbar" blah blah blah
error: Unrecognized option '-Dbar'.
run with --help for more information or --verbose for debugging output

On Wed, Apr 30, 2014 a

Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Patrick Wendell
I added a fix for this recently and it didn't require adding -J notation - are you trying it with this patch?
https://issues.apache.org/jira/browse/SPARK-1654

./bin/spark-shell --driver-java-options "-Dfoo=a -Dbar=b"

scala> sys.props.get("foo")
res0: Option[String] = Some(a)

scala> sys.props.get
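[The preview cuts off mid-session. Presumably -- my guess, not archive content -- the check continues with the second property:

scala> sys.props.get("bar")
res1: Option[String] = Some(b)]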

SparkSubmit and --driver-java-options

2014-04-30 Thread Marcelo Vanzin
Hello all,

Maybe my brain is not evolved enough to be able to trace through what happens with command-line arguments as they're parsed through all the shell scripts... but I really can't figure out how to pass more than a single JVM option on the command line. Unless someone has an obvious workar
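[For context, reconstructed from the replies above rather than from this truncated message: the goal is to pass several JVM options inside one quoted --driver-java-options value, along these lines (the application jar name here is hypothetical):

./bin/spark-submit --master yarn --deploy-mode client \
  --driver-java-options "-Dfoo -Dbar" my-app.jar

Before the quoting fix discussed above, the quoted value was flattened on the way to spark-class, so spark-submit saw -Dbar as a separate argument and rejected it as an unrecognized option.]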

Fwd: Spark RDD cache memory usage

2014-04-30 Thread Han JU
Hi,

As I understand, by default in Spark a fraction of the executor memory (60%) is reserved for RDD caching. So if there's no explicit caching in the code (e.g. rdd.cache() etc.), or if we persist RDDs with StorageLevel.DISK_ONLY, is this part of memory wasted? Does Spark allocate the RDD cache me
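[The fraction in question is controlled by a config property. A hedged sketch, assuming Spark 1.x property names and not taken from the message: a job that never caches can shrink the fraction so more of the executor heap goes to task execution:

# conf/spark-defaults.conf
spark.storage.memoryFraction   0.1    # default is 0.6]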