[GitHub] spark pull request: SPARK-2624 add datanucleus jars to the container in yarn-cluster

2014-12-06 jimjh
Github user jimjh commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-65913114 Yea I should have been more careful. I agree that we should figure out a proper solution.

2014-12-04 andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-65751457 Hey @jimjh I reverted this patch after all. It seems risky to push this into 1.2 this late in the release cycle. Let's try to figure out a correct solution for 1.3.

2014-12-02 andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-65360604 @jimjh Also did you have a chance to test this on a real yarn cluster?

2014-12-02 andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-65360572 I just had a discussion with @marmbrus and I think these jars are actually pretty important. For this reason I think it would be best if we pull this into 1.2. The exi…

2014-12-01 vanzin
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-65141491 I thought about making this a generic "add all the jars in this directory to the dist cache and to the app's classpath". This would make sense for regular application depe…
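The generic idea floated here (ship every jar in a directory and put each one on the application's classpath) might look roughly like the sketch below. It is an illustration only, not code from the PR: `ExtraJarsClasspath`, `jarsIn`, and `containerEntries` are hypothetical names, and it assumes YARN localizes distributed-cache files into the container's working directory under their own file names, referenced here as `$PWD`.

```scala
import java.io.File

// Hypothetical sketch of the "all jars in a directory" idea; not code from the PR.
object ExtraJarsClasspath {
  /** Every regular file ending in .jar directly inside `dir`, sorted by name. */
  def jarsIn(dir: File): Seq[File] =
    Option(dir.listFiles()) // listFiles() returns null if `dir` is not a directory
      .map(_.toSeq)
      .getOrElse(Seq.empty)
      .filter(f => f.isFile && f.getName.endsWith(".jar"))
      .sortBy(_.getName)

  /**
   * Classpath entries for the jars once localized, assuming each one is
   * linked into the container's working directory under its own file name.
   * "$PWD" is a literal string here (no interpolation); expanding it is
   * assumed to be left to the container's launch environment.
   */
  def containerEntries(jars: Seq[File]): Seq[String] =
    jars.map(jar => "$PWD/" + jar.getName)
}
```

Joining the result with `mkString(":")` would give a single classpath string; the colon separator assumes Linux containers.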

2014-12-01 tgravescs
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-65141005 @andrewor14 see my comments from 14 days ago about generalizing. If this isn't going into 1.2 I would much rather go back to investigating just having an Uber jar.

2014-12-01 andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-65140058 Hey @jimjh the overall approach makes sense. However, I think having a config that is explicitly only for datanucleus jars is a little too specific. Is it possible to…

2014-12-01 andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3238#discussion_r21121780 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -551,6 +584,13 @@ private[spark] object ClientBase extends Loggin…

2014-12-01 andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3238#discussion_r21121693 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -584,6 +624,19 @@ private[spark] object ClientBase extends Loggin…

2014-12-01 andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3238#discussion_r21121480 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -584,6 +624,19 @@ private[spark] object ClientBase extends Loggin…

2014-12-01 andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3238#discussion_r21121349 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -223,10 +224,42 @@ private[spark] trait ClientBase extends Loggin…

2014-12-01 andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3238#discussion_r21121229 --- Diff: docs/running-on-yarn.md --- @@ -132,6 +132,18 @@ Most of the configs are the same for Spark on YARN as for other deployment modes The ma…

2014-12-01 andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3238#discussion_r21121180 --- Diff: docs/running-on-yarn.md --- @@ -132,6 +132,18 @@ Most of the configs are the same for Spark on YARN as for other deployment modes The ma…

2014-11-29 andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-64972493 Hey @jimjh I'll look at this later today. Just a heads up, we are in the middle of the 1.2 release right now, and depending on how much behavior this PR adds we may not…

2014-11-29 jimjh
Github user jimjh commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-64963171 @vanzin ping!

2014-11-26 jimjh
Github user jimjh commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-64721913 Are we good to merge?

2014-11-24 AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-64318723 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23

2014-11-24 SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-64318717 [Test build #23818 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23818/consoleFull) for PR 3238 at commit [`fe95125`](https://gith

2014-11-24 SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-64313146 [Test build #23818 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23818/consoleFull) for PR 3238 at commit [`fe95125`](https://githu

2014-11-24 jimjh
Github user jimjh commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-64312996 @vanzin Thanks for the review!

2014-11-24 vanzin
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-6426 LGTM. There's some code duplication here that is not really your fault, and we could try to clean up in a separate change.

2014-11-24 vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3238#discussion_r20821470 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -684,6 +737,13 @@ private[spark] object ClientBase extends Logging { …

2014-11-24 vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3238#discussion_r20814231 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -684,6 +737,13 @@ private[spark] object ClientBase extends Logging { …

2014-11-24 vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3238#discussion_r20813987 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -223,10 +224,42 @@ private[spark] trait ClientBase extends Logging { …

2014-11-24 vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3238#discussion_r20813704 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -22,6 +22,7 @@ import java.net.{InetAddress, UnknownHostException, UR…

2014-11-23 AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-64157991 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23

2014-11-23 SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-64157988 [Test build #23772 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23772/consoleFull) for PR 3238 at commit [`6c31fe0`](https://gith

2014-11-23 SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-64154161 [Test build #23772 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23772/consoleFull) for PR 3238 at commit [`6c31fe0`](https://githu

2014-11-23 jimjh
Github user jimjh commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-64154021 Hey @andrewor14 I am ready. `spark.yarn.datanucleus.dir` can be used to specify the directory containing the datanucleus jars, and `spark.yarn.datanucleus.jars` (internal)…

2014-11-22 SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-64104012 [Test build #23752 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23752/consoleFull) for PR 3238 at commit [`16df17c`](https://gith

2014-11-22 AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-64104013 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23

2014-11-22 SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-64101814 [Test build #23752 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23752/consoleFull) for PR 3238 at commit [`16df17c`](https://githu

2014-11-18 jimjh
Github user jimjh commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-63502170 Created! https://issues.apache.org/jira/browse/SPARK-4474 I will try to find time to add docs and write a test for this today.

2014-11-18 tgravescs
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-63476674 I'm referring to other jars that could be in the same situation as datanucleus in the future. They should be included in the Uber jar but for some reason or another can't be.

2014-11-18 tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/3238#discussion_r20507053 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -551,6 +574,9 @@ private[spark] object ClientBase extends Logging…

2014-11-18 AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-63437361 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23

2014-11-18 SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-63437350 [Test build #23541 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23541/consoleFull) for PR 3238 at commit [`d28d8e9`](https://gith

2014-11-17 jimjh
Github user jimjh commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-63430856 Added `spark.yarn.datanucleus.dir`. If not specified, it looks for datanucleus under `SPARK_HOME/lib`. I tried using the option with an absolute path, a `file://` URI,…
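The lookup order described in this comment (an explicit `spark.yarn.datanucleus.dir`, otherwise `SPARK_HOME/lib`) can be sketched as below. This is not the PR's code, and the patch was later reverted before 1.2; plain `Map`s stand in for `SparkConf` and the process environment, and `DatanucleusDir` is a hypothetical name. Accepting either an absolute path or a `file://` URI mirrors the cases the comment says were tried.

```scala
import java.io.File
import java.net.URI

// Hypothetical sketch of the lookup order; not the PR's implementation.
object DatanucleusDir {
  val ConfKey = "spark.yarn.datanucleus.dir"

  /** Accept a plain path or a file:// URI (assumes the value parses as a URI, e.g. no raw spaces). */
  def toLocalPath(value: String): String = {
    val uri = new URI(value)
    if (uri.getScheme == "file") new File(uri).getPath else value
  }

  /** Explicit setting first, else $SPARK_HOME/lib, else nothing. */
  def resolve(conf: Map[String, String], env: Map[String, String]): Option[String] =
    conf.get(ConfKey).map(toLocalPath)
      .orElse(env.get("SPARK_HOME").map(home => new File(home, "lib").getPath))
}
```

For example, `DatanucleusDir.resolve(Map.empty, sys.env)` would fall back to the `lib` directory under whatever `SPARK_HOME` is set to.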

2014-11-17 SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-63430802 [Test build #23541 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23541/consoleFull) for PR 3238 at commit [`d28d8e9`](https://githu

2014-11-17 tgravescs
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-63317118 I would be more inclined to make the config name not specific to datanucleus. That way if we have a similar issue in the future we can just drop another jar in there.

2014-11-16 jimjh
Github user jimjh commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-63230766 I am still inclined to go with (2), with a variant of what @vanzin is suggesting. (Yes, I am hoping to get this out for 1.2.0. Do we have a date?) We could have a …

2014-11-14 vanzin
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-63130947 I wonder if using `XmlAppendingTransformer` in the shade plugin would help; it doesn't look like the different plugin.xml files actually conflict with one another, seems l…
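Which jars actually ship their own top-level `plugin.xml` (the files a merging transformer such as the `XmlAppendingTransformer` mentioned here would have to combine) can be checked with a small scan like the one below. It is a hypothetical diagnostic using only `java.util.jar.JarFile`, not part of the PR or of Spark's build; `PluginXmlScan` is an illustrative name. If more than one jar is reported, those files would need to be merged rather than overwritten when building an assembly.

```scala
import java.io.File
import java.util.jar.JarFile

// Hypothetical diagnostic; not part of the PR or of Spark's build.
object PluginXmlScan {
  /** Names of the jars that contain a top-level plugin.xml entry. */
  def jarsWithPluginXml(jars: Seq[File]): Seq[String] =
    jars.filter { f =>
      val jar = new JarFile(f)
      try jar.getEntry("plugin.xml") != null
      finally jar.close() // always release the file handle
    }.map(_.getName)
}
```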

2014-11-14 andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-63122630 According to the comment in `compute-classpath.sh` we currently don't have a solution to include it in the assembly jar. We have two options: (1) do some build magic to…

2014-11-14 andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3238#discussion_r20377414 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -223,6 +224,29 @@ private[spark] trait ClientBase extends Logging…

2014-11-14 jimjh
Github user jimjh commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-63109218 @andrewor14 what do you think?

2014-11-14 tgravescs
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-63100910 Yeah, it looks like it's Apache licensed: http://www.datanucleus.org/documentation/license.html It would be much nicer if it could be in the assembly jar. Othe…

2014-11-14 jimjh
Github user jimjh commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-63087228 That was added in [this commit](https://github.com/apache/spark/commit/cf0a8f0204bb8acdaf441b03c924c278fef08e28).

2014-11-14 jimjh
Github user jimjh commented on a diff in the pull request: https://github.com/apache/spark/pull/3238#discussion_r20367567 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -223,6 +224,29 @@ private[spark] trait ClientBase extends Logging { …

2014-11-14 jimjh
Github user jimjh commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-63085403 [datanucleus](http://www.datanucleus.org/documentation/usage.html) uses the Apache 2 license.

2014-11-14 srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-63086439 Sorry, I must be mixing this up with Ganglia. I'm 0 for 2 recently. Well, I found this comment in `compute-classpath.sh`: ``` # When Hive support is nee…

2014-11-14 jimjh
Github user jimjh commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-63086308 @andrewor14 yep I tested it on a YARN cluster on EMR. I instantiated and printed a class from the datanucleus package, using spark assemblies before and after the fix.
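A lighter variant of the check described above, which only tests whether a class can be loaded rather than instantiating and printing it, might look like this. It is a sketch, not the test that was actually run; `ClasspathCheck` is a hypothetical name.

```scala
// Hypothetical smoke test in the spirit of the check described above.
object ClasspathCheck {
  /** True if `className` can be found by the current context class loader. */
  def isLoadable(className: String): Boolean =
    try {
      // initialize = false: resolve the class without running its static initializers
      Class.forName(className, false, Thread.currentThread().getContextClassLoader)
      true
    } catch {
      case _: ClassNotFoundException | _: LinkageError => false
    }
}
```

In yarn-cluster mode the driver itself runs inside the application master container, so calling this from the driver with a datanucleus class name (for example `org.datanucleus.api.jdo.JDOPersistenceManagerFactory`, used here only as an illustration) would indicate whether that container's classpath includes the jars.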

2014-11-13 srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3238#discussion_r20343870 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -223,6 +224,29 @@ private[spark] trait ClientBase extends Logging { …

2014-11-13 SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-62988124 [Test build #2 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/2/consoleFull) for PR 3238 at commit [`84e6cba`](https://gith

2014-11-13 AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-62988129 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23

2014-11-13 vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3238#discussion_r20329191 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -223,6 +224,29 @@ private[spark] trait ClientBase extends Logging { …

2014-11-13 SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-62977570 [Test build #2 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/2/consoleFull) for PR 3238 at commit [`84e6cba`](https://githu

2014-11-13 andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-62976870 add to whitelist. Standalone mode requires all distributed jars to be visible to all workers, so there is no equivalent to the distributed cache used in Yarn.

2014-11-13 jimjh
Github user jimjh commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-62917939 Yes, going by the comments [here](https://github.com/apache/spark/blob/master/bin/compute-classpath.sh#L111), the datanucleus jars are used for Hive support. What…

2014-11-13 tgravescs
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-62900827 Aren't the datanucleus jars only needed for the hive sql support? Where is this being handled in standalone mode?

2014-11-12 AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-62851973 Can one of the admins verify this patch?

2014-11-12 jimjh
GitHub user jimjh opened a pull request: https://github.com/apache/spark/pull/3238 "SPARK-2624 add datanucleus jars to the container in yarn-cluster". If `spark-submit` finds the datanucleus jars, it adds them to the driver's classpath, but does not add them to the container. Th…
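The first half of the fix described in the PR, locating the datanucleus jars so they can be shipped to the container as well as put on the driver's classpath, might look like the sketch below. It is not the PR's implementation: `DatanucleusJars` is a hypothetical name, and matching on a `datanucleus-*.jar` file name is an assumption about how the jars are named in the lib directory.

```scala
import java.io.File

// Hypothetical sketch; the "datanucleus-" prefix is an assumed naming convention.
object DatanucleusJars {
  /** Datanucleus jars directly inside `dir`, sorted by file name. */
  def find(dir: File): Seq[File] =
    Option(dir.listFiles()) // null when `dir` does not exist or is not a directory
      .map(_.toSeq)
      .getOrElse(Seq.empty)
      .filter { f =>
        val name = f.getName
        f.isFile && name.startsWith("datanucleus-") && name.endsWith(".jar")
      }
      .sortBy(_.getName)
}
```

Each returned file would still need to be uploaded to the distributed cache and listed on the container classpath; that part depends on the YARN client code and is not shown.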