Github user jimjh commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-65913114
Yea I should have been more careful. I agree that we should figure out a
proper solution.
---
If your project is set up for it, you can reply to this email and have your
Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-65751457
Hey @jimjh I reverted this patch after all. It seems risky to push this
into 1.2 this late in the release cycle. Let's try to figure out a correct
solution for 1.3.
Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-65360604
@jimjh Also did you have a chance to test this on a real yarn cluster?
Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-65360572
I just had a discussion with @marmbrus and I think these jars are actually
pretty important. For this reason I think it would be best if we pull this into
1.2. The exi
Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-65141491
I thought about making this a generic "add all the jars in this directory
to the dist cache and to the app's classpath". This would make sense for
regular application depe
Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-65141005
@andrewor14 see my comments from 14 days ago about generalizing. If this
isn't going into 1.2 I would much rather go back to investigating just having
an Uber jar.
Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-65140058
Hey @jimjh the overall approach makes sense. However I think having a
config that is explicitly only for datanucleus jars is a little too specific.
Is it possible to
Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/3238#discussion_r21121780
--- Diff:
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -551,6 +584,13 @@ private[spark] object ClientBase extends Loggin
Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/3238#discussion_r21121693
--- Diff:
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -584,6 +624,19 @@ private[spark] object ClientBase extends Loggin
Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/3238#discussion_r21121480
--- Diff:
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -584,6 +624,19 @@ private[spark] object ClientBase extends Loggin
Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/3238#discussion_r21121349
--- Diff:
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -223,10 +224,42 @@ private[spark] trait ClientBase extends Loggin
Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/3238#discussion_r21121229
--- Diff: docs/running-on-yarn.md ---
@@ -132,6 +132,18 @@ Most of the configs are the same for Spark on YARN as
for other deployment modes
The ma
Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/3238#discussion_r21121180
--- Diff: docs/running-on-yarn.md ---
@@ -132,6 +132,18 @@ Most of the configs are the same for Spark on YARN as
for other deployment modes
The ma
Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-64972493
Hey @jimjh I'll look at this later today. Just a heads up we are in the
middle of the 1.2 release right now, and depending on how much behavior this PR
adds we may not
Github user jimjh commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-64963171
@vanzin ping!
Github user jimjh commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-64721913
Are we good to merge?
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-64318723
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-64318717
[Test build #23818 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23818/consoleFull)
for PR 3238 at commit
[`fe95125`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-64313146
[Test build #23818 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23818/consoleFull)
for PR 3238 at commit
[`fe95125`](https://githu
Github user jimjh commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-64312996
@vanzin Thanks for the review!
Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-6426
LGTM. There's some code duplication here that is not really your fault, and
that we could try to clean up in a separate change.
Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/3238#discussion_r20821470
--- Diff:
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -684,6 +737,13 @@ private[spark] object ClientBase extends Logging {
Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/3238#discussion_r20814231
--- Diff:
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -684,6 +737,13 @@ private[spark] object ClientBase extends Logging {
Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/3238#discussion_r20813987
--- Diff:
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -223,10 +224,42 @@ private[spark] trait ClientBase extends Logging {
Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/3238#discussion_r20813704
--- Diff:
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -22,6 +22,7 @@ import java.net.{InetAddress, UnknownHostException, UR
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-64157991
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-64157988
[Test build #23772 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23772/consoleFull)
for PR 3238 at commit
[`6c31fe0`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-64154161
[Test build #23772 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23772/consoleFull)
for PR 3238 at commit
[`6c31fe0`](https://githu
Github user jimjh commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-64154021
Hey @andrewor14 I am ready. `spark.yarn.datanucleus.dir` can be used to
specify the directory containing the datanucleus jars, and
`spark.yarn.datanucleus.jars` (internal)
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-64104012
[Test build #23752 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23752/consoleFull)
for PR 3238 at commit
[`16df17c`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-64104013
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-64101814
[Test build #23752 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23752/consoleFull)
for PR 3238 at commit
[`16df17c`](https://githu
Github user jimjh commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-63502170
Created! https://issues.apache.org/jira/browse/SPARK-4474
I will try to find time to add docs and write a test for this today.
Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-63476674
I'm referring to other jars that could be in the same situation as datanucleus
in the future: jars that should be included in the Uber jar but, for some
reason or another, can't be.
Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/3238#discussion_r20507053
--- Diff:
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -551,6 +574,9 @@ private[spark] object ClientBase extends Logging
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-63437361
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-63437350
[Test build #23541 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23541/consoleFull)
for PR 3238 at commit
[`d28d8e9`](https://gith
Github user jimjh commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-63430856
Added `spark.yarn.datanucleus.dir`. If not specified, looks for datanucleus
under `SPARK_HOME/lib`.
I tried using the option with an absolute path, a `file://` URI,
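A minimal sketch of the lookup described in this comment, assuming the jars are matched by a `datanucleus-*.jar` file name. `DatanucleusJars`, `resolveDir`, and `findJars` are hypothetical names for illustration and are not code from the PR; only plain filesystem paths are handled here, not `file://` URIs.

```scala
import java.io.File

// Hypothetical helper, not taken from the PR.
object DatanucleusJars {

  // Prefer the explicitly configured directory (the value of
  // spark.yarn.datanucleus.dir); otherwise fall back to SPARK_HOME/lib.
  def resolveDir(configured: Option[String], sparkHome: Option[String]): Option[File] =
    configured.map(new File(_))
      .orElse(sparkHome.map(home => new File(home, "lib")))

  // List regular files named datanucleus-*.jar in the directory, sorted by name.
  // The naming pattern is an assumption. listFiles() returns null when the
  // path is missing or not a directory, so wrap it in Option.
  def findJars(dir: File): Seq[File] = {
    val files = Option(dir.listFiles()).getOrElse(Array.empty[File])
    files.filter { f =>
      f.isFile && f.getName.startsWith("datanucleus-") && f.getName.endsWith(".jar")
    }.sortBy(_.getName).toSeq
  }
}
```

With neither a configured directory nor `SPARK_HOME` set, `resolveDir` returns `None`, so a caller can skip the jars rather than fail.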
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-63430802
[Test build #23541 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23541/consoleFull)
for PR 3238 at commit
[`d28d8e9`](https://githu
Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-63317118
I would be more inclined to make the config name not specific to
datanucleus. That way if we have a similar issue in the future we can just drop
another jar in there.
Github user jimjh commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-63230766
I am still inclined to go with (2), with a variant of what @vanzin is
suggesting. (Yes, I am hoping to get this out for 1.2.0. Do we have a date?)
We could have a `
Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-63130947
I wonder if using `XmlAppendingTransformer` in the shade plugin would help;
it doesn't look like the different plugin.xml files actually conflict with one
another, seems l
Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-63122630
According to the comment in `compute-classpath.sh` we currently don't have
a solution to include it in the assembly jar. We have two options (1) do some
build magic to
Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/3238#discussion_r20377414
--- Diff:
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -223,6 +224,29 @@ private[spark] trait ClientBase extends Logging
Github user jimjh commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-63109218
@andrewor14 what do you think?
Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-63100910
Yeah it looks like it's Apache licensed:
http://www.datanucleus.org/documentation/license.html
It would be much nicer if it could be in the assembly jar. Othe
Github user jimjh commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-63087228
That was added in [this
commit](https://github.com/apache/spark/commit/cf0a8f0204bb8acdaf441b03c924c278fef08e28).
Github user jimjh commented on a diff in the pull request:
https://github.com/apache/spark/pull/3238#discussion_r20367567
--- Diff:
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -223,6 +224,29 @@ private[spark] trait ClientBase extends Logging {
Github user jimjh commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-63085403
[datanucleus](http://www.datanucleus.org/documentation/usage.html) uses the
Apache 2 license.
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-63086439
Sorry, I must be mixing this up with Ganglia. I'm 0 for 2 recently.
Well I found this comment in `compute-classpath.sh`:
```
# When Hive support is nee
Github user jimjh commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-63086308
@andrewor14 yep I tested it on a YARN cluster on EMR. I instantiated and
printed a class from the datanucleus package, using spark assemblies before and
after the fix.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3238#discussion_r20343870
--- Diff:
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -223,6 +224,29 @@ private[spark] trait ClientBase extends Logging {
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-62988124
[Test build #2 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/2/consoleFull)
for PR 3238 at commit
[`84e6cba`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-62988129
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23
Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/3238#discussion_r20329191
--- Diff:
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -223,6 +224,29 @@ private[spark] trait ClientBase extends Logging {
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-62977570
[Test build #2 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/2/consoleFull)
for PR 3238 at commit
[`84e6cba`](https://githu
Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-62976870
add to whitelist. Standalone mode requires all distributed jars to be
visible to all workers, so there is no equivalent to the distributed cache used
in Yarn.
Github user jimjh commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-62917939
Yes, going by the comments
[here](https://github.com/apache/spark/blob/master/bin/compute-classpath.sh#L111),
the datanucleus jars are used for Hive support.
What
Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-62900827
Aren't the datanucleus jars only needed for the hive sql support? Where is
this being handled in standalone mode?
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3238#issuecomment-62851973
Can one of the admins verify this patch?
GitHub user jimjh opened a pull request:
https://github.com/apache/spark/pull/3238
SPARK-2624 add datanucleus jars to the container in yarn-cluster
If `spark-submit` finds the datanucleus jars, it adds them to the driver's
classpath, but does not add them to the container.
Th