Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/12004
GitHub isn't letting me reopen this, so I'm going to submit the patch with
reworked docs as a new PR. The machines do not like me today.
---
If your project is set up for it, you can reply
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/12004
@srowen anything else I need to do here?
---
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/12004
Any comments on the latest patch? Anyone?
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/12004
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/12004
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74899/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/12004
**[Test build #74899 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74899/testReport)**
for PR 12004 at commit
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/12004
The latest patch embraces the fact that 2.6 is the base Hadoop version, so
the `hadoop-aws` JAR is always pulled in, with its dependencies set up. One thing to
bear in mind here is that the [Phase I
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/12004
**[Test build #74899 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74899/testReport)**
for PR 12004 at commit
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/12004
I haven't forgotten this; I've just been trying to make the module
POM-only, while adding support for Hadoop 2.6 builds, which is causing some
issues downstream. Specifically, my downstream
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/12004
comments?
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/12004
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/12004
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73430/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/12004
**[Test build #73430 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73430/testReport)**
for PR 12004 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/12004
**[Test build #73430 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73430/testReport)**
for PR 12004 at commit
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/12004
(apologies for not replying; rebuilding a deceased laptop)
My main concern is to have the ability to make spark releases which include
the object store client libraries and a set of
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/12004
I still don't think this answered my last questions? yes, I understand all
this back story. That's why this is taking such a large amount of everyone's
time. The purpose and discussion and commits
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/12004
Still waiting for reviews on this. Anyone? Ideally before my forthcoming Spark
Summit talk...
---
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/12004
@nchammas sorry, should be clearer: "you must never use an aws-sdk version
other than the one hadoop-aws was built with, else things will break". if you
pull in hadoop-aws, that happens. If
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/12004
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/12004
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72155/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/12004
**[Test build #72155 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72155/testReport)**
for PR 12004 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/12004
**[Test build #72155 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72155/testReport)**
for PR 12004 at commit
Github user nchammas commented on the issue:
https://github.com/apache/spark/pull/12004
> the AWS SDK you get will be in sync with hadoop-aws; you have to keep
them in sync.
Did you mean here, "you _don't_ have to keep them in sync"?
> Dependency management is an
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/12004
@nchammas the AWS SDK you get will be in sync with hadoop-aws; you have to
keep them in sync.
What is more brittle is the transitive dependencies: httpclient, joda time, jackson,
etc, which is
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/12004
Surely hadoop-aws depends on the version of the AWS SDK it wants to?
---
Github user nchammas commented on the issue:
https://github.com/apache/spark/pull/12004
> This won't be enabled in a default build of Spark.
Okie doke. I don't want to derail the PR review here, but I'll ask since
it's on-topic:
Is there a way for projects like
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/12004
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/12004
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71725/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/12004
**[Test build #71725 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71725/testReport)**
for PR 12004 at commit
Github user nchammas commented on the issue:
https://github.com/apache/spark/pull/12004
Thanks for elaborating on where this work will help @steveloughran. Again,
just speaking from my own point of view as a Spark user and
[Flintrock](https://github.com/nchammas/flintrock) maintainer,
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/12004
The latest patch updates the dependency settings. As noted, it works for
Hadoop versions from 2.7 to 3.0.2-alpha & the HADOOP-13345 branch, at least if
you build the last two with a
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/12004
Here's why this matters, and why a simple "isn't this just a matter of
dropping in the JARs" isn't the solution:
*getting the right jars together with the right spark version
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/12004
**[Test build #71725 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71725/testReport)**
for PR 12004 at commit
Github user nchammas commented on the issue:
https://github.com/apache/spark/pull/12004
> Does a build of Spark + Hadoop 2.7 right now have no ability at all to
read from S3 out of the box, or just not full / ideal support?
No ability at all, as far as I can tell. People
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/12004
I have the impression that you can't really use Spark with S3 and only S3,
not as an intermediate store, because it's too eventually-consistent. Does the
presence of additional integration libraries
Github user nchammas commented on the issue:
https://github.com/apache/spark/pull/12004
As a dumb end-user, and as the maintainer of
[Flintrock](https://github.com/nchammas/flintrock), my interest in this PR
stems from the hope that we will be able to get builds of Spark against the
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/12004
If I may, I believe the intent here is to add an extra dependency-only
module that adds in Hadoop's integration modules for various cloud stores. If
building with this module enabled, you build in
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/12004
I've pointed this out before, and again: FWIW I really don't see what this
pull request is trying to accomplish
---
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/12004
(Continuing email thread): Yes, try
`./dev/test-dependencies.sh --replace-manifest`
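
As a rough illustration of the kind of check involved here: the build log quoted elsewhere in this thread reports that Spark's published dependencies do not match a recorded manifest. The Python sketch below is not how `dev/test-dependencies.sh` is implemented; the function name `compare_manifests` and the `hadoop-aws-X.Y.Z.jar` entry are hypothetical and only show the idea of comparing a recorded JAR list with what a build actually produces.

```python
def compare_manifests(recorded, actual):
    """Return (added, removed) JAR names between a recorded manifest and a build."""
    recorded_set, actual_set = set(recorded), set(actual)
    added = sorted(actual_set - recorded_set)      # JARs the build now pulls in
    removed = sorted(recorded_set - actual_set)    # JARs the build no longer has
    return added, removed


# Hypothetical manifests; the hadoop-aws entry is a placeholder, not a real version.
recorded = ["avro-1.7.7.jar", "avro-ipc-1.7.7.jar"]
actual = ["avro-1.7.7.jar", "avro-ipc-1.7.7.jar", "hadoop-aws-X.Y.Z.jar"]

added, removed = compare_manifests(recorded, actual)
if added or removed:
    print("manifest mismatch; added:", added, "removed:", removed)
```

Under this reading, a patch that adds new artifacts to the classpath would be expected to fail such a check until the manifest is regenerated.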
---
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/12004
this patch is ready for review. Anyone?
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/12004
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71148/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/12004
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/12004
**[Test build #71148 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71148/testReport)**
for PR 12004 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/12004
**[Test build #71148 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71148/testReport)**
for PR 12004 at commit
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/12004
Test failure due to new artifacts
```
+++ b/dev/pr-deps/spark-deps-hadoop-2.7
@@ -16,8 +16,6 @@ arpack_combined_all-0.1.jar
avro-1.7.7.jar
avro-ipc-1.7.7.jar
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/12004
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69578/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/12004
**[Test build #69578 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69578/consoleFull)**
for PR 12004 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/12004
Merged build finished. Test FAILed.
---
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/12004
the latest patch moves to the suggested name `spark-hadoop-cloud`; the
external test repo is in sync. Those tests are all working happily against S3
Ireland, Azure and Rackspace Swift, on
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/12004
**[Test build #69578 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69578/consoleFull)**
for PR 12004 at commit
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/12004
The latest patch
1. keeps the cloud package separate from hadoop-2.7. This is important
to avoid outstanding problems related to org.json licensed artifacts in the AWS
SDK JARs. The
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/12004
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/12004
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69480/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/12004
**[Test build #69480 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69480/consoleFull)**
for PR 12004 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/12004
**[Test build #69480 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69480/consoleFull)**
for PR 12004 at commit
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/12004
This is the patch stripped down to the packaging and some tests to load the
direct and indirect dependencies, so verifying that the classpath is valid
within the module itself. It also
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/12004
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68936/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/12004
**[Test build #68936 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68936/consoleFull)**
for PR 12004 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/12004
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/12004
**[Test build #68936 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68936/consoleFull)**
for PR 12004 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/12004
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/12004
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68668/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/12004
**[Test build #68668 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68668/consoleFull)**
for PR 12004 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/12004
**[Test build #68668 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68668/consoleFull)**
for PR 12004 at commit
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/12004
I do think it would be better to consider, first, just the module and doc
bit. What do you think @rxin et al?
Now I may be arguing against something nobody is suggesting. This here is
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/12004
2.6 vs 2.7 vs later releases: a moving target, with AWS versions and
other issues to worry about.
[HADOOP-13687](https://issues.apache.org/jira/browse/HADOOP-13687) is going to
add a
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/12004
If Hadoop 2.5 vs 2.6 behaves differently w.r.t. S3 support classes, we can
vary dependencies within the existing profile even, sure. That should be fixed
up. However I think we may be just about to
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/12004
I had something tangible: the integration tests. It's clear those aren't
wanted. Now I'm proposing something more minimal, yet still tangible for anyone
trying to build spark such that it
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/12004
I'm with Sean here -- we shouldn't create a module just because we might
create something in the future. Why don't we create the module when there is
something specific to add?
---
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/12004
Sean: there are two things: tests and packaging.
1. The packaging has to go in as probably the only way to get whatever
spark is built with to be consistent. That includes excluding
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/12004
I haven't looked at it. It looks like a huge patch and sounds like
something which can live externally. I am not sure it's a goal to suck in more
cloud-specific support here; EC2 support was farmed
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/12004
Has anyone had a chance to review this? Is there more clarification needed,
or some specific aspect of the patch which needs changing?
Without this it is near-impossible to have a
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/12004
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67991/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/12004
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/12004
**[Test build #67991 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67991/consoleFull)**
for PR 12004 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/12004
**[Test build #67991 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67991/consoleFull)**
for PR 12004 at commit
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/12004
that's it warning that the manifest has changed. Which it has: there's now
hadoop-azure, hadoop-openstack and hadoop-aws JARs on the CP, along with
dependencies (amazon-aws SDK,
Github user nchammas commented on the issue:
https://github.com/apache/spark/pull/12004
@steveloughran - Is this message in the most recent build log critical?
```
Spark's published dependencies DO NOT MATCH the manifest file
(dev/spark-deps).
To update the manifest
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/12004
**[Test build #66962 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66962/consoleFull)**
for PR 12004 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/12004
**[Test build #66962 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66962/consoleFull)**
for PR 12004 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/12004
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/12004
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66962/
Test FAILed.
---
Github user mtustin-handy commented on the issue:
https://github.com/apache/spark/pull/12004
I don't see any downsides to this. At present working with s3 isn't super
painful, but I do see why one would want support to be better and smoother.
---
Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/12004
# Packaging:
1. this addresses the problem that it's not always immediately obvious to
people what they have to do to get, say s3a working. Do you know precisely
which version of
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/12004
@steveloughran can you clarify what this does? It seems like just 5000
lines of examples and test cases? Users can already use these cloud stores by
just adding the proper dependencies, can't they?
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/12004
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66513/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/12004
**[Test build #66513 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66513/consoleFull)**
for PR 12004 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/12004
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/12004
**[Test build #66513 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66513/consoleFull)**
for PR 12004 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/12004
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/12004
**[Test build #66505 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66505/consoleFull)**
for PR 12004 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/12004
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66505/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/12004
**[Test build #66505 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66505/consoleFull)**
for PR 12004 at commit