Github user JoshRosen commented on the issue:
https://github.com/apache/spark/pull/14981
I've gone ahead and opened #15167, which only prevents the publication of
the assembly but continues to publish the non-assembly artifact.
---
If your project is set up for it, you can reply to
Github user JoshRosen commented on the issue:
https://github.com/apache/spark/pull/14981
After some investigation I think that it's going to be pretty hard to
remove the use of the assembly in PySpark streaming tests and definitely a
change outside of the scope of what we'd want to
Github user JoshRosen commented on the issue:
https://github.com/apache/spark/pull/14981
In the past I came very close to being able to remove the Python tests'
dependencies on the streaming assemblies, so I'm going to see whether I can
revive any of those old patches to completely
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14981
Not that I know of. I believe it's reasonable to say that there's a valid
open question, and a reasonable argument at this stage that the non-assembly
artifacts are allowed. If a 2.0.1 release is
Github user JoshRosen commented on the issue:
https://github.com/apache/spark/pull/14981
Has there been any authoritative answer on whether we are prohibited from
publishing _non-assembly_ Spark Kinesis artifacts to Maven? I read through [the
thread on
Github user lresende commented on the issue:
https://github.com/apache/spark/pull/14981
@srowen Please don't get me wrong, I don't have any interest in this
extension either, but I just want to make sure we start doing the right thing for
Apache Spark. I will try to ping some of the
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14981
Yeah https://github.com/apache/spark/pull/14981#issuecomment-247789298 is
certainly the argument against including them, that's clear.
But I also outlined the argument 'for': you could also
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14981
I am referring to
http://mail-archives.apache.org/mod_mbox/www-legal-discuss/201609.mbox/
I don't think it is up to us
Github user lresende commented on the issue:
https://github.com/apache/spark/pull/14981
The pointer is exactly your quote on the e-mail to legal-discuss:
http://www.apache.org/legal/resolved.html#prohibited says:
-
CAN APACHE PROJECTS RELY ON COMPONENTS UNDER
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14981
That isn't the conclusion I took from the discussion on legal-discuss - do
you have a pointer? I took that it was at best ambiguous but not obviously
prohibited to distribute these because they are
Github user lresende commented on the issue:
https://github.com/apache/spark/pull/14981
Yes, and this is the intent. It's OK to have these in the source release
(similar to Ganglia), but we don't publish them in the Maven repository, and they
become available only if people go and
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14981
The issue is that this also removes the non-assembly artifact from the
release. That does not seem to be strictly needed license-wise. It is easy and
tidy though.
Github user lresende commented on the issue:
https://github.com/apache/spark/pull/14981
@srowen @rxin My understanding is that `mvn deploy` is what actually
publishes the files to the Maven staging repository:
`
$MVN -DzincPort=$ZINC_PORT --settings
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14981
CC @rxin I think the direct but slightly hacky way to just address this
issue is to modify `release-build.sh` around here ...
```
# Remove any extra files generated during install
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14981
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65488/
Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14981
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14981
**[Test build #65488 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65488/consoleFull)**
for PR 14981 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14981
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14981
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65486/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14981
**[Test build #65486 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65486/consoleFull)**
for PR 14981 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14981
**[Test build #65488 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65488/consoleFull)**
for PR 14981 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14981
**[Test build #65486 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65486/consoleFull)**
for PR 14981 at commit
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14981
Yeah, it's either going to be turning off the profile entirely (if we can't
distribute the non-assembly artifact), or leaving it on but manually excluding
the assembly artifact.
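
A minimal sketch of the second option (leaving the profile on but excluding the assembly), assuming the built artifacts sit in a local repository-style directory before they are uploaded. The function name `remove_assembly_artifacts`, the directory layout, and the version number are illustrative assumptions, not taken from `release-build.sh`.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: delete every directory under a staged repository tree
# whose name contains "kinesis-asl-assembly", leaving other artifacts alone.
remove_assembly_artifacts() {
  repo_dir="$1"
  # -depth visits a directory's contents before the directory itself,
  # so find never tries to descend into something already deleted.
  find "$repo_dir" -depth -type d -name '*kinesis-asl-assembly*' -exec rm -rf {} +
}

# Demo on a throwaway directory that mimics a local Maven repository layout.
# The artifact names and the version below are assumed for illustration.
demo_repo="$(mktemp -d)"
mkdir -p "$demo_repo/org/apache/spark/spark-streaming-kinesis-asl_2.10/2.0.1"
mkdir -p "$demo_repo/org/apache/spark/spark-streaming-kinesis-asl-assembly_2.10/2.0.1"

remove_assembly_artifacts "$demo_repo"

# List what is left; only the non-assembly artifact directory should remain.
ls "$demo_repo/org/apache/spark"
```

Matching on the directory name keeps the non-assembly `spark-streaming-kinesis-asl` artifact in place, which is the distinction this thread turns on.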
Github user lresende commented on the issue:
https://github.com/apache/spark/pull/14981
OK, reverting the commit that removed the Kinesis assembly, as the Python tests
rely on it for the transitive dependencies. Note that I was also trying to
overcome this requirement by appending all
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/14981
Yeah, I don't know of an easier workaround.
My understanding is also that the ASF concern is tied to distribution, so
not publishing to Maven should be sufficient.
On Sep 14,
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14981
Ah. The assembly is needed for the PySpark Kinesis tests. Not sure if you would
know, @koeninger, but is there any easy way around that?
I'd say we can instead look at modifying the release
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14981
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14981
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65345/
Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14981
**[Test build #65345 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65345/consoleFull)**
for PR 14981 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14981
**[Test build #65345 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65345/consoleFull)**
for PR 14981 at commit
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14981
@lresende yes I think so. I think we can consider SPARK-17418 about the
assembly, and SPARK-17422 about (possibly) removing the other artifacts
depending on LEGAL-198.
Github user lresende commented on the issue:
https://github.com/apache/spark/pull/14981
@srowen Should I update this PR with the removal of the Kinesis assembly, then?
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14981
I personally think it's fine to go ahead and remove the Kinesis assembly
module, because we know that can't be distributed. Separately, yes, the
question is whether the non-assembly Ganglia and
Github user lresende commented on the issue:
https://github.com/apache/spark/pull/14981
I would still wait for the feedback from legal before removing anything.
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/14981
The assembly is mostly a convenience: one jar to provide on the command
line vs. multiple (and figuring out what those are).
For those packaging it with their app, Maven takes care of
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/14981
Yeah, if it's cleaner to remove the kinesis-asl-assembly module I don't
think it's a serious hardship to users.
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14981
I presume the best practice is certainly to build your deps into your app,
and simplicity is good, all else equal. I suppose I'd remove this module but
don't feel strongly about it. Removing it
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/14981
Isn't that mostly down to whether someone wants to just put the whole
assembly on their classpath, vs install the project and depend on it in their
build tool? I can see why someone would want
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14981
Agree with that, @koeninger. I'm mostly asking whether the assembly module has any
use if it's not published.
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/14981
My 2 cents are that we should make things as easy as possible for users,
within the bounds of what ASF legal is willing to tolerate ;) Which probably
means having it exist, but not published to
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14981
@vanzin and possibly @koeninger -- so, if we can't publish the Kinesis
assembly, is there a purpose in having the module exist in the repo? Would
someone want to build that assembly from source?
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14981
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14981
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65107/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14981
**[Test build #65107 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65107/consoleFull)**
for PR 14981 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14981
**[Test build #65107 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65107/consoleFull)**
for PR 14981 at commit
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14981
Good question. The Kinesis (non-assembly) artifact does not itself bundle
any Amazon-licensed code. However it of course strongly depends on it.
But, the Kinesis artifact itself is optional
Github user lresende commented on the issue:
https://github.com/apache/spark/pull/14981
Spark Kinesis has a dependency on the Kinesis client, which is Category X:
    <dependency>
      <groupId>com.amazonaws</groupId>
      <artifactId>amazon-kinesis-client</artifactId>
      <version>${aws.kinesis.client.version}</version>
    </dependency>
Thus
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14981
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14981
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65008/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14981
**[Test build #65008 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65008/consoleFull)**
for PR 14981 at commit
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14981
Leave the ganglia one because that artifact isn't an assembly after all.
Publishing the Spark part of that without redistributing the rest is OK.
Really, here we need to only pull the
Github user lresende commented on the issue:
https://github.com/apache/spark/pull/14981
As for the Ganglia one, I will create another JIRA to track that
separately, as this (Kinesis) one might involve more changes around the Python
tests and examples.
Github user lresende commented on the issue:
https://github.com/apache/spark/pull/14981
@srowen, The Kinesis assembly has been published by Spark releases for a
while. Here is the link to the 2.0 release on repository.apache.org :
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14981
Aha, right. That one is probably OK as there's technically no third-party
code distributed in those artifacts. However, this won't be OK:
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/14981
They're not in the assembly, but they're published to maven as a separate
thing; can that be an issue?
https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kinesis-asl_2.10
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14981
@lresende I don't think we actually distribute the kinesis code, because
that profile just enables the kinesis assembly, and that does not cause this
stuff to get built into a Spark jar that's
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14981
**[Test build #65008 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65008/consoleFull)**
for PR 14981 at commit