Re: [DISCUSS] Spark runner packaging

2016-07-08 Thread Amit Sela
I like the profile idea, Dan, mostly because while I believe we should do our best to make adoption easier, we should still default to the actual use case, where such pipelines run on clusters. On Fri, Jul 8, 2016 at 1:53 AM Dan Halperin wrote: …
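The "profile idea" could be sketched roughly like this in the runner's (or user's) pom.xml — the profile id, the Scala-suffixed artifactIds, and the `spark.version` property are illustrative assumptions, not taken from the thread:

```xml
<!-- Hypothetical sketch: an opt-in profile that adds the Spark
     dependencies for local runs; the default (cluster) build leaves
     Spark to be supplied by the cluster itself. -->
<profiles>
  <profile>
    <id>local-spark</id>
    <dependencies>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>${spark.version}</version>
      </dependency>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.10</artifactId>
        <version>${spark.version}</version>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```

A user would opt in with `mvn package -Plocal-spark`, so easy local adoption does not change the cluster-first default Amit argues for.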

Re: [DISCUSS] Spark runner packaging

2016-07-07 Thread Dan Halperin
Thanks Amit, that does clear things up! On Thu, Jul 7, 2016 at 3:30 PM, Amit Sela wrote: …

Re: [DISCUSS] Spark runner packaging

2016-07-07 Thread Amit Sela
I don't think that the Spark runner is special; it's just the way it was until now, and that's why I brought up the subject here. The main issue is that currently, if a user wants to write a Beam app using the Spark runner, he'll have to provide the Spark dependencies himself, or he'll get a ClassNotFoundException …
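What "provide the Spark dependencies" means in practice might look like the following user pom.xml fragment — the artifactIds and version properties are illustrative, not confirmed by the thread:

```xml
<!-- Sketch of what a user's pom must currently declare by hand.
     Depending on the runner alone is not enough: without the Spark
     artifacts below, the app fails at runtime with a
     ClassNotFoundException for Spark classes. -->
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-spark</artifactId>
  <version>${beam.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>${spark.version}</version>
  <scope>provided</scope>
</dependency>
```

`provided` scope here assumes the common deployment where the cluster already ships Spark; for purely local execution the scope would have to be the default `compile` instead.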

Re: [DISCUSS] Spark runner packaging

2016-07-07 Thread Dan Halperin
hey folks, In general, we should optimize for running on clusters rather than running locally. Examples is a runner-independent module, with non-compile-time deps on runners. Most runners are currently listed as runtime deps -- it sounds like that works for most cases, but might not be the …
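The "non-compile-time deps" arrangement Dan describes can be sketched as a runtime-scoped dependency in the examples module's pom.xml (names illustrative):

```xml
<!-- How a runner-independent examples module can depend on a runner
     without compiling against it: runtime scope keeps the runner off
     the compile classpath but puts it on the run/test classpath, so
     the examples code references only the Beam SDK API. -->
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-spark</artifactId>
  <version>${beam.version}</version>
  <scope>runtime</scope>
</dependency>
```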

Re: [DISCUSS] Spark runner packaging

2016-07-07 Thread Ismaël Mejía
Good discussion subject, Amit. I'll let the whole Beam distribution discussion continue in BEAM-320; however, there is a not-yet-discussed aspect of the Spark runner: the Maven behavior. When you import the Beam Spark runner as a dependency, you are obliged to provide your Spark dependencies by hand too, …

Re: [DISCUSS] Spark runner packaging

2016-07-07 Thread Jean-Baptiste Onofré
No problem, and good idea to discuss in the Jira. Actually, I started to experiment a bit with Beam distributions on a branch (which I can share with people interested). Regards, JB On 07/07/2016 10:12 PM, Amit Sela wrote: …

Re: [DISCUSS] Spark runner packaging

2016-07-07 Thread Amit Sela
Thanks JB, I've missed that one. I suggest we continue this in the ticket comments. Thanks, Amit On Thu, Jul 7, 2016 at 11:05 PM Jean-Baptiste Onofré wrote: …

Re: [DISCUSS] Spark runner packaging

2016-07-07 Thread Jean-Baptiste Onofré
Hi Amit, I think your proposal is related to: https://issues.apache.org/jira/browse/BEAM-320 As described in the Jira, what I'm planning to provide (in dedicated Maven modules) is a Beam distribution including: - an uber jar to wrap the dependencies - the underlying runtime backends - etc. Regards …
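One way the uber jar JB mentions could be produced is with the maven-shade-plugin; this is only a sketch of that approach, not the configuration from JB's branch:

```xml
<!-- Sketch: build a single artifact bundling the runner and its
     backend dependencies into one "uber" jar at package time. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- merge META-INF/services entries so service-loader
               registrations from all bundled jars survive -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```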

Re: [DISCUSS] Spark runner packaging

2016-07-07 Thread Lukasz Cwik
What I meant by saying that this could be part of Apache Beam is that the build scripts that generate the binary artifact could be part of Apache Beam, not the binary artifact itself. So the question I was asking is whether the build scripts to generate the artifact should be part of Apache Beam, or …

Re: [DISCUSS] Spark runner packaging

2016-07-07 Thread Robert Bradshaw
I don't think the proposal is to put this into the source release, but rather to have a separate binary artifact that's Beam+Spark. On Thu, Jul 7, 2016 at 11:54 AM, Vlad Rozov wrote: …

Re: [DISCUSS] Spark runner packaging

2016-07-07 Thread Vlad Rozov
I am not sure if I read the proposal correctly, but note that it would be against Apache policy to include compiled binaries in the source release. On the other hand, each runner may include the necessary run-time binaries as test-only dependencies in the runner's Maven pom.xml. On 7/7/16 11:01, …
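Vlad's test-only suggestion would look something like this in the runner's own pom.xml (artifactId and version property illustrative): Spark stays out of the runner's compile and transitive runtime graph, yet the runner's own tests can still execute pipelines:

```xml
<!-- Sketch: the runner pulls its backend in for its test suite only.
     Test scope never leaks to downstream consumers, so users still
     choose how to supply Spark for their own deployments. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>${spark.version}</version>
  <scope>test</scope>
</dependency>
```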

Re: [DISCUSS] Spark runner packaging

2016-07-07 Thread Lukasz Cwik
That makes a lot of sense. I can see other runners following suit, with a packaged-up version for different scenarios / backend cluster runtimes. Should this be part of Apache Beam as a separate Maven module, or another sub-module inside of Apache Beam, or something else? On Thu, Jul 7, 2016 …

[DISCUSS] Spark runner packaging

2016-07-07 Thread Amit Sela
Hi everyone, Lately I've encountered a number of issues stemming from the fact that the Spark runner does not package Spark along with it, forcing people to do this on their own. In addition, this seems to get in the way of having beam-examples executed against the Spark runner, again because it …