Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-28 Thread Luciano Resende
Just want to provide a quick update that we have submitted the "Spark Extras" proposal for review by the Apache board (see link below with the contents). https://docs.google.com/document/d/1zRFGG4414LhbKlGbYncZ13nyX34Rw4sfWhZRA5YBtIE/edit?usp=sharing Note that we are in the quest for a project na

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-18 Thread Luciano Resende
Evan, As long as you meet the criteria we discussed on this thread, you are welcome to join. Having said that, I have already seen other contributors that are very active on some of the connectors but are not Apache Committers yet, and I wanted to be fair, and also avoid using the project as an avenu

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-17 Thread Luciano Resende
On Sat, Apr 16, 2016 at 11:12 PM, Reynold Xin wrote: > First, really thank you for leading the discussion. > > I am concerned that it'd hurt Spark more than it helps. As many others > have pointed out, this unnecessarily creates a new tier of connectors or > 3rd party libraries appearing to be en

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-16 Thread Reynold Xin
First, really thank you for leading the discussion. I am concerned that it'd hurt Spark more than it helps. As many others have pointed out, this unnecessarily creates a new tier of connectors or 3rd party libraries appearing to be endorsed by the Spark PMC or the ASF. We can alleviate this concer

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-16 Thread Luciano Resende
On Sat, Apr 16, 2016 at 5:38 PM, Evan Chan wrote: > Hi folks, > > Sorry to join the discussion late. I had a look at the design doc > earlier in this thread, and it was not mentioned what types of > projects are the targets of this new "spark extras" ASF umbrella > > Is the desire to have a

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-16 Thread Evan Chan
Hi folks, Sorry to join the discussion late. I had a look at the design doc earlier in this thread, and it was not mentioned what types of projects are the targets of this new "spark extras" ASF umbrella Is the desire to have a maintained set of spark-related projects that keep pace with the

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-16 Thread Steve Loughran
On 15/04/2016, 17:41, "Mattmann, Chris A (3980)" wrote: >Yeah in support of this statement I think that my primary interest in >this Spark Extras and the good work by Luciano here is that anytime we >take bits out of a code base and “move it to GitHub” I see a bad precedent >being set. > >C

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Mridul Muralidharan
On Friday, April 15, 2016, Mattmann, Chris A (3980) < chris.a.mattm...@jpl.nasa.gov> wrote: > Yeah in support of this statement I think that my primary interest in > this Spark Extras and the good work by Luciano here is that anytime we > take bits out of a code base and “move it to GitHub” I see

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Cody Koeninger
100% agree with Sean & Reynold's comments on this. Adding this as a TLP would just cause more confusion as to "official" endorsement. On Fri, Apr 15, 2016 at 11:50 AM, Sean Owen wrote: > On Fri, Apr 15, 2016 at 5:34 PM, Luciano Resende wrote: >> I know the name might be confusing, but I also

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Mattmann, Chris A (3980)
Yeah in support of this statement I think that my primary interest in this Spark Extras and the good work by Luciano here is that anytime we take bits out of a code base and “move it to GitHub” I see a bad precedent being set. Creating this project at the ASF creates a synergy between *Apache Spar

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Mattmann, Chris A (3980)
Hey Reynold, Thanks. Getting to the heart of this, I think that this project would be successful if the Apache Spark PMC decided to participate and there was some overlap. As much as I think it would be great to stand up another project, the goal here from Luciano and crew (myself included) would

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Mattmann, Chris A (3980)
Yeah, so it’s the *Apache Spark* project. Just to clarify. Not once did you say Apache Spark below. On 4/15/16, 9:50 AM, "Sean Owen" wrote: >On Fri, Apr 15, 2016 at 5:34 PM, Luciano Resende wrote: >> I know the name might be confusing, but I also think that the projects have >> a very big

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Jean-Baptiste Onofré
+1 Regards JB On 04/15/2016 06:41 PM, Mattmann, Chris A (3980) wrote: Yeah in support of this statement I think that my primary interest in this Spark Extras and the good work by Luciano here is that anytime we take bits out of a code base and “move it to GitHub” I see a bad precedent being set

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Sean Owen
On Fri, Apr 15, 2016 at 5:34 PM, Luciano Resende wrote: > I know the name might be confusing, but I also think that the projects have > a very big synergy, more like sibling projects, where "Spark Extras" extends > the Spark community and develop/maintain components for, and pretty much > only for

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Reynold Xin
Anybody is free and welcomed to create another ASF project, but I don't think "Spark extras" is a good name. It unnecessarily creates another tier of code that ASF is "endorsing". On Friday, April 15, 2016, Mattmann, Chris A (3980) < chris.a.mattm...@jpl.nasa.gov> wrote: > Yeah in support of this

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Sean Owen
I think this is meant to be understood as a community site, and as a directory listing pointers to third-party projects. It's not a project of its own, and not part of Spark itself, with no special status. At least, I think that's how it should be presented and pretty much seems to come across that wa

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Luciano Resende
On Fri, Apr 15, 2016 at 9:34 AM, Cody Koeninger wrote: > Given that not all of the connectors were removed, I think this > creates a weird / confusing three tier system > > 1. connectors in the official project's spark/extras or spark/external > 2. connectors in "Spark Extras" > 3. connectors in

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Luciano Resende
On Fri, Apr 15, 2016 at 9:18 AM, Sean Owen wrote: > Why would this need to be an ASF project of its own? I don't think > it's possible to have yet another separate "Spark Extras" TLP (?) > > There is already a project to manage these bits of code on Github. How > about all of the interested par

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Cody Koeninger
Given that not all of the connectors were removed, I think this creates a weird / confusing three tier system 1. connectors in the official project's spark/extras or spark/external 2. connectors in "Spark Extras" 3. connectors in some random organization's github On Fri, Apr 15, 2016 at 11:18 A

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Chris Fregly
and how does this all relate to the existing 1-and-a-half-class citizen known as spark-packages.org? support for this citizen is buried deep in the Spark source (which was always a bit odd, in my opinion): https://github.com/apache/spark/search?utf8=%E2%9C%93&q=spark-packages On Fri, Apr 15, 20

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Sean Owen
Why would this need to be an ASF project of its own? I don't think it's possible to have yet another separate "Spark Extras" TLP (?) There is already a project to manage these bits of code on Github. How about all of the interested parties manage the code there, under the same process, under the

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Luciano Resende
After some collaboration with other community members, we have created an initial draft for Spark Extras which is available for review at https://docs.google.com/document/d/1zRFGG4414LhbKlGbYncZ13nyX34Rw4sfWhZRA5YBtIE/edit?usp=sharing We would like to invite other community members to participate

Re: SPARK-13843 and future of streaming backends

2016-03-28 Thread Cody Koeninger
Are you talking about group/identifier name, or contained classes? Because there are plenty of org.apache.* classes distributed via maven with non-apache group / identifiers. On Fri, Mar 25, 2016 at 6:54 PM, David Nalley wrote: > >> As far as group / artifact name compatibility, at least in the

Re: SPARK-13843 and future of streaming backends

2016-03-26 Thread Mridul Muralidharan
On Saturday, March 26, 2016, Sean Owen wrote: > This has been resolved; see the JIRA and related PRs but also > > http://apache-spark-developers-list.1001551.n3.nabble.com/SPARK-13843-Next-steps-td16783.html > > This change happened subsequent to current thread (thanks Marcelo) and could as well

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-03-26 Thread Jean-Baptiste Onofré
Hi Luciano, I didn't mean Spark proper, but more something like you proposed. Regards JB On 03/26/2016 06:38 PM, Luciano Resende wrote: On Sat, Mar 26, 2016 at 10:20 AM, Jean-Baptiste Onofré wrote: Hi Luciano, If we take the "pure" technical vision, there

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-03-26 Thread Luciano Resende
On Sat, Mar 26, 2016 at 10:20 AM, Jean-Baptiste Onofré wrote: > Hi Luciano, > > If we take the "pure" technical vision, there's pros and cons of having > spark-extra (or whatever the name we give) still as an Apache project: > > Pro: > - Governance & Quality Insurance: we follow the Apache rules

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-03-26 Thread Jean-Baptiste Onofré
Hi Luciano, If we take the "pure" technical vision, there's pros and cons of having spark-extra (or whatever the name we give) still as an Apache project: Pro: - Governance & Quality Insurance: we follow the Apache rules, meaning that a release has to be staged and voted by the PMC. It's a f

Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-03-26 Thread Luciano Resende
I believe some of this has been resolved in the context of some parts that had interest in one extra connector, but we still have a few removed, and as you mentioned, we still don't have a simple way or willingness to manage and be current on new packages like kafka. And based on the fact that this

Re: SPARK-13843 and future of streaming backends

2016-03-26 Thread Sean Owen
This has been resolved; see the JIRA and related PRs but also http://apache-spark-developers-list.1001551.n3.nabble.com/SPARK-13843-Next-steps-td16783.html This is not a scenario where a [VOTE] needs to take place, and code changes don't proceed through PMC votes. From the project perspective, cod

Re: SPARK-13843 and future of streaming backends

2016-03-26 Thread Jacek Laskowski
Hi, Although I'm not a very experienced member of the ASF, I share your concerns. I haven't looked at the issue from this point of view, but after having read the thread I think the PMC should've signed off on the migration of ASF-owned code to a non-ASF repo. At least a vote is required (and this discuss

Re: SPARK-13843 and future of streaming backends

2016-03-25 Thread David Nalley
> As far as group / artifact name compatibility, at least in the case of > Kafka we need different artifact names anyway, and people are going to > have to make changes to their build files for spark 2.0 anyway. As > far as keeping the actual classes in org.apache.spark to not break > code despi

Re: SPARK-13843 and future of streaming backends

2016-03-20 Thread Marcelo Vanzin
Hi Reynold, thanks for the info. On Thu, Mar 17, 2016 at 2:18 PM, Reynold Xin wrote: > If one really feels strongly that we should go through all the overhead to > setup an ASF subproject for these modules that won't work with the new > structured streaming, and want to spearhead to setup separat

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Cody Koeninger
i. An ASF project can clearly decide that some of its code is no longer worth maintaining and delete it. This isn't really any different. It's still apache licensed so ultimately whoever wants the code can get it. ii. I think part of the rationale is to not tie release management to Spark, so i

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Steve Loughran
Spark has hit one of the eternal problems of OSS projects, one hit by: ant, maven, hadoop, ... anything with a plugin model. Take in the plugin: you're in control, but also down for maintenance Leave out the plugin: other people can maintain it, be more agile, etc. But you've lost control, an

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Hari Shreedharan
I have worked with various ASF projects for 4+ years now. Sure, ASF projects can delete code as they feel fit. But this is the first time I have really seen code being "moved out" of a project without discussion. I am sure you can do this without violating ASF policy, but the explanation for that w

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Imran Rashid
On Thu, Mar 17, 2016 at 2:55 PM, Cody Koeninger wrote: > Why would a PMC vote be necessary on every code deletion? > Certainly PMC votes are not necessary on *every* code deletion. I don't think there is a very clear rule on when such discussion is warranted, just a soft expectation that commit

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Cody Koeninger
There's a difference between "without discussion" and "without as much discussion as I would have liked to have a chance to notice it". There are plenty of PRs that got merged before I noticed them that I would rather have not gotten merged. As far as group / artifact name compatibility, at least

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Marcelo Vanzin
On Thu, Mar 17, 2016 at 12:01 PM, Cody Koeninger wrote: > i. An ASF project can clearly decide that some of its code is no > longer worth maintaining and delete it. This isn't really any > different. It's still apache licensed so ultimately whoever wants the > code can get it. Absolutely. But I

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Marcelo Vanzin
On Fri, Mar 18, 2016 at 10:09 AM, Jean-Baptiste Onofré wrote: > a project can have multiple repos: it's what we have in ServiceMix, in > Karaf. > For the *-extra on github, if the code has been in the ASF, the PMC members > have to vote to move the code on *-extra. That's good to know. To me tha

SPARK-13843 and future of streaming backends

2016-03-19 Thread Marcelo Vanzin
Hello all, Recently a lot of the streaming backends were moved to a separate project on github and removed from the main Spark repo. While I think the idea is great, I'm a little worried about the execution. Some concerns were already raised on the bug mentioned above, but I'd like to have a more

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Mridul Muralidharan
I was not aware of a discussion in Dev list about this - agree with most of the observations. In addition, I did not see PMC signoff on moving (sub-)modules out. Regards Mridul On Thursday, March 17, 2016, Marcelo Vanzin wrote: > Hello all, > > Recently a lot of the streaming backends were mov

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Adam Kocoloski
> On Mar 19, 2016, at 8:32 AM, Steve Loughran wrote: > > >> On 18 Mar 2016, at 17:07, Marcelo Vanzin wrote: >> >> Hi Steve, thanks for the write up. >> >> On Fri, Mar 18, 2016 at 3:12 AM, Steve Loughran >> wrote: >>> If you want a separate project, eg. SPARK-EXTRAS, then it *generally* nee

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Shane Curcuru
Marcelo Vanzin wrote earlier: > Recently a lot of the streaming backends were moved to a separate > project on github and removed from the main Spark repo. Question: why was the code removed from the Spark repo? What's the harm in keeping it available here? The ASF is perfectly happy if anyone

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Jean-Baptiste Onofré
Hi Marcelo, a project can have multiple repos: it's what we have in ServiceMix, in Karaf. For the *-extra on github, if the code has been in the ASF, the PMC members have to vote to move the code on *-extra. Regards JB On 03/18/2016 06:07 PM, Marcelo Vanzin wrote: Hi Steve, thanks for the

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Steve Loughran
> On 18 Mar 2016, at 22:24, Marcelo Vanzin wrote: > > On Fri, Mar 18, 2016 at 2:12 PM, chrismattmann wrote: >> So, my comment here is that any code *cannot* be removed from an Apache >> project if there is a VETO issued which so far I haven't seen, though maybe >> Marcelo can clarify that. > >

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Steve Loughran
> On 18 Mar 2016, at 17:07, Marcelo Vanzin wrote: > > Hi Steve, thanks for the write up. > > On Fri, Mar 18, 2016 at 3:12 AM, Steve Loughran > wrote: >> If you want a separate project, e.g. SPARK-EXTRAS, then it *generally* needs >> to go through incubation. While normally it's the incubator P

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Steve Loughran
> On 17 Mar 2016, at 21:33, Marcelo Vanzin wrote: > > Hi Reynold, thanks for the info. > > On Thu, Mar 17, 2016 at 2:18 PM, Reynold Xin wrote: >> If one really feels strongly that we should go through all the overhead to >> setup an ASF subproject for these modules that won't work with the new

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Luciano Resende
On Fri, Mar 18, 2016 at 7:58 AM, Cody Koeninger wrote: > > Or, as Cody Koeninger suggests, having a spark-extras project in the ASF > with a focus on extras with their own support channel. > > To be clear, I didn't suggest that and don't think that's the best > solution. I said to the people who

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Sean Owen
Code can be removed from an ASF project. That code can live on elsewhere (in accordance with the license). It can't be presented as part of the official ASF project, like any other 3rd party project. The package name certainly must change from org.apache.spark. I don't know of a protocol, but common

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Marcelo Vanzin
Note the non-kafka bug was filed right before the change was pushed. So there really wasn't any discussion before the decision was made to remove that code. I'm just trying to merge both discussions here in the list where it's a little bit more dynamic than bug updates that end up getting lost in

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Marcelo Vanzin
Also, just wanted to point out something: On Thu, Mar 17, 2016 at 2:18 PM, Reynold Xin wrote: > Thanks for initiating this discussion. I merged the pull request because it > was unblocking another major piece of work for Spark 2.0: not requiring > assembly jars While I do agree that's more impor

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Cody Koeninger
Anyone can fork apache licensed code. Committers can approve pull requests that delete code from asf repos. Because those two things happen near each other in time, it's somehow a process violation? I think the discussion would be better served by concentrating on how we're going to solve the pr

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Reynold Xin
Thanks for initiating this discussion. I merged the pull request because it was unblocking another major piece of work for Spark 2.0: not requiring assembly jars, which is arguably a lot more important than sources that are less frequently used. I take full responsibility for that. I think it's in

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Mridul Muralidharan
I am not referring to code edits - but to migrating submodules and code currently in Apache Spark to 'outside' of it. If I understand correctly, assets from Apache Spark are being moved out of it into thirdparty external repositories - not owned by Apache. At a minimum, dev@ discussion (like this

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread chrismattmann
cense allows that, but the community itself must steward the code and part of that is hearing everyone's voice within that community before acting. Cheers, Chris -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/SPARK-13843-and-future-of-streamin

Re: SPARK-13843 and future of streaming backends

2016-03-18 Thread Luciano Resende
If the intention is to actually decouple and give a life of its own to these connectors, I would have expected that they would still be hosted as different git repositories inside Apache even though users will not really see much difference as they would still be mirrored in GitHub. This makes it m

Re: SPARK-13843 and future of streaming backends

2016-03-18 Thread Marcelo Vanzin
Hi Steve, thanks for the write up. On Fri, Mar 18, 2016 at 3:12 AM, Steve Loughran wrote: > If you want a separate project, e.g. SPARK-EXTRAS, then it *generally* needs > to go through incubation. While normally it's the incubator PMC which > sponsors/oversees the incubating project, it doesn't h

Re: SPARK-13843 and future of streaming backends

2016-03-18 Thread Imran Rashid
On Fri, Mar 18, 2016 at 3:15 PM, Shane Curcuru wrote: > Question: why was the code removed from the Spark repo? What's the harm > in keeping it available here? Assuming the Spark PMC has no plan on releasing the code, why would we keep it in our codebase? It only makes the codebase harder to

Re: SPARK-13843 and future of streaming backends

2016-03-18 Thread Jean-Baptiste Onofré
Hi Marcelo, I quickly discussed this with Reynold this morning. I share your concerns. I fully understand that it's painful for users to wait for a Spark release to include a fix in streaming backends as it's not really related. It makes sense to provide backends "outside" of ASF, especially

Re: SPARK-13843 and future of streaming backends

2016-03-18 Thread Mattmann, Chris A (3980)
90089 USA WWW: http://irds.usc.edu/ -Original Message- From: Marcelo Vanzin Date: Friday, March 18, 2016 at 3:24 PM To: jpluser Cc: "dev@spark.apache.org" Subject: Re: SPARK-13843 and future of streaming bac

Re: SPARK-13843 and future of streaming backends

2016-03-18 Thread Luciano Resende
On Fri, Mar 18, 2016 at 10:07 AM, Marcelo Vanzin wrote: > Hi Steve, thanks for the write up. > > On Fri, Mar 18, 2016 at 3:12 AM, Steve Loughran > wrote: > > If you want a separate project, e.g. SPARK-EXTRAS, then it *generally* > needs to go through incubation. While normally it's the incubator P

Re: SPARK-13843 and future of streaming backends

2016-03-18 Thread Cody Koeninger
Why would a PMC vote be necessary on every code deletion? There was a Jira and pull request discussion about the submodules that have been removed so far. https://issues.apache.org/jira/browse/SPARK-13843 There's another ongoing one about Kafka specifically https://issues.apache.org/jira/browse

Re: SPARK-13843 and future of streaming backends

2016-03-18 Thread Cody Koeninger
> Or, as Cody Koeninger suggests, having a spark-extras project in the ASF with > a focus on extras with their own support channel. To be clear, I didn't suggest that and don't think that's the best solution. I said to the people who want things done that way, which committer is going to step up

Re: SPARK-13843 and future of streaming backends

2016-03-18 Thread Marcelo Vanzin
On Fri, Mar 18, 2016 at 2:12 PM, chrismattmann wrote: > So, my comment here is that any code *cannot* be removed from an Apache > project if there is a VETO issued which so far I haven't seen, though maybe > Marcelo can clarify that. No, my intention was not to veto the change. I'm actually for t