Re: Question on providing CDH packages

2014-08-27 Thread Robert Metzger
Just for the record, I was able to build a Flink version that is compatible with CDH4 by using the official Hadoop 2.0.0-alpha release. So with the next release (0.6.1-incubating or 0.7-incubating) we can ship an additional "hadoop200alpha" binary package that works for users using a distro which is

Re: Question on providing CDH packages

2014-08-19 Thread Alan Gates
No objections. That seems like a good way to help our users while avoiding the appearance of favoring one vendor over another. Alan. On Mon, Aug 18, 2014 at 10:43 AM, Stephan Ewen wrote: > I like Sean's idea very much: Creating the three packages (Hadoop 1.x, > Hadoop 2.x, Hadoop 2.0 with Yar

Re: Question on providing CDH packages

2014-08-18 Thread Robert Metzger
Supporting the Hadoop 2.0 (not 2.2) YARN API would be a lot of coding effort. There was a huge API change between the two versions. Maybe we can find a technical solution to this political/legal problem: I'm going to build and try a Flink version against the "2.1.1-beta" (or similar) (official Apac

Re: Question on providing CDH packages

2014-08-18 Thread Stephan Ewen
I like Sean's idea very much: Creating the three packages (Hadoop 1.x, Hadoop 2.x, Hadoop 2.0 with Yarn beta). Any objections to creating a help site that says "For that vendor with this version pick the following binary release" ? Stephan > >> On Mon, Aug 18, 2014 at 5:58 PM, Henry Saputra >

Re: Question on providing CDH packages

2014-08-18 Thread Henry Saputra
As for Flink, for now the additional CDH4 packaged binary is to support a "non-standard" Hadoop version that some customers may already have. Based on "not a question of supporting a vendor but a Hadoop version combo.", would the approach that Flink has taken to help customers get up and running quic

Re: Question on providing CDH packages

2014-08-18 Thread Sean Owen
It's probably the same thing as with Spark. Spark doesn't actually work with YARN 'beta'-era releases, but works with 'stable' and specially supports 'alpha'. CDH 4.{2-4} or so == YARN 'beta' (not non-standard, but it is probably the only distro of it you'll still run into in circulation). (And so it's ki

Re: Question on providing CDH packages

2014-08-18 Thread Stephan Ewen
I think the main problem was that CDH4 is a non-standard build. All others we tried worked with hadoop-1.2 and 2.2/2.4 builds. But I understand your points. So, instead of creating those packages, we can make a guide "how to pick the right distribution", which points you to the hadoop-1.2 and 2.

Re: Question on providing CDH packages

2014-08-18 Thread Sean Owen
Vendor X may be slightly against having two Flink-for-X distributions -- their own and another on a site/project they may not control. Are all these builds really needed? Meaning, does a generic Hadoop 2.x build not work on some or most of these? I'd hope so. Might keep things simpler for everyone

Re: Question on providing CDH packages

2014-08-18 Thread Alan Gates
My concern with this is it appears to put Apache in the business of picking the right Hadoop vendors. What about IBM, Pivotal, etc.? I get that the actual desire here is to make things easy for users, and that the original three packages offered (Hadoop1, CDH4, Hadoop2) will cover 95% of user

Re: Question on providing CDH packages

2014-08-18 Thread Stephan Ewen
The approach seems fair in that it presents all vendors equally and still offers users a convenient way to get started. I personally like it, but I cannot say to what extent it complies with Apache policies.

Re: Question on providing CDH packages

2014-08-18 Thread Robert Metzger
Hi, I think we all agree that our project benefits from providing pre-compiled binaries for different Hadoop distributions. I've drafted an extension of the current download page that I suggest we use after the release: http://i.imgur.com/MucW2HD.png As you can see, users can directly pick

Re: Question on providing CDH packages

2014-08-15 Thread Henry Saputra
Ah sorry Alan, did not see your reply to Owen. Mea culpa from me. - Henry On Fri, Aug 15, 2014 at 2:15 PM, Alan Gates wrote: > Sorry, apparently this was unclear, as others asked the same question. > Flink hasn't had any Apache releases yet. I was referring to the proposed > release that Rob

Re: Question on providing CDH packages

2014-08-15 Thread Henry Saputra
Hi Sean, I don't think Flink has done a release yet. We are working through several RCs to get one that is good enough to be voted on. - Henry On Fri, Aug 15, 2014 at 11:26 AM, Sean Owen wrote: > PS, sorry for being dense, but I don't see vendor packages at > http://flink.incubator.apache.or

Re: Question on providing CDH packages

2014-08-15 Thread Henry Saputra
Agree with Robert, the ASF only releases source code. So the binary packages are just a convenience from Flink that target specific Hadoop vendors. If you look at the Apache Spark download page [1], they also do the same thing by providing distro-specific binaries. AFAIK this should NOT be a problem and

Re: Question on providing CDH packages

2014-08-15 Thread Alan Gates
Sorry, apparently this was unclear, as others asked the same question. Flink hasn't had any Apache releases yet. I was referring to the proposed release that Robert sent out, http://people.apache.org/~rmetzger/flink-0.6-incubating-rc7/ Alan. Sean Owen August 15, 2

Re: Question on providing CDH packages

2014-08-15 Thread Alan Gates
+1 to not holding the release on this. Since the release is only the source*, if we later decide that CDH-specific packages are ok we can add them in with no extra votes, etc. Alan. *Apache releases only source code. This is so that users, distributors, etc. can verify the integrity etc.

Re: Question on providing CDH packages

2014-08-15 Thread Robert Metzger
Hi, I'm glad you've brought this topic up. (Thank you also for checking the release!) I've used Spark's release script as a reference for creating ours (why reinvent the wheel? They have excellent infrastructure), and they had a CDH4 profile, so I thought it's okay for Apache projects to have t

Re: Question on providing CDH packages

2014-08-15 Thread Sean Owen
PS, sorry for being dense, but I don't see vendor packages at http://flink.incubator.apache.org/downloads.html ? Is it this page? http://flink.incubator.apache.org/docs/0.6-SNAPSHOT/building.html That's more benign, just helping people rebuild for certain distros if desired. Can the example be ge

Re: Question on providing CDH packages

2014-08-15 Thread Owen O'Malley
As a mentor, I agree that vendor-specific packages aren't appropriate for the Apache site. (Disclosure: I work at Hortonworks.) Working with the vendors to make packages available is great, but they shouldn't be hosted at Apache. .. Owen On Fri, Aug 15, 2014 at 10:32 AM, Sean Owen wrote: > I h

Re: Question on providing CDH packages

2014-08-15 Thread Sean Owen
I hope not surprisingly, I agree. (Backstory: I am at Cloudera.) I have for example lobbied Spark to remove CDH-specific releases and build profiles. Not just for this reason, but because it is often unnecessary to have vendor-specific builds, and also just increases maintenance overhead for the pr

Question on providing CDH packages

2014-08-15 Thread Alan Gates
Let me begin by noting that I obviously have a conflict of interest since my company is a direct competitor to Cloudera. But as a mentor and Apache member I believe I need to bring this up. What is the Apache policy towards having a vendor-specific package on a download site? It is strange t