Re: [DISCUSS] Proposed bylaws for Hadoop

2010-10-21 Thread Tom White
On Wed, Oct 20, 2010 at 3:15 PM, Chris Douglas cdoug...@apache.org wrote: On Wed, Oct 20, 2010 at 10:14 AM, Nigel Daley nda...@mac.com wrote: FWIW, PMC does not generally operate by majority but by consensus. was given as a rationale when explaining to an existing PMC member why it was ok to

bringing the codebases back in line

2010-10-21 Thread Ian Holsman
Hi guys. I wanted to start a conversation about how we could merge the cloudera + yahoo distributions of hadoop into our codebase, and what would be required.

Re: bringing the codebases back in line

2010-10-21 Thread Allen Wittenauer
On Oct 21, 2010, at 12:13 PM, Ian Holsman wrote: Hi guys. I wanted to start a conversation about how we could merge the cloudera + yahoo distributions of hadoop into our codebase, and what would be required. *grabs popcorn*

Re: bringing the codebases back in line

2010-10-21 Thread Ian Holsman
so what do you think is required to get them into a release? On Thu, Oct 21, 2010 at 4:00 PM, Owen O'Malley omal...@apache.org wrote: On Oct 21, 2010, at 12:13 PM, Ian Holsman wrote: I wanted to start a conversation about how we could merge the cloudera + yahoo distributions of

Re: bringing the codebases back in line

2010-10-21 Thread Owen O'Malley
On Oct 21, 2010, at 2:00 PM, Ian Holsman wrote: so what do you think is required to get them into a release? I'd planned to start making a release next month. -- Owen

Re: bringing the codebases back in line

2010-10-21 Thread Ian Holsman
yep.. I've heard it's a source of contention... but I'd like to see how we can get it so the number of patches that the large companies apply on top of the current production apache release gets minimized, and the large installations are all running nearly identical code on their clusters, and

Re: bringing the codebases back in line

2010-10-21 Thread Owen O'Malley
On Oct 21, 2010, at 2:54 PM, Eli Collins wrote: It's worth double checking. When we added the YDH patch set to CDH3 we ran a script to see which patches were in YDH but not yet in trunk and it turned up around 100 or so patches. If you could generate a list, that would be useful for tracking

Re: bringing the codebases back in line

2010-10-21 Thread Owen O'Malley
On Oct 21, 2010, at 3:19 PM, Doug Cutting wrote: Cloudera's distribution is based on Y!'s 0.20 distribution, together with patches from the Apache 0.20-append branch, Cloudera's Distribution of Hadoop includes many tools from outside of Hadoop and even outside of Apache. -- Owen

Re: bringing the codebases back in line

2010-10-21 Thread Allen Wittenauer
On Oct 21, 2010, at 2:53 PM, Ian Holsman wrote: yep.. I've heard it's a source of contention... Sure. Maybe like 8 months ago to anyone who was paying attention. In discussing it with people, I've heard that a major issue (not the only one i'm sure) is lack of resources to actually

Re: bringing the codebases back in line

2010-10-21 Thread Ian Holsman
right.. Cloudera is bundling its add-ons into a single tarball to make it easier to install. but my main bone of contention here is not in the bundling, but that in order for those tools to work, they need to make changes to the base hadoop package. In my ideal world, I'd like to be able to

Re: bringing the codebases back in line

2010-10-21 Thread Konstantin Boudnik
On Thu, Oct 21, 2010 at 05:53PM, Ian Holsman wrote: In discussing it with people, I've heard that a major issue (not the only one i'm sure) is lack of resources to actually test the apache releases on large clusters, and that it is very hard getting this done in short cycles (hence the large

Re: bringing the codebases back in line

2010-10-21 Thread Eli Collins
On Thu, Oct 21, 2010 at 3:30 PM, Jakob Homan jho...@yahoo-inc.com wrote: It's worth double checking.  When we added the YDH patch set to CDH3 we ran a script to see which patches were in YDH but not yet in trunk and it turned up around 100 or so patches. If the patch was just checking 1:1

Re: bringing the codebases back in line

2010-10-21 Thread Arun C Murthy
Ian, On Oct 21, 2010, at 4:50 PM, Ian Holsman wrote: but the other question I have which hopefully you guys can answer is does the yahoo distribution have ALL the patches from the trunk on it? because if it doesn't I think that is problematic as well for other reasons. Yahoo put security

Re: bringing the codebases back in line

2010-10-21 Thread Arun C Murthy
On Oct 21, 2010, at 5:17 PM, Eli Collins wrote: On Thu, Oct 21, 2010 at 3:30 PM, Jakob Homan jho...@yahoo-inc.com wrote: If the patch was just checking 1:1 Jira to patch, it would certainly not work. We were uploading multiple patches to the same JIRA to avoid opening extraneous issues
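
A rough sketch of the kind of comparison discussed in this thread (which patches are in a distribution branch but not yet in trunk), assuming each commit message mentions its JIRA key. The actual script that was run is not shown anywhere in the thread; the function names, the key pattern, and the sample messages here are all made up for illustration.

```python
import re

# Assumed issue-key pattern; the project prefixes are illustrative only.
JIRA_KEY = re.compile(r"\b(?:HADOOP|HDFS|MAPREDUCE)-\d+\b")


def jira_keys(commit_messages):
    """Collect every JIRA key mentioned in a list of commit messages."""
    keys = set()
    for message in commit_messages:
        keys.update(JIRA_KEY.findall(message))
    return keys


def missing_from_trunk(branch_messages, trunk_messages):
    """Return keys that appear in the branch log but not in the trunk log."""
    missing = jira_keys(branch_messages) - jira_keys(trunk_messages)
    # Sort by project name, then numerically by issue number.
    return sorted(missing, key=lambda k: (k.split("-")[0], int(k.split("-")[1])))


# Hypothetical commit messages, not real issues from either codebase.
branch_log = [
    "HADOOP-1. Example change",
    "HDFS-2. Example change, part 1",
    "HDFS-2. Example change, part 2",
    "MAPREDUCE-3. Example fix",
]
trunk_log = ["HDFS-2. Example change, part 1"]

print(missing_from_trunk(branch_log, trunk_log))
```

Note the limitation raised above: because the comparison works on JIRA keys, the second HDFS-2 patch in the sample branch log is not reported as missing even though trunk only has part 1. When several patches are attached to one JIRA, a key-level check can undercount, so comparing at patch level would be needed for an accurate list.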

Re: Limiting concurrent maps

2010-10-21 Thread Arun C Murthy
On Oct 21, 2010, at 5:30 PM, Michael Moores wrote: I don't see how the capacity scheduler could limit the number of maps running concurrently across the whole cluster, even if this is the only job running. Easy, set a maximum limit on the queue. Arun

Re: bringing the codebases back in line

2010-10-21 Thread Eli Collins
On Thu, Oct 21, 2010 at 4:50 PM, Ian Holsman had...@holsman.net wrote: right.. Cloudera is bundling its add-ons into a single tarball to make it easier to install. CDH contains a number of different projects, however each project has a distinct tarball (and packages). The tarball is

Re: bringing the codebases back in line

2010-10-21 Thread Milind A Bhandarkar
but the other question I have which hopefully you guys can answer is does the yahoo distribution have ALL the patches from the trunk on it? because if it doesn't I think that is problematic as well for other reasons. What are these other reasons ? yahoo distribution runs on our production

Re: bringing the codebases back in line

2010-10-21 Thread Ian Holsman
On Thu, Oct 21, 2010 at 8:42 PM, Milind A Bhandarkar mili...@yahoo-inc.com wrote: but the other question I have which hopefully you guys can answer is does the yahoo distribution have ALL the patches from the trunk on it? because if it doesn't I think that is problematic as well for

Re: bringing the codebases back in line

2010-10-21 Thread Arun C Murthy
I was merely pointing out, given the number of interested parties on that jira, that having Hadoop RPMs for Linux is very desirable. We could have a technical discussion on the ways to go about doing RPMs, debs etc., but it is clear that there is a need for something more than tgz

Re: bringing the codebases back in line

2010-10-21 Thread Milind A Bhandarkar
right.. the trunk is not for production use. I wasn't suggesting that. So, what are you suggesting ? That Yahoo distribution of Hadoop should *not* be the version we run on our production clusters ? but the trunk is what will eventually become the next release. Then someone in yahoo