I want to thank Yahoo! for this release. At eBay we are very excited about the 
opportunity to test a build of Hadoop that has already been extensively 
field-tested on large clusters. We are primarily concerned with cluster 
availability and throughput, so having a build like this available to the 
community is a huge win.

Hats off to Arun, Eric and everyone at Yahoo! for releasing this.

Steve

-----Original Message-----
From: Eric Baldeschwieler [mailto:eri...@yahoo-inc.com] 
Sent: Friday, January 14, 2011 10:25 AM
To: general@hadoop.apache.org
Cc: general@hadoop.apache.org
Subject: Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

Hi Ian,

Thanks for holding off on that last .5. I've been working on a big email giving 
more context on this. Let me preview some issues. 

Our goal with this branch is twofold: 1) get the code out in a branch quickly 
so we can collaborate on it with the community. 2) not change the character of 
the code. See testing below. We're happy to compromise on any other dimension, 
as long as we can do 1 & 2 above. 

1) I agree this is not a good precedent. We don't support mega-patches in 
general. We are doing this as part of discontinuing the "Yahoo! distribution of 
Hadoop". We don't plan to continue doing 30 person-year projects outside 
Apache and then merging them in!

2) Append is hard. It is so hard we rewrote the entire write pipeline (5 
person-years of work) in trunk after giving up on the codeline you are suggesting 
we merge in. That work is what distinguishes all post-20 releases from 20 
releases in my mind. I don't trust the 20 append codeline. We've been hurt 
badly by it. We did the rewrite only after losing a bunch of production data a 
bunch of times with the previous codeline. I think the various 20 append 
patch lines may be fine for specialized HBase clusters, but they don't have 
the rigor behind them to bet your business on them.

3) I think having a very stable recent codeline available for teams coming into 
Hadoop who want to run big business apps and contribute code back is very 
helpful. I've been talking to folks in other orgs and they've expressed a huge 
amount of interest in this work, but begged us to put it into apache, so their 
oversight bodies will let them use it. 

4) We're happy to incorporate ideas on how to best merge the work into trunk. 
Let's find the most cost-effective way to preserve the most development data 
possible. 

5) Testing. Ian, I think you do us a disservice when you talk about us just 
testing in our environments. If you look at the history of the project, we've 
been the force behind every stable release of Apache Hadoop. And all the 
non-Apache Hadoop releases have been tracking this patch set. We fully support 
the community developing independent testing capabilities. We plan to 
contribute to that effort. But we are the organization with far and away the 
best record for testing Hadoop. 

We are proud of this release, and we want to share it. Help us sort out how. 

Thanks!

---
E14 - via iPhone

On Jan 14, 2011, at 6:15 AM, "Ian Holsman" <had...@holsman.net> wrote:

> (with my Apache hat on)
> I'm -0.5 on doing this as one big mega-patch and not including append (as 
> opposed to a series of smaller patches).
> 
> for the following reasons:
> 
> 1. It encourages bad behavior. We want discussion (and development) to happen 
> on the lists, not in some office. Allowing these large code-dumps condones 
> this behavior, and we will likely see it again and again. Like it or not, this 
> is not the Apache model of open source governance. 
> 
> 2. There is a risk that some code that is not in a JIRA or separate patch 
> creeps in unwittingly. This isn't a major deal per se, but we don't really 
> have the proper paper trail, or documentation of which bug each change fixed.
> 
> 3. Other groups (Facebook for example) are running with their own set of 
> patches. They currently have the luxury of examining each individual patch to 
> decide if they want to integrate it (and test it) in their environment. We 
> are forcing them to do the work of finding the bits they want in this huge 
> patch.
> 
> 4. By not including the append patch, we are making this release unusable for 
> a large portion of our community who run HBase.
> 
> 5. It makes it very hard to test. While it makes me comfortable that it has 
> gone through Yahoo!'s QA and is running in their environments, that doesn't 
> mean it will work for other organizations with different workload mixes and 
> software running on their clusters. One huge patch makes it all or nothing: 
> either they take the code-drop and perform a large QA/integration effort, or 
> they forgo the whole patch altogether.
> 
> 
> **BUT** we have both the Yahoo! & Cloudera guys happy to do it, and to spend 
> their time doing it.. so I think having the code-drop will put us in a better 
> place than where we are.
> 
> 
> BTW, I'd like to point out a discrepancy here:
> 
> On another thread discussing hadoop-0.20-append as a separate branch, most 
> people agreed that new features shouldn't be added to 0.20; now we have a 
> major feature and we are all gung ho for it.. 
> 
> --Ian
> 
> On Jan 14, 2011, at 2:21 AM, Arun C Murthy wrote:
> 
>> 
>> On Jan 13, 2011, at 10:59 PM, Stack wrote:
>> 
>>> (Man, it was looking good there for a second when 0.20.100 was about
>>> security+append!)
>>> 
>>> Good luck w/ the release Arun.
>>> 
>> 
>> Thanks!
>> 
>>> We might be following your 0.20.100 with a 0.20.200 append.
>>> 
>> 
>> Super!
>> 
>> Arun
> 
