Re: Defining Hadoop Compatibility -revisiting-

2011-05-16 Thread Steve Loughran
On 13/05/11 05:52, Milind Bhandarkar wrote: Ok, my mistake. They have only asked for documented specifications. I may have been influenced by all the specifications I have read. All of them were in English, which is characterized as a natural language. But then, if you are proposing a

Re: Defining Hadoop Compatibility -revisiting-

2011-05-16 Thread Steve Loughran
On 13/05/11 23:57, Allen Wittenauer wrote: On May 13, 2011, at 3:53 PM, Ted Dunning wrote: But "distribution Z includes X" kind of implies the existence of some Y such that X != Y, Y != empty-set and X+Y = Z, at least in common usage. Isn't that the same as a non-trunk change? So doesn't this

Re: Defining Hadoop Compatibility -revisiting-

2011-05-16 Thread Steve Loughran
On 13/05/11 23:16, Doug Cutting wrote: On 05/14/2011 12:13 AM, Allen Wittenauer wrote: So what do we do about companies that release a product that says "includes Apache Hadoop" but includes patches that aren't committed to trunk? We yell at them to get those patches into trunk already. This

Re: Defining Hadoop Compatibility -revisiting-

2011-05-16 Thread Steve Loughran
On 13/05/11 07:16, Doug Cutting wrote: Certification seems like mission creep. Our mission is to produce open-source software. If we wish to produce testing software, that seems fine. But running a certification program for non-open-source software seems like a different task. +1 That

Re: Defining Hadoop Compatibility -revisiting-

2011-05-16 Thread Segel, Mike
But Cloudera's release is a bit murky. The math example is a bit flawed... X represents the set of stable releases. Y represents the set of available patches. C represents the set of Cloudera releases. So if C contains a release X(n) plus a set of patches that is contained in Y, then does it

MAPREDUCE-5

2011-05-16 Thread Evert Lammerts
When Reducers start running during a certain job (mapred.reduce.slowstart.completed.maps = 0.8) it takes about 20 minutes before the DN stops responding. This seems to be due to a number of Exceptions in the TT - at least, it's the only place I'm seeing errors. The three recurring ones are
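For context on the setting referenced in this report: mapred.reduce.slowstart.completed.maps controls what fraction of a job's map tasks must complete before reducers are scheduled (the Hadoop default is 0.05). A minimal mapred-site.xml fragment matching the 0.8 value described above (a sketch, not the poster's actual config) would be:

```xml
<!-- mapred-site.xml: delay reduce scheduling until 80% of maps finish -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.80</value>
</property>
```

Raising this value delays shuffle traffic, so reducers starting all at once late in the job can coincide with heavy I/O, which may be relevant to the TaskTracker exceptions described.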

Acceptance tests

2011-05-16 Thread Evert Lammerts
Hi all, What acceptance tests are people using when buying clusters for Hadoop? Any pointers to relevant methods? Thanks, Evert Lammerts

Re: Acceptance tests

2011-05-16 Thread Allen Wittenauer
On May 16, 2011, at 11:03 AM, Evert Lammerts wrote: Hi all, What acceptance tests are people using when buying clusters for Hadoop? Any pointers to relevant methods? We get some test nodes from various manufacturers. We do some raw IO benchmarking vs. our other nodes. We add
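One common way to do the raw I/O benchmarking mentioned in this reply is Hadoop's bundled TestDFSIO job. The sketch below is illustrative only; the exact test jar name varies by release, and file counts/sizes are placeholder values:

```shell
# Write then read 10 files of 1000 MB each across the candidate nodes
hadoop jar hadoop-test.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
hadoop jar hadoop-test.jar TestDFSIO -read  -nrFiles 10 -fileSize 1000
# Clean up the benchmark's output directory in HDFS
hadoop jar hadoop-test.jar TestDFSIO -clean
```

Comparing the reported throughput and average I/O rate against numbers from existing nodes gives a simple acceptance baseline.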

Re: Defining Hadoop Compatibility -revisiting-

2011-05-16 Thread Eli Collins
On Mon, May 16, 2011 at 10:19 AM, Allen Wittenauer a...@apache.org wrote: On May 16, 2011, at 5:00 AM, Segel, Mike wrote: X represents the set of stable releases. Y represents the set of available patches. C represents the set of Cloudera releases. So if C contains a release X(n) plus a set

Re: Defining Hadoop Compatibility -revisiting-

2011-05-16 Thread Allen Wittenauer
On May 16, 2011, at 2:09 PM, Eli Collins wrote: Allen, There are few things in Hadoop in CDH that are not in trunk, branch-20-security, or branch-20-append. The stuff in this category is not major (eg HADOOP-6605, better JAVA_HOME detection). But that's my point: when is it no

Re: Defining Hadoop Compatibility -revisiting-

2011-05-16 Thread Ian Holsman
Does Hadoop compatibility and the ability to say "includes Apache Hadoop" only apply when we're talking about MR and HDFS APIs? It is confusing, isn't it? We could go down the route Java did and say that the APIs are 'hadoop' and ours is just a reference implementation of it. (but

Re: Apache Hadoop Hackathon: 5/18 in Palo Alto and San Francisco

2011-05-16 Thread Jeff Hammerbacher
Hey, We've got a great group coming together again on Wednesday for an Apache Hadoop Hackathon in Palo Alto and San Francisco. Sign up at http://hadoophackathon.eventbrite.com. As a reminder, we'll have Nigel Daley, the release manager for 0.22, present in Palo Alto. If you have build and

Re: Apache Hadoop Hackathon: 5/18 in Palo Alto and San Francisco

2011-05-16 Thread Joe Stein
Any chance for something in the east (NYC), or do I need to start nagging the wife and kids that west coast weather is the way to go? I will post on the NYC HUG; maybe we can get a hack-together to contribute to Hadoop, perhaps some evening/day that a few committers on this list that are in NYC

Re: Defining Hadoop Compatibility -revisiting-

2011-05-16 Thread Scott Carey
On trademarks, what about the phrase: "New distribution for Apache Hadoop"? I've seen that used, and it's something that replaces most of the stack. I believe Apache Hadoop is trademarked in this context, even if Hadoop alone isn't. "Compatible with Apache Hadoop" is a smaller issue, defining some

Re: Defining Hadoop Compatibility -revisiting-

2011-05-16 Thread Konstantin Boudnik
We have the following method coverage: Common ~60%, HDFS ~80%, MR ~70% (better analysis will be available after our projects are connected to Sonar, I think). While method coverage isn't a completely adequate answer to your question, I'd say there is a possibility to sneak in some

Re: Defining Hadoop Compatibility -revisiting-

2011-05-16 Thread Eric Baldeschwieler
My understanding is that a history of defending your trademark is more important than registration. Apache does defend Hadoop. --- E14 - typing on glass On May 16, 2011, at 6:52 PM, Segel, Mike mse...@navteq.com wrote: Let me clarify... I searched on Hadoop as a term in any TM. Nothing

Re: Defining Hadoop Compatibility -revisiting-

2011-05-16 Thread Andrew Purtell
On trademarks, what about the phrase: "New distribution for Apache Hadoop"? I've seen that used, and it's something that replaces most of the stack. [...] A proprietary derivative work with most of the guts replaced is not an Apache Hadoop distribution, nor a distribution for Apache Hadoop.