Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Konstantin Boudnik
The way it was done in the JCK was a spec written in a somewhat formalized language, plus a tool (called testgen, written in Perl if I remember correctly) that dynamically generated a lot of language tests. I think this is the middle ground Milind has mentioned. BTW, it was a _huge_ effort: Sun had

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Doug Cutting
Certification seems like mission creep. Our mission is to produce open-source software. If we wish to produce testing software, that seems fine. But running a certification program for non-open-source software seems like a different task. The Hadoop mark should only be used to refer to

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Konstantin Boudnik
On Thu, May 12, 2011 at 20:40, Milind Bhandarkar mbhandar...@linkedin.com wrote: Cos, Can you give me an example of a system test that is not a functional test ? My assumption was that the functionality being tested is specific to a component, and that inter-component interactions (that's

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Milind Bhandarkar
Sure. As I said before, they are not mutually exclusive. Just stating my experience that specs without a test suite are of no use. If I were to prioritize, I would give priority to a TCK over natural-language specs. That's all. So far, I have seen many replacements for HDFS as InputFormat and

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Milind Bhandarkar
Cos, I remember the issues about the inter-component interactions at that point when you were part of the Yahoo Hadoop FIT team (I was on the other side of the same floor, remember? ;-) Things like: Can Pig take full URIs as input, and so work with viewfs? Can a local JobTracker still use HDFS

RE: Stability issue - dead DN's

2011-05-13 Thread Evert Lammerts
Hi Mike, You really really don't want to do this. Long story short... It won't work. Can you elaborate? Are you talking about the bonded interfaces or about having a separated network for interconnects and external network? What can go wrong there? Just a suggestion.. You don't want

Re: Stability issue - dead DN's

2011-05-13 Thread Segel, Mike
Bonded will work, but you may not see the performance you would expect. If you need more than 1 GbE, go 10GbE: less headache and even more headroom. Multiple interfaces won't work. Or I should say, didn't work in past releases. If you think about it, clients have to connect to each node. So having two
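For reference, the bonding Mike describes is set up at the OS level, below Hadoop. A minimal Debian-style sketch (interface names, addresses, and the LACP mode are illustrative assumptions; requires the ifenslave package and a switch configured for 802.3ad) might look like:

```
# /etc/network/interfaces fragment: bond two NICs with LACP (802.3ad).
# All names and addresses here are made up for illustration.
auto bond0
iface bond0 inet static
    address 10.0.0.21
    netmask 255.255.255.0
    bond-slaves eth0 eth1
    bond-mode 802.3ad
    bond-miimon 100
    bond-xmit-hash-policy layer3+4
```

Note that a single TCP flow still traverses only one slave link, which is one reason bonded throughput often disappoints relative to the aggregate line rate, consistent with Mike's caveat.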

What is Hadoop? Was: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Owen O'Malley
On Tue, May 10, 2011 at 3:29 AM, Steve Loughran ste...@apache.org wrote: I think we should revisit this issue before people with their own agendas define what compatibility with Apache Hadoop is for us I agree completely. As you point out, this week we've had a flood of products calling

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Nathan Roberts
Key seems to be how one would interpret "version". Replace it with a synonym like "variant" and this may be the intent. On 5/13/11 9:50 AM, Doug Cutting cutt...@gmail.com wrote: Yes, but there's an "is" earlier in the sentence. Doug On May 13, 2011 3:44 PM, Ted Dunning tdunn...@maprtech.com

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Allen Wittenauer
On May 13, 2011, at 1:53 AM, Doug Cutting wrote: Here "certified" is probably just intended to mean that the software uses a certified open source license, e.g., listed at http://www.opensource.org/licenses/. However they should say that this "includes" or "contains" the various Apache products,

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Konstantin Boudnik
On Fri, May 13, 2011 at 00:11, Milind Bhandarkar mbhandar...@linkedin.com wrote: Cos, I remember the issues about the inter-component interactions at that point when you were part of the Yahoo Hadoop FIT team (I was on the other side of the same floor, remember ? ;-) Vaguely ;) Of course I

RE: Stability issue - dead DN's

2011-05-13 Thread Evert Lammerts
Hi Mike, Thanks for trying to help out. I had a talk with our networking guys this afternoon. According to them (and this is way out of my area of expertise, so excuse any mistakes) multiple interfaces shouldn't be a problem. We could set up a nameserver to resolve hostnames to addresses in
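A sketch of what the nameserver approach Evert describes could look like in hdfs-site.xml (property names are the 0.20-era ones; the interface name and nameserver host are made-up assumptions):

```xml
<!-- hdfs-site.xml fragment. Makes each DataNode report the hostname
     seen on the cluster-interconnect NIC, resolved via an internal
     nameserver. Values below are illustrative only. -->
<property>
  <name>dfs.datanode.dns.interface</name>
  <value>eth1</value> <!-- assumed interconnect NIC -->
</property>
<property>
  <name>dfs.datanode.dns.nameserver</name>
  <value>ns1.cluster.internal</value> <!-- hypothetical internal DNS -->
</property>
```

Whether this sidesteps the client-connectivity problem Mike raises depends on which network the clients themselves can reach the resolved addresses from.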

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Doug Cutting
On 05/13/2011 07:28 PM, Allen Wittenauer wrote: If it has a modified version of Hadoop (i.e., not an actual Apache release, or patches which have never been committed to trunk), are they allowed to say "includes Apache Hadoop"? No. Those are the two cases we permit. We used to say that it was

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Allen Wittenauer
On May 13, 2011, at 2:55 PM, Doug Cutting wrote: On 05/13/2011 07:28 PM, Allen Wittenauer wrote: If it has a modified version of Hadoop (i.e., not an actual Apache release or patches which have never been committed to trunk), are they allowed to say includes Apache Hadoop? No. Those are

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Doug Cutting
On 05/14/2011 12:13 AM, Allen Wittenauer wrote: So what do we do about companies that release a product that says "includes Apache Hadoop" but includes patches that aren't committed to trunk? We yell at them to get those patches into trunk already. This policy was clarified after that product

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Allen Wittenauer
On May 13, 2011, at 3:16 PM, Doug Cutting wrote: On 05/14/2011 12:13 AM, Allen Wittenauer wrote: So what do we do about companies that release a product that says includes Apache Hadoop but includes patches that aren't committed to trunk? We yell at them to get those patches into trunk

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Doug Cutting
On 05/14/2011 12:17 AM, Allen Wittenauer wrote: ... and if those patches are rejected by the community? It would be very strange, since they've mostly been released in 0.20.203, although not yet having been committed to trunk. Doug

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Roy T. Fielding
On May 13, 2011, at 2:55 PM, Doug Cutting wrote: On 05/13/2011 07:28 PM, Allen Wittenauer wrote: If it has a modified version of Hadoop (i.e., not an actual Apache release or patches which have never been committed to trunk), are they allowed to say includes Apache Hadoop? No. Those are

Re: Stability issue - dead DN's

2011-05-13 Thread Segel, Mike
Ok... Hum, look, I've been force fed a couple of margaritas so, my memory is a bit foggy... You say your clients connect on nic A. Your cluster connects on nic B. What happens when you want to upload a file from your client to HDFS? Or even access it? ... ;-) Sent from a remote device.

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Ted Dunning
But "distribution Z includes X" kind of implies the existence of some Y such that X != Y, Y != empty-set, and X + Y = Z, at least in common usage. Isn't that the same as a non-trunk change? So doesn't this mean that your question reduces to the question of what happens when non-Apache changes are made

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Allen Wittenauer
On May 13, 2011, at 3:53 PM, Ted Dunning wrote: But "distribution Z includes X" kind of implies the existence of some Y such that X != Y, Y != empty-set, and X + Y = Z, at least in common usage. Isn't that the same as a non-trunk change? So doesn't this mean that your question reduces to the

Re: What is Hadoop? Was: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Ian Holsman
On May 14, 2011, at 12:41 AM, Owen O'Malley wrote: On Tue, May 10, 2011 at 3:29 AM, Steve Loughran ste...@apache.org wrote: I think we should revisit this issue before people with their own agendas define what compatibility with Apache Hadoop is for us I agree completely. As you point