Re: Defining Hadoop Compatibility -revisiting-

2011-05-31 Thread Owen O'Malley
On May 24, 2011, at 9:23 AM, Steve Loughran wrote: I've drafted a policy on the wiki based on this discussion. http://wiki.apache.org/hadoop/Defining%20Hadoop Others need to look at, edit, etc., then we can vote on whether to take it into the managed documentation. I think it looks

Re: Defining Hadoop Compatibility -revisiting-

2011-05-24 Thread Steve Loughran
I've drafted a policy on the wiki based on this discussion. http://wiki.apache.org/hadoop/Defining%20Hadoop Others need to look at, edit, etc., then we can vote on whether to take it into the managed documentation.

Re: Defining Hadoop Compatibility -revisiting-

2011-05-23 Thread Sanjay Radia
Agree. On May 12, 2011, at 11:16 PM, Doug Cutting wrote: Certification seems like mission creep. Our mission is to produce open-source software. If we wish to produce testing software, that seems fine. But running a certification program for non-open-source software seems like a different

Re: Defining Hadoop Compatibility -revisiting-

2011-05-18 Thread Doug Cutting
On 05/17/2011 07:53 PM, Matthew Foley wrote: "And this statement of permission in the publicly available FAQ constitutes a license, so it is imprecise to say that ASF doesn't license its trademarks. :-)" That's not the way I interpret it. I believe that a license would be required to permit a

RE: Defining Hadoop Compatibility -revisiting-

2011-05-17 Thread Segel, Mike
TESS only has registered trademarks -- that's the kind of trademark you put an (R) next to. But you can have an ordinary unregistered trademark -- the kind you put a tm next to -- just

Re: Defining Hadoop Compatibility -revisiting-

2011-05-17 Thread Doug Cutting
Matt, Have you read Apache's trademark policy page? http://www.apache.org/foundation/marks/ Apache does not generally license its trademarks. Constructions like "Acme Foo powered by Apache Bar" are generally permitted as they are not deemed to create confusion about the origin of Bar. Cheers,

Re: Defining Hadoop Compatibility -revisiting-

2011-05-16 Thread Steve Loughran
On 13/05/11 05:52, Milind Bhandarkar wrote: Ok, my mistake. They have only asked for documented specifications. I may have been influenced by all the specifications I have read. All of them were in English, which is characterized as a natural language. But then, if you are proposing a

Re: Defining Hadoop Compatibility -revisiting-

2011-05-16 Thread Steve Loughran
On 13/05/11 23:57, Allen Wittenauer wrote: On May 13, 2011, at 3:53 PM, Ted Dunning wrote: But "distribution Z includes X" kind of implies the existence of some Y such that X != Y, Y != empty-set and X+Y = Z, at least in common usage. Isn't that the same as a non-trunk change? So doesn't this

Re: Defining Hadoop Compatibility -revisiting-

2011-05-16 Thread Steve Loughran
On 13/05/11 23:16, Doug Cutting wrote: On 05/14/2011 12:13 AM, Allen Wittenauer wrote: So what do we do about companies that release a product that says "includes Apache Hadoop" but includes patches that aren't committed to trunk? We yell at them to get those patches into trunk already. This

Re: Defining Hadoop Compatibility -revisiting-

2011-05-16 Thread Steve Loughran
On 13/05/11 07:16, Doug Cutting wrote: Certification seems like mission creep. Our mission is to produce open-source software. If we wish to produce testing software, that seems fine. But running a certification program for non-open-source software seems like a different task. +1 That

Re: Defining Hadoop Compatibility -revisiting-

2011-05-16 Thread Segel, Mike
But Cloudera's release is a bit murky. The math example is a bit flawed... X represents the set of stable releases. Y represents the set of available patches. C represents the set of Cloudera releases. So if C contains a release X(n) plus a set of patches that is contained in Y, then does it
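Restated in the thread's own notation (variable names as Segel defines them; the answer quoted at the end is Doug Cutting's, from his 2011-05-13 replies further down this page):

  X = the set of stable Apache releases, with members X(n)
  Y = the set of available patches
  C = the set of vendor (here Cloudera) releases, with members C(m)

  Question: if C(m) = X(n) + P for some non-empty P contained in Y,
  may C(m) be called "Apache Hadoop", or only "includes Apache Hadoop"?

Per Doug's replies, the only permitted cases are an actual Apache release, or one whose added patches have been committed to trunk; anything else should not be labeled "includes Apache Hadoop".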

Re: Defining Hadoop Compatibility -revisiting-

2011-05-16 Thread Eli Collins
On Mon, May 16, 2011 at 10:19 AM, Allen Wittenauer a...@apache.org wrote: On May 16, 2011, at 5:00 AM, Segel, Mike wrote: X represents the set of stable releases. Y represents the set of available patches. C represents the set of Cloudera releases. So if C contains a release X(n) plus a set

Re: Defining Hadoop Compatibility -revisiting-

2011-05-16 Thread Allen Wittenauer
On May 16, 2011, at 2:09 PM, Eli Collins wrote: Allen, There are a few things in Hadoop in CDH that are not in trunk, branch-20-security, or branch-20-append. The stuff in this category is not major (e.g. HADOOP-6605, better JAVA_HOME detection). But that's my point: when is it no

Re: Defining Hadoop Compatibility -revisiting-

2011-05-16 Thread Ian Holsman
Does Hadoop compatibility and the ability to say "includes Apache Hadoop" only apply when we're talking about MR and HDFS APIs? It is confusing, isn't it? We could go down the route Java did and say that the APIs are 'hadoop' and ours is just a reference implementation of it. (but

Re: Defining Hadoop Compatibility -revisiting-

2011-05-16 Thread Scott Carey
On trademarks, what about the phrase "New distribution for Apache Hadoop"? I've seen that used, and it's something that replaces most of the stack. I believe "Apache Hadoop" is trademarked in this context, even if "Hadoop" alone isn't. "Compatible with Apache Hadoop" is a smaller issue, defining some

Re: Defining Hadoop Compatibility -revisiting-

2011-05-16 Thread Konstantin Boudnik
We have the following method coverage: Common ~60%, HDFS ~80%, MR ~70% (better analysis will be available after our projects are connected to Sonar, I think). While method coverage isn't a completely adequate answer to your question, I'd say there is a possibility to sneak in some

Re: Defining Hadoop Compatibility -revisiting-

2011-05-16 Thread Eric Baldeschwieler
My understanding is that a history of defending your trademark is more important than registration. Apache does defend Hadoop. --- E14 - typing on glass On May 16, 2011, at 6:52 PM, Segel, Mike mse...@navteq.com wrote: Let me clarify... I searched on Hadoop as a term in any TM. Nothing

Re: Defining Hadoop Compatibility -revisiting-

2011-05-16 Thread Andrew Purtell

Re: What is Hadoop? Was: Defining Hadoop Compatibility -revisiting-

2011-05-15 Thread Eric Baldeschwieler
Interesting point! I can see a future where there are many folks mixing and matching Hadoop and non-Hadoop components. Swapping out HDFS seems particularly popular. On May 13, 2011, at 4:17 PM, Ian Holsman wrote: ... I think that's a great idea. Maybe we should also create names/marks

Re: Defining Hadoop Compatibility -revisiting-

2011-05-15 Thread Eric Baldeschwieler
Good point. On May 12, 2011, at 11:16 PM, Doug Cutting wrote: Certification seems like mission creep. Our mission is to produce open-source software. If we wish to produce testing software, that seems fine. But running a certification program for non-open-source software seems like a

Re: Defining Hadoop Compatibility -revisiting-

2011-05-15 Thread Eric Baldeschwieler
Good point. Tests are a must for the Hadoop community to meet its own goals (quality and backwards compatibility). Writing detailed specs for something that is evolving this quickly is challenging. Also, in a lot of cases, documenting the current APIs to POSIX-like detail will mainly

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Konstantin Boudnik
The way it has been done in JCK was a spec written in a somewhat formalized language and a tool (called testgen, written in Perl if I remember correctly) that dynamically generated a lot of lang tests. I think this is a middle ground Milind has mentioned. BTW, it was a _huge_ effort: Sun had

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Doug Cutting
Certification seems like mission creep. Our mission is to produce open-source software. If we wish to produce testing software, that seems fine. But running a certification program for non-open-source software seems like a different task. The Hadoop mark should only be used to refer to

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Konstantin Boudnik
On Thu, May 12, 2011 at 20:40, Milind Bhandarkar mbhandar...@linkedin.com wrote: Cos, Can you give me an example of a system test that is not a functional test? My assumption was that the functionality being tested is specific to a component, and that inter-component interactions (that's

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Milind Bhandarkar
Sure. As I said before, they are not mutually exclusive. Just stating my experience that specs without a test suite are of no use. If I were to prioritize, I would give priority to a TCK over natural-language specs. That's all. So far, I have seen many replacements for HDFS as InputFormat and

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Milind Bhandarkar
Cos, I remember the issues about the inter-component interactions at that point when you were part of the Yahoo Hadoop FIT team (I was on the other side of the same floor, remember? ;-) Things like, Can Pig take full URIs as input, and so work with viewfs, Can Local jobtracker still use HDFS

What is Hadoop? Was: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Owen O'Malley
On Tue, May 10, 2011 at 3:29 AM, Steve Loughran ste...@apache.org wrote: I think we should revisit this issue before people with their own agendas define what compatibility with Apache Hadoop is for us. I agree completely. As you point out, this week we've had a flood of products calling

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Nathan Roberts
Key seems to be how one would interpret "version". Replace it with a synonym like "variant" and this may be the intent. On 5/13/11 9:50 AM, Doug Cutting cutt...@gmail.com wrote: Yes, but there's an "is" earlier in the sentence. Doug On May 13, 2011 3:44 PM, Ted Dunning tdunn...@maprtech.com

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Allen Wittenauer
On May 13, 2011, at 1:53 AM, Doug Cutting wrote: Here "certified" is probably just intended to mean that the software uses a certified open source license, e.g., listed at http://www.opensource.org/licenses/. However they should say that this includes or contains the various Apache products,

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Konstantin Boudnik
On Fri, May 13, 2011 at 00:11, Milind Bhandarkar mbhandar...@linkedin.com wrote: Cos, I remember the issues about the inter-component interactions at that point when you were part of the Yahoo Hadoop FIT team (I was on the other side of the same floor, remember? ;-) Vaguely ;) Of course I

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Doug Cutting
On 05/13/2011 07:28 PM, Allen Wittenauer wrote: If it has a modified version of Hadoop (i.e., not an actual Apache release or patches which have never been committed to trunk), are they allowed to say "includes Apache Hadoop"? No. Those are the two cases we permit. We used to say that it was

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Allen Wittenauer
On May 13, 2011, at 2:55 PM, Doug Cutting wrote: On 05/13/2011 07:28 PM, Allen Wittenauer wrote: If it has a modified version of Hadoop (i.e., not an actual Apache release or patches which have never been committed to trunk), are they allowed to say "includes Apache Hadoop"? No. Those are

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Doug Cutting
On 05/14/2011 12:13 AM, Allen Wittenauer wrote: So what do we do about companies that release a product that says "includes Apache Hadoop" but includes patches that aren't committed to trunk? We yell at them to get those patches into trunk already. This policy was clarified after that product

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Allen Wittenauer
On May 13, 2011, at 3:16 PM, Doug Cutting wrote: On 05/14/2011 12:13 AM, Allen Wittenauer wrote: So what do we do about companies that release a product that says "includes Apache Hadoop" but includes patches that aren't committed to trunk? We yell at them to get those patches into trunk

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Doug Cutting
On 05/14/2011 12:17 AM, Allen Wittenauer wrote: ... and if those patches are rejected by the community? It would be very strange, since they've mostly been released in 203, although not yet committed to trunk. Doug

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Roy T. Fielding
On May 13, 2011, at 2:55 PM, Doug Cutting wrote: On 05/13/2011 07:28 PM, Allen Wittenauer wrote: If it has a modified version of Hadoop (i.e., not an actual Apache release or patches which have never been committed to trunk), are they allowed to say "includes Apache Hadoop"? No. Those are

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Ted Dunning
But "distribution Z includes X" kind of implies the existence of some Y such that X != Y, Y != empty-set and X+Y = Z, at least in common usage. Isn't that the same as a non-trunk change? So doesn't this mean that your question reduces to the question of what happens when non-Apache changes are made

Re: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Allen Wittenauer
On May 13, 2011, at 3:53 PM, Ted Dunning wrote: But "distribution Z includes X" kind of implies the existence of some Y such that X != Y, Y != empty-set and X+Y = Z, at least in common usage. Isn't that the same as a non-trunk change? So doesn't this mean that your question reduces to the

Re: What is Hadoop? Was: Defining Hadoop Compatibility -revisiting-

2011-05-13 Thread Ian Holsman
On May 14, 2011, at 12:41 AM, Owen O'Malley wrote: On Tue, May 10, 2011 at 3:29 AM, Steve Loughran ste...@apache.org wrote: I think we should revisit this issue before people with their own agendas define what compatibility with Apache Hadoop is for us. I agree completely. As you point

Re: Defining Hadoop Compatibility -revisiting-

2011-05-12 Thread Steve Loughran
On 12/05/2011 03:26, M. C. Srivas wrote: While the HCK is a great idea to check quickly if an implementation is compliant, we still need a written specification to define what is meant by compliance, something akin to a set of RFCs, or a set of docs like the IEEE POSIX specifications. For

Re: Defining Hadoop Compatibility -revisiting-

2011-05-12 Thread Steve Loughran
On 12/05/2011 00:20, Aaron Kimball wrote: What does it mean to implement those interfaces? I'm +1 for a TCK-based definition. In addition to statically implementing a set of interfaces, each interface also implicitly includes a set of acceptable inputs and predicted outputs (or ranges of
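A minimal sketch of what one such interface-plus-behavior check could look like, written against the public FileSystem API. The class and test names here are hypothetical, not an Apache artifact (though Hadoop's own test tree already has something in this spirit, FileSystemContractBaseTest):

  // Hypothetical TCK-style contract test: pins down one documented behavior
  // (bytes written to a path can be read back) that any compliant
  // FileSystem implementation must satisfy.
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.junit.Test;
  import static org.junit.Assert.assertArrayEquals;
  import static org.junit.Assert.assertTrue;

  public class FileSystemContractSketch {
    @Test
    public void writtenBytesCanBeReadBack() throws Exception {
      // The implementation under test is whatever fs.default.name points at.
      FileSystem fs = FileSystem.get(new Configuration());
      Path p = new Path("/tmp/compat-check");
      byte[] expected = "compatible?".getBytes("UTF-8");

      FSDataOutputStream out = fs.create(p, true); // overwrite if present
      out.write(expected);
      out.close();

      assertTrue("created file must be visible", fs.exists(p));

      FSDataInputStream in = fs.open(p);
      byte[] actual = new byte[expected.length];
      in.readFully(actual);
      in.close();
      assertArrayEquals("read must return what was written", expected, actual);

      fs.delete(p, false); // clean up
    }
  }

A real HCK would need hundreds of such checks, one per documented input/output constraint, which is exactly the scale of effort Cos describes elsewhere in this thread for the JCK.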

Re: Defining Hadoop Compatibility -revisiting-

2011-05-12 Thread Segel, Mike
While IANAL... As long as any implementation follows Apache's license regarding derivative works, it's fair game. (This is my understanding, YMMV.) The APL is very liberal in what one can do with a derivative work... Surely Apache has some lawyers who can summarize what is allowable when

Re: Defining Hadoop Compatibility -revisiting-

2011-05-12 Thread Milind Bhandarkar
HCK and written specifications are not mutually exclusive. However, given the evolving nature of Hadoop APIs, functional tests need to evolve as well, and tying them to the current stable version is easier than tying written specifications to it. - milind -- Milind Bhandarkar

Re: Defining Hadoop Compatibility -revisiting-

2011-05-12 Thread Allen Wittenauer
On May 12, 2011, at 2:23 AM, Steve Loughran wrote: I think Sun NFS might be a good example of a similar de facto standard, or MS SMB -- it is up to others to show they are compatible with what is effectively the reference implementation. Being closed source, there is no option for anyone to

Re: Defining Hadoop Compatibility -revisiting-

2011-05-12 Thread Konstantin Boudnik
TCK (or JCK initially) was done as a tool to basically compare the Java language specs with a particular implementation, including but not limited to an extensive suite of, say, compiler tests. So I assume before we can embark on any sort of HCK suite some formal specs would have to be defined. It's rather

Re: Defining Hadoop Compatibility -revisiting-

2011-05-12 Thread Konstantin Boudnik
On Thu, May 12, 2011 at 09:45, Milind Bhandarkar mbhandar...@linkedin.com wrote: HCK and written specifications are not mutually exclusive. However, given the evolving nature of Hadoop APIs, functional tests need to evolve as ... I would actually expand it to 'functional and system tests' because

Re: Defining Hadoop Compatibility -revisiting-

2011-05-12 Thread Milind Bhandarkar
The problem with (only) specs is that they are written in natural language, and subject to human interpretation, and since humans are bad at natural language interpretation, this gives rise to something called standards bodies and lawyers, and that has never been good for anyone in the past ;-)

Re: Defining Hadoop Compatibility -revisiting-

2011-05-12 Thread Milind Bhandarkar
Cos, Can you give me an example of a system test that is not a functional test? My assumption was that the functionality being tested is specific to a component, and that inter-component interactions (that's what you meant, right?) would be taken care of by the public interface and semantics of a

Re: Defining Hadoop Compatibility -revisiting-

2011-05-12 Thread Eric Baldeschwieler
label: print +1; goto label; I could not agree more with everything you said, Steve! The Apache Hadoop project should own the definition of Apache Hadoop. Hadoop is far from done. The interfaces need to keep evolving to get to a place where we can be proud of them. I support vendors

Re: Defining Hadoop Compatibility -revisiting-

2011-05-12 Thread Ted Dunning
I would say that an English spec with associated test suite is a middle ground. On Thu, May 12, 2011 at 9:52 PM, Milind Bhandarkar mbhandar...@linkedin.com wrote: Ok, my mistake. They have only asked for documented specifications. I may have been influenced by all the specifications I have

Re: Defining Hadoop Compatibility -revisiting-

2011-05-11 Thread Eric Baldeschwieler
This is a really interesting topic! I completely agree that we need to get ahead of this. I would be really interested in learning of any experience other Apache projects, such as Apache httpd or Tomcat, have with these issues. --- E14 - typing on glass On May 10, 2011, at 6:31 AM, Steve Loughran

Re: Defining Hadoop Compatibility -revisiting-

2011-05-11 Thread Ted Dunning
As a specific example of how these are important, over in Mahout-land we have been wrestling with determining just what it means to have dependencies in the lib directory inside a jar. This isn't documented, behaves differently in different versions of Hadoop, and means that some Mahout programs
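For context on the convention in question: a Hadoop job jar may carry its dependency jars inside itself, and the hadoop jar command (RunJar) unpacks the job jar and adds any lib/*.jar entries to the classpath it runs the main class with. Per Ted's point, exactly how and where those nested jars become visible has varied across Hadoop versions. A rough sketch of the layout (all names here are made up for illustration):

  my-mahout-job.jar
    META-INF/MANIFEST.MF
    org/example/MyDriver.class   <- job classes at the top level
    lib/dependency-a.jar         <- nested jars that RunJar unpacks and
    lib/dependency-b.jar            puts on the classpath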