Pig 0.4.0 release

2009-08-17 Thread Olga Natkovich
Pig Developers, We have made several significant performance and other improvements over the last couple of months: (1) Added an optimizer with several rules (2) Introduced skew and merge joins (3) Cleaned COUNT and AVG semantics I think it is time for another release to m

[jira] Updated: (PIG-923) Allow setting logfile location in pig.properties

2009-08-17 Thread Dmitriy V. Ryaboy (JIRA)
[ https://issues.apache.org/jira/browse/PIG-923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy updated PIG-923: -- Status: Patch Available (was: Open) > Allow setting logfile location in pig.properties >

Re: Pig 0.4.0 release

2009-08-17 Thread Dmitriy Ryaboy
Olga, Do non-committers get a vote? Zebra is in trunk, but relies on 0.20, which is somewhat inconsistent even if it's in contrib/. Would love to see dynamic (or at least static) shims incorporated into the 0.4 release (see PIG-660, PIG-924). There are a couple of bugs still outstanding that I thi

RE: Pig 0.4.0 release

2009-08-17 Thread Olga Natkovich
Hi Dmitry, Non-committers get a non-binding vote. Zebra needs Hadoop 20.1 because it is relying on TFile functionality that is not available in Hadoop 20. In general, the recommendation from the Hadoop team is to wait till hadoop 20.1 is released. For the remainder of the issues, while I see t

RE: Pig 0.4.0 release

2009-08-17 Thread Santhosh Srinivasan
I have a question: Will we be able to fix piggybank sources given that Zebra needs 0.20 and the rest of Pig requires 0.18? If the answer is yes then, +1 for the release. I agree with the plan of making 0.4.0 with Hadoop-0.18 and a later release (0.5.0) for Hadoop-0.20.1. Thanks, Santhosh ---

RE: Pig 0.4.0 release

2009-08-17 Thread Olga Natkovich
Hi Santhosh, What do you mean by "fixing piggybank"? Olga -Original Message- From: Santhosh Srinivasan [mailto:s...@yahoo-inc.com] Sent: Monday, August 17, 2009 1:37 PM To: pig-dev@hadoop.apache.org Subject: RE: Pig 0.4.0 release I have a question: Will we be able to fix piggybank sou

RE: Pig 0.4.0 release

2009-08-17 Thread Santhosh Srinivasan
Till we release 0.5.0, will zebra's requirement on 0.20 prevent any bugs/issues with Piggybank? Santhosh -Original Message- From: Olga Natkovich [mailto:ol...@yahoo-inc.com] Sent: Monday, August 17, 2009 1:43 PM To: pig-dev@hadoop.apache.org Subject: RE: Pig 0.4.0 release Hi Santhosh,

Build failed in Hudson: Pig-Patch-minerva.apache.org #166

2009-08-17 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/166/changes Changes: [olga] PIG-892: Make COUNT and AVG deal with nulls accordingly with SQL standart(olgan) -- [...truncated 111335 lines...] [exec] [junit] 09/08/17 20:

[jira] Commented: (PIG-923) Allow setting logfile location in pig.properties

2009-08-17 Thread Hadoop QA (JIRA)
[ https://issues.apache.org/jira/browse/PIG-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744194#action_12744194 ] Hadoop QA commented on PIG-923: --- -1 overall. Here are the results of testing the latest attachm

[jira] Commented: (PIG-923) Allow setting logfile location in pig.properties

2009-08-17 Thread Dmitriy V. Ryaboy (JIRA)
[ https://issues.apache.org/jira/browse/PIG-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744198#action_12744198 ] Dmitriy V. Ryaboy commented on PIG-923: --- As mentioned above, this patch does not require

[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-17 Thread Todd Lipcon (JIRA)
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744209#action_12744209 ] Todd Lipcon commented on PIG-924: - Hey guys, Any word on this? From the packaging perspective

[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-17 Thread Todd Lipcon (JIRA)
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744216#action_12744216 ] Todd Lipcon commented on PIG-924: - Oops, apparently it is Monday and my brain is scrambled. Ab

RE: Pig 0.4.0 release

2009-08-17 Thread Santhosh Srinivasan
Rephrasing my question: Till we release 0.5.0, will zebra's requirement on hadoop-0.20 prevent fixing of any bugs/issues with Piggybank? Santhosh -Original Message- From: Santhosh Srinivasan [mailto:s...@yahoo-inc.com] Sent: Monday, August 17, 2009 1:47 PM To: pig-dev@hadoop.apache.or

[jira] Commented: (PIG-824) SQL interface for Pig

2009-08-17 Thread Thejas M Nair (JIRA)
[ https://issues.apache.org/jira/browse/PIG-824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744238#action_12744238 ] Thejas M Nair commented on PIG-824: --- JFlex.jar (required to build this patch) can be downlo

RE: Pig 0.4.0 release

2009-08-17 Thread Olga Natkovich
I don't think so. Each contrib project for now has a separate build.xml. Olga -Original Message- From: Santhosh Srinivasan [mailto:s...@yahoo-inc.com] Sent: Monday, August 17, 2009 1:47 PM To: pig-dev@hadoop.apache.org Subject: RE: Pig 0.4.0 release Till we release 0.5.0, will zebra's r

Proposal to create a branch for contrib project Zebra

2009-08-17 Thread Raghu Angadi
Thanks to the PIG team, The first version of contrib project Zebra (PIG-833) is committed to PIG trunk. In short, Zebra is a table storage layer built for use in PIG and other Hadoop applications. While we are stabilizing current version V1 in the trunk, we plan to add more new features to

[jira] Updated: (PIG-925) Fix join in local mode

2009-08-17 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-925: --- Attachment: PIG-925-1.patch We do not handle LOJoin at all in local mode. Previously, LOJoin was converted to Co

[jira] Updated: (PIG-925) Fix join in local mode

2009-08-17 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-925: --- Status: Patch Available (was: Open) > Fix join in local mode > -- > > Key:

RE: Proposal to create a branch for contrib project Zebra

2009-08-17 Thread Olga Natkovich
+1 -Original Message- From: Raghu Angadi [mailto:rang...@yahoo-inc.com] Sent: Monday, August 17, 2009 4:06 PM To: pig-dev@hadoop.apache.org Subject: Proposal to create a branch for contrib project Zebra Thanks to the PIG team, The first version of contrib project Zebra (PIG-833) is com

[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-17 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744273#action_12744273 ] Daniel Dai commented on PIG-924: I am reviewing the patch. > Make Pig work with multiple vers

RE: Proposal to create a branch for contrib project Zebra

2009-08-17 Thread Santhosh Srinivasan
My vote is -1 -Original Message- From: Santhosh Srinivasan Sent: Monday, August 17, 2009 4:38 PM To: 'pig-dev@hadoop.apache.org' Subject: RE: Proposal to create a branch for contrib project Zebra Is there any precedent for such proposals? I am not comfortable with extending committer a

RE: Proposal to create a branch for contrib project Zebra

2009-08-17 Thread Santhosh Srinivasan
Is there any precedent for such proposals? I am not comfortable with extending committer access to contrib teams. I would suggest that Zebra be made a sub-project of Hadoop and have a life of its own. Santhosh -Original Message- From: Raghu Angadi [mailto:rang...@yahoo-inc.com] Sent: M

RE: Proposal to create a branch for contrib project Zebra

2009-08-17 Thread Olga Natkovich
Raghu is a PMC member and as such already has committer rights to all subprojects. So we are not breaking any new ground here. The reasoning is the same as for creating branches for the Pig multiquery work that we did in Pig. Olga -Original Message- From: Santhosh Srinivasan [mailto:s...@yahoo

Re: Proposal to create a branch for contrib project Zebra

2009-08-17 Thread Yiping Han
+1 On 8/18/09 7:11 AM, "Olga Natkovich" wrote: > +1 > > -Original Message- > From: Raghu Angadi [mailto:rang...@yahoo-inc.com] > Sent: Monday, August 17, 2009 4:06 PM > To: pig-dev@hadoop.apache.org > Subject: Proposal to create a branch for contrib project Zebra > > > Thanks to the

RE: Proposal to create a branch for contrib project Zebra

2009-08-17 Thread Santhosh Srinivasan
It's good to know that Raghu Angadi is a PMC member and that he has committer rights to all subprojects. That's beside the point. The example of a branch for multi-query is not quite right. Multi-query was part of the pig development efforts and not a contrib project. Raghu is suggesting that he

[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-17 Thread Todd Lipcon (JIRA)
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744305#action_12744305 ] Todd Lipcon commented on PIG-924: - Couple notes on the patch: - you've turned javac.deprecati

Re: Proposal to create a branch for contrib project Zebra

2009-08-17 Thread Milind A Bhandarkar
IANAC, but my (non-binding) vote is also -1. I think all the improvements and feature additions to zebra should be available through pig trunk. The codebase is not big enough to justify creating a branch. If the reason is Pig's dependence on a checked-in hadoop jar, the shims proposal by Dmitry shou

RE: Proposal to create a branch for contrib project Zebra

2009-08-17 Thread Olga Natkovich
Over time the plan is to move Zebra to a subproject. Until this is done, they need to have an environment where they can do their work efficiently. I am not sure what the concern is with allowing them to have a dev branch. Olga -Original Message- From: Santhosh Srinivasan Sent: Monday, A

RE: Proposal to create a branch for contrib project Zebra

2009-08-17 Thread Santhosh Srinivasan
"Efficiently" is a subjective term. When zebra was made a contrib project, it was very clear that they would have growing pains. If efficiency was a top priority then zebra should have chosen the incubation route. There will be no oversight of or control over what goes into contrib. This is a very bad

[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-17 Thread Dmitriy V. Ryaboy (JIRA)
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744307#action_12744307 ] Dmitriy V. Ryaboy commented on PIG-924: --- Thanks for looking, Todd -- most of those chang

[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-17 Thread Todd Lipcon (JIRA)
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744310#action_12744310 ] Todd Lipcon commented on PIG-924: - Gotcha, thanks for explaining. Aside from the nits, patch l

Build failed in Hudson: Pig-Patch-minerva.apache.org #167

2009-08-17 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/167/ -- [...truncated 111282 lines...] [exec] [junit] 09/08/18 01:01:56 INFO dfs.DataNode: PacketResponder 2 for block blk_3027939285115887556_1011 terminating [exec]

[jira] Commented: (PIG-925) Fix join in local mode

2009-08-17 Thread Hadoop QA (JIRA)
[ https://issues.apache.org/jira/browse/PIG-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744316#action_12744316 ] Hadoop QA commented on PIG-925: --- -1 overall. Here are the results of testing the latest attachm

Re: Proposal to create a branch for contrib project Zebra

2009-08-17 Thread Arun C Murthy
On Aug 17, 2009, at 4:38 PM, Santhosh Srinivasan wrote: Is there any precedent for such proposals? I am not comfortable with extending committer access to contrib teams. I would suggest that Zebra be made a sub-project of Hadoop and have a life of its own. There has been sufficient prece

[jira] Commented: (PIG-833) Storage access layer

2009-08-17 Thread Jeff Hammerbacher (JIRA)
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744323#action_12744323 ] Jeff Hammerbacher commented on PIG-833: --- Hey, Raghu, you mention that a design document

RE: Proposal to create a branch for contrib project Zebra

2009-08-17 Thread Santhosh Srinivasan
Giridharan Kesavan's omission as a committer is an oversight on the part of the hadoop team. Ideally, he should be listed as a release engineer with committer privileges. Secondly, QA/Release/etc are necessary evils to ship a high quality product while contrib projects are not. That leaves us with c

[jira] Commented: (PIG-823) Hadoop Metadata Service

2009-08-17 Thread Jeff Hammerbacher (JIRA)
[ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744326#action_12744326 ] Jeff Hammerbacher commented on PIG-823: --- Hey, Great to see Owl source! I've filed a tic

Re: Proposal to create a branch for contrib project Zebra

2009-08-17 Thread Arun C Murthy
That leaves us with contrib committers. Can you point to earlier email threads that cover the topic of giving committer access to contrib projects? Specifically, what does it mean to award someone committer privileges to a contrib project, what are the access privileges that come with such ri

RE: Proposal to create a branch for contrib project Zebra

2009-08-17 Thread Santhosh Srinivasan
After a lot of back and forth and information sharing, it's clear in my mind that branches are not required for contrib projects. My vote remains -1. Thanks, Santhosh -Original Message- From: Milind A Bhandarkar [mailto:mili...@yahoo-inc.com] Sent: Monday, August 17, 2009 5:32 PM To: pig-

[jira] Updated: (PIG-911) [Piggybank] SequenceFileLoader

2009-08-17 Thread Dmitriy V. Ryaboy (JIRA)
[ https://issues.apache.org/jira/browse/PIG-911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy updated PIG-911: -- Status: Open (was: Patch Available) > [Piggybank] SequenceFileLoader > -

[jira] Updated: (PIG-911) [Piggybank] SequenceFileLoader

2009-08-17 Thread Dmitriy V. Ryaboy (JIRA)
[ https://issues.apache.org/jira/browse/PIG-911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy updated PIG-911: -- Attachment: pig_911.2.patch Addressed Alan's comments. > [Piggybank] SequenceFileLoader > --

[jira] Updated: (PIG-911) [Piggybank] SequenceFileLoader

2009-08-17 Thread Dmitriy V. Ryaboy (JIRA)
[ https://issues.apache.org/jira/browse/PIG-911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy updated PIG-911: -- Status: Patch Available (was: Open) > [Piggybank] SequenceFileLoader > -

[jira] Commented: (PIG-911) [Piggybank] SequenceFileLoader

2009-08-17 Thread Dmitriy V. Ryaboy (JIRA)
[ https://issues.apache.org/jira/browse/PIG-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744343#action_12744343 ] Dmitriy V. Ryaboy commented on PIG-911: --- Concerning making this a StoreFunc, as well --

Re: Proposal to create a branch for contrib project Zebra

2009-08-17 Thread Raghu Angadi
Hi Santosh, There are two separate things : (a) voting a contributor as a committer (b) committing to a contrib project. (b): My experience with Hadoop is that "Contrib" by definition is very loosely coupled with core. By convention, we as committers to core (hdfs, mapred, etc) did not hav

[jira] Commented: (PIG-833) Storage access layer

2009-08-17 Thread Raghu Angadi (JIRA)
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744361#action_12744361 ] Raghu Angadi commented on PIG-833: -- will try to get some initial docs attached to this jira

[jira] Updated: (PIG-925) Fix join in local mode

2009-08-17 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-925: --- Status: Patch Available (was: Open) > Fix join in local mode > -- > > Key:

[jira] Updated: (PIG-925) Fix join in local mode

2009-08-17 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-925: --- Attachment: PIG-925-2.patch Address the javac warning > Fix join in local mode > -- > >

[jira] Updated: (PIG-925) Fix join in local mode

2009-08-17 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-925: --- Status: Open (was: Patch Available) > Fix join in local mode > -- > > Key:

Re: Proposal to create a branch for contrib project Zebra

2009-08-17 Thread Raghu Angadi
The reason for a branch is purely based on the fair number of improvements we are planning for Zebra and our desire to have a stable Zebra implementation for users to use along with PIG on Hadoop-0.20. New features planned (jiras will be filed soon): * Column security (different permissions f

Hudson build is back to normal: Pig-Patch-minerva.apache.org #168

2009-08-17 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/168/

[jira] Commented: (PIG-911) [Piggybank] SequenceFileLoader

2009-08-17 Thread Hadoop QA (JIRA)
[ https://issues.apache.org/jira/browse/PIG-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744373#action_12744373 ] Hadoop QA commented on PIG-911: --- +1 overall. Here are the results of testing the latest attachm

Re: Proposal to create a branch for contrib project Zebra

2009-08-17 Thread Raghu Angadi
Raghu Angadi wrote: Hi Santosh, There are two separate things : (a) voting a contributor as a committer (b) committing to a contrib project. [...] Reason for (a) is simple scalability. We can not monitor everything. If I meant to say "Reason for (b)" (why contrib commits are treated bi

[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-17 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744378#action_12744378 ] Daniel Dai commented on PIG-924: Hi, Dmitriy, Generally the patch is good. Just like Todd sai

[jira] Updated: (PIG-926) Merge-Join phase 2

2009-08-17 Thread Ashutosh Chauhan (JIRA)
[ https://issues.apache.org/jira/browse/PIG-926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated PIG-926: - Attachment: (was: mj_phase2_1.patch) > Merge-Join phase 2 > -- > >

[jira] Commented: (PIG-926) Merge-Join phase 2

2009-08-17 Thread Ashutosh Chauhan (JIRA)
[ https://issues.apache.org/jira/browse/PIG-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744382#action_12744382 ] Ashutosh Chauhan commented on PIG-926: -- Additionally, with this approach, there will not