[jira] Subscription: PIG patch available

2013-05-03 Thread jira
Issue Subscription Filter: PIG patch available (23 issues) Subscriber: pigdaily Key Summary PIG-3297 Avro files with stringType set to String cannot be read by the AvroStorage LoadFunc https://issues.apache.org/jira/browse/PIG-3297 PIG-3295 Casting from bytearra

[jira] [Updated] (PIG-3311) add pig-withouthadoop-h2 to mvn-jar

2013-05-03 Thread Julien Le Dem (JIRA)
[ https://issues.apache.org/jira/browse/PIG-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-3311: --- Patch Info: Patch Available > add pig-withouthadoop-h2 to mvn-jar > ---

[jira] [Commented] (PIG-3311) add pig-withouthadoop-h2 to mvn-jar

2013-05-03 Thread Bill Graham (JIRA)
[ https://issues.apache.org/jira/browse/PIG-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648931#comment-13648931 ] Bill Graham commented on PIG-3311: -- +1 > add pig-withouthadoop-h2 to mvn-j

Re: A major addition to Pig. Working with spatial data

2013-05-03 Thread Daniel Dai
I am not sure how other Apache projects are dealing with it. It seems Solr also has some connector to JTS? Thanks, Daniel On Thu, May 2, 2013 at 11:59 AM, Ahmed Eldawy wrote: > Thanks Alan for your interest. It's too bad that an open source licensing > issue is holding me back from doing some open so

[jira] [Updated] (PIG-3312) Pig duplicates avro records

2013-05-03 Thread Hans Uhlig (JIRA)
[ https://issues.apache.org/jira/browse/PIG-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hans Uhlig updated PIG-3312: Attachment: twitter.json twitter.avsc twitter.avro > Pig duplicates avro

[jira] [Created] (PIG-3312) Pig duplicates avro records

2013-05-03 Thread Hans Uhlig (JIRA)
Hans Uhlig created PIG-3312: --- Summary: Pig duplicates avro records Key: PIG-3312 URL: https://issues.apache.org/jira/browse/PIG-3312 Project: Pig Issue Type: Bug Components: impl Affe

[jira] [Commented] (PIG-2248) Pig parser does not detect when a macro name masks a UDF name

2013-05-03 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648866#comment-13648866 ] Daniel Dai commented on PIG-2248: - That will be great! Thanks > Pig parser

[jira] [Commented] (PIG-2248) Pig parser does not detect when a macro name masks a UDF name

2013-05-03 Thread Johnny Zhang (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648863#comment-13648863 ] Johnny Zhang commented on PIG-2248: --- Daniel, this is a good idea, and in bigger picture. 3

Re: Function To Compute Product of Values in Bag

2013-05-03 Thread Julien Le Dem
As for the PRODUCT, I don't see why it could not be added to builtin. It is a very generic and dependency less function. On Fri, May 3, 2013 at 1:36 PM, Sergey Goder wrote: > Thanks for the tip about numerical accuracy issues and the elegant solution > exploiting log/exp. It is very much apprec

Pig package supporting both hadoop 1 and 2

2013-05-03 Thread Julien Le Dem
Hi Pig developers, I'm looking into having a Pig package that works both for Hadoop 1.0 and Hadoop 2.0 That means have both pig*.jar and pig*-h2.jar in the package and choosing the right one dynamically. In particular I created this JIRA as a first step: https://issues.apache.org/jira/browse/PIG
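One way to picture "choosing the right one dynamically" is the small sketch below. It is only an illustration under stated assumptions, not the approach taken in the JIRA: the function name `choose_pig_jar`, the example jar file names, and the rule "a name ending in `-h2.jar` targets Hadoop 2, anything else targets Hadoop 1" are all hypothetical. Only the `pig*.jar` / `pig*-h2.jar` naming pattern comes from the message above. Versions other than 1.x and 2.x are rejected rather than guessed at.

```python
def choose_pig_jar(hadoop_version, jars):
    """Pick the Pig jar matching a Hadoop version string such as '1.0.4'.

    Assumption (hypothetical): jars whose names end in '-h2.jar' are built
    for Hadoop 2, and all other jars in the list are built for Hadoop 1.
    """
    major = int(hadoop_version.split(".")[0])
    if major not in (1, 2):
        # The message only discusses Hadoop 1.0 and 2.0; do not guess others.
        raise ValueError("unsupported Hadoop version: " + hadoop_version)

    want_h2 = (major == 2)
    matches = [jar for jar in jars if jar.endswith("-h2.jar") == want_h2]
    if not matches:
        raise LookupError("no Pig jar found for Hadoop " + hadoop_version)
    return matches[0]


# Hypothetical package contents, following the pig*.jar / pig*-h2.jar pattern.
packaged = ["pig-0.12.0.jar", "pig-0.12.0-h2.jar"]
print(choose_pig_jar("1.0.4", packaged))
print(choose_pig_jar("2.0.3-alpha", packaged))
```

In a real package the version string would have to come from the installed Hadoop itself; how that is detected is outside what the message states.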

[jira] [Updated] (PIG-3223) AvroStorage does not handle comma separated input paths

2013-05-03 Thread Johnny Zhang (JIRA)
[ https://issues.apache.org/jira/browse/PIG-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Johnny Zhang updated PIG-3223: -- Attachment: PIG-3223.branch-0.11.patch.txt Rohini, thanks for your +1 on RB. Here is the patch for branch

[jira] [Created] (PIG-3311) add pig-withouthadoop-h2 to mvn-jar

2013-05-03 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-3311: -- Summary: add pig-withouthadoop-h2 to mvn-jar Key: PIG-3311 URL: https://issues.apache.org/jira/browse/PIG-3311 Project: Pig Issue Type: Improvement Com

[jira] [Updated] (PIG-3311) add pig-withouthadoop-h2 to mvn-jar

2013-05-03 Thread Julien Le Dem (JIRA)
[ https://issues.apache.org/jira/browse/PIG-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-3311: --- Attachment: PIG-3311.patch PIG-3311.patch adds -withouthadoop to the mvn-jar target >

[jira] [Commented] (PIG-2248) Pig parser does not detect when a macro name masks a UDF name

2013-05-03 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648841#comment-13648841 ] Daniel Dai commented on PIG-2248: - What I mean is something like (for illustration only, may

[jira] [Updated] (PIG-2873) Converting bin/pig shell script to python

2013-05-03 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-2873: Resolution: Fixed Fix Version/s: 0.12 Hadoop Flags: Reviewed Status: Resolved (was: Pa

[jira] [Commented] (PIG-2248) Pig parser does not detect when a macro name masks a UDF name

2013-05-03 Thread Johnny Zhang (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648838#comment-13648838 ] Johnny Zhang commented on PIG-2248: --- Daniel, can you explain the term "symbol resolving or

Re: CHANGES.txt in trunk

2013-05-03 Thread Alan Gates
What do you mean by remove? They should still be in the file. They may need to be relocated under the 0.11 section. But the trunk CHANGES file should include all changes that are on trunk. Alan. On May 3, 2013, at 1:34 PM, Rohini Palaniswamy wrote: > Hi, > I see a lot of patches that went into

[jira] [Commented] (PIG-2248) Pig parser does not detect when a macro name masks a UDF name

2013-05-03 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648831#comment-13648831 ] Daniel Dai commented on PIG-2248: - Actually, I feel the check is very involved in the patch.

[jira] [Commented] (PIG-3287) MultiQueryOptimizer can prevent CombinerOptimizer from working

2013-05-03 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648815#comment-13648815 ] Daniel Dai commented on PIG-3287: - Yes, we weight multiquery over combiner, though it is not

[jira] [Commented] (PIG-3293)

2013-05-03 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648805#comment-13648805 ] Daniel Dai commented on PIG-3293: - Must be the "caster" in D's POCast is null. Can you attac

[jira] [Commented] (PIG-1824) Support import modules in Jython UDF

2013-05-03 Thread Martin Gerlach (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648793#comment-13648793 ] Martin Gerlach commented on PIG-1824: - Doesn't work for me, either (with codecs module).

Re: CHANGES.txt in trunk

2013-05-03 Thread Rohini Palaniswamy
I will put up the patch, Daniel. Thanks, Rohini On Fri, May 3, 2013 at 1:38 PM, Daniel Dai wrote: > Sure, I used to clean this up before release, but did not strictly follow this > rule. Patch welcome. > > Thanks, > Daniel > > > On Fri, May 3, 2013 at 1:34 PM, Rohini Palaniswamy > wrote: > > > Hi,

[jira] [Commented] (PIG-3251) Bzip2TextInputFormat requires double the memory of maximum record size

2013-05-03 Thread Koji Noguchi (JIRA)
[ https://issues.apache.org/jira/browse/PIG-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648788#comment-13648788 ] Koji Noguchi commented on PIG-3251: --- bq. FYI, couple of tests from TestBZip are failing af

Re: CHANGES.txt in trunk

2013-05-03 Thread Daniel Dai
Sure, I used to clean this up before release, but did not strictly follow this rule. Patch welcome. Thanks, Daniel On Fri, May 3, 2013 at 1:34 PM, Rohini Palaniswamy wrote: > Hi, > I see a lot of patches that went into 0.11 are under trunk in the > CHANGES.txt. Should we sync the file with the CHA

Re: Function To Compute Product of Values in Bag

2013-05-03 Thread Sergey Goder
Thanks for the tip about numerical accuracy issues and the elegant solution exploiting log/exp. It is very much appreciated. Sergey On Fri, May 3, 2013 at 11:42 AM, Kai Londenberg < kai.londenb...@googlemail.com> wrote: > Hi, > > Just a hint: It's usually better to work with log probabilites an

CHANGES.txt in trunk

2013-05-03 Thread Rohini Palaniswamy
Hi, I see a lot of patches that went into 0.11 are under trunk in the CHANGES.txt. Should we sync the file with the CHANGES.txt in branch-0.11 and remove those jiras from trunk that went into 0.11? What is the usual process of updating CHANGES.txt when a jira is checked both into a branch and also

[jira] [Commented] (PIG-3251) Bzip2TextInputFormat requires double the memory of maximum record size

2013-05-03 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648776#comment-13648776 ] Daniel Dai commented on PIG-3251: - bq. Or, are you suggesting I create two silly wrappers in

[jira] [Commented] (PIG-3251) Bzip2TextInputFormat requires double the memory of maximum record size

2013-05-03 Thread Koji Noguchi (JIRA)
[ https://issues.apache.org/jira/browse/PIG-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648745#comment-13648745 ] Koji Noguchi commented on PIG-3251: --- FYI, couple of tests from TestBZip are failing after

Re: Review Request: PIG-3223 AvroStorage does not handle comma separated input paths

2013-05-03 Thread Rohini Palaniswamy
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10351/#review20138 --- Ship it! Thanks Johnny. Looks good. - Rohini Palaniswamy On May

[jira] [Updated] (PIG-3251) Bzip2TextInputFormat requires double the memory of maximum record size

2013-05-03 Thread Koji Noguchi (JIRA)
[ https://issues.apache.org/jira/browse/PIG-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-3251: -- Attachment: pig-3251-trunk-v05.patch Thanks Daniel. bq.is the patch ready? Ah, forgot to flag it as patc

[jira] [Updated] (PIG-3223) AvroStorage does not handle comma separated input paths

2013-05-03 Thread Johnny Zhang (JIRA)
[ https://issues.apache.org/jira/browse/PIG-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Johnny Zhang updated PIG-3223: -- Attachment: PIG-3223.patch.txt latest patch addressed Rohini's comments in RB > AvroStor

Re: Review Request: PIG-3223 AvroStorage does not handle comma separated input paths

2013-05-03 Thread Johnny Zhang
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10351/ --- (Updated May 3, 2013, 7:30 p.m.) Review request for pig. Changes --- lat

Re: Review Request: PIG-3223 AvroStorage does not handle comma separated input paths

2013-05-03 Thread Johnny Zhang
> On May 3, 2013, 6:53 p.m., Rohini Palaniswamy wrote: > > Thanks for the comments, Rohini! Much appreciated. I will post the revised patch very soon. - Johnny --- This is an automatically generated e-mail. To reply, visit: https://reviews.a

Re: Review Request: PIG-3223 AvroStorage does not handle comma separated input paths

2013-05-03 Thread Rohini Palaniswamy
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10351/#review20127 --- contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/stora

[jira] [Updated] (PIG-3309) TestJsonLoaderStorage fails with IBM JDK 6/7

2013-05-03 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-3309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-3309: Resolution: Fixed Fix Version/s: (was: 0.11.2) 0.12 Hadoop Flags: Reviewed

[jira] [Updated] (PIG-3309) TestJsonLoaderStorage fails with IBM JDK 6/7

2013-05-03 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-3309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-3309: Assignee: Leonardo Rangel Augusto > TestJsonLoaderStorage fails with IBM JDK 6/7 > --

Re: Function To Compute Product of Values in Bag

2013-05-03 Thread Kai Londenberg
Hi, Just a hint: It's usually better to work with log probabilities and sum over them, than to work with raw probabilities and to use multiplication. You might easily run into numerical accuracy issues otherwise. i.e. exploit this fact: product(x1, ..., xn) = exp(sum(log(x1), ..., log(xn))) best
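The identity in the hint above can be sketched in a few lines. This is a minimal illustration in Python, not Pig code and not the UDF discussed in the thread. The function names `log_product` and `product_via_logs` are made up for the example, and the sketch assumes every value is strictly positive, as probabilities in a smoothed naive Bayes model would be.

```python
import math


def log_product(values):
    """Return log(x1 * ... * xn), computed as log(x1) + ... + log(xn)."""
    total = 0.0
    for x in values:
        if x <= 0:
            # log is undefined for zero and negative inputs.
            raise ValueError("log-space product needs strictly positive values")
        total += math.log(x)
    return total


def product_via_logs(values):
    """Return x1 * ... * xn using exp(sum(log(xi)))."""
    return math.exp(log_product(values))


# Small case: agrees with plain multiplication up to rounding.
print(product_via_logs([0.5, 0.25, 0.1]))

# Many small probabilities: the raw product underflows to 0.0 in double
# precision, while the log-space sum stays finite and comparable.
tiny = [1e-5] * 100
print(math.prod(tiny), log_product(tiny))
```

For a classifier it is usually enough to compare the log-space sums directly, so the final `exp` step can be skipped and the underflow never occurs.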

[jira] [Commented] (PIG-3251) Bzip2TextInputFormat requires double the memory of maximum record size

2013-05-03 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648664#comment-13648664 ] Daniel Dai commented on PIG-3251: - Hi, [~knoguchi], is the patch ready? Some comments for pi

Function To Compute Product of Values in Bag

2013-05-03 Thread Sergey Goder
I'm creating a multinomial naive bayes classifier using pig and need to compute the product of probabilities. There are an arbitrary number of values in the bag so I would like to be able to use a function similar to the builtin SUM to do this. I looked through the source code and found that with s

[jira] [Updated] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-03 Thread Julien Le Dem (JIRA)
[ https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-3307: --- Attachment: PIG-3307_2.patch PIG-3307_2.patch removes the unused parameter in getNext(\*)

[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-03 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648594#comment-13648594 ] Daniel Dai commented on PIG-3307: - Any performance implication for this change?

[jira] [Commented] (PIG-2586) A better plan/data flow visualizer

2013-05-03 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648591#comment-13648591 ] Daniel Dai commented on PIG-2586: - Looks good. In the schedule, can we get some sample pages

[jira] [Updated] (PIG-3308) Storing data in hive columnar rc format

2013-05-03 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-3308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-3308: Resolution: Fixed Fix Version/s: (was: 0.10.1) 0.12 Assignee: Marcin Cz

[jira] [Updated] (PIG-3308) Storing data in hive columnar rc format

2013-05-03 Thread Marcin Czech (JIRA)
[ https://issues.apache.org/jira/browse/PIG-3308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcin Czech updated PIG-3308: -- Attachment: PIG-3308.patch Yes my mistake. Problem was only in the test file. Now should be fine. Code is

[jira] [Updated] (PIG-3308) Storing data in hive columnar rc format

2013-05-03 Thread Marcin Czech (JIRA)
[ https://issues.apache.org/jira/browse/PIG-3308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcin Czech updated PIG-3308: -- Attachment: (was: PIG-3308.patch) > Storing data in hive columnar rc format > ---

[jira] [Commented] (PIG-3251) Bzip2TextInputFormat requires double the memory of maximum record size

2013-05-03 Thread Koji Noguchi (JIRA)
[ https://issues.apache.org/jira/browse/PIG-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648464#comment-13648464 ] Koji Noguchi commented on PIG-3251: --- bq. Using hadoop's bzip codec on 0.23/2.0 would have

[jira] [Updated] (PIG-3310) ImplicitSplitInserter does not generate new uids for nested schema fields, leading to miscomputations

2013-05-03 Thread Clément Stenac (JIRA)
[ https://issues.apache.org/jira/browse/PIG-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clément Stenac updated PIG-3310: Attachment: generate-uid-for-nested-fields.patch > ImplicitSplitInserter does not generate new ui

[jira] [Created] (PIG-3310) ImplicitSplitInserter does not generate new uids for nested schema fields, leading to miscomputations

2013-05-03 Thread Clément Stenac (JIRA)
Clément Stenac created PIG-3310: --- Summary: ImplicitSplitInserter does not generate new uids for nested schema fields, leading to miscomputations Key: PIG-3310 URL: https://issues.apache.org/jira/browse/PIG-3310