[jira] [Updated] (PIG-2717) Tuple field mangled during flattening

2012-05-23 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-2717: Priority: Critical (was: Major) Fix Version/s: 0.10.1 0.11 Assignee:

[jira] [Commented] (PIG-2405) svn tags/release-0.9.1: some unit test case failed with open JDK

2012-05-23 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281453#comment-13281453 ] Daniel Dai commented on PIG-2405: - I am sure this is the cause for some unit test failures

Re: About the Pig Latin Implementation

2012-05-23 Thread Daniel Dai
Check the VLDB paper: http://infolab.stanford.edu/~olston/publications/vldb09.pdf On Tue, May 22, 2012 at 8:40 PM, Li Shengmei lisheng...@ict.ac.cn wrote: Hi, all I am new to Pig Latin. I have read the paper Pig Latin: A not-so-foreign language for data processing published in

[jira] [Commented] (PIG-2405) svn tags/release-0.9.1: some unit test case failed with open JDK

2012-05-23 Thread fang fang chen (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281461#comment-13281461 ] fang fang chen commented on PIG-2405: - Will try my best to fix this when available.

[jira] [Commented] (PIG-2134) ReadScalars message scalar has more than one row in the output does not provide enough information to help programmer find and fix script syntax error.

2012-05-23 Thread Stan Rosenberg (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281654#comment-13281654 ] Stan Rosenberg commented on PIG-2134: - Is there any progress on this? I am wondering if

[jira] [Commented] (PIG-2166) UDFs to join a bag

2012-05-23 Thread Thejas M Nair (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281704#comment-13281704 ] Thejas M Nair commented on PIG-2166: Bags in pig are expected to be bags containing

[jira] [Commented] (PIG-2166) UDFs to join a bag

2012-05-23 Thread Gianmarco De Francisci Morales (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281714#comment-13281714 ] Gianmarco De Francisci Morales commented on PIG-2166: - I think we need

[jira] [Commented] (PIG-2709) PigAvroRecordReader should specify which file has a problem when throwing IOException

2012-05-23 Thread Mike Percy (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281725#comment-13281725 ] Mike Percy commented on PIG-2709: - Thanks for the commit, Daniel!

[jira] [Updated] (PIG-2706) Add clear to list of grunt commands

2012-05-23 Thread JIRA
[ https://issues.apache.org/jira/browse/PIG-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allan Avendaño updated PIG-2706: Attachment: PIG-2706-2 I found that this way it is possible to clear the screen and set the cursor to 0,0

[jira] [Created] (PIG-2719) Describe displays wrong column name after self-join

2012-05-23 Thread Brian Tan (JIRA)
Brian Tan created PIG-2719: -- Summary: Describe displays wrong column name after self-join Key: PIG-2719 URL: https://issues.apache.org/jira/browse/PIG-2719 Project: Pig Issue Type: Bug Affects

[jira] [Commented] (PIG-2353) RANK function like in SQL

2012-05-23 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281934#comment-13281934 ] Daniel Dai commented on PIG-2353: - You mean the global rank is implemented by group all +

[jira] [Commented] (PIG-2353) RANK function like in SQL

2012-05-23 Thread Gianmarco De Francisci Morales (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281943#comment-13281943 ] Gianmarco De Francisci Morales commented on PIG-2353: - No, sorry, there

[jira] [Commented] (PIG-2353) RANK function like in SQL

2012-05-23 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281954#comment-13281954 ] Daniel Dai commented on PIG-2353: - So partitioned and non-partitioned RANK are using

[jira] [Commented] (PIG-2353) RANK function like in SQL

2012-05-23 Thread Gianmarco De Francisci Morales (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281959#comment-13281959 ] Gianmarco De Francisci Morales commented on PIG-2353: - Yes, partitioned

Re: Is there a good benchmark to evaluate the CPU time/space tradeoff in the shuffle stage of hadoop?

2012-05-23 Thread Gianmarco De Francisci Morales
I am afraid that the space vs. time conversion factor is hardware dependent, and as such cannot be optimized a priori. Imagine having a 100 Mbit Ethernet vs. an InfiniBand connection, or running Hadoop on embedded processors vs. top-end servers. There is no one-size-fits-all, unfortunately. The only

Re: Some questions on intermediate serialization in Pig

2012-05-23 Thread Jonathan Coveney
Another question is clarifying what BinStorage does compared to InterStorage. It looks like it might just be a legacy storage format? I'm assuming that you do the R_1/R_2/R_3 to be able to find the next Tuple in the stream, but once you do that, can't you just read a tuple, and then read skip 12

[jira] [Updated] (PIG-2691) Duplicate TOKENIZE schema

2012-05-23 Thread Jie Li (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Li updated PIG-2691: Status: Patch Available (was: Open) Changed the field alias of the TOKENIZE result from bag_of_tokenTuples to

[jira] [Updated] (PIG-2691) Duplicate TOKENIZE schema

2012-05-23 Thread Jie Li (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Li updated PIG-2691: Attachment: PIG-2691.patch Duplicate TOKENIZE schema - Key: PIG-2691

[jira] [Commented] (PIG-2691) Duplicate TOKENIZE schema

2012-05-23 Thread Jie Li (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282066#comment-13282066 ] Jie Li commented on PIG-2691: - Oops, broke some unit tests. Fixing. Duplicate

[jira] [Commented] (PIG-2691) Duplicate TOKENIZE schema

2012-05-23 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282099#comment-13282099 ] Daniel Dai commented on PIG-2691: - Patch looks good. One potential issue is it introduces

[jira] [Assigned] (PIG-2691) Duplicate TOKENIZE schema

2012-05-23 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reassigned PIG-2691: --- Assignee: Jie Li Duplicate TOKENIZE schema - Key:

[jira] [Resolved] (PIG-2720) Pig embeded in Python has trouble with -- style comment and default

2012-05-23 Thread Brian Tan (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Tan resolved PIG-2720. Resolution: Invalid Really sorry! I submitted an empty bug report by mistake. Pig embeded

[jira] [Created] (PIG-2720) Pig embeded in Python has trouble with -- style comment and default

2012-05-23 Thread Brian Tan (JIRA)
Brian Tan created PIG-2720: -- Summary: Pig embeded in Python has trouble with -- style comment and default Key: PIG-2720 URL: https://issues.apache.org/jira/browse/PIG-2720 Project: Pig Issue Type:

[jira] [Updated] (PIG-2720) Empty bug report. Please ignore or delete. Sorry!

2012-05-23 Thread Brian Tan (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Tan updated PIG-2720: --- Summary: Empty bug report. Please ignore or delete. Sorry! (was: Pig embeded in Python has trouble with --

Re: Some questions on intermediate serialization in Pig

2012-05-23 Thread Jonathan Coveney
And one more question to pile on: What defines the binary data that the raw tuple comparator will be run on? It seems that it comes from Hadoop, and the format generally makes sense (you get bytes and do with them what you will). The thing that confuses me is why don't you have to deal with

[jira] [Commented] (PIG-2691) Duplicate TOKENIZE schema

2012-05-23 Thread Jie Li (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282128#comment-13282128 ] Jie Li commented on PIG-2691: - As there was no documentation on the field schema of TOKENIZE,

[jira] [Commented] (PIG-2167) CUBE operation in Pig

2012-05-23 Thread Russell Jurney (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282163#comment-13282163 ] Russell Jurney commented on PIG-2167: - Playing with this now, applied it to 0.10.

[jira] [Commented] (PIG-2167) CUBE operation in Pig

2012-05-23 Thread Prasanth J (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282165#comment-13282165 ] Prasanth J commented on PIG-2167: - Russell.. Try using the patch in JIRA-2170. I uploaded a

[jira] [Commented] (PIG-2167) CUBE operation in Pig

2012-05-23 Thread Russell Jurney (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282179#comment-13282179 ] Russell Jurney commented on PIG-2167: - Thanks prasanth, but which JIRA do you mean?

[jira] [Commented] (PIG-2691) Duplicate TOKENIZE schema

2012-05-23 Thread Gianmarco De Francisci Morales (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282184#comment-13282184 ] Gianmarco De Francisci Morales commented on PIG-2691: - I agree with Jie,

[jira] [Commented] (PIG-2691) Duplicate TOKENIZE schema

2012-05-23 Thread Daniel Dai (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282195#comment-13282195 ] Daniel Dai commented on PIG-2691: - Sounds good. +1 Duplicate TOKENIZE