[jira] [Commented] (PIG-2339) HCatLoader loads all the partitions in a partitioned table even though a filter clause on the partitions is specified in the Pig script

2011-11-24 Thread Ashutosh Chauhan (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157003#comment-13157003 ] Ashutosh Chauhan commented on PIG-2339: --- @Min, which version of HCatalog did you try wit

[jira] [Commented] (PIG-2339) HCatLoader loads all the partitions in a partitioned table even though a filter clause on the partitions is specified in the Pig script

2011-11-24 Thread Min Zhou (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156999#comment-13156999 ] Min Zhou commented on PIG-2339: --- Did you try a non-equality expression like the one below, where pt is the

Re: Is there a way around the nested distinct problem?

2011-11-24 Thread John Meagher
I've done this with the following:
raw = load 'thing' as (user, page);
pageviews = foreach (group raw by (user, page)) generate flatten(group), COUNT($1) as pageviews;
pagecounts = foreach (group pageviews by page) generate flatten(group), COUNT($1) as uniques, SUM(pageviews.pageviews) as pageviews;
It's t
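The two-stage grouping in the script above can be illustrated outside Pig. The following is a minimal Python sketch, not part of the original message: the function name `page_stats` and the sample rows are made up for illustration. It first counts rows per (user, page) pair, then rolls those pairs up by page, so the number of pairs gives the unique users and the sum of their counts gives the total page views.

```python
from collections import Counter, defaultdict


def page_stats(raw):
    """Return {page: (uniques, pageviews)} for an iterable of (user, page) rows.

    Hypothetical helper that mirrors the two group-by stages of the Pig script.
    """
    # Stage 1: group by (user, page) and count the rows in each group.
    per_user_page = Counter(raw)

    # Stage 2: group the stage-1 output by page.
    # Each (user, page) pair contributes one unique user and its row count.
    stats = defaultdict(lambda: [0, 0])
    for (user, page), views in per_user_page.items():
        stats[page][0] += 1      # distinct users for this page
        stats[page][1] += views  # total views for this page
    return {page: tuple(values) for page, values in stats.items()}


# Made-up sample data: alice views /home twice, bob once, alice views /about once.
raw = [
    ("alice", "/home"),
    ("alice", "/home"),
    ("bob", "/home"),
    ("alice", "/about"),
]
result = page_stats(raw)
```

Because the distinct count falls out of the first grouping, no nested distinct is needed in the second stage.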

Re: Is there a way around the nested distinct problem?

2011-11-24 Thread Gianmarco De Francisci Morales
If you are willing to give up some (very small) precision, for this specific kind of query, you can use approximate counters like Flajolet-Martin or HyperLogLog counters. We could implement them in a special COUNT_APPROX() builtin function. You can also use bloom filters to have an approximate di
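To make the approximate-counter idea concrete, here is a minimal Python sketch of a HyperLogLog counter. It is an illustration only: the class name, the choice of SHA-1 as the hash, and the default of 2^10 registers are assumptions, and it is not the COUNT_APPROX() builtin proposed above, which is only a suggestion in this thread. The sketch uses the standard raw estimator with the small-range (linear counting) correction and omits the refinements found in production implementations.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog distinct counter (hypothetical, not a Pig API)."""

    def __init__(self, p=10):
        # p index bits give m = 2**p registers; the alpha formula below
        # is the usual approximation for m >= 128, so keep p >= 7.
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # Derive a 64-bit hash from the item (SHA-1 is an arbitrary choice here).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        tail_bits = 64 - self.p
        index = h >> tail_bits                 # first p bits pick a register
        tail = h & ((1 << tail_bits) - 1)      # remaining bits
        # Rank = position of the leftmost 1-bit in the tail (1-based);
        # an all-zero tail gets rank tail_bits + 1.
        rank = tail_bits - tail.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)
        return raw


# Usage: duplicates do not change the estimate.
hll = HyperLogLog()
for i in range(10000):
    hll.add(f"user-{i}")
    hll.add(f"user-{i}")
estimate = hll.count()
```

With 1024 registers the expected relative error is roughly 1.04 / sqrt(1024), or about 3%, while the state is a fixed-size array regardless of how many distinct users are seen.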

[jira] [Commented] (PIG-2359) Support more efficient Tuples when schemas are known

2011-11-24 Thread Gianmarco De Francisci Morales (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156642#comment-13156642 ] Gianmarco De Francisci Morales commented on PIG-2359: - Totally cool! Do

[jira] [Updated] (PIG-2387) BinStorageRecordReader causes negative progress

2011-11-24 Thread Anitha Raju (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/PIG-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anitha Raju updated PIG-2387: - Description: Hi, When an input file of size greater than default split size is loaded using BinStorage()

[jira] [Created] (PIG-2387) BinStorageRecordReader causes negative progress

2011-11-24 Thread Anitha Raju (Created) (JIRA)
BinStorageRecordReader causes negative progress --- Key: PIG-2387 URL: https://issues.apache.org/jira/browse/PIG-2387 Project: Pig Issue Type: Bug Affects Versions: 0.9.0, 0.8.0 Rep

[jira] [Commented] (PIG-1270) Push limit into loader

2011-11-24 Thread Min Zhou (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156591#comment-13156591 ] Min Zhou commented on PIG-1270: --- sorry, some mistakes ||case||job cost time||HDFS bytes read

[jira] [Updated] (PIG-1270) Push limit into loader

2011-11-24 Thread Min Zhou (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated PIG-1270: -- Attachment: PIG-1270-3.patch Here is the patch which would fix the bug. > Push limit into loader