Yeah - we definitely want to convert it to an MFU-type flush algorithm.
If someone wants to take a crack at it before we can get to it, that would be
awesome.
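(For anyone who does take a crack at it: a rough sketch of one way an MFU-style flush could work - track per-key hit counts and evict the coldest entries first. This is illustrative Java only, not Hive's actual code; the class and method names are made up.)

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Illustrative MFU-style flush (not Hive's code): keep frequently hit
    // keys in the hash table and evict the coldest fraction under memory
    // pressure, so hot keys keep getting partially aggregated map-side.
    class FrequencyFlushTable<K, V> {
        private final Map<K, V> values = new HashMap<>();
        private final Map<K, Long> hits = new HashMap<>();

        void put(K key, V value) {
            values.put(key, value);
            hits.merge(key, 1L, Long::sum);  // bump the hit count for this key
        }

        // Evict the least-frequently-hit `fraction` of entries; the caller
        // would emit the returned partial aggregates downstream.
        List<Map.Entry<K, V>> flushColdest(double fraction) {
            List<K> cold = new ArrayList<>(values.keySet());
            cold.sort(Comparator.comparing(hits::get));  // coldest keys first
            int n = (int) (cold.size() * fraction);
            List<Map.Entry<K, V>> flushed = new ArrayList<>(n);
            for (K k : cold.subList(0, n)) {
                flushed.add(Map.entry(k, values.remove(k)));
                hits.remove(k);
            }
            return flushed;
        }
    }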
From: Namit Jain [mailto:nj...@facebook.com]
Sent: Friday, February 27, 2009 1:59 PM
To: hive-user@hadoop.apache.org
It dumps 10% of the hash table randomly today.
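(Illustratively, a random 10% flush looks something like the following Java sketch - not the actual Hive code; names are made up:)

    import java.util.AbstractMap;
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;
    import java.util.Random;

    class RandomFlush {
        // Evict roughly 10% of the table at random; the caller would emit
        // the returned partial aggregates downstream before continuing.
        static <K, V> List<Map.Entry<K, V>> flushTenth(Map<K, V> table, Random rnd) {
            List<Map.Entry<K, V>> flushed = new ArrayList<>();
            Iterator<Map.Entry<K, V>> it = table.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<K, V> e = it.next();
                if (rnd.nextDouble() < 0.10) {
                    // copy before removal; iterator entries are unstable after remove()
                    flushed.add(new AbstractMap.SimpleEntry<>(e.getKey(), e.getValue()));
                    it.remove();
                }
            }
            return flushed;
        }
    }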
From: Scott Carey [mailto:sc...@richrelevance.com]
Sent: Friday, February 27, 2009 1:41 PM
To: hive-user@hadoop.apache.org
Subject: Re: Combine() optimization
Does it dump all contents and start over, or use an LRU or MFU algorithm?
LinkedHashMap makes LRUs and similar constructs fairly easy to build.
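(For example, a minimal access-ordered LinkedHashMap LRU - the class name and capacity are made up:)

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Minimal LRU via LinkedHashMap: access ordering plus an eviction hook.
    class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true);  // accessOrder = true keeps hot keys last
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity;  // drop the least recently used entry
        }
    }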
My guess is that most data types have biased value distributions that will
benefit from map-side partial aggregation fairly well.
On 2/26/09 6:02 PM
add file adds the files to the distributed cache. It's the same as the -files
option in hadoop streaming (and Hadoop in general), so you can use this option.
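(For example - table, column, and script names are made up:)

    hive> add file /tmp/my_script.py;
    hive> select transform(line) using 'python my_script.py' as word from docs;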
From: Min Zhou [coderp...@gmail.com]
Sent: Thursday, February 26, 2009 5:53 PM
To: hive-user@hadoop.apache.org
Look at the patch for
http://issues.apache.org/jira/browse/HIVE-223
It has not been committed yet.
Thanks,
-namit
From: Qing Yan [qing...@gmail.com]
Sent: Friday, February 27, 2009 12:12 AM
To: hive-user@hadoop.apache.org
Subject: Re: Combine() optimization
Ouch, I was getting tons of exceptions after turning on map-side
aggregation:
java.lang.OutOfMemoryError: Java heap space
at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:232)
at java.lang.StringCoding.encode(StringCoding.java:272)
at java.lang.String.getBytes(String.java:947)
at
o