Re: sorting in hive -- general

2015-03-08 Thread Alexander Pivovarov
1. sort by - keys are distributed according to the MR partitioner (controlled by distribute by in Hive). Let's assume the hash partitioner uses the same column as sort by and uses the x mod 16 formula to get the reducer id. Reducer 0 will have keys 0, 16, 32; reducer 1 will have keys 1, 17, 33. If you merge reducer
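
The x mod 16 example above can be simulated in a few lines of Python. This is only a sketch of the idea, not Hive's actual partitioner: the function names are hypothetical, the keys are assumed to be the integers 0 to 47, and "merging" is assumed to mean plain concatenation of the reducer outputs in reducer-id order.

```python
from collections import defaultdict

NUM_REDUCERS = 16  # taken from the "x mod 16" example in the message


def partition_and_sort(keys, num_reducers=NUM_REDUCERS):
    """Send each key to reducer (key mod num_reducers), then sort per reducer."""
    buckets = defaultdict(list)
    for key in keys:
        buckets[key % num_reducers].append(key)
    # Each reducer sorts only the keys it received (the "sort by" step).
    return {reducer: sorted(values) for reducer, values in buckets.items()}


def concatenate(outputs):
    """Append reducer outputs in reducer-id order, without any merge sort."""
    combined = []
    for reducer in sorted(outputs):
        combined.extend(outputs[reducer])
    return combined


# Assumed input: integer keys 0..47, so every reducer receives three keys.
outputs = partition_and_sort(range(48))
combined = concatenate(outputs)

print(outputs[0])                    # [0, 16, 32]
print(outputs[1])                    # [1, 17, 33]
print(combined[:6])                  # [0, 16, 32, 1, 17, 33]
print(combined == sorted(combined))  # False
```

Each reducer's output is sorted on its own, but the concatenated result is not globally ordered, which is the difference from order by.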

Re: sorting in hive -- general

2015-03-08 Thread max scalf
Thank you very much for the explanation, Alexander. On Sun, Mar 8, 2015 at 1:14 PM, Alexander Pivovarov apivova...@gmail.com wrote: 1. sort by - keys are distributed according to the MR partitioner (controlled by distribute by in Hive). Let's assume the hash partitioner uses the same column as sort

Re: sorting in hive -- general

2015-03-08 Thread max scalf
Thank you, Alexander. So is it fair to assume that when sort by is used and multiple files are produced per reducer, at the end of it all of them are put together/merged to get the results back? And can sort by be used without distribute by and be expected to give the same result as order by? On Sat, Mar 7, 2015 at

sorting in hive -- general

2015-03-07 Thread max scalf
Hello all, I am new to Hadoop and Hive in general. I am reading Hadoop: The Definitive Guide by Tom White, and on page 504, in the Hive chapter, Tom says the following with regard to sorting: *Sorting and Aggregating* *Sorting data in Hive can be achieved by using a standard ORDER BY clause. ORDER BY

Re: sorting in hive -- general

2015-03-07 Thread Alexander Pivovarov
A sort by query produces multiple independent files; order by produces just one file. Usually sort by is used with distribute by. In older Hive versions (0.7) they might be used to implement a local sort within a partition, similar to RANK() OVER (PARTITION BY A ORDER BY B). On Sat, Mar 7, 2015 at 3:02 PM,
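
To illustrate the "local sort within partition" idea, here is a rough Python sketch of what RANK() OVER (PARTITION BY A ORDER BY B) computes. The sample rows and the function name are made up for illustration. Sorting by (a, b) stands in for distributing on A and sorting within each reducer, so this is an analogy under those assumptions rather than how Hive executes the query.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows of (a, b); only the column names A and B come from the message.
rows = [("x", 30), ("y", 5), ("x", 10), ("y", 7), ("x", 10)]


def rank_within_partition(rows):
    """Return (a, b, rank) tuples, ranking b within each distinct value of a."""
    ranked = []
    # Sorting by (a, b) makes rows with equal a adjacent and ordered by b,
    # which is the effect of the local sort inside one partition.
    ordered = sorted(rows, key=itemgetter(0, 1))
    for a, group in groupby(ordered, key=itemgetter(0)):
        previous_b = None
        rank = 0
        for position, (_, b) in enumerate(group, start=1):
            # RANK() semantics: ties share a rank, and the next rank skips ahead.
            if b != previous_b:
                rank = position
                previous_b = b
            ranked.append((a, b, rank))
    return ranked


for row in rank_within_partition(rows):
    print(row)
```

Once the rows of one partition arrive sorted, the rank can be assigned in a single pass, which is why a local sort was enough before window functions were available.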