1. SORT BY -
keys are distributed according to the MR (MapReduce) partitioner, which is
controlled by DISTRIBUTE BY in Hive.
Let's assume the hash partitioner uses the same column as SORT BY and uses
the formula x mod 16 to get the reducer id.
reducer 0 will have keys
0
16
32
reducer 1 will have keys
1
17
33
if you merge reducer
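The reducer example above can be simulated in a few lines of Python. This is a sketch under stated assumptions, not Hive's implementation: keys 0 to 39, 16 reducers, and a hypothetical `partition` function computing key mod 16. Each inner list stands for one reducer's locally sorted output file.

```python
import heapq

NUM_REDUCERS = 16  # assumption: 16 reducers, matching the x mod 16 example


def partition(key: int, num_reducers: int = NUM_REDUCERS) -> int:
    """Hypothetical hash partitioner: reducer id = key mod num_reducers."""
    return key % num_reducers


def sort_by(keys, num_reducers=NUM_REDUCERS):
    """Simulate DISTRIBUTE BY key SORT BY key: route each key to a reducer,
    then sort locally inside each reducer. Each list is one output file."""
    reducers = [[] for _ in range(num_reducers)]
    for k in keys:
        reducers[partition(k, num_reducers)].append(k)
    return [sorted(r) for r in reducers]


def order_by(keys):
    """Simulate ORDER BY: a single reducer sees every key and sorts them all."""
    return sorted(keys)


keys = list(range(40))
files = sort_by(keys)
print(files[0])  # [0, 16, 32]
print(files[1])  # [1, 17, 33]

# Simply concatenating the per-reducer files is not globally sorted...
concatenated = [k for f in files for k in f]
print(concatenated == order_by(keys))  # False

# ...but a k-way merge of the locally sorted files is.
merged = list(heapq.merge(*files))
print(merged == order_by(keys))  # True
```

Each file is sorted on its own, so a k-way merge (here `heapq.merge`) is enough to recover the total order that a single ORDER BY reducer would produce.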
Thank you very much for the explanation Alexander.
On Sun, Mar 8, 2015 at 1:14 PM, Alexander Pivovarov apivova...@gmail.com
wrote:
1. SORT BY -
keys are distributed according to the MR (MapReduce) partitioner, which is
controlled by DISTRIBUTE BY in Hive.
Let's assume the hash partitioner uses the same column as SORT
Thank you Alexander. So is it fair to assume that when SORT BY is used and
multiple files are produced, one per reducer, all of them are put
together/merged at the end to get the results back?
And can SORT BY be used without DISTRIBUTE BY and still give the same result
as ORDER BY?
On Sat, Mar 7, 2015 at
Hello all,
I am new to Hadoop and Hive in general. I am reading Hadoop: The Definitive
Guide by Tom White, and on page 504, in the Hive chapter, Tom says the
following with regard to sorting:
*Sorting and Aggregating*
*Sorting data in Hive can be achieved by using a standard ORDER BY clause.
ORDER BY
A SORT BY query produces multiple independent files, one per reducer.
ORDER BY produces just one file.
SORT BY is usually used together with DISTRIBUTE BY.
In older Hive versions (0.7), the two could be combined to implement a local
sort within a partition, similar to RANK() OVER (PARTITION BY A ORDER BY B).
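The older local-sort pattern can be sketched in Python under stated assumptions: rows are `(a, b)` pairs, there are 2 reducers, and a hypothetical partitioner routes each row by `a mod 2`. DISTRIBUTE BY a sends every row with the same `a` to the same reducer, SORT BY a, b orders rows inside each reducer, and one pass per reducer then assigns RANK()-style numbers. The ranking pass stands in for whatever per-reducer logic (for example a custom reduce script) was used; that part is an assumption, not something stated in the thread.

```python
from itertools import groupby

NUM_REDUCERS = 2  # assumption: 2 reducers
_MISSING = object()  # sentinel so the first b in a partition always starts rank 1

rows = [(1, 30), (2, 10), (1, 10), (2, 50), (1, 30), (2, 20)]


def distribute_sort(rows, num_reducers):
    """DISTRIBUTE BY a SORT BY a, b: route by a (hypothetical a mod n
    partitioner), then sort each reducer's rows by (a, b)."""
    reducers = [[] for _ in range(num_reducers)]
    for a, b in rows:
        reducers[a % num_reducers].append((a, b))
    return [sorted(r) for r in reducers]


def rank_within_partition(sorted_rows):
    """Emulate RANK() OVER (PARTITION BY a ORDER BY b) on one reducer's
    sorted rows: ties share a rank, and the next distinct b skips ahead."""
    out = []
    for a, group in groupby(sorted_rows, key=lambda r: r[0]):
        rank, prev_b = 0, _MISSING
        for position, (_, b) in enumerate(group, start=1):
            if b != prev_b:
                rank, prev_b = position, b
            out.append((a, b, rank))
    return out


ranked = [
    row
    for reducer_rows in distribute_sort(rows, NUM_REDUCERS)
    for row in rank_within_partition(reducer_rows)
]
for row in sorted(ranked):
    print(row)
```

This only works because DISTRIBUTE BY a guarantees that all rows for one value of `a` are on the same reducer, so each partition is contiguous after the local sort.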
On Sat, Mar 7, 2015 at 3:02 PM,