sorting in hive -- general

max scalf Sat, 07 Mar 2015 15:03:37 -0800

Hello all,

I am new to Hadoop and Hive in general. I am reading "Hadoop: The
Definitive Guide" by Tom White, and on page 504, in the Hive chapter, Tom
says the following with regard to sorting:


*Sorting and Aggregating*
*Sorting data in Hive can be achieved by using a standard ORDER BY clause.
ORDER BY performs a parallel total sort of the input (like that described
in “Total Sort” on page 261). When a globally sorted result is not
required—and in many cases it isn’t—you can use Hive’s nonstandard
extension, SORT BY, instead. SORT BY produces a sorted file per reducer.*


My question is: what exactly does he mean by a "globally sorted result"? If
the SORT BY operation produces a sorted file per reducer, does that mean
that at the end of the sort all the reducers' outputs are put back together
to give the correct result?
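
To make the question concrete, here is a small sketch of how I currently
picture the difference. It is plain Python, not Hive: the lists stand in for
reducers, the function names sort_by and order_by are made up for
illustration, and the round-robin distribution of rows is an assumption (Hive
decides for itself which reducer gets which row). If I read the passage
correctly, each reducer's file is sorted on its own, but concatenating the
files does not necessarily give one sorted sequence.

```python
# Toy model only: Python lists stand in for reducers. This is not Hive code.

def sort_by(rows, num_reducers):
    """Mimic SORT BY: spread rows over reducers, then sort each reducer's rows."""
    partitions = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        # Round-robin assignment is an assumption made for this sketch;
        # Hive chooses the reducer for each row differently.
        partitions[i % num_reducers].append(row)
    # Each "reducer" sorts only the rows it received.
    return [sorted(p) for p in partitions]

def order_by(rows):
    """Mimic ORDER BY: one total ordering over all rows."""
    return sorted(rows)

rows = [5, 1, 4, 2, 3, 6]

per_reducer = sort_by(rows, 2)
concatenated = [r for part in per_reducer for r in part]
total = order_by(rows)

print(per_reducer)   # [[3, 4, 5], [1, 2, 6]]  each file is sorted
print(concatenated)  # [3, 4, 5, 1, 2, 6]      but not sorted overall
print(total)         # [1, 2, 3, 4, 5, 6]      globally sorted
```

So in this toy model "globally sorted" would mean the last line, and SORT BY
would only promise the first one.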
