Re: Where does Hive do sampling for ORDER BY?

2015-03-02 Thread Xuefu Zhang
There is no sampling for ORDER BY in Hive. Hive uses a single reducer for ORDER BY (if you're talking about the MR execution engine). Hive on Spark is different in this respect, though.

Thanks,
Xuefu

On Mon, Mar 2, 2015 at 2:17 AM, Jeff Zhang wrote:
> Order by usually invoke 2 steps (sampling job and re…
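To illustrate why a single reducer needs no sampling, here is a minimal Python sketch that simulates a MapReduce-style shuffle. The names `hash_partition` and `order_by` are hypothetical, and hash partitioning is only an assumed stand-in for a job's real partitioner; this is not Hive's code.

```python
from typing import List


def hash_partition(rows: List[int], num_reducers: int) -> List[List[int]]:
    """Hypothetical stand-in for the MR shuffle: route each key to a reducer by hash."""
    partitions: List[List[int]] = [[] for _ in range(num_reducers)]
    for key in rows:
        partitions[hash(key) % num_reducers].append(key)
    return partitions


def order_by(rows: List[int], num_reducers: int) -> List[int]:
    """Each reducer sorts its own input; reducer outputs are concatenated in reducer order."""
    output: List[int] = []
    for part in hash_partition(rows, num_reducers):
        output.extend(sorted(part))
    return output


rows = [42, 7, 19, 3, 88, 25, 61, 10]
print(order_by(rows, 1))  # [3, 7, 10, 19, 25, 42, 61, 88]: one reducer sees every row
print(order_by(rows, 3))  # [3, 42, 7, 10, 19, 25, 61, 88]: sorted per reducer only
```

With one reducer, the local sort is already a global sort, so no split points have to be chosen; the cost is that all rows pass through a single task.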

Where does Hive do sampling for ORDER BY?

2015-03-02 Thread Jeff Zhang
ORDER BY usually involves two steps (a sampling job and a repartition job), but Hive runs only one MR job for ORDER BY, so I'm wondering when and where Hive does the sampling. On the client side?

-- 
Best Regards

Jeff Zhang
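The two-step approach described above can be sketched as follows: a first pass samples the input to choose split points, and a second pass range-partitions the rows so each reducer receives a contiguous key range; sorting each partition and concatenating them in order yields a total order. This is a minimal Python sketch under those assumptions, with hypothetical function names. It mirrors the idea behind Hadoop's `InputSampler` and `TotalOrderPartitioner` but does not use their API and is not Hive's implementation.

```python
import bisect
import random
from typing import List


def choose_split_points(rows: List[int], num_reducers: int,
                        sample_size: int, seed: int = 0) -> List[int]:
    """Step 1 (the "sampling job"): pick num_reducers - 1 split keys from a random sample."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(rows, min(sample_size, len(rows))))
    if not sample:
        return []
    # Evenly spaced quantiles of the sorted sample; non-decreasing by construction.
    return [sample[(i * len(sample)) // num_reducers] for i in range(1, num_reducers)]


def range_partition(rows: List[int], splits: List[int]) -> List[List[int]]:
    """Step 2 (the "repartition job"): partition i receives keys in [splits[i-1], splits[i])."""
    partitions: List[List[int]] = [[] for _ in range(len(splits) + 1)]
    for key in rows:
        partitions[bisect.bisect_right(splits, key)].append(key)
    return partitions


def total_order_sort(rows: List[int], num_reducers: int, sample_size: int = 100) -> List[int]:
    splits = choose_split_points(rows, num_reducers, sample_size)
    output: List[int] = []
    # Partitions cover disjoint, increasing key ranges, so sorting each one
    # and concatenating them in order gives a globally sorted result.
    for part in range_partition(rows, splits):
        output.extend(sorted(part))
    return output


print(total_order_sort([42, 7, 19, 3, 88, 25, 61, 10], num_reducers=3))
# [3, 7, 10, 19, 25, 42, 61, 88]
```

The sample only needs to be roughly representative: poorly chosen split points make some reducers do more work than others, but they never break the final order.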