Hi Lefty, Couldn¹t find the documentation for what hive.groupby.orderby.position.alias=true does.
I suspect that might be what Mich was looking for (though I tend to write the column names explicitly). Cheers, Gopal From: Lefty Leverenz <leftylever...@gmail.com> Reply-To: "user@hive.apache.org" <user@hive.apache.org> Date: Sunday, March 29, 2015 at 8:32 PM To: "user@hive.apache.org" <user@hive.apache.org> Subject: Re: ORDER BY clause in Hive I added information about this in the Order By <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#Lang uageManualSortBy-SyntaxofOrderBy> section of the wiki. Thanks, Mich and Gopal! -- Lefty On Sun, Mar 29, 2015 at 5:24 AM, Mich Talebzadeh <m...@peridale.co.uk> wrote: > Hi Lefty, Gopal, > > It appears that ORDER BY 3 is not interpreted as ORDERR BY ³the result set > column three² which standard SQL as evident from Oracle and Sybase does. > > So I made it an ORDER BY from the result set EXPLICITELY as shown below and it > worked OK > > SELECT > rs.Customer_ID > , rs.Number_of_orders > , rs.Total_customer_amount > , rs.Average_order > , rs.Standard_deviation > FROM > ( > SELECT cust_id AS Customer_ID, > COUNT(amount_sold) AS Number_of_orders, > SUM(amount_sold) AS Total_customer_amount, > AVG(amount_sold) AS Average_order, > STDDEV(amount_sold) AS Standard_deviation > FROM sales > GROUP BY cust_id > HAVING SUM(amount_sold) > 94000 > AND AVG(amount_sold) < STDDEV(amount_sold) > ) rs > ORDER BY > rs.Total_customer_amount > ; > > > +-----------------+----------------------+---------------------------+-------- > -------------+------------------------+--+ > | rs.customer_id | rs.number_of_orders | rs.total_customer_amount | > rs.average_order | rs.standard_deviation | > +-----------------+----------------------+---------------------------+-------- > -------------+------------------------+--+ > | 1743.0 | 238 | 94786.12999999993 | > 398.2610504201678 | 581.0439095219863 | > | 429.0 | 231 | 94819.41000000006 | > 410.4736363636366 | 613.7057080691426 | > | 2994.0 | 227 | 94862.61000000006 | > 417.89696035242315 | 623.1607772763742 | > | 6395.0 | 268 | 97010.47999999998 | > 361.97940298507456 | 576.9120977984521 | > | 12783.0 | 240 | 97573.54999999996 | > 406.5564583333332 | 590.4445500393804 | > | 4974.0 | 235 | 98006.16000000002 | > 417.0474893617022 | 624.337482834059 | > | 42167.0 | 266 | 98585.96000000002 | > 370.6239097744362 | 590.965120684093 | > | 10747.0 | 256 | 99578.08999999997 | > 388.9769140624999 | 600.7615005975689 | > | 11407.0 | 248 | 103412.65999999995 | > 416.9865322580643 | 622.221465710723 | > +-----------------+----------------------+---------------------------+-------- > -------------+------------------------+--+ > 9 rows selected (209.699 seconds) > > > Regards, > > > Mich Talebzadeh > > http://talebzadehmich.wordpress.com > > Publications due shortly: > Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and > Coherence Cache > > NOTE: The information in this email is proprietary and confidential. This > message is for the designated recipient only, if you are not the intended > recipient, you should destroy it immediately. Any information in this message > shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries > or their employees, unless expressly so stated. It is the responsibility of > the recipient to ensure that this email is virus free, therefore neither > Peridale Ltd, its subsidiaries nor their employees accept any responsibility. > > > From: Mich Talebzadeh [mailto:m...@peridale.co.uk] > Sent: 29 March 2015 00:11 > To: user@hive.apache.org > Subject: ORDER BY clause in Hive > > > Hi, > > Can someone point me to doc or otherwise to see if ORDER BY clause in Hive is > working OK > > I have a simple aggregate query as follows: > > SELECT cust_id AS Customer_ID, > COUNT(amount_sold) AS Number_of_orders, > SUM(amount_sold) AS Total_customer_amount, > AVG(amount_sold) AS Average_order, > STDDEV(amount_sold) AS Standard_deviation > FROM sales > GROUP BY cust_id > HAVING SUM(amount_sold) > 94000 > AND AVG(amount_sold) < STDDEV(amount_sold) > ORDER BY 3 ; > > The original table and data are from Oracle sh.sales table > > Oracle comes back for this query with > > CUSTOMER_ID NUMBER_OF_ORDERS TOTAL_CUSTOMER_AMOUNT AVERAGE_ORDER > STANDARD_DEVIATION > ----------- ---------------- --------------------- ------------- > ------------------ > 1743 238 94786.13 398.26105 > 582.26845 > 429 231 94819.41 410.473636 > 615.038404 > 2994 227 94862.61 417.89696 > 624.53793 > 6395 268 97010.48 361.979403 > 577.991448 > 12783 240 97573.55 406.556458 > 591.6785 > 4974 235 98006.16 417.047489 > 625.670115 > 42167 266 98585.96 370.62391 > 592.079099 > 10747 256 99578.09 388.976914 > 601.938312 > 11407 248 103412.66 416.986532 > 623.479751 > > 9 rows selected. > > Ordered by TOTAL_CUSTOMER_AMOUNT > > And hive returns for the same query > > +--------------+-------------------+------------------------+----------------- > ----+---------------------+--+ > | customer_id | number_of_orders | total_customer_amount | average_order > | standard_deviation | > +--------------+-------------------+------------------------+----------------- > ----+---------------------+--+ > | 42167.0 | 266 | 98585.96000000002 | > 370.6239097744362 | 590.965120684093 | > | 12783.0 | 240 | 97573.54999999996 | > 406.5564583333332 | 590.4445500393804 | > | 11407.0 | 248 | 103412.65999999995 | > 416.9865322580643 | 622.221465710723 | > | 10747.0 | 256 | 99578.08999999997 | > 388.9769140624999 | 600.7615005975689 | > | 6395.0 | 268 | 97010.47999999998 | > 361.97940298507456 | 576.9120977984521 | > | 4974.0 | 235 | 98006.16000000002 | > 417.0474893617022 | 624.337482834059 | > | 2994.0 | 227 | 94862.61000000006 | > 417.89696035242315 | 623.1607772763742 | > | 1743.0 | 238 | 94786.12999999993 | > 398.2610504201678 | 581.0439095219863 | > | 429.0 | 231 | 94819.41000000006 | > 410.4736363636366 | 613.7057080691426 | > +--------------+-------------------+------------------------+----------------- > ----+---------------------+--+ > 9 rows selected (215.774 seconds) > > But ordering in Hive does not seem to be correct! Please note ordering is on > column three, total_customer_amount > > I also tried this in Sybase and got the same as Oracle. > > Adaptive Server cpu time: 100 ms. > Customer_ID Number_of_orders Total_customer_amount > Average_order Standard_deviation > -------------------- ---------------- > ----------------------------------------- > ---------------------------------------------------- > --------------------------- > 1743 238 > 94786.13 398.2610504201680 > 582.268450 > 429 231 > 94819.41 410.4736363636363 > 615.038404 > 2994 227 > 94862.61 417.8969603524229 > 624.537930 > 6395 268 > 97010.48 361.9794029850746 > 577.991448 > 12783 240 > 97573.55 406.5564583333333 > 591.678500 > 4974 235 > 98006.16 417.0474893617021 > 625.670115 > 42167 266 > 98585.96 370.6239097744360 > 592.079099 > 10747 256 > 99578.09 388.9769140625000 > 601.938312 > 11407 248 > 103412.66 416.9865322580645 > 623.479751 > > I tried Google search and seems to be different suggestions. May be I have to > rewrite the code? > > Thanks > > Mich Talebzadeh > > http://talebzadehmich.wordpress.com > > Publications due shortly: > Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and > Coherence Cache > > NOTE: The information in this email is proprietary and confidential. This > message is for the designated recipient only, if you are not the intended > recipient, you should destroy it immediately. Any information in this message > shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries > or their employees, unless expressly so stated. It is the responsibility of > the recipient to ensure that this email is virus free, therefore neither > Peridale Ltd, its subsidiaries nor their employees accept any responsibility. >