Hi,

I reported back in April 2015 that what Hive calls Standard Deviation
Function  STDDEV is a pointer to STDDEV_POP. This is incorrect and has not
been rectified in Hive 2

Both Oracle and Sybase point STDDEV to STDDEV_SAMP not STDDEV_POP. Also I
did tests with Spark 1.6 as well and Spark correctly points STTDEV to
STDDEV_SAMP.

The following query was used

SELECT

SQRT((SUM(POWER(AMOUNT_SOLD,2))-(COUNT(1)*POWER(AVG(AMOUNT_SOLD),2)))/(COUNT(1)-1))
AS MYSTDDEV,
        STDDEV(amount_sold) AS STDDEV,
        STDDEV_SAMP(amount_sold) AS STDDEV_SAMP,
        STDDEV_POP(amount_sold) AS STDDEV_POP
from    sales;

The following is from running the above query on Hive where STDDEV -->
 STDDEV_POP which is incorrect

+--------------------+---------------------+--------------------+---------------------+--+
|      mystddev      |       stddev        |    stddev_samp     |
stddev_pop      |
+--------------------+---------------------+--------------------+---------------------+--+
| 260.7270919450411  | 260.72704617040444  | 260.7270722861465  |
260.72704617040444
|
+--------------------+---------------------+--------------------+---------------------+--+

The following is from Spark-sql where STDDEV -->  STDDEV_SAMP which is
correct

+--------------------+---------------------+--------------------+---------------------+--+
|      mystddev      |       stddev        |    stddev_samp     |
stddev_pop      |
+--------------------+---------------------+--------------------+---------------------+--+
| 260.7270919450411  | 260.7270722861637   | 260.7270722861637  |
260.72704617042166  |
+--------------------+---------------------+--------------------+---------------------+--+

Hopefully The Hive one will be corrected.

Thanks


Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com

Reply via email to