I know stddev_samp and stddev_pop gives different values, because they have different definition. What I want to know is why stddev_samp gives "NaN", and not a numeric value.
On Thu, Jul 7, 2016 at 5:39 PM, Sean Owen <so...@cloudera.com> wrote: > I don't think that's relevant here. The question is why would samp > give a different result to pop, not the result of "stddev". Neither > one is a 'correct' definition of standard deviation in the abstract; > one or the other is correct depending on what standard deviation you > are trying to measure. > > On Thu, Jul 7, 2016 at 9:37 AM, Mich Talebzadeh > <mich.talebza...@gmail.com> wrote: >> The correct STDDEV function used is STDDEV_SAMP not STDDEV_POP. That is the >> correct one. >> >> You can actually work that one out yourself >> >> >> BTW Hive also gives a wrong value. This is what I reported back in April >> about Hive giving incorrect value >> >> Both Oracle and Sybase point STDDEV to STDDEV_SAMP not STDDEV_POP. Also I >> did tests with Spark 1.6 as well. Spark correctly points STTDEV to >> STDDEV_SAMP. >> >> The following query was used >> >> SELECT >> >> SQRT((SUM(POWER(AMOUNT_SOLD,2))-(COUNT(1)*POWER(AVG(AMOUNT_SOLD),2)))/(COUNT(1)-1)) >> AS MYSTDDEV, >> STDDEV(amount_sold) AS STDDEV, >> STDDEV_SAMP(amount_sold) AS STDDEV_SAMP, >> STDDEV_POP(amount_sold) AS STDDEV_POP >> from sales; >> >> The following is from running the above query on Hive where STDDEV --> >> STDDEV_POP which is incorrect >> >> >> +--------------------+---------------------+--------------------+---------------------+--+ >> | mystddev | stddev | stddev_samp | >> stddev_pop | >> +--------------------+---------------------+--------------------+---------------------+--+ >> | 260.7270919450411 | 260.72704617040444 | 260.7270722861465 | >> 260.72704617040444 | >> +--------------------+---------------------+--------------------+---------------------+--+ >> >> The following is from Spark-sql where STDDEV --> STDDEV_SAMP which is >> correct >> >> +--------------------+---------------------+--------------------+---------------------+--+ >> | mystddev | stddev | stddev_samp | >> stddev_pop | >> +--------------------+---------------------+--------------------+---------------------+--+ >> | 260.7270919450411 | 260.7270722861637 | 260.7270722861637 | >> 260.72704617042166 | >> +--------------------+---------------------+--------------------+---------------------+--+ >> >> HTH >> >> >> >> >> >> >> Dr Mich Talebzadeh >> >> >> >> LinkedIn >> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >> >> >> >> http://talebzadehmich.wordpress.com >> >> >> Disclaimer: Use it at your own risk. Any and all responsibility for any >> loss, damage or destruction of data or any other property which may arise >> from relying on this email's technical content is explicitly disclaimed. The >> author will in no case be liable for any monetary damages arising from such >> loss, damage or destruction. >> >> >> >> >> On 7 July 2016 at 09:29, Sean Owen <so...@cloudera.com> wrote: >>> >>> No, because these are different values defined differently. If you >>> have 1 data point, the sample stdev is undefined while population >>> stdev is defined. Refer to their definition. >>> >>> On Thu, Jul 7, 2016 at 9:23 AM, Mungeol Heo <mungeol....@gmail.com> wrote: >>> > Hello, >>> > >>> > As I mentioned at the title, stddev_samp function gives a NaN while >>> > stddev_pop gives a numeric value on the same data. >>> > The stddev_samp function will give a numeric value, if I cast it to >>> > decimal. >>> > E.g. cast(stddev_samp(column_name) as decimal(16,3)) >>> > Is it a bug? >>> > >>> > Thanks >>> > >>> > - mungeol >>> > >>> > --------------------------------------------------------------------- >>> > To unsubscribe e-mail: user-unsubscr...@spark.apache.org >>> > >>> >>> --------------------------------------------------------------------- >>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org >>> >> --------------------------------------------------------------------- To unsubscribe e-mail: user-unsubscr...@spark.apache.org