Re: stddev_samp() gives NaN

Sean Owen Thu, 07 Jul 2016 01:57:07 -0700

The OP is not calling stddev though, so I still don't see that this is
the question at hand.

But while we're off on the topic -- while I certainly agree that
stddev is mapped to the sample standard deviation in DBs, it doesn't
actually make much sense as a default.

What you get back is not the standard deviation (as in, sqrt of second
central moment) of the values in the grouping or table, which is I
presume what people think they're getting.

You're getting an estimate of the standard deviation of a population
from which the values are theoretically a random sample, but that's
rarely true. I disagree that this is the general use case, so I have
always thought this was just a historical practice in RDBMSes that
was actually not a good decision.
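
To make the distinction concrete, here is a minimal Python sketch. It
is not Spark code: the function names only mirror the SQL ones, and the
data is made up. It assumes the usual definitions, dividing by n for
the population figure and by n - 1 for the sample estimate. Returning
NaN for a single value is an illustration of the 0/0 that the n - 1
divisor produces, not a claim about how any particular engine behaves.

```python
import math


def stddev_pop(values):
    """Square root of the second central moment of the values themselves."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def stddev_samp(values):
    """Estimate of a population's standard deviation, treating the values
    as a random sample from it (divides by n - 1 instead of n)."""
    n = len(values)
    if n < 2:
        # Assumption for illustration: with one value the formula is 0/0,
        # so report NaN rather than raise ZeroDivisionError.
        return float("nan")
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data

print(stddev_pop(values))   # spread of exactly these eight values
print(stddev_samp(values))  # slightly larger: the n - 1 correction
print(stddev_samp([3.0]))   # single value: no sample estimate exists
```

The two results differ by a factor of sqrt(n / (n - 1)), so the gap
matters most for small groups and vanishes as n grows.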

Maybe that's why Hive defined it differently, but, even I would prefer
consistency in this regard.

On Thu, Jul 7, 2016 at 9:41 AM, Mich Talebzadeh
<mich.talebza...@gmail.com> wrote:
> stddev is mapped to stddev_samp. That is the general use case or rather
> common use of standard deviation.
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed. The
> author will in no case be liable for any monetary damages arising from such
> loss, damage or destruction.
