PySpark follows SQL databases here. stddev is an alias for stddev_samp, the sample standard deviation, which applies Bessel's correction (n-1 in the denominator). stddev_pop is the population standard deviation, with n in the denominator.
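
To make the two denominators concrete, here is a minimal plain-Python sketch. The helper names stddev_samp and stddev_pop are borrowed from the SQL function names for illustration only; this is not Spark's implementation, and the data values are made up. Running the same numbers through Excel should show which of the two formulas a given Excel function matches.

```python
import math
import statistics


def stddev_samp(values):
    """Sample standard deviation: Bessel's correction, n - 1 in the denominator."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


def stddev_pop(values):
    """Population standard deviation: n in the denominator."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / n)


# Illustrative data only (not from the original question).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

sample_sd = stddev_samp(data)
population_sd = stddev_pop(data)

# Cross-check against the standard library's implementations.
assert math.isclose(sample_sd, statistics.stdev(data))
assert math.isclose(population_sd, statistics.pstdev(data))

print(f"stddev_samp = {sample_sd:.6f}")
print(f"stddev_pop  = {population_sd:.6f}")
```

The two results differ by a factor of sqrt(n / (n - 1)), so the gap is noticeable on small samples and shrinks as n grows. If the Excel figure matches the stddev_pop value rather than the stddev_samp one, the spreadsheet is probably using a population formula.
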

On Tue, Sep 19, 2023 at 7:13 AM Helene Bøe <helene.b...@hydro.com.invalid> wrote:

> Hi!
>
> I am applying the stddev function (so actually stddev_samp), however when
> comparing with the sample standard deviation in Excel the results do not
> match.
>
> I cannot find in your documentation any more specifics on how the sample
> standard deviation is calculated, so I cannot compare the difference toward
> Excel, which uses
>
> .
>
> I am trying to avoid using Excel at all costs, but if the stddev_samp
> function is not calculating the standard deviation correctly I have a
> problem.
>
> I hope you can help me resolve this issue.
>
> Kindest regards,
>
> *Helene Bøe*
> *Graduate Project Engineer*
> Recycling Process & Support