> So, it seems like, at best, the docs could mention which kind of standard
> deviation is in use and why it probably doesn't matter. Anyway, I learned a
> lot about scaling!
Pull Requests to improve the documentation are always very much appreciated :)
Yes, I just realized that it doesn't work out unless you divide by the std.
It seems like the choice between the population and the sample standard
deviation is not important in this case, since it's not easy to get an
unbiased sample std anyway.
I came across some other techniques for scaling described in section "Class
On Tue, Nov 6, 2012 at 4:17 PM, Doug Coleman wrote:
> Actually, from the numpy docs, the ddof=1 for np.std doesn't make it
> unbiased. There's a whole Wikipedia article on calculating the unbiased
> standard deviation, and it seems to be different for the normal distribution
> than for others and
Actually, from the numpy docs, the ddof=1 for np.std doesn't make it
unbiased. There's a whole Wikipedia article on calculating the unbiased
standard deviation, and it seems to be different for the normal
distribution than for others and involves the gamma function--the advice
from the wiki is not
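A quick numerical check of that point with plain NumPy (a rough sketch, not
scikit-learn code): ddof=1 makes the variance estimate unbiased, but its
square root still underestimates the true standard deviation on small samples.

    import numpy as np

    rng = np.random.RandomState(0)
    n = 5  # small sample size, where the bias is most visible

    # Draw many samples of size n from a normal distribution with std = 1
    # and average each estimator over all of them.
    samples = rng.randn(100000, n)
    mean_pop_std = samples.std(axis=1, ddof=0).mean()     # divides by n
    mean_sample_std = samples.std(axis=1, ddof=1).mean()  # divides by n - 1

    print(mean_pop_std)     # roughly 0.84, well below 1.0
    print(mean_sample_std)  # roughly 0.94, still biased low despite ddof=1

The correction factor that removes the remaining bias depends on the
distribution and involves the gamma function, which is the point of the
Wikipedia article mentioned above.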
2012/11/6 Olivier Grisel :
> None, False: no stdev
> True, "pop": population stdev
> "sample": sample stdev
>
> +1 but with "population" instead of "pop".
Alright :)
--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
> None, False: no stdev
> True, "pop": population stdev
> "sample": sample stdev
+1 but with "population" instead of "pop".
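If that overloading were adopted, the mapping to a ddof value could be as
simple as the sketch below (the helper name is hypothetical, not actual
scikit-learn code):

    def _with_std_to_ddof(with_std):
        # Hypothetical helper: translate the proposed with_std values into
        # (scale_by_std, ddof) for the underlying np.std call.
        if with_std in (None, False):
            return False, None          # do not divide by any stdev
        if with_std is True or with_std == "population":
            return True, 0              # population stdev (numpy's default)
        if with_std == "sample":
            return True, 1              # sample stdev
        raise ValueError('with_std must be None, False, True, "population" '
                         'or "sample", got %r' % (with_std,))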
2012/11/6 Lars Buitinck :
> 2012/11/6 Gael Varoquaux :
>> That said, I am OK adding an additional parameter, if people think that
>> it is important. The one used in numpy, "ddof", is somewhat cryptic, though.
2012/11/6 Gael Varoquaux :
> That said, I am OK adding an additional parameter, if people think that
> it is important. The one used in numpy, "ddof", is somewhat cryptic,
> though.
How about overloading with_std to take...
None, False: no stdev
True, "pop": population stdev
"sample": sample stde
On Tue, Nov 6, 2012 at 6:48 AM, Gael Varoquaux wrote:
> I am actually -1 on this, because the consequence would be that np.std(X,
> axis=-1) would no longer be one. I am afraid that it would confuse the
> users.
>
> I believe that the n/(n - 1) difference is completely irrelevant for
> machine learning purposes.
I am actually -1 on this, because the consequence would be that np.std(X,
axis=-1) would no longer be one. I am afraid that it would confuse the
users.
I believe that the n/(n - 1) difference is completely irrelevant for
machine learning purposes. If a quantity is relevant, it is the norm of
the fe
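Concretely (a small NumPy sketch, not scikit-learn code): scaling by the
sample standard deviation (ddof=1) leaves every column with a population
standard deviation of sqrt((n - 1) / n) rather than exactly 1.

    import numpy as np

    rng = np.random.RandomState(42)
    X = rng.randn(10, 3)
    n = X.shape[0]

    # Center each column and divide by the *sample* standard deviation.
    X_scaled = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    print(X_scaled.std(axis=0))    # ~0.9487 for every column, not 1.0
    print(np.sqrt((n - 1.0) / n))  # the same factor: sqrt(9/10) ~= 0.9487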
2012/11/5 Doug Coleman :
> It seems this is rarely the case in machine learning, so perhaps it would be
> better to scale using the sample standard deviation, which numpy already
> supports, or to make it a flag.
+1
Since we renamed Scaler since the last release (?), we can make
population stdev
preprocessing.Scaler calls numpy's default standard deviation, which is the
population standard deviation (delta degrees of freedom, ddof, is 0). This is
usually reserved for when you have the entire set of data.
It seems this is rarely the case in machine learning, so perhaps it would
be better to scale
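For reference, the behaviour being described boils down to a couple of lines
of NumPy (a simplified sketch, not the actual Scaler implementation):

    import numpy as np

    def scale(X, ddof=0):
        # Center each feature column and divide it by its standard deviation.
        # ddof=0 matches numpy's default (population stdev), which the current
        # scaler uses; ddof=1 would be the sample stdev suggested above.
        X = np.asarray(X, dtype=np.float64)
        return (X - X.mean(axis=0)) / X.std(axis=0, ddof=ddof)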