Re: [Scikit-learn-general] SVC documentation inaccuracy

2012-03-17 Thread Mathieu Blondel
On Sun, Mar 18, 2012 at 4:45 AM, James Bergstra wrote:
> I agree that it's a good idea to correct C for sample size when moving from a sub-problem to the full thing. I just wouldn't use the word "optimal" to describe the new value of C that you get this way - it's an extrapolation, a good

Re: [Scikit-learn-general] SVC documentation inaccuracy

2012-03-17 Thread James Bergstra
On Sat, Mar 17, 2012 at 1:51 PM, Alexandre Gramfort wrote:
>> This statement doesn't sound true. Generally hyper-parameters (especially ones to do with regularization) *do* depend on training set size, and not in such straightforward ways. Data is never perfectly I.I.D. and sometimes it

Re: [Scikit-learn-general] SVC documentation inaccuracy

2012-03-17 Thread Alexandre Gramfort
> This statement doesn't sound true. Generally hyper-parameters (especially ones to do with regularization) *do* depend on training set size, and not in such straightforward ways. Data is never perfectly I.I.D. and sometimes it can be far from it. My impression was that standard practice f

Re: [Scikit-learn-general] Interpretation of LogisticRegression coefficients in multiclass case

2012-03-17 Thread Andreas
On 03/07/2012 11:18 AM, Alexandre Gramfort wrote:
>> I love that :)
>> Then I can finally put my MLP code somewhere ;)
>>
> give it a start then.
>
> the convention should be that the gist contains 1 file with an "if __name__ == '__main__':"
> that contains an example that people can try.
I

Re: [Scikit-learn-general] SVC documentation inaccuracy

2012-03-17 Thread James Bergstra
On Sat, Mar 17, 2012 at 4:44 AM, Alexandre Gramfort wrote:
> without the scale_C the libsvm/liblinear bindings are the only models whose hyperparameters depend on the training set size.
This statement doesn't sound true. Generally hyper-parameters (especially ones to do with regularization) *

Re: [Scikit-learn-general] Is there a function for convenient binarization of categorical data?

2012-03-17 Thread Lars Buitinck
On 17 March 2012 13:25, Conrad Lee wrote:
> The Google Prediction API seems to do some of this automatic detection of whether a feature is categorical or numerical. For example, if at least one value of a feature is a string, then they treat that feature as categorical.

Re: [Scikit-learn-general] Is there a function for convenient binarization of categorical data?

2012-03-17 Thread Conrad Lee
> > We could try to create a function that takes an arbitrary matrix of feature vectors, and automatically converts the fields that appear to be categorical into boolean fields. Of course, we won't be able to write a function that always knows which fields are categorical and

Re: [Scikit-learn-general] SVC documentation inaccuracy

2012-03-17 Thread Alexandre Gramfort
Hi guys, scale_C is not released yet, and not setting it in the current release raises a warning. But maybe we could warn users even more explicitly. Right now C is None by default and defaults to n_samples, which amounts to C=1 with scale_C=False, which is the default behavior of libsvm
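
The arithmetic in the message above can be sketched roughly as follows. This is only an illustration under one assumption that the truncated message does not spell out: that scale_C=True divides the user-supplied C by n_samples before it reaches libsvm. The helper name `effective_C` is hypothetical and is not part of scikit-learn.

```python
def effective_C(C, n_samples, scale_C):
    """Hypothetical helper: the C value that would reach libsvm.

    Assumption (not confirmed by the thread): with scale_C=True the
    user-supplied C is divided by n_samples; with scale_C=False it is
    passed through unchanged.
    """
    if C is None:
        # Per the message, C=None falls back to n_samples.
        C = n_samples
    return float(C) / n_samples if scale_C else float(C)


n_samples = 200
# Default C (None -> n_samples) with scaling switched on ...
scaled_default = effective_C(None, n_samples, scale_C=True)
# ... gives the same value as C=1 with scaling switched off.
unscaled_one = effective_C(1.0, n_samples, scale_C=False)
assert scaled_default == unscaled_one == 1.0
```

Under that assumption, the C=None default reproduces the unscaled C=1 behaviour regardless of n_samples, which seems to be the point being made about matching libsvm's default.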