Re: [Scikit-learn-general] Is there a reference for the Bootstrap iterator?

2014-07-08 Thread Lars Buitinck
2014-07-08 21:33 GMT+02:00 Andreas Mueller: > On Jul 8, 2014 8:40 PM, "Chris Holdgraf" wrote: >> >> Hey all - I know that Bootstrap has a billion papers on it, but I was >> wondering if there's a specific paper one should reference if we've been >> using the Bootstrap iterator (here). I can't fin

Re: [Scikit-learn-general] How to use precomputed distance matrix in KMeans?

2014-07-08 Thread Li, Leon
Thank you Andy! On Tue, Jul 8, 2014 at 3:43 PM, Andreas Mueller wrote: > Hi Leon. > Due to the nature of the algorithm one needs to compute means. That is not > possible with a precomputed distance matrix. So using an arbitrary distance > is not really possible here. > Best, > Andy > On Jul 8,

Re: [Scikit-learn-general] How to use precomputed distance matrix in KMeans?

2014-07-08 Thread Andreas Mueller
Hi Leon. Due to the nature of the algorithm one needs to compute means. That is not possible with a precomputed distance matrix. So using an arbitrary distance is not really possible here. Best, Andy On Jul 8, 2014 9:39 PM, "Li, Leon" wrote: > Dear All, > > I'm new to scikit-learn. I don't know h
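For reference, some other scikit-learn clusterers do accept a precomputed distance matrix, unlike KMeans. A minimal sketch using DBSCAN with metric='precomputed' (a different algorithm substituted in, not a way to make KMeans do this, and with made-up data):

    import numpy as np
    from sklearn.cluster import DBSCAN
    from sklearn.metrics import pairwise_distances

    X = np.random.RandomState(0).randn(20, 3)
    D = pairwise_distances(X)  # any symmetric distance matrix works here
    labels = DBSCAN(eps=1.5, min_samples=2, metric='precomputed').fit_predict(D)
    print(labels)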

[Scikit-learn-general] How to use precomputed distance matrix in KMeans?

2014-07-08 Thread Li, Leon
Dear All, I'm new to scikit-learn. I don't know how to use a precomputed distance matrix in KMeans. Could you shed some light on it? Thanks, Leon

Re: [Scikit-learn-general] Is there a reference for the Bootstrap iterator?

2014-07-08 Thread Andreas Mueller
I would cite ESL (The Elements of Statistical Learning) as a good textbook. If you want the original, look there. On Jul 8, 2014 8:40 PM, "Chris Holdgraf" wrote: > Hey all - I know that Bootstrap has a billion papers on it, but I was > wondering if there's a specific paper one should reference if we've been > using the Bootstrap itera

[Scikit-learn-general] Is there a reference for the Bootstrap iterator?

2014-07-08 Thread Chris Holdgraf
Hey all - I know that Bootstrap has a billion papers on it, but I was wondering if there's a specific paper one should reference if we've been using the Bootstrap iterator (here). I can't find one on the page

Re: [Scikit-learn-general] RandomizedLasso and lasso_stability_path

2014-07-08 Thread Michael Eickenberg
Looking at the code, fit_intercept=False will unfortunately not prevent _randomized_lasso from centering the data. I think this should be considered an inconsistency (whether or not it is the reason for the differences you observe). Michael On Tuesday, July 8, 2014, Michael Eickenberg wrote: > T

Re: [Scikit-learn-general] RandomizedLasso and lasso_stability_path

2014-07-08 Thread Michael Eickenberg
The RandomizedLasso object fits an intercept by default, i.e. it subtracts the means of the columns of your design X. This can make it rank deficient. You can try setting fit_intercept to False, but there may very likely be other processing steps that make the two differ. Michael On Tuesday, July 8
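A minimal sketch of the comparison under discussion, assuming the 0.15-era API (RandomizedLasso and lasso_stability_path were removed from scikit-learn in later releases); the data is made up, with p < n as in Luca's setting:

    import numpy as np
    from sklearn.linear_model import RandomizedLasso, lasso_stability_path

    rng = np.random.RandomState(0)
    X = rng.randn(100, 10)                            # p < n
    y = X[:, :3].sum(axis=1) + 0.01 * rng.randn(100)

    rl = RandomizedLasso(fit_intercept=False, random_state=0).fit(X, y)
    alphas, scores_path = lasso_stability_path(X, y, random_state=0)

    print(rl.scores_)               # per-feature selection frequency
    print(scores_path.max(axis=1))  # max stability score per feature

Even with matching random_state values the two may still disagree, since _randomized_lasso centers the data regardless of fit_intercept, as noted above.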

[Scikit-learn-general] RandomizedLasso and lasso_stability_path

2014-07-08 Thread Luca Puggini
I did not, but according to the theory all the variables should be selected with probability one if p < n. ... wrote: > Hi, > RandomizedLasso and lasso_stability_path should return the same results if > used on the same data. This does not happen when the number of variables > is smaller than the number of

Re: [Scikit-learn-general] Sample weighting in RandomizedSearchCV

2014-07-08 Thread Hamed Zamani
Dear Joel, Yes. After updating scikit-learn to 0.15b2, the problem was solved. Thanks, Hamed On Tue, Jul 8, 2014 at 2:51 PM, Joel Nothman wrote: > This shouldn't be the case, though it's not altogether well-documented. > According to > https://github.com/scikit-learn/scikit-lea

Re: [Scikit-learn-general] RandomizedLasso and lasso_stability_path return different results

2014-07-08 Thread Michael Eickenberg
Did you fix the random number generator using the keyword random_state=? Otherwise this may vary statistically. Michael On Tue, Jul 8, 2014 at 6:11 PM, Luca Puggini wrote: > Hi, > RandomizedLasso and lasso_stability_path should return the same results if > used on the same data. This does no

[Scikit-learn-general] RandomizedLasso and lasso_stability_path return different results

2014-07-08 Thread Luca Puggini
Hi, RandomizedLasso and lasso_stability_path should return the same results if used on the same data. This does not happen when the number of variables is smaller than the number of samples (at least this is the situation that I have tried). According to the theory the correct result should be the

Re: [Scikit-learn-general] higher accuracy with non scaled data

2014-07-08 Thread Mathieu Blondel
On Tue, Jul 8, 2014 at 11:27 PM, Sheila the angel wrote: > First I scaled the complete data-set and then split it into test and > train data. You should not pre-process the data before splitting it. Just ask yourself how you would use your model in practice. In a real-world setting, you woul
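A minimal sketch of the split-then-scale workflow Mathieu describes, using today's import paths (the module at the time of this thread was sklearn.cross_validation):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.preprocessing import StandardScaler

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    scaler = StandardScaler().fit(X_train)  # statistics from training data only
    clf = KNeighborsClassifier().fit(scaler.transform(X_train), y_train)
    print(clf.score(scaler.transform(X_test), y_test))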

Re: [Scikit-learn-general] higher accuracy with non scaled data

2014-07-08 Thread Lars Buitinck
2014-07-08 16:27 GMT+02:00 Sheila the angel: > First I scaled the complete data-set and then split it into test and train > data. Not the cleanest option, but that should work.

Re: [Scikit-learn-general] higher accuracy with non scaled data

2014-07-08 Thread Sheila the angel
First I scaled the complete data-set and then split it into test and train data. On 8 July 2014 16:13, Lars Buitinck wrote: > 2014-07-08 16:00 GMT+02:00 Michael Eickenberg <michael.eickenb...@gmail.com>: > > That totally depends on your data. Here it looks like you are scaling > down a >

Re: [Scikit-learn-general] higher accuracy with non scaled data

2014-07-08 Thread Lars Buitinck
2014-07-08 16:00 GMT+02:00 Michael Eickenberg: > That totally depends on your data. Here it looks like you are scaling down a > feature that captures a lot of the variation you are looking for, thus > making it less important with respect to the other features in the Euclidean > distance. You coul

Re: [Scikit-learn-general] higher accuracy with non scaled data

2014-07-08 Thread Michael Eickenberg
That totally depends on your data. Here it looks like you are scaling down a feature that captures a lot of the variation you are looking for, thus making it less important with respect to the other features in the Euclidean distance. You could try selecting important features beforehand. But they
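A toy illustration of the point above, with invented numbers: rescaling a dominant feature can flip which point is nearest under the Euclidean distance.

    import numpy as np

    a = np.array([100.0, 1.0])  # feature 0 carries most of the variation
    b = np.array([0.0, 1.2])
    c = np.array([100.0, 5.0])

    # Raw data: c is much closer to a than b is.
    print(np.linalg.norm(a - b), np.linalg.norm(a - c))

    # Divide by a per-feature scale (as preprocessing.scale does with the
    # column standard deviations): now b is closer to a.
    scale = np.array([100.0, 1.0])
    print(np.linalg.norm((a - b) / scale), np.linalg.norm((a - c) / scale))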

[Scikit-learn-general] higher accuracy with non scaled data

2014-07-08 Thread Sheila the angel
While using Nearest Neighbors Classification, I am getting higher cross-validation accuracy with raw data (without scaling) compared to scaled data (using preprocessing.scale). Is this normal? When should one scale the data? Thanks, Sheila

Re: [Scikit-learn-general] Scipy optimizer problem when adding Jacobian

2014-07-08 Thread Gael Varoquaux
Hi, I believe that this is a question for the scipy mailing list. Gaël On Tue, Jul 08, 2014 at 02:44:40PM +0200, Bao Thien wrote: > Dear all, > I need to optimize a loss function and currently use some optimizers from > scipy.optimize.minimize > More detail like this: > + parameters to optimize

Re: [Scikit-learn-general] Sample weighting in RandomizedSearchCV

2014-07-08 Thread Joel Nothman
This shouldn't be the case, though it's not altogether well-documented. According to https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/cross_validation.py#L1225, if the fit_params value has the same length as the samples, it should be similarly indexed. So this would be a bug ... if
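A minimal sketch of what that implies, with invented data and a hypothetical search space; this assumes a scikit-learn recent enough to index fit_params per fold (0.15b2 onwards, per this thread):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import RandomizedSearchCV
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, weights=[0.9, 0.1],
                               random_state=0)
    weights = np.where(y == 1, 9.0, 1.0)  # up-weight the rare class

    search = RandomizedSearchCV(SVC(), {'C': [0.1, 1.0, 10.0]}, n_iter=3,
                                random_state=0)
    # sample_weight has length n_samples, so it is sliced along with each
    # CV fold before being forwarded to SVC.fit.
    search.fit(X, y, sample_weight=weights)
    print(search.best_params_)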

[Scikit-learn-general] Scipy optimizer problem when adding Jacobian

2014-07-08 Thread Bao Thien
Dear all, I need to optimize a loss function and currently use some optimizers from scipy.optimize.minimize. More detail like this:
+ parameters to optimize: X - size is about 50
+ init parameters: X0
+ bounds: all parameters are in [0,1]
+ loss function: L (defined)
+ Jacobian (gradient): J (de
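A minimal sketch of that setup with a made-up loss (the names L, J, X0 follow the post); scipy.optimize.check_grad is worth running first, since a Jacobian that disagrees with the loss is a common reason optimizers misbehave once jac= is supplied:

    import numpy as np
    from scipy.optimize import check_grad, minimize

    def L(x):                    # stand-in for the real loss
        return np.sum((x - 0.3) ** 2)

    def J(x):                    # analytic gradient of L
        return 2.0 * (x - 0.3)

    X0 = np.full(50, 0.5)
    print(check_grad(L, J, X0))  # should be close to 0 if J matches L

    res = minimize(L, X0, jac=J, bounds=[(0.0, 1.0)] * 50, method='L-BFGS-B')
    print(res.fun, res.x[:5])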

Re: [Scikit-learn-general] Sample weighting in RandomizedSearchCV

2014-07-08 Thread Kyle Kastner
It looks like fit_params are passed wholesale to the classifier being fit - this means the sample weights will be a different size than the fold of (X, y) fed to the classifier (since the weights aren't getting KFolded...). Unfortunately I do not see a way to accommodate this currently - sample_

[Scikit-learn-general] Sample weighting in RandomizedSearchCV

2014-07-08 Thread Hamed Zamani
Dear all, I am using the Scikit-Learn library and I want to weight all training samples (to account for unbalanced data). According to the tutorial and what I found on the web, I should use this method: search = RandomizedSearchCV(estimator, param_distributions, n_iter=args.iterations, scoring=mae_scor