Re: [Scikit-learn-general] How to present parameter search results

2013-06-07 Thread Olivier Grisel
TL;DR: the choice of data structure for parameter search results should anticipate new use cases. Thanks Joel for the detailed analysis. In the current situation I think I myself like: 5. many attributes, each an array, on a custom results object. This makes it possible to write a `__repr__` method on that…
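
A minimal sketch of what such a results object might look like (the class name `SearchResults` and its attribute names are made up here for illustration, not existing scikit-learn API):

```python
# Hypothetical sketch of option 5: one array-like attribute per quantity,
# plus a __repr__ that summarises the search at a glance.
import numpy as np

class SearchResults(object):
    """Illustrative container for parameter search results."""

    def __init__(self, parameters, fold_scores):
        self.parameters = list(parameters)          # one dict per candidate
        self.fold_scores = np.asarray(fold_scores)  # shape (n_candidates, n_folds)
        self.mean_scores = self.fold_scores.mean(axis=1)

    def __repr__(self):
        # assumes higher scores are better
        best = int(np.argmax(self.mean_scores))
        return ("SearchResults(n_candidates=%d, n_folds=%d, "
                "best_score=%.3f, best_params=%r)"
                % (len(self.parameters), self.fold_scores.shape[1],
                   self.mean_scores[best], self.parameters[best]))
```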

Re: [Scikit-learn-general] How to present parameter search results

2013-06-07 Thread Romaniuk, Michal
It would be great if there were a way to access the parameter search results as a numpy ndarray, with one axis for each parameter and one additional axis for the cross-validation folds. This would make it easy to visualise the grid search results and compute the mean, median or variance for each grid…
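
A rough sketch of the reshaping this would amount to (the `param_grid` values and the `fold_scores` array below are placeholders, not an existing grid search attribute):

```python
# Reshape per-candidate fold scores into an ndarray with one axis per grid
# parameter plus a trailing axis for the CV folds.
from itertools import product
import numpy as np

param_grid = {"C": [0.1, 1, 10], "gamma": [1e-3, 1e-2]}
n_folds = 5
keys = sorted(param_grid)

# Candidates enumerated in a fixed (sorted-key, itertools.product) order;
# fold_scores would come from the search, one row per candidate.
candidates = [dict(zip(keys, values))
              for values in product(*(param_grid[k] for k in keys))]
fold_scores = np.random.rand(len(candidates), n_folds)  # placeholder data

grid_shape = tuple(len(param_grid[k]) for k in keys) + (n_folds,)
scores = fold_scores.reshape(grid_shape)  # axes: (C, gamma, fold)

mean_over_folds = scores.mean(axis=-1)    # shape (3, 2)
variance_over_folds = scores.var(axis=-1)
```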

Re: [Scikit-learn-general] How to present parameter search results

2013-06-07 Thread Andreas Mueller
On 06/07/2013 03:13 PM, Romaniuk, Michal wrote: > It would be great if there were a way to access the parameter search results > as a numpy ndarray, with one axis for each parameter and one additional axis > for the cross-validation folds. This would make it easy to visualise the grid > search results…

Re: [Scikit-learn-general] Re-cycling pipeline stages in GridSearchCV?

2013-06-07 Thread Andreas Mueller
On 06/07/2013 12:08 AM, Joel Nothman wrote: > I proposed something that did this among a more general solution for > warm starts without memoizing a couple of weeks ago, but I think > memoizing is neater and handles most cases. To handle it generally, > you could add a memoize parameter to Pipeline…
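
For reference, one way to get much of this effect today, without any change to Pipeline, is to cache the expensive fit/transform step with joblib.Memory (a workaround sketch, not the proposed `memoize` parameter):

```python
# Cache an expensive transformer fit so repeated grid search candidates that
# only differ in downstream parameters do not re-fit it from scratch.
from joblib import Memory
from sklearn.decomposition import PCA

memory = Memory("/tmp/sklearn_cache", verbose=0)

@memory.cache
def fit_transform_pca(X, n_components):
    pca = PCA(n_components=n_components)
    Xt = pca.fit_transform(X)
    return pca, Xt

# First call computes and stores the result on disk; subsequent calls with
# the same arguments load it from the cache instead of re-fitting.
```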

Re: [Scikit-learn-general] Re-cycling pipeline stages in GridSearchCV?

2013-06-07 Thread Gael Varoquaux
> Memoization and parallelization don't play together nicely. Yes, I am strongly thinking of adding optional memoization directly to joblib.Parallel. It is often a fairly natural place to put memoization, as the structures should be picklable and data transfer should be limited. What do people think?
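
To make the idea concrete: the memoization option on Parallel itself is only a proposal here and does not exist, but the two can already be composed by caching the function each worker calls. A sketch of that current composition:

```python
# Cache the per-item computation with joblib.Memory and dispatch it with
# joblib.Parallel; the cached function and its arguments must be picklable
# so they can be shipped to the worker processes.
from joblib import Memory, Parallel, delayed

memory = Memory("/tmp/joblib_cache", verbose=0)

@memory.cache
def expensive(x):
    # stand-in for a costly, picklable computation
    return x ** 2

results = Parallel(n_jobs=2)(delayed(expensive)(i) for i in range(10))
# A second identical run hits the on-disk cache instead of recomputing.
```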

Re: [Scikit-learn-general] How to present parameter search results

2013-06-07 Thread Gael Varoquaux
> It would be great if there were a way to access the parameter search > results as a numpy ndarray, with one axis for each parameter and one > additional axis for the cross-validation folds. This would make it easy > to visualise the grid search results, compute the mean, median or > variance for each…

[Scikit-learn-general] Help with speeding up SVR training and Gridsearching for my problem

2013-06-07 Thread neo01124 wtf
Hi, I am using the scikit-learn implementation of Nu-SVR. My problem (automatic phonetic segmentation for singing voice) has ~50k points with 36 features. That seems relatively small to me compared to the datasets I have been reading about. The problem is that it takes a long time (~6 hours) to fit the NuSVR…

Re: [Scikit-learn-general] Help with speeding up SVR training and Gridsearching for my problem

2013-06-07 Thread Andreas Mueller
On 06/07/2013 09:33 PM, neo01124 wtf wrote: Hi, I am using the scikit-learn implementation of Nu-SVR. My problem (automatic phonetic segmentation for singing voice) has ~50k points with 36 features. That seems relatively small to me compared to the datasets I have been reading about. The problem is…

Re: [Scikit-learn-general] Help with speeding up SVR training and Gridsearching for my problem

2013-06-07 Thread Olivier Grisel
**Kernel SVMs are not scalable** to large or even medium numbers of samples, as the training complexity is quadratic (or worse) in the number of samples. You should try to: - learn independent SVR models on partitions of the data (e.g. 10 models trained on 5,000 samples each) and then compute the mean prediction of the 10 models as…
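
A small sketch of that partition-and-average suggestion (the parameter values and helper names are placeholders, not part of scikit-learn):

```python
# Fit independent NuSVR models on disjoint partitions of the training set
# and average their predictions, trading some accuracy for much lower
# training time than a single kernel SVR on all 50k samples.
import numpy as np
from sklearn.svm import NuSVR

def fit_partitioned_svr(X, y, n_partitions=10, **svr_params):
    partitions = np.array_split(np.random.permutation(len(X)), n_partitions)
    return [NuSVR(**svr_params).fit(X[idx], y[idx]) for idx in partitions]

def predict_averaged(models, X):
    return np.mean([m.predict(X) for m in models], axis=0)

# models = fit_partitioned_svr(X_train, y_train, n_partitions=10, C=1.0, nu=0.5)
# y_pred = predict_averaged(models, X_test)
```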

Re: [Scikit-learn-general] greetings; more flexibility in trees

2013-06-07 Thread Ken Geis
On May 23, 2013, at 5:03 AM, Gilles Louppe wrote: >> So I'd like to contribute a simple MAE criterion that would be efficient for >> random splits (i.e. O(n) given a single batch update). Is the direction >> forward for something like this to hard-code more criteria in _tree.pyx, or >> would it…