Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Mathieu Blondel
On Sat, Oct 29, 2011 at 12:50 AM, Gael Varoquaux wrote: > Could you give an example of what you have in mind? I am probably just > revealing my lack of knowledge here. I already gave 2 examples: early stopping (tuning the number of iterations) and regularization parameter tuning (e.g. using Bott
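
The early-stopping example mentioned above can be illustrated with a minimal sketch. It is not code from the thread: the 1-D least-squares model, the helper names `mse` and `tune_n_iter`, and the learning rate are all assumptions chosen only to show how a single validation set can tune the number of iterations in one training run.

```python
def mse(w, X, y):
    """Mean squared error of the 1-D linear model y ~ w * x."""
    return sum((w * x - t) ** 2 for x, t in zip(X, y)) / len(y)


def tune_n_iter(X_train, y_train, X_val, y_val, max_iter=50, lr=0.05):
    """Run gradient descent once and record the validation error after
    every iteration; the best iteration count is read off afterwards."""
    w = 0.0
    val_errors = []
    for _ in range(max_iter):
        # Gradient of the training MSE with respect to w.
        grad = sum(2 * (w * x - t) * x
                   for x, t in zip(X_train, y_train)) / len(y_train)
        w -= lr * grad
        val_errors.append(mse(w, X_val, y_val))
    # Iteration counts are 1-based: index 0 is the model after one step.
    best_n_iter = min(range(max_iter), key=val_errors.__getitem__) + 1
    return best_n_iter, val_errors


# Hypothetical toy data, used only for illustration.
X_train, y_train = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]
X_val, y_val = [1.5, 2.5], [3.0, 5.0]
best_n_iter, val_errors = tune_n_iter(X_train, y_train, X_val, y_val)
```

The point of the sketch is that the model is trained only once; a cross-validation loop that refits for every candidate number of iterations would repeat most of that work.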

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Olivier Grisel
In my case my interest / priority in this pattern is more in being able to do one-liners in an interactive IPython session rather than avoiding copies of large-scale data (although this is interesting too). >>> X_train, y_train, X_test, y_test = load_svmlight_files('train.dat', >>> 'test.dat') >>> c

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Gael Varoquaux
On Fri, Oct 28, 2011 at 05:29:51PM +0200, Andreas Mueller wrote: > As far as I can see, the IterGrid class is not in the API documentation > at all. I fixed that in f997740. If you see other such cases of missing reference documentation for generally useful functions, or missing see-alsos, feel

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Gael Varoquaux
On Sat, Oct 29, 2011 at 12:33:00AM +0900, Mathieu Blondel wrote: > On Sat, Oct 29, 2011 at 12:18 AM, Olivier Grisel > wrote: > > percent_val would be a constructor param in that case as it's not data > > dependent. > Good point! > > I am +1 for X_val=None, y_val=None in fit for the GridSearchCV

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Gael Varoquaux
On Fri, Oct 28, 2011 at 05:29:51PM +0200, Andreas Mueller wrote: > Am 28.10.2011 17:23, schrieb Gael Varoquaux: > > On Fri, Oct 28, 2011 at 05:21:38PM +0200, Olivier Grisel wrote: > >> Fair enough, but the user should also be aware of the API that turns a > >> param grid (a dict of sequences) into

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Mathieu Blondel
On Sat, Oct 29, 2011 at 12:18 AM, Olivier Grisel wrote: > percent_val would be a constructor param in that case as it's not data > dependent. Good point! > I am +1 for X_val=None, y_val=None in fit for the GridSearchCV class Or maybe a new object GridSearchValidation, as the semantics are a bi

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Andreas Mueller
Am 28.10.2011 17:23, schrieb Gael Varoquaux: > On Fri, Oct 28, 2011 at 05:21:38PM +0200, Olivier Grisel wrote: >> Fair enough, but the user should also be aware of the API that turns a >> param grid (a dict of sequences) into an iterable param_list as done >> in GridSearchCV. > Good point. And I th

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Gael Varoquaux
On Fri, Oct 28, 2011 at 05:21:38PM +0200, Olivier Grisel wrote: > Fair enough, but the user should also be aware of the API that turns a > param grid (a dict of sequences) into an iterable param_list as done > in GridSearchCV. Good point. And I think that this is a documentation problem. It clearl
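
For readers unfamiliar with the step being discussed, here is a minimal sketch of turning a param grid (a dict of sequences) into an iterable of parameter dicts. It is not the IterGrid implementation; the function name `iter_param_grid` and the example grid are assumptions made for illustration.

```python
from itertools import product


def iter_param_grid(param_grid):
    """Yield one dict per combination of values in a dict-of-sequences grid."""
    # Sort the names so the iteration order is deterministic.
    names = sorted(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical grid, used only for illustration.
param_list = list(iter_param_grid({"C": [1, 10], "kernel": ["linear", "rbf"]}))
```

Each element of `param_list` can then be passed to an estimator as keyword parameters, which is what makes a hand-written loop over a single train / validation pair possible.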

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Olivier Grisel
2011/10/28 Gael Varoquaux : > On Fri, Oct 28, 2011 at 04:17:01PM +0200, Olivier Grisel wrote: >> > I am actually not sure that I have understood the usecase that we are >> > discussing. > >> I think the use case is performing a grid search for a single >> predefined train / validation datasets pair

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Olivier Grisel
2011/10/28 Mathieu Blondel : > On Fri, Oct 28, 2011 at 11:27 PM, Olivier Grisel > wrote: > >> This is a lot of complex boilerplate for the newcomer. > > Plus, that would be a waste of memory and cpu time as the grid search > would re-split the data just after. > > Lately I've been working on large

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Gael Varoquaux
On Fri, Oct 28, 2011 at 04:17:01PM +0200, Olivier Grisel wrote: > > I am actually not sure that I have understood the usecase that we are > > discussing. > I think the use case is performing a grid search for a single > predefined train / validation datasets pair. scores = [estimator.set_param(pa
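
The use case described above, a grid search over a single predefined train / validation pair, can be sketched as a plain list comprehension. This is a sketch under stated assumptions, not the code from the message: `ShrunkenMean` is a hypothetical toy estimator, `grid_search_single_split` is a made-up helper name, and the estimator is assumed to expose `set_params`, `fit` and `score`, with `set_params` and `fit` returning the estimator itself.

```python
class ShrunkenMean:
    """Hypothetical toy estimator: predicts shrink * mean(y_train)."""

    def __init__(self, shrink=1.0):
        self.shrink = shrink

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        self.constant_ = self.shrink * sum(y) / len(y)
        return self

    def score(self, X, y):
        # Negative mean squared error, so that higher is better.
        return -sum((target - self.constant_) ** 2 for target in y) / len(y)


def grid_search_single_split(estimator, param_list,
                             X_train, y_train, X_val, y_val):
    """Fit on the training set and score on the validation set, once per
    parameter setting; no re-splitting or copying of the data."""
    scores = [
        estimator.set_params(**params).fit(X_train, y_train).score(X_val, y_val)
        for params in param_list
    ]
    best = max(range(len(param_list)), key=scores.__getitem__)
    return param_list[best], scores


# Hypothetical toy data, used only for illustration.
X_train, y_train = [[0], [1], [2], [3]], [2.0, 2.0, 2.0, 2.0]
X_val, y_val = [[4], [5]], [1.0, 1.0]
param_list = [{"shrink": 0.0}, {"shrink": 0.5}, {"shrink": 1.0}]
best_params, scores = grid_search_single_split(
    ShrunkenMean(), param_list, X_train, y_train, X_val, y_val)
```

Because the same estimator object is reconfigured and refitted for each setting, the loop keeps only one fitted model in memory at a time.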

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Gael Varoquaux
On Fri, Oct 28, 2011 at 11:45:42PM +0900, Mathieu Blondel wrote: > Plus, that would be a waste of memory and cpu time as the grid search > would re-split the data just after. I agree with the memory, but the CPU should be really negligible. > Lately I've been working on large-scale algorithms whe

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Mathieu Blondel
On Fri, Oct 28, 2011 at 11:27 PM, Olivier Grisel wrote: > This is a lot of complex boilerplate for the newcomer. Plus, that would be a waste of memory and cpu time as the grid search would re-split the data just after. Lately I've been working on large-scale algorithms where it would be very us

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Satrajit Ghosh
hi, perhaps i'm completely missing the point here. but isn't the whole point of the validation set that nobody touches it to do any model/feature selection? all it should be used for is to test your model after it has been finalized. and if it is used for model/feature selection, i do not see it a

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Olivier Grisel
2011/10/28 Alexandre Gramfort : > class SingleSplit(object): > >    def __init__(self, train_index, test_index): >        self.train_index = train_index >        self.test_index = test_index > >    def __iter__(self): >        yield self.train_index, self.test_index > > is this more complicated tha

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Andreas Mueller
Am 28.10.2011 16:20, schrieb Alexandre Gramfort: > class SingleSplit(object): > > def __init__(self, train_index, test_index): > self.train_index = train_index > self.test_index = test_index > > def __iter__(self): > yield self.train_index, self.test_index > > i

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Andreas Mueller
> Thus it seems to me that the function that would really need to be > replaced would be cross_val_score, but it is a bit trivial to replace: > > estimator.fit(X_train, y_train).score(X_test, y_test) > > A ShuffleSplit can be used inside this in combination with a GridSearch to > do parameter

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Alexandre Gramfort
class SingleSplit(object):

    def __init__(self, train_index, test_index):
        self.train_index = train_index
        self.test_index = test_index

    def __iter__(self):
        yield self.train_index, self.test_index

is this more complicated than this?

Alex

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Olivier Grisel
2011/10/28 Gael Varoquaux : > > I am actually not sure that I have understood the usecase that we are > discussing. I think the use case is performing a grid search for a single predefined train / validation datasets pair. -- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel --

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Gael Varoquaux
On Fri, Oct 28, 2011 at 04:06:35PM +0200, Olivier Grisel wrote: > To address Andreas use case (which seems valid to me) I think we > should have a new grid_search utility function that does not try to > implement the `fit` API which is too restrictive for this use case. I am not too excited about

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Alexandre Gramfort
>> cv = SingleSplit(X_train, y_train, X_test, y_test) ? > > This API cannot work, as the cross-validation object is supposed to > return indices and thus should be applied to concatenated train and test > data. oops indeed; cv = SingleSplit(idx_train, idx_test) should do the trick, right? Alex
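
The index-based variant discussed here can be sketched as follows. The `SingleSplit` class mirrors the one proposed in the thread, with a `__len__` added as an assumption; `concatenate_with_indices` is a hypothetical helper that is not part of scikit-learn, and plain Python lists stand in for arrays.

```python
class SingleSplit:
    """Cross-validation-style iterable that yields exactly one split."""

    def __init__(self, train_index, test_index):
        self.train_index = train_index
        self.test_index = test_index

    def __iter__(self):
        yield self.train_index, self.test_index

    def __len__(self):
        return 1


def concatenate_with_indices(X_train, X_val):
    """Stack train and validation samples and return the index lists
    that recover each part from the concatenated data."""
    X_all = list(X_train) + list(X_val)
    idx_train = list(range(len(X_train)))
    idx_val = list(range(len(X_train), len(X_all)))
    return X_all, idx_train, idx_val


# Hypothetical toy data, used only for illustration.
X_all, idx_train, idx_val = concatenate_with_indices([[0], [1], [2]], [[3], [4]])
cv = SingleSplit(idx_train, idx_val)
splits = list(cv)
```

The concatenation step is the cost that the thread points out: the data has to be copied into one container before the indices can refer to it.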

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Andreas Mueller
Am 28.10.2011 16:06, schrieb Andreas Mueller: > Am 28.10.2011 16:03, schrieb Gael Varoquaux: >> On Fri, Oct 28, 2011 at 10:01:11AM -0400, Alexandre Gramfort wrote: >>> class SingleSplit? >>> cv = SingleSplit(X_train, y_train, X_test, y_test) ? >> This API cannot work, as the cross-validation object

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Olivier Grisel
2011/10/28 Gael Varoquaux : > On Fri, Oct 28, 2011 at 10:01:11AM -0400, Alexandre Gramfort wrote: >> class SingleSplit? > >> cv = SingleSplit(X_train, y_train, X_test, y_test) ? > > This API cannot work, as the cross-validation object is supposed to > return indices and thus should be applied to co

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Andreas Mueller
Am 28.10.2011 16:03, schrieb Gael Varoquaux: > On Fri, Oct 28, 2011 at 10:01:11AM -0400, Alexandre Gramfort wrote: >> class SingleSplit? >> cv = SingleSplit(X_train, y_train, X_test, y_test) ? > This API cannot work, as the cross-validation object is supposed to > return indices and thus should be

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Olivier Grisel
2011/10/28 Alexandre Gramfort : > indeed you just need to define a new CV object of length 1 > > maybe be worth adding to cross_validation.py > > class SingleSplit? > > cv = SingleSplit(X_train, y_train, X_test, y_test) ? Yes but that won't be natural to use with GridSearchCV where you pass the da

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Andreas Mueller
Am 28.10.2011 16:01, schrieb Alexandre Gramfort: > indeed you just need to define a new CV object of length 1 > > maybe be worth adding to cross_validation.py > > class SingleSplit? > > cv = SingleSplit(X_train, y_train, X_test, y_test) ? That's what I had in mind. ---

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Gael Varoquaux
On Fri, Oct 28, 2011 at 10:01:11AM -0400, Alexandre Gramfort wrote: > class SingleSplit? > cv = SingleSplit(X_train, y_train, X_test, y_test) ? This API cannot work, as the cross-validation object is supposed to return indices and thus should be applied to concatenated train and test data. G --

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Alexandre Gramfort
indeed you just need to define a new CV object of length 1 may be worth adding to cross_validation.py class SingleSplit? cv = SingleSplit(X_train, y_train, X_test, y_test) ? Alex On Fri, Oct 28, 2011 at 9:56 AM, Gael Varoquaux wrote: > On Fri, Oct 28, 2011 at 03:54:00PM +0200, Andreas Muell

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Olivier Grisel
2011/10/28 Andreas Mueller : > Hi everybody. > This is about the grid_search and cross_validation modules. > Often, in particular when the dataset is large or the algorithm slow, > it is not feasible to do n-fold cross validation and people use > a single training/validation split to find hyperpara

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Andreas Mueller
Am 28.10.2011 15:56, schrieb Gael Varoquaux: > On Fri, Oct 28, 2011 at 03:54:00PM +0200, Andreas Mueller wrote: >> What do you think? > It seems to me that ShuffleSplit should be useable for this purpose. Am I > wrong? It might be useful to document it somewhere, though. In principle, yes. Though m

Re: [Scikit-learn-general] Grid search with validation set

2011-10-28 Thread Gael Varoquaux
On Fri, Oct 28, 2011 at 03:54:00PM +0200, Andreas Mueller wrote: > What do you think? It seems to me that ShuffleSplit should be useable for this purpose. Am I wrong? It might be useful to document it somewhere, though. Cheers, Gaël --