On Sat, Oct 29, 2011 at 12:50 AM, Gael Varoquaux
wrote:
> Could you give an example of what you have in mind? I am probably just
> revealing my lack of knowledge here.
I already gave 2 examples: early stopping (tuning the number of
iterations) and regularization parameter tuning (e.g. using Bott
In my case, my interest / priority in this pattern is more in being
able to do one-liners in an interactive IPython session than in
avoiding copies of large-scale data (although that is interesting too).
>>> X_train, y_train, X_test, y_test = load_svmlight_files(
...     ['train.dat', 'test.dat'])
>>> c
On Fri, Oct 28, 2011 at 05:29:51PM +0200, Andreas Mueller wrote:
> As far as I can see, the IterGrid class is not in the API documentation
> at all.
I fixed that in f997740. If you see other such cases of missing
reference documentation for generally useful functions, or missing
see-alsos, feel
On Sat, Oct 29, 2011 at 12:33:00AM +0900, Mathieu Blondel wrote:
> On Sat, Oct 29, 2011 at 12:18 AM, Olivier Grisel
> wrote:
> > percent_val would be a constructor param in that case, as it's not
> > data-dependent.
> Good point!
> > I am +1 for X_val=None, y_val=None in fit for the GridSearchCV
On Fri, Oct 28, 2011 at 05:29:51PM +0200, Andreas Mueller wrote:
> Am 28.10.2011 17:23, schrieb Gael Varoquaux:
> > On Fri, Oct 28, 2011 at 05:21:38PM +0200, Olivier Grisel wrote:
> >> Fair enough, but the user should also be aware of the API that turns a
> >> param grid (a dict of sequences) into
On Sat, Oct 29, 2011 at 12:18 AM, Olivier Grisel
wrote:
> percent_val would be a constructor param in that case, as it's not
> data-dependent.
Good point!
> I am +1 for X_val=None, y_val=None in fit for the GridSearchCV class
Or maybe a new object GridSearchValidation, as the semantics are a bit different.
Am 28.10.2011 17:23, schrieb Gael Varoquaux:
> On Fri, Oct 28, 2011 at 05:21:38PM +0200, Olivier Grisel wrote:
>> Fair enough, but the user should also be aware of the API that turns a
>> param grid (a dict of sequences) into an iterable param_list as done
>> in GridSearchCV.
> Good point. And I th
On Fri, Oct 28, 2011 at 05:21:38PM +0200, Olivier Grisel wrote:
> Fair enough, but the user should also be aware of the API that turns a
> param grid (a dict of sequences) into an iterable param_list as done
> in GridSearchCV.
Good point. And I think that this is a documentation problem. It clearl
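For reference, that conversion is essentially a cartesian product over the dict of sequences. A minimal pure-Python sketch (not the actual IterGrid code, just an illustration):

```python
from itertools import product

def iter_grid(param_grid):
    """Expand a dict of sequences into a list of single-value param dicts.

    Sketch of what IterGrid does conceptually; the real class also
    accepts a list of such dicts and is an iterator.
    """
    keys = sorted(param_grid)
    return [dict(zip(keys, values))
            for values in product(*(param_grid[k] for k in keys))]

param_list = iter_grid({'C': [1, 10], 'gamma': [0.1, 0.01]})
# 4 combinations: every (C, gamma) pair from the grid
```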
2011/10/28 Gael Varoquaux :
> On Fri, Oct 28, 2011 at 04:17:01PM +0200, Olivier Grisel wrote:
>> > I am actually not sure that I have understood the usecase that we are
>> > discussing.
>
>> I think the use case is performing a grid search for a single
>> predefined train / validation datasets pair
2011/10/28 Mathieu Blondel :
> On Fri, Oct 28, 2011 at 11:27 PM, Olivier Grisel
> wrote:
>
>> This is a lot of complex boilerplate for the newcomer.
>
> Plus, that would be a waste of memory and cpu time as the grid search
> would re-split the data just after.
>
> Lately I've been working on large
On Fri, Oct 28, 2011 at 04:17:01PM +0200, Olivier Grisel wrote:
> > I am actually not sure that I have understood the usecase that we are
> > discussing.
> I think the use case is performing a grid search for a single
> predefined train / validation datasets pair.
scores = [estimator.set_params(**params).fit(X_train, y_train).score(X_test, y_test)
          for params in param_list]
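To make the pattern concrete — scoring every parameter combination on one fixed train/test pair — here is a self-contained toy version; ToyEstimator and its dummy score are hypothetical stand-ins for a real scikit-learn estimator:

```python
class ToyEstimator(object):
    """Hypothetical stand-in for a scikit-learn estimator."""

    def __init__(self, C=1.0):
        self.C = C

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        # a real estimator would learn from (X, y) here
        return self

    def score(self, X, y):
        # dummy score peaking at C == 10, just to make the example run
        return 1.0 / (1.0 + abs(self.C - 10))

X_train = y_train = X_test = y_test = []  # placeholder data
param_list = [{'C': 1}, {'C': 10}, {'C': 100}]

estimator = ToyEstimator()
scores = [estimator.set_params(**params).fit(X_train, y_train)
          .score(X_test, y_test)
          for params in param_list]
best = param_list[scores.index(max(scores))]
# best is {'C': 10}
```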
On Fri, Oct 28, 2011 at 11:45:42PM +0900, Mathieu Blondel wrote:
> Plus, that would be a waste of memory and cpu time as the grid search
> would re-split the data just after.
I agree about the memory, but the CPU cost should really be negligible.
> Lately I've been working on large-scale algorithms whe
On Fri, Oct 28, 2011 at 11:27 PM, Olivier Grisel
wrote:
> This is a lot of complex boilerplate for the newcomer.
Plus, that would be a waste of memory and cpu time as the grid search
would re-split the data just after.
Lately I've been working on large-scale algorithms where it would be
very useful
Hi,
perhaps I'm completely missing the point here, but isn't the whole point of
the validation set that nobody touches it to do any model/feature selection?
All it should be used for is to test your model after it has been finalized,
and if it is used for model/feature selection, I do not see it a
2011/10/28 Alexandre Gramfort :
> class SingleSplit(object):
>
>     def __init__(self, train_index, test_index):
>         self.train_index = train_index
>         self.test_index = test_index
>
>     def __iter__(self):
>         yield self.train_index, self.test_index
>
> is this more complicated tha
Am 28.10.2011 16:20, schrieb Alexandre Gramfort:
> class SingleSplit(object):
>
>     def __init__(self, train_index, test_index):
>         self.train_index = train_index
>         self.test_index = test_index
>
>     def __iter__(self):
>         yield self.train_index, self.test_index
>
> i
> Thus it seems to me that the function that would really need to be
> replaced would be cross_val_score, but it is trivial to replace:
>
> estimator.fit(X_train, y_train).score(X_test, y_test)
>
> A ShuffleSplit can be used inside this in combination of a GridSearch to
> do parameter
class SingleSplit(object):

    def __init__(self, train_index, test_index):
        self.train_index = train_index
        self.test_index = test_index

    def __iter__(self):
        yield self.train_index, self.test_index
is this more complicated than this?
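For what it's worth, a self-contained sketch of how such an object would be used; the __len__ is an assumption about what fold-counting code might expect, not part of the proposal:

```python
class SingleSplit(object):
    """One predefined (train, test) fold, usable where a CV iterator
    of index pairs is expected (a sketch of the idea above)."""

    def __init__(self, train_index, test_index):
        self.train_index = train_index
        self.test_index = test_index

    def __iter__(self):
        yield self.train_index, self.test_index

    def __len__(self):
        # assumption: code counting folds may ask for len(cv)
        return 1

# 10 samples: first 7 for training, last 3 for testing
cv = SingleSplit(list(range(7)), list(range(7, 10)))
folds = list(cv)  # exactly one (train, test) pair
```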
Alex
-
2011/10/28 Gael Varoquaux :
>
> I am actually not sure that I have understood the usecase that we are
> discussing.
I think the use case is performing a grid search for a single
predefined train / validation datasets pair.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
--
On Fri, Oct 28, 2011 at 04:06:35PM +0200, Olivier Grisel wrote:
> To address Andreas use case (which seems valid to me) I think we
> should have a new grid_search utility function that does not try to
> implement the `fit` API which is too restrictive for this use case.
I am not too excited about
>> cv = SingleSplit(X_train, y_train, X_test, y_test) ?
>
> This API cannot work, as the cross-validation object is supposed to
> return indices and thus should be applied to concatenated train and test
> data.
Oops, indeed;
cv = SingleSplit(idx_train, idx_test)
should do the trick, right?
Alex
Am 28.10.2011 16:06, schrieb Andreas Mueller:
> Am 28.10.2011 16:03, schrieb Gael Varoquaux:
>> On Fri, Oct 28, 2011 at 10:01:11AM -0400, Alexandre Gramfort wrote:
>>> class SingleSplit?
>>> cv = SingleSplit(X_train, y_train, X_test, y_test) ?
>> This API cannot work, as the cross-validation object
2011/10/28 Gael Varoquaux :
> On Fri, Oct 28, 2011 at 10:01:11AM -0400, Alexandre Gramfort wrote:
>> class SingleSplit?
>
>> cv = SingleSplit(X_train, y_train, X_test, y_test) ?
>
> This API cannot work, as the cross-validation object is supposed to
> return indices and thus should be applied to co
Am 28.10.2011 16:03, schrieb Gael Varoquaux:
> On Fri, Oct 28, 2011 at 10:01:11AM -0400, Alexandre Gramfort wrote:
>> class SingleSplit?
>> cv = SingleSplit(X_train, y_train, X_test, y_test) ?
> This API cannot work, as the cross-validation object is supposed to
> return indices and thus should be
2011/10/28 Alexandre Gramfort :
> indeed you just need to define a new CV object of length 1
>
> maybe worth adding to cross_validation.py
>
> class SingleSplit?
>
> cv = SingleSplit(X_train, y_train, X_test, y_test) ?
Yes, but that won't be natural to use with GridSearchCV, where you pass
the data
Am 28.10.2011 16:01, schrieb Alexandre Gramfort:
> indeed you just need to define a new CV object of length 1
>
> maybe worth adding to cross_validation.py
>
> class SingleSplit?
>
> cv = SingleSplit(X_train, y_train, X_test, y_test) ?
That's what I had in mind.
---
On Fri, Oct 28, 2011 at 10:01:11AM -0400, Alexandre Gramfort wrote:
> class SingleSplit?
> cv = SingleSplit(X_train, y_train, X_test, y_test) ?
This API cannot work, as the cross-validation object is supposed to
return indices and thus should be applied to concatenated train and test
data.
G
--
Indeed, you just need to define a new CV object of length 1;
maybe worth adding to cross_validation.py
class SingleSplit?
cv = SingleSplit(X_train, y_train, X_test, y_test) ?
Alex
On Fri, Oct 28, 2011 at 9:56 AM, Gael Varoquaux
wrote:
> On Fri, Oct 28, 2011 at 03:54:00PM +0200, Andreas Mueller wrote:
2011/10/28 Andreas Mueller :
> Hi everybody.
> This is about the grid_search and cross_validation modules.
> Often, in particular when the dataset is large or the algorithm slow,
> it is not feasible to do n-fold cross-validation, and people use
> a single training/validation split to find hyperparameters
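To illustrate that use case, a quick sketch of building such a single split by hand (plain Python, function name hypothetical):

```python
import random

def train_validation_indices(n_samples, validation_fraction=0.25, seed=0):
    """Shuffle sample indices once and split them into a single
    train / validation pair (a sketch, not scikit-learn code)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = int(n_samples * validation_fraction)
    return indices[n_val:], indices[:n_val]

train_idx, val_idx = train_validation_indices(100, validation_fraction=0.2)
# 80 training indices, 20 validation indices, disjoint
```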
Am 28.10.2011 15:56, schrieb Gael Varoquaux:
> On Fri, Oct 28, 2011 at 03:54:00PM +0200, Andreas Mueller wrote:
>> What do you think?
> It seems to me that ShuffleSplit should be useable for this purpose. Am I
> wrong? It might be useful to document it somewhere, though.
In principle, yes. Though m
On Fri, Oct 28, 2011 at 03:54:00PM +0200, Andreas Mueller wrote:
> What do you think?
It seems to me that ShuffleSplit should be useable for this purpose. Am I
wrong? It might be useful to document it somewhere, though.
Cheers,
Gaël
--