Re: [Scikit-learn-general] Sub sampling large datasets

2012-04-11 Thread Gael Varoquaux
On Wed, Apr 11, 2012 at 10:55:34AM +0200, Jean-Louis Durrieu wrote:
> I was thinking it would be a good idea to include in gmm.py such a
> mechanism.

Core sets are a very beautiful idea from the theoretical standpoint and I'd love to have them in the scikit. We had even added them in the list of i

Re: [Scikit-learn-general] Sub sampling large datasets

2012-04-11 Thread Olivier Grisel
On 11 April 2012 at 10:55, Jean-Louis Durrieu wrote:
> Hi all,
>
> On Feb 7, 2012, at 8:47 AM, Olivier Grisel wrote:
>
>> 2012/2/6 Shishir Pandey:
>>>
>>> I am working with a dataset which is too big to fit in memory. Is there a
>>> way in scikits-learn to sub sample the existing dataset maintai

Re: [Scikit-learn-general] Sub sampling large datasets

2012-04-11 Thread Jean-Louis Durrieu
Hi all,

On Feb 7, 2012, at 8:47 AM, Olivier Grisel wrote:

> 2012/2/6 Shishir Pandey:
>>
>> I am working with a dataset which is too big to fit in memory. Is there a
>> way in scikits-learn to sub sample the existing dataset maintaining its
>> properties so that I can load it in my RAM?
>
> We

Re: [Scikit-learn-general] Sub sampling large datasets

2012-02-06 Thread Olivier Grisel
2012/2/6 Shishir Pandey:
> Hi
>
> I am working with a dataset which is too big to fit in memory. Is there a
> way in scikits-learn to sub sample the existing dataset maintaining its
> properties so that I can load it in my RAM?

We don't have any "smart" subsampler in scikit-learn (like a GMM cor

[Scikit-learn-general] Sub sampling large datasets

2012-02-06 Thread Shishir Pandey
Hi

I am working with a dataset which is too big to fit in memory. Is there a way in scikits-learn to sub sample the existing dataset maintaining its properties so that I can load it in my RAM?

with regards,
--Shishir Pandey
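
For illustration only: one simple way to draw a subsample from data that does not fit in RAM is reservoir sampling, which reads the rows one at a time and keeps a uniform random sample of fixed size. This is a minimal sketch using only the Python standard library. It is not a scikit-learn feature and not the "smart" (core set) subsampler discussed in this thread, and the function name `reservoir_sample` and its parameters are hypothetical. A uniform sample preserves the distribution of the data only on average, so rare classes or clusters may be under-represented.

```python
import random


def reservoir_sample(rows, k, seed=None):
    """Return up to k rows drawn uniformly at random from an iterable.

    The iterable is consumed once and its length need not be known in
    advance, so only k rows are ever held in memory at the same time.
    """
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Keep the new row with probability k / (i + 1) by letting it
            # overwrite a uniformly chosen slot in the reservoir.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


if __name__ == "__main__":
    # Stand-in for a large file: a generator that yields rows lazily.
    stream = (x for x in range(1000000))
    subset = reservoir_sample(stream, k=5, seed=0)
    print(len(subset))
```

In practice `rows` could be an open file object or a `csv.reader`, so that each line is read from disk only when needed; the resulting sample can then be converted to an array and passed to an estimator as usual.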