Re: [Scikit-learn-general] Out of memory when running silhouette score function

2013-05-12 Thread Gael Varoquaux
On Sun, May 12, 2013 at 01:35:07PM +0200, Alexandre ABRAHAM wrote:
> I know that the first purpose of scikit is not to handle big data but
> would you be interested by a PR of my silhouette block implementation ?

+1 for PR. I think that I would introduce a keyword argument to switch between the 2

Re: [Scikit-learn-general] Out of memory when running silhouette score function

2013-05-12 Thread Matthieu Brucher
Hi, I guess Theano is a big dependency. I for one do not consider GPUs ready for heavy numerical processes. Those that are _massively_ data parallel may be parallelized, but task parallelism is badly suited for GPUs. And the way Alexandre parallelized the computation is more task- than data-paralleli

Re: [Scikit-learn-general] Out of memory when running silhouette score function

2013-05-12 Thread Ronnie Ghose
Theano for the parallelization? From what I understand, your PR uses on-the-fly computation to reduce memory usage, versus computing everything at once. Wouldn't Theano help? As in, could you perchance 'theano-ize' the parallel calculation? I consider heavy numerical processes to be (at least now) mostly the doma

Re: [Scikit-learn-general] multiprocessing error

2013-05-12 Thread Andreas Mueller
Hi Matthias. Unfortunately joblib doesn't handle large datasets very gracefully at the moment. Have you tried setting the pre_dispatch parameter? Otherwise it could be that all jobs are dispatched even if only two are run. Hth, Andy On 05/12/2013 05:12 PM, Matthias Ekman wrote: Dear all, us
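A minimal sketch of the pre_dispatch suggestion above, assuming the standalone joblib package (sklearn 0.13 bundled it as sklearn.externals.joblib). The helper names block_mean and make_blocks are hypothetical and the data is a toy stand-in for the poster's workload. Feeding Parallel a generator and limiting pre_dispatch should mean that only a few inputs are materialized ahead of the workers, rather than all of them at once.

```python
import numpy as np
from joblib import Parallel, delayed


def block_mean(block):
    # Hypothetical per-task work: reduce one block to a single number.
    return float(block.mean())


def make_blocks(n_blocks, size):
    # Generator: each block is only created when joblib asks for the next task.
    for i in range(n_blocks):
        yield np.full(size, i, dtype=np.float64)


def run(n_blocks=8, size=1000):
    # pre_dispatch="2*n_jobs" caps how many tasks are queued ahead of the workers.
    # backend="threading" only keeps this sketch lightweight; pre_dispatch is
    # passed to Parallel the same way for process-based backends.
    return Parallel(n_jobs=2, backend="threading", pre_dispatch="2*n_jobs")(
        delayed(block_mean)(block) for block in make_blocks(n_blocks, size)
    )


means = run()
```

Whether this avoids the error reported in the thread is not something the sketch can confirm; it only shows where the parameter goes.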

Re: [Scikit-learn-general] Out of memory when running silhouette score function

2013-05-12 Thread Alexandre ABRAHAM
Hi Ronnie, I have never used Theano, could you be a little more specific? What do you want to compute? What is your input data? Basically, all these metrics are independent of the scikit and take numpy arrays as input, so you can use them with any data in this format. Now, if you want to integ

[Scikit-learn-general] multiprocessing error

2013-05-12 Thread Matthias Ekman
Dear all, using sklearn 0.13 (fresh Ubuntu 12.04 installation), I'm getting the error below, which I believe is a memory error. What strikes me is that I'm using a machine with 512GB of RAM, so that shouldn't be happening. Is there maybe a system setting that limits the amount of RAM on a user bas

Re: [Scikit-learn-general] Out of memory when running silhouette score function

2013-05-12 Thread Ronnie Ghose
uhhh +1. any chance of using theano with it?

On Sun, May 12, 2013 at 7:35 AM, Alexandre ABRAHAM <abraham.alexan...@gmail.com> wrote:
> Hey scikit people,
>
> I know that the first purpose of scikit is not to handle big data but
> would you be interested by a PR of my silhouette block implementa

Re: [Scikit-learn-general] Out of memory when running silhouette score function

2013-05-12 Thread Alexandre ABRAHAM
Hey scikit people, I know that the first purpose of scikit is not to handle big data but would you be interested in a PR of my silhouette block implementation? My benchmarks have shown that it is a bit slower than the scikit one when data is small but it divides memory usage by n_cluster ^ 2. Plus i
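The message above does not show the implementation, so the following is only a sketch of one way a block-wise silhouette could work, not necessarily Alexandre's code. It assumes Euclidean distances and NumPy only; the names distance_block and silhouette_block are hypothetical. Instead of building the full n_samples x n_samples distance matrix, it computes one cluster-pair block at a time, so peak memory is a single block. For roughly balanced clusters that is in line with the n_cluster ^ 2 reduction mentioned.

```python
import numpy as np


def distance_block(A, B):
    # Euclidean distances between rows of A and rows of B, shape (len(A), len(B)).
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negatives from rounding


def silhouette_block(X, labels):
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if clusters.size < 2:
        raise ValueError("need at least two clusters")
    n = X.shape[0]
    a = np.zeros(n)           # mean intra-cluster distance per sample
    b = np.full(n, np.inf)    # mean distance to the nearest other cluster
    singleton = np.zeros(n, dtype=bool)
    for ci in clusters:
        mask_i = labels == ci
        Xi = X[mask_i]
        n_i = Xi.shape[0]
        if n_i > 1:
            d = distance_block(Xi, Xi)
            np.fill_diagonal(d, 0.0)  # a sample is at distance 0 from itself
            a[mask_i] = d.sum(axis=1) / (n_i - 1)
        else:
            singleton[mask_i] = True
        for cj in clusters:
            if cj == ci:
                continue
            # Only this (cluster i, cluster j) block is held in memory here.
            mean_d = distance_block(Xi, X[labels == cj]).mean(axis=1)
            b[mask_i] = np.minimum(b[mask_i], mean_d)
    denom = np.maximum(a, b)
    s = np.divide(b - a, denom, out=np.zeros(n), where=denom > 0)
    s[singleton] = 0.0  # usual convention: singleton clusters score 0
    return float(s.mean())


X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
labels = np.array([0, 0, 1, 1])
score = silhouette_block(X, labels)
```

The trade-off is the one described in the message: each inter-cluster block is computed twice (once per direction), so this should be somewhat slower than a single full-matrix pass on small data.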