On Sun, May 12, 2013 at 01:35:07PM +0200, Alexandre ABRAHAM wrote:
> I know that the first purpose of scikit is not to handle big data, but
> would you be interested in a PR of my silhouette block implementation?
+1 for a PR. I would introduce a keyword argument to switch
between the two implementations.
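Such a switch could look something like this toy sketch (the function name and the metric are purely illustrative, not the actual scikit API): the same quantity is computed either all at once or block by block, so one keyword trades speed for memory without changing the result.

```python
import numpy as np

def pairwise_mean_distance(X, method="full", block_size=64):
    """Toy example of the keyword-switch idea: identical result,
    two very different peak-memory profiles."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    if method == "full":
        # all-at-once: materialises the full (n, n) distance matrix
        d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
        return d.mean()
    # block-wise: never holds more than (block_size, n) distances at a time
    total = 0.0
    for start in range(0, n, block_size):
        chunk = X[start:start + block_size]
        d = np.sqrt(((chunk[:, None, :] - X[None, :, :]) ** 2).sum(-1))
        total += d.sum()
    return total / (n * n)
```

Both paths return the same value, which is what makes a single keyword-controlled entry point workable.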
Hi,
I guess Theano is a big dependency.
I for one do not consider GPUs ready for heavy numerical processing.
Workloads that are _massively_ data-parallel can be mapped onto them, but
task parallelism is badly suited to GPUs. And the way Alexandre parallelized
the computation is more task- than data-parallel.
Theano for the parallelization?
From what I understand, your PR uses on-the-fly computation to reduce memory
usage versus computing everything at once. Wouldn't Theano help? Could you
perchance 'theano-ize' the parallel calculation? I consider heavy numerical
processes to be (at least for now) mostly the doma
Hi Matthias.
Unfortunately joblib doesn't handle large datasets very gracefully at
the moment.
Have you tried setting the pre_dispatch parameter? Otherwise it may be that
all jobs are dispatched up front even if only two are running at a time.
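For reference, the knob meant here is joblib's `Parallel(n_jobs=2, pre_dispatch='2*n_jobs')`, which bounds how many tasks are materialised ahead of the workers. The underlying idea can be sketched with only the standard library (names here are illustrative, not joblib internals):

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def lazy_map(fn, tasks, n_workers=2, pre_dispatch=4):
    """Submit at most `pre_dispatch` tasks at a time, consuming the task
    generator lazily instead of materialising every argument up front.
    (Tasks whose value is None are not supported in this sketch.)"""
    tasks = iter(tasks)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # prime the pipeline with the first `pre_dispatch` tasks only
        pending = [pool.submit(fn, t) for t in islice(tasks, pre_dispatch)]
        while pending:
            yield pending.pop(0).result()   # results come back in order
            nxt = next(tasks, None)
            if nxt is not None:             # refill a slot as one frees up
                pending.append(pool.submit(fn, nxt))
```

`list(lazy_map(f, big_generator))` then gives the same results as a plain map but never holds more than four pending task arguments in memory.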
Hth,
Andy
On 05/12/2013 05:12 PM, Matthias Ekman wrote:
Hi Ronnie,
I have never used Theano; could you be a little more specific? What do you
want to compute? What is your input data? Basically, all these metrics are
independent of the scikit and take numpy arrays as input, so you can use
them with any data in that format.
Now, if you want to integ
Dear all,
using sklearn 0.13 (fresh Ubuntu 12.04 installation), I'm getting the error
below, which I believe is a memory error. What strikes me is that I'm using
a machine with 512GB of RAM, so that shouldn't be happening.
Is there maybe a system setting that limits the amount of RAM on a per-user
basis?
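If it really is a memory error despite 512GB, one thing worth checking is the per-user resource limits; on Linux these are what I'd look at first (the config path is the usual Ubuntu location):

```shell
# Inspect the limits imposed on this user/session:
ulimit -a                 # all resource limits at a glance
ulimit -v                 # virtual-memory cap in KB ("unlimited" = no cap)

# Persistent per-user caps on Ubuntu usually live here:
cat /etc/security/limits.conf 2>/dev/null | grep -v '^#' | grep -v '^$' || true
```

If `ulimit -v` prints a number instead of "unlimited", a process can be killed with a memory error long before physical RAM is exhausted.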
Uhhh, +1. Any chance of using Theano with it?
On Sun, May 12, 2013 at 7:35 AM, Alexandre ABRAHAM <
abraham.alexan...@gmail.com> wrote:
> Hey scikit people,
>
> I know that the first purpose of scikit is not to handle big data, but
> would you be interested in a PR of my silhouette block implementation
Hey scikit people,
I know that the first purpose of scikit is not to handle big data, but would
you be interested in a PR of my silhouette block implementation? My
benchmarks have shown that it is a bit slower than the scikit one when the
data is small, but it divides memory usage by n_cluster ^ 2. Plus i
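For anyone curious what "block" means here, a minimal sketch of the idea (not Alexandre's actual PR; the block size and names are illustrative): compute the pairwise distances a few rows at a time, so the full n × n matrix is never held in memory. Singleton-cluster handling is omitted for brevity.

```python
import numpy as np

def silhouette_blockwise(X, labels, block_size=2):
    """Mean silhouette coefficient computed in row blocks: peak memory is
    O(block_size * n_samples) rather than O(n_samples ** 2) for the full
    pairwise-distance matrix. Singleton clusters are not handled."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    n = len(X)
    scores = np.empty(n)
    for start in range(0, n, block_size):
        chunk = X[start:start + block_size]
        # distances from this block to every sample: shape (block, n)
        d = np.sqrt(((chunk[:, None, :] - X[None, :, :]) ** 2).sum(-1))
        for i, row in enumerate(d):
            own = labels[start + i]
            same = labels == own
            # a: mean distance to the other members of the same cluster
            a = row[same].sum() / (same.sum() - 1)
            # b: smallest mean distance to any other cluster
            b = min(row[labels == c].mean() for c in clusters if c != own)
            scores[start + i] = (b - a) / max(a, b)
    return scores.mean()
```

The result is independent of block_size, which is exactly why a block-wise variant can sit behind a keyword argument without changing any numbers.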