Re: [Scikit-learn-general] Joblib and IPython

2012-02-02 Thread Andreas Müller
On 02/01/2012 04:03 PM, Gael Varoquaux wrote: > On Wed, Feb 01, 2012 at 03:05:49PM +0100, Andreas wrote: > >> I started working with IPython.parallel for training the trees using joblib. >> It works in principle, but it is SLOW. >> The time between starting and the jobs arriving at the engines

Re: [Scikit-learn-general] Joblib and IPython

2012-02-01 Thread Olivier Grisel
Andreas: you should do some timing tests for data transfer using the plain numpy + IPython.parallel API (without scikit-learn or joblib) to check that you are able to broadcast your data efficiently without memory copy. Once you have optimal timings, check that you can build an application in reverse
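A minimal sketch of the kind of timing test Olivier describes, assuming numpy is installed. `send` is a hypothetical callable standing in for whatever actually moves the data: locally it can be `pickle.dumps`, and on a cluster it could wrap an IPython.parallel `DirectView.push` call. The harness reports an effective throughput that can be compared with the roughly 125 MB/s ceiling of gigabit ethernet.

```python
import pickle
import time

import numpy as np


def best_time(send, payload, repeat=3):
    """Return the fastest of `repeat` wall-clock timings of send(payload)."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        send(payload)
        timings.append(time.perf_counter() - start)
    return min(timings)


def throughput_mb_per_s(send, arr, repeat=3):
    """Effective megabytes per second achieved by send() for a numpy array."""
    seconds = max(best_time(send, arr, repeat), 1e-12)  # guard against a zero timing
    return arr.nbytes / 1e6 / seconds


if __name__ == "__main__":
    X = np.random.rand(2000, 2000)  # 32 MB of float64, smaller than the real data
    # Local baseline: serialization cost only, no network involved.
    local_send = lambda a: pickle.dumps(a, protocol=pickle.HIGHEST_PROTOCOL)
    print("pickle: %.1f MB/s" % throughput_mb_per_s(local_send, X))
```

On a running cluster one might pass something like `lambda a: dview.push({'X': a}, block=True)` with `dview` a DirectView; treat that call as an assumption to check against the installed IPython version.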

Re: [Scikit-learn-general] Joblib and IPython

2012-02-01 Thread Gael Varoquaux
On Wed, Feb 01, 2012 at 05:48:44PM +0100, Olivier Grisel wrote: > > IPython uses pickling, which is really slow. > This is not the case for plain numpy arrays > http://ipython.org/ipython-doc/stable/parallel/parallel_details.html#non-copying-sends-and-numpy-arrays Yes, but as soon as you use obj
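The distinction drawn here (plain arrays can be sent without copying, arbitrary objects go through pickle) can be illustrated locally with pickle protocol 5 out-of-band buffers. This is a present-day sketch assuming Python 3.8+ and numpy; it is not how IPython.parallel serialized messages in 2012. With a `buffer_callback`, the array's data is handed over as a separate buffer instead of being copied into the pickle byte stream.

```python
import pickle

import numpy as np


def pickled_sizes(arr):
    """Compare an in-band pickle with an out-of-band (protocol 5) pickle."""
    in_band = pickle.dumps(arr, protocol=5)  # array data copied into the stream
    buffers = []
    # buffers.append returns None, so the pickler treats each buffer as
    # out-of-band and leaves only metadata in the stream.
    out_of_band = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)
    return in_band, out_of_band, buffers


if __name__ == "__main__":
    X = np.zeros((1000, 100))  # 800,000 bytes of float64
    in_band, out_of_band, buffers = pickled_sizes(X)
    print(len(in_band), len(out_of_band), len(buffers))
    restored = pickle.loads(out_of_band, buffers=buffers)
    print(np.array_equal(restored, X))
```

An object that merely contains arrays only benefits if its own pickling exposes those arrays as buffers, which is one reading of Gael's point about objects.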

Re: [Scikit-learn-general] Joblib and IPython

2012-02-01 Thread Olivier Grisel
2012/2/1 Gael Varoquaux : > On Wed, Feb 01, 2012 at 03:05:49PM +0100, Andreas wrote: >> I started working with IPython.parallel for training the trees using joblib. >> It works in principle, but it is SLOW. >> The time between starting and the jobs arriving at the engines is really >> long. >> I'm

Re: [Scikit-learn-general] Joblib and IPython

2012-02-01 Thread Gael Varoquaux
On Wed, Feb 01, 2012 at 03:05:49PM +0100, Andreas wrote: > I started working with IPython.parallel for training the trees using joblib. > It works in principle, but it is SLOW. > The time between starting and the jobs arriving at the engines is really > long. > I'm sending around 20.000x2000 float

Re: [Scikit-learn-general] Joblib and IPython

2012-02-01 Thread Andreas
On 02/01/2012 03:05 PM, Andreas wrote: > I started working with IPython.parallel for training the trees using joblib. > It works in principle, but it is SLOW. > The time between starting and the jobs arriving at the engines is really > long. > I'm sending around 20.000x2000 float64 matrices, but th

Re: [Scikit-learn-general] Joblib and IPython

2012-02-01 Thread Andreas
I started working with IPython.parallel for training the trees using joblib. It works in principle, but it is SLOW. The time between starting and the jobs arriving at the engines is really long. I'm sending around 20.000x2000 float64 matrices, but this is gigabit ethernet and I wouldn't expect it
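A back-of-the-envelope lower bound for the transfer described above: a 20,000 x 2000 float64 matrix (reading "20.000" as twenty thousand) is 320 MB, and gigabit ethernet moves at most 125 MB/s, so each copy needs at least about 2.6 seconds on the wire before any protocol or serialization overhead. The `copies` parameter and the 8-engine figure below are hypothetical; if the matrix is sent separately to each engine rather than broadcast once, the bound scales with the number of copies.

```python
def min_transfer_seconds(rows, cols, itemsize=8, link_bits_per_s=1e9, copies=1):
    """Lower bound on wire time for sending a dense matrix `copies` times."""
    payload_bytes = rows * cols * itemsize   # float64 -> 8 bytes per entry
    link_bytes_per_s = link_bits_per_s / 8   # 1 Gbit/s -> 125e6 bytes/s
    return copies * payload_bytes / link_bytes_per_s


if __name__ == "__main__":
    print(min_transfer_seconds(20000, 2000))            # one copy: ~2.56 s
    print(min_transfer_seconds(20000, 2000, copies=8))  # hypothetical 8 engines: ~20.48 s
```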

Re: [Scikit-learn-general] Joblib and IPython

2012-02-01 Thread Gael Varoquaux
On Mon, Jan 30, 2012 at 05:22:35PM +0100, Andreas wrote: > I implemented a somewhat trivial solution here: > https://github.com/amueller/joblib/tree/ipython_refactoring > It can be used like this: > https://gist.github.com/1705235 > Not sure if this is a good way to do things but it > was a very

Re: [Scikit-learn-general] Joblib and IPython

2012-01-30 Thread Andreas
Hey folks. I implemented a somewhat trivial solution here: https://github.com/amueller/joblib/tree/ipython_refactoring It can be used like this: https://gist.github.com/1705235 Not sure if this is a good way to do things but it was a very easy way and it works for me ;) You need a working ipytho

Re: [Scikit-learn-general] Joblib and IPython

2012-01-28 Thread Gael Varoquaux
On Fri, Jan 27, 2012 at 12:32:31PM -0800, Fernando Perez wrote: > And just to state what is probably obvious, we're more than happy to > adjust the apis in ipython as necessary to reduce the impedance > mismatches between tools. I think that we share the same view: as much parallel code as possib

Re: [Scikit-learn-general] Joblib and IPython

2012-01-27 Thread Fernando Perez
Howdy, On Fri, Jan 27, 2012 at 6:44 AM, Andreas wrote: > At the moment all parallelism is handled by joblib. On the other hand it > seems > IPython can talk to the SGE scheduling. > So I would love to have a way for joblib to talk to IPython. just to say that I'm sorry not to jump in the discuss

Re: [Scikit-learn-general] Joblib and IPython

2012-01-27 Thread Gael Varoquaux
On Fri, Jan 27, 2012 at 11:42:15AM -0500, Satrajit Ghosh wrote: > i understand, but you have to have a glue somewhere unless each of these > distribution libraries expose the same api. Agreed > the problem is that you have to agree on the data model for job dispatch > and that might take a while

Re: [Scikit-learn-general] Joblib and IPython

2012-01-27 Thread Satrajit Ghosh
hi gael, > I am not too enthusiastic about that: joblib is meant to be a light > library. Such a solution would pretty much force the joblib release > manager to have an SGE cluster in order to run the tests and debug. > > Just like I think that IPython-specific stuff should live in IPython, I >

Re: [Scikit-learn-general] Joblib and IPython

2012-01-27 Thread Gael Varoquaux
On Fri, Jan 27, 2012 at 11:15:43AM -0500, Satrajit Ghosh wrote: >one option we could consider is to take the job distribution capability in >nipype and make it general purpose and add it to joblib (will require some >effort - it won't be quick). it would be nice as i have stated before

Re: [Scikit-learn-general] Joblib and IPython

2012-01-27 Thread Andreas
>> What do you think would be the hard part / why >> do you think this is much work? >> > I think that there is a bit of learning to be done. In particular, the > code would have to be well tested, and I have no idea of what the right > way to test it would be. The second problem would be to

Re: [Scikit-learn-general] Joblib and IPython

2012-01-27 Thread Gael Varoquaux
On Fri, Jan 27, 2012 at 05:10:11PM +0100, Olivier Grisel wrote: > The problem is that for multiprocessing, a n_jobs argument is enough > (to tell the number of cores). But for cluster computing you will have > to pass some kind of active cluster session (e.g. a > IPython.parallel.Client instance th
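One way to reconcile the two cases Olivier mentions, sketched with the standard library only: the helper accepts either an integer `n_jobs` (local workers) or an already-running `executor` object supplied by the caller, so the library itself never imports a cluster package. `parallel_map` is a hypothetical name; `concurrent.futures` stands in both for joblib's local workers and for a cluster session, and threads are used only to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_map(func, items, n_jobs=1, executor=None):
    """Apply func to each item, locally or through a caller-supplied executor.

    executor: any object with a map(func, iterable) method, e.g. a
    concurrent.futures executor. A cluster session would need an adapter
    exposing the same method (an assumption, not an existing API).
    """
    items = list(items)
    if executor is not None:
        return list(executor.map(func, items))
    if n_jobs == 1:
        return [func(item) for item in items]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(func, items))


def square(x):
    return x * x


if __name__ == "__main__":
    print(parallel_map(square, range(5), n_jobs=2))  # [0, 1, 4, 9, 16]
    with ThreadPoolExecutor(max_workers=3) as session:
        print(parallel_map(square, range(5), executor=session))
```

Passing the session object in keeps the cluster dependency on the caller's side, which fits the concern elsewhere in the thread about not importing IPython inside scikit-learn.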

Re: [Scikit-learn-general] Joblib and IPython

2012-01-27 Thread Gael Varoquaux
On Fri, Jan 27, 2012 at 05:01:12PM +0100, Andreas wrote: > I was not sure whether you want other backends in joblib. The parallel part of joblib is meant to be a convenience wrapper, not a parallel execution model of its own. As such, enriching it with other backends is definitely in the scope as

Re: [Scikit-learn-general] Joblib and IPython

2012-01-27 Thread Satrajit Ghosh
hi gael, > I would really like to avoid having any direct import to IPython in > scikit-learn: I would like our set of dependencies to stick to scipy, > numpy, and optionally matplotlib for the examples (note that this also > means that I would like to get rid of the pyamg dependency, that has > pro

Re: [Scikit-learn-general] Joblib and IPython

2012-01-27 Thread Olivier Grisel
2012/1/27 Gael Varoquaux : > On Fri, Jan 27, 2012 at 04:58:30PM +0100, Olivier Grisel wrote: >> I would advise you to start by experimenting with your own version of >> GridSearchCV (by deriving from the version of sklearn) and passing a >> LoadBalancedView instance as argument to the constructor a

Re: [Scikit-learn-general] Joblib and IPython

2012-01-27 Thread Andreas
Hi Olivier. Thanks for your suggestions. It certainly seems easier to directly use IPython but I agree with Gael about not wanting to add additional dependencies. I'll try doing it in joblib and if that is too hard, I'll try doing it directly in sklearn. Let's see how this goes! Cheers, Andy On

Re: [Scikit-learn-general] Joblib and IPython

2012-01-27 Thread Satrajit Ghosh
hi andreas, when you launch ipcluster on SGE for example, it queues up a set of python engines as jobs. these jobs will get distributed to the SGE execution pool depending on its current job distribution. The key to note here is that no real job execution (in your case forests) has taken place y

Re: [Scikit-learn-general] Joblib and IPython

2012-01-27 Thread Olivier Grisel
2012/1/27 Andreas : > On 01/27/2012 04:55 PM, Gael Varoquaux wrote: >> On Fri, Jan 27, 2012 at 03:44:31PM +0100, Andreas wrote: >> >>> as it could be. So I was wondering whether there would be a >>> non-intrusive way to make sklearn parallelize over the cluster. >>> >> This is a very legitimate que

Re: [Scikit-learn-general] Joblib and IPython

2012-01-27 Thread Gael Varoquaux
On Fri, Jan 27, 2012 at 04:58:30PM +0100, Olivier Grisel wrote: > I would advise you to start by experimenting with your own version of > GridSearchCV (by deriving from the version of sklearn) and passing a > LoadBalancedView instance as argument to the constructor and use it in > the fit method in

Re: [Scikit-learn-general] Joblib and IPython

2012-01-27 Thread Andreas
On 01/27/2012 04:55 PM, Gael Varoquaux wrote: > On Fri, Jan 27, 2012 at 03:44:31PM +0100, Andreas wrote: > >> as it could be. So I was wondering whether there would be a >> non-intrusive way to make sklearn parallelize over the cluster. >> > This is a very legitimate question. Basically,

Re: [Scikit-learn-general] Joblib and IPython

2012-01-27 Thread Olivier Grisel
2012/1/27 Andreas : I would advise you to start by experimenting with your own version of GridSearchCV (by deriving from the version of sklearn) and passing a LoadBalancedView instance as argument to the constructor and use it in the fit method instead of calling joblib. The same could be followe
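A minimal sketch of this suggestion, written without scikit-learn so it runs standalone: the search object receives a view in its constructor and calls the view's map in `fit` instead of joblib. `ViewGridSearch`, `SerialView` and the `score_func(X, y, **params)` convention are hypothetical. The only thing assumed about the view is a `map_sync(func, iterable)` method, which IPython.parallel views also expose; `SerialView` is a local stand-in for a LoadBalancedView.

```python
import itertools
from functools import partial


class SerialView:
    """Local stand-in for a cluster view: evaluates map_sync in-process."""

    def map_sync(self, func, iterable):
        return list(map(func, iterable))


def _evaluate(score_func, X, y, params):
    # A module-level function wrapped in functools.partial stays picklable,
    # which a real remote view would require (score_func must be too).
    return score_func(X, y, **params)


class ViewGridSearch:
    def __init__(self, score_func, param_grid, view=None):
        self.score_func = score_func
        self.param_grid = param_grid
        self.view = view if view is not None else SerialView()

    def _candidates(self):
        keys = sorted(self.param_grid)
        for values in itertools.product(*(self.param_grid[k] for k in keys)):
            yield dict(zip(keys, values))

    def fit(self, X, y):
        candidates = list(self._candidates())
        task = partial(_evaluate, self.score_func, X, y)
        scores = self.view.map_sync(task, candidates)  # used instead of joblib
        best = max(range(len(scores)), key=scores.__getitem__)
        self.best_params_, self.best_score_ = candidates[best], scores[best]
        return self
```

With a running cluster, `view` could be something like `Client().load_balanced_view()`; that call and its behaviour should be checked against the installed IPython version, and X and y are still shipped with every task in this sketch.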

Re: [Scikit-learn-general] Joblib and IPython

2012-01-27 Thread Gael Varoquaux
On Fri, Jan 27, 2012 at 03:44:31PM +0100, Andreas wrote: > as it could be. So I was wondering whether there would be a > non-intrusive way to make sklearn parallelize over the cluster. This is a very legitimate question. Basically, it boils down to: how can we extend the parallelism model in sciki

Re: [Scikit-learn-general] Joblib and IPython

2012-01-27 Thread Andreas
Hi Satra. Thanks for your comments. Can you explain what the "grab an engine" strategy means? Is it that you distribute the jobs to the engines before starting any jobs instead of having them in a queue? This should be ok if my jobs and my engines are pretty homogeneous, right? The main question f
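Whether handing out jobs to engines up front is fine for homogeneous jobs can be checked with a toy scheduling model. This sketch compares a static round-robin assignment with a greedy queue in which each task goes to whichever engine frees up first (roughly what a load-balanced scheduler approximates). It is a simplification under stated assumptions: it ignores data transfer and does not model SGE slots being held by idle engines.

```python
import heapq


def static_makespan(durations, n_engines):
    """Assign task i to engine i % n_engines up front; return finish time."""
    loads = [0.0] * n_engines
    for i, duration in enumerate(durations):
        loads[i % n_engines] += duration
    return max(loads)


def queued_makespan(durations, n_engines):
    """Give each task, in order, to the engine that becomes free first."""
    free_at = [0.0] * n_engines  # all zeros is already a valid heap
    for duration in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + duration)
    return max(free_at)


if __name__ == "__main__":
    homogeneous = [1.0] * 8
    mixed = [4.0] + [1.0] * 7
    print(static_makespan(homogeneous, 4), queued_makespan(homogeneous, 4))  # 2.0 2.0
    print(static_makespan(mixed, 2), queued_makespan(mixed, 2))              # 7.0 6.0
```

With equal task durations the two strategies finish at the same time; a single long task is enough for the queue to pull ahead.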

Re: [Scikit-learn-general] Joblib and IPython

2012-01-27 Thread Satrajit Ghosh
hi andreas, a few notes: - a sprint planned for pycon will be looking at parallel computing with scikit-learn and ipython (http://wiki.ipython.org/PyCon12Sprint) - ipython currently uses a grab an engine and not release strategy in the context of distributed systems like SGE/PBS/LSF. this implie

[Scikit-learn-general] Joblib and IPython

2012-01-27 Thread Andreas
Hi everybody. This question basically goes out to Gael, but might also be interesting for others. I am using sklearn on an SGE cluster at the moment and it is not as nice as it could be. So I was wondering whether there would be a non-intrusive way to make sklearn parallelize over the cluster. A