Re: [Scikit-learn-general] optimal n_jobs in GridSearchCV

2014-08-21 Thread Arnaud Joly
If you set n_jobs to XXX, it will spawn XXX threads or processes, so you will need to ask for XXX cores. Note that it’s often possible to retrieve XXX in your script using os.environ. If you use fewer than the XXX cores, you won’t use all of the available CPUs. If you ask for more than XXX cor
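
A minimal sketch of the os.environ approach mentioned above, assuming the scheduler exports the reserved core count in an environment variable. The variable names (NSLOTS, SLURM_CPUS_PER_TASK, PBS_NP) are assumptions about the cluster setup, and resolve_n_jobs is a hypothetical helper, not part of scikit-learn.

```python
import os

# Assumed scheduler variables; check which one your grid actually sets.
CORE_COUNT_VARS = ("NSLOTS", "SLURM_CPUS_PER_TASK", "PBS_NP")


def resolve_n_jobs(env=None, default=1):
    """Return the reserved core count found in the environment, else `default`."""
    env = os.environ if env is None else env
    for name in CORE_COUNT_VARS:
        value = env.get(name, "")
        # Accept only positive integers; skip missing or malformed values.
        if value.isdigit() and int(value) > 0:
            return int(value)
    return default


n_jobs = resolve_n_jobs()
# n_jobs can then be passed on, e.g. GridSearchCV(..., n_jobs=n_jobs)
```

The fallback of 1 is a conservative choice: if no variable is set, the script does not grab more cores than were reserved.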

Re: [Scikit-learn-general] optimal n_jobs in GridSearchCV

2014-08-21 Thread Sheila the angel
I still have the following doubt: I understand that n_jobs "should be depending on the number of cpu cores available on your machine". But I am running code in a grid computing environment where I have to specify the number of CPU cores in advance. Does this mean if I (reserve 64 cores and) specify n_j

Re: [Scikit-learn-general] optimal n_jobs in GridSearchCV

2014-08-21 Thread Lars Buitinck
2014-08-21 13:44 GMT+02:00 Joel Nothman:
> I think RandomForestClassifier, using multithreading in version 0.15, should
> work nested in multiprocessing.

It would work, but the p * n threads from p processes using n threads each would still compete for the cores, right? -
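
The p * n arithmetic above can be sketched as follows. This is only an illustration of the counting argument, not scikit-learn code; the function names are hypothetical, and the even split in inner_thread_budget is one possible policy among several.

```python
def total_threads(n_processes, n_threads_each):
    """Threads that exist when p processes each start n threads."""
    return n_processes * n_threads_each


def oversubscription(n_processes, n_threads_each, n_cores):
    """Ratio of runnable threads to cores; above 1.0 they compete."""
    return total_threads(n_processes, n_threads_each) / n_cores


def inner_thread_budget(n_cores, n_processes):
    """Threads per process that keep p * n within the core count."""
    return max(1, n_cores // n_processes)


# Example: 8 grid-search processes, each forest using 8 threads, on 16 cores.
ratio = oversubscription(8, 8, 16)
```

With 64 reserved cores and 8 outer processes, inner_thread_budget(64, 8) gives 8 threads per process, so p * n stays at 64.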

Re: [Scikit-learn-general] optimal n_jobs in GridSearchCV

2014-08-21 Thread Joel Nothman
On 21 August 2014 21:46, Gael Varoquaux wrote:
> On Thu, Aug 21, 2014 at 09:44:37PM +1000, Joel Nothman wrote:
> > I think RandomForestClassifier, using multithreading in version 0.15, should
> > work nested in multiprocessing.
>
> Good point, as it uses threading. Thus, for version 0.15, what

Re: [Scikit-learn-general] optimal n_jobs in GridSearchCV

2014-08-21 Thread Gael Varoquaux
On Thu, Aug 21, 2014 at 09:44:37PM +1000, Joel Nothman wrote:
> I think RandomForestClassifier, using multithreading in version 0.15, should
> work nested in multiprocessing.

Good point, as it uses threading. Thus, for version 0.15, what I just said was irrelevant.

G

Re: [Scikit-learn-general] optimal n_jobs in GridSearchCV

2014-08-21 Thread Joel Nothman
On 21 August 2014 21:39, Gael Varoquaux wrote:
> On Thu, Aug 21, 2014 at 12:32:08PM +0200, Sheila the angel wrote:
> > 2. If I use the classifier such as RandomForestClassifier where
> > 'n_jobs' can be specified, will it make any difference if I specify
> > "n_jobs" at the classifier level also-

Re: [Scikit-learn-general] optimal n_jobs in GridSearchCV

2014-08-21 Thread Gael Varoquaux
On Thu, Aug 21, 2014 at 12:32:08PM +0200, Sheila the angel wrote:
> 2. If I use the classifier such as RandomForestClassifier where
> 'n_jobs' can be specified, will it make any difference if I specify
> "n_jobs" at the classifier level also-

We don't support nested parallelism, unfortunately.

G

Re: [Scikit-learn-general] optimal n_jobs in GridSearchCV

2014-08-21 Thread Sheila the angel
First, thanks for the reply. @Hames: I understand that n_jobs "should be depending on the number of cpu cores available on your machine". But I am running code in a grid computing environment where I have to specify the number of CPUs in advance. Does this mean if I (reserve 64 cores and) specify n_job

Re: [Scikit-learn-general] optimal n_jobs in GridSearchCV

2014-08-21 Thread Lars Buitinck
2014-08-21 12:32 GMT+02:00 Sheila the angel:
> 1. What should be the n_jobs value, 8 or (8*4=) 32 ?

n_jobs is the number of CPUs you want to use, not the amount of work. (It's a misnomer because the number of jobs/work items is variable; the parameter determines the number of workers performing t
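
To make the workers-versus-work distinction concrete, here is a small sketch that counts the work items for the case in the question (8 parameter combinations, 4 folds). The parameter names C and gamma and their values are hypothetical placeholders, and rounds_needed assumes, as a simplification, that every fit takes about the same time.

```python
import math
from itertools import product

# Hypothetical grid over two parameters: 2 x 4 = 8 combinations.
param_grid = {"C": [1, 10], "gamma": [0.001, 0.01, 0.1, 1.0]}
n_folds = 4

combinations = list(product(*param_grid.values()))
n_tasks = len(combinations) * n_folds  # one fit per (combination, fold) pair


def rounds_needed(n_tasks, n_workers):
    """Idealized number of batches if all fits take equally long."""
    return math.ceil(n_tasks / n_workers)


for n_workers in (8, 32, 64):
    print(n_workers, "workers:", rounds_needed(n_tasks, n_workers), "round(s)")
```

Under that simplification, the search has 32 fits to distribute, so workers beyond 32 have nothing to do during the search itself.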

Re: [Scikit-learn-general] optimal n_jobs in GridSearchCV

2014-08-21 Thread Mr Samuel Hames
faster as all of the processes compete with each other.

Hope this helps,
Sam

From: Sheila the angel
Sent: Thursday, 21 August 2014 8:32 PM
To: scikit-learn-general@lists.sourceforge.net
Subject: [Scikit-learn-general] optimal n_jobs in GridSearchCV

Hi, Using

[Scikit-learn-general] optimal n_jobs in GridSearchCV

2014-08-21 Thread Sheila the angel
Hi,

Using GridSearchCV, I am trying to optimize the values of two parameters. In total, I have 8 parameter combinations and am doing 4-fold cross-validation. I want to run it in a parallel environment. My questions are:

1. What should the n_jobs value be, 8 or (8*4=) 32? (I know I can specify n_jobs=-1 but du