Re: [Scikit-learn-general] [pystatsmodels] Re: Sprinting at PyCon US 2012 in Santa Clara in March.

2012-01-25 Thread Fernando Perez
On Wed, Jan 25, 2012 at 7:07 PM, Wes McKinney wrote: > I'm happy to do it at PyCon since I assume there will be plenty of > space plus perhaps snacks and definitely camaraderie. Just wanted to > check that "the sprint is on!". Do you want to get something > officially on the schedule or shall I?

Re: [Scikit-learn-general] [pystatsmodels] Re: Sprinting at PyCon US 2012 in Santa Clara in March.

2012-01-25 Thread Wes McKinney
On Wed, Jan 25, 2012 at 1:49 PM, Olivier Grisel wrote: > 2012/1/25 Wes McKinney : >> >> hi Olivier, >> >> do we want to still do a data / statsmodels / scikit-learn sprint at >> PyCon? I will be there the first two sprint days, leaving town (after >> a very extended stay due to Strata 2 weeks befo

Re: [Scikit-learn-general] n_jobs in pairwise_distances and pairwise_kernels

2012-01-25 Thread Robert Layton
On 26 January 2012 04:30, Mathieu Blondel wrote: > On Thu, Jan 26, 2012 at 1:53 AM, Gael Varoquaux > wrote: > > > I agree, I wasn't voting against the feature. I was just puzzled, but you > > explained it. > > For sparse matrices, dot products and other low-level operations are > coded in C++ in

Re: [Scikit-learn-general] [pystatsmodels] Re: Sprinting at PyCon US 2012 in Santa Clara in March.

2012-01-25 Thread Olivier Grisel
2012/1/25 Wes McKinney : > > hi Olivier, > > do we want to still do a data / statsmodels / scikit-learn sprint at > PyCon? I will be there the first two sprint days, leaving town (after > a very extended stay due to Strata 2 weeks beforehand) on 3/15. I still think the PyCon venue would be nice to

Re: [Scikit-learn-general] [pystatsmodels] Re: Sprinting at PyCon US 2012 in Santa Clara in March.

2012-01-25 Thread Wes McKinney
On Fri, Dec 9, 2011 at 1:31 PM, Olivier Grisel wrote: > 2011/12/9 Fernando Perez : >> On Fri, Dec 9, 2011 at 1:37 AM, Olivier Grisel >> wrote: >>> I was thinking: it would be great if we could get SSH access to a >>> small Linux cluster (e.g. 10 nodes) with IPython / numpy / scipy >>> installe

Re: [Scikit-learn-general] n_jobs in pairwise_distances and pairwise_kernels

2012-01-25 Thread Mathieu Blondel
On Thu, Jan 26, 2012 at 1:53 AM, Gael Varoquaux wrote: > I agree, I wasn't voting against the feature. I was just puzzled, but you > explained it. For sparse matrices, dot products and other low-level operations are coded in C++ in Scipy, so parallel support also helps here. Mathieu

Re: [Scikit-learn-general] Best classification for very sparse and skewed feature matrix

2012-01-25 Thread Paolo Losi
On Wed, Jan 25, 2012 at 6:00 PM, Olivier Grisel wrote: > > > Once you have clustered the unlabeled samples, > > you can add, as extra features on the labeled samples, > > the distance from each cluster center (e.g. computed > > via RBF kernel). > > Is that what you are suggesting? > > They are more
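A minimal sketch of the feature-augmentation idea quoted above, assuming the cluster centers have already been obtained from the unlabeled samples (for example with k-means; here they are simply passed in). The helper name `add_rbf_cluster_features` and the `gamma` value are illustrative, not a scikit-learn API. Each labeled sample gains one extra column per center, equal to exp(-gamma * ||x - c||^2).

```python
import numpy as np


def add_rbf_cluster_features(X, centers, gamma=1.0):
    """Append RBF similarities to each cluster center as extra columns.

    Hypothetical helper: X has shape (n_samples, n_features) and centers has
    shape (n_clusters, n_features). Returns (n_samples, n_features + n_clusters).
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Squared Euclidean distance between every sample and every center.
    sq_dist = ((X[:, np.newaxis, :] - centers[np.newaxis, :, :]) ** 2).sum(axis=2)
    # The RBF kernel maps distances to similarities in (0, 1].
    similarities = np.exp(-gamma * sq_dist)
    return np.hstack([X, similarities])


# Toy data: in practice the centers would come from clustering the unlabeled set.
X_labeled = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
X_aug = add_rbf_cluster_features(X_labeled, centers, gamma=0.5)
print(X_aug.shape)  # (3, 4)
```

The broadcasting above assumes dense arrays; for very sparse inputs like the ones discussed in this thread, a sparse-aware distance computation would be needed instead.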

Re: [Scikit-learn-general] Best classification for very sparse and skewed feature matrix

2012-01-25 Thread Olivier Grisel
2012/1/25 Paolo Losi : > Hi Olivier, > > your reply is very informative (as always :-) ). > I've got a couple of questions for you. See below... > > On Tue, Jan 24, 2012 at 1:57 PM, Olivier Grisel > wrote: >> >> If you can cheaply collect unsupervised data that looks similar to >> your training set

Re: [Scikit-learn-general] n_jobs in pairwise_distances and pairwise_kernels

2012-01-25 Thread Paolo Losi
On Wed, Jan 25, 2012 at 5:32 PM, Mathieu Blondel wrote: > > do you see any use case for which distance calculation is > > performed in an "outer" loop? > > In my case, the pairwise matrix is an argument to the learning > algorithm and is computed once and for all at the beginning. So, I really > want

Re: [Scikit-learn-general] n_jobs in pairwise_distances and pairwise_kernels

2012-01-25 Thread Gael Varoquaux
On Wed, Jan 25, 2012 at 05:43:08PM +0100, Olivier Grisel wrote: > Still I find it a good idea to have an explicit API to perform > kernel precomputation in parallel on multicore rather than hoping that > the underlying runtime will do it automatically (which is not the case > for 99% of the ubuntu

Re: [Scikit-learn-general] n_jobs in pairwise_distances and pairwise_kernels

2012-01-25 Thread Olivier Grisel
2012/1/25 Mathieu Blondel : > On Thu, Jan 26, 2012 at 12:40 AM, Paolo Losi wrote: > >> do you see any use case for which distance calculation is >> performed in an "outer" loop? > > In my case, the pairwise matrix is an argument to the learning > algorithm and is computed once and for all in the begin

Re: [Scikit-learn-general] n_jobs in pairwise_distances and pairwise_kernels

2012-01-25 Thread Mathieu Blondel
On Thu, Jan 26, 2012 at 12:40 AM, Paolo Losi wrote: > do you see any use case for which distance calculation is > performed in an "outer" loop? In my case, the pairwise matrix is an argument to the learning algorithm and is computed once and for all at the beginning. So, I really want it to be as fa

Re: [Scikit-learn-general] n_jobs in pairwise_distances and pairwise_kernels

2012-01-25 Thread Gael Varoquaux
On Wed, Jan 25, 2012 at 04:40:26PM +0100, Paolo Losi wrote: >As a general rule of thumb, IMHO I think it's better to parallelize >at higher levels (more external iteration loops). It's generally: >- more efficient >- keeps the API cleaner (no need to push the n_jobs parameter down) >

Re: [Scikit-learn-general] n_jobs in pairwise_distances and pairwise_kernels

2012-01-25 Thread Paolo Losi
Hi Mathieu! Do you see any use case for which distance calculation is performed in an "outer" loop? As a general rule of thumb, IMHO I think it's better to parallelize at higher levels (more external iteration loops). It's generally: - more efficient - keeps the API cleaner (no need to push n_j
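To make the trade-off concrete, here is a sketch of parallelizing at the outer level, as suggested above: each worker handles one hyperparameter value and computes its kernel serially, so the kernel function itself needs no `n_jobs` argument. The function names, the RBF kernel, and the list of `gamma` values are assumptions for illustration; `score_for_gamma` is a hypothetical stand-in for an expensive task such as fitting and scoring one model. A thread pool keeps the example self-contained; process-based workers may be preferable when the per-task work holds the GIL.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def rbf_kernel_serial(X, gamma):
    """Full RBF kernel matrix computed in a single worker (no inner n_jobs)."""
    sq_norms = (X ** 2).sum(axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Clip tiny negative values caused by floating-point cancellation.
    np.maximum(sq_dist, 0.0, out=sq_dist)
    return np.exp(-gamma * sq_dist)


def score_for_gamma(X, gamma):
    """Hypothetical outer-loop task; the kernel sum is a placeholder score."""
    K = rbf_kernel_serial(X, gamma)
    return gamma, float(K.sum())


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
gammas = [0.01, 0.1, 1.0]

# Outer-loop parallelism: one task per gamma value, results kept in order.
with ThreadPoolExecutor(max_workers=len(gammas)) as pool:
    parallel_results = list(pool.map(lambda g: score_for_gamma(X, g), gammas))

serial_results = [score_for_gamma(X, g) for g in gammas]
```

The inner function stays simple, which is the API-cleanliness argument; the cost is that a single large kernel computation, like the precomputed-kernel use case described in this thread, gets no speedup from this scheme.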

Re: [Scikit-learn-general] Best classification for very sparse and skewed feature matrix

2012-01-25 Thread Paolo Losi
Hi Olivier, your reply is very informative (as always :-) ). I've got a couple of questions for you. See below... On Tue, Jan 24, 2012 at 1:57 PM, Olivier Grisel wrote: > > If you can cheaply collect unsupervised data that looks similar to > your training set (albeit without the labels and in much

Re: [Scikit-learn-general] n_jobs in pairwise_distances and pairwise_kernels

2012-01-25 Thread Mathieu Blondel
On Thu, Jan 26, 2012 at 12:16 AM, Mathieu Blondel wrote:
> sparse, n_jobs=1: 30.92
> sparse, n_jobs=4: 10.17
>
> dense, n_jobs=1: 7.64
> dense, n_jobs=4: 4.75
Oops, I forgot to mention that the above figures are computation times in seconds.
Mathieu
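Turning the quoted timings into speedups (time with n_jobs=1 divided by time with n_jobs=4) makes the comparison easier to read. The numbers below are derived only from the figures in the message, and the efficiency column assumes the four jobs ran on four cores.

```python
# Wall-clock seconds quoted in the message, keyed by n_jobs.
timings = {
    "sparse": {1: 30.92, 4: 10.17},
    "dense": {1: 7.64, 4: 4.75},
}

speedups = {}
for kind, t in timings.items():
    speedups[kind] = t[1] / t[4]
    # Efficiency: fraction of the ideal 4x speedup on 4 cores.
    efficiency = speedups[kind] / 4
    print(f"{kind}: {speedups[kind]:.2f}x speedup, {efficiency:.0%} efficiency")
# sparse: 3.04x speedup, 76% efficiency
# dense: 1.61x speedup, 40% efficiency
```

So the sparse case gains about 3x while the dense case gains about 1.6x, which fits Mathieu's remark in this thread that the sparse low-level operations in SciPy benefit in particular from explicit parallelism.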

[Scikit-learn-general] n_jobs in pairwise_distances and pairwise_kernels

2012-01-25 Thread Mathieu Blondel
Hello folks, I've just added an n_jobs option to the pairwise_distances and pairwise_kernels functions. This works by breaking down the pairwise matrix into "n_jobs" even slices and doing the computations in parallel. On the USPS dataset (n_samples=7291, n_features=257), I got the following resul

Re: [Scikit-learn-general] Out of bag estimates for ensemble learners

2012-01-25 Thread Paolo Losi
Hi Andreas, IMHO the only reasonable thing to do is to ignore samples for which there is no oob estimate. Building a forest with fewer than 5 trees makes no sense in the first place, so I would not worry if sklearn doesn't provide any warning for that specific problem (too "few" oob estimates).
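A minimal sketch of this suggestion, assuming a regression ensemble whose OOB prediction for a sample is the average over the trees that did not see it. The `fit_predict` callable and the mean-predictor base learner are hypothetical placeholders, not scikit-learn's implementation. Samples that were in bag for every tree keep a count of zero; they are marked NaN and left out of the OOB score instead of raising an error.

```python
import numpy as np


def oob_predictions(X, y, fit_predict, n_estimators, rng):
    """Average out-of-bag predictions; NaN where no tree left the sample out."""
    n_samples = X.shape[0]
    pred_sum = np.zeros(n_samples)
    counts = np.zeros(n_samples, dtype=int)
    for _ in range(n_estimators):
        # Bootstrap: draw n_samples indices with replacement.
        in_bag = rng.integers(0, n_samples, size=n_samples)
        oob_mask = np.ones(n_samples, dtype=bool)
        oob_mask[in_bag] = False
        if not oob_mask.any():
            continue
        pred_sum[oob_mask] += fit_predict(X[in_bag], y[in_bag], X[oob_mask])
        counts[oob_mask] += 1
    preds = np.full(n_samples, np.nan)
    has_oob = counts > 0
    preds[has_oob] = pred_sum[has_oob] / counts[has_oob]
    return preds, has_oob


def mean_predictor(X_train, y_train, X_test):
    """Trivial stand-in base learner: predict the training mean."""
    return np.full(X_test.shape[0], y_train.mean())


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
y = X[:, 0] + 0.1 * rng.standard_normal(100)

preds_1, mask_1 = oob_predictions(X, y, mean_predictor, n_estimators=1, rng=rng)
preds_50, mask_50 = oob_predictions(X, y, mean_predictor, n_estimators=50, rng=rng)

# Score only the samples that actually received an OOB estimate.
oob_mse_50 = np.mean((preds_50[mask_50] - y[mask_50]) ** 2)
```

With a single estimator many samples end up without an estimate, which is the situation Andreas describes; with 50 estimators essentially every sample is covered.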

Re: [Scikit-learn-general] Out of bag estimates for ensemble learners

2012-01-25 Thread Paolo Losi
Just for fun... the probability for a sample of being without oob estimates is:
5 trees: p = 0.0067
20 trees: p = 2e-9
I stand by my suggestion: let's ignore samples without oob estimates.
Paolo
On Wed, Jan 25, 2012 at 2:30 PM, Paolo Losi wrote: > Hi Andreas, > > IMHO the only reasonable th
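A quick check of these figures, assuming each tree is grown on a bootstrap sample of n draws from n training samples. A given sample is left out of one tree with probability (1 - 1/n)^n, which tends to 1/e ≈ 0.368 for large n. The quoted values match e^-T for T trees (e^-5 ≈ 0.0067, e^-20 ≈ 2e-9), which under this assumption is the chance of being out of bag for every tree. A sample gets no OOB estimate when it is in bag for every tree, with probability (1 - 1/e)^T: about 0.10 for 5 trees and about 1e-4 for 20 trees. The sketch computes both quantities.

```python
import math

# Large-n limit of (1 - 1/n) ** n: probability of being left out of one tree.
P_OUT = math.exp(-1)


def p_out_of_bag_everywhere(n_trees):
    """Probability a sample is out of bag for every tree (matches the quoted figures)."""
    return P_OUT ** n_trees


def p_no_oob_estimate(n_trees):
    """Probability a sample is in bag for every tree, so it gets no OOB estimate."""
    return (1.0 - P_OUT) ** n_trees


for t in (1, 5, 20):
    print(t, f"{p_out_of_bag_everywhere(t):.3g}", f"{p_no_oob_estimate(t):.3g}")
```

With a single tree, about 63% of the samples would have no OOB estimate, consistent with the behavior Andreas reports for very small ensembles.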

[Scikit-learn-general] Out of bag estimates for ensemble learners

2012-01-25 Thread Andreas
Hi everybody. My pull request for oob estimates got merged a couple of days ago. Now I noticed a behavior that I am not completely happy with. If the number of estimators in the ensemble is small (say 1) then there won't be a prediction for all of the samples. The way it is currently implemented, there

Re: [Scikit-learn-general] CoefSelectTransformerMixin

2012-01-25 Thread Andreas
On 01/25/2012 10:09 AM, Mathieu Blondel wrote: > On Wed, Jan 25, 2012 at 3:03 PM, Mathieu Blondel wrote: > > >> I will do it later today. >> > Done in > https://github.com/scikit-learn/scikit-learn/commit/77d83b61f9161899de286ca09601aa648e9c31ff. > Thanks! Will try it later :) Andy

Re: [Scikit-learn-general] CoefSelectTransformerMixin

2012-01-25 Thread Mathieu Blondel
On Wed, Jan 25, 2012 at 3:03 PM, Mathieu Blondel wrote: > I will do it later today. Done in https://github.com/scikit-learn/scikit-learn/commit/77d83b61f9161899de286ca09601aa648e9c31ff. Mathieu