On Wed, Jan 25, 2012 at 7:07 PM, Wes McKinney wrote:
> I'm happy to do it at PyCon since I assume there will be plenty of
> space plus perhaps snacks and definitely camaraderie. Just wanted to
> check that "the sprint is on!". Do you want to get something
> officially on the schedule or shall I?
On Wed, Jan 25, 2012 at 1:49 PM, Olivier Grisel
wrote:
> 2012/1/25 Wes McKinney :
>>
>> hi Olivier,
>>
>> do we want to still do a data / statsmodels / scikit-learn sprint at
>> PyCon? I will be there the first two sprint days, leaving town (after
>> a very extended stay due to Strata 2 weeks beforehand) on 3/15.
On 26 January 2012 04:30, Mathieu Blondel wrote:
> On Thu, Jan 26, 2012 at 1:53 AM, Gael Varoquaux
> wrote:
>
> > I agree, I wasn't voting against the feature. I was just puzzled, but you
> > explained it.
>
> For sparse matrices, dot products and other low-level operations are
> coded in C++ in Scipy, so parallel support also helps here.
2012/1/25 Wes McKinney :
>
> hi Olivier,
>
> do we want to still do a data / statsmodels / scikit-learn sprint at
> PyCon? I will be there the first two sprint days, leaving town (after
> a very extended stay due to Strata 2 weeks beforehand) on 3/15.
I still think the PyCon venue would be nice to
On Fri, Dec 9, 2011 at 1:31 PM, Olivier Grisel wrote:
> 2011/12/9 Fernando Perez :
>> On Fri, Dec 9, 2011 at 1:37 AM, Olivier Grisel
>> wrote:
>>> I was thinking: it would be great if we could get a ssh access to a
>>> small linux cluster (e.g. 10 nodes) with IPython / numpy / scipy
>>> installed
On Thu, Jan 26, 2012 at 1:53 AM, Gael Varoquaux
wrote:
> I agree, I wasn't voting against the feature. I was just puzzled, but you
> explained it.
For sparse matrices, dot products and other low-level operations are
coded in C++ in Scipy, so parallel support also helps here.
Mathieu
--
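Mathieu's point about sparse operations can be seen directly: scipy's sparse matrix products dispatch to compiled C++ routines rather than Python loops. A minimal sketch (not from the thread, values illustrative):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Small sparse matrix; the dense equivalent is kept for comparison.
dense = np.array([[1.0, 0.0, 2.0],
                  [0.0, 3.0, 0.0]])
sparse = csr_matrix(dense)

# The sparse dot product runs in Scipy's compiled C++ sparsetools code.
product = sparse @ sparse.T          # shape (2, 2), still sparse
assert np.allclose(product.toarray(), dense @ dense.T)
```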
On Wed, Jan 25, 2012 at 6:00 PM, Olivier Grisel wrote:
>
> > Once you have clustered the unlabeled samples,
> > you can add, as extra features on the labeled samples,
> > the distance from each cluster center (e.g. computed
> > via RBF kernel).
> > Is that what you are suggesting?
>
> They are more
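A minimal sketch of the idea under discussion — cluster the cheaply collected unlabeled data, then append each labeled sample's RBF similarity to the cluster centers as extra features. All names, sizes, and parameter values here are illustrative, not from the thread:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
X_unlabeled = rng.randn(200, 5)   # cheap-to-collect unsupervised data
X_labeled = rng.randn(20, 5)      # the (small) supervised training set

# Cluster the unlabeled pool.
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X_unlabeled)

# RBF similarity of each labeled sample to every cluster center,
# appended as 10 extra features.
extra = rbf_kernel(X_labeled, km.cluster_centers_, gamma=0.1)
X_augmented = np.hstack([X_labeled, extra])
```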
2012/1/25 Paolo Losi :
> Hi Olivier,
>
> your reply is very informative (as always :-) ).
> I've got a couple of question for you. See below...
>
> On Tue, Jan 24, 2012 at 1:57 PM, Olivier Grisel
> wrote:
>>
>> If you can cheaply collect unsupervised data that looks similar to
>> your training set
On Wed, Jan 25, 2012 at 5:32 PM, Mathieu Blondel wrote:
> > do you see any use case for which distance calculation is
> > performed in an "outer" loop?
>
> In my case, the pairwise matrix is an argument to the learning
> algorithm and is computed once and for all at the beginning. So, I really
> want it to be as fast as possible.
On Wed, Jan 25, 2012 at 05:43:08PM +0100, Olivier Grisel wrote:
> Still I find it a good idea to have an explicit API to perform
> kernel precomputation in parallel on multicore rather than hoping that
> the underlying runtime will do it automatically (which is not the case
> for 99% of the ubuntu
2012/1/25 Mathieu Blondel :
> On Thu, Jan 26, 2012 at 12:40 AM, Paolo Losi wrote:
>
>> do you see any use case for which distance calculation is
>> performed in an "outer" loop?
>
> In my case, the pairwise matrix is an argument to the learning
> algorithm and is computed once and for all at the beginning.
On Thu, Jan 26, 2012 at 12:40 AM, Paolo Losi wrote:
> do you see any use case for which distance calculation is
> performed in an "outer" loop?
In my case, the pairwise matrix is an argument to the learning
algorithm and is computed once and for all at the beginning. So, I really
want it to be as fast as possible.
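This is the precomputed-matrix pattern Mathieu describes: compute the pairwise matrix once up front and hand it to the estimator. A sketch using SVC's precomputed-kernel mode (data and parameter values illustrative):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randn(100, 10)
y = rng.randint(0, 2, size=100)

# Pairwise kernel matrix computed once, outside the estimator.
K = rbf_kernel(X, gamma=0.1)

# The estimator then consumes the precomputed matrix directly.
clf = SVC(kernel="precomputed").fit(K, y)
pred = clf.predict(K)   # at predict time, pass K(test, train)
```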
On Wed, Jan 25, 2012 at 04:40:26PM +0100, Paolo Losi wrote:
>As a general rule of thumb, IMHO I think it's better to parallelize
>at higher levels (more external iteration loops). It's generally:
>- more efficient
>- keeps the API cleaner (no need to push n_jobs parameter down)
>
Hi Mathieu!
do you see any use case for which distance calculation is
performed in an "outer" loop?
As a general rule of thumb, IMHO I think it's better to parallelize
at higher levels (more external iteration loops). It's generally:
- more efficient
- keeps the API cleaner (no need to push n_jobs parameter down)
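The outer-loop parallelism Paolo advocates is what joblib already provides: parallelize the external iteration rather than pushing n_jobs into each inner computation. A toy sketch (the function here is a stand-in):

```python
from joblib import Parallel, delayed

def fit_one(param):
    # Stand-in for an expensive fit with one hyper-parameter setting.
    return param ** 2

# Parallelize the outer loop over parameter settings; each worker
# runs the whole inner computation serially.
results = Parallel(n_jobs=2)(delayed(fit_one)(p) for p in range(8))
```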
Hi Olivier,
your reply is very informative (as always :-) ).
I've got a couple of question for you. See below...
On Tue, Jan 24, 2012 at 1:57 PM, Olivier Grisel wrote:
>
> If you can cheaply collect unsupervised data that looks similar to
> your training set (albeit without the labels and in much
On Thu, Jan 26, 2012 at 12:16 AM, Mathieu Blondel wrote:
> sparse, n_jobs=1: 30.92
> sparse, n_jobs=4: 10.17
>
> dense, n_jobs=1: 7.64
> dense, n_jobs=4: 4.75
Oops, I forgot to mention that the above figures are computation times
in seconds.
Mathieu
-
Hello folks,
I've just added an n_jobs option to the pairwise_distances and
pairwise_kernels functions. This works by breaking down the pairwise
matrix into "n_jobs" even slices and doing the computations in
parallel.
On the USPS dataset (n_samples=7291, n_features=257), I got the
following results (computation times in seconds):

sparse, n_jobs=1: 30.92
sparse, n_jobs=4: 10.17

dense, n_jobs=1: 7.64
dense, n_jobs=4: 4.75
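For reference, the new option is used like this — a minimal sketch on a small random matrix (the USPS timings above came from a much larger one):

```python
import numpy as np
from sklearn.metrics import pairwise_distances

rng = np.random.RandomState(0)
X = rng.randn(500, 30)

# n_jobs splits the pairwise matrix into even slices computed in parallel.
D_serial = pairwise_distances(X, n_jobs=1)
D_parallel = pairwise_distances(X, n_jobs=2)

# Same result, different schedule.
assert np.allclose(D_serial, D_parallel)
```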
Hi Andreas,
IMHO the only reasonable thing to do is to ignore samples for which
there is no oob estimation.
building a forest with fewer than 5 trees makes no sense in the first place,
so I would not worry if sklearn doesn't provide any warning for that
specific
problem (too "few" oob estimates).
Just for fun...
the probability for a sample of being without oob estimates
(i.e. of landing in-bag, with probability ≈ 0.632, for every tree) is:
5 trees: p ≈ 0.10
20 trees: p ≈ 1e-4
I stand by my suggestion: let's ignore samples without oob estimates
Paolo
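That back-of-envelope number can be checked directly, under the standard assumption that each tree's bootstrap draws n samples with replacement, so a given sample is in-bag with probability 1 - (1 - 1/n)^n ≈ 0.632:

```python
# Probability a given sample appears in one bootstrap resample.
n_samples = 1000
p_in_bag = 1.0 - (1.0 - 1.0 / n_samples) ** n_samples   # ≈ 0.632

# A sample has no oob estimate only if it is in-bag for every tree.
for n_trees in (5, 20):
    p_no_oob = p_in_bag ** n_trees
    print(n_trees, p_no_oob)   # ≈ 0.10 for 5 trees, ≈ 1e-4 for 20
```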
On Wed, Jan 25, 2012 at 2:30 PM, Paolo Losi wrote:
> Hi Andreas,
>
> IMHO the only reasonable th
Hi everybody.
My pull request for oob estimates got merged a couple of days ago.
Now I noticed a behavior that I am not completely happy with.
If the number of estimators in the ensemble is small (say 1)
then there won't be a prediction for all of the samples.
The way it is currently implemented, there
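For context, the feature under discussion looks like this in use — each sample's oob prediction comes only from the trees that did not see it, so with very few trees some samples may have no predicting tree at all. A sketch with illustrative parameter values:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, random_state=0)

# With enough trees, virtually every sample is out-of-bag for at least
# one tree, so the oob score is well defined.
forest = RandomForestClassifier(n_estimators=30, oob_score=True,
                                bootstrap=True, random_state=0).fit(X, y)
print(forest.oob_score_)
```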
On 01/25/2012 10:09 AM, Mathieu Blondel wrote:
> On Wed, Jan 25, 2012 at 3:03 PM, Mathieu Blondel wrote:
>
>
>> I will do it later today.
>>
> Done in
> https://github.com/scikit-learn/scikit-learn/commit/77d83b61f9161899de286ca09601aa648e9c31ff.
>
Thanks!
Will try it later :)
Andy
On Wed, Jan 25, 2012 at 3:03 PM, Mathieu Blondel wrote:
> I will do it later today.
Done in
https://github.com/scikit-learn/scikit-learn/commit/77d83b61f9161899de286ca09601aa648e9c31ff.
Mathieu
--