Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Olivier Grisel
2012/1/23 Mathieu Blondel : > On Tue, Jan 24, 2012 at 12:15 AM, Olivier Grisel > wrote: > >> LSH is just using a binary thresholded random projections in 32 (or 64 >> or 128...) dim space. That leads to 32bit (or 64bit...) vectors >> castable as integers and doing Hamming radius queries instead of

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Mathieu Blondel
On Tue, Jan 24, 2012 at 12:15 AM, Olivier Grisel wrote: > LSH is just using a binary thresholded random projections in 32 (or 64 > or 128...) dim space. That leads to 32bit (or 64bit...) vectors > castable as integers and doing Hamming radius queries instead of > Euclidean queries in that boolean

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Jacob VanderPlas
Olivier Grisel wrote: > +1 for the dense case > > But ball tree does not work for high dim sparse data. > I'm working on that - I hope to have a pull request within the next few weeks. > We would also need some truncated kernels (e.g. cosine similarity for > positive data or RBF in the gener

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Gael Varoquaux
On Mon, Jan 23, 2012 at 04:15:36PM +0100, Olivier Grisel wrote: > 2012/1/23 Gael Varoquaux : > > On Mon, Jan 23, 2012 at 10:08:45AM +0100, Olivier Grisel wrote: > >> Once we have random projections (or even just efficient hashing API), > >> LSH is quite simple to implement on top. > > I don't unde

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Olivier Grisel
2012/1/23 Gael Varoquaux : > On Mon, Jan 23, 2012 at 10:08:45AM +0100, Olivier Grisel wrote: >> Once we have random projections (or even just efficient hashing API), >> LSH is quite simple to implement on top. > > I don't understand: they are quite orthogonal, aren't they? You can implement LSH wi

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Gael Varoquaux
On Mon, Jan 23, 2012 at 10:08:45AM +0100, Olivier Grisel wrote: > Once we have random projections (or even just efficient hashing API), > LSH is quite simple to implement on top. I don't understand: they are quite orthogonal, aren't they? Gael

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Olivier Grisel
Thanks Adrian and Andreas. My kindle is packed :) -- Olivier -- Try before you buy = See our experts in action! The most comprehensive online learning library for Microsoft developers is just $99.99! Visual Studio, Share

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Adrien
Le 23/01/2012 11:34, Olivier Grisel a écrit : > 2012/1/23 Andreas: >> On 01/23/2012 11:28 AM, Adrien wrote: >>> Hello everyone, >>> >>> A quick question: why not use Nystrom instead? >>> >> That was on my GSoC wish list ;) >> The application I had in mind at the moment was >> the label propagation

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Andreas
On 01/23/2012 11:34 AM, Olivier Grisel wrote: 2012/1/23 Andreas: On 01/23/2012 11:28 AM, Adrien wrote: Hello everyone, A quick question: why not use Nystrom instead? That was on my GSoC wish list ;) The application I had in mind at the moment was the label propagation and m

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Olivier Grisel
2012/1/23 Andreas : > On 01/23/2012 11:28 AM, Adrien wrote: >> Hello everyone, >> >> A quick question: why not use Nystrom instead? >> > That was on my GSoC wish list ;) > The application I had in mind at the moment was > the label propagation and maybe the spectral clustering. > In general, I thin

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Andreas
On 01/23/2012 11:28 AM, Adrien wrote: > Hello everyone, > > A quick question: why not use Nystrom instead? > That was on my GSoC wish list ;) The application I had in mind at the moment was the label propagation and maybe the spectral clustering. In general, I think the thresholding would work

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Adrien
Hello everyone, A quick question: why not use Nystrom instead? The effects of thresholding the kernel matrix is not very well understood and makes you lose the positive-definiteness (i.e. it's not a kernel matrix anymore). It's ok for spectral clustering as the Laplacian is always positive sem

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Dimitrios Pritsos
Hello, I am using Sklearn in combination with Pytables for Automated Genre Identification of Web Pages. The reason I am using Pytables is for executing Very hight Scale Evaluation of SVM using 50,000 samples for training. I know that that might probably this will not have so much impact in my

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Mathieu Blondel
On Mon, Jan 23, 2012 at 6:06 PM, Andreas wrote: > It might be as easy as that. > I guess I should try to see if this speeds up things. If you use algorithm="brute", there should be no speed-up (it computes all the distances and find those within the given radius...). If you use ball-tree, it sho

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Olivier Grisel
2012/1/23 Gael Varoquaux : > On Mon, Jan 23, 2012 at 09:46:41AM +0100, Olivier Grisel wrote: >> But ball tree does not work for high dim sparse data. > > In this case, I think that the LSH option is a good one. There is an LSH > in pybrain that can be adapted. Once we have random projections (or e

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Andreas
On 01/23/2012 08:52 AM, Alexandre Gramfort wrote: > I am not sure it is what you want but you could use: > > K = radius_neighbors_graph(X, radius, mode='distance') > K.data **= 2 > K.data *= -gamma > np.exp(K.data, out=K.data) > > no? > > Alex > It might be as easy as that. I guess I should try

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Gael Varoquaux
On Mon, Jan 23, 2012 at 09:46:41AM +0100, Olivier Grisel wrote: > But ball tree does not work for high dim sparse data. In this case, I think that the LSH option is a good one. There is an LSH in pybrain that can be adapted. Gael --

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Olivier Grisel
2012/1/23 Alexandre Gramfort : > I am not sure it is what you want but you could use: > > K = radius_neighbors_graph(X, radius, mode='distance') > K.data **= 2 > K.data *= -gamma > np.exp(K.data, out=K.data) > > no? +1 for the dense case But ball tree does not work for high dim sparse data. We w

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-22 Thread Alexandre Gramfort
I am not sure it is what you want but you could use: K = radius_neighbors_graph(X, radius, mode='distance') K.data **= 2 K.data *= -gamma np.exp(K.data, out=K.data) no? Alex On Sun, Jan 22, 2012 at 9:34 PM, Andreas wrote: > Hi everybody. > While reviewing the label propagation PR, I thought ab

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-22 Thread Jacob VanderPlas
I don't think this would work out-of-the-box. The classic ball tree implementation depends on the metric satisfying the triangle inequality. You may be able to cleverly modify the algorithm to work in other cases, but I'm not aware of any examples of that. I think that approximate nearest ne

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-22 Thread Mathieu Blondel
On Mon, Jan 23, 2012 at 5:34 AM, Andreas wrote: > Hi everybody. > While reviewing the label propagation PR, I thought about the pairwise > rbf functions. > Would it be possible to compute an sparse, approximate RBF kernel matrix > using ball trees? > The idea would be that if the distance between

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-22 Thread josef . pktd
On Sun, Jan 22, 2012 at 3:34 PM, Andreas wrote: > Hi everybody. > While reviewing the label propagation PR, I thought about the pairwise > rbf functions. > Would it be possible to compute an sparse, approximate RBF kernel matrix > using ball trees? > The idea would be that if the distance between

[Scikit-learn-general] RBF kernel with ball tree

2012-01-22 Thread Andreas
Hi everybody. While reviewing the label propagation PR, I thought about the pairwise rbf functions. Would it be possible to compute an sparse, approximate RBF kernel matrix using ball trees? The idea would be that if the distance between two points is some "large" multiple of gamma, the kernel c