Re: [Scikit-learn-general] Nearest neighbor warning when running LocallyLinearEmbedding

2012-01-23 Thread Pablo Ruggia
I'm also having trouble with nearest neighbors, although I'm not sure if it's related. I'm doing a regression, and every time I see that particular warning, one or more of my predicted values ends up with a value of 'nan'. I checked my inputs and none of them are nan. If I can create a small sample that sh

Re: [Scikit-learn-general] Nearest neighbor warning when running LocallyLinearEmbedding

2012-01-23 Thread Jacob VanderPlas
I played around with this a bit: it appears to be related to a memory error. https://gist.github.com/1666570 This fails after a few iterations. If the print statement is uncommented, then it no longer fails. The ball tree code uses a lot of raw memory views for speed... I'll have a look through

Re: [Scikit-learn-general] AssertionError running nosetests sklearn

2012-01-23 Thread Olivier Grisel
2012/1/24 Blake Visin : > Seems to have worked.  Should I be concerned that it skipped 6 tests? Nope, they are expected to be skipped. -- Olivier

Re: [Scikit-learn-general] AssertionError running nosetests sklearn

2012-01-23 Thread Blake Visin
Seems to have worked. Should I be concerned that it skipped 6 tests? blake@blake-M4600:~/workspace/scikit-learn$ nosetests sklearn /home/blake/workspace/scikit-learn/sklearn/cross_val.py:2: UserWarning: sklearn.cross_val namespace is deprecated in version 0.9 and will be removed in version 0.11.

Re: [Scikit-learn-general] AssertionError running nosetests sklearn

2012-01-23 Thread Olivier Grisel
2012/1/24 Blake Visin : > I installed it using: sudo pip install -U scikit-learn > pip freeze returns: > > scikit-learn==0.10 > numpy==1.6.1 > scipy==0.9.0 Alright, can you try to build the master and check whether you can reproduce the test failure? git clone https://github.com/scikit-learn/scik

Re: [Scikit-learn-general] AssertionError running nosetests sklearn

2012-01-23 Thread Blake Visin
I installed it using: sudo pip install -U scikit-learn pip freeze returns: scikit-learn==0.10 numpy==1.6.1 scipy==0.9.0 Thanks, Blake On Mon, Jan 23, 2012 at 3:58 PM, Olivier Grisel wrote: > This looks like a bug in scikit-learn. Which version are you using? A > released archive or an up to dat

Re: [Scikit-learn-general] AssertionError running nosetests sklearn

2012-01-23 Thread Olivier Grisel
This looks like a bug in scikit-learn. Which version are you using? A released archive or an up-to-date clone from the master branch of the GitHub repo? Also, which versions of numpy and scipy are installed on your box? -- Olivier

Re: [Scikit-learn-general] logistic regression weights

2012-01-23 Thread Olivier Grisel
2012/1/24 Jieyun Fu : > Hi all, > > Is there a way to give observation weights to LogisticRegression module? I > am referring to the weights for different observations. i.e., if we are > feeding N samples into the regression, we should give N weights. From the > APIs, looks like we can only give w

[Scikit-learn-general] AssertionError running nosetests sklearn

2012-01-23 Thread Blake Visin
I am trying to get started with scikit-learn and I am following the tutorial here: I am running on Ubuntu 11.10 Linux 3.0.0-14-generic x86_64 I have installed all the necessary packages listed in the tutorial and here is the output when running nosetests sklearn: blake@blake-M4600:~/workspace/sci

[Scikit-learn-general] logistic regression weights

2012-01-23 Thread Jieyun Fu
Hi all, Is there a way to give observation weights to LogisticRegression module? I am referring to the weights for different observations. i.e., if we are feeding N samples into the regression, we should give N weights. From the APIs, looks like we can only give weights based on the classes. Than

Re: [Scikit-learn-general] AssertionError running nosetests sklearn

2012-01-23 Thread Blake Visin
Sorry forgot the link. I am following the tutorial here: http://scikit-learn.github.com/scikit-learn-tutorial/setup.html#install-scikit-learn-build-dependencies On Mon, Jan 23, 2012 at 3:21 PM, Blake Visin wrote: > I am trying to get started with scikit-learn and I am following the > tutorial h

Re: [Scikit-learn-general] Nearest neighbor warning when running LocallyLinearEmbedding

2012-01-23 Thread Andreas
On 01/23/2012 10:38 PM, Andreas wrote: > Hi everybody. > I created a gist to illustrate the behavior: > https://gist.github.com/1665623 > This reproduces a warning that is quite weird. > > After trying it out for some time, I found > the following fun fact: > It behaves only like it does when the m

Re: [Scikit-learn-general] Nearest neighbor warning when running LocallyLinearEmbedding

2012-01-23 Thread Andreas
Hi everybody. I created a gist to illustrate the behavior: https://gist.github.com/1665623 This reproduces a warning that is quite weird. After trying it out for some time, I found the following fun fact: It behaves only like it does when the manifold module is imported. So if you remove the manif

Re: [Scikit-learn-general] Nearest neighbor warning when running LocallyLinearEmbedding

2012-01-23 Thread Andreas
Hi. I don't know much about the modules that are involved here but this looks like a bug to me. I can reproduce the behavior you observe and am looking into it. I think Jake will be able to tell you more about this. Cheers, Andy On 01/23/2012 08:15 PM, Alejandro Weinstein wrote: > Hi: > > When

[Scikit-learn-general] Nearest neighbor warning when running LocallyLinearEmbedding

2012-01-23 Thread Alejandro Weinstein
Hi: When I run manifold.LocallyLinearEmbedding (using sklearn 0.10), as in the following code, ### from sklearn import manifold, datasets n_points = 1000 n_neighbors = 10 out_dim = 2 X, _ = datasets.samples_generator.ma

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Olivier Grisel
2012/1/23 Dimitrios Pritsos : > > However, when I do the same test using partial_fit() for the same > sub-set of my Data Set (see above) I am getting ~20%. > > Any Suggestions? Do a grid search to find the best alpha on SGDClassifier (and on C for the LinearSVC classifier). For instance: >>> from

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Dimitrios Pritsos
On 01/23/2012 07:01 PM, Dimitrios Pritsos wrote: > On 01/23/2012 06:59 PM, Andreas wrote: >> On 01/23/2012 05:54 PM, Dimitrios Pritsos wrote: >>> Relate to LinearSVC() and SGDClassifier() >>> >>> I ran both with a subset of my 33k-samples by 30k-features and I am >>> getting a huge difference in re

Re: [Scikit-learn-general] Unit test fail when building the latest version of scikit-learn.

2012-01-23 Thread Olivier Grisel
2012/1/23 Mathieu Blondel : > On Tue, Jan 24, 2012 at 2:47 AM, Olivier Grisel > wrote: > >> Tests are fine on numpy 1.5.1 and scipy 0.10.0: >> >> https://jenkins.shiningpanda.com/scikit-learn/job/python-2.7-numpy-1.5.1-scipy-0.10.0/ >> >> Maybe a 1.6.1 specific issue? If this is a rounding issue t

Re: [Scikit-learn-general] Unit test fail when building the latest version of scikit-learn.

2012-01-23 Thread Mathieu Blondel
On Tue, Jan 24, 2012 at 2:47 AM, Olivier Grisel wrote: > Tests are fine on numpy 1.5.1 and scipy 0.10.0: > > https://jenkins.shiningpanda.com/scikit-learn/job/python-2.7-numpy-1.5.1-scipy-0.10.0/ > > Maybe a 1.6.1 specific issue? If this is a rounding issue triggering a > classification switch on

Re: [Scikit-learn-general] Unit test fail when building the latest version of scikit-learn.

2012-01-23 Thread Olivier Grisel
2012/1/23 Alejandro Weinstein : > Hi: > > I am trying to install the latest version of scikit-learn (59db66...). > I cloned the repository, and typed 'make'. One of the unit tests is > failing: > > == > FAIL: sklearn.tests.test_mul

[Scikit-learn-general] Unit test fail when building the latest version of scikit-learn.

2012-01-23 Thread Alejandro Weinstein
Hi: I am trying to install the latest version of scikit-learn (59db66...). I cloned the repository, and typed 'make'. One of the unit tests is failing: == FAIL: sklearn.tests.test_multiclass.test_ovr_fit_predict -

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Dimitrios Pritsos
On 01/23/2012 06:59 PM, Andreas wrote: > On 01/23/2012 05:54 PM, Dimitrios Pritsos wrote: >> Relate to LinearSVC() and SGDClassifier() >> >> I ran both with a subset of my 33k-samples by 30k-features and I am >> getting a huge difference in results. Is this expected behavour! >> >> After 10-fold-cr

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Andreas
On 01/23/2012 05:54 PM, Dimitrios Pritsos wrote: > Relate to LinearSVC() and SGDClassifier() > > I ran both with a subset of my 33k-samples by 30k-features and I am > getting a huge difference in results. Is this expected behavour! > > After 10-fold-cross-validation (using the Defaults as arguments

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Dimitrios Pritsos
Related to LinearSVC() and SGDClassifier(): I ran both on a subset of my 33k-sample by 30k-feature data set and I am getting a huge difference in results. Is this expected behaviour? After 10-fold cross-validation (using the defaults as arguments in both cases) I am getting: Accuracy = 44% for SGD Ac

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Gael Varoquaux
On Mon, Jan 23, 2012 at 05:48:21PM +0100, Olivier Grisel wrote: > I am not talking of adding a dependency on a redis client library in > scikit-learn but just to make it possible to pass a "vocabulary" > argument to the vectorizer that has the same behavior as python > defaultdict but would use a r

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Olivier Grisel
2012/1/23 Gael Varoquaux : > On Mon, Jan 23, 2012 at 05:27:10PM +0100, Olivier Grisel wrote: >> Alternatively we could make a vocabulary dict implementation >> based on a redis server. > > That's two mails in a row suggesting to bind the scikit with an advanced > persistence engine: first Dimitrios

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Mathieu Blondel
On Tue, Jan 24, 2012 at 1:38 AM, Mathieu Blondel wrote: > Indeed, combined with your hashing text vectorizer, this will allow to > cache the extracted features and thus make several epochs over the > dataset (each epoch being broken down into several calls to > partial_fit). Actually, one call t

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Mathieu Blondel
On Tue, Jan 24, 2012 at 1:27 AM, Olivier Grisel wrote: > I agree although this would be really useful once I am done with the > hashing text vectorizer. Otherwise the vocabulary dict will explode in > memory. Indeed, combined with your hashing text vectorizer, this will allow to cache the extrac

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Gael Varoquaux
On Mon, Jan 23, 2012 at 05:27:10PM +0100, Olivier Grisel wrote: > Alternatively we could make a vocabulary dict implementation > based on a redis server. That's two mails in a row suggesting to bind the scikit with an advanced persistence engine: first Dimitrios suggesting to persist to pytables,

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Olivier Grisel
2012/1/23 Mathieu Blondel : > We need a dump utility to incrementally append data to a mem-mapped > array or csr matrix. This way, people would be able to do their > feature extraction in an iterator and create the array / matrix > incrementally. I agree although this would be really useful once I

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Dimitrios Pritsos
On 01/23/2012 06:14 PM, Mathieu Blondel wrote: > We need a dump utility to incrementally append data to a mem-mapped > array or csr matrix. This way, people would be able to do their > feature extraction in an iterator and create the array / matrix > incrementally. > > Mathieu > I will implement a

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Olivier Grisel
2012/1/23 Mathieu Blondel : > On Tue, Jan 24, 2012 at 12:15 AM, Olivier Grisel > wrote: > >> LSH is just using a binary thresholded random projections in 32 (or 64 >> or 128...) dim space. That leads to 32bit (or 64bit...) vectors >> castable as integers and doing Hamming radius queries instead of

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Mathieu Blondel
We need a dump utility to incrementally append data to a mem-mapped array or csr matrix. This way, people would be able to do their feature extraction in an iterator and create the array / matrix incrementally. Mathieu
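
A rough sketch of the idea in the message above, not an existing scikit-learn utility: rows produced by a feature-extraction iterator are appended one at a time to the three CSR arrays (data, indices, indptr). The function name build_csr and the dict-per-row input format are assumptions made for illustration, and plain in-memory lists stand in for the mem-mapped storage the message asks for.

```python
from typing import Dict, Iterable, List, Tuple


def build_csr(
    rows: Iterable[Dict[int, float]],
) -> Tuple[List[float], List[int], List[int]]:
    """Consume sparse rows one at a time and return (data, indices, indptr).

    Each row is a hypothetical {column_index: value} mapping, as a
    feature extractor might yield it. Only one row is held at a time.
    """
    data: List[float] = []
    indices: List[int] = []
    indptr: List[int] = [0]
    for row in rows:
        # Store the non-zero entries of this row in column order.
        for col, value in sorted(row.items()):
            if value != 0.0:
                indices.append(col)
                data.append(value)
        # indptr[k + 1] marks where row k ends in data/indices.
        indptr.append(len(data))
    return data, indices, indptr


# Three rows, the second one empty.
data, indices, indptr = build_csr(iter([{0: 1.0, 3: 2.0}, {}, {2: 5.0}]))
```

The resulting triple has the layout that scipy.sparse.csr_matrix((data, indices, indptr), shape=...) accepts. A real dump utility would presumably append to files on disk rather than to lists, which this sketch does not attempt.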

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Mathieu Blondel
On Tue, Jan 24, 2012 at 12:15 AM, Olivier Grisel wrote: > LSH is just using a binary thresholded random projections in 32 (or 64 > or 128...) dim space. That leads to 32bit (or 64bit...) vectors > castable as integers and doing Hamming radius queries instead of > Euclidean queries in that boolean
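
A minimal pure-Python sketch of the scheme described in the quoted text: project onto random directions, threshold at zero, pack the resulting bits into an integer, and compare codes by Hamming distance. The function names, the Gaussian choice for the random directions, and the fixed seed are illustrative assumptions, not code from scikit-learn or from the thread.

```python
import random


def make_hyperplanes(n_bits, n_features, seed=0):
    """Draw n_bits random directions (Gaussian entries, assumed here)."""
    rng = random.Random(seed)
    return [
        [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for _ in range(n_bits)
    ]


def lsh_code(x, hyperplanes):
    """Pack the signs of the random projections of x into one integer."""
    code = 0
    for plane in hyperplanes:
        dot = sum(w * v for w, v in zip(plane, x))
        # Binary thresholding: one bit per projection.
        code = (code << 1) | (1 if dot > 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


planes = make_hyperplanes(n_bits=32, n_features=3)
code = lsh_code([1.0, 2.0, 3.0], planes)
```

Because only the sign of each projection is kept, rescaling a vector leaves its code unchanged, while vectors pointing in nearby directions tend to differ in few bits. A radius query in Hamming space then reduces to comparing integers with hamming().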

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Jacob VanderPlas
Olivier Grisel wrote: > +1 for the dense case > > But ball tree does not work for high dim sparse data. > I'm working on that - I hope to have a pull request within the next few weeks. > We would also need some truncated kernels (e.g. cosine similarity for > positive data or RBF in the gener

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Gael Varoquaux
On Mon, Jan 23, 2012 at 04:15:36PM +0100, Olivier Grisel wrote: > 2012/1/23 Gael Varoquaux : > > On Mon, Jan 23, 2012 at 10:08:45AM +0100, Olivier Grisel wrote: > >> Once we have random projections (or even just efficient hashing API), > >> LSH is quite simple to implement on top. > > I don't unde

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Olivier Grisel
2012/1/23 Gael Varoquaux : > On Mon, Jan 23, 2012 at 10:08:45AM +0100, Olivier Grisel wrote: >> Once we have random projections (or even just efficient hashing API), >> LSH is quite simple to implement on top. > > I don't understand: they are quite orthogonal, aren't they? You can implement LSH wi

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Olivier Grisel
2012/1/23 Gael Varoquaux : > On Mon, Jan 23, 2012 at 02:17:21PM +0100, Olivier Grisel wrote: >> Hehe, that would be nice but I am afraid Gael won't let me do this as >> part of the main scikit repository: large scale examples mean >> large-scale datasets ;) > > Why can't we just generate data? The

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Dimitrios Pritsos
On 01/23/2012 04:16 PM, Gael Varoquaux wrote: > On Mon, Jan 23, 2012 at 04:07:16PM +0200, Dimitrios Pritsos wrote: >> '0.11-git'<- is this the latest? > We don't know: it depends on the revision number of the git checkout. > > Do you have a full git checkout? If so, just do a 'git pull'. > > G

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Gael Varoquaux
On Mon, Jan 23, 2012 at 04:07:16PM +0200, Dimitrios Pritsos wrote: > '0.11-git' <- is this the latest? We don't know: it depends on the revision number of the git checkout. Do you have a full git checkout? If so, just do a 'git pull'. G

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Dimitrios Pritsos
On 01/23/2012 03:58 PM, Lars Buitinck wrote: > 2012/1/23 Dimitrios Pritsos: >> I guess I misunderstood something here. There is no partial_fit(). Plus >> I haven't managed to figure out how to do the partial fit. >> >> I have the latest sklearn I retrieved by git. Am I missing something? > Are you

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Lars Buitinck
2012/1/23 Dimitrios Pritsos : > I guess I misunderstood something here. There is no partial_fit(). Plus > I haven't managed to figure out how to do the partial fit. > > I have the latest sklearn I retrieved by git. Am I missing something? Are you sure? SGDClassifier.partial_fit was implemented some

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Dimitrios Pritsos
On 01/23/2012 03:46 PM, Dimitrios Pritsos wrote: > On 01/23/2012 03:20 PM, Olivier Grisel wrote: >> 2012/1/23 Dimitrios Pritsos: >>> On 01/23/2012 02:20 PM, Lars Buitinck wrote: 2012/1/23 Dimitrios Pritsos: > On 01/23/2012 12:24 PM, Olivier Grisel wrote: >> BTW: what is the structure o

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Dimitrios Pritsos
On 01/23/2012 03:20 PM, Olivier Grisel wrote: > 2012/1/23 Dimitrios Pritsos: >> On 01/23/2012 02:20 PM, Lars Buitinck wrote: >>> 2012/1/23 Dimitrios Pritsos: On 01/23/2012 12:24 PM, Olivier Grisel wrote: > BTW: what is the structure of your data in PyTables? Is it mapped to a > scipy.sp

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Gael Varoquaux
On Mon, Jan 23, 2012 at 02:17:21PM +0100, Olivier Grisel wrote: > Hehe, that would be nice but I am afraid Gael won't let me do this as > part of the main scikit repository: large scale examples mean > large-scale datasets ;) Why can't we just generate data? The goal is to get the idea through, no

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Dimitrios Pritsos
On 01/23/2012 03:07 PM, Olivier Grisel wrote: > 2012/1/23 Lars Buitinck: >> 2012/1/23 Dimitrios Pritsos: >>> I will give it a try; however, some of my tests had a memory management >>> problem. As far as I can recall, it was mostly because of numpy functions that >>> might ask PyTables to load every th

[Scikit-learn-general] Out of core/online

2012-01-23 Thread Gael Varoquaux
On Mon, Jan 23, 2012 at 11:37:16AM +0200, Dimitrios Pritsos wrote: > So, is there a any tip for me to fit() the model in stages i.e not to > bring the whole data set in Memory during the learning process. As I can > see in my code when I am giving an EArray as an argument to Fit() it > seem to l

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Dimitrios Pritsos
On 01/23/2012 02:46 PM, Lars Buitinck wrote: > 2012/1/23 Dimitrios Pritsos: >> I will give it a try; however, some of my tests had a memory management >> problem. As far as I can recall, it was mostly because of numpy functions that >> might ask PyTables to load everything into main memory. I guess some >>

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Gael Varoquaux
On Mon, Jan 23, 2012 at 10:08:45AM +0100, Olivier Grisel wrote: > Once we have random projections (or even just efficient hashing API), > LSH is quite simple to implement on top. I don't understand: they are quite orthogonal, aren't they? Gael

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Olivier Grisel
2012/1/23 Dimitrios Pritsos : > On 01/23/2012 02:20 PM, Lars Buitinck wrote: >> 2012/1/23 Dimitrios Pritsos: >>> On 01/23/2012 12:24 PM, Olivier Grisel wrote: BTW: what is the structure of your data in PyTables? Is it mapped to a scipy.sparse Compressed Sparse Row data structure? How many f

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Olivier Grisel
2012/1/23 Mathieu Blondel : > On Mon, Jan 23, 2012 at 7:24 PM, Olivier Grisel > wrote: >> Have a look at `sklearn.linear_model.SGDClassifier`, which supports a >> partial_fit method in master that you can call several times with >> slices of data. >> >> BTW: what is the structure of your data in PyTa

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Mathieu Blondel
On Mon, Jan 23, 2012 at 10:07 PM, Olivier Grisel wrote: > Indeed SVC will not scale to 50k samples, only LinearSVC will. In any > case I found SGDClassifier (with the fit method) to be much faster > than LinearSVC or LogisticRegression (i.e. any liblinear based > models). And discrete naive Bayes

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Olivier Grisel
2012/1/23 Lars Buitinck : > 2012/1/23 Dimitrios Pritsos : >> I will give it a try; however, some of my tests had a memory management >> problem. As far as I can recall, it was mostly because of numpy functions that >> might ask PyTables to load everything into main memory. I guess some >> loops and some sl

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Lars Buitinck
2012/1/23 Dimitrios Pritsos : > I will give it a try; however, some of my tests had a memory management > problem. As far as I can recall, it was mostly because of numpy functions that > might ask PyTables to load everything into main memory. I guess some > loops and some slicing might solve the problem.

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Dimitrios Pritsos
On 01/23/2012 02:20 PM, Lars Buitinck wrote: > 2012/1/23 Dimitrios Pritsos: >> On 01/23/2012 12:24 PM, Olivier Grisel wrote: >>> BTW: what is the structure of your data in PyTables? Is it mapped to a >>> scipy.sparse Compressed Sparse Row data structure? How many features do >>> you have in your data

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Mathieu Blondel
On Mon, Jan 23, 2012 at 7:24 PM, Olivier Grisel wrote: > Have a look at `sklearn.linear_model.SGDClassifier`, which supports a > partial_fit method in master that you can call several times with > slices of data. > > BTW: what is the structure of your data in PyTables? Is it mapped to a > scipy.spars

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Lars Buitinck
2012/1/23 Dimitrios Pritsos : > On 01/23/2012 12:24 PM, Olivier Grisel wrote: >> BTW: what is the structure of your data in PyTables? Is it mapped to a >> scipy.sparse Compressed Sparse Row data structure? How many features do >> you have in your dataset? > > The training data are in an EArray (Compre

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Dimitrios Pritsos
On 01/23/2012 12:24 PM, Olivier Grisel wrote: > Have a look at `sklearn.linear_model.SGDClassifier`, which supports a > partial_fit method in master that you can call several times with > slices of data. Thanks for the ref, I will have a look right now. > BTW: what is the structure of your data in PyTa

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Olivier Grisel
Thanks Adrien and Andreas. My kindle is packed :) -- Olivier

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Adrien
Le 23/01/2012 11:34, Olivier Grisel a écrit : > 2012/1/23 Andreas: >> On 01/23/2012 11:28 AM, Adrien wrote: >>> Hello everyone, >>> >>> A quick question: why not use Nystrom instead? >>> >> That was on my GSoC wish list ;) >> The application I had in mind at the moment was >> the label propagation

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Andreas
On 01/23/2012 11:34 AM, Olivier Grisel wrote: 2012/1/23 Andreas: On 01/23/2012 11:28 AM, Adrien wrote: Hello everyone, A quick question: why not use Nystrom instead? That was on my GSoC wish list ;) The application I had in mind at the moment was the label propagation and m

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Olivier Grisel
2012/1/23 Andreas : > On 01/23/2012 11:28 AM, Adrien wrote: >> Hello everyone, >> >> A quick question: why not use Nystrom instead? >> > That was on my GSoC wish list ;) > The application I had in mind at the moment was > the label propagation and maybe the spectral clustering. > In general, I thin

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Andreas
On 01/23/2012 11:28 AM, Adrien wrote: > Hello everyone, > > A quick question: why not use Nystrom instead? > That was on my GSoC wish list ;) The application I had in mind at the moment was the label propagation and maybe the spectral clustering. In general, I think the thresholding would work

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Adrien
Hello everyone, A quick question: why not use Nystrom instead? The effect of thresholding the kernel matrix is not very well understood, and it makes you lose the positive-definiteness (i.e. it's not a kernel matrix anymore). It's ok for spectral clustering as the Laplacian is always positive sem

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Olivier Grisel
Have a look at `sklearn.linear_model.SGDClassifier`, which supports a partial_fit method in master that you can call several times with slices of data. BTW: what is the structure of your data in PyTables? Is it mapped to a scipy.sparse Compressed Sparse Row data structure? How many features do you hav
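
To make "call several times with slices of data" concrete, here is a small sketch of the slicing side only. The helper name batch_slices is made up for illustration; the partial_fit call is shown in a comment, with X, y, clf and all_classes as placeholder names, since the exact signature in master at the time of this thread is not confirmed here.

```python
def batch_slices(n_samples, batch_size):
    """Yield consecutive slice objects that together cover range(n_samples)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_samples, batch_size):
        # The last slice is shorter when batch_size does not divide n_samples.
        yield slice(start, min(start + batch_size, n_samples))


# Intended use (assumption: X and y support slicing, e.g. a PyTables
# array read chunk by chunk, and clf is an SGDClassifier):
#
#     for s in batch_slices(n_samples, 1000):
#         clf.partial_fit(X[s], y[s], classes=all_classes)
#
# Only one chunk of X then needs to be in memory at a time.
slices = list(batch_slices(10, 4))
```

In current scikit-learn releases, partial_fit expects the full list of class labels through the classes argument on the first call, because a single chunk may not contain every class.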

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-23 Thread Dimitrios Pritsos
I am sending this again with the correct subject line; sorry about that. Hello, I am using sklearn in combination with PyTables for Automated Genre Identification of Web Pages. The reason I am using PyTables is to run a very large-scale evaluation of SVM using 50,00

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Dimitrios Pritsos
Hello, I am using sklearn in combination with PyTables for Automated Genre Identification of Web Pages. The reason I am using PyTables is to run a very large-scale evaluation of SVM using 50,000 samples for training. I know that this will probably not have much impact on my

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Mathieu Blondel
On Mon, Jan 23, 2012 at 6:06 PM, Andreas wrote: > It might be as easy as that. > I guess I should try to see if this speeds up things. If you use algorithm="brute", there should be no speed-up (it computes all the distances and finds those within the given radius...). If you use ball-tree, it sho

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Olivier Grisel
2012/1/23 Gael Varoquaux : > On Mon, Jan 23, 2012 at 09:46:41AM +0100, Olivier Grisel wrote: >> But ball tree does not work for high dim sparse data. > > In this case, I think that the LSH option is a good one. There is an LSH > in pybrain that can be adapted. Once we have random projections (or e

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Andreas
On 01/23/2012 08:52 AM, Alexandre Gramfort wrote: > I am not sure it is what you want but you could use: > > K = radius_neighbors_graph(X, radius, mode='distance') > K.data **= 2 > K.data *= -gamma > np.exp(K.data, out=K.data) > > no? > > Alex > It might be as easy as that. I guess I should try
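
For readers who want to see what the quoted four-line recipe computes, here is a brute-force pure-Python sketch: keep only the pairs of points within the radius and store exp(-gamma * d**2) for each. It deliberately replaces radius_neighbors_graph with a double loop, so it shows the arithmetic rather than the ball-tree speed-up under discussion. The function name truncated_rbf, the dict-of-pairs output, and leaving out the diagonal are assumptions for illustration.

```python
import math


def truncated_rbf(X, radius, gamma):
    """Return {(i, j): exp(-gamma * d_ij**2)} for i != j with d_ij <= radius.

    Pairs farther apart than `radius` are simply absent, which is the
    sparse "thresholded kernel" discussed in the thread.
    """
    kernel = {}
    for i, xi in enumerate(X):
        for j, xj in enumerate(X):
            if i == j:
                continue  # diagonal left out in this sketch
            sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
            if sq_dist <= radius ** 2:
                # Same steps as K.data **= 2; K.data *= -gamma; np.exp(...)
                kernel[(i, j)] = math.exp(-gamma * sq_dist)
    return kernel


points = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]]
K = truncated_rbf(points, radius=1.5, gamma=0.5)
```

Here only the first two points are within the radius of each other, so K holds two symmetric entries and the third point has no neighbours at all.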

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Gael Varoquaux
On Mon, Jan 23, 2012 at 09:46:41AM +0100, Olivier Grisel wrote: > But ball tree does not work for high dim sparse data. In this case, I think that the LSH option is a good one. There is an LSH in pybrain that can be adapted. Gael

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-23 Thread Olivier Grisel
2012/1/23 Alexandre Gramfort : > I am not sure it is what you want but you could use: > > K = radius_neighbors_graph(X, radius, mode='distance') > K.data **= 2 > K.data *= -gamma > np.exp(K.data, out=K.data) > > no? +1 for the dense case But ball tree does not work for high dim sparse data. We w