Re: [Scikit-learn-general] Problems trying to run k-means clustering

2012-05-24 Thread Gael Varoquaux
On Fri, May 25, 2012 at 12:14:33AM +0200, Andreas Mueller wrote: >It's good that you were able to work around the problem. >Could you still please open an issue on github and give a script >that reproduces the problem (non-deterministically)? That was useful: I could fix that bug in h

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Gael Varoquaux
On Fri, May 25, 2012 at 10:36:03AM +0900, Mathieu Blondel wrote: >+1 too for precomputing coef_ once for all in fit. +1. It seems to simplify everything for little drawback. We'll need to document it, of course. Gael

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Mathieu Blondel
+1 too for precomputing coef_ once for all in fit. If you do so, you may also drop support_vectors_ to make the pickled objects lighter (and keep support_indices_ only). Note that for predict, in the multiclass case, you will need to implement the voting scheme needed for one-vs-one classification
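A minimal sketch of the one-vs-one voting scheme mentioned above, in plain Python. The function name `ovo_vote`, the pair ordering (that of `itertools.combinations`), the sign convention (a positive value votes for the first class of the pair) and the tie-breaking rule are assumptions made for illustration; they are not taken from the scikit-learn or LibSVM sources.

```python
from itertools import combinations


def ovo_vote(pairwise_decisions, n_classes):
    """Pick a class by majority vote over pairwise decision values.

    pairwise_decisions[k] belongs to the k-th (i, j) pair, with pairs ordered
    as itertools.combinations(range(n_classes), 2). Assumed convention:
    a positive value is a vote for class i, anything else a vote for class j.
    """
    votes = [0] * n_classes
    pairs = combinations(range(n_classes), 2)
    for (i, j), value in zip(pairs, pairwise_decisions):
        votes[i if value > 0 else j] += 1
    # Ties go to the lowest class index (max returns the first maximum).
    return max(range(n_classes), key=lambda c: votes[c])


# Three classes give three pairwise classifiers: (0, 1), (0, 2), (1, 2).
winner = ovo_vote([-1.0, -1.0, 1.0], n_classes=3)
```

With these inputs class 1 collects two votes and class 2 one, so `winner` is 1.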

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread David Warde-Farley
On Thu, May 24, 2012 at 05:39:22PM -0400, Ian Goodfellow wrote: > On Thu, May 24, 2012 at 5:07 PM, David Warde-Farley > wrote: > > On Thu, May 24, 2012 at 04:22:30PM -0400, Ian Goodfellow wrote: > > > >> I think I need to introduce a dirty bit that determines whether coef_ > >> needs to be recompu

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Lars Buitinck
2012/5/25 David Warde-Farley : > It might still be nice to keep the property for the purpose of raising > an informative error message when people try to access it for nonlinear > SVMs. -1; IMHO, an AttributeError is informative enough, and this complicates the SVM classes further. -- Lars Buiti

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread David Warde-Farley
On Thu, May 24, 2012 at 05:35:27PM -0400, Ian Goodfellow wrote: > On Thu, May 24, 2012 at 5:09 PM, Gael Varoquaux > wrote: > > On Thu, May 24, 2012 at 05:07:59PM -0400, David Warde-Farley wrote: > >> An alternative might be to just compute it in fit() if kernel == 'linear', > >> and make the prope

Re: [Scikit-learn-general] Problems trying to run k-means clustering

2012-05-24 Thread Andreas Mueller
Hi Phani. It's good that you were able to work around the problem. Could you still please open an issue on github and give a script that reproduces the problem (non-deterministically)? That would help us fix the problem so that others won't have the same issue. Thanks, Andy On 05/24/2012 11:53

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Ian Goodfellow
On Thu, May 24, 2012 at 5:49 PM, Andreas Mueller wrote: > Hey Ian. > Sorry for being a bit absent from this discussion but I didn't think so > much mail > would accumulate over one afternoon not in the office. > > The ``coef_`` code was written quite recently by me. Before it was just > buggy. > I

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Lars Buitinck
2012/5/24 Andreas Mueller : > I am +1 on using the primal formulation for "predict" in the linear case > and I am +1 for computing "coef_" in "fit". It doesn't need to be a > property any more, then, right? Or was there some other magic attached? I just tried removing the property and there's no m

Re: [Scikit-learn-general] Problems trying to run k-means clustering

2012-05-24 Thread Phani Vadrevu
Hi Andy, I ran it a number of times. Every once in a while, it does finish the clustering successfully. But many times it results in the error that I have forwarded. Anyway, for my purposes, I found that removing the init='random' argument from the kmeans object instantiation, solves the prob

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Andreas Mueller
Hey Ian. Sorry for being a bit absent from this discussion but I didn't think so much mail would accumulate over one afternoon not in the office. The ``coef_`` code was written quite recently by me. Before it was just buggy. It was a bit tricky because of the weird format that LibSVM puts the a

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Andreas Mueller
On 05/24/2012 04:57 PM, David Warde-Farley wrote: > On 2012-05-24, at 10:35 AM, Mathieu Blondel wrote: > >> Correct. I guess we just assumed that people would use LinearSVC when using >> a linear kernel... > Unfortunately there's a very good reason not to: no native dense support in > liblinear.

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Ian Goodfellow
On Thu, May 24, 2012 at 5:07 PM, David Warde-Farley wrote: > On Thu, May 24, 2012 at 04:22:30PM -0400, Ian Goodfellow wrote: > >> I think I need to introduce a dirty bit that determines whether coef_ >> needs to be recomputed. It starts off as True, gets set to False >> whenever coef_ executes, an

Re: [Scikit-learn-general] Problems trying to run k-means clustering

2012-05-24 Thread Andreas Mueller
Hi Phani. Are you sure the behavior is non-deterministic? I am not sure what comes out of the vectorizer, but my guess would be that X is a sparse matrix, which KMeans doesn't handle. Could you check that, please? Cheers, Andy On 05/24/2012 06:19 PM, Phani Vadrevu wrote: Hi all, I am trying

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Ian Goodfellow
On Thu, May 24, 2012 at 5:09 PM, Gael Varoquaux wrote: > On Thu, May 24, 2012 at 05:07:59PM -0400, David Warde-Farley wrote: >> An alternative might be to just compute it in fit() if kernel == 'linear', >> and make the property function return the precomputed vector in that case. >> That probably

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Ian Goodfellow
On Thu, May 24, 2012 at 5:11 PM, Gael Varoquaux wrote: > On Thu, May 24, 2012 at 05:06:51PM -0400, Ian Goodfellow wrote: >> > It seems to me that a simple way to avoid the problem would be to do: > >> >    coef_ = self.coef_ > >> > outside any for loop. That way the cost of computing the coef_ is

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Gael Varoquaux
On Thu, May 24, 2012 at 05:06:51PM -0400, Ian Goodfellow wrote: > > It seems to me that a simple way to avoid the problem would be to do: > >    coef_ = self.coef_ > > outside any for loop. That way the cost of computing the coef_ is > > hopefully negligible. > > Do you think that this could wo

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Gael Varoquaux
On Thu, May 24, 2012 at 05:07:59PM -0400, David Warde-Farley wrote: > An alternative might be to just compute it in fit() if kernel == 'linear', > and make the property function return the precomputed vector in that case. > That probably minimizes the number of bugs introduced by either neglecting
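The alternative quoted above (compute the vector in fit() when kernel == 'linear' and let the property return the precomputed value) could look roughly like the following. `ToySVC` is a hypothetical stand-in, not the scikit-learn class: its `fit` takes dual coefficients and support vectors directly instead of training anything, and the error message is invented.

```python
class ToySVC:
    """Hypothetical stand-in for an SVM class; it does no actual training."""

    def __init__(self, kernel="linear"):
        self.kernel = kernel

    def fit(self, dual_coef, support_vectors):
        self.dual_coef_ = list(dual_coef)
        self.support_vectors_ = [list(sv) for sv in support_vectors]
        if self.kernel == "linear":
            # Precompute once: w[j] = sum_i dual_coef[i] * support_vectors[i][j]
            n_features = len(self.support_vectors_[0])
            self._coef = [
                sum(a * sv[j] for a, sv in zip(self.dual_coef_, self.support_vectors_))
                for j in range(n_features)
            ]
        return self

    @property
    def coef_(self):
        if self.kernel != "linear":
            # Assumed message, for illustration only.
            raise AttributeError("coef_ is only available with a linear kernel")
        return self._coef


linear = ToySVC(kernel="linear").fit([0.5, -0.25], [[2.0, 0.0], [0.0, 4.0]])
rbf = ToySVC(kernel="rbf").fit([0.5, -0.25], [[2.0, 0.0], [0.0, 4.0]])
```

Because the property raises AttributeError for other kernels, `hasattr(rbf, "coef_")` is False, while `linear.coef_` returns the vector stored by `fit` without recomputing it.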

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Ian Goodfellow
On Thu, May 24, 2012 at 4:27 PM, Gael Varoquaux wrote: > Thanks for your investigations. These are useful comments. > > On Thu, May 24, 2012 at 04:22:30PM -0400, Ian Goodfellow wrote: >> I think I need to introduce a dirty bit that determines whether coef_ >> needs to be recomputed. > > If possibl

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread David Warde-Farley
On Thu, May 24, 2012 at 04:22:30PM -0400, Ian Goodfellow wrote: > I think I need to introduce a dirty bit that determines whether coef_ > needs to be recomputed. It starts off as True, gets set to False > whenever coef_ executes, and gets set to True whenever self.fit is > called. Am I overlooking

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Gael Varoquaux
Thanks for your investigations. These are useful comments. On Thu, May 24, 2012 at 04:22:30PM -0400, Ian Goodfellow wrote: > I think I need to introduce a dirty bit that determines whether coef_ > needs to be recomputed. If possible I'd rather avoid that (can create bugs hard to find). It seems
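A small sketch of the suggestion to read the property once, outside any loop. `ToyModel` and its call counter are hypothetical and exist only to show that the recomputing property runs a single time per call to `decision_values`.

```python
class ToyModel:
    """Hypothetical model whose coef_ property recomputes on every access."""

    def __init__(self, dual_coef, support_vectors):
        self.dual_coef = dual_coef
        self.support_vectors = support_vectors
        self.n_coef_computations = 0  # counts how often coef_ is recomputed

    @property
    def coef_(self):
        self.n_coef_computations += 1
        n_features = len(self.support_vectors[0])
        return [
            sum(a * sv[j] for a, sv in zip(self.dual_coef, self.support_vectors))
            for j in range(n_features)
        ]

    def decision_values(self, X):
        coef = self.coef_  # read the property once, outside the loop
        return [sum(w * x for w, x in zip(coef, row)) for row in X]


model = ToyModel([1.0, -1.0], [[1.0, 0.0], [0.0, 1.0]])
values = model.decision_values([[2.0, 1.0], [0.0, 3.0], [1.0, 1.0]])
```

Three rows are scored, but `model.n_coef_computations` ends at 1. Writing `self.coef_` inside the loop would recompute the vector once per row.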

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Ian Goodfellow
OK. I'd like to do a pull request to implement the coef_-based predict function. Since coef_ is a property, it recomputes the coefficient vector every time it's accessed. This means if predict uses self.coef_, it won't be any faster. I think I need to introduce a dirty bit that determines whethe
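For reference, the dirty-bit idea discussed in this thread (the flag starts as True, is cleared when coef_ is computed, and is set again by fit) can be sketched as below. `CachedCoefModel` is a hypothetical toy class whose `fit` receives the dual coefficients and support vectors directly; it is not the code that was eventually merged.

```python
class CachedCoefModel:
    """Hypothetical toy class illustrating a dirty flag around coef_."""

    def __init__(self):
        self._coef_dirty = True   # starts off as True
        self._coef_cache = None
        self.n_recomputes = 0     # instrumentation for the example only

    def fit(self, dual_coef, support_vectors):
        self.dual_coef_ = dual_coef
        self.support_vectors_ = support_vectors
        self._coef_dirty = True   # any cached coef_ is now stale
        return self

    @property
    def coef_(self):
        if self._coef_dirty:
            n_features = len(self.support_vectors_[0])
            self._coef_cache = [
                sum(a * sv[j] for a, sv in zip(self.dual_coef_, self.support_vectors_))
                for j in range(n_features)
            ]
            self.n_recomputes += 1
            self._coef_dirty = False
        return self._coef_cache


m = CachedCoefModel().fit([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]])
first = m.coef_
second = m.coef_          # served from the cache
m.fit([3.0, 0.0], [[1.0, 1.0], [5.0, 5.0]])
third = m.coef_           # recomputed because fit set the flag again
```

The weak point raised in the replies is that every code path that changes the fitted state has to remember to set the flag, which is why computing coef_ directly in fit was preferred.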

Re: [Scikit-learn-general] question about sci kit

2012-05-24 Thread Olivier Grisel
The example labels in the training set you give form a multiclass training set (one and only one class per sample). Use a multi-label set of training labels to make the LabelBinarizer switch to the 1 hot encoding: In [69]: y_train = (['New York', 'London'], ['London']) In [70]: Y_indicator = LabelBinariz

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Olivier Grisel
2012/5/24 Ian Goodfellow : > A high-level question before I dive in: is libsvm meant to compute > coef_ for us or do we compute it ourselves? If this is a libsvm bug I > should be looking for a bug in libsvm. > > On Thu, May 24, 2012 at 11:47 AM, Alexandre Gramfort > wrote: >> here is where to loo

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Alexandre Gramfort
> A high-level question before I dive in: is libsvm meant to compute > coef_ for us or do we compute it ourselves? If this is a libsvm bug I > should be looking for a bug in libsvm. no we compute it ourselves in the property at the line I gave you: https://github.com/scikit-learn/scikit-learn/blo

[Scikit-learn-general] Problems trying to run k-means clustering

2012-05-24 Thread Phani Vadrevu
Hi all, I am trying to run some basic clustering code. vectorizer = CountVectorizer(preprocessor=preprocessor,token_pattern=u'/\w+/') # url_list is a list of strings X = vectorizer.fit_transform(url_list) print "feature extraction done in %f s"%(time() - t0) t0 = time() km = KMeans(init='rand

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Ian Goodfellow
A high-level question before I dive in: is libsvm meant to compute coef_ for us or do we compute it ourselves? If this is a libsvm bug I should be looking for a bug in libsvm. On Thu, May 24, 2012 at 11:47 AM, Alexandre Gramfort wrote: > here is where to look for the bug: > > https://github.com/s

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Alexandre Gramfort
here is where to look for the bug: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/base.py#L543 please send us a pull request if you find the problem. Alex On Thu, May 24, 2012 at 5:20 PM, Ian Goodfellow wrote: > OK, here's a final version of my script demonstrating that t

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread David Warde-Farley
On 2012-05-24, at 10:35 AM, Mathieu Blondel wrote: > Correct. I guess we just assumed that people would use LinearSVC when using a > linear kernel... Unfortunately there's a very good reason not to: no native dense support in liblinear. For large dense inputs, the memory overhead of LinearSVC

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Ian Goodfellow
OK, here's a final version of my script demonstrating that there's a bug somewhere in the computation of coef_. If I compute coef_ myself from dual_coef_ and support_vectors_ I am able to match the predict function with the dot product method, and the new coef_ has a dot product of about -1.5 with

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Ian Goodfellow
On Thu, May 24, 2012 at 10:55 AM, Alexandre Gramfort wrote: > is this test buggy: Yes. The test passes for me, but if I replace X and Y from the binary section of that test with the X and y from my script, the test fails. > > https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/t

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Alexandre Gramfort
is this test buggy: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/tests/test_svm.py#L264 ? could it be a numerical error? Alex On Thu, May 24, 2012 at 4:50 PM, Ian Goodfellow wrote: > On Thu, May 24, 2012 at 10:47 AM, Olivier Grisel > wrote: >> 2012/5/24 Ian Goodfellow

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Alexandre Gramfort
> np.dot(X, clf.coef_) - clf.intercept > 0 (if I remember correctly the > sign of the intercept) from what I remember clf.intercept is set to -rho so it should be + … Alex

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Ian Goodfellow
On Thu, May 24, 2012 at 10:47 AM, Olivier Grisel wrote: > 2012/5/24 Ian Goodfellow : >> Well that's the thing, coef_ and intercept_ seem wrong, given the >> results of my script below. My implementation of predict based on >> coef_ only agrees with predict 50% of the time. >> Does anyone know if c

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Olivier Grisel
2012/5/24 Ian Goodfellow : > Well that's the thing, coef_ and intercept_ seem wrong, given the > results of my script below. My implementation of predict based on > coef_ only agrees with predict 50% of the time. > Does anyone know if coef_ and intercept_ are just getting set wrong? > Or does predi

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Mathieu Blondel
On Thu, May 24, 2012 at 11:39 PM, Ian Goodfellow wrote: > Well that's the thing, coef_ and intercept_ seem wrong, given the > results of my script below. My implementation of predict based on > coef_ only agrees with predict 50% of the time. > Does anyone know if coef_ and intercept_ are just gett

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Ian Goodfellow
On Thu, May 24, 2012 at 10:35 AM, Mathieu Blondel wrote: > > On Thu, May 24, 2012 at 11:27 PM, Ian Goodfellow > wrote: >> >> I think I've figured out what the problem is, but someone familiar >> with the code should confirm. >> I think SVC is always using a decision function based on support >> v

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Ian Goodfellow
Well that's the thing, coef_ and intercept_ seem wrong, given the results of my script below. My implementation of predict based on coef_ only agrees with predict 50% of the time. Does anyone know if coef_ and intercept_ are just getting set wrong? Or does predict implement a different decision fun

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Mathieu Blondel
On Thu, May 24, 2012 at 11:27 PM, Ian Goodfellow wrote: > I think I've figured out what the problem is, but someone familiar > with the code should confirm. > I think SVC is always using a decision function based on support > vectors, even though in the case of a linear kernel it is > computationa

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread David Warde-Farley
On 2012-05-24, at 10:27 AM, Ian Goodfellow wrote: > I think I've figured out what the problem is, but someone familiar > with the code should confirm. > I think SVC is always using a decision function based on support > vectors, even though in the case of a linear kernel it is > computationally c

Re: [Scikit-learn-general] liblinear now supports linear SVR

2012-05-24 Thread Alexandre Gramfort
great news ! we just need a volunteer now :) Alex On Thu, May 24, 2012 at 4:20 PM, Mathieu Blondel wrote: > Hello, > > I just found out that liblinear now supports large-scale linear SVR. > > http://www.csie.ntu.edu.tw/~cjlin/liblinear/ > > We could upgrade liblinear and add a new class LinearSV

Re: [Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Ian Goodfellow
I think I've figured out what the problem is, but someone familiar with the code should confirm. I think SVC is always using a decision function based on support vectors, even though in the case of a linear kernel it is computationally cheaper to just do one dot product in feature space. I determi
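To make the point about the linear kernel concrete: with a linear kernel the support-vector (dual) form of the decision function collapses to a single dot product with a weight vector. The sketch below uses plain Python lists and invented numbers; the function names and the convention that the decision value is w·x + b are assumptions for illustration.

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def decision_dual(x, dual_coef, support_vectors, intercept):
    """Dual form: one dot product per support vector."""
    return sum(a * dot(sv, x) for a, sv in zip(dual_coef, support_vectors)) + intercept


def primal_weights(dual_coef, support_vectors):
    """Collapse the support vectors into one weight vector w."""
    n_features = len(support_vectors[0])
    return [
        sum(a * sv[j] for a, sv in zip(dual_coef, support_vectors))
        for j in range(n_features)
    ]


def decision_primal(x, w, intercept):
    """Primal form: a single dot product, whatever the number of support vectors."""
    return dot(w, x) + intercept


dual_coef = [0.5, -0.25, 1.0]                    # invented values
support_vectors = [[1.0, 2.0], [4.0, 0.0], [0.0, -2.0]]
intercept = 0.5
x = [2.0, 3.0]

w = primal_weights(dual_coef, support_vectors)
d_dual = decision_dual(x, dual_coef, support_vectors, intercept)
d_primal = decision_primal(x, w, intercept)
```

Both forms give the same value for this input. The dual form costs one dot product per support vector for every sample, while the primal form costs one dot product per sample once `w` has been computed.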

Re: [Scikit-learn-general] liblinear now supports linear SVR

2012-05-24 Thread Peter Prettenhofer
2012/5/24 Mathieu Blondel : > Hello, > > I just found out that liblinear now supports large-scale linear SVR. > > http://www.csie.ntu.edu.tw/~cjlin/liblinear/ > > We could upgrade liblinear and add a new class LinearSVR to scikit-learn. We definitely should! > > Mathieu > > --

[Scikit-learn-general] liblinear now supports linear SVR

2012-05-24 Thread Mathieu Blondel
Hello, I just found out that liblinear now supports large-scale linear SVR. http://www.csie.ntu.edu.tw/~cjlin/liblinear/ We could upgrade liblinear and add a new class LinearSVR to scikit-learn. Mathieu

[Scikit-learn-general] SVC.predict slow and implements wrong function

2012-05-24 Thread Ian Goodfellow
I've noticed that on large datasets, it takes several minutes for SVC to classify the dataset, when it should take under a second. To debug this, I made a small test script where I time each step. I found that the predict function for the linear kernel, is not doing at all what I thought it would,