On Fri, May 25, 2012 at 12:14:33AM +0200, Andreas Mueller wrote:
>It's good that you were able to work around the problem.
>Could you still please open an issue on github and give a script
>that reproduces the problem (non-deterministically)?
That was useful: I could fix that bug in
h
On Fri, May 25, 2012 at 10:36:03AM +0900, Mathieu Blondel wrote:
>+1 too for precomputing coef_ once and for all in fit.
+1. It seems to simplify everything for little drawback. We'll need to
document it, of course.
Gael
---
+1 too for precomputing coef_ once and for all in fit. If you do so, you may
also drop support_vectors_ to make the pickled objects lighter (and keep
support_indices_ only).
Note that for predict, in the multiclass case, you will need to implement
the voting scheme required for one-vs-one classification.
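A rough sketch of that one-vs-one voting step, assuming the pairwise decision values come as one column per class pair (i, j) with i < j, in itertools.combinations order, and that a positive value is a vote for i (the libsvm convention as I understand it; worth checking against predict). ovo_vote is a hypothetical helper name, not an existing scikit-learn function:

```python
import itertools

import numpy as np


def ovo_vote(decision, n_classes):
    """Pick a class per sample by majority vote over pairwise classifiers.

    decision: array of shape (n_samples, n_classes * (n_classes - 1) // 2),
    one column per pair (i, j), i < j, in itertools.combinations order.
    Assumed convention: decision > 0 is a vote for i, otherwise for j.
    """
    decision = np.asarray(decision, dtype=float)
    votes = np.zeros((decision.shape[0], n_classes), dtype=int)
    for col, (i, j) in enumerate(itertools.combinations(range(n_classes), 2)):
        positive = decision[:, col] > 0
        votes[positive, i] += 1
        votes[~positive, j] += 1
    # argmax breaks ties in favour of the lowest class index.
    return votes.argmax(axis=1)
```

With 3 classes the columns are (0, 1), (0, 2), (1, 2), so a row like [1.0, 1.0, -0.5] gives votes [2, 0, 1] and predicts class 0.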
On Thu, May 24, 2012 at 05:39:22PM -0400, Ian Goodfellow wrote:
> On Thu, May 24, 2012 at 5:07 PM, David Warde-Farley
> wrote:
> > On Thu, May 24, 2012 at 04:22:30PM -0400, Ian Goodfellow wrote:
> >
> >> I think I need to introduce a dirty bit that determines whether coef_
> >> needs to be recompu
2012/5/25 David Warde-Farley :
> It might still be nice to keep the property for the purpose of raising
> an informative error message when people try to access it for nonlinear
> SVMs.
-1; IMHO, an AttributeError is informative enough, and this
complicates the SVM classes further.
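To illustrate the plain-attribute design being +1'd here: if fit() sets coef_ only for the linear kernel, a nonlinear model simply has no such attribute and Python raises AttributeError on its own, with no property needed. ToyKernelSVC is a hypothetical stand-in; a real fit would solve the dual problem instead of receiving its solution as arguments:

```python
import numpy as np


class ToyKernelSVC:
    """Hypothetical stand-in for SVC showing only where coef_ gets set."""

    def __init__(self, kernel="linear"):
        self.kernel = kernel

    def fit(self, dual_coef, support_vectors, intercept):
        # Assumption: the solver output is passed in directly for illustration.
        self.dual_coef_ = np.atleast_2d(np.asarray(dual_coef, dtype=float))
        self.support_vectors_ = np.asarray(support_vectors, dtype=float)
        self.intercept_ = np.atleast_1d(np.asarray(intercept, dtype=float))
        if self.kernel == "linear":
            # Computed once here; predict can reuse it without recomputing.
            self.coef_ = self.dual_coef_ @ self.support_vectors_
        return self
```

For kernel="rbf", accessing clf.coef_ fails with a plain AttributeError, which is the behaviour argued for above.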
--
Lars Buiti
On Thu, May 24, 2012 at 05:35:27PM -0400, Ian Goodfellow wrote:
> On Thu, May 24, 2012 at 5:09 PM, Gael Varoquaux
> wrote:
> > On Thu, May 24, 2012 at 05:07:59PM -0400, David Warde-Farley wrote:
> >> An alternative might be to just compute it in fit() if kernel == 'linear',
> >> and make the prope
Hi Phani.
It's good that you were able to work around the problem.
Could you still please open an issue on github and give a script
that reproduces the problem (non-deterministically)?
That would help us fix the problem so that others won't have the same
issue.
Thanks,
Andy
On 05/24/2012 11:53
On Thu, May 24, 2012 at 5:49 PM, Andreas Mueller
wrote:
> Hey Ian.
> Sorry for being a bit absent from this discussion but I didn't think so
> much mail
> would accumulate over one afternoon not in the office.
>
> The ``coef_`` code was written quite recently by me. Before it was just
> buggy.
> I
2012/5/24 Andreas Mueller :
> I am +1 on using the primal formulation for "predict" in the linear case
> and I am +1 for computing "coef_" in "fit". It doesn't need to be a
> property any more, then, right? Or was there some other magic attached?
I just tried removing the property and there's no m
Hi Andy,
I ran it a number of times. Every once in a while, it does finish the
clustering successfully. But many times it results in the error that I have
forwarded. Anyway, for my purposes, I found that removing the init='random'
argument from the kmeans object instantiation, solves the prob
Hey Ian.
Sorry for being a bit absent from this discussion, but I didn't think so
much mail would accumulate over one afternoon out of the office.
The ``coef_`` code was written quite recently by me. Before that, it was
just buggy.
It was a bit tricky because of the weird format that LibSVM puts the
a
On 05/24/2012 04:57 PM, David Warde-Farley wrote:
> On 2012-05-24, at 10:35 AM, Mathieu Blondel wrote:
>
>> Correct. I guess we just assumed that people would use LinearSVC when using
>> a linear kernel...
> Unfortunately there's a very good reason not to: no native dense support in
> liblinear.
On Thu, May 24, 2012 at 5:07 PM, David Warde-Farley
wrote:
> On Thu, May 24, 2012 at 04:22:30PM -0400, Ian Goodfellow wrote:
>
>> I think I need to introduce a dirty bit that determines whether coef_
>> needs to be recomputed. It starts off as True, gets set to False
>> whenever coef_ executes, an
Hi Phani.
Are you sure the behavior is non-deterministic?
I am not sure what comes out of the vectorizer,
but my guess would be that X is a sparse matrix, which
KMeans doesn't handle.
Could you check that, please?
Cheers,
Andy
On 05/24/2012 06:19 PM, Phani Vadrevu wrote:
Hi all,
I am trying
On Thu, May 24, 2012 at 5:09 PM, Gael Varoquaux
wrote:
> On Thu, May 24, 2012 at 05:07:59PM -0400, David Warde-Farley wrote:
>> An alternative might be to just compute it in fit() if kernel == 'linear',
>> and make the property function return the precomputed vector in that case.
>> That probably
On Thu, May 24, 2012 at 5:11 PM, Gael Varoquaux
wrote:
> On Thu, May 24, 2012 at 05:06:51PM -0400, Ian Goodfellow wrote:
>> > It seems to me that a simple way to avoid the problem would be to do:
>
>> > coef_ = self.coef_
>
>> > outside any for loop. That way the cost of computing the coef_ is
On Thu, May 24, 2012 at 05:06:51PM -0400, Ian Goodfellow wrote:
> > It seems to me that a simple way to avoid the problem would be to do:
> > coef_ = self.coef_
> > outside any for loop. That way the cost of computing the coef_ is
> > hopefully negligible.
> > Do you think that this could wo
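A small sketch of why hoisting the property access matters: when coef_ is a property that rebuilds the vector on every access, reading self.coef_ inside a per-sample loop recomputes it once per row, while binding it to a local name first computes it once. RecomputingModel and its counter are hypothetical, for illustration only:

```python
import numpy as np


class RecomputingModel:
    """Hypothetical model whose coef_ property is rebuilt on every access."""

    def __init__(self, dual_coef, support_vectors):
        self.dual_coef_ = np.atleast_2d(np.asarray(dual_coef, dtype=float))
        self.support_vectors_ = np.asarray(support_vectors, dtype=float)
        self.n_coef_computations = 0  # counts how often coef_ is rebuilt

    @property
    def coef_(self):
        self.n_coef_computations += 1
        return self.dual_coef_ @ self.support_vectors_


def decide_slow(model, X):
    # Reads the property once per row, so coef_ is recomputed per sample.
    return [float(model.coef_[0] @ x) for x in np.asarray(X, dtype=float)]


def decide_hoisted(model, X):
    coef_ = model.coef_  # computed once, outside the loop
    return [float(coef_[0] @ x) for x in np.asarray(X, dtype=float)]
```

Both functions return the same decision values; only the number of recomputations differs.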
On Thu, May 24, 2012 at 05:07:59PM -0400, David Warde-Farley wrote:
> An alternative might be to just compute it in fit() if kernel == 'linear',
> and make the property function return the precomputed vector in that case.
> That probably minimizes the number of bugs introduced by either neglecting
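A sketch of this alternative, under the assumption that fit() stores the precomputed vector in a private attribute and the property only returns it. Raising AttributeError (rather than some other exception) keeps hasattr() working and lets the message say why coef_ is missing. PrecomputedCoefSVC and _coef are hypothetical names:

```python
import numpy as np


class PrecomputedCoefSVC:
    """Hypothetical sketch: coef_ computed in fit(), exposed via a property."""

    def __init__(self, kernel="linear"):
        self.kernel = kernel
        self._coef = None

    def fit(self, dual_coef, support_vectors):
        # Assumption: solver output passed in directly for illustration.
        self.dual_coef_ = np.atleast_2d(np.asarray(dual_coef, dtype=float))
        self.support_vectors_ = np.asarray(support_vectors, dtype=float)
        self._coef = None
        if self.kernel == "linear":
            self._coef = self.dual_coef_ @ self.support_vectors_
        return self

    @property
    def coef_(self):
        if self._coef is None:
            raise AttributeError(
                "coef_ is only available after fit() with kernel='linear'")
        return self._coef
```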
On Thu, May 24, 2012 at 4:27 PM, Gael Varoquaux
wrote:
> Thanks for your investigations. These are useful comments.
>
> On Thu, May 24, 2012 at 04:22:30PM -0400, Ian Goodfellow wrote:
>> I think I need to introduce a dirty bit that determines whether coef_
>> needs to be recomputed.
>
> If possibl
On Thu, May 24, 2012 at 04:22:30PM -0400, Ian Goodfellow wrote:
> I think I need to introduce a dirty bit that determines whether coef_
> needs to be recomputed. It starts off as True, gets set to False
> whenever coef_ executes, and gets set to True whenever self.fit is
> called. Am I overlooking
Thanks for your investigations. These are useful comments.
On Thu, May 24, 2012 at 04:22:30PM -0400, Ian Goodfellow wrote:
> I think I need to introduce a dirty bit that determines whether coef_
> needs to be recomputed.
If possible I'd rather avoid that (it can create hard-to-find bugs).
It seems
OK.
I'd like to do a pull request to implement the coef_-based predict function.
Since coef_ is a property, it recomputes the coefficient vector every
time it's accessed. This means if predict uses self.coef_, it won't be
any faster.
I think I need to introduce a dirty bit that determines whethe
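A minimal sketch of the dirty-bit idea as described (starts True, cleared when coef_ is computed, set again by fit). The caveat raised in the thread applies: anything that changes dual_coef_ or support_vectors_ without going through fit() leaves a stale cache, which is the hard-to-find bug risk. DirtyBitSVC and its attribute names are hypothetical:

```python
import numpy as np


class DirtyBitSVC:
    """Hypothetical sketch: cache coef_ until fit() is called again."""

    def __init__(self):
        self._coef_dirty = True  # nothing cached yet
        self._coef_cache = None
        self.n_coef_computations = 0

    def fit(self, dual_coef, support_vectors):
        # Assumption: solver output passed in directly for illustration.
        self.dual_coef_ = np.atleast_2d(np.asarray(dual_coef, dtype=float))
        self.support_vectors_ = np.asarray(support_vectors, dtype=float)
        self._coef_dirty = True  # any cached coef_ is now stale
        return self

    @property
    def coef_(self):
        if self._coef_dirty:
            self._coef_cache = self.dual_coef_ @ self.support_vectors_
            self.n_coef_computations += 1
            self._coef_dirty = False
        return self._coef_cache
```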
The example labels in the training set you give are a multiclass training set
(one and only one class per sample). Use a multi-label set of training
labels to make the LabelBinarizer switch to the one-hot encoding:
In [69]: y_train = (['New York', 'London'], ['London'])
In [70]: Y_indicator = LabelBinariz
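For reference, this is the kind of indicator matrix expected for that y_train, written out by hand in plain Python rather than through LabelBinarizer (later scikit-learn releases provide MultiLabelBinarizer for this). multilabel_indicator is a hypothetical helper; classes are assumed to be sorted, one column each:

```python
def multilabel_indicator(label_sets):
    """Return (classes, rows): sorted class names and a 0/1 row per sample."""
    classes = sorted({label for labels in label_sets for label in labels})
    index = {label: k for k, label in enumerate(classes)}
    rows = []
    for labels in label_sets:
        row = [0] * len(classes)
        for label in labels:
            row[index[label]] = 1  # one column per class present in the sample
        rows.append(row)
    return classes, rows
```

For (['New York', 'London'], ['London']) this gives classes ['London', 'New York'] and rows [[1, 1], [1, 0]].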
2012/5/24 Ian Goodfellow :
> A high-level question before I dive in: is libsvm meant to compute
> coef_ for us or do we compute it ourselves? If this is a libsvm bug I
> should be looking for a bug in libsvm.
>
> On Thu, May 24, 2012 at 11:47 AM, Alexandre Gramfort
> wrote:
>> here is where to loo
> A high-level question before I dive in: is libsvm meant to compute
> coef_ for us or do we compute it ourselves? If this is a libsvm bug I
> should be looking for a bug in libsvm.
No, we compute it ourselves in the property at the line I gave you:
https://github.com/scikit-learn/scikit-learn/blo
Hi all,
I am trying to run some basic clustering code.
vectorizer = CountVectorizer(preprocessor=preprocessor, token_pattern=u'/\w+/')
# url_list is a list of strings
X = vectorizer.fit_transform(url_list)
print "feature extraction done in %f s"%(time() - t0)
t0 = time()
km = KMeans(init='rand
A high-level question before I dive in: is libsvm meant to compute
coef_ for us or do we compute it ourselves? If this is a libsvm bug I
should be looking for a bug in libsvm.
On Thu, May 24, 2012 at 11:47 AM, Alexandre Gramfort
wrote:
> here is where to look for the bug:
>
> https://github.com/s
here is where to look for the bug:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/base.py#L543
please send us a pull request if you find the problem.
Alex
On Thu, May 24, 2012 at 5:20 PM, Ian Goodfellow
wrote:
> OK, here's a final version of my script demonstrating that t
On 2012-05-24, at 10:35 AM, Mathieu Blondel wrote:
> Correct. I guess we just assumed that people would use LinearSVC when using a
> linear kernel...
Unfortunately there's a very good reason not to: no native dense support in
liblinear. For large dense inputs, the memory overhead of LinearSVC
OK, here's a final version of my script demonstrating that there's a
bug somewhere in the computation of coef_. If I compute coef_ myself
from dual_coef_ and support_vectors_ I am able to match the predict
function with the dot product method, and the new coef_ has a dot
product of about -1.5 with
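A sketch of computing coef_ by hand from dual_coef_ and support_vectors_ in the binary, linear-kernel case, assuming dual_coef_ holds the products y_i * alpha_i (one row, one entry per support vector). It also checks that the kernel expansion and the single dot product give the same decision values. The helper names and toy numbers are illustrative only:

```python
import numpy as np


def primal_coef(dual_coef, support_vectors):
    """w = sum_i dual_coef[i] * support_vectors[i]; shape (1, n_features)."""
    return np.asarray(dual_coef, dtype=float) @ np.asarray(support_vectors, dtype=float)


def dual_decision(X, dual_coef, support_vectors, intercept):
    # One linear-kernel evaluation <x_i, x> per support vector and sample.
    K = np.asarray(X, dtype=float) @ np.asarray(support_vectors, dtype=float).T
    return K @ np.asarray(dual_coef, dtype=float).ravel() + intercept


def primal_decision(X, coef, intercept):
    # A single dot product per sample once w is known.
    return np.asarray(X, dtype=float) @ np.ravel(coef) + intercept
```

With dual_coef [[0.5, -0.25, -0.25]] and support vectors [[1, 2], [0, 1], [3, 0]], w comes out as [[-0.25, 0.75]], and both decision functions agree on any X.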
On Thu, May 24, 2012 at 10:55 AM, Alexandre Gramfort
wrote:
> is this test buggy:
Yes. The test passes for me, but if I replace X and Y from the binary
section of that test with the X and y from my script, the test fails.
>
> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/t
is this test buggy:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/tests/test_svm.py#L264
?
could it be a numerical error?
Alex
On Thu, May 24, 2012 at 4:50 PM, Ian Goodfellow
wrote:
> On Thu, May 24, 2012 at 10:47 AM, Olivier Grisel
> wrote:
>> 2012/5/24 Ian Goodfellow
> np.dot(X, clf.coef_) - clf.intercept > 0 (if I remember correctly the
> sign of the intercept)
From what I remember clf.intercept is set to -rho, so it should be + …
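A sketch of the primal-form predict with the intercept added, as suggested above, assuming intercept_ = -rho. Which class a positive decision value maps to is an assumption here (classes[1] below) and should be checked against predict on a few samples, since a sign flip there would give exactly the kind of disagreement being discussed. predict_binary is a hypothetical helper:

```python
import numpy as np


def predict_binary(X, coef, intercept, classes):
    """Primal-form binary prediction: sign of X . w + intercept.

    Assumes intercept = -rho, so it is added, not subtracted.
    Assumed mapping: decision > 0 -> classes[1], otherwise classes[0].
    """
    decision = np.asarray(X, dtype=float) @ np.ravel(coef) + intercept
    return np.where(decision > 0, classes[1], classes[0])
```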
Alex
--
Live Security Virtual Conference
Exclusive li
On Thu, May 24, 2012 at 10:47 AM, Olivier Grisel
wrote:
> 2012/5/24 Ian Goodfellow :
>> Well that's the thing, coef_ and intercept_ seem wrong, given the
>> results of my script below. My implementation of predict based on
>> coef_ only agrees with predict 50% of the time.
>> Does anyone know if c
2012/5/24 Ian Goodfellow :
> Well that's the thing, coef_ and intercept_ seem wrong, given the
> results of my script below. My implementation of predict based on
> coef_ only agrees with predict 50% of the time.
> Does anyone know if coef_ and intercept_ are just getting set wrong?
> Or does predi
On Thu, May 24, 2012 at 11:39 PM, Ian Goodfellow
wrote:
> Well that's the thing, coef_ and intercept_ seem wrong, given the
> results of my script below. My implementation of predict based on
> coef_ only agrees with predict 50% of the time.
> Does anyone know if coef_ and intercept_ are just gett
On Thu, May 24, 2012 at 10:35 AM, Mathieu Blondel wrote:
>
> On Thu, May 24, 2012 at 11:27 PM, Ian Goodfellow
> wrote:
>>
>> I think I've figured out what the problem is, but someone familiar
>> with the code should confirm.
>> I think SVC is always using a decision function based on support
>> v
Well that's the thing, coef_ and intercept_ seem wrong, given the
results of my script below. My implementation of predict based on
coef_ only agrees with predict 50% of the time.
Does anyone know if coef_ and intercept_ are just getting set wrong?
Or does predict implement a different decision fun
On Thu, May 24, 2012 at 11:27 PM, Ian Goodfellow
wrote:
> I think I've figured out what the problem is, but someone familiar
> with the code should confirm.
> I think SVC is always using a decision function based on support
> vectors, even though in the case of a linear kernel it is
> computationa
On 2012-05-24, at 10:27 AM, Ian Goodfellow wrote:
> I think I've figured out what the problem is, but someone familiar
> with the code should confirm.
> I think SVC is always using a decision function based on support
> vectors, even though in the case of a linear kernel it is
> computationally c
Great news! We just need a volunteer now :)
Alex
On Thu, May 24, 2012 at 4:20 PM, Mathieu Blondel wrote:
> Hello,
>
> I just found out that liblinear now supports large-scale linear SVR.
>
> http://www.csie.ntu.edu.tw/~cjlin/liblinear/
>
> We could upgrade liblinear and add a new class LinearSV
I think I've figured out what the problem is, but someone familiar
with the code should confirm.
I think SVC is always using a decision function based on support
vectors, even though in the case of a linear kernel it is
computationally cheaper to just do one dot product in feature space.
I determi
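The reasoning behind the "one dot product" claim, written out for the binary case with a linear kernel and libsvm's decision function (alpha_i the dual variables, y_i the labels, rho the offset, so intercept_ = -rho):

$$
f(x) = \sum_{i \in \mathrm{SV}} \alpha_i y_i \langle x_i, x \rangle - \rho
     = \Big\langle \sum_{i \in \mathrm{SV}} \alpha_i y_i x_i ,\; x \Big\rangle - \rho
     = \langle w, x \rangle - \rho,
\qquad w = \sum_{i \in \mathrm{SV}} \alpha_i y_i x_i
$$

Evaluated in the dual form, each sample costs on the order of n_SV * n_features operations; once w is stored, it costs n_features.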
2012/5/24 Mathieu Blondel :
> Hello,
>
> I just found out that liblinear now supports large-scale linear SVR.
>
> http://www.csie.ntu.edu.tw/~cjlin/liblinear/
>
> We could upgrade liblinear and add a new class LinearSVR to scikit-learn.
We definitely should!
>
> Mathieu
>
> --
Hello,
I just found out that liblinear now supports large-scale linear SVR.
http://www.csie.ntu.edu.tw/~cjlin/liblinear/
We could upgrade liblinear and add a new class LinearSVR to scikit-learn.
Mathieu
I've noticed that on large datasets, it takes several minutes for SVC
to classify the dataset, when it should take under a second.
To debug this, I made a small test script where I time each step.
I found that the predict function for the linear kernel is not doing
at all what I thought it would,