The missing 2 in tokenizing 2.50 is indeed a bit weird, though.
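You can see it with the tokenizing regexp alone, outside of the vectorizers
(a quick sketch with plain re, nothing sklearn-specific):

import re

# Default sklearn token_pattern: runs of two or more word characters.
pattern = re.compile(r"(?u)\b\w\w+\b")

print(pattern.findall("2.50"))   # ['50']  -- the lone '2' never matches \w\w+
print(pattern.findall("12.50"))  # ['12', '50']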
Tom Fawcett wrote:
>First, thanks for all your great work on scikits.learn! It’s making my
>life easier.
>
>Second, I found surprising behavior in sklearn.feature_extraction.text.
>I’m using TfidfVectorizer and CountVectorizer to process news stories.
For the missing 'r' in the docs: it looks like a Sphinx glitch to me, and I
have not found a way to fix it. For the tokenization: the sklearn regexp seems
like a sensible default to me. What would you change it to so as to still be
robust?
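For concreteness, one naive tweak would be to keep decimal numbers together
(a hypothetical token_pattern, only checked on the toy case below):

from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical alternative: match a decimal number as one token, else
# fall back to the default "two or more word characters" rule.
vect = CountVectorizer(token_pattern=r"(?u)\b\d+(?:\.\d+)?\b|\b\w\w+\b")
vect.fit(["The book costs 2.50 today"])

# get_feature_names() on older scikit-learn versions
print(list(vect.get_feature_names_out()))
# ['2.50', 'book', 'costs', 'the', 'today']

But I would worry about what such a pattern does to dates, version numbers,
IP addresses and the like, hence the question.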
Tom Fawcett wrote:
>First, thanks for all your great work on scikits.learn! It’s making my
>life easier.
On Sun, Feb 24, 2013 at 04:32:05PM -0500, Ronnie Ghose wrote:
> On Thu, Jan 24, 2013 at 11:50 AM, Flavio Vinicius wrote:
> > I think you can only guarantee that R2 is always positive when
> > performing linear regression with no constraints.
I believe that the linear regression should be unregularized and fit with an
intercept for that guarantee to hold, and even then it only holds on the data
the model was fit on.
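A toy sketch of where the guarantee breaks down (random data, so there is
nothing real to learn):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.RandomState(0)
X_train, y_train = rng.randn(50, 5), rng.randn(50)  # pure noise
X_test, y_test = rng.randn(50, 5), rng.randn(50)

model = LinearRegression().fit(X_train, y_train)

# On the training data, OLS with an intercept can never do worse than
# predicting the mean, so R2 >= 0 here ...
print(r2_score(y_train, model.predict(X_train)))

# ... but on unseen data there is no such guarantee: typically < 0 here.
print(r2_score(y_test, model.predict(X_test)))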
I had a similar question a month or two ago, so I think it's relevant here.
It's still sort of a surprise to me too.
On Thu, Jan 24, 2013 at 11:50 AM, Flavio Vinicius wrote:
> I think you can only guarantee that R2 is always positive when
> performing linear regression with no constraints.
On Fri, Feb 22, 2013 at 01:39:07PM -0500, Steven Greening wrote:
> I tried to use r2_score to calculate the coefficient of determination
> for a multiple regression problem and found that it is producing
> negative values.
That shouldn't be a surprise: an r2_score of 0 is chance. What you are
finding is that your models predict worse than chance, i.e. worse than always
predicting the mean of the target.
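For instance:

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])

# Always predicting the mean gives exactly R2 = 0, the chance level ...
print(r2_score(y_true, np.full_like(y_true, y_true.mean())))  # 0.0

# ... and predictions systematically worse than the mean go negative.
print(r2_score(y_true, np.array([4.0, 3.0, 2.0, 1.0])))  # -3.0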
Hello all,
I tried to use r2_score to calculate the coefficient of determination
for a multiple regression problem and found that it is producing
negative values. Specifically, I'm using the r2_score function with the
permutation_test_score function, and a large majority of the r2 values
from the permutations are negative.
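For concreteness, a stripped-down version of the setup (toy data here, not my
actual pipeline) would look something like this:

import numpy as np
from sklearn.linear_model import LinearRegression
# sklearn.cross_validation in older scikit-learn versions
from sklearn.model_selection import permutation_test_score

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = X.dot(np.array([1.0, 2.0, 3.0])) + rng.randn(100)

score, perm_scores, pvalue = permutation_test_score(
    LinearRegression(), X, y, scoring="r2", n_permutations=100,
    random_state=0)

print(score)                     # R2 of the unpermuted fit
print((perm_scores < 0).mean())  # most permuted-label scores come out negative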
I think it should be kept as it is. IMHO, it's that way in case you have
something irregular such as "the.cat.in.the.hat.23.45.6632". I'm assuming the
$ is treated as just another punctuation sign, i.e. no special treatment for
the pound / euro / yen / etc. signs. I think those should be kept as plain
punctuation too.
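You can check quickly what the default pattern does to those:

import re

pattern = re.compile(r"(?u)\b\w\w+\b")

print(pattern.findall("the.cat.in.the.hat.23.45.6632"))
# ['the', 'cat', 'in', 'the', 'hat', '23', '45', '6632']

print(pattern.findall("$2.50"))
# ['50']  -- the '$' and the lone '2' both disappear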
First, thanks for all your great work on scikits.learn! It’s making my life
easier.
Second, I found surprising behavior in sklearn.feature_extraction.text. I’m
using TfidfVectorizer and CountVectorizer to process news stories. The default
tokenizer uses the regular expression '(?u)\b\w\w+\b', which drops the 2 when
tokenizing 2.50: single-character tokens never match, and the period is
treated as a separator.
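For example, on a made-up news-style sentence the behavior looks like this:

from sklearn.feature_extraction.text import CountVectorizer

vect = CountVectorizer()  # default token_pattern=r"(?u)\b\w\w+\b"
vect.fit(["Shares rose 5% to $2.50 in early trading"])

# get_feature_names() on older scikit-learn versions
print(list(vect.get_feature_names_out()))
# ['50', 'early', 'in', 'rose', 'shares', 'to', 'trading']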