Re: [Scikit-learn-general] Logistic regression coefficients analysis

2014-02-20 Thread Paolo Di Prodi
What about using a distance metric like this one? http://en.wikipedia.org/wiki/Normalized_Google_distance

From: Joel Nothman [joel.noth...@gmail.com]
Sent: 19 February 2014 22:50
To: scikit-learn-general
Subject: Re: [Scikit-learn-general] Logistic
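
For reference, the normalized Google distance on the linked page is computed from hit counts: the counts for each term alone, the count for both terms together, and the total number of indexed pages. Below is a minimal sketch of that formula in plain Python; the function name and the example counts are hypothetical, and how the counts would be obtained for this classification task is left open.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """NGD from hit counts.

    fx, fy: number of pages containing each term on its own
    fxy:    number of pages containing both terms
    n:      total number of pages indexed
    """
    # The distance is unbounded when the terms never co-occur, so this
    # sketch rejects non-positive counts instead of returning infinity.
    if min(fx, fy, fxy) <= 0:
        raise ValueError("hit counts must be positive")
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Hypothetical counts, for illustration only.
print(normalized_google_distance(1000, 1000, 1000, 10**6))  # terms always co-occur
print(normalized_google_distance(100, 1000, 10, 10**6))     # terms rarely co-occur
```

Terms that always appear together get a distance of zero, and the value grows as co-occurrence becomes rarer.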

Re: [Scikit-learn-general] Logistic regression coefficients analysis

2014-02-20 Thread Lars Buitinck
2014-02-19 20:57 GMT+01:00 Pavel Soriano:
> I thought about using the values of the coefficients of the fitted logit
> equation to get a glimpse of what words in the vocabulary, or what style
> features, affect the most to the classification decision. Is it correct to
> assume that if the coeffici

Re: [Scikit-learn-general] Logistic regression coefficients analysis

2014-02-19 Thread Tobias Günther
Sounds like you're on the right path. Looking at the misclassified documents and the feature coefficients is a common way to debug a classifier, especially if you use boolean features. If you're using a sklearn vectorizer this might be of interest to you: http://stackoverflow.com/questions/669
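
A minimal sketch of the coefficient inspection described above, in plain Python so it runs without a fitted model. The helper name `top_features` and the toy vocabulary and weights are hypothetical. With scikit-learn, the weights would typically come from `clf.coef_[0]` of a fitted binary `LogisticRegression`, and the names from the vectorizer (`get_feature_names_out()` in recent versions, `get_feature_names()` in older ones).

```python
def top_features(names, coefs, k=3):
    """Return the k most negative and k most positive (name, coef) pairs."""
    if len(names) != len(coefs):
        raise ValueError("names and coefs must have the same length")
    # L1 regularization drives many weights to exactly zero; skip those.
    nonzero = [(n, c) for n, c in zip(names, coefs) if c != 0.0]
    ranked = sorted(nonzero, key=lambda pair: pair[1])
    negative = [p for p in ranked if p[1] < 0][:k]
    positive = [p for p in reversed(ranked) if p[1] > 0][:k]
    return negative, positive


# Hypothetical vocabulary and weights, for illustration only.
names = ["tel", "meeting", "dear", "regards", "avg_sentence_len"]
coefs = [1.8, -1.5, 0.0, 0.4, -0.2]

negative, positive = top_features(names, coefs, k=2)
print("pushes toward the negative class:", negative)
print("pushes toward the positive class:", positive)
```

Sorting by signed value rather than absolute value keeps the two directions separate, which makes it easier to compare the lists against misclassified documents.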

Re: [Scikit-learn-general] Logistic regression coefficients analysis

2014-02-19 Thread Joel Nothman
It is correct to assume that a positive coefficient contributes positively to a decision. However, because the features are interdependent, the raw strength of a feature isn't always straightforward to interpret. For example, it might give a big positive coefficient to "Tel" and a similar negative
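
To illustrate why a raw coefficient can mislead when features co-occur, here is a small sketch in plain Python. It relies only on the fact that a logistic regression's decision score is the intercept plus the sum of coefficient times feature value. The feature names and weights are hypothetical and are not taken from the thread.

```python
import math


def decision_score(weights, intercept, features):
    """Linear score of a logistic regression: intercept + sum(w_i * x_i)."""
    return intercept + sum(weights[name] * value for name, value in features.items())


def probability(score):
    """Logistic (sigmoid) function mapping a score to P(positive class)."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical weights: two features that usually appear together
# carry large weights of opposite sign.
weights = {"feature_a": 2.0, "feature_b": -1.9, "feature_c": 0.3}
intercept = 0.0

both = {"feature_a": 1, "feature_b": 1}  # the usual case: they co-occur
only_a = {"feature_a": 1}                # the rare case: feature_a alone

for label, doc in (("both", both), ("only_a", only_a)):
    score = decision_score(weights, intercept, doc)
    print(label, round(score, 3), round(probability(score), 3))
```

Read in isolation, the weight on `feature_a` looks like strong evidence for the positive class, yet in documents where both features appear the two weights nearly cancel.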

[Scikit-learn-general] Logistic regression coefficients analysis

2014-02-19 Thread Pavel Soriano
Hello scikit! I need some insights into what I am doing. Currently I am building a text classifier (2 classes) using unigrams (word level) and some writing style features. I am using a Logistic Regression model with L1 regularization. I get decent performance (around .70 f-measure) for the given