I am using a simple text-processing pipeline to perform sentiment
classification:
steps = [('vect', CountVectorizer()),
         ('tfidf', TfidfTransformer()),
         ('clf', LogisticRegression())]
pipe = Pipeline(steps)
With v0.14 the cross-validation scores peak around 0.67, while with v0.15
they peak at 0.55. This seems like a substantial difference to me. My
hyperparameter grid search is as follows:
params = {'vect__ngram_range': [(1, 1), (1, 2)],
          'vect__stop_words': ['english', None],
          'tfidf__use_idf': [True, False],
          'clf__C': np.logspace(-1, 2, 3*3+1)}
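For reference, here is a minimal, self-contained sketch of how I run the search, with a made-up toy corpus standing in for my data (which I can't share) and a trimmed C grid so it runs quickly. Note that in 0.15-era scikit-learn the class lived in sklearn.grid_search; the import below is for current versions:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Toy stand-in corpus (hypothetical; my real data is a larger labeled set)
texts = ["great movie", "terrible film", "loved it", "hated it",
         "wonderful acting", "awful plot", "really good", "really bad"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

pipe = Pipeline([('vect', CountVectorizer()),
                 ('tfidf', TfidfTransformer()),
                 ('clf', LogisticRegression())])

# Same grid as above, with a smaller C grid for this sketch
params = {'vect__ngram_range': [(1, 1), (1, 2)],
          'tfidf__use_idf': [True, False],
          'clf__C': np.logspace(-1, 2, 4)}

grid = GridSearchCV(pipe, params, cv=2)
grid.fit(texts, labels)
print(grid.best_score_, grid.best_params_)
```

The reported peak scores are the `best_score_` from this kind of search under each scikit-learn version.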
I have repeated the experiment with other classifiers (linear SVM,
naive Bayes) and seen a similar drop going from v0.14 to v0.15. Is this
a bug, or am I missing some hyperparameters that need to be tuned
differently in v0.15?
- Matt Coursen
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general