Hi,
I am trying to apply the MLPClassifier to a subset (100 data points, 2 classes)
of the 20newsgroups dataset. I created (ok, copied) the following pipeline:
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.neural_network import MLPClassifier
from sklearn import metrics

# twenty_train / twenty_test were loaded earlier with fetch_20newsgroups
model_MLP = Pipeline([('vect', CountVectorizer()),
                      ('tfidf', TfidfTransformer()),
                      ('model_MLP', MLPClassifier(solver='lbfgs',
                                                  alpha=1e-5,
                                                  hidden_layer_sizes=(5, 2),
                                                  random_state=1))])
model_MLP.fit(twenty_train.data, twenty_train.target)
predicted_MLP = model_MLP.predict(twenty_test.data)
print(metrics.classification_report(twenty_test.target, predicted_MLP,
                                    target_names=twenty_test.target_names))
The numbers I get are hopeless:

                 precision    recall  f1-score   support

    alt.atheism       0.00      0.00      0.00        34
sci.electronics       0.66      1.00      0.80        66
The only reason I can think of is that the vocabularies of the training set
and the test set are not the same (test set: 5204 words, training set: 5402
words). That should not be a problem (if I understand Bayes correctly), but
it certainly gives rubbish here (see the numbers above).
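For reference, here is a minimal sketch (with made-up toy documents) of how I
understand the vocabulary issue: inside the Pipeline, CountVectorizer learns
its vocabulary from the training data only, and transform() on the test data
simply drops words not in that vocabulary, so both feature matrices end up
with the same number of columns.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpora, just to see what transform() does with unseen words
train_docs = ["the cat sat", "the dog ran"]
test_docs = ["the cat flew"]  # "flew" never occurs in training

vect = CountVectorizer()
X_train = vect.fit_transform(train_docs)  # vocabulary learned from train only
X_test = vect.transform(test_docs)        # unseen words are silently dropped

print(sorted(vect.vocabulary_))             # ['cat', 'dog', 'ran', 'sat', 'the']
print(X_test.shape[1] == X_train.shape[1])  # True: same number of features
```

So a mismatch in raw word counts between the two sets should not by itself
break the classifier.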
The same setup with the SVD routine works great; all values are around 0.95.
thanks,
Andreas
_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn