Hi scikit-learn guys,

First, I'd like to thank you for providing such a great toolkit for ML in
Python!

I'm trying to make a personalized recommendation system for reddit,
which is basically a large forum. My recommender will work on an
individual basis, so there is nothing "collaborative" about it.

Moving to the point: the features I'm going to use are a mixture of text
(link title, description, and content I can scrape from the page the link
points to) and other things (the number of "upvotes" and "downvotes", the
link author, the domain the link points to, the subreddit (subforum) name,
etc.).
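A minimal sketch of turning this kind of mixed record into a single feature matrix, assuming a recent scikit-learn (`ColumnTransformer` postdates this message) and a made-up toy dataset whose column layout is hypothetical. Printing the width of the text block shows where the imbalance comes from: the text block grows with the vocabulary of every title and description, while the two vote columns stay two columns no matter how much data arrives.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data, one row per link. Columns:
# 0 = title, 1 = upvotes, 2 = downvotes, 3 = author, 4 = domain, 5 = subreddit
X = np.array([
    ["Python 3 released", 120, 4, "alice", "python.org", "programming"],
    ["Cute cat pictures", 300, 20, "bob", "imgur.com", "aww"],
    ["New numpy features", 80, 2, "carol", "numpy.org", "programming"],
    ["Dog learns to skate", 250, 15, "dave", "youtube.com", "aww"],
], dtype=object)

features = ColumnTransformer([
    # A scalar column index passes the vectorizer a 1-D array of strings.
    ("text", TfidfVectorizer(), 0),
    # Vote counts are standardized to zero mean and unit variance.
    ("votes", StandardScaler(), [1, 2]),
    # Categorical fields become one-hot indicator columns.
    ("meta", OneHotEncoder(handle_unknown="ignore"), [3, 4, 5]),
])
Z = features.fit_transform(X)

n_text = len(features.named_transformers_["text"].vocabulary_)
print("text columns:", n_text)     # prints "text columns: 12"
print("total columns:", Z.shape[1])  # prints "total columns: 24"
```

On real titles, descriptions and scraped page content, the text block typically runs to thousands of columns, which is exactly the situation the next paragraph worries about.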

If I mix all these features into a single classifier, I fear that the very
numerous features extracted from the text will drown out the other
(important!) features. Some people have suggested running a separate
classifier on the text, and then using the output of that classifier as
a single feature.
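A minimal sketch of that two-stage idea (usually called stacking), assuming a recent scikit-learn and made-up toy data; `text_clf`, `meta_clf` and the data are illustrative names only. The key detail is that the text score used to train the second stage comes from `cross_val_predict`, i.e. from folds the text model did not see. Otherwise the second stage learns to trust an over-confident text score.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: titles, [upvotes, downvotes], liked (1) / not liked (0).
titles = [
    "python release notes", "numpy array tricks", "scikit-learn tutorial",
    "python packaging guide", "machine learning in python", "numpy broadcasting explained",
    "funny cat video", "celebrity gossip roundup", "cute dog pictures",
    "celebrity wedding photos", "funny prank video", "cute kitten compilation",
]
votes = np.array([[120, 4], [80, 2], [95, 3], [60, 5], [150, 10], [70, 1],
                  [300, 40], [200, 60], [250, 20], [180, 50], [220, 30], [260, 25]])
y = np.array([1] * 6 + [0] * 6)

# Stage 1: a text-only classifier. cross_val_predict returns out-of-fold
# probabilities, so each training link is scored by a model that never saw it.
text_clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
text_score = cross_val_predict(text_clf, titles, y, cv=3, method="predict_proba")[:, 1]

# Stage 2: the text score is one column next to the log-scaled vote counts.
X_stage2 = np.column_stack([text_score, np.log1p(votes)])
meta_clf = LogisticRegression().fit(X_stage2, y)

# Prediction: refit the text model on all titles, then score new links the same way.
text_clf.fit(titles, y)
new_titles = ["numpy performance tips"]
new_votes = np.array([[90, 3]])
new_X = np.column_stack([text_clf.predict_proba(new_titles)[:, 1], np.log1p(new_votes)])
print(meta_clf.predict(new_X))
```

Whatever the size of the vocabulary, the second-stage model sees the text as exactly one column, so the vote counts and other metadata keep a comparable weight.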

So my question is twofold: first, are my fears justified, and is my
approach correct? Second, how would you go about combining classifiers in
scikit-learn without losing all the nice tools associated with
classifiers (cross-validation and so on)? Should I implement a
"hybrid_classifier", or is something similar already available in the
toolkit?
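In recent scikit-learn releases, `StackingClassifier` (added long after this message) covers much of what a hand-written "hybrid_classifier" would do: it fits each base model, builds the second-stage features from out-of-fold predictions internally, and is itself an ordinary estimator, so `cross_val_score` works on the whole thing. A minimal sketch, assuming toy data with a made-up column layout; note that here the votes and subreddit go through their own small model rather than being fed raw to the final classifier.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data. Columns: 0 = title, 1 = upvotes, 2 = downvotes, 3 = subreddit.
liked = ["python release notes", "numpy array tricks",
         "scikit-learn tutorial", "python packaging guide"]
other = ["funny cat video", "celebrity gossip roundup",
         "cute dog pictures", "celebrity wedding photos"]
rows, labels = [], []
for i in range(12):
    rows.append([liked[i % 4], 100 + 5 * i, 3 + i % 3, "programming"])
    labels.append(1)
    rows.append([other[i % 4], 250 + 5 * i, 30 + i % 5, "funny"])
    labels.append(0)
X = np.array(rows, dtype=object)
y = np.array(labels)

# Base model 1 sees only the title column.
text_model = make_pipeline(
    ColumnTransformer([("title", TfidfVectorizer(), 0)]),
    MultinomialNB(),
)
# Base model 2 sees only the vote counts and the subreddit.
meta_model = make_pipeline(
    ColumnTransformer([
        ("votes", StandardScaler(), [1, 2]),
        ("subreddit", OneHotEncoder(handle_unknown="ignore"), [3]),
    ]),
    LogisticRegression(),
)
# The final estimator combines the two models' out-of-fold probabilities.
hybrid = StackingClassifier(
    estimators=[("text", text_model), ("meta", meta_model)],
    final_estimator=LogisticRegression(),
    cv=3,
)
# The whole stack is cross-validated like any single classifier.
scores = cross_val_score(hybrid, X, y, cv=3)
print(scores.mean())
```

Because `hybrid` is an ordinary estimator, nested parameters such as `text__multinomialnb__alpha` can also be tuned with `GridSearchCV`.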

Thanks again,

Joël Schaerer

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
