Hi scikit-learn guys,

First, I'd like to thank you for providing such a great toolkit for ML in Python!
I'm trying to build a personalized recommendation system for Reddit, which is basically a large forum. My recommender will work on an individual basis, so there is nothing "collaborative" about it.

To get to the point: the features I'm going to use are a mixture of text (link title, description, content I can scrape from the page the link points to) and other things (number of "upvotes" and "downvotes", link author, domain the link points to, subreddit (subforum) name, etc.).

If I feed all these features into a single classifier, I fear the very numerous features extracted from the text will drown out the other (important!) features. Some people suggested running a separate classifier on the text, and then using the output of that classifier as a single feature.

So my question is twofold. First, are my fears justified, and is my approach correct? Second, how would you go about combining classifiers in scikit-learn without losing all the nice tools associated with classifiers (cross-validation and so on)? Should I implement a "hybrid_classifier", or is something similar already available in the toolkit?

Thanks again,
Joël Schaerer

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
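
To make the second question concrete, here is a rough sketch of the "text classifier output as a single feature" idea. It is only an illustration under several assumptions that are not part of the question itself: a recent scikit-learn that ships StackingClassifier and ColumnTransformer, pandas for the table, and made-up column names (title, subreddit, upvotes, downvotes) with toy data. It also differs slightly from the suggestion above: the non-text features go through their own small classifier too, so the final model sees one score per sub-classifier rather than the raw non-text features.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data: every column name and value here is hypothetical.
liked = [
    ("python machine learning tutorial", "programming", 120, 4),
    ("scikit-learn text classification tips", "machinelearning", 85, 2),
    ("numpy tricks for faster code", "programming", 60, 5),
    ("support vector machines explained", "machinelearning", 200, 10),
]
disliked = [
    ("funny cat picture of the day", "pics", 900, 300),
    ("celebrity gossip roundup", "entertainment", 40, 35),
    ("another funny dog video", "videos", 700, 150),
    ("top ten celebrity outfits", "entertainment", 30, 25),
]
links = pd.DataFrame(
    (liked + disliked) * 2,
    columns=["title", "subreddit", "upvotes", "downvotes"],
)
y = np.array(([1] * 4 + [0] * 4) * 2)  # 1 = user liked the link

# Sub-classifier 1: sees only the text column.
text_clf = make_pipeline(
    ColumnTransformer([("tfidf", TfidfVectorizer(), "title")]),
    LogisticRegression(),
)

# Sub-classifier 2: sees only the categorical and numerical columns.
meta_clf = make_pipeline(
    ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["subreddit"]),
        ("num", StandardScaler(), ["upvotes", "downvotes"]),
    ]),
    LogisticRegression(),
)

# The final estimator is trained on out-of-fold probabilities from the
# sub-classifiers, so each one contributes a single feature (binary case).
hybrid = StackingClassifier(
    estimators=[("text", text_clf), ("meta", meta_clf)],
    final_estimator=LogisticRegression(),
    cv=3,
)

# The combined model is an ordinary estimator, so cross-validation works.
scores = cross_val_score(hybrid, links, y, cv=2)

hybrid.fit(links, y)
predictions = hybrid.predict(links)
stacked_features = hybrid.transform(links)  # one column per sub-classifier
```

Because the combined model behaves like any other scikit-learn estimator, it can also be passed to tools such as grid search. Whether this actually keeps the text features from drowning out the others would still have to be checked on real data.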