On Tue, Jan 27, 2009 at 2:21 PM, Grant Ingersoll <gsing...@apache.org> wrote:
> One of the things I am interested in is the marriage of Solr and Mahout
> (which has some Genetic Algorithms support) and other ML (Weka, etc.) tools.
[snip]

I love it; good to know you are thinking big here. Here's another big thought:
http://www.eml-r.org/nlp/papers/ponzetto07b.pdf
but assume we want to extract this type of structure from the full text of Wikipedia rather than the narrow categories DB.

> Things that can help with all this: LukeReqHandler, TermVectorComponent,
> TermsComponent, others
> [snip]
> Neal, what did you have in mind for a JIRA issue? I'd love to see a patch.

More research is needed, but the initial idea would be to allow a weighted term vector to be passed in as a query and to run a more-like-this type search on it. Has anyone attempted this yet?

The interesting point about faceting here is that it would give feedback on which /new/ words (not in the initial query) would, if added to the query, further discriminate between the matched categories. So Solr outputs a set of categories for a document and also emits a set of words related to the initial query! Categorization and recommendation in one.

- Neal
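
To illustrate the weighted-term-vector idea, here is a minimal sketch of how such a vector could be flattened into an ordinary boosted OR query using the standard Lucene `term^boost` syntax. It is only an approximation built on today's query parser, not the proposed feature and not an existing Solr API. The function names, the `text` field, and the `top_n` cutoff are all hypothetical.

```python
import re

# Characters (and the && / || operators) that are special in Lucene query syntax.
_SPECIAL = re.compile(r'(&&|\|\||[+\-!(){}\[\]^"~*?:\\/])')


def escape_term(term):
    """Backslash-escape Lucene query syntax characters in a single term."""
    return _SPECIAL.sub(r'\\\1', term)


def weighted_vector_to_query(vector, field="text", top_n=25):
    """Turn {term: weight} into a boosted OR query string.

    Assumption: `field` is a hypothetical indexed text field, and keeping only
    the `top_n` heaviest terms is an arbitrary way to bound the query size.
    """
    ranked = sorted(vector.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    clauses = [
        f"{field}:{escape_term(term)}^{weight:.3f}"
        for term, weight in ranked
        if weight > 0  # a zero or negative boost would not help ranking
    ]
    return " OR ".join(clauses)


if __name__ == "__main__":
    vec = {"solr": 2.0, "mahout": 1.5, "weka": 0.4}
    print(weighted_vector_to_query(vec))
```

The resulting string could be sent as the `q` parameter of a normal search request. Faceting on a category field over the matches would then give the category feedback described above.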