On Tue, Jan 27, 2009 at 2:21 PM, Grant Ingersoll <gsing...@apache.org> wrote:
> One of the things I am interested in is the marriage of Solr and Mahout
> (which has some Genetic Algorithms support) and other ML (Weka, etc.) tools.
 [snip]

I love it; good to know you are thinking big here.  Here's another big thought:
http://www.eml-r.org/nlp/papers/ponzetto07b.pdf
Assume, though, that we want to extract this type of structure from the
full text of Wikipedia rather than from the narrow categories DB.

> Things that can help with all this:  LukeReqHandler, TermVectorComponent,
> TermsComponent, others
>

[snip]

> Neal, what did you have in mind for a JIRA issue?  I'd love to see a patch.

More research is needed, but the initial idea would be to allow a
weighted term vector to be passed in as a query and to run a
more-like-this style search on it.  Has anyone attempted this yet?
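
To make the idea concrete, here is a rough sketch (not an existing Solr
feature) of how a weighted term vector could be turned into a query today
using the standard Lucene term^boost syntax.  The function name and the
"text" field are my own assumptions, purely for illustration.

```python
def weighted_terms_to_query(weights, field="text"):
    """Turn {term: weight} into a Lucene-style boosted OR query string.

    `field` is a hypothetical field name; use whatever your schema defines.
    """
    clauses = []
    # Sort by descending weight (then term) so the output is deterministic.
    for term, weight in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        if weight <= 0:
            continue  # zero or negative weights contribute nothing to an OR query
        # Quote the term and escape backslashes/quotes inside it.
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'{field}:"{escaped}"^{weight:g}')
    return " OR ".join(clauses)


if __name__ == "__main__":
    print(weighted_terms_to_query({"solr": 2.0, "mahout": 0.5}))
```

The resulting string could be sent as an ordinary q parameter; a real
patch would presumably accept the vector directly instead of going
through the query parser.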

The interesting point about faceting here is that it would give
feedback on which /new/ words (not in the initial query) would, if
added to the query, further discriminate between the matched
categories.

So Solr outputs a set of categories for a document, and also emits a
set of words related to the initial query!  Categorization and
recommendation in one.
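
A toy sketch of that combined output, done client-side over an
already-matched result set.  Everything here is assumed for
illustration: the doc shape (a "category" string plus a set of "terms"),
the function name, and the very crude notion of "discriminating" (a
term that shows up in some matched categories but not all of them).

```python
from collections import Counter, defaultdict


def categories_and_new_terms(matched_docs, query_terms):
    """Return (category counts, new terms that separate the categories).

    matched_docs: iterable of {"category": str, "terms": set of str}
    query_terms:  set of terms that were already in the query
    """
    category_counts = Counter()
    term_categories = defaultdict(set)
    for doc in matched_docs:
        category_counts[doc["category"]] += 1
        for term in doc["terms"]:
            if term not in query_terms:  # only consider /new/ words
                term_categories[term].add(doc["category"])
    total_categories = len(category_counts)
    # A term seen in every matched category cannot tell them apart.
    discriminating = sorted(
        term for term, cats in term_categories.items()
        if len(cats) < total_categories
    )
    return category_counts, discriminating


if __name__ == "__main__":
    docs = [
        {"category": "search", "terms": {"solr", "index", "query"}},
        {"category": "ml", "terms": {"mahout", "cluster", "query"}},
    ]
    print(categories_and_new_terms(docs, {"solr", "mahout"}))
```

In Solr itself the category counts would come from faceting and the
candidate terms from something like the TermVectorComponent; the
scoring above is only a placeholder for a proper measure.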

- Neal
