On Jan 8, 2007, at 4:58 AM, Alan Burlison wrote:
> I'm in the process of evaluating what we are going to do with the
> search functionality for http://opensolaris.org, and at the moment
> Solr is my first choice to replace what we already have - *if* it
> can be made to handle disparate data sources.

There really is no question of whether Solr can be made to handle
it. :) POSTing an encoded binary document in XML will work, and it
will certainly work to have Solr decode and parse it.
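
To make the idea concrete, here is a rough sketch of wrapping a binary file in an XML add message and decoding it again on the receiving side. It assumes base64 as the encoding and a field named content_b64; both are illustrative choices, not something Solr defines, and the decoding half stands in for custom code that would have to be added on the Solr side.

```python
import base64
import xml.etree.ElementTree as ET


def build_add_xml(doc_id, raw_bytes, content_field="content_b64"):
    """Wrap raw bytes as base64 text inside an <add><doc> XML message.

    `content_field` is a hypothetical field name chosen for this sketch.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    ET.SubElement(doc, "field", name=content_field).text = encoded
    return ET.tostring(add, encoding="unicode")


def extract_binary(xml_text, content_field="content_b64"):
    """Receiving side: find the encoded field and turn it back into bytes."""
    root = ET.fromstring(xml_text)
    for field in root.iter("field"):
        if field.get("name") == content_field:
            # An empty element has text None, so fall back to "".
            return base64.b64decode(field.text or "")
    raise KeyError(content_field)
```

The string returned by build_add_xml is what would be POSTed; the bytes returned by extract_binary are what a format-specific parser would then be given.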
The Lucene in Action codebase has a DocumentHandler interface that
could be used for this, which has implementations for Word, PDF,
HTML, RTF, and some others. It's simplistic, so it might not be
directly useful here.
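
The general shape of that approach, one handler per format chosen by file type, can be sketched roughly as follows. This is a Python analogue written for illustration only, not the Lucene in Action API (which is Java); the names HANDLERS, handle_html, handle_text, and extract_text are hypothetical, and only HTML and plain text are covered.

```python
import os
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def handle_html(raw):
    parser = _TextCollector()
    parser.feed(raw.decode("utf-8", errors="replace"))
    parser.close()
    # Collapse runs of whitespace so the result is one clean line of text.
    return " ".join(" ".join(parser.chunks).split())


def handle_text(raw):
    return raw.decode("utf-8", errors="replace")


# Hypothetical registry: file extension -> function(bytes) -> str.
HANDLERS = {".html": handle_html, ".htm": handle_html, ".txt": handle_text}


def extract_text(filename, raw):
    """Pick a handler by file extension and return the indexable text."""
    ext = os.path.splitext(filename)[1].lower()
    try:
        handler = HANDLERS[ext]
    except KeyError:
        raise ValueError(f"no handler registered for {ext!r}") from None
    return handler(raw)
```

Supporting Word, PDF, or RTF would mean registering further handlers backed by a parser for each format.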
Erik