On Jan 8, 2007, at 4:58 AM, Alan Burlison wrote:
> I'm in the process of evaluating what we are going to do with the search functionality for http://opensolaris.org, and at the moment Solr is my first choice to replace what we already have - *if* it can be made to handle disparate data sources.
There really is no question of "if" Solr can be made to handle it. :) POSTing an encoded binary document inside XML will work, and Solr can certainly be made to decode and parse it.
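To make the idea concrete, here is a rough sketch assuming Base64 as the encoding and the standard Solr XML update message (<add><doc><field name="...">). The class SolrXmlPost and the field name "content_b64" are hypothetical. The decoding side stands in for whatever custom handler you would write inside Solr. It is not an existing Solr component.

```java
import java.util.Base64;

// Hypothetical helper: wraps a binary document in a Solr XML <add> message.
// The field names "id" and "content_b64" are illustrative assumptions.
final class SolrXmlPost {
    private SolrXmlPost() {}

    // Minimal escaping for text placed inside an XML element.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Client side: Base64-encode the raw bytes so they survive inside XML.
    static String buildAddXml(String id, byte[] content) {
        String encoded = Base64.getEncoder().encodeToString(content);
        return "<add><doc>"
            + "<field name=\"id\">" + escapeXml(id) + "</field>"
            + "<field name=\"content_b64\">" + encoded + "</field>"
            + "</doc></add>";
    }

    // Server side (e.g. in a hypothetical custom handler): recover the
    // original bytes before passing them to a format-specific parser.
    static byte[] decodeContent(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }
}
```

The resulting string would be POSTed to Solr's update handler like any other XML add. Base64 is used here only because it is the usual way to carry arbitrary bytes in XML text.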
The Lucene in Action codebase has a DocumentHandler interface that could be used for this, with implementations for Word, PDF, HTML, RTF, and a few other formats. It's simplistic, though, so it may not be directly useful here.
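A minimal sketch of the same pattern follows: one extractor per format, chosen by file extension. It is loosely modeled on the idea and does not reproduce the actual Lucene in Action DocumentHandler signatures. TextExtractor, ExtractorRegistry, and both implementations are hypothetical names. The HTML extractor is a naive regex tag stripper used only for illustration. Real Word or PDF support would plug in a proper parsing library behind the same interface.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical interface: each implementation turns one file format
// into plain text suitable for indexing.
interface TextExtractor {
    String extract(byte[] content);
}

// Plain-text files only need their bytes decoded (UTF-8 assumed).
final class PlainTextExtractor implements TextExtractor {
    public String extract(byte[] content) {
        return new String(content, StandardCharsets.UTF_8);
    }
}

// Naive HTML handling: replace tags with spaces, then collapse whitespace.
final class HtmlTextExtractor implements TextExtractor {
    public String extract(byte[] content) {
        String html = new String(content, StandardCharsets.UTF_8);
        return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }
}

// Dispatches to the extractor registered for a file's extension.
final class ExtractorRegistry {
    private final Map<String, TextExtractor> byExtension = new HashMap<>();

    void register(String ext, TextExtractor extractor) {
        byExtension.put(ext.toLowerCase(Locale.ROOT), extractor);
    }

    String extract(String fileName, byte[] content) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        TextExtractor extractor = byExtension.get(ext);
        if (extractor == null) {
            throw new IllegalArgumentException("No extractor registered for: " + fileName);
        }
        return extractor.extract(content);
    }
}
```

Keeping the per-format logic behind a single interface means new source types can be added by registering another extractor, without touching the indexing code.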
Erik