Along similar lines I have been thinking of how to splice a Lucene indexing app
I wrote into SOLR. It occurred to me that it would almost be simpler to use
the plugin-friendly QueryRequest mechanism rather than the UpdateRequest
mechanism; coupled with what you wrote below, Hoss, it makes me think that a
little refactoring of request handling might go a long way:
SolrRequestHandler now defines:
    public void handleRequest(SolrQueryRequest req, SolrQueryResponse rsp)
Interface SolrQueryRequest and abstract implementation SolrQueryRequestBase are
mainly involved with parsing request parameters; the only method signatures
which are query-specific are getSearcher() and the @deprecated getQueryString()
and getQueryType().
SolrQueryResponse is mainly concerned with building a generic response message
including execution time, though it also supports a default set of returned
field names.
So SolrRequestHandler.handleRequest could be changed to:
    public void handleRequest(SolrRequest req, SolrResponse rsp)
with SolrRequest and SolrResponse interfaces having the generic functionality
described above.
Then SolrQueryRequest and SolrQueryResponse could be crafted as sub-interfaces
and/or abstract implementations that isolate the small amount of query-specific
functionality. One would also create SolrUpdateRequest and SolrUpdateResponse
interfaces and/or base implementations in much the same way.
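A minimal sketch of the split described above, not actual Solr code. The method
names on SolrRequest and SolrResponse (getParam, add, getValues) and the
SimpleSolrResponse class are assumptions made for illustration; getSearcher()
returns Object here only so the sketch compiles on its own, and getRawUpdates()
is borrowed from Hoss's proposal quoted below.

```java
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;

// Generic request: parameter access only, nothing query-specific (hypothetical shape).
interface SolrRequest {
    String getParam(String name);
}

// Generic response: a bag of named values such as results, status, timing (hypothetical shape).
interface SolrResponse {
    void add(String name, Object value);
    Map<String, Object> getValues();
}

// Query-specific sub-interface. Object stands in for the real searcher type.
interface SolrQueryRequest extends SolrRequest {
    Object getSearcher();
}

// Update-specific sub-interface, using the accessor from Hoss's proposal.
interface SolrUpdateRequest extends SolrRequest {
    Iterable<Reader> getRawUpdates();
}

// The handler signature now accepts either kind of request.
interface SolrRequestHandler {
    void handleRequest(SolrRequest req, SolrResponse rsp);
}

// Trivial response implementation so the sketch can be exercised (hypothetical).
class SimpleSolrResponse implements SolrResponse {
    private final Map<String, Object> values = new LinkedHashMap<>();

    public void add(String name, Object value) {
        values.put(name, value);
    }

    public Map<String, Object> getValues() {
        return values;
    }
}
```

A handler that needs query-only features would check for or cast to
SolrQueryRequest; handlers that only read parameters never need to know which
kind of request they were given.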
Then in SolrCore, the RequestHandler registry and execute() method would
without modification handle both Query and Update requests; the code in
SolrCore.update and SolrCore.readDoc should be moved into an implementation of
SolrRequestHandler, e.g. DefaultUpdateRequestHandler, which would be registered
under the request name "update" and could then be subclassed by users. It could
then use SolrResponse to formulate the response, and would get the request
timing information put in by SolrCore.execute() for free, as well as the
pluggable response format mechanism.
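Roughly what I have in mind for the registry, as a self-contained sketch rather
than the real SolrCore. ToyCore, MapResponse, and the "elapsedNanos" key are
made-up names, and the interfaces are cut-down stand-ins for the generic ones
proposed above. The point is only that one registry and one execute() method
serve both kinds of handler, and that timing is added in one place.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Cut-down stand-ins for the proposed generic interfaces (hypothetical).
interface SolrRequest {
    String getParam(String name);
}

interface SolrResponse {
    void add(String name, Object value);
    Map<String, Object> getValues();
}

interface SolrRequestHandler {
    void handleRequest(SolrRequest req, SolrResponse rsp);
}

class MapResponse implements SolrResponse {
    private final Map<String, Object> values = new LinkedHashMap<>();

    public void add(String name, Object value) {
        values.put(name, value);
    }

    public Map<String, Object> getValues() {
        return values;
    }
}

// Placeholder for the proposed handler; a real one would contain the logic
// that currently lives in SolrCore.update and SolrCore.readDoc.
class DefaultUpdateRequestHandler implements SolrRequestHandler {
    public void handleRequest(SolrRequest req, SolrResponse rsp) {
        rsp.add("status", "update received");
    }
}

// Toy registry: it does not care whether a handler serves queries or updates.
class ToyCore {
    private final Map<String, SolrRequestHandler> handlers = new HashMap<>();

    void register(String name, SolrRequestHandler handler) {
        handlers.put(name, handler);
    }

    void execute(String handlerName, SolrRequest req, SolrResponse rsp) {
        SolrRequestHandler handler = handlers.get(handlerName);
        if (handler == null) {
            throw new IllegalArgumentException("Unknown request handler: " + handlerName);
        }
        long start = System.nanoTime();
        handler.handleRequest(req, rsp);
        // Timing is recorded here once, so every handler gets it for free.
        rsp.add("elapsedNanos", System.nanoTime() - start);
    }
}
```

A user who wants different update behaviour would subclass
DefaultUpdateRequestHandler and register the subclass under "update" in its
place.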
Note that the UpdateRequestHandler, which formulates update requests, would be
separate from the UpdateHandler, which controls the update logic (index
acrobatics).
Finally, the SolrUpdateServlet could be cast as a trivial subclass of
SolrServlet; perhaps all it needs to do is to set the default value for the
request type to "update" rather than "standard", for backward compatibility, and
perhaps to let a parameter other than 'qt' be used to specify the request
type for updates.
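The only servlet-specific piece would then be how the request type is chosen.
A sketch of that lookup follows, assuming parameters arrive as a simple map.
RequestTypeResolver is a made-up helper, and the alternative parameter name for
updates is deliberately left to the caller since I have not picked one.

```java
import java.util.Map;

// Picks the request handler name from the request parameters, falling back
// to a servlet-specific default when the parameter is absent or empty.
final class RequestTypeResolver {
    private final String paramName;   // e.g. "qt" for the query servlet
    private final String defaultType; // e.g. "standard" or "update"

    RequestTypeResolver(String paramName, String defaultType) {
        this.paramName = paramName;
        this.defaultType = defaultType;
    }

    String resolve(Map<String, String> params) {
        String value = params.get(paramName);
        return (value == null || value.isEmpty()) ? defaultType : value;
    }
}
```

The query servlet would construct it with ("qt", "standard"), while the update
servlet would pass its own parameter name and "update", so existing clients
that send no request type keep working.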
I am pretty sure something along these lines would deliver all the benefits
you suggest below and more, with a minimal amount of coding and fairly good
backward compatibility. It of course still leaves the hard work of writing the
actual update handler plugins. But it's a lot simpler to subclass an
UpdateRequestHandler than SolrCore!
What do you folks think?
- J.J.
PS: If I weren't up to my ears in other deadline-driven deliverables, I'd just
jump in and try it.
At 4:21 PM -0800 1/7/07, Chris Hostetter wrote:
>It seems like [Handling disparate data sources in Solr] could be addressed by
>modifying the SolrUpdateServlet to support two low-level query params similar
>to the way the SolrServlet looks at "qt" and "wt". The first param would be
>used to pick an UpdateSource plugin that would have an API like...
>  public interface UpdateSource {
>    SolrUpdateRequest makeRequest(HttpServletRequest req);
>  }
>
>with the SolrUpdateRequest interface looking something like...
>  public interface SolrUpdateRequest {
>    SolrParams getParams();
>    Iterable<java.io.Reader> getRawUpdates();
>  }
>
>different out of the box versions of UpdateSource would support building
>SolrUpdateRequest objects from HttpServletRequests using...
> 1) URL query args and the raw POST body
> 2) query args from multipart form input and Readers from file uploads
> 3) query args and local filenames specified in query args
> 4) query args and remote URLs specified in query args
>
>The SolrUpdateServlet would then use SolrUpdateRequest.getParams() to
>look up its second core param for picking an UpdateParser plugin, which
>would be responsible for parsing all of those Readers in sequence,
>converting them to UpdateCommands, and calling the appropriate methods on
>the UpdateHandler.
>
>Out of the box versions of UpdateParser could do the XML parsing currently
>done, or JSON parsing, or CSV parsing. Custom plugins written by users
>could do more exotic schema-specific parsing: i.e., reading raw PDFs and
>extracting specific field values.
>
>
>what do you guys think?
>
>
>-Hoss