Update Plugins (was Re: Handling disparate data sources in Solr)

J.J. Larrea Tue, 09 Jan 2007 17:04:05 -0800

Along similar lines, I have been thinking about how to splice a Lucene indexing 
app I wrote into Solr.  It occurred to me that it would almost be simpler to use 
the plugin-friendly QueryRequest mechanism rather than the UpdateRequest 
mechanism; coupled with what you wrote below, Hoss, it makes me think that a 
little refactoring of request handling might go a long way:

SolrRequestHandler now defines

  public void handleRequest(SolrQueryRequest req, SolrQueryResponse rsp)

Interface SolrQueryRequest and abstract implementation SolrQueryRequestBase are 
mainly involved with parsing request parameters; the only method signatures 
which are query-specific are getSearcher() and the @deprecated getQueryString() 
and getQueryType().

SolrQueryResponse is mainly concerned with building a generic response message 
including execution time, though it also supports a default set of returned 
field names.

So SolrRequestHandler.handleRequest could be changed to

  public void handleRequest(SolrRequest req, SolrResponse rsp)

with SolrRequest and SolrResponse interfaces having the generic functionality 
described above.

Then SolrQueryRequest and SolrQueryResponse could be crafted as sub-interfaces 
and/or abstract implementations that segregate the small amount of 
query-specific functionality.  One would also create SolrUpdateRequest and 
SolrUpdateResponse interfaces and/or base implementations in much the same way.
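
As a rough illustration of that split (not actual Solr code): the interface 
names below follow the proposal, getRawUpdates() is borrowed from Hoss's quoted 
message, and every other method, class, and helper is a hypothetical stand-in 
chosen only to keep the sketch self-contained.

```java
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: method names other than handleRequest() and getRawUpdates()
// are assumptions, not the real Solr API.
class RequestSplitSketch {

    /** Generic request: parameter access only, nothing query-specific. */
    interface SolrRequest {
        String getParam(String name);
    }

    /** Generic response: a bag of named values. */
    interface SolrResponse {
        void add(String name, Object value);
        Map<String, Object> getValues();
    }

    /** Query-specific part; Object stands in for the real searcher type. */
    interface SolrQueryRequest extends SolrRequest {
        Object getSearcher();
    }

    /** Update-specific part, following the shape in the quoted message. */
    interface SolrUpdateRequest extends SolrRequest {
        Iterable<Reader> getRawUpdates();
    }

    /** The handler sees only the generic types. */
    interface SolrRequestHandler {
        void handleRequest(SolrRequest req, SolrResponse rsp);
    }

    /** Minimal response implementation for the sketch. */
    static class BasicResponse implements SolrResponse {
        private final Map<String, Object> values = new LinkedHashMap<>();
        public void add(String name, Object value) { values.put(name, value); }
        public Map<String, Object> getValues() { return values; }
    }

    /** Hypothetical helper that builds an update request from in-memory data. */
    static SolrUpdateRequest updateRequest(Map<String, String> params,
                                           List<Reader> bodies) {
        return new SolrUpdateRequest() {
            public String getParam(String name) { return params.get(name); }
            public Iterable<Reader> getRawUpdates() { return bodies; }
        };
    }

    /** Toy handler: counts the raw update bodies it was given. */
    static class CountingUpdateHandler implements SolrRequestHandler {
        public void handleRequest(SolrRequest req, SolrResponse rsp) {
            if (!(req instanceof SolrUpdateRequest)) {
                rsp.add("error", "not an update request");
                return;
            }
            int count = 0;
            for (Reader ignored : ((SolrUpdateRequest) req).getRawUpdates()) {
                count++;
            }
            rsp.add("updates", count);
        }
    }
}
```

The point of the sketch is that a handler written against SolrRequest and 
SolrResponse can narrow to the query or update sub-interface only when it 
needs the specific functionality.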

Then in SolrCore, the RequestHandler registry and execute() method would 
without modification handle both Query and Update requests; the code in 
SolrCore.update and SolrCore.readDoc should be moved into an implementation of 
SolrRequestHandler, e.g. DefaultUpdateRequestHandler, which would be registered 
under the request name "update" and could then be subclassed by users. It could 
then use SolrResponse to formulate the response, and would get the request 
timing information put in by SolrCore.execute() for free, as well as the 
pluggable response format mechanism.
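
A minimal sketch of that arrangement, assuming a much-simplified stand-in for 
SolrCore: one registry and one execute() serve any handler, timing is added in 
one place, and a user subclasses the default update handler.  The class Core, 
the "elapsedMillis" key, the "body" parameter, parseCommand(), and 
CsvUpdateRequestHandler are all hypothetical names for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: none of these types are the real Solr classes.
class HandlerRegistrySketch {

    interface SolrRequest {
        String getParam(String name);
    }

    interface SolrResponse {
        void add(String name, Object value);
        Object get(String name);
    }

    interface SolrRequestHandler {
        void handleRequest(SolrRequest req, SolrResponse rsp);
    }

    static class MapResponse implements SolrResponse {
        private final Map<String, Object> values = new LinkedHashMap<>();
        public void add(String name, Object value) { values.put(name, value); }
        public Object get(String name) { return values.get(name); }
    }

    /** Stand-in for SolrCore: one registry, one execute() for all handlers. */
    static class Core {
        private final Map<String, SolrRequestHandler> handlers = new HashMap<>();

        void register(String name, SolrRequestHandler handler) {
            handlers.put(name, handler);
        }

        void execute(String handlerName, SolrRequest req, SolrResponse rsp) {
            SolrRequestHandler handler = handlers.get(handlerName);
            if (handler == null) {
                throw new IllegalArgumentException("unknown handler: " + handlerName);
            }
            long start = System.nanoTime();
            handler.handleRequest(req, rsp);
            // Timing is recorded here, so every handler gets it for free.
            rsp.add("elapsedMillis", (System.nanoTime() - start) / 1_000_000);
        }
    }

    /** Default update handler; subclasses override parseCommand(). */
    static class DefaultUpdateRequestHandler implements SolrRequestHandler {
        public void handleRequest(SolrRequest req, SolrResponse rsp) {
            rsp.add("command", parseCommand(req));
            rsp.add("status", 0);
        }

        protected String parseCommand(SolrRequest req) {
            return req.getParam("body");
        }
    }

    /** Example user subclass that tags the command with its format. */
    static class CsvUpdateRequestHandler extends DefaultUpdateRequestHandler {
        @Override
        protected String parseCommand(SolrRequest req) {
            return "csv:" + super.parseCommand(req);
        }
    }
}
```

Registering the default handler under "update" and a subclass under another 
name is then just two register() calls on the same registry that would hold 
the query handlers.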

Note the UpdateRequestHandler which formulates update requests would be 
separate from the UpdateHandler, which controls the update logic (index 
acrobatics).

Finally, the SolrUpdateServlet could be recast as a trivial subclass of 
SolrServlet; perhaps all it needs to do is set the default value for the 
request type to "update" rather than "standard", for backward compatibility, 
and perhaps let a parameter other than 'qt' be used to specify the request 
type for updates.
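
The only servlet-specific behaviour, then, is how the request type is chosen.  
A small sketch of that choice, deliberately independent of the servlet API: 
the 'qt' parameter and the "standard" and "update" defaults come from the 
text above, while RequestTypeResolver and the 'ut' parameter name are 
assumptions made up for this example.

```java
import java.util.Map;

// Sketch only: RequestTypeResolver and the "ut" parameter are hypothetical.
class RequestTypeSketch {

    /** Picks a handler name from request params, with a per-servlet default. */
    static class RequestTypeResolver {
        private final String paramName;
        private final String defaultType;

        RequestTypeResolver(String paramName, String defaultType) {
            this.paramName = paramName;
            this.defaultType = defaultType;
        }

        String resolve(Map<String, String> params) {
            String value = params.get(paramName);
            // Fall back to the default when the parameter is missing or empty.
            return (value == null || value.isEmpty()) ? defaultType : value;
        }
    }

    /** What the query servlet would use. */
    static final RequestTypeResolver QUERY =
            new RequestTypeResolver("qt", "standard");

    /** What the update servlet subclass would use. */
    static final RequestTypeResolver UPDATE =
            new RequestTypeResolver("ut", "update");
}
```

With this shape, existing clients that post to the update URL without any 
request-type parameter would still reach the handler registered as "update".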

I am pretty sure something along these lines would deliver all the benefits 
you suggest below and more, with a minimal amount of coding and fairly good 
backward compatibility.  It of course still leaves the hard work of writing the 
actual update handler plugins.  But it's a lot simpler to subclass an 
UpdateRequestHandler than SolrCore!

What do you folks think?

- J.J.

PS: If I weren't up to my ears in other deadline-driven deliverables, I'd just 
jump in and try it.

At 4:21 PM -0800 1/7/07, Chris Hostetter wrote:
>It seems like [Handling disparate data sources in Solr] could be addressed by 
>modifying the SolrUpdateServlet to support two low-level query params similar 
>to the way the SolrServlet looks at "qt" and "wt".  The first param would be 
>used to pick an UpdateSource plugin that would have an API like...
>  public interface UpdateSource {
>     SolrUpdateRequest makeRequest(HttpServletRequest req);
>  }
>
>with the SolrUpdateRequest interface looking something like...
>  public interface SolrUpdateRequest {
>     SolrParams getParams();
>     Iterable<java.io.Reader> getRawUpdates();
>  }
>
>different out of the box versions of UpdateSource would support building
>SolrUpdateRequest objects from HttpServletRequests using...
>  1) URL query args and the raw POST body
>  2) query args from multipart form input and Readers from file uploads
>  3) query args and local filenames specified in query args
>  4) query args and remote URLs specified in query args
>
>The SolrUpdateServlet would then use SolrUpdateRequest.getParams() to
>look up its second core param for picking an UpdateParser plugin, which
>would be responsible for parsing all of those Readers in sequence,
>converting them to UpdateCommands, and calling the appropriate methods on
>the UpdateHandler.
>
>Out of the box versions of UpdateParser could do the XML parsing currently
>done, or JSON parsing, or CSV parsing.  Custom plugins written by users
>could do more exotic schema specific parsing: ie, reading raw PDFs and
>extracting specific field values.
>
>
>what do you guys think?
>
>
>-Hoss
