: > There has been some discussion about adding plugin support for the
: > "update" side of things as well -- at a very simple level this could allow
: > for messages to be sent via JSON, or CSV instead of just XML -- but

: I'm interested in discussing this further. I've moved the discussion
: onto solr-dev, as suggested.

Currently, the "modularity" of updates is configurable only via the
updateHandler -- which decides how instances of "UpdateCommand" will be
handled by the SolrCore (directly, via a temp index, etc...)

The relevant discussion so far seems to have focused on two different
aspects of the issue of how SolrCore gets those commands...

  1) parsing different String representations (ie: XML vs JSON vs CSV)
     of the same basic command structure (ie: "add" containing "doc"s,
     containing "field"s)
  2) different means of feeding those String commands to Solr (raw
     POST, CGI file upload, local file)

with this thread, a third aspect has been brought up:

  3) sending Solr more "raw" data and letting a plugin extract the
     individual fields based on rules (ie: parsing a PDF and
     determining the "title" and "body" on the server side)

It seems like these issues could be addressed by modifying the
SolrUpdateServlet to support two low-level query params, similar to the
way the SolrServlet looks at "qt" and "wt".

The first param would be used to pick an UpdateSource plugin that would
have an API like...

  public interface UpdateSource {
    SolrUpdateRequest makeRequest(HttpServletRequest req);
  }

with the SolrUpdateRequest interface looking something like...

  public interface SolrUpdateRequest {
    SolrParams getParams();
    Iterable<java.io.Reader> getRawUpdates();
  }

different out of the box versions of UpdateSource would support
building SolrUpdateRequest objects from HttpServletRequests using...

  1) URL query args and the raw POST body
  2) query args from multipart form input and Readers from file uploads
  3) query args and local filenames specified in query args
  4) query args and remote URLs specified in query args

The SolrUpdateServlet would then use SolrUpdateRequest.getParams() to
look up its second core param for picking an UpdateParser plugin, which
would be responsible for parsing all of those Readers in sequence,
converting them to UpdateCommands, and calling the appropriate methods
on the UpdateHandler.

Out of the box versions of UpdateParser could do the XML parsing
currently done, or JSON parsing, or CSV parsing. Custom plugins written
by users could do more exotic schema specific parsing: ie, reading raw
PDFs and extracting specific field values.

what do you guys think?

-Hoss
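
For illustration only, here is a rough, self-contained sketch of how the two lookups described above might fit together. Nothing in it is existing Solr code. FakeRequest is a hypothetical stand-in for HttpServletRequest, a plain Map stands in for SolrParams, the "update.source" and "update.parser" param names are invented, and the UpdateParser signature is a guess, since the proposal names that plugin without defining its API. A list of strings stands in for the calls that would go to the UpdateHandler.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UpdatePluginSketch {

    // Hypothetical stand-in for HttpServletRequest: query args plus raw POST body.
    static final class FakeRequest {
        final Map<String, String> queryArgs;
        final String body;
        FakeRequest(Map<String, String> queryArgs, String body) {
            this.queryArgs = queryArgs;
            this.body = body;
        }
    }

    // From the proposal, with Map<String, String> standing in for SolrParams.
    interface SolrUpdateRequest {
        Map<String, String> getParams();
        Iterable<Reader> getRawUpdates();
    }

    // From the proposal, but taking FakeRequest instead of HttpServletRequest.
    interface UpdateSource {
        SolrUpdateRequest makeRequest(FakeRequest req);
    }

    // Hypothetical signature: the proposal names UpdateParser but does not define it.
    // The List<String> sink stands in for calls on the UpdateHandler.
    interface UpdateParser {
        void parse(Reader raw, List<String> sink) throws IOException;
    }

    static final Map<String, UpdateSource> SOURCES = new HashMap<>();
    static final Map<String, UpdateParser> PARSERS = new HashMap<>();

    static {
        // Source 1 from the list: URL query args and the raw POST body.
        SOURCES.put("post", http -> new SolrUpdateRequest() {
            public Map<String, String> getParams() { return http.queryArgs; }
            public Iterable<Reader> getRawUpdates() {
                return Collections.<Reader>singletonList(new StringReader(http.body));
            }
        });

        // Naive CSV parser: header row, then one "add" per data row; no quoting support.
        PARSERS.put("csv", (raw, sink) -> {
            BufferedReader in = new BufferedReader(raw);
            String header = in.readLine();
            if (header == null) return;
            String[] names = header.split(",");
            for (String line = in.readLine(); line != null; line = in.readLine()) {
                if (line.isEmpty()) continue;
                String[] values = line.split(",", -1);
                Map<String, String> doc = new LinkedHashMap<>();
                for (int i = 0; i < names.length && i < values.length; i++) {
                    doc.put(names[i], values[i]);
                }
                sink.add("add " + doc);
            }
        });
    }

    // Plays the role of the servlet: pick a source, then a parser, then feed the readers through.
    static List<String> handle(FakeRequest http) throws IOException {
        UpdateSource source = SOURCES.get(http.queryArgs.get("update.source"));
        if (source == null) throw new IllegalArgumentException("unknown update.source");
        SolrUpdateRequest req = source.makeRequest(http);
        UpdateParser parser = PARSERS.get(req.getParams().get("update.parser"));
        if (parser == null) throw new IllegalArgumentException("unknown update.parser");
        List<String> commands = new ArrayList<>();
        for (Reader raw : req.getRawUpdates()) {
            parser.parse(raw, commands);
        }
        return commands;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> query = new HashMap<>();
        query.put("update.source", "post");
        query.put("update.parser", "csv");
        System.out.println(handle(new FakeRequest(query, "id,title\n1,Hello\n2,World\n")));
    }
}
```

The point of the split is that the source plugin only decides where the Readers come from and the parser plugin only decides how to read them, so a new format or a new transport could each be added by registering one more entry.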