I'm in frantic deadline mode so I'm just going to throw in some (hopefully) short comments...
At 11:02 PM -0800 1/15/07, Ryan McKinley wrote: >>the one thing that still seems missing is those "micro-plugins" i was >> [SNIP] >> >> interface SolrRequestParser { >> SolrRequest process( HttpServletRequest req ); >> } >> > > >I left out "micro-plugins" because i don't quite have a good answer >yet :) This may a place where a custom dispatcher servlet/filter >defined in web.xml is the most appropriate solution. If the issue is munging HTTPServletRequest information, then a proper separation of concerns suggests responsibility should lie with a Servlet Filter, as Ryan suggests. For example, while the Servlet 2.4 spec doesn't have specifications for how the servlet container can/should "burst" a multipart-MIME payload into separate files or streams, there are a number of 3rd party Filters which do this. The Iterator<ContentStream> is a great idea because if each stream is read to completion before the next is opened it doesn't impose any limitation on individual stream length and doesn't require disk buffering. (Of course some handlers may require access to more than one stream at a time; each time next() is called on the iterator before the current stream is closed, the remainder of that stream will have to be buffered in memory or on disk, depending on the part length. Nonetheless that detail can be entirely hidden from the handler, as it should be. I am not sure if any available ServletFilter implementations work this way, but it's certainly doable.) But that detail is irrelevant for now; as I suggest below, using this API lets one immediately implement it with only next() value of the entire POST stream; that would answer the needs of the existing update request handling code, but establish an API to handle multi-part. Whenever someone wants to write a multi-stream handler, they can write or find a better Iterator<ContentStream> implementation, which would best be cast as a ServletFilter. >I like the SolrRequestParser suggestion. Me too. It answers a hole in my vision for how this can all fit together. >Consider: >qt='RequestHandler' >wt='ResponseWriter' >rp='RequestParser ' (rb='SolrBuilder'?) > >To avoid possible POST read-ahead stream mungling: qt,wt, and rp >should be defined by the URL, not parameters. (We can add special >logic to allow /query?qt=xxx) > >For qt, I like J.J. Larrea's suggestion on SOLR-104 to let people >define arbitrary path mapping for qt. > >We could append 'wt', 'rb', and arbitrary arbitrary text to the >registered path, something like > /registered/path/wt:json/rb:standard/more/stuff/in/the/path?params... > >(any other syntax ideas?) No need for new syntax, I think. The pathInfo or qt or other source resolves to a requestHandler CONFIG name. The handler config is read to determine the handler class name. It also can be consulted (with URL or form-POST params overriding if allowed by the config) to decide which RequestParser to invoke BEFORE IT IS CALLED and which ResponseWriter to invoke AFTER. Once those objects are set up, the request body gets executed. Handler config inheritance (as I proposed in SOLR-104 point #2) would greatly simplify, for example, creating a dozen query handlers which used a particular invariant combination of qt, wt, and rp >The 'standard' RequestParser would: >GET: > fill up SolrParams directly with req.getParameterMap() >if there is a 'post' parameter (post=XXX) > return a stream with XXX as its content >else > empty iterator. >Perhaps add a standard way to reference a remote URI stream. > >POST: > if( multipart ) { > read all form fields into parameter map. This should use the same req.getParameterMap as for GET, which Servlet 2.4 says is suppose to be automatically by the servlet container if the payload is application/x-www-form-urlencoded; in that case the input stream should be null. > return an iterator over the collection of files Collection of streams, per Hoss. >} >else { > no parameters? parse parameters from the URL? /name:value/ > return the body stream As above, this introduces unneeded complexity and should be avoided. >} >DEL: > throw unsupported exception? > > >Maybe each RequestHandler could have a default RequestParser. If we >limited the 'arbitrary path' to one level, this could be used to >generate more RESTful URLs. Consider: > >/myadder/1111/2222/3333/ > >/myadder maps to MyCustomHandler and that gives you >MyCustomRequestBuilder that maps /1111/2222/3333 to SolrParams I think these are best left for an extra-SOLR layer, especially since SOLR URLs are meant for interprogram communication and not direct use by non-developer end users. For example, for my org's website I have hundreds of Apache mod_rewrite rules which do URL munging such as /journals/abc/7/3/192a.pdf into /journalroot/index.cfm?journal=abc&volume=7&issue=3 &page=192&seq=a&format=pdf Or someone could custom-code a subclass of SolrServlet which handles application-specific URL requirments. But the base implementation should be as simple as possible - or perhaps more accurately, complex only where complexity is really called for: query caching, faceting, and the like. >>one last thought: while the interfaces you outlined would make a lot >>of sense if we were starting from scratch, there are probably several >>cases where not having those exact names/APIs doesn't really hurt, and >>would allow backwards compatibility with more of the current code (and >>current SolrRequestHandler plugin people have written) ... just something >>we should keep in mind: we don't want to go hog wild renaming a lot of >>stuff and alienating our existing "plugin" user base. (nor do we want to >>make a bunch of unneccessary config file format changes) >> > >I totally understand and agree. > >Perhaps the best approach is to offer a SolrRequestProcessor framework >that can sit next to the existing SolrRequestHandler without affecting >it much (if at all). For what i have suggested, i *think* it could >all be done with simple additions to solrschema.xml that would still >work on an unedited 1.1.0 solrconfig.xml Yah > >If we use a servletFilter for the dispatcher, this can sit next to the >current /query?xxx servlet without problem. When the >SolrRequestProcessor framework is rock solid, we would @Deprecate >SolrRequestHandler and change the default solrconfig.xml to map /query >to the new framework. I think there are some small to moderate steps which should be done within the SOLR 1.x where x < 5 framework, where one wants to be non-API breaking as much as possible, and some bigger improvements which should be contemplated for an API-incompatible release. For the short term, I think the following make sense: - Refactor the request handlers (based in part on what Ryan has already done in SOLR-102/20) to unify query and update requests. - Move scattered functionality to UpdateCommand and xxxUpdateCommmand implementations per my comments #3 SOLR - It would make sense to establish the Iterator<ContentStream> API in but initially only provide a single stream, the raw POST when not form-urlencoded. - Revise the XML-based update code (broken out of SolrCore into a RequestHandler) to use all the above. - To keep existing configs unmodified, we would want to continue to use qt as the arg even for update. - Add handler config inheritance and structure the sample solrconfig to suggest structuring handlers as a hierarchy - Change the servlet (or add a filter) to use pathInfo to select the handler. For the moment, the servlet could simply take any pathINFO and store it in qt. So /<base>?qt=standard and /<base>/standard are identical, and /<base>/query/products/instock would be the same as /base?qt=query/products/instock and match a response handler name="query/products/instock" - Provide config examples which exercise the new mechanisms. This would have immediate benefits for plugin writers (both query and update), config file writers (shorter files, request namespace structuring), command issuers (having the request part of the pathInfo rather than qt seems to make sense to everyone), and so forth. With that done, as a second step, having Request Parsing plugins makes great sense. I'll save further commentary on that for another email. - J.J.