[ https://issues.apache.org/jira/browse/SOLR-104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464653 ]
J.J. Larrea commented on SOLR-104: ---------------------------------- I think this is a fantastic effort, so please take my comments and suggestions for improvement below in the context of my appreciation you took the time to wade deeply into the SOLR request-handling code looking for places to improve it. If the request structure is cleaned up along these lines it will make it much simpler for people to develop and contribute alternate request handlers (both update and query varieties) and further SOLRs standing as an open-source community-driven project. 1. I really like your idea of using the URL suffix to specify the handler. But it looks like you have required this to be a fixed 2-level hierarchy, with URLs of the form http://<host>:<port>/<solr-root>/<action>/<handler> which are looked up in a handler table keyed by <action> and then <handler>. For example search/standard looks for a <requestHandler action="search" name="standard"...>, loads the indicated handler class, and associates it with the config. But this hierarchy seems a little overdetermined and the implementation overcomplex. It could be argued that one would want <handler>/<action>, for the <handler> alone to resolve to a handler class, and the handler class to be responsible for deciding how to act on the <action> part... but any hierarchical arrangement that makes perfect sense to one person can seem wrong to another. And in your implementation I see no actual action taken by the <action> argument other than selection of a per-<action> default <handler>, and that seems more complexity than it's worth; there are easier ways. I would suggest the simpler approach of simply taking the entire path after <solr-root> to be a handler configuration name, without conforming it to any fixed hierarchy, e.g. a developer could set up <requestHandler name="search/products/bysku" class="...">...</> <requestHandler name="search/products/byname" class="...">...</> <requestHandler name="search/companies" class="...">...</> <requestHandler name="index/csv" class="...">...</> In that way establishing a command/subcommand hierarchy is entirely up to the user's solrconfig.xml setup, and there is no imposed logic as to whether the different behavior between the 3 search examples is achieved through different config of the same handler class, different handler classes, or both. As for default actions, there is no need for special code, they can entirely be defined in solrconfig. For example, if a developer sets up a /search/xxx space as above, the response to a client request /search without further qualification is entirely up to what is defined in solrconfig.xml: - If there is no request handler defined under name="search" SOLR would return a standard "No handler found" message - If it has a query request handler under that name (e.g. with name="search" class="solr.StandardRequestHandler") it would get to handle less-qualified requests with developer-defined defaults. - It could be defined to explicitly invoke your UnavailableRequestHandler -- a great idea which should be extended so the error code and error message could be custom-configured with handler config params. Thus I think this free-form hierarchy would achieve greater simplicity and greater flexibility at the same time. 2. What would make this even more powerful would be the ability to "subclass" (meaning refine and/or extend) request handler configs: If the requestHandler element allowed an attribute extends="<another-requesthandler-name>" and chained the SolrParams, then one could do something like: <requestHandler name="search/products/all" class="solr.DisMaxRequestHandler" > <lst name="defaults"> <float name="tie">0.01</float> <str name="qf"> text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4 </str> ... much more, per the "dismax" example in the sample solrconfig.xml ... </requestHandler> ... and replacing the "partitioned" example ... <requestHandler name="search/products/instock" extends="search/products/all" > <lst name="appends"> <str name="fq">inStock:true</str> </lst> </requestHandler> One could even allow the extending requestHandler to set a different handler class, in the case where the difference in behavior requires a different handler implementation but can share all or part of the params. 3. Structuring the code and action under /add is conceptually limiting because update-style request plugins such as SQL-based or CSV-based (and certainly XML-based) should still be able to add, replace, and delete, either based on internal logic or external command. Your code suggests further refactoring improvements along those lines. For example, in your SQLUpdateHandler example you call: AddUpdateCommand cmd = UpdateUtils.getAddUpdateCommandFromParams( params ); and then for each assembled Document cmd.doc = docmap.toDocument( schema ); UpdateUtils.addDoc( cmd ); [which does SolrCore.getSolrCore().getUpdateHandler().addDoc( cmd );] addedDocumentCount++; Lets say: A. We standardized on action=... as a way to define an action in a param B. A new method UpdateUtils.getUpdateCommandFromParams( params ) would use the action= param to decide which xxxUpdateCommand class to instantiate -- though this might be better placed as a static class method getCommandFromParams defined in UpdateCommand itself. (Perhaps once it decodes the action param it could call another UpdateCommand static method getCommandFromString( name ), which would centralize the instantiation logic, and provide a pathway for it it to evolve into a name->class soft mapping should extensibility to other commands ever be desired.) C. Responsibility for initializing each update command from params would be the responsibility of the xxxUpdateCommand itself, e.g. base class UpdateCommand would define an abstract initializeFromParams( params ) which each subclass would implement. D. Responsibility for executing each update command would also lay with the xxxUpdateCommand via an abstract execute() method, which for AddUpdateCommand would be: SolrCore.getSolrCore().getUpdateHandler().addDoc( this ); (It could be argued that some of the various UpdateHandler functionality should migrate to the xxxUpdateCommands themselves, but that's another discussion) E. The DeleteUpdateCommand could allow the target record to be defined by setting a Document, and using the schema keyfield definition to extract the ID. The logic could be encapsulated in DeleteUpdateCommand.execute() itself, extracting the keyfield value from the Document and setting it in its own id field before calling getUpdateHandler().delete( cmd ). Or if setters were defined for the internal fields, setDocument(doc) could call setId(id) and not need to keep a reference to the document. In either case the existing UpdateHandlers don't even need to know about this new modality. Then SQLUpdateHandler or CSVUpdateHandler or whatever (which should really be called xxxRequestHandler since UpdateHandler is quite another beast) would be agnostic as to whether they are doing adds or deletes of the selected rows; they could just ask for an UpdateCommand implementation, and in a loop set the constructed Document, and call updateCommand.execute(). While it would be most efficient if for deletes the query or data file only specified a single keyfield column, there is no harm in creating Documents with many more fields and simply ignoring all but the keyfield. Either way one could set up an explicit deletion-only CSV handler: <!-- Could be csv/delete, delete/csv, product/delete, or whatever --> <requestHandler name="csv/delete" class="solr.CSVUpdateHandler"> <lst name="invariants"> <str name="action">delete</str> </lst> </requestHandler> or more flexibly allow the parameter to be passed in a URL query param: /<solr-root>/update/product?action=delete&... 4. Similarly with this structure your CommitHandler and OptimizeHandler classes can be replaced with a simple CommandHandler which is defined in solrconfig; it could even handle deletion by ID or query: <requestHandler name="commit-and-flush" class="solr.CommandHandler"> <lst name="invariants"> <str name="action">commit</str> <str name="waitFlush">true</str> </lst> <lst name="defaults"> <str name="waitSearcher">false</str> </lst> </requestHandler> 5. Of course a RequestHandler could be non-command-agnostic as well, for example the XML update parsing code now in SolrCore.update() could be recast as a RequestHandler which reads the <add>, <delete>, <commit>, <optimize> tags and decides itself which type of UpdateCommand(s) to instantiate (using UpdateCommand.getCommandFromString as described above). But it could be made more agnostic and simpler still, for example by parsing the action and parameters from XML into a SolrParams and calling getCommandFromParams as described above. That would forge a path for other non-XML command-stream-based update handlers by leaving only XML parsing logic in the XML update RequestHandler -- all the rest of the logic would be in UpdateCommand, the UpdateCommand implementations, and UpdateHandler. Anyway, those are some ideas, for what they're worth. > Update Plugins > -------------- > > Key: SOLR-104 > URL: https://issues.apache.org/jira/browse/SOLR-104 > Project: Solr > Issue Type: Improvement > Components: update > Affects Versions: 1.2 > Reporter: Ryan McKinley > Fix For: 1.2 > > Attachments: HandlerRefactoring-DRAFT-SRC.zip, > HandlerRefactoring-DRAFT-SRC.zip, HandlerRefactoring.DRAFT.patch, > HandlerRefactoring.DRAFT.patch, HandlerRefactoring.DRAFT.zip > > > The plugin framework should work for 'update' actions in addition to 'search' > actions. > For more discussion on this, see: > http://www.nabble.com/Re%3A-Handling-disparate-data-sources-in-Solr-tf2918621.html#a8305828 -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira