[jira] Commented: (SOLR-104) Update Plugins

J.J. Larrea (JIRA) Sun, 14 Jan 2007 20:49:48 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464653
 ]


J.J. Larrea commented on SOLR-104:
----------------------------------

I think this is a fantastic effort, so please take my comments and suggestions 
for improvement below in the context of my appreciation you took the time to 
wade deeply into the SOLR request-handling code looking for places to improve 
it.  If the request structure is cleaned up along these lines it will make it 
much simpler for people to develop and contribute alternate request handlers 
(both update and query varieties) and further SOLRs standing as an open-source 
community-driven project.

1. I really like your idea of using the URL suffix to specify the handler.  But 
it looks like you have required this to be a fixed 2-level hierarchy, with URLs 
of the form

    http://<host>:<port>/<solr-root>/<action>/<handler>

which are looked up in a handler table keyed by <action> and then <handler>. 
For example search/standard looks for a <requestHandler action="search" 
name="standard"...>, loads the indicated handler class, and associates it with 
the config.

But this hierarchy seems a little overdetermined and the implementation 
overcomplex.  It could be argued that one would want <handler>/<action>, for 
the <handler> alone to resolve to a handler class, and the handler class to be 
responsible for deciding how to act on the <action> part... but any 
hierarchical arrangement that makes perfect sense to one person can seem wrong 
to another. And in your implementation I see no actual action taken by the 
<action> argument other than selection of a per-<action> default <handler>, and 
that seems more complexity than it's worth; there are easier ways.

I would suggest the simpler approach of simply taking the entire path after 
<solr-root> to be a handler configuration name,  without conforming it to any 
fixed hierarchy, e.g. a developer could set up

        <requestHandler name="search/products/bysku" class="...">...</>
        <requestHandler name="search/products/byname" class="...">...</>
        <requestHandler name="search/companies" class="...">...</>
        <requestHandler name="index/csv" class="...">...</>

In that way establishing a command/subcommand hierarchy is entirely up to the 
user's solrconfig.xml setup, and there is no imposed logic as to whether the 
different behavior between the 3 search examples is achieved through different 
config of the same handler class, different handler classes, or both.

As for default actions, there is no need for special code, they can entirely be 
defined in solrconfig.  For example, if a developer sets up a /search/xxx space 
as above, the response to a client request /search without further 
qualification is entirely up to what is defined in solrconfig.xml:
  - If there is no request handler defined under name="search" SOLR would 
return a standard "No handler found" message
  - If it has a query request handler under that name (e.g. with name="search" 
class="solr.StandardRequestHandler") it would get to handle less-qualified 
requests with developer-defined defaults.
  - It could be defined to explicitly invoke your UnavailableRequestHandler -- 
a great idea which should be extended so the error code and error message could 
be custom-configured with handler config params.

Thus I think this free-form hierarchy would achieve greater simplicity and 
greater flexibility at the same time.

2. What would make this even more powerful would be the ability to "subclass" 
(meaning refine and/or extend) request handler configs: If the requestHandler 
element allowed an attribute extends="<another-requesthandler-name>" and 
chained the SolrParams, then one could do something like:

  <requestHandler name="search/products/all" class="solr.DisMaxRequestHandler" >
    <lst name="defaults">
     <float name="tie">0.01</float>
     <str name="qf">
        text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
     </str>
     ... much more, per the "dismax" example in the sample solrconfig.xml ...
  </requestHandler>

  ... and replacing the "partitioned" example ...
  <requestHandler name="search/products/instock" extends="search/products/all" >
    <lst name="appends">
      <str name="fq">inStock:true</str>
    </lst>
  </requestHandler>

One could even allow the extending requestHandler to set a different handler 
class, in the case where the difference in behavior requires a different 
handler implementation but can share all or part of the params.

3. Structuring the code and action under /add is conceptually limiting because 
update-style request plugins such as SQL-based or CSV-based (and certainly 
XML-based) should still be able to add, replace, and delete, either based on 
internal logic or external command.

Your code suggests further refactoring improvements along those lines.  For 
example, in your SQLUpdateHandler example you call:

    AddUpdateCommand cmd = UpdateUtils.getAddUpdateCommandFromParams( params );

and then for each assembled Document

    cmd.doc = docmap.toDocument( schema );
    UpdateUtils.addDoc( cmd );
        [which does SolrCore.getSolrCore().getUpdateHandler().addDoc( cmd );]
    addedDocumentCount++;

Lets say:

  A. We standardized on action=... as a way to define an action in a param

  B. A new method UpdateUtils.getUpdateCommandFromParams( params ) would use 
the action= param to decide which xxxUpdateCommand class to instantiate -- 
though this might be better placed as a static class method 
getCommandFromParams defined in UpdateCommand itself.

(Perhaps once it decodes the action param it could call another UpdateCommand 
static method getCommandFromString( name ), which would centralize the 
instantiation logic, and provide a pathway for it it to evolve into a 
name->class soft mapping should extensibility to other commands ever be 
desired.)

  C. Responsibility for initializing each update command from params would be 
the responsibility of the xxxUpdateCommand itself, e.g. base class 
UpdateCommand would define an abstract initializeFromParams( params ) which 
each subclass would implement.

  D. Responsibility for executing each update command would also lay with the 
xxxUpdateCommand via an abstract execute() method, which for AddUpdateCommand 
would be:

    SolrCore.getSolrCore().getUpdateHandler().addDoc( this );

(It could be argued that some of the various UpdateHandler functionality should 
migrate to the xxxUpdateCommands themselves, but that's another discussion)

  E. The DeleteUpdateCommand could allow the target record to be defined by 
setting a Document, and using the schema keyfield definition to extract the ID. 
 The logic could be encapsulated in DeleteUpdateCommand.execute() itself, 
extracting the keyfield value from the Document and setting it in its own id 
field before calling getUpdateHandler().delete( cmd ). Or if setters were 
defined for the internal fields, setDocument(doc) could call setId(id) and not 
need to keep a reference to the document.  In either case the existing 
UpdateHandlers don't even need to know about this new modality.

Then SQLUpdateHandler or CSVUpdateHandler or whatever (which should really be 
called xxxRequestHandler since UpdateHandler is quite another beast) would be 
agnostic as to whether they are doing adds or deletes of the selected rows; 
they could just ask for an UpdateCommand implementation, and in a loop set the 
constructed Document, and call updateCommand.execute().  While it would be most 
efficient if for deletes the query or data file only specified a single 
keyfield column, there is no harm in creating Documents with many more fields 
and simply ignoring all but the keyfield.

Either way one could set up an explicit deletion-only CSV handler:

    <!-- Could be csv/delete, delete/csv, product/delete, or whatever -->
    <requestHandler name="csv/delete" class="solr.CSVUpdateHandler">
        <lst name="invariants">
            <str name="action">delete</str>
        </lst>
    </requestHandler>

or more flexibly allow the parameter to be passed in a URL query param:

    /<solr-root>/update/product?action=delete&...

4. Similarly with this structure your CommitHandler and OptimizeHandler classes 
can be replaced with a simple CommandHandler which is defined in solrconfig; it 
could even handle deletion by ID or query:
    
    <requestHandler name="commit-and-flush" class="solr.CommandHandler">
        <lst name="invariants">
            <str name="action">commit</str>
            <str name="waitFlush">true</str> 
        </lst>
        <lst name="defaults">
            <str name="waitSearcher">false</str>
        </lst>
    </requestHandler>

5. Of course a RequestHandler could be non-command-agnostic as well, for 
example the XML update parsing code now in SolrCore.update() could be recast as 
a RequestHandler which reads the <add>, <delete>, <commit>, <optimize> tags and 
decides itself which type of UpdateCommand(s) to instantiate (using 
UpdateCommand.getCommandFromString as described above).

But it could be made more agnostic and simpler still, for example by parsing 
the action and parameters from XML into a SolrParams and calling 
getCommandFromParams as described above.  That would forge a path for other 
non-XML command-stream-based  update handlers by leaving only XML parsing logic 
in the XML update RequestHandler -- all the rest of the logic would be in 
UpdateCommand, the UpdateCommand implementations, and UpdateHandler.

Anyway, those are some ideas, for what they're worth.


> Update Plugins
> --------------
>
>                 Key: SOLR-104
>                 URL: https://issues.apache.org/jira/browse/SOLR-104
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>    Affects Versions: 1.2
>            Reporter: Ryan McKinley
>             Fix For: 1.2
>
>         Attachments: HandlerRefactoring-DRAFT-SRC.zip, 
> HandlerRefactoring-DRAFT-SRC.zip, HandlerRefactoring.DRAFT.patch, 
> HandlerRefactoring.DRAFT.patch, HandlerRefactoring.DRAFT.zip
>
>
> The plugin framework should work for 'update' actions in addition to 'search' 
> actions.
> For more discussion on this, see:
> http://www.nabble.com/Re%3A-Handling-disparate-data-sources-in-Solr-tf2918621.html#a8305828

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (SOLR-104) Update Plugins

Reply via email to