[
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536064
]
Stu Hood commented on SOLR-303:
-------------------------------
I'm still working on wrapping my head around the fedsearch phases, but I
noticed the following stacktrace showing up in the logs every now and then:
{noformat}
SEVERE: java.lang.NullPointerException
    at org.apache.solr.handler.federated.component.GlobalCollectionStatComponent.prepare(GlobalCollectionStatComponent.java:81)
    at org.apache.solr.handler.SearchHandler.handleRequestBody(SearchHandler.java:116)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:78)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:807)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:206)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)
{noformat}
... that is probably caused by the following statements around line 81 in
GlobalCollectionStatComponent.prepare. We only enter the if statement if terms
is null, and then we dereference it...
{code}
String terms = req.getParams().get(ResponseBuilder.DOCFREQS);
if (numDocs != null && terms == null) {
  // the build query has to be over-written to take into
  // account global numDocs and docFreqs
  // extract the numDocs and docFreqs from request params
  Map<Term, Integer> dfMap = new HashMap<Term, Integer>();
  String[] strTerms = terms.split(",");
{code}
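Presumably that check was meant to be terms != null -- I haven't looked at the
rest of prepare(), so this is just a guess at the intent:
{code}
String terms = req.getParams().get(ResponseBuilder.DOCFREQS);
if (numDocs != null && terms != null) {
  // the build query has to be over-written to take into
  // account global numDocs and docFreqs
  // extract the numDocs and docFreqs from request params
  Map<Term, Integer> dfMap = new HashMap<Term, Integer>();
  String[] strTerms = terms.split(",");
  // ...
}
{code}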
> Federated Search over HTTP
> --------------------------
>
> Key: SOLR-303
> URL: https://issues.apache.org/jira/browse/SOLR-303
> Project: Solr
> Issue Type: New Feature
> Components: search
> Reporter: Sharad Agarwal
> Priority: Minor
> Attachments: fedsearch.patch, fedsearch.patch, fedsearch.patch,
> fedsearch.patch, fedsearch.stu.patch, fedsearch.stu.patch
>
>
> Motivated by http://wiki.apache.org/solr/FederatedSearch
> "Index view consistency between multiple requests" requirement is relaxed in
> this implementation.
> Does the federated search query side. Update not yet done.
> Tries to achieve:-
> ------------------------
> - Client applications are totally agnostic to federated search. The federated
> search and merging of results happen entirely behind the scenes in Solr's
> request handler, and the response format remains the same after merging.
> The response from each shard is deserialized into a SolrQueryResponse object,
> and the collection of SolrQueryResponse objects is merged to produce a single
> SolrQueryResponse object. This allows the existing response writers to be
> used as-is, or with minimal change.
> - Efficient query processing: highlighting and document fields are generated
> only for the merged documents. The query is executed in 2 phases. The first
> phase fetches only the documents' unique keys and sort criteria; the second
> phase fetches all requested fields and highlighting information. This saves a
> lot of CPU when there is a good number of shards and highlighting info is
> requested.
> It should be easy to customize the query execution. For example, a user can
> specify that the query be executed in just 1 phase. (When highlighting is not
> required and only a few fields are requested, this can be more efficient.)
> - Ability to easily override the default federated capability via appropriate
> plugins and request parameters. As federated search is performed by the
> RequestHandler itself, multiple request handlers can easily be pre-configured
> with different federated search settings in solrconfig.xml.
> - Global weight calculation is done by querying the terms' doc frequencies
> from all shards and summing them (see the sketch after this list).
> - Federated search works over HTTP transport, so an individual shard's VIP can
> be queried. Load balancing and fail-over are taken care of by the VIP as
> usual.
> - Sub-searcher response parsing is a plugin interface. Different
> implementations could be written based on JSON, XML SAX, etc.; the current
> one is based on XML DOM.
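> A rough sketch of that stat summing (illustrative only -- GlobalStats and the
> method below are made-up names, not the patch's actual classes):
> {code}
> import java.util.HashMap;
> import java.util.List;
> import java.util.Map;
>
> /** Illustrative only; not part of the patch. */
> class GlobalStats {
>   long numDocs;                              // numDocs summed over all shards
>   Map<String, Integer> docFreqs = new HashMap<String, Integer>(); // term -> summed df
>
>   /** Sum the numDocs and per-term docFreqs reported by each shard. */
>   static GlobalStats merge(List<Long> shardNumDocs,
>                            List<Map<String, Integer>> shardDocFreqs) {
>     GlobalStats global = new GlobalStats();
>     for (long n : shardNumDocs) {
>       global.numDocs += n;
>     }
>     for (Map<String, Integer> dfs : shardDocFreqs) {
>       for (Map.Entry<String, Integer> e : dfs.entrySet()) {
>         Integer df = global.docFreqs.get(e.getKey());
>         global.docFreqs.put(e.getKey(), df == null ? e.getValue() : df + e.getValue());
>       }
>     }
>     return global;
>   }
> }
> {code}
> With every shard scoring against the same global values, idf stays consistent
> regardless of how documents are distributed across shards.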
> HOW:
> -------
> A new RequestHandler called MultiSearchRequestHandler does the federated
> search over multiple sub-searchers (referred to as "shards" going forward).
> It extends RequestHandlerBase. The handleRequestBody method in
> RequestHandlerBase has been split into query-building and execute methods, in
> order to calculate global numDocs and docFreqs and to execute the query
> efficiently on multiple shards.
> All "search" request handlers are expected to extend the
> MultiSearchRequestHandler class in order to enable federated capability for
> the handler. StandardRequestHandler and DisMaxRequestHandler have been
> changed to extend this class.
>
> The federated search kicks in if "shards" is present in the request
> parameters; otherwise the search is performed as usual on the local index.
> E.g. shards=local,host1:port1,host2:port2 searches the local index and 2
> remote indexes. The search responses from all 3 shards are merged and served
> back to the client.
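> A full request URL might look like this (host, port and query are
> illustrative):
> {noformat}
> http://localhost:8983/solr/select?q=solr&shards=local,host1:port1,host2:port2
> {noformat}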
> The search request processing on the set of shards is performed as follows:
> STEP 1: The query is built, terms are extracted. Global numDocs and docFreqs
> are calculated by requesting all the shards and adding up numDocs and
> docFreqs from each shard.
> STEP 2: (FirstQueryPhase) All shards are queried. Global numDocs and docFreqs
> are passed as request parameters. All document fields are NOT requested, only
> document uniqFields and sort fields are requested. MoreLikeThis and
> Highlighting information are NOT requested.
> STEP 3: Responses from FirstQueryPhase are merged based on the "sort", "start"
> and "rows" params. The merged doc uniqFields and sort fields are collected
> (a rough sketch of this merge appears after these steps). Other information
> such as facet and debug info is also merged.
> STEP 4: (SecondQueryPhase) The merged doc uniqFields and sort fields are
> grouped by shard. Each shard in the grouping is queried for its merged doc
> uniqFields (from FirstQueryPhase), highlighting and moreLikeThis info.
> STEP 5: Responses from all shards from SecondQueryPhase are merged.
> STEP 6: Document fields, highlighting and moreLikeThis info from
> SecondQueryPhase are merged into the FirstQueryPhase response.
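> A rough sketch of the STEP 3 merge for the default score sort (illustrative
> only -- MergedDoc and Merger are made-up names, not the patch's actual code):
> {code}
> import java.util.ArrayList;
> import java.util.Collections;
> import java.util.Comparator;
> import java.util.List;
>
> /** One phase-one hit: just enough to merge and to query the owning shard later. */
> class MergedDoc {
>   String shard;    // shard the doc came from, needed for SecondQueryPhase
>   String uniqKey;  // value of the schema's uniqueKey field
>   float score;     // sort value; the current patch sorts by score only
> }
>
> class Merger {
>   /** Merge per-shard, score-sorted hits and keep the global [start, start+rows) window. */
>   static List<MergedDoc> merge(List<List<MergedDoc>> shardHits, int start, int rows) {
>     List<MergedDoc> all = new ArrayList<MergedDoc>();
>     for (List<MergedDoc> hits : shardHits) {
>       all.addAll(hits);
>     }
>     Collections.sort(all, new Comparator<MergedDoc>() {
>       public int compare(MergedDoc a, MergedDoc b) {
>         return Float.compare(b.score, a.score);  // descending score
>       }
>     });
>     int from = Math.min(start, all.size());
>     int to = Math.min(start + rows, all.size());
>     return all.subList(from, to);                // the page to fetch in SecondQueryPhase
>   }
> }
> {code}
> The docs that survive this window are then grouped by their shard for STEP 4.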
> TODO:
> - Support sort fields other than the default score
> - Support ResponseDocs in writers other than XMLWriter
> - Http connection timeouts
> OPEN ISSUES:
> - Merging of facets by "top n terms of field f"
> Scope for performance optimization:
> - Search shards in parallel threads
> - Http connection keep-alive?
> - Cache global numDocs and docFreqs
> - Cache Query objects in handlers??
> Would appreciate feedback on my approach. I understand that there are
> probably a lot of things I have overlooked.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.