[
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536064
]
Stu Hood commented on SOLR-303:
-------------------------------
I'm still working on wrapping my head around the fedsearch phases, but I
noticed the following stacktrace showing up in the logs every now and then:
{noformat}
SEVERE: java.lang.NullPointerException
    at org.apache.solr.handler.federated.component.GlobalCollectionStatComponent.prepare(GlobalCollectionStatComponent.java:81)
    at org.apache.solr.handler.SearchHandler.handleRequestBody(SearchHandler.java:116)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:78)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:807)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:206)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)
{noformat}
... that is probably caused by the following statements around line 81 in
GlobalCollectionStatComponent.prepare. We only enter the if statement if terms
is null, and then we dereference it...
{code}
String terms = req.getParams().get(ResponseBuilder.DOCFREQS);
if (numDocs != null && terms == null) {
  // the build query has to be over-written to take into
  // account global numDocs and docFreqs
  // extract the numDocs and docFreqs from request params
  Map<Term, Integer> dfMap = new HashMap<Term, Integer>();
  String[] strTerms = terms.split(",");
{code}
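Presumably that check was meant to be terms != null -- I haven't looked at the
rest of prepare(), so this is just a guess at the intent:
{code}
String terms = req.getParams().get(ResponseBuilder.DOCFREQS);
if (numDocs != null && terms != null) {
  // the build query has to be over-written to take into
  // account global numDocs and docFreqs
  // extract the numDocs and docFreqs from request params
  Map<Term, Integer> dfMap = new HashMap<Term, Integer>();
  String[] strTerms = terms.split(",");
  // ...
}
{code}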
> Federated Search over HTTP
> --------------------------
>
> Key: SOLR-303
> URL: https://issues.apache.org/jira/browse/SOLR-303
> Project: Solr
> Issue Type: New Feature
> Components: search
> Reporter: Sharad Agarwal
> Priority: Minor
> Attachments: fedsearch.patch, fedsearch.patch, fedsearch.patch,
> fedsearch.patch, fedsearch.stu.patch, fedsearch.stu.patch
>
>
> Motivated by http://wiki.apache.org/solr/FederatedSearch
> "Index view consistency between multiple requests" requirement is relaxed in
> this implementation.
> Does the federated search query side. Update not yet done.
> Tries to achieve:-
> ------------------------
> - Client applications are totally agnostic to federated search. The federated
> search and merging of results happen entirely behind the scenes in Solr's
> request handler, and the response format remains the same after merging.
> The response from each shard is deserialized into a SolrQueryResponse object,
> and the collection of SolrQueryResponse objects is merged to produce a single
> SolrQueryResponse object. This allows the existing response writers to be
> used as-is, or with minimal change.
> - Efficient query processing: highlighting and document fields are generated
> only for the merged documents. The query is executed in 2 phases. The first
> phase fetches only the documents' unique keys and sort criteria; the second
> phase fetches all requested fields and highlighting information. This saves a
> lot of CPU when there is a good number of shards and highlighting info is
> requested.
> It should be easy to customize the query execution. For example, a user can
> specify that the query be executed in just 1 phase. (When highlighting is not
> required and only a few fields are requested, this can be more efficient.)
> - Ability to easily override the default federated capability via appropriate
> plugins and request parameters. As federated search is performed by the
> RequestHandler itself, multiple request handlers can easily be pre-configured
> with different federated search settings in solrconfig.xml.
> - Global weight calculation is done by querying the terms' doc frequencies
> from all shards and summing them (see the sketch after this list).
> - Federated search works over HTTP transport, so an individual shard's VIP can
> be queried. Load balancing and fail-over are taken care of by the VIP as
> usual.
> - Sub-searcher response parsing is a plugin interface. Different
> implementations could be written based on JSON, XML SAX, etc.; the current
> one is based on XML DOM.
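> A rough sketch of that stat summing (illustrative only -- GlobalStats and the
> method below are made-up names, not the patch's actual classes):
> {code}
> import java.util.HashMap;
> import java.util.List;
> import java.util.Map;
>
> /** Illustrative only; not part of the patch. */
> class GlobalStats {
>   long numDocs;                              // numDocs summed over all shards
>   Map<String, Integer> docFreqs = new HashMap<String, Integer>(); // term -> summed df
>
>   /** Sum the numDocs and per-term docFreqs reported by each shard. */
>   static GlobalStats merge(List<Long> shardNumDocs,
>                            List<Map<String, Integer>> shardDocFreqs) {
>     GlobalStats global = new GlobalStats();
>     for (long n : shardNumDocs) {
>       global.numDocs += n;
>     }
>     for (Map<String, Integer> dfs : shardDocFreqs) {
>       for (Map.Entry<String, Integer> e : dfs.entrySet()) {
>         Integer df = global.docFreqs.get(e.getKey());
>         global.docFreqs.put(e.getKey(), df == null ? e.getValue() : df + e.getValue());
>       }
>     }
>     return global;
>   }
> }
> {code}
> With every shard scoring against the same global values, idf stays consistent
> regardless of how documents are distributed across shards.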
> HOW:
> -------
> A new RequestHandler called MultiSearchRequestHandler does the federated
> search over multiple sub-searchers (referred to as "shards" going forward).
> It extends RequestHandlerBase. The handleRequestBody method in
> RequestHandlerBase has been split into query-building and execute methods, in
> order to calculate global numDocs and docFreqs and to execute the query
> efficiently on multiple shards.
> All "search" request handlers are expected to extend the
> MultiSearchRequestHandler class in order to enable federated capability for
> the handler. StandardRequestHandler and DisMaxRequestHandler have been
> changed to extend this class.
>
> The federated search kicks in if "shards" is present in the request
> parameters; otherwise the search is performed as usual on the local index.
> E.g. shards=local,host1:port1,host2:port2 searches the local index and 2
> remote indexes. The search responses from all 3 shards are merged and served
> back to the client.
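> A full request URL might look like this (host, port and query are
> illustrative):
> {noformat}
> http://localhost:8983/solr/select?q=solr&shards=local,host1:port1,host2:port2
> {noformat}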
> The search request processing on the set of shards is performed as follows:
> STEP 1: The query is built, terms are extracted. Global numDocs and docFreqs
> are calculated by requesting all the shards and adding up numDocs and
> docFreqs from each shard.
> STEP 2: (FirstQueryPhase) All shards are queried. Global numDocs and docFreqs
> are passed as request parameters. All document fields are NOT requested, only
> document uniqFields and sort fields are requested. MoreLikeThis and
> Highlighting information are NOT requested.
> STEP 3: Responses from FirstQueryPhase are merged based on the "sort", "start"
> and "rows" params. The merged doc uniqFields and sort fields are collected
> (a rough sketch of this merge appears after these steps). Other information
> such as facet and debug info is also merged.
> STEP 4: (SecondQueryPhase) The merged doc uniqFields and sort fields are
> grouped by shard. Each shard in the grouping is queried for its merged doc
> uniqFields (from FirstQueryPhase), highlighting and moreLikeThis info.
> STEP 5: Responses from all shards from SecondQueryPhase are merged.
> STEP 6: Document fields, highlighting and moreLikeThis info from
> SecondQueryPhase are merged into the FirstQueryPhase response.
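> A rough sketch of the STEP 3 merge for the default score sort (illustrative
> only -- MergedDoc and Merger are made-up names, not the patch's actual code):
> {code}
> import java.util.ArrayList;
> import java.util.Collections;
> import java.util.Comparator;
> import java.util.List;
>
> /** One phase-one hit: just enough to merge and to query the owning shard later. */
> class MergedDoc {
>   String shard;    // shard the doc came from, needed for SecondQueryPhase
>   String uniqKey;  // value of the schema's uniqueKey field
>   float score;     // sort value; the current patch sorts by score only
> }
>
> class Merger {
>   /** Merge per-shard, score-sorted hits and keep the global [start, start+rows) window. */
>   static List<MergedDoc> merge(List<List<MergedDoc>> shardHits, int start, int rows) {
>     List<MergedDoc> all = new ArrayList<MergedDoc>();
>     for (List<MergedDoc> hits : shardHits) {
>       all.addAll(hits);
>     }
>     Collections.sort(all, new Comparator<MergedDoc>() {
>       public int compare(MergedDoc a, MergedDoc b) {
>         return Float.compare(b.score, a.score);  // descending score
>       }
>     });
>     int from = Math.min(start, all.size());
>     int to = Math.min(start + rows, all.size());
>     return all.subList(from, to);                // the page to fetch in SecondQueryPhase
>   }
> }
> {code}
> The docs that survive this window are then grouped by their shard for STEP 4.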
> TODO:
> - Support sort fields other than the default score
> - Support ResponseDocs in writers other than XMLWriter
> - Http connection timeouts
> OPEN ISSUES:
> - Merging of facets by "top n terms of field f"
> Scope for performance optimization:
> - Search shards in parallel threads
> - Http connection keep-alive?
> - Cache global numDocs and docFreqs
> - Cache Query objects in handlers??
> Would appreciate feedback on my approach. I understand that there are
> probably a lot of things I have overlooked.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.