That makes sense. I would prefer to just merge the custom analytics, but
sending that much data via the Solr response seems very slow. However, I
still can't figure out how to access the custom analytics in a doc
transformer. That would provide the fastest response, but I would have to
merge the IDs myself. I think I have only two paths: one appears to be too
slow, and the other just throws exceptions.

The slow approach:
- The delegating collector computes the analytics for each collected doc:
{ docId, { ... } }
- From the finish() method it places that map (which could hold a million
or more elements) on the Solr response:
(response builder).rsp.add("customStats", obj)
- The merge strategy gets the analytics from each shard response, merges
them only for the docs returned to the caller, then adds them to the Solr
query response (the size is now thousands, not millions).
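
The merge step in the last bullet could be sketched roughly as below. This is plain Java with no Solr classes, and everything in it is an assumption made for illustration: the class and method names are hypothetical, the stats are modelled as name-to-count maps, duplicates are combined by summing, and entries are keyed by the unique key rather than the internal docId (Lucene docIds are only meaningful within one index, so they would not line up across shards).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Solr: merges per-shard stats maps,
// keeping only the docs that are actually returned to the caller.
class ShardStatsMerge {

    static Map<String, Map<String, Long>> mergeForReturnedDocs(
            List<Map<String, Map<String, Long>>> shardStats,
            Set<String> returnedKeys) {
        Map<String, Map<String, Long>> merged = new HashMap<>();
        for (Map<String, Map<String, Long>> oneShard : shardStats) {
            // Iterate over the (small) returned set, not the (huge) shard map.
            for (String key : returnedKeys) {
                Map<String, Long> stats = oneShard.get(key);
                if (stats == null) {
                    continue; // this shard has nothing for that doc
                }
                Map<String, Long> target =
                        merged.computeIfAbsent(key, k -> new HashMap<>());
                // Assumption: duplicate docs are combined by summing counters.
                stats.forEach((name, value) -> target.merge(name, value, Long::sum));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Long>> shardA = new HashMap<>();
        shardA.put("doc1", new HashMap<>(Collections.singletonMap("hits", 2L)));
        shardA.put("doc2", new HashMap<>(Collections.singletonMap("hits", 5L)));
        Map<String, Map<String, Long>> shardB = new HashMap<>();
        shardB.put("doc1", new HashMap<>(Collections.singletonMap("hits", 3L)));

        System.out.println(mergeForReturnedDocs(
                Arrays.asList(shardA, shardB), Collections.singleton("doc1")));
    }
}
```

The merge itself only touches the returned docs, so in this sketch the cost that remains is getting each shard's full map onto the response in the first place.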

This would work, but it's really slow. Is the slowness caused by putting the
analytics on the Solr response for the merge strategy to pick up?

The broken approach (only works for a single shard):
- The delegating collector computes the analytics for each collected doc
(exactly the same as above)
- From the finish() method it places that map (which could hold a million
or more elements) on the Solr query request:
(response builder).req.getContext().put("customStats", obj)
- The doc transformer reads the analytics and adds a field to the doc
containing the stats for that one field (the analytics are injected into the
returned doc)
- The merge strategy combines the analytics of duplicate docs.

When the doc transformer first tries to read the analytics for the second
shard, it throws exceptions. So either this approach is not possible, or my
implementation is flawed. You may not be able to determine anything from a
small code snippet, but this is my transform method:

public void transform(SolrDocument doc, int id) throws IOException {
    if (super.context != null) {
        HashMap stats = (HashMap) super.context.req.getContext().get("CustomAnalytics");

        HashMap fieldStats = stats.get(id);
        if (fieldStats != null) {
            doc.setField(field, fieldStats.print());
        }
    }
}

Any idea why the latter approach is not working?
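
One thing that can be checked independently of Solr is whether the lookup itself is safe. The sketch below is plain Java and purely illustrative: StatsLookup and its methods are hypothetical names, the context is modelled as the Map<Object, Object> that the request context exposes, and the per-doc stats are simplified to strings. It returns nothing instead of throwing when the entry is missing from the context.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of Solr: a null-safe read of a stats map
// that some earlier step may or may not have put on the request context.
class StatsLookup {

    static Optional<Map<Integer, String>> statsFrom(Map<Object, Object> context, String key) {
        if (context == null) {
            return Optional.empty();
        }
        Object value = context.get(key);
        if (!(value instanceof Map)) {
            return Optional.empty(); // key absent, or not the expected type
        }
        @SuppressWarnings("unchecked")
        Map<Integer, String> stats = (Map<Integer, String>) value;
        return Optional.of(stats);
    }

    // Returns the stats for one doc, or null when nothing is available.
    static String statsForDoc(Map<Object, Object> context, String key, int docId) {
        return statsFrom(context, key).map(m -> m.get(docId)).orElse(null);
    }

    public static void main(String[] args) {
        Map<Integer, String> stats = new HashMap<>();
        stats.put(7, "some stats");
        Map<Object, Object> context = new HashMap<>();
        context.put("customStats", stats);

        System.out.println(statsForDoc(context, "customStats", 7));
        System.out.println(statsForDoc(context, "CustomAnalytics", 7));
    }
}
```

Note that the snippets in this message store the map under "customStats" but read it back under "CustomAnalytics". If the real code does the same, the lookup returns null and stats.get(id) throws a NullPointerException; the same happens whenever the request that runs the transformer is not the request whose context the collector wrote to.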

Joel Bernstein wrote
> The mergeIds() method should be true if you are handling the merge of the
> documents from the shards. If you are merging custom analytics from an
> AnalyticsQuery only then you would return false. In your case, since you
> are de-duping documents you would need to return true.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-a-MergeStrategy-filter-returned-docs-tp4290446p4290799.html
Sent from the Solr - User mailing list archive at Nabble.com.
