That makes sense. I would prefer to just merge the custom analytics, but sending that much info via the Solr response seems very slow. However, I still can't figure out how to access the custom analytics in a doc transformer. That would provide the fastest response, but I would have to merge the IDs myself. I think I have only two paths: one appears to be too slow, and the other just throws exceptions.

The slow approach:
- The delegating collector computes the analytics for each collected doc: { docId, { ... }}
- From the finish() method it places that map (which could hold a million or more elements) on the Solr response: (response builder).rsp.add("customStats", obj)
- The merge strategy gets the analytics from each shard response, merges them only for the docs returned to the caller, then adds them to the Solr query response (the size is now thousands, not millions).

This would work, but it's really slow. Is that because of putting the analytics on the Solr response for the merge strategy to pick up?

The broken approach (only works for a single shard):
- The delegating collector computes the analytics for each collected doc (exactly the same as above)
- From the finish() method it places that map (which could hold a million or more elements) on the Solr query request: (response builder).req.getContext().put("customStats", obj)
- The doc transformer reads the analytics and adds a field to the doc containing the stats for that one field (the analytics are injected into the returned doc)
- The merge strategy combines the analytics of duplicate docs.

When the doc transformer first tries to read the analytics for the second shard, it throws exceptions. So either this approach is not possible, or my implementation is flawed. You may not be able to determine anything from a small code snippet, but this is my transform method:

    public void transform(SolrDocument doc, int id) throws IOException {
        if (super.context != null) {
            HashMap stats = (HashMap) super.context.req.getContext().get("CustomAnalytics");
            HashMap fieldStats = stats.get(id);
            if (fieldStats != null) {
                doc.setField(field, fieldStats.print());
            }
        }
    }

Any idea why the latter approach is not working?

Joel Bernstein wrote
> The mergeIds() method should be true if you are handling the merge of the
> documents from the shards. If you are merging custom analytics from an
> AnalyticsQuery only then you would return false. In your case, since you
> are de-duping documents you would need to return true.

--
View this message in context: http://lucene.472066.n3.nabble.com/Can-a-MergeStrategy-filter-returned-docs-tp4290446p4290799.html
Sent from the Solr - User mailing list archive at Nabble.com.
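
To make the merge step of the slow approach concrete (merging analytics only for the docs returned to the caller), here is a minimal sketch in plain Java with no Solr classes involved. Everything in it is an assumption for illustration: the class and method names are hypothetical, docs are keyed by a String unique key rather than an internal doc id, each doc's stats are modeled as a name -> count map, and "merging" is taken to mean summing counts. The real analytics objects and merge rule may differ.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Solr. Stats are modeled as name -> count (an assumption).
final class ShardStatsMerge {

    // Merge per-shard stats, but only for the doc keys that are actually returned.
    // Counts for a doc that appears on several shards are summed (assumed merge rule).
    static Map<String, Map<String, Long>> mergeForReturnedDocs(
            List<Map<String, Map<String, Long>>> shardStats,
            List<String> returnedIds) {
        Map<String, Map<String, Long>> merged = new LinkedHashMap<>();
        for (String id : returnedIds) {
            for (Map<String, Map<String, Long>> shard : shardStats) {
                Map<String, Long> stats = shard.get(id);
                if (stats == null) {
                    continue; // this shard did not collect the doc
                }
                Map<String, Long> target =
                        merged.computeIfAbsent(id, k -> new LinkedHashMap<>());
                for (Map.Entry<String, Long> e : stats.entrySet()) {
                    target.merge(e.getKey(), e.getValue(), Long::sum);
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Long>> shard1 = Map.of(
                "doc1", Map.of("hits", 2L),
                "doc2", Map.of("hits", 5L));
        Map<String, Map<String, Long>> shard2 = Map.of(
                "doc1", Map.of("hits", 3L),
                "doc3", Map.of("hits", 7L));
        // Only doc1 and doc3 are returned to the caller; doc2 is never touched.
        System.out.println(
                mergeForReturnedDocs(List.of(shard1, shard2), List.of("doc1", "doc3")));
    }
}
```

The loop is driven by the returned IDs rather than by the shard maps, so the work done at merge time scales with the thousands of returned docs and not with the millions of collected ones. It does not reduce the cost of shipping the full per-shard maps in the first place, which is the part suspected of being slow above.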