Hello Solr users :)

Right now it seems that if I want to roll up on two different fields
with streaming expressions, I need to make two separate requests.
This is too slow for our use case, where we need to do joins before
sorting and rolling up, because each request would have to redo the
joins.

Since in our case we are actually looking for approximate facets
(top N), the best solution we could come up with is to implement a
new stream decorator that uses an algorithm like Count-min sketch[1],
which would run on the tuples provided by the stream function it
wraps. This would have two big wins for us:
1) it would do the facet without needing to sort on the facet field,
so we'll potentially save lots of memory
2) because sorting isn't needed, we could do multiple facets in one go
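
To make the idea concrete, here is a minimal sketch in plain Java of
what such a decorator could do internally. It is not tied to the Solr
streaming API: the class name CountMinFacets, its methods, the hash
mixing and the eviction policy are all made up for illustration, and
tuples are modeled as plain maps. It keeps one count-min table per
facet field and a small top-N candidate map, so several fields are
counted in a single pass over unsorted tuples.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: approximate top-N counts for several fields in one pass. */
final class CountMinFacets {
    private final int depth;   // number of hash rows per sketch
    private final int width;   // counters per row
    private final int topN;    // candidates kept per field
    private final Map<String, long[][]> sketches = new HashMap<>();
    private final Map<String, Map<String, Long>> candidates = new HashMap<>();

    CountMinFacets(int depth, int width, int topN, String... fields) {
        this.depth = depth;
        this.width = width;
        this.topN = topN;
        for (String field : fields) {
            sketches.put(field, new long[depth][width]);
            candidates.put(field, new HashMap<>());
        }
    }

    // Illustrative hash mixing only; a real implementation would likely
    // use pairwise-independent hash functions.
    private int bucket(String value, int row) {
        int h = value.hashCode() ^ (row * 0x9E3779B9);
        h ^= (h >>> 16);
        h *= 0x85EBCA6B;
        h ^= (h >>> 13);
        return Math.floorMod(h, width);
    }

    /** Counts one tuple for every configured field it contains. No sorting needed. */
    void add(Map<String, ?> tuple) {
        for (Map.Entry<String, long[][]> e : sketches.entrySet()) {
            Object raw = tuple.get(e.getKey());
            if (raw == null) continue;
            String value = raw.toString();
            long estimate = Long.MAX_VALUE;
            for (int row = 0; row < depth; row++) {
                long c = ++e.getValue()[row][bucket(value, row)];
                estimate = Math.min(estimate, c);
            }
            // Track the value as a top-N candidate; evict the smallest if over capacity.
            Map<String, Long> top = candidates.get(e.getKey());
            top.put(value, estimate);
            if (top.size() > topN) {
                top.remove(Collections.min(top.entrySet(), Map.Entry.comparingByValue()).getKey());
            }
        }
    }

    /** Estimated count: the minimum over all rows, never below the true count. */
    long estimate(String field, String value) {
        long[][] table = sketches.get(field);
        long min = Long.MAX_VALUE;
        for (int row = 0; row < depth; row++) {
            min = Math.min(min, table[row][bucket(value, row)]);
        }
        return min;
    }

    /** Current top-N candidates for a field, highest estimate first. */
    List<Map.Entry<String, Long>> top(String field) {
        List<Map.Entry<String, Long>> out = new ArrayList<>(candidates.get(field).entrySet());
        out.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return out;
    }
}
```

Memory here is depth * width counters per field plus topN candidates,
independent of how many tuples flow through. The counts can only be
overestimated, never underestimated, which is fine for our
approximate top-N use case.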

That said, I have two (broad) questions:
A) Is there a better way of doing this? Reduced to its core, the
problem is a streaming aggregation: data from multiple collections
needs to be joined, and we then facet on fields from all of those
collections. Maybe there's a better algorithm, or something that is
available out of the box or close to it?
B) Whatever the best way is, could we do it in a way that can be
contributed back to Solr? Any hints on how to do that? Just another
decorator?

Thanks and best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

[1] https://en.wikipedia.org/wiki/Count%E2%80%93min_sketch
