Michael, Dustin: what should reduce the query time a lot is setting
`collect_mode` to `breadth_first` on the `top-fingerprints` agg. Like this:
GET /_search?search_type=count
{
  "aggs": {
    "top-fingerprints": {
      "terms": {
        "field": "fingerprint",
        "size": 50,
        "collect_mode": "breadth_first"
Michael: I'd expect that setting the `size` option on the terms agg
to a smaller value would have a positive impact on the total query time.
Feels like I'm missing something. Can you run the hot threads API (
Martijn,
Thanks for thinking about this. I tried changing the `size` on the terms agg
to 1, 5, 10, 25, and 50, and the timing didn't change much. Interestingly, I
also set the size to 0, which in turn took down our cluster. I tried removing
the `_source` option and that didn't have any noticeable effect on
I'm curious what the underlying algorithm is for TopHits.
My mental model for ordinary aggregations is that there's basically a hash
table of (field_value -> count) maintained for each field being
aggregated, that the count in that hash table is incremented once per
document, and then the top K
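
For readers following along, here is a minimal sketch of the mental model described above, written in Python. It is only an illustration of "one hash table per field, one increment per document, then pick the top K"; it is not how Elasticsearch implements the terms aggregation, and the function name `top_k_terms` and the sample documents are made up for the example. Selecting the K most frequent values at the end is an assumption about where the sentence above was heading.

```python
import heapq
from collections import Counter


def top_k_terms(docs, field, k):
    """Count the values of one field across docs and return the k most frequent."""
    counts = Counter()  # the (field_value -> count) hash table
    for doc in docs:
        value = doc.get(field)
        if value is not None:
            counts[value] += 1  # incremented once per document
    # Pick the k largest counts; ties are broken by term so the result is stable.
    return heapq.nsmallest(k, counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical sample data, only for illustration.
docs = [
    {"fingerprint": "a"},
    {"fingerprint": "b"},
    {"fingerprint": "a"},
    {"fingerprint": "c"},
    {"fingerprint": "b"},
    {"fingerprint": "a"},
]
top = top_k_terms(docs, "fingerprint", 2)
print(top)
```

In this model the counting pass costs the same regardless of K; only the final selection depends on it.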
Can you share the query and example results please?
--
Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer Consultant
Author of RavenDB in Action http://manning.com/synhershko/
On Tue, Jan 6, 2015 at 10:11 PM, Michael Irani
Sure. I simplified the query to keep things focused.
This query takes about 3 seconds to run:
{
  "size": 0,
  "aggs": {
    "top-fingerprints": {
      "terms": {
        "field": "fingerprint",
        "size": 50
      },
      "aggs": {
Hi Michael,
In general, the more buckets returned by the parent aggregator that
top_hits is nested in, the more work the top_hits agg needs to do. But I
haven't come across performance issues with `size` on the terms agg set to
50, with the time it takes to execute increasing 30 times when
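
As a rough illustration of why work grows with the number of parent buckets, the sketch below keeps one small bounded heap of best-scoring hits per bucket. This is an assumed model for discussion, not Elasticsearch's actual top_hits code; `top_hits_per_bucket`, the `score` and `id` keys, and the sample documents are all hypothetical.

```python
import heapq
from collections import defaultdict


def top_hits_per_bucket(docs, bucket_field, hits_per_bucket):
    """Keep the best-scoring hits for every bucket and count heap updates."""
    heaps = defaultdict(list)  # bucket key -> min-heap of (score, doc_id)
    heap_ops = 0
    for doc in docs:
        heap = heaps[doc[bucket_field]]
        entry = (doc["score"], doc["id"])
        if len(heap) < hits_per_bucket:
            heapq.heappush(heap, entry)  # bucket still has room
            heap_ops += 1
        elif entry > heap[0]:
            heapq.heapreplace(heap, entry)  # evict the current worst hit
            heap_ops += 1
    # Best hit first within each bucket.
    hits = {key: sorted(heap, reverse=True) for key, heap in heaps.items()}
    return hits, heap_ops


# Hypothetical sample data, only for illustration.
docs = [
    {"id": 1, "fingerprint": "a", "score": 0.5},
    {"id": 2, "fingerprint": "a", "score": 0.9},
    {"id": 3, "fingerprint": "b", "score": 0.2},
    {"id": 4, "fingerprint": "a", "score": 0.7},
]
hits, heap_ops = top_hits_per_bucket(docs, "fingerprint", 2)
print(hits, heap_ops)
```

Under this model, every distinct bucket carries its own heap, so memory and bookkeeping scale with the number of buckets being tracked rather than only with the number of buckets finally returned.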