I have an index that uses one level of nested documents. When I run a query 
on it, the result comes back in about 20-200 milliseconds. When I add a 
facet or an aggregation involving the nested documents, the uncached 
response always takes 2-3 seconds, regardless of how many documents have 
been selected, even zero.

My mapping looks like this:

{
    "document": {
        "dynamic": "strict",
        "properties": {
            "account_id": {
                "type": "long"
            },
            "data": {
                "type": "nested",
                "properties": {
                    "key": {
                        "type": "string",
                        "index": "not_analyzed"
                    },
                    "string": {
                        "type": "string",
                        "index": "not_analyzed",
                        "fields": {
                            "token": {
                                "type": "string"
                            }
                        }
                    },
                    "integer": {
                        "type": "long"
                    },
                    "date": {
                        "type": "date",
                        "format": "dateOptionalTime"
                    }
                }
            }
        }
    }
}

There are 3.6 million documents in this index. My query looks like this:

{
    "query": {
        "bool":{
            "must":[
                {"term":{"account_id": 1}},
                {
                    "nested":{
                        "path":"data",
                        "query":{"term":{"key":"amount"}}
                    }
                }
            ]
        }
    }
}

The result of the above query is 0 documents because account_id 1 doesn't 
have any documents with a key of "amount". Uncached, this returns in about 
10-150ms:

{
    "took": 9,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 0,
        "max_score": null,
        "hits": []
    }
}

When I add an aggregation to the query:

{
    ...
    "aggs" : {
        "report" : {
            "nested" : {
                "path" : "data"
            },
            "aggs" : {
                "amount" : {
                    "filter" : {
                        "query": {"term": {"key":"amount"}}
                    },
                    "aggs": {
                        "sum": {
                            "sum" : { "field" : "integer" }
                        }
                    }
                }
            }
        }
    }
}
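
For anyone who wants to reproduce this, here is a rough sketch of how the full request body fits together. It assumes the "..." above stands for the "query" section shown earlier; the Python variable names are only for illustration.

```python
import json

# Query from the post: restrict to one account, then require a nested
# "data" entry whose key is "amount".
query = {
    "bool": {
        "must": [
            {"term": {"account_id": 1}},
            {"nested": {"path": "data", "query": {"term": {"key": "amount"}}}},
        ]
    }
}

# Aggregation from the post: step into the nested "data" documents,
# keep only those with key "amount", and sum their "integer" field.
aggs = {
    "report": {
        "nested": {"path": "data"},
        "aggs": {
            "amount": {
                "filter": {"query": {"term": {"key": "amount"}}},
                "aggs": {"sum": {"sum": {"field": "integer"}}},
            }
        },
    }
}

# Assumption: the elided "..." is the "query" key from the earlier request.
body = {"query": query, "aggs": aggs}
print(json.dumps(body, indent=4))
```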

Uncached, the query returns in about 2-3 seconds:

{
    "took": 2770,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 0,
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "report": {
            "doc_count": 0,
            "amount": {
                "doc_count": 0,
                "sum": {
                    "value": 0
                }
            }
        }
    }
}

If I run the same thing a second time (cached) it runs in 26 milliseconds. 
If I clear the cache and run it again it takes 2 seconds.
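
The cold versus warm comparison can be scripted roughly as follows. This is only a sketch: the host and the index name "myindex" are placeholders, and it assumes the index-level `_cache/clear` and `_search` endpoints are reachable over plain HTTP. It clears Elasticsearch's own caches only, not the operating system's file cache.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200/myindex"  # hypothetical host and index name


def post_json(url, payload=None):
    """POST an optional JSON payload and return the decoded JSON response."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else b""
    req = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def cold_and_warm_took(body, post=post_json, base_url=BASE_URL):
    """Clear the caches, run the same search twice, return both "took" values (ms)."""
    post(base_url + "/_cache/clear")
    cold = post(base_url + "/_search", body)["took"]  # first run after the clear
    warm = post(base_url + "/_search", body)["took"]  # second run, caches populated
    return cold, warm
```

Calling `cold_and_warm_took(body)` with the request body above returns the two "took" values as a pair, which makes it easy to repeat the measurement with and without the "aggs" section.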

Why is this aggregation always taking 2-3 seconds, even though the query is 
returning 0 documents? The same thing happens with a statistical facet.

-
Luke
