Hi Jörg,
This query
{
  "query": {
    "bool": {
      "must": [
        { "match": { "body": "big" } },
        { "match": { "id": 521 } }
      ],
      "must_not": {
        "match": { "body": "data" }
      }
    }
  }
}
and this query are
I ran into the same issue when using Integer.MAX_VALUE as the size
parameter (migrating from a DB-based search). Perhaps someone can come up
with a proper reference (I cannot), but according to a comment in this SO
Exactly. Filters do not use scores. They also use bit sets, which makes them
reusable and fast.
I wasn't talking about a filter added to a query; I meant filtered queries.
That is a huge difference.
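To make the distinction concrete, here is a sketch in the ES 1.x `filtered` query syntax, reusing the `body` and `id` fields from the query above (the exact values are just placeholders):

```json
{
  "query": {
    "filtered": {
      "query":  { "match": { "body": "big" } },
      "filter": { "term":  { "id": 521 } }
    }
  }
}
```

The `filter` part is not scored, and its bit set can be cached and reused across requests, whereas the same clause written as a query clause would be scored on every execution.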
Hi Ivan,
Thanks for the input about aggregating on strings; I do that, and those
queries take time, but they do not crash the node.
The queries that caused problems were pretty straightforward (such
as a boolean query with two musts: one an exact match and the other a
range on a long field).
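For reference, a minimal sketch of that kind of query might look like this (the `status` and `timestamp` field names and the millisecond values are hypothetical, ES 1.x syntax):

```json
{
  "query": {
    "bool": {
      "must": [
        { "match": { "status": "active" } },
        { "range": { "timestamp": { "gte": 1400000000000, "lt": 1400003600000 } } }
      ]
    }
  }
}
```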
Before firing queries, you should consider whether the index design and query
choice are optimal.
Numeric range queries are not straightforward. They were a major issue in
inverted-index engines like Lucene/Elasticsearch, and it has taken some time
to introduce efficient implementations. See e.g.
When I kept size as Integer.MAX_VALUE, it caused all the problems
Are you trying to return up to 2 billion documents at once? Even if that
number was only 1 million, you will face problems. Or did I perhaps
misunderstand you?
Are you sorting the documents based on the score (the default)?
I am not returning 2 billion documents :)
I am returning all documents that match. The actual number can be anywhere
between 0 and 50k. I am just fetching documents within a given time
interval, such as one hour or one day, and then batch-processing them.
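One common way to pull such a time slice in batches without a huge `size` is the scan/scroll API available in ES 1.x, e.g. `POST /myindex/_search?search_type=scan&scroll=5m` (the index name and `timestamp` field are hypothetical) with a body like:

```json
{
  "size": 500,
  "query": {
    "range": {
      "timestamp": {
        "gte": "2014-05-01T00:00:00",
        "lt":  "2014-05-01T01:00:00"
      }
    }
  }
}
```

Subsequent calls to `_search/scroll` with the returned `_scroll_id` then stream the remaining hits batch by batch; note that in scan mode `size` applies per shard.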
I fixed this by making 2 queries,
I have a cluster of 240 GB including replicas, and it has 5 nodes.
I allocated 5 GB of RAM to each node (5 × 5 GB total) and started the cluster.
When I continuously fire queries at the cluster, GC starts
kicking in, and eventually a node goes down because of an OutOfMemory error.