I think it's related to this: https://github.com/elasticsearch/elasticsearch/pull/8270, which I
believe was released with 1.4.
We see the same thing, with hot spots on some nodes. You can poke the
cluster to rebalance itself (which #8270 fixes permanently) with a
curl -XPOST request.
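An empty reroute request is what nudges the allocator; something like this (host and port are assumptions about your setup):

  curl -XPOST 'http://localhost:9200/_cluster/reroute' -d '{}'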
I have dedicated client nodes for some really intense queries and
aggregations. Our client nodes typically have 2GB of heap, and in our
experience that's sufficient; the client node doesn't do a whole lot. The bulk
of the work is done on the data nodes.
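For what it's worth, a client-only node in 1.x is just a node with both roles turned off in elasticsearch.yml, with the heap set through the environment (a sketch; file locations vary by install):

  # elasticsearch.yml -- coordinating-only client node
  node.master: false
  node.data: false

  # e.g. in /etc/default/elasticsearch or your init script
  ES_HEAP_SIZE=2g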
cheers
mike
A few things I can think of to look at. During this high CPU load, what are
the following (see the sketch after this list for a quick way to pull most of them):
- search rate
- index rate
- GC status (old and young) both number and duration
- IOPS
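Most of those numbers come from the node stats API (host and port assumed); search and index rates are deltas of the *_total counters between two samples:

  curl -XGET 'http://localhost:9200/_nodes/stats/indices,jvm,fs?pretty'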
Are these nodes VMs? If so, is there something else running on the other
VMs?
That CPU load doesn't look too bad.
I strongly recommend Marvel (and I don't work for elasticsearch); it's
quite detailed and you can get insight into exactly what elasticsearch is
doing. The only thing it doesn't have full visibility into is the detailed
GC stats; for those you'll have to enable GC logging and use a GC viewer to
analyze the logs.
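Enabling GC logging is just the standard HotSpot flags added to the JVM options (the log path here is an assumption):

  -verbose:gc
  -XX:+PrintGCDetails
  -XX:+PrintGCDateStamps
  -Xloggc:/var/log/elasticsearch/gc.log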
We use Nagios for alerting. I was originally using the nsca output plugin
for logstash, but found that it took close to a second to execute the
command-line nsca client, and if we got flooded with alert messages,
logstash would fall behind. I've since switched to the http output and
send the alerts over HTTP instead.
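Roughly what the switch looks like; the URL is made up and the exact options depend on your logstash version and receiver:

  output {
    http {
      url         => "http://nagios.example.com/alerts"
      http_method => "post"
      format      => "json"
    }
  }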
Try setting indices.recovery.max_bytes_per_sec much higher for faster
recovery. The default is 20mb/s, and there's a bug in versions prior to 1.2
that rate-limits to even lower than that. You didn't specify how big your
indices are, but I can fairly accurately predict how long it'll take for
them to recover at those rates.
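It's a dynamic setting, so you can bump it at runtime through the cluster settings API (the value here is just an example):

  curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
    "transient": { "indices.recovery.max_bytes_per_sec": "100mb" }
  }'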
Removing the -XX:+UseCMSInitiatingOccupancyOnly flag extended the time it
took before the JVM started full GCs from about 2 hours to 7 hours in my
cluster, but now it's back to constant full GCs. I'm out of ideas.
Suggestions?
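For reference, the CMS flags I'm tweaking are the ones the stock bin/elasticsearch.in.sh sets in 1.x (quoting from memory, so treat the exact set as approximate):

  JAVA_OPTS="$JAVA_OPTS -XX:+UseParNewGC"
  JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC"
  JAVA_OPTS="$JAVA_OPTS -XX:CMSInitiatingOccupancyFraction=75"
  JAVA_OPTS="$JAVA_OPTS -XX:+UseCMSInitiatingOccupancyOnly"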
mike
On Monday, June 23, 2014 10:25:20 AM UTC-4, Michael Hart wrote:
I'm running into a lot of issues with large heaps of >= 8GB and full GCs,
as are a lot of others on this forum. Everything from Oracle/Sun indicates
that the G1 garbage collector is supposed to deal with large heaps better,
or at least give more consistency in terms of GC pauses, than the CMS
collector.
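Switching a 1.x node over to G1 to test this means replacing the CMS flags in bin/elasticsearch.in.sh, since the two collectors can't be enabled together (a sketch, not an endorsement):

  # in bin/elasticsearch.in.sh, replace the UseParNewGC / UseConcMarkSweepGC /
  # CMSInitiatingOccupancyFraction / UseCMSInitiatingOccupancyOnly lines with:
  JAVA_OPTS="$JAVA_OPTS -XX:+UseG1GC"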