Re: Marvel creating disk usage imbalance

2014-11-11 Thread Michael Hart
I think it's related to this: https://github.com/elasticsearch/elasticsearch/pull/8270, which I believe was released with 1.4. We see the same thing, with hot spots on some nodes. You can poke the cluster into rebalancing itself (the underlying problem is what #8270 fixes permanently) using curl -XPOST
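The command is cut off above, so purely as a sketch of the kind of "poke" that triggers rebalancing (localhost:9200 is an assumed endpoint), an empty reroute request makes the master run an allocation pass:

    # no commands in the body; posting to _cluster/reroute still
    # triggers the shard allocation/rebalancing logic
    curl -XPOST 'http://localhost:9200/_cluster/reroute'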

Re: hardware recommendation for dedicated client node

2014-11-11 Thread Michael Hart
I have dedicated client nodes for some really intense queries and aggregations. Our client nodes typically run with 2GB of heap, which in our experience is sufficient; the client node doesn't do a whole lot, since the bulk of the work is done on the data nodes. cheers mike On Monday, November 10,
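For reference, a client-only node in 1.x is just a node with both roles disabled; a sketch of the relevant elasticsearch.yml lines:

    # neither master-eligible nor data-holding, so the node
    # only routes requests and merges results from data nodes
    node.master: false
    node.data: false

The 2GB heap would then be set with ES_HEAP_SIZE=2g in the node's environment, the usual mechanism for the 1.x scripts.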

Re: elasticsearch high cpu usage

2014-07-03 Thread Michael Hart
A few things I can think of to look at. During this high CPU load, what are the:
- search rate
- index rate
- GC activity (old and young generations), both collection count and duration
- IOPS
Are these nodes VMs? If so, is something else running on the other VMs? That CPU load doesn't look too bad
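Most of those numbers can be sampled from the node stats API (a sketch; localhost:9200 is an assumed endpoint, and the search/index rates come from diffing the counters between two samples):

    # indices stats carry search/index counters; jvm stats carry
    # old/young GC collection counts and cumulative durations
    curl -XGET 'http://localhost:9200/_nodes/stats/indices,jvm?pretty'

    # IOPS aren't in node stats; sample them at the OS level instead
    iostat -x 5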

Re: Visibility

2014-07-03 Thread Michael Hart
I strongly recommend Marvel (and I don't work for elasticsearch); it's quite detailed, and you can get insight into exactly what elasticsearch is doing. The only thing it doesn't have full visibility into is the detailed GC stats; for those you'll have to enable GC logging and use a GC viewer to
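The GC logging itself is just standard HotSpot flags (a sketch; the log path, and passing them via ES_JAVA_OPTS, are assumptions about your setup):

    # write timestamped, detailed GC events to a log for later analysis
    export ES_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/elasticsearch/gc.log"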

Re: Alerting in ELK stack?

2014-06-25 Thread Michael Hart
We use Nagios for alerting. I was originally using the nsca output plugin for logstash, but found that it took close to a second to execute the command-line nsca client, and if we got flooded with alert messages, logstash would fall behind. I've since switched to the http output and send
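For illustration, an http output block looks roughly like this (a sketch; the receiver URL is hypothetical, and how that endpoint hands alerts to Nagios is up to your setup):

    output {
      http {
        # hypothetical endpoint that accepts alert events for Nagios
        url => "http://nagios.example.com:5668/alerts"
        http_method => "post"
        format => "json"
      }
    }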

Re: Stress Free Guide To Expanding a Cluster

2014-06-25 Thread Michael Hart
Try setting indices.recovery.max_bytes_per_sec much higher for faster recovery. The default is 20mb/s, and there's a bug in versions prior to 1.2 that rate-limits to even lower than that. You didn't specify how big your indices are, but I can fairly accurately predict how long it'll take for
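The setting is dynamic, so it can be raised with a transient cluster settings update (a sketch; 200mb is an illustrative value, not a recommendation from the post):

    curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
      "transient": { "indices.recovery.max_bytes_per_sec": "200mb" }
    }'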

Re: ES v1.1 continuous young gc pauses old gc, stops the world when old gc happens and splits cluster

2014-06-24 Thread Michael Hart
Removing the -XX:+UseCMSInitiatingOccupancyOnly flag extended the time it took before the JVM started full GCs from about 2 hours to 7 hours in my cluster, but now it's back to constant full GCs. I'm out of ideas. Suggestions? mike On Monday, June 23, 2014 10:25:20 AM UTC-4, Michael Hart
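For context, this is the CMS flag pair elasticsearch 1.x ships with (to the best of my knowledge of the stock bin/elasticsearch.in.sh; removing the second line is what's described above):

    # start a concurrent CMS cycle once old gen occupancy hits ~75%...
    JAVA_OPTS="$JAVA_OPTS -XX:CMSInitiatingOccupancyFraction=75"
    # ...and keep that threshold fixed instead of letting the JVM adapt it
    JAVA_OPTS="$JAVA_OPTS -XX:+UseCMSInitiatingOccupancyOnly"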

G1 Garbage Collector with Elasticsearch >= 1.1

2014-06-24 Thread Michael Hart
I'm running into a lot of issues with large heaps of >= 8GB and full GCs, as are a lot of others on this forum. Everything from Oracle/Sun indicates that the G1 garbage collector is supposed to deal with large heaps better, or at least give more consistency in terms of GC pauses, than the CMS
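Trying G1 amounts to swapping the CMS flags for the G1 ones (a sketch; the pause target is illustrative, G1 requires JDK 7u4 or later, and it wasn't the recommended collector for elasticsearch at the time):

    # in bin/elasticsearch.in.sh, drop the ParNew/CMS flags and add:
    JAVA_OPTS="$JAVA_OPTS -XX:+UseG1GC -XX:MaxGCPauseMillis=200"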