Re: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-27 Thread Lars Karlsson
…Markus > > -Original message- > > From: Mikhail Khludnev > > Sent: Wednesday 26th July 2017 10:50 > > To: solr-user > > Subject: Re: 6.6 cloud starting to eat CPU after 8+ hours > > > > I've looked into the stack trace. > > I see that one t…

RE: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-26 Thread Markus Jelsma
How can a bad node become normal just by restarting another bad node? Puzzling. Thanks, Markus -Original message- > From: Mikhail Khludnev > Sent: Wednesday 26th July 2017 10:50 > To: solr-user > Subject: Re: 6.6 cloud starting to eat CPU after 8+ hours > > I…

Re: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-26 Thread Mikhail Khludnev
…xed. Some queries are executed but not very much. Attaching the stack anyway. -Original message- > > From: Mikhail Khludnev > > Sent: Wednesday 19th July 2017 14:41 > > To: solr-user > > Subject: Re: 6.6 cloud starting to eat CPU af…

RE: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-20 Thread Markus Jelsma
…Re: 6.6 cloud starting to eat CPU after 8+ hours > > On 7/19/2017 3:35 AM, Markus Jelsma wrote: > > Another peculiarity here, our six-node (2 shards / 3 replicas) cluster is > > going crazy after a good part of the day has passed. It starts eating CPU > > for no good reason…

Re: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Erick Erickson
…indexed (3 - 4k). For some reason, index time does increase with latency / CPU usage. This situation runs fine for many hours, then it will slowly start to go bad, until nodes are restarted (or index size decreased)…

RE: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Markus Jelsma
…fresh start? Thanks, Markus -Original message- > From: Mikhail Khludnev > Sent: Wednesday 19th July 2017 14:41 > To: solr-user > Subject: Re: 6.6 cloud starting to eat CPU after 8+ hours > > You can get a stack from kill -3 / jstack, even from the Solr admin. Overall, this be…
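
A minimal sketch of capturing the thread dump discussed above, assuming a Linux host where the Solr JVM can be located with jps and jstack is on the PATH (the PID placeholder is hypothetical):

    # Find the Solr JVM's process id (the class/jar name varies per install)
    jps -l | grep -i solr

    # Write a full thread dump, including lock info, to a file
    jstack -l <solr-pid> > /tmp/solr-threads.txt

    # Alternative: SIGQUIT makes the JVM print the dump to its stdout/console log
    kill -3 <solr-pid>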

Re: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Mikhail Khludnev
…until nodes are restarted (or index size decreased). > > Thanks, > Markus > > -Original message- > > From: Mikhail Khludnev > > Sent: Wednesday 19th July 2017 14:18 > > To: solr-user > > Subject: Re: 6.6 cloud starting to eat CPU after 8+ hours…

RE: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Markus Jelsma
…many hours, then it will slowly start to go bad, until nodes are restarted (or index size decreased). Thanks, Markus -Original message- > From: Mikhail Khludnev > Sent: Wednesday 19th July 2017 14:18 > To: solr-user > Subject: Re: 6.6 cloud starting to eat CPU after 8+ hours…

Re: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Mikhail Khludnev
> > The real distinction between busy and calm nodes is that busy nodes all have o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms() as second to fillBuffer(). What are they doing? Can you expose the stack deeper? Can they start to sync shards for some reason? On Wed, Jul 19…
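
To expose the stacks deeper, as asked above, one rough approach is to save a jstack dump and pull out the threads sitting in that frame; a sketch assuming the dump was written to /tmp/solr-threads.txt as in the earlier example:

    # Print surrounding frames for every thread currently inside FieldsReader.terms()
    # -F treats the pattern as a fixed string so the '$' is not a regex anchor
    grep -F -B 5 -A 25 'PerFieldPostingsFormat$FieldsReader.terms' /tmp/solr-threads.txt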

RE: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Markus Jelsma
…all SSDs. Thanks, Markus -Original message- > From: Rick Leir > Sent: Wednesday 19th July 2017 12:48 > To: solr-user@lucene.apache.org > Subject: Re: 6.6 cloud starting to eat CPU after 8+ hours > > Markus, > What does iostat(1) tell you? Cheers -- Rick…

Re: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Rick Leir
Markus, What does iostat(1) tell you? Cheers -- Rick On July 19, 2017 5:35:32 AM EDT, Markus Jelsma wrote: > Hello, > Another peculiarity here, our six-node (2 shards / 3 replicas) cluster is going crazy after a good part of the day has passed. It starts eating CPU for no good reason and its…
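
For the iostat(1) check suggested here, one common invocation (from the sysstat package; these flags are a reasonable choice, not the only one) is:

    # Extended per-device statistics, every 2 seconds, 5 samples;
    # sustained high %util or await would point at the disks rather than the CPU
    iostat -x 2 5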

6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Markus Jelsma
Hello, Another peculiarity here: our six-node (2 shards / 3 replicas) cluster is going crazy after a good part of the day has passed. It starts eating CPU for no good reason and its latency goes up. Grafana graphs show the problem really well. After restarting 2/6 nodes, there is also quite a…
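
As a sketch of how one might confirm which threads are burning the CPU on a busy node (assuming a Linux host and the usual top/jstack tooling; the PID/TID placeholders are hypothetical):

    # Per-thread CPU usage for the Solr JVM
    top -H -p <solr-pid>

    # Convert a hot thread id from top (decimal) to hex; it matches the nid=0x... field in a jstack dump
    printf '%x\n' <tid-from-top>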