This looks more like a memory leak than a thread issue.

On Wed, 13 Oct 2021, 04:33 Joel Bernstein, <[email protected]> wrote:

> There is a Thread Dump page in the Solr Admin UI. You can use it to determine
> what all those threads are doing and where they are getting stuck. You can
> post parts of the thread dump back to this email thread as well.
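To make a long dump easier to scan, threads can be counted by name so that a pool which keeps growing stands out. What follows is a minimal sketch, assuming a JDK's jstack is available on the host and the Solr PID is known; summarize_threads is a hypothetical helper name, not part of Solr:

```shell
#!/usr/bin/env bash
# summarize_threads (hypothetical helper): reads a jstack-style thread dump on
# stdin and counts threads per name, after stripping a trailing numeric
# counter so that e.g. "qtp178604517-3891" and "qtp178604517-3892" group together.
summarize_threads() {
  grep '^"' |                       # thread header lines start with the quoted name
    sed -E 's/^"([^"]*)".*/\1/' |   # keep only the thread name
    sed -E 's/[-#]?[0-9]+$//' |     # drop a trailing "-123" or "#12" counter
    sort | uniq -c | sort -rn       # count per name, largest group first
}

# Usage, assuming exactly one process matches start.jar (jstack generally has
# to run as the same user as the Solr JVM):
#   jstack "$(pgrep -f start.jar)" | summarize_threads | head -20
```

Stripping the trailing counter is only a heuristic; threads whose names follow other patterns are simply listed individually.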
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Tue, Oct 12, 2021 at 11:15 AM Dominic Humphries
> <[email protected]> wrote:
>
> > We run 8.3.1 in prod without any problems, but we're having issues with
> > trying to upgrade.
> >
> > I've created an 8.9.0 leader & follower, imported our live data into it,
> > and am testing it via replaying requests made to prod. We're seeing a big
> > problem where fairly moderate request rates are causing the instance to
> > become so slow it fails its healthcheck. The logs showed a lot of errors
> > around creating threads:
> >
> > solr[4507]: [124136.511s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 256k, guardsize: 0k, detached.
> >
> > WARN  (qtp178604517-3891) [   ] o.e.j.i.ManagedSelector  => java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
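pthread_create failing with EAGAIN usually means a limit on the number of tasks was hit (RLIMIT_NPROC, which counts threads per user, or a cgroup pids limit such as systemd's TasksMax), although a shortage of memory for the new thread's stack can produce the same error. A minimal Linux-only sketch for comparing the process's live thread count with its RLIMIT_NPROC soft limit follows; thread_headroom is a hypothetical helper, and it cannot see cgroup limits:

```shell
#!/usr/bin/env bash
# thread_headroom (hypothetical helper, Linux only): prints a process's live
# thread count next to its RLIMIT_NPROC soft limit. RLIMIT_NPROC is counted
# across all of the user's processes and threads, and cgroup limits such as
# systemd's TasksMax are not visible here.
thread_headroom() {
  local pid=$1 threads nproc
  # "Threads:" in /proc/<pid>/status is the number of threads in the process.
  threads=$(awk '/^Threads:/ {print $2}' "/proc/$pid/status")
  # The "Max processes" row of /proc/<pid>/limits is RLIMIT_NPROC;
  # its 3rd whitespace-separated field is the soft limit.
  nproc=$(awk '/^Max processes/ {print $3}' "/proc/$pid/limits")
  echo "pid=$pid threads=$threads max_processes_soft=$nproc"
}

# Usage, assuming exactly one process matches start.jar:
#   thread_headroom "$(pgrep -f start.jar)"
```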
> >
> > So I monitored the thread count for the process whilst running the test
> > suite and saw a persistent pattern: threads increased until they maxed out,
> > the logs flooded with errors as it tried to create still more threads, and
> > the instance slowed down until it was terminated as unhealthy.
> >
> > DefaultTasksMax is set to 4915. I've tried raising and lowering it, but
> > regardless of the value the result is the same: the limit is reached and
> > everything slows down.
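DefaultTasksMax (in systemd's system.conf) is only the default; a unit's own TasksMax= setting takes precedence, and the limit applies to every task in the unit's cgroup, threads included. A small sketch for reading the effective value and the current usage follows; the unit name solr.service is an assumption, and tasks_headroom is a hypothetical helper:

```shell
#!/usr/bin/env bash
# tasks_headroom (hypothetical helper): reads the key=value output of
# `systemctl show -p TasksMax -p TasksCurrent <unit>` on stdin and prints how
# many more tasks (threads count as tasks) the unit can start before TasksMax.
tasks_headroom() {
  awk -F= '
    $1 == "TasksMax"     { max = $2 }
    $1 == "TasksCurrent" { cur = $2 }
    END {
      # Recent systemd prints "infinity" for no limit; older versions may
      # print a very large integer instead.
      if (max == "infinity") print "unlimited"
      else print max - cur
    }'
}

# Usage (the unit name solr.service is an assumption; adjust to the install):
#   systemctl show -p TasksMax -p TasksCurrent solr.service | tasks_headroom
```

If the unit's limit does turn out to be what is being hit, running systemctl edit on the unit and adding TasksMax= under [Service] overrides it for that unit alone. As reported above, though, moving the limit only changes where the climb stops.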
> >
> > Is there anything I can do to stop Solr spinning up so many threads that it
> > ceases to function? There have been a few test passes where it
> > spontaneously dropped its thread count from thousands to hundreds and
> > stayed up longer, but there seems to be no pattern to when this happens.
> > Running the same tests on 8.3.1 results in a much slower increase in
> > threads, and it never quite reaches the limit, so things continue to
> > function.
> >
> > Below are the thread count and healthcheck times from a (fairly harsh) test
> > run at 100 requests/sec:
> >
> > Thanks
> >
> > Dominic
> >
> >
> > Thread count:
> >
> > ubuntu@ip-10-40-22-166:~$ while [ 1 ]; do date; ps -eLF | grep 'start.jar' | wc -l; sleep 10s; done
> > Tue Oct 12 14:27:33 UTC 2021
> > 52
> > Tue Oct 12 14:27:43 UTC 2021
> > 52
> > Tue Oct 12 14:27:54 UTC 2021
> > 52
> > Tue Oct 12 14:28:04 UTC 2021
> > 52
> > Tue Oct 12 14:28:14 UTC 2021
> > 569
> > Tue Oct 12 14:28:24 UTC 2021
> > 899
> > Tue Oct 12 14:28:34 UTC 2021
> > 1198
> > Tue Oct 12 14:28:44 UTC 2021
> > 1589
> > Tue Oct 12 14:28:54 UTC 2021
> > 2016
> > Tue Oct 12 14:29:05 UTC 2021
> > 2451
> > Tue Oct 12 14:29:15 UTC 2021
> > 2851
> > Tue Oct 12 14:29:26 UTC 2021
> > 2934
> > Tue Oct 12 14:29:36 UTC 2021
> > 3249
> > Tue Oct 12 14:29:46 UTC 2021
> > 3501
> > Tue Oct 12 14:29:57 UTC 2021
> > 3734
> > Tue Oct 12 14:30:07 UTC 2021
> > 4128
> > Tue Oct 12 14:30:18 UTC 2021
> > 4374
> > Tue Oct 12 14:30:29 UTC 2021
> > 4637
> > Tue Oct 12 14:30:39 UTC 2021
> > 4693
> > Tue Oct 12 14:30:50 UTC 2021
> > 4807
> > Tue Oct 12 14:31:01 UTC 2021
> > 4916
> > Tue Oct 12 14:31:11 UTC 2021
> > 4916
> > Tue Oct 12 14:31:22 UTC 2021
> > Connection to 10.40.22.166 closed by remote host.
> >
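One caveat on the loop above: grep 'start.jar' also matches the grep process's own command line, so each sample is likely one higher than the JVM's real thread count, which would be consistent with the plateau at 4916 against a limit of 4915. A minimal Linux-only sketch that reads the count from /proc instead, and logs resident memory alongside it (which also gives some data for the memory-leak theory at the top of the thread), follows; watch_jvm is a hypothetical helper:

```shell
#!/usr/bin/env bash
# watch_jvm (hypothetical helper, Linux only): every INTERVAL seconds, prints a
# UTC timestamp, the thread count and the resident set size of one PID, read
# from /proc/<pid>/status so the monitoring command is never counted.
# Arguments: PID [INTERVAL seconds, default 10] [SAMPLES, default 0 = until exit]
watch_jvm() {
  local pid=$1 interval=${2:-10} samples=${3:-0} n=0
  while [ -r "/proc/$pid/status" ]; do
    awk -v ts="$(date -u +%H:%M:%S)" '
      /^Threads:/ { threads = $2 }
      /^VmRSS:/   { rss = $2 " " $3 }
      END { print ts, "threads=" threads, "rss=" rss }' "/proc/$pid/status"
    n=$((n + 1))
    if [ "$samples" -gt 0 ] && [ "$n" -ge "$samples" ]; then break; fi
    sleep "$interval"
  done
}

# Usage, assuming exactly one process matches start.jar:
#   watch_jvm "$(pgrep -f start.jar)" 10
```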
> >
> > Healthcheck:
> >
> > ubuntu@ip-10-40-22-166:~$ while [ 1 ]; do date; curl -v localhost:8983/solr/ 2>&1 | grep HTTP; date; echo '----'; sleep 10s; done
> > Tue Oct 12 14:27:34 UTC 2021
> > > GET /solr/ HTTP/1.1
> > < HTTP/1.1 200 OK
> > Tue Oct 12 14:27:34 UTC 2021
> > ----
> > Tue Oct 12 14:27:44 UTC 2021
> > > GET /solr/ HTTP/1.1
> > < HTTP/1.1 200 OK
> > Tue Oct 12 14:27:44 UTC 2021
> > ----
> > Tue Oct 12 14:27:54 UTC 2021
> > > GET /solr/ HTTP/1.1
> > < HTTP/1.1 200 OK
> > Tue Oct 12 14:27:54 UTC 2021
> > ----
> > Tue Oct 12 14:28:04 UTC 2021
> > > GET /solr/ HTTP/1.1
> > < HTTP/1.1 200 OK
> > Tue Oct 12 14:28:04 UTC 2021
> > ----
> > Tue Oct 12 14:28:14 UTC 2021
> > > GET /solr/ HTTP/1.1
> > < HTTP/1.1 200 OK
> > Tue Oct 12 14:28:16 UTC 2021
> > ----
> > Tue Oct 12 14:28:26 UTC 2021
> > > GET /solr/ HTTP/1.1
> > < HTTP/1.1 200 OK
> > Tue Oct 12 14:28:39 UTC 2021
> > ----
> > Tue Oct 12 14:28:49 UTC 2021
> > > GET /solr/ HTTP/1.1
> > < HTTP/1.1 200 OK
> > Tue Oct 12 14:29:13 UTC 2021
> > ----
> > Tue Oct 12 14:29:23 UTC 2021
> > > GET /solr/ HTTP/1.1
> > < HTTP/1.1 200 OK
> > Tue Oct 12 14:29:25 UTC 2021
> > ----
> > Tue Oct 12 14:29:35 UTC 2021
> > > GET /solr/ HTTP/1.1
> > < HTTP/1.1 200 OK
> > Tue Oct 12 14:29:44 UTC 2021
> > ----
> > Tue Oct 12 14:29:54 UTC 2021
> > > GET /solr/ HTTP/1.1
> > < HTTP/1.1 200 OK
> > Tue Oct 12 14:30:06 UTC 2021
> > ----
> > Tue Oct 12 14:30:16 UTC 2021
> > > GET /solr/ HTTP/1.1
> > < HTTP/1.1 200 OK
> > Tue Oct 12 14:30:20 UTC 2021
> > ----
> > Tue Oct 12 14:30:30 UTC 2021
> > > GET /solr/ HTTP/1.1
> > < HTTP/1.1 200 OK
> > Tue Oct 12 14:30:33 UTC 2021
> > ----
> > Tue Oct 12 14:30:43 UTC 2021
> > > GET /solr/ HTTP/1.1
> > < HTTP/1.1 200 OK
> > Tue Oct 12 14:30:43 UTC 2021
> > ----
> > Tue Oct 12 14:30:53 UTC 2021
> > > GET /solr/ HTTP/1.1
> > Tue Oct 12 14:30:55 UTC 2021
> > ----
> > Tue Oct 12 14:31:05 UTC 2021
> > > GET /solr/ HTTP/1.1
> > < HTTP/1.1 200 OK
> > Tue Oct 12 14:31:05 UTC 2021
> > ----
> > Tue Oct 12 14:31:15 UTC 2021
> > > GET /solr/ HTTP/1.1
> > < HTTP/1.1 200 OK
> > Tue Oct 12 14:31:15 UTC 2021
> > ----
> > Connection to 10.40.22.166 closed by remote host.
> >
>
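Reading latency from the paired date lines in the healthcheck output works, but it is indirect. A minimal sketch that prints one line per probe with the HTTP status and total request time follows, using curl's -s, --max-time and -w options. check_once is a hypothetical helper, and its 30-second default timeout is an arbitrary choice. A probe like the one at 14:30:53, which printed no status line at all, would then likely show up as status 000 together with its duration:

```shell
#!/usr/bin/env bash
# check_once (hypothetical helper): makes one request and, once it finishes,
# prints a UTC timestamp, the HTTP status code and the total time in seconds.
# -s hides curl's progress meter; --max-time caps a probe, so a hung or
# refused request is reported with status 000 instead of blocking the loop.
check_once() {
  local url=$1 timeout=${2:-30} result
  # "|| true": curl exits non-zero on timeouts and refused connections,
  # but -w has still written its line, which we want to keep.
  result=$(curl -s -o /dev/null --max-time "$timeout" \
    -w '%{http_code} %{time_total}' "$url" || true)
  echo "$(date -u +%H:%M:%S) $result"
}

# Usage, polling every 10 seconds like the original loop:
#   while true; do check_once http://localhost:8983/solr/; sleep 10; done
```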
