Were you having any OOM errors beforehand? If so, objects that other
threads still expect to be reachable may have been garbage-collected,
which could lead to these null monitors.
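
To rule a true deadlock in or out, the JVM can be asked directly through the standard `java.lang.management.ThreadMXBean` API. The sketch below is only an illustration under some assumptions: the class name `DeadlockCheck` is made up, and the code inspects the JVM it runs in, so checking a live Solr node this way would mean running the same calls over a JMX connection to that process, which is not shown here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of Solr.
public class DeadlockCheck {

    /** Returns one description per deadlocked thread, or an empty list if none. */
    static List<String> findDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Covers both object monitors and java.util.concurrent ownable
        // synchronizers; returns null when no deadlock is found.
        long[] ids = mx.findDeadlockedThreads();
        List<String> result = new ArrayList<>();
        if (ids == null) {
            return result;
        }
        // Request locked monitors and locked synchronizers for each thread.
        for (ThreadInfo info : mx.getThreadInfo(ids, true, true)) {
            if (info != null) { // a thread may have exited in the meantime
                result.add(info.toString());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> deadlocks = findDeadlocks();
        if (deadlocks.isEmpty()) {
            System.out.println("No deadlocked threads reported by the JVM.");
        } else {
            deadlocks.forEach(System.out::println);
        }
    }
}
```

An empty result would not prove the node is healthy. It only means the JVM sees no lock cycle, so threads parked waiting for work or stuck on I/O would have to be investigated separately.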

On Fri, Mar 5, 2021 at 12:55 PM Stephen Lewis Bianamara <
stephen.bianam...@gmail.com> wrote:

> Hi SOLR Community,
>
> I'm investigating a node on Solr 8.3.1 running in cloud mode which appears
> to have deadlocked. I'm trying to figure out whether this is a known issue,
> and I'm looking for guidance on (a) whether it is resolved in a later
> release or needs a bug report, and (b) how to lower the risk of recurrence
> until it is fixed.
>
> Here is what I've observed:
>
>    - strace shows the main process waiting. A spot check on child processes
>    shows the same, though I did not deep dive all of the threads yet (there
>    are over 100).
>    - the server was idle, apart from the JVM sitting at constant memory
>    usage. No resource (memory, swap, CPU, etc.) was limited or showing
>    active usage.
>    - jcmd Thread.print shows some interesting info which suggests a
>    deadlock or another type of locking issue
>       - For example, this entry looks unusual because it appears to be
>       trying to lock a null object
>          - "Finalizer" #3 daemon prio=8 os_prio=0 cpu=11.11ms elapsed=111111.11s tid=0x0000111111110100 nid=0x1111 in Object.wait() [0x0000111111111000]
>             java.lang.Thread.State: WAITING (on object monitor)
>                  at java.lang.Object.wait(java.base@11.0.7/Native Method)
>                  - waiting on <no object reference available>
>                  at java.lang.ref.ReferenceQueue.remove(java.base@11.0.7/ReferenceQueue.java:155)
>                  - waiting to re-lock in wait() <0x0000000200222220> (a java.lang.ref.ReferenceQueue$Lock)
>                  at java.lang.ref.ReferenceQueue.remove(java.base@11.0.7/ReferenceQueue.java:176)
>                  at java.lang.ref.Finalizer$FinalizerThread.run(java.base@11.0.7/Finalizer.java:170)
>          - I also see a lot of this. Some addresses occur multiple times,
>       but one in particular occurs 31 times. Maybe related?
>          - "h2sc-1-thread-11" #110 prio=5 os_prio=0 cpu=54.29ms elapsed=111111.11s tid=0x0000111110010100 nid=0x1111 waiting on condition [0x0000111110011000]
>             java.lang.Thread.State: WAITING (parking)
>                  at jdk.internal.misc.Unsafe.park(java.base@11.0.7/Native Method)
>                  - parking to wait for  <0x0000000300333333>
>
> Can anyone help answer whether this is known or what I could look at next?
>
> Thanks!
> Stephen
>
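
On the observation that one address shows up 31 times: a quick tally of the lock addresses in a saved thread dump can show which objects the most threads are waiting on. This is a rough sketch, assuming the dump has been saved to a text file. The class name `ParkTally` and the exact set of phrases matched are my own choices for illustration, not anything Solr or the JDK provides.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for summarizing a saved thread dump.
public class ParkTally {

    // Matches thread-dump lines that name the address a thread is waiting for.
    // Lines such as "waiting on <no object reference available>" carry no
    // address and are deliberately not matched.
    private static final Pattern WAIT = Pattern.compile(
        "(?:parking to wait for|waiting to lock|waiting on"
        + "|waiting to re-lock in wait\\(\\))\\s+<(0x[0-9a-fA-F]+)>");

    /** Counts how many wait lines mention each address. */
    static Map<String, Integer> tally(String dump) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = WAIT.matcher(dump);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a file containing the saved thread dump text
        String dump = Files.readString(Path.of(args[0]));
        tally(dump).entrySet().stream()
            .sorted((a, b) -> b.getValue() - a.getValue()) // busiest first
            .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}
```

Many pool threads parked on the same address is often just an idle executor waiting on its work queue, so a high count alone does not indicate a problem. The tally is more useful for finding which address to look up among the "locked <...>" lines of the dump to see which thread, if any, holds it.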
