Fewer client threads updating makes sense, and going to 1 core also seems
like it might help. But it's all a crap-shoot unless the underlying cause
gets fixed up. Both would improve things, but you'll still hit the problem
sometime, probably when doing a demo for your boss ;).

Adrien has branched the code for Solr 4.5 in preparation for a release
candidate tentatively scheduled for next week. You might just start working
with that branch if you can, rather than applying individual patches...

I suspect there'll be a couple more changes to this code (looks like
Shikhar already raised an issue for instance) before 4.5 is finally cut...

FWIW,
Erick



On Thu, Sep 12, 2013 at 2:13 AM, Tim Vaillancourt <t...@elementspace.com> wrote:

> Thanks Erick!
>
> Yeah, I think the next step will be CloudSolrServer with the SOLR-4816
> patch. I think that is a very, very useful patch by the way. SOLR-5232
> seems promising as well.
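
(A quick aside, for illustration: a minimal SolrJ sketch of the ZooKeeper-aware
client being discussed here; SOLR-4816 is the patch that reportedly lets this
client route documents directly to the right shard leader. The ZK host string,
collection name, and document fields below are placeholders, not details from
this thread.)

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class CloudClientSketch {
        public static void main(String[] args) throws Exception {
            // CloudSolrServer reads cluster state from ZooKeeper and picks a
            // live node itself, instead of bouncing every request through a
            // load-balancer VIP.
            CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
            solr.setDefaultCollection("collection1"); // placeholder collection name
            solr.connect();

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "example-1");
            solr.add(doc);   // commits are left to autoCommit, as in this setup
            solr.shutdown();
        }
    }
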
>
> I see your point on the more-shards idea, this is obviously a
> global/instance-level lock. If I really had to, I suppose I could run more
> Solr instances to reduce locking then? Currently I have 2 cores per
> instance and I could go 1-to-1 to simplify things.
>
> The good news is we seem to be more stable since changing to a bigger
> client->solr batch-size and fewer client threads updating.
>
> Cheers,
>
> Tim
>
> On 11/09/13 04:19 AM, Erick Erickson wrote:
>
>> If you use CloudSolrServer, you need to apply SOLR-4816 or use a recent
>> copy of the 4x branch. By "recent", I mean like today; it looks like Mark
>> applied this early this morning. But several reports indicate that this
>> will solve your problem.
>>
>> I would expect that increasing the number of shards would make the
>> problem worse, not better.
>>
>> There's also SOLR-5232...
>>
>> Best
>> Erick
>>
>>
>> On Tue, Sep 10, 2013 at 5:20 PM, Tim Vaillancourt <t...@elementspace.com> wrote:
>>
>>> Hey guys,
>>>
>>> Based on my understanding of the problem we are encountering, I feel
>>> we've been able to reduce the likelihood of this issue by making the
>>> following changes to our app's usage of SolrCloud:
>>>
>>> 1) We increased our document batch size to 200 from 10 - our app batches
>>> updates to reduce HTTP requests/overhead. The theory is increasing the
>>> batch size reduces the likelihood of this issue happening.
>>> 2) We reduced to 1 application node sending updates to SolrCloud - we
>>> write Solr updates to Redis, and have previously had 4 application nodes
>>> pushing the updates to Solr (popping off the Redis queue). Reducing the
>>> number of nodes pushing to Solr reduces the concurrency on SolrCloud.
>>> 3) Fewer threads pushing to SolrCloud - due to the increase in batch
>>> size, we were able to go down to 5 update threads on the
>>> update-pushing-app (from 10 threads).
>>>
>>> To be clear, the above only reduces the likelihood of the issue
>>> happening, and DOES NOT actually resolve the issue at hand.
>>>
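
(For illustration, a rough sketch of what changes 1 and 3 above look like on
the client side, assuming SolrJ. The VIP URL, field names, and the
queue-popping helper are invented for the example; the real app pops updates
off Redis.)

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchingPusher {
        private static final int THREADS = 5;      // change 3: fewer pusher threads
        private static final int BATCH_SIZE = 200; // change 1: 200 docs per request

        public static void main(String[] args) {
            // Placeholder VIP URL; a CloudSolrServer could be swapped in here.
            final HttpSolrServer solr =
                new HttpSolrServer("http://solr-vip:8983/solr/collection1");
            ExecutorService pool = Executors.newFixedThreadPool(THREADS);
            for (int t = 0; t < THREADS; t++) {
                pool.submit(new Runnable() {
                    public void run() {
                        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
                        while (true) {
                            batch.add(nextDocFromQueue()); // hypothetical Redis pop
                            if (batch.size() >= BATCH_SIZE) {
                                try {
                                    solr.add(batch); // one HTTP request per 200 docs
                                } catch (Exception e) {
                                    e.printStackTrace(); // real app would retry/alert
                                }
                                batch.clear();
                            }
                        }
                    }
                });
            }
        }

        private static SolrInputDocument nextDocFromQueue() {
            // Hypothetical stand-in for deserializing an update popped off Redis.
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", java.util.UUID.randomUUID().toString());
            return doc;
        }
    }

(Fewer requests per document and fewer concurrent senders both lower the
pressure on SolrCloud's distributed-update path, without changing Solr itself.)
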
>>> If we happen to encounter issues with the above 3 changes, the next steps
>>> (I could use some advice on) are:
>>>
>>> 1) Increase the number of shards (2x) - the theory here is this reduces
>>> the locking on shards because there are more shards. Am I onto something
>>> here, or will this not help at all?
>>> 2) Use CloudSolrServer - currently we have a plain-old least-connection
>>> HTTP VIP. If we go "direct" to what we need to update, this will reduce
>>> concurrency in SolrCloud a bit. Thoughts?
>>>
>>> Thanks all!
>>>
>>> Cheers,
>>>
>>> Tim
>>>
>>>
>>> On 6 September 2013 14:47, Tim Vaillancourt <t...@elementspace.com> wrote:
>>>
>>>> Enjoy your trip, Mark! Thanks again for the help!
>>>>
>>>> Tim
>>>>
>>>>
>>>> On 6 September 2013 14:18, Mark Miller <markrmil...@gmail.com> wrote:
>>>>
>>>>> Okay, thanks, useful info. Getting on a plane, but I'll look more at
>>>>> this soon. That 10k thread spike is good to know - that's no good and
>>>>> could easily be part of the problem. We want to keep that from
>>>>> happening.
>>>>>
>>>>> Mark
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On Sep 6, 2013, at 2:05 PM, Tim Vaillancourt <t...@elementspace.com> wrote:
>>>>>
>>>>>> Hey Mark,
>>>>>>
>>>>>> The farthest we've made it at the same batch size/volume was 12 hours
>>>>>> without this patch, but that isn't consistent. Sometimes we would only
>>>>>> get to 6 hours or less.
>>>>>>
>>>>>> During the crash I can see an amazing spike in threads to 10k which is
>>>>>> essentially our ulimit for the JVM, but I strangely see no "OutOfMemory:
>>>>>> cannot open native thread errors" that always follow this. Weird!
>>>>>>
>>>>>> We also notice a spike in CPU around the crash. The instability caused
>>>>>> some shard recovery/replication though, so that CPU may be a symptom of
>>>>>> the replication, or is possibly the root cause. The CPU spikes from
>>>>>> about 20-30% utilization (system + user) to 60% fairly sharply, so the
>>>>>> CPU, while spiking, isn't quite "pinned" (very beefy Dell R720s - 16
>>>>>> core Xeons, whole index is in 128GB RAM, 6xRAID10 15k).
>>>>>>
>>>>>> More on resources: our disk I/O seemed to spike about 2x during the
>>>>>> crash (about 1300kbps written to 3500kbps), but this may have been the
>>>>>> replication, or ERROR logging (we generally log nothing due to
>>>>>> WARN-severity unless something breaks).
>>>>>>
>>>>>> Lastly, I found this stack trace occurring frequently, and have no
>>>>>> idea what it is (may be useful or not):
>>>>>>
>>>>>> "java.lang.**IllegalStateException :
>>>>>>       at
>>>>>>
>>>>> org.eclipse.jetty.server.**Response.resetBuffer(Response.**java:964)
>>>
>>>>       at org.eclipse.jetty.server.**Response.sendError(Response.**
>>>>>> java:325)
>>>>>>       at
>>>>>>
>>>>>>  org.apache.solr.servlet.**SolrDispatchFilter.sendError(**
>>> SolrDispatchFilter.java:692)
>>>
>>>>       at
>>>>>>
>>>>>>  org.apache.solr.servlet.**SolrDispatchFilter.doFilter(**
>>> SolrDispatchFilter.java:380)
>>>
>>>>       at
>>>>>>
>>>>>>  org.apache.solr.servlet.**SolrDispatchFilter.doFilter(**
>>> SolrDispatchFilter.java:155)
>>>
>>>>       at
>>>>>>
>>>>>>  org.eclipse.jetty.servlet.**ServletHandler$CachedChain.**
>>> doFilter(ServletHandler.java:**1423)
>>>
>>>>       at
>>>>>>
>>>>>>  org.eclipse.jetty.servlet.**ServletHandler.doHandle(**
>>> ServletHandler.java:450)
>>>
>>>>       at
>>>>>>
>>>>>>  org.eclipse.jetty.server.**handler.ScopedHandler.handle(**
>>> ScopedHandler.java:138)
>>>
>>>>       at
>>>>>>
>>>>>>  org.eclipse.jetty.security.**SecurityHandler.handle(**
>>> SecurityHandler.java:564)
>>>
>>>>       at
>>>>>>
>>>>>>  org.eclipse.jetty.server.**session.SessionHandler.**
>>> doHandle(SessionHandler.java:**213)
>>>
>>>>       at
>>>>>>
>>>>>>  org.eclipse.jetty.server.**handler.ContextHandler.**
>>> doHandle(ContextHandler.java:**1083)
>>>
>>>>       at
>>>>>>
>>>>>>  org.eclipse.jetty.servlet.**ServletHandler.doScope(**
>>> ServletHandler.java:379)
>>>
>>>>       at
>>>>>>
>>>>>>  org.eclipse.jetty.server.**session.SessionHandler.**
>>> doScope(SessionHandler.java:**175)
>>>
>>>>       at
>>>>>>
>>>>>>  org.eclipse.jetty.server.**handler.ContextHandler.**
>>> doScope(ContextHandler.java:**1017)
>>>
>>>>       at
>>>>>>
>>>>>>  org.eclipse.jetty.server.**handler.ScopedHandler.handle(**
>>> ScopedHandler.java:136)
>>>
>>>>       at
>>>>>>
>>>>>>  org.eclipse.jetty.server.**handler.**ContextHandlerCollection.**
>>> handle(**ContextHandlerCollection.java:**258)
>>>
>>>>       at
>>>>>>
>>>>>>  org.eclipse.jetty.server.**handler.HandlerCollection.**
>>> handle(HandlerCollection.java:**109)
>>>
>>>>       at
>>>>>>
>>>>>>  org.eclipse.jetty.server.**handler.HandlerWrapper.handle(**
>>> HandlerWrapper.java:97)
>>>
>>>>       at org.eclipse.jetty.server.**Server.handle(Server.java:445)
>>>>>>       at
>>>>>>
>>>>> org.eclipse.jetty.server.**HttpChannel.handle(**HttpChannel.java:260)
>>>>>
>>>>>>       at
>>>>>>
>>>>>>  org.eclipse.jetty.server.**HttpConnection.onFillable(**
>>> HttpConnection.java:225)
>>>
>>>>       at
>>>>>>
>>>>>>  org.eclipse.jetty.io.**AbstractConnection$**ReadCallback.run(**
>>> AbstractConnection.java:358)
>>>
>>>>       at
>>>>>>
>>>>>>  org.eclipse.jetty.util.thread.**QueuedThreadPool.runJob(**
>>> QueuedThreadPool.java:596)
>>>
>>>>       at
>>>>>>
>>>>>>  org.eclipse.jetty.util.thread.**QueuedThreadPool$3.run(**
>>> QueuedThreadPool.java:527)
>>>
>>>>       at java.lang.Thread.run(Thread.**java:724)"
>>>>>>
>>>>>> On your live_nodes question, I don't have historical data on this from
>>>>>> when the crash occurred, which I guess is what you're looking for. I
>>>>>> could add this to our monitoring for future tests, however. I'd be glad
>>>>>> to continue further testing, but I think first more monitoring is
>>>>>> needed to understand this further. Could we come up with a list of
>>>>>> metrics that would be useful to see following another test and
>>>>>> successful crash?
>>>>>>
>>>>>> Metrics needed:
>>>>>>
>>>>>> 1) # of live_nodes.
>>>>>> 2) Full stack traces.
>>>>>> 3) CPU used by Solr's JVM specifically (instead of system-wide).
>>>>>> 4) Solr's JVM thread count (already done)
>>>>>> 5) ?
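
(For metric 1, one possible approach, assuming SolrJ is available on the
monitoring side, is to poll live_nodes through ZkStateReader and log the count
next to thread count and CPU. The ZK host string and interval are placeholders.)

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.cloud.ZkStateReader;

    public class LiveNodesPoller {
        public static void main(String[] args) throws Exception {
            CloudSolrServer solr =
                new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181"); // placeholder ZK hosts
            solr.connect();
            ZkStateReader reader = solr.getZkStateReader();
            while (true) {
                // Number of entries under /live_nodes in the cached cluster state.
                int liveNodes = reader.getClusterState().getLiveNodes().size();
                System.out.println(System.currentTimeMillis() + " live_nodes=" + liveNodes);
                Thread.sleep(10000); // sample every 10s during the soak test
            }
        }
    }
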
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Tim Vaillancourt
>>>>>>
>>>>>>
>>>>>> On 6 September 2013 13:11, Mark Miller <markrmil...@gmail.com> wrote:
>>>>>>
>>>>>>> Did you ever get to index that long before without hitting the
>>>>>>> deadlock? There really isn't anything negative the patch could be
>>>>>>> introducing, other than allowing for some more threads to possibly
>>>>>>> run at once. If I had to guess, I would say it's likely this patch
>>>>>>> fixes the deadlock issue and you're seeing another issue - which
>>>>>>> looks like the system cannot keep up with the requests or something
>>>>>>> for some reason - perhaps due to some OS networking settings or
>>>>>>> something (more guessing). Connection refused generally happens when
>>>>>>> there is nothing listening on the port.
>>>>>>>
>>>>>>> Do you see anything interesting change with the rest of the system?
>>>>>>> CPU usage spikes or something like that?
>>>>>>>
>>>>>>> Clamping down further on the overall number of threads might help
>>>>>>> (which would require making something configurable). How many nodes
>>>>>>> are listed in zk under live_nodes?
>>>>>>>
>>>>>>> Mark
>>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>> On Sep 6, 2013, at 12:02 PM, Tim Vaillancourt <t...@elementspace.com> wrote:
>>>>>>>
>>>>>>>> Hey guys,
>>>>>>>>
>>>>>>>> (copy of my post to SOLR-5216)
>>>>>>>>
>>>>>>>> We tested this patch and unfortunately encountered some serious
>>>>>>>> issues after a few hours of 500 update-batches/sec. Our update batch
>>>>>>>> is 10 docs, so we are writing about 5000 docs/sec total, using
>>>>>>>> autoCommit to commit the updates (no explicit commits).
>>>>>>>>
>>>>>>>> Our environment:
>>>>>>>>
>>>>>>>>    Solr 4.3.1 w/SOLR-5216 patch.
>>>>>>>>    Jetty 9, Java 1.7.
>>>>>>>>    3 solr instances, 1 per physical server.
>>>>>>>>    1 collection.
>>>>>>>>    3 shards.
>>>>>>>>    2 replicas (each instance is a leader and a replica).
>>>>>>>>    Soft autoCommit is 1000ms.
>>>>>>>>    Hard autoCommit is 15000ms.
>>>>>>>>
>>>>>>>> After about 6 hours of stress-testing this patch, we see many of
>>>>>>>> these stalled transactions (below), and the Solr instances start to
>>>>>>>> see each other as down, flooding our Solr logs with "Connection
>>>>>>>> Refused" exceptions, and otherwise no obviously-useful logs that I
>>>>>>>> could see.
>>>>>>>>
>>>>>>>> I did notice some stalled transactions on both /select and /update,
>>>>>>>> however. This never occurred without this patch.
>>>>>>>>
>>>>>>>> Stack /select seems stalled on: http://pastebin.com/Y1NCrXGC
>>>>>>>> Stack /update seems stalled on: http://pastebin.com/cFLbC8Y9
>>>>>>>>
>>>>>>>> Lastly, I have a summary of the ERROR-severity logs from this
>>>>>>>> 24-hour soak. My script "normalizes" the ERROR-severity stack traces
>>>>>>>> and returns them in order of occurrence.
>>>>>>>>
>>>>>>>> Summary of my solr.log: http://pastebin.com/pBdMAWeb
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Tim Vaillancourt
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6 September 2013 07:27, Markus Jelsma <markus.jel...@openindex.io> wrote:
>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> -----Original message-----
>>>>>>>>>
>>>>>>>>>> From: Erick Erickson <erickerick...@gmail.com>
>>>>>>>>>> Sent: Friday 6th September 2013 16:20
>>>>>>>>>> To: solr-user@lucene.apache.org
>>>>>>>>>> Subject: Re: SolrCloud 4.x hangs under high update volume
>>>>>>>>>>
>>>>>>>>>> Markus:
>>>>>>>>>>
>>>>>>>>>> See: https://issues.apache.org/jira/browse/SOLR-5216
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Sep 4, 2013 at 11:04 AM, Markus Jelsma
>>>>>>>>>> <markus.jel...@openindex.io> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Mark,
>>>>>>>>>>>
>>>>>>>>>>> Got an issue to watch?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Markus
>>>>>>>>>>>
>>>>>>>>>>> -----Original message-----
>>>>>>>>>>>
>>>>>>>>>>>> From: Mark Miller <markrmil...@gmail.com>
>>>>>>>>>>>> Sent: Wednesday 4th September 2013 16:55
>>>>>>>>>>>> To: solr-user@lucene.apache.org
>>>>>>>>>>>> Subject: Re: SolrCloud 4.x hangs under high update volume
>>>>>>>>>>>>
>>>>>>>>>>>> I'm going to try and fix the root cause for 4.5 - I've suspected
>>>>>>>>>>>> what it is since early this year, but it's never personally been
>>>>>>>>>>>> an issue, so it's rolled along for a long time.
>>>>>>>>>>>>
>>>>>>>>>>>> Mark
>>>>>>>>>>>>
>>>>>>>>>>>> Sent from my iPhone
>>>>>>>>>>>>
>>>>>>>>>>>> On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt <t...@elementspace.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hey guys,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am looking into an issue we've been having with SolrCloud
>>>>>>>>>>>>> since the beginning of our testing, all the way from 4.1 to 4.3
>>>>>>>>>>>>> (haven't tested 4.4.0 yet). I've noticed other users with this
>>>>>>>>>>>>> same issue, so I'd really like to get to the bottom of it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Under a very, very high rate of updates (2000+/sec), after 1-12
>>>>>>>>>>>>> hours we see stalled transactions that snowball to consume all
>>>>>>>>>>>>> Jetty threads in the JVM. This eventually causes the JVM to hang
>>>>>>>>>>>>> with most threads waiting on the condition/stack provided at the
>>>>>>>>>>>>> bottom of this message. At this point SolrCloud instances then
>>>>>>>>>>>>> start to see their neighbors (who also have all threads hung) as
>>>>>>>>>>>>> down w/"Connection Refused", and the shards become "down" in
>>>>>>>>>>>>> state. Sometimes a node or two survives and just returns 503s
>>>>>>>>>>>>> "no server hosting shard" errors.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As a workaround/experiment, we have tuned the number of threads
>>>>>>>>>>>>> sending updates to Solr, as well as the batch size (we batch
>>>>>>>>>>>>> updates from client -> solr), and the Soft/Hard autoCommits, all
>>>>>>>>>>>>> to no avail. We also tried turning off Client-to-Solr batching
>>>>>>>>>>>>> (1 update = 1 call to Solr), which also did not help. Certain
>>>>>>>>>>>>> combinations of update threads and batch sizes seem to mask/help
>>>>>>>>>>>>> the problem, but not resolve it entirely.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Our current environment is the following:
>>>>>>>>>>>>> - 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
>>>>>>>>>>>>> - 3 x Zookeeper instances, external Java 7 JVM.
>>>>>>>>>>>>> - 1 collection, 3 shards, 2 replicas (each node is a leader of 1
>>>>>>>>>>>>> shard and a replica of 1 shard).
>>>>>>>>>>>>> - Log4j 1.2 for Solr logs, set to WARN. This log has no movement
>>>>>>>>>>>>> on a good day.
>>>>>>>>>>>>> - 5000 max jetty threads (well above what we use when we are
>>>>>>>>>>>>> healthy), Linux-user threads ulimit is 6000.
>>>>>>>>>>>>> - Occurs under Jetty 8 or 9 (many versions).
>>>>>>>>>>>>> - Occurs under Java 1.6 or 1.7 (several minor versions).
>>>>>>>>>>>>> - Occurs under several JVM tunings.
>>>>>>>>>>>>> - Everything seems to point to Solr itself, and not a Jetty or
>>>>>>>>>>>>> Java version (I hope I'm wrong).
>>>>>>>>>>>>>
>>>>>>>>>>>>> The stack trace that is holding up all my Jetty QTP threads is
>>>>>>>>>>>>> the following, which seems to be waiting on a lock that I would
>>>>>>>>>>>>> very much like to understand further:
>>>>>>>>>>>>>
>>>>>>>>>>>>> "java.lang.Thread.State: WAITING (parking)
>>>>>>>>>>>>>   at sun.misc.Unsafe.park(Native Method)
>>>>>>>>>>>>>   - parking to wait for<0x00000007216e68d8>  (a
>>>>>>>>>>>>> java.util.concurrent.**Semaphore$NonfairSync)
>>>>>>>>>>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> java.util.concurrent.locks.**LockSupport.park(LockSupport.**
>>>>>>>>> java:186)
>>>>>>>>>
>>>>>>>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> java.util.concurrent.locks.**AbstractQueuedSynchronizer.**
>>> parkAndCheckInterrupt(**AbstractQueuedSynchronizer.**java:834)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> java.util.concurrent.locks.**AbstractQueuedSynchronizer.**
>>> doAcquireSharedInterruptibly(**AbstractQueuedSynchronizer.**java:994)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> java.util.concurrent.locks.**AbstractQueuedSynchronizer.**
>>> acquireSharedInterruptibly(**AbstractQueuedSynchronizer.**java:1303)
>>>
>>>>   at java.util.concurrent.**Semaphore.acquire(Semaphore.**java:317)
>>>>>>>>>>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.apache.solr.util.**AdjustableSemaphore.acquire(**
>>> AdjustableSemaphore.java:61)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.apache.solr.update.**SolrCmdDistributor.submit(**
>>> SolrCmdDistributor.java:418)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.apache.solr.update.**SolrCmdDistributor.submit(**
>>> SolrCmdDistributor.java:368)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.apache.solr.update.**SolrCmdDistributor.flushAdds(**
>>> SolrCmdDistributor.java:300)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.apache.solr.update.**SolrCmdDistributor.finish(**
>>> SolrCmdDistributor.java:96)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.apache.solr.update.**processor.**
>>> DistributedUpdateProcessor.**doFinish(**DistributedUpdateProcessor.**
>>> java:462)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.apache.solr.update.**processor.**
>>> DistributedUpdateProcessor.**finish(**DistributedUpdateProcessor.**
>>> java:1178)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.apache.solr.handler.**ContentStreamHandlerBase.**
>>> handleRequestBody(**ContentStreamHandlerBase.java:**83)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.apache.solr.handler.**RequestHandlerBase.**handleRequest(**
>>> RequestHandlerBase.java:135)
>>>
>>>>   at org.apache.solr.core.SolrCore.**execute(SolrCore.java:1820)
>>>>>>>>>>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.apache.solr.servlet.**SolrDispatchFilter.execute(**
>>> SolrDispatchFilter.java:656)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.apache.solr.servlet.**SolrDispatchFilter.doFilter(**
>>> SolrDispatchFilter.java:359)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.apache.solr.servlet.**SolrDispatchFilter.doFilter(**
>>> SolrDispatchFilter.java:155)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.eclipse.jetty.servlet.**ServletHandler$CachedChain.**
>>> doFilter(ServletHandler.java:**1486)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.eclipse.jetty.servlet.**ServletHandler.doHandle(**
>>> ServletHandler.java:503)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.eclipse.jetty.server.**handler.ScopedHandler.handle(**
>>> ScopedHandler.java:138)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.eclipse.jetty.security.**SecurityHandler.handle(**
>>> SecurityHandler.java:564)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.eclipse.jetty.server.**session.SessionHandler.**
>>> doHandle(SessionHandler.java:**213)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.eclipse.jetty.server.**handler.ContextHandler.**
>>> doHandle(ContextHandler.java:**1096)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.eclipse.jetty.servlet.**ServletHandler.doScope(**
>>> ServletHandler.java:432)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.eclipse.jetty.server.**session.SessionHandler.**
>>> doScope(SessionHandler.java:**175)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.eclipse.jetty.server.**handler.ContextHandler.**
>>> doScope(ContextHandler.java:**1030)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.eclipse.jetty.server.**handler.ScopedHandler.handle(**
>>> ScopedHandler.java:136)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.eclipse.jetty.server.**handler.**ContextHandlerCollection.*
>>> *handle(**ContextHandlerCollection.java:**201)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.eclipse.jetty.server.**handler.HandlerCollection.**
>>> handle(HandlerCollection.java:**109)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.eclipse.jetty.server.**handler.HandlerWrapper.handle(**
>>> HandlerWrapper.java:97)
>>>
>>>>   at org.eclipse.jetty.server.**Server.handle(Server.java:445)
>>>>>>>>>>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.eclipse.jetty.server.**HttpChannel.handle(**
>>>>>>>>> HttpChannel.java:268)
>>>>>>>>>
>>>>>>>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.eclipse.jetty.server.**HttpConnection.onFillable(**
>>> HttpConnection.java:229)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.eclipse.jetty.io.**AbstractConnection$**ReadCallback.run(**
>>> AbstractConnection.java:358)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.eclipse.jetty.util.thread.**QueuedThreadPool.runJob(**
>>> QueuedThreadPool.java:601)
>>>
>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>> org.eclipse.jetty.util.thread.**QueuedThreadPool$3.run(**
>>> QueuedThreadPool.java:532)
>>>
>>>>   at java.lang.Thread.run(Thread.**java:724)"
>>>>>>>>>>>>>
>>>>>>>>>>>>> Some questions I had were:
>>>>>>>>>>>>> 1) What exclusive locks does SolrCloud "make" when performing an
>>>>>>>>>>>>> update?
>>>>>>>>>>>>> 2) Keeping in mind I do not read or write java (sorry :D), could
>>>>>>>>>>>>> someone help me understand "what" solr is locking in this case at
>>>>>>>>>>>>> "org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)"
>>>>>>>>>>>>> when performing an update? That will help me understand where to
>>>>>>>>>>>>> look next.
>>>>>>>>>>>>> 3) It seems all threads in this state are waiting for
>>>>>>>>>>>>> "0x00000007216e68d8", is there a way to tell what
>>>>>>>>>>>>> "0x00000007216e68d8" is?
>>>>>>>>>>>>> 4) Is there a limit to how many updates you can do in SolrCloud?
>>>>>>>>>>>>> 5) Wild-ass-theory: would more shards provide more locks (whatever
>>>>>>>>>>>>> they are) on update, and thus more update throughput?
>>>>>>>>>>>>>
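
(On question 2 above: the dump shows the update threads parked in
AdjustableSemaphore.acquire(), called from SolrCmdDistributor.submit(), so the
"lock" appears to be a counting semaphore that caps how many distributed
update requests a node keeps in flight at once. A minimal sketch of that
general pattern, not the actual Solr source, looks like this:)

    import java.util.concurrent.Semaphore;

    public class BoundedSubmitSketch {
        // A plain java.util.concurrent.Semaphore whose permits cap concurrent
        // forwarded updates; Solr's AdjustableSemaphore additionally lets the
        // permit count be resized.
        private final Semaphore permits = new Semaphore(16); // illustrative limit

        public void submit(Runnable forwardToReplica) throws InterruptedException {
            permits.acquire();          // threads park here when all permits are
                                        // taken, matching the trace above
            try {
                forwardToReplica.run(); // stand-in for sending the update on
            } finally {
                permits.release();      // if the forwarded request never returns,
                                        // this never runs and the permit is lost
            }
        }
    }

(If enough forwarded requests stall, permits run out and every incoming update
thread blocks in acquire(), which is one way a cluster under heavy update load
can wedge itself.)
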
>>>>>>>>>>>>> To those interested, I've provided a stacktrace of 1 of 3 nodes at
>>>>
