Right, unfortunately this is a gremlin lurking in the weeds, see:
http://wiki.apache.org/solr/DistributedSearch#Distributed_Deadlock

There are a couple of ways to deal with this:
1> go ahead and up the limit and re-compile; if you look at
SolrCmdDistributor, the semaphore is defined there.

2> https://issues.apache.org/jira/browse/SOLR-4816 should
address this as well as improve indexing throughput. I'm sure
Joel (the guy working on this) would be thrilled if you were able to
verify both of those points; I'd ask him (on the JIRA) whether he thinks
it's ready to test.

3> Reduce the number of threads you're indexing with

4> index docs in small packets, perhaps even one at a time, and just rack
up a zillion threads to get throughput (see the sketch below for 3> and 4>).
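
Just to make 3> and 4> concrete, here's a rough, untested SolrJ sketch of the
kind of client I mean -- a handful of threads, small batches, one shared
CloudSolrServer. The zkHost, collection name, and field names are made up, so
treat it as an illustration rather than a drop-in program:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SmallBatchIndexer {
  public static void main(String[] args) throws Exception {
    // One shared client for the whole cluster; zkHost/collection are placeholders.
    final CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
    solr.setDefaultCollection("mycollection");

    final int numThreads = 3;   // the knob from 3>: how many indexing threads
    final int batchSize = 5;    // the knob from 4>: how many docs per update request

    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    for (int t = 0; t < numThreads; t++) {
      final int threadId = t;
      pool.submit(new Runnable() {
        public void run() {
          try {
            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            for (int i = 0; i < 100000; i++) {
              SolrInputDocument doc = new SolrInputDocument();
              doc.addField("id", threadId + "-" + i);
              doc.addField("text_t", "document body " + i);
              batch.add(doc);
              if (batch.size() >= batchSize) {
                solr.add(batch);       // rely on autoCommit; no explicit commit per batch
                batch.clear();
              }
            }
            if (!batch.isEmpty()) solr.add(batch);
          } catch (Exception e) {
            e.printStackTrace();       // real code would retry/log properly
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
    solr.shutdown();
  }
}

The idea is simply to keep the number of concurrent update requests coming
from the client modest and tunable while you find where the cluster tops out.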

FWIW,
Erick

On Tue, Jun 25, 2013 at 8:55 AM, Vinay Pothnis <poth...@gmail.com> wrote:
> Jason and Scott,
>
> Thanks for the replies and pointers!
> Yes, I will consider the 'maxDocs' value as well. How do I monitor the
> transaction logs during the interval between commits?
>
> Thanks
> Vinay
>
>
> On Mon, Jun 24, 2013 at 8:48 PM, Jason Hellman <
> jhell...@innoventsolutions.com> wrote:
>
>> Scott,
>>
>> My comment was meant to be a bit tongue-in-cheek, but my intent in the
>> statement was to represent hard failure along the lines Vinay is seeing.
>>  We're talking about OutOfMemoryException conditions, total cluster
>> paralysis requiring restart, or other similarly disastrous conditions.
>>
>> Where that line is is impossible to define generically, but trivial to find
>> in practice.  What any of us running Solr has to do is run a realistic
>> simulation of our desired production load (probably well above peak) and
>> see what limits are reached.  Armed with that information we tweak.  In
>> this case, we look for the point where data ingestion reaches a natural
>> limit.  For some that may be JVM GC, for others the memory buffer size on
>> the client doing the loading, and for yet others I/O limits on multithreaded
>> reads from a database or file system.
>>
>> In old Solr days we had a little less to worry about.  We might play with
>> a commitWithin parameter, ramBufferSizeMB tweaks, or contemplate partial
>> commits and rollback recoveries.  But with 4.x we now have more durable
>> write options and NRT to consider, and SolrCloud begs us to use them.  So we
>> have to consider transaction logs, the file handles they leave open until
>> commit operations occur, and how we want to manage writing to all cores
>> simultaneously instead of a more narrow master/slave relationship.
>>
>> It's all manageable, all predictable (with some load testing) and all
>> filled with many possibilities to meet our specific needs.  Considering that
>> each person's data model, ingestion pipeline, request processors, and field
>> analysis steps will be different, 5 threads of input at face value doesn't
>> really contemplate the whole problem.  We have to measure our actual data
>> against our expectations and find where the weak chain links are to
>> strengthen them.  The symptoms aren't necessarily predictable in advance of
>> this testing, but they're likely addressable and not difficult to decipher.
>>
>> For what it's worth, SolrCloud is new enough that we're still experiencing
>> some "uncharted territory with unknown ramifications" but with continued
>> dialog through channels like these there are fewer territories without good
>> cartography :)
>>
>> Hope that's of use!
>>
>> Jason
>>
>>
>>
>> On Jun 24, 2013, at 7:12 PM, Scott Lundgren <
>> scott.lundg...@carbonblack.com> wrote:
>>
>> > Jason,
>> >
>> > Regarding your statement "push you over the edge" - what does that mean?
>> > Does it mean "uncharted territory with unknown ramifications" or something
>> > more like specific, known symptoms?
>> >
>> > I ask because our use is similar to Vinay's in some respects, and we want
>> > to be able to push the capabilities of write perf - but not over the edge!
>> > In particular, I am interested in knowing the symptoms of failure, to help
>> > us troubleshoot the underlying problems if and when they arise.
>> >
>> > Thanks,
>> >
>> > Scott
>> >
>> > On Monday, June 24, 2013, Jason Hellman wrote:
>> >
>> >> Vinay,
>> >>
>> >> You may wish to pay attention to how many transaction logs are being
>> >> created along the way to your hard autoCommit, which should truncate the
>> >> open handles for those files.  I might suggest setting a maxDocs value in
>> >> parallel with your maxTime value (you can use both) to ensure the commit
>> >> occurs at either breakpoint.  30 seconds is plenty of time for 5 parallel
>> >> processes of 20 document submissions to push you over the edge.
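>> >>
>> >> As a rough illustration only (the numbers are placeholders to tune against
>> >> your own load tests, not a recommendation), the relevant solrconfig.xml
>> >> section might look something like:
>> >>
>> >>   <autoCommit>
>> >>     <maxTime>30000</maxTime>   <!-- hard commit within 30s of the first uncommitted doc... -->
>> >>     <maxDocs>10000</maxDocs>   <!-- ...or once 10,000 docs are pending, whichever comes first -->
>> >>     <openSearcher>false</openSearcher>
>> >>   </autoCommit>
>> >>
>> >>   <autoSoftCommit>
>> >>     <maxTime>1000</maxTime>    <!-- soft commit each second for NRT visibility -->
>> >>   </autoSoftCommit>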
>> >>
>> >> Jason
>> >>
>> >> On Jun 24, 2013, at 2:21 PM, Vinay Pothnis <poth...@gmail.com> wrote:
>> >>
>> >>> I have 'softAutoCommit' at 1 second and 'hardAutoCommit' at 30 seconds.
>> >>>
>> >>> On Mon, Jun 24, 2013 at 1:54 PM, Jason Hellman <
>> >>> jhell...@innoventsolutions.com> wrote:
>> >>>
>> >>>> Vinay,
>> >>>>
>> >>>> What autoCommit settings do you have for your indexing process?
>> >>>>
>> >>>> Jason
>> >>>>
>> >>>> On Jun 24, 2013, at 1:28 PM, Vinay Pothnis <poth...@gmail.com> wrote:
>> >>>>
>> >>>>> Here is the ulimit -a output:
>> >>>>>
>> >>>>> core file size          (blocks, -c) 0
>> >>>>> data seg size           (kbytes, -d) unlimited
>> >>>>> scheduling priority             (-e) 0
>> >>>>> file size               (blocks, -f) unlimited
>> >>>>> pending signals                 (-i) 179963
>> >>>>> max locked memory       (kbytes, -l) 64
>> >>>>> max memory size         (kbytes, -m) unlimited
>> >>>>> open files                      (-n) 32769
>> >>>>> pipe size            (512 bytes, -p) 8
>> >>>>> POSIX message queues     (bytes, -q) 819200
>> >>>>> real-time priority              (-r) 0
>> >>>>> stack size              (kbytes, -s) 10240
>> >>>>> cpu time               (seconds, -t) unlimited
>> >>>>> max user processes              (-u) 140000
>> >>>>> virtual memory          (kbytes, -v) unlimited
>> >>>>> file locks                      (-x) unlimited
>> >>>>>
>> >>>>> On Mon, Jun 24, 2013 at 12:47 PM, Yago Riveiro <yago.rive...@gmail.com> wrote:
>> >>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> I have the same issue too, and the deployment is almost exactly like
>> >>>>>> mine:
>> >>>>>>
>> >>>>>> http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067862
>> >>>>>>
>> >>>>>> With some concurrency and batches of 10, Solr apparently hits some
>> >>>>>> deadlock distributing updates.
>> >>>>>>
>> >>>>>> Can you dump the ulimit configuration on your servers?  Some people
>> >>>>>> have had the same issues because they were hitting the ulimit maximums
>> >>>>>> defined for file descriptors and processes.
>> >>>>>>
>> >>>>>> --
>> >>>>>> Yago Riveiro
>> >>>>>> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
>> >>>>>>
>> >>>>>>
>> >>>>>> On Monday, June 24, 2013 at 7:49 PM, Vinay Pothnis wrote:
>> >>>>>>
>> >>>>>>> Hello All,
>> >>>>>>>
>> >>>>>>> I have the following set up of solr cloud.
>> >>>>>>>
>> >>>>>>> * solr version 4.3.1
>> >>>>>>> * 3 node solr cloud + replication factor 2
>> >>>>>>> * 3 zoo keepers
>> >>>>>>> * load balancer in front of the 3 solr nodes
>> >>>>>>>
>> >>>>>>> I am seeing this strange behavior when I am indexing a large number of
>> >>>>>>> documents (10 mil). When I have more than 3-5 threads sending documents
>> >>>>>>> (in batches of 20) to solr, sometimes solr goes into a hung state. After
>> >>>>>>> this all the update requests get timed out. What we see via AppDynamics
>> >>>>>>> (a performance monitoring tool) is that there are a number of threads
>> >>>>>>> that are stalled. The stack trace for one of the threads is shown below.
>> >>>>>>>
>> >>>>>>> The cluster has to be restarted to recover from this. When I reduce the
>> >>>>>>>
>> >
>> >
>> >
>> > --
>> > Scott Lundgren
>> > Director of Engineering
>> > Carbon Black, Inc.
>> > (210) 204-0483 | scott.lundg...@carbonblack.com
>>
>>
