Scott,

My comment was meant to be a bit tongue-in-cheek, but my intent in the 
statement was to represent hard failure along the lines Vinay is seeing.  We're 
talking about OutOfMemoryError conditions, total cluster paralysis requiring a 
restart, or other similarly disastrous conditions.

Where that line sits is impossible to define generically, but crossing it is 
trivial.  What any of us running Solr has to do is realistically simulate our 
desired production load (probably well above peak) and see what limits are 
reached.  Armed with that information, we tweak.  In this case, we look for the 
point where data ingestion reaches a natural limit.  For some that may be JVM 
GC, for others memory buffer sizes on the indexing client, and for yet others 
I/O limits on multithreaded reads from a database or file system.

In old Solr days we had a little less to worry about.  We might play with a 
commitWithin parameter, tweak ramBufferSizeMB, or contemplate partial commits 
and rollback recoveries.  But with 4.x we now have more durable write options 
and NRT to consider, and SolrCloud begs us to use them.  So we have to consider 
transaction logs, the file handles they leave open until commit operations 
occur, and how we want to manage writing to all cores simultaneously instead of 
the narrower master/slave relationship.
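
To make that concrete, here's a rough sketch of the kind of solrconfig.xml 
settings in play (illustrative only; the thresholds are placeholders to tune 
against your own load tests, with the 30 second / 1 second values borrowed 
from Vinay's setup):

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- transaction log: its files stay open until a hard commit rolls them over -->
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
    <!-- hard commit: whichever of maxTime/maxDocs trips first wins -->
    <autoCommit>
      <maxTime>30000</maxTime>
      <maxDocs>10000</maxDocs>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <!-- soft commit: NRT visibility only, no tlog truncation -->
    <autoSoftCommit>
      <maxTime>1000</maxTime>
    </autoSoftCommit>
  </updateHandler>

The hard commit is what frees the transaction log file handles; the soft commit 
only controls search visibility.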

It's all manageable, all predictable (with some load testing), and all filled 
with many possibilities to meet our specific needs.  Considering that each 
person's data model, ingestion pipeline, request processors, and field analysis 
steps will be different, a figure like 5 threads of input, taken at face value, 
doesn't really capture the whole problem.  We have to measure our actual data 
against our expectations and find the weak links in the chain so we can 
strengthen them.  The symptoms aren't necessarily predictable in advance of 
this testing, but they're likely addressable and not difficult to decipher.

For what it's worth, SolrCloud is new enough that we're still experiencing some 
"uncharted territory with unknown ramifications," but with continued dialog 
through channels like these, there are fewer territories without good 
cartography :)

Hope that's of use!

Jason



On Jun 24, 2013, at 7:12 PM, Scott Lundgren <scott.lundg...@carbonblack.com> 
wrote:

> Jason,
> 
> Regarding your statement "push you over the edge"- what does that mean?
> Does it mean "uncharted territory with unknown ramifications" or something
> more like specific, known symptoms?
> 
> I ask because our use is similar to Vinay's in some respects, and we want
> to be able to push the capabilities of write perf - but not over the edge!
> In particular, I am interested in knowing the symptoms of failure, to help
> us troubleshoot the underlying problems if and when they arise.
> 
> Thanks,
> 
> Scott
> 
> On Monday, June 24, 2013, Jason Hellman wrote:
> 
>> Vinay,
>> 
>> You may wish to pay attention to how many transaction logs are being
>> created along the way to your hard autoCommit, which should truncate the
>> open handles for those files.  I might suggest setting a maxDocs value in
>> parallel with your maxTime value (you can use both) to ensure the commit
>> occurs at either breakpoint.  30 seconds is plenty of time for 5 parallel
>> processes of 20 document submissions to push you over the edge.
>> 
>> Jason
>> 
>> On Jun 24, 2013, at 2:21 PM, Vinay Pothnis <poth...@gmail.com> wrote:
>> 
>>> I have 'softAutoCommit' at 1 second and 'hardAutoCommit' at 30 seconds.
>>> 
>>> On Mon, Jun 24, 2013 at 1:54 PM, Jason Hellman <
>>> jhell...@innoventsolutions.com> wrote:
>>> 
>>>> Vinay,
>>>> 
>>>> What autoCommit settings do you have for your indexing process?
>>>> 
>>>> Jason
>>>> 
>>>> On Jun 24, 2013, at 1:28 PM, Vinay Pothnis <poth...@gmail.com> wrote:
>>>> 
>>>>> Here is the ulimit -a output:
>>>>> 
>>>>> core file size          (blocks, -c) 0
>>>>> data seg size           (kbytes, -d) unlimited
>>>>> scheduling priority             (-e) 0
>>>>> file size               (blocks, -f) unlimited
>>>>> pending signals                 (-i) 179963
>>>>> max locked memory       (kbytes, -l) 64
>>>>> max memory size         (kbytes, -m) unlimited
>>>>> open files                      (-n) 32769
>>>>> pipe size            (512 bytes, -p) 8
>>>>> POSIX message queues     (bytes, -q) 819200
>>>>> real-time priority              (-r) 0
>>>>> stack size              (kbytes, -s) 10240
>>>>> cpu time               (seconds, -t) unlimited
>>>>> max user processes              (-u) 140000
>>>>> virtual memory          (kbytes, -v) unlimited
>>>>> file locks                      (-x) unlimited
>>>>> 
>>>>> On Mon, Jun 24, 2013 at 12:47 PM, Yago Riveiro <yago.rive...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I have the same issue too, and my deployment is almost exactly like
>>>>>> Vinay's:
>>>>>> 
>>>>>> http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067862
>>>>>> 
>>>>>> With some concurrency and batches of 10, Solr apparently hits some
>>>>>> deadlock while distributing updates.
>>>>>> 
>>>>>> Can you dump the ulimit configuration on your servers?  Some people had
>>>>>> the same issue because they reached the ulimit maximums defined for
>>>>>> file descriptors and processes.
>>>>>> 
>>>>>> --
>>>>>> Yago Riveiro
>>>>>> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
>>>>>> 
>>>>>> 
>>>>>> On Monday, June 24, 2013 at 7:49 PM, Vinay Pothnis wrote:
>>>>>> 
>>>>>>> Hello All,
>>>>>>> 
>>>>>>> I have the following set up of solr cloud.
>>>>>>> 
>>>>>>> * solr version 4.3.1
>>>>>>> * 3 node solr cloud + replication factor 2
>>>>>>> * 3 zoo keepers
>>>>>>> * load balancer in front of the 3 solr nodes
>>>>>>> 
>>>>>>> I am seeing this strange behavior when I am indexing a large number of
>>>>>>> documents (10 mil). When I have more than 3-5 threads sending documents
>>>>>>> (in batches of 20) to Solr, sometimes Solr goes into a hung state. After
>>>>>>> this all the update requests get timed out. What we see via AppDynamics
>>>>>>> (a performance monitoring tool) is that there are a number of threads
>>>>>>> that are stalled. The stack trace for one of the threads is shown below.
>>>>>>> 
>>>>>>> The cluster has to be restarted to recover from this. When I reduce the
>>>>>>> 
> 
> 
> 
> -- 
> Scott Lundgren
> Director of Engineering
> Carbon Black, Inc.
> (210) 204-0483 | scott.lundg...@carbonblack.com
