Re: riak cluster suddenly became unresponsive

2013-03-31 Thread Matthew Von-Maszewski
Ingo, I have two guesses that might explain the symptoms: - there is a bad drive in one of the nodes, or - one or more nodes begin to use swap space during a compaction or 2i iteration. I might be able to describe / isolate the problem by examining the "LOG" files produced by leveldb. Wou

Re: riak cluster suddenly became unresponsive

2013-03-27 Thread Ingo Rockel
Hi Mark, we have updated to riak 1.3 and raised zdbbl to 32MB but still run into the described phenomenon. 9 of our 12 nodes suddenly and drastically drop their cpu utilisation; 3 nodes still have a normal cpu load, but have some "busy_dist_port" messages in the console.log (a lot less since

mailing list headers (was Re: riak cluster suddenly became unresponsive)

2013-03-19 Thread Justin Sheehy
Hi, Ingo. On Mar 19, 2013, at 10:41 AM, Ingo Rockel wrote: > and the riak-users mailer-daemon should really set a "reply-to"… Most email client programs have two well-understood controls for replies, one for "reply (to sender)" and one for "reply to all." We are not going to make one of them b

Re: Re: riak cluster suddenly became unresponsive

2013-03-19 Thread Evan Vigil-McClanahan
Subject: Re: riak cluster suddenly became unresponsive > Date: Tue, 19 Mar 2013 15:40:12 +0100 > From: Ingo Rockel > To: Mark Phillips > > Hi Mark, > > thanks! > > The 1.3 update is already planned. > > But we will add the zdbbl first as we ran into the same is

Fwd: Re: riak cluster suddenly became unresponsive

2013-03-19 Thread Ingo Rockel
and the riak-users mailer-daemon should really set a "reply-to"... Original Message Subject: Re: riak cluster suddenly became unresponsive Date: Tue, 19 Mar 2013 15:40:12 +0100 From: Ingo Rockel To: Mark Phillips Hi Mark, thanks! The 1.3 update is already pla

Re: riak cluster suddenly became unresponsive

2013-03-19 Thread Mark Phillips
Hi Ingo, Sorry for the delay in getting back to you. This looks symptomatic of some of the scheduler issues we fixed in 1.3. A few of the eleveldb issues in the release notes [1] can provide precise details. Is upgrading a possibility? Tweaking your zdbbl in vm.args should alleviate some of t

riak cluster suddenly became unresponsive

2013-03-15 Thread Ingo Rockel
Hi, we have a 12-node cluster running riak 1.2.1 which went live a week ago. Yesterday, suddenly, from one minute to the next, the put_fsm_time_95 and the get_fsm_time_95 rose from something below 100ms up to several seconds. This went on for about 25 min and then went away. Checking the ria