I added logging to the resolvers to see how frequently I am receiving
siblings, and how many I get when a resolver is called.
Almost every call has only two siblings, and, although I am definitely creating
them, about 10 or so per minute, it seems to be handling that OK. It's not a
perfect test though
The ring state looks OK; the ring does not look polluted with random state. The
strange thing is why the get_fsm process 0.83.0 has a +100M heap. It would be
interesting to figure out what's on that heap, which you can learn from the
crash dump.
Perhaps you can load the crash dump into the
I'd think the large number of buckets could be the issue, especially if there
are any bucket properties being set, because that would cause the ring data
structure to be enormous.
Could you provide an ls -l output of the riak data/ring directory?
On 05/08/2013, at 21.52, Paul
Hey Kresten,
Thanks for the response!
I learned my lesson on setting bucket properties. So all buckets currently use
the defaults.
Here is the output from one of our nodes:
total 40
drwxr-xr-x 2 root root 4096 Aug 5 21:10 ./
drwxr-xr-x 6 root root 4096 Aug 4 17:26 ../
-rw-r--r-- 1 root
I watched top on all the instances when things started to fall apart. This is
what I saw…
Everything was jamming along just fine. CPU usage was about 25%, RAM usage was
about 25% (3 of the 7 were at about 15%).
Suddenly, CPU usage spikes to over 50% and RAM usage spikes to 80-90% (and I'm
Given your leveldb settings, I think that compaction is an unlikely
culprit. But check this out:
2013-08-05 18:01:15.878 [info] 0.83.0@riak_core_sysmon_handler:handle_event:92 monitor large_heap 0.14832.557
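To see whether large_heap events like the one above are isolated or arrive in bursts, one option is to tally them per minute of log time. The following is only a rough sketch: it relies solely on the parts of the line visible in this thread (timestamp, the "monitor large_heap" marker, and the pid), and the function name and the optional angle brackets around the pid are assumptions, not anything taken from Riak itself.

```python
import re
from collections import Counter

# Matches only what is visible in the quoted log line: a timestamp, the
# "monitor large_heap" marker, and the pid after it. The optional angle
# brackets around the pid are an assumption about the raw log format.
LARGE_HEAP = re.compile(
    r"^(?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d+ .*"
    r"monitor large_heap <?(?P<pid>\d+\.\d+\.\d+)>?"
)


def large_heap_per_minute(lines):
    """Count large_heap monitor events per minute of log time."""
    counts = Counter()
    for line in lines:
        match = LARGE_HEAP.search(line)
        if match:
            counts[match.group("minute")] += 1
    return counts
```

Feeding it the lines of a node's log (any iterable of strings works) gives a quick picture of when the heap growth started relative to the CPU and RAM spike.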
Interesting. I have sibling resolution code on the client side. Would sibling
explosion take out the entire cluster all at once? Within 5 minutes of my last
email, the rest of the cluster died.
Is there a way to quickly figure out whether the cluster is full of siblings?
Paul Ingalls
On the client you could extract the value_count of the objects you
read and just log them. Feel free to post code too, in particular, how
you are writing out updated values.
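As a language-neutral illustration of that suggestion, here is a minimal sketch that wraps a resolver so every call records how many siblings it was handed. It is written in Python rather than against the Java client, and SiblingStats, logging_resolver, and the warn_at threshold are all hypothetical names chosen for this example.

```python
from collections import Counter


class SiblingStats:
    """Records how many siblings each read returned."""

    def __init__(self, warn_at=5):
        self.warn_at = warn_at      # hypothetical threshold, tune to taste
        self.histogram = Counter()  # sibling count -> number of reads
        self.worst = {}             # key -> largest sibling count seen

    def record(self, key, siblings):
        count = len(siblings)
        self.histogram[count] += 1
        if count >= self.warn_at:
            self.worst[key] = max(count, self.worst.get(key, 0))
        return count


def logging_resolver(resolve, stats):
    """Wrap a resolver so every call is counted before it resolves."""
    def wrapped(key, siblings):
        stats.record(key, siblings)
        return resolve(siblings)
    return wrapped


stats = SiblingStats(warn_at=3)
resolver = logging_resolver(max, stats)  # toy resolver: keep the largest value
resolver("user:1", [1, 2])
resolver("user:2", [4, 9, 7])
print(dict(stats.histogram))  # {2: 1, 3: 1}
print(stats.worst)            # {'user:2': 3}
```

A histogram that stays concentrated at one or two siblings suggests resolution is keeping up; a long tail, or keys that keep reappearing in worst, would point toward sibling explosion.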
On Mon, Aug 5, 2013 at 9:20 PM, Paul Ingalls p...@fanzo.me wrote:
Interesting. I have sibling resolution code on the
I'm currently using the Java client and its ConflictResolver and Mutator
interfaces. In some cases I am just doing a store, and letting the client do
an implicit fetch and the mutator make the actual change. In other cases
I'm doing an explicit fetch, modifying the result, and then doing a store
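The fetch, resolve, mutate, store cycle described here can be sketched with a toy in-memory store. This is a deliberately simplified model, not Riak's vector clock logic and not the Java client API: ToyStore, update, and the set-of-versions "context" are hypothetical stand-ins, used only to show why a store that carries the context from a fetch collapses siblings while a store without it adds one.

```python
import itertools


class ToyStore:
    """In-memory stand-in for a bucket that keeps siblings.

    Each key maps to a list of (version, value) siblings. A store that
    carries the versions it read replaces exactly those siblings; a store
    with no context is kept alongside whatever is already there.
    """

    def __init__(self):
        self._data = {}
        self._versions = itertools.count(1)

    def fetch(self, key):
        siblings = self._data.get(key, [])
        context = frozenset(version for version, _ in siblings)
        values = [value for _, value in siblings]
        return values, context

    def store(self, key, value, context=frozenset()):
        # Drop only the siblings this writer has already seen.
        survivors = [s for s in self._data.get(key, []) if s[0] not in context]
        survivors.append((next(self._versions), value))
        self._data[key] = survivors


def update(store, key, resolve, mutate):
    """Fetch, resolve siblings to one value, mutate it, store with context."""
    values, context = store.fetch(key)
    current = resolve(values) if values else None
    store.store(key, mutate(current), context)


store = ToyStore()
store.store("counter", 1)  # blind write, no context
store.store("counter", 2)  # second blind write: now two siblings
print(store.fetch("counter")[0])  # [1, 2]
update(store, "counter", max, lambda v: (v or 0) + 1)
print(store.fetch("counter")[0])  # [3]
```

In this model the explicit fetch, modify, store path only stays sibling-free if the store reuses the context returned by the fetch, which is the detail worth checking in the real client code.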