Re: tail repair issue (1.4.20)

2014-07-02 Denis Samoylov
Dormando, sure, we will add the option to preset the hash table (as I see it, nn should be 26). One question: as I see in the logs for the servers, there was no change in hash_power_level before the incident (it would be hard to say for the crashed ones, but .20 just had outofmemory and I have solid stats). Doesn't this
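Denis's estimate that nn should be 26 can be checked with a small C helper. This is a sketch under assumptions, not memcached code: it assumes the 1.4.x rule that the hash table starts expanding once the item count exceeds 1.5 × 2^hashpower (the `hashsize(hashpower) * 3 / 2` check in assoc.c) and a default hashpower of 16; the function names are hypothetical. It returns the smallest hashpower whose growth threshold covers an expected item count, i.e. a value that could be preset at startup (for example with `-o hashpower=N`) so the table never has to grow at runtime.

```c
#include <stdint.h>

/* Assumed 1.4.x rule: expansion starts once the item count exceeds
 * hashsize(hashpower) * 3 / 2, where hashsize(n) is 2^n buckets. */
static uint64_t growth_threshold(unsigned hashpower) {
    return ((UINT64_C(1) << hashpower) * 3) / 2;
}

/* Smallest hashpower whose growth threshold is at least expected_items,
 * starting from 16 (assumed default). Capped at 63 to avoid overflow. */
static unsigned min_hashpower(uint64_t expected_items) {
    unsigned hp = 16;
    while (hp < 63 && growth_threshold(hp) < expected_items)
        hp++;
    return hp;
}
```

For example, min_hashpower(100000000) returns 26: 1.5 × 2^25 = 50,331,648 is too small, while 1.5 × 2^26 = 100,663,296 is enough.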

Re: tail repair issue (1.4.20)

2014-07-02 Denis Samoylov
Zhiwei, thank you for the info. But I'm still not sure that this relates to hash table growth (see my answer to Dormando in this thread), and it happened for 3 hours and then disappeared... Or am I missing this part of the code (do_item_alloc is small but has a fancy idea :) )? -denis On Tuesday, July 1, 2014

Re: tail repair issue (1.4.20)

2014-07-02 dormando
Cool. That is disappointing. Can you clarify a few things for me: 1) You're saying that you were getting OOMs on slab 13, but it recovered on its own? This is under version 1.4.20 and you did *not* enable tail repairs? 2) Can you share (with me at least) the full stats/stats items/stats slabs

Re: tail repair issue (1.4.20)

2014-07-02 Denis Samoylov
1) "OOMs on slab 13, but it recovered on its own? This is under version 1.4.20 and you did *not* enable tail repairs?" Correct. 2) "Can you share (with me at least) the full stats/stats items/stats slabs output from one of the affected servers running 1.4.20?" Sent you the _current_ stats from the

Re: tail repair issue (1.4.20)

2014-07-02 dormando
Thanks! This is actually a little exciting: it's a new bug! tailrepairs was only necessary when an item was legitimately leaked; if we don't reap it, it never gets better. However, you stated that for three hours all sets failed (and at the same time some .15s crashed). Then it self-recovered.
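Dormando's reasoning can be illustrated with a toy model of tail eviction. Everything below is hypothetical and simplified (toy_item, toy_find_victim, the tries and repair_time parameters); it is not memcached's do_item_alloc. An allocation scans a few items from the LRU tail and evicts the first one nobody holds a reference to; if every scanned item has a nonzero refcount, it reports out of memory. With repair disabled, a genuinely leaked refcount never drops, so the same call keeps failing no matter how much time passes. An optional repair step instead resets the refcount of an item idle longer than a threshold. Under this model, a failure that heals on its own points at references that were eventually released rather than at a leak.

```c
#include <stddef.h>

/* Hypothetical toy item, not memcached's item struct. */
typedef struct {
    unsigned refcount;     /* 0 = nobody is using the item */
    unsigned last_access;  /* seconds */
} toy_item;

/* Scan up to `tries` items from the LRU tail (index 0 = tail).
 * Returns the index of an evictable item, or -1 meaning "out of memory".
 * repair_time == 0 disables the toy tail repair. */
static int toy_find_victim(toy_item *tail, size_t n, size_t tries,
                           unsigned now, unsigned repair_time) {
    for (size_t i = 0; i < n && i < tries; i++) {
        toy_item *it = &tail[i];
        if (it->refcount == 0)
            return (int)i;                 /* normal eviction */
        if (repair_time != 0 && it->last_access + repair_time < now) {
            it->refcount = 0;              /* assume the reference leaked */
            return (int)i;
        }
    }
    return -1;                             /* every scanned item is held */
}
```

With two stuck tail items, toy_find_victim(t, 2, 5, now, 0) returns -1 for any value of now, while passing repair_time = 10800 reclaims the first item once now exceeds last_access + 10800.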

Re: tail repair issue (1.4.20)

2014-07-02 Zhiwei Chan
Hi, with hash power 26 and slab 13, that means (2**26)*1.5*1488 = 142G of memory is needed. Could you please post the stats info to this thread, or send a copy to me too? Also, are those tons of 'allocation failure' messages from the system log, or the outofmemory statistic in memcached? Finally, I think
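The arithmetic in Zhiwei's message can be checked with a short C helper. The name is hypothetical, and it assumes the same 1.5 × 2^hashpower growth threshold and that every item occupies exactly one 1488-byte chunk. For hashpower 26 it gives 100,663,296 items × 1488 bytes = 149,786,984,448 bytes, which is 142,848 MiB or 139.5 GiB, so "142G" is best read as roughly 142,848 MiB.

```c
#include <stdint.h>

/* Bytes needed for one slab class to hold enough items to reach the
 * assumed growth threshold (2^hashpower * 3 / 2 items), with every item
 * stored in one chunk of chunk_size bytes. */
static uint64_t bytes_at_threshold(unsigned hashpower, uint64_t chunk_size) {
    uint64_t items = ((UINT64_C(1) << hashpower) * 3) / 2;
    return items * chunk_size;
}
```

Calling bytes_at_threshold(26, 1488) and dividing by 2^20 gives the MiB figure; dividing by 2^30 gives the GiB figure.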

slab re-balance seems not thread-safe

2014-07-02 Zhiwei Chan
Hi all, I have recently thought carefully about thread safety in memcached, and found that while the re-balance is running, it may not be thread-safe. The code path do_item_get -> do_item_unlink_nolock may corrupt the hash table. Whenever it tries to modify the hash table, it should get cache_lock,

Re: slab re-balance seems not thread-safe

2014-07-02 dormando
The item lock is already held for that key when do_item_get is called, which is why the nolock code is called there. Slab rebalance has that second short-circuiting of fetches to ensure very hot items don't permanently jam a page move. On Wed, 2 Jul 2014, Zhiwei Chan wrote: Hi all, I have
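The locking convention dormando describes can be sketched with striped pthread mutexes. This is a minimal sketch under assumptions: every name is hypothetical, and it ignores hash table expansion, which needs extra coordination in a real server. The public entry point takes the stripe lock for the key's hash once; the `_nolock` helpers it calls, including the unlink, assume that lock is already held. Because each bucket maps to exactly one stripe, the unlink runs under the same lock as the lookup, so no other thread can touch that bucket chain in between. A default pthread mutex is not recursive, so locking it again inside the helper would be an error (typically a self-deadlock), which is why the helper must not lock.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_LOCK_POWER 4
#define TOY_NLOCKS (1u << TOY_LOCK_POWER)   /* 16 lock stripes */
#define TOY_NBUCKETS 64u                    /* multiple of TOY_NLOCKS */

typedef struct toy_node {
    uint32_t hv;             /* precomputed hash of the key */
    int value;
    struct toy_node *next;   /* bucket chain */
} toy_node;

static pthread_mutex_t toy_locks[TOY_NLOCKS];
static toy_node *toy_buckets[TOY_NBUCKETS];

/* Every bucket maps to exactly one stripe, so holding the stripe lock
 * protects the whole chain the key lives on. */
static pthread_mutex_t *toy_lock_for(uint32_t hv) {
    return &toy_locks[hv & (TOY_NLOCKS - 1)];
}

static void toy_init(void) {
    for (unsigned i = 0; i < TOY_NLOCKS; i++)
        pthread_mutex_init(&toy_locks[i], NULL);
}

/* Caller must already hold toy_lock_for(hv). Unlinks the node (the caller
 * still owns its memory) and returns it, or NULL if absent. */
static toy_node *toy_unlink_nolock(uint32_t hv) {
    toy_node **link = &toy_buckets[hv & (TOY_NBUCKETS - 1)];
    while (*link != NULL && (*link)->hv != hv)
        link = &(*link)->next;
    toy_node *found = *link;
    if (found != NULL)
        *link = found->next;
    return found;
}

/* Caller must already hold toy_lock_for(hv). If drop is set (think: the
 * item sits in a page being moved), unlink it instead of returning it. */
static toy_node *toy_get_nolock(uint32_t hv, int drop) {
    toy_node *n = toy_buckets[hv & (TOY_NBUCKETS - 1)];
    while (n != NULL && n->hv != hv)
        n = n->next;
    if (n != NULL && drop) {
        toy_unlink_nolock(hv);   /* no re-lock: same stripe is already held */
        return NULL;
    }
    return n;
}

/* Public entry points take the stripe lock exactly once. */
static int toy_get(uint32_t hv, int drop, int *value_out) {
    pthread_mutex_t *lk = toy_lock_for(hv);
    pthread_mutex_lock(lk);
    toy_node *n = toy_get_nolock(hv, drop);
    int hit = (n != NULL);
    if (hit)
        *value_out = n->value;
    pthread_mutex_unlock(lk);
    return hit;
}

static void toy_insert(toy_node *n) {
    pthread_mutex_t *lk = toy_lock_for(n->hv);
    pthread_mutex_lock(lk);
    toy_node **head = &toy_buckets[n->hv & (TOY_NBUCKETS - 1)];
    n->next = *head;
    *head = n;
    pthread_mutex_unlock(lk);
}
```

Keys with hashes 5 and 69 land in the same bucket and stripe here, so a get with drop set on 69 removes it from the chain while 5 stays reachable.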