Dormando, sure, we will add the option to preset the hash table size (as
far as I can see, nn should be 26).
One question: as far as I can see in the logs for these servers, there
was no change in hash_power_level before the incident (it is hard to say
for the crashed ones, but the .20 servers only had outofmemory and I have
solid stats for them). Does not this
Zhiwei,
thank you for the info. But I am still not sure that this relates to hash
table growth (see my answer to Dormando in this thread), and it lasted for
3 hours and then disappeared... Or am I missing this part of the code
(do_item_alloc is small but with a fancy idea :) )?
-denis
On Tuesday, July 1, 2014
Cool. That is disappointing.
Can you clarify a few things for me:
1) You're saying that you were getting OOM's on slab 13, but it recovered
on its own? This is under version 1.4.20 and you did *not* enable tail
repairs?
2) Can you share (with me at least) the full stats/stats items/stats slabs
1) OOM's on slab 13, but it recovered on its own? This is under version
1.4.20 and you did *not* enable tail repairs?
correct
2) Can you share (with me at least) the full stats/stats items/stats
slabs output from one of the affected servers running 1.4.20?
sent you _current_ stats from the
Thanks!
This is a little exciting actually, it's a new bug!
Tail repairs were only necessary when an item was legitimately leaked; if
we don't reap it, it never gets better. However, you stated that for three
hours all sets failed (and at the same time some .15's crashed). Then it
self-recovered.
Hi,
With hash power 26 and slab 13, that means (2**26)*1.5*1488 bytes, about
140G (139.5 GiB), of memory is needed. Could you please post the stats
info to this thread, or send a copy to me too? And are those tons of
'allocation failure' from the system log, or from the outofmemory
statistic in memcached? Lastly, I think
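
For anyone who wants to re-check the estimate above, here is a minimal
sketch in Python. It only reproduces the formula as written in the message:
the 1.5 factor and the 1488-byte chunk size for slab 13 are taken from the
message itself, and the function and parameter names are made up for
illustration (they are not memcached identifiers).

```python
def estimate_bytes(hash_power, items_per_bucket=1.5, chunk_size=1488):
    """Memory estimate from the message: 2**hash_power * 1.5 * chunk size.

    Assumes every item lives in one slab class whose chunk size is
    `chunk_size` bytes (1488 is the value quoted for slab 13).
    """
    items = (2 ** hash_power) * items_per_bucket
    return int(items * chunk_size)


total = estimate_bytes(26)
print(total)             # 149786984448 bytes
print(total / 2 ** 30)   # 139.5 (GiB)
print(total / 10 ** 9)   # decimal gigabytes, a bit under 150
```

So the product comes out at 139.5 GiB, or a little under 150 GB in
decimal units.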
Hi all,
I have thought carefully about the thread safety of memcached recently,
and found that if the rebalance is running, it may not be thread-safe. The
code path do_item_get -> do_item_unlink_nolock may corrupt the hash table.
Whenever it tries to modify the hash table, it should take cache_lock,
The item lock is already held for that key when do_item_get is called,
which is why the nolock code is called there.
Slab rebalance has that second short-circuiting of fetches to ensure very
hot items don't permanently jam a page move.
On Wed, 2 Jul 2014, Zhiwei Chan wrote:
Hi all, I have