Hi Dormando,

I am still able to repeatedly reproduce those "Out of memory during read" 
errors during large item sets with the latest v1.6.22.
Please see my bug report at 
https://github.com/memcached/memcached/issues/1096

Thanks,
Jianjian

On Monday, November 28, 2022 at 12:25:12 AM UTC-8 Danny Kopping wrote:

To add another datapoint here, we at Grafana Labs use memcached extensively 
in our cloud and this fix made a massive impact on our cache effectiveness:
https://user-images.githubusercontent.com/373762/204228886-7c5a759a-927c-46fb-ae55-3e0b4056ebae.png

Thank you very much to you both for the investigation and bugfix!

On Saturday, August 27, 2022 at 8:53:47 AM UTC+2 Dormando wrote:

Thanks for taking the time to evaluate! It helps my confidence level with 
the fix. 

You caught me at a good time :) Been really behind with fixes for quite a 
while and only catching up this week. I've looked at this a few times and 
didn't see the easy fix before... 

I think earlier versions of the item chunking code were more fragile and I 
didn't revisit it after the cleanup work. In this case each chunk 
remembers its original slab class, so having the final chunk be from an 
unintended class doesn't break anything. Otherwise freeing the chunks 
would be impossible if I had to recalculate their original slab class from 
the chunk size. 

So now it'll use too much memory in some cases, and lowering slab chunk 
max would ease that a bit... so maybe soon it will finally be a good time 
to lower the default chunk max a little, to at least 128k or 256k. 

-Dormando 

On Fri, 26 Aug 2022, Hayden wrote: 

> I didn't see the docker files in the repo that could build the docker 
> image, and when I tried cloning the git repo and doing a docker build I 
> encountered errors that I think were related to the web proxy on my work 
> network. I was able to grab the release tarball and the bitnami docker 
> file, do a little surgery to work around my proxy issue, and build a 
> 1.6.17 docker image though. 
> 
> I ran my application against the new version and it ran for ~2hr without 
> any errors (it previously wouldn't run more than 30s or so before 
> encountering blocks of the OOM during read errors). I also made a little 
> test loop that just hammered the instance with similar sized writes 
> (1-2MB) as fast as it could and let it run a few hours, and it didn't 
> have a single blip. That encompassed a couple million evictions. I'm 
> pretty comfortable saying the issue is fixed, at least for the kind of 
> use I had in mind. 
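> 
> For reference, the hammer loop was nothing fancy. A rough sketch of the 
> kind of thing I ran (pymemcache, with made-up key names, payload sizes, 
> and endpoint; not the exact script) looks like this: 
> 
>     import os 
>     import random 
>     from pymemcache.client.base import Client 
> 
>     # hypothetical endpoint; point it at wherever the 1.6.17 image runs 
>     client = Client(("memcached", 11211)) 
> 
>     i = 0 
>     while True: 
>         # 1-2MB of random bytes, similar to the frame sizes in my real app 
>         payload = os.urandom(random.randint(1_000_000, 2_000_000)) 
>         try: 
>             # noreply=False so any "out of memory" server error surfaces 
>             client.set(f"stress:{i % 10000}", payload, expire=300, 
>                        noreply=False) 
>         except Exception as exc: 
>             print("set failed:", exc) 
>         i += 1 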
> 
> I added a comment to the issue on GitHub to the same effect. 
> 
> I'm impressed by the quick turnaround, BTW. ;-) 
> 
> H 
> 
> On Friday, August 26, 2022 at 5:54:26 PM UTC-7 Dormando wrote: 
> So I tested this a bit more and released it in 1.6.17; I think bitnami 
> should pick it up soonish. if not I'll try to figure out docker this 
> weekend if you still need it. 
> 
> I'm not 100% sure it'll fix your use case but it does fix some things I 
> can test and it didn't seem like a regression. would be nice to validate 
> still. 
> 
> On Fri, 26 Aug 2022, dormando wrote: 
> 
> > You can't build docker images or compile binaries? there's a 
> > docker-compose.yml in the repo already if that helps. 
> > 
> > If not I can try but I don't spend a lot of time with docker directly. 
> > 
> > On Fri, 26 Aug 2022, Hayden wrote: 
> > 
> > > I'd be happy to help validate the fix, but I can't do it until the 
> > > weekend, and I don't have a ready way to build an updated image. Any 
> > > chance you could create a docker image with the fix that I could grab 
> > > from somewhere? 
> > > 
> > > On Friday, August 26, 2022 at 10:38:54 AM UTC-7 Dormando wrote: 
> > > I have an opportunity to put this fix into a release today if anyone 
> > > wants to help validate :) 
> > > 
> > > On Thu, 25 Aug 2022, dormando wrote: 
> > > 
> > > > Took another quick look... 
> > > > 
> > > > Think there's an easy patch that might work: 
> > > > https://github.com/memcached/memcached/pull/924 
> > > > 
> > > > If you wouldn't mind helping validate? An external validator would 
> > > > help me get it in time for the next release :) 
> > > > 
> > > > Thanks, 
> > > > -Dormando 
> > > > 
> > > > On Wed, 24 Aug 2022, dormando wrote: 
> > > > 
> > > > > Hey, 
> > > > > 
> > > > > Thanks for the info. Yes; this generally confirms the issue. I see 
> > > > > some of your higher slab classes with "free_chunks 0", so if you're 
> > > > > setting data that requires these chunks it could error out. The 
> > > > > "stats items" confirms this since there are no actual items in 
> > > > > those lower slab classes. 
> > > > > 
> > > > > You're certainly right a workaround of making your items < 512k 
> > > > > would also work; but in general if I have features it'd be nice if 
> > > > > they worked well :) Please open an issue so we can improve things! 
> > > > > 
> > > > > I intended to lower the slab_chunk_max default from 512k to much 
> > > > > lower, as that actually raises the memory efficiency by a bit (less 
> > > > > gap at the higher classes). That may help here. The system should 
> > > > > also try ejecting items from the highest LRU... I need to double 
> > > > > check that it wasn't already intending to do that and failing. 
> > > > > 
> > > > > Might also be able to adjust the page mover but not sure. The page 
> > > > > mover can probably be adjusted to attempt to keep one page in 
> > > > > reserve, but I think the algorithm isn't expecting slabs with no 
> > > > > items in it so I'd have to audit that too. 
> > > > > 
> > > > > If you're up for experiments it'd be interesting to know if setting 
> > > > > "-o slab_chunk_max=32768" or 16k (probably not more than 64) makes 
> > > > > things better or worse. 
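> > > > > 
> > > > > If it helps, here's a rough client-side check for that experiment 
> > > > > (a pymemcache sketch with a made-up host; slab_chunk_max should be 
> > > > > reported in bytes under "stats settings" if I remember right): 
> > > > > 
> > > > >     from pymemcache.client.base import Client 
> > > > > 
> > > > >     client = Client(("memcached", 11211)) 
> > > > > 
> > > > >     # confirm the override actually took effect 
> > > > >     for key, value in client.stats("settings").items(): 
> > > > >         name = key.decode() if isinstance(key, bytes) else key 
> > > > >         if name == "slab_chunk_max": 
> > > > >             print("slab_chunk_max:", value) 
> > > > > 
> > > > >     # then watch whether the smaller classes keep any free chunks 
> > > > >     # while the pipeline runs (the largest class hoarding 
> > > > >     # everything is the bad case) 
> > > > >     for key, value in sorted(client.stats("slabs").items()): 
> > > > >         name = key.decode() if isinstance(key, bytes) else key 
> > > > >         if name.endswith(":free_chunks"): 
> > > > >             print(name, value) 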
> > > > > 
> > > > > Also, crud.. it's documented as kilobytes but that's not working 
> > > > > somehow? aaahahah. I guess the big EXPERIMENTAL tag scared people 
> > > > > off since that never got reported. 
> > > > > 
> > > > > I'm guessing most people have a mix of small to large items, but 
> > > > > you only have large items and a relatively low memory limit, so 
> > > > > this is why you're seeing it so easily. I think most people setting 
> > > > > large items have like 30G+ of memory so you end up with more spread 
> > > > > around. 
> > > > > 
> > > > > Thanks, 
> > > > > -Dormando 
> > > > > 
> > > > > On Wed, 24 Aug 2022, Hayden wrote: 
> > > > > 
> > > > > > What you're saying makes sense, and I'm pretty sure it won't be 
> > > > > > too hard to add some functionality to my writing code to break my 
> > > > > > large items up into smaller parts that can each fit into a single 
> > > > > > chunk. That has the added benefit that I won't have to bother 
> > > > > > increasing the max item size. 
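> > > > > > 
> > > > > > Something like this is what I have in mind for the split (a rough 
> > > > > > pymemcache sketch with made-up names and sizes, not tested yet): 
> > > > > > 
> > > > > >     from pymemcache.client.base import Client 
> > > > > > 
> > > > > >     client = Client(("memcached", 11211)) 
> > > > > > 
> > > > > >     # stay under the 512k chunk size, with headroom for the item 
> > > > > >     # header and key 
> > > > > >     PART_SIZE = 500 * 1024 
> > > > > > 
> > > > > >     def set_frame(key, data, expire=300): 
> > > > > >         parts = [data[i:i + PART_SIZE] 
> > > > > >                  for i in range(0, len(data), PART_SIZE)] 
> > > > > >         # store the part count so the reader knows what to fetch 
> > > > > >         client.set(key, str(len(parts)).encode(), expire=expire) 
> > > > > >         for n, part in enumerate(parts): 
> > > > > >             client.set(f"{key}:{n}", part, expire=expire) 
> > > > > > 
> > > > > >     def get_frame(key): 
> > > > > >         count = client.get(key) 
> > > > > >         if count is None: 
> > > > > >             return None 
> > > > > >         parts = [client.get(f"{key}:{n}") 
> > > > > >                  for n in range(int(count))] 
> > > > > >         if any(p is None for p in parts): 
> > > > > >             return None  # a piece was evicted; treat it as a miss 
> > > > > >         return b"".join(parts) 
> > > > > > 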
> > > > > > In the meantime, though, I reran my pipeline and captured the 
> > > > > > output of stats, stats slabs, and stats items both when evicting 
> > > > > > normally and when getting spammed with the error. 
> > > > > > 
> > > > > > First, the output when I'm in the error state: 
> > > > > > **** Output of stats 
> > > > > > STAT pid 1 
> > > > > > STAT uptime 11727 
> > > > > > STAT time 1661406229 
> > > > > > STAT version b'1.6.14' 
> > > > > > STAT libevent b'2.1.8-stable' 
> > > > > > STAT pointer_size 64 
> > > > > > STAT rusage_user 2.93837 
> > > > > > STAT rusage_system 6.339015 
> > > > > > STAT max_connections 1024 
> > > > > > STAT curr_connections 2 
> > > > > > STAT total_connections 8230 
> > > > > > STAT rejected_connections 0 
> > > > > > STAT connection_structures 6 
> > > > > > STAT response_obj_oom 0 
> > > > > > STAT response_obj_count 1 
> > > > > > STAT response_obj_bytes 65536 
> > > > > > STAT read_buf_count 8 
> > > > > > STAT read_buf_bytes 131072 
> > > > > > STAT read_buf_bytes_free 49152 
> > > > > > STAT read_buf_oom 0 
> > > > > > STAT reserved_fds 20 
> > > > > > STAT cmd_get 0 
> > > > > > STAT cmd_set 12640 
> > > > > > STAT cmd_flush 0 
> > > > > > STAT cmd_touch 0 
> > > > > > STAT cmd_meta 0 
> > > > > > STAT get_hits 0 
> > > > > > STAT get_misses 0 
> > > > > > STAT get_expired 0 
> > > > > > STAT get_flushed 0 
> > > > > > STAT delete_misses 0 
> > > > > > STAT delete_hits 0 
> > > > > > STAT incr_misses 0 
> > > > > > STAT incr_hits 0 
> > > > > > STAT decr_misses 0 
> > > > > > STAT decr_hits 0 
> > > > > > STAT cas_misses 0 
> > > > > > STAT cas_hits 0 
> > > > > > STAT cas_badval 0 
> > > > > > STAT touch_hits 0 
> > > > > > STAT touch_misses 0 
> > > > > > STAT store_too_large 0 
> > > > > > STAT store_no_memory 0 
> > > > > > STAT auth_cmds 0 
> > > > > > STAT auth_errors 0 
> > > > > > STAT bytes_read 21755739959 
> > > > > > STAT bytes_written 330909 
> > > > > > STAT limit_maxbytes 5368709120 
> > > > > > STAT accepting_conns 1 
> > > > > > STAT listen_disabled_num 0 
> > > > > > STAT time_in_listen_disabled_us 0 
> > > > > > STAT threads 4 
> > > > > > STAT conn_yields 0 
> > > > > > STAT hash_power_level 16 
> > > > > > STAT hash_bytes 524288 
> > > > > > STAT hash_is_expanding False 
> > > > > > STAT slab_reassign_rescues 0 
> > > > > > STAT slab_reassign_chunk_rescues 0 
> > > > > > STAT slab_reassign_evictions_nomem 0 
> > > > > > STAT slab_reassign_inline_reclaim 0 
> > > > > > STAT slab_reassign_busy_items 0 
> > > > > > STAT slab_reassign_busy_deletes 0 
> > > > > > STAT slab_reassign_running False 
> > > > > > STAT slabs_moved 0 
> > > > > > STAT lru_crawler_running 0 
> > > > > > STAT lru_crawler_starts 20 
> > > > > > STAT lru_maintainer_juggles 71777 
> > > > > > STAT malloc_fails 0 
> > > > > > STAT log_worker_dropped 0 
> > > > > > STAT log_worker_written 0 
> > > > > > STAT log_watcher_skipped 0 
> > > > > > STAT log_watcher_sent 0 
> > > > > > STAT log_watchers 0 
> > > > > > STAT unexpected_napi_ids 0 
> > > > > > STAT round_robin_fallback 0 
> > > > > > STAT bytes 5241499325 
> > > > > > STAT curr_items 4211 
> > > > > > STAT total_items 12640 
> > > > > > STAT slab_global_page_pool 0 
> > > > > > STAT expired_unfetched 0 
> > > > > > STAT evicted_unfetched 8429 
> > > > > > STAT evicted_active 0 
> > > > > > STAT evictions 8429 
> > > > > > STAT reclaimed 0 
> > > > > > STAT crawler_reclaimed 0 
> > > > > > STAT crawler_items_checked 4212 
> > > > > > STAT lrutail_reflocked 0 
> > > > > > STAT moves_to_cold 11872 
> > > > > > STAT moves_to_warm 0 
> > > > > > STAT moves_within_lru 0 
> > > > > > STAT direct_reclaims 55559 
> > > > > > STAT lru_bumps_dropped 0 
> > > > > > END 
> > > > > > **** Output of stats slabs 
> > > > > > STAT 2:chunk_size 120 
> > > > > > STAT 2:chunks_per_page 8738 
> > > > > > STAT 2:total_pages 1 
> > > > > > STAT 2:total_chunks 8738 
> > > > > > STAT 2:used_chunks 4211 
> > > > > > STAT 2:free_chunks 4527 
> > > > > > STAT 2:free_chunks_end 0 
> > > > > > STAT 2:get_hits 0 
> > > > > > STAT 2:cmd_set 0 
> > > > > > STAT 2:delete_hits 0 
> > > > > > STAT 2:incr_hits 0 
> > > > > > STAT 2:decr_hits 0 
> > > > > > STAT 2:cas_hits 0 
> > > > > > STAT 2:cas_badval 0 
> > > > > > STAT 2:touch_hits 0 
> > > > > > STAT 30:chunk_size 66232 
> > > > > > STAT 30:chunks_per_page 15 
> > > > > > STAT 30:total_pages 1 
> > > > > > STAT 30:total_chunks 15 
> > > > > > STAT 30:used_chunks 3 
> > > > > > STAT 30:free_chunks 12 
> > > > > > STAT 30:free_chunks_end 0 
> > > > > > STAT 30:get_hits 0 
> > > > > > STAT 30:cmd_set 0 
> > > > > > STAT 30:delete_hits 0 
> > > > > > STAT 30:incr_hits 0 
> > > > > > STAT 30:decr_hits 0 
> > > > > > STAT 30:cas_hits 0 
> > > > > > STAT 30:cas_badval 0 
> > > > > > STAT 30:touch_hits 0 
> > > > > > STAT 31:chunk_size 82792 
> > > > > > STAT 31:chunks_per_page 12 
> > > > > > STAT 31:total_pages 1 
> > > > > > STAT 31:total_chunks 12 
> > > > > > STAT 31:used_chunks 6 
> > > > > > STAT 31:free_chunks 6 
> > > > > > STAT 31:free_chunks_end 0 
> > > > > > STAT 31:get_hits 0 
> > > > > > STAT 31:cmd_set 0 
> > > > > > STAT 31:delete_hits 0 
> > > > > > STAT 31:incr_hits 0 
> > > > > > STAT 31:decr_hits 0 
> > > > > > STAT 31:cas_hits 0 
> > > > > > STAT 31:cas_badval 0 
> > > > > > STAT 31:touch_hits 0 
> > > > > > STAT 32:chunk_size 103496 
> > > > > > STAT 32:chunks_per_page 10 
> > > > > > STAT 32:total_pages 19 
> > > > > > STAT 32:total_chunks 190 
> > > > > > STAT 32:used_chunks 183 
> > > > > > STAT 32:free_chunks 7 
> > > > > > STAT 32:free_chunks_end 0 
> > > > > > STAT 32:get_hits 0 
> > > > > > STAT 32:cmd_set 0 
> > > > > > STAT 32:delete_hits 0 
> > > > > > STAT 32:incr_hits 0 
> > > > > > STAT 32:decr_hits 0 
> > > > > > STAT 32:cas_hits 0 
> > > > > > STAT 32:cas_badval 0 
> > > > > > STAT 32:touch_hits 0 
> > > > > > STAT 33:chunk_size 129376 
> > > > > > STAT 33:chunks_per_page 8 
> > > > > > STAT 33:total_pages 50 
> > > > > > STAT 33:total_chunks 400 
> > > > > > STAT 33:used_chunks 393 
> > > > > > STAT 33:free_chunks 7 
> > > > > > STAT 33:free_chunks_end 0 
> > > > > > STAT 33:get_hits 0 
> > > > > > STAT 33:cmd_set 0 
> > > > > > STAT 33:delete_hits 0 
> > > > > > STAT 33:incr_hits 0 
> > > > > > STAT 33:decr_hits 0 
> > > > > > STAT 33:cas_hits 0 
> > > > > > STAT 33:cas_badval 0 
> > > > > > STAT 33:touch_hits 0 
> > > > > > STAT 34:chunk_size 161720 
> > > > > > STAT 34:chunks_per_page 6 
> > > > > > STAT 34:total_pages 41 
> > > > > > STAT 34:total_chunks 246 
> > > > > > STAT 34:used_chunks 245 
> > > > > > STAT 34:free_chunks 1 
> > > > > > STAT 34:free_chunks_end 0 
> > > > > > STAT 34:get_hits 0 
> > > > > > STAT 34:cmd_set 0 
> > > > > > STAT 34:delete_hits 0 
> > > > > > STAT 34:incr_hits 0 
> > > > > > STAT 34:decr_hits 0 
> > > > > > STAT 34:cas_hits 0 
> > > > > > STAT 34:cas_badval 0 
> > > > > > STAT 34:touch_hits 0 
> > > > > > STAT 35:chunk_size 202152 
> > > > > > STAT 35:chunks_per_page 5 
> > > > > > STAT 35:total_pages 231 
> > > > > > STAT 35:total_chunks 1155 
> > > > > > STAT 35:used_chunks 1155 
> > > > > > STAT 35:free_chunks 0 
> > > > > > STAT 35:free_chunks_end 0 
> > > > > > STAT 35:get_hits 0 
> > > > > > STAT 35:cmd_set 0 
> > > > > > STAT 35:delete_hits 0 
> > > > > > STAT 35:incr_hits 0 
> > > > > > STAT 35:decr_hits 0 
> > > > > > STAT 35:cas_hits 0 
> > > > > > STAT 35:cas_badval 0 
> > > > > > STAT 35:touch_hits 0 
> > > > > > STAT 36:chunk_size 252696 
> > > > > > STAT 36:chunks_per_page 4 
> > > > > > STAT 36:total_pages 536 
> > > > > > STAT 36:total_chunks 2144 
> > > > > > STAT 36:used_chunks 2144 
> > > > > > STAT 36:free_chunks 0 
> > > > > > STAT 36:free_chunks_end 0 
> > > > > > STAT 36:get_hits 0 
> > > > > > STAT 36:cmd_set 0 
> > > > > > STAT 36:delete_hits 0 
> > > > > > STAT 36:incr_hits 0 
> > > > > > STAT 36:decr_hits 0 
> > > > > > STAT 36:cas_hits 0 
> > > > > > STAT 36:cas_badval 0 
> > > > > > STAT 36:touch_hits 0 
> > > > > > STAT 37:chunk_size 315872 
> > > > > > STAT 37:chunks_per_page 3 
> > > > > > STAT 37:total_pages 28 
> > > > > > STAT 37:total_chunks 84 
> > > > > > STAT 37:used_chunks 82 
> > > > > > STAT 37:free_chunks 2 
> > > > > > STAT 37:free_chunks_end 0 
> > > > > > STAT 37:get_hits 0 
> > > > > > STAT 37:cmd_set 0 
> > > > > > STAT 37:delete_hits 0 
> > > > > > STAT 37:incr_hits 0 
> > > > > > STAT 37:decr_hits 0 
> > > > > > STAT 37:cas_hits 0 
> > > > > > STAT 37:cas_badval 0 
> > > > > > STAT 37:touch_hits 0 
> > > > > > STAT 39:chunk_size 524288 
> > > > > > STAT 39:chunks_per_page 2 
> > > > > > STAT 39:total_pages 4212 
> > > > > > STAT 39:total_chunks 8424 
> > > > > > STAT 39:used_chunks 8422 
> > > > > > STAT 39:free_chunks 2 
> > > > > > STAT 39:free_chunks_end 0 
> > > > > > STAT 39:get_hits 0 
> > > > > > STAT 39:cmd_set 12640 
> > > > > > STAT 39:delete_hits 0 
> > > > > > STAT 39:incr_hits 0 
> > > > > > STAT 39:decr_hits 0 
> > > > > > STAT 39:cas_hits 0 
> > > > > > STAT 39:cas_badval 0 
> > > > > > STAT 39:touch_hits 0 
> > > > > > STAT active_slabs 10 
> > > > > > STAT total_malloced 5368709120 
> > > > > > END 
> > > > > > **** Output of stats items 
> > > > > > STAT items:39:number 4211 
> > > > > > STAT items:39:number_hot 768 
> > > > > > STAT items:39:number_warm 0 
> > > > > > STAT items:39:number_cold 3443 
> > > > > > STAT items:39:age_hot 28 
> > > > > > STAT items:39:age_warm 0 
> > > > > > STAT items:39:age 143 
> > > > > > STAT items:39:mem_requested 5241499325 
> > > > > > STAT items:39:evicted 8429 
> > > > > > STAT items:39:evicted_nonzero 0 
> > > > > > STAT items:39:evicted_time 140 
> > > > > > STAT items:39:outofmemory 0 
> > > > > > STAT items:39:tailrepairs 0 
> > > > > > STAT items:39:reclaimed 0 
> > > > > > STAT items:39:expired_unfetched 0 
> > > > > > STAT items:39:evicted_unfetched 8429 
> > > > > > STAT items:39:evicted_active 0 
> > > > > > STAT items:39:crawler_reclaimed 0 
> > > > > > STAT items:39:crawler_items_checked 4212 
> > > > > > STAT items:39:lrutail_reflocked 0 
> > > > > > STAT items:39:moves_to_cold 11872 
> > > > > > STAT items:39:moves_to_warm 0 
> > > > > > STAT items:39:moves_within_lru 0 
> > > > > > STAT items:39:direct_reclaims 8429 
> > > > > > STAT items:39:hits_to_hot 0 
> > > > > > STAT items:39:hits_to_warm 0 
> > > > > > STAT items:39:hits_to_cold 0 
> > > > > > STAT items:39:hits_to_temp 0 
> > > > > > END 
> > > > > > 
> > > > > > Then, the output when it's humming along happily again: 
> > > > > > **** Output of stats 
> > > > > > STAT pid 1 
> > > > > > STAT uptime 11754 
> > > > > > STAT time 1661406256 
> > > > > > STAT version b'1.6.14' 
> > > > > > STAT libevent b'2.1.8-stable' 
> > > > > > STAT pointer_size 64 
> > > > > > STAT rusage_user 3.056135 
> > > > > > STAT rusage_system 7.074541 
> > > > > > STAT max_connections 1024 
> > > > > > STAT curr_connections 3 
> > > > > > STAT total_connections 10150 
> > > > > > STAT rejected_connections 0 
> > > > > > STAT connection_structures 6 
> > > > > > STAT response_obj_oom 0 
> > > > > > STAT response_obj_count 1 
> > > > > > STAT response_obj_bytes 65536 
> > > > > > STAT read_buf_count 8 
> > > > > > STAT read_buf_bytes 131072 
> > > > > > STAT read_buf_bytes_free 49152 
> > > > > > STAT read_buf_oom 0 
> > > > > > STAT reserved_fds 20 
> > > > > > STAT cmd_get 0 
> > > > > > STAT cmd_set 12794 
> > > > > > STAT cmd_flush 0 
> > > > > > STAT cmd_touch 0 
> > > > > > STAT cmd_meta 0 
> > > > > > STAT get_hits 0 
> > > > > > STAT get_misses 0 
> > > > > > STAT get_expired 0 
> > > > > > STAT get_flushed 0 
> > > > > > STAT delete_misses 0 
> > > > > > STAT delete_hits 0 
> > > > > > STAT incr_misses 0 
> > > > > > STAT incr_hits 0 
> > > > > > STAT decr_misses 0 
> > > > > > STAT decr_hits 0 
> > > > > > STAT cas_misses 0 
> > > > > > STAT cas_hits 0 
> > > > > > STAT cas_badval 0 
> > > > > > STAT touch_hits 0 
> > > > > > STAT touch_misses 0 
> > > > > > STAT store_too_large 0 
> > > > > > STAT store_no_memory 0 
> > > > > > STAT auth_cmds 0 
> > > > > > STAT auth_errors 0 
> > > > > > STAT bytes_read 24375641173 
> > > > > > STAT bytes_written 415262 
> > > > > > STAT limit_maxbytes 5368709120 
> > > > > > STAT accepting_conns 1 
> > > > > > STAT listen_disabled_num 0 
> > > > > > STAT time_in_listen_disabled_us 0 
> > > > > > STAT threads 4 
> > > > > > STAT conn_yields 0 
> > > > > > STAT hash_power_level 16 
> > > > > > STAT hash_bytes 524288 
> > > > > > STAT hash_is_expanding False 
> > > > > > STAT slab_reassign_rescues 0 
> > > > > > STAT slab_reassign_chunk_rescues 0 
> > > > > > STAT slab_reassign_evictions_nomem 0 
> > > > > > STAT slab_reassign_inline_reclaim 0 
> > > > > > STAT slab_reassign_busy_items 0 
> > > > > > STAT slab_reassign_busy_deletes 0 
> > > > > > STAT slab_reassign_running False 
> > > > > > STAT slabs_moved 0 
> > > > > > STAT lru_crawler_running 0 
> > > > > > STAT lru_crawler_starts 20 
> > > > > > STAT lru_maintainer_juggles 71952 
> > > > > > STAT malloc_fails 0 
> > > > > > STAT log_worker_dropped 0 
> > > > > > STAT log_worker_written 0 
> > > > > > STAT log_watcher_skipped 0 
> > > > > > STAT log_watcher_sent 0 
> > > > > > STAT log_watchers 0 
> > > > > > STAT unexpected_napi_ids 0 
> > > > > > STAT round_robin_fallback 0 
> > > > > > STAT bytes 5242957328 
> > > > > > STAT curr_items 4212 
> > > > > > STAT total_items 12794 
> > > > > > STAT slab_global_page_pool 0 
> > > > > > STAT expired_unfetched 0 
> > > > > > STAT evicted_unfetched 8582 
> > > > > > STAT evicted_active 0 
> > > > > > STAT evictions 8582 
> > > > > > STAT reclaimed 0 
> > > > > > STAT crawler_reclaimed 0 
> > > > > > STAT crawler_items_checked 4212 
> > > > > > STAT lrutail_reflocked 0 
> > > > > > STAT moves_to_cold 12533 
> > > > > > STAT moves_to_warm 0 
> > > > > > STAT moves_within_lru 0 
> > > > > > STAT direct_reclaims 74822 
> > > > > > STAT lru_bumps_dropped 0 
> > > > > > END 
> > > > > > **** Output of stats slabs 
> > > > > > STAT 2:chunk_size 120 
> > > > > > STAT 2:chunks_per_page 8738 
> > > > > > STAT 2:total_pages 1 
> > > > > > STAT 2:total_chunks 8738 
> > > > > > STAT 2:used_chunks 4212 
> > > > > > STAT 2:free_chunks 4526 
> > > > > > STAT 2:free_chunks_end 0 
> > > > > > STAT 2:get_hits 0 
> > > > > > STAT 2:cmd_set 0 
> > > > > > STAT 2:delete_hits 0 
> > > > > > STAT 2:incr_hits 0 
> > > > > > STAT 2:decr_hits 0 
> > > > > > STAT 2:cas_hits 0 
> > > > > > STAT 2:cas_badval 0 
> > > > > > STAT 2:touch_hits 0 
> > > > > > STAT 30:chunk_size 66232 
> > > > > > STAT 30:chunks_per_page 15 
> > > > > > STAT 30:total_pages 1 
> > > > > > STAT 30:total_chunks 15 
> > > > > > STAT 30:used_chunks 3 
> > > > > > STAT 30:free_chunks 12 
> > > > > > STAT 30:free_chunks_end 0 
> > > > > > STAT 30:get_hits 0 
> > > > > > STAT 30:cmd_set 0 
> > > > > > STAT 30:delete_hits 0 
> > > > > > STAT 30:incr_hits 0 
> > > > > > STAT 30:decr_hits 0 
> > > > > > STAT 30:cas_hits 0 
> > > > > > STAT 30:cas_badval 0 
> > > > > > STAT 30:touch_hits 0 
> > > > > > STAT 31:chunk_size 82792 
> > > > > > STAT 31:chunks_per_page 12 
> > > > > > STAT 31:total_pages 1 
> > > > > > STAT 31:total_chunks 12 
> > > > > > STAT 31:used_chunks 6 
> > > > > > STAT 31:free_chunks 6 
> > > > > > STAT 31:free_chunks_end 0 
> > > > > > STAT 31:get_hits 0 
> > > > > > STAT 31:cmd_set 0 
> > > > > > STAT 31:delete_hits 0 
> > > > > > STAT 31:incr_hits 0 
> > > > > > STAT 31:decr_hits 0 
> > > > > > STAT 31:cas_hits 0 
> > > > > > STAT 31:cas_badval 0 
> > > > > > STAT 31:touch_hits 0 
> > > > > > STAT 32:chunk_size 103496 
> > > > > > STAT 32:chunks_per_page 10 
> > > > > > STAT 32:total_pages 19 
> > > > > > STAT 32:total_chunks 190 
> > > > > > STAT 32:used_chunks 183 
> > > > > > STAT 32:free_chunks 7 
> > > > > > STAT 32:free_chunks_end 0 
> > > > > > STAT 32:get_hits 0 
> > > > > > STAT 32:cmd_set 0 
> > > > > > STAT 32:delete_hits 0 
> > > > > > STAT 32:incr_hits 0 
> > > > > > STAT 32:decr_hits 0 
> > > > > > STAT 32:cas_hits 0 
> > > > > > STAT 32:cas_badval 0 
> > > > > > STAT 32:touch_hits 0 
> > > > > > STAT 33:chunk_size 129376 
> > > > > > STAT 33:chunks_per_page 8 
> > > > > > STAT 33:total_pages 50 
> > > > > > STAT 33:total_chunks 400 
> > > > > > STAT 33:used_chunks 391 
> > > > > > STAT 33:free_chunks 9 
> > > > > > STAT 33:free_chunks_end 0 
> > > > > > STAT 33:get_hits 0 
> > > > > > STAT 33:cmd_set 0 
> > > > > > STAT 33:delete_hits 0 
> > > > > > STAT 33:incr_hits 0 
> > > > > > STAT 33:decr_hits 0 
> > > > > > STAT 33:cas_hits 0 
> > > > > > STAT 33:cas_badval 0 
> > > > > > STAT 33:touch_hits 0 
> > > > > > STAT 34:chunk_size 161720 
> > > > > > STAT 34:chunks_per_page 6 
> > > > > > STAT 34:total_pages 41 
> > > > > > STAT 34:total_chunks 246 
> > > > > > STAT 34:used_chunks 246 
> > > > > > STAT 34:free_chunks 0 
> > > > > > STAT 34:free_chunks_end 0 
> > > > > > STAT 34:get_hits 0 
> > > > > > STAT 34:cmd_set 0 
> > > > > > STAT 34:delete_hits 0 
> > > > > > STAT 34:incr_hits 0 
> > > > > > STAT 34:decr_hits 0 
> > > > > > STAT 34:cas_hits 0 
> > > > > > STAT 34:cas_badval 0 
> > > > > > STAT 34:touch_hits 0 
> > > > > > STAT 35:chunk_size 202152 
> > > > > > STAT 35:chunks_per_page 5 
> > > > > > STAT 35:total_pages 231 
> > > > > > STAT 35:total_chunks 1155 
> > > > > > STAT 35:used_chunks 1155 
> > > > > > STAT 35:free_chunks 0 
> > > > > > STAT 35:free_chunks_end 0 
> > > > > > STAT 35:get_hits 0 
> > > > > > STAT 35:cmd_set 0 
> > > > > > STAT 35:delete_hits 0 
> > > > > > STAT 35:incr_hits 0 
> > > > > > STAT 35:decr_hits 0 
> > > > > > STAT 35:cas_hits 0 
> > > > > > STAT 35:cas_badval 0 
> > > > > > STAT 35:touch_hits 0 
> > > > > > STAT 36:chunk_size 252696 
> > > > > > STAT 36:chunks_per_page 4 
> > > > > > STAT 36:total_pages 536 
> > > > > > STAT 36:total_chunks 2144 
> > > > > > STAT 36:used_chunks 2144 
> > > > > > STAT 36:free_chunks 0 
> > > > > > STAT 36:free_chunks_end 0 
> > > > > > STAT 36:get_hits 0 
> > > > > > STAT 36:cmd_set 0 
> > > > > > STAT 36:delete_hits 0 
> > > > > > STAT 36:incr_hits 0 
> > > > > > STAT 36:decr_hits 0 
> > > > > > STAT 36:cas_hits 0 
> > > > > > STAT 36:cas_badval 0 
> > > > > > STAT 36:touch_hits 0 
> > > > > > STAT 37:chunk_size 315872 
> > > > > > STAT 37:chunks_per_page 3 
> > > > > > STAT 37:total_pages 28 
> > > > > > STAT 37:total_chunks 84 
> > > > > > STAT 37:used_chunks 84 
> > > > > > STAT 37:free_chunks 0 
> > > > > > STAT 37:free_chunks_end 0 
> > > > > > STAT 37:get_hits 0 
> > > > > > STAT 37:cmd_set 0 
> > > > > > STAT 37:delete_hits 0 
> > > > > > STAT 37:incr_hits 0 
> > > > > > STAT 37:decr_hits 0 
> > > > > > STAT 37:cas_hits 0 
> > > > > > STAT 37:cas_badval 0 
> > > > > > STAT 37:touch_hits 0 
> > > > > > STAT 39:chunk_size 524288 
> > > > > > STAT 39:chunks_per_page 2 
> > > > > > STAT 39:total_pages 4212 
> > > > > > STAT 39:total_chunks 8424 
> > > > > > STAT 39:used_chunks 8424 
> > > > > > STAT 39:free_chunks 0 
> > > > > > STAT 39:free_chunks_end 0 
> > > > > > STAT 39:get_hits 0 
> > > > > > STAT 39:cmd_set 12794 
> > > > > > STAT 39:delete_hits 0 
> > > > > > STAT 39:incr_hits 0 
> > > > > > STAT 39:decr_hits 0 
> > > > > > STAT 39:cas_hits 0 
> > > > > > STAT 39:cas_badval 0 
> > > > > > STAT 39:touch_hits 0 
> > > > > > STAT active_slabs 10 
> > > > > > STAT total_malloced 5368709120 
> > > > > > END 
> > > > > > **** Output of stats items 
> > > > > > STAT items:39:number 4212 
> > > > > > STAT items:39:number_hot 261 
> > > > > > STAT items:39:number_warm 0 
> > > > > > STAT items:39:number_cold 3951 
> > > > > > STAT items:39:age_hot 33 
> > > > > > STAT items:39:age_warm 0 
> > > > > > STAT items:39:age 165 
> > > > > > STAT items:39:mem_requested 5242957328 
> > > > > > STAT items:39:evicted 8582 
> > > > > > STAT items:39:evicted_nonzero 0 
> > > > > > STAT items:39:evicted_time 165 
> > > > > > STAT items:39:outofmemory 0 
> > > > > > STAT items:39:tailrepairs 0 
> > > > > > STAT items:39:reclaimed 0 
> > > > > > STAT items:39:expired_unfetched 0 
> > > > > > STAT items:39:evicted_unfetched 8582 
> > > > > > STAT items:39:evicted_active 0 
> > > > > > STAT items:39:crawler_reclaimed 0 
> > > > > > STAT items:39:crawler_items_checked 4212 
> > > > > > STAT items:39:lrutail_reflocked 0 
> > > > > > STAT items:39:moves_to_cold 12533 
> > > > > > STAT items:39:moves_to_warm 0 
> > > > > > STAT items:39:moves_within_lru 0 
> > > > > > STAT items:39:direct_reclaims 8582 
> > > > > > STAT items:39:hits_to_hot 0 
> > > > > > STAT items:39:hits_to_warm 0 
> > > > > > STAT items:39:hits_to_cold 0 
> > > > > > STAT items:39:hits_to_temp 0 
> > > > > > END 
> > > > > > 
> > > > > > I'm happy to open an issue on GitHub if the stats confirm there 
> > > > > > actually is something in the code that could be fixed. You can 
> > > > > > decide then how much effort it's worth to fix it. If my workaround 
> > > > > > idea works, though, I'll just put it in place and move on to the 
> > > > > > next thing. ;-) 
> > > > > > 
> > > > > > On Wednesday, August 24, 2022 at 7:01:33 PM UTC-7 Dormando wrote: 
> > > > > > To put a little more internal detail on this: 
> > > > > > 
> > > > > > - As a SET is being processed item chunks must be made available 
> > > > > > - If it is chunked memory, it will be fetching these data chunks 
> > > > > >   from across different slab classes (ie: 512k + 512k + sized 
> > > > > >   enough for whatever's left over) 
> > > > > > - That full chunked item gets put in the largest slab class 
> > > > > > - If another SET comes along and it needs 512k + 512k + an 8k, it 
> > > > > >   has to look into the 8k slab class for an item to evict. 
> > > > > > - Except there's no memory in the 8k class: it's all actually in 
> > > > > >   the largest class. 
> > > > > > - So there's nothing to evict to free up memory 
> > > > > > - So you get an error. 
> > > > > > - The slab page mover can make this worse by not leaving enough 
> > > > > >   reserved memory in the lower slab classes. 
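> > > > > > 
> > > > > > A rough way to picture the math (illustration only, not the 
> > > > > > actual allocator code, with example chunk sizes in the same 
> > > > > > ballpark as the default upper slab classes): 
> > > > > > 
> > > > > >     # how a ~1.2MB value splits into 512k + 512k + one smaller 
> > > > > >     # "end cap" chunk taken from a lower class 
> > > > > >     SLAB_CHUNK_MAX = 512 * 1024 
> > > > > >     # example chunk_size values for the higher slab classes 
> > > > > >     CLASS_SIZES = [66232, 82792, 103496, 129376, 161720, 
> > > > > >                    202152, 252696, 315872, 524288] 
> > > > > > 
> > > > > >     def chunk_breakdown(value_len): 
> > > > > >         chunks, remaining = [], value_len 
> > > > > >         while remaining > SLAB_CHUNK_MAX: 
> > > > > >             chunks.append(SLAB_CHUNK_MAX) 
> > > > > >             remaining -= SLAB_CHUNK_MAX 
> > > > > >         # the tail comes from whichever smaller class fits what's 
> > > > > >         # left; that class is the one that can run out of chunks 
> > > > > >         chunks.append(next(c for c in CLASS_SIZES 
> > > > > >                            if c >= remaining)) 
> > > > > >         return chunks 
> > > > > > 
> > > > > >     print(chunk_breakdown(1_200_000))  # [524288, 524288, 161720] 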
> > > > > > 
> > > > > > I wasn't sure how often this would happen in practice and fixed a 
> > > > > > few edge cases in the past. Though I always figured I would've 
> > > > > > revisited it years ago, so sorry about the trouble. 
> > > > > > 
> > > > > > There are a few tuning options: 
> > > > > > 1) more memory, lol. 
> > > > > > 2) you can override slab_chunk_max to be much lower (like 8k or 
> > > > > >    16k), which will make a lot more chunks but you won't 
> > > > > >    realistically notice a performance difference. This can reduce 
> > > > > >    the number of total slab classes, making it easier for more 
> > > > > >    "end cap" memory to be found. 
> > > > > > 3) delete items as you use them so it doesn't have to evict. not 
> > > > > >    the best option. 
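> > > > > > 
> > > > > > For 3) the consumer side would just look something like this (a 
> > > > > > pymemcache sketch, made-up key name): 
> > > > > > 
> > > > > >     from pymemcache.client.base import Client 
> > > > > > 
> > > > > >     client = Client(("memcached", 11211)) 
> > > > > > 
> > > > > >     # read-then-delete so the writer rarely has to evict 
> > > > > >     value = client.get("frame:1234") 
> > > > > >     if value is not None: 
> > > > > >         client.delete("frame:1234", noreply=True) 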
> > > > > > 
> > > > > > There're code fixes I can try but I need to see what the exact 
> > > > > > symptom is first, which is why I ask for the stats stuff. 
> > > > > > 
> > > > > > On Wed, 24 Aug 2022, dormando wrote: 
> > > > > > 
> > > > > > > Hey, 
> > > > > > > 
> > > > > > > You're probably hitting an edge case in the "large item 
> > > > > > > support". 
> > > > > > > 
> > > > > > > Basically to store values > 512k memcached internally splits 
> > > > > > > them up into chunks. When storing items memcached first 
> > > > > > > allocates the item storage, then reads data from the client 
> > > > > > > socket directly into the data storage. 
> > > > > > > 
> > > > > > > For chunked items it will be allocating chunks of memory as it 
> > > > > > > reads from the socket, which can lead to that (thankfully very 
> > > > > > > specific) "during read" error. I've long suspected some edge 
> > > > > > > cases but haven't revisited that code in ... a very long time. 
> > > > > > > 
> > > > > > > If you can grab snapshots of "stats items" and "stats slabs" 
> > > > > > > when it's both evicting normally and when it's giving you 
> > > > > > > errors, I might be able to figure out what's causing it to 
> > > > > > > bottom out and see if there's some tuning to do. Normal "stats" 
> > > > > > > output is also helpful. 
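> > > > > > > 
> > > > > > > Any way of grabbing them works; e.g. a rough pymemcache loop 
> > > > > > > (made-up host and filename) that dumps all three every few 
> > > > > > > seconds would be enough to compare a healthy snapshot against 
> > > > > > > one taken mid-error: 
> > > > > > > 
> > > > > > >     import time 
> > > > > > >     from pymemcache.client.base import Client 
> > > > > > > 
> > > > > > >     client = Client(("memcached", 11211)) 
> > > > > > > 
> > > > > > >     with open("memcached-stats.log", "a") as out: 
> > > > > > >         while True: 
> > > > > > >             snap = { 
> > > > > > >                 "stats": client.stats(), 
> > > > > > >                 "items": client.stats("items"), 
> > > > > > >                 "slabs": client.stats("slabs"), 
> > > > > > >             } 
> > > > > > >             out.write(repr(snap) + "\n") 
> > > > > > >             out.flush() 
> > > > > > >             time.sleep(5) 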
> > > > > > > 
> > > > > > > It kind of smells like some slab classes are running low on 
> > > > > > > memory sometimes, and the items in them are being read for a 
> > > > > > > long time... but we have to see the data to be sure. 
> > > > > > > 
> > > > > > > If you're feeling brave you can try building the current "next" 
> > > > > > > branch from github and try it out, as some fixes to the page 
> > > > > > > mover went in there. Those fixes may have caused too much memory 
> > > > > > > to be moved away from a slab class sometimes. 
> > > > > > > 
> > > > > > > Feel free to open an issue on github to track this if you'd 
> > > > > > > like. 
> > > > > > > 
> > > > > > > have fun, 
> > > > > > > -Dormando 
> > > > > > > 
> > > > > > > On Wed, 24 Aug 2022, Hayden wrote: 
> > > > > > > 
> > > > > > > > Hello, 
> > > > > > > > I'm trying to use memcached for a use case I don't think is 
> > > > > > > > outlandish, but it's not behaving the way I expect. I wanted 
> > > > > > > > to sanity check what I'm doing to see if it should be working 
> > > > > > > > but there's maybe something I've done wrong with my 
> > > > > > > > configuration, or if my idea of how it's supposed to work is 
> > > > > > > > wrong, or if there's a problem with memcached itself. 
> > > > > > > > 
> > > > > > > > I'm using memcached as a temporary shared image store in a 
> > > > > > > > distributed video processing application. At the front of the 
> > > > > > > > pipeline is a process (actually all these processes are pods 
> > > > > > > > in a kubernetes cluster, if it matters, and memcached is 
> > > > > > > > running in the cluster as well) that consumes a video stream 
> > > > > > > > over RTSP, saves each frame to memcached, and outputs events 
> > > > > > > > to a message bus (kafka) with metadata about each frame. At 
> > > > > > > > the end of the pipeline is another process that consumes 
> > > > > > > > these metadata events, and when it sees events it thinks are 
> > > > > > > > interesting it retrieves the corresponding frame from 
> > > > > > > > memcached and adds the frame to a web UI. The video is 
> > > > > > > > typically 30fps, so there are about 30 set() operations each 
> > > > > > > > second, and since each value is effectively an image the 
> > > > > > > > values are a bit big (around 1MB... I upped the maximum value 
> > > > > > > > size in memcached to 2MB to make sure they'd fit, and I 
> > > > > > > > haven't had any problems with my writes being rejected 
> > > > > > > > because of size). 
> > > > > > > > 
> > > > > > > > The video stream is processed in real-time, and effectively 
> > > > > > > > infinite, but the memory available to memcached obviously 
> > > > > > > > isn't (I've configured it to use 5GB, FWIW). That's OK, 
> > > > > > > > because the cache is only supposed to be temporary storage. 
> > > > > > > > My expectation is that once the available memory is filled up 
> > > > > > > > (which takes a few minutes), then roughly speaking for every 
> > > > > > > > new frame added to memcached another entry (ostensibly the 
> > > > > > > > oldest one) will be evicted. If the consuming process at the 
> > > > > > > > end of the pipeline doesn't get to a frame it wants before it 
> > > > > > > > gets evicted that's OK. 
> > > > > > > > 
> > > > > > > > That's not what I'm seeing, though, or at least that's not 
> > > > > > > > all that I'm seeing. There are lots of evictions happening, 
> > > > > > > > but the process that's writing to memcached also goes through 
> > > > > > > > periods where every set() operation is rejected with an "Out 
> > > > > > > > of memory during read" error. It seems to happen in bursts 
> > > > > > > > where for several seconds every write encounters the error, 
> > > > > > > > then for several seconds the set() calls work just fine (and 
> > > > > > > > presumably other keys are being evicted), then the cycle 
> > > > > > > > repeats. It goes on this way for as long as I let the process 
> > > > > > > > run. 
> > > > > > > > 
> > > > > > > > I'm using memcached v1.6.14, installed into my k8s cluster 
> > > > > > > > using the bitnami helm chart v6.0.5. My reading and writing 
> > > > > > > > applications are both using pymemcache v3.5.2 for their 
> > > > > > > > access. 
> > > > > > > > 
> > > > > > > > Can anyone tell me if it seems like what I'm doing should 
> > > > > > > > work the way I described, and where I should try 
> > > > > > > > investigating to see what's going wrong? Or alternatively, 
> > > > > > > > why what I'm trying to do shouldn't work the way I expected 
> > > > > > > > it to, so I can figure out how to make my applications behave 
> > > > > > > > differently? 
> > > > > > > > 
> > > > > > > > Thanks, 
> > > > > > > > Hayden 
> > > > > > > > 