Hi,

for some days now I have been trying to debug a problem with snaptrimming under
Nautilus.

I have a Nautilus (v14.2.10) cluster with 44 nodes, 24 OSDs per node and 14 TB
per OSD. I create one snapshot per day and keep it for 7 days.

Every time the oldest snapshot is deleted I get bad I/O performance and blocked
requests for several seconds until the snaptrim is done.
Settings like osd_snap_trim_sleep and osd_pg_max_concurrent_snap_trims don't
affect this behavior.

With debug_osd set to 10/10 I see the following in the log:

2020-07-27 11:45:49.976 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557886edda20 
prio 196 cost 0 latency 0.019545 osd_repop_reply(client.22731418.0:615257 3.636 
e22457/22372) v2 pg pg[3.636( v 22457'100855 (21737'97756,22457'100855] 
local-lis/les=22372/22374 n=27762 ec=2842/2839 lis/c 22372/22372 les/c/f 
22374/22374/0 22372/22372/22343) [411,36,956,763] r=0 lpr=22372 
luod=22457'100854 crt=22457'100855 lcod 22457'100853 mlcod 22457'100853 
active+clean+snaptrim_wait trimq=[1d~1]]
2020-07-27 11:45:49.976 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557886edda20 
finish
2020-07-27 11:45:49.976 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557886edc2c0 
prio 127 cost 0 latency 0.043165 MOSDScrubReserve(2.2645 RELEASE e22457) v1 pg 
pg[2.2645( empty local-lis/les=22359/22364 n=0 ec=2403/2403 lis/c 22359/22359 
les/c/f 22364/22367/0 22359/22359/22359) [379,411,884,975] r=1 lpr=22359 
crt=0'0 active mbc={}]
2020-07-27 11:45:49.976 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557886edc2c0 
finish
2020-07-27 11:45:50.039 7fd8b8404700 10 osd.411 pg_epoch: 22457 pg[3.278e( v 
22457'99491 (21594'96426,22457'99491] local-lis/les=22359/22362 n=27669 
ec=2859/2839 lis/c 22359/22359 les/c/f 22362/22365/0 22359/22359/22343) 
[411,379,848,924] r=0 lpr=22359 crt=22457'99491 lcod 22457'99489 mlcod 
22457'99489 active+clean+snaptrim trimq=[1d~1]] snap_trimmer posting
2020-07-27 11:45:57.801 7fd8b8404700 10 osd.411 pg_epoch: 22457 pg[3.278e( v 
22457'99493 (21594'96426,22457'99493] local-lis/les=22359/22362 n=27669 
ec=2859/2839 lis/c 22359/22359 les/c/f 22362/22365/0 22359/22359/22343) 
[411,379,848,924] r=0 lpr=22359 luod=22457'99491 crt=22457'99493 lcod 
22457'99489 mlcod 22457'99489 active+clean+snaptrim trimq=[1d~1]] snap_trimmer 
complete
2020-07-27 11:45:57.801 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557880ac3760 
prio 127 cost 663 latency 7.761823 osd_repop(osd.217.0:3025 3.1ca5 
e22457/22378) v2 pg pg[3.1ca5( v 22457'100370 (21716'97357,22457'100370] 
local-lis/les=22378/22379 n=27532 ec=2855/2839 lis/c 22378/22378 les/c/f 
22379/22379/0 22378/22378/22378) [217,411,551,1055] r=1 lpr=22378 luod=0'0 
lua=22294'100006 crt=22457'100370 lcod 22457'100369 active mbc={}]
2020-07-27 11:45:57.801 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557880ac3760 
finish
2020-07-27 11:45:57.801 7fd8b8404700 10 osd.411 22457 dequeue_op 0x5578813e1e40 
prio 127 cost 0 latency 7.494296 MOSDScrubReserve(2.37e2 REQUEST e22457) v1 pg 
pg[2.37e2( empty local-lis/les=22355/22356 n=0 ec=2412/2412 lis/c 22355/22355 
les/c/f 22356/22356/0 22355/22355/22355) [245,411,834,768] r=1 lpr=22355 
crt=0'0 active mbc={}]
2020-07-27 11:45:57.801 7fd8b8404700 10 osd.411 22457 dequeue_op 0x5578813e1e40 
finish

The dequeueing of ops works without pauses until the "snap_trimmer posting" and
"snap_trimmer complete" log lines. In this example that step takes about 7
seconds, and the operations dequeued afterwards show a latency of roughly the
same duration.

I tried to drill down into the code (input from developers would be welcome here).
It seems that the PG is locked for every operation.
The "snap_trimmer posting" and "snap_trimmer complete" messages come from
osd/PrimaryLogPG.cc around line 4700. This tells me that the process of deleting
the snapshot objects sometimes takes quite a while, and the PG stays locked for
that whole time.
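
To make that explicit: from my reading of the Nautilus source (paraphrased from
memory, so details may differ slightly), snap_trimmer() is called by the op
worker thread with the PG lock already held, and the whole trim step runs
synchronously inside it:

void PrimaryLogPG::snap_trimmer(epoch_t queued)
{
  if (deleting || pg_has_reset_since(queued)) {
    return;
  }

  ceph_assert(is_primary());

  dout(10) << "snap_trimmer posting" << dendl;
  // The snap trim state machine is driven synchronously here; everything it
  // does, including finding the next objects to trim, happens under the PG
  // lock, so all other ops for this PG queue up behind it.
  snap_trimmer_machine.process_event(DoSnapWork());
  dout(10) << "snap_trimmer complete" << dendl;
}

If process_event() takes 7 seconds, every op for that PG that arrived in the
meantime is dequeued afterwards with roughly 7 seconds of latency, which matches
the log above.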

After further poking around I see that the method
SnapMapper::get_next_objects_to_trim in osd/SnapMapper.cc is what takes several
seconds to finish. I followed this further into common/map_cacher.hpp, down to
line 94: "int r = driver->get_next(key, &store);"
From there I lost the trail.
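
In case it helps the next person following this trail: from my (possibly
imperfect) reading of the Nautilus source, the driver behind that call is
OSDriver, defined in osd/SnapMapper.cc, and its get_next() goes straight to the
object store's omap. Paraphrased, roughly:

// The MapCacher's backing driver answers "what is the next key after 'key'?"
// by opening an omap iterator on the snap mapper object in the OSD's meta
// collection and seeking past 'key'.
int OSDriver::get_next(
  const std::string &key,
  std::pair<std::string, ceph::bufferlist> *next)
{
  ObjectMap::ObjectMapIterator iter = os->get_omap_iterator(ch, hoid);
  if (!iter) {
    return -EINVAL;
  }
  iter->upper_bound(key);   // seek in the omap
  if (iter->valid()) {
    if (next)
      *next = std::make_pair(iter->key(), iter->value());
    return 0;
  }
  return -ENOENT;
}

So every get_next() during snap trimming turns into an omap iterator seek, which
on BlueStore means going through RocksDB. If those seeks are slow for whatever
reason, get_next_objects_to_trim() is slow, and the PG stays locked for the
whole duration.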

The slowness does not affect all OSDs at the same time. Sometimes a few OSDs are
affected, sometimes others. Restarting an OSD does not help.

With Luminous and FileStore, snapshot deletion was not an issue at all.
With Nautilus and BlueStore this behavior is not acceptable for my use case.

I don't know yet whether this is a BlueStore-specific problem or some more
general issue.
I also wonder a bit why nobody else seems to be reporting this problem.


Regards
Manuel