[ceph-users] Re: mimic: much more raw used than reported

2020-08-03 Thread Frank Schilder
Hi all, quick update: it looks like copying OSDs does indeed deflate the objects with partial overwrites in an EC pool again:

      osd df tree    blue stats
  ID   SIZE   USE   alloc  store
  87    8.9   6.6     6.6    4.6  <-- old disk with inflated objects
 294     11   1.9     1.9    2.0  <-- new di

[ceph-users] Re: mimic: much more raw used than reported

2020-08-01 Thread Igor Fedotov
Hi Frank, On 7/31/2020 10:31 AM, Frank Schilder wrote: Hi Igor, thanks. I guess the problem with finding the corresponding images is that it happens at the bluestore level and not at the object level. Even if I listed all rados objects and added their sizes I would not see the excess storage. Thinking ab

[ceph-users] Re: mimic: much more raw used than reported

2020-07-31 Thread Frank Schilder
Hi Igor, thanks. I guess the problem with finding the corresponding images is that it happens at the bluestore level and not at the object level. Even if I listed all rados objects and added their sizes I would not see the excess storage. Thinking about working around this issue, would re-writing the object

[ceph-users] Re: mimic: much more raw used than reported

2020-07-30 Thread Igor Fedotov
Hi Frank, On 7/30/2020 11:19 AM, Frank Schilder wrote: Hi Igor, thanks for looking at this. Here are a few thoughts: The copy goes to NTFS. I would expect between 2-4 metadata operations per write, which would go to a few existing objects. I guess the difference bluestore_write_small-bluestore_wr

[ceph-users] Re: mimic: much more raw used than reported

2020-07-30 Thread Frank Schilder
Hi Igor, thanks for looking at this. Here are a few thoughts: The copy goes to NTFS. I would expect between 2-4 metadata operations per write, which would go to a few existing objects. I guess the difference bluestore_write_small-bluestore_write_small_new are mostly such writes and are susceptible

[ceph-users] Re: mimic: much more raw used than reported

2020-07-29 Thread Frank Schilder
Hi Igor, thanks! Here a sample extract for one OSD, time stamp (+%F-%H%M%S) in file name. For the second collection I let it run for about 10 minutes after reset: perf_dump_2020-07-29-142739.osd181:"bluestore_write_big": 10216689, perf_dump_2020-07-29-142739.osd181:"bluestore_wri

[ceph-users] Re: mimic: much more raw used than reported

2020-07-29 Thread Frank Schilder
Dear Igor, please find below data from "ceph osd df tree" and per-OSD bluestore stats pasted together with the script for extraction for reference. We have now: df USED: 142 TB bluestore_stored: 190.9TB (142*8/6 = 189, so matches) bluestore_allocated: 275.2TB osd df tree USE: 276.1 (so matches w
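The consistency check quoted in this message is plain arithmetic and can be reproduced in a few lines. The sketch below assumes that the 8/6 factor comes from an erasure-coded pool with k=6 data chunks and m=2 coding chunks; that reading is an inference from the quoted formula, not something the message states. The function and variable names are hypothetical.

```python
def expected_raw_tb(used_tb, k, m):
    """Raw capacity an EC pool should consume for a given amount of user data."""
    return used_tb * (k + m) / k


def allocation_overhead(allocated_tb, stored_tb):
    """Ratio of space allocated on disk to data actually stored."""
    return allocated_tb / stored_tb


# Figures quoted in the message (TB).
df_used_tb = 142.0
bluestore_stored_tb = 190.9
bluestore_allocated_tb = 275.2

# Assumption: k=6, m=2, inferred from the "142*8/6" expression.
raw = expected_raw_tb(df_used_tb, k=6, m=2)
overhead = allocation_overhead(bluestore_allocated_tb, bluestore_stored_tb)

print(f"expected raw from df USED: {raw:.1f} TB")  # 189.3, close to stored 190.9
print(f"allocated / stored: {overhead:.2f}")        # 1.44
```

Under this reading, bluestore_stored agrees with what the pool should occupy, and the unexplained usage is the roughly 44% gap between allocated and stored.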

[ceph-users] Re: mimic: much more raw used than reported

2020-07-29 Thread Igor Fedotov
Frank, so you have a pretty high number of small writes indeed. More than half of the written volume (in bytes) is done via small writes, and there are 6 times more small requests. This looks pretty odd for a sequential write pattern and is likely to be the root cause of that space overhead. I can

[ceph-users] Re: mimic: much more raw used than reported

2020-07-29 Thread Igor Fedotov
Hi Frank, you might want to proceed with perf counters' dump analysis in the following way: For 2-3 arbitrary osds
- save current perf counter dump
- reset perf counters
- leave OSD under the regular load for a while.
- dump perf counters again
- share both saved and new dumps and/or chec
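The comparison step of the procedure above can be sketched as a small script that subtracts one saved dump from another. It assumes the dumps are JSON documents with a "bluestore" section holding integer counters, as produced by `ceph daemon osd.<id> perf dump` (with `ceph daemon osd.<id> perf reset all` used for the reset step). The counter names are the ones mentioned in this thread; the numeric values and the helper name `counter_deltas` are made up for illustration and are not data from the thread.

```python
import json


def counter_deltas(before, after, section="bluestore"):
    """Return after - before for every integer counter in one dump section."""
    b = before.get(section, {})
    a = after.get(section, {})
    return {
        name: value - b.get(name, 0)
        for name, value in a.items()
        if isinstance(value, int) and isinstance(b.get(name, 0), int)
    }


# Illustrative dumps only; real ones would be loaded from the saved files.
before = json.loads(
    '{"bluestore": {"bluestore_write_big": 100,'
    ' "bluestore_write_small": 400, "bluestore_write_small_new": 50}}'
)
after = json.loads(
    '{"bluestore": {"bluestore_write_big": 200,'
    ' "bluestore_write_small": 1000, "bluestore_write_small_new": 150}}'
)

deltas = counter_deltas(before, after)
# Small writes that did not go to new space, i.e. candidates for overwrites.
small_existing = deltas["bluestore_write_small"] - deltas["bluestore_write_small_new"]
small_to_big = deltas["bluestore_write_small"] / deltas["bluestore_write_big"]

print(deltas)
print("small writes to existing blobs:", small_existing)
print("small/big request ratio:", small_to_big)
```

If the counters were reset before the second dump, the second dump already holds the deltas and the subtraction is only needed when comparing against the older saved dump.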

[ceph-users] Re: mimic: much more raw used than reported

2020-07-27 Thread Frank Schilder
Hi Igor, thanks for your answer. I was thinking about that, but as far as I understood, to hit this bug actually requires a partial rewrite to happen. However, these are disk images in storage servers with basically static files, many of which are very large (15GB). Therefore, I believe, the vast m

[ceph-users] Re: mimic: much more raw used than reported

2020-07-27 Thread Igor Fedotov
Frank, suggest to start with perf counter analysis as per the second part of my previous email... Thanks, Igor On 7/27/2020 2:30 PM, Frank Schilder wrote: Hi Igor, thanks for your answer. I was thinking about that, but as far as I understood, to hit this bug actually requires a partial r

[ceph-users] Re: mimic: much more raw used than reported

2020-07-27 Thread Igor Fedotov
Hi Frank, you might be being hit by https://tracker.ceph.com/issues/44213 In short the root causes are significant space overhead due to high bluestore allocation unit (64K) and EC overwrite design. This is fixed for upcoming Pacific release by using 4K alloc unit but it is unlikely to be b
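To see why the allocation unit matters, here is a deliberately simplified model: each piece of data written to an OSD is assumed to occupy a whole number of allocation units. This ignores how BlueStore actually handles blobs, overwrites and garbage collection, so it only illustrates the rounding effect, not the exact behaviour described in the tracker issue. The function name and the 20 KiB example size are hypothetical.

```python
def allocated_bytes(write_size, alloc_unit):
    """Round a write up to a whole number of allocation units (simplified model)."""
    units = -(-write_size // alloc_unit)  # ceiling division
    return units * alloc_unit


KIB = 1024
write_size = 20 * KIB  # hypothetical small chunk landing on one OSD

for alloc_unit in (64 * KIB, 4 * KIB):
    used = allocated_bytes(write_size, alloc_unit)
    print(
        f"alloc unit {alloc_unit // KIB}K: {used // KIB} KiB allocated "
        f"for {write_size // KIB} KiB written ({used / write_size:.2f}x)"
    )
```

In this model a 20 KiB chunk occupies 64 KiB with the 64K unit and 20 KiB with the 4K unit, which is the kind of gap between stored and allocated bytes discussed in the thread.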