Hi all,
quick update: looks like copying OSDs does indeed deflate the objects with
partial overwrites in an EC pool again:
           osd df tree    blue stats
ID    SIZE   USE        alloc  store
87     8.9   6.6          6.6    4.6   <-- old disk with inflated objects
294   11     1.9          1.9    2.0   <-- new disk
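For anyone who wants to reproduce the comparison: the alloc/store columns come
from each OSD's bluestore perf counters, so a minimal sketch like the following
gives the same numbers (run on the respective OSD host; the loop over ids 87
and 294 just mirrors the table above):

  for id in 87 294; do
    echo -n "osd.$id  "
    ceph daemon osd.$id perf dump | \
      jq -c '.bluestore | {allocated: .bluestore_allocated, stored: .bluestore_stored}'
  done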
Hi Frank,
On 7/31/2020 10:31 AM, Frank Schilder wrote:
Hi Igor,
thanks. I guess the problem with finding the corresponding images is that it
happens at the bluestore level and not at the object level. Even if I listed all
rados objects and added up their sizes, I would not see the excess storage.
Thinking ab
Hi Igor,
thanks. I guess the problem with finding the corresponding images is that it
happens at the bluestore level and not at the object level. Even if I listed all
rados objects and added up their sizes, I would not see the excess storage.
Thinking about working around this issue, would re-writing the object
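For illustration, the enumeration described above could look roughly like this
(a sketch; the pool name is a placeholder, and this is slow on large pools). It
sums logical object sizes only, which is exactly why the bluestore-level excess
stays invisible:

  pool=my-ec-pool                               # hypothetical pool name
  rados -p "$pool" ls | while read -r obj; do
    rados -p "$pool" stat "$obj"                # prints "... size <bytes>"
  done | awk '{sum += $NF} END {printf "%.1f TiB logical\n", sum/2^40}'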
Hi Frank,
On 7/30/2020 11:19 AM, Frank Schilder wrote:
Hi Igor,
thanks for looking at this. Here a few thoughts:
The copy goes to NTFS. I would expect between 2-4 metadata operations per
write, which would go to a few existing objects. I guess the difference
bluestore_write_small-bluestore_wr
Hi Igor,
thanks for looking at this. Here a few thoughts:
The copy goes to NTFS. I would expect between 2-4 metadata operations per
write, which would go to a few existing objects. I guess the difference
bluestore_write_small - bluestore_write_small_new is mostly such writes and is
susceptible
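To put a number on the writes going into existing objects, one could diff the
two counters directly from a perf dump, e.g. (a sketch; osd.181 as in the dumps
later in the thread):

  ceph daemon osd.181 perf dump | jq '.bluestore |
    {small: .bluestore_write_small,
     small_new: .bluestore_write_small_new,
     small_into_existing: (.bluestore_write_small - .bluestore_write_small_new)}'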
Hi Igor,
thanks! Here is a sample extract for one OSD, with the time stamp (+%F-%H%M%S) in the
file name. For the second collection I let it run for about 10 minutes after the reset:
perf_dump_2020-07-29-142739.osd181:"bluestore_write_big": 10216689,
perf_dump_2020-07-29-142739.osd181:"bluestore_wri
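(The extraction was presumably along these lines; a sketch, not the exact
commands used:)

  # save a timestamped dump, then grep the counters of interest
  ceph daemon osd.181 perf dump > perf_dump_$(date +%F-%H%M%S).osd181
  grep '"bluestore_write' perf_dump_*.osd181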
Dear Igor,
please find below data from "ceph osd df tree" and per-OSD bluestore stats
pasted together, with the script used for extraction included for reference. We have now:
df USED: 142 TB
bluestore_stored: 190.9 TB (142*8/6 = 189, so matches)
bluestore_allocated: 275.2 TB
osd df tree USE: 276.1 TB (so matches with bluestore_allocated)
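Aggregating the bluestore counters over all OSDs of a host can be done with a
small loop like the one below (a sketch of the kind of extraction script meant
above; sum across hosts to get the pool-wide totals):

  # sum bluestore_allocated/bluestore_stored over all local OSDs, in TiB
  for sock in /var/run/ceph/ceph-osd.*.asok; do
    ceph daemon "$sock" perf dump
  done | jq -s '[.[].bluestore] |
    {allocated_TiB: (map(.bluestore_allocated) | add / pow(2;40)),
     stored_TiB:    (map(.bluestore_stored)    | add / pow(2;40))}'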
Frank,
so you have a pretty high amount of small writes indeed. More than half
of the written volume (in bytes) is done via small writes,
and 6x more small requests.
This looks pretty odd for a sequential write pattern and is likely to be
the root cause of that space overhead.
I can
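For reference, the "more than half by volume, 6x by count" figures fall
straight out of the *_bytes and request counters; a sketch of the check:

  ceph daemon osd.181 perf dump | jq '.bluestore |
    {small_byte_fraction: (.bluestore_write_small_bytes /
        (.bluestore_write_small_bytes + .bluestore_write_big_bytes)),
     small_to_big_requests: (.bluestore_write_small / .bluestore_write_big)}'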
Hi Frank,
you might want to proceed with perf counters' dump analysis in the
following way:
For 2-3 arbitrary OSDs:
- save current perf counter dump
- reset perf counters
- leave OSD under the regular load for a while.
- dump perf counters again
- share both saved and new dumps and/or chec
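Scripted, the procedure could look like this (a sketch; the OSD id and the
10-minute interval are placeholders):

  id=181
  ceph daemon osd.$id perf dump > osd$id.before.json   # save current counters
  ceph daemon osd.$id perf reset all                   # reset perf counters
  sleep 600                                            # leave OSD under regular load
  ceph daemon osd.$id perf dump > osd$id.after.json    # dump again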
Hi Igor,
thanks for your answer. I was thinking about that, but as far as I understood,
to hit this bug actually requires a partial rewrite to happen. However, these
are disk images in storage servers with basically static files, many of which
are very large (15GB). Therefore, I believe, the vast m
Frank,
I suggest starting with the perf counter analysis as per the second part of my
previous email...
Thanks,
Igor
On 7/27/2020 2:30 PM, Frank Schilder wrote:
Hi Igor,
thanks for your answer. I was thinking about that, but as far as I understood,
to hit this bug actually requires a partial r
Hi Frank,
you might be being hit by https://tracker.ceph.com/issues/44213
In short, the root cause is significant space overhead due to the high
bluestore allocation unit (64K) combined with the EC overwrite design.
This is fixed for the upcoming Pacific release by using a 4K alloc unit, but it
is unlikely to be backported.
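To check which allocation unit applies, the config can be queried via the admin
socket (a sketch; note that min_alloc_size is baked in at OSD mkfs time, so this
shows what newly created OSDs would get, not necessarily what an old OSD is
using):

  ceph daemon osd.181 config get bluestore_min_alloc_size_hdd   # 65536 (64K) pre-Pacific
  ceph daemon osd.181 config get bluestore_min_alloc_size_ssd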