[ceph-users] Re: another osd_pglog memory usage incident

2020-10-09 Thread Harald Staub
On 09.10.20 13:55, Dan van der Ster wrote: [...] I also noticed a possible relationship with scrubbing -- one week ago we increased osd_max_scrubs to 5 to clear out a scrubbing backlog; I wonder if the increased read/write ratio somehow led to an exploding buffer_anon. Do things stabilize on
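For reference, a sketch of how that scrub setting is typically raised at runtime and later returned to its default of 1 (the value 5 is the one mentioned in the thread):

$ ceph config set osd osd_max_scrubs 5   # temporarily, to clear the scrub backlog
$ ceph config set osd osd_max_scrubs 1   # back to the default once caught up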

[ceph-users] Re: another osd_pglog memory usage incident

2020-10-09 Thread Harald Staub
On 07.10.20 21:00, Wido den Hollander wrote: On 07/10/2020 16:00, Dan van der Ster wrote: On Wed, Oct 7, 2020 at 3:29 PM Wido den Hollander wrote: On 07/10/2020 14:08, Dan van der Ster wrote: Hi all, This morning some osds in our S3 cluster started going OOM, after restarting them I

[ceph-users] Re: crashing OSD: ceph_assert(is_valid_io(off, len))

2020-06-08 Thread Harald Staub
https://drive.switch.ch/index.php/s/Jwk0Kgy7Q1EIxuE On 08.06.20 17:30, Igor Fedotov wrote: I think it's better to put the log to some public cloud and paste the link here.. On 6/8/2020 6:27 PM, Harald Staub wrote: (really sorry for spamming, but it is still waiting for moderator, so trying

[ceph-users] Re: crashing OSD: ceph_assert(is_valid_io(off, len))

2020-06-08 Thread Harald Staub
(really sorry for spamming, but it is still waiting for moderator, so trying with xz ...) On 08.06.20 17:21, Harald Staub wrote: (and now with trimmed attachment because of size restriction: only the debug log) On 08.06.20 16:53, Harald Staub wrote: (and now with attachment

[ceph-users] Re: crashing OSD: ceph_assert(is_valid_io(off, len))

2020-06-08 Thread Harald Staub
Cheers Harry On 08.06.20 16:37, Igor Fedotov wrote: Hi Harald, was this exact OSD suffering from "ceph_assert(h->file->fnode.ino != 1)"? Could you please collect an extended log with debug-bluefs set to 20? Thanks, Igor On 6/8/2020 4:48 PM, Harald Staub wrote: This is agai
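A sketch of how such a debug-bluefs log can be collected, assuming a hypothetical OSD id 123: for a running OSD the level can be injected at runtime, and for one that crashes at startup the daemon can be run in the foreground with the option on the command line:

$ ceph tell osd.123 injectargs '--debug-bluefs 20/20'
$ ceph-osd -d -i 123 --debug-bluefs 20/20 > /tmp/osd.123.startup.log 2>&1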

[ceph-users] Re: crashing OSD: ceph_assert(is_valid_io(off, len))

2020-06-08 Thread Harald Staub
gister_command bluestore allocator dump bluefs-slow hook 0x559555ef0c90 -1> 2020-06-08 16:05:39.397 7fc589500d80 5 asok(0x559555eae000) register_command bluestore allocator score bluefs-slow hook 0x559555ef0c90 [...] On 08.06.20 15:48, Harald Staub wrote: This is again about our bad clust
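The allocator commands visible in that startup log are admin socket commands; once the OSD's asok is reachable they can be invoked like this (OSD id is illustrative):

$ ceph daemon osd.123 bluestore allocator score bluefs-slow
$ ceph daemon osd.123 bluestore allocator dump bluefs-slow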

[ceph-users] crashing OSD: ceph_assert(is_valid_io(off, len))

2020-06-08 Thread Harald Staub
This is again about our bad cluster, with far too many objects. Now another OSD crashes immediately at startup: /build/ceph-14.2.8/src/os/bluestore/KernelDevice.cc: 944: FAILED ceph_assert(is_valid_io(off, len)) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152)

[ceph-users] crashing OSDs: ceph_assert(h->file->fnode.ino != 1)

2020-05-29 Thread Harald Staub
This is again about our bad cluster, with too many objects, and the HDD OSDs have a DB device that is (much) too small (e.g. 20 GB, i.e. 3 GB usable). Now several OSDs do not come up any more. Typical error message: /build/ceph-14.2.8/src/os/bluestore/BlueFS.cc: 2261: FAILED
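One way to see how full such a small DB device actually is, via the BlueFS perf counters (OSD id and jq filter are illustrative, assuming the usual Nautilus counter names):

$ ceph daemon osd.123 perf dump bluefs | jq '.bluefs | {db_total_bytes, db_used_bytes, slow_used_bytes}'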

[ceph-users] Re: OSDs taking too much memory, for buffer_anon

2020-05-25 Thread Harald Staub
20/20 7:36 AM, Harald Staub wrote: Hi Mark Thank you for your explanations! Some numbers for this example OSD below. Cheers Harry From dump mempools: "buffer_anon": { "items": 29012, "bytes": 4584503367
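The numbers quoted come from the OSD admin socket; the buffer_anon pool can be pulled out directly (OSD id and the jq path into the dump_mempools output are assumptions):

$ ceph daemon osd.123 dump_mempools | jq '.mempool.by_pool.buffer_anon'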

[ceph-users] Re: OSDs taking too much memory, for buffer_anon

2020-05-20 Thread Harald Staub
_manager) The osd memory autotuning works by shrinking the bluestore and rocksdb caches to some target value to try and keep the mapped memory of the process below the osd_memory_target. In some cases it's possible that something other than the caches is using the memory (usually pglog) or ther
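The target the autotuner aims for can be inspected and adjusted with the usual config commands (OSD id and values purely illustrative):

$ ceph config get osd.123 osd_memory_target
$ ceph config set osd osd_memory_target 6442450944   # 6 GiB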

[ceph-users] OSDs taking too much memory, for buffer_anon

2020-05-20 Thread Harald Staub
As a follow-up to our recent memory problems with OSDs (with high pglog values: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/LJPJZPBSQRJN5EFE632CWWPK3UMGG3VF/#XHIWAIFX4AXZK5VEFOEBPS5TGTH33JZO ), we also see high buffer_anon values, e.g. more than 4 GB, with "osd memory

[ceph-users] Re: OSDs taking too much memory, for pglog

2020-05-17 Thread Harald Staub
is situation. kind regards, Wout 42on On 13-05-2020 07:27, Harald Staub wrote: Hi Mark Thank you for your feedback! The maximum number of PGs per OSD is only 123. But we have PGs with a lot of objects. For RGW, there is an EC pool 8+3 with 1024 PGs and 900M objects, maybe this is
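The figures mentioned (PGs per OSD, objects per pool) can be checked with standard commands, e.g.:

$ ceph osd df tree   # the PGS column shows placement groups per OSD
$ rados df           # object counts per pool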

[ceph-users] Re: OSDs taking too much memory, for pglog

2020-05-12 Thread Harald Staub
entries per PG. Keep in mind that fewer PG log entries may impact recovery. FWIW, 8.5 GB of memory usage for pglog implies that you have a lot of PGs per OSD, so that's probably the first place to look. Good luck! Mark On 5/12/20 5:10 PM, Harald Staub wrote: Several OSDs of one of our
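If one does decide to trade recovery behaviour for memory, the PG log length is bounded by these options (a sketch only; the values are illustrative, not a recommendation from the thread):

$ ceph config set osd osd_min_pg_log_entries 500
$ ceph config set osd osd_max_pg_log_entries 500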

[ceph-users] OSDs taking too much memory, for pglog

2020-05-12 Thread Harald Staub
Several OSDs of one of our clusters are down currently because RAM usage has increased over the last few days. Now it is more than we can handle on some systems. Frequently OSDs get killed by the OOM killer. Looking at "ceph daemon osd.$OSD_ID dump_mempools", it shows that nearly all (about 8.5
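A quick way to see which mempool dominates on such an OSD (OSD id is illustrative; the jq expression assumes the usual dump_mempools layout):

$ ceph daemon osd.123 dump_mempools | jq '.mempool.by_pool | to_entries | sort_by(.value.bytes) | reverse | .[:5]'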

[ceph-users] Re: Balancing PGs across OSDs

2019-12-02 Thread Harald Staub
Hi all Something to try: ceph config set mgr mgr/balancer/upmap_max_iterations 20 (Default is 100.) Cheers Harry On 03.12.19 08:02, Lars Täuber wrote: BTW: The osdmaptool doesn't see anything to do either: $ ceph osd getmap -o om $ osdmaptool om --upmap /tmp/upmap.sh --upmap-pool
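The osdmaptool run quoted above is usually given a pool and a cap on the number of new upmaps per run; a rough sketch (pool name is hypothetical):

$ ceph osd getmap -o om
$ osdmaptool om --upmap /tmp/upmap.sh --upmap-pool mypool --upmap-max 20
$ bash /tmp/upmap.sh   # applies the generated pg-upmap-items commands (review them first)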