On 09.10.20 13:55, Dan van der Ster wrote:
[...]
I also noticed a possible relationship with scrubbing -- one week ago
we increased osd_max_scrubs to 5 to clear out a scrubbing backlog; I
wonder if the increased read/write ratio somehow led to an exploding
buffer_anon. Do things stabilize on [...]
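A minimal sketch of checking and reverting that scrub setting, assuming it
was applied via the central config database rather than ceph.conf:

$ ceph config get osd osd_max_scrubs                  # current cluster-wide value
$ ceph config set osd osd_max_scrubs 1                # back to the usual default of 1
$ ceph tell osd.* injectargs '--osd_max_scrubs 1'     # push to running OSDs without a restart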
On 07.10.20 21:00, Wido den Hollander wrote:
On 07/10/2020 16:00, Dan van der Ster wrote:
On Wed, Oct 7, 2020 at 3:29 PM Wido den Hollander wrote:
On 07/10/2020 14:08, Dan van der Ster wrote:
Hi all,
This morning some OSDs in our S3 cluster started going OOM; after
restarting them I [...]
https://drive.switch.ch/index.php/s/Jwk0Kgy7Q1EIxuE
On 08.06.20 17:30, Igor Fedotov wrote:
I think it's better to put the log on some public cloud and paste the
link here ...
On 6/8/2020 6:27 PM, Harald Staub wrote:
(really sorry for spamming, but it is still waiting for moderator, so
trying with xz ...)
On 08.06.20 17:21, Harald Staub wrote:
(and now with trimmed attachment because of size restriction: only the
debug log)
On 08.06.20 16:53, Harald Staub wrote:
(and now with attachment)
Cheers
Harry
On 08.06.20 16:37, Igor Fedotov wrote:
Hi Harald,
Was this exact OSD suffering from "ceph_assert(h->file->fnode.ino != 1)"?
Could you please collect an extended log with debug-bluefs set to 20?
Thanks,
Igor
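For an OSD that crashes at startup, one way to capture such a log is to
raise the debug level in ceph.conf and restart the daemon -- a sketch,
with "NNN" as a placeholder OSD id and the usual log location assumed:

[osd.NNN]
    debug bluefs = 20/20

$ systemctl restart ceph-osd@NNN
# the extended log then ends up in /var/log/ceph/ceph-osd.NNN.log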
On 6/8/2020 4:48 PM, Harald Staub wrote:
This is again [...]
[...] register_command bluestore allocator dump bluefs-slow hook 0x559555ef0c90
-1> 2020-06-08 16:05:39.397 7fc589500d80 5 asok(0x559555eae000) register_command bluestore allocator score bluefs-slow hook 0x559555ef0c90
[...]
On 08.06.20 15:48, Harald Staub wrote:
This is again about our bad cluster, with far too many objects. Now
another OSD crashes immediately at startup:
/build/ceph-14.2.8/src/os/bluestore/KernelDevice.cc: 944: FAILED ceph_assert(is_valid_io(off, len))
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152)
This is again about our bad cluster, with too many objects, where the HDD
OSDs have a DB device that is (much) too small (e.g. 20 GB, i.e. 3 GB
usable). Now several OSDs no longer come up.
Typical error message:
/build/ceph-14.2.8/src/os/bluestore/BlueFS.cc: 2261: FAILED ceph_assert(h->file->fnode.ino != 1)
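One way to see how much of such a small DB device BlueFS is actually using
(and what has spilled over to the slow device) is the bluefs perf counters --
a sketch, only usable while the OSD is still running:

$ ceph daemon osd.$ID perf dump bluefs | egrep 'db_(total|used)_bytes|slow_(total|used)_bytes'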
On [...]20/20 7:36 AM, Harald Staub wrote:
Hi Mark
Thank you for your explanations! Some numbers for this example OSD are below.
Cheers
Harry
From dump_mempools:
    "buffer_anon": {
        "items": 29012,
        "bytes": 4584503367
    },
[...] The OSD memory
autotuning works by shrinking the BlueStore and RocksDB caches to some
target value to try and keep the mapped memory of the process below the
osd_memory_target. In some cases it's possible that something other
than the caches is using the memory (usually pglog), or there [...]
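A small sketch of the knobs involved, assuming Nautilus-style central
config (the 4294967296 = 4 GiB value is only an example):

$ ceph config get osd osd_memory_target               # what the autotuner aims for
$ ceph config set osd osd_memory_target 4294967296    # per-OSD target in bytes
$ ceph config get osd bluestore_cache_autotune        # must be true for the cache shrinking described above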
As a follow-up to our recent memory problems with OSDs (with high pglog
values:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/LJPJZPBSQRJN5EFE632CWWPK3UMGG3VF/#XHIWAIFX4AXZK5VEFOEBPS5TGTH33JZO
), we also see high buffer_anon values, e.g. more than 4 GB, with "osd
memory [...]
[...] this situation.
kind regards,
Wout
42on
On 13-05-2020 07:27, Harald Staub wrote:
Hi Mark
Thank you for your feedback!
The maximum number of PGs per OSD is only 123. But we have PGs with a
lot of objects. For RGW, there is an EC pool 8+3 with 1024 PGs holding
900M objects; maybe this is [...]
[...] entries per
PG. Keep in mind that fewer PG log entries may impact recovery. FWIW,
8.5 GB of memory usage for pglog implies that you have a lot of PGs per
OSD, so that's probably the first place to look.
Good luck!
Mark
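A sketch of the options usually used to shrink the PG log (names from the
stock osd settings; the values are only illustrative, and as noted above
fewer entries can make recovery more expensive):

$ ceph config set osd osd_min_pg_log_entries 500
$ ceph config set osd osd_max_pg_log_entries 500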
On 5/12/20 5:10 PM, Harald Staub wrote:
Several OSDs of one of our clusters are currently down because RAM usage
has increased during the last days. It is now more than we can handle on
some systems, and OSDs frequently get killed by the OOM killer. Looking at
"ceph daemon osd.$OSD_ID dump_mempools", it shows that nearly all (about
8.5 [...]
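For reference, a quick way to rank the mempools on one OSD -- a sketch that
assumes jq is available and the usual dump_mempools JSON layout:

$ ceph daemon osd.$OSD_ID dump_mempools \
    | jq '.mempool.by_pool | to_entries | sort_by(.value.bytes) | reverse | .[:5]'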
Hi all
Something to try:
ceph config set mgr mgr/balancer/upmap_max_iterations 20
(Default is 100.)
Cheers
Harry
On 03.12.19 08:02, Lars Täuber wrote:
BTW: The osdmaptool doesn't see anything to do either:
$ ceph osd getmap -o om
$ osdmaptool om --upmap /tmp/upmap.sh --upmap-pool [...]
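For completeness, the usual offline upmap workflow looks roughly like this
(a sketch; <poolname> is a placeholder, and the generated script should be
reviewed before running it):

$ ceph osd getmap -o om
$ osdmaptool om --upmap /tmp/upmap.sh --upmap-pool <poolname> --upmap-max 20
$ source /tmp/upmap.sh    # applies the suggested "ceph osd pg-upmap-items ..." commands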