Re: [ceph-users] iostat and dashboard freezing

2019-09-12 Thread Konstantin Shalygin
On 9/13/19 4:51 AM, Reed Dier wrote: I would love to deprecate the multi-root, and may try to do just that in my next OSD add, just worried about data shuffling unnecessarily. Would this in theory help my distribution across disparate OSD topologies? Maybe. Actually I don't know where is bala

Re: [ceph-users] Ceph RBD Mirroring

2019-09-12 Thread Oliver Freyermuth
Dear Jason, thanks for taking care and developing a patch so quickly! I have another strange observation to share. In our test setup, only a single RBD mirroring daemon is running for 51 images. It works fine with a constant stream of 1-2 MB/s, but at some point after roughly 20 hours, _all_
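
For readers hitting similar stalls: the state of the rbd-mirror daemon can be checked from either cluster while it is (or is not) replaying. A minimal sketch, with placeholder pool/image names:
  # overall mirroring health for the pool, plus per-daemon details
  $ rbd mirror pool status <pool> --verbose
  # per-image state ("up+replaying", "up+stopped", or an error description)
  $ rbd mirror image status <pool>/<image>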

[ceph-users] Local Device Health PG inconsistent

2019-09-12 Thread Reed Dier
Trying to narrow down a strange issue with the single PG for the device_health_metrics pool that was created when I enabled the 'diskprediction_local' module in the ceph-mgr. The PG is reported as damaged, but I never see any inconsistent objects in it. > $ ceph health detail > OSD_SCRUB_ERRORS 1 scrub errors > PG_DAMAGED Po
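
For context, the usual sequence for chasing a single inconsistent PG looks roughly like this (the PG id is a placeholder taken from ceph health detail):
  # identify the damaged PG and the OSDs in its acting set
  $ ceph health detail
  # list the objects the scrub actually flagged, if any
  $ rados list-inconsistent-obj <pgid> --format=json-pretty
  # re-run a deep scrub and, if objects are reported, attempt a repair
  $ ceph pg deep-scrub <pgid>
  $ ceph pg repair <pgid>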

Re: [ceph-users] iostat and dashboard freezing

2019-09-12 Thread Reed Dier
> 1. Multi-root. You should deprecate your 'ssd' root and move your osds of > this root to 'default' root. > I would love to deprecate the multi-root, and may try to do just that in my next OSD add, just worried about data shuffling unnecessarily. Would this in theory help my distribution across
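
For reference, collapsing a second root into 'default' generally means moving the host buckets and pointing the affected pools at a device-class rule instead; a rough sketch with hypothetical host/pool/rule names (note that moving buckets between roots does trigger data movement):
  # inspect the current hierarchy and both roots
  $ ceph osd crush tree
  # move an ssd host bucket under the default root (repeat per host)
  $ ceph osd crush move ssd-host-1 root=default
  # create a rule that selects only ssd-class devices under default
  $ ceph osd crush rule create-replicated replicated_ssd default host ssd
  # switch the pool that used the old ssd root over to the new rule
  $ ceph osd pool set ssd-pool crush_rule replicated_ssd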

Re: [ceph-users] POOL_TARGET_SIZE_BYTES_OVERCOMMITTED

2019-09-12 Thread Oliver Freyermuth
Dear Cephalopodians, I can confirm the same problem described by Joe Ryner in 14.2.2. I'm also getting (in a small test setup): - # ceph health detail HEALTH_WARN 1 subtrees have overcommitted pool target_size_bytes; 1 subtrees have overcommitt
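
One way to inspect and clear this warning on a test cluster (pool name is a placeholder) is via the autoscaler status and the pool's target-size hints:
  # show per-pool RATE, TARGET SIZE and what the autoscaler expects to need
  $ ceph osd pool autoscale-status
  # remove an oversized byte hint, or express the expectation as a ratio instead
  $ ceph osd pool set <pool> target_size_bytes 0
  $ ceph osd pool set <pool> target_size_ratio 0.2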

Re: [ceph-users] Bluestore OSDs keep crashing in BlueStore.cc: 8808: FAILED assert(r == 0)

2019-09-12 Thread Igor Fedotov
Hi Stefan, thanks for the update. Relevant PR from Paul mentions kernels (4.9+): https://github.com/ceph/ceph/pull/23273 Not sure how correct this is. That's all I have... Try asking Sage/Paul... Also could you please update the ticket with more details, e.g. what are the original and new k

Re: [ceph-users] Bluestore OSDs keep crashing in BlueStore.cc: 8808: FAILED assert(r == 0)

2019-09-12 Thread Stefan Priebe - Profihost AG
Hello Igor, I can now confirm that this is indeed a kernel bug. The issue no longer happens on upgraded nodes. Do you know more about it? I would really like to know in which version it was fixed, to avoid rebooting all Ceph nodes. Greets, Stefan On 27.08.19 at 16:20, Igor Fedotov wrote:

Re: [ceph-users] ceph version 14.2.3-OSD fails

2019-09-12 Thread Igor Fedotov
Hi, this line:     -2> 2019-09-12 16:38:15.101 7fcd02fd1f80  1 bluestore(/var/lib/ceph/osd/ceph-71) _open_alloc loaded 0 B in 0 extents tells me that the OSD is unable to load the free list manager properly, i.e. the list of free/allocated blocks is unavailable. You might want to set 'debug bluestore
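
A sketch of that kind of debugging, assuming osd.71 from the log line above (the fsck run requires the OSD daemon to be stopped):
  # restart the OSD in the foreground with verbose BlueStore logging
  $ ceph-osd -f -i 71 --debug-bluestore 20 --debug-bdev 20
  # or, with the daemon stopped, check the on-disk store directly
  $ ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-71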

[ceph-users] ceph version 14.2.3-OSD fails

2019-09-12 Thread cephuser2345 user
Hi, we have updated the ceph version from 14.2.2 to version 14.2.3. The osd getting: -21 76.68713 host osd048 66 hdd 12.78119 osd.66 up 1.0 1.0 67 hdd 12.78119 osd.67 up 1.0 1.0 68 hdd 12.78119 osd.68 up 1.0 1

Re: [ceph-users] cephfs: apache locks up after parallel reloads on multiple nodes

2019-09-12 Thread jesper
Thursday, 12 September 2019, 17.16 +0200 from Paul Emmerich : >Yeah, CephFS is much closer to POSIX semantics for a filesystem than >NFS. There's an experimental relaxed mode called LazyIO but I'm not >sure if it's applicable here. > >You can debug this by dumping slow requests from the MDS se

Re: [ceph-users] cephfs: apache locks up after parallel reloads on multiple nodes

2019-09-12 Thread Paul Emmerich
Yeah, CephFS is much closer to POSIX semantics for a filesystem than NFS. There's an experimental relaxed mode called LazyIO but I'm not sure if it's applicable here. You can debug this by dumping slow requests from the MDS servers via the admin socket Paul -- Paul Emmerich Looking for help w
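
For reference, dumping slow/blocked requests from the MDS admin socket looks like this (run on the MDS host; the daemon name is a placeholder):
  # requests currently in flight, with their age and current state
  $ ceph daemon mds.<name> dump_ops_in_flight
  # only the requests the MDS considers blocked/slow
  $ ceph daemon mds.<name> dump_blocked_ops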

[ceph-users] Call for Submission for the IO500 List

2019-09-12 Thread John Bent
Call for Submission. Deadline: 10 November 2019 AoE. The IO500 is now accepting and encouraging submissions for the upcoming 5th IO500 list revealed at SC19 in Denver, Colorado. Once again, we are also accepting submissions to the 10 Node I/O Challenge to encourage submission of sm

[ceph-users] cephfs: apache locks up after parallel reloads on multiple nodes

2019-09-12 Thread Stefan Kooman
Dear list, We recently switched the shared storage for our linux shared hosting platforms from "nfs" to "cephfs". Performance improvements are noticeable. It all works fine, however, there is one peculiar thing: when Apache reloads after a logrotate of the "error" logs, all but one node will hang fo

Re: [ceph-users] units of metrics

2019-09-12 Thread Paul Emmerich
We use a custom script to collect these metrics in croit Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89 1896585 90 On Thu, Sep 12, 2019 at 5:00 PM Stefan Kooman wrote: > > Hi Pa

Re: [ceph-users] units of metrics

2019-09-12 Thread Stefan Kooman
Hi Paul, Quoting Paul Emmerich (paul.emmer...@croit.io): > https://static.croit.io/ceph-training-examples/ceph-training-example-admin-socket.pdf Thanks for the link. So, what tool do you use to gather the metrics? We are using telegraf module of the Ceph manager. However, this module only provide
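
For what it's worth, the admin socket itself can answer the units question: 'perf schema' describes every counter that 'perf dump' reports (osd.0 below is just an example daemon):
  # raw counter values
  $ ceph daemon osd.0 perf dump
  # per-counter metadata: type bitmask, description and, on recent releases, a units field
  $ ceph daemon osd.0 perf schema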

Re: [ceph-users] reproducible rbd-nbd crashes

2019-09-12 Thread Jason Dillaman
On Thu, Sep 12, 2019 at 3:31 AM Marc Schöchlin wrote: > > Hello Jason, > > yesterday I started rbd-nbd in foreground mode to see if there is any > additional information. > > root@int-nfs-001:/etc/ceph# rbd-nbd map rbd_hdd/int-nfs-001_srv-ceph -d --id > nfs > 2019-09-11 13:07:41.444534 77fe

Re: [ceph-users] increase pg_num error

2019-09-12 Thread Kyriazis, George
Hi Burkhard, I tried using the autoscaler, however it did not give a suggestion to resize pg_num. Since my pg_num is not a power of 2, I wanted to fix that first, manually, only to realize that it didn’t work. Because changing pg_num manually did not work, I am not convinced that the autoscal

Re: [ceph-users] increase pg_num error

2019-09-12 Thread Burkhard Linke
Hi, On 9/12/19 5:16 AM, Kyriazis, George wrote: Ok, after all is settled, I tried changing pg_num again on my pool and it still didn’t work: # ceph osd pool get rbd1 pg_num pg_num: 100 # ceph osd pool set rbd1 pg_num 128 # ceph osd pool get rbd1 pg_num pg_num: 100 # ceph osd require-osd-releas
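
In case it helps others seeing the same symptom: on a cluster upgraded to Nautilus, pg_num changes only take effect once the release gate has been raised, which appears to be what the truncated command above is heading towards. A hedged sketch, using the rbd1 pool from the transcript:
  # only run this after every OSD in the cluster is on 14.2.x
  $ ceph osd require-osd-release nautilus
  # the pg_num change should then stick (pgp_num follows automatically on Nautilus)
  $ ceph osd pool set rbd1 pg_num 128
  $ ceph osd pool get rbd1 pg_num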

Re: [ceph-users] reproducible rbd-nbd crashes

2019-09-12 Thread Marc Schöchlin
Hello Jason, yesterday I started rbd-nbd in foreground mode to see if there is any additional information. root@int-nfs-001:/etc/ceph# rbd-nbd map rbd_hdd/int-nfs-001_srv-ceph -d --id nfs 2019-09-11 13:07:41.444534 77fe1040  0 ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777)
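
As an aside, when running rbd-nbd in the foreground like this, librbd logging can be turned up on the same command line, since rbd-nbd accepts the usual Ceph client options (the log path below is arbitrary):
  $ rbd-nbd map rbd_hdd/int-nfs-001_srv-ceph -d --id nfs \
        --debug-rbd 20 --log-file /var/log/ceph/rbd-nbd-debug.log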