[ceph-users] Re: tcmu-runner crashing on 16.2.5

2021-08-25 Thread Paul Giralt (pgiralt)
I will send a unicast email with the link and details. -Paul

On Aug 25, 2021, at 10:37 PM, Xiubo Li <xiu...@redhat.com> wrote:

Hi Paul, Please send me the exact versions of the tcmu-runner and ceph-iscsi packages you are using. Thanks

On 8/26/21 10:21 AM, Paul Giralt (pgiralt)
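
Gathering those versions is typically a one-liner with rpm; a sketch, assuming an RPM-based gateway host (the container name below is a placeholder for containerized deployments):

  # on the gateway host itself
  rpm -q tcmu-runner ceph-iscsi
  # for a containerized gateway, query inside the container instead
  podman exec <iscsi-gateway-container> rpm -q tcmu-runner ceph-iscsi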

[ceph-users] Re: tcmu-runner crashing on 16.2.5

2021-08-25 Thread Paul Giralt (pgiralt)
Thank you. I did find some coredump files. Is there a way I can send these to you to analyze?

[root@cxcto-c240-j27-02 coredump]# ls -asl
total 71292
    0 drwxr-xr-x. 2 root root      176 Aug 25 18:31 .
    0 drwxr-xr-x. 5 root root       70 Aug 10 11:31 ..
34496 -rw-r-. 1 root root 35316215
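
If the dumps were captured by systemd-coredump, coredumpctl can summarize them before anything is shipped around; a sketch:

  # list captured crashes and show metadata for the tcmu-runner dumps
  coredumpctl list tcmu-runner
  coredumpctl info tcmu-runner
  # open the newest dump in gdb for a backtrace (needs debuginfo packages)
  coredumpctl gdb tcmu-runner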

[ceph-users] Re: tcmu-runner crashing on 16.2.5

2021-08-25 Thread Paul Giralt (pgiralt)
Thanks Xiubo. I will try this. How do I set the log level to 4? -Paul

On Aug 25, 2021, at 9:30 PM, Xiubo Li <xiu...@redhat.com> wrote:

It's buggy, we need one way to export the tcmu-runner log to the host. Could you see any crash coredump from the host? Without that could you keep
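
For the log-level question: tcmu-runner reads its verbosity from /etc/tcmu/tcmu.conf; a sketch, assuming the stock config path (edit it inside the container for containerized gateways):

  # /etc/tcmu/tcmu.conf -- levels run from 0 (CRIT) to 5 (DEBUG SCSI CMD);
  # 4 is DEBUG, the default is 3 (INFO)
  log_level = 4

Recent tcmu-runner builds watch this file and apply the change on the fly; otherwise restart the daemon.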

[ceph-users] Re: tcmu-runner crashing on 16.2.5

2021-08-25 Thread Paul Giralt (pgiralt)
If the tcmu-runner daemon has died, the above logs are expected. So we need to know what has caused the tcmu-runner service's crash. Xiubo

Thanks for the response Xiubo. How can I go about figuring out why the tcmu-runner daemon has died? Are there any logs I can pull that will give insight
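
The systemd journal is usually the first place such a crash shows up; a sketch, assuming a systemd-managed host:

  # last messages the service logged before it died
  journalctl -u tcmu-runner --since=-2h
  # kernel-side evidence of a segfault or OOM kill
  journalctl -k | grep -i -e tcmu -e segfault -e 'killed process'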

[ceph-users] Re: pgcalc tool removed (or moved?) from ceph.com ?

2021-08-25 Thread Mike Perez
Hi all, I attempted to migrate this to the new website, but it's going to require some CSS work. I'm not the best with CSS, and the old website has a very large CSS file to dissect. I'll see what I can come up with tomorrow.

https://github.com/ceph/ceph.io/issues/265

On Thu, Jul 8, 2021 at

[ceph-users] How to slow down PG recovery when a failed OSD node comes back?

2021-08-25 Thread huxia...@horebdata.cn
Dear Cephers, I have an all-flash 3-node Ceph cluster, each node with 8 SSDs as OSDs, running Ceph release 12.2.13. I have the following settings:

osd_op_queue = wpq
osd_op_queue_cut_off = high

and

osd_recovery_sleep = 0.5
osd_min_pg_log_entries = 3000
osd_max_pg_log_entries = 1
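
On 12.2.x these can be tightened further at runtime without a restart; a sketch (values are illustrative, not recommendations):

  # slow recovery further: longer sleep, fewer concurrent operations
  ceph tell osd.* injectargs '--osd_recovery_sleep 1.0'
  ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'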

[ceph-users] Re: tcmu-runner crashing on 16.2.5

2021-08-25 Thread Paul Giralt (pgiralt)
Ilya / Xiubo, The problem just re-occurred on one server and I ran the systemctl status command. You can see there are no tcmu-runner processes listed:

[root@cxcto-c240-j27-04 ~]# systemctl status
● cxcto-c240-j27-04.cisco.com
    State: running
     Jobs: 0 queued
   Failed: 0 units
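
When a gateway loses its tcmu-runner container, the container runtime usually still remembers the exited container for a while; a sketch, assuming podman-based gateways:

  # list containers including exited ones
  podman ps -a | grep -i tcmu
  # grab the tail of its output before it is garbage-collected (ID is a placeholder)
  podman logs --tail 100 <container-id>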

[ceph-users] Re: LARGE_OMAP_OBJECTS: any proper action possible?

2021-08-25 Thread Frank Schilder
Hi Dan,

> [...] Do you have some custom mds config in this area?

None that I'm aware of. What MDS config parameters should I look for? I recently seem to have had problems with very slow dirfrag operations that made an MDS unresponsive long enough for a MON to kick it out. I had to increase

[ceph-users] Re: LARGE_OMAP_OBJECTS: any proper action possible?

2021-08-25 Thread Frank Schilder
Hi Dan, thanks for looking at this. Here are the lines from health detail and ceph.log:

[root@gnosis ~]# ceph health detail
HEALTH_WARN 4 large omap objects
LARGE_OMAP_OBJECTS 4 large omap objects
    4 large objects found in pool 'con-fs2-meta1'
    Search the cluster log for 'Large omap object
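
The search the health output asks for is a plain grep on a mon host; a sketch, assuming the default log path:

  grep 'Large omap object' /var/log/ceph/ceph.log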

[ceph-users] Re: tcmu-runner crashing on 16.2.5

2021-08-25 Thread Paul Giralt (pgiralt)
> Does the node hang while shutting down or does it lock up so that you
> can't even issue the reboot command?

It hangs when shutting down. I can SSH in and issue commands just fine and it takes the shutdown command and kicks me out, but it appears to never shut down as I can still ping

[ceph-users] Re: tcmu-runner crashing on 16.2.5

2021-08-25 Thread Ilya Dryomov
On Wed, Aug 25, 2021 at 7:02 AM Paul Giralt (pgiralt) wrote:
>
> I upgraded to Pacific 16.2.5 about a month ago and everything was working
> fine. Suddenly for the past few days I’ve started having the tcmu-runner
> container on my iSCSI gateways just disappear. I’m assuming this is because
>

[ceph-users] Re: All monitors failed, recovering from encrypted osds: everything lost??

2021-08-25 Thread Ignacio García
No, the servers were rebooted and, missing the monitor, the OSDs are not able to run...

On 25/8/21 at 15:07, Janne Johansson wrote:

On Wed, 25 Aug 2021 at 14:27, Ignacio García wrote:

Only 1 monitor that was running on a failed disk -> unrecoverable store.db to create a new monitor.

[ceph-users] LARGE_OMAP_OBJECTS: any proper action possible?

2021-08-25 Thread Frank Schilder
Hi all, I have the notorious "LARGE_OMAP_OBJECTS: 4 large omap objects" warning and am again wondering if there is any proper action one can take except "wait it out and deep-scrub (numerous ceph-users threads)" or "ignore
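
For reference, the deep-scrub half of that workaround, once the offending object is known; a sketch with placeholder names:

  # find the PG that holds the object, then deep-scrub it
  ceph osd map <pool> <object-name>
  ceph pg deep-scrub <pgid>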

[ceph-users] Re: LARGE_OMAP_OBJECTS: any proper action possible?

2021-08-25 Thread Dan van der Ster
Hi,

On Wed, Aug 25, 2021 at 2:37 PM Frank Schilder wrote:
>
> Hi Dan,
>
> > [...] Do you have some custom mds config in this area?
>
> None that I'm aware of. What MDS config parameters should I look for?

This covers the topic and relevant config:
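
The knobs in question are the dirfrag thresholds; a sketch of inspecting them (option names are from the CephFS docs; 'ceph config get' needs Mimic or later):

  # entries per fragment before a split, and the hard per-fragment cap
  ceph config get mds mds_bal_split_size
  ceph config get mds mds_bal_fragment_size_max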

[ceph-users] Re: All monitors failed, recovering from encrypted osds: everything lost??

2021-08-25 Thread Janne Johansson
On Wed, 25 Aug 2021 at 14:27, Ignacio García wrote:
>
> Only 1 monitor that was running on a failed disk -> unrecoverable
> store.db to create a new monitor.
>
> Then trying to recover from OSDs following:
>

[ceph-users] All monitors failed, recovering from encrypted osds: everything lost??

2021-08-25 Thread Ignacio García
Only 1 monitor that was running on a failed disk -> unrecoverable store.db to create a new monitor.

Then trying to recover from OSDs following:
https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds

But the LVM devices that serve the bluestore OSDs are
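
The linked procedure rebuilds a monitor store by scanning every OSD; a single-host sketch, assuming the encrypted LVs have already been opened (e.g. via cryptsetup) and mounted, with all OSDs stopped:

  ms=/root/mon-store
  mkdir -p "$ms"
  # fold each OSD's copy of the cluster maps into a fresh mon store
  for osd in /var/lib/ceph/osd/ceph-*; do
      ceph-objectstore-tool --data-path "$osd" --no-mon-config \
          --op update-mon-db --mon-store-path "$ms"
  done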

[ceph-users] Re: LARGE_OMAP_OBJECTS: any proper action possible?

2021-08-25 Thread Dan van der Ster
Those are probably large directories; each omap key is a file/subdir in the directory. Normally the mds fragments dirs across several objects, so you shouldn't have a huge number of omap entries in any one single object. Do you have some custom mds config in this area? -- dan On Wed, Aug 25,
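
Counting the omap keys on the reported objects confirms the large-directory theory; a sketch, with pool and object names as placeholders:

  # number of directory entries stored in one metadata object
  rados -p <metadata-pool> listomapkeys <object-name> | wc -l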

[ceph-users] Re: LARGE_OMAP_OBJECTS: any proper action possible?

2021-08-25 Thread Dan van der Ster
Hi Frank, Which objects are large? (You should see this in ceph.log when the large obj was detected.) -- dan

On Wed, Aug 25, 2021 at 12:27 PM Frank Schilder wrote:
>
> Hi all,
>
> I have the notorious "LARGE_OMAP_OBJECTS: 4 large omap objects" warning and
> am again wondering if there is any

[ceph-users] Re: Disable autostart of old services

2021-08-25 Thread Marc
These are probably ceph-disk OSDs, no? Check this:

/etc/systemd/system/ceph-osd.target.wants/

systemctl is-enabled ceph-osd@XX
systemctl disable ceph-osd@XX

> -----Original Message-----
> From: Stolte, Felix
> Sent: Wednesday, 25 August 2021 10:38
> To: ceph-users@ceph.io
> Subject: [ceph-users]
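
If a disable alone doesn't survive a reboot (ceph-disk OSDs can be re-activated by udev at boot), masking is the stronger option; a sketch:

  # mask so nothing -- including udev activation -- can start the unit
  systemctl mask ceph-osd@XX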

[ceph-users] Disable autostart of old services

2021-08-25 Thread Stolte, Felix
Hey guys, we have an OSD server with issues on its network interfaces. I marked out all OSDs on that server and disabled the ceph-osd@# services as well as ceph.target and ceph-osd.target. But after a reboot the OSD services are starting again, causing trouble. Which systemd unit do I need to

[ceph-users] Re: Ceph on Windows: unable to map RBD image

2021-08-25 Thread Lucian Petrut
Hi, On Windows, the RBD device map commands are dispatched to a centralized service so that the daemons are not tied to the current Windows session. The service gets configured automatically by the MSI installer [1]. However, if you’d like to configure it manually, please check this document
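
Once the service is running, mapping looks much like on Linux; a sketch from an elevated prompt, assuming the defaults from the MSI install (the service name below is an assumption):

  rbd device map rbd/image01
  rbd device list
  # confirm the centralized mapping service is up
  sc.exe query ceph-rbd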