[ceph-users] rgw lifecycle process is not fast enough

2020-02-27 Thread quexian da
ceph version: 14.2.5 (ad5bd132e1492173c85fda2cc863152730b16a92) nautilus (stable). I set up a ceph cluster and I'm uploading objects through rgw at a rate of 60 objects/s. I added some lifecycle rules to the buckets so that my disks will not fill up. However, after I set "debug_rgw" to 5 and
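A minimal sketch of the knobs usually involved, assuming a Nautilus radosgw and that these option names match your build (verify against the 14.2.x docs):

    # list buckets with lifecycle rules and their processing status
    radosgw-admin lc list
    # kick off a lifecycle pass by hand instead of waiting for the work window
    radosgw-admin lc process
    # widen the daily window in which lifecycle is allowed to run (default 00:00-06:00)
    ceph config set global rgw_lifecycle_work_time "00:00-23:59"

By default lifecycle only runs inside rgw_lifecycle_work_time, so an ingest rate that outpaces that window will let the buckets grow regardless of the rules.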

[ceph-users] Re: continued warnings: Large omap object found

2020-02-27 Thread Brad Hubbard
og:2020-02-27 16:18:00.328869 osd.40 (osd.40) 1585 : > cluster [WRN] Large omap object found. Object: > 2:654134d2:::mds0_openfiles.0:head PG: 2.4b2c82a6 (2.26) Key count: > 1048559 Size (bytes): 46407183 > /var/log/ceph/ceph.log-20200227.gz:2020-02-26 19:56:24.972431 osd.40 > (o

[ceph-users] continued warnings: Large omap object found

2020-02-27 Thread Seth Galitzer
ph.log:2020-02-27 16:18:00.328869 osd.40 (osd.40) 1585 : cluster [WRN] Large omap object found. Object: 2:654134d2:::mds0_openfiles.0:head PG: 2.4b2c82a6 (2.26) Key count: 1048559 Size (bytes): 46407183 /var/log/ceph/ceph.log-20200227.gz:2020-02-26 19:56:24.972431 osd.40 (osd.40) 1450 : clus
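For reference, the warning fires because the key count exceeds the OSD's large-omap threshold; a hedged way to check and, if the MDS open-files object is simply expected to be that big, adjust it (option name as in 14.2.x, verify for your version):

    # current threshold (default 200000 keys)
    ceph config get osd osd_deep_scrub_large_omap_object_key_threshold
    # raise it and re-check by deep-scrubbing the reporting PG (2.26 here)
    ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 2000000
    ceph pg deep-scrub 2.26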

[ceph-users] Re: SSD considerations for block.db and WAL

2020-02-27 Thread DHilsbos
Christian; What is your failure domain? If your failure domain is set to OSD / drive, and 2 OSDs share a DB / WAL device, and that DB / WAL device dies, then portions of the data could drop to read-only (or be lost...). Ceph is really set up to own the storage hardware directly. It doesn't
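A quick way to confirm the failure domain in question (standard commands; the pool name is a placeholder):

    # which CRUSH rule the pool uses
    ceph osd pool get <pool> crush_rule
    # dump the rule; the 'chooseleaf ... type host|osd' step is the failure domain
    ceph osd crush rule dump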

[ceph-users] SSD considerations for block.db and WAL

2020-02-27 Thread Christian Wahl
Hi everyone, we currently have 6 OSDs with 8TB HDDs split across 3 hosts. The main usage is KVM images. To improve speed we planned on putting the block.db and WAL onto NVMe SSDs. The plan was to put 2x1TB in each host. One option I thought of was to RAID-1 them for better redundancy; I don't
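For context, a sketch of how a shared NVMe partition is typically attached at OSD creation time with ceph-volume (device names are placeholders; when only --block.db is given, the WAL lives on the same device):

    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1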

[ceph-users] Re: Cache tier OSDs crashing due to unfound hitset object 14.2.7

2020-02-27 Thread Lincoln Bryant
It seems that one of the down PGs was able to recover just fine, but the other OSD went into "incomplete" state after export-and-removing the affected PG from the down OSD. I've still got the exported data from the pg, although re-importing it to the OSD again causes the crashes. What's the

[ceph-users] Re: osdmap::decode crc error -- 13.2.7 -- most osds down

2020-02-27 Thread Dan van der Ster
FTR, the root cause is now understood: https://tracker.ceph.com/issues/39525#note-21 -- dan On Thu, Feb 20, 2020 at 9:24 PM Dan van der Ster wrote: > > On Thu, Feb 20, 2020 at 9:20 PM Wido den Hollander wrote: > > > > > Op 20 feb. 2020 om 19:54 heeft Dan van der Ster het > > > volgende

[ceph-users] Re: official ceph.com buster builds? [https://eu.ceph.com/debian-luminous buster]

2020-02-27 Thread Jelle de Jong
Hi all, Could someone make luminous available for buster (not the container version, and not nautilus)? What are the reasons for not having this version available from eu.ceph.com? What would be the motivation needed to add the packages? As far as I can see, the curl/libcurl4 version is the only thing needed to

[ceph-users] Re: Cache tier OSDs crashing due to unfound hitset object 14.2.7

2020-02-27 Thread Lincoln Bryant
Thanks Sage, I can try that. Admittedly I'm not sure how to tell if these two PGs can recover without this particular OSD. Note, it seems like there is still an underlying related issue, with hit set archives popping up as unfound objects on my cluster as in Paul's ticket. In total I had about

[ceph-users] Re: Cache tier OSDs crashing due to unfound hitset object 14.2.7

2020-02-27 Thread Sage Weil
If the PG in question can recover without that OSD, I would use ceph-objectstore-tool to export and remove it, and then move on. I hit a similar issue on my system (due to a bug in an early octopus build) and it was super tedious to fix up manually (needed patched code and manual

[ceph-users] Re: [External Email] Re: Re: ceph prometheus module no export content

2020-02-27 Thread Dave Hall
Alternatively, it might be handy to have the passive mgrs issue an HTTP redirect to the active mgr.  Then a single DNS name pointing to all mgrs would always work, even when the active mgr fails over. Going a step further with some HA strategies, the cluster could have a separate, floating

[ceph-users] Re: Cache tier OSDs crashing due to unfound hitset object 14.2.7

2020-02-27 Thread Paul Emmerich
Also: make a backup using the PG export feature of objectstore-tool before doing anything else. Sometimes it's enough to export and delete the PG from the broken OSD and import it into a different OSD using objectstore-tool. Paul -- Paul Emmerich Looking for help with your Ceph cluster?
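A sketch of that sequence with ceph-objectstore-tool, assuming osd.40 and pg 14.xx stand in for the real IDs and the affected OSD daemons are stopped first (verify flags against your version):

    systemctl stop ceph-osd@40
    # export the PG as a backup before touching anything
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-40 --pgid 14.xx --op export --file /root/pg-14.xx.export
    # remove it from the broken OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-40 --pgid 14.xx --op remove --force
    # import it into a different, also stopped, OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-NN --op import --file /root/pg-14.xx.export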

[ceph-users] Re: Cache tier OSDs crashing due to unfound hitset object 14.2.7

2020-02-27 Thread Paul Emmerich
Crash happens in PG::activate, so it's unrelated to IO etc. My first approach here would be to read the code and try to understand why it crashes and what exact condition is being violated here. It looks like something that can probably be fixed by fiddling around with ceph-objectstore-tool

[ceph-users] Re: Cache tier OSDs crashing due to unfound hitset object 14.2.7

2020-02-27 Thread Lincoln Bryant
Thanks Paul. I was able to mark many of the unfound ones as lost, but I'm still stuck with one unfound object and an OSD assert at this point. I've tried setting many of the OSD options to pause all cluster I/O, backfilling, rebalancing, the tiering agent, etc., to try to avoid hitting the assert, but alas

[ceph-users] Re: Cache tier OSDs crashing due to unfound hitset object 14.2.7

2020-02-27 Thread Paul Emmerich
I've also encountered this issue, but luckily without the crashing OSDs, so marking as lost resolved it for us. See https://tracker.ceph.com/issues/44286 Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München
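The commands involved, for reference (the pgid is a placeholder; 'delete' discards the unfound object, while 'revert' rolls back to a prior version and is not available for erasure-coded pools, so check which one fits before running it):

    ceph health detail                       # shows which PGs have unfound objects
    ceph pg 14.xx list_unfound               # identify the unfound object(s)
    ceph pg 14.xx mark_unfound_lost delete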

[ceph-users] Re: Is a scrub error (read_error) on a primary osd safe to repair?

2020-02-27 Thread Caspar Smit
Hi Mehmet, In our case the ceph pg repair fixed the issues (read_error). I think the read_error was just temporary due to low available RAM. You might want to check your actual issue with ceph pg query Kind regards, Caspar Smit Systemengineer SuperNAS Dorsvlegelstraat 13 1445 PA Purmerend t:
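For completeness, the usual inspection steps referenced above (the pgid is a placeholder):

    ceph pg 2.xx query                                       # peering/recovery state of the PG
    rados list-inconsistent-obj 2.xx --format=json-pretty    # which object/shard produced the read_error
    ceph pg repair 2.xx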

[ceph-users] Re: Re: ceph prometheus module no export content

2020-02-27 Thread Michael Bisig
Hi all, A related question: is it possible to let a passive mgr do the data collection? We run 14.2.6 on a medium-sized 2.5PB cluster with over 900M objects (rbd and mainly S3). At the moment we face an issue with the prometheus exporter while the cluster is under high load (e.g. while we insert a
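A couple of hedged checks that help narrow this down (port 9283 is the module default; the hostname is a placeholder):

    ceph mgr services                               # shows the active mgr's prometheus endpoint
    curl -sS http://mgr-host:9283/metrics | head    # see whether, and how quickly, the exporter answers
    # lengthen the module's cache interval if scrapes pile up under load
    ceph config set mgr mgr/prometheus/scrape_interval 30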