[ceph-users] Re: 17.2.6 fs 'ls' ok, but 'cat' 'operation not permitted' puzzle

2023-05-02 Thread Harry G Coin
This problem of file systems being inaccessible post upgrade by clients other than client.admin dates back to v14 and carries on through v17. It also applies to any case of specifying other than the default pool names for new file systems. Solved because Curt remembered a link on this list. (Thanks Curt!) Here
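
One common cause of "ls works but reads fail with operation not permitted" is client caps that do not cover the file system's data pool. A hedged sketch of re-issuing caps with ceph fs authorize (file system and client names are placeholders, not taken from the thread):

    # grant client.fsuser read/write on file system "myfs"; this generates mon, MDS and
    # data-pool OSD caps matching the pools actually backing the file system
    ceph fs authorize myfs client.fsuser / rw
    # verify the resulting caps include the data pool
    ceph auth get client.fsuser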

[ceph-users] Re: architecture help (iscsi, rbd, backups?)

2023-05-02 Thread Bailey Allison
Hey Angelo, Ya, we are using the RBD driver for quite a few customers in production, and it is working quite well! Hahahaha, I am familiar with the bug you are talking about, I think; I believe that may be resolved by now. I believe the driver is either out of beta now or soon to be out of beta?

[ceph-users] [multisite] "bucket sync status" takes a while

2023-05-02 Thread Yixin Jin
Hi folks, With a multi-site environment, when I create a bucket-level sync policy with a symmetric flow between the master zone and another zone, "bucket sync status" immediately shows that the sync is now enabled in the master zone. But it takes a while for it to show that in the other zone. I
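
For readers following along, per-zone bucket sync state is typically inspected with radosgw-admin; a minimal sketch (the bucket name is a placeholder):

    # bucket-level sync status as seen from the zone the command runs in
    radosgw-admin bucket sync status --bucket=mybucket
    # overall multisite sync progress for the zone
    radosgw-admin sync status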

[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-02 Thread Laura Flores
I saw two untracked failures in the upgrade/octopus-x suite. Both failures seem to indicate a problem with containers, unrelated to the Ceph code. However, if anyone else can please take a look and confirm, I would appreciate it. upgrade/octopus-x (pacific) https://pulpito.ceph.com/yuriw-2023-04-25

[ceph-users] 17.2.6 fs 'ls' ok, but 'cat' 'operation not permitted' puzzle

2023-05-02 Thread Harry G Coin
In 17.2.6, is there a security requirement that the pool names backing a CephFS file system match the filesystem name, i.e. name.data for the data pool and name.meta for the associated metadata pool? (Multiple file systems are enabled.) I have filesystems from older versions with the data pool name matching the
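
For context, the name.meta / name.data naming is essentially the convention used by "ceph fs volume create"; a file system can also be created against arbitrarily named pools. A hedged sketch (pool and file system names are placeholders):

    ceph osd pool create mymeta_pool
    ceph osd pool create mydata_pool
    ceph fs new myfs mymeta_pool mydata_pool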

[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-02 Thread Nikola Ciprich
Hello Igor, On Tue, May 02, 2023 at 05:41:04PM +0300, Igor Fedotov wrote: > Hi Nikola, > > I'd suggest starting to monitor perf counters for your OSDs, > op_w_lat/subop_w_lat ones specifically. I presume they rise eventually, > don't they? OK, I'm starting to collect those for all OSDs.. currently

[ceph-users] Flushing stops as copy-from message being throttled

2023-05-02 Thread lingu2008
Hi all, On one server with a cache tier on Samsung PM983 SSDs in front of an EC base tier on HDDs, I find the cache tier stops flushing or evicting when the cache tier is near full. With quite a bit of gdb debugging, I find the problem may be in the throttling mechanism. When the write traffic is high,
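
While the diagnosis above points at the throttling code itself, the cache-tier flush/evict thresholds and per-OSD throttle counters can at least be inspected along these lines (pool and OSD names are placeholders):

    # current flush/evict thresholds on the cache pool
    ceph osd pool get hot-cache cache_target_dirty_ratio
    ceph osd pool get hot-cache cache_target_full_ratio
    # per-OSD throttle counters, to see which throttle is saturating
    ceph daemon osd.0 perf dump | grep -i -A 5 throttle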

[ceph-users] Re: Upgrading from Pacific to Quincy fails with "Unexpected error"

2023-05-02 Thread Adam King
The number of mgr daemons thing is expected. The way it works is it first upgrades all the standby mgrs (which will be all but one) and then fails over so the previously active mgr can be upgraded as well. After that failover is when it's first actually running the newer cephadm code, which is when
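
As a reference for how that looks in practice, a minimal sketch of driving and watching a cephadm upgrade (the target version is a placeholder):

    # start the upgrade; standby mgrs are upgraded first, then the active mgr fails over
    ceph orch upgrade start --ceph-version 17.2.6
    # watch overall progress and which mgr daemons have been upgraded so far
    ceph orch upgrade status
    ceph orch ps --daemon-type mgr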

[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-02 Thread Nizamudeen A
dashboard approved! Regards, Nizam On Tue, May 2, 2023, 20:48 Yuri Weinstein wrote: > Please review the Release Notes - https://github.com/ceph/ceph/pull/51301 > > Still seeking approvals for: > > rados - Neha, Radek, Laura > rook - Sébastien Han > dashboard - Ernesto > > fs - Venky, Patric

[ceph-users] Re: MDS "newly corrupt dentry" after patch version upgrade

2023-05-02 Thread Janek Bevendorff
Hi Patrick, Please be careful resetting the journal. It was not necessary. You can try to recover the missing inode using cephfs-data-scan [2]. Yes. I did that very reluctantly after trying everything else as a last resort. But since it only gave me another error, I restored the previous sta
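
For readers in the same situation, the cephfs-data-scan recovery pass referenced above looks roughly like the following; the data pool name is a placeholder, and the disaster-recovery documentation should be read before running any of it:

    # rebuild metadata from the data pool (run with the file system offline)
    cephfs-data-scan scan_extents cephfs_data
    cephfs-data-scan scan_inodes cephfs_data
    cephfs-data-scan scan_links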

[ceph-users] Balancing Reads in Ceph

2023-05-02 Thread Alan Nair
Hi. I am currently using Ceph for replicated storage to store many objects across 5 nodes with 3x replication. When I generate ~1000 read requests to a single object, they all get serviced by the same primary OSD. I would like to balance the reads across the replicas. So I use the following:
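
The message is cut off here, but for librbd clients the usual knob for this is rbd_read_from_replica_policy; raw librados reads have an analogous balance-reads operation flag. A hedged sketch, assuming the reads go through librbd:

    # let librbd spread reads across replicas instead of always hitting the primary OSD
    ceph config set client rbd_read_from_replica_policy balance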

[ceph-users] Re: MDS "newly corrupt dentry" after patch version upgrade

2023-05-02 Thread Patrick Donnelly
On Tue, May 2, 2023 at 10:31 AM Janek Bevendorff wrote: > > Hi, > > After a patch version upgrade from 16.2.10 to 16.2.12, our rank 0 MDS > fails to start. After replaying the journal, it just crashes with > > [ERR] : MDS abort because newly corrupt dentry to be committed: [dentry > #0x1/storag

[ceph-users] Re: How can I use not-replicated pool (replication 1 or raid-0)

2023-05-02 Thread mhnx
Thank you for the explanation Frank. I also agree with you, Ceph is not designed for this kind of use case, but I tried to continue with what I know. My idea was exactly what you described: I was trying to automate cleaning or recreating on any failure. As you can see below, the rep1 pool is very fast: - C
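
For completeness, a size-1 pool like the one benchmarked above is typically created along these lines (the pool name is a placeholder; any OSD failure loses the data on it):

    ceph config set global mon_allow_pool_size_one true
    ceph osd pool create rep1pool 128 128 replicated
    ceph osd pool set rep1pool size 1 --yes-i-really-mean-it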

[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-02 Thread Yuri Weinstein
Please review the Release Notes - https://github.com/ceph/ceph/pull/51301 Still seeking approvals for: rados - Neha, Radek, Laura rook - Sébastien Han dashboard - Ernesto fs - Venky, Patrick (upgrade/octopus-x (pacific) - Laura (looks the same as in 16.2.8)) ceph-volume - Guillaume On Tue,

[ceph-users] Re: MDS "newly corrupt dentry" after patch version upgrade

2023-05-02 Thread Janek Bevendorff
Thanks! I tried downgrading to 16.2.10 and was able to get it running again, but after a reboot, got a warning that two of the OSDs on that host had broken Bluestore compression. Restarting the two OSDs again got rid of it, but that's still a bit concerning. On 02/05/2023 16:48, Dan van der

[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-02 Thread Gregory Farnum
On Tue, May 2, 2023 at 7:54 AM Igor Fedotov wrote: > > > On 5/2/2023 11:32 AM, Nikola Ciprich wrote: > > I've updated cluster to 17.2.6 some time ago, but the problem persists. > > This is > > especially annoying in connection with https://tracker.ceph.com/issues/56896 > > as restarting OSDs is q

[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-02 Thread Casey Bodley
On Thu, Apr 27, 2023 at 5:21 PM Yuri Weinstein wrote: > > Details of this release are summarized here: > > https://tracker.ceph.com/issues/59542#note-1 > Release Notes - TBD > > Seeking approvals for: > > smoke - Radek, Laura > rados - Radek, Laura > rook - Sébastien Han > cephadm - Adam K >

[ceph-users] Re: Upgrading from Pacific to Quincy fails with "Unexpected error"

2023-05-02 Thread Reza Bakhshayeshi
Hi Adam, I'm still struggling with this issue. I also checked it one more time with newer versions, upgrading the cluster from 16.2.11 to 16.2.12 was successful but from 16.2.12 to 17.2.6 failed again with the same ssh errors (I checked https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#ssh-
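
For anyone debugging the same ssh errors, a few of the usual cephadm ssh sanity checks, as a hedged sketch (the hostname is a placeholder):

    # verify cephadm can reach and validate a host
    ceph cephadm check-host myhost
    # dump the ssh config and identity key cephadm is actually using
    ceph cephadm get-ssh-config
    ceph config-key get mgr/cephadm/ssh_identity_key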

[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-02 Thread Yuri Weinstein
Venky, I did plan to cherry-pick this PR if you approve this (this PR was used for a rerun) On Tue, May 2, 2023 at 7:51 AM Venky Shankar wrote: > > Hi Yuri, > > On Fri, Apr 28, 2023 at 2:53 AM Yuri Weinstein wrote: > > > > Details of this release are summarized here: > > > > https://tracker.ceph

[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-02 Thread Venky Shankar
Hi Yuri, On Fri, Apr 28, 2023 at 2:53 AM Yuri Weinstein wrote: > > Details of this release are summarized here: > > https://tracker.ceph.com/issues/59542#note-1 > Release Notes - TBD > > Seeking approvals for: > > smoke - Radek, Laura > rados - Radek, Laura > rook - Sébastien Han > cephadm -

[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-02 Thread Igor Fedotov
On 5/2/2023 11:32 AM, Nikola Ciprich wrote: I've updated cluster to 17.2.6 some time ago, but the problem persists. This is especially annoying in connection with https://tracker.ceph.com/issues/56896 as restarting OSDs is quite painful when half of them crash.. with best regards Feel free to

[ceph-users] Re: MDS "newly corrupt dentry" after patch version upgrade

2023-05-02 Thread Dan van der Ster
Hi Janek, That assert is part of a new corruption check added in 16.2.12 -- see https://github.com/ceph/ceph/commit/1771aae8e79b577acde749a292d9965264f20202 The abort is controlled by a new option: +Option("mds_abort_on_newly_corrupt_dentry", Option::TYPE_BOOL, Option::LEVEL_ADVANCED) +.
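
If needed while the corruption is being investigated, that abort can be switched off so the MDS can come up; a minimal sketch, intended only as a temporary debugging measure:

    ceph config set mds mds_abort_on_newly_corrupt_dentry false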

[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-02 Thread Igor Fedotov
Hi Nikola, I'd suggest starting to monitor perf counters for your OSDs, op_w_lat/subop_w_lat ones specifically. I presume they rise eventually, don't they? Does subop_w_lat grow for every OSD or just a subset of them? How large is the delta between the best and the worst OSDs after a one we
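
A minimal sketch of pulling those counters (the OSD id is a placeholder):

    # run on the OSD's host, or use "ceph tell osd.0 perf dump" from anywhere
    ceph daemon osd.0 perf dump | grep -A 4 -E '"op_w_lat"|"subop_w_lat"'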

[ceph-users] MDS "newly corrupt dentry" after patch version upgrade

2023-05-02 Thread Janek Bevendorff
Hi, After a patch version upgrade from 16.2.10 to 16.2.12, our rank 0 MDS fails to start. After replaying the journal, it just crashes with [ERR] : MDS abort because newly corrupt dentry to be committed: [dentry #0x1/storage [2,head] auth (dversion lock) Immediately after the upgrade, I

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-02 Thread Eugen Block
Hi, disclaimer: I haven't used LRC in a real setup yet, so there might be some misunderstandings on my side. But I tried to play around with one of my test clusters (Nautilus). Because I'm limited in the number of hosts (6 across 3 virtual DCs) I tried two different profiles with lower nu
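
For reference, an LRC erasure-code profile is defined roughly like this; the values below are illustrative placeholders, not the ones from the tests described above:

    # k data chunks, m coding chunks, locality l; failure domain and locality are CRUSH bucket types
    ceph osd erasure-code-profile set lrc_example plugin=lrc k=4 m=2 l=3 \
        crush-failure-domain=host crush-locality=datacenter
    ceph osd pool create lrc_pool 32 32 erasure lrc_example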

[ceph-users] Re: Memory leak in MGR after upgrading to pacific.

2023-05-02 Thread Gary Molenkamp
To follow up on this issue, I saw the additional comments on https://tracker.ceph.com/issues/59580 regarding mgr caps. By setting the mgr user caps back to the default, I was able to reduce the memory leak from several hundred MB/hr to just a few MB/hr. As the other commenter had posted, in order fo
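
For anyone wanting to do the same, the stock mgr caps can be restored roughly as follows; the mgr id is a placeholder, and the values should be double-checked against a freshly deployed mgr keyring:

    ceph auth caps mgr.myhost mon 'allow profile mgr' osd 'allow *' mds 'allow *'
    ceph auth get mgr.myhost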

[ceph-users] Re: RBD mirroring, asking for clarification

2023-05-02 Thread Eugen Block
Hi, while your assumptions are correct (you can use the rest of the pool for other non-mirrored images), at least I'm not aware of any limitations, can I ask for the motivation behind this question? Mixing different use-cases doesn't seem like a good idea to me. There's always a chance th
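
For context, keeping non-mirrored images alongside mirrored ones usually means putting the pool in image mode and enabling mirroring per image; a hedged sketch (pool and image names are placeholders):

    rbd mirror pool enable mypool image
    rbd mirror image enable mypool/myimage snapshot
    rbd mirror pool status mypool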

[ceph-users] Re: client isn't responding to mclientcaps(revoke), pending pAsLsXsFsc issued pAsLsXsFsc

2023-05-02 Thread Frank Schilder
Hi Arnaud, thanks, that's a good one. The inode in question should be in cache at this time. It actually accepts the hex-code given in the log message and is really fast. I hope I remember that for next time. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: client isn't responding to mclientcaps(revoke), pending pAsLsXsFsc issued pAsLsXsFsc

2023-05-02 Thread MARTEL Arnaud
Hi, Or you can query the MDS(s) with: ceph tell mds.* dump inode <inode number> 2>/dev/null | grep path For example: user@server:~$ ceph tell mds.* dump inode 1099836155033 2>/dev/null | grep path "path": "/ec42/default/joliot/gipsi/gpu_burn.sif", "stray_prior_path": "", Arnaud On 01/05/2023 15:07

[ceph-users] quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-02 Thread Nikola Ciprich
Hello dear CEPH users and developers, we're dealing with strange problems. We have a 12 node Alma Linux 9 cluster, initially installed with CEPH 15.2.16, then upgraded to 17.2.5. It's running a bunch of KVM virtual machines accessing volumes using RBD. Everything is working well, but there is strang

[ceph-users] Re: PVE CEPH OSD heartbeat show

2023-05-02 Thread Fabian Grünbichler
On May 1, 2023 9:30 pm, Peter wrote: > Hi Fabian, > > Thank you for your prompt response. It's crucial to understand how things > work, and I appreciate your assistance. > > After replacing the switch for our Ceph environment, we experienced three > days of normalcy before the issue recurred th