[ceph-users] (belated) CLT notes

2024-08-08 Thread Gregory Farnum
Hi folks, the CLT met on Monday August 5. We discussed a few topics: * The mailing lists are a problem to moderate right now with a huge increase in spam. We have two problems: 1) the moderation system's web front-end apparently isn't operational. That's getting fixed. 2) The moderation is a big l

[ceph-users] Re: Ceph tracker broken?

2024-07-01 Thread Gregory Farnum
You currently have "Email notifications" set to "For any event on all my projects". I believe that's the firehose setting, so I've gone ahead and changed it to "Only for things I watch or I'm involved in". I'm unaware of any reason that would have been changed on the back end, though there were som

[ceph-users] Re: How to setup NVMeoF?

2024-05-30 Thread Gregory Farnum
There's a major NVMe effort underway but it's not even merged to master yet, so I'm not sure how docs would have ended up in the Reef doc tree. :/ Zac, any idea? Can we pull this out? -Greg On Thu, May 30, 2024 at 7:03 AM Robert Sander wrote: > > Hi, > > On 5/30/24 14:18, Frédéric Nass wrote: >

[ceph-users] Re: cephfs-data-scan orphan objects while mds active?

2024-05-16 Thread Gregory Farnum
> --- > Olli Rajala - Lead TD > Anima Vitae Ltd. > www.anima.fi > --- > > > On Tue, May 14, 2024 at 9:41 AM Gregory Farnum wrote: > > > The cephfs-data-scan tools are built with the expectation that they'll > > be run off

[ceph-users] Re: cephfs-data-scan orphan objects while mds active?

2024-05-13 Thread Gregory Farnum
The cephfs-data-scan tools are built with the expectation that they'll be run offline. Some portion of them could be run without damaging the live filesystem (NOT all, and I'd have to dig in to check which is which), but they will detect inconsistencies that don't really exist (due to updates that

[ceph-users] Re: question about rbd_read_from_replica_policy

2024-04-04 Thread Gregory Farnum
On Thu, Apr 4, 2024 at 8:23 AM Anthony D'Atri wrote: > > Network RTT? No, it's sadly not that clever. There's a crush_location configurable that you can set on clients (to a host, or a datacenter, or any other CRUSH bucket), and as long as part of it matches the CRUSH map then it will feed IOs to
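A minimal sketch of the client-side settings being described, assuming a hypothetical ceph.conf on the client (the bucket name is only an example and must match a bucket that actually exists in your CRUSH map):

    [client]
        rbd_read_from_replica_policy = localize
        # "datacenter=dc1" is a placeholder; use a bucket from your own CRUSH map
        crush_location = datacenter=dc1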

[ceph-users] Re: Are we logging IRC channels?

2024-03-22 Thread Gregory Farnum
I put it on the list for the next CLT. :) (though I imagine it will move to the infrastructure meeting from there.) On Fri, Mar 22, 2024 at 4:42 PM Mark Nelson wrote: > Sure! I think Wido just did it all unofficially, but afaik we've lost > all of those records now. I don't know if Wido still

[ceph-users] Re: MDS_CLIENT_LATE_RELEASE, MDS_SLOW_METADATA_IO, and MDS_SLOW_REQUEST errors and slow osd_ops despite hardware being fine

2024-03-15 Thread Gregory Farnum
On Fri, Mar 15, 2024 at 6:15 AM Ivan Clayson wrote: > Hello everyone, > > We've been experiencing on our quincy CephFS clusters (one 17.2.6 and > another 17.2.7) repeated slow ops with our client kernel mounts > (Ceph 17.2.7 and version 4 Linux kernels on all clients) that seem to > originate fro

[ceph-users] Re: Telemetry endpoint down?

2024-03-11 Thread Gregory Farnum
We had a lab outage Thursday and it looks like this service wasn’t restarted after that occurred. Fixed now and we’ll look at how to prevent that in future. -Greg On Mon, Mar 11, 2024 at 6:46 AM Konstantin Shalygin wrote: > Hi, seems telemetry endpoint is down for a some days? We have connection

[ceph-users] Re: Ceph-storage slack access

2024-03-08 Thread Gregory Farnum
Much of our infrastructure (including website) was down for ~6 hours yesterday. Some information on the sepia list, and more in the slack/irc channel. -Greg On Fri, Mar 8, 2024 at 9:48 AM Zac Dover wrote: > > I ping www.ceph.io and ceph.io with no difficulty: > > > zdover@NUC8i7BEH:~$ ping www.ce

[ceph-users] Re: Minimum amount of nodes needed for stretch mode?

2024-03-07 Thread Gregory Farnum
On Thu, Mar 7, 2024 at 9:09 AM Stefan Kooman wrote: > > Hi, > > TL;DR > > Failure domain considered is data center. Cluster in stretch mode [1]. > > - What is the minimum amount of monitor nodes (apart from tie breaker) > needed per failure domain? You need at least two monitors per site. This is
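For illustration, the monitor layout this implies for a two-site stretch cluster, sketched with made-up monitor names (two per data center plus the tiebreaker, as in the stretch-mode docs):

    ceph mon set_location a datacenter=site1
    ceph mon set_location b datacenter=site1
    ceph mon set_location c datacenter=site2
    ceph mon set_location d datacenter=site2
    ceph mon set_location e datacenter=site3    # tiebreaker, ideally in a third location
    ceph mon enable_stretch_mode e stretch_rule datacenter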

[ceph-users] Re: Ceph-storage slack access

2024-03-07 Thread Gregory Farnum
The slack workspace is bridged to our also-published irc channels. I don't think we've done anything to enable xmpp (and two protocols is enough work to keep alive!). -Greg On Wed, Mar 6, 2024 at 9:07 AM Marc wrote: > > Is it possible to access this also with xmpp? > > > > > At the very bottom of

[ceph-users] Re: Ceph-storage slack access

2024-03-06 Thread Gregory Farnum
On Wed, Mar 6, 2024 at 8:56 AM Matthew Vernon wrote: > > Hi, > > On 06/03/2024 16:49, Gregory Farnum wrote: > > Has the link on the website broken? https://ceph.com/en/community/connect/ > > We've had trouble keeping it alive in the past (getting a non-expiring >

[ceph-users] Re: Ceph-storage slack access

2024-03-06 Thread Gregory Farnum
Has the link on the website broken? https://ceph.com/en/community/connect/ We've had trouble keeping it alive in the past (getting a non-expiring invite), but I thought that was finally sorted out. -Greg On Wed, Mar 6, 2024 at 8:46 AM Matthew Vernon wrote: > > Hi, > > How does one get an invite t

[ceph-users] Re: cephfs inode backtrace information

2024-01-31 Thread Gregory Farnum
The docs recommend a fast SSD pool for the CephFS *metadata*, but the default data pool can be more flexible. The backtraces are relatively small — it's an encoded version of the path an inode is located at, plus the RADOS hobject, which is probably more of the space usage. So it should fit fine in
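A common arrangement that follows from this, sketched with made-up pool names (the extra pool must already exist): keep a small replicated default data pool for the backtraces and steer bulk data to another pool via a file layout:

    ceph fs add_data_pool cephfs cephfs_data_bulk
    # new files created under this directory go to the bulk pool
    setfattr -n ceph.dir.layout.pool -v cephfs_data_bulk /mnt/cephfs/bulk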

[ceph-users] Re: Debian 12 support

2023-11-15 Thread Gregory Farnum
There are versioning and dependency issues (both of packages, and compiler toolchain pieces) which mean that the existing reef releases do not build on Debian. Our upstream support for Debian has always been inconsistent because we don’t have anybody dedicated or involved enough in both Debian and

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-11-01 Thread Gregory Farnum
We have seen issues like this a few times and they have all been kernel client bugs with CephFS’ internal “capability” file locking protocol. I’m not aware of any extant bugs like this in our code base, but kernel patches can take a long and winding path before they end up on deployed systems. Mos

[ceph-users] Re: Not able to find a standardized restoration procedure for subvolume snapshots.

2023-09-27 Thread Gregory Farnum
Unfortunately, there’s not any such ability. We are starting long-term work on making this smoother, but CephFS snapshots are read-only and there’s no good way to do a constant-time or low-time “clone” operation, so you just have to copy the data somewhere and start work on it from that position :/

[ceph-users] Re: CVE-2023-43040 - Improperly verified POST keys in Ceph RGW?

2023-09-27 Thread Gregory Farnum
We discussed this in the CLT today and Casey can talk more about the impact and technical state of affairs. This was disclosed on the security list and it’s rated as a bug that did not require hotfix releases due to the limited escalation scope. -Greg On Wed, Sep 27, 2023 at 1:37 AM Christian Roh

[ceph-users] Ceph leadership team notes 9/27

2023-09-27 Thread Gregory Farnum
Hi everybody, The CLT met today as usual. We only had a few topics under discussion: * the User + Dev relaunch went off well! We’d like reliable recordings and have found Jitsi to be somewhat glitchy; Laura will communicate about workarounds for that while we work on a longer-term solution (self-ho

[ceph-users] Re: RHEL / CephFS / Pacific / SELinux unavoidable "relabel inode" error?

2023-08-02 Thread Gregory Farnum
I don't think we've seen this reported before. SELinux gets a hefty workout from Red Hat with their downstream ODF for OpenShift (Kubernetes), so it certainly works at a basic level. SELinux is a fussy beast though, so if you're eg mounting CephFS across RHEL nodes and invoking SELinux against it,

[ceph-users] Re: CephFS snapshots: impact of moving data

2023-07-06 Thread Gregory Farnum
Moving files around within the namespace never changes the way the file data is represented within RADOS. It’s just twiddling metadata bits. :) -Greg On Thu, Jul 6, 2023 at 3:26 PM Dan van der Ster wrote: > Hi Mathias, > > Provided that both subdirs are within the same snap context (subdirs belo

[ceph-users] Re: Unexpected behavior of directory mtime after being set explicitly

2023-05-25 Thread Gregory Farnum
I haven’t checked the logs, but the most obvious way this happens is if the mtime set on the directory is in the future compared to the time on the client or server making changes — CephFS does not move times backwards. (This causes some problems but prevents many, many others when times are not sy
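A rough way to observe the behaviour described above, with hypothetical paths:

    touch -d "2035-01-01" /mnt/cephfs/dir    # push the directory mtime into the future
    touch /mnt/cephfs/dir/newfile            # would normally bump the dir mtime to "now"
    stat -c %y /mnt/cephfs/dir               # mtime stays in 2035; CephFS won't move it backwards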

[ceph-users] Re: [EXTERN] Re: cephfs max_file_size

2023-05-24 Thread Gregory Farnum
On Tue, May 23, 2023 at 11:52 PM Dietmar Rieder wrote: > > On 5/23/23 15:58, Gregory Farnum wrote: > > On Tue, May 23, 2023 at 3:28 AM Dietmar Rieder > > wrote: > >> > >> Hi, > >> > >> can the cephfs "max_file_size" setting be chan

[ceph-users] Re: [Help appreciated] ceph mds damaged

2023-05-23 Thread Gregory Farnum
On Tue, May 23, 2023 at 1:55 PM Justin Li wrote: > > Dear All, > > After an unsuccessful upgrade to pacific, MDS were offline and could not get > back on. Checked the MDS log and found below. See cluster info from below as > well. Appreciate it if anyone can point me in the right direction. Thank

[ceph-users] Re: cephfs max_file_size

2023-05-23 Thread Gregory Farnum
On Tue, May 23, 2023 at 3:28 AM Dietmar Rieder wrote: > > Hi, > > can the cephfs "max_file_size" setting be changed at any point in the > lifetime of a cephfs? > Or is it critical for existing data if it is changed after some time? Is > there anything to consider when changing, let's say, from 1TB
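For reference, the setting is per filesystem; a sketch with an example filesystem name and size:

    ceph fs get cephfs | grep max_file_size          # current limit, in bytes (default 1 TiB)
    ceph fs set cephfs max_file_size 4398046511104   # e.g. raise it to 4 TiB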

[ceph-users] Re: mds dump inode crashes file system

2023-05-16 Thread Gregory Farnum
On Fri, May 12, 2023 at 5:28 AM Frank Schilder wrote: > > Dear Xiubo and others. > > >> I have never heard about that option until now. How do I check that and > >> how to I disable it if necessary? > >> I'm in meetings pretty much all day and will try to send some more info > >> later. > > > >

[ceph-users] Re: mds dump inode crashes file system

2023-05-10 Thread Gregory Farnum
to head home > now ... > > Thanks and best regards, > = > Frank Schilder > AIT Risø Campus > Bygning 109, rum S14 > > > From: Gregory Farnum > Sent: Wednesday, May 10, 2023 4:26 PM > To: Frank Schilder >

[ceph-users] Re: mds dump inode crashes file system

2023-05-10 Thread Gregory Farnum
This is a very strange assert to be hitting. From a code skim my best guess is the inode somehow has an xattr with no value, but that's just a guess and I've no idea how it would happen. Somebody recently pointed you at the (more complicated) way of identifying an inode path by looking at its RADOS

[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-02 Thread Gregory Farnum
On Tue, May 2, 2023 at 7:54 AM Igor Fedotov wrote: > > > On 5/2/2023 11:32 AM, Nikola Ciprich wrote: > > I've updated cluster to 17.2.6 some time ago, but the problem persists. > > This is > > especially annoying in connection with https://tracker.ceph.com/issues/56896 > > as restarting OSDs is q

[ceph-users] Re: Ceph stretch mode / POOL_BACKFILLFULL

2023-04-27 Thread Gregory Farnum
On Fri, Apr 21, 2023 at 7:26 AM Kilian Ries wrote: > > Still didn't find out what will happen when the pool is full - but tried a > little bit in our testing environment and i were not able to get the pool > full before an OSD got full. So in first place one OSD reached the full ratio > (pool n

[ceph-users] Re: Bug, pg_upmap_primaries.empty()

2023-04-26 Thread Gregory Farnum
Looks like you've somehow managed to enable the upmap balancer while allowing a client that's too old to understand it to mount. Radek, this is a commit from yesterday; is it a known issue? On Wed, Apr 26, 2023 at 7:49 AM Nguetchouang Ngongang Kevin wrote: > > Good morning, i found a bug on cep
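The usual way to check and enforce the client-feature side of this, as a sketch:

    ceph features                                    # what the connected clients advertise
    ceph osd dump | grep min_compat_client           # what the cluster currently requires
    ceph osd set-require-min-compat-client luminous  # refuses if older clients are still connected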

[ceph-users] Re: CephFS thrashing through the page cache

2023-04-05 Thread Gregory Farnum
On Fri, Mar 17, 2023 at 1:56 AM Ashu Pachauri wrote: > > Hi Xiubo, > > As you have correctly pointed out, I was talking about the stipe_unit > setting in the file layout configuration. Here is the documentation for > that for anyone else's reference: > https://docs.ceph.com/en/quincy/cephfs/file-l

[ceph-users] Re: ln: failed to create hard link 'file name': Read-only file system

2023-03-22 Thread Gregory Farnum
> Best regards, > = > Frank Schilder > AIT Risø Campus > Bygning 109, rum S14 > > > From: Gregory Farnum > Sent: Wednesday, March 22, 2023 4:14 PM > To: Frank Schilder > Cc: ceph-users@ceph.io > Subject: Re: [ceph-users] Re: ln: failed to c

[ceph-users] Re: ln: failed to create hard link 'file name': Read-only file system

2023-03-22 Thread Gregory Farnum
Do you have logs of what the nfs server is doing? Managed to reproduce it in terms of direct CephFS ops? On Wed, Mar 22, 2023 at 8:05 AM Frank Schilder wrote: > I have to correct myself. It also fails on an export with "sync" mode. > Here is an strace on the client (strace ln envs/satwindspy/in

[ceph-users] Re: mds damage cannot repair

2023-02-13 Thread Gregory Farnum
A "backtrace" is an xattr on the RADOS object storing data for a given file, and it contains the file's (versioned) path from the root. So a bad backtrace means there's something wrong with that — possibly just that there's a bug in the version of the code that's checking it, because they're genera

[ceph-users] Re: Health warning - POOL_TARGET_SIZE_BYTES_OVERCOMMITED

2023-02-13 Thread Gregory Farnum
On Mon, Feb 13, 2023 at 4:16 AM Sake Paulusma wrote: > > Hello, > > I configured a stretched cluster on two datacenters. It's working fine, > except this weekend the Raw Capicity exceeded 50% and the error > POOL_TARGET_SIZE_BYTES_OVERCOMMITED showed up. > > The command "ceph df" is showing the

[ceph-users] Re: Frequent calling monitor election

2023-02-09 Thread Gregory Farnum
Also, that the current leader (ceph-01) is one of the monitors proposing an election each time suggests the problem is with getting commit acks back from one of its followers. On Thu, Feb 9, 2023 at 8:09 AM Dan van der Ster wrote: > > Hi Frank, > > Check the mon logs with some increased debug lev
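To follow Dan's suggestion, the monitor debug levels can be raised temporarily on the suspect mons (daemon names are placeholders; remember to revert afterwards):

    ceph config set mon.ceph-01 debug_mon 10
    ceph config set mon.ceph-01 debug_paxos 10
    ceph config rm mon.ceph-01 debug_mon       # revert when done (likewise for debug_paxos)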

[ceph-users] Re: MDS_DAMAGE dir_frag

2022-12-12 Thread Gregory Farnum
On Mon, Dec 12, 2022 at 12:10 PM Sascha Lucas wrote: > Hi Dhairya, > > On Mon, 12 Dec 2022, Dhairya Parmar wrote: > > > You might want to look at [1] for this, also I found a relevant thread > [2] > > that could be helpful. > > > > Thanks a lot. I already found [1,2], too. But I did not considere

[ceph-users] Re: what happens if a server crashes with cephfs?

2022-12-08 Thread Gregory Farnum
Manuel Holtgrewe > Sent: Thursday, December 8, 2022 12:38 PM > To: Charles Hedrick > Cc: Gregory Farnum ; Dhairya Parmar ; > ceph-users@ceph.io > Subject: Re: [ceph-users] Re: what happens if a server crashes with cephfs? > > Hi Charles, > > are you concerned with a singl

[ceph-users] Re: what happens if a server crashes with cephfs?

2022-12-08 Thread Gregory Farnum
>> thanks. I'm evaluating cephfs for a computer science dept. We have users >> that run week-long AI training jobs. They use standard packages, which they >> probably don't want to modify. At the moment we use NFS. It uses synchronous >> I/O, so if somethings goes wr

[ceph-users] Re: what happens if a server crashes with cephfs?

2022-12-07 Thread Gregory Farnum
More generally, as Manuel noted you can (and should!) make use of fsync et al for data safety. Ceph’s async operations are not any different at the application layer from how data you send to the hard drive can sit around in volatile caches until a consistency point like fsync is invoked. -Greg On
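In other words, the same discipline as on local disks applies; for example, coreutils sync(1) with a file argument fsyncs just that file (the path is hypothetical):

    sync /mnt/cephfs/job-output.dat    # flush this file's data to the OSDs before relying on it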

[ceph-users] Re: Implications of pglog_hardlimit

2022-11-29 Thread Gregory Farnum
On Tue, Nov 29, 2022 at 1:18 PM Joshua Timmer wrote: > I've got a cluster in a precarious state because several nodes have run > out of memory due to extremely large pg logs on the osds. I came across > the pglog_hardlimit flag which sounds like the solution to the issue, > but I'm concerned that
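For reference, the flag and the related log-length knobs look roughly like this (check the release notes for the minimum OSD versions before setting the flag; the values shown are only illustrative):

    ceph osd set pglog_hardlimit                      # only succeeds once all OSDs can enforce it
    ceph config set osd osd_max_pg_log_entries 10000
    ceph config set osd osd_min_pg_log_entries 3000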

[ceph-users] Re: CephFS performance

2022-11-22 Thread Gregory Farnum
In addition to not having resiliency by default, my recollection is that BeeGFS also doesn't guarantee metadata durability in the event of a crash or hardware failure like CephFS does. There's not really a way for us to catch up to their "in-memory metadata IOPS" with our "on-disk metadata IOPS". :

[ceph-users] Re: 16.2.11 branch

2022-10-31 Thread Gregory Farnum
On Fri, Oct 28, 2022 at 8:51 AM Laura Flores wrote: > > Hi Christian, > > There also is https://tracker.ceph.com/versions/656 which seems to be > > tracking > > the open issues tagged for this particular point release. > > > > Yes, thank you for providing the link. > > If you don't mind me asking

[ceph-users] Re: Slow monitor responses for rbd ls etc.

2022-10-18 Thread Gregory Farnum
On Fri, Oct 7, 2022 at 7:53 AM Sven Barczyk wrote: > > Hello, > > > > we are encountering a strange behavior on our Ceph. (All Ubuntu 20 / All > mons Quincy 17.2.4 / Oldest OSD Quincy 17.2.0 ) > Administrative commands like rbd ls or create are so slow, that libvirtd is > running into timeouts and

[ceph-users] Re: disable stretch_mode possible?

2022-10-17 Thread Gregory Farnum
On Mon, Oct 17, 2022 at 4:40 AM Enrico Bocchi wrote: > > Hi, > > I have played with stretch clusters a bit but never managed to > un-stretch them fully. > > From my experience (using Pacific 16.2.9), once the stretch mode is > enabled, the replicated pools switch to the stretch_rule with size 4,

[ceph-users] Re: CLT meeting summary 2022-09-28

2022-09-28 Thread Gregory Farnum
On Wed, Sep 28, 2022 at 9:15 AM Adam King wrote: > Budget Discussion > >- Going to investigate current resources being used, see if any costs >can be cut >- What can be moved from virtual environments to internal ones? >- Need to take inventory of what resources we currently have

[ceph-users] Re: Power outage recovery

2022-09-15 Thread Gregory Farnum
Recovery from OSDs loses the mds and rgw keys they use to authenticate with cephx. You need to get those set up again by using the auth commands. I don’t have them handy but it is discussed in the mailing list archives. -Greg On Thu, Sep 15, 2022 at 3:28 PM Jorge Garcia wrote: > Yes, I tried res
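A sketch of what that looks like, with placeholder daemon IDs and the cap profiles the docs use for manually created daemons (double-check against your deployment, and write the resulting keys into the daemons' keyring files):

    ceph auth get-or-create mds.mds1 mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *'
    ceph auth get-or-create client.rgw.gw1 mon 'allow rw' osd 'allow rwx'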

[ceph-users] Re: data usage growing despite data being written

2022-09-07 Thread Gregory Farnum
> Is there a way to find out how many osdmaps are currently being kept? > ____ > From: Gregory Farnum > Sent: Wednesday, September 7, 2022 10:58 AM > To: Wyll Ingersoll > Cc: ceph-users@ceph.io > Subject: Re: [ceph-users] data usage growing desp

[ceph-users] Re: data usage growing despite data being written

2022-09-07 Thread Gregory Farnum
tore-tool to migrate PGs to their proper destinations (letting the cluster clean up the excess copies if you can afford to — deleting things is always scary). But I haven't had to help recover a death-looping cluster in around a decade, so that's about all the options I can offer up. -Gre

[ceph-users] Re: cephfs blocklist recovery and recover_session mount option

2022-09-07 Thread Gregory Farnum
On Tue, Aug 16, 2022 at 3:14 PM Vladimir Brik wrote: > > Hello > > I'd like to understand what is the proper/safe way to > recover when a cephfs client becomes blocklisted by the MDS. > > The man page of mount.ceph talks about recover_session=clean > option, but it has the following text I am not
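For context, the option in question is passed at mount time, e.g. with placeholder monitor and client names:

    mount -t ceph mon1:6789:/ /mnt/cephfs -o name=myclient,recover_session=clean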

[ceph-users] Re: data usage growing despite data being written

2022-09-07 Thread Gregory Farnum
On Tue, Sep 6, 2022 at 2:08 PM Wyll Ingersoll wrote: > > > Our cluster has not had any data written to it externally in several weeks, > but yet the overall data usage has been growing. > Is this due to heavy recovery activity? If so, what can be done (if > anything) to reduce the data generate

[ceph-users] Re: [Help] Does MSGR2 protocol use openssl for encryption

2022-09-02 Thread Gregory Farnum
We partly rolled our own with AES-GCM. See https://docs.ceph.com/en/quincy/rados/configuration/msgr2/#connection-modes and https://docs.ceph.com/en/quincy/dev/msgr2/#frame-format -Greg On Wed, Aug 24, 2022 at 4:50 PM Jinhao Hu wrote: > > Hi, > > I have a question about the MSGR protocol Ceph used
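The wire encryption is selected through the msgr2 connection-mode options described at the first link; a sketch (in practice the values can be preference lists such as "secure crc"):

    ceph config set global ms_cluster_mode secure
    ceph config set global ms_service_mode secure
    ceph config set global ms_client_mode secure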

[ceph-users] Re: CephFS MDS sizing

2022-09-02 Thread Gregory Farnum
On Sun, Aug 28, 2022 at 12:19 PM Vladimir Brik wrote: > > Hello > > Is there a way to query or get an approximate value of an > MDS's cache hit ratio without using "dump loads" command > (which seems to be a relatively expensive operation) for > monitoring and such? Unfortunately, I'm not seeing o

[ceph-users] Re: Changing the cluster network range

2022-09-02 Thread Gregory Farnum
On Mon, Aug 29, 2022 at 12:49 AM Burkhard Linke wrote: > > Hi, > > > some years ago we changed our setup from a IPoIB cluster network to a > single network setup, which is a similar operation. > > > The OSD use the cluster network for heartbeats and backfilling > operation; both use standard tcp c

[ceph-users] Re: Potential bug in cephfs-data-scan?

2022-08-25 Thread Gregory Farnum
On Fri, Aug 19, 2022 at 7:17 AM Patrick Donnelly wrote: > > On Fri, Aug 19, 2022 at 5:02 AM Jesper Lykkegaard Karlsen > wrote: > > > > Hi, > > > > I have recently been scanning the files in a PG with "cephfs-data-scan > > pg_files ...". > > Why? > > > Although, after a long time the scan was sti

[ceph-users] Re: CephFS perforamnce degradation in root directory

2022-08-15 Thread Gregory Farnum
I was wondering if it had something to do with quota enforcement. The other possibility that occurs to me is if other clients are monitoring the system, or an admin pane (eg the dashboard) is displaying per-volume or per-client stats, they may be poking at the mountpoint and interrupting exclusive

[ceph-users] Re: linux distro requirements for reef

2022-08-10 Thread Gregory Farnum
The immediate driver is both a switch to newer versions of python, and to newer compilers supporting more C++20 features. More generally, supporting multiple versions of a distribution is a lot of work and when Reef comes out next year, CentOS9 will be over a year old. We generally move new stable

[ceph-users] Re: Upgrade from Octopus to Pacific cannot get monitor to join

2022-07-28 Thread Gregory Farnum
ode' they are running because it's > all in docker containers. But maybe I'm missing something obvious > > Thanks > > > > > July 27, 2022 4:34 PM, "Gregory Farnum" wrote: > > On Wed, Jul 27, 2022 at 10:24 AM wrote: > > Currently running Oc

[ceph-users] Re: Ceph Stretch Cluster - df pool size (Max Avail)

2022-07-28 Thread Gregory Farnum
https://tracker.ceph.com/issues/56650 There's a PR in progress to resolve this issue now. (Thanks, Prashant!) -Greg On Thu, Jul 28, 2022 at 7:52 AM Nicolas FONTAINE wrote: > > Hello, > > We have exactly the same problem. Did you find an answer or should we > open a bug report? > > Sincerely, > >

[ceph-users] Re: cannot set quota on ceph fs root

2022-07-28 Thread Gregory Farnum
On Thu, Jul 28, 2022 at 1:01 AM Frank Schilder wrote: > > Hi all, > > I'm trying to set a quota on the ceph fs file system root, but it fails with > "setfattr: /mnt/adm/cephfs: Invalid argument". I can set quotas on any > sub-directory. Is this intentional? The documentation > (https://docs.cep
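Sub-directory quotas do work as documented; for example, with a placeholder path:

    setfattr -n ceph.quota.max_bytes -v 10995116277760 /mnt/cephfs/some-subdir   # 10 TiB
    getfattr -n ceph.quota.max_bytes /mnt/cephfs/some-subdir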

[ceph-users] Re: Cluster running without monitors

2022-07-28 Thread Gregory Farnum
On Thu, Jul 28, 2022 at 5:32 AM Johannes Liebl wrote: > > Hi Ceph Users, > > > I am currently evaluating different cluster layouts and as a test I stopped > two of my three monitors while client traffic was running on the nodes. > > > Only when I restarted an OSD all PGs which were related to th

[ceph-users] Re: Upgrade from Octopus to Pacific cannot get monitor to join

2022-07-27 Thread Gregory Farnum
On Wed, Jul 27, 2022 at 10:24 AM wrote: > Currently running Octopus 15.2.16, trying to upgrade to Pacific using > cephadm. > > 3 mon nodes running 15.2.16 > 2 mgr nodes running 16.2.9 > 15 OSD's running 15.2.16 > > The mon/mgr nodes are running in lxc containers on Ubuntu running docker > from th

[ceph-users] Re: octopus v15.2.17 QE Validation status

2022-07-26 Thread Gregory Farnum
t us know. -Greg > > On Tue, Jul 26, 2022 at 3:16 PM Gregory Farnum wrote: > > > > We can’t do the final release until the recent mgr/volumes security > fixes get merged in, though. > > https://github.com/ceph/ceph/pull/47236 > > > > On Tue, Jul 26, 202

[ceph-users] Re: octopus v15.2.17 QE Validation status

2022-07-26 Thread Gregory Farnum
We can’t do the final release until the recent mgr/volumes security fixes get merged in, though. https://github.com/ceph/ceph/pull/47236 On Tue, Jul 26, 2022 at 3:12 PM Ramana Krisna Venkatesh Raja < rr...@redhat.com> wrote: > On Thu, Jul 21, 2022 at 10:28 AM Yuri Weinstein > wrote: > > > > Deta

[ceph-users] Re: LibCephFS Python Mount Failure

2022-07-26 Thread Gregory Farnum
It looks like you’re setting environment variables that force your new keyring, but you aren’t telling the library to use your new CephX user. So it opens your new keyring and looks for the default (client.admin) user and doesn’t get anything. -Greg On Tue, Jul 26, 2022 at 7:54 AM Adam Carrgilson

[ceph-users] Re: ceph-fs crashes on getfattr

2022-07-12 Thread Gregory Farnum
roken more or less all the way around. > > > > Best regards, > > = > > Frank Schilder > > AIT Risø Campus > > Bygning 109, rum S14 > > > > > > From: Gregory Farnum > > Sent: 11 July

[ceph-users] Re: ceph-fs crashes on getfattr

2022-07-11 Thread Gregory Farnum
On Mon, Jul 11, 2022 at 8:26 AM Frank Schilder wrote: > > Hi all, > > we made a very weird observation on our ceph test cluster today. A simple > getfattr with a misspelled attribute name sends the MDS cluster into a > crash+restart loop. Something as simple as > > getfattr -n ceph.dir.layout.

[ceph-users] Re: cephfs client permission restrictions?

2022-06-23 Thread Gregory Farnum
On Thu, Jun 23, 2022 at 8:18 AM Wyll Ingersoll wrote: > > Is it possible to craft a cephfs client authorization key that will allow the > client read/write access to a path within the FS, but NOT allow the client to > modify the permissions of that path? > For example, allow RW access to /cephfs

[ceph-users] Re: Possible to recover deleted files from CephFS?

2022-06-14 Thread Gregory Farnum
On Tue, Jun 14, 2022 at 8:50 AM Michael Sherman wrote: > > Hi, > > We discovered that a number of files were deleted from our cephfs filesystem, > and haven’t been able to find current backups or snapshots. > > Is it possible to “undelete” a file by modifying metadata? Using > `cephfs-journal-to

[ceph-users] Re: Feedback/questions regarding cephfs-mirror

2022-06-10 Thread Gregory Farnum
On Wed, Jun 8, 2022 at 12:36 AM Andreas Teuchert wrote: > > > Hello, > > we're currently evaluating cephfs-mirror. > > We have two data centers with one Ceph cluster in each DC. For now, the > Ceph clusters are only used for CephFS. On each cluster we have one FS > that contains a directory for cu

[ceph-users] Re: Ceph on RHEL 9

2022-06-10 Thread Gregory Farnum
We aren't building for Centos 9 yet, so I guess the python dependency declarations don't work with the versions in that release. I've put updating to 9 on the agenda for the next CLT. (Do note that we don't test upstream packages against RHEL, so if Centos Stream does something which doesn't match

[ceph-users] Re: Stretch cluster questions

2022-05-16 Thread Gregory Farnum
I'm not quite clear where the confusion is coming from here, but there are some misunderstandings. Let me go over it a bit: On Tue, May 10, 2022 at 1:29 AM Frank Schilder wrote: > > > What you are missing from stretch mode is that your CRUSH rule wouldn't > > guarantee at least one copy in surviv

[ceph-users] Re: repairing damaged cephfs_metadata pool

2022-05-16 Thread Gregory Farnum
On Tue, May 10, 2022 at 2:47 PM Horvath, Dustin Marshall wrote: > > Hi there, newcomer here. > > I've been trying to figure out if it's possible to repair or recover cephfs > after some unfortunate issues a couple of months ago; these couple of nodes > have been offline most of the time since th

[ceph-users] Re: Incomplete file write/read from Ceph FS

2022-05-06 Thread Gregory Farnum
Do you have any locking which guarantees that nodes don't copy files which are still in the process of being written? CephFS will guarantee any readers see the results of writes which are already reported complete while reading, but I don't see any guarantees about atomicity in https://docs.microso

[ceph-users] Re: [progress WARNING root] complete: ev ... does not exist, oh my!

2022-05-06 Thread Gregory Farnum
On Fri, May 6, 2022 at 5:58 AM Harry G. Coin wrote: > > I tried searching for the meaning of a ceph Quincy all caps WARNING > message, and failed. So I need help. Ceph tells me my cluster is > 'healthy', yet emits a bunch of 'progress WARNING root] comlete ev' ... > messages. Which I score rig

[ceph-users] Re: Stretch cluster questions

2022-05-06 Thread Gregory Farnum
On Fri, May 6, 2022 at 3:21 AM Eneko Lacunza wrote: > Hi, > > Just made some basic tests, feature works nicely as far as I have tested :) > > I created 2 aditional pools each with a matching stretch rule: > - size=2/min=1 (not advised I know) > - size=6/min=3 (some kind of paranoid config) > > Wh

[ceph-users] Re: Stretch cluster questions

2022-05-04 Thread Gregory Farnum
On Wed, May 4, 2022 at 1:25 AM Eneko Lacunza wrote: > Hi Gregory, > > El 3/5/22 a las 22:30, Gregory Farnum escribió: > > On Mon, Apr 25, 2022 at 12:57 AM Eneko Lacunza > wrote: > > We're looking to deploy a stretch cluster for a 2-CPD deployment. I have >

[ceph-users] Re: [CephFS, MDS] internal MDS internal heartbeat is not healthy!

2022-05-03 Thread Gregory Farnum
Okay, so you started out with 2 active MDSes and then they failed on a restart? And in an effort to fix it you changed max_mds to 3? (That was a bad idea, but *probably* didn't actually hurt anything this time — adding new work to scale out a system which already can't turn on just overloads it mor

[ceph-users] Re: Stretch cluster questions

2022-05-03 Thread Gregory Farnum
On Mon, Apr 25, 2022 at 12:57 AM Eneko Lacunza wrote: > > Hi all, > > We're looking to deploy a stretch cluster for a 2-CPD deployment. I have > read the following docs: > https://docs.ceph.com/en/latest/rados/operations/stretch-mode/#stretch-clusters > > I have some questions: > > - Can we have m

[ceph-users] Re: Stop Rebalancing

2022-04-13 Thread Gregory Farnum
On Wed, Apr 13, 2022 at 10:01 AM Dan van der Ster wrote: > > I would set the pg_num, not pgp_num. In older versions of ceph you could > manipulate these things separately, but in pacific I'm not confident about > what setting pgp_num directly will do in this exact scenario. > > To understand, the
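The knob being discussed, sketched with a placeholder pool name and target:

    ceph osd pool set mypool pg_num 256
    ceph osd pool get mypool pgp_num    # the mons/mgr walk this along gradually behind pg_num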

[ceph-users] Re: Cephfs default data pool (inode backtrace) no longer a thing?

2022-03-21 Thread Gregory Farnum
The backtraces are written out asynchronously by the MDS to those objects, so there can be a delay between file creation and when they appear. In fact I think backtraces only get written when the inode in question is falling out of the MDS journal, so if you have a relatively small number of flies

[ceph-users] Re: Not able to start MDS after upgrade to 16.2.7

2022-02-11 Thread Gregory Farnum
On Fri, Feb 11, 2022 at 10:53 AM Izzy Kulbe wrote: > Hi, > > If the MDS host has enough spare memory, setting > > `mds_cache_memory_limit`[*] to 9GB (or more if it permits) would get > > rid of this warning. Could you check if that improves the situation? > > Normally, the MDS starts trimming its
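With the centralized config, that would be along these lines (9 GiB expressed in bytes):

    ceph config set mds mds_cache_memory_limit 9663676416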

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-02-10 Thread Gregory Farnum
“Up” is the set of OSDs which are alive from the calculated crush mapping. “Acting” includes those extras which have been added in to bring the PG up to proper size. So the PG does have 3 live OSDs serving it. But perhaps the safety check *is* looking at up instead of acting? That seems like a pla
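Both sets can be checked per PG, e.g. with a placeholder PG id:

    ceph pg map 2.1a     # prints "... up [a,b,c] acting [d,e,f]"
    ceph pg 2.1a query   # full detail on the PG's state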

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-02-10 Thread Gregory Farnum
I don’t know how to get better errors out of cephadm, but the only way I can think of for this to happen is if your crush rule is somehow placing multiple replicas of a pg on a single host that cephadm wants to upgrade. So check your rules, your pool sizes, and osd tree? -Greg On Thu, Feb 10, 2022

[ceph-users] Re: cephfs: [ERR] loaded dup inode

2022-02-08 Thread Gregory Farnum
On Tue, Feb 8, 2022 at 7:30 AM Dan van der Ster wrote: > > On Tue, Feb 8, 2022 at 1:04 PM Frank Schilder wrote: > > The reason for this seemingly strange behaviour was an old static snapshot > > taken in an entirely different directory. Apparently, ceph fs snapshots are > > not local to an FS d

[ceph-users] Re: Ceph Performance very bad even in Memory?!

2022-01-31 Thread Gregory Farnum
There's a lot going on here. Some things I noticed you should be aware of in relation to the tests you performed: * Ceph may not have the performance ceiling you're looking for. A write IO takes about half a millisecond of CPU time, which used to be very fast and is now pretty slow compared to an

[ceph-users] Re: Ideas for Powersaving on archive Cluster ?

2022-01-21 Thread Gregory Farnum
I would not recommend this on Ceph. There was a project where somebody tried to make RADOS amenable to spinning down drives, but I don't think it ever amounted to anything. The issue is just that the OSDs need to do disk writes whenever they get new OSDMaps, there's a lot of random stuff that upda

[ceph-users] Re: v16.2.7 Pacific released

2022-01-11 Thread Gregory Farnum
On Tue, Jan 11, 2022 at 5:29 AM Dan van der Ster wrote: > > Hi, > > Yes it's confusing -- the release notes are normally only published in > master, which is shown as "latest", and are rarely backported to a > release branch. > The notes you're looking for are here: > https://docs.ceph.com/en/late

[ceph-users] Re: OSD write op out of order

2021-12-27 Thread Gregory Farnum
On Mon, Dec 27, 2021 at 9:12 AM gyfelectric wrote: > > Hi all, > > Recently, the problem of OSD disorder has often appeared in my > environment(14.2.5) and my Fuse Client broken > due to "FAILED assert(ob->last_commit_tid < tid)”. My application can’t > work normally now. > > The time series that

[ceph-users] Re: ceph-mon pacific doesn't enter to quorum of nautilus cluster

2021-12-15 Thread Gregory Farnum
Hmm that ticket came from the slightly unusual scenario where you were deploying a *new* Pacific monitor against an Octopus cluster. Michael, is your cluster deployed with cephadm? And is this a new or previously-existing monitor? On Wed, Dec 15, 2021 at 12:09 AM Michael Uleysky wrote: > > Thank

[ceph-users] Re: Ceph container image repos

2021-12-14 Thread Gregory Farnum
I generated a quick doc PR so this doesn't trip over other users: https://github.com/ceph/ceph/pull/44310. Thanks all! -Greg On Mon, Dec 13, 2021 at 10:59 AM John Petrini wrote: > > "As of August 2021, new container images are pushed to quay.io > registry only. Docker hub won't receive new conten
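In other words, pull from (or point your deployment at) quay.io from now on, e.g. with an example tag:

    podman pull quay.io/ceph/ceph:v16.2.7    # docker works the same way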

[ceph-users] Re: Support for alternative RHEL derivatives

2021-12-13 Thread Gregory Farnum
On Mon, Dec 13, 2021 at 7:02 AM Benoit Knecht wrote: > > Hi, > > As we're getting closer to CentOS 8 EOL, I'm sure plenty of Ceph users are > looking to migrate from CentOS 8 to CentOS Stream 8 or one of the new RHEL > derivatives, e.g. Rocky and Alma. > > The question of upstream support has alre

[ceph-users] Re: MDS stuck in stopping state

2021-12-13 Thread Gregory Farnum
and best regards, > = > Frank Schilder > AIT Risø Campus > Bygning 109, rum S14 > > > From: Gregory Farnum > Sent: 13 December 2021 17:39:55 > To: Frank Schilder > Cc: ceph-users > Subject: Re: [ceph-users] M

[ceph-users] Re: MDS stuck in stopping state

2021-12-13 Thread Gregory Farnum
This looks awkward — just from the ops, it seems mds.1 is trying to move some stray items (presumably snapshots of since-deleted files, from what you said?) into mds0's stray directory, and then mds.0 tries to get auth pins from mds.1 but that fails for some reason which isn't apparent from the dum

[ceph-users] Re: CephFS Metadata Pool bandwidth usage

2021-12-13 Thread Gregory Farnum
; "statfs_ops": [], > "command_ops": [] > } > > Any suggestions would be much appreciated. > > Kind regards, > > András > > > On Thu, Dec 9, 2021 at 7:48 PM Andras Sali wrote: >> >> Hi Greg, >> >> Much appreciated for

[ceph-users] Re: CephFS Metadata Pool bandwidth usage

2021-12-09 Thread Gregory Farnum
Andras, Unfortunately your attachment didn't come through the list. (It might work if you embed it inline? Not sure.) I don't know if anybody's looked too hard at this before, and without the image I don't know exactly what metric you're using to say something's 320KB in size. Can you explain more

[ceph-users] Re: Recursive delete hangs on cephfs

2021-11-22 Thread Gregory Farnum
with kernel >= 5.8. I also have kernel 5.3 on one of the client clusters and > nowsync there is not supported, however all rm operations happen reasonably > fast. So the second question is, does 5.3's libceph behave differently on > recursing rm compared to 5.4 or 5.8? >

[ceph-users] Re: Recursive delete hangs on cephfs

2021-11-17 Thread Gregory Farnum
On Sat, Nov 13, 2021 at 5:25 PM Sasha Litvak wrote: > > I continued looking into the issue and have no idea what hinders the > performance yet. However: > > 1. A client operating with kernel 5.3.0-42 (ubuntu 18.04) has no such > problems. I delete a directory with hashed subdirs (00 - ff) and tot
