Hi folks, the CLT met on Monday August 5. We discussed a few topics:
* The mailing lists are a problem to moderate right now with a huge
increase in spam. We have two problems: 1) the moderation system's web
front-end apparently isn't operational. That's getting fixed. 2) The
moderation is a big l
You currently have "Email notifications" set to "For any event on all
my projects". I believe that's the firehose setting, so I've gone
ahead and changed it to "Only for things I watch or I'm involved in".
I'm unaware of any reason that would have been changed on the back
end, though there were som
There's a major NVMe effort underway but it's not even merged to
master yet, so I'm not sure how docs would have ended up in the Reef
doc tree. :/ Zac, any idea? Can we pull this out?
-Greg
On Thu, May 30, 2024 at 7:03 AM Robert Sander
wrote:
>
> Hi,
>
> On 5/30/24 14:18, Frédéric Nass wrote:
>
> ---
> Olli Rajala - Lead TD
> Anima Vitae Ltd.
> www.anima.fi
> ---
>
>
> On Tue, May 14, 2024 at 9:41 AM Gregory Farnum wrote:
>
> > The cephfs-data-scan tools are built with the expectation that they'll
> > be run off
The cephfs-data-scan tools are built with the expectation that they'll
be run offline. Some portion of them could be run without damaging the
live filesystem (NOT all, and I'd have to dig in to check which is
which), but they will detect inconsistencies that don't really exist
(due to updates that
On Thu, Apr 4, 2024 at 8:23 AM Anthony D'Atri wrote:
>
> Network RTT?
No, it's sadly not that clever. There's a crush_location configurable
that you can set on clients (to a host, or a datacenter, or any other
CRUSH bucket), and as long as part of it matches the CRUSH map then it
will feed IOs to
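For illustration, a rough sketch of what that client-side setting can look like in ceph.conf (the bucket names are placeholders, and the localized-read policy shown is the RBD one; check the docs for your release):

```ini
[client]
    # Declare where this client sits in the CRUSH hierarchy:
    crush_location = datacenter=dc1 host=client-host-1
    # For RBD, allow reads to be served from the nearest replica:
    rbd_read_from_replica_policy = localize
```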
I put it on the list for the next CLT. :) (though I imagine it will move to
the infrastructure meeting from there.)
On Fri, Mar 22, 2024 at 4:42 PM Mark Nelson wrote:
> Sure! I think Wido just did it all unofficially, but afaik we've lost
> all of those records now. I don't know if Wido still
On Fri, Mar 15, 2024 at 6:15 AM Ivan Clayson wrote:
> Hello everyone,
>
> We've been experiencing on our quincy CephFS clusters (one 17.2.6 and
> another 17.2.7) repeated slow ops with our client kernel mounts
> (Ceph 17.2.7 and version 4 Linux kernels on all clients) that seem to
> originate fro
We had a lab outage Thursday and it looks like this service wasn’t
restarted after that occurred. Fixed now and we’ll look at how to prevent
that in future.
-Greg
On Mon, Mar 11, 2024 at 6:46 AM Konstantin Shalygin wrote:
> Hi, seems telemetry endpoint is down for a some days? We have connection
Much of our infrastructure (including website) was down for ~6 hours
yesterday. Some information on the sepia list, and more in the
slack/irc channel.
-Greg
On Fri, Mar 8, 2024 at 9:48 AM Zac Dover wrote:
>
> I ping www.ceph.io and ceph.io with no difficulty:
>
>
> zdover@NUC8i7BEH:~$ ping www.ce
On Thu, Mar 7, 2024 at 9:09 AM Stefan Kooman wrote:
>
> Hi,
>
> TL;DR
>
> Failure domain considered is data center. Cluster in stretch mode [1].
>
> - What is the minimum amount of monitor nodes (apart from tie breaker)
> needed per failure domain?
You need at least two monitors per site. This is
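For reference, a sketch of how monitor locations and stretch mode are typically assigned (monitor and site names here are placeholders):

```shell
# Give each monitor a CRUSH location:
ceph mon set_location mon1 datacenter=site1
ceph mon set_location mon2 datacenter=site1
ceph mon set_location mon3 datacenter=site2
ceph mon set_location mon4 datacenter=site2
ceph mon set_location tiebreaker datacenter=site3
# Enable stretch mode: args are the tiebreaker mon, the stretch
# CRUSH rule name, and the dividing bucket type:
ceph mon enable_stretch_mode tiebreaker stretch_rule datacenter
```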
The slack workspace is bridged to our also-published irc channels. I
don't think we've done anything to enable xmpp (and two protocols is
enough work to keep alive!).
-Greg
On Wed, Mar 6, 2024 at 9:07 AM Marc wrote:
>
> Is it possible to access this also with xmpp?
>
> >
> > At the very bottom of
On Wed, Mar 6, 2024 at 8:56 AM Matthew Vernon wrote:
>
> Hi,
>
> On 06/03/2024 16:49, Gregory Farnum wrote:
> > Has the link on the website broken? https://ceph.com/en/community/connect/
> > We've had trouble keeping it alive in the past (getting a non-expiring
> &
Has the link on the website broken? https://ceph.com/en/community/connect/
We've had trouble keeping it alive in the past (getting a non-expiring
invite), but I thought that was finally sorted out.
-Greg
On Wed, Mar 6, 2024 at 8:46 AM Matthew Vernon wrote:
>
> Hi,
>
> How does one get an invite t
The docs recommend a fast SSD pool for the CephFS *metadata*, but the
default data pool can be more flexible. The backtraces are relatively
small — it's an encoded version of the path an inode is located at,
plus the RADOS hobject, which is probably more of the space usage. So
it should fit fine in
There are versioning and dependency issues (both of packages, and compiler
toolchain pieces) which mean that the existing reef releases do not build
on Debian. Our upstream support for Debian has always been inconsistent
because we don’t have anybody dedicated or involved enough in both Debian
and
We have seen issues like this a few times and they have all been kernel
client bugs with CephFS’ internal “capability” file locking protocol. I’m
not aware of any extant bugs like this in our code base, but kernel patches
can take a long and winding path before they end up on deployed systems.
Mos
Unfortunately, there’s not any such ability. We are starting long-term work
on making this smoother, but CephFS snapshots are read-only and there’s no
good way to do a constant-time or low-time “clone” operation, so you just
have to copy the data somewhere and start work on it from that position :/
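In practice that usually means copying out of the snapshot's hidden .snap directory; a sketch with made-up paths and snapshot name:

```shell
# Snapshots are exposed read-only under .snap; copy out to get a
# writable tree you can work on:
cp -a /mnt/cephfs/mydir/.snap/mysnap/. /mnt/cephfs/mydir-working/
```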
We discussed this in the CLT today and Casey can talk more about the impact
and technical state of affairs.
This was disclosed on the security list and it’s rated as a bug that did
not require hotfix releases due to the limited escalation scope.
-Greg
On Wed, Sep 27, 2023 at 1:37 AM Christian Roh
Hi everybody,
The CLT met today as usual. We only had a few topics under discussion:
* the User + Dev relaunch went off well! We’d like reliable recordings and
have found Jitsi to be somewhat glitchy; Laura will communicate about
workarounds for that while we work on a longer-term solution (self-ho
I don't think we've seen this reported before. SELinux gets a hefty
workout from Red Hat with their downstream ODF for OpenShift
(Kubernetes), so it certainly works at a basic level.
SELinux is a fussy beast though, so if you're eg mounting CephFS
across RHEL nodes and invoking SELinux against it,
Moving files around within the namespace never changes the way the file
data is represented within RADOS. It’s just twiddling metadata bits. :)
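A quick way to see why: CephFS data objects are named by inode number and stripe index, never by path, so a rename cannot touch them. A small sketch of that naming convention (assuming the standard `<inode-hex>.<8-hex-digit stripe index>` scheme):

```python
def rados_object_name(inode: int, stripe_index: int = 0) -> str:
    """CephFS data objects are named <inode in hex>.<stripe index as 8 hex
    digits>; the name depends only on the inode, not on the file's path."""
    return f"{inode:x}.{stripe_index:08x}"

# Inode 0x10000000000 keeps the same object names regardless of renames:
print(rados_object_name(0x10000000000))     # 10000000000.00000000
print(rados_object_name(0x10000000000, 2))  # 10000000000.00000002
```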
-Greg
On Thu, Jul 6, 2023 at 3:26 PM Dan van der Ster
wrote:
> Hi Mathias,
>
> Provided that both subdirs are within the same snap context (subdirs belo
I haven’t checked the logs, but the most obvious way this happens is if the
mtime set on the directory is in the future compared to the time on the
client or server making changes — CephFS does not move times backwards.
(This causes some problems but prevents many, many others when times are
not sy
On Tue, May 23, 2023 at 11:52 PM Dietmar Rieder
wrote:
>
> On 5/23/23 15:58, Gregory Farnum wrote:
> > On Tue, May 23, 2023 at 3:28 AM Dietmar Rieder
> > wrote:
> >>
> >> Hi,
> >>
> >> can the cephfs "max_file_size" setting be chan
On Tue, May 23, 2023 at 1:55 PM Justin Li wrote:
>
> Dear All,
>
> After a unsuccessful upgrade to pacific, MDS were offline and could not get
> back on. Checked the MDS log and found below. See cluster info from below as
> well. Appreciate it if anyone can point me to the right direction. Thank
On Tue, May 23, 2023 at 3:28 AM Dietmar Rieder
wrote:
>
> Hi,
>
> can the cephfs "max_file_size" setting be changed at any point in the
> lifetime of a cephfs?
> Or is it critical for existing data if it is changed after some time? Is
> there anything to consider when changing, let's say, from 1TB
On Fri, May 12, 2023 at 5:28 AM Frank Schilder wrote:
>
> Dear Xiubo and others.
>
> >> I have never heard about that option until now. How do I check that and
> >> how to I disable it if necessary?
> >> I'm in meetings pretty much all day and will try to send some more info
> >> later.
> >
> >
to head home
> now ...
>
> Thanks and best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
>
> From: Gregory Farnum
> Sent: Wednesday, May 10, 2023 4:26 PM
> To: Frank Schilder
>
This is a very strange assert to be hitting. From a code skim my best
guess is the inode somehow has an xattr with no value, but that's just
a guess and I've no idea how it would happen.
Somebody recently pointed you at the (more complicated) way of
identifying an inode path by looking at its RADOS
On Tue, May 2, 2023 at 7:54 AM Igor Fedotov wrote:
>
>
> On 5/2/2023 11:32 AM, Nikola Ciprich wrote:
> > I've updated cluster to 17.2.6 some time ago, but the problem persists.
> > This is
> > especially annoying in connection with https://tracker.ceph.com/issues/56896
> > as restarting OSDs is q
On Fri, Apr 21, 2023 at 7:26 AM Kilian Ries wrote:
>
> Still didn't find out what will happen when the pool is full - but tried a
> little bit in our testing environment and i were not able to get the pool
> full before an OSD got full. So in first place one OSD reached the full ratio
> (pool n
Looks like you've somehow managed to enable the upmap balancer while
allowing a client that's too old to understand it to mount.
Radek, this is a commit from yesterday; is it a known issue?
On Wed, Apr 26, 2023 at 7:49 AM Nguetchouang Ngongang Kevin
wrote:
>
> Good morning, i found a bug on cep
On Fri, Mar 17, 2023 at 1:56 AM Ashu Pachauri wrote:
>
> Hi Xiubo,
>
> As you have correctly pointed out, I was talking about the stipe_unit
> setting in the file layout configuration. Here is the documentation for
> that for anyone else's reference:
> https://docs.ceph.com/en/quincy/cephfs/file-l
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
>
> From: Gregory Farnum
> Sent: Wednesday, March 22, 2023 4:14 PM
> To: Frank Schilder
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: ln: failed to c
Do you have logs of what the nfs server is doing?
Managed to reproduce it in terms of direct CephFS ops?
On Wed, Mar 22, 2023 at 8:05 AM Frank Schilder wrote:
> I have to correct myself. It also fails on an export with "sync" mode.
> Here is an strace on the client (strace ln envs/satwindspy/in
A "backtrace" is an xattr on the RADOS object storing data for a given
file, and it contains the file's (versioned) path from the root. So a
bad backtrace means there's something wrong with that — possibly just
that there's a bug in the version of the code that's checking it,
because they're genera
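For the curious, a backtrace can be inspected directly; a sketch with placeholder pool and inode (the first data object of a file is named `<inode-hex>.00000000`):

```shell
# Fetch the backtrace xattr from the file's first data object:
rados -p cephfs_data getxattr 10000000000.00000000 parent > parent.bin
# Decode it into readable JSON:
ceph-dencoder type inode_backtrace_t import parent.bin decode dump_json
```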
On Mon, Feb 13, 2023 at 4:16 AM Sake Paulusma wrote:
>
> Hello,
>
> I configured a stretched cluster on two datacenters. It's working fine,
> except this weekend the Raw Capicity exceeded 50% and the error
> POOL_TARGET_SIZE_BYTES_OVERCOMMITED showed up.
>
> The command "ceph df" is showing the
Also, that the current leader (ceph-01) is one of the monitors
proposing an election each time suggests the problem is with getting
commit acks back from one of its followers.
On Thu, Feb 9, 2023 at 8:09 AM Dan van der Ster wrote:
>
> Hi Frank,
>
> Check the mon logs with some increased debug lev
On Mon, Dec 12, 2022 at 12:10 PM Sascha Lucas wrote:
> Hi Dhairya,
>
> On Mon, 12 Dec 2022, Dhairya Parmar wrote:
>
> > You might want to look at [1] for this, also I found a relevant thread
> [2]
> > that could be helpful.
> >
>
> Thanks a lot. I already found [1,2], too. But I did not considere
Manuel Holtgrewe
> Sent: Thursday, December 8, 2022 12:38 PM
> To: Charles Hedrick
> Cc: Gregory Farnum ; Dhairya Parmar ;
> ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: what happens if a server crashes with cephfs?
>
> Hi Charles,
>
> are you concerned with a singl
>> thanks. I'm evaluating cephfs for a computer science dept. We have users
>> that run week-long AI training jobs. They use standard packages, which they
>> probably don't want to modify. At the moment we use NFS. It uses synchronous
>> I/O, so if somethings goes wr
More generally, as Manuel noted you can (and should!) make use of fsync et
al for data safety. Ceph’s async operations are not any different at the
application layer from how data you send to the hard drive can sit around
in volatile caches until a consistency point like fsync is invoked.
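As a minimal sketch of that pattern in application code:

```python
import os

def durable_write(path: str, data: bytes) -> None:
    """Write data and don't return until it has reached stable storage."""
    with open(path, "wb") as f:
        f.write(data)         # may sit in a user-space buffer
        f.flush()             # push it to the OS / filesystem client
        os.fsync(f.fileno())  # block until it is on stable storage

durable_write("/tmp/checkpoint.dat", b"model weights")
```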
-Greg
On
On Tue, Nov 29, 2022 at 1:18 PM Joshua Timmer
wrote:
> I've got a cluster in a precarious state because several nodes have run
> out of memory due to extremely large pg logs on the osds. I came across
> the pglog_hardlimit flag which sounds like the solution to the issue,
> but I'm concerned that
In addition to not having resiliency by default, my recollection is
that BeeGFS also doesn't guarantee metadata durability in the event of
a crash or hardware failure like CephFS does. There's not really a way
for us to catch up to their "in-memory metadata IOPS" with our
"on-disk metadata IOPS". :
On Fri, Oct 28, 2022 at 8:51 AM Laura Flores wrote:
>
> Hi Christian,
>
> There also is https://tracker.ceph.com/versions/656 which seems to be
> > tracking
> > the open issues tagged for this particular point release.
> >
>
> Yes, thank you for providing the link.
>
> If you don't mind me asking
On Fri, Oct 7, 2022 at 7:53 AM Sven Barczyk wrote:
>
> Hello,
>
>
>
> we are encountering a strange behavior on our Ceph. (All Ubuntu 20 / All
> mons Quincy 17.2.4 / Oldest OSD Quincy 17.2.0 )
> Administrative commands like rbd ls or create are so slow, that libvirtd is
> running into timeouts and
On Mon, Oct 17, 2022 at 4:40 AM Enrico Bocchi wrote:
>
> Hi,
>
> I have played with stretch clusters a bit but never managed to
> un-stretch them fully.
>
> From my experience (using Pacific 16.2.9), once the stretch mode is
> enabled, the replicated pools switch to the stretch_rule with size 4,
On Wed, Sep 28, 2022 at 9:15 AM Adam King wrote:
> Budget Discussion
>
>- Going to investigate current resources being used, see if any costs
>can be cut
>- What can be moved from virtual environments to internal ones?
>- Need to take inventory of what resources we currently have
Recovery from OSDs loses the mds and rgw keys they use to authenticate with
cephx. You need to get those set up again by using the auth commands. I
don’t have them handy but it is discussed in the mailing list archives.
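As a rough sketch of the shape of those commands (daemon names are placeholders, and the exact capability profiles vary by release, so verify against the docs):

```shell
# Recreate an MDS key:
ceph auth get-or-create mds.a mon 'profile mds' mgr 'profile mds' \
    mds 'allow *' osd 'allow *'
# RGW daemons typically use a client.rgw.* key:
ceph auth get-or-create client.rgw.gateway1 mon 'allow rw' osd 'allow rwx'
```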
-Greg
On Thu, Sep 15, 2022 at 3:28 PM Jorge Garcia wrote:
> Yes, I tried res
> Is there a way to find out how many osdmaps are currently being kept?
> ____
> From: Gregory Farnum
> Sent: Wednesday, September 7, 2022 10:58 AM
> To: Wyll Ingersoll
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] data usage growing desp
tore-tool to migrate PGs to their proper destinations
(letting the cluster clean up the excess copies if you can afford to —
deleting things is always scary).
But I haven't had to help recover a death-looping cluster in around a
decade, so that's about all the options I can offer up.
-Gre
On Tue, Aug 16, 2022 at 3:14 PM Vladimir Brik
wrote:
>
> Hello
>
> I'd like to understand what is the proper/safe way to
> recover when a cephfs client becomes blocklisted by the MDS.
>
> The man page of mount.ceph talks about recover_session=clean
> option, but it has the following text I am not
On Tue, Sep 6, 2022 at 2:08 PM Wyll Ingersoll
wrote:
>
>
> Our cluster has not had any data written to it externally in several weeks,
> but yet the overall data usage has been growing.
> Is this due to heavy recovery activity? If so, what can be done (if
> anything) to reduce the data generate
We partly rolled our own with AES-GCM. See
https://docs.ceph.com/en/quincy/rados/configuration/msgr2/#connection-modes
and https://docs.ceph.com/en/quincy/dev/msgr2/#frame-format
-Greg
On Wed, Aug 24, 2022 at 4:50 PM Jinhao Hu wrote:
>
> Hi,
>
> I have a question about the MSGR protocol Ceph used
On Sun, Aug 28, 2022 at 12:19 PM Vladimir Brik
wrote:
>
> Hello
>
> Is there a way to query or get an approximate value of an
> MDS's cache hit ratio without using "dump loads" command
> (which seems to be a relatively expensive operation) for
> monitoring and such?
Unfortunately, I'm not seeing o
On Mon, Aug 29, 2022 at 12:49 AM Burkhard Linke
wrote:
>
> Hi,
>
>
> some years ago we changed our setup from a IPoIB cluster network to a
> single network setup, which is a similar operation.
>
>
> The OSD use the cluster network for heartbeats and backfilling
> operation; both use standard tcp c
On Fri, Aug 19, 2022 at 7:17 AM Patrick Donnelly wrote:
>
> On Fri, Aug 19, 2022 at 5:02 AM Jesper Lykkegaard Karlsen
> wrote:
> >
> > Hi,
> >
> > I have recently been scanning the files in a PG with "cephfs-data-scan
> > pg_files ...".
>
> Why?
>
> > Although, after a long time the scan was sti
I was wondering if it had something to do with quota enforcement. The other
possibility that occurs to me is if other clients are monitoring the
system, or an admin pane (eg the dashboard) is displaying per-volume or
per-client stats, they may be poking at the mountpoint and interrupting
exclusive
The immediate driver is both a switch to newer versions of python, and to
newer compilers supporting more C++20 features.
More generally, supporting multiple versions of a distribution is a lot of
work and when Reef comes out next year, CentOS9 will be over a year old. We
generally move new stable
ode' they are running because it's
> all in docker containers. But maybe I'm missing something obvious
>
> Thanks
>
>
>
>
> July 27, 2022 4:34 PM, "Gregory Farnum" wrote:
>
> On Wed, Jul 27, 2022 at 10:24 AM wrote:
>
> Currently running Oc
https://tracker.ceph.com/issues/56650
There's a PR in progress to resolve this issue now. (Thanks, Prashant!)
-Greg
On Thu, Jul 28, 2022 at 7:52 AM Nicolas FONTAINE wrote:
>
> Hello,
>
> We have exactly the same problem. Did you find an answer or should we
> open a bug report?
>
> Sincerely,
>
>
On Thu, Jul 28, 2022 at 1:01 AM Frank Schilder wrote:
>
> Hi all,
>
> I'm trying to set a quota on the ceph fs file system root, but it fails with
> "setfattr: /mnt/adm/cephfs: Invalid argument". I can set quotas on any
> sub-directory. Is this intentional? The documentation
> (https://docs.cep
On Thu, Jul 28, 2022 at 5:32 AM Johannes Liebl wrote:
>
> Hi Ceph Users,
>
>
> I am currently evaluating different cluster layouts and as a test I stopped
> two of my three monitors while client traffic was running on the nodes.
>
>
> Only when I restartet an OSD all PGs which were related to th
On Wed, Jul 27, 2022 at 10:24 AM wrote:
> Currently running Octopus 15.2.16, trying to upgrade to Pacific using
> cephadm.
>
> 3 mon nodes running 15.2.16
> 2 mgr nodes running 16.2.9
> 15 OSD's running 15.2.16
>
> The mon/mgr nodes are running in lxc containers on Ubuntu running docker
> from th
t us know.
-Greg
>
> On Tue, Jul 26, 2022 at 3:16 PM Gregory Farnum wrote:
> >
> > We can’t do the final release until the recent mgr/volumes security
> fixes get merged in, though.
> > https://github.com/ceph/ceph/pull/47236
> >
> > On Tue, Jul 26, 202
We can’t do the final release until the recent mgr/volumes security fixes
get merged in, though.
https://github.com/ceph/ceph/pull/47236
On Tue, Jul 26, 2022 at 3:12 PM Ramana Krisna Venkatesh Raja <
rr...@redhat.com> wrote:
> On Thu, Jul 21, 2022 at 10:28 AM Yuri Weinstein
> wrote:
> >
> > Deta
It looks like you’re setting environment variables that force your new
keyring, but you aren’t telling the library to use your new CephX user. So
it opens your new keyring and looks for the default (client.admin) user and
doesn’t get anything.
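The usual fix is to name the user explicitly alongside the keyring, e.g.:

```shell
# Tell the tools both where the keyring is AND which user's key to use:
ceph --id myuser --keyring /etc/ceph/ceph.client.myuser.keyring health
# or set both for every command in the session:
export CEPH_ARGS="--id myuser --keyring /etc/ceph/ceph.client.myuser.keyring"
rados lspools
```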
-Greg
On Tue, Jul 26, 2022 at 7:54 AM Adam Carrgilson
roken more or less all the way around.
> >
> > Best regards,
> > =
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> >
> > From: Gregory Farnum
> > Sent: 11 July
On Mon, Jul 11, 2022 at 8:26 AM Frank Schilder wrote:
>
> Hi all,
>
> we made a very weird observation on our ceph test cluster today. A simple
> getfattr with a misspelled attribute name sends the MDS cluster into a
> crash+restart loop. Something as simple as
>
> getfattr -n ceph.dir.layout.
On Thu, Jun 23, 2022 at 8:18 AM Wyll Ingersoll
wrote:
>
> Is it possible to craft a cephfs client authorization key that will allow the
> client read/write access to a path within the FS, but NOT allow the client to
> modify the permissions of that path?
> For example, allow RW access to /cephfs
On Tue, Jun 14, 2022 at 8:50 AM Michael Sherman wrote:
>
> Hi,
>
> We discovered that a number of files were deleted from our cephfs filesystem,
> and haven’t been able to find current backups or snapshots.
>
> Is it possible to “undelete” a file by modifying metadata? Using
> `cephfs-journal-to
On Wed, Jun 8, 2022 at 12:36 AM Andreas Teuchert
wrote:
>
>
> Hello,
>
> we're currently evaluating cephfs-mirror.
>
> We have two data centers with one Ceph cluster in each DC. For now, the
> Ceph clusters are only used for CephFS. On each cluster we have one FS
> that contains a directory for cu
We aren't building for Centos 9 yet, so I guess the python dependency
declarations don't work with the versions in that release.
I've put updating to 9 on the agenda for the next CLT.
(Do note that we don't test upstream packages against RHEL, so if
Centos Stream does something which doesn't match
I'm not quite clear where the confusion is coming from here, but there
are some misunderstandings. Let me go over it a bit:
On Tue, May 10, 2022 at 1:29 AM Frank Schilder wrote:
>
> > What you are missing from stretch mode is that your CRUSH rule wouldn't
> > guarantee at least one copy in surviv
On Tue, May 10, 2022 at 2:47 PM Horvath, Dustin Marshall
wrote:
>
> Hi there, newcomer here.
>
> I've been trying to figure out if it's possible to repair or recover cephfs
> after some unfortunate issues a couple of months ago; these couple of nodes
> have been offline most of the time since th
Do you have any locking which guarantees that nodes don't copy files
which are still in the process of being written?
CephFS will guarantee any readers see the results of writes which are
already reported complete while reading, but I don't see any
guarantees about atomicity in
https://docs.microso
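One way to get that coordination is advisory locking, which CephFS supports but which only helps if every writer and copier participates; a minimal sketch:

```python
import fcntl

def copy_if_quiescent(src_path: str, dst_path: str) -> None:
    """Copy src to dst under a shared advisory lock. A writer holding an
    exclusive flock on src blocks the copy until it finishes -- but only
    if the writer also takes the lock; flock is purely advisory."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        fcntl.flock(src.fileno(), fcntl.LOCK_SH)  # waits out any LOCK_EX holder
        try:
            dst.write(src.read())
        finally:
            fcntl.flock(src.fileno(), fcntl.LOCK_UN)
```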
On Fri, May 6, 2022 at 5:58 AM Harry G. Coin wrote:
>
> I tried searching for the meaning of a ceph Quincy all caps WARNING
> message, and failed. So I need help. Ceph tells me my cluster is
> 'healthy', yet emits a bunch of 'progress WARNING root] comlete ev' ...
> messages. Which I score rig
On Fri, May 6, 2022 at 3:21 AM Eneko Lacunza wrote:
> Hi,
>
> Just made some basic tests, feature works nicely as far as I have tested :)
>
> I created 2 aditional pools each with a matching stretch rule:
> - size=2/min=1 (not advised I know)
> - size=6/min=3 (some kind of paranoid config)
>
> Wh
On Wed, May 4, 2022 at 1:25 AM Eneko Lacunza wrote:
> Hi Gregory,
>
> El 3/5/22 a las 22:30, Gregory Farnum escribió:
>
> On Mon, Apr 25, 2022 at 12:57 AM Eneko Lacunza
> wrote:
>
> We're looking to deploy a stretch cluster for a 2-CPD deployment. I have
>
Okay, so you started out with 2 active MDSes and then they failed on a restart?
And in an effort to fix it you changed max_mds to 3? (That was a bad
idea, but *probably* didn't actually hurt anything this time — adding
new work to scale out a system which already can't turn on just
overloads it mor
On Mon, Apr 25, 2022 at 12:57 AM Eneko Lacunza wrote:
>
> Hi all,
>
> We're looking to deploy a stretch cluster for a 2-CPD deployment. I have
> read the following docs:
> https://docs.ceph.com/en/latest/rados/operations/stretch-mode/#stretch-clusters
>
> I have some questions:
>
> - Can we have m
On Wed, Apr 13, 2022 at 10:01 AM Dan van der Ster wrote:
>
> I would set the pg_num, not pgp_num. In older versions of ceph you could
> manipulate these things separately, but in pacific I'm not confident about
> what setting pgp_num directly will do in this exact scenario.
>
> To understand, the
The backtraces are written out asynchronously by the MDS to those
objects, so there can be a delay between file creation and when they
appear. In fact I think backtraces only get written when the inode in
question is falling out of the MDS journal, so if you have a
relatively small number of flies
On Fri, Feb 11, 2022 at 10:53 AM Izzy Kulbe wrote:
> Hi,
>
> If the MDS host has enough spare memory, setting
> > `mds_cache_memory_limit`[*] to 9GB (or more if it permits) would get
> > rid of this warning. Could you check if that improves the situation?
> > Normally, the MDS starts trimming its
“Up” is the set of OSDs which are alive from the calculated crush mapping.
“Acting” includes those extras which have been added in to bring the PG up
to proper size. So the PG does have 3 live OSDs serving it.
But perhaps the safety check *is* looking at up instead of acting? That
seems like a pla
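You can compare the two sets for any placement group directly (the pgid here is a placeholder):

```shell
# Prints both the "up" set (CRUSH-computed) and the "acting" set
# (the OSDs actually serving the PG right now):
ceph pg map 1.2f
```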
I don’t know how to get better errors out of cephadm, but the only way I
can think of for this to happen is if your crush rule is somehow placing
multiple replicas of a pg on a single host that cephadm wants to upgrade.
So check your rules, your pool sizes, and osd tree?
-Greg
On Thu, Feb 10, 2022
On Tue, Feb 8, 2022 at 7:30 AM Dan van der Ster wrote:
>
> On Tue, Feb 8, 2022 at 1:04 PM Frank Schilder wrote:
> > The reason for this seemingly strange behaviour was an old static snapshot
> > taken in an entirely different directory. Apparently, ceph fs snapshots are
> > not local to an FS d
There's a lot going on here. Some things I noticed you should be aware
of in relation to the tests you performed:
* Ceph may not have the performance ceiling you're looking for. A
write IO takes about half a millisecond of CPU time, which used to be
very fast and is now pretty slow compared to an
I would not recommend this on Ceph. There was a project where somebody
tried to make RADOS amenable to spinning down drives, but I don't
think it ever amounted to anything.
The issue is just that the OSDs need to do disk writes whenever they
get new OSDMaps, there's a lot of random stuff that upda
On Tue, Jan 11, 2022 at 5:29 AM Dan van der Ster wrote:
>
> Hi,
>
> Yes it's confusing -- the release notes are normally only published in
> master, which is shown as "latest", and are rarely backported to a
> release branch.
> The notes you're looking for are here:
> https://docs.ceph.com/en/late
On Mon, Dec 27, 2021 at 9:12 AM gyfelectric wrote:
>
> Hi all,
>
> Recently, the problem of OSD disorder has often appeared in my
> environment(14.2.5) and my Fuse Client borken
> due to "FAILED assert(ob->last_commit_tid < tid)”. My application can’t
> work normally now.
>
> The time series that
Hmm that ticket came from the slightly unusual scenario where you were
deploying a *new* Pacific monitor against an Octopus cluster.
Michael, is your cluster deployed with cephadm? And is this a new or
previously-existing monitor?
On Wed, Dec 15, 2021 at 12:09 AM Michael Uleysky wrote:
>
> Thank
I generated a quick doc PR so this doesn't trip over other users:
https://github.com/ceph/ceph/pull/44310. Thanks all!
-Greg
On Mon, Dec 13, 2021 at 10:59 AM John Petrini wrote:
>
> "As of August 2021, new container images are pushed to quay.io
> registry only. Docker hub won't receive new conten
On Mon, Dec 13, 2021 at 7:02 AM Benoit Knecht wrote:
>
> Hi,
>
> As we're getting closer to CentOS 8 EOL, I'm sure plenty of Ceph users are
> looking to migrate from CentOS 8 to CentOS Stream 8 or one of the new RHEL
> derivatives, e.g. Rocky and Alma.
>
> The question of upstream support has alre
and best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
>
> From: Gregory Farnum
> Sent: 13 December 2021 17:39:55
> To: Frank Schilder
> Cc: ceph-users
> Subject: Re: [ceph-users] M
This looks awkward — just from the ops, it seems mds.1 is trying to
move some stray items (presumably snapshots of since-deleted files,
from what you said?) into mds0's stray directory, and then mds.0 tries
to get auth pins from mds.1 but that fails for some reason which isn't
apparent from the dum
; "statfs_ops": [],
> "command_ops": []
> }
>
> Any suggestions would be much appreciated.
>
> Kind regards,
>
> András
>
>
> On Thu, Dec 9, 2021 at 7:48 PM Andras Sali wrote:
>>
>> Hi Greg,
>>
>> Much appreciated for
Andras,
Unfortunately your attachment didn't come through the list. (It might
work if you embed it inline? Not sure.) I don't know if anybody's
looked too hard at this before, and without the image I don't know
exactly what metric you're using to say something's 320KB in size. Can
you explain more
with kernel >= 5.8. I also have kernel 5.3 on one of the client clusters and
> nowsync there is not supported, however all rm operations happen reasonably
> fast. So the second question is, does 5.3's libceph behave differently on
> recursing rm compared to 5.4 or 5.8?
>
On Sat, Nov 13, 2021 at 5:25 PM Sasha Litvak
wrote:
>
> I continued looking into the issue and have no idea what hinders the
> performance yet. However:
>
> 1. A client operating with kernel 5.3.0-42 (ubuntu 18.04) has no such
> problems. I delete a directory with hashed subdirs (00 - ff) and tot