Yeah, now that you mention it, I recall figuring that out also at some point. I
think I did it originally when I was debugging the problem without the
container.
From: Eugen Block
Sent: Friday, May 3, 2024 8:37 AM
To: Wyll Ingersoll
Cc: ceph-users@ceph.io
Thank you!
From: Eugen Block
Sent: Friday, May 3, 2024 6:46 AM
To: Wyll Ingersoll
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] cephadm custom crush location hooks
I found your (open) tracker issue:
https://tracker.ceph.com/issues/53562
Your workaround
I've found the crush location hook script code to be problematic in the
containerized/cephadm world.
Our workaround is to place the script in a common place on each OSD node, such
as /etc/crush/crushhook.sh, and then make a link from /rootfs -> /, and set the
configuration value so that the
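As an illustration of the kind of hook being described, here is a minimal sketch; the script body, the two root names, and the rotational-flag heuristic are all assumptions for illustration, not the poster's actual script:

```shell
#!/bin/sh
# Sketch of a CRUSH location hook for a map with separate SSD and HDD roots.
# ceph-osd invokes the hook roughly as:
#   crushhook.sh --cluster <name> --id <osd-id> --type osd
# and expects a CRUSH location string on stdout.

crush_location() {
    osd_id="$1"
    root="hdd"    # default to the HDD root if we cannot tell
    # Hypothetical heuristic: check the rotational flag of the OSD's
    # backing device to decide between the two roots.
    dev=$(readlink -f "/var/lib/ceph/osd/ceph-${osd_id}/block" 2>/dev/null)
    if [ -n "$dev" ]; then
        base=$(basename "$dev" | sed 's/[0-9]*$//')
        rot=$(cat "/sys/block/${base}/queue/rotational" 2>/dev/null)
        [ "$rot" = "0" ] && root="ssd"
    fi
    echo "root=${root} host=$(hostname -s)"
}

# Parse the --id argument that ceph-osd passes in.
osd_id=0
while [ $# -gt 0 ]; do
    case "$1" in
        --id) shift; osd_id="$1" ;;
    esac
    shift
done
crush_location "$osd_id"
```

In the containerized case, the /rootfs link mentioned above is what lets a path configured on the host resolve the same way from inside the container.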
We have a storage node that is failing, but the disks themselves are not. What
is the recommended procedure for replacing the host itself without destroying
the OSDs or losing data?
This cluster is running ceph 16.2.11 using ceph orchestrator with docker
containers on Ubuntu 20.04 (focal).
Ingersoll
Subject: Re: [ceph-users] ceph-mgr ssh connections left open
On Tuesday, July 18, 2023 10:56:12 AM EDT Wyll Ingersoll wrote:
Every night at midnight, our ceph-mgr daemons open up ssh connections to the
other nodes and then leave them open. Eventually they become zombies.
I cannot figure out what module is causing this or how to turn it off. If left
unchecked over days/weeks, the zombie ssh connections just keep
I have a similar issue with how the dashboard tries to access an SSL protected
RGW service. It doesn't use the correct name and doesn't allow for any way to
override the RGW name that the dashboard uses.
https://tracker.ceph.com/issues/59111
Bug #59111: dashboard should use rgw_dns_name when
d the answer, though.
From: Eugen Block
Sent: Thursday, March 16, 2023 10:30 AM
To: Wyll Ingersoll
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Ceph NFS data - cannot read files, getattr
returns NFS4ERR_PERM
You found the right keywords yourself (
, 2023 10:04 AM
To: Wyll Ingersoll
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Ceph NFS data - cannot read files, getattr
returns NFS4ERR_PERM
It sounds a bit like this [1], doesn't it? Setting the application
metadata is just:
ceph osd pool application set cephfs
cephfs
[1] https
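Spelled out fully, that command takes a pool name, application name, key, and value; the pool and filesystem names below are illustrative assumptions:

```shell
# Set the "data" key of the cephfs application metadata on the data pool.
# "cephfs_data" and the fs name "cephfs" are placeholders for your pools.
ceph osd pool application set cephfs_data cephfs data cephfs

# Verify what application metadata the pool now carries:
ceph osd pool application get cephfs_data
```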
e pools? I don't see a subcommand in
the rados utility; I'm hoping I don't have to write something to do it myself.
Though I do see there are APIs for updating it in librados, so I could write a
short C utility to make the change if necessary.
thanks!
____
From: Euge
d test again?
Regards,
Eugen
[1] https://documentation.suse.com/ses/7.1/html/ses-all/cha-ceph-cephfs.html
ceph pacific 16.2.11 (cephadm managed)
I have configured some NFS mounts from the ceph GUI from cephfs. We can mount
the filesystems and view file/directory listings, but cannot read any file data.
The permissions on the shares are RW. We mount from the client using
"vers=4.1".
Looking at
I have an orchestrated (cephadm) ceph cluster (16.2.11) with 2 radosgw services
on 2 separate hosts without HA (i.e. no ingress/haproxy in front). Both of the
rgw servers use SSL and have a properly signed certificate. We can access them
with standard S3 tools like s3cmd, cyberduck, etc.
The
evices
size). Is it possible to resize the DB devices without destroying and
recreating the OSD itself?
What are the implications of having bluestore DB devices that are far smaller
than they should be?
thanks,
Wyllys Ingersoll
________
From: Wyll Ingersoll
Sent: F
Ceph Pacific 16.2.9
We have a storage server with multiple 1.7TB SSDs dedicated to the bluestore DB
usage. The osd spec originally was misconfigured slightly and had set the
"limit" parameter on the db_devices to 5 (there are 8 SSDs available) and did
not specify a block_db_size. ceph
Hi,
can you share the output of
storage01:~ # ceph orch ls osd
Thanks,
Eugen
When adding a new OSD to a ceph orchestrated system (16.2.9) on a storage node
that has a specification profile that dictates which devices to use as the
db_devices (SSDs), the newly added OSDs seem to be ignoring the db_devices
(there are several available) and putting the data and db/wal on
Running ceph-pacific 16.2.9 using ceph orchestrator.
We made a mistake adding a disk to the cluster and immediately issued a command
to remove it using "ceph orch osd rm ### --replace --force".
This OSD had no data on it at the time and was removed after just a few
minutes. "ceph orch osd rm
But why is OMAP data usage growing at a rate 10x the amount of the actual data
being written to RGW?
From: Robert Sander
Sent: Monday, December 5, 2022 3:06 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: OMAP data growth
Am 02.12.22 um 21:09 schrieb Wyll
We have a large cluster (10PB) which is about 30% full at this point. We
recently fixed a configuration issue that then triggered the pg autoscaler to
start moving around massive amounts of data (85% misplaced objects - about 7.5B
objects). The misplaced % is dropping slowly (about 10% each
Sent: Friday, October 28, 2022 2:25 PM
To: Lee Carney
Cc: Wyll Ingersoll ; ceph-users@ceph.io
Subject: [ceph-users] Re: cephadm node-exporter extra_container_args for
textfile_collector
We had actually considered adding an `extra_daemon_args` to be the
equivalent to `extra_containe
I ran into the same issue - wanted to add the textfile.directory to the
node_exporter using "extra_container_args" - and it failed just as you
describe. It appears that those args get applied to the container command
(podman or docker) and not to the actual service in the container. Not sure
No - the recommendation is just to mount /cephfs using the kernel module and
then share it via standard VFS module from Samba. Pretty simple.
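A sketch of that setup; the monitor address, client name, mount point, and share name are all illustrative assumptions:

```shell
# On the NAS host: mount cephfs with the kernel client.
# "mon1" and the "samba" client credentials are placeholders.
mount -t ceph mon1:6789:/ /cephfs -o name=samba,secretfile=/etc/ceph/samba.secret

# Then export it with a plain Samba share (standard VFS, nothing
# ceph-specific) in /etc/samba/smb.conf:
#   [cephfs]
#       path = /cephfs
#       read only = no
#       browseable = yes
```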
From: Christophe BAILLON
Sent: Thursday, October 27, 2022 4:08 PM
To: Wyll Ingersoll
Cc: Eugen Block ; ceph-users
I don't think there is anything particularly special about exposing /cephfs (or
subdirs thereof) over SMB with SAMBA. We've done it for years over various
releases of both Ceph and Samba.
Basically, you create a NAS server host that mounts /cephfs and run Samba on
that host. You share
Looking at the device health info for the OSDs in our cluster sometimes shows
"No SMART data available". This appears to only occur for SCSI type disks in
our cluster. ATA disks have their full SMART health data displayed, but the
non-ATA disks do not.
The actual SMART data (JSON formatted) is
What network does radosgw use when it reads/writes the objects to the cluster?
We have a high-speed cluster_network and want the radosgw to write data over
that instead of the slower public_network if possible, is it configurable?
thanks!
Wyllys Ingersoll
This looks very useful. Has anyone created a grafana dashboard that will
display the collected data ?
From: Konstantin Shalygin
Sent: Friday, October 14, 2022 12:12 PM
To: John Petrini
Cc: Marc ; Paul Mezzanini ; ceph-users
Subject: [ceph-users] Re:
Yes, we restarted the primary mon and mgr services. Still no luck.
From: Dhairya Parmar
Sent: Monday, September 26, 2022 3:44 PM
To: Wyll Ingersoll
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] osds not bootstrapping: monclient: wait_auth_rotating
timed
Ceph Pacific (16.2.9) on a large cluster. Approximately 60 (out of 700) osds
fail to start and show an error:
monclient: wait_auth_rotating timed out after 300
We modified the "rotating_keys_bootstrap_timeout" from 30 to 300, but they
still fail. All nodes are time-synced with NTP and the
Understood, that was a typo on my part.
Definitely don't cancel-backfill after generating the moves from
placementoptimizer.
From: Josh Baergen
Sent: Friday, September 23, 2022 11:31 AM
To: Wyll Ingersoll
Cc: Eugen Block ; ceph-users@ceph.io
Subject: Re: [ceph
When doing manual remapping/rebalancing with tools like pgremapper and
placementoptimizer, what are the recommended settings for norebalance,
norecover, nobackfill?
Should the balancer module be disabled if we are manually issuing the pg remap
commands generated by those scripts so it doesn't
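One common pattern when driving data movement by hand (general practice, not a recommendation from this thread) is to quiesce the automatic movers before applying the generated commands:

```shell
# Stop the balancer so it does not fight the manually injected mappings
ceph balancer off

# Pause automatic data movement while the remap commands are applied
ceph osd set norebalance
ceph osd set nobackfill
ceph osd set norecover

# ... apply the commands generated by pgremapper/placementoptimizer here ...

# Then let recovery and backfill proceed
ceph osd unset norecover
ceph osd unset nobackfill
ceph osd unset norebalance
```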
tions now to see if that helps.
From: Stefan Kooman
Sent: Wednesday, September 7, 2022 11:34 AM
To: Wyll Ingersoll ; Gregory Farnum
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: data usage growing despite data being written
On 9/7/22 16:38, Wyll Ingersoll wrote:
>
ugh to use the "placementoptimizer" utility, but
the epochs are changing too fast and it won't work right now.
From: Stefan Kooman
Sent: Wednesday, September 7, 2022 11:34 AM
To: Wyll Ingersoll ; Gregory Farnum
Cc: ceph-users@ceph.io
Subject: Re: [ceph-u
being kept?
From: Gregory Farnum
Sent: Wednesday, September 7, 2022 10:58 AM
To: Wyll Ingersoll
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] data usage growing despite data being written
On Wed, Sep 7, 2022 at 7:38 AM Wyll Ingersoll
wrote:
>
> I'm s
again and stop filling up with OSDMaps and other
internal ceph data?
thanks!
From: Gregory Farnum
Sent: Wednesday, September 7, 2022 10:01 AM
To: Wyll Ingersoll
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] data usage growing despite data being written
Our cluster has not had any data written to it externally in several weeks, yet
the overall data usage has been growing.
Is this due to heavy recovery activity? If so, what can be done (if anything)
to reduce the data generated during recovery?
We've been trying to move PGs away from
We are in the middle of a massive recovery event and our monitor DBs keep
exploding to the point that they fill their disk partition (800GB disk). We
cannot compact it because there is no room on the device for compaction to
happen. We cannot add another disk at this time either. We
can bring it back online? This is a bluestore OSD.
I don't understand how this overfilling issue is not already a bug that is
getting attention; it seems very broken that an OSD can blow way past its
full_ratio.
From: Wyll Ingersoll
Sent: Monday, August 29
Thanks, we may resort to that if we can't make progress in rebalancing things.
From: Dave Schulz
Sent: Tuesday, August 30, 2022 11:18 AM
To: Wyll Ingersoll ; Josh Baergen
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: OSDs growing beyond full ratio
Hi
OSDs are bluestore on HDD with SSD for DB/WAL. We already tuned the sleep_hdd
to 0 and cranked up the max_backfills and recovery parameters to much higher
values.
From: Josh Baergen
Sent: Tuesday, August 30, 2022 9:46 AM
To: Wyll Ingersoll
Cc: Dave Schulz
nds to relocate PGs in to attempt to
balance things better and get it moving again, but progress is glacially slow.
From: Dave Schulz
Sent: Monday, August 29, 2022 10:42 PM
To: Wyll Ingersoll ; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: OSDs growing bey
00% utilization. They are reweighted to almost 0, yet they
continue to grow.
Why is this happening? I thought the cluster would stop writing to the osd
when it was above the full ratio."
thanks...
____
From: Wyll Ingersoll
Sent: Monday, August 29, 2022 9:24 AM
To: Jarett
Thank You!
I will see about trying these out, probably using your suggestion of several
iterations with #1 and then #3.
From: Stefan Kooman
Sent: Monday, August 29, 2022 1:38 AM
To: Wyll Ingersoll ; ceph-users@ceph.io
Subject: Re: [ceph-users] OSDs growing
28, 2022 8:19 PM
To: Wyll Ingersoll ; ceph-users@ceph.io
Subject: RE: [ceph-users] OSDs growing beyond full ratio
Isn’t rebalancing onto the empty OSDs default behavior?
From: Wyll Ingersoll<mailto:wyllys.ingers...@keepertech.com>
Sent: Sunday, August 28, 2022 10:31 AM
To: ceph-users@c
We have a pacific cluster that is overly filled and is having major trouble
recovering. We are desperate for help in improving recovery speed. We have
modified all of the various recovery throttling parameters.
The full_ratio is 0.95 but we have several osds that continue to grow and are
This was seen today in Pacific 16.2.9.
From: Stefan Kooman
Sent: Thursday, August 25, 2022 3:17 PM
To: Eugen Block ; Wyll Ingersoll
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: backfillfull osd - but it is only at 68% capacity
On 8/25/22 20:56
: Thursday, August 25, 2022 2:56 PM
To: Wyll Ingersoll
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] backfillfull osd - but it is only at 68% capacity
Hi,
I’ve seen this many times in older clusters, mostly Nautilus (can’t
say much about Octopus or later). Apparently the root cause hasn’t
been
My cluster (ceph pacific) is complaining about one of the OSD being
backfillfull:
[WRN] OSD_BACKFILLFULL: 1 backfillfull osd(s)
osd.31 is backfill full
backfillfull ratios:
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
ceph osd df shows:
31  hdd  5.55899  1.0
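For reference, the thresholds quoted above can be inspected and, on Luminous and later, adjusted at runtime; the value below is an example, not a recommendation:

```shell
# Show the current full / backfillfull / nearfull ratios
ceph osd dump | grep ratio

# Raise the backfillfull threshold slightly (example value only)
ceph osd set-backfillfull-ratio 0.92
```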
We have a large Pacific cluster (680 osd, ~9.6PB ) - primarily it is used as
an RGW object store. The default.rgw.meta pool is reporting strange numbers:
default.rgw.meta 4 32 16EiB 64 11MiB 100 0
Why would the "Stored" value show 16EiB (which is the maximum possible for
ceph)? These
We did this but oddly enough it is showing the movement of PGs away from the
new, underutilized OSDs instead of TO them as we would expect.
From: Wesley Dillingham
Sent: Tuesday, August 23, 2022 2:13 PM
To: Wyll Ingersoll
Cc: ceph-users@ceph.io
Subject: Re
Thank you - we have increased backfill settings, but can you elaborate on
"injecting upmaps"?
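"Injecting upmaps" generally means adding explicit pg-upmap entries that move specific PGs onto the new, empty OSDs; a sketch, where the PG and OSD ids are made up:

```shell
# Upmap requires clients to be at least Luminous-capable
ceph osd set-require-min-compat-client luminous

# Remap PG 1.2f so that the copy currently on osd.12 lands on osd.87 instead
ceph osd pg-upmap-items 1.2f 12 87
```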
From: Wesley Dillingham
Sent: Tuesday, August 23, 2022 1:44 PM
To: Wyll Ingersoll
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Full cluster, new OSDS
ery slowly) growing so recovery is
happening but very very slowly.
From: Wesley Dillingham
Sent: Tuesday, August 23, 2022 1:18 PM
To: Wyll Ingersoll
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Full cluster, new OSDS not being used
Can you please send
We have a large cluster with many osds that are at their nearfull or full
ratio limit and are thus having problems rebalancing.
We added 2 more storage nodes, each with 20 additional drives to give the
cluster room to rebalance. However, for the past few days, the new OSDs are
NOT being
[ceph pacific 16.2.9]
When creating a NFS export using "ceph nfs export apply ... -i export.json" for
a subdirectory of /cephfs, does the subdir that you wish to export need to be
pre-created or will ceph (or ganesha) create it for you?
I'm trying to create a "/shared" directory in a cephfs
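For reference, an export spec for `ceph nfs export apply` looks roughly like the sketch below; the cluster id, pseudo path, and fs name are assumptions, and in my understanding the cephfs subdirectory generally needs to exist before it can be exported:

```shell
# Hypothetical export.json; field names follow the pacific-era nfs export
# schema, but "mynfs", "/shared", and "cephfs" are made-up values.
cat > export.json <<'EOF'
{
  "export_id": 1,
  "path": "/shared",
  "cluster_id": "mynfs",
  "pseudo": "/shared",
  "access_type": "RW",
  "squash": "no_root_squash",
  "protocols": [4],
  "fsal": {
    "name": "CEPH",
    "fs_name": "cephfs"
  }
}
EOF
ceph nfs export apply mynfs -i export.json
```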
[ceph pacific 16.2.9]
I have a crush_location_hook script which is a small python3 script that
figures out the correct root/chassis/host location for a particular OSD. Our
map has 2 roots, one for an all-SSD, and another for HDDs, thus the need for
the location hook. Without it, the SSD
Running Ceph Pacific 16.2.7
We have a very large cluster with 3 monitors. One of the monitor DBs is > 2x
the size of the other 2 and is growing constantly (store.db fills up) and
eventually fills up the /var partition on that server. The monitor in question
is not the leader. The cluster
Thanks for the explanation, that's what I suspected but needed the confirmation.
From: Gregory Farnum
Sent: Thursday, June 23, 2022 11:22 AM
To: Wyll Ingersoll
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] cephfs client permission restrictions?
On Thu, Jun
Is it possible to craft a cephfs client authorization key that will allow the
client read/write access to a path within the FS, but NOT allow the client to
modify the permissions of that path?
For example, allow RW access to /cephfs/foo (path=/foo) but prevent the client
from modifying
Running "object rewrite" on a couple of the objects in the bucket seems to have
triggered the sync and now things appear ok.
From: Szabo, Istvan (Agoda)
Sent: Thursday, June 9, 2022 3:24 PM
To: Wyll Ingersoll
Cc: ceph-users@ceph.io ; d...@ceph.io
S
y has some
objects in it, what command should be used to force a sync operation based on
the new policy? It seems that only objects added AFTER the policy is applied
get replicated, pre-existing ones are not replicated.
____
From: Wyll Ingersoll
Sent: Thursday, June
688Z",
"info": {
    "source_zone": "a6ed5947-0ceb-407b-812f-347fab2ef62d",
    "error_code": 125,
    "message": "failed to sync bucket instance: (125) Operation canceled"
Seeking help from a radosgw expert...
I have a 3-zone multisite configuration (all running pacific 16.2.9) with 1
bucket per zone and a couple of small objects in each bucket for testing
purposes.
One of the secondary zones cannot seem to get into sync with the master,
sync status
the other zones. Is
this normal behavior?
From: Wyll Ingersoll
Sent: Wednesday, June 1, 2022 11:57 AM
To: d...@ceph.io
Subject: radosgw multisite sync /admin/log requests overloading system.
I have a simple multisite radosgw configuration setup for testing
I have a simple multisite radosgw configuration setup for testing. There is 1
realm, 1 zonegroup, and 2 separate clusters each with its own zone. There is 1
bucket with 1 object in it and no updates currently happening. There is no
group sync policy currently defined.
The problem I see is
Problem solved - 2 of the pools (zone-2.rgw.meta and zone-2.rgw.log) did not
have the "rgw" application enabled. Once that was fixed, it started working.
____
From: Wyll Ingersoll
Sent: Tuesday, May 31, 2022 3:51 PM
To: ceph-users@ceph.io
Subject: [
I'm having trouble adding a secondary zone RGW using cephadm, running with ceph
16.2.9.
The master realm, zonegroup, and zone are already configured and working on
another cluster.
This is a new cluster configured with cephadm, everything is up and running but
when I try to add an RGW and