[ceph-users] Upgrading RGW before cluster?

2024-08-13 Thread Thomas Byrne - STFC UKRI
Hi all, The Ceph documentation has always recommended upgrading RGWs last when doing an upgrade. Is there a reason for this? As they're mostly just RADOS clients, you could imagine the order doesn't matter as long as the cluster and RGW major versions are compatible. Our basic testing has shown n

[ceph-users] Re: reef 18.2.3 QE validation status

2024-08-01 Thread thomas
se an already released tarball. Cheers, Thomas Goirand (zigo) ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Debian package for 18.2.4 broken

2024-08-01 Thread Thomas Lamprecht
st" ceph reef repo: https://pve.proxmox.com/wiki/Package_Repositories#_ceph_reef_test_repository See here for the release key used to sign those packages: https://pve.proxmox.com/wiki/Package_Repositories#repos_secure_apt regards Thomas ___ ceph-us

[ceph-users] Re: OSD processes crashes on repair 'unexpected clone'

2024-05-28 Thread Thomas Björklund
s(unsigned int, ceph::heartbeat_handle_d*)+0xf13) [0x55c8141b93c3] 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x41a) [0x55c814802e6a] 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c814805410] 18: (()+0x7ea7) [0x7f4d509c0ea7] 19: (clone()+0x3f) [0x7f4d50549a6f] NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this. We are unsure on how to rectify the issue and get the cluster healthy again. Best regards, Thomas Björklund ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] OSD processes crashes on repair 'unexpected clone'

2024-05-28 Thread Thomas Björklund
rdedThreadPool::shardedthreadpool_worker(unsigned int)+0x41a) [0x55c814802e6a] 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55c814805410] 18: (()+0x7ea7) [0x7f4d509c0ea7] 19: (clone()+0x3f) [0x7f4d50549a6f] NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.* We are unsure on how to rectify the issue and get the cluster healthy again. Best regards, Thomas Björklund ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: RGW/Lua script does not show logs

2024-04-11 Thread Thomas Bennett
Hi Lee, RGWDebugLog logs at the debug level. Do you have the correct logging levels on your rados gateways? Should be 20. Cheers, Tom On Mon, 8 Apr 2024 at 23:31, wrote: > Hello, I wrote a Lua script in order to retrieve RGW logs such as bucket > name, bucket owner, etc. > However, when I appl
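
A minimal sketch of raising that level (assuming the gateways pick up their options from the config database under client.rgw; RGWDebugLog() writes to the rgw debug subsystem, and 20 is very verbose, so drop it back down afterwards):

  ceph config set client.rgw debug_rgw 20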

[ceph-users] Re: Upgraded to Quincy 17.2.7: some S3 buckets inaccessible

2024-04-05 Thread Thomas Schneider
Hey there, thanks. We had/have the same problem and I have been debugging it for a while now as well. We will test whether the upgrade solves the problem on our side too. Thanks again for the thread. Kind regards Thomas On 4/4/24 7:52 PM, Lorenz Bausch wrote: Thank you again Casey for putting us

[ceph-users] MDS crashing repeatedly

2023-12-13 Thread Thomas Widhalm
and update the hardware. But still - I need my data back. Any ideas? Cheers, Thomas -- http://www.widhalm.or.at GnuPG : 6265BAE6 , A84CB603 Threema: H7AV7D33 Telegram, Signal: widha...@widhalm.or.at OpenPGP_signature.asc Description: OpenPGP digital signature _

[ceph-users] Deleting files from lost+found in 18.2.0

2023-12-11 Thread Thomas Widhalm
; but only when acting on files in lost+found. I searched the documentation but I can't find anything related. Is there a special trick or a flag I have to set? Cheers, Thomas -- http://www.widhalm.or.at GnuPG : 6265BAE6 , A84CB603 Threema: H7AV7D33 Telegram, Signal: widha...@widhalm.or.at O

[ceph-users] Re: Setting S3 bucket policies with multi-tenants

2023-11-01 Thread Thomas Bennett
updates - https://docs.ceph.com/en/quincy/radosgw/bucketpolicy/ to indicate that usfolks in the example is the tenant name? On Wed, 1 Nov 2023 at 18:27, Thomas Bennett wrote: > Hi, > > I'm running Ceph Quincy (17.2.6) with a rados-gateway. I have multi > tenants, for example:

[ceph-users] Setting S3 bucket policies with multi-tenants

2023-11-01 Thread Thomas Bennett
Hi, I'm running Ceph Quincy (17.2.6) with a rados-gateway. I have multi tenants, for example: - Tenant1$manager - Tenant1$readwrite I would like to set a policy on a bucket (backups for example) owned by *Tenant1$manager* to allow *Tenant1$readwrite* access to that bucket. I can't find any
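
For illustration, a sketch along the lines of the usfolks example in the Ceph bucket-policy docs (the tenant, user, bucket and action names are the hypothetical ones from this question, not a confirmed answer), applied by the bucket owner Tenant1$manager with s3cmd:

  cat > policy.json <<'EOF'
  {
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"AWS": ["arn:aws:iam::Tenant1:user/readwrite"]},
      "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject"],
      "Resource": ["arn:aws:s3:::backups", "arn:aws:s3:::backups/*"]
    }]
  }
  EOF
  s3cmd setpolicy policy.json s3://backups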

[ceph-users] Re: S3 user with more than 1000 buckets

2023-10-03 Thread Thomas Bennett
Thanks for all the responses, much appreciated. Upping the chunk size fixes my problem in the short term, but I'll upgrade to 17.2.6 :) Kind regards, Tom On Tue, 3 Oct 2023 at 15:28, Matt Benjamin wrote: > Hi Thomas, > > If I'm not mistaken, the RGW will paginate ListBuckets es

[ceph-users] Re: S3 user with more than 1000 buckets

2023-10-03 Thread Thomas Bennett
Hi, > > You should increase these default settings: > > rgw_list_buckets_max_chunk // for buckets > rgw_max_listing_results // for objects > > On Tue, Oct 3, 2023 at 12:59 PM Thomas Bennett wrote: > >> Hi, >> >> I'm running a Ceph 17.2.5 Rados Gateway
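
For reference, a hedged sketch of bumping those defaults via the config database (the values are arbitrary examples, not recommendations, and this assumes the gateways read their options as client.rgw):

  ceph config set client.rgw rgw_list_buckets_max_chunk 5000
  ceph config set client.rgw rgw_max_listing_results 5000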

[ceph-users] S3 user with more than 1000 buckets

2023-10-03 Thread Thomas Bennett
Hi, I'm running a Ceph 17.2.5 Rados Gateway and I have a user with more than 1000 buckets. When the client tries to list all their buckets using s3cmd, rclone and Python boto3, all three only ever return the first 1000 bucket names. I can confirm the buckets are all there (and more than 1000

[ceph-users] Dashboard daemon logging not working

2023-09-27 Thread Thomas Bennett
Hey, Has anyone else had issues with exploring Loki after deploying ceph monitoring services ? I'm running 17.2.6. When clicking on the Ceph dashboard daemon logs (i.e Cluster -> Logs -> Daemon Logs), it took me through to an embedded

[ceph-users] Re: ref v18.2.0 QE Validation status

2023-08-03 Thread Thomas Lamprecht
te code didn't change since quincy FWICT). - Thomas ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Delete or move files from lost+found in cephfs

2023-07-04 Thread Thomas Widhalm
Hi, Thank you very much! That's exactly what I was looking for. I'm in no hurry as long as it will be able to remove the data eventually. Cheers, Thomas On 04.07.23 12:23, Dhairya Parmar wrote: Hi, These symptoms look relevant to [0] and its PR is already merged in main; bac

[ceph-users] Delete or move files from lost+found in cephfs

2023-07-03 Thread Thomas Widhalm
re mostly useless copies. Cheers, Thomas -- http://www.widhalm.or.at GnuPG : 6265BAE6 , A84CB603 Threema: H7AV7D33 Telegram, Signal: widha...@widhalm.or.at OpenPGP_signature Description: OpenPGP digital signature ___ ceph-users mailing list -- ceph-use

[ceph-users] Re: Orchestration seems not to work

2023-06-07 Thread Thomas Widhalm
help you got me! Cheers, Thomas On 07.06.23 15:32, Thomas Widhalm wrote: I found something else, that might help with identifying the problem. When I look into which containers are used I see the following: global: quay.io/ceph/ceph@sha256

[ceph-users] Re: Orchestration seems not to work

2023-06-07 Thread Thomas Widhalm
tried changing deployment rules to default (host, no hosts listed, no count set) and shut down the cluster yet again. Still I have 2 months old data in "ceph orch ps". Same in the Dashboard. Any other ideas I could check for? On 25.05.23 15:04, Thomas Widhalm wrote: What caught my e

[ceph-users] Re: Orchestration seems not to work

2023-05-25 Thread Thomas Widhalm
all others without problems. By name and IP address. (Even to the two new ones that couldn't connect themselves) On 25.05.23 20:33, Thomas Widhalm wrote: Hi, So sorry I didn't see your reply. Had some tough weeks (father in law died and that gave us some turmoil) I just came back to d

[ceph-users] Re: Orchestration seems not to work

2023-05-25 Thread Thomas Widhalm
failed %s' % (address,)) OSError: [Errno 113] Connect call failed ('192.168.122.201', 22) Trying this to connect to each host in the cluster, given how close it is to what cephadm is doing, should help verify with relative certainty whether this is connection related or not.
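
For what it's worth, cephadm also exposes a built-in host check that exercises the same SSH path; a hedged sketch (the hostname is a placeholder):

  ceph cephadm check-host <hostname>
  ceph orch host ls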

[ceph-users] Re: Orchestration seems not to work

2023-05-25 Thread Thomas Widhalm
rated by the orchestrator so curious what the last actions it took was (and how long ago). On Thu, May 4, 2023 at 10:35 AM Thomas Widhalm <mailto:widha...@widhalm.or.at>> wrote: To completely rule out hung processes, I managed to get another short shutdown. Now I'm see

[ceph-users] Re: Best practice for expanding Ceph cluster

2023-05-17 Thread Thomas Bennett
ation-hooks Cheers, Tom On Wed, 17 May 2023 at 14:40, Thomas Bennett wrote: > Hey, > > A question slightly related to this: > > > I would suggest that you add all new hosts and make the OSDs start >> > with a super-low initial weight (0.0001 or so), which means they

[ceph-users] Re: Best practice for expanding Ceph cluster

2023-05-17 Thread Thomas Bennett
Hey, A question slightly related to this: > I would suggest that you add all new hosts and make the OSDs start > > with a super-low initial weight (0.0001 or so), which means they will > > be in and up, but not receive any PGs. Is it possible to have the correct weight set and use ceph osd set
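
As a side note, a hedged sketch of the low-initial-weight approach quoted above (the OSD id and weights are placeholders; CRUSH weights are in TiB):

  ceph config set osd osd_crush_initial_weight 0   # new OSDs join with CRUSH weight 0
  ceph osd crush reweight osd.42 0.0001            # then raise them gradually
  ceph osd crush reweight osd.42 1.8               # ...up to the disk's real size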

[ceph-users] Re: Orchestration seems not to work

2023-05-15 Thread Thomas Widhalm
100% of other cases of this type of thing happening I've looked at before have had those processes sitting around. On Mon, May 15, 2023 at 3:10 PM Thomas Widhalm <mailto:widha...@widhalm.or.at>> wrote: This is why I even tried a full cluster shutdown. All Hosts were ou

[ceph-users] Re: Orchestration seems not to work

2023-05-15 Thread Thomas Widhalm
rocesses. I would still expect that's the most likely thing you're experiencing here. I haven't seen any other causes for cephadm to not refresh unless the module crashed, but that would be explicitly stated in the cluster health. On Mon, May 15, 2023 at 11:44 AM Thomas Widhalm wrot

[ceph-users] Re: Orchestration seems not to work

2023-05-15 Thread Thomas Widhalm
nly is manually. I added two more hosts, tagged them. But there isn't a single daemon started there. Could you help me again with how to debug orchestration not working? On 04.05.23 15:12, Thomas Widhalm wrote: Thanks. I set the log level to debug, try a few steps and then come back. On

[ceph-users] Re: Lua scripting in the rados gateway

2023-05-09 Thread Thomas Bennett
nning: cephadm shell radosgw-admin script put --infile=/rootfs/tmp/preRequest.lua --context=preRequest This injects the lua script into the pre request context. Cheers, Tom On Fri, 28 Apr 2023 at 15:19, Thomas Bennett wrote: > Hey Yuval, > > No problem. It was interesting to me to fig

[ceph-users] osd pause

2023-05-05 Thread Thomas Bennett
Hi, FYI - This might be pedantic, but there does not seem to be any difference between using these two sets of commands: - ceph osd pause / ceph osd unpause - ceph osd set pause / ceph osd unset pause I can see that they both set/unset the pauserd,pausewr flags, but since they don't report
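
A quick way to confirm that both forms end up toggling the same flags (sketch):

  ceph osd set pause            # sets the pauserd,pausewr flags
  ceph osd dump | grep flags    # shows pauserd,pausewr while paused
  ceph osd unset pause          # clears them again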

[ceph-users] Re: Orchestration seems not to work

2023-05-04 Thread Thomas Widhalm
On 04.05.23 16:55, Adam King wrote: what does specifically `ceph log last 200 debug cephadm` spit out? The log lines you've posted so far I don't think are generated by the orchestrator so curious what the last actions it took was (and how long ago). On Thu, May 4, 2023 at 10:35 AM

[ceph-users] Re: Orchestration seems not to work

2023-05-04 Thread Thomas Widhalm
he mgr actually trying something? Quoting Thomas Widhalm: Hi, I'm in the process of upgrading my cluster from 17.2.5 to 17.2.6, but the following problem existed when I was still on 17.2.5 everywhere. I had a major issue in my cluster which could be solved with a lot of your help and

[ceph-users] Re: Orchestration seems not to work

2023-05-04 Thread Thomas Widhalm
d it be there's something missing on the new nodes that are now used as mgr/mon? Cheers, Thomas On 04.05.23 14:48, Eugen Block wrote: Hi, try setting debug logs for the mgr: ceph config set mgr mgr/cephadm/log_level debug This should provide more details what the mgr is trying and where

[ceph-users] Re: Orchestration seems not to work

2023-05-04 Thread Thomas Widhalm
fully. Last week this helped me identify an issue on a lower Pacific release. Do you see anything in the cephadm.log pointing to the mgr actually trying something? Quoting Thomas Widhalm: Hi, I'm in the process of upgrading my cluster from 17.2.5 to 17.2.6 but the followi

[ceph-users] Re: Orchestration seems not to work

2023-05-04 Thread Thomas Widhalm
hosts that aren't having their info refreshed and check for hanging "cephadm" commands (I just check for "ps aux | grep cephadm"). On Thu, May 4, 2023 at 8:38 AM Thomas Widhalm <mailto:widha...@widhalm.or.at>> wrote: Hi, I'm in the process of

[ceph-users] Orchestration seems not to work

2023-05-04 Thread Thomas Widhalm
nance window. Didn't change anything. Could you help me? To be honest I'm still rather new to Ceph, and since I didn't find anything in the logs that caught my eye I would be thankful for hints on how to debug. Cheers, Thomas -- http://www.widhalm.or.at GnuPG

[ceph-users] Re: Return code -116 when starting MDS scrub

2023-04-29 Thread Thomas Widhalm
ou don’t specify a path at all. Can you retry with „/„? Do you see anything with damage ls? [1] https://www.mail-archive.com/ceph-users@lists.ceph.com/msg56062.html Quoting Thomas Widhalm: Hi, I followed the steps to repair journal and MDS I found here in the list. I hit a bug that stopped

[ceph-users] Return code -116 when starting MDS scrub

2023-04-29 Thread Thomas Widhalm
Hi, I followed the steps to repair the journal and MDS I found here in the list. I hit a bug that stopped my MDS from starting, so I took the long way of reading the data. Everything went fine and I can even mount one of my CephFS now. That's a big relief. But when I start scrub, I just get retur

[ceph-users] Re: Lua scripting in the rados gateway

2023-04-28 Thread Thomas Bennett
Hey Yuval, No problem. It was interesting to me to figure out how it all fits together and works. Thanks for opening an issue on the tracker. Cheers, Tom On Thu, 27 Apr 2023 at 15:03, Yuval Lifshitz wrote: > Hi Thomas, > Thanks for the detailed info! > RGW lua scripting was never te

[ceph-users] Re: For suggestions and best practices on expanding Ceph cluster and removing old nodes

2023-04-28 Thread Thomas Bennett
Tricks" section of the docs. > > -- dan > > __ > Clyso GmbH | https://www.clyso.com > > > > > On Wed, Apr 26, 2023 at 7:46 AM Thomas Bennett wrote: > > > > I would second Joachim's suggestion - this is exactly what we&

[ceph-users] Re: For suggestions and best practices on expanding Ceph cluster and removing old nodes

2023-04-26 Thread Thomas Bennett
I would second Joachim's suggestion - this is exactly what we're in the process of doing for a client, i.e. migrating from Luminous to Quincy. However, below would also work if you're moving to Nautilus. The only catch with this plan would be if you plan to reuse any hardware - i.e. the hosts running

[ceph-users] Re: OSD_TOO_MANY_REPAIRS on random OSDs causing clients to hang

2023-04-26 Thread Thomas Hukkelberg
]: Machine check events logged [Tue Mar 28 20:00:28 2023] mce: [Hardware Error]: Machine check events logged [Wed Apr 19 01:50:41 2023] mce: [Hardware Error]: Machine check events logged mce: [Hardware Error] suggest memory or other type of hardware error as we understand it. --thomas > 26.

[ceph-users] OSD_TOO_MANY_REPAIRS on random OSDs causing clients to hang

2023-04-26 Thread Thomas Hukkelberg
degraded+repair, active+recovering+repair, active+clean+repair every few seconds? Any ideas on how to gracefully battle this problem? Thanks! --thomas Thomas Hukkelberg tho...@hovedkvarteret.no ___ ceph-users mailing list -- ceph-users@ceph.io To unsu

[ceph-users] Lua scripting in the rados gateway

2023-04-25 Thread Thomas Bennett
Hi ceph users, I've been trying out the lua scripting for the rados gateway (thanks Yuval). As I mentioned in my previous email, there is an error when trying to load the luasocket module. However, I thought it was a good time to report on my progress. My 'hello world' example below is calle

[ceph-users] Rados gateway lua script-package error lib64

2023-04-25 Thread Thomas Bennett
Hi, I've noticed that when my lua script runs I get the following error on my radosgw container. It looks like the lib64 directory is not included in the path when looking for shared libraries. Copying the content of lib64 into the lib directory solves the issue on the running container. Here ar

[ceph-users] Cephadm only scheduling, not orchestrating daemons

2023-04-13 Thread Thomas Widhalm
"Upgrading all daemon types on all hosts", "services_complete": [ "crash", "mgr", "mon", "osd" ], "progress": "18/40 daemons upgraded", "message": "Upgrade paused"

[ceph-users] Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-12 Thread Thomas Widhalm
tput, you could use that to check the journal logs which should tell the last restart time and why it's gone down. On Mon, Apr 10, 2023 at 4:25 PM Thomas Widhalm <mailto:widha...@widhalm.or.at>> wrote: I did what you told me. I also see in the log, that the command went throu

[ceph-users] Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-12 Thread Thomas Widhalm
of the cephadm binary, run "cephadm ls" with it, grab the systemd unit name for the mds daemon from that output, you could use that to check the journal logs which should tell the last restart time and why it's gone down. On Mon, Apr 10, 2023 at 4:25 PM Thomas Widhalm <mai

[ceph-users] Re: Ceph Object Gateway and lua scripts

2023-04-11 Thread Thomas Bennett
Thanks Yuval. From your email I've confirmed that it's not the logging that is broken - it's the CopyFrom that is causing an issue :) I've got some other example Lua scripts working now. Kind regards, Thomas On Sun, 9 Apr 2023 at 11:41, Yuval Lifshitz wrote: > Hi Thomas,I

[ceph-users] Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-11 Thread Thomas Widhalm
On 11.04.23 09:16, Xiubo Li wrote: On 4/11/23 03:24, Thomas Widhalm wrote: Hi, If you remember, I hit bug https://tracker.ceph.com/issues/58489 so I was very relieved when 17.2.6 was released and started to update immediately. Please note, this fix is not in the v17.2.6 yet in upstream

[ceph-users] Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-10 Thread Thomas Widhalm
error 32m ago 10w-- mds.mds01.ceph07.omdisd ceph07 error 32m ago 2M-- Any other ideas? Or am I missing something? Cheers, Thomas On 10.04.23 21:53, Adam King wrote: Will also note that the normal upgrade

[ceph-users] Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-10 Thread Thomas Widhalm
ime and the update went on.) And in the log: 2023-04-10T19:23:48.750129+ mgr.ceph04.qaexpv [INF] Upgrade: Waiting for mds.mds01.ceph04.hcmvae to be up:active (currently up:replay) 2023-04-10T19:23:58.758141+ mgr.ceph04.qaexpv [WRN] Upgrade: No mds is up; continuing upgrade procedure to

[ceph-users] Ceph Object Gateway and lua scripts

2023-04-05 Thread Thomas Bennett
Hi, We're currently testing out lua scripting in the Ceph Object Gateway (Radosgw). Ceph version: 17.2.5 We've tried a simple experiment with the simple lua script which is based on the documentation (see fixed width text below). However, the issue we're having is that we can't find the log mes

[ceph-users] Re: quincy v17.2.6 QE Validation status

2023-04-05 Thread Thomas Widhalm
Sorry for interfering, but: Wh!! Thank you so much for the great work. Can't wait for the release with a good chance to get access to my data again. On 05.04.23 16:15, Josh Durgin wrote: The LRC upgraded with no problems, so this release is good to go! Josh On Mon, Apr 3, 2023 at 3:

[ceph-users] Re: rbd map error: couldn't connect to the cluster!

2023-02-27 Thread Thomas Schneider
rieb Curt: Needs to be inside the " with your other commands. On Mon, Feb 27, 2023, 16:55 Thomas Schneider <74cmo...@gmail.com> wrote: Hi, I get an error running this ceph auth get-or-create syntax: # ceph auth get-or-create client.${rbdName} mon "allow r"

[ceph-users] Re: rbd map error: couldn't connect to the cluster!

2023-02-27 Thread Thomas Schneider
Hi, I get an error running this ceph auth get-or-create syntax: # ceph auth get-or-create client.${rbdName} mon  "allow r" osd "allow rwx pool ${rbdPoolName} object_prefix rbd_data.${imageID}; allow rwx pool ${rbdPoolName} object_prefix rbd_header.${imageID}; allow rx pool ${rbdPoolName} obje
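
If the object_prefix-restricted caps keep getting rejected, a commonly used and much less restrictive baseline to test with is the built-in RBD cap profile; a sketch reusing the variables from the command above (not necessarily what the thread settled on):

  ceph auth get-or-create client.${rbdName} mon 'profile rbd' osd "profile rbd pool=${rbdPoolName}"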

[ceph-users] Re: rbd map error: couldn't connect to the cluster!

2023-02-24 Thread Thomas Schneider
Please check the output here: # rbd info hdb_backup/VCT rbd image 'VCT':     size 800 GiB in 204800 objects     order 22 (4 MiB objects)     snapshot_count: 0     id: b768d4baac048b     block_name_prefix: rbd_data.b768d4baac048b     format: 2     features: layering

[ceph-users] Re: rbd map error: couldn't connect to the cluster!

2023-02-24 Thread Thomas Schneider
ready try the other caps? Do those work? Quoting Thomas Schneider <74cmo...@gmail.com>: Confirmed. # ceph versions {     "mon": {     "ceph version 14.2.22 (877fa256043e4743620f4677e72dee5e738d1226) nautilus (stable)": 3     },     "m

[ceph-users] Re: rbd map error: couldn't connect to the cluster!

2023-02-23 Thread Thomas Schneider
;overall": {     "ceph version 14.2.22 (877fa256043e4743620f4677e72dee5e738d1226) nautilus (stable)": 450     } } Am 23.02.2023 um 17:33 schrieb Eugen Block: And the ceph cluster has the same version? ‚ceph versions‘ shows all daemons. If the cluster is also 14.2.X the caps sho

[ceph-users] Re: rbd map error: couldn't connect to the cluster!

2023-02-23 Thread Thomas Schneider
et_mirror_image: failed to retrieve mirroring state: (1) Operation not permitted rbd: info: (1) Operation not permitted And I don't have rbd-mirror enabled in this cluster, so that's kind of strange... I'll try to find out which other caps it requires. I already disabled all

[ceph-users] Re: rbd map error: couldn't connect to the cluster!

2023-02-23 Thread Thomas Schneider
x for the --id parameter, just the client name, in your case "VCT". Your second approach shows a different error message, so it can connect with "VCT" successfully, but the permissions seem not to be sufficient. Those caps look very restrictive, not sure which prevent the m

[ceph-users] Re: rbd map error: couldn't connect to the cluster!

2023-02-23 Thread Thomas Schneider
hdb_backup --id VCT  --keyring /etc/ceph/ceph.client.VCT.keyring' On Thu, Feb 23, 2023 at 6:54 PM Thomas Schneider <74cmo...@gmail.com> wrote: Hm... I'm not sure about the correct rbd command syntax, but I thought it's correct. Anyway, using a different ID

[ceph-users] Re: rbd map error: couldn't connect to the cluster!

2023-02-23 Thread Thomas Schneider
not sure about upper-case client names, haven't tried that)? rbd map hdb_backup/VCT --id VCT --keyring /etc/ceph/ceph.client.VCT.keyring Quoting Thomas Schneider <74cmo...@gmail.com>: Hello, I'm trying to mount RBD using rbd map, but I get this error message: # rbd map

[ceph-users] rbd map error: couldn't connect to the cluster!

2023-02-23 Thread Thomas Schneider
Hello, I'm trying to mount RBD using rbd map, but I get this error message: # rbd map hdb_backup/VCT --id client --keyring /etc/ceph/ceph.client.VCT.keyring rbd: couldn't connect to the cluster! Checking on the Ceph server, the required permission for the relevant keyring exists: # ceph-authtool -l /et

[ceph-users] Re: MDS stuck in "up:replay"

2023-02-23 Thread Thomas Widhalm
: /lib64/libpthread.so.0(+0x81ca) [0x7f6bef78e1ca] 12: clone() NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this. If you need more, just let me know, please. On 23.02.23 01:34, Xiubo Li wrote: On 23/02/2023 05:56, Thomas Widhalm wrote: Ah, sorry. My bad. The MDS crash

[ceph-users] Re: MDS stuck in "up:replay"

2023-02-22 Thread Thomas Widhalm
I guess, you'd like to have the crash logs? Thank you in advance. Any help is really appreciated. My filesystems are still completely down. Cheers, Thomas On 22.02.23 18:36, Patrick Donnelly wrote: On Wed, Feb 22, 2023 at 12:10 PM Thomas Widhalm wrote: Hi, Thanks for the idea! I

[ceph-users] Re: MDS stuck in "up:replay"

2023-02-22 Thread Thomas Widhalm
Hi, Thanks for the idea! I tried it immediately but still, MDS are in up:replay mode. So far they haven't crashed but this usually takes a few minutes. So no effect so far. :-( Cheers, Thomas On 22.02.23 17:58, Patrick Donnelly wrote: On Wed, Jan 25, 2023 at 3:36 PM Thomas Widhalm

[ceph-users] Re: [EXTERNAL] Any ceph constants available?

2023-02-03 Thread Thomas Cannon
y have no idea what that means. Sadly, I am learning on the job here and the curve is pretty steep. Are the drives not balancing because of rules being misapplied? Thank you for all of your help here. Thomas > > It’s concerning that you have 4 pools warning nearful, but 7 pools in the

[ceph-users] Any ceph constants available?

2023-02-03 Thread Thomas Cannon
dea of the problems I am facing. I am hoping for some professional services hours with someone who is a true expert with this software, to get us to a stable and sane deployment that can be managed without it being a terrifying guessing game, trying to get it to work. If that is you, or if you know so

[ceph-users] Re: MDS stuck in "up:replay"

2023-01-25 Thread Thomas Widhalm
puzzles me because at least one of them hasn't seen changes for weeks before the crash. Cheers, Thomas On 20.01.23 04:37, Venky Shankar wrote: Hi Thomas, On Thu, Jan 19, 2023 at 7:15 PM Thomas Widhalm wrote: Hi, Unfortunately the workaround didn't work out: [ceph: root@ceph05

[ceph-users] Re: MDS stuck in "up:replay"

2023-01-19 Thread Thomas Widhalm
,i=[7ff]}] Standby daemons: [mds.mds01.ceph05.pqxmvt{-1:61834887} state up:standby seq 1 addr [v2:192.168.23.65:6800/957802673,v1:192.168.23.65:6801/957802673] compat {c=[1],r=[1],i=[7ff]}] dumped fsmap epoch 198622 On 19.01.23 14:01, Venky Shankar wrote: Hi Thomas, On Tue, Jan 17, 2023 at 5:34 P

[ceph-users] Re: MDS stuck in "up:replay"

2023-01-19 Thread Thomas Widhalm
Hi Venky, Thanks. I just uploaded my logs to the tracker. I'll try what you suggested and will let you know how it went. Cheers, Thomas On 19.01.23 14:01, Venky Shankar wrote: Hi Thomas, On Tue, Jan 17, 2023 at 5:34 PM Thomas Widhalm wrote: Another new thing that just happened: O

[ceph-users] Re: MDS stuck in "up:replay"

2023-01-18 Thread Thomas Widhalm
Thank you. I'm setting the debug level and await authorization for Tracker. I'll upload the logs as soon as I can collect them. Thank you so much for your help On 18.01.23 12:26, Kotresh Hiremath Ravishankar wrote: Hi Thomas, This looks like it requires more investigation than

[ceph-users] Re: MDS stuck in "up:replay"

2023-01-17 Thread Thomas Widhalm
t's referring to log replaying, could this be related to my issue? On 17.01.23 10:54, Thomas Widhalm wrote: Hi again, Another thing I found: Out of pure desperation, I started MDS on all nodes. I had them configured in the past so I was hoping, they could help with bringing in missing data e

[ceph-users] Re: MDS stuck in "up:replay"

2023-01-17 Thread Thomas Widhalm
hosts that usually don't run MDS just spiked. So high I had to kill the MDS again because otherwise they kept killing OSD containers. So I don't really have any new information, but maybe that could be a hint of some kind? Cheers, Thomas On 17.01.23 10:13, Thomas Widhalm wrote: Hi, Thanks

[ceph-users] Re: MDS stuck in "up:replay"

2023-01-17 Thread Thomas Widhalm
p thread waiting interval 1.0s The only thing that gives me hope here is that the line mds.beacon.mds01.ceph05.pqxmvt Sending beacon up:replay seq 11109 is changing its sequence number. Anything else I can provide? Cheers, Thomas On 17.01.23 06:27, Kotresh Hiremath Ravishankar wrote: Hi Thomas,

[ceph-users] Re: MDS stuck in "up:replay"

2023-01-16 Thread Thomas Widhalm
]: mds.mds01.ceph04.cvdhsx Updating MDS map to version 143933 from mon.1 Jan 16 10:05:24 ceph04 ceph-mds[1311]: mds.mds01.ceph04.cvdhsx Updating MDS map to version 143935 from mon.1 Jan 16 10:05:29 ceph04 ceph-mds[1311]: mds.mds01.ceph04.cvdhsx Updating MDS map to version 143936 from mon.1 Jan 16 10

[ceph-users] MDS stuck in "up:replay"

2023-01-14 Thread Thomas Widhalm
scrubbing aside). But I can't mount anything. When I try to start a VM that's on RBD I just get a timeout. And when I try to mount a CephFS, mount just hangs forever. Whatever command I give MDS or journal, it just hangs. The only thing I could do was take all CephFS offline, kil

[ceph-users] Requesting recommendations for Ceph multi-cluster management

2022-11-23 Thread Thomas Eckert
following this survey and, at time of writing, it is the only article on ceph.io labeled "multi-cluster". Any recommendations or pointers where to look would be appreciated! Regards, Thomas ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Invalid crush class

2022-10-08 Thread Michael Thomas
In 15.2.7, how can I remove an invalid crush class? I'm surprised that I was able to create it in the first place: [root@ceph1 bin]# ceph osd crush class ls [ "ssd", "JBOD.hdd", "nvme", "hdd" ] [root@ceph1 bin]# ceph osd crush class ls-osd JBOD.hdd Invalid command: invalid cha

[ceph-users] Map RBD to multiple nodes (like NFS)

2022-07-25 Thread Thomas Schneider
node has written the file to RBD, like NFS? If yes, how must the RBD be configured here? If no, is there any possibility in Ceph to provide such shared storage? Regards Thomas ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an

[ceph-users] Re: cephadm orch thinks hosts are offline

2022-06-29 Thread Thomas Roth
r, cf. above. > ssh-copy-id -f -i /etc/ceph/ceph.pub root@lxbk0374 > ceph orch host add lxbk0374 10.20.2.161 -> 'ceph orch host ls' shows that node as no longer Offline. -> Repeat with all the other hosts, and everything also looks fine from the orch view. My question: Did I

[ceph-users] Re: cephadm orch thinks hosts are offline

2022-06-27 Thread Thomas Roth
ot found. Use 'ceph orch host ls' to see all managed hosts. In some email on this issue that I can't find atm, someone describes a workaround that allows restarting the entire orchestrator business. But that sounded risky. Regards Thomas On 23/06/2022 19.42, Adam King wrote: Hi Th

[ceph-users] cephadm orch thinks hosts are offline

2022-06-23 Thread Thomas Roth
eem to be unaffected. Cheers Thomas -- ---- Thomas Roth Department: Informationstechnologie Location: SB3 2.291 GSI Helmholtzzentrum für Schwerionenforschung GmbH Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

[ceph-users] active+undersized+degraded due to OSD size differences?

2022-06-19 Thread Thomas Roth
ver? This is all Quincy, cephadm, so there is no ceph.conf anymore, and I did not find the command to inject my failure domain into the config database... Regards Thomas -- ---- Thomas Roth IT-HPC-Linux Location: SB3 2.291

[ceph-users] ceph.pub not presistent over reboots?

2022-06-15 Thread Thomas Roth
their nodes? Can't really believe that. Regards Thomas -- ---- Thomas Roth Department: Informationstechnologie Location: SB3 2.291 GSI Helmholtzzentrum für Schwerionenforschung GmbH Planckstraße 1, 64291 Darmstadt, Germany, www.

[ceph-users] set configuration options in the cephadm age

2022-06-14 Thread Thomas Roth
paragraph in the documentation mentioning this, along with the corresponding paragraph on setting options permanently... In fact, I would just like to have the failure domain 'OSD' instead of 'host'. Any clever way of doing that? Regards, Thomas ___

[ceph-users] Re: Rebalance after draining - why?

2022-05-28 Thread Michael Thomas
Try this: ceph osd crush reweight osd.XX 0 --Mike On 5/28/22 15:02, Nico Schottelius wrote: Good evening dear fellow Ceph'ers, when removing OSDs from a cluster, we sometimes use ceph osd reweight osd.XX 0 and wait until the OSD's content has been redistributed. However, when then fin

[ceph-users] Re: v17.2.0 Quincy released

2022-05-25 Thread Thomas Roth
ceph.conf.new: Permission denied By now, I go to ceph.io every day to see if the motd has been changed to "If it compiles at all, release it as stable". Cheers, Thomas On 5/4/22 14:57, Jozef Rebjak wrote: Hello, If there is somebody who is using non-root user within Pacific and wo

[ceph-users] Re: managed block storage stopped working

2022-02-09 Thread Michael Thomas
On 1/7/22 16:49, Marc wrote: Where else can I look to find out why the managed block storage isn't accessible anymore? ceph -s ? I guess it is not showing any errors, and there is probably nothing with ceph, you can do an rbdmap and see if you can just map an image. Then try mapping an im

[ceph-users] Re: Multipath and cephadm

2022-01-30 Thread Thomas Roth
Thanks, Peter, this works. Before, I had the impression cephadm would only accept 'bare' disks as osd devices, but indeed it will swallow any kind of block device or LV that you prepare for it on the osd host. Regards, Thomas On 1/25/22 20:21, Peter Childs wrote: This came from

[ceph-users] Re: Multipath and cephadm

2022-01-25 Thread Thomas Roth
es not know how to handle multipath devices? Regards, Thomas On 12/23/21 18:40, Michal Strnad wrote: Hi all. We have a problem using disks accessible via multipath. We are using cephadm for deployment, Pacific version for containers, CentOS 8 Stream on servers and following LVM

[ceph-users] managed block storage stopped working

2022-01-07 Thread Michael Thomas
...sorta. I have an ovirt-4.4.2 system installed a couple of years ago and set up managed block storage using ceph Octopus[1]. This has been working well since it was originally set up. In late November we had some network issues on one of our ovirt hosts, as well as a separate network issue tha

[ceph-users] Re: [External Email] Re: ceph-objectstore-tool core dump

2021-10-04 Thread Michael Thomas
On 10/4/21 11:57 AM, Dave Hall wrote: > I also had a delay on the start of the repair scrub when I was dealing with > this issue. I ultimately increased the number of simultaneous scrubs, but > I think you could also temporarily disable scrubs and then re-issue the 'pg > repair'. (But I'm not one
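
For reference, a hedged sketch of the approach described above (the PG id and value are placeholders; adjust or revert to taste):

  ceph config set osd osd_max_scrubs 3     # allow more simultaneous scrubs, or...
  ceph osd set noscrub                     # ...temporarily stop routine scrubs
  ceph osd set nodeep-scrub
  ceph pg repair 7.1a                      # then re-issue the repair
  ceph osd unset noscrub
  ceph osd unset nodeep-scrub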

[ceph-users] Re: ceph-objectstore-tool core dump

2021-10-03 Thread Michael Thomas
On 10/3/21 12:08, 胡 玮文 wrote: On 2021-10-04 at 00:53, Michael Thomas wrote: I recently started getting inconsistent PGs in my Octopus (15.2.14) ceph cluster. I was able to determine that they are all coming from the same OSD: osd.143. This host recently suffered from an unplanned power loss, so

[ceph-users] ceph-objectstore-tool core dump

2021-10-03 Thread Michael Thomas
I recently started getting inconsistent PGs in my Octopus (15.2.14) ceph cluster. I was able to determine that they are all coming from the same OSD: osd.143. This host recently suffered from an unplanned power loss, so I'm not surprised that there may be some corruption. This PG is part of
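
A hedged sketch of the usual first steps for pinning inconsistencies down to a PG and OSD (pool and PG ids are placeholders):

  ceph health detail                                        # lists the inconsistent PGs
  rados list-inconsistent-pg <pool>
  rados list-inconsistent-obj <pgid> --format=json-pretty   # shows which shard/OSD holds the bad copy
  ceph pg repair <pgid>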

[ceph-users] Multiple cephfs MDS crashes with same assert_condition: state == LOCK_XLOCK || state == LOCK_XLOCKDONE

2021-08-09 Thread Thomas Hukkelberg
Hi Today we suddenly experienced multiple MDS crashes during the day with an error we have not seen earlier. We run octopus 15.2.13 with 4 ranks and 4 standby-replay MDSes and 1 passive standby. Any input on how to troubleshoot or resolve this would be most welcome. --- root@hk-cephnode-54:~# ce

[ceph-users] Re: HDD <-> OSDs

2021-06-22 Thread Thomas Roth
Thank you all for the clarification! I just did not grasp the concept before, probably because I am used to those systems that form a layer on top of the local file system. If ceph does it all, down to the magnetic platter, all the better. Cheers Thomas On 6/22/21 12:15 PM, Marc wrote

[ceph-users] HDD <-> OSDs

2021-06-22 Thread Thomas Roth
try cephfs on ~10 servers with 70 HDDs each. That would mean each system has to deal with 70 OSDs, on 70 LVs? Really no aggregation of the disks? Regards, Thomas -- Thomas Roth Department: IT GSI Helmholtzzentrum für

[ceph-users] cephfs auditing

2021-05-27 Thread Michael Thomas
Is there a way to log or track which cephfs files are being accessed? This would help us in planning where to place certain datasets based on popularity, eg on a EC HDD pool or a replicated SSD pool. I know I can run inotify on the ceph clients, but I was hoping that the MDS would have a way t
