[ceph-users] Re: Squid 19.2.0 balancer causes restful requests to be lost

2024-11-01 Thread Chris Palmer
e can confirm? Can you please include: 1. Steps to reproduce (including any commands you are performing to invoke the restful api) 2. MGR logs with `ceph config set mgr.* debug_mgr 20` and `ceph config set mgr mgr/balancer/log_level debug` Thanks, Laura On Wed, Oct 30, 2024 at 7:24 AM Chris Palmer
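For reference, a minimal sketch of gathering the logs requested above (the plain `mgr` target is used here rather than `mgr.*`; adjust to your deployment), with commands to revert the debug settings afterwards:

$ ceph config set mgr debug_mgr 20
$ ceph config set mgr mgr/balancer/log_level debug
#   ... reproduce the failing restful calls while the balancer runs ...
$ ceph config rm mgr debug_mgr
$ ceph config rm mgr mgr/balancer/log_level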

[ceph-users] Squid 19.2.0 balancer causes restful requests to be lost

2024-10-30 Thread Chris Palmer
I've just upgraded a test cluster from 18.2.4 to 19.2.0.  Package install on centos 9 stream. Very smooth upgrade. Only one problem so far... The MGR restful api calls work fine. EXCEPT whenever the balancer kicks in to find any new plans. During the few seconds that the balancer takes to run,
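For reference, a minimal sketch of exercising the restful module while the balancer runs, assuming the module's default port 8003; the key name and MGR hostname are placeholders:

$ ceph restful create-self-signed-cert
$ ceph restful create-key probe                                 # prints an API key
$ curl -sk -u probe:<api-key> https://<active-mgr>:8003/osd     # repeat while the balancer computes a plan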

[ceph-users] Re: RGW Lifecycle Problem (Reef)

2024-07-18 Thread Chris Palmer
Anyone got any ideas why this one lifecycle rule never runs automatically? On 15/07/2024 13:32, Chris Palmer wrote: Reef 18.2.2, package install on Centos 9. This is a very straightforward production cluster, 2 RGW hosts, no multisite. 4 buckets have lifecycle policies: $ radosgw-admin lc
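For reference, a minimal sketch of poking at lifecycle processing by hand; whether --bucket is accepted by lc process depends on the radosgw-admin version, and rgw_lc_debug_interval is only meant for test clusters:

$ radosgw-admin lc list                                   # per-bucket rule status
$ radosgw-admin lc process --bucket <bucket-name>         # force one bucket's rules to run now
$ ceph config set client.rgw rgw_lc_debug_interval 60     # test clusters only: treat "days" as 60 seconds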

[ceph-users] RGW Lifecycle Problem (Reef)

2024-07-15 Thread Chris Palmer
Reef 18.2.2, package install on Centos 9. This is a very straightforward production cluster, 2 RGW hosts, no multisite. 4 buckets have lifecycle policies: $ radosgw-admin lc list [     {     "bucket": ":a:1d84db34-ed2b-400a-842e-344cdfa3deed.261076466.1",     "shard": "lc.3",   

[ceph-users] Re: Debian 12 (bookworm) / Reef 18.2.1 problems

2024-02-02 Thread Chris Palmer
Would a cephadm orchestrated docker/podman cluster be an acceptable workaround? We are running that config with reef containers on Debian 12 hosts, with a couple of Debian 12 clients successfully mounting cephfs mounts, using the reef client packages directly on Debian. On Fri, Feb 2, 2024, 8:21 AM Ch

[ceph-users] Re: Debian 12 (bookworm) / Reef 18.2.1 problems

2024-02-02 Thread Chris Palmer
steps.  Upgrade ceph first or upgrade debian first, then do the upgrade to the other one. Most of our infra is already upgraded to debian 12, except ceph. On 2024-01-29 07:27, Chris Palmer wrote: I have logged this as https://tracker.ceph.com/issues/64213 On 16/01/2024 14:18, DERUMIER,

[ceph-users] Re: Debian 12 (bookworm) / Reef 18.2.1 problems

2024-01-29 Thread Chris Palmer
I have logged this as https://tracker.ceph.com/issues/64213 On 16/01/2024 14:18, DERUMIER, Alexandre wrote: Hi, ImportError: PyO3 modules may only be initialized once per interpreter process and ceph -s reports "Module 'dashboard' has failed dependency: PyO3 modules may only be initialized on
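A mitigation check only, not a fix: a sketch assuming a package-based (non-cephadm) install where the mgr id is the short hostname:

$ systemctl restart ceph-mgr@$(hostname -s).service
$ ceph mgr module ls | grep -A 2 dashboard
$ ceph health detail                                      # see whether the PyO3 failed-dependency error returns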

[ceph-users] Re: Debian 12 (bookworm) / Reef 18.2.1 problems

2024-01-17 Thread Chris Palmer
On 17/01/2024 16:11, kefu chai wrote: On Tue, Jan 16, 2024 at 12:11 AM Chris Palmer wrote: Updates on both problems: Problem 1 -- The bookworm/reef cephadm package needs updating to accommodate the last change in /usr/share/doc/adduser/NEWS.Debian.gz

[ceph-users] Re: Debian 12 (bookworm) / Reef 18.2.1 problems

2024-01-15 Thread Chris Palmer
-bin/bugreport.cgi?bug=1055212 https://github.com/pyca/bcrypt/issues/694 On 12/01/2024 14:29, Chris Palmer wrote: More info on problem 2: When starting the dashboard, the mgr seems to try to initialise cephadm, which in turn uses python crypto libraries that lead to the python error: $ ceph

[ceph-users] Re: Debian 12 (bookworm) / Reef 18.2.1 problems

2024-01-12 Thread Chris Palmer
",     "os_name": "Debian GNU/Linux 12 (bookworm)",     "os_version": "12 (bookworm)",     "os_version_id": "12",     "process_name": "ceph-mgr",     "stack_sig": "7815ad73ced094695056319d1241bf7847da19b4

[ceph-users] Debian 12 (bookworm) / Reef 18.2.1 problems

2024-01-12 Thread Chris Palmer
I was delighted to see the native Debian 12 (bookworm) packages turn up in Reef 18.2.1. We currently run a number of ceph clusters on Debian 11 (bullseye) / Quincy 17.2.7. These are not cephadm-managed. I have attempted to upgrade a test cluster, and it is not going well. Quincy only supports

[ceph-users] Re: Debian 12 support

2023-11-13 Thread Chris Palmer
And another big +1 for debian12 reef from us. We're unable to upgrade to either debian12 or reef. I've been keeping an eye on the debian12 bug, and it looks as though it might be fixed if you start from the latest repo release. Thanks, Chris On 13/11/2023 07:43, Berger Wolfgang wrote: +1 for

[ceph-users] Re: Ceph Dashboard - Community News Sticker [Feedback]

2023-11-08 Thread Chris Palmer
My vote would be "no": * This is an operational high-criticality system. Not the right place to have distracting other stuff or to bloat the dashboard. * Our ceph systems deliberately don't have direct internet connectivity. * There is plenty of useful operational information that could fil

[ceph-users] Re: Ceph 16.2.14: pgmap updated every few seconds for no apparent reason

2023-10-13 Thread Chris Palmer
I have just checked 2 quincy 17.2.6 clusters, and I see exactly the same. The pgmap version is bumping every two seconds (which ties in with the frequency you observed). Both clusters are healthy with nothing apart from client IO happening. On 13/10/2023 12:09, Zakhar Kirpichenko wrote: Hi,

[ceph-users] Re: Please help collecting stats of Ceph monitor disk writes

2023-10-13 Thread Chris Palmer
Here is some data from a small, very lightly loaded cluster. It is manually deployed on Debian 11, with the mon store on an SSD: 1) iotop results: TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND 1923 be/4 ceph 0.00 B 104.00 K ?unavailable? ceph-mon -f -

[ceph-users] MGR Memory Leak in Restful

2023-09-08 Thread Chris Palmer
I first posted this on 17 April but did not get any response (although IIRC a number of other posts referred to it). Seeing as MGR OOM is being discussed at the moment I am re-posting. These clusters are not containerized. Is this being tracked/fixed or not? Thanks, Chris -

[ceph-users] Re: Debian/bullseye build for reef

2023-08-21 Thread Chris Palmer
wrote: There was difficulty building on bullseye due to the older version of GCC available: https://tracker.ceph.com/issues/61845 On Mon, Aug 21, 2023 at 3:01 AM Chris Palmer wrote: I'd like to try reef, but we are on debian 11 (bullseye). In the ceph repos, there is debian-

[ceph-users] Debian/bullseye build for reef

2023-08-21 Thread Chris Palmer
I'd like to try reef, but we are on debian 11 (bullseye). In the ceph repos, there is debian-quincy/bullseye and debian-quincy/focal, but under reef there is only focal & jammy. Is there a reason why there is no reef/bullseye build? I had thought that the blocker only affected debian-bookworm

[ceph-users] Re: v18.2.0 Reef released

2023-08-15 Thread Chris Palmer
I'd like to try reef, but we are on debian 11 (bullseye). In the ceph repos, there is debian-quincy/bullseye and debian-quincy/focal, but under reef there is only focal & jammy. Is there a reason why there is no reef/bullseye build? I had thought that the blocker only affected debian-bookworm

[ceph-users] Re: Unexpected behavior of directory mtime after being set explicitly

2023-05-25 Thread Chris Palmer
) Thanks for your prompt reply. Regards Sandip Divekar -Original Message- From: Chris Palmer Sent: Thursday, May 25, 2023 7:25 PM To:ceph-users@ceph.io Subject: [ceph-users] Re: Unexpected behavior of directory mtime after being set explicitly * EXTERNAL EMAIL * Hi

[ceph-users] Re: Unexpected behavior of directory mtime after being set explicitly

2023-05-25 Thread Chris Palmer
Hi Milind I just tried this using the ceph kernel client and ceph-common 17.2.6 package in the latest Fedora kernel, against Ceph 17.2.6 and it worked perfectly... There must be some other factor in play. Chris On 25/05/2023 13:04, Sandip Divekar wrote: Hello Milind, We are using Ceph Kernel
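A guess at a minimal reproduction of the behaviour being discussed, assuming a kernel-client mount at /mnt/cephfs; the directory name and timestamp are arbitrary:

$ cd /mnt/cephfs && mkdir mtime-test
$ touch -m -t 202301010000 mtime-test       # set the directory mtime explicitly
$ stat -c %y mtime-test
$ touch mtime-test/newfile                  # a new entry should bump the directory mtime again
$ stat -c %y mtime-test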

[ceph-users] Re: Slow recovery on Quincy

2023-05-17 Thread Chris Palmer
This is interesting, and it arrived minutes after I had replaced an HDD OSD (with NVMe DB/WAL) in a small cluster. With the three profiles I was only seeing objects/second of around 6-8 (high_client_ops), 9-12 (balanced), 12-15 (high_recovery_ops). There was only a very light client load. Wit
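For reference, a minimal sketch of switching mClock profiles, assuming the mClock scheduler is active (the Quincy default); revert by removing the override:

$ ceph config get osd osd_mclock_profile
$ ceph config set osd osd_mclock_profile high_recovery_ops
$ ceph config rm osd osd_mclock_profile                   # back to the default profile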

[ceph-users] CephFS Scrub Questions

2023-05-04 Thread Chris Palmer
Hi Grateful if someone could clarify some things about CephFS Scrubs: 1) Am I right that a command such as "ceph tell mds.cephfs:0 scrub start / recursive" only triggers a forward scrub (not a backward scrub)? 2) I couldn't find any reference to forward scrubs being done automatically and wa
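For reference, the command forms in question (the file system name "cephfs" is a placeholder):

$ ceph tell mds.cephfs:0 scrub start / recursive          # forward scrub from the root
$ ceph tell mds.cephfs:0 scrub start ~mdsdir recursive    # include the MDS-internal directory
$ ceph tell mds.cephfs:0 scrub status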

[ceph-users] Re: [ceph 17.2.6] unable to create rbd snapshots for images with erasure code data-pool

2023-04-17 Thread Chris Palmer
I've just tried this on 17.2.6 and it worked fine On 17/04/2023 12:57, Reto Gysi wrote: Dear Ceph Users, After upgrading from version 17.2.5 to 17.2.6 I no longer seem to be able to create snapshots of images that have an erasure coded datapool. root@zephir:~# rbd snap create ceph-dev@back
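For reference, a minimal sketch of the operation being tested, assuming an erasure-coded pool "ec-data" already exists; pool and image names are placeholders:

$ ceph osd pool set ec-data allow_ec_overwrites true
$ rbd create --size 10G --data-pool ec-data rbd/testimg
$ rbd snap create rbd/testimg@snap1
$ rbd snap ls rbd/testimg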

[ceph-users] MGR Memory Leak in Restful

2023-04-17 Thread Chris Palmer
We've hit a memory leak in the Manager Restful interface, in versions 17.2.5 & 17.2.6. On our main production cluster the active MGR grew to about 60G until the oom_reaper killed it, causing a successful failover and restart of the failed one. We can then see that the problem is recurring, actu

[ceph-users] Re: 17.2.6 Dashboard/RGW Signature Mismatch

2023-04-14 Thread Chris Palmer
mes. I guess you would hit that first, masking the other problem. Although cluster 2 should probably have been configured with FQDN hostnames I do still think this is a regression. The "rgw dns name" field should be honoured. Thanks, Chris On 13/04/2023 17:20, Chris Palmer wrote:

[ceph-users] Re: 17.2.6 Dashboard/RGW Signature Mismatch

2023-04-13 Thread Chris Palmer
On 13/04/2023 18:15, Gilles Mocellin wrote: I suspect the same origin of our problem in 16.2.11, see Tuesday's thread "[ceph-users] Pacific dashboard: unable to get RGW information". https://www.mail-archive.com/ceph-users%40ceph.io/msg19566.html Unfortunately I don't think it is the same pro

[ceph-users] 17.2.6 Dashboard/RGW Signature Mismatch

2023-04-13 Thread Chris Palmer
Hi I have 3 Ceph clusters, all configured similarly, which have been happy for some months on 17.2.5: 1. A test cluster 2. A small production cluster 3. A larger production cluster All are debian 11 built from packages - no cephadm. I upgraded (1) to 17.2.6 without any problems at all. In pa

[ceph-users] Re: Status of Quincy 17.2.5 ?

2022-10-20 Thread Chris Palmer
I do agree with Christian. I would like to see the Ceph repositories handled in a similar way to most others: * Testing or pre-release packages go into one (or more) testing repos * Production-ready packages go into the production repo I don't care about the minor mirror-synch delay. What I d

[ceph-users] Status of Quincy 17.2.5 ?

2022-10-19 Thread Chris Palmer
Hi I've noticed that packages for Quincy 17.2.5 appeared in the debian 11 repo a few days ago. However I haven't seen any mention of it anywhere, can't find any release notes, and the documentation still shows 17.2.4 as the latest version. Is 17.2.5 documented and ready for use yet? It's a b

[ceph-users] Re: Quincy recovery load

2022-07-12 Thread Chris Palmer
I've created tracker https://tracker.ceph.com/issues/56530 for this, including info on replicating it on another cluster. On 11/07/2022 17:41, Chris Palmer wrote: Correction - it is the Acting OSDs that are consuming CPU, not the UP ones On 11/07/2022 16:17, Chris Palmer wrote: I'

[ceph-users] Re: Quincy recovery load

2022-07-11 Thread Chris Palmer
Correction - it is the Acting OSDs that are consuming CPU, not the UP ones On 11/07/2022 16:17, Chris Palmer wrote: I'm seeing a similar problem on a small cluster just upgraded from Pacific 16.2.9 to Quincy 17.2.1 (non-cephadm). The cluster was only very lightly loaded during and afte

[ceph-users] Re: Quincy recovery load

2022-07-11 Thread Chris Palmer
I'm seeing a similar problem on a small cluster just upgraded from Pacific 16.2.9 to Quincy 17.2.1 (non-cephadm). The cluster was only very lightly loaded during and after the upgrade. The OSDs affected are all bluestore, HDD sharing NVMe DB/WAL, and all created on Pacific (I think). The upgra

[ceph-users] Quincy upgrade note - comments

2022-07-01 Thread Chris Palmer
I just upgraded a non-cephadm test cluster from Pacific 16.2.9 to Quincy 17.2.1. It all went very smoothly, but just a couple of comments about the upgrade note: * Steps 5.2, 5.3 & 5.5 - the required command is "ceph fs status", not "ceph status" * Step 5.1 correctly requires "allow_standb
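For reference, the commands the corrected steps refer to (the file system name is a placeholder):

$ ceph fs status                                      # not "ceph status"
$ ceph fs set <fs_name> allow_standby_replay false
$ ceph fs set <fs_name> max_mds 1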

[ceph-users] Re: MDS upgrade to Quincy

2022-04-21 Thread Chris Palmer
Hi Patrick Sorry, I misread it. Now it makes perfect sense. Sorry for the noise. Regards, Chris On 21/04/2022 14:28, Patrick Donnelly wrote: On Wed, Apr 20, 2022 at 8:29 AM Chris Palmer wrote: The Quincy release notes state that "MDS upgrades no longer require all standby MDS daemons

[ceph-users] Re: v17.2.0 Quincy released

2022-04-21 Thread Chris Palmer
Could you clarify this part of the Quincy release notes too please? https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/XMOSB4IFKVHSQT5MFSWXTQHY7FC5WDSQ/ On 20/04/2022 17:31, Stefan Kooman wrote: On 4/20/22 18:26, Patrick Donnelly wrote: On Wed, Apr 20, 2022 at 7:22 AM Stefan Kooma

[ceph-users] MDS upgrade to Quincy

2022-04-20 Thread Chris Palmer
The Quincy release notes state that "MDS upgrades no longer require all standby MDS daemons to be stopped before upgrading a file system's sole active MDS." but the "Upgrading non-cephadm clusters" instructions still include reducing ranks to 1, upgrading, then raising it again. Does the new f
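For reference, a sketch of the rank-reduction procedure the non-cephadm instructions still describe; "cephfs" and the rank count of 2 are placeholders:

$ ceph fs set cephfs max_mds 1        # single active MDS before upgrading
#   ... upgrade and restart the MDS daemons ...
$ ceph fs set cephfs max_mds 2        # restore the previous rank count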

[ceph-users] Re: How often should I scrub the filesystem ?

2022-03-17 Thread Chris Palmer
followed by a "scrub" without any issues, and if the "damage ls" still shows you an error, try running "damage rm" and re-run "scrub" to see if the system still reports damage. Please update the upstream tracker with your findings if possible. -- Milind
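For reference, the command forms for the steps described above (the file system name is a placeholder):

$ ceph tell mds.cephfs:0 damage ls                    # list recorded damage entries and their ids
$ ceph tell mds.cephfs:0 damage rm <damage-id>
$ ceph tell mds.cephfs:0 scrub start / recursive      # re-run the scrub and re-check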

[ceph-users] Re: How often should I scrub the filesystem ?

2022-03-12 Thread Chris Palmer
Ok, restarting mds.0 cleared it. I then restarted the others until this one was again active, and repeated the scrub ~mdsdir which was then clean. I don't know what caused it, or why restarting the MDS was necessary but it has done the trick. On 12/03/2022 19:14, Chris Palmer wrote

[ceph-users] Re: How often should I scrub the filesystem ?

2022-03-12 Thread Chris Palmer
Hi Milind (or anyone else who can help...) Reading this thread made me realise I had overlooked cephfs scrubbing, so I tried it on a small 16.2.7 cluster. The normal forward scrub showed nothing. However "ceph tell mds.0 scrub start ~mdsdir recursive" did find one backtrace error (putting the

[ceph-users] Mon crash - abort in RocksDB

2022-02-24 Thread Chris Palmer
We have a small Pacific 16.2.7 test cluster that has been ticking over for a couple of years with no problems whatever. The last "event" was 14 days ago when I was testing some OSD replacement procedures - nothing remarkable. At 0146 this morning though mon03 signalled an abort in the RocksDB

[ceph-users] Re: Upgrade 16.2.6 -> 16.2.7 - MON assertion failure

2021-12-09 Thread Chris Palmer
via config diff. -- dan On Thu, Dec 9, 2021 at 7:32 PM Chris Palmer wrote: Hi Yes, using ceph config is working fine for the rest of the nodes. Do you know if it is necessary/advisable to restart the MONs after removing the mon_mds_skip_sanity setting when the upgrade is complete? Thanks, Chris O

[ceph-users] Re: Upgrade 16.2.6 -> 16.2.7 - MON assertion failure

2021-12-09 Thread Chris Palmer
at 6:44 PM Chris Palmer wrote: Hi Dan & Patrick Setting that to true using "ceph config" didn't seem to work. I then deleted it from there and set it in ceph.conf on node1 and eventually after a reboot it started ok. I don't know for sure whether it failing using ceph c

[ceph-users] Re: Upgrade 16.2.6 -> 16.2.7 - MON assertion failure

2021-12-09 Thread Chris Palmer
ed to know? Thanks for your very fast responses! Chris On 09/12/2021 17:10, Dan van der Ster wrote: On Thu, Dec 9, 2021 at 5:40 PM Patrick Donnelly wrote: Hi Chris, On Thu, Dec 9, 2021 at 10:40 AM Chris Palmer wrote: Hi I've just started an upgrade of a test cluster from 16.2.6 -

[ceph-users] Re: Upgrade 16.2.6 -> 16.2.7 - MON assertion failure

2021-12-09 Thread Chris Palmer
`ceph fs dump` -- dan On Thu, Dec 9, 2021 at 4:40 PM Chris Palmer wrote: Hi I've just started an upgrade of a test cluster from 16.2.6 -> 16.2.7 and immediately hit a problem. The cluster started as octopus, and has upgraded through to 16.2.6 without any trouble. It is a conventional deplo

[ceph-users] Upgrade 16.2.6 -> 16.2.7 - MON assertion failure

2021-12-09 Thread Chris Palmer
Hi I've just started an upgrade of a test cluster from 16.2.6 -> 16.2.7 and immediately hit a problem. The cluster started as octopus, and has upgraded through to 16.2.6 without any trouble. It is a conventional deployment on Debian 10, NOT using cephadm. All was clean before the upgrade. It
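For reference, a sketch of the workaround that emerges later in this thread: set mon_mds_skip_sanity in ceph.conf on the MON being upgraded (setting it via `ceph config` reportedly did not take effect), then remove it and restart the MONs once all of them are on 16.2.7:

# ceph.conf on the monitor host, for the duration of the upgrade only:
[mon]
    mon_mds_skip_sanity = true

$ systemctl restart ceph-mon@<hostname>.service       # after removing the setting post-upgrade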

[ceph-users] Re: Multiple DNS names for RGW?

2021-08-17 Thread Chris Palmer
came through the proxy. In that case we didn't need the proxy to inject a Host header. Regards, Chris On 17/08/2021 09:47, Christian Rohmann wrote: Hey Burkhard, Chris, all, On 16/08/2021 10:48, Chris Palmer wrote: It's straightforward to add multiple DNS names to an endpoint. We d

[ceph-users] Re: Multiple DNS names for RGW?

2021-08-16 Thread Chris Palmer
It's straightforward to add multiple DNS names to an endpoint. We do this for the sort of reasons you suggest. You then don't need separate rgw instances (not for this reason anyway). Assuming default: * radosgw-admin zonegroup get > zg-default * Edit zg-default, changing "hostnames" to e.g.
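For reference, a sketch of how the remaining steps typically look; the hostnames and the RGW unit name are placeholders, and on a realm-less default setup the period commit may not be needed:

$ radosgw-admin zonegroup get > zg-default
#   edit zg-default:  "hostnames": ["s3.example.com", "s3.internal.example.com"]
$ radosgw-admin zonegroup set < zg-default
$ radosgw-admin period update --commit
$ systemctl restart ceph-radosgw@<instance>.service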

[ceph-users] Re: Pacific 16.2.5 Dashboard minor regression

2021-07-22 Thread Chris Palmer
3/ <https://github.com/ceph/ceph/pull/41483/> May I suggest you use a SAN with the IP address(es) in your certificate? Kind Regards, Ernesto On Thu, Jul 22, 2021 at 3:45 PM Chris Palmer <mailto:chris.pal...@idnet.com>> wrote: Since updating from Pacific 16.2.4 -> 16.

[ceph-users] Pacific 16.2.5 Dashboard minor regression

2021-07-22 Thread Chris Palmer
Since updating from Pacific 16.2.4 -> 16.2.5 I've noticed a behaviour change in the Dashboard. If I connect to the active MGR, it is fine. However if I connect to a standby MGR, it redirects to the active one by placing the active IP address in the URL, rather than the active hostname as it use

[ceph-users] Re: Pacific: RadosGW crashing on multipart uploads.

2021-06-30 Thread Chris Palmer
Hi Vincent It may be Bug 50556. I am having this problem, although I don't think that the characters in the bucket name are relevant. Backport 51001 has just been updated, so it looks as though it will be in 16.2.5. At a

[ceph-users] Likely date for Pacific backport for RGW fix?

2021-06-16 Thread Chris Palmer
Hi Our first upgrade (non-cephadm) from Octopus to Pacific 16.2.4 went very smoothly. Thanks for all the effort. The only thing that has bitten us is https://tracker.ceph.com/issues/50556 which prevents a multipart upload to an RGW bucket that has a b

[ceph-users] Importance of bluefs fix in Octopus 15.2.10 ?

2021-03-19 Thread Chris Palmer
When looking over the changelog for 15.2.10 I noticed some bluefs changes. One in particular caught my eye, and it was called out as a notable change: os/bluestore: fix huge reads/writes at BlueFS (pr#39701, Jianpeng Ma, Igor Fedotov) It wasn't ob

[ceph-users] Re: multiple-domain for S3 on rgws with same ceph backend on one zone

2021-02-22 Thread Chris Palmer
I'm not sure that the tenant solution is what the OP wants - my reading is that running under a different tenant allows you to have different tenants use the same bucket and user names but still be distinct, which wasn't what I thought was meant. You can however get RGW to accept a list of host n

[ceph-users] Re: Debian repo for ceph-iscsi

2020-12-22 Thread Chris Palmer
Pulling the package python3-rtslib-fb_2.1.71-3_all.deb from bullseye and manually installing it on buster seems to have done the trick. On 22/12/2020 13:20, Chris Palmer wrote: Hi Joachim Thanks for that pointer. I've pulled ceph-iscsi from there and and trying to get things going n

[ceph-users] Re: Debian repo for ceph-iscsi

2020-12-22 Thread Chris Palmer
Hi Joachim Thanks for that pointer. I've pulled ceph-iscsi from there and am trying to get things going now on buster. The problem I have at the moment though is with python3-rtslib-fb. That hasn't been backported to buster, and the latest in the main buster repo is 2.1.66, but ceph-iscsi r

[ceph-users] Debian repo for ceph-iscsi

2020-12-11 Thread Chris Palmer
I just went to setup an iscsi gateway on a Debian Buster / Octopus cluster and hit a brick wall with packages. I had perhaps naively assumed they were in with the rest. Now I understand that it can exist separately, but then so can RGW. I found some ceph-iscsi rpm builds for Centos, but nothin

[ceph-users] Re: Can you block gmail.com or so!!!

2020-08-07 Thread Chris Palmer
While you are thinking about the mailing list configuration, can you consider that it is very DMARC-unfriendly, which is why I have to use an email address from an ISP domain that does not publish DMARC. If I post from my normal email accounts: * We publish SPF, DKIM & DMARC policies that req

[ceph-users] Re: RGW versioned objects lost after Octopus 15.2.3 -> 15.2.4 upgrade

2020-08-05 Thread Chris Palmer
somewhat different: https://clouddocs.web.cern.ch/object_store/s3cmd.html#object-expiration-with-s3cmd .. Dan On Wed, 5 Aug 2020, 14:52 Chris Palmer, <mailto:ch...@cpalmer.eclipse.co.uk>> wrote: This is starting to look like a regression error in Octopus 15.2.4. After cleaning thin

[ceph-users] Re: RGW versioned objects lost after Octopus 15.2.3 -> 15.2.4 upgrade

2020-08-05 Thread Chris Palmer
18 days after which all non-current versions get deleted (on 15.2.4). Anyone come across versioning problems on 15.2.4? Thanks, Chris On 17/07/2020 09:11, Chris Palmer wrote: This got worse this morning. An RGW daemon crashed at midnight with a segfault, and the backtrace hints that it

[ceph-users] Re: Not able to access radosgw S3 bucket creation with AWS java SDK. Caused by: java.net.UnknownHostException: issue.

2020-07-29 Thread Chris Palmer
This works for me (the code switches between AWS and RGW according to whether s3Endpoint is set). You need the pathStyleAccess unless you have wildcard DNS names etc. String s3Endpoint = "http://my.host:80"; AmazonS3ClientBuilder s3b = AmazonS3ClientBuilder.standard()

[ceph-users] Re: RGW versioned objects lost after Octopus 15.2.3 -> 15.2.4 upgrade

2020-07-17 Thread Chris Palmer
with a similar policy and test that in parallel. We will see what happens tomorrow. Thanks, Chris On 16/07/2020 08:22, Chris Palmer wrote: I have an RGW bucket (backups) that is versioned. A nightly job creates a new version of a few objects. There is a lifecycle policy (see below) that keeps 1

[ceph-users] RGW versioned objects lost after Octopus 15.2.3 -> 15.2.4 upgrade

2020-07-16 Thread Chris Palmer
I have an RGW bucket (backups) that is versioned. A nightly job creates a new version of a few objects. There is a lifecycle policy (see below) that keeps 18 days of versions. This has been working perfectly and has not been changed. Until I upgraded Octopus... The nightly job creates separate

[ceph-users] RGW multi-object delete failing with 403 denied

2020-07-11 Thread Chris Palmer
Hi An RGW access denied problem that I can't get anywhere with... * Bucket mybucket owned by user "c" * Bucket policy grants s3:listBucket on mybucket, and s3:putObject & s3:deleteObject on mybucket/* to user "j", and s3:getObject to * (I even granted s3:* on mybucket/* to "j" with no ef
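For reference, a sketch of issuing the same multi-object delete (POST ?delete) from the AWS CLI, which makes it easy to compare against single-object deletes; the endpoint, bucket and keys are placeholders:

$ aws --endpoint-url https://rgw.example.com s3api delete-objects \
      --bucket mybucket \
      --delete '{"Objects":[{"Key":"obj1"},{"Key":"obj2"}],"Quiet":true}'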

[ceph-users] Re: CEPH failure domain - power considerations

2020-05-28 Thread Chris Palmer
Immediate thought: Forget about crush maps, osds, etc. If you lose half the nodes (when one power rail fails) your MONs will lose quorum. I don't see how you can win with that configuration... On 28/05/2020 13:18, Phil Regnauld wrote: Hi, in our production cluster, we have the following setup

[ceph-users] Re: [External Email] Re: Ceph Nautius not working after setting MTU 9000

2020-05-27 Thread Chris Palmer
To elaborate on some aspects that have been mentioned already and add some others: * Test using iperf3. * Don't try to use jumbos on networks where you don't have complete control over every host. This usually includes the main ceph network. It's just too much grief. You can consider us
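For reference, a minimal sketch of the kind of checks meant here; hostnames are placeholders:

$ iperf3 -s                           # on one host
$ iperf3 -c <other-host>              # on the other; compare results with and without jumbo frames
$ ping -M do -s 8972 <other-host>     # verifies a 9000-byte MTU end to end (8972 payload + 28 header bytes)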

[ceph-users] Re: 15.2.2 Upgrade - Corruption: error in middle of record

2020-05-23 Thread Chris Palmer
again, with bluefs_preextend_wal_files=false until it's deemed safe to re-enable it. Many thanks Igor! Regards, Chris On 23/05/2020 11:06, Chris Palmer wrote: > Hi Ashley > > Setting bluefs_preextend_wal_files to false should stop any further > corruption of the WAL

[ceph-users] Re: 15.2.2 Upgrade - Corruption: error in middle of record

2020-05-23 Thread Chris Palmer
with bluefs_preextend_wal_files=false until it's deemed safe to re-enable it. Many thanks Igor! Regards, Chris On 23/05/2020 11:06, Chris Palmer wrote: Hi Ashley Setting bluefs_preextend_wal_files to false should stop any further corruption of the WAL (subject to the small risk of doing this while the OS
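For reference, a sketch of applying the mitigation discussed in this thread on 15.2.2; whether affected OSDs also need repair or rebuild is covered in the rest of the thread:

$ ceph config set osd bluefs_preextend_wal_files false
$ ceph config get osd bluefs_preextend_wal_files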

[ceph-users] Re: 15.2.2 Upgrade - Corruption: error in middle of record

2020-05-23 Thread Chris Palmer
just rebuilding the OSDs that failed? Or are you going through and rebuilding every OSD, even the working ones? Or does setting the bluefs_preextend_wal_files value to false and leaving the OSD running fix the WAL automatically? Thanks On Sat, 23 May 2020 15:53:42 +0800 *Chris

[ceph-users] Re: 15.2.2 Upgrade - Corruption: error in middle of record

2020-05-23 Thread Chris Palmer
same... > Thanks, Igor > On 5/20/2020 5:24 PM, Igor Fedotov wrote: >> Chris, got them, thanks! Investigating. Thanks,

[ceph-users] Re: 15.2.2 Upgrade - Corruption: error in middle of record

2020-05-20 Thread Chris Palmer
bluefs to 20 and collect another one for failed OSD startup. Thanks, Igor On 5/20/2020 4:39 PM, Chris Palmer wrote: I'm getting similar errors after rebooting a node. Cluster was upgraded 15.2.1 -> 15.2.2 yesterday. No problems after rebooting during upgrade. On the node I just r

[ceph-users] Re: 15.2.2 Upgrade - Corruption: error in middle of record

2020-05-20 Thread Chris Palmer
I'm getting similar errors after rebooting a node. Cluster was upgraded 15.2.1 -> 15.2.2 yesterday. No problems after rebooting during upgrade. On the node I just rebooted, 2/4 OSDs won't restart. Similar logs from both. Logs from one below. Neither OSD has compression enabled, although there