[ceph-users] Re: Erasure coded pool chunk count k

2021-10-04 Thread Anthony D'Atri
The larger the value of K relative to M, the more efficient the raw :: usable ratio ends up. There are tradeoffs and caveats. Here are some of my thoughts; if I’m off-base here, I welcome enlightenment. When possible, it’s ideal to have at least K+M failure domains — often racks,
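As a rough illustration (the k/m pairs below are examples, not taken from Anthony's mail): an EC pool consumes (k+m)/k bytes of raw capacity per usable byte, so
  k=2, m=2  ->  4/2 = 2.00x raw per usable byte (same overhead as 2x replication)
  k=4, m=2  ->  6/4 = 1.50x
  k=8, m=3  -> 11/8 = 1.375x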

[ceph-users] Re: Daemon Version Mismatch (But Not Really?) After Deleting/Recreating OSDs

2021-10-04 Thread Gregory Farnum
On Mon, Oct 4, 2021 at 12:05 PM Edward R Huyer wrote: > > Apparently the default value for container_image in the cluster configuration > is "docker.io/ceph/daemon-base:latest-pacific-devel". I don't know where > that came from. I didn't set it anywhere. I'm not allowed to edit it, > either

[ceph-users] Re: MDS: corrupted header/values: decode past end of struct encoding: Malformed input

2021-10-04 Thread Stefan Kooman
On 10/4/21 14:19, von Hoesslin, Volker wrote:   -7598> 2021-10-04T11:27:17.438+0200 7f529998c700 -1 mds.0.openfiles _load_finish: corrupted header/values: void Anchor::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end  of struct encoding: Malformed input ^^ openfiles

[ceph-users] Re: [External Email] Re: ceph-objectstore-tool core dump

2021-10-04 Thread Dave Hall
I also had a delay on the start of the repair scrub when I was dealing with this issue. I ultimately increased the number of simultaneous scrubs, but I think you could also temporarily disable scrubs and then re-issue the 'pg repair'. (But I'm not one of the experts on this.) My perception is
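For concreteness, a sketch of both options using standard ceph CLI commands (the pg id below is a hypothetical placeholder):
  ceph config set osd osd_max_scrubs 2   # allow more concurrent scrubs per OSD (default 1)
  # ...or temporarily pause regular scrubbing so the repair gets scheduled sooner:
  ceph osd set noscrub
  ceph osd set nodeep-scrub
  ceph pg repair 2.1f                    # hypothetical pg id
  ceph osd unset noscrub
  ceph osd unset nodeep-scrub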

[ceph-users] Adopting "unmanaged" OSDs into OSD service specification

2021-10-04 Thread David Orman
We have an older cluster which has been iterated on many times. It's always been cephadm deployed, but I am certain the OSD specification used has changed over time. I believe at some point, it may have been 'rm'd. So here's our current state: root@ceph02:/# ceph orch ls osd --export
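A sketch of how the current spec can be dumped, edited, and re-applied (the file name is illustrative, not from the original mail):
  ceph orch ls osd --export > osd-spec.yaml   # dump the OSD service spec(s) cephadm currently knows about
  # edit osd-spec.yaml to describe the devices that should be managed, then re-apply it:
  ceph orch apply -i osd-spec.yaml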

[ceph-users] Re: Daemon Version Mismatch (But Not Really?) After Deleting/Recreating OSDs

2021-10-04 Thread Edward R Huyer
Apparently the default value for container_image in the cluster configuration is "docker.io/ceph/daemon-base:latest-pacific-devel". I don't know where that came from. I didn't set it anywhere. I'm not allowed to edit it, either (from the dashboard, anyway). The container_image_base for the
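If the dashboard refuses to edit it, the CLI can usually still inspect or override the option; a sketch (the image tag is only an example, not a recommendation):
  ceph config get mgr container_image                               # see what is currently in effect
  ceph config set global container_image quay.io/ceph/ceph:v16.2.6  # example tag
  # or remove the override and fall back to the built-in default:
  ceph config rm global container_image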

[ceph-users] Re: Can't join new mon - lossy channel, failing

2021-10-04 Thread Stefan Kooman
On 10/4/21 15:58, Konstantin Shalygin wrote: On 4 Oct 2021, at 16:38, Stefan Kooman > wrote: What procedure are you following to add the mon? # ceph mon dump epoch 10 fsid 677f4be1-cd98-496d-8b50-1f99df0df670 last_changed 2021-09-11 10:04:23.890922 created 2018-05-18

[ceph-users] Re: Daemon Version Mismatch (But Not Really?) After Deleting/Recreating OSDs

2021-10-04 Thread Gregory Farnum
On Mon, Oct 4, 2021 at 7:57 AM Edward R Huyer wrote: > > Over the summer, I upgraded my cluster from Nautilus to Pacific, and > converted to use cephadm after doing so. Over the past couple weeks, I've > been converting my OSDs to use NVMe drives for db+wal storage. Schedule a > node's worth

[ceph-users] Re: Can't join new mon - lossy channel, failing

2021-10-04 Thread Stefan Kooman
On 10/4/21 15:27, Konstantin Shalygin wrote: Hi, I ran mkfs for a new mon, but the mon is stuck on probing. In the debug output I see: fault on lossy channel, failing. Is this a bad (lossy) network (crc mismatch)? What procedure are you following to add the mon? Is this physical hardware? Or a (cloned)

[ceph-users] Re: osd marked down

2021-10-04 Thread Abdelillah Asraoui
It is in the rook-ceph tools container: rook-ceph-tools-5b5bfc786-nptecv /]# ls -l /var/lib/ceph/osd/ceph-3/keyring -rwxrwxrwx. 1 ceph ceph 171 Oct 4 16:56 /var/lib/ceph/osd/ceph-3/keyring thanks! On Mon, Oct 4, 2021 at 12:16 PM Eugen Block wrote: > Did you put that file in the container? > > >

[ceph-users] Re: osd marked down

2021-10-04 Thread Eugen Block
Did you put that file in the container? Quoting Abdelillah Asraoui: I have created the keyring file /var/lib/ceph/osd/ceph-3/keyring and chowned it to ceph, but am still getting these errors in the OSD pod log: k -n rook-ceph logs rook-ceph-osd-3-6497bdc65b-5cvx3 debug

[ceph-users] Re: [External Email] Re: ceph-objectstore-tool core dump

2021-10-04 Thread Michael Thomas
On 10/4/21 11:57 AM, Dave Hall wrote: > I also had a delay on the start of the repair scrub when I was dealing with > this issue. I ultimately increased the number of simultaneous scrubs, but > I think you could also temporarily disable scrubs and then re-issue the 'pg > repair'. (But I'm not

[ceph-users] Re: osd marked down

2021-10-04 Thread Abdelillah Asraoui
I have created the keyring file /var/lib/ceph/osd/ceph-3/keyring and chowned it to ceph, but am still getting these errors in the OSD pod log: k -n rook-ceph logs rook-ceph-osd-3-6497bdc65b-5cvx3 debug 2021-10-04T16:06:38.287+ 7f8633cc1f00 -1 auth: unable to find a keyring on
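One way a missing OSD keyring file is usually regenerated is by exporting it from the cluster's auth database; a sketch (whether this is the right fix for a Rook-managed OSD, which normally provisions keyrings itself, is a separate question):
  ceph auth get osd.3 -o /var/lib/ceph/osd/ceph-3/keyring   # export the stored key for osd.3
  chown ceph:ceph /var/lib/ceph/osd/ceph-3/keyring
  chmod 600 /var/lib/ceph/osd/ceph-3/keyring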

[ceph-users] Re: MDS: corrupted header/values: decode past end of struct encoding: Malformed input

2021-10-04 Thread Stefan Kooman
On 10/4/21 12:10, von Hoesslin, Volker wrote: Here (see attachment) is a more verbose log output, perhaps someone can see more than I have already mentioned. -7633> 2021-10-04T11:27:17.434+0200 7f529f998700 4 mds.0.purge_queue operator(): data pool 4 not found in OSDMap ^^ What is this

[ceph-users] Re: Multisite reshard stale instances

2021-10-04 Thread Christian Rohmann
On 04/10/2021 12:22, Christian Rohmann wrote: So there is no reason those instances are still kept? How and when are those instances cleared up? Also just like for the other reporters of this issue, in my case most buckets are deleted buckets, but not all of them. I just hope somebody with a
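For reference, radosgw-admin does ship commands for inspecting (and, on single-site clusters, cleaning up) stale instances; the docs advise against the rm variant on multisite setups, which is presumably part of why the question stands:
  radosgw-admin reshard stale-instances list   # show bucket instances the cluster considers stale
  radosgw-admin reshard stale-instances rm     # cleanup; documented as not for multisite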

[ceph-users] Re: Erasure coded pool chunk count k

2021-10-04 Thread Etienne Menguy
Hi, It depends on hardware, failure domain, use case, overhead. I don’t see an easy way to choose k and m values. - Etienne Menguy etienne.men...@croit.io > On 4 Oct 2021, at 16:57, Golasowski Martin wrote: > > Hello guys, > how does one estimate number of chunks for erasure coded pool ( k =

[ceph-users] Re: MDS: corrupted header/values: decode past end of struct encoding: Malformed input

2021-10-04 Thread Stefan Kooman
On 10/4/21 12:10, von Hoesslin, Volker wrote: Here (see attachment) is a more verbose log output, perhaps someone can see more than I have already mentioned. Just checking. Have you done "ceph osd require-osd-release pacific" after upgrading to pacific? (make sure all your OSDs are upgraded
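A quick way to check that flag, and to set it once every OSD really is running pacific (standard commands, not from the original mail):
  ceph osd dump | grep require_osd_release    # what the cluster currently requires
  ceph versions                               # confirm every OSD is actually on pacific first
  ceph osd require-osd-release pacific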

[ceph-users] Re: Can't join new mon - lossy channel, failing

2021-10-04 Thread Konstantin Shalygin
This cluster doesn't use cephx; the ceph.conf global settings disable it. k Sent from my iPhone > On 4 Oct 2021, at 17:46, Stefan Kooman wrote: > > I'm missing the part where the keyring is downloaded and used: > > ceph auth get mon. -o /tmp/keyring > ceph mon getmap -o /tmp/monmap > chown -R

[ceph-users] Re: Can't join new mon - lossy channel, failing

2021-10-04 Thread Konstantin Shalygin
After this I see only logs to stderr; what exactly should I be looking for? Some grep keyword? k Sent from my iPhone > On 4 Oct 2021, at 17:37, Vladimir Bashkirtsev > wrote: > > I guess: > > strace ceph-mon -d --id mon2 --setuser ceph --setgroup ceph > > should do. > > > > Try -f

[ceph-users] Erasure coded pool chunk count k

2021-10-04 Thread Golasowski Martin
Hello guys, how does one estimate the number of data chunks for an erasure coded pool (k = ?)? I see that the number of m chunks determines the pool’s resiliency; however, I did not find a clear guideline on how to determine k. Red Hat states that they support only the following combinations: k=8, m=3 k=8, m=4
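By way of illustration only (the profile and pool names below are placeholders), this is how one of the supported combinations would be turned into a pool:
  ceph osd erasure-code-profile set ec83profile k=8 m=3 crush-failure-domain=host
  ceph osd erasure-code-profile get ec83profile            # double-check before creating the pool
  ceph osd pool create ecpool 128 128 erasure ec83profile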

[ceph-users] Daemon Version Mismatch (But Not Really?) After Deleting/Recreating OSDs

2021-10-04 Thread Edward R Huyer
Over the summer, I upgraded my cluster from Nautilus to Pacific, and converted to use cephadm after doing so. Over the past couple weeks, I've been converting my OSDs to use NVMe drives for db+wal storage. Schedule a node's worth of OSDs to be removed, wait for that to happen, delete the PVs

[ceph-users] Re: Can't join new mon - lossy channel, failing

2021-10-04 Thread Vladimir Bashkirtsev
I guess: strace ceph-mon -d --id mon2 --setuser ceph --setgroup ceph should do. Try -f instead of -d if you are overwhelmed with output to get mon debug output to log file. Regards, Vladimir On 5/10/21 01:27, Konstantin Shalygin wrote: On 4 Oct 2021, at 17:07, Vladimir Bashkirtsev
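Alongside strace, turning up the monitor and messenger debug levels on the command line tends to show why the peer drops the connection; a sketch (the grep keywords are just guesses at what to look for):
  ceph-mon -d --id mon2 --setuser ceph --setgroup ceph --debug-mon 20 --debug-ms 5 2>&1 | tee /tmp/mon2.log
  grep -E 'cephx|preamble|fault' /tmp/mon2.log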

[ceph-users] Re: Can't join new mon - lossy channel, failing

2021-10-04 Thread Konstantin Shalygin
> On 4 Oct 2021, at 17:07, Vladimir Bashkirtsev > wrote: > > This line bothers me: > > [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e4 > 0x560287e56000 crc :-1 s=READY pgs=16872 cs=0 l=1 rev1=1 rx=0 > tx=0).handle_read_frame_preamble_main read frame preamble

[ceph-users] Re: Can't join new mon - lossy channel, failing

2021-10-04 Thread Vladimir Bashkirtsev
This line bothers me: [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e4 0x560287e56000 crc :-1 s=READY pgs=16872 cs=0 l=1 rev1=1 rx=0 tx=0).handle_read_frame_preamble_main read frame preamble failed r=-1 ((1) Operation not permitted) May be it is good idea to run mon

[ceph-users] Re: Can't join new mon - lossy channel, failing

2021-10-04 Thread Konstantin Shalygin
> On 4 Oct 2021, at 16:38, Stefan Kooman wrote: > > What procedure are you following to add the mon? # ceph mon dump epoch 10 fsid 677f4be1-cd98-496d-8b50-1f99df0df670 last_changed 2021-09-11 10:04:23.890922 created 2018-05-18 20:43:43.260897 min_mon_release 14 (nautilus) 0:
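The manual add-a-mon sequence from the Ceph docs looks roughly like this (mon2 taken from the thread; adjust paths as needed, and the auth step is moot if cephx is disabled):
  ceph auth get mon. -o /tmp/keyring
  ceph mon getmap -o /tmp/monmap
  ceph-mon --mkfs -i mon2 --monmap /tmp/monmap --keyring /tmp/keyring
  chown -R ceph:ceph /var/lib/ceph/mon/ceph-mon2
  systemctl start ceph-mon@mon2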

[ceph-users] Re: nfs and showmount

2021-10-04 Thread Daniel Gryniewicz
showmount uses the MNT protocol, which is only part of NFSv3. NFSv4 mounts a pseudoroot, under which actual exports are exposed, so the NFSv4 equivalent is to mount /, and then list it. In general, NFSv4 should be used in preference to NFSv3 whenever possible. Daniel On 10/4/21 9:10 AM,
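Roughly, from a client (the server name and mount point are placeholders):
  showmount -e nfs-server           # NFSv3 MNT protocol; not available from a v4-only server
  mount -t nfs4 nfs-server:/ /mnt   # NFSv4: mount the pseudoroot instead...
  ls /mnt                           # ...and the exports appear as directories under it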

[ceph-users] Can't join new mon - lossy channel, failing

2021-10-04 Thread Konstantin Shalygin
Hi, I ran mkfs for a new mon, but the mon is stuck on probing. In the debug output I see: fault on lossy channel, failing. Is this a bad (lossy) network (crc mismatch)? 2021-10-04 16:22:24.707 7f5952761700 10 mon.mon2@-1(probing) e10 probing other monitors 2021-10-04 16:22:24.707 7f5952761700 1 --

[ceph-users] Re: nfs and showmount

2021-10-04 Thread Fyodor Ustinov
Hi! Yes, you're right, Ganesha does. But Ceph doesn't use all of Ganesha's functionality. In the Ceph dashboard there is no way to enable NFSv3, only NFSv4. - Original Message - > From: "Marc" > To: "Fyodor Ustinov" > Cc: "ceph-users" > Sent: Monday, 4 October, 2021 15:33:43 > Subject:

[ceph-users] Re: nfs and showmount

2021-10-04 Thread Marc
Afaik ceph uses nfs-ganesha, and Ganesha supports NFS 3 and 4 and other protocols. > Hi! > > I think ceph only supports nfs4? > > > - Original Message - > > Sent: Monday, 4 October, 2021 12:44:38 > > Subject: RE: nfs and showmount > > > I can remember asking the same some time ago. I

[ceph-users] Re: MDS: corrupted header/values: decode past end of struct encoding: Malformed input

2021-10-04 Thread von Hoesslin, Volker
From: Stefan Kooman Sent: Monday, 4 October 2021 13:24 To: von Hoesslin, Volker; ceph-users@ceph.io Subject: [ceph-users] Re: MDS: corrupted header/values: decode past end of struct encoding: Malformed input External e-mail! Only open links or

[ceph-users] Re: MDS: corrupted header/values: decode past end of struct encoding: Malformed input

2021-10-04 Thread von Hoesslin, Volker
From: Stefan Kooman Sent: Monday, 4 October 2021 13:57:45 To: von Hoesslin, Volker; ceph-users@ceph.io Subject: [URL was modified] [ceph-users] Re: MDS: corrupted header/values: decode past end of struct encoding: Malformed input External e-mail! Only open

[ceph-users] Re: MDS not becoming active after migrating to cephadm

2021-10-04 Thread 胡 玮文
By saying upgrade, I mean upgrading from the non-dockerized 16.2.5 to the cephadm version 16.2.6. So I think you need to disable standby-replay and reduce the number of ranks to 1, then stop all the non-dockerized mds and deploy new mds with cephadm. Only scale back up after you finish the migration.

[ceph-users] Re: Tool to cancel pending backfills

2021-10-04 Thread Peter Lieven
On 01.10.21 16:52, Josh Baergen wrote: Hi Peter, When I check for circles I found that running the upmap balancer alone never seems to create any kind of circle in the graph. By a circle, do you mean something like this? pg 1.a: 1->2 (upmap to put a chunk on 2 instead of 1) pg 1.b: 2->3
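For readers following along, upmap exceptions of that shape are created and removed with (pg ids taken from the quoted example; the osd ids are illustrative):
  ceph osd pg-upmap-items 1.a 1 2   # remap pg 1.a from osd.1 to osd.2
  ceph osd pg-upmap-items 1.b 2 3
  ceph osd rm-pg-upmap-items 1.a    # drop the exception again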

[ceph-users] Re: MDS not becoming active after migrating to cephadm

2021-10-04 Thread Petr Belyaev
Hi Weiwen, Yes, we did that during the upgrade. In fact, we did that multiple times even after the upgrade to see if it will resolve the issue (disabling hot standby, scaling everything down to a single MDS, swapping it with the new one, scaling back up). The upgrade itself went fine,

[ceph-users] Re: MDS not becoming active after migrating to cephadm

2021-10-04 Thread 胡 玮文
Hi Petr, Please read https://docs.ceph.com/en/latest/cephfs/upgrading/ for the MDS upgrade procedure. In short, when upgrading to 16.2.6, you need to disable standby-replay and reduce the number of ranks to 1. Weiwen Hu Sent from Mail for Windows From: Petr
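As a sketch of those two steps (the filesystem name is a placeholder):
  ceph fs set cephfs allow_standby_replay false   # 'cephfs' is a placeholder fs name
  ceph fs set cephfs max_mds 1
  # wait until only one rank is active before stopping the standby daemons and upgrading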

[ceph-users] Re: Multisite reshard stale instances

2021-10-04 Thread Christian Rohmann
Hey there again, On 01/10/2021 17:35, Szabo, Istvan (Agoda) wrote: In my setup I've disabled the sharding and preshard each bucket which needs more than 1.1 million objects. I also use 11 shards as default, see my ML post

[ceph-users] Re: nfs and showmount

2021-10-04 Thread Fyodor Ustinov
Hi! I think ceph only supports nfs4? - Original Message - > From: "Marc" > To: "Fyodor Ustinov" , "ceph-users" > Sent: Monday, 4 October, 2021 12:44:38 > Subject: RE: nfs and showmount > I can remember asking the same some time ago. I think it has to do with the > version of nfs you

[ceph-users] MDS not becoming active after migrating to cephadm

2021-10-04 Thread Petr Belyaev
Hi, We’ve recently upgraded from Nautilus to Pacific, and tried moving our services to cephadm/ceph orch. For some reason, MDS nodes deployed through orch never become active (or at least standby-replay). Non-dockerized MDS nodes can still be deployed and work fine. Non-dockerized mds version

[ceph-users] Re: nfs and showmount

2021-10-04 Thread Marc
I can remember asking the same some time ago. I think it has to do with the version of nfs you are using. > -Original Message- > From: Fyodor Ustinov > Sent: Monday, 4 October 2021 11:32 > To: ceph-users > Subject: [ceph-users] nfs and showmount > > Hi! > > As I understand it - the

[ceph-users] nfs and showmount

2021-10-04 Thread Fyodor Ustinov
Hi! As I understand it - the built-in NFS server does not support the command "showmount -e"? WBR, Fyodor. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: MDS: corrupted header/values: decode past end of struct encoding: Malformed input

2021-10-04 Thread Stefan Kooman
On 10/4/21 11:09, von Hoesslin, Volker wrote: hmmm... more and more PGs are broken: :/ At the risk of making a fool of myself: how do I check what data is in a PG? https://docs.ceph.com/en/pacific/man/8/ceph-objectstore-tool/ I have already done a backup at the beginning using
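With that tool, and with the OSD in question stopped, listing a PG's objects looks roughly like this (the osd id and pg id are placeholders):
  systemctl stop ceph-osd@12
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --pgid 2.1f --op list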

[ceph-users] Re: MDS: corrupted header/values: decode past end of struct encoding: Malformed input

2021-10-04 Thread von Hoesslin, Volker
hmmm... more and more PGs are broken: # ceph health detail HEALTH_ERR 1 filesystem is degraded; 1 filesystem has a failed mds daemon; 1 filesystem is offline; insufficient standby MDS daemons available; 46 scrub errors; Possible data damage: 32 pgs inconsistent; 2625 daemons have recently