[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-01 Thread Szabo, Istvan (Agoda)
Dear Igor, Is the ceph-volume lvm migrate command smart enough in octopus 15.2.14 to be able to remove the db (including the WAL) from the nvme even if it has spilled over? I can't compact many disks back to normal so they stop showing the spillover warning. I think Christian has the truth of the issue, my

[ceph-users] Re: osd_memory_target=level0 ?

2021-10-01 Thread Christian Wuerdig
I don't have much experience in recovering struggling EC pools, unfortunately. It looks like it can't find OSDs for 2 out of the 6 shards. Since you run EC 4+2 the data isn't lost, but I'm not 100% sure how to make it healthy. There was a thread a while back that had a similar issue, albeit possibly

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-01 Thread Igor Fedotov
Hi Istvan, yes, migrating both db and wal to the slow device is supported, and the spillover state isn't a show stopper for that. On 10/2/2021 1:16 AM, Szabo, Istvan (Agoda) wrote: Dear Igor, Is the ceph-volume lvm migrate command smart enough in octopus 15.2.14 to be able to remove the db (included
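
A minimal sketch of that migration, assuming a non-cephadm deployment (the OSD id, fsid and LV names below are placeholders, not values from this thread); the OSD has to be stopped first:

  systemctl stop ceph-osd@6
  ceph-volume lvm migrate --osd-id 6 --osd-fsid <osd-fsid> --from db wal --target <data-vg>/<data-lv>
  systemctl start ceph-osd@6

The --target here is the OSD's own slow/data LV, which is what moves the db+wal off the NVMe.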

[ceph-users] Re: Multisite reshard stale instances

2021-10-01 Thread Szabo, Istvan (Agoda)
In my setup I've disabled dynamic resharding and preshard each bucket which needs more than 1.1 million objects. I don't think it's possible to clean up; even if you run the command with the really-really-mean-it flag, it will not do anything, I've tried already. Istvan Szabo Senior Infrastructure

[ceph-users] Re: Trying to understand what overlapped roots means in pg_autoscale's scale-down mode

2021-10-01 Thread Gregory Farnum
On Fri, Oct 1, 2021 at 11:55 AM Andrew Gunnerson wrote: > > Thanks for the info. Do shadow roots affect that calculation at all? They're not supposed to, but I haven't worked in this code...but the different names you get (with the ~ssd and ~hdd postfix) would indicate not. > > In the regular

[ceph-users] Re: Multisite reshard stale instances

2021-10-01 Thread Szabo, Istvan (Agoda)
I just left it and I stopped using synchronous multisite replication. I've only been using directional replication for a while, which is working properly. Istvan Szabo Senior Infrastructure Engineer --- Agoda Services Co., Ltd. e: istvan.sz...@agoda.com

[ceph-users] Re: Trying to understand what overlapped roots means in pg_autoscale's scale-down mode

2021-10-01 Thread Andrew Gunnerson
Thanks for the info. Do shadow roots affect that calculation at all? In the regular "ceph osd crush tree" output, I don't see any cycles, and it doesn't seem like the same buckets appear under two roots (I only have one root): root=default rack=rack1 host=ssd-1
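
A quick way to see the shadow roots alongside the regular tree, and what the autoscaler currently reports (standard commands, nothing specific to this cluster):

  ceph osd crush tree --show-shadow    # includes the per-device-class ~ssd/~hdd buckets
  ceph osd pool autoscale-status       # may come back empty when the autoscaler hits overlapping roots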

[ceph-users] Re: Tool to cancel pending backfills

2021-10-01 Thread Josh Baergen
Hi Peter, > When I checked for circles I found that running the upmap balancer alone never > seems to create > any kind of circle in the graph By a circle, do you mean something like this? pg 1.a: 1->2 (upmap to put a chunk on 2 instead of 1), pg 1.b: 2->3, pg 1.c: 3->1 If so, then it's not
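
For illustration, an upmap exception of the kind Josh describes is just a pg-upmap-items entry; the PG id and OSD numbers here are made up:

  ceph osd pg-upmap-items 1.a 1 2    # remap pg 1.a's chunk from osd.1 to osd.2
  ceph osd rm-pg-upmap-items 1.a     # drop the exception again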

[ceph-users] Re: Trying to understand what overlapped roots means in pg_autoscale's scale-down mode

2021-10-01 Thread Gregory Farnum
It generally means, in CS terms, that you have a graph rather than a tree. In other words, you have two roots, or other crush buckets, which contain some of the same buckets/items underneath themselves. On Fri, Oct 1, 2021 at 9:43 AM Harry G. Coin wrote: > > I asked as well, it seems nobody on

[ceph-users] Re: urgent question about rbd mirror

2021-10-01 Thread Ignazio Cassano
Hello, you are right, I must explain better and be more patient. I have to install new clusters, so I think I will use the latest ceph version. As far as object storage replication goes (if I understood correctly), there aren't problems with 3 separate clusters. As far as rbd-mirroring (in my case

[ceph-users] Re: Trying to understand what overlapped roots means in pg_autoscale's scale-down mode

2021-10-01 Thread Harry G. Coin
I asked as well, it seems nobody on the list knows so far. On 9/30/21 10:34 AM, Andrew Gunnerson wrote: Hello, I'm trying to figure out what overlapping roots entails with the default scale-down autoscaling profile in Ceph Pacific. My test setup involves a CRUSH map that looks like:

[ceph-users] Re: urgent question about rbd mirror

2021-10-01 Thread DHilsbos
Ignazio; If your first attempt at asking a question results in no responses, you might consider why, before reposting. I don't use RBD mirroring, so I can only supply theoretical information. Googling RBD mirroring (for me) results in the below as the first result:

[ceph-users] Re: dealing with unfound pg in 4:2 ec pool

2021-10-01 Thread Szabo, Istvan (Agoda)
Now the rebalance has started to continue, but I always have 2 pgs which are in degraded state; what is weird is that the up and acting osds are totally different :/ PG_STAT STATE UP UP_PRIMARY ACTING

[ceph-users] Re: Multisite reshard stale instances

2021-10-01 Thread Christian Rohmann
On 01/10/2021 17:00, Szabo, Istvan (Agoda) wrote: I just left it and I stopped using synchronous multisite replication. I've only been using directional replication for a while, which is working properly. So you did set up a sync policy to only sync in one direction? In my setup the secondary site does not
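
For reference, a one-directional policy in recent releases is built from the sync group/flow/pipe commands; a rough sketch with placeholder zone names, not confirmed as Istvan's actual setup:

  radosgw-admin sync group create --group-id=group1 --status=allowed
  radosgw-admin sync group flow create --group-id=group1 --flow-id=a-to-b \
      --flow-type=directional --source-zone=zone-a --dest-zone=zone-b
  radosgw-admin sync group pipe create --group-id=group1 --pipe-id=all \
      --source-zones='*' --source-bucket='*' --dest-zones='*' --dest-bucket='*'
  radosgw-admin period update --commit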

[ceph-users] Re: dealing with unfound pg in 4:2 ec pool

2021-10-01 Thread Szabo, Istvan (Agoda)
Marked as unfound: 2 of them are shadow files, 1 is multipart trash, 1 was a normal file but we have a copy. To be honest, when I touch min_size it will degrade PGs for a short period of time, and I'm not sure what that would cause in an unhealthy state. Let's see how far I can go. Istvan Szabo Senior
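
For context, the commands usually involved when giving up on unfound objects; the pg id is a placeholder, and revert keeps an older version where one exists while delete drops the objects:

  ceph pg 2.1f list_unfound               # which objects are unfound
  ceph pg 2.1f mark_unfound_lost revert
  # or: ceph pg 2.1f mark_unfound_lost delete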

[ceph-users] shards falling behind on multisite metadata sync

2021-10-01 Thread Boris Behrens
Hi, does someone have a quick fix for shards falling behind in the metadata sync? I can do a radosgw-admin metadata sync init and restart the rgw daemons to get a full sync, but after a day the first shard falls behind, and after two days I also get the message with "oldest incremental change not
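
The commands Boris is referring to, for reference (standard radosgw-admin, run against the zone that is behind):

  radosgw-admin sync status              # overall view, lists shards that are behind
  radosgw-admin metadata sync status     # per-shard metadata sync markers
  radosgw-admin metadata sync init       # re-init a full metadata sync
  # then restart the rgw daemons so the full sync actually starts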

[ceph-users] Re: Multisite reshard stale instances

2021-10-01 Thread Christian Rohmann
Hey Istvan, On 05/02/2021 03:00, Szabo, Istvan (Agoda) wrote: I found 600-700 stale instances with the reshard stale-instances list command. Is there a way to clean it up (or actually, should I clean it up)? The stale-instances rm doesn't work in multisite. I observe a similar issue with some
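
The commands being discussed, for anyone following along; note that the rm variant is documented as unsupported on multisite, which matches what Istvan saw:

  radosgw-admin reshard stale-instances list
  radosgw-admin reshard stale-instances rm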

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-01 Thread Szabo, Istvan (Agoda)
3x SSD OSDs per nvme. Istvan Szabo Senior Infrastructure Engineer --- Agoda Services Co., Ltd. e: istvan.sz...@agoda.com --- -Original Message- From: Igor Fedotov Sent: Friday, October 1, 2021

[ceph-users] Re: Tool to cancel pending backfills

2021-10-01 Thread Peter Lieven
On 27.09.21 at 22:38, Josh Baergen wrote: >> I have a question regarding the last step. It seems to me that the ceph >> balancer is not able to remove the upmaps >> created by pgremapper, but instead creates new upmaps to balance the pgs >> among osds. > The balancer will prefer to remove
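
A rough sketch of the workflow under discussion, assuming the pgremapper tool from its public repository (exact flags may differ between versions):

  ceph balancer off                 # keep the balancer from fighting the manual upmaps
  pgremapper cancel-backfill --yes  # add upmaps that cancel the pending backfill
  # ... drain the backfill in controlled batches ...
  ceph balancer mode upmap
  ceph balancer on                  # the balancer then replaces or removes those upmaps over time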

[ceph-users] cephfs could not lock

2021-10-01 Thread nORKy
Hi, We need to use rrdtool on a cephfs mount, but we get this error: "could not lock RRD". Could you help me? Thanks 'Joffrey ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
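
rrdtool fails like this when it cannot take an advisory lock on the .rrd file; a quick way to check whether locking works at all on the mount (the path is a placeholder):

  touch /mnt/cephfs/locktest
  flock -n /mnt/cephfs/locktest -c 'echo lock ok'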

[ceph-users] Re: MDS: corrupted header/values: decode past end of struct encoding: Malformed input

2021-10-01 Thread von Hoesslin, Volker
here is a copy of the pastebin. There are no similar issues out there on the internet, so I can't tell the root cause: Sep 30 16:24:47 pve04 systemd[1]: Started Ceph metadata server daemon. Sep 30 16:24:47 pve04 ceph-mds[331479]: starting mds.pve04 at Sep 30 16:24:47 pve04 ceph-mds[331479]:

[ceph-users] Re: MDS: corrupted header/values: decode past end of struct encoding: Malformed input

2021-10-01 Thread Stefan Kooman
On 10/1/21 07:07, von Hoesslin, Volker wrote: Hi! My cephfs is broken and I cannot recover the MDS daemons. Yesterday I updated my ceph cluster from v15 to v16 and I thought all was working fine. Do you use cephadm? There was (is?) an issue with the way cephadm upgrades the MDS, see the

[ceph-users] Re: MDS: corrupted header/values: decode past end of struct encoding: Malformed input

2021-10-01 Thread Eugen Block
I can't access the pastebin; did you verify whether you hit the same issue Stefan referenced (https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/KQ5A5OWRIUEOJBC7VILBGDIKPQGJQIWN/)? Before deleting or rebuilding anything I would first check what the root cause is. As Stefan said,

[ceph-users] Re: Rbd mirror

2021-10-01 Thread Arthur Outhenin-Chalandre
Hi Ignazio, On 10/1/21 14:16, Ignazio Cassano wrote: I meant : Cluster B  pool name pool-clutser-B mirrored on Cluster A Cluster C pool name pool-clutser-C mirrored on Cluster A To me this should be feasible. As I said earlier, I never tested such a setup though. So on cluster A I should

[ceph-users] Re: Rbd mirror

2021-10-01 Thread Ignazio Cassano
Hello Arthur, I meant : Cluster B pool name pool-clutser-B mirrored on Cluster A Cluster C pool name pool-clutser-C mirrored on Cluster A So on cluster A I should have two rbd-mirror daemons Ignazio Il giorno ven 1 ott 2021 alle ore 13:35 Arthur Outhenin-Chalandre <
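
If that layout works, the per-pool setup would presumably look like the standard bootstrap flow, run once per source cluster (pool and site names taken from the example above, everything else a placeholder):

  # on cluster B: enable mirroring and create a bootstrap token
  rbd mirror pool enable pool-clutser-B image
  rbd mirror pool peer bootstrap create --site-name site-b pool-clutser-B > token-b

  # on cluster A: import the token, receive-only
  rbd mirror pool peer bootstrap import --site-name site-a --direction rx-only pool-clutser-B token-b

and the same again for pool-clutser-C from cluster C.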

[ceph-users] Re: MDS: corrupted header/values: decode past end of struct encoding: Malformed input

2021-10-01 Thread von Hoesslin, Volker
is there any chance to fix this? there are some "advanced metadata repair tools" (https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/) but I'm not really sure it is the right way to handle this issue? I have created a "backup" before any tries with this command:

[ceph-users] Re: Rbd mirror

2021-10-01 Thread Arthur Outhenin-Chalandre
Hi, On 10/1/21 11:25, Eugen Block wrote: I don't know for sure but I believe you can have only one rbd mirror daemon per cluster. So you can either configure one-way or two-way mirroring between two clusters. With your example the third cluster would then require two mirror daemons which is not

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-01 Thread Szabo, Istvan (Agoda)
I have my dashboards and I can see that the db nvmes are always running at 100% utilization (you can monitor with iostat -x 1) and it generates iowait all the time, which is between 1-3. I'm using nvme in front of the ssds. Istvan Szabo Senior Infrastructure Engineer
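
For reference, the check Istvan describes is just iostat from sysstat, watching the %util and await columns for the DB device (device name is a placeholder):

  iostat -x 1 nvme0n1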

[ceph-users] Re: Failing to mount PVCs

2021-10-01 Thread Eugen Block
Hi, I'm not entirely sure if this really is the same issue here. One of our customers also works with k8s in openstack and I saw similar messages. We never investigated it, I don't know if the customer did, but one thing they encountered was that k8s didn't properly clean up

[ceph-users] Re: MDS: corrupted header/values: decode past end of struct encoding: Malformed input

2021-10-01 Thread von Hoesslin, Volker
no, aside from the update I had not run this command: ceph fs set allow_standby_replay false, only the lines from the proxmox update path -> https://pve.proxmox.com/wiki/Ceph_Octopus_to_Pacific#Upgrade_all_CephFS_MDS_daemons From: Stefan Kooman Sent:

[ceph-users] Re: MDS: corrupted header/values: decode past end of struct encoding: Malformed input

2021-10-01 Thread Stefan Kooman
On 10/1/21 11:48, von Hoesslin, Volker wrote: i can not see what i have done wrong, this are my update steps: Have you set allow_standby_replay to false? Have you checked there were no more standby-replay daemons (ceph fs dump)? Gr. Stefan ___
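
Spelled out, the checks Stefan suggests (the fs name is a placeholder):

  ceph fs set <fsname> allow_standby_replay false
  ceph fs dump | grep -i standby    # confirm no standby-replay daemons remain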

[ceph-users] Re: MDS: corrupted header/values: decode past end of struct encoding: Malformed input

2021-10-01 Thread von Hoesslin, Volker
I cannot see what I have done wrong; these are my update steps: https://pve.proxmox.com/wiki/Ceph_Octopus_to_Pacific#Upgrade_all_CephFS_MDS_daemons From: Stefan Kooman Sent: Friday, 1 October 2021 11:00:10 To: von Hoesslin, Volker; ceph-users@ceph.io

[ceph-users] Re: S3 Bucket Notification requirement

2021-10-01 Thread Sanjeev Jha
Thanks very much Yuval for your confirmation. I have set the signature to v2 on the v1 aws-cli? [ansibleuser@ceprgw01 z_ejbg]$ python3 topic_with_endpoint.py mytopic {'TopicArn': 'arn:aws:sns:poc:app1:mytopic', 'ResponseMetadata': {'RequestId':

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-01 Thread Igor Fedotov
And how many OSDs per single NVMe do you have? On 10/1/2021 9:55 AM, Szabo, Istvan (Agoda) wrote: I have my dashboards and I can see that the db nvmes are always running at 100% utilization (you can monitor with iostat -x 1) and it generates iowait all the time, which is between 1-3.

[ceph-users] Re: Rbd mirror

2021-10-01 Thread Ignazio Cassano
Thanks Ignazio Il giorno ven 1 ott 2021 alle ore 11:26 Eugen Block ha scritto: > Hi, > > I don't know for sure but I believe you can have only one rbd mirror > daemon per cluster. So you can either configure one-way or two-way > mirroring between two clusters. With your example the third

[ceph-users] Re: Rbd mirror

2021-10-01 Thread Eugen Block
Hi, I don't know for sure but I believe you can have only one rbd mirror daemon per cluster. So you can either configure one-way or two-way mirroring between two clusters. With your example the third cluster would then require two mirror daemons which is not possible AFAIK. I can't tell

[ceph-users] Re: S3 Bucket Notification requirement

2021-10-01 Thread Sanjeev Jha
Hi Yuval, Thanks, after using aws-cli v1, no signature-related issue occurs. However, I can still see the issue related to SNS topic creation. [ansibleuser@ceprgw02 z_ejbg]$ aws --endpoint-url http://objects.dev.xx.xx.xx:80 sns create-topic --name=mytopic --attributes='{"push-endpoint":
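
For comparison, a complete form of that call as shown in the RGW bucket-notification docs looks roughly like this; the AMQP endpoint and exchange are placeholders, not Sanjeev's real values:

  aws --endpoint-url http://objects.dev.xx.xx.xx:80 sns create-topic --name=mytopic \
      --attributes='{"push-endpoint": "amqp://user:pass@rabbitmq-host:5672", "amqp-exchange": "ex1", "amqp-ack-level": "broker"}'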

[ceph-users] Re: dealing with unfound pg in 4:2 ec pool

2021-10-01 Thread Eugen Block
Hi, I'm not sure if setting min_size to 4 would also fix the PGs, but the client IO would probably be restored. Marking it as lost is the last resort according to this list; luckily I haven't been in such a situation yet. So give it a try with min_size = 4, but don't forget to increase
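
What Eugen describes, as commands (the pool name is a placeholder; an EC 4+2 pool normally runs with min_size = k+1 = 5):

  ceph osd pool set <ec-pool> min_size 4    # temporarily allow IO with only 4 shards up
  # ... once recovery has caught up ...
  ceph osd pool set <ec-pool> min_size 5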

[ceph-users] Re: osd marked down

2021-10-01 Thread Eugen Block
I'm not sure if anything else could break, but since the OSD isn't starting anyway... I guess you could delete osd.3 from ceph auth: ceph auth del osd.3 And then recreate it with: ceph auth get-or-create osd.3 mon 'allow profile osd' osd 'allow *' mgr 'allow profile osd' [osd.3]
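
If that route is taken, the new key also has to land in the OSD's keyring file before the daemon can start; a sketch assuming a non-cephadm deployment with the default data path for osd.3:

  ceph auth del osd.3
  ceph auth get-or-create osd.3 mon 'allow profile osd' osd 'allow *' mgr 'allow profile osd' \
      -o /var/lib/ceph/osd/ceph-3/keyring
  systemctl start ceph-osd@3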

[ceph-users] urgent question about rbd mirror

2021-10-01 Thread Ignazio Cassano
Hello All, Please, I would like to know if it is possible for two clusters to mirror rbd to a third cluster. In other words, I have 3 separate ceph clusters: A B C. I would like cluster A and cluster B to mirror some pools to cluster C. Is it possible? Thanks