[ceph-users] Re: Remapping OSDs under a PG

2021-05-27 Thread 胡 玮文
On May 28, 2021, at 08:18, Jeremy Hansen wrote: I’m very new to Ceph so if this question makes no sense, I apologize. Continuing to study but I thought an answer to this question would help me understand Ceph a bit more. Using cephadm, I set up a cluster. Cephadm automatically creates a pool for

[ceph-users] Remapping OSDs under a PG

2021-05-27 Thread Jeremy Hansen
I’m very new to Ceph so if this question makes no sense, I apologize. Continuing to study but I thought an answer to this question would help me understand Ceph a bit more. Using cephadm, I set up a cluster. Cephadm automatically creates a pool for Ceph metrics. It looks like one of my ssd
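As context for the question above: the pool cephadm creates for metrics is typically named device_health_metrics, and the usual way to move a pool's PGs onto particular OSDs is a CRUSH rule rather than remapping PGs by hand. A minimal sketch, assuming that pool name and an ssd device class (the rule name is illustrative):

ceph pg ls-by-pool device_health_metrics          # which OSDs currently back each PG
ceph osd crush rule create-replicated on-ssd default host ssd
ceph osd pool set device_health_metrics crush_rule on-ssd   # PGs remap onto SSD OSDs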

[ceph-users] Messed up placement of MDS

2021-05-27 Thread mabi
Hello, I am trying to place the two MDS daemons for CephFS on dedicated nodes. For that purpose I tried out a few different "cephadm orch apply ..." commands with a label but at the end it looks like I messed up with the placement as I now have two mds service_types as you can see below: #
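A common way to end up with exactly one MDS service pinned to labelled hosts is to label the nodes and issue a single apply; a sketch, assuming a filesystem named cephfs and a label named mds (both illustrative), with any leftover duplicate service removed afterwards:

ceph orch host label add node1 mds
ceph orch host label add node2 mds
ceph orch apply mds cephfs --placement="2 label:mds"
ceph orch ls mds                                  # should list a single mds service
ceph orch rm mds.<leftover_service_id>            # drop the duplicate service_type entry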

[ceph-users] cephfs auditing

2021-05-27 Thread Michael Thomas
Is there a way to log or track which cephfs files are being accessed? This would help us in planning where to place certain datasets based on popularity, e.g. on an EC HDD pool or a replicated SSD pool. I know I can run inotify on the ceph clients, but I was hoping that the MDS would have a way

[ceph-users] XFS on RBD on EC painfully slow

2021-05-27 Thread Reed Dier
Hoping someone may be able to help point out where my bottleneck(s) may be. I have an 80TB kRBD image on an EC8:2 pool, with an XFS filesystem on top of that. This was not an ideal scenario, rather it was a rescue mission to dump a large, aging raid array before it was too late, so I'm working
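For readers unfamiliar with this layout: RBD on an erasure-coded pool keeps object data in the EC pool and image metadata in a replicated pool via --data-pool. A rough sketch of how such an image is typically built (pool and image names are illustrative, and allow_ec_overwrites must be enabled on the EC pool):

ceph osd pool set ec82_data allow_ec_overwrites true
rbd create --size 80T --data-pool ec82_data rbd_meta/rescue_img
rbd map rbd_meta/rescue_img                       # kRBD; device name will vary
mkfs.xfs /dev/rbd0                                # XFS on top, as described above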

[ceph-users] Re: Ceph osd will not start.

2021-05-27 Thread Peter Childs
In the end it looks like I might be able to get the node up to about 30 OSDs before it stops creating any more. Or rather, it formats the disks but freezes up when starting the daemons. I suspect I’m missing something I can tune to get it working better. If I could see any error messages that might
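When cephadm-managed daemons freeze at startup without visible errors, the logs can usually still be pulled per daemon; a sketch, with the cluster fsid and OSD id as placeholders:

cephadm ls                                        # state of every daemon cephadm knows on this host
cephadm logs --name osd.30                        # wraps journalctl for that daemon
journalctl -u ceph-<cluster_fsid>@osd.30.service -b --no-pager | tail -n 100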

[ceph-users] Re: How to add back stray OSD daemon after node re-installation

2021-05-27 Thread mabi
It works again, but afterwards I had to do a stop/start of the OSD from an admin node: # ceph orch daemon stop osd.2 # ceph orch daemon start osd.2 What an adventure, thanks again so much for your help! ‐‐‐ Original Message ‐‐‐ On Thursday, May 27, 2021 3:37 PM, Eugen Block wrote: > That

[ceph-users] Re: How to add back stray OSD daemon after node re-installation

2021-05-27 Thread mabi
Nicely spotted about the missing file, it looks like I have the same case as you can see below from the syslog: May 27 15:33:12 ceph1f systemd[1]: ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa@osd.2.service: Scheduled restart job, restart counter is at 1. May 27 15:33:12 ceph1f systemd[1]: Stopped

[ceph-users] Re: rebalancing after node more

2021-05-27 Thread Rok Jaklič
16.2.4. For some reason, starting OSDs with systemctl on this "renewed" host did not start the OSDs after a while, but when doing it through the console manually, it did. Thanks anyway. On Thu, 27 May 2021, 16:31 Eugen Block, wrote: > Yes, if your pool requires 5 chunks and you only have 5 hosts (with >
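For reference, cephadm OSDs run as templated systemd units named after the cluster fsid (as seen in the syslog excerpt further down this digest), so the systemctl path can be inspected directly; the fsid below is a placeholder:

systemctl list-units 'ceph-*@osd.*' --all         # which OSD units exist and their state
systemctl status ceph-<cluster_fsid>@osd.2.service
systemctl reset-failed ceph-<cluster_fsid>@osd.2.service   # clear a failed state before retrying
systemctl start ceph-<cluster_fsid>@osd.2.service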

[ceph-users] Re: How to add back stray OSD daemon after node re-installation

2021-05-27 Thread mabi
I managed to remove that wrongly created cluster on the node by running: sudo cephadm rm-cluster --fsid 91a86f20-8083-40b1-8bf1-fe35fac3d677 --force So I am getting closer, but the osd.2 service on that node simply does not want to start as you can see below: # ceph orch daemon start osd.2

[ceph-users] Re: MDS stuck in up:stopping state

2021-05-27 Thread Martin Rasmus Lundquist Hansen
Hi Weiwen, Amazing, that actually worked. So simple, thanks! From: 胡 玮文 Sent: 27 May 2021 09:02 To: Martin Rasmus Lundquist Hansen; ceph-users@ceph.io Subject: Re: MDS stuck in up:stopping state Hi Martin, You may hit

[ceph-users] Re: How to add back stray OSD daemon after node re-installation

2021-05-27 Thread mabi
I am trying to run "cephadm shell" on that newly installed OSD node and it seems that I have now unfortunately configured a new cluster ID as it shows: ubuntu@ceph1f:~$ sudo cephadm shell ERROR: Cannot infer an fsid, one must be specified: ['8d47792c-987d-11eb-9bb6-a5302e00e1fa',

[ceph-users] Re: CRUSH rule for EC 6+2 on 6-node cluster

2021-05-27 Thread Dan van der Ster
Hi Fulvio, I suggest removing only the upmaps which are clearly incorrect, and then see if the upmap balancer re-creates them. Perhaps they were created when they were not incorrect, when you had a different crush rule? Or perhaps you're running an old version of ceph which had a buggy balancer
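For anyone following along, inspecting and removing individual upmap exceptions looks roughly like this (the PG id is a placeholder):

ceph osd dump | grep pg_upmap_items               # list current upmap exceptions
ceph osd rm-pg-upmap-items <pgid>                 # drop the exception for one PG
ceph balancer status                              # the balancer may re-create the valid ones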

[ceph-users] Re: CRUSH rule for EC 6+2 on 6-node cluster

2021-05-27 Thread Fulvio Galeazzi
Hello Dan, Nathan, thanks for your replies and apologies for my silence. Sorry, I had made a typo... the rule is really 6+4. And to reply to Nathan's message, the rule was built like this in anticipation of getting additional servers, at which point in time I will relax the "2 chunks per

[ceph-users] Re: How to add back stray OSD daemon after node re-installation

2021-05-27 Thread mabi
You are right, I used the FSID of the OSD and not of the cluster in the deploy command. So now I tried again with the cluster ID as FSID but still it does not work as you can see below: ubuntu@ceph1f:~$ sudo cephadm deploy --name osd.2 --fsid 8d47792c-987d-11eb-9bb6-a5302e00e1fa Deploy daemon

[ceph-users] Re: rebalancing after node more

2021-05-27 Thread Eugen Block
Yes, if your pool requires 5 chunks and you only have 5 hosts (with failure domain host) your PGs become undersized when a host fails and won't recover until the OSDs come back. Which ceph version is this? Quoting Rok Jaklič: For this pool I have set EC 3+2 (so in total I have 5 nodes)
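The arithmetic behind this: an EC 3+2 pool needs k+m = 5 chunks, and with failure domain host and only 5 hosts there is nowhere left to place the fifth chunk while one host is out, which is why the pg dump quoted further down shows 2147483647 (CRUSH's "none" marker) in the acting sets. A few commands to confirm the profile and rule, with illustrative names:

ceph osd pool ls detail | grep <poolname>         # shows the erasure_code_profile in use
ceph osd erasure-code-profile get <profile>       # k, m and crush-failure-domain
ceph osd crush rule dump                          # the rule the pool maps to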

[ceph-users] Re: How to add back stray OSD daemon after node re-installation

2021-05-27 Thread mabi
Hi Eugen, What a good coincidence ;-) So I ran "cephadm ceph-volume lvm list" on the OSD node which I re-installed and it saw my osd.2 OSD. So far so good, but the following suggested command does not work as you can see below: ubuntu@ceph1f:~$ sudo cephadm deploy --name osd.2 --fsid

[ceph-users] Re: cephadm: How to replace failed HDD where DB is on SSD

2021-05-27 Thread Kai Stian Olstad
On 27.05.2021 11:53, Eugen Block wrote: This test was on ceph version 15.2.8. On Pacific (ceph version 16.2.4) this also works for me for initial deployment of an entire host: |SERVICE |NAME |HOST |DATA |DB

[ceph-users] Re: rebalancing after node more

2021-05-27 Thread Rok Jaklič
For this pool I have set EC 3+2 (so in total I have 5 nodes), one of which was temporarily removed, but maybe this was the problem? On Thu, May 27, 2021 at 3:51 PM Rok Jaklič wrote: > Hi, thanks for quick reply > > root@ctplmon1:~# ceph pg dump pgs_brief | grep undersized > dumped pgs_brief > 9.5

[ceph-users] Re: rebalancing after node more

2021-05-27 Thread Rok Jaklič
Hi, thanks for quick reply root@ctplmon1:~# ceph pg dump pgs_brief | grep undersized dumped pgs_brief 9.5 active+undersized+degraded [72,85,54,120,2147483647] 72 [72,85,54,120,2147483647] 72 9.6 active+undersized+degraded [101,47,113,74,2147483647] 101

[ceph-users] Re: How to add back stray OSD daemon after node re-installation

2021-05-27 Thread Eugen Block
That file is in the regular filesystem, you can copy it from a different osd directory, it's just a minimal ceph.conf. The directory for the failing osd should now be present after the failed attempts. Quoting mabi: Nicely spotted about the missing file, it looks like I have the same
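A sketch of what copying that file looks like on a cephadm host, assuming the default /var/lib/ceph/<fsid>/<daemon>/ layout and a healthy osd.0 on the same node (ids and fsid are placeholders):

ls /var/lib/ceph/<cluster_fsid>/osd.2/            # the failing OSD's directory
cp /var/lib/ceph/<cluster_fsid>/osd.0/config /var/lib/ceph/<cluster_fsid>/osd.2/config
systemctl restart ceph-<cluster_fsid>@osd.2.service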

[ceph-users] Re: rebalancing after node more

2021-05-27 Thread Eugen Block
Hi, this sounds like your crush rule(s) for one or more pools can't place the PGs because the host is missing. Please share ceph pg dump pgs_brief | grep undersized ceph osd tree ceph osd pool ls detail and the crush rule(s) for the affected pool(s). Quoting Rok Jaklič: Hi, I have

[ceph-users] Re: How to add back stray OSD daemon after node re-installation

2021-05-27 Thread Eugen Block
Can you try with both cluster and osd fsid? Something like this: pacific2:~ # cephadm deploy --name osd.2 --fsid acbb46d6-bde3-11eb-9cf2-fa163ebb2a74 --osd-fsid bc241cd4-e284-4c5a-aad2-5744632fc7fc I tried to reproduce a similar scenario and found a missing config file in the osd
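The two ids come from different places, which is easy to mix up; a short sketch of where each one is read from:

ceph fsid                                         # cluster fsid, from any node with an admin keyring
cephadm ceph-volume lvm list                      # on the OSD host; prints an "osd fsid" per OSD
cephadm deploy --name osd.2 --fsid <cluster_fsid> --osd-fsid <osd_fsid>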

[ceph-users] Re: MDS stuck in up:stopping state

2021-05-27 Thread 胡 玮文
> On May 27, 2021, at 19:11, Mark Schouten wrote: > > On Thu, May 27, 2021 at 12:38:07PM +0200, Mark Schouten wrote: >>> On Thu, May 27, 2021 at 06:25:44AM +, Martin Rasmus Lundquist Hansen >>> wrote: >>> After scaling the number of MDS daemons down, we now have a daemon stuck in >>> the >>>

[ceph-users] Re: cephfs:: store files on different pools?

2021-05-27 Thread Dietmar Rieder
On 5/27/21 2:33 PM, Adrian Sevcenco wrote: Hi! is it (technically) possible to instruct cephfs to store files < 1 MiB on a (replicated) pool and the other files on another (EC) pool? And even more, is it possible to take the same kind of decision based on the path of the file? (let's say that
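CephFS cannot pick a pool by file size, but it can by directory through file layouts, which covers the path-based part of the question; a minimal sketch, assuming an EC data pool named cephfs_ec and a mount at /mnt/cephfs (names illustrative):

ceph fs add_data_pool cephfs cephfs_ec                            # make the pool usable by the fs
setfattr -n ceph.dir.layout.pool -v cephfs_ec /mnt/cephfs/bulk    # new files under bulk/ land in the EC pool
getfattr -n ceph.dir.layout /mnt/cephfs/bulk                      # verify the layout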

[ceph-users] cephfs:: store files on different pools?

2021-05-27 Thread Adrian Sevcenco
Hi! is it (technically) possible to instruct cephfs to store files < 1 MiB on a (replicated) pool and the other files on another (EC) pool? And even more, is it possible to take the same kind of decision based on the path of the file? (let's say that critical files with names like

[ceph-users] Python lib usage access permissions

2021-05-27 Thread Szabo, Istvan (Agoda)
Hi, Is there a way to be able to manage specific pools with the python lib without admin keyring? Not sure why it only works with admin keyring but not with the client keyring :/ Istvan Szabo Senior Infrastructure Engineer --- Agoda Services Co.,
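Managing a specific pool through the Python bindings does not require the admin keyring, only a client key whose caps cover that pool; a sketch of creating one (client name and pool are illustrative):

ceph auth get-or-create client.app mon 'allow r' osd 'allow rw pool=mypool' -o /etc/ceph/ceph.client.app.keyring
ceph auth get client.app                          # confirm the caps

The rados connection can then be opened as client.app with that keyring instead of client.admin.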

[ceph-users] Re: MDS stuck in up:stopping state

2021-05-27 Thread Mark Schouten
On Thu, May 27, 2021 at 12:38:07PM +0200, Mark Schouten wrote: > On Thu, May 27, 2021 at 06:25:44AM +, Martin Rasmus Lundquist Hansen > wrote: > > After scaling the number of MDS daemons down, we now have a daemon stuck in > > the > > "up:stopping" state. The documentation says it can take

[ceph-users] Re: MDS stuck in up:stopping state

2021-05-27 Thread Mark Schouten
On Thu, May 27, 2021 at 06:25:44AM +, Martin Rasmus Lundquist Hansen wrote: > After scaling the number of MDS daemons down, we now have a daemon stuck in > the > "up:stopping" state. The documentation says it can take several minutes to > stop the > daemon, but it has been stuck in this

[ceph-users] Re: How to add back stray OSD daemon after node re-installation

2021-05-27 Thread Eugen Block
ubuntu@ceph1f:~$ sudo cephadm deploy --name osd.2 --fsid 91a86f20-8083-40b1-8bf1-fe35fac3d677 Deploy daemon osd.2 ... Which fsid is it, the cluster's or the OSD's? According to the 'cephadm deploy' help page it should be the cluster fsid. Quoting mabi: Hi Eugen, What a good

[ceph-users] MDS stuck in up:stopping state

2021-05-27 Thread Martin Rasmus Lundquist Hansen
After scaling the number of MDS daemons down, we now have a daemon stuck in the "up:stopping" state. The documentation says it can take several minutes to stop the daemon, but it has been stuck in this state for almost a full day. According to the "ceph fs status" output attached below, it still

[ceph-users] Re: How to add back stray OSD daemon after node re-installation

2021-05-27 Thread Eugen Block
Hi, I posted a link to the docs [1], [2] just yesterday ;-) You should see the respective OSD in the output of 'cephadm ceph-volume lvm list' on that node. You should then be able to get it back to cephadm with cephadm deploy --name osd.x But I haven't tried this yet myself, so please

[ceph-users] Re: cephadm: How to replace failed HDD where DB is on SSD

2021-05-27 Thread Eugen Block
This test was on ceph version 15.2.8. On Pacific (ceph version 16.2.4) this also works for me for initial deployment of an entire host: |SERVICE |NAME |HOST |DATA |DB |WAL |

[ceph-users] Re: MDS cache tunning

2021-05-27 Thread Andres Rojas Guerrero
Thank you very much, very good explanation!! On 27/5/21 at 9:42, Dan van der Ster wrote: between 100-200 -- *** Andrés Rojas Guerrero Unidad Sistemas Linux Area Arquitectura Tecnológica Secretaría General Adjunta de Informática Consejo

[ceph-users] Re: cephadm: How to replace failed HDD where DB is on SSD

2021-05-27 Thread Kai Stian Olstad
On 27.05.2021 11:17, Eugen Block wrote: That's not how it's supposed to work. I tried the same on an Octopus cluster and removed all filters except: data_devices: rotational: 1 db_devices: rotational: 0 My Octopus test osd nodes have two HDDs and one SSD, I removed all OSDs and redeployed

[ceph-users] Re: cephadm: How to replace failed HDD where DB is on SSD

2021-05-27 Thread Eugen Block
That's not how it's supposed to work. I tried the same on an Octopus cluster and removed all filters except: data_devices: rotational: 1 db_devices: rotational: 0 My Octopus test osd nodes have two HDDs and one SSD; I removed all OSDs and redeployed on one node. This spec file results
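For comparison, the whole spec under discussion looks roughly like this when written out and applied with a dry run first (service id and placement are illustrative):

cat > /tmp/osd_spec.yml <<'EOF'
service_type: osd
service_id: hdd_with_ssd_db
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
db_devices:
  rotational: 0
EOF
ceph orch apply -i /tmp/osd_spec.yml --dry-run    # preview which disks would be consumed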

[ceph-users] Re: [Spam] Re: MDS stuck in up:stopping state

2021-05-27 Thread Mark Schouten
On Thu, May 27, 2021 at 10:37:33AM +0200, Mark Schouten wrote: > On Thu, May 27, 2021 at 07:02:16AM +, 胡 玮文 wrote: > > You may hit https://tracker.ceph.com/issues/50112, which we failed to find > > the root cause yet. I resolved this by restart rank 0. (I have only 2 > > active MDSs) > > I

[ceph-users] Re: cephadm: How to replace failed HDD where DB is on SSD

2021-05-27 Thread Eugen Block
Hi, The VG has 357.74GB of free space out of a total of 5.24TB, so I did actually try different values like "30G:", "30G", "300G:", "300G", "357G". I also tried some crazy high numbers and some ranges, but don't remember the values. But none of them worked. The size parameter is filtering the disk

[ceph-users] Re: cephadm: How to replace failed HDD where DB is on SSD

2021-05-27 Thread Kai Stian Olstad
On 26.05.2021 22:14, David Orman wrote: We've found that after doing the osd rm, you can use: "ceph-volume lvm zap --osd-id 178 --destroy" on the server with that OSD as per: https://docs.ceph.com/en/latest/ceph-volume/lvm/zap/#removing-devices and it will clean things up so they work as

[ceph-users] Re: [Spam] Re: MDS stuck in up:stopping state

2021-05-27 Thread Mark Schouten
On Thu, May 27, 2021 at 07:02:16AM +, 胡 玮文 wrote: > You may hit https://tracker.ceph.com/issues/50112, which we failed to find > the root cause yet. I resolved this by restart rank 0. (I have only 2 active > MDSs) I have this exact issue while trying to upgrade from 12.2 (which is pending

[ceph-users] Re: cephadm: How to replace failed HDD where DB is on SSD

2021-05-27 Thread Kai Stian Olstad
On 26.05.2021 18:12, Eugen Block wrote: Could you share the output of lsblk -o name,rota,size,type from the affected osd node? # lsblk -o name,rota,size,type NAME  ROTA  SIZE

[ceph-users] Re: best practice balance mode in HAproxy in front of RGW?

2021-05-27 Thread Boris Behrens
On Thu, 27 May 2021 at 07:47, Janne Johansson wrote: > > On Wed, 26 May 2021 at 16:33, Boris Behrens wrote: > > > > Hi Janne, > > do you know if there can be data duplication which leads to orphan objects? > > > > I am currently hunting strange errors (there is a lot more data in the > >

[ceph-users] Re: MDS cache tunning

2021-05-27 Thread Dan van der Ster
I don't think # clients alone is a good measure by which to decide to deploy multiple MDSs -- idle clients create very little load, but just a few badly behaving clients can use all the MDS performance. (If you must hear a number, I can share that we have single MDSs with 2-3000 clients

[ceph-users] Re: MDS cache tunning

2021-05-27 Thread Andres Rojas Guerrero
Oh, very interesting!! I have reduced the number of MDS to one. Just one more question, out of curiosity: above what number of clients should we consider them "many"? On 27/5/21 at 9:24, Dan van der Ster wrote: On Thu, May 27, 2021 at 9:21 AM Andres Rojas Guerrero wrote: On
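For the record, the number of active MDS daemons is a single filesystem setting; a sketch, assuming the filesystem is named cephfs:

ceph fs set cephfs max_mds 1                      # surplus active ranks go to up:stopping, then standby
ceph fs status cephfs                             # watch the ranks drain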

[ceph-users] Re: MDS cache tunning

2021-05-27 Thread Dan van der Ster
On Thu, May 27, 2021 at 9:21 AM Andres Rojas Guerrero wrote: > > > On 26/5/21 at 16:51, Dan van der Ster wrote: > > I see you have two active MDSs. Is your cluster more stable if you use > > only one single active MDS? > > Good question!! I read in the Ceph docs: > > "You should configure

[ceph-users] Re: MDS cache tunning

2021-05-27 Thread Andres Rojas Guerrero
On 26/5/21 at 16:51, Dan van der Ster wrote: I see you have two active MDSs. Is your cluster more stable if you use only one single active MDS? Good question!! I read in the Ceph docs: "You should configure multiple active MDS daemons when your metadata performance is bottlenecked on

[ceph-users] Re: MDS stuck in up:stopping state

2021-05-27 Thread 胡 玮文
Hi Martin, You may hit https://tracker.ceph.com/issues/50112, for which we have failed to find the root cause yet. I resolved this by restarting rank 0. (I have only 2 active MDSs) Weiwen Hu Sent from Mail for Windows 10 From: Martin Rasmus Lundquist
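"Restart rank 0" here means restarting, or failing over, whichever MDS daemon currently holds that rank; a sketch with the filesystem and daemon names as placeholders:

ceph fs status                                    # see which mds daemon holds rank 0
ceph mds fail <fs_name>:0                         # force a standby to take over rank 0
ceph orch daemon restart mds.<daemon_name>        # alternative: restart the daemon via cephadm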