[ceph-users] Re: Ceph standby-replay metadata server: MDS internal heartbeat is not healthy

2020-02-19 Thread Patrick Donnelly
Hi Martin, On Thu, Feb 13, 2020 at 4:10 AM Martin Palma wrote: > > Hi all, > > today we observed that, all of a sudden, our standby-replay metadata > server continuously writes the following logs: > > 2020-02-13 11:56:50.216102 7fd2ad229700 1 heartbeat_map is_healthy > 'MDSRank' had timed out

[ceph-users] Re: cephfs slow, howto investigate and tune mds configuration?

2020-02-19 Thread Patrick Donnelly
Hello Marc, On Tue, Feb 11, 2020 at 1:41 PM Marc Roos wrote: > > Thanks Samy, I will give this a try. > > It would be helpful if there were some value that shows cache misses or > so, so you have a more precise idea of how much you need to increase > the cache. I have now added a couple of GBs
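A minimal sketch of how the cache could be sized and inspected, assuming an MDS daemon named mds.a (the admin socket commands run on the MDS host):
$ ceph config set mds mds_cache_memory_limit 8589934592   # raise the cache target to 8 GiB
$ ceph daemon mds.a cache status                          # current cache usage vs. the configured limit
$ ceph daemon mds.a perf dump mds                         # inode/cap counters to watch while adjusting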

[ceph-users] Re: [FORGED] Lost all Monitors in Nautilus Upgrade, best way forward?

2020-02-19 Thread Sean Matheny
Thanks for all of the helpful suggestions. We’re back up and running. We successfully re-created the monitor and re-imported the keys. With croit’s help, we turned the OSD daemons back on, and things came back relatively smoothly (a few inactive/incomplete pgs, and a few expected small

[ceph-users] Re: MDS: obscene buffer_anon memory use when scanning lots of files

2020-02-19 Thread John Madden
Ah, no, I hadn't seen that. Patiently awaiting .8 then. Thanks! On Mon, Feb 17, 2020 at 8:52 AM Dan van der Ster wrote: > > On Mon, Feb 10, 2020 at 8:31 PM John Madden wrote: > > > > Upgraded to 14.2.7, doesn't appear to have affected the behavior. As > > requested: > > In case it wasn't clear

[ceph-users] Re: ceph nvme 2x replication

2020-02-19 Thread Mark Nelson
Alternatively, if the use case is HPC scratch space where you can regenerate the data by re-running jobs and speed/capacity are more important than long-term storage, you might consider 2x replication with min_size 1.  That probably falls under the "not caring about your data" use case though. ;)
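For reference, a minimal sketch of such a scratch pool, assuming the pool name hpc-scratch:
$ ceph osd pool create hpc-scratch 128 128 replicated
$ ceph osd pool set hpc-scratch size 2       # two copies only
$ ceph osd pool set hpc-scratch min_size 1   # keep serving I/O even with a single copy left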

[ceph-users] Re: slow using ISCSI - Help-me

2020-02-19 Thread Mike Christie
On 02/16/2020 04:51 AM, Gesiel Galvão Bernardes wrote: > > > On Fri, Feb 14, 2020 at 13:25, Mike Christie > wrote: > > On 02/13/2020 08:52 PM, Gesiel Galvão Bernardes wrote: > > Hi > > > > On Sun, Feb 9, 2020 at 18:27, Mike Christie

[ceph-users] Re: bluestore compression questions

2020-02-19 Thread Andras Pataki
Hi Igor, Thanks for the insightful details on how to interpret the compression data.  I'm still a bit confused about why compression doesn't work better in my case, so I've decided to try a test.  I created a 16 GiB cephfs file which is just a repeat of the 4 characters 'abcd', essentially 4
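A sketch of that kind of test, assuming a CephFS mount at /mnt/cephfs and using `ceph df detail` to read the per-pool compression counters:
$ yes abcd | tr -d '\n' | head -c 16G > /mnt/cephfs/compress_test   # 16 GiB of the repeating pattern 'abcd'
$ ceph df detail   # compare USED COMPR / UNDER COMPR for the cephfs data pool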

[ceph-users] Re: ceph nvme 2x replication

2020-02-19 Thread Dan van der Ster
And btw EC with k=2, m=2, min_size=3 is also probably fine, and has only a 2x space cost. -- dan On Wed, Feb 19, 2020 at 5:46 PM Paul Emmerich wrote: > > x2 replication is perfectly fine as long as you also keep min_size at 2 ;) > > (But that means you're offline as soon as something is offline)
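A minimal sketch of such a pool, with the profile and pool names assumed:
$ ceph osd erasure-code-profile set ec22 k=2 m=2 crush-failure-domain=host
$ ceph osd pool create ecpool 64 64 erasure ec22
$ ceph osd pool set ecpool min_size 3                 # stays writable with one chunk missing, stops at two
$ ceph osd pool set ecpool allow_ec_overwrites true   # only if RBD/CephFS data is to live on it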

[ceph-users] Re: ceph nvme 2x replication

2020-02-19 Thread Paul Emmerich
x2 replication is perfectly fine as long as you also keep min_size at 2 ;) (But that means you're offline as soon as something is offline) Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io

[ceph-users] Re: ceph nvme 2x replication

2020-02-19 Thread Wido den Hollander
On 2/19/20 3:17 PM, Frank R wrote: > Hi all, > > I have noticed that RedHat is willing to support 2x replication with > NVME drives. Additionally, I have seen CERN presentation where they > use a 2x replication with NVME for a hyperconverged/HPC/CephFS > solution. > Don't do this if you care

[ceph-users] Re: ceph nvme 2x replication

2020-02-19 Thread Frank R
Thanks for clearing that up. On Wed, Feb 19, 2020 at 9:47 AM Dan van der Ster wrote: > > Hi, > > 2x replication was for a performance test. We use 3x in production. > > -- dan > > On Wed, Feb 19, 2020 at 3:18 PM Frank R wrote: > > > > Hi all, > > > > I have noticed that RedHat is willing to

[ceph-users] Re: ceph nvme 2x replication

2020-02-19 Thread Dan van der Ster
Hi, 2x replication was for a performance test. We use 3x in production. -- dan On Wed, Feb 19, 2020 at 3:18 PM Frank R wrote: > > Hi all, > > I have noticed that RedHat is willing to support 2x replication with > NVME drives. Additionally, I have seen CERN presentation where they > use a 2x

[ceph-users] ceph nvme 2x replication

2020-02-19 Thread Frank R
Hi all, I have noticed that Red Hat is willing to support 2x replication with NVMe drives. Additionally, I have seen a CERN presentation where they use 2x replication with NVMe for a hyperconverged/HPC/CephFS solution. I would like to hear some opinions on whether this is really a good idea for

[ceph-users] Re: Migrating/Relocating ceph cluster

2020-02-19 Thread Marc Roos
I think it will be easier. You just have to check whether the latency is going to be an issue. And if you have enough space, maybe increase the replication so you can move more nodes at once? -Original Message- From: Rafał Wądołowski [mailto:rwadolow...@cloudferro.com] Sent: 19 February 2020
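A rough sketch of the "raise replication while moving" idea, with the pool name assumed:
$ ceph osd pool set rbd size 4   # extra copy as a safety margin during the move
$ ceph osd set noout             # don't mark OSDs out while their host is in transit
# ...relocate the node, then revert:
$ ceph osd unset noout
$ ceph osd pool set rbd size 3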

[ceph-users] Re: Migrating/Relocating ceph cluster

2020-02-19 Thread Rafał Wądołowski
Yeah, I saw your thread; the problem is more complicated due to the size of the cluster... I'm trying to figure out the best solution, which will minimize the downtime and migration time. Best Regards, Rafał Wądołowski On 19.02.2020 14:23, Marc Roos wrote: > I asked the same not so long ago, check

[ceph-users] Re: Migrating/Relocating ceph cluster

2020-02-19 Thread Marc Roos
I asked the same not so long ago, check the archive, quite useful replies. -Original Message- Sent: 19 February 2020 14:20 To: ceph-users@ceph.io Subject: [ceph-users] Migrating/Relocating ceph cluster Hi, I am looking for a good way of migrating/relocating a ceph cluster. It has about

[ceph-users] Migrating/Relocating ceph cluster

2020-02-19 Thread Rafał Wądołowski
Hi, I am looking for a good way of migrating/relocating a ceph cluster. It has about 2PB net, mainly RBD, but object storage is also used. The new location is far away, about 1,500 kilometers. Of course I have to minimize the downtime of the cluster :) Right now I see the following scenarios: 1.

[ceph-users] Re: [FORGED] Lost all Monitors in Nautilus Upgrade, best way forward?

2020-02-19 Thread Wido den Hollander
On 2/19/20 10:11 AM, Paul Emmerich wrote: > On Wed, Feb 19, 2020 at 10:03 AM Wido den Hollander wrote: >> >> >> >> On 2/19/20 8:49 AM, Sean Matheny wrote: >>> Thanks, >>> If the OSDs have a newer epoch of the OSDMap than the MON it won't work. >>> >>> How can I verify this? (i.e the epoch

[ceph-users] Re: Pool on limited number of OSDs

2020-02-19 Thread Jacek Suchenia
Ok, I found the issue: I changed a class while the OSD was reweighted, so the weight for this OSD in that class was different from the default (current) one. And yes - a significantly different size caused this problem with degraded and undersized PGs. Thanks Janne and Wido for the help. Jacek Wed., 19 Feb 2020 at 10:00
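For anyone hitting the same thing, a sketch of how the per-class (shadow) weights can be compared and corrected, with the OSD id and weight taken from this thread:
$ ceph osd crush tree --show-shadow       # lists the ~s3 shadow hierarchy next to the default one
$ ceph osd crush reweight osd.11 3.53830  # set the CRUSH weight back to the intended value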

[ceph-users] Re: [FORGED] Lost all Monitors in Nautilus Upgrade, best way forward?

2020-02-19 Thread Paul Emmerich
On Wed, Feb 19, 2020 at 10:03 AM Wido den Hollander wrote: > > > > On 2/19/20 8:49 AM, Sean Matheny wrote: > > Thanks, > > > >> If the OSDs have a newer epoch of the OSDMap than the MON it won't work. > > > > How can I verify this? (i.e the epoch of the monitor vs the epoch of the > > osd(s)) > >

[ceph-users] Re: osd_pg_create causing slow requests in Nautilus

2020-02-19 Thread Wido den Hollander
On 2/19/20 9:34 AM, Paul Emmerich wrote: > On Wed, Feb 19, 2020 at 7:26 AM Wido den Hollander wrote: >> >> >> >> On 2/18/20 6:54 PM, Paul Emmerich wrote: >>> I've also seen this problem on Nautilus with no obvious reason for the >>> slowness once. >> >> Did this resolve itself? Or did you

[ceph-users] Re: osd_pg_create causing slow requests in Nautilus

2020-02-19 Thread Wido den Hollander
On 2/19/20 9:21 AM, Dan van der Ster wrote: > On Wed, Feb 19, 2020 at 7:29 AM Wido den Hollander wrote: >> >> >> >> On 2/18/20 6:54 PM, Paul Emmerich wrote: >>> I've also seen this problem on Nautilus with no obvious reason for the >>> slowness once. >> >> Did this resolve itself? Or did you

[ceph-users] Re: [FORGED] Lost all Monitors in Nautilus Upgrade, best way forward?

2020-02-19 Thread Wido den Hollander
On 2/19/20 8:49 AM, Sean Matheny wrote: > Thanks, > >> If the OSDs have a newer epoch of the OSDMap than the MON it won't work. > > How can I verify this? (i.e the epoch of the monitor vs the epoch of the > osd(s)) > Check the status of the OSDs: $ ceph daemon osd.X status This should tell
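A sketch of the comparison, assuming osd.0 and that the commands run on the respective hosts via the admin socket:
$ ceph daemon osd.0 status   # "oldest_map" / "newest_map" are the OSDMap epochs this OSD holds
$ ceph osd dump | head -1    # once the MONs have quorum again, prints the cluster's current OSDMap epoch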

[ceph-users] Re: Pool on limited number of OSDs

2020-02-19 Thread Jacek Suchenia
Janne, thanks for the good spot; however, all of them are 3.53830 - that change was left over from some tests to kick the CRUSH algorithm. Jacek Wed., 19 Feb 2020 at 09:47, Janne Johansson wrote: > On Wed, 19 Feb 2020 at 09:42, Jacek Suchenia < > jacek.suche...@gmail.com> wrote: > >> Hello Wido >> >> Sure,

[ceph-users] Re: Pool on limited number of OSDs

2020-02-19 Thread Janne Johansson
On Wed, 19 Feb 2020 at 09:42, Jacek Suchenia wrote:
> Hello Wido
>
> Sure, here is a rule:
> -15  s3  3.53830  host kw01sv09.sr1.cr1.lab1~s3
>  11  s3  3.53830      osd.11
> -17  s3  3.53830  host kw01sv10.sr1.cr1.lab1~s3
>  10  s3  3.53830

[ceph-users] Re: Pool on limited number of OSDs

2020-02-19 Thread Jacek Suchenia
Hello Wido, Sure, here is the rule:
ceph osd crush rule dump s3_rule
{
    "rule_id": 1,
    "rule_name": "s3_rule",
    "ruleset": 1,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -21,
            "item_name":

[ceph-users] Re: osd_pg_create causing slow requests in Nautilus

2020-02-19 Thread Paul Emmerich
On Wed, Feb 19, 2020 at 7:26 AM Wido den Hollander wrote: > > > > On 2/18/20 6:54 PM, Paul Emmerich wrote: > > I've also seen this problem on Nautilus with no obvious reason for the > > slowness once. > > Did this resolve itself? Or did you remove the pool? I've seen this twice on the same

[ceph-users] Re: osd_pg_create causing slow requests in Nautilus

2020-02-19 Thread Dan van der Ster
On Wed, Feb 19, 2020 at 7:29 AM Wido den Hollander wrote: > > > > On 2/18/20 6:54 PM, Paul Emmerich wrote: > > I've also seen this problem on Nautilus with no obvious reason for the > > slowness once. > > Did this resolve itself? Or did you remove the pool? > > > In my case it was a rather old