Re: [ceph-users] HEALTH_ERR with a kitchen sink of problems: MDS damaged, readonly, and so forth

2019-07-24 Thread Sangwhan Moon
Original Message: > > > On 7/25/19 7:49 AM, Sangwhan Moon wrote: > > Hello, > > > > Original Message: > >> > >> > >> On 7/25/19 6:49 AM, Sangwhan Moon wrote: > >>> Hello, > >>> > >>> I've inherited a Ceph cluster from someone who has left zero > >>> documentation or any handover. A couple days

Re: [ceph-users] Anybody using 4x (size=4) replication?

2019-07-24 Thread Janne Johansson
On Wed 24 July 2019 at 21:48, Wido den Hollander wrote: > Right now I'm just trying to find a clever solution to this. It's a 2k > OSD cluster and the likelihood of a host or OSD crashing is reasonable > while you are performing maintenance on a different host. > > All kinds of things have cross

Re: [ceph-users] HEALTH_ERR with a kitchen sink of problems: MDS damaged, readonly, and so forth

2019-07-24 Thread Wido den Hollander
On 7/25/19 7:49 AM, Sangwhan Moon wrote: > Hello, > > Original Message: >> >> >> On 7/25/19 6:49 AM, Sangwhan Moon wrote: >>> Hello, >>> >>> I've inherited a Ceph cluster from someone who has left zero documentation >>> or any handover. A couple days ago it decided to show the entire company

Re: [ceph-users] HEALTH_ERR with a kitchen sink of problems: MDS damaged, readonly, and so forth

2019-07-24 Thread Sangwhan Moon
Original Message: > On Thu, 25 Jul 2019 13:49:22 +0900 Sangwhan Moon wrote: > > > osd: 39 osds: 39 up, 38 in > > You might want to find that 'out' OSD. Thanks, I've identified the OSD and put it back in - doesn't seem to change anything though. :( Sangwhan

Re: [ceph-users] HEALTH_ERR with a kitchen sink of problems: MDS damaged, readonly, and so forth

2019-07-24 Thread Christian Balzer
On Thu, 25 Jul 2019 13:49:22 +0900 Sangwhan Moon wrote: > osd: 39 osds: 39 up, 38 in You might want to find that 'out' OSD. -- Christian Balzer, Network/Systems Engineer, ch...@gol.com, Rakuten Mobile Inc.
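
For reference, a minimal sketch of how to locate and re-add an OSD that is up but not in, using the standard CLI (osd.17 below is a placeholder ID):

    ceph osd tree        # the REWEIGHT column shows 0 for OSDs marked "out"
    ceph osd df          # cross-check per-OSD status and utilisation
    ceph osd in osd.17   # mark the identified OSD back "in"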

Re: [ceph-users] HEALTH_ERR with a kitchen sink of problems: MDS damaged, readonly, and so forth

2019-07-24 Thread Sangwhan Moon
Hello, Original Message: > > > On 7/25/19 6:49 AM, Sangwhan Moon wrote: > > Hello, > > > > I've inherited a Ceph cluster from someone who has left zero documentation > > or any handover. A couple days ago it decided to show the entire company > > what it is capable of.. > > > > The health re

Re: [ceph-users] How to add 100 new OSDs...

2019-07-24 Thread Kaspar Bosma
+1 on that. We are going to add 384 OSDs next week to a 2K+ cluster. The proposed solution really works well! Kaspar. On 24 July 2019 at 21:06, Paul Emmerich wrote: +1 on adding them all at the same time. All these methods that gradually increase the weight aren't really necessary in newer releases

Re: [ceph-users] HEALTH_ERR with a kitchen sink of problems: MDS damaged, readonly, and so forth

2019-07-24 Thread Wido den Hollander
On 7/25/19 6:49 AM, Sangwhan Moon wrote: > Hello, > > I've inherited a Ceph cluster from someone who has left zero documentation or > any handover. A couple days ago it decided to show the entire company what it > is capable of.. > > The health report looks like this: > > [root@host mnt]# c

[ceph-users] HEALTH_ERR with a kitchen sink of problems: MDS damaged, readonly, and so forth

2019-07-24 Thread Sangwhan Moon
Hello, I've inherited a Ceph cluster from someone who has left zero documentation or any handover. A couple days ago it decided to show the entire company what it is capable of.. The health report looks like this: [root@host mnt]# ceph -s cluster: id: 809718aa-3eac-4664-b8fa-38c46cdb

Re: [ceph-users] How to add 100 new OSDs...

2019-07-24 Thread zhanrzh...@teamsun.com.cn
I think you should set "osd_pool_default_min_size=1" before you add OSDs, and the OSDs that you add at a time should be in the same failure domain. Hi, What would be the proper way to add 100 new OSDs to a cluster? I have to add 100 new OSDs to our actual > 300 OSDs cluster, and I would like to kno

Re: [ceph-users] Future of Filestore?

2019-07-24 Thread Виталий Филиппов
Cache=writeback is perfectly safe, it's flushed when the guest calls fsync, so journaled filesystems and databases don't lose data that's committed to the journal. On 25 July 2019 at 2:28:26 GMT+03:00, Stuart Longland wrote: >On 25/7/19 9:01 am, vita...@yourcmc.ru wrote: >>> 60 millibits per seco

Re: [ceph-users] New best practices for osds???

2019-07-24 Thread Xavier Trilla
Hi, We run few hundred HDD OSDs for our backup cluster, we set one RAID 0 per HDD in order to be able to use -battery protected- write cache from the RAID controller. It really improves performance, for both bluestore and filestore OSDs. We also avoid expanders as we had bad experiences with t

Re: [ceph-users] Future of Filestore?

2019-07-24 Thread Stuart Longland
On 25/7/19 9:01 am, vita...@yourcmc.ru wrote: 60 millibits per second?  60 bits every 1000 seconds?  Are you serious?  Or did we get the capitalisation wrong? Assuming 60MB/sec (as 60 Mb/sec would still be slower than the 5MB/sec I was getting), maybe there's some characteristic that Bluestore i

Re: [ceph-users] Future of Filestore?

2019-07-24 Thread vitalif
60 millibits per second? 60 bits every 1000 seconds? Are you serious? Or did we get the capitalisation wrong? Assuming 60MB/sec (as 60 Mb/sec would still be slower than the 5MB/sec I was getting), maybe there's some characteristic that Bluestore is particularly dependent on regarding the HDD

Re: [ceph-users] Future of Filestore?

2019-07-24 Thread Stuart Longland
On 25/7/19 8:48 am, Vitaliy Filippov wrote: > I get 60 mb/s inside a VM in my home nano-ceph consisting of 5 HDDs 4 of > which are inside one PC and 5th is plugged into a ROCK64 :)) I use > Bluestore... 60 millibits per second? 60 bits every 1000 seconds? Are you serious? Or did we get the capi

Re: [ceph-users] Future of Filestore?

2019-07-24 Thread Vitaliy Filippov
/dev/vdb: Timing cached reads: 2556 MB in 1.99 seconds = 1281.50 MB/sec Timing buffered disk reads: 62 MB in 3.03 seconds = 20.48 MB/sec That is without any special tuning, just migrating back to FileStore… journal is on the HDD (it wouldn't let me put it on the SSD like it did last time

Re: [ceph-users] New best practices for osds???

2019-07-24 Thread Simon Ironside
RAID0 mode being discussed here means several RAID0 "arrays", each with a single physical disk as a member of it. I.e. the number of OSDs is the same whether in RAID0 or JBOD mode. E.g. 12x physical disks = 12x RAID0 single-disk "arrays" or 12x JBOD physical disks = 12x OSDs. Simon On 24/07/

Re: [ceph-users] New best practices for osds???

2019-07-24 Thread Vitaliy Filippov
One RAID0 array per drive :) I can't understand how using RAID0 is better than JBOD, considering jbod would be many individual disks, each used as OSDs, instead of a single big one used as a single OSD. -- With best regards, Vitaliy Filippov

Re: [ceph-users] New best practices for osds???

2019-07-24 Thread solarflow99
I can't understand how using RAID0 is better than JBOD, considering jbod would be many individual disks, each used as OSDs, instead of a single big one used as a single OSD. On Mon, Jul 22, 2019 at 4:05 AM Vitaliy Filippov wrote: > OK, I meant "it may help performance" :) the main point is tha

Re: [ceph-users] [Ceph-users] Re: MDS failing under load with large cache sizes

2019-07-24 Thread Patrick Donnelly
+ other ceph-users On Wed, Jul 24, 2019 at 10:26 AM Janek Bevendorff wrote: > > > what's the ceph.com mailing list? I wondered whether this list is dead but > > it's the list announced on the official ceph.com homepage, isn't it? > There are two mailing lists announced on the website. If you go

Re: [ceph-users] Future of Filestore?

2019-07-24 Thread Stuart Longland
On 23/7/19 9:59 pm, Stuart Longland wrote: > I'll do some proper measurements once the migration is complete. A starting point (I accept more rigorous disk storage tests exist): > virtatomos ~ # hdparm -tT /dev/vdb > > /dev/vdb: > Timing cached reads: 2556 MB in 1.99 seconds = 1281.50 MB/sec

Re: [ceph-users] Upgrading and lost OSDs

2019-07-24 Thread Alfredo Deza
On Wed, Jul 24, 2019 at 4:15 PM Peter Eisch wrote: > Hi, > > > > I appreciate the insistence that the directions be followed. I wholly > agree. The only liberty I took was to do a ‘yum update’ instead of just > ‘yum update ceph-osd’ and then reboot. (Also my MDS runs on the MON hosts, > so it

Re: [ceph-users] Upgrading and lost OSDs

2019-07-24 Thread Peter Eisch
Hi, I appreciate the insistence that the directions be followed. I wholly agree. The only liberty I took was to do a ‘yum update’ instead of just ‘yum update ceph-osd’ and then reboot. (Also my MDS runs on the MON hosts, so it got updated a step early.) As for the logs: [2019-07-24 15:07:22
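
A sketch of the narrower per-package update Peter mentions, assuming the package names and globs shipped in the upstream RPM repos:

    yum update ceph\*                  # update only the Ceph packages on the OSD host
    systemctl restart ceph-osd.target  # restart the OSD daemons without a full reboot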

Re: [ceph-users] Kernel, Distro & Ceph

2019-07-24 Thread Wido den Hollander
On 7/24/19 9:38 PM, dhils...@performair.com wrote: > All; > > There's been a lot of discussion of various kernel versions on this list > lately, so I thought I'd seek some clarification. > > I prefer to run CentOS, and I prefer to keep the number of "extra" > repositories to a minimum. Ceph

Re: [ceph-users] Upgrading and lost OSDs

2019-07-24 Thread Alfredo Deza
On Wed, Jul 24, 2019 at 3:49 PM Peter Eisch wrote: > > > I’m at step 6. I updated/rebooted the host to complete “installing the > new packages and restarting the ceph-osd daemon” on the first OSD host. > All the systemctl definitions to start the OSDs were deleted, all the > properties in /var/l

Re: [ceph-users] Upgrading and lost OSDs

2019-07-24 Thread Peter Eisch
I’m at step 6. I updated/rebooted the host to complete “installing the new packages and restarting the ceph-osd daemon” on the first OSD host. All the systemctl definitions to start the OSDs were deleted, all the properties in /var/lib/ceph/osd/ceph-* directories were deleted. All the files

Re: [ceph-users] Anybody using 4x (size=4) replication?

2019-07-24 Thread Wido den Hollander
On 7/24/19 9:35 PM, Mark Schouten wrote: > I’d say the cure is worse than the issue you’re trying to fix, but that’s my > two cents. > I'm not completely happy with it either. Yes, the price goes up and latency increases as well. Right now I'm just trying to find a clever solution to this. It

[ceph-users] Kernel, Distro & Ceph

2019-07-24 Thread DHilsbos
All; There's been a lot of discussion of various kernel versions on this list lately, so I thought I'd seek some clarification. I prefer to run CentOS, and I prefer to keep the number of "extra" repositories to a minimum. Ceph requires adding a Ceph repo, and the EPEL repo. Updating the kern
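
For reference, a minimal sketch of the two extra repositories on CentOS 7, assuming the stock ceph.repo layout from the upstream docs (release name and paths may need adjusting):

    # EPEL for dependencies
    yum install epel-release

    # /etc/yum.repos.d/ceph.repo -- Nautilus packages for el7
    [ceph]
    name=Ceph packages for $basearch
    baseurl=https://download.ceph.com/rpm-nautilus/el7/$basearch
    enabled=1
    gpgcheck=1
    gpgkey=https://download.ceph.com/keys/release.asc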

Re: [ceph-users] Anybody using 4x (size=4) replication?

2019-07-24 Thread Mark Schouten
I’d say the cure is worse than the issue you’re trying to fix, but that’s my two cents. Mark Schouten > On 24 Jul 2019 at 21:22, Wido den Hollander wrote: > > Hi, > > Is anybody using 4x (size=4, min_size=2) replication with Ceph? > > The reason I'm asking is that

Re: [ceph-users] Anybody using 4x (size=4) replication?

2019-07-24 Thread Paul Emmerich
We got a few size=4 pools, but most of them are metadata pools paired with m=3 or m=4 erasure coded pools for the actual data. Goal is to provide the same availability and durability guarantees for the metadata as the data. But we got some older odd setup with replicated size=4 for that reason (se

[ceph-users] Anybody using 4x (size=4) replication?

2019-07-24 Thread Wido den Hollander
Hi, Is anybody using 4x (size=4, min_size=2) replication with Ceph? The reason I'm asking is that a customer of mine asked me for a solution to prevent a situation which occurred: A cluster running with size=3 and replication over different racks was being upgraded from 13.2.5 to 13.2.6. During
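
For reference, the replication factor under discussion is a per-pool setting (a minimal sketch; 'rbd' is a placeholder pool name):

    ceph osd pool set rbd size 4       # keep four replicas
    ceph osd pool set rbd min_size 2   # keep accepting I/O while two replicas remain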

Re: [ceph-users] Upgrading and lost OSDs

2019-07-24 Thread Alfredo Deza
On Wed, Jul 24, 2019 at 2:56 PM Peter Eisch wrote: > Hi Paul, > > To better answer your question, I'm following: > http://docs.ceph.com/docs/nautilus/releases/nautilus/ > > At step 6, upgrade OSDs, I jumped on an OSD host and did a full 'yum > update' for patching the host and rebooted to pi

Re: [ceph-users] How to add 100 new OSDs...

2019-07-24 Thread Paul Emmerich
+1 on adding them all at the same time. All these methods that gradually increase the weight aren't really necessary in newer releases of Ceph. Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io

Re: [ceph-users] Ceph durability during outages

2019-07-24 Thread Nathan Fish
It is inherently dangerous to accept client IO - particularly writes - when at k. Just like it's dangerous to accept IO with 1 replica in replicated mode. It is not inherently dangerous to do recovery when at k, but apparently it was originally written to use min_size rather than k. Looking at the

Re: [ceph-users] How to add 100 new OSDs...

2019-07-24 Thread Reed Dier
Just chiming in to say that this too has been my preferred method for adding [large numbers of] OSDs. Set the norebalance nobackfill flags. Create all the OSDs, and verify everything looks good. Make sure my max_backfills, recovery_max_active are as expected. Make sure everything has peered. Unse
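
A condensed sketch of the flag-based workflow Reed describes (the flag and option names are the standard ones; the values are only examples):

    ceph osd set norebalance
    ceph osd set nobackfill
    # ... create all the new OSDs, wait for peering, sanity-check 'ceph -s' ...
    ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'
    ceph osd unset nobackfill
    ceph osd unset norebalance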

Re: [ceph-users] Upgrading and lost OSDs

2019-07-24 Thread Peter Eisch
Hi Paul, To better answer your question, I'm following: http://docs.ceph.com/docs/nautilus/releases/nautilus/ At step 6, upgrade OSDs, I jumped on an OSD host and did a full 'yum update' for patching the host and rebooted to pick up the current CentOS kernel. I didn't do anything to speci

Re: [ceph-users] Questions regarding backing up Ceph

2019-07-24 Thread Wido den Hollander
On 7/24/19 6:02 PM, Sinan Polat wrote: > Hi, > > Why not using backup tools that can do native OpenStack backups? > > We are also using Ceph as the cinder backend on our OpenStack platform. We > use CommVault to make our backups. How much data is there in that Ceph cluster? And how does it p

Re: [ceph-users] Upgrading and lost OSDs

2019-07-24 Thread Peter Eisch
[2019-07-24 13:40:49,602][ceph_volume.process][INFO ] Running command: /bin/systemctl show --no-pager --property=Id --state=running ceph-osd@* This is the only log event. At the prompt: # ceph-volume simple scan # peter Peter Eisch Senior Site Reliability Engineer T1.612.659.3228 virginpuls

Re: [ceph-users] How to add 100 new OSDs...

2019-07-24 Thread Wido den Hollander
On 7/24/19 7:15 PM, Kevin Hrpcek wrote: > I often add 50+ OSDs at a time and my cluster is all NLSAS. Here is what > I do, you can obviously change the weight increase steps to what you are > comfortable with. This has worked well for me and my workloads. I've > sometimes seen peering take longe

Re: [ceph-users] Upgrading and lost OSDs

2019-07-24 Thread Paul Emmerich
On Wed, Jul 24, 2019 at 8:36 PM Peter Eisch wrote: > # lsblk > NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT > sda 8:0 0 1.7T 0 disk > ├─sda1 8:1 0 100M 0 part > ├─sda2 8:2 0 1.7T 0 part > └─sda5 8:5 0 10M 0 part > sdb 8:16 0 1.7T 0 disk > ├─sdb1 8:17 0 100M 0 part > ├─sdb2 8:18 0 1.7T 0 part > └─sdb5

Re: [ceph-users] Upgrading and lost OSDs

2019-07-24 Thread Peter Eisch
Bluestore created with 12.2.10/luminous. The OSD startup generates logs like: 2019-07-24 12:39:46.483 7f4b27649d80 0 set uid:gid to 167:167 (ceph:ceph) 2019-07-24 12:39:46.483 7f4b27649d80 0 ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable), process ceph-osd, pi

Re: [ceph-users] Upgrading and lost OSDs

2019-07-24 Thread Paul Emmerich
Did you use ceph-disk before? Support for ceph-disk was removed, see Nautilus upgrade instructions. You'll need to run "ceph-volume simple scan" to convert them to ceph-volume Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr
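
The conversion Paul refers to is a two-step ceph-volume operation on the affected OSD host (a minimal sketch):

    ceph-volume simple scan            # writes the legacy ceph-disk OSD metadata to /etc/ceph/osd/*.json
    ceph-volume simple activate --all  # recreates systemd units and mounts, then starts the OSDs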

Re: [ceph-users] Upgrading and lost OSDs

2019-07-24 Thread Xavier Trilla
Hi Peter, I'm not sure, but maybe after some changes the OSDs are not being recognized by the Ceph scripts. Ceph used to use udev to detect the OSDs and then moved to LVM. Which kind of OSDs are you running? Bluestore or filestore? Which version did you use to create them? Cheers! On 24 Jul 2019,

Re: [ceph-users] How to add 100 new OSDs...

2019-07-24 Thread Ch Wan
I usually add 20 OSDs each time. To limit the impact of backfilling, I set primary-affinity to 0 on those new OSDs and adjust the backfilling configuration. http://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/#backfilling Kevin Hrpcek wrote on Thursday, 25 July 2019 at 2:02 AM: > I c
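
A sketch of the primary-affinity trick mentioned above (osd.301 is a placeholder ID):

    ceph osd primary-affinity osd.301 0   # keep the new OSD from acting as primary while it backfills
    # ... later, once backfilling has settled ...
    ceph osd primary-affinity osd.301 1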

[ceph-users] Upgrading and lost OSDs

2019-07-24 Thread Peter Eisch
Hi, I’m working through updating from 12.2.12/luminous to 14.2.2/nautilus on CentOS 7.6. The managers are updated alright: # ceph -s   cluster:     id:     2fdb5976-1234-4b29-ad9c-1ca74a9466ec     health: HEALTH_WARN             Degraded data redundancy: 24177/9555955 objects degraded (0.253%)

Re: [ceph-users] How to add 100 new OSDs...

2019-07-24 Thread Kevin Hrpcek
I change the crush weights. My 4-second sleep doesn't let peering finish for each one before continuing. I'd test with some small steps to get an idea of how much data remaps when increasing the weight by $x. I've found my cluster is comfortable with +1 increases... also it takes a while to get to a wei

[ceph-users] Ang: How to add 100 new OSDs...

2019-07-24 Thread da...@oderland.se
CERN has a pretty nice reweight script that we run when we add OSDs in production: https://github.com/cernceph/ceph-scripts/blob/master/tools/ceph-gentle-reweight Might be of help! Kind regards, David Majchrzak  Original message  Subject: Re: [ceph-users] How to add 100 new OSDs... From

Re: [ceph-users] Ceph durability during outages

2019-07-24 Thread Nathan Fish
2/3 monitors is enough to maintain quorum, as is any majority. However, EC pools have a default min_size of k+1 chunks. This can be adjusted to k, but that has its own dangers. I assume you are using failure domain = "host"? As you had k=6,m=2, and lost 2 failure domains, you had k chunks left,
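
For reference, min_size is a per-pool setting that can be checked and (cautiously) lowered; a minimal sketch, with 'ecpool' as a placeholder name and with the caveat from above that running at min_size=k has its own dangers:

    ceph osd pool get ecpool min_size     # k=6,m=2 defaults to 7 (k+1)
    ceph osd pool set ecpool min_size 6   # allow I/O with only k chunks left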

Re: [ceph-users] How to add 100 new OSDs...

2019-07-24 Thread Xavier Trilla
Hi Kevin, Yeah, that makes a lot of sense, and looks even safer than adding OSDs one by one. What do you change, the crush weight? Or the reweight? (I guess you change the crush weight, am I right?) Thanks! On 24 Jul 2019, at 19:17, Kevin Hrpcek (kevin.hrp...@ssec.wisc.edu) wrote:

Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-24 Thread Jason Dillaman
On Wed, Jul 24, 2019 at 12:47 PM Marc Schöchlin wrote: > > Hi Jason, > > I installed kernel 4.4.0-154.181 (from Ubuntu package sources) and performed > the crash reproduction. > The problem also re-appeared with that kernel release. > > A gunzip with 10 gunzip processes threw 1600 write and 330

Re: [ceph-users] How to add 100 new OSDs...

2019-07-24 Thread Kevin Hrpcek
I often add 50+ OSDs at a time and my cluster is all NLSAS. Here is what I do, you can obviously change the weight increase steps to what you are comfortable with. This has worked well for me and my workloads. I've sometimes seen peering take longer if I do steps too quickly but I don't run any
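
A condensed sketch of the gradual-weight approach Kevin describes (the IDs, target weight, step size and sleep are only examples; a real run would watch 'ceph -s' between steps):

    # step the CRUSH weight of the new OSDs up in +1.0 increments
    for w in 1 2 3 4 5 6 7 7.3; do
        for id in 300 301 302; do
            ceph osd crush reweight osd.$id $w
        done
        sleep 4   # short pause; peering may still be catching up
    done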

Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-24 Thread Marc Schöchlin
Hi Jason, I installed kernel 4.4.0-154.181 (from Ubuntu package sources) and performed the crash reproduction. The problem also re-appeared with that kernel release. A gunzip with 10 gunzip processes threw 1600 write and 330 read IOPS against the cluster/the rbd_ec volume with a transfer rate

[ceph-users] How to add 100 new OSDs...

2019-07-24 Thread Xavier Trilla
Hi, What would be the proper way to add 100 new OSDs to a cluster? I have to add 100 new OSDs to our actual > 300 OSDs cluster, and I would like to know how you do it. Usually, we add them quite slowly. Our cluster is a pure SSD/NVMe one, and it can handle plenty of load, but for the sake of s

[ceph-users] How to deal with slow requests related to OSD bugs

2019-07-24 Thread Xavier Trilla
Hi, We had a strange issue while adding a new OSD to our Ceph Luminous 12.2.8 cluster. Our cluster has > 300 OSDs based on SSDs and NVMe. After adding a new OSD to the Ceph cluster, one of the already running OSDs started to give us slow request warnings. We checked the OSD and it was working

[ceph-users] Ceph durability during outages

2019-07-24 Thread Jean-Philippe Méthot
Hi, I’m running a 3-monitor, 10-OSD-node Ceph cluster in production. This cluster is used to host OpenStack VM RBDs. My pools are set to use a k=6 m=2 erasure code profile with a 3-copy metadata pool in front. The cluster runs well, but we recently had a short outage which triggered unexpected

Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-24 Thread Mike Christie
On 07/23/2019 12:28 AM, Marc Schöchlin wrote: >>> For testing purposes i set the timeout to unlimited ("nbd_set_ioctl >>> /dev/nbd0 0", on already mounted device). >>> >> I re-executed the problem procedure and discovered that the >>> >> compression-procedure crashes not at the same file, but cra

Re: [ceph-users] Questions regarding backing up Ceph

2019-07-24 Thread Sinan Polat
Hi, Why not use backup tools that can do native OpenStack backups? We are also using Ceph as the Cinder backend on our OpenStack platform. We use CommVault to make our backups. - Sinan > On 24 Jul 2019 at 17:48, Wido den Hollander wrote: > > > >> On 7/24/19 4:0

Re: [ceph-users] Questions regarding backing up Ceph

2019-07-24 Thread Wido den Hollander
On 7/24/19 4:06 PM, Fabian Niepelt wrote: > Hi, thanks for the reply. > > Am Mittwoch, den 24.07.2019, 15:26 +0200 schrieb Wido den Hollander: >> >> On 7/24/19 1:37 PM, Fabian Niepelt wrote: >>> Hello ceph-users, >>> >>> I am currently building a Ceph cluster that will serve as a backend for >>

Re: [ceph-users] MDS fails repeatedly while handling many concurrent meta data operations

2019-07-24 Thread Janek Bevendorff
So it looks like the problem only occurs with the kernel module, but maybe ceph-fuse is just too slow to tell. In fact, it is an order of magnitude slower. I only get 1.3k req/s compared to the 20k req/s with the kernel module, which is not practical at all. Update 2: it does indeed seem like ceph-f

Re: [ceph-users] Questions regarding backing up Ceph

2019-07-24 Thread Paul Emmerich
Note that enabling rbd mirroring means taking a hit on IOPS performance, just think of it as a x2 overhead mainly on IOPS. But it does work very well for disaster recovery scenarios if you can take the performance hit. Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us a
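
For completeness, a minimal sketch of enabling journal-based RBD mirroring on one pool/image (pool and image names are placeholders; a remote peer plus rbd-mirror daemons are needed on top of this, and the journaling feature is where the IOPS overhead comes from):

    rbd mirror pool enable volumes image            # per-image mirroring mode on the pool
    rbd feature enable volumes/vm-disk-1 journaling
    rbd mirror image enable volumes/vm-disk-1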

Re: [ceph-users] MDS fails repeatedly while handling many concurrent meta data operations

2019-07-24 Thread Janek Bevendorff
Update: I had to wipe my CephFS, because after I increased the beacon grace period on the last attempt, I couldn't get the MDSs to rejoin anymore at all without running out of memory on the machine. I tried wiping all sessions and the journal, but it didn't work. In the end all I achieved was t

Re: [ceph-users] Questions regarding backing up Ceph

2019-07-24 Thread Tobias Gall
Hello, what about RGW replication: https://ceph.com/geen-categorie/radosgw-simple-replication-example/ http://docs.ceph.com/docs/master/radosgw/multisite/ or rbd-mirroring: http://docs.ceph.com/docs/master/rbd/rbd-mirroring/ Regards, Tobias On 24.07.19 at 13:37, Fabian Niepelt wrote: Hello

Re: [ceph-users] Erasure Coding performance for IO < stripe_width

2019-07-24 Thread vitalif
We're seeing ~5800 IOPS, ~23 MiB/s on 4 KiB IO (stripe_width 8192) on a pool that could do 3 GiB/s with 4M blocksize. So, yeah, well, that is rather harsh, even for EC. 4 KiB IO is slow in Ceph even without EC. Your 3 GB/s linear writes don't mean anything. Ceph adds a significant overhead to e

Re: [ceph-users] Questions regarding backing up Ceph

2019-07-24 Thread Fabian Niepelt
Hi, thanks for the reply. Am Mittwoch, den 24.07.2019, 15:26 +0200 schrieb Wido den Hollander: > > On 7/24/19 1:37 PM, Fabian Niepelt wrote: > > Hello ceph-users, > > > > I am currently building a Ceph cluster that will serve as a backend for > > Openstack and object storage using RGW. The clust

Re: [ceph-users] Questions regarding backing up Ceph

2019-07-24 Thread Marc Roos
> > complete DR with Ceph to restore it back to how it was at a given point in time is a challenge. > > Trying to backup a Ceph cluster sounds very 'enterprise' and is difficult to scale as well. Hmmm, I was actually also curious how backups were done, especially on these clusters that have

Re: [ceph-users] MDS fails repeatedly while handling many concurrent meta data operations

2019-07-24 Thread Feng Zhang
Does a ceph-fuse mount also have the same issue? On Wed, Jul 24, 2019 at 3:35 AM Janek Bevendorff wrote: > > > I mean kernel version > > Oh, of course. 4.15.0-54 on Ubuntu 18.04 LTS. > > Right now I am also experiencing a different phenomenon. Since I wrapped it > up yesterday, the MDS machines hav

Re: [ceph-users] Questions regarding backing up Ceph

2019-07-24 Thread Wido den Hollander
On 7/24/19 1:37 PM, Fabian Niepelt wrote: > Hello ceph-users, > > I am currently building a Ceph cluster that will serve as a backend for > Openstack and object storage using RGW. The cluster itself is finished and > integrated with Openstack and virtual machines for testing are being deployed.

Re: [ceph-users] Nautilus dashboard: crushmap viewer shows only first root

2019-07-24 Thread Eugen Block
Thank you very much! Quoting EDH - Manuel Rios Fernandez: Hi Eugen, Yes, it's solved; we reported it in 14.2.1 and the team fixed it in 14.2.2. Regards, Manuel -Original message- From: ceph-users On behalf of Eugen Block Sent: Wednesday, 24 July 2019 15:10 To: ceph-users@lists.c

Re: [ceph-users] Nautilus dashboard: crushmap viewer shows only first root

2019-07-24 Thread EDH - Manuel Rios Fernandez
Hi Eugen, Yes, it's solved; we reported it in 14.2.1 and the team fixed it in 14.2.2. Regards, Manuel -Original message- From: ceph-users On behalf of Eugen Block Sent: Wednesday, 24 July 2019 15:10 To: ceph-users@lists.ceph.com Subject: [ceph-users] Nautilus dashboard: crushmap viewer s

[ceph-users] Nautilus dashboard: crushmap viewer shows only first root

2019-07-24 Thread Eugen Block
Hi all, we just upgraded our cluster to: ceph version 14.2.0-300-gacd2f2b9e1 (acd2f2b9e196222b0350b3b59af9981f91706c7f) nautilus (stable) When clicking through the dashboard to see what's new we noticed that the crushmap viewer only shows the first root of our crushmap (we have two roots

[ceph-users] Questions regarding backing up Ceph

2019-07-24 Thread Fabian Niepelt
Hello ceph-users, I am currently building a Ceph cluster that will serve as a backend for Openstack and object storage using RGW. The cluster itself is finished and integrated with Openstack and virtual machines for testing are being deployed. Now I'm a bit stumped on how to effectively backup the

[ceph-users] The num of bandwidth while ceph recovering stands for?

2019-07-24 Thread 展荣臻(信泰)
Hello all, I'm a bit confused about the bandwidth number reported while Ceph is recovering. Below is the output of ceph -w while the cluster is recovering: 2019-07-22 18:30:20.378134 mon.0 [INF] pgmap v54047611: 704 pgs: 9 active+remapped+backfilling, 695 active+clean; 3847 GB data, 7742 GB used, 5049

Re: [ceph-users] how to debug slow requests

2019-07-24 Thread Massimo Sgaravatto
Just so I understand: the duration for this operation is 329 seconds (a lot!), but all the reported events happened at roughly the same time (2019-07-20 23:13:18). Were all the events of this op reported? Why do you see a problem with the "waiting for subops from 4" event? Thanks, Massimo On Wed, Ju

Re: [ceph-users] OSD replacement causes slow requests

2019-07-24 Thread Eugen Block
Hi Wido, thanks for your response. Have you tried to dump the historic slow ops on the OSDs involved to see what is going on? $ ceph daemon osd.X dump_historic_slow_ops Good question, I don't recall doing that. Maybe my colleague did but he's on vacation right now. ;-) But to be clear, a
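
For reference, the admin-socket query Wido suggests is run locally on the OSD host (osd.12 is a placeholder ID):

    ceph daemon osd.12 dump_historic_slow_ops   # slowest recent ops with their event timelines
    ceph daemon osd.12 dump_historic_ops        # broader variant, useful when nothing is flagged as slow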

Re: [ceph-users] MDS fails repeatedly while handling many concurrent meta data operations

2019-07-24 Thread Janek Bevendorff
I mean kernel version Oh, of course. 4.15.0-54 on Ubuntu 18.04 LTS. Right now I am also experiencing a different phenomenon. Since I wrapped it up yesterday, the MDS machines have been trying to rejoin, but could only handle a few hundred up to a few hundred thousand inodes per second befo

Re: [ceph-users] OSD replacement causes slow requests

2019-07-24 Thread Wido den Hollander
On 7/18/19 12:21 PM, Eugen Block wrote: > Hi list, > > we're facing an unexpected recovery behavior of an upgraded cluster > (Luminous -> Nautilus). > > We added new servers with Nautilus to the existing Luminous cluster, so > we could first replace the MONs step by step. Then we moved the old

Re: [ceph-users] pools limit

2019-07-24 Thread Wido den Hollander
On 7/16/19 6:53 PM, M Ranga Swami Reddy wrote: > Thanks for your reply.. > Here, new pool creations and pg auto-scale may cause rebalancing, which > impacts the ceph cluster performance. > > Please share namespace details, like how to use them, etc. > Would it be RBD, Rados, CephFS? What would you be us

Re: [ceph-users] how to debug slow requests

2019-07-24 Thread Wido den Hollander
On 7/20/19 6:06 PM, Wei Zhao wrote: > Hi ceph users: > I was doing write benchmark, and found some io will be blocked for a > very long time. The following log is one op , it seems to wait for > replica to finish. My ceph version is 12.2.4, and the pool is 3+2 EC . > Does anyone give me some ad

Re: [ceph-users] MDS fails repeatedly while handling many concurrent meta data operations

2019-07-24 Thread Yan, Zheng
On Wed, Jul 24, 2019 at 3:13 PM Janek Bevendorff < janek.bevendo...@uni-weimar.de> wrote: > > which version? > > Nautilus, 14.2.2. > I mean kernel version > try mounting cephfs on a machine/vm with small memory (4G~8G), then rsync > your date into mount point of that machine. > > I could try ru

Re: [ceph-users] Nautilus:14.2.2 Legacy BlueStore stats reporting detected

2019-07-24 Thread nokia ceph
Hi Team, I guess this warning will not appear for clusters freshly installed with Nautilus and only affects upgraded systems. Please let us know whether disabling bluestore_warn_on_legacy_statfs is the only option for upgraded clusters. Thanks, Muthu On Fri, Jul 19, 2019 at 5:22 PM Paul Emmerich wrote: > blues
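
A sketch of the two ways an upgraded cluster can handle this warning, assuming the option and tool names from the 14.2.2 release notes (the repair has to be run per OSD while that OSD is stopped; the path is a placeholder):

    # option 1: silence the warning cluster-wide
    ceph config set global bluestore_warn_on_legacy_statfs false
    # option 2: convert each OSD's statfs data in place
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-12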

Re: [ceph-users] MDS fails repeatedly while handling many concurrent meta data operations

2019-07-24 Thread Janek Bevendorff
which version? Nautilus, 14.2.2. try mounting cephfs on a machine/vm with small memory (4G~8G), then rsync your date into mount point of that machine. I could try running it in a memory-limited Docker container, but isn't there a better way to achieve the same thing? This sounds like a bug

Re: [ceph-users] MDS fails repeatedly while handling many concurrent meta data operations

2019-07-24 Thread Yan, Zheng
On Wed, Jul 24, 2019 at 1:58 PM Janek Bevendorff < janek.bevendo...@uni-weimar.de> wrote: > Ceph-fuse ? > > No, I am using the kernel module. > > which version? > > Was there "Client xxx failing to respond to cache pressure" health warning? > > > At first, yes (at least with the Mimic client). T