Re: [ceph-users] OSD's hang after network blip

2020-01-16 Thread Nick Fisk
On Thursday, January 16, 2020 09:15 GMT, Dan van der Ster wrote: > Hi Nick, > > We saw the exact same problem yesterday after a network outage -- a few of > our down OSDs were stuck down until we restarted their processes. > > -- Dan > > > On Wed, Jan 15, 2020

Re: [ceph-users] OSD's hang after network blip

2020-01-15 Thread Nick Fisk
On Wednesday, January 15, 2020 14:37 GMT, "Nick Fisk" wrote: > Hi All, > > Running 14.2.5, currently experiencing some network blips isolated to a > single rack which is under investigation. However, it appears following a > network blip, random OSD's in unaf

[ceph-users] OSD's hang after network blip

2020-01-15 Thread Nick Fisk
Hi All, Running 14.2.5, currently experiencing some network blips isolated to a single rack which is under investigation. However, it appears following a network blip, random OSD's in unaffected racks are sometimes not recovering from the incident and are left running in a zombie state.

Re: [ceph-users] [Bluestore] Some of my osd's uses BlueFS slow storage for db - why?

2019-02-25 Thread Nick Fisk
> -Original Message- > From: Vitaliy Filippov > Sent: 23 February 2019 20:31 > To: n...@fisk.me.uk; Serkan Çoban > Cc: ceph-users > Subject: Re: [ceph-users] [Bluestore] Some of my osd's uses BlueFS slow > storage for db - why? > > X-Assp-URIBL failed: 'yourcmc.ru'(black.uribl.com ) >

Re: [ceph-users] [Bluestore] Some of my osd's uses BlueFS slow storage for db - why?

2019-02-25 Thread Nick Fisk
> -Original Message- > From: Konstantin Shalygin > Sent: 22 February 2019 14:23 > To: Nick Fisk > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] [Bluestore] Some of my osd's uses BlueFS slow > storage for db - why? > > Bluestore/RocksDB

Re: [ceph-users] Bluestore HDD Cluster Advice

2019-02-22 Thread Nick Fisk
>Yes and no... bluestore seems to not work really optimal. For example, >it has no filestore-like journal waterlining and flushes the deferred >write queue just every 32 writes (deferred_batch_ops). And when it does >that it's basically waiting for the HDD to commit and slowing down all >further wr

Re: [ceph-users] [Bluestore] Some of my osd's uses BlueFS slow storage for db - why?

2019-02-22 Thread Nick Fisk
>On 2/16/19 12:33 AM, David Turner wrote: >> The answer is probably going to be in how big your DB partition is vs >> how big your HDD disk is. From your output it looks like you have a >> 6TB HDD with a 28GB Blocks.DB partition. Even though the DB used >> size isn't currently full, I would gu

Re: [ceph-users] slow_used_bytes - SlowDB being used despite lots of space free in BlockDB on SSD?

2018-10-30 Thread Nick Fisk
> > >> On 10/18/2018 7:49 PM, Nick Fisk wrote: > > >>> Hi, > > >>> > > >>> Ceph Version = 12.2.8 > > >>> 8TB spinner with 20G SSD partition > > >>> > > >>> Perf dump shows the followin

Re: [ceph-users] slow_used_bytes - SlowDB being used despite lots of space free in BlockDB on SSD?

2018-10-20 Thread Nick Fisk
> >> On 10/18/2018 7:49 PM, Nick Fisk wrote: > >>> Hi, > >>> > >>> Ceph Version = 12.2.8 > >>> 8TB spinner with 20G SSD partition > >>> > >>> Perf dump shows the following: > >>> > >>> "b

Re: [ceph-users] slow_used_bytes - SlowDB being used despite lots of space free in BlockDB on SSD?

2018-10-19 Thread Nick Fisk
> -Original Message- > From: Nick Fisk [mailto:n...@fisk.me.uk] > Sent: 19 October 2018 08:15 > To: 'Igor Fedotov' ; ceph-users@lists.ceph.com > Subject: RE: [ceph-users] slow_used_bytes - SlowDB being used despite lots of > space free in BlockDB on SSD?

Re: [ceph-users] slow_used_bytes - SlowDB being used despite lots of space free in BlockDB on SSD?

2018-10-19 Thread Nick Fisk
> > On 10/18/2018 7:49 PM, Nick Fisk wrote: > > Hi, > > > > Ceph Version = 12.2.8 > > 8TB spinner with 20G SSD partition > > > > Perf dump shows the following: > > > > "bluefs": { > > "gift_bytes": 0,

[ceph-users] slow_used_bytes - SlowDB being used despite lots of space free in BlockDB on SSD?

2018-10-18 Thread Nick Fisk
Hi, Ceph Version = 12.2.8 8TB spinner with 20G SSD partition Perf dump shows the following: "bluefs": { "gift_bytes": 0, "reclaim_bytes": 0, "db_total_bytes": 21472731136, "db_used_bytes": 3467640832, "wal_total_bytes": 0, "wal_used_bytes": 0,
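For anyone wanting to pull the same counters, this output comes from the OSD admin socket; a minimal sketch (osd.0 is a placeholder ID, not one from the thread):

  # Dump just the bluefs counters from a running OSD
  ceph daemon osd.0 perf dump bluefs
  # db_used_bytes vs db_total_bytes shows how much of the SSD DB partition is in use;
  # a non-zero slow_used_bytes means RocksDB has spilled onto the slow (HDD) device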

Re: [ceph-users] Bluestore DB size and onode count

2018-09-10 Thread Nick Fisk
2 PM, Igor Fedotov wrote: > > > Hi Nick. > > > > > > On 9/10/2018 1:30 PM, Nick Fisk wrote: > >> If anybody has 5 minutes could they just clarify a couple of things > >> for me > >> > >> 1. onode count, should this be equal to the number of

[ceph-users] Bluestore DB size and onode count

2018-09-10 Thread Nick Fisk
If anybody has 5 minutes could they just clarify a couple of things for me 1. onode count, should this be equal to the number of objects stored on the OSD? Through reading several posts, there seems to be a general indication that this is the case, but looking at my OSD's the maths don't work. E
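A rough way to do the comparison being asked about, with pool name and OSD ID as placeholders (not taken from the thread):

  # Onode counters reported by a BlueStore OSD; these generally reflect onodes
  # currently held in cache, which is one reason they may not match object counts
  ceph daemon osd.0 perf dump | grep -i onode
  # Total objects in a pool, for comparison (can be slow on large pools)
  rados -p rbd ls | wc -l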

[ceph-users] Tiering stats are blank on Bluestore OSD's

2018-09-10 Thread Nick Fisk
After upgrading a number of OSD's to Bluestore I have noticed that the cache tier OSD's which have so far been upgraded are no longer logging tier_* stats "tier_promote": 0, "tier_flush": 0, "tier_flush_fail": 0, "tier_try_flush": 0, "tier_try_flush_fail":

Re: [ceph-users] help needed

2018-09-06 Thread Nick Fisk
If it helps, I’m seeing about a 3GB DB usage for a 3TB OSD about 60% full. This is with a pure RBD workload, I believe this can vary depending on what your Ceph use case is. From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of David Turner Sent: 06 September 2018 14:09 To

Re: [ceph-users] CephFS+NFS For VMWare

2018-07-02 Thread Nick Fisk
Quoting Ilya Dryomov : On Fri, Jun 29, 2018 at 8:08 PM Nick Fisk wrote: This is for us peeps using Ceph with VMWare. My current favoured solution for consuming Ceph in VMWare is via RBD’s formatted with XFS and exported via NFS to ESXi. This seems to perform better than iSCSI+VMFS

Re: [ceph-users] CephFS+NFS For VMWare

2018-06-30 Thread Nick Fisk
greater concern. Thanks, Nick From: Paul Emmerich [mailto:paul.emmer...@croit.io] Sent: 29 June 2018 17:57 To: Nick Fisk Cc: ceph-users Subject: Re: [ceph-users] CephFS+NFS For VMWare VMWare can be quite picky about NFS servers. Some things that you should test before deploying

[ceph-users] CephFS+NFS For VMWare

2018-06-29 Thread Nick Fisk
This is for us peeps using Ceph with VMWare. My current favoured solution for consuming Ceph in VMWare is via RBD's formatted with XFS and exported via NFS to ESXi. This seems to perform better than iSCSI+VMFS which seems to not play nicely with Ceph's PG contention issues particularly if wor
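The export chain described here looks roughly like the following on the NFS gateway; the image, mount point, and subnet are placeholders, and the sync export option means the gateway only acks writes once they are stable on the Ceph side:

  # Map the RBD image on the gateway (pool/image names are placeholders)
  rbd map rbd/vmware-ds1          # returns a device such as /dev/rbd0
  mkfs.xfs /dev/rbd0
  mkdir -p /export/vmware-ds1
  mount /dev/rbd0 /export/vmware-ds1
  # Export to the ESXi hosts, which mount it as an NFS datastore
  echo '/export/vmware-ds1 10.0.0.0/24(rw,no_root_squash,sync)' >> /etc/exports
  exportfs -ra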

Re: [ceph-users] FAILED assert(p != recovery_info.ss.clone_snaps.end())

2018-06-14 Thread Nick Fisk
ts.ceph.com] On Behalf Of Nick Fisk Sent: 07 June 2018 14:01 To: 'ceph-users' Subject: Re: [ceph-users] FAILED assert(p != recovery_info.ss.clone_snaps.end()) So I've recompiled a 12.2.5 ceph-osd binary with the fix included in https://github.com/ceph/ceph/pull/22396 The OSD has resta

Re: [ceph-users] How to fix a Ceph PG in unkown state with no OSDs?

2018-06-14 Thread Nick Fisk
I’ve seen similar things happen if you tend to end up with extreme weighting towards a small set of OSD’s. Crush tries a slightly different combination of OSD’s at each attempt, but in an extremely lopsided weighting, it can run out of attempts before it finds a set of OSD’s which mat
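The number of attempts CRUSH makes is governed by the choose_total_tries tunable; inspecting or raising it for a badly skewed tree looks roughly like this (file names are placeholders):

  ceph osd getcrushmap -o crush.bin
  crushtool -d crush.bin -o crush.txt
  grep choose_total_tries crush.txt     # typically 50 with modern tunables
  # After editing the value in crush.txt, recompile and inject the map
  crushtool -c crush.txt -o crush.new
  ceph osd setcrushmap -i crush.new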

Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)

2018-06-08 Thread Nick Fisk
http://docs.ceph.com/docs/master/ceph-volume/simple/ ? From: ceph-users On Behalf Of Konstantin Shalygin Sent: 08 June 2018 11:11 To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access) Wh

Re: [ceph-users] FAILED assert(p != recovery_info.ss.clone_snaps.end())

2018-06-07 Thread Nick Fisk
sing the object-store-tool, but not sure if I want to clean the clone metadata or try and remove the actual snapshot object. -Original Message- From: ceph-users On Behalf Of Nick Fisk Sent: 05 June 2018 17:22 To: 'ceph-users' Subject: Re: [ceph-users] FAILED assert(p != recover

Re: [ceph-users] FAILED assert(p != recovery_info.ss.clone_snaps.end())

2018-06-05 Thread Nick Fisk
snapshot object and then allow things to backfill? -Original Message- From: ceph-users On Behalf Of Nick Fisk Sent: 05 June 2018 16:43 To: 'ceph-users' Subject: [ceph-users] FAILED assert(p != recovery_info.ss.clone_snaps.end()) Hi, After a RBD snapshot was removed, I

Re: [ceph-users] FAILED assert(p != recovery_info.ss.clone_snaps.end())

2018-06-05 Thread Nick Fisk
From: ceph-users On Behalf Of Paul Emmerich Sent: 05 June 2018 17:02 To: n...@fisk.me.uk Cc: ceph-users Subject: Re: [ceph-users] FAILED assert(p != recovery_info.ss.clone_snaps.end()) 2018-06-05 17:42 GMT+02:00 Nick Fisk mailto:n...@fisk.me.uk> >: Hi, After a RBD snapsh

[ceph-users] FAILED assert(p != recovery_info.ss.clone_snaps.end())

2018-06-05 Thread Nick Fisk
Hi, After a RBD snapshot was removed, I seem to be having OSD's assert when they try and recover pg 1.2ca. The issue seems to follow the PG around as OSD's fail. I've seen this bug tracker and associated mailing list post, but would appreciate if anyone can give any pointers. https://tracker.cep
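For anyone hitting something similar, a reasonable first step is to look at the affected PG's state before reaching for ceph-objectstore-tool; a sketch using the PG mentioned in this thread:

  ceph pg 1.2ca query                    # peering/recovery detail for the PG
  ceph pg map 1.2ca                      # which OSDs it currently maps to
  ceph health detail | grep -i unfound   # whether any objects are flagged unfound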

Re: [ceph-users] Intel Xeon Scalable and CPU frequency scaling on NVMe/SSD Ceph OSDs

2018-05-14 Thread Nick Fisk
Intel Xeon Scalable and CPU frequency scaling on NVMe/SSD Ceph OSDs On 05/01/2018 10:19 PM, Nick Fisk wrote: > 4.16 required? > https://www.phoronix.com/scan.php?page=news_item&px=Skylake-X-P-State- > Linux- > 4.16 > I've been trying with the 4.16 kernel for the last

[ceph-users] Scrubbing impacting write latency since Luminous

2018-05-10 Thread Nick Fisk
Hi All, I've just upgraded our main cluster to Luminous and have noticed that where before the cluster 64k write latency was always hovering around 2ms regardless of what scrubbing was going on, since the upgrade to Luminous, scrubbing takes the average latency up to around 5-10ms and deep scrubbi
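For reference, these are the options usually tuned to soften scrub impact on client latency; the names are Luminous-era and the values below are only illustrative, not recommendations from the thread:

  # ceph.conf, [osd] section
  osd scrub sleep = 0.1        # pause between scrub chunks
  osd scrub chunk max = 5      # fewer objects per scrub chunk
  osd scrub priority = 1       # lower priority for scrub ops
  osd scrub begin hour = 22    # confine scrubbing to off-peak hours
  osd scrub end hour = 6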

Re: [ceph-users] Bluestore on HDD+SSD sync write latency experiences

2018-05-03 Thread Nick Fisk
case writing the IO's through the NVME first seems to help by quite a large margin. I'm curious what was the original rationale for 32kB? Cheers, Dan On Tue, May 1, 2018 at 10:50 PM, Nick Fisk wrote: Hi all, Slowly getting round to migrating clusters to Bluestore but I am i

Re: [ceph-users] Bluestore on HDD+SSD sync write latency experiences

2018-05-03 Thread Nick Fisk
Hi Nick, On 5/1/2018 11:50 PM, Nick Fisk wrote: Hi all, Slowly getting round to migrating clusters to Bluestore but I am interested in how people are handling the potential change in write latency coming from Filestore? Or maybe nobody is really seeing much difference? As we all know, in

Re: [ceph-users] Bluestore on HDD+SSD sync write latency experiences

2018-05-03 Thread Nick Fisk
-Original Message- From: Alex Gorbachev Sent: 02 May 2018 22:05 To: Nick Fisk Cc: ceph-users Subject: Re: [ceph-users] Bluestore on HDD+SSD sync write latency experiences Hi Nick, On Tue, May 1, 2018 at 4:50 PM, Nick Fisk wrote: > Hi all, > > > > Slowly getting rou

[ceph-users] Bluestore on HDD+SSD sync write latency experiences

2018-05-01 Thread Nick Fisk
Hi all, Slowly getting round to migrating clusters to Bluestore but I am interested in how people are handling the potential change in write latency coming from Filestore? Or maybe nobody is really seeing much difference? As we all know, in Bluestore, writes are not double written and in mo
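The behaviour under discussion hinges on when BlueStore defers a write (commits it to the RocksDB WAL and acks, flushing to the data device later); the cut-off is a config option, shown with what I believe is the Luminous-era default:

  # ceph.conf, [osd] section -- default shown; raising it trades extra WAL/SSD
  # traffic and wear for lower sync-write latency on HDD-backed OSDs
  bluestore prefer deferred size hdd = 32768   # writes <= 32kB are deferred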

Re: [ceph-users] Intel Xeon Scalable and CPU frequency scaling on NVMe/SSD Ceph OSDs

2018-05-01 Thread Nick Fisk
4.16 required? https://www.phoronix.com/scan.php?page=news_item&px=Skylake-X-P-State-Linux- 4.16 -Original Message- From: ceph-users On Behalf Of Blair Bethwaite Sent: 01 May 2018 16:46 To: Wido den Hollander Cc: ceph-users ; Nick Fisk Subject: Re: [ceph-users] Intel Xeon Scalable

Re: [ceph-users] pgs down after adding 260 OSDs & increasing PGs

2018-01-29 Thread Nick Fisk
Hi Jake, I suspect you have hit an issue that me and a few others have hit in Luminous. By increasing the number of PG's before all the data has re-balanced, you have probably exceeded the hard PG per OSD limit. See this thread https://www.spinics.net/lists/ceph-users/msg41231.html Nick > -Orig
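The limit referred to is the Luminous per-OSD PG cap; checking and temporarily relaxing it looks roughly like this (defaults quoted from memory, the raised value is illustrative and may need an OSD restart to take effect):

  ceph osd df                                          # PG count per OSD
  ceph daemon mon.<id> config get mon_max_pg_per_osd
  # Default mon_max_pg_per_osd (200) x osd_max_pg_per_osd_hard_ratio (2.0)
  # gives a hard cap of ~400 PGs/OSD, beyond which new PGs refuse to go active
  ceph tell osd.* injectargs '--osd_max_pg_per_osd_hard_ratio 4'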

Re: [ceph-users] BlueStore.cc: 9363: FAILED assert(0 == "unexpected error")

2018-01-26 Thread Nick Fisk
I can see this in the logs: 2018-01-25 06:05:56.292124 7f37fa6ea700 -1 log_channel(cluster) log [ERR] : full status failsafe engaged, dropping updates, now 101% full 2018-01-25 06:05:56.325404 7f3803f9c700 -1 bluestore(/var/lib/ceph/osd/ceph-9) _do_alloc_write failed to reserve 0x4000 2018-

Re: [ceph-users] OSD servers swapping despite having free memory capacity

2018-01-24 Thread Nick Fisk
I know this may be a bit vague, but also suggests the "try a newer kernel" approach. We had constant problems with hosts mounting a number of RBD volumes formatted with XFS. The servers would start aggressively swapping even though the actual memory in use was nowhere near even 50% and eventuall

Re: [ceph-users] What is the should be the expected latency of 10Gbit network connections

2018-01-22 Thread Nick Fisk
Anyone with 25G ethernet willing to do the test? Would love to see what the latency figures are for that. From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Maged Mokhtar Sent: 22 January 2018 11:28 To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] What is the shou
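For anyone wanting to contribute figures, one simple way to measure this between two cluster nodes (host is a placeholder; intervals under 0.2s need root):

  ping -c 1000 -i 0.01 -q <other-node>
  # the rtt min/avg/max/mdev summary line is the latency usually quoted (in ms)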

Re: [ceph-users] Ubuntu 17.10 or Debian 9.3 + Luminous = random OS hang ?

2018-01-21 Thread Nick Fisk
How up to date is your VM environment? We saw something very similar last year with Linux VM’s running newish kernels. It turns out newer kernels supported a new feature of the vmxnet3 adapters which had a bug in ESXi. The fix was released last year some time in ESXi 6.5 U1, or a workaround was to

Re: [ceph-users] Cluster crash - FAILED assert(interval.last > last)

2018-01-11 Thread Nick Fisk
I take my hat off to you, well done for solving that!!! > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Zdenek Janda > Sent: 11 January 2018 13:01 > To: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] Cluster crash - FAILED assert(int

[ceph-users] Linux Meltdown (KPTI) fix and how it affects performance?

2018-01-04 Thread Nick Fisk
Hi All, As the KPTI fix largely only affects performance where there are a large number of syscalls made, which Ceph does a lot of, I was wondering if anybody has had a chance to perform any initial tests. I suspect small write latencies will be the worst affected? Although I'm thinking the back

Re: [ceph-users] Cache tiering on Erasure coded pools

2017-12-27 Thread Nick Fisk
Also carefully read the word of caution section on David's link (which is absent in the jewel version of the docs), a cache tier in front of an erasure coded data pool for RBD is almost always a bad idea. I would say that statement is incorrect if using Bluestore. If using Bluestore, small

Re: [ceph-users] Bluestore Compression not inheriting pool option

2017-12-13 Thread Nick Fisk
Thanks for confirming, logged http://tracker.ceph.com/issues/22419 > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Stefan Kooman > Sent: 12 December 2017 20:35 > To: Nick Fisk > Cc: ceph-users@lists.ceph.com > Sub

Re: [ceph-users] Odd object blocking IO on PG

2017-12-13 Thread Nick Fisk
alf Of Nick Fisk Sent: 13 December 2017 11:14 To: 'Gregory Farnum' Cc: 'ceph-users' Subject: Re: [ceph-users] Odd object blocking IO on PG On Tue, Dec 12, 2017 at 12:33 PM Nick Fisk mailto:n...@fisk.me.uk> > wrote: > That doesn't look like an RB

Re: [ceph-users] Health Error : Request Stuck

2017-12-13 Thread Nick Fisk
onest, not exactly sure its the correct way. P.S : I had upgraded to Luminous 12.2.2 yesterday. Karun Josy On Wed, Dec 13, 2017 at 4:31 PM, Nick Fisk mailto:n...@fisk.me.uk> > wrote: Hi Karun, I too am experiencing something very similar with a PG stuck in activatin

Re: [ceph-users] Odd object blocking IO on PG

2017-12-13 Thread Nick Fisk
On Tue, Dec 12, 2017 at 12:33 PM Nick Fisk mailto:n...@fisk.me.uk> > wrote: > That doesn't look like an RBD object -- any idea who is > "client.34720596.1:212637720"? So I think these might be proxy ops from the cache tier, as there are also block ops on one of the

Re: [ceph-users] Health Error : Request Stuck

2017-12-13 Thread Nick Fisk
Hi Karun, I too am experiencing something very similar with a PG stuck in activating+remapped state after re-introducing a OSD back into the cluster as Bluestore. Although this new OSD is not the one listed against the PG’s stuck activating. I also see the same thing as you where the up set

Re: [ceph-users] Odd object blocking IO on PG

2017-12-12 Thread Nick Fisk
ached) is not showing in the main status that it has been blocked from peering or that there are any missing objects. I've tried restarting all OSD's I can see relating to the PG in case they needed a bit of a nudge. > > On Tue, Dec 12, 2017 at 12:36 PM, Nick Fisk wrote: > >

[ceph-users] Bluestore Compression not inheriting pool option

2017-12-12 Thread Nick Fisk
Hi All, Has anyone been testing the bluestore pool compression option? I have set compression=snappy on a RBD pool. When I add a new bluestore OSD, data is not being compressed when backfilling, confirmed by looking at the perf dump results. If I then set the compression type on the pool again to
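The pool-level settings being tested look like this, with the pool name as a placeholder; the OSD perf counters are where compression actually shows up:

  ceph osd pool set rbd compression_algorithm snappy
  ceph osd pool set rbd compression_mode aggressive
  # Check what an individual OSD has actually compressed
  ceph daemon osd.0 perf dump | grep -i compress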

[ceph-users] Odd object blocking IO on PG

2017-12-12 Thread Nick Fisk
Does anyone know what this object (0.ae78c1cf) might be, it's not your normal run of the mill RBD object and I can't seem to find it in the pool using rados --all ls . It seems to be leaving the 0.1cf PG stuck in an activating+remapped state and blocking IO. Pool 0 is just a pure RBD pool with a ca

Re: [ceph-users] what's the maximum number of OSDs per OSD server?

2017-12-10 Thread Nick Fisk
software? Just make sure you size the nodes to a point that if one has to be taken offline for any reason, that you are happy with the resulting state of the cluster, including the peering when suddenly taking ~200 OSD’s offline/online. Nick On Sun, Dec 10, 2017 at 11:17 AM, Nic

Re: [ceph-users] what's the maximum number of OSDs per OSD server?

2017-12-10 Thread Nick Fisk
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Igor Mendelev Sent: 10 December 2017 15:39 To: ceph-users@lists.ceph.com Subject: [ceph-users] what's the maximum number of OSDs per OSD server? Given that servers with 64 CPU cores (128 threads @ 2.7GHz) and up to 2TB RA

Re: [ceph-users] ceph all-nvme mysql performance tuning

2017-11-27 Thread Nick Fisk
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of German Anders Sent: 27 November 2017 14:44 To: Maged Mokhtar Cc: ceph-users Subject: Re: [ceph-users] ceph all-nvme mysql performance tuning Hi Maged, Thanks a lot for the response. We try with different number of t

Re: [ceph-users] Bluestore performance 50% of filestore

2017-11-18 Thread Nick Fisk
le, say 3-4x over the total amount of RAM > in all of the nodes, helps you get a better idea of what the behavior is like > when those tricks are less effective. I think that's probably a more likely > scenario in most production environments, but it's up to you which worklo

Re: [ceph-users] bluestore - wal,db on faster devices?

2017-11-08 Thread Nick Fisk
> -Original Message- > From: Mark Nelson [mailto:mnel...@redhat.com] > Sent: 08 November 2017 21:42 > To: n...@fisk.me.uk; 'Wolfgang Lendl' > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] bluestore - wal,db on faster devices? > > > >

Re: [ceph-users] Blog post: storage server power consumption

2017-11-08 Thread Nick Fisk
Also look at the new WD 10TB Red's if you want very low use archive storage. Because they spin at 5400, they only use 2.8W at idle. > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Jack > Sent: 06 November 2017 22:31 > To: ceph-users@lists.c

Re: [ceph-users] Recovery operations and ioprio options

2017-11-08 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > ??? ??? > Sent: 08 November 2017 16:21 > To: ceph-users@lists.ceph.com > Subject: [ceph-users] Recovery operations and ioprio options > > Hello, > Today we use ceph jewel with: > osd

Re: [ceph-users] bluestore - wal,db on faster devices?

2017-11-08 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Mark Nelson > Sent: 08 November 2017 19:46 > To: Wolfgang Lendl > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] bluestore - wal,db on faster devices? > > Hi Wolfgang, > > You've

Re: [ceph-users] VMware + Ceph using NFS sync/async ?

2017-08-16 Thread Nick Fisk
Hi Matt, Well behaved applications are the problem here. ESXi sends all writes as sync writes. So although OS’s will still do their own buffering, any ESXi level operation is all done as sync. This is probably seen the greatest when migrating vm’s between datastores, everything gets done as

Re: [ceph-users] VMware + Ceph using NFS sync/async ?

2017-08-14 Thread Nick Fisk
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Osama Hasebou Sent: 14 August 2017 12:27 To: ceph-users Subject: [ceph-users] VMware + Ceph using NFS sync/async ? Hi Everyone, We started testing the idea of using Ceph storage with VMware, the idea was to provide Ce

Re: [ceph-users] luminous/bluetsore osd memory requirements

2017-08-14 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Ronny Aasen > Sent: 14 August 2017 18:55 > To: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] luminous/bluetsore osd memory requirements > > On 10.08.2017 17:30, Gregory Farnum wrote: >

Re: [ceph-users] luminous/bluetsore osd memory requirements

2017-08-13 Thread Nick Fisk
Sat, Aug 12, 2017, 2:40 PM Nick Fisk mailto:n...@fisk.me.uk> > wrote: I was under the impression the memory requirements for Bluestore would be around 2-3GB per OSD regardless of capacity. CPU wise, I would lean towards working out how much total Ghz you require and then get whatever CPU yo

Re: [ceph-users] luminous/bluetsore osd memory requirements

2017-08-12 Thread Nick Fisk
I was under the impression the memory requirements for Bluestore would be around 2-3GB per OSD regardless of capacity. CPU wise, I would lean towards working out how much total Ghz you require and then get whatever CPU you need to get there, but with a preference of Ghz over cores. Yes, there will

Re: [ceph-users] ceph cluster experiencing major performance issues

2017-08-08 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Mclean, Patrick > Sent: 08 August 2017 20:13 > To: David Turner ; ceph-us...@ceph.com > Cc: Colenbrander, Roelof ; Payno, > Victor ; Yip, Rae > Subject: Re: [ceph-users] ceph cluster experie

Re: [ceph-users] Kernel mounted RBD's hanging

2017-07-31 Thread Nick Fisk
> -Original Message- > From: Ilya Dryomov [mailto:idryo...@gmail.com] > Sent: 31 July 2017 11:36 > To: Nick Fisk > Cc: Ceph Users > Subject: Re: [ceph-users] Kernel mounted RBD's hanging > > On Thu, Jul 13, 2017 at 12:54 PM, Ilya Dryomov wrote: > &

Re: [ceph-users] RBD cache being filled up in small increases instead of 4MB

2017-07-15 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Gregory Farnum > Sent: 15 July 2017 00:09 > To: Ruben Rodriguez > Cc: ceph-users > Subject: Re: [ceph-users] RBD cache being filled up in small increases instead > of 4MB > > On Fri, Jul 14,

Re: [ceph-users] Ceph mount rbd

2017-07-14 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Jason Dillaman > Sent: 14 July 2017 16:40 > To: li...@marcelofrota.info > Cc: ceph-users > Subject: Re: [ceph-users] Ceph mount rbd > > On Fri, Jul 14, 2017 at 9:44 AM, wrote: > > Gonzal

Re: [ceph-users] Kernel mounted RBD's hanging

2017-07-12 Thread Nick Fisk
> -Original Message- > From: Nick Fisk [mailto:n...@fisk.me.uk] > Sent: 12 July 2017 13:47 > To: 'Ilya Dryomov' > Cc: 'Ceph Users' > Subject: RE: [ceph-users] Kernel mounted RBD's hanging > > > -Original Message- > >

Re: [ceph-users] Kernel mounted RBD's hanging

2017-07-08 Thread Nick Fisk
> -Original Message- > From: Ilya Dryomov [mailto:idryo...@gmail.com] > Sent: 07 July 2017 11:32 > To: Nick Fisk > Cc: Ceph Users > Subject: Re: [ceph-users] Kernel mounted RBD's hanging > > On Fri, Jul 7, 2017 at 12:10 PM, Nick Fisk wrote: > > M

Re: [ceph-users] Kernel mounted RBD's hanging

2017-07-07 Thread Nick Fisk
> -Original Message- > From: Ilya Dryomov [mailto:idryo...@gmail.com] > Sent: 01 July 2017 13:19 > To: Nick Fisk > Cc: Ceph Users > Subject: Re: [ceph-users] Kernel mounted RBD's hanging > > On Sat, Jul 1, 2017 at 9:29 AM, Nick Fisk wrote: > >>

Re: [ceph-users] Kernel mounted RBD's hanging

2017-07-01 Thread Nick Fisk
> -Original Message- > From: Ilya Dryomov [mailto:idryo...@gmail.com] > Sent: 30 June 2017 14:06 > To: Nick Fisk > Cc: Ceph Users > Subject: Re: [ceph-users] Kernel mounted RBD's hanging > > On Fri, Jun 30, 2017 at 2:14 PM, Nick Fisk wrote: >

Re: [ceph-users] Kernel mounted RBD's hanging

2017-06-30 Thread Nick Fisk
> -Original Message- > From: Ilya Dryomov [mailto:idryo...@gmail.com] > Sent: 29 June 2017 18:54 > To: Nick Fisk > Cc: Ceph Users > Subject: Re: [ceph-users] Kernel mounted RBD's hanging > > On Thu, Jun 29, 2017 at 6:22 PM, Nick Fisk wrote: > >>

Re: [ceph-users] Kernel mounted RBD's hanging

2017-06-30 Thread Nick Fisk
From: Alex Gorbachev [mailto:a...@iss-integration.com] Sent: 30 June 2017 03:54 To: Ceph Users ; n...@fisk.me.uk Subject: Re: [ceph-users] Kernel mounted RBD's hanging On Thu, Jun 29, 2017 at 10:30 AM Nick Fisk mailto:n...@fisk.me.uk> > wrote: Hi All, Putting out a call for

Re: [ceph-users] Kernel mounted RBD's hanging

2017-06-29 Thread Nick Fisk
> -Original Message- > From: Ilya Dryomov [mailto:idryo...@gmail.com] > Sent: 29 June 2017 16:58 > To: Nick Fisk > Cc: Ceph Users > Subject: Re: [ceph-users] Kernel mounted RBD's hanging > > On Thu, Jun 29, 2017 at 4:30 PM, Nick Fisk wrote: > > Hi

[ceph-users] Kernel mounted RBD's hanging

2017-06-29 Thread Nick Fisk
Hi All, Putting out a call for help to see if anyone can shed some light on this. Configuration: Ceph cluster presenting RBD's->XFS->NFS->ESXi Running 10.2.7 on the OSD's and 4.11 kernel on the NFS gateways in a pacemaker cluster Both OSD's and clients go into a pair of switches, single L2 do

Re: [ceph-users] Ceph random read IOPS

2017-06-26 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Willem Jan Withagen > Sent: 26 June 2017 14:35 > To: Christian Wuerdig > Cc: Ceph Users > Subject: Re: [ceph-users] Ceph random read IOPS > > On 26-6-2017 09:01, Christian Wuerdig wrote: > >

Re: [ceph-users] Ceph random read IOPS

2017-06-24 Thread Nick Fisk
Apologies for the top post, I can't seem to break indents on my phone. Anyway the point of that test was as maged suggests to show the effect of serial CPU speed on latency. IO is effectively serialised by the pg lock, and so trying to reduce the time spent in this area is key. Fast cpu, fast ne

Re: [ceph-users] VMware + CEPH Integration

2017-06-22 Thread Nick Fisk
> -Original Message- > From: Adrian Saul [mailto:adrian.s...@tpgtelecom.com.au] > Sent: 19 June 2017 06:54 > To: n...@fisk.me.uk; 'Alex Gorbachev' > Cc: 'ceph-users' > Subject: RE: [ceph-users] VMware + CEPH Integration > > > Hi Alex, > > > > Have you experienced any problems with timeou

Re: [ceph-users] VMware + CEPH Integration

2017-06-17 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Alex Gorbachev > Sent: 16 June 2017 01:48 > To: Osama Hasebou > Cc: ceph-users > Subject: Re: [ceph-users] VMware + CEPH Integration > > On Thu, Jun 15, 2017 at 5:29 AM, Osama Hasebou > wro

Re: [ceph-users] 2x replica with NVMe

2017-06-08 Thread Nick Fisk
Bluestore will make 2x Replica’s “safer” to use in theory. Until Bluestore is in use in the wild, I don’t think anyone can give any guarantees. From: i...@witeq.com [mailto:i...@witeq.com] Sent: 08 June 2017 14:32 To: nick Cc: Vy Nguyen Tan ; ceph-users Subject: Re: [ceph-users] 2x replic

Re: [ceph-users] 2x replica with NVMe

2017-06-08 Thread Nick Fisk
There are two main concerns with using 2x replicas, recovery speed and coming across inconsistent objects. With spinning disks their size to access speed means recovery can take a long time and increases the chance that additional failures may happen during the recovery process. NVME will re

Re: [ceph-users] Changing SSD Landscape

2017-05-18 Thread Nick Fisk
data before next year, you're a > > lot braver than me. > > An early adoption scheme with Bluestore nodes being in their own > > failure domain (rack) would be the best I could see myself doing in my > > generic cluster. > > For the 2 mission critical produ

Re: [ceph-users] Changing SSD Landscape

2017-05-17 Thread Nick Fisk
Hi Dan, > -Original Message- > From: Dan van der Ster [mailto:d...@vanderster.com] > Sent: 17 May 2017 10:29 > To: Nick Fisk > Cc: ceph-users > Subject: Re: [ceph-users] Changing SSD Landscape > > I am currently pricing out some DCS3520's, for OSDs. Word

[ceph-users] Changing SSD Landscape

2017-05-17 Thread Nick Fisk
Hi All, There seems to be a shift in enterprise SSD products to larger less write intensive products and generally costing more than what the existing P/S 3600/3700 ranges were. For example the new Intel NVME P4600 range seems to start at 2TB. Although I mention Intel products, this seems to be

Re: [ceph-users] Intel power tuning - 30% throughput performance increase

2017-05-03 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Blair Bethwaite > Sent: 03 May 2017 09:53 > To: Dan van der Ster > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] Intel power tuning - 30% throughput performance > increase > > O

Re: [ceph-users] Maintaining write performance under a steady intake of small objects

2017-05-01 Thread Nick Fisk
Hi Patrick, Is there any chance that you can graph the XFS stats to see if there is an increase in inode/dentry cache misses as the ingest performance drops off? At least that might confirm the issue. Only other thing I can think of would be to try running the OSD’s on top of something l
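A rough way to watch for the cache behaviour suggested here on a filestore node; paths are standard Linux/XFS ones, though exact counter layout varies by kernel:

  cat /proc/sys/fs/dentry-state            # VFS dentry cache state
  grep ^ig /proc/fs/xfs/stat               # XFS inode-get attempts/hits/misses
  # Slab usage of XFS inodes and dentries over time (needs root)
  watch -n5 "grep -E 'xfs_inode|dentry' /proc/slabinfo"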

Re: [ceph-users] slow requests and short OSD failures in small cluster

2017-04-20 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jogi > Hofmüller > Sent: 20 April 2017 13:51 > To: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] slow requests and short OSD failures in small > cluster > > Hi, > > Am Dienstag, den 1

Re: [ceph-users] Mon not starting after upgrading to 10.2.7

2017-04-12 Thread Nick Fisk
Dan van der Ster [mailto:d...@vanderster.com] > Sent: 12 April 2017 10:53 > To: Nick Fisk > Cc: ceph-users > Subject: Re: [ceph-users] Mon not starting after upgrading to 10.2.7 > > Can't help, but just wanted to say that the upgrade worked for us: > > # ceph health

[ceph-users] Mon not starting after upgrading to 10.2.7

2017-04-12 Thread Nick Fisk
Hi, I just upgraded one of my mons to 10.2.7 and it is now failing to start properly. What's really odd is all the mon specific commands are now missing from the admin socket. ceph --admin-daemon /var/run/ceph/ceph-mon.gp-ceph-mon2.asok help { "config diff": "dump diff of current config and d

Re: [ceph-users] Preconditioning an RBD image

2017-04-10 Thread Nick Fisk
To: n...@fisk.me.uk; 'ceph-users' > Subject: Re: [ceph-users] Preconditioning an RBD image > > On 03/25/17 23:01, Nick Fisk wrote: > > > >> I think I owe you another graph later when I put all my VMs on there > >> (probably finally fixed my rbd snapshot

Re: [ceph-users] rbd iscsi gateway question

2017-04-06 Thread Nick Fisk
> -Original Message- > From: David Disseldorp [mailto:dd...@suse.de] > Sent: 06 April 2017 14:06 > To: Nick Fisk > Cc: 'Maged Mokhtar' ; 'Brady Deetz' > ; 'ceph-users' > Subject: Re: [ceph-users] rbd iscsi gateway question > >

Re: [ceph-users] rbd iscsi gateway question

2017-04-06 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Maged Mokhtar > Sent: 06 April 2017 12:21 > To: Brady Deetz ; ceph-users > Subject: Re: [ceph-users] rbd iscsi gateway question > > The io hang (it is actually a pause not hang) is done by Ce

Re: [ceph-users] rbd iscsi gateway question

2017-04-06 Thread Nick Fisk
I assume Brady is referring to the death spiral LIO gets into with some initiators, including vmware, if an IO takes longer than about 10s. I haven’t heard of anything, and can’t see any changes, so I would assume this issue still remains. I would look at either SCST or NFS for now. From

Re: [ceph-users] Question about unfound objects

2017-03-30 Thread Nick Fisk
pes of issues, and it was exclusive to the 8TB OSDs. I'm not sure how that would cause such a problem, but it's an interesting data point. On Thu, 2017-03-30 at 17:33 +0100, Nick Fisk wrote: Hi Steve, If you can recreate or if you can remember the object name, it might be worth

Re: [ceph-users] Question about unfound objects

2017-03-30 Thread Nick Fisk
Hi Steve, If you can recreate or if you can remember the object name, it might be worth trying to run "ceph osd map" on the objects and see where it thinks they map to. And/or maybe pg query might show something? Nick From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behal
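For reference, the commands suggested take a pool and object name and report the calculated and acting placement; the names below are placeholders, not ones from the thread:

  ceph osd map rbd rbd_data.abc123.0000000000000001   # where CRUSH maps the object
  ceph pg 1.abc query                                 # peering detail for that PG
  ceph pg 1.abc list_missing                          # any unfound objects in it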

Re: [ceph-users] New hardware for OSDs

2017-03-28 Thread Nick Fisk
Hi Christian, > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Christian Balzer > Sent: 28 March 2017 00:59 > To: ceph-users@lists.ceph.com > Cc: Nick Fisk > Subject: Re: [ceph-users] New hardware for OSDs > >

Re: [ceph-users] New hardware for OSDs

2017-03-27 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Wido den Hollander > Sent: 27 March 2017 12:35 > To: ceph-users@lists.ceph.com; Christian Balzer > Subject: Re: [ceph-users] New hardware for OSDs > > > > Op 27 maart 2017 om 13:22 schreef C

Re: [ceph-users] Preconditioning an RBD image

2017-03-25 Thread Nick Fisk
3. I assume with 4.9 kernel you don't have the bcache fix which allows partitions. What method are you using to create OSDs? 4. As mentioned above any stats around percentage of MB/s that is hitting your cache device vs journal (assuming journal is 100% of IO). This is to calculate extra wea

Re: [ceph-users] cephfs cache tiering - hitset

2017-03-23 Thread Nick Fisk
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mike Lovell Sent: 20 March 2017 22:31 To: n...@fisk.me.uk Cc: Webert de Souza Lima ; ceph-users Subject: Re: [ceph-users] cephfs cache tiering - hitset On Mon, Mar 20, 2017 at 4:20 PM, Nick Fisk mailto:n

Re: [ceph-users] Preconditioning an RBD image

2017-03-23 Thread Nick Fisk
Hi Peter, Interesting graph. Out of interest, when you use bcache, do you then just leave the journal collocated on the combined bcache device and rely on the writeback to provide journal performance, or do you still create a separate partition on whatever SSD/NVME you use, effectively giving t

Re: [ceph-users] cephfs cache tiering - hitset

2017-03-20 Thread Nick Fisk
Just a few corrections, hope you don't mind > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Mike Lovell > Sent: 20 March 2017 20:30 > To: Webert de Souza Lima > Cc: ceph-users > Subject: Re: [ceph-users] cephfs cache tiering - hitset > >
