Re: [ceph-users] Failed to encode map errors

2019-12-03 Thread Martin Verges
Hello, what versions of Ceph are you running? -- Martin Verges Managing director Mobile: +49 174 9335695 E-Mail: martin.ver...@croit.io Chat: https://t.me/MartinVerges croit GmbH, Freseniusstr. 31h, 81247 Munich CEO: Martin Verges - VAT-ID: DE310638492 Com. register: Amtsgericht Munich HRB

[ceph-users] Shall host weight auto reduce on hdd failure?

2019-12-03 Thread Milan Kupcevic
On HDD failure the number of placement groups on the rest of the OSDs on the same host goes up. I would expect the failed OSD's placement groups to be distributed equally across the cluster, not just onto the troubled host. Shall the host weight auto-reduce whenever an OSD gets out? Exhibit 1: attached osd-df-tree
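
A minimal sketch of the manual workaround, assuming osd.12 is the failed OSD (hypothetical ID): dropping the dead OSD's CRUSH weight to zero shrinks the host bucket's weight, so its placement groups spread across the whole cluster instead of piling up on the remaining OSDs of that host.

    # inspect per-OSD and per-host PG counts and weights
    ceph osd df tree

    # once the disk is confirmed dead, remove its share of the host weight
    ceph osd crush reweight osd.12 0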

Re: [ceph-users] Revert a CephFS snapshot?

2019-12-03 Thread Luis Henriques
On Tue, Dec 03, 2019 at 02:09:30PM -0500, Jeff Layton wrote: > On Tue, 2019-12-03 at 07:59 -0800, Robert LeBlanc wrote: > > On Thu, Nov 14, 2019 at 11:48 AM Sage Weil wrote: > > > On Thu, 14 Nov 2019, Patrick Donnelly wrote: > > > > On Wed, Nov 13, 2019 at 6:36 PM Jerry Lee > > > > wrote: > > >

Re: [ceph-users] osds way ahead of gateway version?

2019-12-03 Thread Gregory Farnum
Unfortunately RGW doesn't get tested against version differences this large, and I don't think it's compatible across more than one major release. Basically it's careful to support upgrades between long-term stable releases, but nothing else is expected to work. That said, getting off of Giant

[ceph-users] osds way ahead of gateway version?

2019-12-03 Thread Philip Brown
I'm in a situation where it would be extremely strategically advantageous to run some OSDs on Luminous (so we can try out BlueStore) while the gateways stay on Giant. Is this a terrible, terrible thing, or can we reasonably get away with it? Points of interest: 1. I plan to make a new pool for

Re: [ceph-users] RGW performance with low object sizes

2019-12-03 Thread Paul Emmerich
On Tue, Dec 3, 2019 at 6:43 PM Robert LeBlanc wrote: > > On Tue, Dec 3, 2019 at 9:11 AM Ed Fisher wrote: >> >> >> >> On Dec 3, 2019, at 10:28 AM, Robert LeBlanc wrote: >> >> Did you make progress on this? We have a ton of < 64K objects as well and >> are struggling to get good performance out

[ceph-users] Failed to encode map errors

2019-12-03 Thread John Hearns
And me again for the second time in one day. ceph -w is now showing messages like this: 2019-12-03 15:17:22.426988 osd.6 [WRN] failed to encode map e28961 with expected crc Any advice please? -- Kheiron Medical Technologies kheironmed.com | supporting
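
This warning usually shows up when daemons in the cluster are running different Ceph versions, so comparing the running versions is a sensible first step (as Martin asks above). A small sketch, assuming a Luminous-or-newer cluster for the summary command:

    # per-daemon-type version summary (Luminous and later)
    ceph versions

    # per-daemon check that also works on older releases
    ceph tell osd.* version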

Re: [ceph-users] RGW performance with low object sizes

2019-12-03 Thread Robert LeBlanc
On Tue, Dec 3, 2019 at 9:11 AM Ed Fisher wrote: > > > On Dec 3, 2019, at 10:28 AM, Robert LeBlanc wrote: > > Did you make progress on this? We have a ton of < 64K objects as well and > are struggling to get good performance out of our RGW. Sometimes we have > RGW instances that are just

Re: [ceph-users] RGW performance with low object sizes

2019-12-03 Thread Ed Fisher
> On Dec 3, 2019, at 10:28 AM, Robert LeBlanc wrote: > > Did you make progress on this? We have a ton of < 64K objects as well and are > struggling to get good performance out of our RGW. Sometimes we have RGW > instances that are just gobbling up CPU even when there are no requests to >

Re: [ceph-users] RGW performance with low object sizes

2019-12-03 Thread Robert LeBlanc
On Tue, Nov 19, 2019 at 9:34 AM Christian wrote: > Hi, > > I used https://github.com/dvassallo/s3-benchmark to measure some >> performance values for the rgws and got some unexpected results. >> Everything above 64K has excellent performance but below it drops down to >> a fraction of the speed
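
One way to narrow down whether the sub-64K penalty comes from RGW itself or from the underlying pools is to benchmark small writes directly at the RADOS layer and compare. A rough sketch using rados bench rather than the S3 tool above (pool name, sizes and thread counts are examples only):

    # 60 seconds of 4 KiB writes with 16 concurrent ops against a test pool
    rados bench -p rgw-test 60 write -b 4096 -t 16 --no-cleanup

    # random reads of the objects written above
    rados bench -p rgw-test 60 rand -t 16

    # remove the benchmark objects afterwards
    rados -p rgw-test cleanup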

Re: [ceph-users] Missing Ceph perf-counters in Ceph-Dashboard or Prometheus/InfluxDB...?

2019-12-03 Thread Ernesto Puerta
Thanks, Benjeman! I created this pad (https://pad.ceph.com/p/perf-counters-to-expose) so we can list them there. An alternative approach would be to allow whitelisting some perf-counters, so they would be exported no matter their priority. This would allow users to customize which
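
For anyone adding candidates to the pad, the priority attached to each counter is visible in the daemon's perf schema; only counters at or above the mgr's priority threshold are exported today. A quick sketch for listing what a given daemon defines (osd.0 is just an example, and ceph daemon must be run on that OSD's host):

    # full counter schema, including the priority of each counter
    ceph daemon osd.0 perf schema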

Re: [ceph-users] Revert a CephFS snapshot?

2019-12-03 Thread Robert LeBlanc
On Thu, Nov 14, 2019 at 11:48 AM Sage Weil wrote: > On Thu, 14 Nov 2019, Patrick Donnelly wrote: > > On Wed, Nov 13, 2019 at 6:36 PM Jerry Lee > wrote: > > > > > > On Thu, 14 Nov 2019 at 07:07, Patrick Donnelly > wrote: > > > > > > > > On Wed, Nov 13, 2019 at 2:30 AM Jerry Lee > wrote: > > >

Re: [ceph-users] v13.2.7 osds crash in build_incremental_map_msg

2019-12-03 Thread Dan van der Ster
I created https://tracker.ceph.com/issues/43106 and we're downgrading our osds back to 13.2.6. -- dan On Tue, Dec 3, 2019 at 4:09 PM Dan van der Ster wrote: > > Hi all, > > We're midway through an update from 13.2.6 to 13.2.7 and started > getting OSDs crashing regularly like this [1]. > Does

[ceph-users] RGW bucket stats - strange behavior & slow performance requiring RGW restarts

2019-12-03 Thread David Monschein
Hi all, I've been observing some strange behavior with my object storage cluster running Nautilus 14.2.4. We currently have around 1800 buckets (a small percentage of those buckets are actively used), with a total of 13.86M objects. We have 20 RGWs right now, 10 for regular S3 access, and 10 for
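
If the stats in question are being pulled via radosgw-admin (or the equivalent admin API calls), a minimal sketch of the CLI side, with mybucket as a placeholder name:

    # stats for a single bucket
    radosgw-admin bucket stats --bucket=mybucket

    # stats for every bucket; with ~1800 buckets this is the expensive call
    radosgw-admin bucket stats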

[ceph-users] v13.2.7 osds crash in build_incremental_map_msg

2019-12-03 Thread Dan van der Ster
Hi all, We're midway through an update from 13.2.6 to 13.2.7 and started getting OSDs crashing regularly like this [1]. Does anyone obviously know what the issue is? (Maybe https://github.com/ceph/ceph/pull/26448/files ?) Or is it some temporary problem while we still have v13.2.6 and v13.2.7
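
If others hit the same crash while running mixed 13.2.6/13.2.7, a log captured at higher OSD debug levels around the crash makes the tracker report far more useful. A rough sketch (osd.6 is just an example ID; the second command puts the levels back to their defaults):

    # temporarily raise debug logging on an affected OSD
    ceph tell osd.6 injectargs '--debug_osd 20 --debug_ms 1'

    # revert once a crash has been captured
    ceph tell osd.6 injectargs '--debug_osd 1/5 --debug_ms 0/5'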

Re: [ceph-users] HA and data recovery of CEPH

2019-12-03 Thread Wido den Hollander
On 12/3/19 3:07 PM, Aleksey Gutikov wrote: That is true. When an OSD goes down it will take a few seconds for its Placement Groups to re-peer with the other OSDs. During that period writes to those PGs will stall for a couple of seconds. I wouldn't say it's 40s, but it can take ~10s.
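
For reference, the length of that stall is mostly a function of how quickly the cluster declares the OSD down plus how long peering takes; the detection side is tunable. A small sketch of the relevant knobs on Mimic or newer, where the centralized config database is available (values are examples, not recommendations):

    # how long peers wait for missed heartbeats before reporting an OSD
    # down (default 20s); lower means shorter stalls but more flapping risk
    ceph config set global osd_heartbeat_grace 10

    # how many OSDs must report a peer down before the mons mark it down
    ceph config get mon mon_osd_min_down_reporters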

Re: [ceph-users] HA and data recovery of CEPH

2019-12-03 Thread Aleksey Gutikov
That is true. When an OSD goes down it will take a few seconds for its Placement Groups to re-peer with the other OSDs. During that period writes to those PGs will stall for a couple of seconds. I wouldn't say it's 40s, but it can take ~10s. Hello, According to my experience, in case of

Re: [ceph-users] Osd auth del

2019-12-03 Thread John Hearns
Thank you. ceph auth add did work. I did try ceph auth get-or-create; this does not read from an input file - it will generate a new key. On Tue, 3 Dec 2019 at 13:50, Willem Jan Withagen wrote: > On 3-12-2019 11:43, Wido den Hollander wrote: > > > > > > On 12/3/19 11:40 AM, John Hearns
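
For the archives: the working recovery here is to re-import the OSD's existing keyring with ceph auth add rather than generating a new key. A sketch assuming the default keyring path on the OSD host; the caps should be copied from another osd.* entry in ceph auth ls rather than taken verbatim from here:

    # run against the keyring of the affected OSD (osd.3 in this thread)
    ceph auth add osd.3 mon 'allow profile osd' mgr 'allow profile osd' osd 'allow *' \
        -i /var/lib/ceph/osd/ceph-3/keyring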

Re: [ceph-users] Osd auth del

2019-12-03 Thread Willem Jan Withagen
On 3-12-2019 11:43, Wido den Hollander wrote: On 12/3/19 11:40 AM, John Hearns wrote: I had a fat-fingered moment yesterday. I typed: ceph auth del osd.3 where osd.3 is an otherwise healthy little OSD. I have not set noout or down on osd.3 yet. This is a Nautilus

Re: [ceph-users] Missing Ceph perf-counters in Ceph-Dashboard or Prometheus/InfluxDB...?

2019-12-03 Thread Benjeman Meekhof
I'd like to see a few of the cache tier counters exposed. You get some info on cache activity in 'ceph -s' so it makes sense from my perspective to have similar availability in exposed counters. There's a tracker for this request (opened by me a while ago): https://tracker.ceph.com/issues/37156
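
Until they are exported by the mgr modules, the cache-tiering counters can still be read off an OSD's admin socket. A quick sketch (osd.0 is just an example, run on that OSD's host; the grep pattern matches counter names as they appear in the perf dump):

    # dump all OSD perf counters and pick out the cache-tiering ones
    ceph daemon osd.0 perf dump | grep -E 'tier_(promote|flush|evict|dirty|clean)'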

[ceph-users] Missing Ceph perf-counters in Ceph-Dashboard or Prometheus/InfluxDB...?

2019-12-03 Thread Ernesto Puerta
Hi Cephers, As a result of this tracker (https://tracker.ceph.com/issues/42961) Neha and I were wondering if there would be other perf-counters deemed by users/operators as worthy to be exposed via ceph-mgr modules for monitoring purposes. The default behaviour is that only perf-counters with

Re: [ceph-users] Osd auth del

2019-12-03 Thread Wido den Hollander
On 12/3/19 11:40 AM, John Hearns wrote: I had a fat-fingered moment yesterday. I typed: ceph auth del osd.3 where osd.3 is an otherwise healthy little OSD. I have not set noout or down on osd.3 yet. This is a Nautilus cluster. ceph health reports everything is OK

[ceph-users] Osd auth del

2019-12-03 Thread John Hearns
I had a fat-fingered moment yesterday. I typed: ceph auth del osd.3 where osd.3 is an otherwise healthy little OSD. I have not set noout or down on osd.3 yet. This is a Nautilus cluster. ceph health reports everything is OK. However, ceph tell osd.* version hangs when it