[ceph-users] Single OSD crash/restarting during scrub operation on specific PG

2021-04-20 Thread Mark Johnson
We've recently recovered from a bit of a disaster where we had some power outages (a combination of data centre power maintenance and us not having our redundant power supplies connected to the correct redundant power circuits - lesson learnt). We ended up with one OSD that wouldn't start -
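For anyone triaging a similar scrub-triggered crash, a minimal sketch using standard Ceph commands (the PG id 2.1a is purely illustrative):

    # see which PGs are scrubbing / deep-scrubbing
    ceph pg dump pgs | grep -i scrub
    # inspect the suspect PG in detail
    ceph pg 2.1a query
    # if 'ceph health detail' reports inconsistent objects, attempt a repair
    ceph pg repair 2.1a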

[ceph-users] Re: HBA vs caching Raid controller

2021-04-20 Thread Mark Lehrer
> - The pattern is mainly write centric, so write latency is
>   probably the real factor
> - The HDD OSDs behind the raid controllers can cache / reorder
>   writes and reduce seeks potentially

OK, that makes sense. Unfortunately, re-ordering HDD writes without a battery backup is kind of
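For checking whether a drive's volatile (non-battery-backed) write cache is enabled, a hedged sketch with standard tools (/dev/sdb is illustrative):

    # SAS/SCSI: query the Write Cache Enable (WCE) bit
    sdparm --get=WCE /dev/sdb
    # SATA: query the drive's write-caching flag
    hdparm -W /dev/sdb
    # disable the volatile cache if nothing protects it from power loss
    sdparm --set=WCE=0 /dev/sdb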

[ceph-users] Re: HBA vs caching Raid controller

2021-04-20 Thread Anthony D'Atri
> It's not 100% clear to me, but is the pdcache the same as the disk
> internal (non battery backed up) cache?

Yes, AIUI.

> As we are located very nearby the hydropower plant, we actually connect
> each server individually to a UPS.

Lucky you. I’ve seen an entire DC go dark with a power
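For reference, the pdcache setting can be inspected and flipped per virtual drive with storcli; a sketch assuming controller /c0:

    # show the physical-drive cache policy for all virtual drives
    storcli /c0/vall show all | grep -i pdcache
    # turn the drives' internal (unprotected) cache off
    storcli /c0/vall set pdcache=off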

[ceph-users] Re: HBA vs caching Raid controller

2021-04-20 Thread Nico Schottelius
Reed Dier writes:

> I don't have any performance bits to offer, but I do have one
> experiential bit to offer.
>
> My initial ceph deployment was on existing servers that had LSI raid
> controllers (3108 specifically). We created R0 vd's for each disk, and
> had BBUs so were using write

[ceph-users] Re: HBA vs caching Raid controller

2021-04-20 Thread Nico Schottelius
Mark Lehrer writes:

>> One server has LSI SAS3008 [0] instead of the Perc H800, which comes
>> with 512MB RAM + BBU. On most servers latencies are around 4-12ms
>> (average 6ms); on the system with the LSI controller we see 20-60ms
>> (average 30ms) latency.
>
> Are these reads, writes, or a
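A quick, non-invasive way to compare the two controller setups; a sketch using standard Ceph and sysstat tools:

    # per-OSD commit/apply latency in ms, as reported by the OSDs
    ceph osd perf
    # per-device await times on each host, for a controller-level comparison
    iostat -x 5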

[ceph-users] Re: HBA vs caching Raid controller

2021-04-20 Thread Nico Schottelius
Marc writes:

> This is what I have when I query prometheus; most hdd's are still sata
> 5400rpm, and there are also some ssd's. I also did not optimize cpu
> frequency settings. (Forget about the instance=c03, that is just because
> the data comes from mgr c03; these drives are on different
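A sketch of the kind of query involved, assuming the per-OSD latency counters exposed by the mgr prometheus module (metric names may differ between releases):

    # average OSD write latency (seconds) over the last 5 minutes, per OSD
    rate(ceph_osd_op_w_latency_sum[5m])
      / rate(ceph_osd_op_w_latency_count[5m])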

[ceph-users] Re: HBA vs caching Raid controller

2021-04-20 Thread Anthony D'Atri
I don’t have the firmware versions handy, but at one point around the 2014-2015 timeframe I found that both LSI’s firmware and storcli claimed that the default setting was DiskDefault, i.e., leave whatever the drive has alone. It turned out, though, that for the 9266 and 9271, at least, behind
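One way to verify what DiskDefault actually resolves to is to ask both the controller and the drive itself; a sketch assuming storcli, controller /c0, and megaraid passthrough (the device index is illustrative):

    # the controller's view of each physical drive's cache policy
    storcli /c0/eall/sall show all | grep -i cache
    # the drive's own write-cache state, via megaraid passthrough
    smartctl -d megaraid,0 -g wcache /dev/sda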

[ceph-users] Re: EC Backfill Observations

2021-04-20 Thread Josh Durgin
Hey Josh, adding the dev list where you may get more input. Generally I think your analysis is correct about the current behavior. In particular, if another copy of a shard is available, backfill or recovery will read from just that copy, not the rest of the OSDs. Otherwise, k shards must be
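To make the read amplification concrete: with a k=4, m=2 profile, rebuilding a lost shard requires reading k=4 surviving shards and re-encoding, while recovering from an intact copy of the same shard needs only a single read. A hedged sketch for checking the profile in play (the profile name is illustrative):

    # show k and m for the erasure-code profile backing the pool
    ceph osd erasure-code-profile get myprofile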

[ceph-users] Re: HBA vs caching Raid controller

2021-04-20 Thread Reed Dier
I don't have any performance bits to offer, but I do have one experiential bit to offer. My initial ceph deployment was on existing servers that had LSI raid controllers (3108 specifically). We created R0 vd's for each disk, and had BBUs so were using write-back caching. The big problem that
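For context, a sketch of the kind of per-disk R0 layout described, in storcli terms (controller /c0 and enclosure:slot 252:0 are illustrative):

    # one RAID-0 virtual drive per physical disk, write-back cache enabled
    storcli /c0 add vd type=raid0 drives=252:0 wb ra cached
    # write-back is only safe while the BBU is healthy
    storcli /c0/bbu show status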

[ceph-users] Issues upgrading to 16.2.1

2021-04-20 Thread Radoslav Milanov
Hello,

Tried a cephadm upgrade from 16.2.0 to 16.2.1. Managers were updated first, then the process halted on the first monitor being upgraded. The monitor fails to start:

root@host3:/var/lib/ceph/c8ee2878-9d54-11eb-bbca-1c34da4b9fb6/mon.host3# /usr/bin/docker run --rm --ipc=host --net=host
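For others whose cephadm upgrade stalls the same way, a hedged triage sketch (mon.host3 is taken from the report above):

    # see which daemons the orchestrator is stuck on
    ceph orch upgrade status
    # pause the upgrade while investigating
    ceph orch upgrade pause
    # on the monitor's host, inspect the failing container's logs
    cephadm logs --name mon.host3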

[ceph-users] Re: [Ceph-maintainers] v14.2.20 Nautilus released

2021-04-20 Thread Mike Perez
I've updated these entries with the appropriate link. Thanks Ilya.

On Tue, Apr 20, 2021 at 2:27 AM Ilya Dryomov wrote:
>
> On Tue, Apr 20, 2021 at 2:01 AM David Galloway wrote:
> >
> > This is the 20th bugfix release in the Nautilus stable series. It
> > addresses a security vulnerability in

[ceph-users] Re: HBA vs caching Raid controller

2021-04-20 Thread Mark Lehrer
> One server has LSI SAS3008 [0] instead of the Perc H800, which comes
> with 512MB RAM + BBU. On most servers latencies are around 4-12ms
> (average 6ms); on the system with the LSI controller we see 20-60ms
> (average 30ms) latency.

Are these reads, writes, or a mixed workload? I would expect
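One way to isolate single-threaded write latency, which is where a missing controller cache hurts most; a sketch (WARNING: this writes to, and destroys data on, the target; /dev/sdX is illustrative):

    fio --name=writelat --filename=/dev/sdX --rw=randwrite --bs=4k \
        --iodepth=1 --direct=1 --time_based --runtime=30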

[ceph-users] Re: BlueFS spillover detected (Nautilus 14.2.16)

2021-04-20 Thread by morphin
There are a lot of RGW bug-fixes between 14.2.16 --> 19, and this is a prod environment. I always stay a few versions behind to minimize the risk. I won't take the risk on the RGW side just for an OSD improvement; it's better to play with the rocksdb options. Thanks for the advice.

Konstantin Shalygin , 19
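Before reaching for rocksdb options, it helps to quantify the spillover first; a sketch using standard admin-socket commands (osd.0 is illustrative):

    # compare db_used_bytes against slow_used_bytes to size the spillover
    ceph daemon osd.0 perf dump bluefs
    # a manual compaction can sometimes move data back onto the DB device
    ceph daemon osd.0 compact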

[ceph-users] Re: [Ceph-maintainers] v14.2.20 Nautilus released

2021-04-20 Thread Ilya Dryomov
On Tue, Apr 20, 2021 at 11:30 AM Dan van der Ster wrote:
>
> On Tue, Apr 20, 2021 at 11:26 AM Ilya Dryomov wrote:
> >
> > On Tue, Apr 20, 2021 at 2:01 AM David Galloway wrote:
> > >
> > > This is the 20th bugfix release in the Nautilus stable series. It
> > > addresses a security vulnerability

[ceph-users] Re: [Ceph-maintainers] v14.2.20 Nautilus released

2021-04-20 Thread Dan van der Ster
On Tue, Apr 20, 2021 at 11:26 AM Ilya Dryomov wrote:
>
> On Tue, Apr 20, 2021 at 2:01 AM David Galloway wrote:
> >
> > This is the 20th bugfix release in the Nautilus stable series. It
> > addresses a security vulnerability in the Ceph authentication framework.
> > We recommend users to update

[ceph-users] Re: [Ceph-maintainers] v16.2.1 Pacific released

2021-04-20 Thread Ilya Dryomov
On Tue, Apr 20, 2021 at 2:02 AM David Galloway wrote:
>
> This is the first bugfix release in the Pacific stable series. It
> addresses a security vulnerability in the Ceph authentication framework.
> We recommend users to update to this release. For a detailed release
> notes with links &

[ceph-users] Re: [Ceph-maintainers] v15.2.11 Octopus released

2021-04-20 Thread Ilya Dryomov
On Tue, Apr 20, 2021 at 1:56 AM David Galloway wrote:
>
> This is the 11th bugfix release in the Octopus stable series. It
> addresses a security vulnerability in the Ceph authentication framework.
> We recommend users to update to this release. For a detailed release
> notes with links &

[ceph-users] Re: [Ceph-maintainers] v14.2.20 Nautilus released

2021-04-20 Thread Ilya Dryomov
On Tue, Apr 20, 2021 at 2:01 AM David Galloway wrote:
>
> This is the 20th bugfix release in the Nautilus stable series. It
> addresses a security vulnerability in the Ceph authentication framework.
> We recommend users to update to this release. For a detailed release
> notes with links &
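After applying the fix, a quick way to confirm that every daemon is actually running the patched release:

    # per-daemon version breakdown; all entries should report 14.2.20
    ceph versions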

[ceph-users] Re: any experience on using Bcache on top of HDD OSD

2021-04-20 Thread Matthias Ferdinand
On Tue, Apr 20, 2021 at 08:27:50AM +0200, huxia...@horebdata.cn wrote:
> Dear Mattias,
>
> Very glad to know that your setting with Bcache works well in production.
>
> How long have you been putting XFS on bcache on HDD in production? Which
> bcache version (I mean the kernel) do you use? or
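For readers unfamiliar with the layout being discussed, a minimal bcache sketch using bcache-tools (device names and the UUID placeholder are illustrative):

    # backing device on the HDD, cache device on the SSD
    make-bcache -B /dev/sdb
    make-bcache -C /dev/nvme0n1p1
    # find the cache set UUID and attach it to the backing device
    bcache-super-show /dev/nvme0n1p1 | grep cset.uuid
    echo <cset-uuid> > /sys/block/bcache0/bcache/attach
    # then put XFS (or an OSD) on the combined device
    mkfs.xfs /dev/bcache0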

[ceph-users] Re: any experience on using Bcache on top of HDD OSD

2021-04-20 Thread huxia...@horebdata.cn
Dear Mattias,

Very glad to know that your setup with Bcache works well in production. How long have you been putting XFS on bcache on HDD in production? Which bcache version (I mean the kernel) do you use, or do you use a special version of bcache?

Thanks in advance,
samuel