[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-16 Thread Janek Bevendorff
As noted in the bug report, the issue has affected only multipart objects at this time. I have added some more remarks there. And yes, multipart objects tend to have 0 byte head objects in general. The affected objects are simply missing all shadow objects, leaving us with nothing but the empt
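For anyone wanting to check an affected object themselves, a minimal sketch, assuming the default RGW data pool name (bucket, key and pool are placeholders):

    radosgw-admin object stat --bucket=mybucket --object=mykey > stat.json
    # the manifest in stat.json contains the prefix used for the multipart/shadow
    # tail objects; list the data pool and look for that prefix
    rados -p default.rgw.buckets.data ls | grep "<manifest prefix>"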

[ceph-users] Re: (Ceph Octopus) Repairing a neglected Ceph cluster - Degraded Data Redundancy, all PGs degraded, undersized, not scrubbed in time

2020-11-16 Thread Robert Sander
On 12.11.20 at 23:18, Phil Merricks wrote: > Thanks for the reply Robert. Could you briefly explain the issue with > the current setup and "what good looks like" here, or point me to some > documentation that would help me figure that out myself? > > I'm guessing here it has something to do wit

[ceph-users] Re: (Ceph Octopus) Repairing a neglected Ceph cluster - Degraded Data Redundancy, all PGs degraded, undersized, not scrubbed in time

2020-11-16 Thread Robert Sander
On 11.11.20 at 13:05, Hans van den Bogert wrote: > And also the erasure coded profile, so an example on my cluster would be: > > k=2 > m=1 With this profile you can only lose one OSD at a time, which is really not that redundant. Regards -- Robert Sander Heinlein Support GmbH Schwedter Str. 8
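For reference, a hedged sketch of how such a profile is defined and what a more forgiving one would look like (profile and pool names, and the failure domain, are examples):

    # the profile under discussion: survives only one failure at a time
    ceph osd erasure-code-profile set ec-2-1 k=2 m=1 crush-failure-domain=host
    # a commonly recommended alternative that tolerates a second failure
    ceph osd erasure-code-profile set ec-4-2 k=4 m=2 crush-failure-domain=host
    ceph osd pool create ecpool 32 32 erasure ec-4-2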

[ceph-users] Re: (Ceph Octopus) Repairing a neglected Ceph cluster - Degraded Data Redundancy, all PGs degraded, undersized, not scrubbed in time

2020-11-16 Thread Hans van den Bogert
> With this profile you can only lose one OSD at a time, which is really > not that redundant. That's rather situation dependent. I don't have really large disks, so the repair time isn't that large. Further, my SLO isn't that high that I need 99.xxx% uptime; if 2 disks break in the same repair wi

[ceph-users] Re: (Ceph Octopus) Repairing a neglected Ceph cluster - Degraded Data Redundancy, all PGs degraded, undersized, not scrubbed in time

2020-11-16 Thread Janne Johansson
On Mon, 16 Nov 2020 at 10:54, Hans van den Bogert <hansbog...@gmail.com> wrote: > > With this profile you can only lose one OSD at a time, which is really > > not that redundant. > That's rather situation dependent. I don't have really large disks, so > the repair time isn't that large. > Furthe

[ceph-users] Re: build nautilus 14.2.13 packages and container

2020-11-16 Thread Engelmann Florian
I was able to fix this dependency problem by deleting the 'BuildRequires: python%{_python_buildid}-scipy' line from the ceph.spec.in file: docker pull centos:7.7.1908 docker run -ti centos:7.7.1908 /bin/bash cd root yum install -y epel-release yum install -y git wget
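A rough sketch of the workaround described above, assuming a CentOS 7.7 container and the v14.2.13 tag (the exact package list in the original mail is truncated here):

    docker run -ti centos:7.7.1908 /bin/bash
    yum install -y epel-release
    yum install -y git wget rpm-build
    git clone --branch v14.2.13 --depth 1 https://github.com/ceph/ceph.git
    # drop the scipy build dependency that has no matching package on CentOS 7
    sed -i '/BuildRequires:.*scipy/d' ceph/ceph.spec.in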

[ceph-users] Using rbd-nbd tool in Ceph development cluster

2020-11-16 Thread Bobby
Hi, I have to use this *rbd-nbd* tool from Ceph. This is part of the Ceph source code. Here: https://github.com/ceph/ceph/tree/master/src/tools/rbd_nbd My question is: Can we use this *rbd-nbd* tool in the Ceph cluster? By Ceph cluster I mean the development cluster we build through the *vstart.sh* scrip
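A minimal sketch of what this could look like, assuming a built source tree and running from the build directory so the generated ceph.conf is picked up (image name and sizes are examples):

    MON=1 OSD=3 MGR=1 ../src/vstart.sh -n -d
    ./bin/ceph osd pool create rbd 32
    ./bin/rbd pool init rbd
    ./bin/rbd create testimg --size 256
    sudo ./bin/rbd-nbd map testimg     # prints the /dev/nbdX device it attached
    sudo ./bin/rbd-nbd unmap /dev/nbd0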

[ceph-users] Re: which of cpu frequency and number of threads serves osd better?

2020-11-16 Thread Frank Schilder
We are starting to use 18TB spindles, have loads of cold data and only a thin layer of hot data. One 4/8TB NVMe drive as a cache in front of 6x18TB will provide close to or even matching SSD performance for the hot data at a reasonable extra cost per TB storage. My plan is to wait for 1-2 more y

[ceph-users] Re: (Ceph Octopus) Repairing a neglected Ceph cluster - Degraded Data Redundancy, all PGs degraded, undersized, not scrubbed in time

2020-11-16 Thread Hans van den Bogert
I think we're deviating from the original thread quite a bit and I would never argue that in a production environment with plenty of OSDs you should go for R=2 or K+1, so my example cluster, which happens to be 2+1, is a bit unlucky. However I'm interested in the following On 11/16/20 11:31 AM, Ja

[ceph-users] Mimic updated to Nautilus - pg's 'update_creating_pgs' in log, but they exist and cluster is healthy.

2020-11-16 Thread m . sliwinski
Hi With the help of Dan van der Ster I managed to confirm my suspicions that the problem with osdmap not trimming correctly was caused by PGs that somehow weren't marked as created by the MONs. For example, from log: 2020-11-16 12:57:00.514 7f131496f700 10 mon.monb01@0(probing).osd e72792 update
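For comparison on other clusters, a hedged way to see how far the mons have trimmed osdmaps (field names as they appear in 'ceph report' on Nautilus):

    ceph report 2>/dev/null | jq '.osdmap_first_committed, .osdmap_last_committed'
    # a large gap between the two epochs indicates maps are not being trimmed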

[ceph-users] Re: [Ceph-qa] Using rbd-nbd tool in Ceph development cluster

2020-11-16 Thread Mykola Golub
On Mon, Nov 16, 2020 at 12:19:35PM +0100, Bobby wrote: > My question is: Can we use this *rbd-nbd* tool in the Ceph cluster? By Ceph > cluster I mean the development cluster we build through *vstart.sh* script. > I am quite sure we could use it. I have this script running. I can *start* > and *st

[ceph-users] Re: (Ceph Octopus) Repairing a neglected Ceph cluster - Degraded Data Redundancy, all PGs degraded, undersized, not scrubbed in time

2020-11-16 Thread Janne Johansson
> > However I'm interested in the following > > On 11/16/20 11:31 AM, Janne Johansson wrote: > > So while one could always say "one more drive is better than your > > amount", there are people losing data with repl=2 or K+1 because some > > more normal operation was in flight and _then_ a single

[ceph-users] Re: (Ceph Octopus) Repairing a neglected Ceph cluster - Degraded Data Redundancy, all PGs degraded, undersized, not scrubbed in time

2020-11-16 Thread Hans van den Bogert
All good points (also replying to Frank Schilder) On 11/16/20 2:36 PM, Janne Johansson wrote: Not trying to say you don't understand this, but rather that people who run small ceph clusters tend to start out with R=2 or K+1 EC because the larger faults are easier to imagine. TBH, I think I did

[ceph-users] Re: BLUEFS_SPILLOVER BlueFS spillover detected

2020-11-16 Thread Dave Hall
Zhenshi, I've been doing the same periodically over the past couple weeks. I haven't had to do it a second time on any of my OSDs, but I'm told that I can expect to do so in the future. I believe that the conclusion in this list was that for a workload with many small files it might be necessary
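For completeness, hedged examples of the manual compaction being discussed (OSD id and data path are examples):

    # online, via the cluster
    ceph tell osd.12 compact
    # or offline, with the OSD stopped
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-12 compact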

[ceph-users] Re: Octopus OSDs dropping out of cluster: _check_auth_rotating possible clock skew, rotating keys expired way too early

2020-11-16 Thread Lazuardi Nasution
Hi, I have the same situation with some OSDs on Octopus 15.2.5 (Ubuntu 20.04). But, I have no problem with MGR. Any clue about this? Best regards, Date: Tue, 9 Jun 2020 23:47:24 +0200 > From: Wido den Hollander > Subject: [ceph-users] Octopus OSDs dropping out of cluster: > _check_auth_
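A few hedged first checks for this symptom on an affected host (OSD id is an example, and the time daemon may be chrony or ntpd):

    ceph time-sync-status            # clock skew as seen by the monitors
    chronyc tracking                 # or: ntpq -p
    ceph daemon osd.7 status         # confirm the OSD still answers on its admin socket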

[ceph-users] Re: (Ceph Octopus) Repairing a neglected Ceph cluster - Degraded Data Redundancy, all PGs degraded, undersized, not scrubbed in time

2020-11-16 Thread Frank Schilder
To throw in my 5 cents: choosing m in k+m EC replication is not arbitrary, and the argument that anyone with a larger m could always call a lower m wrong does not hold either. Why are people recommending m>=2 for production (or R>=3 replicas)? It's very simple. What is forgotten below is maintenance. W
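A small worked illustration of that maintenance argument, assuming the default min_size = k+1 for EC pools (pool name is an example): with k=2, m=1 and min_size = 3, taking a single host down for maintenance already drops the affected PGs below min_size, so client IO pauses, and a disk failure during that window leaves those PGs unable to recover until the host returns; with m=2 the same maintenance window still leaves one failure of headroom.

    ceph osd pool get mypool min_size    # shows the threshold below which IO stops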

[ceph-users] How to configure restful cert/key under nautilus

2020-11-16 Thread Gary Molenkamp
I am attempting to configure the restful api for integration with zabbix agent2, but I am unable to configure the ssl cert and key based on the documentation here: https://docs.ceph.com/en/nautilus/mgr/restful/ #> ceph config-key set mgr/restful/controller02/crt -i ./mycert.crt WARNING: it look
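For reference, a hedged sketch of the two approaches the linked page describes ('controller02' being the mgr name in this case; file names are examples):

    # simplest: let the module generate a self-signed certificate
    ceph restful create-self-signed-cert
    # or store an own certificate and key, then restart the module
    ceph config-key set mgr/restful/controller02/crt -i mycert.crt
    ceph config-key set mgr/restful/controller02/key -i mycert.key
    ceph mgr module disable restful && ceph mgr module enable restful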

[ceph-users] Re: OSD memory leak?

2020-11-16 Thread Frank Schilder
Dear all, I collected memory allocation data over a period of 2 months; see the graphs here: . I need to revise my statement about accelerated growth. The new graphs indicate that we are looking at linear growth, that is, probably a small memory leak in a regularly
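For anyone wanting to reproduce such graphs, hedged examples of per-OSD memory counters that can be sampled periodically (OSD id is an example; this is not necessarily the exact data source used above):

    ceph daemon osd.3 dump_mempools      # bluestore/osd memory pools, via admin socket
    ceph tell osd.3 heap stats           # allocator view, requires tcmalloc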

[ceph-users] Re: (Ceph Octopus) Repairing a neglected Ceph cluster - Degraded Data Redundancy, all PGs degraded, undersized, not scrubbed in time

2020-11-16 Thread Phil Merricks
Thanks for all the replies, folks. I think it's a testament to the versatility of Ceph that there are some differences of opinion and experience here. With regards to the purpose of this cluster, it is providing distributed storage for stateful workloads of containers. The data produced is somewhat

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-16 Thread Eric Ivancich
I’m wondering if anyone experiencing this bug would mind running `radosgw-admin gc list --include-all` on a schedule and saving the results. I’d like to know whether these tail objects are getting removed by the gc process. If we find that that’s the case then there’s the issue of how they got o
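A minimal sketch of how that could be scheduled, e.g. as an /etc/cron.d entry (path and interval are examples):

    */30 * * * * root radosgw-admin gc list --include-all > /var/log/ceph/gc-list-$(date +\%Y\%m\%d-\%H\%M).json 2>&1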