[ceph-users] Re: Upgrade Luminous to Nautilus on a Debian system

2020-04-30 Thread Alex Gorbachev
Herve, On Wed, Apr 29, 2020 at 2:57 PM Herve Ballans wrote: > Hi Alex, > > Thanks a lot for your tips. I take note of that for my planned upgrade. > > I take the opportunity here to add a complementary question regarding the > require-osd-release functionality (ceph osd require-osd-release nautilus)
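For reference, a rough sketch of that final step, assuming every daemon has already been restarted on Nautilus (the flag can be checked in the OSD map afterwards):
$ ceph versions                                # all daemons should report 14.2.x (Nautilus)
$ ceph osd require-osd-release nautilus        # forbid pre-Nautilus OSDs from joining
$ ceph osd dump | grep require_osd_release     # should now print "require_osd_release nautilus"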

[ceph-users] Re: 回复: Re: OSDs continuously restarting under load

2020-04-30 Thread David Turner
I have 2 filestore OSDs in a cluster facing "Caught signal (Bus error)" as well and can't find anything about it. Ceph 12.2.12. The disks are less than 50% full and basic writes have been successful. Both disks are on different nodes. The other 14 disks on each node are unaffected. Restarting the
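A possible triage sketch, since SIGBUS on a filestore OSD is often worth checking against the backing device and filesystem first (the OSD id and device name below are placeholders):
$ journalctl -u ceph-osd@<id> | grep -i 'bus error'   # confirm which OSD hit the signal and when
$ dmesg -T | grep -iE 'i/o error|xfs|medium'          # look for filesystem or media errors around that time
$ smartctl -a /dev/sdX                                # SMART health of the backing disk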

[ceph-users] test -- please ignore

2020-04-30 Thread Tim Serong
Just testing, sorry for the noise. Regards, Tim -- Tim Serong Senior Clustering Engineer SUSE tser...@suse.com

[ceph-users] Re: 4.14 kernel or greater recommendation for multiple active MDS

2020-04-30 Thread Gregory Farnum
On Tue, Apr 28, 2020 at 11:52 AM Robert LeBlanc wrote: > > In the Nautilus manual it recommends >= 4.14 kernel for multiple active > MDSes. What are the potential issues for running the 4.4 kernel with > multiple MDSes? We are in the process of upgrading the clients, but at > times overrun the
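For context, a minimal sketch of switching between one and two active MDS daemons while old kernel clients are still around ("cephfs" and the MDS name are placeholders; in recent releases lowering max_mds stops the extra rank automatically):
$ ceph fs set cephfs max_mds 2           # allow a second active MDS
$ ceph daemon mds.<name> session ls      # on the MDS host: lists clients and their reported versions/features
$ ceph fs set cephfs max_mds 1           # fall back to a single active MDS if old clients cause trouble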

[ceph-users] ceph packages

2020-04-30 Thread Mazzystr
I am trying to install the Ceph Octopus RPMs, and some dependent packages are still pulled in from untrusted sources: Total 8.2 MB/s | 70 MB 00:08 warning:
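A hedged sketch of the usual fix for untrusted-source warnings with the upstream Ceph repo; dependent packages pulled from other repos (e.g. EPEL) would need their own keys (the repo file path is the typical one):
$ rpm --import https://download.ceph.com/keys/release.asc   # trust the upstream Ceph release signing key
$ grep -A5 '\[ceph\]' /etc/yum.repos.d/ceph.repo            # check that gpgcheck=1 and gpgkey point at that key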

[ceph-users] Re: ceph-ansible question

2020-04-30 Thread Robert LeBlanc
I think if the device is detected as non-rotational, it's treated the same as NVMe, but I don't have any to test with. I did all the provisioning ahead of time because I couldn't get Ansible to also create a regular OSD on the NVMe as well as use it for DB. I provided it as an example to show the
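As a sketch of what that manual provisioning can look like with ceph-volume (the volume group and LV names are made up), one LV on the NVMe carries a standalone OSD and another serves as block.db for an HDD OSD:
$ ceph-volume lvm create --data /dev/sdb --block.db nvme_vg/db_sdb   # HDD OSD with its DB on an NVMe LV
$ ceph-volume lvm create --data nvme_vg/osd_nvme                     # separate OSD on the remaining NVMe space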

[ceph-users] Re: ceph crash hangs forever and recovery stop

2020-04-30 Thread Francois Legrand
Is there a way to purge the crashes? For example, is it safe and sufficient to delete everything in /var/lib/ceph/crash on the nodes? F. On 30/04/2020 at 17:14, Paul Emmerich wrote: Best guess: the recovery process doesn't really stop, but it's just that the mgr is dead and it no longer
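For the record, the crash module has its own cleanup commands, which would be preferable to deleting files by hand, though they only work once the mgr answers again (the crash id is a placeholder):
$ ceph crash ls                 # list stored crash reports
$ ceph crash rm <crash-id>      # remove a single report
$ ceph crash prune 7            # drop reports older than 7 days
$ ceph crash archive-all        # or just silence them in the health output (newer Nautilus releases)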

[ceph-users] Re: Ceph MDS - busy?

2020-04-30 Thread Paul Emmerich
Things to check: * metadata is on SSD? * try multiple active MDS servers * try a larger cache for the MDS * try a recent version of Ceph Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel:
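A rough sketch of what those checks can look like on the CLI (the pool, fs and MDS names are placeholders; on Luminous the cache limit goes through injectargs or ceph.conf rather than the newer 'ceph config set'):
$ ceph osd pool get cephfs_metadata crush_rule                            # is the metadata pool on an SSD rule?
$ ceph tell mds.<name> injectargs '--mds_cache_memory_limit 17179869184'  # try a 16 GiB MDS cache
$ ceph fs set cephfs max_mds 2                                            # try a second active MDS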

[ceph-users] Re: ceph crash hangs forever and recovery stop

2020-04-30 Thread Paul Emmerich
Best guess: the recovery process doesn't really stop; it's just that the mgr is dead and no longer reports the progress. And yeah, I can confirm that having a huge number of crash reports is a problem (had a case where a monitoring script crashed due to a radosgw-admin bug... lots of crash
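If the mgr really is down, a quick hedged check/restart sketch (the unit name is typically the short hostname of the mgr host):
$ ceph -s                                  # the "mgr:" line shows whether an active mgr is present
$ systemctl restart ceph-mgr@<hostname>    # on the mgr host
$ ceph mgr fail <active-mgr-name>          # alternatively, force a standby to take over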

[ceph-users] ceph crash hangs forever and recovery stop

2020-04-30 Thread Francois Legrand
Hi everybody (again), We recently had a lot of OSD crashes (more than 30 OSDs crashed). This is now fixed, but it triggered a huge rebalancing+recovery. More or less at the same time, we noticed that ceph crash ls (or any other ceph crash command) hangs forever and never returns. And

[ceph-users] Re: adding block.db to OSD

2020-04-30 Thread Igor Fedotov
Hi Stefan, hmm... could you please collect performance counters for these two cases, using the following sequence: 1) reset perf counters for the specific OSD, 2) run bench, 3) dump perf counters. Collecting the disks' (both main and db) activity with iostat would be nice too. But please either
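A possible sketch of that sequence using the admin socket on the OSD host (the OSD id and file names are placeholders):
$ ceph daemon osd.<id> perf reset all                # 1) reset the perf counters
$ ceph tell osd.<id> bench                           # 2) run the bench
$ ceph daemon osd.<id> perf dump > perf_case1.json   # 3) dump the counters
$ iostat -xmt 1 > iostat_case1.log                   # in parallel: activity of both the main and db disks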

[ceph-users] Re: osd crashing and rocksdb corruption

2020-04-30 Thread Igor Fedotov
Francois, With regard to OSD.8 - IMO the root cause is pretty much the same - an incomplete large write occurred in buffered mode. But unfortunately it looks like it happened a while ago and now the data at rest is corrupted - the OSD detects that on startup (trying to perform DB compaction) and fails to

[ceph-users] Re: How to apply ceph.conf changes using new tool cephadm

2020-04-30 Thread Gencer W . Genç
Hi JC, Thank you for the reply. I believe global will override all (take precedence over) "mon.{id}" settings, right? Thanks, Gencer. On 30.04.2020 02:34:18, JC Lopez wrote: Hi, later versions of Ceph no longer rely on the configuration file but on a MON-centralized configuration which
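For what it's worth, in the centralized configuration the more specific section wins, so a "mon.{id}" value overrides "global" rather than the other way around; a quick way to check (the option is just an example):
$ ceph config set global debug_ms 0/0      # cluster-wide default
$ ceph config set mon.a debug_ms 1/5       # per-daemon value, takes precedence for mon.a
$ ceph config get mon.a debug_ms           # shows the value mon.a will actually use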

[ceph-users] Ceph crushtool in developer mode

2020-04-30 Thread Bobby
Hi Cephers, Can we use *crushtool* in developer mode? I have deployed a fake local cluster for development purposes as described in the Ceph documentation here (https://docs.ceph.com/docs/mimic/dev/dev_cluster_deployement/). Best regards, Bobby!
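crushtool itself works fine offline against a map pulled from such a dev cluster; a minimal sketch, assuming the usual vstart layout where the binaries live under build/bin:
$ ./bin/ceph osd getcrushmap -o crush.bin     # grab the CRUSH map from the vstart cluster
$ ./bin/crushtool -d crush.bin -o crush.txt   # decompile to editable text
$ ./bin/crushtool -c crush.txt -o crush.new   # recompile after editing
$ ./bin/crushtool -i crush.new --test --rule 0 --num-rep 3 --show-mappings   # simulate placements offline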

[ceph-users] Re: osd crashing and rocksdb corruption

2020-04-30 Thread Francois Legrand
Thanks again for your responsiveness and your advice. You saved our lives! We reactivated recovery/backfilling/rebalancing and it started the recovery. We now have to wait to see how it will evolve. Last question: we noticed (a few days ago and it still occurs) that after ~1h the recovery was
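Roughly what that reactivation looks like, assuming recovery had been paused with the corresponding cluster flags:
$ ceph osd unset norecover
$ ceph osd unset nobackfill
$ ceph osd unset norebalance
$ ceph -w        # follow recovery/backfill progress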

[ceph-users] Re: osd crashing and rocksdb corruption

2020-04-30 Thread Igor Fedotov
I created the following ticket and PR to track/fix the issue with incomplete large writes when bluefs_buffered_io=1. https://tracker.ceph.com/issues/45337 https://github.com/ceph/ceph/pull/34836 But in fact setting bluefs_buffered_io to false is the mainstream approach for now, see
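A minimal sketch of applying that workaround on a Nautilus-or-later cluster (on older releases the option would go into the [osd] section of ceph.conf instead); the new value takes effect when the OSDs restart:
$ ceph config set osd bluefs_buffered_io false          # disable buffered BlueFS I/O for all OSDs
$ ceph daemon osd.<id> config get bluefs_buffered_io    # verify on a running OSD after restart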

[ceph-users] Ceph MDS - busy?

2020-04-30 Thread jesper
Hi. How do I find out if the MDS is "busy" - being the one limiting CephFS metadata throughput (12.2.8). $ time find . | wc -l 1918069 real 8m43.008s user 0m2.689s sys 0m7.818s or ~0.27 ms per file (~3,667 files/s). In the light of "potentially batching" and a network latency of ~0.20ms to the MDS -
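A few ways to look at MDS load from the MDS host, as a hedged sketch (the MDS name is a placeholder):
$ ceph daemon mds.<name> perf dump mds    # request counters: replies, forwards, cache hits/misses
$ ceph daemon mds.<name> ops              # operations currently in flight
$ ceph daemonperf mds.<name>              # live, top-like view of MDS activity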