Herve,
On Wed, Apr 29, 2020 at 2:57 PM Herve Ballans wrote:
> Hi Alex,
>
> Thanks a lot for your tips. I'll keep that in mind for my planned upgrade.
>
> I'll take this opportunity to add a follow-up question regarding the
> require-osd-release functionality (ceph osd require-osd-release nautilus).
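For context, that flag can be inspected and raised roughly like this (a sketch, assuming all OSDs in the cluster are already running Nautilus; run it only once the upgrade is complete):

# check what the cluster currently requires
$ ceph osd dump | grep require_osd_release
# raise the minimum required release to Nautilus
$ ceph osd require-osd-release nautilus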
I have 2 filestore OSDs in a cluster facing "Caught signal (Bus error)" as
well and can't find anything about it. Ceph 12.2.12. The disks are less
than 50% full and basic writes have been successful. Both disks are on
different nodes. The other 14 disks on each node are unaffected.
Restarting the
Just testing, sorry for the noise.
Regards,
Tim
--
Tim Serong
Senior Clustering Engineer
SUSE
tser...@suse.com
On Tue, Apr 28, 2020 at 11:52 AM Robert LeBlanc wrote:
>
> The Nautilus manual recommends a kernel >= 4.14 for multiple active
> MDSes. What are the potential issues with running the 4.4 kernel with
> multiple MDSes? We are in the process of upgrading the clients, but at
> times overrun the
I am trying to install the Ceph Octopus RPMs, and some dependent packages
still pull in from untrusted sources:

Total                          8.2 MB/s |  70 MB     00:08
warning:
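One way to see where the packages actually come from and to make sure GPG checking can succeed is something like the following (a sketch; the repo file name is an assumption, and the key URL is the usual upstream default):

# inspect the repo definitions in use (file name assumed)
$ cat /etc/yum.repos.d/ceph.repo
# import the upstream release key so signed packages can be verified
$ rpm --import https://download.ceph.com/keys/release.asc
# show which repository each package would be pulled from
$ dnf repoquery --location ceph ceph-osd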
I think if the device is detected as non-rotational, it's treated the same
as NVMe, but I don't have any to test with. I did all the provisioning
ahead of time because I couldn't get Ansible to also create a regular OSD
on the NVMe as well as use it for DB. I provided it as an example to show
the
Is there a way to purge the crashes?
For example, is it safe and sufficient to delete everything in
/var/lib/ceph/crash on the nodes?
F.
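For what it's worth, the crash module has commands for exactly this, which are probably safer than deleting files by hand (a sketch; the 7-day retention is just an example value):

# list the recorded crash reports
$ ceph crash ls
# remove reports older than 7 days
$ ceph crash prune 7
# or remove a single report by id
$ ceph crash rm <crash-id>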
On 30/04/2020 at 17:14, Paul Emmerich wrote:
Best guess: the recovery process doesn't really stop, but it's just
that the mgr is dead and it no longer
Things to check:
* metadata is on SSD?
* try multiple active MDS servers
* try a larger cache for the MDS
* try a recent version of Ceph
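Some of these suggestions map to commands like the following (a sketch; the filesystem name "cephfs" and the 8 GiB cache value are assumptions, and ceph config set assumes Mimic or later; on Luminous the option would go into ceph.conf instead):

# allow two active MDS ranks
$ ceph fs set cephfs max_mds 2
# enlarge the MDS cache, e.g. to 8 GiB
$ ceph config set mds mds_cache_memory_limit 8589934592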
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel:
Best guess: the recovery process doesn't really stop, but it's just that
the mgr is dead and it no longer reports the progress
And yeah, I can confirm that having a huge number of crash reports is a
problem (had a case where a monitoring script crashed due to a
radosgw-admin bug... lots of crash
Hi everybody (again),
We recently had a lot of OSD crashes (more than 30 OSDs crashed). This is
now fixed, but it triggered a huge rebalancing+recovery.
At more or less the same time, we noticed that ceph crash ls (or any
other ceph crash command) hangs forever and never returns.
And
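Since the ceph crash commands are served by the mgr's crash module, a hanging ceph crash ls usually points at the mgr itself rather than the module; a quick check might look like this (a sketch; the systemd unit name is an assumption):

# is there an active mgr at all?
$ ceph -s | grep mgr
# if not, restart it on its host
$ systemctl restart ceph-mgr@$(hostname -s)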
Hi Stefan,
hmm... could you please collect performance counters for these two
cases, using the following sequence:
1) reset perf counters for the specific OSD
2) run bench
3) dump perf counters.
Collecting disk activity (both main and DB devices) with iostat would be
nice too. But please either
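As a sketch, the sequence above might translate to something like this (the OSD id is a placeholder, and the perf reset is issued over the daemon's admin socket on its host):

# 1) reset the counters for the OSD in question
$ ceph daemon osd.<id> perf reset all
# 2) run the benchmark
$ ceph tell osd.<id> bench
# 3) dump the counters
$ ceph daemon osd.<id> perf dump > osd.<id>-perf.json
# record disk activity (main and DB devices) in parallel
$ iostat -xmt 1 > iostat.log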
Francois,
With regard to OSD.8 - IMO the root cause is pretty much the same - an
incomplete large write occurred in buffered mode. But unfortunately it
looks like it happened a while ago and the data at rest is now corrupted -
the OSD detects that on startup (while trying to perform DB compaction)
and fails to
Hi JC,
Thank you for the reply.
I believe global will override (take precedence over) all "mon.{id}"
settings, right?
Thanks,
Gencer.
On 30.04.2020 02:34:18, JC Lopez wrote:
Hi,
Later versions of Ceph no longer rely on the configuration file but on a
centralized configuration stored in the MONs, which
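A minimal sketch of how that centralized configuration behaves (the option name and values are just examples): a setting stored for a specific daemon such as mon.a takes precedence over the same setting stored under global.

# dump what is currently stored in the MONs' config database
$ ceph config dump
# set an option globally, then override it for one monitor
$ ceph config set global mon_max_pg_per_osd 300
$ ceph config set mon.a mon_max_pg_per_osd 400
# the more specific mon.a value wins for that daemon
$ ceph config get mon.a mon_max_pg_per_osd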
Hi Cephers,
Can we use *crushtool* in developer mode? I have deployed a fake local
cluster for development purposes as described in the Ceph documentation here (
https://docs.ceph.com/docs/mimic/dev/dev_cluster_deployement/)
Best regards
Bobby !
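crushtool only operates on a (compiled) CRUSH map file, so it works the same against a vstart-style dev cluster; in that case the binaries live under build/bin. A typical round trip might look like this (file names are assumptions):

# export and decompile the current CRUSH map
$ ceph osd getcrushmap -o crush.bin
$ crushtool -d crush.bin -o crush.txt
# edit crush.txt, recompile, and test mappings offline
$ crushtool -c crush.txt -o crush.new.bin
$ crushtool --test -i crush.new.bin --num-rep 3 --show-mappings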
Thanks again for your responsiveness and your advice. You saved our lives!
We reactivated recovery/backfilling/rebalancing and it started the
recovery. We now have to wait and see how it evolves.
Last question: we noticed (a few days ago, and it still occurs) that
after ~1h the recovery was
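Assuming recovery had been paused with the usual cluster flags, re-enabling it would look roughly like this (which flags were actually set on the cluster is an assumption):

$ ceph osd unset norecover
$ ceph osd unset nobackfill
$ ceph osd unset norebalance
# then watch progress
$ ceph -s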
I created the following ticket and PR to track/fix the issue with
incomplete large writes when bluefs_buffered_io=1:
https://tracker.ceph.com/issues/45337
https://github.com/ceph/ceph/pull/34836
But in fact, setting bluefs_buffered_io to false is the mainstream
approach for now; see
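For reference, turning it off cluster-wide with the centralized config might look like this (a sketch; Mimic or later assumed, and a restart of the OSDs may be needed for the change to take effect):

$ ceph config set osd bluefs_buffered_io false
# verify the stored value
$ ceph config get osd bluefs_buffered_io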
Hi.
How do I find out whether the MDS is "busy", i.e. the component limiting
CephFS metadata throughput? (12.2.8)
$ time find . | wc -l
1918069
real    8m43.008s
user    0m2.689s
sys     0m7.818s
or about 0.27 ms per file (≈3,667 files/s).
In the light of "potentially batching" and a network latency of ~0.20ms to
the MDS -
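One way to gauge whether the MDS itself is the bottleneck is to look at its counters and in-flight requests on the MDS host (a sketch; the daemon name is a placeholder):

# request counters and latencies
$ ceph daemon mds.<name> perf dump
# operations currently queued or in flight
$ ceph daemon mds.<name> ops
# live per-second view of request rates and cache behaviour
$ ceph daemonperf mds.<name>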