Hi,
this is not an easy topic and there is no formula that can be applied
to all clusters. From my experience, it is exactly as the discussion
went in the thread you mentioned: trial & error.
Looking at your session ls output, this reminds me of a debug session we
had a few years ago:
Hi Goetz,
Which method did you finally choose?
We've done a successful migration from CentOS 8 to Ubuntu 20.04, but we have a
CentOS 7 Nautilus cluster which we'd like to move to Ubuntu 20.04 Octopus, same
as you.
I wonder whether any of you tried to skip Rocky 8 in the flow?
Thank you
This is my active MDS perf dump output:
root@ud-01:~# ceph tell mds.ud-data.ud-02.xcoojt perf dump
{
"AsyncMessenger::Worker-0": {
"msgr_recv_messages": 17179307,
"msgr_send_messages": 15867134,
"msgr_recv_bytes": 445239812294,
"msgr_send_bytes": 42003529245,
All of my clients are servers located two hops away on a 10 Gbit network, with
2x Xeon CPUs (16+ cores each), a minimum of 64 GB RAM, and an SSD OS drive + 8 GB spare.
I use ceph kernel mount only and this is the command:
- mount.ceph admin@$fsid.ud-data=/volumes/subvolumegroup ${MOUNT_DIR} -o
name=admin,secret=XXX
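In case an fstab entry is more convenient than the manual mount, a minimal sketch using the same new-style device string; the mount point and secretfile path here are made up and need to be adapted:
admin@<fsid>.ud-data=/volumes/subvolumegroup /mnt/ud-data ceph secretfile=/etc/ceph/admin.secret,noatime,_netdev 0 0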
Let me share some outputs about my cluster.
root@ud-01:~# ceph fs status
ud-data - 84 clients
===
RANK  STATE           MDS              ACTIVITY      DNS    INOS   DIRS   CAPS
 0    active  ud-data.ud-02.xcoojt  Reqs: 31 /s   3022k  3021k  52.6k   385k
      POOL        TYPE     USED  AVAIL
Hello Eugen.
Thank you for the answer.
Based on the knowledge and test results in this issue:
https://github.com/ceph/ceph/pull/38574
I followed their advice and applied the following changes.
max_mds = 4
standby_mds = 1
mds_cache_memory_limit = 16GB
mds_recall_max_caps = 4
When I set t
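As a sketch, those values could be applied at runtime roughly like this, assuming the filesystem name ud-data and assuming that "standby_mds = 1" maps to the standby_count_wanted filesystem setting (my interpretation, not confirmed):
# filesystem-level settings
ceph fs set ud-data max_mds 4
ceph fs set ud-data standby_count_wanted 1
# MDS settings via the config database; 16 GB expressed in bytes
ceph config set mds mds_cache_memory_limit 17179869184
ceph config set mds mds_recall_max_caps 4   # value as listed above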
>Groovy. Channel drives are IMHO a pain, though in the case of certain
>manufacturers it can be the only way to get firmware updates. Channel drives
>often only have a 3 year warranty, vs 5 for generic drives.
Yes, we have run into this with Kioxia as far as being able to find new
firmware. W
Hi Jan,
I've just filed an upstream ticket for your case, see
https://tracker.ceph.com/issues/64053 for more details.
You might want to tune (or preferably just remove) your custom
bluestore_cache_.*_ratio settings to fix the issue.
This is reproducible and fixable in my lab this way.
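In case others hit the same thing, a rough sketch of how to find and drop such overrides; which ratio options are actually set will differ per cluster:
# list any custom bluestore cache ratio overrides stored in the config database
ceph config dump | grep bluestore_cache_
# remove them so the defaults apply again (adapt to the options you really set)
ceph config rm osd bluestore_cache_meta_ratio
ceph config rm osd bluestore_cache_kv_ratio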
Hop
>
> NVMe SSDs shouldn’t cost significantly more than SATA SSDs. Hint: certain
> tier-one chassis manufacturers mark both the fsck up. You can get a better
> warranty and pricing by buying drives from a VAR.
>
> We stopped buying “Vendor FW” drives a long time ago.
Groovy. Cha
Sridhar,
Thanks a lot for this explanation. It's clearer now.
So at the end of the day (at least with the balanced profile) it's a lower bound
with no upper limit, and a balanced distribution between client and cluster IOPS.
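As a small sketch of how to look at that in practice (osd.0 is just an example daemon):
# select the balanced mClock profile
ceph config set osd osd_mclock_profile balanced
# inspect the QoS values the profile derives for one OSD
ceph config show osd.0 osd_mclock_scheduler_client_res
ceph config show osd.0 osd_mclock_scheduler_client_lim
ceph config show osd.0 osd_mclock_scheduler_client_wgt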
Regards,
Frédéric.
-Original Message-
From: Sr
By HBA I suspect you mean a non-RAID HBA?
Yes, something like the HBA355
NVMe SSDs shouldn’t cost significantly more than SATA SSDs. Hint: certain
tier-one chassis manufacturers mark both the fsck up. You can get a better
warranty and pricing by buying drives from a VAR.
We st
Hi,
just in case someone else might run into this or similar issues.
The following helped to solve the issue:
1. restarting the active mgr brought the PG into an inactive state without a
last acting set:
pg 10.17 is stuck inactive for 18m, current state unknown, last acting []
2. so we recreated the p
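For the archives, a sketch of the kind of commands involved; the mgr name is a placeholder, and force-create-pg is only an option once the data in that PG is accepted as lost:
# fail the active mgr so a standby takes over
ceph mgr fail <active-mgr-name>
# check what the PG reports afterwards
ceph pg 10.17 query
# last resort: recreate the PG empty, losing whatever it contained
ceph osd force-create-pg 10.17 --yes-i-really-mean-it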
Hello Ceph users,
we see a strange issue on our most recent Ceph installation, v17.2.6. We store
data on an HDD pool, the index pool is on SSD. Each OSD stores its WAL on an NVMe
partition. Benchmarks didn't expose any issues with the cluster, but since we
placed production load on it we see constantly growing OSD late
Good morning Eugen,
I just found this thread and saw that I had a test image for rgw in the
config.
After removing the global and the rgw config value everything was instantly
fine.
Cheers and a happy week
Boris
On Tue, 16 Jan 2024 at 10:20, Eugen Block wrote:
> Hi,
>
> there have bee
Hi owners of the ceph-users list, I've been trying to post a new message for the first time. The first one bounced because I had registered but not subscribed to the list. Then I subscribed and sent a message with a picture, which was larger than the allowed 500 KB and went into quarantine as well. I've decided to
Hi,
there have been a few threads with this topic, one of them is this one
[1]. The issue there was that different ceph container images were in
use. Can you check your container versions? If you don't configure a
global image for all ceph daemons, e.g.:
quincy-1:~ # ceph config set globa
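As a sketch of what such a global image setting and the follow-up check could look like (the image tag is only an example and should match the release in use):
ceph config set global container_image quay.io/ceph/ceph:v17.2.7
# verify that all daemons report the same version and image afterwards
ceph versions
ceph orch ps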
Hi,
I don't really have an answer, I just wanted to mention that I created
a tracker issue [1] because I believe there's a bug in the LRC plugin.
But there hasn't been any response yet.
[1] https://tracker.ceph.com/issues/61861
Quoting Ansgar Jazdzewski:
hi folks,
I currently test er
Hi,
could you provide more details on what exactly you tried and which
configs you set? Which compression mode are you running?
In a small Pacific test cluster I just set the mode to "force"
(default "none"):
storage:~ # ceph config set osd bluestore_compression_mode force
And then after a
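A sketch of how the effect can be checked afterwards; the pool name and the aggressive mode are only examples:
# compression can also be set per pool instead of globally on the OSDs
ceph osd pool set mypool compression_mode aggressive
ceph osd pool get mypool compression_mode
# ceph df detail shows the USED COMPR / UNDER COMPR columns per pool
ceph df detail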
Hi,
I have dealt with this topic multiple times; the SUSE team helped me
understand what's going on under the hood. The summary can be found
in this thread [1].
What helped in our case was to reduce mds_recall_max_caps from 30k
(the default) to 3k. We tried it in steps of 1k IIRC. So I su
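As a sketch, such a stepwise reduction can be done via the config database; the intermediate value is only an example:
# lower the recall threshold for all MDS daemons step by step
ceph config set mds mds_recall_max_caps 10000
# observe cap release behaviour, then keep lowering towards 3000
ceph config set mds mds_recall_max_caps 3000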
Did you find an existing tracker issue for that? I suggest reporting
your findings there.
Thanks!
Eugen
Quoting Reto Gysi:
Hi Eugen
LV tags seem to look ok to me.
LV_tags:
-
root@zephir:~# lvs -a -o +devices,tags | egrep 'osd1| LV' | grep -v osd12
LV
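A sketch of another way to inspect the tags ceph-volume sets on OSD LVs, in case the plain lvs output gets unwieldy:
# prints ceph.osd_id, ceph.osd_fsid, ceph.block_device, ... per OSD
ceph-volume lvm list
# or ask LVM directly for just the tags
lvs -a -o lv_name,lv_tags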