Hi Igor and Stefan,
thanks a lot for your help! Our cluster is almost finished with recovery and I
would like to switch to off-line conversion of the SSD OSDs. In one of Stefan's
posts I could find the command for manual compaction:
ceph-kvstore-tool bluestore-kv "/var/lib/ceph/osd/ceph-${OSD_ID}" compact
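A minimal sketch of how such an offline compaction run could be wrapped per OSD. The OSD id (12) is a placeholder, the data path is the default one from the command above, and the OSD daemon must be stopped before the tool touches its store; the command is only echoed here.

```shell
# Offline compaction sketch (assumption: OSD id 12, default data path,
# OSD daemon already stopped). The command is echoed, not executed.
OSD_ID=12
OSD_PATH="/var/lib/ceph/osd/ceph-${OSD_ID}"
CMD="ceph-kvstore-tool bluestore-kv ${OSD_PATH} compact"
echo "would run: ${CMD}"
```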
Dear Ceph users,
my cluster has been stuck for several days with some PGs backfilling. The
number of misplaced objects slowly decreases down to 5%, and at that
point jumps up again to about 7%, and so on. I found several possible
reasons for this behavior. One is related to the balancer, which anyw
Hi Stefan,
super thanks!
I found a quick-fix command in the help output:
# ceph-bluestore-tool -h
[...]
Positional options:
--command arg fsck, repair, quick-fix, bluefs-export,
bluefs-bdev-sizes, bluefs-bdev-expand,
bluefs-bdev-new-db
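Based on that help output, an invocation could look like the sketch below. The OSD id (12) and default path are assumptions, and the daemon must be stopped first; the command is only echoed rather than executed.

```shell
# quick-fix invocation sketch (assumption: OSD id 12, default data path,
# OSD daemon stopped first). Echoed only, not executed.
OSD_ID=12
QF="ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-${OSD_ID} --command quick-fix"
echo "would run: ${QF}"
```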
Unfortunately, that isn't the case: the drive is perfectly healthy and,
according to all measurements I did on the host itself, it isn't any
different from any other drive on that host size-, health- or
performance-wise.
The only difference I noticed is that this drive sporadically does more I/O
t
Hello,
My cluster is now healthy.
I've studied the OSDMonitor.cc file and found that there is
some problematic logic.
Assumptions:
1) require_osd_release can only be raised.
2) ceph-mon in version 17.2.3 can set require_osd_release to
minimal value 'octopus'.
I have two variants:
1) If I can
Hi Jan,
It looks like you got into this situation by not setting
require-osd-release to pacific while you were running 16.2.7.
The code has that expectation, and unluckily for you if you had
upgraded to 16.2.8 you would have had a HEALTH_WARN that pointed out
the mismatch between require_osd_relea
Hi Zakhar,
I can back up what Konstantin has reported -- we occasionally have
HDDs performing very slowly even though all smart tests come back
clean. Besides ceph osd perf showing a high latency, you could see
high ioutil% with iostat.
We normally replace those HDDs -- usually by draining and ze
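The checks mentioned above (high latency in ceph osd perf, high %util in iostat) can be scripted. The sketch below runs on a fabricated iostat-style sample rather than live output; on a host you would pipe in something like `iostat -x 1 1` and adjust the column positions to your iostat version.

```shell
# Flag devices whose %util is high. The sample lines are fabricated for
# illustration; the assumed layout puts %util in the last column, as
# `iostat -x` does.
cat > /tmp/iostat.sample <<'EOF'
sda 12.0 3.1 410.2 55.0 9.10 97.30
sdb 11.8 3.0 400.8 54.1 1.20 12.40
EOF
awk '$NF > 90 { print $1, "util%=" $NF }' /tmp/iostat.sample
```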
Hi,
I just wanted to reshard a bucket but mistyped the number of shards. In a
reflex I hit ctrl-c and waited. It looked like the resharding did not
finish, so I canceled it, and now the bucket is in this state.
How can I fix it? It does not show up in the stale-instances list. It's also
a multisite en
Hi Casey,
thanks a lot. I added the full stack trace from our ceph-client log.
Cheers
Boris
On Thu, 6 Oct 2022 at 19:21, Casey Bodley wrote:
> hey Boris,
>
> that looks a lot like https://tracker.ceph.com/issues/40018 where an
> exception was thrown when trying to read a socket's remote
Hi Frank,
one more thing I realized during the night :)
When performing the conversion, the DB gets a significant amount of new data
(approx. on par with the original OMAP volume) without the old data being
immediately removed. Hence one should expect the DB size to grow dramatically
at this point. Which should go
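The DB growth during conversion can be watched via the OSD's perf counters; `ceph daemon osd.N perf dump` exposes a bluefs section with db_used_bytes. The JSON below is a trimmed, fabricated sample standing in for that output.

```shell
# Watch DB usage during conversion. Fabricated sample of the bluefs section
# of `ceph daemon osd.N perf dump`; on a host, query the admin socket instead.
cat > /tmp/perf.sample <<'EOF'
{ "bluefs": { "db_total_bytes": 64424509440, "db_used_bytes": 21474836480 } }
EOF
grep -o '"db_used_bytes": [0-9]*' /tmp/perf.sample
```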
For format updates one can use the quick-fix command instead of repair; it
might work a bit faster.
On 10/7/2022 10:07 AM, Stefan Kooman wrote:
On 10/7/22 09:03, Frank Schilder wrote:
Hi Igor and Stefan,
thanks a lot for your help! Our cluster is almost finished with
recovery and I would like t
Just FYI:
standalone ceph-bluestore-tool's quick-fix behaves pretty similarly to the
action performed on start-up with bluestore_fsck_quick_fix_on_mount = true
On 10/7/2022 10:18 AM, Frank Schilder wrote:
Hi Stefan,
super thanks!
I found a quick-fix command in the help output:
# ceph-blues
The situation resolved itself, since probably there was no error. I
manually increased the number of PGs and PGPs to 128 some days ago, and
the PGP count was being updated step by step. Actually, after a bump from
5% to 7% in the count of misplaced objects I noticed that the number of
PGPs w
Hi Dan,
thanks for this point; it's at least the minimum that can be done.
But can you imagine what I would have to do if I did not have the
ability to change OSDMonitor.cc, recompile, and raise
require-osd-release? Or had require-osd-release lower than
nautilus?
Parameter min_mon_release raised automati
Hi Frank,
there are no tools to defragment an OSD at the moment. The only way to
defragment an OSD is to redeploy it...
Thanks,
Igor
On 10/7/2022 3:04 AM, Frank Schilder wrote:
Hi Igor,
sorry for the extra e-mail. I forgot to ask: I'm interested in a tool to
de-fragment the OSD. It doesn't look like the fs
Thanks for this!
The drive doesn't show increased utilization on average, but it does
sporadically get more I/O than other drives, usually in short bursts. I am
now trying to find a way to trace this to a specific PG, pool and
object(s) – not sure if that is possible.
/Z
On Fri, 7 Oct 2022, 12:
Hi all,
trying to respond to 4 past emails :)
We started using manual conversion and, if the conversion fails, it fails in
the last step. So far, we have a fail on 1 out of 8 OSDs. The OSD can be
repaired by running a compaction + another repair, which will complete the
last step. Looks lik
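The repair sequence described above could be sketched as below. The OSD id (12) and default path are placeholders, the daemon must be stopped, and the commands are echoed rather than executed.

```shell
# Sketch of the recovery sequence for a failed conversion (assumption:
# OSD id 12, default path, daemon stopped): repair, compact, repair again.
OSD_ID=12
P="/var/lib/ceph/osd/ceph-${OSD_ID}"
for STEP in \
  "ceph-bluestore-tool --path ${P} --command repair" \
  "ceph-kvstore-tool bluestore-kv ${P} compact" \
  "ceph-bluestore-tool --path ${P} --command repair"
do
  echo "would run: ${STEP}"
done
```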
Finally, how is your PG distribution? How many PGs per disk?
Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---
-Original Message-
From: Frank Schi
Hi,
I’d look for deep-scrubs on that OSD, those are logged, maybe those
timestamps match your observations.
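Those deep-scrub log entries can be pulled out with a simple grep. The log lines below are fabricated for illustration; the real file is typically /var/log/ceph/ceph-osd.<id>.log on the OSD's host.

```shell
# Extract deep-scrub timestamps for PGs on a suspect OSD from its log.
# Fabricated sample log; substitute the real OSD log path on a host.
cat > /tmp/osd.log <<'EOF'
2022-10-07 03:12:01.123 7f0a 0 log_channel(cluster) log [DBG] : 4.1f deep-scrub starts
2022-10-07 03:40:44.456 7f0a 0 log_channel(cluster) log [DBG] : 4.1f deep-scrub ok
2022-10-07 04:02:10.789 7f0a 0 log_channel(cluster) log [DBG] : 4.2a scrub starts
EOF
grep 'deep-scrub' /tmp/osd.log | awk '{ print $1, $2, $(NF-2), $(NF-1), $NF }'
```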
Quoting Zakhar Kirpichenko:
Thanks for this!
The drive doesn't show increased utilization on average, but it does
sporadically get more I/O than other drives, usually in short bur
You can try PetaSAN
www.petasan.org
It is an open-source solution on top of Ceph. We provide scalable
active/active iSCSI, which supports VMware VAAI and Microsoft clustered
shared volumes for Hyper-V clustering.
Cheers /maged
On 30/09/2022 19:36, Filipe Mendes wrote:
Hello!
I'm consideri
Hello,
we are encountering a strange behavior on our Ceph. (All Ubuntu 20 / All
mons Quincy 17.2.4 / Oldest OSD Quincy 17.2.0 )
Administrative commands like rbd ls or rbd create are so slow that libvirtd is
running into timeouts, and creating new VMs on our CloudStack, on behalf of
creating new vol
Hi folks,
The company I recently joined has a Proxmox cluster of 4 hosts with a CEPH
implementation that was set-up using the Proxmox GUI. It is running terribly,
and as a CEPH newbie I'm trying to figure out if the configuration is at fault.
I'd really appreciate some help and guidance on th
On 10/7/22 16:56, Tino Todino wrote:
Hi folks,
The company I recently joined has a Proxmox cluster of 4 hosts with a CEPH
implementation that was set-up using the Proxmox GUI. It is running terribly,
and as a CEPH newbie I'm trying to figure out if the configuration is at fault.
I'd really
Hi Tino,
Am 07.10.22 um 16:56 schrieb Tino Todino:
I know some of these are consumer class, but I'm working on replacing these.
This would be your biggest issue. SSD performance can vary drastically.
Ceph needs "multi-use" enterprise SSDs, not read-optimized consumer ones.
All 4 hosts are se
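A common way to gauge whether an SSD is suitable for Ceph's WAL/DB workload is a small synchronous 4k write test, since that is where read-optimized consumer drives typically collapse. The fio job below is a fabricated illustration; the filename, size, and runtime are arbitrary, and it must only ever point at a scratch file, never at a device in use.

```ini
# Fabricated fio job to gauge sync-write behaviour (the pattern Ceph's
# WAL/DB stresses). Point filename at a scratch file only.
[global]
ioengine=libaio
direct=1
sync=1
runtime=30
time_based=1

[sync-write-test]
filename=/tmp/fio.testfile
size=256m
rw=write
bs=4k
numjobs=1
iodepth=1
```

Enterprise "multi-use" SSDs typically sustain thousands of IOPS under this job, while consumer drives can drop to a few hundred or less.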
Hi,
You want to also check disk_io_weighted via some kind of metric system.
That will detect which SSDs that are hogging the systems, if there are any
specific ones. Also check their error levels and endurance.
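Metrics like disk_io_weighted ultimately come from /proc/diskstats, where field 14 is the weighted time spent doing I/O in milliseconds. The sketch below ranks disks by that field using a fabricated sample; on a host you would read /proc/diskstats directly.

```shell
# Rank disks by weighted I/O time (field 14 of /proc/diskstats, ms),
# the raw source behind disk_io_weighted-style metrics. Fabricated sample.
cat > /tmp/diskstats.sample <<'EOF'
   8       0 sda 9000 10 50000 300 7000 20 40000 900 0 4000 987654
   8      16 sdb 9100 12 51000 310 7100 22 41000 910 0 4100 123456
EOF
sort -k14 -n -r /tmp/diskstats.sample | awk '{ print $3, $14 }'
```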
On Fri, 7 Oct 2022 at 17:05, Stefan Kooman wrote:
> On 10/7/22 16:56, Tino Todino wr
Zakhar, try looking at the top slow ops in the daemon socket for this OSD; you
may find 'snapc' operations, for example. From the RBD head object you can find
the RBD image, and then check how many snapshots are in the chain for this
image. More than 10 snaps for one image can increase client op latency to tens
of millis
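Counting 'snapc' mentions in the slow-op dump is a quick first check. The JSON fragment below is fabricated; on a host you would query the admin socket, e.g. with something like `ceph daemon osd.12 dump_historic_slow_ops` (OSD id is a placeholder).

```shell
# Count 'snapc' occurrences in an OSD's slow-op dump. Fabricated fragment
# standing in for the admin-socket output.
cat > /tmp/slowops.sample <<'EOF'
{ "description": "osd_op(client.123 4.1f ... snapc 1234=[1230,1228] ...)" }
{ "description": "osd_op(client.124 4.2a ... read ...)" }
EOF
grep -c 'snapc' /tmp/slowops.sample
```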
As of Nautilus+, when you set pg_num, it actually internally sets
pg(p)_num_target, and then slowly increases (or decreases, if you're
merging) pg_num and then pgp_num until it reaches the target. The
amount of backfill scheduled into the system is controlled by
target_max_misplaced_ratio.
Josh
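The ramp Josh describes can be illustrated with a toy loop: pgp_num walks toward the target in steps bounded by a misplaced-ratio cap (target_max_misplaced_ratio defaults to 0.05). This is a rough model for intuition, not the mgr's exact algorithm; the starting value and target are arbitrary.

```shell
# Toy model of the stepwise pgp_num ramp toward pg_num_target, with each
# step capped at ~5% of current PGs (target_max_misplaced_ratio = 0.05).
pgp=64
target=128
while [ "$pgp" -lt "$target" ]; do
  step=$(( pgp * 5 / 100 ))        # at most ~5% misplaced per step
  [ "$step" -lt 1 ] && step=1
  pgp=$(( pgp + step ))
  [ "$pgp" -gt "$target" ] && pgp=$target
done
echo "reached pgp_num=$pgp"
```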
O
I've observed this occur on v14.2.22 and v15.2.12. Wasn't able to find anything
obviously relevant in changelogs, bug tickets, or existing mailing list threads.
In both cases, every RGW in the cluster starts spamming logs with lines that
look like the following:
2022-09-04 14:20:45.231 7fc7b2
Thanks for the suggestions, I will try this.
/Z
On Fri, 7 Oct 2022 at 18:13, Konstantin Shalygin wrote:
> Zakhar, try to look to top of slow ops in daemon socket for this osd, you
> may find 'snapc' operations, for example. By rbd head you can find rbd
> image, and then try to look how much sna