[ceph-users] Re: Question regarding bluestore labels

2024-06-10 Thread Igor Fedotov
scribe send an email to ceph-users-le...@ceph.io -- Igor Fedotov Ceph Lead Developer Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH, Freseniusstr. 31h, 81247 Munich CEO: Martin Verges - VAT-ID: DE310638492 Com. register: Amtsgericht Munich HRB 231263 W

[ceph-users] Re: A couple OSDs not starting after host reboot

2024-04-05 Thread Igor Fedotov
On 05/04/2024 17:28, xu chenhui wrote: Hi, Igor Thank you for providing the repair procedure. I will try it when I am back to my workstation. Can you provide any possible reasons for this problem? Unfortunately no. I recall a few cases like that but I doubt anyone knows the root cause. ceph

[ceph-users] Re: A couple OSDs not starting after host reboot

2024-04-05 Thread Igor Fedotov
Hi chenhui, there is still work in progress to support multiple labels to avoid the issue (https://github.com/ceph/ceph/pull/55374), but this is of little help for your current case. If your disk is fine (meaning it's able to read/write the block at offset 0) you might want to try to recover
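As a first sanity check one can verify that the label at offset 0 is still readable; a minimal sketch, with a hypothetical device path:
    ceph-bluestore-tool show-label --dev /dev/sdX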

[ceph-users] Re: log_latency slow operation observed for submit_transact, latency = 22.644258499s

2024-03-22 Thread Igor Fedotov
h large periods of time in between with normal low latencies, I think it's unlikely that it is just because the cluster is busy. Also, how come there's only a small number of PGs doing backfill when we have such a large misplaced percentage? Can this be just from backfill reservation logjam? Best regards, Tor

[ceph-users] Re: CephFS space usage

2024-03-20 Thread Igor Fedotov
cing the issue using only files in this directory. This way, you will be sure that nobody else is writing any data to the new pool. On Tue, Mar 19, 2024 at 5:40 PM Igor Fedotov <mailto:igor.fedo...@croit.io> wrote: Hi Thorn, given the amount of files

[ceph-users] Re: CephFS space usage

2024-03-20 Thread Igor Fedotov
there must be Ceph diagnostic tools to describe what those objects are being used for, surely? We're talking about an extra 10TB of space. How hard can it be to determine which file those objects are associated with? On 19/03/2024 8:39 pm, Igor Fedotov wrote: Hi Thorn, given the amoun

[ceph-users] Re: OSD does not die when disk has failures

2024-03-20 Thread Igor Fedotov
Hi Robert, I presume the plan was to support handling EIO at upper layers. But apparently that hasn't been completed. Or there are some bugs... Will take a look. Thanks, Igor On 3/19/2024 3:36 PM, Robert Sander wrote: Hi, On 3/19/24 13:00, Igor Fedotov wrote: translating EIO to upper

[ceph-users] Re: OSD does not die when disk has failures

2024-03-19 Thread Igor Fedotov
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Igor Fedotov Ceph Lead Developer Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH, Freseniusstr. 31h, 81247 Munich CEO

[ceph-users] Re: CephFS space usage

2024-03-19 Thread Igor Fedotov
mapping between them and CephFS file list. Could be a bit tricky though. On 15/03/2024 7:18 pm, Igor Fedotov wrote: ceph df detail --format json-pretty -- Regards, Thorne Lawler - Senior System Administrator *DDNS* | ABN 76 088 607 265 First registrar certified ISO 27001-2013 Data Security
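One possible way to build that mapping, sketched here as an illustration only (the pool name is taken from this thread, the mount point and inode value are made up): CephFS data objects are named <inode-hex>.<block-hex>, so the owning file can be located via its inode number:
    # distinct inode prefixes present in the data pool
    rados -p cephfs.shared.data ls | cut -d. -f1 | sort -u > inodes.txt
    # map one hex inode back to a path on a mounted CephFS
    find /mnt/cephfs -inum $((16#10000000f2a))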

[ceph-users] Re: CephFS space usage

2024-03-15 Thread Igor Fedotov
06a14 59  ./xcp_nfs_sr 1   ./failover_test 2   ./template/iso 3   ./template 1   ./xcpiso 67  . root@pmx101:/mnt/pve/iso# rados lssnap -p cephfs.shared.data 0 snaps What/where are all the other objects?!? On 15/03/2024 3:36 am, Igor Fedotov wrote: Thorn, you might

[ceph-users] Re: CephFS space usage

2024-03-14 Thread Igor Fedotov
ys: # du -sh . 5.5T    . So yeah - 14TB is replicated to 41TB, that's fine, but 14TB is a lot more than 5.5TB, so... where is that space going? On 14/03/2024 2:09 am, Igor Fedotov wrote: Hi Thorn, could you please share the output of "ceph df detail" command representing the problem?

[ceph-users] Re: CephFS space usage

2024-03-13 Thread Igor Fedotov
that space, and if possible, how can I reclaim that space? Thank you. -- Igor Fedotov Ceph Lead Developer Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH, Freseniusstr. 31h, 81247 Munich CEO: Martin Verges - VAT-ID: DE310638492 Com. register: Amtsgericht

[ceph-users] Re: bluestore_min_alloc_size and bluefs_shared_alloc_size

2024-03-13 Thread Igor Fedotov
that all osds originally built under octopus be redeployed with default settings and that default settings continue to be used going forward. Is that correct? Thanks for your assistance, Joel On Tue, Mar 12, 2024 at 4:13 AM Igor Fedotov wrote: Hi Joel, my primary statement would be -

[ceph-users] Re: bluestore_min_alloc_size and bluefs_shared_alloc_size

2024-03-12 Thread Igor Fedotov
Hi Joel, my primary statement would be - do not adjust "alloc size" settings on your own and use default values! We've had pretty long and convoluted evolution of this stuff so tuning recommendations and their aftermaths greatly depend on the exact Ceph version. While using improper

[ceph-users] Re: has anyone enabled bdev_enable_discard?

2024-03-01 Thread Igor Fedotov
I played with this feature a while ago and recall it had visible negative impact on user operations due to the need to submit tons of discard operations - effectively each data overwrite operation triggers one or more discard operation submission to disk. And I doubt this has been widely used

[ceph-users] Re: How can I clone data from a faulty bluestore disk?

2024-02-02 Thread Igor Fedotov
Hi Carl, you might want to use ceph-objectstore-tool to export PGs from faulty OSDs and import them back to healthy ones. The process could be quite tricky though. There is also pending PR (https://github.com/ceph/ceph/pull/54991) to make the tool more tolerant to disk errors. The patch
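A rough sketch of that export/import flow (the PG id, OSD ids and file path are hypothetical; both OSDs must be stopped while the tool runs):
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --pgid 3.1f --op export --file /backup/3.1f.export
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 \
        --op import --file /backup/3.1f.export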

[ceph-users] Re: Stuck in upgrade process to reef

2024-01-17 Thread Igor Fedotov
about osd.0, which has a problem with bluestore fsck? Is there a way to repair it? Sincerely Jan On Tue, Jan 16, 2024 at 08:15:03 CET, Igor Fedotov wrote: Hi Jan, I've just filed an upstream ticket for your case, see https://tracker.ceph.com/issues/64053 for more details. You might want

[ceph-users] Re: Stuck in upgrade process to reef

2024-01-16 Thread Igor Fedotov
of memory... My ceph config dump - see attached dump.txt Sincerely Jan Marek On Thu, Jan 11, 2024 at 04:02:02 CET, Igor Fedotov wrote: Hi Jan, unfortunately this wasn't very helpful. Moreover, the log looks a bit messy - it looks like a mixture of outputs from multiple running instances

[ceph-users] Re: Pacific bluestore_volume_selection_policy

2024-01-11 Thread Igor Fedotov
ueStore::KVSyncThread::entry()+0x11) [0x55d519e2de71]  21: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f490cf23609]  22: clone()  NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this. On Jan 10, 2024, at 12:06 PM, Igor Fedotov wrote: Hi Reed, it looks to

[ceph-users] Re: Stuck in upgrade process to reef

2024-01-11 Thread Igor Fedotov
, Jan 10, 2024 at 01:03:07 CET, Igor Fedotov wrote: Hi Jan, indeed this looks like some memory allocation problem - maybe the OSD's RAM usage threshold was reached or something? Curious if you have any custom OSD settings or maybe any memory caps for Ceph containers? Could you please set

[ceph-users] Re: Pacific bluestore_volume_selection_policy

2024-01-10 Thread Igor Fedotov
Hi Reed, it looks to me like your settings aren't effective. You might want to check OSD log rather than crash info and see the assertion's backtrace. Does it mention RocksDBBlueFSVolumeSelector as the one in https://tracker.ceph.com/issues/53906: ceph version 17.0.0-10229-g7e035110
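For example, something like this against the OSD log (the log path and OSD id are hypothetical):
    grep -B2 -A30 RocksDBBlueFSVolumeSelector /var/log/ceph/ceph-osd.123.log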

[ceph-users] Re: Stuck in upgrade process to reef

2024-01-10 Thread Igor Fedotov
/osd.1 --command repair and then start the osd.1 ceph-osd podman service. It seems that there is a problem with memory allocation, see attached log... Sincerely Jan On Tue, Jan 09, 2024 at 02:23:32 CET, Igor Fedotov wrote: Hi Marek, I haven't looked through those upgrade logs yet but here are some

[ceph-users] Re: Stuck in upgrade process to reef

2024-01-09 Thread Igor Fedotov
it seems to be correct (to my eyes). Could this coincidence be related to this problem? Sincerely Jan Marek On Thu, Jan 04, 2024 at 12:32:47 CET, Igor Fedotov wrote: Hi Jan, may I see the fsck logs from all the failing OSDs to see the pattern? IIUC the full node is suffering from the issue, right?

[ceph-users] Re: Stuck in upgrade process to reef

2024-01-08 Thread Igor Fedotov
correct (to my eyes). Could this coincidence be related to this problem? Sincerely Jan Marek On Thu, Jan 04, 2024 at 12:32:47 CET, Igor Fedotov wrote: Hi Jan, may I see the fsck logs from all the failing OSDs to see the pattern? IIUC the full node is suffering from the issue, right? Thanks, Igo

[ceph-users] Re: Stuck in upgrade process to reef

2024-01-04 Thread Igor Fedotov
5/20" ceph-bluestore-tool --path /var/lib/ceph/2c565e24-7850-47dc-a751-a6357cbbaf2a/osd.0 --command fsck And I've sending /tmp/osd.0.log file attached. Sincerely Jan Marek Dne Ne, pro 31, 2023 at 12:38:13 CET napsal(a) Igor Fedotov: Hi Jan, this doesn't look like RocksDB corruption but r

[ceph-users] Re: Stuck in upgrade process to reef

2023-12-30 Thread Igor Fedotov
ting while starting OSD.0. Many thanks for the help. Sincerely Jan Marek On Wed, Dec 27, 2023 at 04:42:56 CET, Igor Fedotov wrote: Hi Jan, IIUC the attached log is for ceph-kvstore-tool, right? Can you please share the full OSD startup log as well? Thanks, Igor On 12/27/2023 4:30 PM, Jan Marek wro

[ceph-users] Re: Stuck in upgrade process to reef

2023-12-27 Thread Igor Fedotov
? It seems, that rocksdb is even non-compatible or corrupted :-( Thanks in advance. Sincerely Jan Marek ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Igor Fedotov Ceph Lead Developer

[ceph-users] Re: OSD has Rocksdb corruption that crashes ceph-bluestore-tool repair

2023-12-18 Thread Igor Fedotov
Hi Malcolm, you might want to try ceph-objectstore-tool's export command to save the PG into a file and then import it to another OSD. Thanks, Igor On 18/12/2023 02:59, Malcolm Haak wrote: Hello all, I had an OSD go offline due to UWE. When restarting the OSD service, to try and at least

[ceph-users] Re: reef 18.2.1 QE Validation status

2023-12-01 Thread Igor Fedotov
://github.com/ceph/ceph/pull/54677 were approved/tested and ready for merge. What is the status/plan for https://tracker.ceph.com/issues/63618? On Wed, Nov 29, 2023 at 10:51 AM Igor Fedotov wrote: https://tracker.ceph.com/issues/63618 to be considered as a blocker for the next Reef release

[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-29 Thread Igor Fedotov
https://tracker.ceph.com/issues/63618 to be considered as a blocker for the next Reef release. On 07/11/2023 00:30, Yuri Weinstein wrote: Details of this release are summarized here: https://tracker.ceph.com/issues/63443#note-1 Seeking approvals/reviews for: smoke - Laura, Radek, Prashant,

[ceph-users] Re: CLT Meeting minutes 2023-11-23

2023-11-23 Thread Igor Fedotov
/59580 -- Igor Fedotov Ceph Lead Developer Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH, Freseniusstr. 31h, 81247 Munich CEO: Martin Verges - VAT-ID: DE310638492 Com. register: Amtsgericht Munich HRB 231263 Web: https://croit.io | YouTube: https://goo.gl

[ceph-users] Re: migrate wal/db to block device

2023-11-15 Thread Igor Fedotov
ng else? And how *should* moving the wal/db be done? Cheers, Chris ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Igor Fedotov Ceph Lead Developer Looking for help with your Ceph cluster? Conta

[ceph-users] Re: migrate wal/db to block device

2023-11-15 Thread Igor Fedotov
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Igor Fedotov Ceph Lead

[ceph-users] Re: Ceph Allocation - used space is unreasonably higher than stored space

2023-11-13 Thread Igor Fedotov
Hi Motahare, On 13/11/2023 14:44, Motahare S wrote: Hello everyone, Recently we have noticed that the results of "ceph df" stored and used space does not match; as the amount of stored data *1.5 (ec factor) is still like 5TB away from used amount: POOLID PGS

[ceph-users] Re: Ceph 16.2.14: OSDs randomly crash in bstore_kv_sync

2023-10-20 Thread Igor Fedotov
the previous version, 16.2.13 which worked for us without any issues, is an option, or would you advise against such a downgrade from 16.2.14? /Z On Fri, 20 Oct 2023 at 14:46, Igor Fedotov wrote: Hi Zakhar, Definitely we expect one more (and apparently the last) Pacif

[ceph-users] Re: Ceph 16.2.14: OSDs randomly crash in bstore_kv_sync

2023-10-20 Thread Igor Fedotov
th regards to a possible fix? We're experiencing several OSD crashes caused by this issue per day. /Z On Mon, 16 Oct 2023 at 14:19, Igor Fedotov wrote: That's true. On 16/10/2023 14:13, Zakhar Kirpichenko wrote: Many thanks, Igor. I found previously submitted b

[ceph-users] Re: Fixing BlueFS spillover (pacific 16.2.14)

2023-10-16 Thread Igor Fedotov
Hi Chris, for the first question (osd.76) you might want to try ceph-volume's "lvm migrate --from data --target " command. Looks like some persistent DB remnants are still kept at the main device, causing the alert. W.r.t. osd.86's question - the line "SLOW    0 B 3.0 GiB 59 GiB"
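A minimal sketch of such a migration as I understand it (the OSD id is from the thread; the fsid and target DB logical volume are hypothetical; the OSD must be stopped first):
    systemctl stop ceph-osd@76
    ceph-volume lvm migrate --osd-id 76 --osd-fsid <osd-fsid> \
        --from data --target ceph-db-vg/osd-76-db
    systemctl start ceph-osd@76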

[ceph-users] Re: Ceph 16.2.14: OSDs randomly crash in bstore_kv_sync

2023-10-16 Thread Igor Fedotov
That's true. On 16/10/2023 14:13, Zakhar Kirpichenko wrote: Many thanks, Igor. I found previously submitted bug reports and subscribed to them. My understanding is that the issue is going to be fixed in the next Pacific minor release. /Z On Mon, 16 Oct 2023 at 14:03, Igor Fedotov wrote

[ceph-users] Re: Ceph 16.2.14: OSDs randomly crash in bstore_kv_sync

2023-10-16 Thread Igor Fedotov
Hi Zakhar, please see my reply for the post on the similar issue at: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/YNJ35HXN4HXF4XWB6IOZ2RKXX7EQCEIY/ Thanks, Igor On 16/10/2023 09:26, Zakhar Kirpichenko wrote: Hi, After upgrading to Ceph 16.2.14 we had several OSD

[ceph-users] Re: Ceph 16.2.x excessive logging, how to reduce?

2023-10-04 Thread Igor Fedotov
Hi Zakhar, to reduce rocksdb logging verbosity you might want to set debug_rocksdb to 3 (or 0). I presume it produces a significant part of the logging traffic. Thanks, Igor On 04/10/2023 20:51, Zakhar Kirpichenko wrote: Any input from anyone, please? On Tue, 19 Sept 2023 at 09:01,
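For instance (a sketch; applied here to all OSDs, it can equally be set for a single daemon):
    ceph config set osd debug_rocksdb 3/3
    # or silence it entirely
    ceph config set osd debug_rocksdb 0/0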

[ceph-users] Re: After upgrading from 17.2.6 to 18.2.0, OSDs are very frequently restarting due to livenessprobe failures

2023-09-28 Thread Igor Fedotov
Hi Sudhin, It looks like manual DB compactions are (periodically?) issued via admin socket for your OSDs, which (my working hypothesis) triggers DB access stalls. Here are the log lines indicating such calls debug 2023-09-22T11:24:55.234+ 7fc4efa20700  1 osd.1 1192508 triggering manual

[ceph-users] Re: After upgrading from 17.2.6 to 18.2.0, OSDs are very frequently restarting due to livenessprobe failures

2023-09-26 Thread Igor Fedotov
Hi Sudhin, any publicly available cloud storage, e.g. Google drive should work. Thanks, Igor On 26/09/2023 22:52, sbeng...@gmail.com wrote: Hi Igor, Please let where can I upload the OSD logs. Thanks. Sudhin ___ ceph-users mailing list --

[ceph-users] Re: Recently started OSD crashes (or messages thereof)

2023-09-21 Thread Igor Fedotov
Hi Luke, highly likely this is caused by the issue covered https://tracker.ceph.com/issues/53906 Unfortunately it looks like we missed proper backport in Pacific. You can apparently work around the issue by setting 'bluestore_volume_selection_policy' config parameter to rocksdb_original.
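A sketch of that workaround (the restart target is hypothetical; the setting only takes effect on OSD restart):
    ceph config set osd bluestore_volume_selection_policy rocksdb_original
    systemctl restart ceph-osd@<id>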

[ceph-users] Re: After power outage, osd do not restart

2023-09-21 Thread Igor Fedotov
ting unit. Ignoring. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies. Patrick On 21/09/2023 at 12:44, Igor Fedotov wrote: Hi Patrick, please share the osd restart log to investi

[ceph-users] Re: After power outage, osd do not restart

2023-09-21 Thread Igor Fedotov
Hi Patrick, please share osd restart log to investigate that. Thanks, Igor On 21/09/2023 13:41, Patrick Begou wrote: Hi, After a power outage on my test ceph cluster, 2 osd fail to restart.  The log file show: 8e5f-00266cf8869c@osd.2.service: Failed with result 'timeout'. Sep 21

[ceph-users] Re: After upgrading from 17.2.6 to 18.2.0, OSDs are very frequently restarting due to livenessprobe failures

2023-09-21 Thread Igor Fedotov
Hi! Can you share OSD logs demonstrating such a restart? Thanks, Igor On 20/09/2023 20:16, sbeng...@gmail.com wrote: Since upgrading to 18.2.0, OSDs are very frequently restarting due to livenessprobe failures, making the cluster unusable. Has anyone else seen this behavior? Upgrade path:

[ceph-users] Re: cannot create new OSDs - ceph version 17.2.6 (810db68029296377607028a6c6da1ec06f5a2b27) quincy (stable)

2023-09-13 Thread Igor Fedotov
Mack Im Köller 3, 70794 Filderstadt, Germany On 2023-09-11 22:08, Igor Fedotov wrote: Hi Martin, could you please share the full existing log and also set debug_bluestore and debug_bluefs to 20 and collect new osd startup log. Thanks, Igor On 11/09/2023 20:53, Konold, Martin wrote: Hi, I

[ceph-users] Re: Rocksdb compaction and OSD timeout

2023-09-12 Thread Igor Fedotov
: Hi Igor, On 12 Sep 2023, at 15:28, Igor Fedotov wrote: Default hybrid allocator (as well as AVL one it's based on) could take dramatically long time to allocate pretty large (hundreds of MBs) 64K-aligned chunks for BlueFS. At the original cluster it was exposed as 20-30 sec OSD stalls

[ceph-users] Re: Rocksdb compaction and OSD timeout

2023-09-12 Thread Igor Fedotov
HI All, as promised here is a postmortem analysis on what happened. the following ticket (https://tracker.ceph.com/issues/62815) with accompanying materials provide a low-level overview on the issue. In a few words it is as follows: Default hybrid allocator (as well as AVL one it's based

[ceph-users] Re: cannot create new OSDs - ceph version 17.2.6 (810db68029296377607028a6c6da1ec06f5a2b27) quincy (stable)

2023-09-11 Thread Igor Fedotov
Hi Martin, could you please share the full existing log and also set debug_bluestore and debug_bluefs to 20 and collect new osd startup log. Thanks, Igor On 11/09/2023 20:53, Konold, Martin wrote: Hi, I want to create a new OSD on a 4TB Samsung MZ1L23T8HBLA-00A07 enterprise nvme device
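For example, for a single OSD (the id is hypothetical):
    ceph config set osd.42 debug_bluestore 20/20
    ceph config set osd.42 debug_bluefs 20/20
    systemctl restart ceph-osd@42    # then collect the startup log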

[ceph-users] Re: A couple OSDs not starting after host reboot

2023-08-29 Thread Igor Fedotov
ers mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Igor Fedotov Ceph Lead Developer Looking

[ceph-users] Re: Lots of space allocated in completely empty OSDs

2023-08-14 Thread Igor Fedotov
Hi Andras, does ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2925 --op meta-list show nothing as well? On 8/11/2023 11:00 PM, Andras Pataki wrote: ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2925  --op list -- Igor Fedotov Ceph Lead Developer Looking for help

[ceph-users] Re: ceph-volume lvm migrate error

2023-08-02 Thread Igor Fedotov
eph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Igor Fedotov Ceph Lead Developer Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH, Freseniusstr. 31h, 81247 Munich CEO: Martin Verges - VAT-ID: DE310638492 Com. register: Amtsgericht Munich HRB 231263 Web:

[ceph-users] Re: OSD stuck on booting state after upgrade (v15.2.17 -> v17.2.6)

2023-07-27 Thread Igor Fedotov
Hi, looks like you've hit into https://tracker.ceph.com/issues/58156 IIRC there is no way to work around the issue other than having custom build with the proper patch (Quincy backport is https://github.com/ceph/ceph/pull/51102). Unfortunately the fix hasn't been merged into Quincy/Pacific

[ceph-users] Re: Bluestore compression - Which algo to choose? Zstd really still that bad?

2023-06-27 Thread Igor Fedotov
Hi Christian, I can't say anything about your primary question on zstd benefits/drawbacks but I'd like to emphasize that compression ratio at BlueStore is (to a major degree) determined by the input data flow characteristics (primarily write block size), object store allocation unit size
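A rough illustration of that interplay, with made-up numbers:
    64 KiB write compresses to 20 KiB
    allocation unit =  4 KiB: stored as ceil(20/4)  *  4 KiB = 20 KiB  (~3.2x saving)
    allocation unit = 64 KiB: stored as ceil(20/64) * 64 KiB = 64 KiB  (no saving)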

[ceph-users] Re: Ceph Pacific bluefs enospc bug with newly created OSDs

2023-06-22 Thread Igor Fedotov
On 22/06/2023 00:04, Fox, Kevin M wrote: Does quincy automatically switch existing things to 4k or do you need to do a new ost to get the 4k size? Thanks, Kevin From: Igor Fedotov Sent: Wednesday, June 21, 2023 5:56 AM To: Carsten Grommel; ceph-users

[ceph-users] Re: Ceph Pacific bluefs enospc bug with newly created OSDs

2023-06-21 Thread Igor Fedotov
be to deploy the osd and db on the same NVMe but with different logical volumes or updating to quincy. Thank you! Carsten *Von: *Igor Fedotov *Datum: *Dienstag, 20. Juni 2023 um 12:48 *An: *Carsten Grommel , ceph-users@ceph.io *Betreff: *Re: [ceph-users] Ceph Pacific bluefs enospc bug wi

[ceph-users] Re: Ceph Pacific bluefs enospc bug with newly created OSDs

2023-06-20 Thread Igor Fedotov
Hi Carsten, first of all Quincy does have a fix for the issue, see https://tracker.ceph.com/issues/53466 (and its Quincy counterpart https://tracker.ceph.com/issues/58588) Could you please share a bit more info on OSD disk layout? SSD or HDD? Standalone or shared DB volume? I presume the
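For reference, that layout information can usually be read from the OSD metadata (the OSD id is hypothetical):
    ceph osd metadata 0 | grep -Ei 'rotational|bluefs|devices'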

[ceph-users] Re: BlueStore fragmentation woes

2023-05-31 Thread Igor Fedotov
bluestore(/var/lib/ceph/osd/ceph-183) probe -20: 0, 0, 0 Thanks, Kevin From: Fox, Kevin M Sent: Thursday, May 25, 2023 9:36 AM To: Igor Fedotov; Hector Martin; ceph-users@ceph.io Subject: Re: [ceph-users] Re: BlueStore fragmentation woes Ok, I'm

[ceph-users] Re: BlueStore fragmentation woes

2023-05-29 Thread Igor Fedotov
led immediately. So there is no specific requirement to re-provision OSD at Quincy+. Hence you're free to go with Pacific and enable 4K for BlueFS later in Quincy. Thanks, Igor On 26/05/2023 16:03, Stefan Kooman wrote: On 5/25/23 22:12, Igor Fedotov wrote: On 25/05/2023 20:36, Stefan Kooman wr

[ceph-users] Re: BlueStore fragmentation woes

2023-05-29 Thread Igor Fedotov
later with the raw bluestore free space map, since I still have a bunch of rebalancing and moving data around planned (I'm moving my cluster to new machines). On 26/05/2023 00.29, Igor Fedotov wrote: Hi Hector, I can advise two tools for further fragmentation analysis: 1) One might want to use ceph

[ceph-users] Re: BlueStore fragmentation woes

2023-05-26 Thread Igor Fedotov
yeah, definitely this makes sense On 26/05/2023 09:39, Konstantin Shalygin wrote: Hi Igor, Should we backport this to the p, q and reef releases? Thanks, k Sent from my iPhone On 25 May 2023, at 23:13, Igor Fedotov wrote: You might be facing the issue fixed by https://github.com/ceph/ceph

[ceph-users] Re: BlueStore fragmentation woes

2023-05-25 Thread Igor Fedotov
the metrics, restart and gather some more after and let you know. Thanks, Kevin ________ From: Igor Fedotov Sent: Thursday, May 25, 2023 9:29 AM To: Fox, Kevin M; Hector Martin; ceph-users@ceph.io Subject: Re: [ceph-users] Re: BlueStore fragmentation woes Just run th

[ceph-users] Re: BlueStore fragmentation woes

2023-05-25 Thread Igor Fedotov
afterwards) be observed for other OSDs? On 25/05/2023 19:20, Fox, Kevin M wrote: If you can give me instructions on what you want me to gather before the restart and after restart I can do it. I have some running away right now. Thanks, Kevin ____ From: Ig

[ceph-users] Re: BlueStore fragmentation woes

2023-05-25 Thread Igor Fedotov
://tracker.ceph.com/issues/58022 ? We still see run away osds at times, somewhat randomly, that causes runaway fragmentation issues. Thanks, Kevin From: Igor Fedotov Sent: Thursday, May 25, 2023 8:29 AM To: Hector Martin; ceph-users@ceph.io Subject: [ceph-users] Re

[ceph-users] Re: BlueStore fragmentation woes

2023-05-25 Thread Igor Fedotov
Hi Hector, I can advise two tools for further fragmentation analysis: 1) One might want to use ceph-bluestore-tool's free-dump command to get a list of free chunks for an OSD and try to analyze whether it's really highly fragmented and lacks long enough extents. free-dump just returns a list
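A sketch of getting that free-chunk list, with an online fragmentation score added for comparison (the OSD id and path are hypothetical; free-dump needs the OSD to be stopped):
    ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-183 \
        --allocator block --command free-dump > osd-183-free.json
    # online, via the admin socket
    ceph daemon osd.183 bluestore allocator score block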

[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-10 Thread Igor Fedotov
Thank you, Igor. I will try to see how to collect the perf values. Not sure about restarting all OSDs as it's a production cluster, is there a less invasive way? /Z On Tue, 9 May 2023 at 23:58, Igor Fedotov wrote: Hi Zakhar, Let's leave questions regarding cache usage/tuning to a

[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-09 Thread Igor Fedotov
simple Openstack / RBD. I also noticed that OSD cache usage doesn't increase over time (see my message "Ceph 16.2.12, bluestore cache doesn't seem to be used much" dated 26 April 2023, which received no comments), even though OSDs are being used rather heavily and there's plenty of host

[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-08 Thread Igor Fedotov
d out from colleagues that those 3 nodes were added to the cluster lately with a direct install of 17.2.5 (others were installed with 15.2.16 and later upgraded); not sure whether this is related to our problem though. I see a very similar crash reported here: https://tracker.ceph.com/issues/56346 so I'm not repor

[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-03 Thread Igor Fedotov
with best regards nikola ciprich -- Igor Fedotov Ceph Lead Developer Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH, Freseniusstr. 31h, 81247 Munich CEO: Martin Verges - VAT-ID: DE310638492 Com. register: Amtsgericht Munich HRB 231263 Web: https

[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-02 Thread Igor Fedotov
to set osd_fast_shutdown_timeout to zero to work around the above. IMO this assertion is nonsense and I don't see any usage of this timeout parameter other than to just throw an assertion. -- Igor Fedotov Ceph Lead Developer Looking for help with your Ceph cluster? Contact us at https://croit.io

[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-02 Thread Igor Fedotov
ng similar issue? I'd like to ask for hints on what I should check further.. we're running lots of 14.2.x and 15.2.x clusters, none showing a similar issue, so I'm suspecting this is something related to quincy. thanks a lot in advance with best regards nikola ciprich -- Igor Fedotov Ceph Lead

[ceph-users] Re: Ceph 16.2.12, particular OSD shows higher latency than others

2023-04-27 Thread Igor Fedotov
Hi Zakhar, you might want to try offline DB compaction using ceph-kvstore-tool for this specific OSD. Periodically we observe OSD perf drops due to degraded RocksDB performance, particularly after bulk data removal/migration. Compaction is quite helpful in this case. Thanks, Igor On
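A minimal sketch of that offline compaction (the OSD id is hypothetical; stop the OSD first):
    systemctl stop ceph-osd@57
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-57 compact
    systemctl start ceph-osd@57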

[ceph-users] Re: Mysteriously dead OSD process

2023-04-15 Thread Igor Fedotov
Hi J-P Methot, perhaps my response is a bit late, but this to some degree reminds me of an issue we were facing yesterday. First of all you might want to set debug-osd to 20 for this specific OSD and see if the log would be more helpful. Please share it if possible. Secondly I'm curious if the

[ceph-users] Re: Recently deployed cluster showing 9Tb of raw usage without any load deployed

2023-04-04 Thread Igor Fedotov
re in another device). Is that a bug of some sort? On Tue, Apr 4, 2023 at 6:31 AM Igor Fedotov wrote: Please also note that total cluster size reported below as SIZE apparently includes DB volumes: # ceph df --- RAW STORAGE --- CLASS  SIZE     AVAIL    USED     RAW USED  %RAW

[ceph-users] Re: Recently deployed cluster showing 9Tb of raw usage without any load deployed

2023-04-04 Thread Igor Fedotov
Please also note that total cluster size reported below as SIZE apparently includes DB volumes: # ceph df --- RAW STORAGE --- CLASS SIZE AVAIL USED RAW USED %RAW USED hdd 373 TiB 364 TiB 9.3 TiB 9.3 TiB 2.50 On 4/4/2023 12:22 PM, Igor Fedotov wrote: Do you have

[ceph-users] Re: Recently deployed cluster showing 9Tb of raw usage without any load deployed

2023-04-04 Thread Igor Fedotov
sd down out interval: 120 osd: bluestore min alloc size hdd: 65536 ``` Any tip or help on how to explain this situation is welcome! ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-03-27 Thread Igor Fedotov
want to me to update the ticket with my findings and the logs from pastebin? Feel free to update if you like but IMO we still lack the understanding what was the trigger for perf improvements in you case - OSD redeployment, disk trimming or both? -- Igor Fedotov Ceph Lead Developer -- croit GmbH, Fr

[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-03-27 Thread Igor Fedotov
ter osd.22 = 8TB osd.343 = 2TB https://pastebin.com/EfSSLmYS Pacific Cluster before recreating OSDs osd.40 = 8TB osd.162 = 2TB https://pastebin.com/wKMmSW9T Pacific Cluster after recreation OSDs osd.40 = 8TB osd.162 = 2TB https://pastebin.com/80eMwwBW Am Mi., 22. März 2023 um 11:09 Uhr schrieb Ig

[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-03-22 Thread Igor Fedotov
lock_based/filter_policy.cc:579] Using legacy Bloom filter with high (20) bits/key. Dramatic filter space and/or accuracy improvement is available with format_version>=5. On Tue, 21 Mar 2023 at 10:46, Igor Fedotov <igor.fedo...@croit.io> wrote: Hi Boris, additionally you might want

[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-03-21 Thread Igor Fedotov
The self-help group "UTF-8-Probleme" will meet in the large hall this time, contrary to the usual arrangement. -- Igor Fedotov Ceph Lead Developer -- croit GmbH, Freseniusstr. 31h, 81247 Munich CEO: Martin Verges - VAT-ID: DE310638492 Com. register: Amtsgericht Munich HRB 231263

[ceph-users] Re: BlueFS spillover warning gone after upgrade to Quincy

2023-01-16 Thread Igor Fedotov
rs anymore. As others mentioned, you can get the relevant metrics from Prometheus and setup alerts there instead. But it does make me wonder how many people might have spillover in their clusters and not even realize it, since there's no warning by default. Cheers, -- Igor Fedotov Ceph Lead Develope

[ceph-users] Re: OSD crash on Onode::put

2023-01-12 Thread Igor Fedotov
king here how few onode items are acceptable before performance drops painfully. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Igor Fedotov Sent: 09 January 2023 13:34:42 To: Dongdong Tao;ceph-users@ceph.io Cc:d.

[ceph-users] Re: OSD crash on Onode::put

2023-01-09 Thread Igor Fedotov
/ceph/+bug/1996010 Cheers, Dongdong -- Igor Fedotov Ceph Lead Developer -- croit GmbH, Freseniusstr. 31h, 81247 Munich CEO: Martin Verges - VAT-ID: DE310638492 Com. register: Amtsgericht Munich HRB 231263 Web <https://croit.io/> | LinkedIn <http://linkedin.com/company/croit> | Yo

[ceph-users] Re: LVM osds loose connection to disk

2022-11-17 Thread Igor Fedotov
:03:58 To: Igor Fedotov;ceph-users@ceph.io Subject: [ceph-users] Re: LVM osds loose connection to disk I can't reproduce the problem with artificial workloads, I need to get one of these OSDs running in the meta-data pool until it crashes. My plan is to reduce time-outs and increase log

[ceph-users] Re: LVM osds loose connection to disk

2022-11-10 Thread Igor Fedotov
any thanks and best regards! = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 10 October 2022 23:33:32 To: Igor Fedotov; ceph-users@ceph.io Subject: [ceph-users] Re: LVM osds loose connection to disk Hi Igor. T

[ceph-users] Re: Is it a bug that OSD crashed when it's full?

2022-11-01 Thread Igor Fedotov
l, bool)+0x2f7) [0x55858de4c8f7] 26: (BlueStore::_mount()+0x204) [0x55858de4f7b4] 27: (OSD::init()+0x380) [0x55858d91d1d0] 28: main() 29: __libc_start_main() 30: _start() NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this. Thanks! Tony ___ ceph-users mailing li

[ceph-users] Re: LVM osds loose connection to disk

2022-10-09 Thread Igor Fedotov
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Igor Fedotov Ceph Lead Developer Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH, Freseniusstr. 31h, 81247 Munich CEO: Martin V

[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-07 Thread Igor Fedotov
the fsck command does that. Is there any such tool? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 07 October 2022 01:53:20 To: Igor Fedotov; ceph-users@ceph.io Subject: [ceph-users] Re: OSD

[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-07 Thread Igor Fedotov
tober 2022 09:07:37 To: Frank Schilder; Igor Fedotov; ceph-users@ceph.io Subject: Re: [ceph-users] OSD crashes during upgrade mimic->octopus On 10/7/22 09:03, Frank Schilder wrote: Hi Igor and Stefan, thanks a lot for your help! Our cluster is almost finished with recovery and I would l

[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-07 Thread Igor Fedotov
this list. If someone could send me the command, I would be most grateful. for osd in `ls /var/lib/ceph/osd/`; do ceph-bluestore-tool repair --path  /var/lib/ceph/osd/$osd;done That's what I use. Gr. Stefan -- Igor Fedotov Ceph Lead Developer Looking for help with your Ceph cluster? Contact u

[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-07 Thread Igor Fedotov
regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Igor Fedotov Sent: 07 October 2022 01:19:44 To: Frank Schilder; ceph-users@ceph.io Cc: Stefan Kooman Subject: Re: [ceph-users] OSD crashes during upgrade mimic->octo

[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-06 Thread Igor Fedotov
, Igor Fedotov wrote: The log I inspected was for osd.16, so please share that OSD's utilization... And honestly I trust the allocator's stats more, so it's rather the CLI stats that are incorrect, if any. Anyway, a free dump should provide additional proof.. And once again - do other non-starting OSDs show

[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-06 Thread Igor Fedotov
regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Igor Fedotov Sent: 07 October 2022 00:37:34 To: Frank Schilder; ceph-users@ceph.io Cc: Stefan Kooman Subject: Re: [ceph-users] OSD crashes during upgrade mimic->octo

[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-06 Thread Igor Fedotov
still need to convert most of our OSDs and I cannot afford to lose more. The rebuild simply takes too long in the current situation. Thanks for your help and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____ From: Igor

[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-06 Thread Igor Fedotov
um S14 From: Igor Fedotov Sent: 06 October 2022 14:39 To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] OSD crashes during upgrade mimic->octopus Are crashing OSDs still bound to two hosts? If not - does any died OSD unconditionally mean its underl

[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-06 Thread Igor Fedotov
a similar process it's safe to assume it will be the same. Gr. Stefan ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Igor Fedotov Ceph Lead Developer Looking for help with your Ceph

[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-06 Thread Igor Fedotov
Risø Campus Bygning 109, rum S14 From: Igor Fedotov Sent: 06 October 2022 14:26 To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] OSD crashes during upgrade mimic->octopus On 10/6/2022 2:55 PM, Frank Schilder wrote: Hi Igor, it

[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-06 Thread Igor Fedotov
Frank Schilder AIT Risø Campus Bygning 109, rum S14 ________ From: Igor Fedotov Sent: 06 October 2022 13:45:17 To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] OSD crashes during upgrade mimic->octopus From your response to Stefan I'
