[ceph-users] Re: Full list of metrics provided by ceph exporter daemon

2024-06-20 Thread Peter Razumovsky
e{instance_id="a"} 0 ceph_rgw_qactive{instance_id="a"} 0 Thu, 20 Jun 2024 at 20:09, Anthony D'Atri : > curl http://endpoint:port/metrics > > > On Jun 20, 2024, at 10:15, Peter Razumovsky > wrote: > > > > Hello! > > > > I'm using Ceph Reef w
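
A minimal sketch for pulling just the metric names out of the scrape output (the port is an assumption: the standalone ceph-exporter daemon usually listens on 9926, the mgr prometheus module on 9283; adjust to your deployment):

    curl -s http://localhost:9926/metrics \
        | grep -v '^#' \
        | sed 's/[{ ].*//' \
        | sort -u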

[ceph-users] Full list of metrics provided by ceph exporter daemon

2024-06-20 Thread Peter Razumovsky
will appreciate it if someone points us to the full list. [1] https://docs.ceph.com/en/latest/mgr/prometheus/#ceph-daemon-performance-counters-metrics [2] https://docs.ceph.com/en/latest/monitoring/ -- Best regards, Peter Razumovsky ___ ceph-users

[ceph-users] Ceph Reef v18.2.3 - release date?

2024-05-29 Thread Peter Razumovsky
Hello! We're waiting for the new minor release 18.2.3 because of https://github.com/ceph/ceph/pull/56004. Why? Timing in our work is a tough thing. Could you kindly share an estimate of the 18.2.3 release timeframe? It has been 16 days since the original tag was created, so I want to understand when it will be

[ceph-users] Ceph Squid release / release candidate timeline?

2024-05-17 Thread Peter Sabaini
Hi, is there a ballpark timeline for a Squid release candidate / release? I'm aware of this pad that tracks blockers, is that still accurate or should I be looking at another resource? https://pad.ceph.com/p/squid-upgrade-failures Thanks! peter

[ceph-users] Re: Reconstructing an OSD server when the boot OS is corrupted

2024-04-30 Thread Peter van Heusden
st ls". How can it be re-added to this list? Thank you, Peter BTW full error message: Inferring fsid ed7b2c16-b053-45e2-a1fe-bf3474f90508 Using ceph image with id '59248721b0c7' and tag 'v17' created on 2024-04-24 16:06:51 + UTC quay.io/ceph/ceph@sha256:96f2a53bc3028eec16e790c6225e7d7acad8a4

[ceph-users] Reconstructing an OSD server when the boot OS is corrupted

2024-04-24 Thread Peter van Heusden
? Is it possible to do a clean install of the operating system and scan the existing drives in order to reconstruct the OSD configuration? Thank you, Peter P.S. the cause of the original corruption is likely due to an unplanned power outage, an event that hopefully will not recur
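
In case it helps later readers, a rough sketch of the usual recovery path, assuming the OSDs were created with ceph-volume on LVM and the data devices survived; the hostname is a placeholder:

    # after reinstalling the OS and the same Ceph release, restore /etc/ceph
    # (ceph.conf and keyrings), then let ceph-volume rediscover the OSDs
    ceph-volume lvm activate --all

    # on a cephadm-managed cluster (Pacific or later) the equivalent is
    ceph cephadm osd activate <host>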

[ceph-users] Re: Help with deep scrub warnings (probably a bug ... set on pool for effect)

2024-03-05 Thread Peter Maloney
t all scrubs done. I changed that to 0.01 so it doesn't bother me now. Peter On 2024-03-05 07:58, Anthony D'Atri wrote: * Try applying the settings to global so that mons/mgrs get them. * Set your shallow scrub settings back to the default. Shallow scrubs take very few resources * Set
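
For reference, applying the scrub tuning at global scope (as suggested above) looks roughly like this; the interval values are only examples, not recommendations:

    # seconds; 604800 = 7 days, 2419200 = 28 days -- example values only
    ceph config set global osd_scrub_min_interval 604800
    ceph config set global osd_deep_scrub_interval 2419200
    # the "not (deep-)scrubbed in time" warnings key off these ratios
    ceph config get mon mon_warn_pg_not_deep_scrubbed_ratio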

[ceph-users] Re: Performance improvement suggestion

2024-02-21 Thread Peter Grandi
> 1. Write object A from client. > 2. Fsync to primary device completes. > 3. Ack to client. > 4. Writes sent to replicas. [...] As mentioned in the discussion, this proposal is the opposite of the current policy, which is to wait for all replicas to be written before writes are

[ceph-users] Re: concept of ceph and 2 datacenters

2024-02-14 Thread Peter Sabaini
g like this? [0] https://docs.ceph.com/en/latest/rados/operations/stretch-mode/#limitations-of-stretch-mode cheers, peter. > Sincerely, > Vladimir > > Get Outlook for Android<https://aka.ms/AAb9ysg> > > From: ronny.lipp...@spark5.d

[ceph-users] cephx client key rotation

2024-01-24 Thread Peter Sabaini
Hi, this question has come up once in the past[0] afaict, but it was kind of inconclusive so I'm taking the liberty of bringing it up again. I'm looking into implementing a key rotation scheme for Ceph client keys. As it potentially takes some non-zero amount of time to update key material
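
For context, a bare-bones rotation sketch (client.backup is a placeholder entity; this assumes the caps in the exported keyring are re-imported unchanged, and there is an unavoidable window where clients still holding the old key are locked out):

    ceph auth get client.backup -o client.backup.keyring              # current key + caps
    ceph-authtool client.backup.keyring --gen-key -n client.backup    # new secret, same caps
    ceph auth import -i client.backup.keyring                         # mons now only accept the new key
    # distribute client.backup.keyring to the client(s) and restart/remount them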

[ceph-users] Re: Scrubbing?

2024-01-24 Thread Peter Grandi
> [...] After a few days, I have on our OSD nodes around 90MB/s > read and 70MB/s write while 'ceph -s' have client io as > 2,5MB/s read and 50MB/s write. [...] This is one of my pet-peeves: that a storage system must have capacity (principally IOPS) to handle both a maintenance workload and a

[ceph-users] Re: Performance impact of Heterogeneous environment

2024-01-17 Thread Peter Sabaini
On 17.01.24 11:13, Tino Todino wrote: > Hi folks. > > I had a quick search but found nothing concrete on this so thought I would > ask. > > We currently have a 4 host CEPH cluster with an NVMe pool (1 OSD per host) > and an HDD Pool (1 OSD per host). Both OSD's use a separate NVMe for DB/WAL.

[ceph-users] Re: recommendation for barebones server with 8-12 direct attach NVMe?

2024-01-15 Thread Peter Grandi
>> So we were going to replace a Ceph cluster with some hardware we had >> laying around using SATA HBAs but I was told that the only right way >> to build Ceph in 2023 is with direct attach NVMe. My impressions are somewhat different: * Nowadays it is rather more difficult to find 2.5in SAS or

[ceph-users] Re: rbd persistent cache configuration

2024-01-08 Thread Peter
rbd --version ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable) ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: rbd persistent cache configuration

2024-01-05 Thread Peter
Thanks for the response! Yes, it is in use: "watcher=10.1.254.51:0/1544956346 client.39553300 cookie=140244238214096" indicates the client is connected to the image. I am using fio to perform a write task on it. I guess the feature is not enabled correctly or some setting is incorrect. Should I

[ceph-users] rbd persistent cache configuration

2024-01-04 Thread Peter
I followed the document below to set up an image-level RBD persistent cache; however, I get error output when using the commands provided by the document. I have put my commands and descriptions below. Can anyone give some instructions? Thanks in advance.
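
For anyone landing here via the archives, the relevant knobs from the persistent write-back cache docs look roughly like this (pool/path/size are placeholders; as far as I know the pwl cache only exists in Pacific and later librbd, which may be the actual problem given the Octopus client reported in the follow-up):

    rbd config pool set <pool> rbd_plugins pwl_cache
    rbd config pool set <pool> rbd_persistent_cache_mode ssd
    rbd config pool set <pool> rbd_persistent_cache_path /mnt/pwl-cache
    rbd config pool set <pool> rbd_persistent_cache_size 10G
    rbd status <pool>/<image>    # shows the cache state once a client reopens the image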

[ceph-users] Re: Assistance Needed with Ceph Cluster Slow Ops Issue

2023-12-06 Thread Peter
:13 To: Peter Cc: ceph-users@ceph.io Subject: Re: [ceph-users] Assistance Needed with Ceph Cluster Slow Ops Issue Hi Peter, try to set the cluster to nosnaptrim If this helps, you might need to upgrade to pacific, because you are hit by the pg dups bug. See: https://www.clyso.com/blog/how
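
The suggested test, spelled out (this is a cluster-wide flag, so only use it for a short diagnostic window):

    ceph osd set nosnaptrim      # pause snapshot trimming
    # watch whether the slow ops disappear, then
    ceph osd unset nosnaptrim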

[ceph-users] Assistance Needed with Ceph Cluster Slow Ops Issue

2023-12-06 Thread Peter
Dear all, I am reaching out regarding an issue with our Ceph cluster that has been recurring every six hours. Upon investigating the problem using the "ceph daemon dump_historic_slow_ops" command, I observed that the issue appears to be related to slow operations, specifically getting stuck

[ceph-users] Re: CephFS mirror very slow (maybe for small files?)

2023-11-13 Thread Peter Grandi
> the speed of data transfer is varying a lot over time (200KB/s > – 120MB/s). [...] The FS in question, has a lot of small files > in it and I suspect this is the cause of the variability – ie, > the transfer of many small files will be more impacted by > greater site-site latency. 200KB/s on

[ceph-users] Re: CEPH Cluster performance review

2023-11-12 Thread Peter Grandi
>>> during scrubbing, OSD latency spikes to 300-600 ms, >> I have seen Ceph clusters spike to several seconds per IO >> operation as they were designed for the same goals. >>> resulting in sluggish performance for all VMs. Additionally, >>> some OSDs fail during the scrubbing process. >> Most

[ceph-users] Re: CEPH Cluster performance review

2023-11-12 Thread Peter Grandi
> during scrubbing, OSD latency spikes to 300-600 ms, I have seen Ceph clusters spike to several seconds per IO operation as they were designed for the same goals. > resulting in sluggish performance for all VMs. Additionally, > some OSDs fail during the scrubbing process. Most likely they time

[ceph-users] Re: HDD cache

2023-11-08 Thread Peter
This server is a Dell R730 configured with an HBA330 card; the HDDs are configured in write-through mode. From: David C. Sent: Wednesday, November 8, 2023 10:14 To: Peter Cc: ceph-users@ceph.io Subject: Re: [ceph-users] HDD cache Without (raid/jbod) controller ? On Wed, 8

[ceph-users] HDD cache

2023-11-08 Thread Peter
Hi All, I note that HDD cluster commit latency improves after I turn off the HDD cache. However, I also note that not all HDDs are able to turn off the cache. In particular, I found two HDDs with the same model number where one can turn the cache off and the other can't. I guess my system config or something is different
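
For reference, the usual way to inspect and disable the volatile write cache; which variant applies depends on whether the drive is SATA or SAS and on what the HBA passes through:

    hdparm -W /dev/sdX           # query write-cache state (SATA)
    hdparm -W 0 /dev/sdX         # disable it
    sdparm --get WCE /dev/sdX    # query via SCSI mode page (SAS, or SAT through an HBA)
    sdparm --clear WCE /dev/sdX  # disable it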

[ceph-users] Re: How do you handle large Ceph object storage cluster?

2023-10-19 Thread Peter Grandi
> [...] (>10k OSDs, >60 PB of data). 6TBs on average per OSD? Hopefully SSDs or RAID10 (or low-number, 3-5) RAID5. > It is entirely dedicated to object storage with S3 interface. > Maintenance and its extension are getting more and more > problematic and time consuming. Ah the joys of a single

[ceph-users] Re: How to deal with increasing HDD sizes ? 1 OSD for 2 LVM-packed HDDs ?

2023-10-18 Thread Peter Grandi
> * Ceph cluster with old nodes having 6TB HDDs > * Add new node with new 12TB HDDs Halving IOPS-per-TB? https://www.sabi.co.uk/blog/17-one.html?170610#170610 https://www.sabi.co.uk/blog/15-one.html?150329#150329 > Is it supported/recommended to pack 2 6TB HDDs handled by 2 > old OSDs into 1

[ceph-users] Re: Time Estimation for cephfs-data-scan scan_links

2023-10-18 Thread Peter Grandi
[...] > What is being done is a serial tree walk and copy in 3 > replicas of all objects in the CephFS metadata pool, so it > depends on both the read and write IOPS rate for the metadata > pools, but mostly in the write IOPS. [...] Wild guess: > metadata is on 10x 3.84TB SSDs without persistent

[ceph-users] Re: Time Estimation for cephfs-data-scan scan_links

2023-10-13 Thread Peter Grandi
>> However, I've observed that the cephfs-data-scan scan_links step has >> been running for over 24 hours on 35 TB of data, which is replicated >> across 3 OSDs, resulting in more than 100 TB of raw data. What matters is the number of "inodes" (and secondarily their size), that is the number of

[ceph-users] Re: VM hangs when overwriting a file on erasure coded RBD

2023-10-04 Thread Peter Linder
more. Instead it looks like something that over time broke with ext4. Sending this to the list in case someone else has a similar problem in the future. /Peter On 2023-09-29 at 19:02, peter.lin...@fiberdirekt.se wrote: Yes, this is all set up. It was working fine until after the problem

[ceph-users] VM hangs when overwriting a file on erasure coded RBD

2023-10-03 Thread Peter Linder
this one that can't overwrite. I'm thinking there is somehow something wrong with just this image? Regards, Peter ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: rgw: disallowing bucket creation for specific users?

2023-10-01 Thread Peter Goron
Hi Matthias, one possible way to achieve this is to set a quota on the number of buckets at the user level (see https://docs.ceph.com/en/reef/radosgw/admin/#quota-management). Quotas are under admin control. Rgds, Peter On Sun, 1 Oct 2023, 10:51, Matthias Ferdinand wrote: > Hi, >
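
As a concrete sketch of the user-level limit (the -1 semantics are taken from recent radosgw-admin man pages; verify against your release before relying on it):

    radosgw-admin user modify --uid=<user> --max-buckets=1    # cap bucket count
    radosgw-admin user modify --uid=<user> --max-buckets=-1   # disable bucket creation entirely
    radosgw-admin user info --uid=<user>                      # check max_buckets took effect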

[ceph-users] Re: VM hangs when overwriting a file on erasure coded RBD

2023-09-29 Thread peter . linder
d", "release": "luminous", "num": 12 } ], "mgr": [ { "features": "0x3f01cfbf7ffd", "release": "luminous", "num":

[ceph-users] Re: VM hangs when overwriting a file on erasure coded RBD

2023-09-29 Thread peter . linder
{ "features": "0x3f01cfb87fec", "release": "luminous", "num": 4 }, { "features": "0x3f01cfbf7ffd", "release": "luminous",

[ceph-users] VM hangs when overwriting a file on erasure coded RBD

2023-09-29 Thread Peter Linder
this one that can't overwrite. I'm thinking there is somehow something wrong with just this image? Regards, Peter ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: After upgrading from 17.2.6 to 18.2.0, OSDs are very frequently restarting due to livenessprobe failures

2023-09-22 Thread Peter Goron
to other commands (especially the ones sent by the liveness probe). The default liveness probe timeout set up by Rook is probably too small with regard to the device_health_check duration. In our case, we disabled device_health_check on the mgr side. Rgds, Peter On Thu, 21 Sept 2023 at 21:35, Sudhin Bengeri

[ceph-users] ceph osd error log

2023-08-22 Thread Peter
Hi Ceph community, my cluster has lots of logs regarding an error from ceph-osd. I am encountering the following error message in the logs: Aug 22 00:01:28 host008 ceph-osd[3877022]: 2023-08-22T00:01:28.347-0700 7fef85251700 -1 Fail to open '/proc/3850681/cmdline' error = (2) No such file

[ceph-users] Re: Decrepit ceph cluster performance

2023-08-14 Thread Peter Grandi
> We recently started experimenting with Proxmox Backup Server, > which is really cool, but performs enough IO to basically lock > out the VM being backed up, leading to IO timeouts, leading to > user complaints. :-( The two most common things I have had to fix over years as to storage systems I

[ceph-users] PG backfilled slow

2023-07-26 Thread Peter
, Peter ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Workload that delete 100 M object daily via lifecycle

2023-07-18 Thread Peter Grandi
[...] S3 workload, that will need to delete 100M file daily [...] >> [...] average (what about peaks?) around 1,200 committed >> deletions per second (across the traditional 3 metadata >> OSDs) sustained, that may not leave a lot of time for file > creation, writing or reading. :-)[...]

[ceph-users] Re: Workload that delete 100 M object daily via lifecycle

2023-07-18 Thread Peter Grandi
>>> On Mon, 17 Jul 2023 19:19:34 +0700, Ha Nguyen Van >>> said: > [...] S3 workload, that will need to delete 100M file daily [...] So many people seem to think that distributed (or even local) filesystems (and in particular their metadata servers) can sustain the same workload as high volume

[ceph-users] Re: ls: cannot access '/cephfs': Stale file handle

2023-05-18 Thread Peter Grandi
>>> On Wed, 17 May 2023 16:52:28 -0500, Harry G Coin >>> said: > I have two autofs entries that mount the same cephfs file > system to two different mountpoints.  Accessing the first of > the two fails with 'stale file handle'.  The second works > normally. [...] Something pretty close to that

[ceph-users] Re: Deleting millions of objects

2023-05-18 Thread Peter Grandi
> [...] We have this slow and limited delete issue also. [...] That usually, apart from command list length limitations, happens because so many Ceph storage backends have too low committed IOPS (write, but not just) for mass metadata (and equivalently small data) operations, never mind for

[ceph-users] Re: pg deep-scrub issue

2023-05-08 Thread Peter
}, "rocksdb": { "get": 769, "submit_transaction": 18292538, "submit_transaction_sync": 13561020, "get_latency": { "avgcount": 769, "sum": 739.658957592, &quo

[ceph-users] pg deep-scrub issue

2023-05-04 Thread Peter
:58.070854+0800 pg 4.7a4 not scrubbed since 2023-04-24T02:55:25.912789+0800 pg 4.7b4 not scrubbed since 2023-04-24T10:04:46.889422+0800 pg 4.7c8 not scrubbed since 2023-04-24T13:36:07.284271+0800 pg 4.7d2 not scrubbed since 2023-04-24T14:47:19.365551+0800 Peter
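
A small sketch for working through such a backlog by hand, assuming the health output uses the "pg <id> not scrubbed since" wording shown above (use "ceph pg deep-scrub" for the deep-scrub variant of the warning):

    ceph health detail \
        | awk '/not scrubbed since/ {print $2}' \
        | while read pg; do ceph pg scrub "$pg"; done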

[ceph-users] Re: MDS crash on FAILED ceph_assert(cur->is_auth())

2023-05-04 Thread Peter van Heusden
Hi Emmanuel, it was a while ago, but as I recall I evicted all clients and that allowed me to restart the MDS servers. There was something clearly "broken" in how at least one of the clients was interacting with the system. Peter On Thu, 4 May 2023 at 07:18, Emmanuel Jaep wrote: >
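
For anyone hitting the same assert, "evicted all clients" translates roughly to the following (daemon name and client id are placeholders):

    ceph tell mds.<daemon> client ls              # find the session ids
    ceph tell mds.<daemon> client evict id=<id>   # evict a single client
    # evicting everything is the blunt variant; use with care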

[ceph-users] Re: PVE CEPH OSD heartbeat show

2023-05-01 Thread Peter
limit the UDP triggering and resolve the "corosync" issue. I appreciate your help in this matter and look forward to your response. Peter -Original Message- From: Fabian Grünbichler Sent: Wednesday, April 26, 2023 12:42 AM To: ceph-users@ceph.io; Peter Subject: Re: [ceph-users

[ceph-users] Re: Deep-scrub much slower than HDD speed

2023-04-27 Thread Peter Grandi
On a 38 TB cluster, if you scrub 8 MB/s on 10 disks (using only numbers already divided by replication factor), you need 55 days to scrub it once. That's 8x larger than the default scrub factor [...] Also, even if I set the default scrub interval to 8x larger, it my disks will still be thrashing

[ceph-users] Re: Deep-scrub much slower than HDD speed

2023-04-27 Thread Peter Grandi
> On a 38 TB cluster, if you scrub 8 MB/s on 10 disks (using only > numbers already divided by replication factor), you need 55 days > to scrub it once. > That's 8x larger than the default scrub factor [...] Also, even > if I set the default scrub interval to 8x larger, it my disks > will still

[ceph-users] PVE CEPH OSD heartbeat show

2023-04-25 Thread Peter
Dear all, we are experiencing issues with Ceph after deploying it via PVE, with the network backed by a 10G Cisco switch with the vPC feature on. We are encountering slow OSD heartbeats and have not been able to identify any network traffic issues. Upon checking, we found that ping latency is around 0.1 ms,

[ceph-users] Re: Corrupt bluestore after sudden reboot (17.2.5)

2023-02-09 Thread Peter van Heusden
I am trying to do this, but the log file is 26 GB and growing. Is there perhaps a subset of the logs that would be useful? Peter On Mon, 16 Jan 2023 at 18:42, wrote: > Hi Peter, > > Could you add debug_bluestore = 20 to your ceph.conf and restart the OSD, > then send the log afte

[ceph-users] Re: OSD logs missing from Centralised Logging

2023-02-09 Thread Peter van Heusden
the daemons solved the problem. Peter On Thu, 9 Feb 2023 at 16:27, Tarrago, Eli (RIS-BCT) < eli.tarr...@lexisnexisrisk.com> wrote: > Please include your promtail logs, loki logs, promtail configuration, and > your loki configuration. > > > > *From: *Peter van Heus

[ceph-users] OSD logs missing from Centralised Logging

2023-02-08 Thread Peter van Heusden
daemons are running on each node including the OSDs. The Loki server and Grafana are running on one of our monitor nodes. Thanks for any clarifications you can provide. Peter ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email

[ceph-users] Corrupt bluestore after sudden reboot (17.2.5)

2023-01-14 Thread Peter van Heusden
st and rebuild them? Thanks, Peter ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: BlueFS spillover warning gone after upgrade to Quincy

2023-01-12 Thread Peter van Heusden
Thanks. The command definitely shows "slow_bytes": "db_total_bytes": 1073733632, "db_used_bytes": 240123904, "slow_total_bytes": 4000681103360, "slow_used_bytes": 8355381248, So I am not sure why the warnings ar
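
Those counters come from the OSD admin socket; a quick way to re-check spillover on a running OSD:

    ceph daemon osd.<id> perf dump bluefs \
        | grep -E '"(db|slow)_(total|used)_bytes"'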

[ceph-users] BlueFS spillover warning gone after upgrade to Quincy

2023-01-12 Thread Peter van Heusden
is despite bluestore_warn_on_bluefs_spillover still being set to true. Is there a way to investigate the current state of the DB to see if spillover is, indeed, still happening? Thank you, Peter ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an ema

[ceph-users] VolumeGroup must have a non-empty name / 17.2.5

2023-01-08 Thread Peter Eisch
up must have a non-empty name This host is the only one which has 14 drives which aren't being used. I'm guessing this is why it's getting this error. The drives may have been used previously in a cluster (maybe not the same cluster) or something. I don't know. Any suggestions for what to try to

[ceph-users] Re: librbd leaks memory on crushmap updates

2022-07-26 Thread Peter Lieven
On 21.07.22 at 17:50, Ilya Dryomov wrote: On Thu, Jul 21, 2022 at 11:42 AM Peter Lieven wrote: On 19.07.22 at 17:57, Ilya Dryomov wrote: On Tue, Jul 19, 2022 at 5:10 PM Peter Lieven wrote: On 24.06.22 at 16:13, Peter Lieven wrote: On 23.06.22 at 12:59, Ilya Dryomov wrote: On Thu, Jun

[ceph-users] Re: librbd leaks memory on crushmap updates

2022-07-22 Thread Peter Lieven
On 21.07.22 at 17:50, Ilya Dryomov wrote: > On Thu, Jul 21, 2022 at 11:42 AM Peter Lieven wrote: >> On 19.07.22 at 17:57, Ilya Dryomov wrote: >>> On Tue, Jul 19, 2022 at 5:10 PM Peter Lieven wrote: >>>> On 24.06.22 at 16:13, Peter Lieven wrote: >>>>

[ceph-users] Re: librbd leaks memory on crushmap updates

2022-07-21 Thread Peter Lieven
On 19.07.22 at 17:57, Ilya Dryomov wrote: On Tue, Jul 19, 2022 at 5:10 PM Peter Lieven wrote: On 24.06.22 at 16:13, Peter Lieven wrote: On 23.06.22 at 12:59, Ilya Dryomov wrote: On Thu, Jun 23, 2022 at 11:32 AM Peter Lieven wrote: On 22.06.22 at 15:46, Josh Baergen wrote: Hey Peter

[ceph-users] Re: librbd leaks memory on crushmap updates

2022-07-19 Thread Peter Lieven
On 24.06.22 at 16:13, Peter Lieven wrote: On 23.06.22 at 12:59, Ilya Dryomov wrote: On Thu, Jun 23, 2022 at 11:32 AM Peter Lieven wrote: On 22.06.22 at 15:46, Josh Baergen wrote: Hey Peter, I found relatively large allocations in the qemu smaps and checked the contents. It contained

[ceph-users] Re: librbd leaks memory on crushmap updates

2022-06-24 Thread Peter Lieven
On 23.06.22 at 12:59, Ilya Dryomov wrote: > On Thu, Jun 23, 2022 at 11:32 AM Peter Lieven wrote: >> On 22.06.22 at 15:46, Josh Baergen wrote: >>> Hey Peter, >>> >>>> I found relatively large allocations in the qemu smaps and checked the >>>>

[ceph-users] Re: librbd leaks memory on crushmap updates

2022-06-23 Thread Peter Lieven
On 23.06.22 at 12:59, Ilya Dryomov wrote: On Thu, Jun 23, 2022 at 11:32 AM Peter Lieven wrote: On 22.06.22 at 15:46, Josh Baergen wrote: Hey Peter, I found relatively large allocations in the qemu smaps and checked the contents. It contained several hundred repetitions of osd and pool

[ceph-users] Re: librbd leaks memory on crushmap updates

2022-06-23 Thread Peter Lieven
On 22.06.22 at 15:46, Josh Baergen wrote: Hey Peter, I found relatively large allocations in the qemu smaps and checked the contents. It contained several hundred repetitions of osd and pool names. We use the default builds on Ubuntu 20.04. Is there a special memory allocator in place

[ceph-users] Re: librbd leaks memory on crushmap updates

2022-06-22 Thread Peter Lieven
> On 22.06.2022 at 14:28, Ilya Dryomov wrote: > > On Wed, Jun 22, 2022 at 11:14 AM Peter Lieven wrote: >> >> >> >> Sent from my iPhone >> >>>> On 22.06.2022 at 10:35, Ilya Dryomov wrote: >>> >>

[ceph-users] Re: librbd leaks memory on crushmap updates

2022-06-22 Thread Peter Lieven
s far as I know there is no special allocator in place, so I wonder why there are such big allocations. Peter > > -- > May the most significant bit of your life be positive. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: librbd leaks memory on crushmap updates

2022-06-22 Thread Peter Lieven
Sent from my iPhone > On 22.06.2022 at 10:35, Ilya Dryomov wrote: > > On Tue, Jun 21, 2022 at 8:52 PM Peter Lieven wrote: >> >> Hi, >> >> >> we noticed that some of our long running VMs (1 year without migration) seem >> t

[ceph-users] librbd leaks memory on crushmap updates

2022-06-21 Thread Peter Lieven
for a very small dev cluster with approx. 40 OSDs and 5 pools. We first observed this issue with Nautilus 14.2.22 and then also tried Octopus 15.2.16, where issue #38403 should have been fixed. Any ideas other than migrating VMs when PSS usage gets too high? Thanks Peter
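
For anyone wanting to track this, a crude way to watch per-VM PSS on a hypervisor (assumes a kernel new enough to expose smaps_rollup, roughly 4.14+):

    for pid in $(pgrep -f qemu-system); do
        awk -v p="$pid" '/^Pss:/ {print "pid " p ": " $2 " kB PSS"}' "/proc/$pid/smaps_rollup"
    done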

[ceph-users] ceph-ansible to install mons without containers

2022-02-18 Thread Peter Eisch
ble starts installing sandbox containers and things on a monitor which only exists to be a mon/mgr host after running this successfully: ansible-playbook infrastructure-playbooks/add-mon.yml --limit cephmon-t03 Any advice? peter Peter Eisch Senior Site Reliability Engineer peter.ei...@virgi

[ceph-users] Re: 14.2.22 dashboard periodically dies and didn't failover

2022-01-17 Thread Peter Lieven
, Peter ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: 14.2.22 dashboard periodically dies and didn't failover

2022-01-13 Thread Peter Lieven
n't track the number of stand bys you might end up with 0 managers in the end... Peter ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: 14.2.22 dashboard periodically dies and didn't failover

2022-01-13 Thread Peter Lieven
we have never seen it before 14.2.22. Maybe it broke things. Our workaround (which works so far) is to disable the prometheus module and use Digital Ocean Ceph Exporter. https://github.com/digitalocean/ceph_exporter Best, Peter > > 2022-01-13 13:15:59.330 7fe7e085e700 -1
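
The workaround above, as commands:

    ceph mgr module disable prometheus
    ceph mgr module ls           # confirm it no longer shows as enabled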

[ceph-users] Re: 14.2.22 dashboard periodically dies and didn't failover

2022-01-12 Thread Peter Lieven
ster elects another mgr as primary, but the original primary does not recover. The process is stuck. I have a (large) backtrace if someone is interested. For us it seems that the prometheus exporter module is the cause. Do you have it enabled? Peter ___

[ceph-users] Re: RBD bug #50787

2021-12-22 Thread Peter Lieven
client and it works as expected. With Octopus 15.2.12 I can reproduce the issue. Peter ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: SATA SSD recommendations.

2021-11-22 Thread Peter Lieven
still >> at 1% for our use. > > Thanks, that's really useful to know. Whatever SSD you choose, check that it supports power-loss protection and make sure you disable the write cache. Peter ___ ceph-users mailing list -- ceph-users@ce

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-11-18 Thread Peter Lieven
is transferred unencrypted over the wire. RBD encryption takes place in the client. Peter ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-11-17 Thread Peter Lieven
use the latest N release with very few additional fixes. If the people longing for an LTS release are mainly those who are using Ceph as VM storage, we could use this as a basis. Thanks, Peter ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-11-17 Thread Peter Lieven
On 17.11.21 at 12:20, Igor Fedotov wrote: > Hi Peter, > > sure, why not... See [1]. I read it as meaning it is not wanted by the upstream developers; if we want it, the community has to do it. Nevertheless, I have put it on the list. Peter [1] https://lists.ceph.io/hyperkitty/list/d..

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-11-16 Thread Peter Lieven
s://pad.ceph.com/p/ceph-user-dev-monthly-minutes > > Any volunteers to extend the agenda and advocate the idea? Hi Igor, do you still think we can add the LTS topic to the agenda? I will attend tomorrow and can try to advocate it. Best, Peter > > Thanks, > Igor > >>

[ceph-users] Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

2021-11-10 Thread Peter Lieven
ack a switch for all those who don't require the read lease feature and are happy with reading from just the primary? Peter ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

2021-11-10 Thread Peter Lieven
ebug this and finally I > missed only one step in the upgrade. > > Only during the update itself, until require_osd_release is set to the new > version, there will be interruptions However, in Octopus the issue does still exist, right? Peter __

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-11-09 Thread Peter Lieven
r because they are just working too good. Best, Peter ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

2021-11-05 Thread Peter Lieven
ted at the point where the osd compat level is set to octopus. So one of my initial guesses back when I tried to analyze this issue was that it has something to do with the new "read from all osds not just the primary" feature. Does that make sense? Best, Peter __

[ceph-users] Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

2021-11-02 Thread Peter Lieven
n process can take more time than a peer OSD getting ECONNREFUSED. The combination above is the recommended combination (and the default). When we faced this issue we had a fresh Octopus install with default values... If necessary I can upgrade our development cluster to Octopus again

[ceph-users] Re: octupus: stall i/o during recovery

2021-10-28 Thread Peter Lieven
Hi Istvan, I have not given Octopus another try yet. But as far as I remember Manuel figured out the root cause. Maybe he can give more insights. Best, Peter On 28.10.21 at 13:07, Szabo, Istvan (Agoda) wrote: Hi Peter, Have you figured out what was the issue? Istvan Szabo Senior

[ceph-users] Re: Ceph performance optimization with SSDs

2021-10-22 Thread Peter Sabaini
On 22.10.21 11:29, Mevludin Blazevic wrote: > Dear Ceph users, > > I have a small Ceph cluster where each host consist of a small amount of SSDs > and a larger number of HDDs. Is there a way to use the SSDs as performance > optimization such as putting OSD Journals to the SSDs and/or using SSDs
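
The classic layout being asked about looks roughly like this with ceph-volume (device names are placeholders; on BlueStore the WAL lives with the DB unless given its own device, and the separate-journal variant only applies to the old FileStore backend):

    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1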

[ceph-users] Re: Can CEPH RBD devices be assigned to virtual machines in pre-allocation mode?

2021-10-22 Thread Peter Lieven
ning. To get fast >preprovisioning you need Octopus (available) and an updated QEMU driver (not >yet available). I have recently made several improvements to the Qemu driver and if there is need for it I can look into preprovisioning suppo

[ceph-users] Re: Can CEPH RBD devices be assigned to virtual machines in pre-allocation mode?

2021-10-22 Thread Peter Lieven
> On 22.10.2021 at 11:12, Tommy Sway wrote: > > Even hypervisor support is useless if ceph itself does not support it. Thick provisioning is supported from Octopus onwards. If you are using Qemu I can look into adding support for preprovisioning in the Qemu drive
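
As stated above, thick provisioning exists on the rbd side; a minimal example (pool, image and size are placeholders):

    rbd create --size 100G --thick-provision <pool>/<image>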

[ceph-users] Re: Tool to cancel pending backfills

2021-10-04 Thread Peter Lieven
On 01.10.21 at 16:52, Josh Baergen wrote: Hi Peter, when I checked for circles I found that running the upmap balancer alone never seems to create any kind of circle in the graph. By a circle, do you mean something like this? pg 1.a: 1->2 (upmap to put a chunk on 2 instead of 1) pg 1.b: 2

[ceph-users] Re: Tool to cancel pending backfills

2021-10-01 Thread Peter Lieven
be a nice addition to pgremapper to add an option to optimize the upmap table. When searching for circles you might want to limit the depth of the DFS, otherwise the runtime will be crazy. Thanks, Peter ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Tool to cancel pending backfills

2021-09-27 Thread Peter Lieven
upmap for a pg from OSD A to OSD B and an upmap for another pg from OSD B to OSD A, whereas it would be enough to have no upmap at all. Thanks, Peter ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
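
Such redundant pairs can be spotted and removed by hand; a quick sketch:

    ceph osd dump | grep pg_upmap_items      # list current upmap exceptions
    ceph osd rm-pg-upmap-items <pgid>        # drop one that maps A->B while another maps B->A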

[ceph-users] Re: SATA vs SAS

2021-08-23 Thread Peter Lieven
performance penalty. sdparm --clear WCE /dev/sdX Peter ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] nautilus: abort in EventCenter::create_file_event

2021-08-19 Thread Peter Lieven
) [0x7f3c62a7fbe3]  7: (EventCenter::process_events(unsigned int, std::chrono::duration >*)+0xd57) [0x7f3c62ad4ae7]  8: (()+0x61c1d8) [0x7f3c62ad91d8]  9: (()+0x8fa4af) [0x7f3c62db74af]  10: (()+0x76ba) [0x7f3c6c33a6ba]  11: (clone()+0x6d) [0x7f3c6c0705

[ceph-users] Re: Very slow I/O during rebalance - options to tune?

2021-08-12 Thread Peter Lieven
for osd_op_queue_cut_off was set to low by mistake prior to Octopus. Peter ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Very slow I/O during rebalance - options to tune?

2021-08-11 Thread Peter Lieven
Have you tried setting osd op queue cut off to high? Peter > On 11.08.2021 at 15:24, Frank Schilder wrote: > > The recovery_sleep options are the next choice to look at. Increase it and > clients will get more I/O time slots. However, with your settings, I'm > su
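
For reference, the setting in question (as far as I know it is not a runtime-switchable option, so restart the OSDs afterwards):

    ceph config set osd osd_op_queue_cut_off high
    ceph config get osd osd_op_queue_cut_off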

[ceph-users] Re: Cephadm Upgrade from Octopus to Pacific

2021-08-06 Thread Peter Childs
) and CentOS 8 does not support the hardware I've got (the disks are not detected, and I can't find the right drivers). I suspect I've now got to do some tidying up before I continue, but this does look smoother than when I tried with 16.2.0, which was 4 months ago. Thanks Peter. On Fri, 6 Aug 2021

[ceph-users] Cephadm Upgrade from Octopus to Pacific

2021-08-06 Thread Peter Childs
12a34-osd.conf' The good news is that this is still a pre-production proof-of-concept cluster, so I'm attempting to iron out issues before we try to make it a production service. Any ideas would be helpful. I guess a redeploy might be an option, but that does not feel very future-proof. Thanks Peter Childs ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: MDS crash on FAILED ceph_assert(cur->is_auth())

2021-08-06 Thread Peter van Heusden
a ticket. Peter On Fri, 6 Aug 2021 at 10:00, Yann Dupont wrote: > > On 28/06/2021 at 10:52, Peter van Heusden wrote: > > I am running Ceph 15.2.13 on CentOS 7.9.2009 and recently my MDS servers > > have started failing with the error message > > > > In function 'v

[ceph-users] Cephadm and multipath.

2021-07-28 Thread Peter Childs
switch multipath off the disks work, but I'd only get half the bandwidth. (Oh and ceph will get confused as it can see each drive twice). Thanks. Peter. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le

[ceph-users] Continuing Ceph Issues with OSDs falling over

2021-07-07 Thread Peter Childs
to look for the problems rather than any exact answers, I'm yet to see any clues that might help Thanks in advance Peter Childs ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
