[ceph-users] Controlling the number of open files from ceph client

2023-03-30 Thread bhattacharya . soumya . ou
Hi Ceph Users, My goal is to control the number of files a ceph client can open to the backend ceph filesystem at once to control the metadata transaction load. In this experiment, I have a ceph client on version Quincy on a physical server. The fstab entry below shows the options with which
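A minimal sketch of such an entry, assuming the kernel CephFS client and its caps_max mount option (the option name, value and paths here are assumptions, not taken from the original message):

    # /etc/fstab -- cap the client at ~16k cached caps (roughly, open file handles)
    mon1,mon2,mon3:/  /mnt/cephfs  ceph  name=client1,secretfile=/etc/ceph/client1.secret,caps_max=16384,_netdev  0  0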

[ceph-users] Call for Submissions IO500 ISC23

2023-03-30 Thread IO500 Committee
Stabilization Period: Monday, April 3rd - Friday, April 14th, 2023 Submission Deadline: Tuesday, May 16th, 2023 AoE The IO500 is now accepting and encouraging submissions for the upcoming 12th semi-annual IO500 list, in conjunction with ISC23. Once again, we are also accepting submissions to

[ceph-users] OSD will not start - ceph_assert(r == q->second->file_map.end())

2023-03-30 Thread Pat Vaughan
I have a cluster that I increased the number of PGs on because the autoscaler wasn't working as expected. It's recovering the misplaced objects, but an OSD just failed and refuses to come back up. The device is readable to the OS, and there are 2 other OSDs on the same node that are online. I

[ceph-users] Re: ceph orch ps mon, mgr, osd shows unknown for version, image and container id

2023-03-30 Thread Adiga, Anantha
Hi Adam, Cephadm ls lists all details: NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID osd.61 zp3110b001a0101

[ceph-users] Re: Upgrade from 16.2.7. to 16.2.11 failing on OSDs

2023-03-30 Thread Lo Re Giuseppe
To add on this: the issue seemed related to a process (ceph-volume) which was doing check operations on all devices. The OSD systemd service was timing out because of that and the OSD daemon was going into an error state. We noticed that version 17.2.5 had a change related to ceph-volume, in

[ceph-users] Re: monitoring apply_latency / commit_latency ?

2023-03-30 Thread Konstantin Shalygin
Hi, > On 25 Mar 2023, at 23:15, Matthias Ferdinand wrote: > > from "ceph daemon osd.X perf dump"? No, from the ceph-mgr prometheus exporter. You can enable it via `ceph mgr module enable prometheus` > Please bear with me :-) I just try to get some rough understanding what > the numbers to be
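A minimal sketch of getting those latency numbers out of the exporter (host and port are assumptions; 9283 is the usual default):

    ceph mgr module enable prometheus
    # the active mgr then serves a metrics endpoint, by default on TCP 9283
    curl -s http://<active-mgr-host>:9283/metrics | grep latency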

[ceph-users] 17.2.6 RC available

2023-03-30 Thread Yuri Weinstein
We are publishing a release candidate this time for users to try for testing only. Please note this RC had only limited testing. Full testing is being done now. Upgrading has been tested on some internal clusters, and the final upgrade of the longest-running cluster there is in progress. The

[ceph-users] Re: Eccessive occupation of small OSDs

2023-03-30 Thread Nicola Mori
Thank you Nathan for your insight. Actually I don't know if a single PG occupies a large fraction of the OSD or not, I'll search for how to check this. Anyway, on the culprit OSD I effectively have a large number of PGs with respect to its size, and also on other 500 GB OSDs I see a similar

[ceph-users] Re: OSD down cause all OSD slow ops

2023-03-30 Thread Boris Behrens
Hi, you might suffer from the same bug we suffered: https://tracker.ceph.com/issues/53729 https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/KG35GRTN4ZIDWPLJZ5OQOKERUIQT5WQ6/#K45MJ63J37IN2HNAQXVOOT3J6NTXIHCA Basically there is a bug that prevents the removal of PGlog items. You need
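A rough sketch of inspecting an affected OSD's PG log offline with ceph-objectstore-tool (paths and pgid are placeholders, and the available trim ops depend on the Ceph version, so treat this as an assumption rather than the exact procedure from the linked thread):

    systemctl stop ceph-osd@<id>
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> --pgid <pgid> --op log | head
    # newer builds also offer offline trimming, e.g. --op trim-pg-log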

[ceph-users] how ceph OSD bench works?

2023-03-30 Thread Luis Domingues
Hi, I am currently testing some new disks, doing some benchmarks and stuff, and I would like to understand how the OSD bench works. To quickly explain our setup: we have a small ceph cluster where our new disks are inserted, and we have some pools with no replication at all and 1 PG only,
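For context, the built-in bench is invoked per OSD roughly like this (the byte counts are only illustrative values, not from the original message):

    # write 1 GiB in 4 MiB chunks against osd.0 and report bytes_per_sec / iops
    ceph tell osd.0 bench 1073741824 4194304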

[ceph-users] Re: Eccessive occupation of small OSDs

2023-03-30 Thread Boris Behrens
Hi Nicola, can you send the output of `ceph osd df tree` and `ceph df`? Cheers Boris On Thu, 30 Mar 2023 at 16:36, Nicola Mori wrote: > Dear Ceph users, > > my cluster is made up of 10 old machines, with uneven number of disks and > disk size. Essentially I have just one big data pool (6+2

[ceph-users] Re: RGW can't create bucket

2023-03-30 Thread Boris Behrens
Hi Kamil, is this with all new buckets or only the 'test' bucket? Maybe the name is already taken? Can you check with s3cmd --debug whether you are connecting to the correct endpoint? Also I see that the user seems to not be allowed to create buckets ... "max_buckets": 0, ... Cheers Boris On Thu, 30.
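If max_buckets turns out to be the cause, a hedged sketch of checking and raising it (the uid is a placeholder):

    radosgw-admin user info --uid=<uid> | grep max_buckets
    # the meaning of 0 has differed between releases, so setting an explicit limit is the safer test
    radosgw-admin user modify --uid=<uid> --max-buckets=1000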

[ceph-users] Re: Ceph Failure and OSD Node Stuck Incident

2023-03-30 Thread Fox, Kevin M
I've seen this twice in production as well. One OSD gets stuck and a bunch of PGs go into the laggy state. `ceph pg dump | grep laggy` shows all the laggy PGs share the same OSD. Restarting the affected OSD restored full service.
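A minimal sketch of that workaround (the OSD id is a placeholder; pick the restart method matching your deployment):

    ceph pg dump | grep laggy              # note the OSD shared by all laggy PGs
    systemctl restart ceph-osd@<id>        # classic package-based deployments
    ceph orch daemon restart osd.<id>      # cephadm-managed clusters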

[ceph-users] Re: RGW can't create bucket

2023-03-30 Thread Kamil Madac
Hi Eugen, it is version 16.2.6. We checked quotas and we can't see any quotas applied for users. As I wrote, every user is affected. Are there any non-user or global quotas which could cause that no user can create a bucket? Here is an example output of a newly created user which cannot create buckets

[ceph-users] Re: Cephadm - Error ENOENT: Module not found

2023-03-30 Thread Adam King
for the specific issue with that traceback, you can probably resolve that by removing the stored upgrade state. We put it at `mgr/cephadm/upgrade_state` I believe (can check "ceph config-key ls" and look for something related to upgrade state if that doesn't work) so running "ceph config-key rm
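Assembled from the snippet above, the commands would look roughly like this (confirm the exact key name with the ls first, as noted; the mgr restart is an assumption to make the cephadm module reload its state):

    ceph config-key ls | grep -i upgrade
    ceph config-key rm mgr/cephadm/upgrade_state
    ceph mgr fail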

[ceph-users] Re: ceph orch ps mon, mgr, osd shows unknown for version, image and container id

2023-03-30 Thread Adam King
if you put a copy of the cephadm binary onto one of these hosts (e.g. a002s002) and run "cephadm ls" what does it give for the OSDs? That's where the orch ps information comes from. On Thu, Mar 30, 2023 at 10:48 AM wrote: > Hi , > > Why is ceph orch ps showing ,unknown version, image and

[ceph-users] ceph osd new: possible inconsistency whether UUID is a mandatory argument

2023-03-30 Thread Oliver Schmidt
Hi everyone, I discovered a documentation inconsistency in Ceph Nautilus and would like to know whether this is still the case in the latest ceph release before reporting a bug. Unfortunately, I only have access to a Nautilus cluster right now. The quincy docs state [1]: > Create the OSD. If
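For reference, the two invocations whose documentation seems to disagree, sketched (syntax from memory, not verified against every release):

    ceph osd new $(uuidgen)    # with an explicit UUID
    ceph osd new               # without a UUID -- the form whose validity the docs describe inconsistently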

[ceph-users] osd_mclock_max_capacity_iops_ssd && multiple osd by nvme ?

2023-03-30 Thread DERUMIER, Alexandre
Hi, I would like advice on how to correctly tune osd_mclock_max_capacity_iops_ssd when you have multiple OSDs per NVMe. Is it simply a matter of dividing the total IOPS of the NVMe by the number of OSDs? But maybe it'll impact performance if more reads/writes are done on one of the OSDs? I really don't
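A hedged sketch of the divide-by-OSD-count approach (device IOPS and OSD ids are made-up numbers):

    # e.g. an NVMe measured at ~400k write IOPS split across 4 OSDs -> ~100k each
    ceph config set osd.10 osd_mclock_max_capacity_iops_ssd 100000
    ceph config set osd.11 osd_mclock_max_capacity_iops_ssd 100000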

[ceph-users] RGW can't create bucket

2023-03-30 Thread kamil . madac
Hi, One of my customers had a correctly working RGW cluster with two zones in one zonegroup and since a few days ago users are not able to create buckets and are always getting Access denied. Working with existing buckets works (like listing/putting objects into existing bucket). The only

[ceph-users] Re: quincy v17.2.6 QE Validation status

2023-03-30 Thread Radoslaw Zarzynski
rados: approved! On Mon, Mar 27, 2023 at 7:02 PM Laura Flores wrote: > Rados review, second round: > > Failures: > 1. https://tracker.ceph.com/issues/58560 > 2. https://tracker.ceph.com/issues/58476 > 3. https://tracker.ceph.com/issues/58475 -- pending Q backport > 4.

[ceph-users] Re: Ceph Failure and OSD Node Stuck Incident

2023-03-30 Thread Ramin Najjarbashi
On Thu, Mar 30, 2023 at 6:08 PM wrote: > We encountered a Ceph failure where the system became unresponsive with no > IOPS or throughput after encountering a failed node. Upon investigation, it > appears that the OSD process on one of the Ceph storage nodes is stuck, but > ping is still

[ceph-users] Cephadm - Error ENOENT: Module not found

2023-03-30 Thread elia . oggian
Hello, After a successful upgrade of a Ceph cluster from 16.2.7 to 16.2.11, I needed to downgrade it back to 16.2.7 as I found an issue with the new version. I expected that running the downgrade with `ceph orch upgrade start --ceph-version 16.2.7` should have worked fine. However, it blocked

[ceph-users] Re: Ceph Mgr/Dashboard Python depedencies: a new approach

2023-03-30 Thread Kaleb Keithley
I'm not entirely following you. Your examples in your README.rst — lua-devel and nasm — are available in RHEL9 and CentOS Stream 9. They are in the CodeReady Builder repos. I sampled a few of the packages in https://copr.fedorainfracloud.org/coprs/ceph/el9/packages/ too. libev is in the base.

[ceph-users] Re: Eccessive occupation of small OSDs

2023-03-30 Thread Nathan Fish
When a single PG is a substantial percentage of an OSD (e.g. 10%) it's hard for the upmap balancer to do much. It's possible you'd have just as much space, or more, if you removed the 500 GB HDDs. Another option might be to mdraid-0 the 500 GB drives in pairs and make OSDs from the pairs; this would
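A minimal sketch of the mdraid-0 pairing idea, assuming two spare 500 GB drives (device names are placeholders):

    mdadm --create /dev/md/ceph-hdd0 --level=0 --raid-devices=2 /dev/sdX /dev/sdY
    ceph orch daemon add osd <host>:/dev/md/ceph-hdd0   # or: ceph-volume lvm create --data /dev/md/ceph-hdd0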

[ceph-users] OSD down cause all OSD slow ops

2023-03-30 Thread petersun
We experienced a Ceph failure causing the system to become unresponsive with no IOPS or throughput due to a problematic OSD process on one node. This resulted in slow operations and no IOPS for all other OSDs in the cluster. The incident timeline is as follows: Alert triggered for OSD problem.

[ceph-users] ceph orch ps shows unknown in version, container and image id columns

2023-03-30 Thread anantha . adiga
Hi, has anybody noticed this issue? For all mgr, mon and osd daemons, orch ps shows version, container and image ids as unknown. Ceph health is ok and all daemons are running fine. cephadm ls shows correct details of version, container and image ids. What could be the issue? And how

[ceph-users] Upgrade from 16.2.7. to 16.2.11 failing on OSDs

2023-03-30 Thread Lo Re Giuseppe
Dear all, On one of our clusters I started the upgrade process from 16.2.7 to 16.2.11. Mon and mgr and crash processes were done easily/quickly, then at the first attempt of upgrading an OSD container the upgrade process stopped because the OSD process is not able to start after the upgrade.

[ceph-users] Workload performance varying between 2 executions

2023-03-30 Thread Nguetchouang Ngongang Kevin
Good morning. I just set up a ceph environment with 9 storage nodes, and I mount it as a cephfs on a 10th independent node. I executed a fio workload once and got a 3Mb/s throughput. When I re-executed the same workload after a certain time I got a 9Mb/s throughput this time. Do you know why this
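For what it's worth, a sketch of a fio invocation that makes reruns more comparable by bypassing the page cache and fixing the runtime (all parameters are illustrative):

    fio --name=cephfs-test --filename=/mnt/cephfs/fio.dat --size=1G \
        --rw=randwrite --bs=4k --direct=1 --ioengine=libaio --iodepth=16 \
        --runtime=60 --time_based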

[ceph-users] Re: cephadm cluster move /var/lib/docker to separate device fails

2023-03-30 Thread anantha . adiga
Was there a resolution for this condition?

[ceph-users] ceph orch ps mon, mgr, osd shows unknown for version, image and container id

2023-03-30 Thread anantha . adiga
Hi, why is ceph orch ps showing unknown for version, image and container id? root@a002s002:~# cephadm shell ceph mon versions Inferring fsid 682863c2-812e-41c5-8d72-28fd3d228598 Using recent ceph image quay.io/ceph/daemon@sha256:9889075a79f425c2f5f5a59d03c8d5bf823856ab661113fa17a8a7572b16a997

[ceph-users] Re: Unbalanced OSDs when pg_autoscale enabled

2023-03-30 Thread 郑亮
I set the target_size_ratio of the pools by mistake, with multiple pools sharing the same raw capacity. After I adjusted it, a large number of PGs are in the backfill state, but the usage rate of the OSDs is still growing. How do I need to adjust it? [root@node01 smd]# ceph osd pool autoscale-status POOL
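A hedged sketch of resetting the ratios so that pools sharing the same raw capacity sum to at most 1 (pool names and values are placeholders):

    ceph osd pool autoscale-status
    ceph osd pool set <pool-a> target_size_ratio 0.6
    ceph osd pool set <pool-b> target_size_ratio 0.2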

[ceph-users] Ceph Failure and OSD Node Stuck Incident

2023-03-30 Thread petersun
We encountered a Ceph failure where the system became unresponsive with no IOPS or throughput after a node failed. Upon investigation, it appears that the OSD process on one of the Ceph storage nodes is stuck, but ping is still responsive. However, during the failure, Ceph was

[ceph-users] Eccessive occupation of small OSDs

2023-03-30 Thread Nicola Mori
Dear Ceph users, my cluster is made up of 10 old machines, with an uneven number of disks and disk sizes. Essentially I have just one big data pool (6+2 erasure code, with host failure domain) for which I am currently experiencing very poor available space (88 TB, of which 40 TB occupied, as

[ceph-users] compiling Nautilus for el9

2023-03-30 Thread Marc
Is it possible to compile Nautilus for el9? Or maybe just the OSDs?
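In case it helps, the generic from-source steps look roughly like this (whether Nautilus's dependencies actually resolve on el9 is exactly the open question, so this is a sketch, not a confirmation that it works):

    git clone -b nautilus https://github.com/ceph/ceph.git
    cd ceph && ./install-deps.sh
    ./do_cmake.sh -DCMAKE_BUILD_TYPE=RelWithDebInfo
    cd build && make -j$(nproc) ceph-osd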

[ceph-users] Re: RGW can't create bucket

2023-03-30 Thread Eugen Block
Hi, what ceph version is this? Could you have hit some quota? Zitat von Kamil Madac : Hi, One of my customers had a correctly working RGW cluster with two zones in one zonegroup and since a few days ago users are not able to create buckets and are always getting Access denied. Working with

[ceph-users] Re: RGW access logs with bucket name

2023-03-30 Thread Boris Behrens
Sadly not. I only see the path/query of a request, but not the hostname. So when a bucket is accessed via hostname (https://bucket.TLD/object?query) I only see the object and the query (GET /object?query). When a bucket is accessed via path (https://TLD/bucket/object?query) I can also see the

[ceph-users] Re: RGW access logs with bucket name

2023-03-30 Thread Szabo, Istvan (Agoda)
Don't the HTTP requests in the beast logs contain the full URL beginning with the bucket name? Istvan Szabo Staff Infrastructure Engineer --- Agoda Services Co., Ltd. e: istvan.sz...@agoda.com

[ceph-users] Re: RGW access logs with bucket name

2023-03-30 Thread Boris Behrens
Bringing up that topic again: is it possible to log the bucket name in the rgw client logs? Currently I only get to know the bucket name when someone accesses the bucket via https://TLD/bucket/object instead of https://bucket.TLD/object. On Tue, 3 Jan 2023 at 10:25, Boris Behrens wrote: >
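One hedged possibility is the RGW ops log, which records the bucket per request regardless of path- or vhost-style access; the option names below are from memory and should be verified for your release:

    ceph config set client.rgw rgw_enable_ops_log true
    ceph config set client.rgw rgw_ops_log_socket_path /var/run/ceph/rgw-ops.sock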

[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-03-30 Thread Boris Behrens
A short correction: the IOPS from the bench in our pacific cluster are also down to 40 again for the 4/8 TB disks, but the apply latency seems to stay in the same place. But I still don't understand why it is down again. Even when I synced out the OSD so it receives 0 traffic it is still slow.

[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-03-30 Thread Boris Behrens
After some digging in the nautilus cluster I see that the disks with the exceptionally high IOPS performance are actually SAS-attached NVMe disks (these: https://semiconductor.samsung.com/ssd/enterprise-ssd/pm1643-pm1643a/mzilt7t6hala-7/ ) and these disks make up around 45% of the cluster capacity.