[ceph-users] Re: Ceph MDS failing because of corrupted dentries in lost+found after update from 17.2.7 to 18.2.0

2024-08-06 Thread Justin Lee
The actual mount command doesn't hang; we just can't interact with any of the directory's contents once mounted. I couldn't find anything unusual in the logs. Best, Justin Lee On Fri, Aug 2, 2024 at 10:38 AM Dhairya Parmar wrote: > So the mount hung? Can you see anything

[ceph-users] Re: Ceph MDS failing because of corrupted dentries in lost+found after update from 17.2.7 to 18.2.0

2024-08-06 Thread Justin Lee
Hi Dhairya, Thanks for the response! We tried removing it as you suggested with `rm -rf` but the command just hangs indefinitely with no output. We are also unable to `ls lost+found`, or otherwise interact with the directory's contents. Best, Justin Lee On Fri, Aug 2, 2024 at 8:24 AM Dh

[ceph-users] Ceph MDS failing because of corrupted dentries in lost+found after update from 17.2.7 to 18.2.0

2024-08-01 Thread Justin Lee
After we updated our ceph cluster from 17.2.7 to 18.2.0 the MDS kept being marked as damaged and stuck in up:standby with these errors in the log. debug-12> 2024-07-14T21:22:19.962+ 7f020cf3a700 1 mds.0.cache.den(0x4 1000b3bcfea) loaded already corrupt dentry: [dentry #0x1/lost+found/1000
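A minimal sketch of how the reported dentry damage can be inspected, assuming a Reef-era cluster and a filesystem named "cephfs" (the name and rank are placeholders):

    # List the damage entries the MDS has recorded for rank 0
    ceph tell mds.cephfs:0 damage ls
    # Check which state the MDS daemons are currently in
    ceph fs status cephfs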

[ceph-users] RGW/Lua script does not show logs

2024-04-08 Thread soyoon . lee
Hello, I wrote a Lua script in order to retrieve RGW logs such as bucket name, bucket owner, etc. However, when I apply the Lua script using the below command, I do not see any logs that start with Lua: INFO. radosgw-admin script put --infile=/usr/tmp/testPreRequest.lua --context=postrequest
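A minimal sketch of applying the script and raising the RGW debug level, since RGWDebugLog() output is normally only visible at a high debug_rgw setting (often 20); the gateway entity name client.rgw.gw1 is an assumption:

    # Upload the script for the postrequest context (path from the original post)
    radosgw-admin script put --infile=/usr/tmp/testPreRequest.lua --context=postrequest
    # Raise the debug level for the gateway so the Lua log lines become visible
    ceph config set client.rgw.gw1 debug_rgw 20
    # Then watch the radosgw log for the Lua-prefixed messages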

[ceph-users] Re: Ceph OSDs suddenly use public network for heartbeat_check

2023-05-17 Thread Lee, H. (Hurng-Chun)
On Wed, 2023-05-17 at 17:23 +, Marc wrote: > > > > > > In fact, when we start up the cluster, we don't have DNS available > > to > > resolve the IP addresses, and for a short while, all OSDs are > > located > > in a new host called "localhost.localdomain".  At that point, I > > fixed > > it b

[ceph-users] Ceph OSDs suddenly use public network for heartbeat_check

2023-05-17 Thread Lee, H. (Hurng-Chun)
we could fix it and get the OSDs to use cluster network to do heartbeat checks. Any help would be highly appreciated. Thank you very much. Cheers, Hong -- Hurng-Chun (Hong) Lee, PhD ICT manager Donders Institute for Brain, Cognition and Behaviour,  Centre for Cognitive Neuroimaging Radboud Univ
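A minimal sketch of checking which networks the OSDs actually picked up; osd.0 is a placeholder:

    # Compare the configured networks with what a running OSD reports
    ceph config get osd cluster_network
    ceph config get osd public_network
    ceph config show osd.0 | grep -E 'cluster_network|public_network'
    # The OSD map lists the bound public/cluster/heartbeat addresses per OSD
    ceph osd dump | grep '^osd\.0 '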

[ceph-users] Re: OSD booting gets stuck after log_to_monitors step

2022-11-30 Thread Felix Lee
Dear experts, Sorry, I forgot to mention that the initial symptom is that those OSDs suffer: "wait_auth_rotating timed out" and "unable to obtain rotating service keys; retrying" I then increased rotating_keys_bootstrap_timeout, but it doesn't really help. Best

[ceph-users] OSD booting gets stuck after log_to_monitors step

2022-11-30 Thread Felix Lee
ithout_osd_lock The ceph version is Octopus: 15.2.17. OSD storage backend: bluestore OS: CentOS7 64bit. Any idea? Thanks & Best regards, Felix Lee ~ -- Felix H.T Lee Academia Sinica Grid & Cloud. Tel: +886-2-27898308 Office: Room P111, Institute of Phy
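wait_auth_rotating errors are often clock- or keyring-related, so a minimal sketch of the checks usually tried first, assuming the monitors are otherwise reachable (the timeout value is only an example):

    # Rotating service keys are time-sensitive, so check for clock skew first
    ceph time-sync-status
    ceph status | grep -i skew
    # The timeout the poster raised can be set cluster-wide (seconds)
    ceph config set osd rotating_keys_bootstrap_timeout 90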

[ceph-users] Re: [**SPAM**] Re: cephadm node-exporter extra_container_args for textfile_collector

2022-10-31 Thread Lee Carney
Much appreciated. From: Adam King Sent: 28 October 2022 19:25 To: Lee Carney Cc: Wyll Ingersoll; ceph-users@ceph.io Subject: [**SPAM**] [ceph-users] Re: cephadm node-exporter extra_container_args for textfile_collector We had actually considered adding an

[ceph-users] Re: A lot of pg repair, IO performance drops seriously

2022-10-29 Thread Frank Lee
> resolved later, it's probably not critical at the moment. > If the slow requests resolve you can repair one PG at a time after > inspecting the output of 'rados -p list-inconsistent-obj > '. > > > Zitat von Frank Lee : > > > Hi again, > > &

[ceph-users] A lot of pg repair, IO performance drops seriously

2022-10-28 Thread Frank Lee
Hi again, My Ceph cluster raised a warning a while ago: 3 pgs not deep-scrubbed in time. I googled and tried increasing osd_scrub_begin_hour and osd_scrub_end_hour, but that doesn't seem to work. There was a discussion on Proxmox about a similar situation; he ran "ceph osd repair all" and got it fixed. But it doesn't seem to work a da
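A minimal sketch of handling the scrub warnings one PG at a time, in line with the advice in the reply above; the PG id 2.1a is a placeholder:

    # Find which PGs are affected
    ceph health detail
    # Inspect a single inconsistent PG before repairing it
    rados list-inconsistent-obj 2.1a --format=json-pretty
    # Repair, or just re-run the deep scrub, for that one PG
    ceph pg repair 2.1a
    ceph pg deep-scrub 2.1a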

[ceph-users] Re: cephadm node-exporter extra_container_args for textfile_collector

2022-10-28 Thread Lee Carney
y etc From: Wyll Ingersoll Sent: 28 October 2022 15:19:17 To: Lee Carney; ceph-users@ceph.io Subject: Re: cephadm node-exporter extra_container_args for textfile_collector I ran into the same issue - wanted to add the textfile.directory to the node_exporter using "extra_cont

[ceph-users] cephadm node-exporter extra_container_args for textfile_collector

2022-10-27 Thread Lee Carney
Has anyone had success in using cephadm to add extra_container_args onto the node-exporter config? For example changing the collector config. I am trying and failing using the following: 1. Create ne.yml service_type: node-exporter service_name: node-exporter placement: host_pattern: '*'
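A hedged sketch of the kind of spec being attempted here. extra_container_args are handed to the container runtime, so a bind mount like the one below is plausible, but whether node-exporter ends up honouring the textfile collector this way depends on the cephadm version; the mount path is an assumption:

    cat > ne.yml <<'EOF'
    service_type: node-exporter
    service_name: node-exporter
    placement:
      host_pattern: '*'
    extra_container_args:
      - "-v"
      - "/var/lib/node_exporter/textfile_collector:/var/lib/node_exporter/textfile_collector:ro"
    EOF
    ceph orch apply -i ne.yml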

[ceph-users] Re: Reasonable MDS rejoin time?

2022-05-17 Thread Felix Lee
gives us good motivation to speed up the Ceph upgrade. Again, thank you all for the great input & Best regards, Felix Lee ~ On 5/17/22 19:41, Dan van der Ster wrote: Hi Felix, "rejoin" took awhile in the past because the MDS needs to reload all inodes for all the open directorie

[ceph-users] Re: Reasonable MDS rejoin time?

2022-05-17 Thread Felix Lee
ere is any way for us to estimate the rejoin time? So that we can decide whether to wait or take proactive action if necessary. Best regards, Felix Lee ~ On 5/17/22 16:15, Jos Collin wrote: I suggest you to upgrade the cluster to the latest release [1], as nautilus reached EOL.

[ceph-users] Re: Reasonable MDS rejoin time?

2022-05-16 Thread Felix Lee
o 20 for a while as ceph-mds.ceph16.log-20220516.gz Thanks & Best regards, Felix Lee ~ On 5/16/22 14:45, Jos Collin wrote: It's hard to suggest without the logs. Do verbose logging debug_mds=20. What's the ceph version? Do you have the logs why the MDS crashed? On 16/05/22 11:

[ceph-users] Reasonable MDS rejoin time?

2022-05-15 Thread Felix Lee
oin time and maybe improve it? Because we always need to tell users an estimate of the recovery time. Thanks & Best regards, Felix Lee ~ -- Felix H.T Lee Academia Sinica Grid & Cloud. Tel: +886-2-27898308 Office: Room P111, Institute of Physics, 128 Academia
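There is no direct progress indicator for rejoin, but a rough sketch of what is usually watched while waiting; the daemon name ceph16 is taken from the log file name mentioned above and the admin-socket command has to run on that MDS host:

    # Current MDS states and which rank is stuck in rejoin
    ceph fs status
    ceph health detail
    # Per-daemon counters can hint at how much metadata is being reloaded
    ceph daemon mds.ceph16 perf dump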

[ceph-users] Re: OSDs use 200GB RAM and crash

2022-01-11 Thread Lee
We had the exact same issue last week; in the end, unless the dataset can fit in memory it will never boot. To be honest this bug seems to be hitting quite a few people; in our case it happened after a PGNUM change on a pool. In the end I had to manually export the PGs from the OSD, add them back in

[ceph-users] Re: Correct Usage of the ceph-objectstore-tool??

2022-01-06 Thread Lee
. Patrakov wrote: > Fri, 7 Jan 2022 at 06:21, Lee : > >> Hello, >> >> As per another post I have been having a huge issue since a PGNUM increase took >> my cluster offline.. >> >> I have got to a point where I have just 20 PG's Down / Unavailable due to >>

[ceph-users] Correct Usage of the ceph-objectstore-tool??

2022-01-06 Thread Lee
PH to use the pg on the OSD to rebuild? When I query the PG at the end it complains about marking the offline OSD as offline? I have looked online and cannot find a definitive guide on the process / steps that should be taken. Cheers Lee ___ ceph-
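A minimal sketch of the export/import cycle being asked about, assuming the affected OSDs are stopped before their stores are touched; OSD ids, the PG id and the file path are placeholders:

    # Stop the source OSD before touching its store
    systemctl stop ceph-osd@51
    # Export one PG from the offline OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-51 \
        --pgid 2.1a --op export --file /mnt/backup/2.1a.export
    # Import it into another (also stopped) OSD, then start that OSD again
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-60 \
        --op import --file /mnt/backup/2.1a.export
    systemctl start ceph-osd@60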

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-06 Thread Lee
I tried with disk based swap on a SATA SSD. I think that might be the last option. I have exported already all the down PG's from the OSD that they are waiting for. Kind Regards Lee On Thu, 6 Jan 2022 at 20:00, Alexander E. Patrakov wrote: > пт, 7 янв. 2022 г. в 00:50, Alexander E.

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-05 Thread Lee
"bytes": 4854818176 }, "osdmap": { "items": 3792, "bytes": 140872 }, "osdmap_mapping": { "items": 0, "bytes": 0

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-05 Thread Lee
d memory target", and "mds > cache memory limit". Osd processes have become noisy neighbors in the last > few versions. > > > > On Wed, Jan 5, 2022 at 1:47 PM Lee wrote: > >> I'm not rushing, >> >> I have found the issue, I am getting OOM
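A minimal sketch of capping the memory knobs mentioned in the reply; the 4 GiB values are only examples and should match the RAM actually available per daemon:

    # Cap the per-OSD memory target (bytes)
    ceph config set osd osd_memory_target 4294967296
    # Cap the MDS cache as well if MDS and OSDs share hosts
    ceph config set mds mds_cache_memory_limit 4294967296
    # Confirm what a specific daemon is actually running with
    ceph config show osd.51 | grep memory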

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-05 Thread Lee
bb-ceph-enc-rm63-osd03-31 init.scope Stopped Ceph object storage daemon osd.51. I have just physically increased the RAM in one of the nodes, removed the other OSDs physically for now, and managed to get one of the 3 down OSDs to come up. Just stepping through each at the moment. Regards Lee

[ceph-users] Help - Multiple OSD's Down

2022-01-05 Thread Lee
rm63-osd03-31 init.scope ceph-osd@51.service: Scheduled restart job, restart counter is at 2. The problem is that this has basically taken the production and metadata SSD pools down fully, and all 3 copies are offline. And I cannot find a way to find out what is causing these to crash. Kind Regards Lee

[ceph-users] Re: Is there any way to obtain the maximum number of node failure in ceph without data loss?

2021-07-26 Thread Jerry Lee
remapped" comment: "not enough up instances of this PG to go active" With only 2 OSDs out, a PG of the EC8+3 pool enters "down+remapped" state. So, it seems that the min_size of a erasure coded K+M pool should be set to K+1 which ensures that the data is intact even o

[ceph-users] Re: Is there any way to obtain the maximum number of node failure in ceph without data loss?

2021-07-25 Thread Jerry Lee
two more failures in the system > without losing data (or losing access to data, given that min_size=k, > though I believe it's recommended to set min_size=k+1). > > However, that sequence of acting sets doesn't make a whole lot of > sense to me for a single OSD failure (thou

[ceph-users] Is there any way to obtain the maximum number of node failure in ceph without data loss?

2021-07-23 Thread Jerry Lee
Hello, I would like to know the maximum number of node failures for an EC8+3 pool in a 12-node cluster with 3 OSDs in each node. The size and min_size of the EC8+3 pool are configured as 11 and 8, and OSDs of each PG are selected by host. When there is no node failure, the maximum number of node f
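A worked example of the arithmetic behind the replies above, assuming k=8, m=3 and a host failure domain; the pool name ec83pool is a placeholder:

    # size = k + m = 11, recommended min_size = k + 1 = 9
    ceph osd pool get ec83pool size
    ceph osd pool set ec83pool min_size 9
    # With min_size = 9 a PG stays active with up to size - min_size = 2 hosts down;
    # a third loss makes PGs inactive, although data is still reconstructible from 8 shards.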

[ceph-users] Re: Problem with centos7 repository

2020-07-08 Thread Lee, H. (Hurng-Chun)
ng list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io -- Hurng-Chun (Hong) Lee, PhD ICT manager Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging Radboud University Nijmegen e-mail: h@donders.ru.nl tel: +31(0) 243610

[ceph-users] Re: YUM doesn't find older release version of nautilus

2020-07-02 Thread Lee, H. (Hurng-Chun)
Hi, On Thu, 2020-07-02 at 16:15 +0200, Janne Johansson wrote: On Thu, 2 July 2020 at 14:42, Lee, H. (Hurng-Chun) <h@donders.ru.nl> wrote: Hi, We use the official Ceph RPM repository (http://download.ceph.com/rpm-nautilus/el7) fo

[ceph-users] YUM doesn't find older release version of nautilus

2020-07-02 Thread Lee, H. (Hurng-Chun)
t the official repo no longer provides RPM packages for older versions? Thanks! Cheers, Hong -- Hurng-Chun (Hong) Lee, PhD ICT manager Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging Radboud University Nijmegen e-mail: h@donders.ru.nl tel:
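A minimal sketch of checking what the enabled repos still advertise and pinning to an older build if it is published; the version 14.2.9 is only an example:

    # Show every ceph build the enabled repos currently offer
    yum --showduplicates list ceph-common
    # Pin to a specific older build if it is still available
    yum install ceph-common-14.2.9 ceph-14.2.9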

[ceph-users] large difference between "STORED" and "USED" size of ceph df

2020-05-03 Thread Lee, H. (Hurng-Chun)
the actual stored data is less. Is my interpretation correct? If so, does it mean that we will be wasting a lot of space when we have a lot of files smaller than the object size of 4MB in the system? Thanks for the help! Cheers, Hong -- Hurng-Chun (Hong) Lee, PhD ICT manager Donders Institute for
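A short sketch of where to look: the gap between STORED and USED normally comes from replication/EC overhead plus BlueStore allocation granularity (small objects are rounded up to min_alloc_size per replica), not from files being padded to the full 4 MiB object size:

    # Per-pool breakdown, including replica/EC overhead
    ceph df detail
    # Allocation granularity small objects are rounded up to, per device class
    ceph config get osd bluestore_min_alloc_size_hdd
    ceph config get osd bluestore_min_alloc_size_ssd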

[ceph-users] Ceph Ansible - - name: set grafana_server_addr fact - ipv4

2019-08-28 Thread Lee Norvall
Hi Ceph: nautilus (14.2.2) NFS-Ganesha v 2.8 ceph-ansible stable 4.0 << git checkout 28th Aug CentOS 7 I am trying to do a fresh installation using Ceph Ansible and I am getting the following error when running the playbook. I have not enabled or configured dashboard/grafana/prometheus yet. fatal
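The failing task appears to belong to the dashboard/monitoring role, so a commonly tried workaround is to disable that stack in group_vars until it is actually wanted; a hedged sketch, assuming the ceph-ansible stable-4.0 variable name:

    # group_vars/all.yml -- skip the dashboard/grafana/prometheus roles entirely
    cat >> group_vars/all.yml <<'EOF'
    dashboard_enabled: False
    EOF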