[ceph-users] Re: Help

2020-08-17 Thread Jarett DeAngelis
Configuring it with respect to what, for these applications? What are you trying to do? Do you have existing installations of any of these? We need a little more detail about your requirements. > On Apr 17, 2020, at 1:14 PM, Randy Morgan wrote: > > We are seeking information on configuring Ceph to w

[ceph-users] Re: Help

2020-08-17 Thread DHilsbos
Randy; Nextcloud is easy, it has a "standard" S3 client capability, though it also has Swift client capability. As an S3 client, it does look for the older path style (host/bucket), rather than Amazon's newer DNS style (bucket.host). You can find information on configuring Nextcloud's primary st
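As an illustration of the path-style primary-storage setup being described, a hedged sketch of the relevant Nextcloud config.php fragment (endpoint, bucket and credentials are placeholders, not from the thread):

  'objectstore' => [
    'class' => '\\OC\\Files\\ObjectStore\\S3',
    'arguments' => [
      'bucket' => 'nextcloud',
      'hostname' => 'rgw.example.com',
      'port' => 8080,
      'use_ssl' => false,
      'use_path_style' => true,   // path style (host/bucket) for RGW
      'key' => 'S3_ACCESS_KEY',
      'secret' => 'S3_SECRET_KEY',
    ],
  ],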

[ceph-users] Re: help

2019-08-29 Thread Janne Johansson
Den tors 29 aug. 2019 kl 13:50 skrev Amudhan P : > Hi, > > I am using ceph version 13.2.6 (mimic) on test setup trying with cephfs. > my ceph health status showing warning . > > "ceph health" > HEALTH_WARN Degraded data redundancy: 1197023/7723191 objects degraded > (15.499%) > > "ceph health deta

[ceph-users] Re: help

2019-08-29 Thread Heðin Ejdesgaard Møller
In adition to ceph -s, could you provide the output of ceph osd tree and specify what your failure domain is ? /Heðin On hós, 2019-08-29 at 13:55 +0200, Janne Johansson wrote: > > > Den tors 29 aug. 2019 kl 13:50 skrev Amudhan P : > > Hi, > > > > I am using ceph version 13.2.6 (mimic) on tes

[ceph-users] Re: help

2019-08-29 Thread Amudhan P
output from "ceph -s " cluster: id: 7c138e13-7b98-4309-b591-d4091a1742b4 health: HEALTH_WARN Degraded data redundancy: 1141587/7723191 objects degraded (14.781%), 15 pgs degraded, 16 pgs undersized services: mon: 1 daemons, quorum mon01 mgr: mon01(active) m

[ceph-users] Re: help

2019-08-29 Thread Heðin Ejdesgaard Møller
What's the output of ceph osd pool ls detail On hós, 2019-08-29 at 18:06 +0530, Amudhan P wrote: > output from "ceph -s " > > cluster: > id: 7c138e13-7b98-4309-b591-d4091a1742b4 > health: HEALTH_WARN > Degraded data redundancy: 1141587/7723191 objects > degraded (14.78

[ceph-users] Re: help

2019-08-29 Thread Amudhan P
output from "ceph osd pool ls detail" pool 1 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 74 lfor 0/64 flags hashpspool stripe_width 0 application cephfs pool 2 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash

[ceph-users] Re: help

2019-08-29 Thread Burkhard Linke
Hi, ceph uses a pseudo random distribution within crush to select the target hosts. As a result, the algorithm might not be able to select three different hosts out of three hosts in the configured number of tries. The affected PGs will be shown as undersized and only list two OSDs instead o

[ceph-users] Re: help

2019-08-29 Thread Caspar Smit
Hi, This output doesn't show anything 'wrong' with the cluster. It's just still recovering (backfilling) from what looks like one of your OSDs having crashed and restarted. The backfilling is taking a while because max_backfills = 1 and you only have 3 OSDs total so the backfilling per PG has to have f

[ceph-users] Re: help

2019-08-30 Thread Amudhan P
After leaving it for 12 hours the cluster status is now healthy, but why did it take such a long time to backfill? How do I fine-tune it in case the same kind of error pops up again? On Thu, Aug 29, 2019 at 6:52 PM Caspar Smit wrote: > Hi, > > This output doesn't show anything 'wrong' with the cluster

[ceph-users] Re: help

2019-08-30 Thread Janne Johansson
Den fre 30 aug. 2019 kl 10:49 skrev Amudhan P : > After leaving 12 hours time now cluster status is healthy, but why did it > take such a long time for backfill? > How do I fine-tune? if in case of same kind error pop-out again. > > The backfilling is taking a while because max_backfills = 1 and y
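For reference, a hedged example of the knobs being discussed (the values are only illustrative; raising them speeds up backfill at the cost of client I/O and should be reverted afterwards):

  ceph config set osd osd_max_backfills 4
  ceph config set osd osd_recovery_max_active 4
  # or inject at runtime without persisting:
  ceph tell osd.* injectargs '--osd-max-backfills 4 --osd-recovery-max-active 4'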

[ceph-users] Re: help

2019-08-30 Thread Amudhan P
My cluster health status went into warning mode only after running mkdir for thousands of folders with multiple subdirectories. If this made an OSD crash, does it really take that long to heal empty directories? On Fri, Aug 30, 2019 at 3:12 PM Janne Johansson wrote: > Den fre 30 aug. 2019 kl 10:49 sk

[ceph-users] Re: Help with Mirroring

2024-07-11 Thread Anthony D'Atri
> > I would like to use mirroring to facilitate migrating from an existing > Nautilus cluster to a new cluster running Reef. Right now I'm looking at > RBD mirroring. I have studied the RBD Mirroring section of the > documentation, but it is unclear to me which commands need to be issued on > ea
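Not from the thread, but as a rough sketch of the usual one-way, snapshot-based setup (pool, image and site names are placeholders; the bootstrap commands require Octopus or newer, so a Nautilus source may have to fall back to journal-based mirroring with a manually added peer):

  # on both clusters
  rbd mirror pool enable rbd image
  # on the source cluster
  rbd mirror pool peer bootstrap create --site-name site-a rbd > /tmp/peer-token
  # on the destination cluster, where the rbd-mirror daemon runs
  rbd mirror pool peer bootstrap import --site-name site-b --direction rx-only rbd /tmp/peer-token
  # per image, on the source cluster
  rbd mirror image enable rbd/myimage snapshot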

[ceph-users] Re: Help with Mirroring

2024-07-11 Thread Eugen Block
Hi, just one question coming to mind, if you intend to migrate the images separately, is it really necessary to set up mirroring? You could just 'rbd export' on the source cluster and 'rbd import' on the destination cluster. Zitat von Anthony D'Atri : I would like to use mirroring to
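A minimal sketch of that approach (pool/image names and the staging path are placeholders):

  # stream directly between clusters
  rbd export rbd/myimage - | ssh dest-node rbd import - rbd/myimage
  # or via a staging file
  rbd export rbd/myimage /staging/myimage.img
  rbd import /staging/myimage.img rbd/myimage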

[ceph-users] Re: Help with Mirroring

2024-07-12 Thread Frédéric Nass
- On 11 Jul 24, at 20:50, Dave Hall kdh...@binghamton.edu wrote: > Hello. > > I would like to use mirroring to facilitate migrating from an existing > Nautilus cluster to a new cluster running Reef. Right now I'm looking at > RBD mirroring. I have studied the RBD Mirroring section of th

[ceph-users] Re: Help with Mirroring

2024-07-12 Thread Anthony D'Atri
> Hi, > > just one question coming to mind, if you intend to migrate the images > separately, is it really necessary to set up mirroring? You could just 'rbd > export' on the source cluster and 'rbd import' on the destination cluster. That can be slower if using a pipe, and require staging sp

[ceph-users] Re: Help: corrupt pg

2020-03-25 Thread Eugen Block
Hi, is there any chance to recover the other failing OSDs that seem to have one chunk of this PG? Do the other OSDs fail with the same error? Zitat von Jake Grimmett : Dear All, We are "in a bit of a pickle"... No reply to my message (23/03/2020),  subject  "OSD: FAILED ceph_assert(clo

[ceph-users] Re: Help: corrupt pg

2020-03-25 Thread Jake Grimmett
Hi Eugen, Many thanks for your reply. The other two OSDs are up and running, and being used by other pgs with no problem; for some reason this pg refuses to use these OSDs. The other two OSDs that are missing from this pg crashed at different times last month, each OSD crashed when we trie

[ceph-users] Re: Help: corrupt pg

2020-03-26 Thread Gregory Farnum
On Wed, Mar 25, 2020 at 5:19 AM Jake Grimmett wrote: > > Dear All, > > We are "in a bit of a pickle"... > > No reply to my message (23/03/2020), subject "OSD: FAILED > ceph_assert(clone_size.count(clone))" > > So I'm presuming it's not possible to recover the crashed OSD From your later email i

[ceph-users] Re: Help: corrupt pg

2020-03-27 Thread Jake Grimmett
Hi Greg, Yes, this was caused by a chain of events. As a cautionary tale, the main ones were: 1) a minor nautilus release upgrade, followed by a rolling node restart script that mistakenly relied on "ceph -s" for cluster health info, i.e. it didn't wait for the cluster to return to health bef

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-05 Thread mhnx
First of all, do not rush into bad decisions. Production is down and you want to bring it back online, but you should fix the problem and be sure first. If a second crash occurs in a healing state you will lose metadata. You don't need to debug first! You didn't mention your cluster status and we don't kno

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-05 Thread Lee
I'm not rushing; I have found the issue. I am getting OOM errors as the OSD boots: basically it starts to process the PGs and then the node runs out of memory and the daemon gets killed. 2022-01-05 20:09:08 bb-ceph-enc-rm63-osd03-31 osd.51 2022-01-05T20:09:01.024+ 7fce3c6bc700 10 osd.51 24448261

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-05 Thread mhnx
It's nice to hear that. You can also decrease the OSD RAM usage from 4 GB to 2 GB. If you have enough spare RAM, go for it. Good luck. Lee wrote on Thu, 6 Jan 2022 at 00:46: > > I'm not rushing, > > I have found the issue, Im am getting OOM errors as the OSD boots, basically > is starts t
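A hedged example of what dropping the OSD RAM usage from 4 GB to 2 GB maps to (osd_memory_target takes bytes and only bounds the caches, not peak recovery usage):

  ceph config set osd osd_memory_target 2147483648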

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-05 Thread Mazzystr
and that is exactly why I run osds containerized with limited cpu and memory as well as "bluestore cache size", "osd memory target", and "mds cache memory limit". Osd processes have become noisy neighbors in the last few versions. On Wed, Jan 5, 2022 at 1:47 PM Lee wrote: > I'm not rushing, >

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-05 Thread Lee
The first OSD took 156 GB of RAM to boot.. :( Is there an easy way to stop the mempool from pulling so much memory? On Wed, 5 Jan 2022 at 22:12, Mazzystr wrote: > and that is exactly why I run osds containerized with limited cpu and > memory as well as "bluestore cache size", "osd memory target", and "mds

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-05 Thread Lee
For Example top - 22:53:47 up 1:29, 2 users, load average: 2.23, 2.08, 1.92 Tasks: 255 total, 2 running, 253 sleeping, 0 stopped, 0 zombie %Cpu(s): 4.2 us, 4.5 sy, 0.0 ni, 91.1 id, 0.1 wa, 0.0 hi, 0.1 si, 0.0 st MiB Mem : 161169.7 total, 23993.9 free, 132036.5 used, 5139.3 buff/

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-05 Thread Igor Fedotov
Hi Lee, could you please raise debug-bluestore and debug-osd to 20 (via ceph tell osd.N injectargs command) when OSD starts to eat up the RAM. Then drop it back to defaults after a few seconds (10s is enough) to avoid huge log size and share the resulting OSD log. Also I'm curious if you hav
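For example, using osd.51 from the log snippet above (a sketch; defaults are restored after a few seconds to keep the log small):

  ceph tell osd.51 injectargs '--debug-bluestore 20 --debug-osd 20'
  sleep 10
  ceph tell osd.51 injectargs '--debug-bluestore 1/5 --debug-osd 1/5'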

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-06 Thread Marc
Running your osd's with resource limitations is not so straightforward. I can guess that if you are running close to full resource utilization on your nodes, it makes more sense to make sure everything stays as much within their specified limits. (Aside from the question if you would even want t

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-06 Thread Marc
> I assume the huge memory consumption is temporary. Once the OSD is up and > stable, it would release the memory. > > So how about allocate a large swap temporarily just to let the OSD up. I > remember that someone else on the list have resolved a similar issue with > swap. But is this alread

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-06 Thread Alexander E. Patrakov
Thu, 6 Jan 2022 at 12:21, Lee: > I've tried add a swap and that fails also. > How exactly did it fail? Did you put it on some disk, or in zram? In the past I had to help a customer who hit memory over-use when upgrading Ceph (due to shallow_fsck), and we were able to fix it by adding 64 GB
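A generic sketch of adding disk-backed swap for such a recovery (size and path are placeholders; on some filesystems fallocate-backed swap files are not supported and dd must be used instead):

  fallocate -l 64G /swapfile
  chmod 600 /swapfile
  mkswap /swapfile
  swapon /swapfile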

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-06 Thread Alexander E. Patrakov
Fri, 7 Jan 2022 at 00:50, Alexander E. Patrakov: > Thu, 6 Jan 2022 at 12:21, Lee: > >> I've tried add a swap and that fails also. >> > > How exactly did it fail? Did you put it on some disk, or in zram? > > In the past I had to help a customer who hit memory over-use when > upgrading Ceph (d

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-06 Thread Lee
I tried with disk based swap on a SATA SSD. I think that might be the last option. I have exported already all the down PG's from the OSD that they are waiting for. Kind Regards Lee On Thu, 6 Jan 2022 at 20:00, Alexander E. Patrakov wrote: > пт, 7 янв. 2022 г. в 00:50, Alexander E. Patrakov :

[ceph-users] Re: [Help appreciated] ceph mds damaged

2023-05-23 Thread Gregory Farnum
On Tue, May 23, 2023 at 1:55 PM Justin Li wrote: > > Dear All, > > After a unsuccessful upgrade to pacific, MDS were offline and could not get > back on. Checked the MDS log and found below. See cluster info from below as > well. Appreciate it if anyone can point me to the right direction. Thank

[ceph-users] Re: [Help appreciated] ceph mds damaged

2023-05-23 Thread Justin Li
Thanks for replying, Greg. I'll give you a detailed sequence I did on the upgrade at below. Step 1: upgrade ceph mgr and Monitor --- reboot. Then mgr and mon are all up running. Step 2: upgrade one OSD node --- reboot and OSDs are all up. Step 3: upgrade a second OSD node named OSD-node2. I did

[ceph-users] Re: [Help appreciated] ceph mds damaged

2023-05-23 Thread Patrick Donnelly
Hello Justin, On Tue, May 23, 2023 at 4:55 PM Justin Li wrote: > > Dear All, > > After a unsuccessful upgrade to pacific, MDS were offline and could not get > back on. Checked the MDS log and found below. See cluster info from below as > well. Appreciate it if anyone can point me to the right d

[ceph-users] Re: [Help appreciated] ceph mds damaged

2023-05-23 Thread Justin Li
Thanks Patrick. We're making progress! After issuing below cmd (ceph config) you gave me, ceph cluster health shows HEALTH_WARN and mds is back up. However, cephfs can't be mounted showing below error. Ceph mgr portal also show 500 internal error when I try to browse the cephfs folder. I'll be u

[ceph-users] Re: [Help appreciated] ceph mds damaged

2023-05-23 Thread Justin Li
Sorry Patrick, last email was restricted as attachment size. I attached a link for you to download the log. Thanks. https://drive.google.com/drive/folders/1bV_X7vyma_-gTfLrPnEV27QzsdmgyK4g?usp=sharing Justin Li Senior Technical Officer School of Information Technology Faculty of Science, Enginee

[ceph-users] Re: [Help appreciated] ceph mds damaged

2023-05-23 Thread Justin Li
Hi Patrick, Sorry to keep bothering you, but I found that the MDS service kept crashing even though the cluster shows the MDS is up. I attached another log of the MDS server eowyn below. Look forward to hearing more insights. Thanks a lot. https://drive.google.com/file/d/1nD_Ks7fNGQp0GE5Q_x8M57HldYurPhuN/view

[ceph-users] Re: [Help appreciated] ceph mds damaged

2023-05-24 Thread Patrick Donnelly
Hello Justin, Please do: ceph config set mds debug_mds 20 ceph config set mds debug_ms 1 Then wait for a crash. Please upload the log. To restore your file system: ceph config set mds mds_abort_on_newly_corrupt_dentry false Let the MDS purge the strays and then try: ceph config set mds mds_a
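The settings quoted above, gathered one per line for readability (the truncated final command is left out rather than guessed at):

  ceph config set mds debug_mds 20
  ceph config set mds debug_ms 1
  # to let the MDS come up despite the corrupt dentries; revert once healthy
  ceph config set mds mds_abort_on_newly_corrupt_dentry false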

[ceph-users] Re: [Help appreciated] ceph mds damaged

2023-05-24 Thread Justin Li
Hi Patrick, Thanks for the instructions. We started the MDS recovery scan with the below cmds, following the link below. The first bit, scan_extents, has finished and we're waiting on scan_inodes. Probably we shouldn't interrupt the process. If this procedure fails, I'll follow your steps and let
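For context, a hedged sketch of the data-scan sequence being referred to (the data pool name is a placeholder; the full procedure and its caveats are in the CephFS disaster-recovery documentation):

  cephfs-data-scan scan_extents <data-pool>
  cephfs-data-scan scan_inodes <data-pool>
  cephfs-data-scan scan_links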

[ceph-users] Re: [Help appreciated] ceph mds damaged

2023-05-25 Thread Justin Li
Hi Patrick, The disaster recovery process with cephfs-data-scan tool didn't fix our MDS issue. It still kept crashing. I've uploaded a detailed MDS log with below ID. The restore procedure below didn't get it working either. Should I set mds_go_bad_corrupt_dentry to false alongside with mds_ab

[ceph-users] Re: Help needed with Grafana password

2023-11-08 Thread Eugen Block
Hi, you mean you forgot your password? You can remove the service with 'ceph orch rm grafana', then re-apply your grafana.yaml containing the initial password. Note that this would remove all of the grafana configs or custom dashboards etc., you would have to reconfigure them. So before do
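A minimal sketch of that sequence, with an illustrative grafana.yaml (keep the initial password simple; see later in the thread, a '#' in it gets cut off):

  ceph orch rm grafana

  # grafana.yaml
  service_type: grafana
  placement:
    count: 1
  spec:
    initial_admin_password: admin

  ceph orch apply -i grafana.yaml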

[ceph-users] Re: Help needed with Grafana password

2023-11-09 Thread Sake Ceph
Hi, Well to get promtail working with Loki, you need to setup a password in Grafana. But promtail wasn't working with the 17.2.6 release, the URL was set to containers.local. So I stopped using it, but forgot to click on save in KeePass :( I didn't configure anything special in Grafana, the

[ceph-users] Re: Help needed with Grafana password

2023-11-09 Thread Sake Ceph
Too bad, that doesn't work :( > On 09-11-2023 09:07 CET, Sake Ceph wrote: > > > Hi, > > Well to get promtail working with Loki, you need to setup a password in > Grafana. > But promtail wasn't working with the 17.2.6 release, the URL was set to > containers.local. So I stopped using it, bu

[ceph-users] Re: Help needed with Grafana password

2023-11-09 Thread Eugen Block
What doesn't work exactly? For me it did... Zitat von Sake Ceph : To bad, that doesn't work :( Op 09-11-2023 09:07 CET schreef Sake Ceph : Hi, Well to get promtail working with Loki, you need to setup a password in Grafana. But promtail wasn't working with the 17.2.6 release, the URL was

[ceph-users] Re: Help needed with Grafana password

2023-11-09 Thread Sake Ceph
Using podman version 4.4.1 on RHEL 8.8, Ceph 17.2.7. I used 'podman system prune -a -f' and 'podman volume prune -f' to clean up files, but this leaves a lot of files behind in /var/lib/containers/storage/overlay and an empty folder /var/lib/ceph//custom_config_files/grafana.. Found those files with

[ceph-users] Re: Help needed with Grafana password

2023-11-09 Thread Eugen Block
Usually, removing the grafana service should be enough. I also have this directory (custom_config_files/grafana.) but it's empty. Can you confirm that after running 'ceph orch rm grafana' the service is actually gone ('ceph orch ls grafana')? The directory underneath /var/lib/ceph/{fsid}/gr

[ceph-users] Re: Help needed with Grafana password

2023-11-09 Thread Sake Ceph
I tried everything at this point, even waited an hour, still no luck. Got it working once accidentally, but with a placeholder for a password. Tried with the correct password, nothing, and trying again with the placeholder didn't work anymore. So I thought to switch the manager, maybe something is

[ceph-users] Re: Help needed with Grafana password

2023-11-09 Thread Eugen Block
I just tried it on a 17.2.6 test cluster, although I don't have a stack trace the complicated password doesn't seem to be applied (don't know why yet). But since it's an "initial" password you can choose something simple like "admin", and during the first login you are asked to change it an

[ceph-users] Re: Help needed with Grafana password

2023-11-09 Thread Eugen Block
It's the '#' character, everything after (including '#' itself) is cut off. I tried with single and double quotes which also failed. But as I already said, use a simple password and then change it within grafana. That way you also don't have the actual password lying around in clear text in

[ceph-users] Re: Help needed with Grafana password

2023-11-10 Thread Sake Ceph
Thank you Eugen! This worked :) > Op 09-11-2023 14:55 CET schreef Eugen Block : > > > It's the '#' character, everything after (including '#' itself) is cut > off. I tried with single and double quotes which also failed. But as I > already said, use a simple password and then change it with

[ceph-users] Re: Help with deep scrub warnings

2024-03-05 Thread Anthony D'Atri
* Try applying the settings to global so that mons/mgrs get them. * Set your shallow scrub settings back to the default. Shallow scrubs take very few resources * Set your randomize_ratio back to the default, you’re just bunching them up * Set the load threshold back to the default, I can’t ima

[ceph-users] Re: Help with deep scrub warnings

2024-03-05 Thread Nicola Mori
Hi Anthony, thanks for the tips. I reset all the values but osd_deep_scrub_interval to their defaults as reported at https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/ : # ceph config set osd osd_scrub_sleep 0.0 # ceph config set osd osd_scrub_load_threshold 0.5 # ceph config

[ceph-users] Re: Help with deep scrub warnings

2024-05-23 Thread Sascha Lucas
Hi, just for the archives: On Tue, 5 Mar 2024, Anthony D'Atri wrote: * Try applying the settings to global so that mons/mgrs get them. Setting osd_deep_scrub_interval at global instead of at osd immediately turns health to OK and removes the false warning about PGs not scrubbed in time. HTH,
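In other words, roughly (the interval value is an example, not the thread's; the default is one week, in seconds):

  ceph config rm osd osd_deep_scrub_interval
  ceph config set global osd_deep_scrub_interval 1209600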

[ceph-users] Re: Help with osd spec needed

2024-08-02 Thread Eugen Block
Hi, if you assigned the SSD to be for block.db it won't be available from the orchestrator's point of view as a data device. What you could try is to manually create a partition or LV on the remaining SSD space and then point the service spec to that partition/LV via path spec. I haven't
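A rough, untested sketch of that idea (host, service id and the LV path are placeholders):

  service_type: osd
  service_id: osd-ssd-leftover
  placement:
    hosts:
      - host1
  spec:
    data_devices:
      paths:
        - /dev/vg_ssd/lv_osd_data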

[ceph-users] Re: HELP NEEDED : cephadm adopt osd crash

2022-11-08 Thread Eugen Block
You can either provide an image with the adopt command (--image) or you configure it globally with ceph config set (I don’t have the exact command right now). Which image does it fail to pull? You should see that in cephadm.log. Does that node with osd.17 have access to the image repo? Zit
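As a sketch of both options (the image tag is only an example):

  # per command
  cephadm --image quay.io/ceph/ceph:v16.2.10 adopt --style legacy --name osd.17
  # or globally
  ceph config set global container_image quay.io/ceph/ceph:v16.2.10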

[ceph-users] Re: Help on rgw metrics (was rgw_user_counters_cache)

2024-01-31 Thread Casey Bodley
On Wed, Jan 31, 2024 at 3:43 AM garcetto wrote: > > good morning, > I was struggling trying to understand why I cannot find this setting on > my reef version, is it because it is only in the latest dev ceph version and not > before? that's right, this new feature will be part of the squid release. we

[ceph-users] Re: Help needed to recover 3node-cluster

2022-01-03 Thread Michael Moyles
You should prioritise recovering quorum of your monitors. Cephs documentation can help here https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/ Check to see if the failed mon is still part of the monmap on the other nodes, if it is you might need to remove it manually (which
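A generic sketch of that manual removal, run on a surviving mon (mon IDs are placeholders; stop the mon and back up its store first):

  systemctl stop ceph-mon@good-mon
  ceph-mon -i good-mon --extract-monmap /tmp/monmap
  monmaptool /tmp/monmap --print
  monmaptool /tmp/monmap --rm failed-mon
  ceph-mon -i good-mon --inject-monmap /tmp/monmap
  systemctl start ceph-mon@good-mon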

[ceph-users] Re: help with failed osds after reboot

2020-06-12 Thread Eugen Block
Hi, which ceph release are you using? You mention ceph-disk so your OSDs are not LVM based, I assume? I've seen these messages a lot when testing in my virtual lab environment although I don't believe it's the cluster's fsid but the OSD's fsid that's in the error message (the OSDs have th

[ceph-users] Re: help with failed osds after reboot

2020-06-12 Thread Marc Roos
Maybe you have the same issue? https://tracker.ceph.com/issues/44102#change-167531 In my case an update(?) disabled osd runlevels. systemctl is-enabled ceph-osd@0 -Original Message- To: ceph-users@ceph.io Subject: [ceph-users] Re: help with failed osds after reboot Hi, which ceph
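If that is the case, a sketch of the check and fix (repeat per OSD id):

  systemctl is-enabled ceph-osd@0
  systemctl enable ceph-osd@0
  systemctl start ceph-osd@0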

[ceph-users] Re: help with failed osds after reboot

2020-06-15 Thread seth . duncan2
Ceph version 10.2.7 ceph.conf [global] fsid = 75d6dba9-2144-47b1-87ef-1fe21d3c58a8 mon_initial_members = chad, jesse, seth mon_host = 192.168.10.41,192.168.10.40,192.168.10.39 mon warn on legacy crush tunables = false auth_cluster_required = cephx auth_service_required = cephx auth_client_require

[ceph-users] Re: help with failed osds after reboot

2020-06-15 Thread Paul Emmerich
On Mon, Jun 15, 2020 at 7:01 PM wrote: > Ceph version 10.2.7 > > ceph.conf > [global] > fsid = 75d6dba9-2144-47b1-87ef-1fe21d3c58a8 > (...) > mount_activate: Failed to activate > ceph-disk: Error: No cluster conf found in /etc/ceph with fsid > e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9 > -- Paul

[ceph-users] Re: Help: Balancing Ceph OSDs with different capacity

2024-02-07 Thread Dan van der Ster
Hi Jasper, I suggest to disable all the crush-compat and reweighting approaches. They rarely work out. The state of the art is: ceph balancer on ceph balancer mode upmap ceph config set mgr mgr/balancer/upmap_max_deviation 1 Cheers, Dan -- Dan van der Ster CTO Clyso GmbH p: +49 89 215252722 |

[ceph-users] Re: Help: Balancing Ceph OSDs with different capacity

2024-02-07 Thread Anthony D'Atri
> I have recently onboarded new OSDs into my Ceph Cluster. Previously, I had > 44 OSDs of 1.7TiB each and was using it for about a year. About 1 year ago, > we onboarded an additional 20 OSDs of 14TiB each. That's a big difference in size. I suggest increasing mon_max_pg_per_osd to 1000 --
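i.e., roughly (a sketch, using the value from the suggestion above):

  ceph config set global mon_max_pg_per_osd 1000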

[ceph-users] Re: Help: Balancing Ceph OSDs with different capacity

2024-02-07 Thread Jasper Tan
Hi Anthony and everyone else We have found the issue. Because the new 20x 14 TiB OSDs were onboarded onto a single node, there was not only an imbalance in the capacity of each OSD but also between the nodes (other nodes each have around 15x 1.7TiB). Furthermore, CRUSH rule sets default failure do

[ceph-users] Re: Help needed please ! Filesystem became read-only !

2024-06-03 Thread Xiubo Li
Hi Nicolas, This is a known issue and Venky is working on it, please see https://tracker.ceph.com/issues/63259. Thanks - Xiubo On 6/3/24 20:04, nbarb...@deltaonline.net wrote: Hello, First of all, thanks for reading my message. I set up a Ceph version 18.2.2 cluster with 4 nodes, everythin

[ceph-users] Re: Help needed please ! Filesystem became read-only !

2024-06-04 Thread Sake Ceph
Hi, A little break into this thread, but I have some questions: * How does this happen, that the filesystem gets into readonly modus * Is this avoidable? * How-to fix the issue, because I didn't see a workaround in the mentioned tracker (or I missed it) * With this bug around, should you use c

[ceph-users] Re: Help needed please ! Filesystem became read-only !

2024-06-04 Thread Xiubo Li
On 6/4/24 15:20, Sake Ceph wrote: Hi, A little break into this thread, but I have some questions: * How does this happen, that the filesystem gets into readonly modus The detail explanation you can refer to the ceph PR: https://github.com/ceph/ceph/pull/55421. * Is this avoidable? * How-

[ceph-users] Re: Help needed please ! Filesystem became read-only !

2024-06-04 Thread Sake Ceph
Hi Xiubo, Thank you for the explanation! This won't be an issue for us, but it made me think twice :) Kind regards, Sake > On 04-06-2024 12:30 CEST, Xiubo Li wrote: > > > On 6/4/24 15:20, Sake Ceph wrote: > > Hi, > > > > A little break into this thread, but I have some questions: > > * How d

[ceph-users] Re: Help needed please ! Filesystem became read-only !

2024-06-04 Thread nbarbier
First, thanks Xiubo for your feedback ! To go further on the points raised by Sake: - How does this happen ? -> There were no preliminary signs before the incident - Is this avoidable? -> Good question, I'd also like to know how! - How to fix the issue ? -> So far, no fix nor workaround from w

[ceph-users] Re: Help needed please ! Filesystem became read-only !

2024-07-14 Thread Olli Rajala
Hi, I believe our KL studio has hit this same bug after deleting a pool that was used only for testing. So, is there any procedure to get rid of those bad journal events and get the mds back to rw state? Thanks, --- Olli Rajala - Lead TD Anima Vitae Ltd. www.anima.fi -

[ceph-users] Re: Help add node to cluster using cephadm

2020-07-21 Thread steven prothero
Hello, is podman installed on the new node? also make sure the NTP time sync is on for new node. The ceph orch checks those on the new node and then dies if not ready with an error like you see. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubs

[ceph-users] Re: Help add node to cluster using cephadm

2020-07-21 Thread davidthuong2424
Hello, I use Docker; I will check NTP. Does the new node need anything else installed? ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Help add node to cluster using cephadm

2020-07-21 Thread steven prothero
Hello, Yes, make sure docker & ntp is setup on the new node first. Also, make sure the public key is added on the new node and firewall is allowing it through ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le

[ceph-users] Re: Help add node to cluster using cephadm

2020-07-21 Thread Hoài Thương
Will do, thanks! On Wed, 22 Jul 2020 at 12:27, steven prothero <ste...@marimo-tech.com> wrote: > Hello, > > Yes, make sure docker & ntp is setup on the new node first. > Also, make sure the public key is added on the new node and firewall > is allowing it through > _

[ceph-users] Re: Help add node to cluster using cephadm

2020-07-22 Thread David Thuong
Thank you. After installing Docker on the new node, I can add the node. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Help add node to cluster using cephadm

2020-07-22 Thread Hoài Thương
It's working. On Wed, 22 Jul 2020 at 14:41, David Thuong <davidthuong2...@gmail.com> wrote: > tks you, after install docker for new node, i can add node > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to c

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-04-04 Thread Michel Jouvin
Answering my own question, I found the reason for 2147483647: it's documented as a failure to find enough OSDs (missing OSDs). And it is expected, as I selected different hosts for the 15 OSDs but I have only 12 hosts! I'm still interested in an "expert" confirming that the LRC k=9, m=3, l=4 configuration

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-04-05 Thread Michel Jouvin
Hi, Is somebody using LRC plugin ? I came to the conclusion that LRC  k=9, m=3, l=4 is not the same as jerasure k=9, m=6 in terms of protection against failures and that I should use k=9, m=6, l=5 to get a level of resilience >= jerasure k=9, m=6. The example in the documentation (k=4, m=2, l

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-04-24 Thread Michel Jouvin
Hi, I'm still interested in getting feedback from those using the LRC plugin about the right way to configure it... Last week I upgraded from Pacific to Quincy (17.2.6) with cephadm, which is doing the upgrade host by host, checking if an OSD is ok to stop before actually upgrading it. I had

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-04-28 Thread Michel Jouvin
Hi, I think I found a possible cause of my PG down but still don't understand why. As explained in a previous mail, I setup a 15-chunk/OSD EC pool (k=9, m=6) but I have only 12 OSD servers in the cluster. To work around the problem I defined the failure domain as 'osd' with the reasoning that as I w

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-04-29 Thread Curt
Hello, What is your current setup, 1 server per data center with 12 OSDs each? What is your current crush rule and LRC crush rule? On Fri, Apr 28, 2023, 12:29 Michel Jouvin wrote: > Hi, > > I think I found a possible cause of my PG down but still understand why. > As explained in a previous mai

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-04-29 Thread Michel Jouvin
Hi, No... our current setup is 3 datacenters with the same configuration, i.e. 1 mon/mgr + 4 OSD servers with 16 OSDs each, thus the total of 12 OSD servers. As with the LRC plugin k+m must be a multiple of l, I found that k=9/m=6/l=5 with crush-locality=datacenter was achieving my goal of bei
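For reference, a hedged sketch of creating such a profile and a pool using it (profile and pool names are placeholders, not from the thread):

  ceph osd erasure-code-profile set lrc-k9m6l5 \
    plugin=lrc k=9 m=6 l=5 \
    crush-locality=datacenter crush-failure-domain=host
  ceph osd pool create testpool 32 32 erasure lrc-k9m6l5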

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-02 Thread Eugen Block
Hi, disclaimer: I haven't used LRC in a real setup yet, so there might be some misunderstandings on my side. But I tried to play around with one of my test clusters (Nautilus). Because I'm limited in the number of hosts (6 across 3 virtual DCs) I tried two different profiles with lower nu

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-03 Thread Eugen Block
I think I got it wrong with the locality setting, I'm still limited by the number of hosts I have available in my test cluster, but as far as I got with failure-domain=osd I believe k=6, m=3, l=3 with locality=datacenter could fit your requirement, at least with regards to the recovery band

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-04 Thread Michel Jouvin
Hi, I had to restart one of my OSD server today and the problem showed up again. This time I managed to capture "ceph health detail" output showing the problem with the 2 PGs: [WRN] PG_AVAILABILITY: Reduced data availability: 2 pgs inactive, 2 pgs down     pg 56.1 is down, acting [208,65,73,

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-04 Thread Eugen Block
Hi, I don't think you've shared your osd tree yet, could you do that? Apparently nobody else but us reads this thread or nobody reading this uses the LRC plugin. ;-) Thanks, Eugen Zitat von Michel Jouvin : Hi, I had to restart one of my OSD server today and the problem showed up again

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-04 Thread Frank Schilder
Subject: [ceph-users] Re: Help needed to configure erasure coding LRC plugin Hi, I don't think you've shared your osd tree yet, could you do that? Apparently nobody else but us reads this thread or nobody reading this uses the LRC plugin. ;-) Thanks, Eugen Zitat von Michel Jouvin : > Hi,

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-16 Thread Michel Jouvin
Hi Eugen, Yes, sure, no problem to share it. I attach it to this email (as it may clutter the discussion if inline). If somebody on the list has some clue on the LRC plugin, I'm still interested by understand what I'm doing wrong! Cheers, Michel Le 04/05/2023 à 15:07, Eugen Block a écrit 

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-17 Thread Curt
Hi, I've been following this thread with interest as it seems like a unique use case to expand my knowledge. I don't use LRC or anything outside basic erasure coding. What is your current crush steps rule? I know you made changes since your first post and had some thoughts I wanted to share, but

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-18 Thread Eugen Block
Hi, I don’t have a good explanation for this yet, but I’ll soon get the opportunity to play around with a decommissioned cluster. I’ll try to get a better understanding of the LRC plugin, but it might take some time, especially since my vacation is coming up. :-) I have some thoughts about th

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-21 Thread Michel Jouvin
Hi Eugen, My LRC pool is also somewhat experimental so nothing really urgent. If you manage to do some tests that help me to understand the problem I remain interested. I propose to keep this thread for that. Zitat, I shared my crush map in the email you answered if the attachment was not su

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-26 Thread Michel Jouvin
Hi, I realize that the crushmap I attached to one of my emails, probably required to understand the discussion here, has been stripped by mailman. To avoid polluting the thread with a long output, I put it at https://box.in2p3.fr/index.php/s/J4fcm7orfNE87CX. Download it if you are inte

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-06-19 Thread Eugen Block
Hi, I have a real hardware cluster for testing available now. I'm not sure whether I'm completely misunderstanding how it's supposed to work or if it's a bug in the LRC plugin. This cluster has 18 HDD nodes available across 3 rooms (or DCs), I intend to use 15 nodes to be able to recover if o

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-06-19 Thread Michel Jouvin
Hi Eugen, Thank you very much for these detailed tests that match what I observed and reported earlier. I'm happy to see that we have the same understanding of how it should work (based on the documentation). Is there any way other than this list to get in contact with the plugin developers

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-06-19 Thread Eugen Block
Hi, adding the dev mailing list, hopefully someone there can chime in. But apparently the LRC code hasn't been maintained for a few years (https://github.com/ceph/ceph/tree/main/src/erasure-code/lrc). Let's see... Zitat von Michel Jouvin : Hi Eugen, Thank you very much for these detaile

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-06-30 Thread Eugen Block
I created a tracker issue, maybe that will get some attention: https://tracker.ceph.com/issues/61861 Zitat von Michel Jouvin : Hi Eugen, Thank you very much for these detailed tests that match what I observed and reported earlier. I'm happy to see that we have the same understanding of ho

[ceph-users] Re: help, ceph fs status stuck with no response

2023-08-07 Thread Patrick Donnelly
On Mon, Aug 7, 2023 at 6:12 AM Zhang Bao wrote: > > Hi, > > I have a ceph stucked at `ceph --verbose stats fs fsname`. And in the > monitor log, I can found something like `audit [DBG] from='client.431973 -' > entity='client.admin' cmd=[{"prefix": "fs status", "fs": "fsname", > "target": ["mon-mg

[ceph-users] Re: help, ceph fs status stuck with no response

2023-08-14 Thread Patrick Donnelly
On Tue, Aug 8, 2023 at 1:18 AM Zhang Bao wrote: > > Hi, thanks for your help. > > I am using ceph Pacific 16.2.7. > > Before my Ceph stuck at `ceph fs status fsname`, one of my cephfs became > readonly. Probably the ceph-mgr is stuck (the "volumes" plugin) somehow talking to the read-only CephFS

[ceph-users] Re: [Help] Does MSGR2 protocol use openssl for encryption

2022-09-02 Thread Gregory Farnum
We partly rolled our own with AES-GCM. See https://docs.ceph.com/en/quincy/rados/configuration/msgr2/#connection-modes and https://docs.ceph.com/en/quincy/dev/msgr2/#frame-format -Greg On Wed, Aug 24, 2022 at 4:50 PM Jinhao Hu wrote: > > Hi, > > I have a question about the MSGR protocol Ceph used
