Configuring it with respect to what, for these applications? What are you
trying to do? Do you have existing installations of any of these? We need a
little more information about your requirements.
> On Apr 17, 2020, at 1:14 PM, Randy Morgan wrote:
>
> We are seeking information on configuring Ceph to w
Randy;
Nextcloud is easy: it has a "standard" S3 client capability, though it also has
Swift client capability. As an S3 client, it does look for the older path style
(host/bucket) rather than Amazon's newer DNS style (bucket.host).
You can find information on configuring Nextcloud's primary st
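As a hedged illustration of the Ceph side of this (the user name is
hypothetical), a dedicated RGW user for Nextcloud can be created with:

  radosgw-admin user create --uid=nextcloud --display-name="Nextcloud"

The resulting access and secret keys then go into Nextcloud's S3
primary-storage configuration, with path-style addressing enabled as noted
above.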
On Thu, 29 Aug 2019 at 13:50, Amudhan P wrote:
> Hi,
>
> I am using ceph version 13.2.6 (mimic) on a test setup, trying cephfs.
> My ceph health status is showing a warning.
>
> "ceph health"
> HEALTH_WARN Degraded data redundancy: 1197023/7723191 objects degraded
> (15.499%)
>
> "ceph health deta
In addition to ceph -s, could you provide the output of
ceph osd tree
and specify what your failure domain is?
/Heðin
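For reference, the failure domain can usually be read from the CRUSH rule in
use, e.g. (assuming the default rule name):

  ceph osd crush rule dump replicated_rule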
On Thu, 2019-08-29 at 13:55 +0200, Janne Johansson wrote:
>
>
> On Thu, 29 Aug 2019 at 13:50, Amudhan P wrote:
> > Hi,
> >
> > I am using ceph version 13.2.6 (mimic) on tes
output from "ceph -s "
cluster:
id: 7c138e13-7b98-4309-b591-d4091a1742b4
health: HEALTH_WARN
Degraded data redundancy: 1141587/7723191 objects degraded
(14.781%), 15 pgs degraded, 16 pgs undersized
services:
mon: 1 daemons, quorum mon01
mgr: mon01(active)
m
What's the output of
ceph osd pool ls detail
On Thu, 2019-08-29 at 18:06 +0530, Amudhan P wrote:
> output from "ceph -s "
>
> cluster:
> id: 7c138e13-7b98-4309-b591-d4091a1742b4
> health: HEALTH_WARN
> Degraded data redundancy: 1141587/7723191 objects
> degraded (14.78
output from "ceph osd pool ls detail"
pool 1 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash
rjenkins pg_num 32 pgp_num 32 last_change 74 lfor 0/64 flags hashpspool
stripe_width 0 application cephfs
pool 2 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0
object_hash
Hi,
Ceph uses a pseudo-random distribution within CRUSH to select the target
hosts. As a result, the algorithm might not be able to select three
different hosts out of three hosts in the configured number of tries.
The affected PGs will be shown as undersized and only list two OSDs
instead o
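A commonly suggested workaround in this situation, sketched here assuming the
default of 50 tries is in effect, is to raise choose_total_tries in the
crushmap:

  ceph osd getcrushmap -o crush.bin
  crushtool -d crush.bin -o crush.txt
  # edit crush.txt: tunable choose_total_tries 100
  crushtool -c crush.txt -o crush.new
  ceph osd setcrushmap -i crush.new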
Hi,
This output doesn't show anything 'wrong' with the cluster. It's just still
recovering (backfilling) from what seems like one of your OSDs crashing and
restarting.
The backfilling is taking a while because max_backfills = 1 and you only
have 3 OSDs total, so the backfilling per PG has to have f
After leaving it for 12 hours, the cluster status is now healthy, but why did
it take such a long time to backfill?
How do I fine-tune it, in case the same kind of error pops up again?
On Thu, Aug 29, 2019 at 6:52 PM Caspar Smit wrote:
> Hi,
>
> This output doesn't show anything 'wrong' with the cluster
On Fri, 30 Aug 2019 at 10:49, Amudhan P wrote:
> After leaving it for 12 hours, the cluster status is now healthy, but why
> did it take such a long time to backfill?
> How do I fine-tune it, in case the same kind of error pops up again?
>
> The backfilling is taking a while because max_backfills = 1 and y
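For reference, backfill concurrency can be raised at runtime; the values
below are illustrative, not recommendations:

  ceph config set osd osd_max_backfills 4
  ceph config set osd osd_recovery_max_active 4

Higher values speed up recovery at the cost of client I/O.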
My cluster health status went to warning mode only after I ran mkdir for
thousands of folders with multiple subdirectories. If this made an OSD crash,
does it really take that long to heal empty directories?
On Fri, Aug 30, 2019 at 3:12 PM Janne Johansson wrote:
> On Fri, 30 Aug 2019 at 10:49,
>
> I would like to use mirroring to facilitate migrating from an existing
> Nautilus cluster to a new cluster running Reef. Right now I'm looking at
> RBD mirroring. I have studied the RBD Mirroring section of the
> documentation, but it is unclear to me which commands need to be issued on
> ea
Hi,
just one question that comes to mind: if you intend to migrate the images
separately, is it really necessary to set up mirroring? You could just
'rbd export' on the source cluster and 'rbd import' on the destination
cluster.
Quoting Anthony D'Atri:
I would like to use mirroring to
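A minimal sketch of that export/import approach, assuming an image named vm1
in pool rbd and SSH access between the clusters:

  rbd export rbd/vm1 - | ssh dest-host 'rbd import - rbd/vm1'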
- On 11 Jul 24, at 20:50, Dave Hall kdh...@binghamton.edu wrote:
> Hello.
>
> I would like to use mirroring to facilitate migrating from an existing
> Nautilus cluster to a new cluster running Reef. Right now I'm looking at
> RBD mirroring. I have studied the RBD Mirroring section of th
> Hi,
>
> just one question that comes to mind: if you intend to migrate the images
> separately, is it really necessary to set up mirroring? You could just 'rbd
> export' on the source cluster and 'rbd import' on the destination cluster.
That can be slower if using a pipe, and requires staging sp
Hi,
is there any chance to recover the other failing OSDs that seem to
have one chunk of this PG? Do the other OSDs fail with the same error?
Quoting Jake Grimmett:
Dear All,
We are "in a bit of a pickle"...
No reply to my message (23/03/2020), subject "OSD: FAILED
ceph_assert(clo
Hi Eugen,
Many thanks for your reply.
The other two OSDs are up and running, and being used by other PGs with
no problem; for some reason this PG refuses to use these OSDs.
The other two OSDs that are missing from this PG crashed at different
times last month; each OSD crashed when we trie
On Wed, Mar 25, 2020 at 5:19 AM Jake Grimmett wrote:
>
> Dear All,
>
> We are "in a bit of a pickle"...
>
> No reply to my message (23/03/2020), subject "OSD: FAILED
> ceph_assert(clone_size.count(clone))"
>
> So I'm presuming it's not possible to recover the crashed OSD
From your later email i
Hi Greg,
Yes, this was caused by a chain of events. As a cautionary tale, the main
ones were:
1) minor nautilus release upgrade, followed by a rolling node restart
script that mistakenly relied on "ceph -s" for cluster health info,
i.e. it didn't wait for the cluster to return to health bef
First of all, do not rush into bad decisions.
Production is down and you want to bring it back online, but you should fix
the problem and be sure first. If a second crash occurs in a healing state
you will lose metadata.
You don't need to debug first!
You didn't mention your cluster status and we don't kno
I'm not rushing.
I have found the issue: I am getting OOM errors as the OSD boots.
Basically it starts to process the PGs and then the node runs out of
memory and the daemon is killed.
2022-01-05 20:09:08 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:01.024+ 7fce3c6bc700 10 osd.51 24448261
It's nice to hear that. You can also decrease the OSD RAM usage from
4 GB to 2 GB. If you have enough spare RAM, go for it.
Good luck.
On Thu, 6 Jan 2022 at 00:46, Lee wrote:
>
> I'm not rushing.
>
> I have found the issue: I am getting OOM errors as the OSD boots; basically
> it starts t
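For reference, the suggestion above corresponds to something like (the value
is illustrative):

  ceph config set osd osd_memory_target 2147483648  # 2 GiB instead of the 4 GiB default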
and that is exactly why I run OSDs containerized with limited CPU and
memory as well as "bluestore cache size", "osd memory target", and "mds
cache memory limit". OSD processes have become noisy neighbors in the last
few versions.
On Wed, Jan 5, 2022 at 1:47 PM Lee wrote:
> I'm not rushing,
>
The first OSD took 156 GB of RAM to boot... :(
Is there an easy way to stop the mempool from pulling so much memory?
On Wed, 5 Jan 2022 at 22:12, Mazzystr wrote:
> and that is exactly why I run osds containerized with limited cpu and
> memory as well as "bluestore cache size", "osd memory target", and "mds
For Example
top - 22:53:47 up 1:29, 2 users, load average: 2.23, 2.08, 1.92
Tasks: 255 total, 2 running, 253 sleeping, 0 stopped, 0 zombie
%Cpu(s): 4.2 us, 4.5 sy, 0.0 ni, 91.1 id, 0.1 wa, 0.0 hi, 0.1 si,
0.0 st
MiB Mem : 161169.7 total, 23993.9 free, 132036.5 used, 5139.3 buff/
Hi Lee,
could you please raise debug-bluestore and debug-osd to 20 (via the ceph
tell osd.N injectargs command) when the OSD starts to eat up the RAM. Then
drop them back to defaults after a few seconds (10s is enough) to avoid a
huge log size, and share the resulting OSD log.
Also I'm curious if you hav
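A sketch of that debug procedure (osd.51 is taken from the log snippet
above; the default levels are assumed to be 1/5):

  ceph tell osd.51 injectargs '--debug-bluestore 20 --debug-osd 20'
  sleep 10
  ceph tell osd.51 injectargs '--debug-bluestore 1/5 --debug-osd 1/5'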
Running your OSDs with resource limitations is not so straightforward. I can
guess that if you are running close to full resource utilization on your nodes,
it makes more sense to make sure everything stays as much as possible within
its specified limits. (Aside from the question if you would even want t
> I assume the huge memory consumption is temporary. Once the OSD is up and
> stable, it would release the memory.
>
> So how about allocate a large swap temporarily just to let the OSD up. I
> remember that someone else on the list has resolved a similar issue with
> swap.
But is this alread
On Thu, 6 Jan 2022 at 12:21, Lee wrote:
> I've tried adding swap and that fails also.
>
How exactly did it fail? Did you put it on some disk, or in zram?
In the past I had to help a customer who hit memory over-use when upgrading
Ceph (due to shallow_fsck), and we were able to fix it by adding 64 GB
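A minimal sketch of adding a disk-backed swap file (size and path are
assumptions):

  fallocate -l 64G /swapfile
  chmod 600 /swapfile
  mkswap /swapfile
  swapon /swapfile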
On Fri, 7 Jan 2022 at 00:50, Alexander E. Patrakov wrote:
> On Thu, 6 Jan 2022 at 12:21, Lee wrote:
>
>> I've tried adding swap and that fails also.
>>
>
> How exactly did it fail? Did you put it on some disk, or in zram?
>
> In the past I had to help a customer who hit memory over-use when
> upgrading Ceph (d
I tried with disk-based swap on a SATA SSD.
I think that might be the last option. I have already exported all the down
PGs from the OSD that they are waiting for.
Kind Regards
Lee
On Thu, 6 Jan 2022 at 20:00, Alexander E. Patrakov
wrote:
> On Fri, 7 Jan 2022 at 00:50, Alexander E. Patrakov wrote:
On Tue, May 23, 2023 at 1:55 PM Justin Li wrote:
>
> Dear All,
>
> After an unsuccessful upgrade to pacific, the MDS were offline and could not
> get back on. I checked the MDS log and found the below. See cluster info
> below as well. I'd appreciate it if anyone can point me in the right direction. Thank
Thanks for replying, Greg. I'll give you the detailed sequence I followed for
the upgrade below.
Step 1: upgrade ceph mgr and Monitor --- reboot. Then mgr and mon are all up
running.
Step 2: upgrade one OSD node --- reboot and OSDs are all up.
Step 3: upgrade a second OSD node named OSD-node2. I did
Hello Justin,
On Tue, May 23, 2023 at 4:55 PM Justin Li wrote:
>
> Dear All,
>
> After an unsuccessful upgrade to pacific, the MDS were offline and could not
> get back on. I checked the MDS log and found the below. See cluster info
> below as well. I'd appreciate it if anyone can point me in the right d
Thanks Patrick. We're making progress! After issuing the cmd below (ceph config)
that you gave me, the ceph cluster health shows HEALTH_WARN and the MDS is back
up. However, cephfs can't be mounted, showing the error below. The Ceph mgr
portal also shows a 500 internal error when I try to browse the cephfs folder.
I'll be u
Sorry Patrick, my last email was rejected because of the attachment size. I
attached a link for you to download the log. Thanks.
https://drive.google.com/drive/folders/1bV_X7vyma_-gTfLrPnEV27QzsdmgyK4g?usp=sharing
Justin Li
Senior Technical Officer
School of Information Technology
Faculty of Science, Enginee
Hi Patrick,
Sorry to keep bothering you, but I found that the MDS service kept crashing even
though the cluster shows the MDS is up. I attached another log of the MDS server
(eowyn) below. Looking forward to hearing more insights. Thanks a lot.
https://drive.google.com/file/d/1nD_Ks7fNGQp0GE5Q_x8M57HldYurPhuN/view
Hello Justin,
Please do:
ceph config set mds debug_mds 20
ceph config set mds debug_ms 1
Then wait for a crash. Please upload the log.
To restore your file system:
ceph config set mds mds_abort_on_newly_corrupt_dentry false
Let the MDS purge the strays and then try:
ceph config set mds mds_a
Hi Patrick,
Thanks for the instructions. We started the MDS recovery scan with the cmds
below, following the link below. The first bit, scan_extents, has finished and
we're waiting on scan_inodes. We probably shouldn't interrupt the process. If
this procedure fails, I'll follow your steps and let
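For reference, the usual order of that recovery scan, per the CephFS
disaster-recovery documentation (the pool name is a placeholder):

  cephfs-data-scan scan_extents <data-pool>
  cephfs-data-scan scan_inodes <data-pool>
  cephfs-data-scan scan_links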
Hi Patrick,
The disaster recovery process with the cephfs-data-scan tool didn't fix our MDS
issue; it still kept crashing. I've uploaded a detailed MDS log with the ID
below. The restore procedure below didn't get it working either. Should I set
mds_go_bad_corrupt_dentry to false alongside
mds_ab
Hi,
you mean you forgot your password? You can remove the service with
'ceph orch rm grafana', then re-apply your grafana.yaml containing the
initial password. Note that this would remove all of the grafana
configs or custom dashboards etc., you would have to reconfigure them.
So before do
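A minimal sketch of that remove/re-apply cycle (the password and file name
are placeholders; initial_admin_password is only applied on a fresh
deployment):

  # grafana.yaml
  service_type: grafana
  spec:
    initial_admin_password: admin

  ceph orch rm grafana
  ceph orch apply -i grafana.yaml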
Hi,
Well, to get promtail working with Loki, you need to set up a password in
Grafana.
But promtail wasn't working with the 17.2.6 release, the URL was set to
containers.local. So I stopped using it, but forgot to click on save in KeePass
:(
I didn't configure anything special in Grafana, the
Too bad, that doesn't work :(
> On 09-11-2023 09:07 CET, Sake Ceph wrote:
>
>
> Hi,
>
> Well, to get promtail working with Loki, you need to set up a password in
> Grafana.
> But promtail wasn't working with the 17.2.6 release, the URL was set to
> containers.local. So I stopped using it, bu
What doesn't work exactly? For me it did...
Quoting Sake Ceph:
Too bad, that doesn't work :(
On 09-11-2023 09:07 CET, Sake Ceph wrote:
Hi,
Well, to get promtail working with Loki, you need to set up a
password in Grafana.
But promtail wasn't working with the 17.2.6 release, the URL was
Using podman version 4.4.1 on RHEL 8.8, Ceph 17.2.7
I used 'podman system prune -a -f' and 'podman volume prune -f' to clean up
files, but this leaves a lot of files behind in
/var/lib/containers/storage/overlay and an empty folder
/var/lib/ceph//custom_config_files/grafana..
Found those files with
Usually, removing the grafana service should be enough. I also have
this directory (custom_config_files/grafana.) but it's
empty. Can you confirm that after running 'ceph orch rm grafana' the
service is actually gone ('ceph orch ls grafana')? The directory
underneath /var/lib/ceph/{fsid}/gr
I tried everything at this point, even waited an hour; still no luck. I got it
working once, accidentally, but with a placeholder for a password. I tried with
the correct password, nothing, and trying again with the placeholder didn't work
anymore.
So I thought to switch the manager, maybe something is
I just tried it on a 17.2.6 test cluster; although I don't have a
stack trace, the complicated password doesn't seem to be applied (I don't
know why yet). But since it's an "initial" password you can choose
something simple like "admin", and during the first login you are
asked to change it an
It's the '#' character, everything after (including '#' itself) is cut
off. I tried with single and double quotes which also failed. But as I
already said, use a simple password and then change it within grafana.
That way you also don't have the actual password lying around in clear
text in
Thank you Eugen! This worked :)
> On 09-11-2023 14:55 CET, Eugen Block wrote:
>
>
> It's the '#' character, everything after (including '#' itself) is cut
> off. I tried with single and double quotes which also failed. But as I
> already said, use a simple password and then change it with
* Try applying the settings to global so that mons/mgrs get them.
* Set your shallow scrub settings back to the default. Shallow scrubs take
very few resources
* Set your randomize_ratio back to the default, you’re just bunching them up
* Set the load threshold back to the default, I can’t ima
Hi Anthony,
thanks for the tips. I reset all the values except osd_deep_scrub_interval
to their defaults, as reported at
https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/ :
# ceph config set osd osd_scrub_sleep 0.0
# ceph config set osd osd_scrub_load_threshold 0.5
# ceph config
Hi,
just for the archives:
On Tue, 5 Mar 2024, Anthony D'Atri wrote:
* Try applying the settings to global so that mons/mgrs get them.
Setting osd_deep_scrub_interval at the global level instead of at the osd level
immediately turns health to OK and removes the false warning about PGs not
scrubbed in time.
HTH,
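For reference, that corresponds to something like the following (the interval
value is illustrative; the default is 604800 seconds, one week):

  ceph config rm osd osd_deep_scrub_interval
  ceph config set global osd_deep_scrub_interval 1209600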
Hi,
if you assigned the SSD to be for block.db it won't be available from
the orchestrator's point of view as a data device. What you could try
is to manually create a partition or LV on the remaining SSD space and
then point the service spec to that partition/LV via path spec. I
haven't
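A hedged sketch of such a path-based OSD service spec (the host, service_id
and LV path are assumptions):

  service_type: osd
  service_id: ssd-leftover
  placement:
    hosts:
    - host1
  spec:
    data_devices:
      paths:
      - /dev/vg_ssd/lv_osd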
You can either provide an image with the adopt command (--image) or you
configure it globally with ceph config set (I don't have the exact
command right now). Which image does it fail to pull? You should see
that in cephadm.log. Does the node with osd.17 have access to the
image repo?
Quoting
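For illustration (the image tag is an assumption; pick the one matching your
cluster):

  cephadm --image quay.io/ceph/ceph:v16.2.14 adopt --style legacy --name osd.17
  # or set the default image once:
  ceph config set global container_image quay.io/ceph/ceph:v16.2.14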
On Wed, Jan 31, 2024 at 3:43 AM garcetto wrote:
>
> good morning,
> I was struggling to understand why I cannot find this setting on
> my Reef version. Is it because it is only in the latest dev Ceph version and
> not before?
that's right, this new feature will be part of the squid release. we
You should prioritise recovering quorum of your monitors. Ceph's
documentation can help here:
https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/
Check to see if the failed mon is still part of the monmap on the other
nodes, if it is you might need to remove it manually (which
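A sketch of the manual removal, following the troubleshooting-mon
documentation (mon IDs are placeholders):

  ceph mon remove <failed-mon-id>        # if a quorum still exists
  # with quorum lost, on a stopped surviving mon:
  ceph-mon -i <surviving-id> --extract-monmap /tmp/monmap
  monmaptool /tmp/monmap --rm <failed-mon-id>
  ceph-mon -i <surviving-id> --inject-monmap /tmp/monmap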
Hi,
which ceph release are you using? You mention ceph-disk so your OSDs
are not LVM based, I assume?
I've seen these messages a lot when testing in my virtual lab
environment although I don't believe it's the cluster's fsid but the
OSD's fsid that's in the error message (the OSDs have th
Maybe you have the same issue?
https://tracker.ceph.com/issues/44102#change-167531
In my case an update(?) disabled osd runlevels.
systemctl is-enabled ceph-osd@0
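If that reports "disabled", re-enabling the unit should restore start-on-boot:

  systemctl enable ceph-osd@0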
-Original Message-
To: ceph-users@ceph.io
Subject: [ceph-users] Re: help with failed osds after reboot
Hi,
which ceph
Ceph version 10.2.7
ceph.conf
[global]
fsid = 75d6dba9-2144-47b1-87ef-1fe21d3c58a8
mon_initial_members = chad, jesse, seth
mon_host = 192.168.10.41,192.168.10.40,192.168.10.39
mon warn on legacy crush tunables = false
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_require
On Mon, Jun 15, 2020 at 7:01 PM wrote:
> Ceph version 10.2.7
>
> ceph.conf
> [global]
> fsid = 75d6dba9-2144-47b1-87ef-1fe21d3c58a8
>
(...)
> mount_activate: Failed to activate
> ceph-disk: Error: No cluster conf found in /etc/ceph with fsid
> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9
>
--
Paul
Hi Jasper,
I suggest disabling all the crush-compat and reweighting approaches.
They rarely work out.
The state of the art is:
ceph balancer on
ceph balancer mode upmap
ceph config set mgr mgr/balancer/upmap_max_deviation 1
Cheers, Dan
--
Dan van der Ster
CTO
Clyso GmbH
p: +49 89 215252722 |
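Once enabled, progress can be checked with:

  ceph balancer status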
> I have recently onboarded new OSDs into my Ceph Cluster. Previously, I had
> 44 OSDs of 1.7TiB each and was using it for about a year. About 1 year ago,
> we onboarded an additional 20 OSDs of 14TiB each.
That's a big difference in size. I suggest increasing mon_max_pg_per_osd to
1000 --
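For reference, that would be (the value is taken from the suggestion above):

  ceph config set global mon_max_pg_per_osd 1000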
Hi Anthony and everyone else
We have found the issue. Because the new 20x 14 TiB OSDs were onboarded
onto a single node, there was not only an imbalance in the capacity of each
OSD but also between the nodes (other nodes each have around 15x 1.7TiB).
Furthermore, CRUSH rule sets default failure do
Hi Nicolas,
This is a known issue and Venky is working on it, please see
https://tracker.ceph.com/issues/63259.
Thanks
- Xiubo
On 6/3/24 20:04, nbarb...@deltaonline.net wrote:
Hello,
First of all, thanks for reading my message. I set up a Ceph version 18.2.2 cluster with
4 nodes, everythin
Hi,
A little break into this thread, but I have some questions:
* How does it happen that the filesystem gets into read-only mode?
* Is this avoidable?
* How to fix the issue, because I didn't see a workaround in the mentioned
tracker (or I missed it)
* With this bug around, should you use c
On 6/4/24 15:20, Sake Ceph wrote:
Hi,
A little break into this thread, but I have some questions:
* How does it happen that the filesystem gets into read-only mode?
For a detailed explanation, you can refer to the ceph PR:
https://github.com/ceph/ceph/pull/55421.
* Is this avoidable?
* How-
Hi Xiubo
Thank you for the explanation! This won't be an issue for us, but it made me think
twice :)
Kind regards,
Sake
> On 04-06-2024 12:30 CEST, Xiubo Li wrote:
>
>
> On 6/4/24 15:20, Sake Ceph wrote:
> > Hi,
> >
> > A little break into this thread, but I have some questions:
> > * How d
First, thanks Xiubo for your feedback!
To go further on the points raised by Sake:
- How does this happen? -> There were no preliminary signs before the incident
- Is this avoidable? -> Good question, I'd also like to know how!
- How to fix the issue? -> So far, no fix nor workaround from w
Hi,
I believe our KL studio has hit this same bug after deleting a pool that
was used only for testing.
So, is there any procedure to get rid of those bad journal events and get
the MDS back to a rw state?
Thanks,
---
Olli Rajala - Lead TD
Anima Vitae Ltd.
www.anima.fi
-
Hello,
Is podman installed on the new node? Also make sure that NTP time sync
is on for the new node. ceph orch checks those on the new node and
then dies, if they're not ready, with an error like the one you see.
Hello,
I use docker; I will check NTP.
Does the new node need to be installed?
Hello,
Yes, make sure docker & NTP are set up on the new node first.
Also, make sure the public key is added on the new node and the firewall
is allowing it through.
Will do, thanks!
On Wed, 22 Jul 2020 at 12:27, steven prothero <
ste...@marimo-tech.com> wrote:
> Hello,
>
> Yes, make sure docker & ntp is setup on the new node first.
> Also, make sure the public key is added on the new node and firewall
> is allowing it through
>
Thank you. After installing docker on the new node, I can add the node.
It's working.
On Wed, 22 Jul 2020 at 14:41, David Thuong <
davidthuong2...@gmail.com> wrote:
> Thank you. After installing docker on the new node, I can add the node.
Answering myself, I found the reason for 2147483647: it's documented
as a failure to find enough OSDs (missing OSDs). And it is normal, as I
selected different hosts for the 15 OSDs but I have only 12 hosts!
I'm still interested in an "expert" confirming that the LRC k=9, m=3, l=4
configuration
Hi,
Is somebody using the LRC plugin?
I came to the conclusion that LRC k=9, m=3, l=4 is not the same as
jerasure k=9, m=6 in terms of protection against failures and that I
should use k=9, m=6, l=5 to get a level of resilience >= jerasure k=9,
m=6. The example in the documentation (k=4, m=2, l
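For reference, a profile along those lines could be created like this
(profile and pool names are placeholders; verify k, m, l against your own
resilience analysis):

  ceph osd erasure-code-profile set lrc-k9m6l5 plugin=lrc k=9 m=6 l=5 \
      crush-locality=datacenter crush-failure-domain=host
  ceph osd pool create lrcpool 32 32 erasure lrc-k9m6l5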
Hi,
I'm still interested in getting feedback from those using the LRC
plugin about the right way to configure it... Last week I upgraded from
Pacific to Quincy (17.2.6) with cephadm which is doing the upgrade host
by host, checking if an OSD is ok to stop before actually upgrading it.
I had
Hi,
I think I found a possible cause of my PG down, but I still don't understand
why. As explained in a previous mail, I set up a 15-chunk/OSD EC pool (k=9,
m=6) but I have only 12 OSD servers in the cluster. To work around the
problem I defined the failure domain as 'osd' with the reasoning that as
I w
I w
Hello,
What is your current setup, 1 server per data center with 12 OSDs each? What
is your current crush rule and LRC crush rule?
On Fri, Apr 28, 2023, 12:29 Michel Jouvin
wrote:
> Hi,
>
> I think I found a possible cause of my PG down, but I still don't understand why.
> As explained in a previous mai
Hi,
No... our current setup is 3 datacenters with the same configuration,
i.e. 1 mon/mgr + 4 OSD servers with 16 OSDs each, thus the total of 12
OSD servers. As with the LRC plugin k+m must be a multiple of l, I found
that k=9, m=6, l=5 with crush-locality=datacenter was achieving my goal
of bei
Hi,
disclaimer: I haven't used LRC in a real setup yet, so there might be
some misunderstandings on my side. But I tried to play around with one
of my test clusters (Nautilus). Because I'm limited in the number of
hosts (6 across 3 virtual DCs) I tried two different profiles with
lower nu
I think I got it wrong with the locality setting, I'm still limited by
the number of hosts I have available in my test cluster, but as far as
I got with failure-domain=osd I believe k=6, m=3, l=3 with
locality=datacenter could fit your requirement, at least with regards
to the recovery band
Hi,
I had to restart one of my OSD server today and the problem showed up
again. This time I managed to capture "ceph health detail" output
showing the problem with the 2 PGs:
[WRN] PG_AVAILABILITY: Reduced data availability: 2 pgs inactive, 2 pgs down
pg 56.1 is down, acting
[208,65,73,
Hi,
I don't think you've shared your osd tree yet, could you do that?
Apparently nobody else but us reads this thread or nobody reading this
uses the LRC plugin. ;-)
Thanks,
Eugen
Quoting Michel Jouvin:
Hi,
I had to restart one of my OSD server today and the problem showed
up again
Hi Eugen,
Yes, sure, no problem to share it. I attach it to this email (as it may
clutter the discussion if inline).
If somebody on the list has some clue about the LRC plugin, I'm still
interested in understanding what I'm doing wrong!
Cheers,
Michel
On 04/05/2023 at 15:07, Eugen Block wrote
Hi,
I've been following this thread with interest as it seems like a unique use
case to expand my knowledge. I don't use LRC or anything outside basic
erasure coding.
What is your current crush steps rule? I know you made changes since your
first post and had some thoughts I wanted to share, but
Hi, I don’t have a good explanation for this yet, but I’ll soon get
the opportunity to play around with a decommissioned cluster. I’ll try
to get a better understanding of the LRC plugin, but it might take
some time, especially since my vacation is coming up. :-)
I have some thoughts about th
Hi Eugen,
My LRC pool is also somewhat experimental, so nothing really urgent. If you
manage to do some tests that help me understand the problem, I remain
interested; I propose to keep this thread for that.
Eugen, I shared my crush map in the email you answered, if the attachment
was not su
Hi,
I realize that the crushmap I attached to one of my emails, probably
required to understand the discussion here, has been stripped by
mailman. To avoid polluting the thread with a long output, I put it at
https://box.in2p3.fr/index.php/s/J4fcm7orfNE87CX. Download it if you are
inte
Hi, I have a real hardware cluster for testing available now. I'm not
sure whether I'm completely misunderstanding how it's supposed to work
or if it's a bug in the LRC plugin.
This cluster has 18 HDD nodes available across 3 rooms (or DCs), I
intend to use 15 nodes to be able to recover if o
Hi Eugen,
Thank you very much for these detailed tests, which match what I observed
and reported earlier. I'm happy to see that we have the same
understanding of how it should work (based on the documentation). Is
there any way, other than this list, to get in contact with the plugin
developers
Hi,
adding the dev mailing list, hopefully someone there can chime in. But
apparently the LRC code hasn't been maintained for a few years
(https://github.com/ceph/ceph/tree/main/src/erasure-code/lrc). Let's
see...
Quoting Michel Jouvin:
Hi Eugen,
Thank you very much for these detaile
I created a tracker issue, maybe that will get some attention:
https://tracker.ceph.com/issues/61861
Quoting Michel Jouvin:
Hi Eugen,
Thank you very much for these detailed tests that match what I
observed and reported earlier. I'm happy to see that we have the
same understanding of ho
On Mon, Aug 7, 2023 at 6:12 AM Zhang Bao wrote:
>
> Hi,
>
> I have a ceph cluster stuck at `ceph --verbose stats fs fsname`. And in the
> monitor log, I can find something like `audit [DBG] from='client.431973 -'
> entity='client.admin' cmd=[{"prefix": "fs status", "fs": "fsname",
> "target": ["mon-mg
On Tue, Aug 8, 2023 at 1:18 AM Zhang Bao wrote:
>
> Hi, thanks for your help.
>
> I am using ceph Pacific 16.2.7.
>
> Before my Ceph got stuck at `ceph fs status fsname`, one of my cephfs
> filesystems became readonly.
Probably the ceph-mgr is stuck (the "volumes" plugin) somehow talking
to the read-only CephFS
We partly rolled our own with AES-GCM. See
https://docs.ceph.com/en/quincy/rados/configuration/msgr2/#connection-modes
and https://docs.ceph.com/en/quincy/dev/msgr2/#frame-format
-Greg
On Wed, Aug 24, 2022 at 4:50 PM Jinhao Hu wrote:
>
> Hi,
>
> I have a question about the MSGR protocol Ceph used
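For reference, the secure (AES-GCM) wire mode described in those docs is
selected via the msgr2 connection-mode options, e.g.:

  ceph config set global ms_cluster_mode secure
  ceph config set global ms_service_mode secure
  ceph config set global ms_client_mode secure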