I am deploying Rook 1.10.13 with Ceph 17.2.6 on our Kubernetes clusters. We are
using the Ceph Shared Filesystem extensively and have never faced an issue.
Lately, we have deployed it on Oracle Linux 9 VMs (previous/existing
deployments use CentOS/RHEL 7) and we are facing the following issue:
We
Hi Eugen,
Yes, you are right.
After upgrading from v18.2.0 to v18.2.1 it is necessary to create the
ceph-exporter service manually and deploy it to all hosts.
The dashboard is fine as well.
Thanks for the help.
Martin
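For anyone hitting the same after the upgrade, the manual deployment boils down to something like the following sketch (assumes a cephadm-managed Reef cluster; verify the service type and flags against the cephadm docs for your release):

```shell
# Deploy the ceph-exporter service on every host, then verify one
# exporter daemon is running per host.
ceph orch apply ceph-exporter --placement='*'
ceph orch ps --daemon-type ceph-exporter
```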
On 26/01/2024 00:17, Eugen Block wrote:
Ah, there they are (different p
Before upgrading to Reef 18.2.1 I could get all the metrics.
Martin
On 18/01/2024 12:32, Jose Vicente wrote:
Hi,
After upgrading from Quincy to Reef, the ceph-mgr daemon is no longer
exporting some OSD throughput metrics such as ceph_osd_op_*
curl http://localhost:9283/metrics | grep
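A quick way to compare before and after is counting the exported series (a sketch; assumes the mgr prometheus module listens on its default port 9283 on the active mgr host):

```shell
# Count the ceph_osd_op_* series the active mgr currently exports;
# on a healthy pre-upgrade cluster this should be non-zero.
curl -s http://localhost:9283/metrics | grep -c '^ceph_osd_op'
```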
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
>
> From: Hector Martin
> Sent: Friday, January 19, 2024 10:12 AM
> To: ceph-users@ceph.io
> Subject: [ceph-users] Degraded P
I'm having a bit of a weird issue with cluster rebalances with a new EC
pool. I have a 3-machine cluster, each machine with 4 HDD OSDs (+1 SSD).
Until now I've been using an erasure coded k=5 m=3 pool for most of my
data. I've recently started to migrate to a k=5 m=4 pool, so I can
configure the
There is nothing being
logged for these OSDs. If I were to "kick the cluster", these scrub logs would
probably start showing up again.
When this is happening the PGs mentioned seem to stay in a "scrub queued" state.
Any light you could shine my way would be appreciated.
Thanks
Martin Conway wrote:
> I find that backfilling and possibly scrubbing often comes to a halt for no
> apparent
> reason. If I put a server into maintenance mode or kill and restart OSDs it
> bursts back
> into life again.
>
> Not sure how to diagnose why the recovery processes have stalled.
Regards,
Martin
> -----Original Message-----
> From: John Mulligan
> Sent: Saturday, October 28, 2023 12:58 AM
> To: ceph-users@ceph.io
> Subject: [ceph-users] Re: "cephadm version" in reef returns "AttributeError:
I just had another look through the issues tracker and found this bug already
listed.
https://tracker.ceph.com/issues/59428
I need to go back to the other issues I am having and figure out if they are
related or something different.
Hi
I wrote before about issues I was having with cephadm
attr__
return super().__getattribute__(name)
AttributeError: 'CephadmContext' object has no attribute 'fsid'
I am running into other issues as well, but I think they may point back to this
issue of "'CephadmContext' object has no attribute 'fsid'"
Any help would be appreciated.
Regards,
h upgrade start quay.io/ceph/ceph:v18.2.0
Does work as expected.
Let me know if there is any other information that would be helpful, but I have
since worked around these issues and have my ceph back in a happy state.
Regards,
Martin Conway
IT and Digital Media Manager
Research School of Physic
to
your response.
Regards
ppa. Martin Konold
--
Martin Konold - Prokurist, CTO
KONSEC GmbH - make things real
Amtsgericht Stuttgart, HRB 23690
Geschäftsführer: Andreas Mack
Im Köller 3, 70794 Filderstadt, Germany
On 2023-09-11 22:08, Igor Fedotov wrote:
Hi Martin,
could you please share
7f99aa28f3c0 1 bdev(0x5565c261fc00
/var/lib/ceph/osd/ceph-43/block) close
2023-09-11T16:30:04.940+0200 7f99aa28f3c0 -1 osd.43 0 OSD:init: unable
to mount object store
2023-09-11T16:30:04.940+0200 7f99aa28f3c0 -1 ** ERROR: osd init failed:
(5) Input/output error
--
Regards,
ppa. Martin Konold
error
I verified that the hardware of the new nvme is working fine.
--
Regards,
ppa. Martin Konold
--
Best regards
ppa. Martin Konold
Hi
We are facing an error with an OSD crash after a reboot of the server where it is
installed.
We rebooted the servers in our ceph cluster for patching, and after rebooting, two
OSDs were crashing.
One of them finally recovered, but the other is still down.
The cluster is currently rebalancing objects:
#
ing weeks. I can probably
get the script done partway through doing this, so I can see how the
distributions evolve over a bunch of data movement.
>
>
> Thanks,
>
> Igor
>
>
> On 28/05/2023 08:31, Hector Martin wrote:
>> So chiming in, I think something is definit
On 29/05/2023 20.55, Anthony D'Atri wrote:
> Check the uptime for the OSDs in question
I restarted all my OSDs within the past 10 days or so. Maybe OSD
restarts are somehow breaking these stats?
>
>> On May 29, 2023, at 6:44 AM, Hector Martin wrote:
>>
>> Hi,
>
Hi,
I'm watching a cluster finish a bunch of backfilling, and I noticed that
quite often PGs end up with zero misplaced objects, even though they are
still backfilling.
Right now the cluster is down to 6 backfilling PGs:
data:
volumes: 1/1 healthy
pools: 6 pools, 268 pgs
>
> Just FYI: allocation probe printing interval is controlled by
> bluestore_alloc_stats_dump_interval parameter.
>
>
> Thanks,
>
> Igor
>
>
>
> On 24/05/2023 17:18, Hector Martin wrote:
>> On 24/05/2023 22.07, Mark Nelson wrote:
>>> Yep,
On 25/05/2023 01.40, 胡 玮文 wrote:
> Hi Hector,
>
> Not related to fragmentation. But I see you mentioned CephFS, and your OSDs
> are at high utilization. Is your pool NEAR FULL? CephFS write performance is
> severely degraded if the pool is NEAR FULL. Buffered write will be disabled,
> and
have some clearer data
beyond my hunch/deduction after seeing the I/O patterns and the sole
fragmentation number :). Also would be interesting to get some kind of
trace of the bluestore ops the OSD is doing, so I can find out whether
it's doing something pathological that causes more fragmentatio
from the current 7, the load per MDS will be a
bit less than double.
>
> Emmanuel
>
> On Wed, May 24, 2023 at 2:31 PM Hector Martin wrote:
>
>> On 24/05/2023 21.15, Emmanuel Jaep wrote:
>>> Hi,
>>>
>>> we are currently running a ceph fs clust
On 24/05/2023 21.15, Emmanuel Jaep wrote:
> Hi,
>
> we are currently running a ceph fs cluster at the following version:
> MDS version: ceph version 16.2.10
> (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)
>
> The cluster is composed of 7 active MDSs and 1 standby MDS:
> RANK STATE
Hi,
I've been seeing relatively large fragmentation numbers on all my OSDs:
ceph daemon osd.13 bluestore allocator score block
{
"fragmentation_rating": 0.77251526920454427
}
These aren't that old, as I recreated them all around July last year.
They mostly hold CephFS data with erasure
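For comparison across OSDs, the rating can be pulled out of the JSON with a small filter (a sketch; the sample value is the one quoted above and the OSD ids are placeholders):

```shell
# Extract "fragmentation_rating" from the allocator score JSON.
sample='{ "fragmentation_rating": 0.77251526920454427 }'
printf '%s\n' "$sample" | sed -n 's/.*"fragmentation_rating": *\([0-9.]*\).*/\1/p'

# On a live node the same filter can be applied per OSD, e.g.:
#   for id in 13 14 15; do
#     ceph daemon "osd.$id" bluestore allocator score block \
#       | sed -n 's/.*"fragmentation_rating": *\([0-9.]*\).*/\1/p'
#   done
```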
On 17/05/2023 03.07, 胡 玮文 wrote:
> Hi Sake,
>
> We are experiencing the same. I set “osd_mclock_cost_per_byte_usec_hdd” to
> 0.1 (default is 2.6) and get about 15 times the backfill speed without
> significantly affecting client IO. This parameter seems to be calculated
> wrongly, from the description 5e-3
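The workaround described above comes down to a single config change (hedged: this parameter exists in Quincy's mClock scheduler; revert it once you run a release where the cost model is fixed):

```shell
# Lower the per-byte cost estimate for HDD OSDs (Quincy default: 2.6).
ceph config set osd osd_mclock_cost_per_byte_usec_hdd 0.1
# Verify the value an OSD actually sees:
ceph config show osd.0 osd_mclock_cost_per_byte_usec_hdd
```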
Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Martin Buss
Sent: 14 December 2022 19:32
To: ceph-users@ceph.io
Subject: [ceph-users] Re: New pool created with 2048 pg_num not executed
will do, that will take another day or so.
Can this have to do
GiB 5.2 TiB 43.22 2.15 21 up osd.67
TOTAL 639 TiB 128 TiB 128 TiB 137 GiB
507 GiB 510 TiB 20.08
MIN/MAX VAR: 0/2.20 STDDEV: 11.36
On 14.12.22 22:58, Frank Schilder wrote:
Hi Martin,
I can't find the output of
ceph osd df tree
ceph sta
Hi Frank,
thanks for coming in on this, setting target_max_misplaced_ratio to 1
does not help
Regards,
Martin
On 14.12.22 21:32, Frank Schilder wrote:
Hi Eugen: déjà vu again?
I think the way autoscaler code in the MGRs interferes with operations is
extremely confusing.
Could
pg_num 187 pgp_num 59 autoscale_mode off
Does disabling the autoscaler leave it like that when you disable it in
the middle of scaling? What is the current 'ceph status'?
Zitat von Martin Buss :
Hi Eugen,
thanks, sure, below:
pg_num stuck at 1152 and pgp_num stuck at 1024
Regards,
Martin
Hi list admins, I accidentally posted my private address, can you please
delete that post?
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/JMFG73QMB3MJKHDMNPIKZHQOUUCJPJJN/
Thanks,
Martin
On 14.12.22 15:18, Eugen Block wrote:
Hi,
I haven't been dealing with ceph-volume too
Hi Eugen,
thanks, sure, below:
pg_num stuck at 1152 and pgp_num stuck at 1024
Regards,
Martin
ceph config set global mon_max_pg_per_osd 400
ceph osd pool create cfs_data 2048 2048 --pg_num_min 2048
pool 'cfs_data' created
pool 1 'cephfs_data' replicated size 3 min_size 2 crush_rule 0
pg_num
and 2048 pgp_num.
What config option am I missing that does not allow me to grow the pool
to 2048? Although I specified pg_num and pgp_num to be the same, they are not.
Please, some help and guidance.
Thank you,
Martin
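For reference, a sketch of how the split can be driven and observed (pool name taken from the commands above; note that since Nautilus the mons raise pg_num/pgp_num gradually toward a target, which is why the values can look "stuck" mid-way):

```shell
# Request the target explicitly, then watch current values converge.
ceph osd pool set cfs_data pg_num 2048
ceph osd pool set cfs_data pgp_num 2048
ceph osd pool get cfs_data pg_num
ceph osd pool get cfs_data pgp_num
# The autoscaler can also interfere while scaling is in flight:
ceph osd pool autoscale-status
```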
___
ceph-users mailing list
+undersized+degraded+remapped+backfilling, 474 active+clean; 2.4
TiB data, 4.3 TiB used, 10 TiB / 15 TiB avail; 150 KiB/s wr, 10 op/s;
123337/1272524 objects degraded (9.692%); 228 MiB/s, 58 objects/s
recovering
Is this Quincy specific?
Regards
--martin
rd, 1.3 MiB/s wr, 56 op/s
I am running Ceph Quincy 17.2.5 on a test system with dedicated
1Gbit/9000MTU storage network, while the public ceph network
1GBit/1500MTU is shared with the vm network.
I am looking forward to your suggestions.
Regards,
ppa. Martin Konold
--
Martin Konold
Could you explain? I have just deployed Ceph CSI just like the docs
specified. What mode is it running in if not container mode?
Best Regards,
Martin Johansen
On Tue, Oct 25, 2022 at 10:56 AM Marc wrote:
> Wtf, unbelievable that it is still like this. You can't fix it, I had to
>
How should we fix it? Should we remove the directory and add back the
keyring file?
Best Regards,
Martin Johansen
Yes, we are using the ceph-csi driver in a kubernetes cluster. Is it that
that is causing this?
Best Regards,
Martin Johansen
On Tue, Oct 25, 2022 at 9:44 AM Marc wrote:
> >
> > 1) Why does ceph delete /etc/ceph/ceph.client.admin.keyring several
> > times a
>
cannot remove
'/etc/ceph/ceph.client.admin.keyring': Is a directory".
Best Regards,
Martin Johansen
Hi, thank you, we replaced the domain of the service in the text before
reporting the issue. Sorry, I should have mentioned that.
admin.ceph.example.com was turned into admin.ceph. for
privacy's sake.
Best Regards,
Martin Johansen
On Mon, Oct 24, 2022 at 2:53 PM Murilo Morais wrote:
> Hello Mar
now healthy
---
These statuses go offline and online sporadically. The block devices seem
to be working fine all along. The cluster alternates between HEALTH_OK
and HEALTH_WARN
Best Regards,
Martin Johansen
On Fri, 2022-08-19 at 14:03 +0200, Ilya Dryomov wrote:
> On Fri, Aug 19, 2022 at 1:21 PM Martin Traxl
> wrote:
> > Hi Ilya,
> >
> > On Thu, 2022-08-18 at 13:27 +0200, Ilya Dryomov wrote:
> > > On Tue, Aug 16, 2022 at 12:44 PM Martin Traxl <
>
Hi Ilya,
On Thu, 2022-08-18 at 13:27 +0200, Ilya Dryomov wrote:
> On Tue, Aug 16, 2022 at 12:44 PM Martin Traxl
> wrote:
[...]
> >
> >
>
> Hi Martin,
>
> For obscure backwards compatibility reasons, the kernel client
> defaults
> to messenger v1. You w
"text": "profile rbd"
},
"authenticated": true,
"global_id": 256359885,
"global_id_status": "reclaim_ok",
"osd_epoch": 13120,
"remote_host": "
findings is
to use powers of 2, like 2+x, 4+x, 8+x, 16+x. Especially for OSDs deployed
before the Ceph allocation change patch, you might end up consuming far
more space if you don't use a power of 2. With the 4k allocation size, at
least, this has been greatly improved for newly deployed OSDs.
--
Martin
it should be faster than pacific. Maybe try
to jump away from the pacific release into the unknown!
--
Martin Verges
Managing director
Mobile: +49 174 9335695 | Chat: https://t.me/MartinVerges
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register
. Martin Konold
On 2022-04-02 03:36, York Huang wrote:
Hi,
How about this "osd: 7 osds: 6 up (since 3h), 6 in (since 6w)"
,
hbase18, hbase13, hbase15, hbase16
mds: 1/1 daemons up, 5 standby
osd: 7 osds: 6 up (since 3h), 6 in (since 6w)
data:
volumes: 1/1 healthy
pools: 4 pools, 448 pgs
objects: 0 objects, 0 B
usage: 0 B used, 0 B / 0 B avail
pgs: 100.000% pgs unknown
448 unknown
---
Best regards
ppa. Martin Konold
ts, 0 B
usage: 0 B used, 0 B / 0 B avail
pgs: 100.000% pgs unknown
448 unknown
--
Kind Regards
ppa. Martin Konold
Hello, ceph-users community
I have watched the recording of "Ceph Performance Meeting 2022-03-03" (in
the Ceph channel, link https://www.youtube.com/watch?v=syq_LTg25T4) about
OpenCAS and block caching yesterday and it was really informative to me (I
especially liked the part where the filtering
Some say it is, some say it's not.
Every time I try it, it's buggy as hell and I can destroy my test clusters
with ease. That's why I still avoid it. But as you can see in my signature,
I am biased ;).
--
Martin Verges
.
Leave min_size on 2 as well and accept the potential downtime!
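In command form (the pool name "rbd" here is a placeholder, not from the original thread):

```shell
# Keep min_size at 2 as suggested above; losing enough replicas then
# blocks IO (the "potential downtime") rather than risking inconsistency.
ceph osd pool set rbd min_size 2
ceph osd pool get rbd min_size
```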
--
Martin Verges
As the price for SSDs is the same regardless of the interface, I would not
invest so much money in a still slow and outdated platform.
Just buy some new chassis as well and go NVMe. It adds only a little cost
but will increase performance drastically.
--
Martin Verges
lose
data, and to run even if one datacenter burns down, without downtime. But
this all comes with costs, sometimes quite high costs. Often it's cheaper to
live with a short interruption, or to build two separate systems, than to get
more nines for your availability on a single one.
--
Martin Verges
or structure to be a support organization.
Therefore companies like ours do the support around Ceph.
--
Martin Verges
adding complexity, you
are endangering that.
--
Martin Verges
way from that crap and you won't
have a hard time.
We as croit started our own OBS Infrastructure to build packages for x86_64
and arm64. This should help us to maintain packages and avoid the useless
Ceph containers. I can post an update to the user ML when it's ready for
public service.
--
Mar
Octopus to get them running again or have proper tooling to fix the issue.
But I agree, we as croit are still afraid of pushing our users to Pacific,
as we encounter bugs in our tests. This will change soon, however, as we
believe we are close to a stable enough Pacific release.
--
Martin Verge
ld happily support that.
--
Martin Verges
active
number to your liking.
> From the perspective of getting the maximum bandwidth, which one should i
choose, CephFS or Ceph S3?
Choose what's best for your application / use case scenario.
--
Martin Verges
c
Use pacific for new deployments.
--
Martin Verges
will be a little bit
faster as well.
We have quite some experience with that and can be of help if you need more
details and vendor suggestions.
--
Martin Verges
. In
addition, put 2*25GbE/40GbE in the servers and you need only a few of them
to simulate a lot. This would save costs, make it easier to maintain, and
give you much more flexibility: for example, running tests on different OSes,
injecting latency, simulating errors, and more.
--
Martin Verges
Hi Zakhar,
Out of curiosity, what does your crush map look like? Probably a long shot,
but are you sure your crush map is targeting the NVMes for the rados bench
you are performing?
Tor Martin Ølberg
On Tue, Oct 5, 2021 at 9:31 PM Christian Wuerdig <
christian.wuer...@gmail.com> wrote:
&
=4, m=2
without any rationale behind them.
The table is taken from
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/storage_strategies_guide/erasure_code_pools.
Thanks!
Regards,
Martin
Just PXE boot whatever OS you like at the time. If you need to switch to
another, a reboot is enough to switch the OS. It's even possible without
containers, so absolutely no problem at all.
--
Martin Verges
and could verify that it fixes the leak:
https://github.com/ceph/ceph/pull/42803
So hopefully this fix gets backported soon.
Regards,
Martin
From: Martin Traxl
Sent: Friday, 13 August 2021 13:51
To: Konstantin Shalygin
Cc: ceph-users@ceph.io
Subject: RE: [ceph-users
7a2f62c1b862ff3fd8b1eec13391a5be)
octopus (stable)": 38,
"ceph version 16.2.5 (0883bdea7337b95e4b611c768c0279868462204a)
pacific (stable)": 2
}
}
which solved performance issues. All OSDs were newly created and fully
synced from other nodes when upgrading and downgrading back to 15.2.
Best Regards
,
"file": "#/26",
"final": "#/26"
},
"rbd_default_features": {
"default": "61",
"final": "61"
},
"rgw_dns_name": {
ing on CentOS 8.4, kernel 4.18. As
already mentioned, Ceph 14.2.22. radosgw is the only notable service running on
this machine.
Any suggestions on this? Are there maybe any tuning settings? How could I debug
this further?
____
From: Martin Traxl
Sent: Tuesday,
I should mention we are using the S3 interface, so it is S3 traffic.
From: Martin Traxl
Sent: Tuesday, 10 August 2021 14:35
To: ceph-users@ceph.io
Subject: [ceph-users] RGW memory consumption
Hi everyone,
we are running a ceph nautilus 14.2.22
total": {
"items": 71162589,
"bytes": 1712643376
which is only 1.7 GB. Almost all of it is used as "buffer_anon".
Are there any settings that might help tune the memory consumption? Do you
need further information
Hi all !
We are facing strange behaviors from two clusters we have at work (both v15.2.9
/ CentOS 7.9):
* In the 1st cluster we are getting errors about multiple degraded pgs, and
all of them are linked with a "rogue" osd whose ID is very big (such as
"osd.2147483647"). This osd doesn't show
new temporary file in pool
remove old file
rename temp file to old file location
--
Martin Verges
After an upgrade to 15.2.13 from 15.2.4 this small home lab cluster ran
into issues with OSDs failing on all four hosts. This might be unrelated to
the upgrade but it looks like the trigger has been an autoscaling event
where the RBD PG pool has been scaled from 128 PGs to 512 PGs.
Only some OSDs
red supported
clusters, we never encountered a BGP deployment in the field. It's always
just in theoretical or testing contexts that we hear about BGP.
--
Martin Verges
There are even problems with containers if you don't use
version X of Docker. That's what the past taught us; why should it be
better in the future with even more container environments? Have you tried
running Rancher on Debian in the past? It breaks apart due to iptables or
other stuff.
--
Martin V
alternatives
that are massively cheaper per IO and only minimally more expensive per GB.
I therefore believe stripping out overhead is also an important topic for
the future of Ceph.
--
Martin Verges
Hello Szabo,
you can try it with our docs at
https://croit.io/docs/master/hypervisors/proxmox, maybe it helps you to
connect your Ceph cluster to Proxmox.
--
Martin Verges
and the services will come up
without a problem when you have all configs in place.
--
Martin Verges
harder to debug. But of course we do it anyway :).
To have perfect storage, strip away anything unnecessary. Avoid any
complexity, avoid anything that might affect your system. Keep it simple,
stupid.
--
Martin Verges
Hi Weiwen,
Amazing, that actually worked. So simple, thanks!
From: 胡 玮文
Sent: 27 May 2021 09:02
To: Martin Rasmus Lundquist Hansen; ceph-users@ceph.io
Subject: Re: MDS stuck in up:stopping state
Hi Martin,
You may hit
https://tracker.ceph.com/issues
After scaling the number of MDS daemons down, we now have a daemon stuck in the
"up:stopping" state. The documentation says it can take several minutes to stop
the
daemon, but it has been stuck in this state for almost a full day. According to
the "ceph fs status" output attached below, it still
with the
free forever version.
--
Martin Verges
Hello,
what is the currently preferred method, in terms of stability and
performance, for exporting a CephFS directory with Samba?
- locally mount the CephFS directory and export it via Samba?
- using the "vfs_ceph" module of Samba?
Be
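For the vfs_ceph route, a minimal share definition looks roughly like this (parameter names from the Samba vfs_ceph manpage; the share name and cephx user id are placeholders):

```shell
# Append a CephFS share backed by Samba's vfs_ceph module.
cat >> /etc/samba/smb.conf <<'EOF'
[cephfs]
   path = /
   vfs objects = ceph
   ceph:config_file = /etc/ceph/ceph.conf
   ceph:user_id = samba
   read only = no
EOF
```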
new-croit-host-C0DE01.service: Failed with
result 'exit-code'.
--
Martin Verges
for Debian users and if it's
still possible we should try to avoid that. Is there something we could
do to help make it happen?
--
Martin Verges
e my confidence.
Sorry I can't. But as Stefan Kooman wrote, "Best way is to find out
for yourself and your use case(s) with a PoC cluster"
We use it for our own stuff and customers of croit use it in lot of
installations. However, nothing is better than a PoC to gain more
confidence.
--
Martin
You can change the crush rule to be OSD-specific instead of HOST-specific. That way
Ceph will put one chunk per OSD, and multiple chunks per host.
Please keep in mind that this will cause an outage if one of your hosts goes
offline.
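As a sketch, with hypothetical profile and pool names (crush-failure-domain=osd is the setting that allows several chunks to land on one host):

```shell
# EC profile whose failure domain is "osd" rather than "host".
ceph osd erasure-code-profile set ec-k5m4-osd k=5 m=4 crush-failure-domain=osd
ceph osd pool create ecpool 64 64 erasure ec-k5m4-osd
```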
--
Martin Verges
> So no, I am not convinced yet. Not against it, but personally I would say
it's not the only way forward.
100% agree to your whole answer
--
Martin Verges
by adding a hook script within croit "onHealthDegrate" and
"onHealthRecover" that notifies us using telegram/slack/... ;)
--
Martin Verges
lem with that, not a single one.
--
Martin Verges
YouTube: ht
ter cluster that fall apart or have a meltdown just because
they run out of memory and we use tricks like zram to help them out and
recover their clusters. If I now go and do it per container/osd in a finer
grained way, it will just blow up even more.
--
Martin Verges
simply press the reboot button and
get a nice fresh and clean OS booted in your memory. Besides that, It
is easy to maintain, solid, and all your hosts run on exactly the same
software and configuration state (kernel, libs, Ceph, everything).
--
Martin Verges
> So perhaps we'll need to change the OSD to allow for 500 or 1000 PGs
We had a support case last year where we were forced to set the OSD
limit to >4000 for a few days, and had more than 4k active PGs on that
single OSD. You can do that; however, it is quite uncommon.
--
Martin Verges
the same as to having multiple OSDs per NVMe
as some people do it.
--
Martin Verges
> failure-domain=host
yes (or rack/room/datacenter/...); for regular clusters it is therefore
absolutely no problem, as you correctly assumed.
--
Martin Verges
want to
do some specific Ceph workload benchmarks, feel free to drop me a
mail.
--
Martin Verges
throughput. This in turn leads to a higher impact during replication
work, which is particularly prevalent with EC. With EC, not only write
accesses but also read accesses must be served from several OSDs.
--
Martin Verges