OK, so I set autoscaling to off for all five pools, and the "ceph -s"
has not changed:
~~~
  cluster:
    id:     [REDACTED]
    health: HEALTH_WARN
            Reduced data availability: 256 pgs inactive, 256 pgs incomplete
            Degraded data redundancy: 12 pgs undersized

  services:
~~~
Hi Frank,
I suggest you file the ticket, as you have the full story and the
use case to go with it.
I'm just an interested bystander; I happened to know a little about
this area because of a FileStore-to-BlueStore migration I'd done recently.
Cheers,
Chris
On Fri, Mar 12, 2021
Thanks for that - I'll disable the autoscaler on all 5 pools and see
what happens
Cheers
Matthew J
On 16/03/2021 13:53, ash...@amerrick.co.uk wrote:
I think your issue is that you also have the PG autoscaler trying to change to 1 PG.
Due to the size of the OSDs and the lack of data, it thinks you only need 1 PG. I would suggest
disabling the PG autoscaler on small test clusters.
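For reference, disabling the autoscaler per pool might look like the sketch below; the pool name is a placeholder:

```shell
# Disable the PG autoscaler for a single pool (pool name is a placeholder)
ceph osd pool set mypool pg_autoscale_mode off

# Check what the autoscaler currently thinks about each pool
ceph osd pool autoscale-status
```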
Thanks
On 16 Mar 2021, 10:50 +0800, duluxoz , wrote:
> Hi Guys,
>
> Is the below "ceph
Hi Guys,
Is the below "ceph -s" normal?
This is a brand new cluster with (at the moment) a single Monitor and 7
OSDs (each 6 GiB) that has no data in it (yet), and yet it's taking
almost a day to "heal itself" after adding the 2nd OSD.
~~~
  cluster:
    id:     [REDACTED]
    health:
~~~
Thanks for your support.
> The question is, does the MDS you're using return an inode structure
> version >=2 ?
How do I check that?
I'm somewhat certain that the pinning actually works; the load distribution
between the two active MDSes is consistent with what pinning should produce. Is it
Hi Sebastian,
thanks, that seems to have worked, at least on one of the two nodes. But now I
have another problem: it seems that all mgr daemons are gone and the ceph command
is stuck.
[root@gedasvl02 ~]# cephadm ls | grep mgr
I tried to deploy a new mgr but this doesn't seem to work either:
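In case it helps, one hedged sketch of redeploying a mgr directly via cephadm when the ceph CLI is stuck (the fsid and host name below are placeholders, and cephadm may also need `--config`/`--keyring` depending on the setup):

```shell
# List the daemons cephadm knows about on this host
cephadm ls | grep mgr

# Redeploy a mgr daemon directly on the host, bypassing the stuck ceph CLI;
# the fsid and daemon name are placeholders for this cluster's values
cephadm deploy --fsid <fsid> --name mgr.gedasvl02
```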
We're happy to announce the 18th backport release in the Nautilus
series. It fixes a regression introduced in 14.2.17 in which the manager
module tries to use a couple of Python modules that do not exist in some
environments. We recommend that users update to this release. For detailed
release notes
Hello,
If anybody out there has tried this or thought about it, I'd like to know...
I've been thinking about ways to squeeze as much performance as possible
from the NICs on a Ceph OSD node. The nodes in our cluster (6 x OSD, 3
x MGR/MON/MDS/RGW) currently have 2 x 10GB ports. Currently,
We have a cluster with a huge number of warnings like this, even when nothing is
going on in the cluster.
It fills the mgr's physical memory, maxes out the mon DB, and 5 OSDs can't start :/
[WRN] slow request osd_op(mds.0.537792:26453 43.38
43:1d6c5587:::1fe56a6.:head [create,setxattr parent
Have you tried a more aggressive reweight value?
I've seen some stubborn CRUSH maps that don't start moving data until 0.9 or
lower in some cases.
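The suggestion above can be sketched as follows; the OSD id and weight value are examples only:

```shell
# Temporarily reweight OSD 12 to 0.85 to coax data off it (id/value are examples)
ceph osd reweight 12 0.85

# Or preview what a utilization-based reweight would change before committing
ceph osd test-reweight-by-utilization
```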
Reed
> On Mar 11, 2021, at 10:29 AM, Brent Kennedy wrote:
>
> We have a ceph octopus cluster running 15.2.6; it's indicating a near-full
> osd
Not a direct answer to your question, but it looks like Samsung's DC Toolkit
may allow for user adjusted over-provisioning.
https://www.samsung.com/semiconductor/global.semi.static/S190311-SAMSUNG-Memory-Over-Provisioning-White-paper.pdf
They shouldn't, but you can have cases where some OSDs/pools are active+clean
and a rebalance is triggered while other pools/OSDs are still backfilling,
potentially putting further load on the backfilling OSDs.
It may be better now, but peace of mind from knowing I won't be
On Mon, Mar 15, 2021 at 10:42 AM Jeff Layton wrote:
> The question is, does the MDS you're using return an inode structure
> version >=2 ?
Yes, he needs to upgrade to at least nautilus. Mimic is missing commit
8469a81625180668a9dec840293013be019236b8.
--
Patrick Donnelly, Ph.D.
He / Him / His
Looking at the client, most of the dir_pin support went in ~v5.1:
-8<---
static bool ceph_vxattrcb_dir_pin_exists(struct ceph_inode_info *ci)
{
Dave
That’s the way our cluster is setup. It’s relatively small, 5 hosts, 12 osd’s.
Each host has 2x10G with LACP to the switches. We’ve vlan’d public/private
networks.
Making best use of the LACP LAG will largely come down to choosing
the best hashing policy. At the moment
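For example, on Linux bonding the transmit hash policy can be set to layer3+4 so that different TCP connections hash onto different physical links; the interface name below is a placeholder:

```shell
# Check the current transmit hash policy on a Linux bond (bond0 is a placeholder)
cat /sys/class/net/bond0/bonding/xmit_hash_policy

# layer3+4 hashes on IP addresses and ports, spreading Ceph's many TCP
# connections across the slave links instead of pinning them to one
echo layer3+4 > /sys/class/net/bond0/bonding/xmit_hash_policy
```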
Hi,
How wide are your EC profiles? If they are really wide, you might be
reaching the limits of what is physically possible. Also, I'm not sure
that upmap in 14.2.11 is very smart about *improving* existing upmap
rules for a given PG, in the case that a PG already has an upmap-items
entry but it
Absolutely:
[root@s3db1 ~]# ceph osd df tree
ID  CLASS  WEIGHT     REWEIGHT  SIZE     RAW USE  DATA     OMAP    META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
-1         673.54224      -     674 TiB  496 TiB  468 TiB  97 GiB  1.2 TiB  177 TiB  73.67  1.00    -          root default
-2          58.30331      -
OK thanks. Indeed "prepared 0/10 changes" means it thinks things are balanced.
Could you again share the full ceph osd df tree?
On Mon, Mar 15, 2021 at 2:54 PM Boris Behrens wrote:
>
> Hi Dan,
>
> I've set the autoscaler to warn, but it actually does not warn for now. So
> not touching it for
Hi, I seem to have a problem with the extended attributes for MDS pinning. Ceph
version is mimic-13.2.10 and the documentation
(https://web.archive.org/web/20190716110503/http://docs.ceph.com/docs/mimic/cephfs/multimds/)
says I should be able to
setfattr -n ceph.dir.pin -v 2 path/to/dir
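A quick way to check whether the pin took effect is to read the xattr back; the path is an example:

```shell
# Pin a directory's subtree to MDS rank 2 (path is an example)
setfattr -n ceph.dir.pin -v 2 path/to/dir

# Read the pin back to confirm it was set
getfattr -n ceph.dir.pin path/to/dir
```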
Hi Dan,
I've set the autoscaler to warn, but it actually does not warn for now, so
I'm not touching it for now.
this is what the log says in minute intervals:
2021-03-15 13:51:00.970 7f307d5fd700 4 mgr get_config get_config key:
mgr/balancer/active
2021-03-15 13:51:00.970 7f307d5fd700 4 mgr
On Mon, Mar 15, 2021 at 2:58 AM Frank Schilder wrote:
>
> Hi, I seem to have a problem with the extended attributes for MDS pinning.
> Ceph version is mimic-13.2.10 and the documentation
> (https://web.archive.org/web/20190716110503/http://docs.ceph.com/docs/mimic/cephfs/multimds/)
> says I
I suggest just disabling the autoscaler until your balancing situation is understood.
What does your active mgr log say (with debug_mgr 4/5), grep balancer
/var/log/ceph/ceph-mgr.*.log
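Concretely, raising the mgr debug level and grepping for balancer activity might look like this sketch:

```shell
# Raise the mgr debug level so the balancer logs its decisions
ceph config set mgr debug_mgr 4/5

# Then look for balancer activity in the active mgr's log
grep balancer /var/log/ceph/ceph-mgr.*.log
```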
-- Dan
On Mon, Mar 15, 2021 at 1:47 PM Boris Behrens wrote:
>
> Hi,
> this unfortunately did not solve my problem. I
Hi,
this unfortunately did not solve my problem. I still have some OSDs that
fill up to 85%.
According to the logging, the autoscaler might want to add more PGs to one
bucket and reduce almost all other buckets to 32.
2021-03-15 12:19:58.825 7f307f601700 4 mgr[pg_autoscaler] Pool
No, currently there is no media-type distinction for the default
memory target.
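For what it's worth, the default can be inspected, and per-device-class overrides are still possible via config masks; the override value below is just an example:

```shell
# Show the current default osd_memory_target (4 GiB out of the box)
ceph config get osd osd_memory_target

# Example only: override the target for HDD OSDs via a device-class mask
ceph config set osd/class:hdd osd_memory_target 2147483648
```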
On 3/15/2021 3:00 PM, Konstantin Shalygin wrote:
Hi,
Is the current default (for Nautilus, for example) to respect the media type
(hdd/ssd/hybrid), or is the current memory_target 4 GiB for all OSDs?
Thanks,
k
Hi,
Am 15.03.21 um 12:09 schrieb Dan van der Ster:
> We're looking for a way to reactivate OSDs like this without rebooting.
>
> For example, logs showing sdd disappear then reappear as sdq from this
> morning are in the P.S.
>
> We tried pvscan, vgscan, lvscan, but in all cases when trying to
Hi,
Is the current default (for Nautilus, for example) to respect the media type
(hdd/ssd/hybrid), or is the current memory_target 4 GiB for all OSDs?
Thanks,
k
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to
On 15/03/2021 11:29, Matthew Vernon wrote:
On 15/03/2021 11:09, Dan van der Ster wrote:
Occasionally we see a bus glitch which causes a device to disappear
then reappear with a new /dev/sd name. This crashes the osd (giving IO
errors) but after a reboot the OSD will be perfectly fine.
We're
On 15/03/2021 11:09, Dan van der Ster wrote:
Occasionally we see a bus glitch which causes a device to disappear
then reappear with a new /dev/sd name. This crashes the osd (giving IO
errors) but after a reboot the OSD will be perfectly fine.
We're looking for a way to reactivate OSDs like
Hi,
On Thu, 11 Mar 2021 19:40:09 +0100
Marc 'risson' Schmitt wrote:
> Cephadm is supposed to include some default Prometheus configuration
> for alerting[1], if this configuration is present in the container. It
> gets the path to this configuration from
> `mgr/cephadm/prometheus_alerts_path`,
Hi all,
Occasionally we see a bus glitch which causes a device to disappear
then reappear with a new /dev/sd name. This crashes the osd (giving IO
errors) but after a reboot the OSD will be perfectly fine.
We're looking for a way to reactivate OSDs like this without rebooting.
For example, logs
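One approach that sometimes works without a reboot (a sketch; the device name is an example) is to rescan LVM state and re-activate via ceph-volume:

```shell
# Rescan the renamed device and refresh LVM's view of it (device is an example)
partprobe /dev/sdq
pvscan --cache
vgscan

# Re-activate OSDs from their LVM metadata without rebooting
ceph-volume lvm activate --all
```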
Hi,
I encountered an issue lately. I have a CephFS cluster on 14.2.11
with 5 active MDSes and 5 standby-replay MDSes. The metadata pool is on SSD
and the data pool is on SATA. Two of the MDSes restart frequently, and the
standby-replay MDSes get stuck in the replay and resolve states and never
become active.
What's wrong with my MDSes?
*the
Hi,
I thought the balancer and pg_autoscaler only do something if all
the PGs are in the active+clean state?
So if there is any backfilling going on, they just bail out.
Or did you mean during the norecover/nobackfill/noout phase?
Kind regards,
Caspar
Op do 11 mrt. 2021 om 23:54 schreef
> After you have filled that up, if such a host crashes or needs
> maintenance, another 80-100TB will need recreating from the other huge
> drives.
A judicious setting of mon_osd_down_out_subtree_limit can help mitigate the
thundering herd FWIW.
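As an illustration, setting the subtree limit to host means a whole down host won't be automatically marked out, avoiding a mass backfill when a big node reboots:

```shell
# Don't auto-mark OSDs "out" when an entire host (or larger subtree) goes down
ceph config set mon mon_osd_down_out_subtree_limit host
```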
> I don't think there are specific limitations