Would it help to run just one MDS and one monitor?
thanks!
On Tue, Oct 5, 2021 at 1:42 PM Eugen Block wrote:
All your PGs are inactive, if two of four OSDs are down and you
probably have a pool size of 3 then no IO can be served. You’d need at
least three up OSDs to resolve that.
Zitat von Abdelillah
All your PGs are inactive, if two of four OSDs are down and you
probably have a pool size of 3 then no IO can be served. You’d need at
least three up OSDs to resolve that.
Zitat von Abdelillah Asraoui :
Ceph is reporting a warning about slow metadata I/Os on one of the MDS
servers; this is a new
ugh memory.
From: Szabo, Istvan (Agoda)<mailto:istvan.sz...@agoda.com>
Sent: October 4, 2021, 0:46
To: Igor Fedotov<mailto:ifedo...@suse.de>
Cc: ceph-users@ceph.io<mailto:ceph-users@ceph.io>
Subject: [ceph-users] Re: is it possible to remove the db+wal from an
external device (nvme)
Seem
debug 2021-10-04T16:06:38.288+0000 7f8633cc1f00 -1 auth: unable to find a
keyring on /var/lib/ceph/osd/ceph-3/keyring: (13) Permission denied
debug 2021-10-04T16:06:38.288+0000 7f8633cc1f00 -1 monclient: keyring not
found
failed to fetch mon config (--no-mon-config to skip)
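A minimal sketch of a possible fix, assuming the keyring file from the log
above exists and only its ownership or mode is wrong (paths and OSD id are
taken from the log; adjust for containerized deployments):
ls -l /var/lib/ceph/osd/ceph-3/keyring    # check current ownership/mode
chown ceph:ceph /var/lib/ceph/osd/ceph-3/keyring
chmod 600 /var/lib/ceph/osd/ceph-3/keyring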
thanks!
On Fri, Oct 1, 2021 at 2:02 AM Eugen Block wrote:
I'm
I can't access the pastebin, did you verify if you hit the same issue
as Stefan referenced
(https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/KQ5A5OWRIUEOJBC7VILBGDIKPQGJQIWN/)? Before deleting or rebuilding anything I would first check what the root cause is. As Stefan said,
Hi,
I'm not entirely sure if this really is the same issue here. One of
our customers also works with k8s in openstack and I saw similar
messages. We never investigated it, I don't know if the customer did,
but one thing they encountered was that k8s didn't properly clean up
Hi,
I don't know for sure but I believe you can have only one rbd mirror
daemon per cluster. So you can either configure one-way or two-way
mirroring between two clusters. With your example the third cluster
would then require two mirror daemons which is not possible AFAIK. I
can't tell
Hi,
I'm not sure if setting min_size to 4 would also fix the PGs, but the
client IO would probably be restored. Marking it as lost is the last
straw according to this list, luckily I haven't been in such a
situation yet. So give it a try with min_size = 4 but don't forget to
increase
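A minimal sketch, assuming a hypothetical pool 'mypool' whose original
min_size was 5:
ceph osd pool set mypool min_size 4
# wait for the PGs to recover, then restore the original value:
ceph osd pool set mypool min_size 5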
On Thu, Sep 30, 2021 at 1:18 AM Eugen Block wrote:
Is the content of OSD.3 still available in the filesystem? If the
answer is yes you can get the OSD's keyring from
/var/lib/ceph/osd/ceph-3/keyring
Then update your osd.3.export file with the correct keyring and then
import the corrected file back to ceph.
Hi,
there is no information about your ceph cluster, e. g. hdd/ssd/nvme
disks. This information can be crucial with regards to performance.
Also why would you use
osd_pool_default_min_size = 1
osd_pool_default_size = 2
There have been endless discussions in this list about why a pool size of
2 with min_size 1 is risky.
I must have imported the osd.2 key instead; now osd.3 has the same key as osd.2
ceph auth import -i osd.3.export
How do we update this ?
thanks!
On Wed, Sep 29, 2021 at 2:13 AM Eugen Block wrote:
Just to clarify, you didn't simply import the unchanged keyring but
modified it to reflect the actual key of OSD.3, correct?
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---
-----Original Message-----
From: Eugen Block
Sent: Wednesday, September 29, 2021 8:49 PM
To: 胡 玮文
Cc: Igor Fedotov ; Szabo, Istvan (Agoda)
; ceph-users@ceph.io
Subject: Re: is it possible to remove the db+wal from an external device (nvme)
ceph-volume in the new shell.
From: Eugen Block<mailto:ebl...@nde.ag>
Sent: September 29, 2021, 21:40
To: 胡 玮文<mailto:huw...@outlook.com>
Cc: Igor Fedotov<mailto:ifedo...@suse.de>; Szabo, Istvan
(Agoda)<mailto:istvan.sz...@agoda.com>;
ceph-users@ceph.io<mailto:ceph-users@ceph.io>
14:26 unit.poststop
-rw------- 1 ceph ceph 3021 Sep 17 14:26 unit.run
-rw------- 1 ceph ceph  142 Sep 17 14:26 unit.stop
-rw------- 1 ceph ceph    2 Sep 20 04:15 whoami
From: Eugen Block<mailto:ebl...@nde.ag>
Sent: September 29, 2021, 21:29
To: Igor Fedotov<mailto:ifedo...@suse.de>
Cc: 胡 玮文<mail
Just to clarify, you didn't simply import the unchanged keyring but
modified it to reflect the actual key of OSD.3, correct? If not, run
'ceph auth get osd.3' first and set the key in the osd.3.export file
before importing it to ceph.
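That sequence could look like this (osd.3.export is the file from the thread):
ceph auth get osd.3              # shows the key the cluster actually has
# edit osd.3.export so its 'key = ...' line matches, then:
ceph auth import -i osd.3.export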
Zitat von Abdelillah Asraoui :
i have created
Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---
-----Original Message-----
From: Eugen Block
Sent: Monday, September 27, 2021 7:42 PM
To: ceph-users@ceph.io
Subject: [
  limit: 1
wal_devices:
  rotational: 0
  limit: 1
Do you know what to change to apply the plan you described? I'd be happy
to try it!
From: Eugen Block
To: ceph-users@ceph.io
Cc:
Bcc:
Date: Mon, 27 Sep 2021 10:06:43 +0000
Subject: [ceph-users] Re: Orchestrator is internally ignoring applying a
Hi,
I think 'ceph-bluestore-tool bluefs-bdev-migrate' could be of use
here. I haven't tried it in a production environment yet, only in
virtual labs.
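A hedged sketch of such a migration (untested in production as noted; OSD
id and paths are examples only, stop the OSD first):
systemctl stop ceph-osd@3
ceph-bluestore-tool bluefs-bdev-migrate \
  --path /var/lib/ceph/osd/ceph-3 \
  --devs-source /var/lib/ceph/osd/ceph-3/block.db \
  --dev-target /var/lib/ceph/osd/ceph-3/block
systemctl start ceph-osd@3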
Regards,
Eugen
Zitat von "Szabo, Istvan (Agoda)" :
Hi,
Seems like in our config the nvme device as a wal+db in front of
the ssd
, Eugen Block wrote:
Good morning,
could anyone tell me if the patch [1] for this tracker issue [2] is already
available in any new (open)SUSE kernel (maybe Leap 15.3)? We seem to be
hitting [2] on openSUSE Leap 15.1 and if there's a chance to fix it by
upgrading the kernel it would be great news
Hi,
the log states:
2021-09-27 10:47:20,415 DEBUG Could not locate podman: podman not found
Have you verified if it's installed?
Zitat von Manuel Holtgrewe :
Hi,
I have a 15.2.14 ceph cluster running on an up to date CentOS 7 that I want
to adopt to cephadm. I'm trying to follow this:
Hi,
I read your first email again and noticed that ceph-volume already
identifies the drives sdr and sds as non-rotational and as available.
That would also explain the empty rejected_reasons field because they
are not rejected (at this stage?). Where do you read that information
that
Good morning,
could anyone tell me if the patch [1] for this tracker issue [2] is
already available in any new (open)SUSE kernel (maybe Leap 15.3)? We
seem to be hitting [2] on openSUSE Leap 15.1 and if there's a chance
to fix it by upgrading the kernel it would be great news!
Thanks!
Hi,
as a workaround you could just set the rotational flag by yourself:
echo 0 > /sys/block/sd[X]/queue/rotational
That's the one ceph-volume is searching for and it should at least
enable you to deploy the rest of the OSDs. Of course, you'll need to
figure out why the rotational flag is wrong in the first place.
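To verify what ceph-volume will see (sdc is only an example; note the flag
does not persist across reboots):
lsblk -d -o NAME,ROTA
echo 0 > /sys/block/sdc/queue/rotational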
/ containers but
that is still a WIP. (Our situation is complicated by the fact that
we'll need to continue puppet managing things like firewall with
cephadm doing the daemon placement).
Cheers, Dan
On Wed, Sep 22, 2021 at 10:32 AM Eugen Block wrote:
Thanks for the summary, Dan!
I'm still
Thanks for the summary, Dan!
I'm still hesitating upgrading our production environment from N to O,
your experience sounds reassuring though. I have one question, did you
also switch to cephadm and containerize all daemons? We haven't made a
decision yet, but I guess at some point we'll
Hi,
IIRC in a different thread you pasted your max-backfill config and it
was the lowest possible value (1), right? That's why your backfill is
slow.
Zitat von "Szabo, Istvan (Agoda)" :
Hi,
By default in the newer versions of ceph when you increase the
pg_num the cluster will start
Hi,
Yes! I did play with another cluster before and forgot to completely
clear that node! And the fsid "46e2b13c-dab7-11eb-810b-a5ea707f1ea1"
from that cluster. But then there is an error in Ceph, because of the
mon that the existing cluster complained about (with fsid
And we are quite happy with our cache tier. When we got new HDD OSDs
we tested if things would improve without the tier but we had to stick
to it, otherwise working with our VMs was almost impossible. But this
is an RBD cache so I can't tell how the other protocols perform with a
cache
Hi,
Hmm. 'cephadm ls' running directly on the node does show that there
is a mon. I don't quite understand where it came from, and I don't
understand why 'ceph orch ps' didn't show this service.
Thank you very much for your help.
no problem. Maybe you played around and had this node in the
Since I'm trying to test different erasure encoding plugin and
technique I don't want the balancer active.
So I tried setting it to none as Eugen suggested, and to my
surprise I did not get any degraded messages at all, and the cluster
was in HEALTH_OK the whole time.
Interesting, maybe
Was there a MON running previously on that host? Do you see the daemon
when running 'cephadm ls'? If so, remove it with 'cephadm rm-daemon
--name mon.s-26-9-17'
Zitat von Fyodor Ustinov :
Hi!
After upgrading to version 16.2.6, my cluster is in this state:
root@s-26-9-19-mon-m1:~# ceph
You’re absolutely right, of course, the balancer wouldn’t cause
degraded PGs. Flapping OSDs seems very likely here.
Zitat von Josh Baergen :
I assume it's the balancer module. If you write lots of data quickly
into the cluster the distribution can vary and the balancer will try
to even out
Hi,
I assume it's the balancer module. If you write lots of data quickly
into the cluster the distribution can vary and the balancer will try
to even out the placement. You can check the status with
ceph balancer status
and disable it if necessary:
ceph balancer mode none
Regards,
Eugen
m solution?
Thanks!
[]'s
Arthur
On 15/09/2021 08:30, Eugen Block wrote:
Hi,
ceph-crash services are standalone containers, they are not running
inside other containers:
host1:~ # ceph orch ls
NAME  RUNNING  REFRESHED  AGE  PLACEMENT
Hi,
ceph-crash services are standalone containers, they are not running
inside other containers:
host1:~ # ceph orch ls
NAME  RUNNING  REFRESHED  AGE  PLACEMENT  IMAGE NAME  IMAGE ID
Hi,
db_slots is still not implemented:
pacific:~ # ceph orch apply -i osd.yml --dry-run
Error EINVAL: Failed to validate Drive Group: Filtering for
is not supported
Question 2: If db_slots still *doesn't* work, is there a coherent
way to divide up a solid state DB drive for use by a
Hi Frank,
I think the snapshot rotation could be an explanation.
Just a few days ago we had a host failure over night and some OSDs
couldn't be rebalanced entirely because they were too full. Deleting a
few (large) snapshots I created last week resolved the issue. If you
monitored 'ceph
Hi,
consider yourself lucky that you haven't had a host failure. But I
would not draw the wrong conclusions here and change the
failure-domain based on luck.
In our production cluster we have an EC pool for archive purposes, it
all went well for quite some time and last Sunday one of the
Edit your rgw service specs and set „unmanaged“ to true so cephadm
won’t redeploy a daemon, then remove it as you did before.
See [1] for more details.
[1] https://docs.ceph.com/en/pacific/cephadm/service-management.html
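A minimal sketch of such a spec (service id and host are hypothetical):
service_type: rgw
service_id: myrealm.myzone
placement:
  hosts:
    - host1
unmanaged: true
Apply it with 'ceph orch apply -i rgw.yml' and then remove the daemon as
you did before.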
Zitat von Cem Zafer :
Hi,
How to remove rgw from hosts? When I
is the best practice? Redeploy the failed mon?
On 10. Sep 2021, at 13:08, Eugen Block wrote:
Yes, give it a try. If the cluster is healthy otherwise it
shouldn't be a problem.
Zitat von mk :
Thx Eugen,
just stopping mon and remove/rename only store.db and start mon?
BR
Max
On 10. Sep 2021, at 12
: Failed with result 'exit-code'.
Sep 10 13:35:55 amon3 systemd[1]: Failed to start Ceph cluster
monitor daemon.
On 10. Sep 2021, at 13:08, Eugen Block wrote:
Yes, give it a try. If the cluster is healthy otherwise it
shouldn't be a problem.
Zitat von mk :
Thx Eugen,
just stopping mon
Yes, give it a try. If the cluster is healthy otherwise it shouldn't
be a problem.
Zitat von mk :
Thx Eugen,
just stopping mon and remove/rename only store.db and start mon?
BR
Max
On 10. Sep 2021, at 12:50, Eugen Block wrote:
I don't have an explanation but removing the mon store from
I don't have an explanation but removing the mon store from the failed
mon has resolved similar issues in the past. Could you give that a try?
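For a non-containerized mon like this one, a minimal sketch (amon3 from the
thread; keep the renamed copy until the mon has resynced):
systemctl stop ceph-mon@amon3
mv /var/lib/ceph/mon/ceph-amon3/store.db /var/lib/ceph/mon/ceph-amon3/store.db.bak
systemctl start ceph-mon@amon3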
Zitat von mk :
Hi CephFolks,
I have a cluster 14.2.21-22/Ubuntu 18.04 with 3 mons. After going
down/restart of 1 mon (amon3) it gets stuck on probing
You must have missed the response to your thread, I suppose:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/YA5KLI5MFJRKVQBKUBG7PJG4RFYLBZFA/
Zitat von mabi :
Hello,
A few days later the ceph status progress bar is still stuck and the
third mon is for some unknown reason
n.
Still not exactly sure why that fixed it, but at least it’s working
again. Thanks for the suggestion.
-Paul
On Sep 8, 2021, at 4:12 AM, Eugen Block wrote:
If you only configured 1 iscsi gw but you see 3 running, have you
tried to destroy them with 'cephadm rm-daemon --name ...'? On the
activ
address) for
the domain?
I think 99% of the confusion is due to VERY POOR documentation!!
Thanks for help.
Francesco
On 01.09.21 14:14, Eugen Block wrote:
That basically was my check list, it was all I had to do in my lab
to set it up. The guide to setup a RGW manually refers to
non-co
I assume the cluster is used in roughly the same way as before the
upgrade and the load has not increased since, correct? What is the
usual load, can you share some 'ceph daemonperf mds.' output? It
might be unrelated but have you tried to compact the OSDs belonging to
this pool, online or
If you only configured 1 iscsi gw but you see 3 running, have you
tried to destroy them with 'cephadm rm-daemon --name ...'? On the
active MGR host run 'journalctl -f' and you'll see plenty of
information, it should also contain information about the iscsi
deployment. Or run 'cephadm logs
Hi,
from an older cloud version I remember having to increase these settings:
[DEFAULT]
block_device_allocate_retries = 300
block_device_allocate_retries_interval = 10
block_device_creation_timeout = 300
The question is what exactly could cause a timeout. You write that you
only see these
Could you share the exact command you're trying and then also 'ceph
auth get client.'?
Zitat von Hendrik Peyerl :
Hi Eugen,
thanks for the idea but I didn’t have anything mounted that I could unmount
On 6. Sep 2021, at 09:15, Eugen Block wrote:
Hi,
I just got the same message in my
Hi,
I just got the same message in my lab environment (octopus) which I
had redeployed. The client's keyring had changed after redeployment
and I think I had a stale mount. After 'umount' and 'mount' with the
proper keyring it worked as expected.
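A sketch of that remount with the kernel client (mount point, client name
and secret file are assumptions):
umount /mnt/cephfs
mount -t ceph mon1:6789:/ /mnt/cephfs -o name=myclient,secretfile=/etc/ceph/myclient.secret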
Zitat von Hendrik Peyerl :
Hello All,
stack
side but fails. Are there openstack-swift packages which are needed?
If there are, please help me get them; it may also be why I am
failing to run the swift command on the openstack CLI side.
thank you for your continued support.
Micheal
On Thu, Sep 2, 2021 at 9:14 AM Eugen Block wrote
not finding the object storage on
the horizon dashboard , but it appears in the system information services.
[image: image.png]
so my question is how to configure it so that it appears in the
dashboard.
Michel
On Wed, Sep 1, 2021 at 3:49 PM Eugen Block wrote:
Sorry, one little detail
Sorry, one little detail slipped through, the '--region' flag has to
be put before the 'service' name. The correct command would be:
openstack endpoint create --region RegionOne swift admin
http://ceph-osd3:8080/swift/v1
and respectively for the other interfaces.
Zitat von Eugen Block
Please read carefully and inspect the (helpful) error message. The
command I provided doesn't have the '--publicurl' in it because that
is from an older identity api version (v2). In newer versions (v3) the
endpoint commands only require one of the values 'internal', 'public'
or 'admin',
Hi,
this is not a ceph issue but your openstack cli command as the error
message states.
Try one interface at a time:
openstack endpoint create swift public http://ceph-osd3:8080/swift/v1
--region RegionOne swift
openstack endpoint create swift admin http://ceph-osd3:8080/swift/v1
values that conform to my
installation and applying all the radosgw-admin setup you indicated
led to no results: always the same as at the beginning.
The problem must be somewhere else...
Do you have a checklist?
Francesco
On 31.08.21 14:53, Eugen Block wrote:
How exactly did you create the rgw(s),
s3 gateway; scheme is really http (checked
querying with get-rgw-api-scheme).
Any clue / suggestion is welcome.
Francesco
On 24.08.21 11:22, Eugen Block wrote:
Hi,
I assume that the "latest" docs are already referring to quincy, if
you check the pacific docs
(https://
Hi,
1. two disks would fail where both failed disks are not on the same
host? I think ceph would be able to find a PG distributed across all
hosts avoiding the two failed disks, so ceph would be able to repair
and reach a healthy status after a while?
yes, if there is enough disk space
Hi,
I assume that the "latest" docs are already referring to quincy, if
you check the pacific docs
(https://docs.ceph.com/en/pacific/mgr/dashboard/) that command is not
mentioned. So you'll probably have to use the previous method of
configuring the credentials.
Regards,
Eugen
Zitat
note I am using
containers, not standalone OSDs.
Any ideas?
Regards,
Eric
Message: 2
Date: Fri, 20 Aug 2021 06:56:59 +0000
From: Eugen Block
Subject: [ceph-users] Re: Missing OSD in SSD after disk failure
To: ceph-users@ceph.io
Hi,
you can just set the config option with 'ceph config set ...' after
your cluster has been bootstrapped. See [1] for more details about the
config store.
[1]
https://docs.ceph.com/en/latest/rados/configuration/ceph-conf/#monitor-configuration-database
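For example (option and value are illustrative only):
ceph config set global osd_pool_default_size 3
ceph config dump | grep osd_pool_default_size    # verify it landed in the config store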
Zitat von Dong Xie :
Dear
What is the output of 'ceph orch upgrade status'? Did you (maybe
accidentally) start an update? You can stop it with 'ceph orch upgrade
stop'.
Zitat von "Paul Giralt (pgiralt)" :
The output of my ’ceph status’ shows the following:
progress:
Updating node-exporter deployment (-1 ->
Hi,
1. In my cluster I have three monitors; when one monitor is down (I
simply shut it down), running ceph -s shows that there are two
monitors alive and one down; when 2/3 of the monitors are down the
cluster becomes unresponsive (ceph -s remains stuck); is this normal?
yes, this is expected.
Hi,
this seems to be a recurring issue, I had the same just yesterday in
my lab environment running on 15.2.13. If I don't specify other
criteria in the yaml file then I'll end up with standalone OSDs
instead of the desired rocksDB on SSD. Maybe this is still a bug, I
didn't check. My
Hi, have you checked ‚rbd sparsify‘ to reclaim unused space?
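Usage is a one-liner, e.g. (pool/image names are examples):
rbd sparsify mypool/myimage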
Zitat von Boris Behrens :
Hi everybody,
we just stumbled over a problem where the rbd image does not shrink
when files are removed.
This only happens when the rbd image is partitioned.
* We tested it with centos8/ubuntu20.04 with
Hi,
there's plenty of information available online, e.g. the Red Hat docs
[1], mailing list threads [2].
[1]
Hi,
you can disable or modify the configured alerts in:
/var/lib/ceph//etc/prometheus/alerting/ceph_alerts.yml
After restarting the container those changes should be applied.
Regards,
Eugen
Zitat von E Taka <0eta...@gmail.com>:
Hi,
we have enabled Cluster → Monitoring in the Dashboard.
iced that before,
that it was just these daemons (just FYI, no further help needed
here).
On Wed, Jul 28, 2021 at 09:10, Eugen Block wrote:
Hi,
the docs [1] only state:
> /var/lib/ceph//removed contains old daemon data
> directories for stateful daemons (e.g., monitor, prometheus) that
> have been removed by cephadm.
Hi,
the docs [1] only state:
/var/lib/ceph//removed contains old daemon data
directories for stateful daemons (e.g., monitor, prometheus) that
have been removed by cephadm.
So that directory should not grow; I'm not sure if it does in your case
because you write "now 12 GB". Are you
smartmontools 7.1, which will crash the
kernel on e.g. "smartctl -a /dev/nvme0". Before switching to
Octopus containers, I was using smartmontools from Debian backports,
which does not have this problem.
Does Pacific have newer smartmontools?
// Best wishes; Johan
On 2021-07-2
Hi,
did you read this thread [1] reporting a similar issue? It refers to a
solution described in [2] but the OP in [1] recreated all OSDs, so
it's not clear what the root cause was.
Can you start the OSD with more verbose (debug) output and share that?
Does your cluster really have only
Hi,
you can find the ceph.conf here:
/var/lib/ceph/7bdffde0-623f-11eb-b3db-fa163e672db2/mon.ses7-host1/config
If you edit that file and restart the container you'll see the
changes. But as I wrote in your other thread, this won't be enough to
migrate MONs to a different IP address, you
Note that there's a similar field in the nova database (connection_info):
---snip---
MariaDB [nova]> select connection_info from block_device_mapping where
instance_uuid='bbc33a1d-10c0-47b1-8179-304899c4546c';
Hi,
three OSDs is just not enough, if possible you should add more SSDs to
the index pool. Have you checked the disk saturation (e.g. with
iostat)? I would expect a high usage.
Zitat von renjianxinlover :
Ceph: ceph version 12.2.12
(1436006594665279fe734b4c15d7e08c13ebd777) luminous
Hi,
I'm not sure if that's what you need but ceph file layouts [1] could
meet your requirements. Your CephFS can consist of multiple pools
(replicated or EC), and with xattr you can define different pools to
be used for specific directories. Does that help?
Regards,
Eugen
[1]
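A short sketch (filesystem, pool and directory names are examples; the
extra pool must be added to the filesystem first):
ceph fs add_data_pool cephfs cephfs_ec_data
setfattr -n ceph.dir.layout.pool -v cephfs_ec_data /mnt/cephfs/archive
getfattr -n ceph.dir.layout /mnt/cephfs/archive    # inspect the resulting layout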
"halflife": 60
},
"recall_caps_throttle": {
"value": 0,
"halflife": 1.5
},
"recall_caps_throttle2o": {
"value": 0,
"halflife": 0.5
},
"session_cache_livenes
Hi,
I just setup a virtual one-node cluster (16.2.5) to check out
cephfs-top. Regarding the number of clients I was a little surprised,
too, in the first couple of minutes the number switched back and forth
between 0 and 1 although I had not connected any client yet. But after
a while
Hi,
do you see the daemon on that iscsi host(s) with 'cephadm ls'? If the
answer is yes, you could remove it with cephadm, too:
cephadm rm-daemon --name iscsi.iscsi
Does that help?
Zitat von Fyodor Ustinov :
Hi!
I have fresh installed pacific
root@s-26-9-19-mon-m1:~# ceph version
ceph
Hi,
what does your 'ceph osd df tree' look like?
I've read about these warnings when PGs are incomplete but not when
all are active+clean.
Zitat von Andres Rojas Guerrero :
Hi, recently in a Nautilus cluster version 14.2.6 I have changed the
crush rule to host type instead of osd, all
Hi,
can you tell a bit more what exactly happens?
Currently I'm having an issue where every time I add a new server it adds
the OSDs on the node, and then a few random OSDs on the current hosts will all
fall over; I'll only be able to get them up again by restarting the daemons.
What is the
Hi,
don't give up on Ceph. ;-)
Did you try any of the steps from the troubleshooting section [1] to
gather some events and logs? Could you share them, and maybe also some
more details about that cluster? Did you enable any non-default mgr
modules? There have been a couple reports related
strange that the commands are all using 'octopus' instead
of 'pacific'. Ceph documentation always takes a bit of detective work...
===
Ralph
On 14.06.21 15:31, Eugen Block wrote:
Hi,
I asked a similar question three weeks ago
(https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message
Hi,
I asked a similar question three weeks ago
(https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/63U7WYHFTHSPUQTAR73W2AIE3E3PJJ4X/), but in my case the bootstrap worked fine. But adding a label (e. g. _admin) had no effect although the host should have had the admin keyring
# ceph orch start grafana
Thanks for your help
===
Ralph
On 10.06.21 09:31, Eugen Block wrote:
Hi,
you can edit the config file
/var/lib/ceph//grafana.host1/etc/grafana/grafana.ini (created
by cephadm) and then restart the container. This works in my
octopus lab environment.
Hi,
you can edit the config file
/var/lib/ceph//grafana.host1/etc/grafana/grafana.ini (created by
cephadm) and then restart the container. This works in my octopus lab
environment.
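The restart could look like this (fsid is a placeholder):
systemctl restart ceph-<fsid>@grafana.host1.service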
Regards,
Eugen
Zitat von Ralph Soika :
Hello,
I have installed and bootsraped a Ceph manager node via
Can you share your 'ceph osd tree'?
You can remove the stray osd "old school" with 'ceph osd purge 1
[--force]' if you're really sure.
Zitat von mabi :
Small correction in my mail below, I meant to say Octopus and not
Nautilus, so I am running ceph 15.2.13.
‐‐‐ Original Message
cannot be brought back to up
state for some reason, even though osd processes are running on the host.
Kind regards,
Rok
On Thu, May 27, 2021 at 3:32 PM Eugen Block wrote:
Hi,
this sounds like your crush rule(s) for one or more pools can't place
the PGs because the host is missing. Please
On Thursday, May 27, 2021 3:28 PM, Eugen Block wrote:
Can you try with both cluster and osd fsid? Something like this:
pacific2:~ # cephadm deploy --name osd.2 --fsid
acbb46d6-bde3-11eb-9cf2-fa163ebb2a74 --osd-fsid
bc241cd4-e284-4c5a-aad2-5744632fc7fc
I tried to reproduce a similar scena
Hi,
this sounds like your crush rule(s) for one or more pools can't place
the PGs because the host is missing. Please share
ceph pg dump pgs_brief | grep undersized
ceph osd tree
ceph osd pool ls detail
and the crush rule(s) for the affected pool(s).
Zitat von Rok Jaklič :
Hi,
I have
91a86f20-8083-40b1-8bf1-fe35fac3d677
osd id 2
osdspec affinity all-available-devices
type block
vdo 0
devices /dev/sda
‐‐‐ Original Message ‐‐‐
On Thursday, May 27, 2021 12:32 PM,
assert osd_fsid
AssertionError
Any ideas what is wrong here?
Regards,
Mabi
‐‐‐ Original Message ‐‐‐
On Thursday, May 27, 2021 12:13 PM, Eugen Block wrote:
Hi,
I posted a link to the docs [1], [2] just yesterday ;-)
You should see the respective OSD in the output of 'cephadm
ceph-volume lvm list' on that node.
Hi,
I posted a link to the docs [1], [2] just yesterday ;-)
You should see the respective OSD in the output of 'cephadm
ceph-volume lvm list' on that node. You should then be able to get it
back to cephadm with
cephadm deploy --name osd.x
But I haven't tried this yet myself, so please
Kai Stian Olstad :
On 27.05.2021 11:17, Eugen Block wrote:
That's not how it's supposed to work. I tried the same on an Octopus
cluster and removed all filters except:
data_devices:
  rotational: 1
db_devices:
  rotational: 0
My Octopus test osd nodes have two HDDs and one SSD, I removed all
OSDs and redeployed
That's not how it's supposed to work. I tried the same on an Octopus
cluster and removed all filters except:
data_devices:
  rotational: 1
db_devices:
  rotational: 0
My Octopus test osd nodes have two HDDs and one SSD, I removed all
OSDs and redeployed on one node. This spec file results
similar behaviour.
Zitat von Kai Stian Olstad :
On 26.05.2021 18:12, Eugen Block wrote:
Could you share the output of
lsblk -o name,rota,size,type
from the affected osd node?
# lsblk -o name,rota,size,type
NAME
Kai Stian Olstad :
On 26.05.2021 11:16, Eugen Block wrote:
Yes, the LVs are not removed automatically, you need to free up the
VG, there are a couple of ways to do so, for example remotely:
pacific1:~ # ceph orch device zap pacific4 /dev/vdb --force
or directly on the host with:
pacific1
tion rules, so it does not try and create too many osds on
the same node at the same time.
On Wed, 26 May 2021 at 08:25, Eugen Block wrote:
Hi,
I believe your current issue is due to a missing keyring for
client.bootstrap-osd on the OSD node. But even after fixing that
you'll probably still
Kai Stian Olstad :
On 26.05.2021 08:22, Eugen Block wrote:
Hi,
did you wipe the LV on the SSD that was assigned to the failed HDD? I
just did that on a fresh Pacific install successfully, a couple of
weeks ago it also worked on an Octopus cluster.
No, I did not wipe the LV.
Not sure what you