Use journalctl -xe (maybe with -S/-U if you want to filter) to find
the time period in which a restart attempt has happened, and see
what's logged in that period. If that's not helpful, then what you may
want to do is disable that service (systemctl disable blah), then get
the ExecStart out of it and run that command manually to see its output.
“podman logs ceph-xxx-osd-xxx” may contain additional logs.
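To make the first step concrete, a hedged sketch; the unit name, fsid and time window below are hypothetical examples, not taken from this cluster:

```shell
# Sketch of the suggestion above. The cluster-specific commands are shown
# as comments -- substitute your fsid, OSD id and timestamps:
#   journalctl -u 'ceph-<fsid>@osd.33' -S '2021-03-18 20:00' -U '2021-03-18 21:00'
#   systemctl disable --now 'ceph-<fsid>@osd.33'   # stop the restart loop
# Pulling the ExecStart line out of the unit file so it can be run by hand:
exec_start_of() {
  # read a unit file on stdin, print the ExecStart command line
  sed -n 's/^ExecStart=//p'
}
# Demonstration on an inline sample unit file (path is illustrative):
printf '[Service]\nExecStart=/bin/bash /var/lib/ceph/osd.33/unit.run\n' | exec_start_of
```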
> On Mar 19, 2021, at 04:29, Philip Brown wrote:
>
> I've been banging on my ceph octopus test cluster for a few days now.
> 8 nodes. each node has 2 SSDs and 8 HDDs.
> They were all autoprovisioned so that each HDD gets an LVM slice of an
> SSD as a db partition.
On 3/18/21 9:28 PM, Philip Brown wrote:
> I've been banging on my ceph octopus test cluster for a few days now.
> 8 nodes. each node has 2 SSDs and 8 HDDs.
> They were all autoprovisioned so that each HDD gets an LVM slice of an SSD as a
> db partition.
>
> service_type: osd
> service_id: osd_spec_default
> pl
yup cephadm and orch was used to set all this up.
Current state of things:
ceph osd tree shows
33  hdd  1.84698  osd.33  destroyed  0  1.0
cephadm logs --name osd.33 --fsid xx-xx-xx-xx
along with the systemctl stuff I already saw, showed me new things such as
Unfortunately, the pod won't stay up, so "podman logs" won't work for it.
It is not even visible with "podman ps -a".
- Original Message -
From: "胡 玮文"
To: "Philip Brown"
Cc: "ceph-users"
Sent: Thursday, March 18, 2021 5:56:20 PM
Subject: Re: [ceph-users] ceph octopus mysterious OSD crash
On 3/19/21 2:20 AM, Philip Brown wrote:
> yup cephadm and orch was used to set all this up.
> Current state of things:
> ceph osd tree shows
> 33  hdd  1.84698  osd.33  destroyed  0  1.0
^^ Destroyed, ehh, this doesn't look good to me. Ceph thinks this OSD is
destroyed.
mkay.
Sooo... what's the new and nifty proper way to clean this up?
The outsider's view is,
"I should just be able to run 'ceph orch osd rm 33'"
but that returns
Unable to find OSDs: ['33']
- Original Message -
From: "Stefan Kooman"
To: "Philip Brown"
Cc: "ceph-users"
Sent: Thursday
I made *some* progress for cleanup.
I could already do "ceph osd rm 33" from my master. But doing the cleanup on
the actual OSD node was problematical.
ceph-volume lvm zap xxx
wasn't working properly... because the device wasn't fully released, because at
the regular OS level, it can't even SEE
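A hedged sketch of how that "device not released" state is usually cleared; the mapping and device names are hypothetical, nothing here is taken from Philip's node:

```shell
# Sketch: when "ceph-volume lvm zap" fails because the kernel still holds
# the old OSD's LVs open, the usual sequence is to tear down the leftover
# device-mapper entries first, then retry the zap:
#   dmsetup ls | grep ceph                    # find leftover ceph-* mappings
#   dmsetup remove <mapping-name>             # release each one
#   ceph-volume lvm zap /dev/sdX --destroy    # then zap, wiping LVM metadata
# A tiny illustration of the filtering step on sample "dmsetup ls" output:
sample='ceph--vg--osd33-db   (253:4)
rootvg-root          (253:0)'
printf '%s\n' "$sample" | awk '/^ceph/ {print $1}'
```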
Unfortunately, neither of those things will work,
because
ceph orch daemon add
does not have a syntax that lets me add an SSD as a journal to an HDD,
and likewise
ceph orch apply osd --all-available-devices
will not do the right thing, both for mixed SSD/HDD... but also, even though I
have a l
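For reference, a minimal drive-group style OSD spec of the kind cephadm expects for mixed SSD/HDD nodes. This is a sketch only, assuming the rotational flag distinguishes the two device classes on these hosts; filters and placement are examples:

```shell
# Sketch of an OSD service spec: HDDs (rotational) become data devices,
# SSDs (non-rotational) hold the DB. Filters and placement are examples.
cat > /tmp/osd_spec.yml <<'EOF'
service_type: osd
service_id: osd_spec_default
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
db_devices:
  rotational: 0
EOF
cat /tmp/osd_spec.yml
# On a live cluster this would be applied with:
#   ceph orch apply -i /tmp/osd_spec.yml
```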
if we can't replace a drive on a node in a crash situation, without blowing away
the entire node,
it seems to me ceph octopus fails the "test" part of the "test cluster" :-/
I vaguely recall running into this "doesn't have PARTUUID" problem before.
THAT time, I did end up wiping the entire machine
On 3/19/21 3:53 PM, Philip Brown wrote:
> mkay.
> Sooo... what's the new and nifty proper way to clean this up?
> The outsider's view is,
> "I should just be able to run 'ceph orch osd rm 33'"
Can you spawn a cephadm shell and run: ceph osd rm 33?
And / or: ceph osd crush rm 33, or try to do it with
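Spelled out, the classic manual removal sequence for a dead OSD id (33 from this thread); a sketch only, with the commands printed rather than executed:

```shell
# Sketch of the manual cleanup path suggested above, run from a cephadm
# shell. On recent releases "ceph osd purge 33 --yes-i-really-mean-it"
# collapses these three steps into one.
OSD_ID=33
for cmd in \
  "ceph osd crush remove osd.${OSD_ID}" \
  "ceph auth del osd.${OSD_ID}" \
  "ceph osd rm ${OSD_ID}"
do
  echo "$cmd"   # replace echo with eval "$cmd" on a real cluster
done
```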
I am quite sure that this case is covered by cephadm already. A few
months ago I tested it after a major rework of ceph-volume. I don’t
have any links right now. But I had a lab environment with multiple
OSDs per node with rocksDB on SSD and after wiping both HDD and DB LV
cephadm automatically re-created the OSD.
We also ran into a scenario in which I did exactly this, and it did
_not_ work. It created the OSD, but did not put the DB/WAL on the NVME
(didn't even create an LV). I'm wondering if there's some constraint
applied (haven't looked at code yet) that when the NVME already has
all but the one DB on it, it skips creating the missing DB LV.
On 3/19/21 6:22 PM, Philip Brown wrote:
> I made *some* progress for cleanup.
> I could already do "ceph osd rm 33" from my master. But doing the cleanup on
> the actual OSD node was problematical.
> ceph-volume lvm zap xxx
> wasn't working properly... because the device wasn't fully released because a
On 3/19/21 7:47 PM, Philip Brown wrote:
> I see.
> I don't think it works when 7/8 devices are already configured, and the SSD is
> already mostly sliced.
OK. If it is a test cluster you might just blow it all away. By doing
this you are simulating an "SSD" failure taking down all HDDs with it. It
To: "Stefan Kooman"
Cc: "ceph-users" , "Philip Brown"
Sent: Friday, March 19, 2021 2:19:55 PM
Subject: [BULK] Re: [ceph-users] Re: ceph octopus mysterious OSD crash
> I am quite sure that this case is covered by cephadm already. A few
> months ago I tested it after a
There's still the concern about why the thing mysteriously crashed in the
first place :-/
(on TWO osd's!)
But at least I know how to rebuild a single disk.
- Original Message -
From: "Eugen Block"
To: "Stefan Kooman"
Cc: "ceph-users" , "Philip
On 3/19/21 9:11 PM, Philip Brown wrote:
> if we can't replace a drive on a node in a crash situation, without blowing away
> the entire node,
> it seems to me ceph octopus fails the "test" part of the "test cluster" :-/
I agree. This should not be necessary. And I'm sure there is, or there
will be f
As we wanted to verify this behavior with 15.2.10, we went ahead and
tested with a failed OSD. The drive was replaced, and we followed the
steps below (comments for clarity on our process) - this assumes you
have a service specification that will perform deployment once
matched:
# capture "db devi
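Filling in the shape of such a replacement run (a sketch only; the exact capture step is cut off above, and the host and device names here are hypothetical):

```shell
# Sketch of the replacement flow under cephadm, matching the outline
# above. OSD id, host and device are examples; commands are printed
# rather than executed.
OSD_ID=33
HOST=node1
DEV=/dev/sdX
for step in \
  "ceph orch osd rm ${OSD_ID} --replace" \
  "ceph orch device zap ${HOST} ${DEV} --force" \
  "ceph orch apply -i osd_spec.yml"
do
  echo "$step"   # with a live cluster, run these instead of echoing
done
```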
Sent: Thursday, March 25, 2021 12:04:17 PM
Subject: Re: [ceph-users] Re: ceph octopus mysterious OSD crash
> As we wanted to verify this behavior with 15.2.10, we went ahead and
> tested with a failed OSD. The drive was replaced, and we followed the
> steps below (comments for clarity on our process) - this assumes y