Hi
I'm trying to enable Cephx on a cluster already running without Cephx.
Here is what I did.
1. I shut down the cluster.
2. Enabled Cephx in ceph.conf for the Mon and Mgr.
3. Brought the Monitor cluster up. No issue.
4. Tried to bring the first Manager up; I'm getting the following error:
=== mgr.a ===
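For reference, enabling cephx cluster-wide normally comes down to these settings in the [global] section of ceph.conf on every node (a sketch of the standard options, not taken from the poster's actual config — each daemon also needs a matching keyring):

```ini
[global]
# Require cephx between cluster daemons, for service access,
# and for clients. All three default to cephx on new clusters.
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
```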
Hi all
I have a running Ceph cluster (ceph_mon, ceph_mgr, ceph_osd and ceph_mds) on IP
address A, B and C.
I have installed Ceph radosgw on IP address X (Ubuntu 22.04) and configured to
listen on port 9000.
When I bring up the Ceph radosgw, port 9000 does not seem to be active and I'm
following
On Sunday, May 23, 2021, 01:16:12 AM GMT+8, Eugen Block
wrote: Awesome! I'm glad it worked out this far! At least you have a working
filesystem now, even if it means that you may have to use a backup.
But now I can say it: Having only three OSDs is really not the best
idea. ;-) Are all
Hi Eugen
Now the Ceph is HEALTH_OK.
> I think what we need to do now is:
> 1. Get MDS.0 to recover, discarding if necessary part of the object
> 200.6048, and bring MDS.0 up.
Yes, I agree, I just can't tell what the best way is here, maybe
remove all three objects from the disks
Sorry, the above post has to be corrected as: "Out of the info that has emerged so
far, it seems the Ceph client wanted to write an object of size 1555896 but
managed to write only 1540096 bytes to the journal."
Sagara
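The shortfall is easy to sanity-check, and it lines up with the 1540096-byte "200.6048" replica files later reported on the OSDs:

```shell
# Bytes the journal write fell short by: the client intended 1555896
# bytes, but only 1540096 reached disk (the size of the on-disk replicas).
expected=1555896
written=1540096
echo $((expected - written))   # -> 15800
```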
On Saturday, May 22, 2021, 08:29:34 PM GMT+8, Sagara Wijetunga
wrote:
Out of the info now emerged so far seems Ceph client wanted to write an object
of size 1555896 but managed to write only 1555896 bytes to the journal.
I think what we need to do now is:
1. Get MDS.0 to recover, discarding if necessary part of the object 200.6048,
and bring MDS.0 up.
2. Do
On Saturday, May 22, 2021, 03:14:13 PM GMT+8, Eugen Block
wrote:
What does the MDS report in its logs from when it went down?
NOTE: The power failure happened around 2021-05-20 23:56.
Here are log messages from MDS.0 log:
2021-05-20 17:26:19.358 2192d80 1 mds.a Updating MDS map to version
Here are the physical file sizes of the "200.6048*":
OSD.0: -rw-r--r-- 1 ceph ceph 1540096 May 20 22:47
/var/lib/ceph/osd/ceph-0/current/2.44_head/200.6048__head_56F5F744__
OSD.1: -rw-r--r-- 1 ceph ceph 1540096 May 20 22:47
On Saturday, May 22, 2021, 03:14:13 PM GMT+8, Eugen Block
wrote:
What does the MDS report in its logs from when it went down?
What size do you get when you run
rados -p cephfs_metadata stat 200.6048
# rados -p cephfs_metadata stat 200.6048
cephfs_metadata/200.6048 mtime
Hi Eugen
Thanks for the reply.
Ceph version:
# ceph version
ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
> Can you share
>
> rados list-inconsistent-obj 2.44
# rados list-inconsistent-obj 2.44
{"epoch":6996,"inconsistents":[]}
> ceph tell mds. damage ls
>
#
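An empty "inconsistents" array in the list-inconsistent-obj output above means the scrub metadata records no object-level errors for that PG in that epoch. A quick way to check a saved report (illustrative, using the exact JSON from above):

```shell
# Count entries in the "inconsistents" array of a saved
# list-inconsistent-obj report; 0 means no object-level errors recorded.
report='{"epoch":6996,"inconsistents":[]}'
python3 -c 'import json,sys; print(len(json.loads(sys.argv[1])["inconsistents"]))' "$report"
```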
Hi all
An accidental power failure happened.
That resulted in CephFS going offline, and it cannot be mounted.
I have 3 MDS daemons but it complains "1 mds daemon damaged".
It seems a PG of cephfs_metadata is inconsistent. I tried to repair it, but it
doesn't get repaired.
How do I repair the damaged MDS and
Hi Frank
Found the issue and fixed it. It was one copy of a 0-byte object. Removed it. A
deep scrub of the PG then fixed the issue.
# find /var/lib/ceph/osd/ -type f -name "123675e*"
/var/lib/ceph/osd/ceph-2/current/3.b_head/DIR_B/DIR_A/DIR_E/123675e.__head_AE97EEAB__3
# ls -l
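Zero-byte replicas like that one can be located mechanically. A sketch using a temporary directory to stand in for the FileStore data dir (on a real node the path would be under /var/lib/ceph/osd/ceph-N/current/, and the second file name here is made up for illustration):

```shell
# Locate zero-byte object files; find's -size 0 matches only empty files.
osd_dir=$(mktemp -d)
touch "$osd_dir/123675e.__head_AE97EEAB__3"            # the bad 0-byte copy
printf 'data' > "$osd_dir/1236761.__head_00000000__3"  # a healthy object (hypothetical name)
find "$osd_dir" -type f -size 0
# On the cluster, after removing the empty copy on the affected OSD,
# revalidate the PG with a deep scrub:
#   ceph pg deep-scrub 3.b
rm -r "$osd_dir"
```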
Hi Frank
1. We will disable the disk controller and disk-level caching to avoid future
issues.
2. My pools are:
ceph osd lspools
2 cephfs_metadata
3 cephfs_data
4 rbd
The PG now inconsistent is 3.b, therefore, it belongs to cephfs_data pool.
Following also shows the PG 3.b belongs
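The pool membership follows directly from the PG id: everything before the dot is the pool id shown by "ceph osd lspools". A one-liner to illustrate, using the 3.b id from this thread:

```shell
# A PG id is <pool-id>.<pg-seed-in-hex>; strip everything from the first
# dot onward to get the pool id, then match it against "ceph osd lspools".
pgid="3.b"
echo "${pgid%%.*}"   # pool 3, i.e. cephfs_data in this cluster
```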
> Hmm, I'm getting a bit confused. Could you also send the output of "ceph osd
> pool ls detail".
File ceph-osd-pool-ls-detail.txt attached.
> Did you look at the disk/controller cache settings?
I don't have disk controllers on Ceph machines. The hard disk is directly
attached to the
Hi Frank
> the primary OSD is probably not listed as a peer. Can you post the complete
> output of
> - ceph pg 3.b query
> - ceph pg dump
> - ceph osd df tree
> in a pastebin?
Yes, the Primary OSD is 0.
I have attached above as .txt files. Please let me know if you still cannot
read them.
Hi Frank
> Please note, there is no peer 0 in "ceph pg 3.b query". Also no word osd.
I checked other PGs with "active+clean", there is a "peer": "0".
But "ceph pg <pgid> query" always shows only two peers, sometimes peers 0 and 1,
or 1 and 2, or 0 and 2, etc.
Regards
Sagara
Hi Frank
> looks like you have one on a new and 2 on an old version. Can you add the
> information about which OSD each version resides on?
The "ceph pg 3.b query" shows following:
"peer_info": [
{
"peer": "1",
"pgid": "3.b",
"last_update":
Hi Frank
> I'm not sure if my hypothesis can be correct. Ceph sends an acknowledge of a
> write only after all copies are on disk. In other words, if PGs end up on
> different versions after a power outage, one always needs to roll back. Since
> you have two healthy OSDs in the PG and the PG
Hi Frank
Thanks for the reply.
> I think this happens when a PG has 3 different copies and cannot decide which
> one is correct. You might have hit a very rare case. You should start with
> the scrub errors, check which PGs and which copies (OSDs) are affected. It
> sounds almost like all 3
Hi all
I have a Ceph cluster (Nautilus 14.2.11) with 3 Ceph nodes.
A crash happened and all 3 Ceph nodes went down.
One (1) PG turned "active+clean+inconsistent". I tried to repair it. After the
repair, the PG in question now shows "active+clean+inconsistent+failed_repair"
and cannot